IT博客汇
  • 首页
  • 精华
  • 技术
  • 设计
  • 资讯
  • 扯淡
  • 权利声明
  • 登录 注册

    Find one path between two functioins

    一根稻草发表于 2015-10-27 00:00:00
    love 0

    有许多优秀的源代码分析工具,如cscope, ctags和SI等,利用这些工具可以在源码中穿梭,可以查找符号,搜索函数定义,搜索被哪些函数调用,查找字符串等等,但是如果你知道两个函数,怎么找到它们之间的调用路径,CodeViz可以生成函数调用图,但是CodeViz安装复杂,而且需要用修改过的gcc编译源码,虽然能输出可视化的图片,但是调用分枝太庞大,包含很多无用信息。

    下面从分析cscope.out文件格式,构造函数调用关系图,DFS搜索两个函数之间的调用路径,针对Linux这样的大型项目,首先找出调用频率TOP 1000的函数,不作考虑,以提升搜索速度。

    cscope 的四个参数

      -b       Build the database only.
      -c       Use  only  ASCII characters in the database file, that is, do not compress the data.
      -k       Kernel  Mode  turns  off  the  use of the default include dir (usually /usr/include) when  building the database, since kernel source trees generally do not use it.
      -R      Recurse subdirectories during search  for  source files.
    

    cscope.out fromat

        A mark is a tab followed by one of these characters:
    
        Char   Meaning
    
        @      file
        $      function definition
        `      function call
    

    reference http://goo.gl/R2ErDs

    运行gengraph.sh生成 graph.out文件

    #!/bin/bash
    rm cscope.out
    cscope -b -c -k -R
    grep -e "[\$\`]" cscope.out > graph.out
    sed -i 's/^[[:space:]]*//' graph.out
    

    graph.out 文件的格式

    $函数名(定义,不是声明,不是被调用)
    `函数名(被调用)
    

    例如Linux内核生成的graph.out

        $vfs_read
        `rw_verify_area
        `do_sync_read
        `fsnotify_access
        `add_rchar
        `inc_syscr
        $do_sync_write
        `init_sync_kiocb
        `aio_write
        `wait_on_sync_kiocb
    

    在图中,$后的函数是图的顶点,"`" 是前一个顶点的邻接顶点。

    写完第一个版本,搜索小项目毫无压力,即使用递归的DFS也不会导致函数栈崩溃。但是像Linux这样的工程,由于没有压缩,cscope.out 有347M(linux3.12),graph.out也有36M,分析下发现代码中有很多printk,memcpy等函数调用,可认为树的叶节点,有什么方法找出所有的叶节点? 统计!找出调用频率top 100(最后还是低估Linux了,top 5000中每个函数调用次数都在50以上)。 排除top 100后,图依然很大,先工作起来吧。

    xroute.py

        #!/usr/bin/python
        '''
        find one path between two functions.
        cannot process function pointer.
        '''
    
        import sys
        import os
        import operator
    
        graph = "graph.out"
        graph_lite = "graph-lite.out"
        data = []
        route = []
        visited = []
    
        def reverse_output_route(call):
            if call[1] >=0 :
                reverse_output_route(route[call[1]])
            print "\t->", call[0]
    
        def non_recursive_dfs(f1, f2):
            global route 
            global visited 
            route = []
            visited = []
            stack = [(f1, -1)]
            while len(stack) > 0:
                call = stack.pop()
                if call[0] == f2:
                    reverse_output_route(call)
                    return True
                parent = len(route) 
                route.append(call)
                visited.append(call[0])
                try:
                    no = data.index("$" + call[0]) + 1
                    dedup = set()
                    for i in range(no, len(data)):
                        v = data[i]
                        if v[0]!='`':
                            break
                        v = v[1:]
                        if v not in visited and v not in dedup:
                            dedup.add(v)
                            stack.append((v, parent))
                except ValueError:
                    pass
    
        def top(n):
            f = open(graph, "r")
            data = f.read().splitlines()
            f.close()
    
            stat = dict()
    
            for name in data:
                k = name[1:]
                if k in stat:
                    stat[k] += 1
                else:
                    stat[k] = 1
    
            sorted_stat = sorted(stat.items(), key=operator.itemgetter(1), reverse=True)
    
            topdata = []
            i = 1 
            for k in sorted_stat:
                topdata.append(k[0])
                i+=1
                if i > n:
                    break
            return topdata
    
        def readdata():
            global data
            if os.path.isfile(graph_lite):
                fd = open(graph_lite, "r")
                data = fd.read().splitlines()
                fd.close()
                return
    
            fd = open(graph, "r")
            temp = fd.read().splitlines()
            fd.close()
    
            topdata = top(5000)
            i=1
            sz = len(temp)
            while i < sz:
                if temp[i][1:] not in topdata:
                    data.append(temp[i])
                    i += 1
                    continue
                if temp[i][0] == '`':
                    i += 1
                else:
                    i += 1
                    while i < sz:
                        if temp[i][0] == '$':
                            break;
                        i += 1
            fd = open(graph_lite, "w")
            fd.write("\n".join(data))
            fd.close()
    
        def main(f1, f2):
            readdata()
            #if not non_recursive_dfs(f1, f2) and not non_recursive_dfs(f2, f1):
            if not non_recursive_dfs(f1, f2):
                print "\nthere is no path between %s and %s\n" %(f1, f2)
    
        def usage(pname):
            print "\n", pname, ": output the path between two functions\n"
            print "usage: ./%s [function a] [function b]\n" %(pname)
            sys.exit(-1)
    
        if __name__ == '__main__':
            if len(sys.argv) < 3:
                usage(sys.argv[0])
            main(sys.argv[1], sys.argv[2])
    

    示例

        root@ubuntu:~/linux-3.12# time ./xroute.py do_sys_open security_file_alloc
            -> do_sys_open
            -> do_filp_open
            -> path_openat
            -> get_empty_filp
            -> security_file_alloc
    
        real    0m25.051s
        user    0m24.799s
        sys 0m0.260s
    

    缺点是比较明显,速度慢,不能处理函数指针。

    顺便列出Linux3.12 中top 10的函数

    root@ubuntu:~# ./stats.py stats.out 10
    rank    function                        frequency
    1   printk                          44492
    2   kfree                           24554
    3   dev_err                         20570
    4   memcpy                          18013
    5   spin_unlock_irqrestore          16125
    6   mutex_unlock                    15917
    7   spin_lock_irqsave               13696
    8   memset                          13518
    9   dev_dbg                         12451
    10  BIT                             12117
    


沪ICP备19023445号-2号
友情链接