Using gperftools, a heavyweight profiling tool for C++

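Anyone familiar with Go will know pprof as a tool for performance analysis and visualization, covering CPU profiles, memory profiles, and more. The same convenient and eye-catching functionality is available in C++. The tool you need is gperftools.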

Installation

Installing libunwind

64-bit systems need libunwind installed first. The version gperftools recommends is libunwind-0.99-beta; see gperftools/INSTALL for details.

wget http://download.savannah.gnu.org/releases/libunwind/libunwind-0.99-beta.tar.gz
tar -zxvf libunwind-0.99-beta.tar.gz
cd libunwind-0.99-beta/
./configure
make
make install

By default libunwind installs into /usr/local/lib, so that directory has to be added to the system's dynamic linker cache.

echo "/usr/local/lib" > /etc/ld.so.conf.d/usr_local_lib.conf
/sbin/ldconfig

Installing graphviz

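Graphviz is an open-source toolkit started at AT&T Labs for drawing graphs described in the DOT language. gperftools relies on it to render graphical analysis results.
Install command: yum install graphviz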

Generating graphical output also depends on ps2pdf.
Install command: yum -y install ghostscript

Installing gperftools

git clone https://github.com/gperftools/gperftools.git
cd gperftools/
./autogen.sh
./configure
make
make install

Problem 1:

[root@10-8-152-53 gperftools]# ./autogen.sh
configure.ac:174: error: possibly undefined macro: AC_PROG_LIBTOOL
     If this token and others are legitimate, please use m4_pattern_allow.
     See the Autoconf documentation.
     autoreconf: /usr/bin/autoconf failed with exit status:

Fix: yum -y install libtool

Problem 2:

[root@hb06-ufile-132-199 gperftools]# ./autogen.sh
libtoolize: putting macros in AC_CONFIG_MACRO_DIR, `m4'.
libtoolize: copying file `m4/libtool.m4'
libtoolize: copying file `m4/ltoptions.m4'
libtoolize: copying file `m4/ltsugar.m4'
libtoolize: copying file `m4/ltversion.m4'
libtoolize: copying file `m4/lt~obsolete.m4'
configure.ac:159: installing './compile'
configure.ac:22: installing './config.guess'
configure.ac:22: installing './config.sub'
configure.ac:23: installing './install-sh'
configure.ac:174: error: required file './ltmain.sh' not found
configure.ac:23: installing './missing'

Fix: aclocal && autoheader && autoconf && automake --add-missing
Based on https://stackoverflow.com/questions/22603163/automake-error-ltmain-sh-not-found

Usage examples

gperftools can be used in five ways:

  • TC Malloc

    gcc [...] -ltcmalloc
  • Heap Checker

    gcc [...] -o myprogram -ltcmalloc
    HEAPCHECK=normal ./myprogram
  • Heap Profiler

    gcc [...] -o myprogram -ltcmalloc
    HEAPPROFILE=/tmp/netheap ./myprogram
  • Cpu Profiler

    gcc [...] -o myprogram -lprofiler
    CPUPROFILE=/tmp/profile ./myprogram
  • pprof and Remote Servers

The rest of this section walks through the Cpu Profiler in detail.

Cpu Profiler

Profiling by changing how the program is launched

The sample code, test.cpp, is as follows:

#include <gperftools/profiler.h>
#include <iostream>
using namespace std;

void func1() {
    int i = 0;
    while (i < 100000) {
        ++i;
    }
}

void func2() {
    int i = 0;
    while (i < 200000) {
        ++i;
    }
}

void func3() {
    for (int i = 0; i < 1000; ++i) {
        func1();
        func2();
    }
}

int main() {
    func3();
    return 0;
}

Compile and run:
[root@10-8-152-53 cpuprofilertest]# g++ test.cpp -lprofiler
[root@10-8-152-53 cpuprofilertest]# CPUPROFILE=./test.prof ./a.out
PROFILE: interrupts/evictions/bytes = 52/4/512

Running the program produces the file test.prof, from which pprof can generate a text report:

[root@10-8-152-53 cpuprofilertest]# pprof --text a.out test.prof
Using local file a.out.
Using local file test.prof.
Total: 52 samples
40 76.9% 76.9% 40 76.9% func2
12 23.1% 100.0% 12 23.1% func1
0 0.0% 100.0% 52 100.0% __libc_start_main
0 0.0% 100.0% 52 100.0% _start
0 0.0% 100.0% 52 100.0% func3
0 0.0% 100.0% 52 100.0% main

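Reading the output: each row has six columns, in this order:

  1. Number of samples in the function itself (excluding calls to other functions)
  2. Percentage of samples in the function itself (excluding calls to other functions)
  3. Running total of the percentages in column 2 for the rows printed so far
  4. Number of samples in the function and everything it calls
  5. Percentage of samples in the function and everything it calls
  6. Function name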
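The percentage columns follow directly from the sample counts. As a rough sketch (the struct and function names here are made up for illustration and are not part of pprof), columns 1 to 3 of the report above can be recomputed like this:

```cpp
#include <cstdio>
#include <string>
#include <vector>

// One row of a pprof --text report, reduced to the fields needed here.
// Illustrative type only; pprof does not expose anything like it.
struct Row {
    std::string name;
    int flat_samples;  // samples taken while the function itself was executing
};

// Percentage of `part` within `total`, guarding against an empty profile.
double percent(int part, int total) {
    return total == 0 ? 0.0 : 100.0 * part / total;
}

// Print sample count, flat percentage and running-total percentage for rows
// that are already sorted by flat sample count, largest first.
void print_flat_columns(const std::vector<Row>& rows, int total) {
    int running = 0;
    for (const Row& r : rows) {
        running += r.flat_samples;
        std::printf("%6d %5.1f%% %5.1f%% %s\n", r.flat_samples,
                    percent(r.flat_samples, total), percent(running, total),
                    r.name.c_str());
    }
}
```

Calling `print_flat_columns({{"func2", 40}, {"func1", 12}}, 52)` gives 40, 76.9%, 76.9% for func2 and 12, 23.1%, 100.0% for func1, matching the first two rows of the report.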

To generate a PDF report in the form of a call graph, run:

pprof --pdf a.out test.prof > test.pdf

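Each node in the graph represents a function. A node shows:

  1. The function name, or the class name plus method name
  2. The number of samples excluding calls to other functions (with percentage)
  3. The number of samples including calls to other functions (with percentage); this item is omitted when the function calls no other functions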

Profiling by modifying the code instead of the launch command

Programs that run for a while and then exit normally

In this case the profiling calls can be inserted directly into the code. Sample code:

#include <gperftools/profiler.h>
#include <iostream>
using namespace std;

void func1() {
    int i = 0;
    while (i < 100000) {
        ++i;
    }
}

void func2() {
    int i = 0;
    while (i < 200000) {
        ++i;
    }
}

void func3() {
    for (int i = 0; i < 1000; ++i) {
        func1();
        func2();
    }
}

int main() {
    ProfilerStart("test.prof");  // name of the profile file to generate
    func3();
    ProfilerStop();              // stop profiling
    return 0;
}

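Compile and run it. The build below links against both the profiler and tcmalloc libraries (the CPU profiler itself needs only -lprofiler, as in the usage list earlier). Running the program produces test.prof, from which pprof can generate a text report: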

[root@10-8-152-53 cpuprofilertest]# g++ not_run_always.cpp -lprofiler -ltcmalloc
[root@10-8-152-53 cpuprofilertest]# ./a.out
PROFILE: interrupts/evictions/bytes = 52/7/680
[root@10-8-152-53 cpuprofilertest]# pprof --text a.out test.prof
Using local file a.out.
Using local file test.prof.
Total: 52 samples
32 61.5% 61.5% 32 61.5% func2
20 38.5% 100.0% 20 38.5% func1
0 0.0% 100.0% 52 100.0% __libc_start_main
0 0.0% 100.0% 52 100.0% _start
0 0.0% 100.0% 52 100.0% func3
0 0.0% 100.0% 52 100.0% main
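With explicit ProfilerStart/ProfilerStop calls, an early return or an exception between the two can skip the stop call. One common way around that is an RAII guard. The sketch below is an assumption-laden illustration: the class name `ScopedProfile` and its hook-based design are invented here, and the hooks stand in for code that would call ProfilerStart and ProfilerStop from <gperftools/profiler.h>.

```cpp
#include <functional>
#include <utility>

// RAII guard: runs `start` on construction and `stop` on destruction, so the
// stop hook fires on every exit path out of the enclosing scope.
// Hypothetical helper, not part of gperftools.
class ScopedProfile {
public:
    ScopedProfile(std::function<void()> start, std::function<void()> stop)
        : stop_(std::move(stop)) {
        if (start) start();
    }

    ~ScopedProfile() {
        if (stop_) stop_();
    }

    // Copying would run the stop hook twice, so it is disabled.
    ScopedProfile(const ScopedProfile&) = delete;
    ScopedProfile& operator=(const ScopedProfile&) = delete;

private:
    std::function<void()> stop_;
};
```

In the program above, main could then begin with `ScopedProfile guard([] { ProfilerStart("test.prof"); }, [] { ProfilerStop(); });` and drop the explicit ProfilerStop call.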

Programs that run continuously

A program that runs continuously never exits normally, so the approach above does not apply. Instead, a signal can be used to turn profiling on and off:

#include <gperftools/profiler.h>
#include <stdio.h>
#include <signal.h>
#include <unistd.h>

// Toggle profiling on SIGUSR1: the first signal starts it, the next stops it.
// Note: printf and the Profiler* calls are not guaranteed to be
// async-signal-safe; this is a demo, not hardened production code.
void gprofStartAndStop(int signum) {
    static int isStarted = 0;
    if (signum != SIGUSR1) return;

    if (!isStarted) {
        isStarted = 1;
        ProfilerStart("test.prof");
        printf("ProfilerStart success\n");
    } else {
        isStarted = 0;  // reset so that a later signal can start a new profile
        ProfilerStop();
        printf("ProfilerStop success\n");
    }
}

void func1() {
    int i = 0;
    while (i < 100000) {
        ++i;
    }
}

void func2() {
    int i = 0;
    while (i < 200000) {
        ++i;
    }
}

void func3() {
    for (int i = 0; i < 1000; ++i) {
        func1();
        func2();
    }
}

int main() {
    signal(SIGUSR1, gprofStartAndStop);

    while (1) {
        printf("call f\n");
        func3();
        sleep(1);  // pause between iterations instead of busy-looping
    }
    return 0;
}

Compile and run:

[root@10-8-152-53 cpuprofilertest]# g++ run_always.cpp -lprofiler -ltcmalloc
[root@10-8-152-53 cpuprofilertest]# ./a.out
call f
call f
...

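Use the kill command to send the signal to the process and turn profiling on and off.
Find the PID of the process with top, then:
kill -s SIGUSR1 PID   # first run: start profiling
kill -s SIGUSR1 PID   # second run: stop profiling and write test.prof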

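The report is then viewed the same way as before.
This approach lets you switch profiling on and off without restarting the service, which suits ad hoc inspection in production.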
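The demo handler calls ProfilerStart and printf directly from signal context, which is convenient but not something signal handlers are generally allowed to do safely. A more conservative pattern has the handler set a flag only and lets the main loop do the toggling. The sketch below shows that pattern under stated assumptions: the names `request_toggle` and `poll_profile_toggle` are invented for illustration, and the `start` and `stop` callables are placeholders for code that would call ProfilerStart("test.prof") and ProfilerStop().

```cpp
#include <csignal>

// Set by the handler, consumed by the main loop. volatile sig_atomic_t is a
// type that may be written safely from inside a signal handler.
static volatile std::sig_atomic_t g_toggle_requested = 0;

// The handler only records that the signal arrived; no real work happens here.
extern "C" void request_toggle(int) { g_toggle_requested = 1; }

// Call once per main-loop iteration. `running` is the current profiling
// state; the return value is the state after handling any pending request.
template <typename Start, typename Stop>
bool poll_profile_toggle(bool running, Start start, Stop stop) {
    if (!g_toggle_requested) return running;  // nothing pending
    g_toggle_requested = 0;
    if (running) {
        stop();
        return false;
    }
    start();
    return true;
}
```

The main loop would register the handler once with `std::signal(SIGUSR1, request_toggle)` and then call `poll_profile_toggle` after each `func3()` iteration. The trade-off is latency: a request is only acted on the next time the loop comes around.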

Heap Profiler

Example 1: changing how the program is launched

Sample code:

#include <gperftools/profiler.h>
#include <iostream>
using namespace std;

void f1()
{
    int i;
    for (i = 0; i < 1024 * 1024; ++i)
    {
        // Never freed, so every allocation stays in use in the heap profile.
        int* p2 = new int;
        //delete[] p2;
    }
}

void f2()
{
    int i;
    for (i = 0; i < 1024 * 1024; ++i)
    {
        // Never freed, same as in f1.
        int* p2 = new int;
        //delete[] p2;
    }
}

int main() {
    f1();
    f2();
    return 0;
}

Compile and run:

[root@10-8-152-53 heapprofilertest]# g++ test.cpp -ltcmalloc
[root@10-8-152-53 heapprofilertest]# env HEAPPROFILE=./test_heap.prof ./a.out
Starting tracking the heap
Dumping heap profile to ./test_heap.prof.0001.heap (Exiting, 8 MB in use)
[root@10-8-152-53 heapprofilertest]# pprof --text ./a.out ./test_heap.prof.0001.heap
Using local file ./a.out.
Using local file ./test_heap.prof.0001.heap.
Total: 8.0 MB
4.0 50.0% 50.0% 4.0 50.0% f1
4.0 50.0% 100.0% 4.0 50.0% f2
0.0 0.0% 100.0% 8.0 100.0% __libc_start_main
0.0 0.0% 100.0% 8.0 100.0% _start
0.0 0.0% 100.0% 8.0 100.0% main
[root@10-8-152-53 heapprofilertest]# pprof --pdf ./a.out ./test_heap.prof.0001.heap > ./test_heap.pdf
Using local file ./a.out.
Using local file ./test_heap.prof.0001.heap.
Dropping nodes with <= 0.0 MB; edges with <= 0.0 abs(MB)
[root@10-8-152-53 heapprofilertest]# ls
a.out test.cpp test_heap.pdf test_heap.prof.0001.heap
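The totals in this report can be sanity-checked by hand: f1 and f2 each make 1024*1024 allocations of one int. A minimal sketch of the arithmetic, assuming a 4-byte int and assuming that pprof's "MB" means 2^20 bytes (the helper name is made up for illustration):

```cpp
#include <cstddef>

// Bytes requested by `count` allocations of `size` bytes each, expressed in
// units of 2^20 bytes.
constexpr double requested_mib(std::size_t count, std::size_t size) {
    return static_cast<double>(count) * static_cast<double>(size) /
           (1024.0 * 1024.0);
}

// 1024*1024 allocations of a 4-byte int come to 4.0 per function.
static_assert(requested_mib(1024 * 1024, 4) == 4.0, "one function: 4.0");
```

Two such functions give 8.0 in total, which lines up with the 4.0 / 4.0 / 8.0 figures pprof printed.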

Several other settings can be tuned when generating heap profiles [2]:

  • HEAP_PROFILE_ALLOCATION_INTERVAL (default: 1073741824, i.e. 1 GB): Dump heap profiling information each time the specified number of bytes has been allocated by the program.
  • HEAP_PROFILE_INUSE_INTERVAL (default: 104857600, i.e. 100 MB): Dump heap profiling information whenever the high-water memory usage mark increases by the specified number of bytes.
  • HEAP_PROFILE_TIME_INTERVAL (default: 0): Dump heap profiling information each time the specified number of seconds has elapsed.
  • HEAPPROFILESIGNAL (default: disabled): Dump heap profiling information whenever the specified signal is sent to the process.
  • HEAP_PROFILE_MMAP (default: false): Profile mmap, mremap and sbrk calls in addition to malloc, calloc, realloc, and new. NOTE: this causes the profiler to profile calls internal to tcmalloc, since tcmalloc and friends use mmap and sbrk internally for allocations. One partial solution is to filter these allocations out when running pprof, with something like `pprof --ignore='DoAllocWithArena
  • HEAP_PROFILE_ONLY_MMAP (default: false): Only profile mmap, mremap, and sbrk calls; do not profile malloc, calloc, realloc, or new.
  • HEAP_PROFILE_MMAP_LOG (default: false): Log mmap/munmap calls.

A production example: changing how the program is launched

Build

Add -ltcmalloc:
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -g3 -ggdb3 -O3 -mavx2 -Wall -DDEBUG_RING -ltcmalloc")

Launch command

env HEAPPROFILE=/tmp/broker_1003_xxx_heap.prof HEAP_PROFILE_ALLOCATION_INTERVAL=107374182400 HEAP_PROFILE_INUSE_INTERVAL=1073741824000 /root/ufile/UFileBroker-set1003/UFileBroker-set1003 -c /root/ufile/UFileBroker-set1003/config-set1003.ini
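  • HEAP_PROFILE_ALLOCATION_INTERVAL=107374182400: dump a profile each time another 100 GB has been allocated.
  • HEAP_PROFILE_INUSE_INTERVAL=1073741824000: dump a profile each time the peak memory in use grows by another 1000 GB. The log below shows that this condition was never reached.

Log output: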
Dumping heap profile to /tmp/broker_1003_xxx_heap.prof.0001.heap (10240 MB allocated cumulatively, 1118 MB currently in use)
Dumping heap profile to /tmp/broker_1003_xxx_heap.prof.0002.heap (20480 MB allocated cumulatively, 2005 MB currently in use)
Dumping heap profile to /tmp/broker_1003_xxx_heap.prof.0003.heap (30720 MB allocated cumulatively, 2383 MB currently in use)
Dumping heap profile to /tmp/broker_1003_xxx_heap.prof.0004.heap (40960 MB allocated cumulatively, 2482 MB currently in use)
......
Dumping heap profile to /tmp/broker_1003_xxx_heap.prof.0126.heap (1290366 MB allocated cumulatively, 2876 MB currently in use)
Dumping heap profile to /tmp/broker_1003_xxx_heap.prof.0127.heap (1300606 MB allocated cumulatively, 2871 MB currently in use)
Dumping heap profile to /tmp/broker_1003_xxx_heap.prof.0128.heap (1310849 MB allocated cumulatively, 2870 MB currently in use)
Dumping heap profile to /tmp/broker_1003_xxx_heap.prof.0129.heap (1321090 MB allocated cumulatively, 2878 MB currently in use)
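The two interval values in the launch command are plain byte counts. A small sketch of where they come from, assuming the intent was whole multiples of 2^30 bytes (the helper name is made up for illustration):

```cpp
#include <cstdint>

// 2^30 bytes, which is also the default HEAP_PROFILE_ALLOCATION_INTERVAL.
constexpr std::int64_t kBytesPerGiB = std::int64_t{1} << 30;

// Convert a size in units of 2^30 bytes to a raw byte count.
constexpr std::int64_t gib_to_bytes(std::int64_t gib) {
    return gib * kBytesPerGiB;
}

// The values passed on the command line above.
static_assert(gib_to_bytes(100) == 107374182400LL, "allocation interval");
static_assert(gib_to_bytes(1000) == 1073741824000LL, "in-use interval");
```

Deriving the values this way, rather than typing the digits by hand, makes it harder to slip by a factor of ten when setting the environment variables.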

Analyzing the heap

The analysis shows that 95% of the heap is allocated from a single place, which is where optimization should focus.

References

  1. https://github.com/gperftools/gperftools/wiki
  2. https://gperftools.github.io/gperftools/heapprofile.html
  3. https://github.com/gperftools/gperftools/wiki
  4. gperftools对程序进行分析 (Analyzing programs with gperftools)