0%

通过delve(dlv)调试golang程序

介绍

gdb是强大的调试工具。但对于golang程序来说,delve是个更好的选择,它能更好地理解Go runtime, data structures, and expressions,尤其是goroutine。

以下描述引用自Debugging Go Code with GDB【1】

Note that Delve is a better alternative to GDB when debugging Go programs built with the standard toolchain. It understands the Go runtime, data structures, and expressions better than GDB. Delve currently supports Linux, OSX, and Windows on amd64. For the most up-to-date list of supported platforms, please see the Delve documentation.

GDB does not understand Go programs well. The stack management, threading, and runtime contain aspects that differ enough from the execution model GDB expects that they can confuse the debugger and cause incorrect results even when the program is compiled with gccgo. As a consequence, although GDB can be useful in some situations (e.g., debugging Cgo code, or debugging the runtime itself), it is not a reliable debugger for Go programs, particularly heavily concurrent ones. Moreover, it is not a priority for the Go project to address these issues, which are difficult.

Delve is a debugger for the Go programming language【2】.

安装

安装过程参考https://github.com/go-delve/delve/tree/master/Documentation/installation

安装好后,可以查看版本

1
2
3
4
➜  _posts dlv version
Delve Debugger
Version: 1.1.0
Build: $Id: 1990ba12450cab9425a2ae62e6ab988725023d5c

调试方法

dlv提供了多种调试命令,如debug、exec、attach、core等,具体见官网文档或help信息。接下介绍常用的几种。

dlv debug

这是一个代码工程实例, 位于目录GoWorks/GoDbg。

  • 如果位于工程目录下,可以dlv debug开始调试;
  • 也可以指定目录,dlv debug GoWorks/GoDbg;
  • 如果要传入参数,添加–后指定, 如dlv debug GoWorks/GoDbg – -arg1 value
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
➜  GoDbg dlv debug GoWorks/GoDbg
Type 'help' for list of commands.
(dlv) break main.main
Breakpoint 1 set at 0x1089f6b for main.main() ./main.go:7
(dlv) continue
> main.main() ./main.go:7 (hits goroutine(1):1 total:1) (PC: 0x1089f6b)
2: import (
3: "GoWorks/GoDbg/mylib"
4: "fmt"
5: "os"
6: )
=> 7: func main() {
8: fmt.Println("Golang dbg test...")
9: var argc = len(os.Args)
10: var argv = append([]string{}, os.Args...)
11: fmt.Printf("argc:%d\n", argc)
12: fmt.Printf("argv:%v\n", argv)
(dlv) next
> main.main() ./main.go:8 (PC: 0x1089f82)
3: "GoWorks/GoDbg/mylib"
4: "fmt"
5: "os"
6: )
7: func main() {
=> 8: fmt.Println("Golang dbg test...")
9: var argc = len(os.Args)
10: var argv = append([]string{}, os.Args...)
11: fmt.Printf("argc:%d\n", argc)
12: fmt.Printf("argv:%v\n", argv)
13: var var1 = 1

具体有哪些debug命令,可以help来查看,与gdb的很类似,goroutine/goroutines的是dlv中特有的。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
(dlv) help
The following commands are available:
args ------------------------ Print function arguments.
break (alias: b) ------------ Sets a breakpoint.
breakpoints (alias: bp) ----- Print out info for active breakpoints.
call ------------------------ Resumes process, injecting a function call (EXPERIMENTAL!!!)
clear ----------------------- Deletes breakpoint.
clearall -------------------- Deletes multiple breakpoints.
condition (alias: cond) ----- Set breakpoint condition.
config ---------------------- Changes configuration parameters.
continue (alias: c) --------- Run until breakpoint or program termination. // 继续运行
deferred -------------------- Executes command in the context of a deferred call.
disassemble (alias: disass) - Disassembler.
down ------------------------ Move the current frame down.
edit (alias: ed) ------------ Open where you are in $DELVE_EDITOR or $EDITOR
exit (alias: quit | q) ------ Exit the debugger.
frame ----------------------- Set the current frame, or execute command on a different frame.
funcs ----------------------- Print list of functions.
goroutine (alias: gr) ------- Shows or changes current goroutine
goroutines (alias: grs) ----- List program goroutines.
help (alias: h) ------------- Prints the help message.
libraries ------------------- List loaded dynamic libraries
list (alias: ls | l) -------- Show source code. // 显示代码
locals ---------------------- Print local variables.
next (alias: n) ------------- Step over to next source line. // 单步下一句
on -------------------------- Executes a command when a breakpoint is hit.
print (alias: p) ------------ Evaluate an expression.
regs ------------------------ Print contents of CPU registers.
restart (alias: r) ---------- Restart process.
set ------------------------- Changes the value of a variable.
source ---------------------- Executes a file containing a list of delve commands
sources --------------------- Print list of source files.
stack (alias: bt) ----------- Print stack trace.
step (alias: s) ------------- Single step through program. // 进入函数内,单步下一句
step-instruction (alias: si) Single step a single cpu instruction.
stepout (alias: so) --------- Step out of the current function.
thread (alias: tr) ---------- Switch to the specified thread.
threads --------------------- Print out info for every traced thread.
trace (alias: t) ------------ Set tracepoint.
types ----------------------- Print list of types
up -------------------------- Move the current frame up.
vars ------------------------ Print package variables.
whatis ---------------------- Prints type of an expression.
Type help followed by a command for full documentation.

dlv exec

Execute a precompiled binary, and begin a debug session.

1
2
3
4
5
➜  file ./main
./main: Mach-O 64-bit executable x86_64
➜ dlv exec ./main
Type 'help' for list of commands.
(dlv)

dlv attach

Attach to running process and begin debugging.

注意:在退出时,有可选是否要kill该进程

1
2
3
4
[root@yg-man-uhost-set9-01 ~]# dlv attach 22063
Type 'help' for list of commands.
(dlv) q
Would you like to kill the process? [Y/n] n

dlv core

Examine a core dump.

注意:dlv不支持生成core,可以通过gdb attach上去后,执行gcore来生成core;然后再dlv core来调试

线上调试的例子

排查过程

线上一个服务出现了问题,看日志是一个接口的程序跑了一半,后面不再执行下去了。

dlv attach上去后,看到有很多.runtime_SemacquireMutex这个的goroutine,一直卡着。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
Goroutine 28232679 - User: /usr/local/go/src/runtime/sema.go:62 sync.runtime_SemacquireMutex (0x43cb74)
Goroutine 28232684 - User: /Users/patrick.xu/go/src/uframework/task/tcp_task.go:84 uframework/task.(*TCPTask).Run (0x6ce8cc)
Goroutine 28232685 - User: /usr/local/go/src/runtime/sema.go:62 sync.runtime_SemacquireMutex (0x43cb74)
Goroutine 28232686 - User: /usr/local/go/src/runtime/sema.go:62 sync.runtime_SemacquireMutex (0x43cb74)
Goroutine 28232688 - User: /Users/patrick.xu/go/src/uframework/task/tcp_task.go:84 uframework/task.(*TCPTask).Run (0x6ce8cc)
Goroutine 28232704 - User: /Users/patrick.xu/go/src/uframework/task/tcp_task.go:84 uframework/task.(*TCPTask).Run (0x6ce8cc)
Goroutine 28232705 - User: /usr/local/go/src/runtime/sema.go:62 sync.runtime_SemacquireMutex (0x43cb74)
Goroutine 28232710 - User: /Users/patrick.xu/go/src/uframework/task/tcp_task.go:84 uframework/task.(*TCPTask).Run (0x6ce8cc)
Goroutine 28232714 - User: /Users/patrick.xu/go/src/uframework/task/tcp_task.go:84 uframework/task.(*TCPTask).Run (0x6ce8cc)
Goroutine 28232715 - User: /usr/local/go/src/runtime/sema.go:62 sync.runtime_SemacquireMutex (0x43cb74)
Goroutine 28232718 - User: /Users/patrick.xu/go/src/uframework/task/tcp_task.go:84 uframework/task.(*TCPTask).Run (0x6ce8cc)
Goroutine 28232719 - User: /usr/local/go/src/runtime/sema.go:62 sync.runtime_SemacquireMutex (0x43cb74)
Goroutine 28232721 - User: /usr/local/go/src/runtime/sema.go:62 sync.runtime_SemacquireMutex (0x43cb74)
Goroutine 28232726 - User: /Users/patrick.xu/go/src/uframework/task/tcp_task.go:84 uframework/task.(*TCPTask).Run (0x6ce8cc)
Goroutine 28232727 - User: /usr/local/go/src/runtime/sema.go:62 sync.runtime_SemacquireMutex (0x43cb74)
Goroutine 28232939 - User: /Users/patrick.xu/go/src/uframework/task/tcp_task.go:84 uframework/task.(*TCPTask).Run (0x6ce8cc)
Goroutine 28232940 - User: /usr/local/go/src/runtime/sema.go:62 sync.runtime_SemacquireMutex (0x43cb74)
Goroutine 28239464 - User: /Users/patrick.xu/go/src/uframework/task/tcp_task.go:84 uframework/task.(*TCPTask).Run (0x6ce8cc)
Goroutine 28239465 - User: /usr/local/go/src/runtime/sema.go:62 sync.runtime_SemacquireMutex (0x43cb74)
Goroutine 28239914 - User: /Users/patrick.xu/go/src/uframework/task/tcp_task.go:84 uframework/task.(*TCPTask).Run (0x6ce8cc)
Goroutine 28239915 - User: /usr/local/go/src/runtime/sema.go:62 sync.runtime_SemacquireMutex (0x43cb74)
Goroutine 28239938 - User: /Users/patrick.xu/go/src/uframework/task/tcp_task.go:84 uframework/task.(*TCPTask).Run (0x6ce8cc)
Goroutine 28239939 - User: /usr/local/go/src/runtime/sema.go:62 sync.runtime_SemacquireMutex (0x43cb74)
Goroutine 28239965 - User: /Users/patrick.xu/go/src/uframework/task/tcp_task.go:84 uframework/task.(*TCPTask).Run (0x6ce8cc)
Goroutine 28239966 - User: /usr/local/go/src/runtime/sema.go:62 sync.runtime_SemacquireMutex (0x43cb74)
Goroutine 28239968 - User: /Users/patrick.xu/go/src/uframework/task/tcp_task.go:84 uframework/task.(*TCPTask).Run (0x6ce8cc)
Goroutine 28239984 - User: /Users/patrick.xu/go/src/uframework/task/tcp_task.go:84 uframework/task.(*TCPTask).Run (0x6ce8cc)
Goroutine 28239985 - User: /usr/local/go/src/runtime/sema.go:62 sync.runtime_SemacquireMutex (0x43cb74)
Goroutine 28240001 - User: /usr/local/go/src/runtime/sema.go:62 sync.runtime_SemacquireMutex (0x43cb74)
Goroutine 28240961 - User: /Users/patrick.xu/go/src/uframework/log/logext.go:281 uframework/log.RealWrite (0x6bff2d)
Goroutine 28241103 - User: /Users/patrick.xu/go/src/uframework/utils/zookeeper/zk/conn.go:515 uframework/utils/zookeeper/zk.(*Conn).sendLoop (0x7c7835)
Goroutine 28241104 - User: /usr/local/go/src/runtime/netpoll.go:164 net.runtime_pollWait (0x426ec9)
Goroutine 28241126 - User: /usr/local/go/src/runtime/sema.go:47 sync.runtime_Semacquire (0x43ca94)

看这个goroutine卡在了0x000000000046c45d in sync.(*Mutex).Lock,对应业务代码uhost-go/uhost-scheduler/logic.updateBuffer。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
(dlv) gr 28239985
Switched from 28241159 to 28239985 (thread 31788)
(dlv) bt
0 0x000000000042c76a in runtime.gopark
at /usr/local/go/src/runtime/proc.go:272
1 0x000000000042c84e in runtime.goparkunlock
at /usr/local/go/src/runtime/proc.go:277
2 0x000000000043ce91 in runtime.semacquire
at /usr/local/go/src/runtime/sema.go:130
3 0x000000000043cb74 in sync.runtime_SemacquireMutex
at /usr/local/go/src/runtime/sema.go:62
4 0x000000000046c45d in sync.(*Mutex).Lock
at /usr/local/go/src/sync/mutex.go:87
5 0x0000000000817b1d in uhost-go/uhost-scheduler/logic.updateBuffer
at /Users/patrick.xu/go/src/uhost-go/uhost-scheduler/logic/get_suitable_resource.go:418
6 0x0000000000814bd7 in uhost-go/uhost-scheduler/logic.getSuitableResource
at /Users/patrick.xu/go/src/uhost-go/uhost-scheduler/logic/get_suitable_resource.go:328
7 0x00000000006cecd4 in uframework/task.TCPTaskFunc.ServeTCP
at /Users/patrick.xu/go/src/uframework/task/tcp_task_handle.go:27
8 0x00000000006d02b8 in uframework/task.(*TCPTask).Run.func1
at /Users/patrick.xu/go/src/uframework/task/tcp_task.go:66
9 0x0000000000458dd1 in runtime.goexit
at /usr/local/go/src/runtime/asm_amd64.s:2197

再去看业务代码, frame 5 list

1
2
3
(dlv) frame 5 list
Goroutine 28239985 frame 5 at /Users/patrick.xu/go/src/uhost-go/uhost-scheduler/logic/get_suitable_resource.go:418 (PC: 0x817b1d)
Command failed: open /Users/patrick.xu/go/src/uhost-go/uhost-scheduler/logic/get_suitable_resource.go: no such file or directory

由于线上没有源代码,提示文件不存在。到本地查看这行的代码。

定位到这行后,再结合业务逻辑去分析,就能定位到是哪里问题,造成了这里的死锁。

最后退出来后,注意向上代码别重启。

1
2
(dlv) q
Would you like to kill the process? [Y/n] n

最后查明原因是:

调用updatePUhostSlice时,有多个不同wh.hostId的mapHostIdLock[wh.hostId].Lock(),没执行完时不会unlock;如果有并发调用updatePUhostSlice,且遍历whs的顺序和第一次不一样,有可能陷入死锁。

1
2
3
4
5
6
7
8
9
10
11
12
func updatePUhostSlice(
......
for _, wh := range whs {

if mapHostIdLock[wh.hostId] == nil {
mapHostIdLock[wh.hostId] = new(sync.Mutex)
}
mapHostIdLock[wh.hostId].Lock()
defer mapHostIdLock[wh.hostId].Unlock()
......
}
}

改进方法是:
把Lock()和defer Unlock()抽到一个函数里。注意:不能在循环里出现直接调用Lock()的代码。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
func updatePUhostSlice(
......
for _, wh := range whs {
recheckHostResource(xxx......)
}
}

func recheckHostResource(
sessionNo string,
hostId uint64,
alreadyMatched *uint32,
pBHRrs *uhost.HostResourceSurvey,
uhostParams UhostParams,
)(isEnough bool) {

if mapHostIdLock[hostId] == nil {
mapHostIdLock[hostId] = new(sync.Mutex)
}
mapHostIdLock[hostId].Lock()
defer mapHostIdLock[hostId].Unlock()
.......
}

和gdb的对比

gdb能看出的信息有限,只能定位到线程这个层面,无法到goroutine这一层,很难定位出问题。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
(gdb) info threads
* 1 process 10732 runtime.epollwait () at /usr/local/go/src/runtime/sys_linux_amd64.s:560
(gdb) thread apply all bt

Thread 1 (process 10732):
#0 runtime.epollwait () at /usr/local/go/src/runtime/sys_linux_amd64.s:560
#1 0x00000000004280d1 in runtime.netpoll (block=true, ~r1=0x1) at /usr/local/go/src/runtime/netpoll_epoll.go:67
#2 0x0000000000430f8f in runtime.findrunnable (gp#10=0xc42001c000, inheritTime=false) at /usr/local/go/src/runtime/proc.go:2084
#3 0x0000000000431aec in runtime.schedule () at /usr/local/go/src/runtime/proc.go:2222
#4 0x0000000000431deb in runtime.park_m (gp=0xc4200011e0) at /usr/local/go/src/runtime/proc.go:2285
#5 0x000000000045626b in runtime.mcall () at /usr/local/go/src/runtime/asm_amd64.s:269
#6 0x0000000000cebd00 in runtime.work ()
#7 0x00007ffe3162bc70 in ?? ()
#8 0x0000000000cebd80 in runtime.work ()
#9 0x00007ffe3162bc60 in ?? ()
#10 0x000000000042f0e4 in runtime.mstart () at /usr/local/go/src/runtime/proc.go:1149
#11 0x00000000004560d9 in runtime.rt0_go () at /usr/local/go/src/runtime/asm_amd64.s:169
#12 0x0000000000000009 in ?? ()
#13 0x00007ffe3162bca8 in ?? ()
#14 0x0000000000000009 in ?? ()
#15 0x00007ffe3162bca8 in ?? ()
#16 0x0000000000000000 in ?? ()

但可以用gdb产生core,采集一些现场信息,然后重启尽快恢复业务,事后再用dlv分析core来定位原因。

参考链接

  1. Debugging Go Code with GDB
  2. go-delve/delve
  3. Golang程序调试工具介绍(gdb vs dlv)
    1. 实战分析一个运行起来会卡死的Go程序