Segfault on suspend, unplug dock, resume
Regarding merge request !85 (merged)
Following on from the drm/amd bug report drm/amd#2375 (comment 1987516), I tracked the Xorg driver crash down to a NULL pointer dereference.
I am not familiar with the Xorg amdgpu code, so this may not be the correct way to fix it, and this may not be the only possible segfault. I ran gdb attached to Xorg and could see that, after the external monitors were disconnected and the laptop resumed, drmmode_set_mode was still iterating over data structures for the disconnected external monitors and attempted to dereference a NULL pointer. The call to drmmode_set_mode appears to be attempting to change the mode on one of those disconnected monitors.
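The failure mode described above can be illustrated with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: `mock_output`, `mock_connector` and `collect_connector_ids` are hypothetical stand-ins, not the real xf86-video-amdgpu structures and not the contents of the patch, and I have not confirmed which pointer is actually NULL at drmmode_display.c:1267. It only shows the general kind of guard that avoids dereferencing state belonging to a monitor that has gone away.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; NOT the real driver types. */
struct mock_connector {
    unsigned int connector_id;
};

struct mock_output {
    const void *crtc;                 /* CRTC this output is assigned to */
    struct mock_connector *connector; /* assumed to be NULL after the monitor is unplugged */
};

/*
 * Collect the connector ids of all outputs driven by `crtc`.
 * Outputs whose connector pointer is NULL (for example a monitor that was
 * disconnected while suspended) are skipped instead of being dereferenced.
 * Returns the number of ids written to `ids_out` (at most `ids_cap`).
 */
static size_t collect_connector_ids(const struct mock_output *outputs,
                                    size_t n_outputs,
                                    const void *crtc,
                                    unsigned int *ids_out,
                                    size_t ids_cap)
{
    size_t count = 0;

    assert(ids_out != NULL || ids_cap == 0);

    for (size_t i = 0; i < n_outputs; i++) {
        if (outputs[i].crtc != crtc)
            continue; /* output belongs to a different CRTC */

        if (outputs[i].connector == NULL)
            continue; /* the guard: nothing left to dereference */

        if (count < ids_cap)
            ids_out[count++] = outputs[i].connector->connector_id;
    }

    return count;
}
```

Without the NULL check, the loop would read `connector_id` through a NULL pointer for the unplugged monitor, which is the same class of crash as frame #0 in the backtrace. Whether skipping the output is the right behaviour in the real driver, or whether the stale output should have been removed earlier, is a separate question.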
The crash happens randomly on resume, but I found one (mostly) reproducible test case, which is:
- plug dock in with 3 external monitors
- close laptop lid (laptop display goes off)
- suspend using the XFCE command (I think there is something timing sensitive here: a simple systemctl suspend does not trigger it)
- unplug USB dock
- resume by opening laptop lid
The patch fixes this test case. The gdb backtrace was:
#0 0x00007f2029565e82 in drmmode_set_mode
(crtc=crtc@entry=0x55a0c7610390, fb=fb@entry=0x55a0c86d7410, mode=mode@entry=0x55a0c76103a8, x=x@entry=0, y=y@entry=0) at drmmode_display.c:1267
#1 0x00007f20295663f6 in drmmode_set_mode_major
(crtc=0x55a0c7610390, mode=0x55a0c76103a8, rotation=<optimized out>, x=<optimized out>, y=<optimized out>) at drmmode_display.c:1371
#2 0x00007f202955ff8a in AMDGPUUnblank (pScrn=pScrn@entry=0x55a0c740a8e0) at amdgpu_kms.c:1823
#3 0x00007f2029560047 in AMDGPUSaveScreen_KMS (pScreen=<optimized out>, mode=1) at amdgpu_kms.c:1905
#4 0x000055a0c643803c in dixSaveScreens ()
#5 0x000055a0c64b9ab5 in ()
#6 0x000055a0c64b9b35 in ()
#7 0x000055a0c6407734 in ()
#8 0x000055a0c640b6cc in ()
#9 0x00007f2029e4618a in __libc_start_call_main
(main=main@entry=0x55a0c63f4b40, argc=argc@entry=10, argv=argv@entry=0x7fffb7cefbf8)
at ../sysdeps/nptl/libc_start_call_main.h:58
#10 0x00007f2029e46245 in __libc_start_main_impl
(main=0x55a0c63f4b40, argc=10, argv=0x7fffb7cefbf8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffb7cefbe8) at ../csu/libc-start.c:381
#11 0x000055a0c63f4b71 in _start ()
This crash is specific to the amdgpu driver. The modesetting driver does not crash under the same test case on the same hardware.