drm/amdgpu: Fix the deadlock between UQ fence and EF suspend_worker

amdgpu_userq_signal_ioctl waits for the eviction fence to be signaled while holding a mutex, and the eviction fence worker thread tries to acquire the same mutex before it signals the eviction fence. Neither side can make progress, so the two deadlock.

Process 1: amdgpu_userq_signal_ioctl waits for the fence while holding the lock.

mutex_lock();
dma_resv_wait_timeout();
mutex_unlock();

Process 2: amdgpu_eviction_fence_suspend_worker waits for the same lock before signaling the dma fence.

mutex_lock();
userq_active = queue->queue_active;
mutex_unlock();

if (userq_active)
	dma_fence_signal();
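The lock ordering described above can be modeled in userspace. The following is a minimal sketch, assuming POSIX threads; `struct model`, `suspend_worker`, `fence_wait_timeout`, and `run_model` are hypothetical stand-ins rather than amdgpu code, and the sketch does not represent the change made by this merge request. It only shows that a bounded wait performed under the shared mutex times out, while the same wait performed without the mutex completes.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-ins; none of these names exist in amdgpu. */
struct model {
	pthread_mutex_t userq_mutex; /* models the mutex shared by both paths */
	pthread_mutex_t fence_lock;  /* protects fence_signaled */
	pthread_cond_t fence_cond;
	bool fence_signaled;         /* models the eviction fence state */
	bool queue_active;
};

/* Models the suspend worker: sample queue state under the mutex, then signal. */
static void *suspend_worker(void *arg)
{
	struct model *m = arg;
	bool active;

	pthread_mutex_lock(&m->userq_mutex);
	active = m->queue_active;
	pthread_mutex_unlock(&m->userq_mutex);

	if (active) {
		pthread_mutex_lock(&m->fence_lock);
		m->fence_signaled = true;
		pthread_cond_broadcast(&m->fence_cond);
		pthread_mutex_unlock(&m->fence_lock);
	}
	return NULL;
}

/* Waits for the fence; returns true if it was signaled within timeout_ms. */
static bool fence_wait_timeout(struct model *m, long timeout_ms)
{
	struct timespec deadline;
	bool signaled;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&m->fence_lock);
	while (!m->fence_signaled) {
		/* Any non-zero return (ETIMEDOUT in practice) ends the wait. */
		if (pthread_cond_timedwait(&m->fence_cond, &m->fence_lock,
					   &deadline) != 0)
			break;
	}
	signaled = m->fence_signaled;
	pthread_mutex_unlock(&m->fence_lock);
	return signaled;
}

/*
 * Models the ioctl path. With wait_under_lock set, the fence wait happens
 * while userq_mutex is held, so the worker can never reach its signal call.
 * Returns 0 if the fence was signaled in time, -1 if the wait timed out.
 */
int run_model(bool wait_under_lock, long timeout_ms)
{
	struct model m = { .queue_active = true };
	pthread_t worker;
	bool ok;

	pthread_mutex_init(&m.userq_mutex, NULL);
	pthread_mutex_init(&m.fence_lock, NULL);
	pthread_cond_init(&m.fence_cond, NULL);

	if (wait_under_lock)
		pthread_mutex_lock(&m.userq_mutex); /* taken before the worker runs */

	pthread_create(&worker, NULL, suspend_worker, &m);
	ok = fence_wait_timeout(&m, timeout_ms);

	if (wait_under_lock)
		pthread_mutex_unlock(&m.userq_mutex); /* worker can proceed only now */

	pthread_join(worker, NULL);
	pthread_cond_destroy(&m.fence_cond);
	pthread_mutex_destroy(&m.fence_lock);
	pthread_mutex_destroy(&m.userq_mutex);
	return ok ? 0 : -1;
}
```

Calling `run_model(true, 200)` reproduces the blocked ordering in bounded form: the wait expires because the worker is stuck on the mutex. Calling `run_model(false, 2000)` lets the worker take the mutex, signal, and wake the waiter. In the kernel the wait has no such short timeout, which is why the hung-task detector fires instead.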

Call Trace:
[ 242.853248] INFO: task kworker/10:1:223 blocked for more than 120 seconds.
[ 242.853261] Tainted: G OE 6.12.0-rc2+ #1
[ 242.853264] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.853266] task:kworker/10:1 state:D stack:0 pid:223 tgid:223 ppid:2 flags:0x00004000
[ 242.853273] Workqueue: events amdgpu_eviction_fence_suspend_worker [amdgpu]
[ 242.853538] Call Trace:
[ 242.853541]
[ 242.853547] __schedule+0x432/0x15e0
[ 242.853555] ? srso_alias_return_thunk+0x5/0xfbef5
[ 242.853562] ? srso_alias_return_thunk+0x5/0xfbef5
[ 242.853566] ? srso_alias_return_thunk+0x5/0xfbef5
[ 242.853571] schedule+0x2b/0x140
[ 242.853574] schedule_preempt_disabled+0x15/0x30
[ 242.853576] __mutex_lock.constprop.0+0x37a/0x700
[ 242.853579] ? psi_group_change+0x21b/0x4d0
[ 242.853587] __mutex_lock_slowpath+0x13/0x20
[ 242.853590] mutex_lock+0x3d/0x50
[ 242.853593] amdgpu_userqueue_active+0x33/0x90 [amdgpu]
[ 242.853807] amdgpu_eviction_fence_suspend_worker+0x3c/0x160 [amdgpu]
[ 242.853979] process_one_work+0x17a/0x3c0
[ 242.853984] ? __pfx_worker_thread+0x10/0x10
[ 242.853987] worker_thread+0x2b5/0x3c0
[ 242.853989] ? __pfx_worker_thread+0x10/0x10
[ 242.853991] kthread+0xe1/0x120
[ 242.853993] ? __pfx_kthread+0x10/0x10
[ 242.853995] ret_from_fork+0x43/0x70
[ 242.854000] ? __pfx_kthread+0x10/0x10
[ 242.854001] ret_from_fork_asm+0x1a/0x30
[ 242.854008]
[ 242.854060] INFO: task gnome-shel:cs0:1280 blocked for more than 120 seconds.
[ 242.854063] Tainted: G OE 6.12.0-rc2+ #1
[ 242.854064] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.854066] task:gnome-shel:cs0 state:D stack:0 pid:1280 tgid:1274 ppid:1184 flags:0x00000002
[ 242.854069] Call Trace:
[ 242.854071]
[ 242.854072] __schedule+0x432/0x15e0
[ 242.854076] schedule+0x2b/0x140
[ 242.854078] schedule_timeout+0x152/0x160
[ 242.854081] ? srso_alias_return_thunk+0x5/0xfbef5
[ 242.854084] ? dma_fence_default_wait+0x11e/0x1e0
[ 242.854090] dma_fence_default_wait+0x186/0x1e0
[ 242.854093] ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[ 242.854097] dma_fence_wait_timeout+0x113/0x140
[ 242.854099] dma_resv_wait_timeout+0x70/0xe0
[ 242.854102] amdgpu_bo_kmap+0x39/0xc0 [amdgpu]
[ 242.854225] amdgpu_userq_signal_ioctl+0x70c/0xbc0 [amdgpu]
[ 242.854354] ? __check_object_size+0x6d/0x300
[ 242.854360] ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[ 242.854485] drm_ioctl_kernel+0xaf/0x110 [drm]
[ 242.854510] ? srso_alias_return_thunk+0x5/0xfbef5
[ 242.854514] drm_ioctl+0x2c6/0x580 [drm]
[ 242.854526] ? __pfx_amdgpu_userq_signal_ioctl+0x10/0x10 [amdgpu]
[ 242.854654] amdgpu_drm_ioctl+0x4b/0x90 [amdgpu]
[ 242.854773] __x64_sys_ioctl+0x9a/0xe0
[ 242.854777] x64_sys_call+0x1395/0x2670
[ 242.854780] do_syscall_64+0x70/0x130
[ 242.854783] ? srso_alias_return_thunk+0x5/0xfbef5
[ 242.854786] ? irqentry_exit_to_user_mode+0x33/0x180
[ 242.854789] ? srso_alias_return_thunk+0x5/0xfbef5
[ 242.854791] ? irqentry_exit+0x43/0x50
[ 242.854793] ? srso_alias_return_thunk+0x5/0xfbef5
[ 242.854795] ? exc_page_fault+0x94/0x1d0
[ 242.854798] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 242.854801] RIP: 0033:0x7de987b1a94f
[ 242.854803] RSP: 002b:00007de9799feb00 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 242.854805] RAX: ffffffffffffffda RBX: 00007de9799fecb0 RCX: 00007de987b1a94f
[ 242.854807] RDX: 00007de9799fecb0 RSI: 00000000c0306457 RDI: 000000000000000d
[ 242.854808] RBP: 00007de9799feb80 R08: 000000000000027f R09: 000000000000027e
[ 242.854809] R10: 000000000000027d R11: 0000000000000246 R12: 00005b3d90a8e898
[ 242.854810] R13: 00000000c0306457 R14: 000000000000000d R15: 00005b3d908b6758
[ 242.854813]

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Shashank Sharma <shashank.sharma@amd.com>
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com>

Tested on Navi33.

**Test results:**

  • The desktop launches.
  • glxgears runs successfully.
  • Unigine Heaven runs successfully.
Edited by Arvind Yadav
