From: "Christian König" <christian.koenig@amd.com>
To: "Chen, Guchun" <Guchun.Chen@amd.com>,
Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
dri-devel <dri-devel@lists.freedesktop.org>,
amd-gfx list <amd-gfx@lists.freedesktop.org>,
Linux List Kernel Mailing <linux-kernel@vger.kernel.org>
Subject: Keyword Review - Re: BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched]
Date: Wed, 26 Apr 2023 13:48:51 +0200 [thread overview]
Message-ID: <989d7a71-ebfc-d245-9e05-a5a46085234e@amd.com> (raw)
In-Reply-To: <BL0PR12MB2465BE82A18038353E48E025F1659@BL0PR12MB2465.namprd12.prod.outlook.com>
WTF? I owe you a beer!
I fixed exactly that problem during the review process of the cleanup
patch, and because of that didn't consider that the code was still there.
It also explains why we don't see this in our testing.
@Mikhail can you test that patch with drm-misc-next?
Thanks,
Christian.
Am 26.04.23 um 04:00 schrieb Chen, Guchun:
> After reviewing this whole history, maybe the attached patch is able to fix your problem. Could you give it a try, please?
>
> Regards,
> Guchun
>
>> -----Original Message-----
>> From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> On Behalf Of
>> Mikhail Gavrilov
>> Sent: Tuesday, April 25, 2023 9:20 PM
>> To: Koenig, Christian <Christian.Koenig@amd.com>
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>; dri-devel <dri-
>> devel@lists.freedesktop.org>; amd-gfx list <amd-gfx@lists.freedesktop.org>;
>> Linux List Kernel Mailing <linux-kernel@vger.kernel.org>
>> Subject: Re: BUG: KASAN: null-ptr-deref in
>> drm_sched_job_cleanup+0x96/0x290 [gpu_sched]
>>
>> On Thu, Apr 20, 2023 at 3:32 PM Mikhail Gavrilov
>> <mikhail.v.gavrilov@gmail.com> wrote:
>>> Important: don't give up.
>>> https://youtu.be/25zhHBGIHJ8 [40 min]
>>> https://youtu.be/utnDR26eYBY [50 min]
>>> https://youtu.be/DJQ_tiimW6g [12 min]
>>> https://youtu.be/Y6AH1oJKivA [6 min]
>>> Yes, the issue is always reproducible, but from time to time it does
>>> not happen on the first attempt.
>>> I also uploaded other videos which prove that the issue definitely
>>> exists if someone launches those games in turn.
>>> Reproducibility is only a matter of time.
>>>
>>> Anyway, I didn't want you to spend so much time trying to reproduce it.
>>> This monkey business suits me better than it does you.
>>> It would be better if I could collect more useful info.
>> Christian,
>> Did you manage to reproduce the problem?
>>
>> Over the weekend I ran into a slab-use-after-free in
>> amdgpu_vm_handle_moved.
>> I wasn't playing any games at the time.
>> The Xwayland process was affected, which led to a desktop hang.
>>
>> ==================================================================
>> BUG: KASAN: slab-use-after-free in amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
>> Read of size 8 at addr ffff888295c66190 by task Xwayland:cs0/173185
>>
>> CPU: 21 PID: 173185 Comm: Xwayland:cs0 Tainted: G W L ------- --- 6.3.0-0.rc7.20230420gitcb0856346a60.59.fc39.x86_64+debug #1
>> Hardware name: System manufacturer System Product Name/ROG STRIX X570-I GAMING, BIOS 4601 02/02/2023
>> Call Trace:
>> <TASK>
>> dump_stack_lvl+0x76/0xd0
>> print_report+0xcf/0x670
>> ? amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
>> ? amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
>> kasan_report+0xa8/0xe0
>> ? amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
>> amdgpu_vm_handle_moved+0x286/0x2d0 [amdgpu]
>> amdgpu_cs_ioctl+0x2b7e/0x5630 [amdgpu]
>> ? __pfx___lock_acquire+0x10/0x10
>> ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
>> ? mark_lock+0x101/0x16e0
>> ? __lock_acquire+0xe54/0x59f0
>> ? __pfx_lock_release+0x10/0x10
>> ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
>> drm_ioctl_kernel+0x1fc/0x3d0
>> ? __pfx_drm_ioctl_kernel+0x10/0x10
>> drm_ioctl+0x4c5/0xaa0
>> ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
>> ? __pfx_drm_ioctl+0x10/0x10
>> ? _raw_spin_unlock_irqrestore+0x66/0x80
>> ? lockdep_hardirqs_on+0x81/0x110
>> ? _raw_spin_unlock_irqrestore+0x4f/0x80
>> amdgpu_drm_ioctl+0xd2/0x1b0 [amdgpu]
>> __x64_sys_ioctl+0x131/0x1a0
>> do_syscall_64+0x60/0x90
>> ? do_syscall_64+0x6c/0x90
>> ? lockdep_hardirqs_on+0x81/0x110
>> ? do_syscall_64+0x6c/0x90
>> ? lockdep_hardirqs_on+0x81/0x110
>> ? do_syscall_64+0x6c/0x90
>> ? lockdep_hardirqs_on+0x81/0x110
>> ? do_syscall_64+0x6c/0x90
>> ? lockdep_hardirqs_on+0x81/0x110
>> entry_SYSCALL_64_after_hwframe+0x72/0xdc
>> RIP: 0033:0x7ffb71b0892d
>> Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
>> RSP: 002b:00007ffb677fe840 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>> RAX: ffffffffffffffda RBX: 00007ffb677fe9f8 RCX: 00007ffb71b0892d
>> RDX: 00007ffb677fe900 RSI: 00000000c0186444 RDI: 000000000000000d
>> RBP: 00007ffb677fe890 R08: 00007ffb677fea50 R09: 00007ffb677fe8e0
>> R10: 0000556c4611bec0 R11: 0000000000000246 R12: 00007ffb677fe900
>> R13: 00000000c0186444 R14: 000000000000000d R15: 00007ffb677fe9f8
>> </TASK>
>>
>> Allocated by task 173181:
>> kasan_save_stack+0x33/0x60
>> kasan_set_track+0x25/0x30
>> __kasan_kmalloc+0x8f/0xa0
>> __kmalloc_node+0x65/0x160
>> amdgpu_bo_create+0x31e/0xfb0 [amdgpu]
>> amdgpu_bo_create_user+0xca/0x160 [amdgpu]
>> amdgpu_gem_create_ioctl+0x398/0x980 [amdgpu]
>> drm_ioctl_kernel+0x1fc/0x3d0
>> drm_ioctl+0x4c5/0xaa0
>> amdgpu_drm_ioctl+0xd2/0x1b0 [amdgpu]
>> __x64_sys_ioctl+0x131/0x1a0
>> do_syscall_64+0x60/0x90
>> entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>
>> Freed by task 173185:
>> kasan_save_stack+0x33/0x60
>> kasan_set_track+0x25/0x30
>> kasan_save_free_info+0x2e/0x50
>> __kasan_slab_free+0x10b/0x1a0
>> slab_free_freelist_hook+0x11e/0x1d0
>> __kmem_cache_free+0xc0/0x2e0
>> ttm_bo_release+0x667/0x9e0 [ttm]
>> amdgpu_bo_unref+0x35/0x70 [amdgpu]
>> amdgpu_gem_object_free+0x73/0xb0 [amdgpu]
>> drm_gem_handle_delete+0xe3/0x150
>> drm_ioctl_kernel+0x1fc/0x3d0
>> drm_ioctl+0x4c5/0xaa0
>> amdgpu_drm_ioctl+0xd2/0x1b0 [amdgpu]
>> __x64_sys_ioctl+0x131/0x1a0
>> do_syscall_64+0x60/0x90
>> entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>
>> Last potentially related work creation:
>> kasan_save_stack+0x33/0x60
>> __kasan_record_aux_stack+0x97/0xb0
>> __call_rcu_common.constprop.0+0xf8/0x1af0
>> drm_sched_fence_release_scheduled+0xb8/0xe0 [gpu_sched]
>> dma_resv_reserve_fences+0x4dc/0x7f0
>> ttm_eu_reserve_buffers+0x3f6/0x1190 [ttm]
>> amdgpu_cs_ioctl+0x204d/0x5630 [amdgpu]
>> drm_ioctl_kernel+0x1fc/0x3d0
>> drm_ioctl+0x4c5/0xaa0
>> amdgpu_drm_ioctl+0xd2/0x1b0 [amdgpu]
>> __x64_sys_ioctl+0x131/0x1a0
>> do_syscall_64+0x60/0x90
>> entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>
>> Second to last potentially related work creation:
>> kasan_save_stack+0x33/0x60
>> __kasan_record_aux_stack+0x97/0xb0
>> __call_rcu_common.constprop.0+0xf8/0x1af0
>> drm_sched_fence_release_scheduled+0xb8/0xe0 [gpu_sched]
>> amdgpu_ctx_add_fence+0x2b1/0x390 [amdgpu]
>> amdgpu_cs_ioctl+0x44d0/0x5630 [amdgpu]
>> drm_ioctl_kernel+0x1fc/0x3d0
>> drm_ioctl+0x4c5/0xaa0
>> amdgpu_drm_ioctl+0xd2/0x1b0 [amdgpu]
>> __x64_sys_ioctl+0x131/0x1a0
>> do_syscall_64+0x60/0x90
>> entry_SYSCALL_64_after_hwframe+0x72/0xdc
>>
>> The buggy address belongs to the object at ffff888295c66000
>> which belongs to the cache kmalloc-1k of size 1024
>> The buggy address is located 400 bytes inside of
>> freed 1024-byte region [ffff888295c66000, ffff888295c66400)
>>
>> The buggy address belongs to the physical page:
>> page:00000000125ffbe3 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x295c60
>> head:00000000125ffbe3 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
>> anon flags: 0x17ffffc0010200(slab|head|node=0|zone=2|lastcpupid=0x1fffff)
>> raw: 0017ffffc0010200 ffff88810004cdc0 0000000000000000 dead000000000001
>> raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
>> page dumped because: kasan: bad access detected
>>
>> Memory state around the buggy address:
>> ffff888295c66080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ffff888295c66100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> >ffff888295c66180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ^
>> ffff888295c66200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ffff888295c66280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ==================================================================
>>
>> --
>> Best Regards,
>> Mike Gavrilov.
Thread overview: 15+ messages
2023-04-11 17:40 BUG: KASAN: null-ptr-deref in drm_sched_job_cleanup+0x96/0x290 [gpu_sched] Mikhail Gavrilov
2023-04-14 15:08 ` Mikhail Gavrilov
2023-04-19 7:00 ` Mikhail Gavrilov
2023-04-19 8:12 ` Christian König
2023-04-19 13:13 ` Mikhail Gavrilov
2023-04-19 13:15 ` Christian König
2023-04-19 19:17 ` Mikhail Gavrilov
2023-04-20 9:59 ` Christian König
2023-04-20 10:32 ` Mikhail Gavrilov
2023-04-25 13:19 ` Mikhail Gavrilov
2023-04-26 2:00 ` Chen, Guchun
2023-04-26 11:48 ` Christian König [this message]
2023-04-26 11:50 ` Christian König
2023-05-02 19:28 ` Mikhail Gavrilov
2023-04-20 21:24 ` Mikhail Gavrilov