From: Nicolin Chen <nicolinc@nvidia.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: <iommu@lists.linux.dev>,
Alex Williamson <alex.williamson@redhat.com>,
"Lu Baolu" <baolu.lu@linux.intel.com>,
Eric Auger <eric.auger@redhat.com>,
"Kevin Tian" <kevin.tian@intel.com>,
Lixiao Yang <lixiao.yang@intel.com>,
"Matthew Rosato" <mjrosato@linux.ibm.com>,
<stable@vger.kernel.org>,
<syzbot+7574ebfe589049630608@syzkaller.appspotmail.com>,
Terrence Xu <terrence.xu@intel.com>, Yi Liu <yi.l.liu@intel.com>
Subject: Re: [PATCH rc 3/3] iommufd: Set end correctly when doing batch carry
Date: Tue, 25 Jul 2023 12:55:11 -0700
Message-ID: <ZMAonwbzOgm6IY7/@Asurada-Nvidia>
In-Reply-To: <3-v1-85aacb2af554+bc-iommufd_syz3_jgg@nvidia.com>

On Tue, Jul 25, 2023 at 04:05:50PM -0300, Jason Gunthorpe wrote:
> Even though the test suite covers this, it somehow became obscured that
> this wasn't working.
>
> The test iommufd_ioas.mock_domain.access_domain_destory would blow up
> rarely.
>
> end should be set to 1 because this just pushed an item, the carry, to the
> pfns list.
>
> Sometimes the test would blow up with:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] SMP
> CPU: 5 PID: 584 Comm: iommufd Not tainted 6.5.0-rc1-dirty #1236
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> RIP: 0010:batch_unpin+0xa2/0x100 [iommufd]
> Code: 17 48 81 fe ff ff 07 00 77 70 48 8b 15 b7 be 97 e2 48 85 d2 74 14 48 8b 14 fa 48 85 d2 74 0b 40 0f b6 f6 48 c1 e6 04 48 01 f2 <48> 8b 3a 48 c1 e0 06 89 ca 48 89 de 48 83 e7 f0 48 01 c7 e8 96 dc
> RSP: 0018:ffffc90001677a58 EFLAGS: 00010246
> RAX: 00007f7e2646f000 RBX: 0000000000000000 RCX: 0000000000000001
> RDX: 0000000000000000 RSI: 00000000fefc4c8d RDI: 0000000000fefc4c
> RBP: ffffc90001677a80 R08: 0000000000000048 R09: 0000000000000200
> R10: 0000000000030b98 R11: ffffffff81f3bb40 R12: 0000000000000001
> R13: ffff888101f75800 R14: ffffc90001677ad0 R15: 00000000000001fe
> FS: 00007f9323679740(0000) GS:ffff8881ba540000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 0000000105ede003 CR4: 00000000003706a0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> ? show_regs+0x5c/0x70
> ? __die+0x1f/0x60
> ? page_fault_oops+0x15d/0x440
> ? lock_release+0xbc/0x240
> ? exc_page_fault+0x4a4/0x970
> ? asm_exc_page_fault+0x27/0x30
> ? batch_unpin+0xa2/0x100 [iommufd]
> ? batch_unpin+0xba/0x100 [iommufd]
> __iopt_area_unfill_domain+0x198/0x430 [iommufd]
> ? __mutex_lock+0x8c/0xb80
> ? __mutex_lock+0x6aa/0xb80
> ? xa_erase+0x28/0x30
> ? iopt_table_remove_domain+0x162/0x320 [iommufd]
> ? lock_release+0xbc/0x240
> iopt_area_unfill_domain+0xd/0x10 [iommufd]
> iopt_table_remove_domain+0x195/0x320 [iommufd]
> iommufd_hw_pagetable_destroy+0xb3/0x110 [iommufd]
> iommufd_object_destroy_user+0x8e/0xf0 [iommufd]
> iommufd_device_detach+0xc5/0x140 [iommufd]
> iommufd_selftest_destroy+0x1f/0x70 [iommufd]
> iommufd_object_destroy_user+0x8e/0xf0 [iommufd]
> iommufd_destroy+0x3a/0x50 [iommufd]
> iommufd_fops_ioctl+0xfb/0x170 [iommufd]
> __x64_sys_ioctl+0x40d/0x9a0
> do_syscall_64+0x3c/0x80
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> Cc: <stable@vger.kernel.org>
> Fixes: f394576eb11d ("iommufd: PFN handling for iopt_pages")
> Reported-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

This fixes the memory leak in HugePages, and likely the rarely
triggered BUG as well, since I can no longer reproduce it after
applying this patch.
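
To illustrate why end must be 1 after the carry, here is a minimal
userspace sketch in C. The struct layout, BATCH_CAP, and the helpers
(batch_add_pfn(), batch_clear_carry(), batch_visible_pfns()) are
simplified, hypothetical stand-ins modeled loosely on the iommufd pfn
batch; they are assumptions for illustration, not the actual
drivers/iommu/iommufd code.

```c
#include <assert.h>

/*
 * Hypothetical, simplified pfn batch (not the kernel's struct): entry i
 * describes a run of npfns[i] contiguous PFNs starting at pfns[i].
 */
#define BATCH_CAP 8

struct pfn_batch {
	unsigned long pfns[BATCH_CAP];
	unsigned int npfns[BATCH_CAP];
	unsigned int end;        /* number of valid entries */
	unsigned int total_pfns; /* sum of npfns[0..end) */
};

static void batch_clear(struct pfn_batch *batch)
{
	batch->total_pfns = 0;
	batch->end = 0;
}

/* Append one PFN, merging into the last run when contiguous; 0 if full. */
static int batch_add_pfn(struct pfn_batch *batch, unsigned long pfn)
{
	if (batch->end &&
	    pfn == batch->pfns[batch->end - 1] + batch->npfns[batch->end - 1]) {
		batch->npfns[batch->end - 1]++;
		batch->total_pfns++;
		return 1;
	}
	if (batch->end == BATCH_CAP)
		return 0;
	batch->pfns[batch->end] = pfn;
	batch->npfns[batch->end] = 1;
	batch->end++;
	batch->total_pfns++;
	return 1;
}

/* Keep the last keep_pfns PFNs of the final run by moving them to slot 0. */
static void batch_clear_carry(struct pfn_batch *batch, unsigned int keep_pfns)
{
	unsigned int last;

	if (!keep_pfns) {
		batch_clear(batch);
		return;
	}
	assert(batch->end && batch->npfns[batch->end - 1] >= keep_pfns);
	last = batch->end - 1;
	batch->pfns[0] = batch->pfns[last] + (batch->npfns[last] - keep_pfns);
	batch->npfns[0] = keep_pfns;
	batch->total_pfns = keep_pfns;
	/* The carry was just pushed into slot 0, so exactly one entry is valid. */
	batch->end = 1;
}

/* Count the PFNs a walk over [0, end) would see, e.g. an unpin loop. */
static unsigned int batch_visible_pfns(const struct pfn_batch *batch)
{
	unsigned int i, n = 0;

	for (i = 0; i != batch->end; i++)
		n += batch->npfns[i];
	return n;
}
```

If batch_clear_carry() set end = 0 instead, batch_visible_pfns() would
return 0 while total_pfns is still 2, so any walk over [0, end) would
skip the carried PFNs, and the next batch_add_pfn() would overwrite
slot 0. That is one plausible way a stale end could leave pinned pages
behind, in line with the HugePages leak reported above.
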
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Thanks!

Thread overview: 10+ messages
2023-07-25 19:05 [PATCH rc 0/3] Several iommufd bug fixes Jason Gunthorpe
2023-07-25 19:05 ` [PATCH rc 1/3] iommufd/selftest: Do not try to destroy an access once it is attached Jason Gunthorpe
2023-07-25 21:45 ` Nicolin Chen
2023-07-26 3:53 ` Greg KH
2023-07-25 19:05 ` [PATCH rc 2/3] iommufd: IOMMUFD_DESTROY should not increase the refcount Jason Gunthorpe
2023-07-27 5:25 ` Tian, Kevin
2023-07-27 14:10 ` Jason Gunthorpe
2023-07-25 19:05 ` [PATCH rc 3/3] iommufd: Set end correctly when doing batch carry Jason Gunthorpe
2023-07-25 19:55 ` Nicolin Chen [this message]
2023-07-27 5:26 ` Tian, Kevin