From: Ido Schimmel <idosch@idosch.org>
To: Robin Murphy <robin.murphy@arm.com>
Cc: joro@8bytes.org, will@kernel.org, iommu@lists.linux.dev,
linux-kernel@vger.kernel.org, zhangzekun11@huawei.com,
john.g.garry@oracle.com, dheerajkumar.srivastava@amd.com,
jsnitsel@redhat.com, Catalin Marinas <catalin.marinas@arm.com>
Subject: Re: [PATCH v3 0/2] iommu/iova: Make the rcache depot properly flexible
Date: Wed, 10 Jan 2024 16:00:59 +0200
Message-ID: <ZZ6jG5NyaUpeCpXq@shredder>
In-Reply-To: <ab22c439-e7da-49b5-b20b-856daf376c02@arm.com>
On Wed, Jan 10, 2024 at 12:48:06PM +0000, Robin Murphy wrote:
> On 2024-01-09 5:21 pm, Ido Schimmel wrote:
> > Hi Robin,
> >
> > Thanks for the reply.
> >
> > On Mon, Jan 08, 2024 at 05:35:26PM +0000, Robin Murphy wrote:
> > > Hmm, we've got what looks to be a set of magazines forming a plausible depot
> > > list (or at least the tail end of one):
> > >
> > > ffff8881411f9000 -> ffff8881261c1000
> > >
> > > ffff8881261c1000 -> ffff88812be26400
> > >
> > > ffff88812be26400 -> ffff8188392ec000
> > >
> > > ffff8188392ec000 -> ffff8881a5301000
> > >
> > > ffff8881a5301000 -> NULL
> > >
> > > which I guess has somehow become detached from its rcache->depot without
> > > being freed properly? However I'm struggling to see any conceivable way that
> > > could happen which wouldn't already be more severely broken in other ways as
> > > well (i.e. either general memory corruption or someone somehow still trying
> > > to use the IOVA domain while it's being torn down).
> >
> > The machine is running a debug kernel that among other things has KASAN
> > enabled, but there are no traces in the kernel log so there is no memory
> > corruption that I'm aware of.
> >
> > > Out of curiosity, does reverting just patch #2 alone make a difference?
> >
> > Will try and let you know.

I can confirm that the issue reproduces when only patch #2 is reverted.
IOW, patch #1 seems to be the problem:

unreferenced object 0xffff8881a1ff3400 (size 1024):
  comm "softirq", pid 0, jiffies 4296362635 (age 3540.420s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 67 b7 05 00 00 00 00 00  ........g.......
    3f a6 05 00 00 00 00 00 93 99 05 00 00 00 00 00  ?...............
  backtrace:
    [<ffffffff819f7a68>] __kmem_cache_alloc_node+0x1e8/0x320
    [<ffffffff818a3efa>] kmalloc_trace+0x2a/0x60
    [<ffffffff8231f8f3>] free_iova_fast+0x293/0x460
    [<ffffffff823132f0>] fq_ring_free_locked+0x1b0/0x310
    [<ffffffff82314ced>] fq_flush_timeout+0x19d/0x2e0
    [<ffffffff813e97da>] call_timer_fn+0x19a/0x5c0
    [<ffffffff813ea38b>] __run_timers+0x78b/0xb80
    [<ffffffff813ea7dd>] run_timer_softirq+0x5d/0xd0
    [<ffffffff82f21605>] __do_softirq+0x205/0x8b5
unreferenced object 0xffff888165b9a800 (size 1024):
  comm "softirq", pid 0, jiffies 4299383627 (age 519.460s)
  hex dump (first 32 bytes):
    00 34 ff a1 81 88 ff ff bd 9d 05 00 00 00 00 00  .4..............
    f3 ab 05 00 00 00 00 00 37 b5 05 00 00 00 00 00  ........7.......
  backtrace:
    [<ffffffff819f7a68>] __kmem_cache_alloc_node+0x1e8/0x320
    [<ffffffff818a3efa>] kmalloc_trace+0x2a/0x60
    [<ffffffff8231f8f3>] free_iova_fast+0x293/0x460
    [<ffffffff823132f0>] fq_ring_free_locked+0x1b0/0x310
    [<ffffffff82314ced>] fq_flush_timeout+0x19d/0x2e0
    [<ffffffff813e97da>] call_timer_fn+0x19a/0x5c0
    [<ffffffff813ea38b>] __run_timers+0x78b/0xb80
    [<ffffffff813ea7dd>] run_timer_softirq+0x5d/0xd0
    [<ffffffff82f21605>] __do_softirq+0x205/0x8b5

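As a cross-check, the first eight bytes of each hex dump can be read as a little-endian 64-bit word. Assuming patch #1 keeps a magazine's depot "next" pointer in its first word, the second object (0xffff888165b9a800) points at the first one (0xffff8881a1ff3400), whose own first word is zero (NULL). That would be consistent with the detached tail of a depot list rather than two unrelated allocations. The helper below, hexdump_le64(), is a hypothetical illustration of that decoding, not a kernel or kmemleak interface:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse the first eight space-separated hex bytes of a
 * kmemleak hex-dump line and combine them as a little-endian 64-bit word,
 * the byte order used on x86-64. Returns 0 on success, -1 if fewer than
 * eight bytes could be parsed.
 */
int hexdump_le64(const char *line, uint64_t *out)
{
	unsigned int b[8];
	uint64_t v = 0;

	assert(line != NULL && out != NULL);
	if (sscanf(line, "%2x %2x %2x %2x %2x %2x %2x %2x",
		   &b[0], &b[1], &b[2], &b[3],
		   &b[4], &b[5], &b[6], &b[7]) != 8)
		return -1;

	/* Least significant byte comes first, so fold from the last byte down. */
	for (int i = 7; i >= 0; i--)
		v = (v << 8) | b[i];

	*out = v;
	return 0;
}
```

For example, hexdump_le64() on the first hex-dump line of the second object yields 0xffff8881a1ff3400, and on the first line of the first object it yields 0.
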
> >
> > > And is your workload doing anything "interesting" in relation to IOVA
> > > domain lifetimes, like creating and destroying SR-IOV virtual
> > > functions, changing IOMMU domain types via sysfs, or using that
> > > horrible vdpa thing, or are you seeing this purely from regular driver
> > > DMA API usage?
> >
> > The machine is running networking related tests, but it is not using
> > SR-IOV, VMs or VDPA so there shouldn't be anything "interesting" as far
> > as IOMMU is concerned.
> >
> > The two networking drivers on the machine are "igb" for the management
> > port and "mlxsw" for the data ports (the machine is a physical switch).
> > I believe the DMA API usage in the latter is quite basic and I don't
> > recall any DMA related problems with this driver since it was first
> > accepted upstream in 2015.
>
> Thanks for the clarifications, that seems to rule out all the most
> confusingly impossible scenarios, at least.
>
> The best explanation I've managed to come up with is a false-positive race
> dependent on the order in which kmemleak scans the relevant objects. Say we
> have the list as depot -> A -> B -> C; the rcache object is scanned and sees
> the pointer to magazine A, but then A is popped *before* kmemleak scans it,
> such that when it is then scanned, its "next" pointer has already been
> wiped, thus kmemleak never observes any reference to B, so it appears that B
> and (transitively) C are "leaked". If that is the case, then I'd expect it
> should be reproducible with patch #1 alone (although patch #2 might make it
> slightly more likely if the work ever does result in additional pops
> happening), but I'd expect the leaked objects to be transient and not
> persist forever through repeated scans (what I don't know is whether
> kmemleak automatically un-leaks an object if it subsequently finds a new
> reference, or if it needs manually clearing in between scans). I'm not sure
> if there's a nice way to make that any better... unless maybe it might make
> sense to call kmemleak_not_leak(mag->next) in iova_depot_pop() before that
> reference disappears?
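
To make the suspected ordering concrete, here is a small userspace model of the scenario described above, under simplifying assumptions. struct mag, struct rcache, depot_pop(), scan_from() and simulate_scan() are all hypothetical stand-ins, not the iova.c or kmemleak code. The wiped link is modelled as NULL (in patch #1 the link is assumed to share storage with the magazine size, so it is overwritten on pop), and the not_leak flag only loosely models kmemleak_not_leak() by treating the annotated object as always scanned:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define NMAGS 3

/* Hypothetical, simplified magazine: only the depot link matters here. */
struct mag {
	const char *name;
	struct mag *next;
	bool reached;  /* the simulated scanner found a pointer to it */
	bool not_leak; /* loosely models a kmemleak_not_leak() annotation */
};

struct rcache {
	struct mag *depot;
};

/* Pop the depot head; NULL stands in for the link being overwritten. */
static struct mag *depot_pop(struct rcache *rc, bool annotate)
{
	struct mag *m = rc->depot;

	assert(m != NULL); /* popping an empty depot would be a bug */
	if (annotate && m->next)
		m->next->not_leak = true; /* annotate before the link disappears */
	rc->depot = m->next;
	m->next = NULL;
	return m;
}

/* Follow "next" links from m, marking every object reached on the way. */
static void scan_from(struct mag *m)
{
	for (; m && !m->reached; m = m->next)
		m->reached = true;
}

/* Returns how many magazines the simulated scan would report as leaked. */
int simulate_scan(bool annotate)
{
	struct mag c = { "C", NULL, false, false };
	struct mag b = { "B", &c, false, false };
	struct mag a = { "A", &b, false, false };
	struct mag *all[NMAGS] = { &a, &b, &c };
	struct rcache rc = { &a };
	struct mag *gray;
	int leaks = 0;

	/* 1. The rcache is scanned first and its depot pointer (A) is seen. */
	gray = rc.depot;
	gray->reached = true;

	/* 2. A is popped before the scanner gets to A itself. */
	depot_pop(&rc, annotate);

	/* 3. A is scanned, but its link to B is already gone. */
	scan_from(gray->next);

	/* Annotated objects are always scanned, so their links are followed. */
	for (int i = 0; i < NMAGS; i++) {
		if (all[i]->not_leak) {
			all[i]->reached = true;
			scan_from(all[i]->next);
		}
	}

	for (int i = 0; i < NMAGS; i++) {
		if (!all[i]->reached) {
			printf("unreferenced: %s\n", all[i]->name);
			leaks++;
		}
	}
	return leaks;
}
```

In this model simulate_scan(false) reports B and C and returns 2, while simulate_scan(true) returns 0 because the annotated B is scanned and C is reached through it. Whether the real kmemleak behaves exactly like this model is an assumption.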

I'm not familiar with the code, so I can't comment on whether that's the
best solution, but I will say that we've been running kmemleak as part of
our regression testing for years and every report we got was an actual
memory leak. Therefore, to keep the tool reliable, I think it's better to
annotate the code to suppress false positives rather than ignore them.

Please let me know if you want me to test a fix.

Thanks for looking into this!