From: Catalin Marinas <catalin.marinas@arm.com>
To: "Petr Tesařík" <petr@tesarici.cz>
Cc: Petr Tesarik <petrtesarik@huaweicloud.com>,
Jonathan Corbet <corbet@lwn.net>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
"Rafael J. Wysocki" <rafael@kernel.org>,
Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
Maxime Ripard <mripard@kernel.org>,
Thomas Zimmermann <tzimmermann@suse.de>,
David Airlie <airlied@gmail.com>, Daniel Vetter <daniel@ffwll.ch>,
Christoph Hellwig <hch@lst.de>,
Marek Szyprowski <m.szyprowski@samsung.com>,
Robin Murphy <robin.murphy@arm.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Borislav Petkov <bp@suse.de>,
Randy Dunlap <rdunlap@infradead.org>,
Damien Le Moal <damien.lemoal@opensource.wdc.com>,
Kim Phillips <kim.phillips@amd.com>,
"Steven Rostedt (Google)" <rostedt@goodmis.org>,
Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
Hans de Goede <hdegoede@redhat.com>,
Jason Gunthorpe <jgg@ziepe.ca>, Kees Cook <keescook@chromium.org>,
Thomas Gleixner <tglx@linutronix.de>,
"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
open list <linux-kernel@vger.kernel.org>,
"open list:DRM DRIVERS" <dri-devel@lists.freedesktop.org>,
"open list:DMA MAPPING HELPERS" <iommu@lists.linux.dev>,
Roberto Sassu <roberto.sassu@huawei.com>,
Kefeng Wang <wangkefeng.wang@huawei.com>
Subject: Re: [PATCH v2 RESEND 7/7] swiotlb: per-device flag if there are dynamically allocated buffers
Date: Mon, 15 May 2023 17:28:38 +0100 [thread overview]
Message-ID: <ZGJdtmP13pv06xDH@arm.com> (raw)
In-Reply-To: <20230515120054.0115a4eb@meshulam.tesarici.cz>
(some of you replies may have been filtered to various of my mailboxes,
depending on which lists you cc'ed; replying here)
On Mon, May 15, 2023 at 12:00:54PM +0200, Petr Tesařík wrote:
> On Mon, 15 May 2023 10:48:47 +0200
> Petr Tesařík <petr@tesarici.cz> wrote:
> > On Sun, 14 May 2023 19:54:27 +0100
> > Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > Now, thinking about the list_head access and the flag ordering, since it
> > > doesn't matter, you might as well not bother with the flag at all and
> > > rely on list_add() and list_empty() ordering vs the hypothetical 'blah'
> > > access. Both of these use READ/WRITE_ONCE() for setting
> > > dma_io_tlb_dyn_slots.next. You only need an smp_wmb() after the
> > > list_add() and an smp_rmb() before a list_empty() check in
> ^^^^^^^^^
> Got it, finally. Well, that's exactly something I don't want to do.
> For example, on arm64 (seeing your email address), smp_rmb() translates
> to a "dsb ld" instruction. I would expect that this is more expensive
> than a "ldar", generated by smp_load_acquire().
It translates to a dmb ishld which is on par with ldar (dsb is indeed a
lot more expensive but that's not generated here).
> > > is_swiotlb_buffer(), no dma_iotlb_have_dyn variable.
> >
> > Wait, let me check that I understand you right. Do you suggest that I
> > convert dma_io_tlb_dyn_slots to a lockless list and get rid of the
> > spinlock?
> >
> > I'm sure it can be done for list_add() and list_del(). I'll have
> > to think about list_move().
>
> Hm, even the documentation of llist_empty() says that it is "not
> guaranteed to be accurate or up to date". If it could be, I'm quite
> sure the authors would have gladly implemented it as such.
It doesn't but neither does your flag. If you want a guarantee, you'd
need locks because a llist_empty() on its own can race with other
llist_add/del_*() that may not yet be visible to a CPU at exactly that
moment. BTW, the llist implementation cannot delete a random element, so
not sure this is suitable for your implementation (it can only delete
the first element or the whole list).
I do think you need to change your invariants and not rely on an
absolute list_empty() or some flag:
P0:
list_add(paddr);
WRITE_ONCE(blah, paddr);
P1:
paddr = READ_ONCE(blah);
list_empty();
Your invariant (on P1) should be blah == paddr => !list_empty(). If
there is another P2 removing paddr from the list, this wouldn't work
(nor your flag) but the assumption is that a correctly written driver
that still has a reference to paddr doesn't use it after being removed
from the list (i.e. it doesn't do a dma_unmap(paddr) and still continue
to use this paddr for e.g. dma_sync()).
For such invariant, you'd need ordering between list_add() and the
write of paddr (smp_wmb() would do). On P1, you need an smp_rmb() before
list_empty() since the implementation does a READ_ONCE only).
You still need the locks for list modifications and list traversal as I
don't see how you can use the llist implementation with random element
removal.
There is another scenario to take into account on the list_del() side.
Let's assume that there are other elements on the list, so
list_empty() == false:
P0:
list_del(paddr);
/* the memory gets freed, added to some slab or page free list */
WRITE_ONCE(slab_free_list, __va(paddr));
P1:
paddr = __pa(READ_ONCE(slab_free_list));/* re-allocating paddr freed on P0 */
if (!list_empty()) { /* assuming other elements on the list */
/* searching the list */
list_for_each() {
if (pos->paddr) == __pa(vaddr))
/* match */
}
}
On P0, you want the list update to be visible before the memory is freed
(and potentially reallocated on P1). An smp_wmb() on P0 would do. For
P1, we don't care about list_empty() as there can be other elements
already. But we do want any list elements reading during the search to
be ordered after the slab_free_list reading. The smp_rmb() you'd add for
the case above would suffice.
--
Catalin
WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: "Petr Tesařík" <petr@tesarici.cz>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>,
"Rafael J. Wysocki" <rafael@kernel.org>,
"open list:DRM DRIVERS" <dri-devel@lists.freedesktop.org>,
Kim Phillips <kim.phillips@amd.com>,
Christoph Hellwig <hch@lst.de>,
Marek Szyprowski <m.szyprowski@samsung.com>,
Petr Tesarik <petrtesarik@huaweicloud.com>,
Jonathan Corbet <corbet@lwn.net>,
Damien Le Moal <damien.lemoal@opensource.wdc.com>,
"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
Jason Gunthorpe <jgg@ziepe.ca>,
"open list:DMA MAPPING HELPERS" <iommu@lists.linux.dev>,
Borislav Petkov <bp@suse.de>, Kees Cook <keescook@chromium.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
Hans de Goede <hdegoede@redhat.com>,
"Steven Rostedt \(Google\)" <rostedt@goodmis.org>,
Thomas Gleixner <tglx@linutronix.de>,
Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
Thomas Zimmermann <tzimmermann@suse.de>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Randy Dunlap <rdunlap@infradead.org>,
Roberto Sassu <roberto.sassu@huawei.com>,
open list <linux-kernel@vger.kernel.org>,
Robin Murphy <robin.murphy@arm.com>
Subject: Re: [PATCH v2 RESEND 7/7] swiotlb: per-device flag if there are dynamically allocated buffers
Date: Mon, 15 May 2023 17:28:38 +0100 [thread overview]
Message-ID: <ZGJdtmP13pv06xDH@arm.com> (raw)
In-Reply-To: <20230515120054.0115a4eb@meshulam.tesarici.cz>
(some of you replies may have been filtered to various of my mailboxes,
depending on which lists you cc'ed; replying here)
On Mon, May 15, 2023 at 12:00:54PM +0200, Petr Tesařík wrote:
> On Mon, 15 May 2023 10:48:47 +0200
> Petr Tesařík <petr@tesarici.cz> wrote:
> > On Sun, 14 May 2023 19:54:27 +0100
> > Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > Now, thinking about the list_head access and the flag ordering, since it
> > > doesn't matter, you might as well not bother with the flag at all and
> > > rely on list_add() and list_empty() ordering vs the hypothetical 'blah'
> > > access. Both of these use READ/WRITE_ONCE() for setting
> > > dma_io_tlb_dyn_slots.next. You only need an smp_wmb() after the
> > > list_add() and an smp_rmb() before a list_empty() check in
> ^^^^^^^^^
> Got it, finally. Well, that's exactly something I don't want to do.
> For example, on arm64 (seeing your email address), smp_rmb() translates
> to a "dsb ld" instruction. I would expect that this is more expensive
> than a "ldar", generated by smp_load_acquire().
It translates to a dmb ishld which is on par with ldar (dsb is indeed a
lot more expensive but that's not generated here).
> > > is_swiotlb_buffer(), no dma_iotlb_have_dyn variable.
> >
> > Wait, let me check that I understand you right. Do you suggest that I
> > convert dma_io_tlb_dyn_slots to a lockless list and get rid of the
> > spinlock?
> >
> > I'm sure it can be done for list_add() and list_del(). I'll have
> > to think about list_move().
>
> Hm, even the documentation of llist_empty() says that it is "not
> guaranteed to be accurate or up to date". If it could be, I'm quite
> sure the authors would have gladly implemented it as such.
It doesn't but neither does your flag. If you want a guarantee, you'd
need locks because a llist_empty() on its own can race with other
llist_add/del_*() that may not yet be visible to a CPU at exactly that
moment. BTW, the llist implementation cannot delete a random element, so
not sure this is suitable for your implementation (it can only delete
the first element or the whole list).
I do think you need to change your invariants and not rely on an
absolute list_empty() or some flag:
P0:
list_add(paddr);
WRITE_ONCE(blah, paddr);
P1:
paddr = READ_ONCE(blah);
list_empty();
Your invariant (on P1) should be blah == paddr => !list_empty(). If
there is another P2 removing paddr from the list, this wouldn't work
(nor your flag) but the assumption is that a correctly written driver
that still has a reference to paddr doesn't use it after being removed
from the list (i.e. it doesn't do a dma_unmap(paddr) and still continue
to use this paddr for e.g. dma_sync()).
For such invariant, you'd need ordering between list_add() and the
write of paddr (smp_wmb() would do). On P1, you need an smp_rmb() before
list_empty() since the implementation does a READ_ONCE only).
You still need the locks for list modifications and list traversal as I
don't see how you can use the llist implementation with random element
removal.
There is another scenario to take into account on the list_del() side.
Let's assume that there are other elements on the list, so
list_empty() == false:
P0:
list_del(paddr);
/* the memory gets freed, added to some slab or page free list */
WRITE_ONCE(slab_free_list, __va(paddr));
P1:
paddr = __pa(READ_ONCE(slab_free_list));/* re-allocating paddr freed on P0 */
if (!list_empty()) { /* assuming other elements on the list */
/* searching the list */
list_for_each() {
if (pos->paddr) == __pa(vaddr))
/* match */
}
}
On P0, you want the list update to be visible before the memory is freed
(and potentially reallocated on P1). An smp_wmb() on P0 would do. For
P1, we don't care about list_empty() as there can be other elements
already. But we do want any list elements reading during the search to
be ordered after the slab_free_list reading. The smp_rmb() you'd add for
the case above would suffice.
--
Catalin
next prev parent reply other threads:[~2023-05-15 16:28 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-09 9:18 [PATCH v2 RESEND 0/7] Allow dynamic allocation of software IO TLB bounce buffers Petr Tesarik
2023-05-09 9:18 ` Petr Tesarik
2023-05-09 9:18 ` [PATCH v2 RESEND 1/7] swiotlb: Use a helper to initialize swiotlb fields in struct device Petr Tesarik
2023-05-09 9:18 ` Petr Tesarik
2023-05-09 9:18 ` [PATCH v2 RESEND 2/7] swiotlb: Move code around in preparation for dynamic bounce buffers Petr Tesarik
2023-05-09 9:18 ` Petr Tesarik
2023-05-09 9:18 ` [PATCH v2 RESEND 3/7] dma-mapping: introduce the DMA_ATTR_MAY_SLEEP attribute Petr Tesarik
2023-05-09 9:18 ` Petr Tesarik
2023-05-09 9:18 ` [PATCH v2 RESEND 4/7] swiotlb: Dynamically allocated bounce buffers Petr Tesarik
2023-05-09 9:18 ` Petr Tesarik
2023-05-15 19:43 ` Michael Kelley (LINUX)
2023-05-15 19:43 ` Michael Kelley (LINUX)
2023-05-16 6:13 ` Christoph Hellwig
2023-05-16 6:39 ` Petr Tesařík
2023-05-16 6:39 ` Petr Tesařík
2023-05-16 17:59 ` Catalin Marinas
2023-05-16 17:59 ` Catalin Marinas
2023-05-17 6:35 ` Petr Tesařík
2023-05-17 6:35 ` Petr Tesařík
2023-05-17 6:56 ` Christoph Hellwig
2023-05-17 7:32 ` Petr Tesařík
2023-05-17 7:32 ` Petr Tesařík
2023-05-17 9:41 ` Catalin Marinas
2023-05-17 9:41 ` Catalin Marinas
2023-05-17 9:58 ` Petr Tesařík
2023-05-17 9:58 ` Petr Tesařík
2023-05-17 11:08 ` Catalin Marinas
2023-05-17 11:08 ` Catalin Marinas
2023-05-17 11:27 ` Petr Tesařík
2023-05-17 11:27 ` Petr Tesařík
2023-05-23 9:54 ` Catalin Marinas
2023-05-23 9:54 ` Catalin Marinas
2023-05-23 11:53 ` Petr Tesařík
2023-05-23 11:53 ` Petr Tesařík
2023-05-16 6:16 ` Petr Tesařík
2023-05-16 6:16 ` Petr Tesařík
2023-05-09 9:18 ` [PATCH v2 RESEND 5/7] swiotlb: Add a boot option to enable dynamic " Petr Tesarik
2023-05-09 9:18 ` Petr Tesarik
2023-05-09 9:18 ` [PATCH v2 RESEND 6/7] drm: Use DMA_ATTR_MAY_SLEEP from process context Petr Tesarik
2023-05-09 9:18 ` Petr Tesarik
2023-05-09 9:18 ` [PATCH v2 RESEND 7/7] swiotlb: per-device flag if there are dynamically allocated buffers Petr Tesarik
2023-05-09 9:18 ` Petr Tesarik
2023-05-14 18:54 ` Catalin Marinas
2023-05-14 18:54 ` Catalin Marinas
2023-05-15 8:48 ` Petr Tesařík
2023-05-15 8:48 ` Petr Tesařík
2023-05-15 10:00 ` Petr Tesařík
2023-05-15 10:00 ` Petr Tesařík
2023-05-15 16:28 ` Catalin Marinas [this message]
2023-05-15 16:28 ` Catalin Marinas
2023-05-16 7:55 ` Petr Tesařík
2023-05-16 7:55 ` Petr Tesařík
2023-05-16 11:22 ` Catalin Marinas
2023-05-16 11:22 ` Catalin Marinas
[not found] ` <20230515104737.2c4c05db@meshulam.tesarici.cz>
[not found] ` <ZGH9v2KWJWZnKvxP@arm.com>
2023-05-15 10:47 ` Petr Tesařík
2023-05-15 10:47 ` Petr Tesařík
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZGJdtmP13pv06xDH@arm.com \
--to=catalin.marinas@arm.com \
--cc=airlied@gmail.com \
--cc=andriy.shevchenko@linux.intel.com \
--cc=bp@suse.de \
--cc=corbet@lwn.net \
--cc=damien.lemoal@opensource.wdc.com \
--cc=daniel@ffwll.ch \
--cc=dri-devel@lists.freedesktop.org \
--cc=gregkh@linuxfoundation.org \
--cc=hch@lst.de \
--cc=hdegoede@redhat.com \
--cc=iommu@lists.linux.dev \
--cc=jgg@ziepe.ca \
--cc=keescook@chromium.org \
--cc=kim.phillips@amd.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=m.szyprowski@samsung.com \
--cc=maarten.lankhorst@linux.intel.com \
--cc=mripard@kernel.org \
--cc=paulmck@kernel.org \
--cc=petr@tesarici.cz \
--cc=petrtesarik@huaweicloud.com \
--cc=rafael@kernel.org \
--cc=rdunlap@infradead.org \
--cc=roberto.sassu@huawei.com \
--cc=robin.murphy@arm.com \
--cc=rostedt@goodmis.org \
--cc=tglx@linutronix.de \
--cc=tzimmermann@suse.de \
--cc=wangkefeng.wang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.