From: Mark Rutland <mark.rutland@arm.com>
To: Fenghua Yu <fenghua.yu@intel.com>
Cc: Vinod Koul <vkoul@kernel.org>, Dave Jiang <dave.jiang@intel.com>,
dmaengine@vger.kernel.org,
linux-kernel <linux-kernel@vger.kernel.org>,
Nikhil Rao <nikhil.rao@intel.com>, Tony Zhu <tony.zhu@intel.com>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Subject: Re: [PATCH] dmaengine: idxd: Change wmb() to smp_wmb() when copying completion record to user space
Date: Tue, 30 Jan 2024 18:14:11 +0000 [thread overview]
Message-ID: <Zbk8c6l5gZslj6wJ@FVFF77S0Q05N> (raw)
In-Reply-To: <Zbk4wGNcB-g91Vr0@FVFF77S0Q05N>
On Tue, Jan 30, 2024 at 05:58:24PM +0000, Mark Rutland wrote:
> This patch might be ok (it looks reasonable as an optimization), but I think
> the description of wmb() and smp_wmb() is incorrect. I also think that you're
> missing an rmb()/smp_rmb()eor equivalent on the reader side.
Sorry, the above should have said:
an rmb()/smp_rmb() *or* equivalent
>
> On Mon, Jan 29, 2024 at 06:58:06PM -0800, Fenghua Yu wrote:
> > wmb() is used to ensure status in the completion record is written
> > after the rest of the completion record, making it visible to the user.
> > However, on SMP systems, this may not guarantee visibility across
> > different CPUs.
> >
> > Considering this scenario that event log handler is running on CPU1 while
> > user app is polling completion record (cr) status on CPU2:
> >
> > CPU1 CPU2
> > event log handler user app
> >
> > 1. cr = 0 (status = 0)
> > 2. copy X to user cr except "status"
> > 3. wmb()
> > 4. copy Y to user cr "status"
> > 5. poll status value Y
> > 6. read rest cr which is still 0.
> > cr handling fails
> > 7. cr value X visible now
> >
> > Although wmb() ensure value Y is written and visible after X is written
> > on CPU1, the order is not guaranteed on CPU2. So user app may see status
> > value Y while cr value X is still not visible yet on CPU2. This will
> > cause reading 0 from the rest of cr and cr handling fails.
>
> The wmb() on CPU1 ensures the order of the reads, but you need an rmb() on CPU2
Sorry again, the above should have said:
The wmb() on CPU1 ensures the order of the *writes*
Apologies for any confusion resulting from those mistakes.
Mark.
> between reading the 'status' and 'rest' parts; otherwise CPU2 (or the
> compiler!) is permitted to hoist the read of 'rest' early, before reading from
> 'status', and hence you can end up with a sequence that is effectively:
>
> CPU1 CPU2
> event log handler user app
>
> 1. cr = 0 (status = 0)
> 6a. read rest cr which is still 0.
> 2. copy X to user cr except "status"
> 3. wmb()
> 4. copy Y to user cr "status"
> 5. poll status value Y
> 6b. cr handling fails
> 7. cr value X visible now
>
> Since this is all to regular cacheable memory, it's *sufficient* to use
> smp_wmb() and smp_rmb(), but that's an optimization rather than an ordering
> fix.
>
> Note that on x86_64, TSO means that the stores are in-order (and so smp_wmb()
> is just a compiler barrier), and IIUC loads are not reordered w.r.t. other
> loads (and so smp_rmb() is also just a compiler barrier).
>
> > Changing wmb() to smp_wmb() ensures Y is written after X on both CPU1
> > and CPU2. This guarantees that user app can consume cr in right order.
>
> This implies that smp_wmb() is *stronger* than wmb(), whereas smp_wmb() is
> actually *weaker* (e.g. on x86_64 wmb() is an sfence, whereas smp_wmb() is a
> barrier()).
>
> Thanks,
> Mark.
>
> >
> > Fixes: b022f59725f0 ("dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling")
> > Suggested-by: Nikhil Rao <nikhil.rao@intel.com>
> > Tested-by: Tony Zhu <tony.zhu@intel.com>
> > Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> > ---
> > drivers/dma/idxd/cdev.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
> > index 77f8885cf407..9b7388a23cbe 100644
> > --- a/drivers/dma/idxd/cdev.c
> > +++ b/drivers/dma/idxd/cdev.c
> > @@ -681,9 +681,10 @@ int idxd_copy_cr(struct idxd_wq *wq, ioasid_t pasid, unsigned long addr,
> > * Ensure that the completion record's status field is written
> > * after the rest of the completion record has been written.
> > * This ensures that the user receives the correct completion
> > - * record information once polling for a non-zero status.
> > + * record information on any CPU once polling for a non-zero
> > + * status.
> > */
> > - wmb();
> > + smp_wmb();
> > status = *(u8 *)cr;
> > if (put_user(status, (u8 __user *)addr))
> > left += status_size;
> > --
> > 2.37.1
> >
> >
>
next prev parent reply other threads:[~2024-01-30 18:14 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-30 2:58 [PATCH] dmaengine: idxd: Change wmb() to smp_wmb() when copying completion record to user space Fenghua Yu
2024-01-30 15:35 ` Dave Jiang
2024-01-30 16:01 ` Lijun Pan
2024-01-30 17:58 ` Mark Rutland
2024-01-30 18:14 ` Mark Rutland [this message]
2024-01-30 19:53 ` Boqun Feng
2024-01-30 20:30 ` Dave Hansen
2024-01-30 20:41 ` Fenghua Yu
2024-01-31 4:18 ` Rao, Nikhil
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zbk8c6l5gZslj6wJ@FVFF77S0Q05N \
--to=mark.rutland@arm.com \
--cc=dave.jiang@intel.com \
--cc=dmaengine@vger.kernel.org \
--cc=fenghua.yu@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=nikhil.rao@intel.com \
--cc=tony.zhu@intel.com \
--cc=vkoul@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox