DMA Engine development
 help / color / mirror / Atom feed
From: Mark Rutland <mark.rutland@arm.com>
To: Fenghua Yu <fenghua.yu@intel.com>
Cc: Vinod Koul <vkoul@kernel.org>, Dave Jiang <dave.jiang@intel.com>,
	dmaengine@vger.kernel.org,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Nikhil Rao <nikhil.rao@intel.com>, Tony Zhu <tony.zhu@intel.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Subject: Re: [PATCH] dmaengine: idxd: Change wmb() to smp_wmb() when copying completion record to user space
Date: Tue, 30 Jan 2024 17:58:24 +0000	[thread overview]
Message-ID: <Zbk4wGNcB-g91Vr0@FVFF77S0Q05N> (raw)
In-Reply-To: <20240130025806.2027284-1-fenghua.yu@intel.com>

This patch might be ok (it looks reasonable as an optimization), but I think
the description of wmb() and smp_wmb() is incorrect. I also think that you're
missing an rmb()/smp_rmb()eor equivalent on the reader side.

On Mon, Jan 29, 2024 at 06:58:06PM -0800, Fenghua Yu wrote:
> wmb() is used to ensure status in the completion record is written
> after the rest of the completion record, making it visible to the user.
> However, on SMP systems, this may not guarantee visibility across
> different CPUs.
> 
> Considering this scenario that event log handler is running on CPU1 while
> user app is polling completion record (cr) status on CPU2:
> 
> 	CPU1				CPU2
> event log handler			user app
> 
> 					1. cr = 0 (status = 0)
> 2. copy X to user cr except "status"
> 3. wmb()
> 4. copy Y to user cr "status"
> 					5. poll status value Y
> 				 	6. read rest cr which is still 0.
> 					   cr handling fails
> 					7. cr value X visible now
> 
> Although wmb() ensure value Y is written and visible after X is written
> on CPU1, the order is not guaranteed on CPU2. So user app may see status
> value Y while cr value X is still not visible yet on CPU2. This will
> cause reading 0 from the rest of cr and cr handling fails.

The wmb() on CPU1 ensures the order of the reads, but you need an rmb() on CPU2
between reading the 'status' and 'rest' parts; otherwise CPU2 (or the
compiler!) is permitted to hoist the read of 'rest' early, before reading from
'status', and hence you can end up with a sequence that is effectively:

	CPU1				CPU2
  event log handler			user app
					
  					1. cr = 0 (status = 0)
  				 	6a. read rest cr which is still 0.
  2. copy X to user cr except "status"
  3. wmb()
  4. copy Y to user cr "status"
  					5. poll status value Y
  					6b. cr handling fails
  					7. cr value X visible now

Since this is all to regular cacheable memory, it's *sufficient* to use
smp_wmb() and smp_rmb(), but that's an optimization rather than an ordering
fix.

Note that on x86_64, TSO means that the stores are in-order (and so smp_wmb()
is just a compiler barrier), and IIUC loads are not reordered w.r.t. other
loads (and so smp_rmb() is also just a compiler barrier).

> Changing wmb() to smp_wmb() ensures Y is written after X on both CPU1
> and CPU2. This guarantees that user app can consume cr in right order.

This implies that smp_wmb() is *stronger* than wmb(), whereas smp_wmb() is
actually *weaker* (e.g. on x86_64 wmb() is an sfence, whereas smp_wmb() is a
barrier()).

Thanks,
Mark.

> 
> Fixes: b022f59725f0 ("dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling")
> Suggested-by: Nikhil Rao <nikhil.rao@intel.com>
> Tested-by: Tony Zhu <tony.zhu@intel.com>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> ---
>  drivers/dma/idxd/cdev.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
> index 77f8885cf407..9b7388a23cbe 100644
> --- a/drivers/dma/idxd/cdev.c
> +++ b/drivers/dma/idxd/cdev.c
> @@ -681,9 +681,10 @@ int idxd_copy_cr(struct idxd_wq *wq, ioasid_t pasid, unsigned long addr,
>  		 * Ensure that the completion record's status field is written
>  		 * after the rest of the completion record has been written.
>  		 * This ensures that the user receives the correct completion
> -		 * record information once polling for a non-zero status.
> +		 * record information on any CPU once polling for a non-zero
> +		 * status.
>  		 */
> -		wmb();
> +		smp_wmb();
>  		status = *(u8 *)cr;
>  		if (put_user(status, (u8 __user *)addr))
>  			left += status_size;
> -- 
> 2.37.1
> 
> 

  parent reply	other threads:[~2024-01-30 17:58 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-30  2:58 [PATCH] dmaengine: idxd: Change wmb() to smp_wmb() when copying completion record to user space Fenghua Yu
2024-01-30 15:35 ` Dave Jiang
2024-01-30 16:01 ` Lijun Pan
2024-01-30 17:58 ` Mark Rutland [this message]
2024-01-30 18:14   ` Mark Rutland
2024-01-30 19:53   ` Boqun Feng
2024-01-30 20:30     ` Dave Hansen
2024-01-30 20:41       ` Fenghua Yu
2024-01-31  4:18         ` Rao, Nikhil

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zbk4wGNcB-g91Vr0@FVFF77S0Q05N \
    --to=mark.rutland@arm.com \
    --cc=dave.jiang@intel.com \
    --cc=dmaengine@vger.kernel.org \
    --cc=fenghua.yu@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=nikhil.rao@intel.com \
    --cc=tony.zhu@intel.com \
    --cc=vkoul@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox