Re: [Regression] b1a000d3b8ec ("block: relax direct io memory alignment")

From: Catalin Marinas <catalin.marinas@arm.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>,
	linux-block@vger.kernel.org, Keith Busch <kbusch@kernel.org>,
	Jens Axboe <axboe@kernel.dk>, Robin Murphy <robin.murphy@arm.com>
Subject: Re: [Regression] b1a000d3b8ec ("block: relax direct io memory alignment")
Date: Tue, 22 Oct 2024 11:24:31 +0100
Message-ID: <Zxd9XyqqA604F1Rn@arm.com>
In-Reply-To: <Zw958YtMExrNhUxy@fedora>

On Wed, Oct 16, 2024 at 04:31:45PM +0800, Ming Lei wrote:
> On Wed, Oct 16, 2024 at 10:04:19AM +0200, Christoph Hellwig wrote:
> > On Wed, Oct 16, 2024 at 12:40:13AM +0800, Ming Lei wrote:
> > > Turns out host controller's DMA alignment is often too relaxed, so two DMA
> > > buffers may easily share the same cache line and trigger the warning
> > > "cacheline tracking EEXIST, overlapping mappings aren't supported".
> > > 
> > > The attached test code can trigger the warning immediately with CONFIG_DMA_API_DEBUG
> > > enabled when reading from a SCSI disk whose queue DMA alignment is 3.
> > 
> > We should not allow smaller than cache line alignment on architectures
> > that are not cache coherent indeed.

Even on architectures that are not fully coherent, the coherency is a
property of the device. You may need to somehow pass this information in
struct queue_limits if you want it to be optimal.

> Yes, something like the following change:
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index a446654ddee5..26bd0e72c68e 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -348,7 +348,9 @@ static int blk_validate_limits(struct queue_limits *lim)
>  	 */
>  	if (!lim->dma_alignment)
>  		lim->dma_alignment = SECTOR_SIZE - 1;
> -	if (WARN_ON_ONCE(lim->dma_alignment > PAGE_SIZE))
> +	else if (lim->dma_alignment < L1_CACHE_BYTES - 1)
> +		lim->dma_alignment = L1_CACHE_BYTES - 1;
> +	else if (WARN_ON_ONCE(lim->dma_alignment > PAGE_SIZE))
>  		return -EINVAL;

L1_CACHE_BYTES is not the right check here since a level 2/3 cache may
have a larger cache line than level 1 (and we have such configurations
on arm64 where ARCH_DMA_MINALIGN is 128 and L1_CACHE_BYTES is 64). Use
dma_get_cache_alignment() instead. On fully coherent architectures like
x86 it should return 1.
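
To make the suggestion concrete, here is a minimal userspace sketch of the
clamp with the cache alignment passed in as a parameter instead of the static
L1_CACHE_BYTES. It is only a model, not a tested kernel patch: the function
name, the MODEL_* constants and the -1 return value are hypothetical, and
cache_align stands in for whatever dma_get_cache_alignment() would return on
the running system.

```c
#include <assert.h>

/* Hypothetical stand-ins for SECTOR_SIZE and PAGE_SIZE. */
#define MODEL_SECTOR_SIZE 512u
#define MODEL_PAGE_SIZE   4096u

/*
 * Model of the dma_alignment validation from the quoted diff.
 * *dma_alignment is a mask (alignment - 1). cache_align models the
 * value of dma_get_cache_alignment(): 1 on a fully coherent
 * architecture, 128 on the arm64 configuration mentioned above.
 * Returns 0 on success, -1 if the limit is rejected.
 */
static int model_validate_dma_alignment(unsigned int *dma_alignment,
                                        unsigned int cache_align)
{
    /* The mask arithmetic below assumes a non-zero power of two. */
    assert(cache_align != 0 && (cache_align & (cache_align - 1)) == 0);

    if (*dma_alignment == 0)
        *dma_alignment = MODEL_SECTOR_SIZE - 1;
    else if (*dma_alignment < cache_align - 1)
        *dma_alignment = cache_align - 1;   /* raise to a full cache line */
    else if (*dma_alignment > MODEL_PAGE_SIZE)
        return -1;
    return 0;
}
```

With cache_align == 1 the middle branch never fires, so a driver-provided
mask of 3 is left alone, while with cache_align == 128 the same mask is
raised to 127.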

That said, the DMA debug code also uses the static L1_CACHE_SHIFT and it
will trigger the warning anyway. Some discussion around the DMA API
debug came up during the small ARCH_KMALLOC_MINALIGN changes (I don't
remember whether it was in private with Robin or on the list). Now kmalloc() can
return a small buffer (less than a cache line) that won't be bounced if
the device is coherent (see dma_kmalloc_safe()) but the DMA API debug
code only checks for direction == DMA_TO_DEVICE, not
dev_is_dma_coherent(). For arm64 I did not want to disable small
ARCH_KMALLOC_MINALIGN if CONFIG_DMA_API_DEBUG is enabled as this would
skew the testing by forcing all allocations to be ARCH_DMA_MINALIGN
aligned.

Maybe I'm missing something in those checks but I'm surprised that the
DMA API debug code doesn't complain about small kmalloc() buffers on x86
(which never had any bouncing for this specific case since it's fully
coherent). I suspect people just don't enable DMA debugging on x86 for
such devices (typically USB drivers have this issue).

So maybe the DMA API debug should have two modes: a generic one that
catches alignments irrespective of the coherency of the device and
another that's specific to the device/architecture coherency properties.
The former, if enabled, should also force a higher minimum kmalloc()
alignment and a dma_get_cache_alignment() > 1.
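
As a rough illustration of the two modes, the decision could look something
like the userspace sketch below. This is not how the existing DMA API debug
code is structured; the enum, the function names and the parameters are all
hypothetical, and dev_coherent merely stands in for what
dev_is_dma_coherent() would report for the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical debug modes, following the proposal above. */
enum model_debug_mode {
    MODEL_DEBUG_GENERIC,  /* flag partial cache lines for every device */
    MODEL_DEBUG_DEVICE,   /* flag them only for non-coherent devices */
};

/* True if the buffer does not start and end on cache-line boundaries. */
static bool model_partial_cacheline(uintptr_t addr, size_t len, size_t line)
{
    /* line must be a non-zero power of two for the mask test to work */
    assert(line != 0 && (line & (line - 1)) == 0);
    return ((addr | len) & (line - 1)) != 0;
}

/* Decide whether the (modelled) debug code should complain. */
static bool model_should_warn(enum model_debug_mode mode, bool dev_coherent,
                              uintptr_t addr, size_t len, size_t line)
{
    if (!model_partial_cacheline(addr, len, line))
        return false;
    if (mode == MODEL_DEBUG_GENERIC)
        return true;       /* coherency of the device is ignored */
    return !dev_coherent;  /* only non-coherent devices are at risk */
}
```

In the generic mode an 8-byte buffer at offset 4 within a 64-byte line is
reported even for a coherent device; in the device-specific mode it is only
reported when the device is not coherent.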

-- 
Catalin
