Re: [PATCH v1 0/2] nvme-pci: Fix dma-iommu mapping failures when PAGE_SIZE=64KB

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Nicolin Chen <nicolinc@nvidia.com>
To: Will Deacon <will@kernel.org>
Cc: <sagi@grimberg.me>, <hch@lst.de>, <axboe@kernel.dk>,
	<kbusch@kernel.org>, <joro@8bytes.org>, <robin.murphy@arm.com>,
	<jgg@nvidia.com>, <linux-nvme@lists.infradead.org>,
	<linux-kernel@vger.kernel.org>, <iommu@lists.linux.dev>,
	<murphyt7@tcd.ie>, <baolu.lu@linux.intel.com>
Subject: Re: [PATCH v1 0/2] nvme-pci: Fix dma-iommu mapping failures when PAGE_SIZE=64KB
Date: Fri, 16 Feb 2024 21:19:44 -0800	[thread overview]
Message-ID: <ZdBB8BYsK6WvwTYC@Asurada-Nvidia> (raw)
In-Reply-To: <20240216161312.GA2203@willie-the-truck>

Hi Will,

On Fri, Feb 16, 2024 at 04:13:12PM +0000, Will Deacon wrote:
> On Thu, Feb 15, 2024 at 04:26:23PM -0800, Nicolin Chen wrote:
> > On Thu, Feb 15, 2024 at 04:35:45PM +0000, Will Deacon wrote:
> > > On Thu, Feb 15, 2024 at 02:22:09PM +0000, Will Deacon wrote:
> > > > On Wed, Feb 14, 2024 at 11:57:32AM -0800, Nicolin Chen wrote:
> > > > > On Wed, Feb 14, 2024 at 04:41:38PM +0000, Will Deacon wrote:
> > > > > > On Tue, Feb 13, 2024 at 01:53:55PM -0800, Nicolin Chen wrote:
> > > > > And it seems to get worse, as even a 64KB mapping is failing:
> > > > > [    0.239821] nvme 0000:00:01.0: swiotlb buffer is full (sz: 65536 bytes), total 32768 (slots), used 0 (slots)
> > > > >
> > > > > With a printk, I found the iotlb_align_mask isn't correct:
> > > > >    swiotlb_area_find_slots:alloc_align_mask 0xffff, iotlb_align_mask 0x800
> > > > >
> > > > > But fixing the iotlb_align_mask to 0x7ff still fails the 64KB
> > > > > mapping..
> > > >
> > > > Hmm. A mask of 0x7ff doesn't make a lot of sense given that the slabs
> > > > are 2KiB aligned. I'll try plugging in some of the constants you have
> > > > here, as something definitely isn't right...
> > >
> > > Sorry, another ask: please can you print 'orig_addr' in the case of the
> > > failing allocation?
> >
> > I added nvme_print_sgl() in the nvme-pci driver before its
> > dma_map_sgtable() call, so the orig_addr isn't aligned with
> > PAGE_SIZE=64K or NVME_CTRL_PAGE_SIZE=4K:
> >  sg[0] phys_addr:0x0000000105774600 offset:17920 length:512 dma_address:0x0000000000000000 dma_length:0
> >
> > Also attaching some verbose logs, in case you'd like to check:
> >    nvme 0000:00:01.0: swiotlb_area_find_slots: dma_get_min_align_mask 0xfff, IO_TLB_SIZE 0xfffff7ff
> >    nvme 0000:00:01.0: swiotlb_area_find_slots: alloc_align_mask 0xffff, iotlb_align_mask 0x7ff
> >    nvme 0000:00:01.0: swiotlb_area_find_slots: stride 0x20, max 0xffff
> >    nvme 0000:00:01.0: swiotlb_area_find_slots: tlb_addr=0xbd830000, iotlb_align_mask=0x7ff, alloc_align_mask=0xffff
> > => nvme 0000:00:01.0: swiotlb_area_find_slots: orig_addr=0x105774600, iotlb_align_mask=0x7ff
> 
> With my patches, I think 'iotlb_align_mask' will be 0x800 here, so this

Oops, my bad. I forgot to revert the part that I mentioned in
my previous reply.

> particular allocation might be alright, however I think I'm starting to
> see the wider problem. The IOMMU code is asking for a 64k-aligned
> allocation so that it can map it safely, but at the same time
> dma_get_min_align_mask() is asking for congruence in the 4k NVME page
> offset. Now, because we're going to allocate a 64k-aligned mapping and
> offset it, I think the NVME alignment will just fall out in the wash and
> checking the 'orig_addr' (which includes the offset) is wrong.
> 
> So perhaps this diff (which I'm sadly not able to test) will help? You'll
> want to apply it on top of my other patches. The idea is to ignore the
> bits of 'orig_addr' which will be aligned automatically by offseting from
> the aligned allocation. I fixed the max() thing too, although that's only
> an issue for older kernels.
 
Yea, I tested all 4 patches. They still failed at some large
mapping, until I added on top of them my PATCH-1 implementing
the max_mapping_size op. IOW, with your patches it looks like
252KB max_mapping_size is working :)

Though we seem to have a solution now, I hope we can make it
applicable to older kernels too. The mapping failure on arm64
with PAGE_SIZE=64KB looks like a regression to me, since dma-
iommu started to use swiotlb bounce buffer.

Thanks
Nicolin

> --->8
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 283eea33dd22..4a000d97f568 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -981,8 +981,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
>         dma_addr_t tbl_dma_addr =
>                 phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
>         unsigned long max_slots = get_max_slots(boundary_mask);
> -       unsigned int iotlb_align_mask =
> -               dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
> +       unsigned int iotlb_align_mask = dma_get_min_align_mask(dev);
>         unsigned int nslots = nr_slots(alloc_size), stride;
>         unsigned int offset = swiotlb_align_offset(dev, orig_addr);
>         unsigned int index, slots_checked, count = 0, i;
> @@ -993,6 +992,9 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
>         BUG_ON(!nslots);
>         BUG_ON(area_index >= pool->nareas);
> 
> +       alloc_align_mask |= (IO_TLB_SIZE - 1);
> +       iotlb_align_mask &= ~alloc_align_mask;
> +
>         /*
>          * For mappings with an alignment requirement don't bother looping to
>          * unaligned slots once we found an aligned one.
> @@ -1004,7 +1006,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
>          * allocations.
>          */
>         if (alloc_size >= PAGE_SIZE)
> -               stride = max(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1);
> +               stride = umax(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1);
> 
>         spin_lock_irqsave(&area->lock, flags);
>         if (unlikely(nslots > pool->area_nslabs - area->used))
>

next prev parent reply	other threads:[~2024-02-17  5:20 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-13 21:53 [PATCH v1 0/2] nvme-pci: Fix dma-iommu mapping failures when PAGE_SIZE=64KB Nicolin Chen
2024-02-13 21:53 ` [PATCH v1 1/2] iommu/dma: Force swiotlb_max_mapping_size on an untrusted device Nicolin Chen
2024-02-13 21:53 ` [PATCH v1 2/2] nvme-pci: Fix iommu map (via swiotlb) failures when PAGE_SIZE=64KB Nicolin Chen
2024-02-13 23:31   ` Keith Busch
2024-02-14  6:09     ` Nicolin Chen
2024-02-15  1:36       ` Keith Busch
2024-02-15  4:46         ` Nicolin Chen
2024-02-15 12:01           ` Robin Murphy
2024-02-16  1:07             ` Nicolin Chen
2024-02-14 16:41 ` [PATCH v1 0/2] nvme-pci: Fix dma-iommu mapping " Will Deacon
2024-02-14 19:57   ` Nicolin Chen
2024-02-15 14:22     ` Will Deacon
2024-02-15 16:35       ` Will Deacon
2024-02-16  0:26         ` Nicolin Chen
2024-02-16 16:13           ` Will Deacon
2024-02-17  5:19             ` Nicolin Chen [this message]
2024-02-19  4:05               ` Michael Kelley
2024-02-16  0:29       ` Nicolin Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZdBB8BYsK6WvwTYC@Asurada-Nvidia \
    --to=nicolinc@nvidia.com \
    --cc=axboe@kernel.dk \
    --cc=baolu.lu@linux.intel.com \
    --cc=hch@lst.de \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=kbusch@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=murphyt7@tcd.ie \
    --cc=robin.murphy@arm.com \
    --cc=sagi@grimberg.me \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox