From: Nicolin Chen <nicolinc@nvidia.com>
To: Will Deacon <will@kernel.org>
Cc: <sagi@grimberg.me>, <hch@lst.de>, <axboe@kernel.dk>,
<kbusch@kernel.org>, <joro@8bytes.org>, <robin.murphy@arm.com>,
<jgg@nvidia.com>, <linux-nvme@lists.infradead.org>,
<linux-kernel@vger.kernel.org>, <iommu@lists.linux.dev>,
<murphyt7@tcd.ie>, <baolu.lu@linux.intel.com>
Subject: Re: [PATCH v1 0/2] nvme-pci: Fix dma-iommu mapping failures when PAGE_SIZE=64KB
Date: Fri, 16 Feb 2024 21:19:44 -0800 [thread overview]
Message-ID: <ZdBB8BYsK6WvwTYC@Asurada-Nvidia> (raw)
In-Reply-To: <20240216161312.GA2203@willie-the-truck>
Hi Will,
On Fri, Feb 16, 2024 at 04:13:12PM +0000, Will Deacon wrote:
> On Thu, Feb 15, 2024 at 04:26:23PM -0800, Nicolin Chen wrote:
> > On Thu, Feb 15, 2024 at 04:35:45PM +0000, Will Deacon wrote:
> > > On Thu, Feb 15, 2024 at 02:22:09PM +0000, Will Deacon wrote:
> > > > On Wed, Feb 14, 2024 at 11:57:32AM -0800, Nicolin Chen wrote:
> > > > > On Wed, Feb 14, 2024 at 04:41:38PM +0000, Will Deacon wrote:
> > > > > > On Tue, Feb 13, 2024 at 01:53:55PM -0800, Nicolin Chen wrote:
> > > > > And it seems to get worse, as even a 64KB mapping is failing:
> > > > > [ 0.239821] nvme 0000:00:01.0: swiotlb buffer is full (sz: 65536 bytes), total 32768 (slots), used 0 (slots)
> > > > >
> > > > > With a printk, I found the iotlb_align_mask isn't correct:
> > > > > swiotlb_area_find_slots:alloc_align_mask 0xffff, iotlb_align_mask 0x800
> > > > >
> > > > > But fixing the iotlb_align_mask to 0x7ff still fails the 64KB
> > > > > mapping..
> > > >
> > > > Hmm. A mask of 0x7ff doesn't make a lot of sense given that the slabs
> > > > are 2KiB aligned. I'll try plugging in some of the constants you have
> > > > here, as something definitely isn't right...
> > >
> > > Sorry, another ask: please can you print 'orig_addr' in the case of the
> > > failing allocation?
> >
> > I added nvme_print_sgl() in the nvme-pci driver before its
> > dma_map_sgtable() call, so the orig_addr isn't aligned with
> > PAGE_SIZE=64K or NVME_CTRL_PAGE_SIZE=4K:
> > sg[0] phys_addr:0x0000000105774600 offset:17920 length:512 dma_address:0x0000000000000000 dma_length:0
> >
> > Also attaching some verbose logs, in case you'd like to check:
> > nvme 0000:00:01.0: swiotlb_area_find_slots: dma_get_min_align_mask 0xfff, IO_TLB_SIZE 0xfffff7ff
> > nvme 0000:00:01.0: swiotlb_area_find_slots: alloc_align_mask 0xffff, iotlb_align_mask 0x7ff
> > nvme 0000:00:01.0: swiotlb_area_find_slots: stride 0x20, max 0xffff
> > nvme 0000:00:01.0: swiotlb_area_find_slots: tlb_addr=0xbd830000, iotlb_align_mask=0x7ff, alloc_align_mask=0xffff
> > => nvme 0000:00:01.0: swiotlb_area_find_slots: orig_addr=0x105774600, iotlb_align_mask=0x7ff
>
> With my patches, I think 'iotlb_align_mask' will be 0x800 here, so this
Oops, my bad. I forgot to revert the part that I mentioned in
my previous reply.
> particular allocation might be alright, however I think I'm starting to
> see the wider problem. The IOMMU code is asking for a 64k-aligned
> allocation so that it can map it safely, but at the same time
> dma_get_min_align_mask() is asking for congruence in the 4k NVME page
> offset. Now, because we're going to allocate a 64k-aligned mapping and
> offset it, I think the NVME alignment will just fall out in the wash and
> checking the 'orig_addr' (which includes the offset) is wrong.
>
> So perhaps this diff (which I'm sadly not able to test) will help? You'll
> want to apply it on top of my other patches. The idea is to ignore the
> bits of 'orig_addr' which will be aligned automatically by offseting from
> the aligned allocation. I fixed the max() thing too, although that's only
> an issue for older kernels.
Yea, I tested all 4 patches. They still failed at some large
mapping, until I added on top of them my PATCH-1 implementing
the max_mapping_size op. IOW, with your patches it looks like
252KB max_mapping_size is working :)
Though we seem to have a solution now, I hope we can make it
applicable to older kernels too. The mapping failure on arm64
with PAGE_SIZE=64KB looks like a regression to me, since dma-
iommu started to use swiotlb bounce buffer.
Thanks
Nicolin
> --->8
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 283eea33dd22..4a000d97f568 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -981,8 +981,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> dma_addr_t tbl_dma_addr =
> phys_to_dma_unencrypted(dev, pool->start) & boundary_mask;
> unsigned long max_slots = get_max_slots(boundary_mask);
> - unsigned int iotlb_align_mask =
> - dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
> + unsigned int iotlb_align_mask = dma_get_min_align_mask(dev);
> unsigned int nslots = nr_slots(alloc_size), stride;
> unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> unsigned int index, slots_checked, count = 0, i;
> @@ -993,6 +992,9 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> BUG_ON(!nslots);
> BUG_ON(area_index >= pool->nareas);
>
> + alloc_align_mask |= (IO_TLB_SIZE - 1);
> + iotlb_align_mask &= ~alloc_align_mask;
> +
> /*
> * For mappings with an alignment requirement don't bother looping to
> * unaligned slots once we found an aligned one.
> @@ -1004,7 +1006,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool
> * allocations.
> */
> if (alloc_size >= PAGE_SIZE)
> - stride = max(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1);
> + stride = umax(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1);
>
> spin_lock_irqsave(&area->lock, flags);
> if (unlikely(nslots > pool->area_nslabs - area->used))
>
next prev parent reply other threads:[~2024-02-17 5:20 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-13 21:53 [PATCH v1 0/2] nvme-pci: Fix dma-iommu mapping failures when PAGE_SIZE=64KB Nicolin Chen
2024-02-13 21:53 ` [PATCH v1 1/2] iommu/dma: Force swiotlb_max_mapping_size on an untrusted device Nicolin Chen
2024-02-13 21:53 ` [PATCH v1 2/2] nvme-pci: Fix iommu map (via swiotlb) failures when PAGE_SIZE=64KB Nicolin Chen
2024-02-13 23:31 ` Keith Busch
2024-02-14 6:09 ` Nicolin Chen
2024-02-15 1:36 ` Keith Busch
2024-02-15 4:46 ` Nicolin Chen
2024-02-15 12:01 ` Robin Murphy
2024-02-16 1:07 ` Nicolin Chen
2024-02-14 16:41 ` [PATCH v1 0/2] nvme-pci: Fix dma-iommu mapping " Will Deacon
2024-02-14 19:57 ` Nicolin Chen
2024-02-15 14:22 ` Will Deacon
2024-02-15 16:35 ` Will Deacon
2024-02-16 0:26 ` Nicolin Chen
2024-02-16 16:13 ` Will Deacon
2024-02-17 5:19 ` Nicolin Chen [this message]
2024-02-19 4:05 ` Michael Kelley
2024-02-16 0:29 ` Nicolin Chen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZdBB8BYsK6WvwTYC@Asurada-Nvidia \
--to=nicolinc@nvidia.com \
--cc=axboe@kernel.dk \
--cc=baolu.lu@linux.intel.com \
--cc=hch@lst.de \
--cc=iommu@lists.linux.dev \
--cc=jgg@nvidia.com \
--cc=joro@8bytes.org \
--cc=kbusch@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=murphyt7@tcd.ie \
--cc=robin.murphy@arm.com \
--cc=sagi@grimberg.me \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.