From: John Garry <john.g.garry@oracle.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vasant Hegde <vasant.hegde@amd.com>,
Robin Murphy <robin.murphy@arm.com>,
joro@8bytes.org, will@kernel.org, iommu@lists.linux.dev,
linux-kernel@vger.kernel.org, Jakub Kicinski <kuba@kernel.org>
Subject: Re: [PATCH v4] iommu: Optimise PCI SAC address trick
Date: Tue, 18 Apr 2023 19:50:06 +0100 [thread overview]
Message-ID: <8404e7ef-d929-bd39-a10d-f4053cc4bd3e@oracle.com> (raw)
In-Reply-To: <CAHk-=whogEk1UJfU3E7aW18PDYRbdAzXta5J0ECg=CB5=sCe7g@mail.gmail.com>
On 18/04/2023 18:36, Linus Torvalds wrote:
>> JFYI, Since you are using NVMe, you could also alternatively try
>> something like what I did for some SCSI storage controller drivers to
>> limit the request_queue max_sectors soft limit, like:
> That patch is not only whitespace-damaged, it's randomly missing one
> '+' character
My copy-and-paste error.
> so it makes no sense even ignoring the whitespace
> problems, _and_ it has a nonsensical cast to 'unsigned int' which
> makes that 'min()' possibly do crazy and invalid things (ie imagine
> dma_opt_mapping_size() returning 4GB).
>
> You can't cast things to the smaller size just to get rid of a
> warning, for chrissake!
Yeah, sorry, I was just trying to show a very quick demo of how this can
actually be done.
Indeed, I could have mentioned that it would actually have been easier
to test by feeding a lower limit into /sys/block/<dev>/queue/max_sectors_kb
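For reference, the sysfs route looks something like this (nvme0n1 is
just an example device name; the write needs root, the value is in KiB,
and it cannot exceed max_hw_sectors_kb):

```shell
# Lower the soft max_sectors limit for one device to 128 KiB.
echo 128 > /sys/block/nvme0n1/queue/max_sectors_kb
# Read it back to confirm.
cat /sys/block/nvme0n1/queue/max_sectors_kb
```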
>
> In fact, even without the cast, it seems entirely broken, since the
> fallback for dma_opt_mapping_size() is to return 0 (admittedly _that_
> case only happens with HAS_DMA=n).
>
> Finally, doing this inside the
>
> if (ctrl->max_hw_sectors) {
I think that this would be set for PCI NVMe controllers, which we were
interested in here. But, indeed, I could look for a better place to set
this.
>
> conditional seems entirely wrong, since any dma mapping limits would
> be entirely independent of any driver maximum hw size, and in fact
> *easier* to hit if the block device itself doesn't have any max
> limits.
>
> So please burn that patch in the darkest pits of hell and let's try to
> forget it ever existed. Ok?
Sure
>
> Also, shouldn't any possible dma mapping size affect not
> 'max_sectors', but 'max_segment_size'? At least the docs imply that
> dma_opt_mapping_size() is about the max size of a _single_ mapping,
> not of the whole thing?
It's meant to apply to the total mapping length and not a single
segment, so the doc would be misleading there.
>
> Anyway, if this is actually an issue, to the point that it's now being
> discussed for a _second_ block driver subsystem, then shouldn't the
> queue handling just do this all automatically, instead of adding
> random crap to random block driver architectures?
Other storage controllers may enjoy better performance with very large
DMA mappings (whose total length exceeds the IOVA caching limit), so it
was too risky to apply a performance-related change of this nature
across the board when that API was introduced.
So far it had only been a single controller where we actually saw
alloc'ing IOVAs giving very (very) poor performance. However, as far as
I am aware, there was nothing special about that controller, apart from
the fact that it was often creating requests whose length exceeded the
IOVA caching limit, and it was also filling the 32b IOVA space quickly -
that may be because the system had lots of CPUs.
Since there are now reports of poor performance in other storage
controllers and also in networking adapters, I can only assume that
people are testing more often on IOMMU-enabled systems with lots of
CPUs. Having said that, I would still be cautious about applying that
limit everywhere.
>
> And no, I don't know this code, so maybe I'm entirely missing
> something, but that patch just raised my hackles enough that I had to
> say something.
Sure.
Thanks,
John
Thread overview: 20+ messages
2023-04-13 13:40 [PATCH v4] iommu: Optimise PCI SAC address trick Robin Murphy
2023-04-13 14:02 ` Jakub Kicinski
2023-04-14 11:45 ` Joerg Roedel
2023-04-14 17:45 ` Robin Murphy
2023-05-23 16:06 ` Joerg Roedel
2023-05-24 14:56 ` Robin Murphy
2023-06-13 17:58 ` Jakub Kicinski
2023-06-15 7:49 ` John Garry
2023-06-15 9:04 ` Robin Murphy
2023-06-15 10:11 ` John Garry
2023-06-15 11:41 ` Robin Murphy
2023-06-15 12:15 ` John Garry
2023-04-18 9:23 ` Vasant Hegde
2023-04-18 10:19 ` John Garry
2023-04-18 17:36 ` Linus Torvalds
2023-04-18 18:50 ` John Garry [this message]
2023-04-18 10:57 ` Robin Murphy
2023-04-18 13:05 ` Vasant Hegde
2023-07-14 14:09 ` Joerg Roedel
2023-07-17 9:24 ` John Garry