From: dwmw2@infradead.org (David Woodhouse)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v4 2/7] iommu/core: split mapping to page sizes as supported by the hardware
Date: Fri, 11 Nov 2011 13:27:28 +0000 [thread overview]
Message-ID: <1321018048.2027.44.camel@shinybook.infradead.org> (raw)
In-Reply-To: <20111111125837.GF13213@amd.com>
On Fri, 2011-11-11 at 13:58 +0100, Joerg Roedel wrote:
> For AMD IOMMU there is a feature called not-present cache. It says that
> the IOMMU caches non-present entries as well and needs an IOTLB flush
> when something is mapped (meant for software implementations of the
> IOMMU).
> So it can't really be taken out of the fast-path. But the IOMMU driver
> can optimize the function so that it only flushes the IOTLB when there
> was an unmap-call before.
We have exactly the same situation with the Intel IOMMU (we call it
'Caching Mode') for the same reasons.
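To make the distinction concrete, roughly this shape (names invented,
purely illustrative; the real drivers obviously do much more):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch only: on hardware that caches non-present entries (AMD's
 * not-present cache, Intel's Caching Mode), even mapping a previously
 * non-present entry must be followed by an IOTLB flush. Everyone else
 * pays nothing on the map fast path. */
struct iommu_domain {
	bool caches_nonpresent;	/* e.g. AMD NPcache / Intel CM bit */
	int flush_count;	/* instrumentation for this sketch */
};

static void iotlb_flush(struct iommu_domain *d)
{
	d->flush_count++;
}

static int domain_map(struct iommu_domain *d, unsigned long iova,
		      unsigned long paddr, size_t size)
{
	/* ... write page-table entries for [iova, iova + size) ... */
	(void)iova; (void)paddr; (void)size;

	/* Flush only if the hardware may have cached the old
	 * non-present entry. */
	if (d->caches_nonpresent)
		iotlb_flush(d);
	return 0;
}
```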
I'd be wary about making the IOMMU driver *track* whether there was an
unmap call before; that seems like hard work and more cache contention,
especially if the ->commit() call happens on a CPU other than the one
that just did the unmap.
I'm also not sure exactly when you'd call the ->commit() function when
the DMA API is being used, and which 'side' of that API the
deferred-flush optimisations would live.
Would the optimisation be done on the generic side, only calling
->commit when it absolutely *has* to happen? (Or periodically after
unmaps have happened to avoid entries hanging around for ever?)
Or would the optimisation be done in the IOMMU driver, thus turning the
->commit() function into more of a *hint*? You could add a 'simon_says'
boolean argument to it, I suppose...?
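For reference, the minimal version of the tracking I'm wary of would be
something like this (names invented, purely a sketch); the
atomic_exchange here is exactly the cross-CPU cache line bounce I mean:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Sketch: unmap sets a per-domain flag, and ->commit() flushes only
 * if the flag was set, so commit is (almost) free when nothing was
 * unmapped since the last flush. */
struct iommu_domain {
	atomic_bool unmapped_since_flush;
	int flush_count;	/* instrumentation for this sketch */
};

static void iotlb_flush(struct iommu_domain *d)
{
	d->flush_count++;
}

static void domain_unmap(struct iommu_domain *d, unsigned long iova)
{
	(void)iova;
	/* ... clear page-table entries ... */
	atomic_store(&d->unmapped_since_flush, true);
}

static void domain_commit(struct iommu_domain *d)
{
	/* Test-and-clear: flush only if some CPU unmapped since the
	 * last flush. This is where the cache line bounces. */
	if (atomic_exchange(&d->unmapped_since_flush, false))
		iotlb_flush(d);
}
```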
> It is also an improvement over the current
> situation where every iommu_unmap call results in a flush implicitly.
> This is pretty much a no-go for using the IOMMU-API in DMA mapping at
> the moment.
Right. That definitely needs to be handled. We just need to work out the
(above and other) details.
> > But also, it's not *so* much of an issue to divide the space up even
> > when it's limited. The idea was not to have it *strictly* per-CPU, but
> > just for a CPU to try allocating from "its own" subrange first?
>
> Yeah, I get the idea. I fear that the memory consumption will get pretty
> high with that approach. It basically means one round-robin allocator
> per cpu and device. What does that mean on a 4096 CPU machine :)
Well, if your network device is taking interrupts, and mapping/unmapping
buffers across all 4096 CPUs, then your performance is screwed anyway :)
Certainly your concerns are valid, but I think we can cope with them
fairly reasonably. If we *do* have a large number of CPUs allocating for
a given domain, we can move to a per-node rather than per-CPU allocator.
And we can have dynamically sized allocation regions, so we aren't
wasting too much space on unused bitmaps if you map just *one* page from
each of your 4096 CPUs.
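The allocation pattern I'm describing is roughly this (fixed-size
slices here for brevity, no locking shown; purely illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: the IOVA space is carved into NR_CPUS slices and each CPU
 * round-robins within "its own" slice, falling back to scanning the
 * other slices only when its own is full. */
#define NR_CPUS		4
#define SLICE_SZ	8	/* pages per slice, tiny for the example */

static uint8_t used[NR_CPUS][SLICE_SZ];
static int rr_next[NR_CPUS];	/* round-robin cursor per slice */

static long iova_alloc(int cpu)
{
	for (int c = 0; c < NR_CPUS; c++) {
		int s = (cpu + c) % NR_CPUS;	/* own slice first */

		for (int i = 0; i < SLICE_SZ; i++) {
			int idx = (rr_next[s] + i) % SLICE_SZ;

			if (!used[s][idx]) {
				used[s][idx] = 1;
				rr_next[s] = (idx + 1) % SLICE_SZ;
				return (long)s * SLICE_SZ + idx;
			}
		}
	}
	return -1;	/* address space exhausted */
}
```

The per-node variant is the same thing with NR_CPUS replaced by the
number of nodes, and the dynamic sizing just means the slices grow on
demand instead of being carved up front.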
> How much lock contention will be lowered also depends on the work-load.
> If dma-handles are frequently freed from another cpu than they were
> allocated from the same problem re-appears.
The idea is that dma handles are *infrequently* freed, in batches. So
we'll bounce the lock's cache line occasionally, but not all the time.
In "strict" or "unmap_flush" mode, you get to go slowly unless you do
the unmap on the same CPU that you mapped it from. I can live with that.
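The batching I have in mind is the obvious one (again, invented names,
sketch only): park completed handles per-CPU and take the shared lock
once per batch rather than once per unmap.

```c
#include <assert.h>

#define BATCH	16

/* Sketch: completed dma handles are parked in a small per-CPU array
 * and returned to the shared allocator (under its lock) only when the
 * batch fills, so the lock's cache line bounces once per BATCH frees. */
struct free_batch {
	long handles[BATCH];
	int n;
};

static int lock_acquisitions;	/* instrumentation for this sketch */

static void free_to_allocator(const long *handles, int n)
{
	lock_acquisitions++;	/* stands in for taking the shared lock */
	(void)handles; (void)n;
}

static void dma_handle_free(struct free_batch *b, long handle)
{
	b->handles[b->n++] = handle;
	if (b->n == BATCH) {	/* flush the whole batch at once */
		free_to_allocator(b->handles, b->n);
		b->n = 0;
	}
}
```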
> But in the end we have to try it out and see what works best :)
Indeed. I'm just trying to work out if I should try to do the allocator
thing purely inside the Intel code first, and then try to move it out
and make it generic; or if I should start with making the DMA API work
with a wrapper around the IOMMU API, with your ->commit() and other
necessary changes. I think I'd prefer the latter, if we can work out how
it should look.
--
dwmw2