* IOMMUs was Re: Intel vs AMD x86-64
From: Andi Kleen @ 2004-02-24 14:06 UTC
To: Linus Torvalds; +Cc: linux-kernel, davem

Linus Torvalds <torvalds@osdl.org> writes:

> In fact, I _think_ you could actually use the AGP bridge as a strange
> IOMMU. Of course, right now their AGP bridges are all 32-bit limited
> anyway, but the point being that they at least in theory would seem to
> have the capability to do this.

Actually AGPv3 is 40 bits capable (using a strange encoding, but it works).

On Opteron the IOMMU code (ab)uses the built in AGPv3 GART in the CPU,
which was originally intended for AGP. AMD converted it to be able to
remap PCI especially for Linux, which I think deserves applause.

It works surprisingly well even though it was not designed as a real
IOMMU. Of course one of the main advantages of a real IOMMU - preventing
arbitrary memory corruption from broken devices - is lost because the
remapping table is just a hole in the memory. I'm secretly hoping that
when there is more support for Linux at chipset vendors they will someday
add a bit to isolate all traffic that doesn't go through the GART from
the main memory. This way you could get a much more reliable system that
can tolerate broken PCI devices at a moderate performance penalty.

One side effect of this is that the IOMMU TLB flush strategy is a bit
dumb, because it has to do config space accesses for it. This is
understandable because AGP rarely sets up new mappings. This is a bit of
a problem because the @#$@$-X server does direct PCI accesses on its own
and can race with an IOMMU TLB flush. But I hope this can get fixed
eventually e.g. with the new freedesktop.org X server. When we get PCI
Express memory mapped config space support this problem will hopefully
go away.

The bad news is that PCI Express will do away with GARTs, so they may
not be there anymore in future chipsets. But I hope they will at least
keep it in the Opteron on-CPU bridge.

> > Really, not having an IOMMU on a 64-bit platform these days is basically like
> > pulling out one's toenails with an ice pick.
>
> Well, as long as they had that "64-bit is server" mentality, they can
> honestly say that you just have to use 64-bit-capable PCI cards.
>
> Now, the "server only" mentality is obviously crap, but since we haven't
> even seen the chipsets designed for the 64-bit chips, we shouldn't
> complain. At least yet.

What I find especially ironic is that exactly the same chipset people
who use these crap arguments put 32bit only USB and IDE devices into the
same chips. USB and IDE are the major users of the IOMMU. And yes,
they're 32bit only even in the "highend" Intel server chipsets. And they
already have a mostly working IOMMU in the chipset for the GART, they
just refuse to use it for PCI too.

> Now, I'm not above complaining about Intel (in fact, the Intel people seem
> to often think I hate them because I'm apparently the only person who gets
> quoted who complains about bad decisions publicly), but at least I try to
> avoid complaining before-the-fact ;)

Can you please complain a bit more about the chipset people and get
quoted so that Intel management hears you? ;-)

-Andi
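To make the "hole in the memory" point concrete, here is a toy Python model of a GART-style remapper. It is only a sketch under stated assumptions: the class `ToyGart`, the 4 KiB page size, and the aperture base used in the usage note are all hypothetical and do not describe the real Opteron hardware. What it shows is that only bus addresses falling inside the aperture go through the table, while everything else reaches memory untranslated, which is why a broken device can still corrupt arbitrary memory.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class ToyGart:
    """Toy model of a GART-style remapper (hypothetical, not real hardware)."""

    def __init__(self, aperture_base, num_pages):
        self.aperture_base = aperture_base
        self.num_pages = num_pages
        # One entry per aperture page: a physical page frame number or None.
        self.table = [None] * num_pages

    def map_page(self, index, phys_addr):
        """Point aperture page `index` at `phys_addr`; return the bus address."""
        self.table[index] = phys_addr // PAGE_SIZE
        return self.aperture_base + index * PAGE_SIZE

    def translate(self, bus_addr):
        """Return the physical address a device access at `bus_addr` hits."""
        offset = bus_addr - self.aperture_base
        if 0 <= offset < self.num_pages * PAGE_SIZE:
            pfn = self.table[offset // PAGE_SIZE]
            if pfn is None:
                raise LookupError("unmapped aperture page")
            return pfn * PAGE_SIZE + offset % PAGE_SIZE
        # Outside the aperture: no translation and therefore no protection.
        return bus_addr
```

For example, with `gart = ToyGart(0xE0000000, 4)`, mapping page 1 to physical address `0x123456000` lets a 32-bit device reach high memory through bus address `0xE0001000`, while `gart.translate(0x1000)` simply returns `0x1000`.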
* Re: IOMMUs was Re: Intel vs AMD x86-64
From: David S. Miller @ 2004-02-24 18:13 UTC
To: Andi Kleen; +Cc: torvalds, linux-kernel

On 24 Feb 2004 15:06:47 +0100
Andi Kleen <ak@suse.de> wrote:

> One side effect of this is that the IOMMU TLB flush strategy is a bit
> dumb, because it has to do config space accesses for it.

This can be costly, but if you flush the IOMMU like sparc64 does
(basically it's similar to how KMAPs are flushed on x86), the cost gets
real low because then you only flush the whole iommu once every time you
walk the whole mapping table of the iommu.

I'm sure you've probably thought of this already, just mentioning it in
case you haven't.
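The flush-on-wrap scheme described above can be sketched as a toy Python model. This is not the sparc64 or x86-64 kernel code; `LazyFlushAllocator`, its methods, and the `flush_count` counter are hypothetical names invented for illustration. The idea it demonstrates: entries are handed out with a forward-moving cursor, freeing never flushes, and the whole IOMMU TLB is flushed once when the cursor wraps. A freed entry cannot be reused before that wrap, so a stale translation for it is always gone by the time the entry is handed out again.

```python
class LazyFlushAllocator:
    """Toy next-fit allocator that flushes the IOMMU TLB only on wrap-around."""

    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.in_use = [False] * num_entries
        self.next_index = 0   # cursor: entries below it were handed out this pass
        self.flush_count = 0  # stands in for the expensive hardware flush

    def flush_all(self):
        # A real implementation would issue the global TLB flush here.
        self.flush_count += 1

    def _take(self, index):
        self.in_use[index] = True
        self.next_index = index + 1
        return index

    def alloc(self):
        # First try the entries ahead of the cursor; no flush is needed.
        for i in range(self.next_index, self.num_entries):
            if not self.in_use[i]:
                return self._take(i)
        # Cursor hit the end of the table: flush once, then start over.
        self.flush_all()
        for i in range(self.num_entries):
            if not self.in_use[i]:
                return self._take(i)
        raise MemoryError("no free IOMMU entries")

    def free(self, index):
        # No flush here; the entry is not reused until after the next wrap.
        self.in_use[index] = False
```

With a four-entry table, allocating four entries and freeing two of them costs no flushes at all; the next allocation wraps, performs a single flush, and the one after that reuses the second freed entry without flushing again.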
* Re: IOMMUs was Re: Intel vs AMD x86-64
From: Andi Kleen @ 2004-02-27 1:28 UTC
To: David S. Miller; +Cc: torvalds, linux-kernel, richard.brunner

On Tue, 24 Feb 2004 10:13:40 -0800
"David S. Miller" <davem@redhat.com> wrote:

> On 24 Feb 2004 15:06:47 +0100
> Andi Kleen <ak@suse.de> wrote:
>
> > One side effect of this is that the IOMMU TLB flush strategy is a bit
> > dumb, because it has to do config space accesses for it.
>
> This can be costly, but if you flush the IOMMU like sparc64 does (basically
> it's similar to how KMAPs are flushed on x86), the cost gets real low because
> then you only flush the whole iommu once every time you walk the whole mapping
> table of the iommu.
>
> I'm sure you've probably thought of this already, just mentioning it in case
> you haven't.

Arjan suggested it some time ago already. In fact I implemented it, it's
in the current code. But it caused data corruption with a few devices,
in particular 3ware, so I had to disable it again. I didn't find a bug
in the code. It worked fine with others. My theory was that it triggered
some hardware bug that was normally masked by the frequent flushes, but
I wasn't able to track it down without heavy equipment.

Currently it is in there, but disabled by default. Can be enabled with
iommu=nofullflush.

Also the other part of the dumbness is that the flush is global, not per
mapping. I guess you don't have that problem on Sparc64.

Anyways, even with these restrictions having the GART as IOMMU is much
better than doing software bouncing.

-Andi
* Re: IOMMUs was Re: Intel vs AMD x86-64
From: David S. Miller @ 2004-02-24 18:41 UTC
To: Andi Kleen; +Cc: torvalds, linux-kernel, richard.brunner

On Fri, 27 Feb 2004 02:28:49 +0100
Andi Kleen <ak@suse.de> wrote:

> Also the other part of the dumbness is that the flush is global, not per mapping. I guess
> you don't have that problem on Sparc64.

Yes, we can per-page flush, but I don't use that feature at all since I
do the "flush all when wrap around IOMMU pte table" thing we're
discussing.

In fact there is no "global flush" so what I have to do is use
diagnostic accesses to the IOMMU TLB to kick out the entries one by one.
* Re: IOMMUs was Re: Intel vs AMD x86-64
From: Benjamin Herrenschmidt @ 2004-02-25 0:36 UTC
To: Andi Kleen
Cc: David S. Miller, Linus Torvalds, Linux Kernel list, richard.brunner

> Arjan suggested it some time ago already. In fact I implemented it, it's in the current code.
> But it caused data corruption with a few devices, in particular 3ware, so I had
> to disable it again. I didn't find a bug in the code. It worked fine with others. My theory
> was that it triggered some hardware bug that was normally masked by the frequent flushes, but
> I wasn't able to track it down without heavy equipment.

Interesting. I'm having a data corruption issue with the G5 iommu that I
can fix by always mapping everything. That is, non-mapped virtual IO
pages are actually mapped to a dummy RAM page. It seems there is a
problem with the PCI<->HT bridge doing prefetches beyond iommu mapped
pages, thus triggering an iommu error, which in turn probably triggers
some other chipset bug ending up in data corruption.

Having everything mapped (allowing prefetch to complete even while
prefetched data is actually useless) fixes the problem and we don't see
any corruption.

Of course, that means we can no longer use the mechanism we first
implemented where we would only flush the iommu TLB once after running
out of virtual pages to allocate. We have to flush on every insertion
and removal now :(

On the other hand, we can probably do per-tag TLB flushes instead of
flushing the whole TLB once we properly figure out how to access the tag
registers on the chipset and their format (the Darwin source code seems
to imply that is doable, but doesn't actually use that, but in this
regard, Apple's implementation is impressively sub-optimal).

Ben.
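The "always map everything" workaround described above can be sketched as a toy Python model. It is not the G5 iommu code; `AlwaysMappedTable`, `DUMMY_PAGE`, and the `flush_count` counter are hypothetical names for illustration. The sketch shows two things: every table entry is valid at all times, so a prefetch that runs past a mapped page lands on the dummy page instead of faulting, and each insertion and removal has to flush. Presumably that is because the TLB may already have cached a translation (dummy or real) for the entry being changed, but that reasoning is an assumption and is not stated in the message.

```python
DUMMY_PAGE = 0xDEAD  # hypothetical frame number of a scratch RAM page


class AlwaysMappedTable:
    """Toy iommu table in which unmapped entries point at a dummy page."""

    def __init__(self, num_entries):
        # Every entry starts out valid, pointing at the dummy page.
        self.entries = [DUMMY_PAGE] * num_entries
        self.flush_count = 0  # stands in for the hardware TLB flush

    def _flush_tlb(self):
        self.flush_count += 1

    def map(self, index, pfn):
        self.entries[index] = pfn
        # The old (dummy) translation may be cached, so flush on insertion.
        self._flush_tlb()

    def unmap(self, index):
        self.entries[index] = DUMMY_PAGE
        # The real translation may be cached, so flush on removal too.
        self._flush_tlb()

    def translate(self, index):
        # Never faults: a prefetch past the mapping just reads the dummy page.
        return self.entries[index]
```

After `t.map(0, 0x100)`, a lookup of entry 1 (a prefetch one page beyond the mapping) returns `DUMMY_PAGE` rather than raising an error, and a map followed by an unmap costs two flushes, compared with none under the flush-on-wrap scheme.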
* RE: IOMMUs was Re: Intel vs AMD x86-64
From: richard.brunner @ 2004-02-24 15:50 UTC
To: linux-kernel

> -----Original Message-----
> From: Andi Kleen [mailto:ak@suse.de]

> On Opteron the IOMMU code (ab)uses the built in AGPv3 GART in
> the CPU, which
> was originally intended for AGP. AMD converted it to be able
> to remap PCI especially for Linux, which I think deserves applause.
>
> It works surprisingly well even though it was not designed as
> a real IOMMU. Of course one of the main advantages of a real
> IOMMU - preventing arbitrary memory corruption from broken
> devices - is lost because the remapping table is just a hole
> in the memory. I'm
> secretly hoping that when there is more support for Linux at
> chipset vendors they will someday add a bit to isolate all
> traffic that doesn't go through the GART from the main
> memory. This way you could get a much more reliable system
> that can tolerate broken PCI devices at a moderate
> performance penalty.

Andi is being modest. It was he and Andrea Arcangeli who convinced
me we had a problem. We found a way to trick the AGP
GART hardware into helping, and then they turned it into a
"real" solution and helped us work the warts out of the BIOS
to enable it.
* Re: IOMMUs was Re: Intel vs AMD x86-64
From: Mike Fedyk @ 2004-02-24 16:27 UTC
To: richard.brunner; +Cc: linux-kernel

On Tue, Feb 24, 2004 at 09:50:02AM -0600, richard.brunner@amd.com wrote:
>
> > -----Original Message-----
> > From: Andi Kleen [mailto:ak@suse.de]
> >
> > On Opteron the IOMMU code (ab)uses the built in AGPv3 GART in
> > the CPU, which
> > was originally intended for AGP. AMD converted it to be able
> > to remap PCI especially for Linux, which I think deserves applause.
> >
> > It works surprisingly well even though it was not designed as
> > a real IOMMU. Of course one of the main advantages of a real
> > IOMMU - preventing arbitrary memory corruption from broken
> > devices - is lost because the remapping table is just a hole
> > in the memory. I'm
> > secretly hoping that when there is more support for Linux at
> > chipset vendors they will someday add a bit to isolate all
> > traffic that doesn't go through the GART from the main
> > memory. This way you could get a much more reliable system
> > that can tolerate broken PCI devices at a moderate
> > performance penalty.
>
> Andi is being modest. It was he and Andrea Arcangeli who convinced
> me we had a problem. We found a way to trick the AGP
> GART hardware into helping, and then they turned it into a
> "real" solution and helped us work the warts out of the BIOS
> to enable it.

Yowza!

Open source helping to make better processors. :-D
Thread overview: 7+ messages
[not found] <Pine.LNX.4.44.0402231625220.9708-100000@chimarrao.boston.redhat.com.suse.lists.linux.kernel>
[not found] ` <Pine.LNX.4.58.0402231335430.3005@ppc970.osdl.org.suse.lists.linux.kernel>
[not found] ` <20040223134853.5947a414.davem@redhat.com.suse.lists.linux.kernel>
[not found] ` <Pine.LNX.4.58.0402231359280.3005@ppc970.osdl.org.suse.lists.linux.kernel>
2004-02-24 14:06 ` IOMMUs was Re: Intel vs AMD x86-64 Andi Kleen
2004-02-24 18:13 ` David S. Miller
2004-02-27 1:28 ` Andi Kleen
2004-02-24 18:41 ` David S. Miller
2004-02-25 0:36 ` Benjamin Herrenschmidt
2004-02-24 15:50 richard.brunner
2004-02-24 16:27 ` Mike Fedyk