* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) [not found] ` <fa.lphwLNvMksoBFaqfqCzMG1UVhsA@ifi.uio.no> @ 2007-03-05 6:25 ` Robert Hancock 2007-03-12 13:06 ` Andi Kleen 0 siblings, 1 reply; 27+ messages in thread From: Robert Hancock @ 2007-03-05 6:25 UTC (permalink / raw) To: linux-kernel; +Cc: Chip Coldwell, Andi Kleen Chip Coldwell wrote: > On Wed, 17 Jan 2007, Andi Kleen wrote: > >> On Wednesday 17 January 2007 07:31, Chris Wedgwood wrote: >>> On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote: >>>> I agree,... it seems drastic, but this is the only really secure >>>> solution. >>> I'd like to here from Andi how he feels about this? It seems like a >>> somewhat drastic solution in some ways given a lot of hardware doesn't >>> seem to be affected (or maybe in those cases it's just really hard to >>> hit, I don't know). >> AMD is looking at the issue. Only Nvidia chipsets seem to be affected, >> although there were similar problems on VIA in the past too. >> Unless a good workaround comes around soon I'll probably default >> to iommu=soft on Nvidia. > > We (Sun, AMD, Nvidia and Red Hat) have been testing a patch that seems > to solve the problem. AMD and Nvidia analyzed an HDT trace that > seemed to indicate that CPU updates of the GATT were still in cache > when a subsequent table walk caused by a device load used a stale GATT > PTE. That analysis inspired this patch, submitted to this list as an > RFC. It is not obvious (to me, at least) why this problem has only > shown up on Nvidia SATA controllers. > > We are continuing to investigate. 
>
> diff --git a/arch/x86_64/kernel/pci-gart.c b/arch/x86_64/kernel/pci-gart.c
> index 030eb37..1dd461a 100644
> --- a/arch/x86_64/kernel/pci-gart.c
> +++ b/arch/x86_64/kernel/pci-gart.c
> @@ -69,6 +69,8 @@ static u32 gart_unmapped_entry;
>  #define AGPEXTERN
>  #endif
>
> +#define GATT_CLFLUSH(i) asm volatile ("clflush (%0)" :: "r" (iommu_gatt_base + (i)))
> +
>  /* backdoor interface to AGP driver */
>  AGPEXTERN int agp_memory_reserved;
>  AGPEXTERN __u32 *agp_gatt_table;
> @@ -221,6 +223,7 @@ static dma_addr_t dma_map_area(struct device *dev, dma_addr_t phys_mem,
>  	for (i = 0; i < npages; i++) {
>  		iommu_gatt_base[iommu_page + i] = GPTE_ENCODE(phys_mem);
>  		SET_LEAK(iommu_page + i);
> +		GATT_CLFLUSH(iommu_page + i);
>  		phys_mem += PAGE_SIZE;
>  	}
>  	return iommu_bus_base + iommu_page*PAGE_SIZE + (phys_mem & ~PAGE_MASK);
> @@ -348,6 +351,7 @@ static int __dma_map_cont(struct scatterlist *sg, int start, int stopat,
>  	while (pages--) {
>  		iommu_gatt_base[iommu_page] = GPTE_ENCODE(addr);
>  		SET_LEAK(iommu_page);
> +		GATT_CLFLUSH(iommu_page);
>  		addr += PAGE_SIZE;
>  		iommu_page++;
>  	}
>
>

Andi, have you had a look at this? I'm a bit surprised at the lack of
reaction to this find..

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/

^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-03-05 6:25 ` data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) Robert Hancock @ 2007-03-12 13:06 ` Andi Kleen 2007-03-12 14:56 ` Jeff Garzik 0 siblings, 1 reply; 27+ messages in thread From: Andi Kleen @ 2007-03-12 13:06 UTC (permalink / raw) To: Robert Hancock; +Cc: linux-kernel, Chip Coldwell > Andi, have you had a look at this? I'm a bit surprised at the lack of > reaction to this find.. FYI the problem is still being analysed behind the scenes. Chip's patch didn't fix it in all cases unfortunately -- it just changed the timing enough to make it happen less often. The latest evidence points to a DMA mapping management problem in Linux. Apparently in some cases sata_nv does DMA on an already freed and then reused mapping. -Andi ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-03-12 13:06 ` Andi Kleen @ 2007-03-12 14:56 ` Jeff Garzik 0 siblings, 0 replies; 27+ messages in thread From: Jeff Garzik @ 2007-03-12 14:56 UTC (permalink / raw) To: Andi Kleen; +Cc: Robert Hancock, linux-kernel, Chip Coldwell Andi Kleen wrote: > in Linux. Apparently in some cases sata_nv does DMA on an already freed and then > reused mapping. Any data or additional info on that? Did you discover this by tracking the DMA API software routines, or something lower level (like a bus analyzer)? libata handles all the DMA allocation and mapping and cleanup for sata_nv, so any software problem would affect the whole of libata. But it's possible that the nForce SATA chip has DMA padding needs that are different from those provided by libata-core (grep for "pad"), which could create a situation where the hardware continues DMA'ing past the end of the DMA area. Jeff ^ permalink raw reply [flat|nested] 27+ messages in thread
[parent not found: <fa.E9jVXDLMKzMZNCbslzUxjMhsInE@ifi.uio.no>]
* Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?! [not found] <fa.E9jVXDLMKzMZNCbslzUxjMhsInE@ifi.uio.no> @ 2007-01-03 23:41 ` Robert Hancock 2007-01-15 22:56 ` Christoph Anton Mitterer 0 siblings, 1 reply; 27+ messages in thread From: Robert Hancock @ 2007-01-03 23:41 UTC (permalink / raw) To: Christoph Anton Mitterer; +Cc: linux-kernel Christoph Anton Mitterer wrote: > Hi. > > Perhaps some of you have read my older two threads: > http://marc.theaimsgroup.com/?t=116312440000001&r=1&w=2 and the even > older http://marc.theaimsgroup.com/?t=116291314500001&r=1&w=2 > > The issue was basically the following: > I found a severe bug mainly by fortune because it occurs very rarely. > My test looks like the following: I have about 30GB of testing data on > my harddisk,... I repeat verifying sha512 sums on these files and check > if errors occur. > One test pass verifies the 30GB 50 times,... about one to four > differences are found in each pass. > > The corrupted data is not one single completely wrong block of data or > so,.. but if you look at the area of the file where differences are > found,.. than some bytes are ok,.. some are wrong,.. and so on (seems to > be randomly). > > Also, there seems to be no event that triggers the corruption,.. it > seems to be randomly, too. > > It is really definitely not a harware issue (see my old threads my > emails to Tyan/Hitachi and my "workaround" below. My system isn't > overclocked. > > > > My System: > Mainboard: Tyan S2895 > Chipsets: Nvidia nforce professional 2200 and 2050 and AMD 8131 > CPU: 2x DualCore Opterons model 275 > RAM: 4GB Kingston Registered/ECC > Diskdrives: IBM/Hitachi: 1 PATA, 2 SATA > > > The data corruption error occurs on all drives. > > > You might have a look at the emails between me and Tyan and Hitachi,.. > they contain probalby lots of valuable information (especially my > different tests). > > > > Some days ago,.. 
an engineer of Tyan suggested me to boot the kernel > with mem=3072M. > When doing this,.. the issue did not occur (I don't want to say it was > solved. Why? See my last emails to Tyan!) > Then he suggested me to disable the memory hole mapping in the BIOS,... > When doing so,.. the error doesn't occur, too. > But I loose about 2GB RAM,.. and,.. more important,.. I cant believe > that this is responsible for the whole issue. I don't consider it a > solution but more a poor workaround which perhaps only by fortune solves > the issue (Why? See my last eMails to Tyan ;) ) > > > > So I'd like to ask you if you perhaps could read the current information > in this and previous mails,.. and tell me your opinions. > It is very likely that a large number of users suffer from this error > (namely all Nvidia chipset users) but only few (there are some,.. I > found most of them in the Nvidia forums,.. and they have exactly the > same issue) identify this as an error because it's so rare. > > Perhaps someone have an idea why disabling the memhole mapping solves > it. I've always thought that memhole mapping just moves some address > space to higher addreses to avoid the conflict between address space for > PCI devices and address space for pyhsical memory. > But this should be just a simple addition and not solve this obviously > complex error. If this is related to some problem with using the GART IOMMU with memory hole remapping enabled, then 2.6.20-rc kernels may avoid this problem on nForce4 CK804/MCP04 chipsets as far as transfers to/from the SATA controller are concerned as the sata_nv driver now supports 64-bit DMA on these chipsets and so no longer requires the IOMMU. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from hancockr@nospamshaw.ca Home Page: http://www.roberthancock.com/ ^ permalink raw reply [flat|nested] 27+ messages in thread
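The test procedure quoted above (re-verifying SHA-512 sums over a fixed data set again and again and counting mismatches) can be sketched roughly as follows. This is only a minimal sketch, not Christoph's actual script: the function name `verify_passes` and the manifest-based layout are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): re-verify a sha512sum manifest
# several times and print how many passes reported at least one mismatch.
# usage: verify_passes <manifest> <passes>
verify_passes() {
    local manifest=$1 passes=$2 bad=0 i
    for ((i = 1; i <= passes; i++)); do
        # sha512sum -c exits non-zero if any listed file fails verification
        if ! sha512sum -c "$manifest" >/dev/null 2>&1; then
            bad=$((bad + 1))
        fi
    done
    echo "$bad"
}
```

The manifest would be created once with something like `sha512sum /data/* > manifest` on a state that is believed good. For the reads to reach the disk on every pass, the data set presumably has to be larger than RAM (about 30GB against 4GB in the report), otherwise later passes may be served from the page cache.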
* Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?! 2007-01-03 23:41 ` data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?! Robert Hancock @ 2007-01-15 22:56 ` Christoph Anton Mitterer 2007-01-15 23:05 ` Christoph Anton Mitterer 0 siblings, 1 reply; 27+ messages in thread From: Christoph Anton Mitterer @ 2007-01-15 22:56 UTC (permalink / raw) To: Robert Hancock Cc: linux-kernel, cw, knweiss, ak, andersen, krader, lfriedman, linux-nforce-bugs [-- Attachment #1: Type: text/plain, Size: 1820 bytes --] Hi everybody. Sorry again for my late reply... Robert gave us the following interesting information some days ago: Robert Hancock wrote: > If this is related to some problem with using the GART IOMMU with memory > hole remapping enabled, then 2.6.20-rc kernels may avoid this problem on > nForce4 CK804/MCP04 chipsets as far as transfers to/from the SATA > controller are concerned as the sata_nv driver now supports 64-bit DMA > on these chipsets and so no longer requires the IOMMU. > I've just tested it with my "normal" BIOS settings, that is memhole mapping = hardware, IOMMU = enabled and 64MB, and _without_ (!) iommu=soft as a kernel parameter. I only had time for a small test (that is, 3 passes of 10 complete sha512sum cycles each over about 30GB of data)... but so far, no corruption occurred. It is surely far too early to tell whether our issue was solved by 2.6.20-rc-something.... but I ask all of you whose systems suffered from the corruption to make _intensive_ tests with the most recent rc of 2.6.20 (I've used 2.6.20-rc5) and report your results. I'll do an extensive test tomorrow. And of course (!!): Test without using iommu=soft and with memhole mapping enabled (in the BIOS). (It won't make any sense to check whether the new kernel solves our problem while still applying one of our two workarounds). Please also note that there might be two completely different data corruption problems.
The one "solved" by iommu=soft and another reported by Kurtis D. Rader. I've asked him to clarify this in a post. :-) Ok,... now if this (the new kernel) would really solve the issue... we should try to find out what exactly was changed in the code, and if it sounds logical that this solved the problem or not. The new kernel could just make the corruption even more rare. Best wishes, Chris. ^ permalink raw reply [flat|nested] 27+ messages in thread
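Since the retest is only meaningful when neither workaround is active, a quick check of the boot parameters can help before starting a long run. The following is a minimal sketch; the helper name `find_workarounds` is hypothetical, and it only looks at the two parameters mentioned in this thread (`iommu=soft` and `mem=`). The BIOS memory hole setting cannot be checked this way.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): list the workaround-style boot
# parameters found in a kernel command line string, normally the contents
# of /proc/cmdline.
find_workarounds() {
    local -a params
    local param found=""
    read -r -a params <<< "$1"
    for param in "${params[@]}"; do
        case $param in
            iommu=soft|mem=*) found="$found $param" ;;
        esac
    done
    # strip the leading space left by the concatenation above
    echo "${found# }"
}

# Example with a made-up command line:
find_workarounds "root=/dev/sda1 ro iommu=soft mem=3072M"
```

On a running system the call would be `find_workarounds "$(cat /proc/cmdline)"`; an empty result means neither of the two parameters was passed at boot.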
* Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?! 2007-01-15 22:56 ` Christoph Anton Mitterer @ 2007-01-15 23:05 ` Christoph Anton Mitterer 2007-01-16 0:23 ` Robert Hancock 0 siblings, 1 reply; 27+ messages in thread From: Christoph Anton Mitterer @ 2007-01-15 23:05 UTC (permalink / raw) To: Robert Hancock Cc: linux-kernel, cw, knweiss, ak, andersen, krader, lfriedman, linux-nforce-bugs [-- Attachment #1: Type: text/plain, Size: 1020 bytes --] Sorry, as always I've forgotten some things... *g* Robert Hancock wrote: > If this is related to some problem with using the GART IOMMU with memory > hole remapping enabled What is that GART thing exactly? Is this the hardware IOMMU? I've always thought GART was something graphics card related,.. but if so,.. how could this solve our problem (that seems to occur mainly on harddisks)? > then 2.6.20-rc kernels may avoid this problem on > nForce4 CK804/MCP04 chipsets as far as transfers to/from the SATA > controller are concerned Does this mean that PATA is not related? The corruption appears on PATA disks too, so why should it only solve the issue on SATA disks? Sounds a bit strange to me. > as the sata_nv driver now supports 64-bit DMA > on these chipsets and so no longer requires the IOMMU. > Can you explain this a little bit more please? Is this a drawback (like a performance decrease)? Like under Windows where they never use the hardware iommu but always do it via software? Best wishes, Chris. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?! 2007-01-15 23:05 ` Christoph Anton Mitterer @ 2007-01-16 0:23 ` Robert Hancock 2007-01-16 13:54 ` Christoph Anton Mitterer 0 siblings, 1 reply; 27+ messages in thread From: Robert Hancock @ 2007-01-16 0:23 UTC (permalink / raw) To: Christoph Anton Mitterer Cc: linux-kernel, cw, knweiss, ak, andersen, krader, lfriedman, linux-nforce-bugs Christoph Anton Mitterer wrote: > Sorry, as always I've forgot some things... *g* > > > Robert Hancock wrote: > >> If this is related to some problem with using the GART IOMMU with memory >> hole remapping enabled > What is that GART thing exactly? Is this the hardware IOMMU? I've always > thought GART was something graphics card related,.. but if so,.. how > could this solve our problem (that seems to occur mainly on harddisks)? The GART built into the Athlon 64/Opteron CPUs is normally used for remapping graphics memory so that an AGP graphics card can see physically non-contiguous memory as one contiguous region. However, Linux can also use it as an IOMMU which allows devices which normally can't access memory above 4GB to see a mapping of that memory that resides below 4GB. In pre-2.6.20 kernels both the SATA and PATA controllers on the nForce 4 chipsets can only access memory below 4GB so transfers to memory above this mark have to go through the IOMMU. In 2.6.20 this limitation is lifted on the nForce4 SATA controllers. > >> then 2.6.20-rc kernels may avoid this problem on >> nForce4 CK804/MCP04 chipsets as far as transfers to/from the SATA >> controller are concerned > Does this mean that PATA is no related? The corruption appears on PATA > disks to, so why should it only solve the issue at SATA disks? Sounds a > bit strange to me? The PATA controller will still be using 32-bit DMA and so may also use the IOMMU, so this problem would not be avoided. 
> >> as the sata_nv driver now supports 64-bit DMA >> on these chipsets and so no longer requires the IOMMU. >> > Can you explain this a little bit more please? Is this a drawback (like > a performance decrease)? Like under Windows where they never use the > hardware iommu but always do it via software? No, it shouldn't cause any performance loss. In previous kernels the nForce4 SATA controller was controlled using an interface quite similar to a PATA controller. In 2.6.20 kernels they use a more efficient interface that NVidia calls ADMA, which in addition to supporting NCQ also supports DMA without any 4GB limitations, so it can access all memory directly without requiring IOMMU assistance. Note that if this corruption problem is, as has been suggested, related to memory hole remapping and the IOMMU, then this change only prevents the SATA controller transfers from experiencing this problem. Transfers on the PATA controller as well as any other devices with 32-bit DMA limitations might still have problems. As such this really just avoids the problem, not fixes it. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from hancockr@nospamshaw.ca Home Page: http://www.roberthancock.com/ ^ permalink raw reply [flat|nested] 27+ messages in thread
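The distinction Robert draws (a device limited to 32-bit DMA needs the IOMMU for buffers above 4GB, while a device with wider addressing can reach them directly) comes down to a simple address comparison. Below is a minimal sketch of that comparison only; the helper name `dma_path` is hypothetical, and the 40-bit width in the last example is purely illustrative, not a statement about what the sata_nv ADMA interface actually supports.

```shell
#!/bin/bash
# Hypothetical helper (not kernel code): given a buffer's physical start
# address, its length in bytes, and the number of address bits a device can
# drive, print "direct" if the whole buffer is reachable by the device and
# "remap" if it would need an IOMMU mapping (or a bounce buffer).
dma_path() {
    local addr=$(( $1 )) len=$(( $2 )) bits=$3
    local last=$(( addr + len - 1 ))
    # bits is assumed to be below 63 so the shift fits bash's signed 64-bit math
    local mask=$(( (1 << bits) - 1 ))
    if (( last <= mask )); then echo direct; else echo remap; fi
}

dma_path 0x7ff00000  4096 32   # buffer below 4GB, 32-bit device: direct
dma_path 0x120000000 4096 32   # buffer above 4GB, 32-bit device: remap
dma_path 0x120000000 4096 40   # same buffer, wider (illustrative) mask: direct
```

With memory hole remapping enabled on a 4GB machine, part of the RAM ends up above the 4GB mark, which is why the second case arises at all on the systems discussed here.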
* Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?! 2007-01-16 0:23 ` Robert Hancock @ 2007-01-16 13:54 ` Christoph Anton Mitterer 2007-01-16 14:26 ` Robert Hancock 0 siblings, 1 reply; 27+ messages in thread From: Christoph Anton Mitterer @ 2007-01-16 13:54 UTC (permalink / raw) To: Robert Hancock Cc: linux-kernel, cw, knweiss, ak, andersen, krader, lfriedman, linux-nforce-bugs [-- Attachment #1: Type: text/plain, Size: 2748 bytes --] Robert Hancock wrote: >> What is that GART thing exactly? Is this the hardware IOMMU? I've always >> thought GART was something graphics card related,.. but if so,.. how >> could this solve our problem (that seems to occur mainly on harddisks)? >> > The GART built into the Athlon 64/Opteron CPUs is normally used for > remapping graphics memory so that an AGP graphics card can see > physically non-contiguous memory as one contiguous region. However, > Linux can also use it as an IOMMU which allows devices which normally > can't access memory above 4GB to see a mapping of that memory that > resides below 4GB. In pre-2.6.20 kernels both the SATA and PATA > controllers on the nForce 4 chipsets can only access memory below 4GB so > transfers to memory above this mark have to go through the IOMMU. In > 2.6.20 this limitation is lifted on the nForce4 SATA controllers. > Ah, I see. Thanks for that introduction :-) >> Does this mean that PATA is no related? The corruption appears on PATA >> disks to, so why should it only solve the issue at SATA disks? Sounds a >> bit strange to me? >> > The PATA controller will still be using 32-bit DMA and so may also use > the IOMMU, so this problem would not be avoided. > > >> Can you explain this a little bit more please? Is this a drawback (like >> a performance decrease)? Like under Windows where they never use the >> hardware iommu but always do it via software? >> > > No, it shouldn't cause any performance loss. 
In previous kernels the > nForce4 SATA controller was controlled using an interface quite similar > to a PATA controller. In 2.6.20 kernels they use a more efficient > interface that NVidia calls ADMA, which in addition to supporting NCQ > also supports DMA without any 4GB limitations, so it can access all > memory directly without requiring IOMMU assistance. > > Note that if this corruption problem is, as has been suggested, related > to memory hole remapping and the IOMMU, then this change only prevents > the SATA controller transfers from experiencing this problem. Transfers > on the PATA controller as well as any other devices with 32-bit DMA > limitations might still have problems. As such this really just avoids > the problem, not fixes it. > Ok,.. that sounds reasonable,.. so the whole thing might (!) actually be a hardware design error,... but we just don't use that hardware any longer when accessing devices via sata_nv. So this doesn't solve our problem with PATA drives or other devices (although we had until now no reports of errors with other devices) and we have to stick with iommu=soft. If one use iommu=soft the sata_nv will continue to use the new code for the ADMA, right? Best wishes, Chris. [-- Attachment #2: calestyo.vcf --] [-- Type: text/x-vcard, Size: 156 bytes --] begin:vcard fn:Mitterer, Christoph Anton n:Mitterer;Christoph Anton email;internet:calestyo@scientia.net x-mozilla-html:TRUE version:2.1 end:vcard ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives // memory hole mapping related bug?! 2007-01-16 13:54 ` Christoph Anton Mitterer @ 2007-01-16 14:26 ` Robert Hancock 2007-01-16 18:01 ` data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) Chris Wedgwood 0 siblings, 1 reply; 27+ messages in thread From: Robert Hancock @ 2007-01-16 14:26 UTC (permalink / raw) To: Christoph Anton Mitterer Cc: linux-kernel, cw, knweiss, ak, andersen, krader, lfriedman, linux-nforce-bugs Christoph Anton Mitterer wrote: > Ok,.. that sounds reasonable,.. so the whole thing might (!) actually be > a hardware design error,... but we just don't use that hardware any > longer when accessing devices via sata_nv. > > So this doesn't solve our problem with PATA drives or other devices > (although we had until now no reports of errors with other devices) and > we have to stick with iommu=soft. > > If one use iommu=soft the sata_nv will continue to use the new code for > the ADMA, right? Right, that shouldn't affect it. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-16 14:26 ` Robert Hancock @ 2007-01-16 18:01 ` Chris Wedgwood 2007-01-16 19:52 ` Christoph Anton Mitterer ` (2 more replies) 0 siblings, 3 replies; 27+ messages in thread From: Chris Wedgwood @ 2007-01-16 18:01 UTC (permalink / raw) To: Robert Hancock Cc: Christoph Anton Mitterer, linux-kernel, knweiss, ak, andersen, krader, lfriedman, linux-nforce-bugs On Tue, Jan 16, 2007 at 08:26:05AM -0600, Robert Hancock wrote: > >If one use iommu=soft the sata_nv will continue to use the new code > >for the ADMA, right? > > Right, that shouldn't affect it. right now i'm thinking if we can't figure out which cpu/bios combinations are safe we might almost be better off doing iommu=soft for *all* k8 stuff except for those that are whitelisted; though this seems extremely drastic it's not clear if this only affect nvidia based chipsets, the nature of the corruption makes me think it's not an iommu software bug (we see a few bytes not entire pages corrupted, it's not even clear if it's entire cachelines trashed) --- perhaps other vendors have more recent bios errata or maybe it's just that nvidia has sold a lot of these so they are more visible? (i'm assuming at this point it might be some kind of cpu errata that some bioses deal with because some mainboards don't ever seem to see this whilst others do) in some ways the problem is worse with recent kernels --- because the ethernet and sata can address over 4GB and don't use the iommu anymore the problem is going to be *much* harder to hit, but still here lurking to cause problems for people. with ethernet you'll probably end up getting the odd trashed tcp frame and dropping it, so those will go mostly unnoticed, so this is why sata seems to be the easier way to show it ^ permalink raw reply [flat|nested] 27+ messages in thread
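Whether the damage covers a few bytes, whole cache lines, or whole pages can be measured when a known-good copy of a corrupted file is available. The sketch below summarises the footprint of the differences; the helper name `diff_footprint` is hypothetical, and the 64-byte line and 4096-byte page sizes are assumptions that match K8 systems.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): compare a known-good copy with
# a suspect copy and summarise how the differing bytes are distributed.
# Prints: <differing bytes> <64-byte lines touched> <4096-byte pages touched>
diff_footprint() {
    # cmp -l prints one line per differing byte: the 1-based offset followed
    # by the two byte values in octal. A non-zero exit from cmp is expected
    # whenever the files differ.
    cmp -l "$1" "$2" 2>/dev/null | awk '
        {
            off = $1 - 1                  # convert to a 0-based offset
            bytes++
            lines[int(off / 64)] = 1      # assumed cache line size
            pages[int(off / 4096)] = 1    # assumed page size
        }
        END {
            nl = 0; np = 0
            for (k in lines) nl++
            for (k in pages) np++
            printf "%d %d %d\n", bytes, nl, np
        }'
}
```

If the first number is close to 64 times the second, whole cache lines were probably replaced; a much smaller ratio would match the "some bytes ok, some wrong" pattern described in the original report. Note that bytes which happen to be identical in both copies are not counted, so the ratio is only a rough indicator.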
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-16 18:01 ` data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) Chris Wedgwood @ 2007-01-16 19:52 ` Christoph Anton Mitterer 2007-01-16 20:31 ` Chris Wedgwood 2007-01-16 20:16 ` Arkadiusz Miskiewicz 2007-01-16 20:31 ` Krzysztof Halasa 2 siblings, 1 reply; 27+ messages in thread From: Christoph Anton Mitterer @ 2007-01-16 19:52 UTC (permalink / raw) To: Chris Wedgwood Cc: Robert Hancock, linux-kernel, knweiss, ak, andersen, krader, lfriedman, linux-nforce-bugs [-- Attachment #1: Type: text/plain, Size: 1487 bytes --] Chris Wedgwood wrote: > right now i'm thinking if we can't figure out which cpu/bios > combinations are safe we might almost be better off doing iommu=soft > for *all* k8 stuff except for those that are whitelisted; though this > seems extremely drastic > I agree,... it seems drastic, but this is the only really secure solution. But it seems that none of the responsible developers read our thread or the bugreport and gave his opinion about the issue. > it's not clear if this only affect nvidia based chipsets, the nature > of the corruption makes me think it's not an iommu software bug (we > see a few bytes not entire pages corrupted, it's not even clear if > it's entire cachelines trashed) --- perhaps other vendors have more > recent bios errata or maybe it's just that nvidia has sold a lot of > these so they are more visible? (i'm assuming at this point it might > be some kind of cpu errata that some bioses deal with because some > mainboards don't ever seem to see this whilst others do) > Well we can hope that Nvidia will find out more (though I'm not too optimistic). > in some ways the problem is worse with recent kernels --- because the > ethernet and sata can address over 4GB and don't use the iommu anymore > the problem is going to be *much* harder to hit, but still here > lurking to cause problems for people. Yes I agree,.. 
this is a dangerous situation... But we should not forget about the issue, just because SATA is no longer affected. Chris. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-16 19:52 ` Christoph Anton Mitterer @ 2007-01-16 20:31 ` Chris Wedgwood 2007-01-16 21:29 ` Andi Kleen ` (2 more replies) 0 siblings, 3 replies; 27+ messages in thread From: Chris Wedgwood @ 2007-01-16 20:31 UTC (permalink / raw) To: Christoph Anton Mitterer Cc: Robert Hancock, linux-kernel, knweiss, ak, andersen, krader, lfriedman, linux-nforce-bugs On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote: > I agree,... it seems drastic, but this is the only really secure > solution. I'd like to here from Andi how he feels about this? It seems like a somewhat drastic solution in some ways given a lot of hardware doesn't seem to be affected (or maybe in those cases it's just really hard to hit, I don't know). > Well we can hope that Nvidia will find out more (though I'm not too > optimistic). Ideally someone from AMD needs to look into this, if some mainboards really never see this problem, then why is that? Is there errata that some BIOS/mainboard vendors are dealing with that others are not? > But we should not forget about the issue, just because SATA is not > longer affected. Right. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-16 20:31 ` Chris Wedgwood @ 2007-01-16 21:29 ` Andi Kleen 2007-01-17 1:17 ` Christoph Anton Mitterer ` (4 more replies) 2007-01-16 21:54 ` Allen Martin 2007-01-17 1:12 ` Christoph Anton Mitterer 2 siblings, 5 replies; 27+ messages in thread From: Andi Kleen @ 2007-01-16 21:29 UTC (permalink / raw) To: Chris Wedgwood Cc: Christoph Anton Mitterer, Robert Hancock, linux-kernel, knweiss, andersen, krader, lfriedman, linux-nforce-bugs On Wednesday 17 January 2007 07:31, Chris Wedgwood wrote: > On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote: > > I agree,... it seems drastic, but this is the only really secure > > solution. > > I'd like to here from Andi how he feels about this? It seems like a > somewhat drastic solution in some ways given a lot of hardware doesn't > seem to be affected (or maybe in those cases it's just really hard to > hit, I don't know). AMD is looking at the issue. Only Nvidia chipsets seem to be affected, although there were similar problems on VIA in the past too. Unless a good workaround comes around soon I'll probably default to iommu=soft on Nvidia. -Andi ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-16 21:29 ` Andi Kleen @ 2007-01-17 1:17 ` Christoph Anton Mitterer 2007-01-17 14:48 ` Chip Coldwell ` (3 subsequent siblings) 4 siblings, 0 replies; 27+ messages in thread From: Christoph Anton Mitterer @ 2007-01-17 1:17 UTC (permalink / raw) To: Andi Kleen Cc: Chris Wedgwood, Robert Hancock, linux-kernel, knweiss, andersen, krader, lfriedman, linux-nforce-bugs [-- Attachment #1: Type: text/plain, Size: 494 bytes --] Andi Kleen wrote: > AMD is looking at the issue. Only Nvidia chipsets seem to be affected, > although there were similar problems on VIA in the past too. > Unless a good workaround comes around soon I'll probably default > to iommu=soft on Nvidia. I've just read the posts about AMD's and NVIDIA's efforts to find the issue,... but in the meantime this would be the best solution. And if "we" ever find a true solution,.. we could still deactivate the iommu=soft setting. Best wishes, Chris. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-16 21:29 ` Andi Kleen 2007-01-17 1:17 ` Christoph Anton Mitterer @ 2007-01-17 14:48 ` Chip Coldwell 2007-01-17 19:46 ` Chip Coldwell 2007-01-17 22:15 ` Andi Kleen 2007-01-18 9:29 ` joachim ` (2 subsequent siblings) 4 siblings, 2 replies; 27+ messages in thread From: Chip Coldwell @ 2007-01-17 14:48 UTC (permalink / raw) To: Andi Kleen Cc: Chris Wedgwood, Christoph Anton Mitterer, Robert Hancock, linux-kernel, knweiss, andersen, krader, lfriedman, linux-nforce-bugs On Wed, 17 Jan 2007, Andi Kleen wrote: > On Wednesday 17 January 2007 07:31, Chris Wedgwood wrote: >> On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote: >>> I agree,... it seems drastic, but this is the only really secure >>> solution. >> >> I'd like to here from Andi how he feels about this? It seems like a >> somewhat drastic solution in some ways given a lot of hardware doesn't >> seem to be affected (or maybe in those cases it's just really hard to >> hit, I don't know). > > AMD is looking at the issue. Only Nvidia chipsets seem to be affected, > although there were similar problems on VIA in the past too. > Unless a good workaround comes around soon I'll probably default > to iommu=soft on Nvidia. > > -Andi > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ We've just verified that configuring the graphics aperture to be write-combining instead of write-back using an MTRR also solves the problem. It appears to be a cache incoherency issue in the graphics aperture. 
This script does the trick:

[ -- cut here -- ]
#!/bin/bash

# Read the northbridge offset 0x90 to get the size of the aperture
size=0x`lspci -xxx -s 0:18.3 | awk '/^90:/ { print $2 }'`

# bit 0 indicates the aperture is enabled, bits 1 - 3 indicate the size
if [ $((size & 1)) -eq 0 ] ; then
    echo "GART disabled; exiting"
    exit 0
fi

shft=$(((size >> 1) & 7))
size=$((0x2000000 << shft))

# Read the northbridge offset 0x94 to get the base address of the aperture
base=0x`lspci -xxx -s 0:18.3 | awk '/^90:/ { print $6 }'`
base=$((base << 25))
basehex=`printf 0x%08x $base`

printf "IOMMU aperture found at base=0x%08x size=0x%08x (%d KiB)\n" $base $size $((size/1024))

if grep -q $basehex /proc/mtrr ; then
    echo "MTRR already configured for IOMMU aperture; exiting"
    exit 0
fi

echo "Configuring write-combining MTRR for IOMMU aperture"
printf "base=0x%08x size=0x%08x type=write-combining\n" $base $size >/proc/mtrr

exit 0
[ -- cut here -- ]

Chip

-- 
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426

^ permalink raw reply [flat|nested] 27+ messages in thread
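As a worked check of the size decode used in the script above (bit 0 is the enable bit, bits 1 to 3 select the size, and the size is 32 MiB shifted left by that field), the arithmetic can be tried on fixed register values without touching any hardware. This is a sketch only; the helper name `aperture_mib` is hypothetical and it follows the script's decode rather than any datasheet.

```shell
#!/bin/bash
# Hypothetical helper mirroring the decode in the script above: takes the low
# byte of the aperture control register as two hex digits and prints the
# aperture size in MiB, or "disabled" if bit 0 is clear.
aperture_mib() {
    local reg=$(( 0x$1 ))
    if (( (reg & 1) == 0 )); then
        echo disabled
        return
    fi
    local order=$(( (reg >> 1) & 7 ))
    # 0x2000000 bytes = 32 MiB, doubled once per step of the size field
    echo $(( (0x2000000 << order) / 1048576 ))
}

aperture_mib 03   # enabled, size field 1: prints 64
aperture_mib 00   # enable bit clear: prints disabled
```

A register byte of 0x03 therefore corresponds to the 64MB aperture setting mentioned earlier in the thread.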
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-17 14:48 ` Chip Coldwell @ 2007-01-17 19:46 ` Chip Coldwell 2007-01-17 22:15 ` Andi Kleen 1 sibling, 0 replies; 27+ messages in thread From: Chip Coldwell @ 2007-01-17 19:46 UTC (permalink / raw) To: Chip Coldwell Cc: Andi Kleen, Chris Wedgwood, Christoph Anton Mitterer, Robert Hancock, linux-kernel, knweiss, andersen, krader, lfriedman, linux-nforce-bugs On Wed, 17 Jan 2007, Chip Coldwell wrote: > On Wed, 17 Jan 2007, Andi Kleen wrote: > >> On Wednesday 17 January 2007 07:31, Chris Wedgwood wrote: >>> On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote: >>>> I agree,... it seems drastic, but this is the only really secure >>>> solution. >>> >>> I'd like to hear from Andi how he feels about this? It seems like a >>> somewhat drastic solution in some ways given a lot of hardware doesn't >>> seem to be affected (or maybe in those cases it's just really hard to >>> hit, I don't know). >> >> AMD is looking at the issue. Only Nvidia chipsets seem to be affected, >> although there were similar problems on VIA in the past too. >> Unless a good workaround comes around soon I'll probably default >> to iommu=soft on Nvidia. >> > > We've just verified that configuring the graphics aperture to be > write-combining instead of write-back using an MTRR also solves the > problem. It appears to be a cache incoherency issue in the graphics > aperture. I take it back. Further testing has revealed that this does not solve the problem. Chip -- Charles M. "Chip" Coldwell Senior Software Engineer Red Hat, Inc 978-392-2426 ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-17 14:48 ` Chip Coldwell 2007-01-17 19:46 ` Chip Coldwell @ 2007-01-17 22:15 ` Andi Kleen 2007-01-18 21:57 ` Chip Coldwell 1 sibling, 1 reply; 27+ messages in thread From: Andi Kleen @ 2007-01-17 22:15 UTC (permalink / raw) To: Chip Coldwell Cc: Chris Wedgwood, Christoph Anton Mitterer, Robert Hancock, linux-kernel, knweiss, andersen, krader, lfriedman, linux-nforce-bugs > We've just verified that configuring the graphics aperture to be > write-combining instead of write-back using an MTRR also solves the > problem. It appears to be a cache incoherency issue in the graphics > aperture. Interesting. Unfortunately it is also not correct. It was intentional to mark the IOMMU half of the aperture write-back, as opposed to uncached as the AGP half. Otherwise you get illegal cache attribute conflicts with the memory that is being remapped, which can also cause corruption. The Northbridge guarantees coherency over the aperture, but only if the caching attributes match. You would need to change_page_attr() every kernel address that is mapped into the IOMMU to use an uncached aperture. AGP does this, but the frequency of mapping for the IOMMU is much higher and it would unfortunately be prohibitively costly. In the past we saw corruptions from such conflicts, so this is more than just theory. I suspect you traded an easier-to-trigger corruption for a more subtle one. -Andi ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-17 22:15 ` Andi Kleen @ 2007-01-18 21:57 ` Chip Coldwell 2007-01-18 22:49 ` Andi Kleen 0 siblings, 1 reply; 27+ messages in thread From: Chip Coldwell @ 2007-01-18 21:57 UTC (permalink / raw) To: Andi Kleen Cc: Chris Wedgwood, Christoph Anton Mitterer, Robert Hancock, linux-kernel, knweiss, andersen, krader, lfriedman, linux-nforce-bugs On Thu, 18 Jan 2007, Andi Kleen wrote: > > The Northbridge guarantees coherency over the aperture, but > only if the caching attributes match. That's interesting. Makes sense, I suppose. > You would need to change_page_attr() every kernel address that is mapped into > the IOMMU to use an uncached aperture. AGP does this, but the frequency of > mapping for the IOMMU is much higher and it would be prohibitively costly > unfortunately. But it still might be a reasonable thing to do to test the theory that the problem is cache coherency across the graphics aperture, even if it isn't a long-term solution for the problem. > In the past we saw corruptions from such conflicts, so this is more > than just theory. I suspect you traded a more easy to trigger > corruption with a more subtle one. Yup. That was the inspiration for the script. Chip -- Charles M. "Chip" Coldwell Senior Software Engineer Red Hat, Inc 978-392-2426 ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-18 21:57 ` Chip Coldwell @ 2007-01-18 22:49 ` Andi Kleen 0 siblings, 0 replies; 27+ messages in thread From: Andi Kleen @ 2007-01-18 22:49 UTC (permalink / raw) To: Chip Coldwell Cc: Chris Wedgwood, Christoph Anton Mitterer, Robert Hancock, linux-kernel, knweiss, andersen, krader, lfriedman, linux-nforce-bugs On Friday 19 January 2007 08:57, Chip Coldwell wrote: > But it still might be a reasonable thing to do to test the theory that > the problem is cache coherency across the graphics aperture, even if > it isn't a long-term solution for the problem. I suspect it would disturb timing so badly that it might hide the original problem. If that is true then adding udelays might hide it too. OK, I guess you could test with a UP kernel. There change_page_attr should be much cheaper because it doesn't need to IPI to other CPUs. Also use a 2.6.20-rc* kernel that uses CLFLUSH in there, not WBINVD, which is also very costly. Anyway, I guess we can just wait and see what the hardware people figure out. -Andi ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-16 21:29 ` Andi Kleen 2007-01-17 1:17 ` Christoph Anton Mitterer 2007-01-17 14:48 ` Chip Coldwell @ 2007-01-18 9:29 ` joachim 2007-01-18 14:34 ` Christoph Anton Mitterer 2007-01-18 16:42 ` Chris Wedgwood 2007-01-18 11:00 ` Erik Andersen 2007-02-21 17:03 ` Chip Coldwell 4 siblings, 2 replies; 27+ messages in thread From: joachim @ 2007-01-18 9:29 UTC (permalink / raw) To: Andi Kleen Cc: Chris Wedgwood, Christoph Anton Mitterer, Robert Hancock, linux-kernel, knweiss, andersen, krader, lfriedman, linux-nforce-bugs Andi Kleen <ak@suse.de> wrote on 22:29 16/01/2007 +0100 : > AMD is looking at the issue. Only Nvidia chipsets seem to be affected, > although there were similar problems on VIA in the past too. > Unless a good workaround comes around soon I'll probably default > to iommu=soft on Nvidia. > > -Andi Not only has it only been on Nvidia chipsets but we have only seen reports on the Nvidia CK804 SATA controller. Please write in or add yourself to the bugzilla entry [1] and tell us which hardware you have if you get 4kB pagesize corruption and it goes away with "iommu=soft". thanks -joachim [1] http://bugzilla.kernel.org/show_bug.cgi?id=7768 ^ permalink raw reply [flat|nested] 27+ messages in thread
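When reporting whether the corruption goes away with "iommu=soft", it helps to confirm that the option was actually passed to the running kernel. The following is a minimal sketch in bash; the function name is hypothetical, and it assumes that the `iommu=` boot parameter may carry a comma-separated option list (for example `iommu=noagp,soft`). It only parses a command-line string and does not prove which IOMMU implementation the kernel ended up using.

```bash
#!/bin/bash
# Illustrative helper only: the name is hypothetical. It assumes that
# iommu= accepts a comma-separated list of options.

# Return 0 if the given kernel command line requests iommu=soft.
iommu_soft_requested() {
    local words word opts opt
    # Split the command line on whitespace without glob expansion.
    read -r -a words <<< "$1"
    for word in "${words[@]}"; do
        [[ $word == iommu=* ]] || continue
        # Split the value after "iommu=" on commas.
        IFS=, read -r -a opts <<< "${word#iommu=}"
        for opt in "${opts[@]}"; do
            [[ $opt == soft ]] && return 0
        done
    done
    return 1
}
```

On a live system this could be called as `iommu_soft_requested "$(cat /proc/cmdline)" && echo "booted with iommu=soft"`, and the result included in the bugzilla report alongside the hardware details.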
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-18 9:29 ` joachim @ 2007-01-18 14:34 ` Christoph Anton Mitterer 2007-01-18 16:42 ` Chris Wedgwood 1 sibling, 0 replies; 27+ messages in thread From: Christoph Anton Mitterer @ 2007-01-18 14:34 UTC (permalink / raw) To: joachim Cc: Andi Kleen, Chris Wedgwood, Robert Hancock, linux-kernel, knweiss, andersen, krader, lfriedman, linux-nforce-bugs joachim wrote: > Not only has it only been on Nvidia chipsets but we have only seen > reports on the Nvidia CK804 SATA controller. Please write in or add > yourself to the bugzilla entry [1] and tell us which hardware you have > if you get 4kB pagesize corruption and it goes away with "iommu=soft". How do I find out if I get a 4kB pagesize corruption (or is this the same as "our corruption")? Chris. btw: Should we only post the controller, or other hardware details, too? ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-18 9:29 ` joachim 2007-01-18 14:34 ` Christoph Anton Mitterer @ 2007-01-18 16:42 ` Chris Wedgwood 1 sibling, 0 replies; 27+ messages in thread From: Chris Wedgwood @ 2007-01-18 16:42 UTC (permalink / raw) To: joachim Cc: Andi Kleen, Christoph Anton Mitterer, Robert Hancock, linux-kernel, knweiss, andersen, krader, lfriedman, linux-nforce-bugs On Thu, Jan 18, 2007 at 10:29:14AM +0100, joachim wrote: > Not only has it only been on Nvidia chipsets but we have only seen > reports on the Nvidia CK804 SATA controller. People have reported problems with other controllers. I have one here I can test given a day or so. I don't think it's SATA related, it just happens that it shows up well there, for networking you would end up with the odd corrupted packet probably and end up just dropping those so it might not be noticeable. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-16 21:29 ` Andi Kleen ` (2 preceding siblings ...) 2007-01-18 9:29 ` joachim @ 2007-01-18 11:00 ` Erik Andersen 2007-01-18 14:43 ` Christoph Anton Mitterer ` (2 more replies) 2007-02-21 17:03 ` Chip Coldwell 4 siblings, 3 replies; 27+ messages in thread From: Erik Andersen @ 2007-01-18 11:00 UTC (permalink / raw) To: Andi Kleen Cc: Chris Wedgwood, Christoph Anton Mitterer, Robert Hancock, linux-kernel, knweiss, krader, lfriedman, linux-nforce-bugs On Wed Jan 17, 2007 at 08:29:53AM +1100, Andi Kleen wrote: > AMD is looking at the issue. Only Nvidia chipsets seem to be affected, > although there were similar problems on VIA in the past too. > Unless a good workaround comes around soon I'll probably default > to iommu=soft on Nvidia. I just tried again and while using iommu=soft does avoid the corruption problem, as with previous kernels with 2.6.20-rc5 using iommu=soft still makes my pcHDTV HD5500 DVB cards not work. I still have to disable memhole and lose 1 GB. :-( -Erik -- Erik B. Andersen http://codepoet-consulting.com/ --This message was written using 73% post-consumer electrons-- ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-18 11:00 ` Erik Andersen @ 2007-01-18 14:43 ` Christoph Anton Mitterer 2007-01-18 16:36 ` Chris Wedgwood 2007-01-18 23:23 ` Andi Kleen 2 siblings, 0 replies; 27+ messages in thread From: Christoph Anton Mitterer @ 2007-01-18 14:43 UTC (permalink / raw) To: andersen, Andi Kleen, Chris Wedgwood, Christoph Anton Mitterer, Robert Hancock, linux-kernel, knweiss, krader, lfriedman, linux-nforce-bugs Erik Andersen wrote: > I just tried again and while using iommu=soft does avoid the > corruption problem, as with previous kernels with 2.6.20-rc5 > using iommu=soft still makes my pcHDTV HD5500 DVB cards not work. > I still have to disable memhole and lose 1 GB. :-( Please add this to the bugreport (http://bugzilla.kernel.org/show_bug.cgi?id=7768) Chris. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-18 11:00 ` Erik Andersen 2007-01-18 14:43 ` Christoph Anton Mitterer @ 2007-01-18 16:36 ` Chris Wedgwood 2007-01-18 23:23 ` Andi Kleen 2 siblings, 0 replies; 27+ messages in thread From: Chris Wedgwood @ 2007-01-18 16:36 UTC (permalink / raw) To: andersen, Andi Kleen, Christoph Anton Mitterer, Robert Hancock, linux-kernel, knweiss, krader, lfriedman, linux-nforce-bugs On Thu, Jan 18, 2007 at 04:00:28AM -0700, Erik Andersen wrote: > I just tried again and while using iommu=soft does avoid the > corruption problem, as with previous kernels with 2.6.20-rc5 using > iommu=soft still makes my pcHDTV HD5500 DVB cards not work. i would file a separate bug about that, presumably it won't work in intel based machines too if the driver has dma api bugs ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-18 11:00 ` Erik Andersen 2007-01-18 14:43 ` Christoph Anton Mitterer 2007-01-18 16:36 ` Chris Wedgwood @ 2007-01-18 23:23 ` Andi Kleen 2 siblings, 0 replies; 27+ messages in thread From: Andi Kleen @ 2007-01-18 23:23 UTC (permalink / raw) To: andersen Cc: Chris Wedgwood, Christoph Anton Mitterer, Robert Hancock, linux-kernel, knweiss, krader, lfriedman, linux-nforce-bugs On Thursday 18 January 2007 22:00, Erik Andersen wrote: > I just tried again and while using iommu=soft does avoid the > corruption problem, as with previous kernels with 2.6.20-rc5 > using iommu=soft still makes my pcHDTV HD5500 DVB cards not work. This must be some separate bug and needs to be fixed anyways. -Andi ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-16 21:29 ` Andi Kleen ` (3 preceding siblings ...) 2007-01-18 11:00 ` Erik Andersen @ 2007-02-21 17:03 ` Chip Coldwell 4 siblings, 0 replies; 27+ messages in thread From: Chip Coldwell @ 2007-02-21 17:03 UTC (permalink / raw) To: linux-kernel Cc: Andi Kleen, Andy Currid, Mark Langsdorf, Joe Moriarty, Russ Doty, Lonni Friedman, Thorsten Kellermann On Wed, 17 Jan 2007, Andi Kleen wrote: > On Wednesday 17 January 2007 07:31, Chris Wedgwood wrote: > > On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote: > > > I agree,... it seems drastic, but this is the only really secure > > > solution. > > > > I'd like to hear from Andi how he feels about this? It seems like a > > somewhat drastic solution in some ways given a lot of hardware doesn't > > seem to be affected (or maybe in those cases it's just really hard to > > hit, I don't know). > > AMD is looking at the issue. Only Nvidia chipsets seem to be affected, > although there were similar problems on VIA in the past too. > Unless a good workaround comes around soon I'll probably default > to iommu=soft on Nvidia. We (Sun, AMD, Nvidia and Red Hat) have been testing a patch that seems to solve the problem. AMD and Nvidia analyzed an HDT trace that seemed to indicate that CPU updates of the GATT were still in cache when a subsequent table walk caused by a device load used a stale GATT PTE. That analysis inspired this patch, submitted to this list as an RFC. It is not obvious (to me, at least) why this problem has only shown up on Nvidia SATA controllers. We are continuing to investigate.
diff --git a/arch/x86_64/kernel/pci-gart.c b/arch/x86_64/kernel/pci-gart.c
index 030eb37..1dd461a 100644
--- a/arch/x86_64/kernel/pci-gart.c
+++ b/arch/x86_64/kernel/pci-gart.c
@@ -69,6 +69,8 @@ static u32 gart_unmapped_entry;
 #define AGPEXTERN
 #endif
 
+#define GATT_CLFLUSH(i) asm volatile ("clflush (%0)" :: "r" (iommu_gatt_base + (i)))
+
 /* backdoor interface to AGP driver */
 AGPEXTERN int agp_memory_reserved;
 AGPEXTERN __u32 *agp_gatt_table;
@@ -221,6 +223,7 @@ static dma_addr_t dma_map_area(struct device *dev, dma_addr_t phys_mem,
 	for (i = 0; i < npages; i++) {
 		iommu_gatt_base[iommu_page + i] = GPTE_ENCODE(phys_mem);
 		SET_LEAK(iommu_page + i);
+		GATT_CLFLUSH(iommu_page + i);
 		phys_mem += PAGE_SIZE;
 	}
 	return iommu_bus_base + iommu_page*PAGE_SIZE + (phys_mem & ~PAGE_MASK);
@@ -348,6 +351,7 @@ static int __dma_map_cont(struct scatterlist *sg, int start, int stopat,
 	while (pages--) {
 		iommu_gatt_base[iommu_page] = GPTE_ENCODE(addr);
 		SET_LEAK(iommu_page);
+		GATT_CLFLUSH(iommu_page);
 		addr += PAGE_SIZE;
 		iommu_page++;
 	}

Chip

--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426

^ permalink raw reply related [flat|nested] 27+ messages in thread
* RE: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-16 20:31 ` Chris Wedgwood 2007-01-16 21:29 ` Andi Kleen @ 2007-01-16 21:54 ` Allen Martin 2007-01-17 1:12 ` Christoph Anton Mitterer 2 siblings, 0 replies; 27+ messages in thread From: Allen Martin @ 2007-01-16 21:54 UTC (permalink / raw) To: Chris Wedgwood, Christoph Anton Mitterer Cc: Robert Hancock, linux-kernel, knweiss, ak, andersen, krader, Lonni Friedman, Linux-Nforce-Bugs > I'd like to hear from Andi how he feels about this? It seems like a > somewhat drastic solution in some ways given a lot of hardware doesn't > seem to be affected (or maybe in those cases it's just really hard to > hit, I don't know). > > > Well we can hope that Nvidia will find out more (though I'm not too > > optimistic). > > Ideally someone from AMD needs to look into this, if some mainboards > really never see this problem, then why is that? Is there errata that > some BIOS/mainboard vendors are dealing with that others are not? NVIDIA and AMD are investigating this issue; we don't know what the problem is yet. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-16 20:31 ` Chris Wedgwood 2007-01-16 21:29 ` Andi Kleen 2007-01-16 21:54 ` Allen Martin @ 2007-01-17 1:12 ` Christoph Anton Mitterer 2 siblings, 0 replies; 27+ messages in thread From: Christoph Anton Mitterer @ 2007-01-17 1:12 UTC (permalink / raw) To: Chris Wedgwood Cc: Robert Hancock, linux-kernel, knweiss, ak, andersen, krader, lfriedman, linux-nforce-bugs Chris Wedgwood wrote: > I'd like to hear from Andi how he feels about this? It seems like a > somewhat drastic solution in some ways given a lot of hardware doesn't > seem to be affected (or maybe in those cases it's just really hard to > hit, I don't know). > Yes, this might be true. Those who have reported working systems might just have a configuration where the error happens even more rarely or where some other event(s) work around it. >> Well we can hope that Nvidia will find out more (though I'm not too >> optimistic). >> > Ideally someone from AMD needs to look into this, if some mainboards > really never see this problem, then why is that? Is there errata that > some BIOS/mainboard vendors are dealing with that others are not? > Some time ago I asked here in a post if some of you could try to contact AMD and/or Nvidia. As no one did, I wrote to them again (to all forums and email addresses I knew). (You can see the text here: http://www.nvnews.net/vbulletin/showthread.php?t=82909). Now Nvidia has replied and it seems (thanks to Mr. Friedman) that they're actually trying to investigate the issue. I received one reply from AMD (actually in German, which is strange as I wrote to their US support) where they told me they had forwarded my mail to their Linux engineers, but there has been no reply since then. Perhaps some of you have some "contacts" and can use them...
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-16 18:01 ` data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) Chris Wedgwood 2007-01-16 19:52 ` Christoph Anton Mitterer @ 2007-01-16 20:16 ` Arkadiusz Miskiewicz 2007-01-16 20:21 ` Christoph Anton Mitterer 2007-01-16 20:31 ` Krzysztof Halasa 2 siblings, 1 reply; 27+ messages in thread From: Arkadiusz Miskiewicz @ 2007-01-16 20:16 UTC (permalink / raw) To: Chris Wedgwood Cc: Robert Hancock, Christoph Anton Mitterer, linux-kernel, knweiss, ak, andersen, krader, lfriedman, linux-nforce-bugs On Tuesday 16 January 2007 19:01, Chris Wedgwood wrote: > On Tue, Jan 16, 2007 at 08:26:05AM -0600, Robert Hancock wrote: > > >If one use iommu=soft the sata_nv will continue to use the new code > > >for the ADMA, right? > > > > Right, that shouldn't affect it. > > right now i'm thinking if we can't figure out which cpu/bios > combinations are safe we might almost be better off doing iommu=soft > for *all* k8 stuff except for those that are whitelisted; though this > seems extremely drastic > > it's not clear if this only affect nvidia based chipsets, the nature > of the corruption makes me think it's not an iommu software bug (we > see a few bytes not entire pages corrupted, it's not even clear if > it's entire cachelines trashed) --- perhaps other vendors have more > recent bios errata or maybe it's just that nvidia has sold a lot of > these so they are more visible? (i'm assuming at this point it might > be some kind of cpu errata that some bioses deal with because some > mainboards don't ever seem to see this whilst others do) FYI it seems that I was also hit by this bug with qlogic fc card + adaptec taro raid controller on Thunder K8SRE S2891 mainboard with nvidia chipset on it. 
http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/b8bdbde9721f7d35/45701994c95fe2cf?lnk=st&q=arkadiusz+fibre&rnum=8#45701994c95fe2cf -- Arkadiusz Miśkiewicz PLD/Linux Team arekm / maven.pl http://ftp.pld-linux.org/ ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-16 20:16 ` Arkadiusz Miskiewicz @ 2007-01-16 20:21 ` Christoph Anton Mitterer 0 siblings, 0 replies; 27+ messages in thread From: Christoph Anton Mitterer @ 2007-01-16 20:21 UTC (permalink / raw) To: Arkadiusz Miskiewicz; +Cc: linux-kernel Arkadiusz Miskiewicz wrote: > FYI it seems that I was also hit by this bug with qlogic fc card + adaptec > taro raid controller on Thunder K8SRE S2891 mainboard with nvidia chipset on > it. > > http://groups.google.com/group/fa.linux.kernel/browse_thread/thread/b8bdbde9721f7d35/45701994c95fe2cf?lnk=st&q=arkadiusz+fibre&rnum=8#45701994c95fe2cf > I'm aware of your old thread and at least I considered your postings from it :-) Anyway, thanks for your information. =) Chris. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-16 18:01 ` data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) Chris Wedgwood 2007-01-16 19:52 ` Christoph Anton Mitterer 2007-01-16 20:16 ` Arkadiusz Miskiewicz @ 2007-01-16 20:31 ` Krzysztof Halasa 2007-01-16 20:35 ` Chris Wedgwood 2 siblings, 1 reply; 27+ messages in thread From: Krzysztof Halasa @ 2007-01-16 20:31 UTC (permalink / raw) To: Chris Wedgwood Cc: Robert Hancock, Christoph Anton Mitterer, linux-kernel, knweiss, ak, andersen, krader, lfriedman, linux-nforce-bugs Chris Wedgwood <cw@f00f.org> writes: > right now i'm thinking if we can't figure out which cpu/bios > combinations are safe we might almost be better off doing iommu=soft > for *all* k8 stuff except for those that are whitelisted; though this > seems extremely drastic Do you (someone) have (maintain) a list of affected systems, including motherboard type and possibly version, BIOS version and CPU type? A similar list of unaffected systems with 4GB+ RAM could be useful, too. I'm afraid with default iommu=soft it will be a mystery forever. -- Krzysztof Halasa ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?) 2007-01-16 20:31 ` Krzysztof Halasa @ 2007-01-16 20:35 ` Chris Wedgwood 0 siblings, 0 replies; 27+ messages in thread From: Chris Wedgwood @ 2007-01-16 20:35 UTC (permalink / raw) To: Krzysztof Halasa Cc: Robert Hancock, Christoph Anton Mitterer, linux-kernel, knweiss, ak, andersen, krader, lfriedman, linux-nforce-bugs On Tue, Jan 16, 2007 at 09:31:31PM +0100, Krzysztof Halasa wrote: > Do you (someone) have (maintain) a list of affected systems, > including motherboard type and possibly version, BIOS version and > CPU type? A similar list of unaffected systems with 4GB+ RAM could > be useful, too. All I know is that some systems hit this and some don't seem to. Why is not clear. > I'm afraid with default iommu=soft it will be a mystery forever. Right, but given that Windows doesn't use the iommu at all and that a lot of newer hardware/drivers don't need it, it might be the safest option, since it clearly has been causing corruption for a number of people for well over a year now. ^ permalink raw reply [flat|nested] 27+ messages in thread