* Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: Christoph Anton Mitterer @ 2006-11-07 15:21 UTC
To: linux-kernel

Hi.

I've got a very strange problem, which I'm going to try to explain here
in detail. Since one could easily suppose that the issue results from
hardware problems, I'm going to explain it in great detail, because the
facts make me think that my hardware is OK (unfortunately I've got no
other computer on which to try to reproduce the problem).

I'm archiving my personal CD-DA collection, and for that reason I had to
install Windows (yes, I feel very ugly ;-) ) because I wanted to use EAC
(Exact Audio Copy) for that, due to its superior features.
So I installed Windows XP (on NTFS) and created an additional FAT32
partition to store the extracted audio. I did several badblock scans in
addition to long SMART checks on the whole drive, so the disk should be
OK.

When extracting (under Windows):
I extracted each CD twice, so Windows/EAC should write exactly the same
data for each CD twice to the FAT32 partition. This is important because
I think that if there were errors in the drive/FAT32 filesystem/RAM/CPU,
it would be likely that these files are _not_ equal.

After ripping about 20 CDs I went back to Linux and wanted to compare the
pairs of extracted data (originally I did that just to find any errors in
the ripping process).
Before doing anything I wrote sha512sums for all files.
At some point (I did that procedure for every CD) I copied (cp -a) the
directory with all data for that CD to a temporary location on the FAT32
partition.
Right after that, I diffed the whole stuff (diff -q -r dirA dirB).
And there were differences in one file!!!
I copied again, diffed again, and differences again (but in another
file).

First of all I thought that this would be a hardware issue. I supposed
the RAM could be damaged, because diff would probably use the cached data
from the files I had just copied.
So I did extensive memtests (memtest86+) for several hours/passes. But no
errors were found in my 4GB ECC/Reg RAM.
So I supposed it could be a CPU-related problem (2x DualCore Opteron
275), and I started an mprime/GIMPS torture test on each core and let it
run for 16 hours with no errors at all.

Some days later I had the same or at least a very similar problem.
I copied, diffed, but this time _no_ differences. I restarted the system
(thus the file cache was cleared), diffed again, and now differences!!
This was also a reason to believe that it is not my RAM that is defective
but the writing to the FAT32 disk.

Ok, the original files written by Windows/EAC seem to be OK and never
changed or corrupted. Why?
First of all, the sha512sums are still equal, but one could say that the
data was already damaged when calculating those hashes.
But EAC internally stores a hash (some CRCxx) which is (afaik) calculated
from data in RAM. So if the RAM is OK (and I suppose that because of my
memtests) the hashes should be correct, too.
I compared those EAC hashes with the original data and all data seem to
be correct.

So this is, as far as I can say, only a Linux/FAT32-related problem, as
the data written by Windows seems to be correct.
And as I've said, I'm pretty sure my hardware is OK, too.

The strange thing is that one time the differences were found directly
after copying (thus one would think RAM is damaged, because the data was
probably (I cannot tell this for sure) taken from the file cache),
and the other time after restarting with a certainly empty file cache.

Any ideas? I'm willing to help debugging and so on, but I must admit that
I need someone to tell me what to do :D

btw: my system:
Debian sid (which should be unimportant)
kernel 2.6.18.2
For further data please ask :)

Thanks in advance,
Chris.

[-- Attachment #2: calestyo.vcf --]
[-- Type: text/x-vcard, Size: 156 bytes --]

begin:vcard
fn:Mitterer, Christoph Anton
n:Mitterer;Christoph Anton
email;internet:calestyo@scientia.net
x-mozilla-html:TRUE
version:2.1
end:vcard

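The procedure described in this message (checksum the originals, copy
with cp -a, then verify the copy) can be sketched as two small shell
helpers. This is only a sketch under stated assumptions: the function
names record_sums and check_sums and the paths in the usage comment are
hypothetical, and GNU coreutils (sha512sum with --quiet) is assumed.

```shell
# record_sums DIR: print SHA-512 sums for every file under DIR, with
# paths relative to DIR so the list can be replayed against a copy.
record_sums() (
    cd "$1" && find . -type f -exec sha512sum {} +
)

# check_sums DIR: read a list produced by record_sums on stdin and
# verify the files under DIR. Prints only failures and returns
# non-zero if any file does not match.
check_sums() (
    cd "$1" && sha512sum --quiet -c
)

# Hypothetical usage (all paths are placeholders):
#   record_sums /mnt/fat32/cd01 > /tmp/cd01.sha512
#   cp -a /mnt/fat32/cd01 /mnt/fat32/tmp/cd01
#   check_sums /mnt/fat32/tmp/cd01 < /tmp/cd01.sha512
```

A check run right after the copy may be served from the page cache
rather than from the disk, so repeating check_sums after a reboot (or
after flushing the cache) covers both cases mentioned in the message.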
* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: OGAWA Hirofumi @ 2006-11-07 18:55 UTC
To: Christoph Anton Mitterer; +Cc: linux-kernel

Christoph Anton Mitterer <calestyo@scientia.net> writes:

> The strange thing is that one time the differences were found directly
> after copying (thus one would think RAM is damaged, because the data was
> probably (I cannot tell this for sure) taken from the file cache),
> and the other time after restarting with a certainly empty file cache.
>
> Any ideas? I'm willing to help debugging and so on, but I must admit that
> I need someone to tell me what to do :D

A bit interesting. Could you send the output of diff? I'd like to see
how it's breaking.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: Christoph Anton Mitterer @ 2006-11-07 21:32 UTC
To: OGAWA Hirofumi; +Cc: linux-kernel

OGAWA Hirofumi wrote:
> Christoph Anton Mitterer <calestyo@scientia.net> writes:
>
>> The strange thing is that one time the differences were found directly
>> after copying (thus one would think RAM is damaged, because the data was
>> probably (I cannot tell this for sure) taken from the file cache),
>> and the other time after restarting with a certainly empty file cache.
>>
>> Any ideas? I'm willing to help debugging and so on, but I must admit that
>> I need someone to tell me what to do :D
>
> A bit interesting. Could you send the output of diff? I'd like to see
> how it's breaking.

Unfortunately I don't currently have any of the corrupted files (deleted
them), but as soon as I encounter the issue again I'll send them to
you :)
But as far as I remember there was no pattern: one time a small part was
replaced by 0x0's and the other time by arbitrary bytes.

Chris.

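To see "how it's breaking" for binary files such as WAV data, one option
is to list the byte ranges in which two files differ instead of sending
a full diff. The helper below is a minimal sketch: the name diff_ranges
is hypothetical, it relies on cmp -l (which prints the 1-based offset of
every differing byte), and it only reports offsets, not the byte values.

```shell
# diff_ranges FILE_A FILE_B: print each contiguous run of differing
# bytes as START-END (1-based byte offsets, inclusive).
diff_ranges() {
    # cmp exits non-zero when the files differ; that is expected here,
    # so its status is ignored and only its output is processed.
    { cmp -l "$1" "$2" || true; } | awk '
        NR == 1        { start = $1; prev = $1; next }
        $1 != prev + 1 { print start "-" prev; start = $1 }
                       { prev = $1 }
        END            { if (NR > 0) print start "-" prev }'
}
```

The length and alignment of the reported ranges (for example whole
sectors or pages versus a few scattered bytes) can hint at where in the
I/O path the damage happens.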
* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: Christoph Anton Mitterer @ 2006-11-09 18:32 UTC
To: OGAWA Hirofumi; +Cc: linux-kernel

OGAWA Hirofumi wrote:
> Christoph Anton Mitterer <calestyo@scientia.net> writes:
>
>> The strange thing is that one time the differences were found directly
>> after copying (thus one would think RAM is damaged, because the data was
>> probably (I cannot tell this for sure) taken from the file cache),
>> and the other time after restarting with a certainly empty file cache.
>>
>> Any ideas? I'm willing to help debugging and so on, but I must admit that
>> I need someone to tell me what to do :D
>
> A bit interesting. Could you send the output of diff? I'd like to see
> how it's breaking.

Ok, today I have perhaps some more information:
I've copied around 30 GB from FAT32 to ext3.
I diffed everything: differences in one file.
I recopied that one file, rebooted, diffed again: differences in another
file:

euler:~# diff -q -r /mnt/tmp/CDDA_DATA_1 /mnt/CDDA/EAC_DATA_1
Files /mnt/tmp/CDDA_DATA_1/LOTR 1/16.01.wav and /mnt/CDDA/EAC_DATA_1/LOTR 1/16.01.wav differ

Then, after the complete diff was finished, I diffed the single file
again:

euler:~# diff /mnt/tmp/CDDA_DATA_1/LOTR\ 1/16.01.wav /mnt/CDDA/EAC_DATA_1/LOTR\ 1/16.01.wav

=> then, no differences?!

Am I crazy or what? Is this now a memory problem? But if so, why does
memtest give me no errors?

Regards,
Chris.

[parent not found: <45539188.5080607@atipa.com>]
* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: Christoph Anton Mitterer @ 2006-11-09 20:45 UTC
To: Roger Heflin; +Cc: linux-kernel

Roger Heflin wrote:
> Christoph,
>
> Install the edac_mc module, and make sure through the
> sysctl command that pci parity checking is enabled.
>
> I have seen pci parity errors produce this sort of result,
> i.e. make 100 identical 50MB files, and cksum them and one
> will be wrong, do it again, and the "wrong" one is now
> right, but some other one is "wrong".

Ah thx, is it in the vanilla kernel?
And do you know of any possible consequences of this issue? When I just
read data (see my original stuff with FAT32), is it possible that it has
been modified or damaged?
Or are the only consequences that diff errors occur?

And what is responsible for those parity errors? Is it possible that any
hardware is damaged?

Thanks,
Chris.

* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: Roger Heflin @ 2006-11-09 20:54 UTC
To: Christoph Anton Mitterer; +Cc: linux-kernel

Christoph Anton Mitterer wrote:
> Roger Heflin wrote:
>> Christoph,
>>
>> Install the edac_mc module, and make sure through the
>> sysctl command that pci parity checking is enabled.
>>
>> I have seen pci parity errors produce this sort of result,
>> i.e. make 100 identical 50MB files, and cksum them and one
>> will be wrong, do it again, and the "wrong" one is now
>> right, but some other one is "wrong".
> Ah thx, is it in the vanilla kernel?
> And do you know of any possible consequences of this issue? When I just
> read data (see my original stuff with FAT32), is it possible that it has
> been modified or damaged?
> Or are the only consequences that diff errors occur?
>
> And what is responsible for those parity errors? Is it possible that any
> hardware is damaged?

The failure can manifest itself in many ways. I have
only seen it as a read failure, but there is no
reason why it cannot also show up as a write failure.

It should be in the later vanilla kernels; it won't
be in the earlier ones. I would do a

find /lib/modules -name "*edac*" -ls

It is a hw issue: either something is running faster than
it should be (PCI bus set too fast for the given hardware/config)
or something is broken.

                         Roger

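The test mentioned in the quoted text (make many identical files, cksum
them, see whether one comes out wrong) can be sketched as follows. The
helper names make_copies and check_copies and the usage paths are
hypothetical; this is an illustration of the idea, not the exact
procedure used in the thread.

```shell
# make_copies SRC WORKDIR COUNT: write COUNT copies of SRC into WORKDIR.
make_copies() (
    src=$1 work=$2 count=$3
    mkdir -p "$work" || exit 2
    i=1
    while [ "$i" -le "$count" ]; do
        cp "$src" "$work/copy.$i" || exit 2
        i=$((i + 1))
    done
)

# check_copies SRC WORKDIR: cksum every copy and report those whose
# checksum differs from SRC. Returns non-zero if any copy mismatches.
check_copies() (
    src=$1 work=$2
    want=$(cksum < "$src") || exit 2
    bad=0
    for f in "$work"/copy.*; do
        [ -e "$f" ] || continue
        got=$(cksum < "$f") || exit 2
        if [ "$got" != "$want" ]; then
            echo "MISMATCH $f"
            bad=$((bad + 1))
        fi
    done
    [ "$bad" -eq 0 ]
)

# Hypothetical usage (paths are placeholders):
#   make_copies /mnt/data/50MB.bin /mnt/test/work 100
#   check_copies /mnt/data/50MB.bin /mnt/test/work
```

Running check_copies several times is the interesting part: with an
intermittent read-side fault, the file reported as a mismatch would be
expected to change from run to run. On kernels that provide
/proc/sys/vm/drop_caches, flushing the page cache between runs (as root)
forces the data to be re-read from disk.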
* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: Christoph Anton Mitterer @ 2006-11-09 20:59 UTC
To: Roger Heflin; +Cc: linux-kernel

Roger Heflin wrote:
> The failure can manifest itself in many ways. I have
> only seen it as a read failure, but there is no
> reason why it cannot also show up as a write failure.
>
> It should be in the later vanilla kernels; it won't
> be in the earlier ones. I would do a
> find /lib/modules -name "*edac*" -ls
>
> It is a hw issue: either something is running faster than
> it should be (PCI bus set too fast for the given hardware/config)
> or something is broken.

The strange thing is that it always occurs on the copied data, not
the original (which is on another disk). But wouldn't those parity errors
occur in general?
For example, all my sha1sum -c sumfile checks are working correctly on
the original disk :/

* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: Roger Heflin @ 2006-11-09 21:02 UTC
To: Christoph Anton Mitterer; +Cc: linux-kernel

Christoph Anton Mitterer wrote:
> Roger Heflin wrote:
>> The failure can manifest itself in many ways. I have
>> only seen it as a read failure, but there is no
>> reason why it cannot also show up as a write failure.
>>
>> It should be in the later vanilla kernels; it won't
>> be in the earlier ones. I would do a
>> find /lib/modules -name "*edac*" -ls
>>
>> It is a hw issue: either something is running faster than
>> it should be (PCI bus set too fast for the given hardware/config)
>> or something is broken.
> The strange thing is that it always occurs on the copied data, not
> the original (which is on another disk). But wouldn't those parity errors
> occur in general?
> For example, all my sha1sum -c sumfile checks are working correctly on
> the original disk :/

It depends on which PCI bus has the issue and which hardware
is using the bus with the issue. There are several different
buses in most machines, and they are broken out in different ways,
and the error may affect only one or two devices on a certain part
of the bus.

                           Roger

* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: Christoph Anton Mitterer @ 2006-11-09 21:57 UTC
To: Roger Heflin; +Cc: linux-kernel

It seems that I don't get any data at all.
I only get the edac_mc module but none that seems to support my chipset
or so...
Any ideas?

* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: Roger Heflin @ 2006-11-09 22:02 UTC
To: Christoph Anton Mitterer; +Cc: linux-kernel

Christoph Anton Mitterer wrote:
> It seems that I don't get any data at all.
> I only get the edac_mc module but none that seems to support my chipset
> or so...
> Any ideas?

The mc part does PCI parity; it is separate from the
chipset driver. I have even used the _mc part on an
Itanium with no chipset driver at all and had it report
parity errors properly, so I expect just the mc driver
to work.

You would need the k8 module for the CPU, but that is
only if you want ECC checking also.

If you have the _mc loaded, do a "sysctl -a | grep mc" and
see how things are set, and if necessary reset
check_pci_parity to 1.

                             Roger

* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: Christoph Anton Mitterer @ 2006-11-09 22:08 UTC
To: Roger Heflin; +Cc: linux-kernel

Roger Heflin wrote:
> The mc part does PCI parity; it is separate from the
> chipset driver.
What? I thought the MC part does ECC and the pci part the parity stuff?

> I have even used the _mc part on an
> Itanium with no chipset driver at all and had it report
> parity errors properly, so I expect just the mc driver
> to work.
>
> You would need the k8 module for the CPU, but that is
> only if you want ECC checking also.
>
Where do I get this? Only by patching from CVS?

> If you have the _mc loaded, do a "sysctl -a | grep mc" and
> see how things are set, and if necessary reset
> check_pci_parity to 1.
Well ok, the module is loaded now:
I've set check_pci_parity to 1; everything else is 0 in sysfs...


# sysctl -a | grep mc
error: "Operation not permitted" reading key "net.ipv6.route.flush"
net.ipv6.neigh.eth1.mcast_solicit = 3
net.ipv6.neigh.eth0.mcast_solicit = 3
net.ipv6.neigh.lo.mcast_solicit = 3
net.ipv6.neigh.default.mcast_solicit = 3
net.ipv4.conf.ppp0.mc_forwarding = 0
net.ipv4.conf.eth1.mc_forwarding = 0
net.ipv4.conf.eth0.mc_forwarding = 0
net.ipv4.conf.lo.mc_forwarding = 0
net.ipv4.conf.default.mc_forwarding = 0
net.ipv4.conf.all.mc_forwarding = 0
net.ipv4.neigh.ppp0.mcast_solicit = 3
net.ipv4.neigh.eth1.mcast_solicit = 3
net.ipv4.neigh.eth0.mcast_solicit = 3
net.ipv4.neigh.lo.mcast_solicit = 3
net.ipv4.neigh.default.mcast_solicit = 3
error: "Operation not permitted" reading key "net.ipv4.route.flush"
error: "Invalid argument" reading key "fs.binfmt_misc.register"


But this has nothing to do with edac, has it?

And I've already had diff errors again,
so if there had been some parity issue it should have been logged, right?

* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: Roger Heflin @ 2006-11-09 22:14 UTC
To: Christoph Anton Mitterer; +Cc: linux-kernel

Christoph Anton Mitterer wrote:
> Roger Heflin wrote:
>> The mc part does PCI parity; it is separate from the
>> chipset driver.
> What? I thought the MC part does ECC and the pci part the parity stuff?
>

mc does pci parity all by itself; it is also the main module
holding the ecc stuff together, but you get no ecc without the
chipset/cpu specific module.

>> I have even used the _mc part on an
>> Itanium with no chipset driver at all and had it report
>> parity errors properly, so I expect just the mc driver
>> to work.
>>
>> You would need the k8 module for the CPU, but that is
>> only if you want ECC checking also.
>>
> Where do I get this? Only by patching from CVS?

I don't know what the status of the k8 modules is; some
distro kernels include it, I don't know if vanilla has
it yet.

mcelog should also report ecc errors, but you would need
to be running the mcelog userspace program every so often
to realize that errors were happening.

>
>> If you have the _mc loaded, do a "sysctl -a | grep mc" and
>> see how things are set, and if necessary reset
>> check_pci_parity to 1.
> Well ok, the module is loaded now:
> I've set check_pci_parity to 1; everything else is 0 in sysfs...
>
>
> # sysctl -a | grep mc
> error: "Operation not permitted" reading key "net.ipv6.route.flush"
> net.ipv6.neigh.eth1.mcast_solicit = 3
> net.ipv6.neigh.eth0.mcast_solicit = 3
> net.ipv6.neigh.lo.mcast_solicit = 3
> net.ipv6.neigh.default.mcast_solicit = 3
> net.ipv4.conf.ppp0.mc_forwarding = 0
> net.ipv4.conf.eth1.mc_forwarding = 0
> net.ipv4.conf.eth0.mc_forwarding = 0
> net.ipv4.conf.lo.mc_forwarding = 0
> net.ipv4.conf.default.mc_forwarding = 0
> net.ipv4.conf.all.mc_forwarding = 0
> net.ipv4.neigh.ppp0.mcast_solicit = 3
> net.ipv4.neigh.eth1.mcast_solicit = 3
> net.ipv4.neigh.eth0.mcast_solicit = 3
> net.ipv4.neigh.lo.mcast_solicit = 3
> net.ipv4.neigh.default.mcast_solicit = 3
> error: "Operation not permitted" reading key "net.ipv4.route.flush"
> error: "Invalid argument" reading key "fs.binfmt_misc.register"
>
>
> But this has nothing to do with edac, has it?
>
> And I've already had diff errors again,
> so if there had been some parity issue it should have been logged, right?

The names and locations may have changed. I am more
familiar with the older versions that had the sysctl stuff
in them; the new parts may not have the sysctl stuff,
but if you make the adjustment with the /sys filesystem,
that should work just fine.

                           Roger

* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: Christoph Anton Mitterer @ 2006-11-09 22:24 UTC
To: Roger Heflin; +Cc: linux-kernel

Roger Heflin wrote:
> Christoph Anton Mitterer wrote:
>
>> Roger Heflin wrote:
>>
>>> The mc part does PCI parity; it is separate from the
>>> chipset driver.
>>>
>> What? I thought the MC part does ECC and the pci part the parity stuff?
>>
>
> mc does pci parity all by itself; it is also the main module
> holding the ecc stuff together, but you get no ecc without the
> chipset/cpu specific module.
>
>
>>> I have even used the _mc part on an
>>> Itanium with no chipset driver at all and had it report
>>> parity errors properly, so I expect just the mc driver
>>> to work.
>>>
>>> You would need the k8 module for the CPU, but that is
>>> only if you want ECC checking also.
>>>
>> Where do I get this? Only by patching from CVS?
>>
>
> I don't know what the status of the k8 modules is; some
> distro kernels include it, I don't know if vanilla has
> it yet.
>
> mcelog should also report ecc errors, but you would need
> to be running the mcelog userspace program every so often
> to realize that errors were happening.
>
>
>>> If you have the _mc loaded, do a "sysctl -a | grep mc" and
>>> see how things are set, and if necessary reset
>>> check_pci_parity to 1.
>>>
>> Well ok, the module is loaded now:
>> I've set check_pci_parity to 1; everything else is 0 in sysfs...
>>
>>
>> # sysctl -a | grep mc
>> error: "Operation not permitted" reading key "net.ipv6.route.flush"
>> net.ipv6.neigh.eth1.mcast_solicit = 3
>> net.ipv6.neigh.eth0.mcast_solicit = 3
>> net.ipv6.neigh.lo.mcast_solicit = 3
>> net.ipv6.neigh.default.mcast_solicit = 3
>> net.ipv4.conf.ppp0.mc_forwarding = 0
>> net.ipv4.conf.eth1.mc_forwarding = 0
>> net.ipv4.conf.eth0.mc_forwarding = 0
>> net.ipv4.conf.lo.mc_forwarding = 0
>> net.ipv4.conf.default.mc_forwarding = 0
>> net.ipv4.conf.all.mc_forwarding = 0
>> net.ipv4.neigh.ppp0.mcast_solicit = 3
>> net.ipv4.neigh.eth1.mcast_solicit = 3
>> net.ipv4.neigh.eth0.mcast_solicit = 3
>> net.ipv4.neigh.lo.mcast_solicit = 3
>> net.ipv4.neigh.default.mcast_solicit = 3
>> error: "Operation not permitted" reading key "net.ipv4.route.flush"
>> error: "Invalid argument" reading key "fs.binfmt_misc.register"
>>
>>
>> But this has nothing to do with edac, has it?
>>
>> And I've already had diff errors again,
>> so if there had been some parity issue it should have been logged, right?
>>
>
> The names and locations may have changed. I am more
> familiar with the older versions that had the sysctl stuff
> in them; the new parts may not have the sysctl stuff,
> but if you make the adjustment with the /sys filesystem,
> that should work just fine.
>
Ahh, now I see:

Parity Count:

        'pci_parity_count'

        This attribute file will display the number of parity errors that
        have been detected.


but this is zero...
So would that mean that I don't have any parity errors?

btw: I'm still always getting diff errors at different files...

Chris.

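Reading the two EDAC attributes named in this exchange
(check_pci_parity and pci_parity_count) can be sketched as below. The
attribute names come from the thread; the default directory
/sys/devices/system/edac/pci is an assumption about where kernels of
this era expose them and may differ between kernel versions, and the
function name show_pci_parity is hypothetical.

```shell
# show_pci_parity [DIR]: print the EDAC PCI parity attributes found in
# DIR. The default DIR is an assumed sysfs location, not a confirmed one.
show_pci_parity() (
    dir=${1:-/sys/devices/system/edac/pci}
    for attr in check_pci_parity pci_parity_count; do
        if [ -r "$dir/$attr" ]; then
            echo "$attr = $(cat "$dir/$attr")"
        else
            echo "$attr: not available under $dir"
        fi
    done
)

# Hypothetical usage:
#   show_pci_parity                 # use the assumed default location
#   show_pci_parity /some/other/dir # point at wherever the files live
```

A pci_parity_count that stays at 0 while check_pci_parity is 1 and
mismatches keep appearing suggests the corruption is not being detected
as a PCI parity error, which matches the conclusion drawn later in the
thread.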
* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: Roger Heflin @ 2006-11-09 22:35 UTC
To: Christoph Anton Mitterer; +Cc: linux-kernel

Christoph Anton Mitterer wrote:
> Ahh, now I see:
> Parity Count:
>
>         'pci_parity_count'
>
>         This attribute file will display the number of parity errors that
>         have been detected.
>
>
> but this is zero...
> So would that mean that I don't have any parity errors?
>
> btw: I'm still always getting diff errors at different files...
>
> Chris.
>

That should mean that it is not a HW PCI bus issue, though I
have still seen odd MB failures that cause corruption and don't
show up anywhere (pci, ecc, mcelog), and only show up with cksums
on specific pieces of hw.

I don't have any good way of finding those; we swapped one part
at a time until it quit doing it.

                            Roger

* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: Christoph Anton Mitterer @ 2006-11-09 22:38 UTC
To: Roger Heflin; +Cc: linux-kernel

Roger Heflin wrote:
> That should mean that it is not a HW PCI bus issue, though I
> have still seen odd MB failures that cause corruption and don't
> show up anywhere (pci, ecc, mcelog), and only show up with cksums
> on specific pieces of hw.
>
> I don't have any good way of finding those; we swapped one part
> at a time until it quit doing it.

Would those errors also occur when just calculating message digests
(sha1sum)? Because if so, I could exclude those types of errors for my
issue, because as I've said, at least on the original files the sha
sums are always correct.

Regards,
Chris.

* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: Roger Heflin @ 2006-11-09 22:42 UTC
To: Christoph Anton Mitterer; +Cc: linux-kernel

Christoph Anton Mitterer wrote:
> Roger Heflin wrote:
>> That should mean that it is not a HW PCI bus issue, though I
>> have still seen odd MB failures that cause corruption and don't
>> show up anywhere (pci, ecc, mcelog), and only show up with cksums
>> on specific pieces of hw.
>>
>> I don't have any good way of finding those; we swapped one part
>> at a time until it quit doing it.
> Would those errors also occur when just calculating message digests
> (sha1sum)? Because if so, I could exclude those types of errors for my
> issue, because as I've said, at least on the original files the sha
> sums are always correct.
>
> Regards,
> Chris.

Usually it seemed to be IO related; the sums just happened
to show the issue. It did not seem to be a CPU issue;
something unknown outside of the CPU seemed to cause it.

                          Roger

* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: Christoph Anton Mitterer @ 2006-11-10 0:45 UTC
To: Roger Heflin; +Cc: linux-kernel

Roger Heflin wrote:
> Usually it seemed to be IO related; the sums just happened
> to show the issue. It did not seem to be a CPU issue;
> something unknown outside of the CPU seemed to cause it.
>
Ok, as this is obviously not FAT32 related (I just tested the whole
stuff on ext3), I'll open a new thread to hopefully attract more people
for help :-)

btw: right now I'm going to try the whole thing with edac_mc with ECC
for K8. mcelog did not return anything at all (it just silently quit).

Thanks so far and regards,
Chris.

* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: Alan Cox @ 2006-11-10 10:28 UTC
To: Christoph Anton Mitterer; +Cc: Roger Heflin, linux-kernel

On Thu, 2006-11-09 at 23:08 +0100, Christoph Anton Mitterer wrote:
> And I've already had diff errors again,
> so if there had been some parity issue it should have been logged, right?

If it was a PCI side parity error, yes. If you have dodgy memory then the
K8 will MCE and report that if the MCE code is loaded. If the memory is
non-ECC or the CPU doesn't support ECC memory you'll get silent strange
behaviour, but a long run of memtest86 can usually find any main memory
problems.

Alan

* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: Christoph Anton Mitterer @ 2006-11-11 16:01 UTC
To: Alan Cox; +Cc: Roger Heflin, linux-kernel

Alan Cox wrote:
> If it was a PCI side parity error, yes. If you have dodgy memory then the
> K8 will MCE and report that if the MCE code is loaded. If the memory is
> non-ECC or the CPU doesn't support ECC memory you'll get silent strange
> behaviour, but a long run of memtest86 can usually find any main memory
> problems.
>
> Alan
>
Dear Alan,

The memory has ECC, and neither EDAC_MC with K8 support, nor mcelog (I
even tried compiling in both the AMD and Intel MCE support), nor memtest
shows me any errors.

Please have a look at my "new" post: as this is definitely not FAT32
related, I posted the whole thing under a new thread (I thought that
would be the correct way).
There you'll also find my latest results.

Thanks in advance for any further help :-)

Chris.

* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!)
From: Roger Heflin @ 2006-11-09 21:03 UTC
To: Christoph Anton Mitterer; +Cc: linux-kernel

Christoph Anton Mitterer wrote:
> Roger Heflin wrote:
>> The failure can manifest itself in many ways. I have
>> only seen it as a read failure, but there is no
>> reason why it cannot also show up as a write failure.
>>
>> It should be in the later vanilla kernels; it won't
>> be in the earlier ones. I would do a
>> find /lib/modules -name "*edac*" -ls
>>
>> It is a hw issue: either something is running faster than
>> it should be (PCI bus set too fast for the given hardware/config)
>> or something is broken.
> The strange thing is that it always occurs on the copied data, not
> the original (which is on another disk). But wouldn't those parity errors
> occur in general?
> For example, all my sha1sum -c sumfile checks are working correctly on
> the original disk :/

Are both disks of the same type and connected to the same
hardware?

Or do they have different physical connections/drivers to the
machine?

                          Roger

* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!) 2006-11-09 21:03 ` Roger Heflin @ 2006-11-09 21:11 ` Christoph Anton Mitterer 0 siblings, 0 replies; 24+ messages in thread
From: Christoph Anton Mitterer @ 2006-11-09 21:11 UTC
To: Roger Heflin; +Cc: linux-kernel

Roger Heflin wrote:
> Are both disks of the same type and connected to the same
> hardware?
>
> Or do they have different physical connections/drivers to the
> machine?

The system has 2 DualCore Opteron 275 CPUs on a Tyan S2895 board.

The disk with the original data is a PATA disk from IBM. The disk I copied the data to is a SATA disk.

Over the last few hours I did several diffs between the two disks and experienced what you've described: sometimes there are no differences, and sometimes there are differences (in different files).

But note that the same thing already happened on the SAME disk. In the beginning I copied the data to another place on the same disk, then diffed, and there were the same problems.

So I still wonder why this never affects the original files. When I check the sha512sums there I never get an error.

Right now I'm compiling a new kernel with that module, and praying to god that this is not a hardware error :/
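Editorial note: the check described in this message (hash the same data several times and see whether the result is stable) can be sketched in a few lines. This is only an illustration under stated assumptions, not a tool anyone in the thread used; the function names `sha512_of` and `repeated_digests` are hypothetical.

```python
import hashlib
from collections import Counter


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repeated_digests(path, rounds=5):
    """Hash the same file `rounds` times and count how often each digest occurs.

    Exactly one distinct digest is the expected result. More than one means
    the bytes delivered to the process changed between reads.
    """
    return Counter(sha512_of(path) for _ in range(rounds))
```

A caveat on interpreting the result: reads after the first are likely to be served from the file cache, so a stable digest within one run does not exercise the disk path again. Repeating the check after a reboot, as described elsewhere in the thread, covers the case of an empty cache.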
* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!) 2006-11-07 18:55 ` OGAWA Hirofumi 2006-11-07 21:32 ` Christoph Anton Mitterer 2006-11-09 18:32 ` Christoph Anton Mitterer @ 2006-11-09 22:23 ` Christoph Anton Mitterer 2006-11-10 1:49 ` OGAWA Hirofumi 2 siblings, 1 reply; 24+ messages in thread
From: Christoph Anton Mitterer @ 2006-11-09 22:23 UTC
To: OGAWA Hirofumi; +Cc: linux-kernel, Roger Heflin

OGAWA Hirofumi wrote:
> Christoph Anton Mitterer <calestyo@scientia.net> writes:
>
>> The strange thing is that one time the differences were found directly
>> after copying (thus one would think RAM is damaged, because the data was
>> probably (I cannot tell this for sure) taken from file cache),
>> and the other time after restarting with a certainly empty file cache.
>>
>> Any ideas? I'm willing to help debugging and so on but I must admit that
>> I need someone to tell me what to do :D
>>
>
> bit interesting. Could you send the output of diff? I'd like to see
> how it's breaking.
>

I now have such a diff, but where should I send it? It's quite big (21266 bytes).

Regards,
Chris.
* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!) 2006-11-09 22:23 ` Christoph Anton Mitterer @ 2006-11-10 1:49 ` OGAWA Hirofumi 2006-11-10 2:55 ` Christoph Anton Mitterer 0 siblings, 1 reply; 24+ messages in thread
From: OGAWA Hirofumi @ 2006-11-10 1:49 UTC
To: Christoph Anton Mitterer; +Cc: linux-kernel, Roger Heflin

Christoph Anton Mitterer <calestyo@scientia.net> writes:

> OGAWA Hirofumi wrote:
>> Christoph Anton Mitterer <calestyo@scientia.net> writes:
>>
>>> The strange thing is that one time the differences were found directly
>>> after copying (thus one would think RAM is damaged, because the data was
>>> probably (I cannot tell this for sure) taken from file cache),
>>> and the other time after restarting with a certainly empty file cache.
>>>
>>> Any ideas? I'm willing to help debugging and so on but I must admit that
>>> I need someone to tell me what to do :D
>>>
>>
>> bit interesting. Could you send the output of diff? I'd like to see
>> how it's breaking.
>>
> I now have such a diff, but where should I send it? It's quite big
> (21266 bytes).

I think it's not so big. If you care, please send it to me.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
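Editorial note: for binary files such as extracted audio, a compact way to show "how it's breaking" is a list of the byte ranges that differ between the original and the copy. The following is a minimal sketch, assuming both files fit the plain read-and-compare approach; the function name `differing_ranges` is hypothetical and this is not the diff that was exchanged in the thread.

```python
def differing_ranges(path_a, path_b, chunk_size=1 << 16):
    """Return a list of (start, end) byte offsets, end exclusive, where the
    two files differ. If one file is longer, the extra tail counts as a
    difference."""
    ranges = []
    run_start = None  # start offset of the currently open differing run
    offset = 0        # absolute offset of the current chunk
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            a = file_a.read(chunk_size)
            b = file_b.read(chunk_size)
            if not a and not b:
                break
            if a == b:
                # Fast path: identical chunk, so any open run ends here.
                if run_start is not None:
                    ranges.append((run_start, offset))
                    run_start = None
                offset += len(a)
                continue
            length = max(len(a), len(b))
            for i in range(length):
                same = i < len(a) and i < len(b) and a[i] == b[i]
                if not same and run_start is None:
                    run_start = offset + i
                elif same and run_start is not None:
                    ranges.append((run_start, offset + i))
                    run_start = None
            offset += length
    if run_start is not None:
        ranges.append((run_start, offset))
    return ranges
```

The length and alignment of the reported ranges can be a useful hint when reporting such a problem: isolated single bytes and whole aligned blocks tend to point at different layers, although that interpretation is only a rule of thumb.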
* Re: Strange write errors on FAT32 partition (maybe an FAT32 bug?!) 2006-11-10 1:49 ` OGAWA Hirofumi @ 2006-11-10 2:55 ` Christoph Anton Mitterer 0 siblings, 0 replies; 24+ messages in thread
From: Christoph Anton Mitterer @ 2006-11-10 2:55 UTC
To: OGAWA Hirofumi; +Cc: linux-kernel

OGAWA Hirofumi wrote:
>> I now have such a diff, but where should I send it? It's quite big
>> (21266 bytes).
>>
>
> I think it's not so big. If you care, please send it to me.
>

Sorry, this must wait until Monday. I'm away over the weekend (but I'm available via email) and the file I had got lost, but I will "create" a new one on Monday.

Regards,
Chris.
Thread overview: 24+ messages
2006-11-07 15:21 Strange write errors on FAT32 partition (maybe an FAT32 bug?!) Christoph Anton Mitterer
2006-11-07 18:55 ` OGAWA Hirofumi
2006-11-07 21:32 ` Christoph Anton Mitterer
2006-11-09 18:32 ` Christoph Anton Mitterer
[not found] ` <45539188.5080607@atipa.com>
2006-11-09 20:45 ` Christoph Anton Mitterer
2006-11-09 20:54 ` Roger Heflin
2006-11-09 20:59 ` Christoph Anton Mitterer
2006-11-09 21:02 ` Roger Heflin
2006-11-09 21:57 ` Christoph Anton Mitterer
2006-11-09 22:02 ` Roger Heflin
2006-11-09 22:08 ` Christoph Anton Mitterer
2006-11-09 22:14 ` Roger Heflin
2006-11-09 22:24 ` Christoph Anton Mitterer
2006-11-09 22:35 ` Roger Heflin
2006-11-09 22:38 ` Christoph Anton Mitterer
2006-11-09 22:42 ` Roger Heflin
2006-11-10 0:45 ` Christoph Anton Mitterer
2006-11-10 10:28 ` Alan Cox
2006-11-11 16:01 ` Christoph Anton Mitterer
2006-11-09 21:03 ` Roger Heflin
2006-11-09 21:11 ` Christoph Anton Mitterer
2006-11-09 22:23 ` Christoph Anton Mitterer
2006-11-10 1:49 ` OGAWA Hirofumi
2006-11-10 2:55 ` Christoph Anton Mitterer