* (root cause found....may be not)64k Page size + ext3 errors
@ 2008-08-02 0:07 tirumalareddy marri
2008-08-03 11:00 ` Peter Grandi
0 siblings, 1 reply; 2+ messages in thread
From: tirumalareddy marri @ 2008-08-02 0:07 UTC (permalink / raw)
To: Roger Heflin; +Cc: linux-raid, linux-ext4
After a lot of debugging and dumping of file system information, I found that the superblock is being corrupted during SATA DMA transfers. I am using a PCI-E based SATA card to attach the hard disks. It looks like SATA DMA is stressed much more with a 64k page size than with a 4k page size. I used another SATA card which is more stable (it does not use libata). It worked fine with RAID-5 and a 64k page size.
I have used a small C program to create a 2GB file, read it back and check the data for consistency. So far no errors have been found. I also ran the IO Meter test, which worked fine too.
Thank you all very much for the suggestions and responses.
Regards,
Marri
----- Original Message ----
From: Roger Heflin <rogerheflin@gmail.com>
To: tirumalareddy marri <tirumalareddymarri@yahoo.com>
Cc: linux-raid@vger.kernel.org
Sent: Monday, July 28, 2008 5:33:34 PM
Subject: Re: 64k Page size + ext3 errors
tirumalareddy marri wrote:
> Hi Roger,
> I did a sync after I copied the 128MB of data. Shouldn't that guarantee that the data is flushed to disk? I am using the "sum" command to check whether the data file copied to disk is valid or not.
It means it will be flushed to disk; it does not mean that when you read it back
it will come off disk. If it is still in memory then it will come out of
memory, and it can still be wrong on disk. If you don't want to do a more
complicated test, it might be best to create the file, checksum it and, if it is
ok, umount the device, remount it and checksum it again; this should at least
force it to come off of disk again.
How much memory does your test machine have?
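The remount check described above could be sketched roughly as follows. This is only an illustration, not the procedure anyone in this thread actually ran: it uses SHA-256 rather than the "sum" command, the function names are made up for the example, and the device and mount point are placeholders the caller has to supply (umount/mount normally need root).

```python
import hashlib
import subprocess


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at path, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_survives_remount(path, device, mount_point):
    """Checksum path, remount the file system, then checksum it again.

    device and mount_point are hypothetical placeholders (for example
    "/dev/md0" and wherever it is mounted). Returns a tuple
    (matches, digest_before, digest_after).
    """
    before = file_digest(path)
    subprocess.run(["sync"], check=True)
    # Unmounting drops the cached pages for this file system, so the
    # second read has to come from the disks rather than from memory.
    subprocess.run(["umount", mount_point], check=True)
    subprocess.run(["mount", device, mount_point], check=True)
    after = file_digest(path)
    return before == after, before, after
```

If the two digests differ, the copy in memory was good and the copy on disk is not, which is the case a checksum taken right after the write cannot see.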
> Here is more information.
> setup: Created /dev/md0 of 30GB size, created an ext3 file system. Then started a SAMBA server to export the mounted /dev/md0 to a Windows machine to run IO and copy files.
> 4K Page size:
> -------------------
> 1. IO Meter Test: Works just fine.
None of the benchmarks I am familiar with actually confirm that the data is
good; the only way one of the benchmarks will fail is if the file table gets
corrupted, and they may run entirely in cache.
> 2. Copied 1.8 GB file and check sum is good.
> 3. Performance is not good because of small page size.
> 16k Page size:
> ---------------------
> 1. RAID-5 fails sometimes with "Attempt to access beyond the end of device"
> 2. Copied 128MB and 385MB files. Checked the checksums; they match.
> 3. Copied 1.8 GB file; this failed the checksum test using the "sum" command. I see "EXT3-fs errors".
> 64K Page size:
> ----------------------
> 1. RAID-5 fails sometimes with "Attempt to access beyond the end of device"
> 2. Able to copy 128MB of data, and the checksum test passed.
> 3. Copying the 385MB and 1.8 GB files fails with EXT3-fs errors.
> Thanks,
> Marri
I would write a specific pattern directly to /dev/mdx (a stream of binary
numbers from 1 ... whatever works fine), and then read that back and see how
things match or don't. csum *can* fail, and if you have enough memory then any
corruption actually on disk *WON'T* be found until something causes it to be
ejected from cache and later re-read from disk.
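A minimal sketch of that pattern test, assuming a stream of 64-bit big-endian counters starting at 1 and a 4096-byte block size (both are arbitrary choices for the example, as are the function names). Note that pointing it at a real device such as /dev/mdx overwrites whatever is on that device.

```python
import os
import struct

BLOCK_SIZE = 4096  # bytes per block; an arbitrary choice for this sketch


def make_block(index, block_size=BLOCK_SIZE):
    """Return block number index, filled with consecutive 64-bit counters."""
    words = block_size // 8
    first = index * words + 1  # the overall stream counts 1, 2, 3, ...
    return struct.pack(">%dQ" % words, *range(first, first + words))


def write_pattern(path, num_blocks, block_size=BLOCK_SIZE):
    """Write num_blocks pattern blocks to path and ask the kernel to flush them."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    with os.fdopen(fd, "wb") as f:
        for i in range(num_blocks):
            f.write(make_block(i, block_size))
        f.flush()
        os.fsync(f.fileno())


def first_bad_block(path, num_blocks, block_size=BLOCK_SIZE):
    """Return the index of the first block that differs from the pattern, or None."""
    with open(path, "rb") as f:
        for i in range(num_blocks):
            if f.read(block_size) != make_block(i, block_size):
                return i
    return None
```

Because every block encodes its own position, a mismatch also shows where the damage is and whether data landed at the wrong offset. As written, the read-back may still be served from the page cache, so it only says something about the disks if the cache has been emptied in between (for example by writing more data than the machine has memory, or by stopping and restarting the array).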
Roger
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: (root cause found....may be not)64k Page size + ext3 errors
From: Peter Grandi @ 2008-08-03 11:00 UTC (permalink / raw)
To: Linux RAID
[ ... ]
> After a lot of debugging and dumping of file system information,
> I found that the superblock is being corrupted during SATA DMA
> transfers. I am using a PCI-E based SATA card to attach the hard
> disks. It looks like SATA DMA is stressed much more with a 64k
> page size than with a 4k page size.
Well, that depends quite a bit on a lot of factors, but if for
example a 64KiB IO block size means it is transferred as one PCI
transaction, things will likely be bad indeed.
> I used another SATA card which is more stable (it does not use
> libata). It worked fine with RAID-5 and a 64k page size.
It would be interesting to see if the PCI latency timers for the
two cards default to the same value or different values. Quite a
few (GPU and other) cards set high latency timer values to game
benchmarks.
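One way to compare the two cards, sketched under the assumption of a Linux system that exposes PCI configuration space through sysfs: the Latency Timer is the byte at offset 0x0D of the standard configuration header. The function names and the example address below are made up for illustration.

```python
import os

LATENCY_TIMER_OFFSET = 0x0D  # Latency Timer byte in the standard PCI config header


def latency_timer(config):
    """Extract the Latency Timer value from raw configuration-space bytes."""
    if len(config) <= LATENCY_TIMER_OFFSET:
        raise ValueError("configuration space dump is too short")
    return config[LATENCY_TIMER_OFFSET]


def read_latency_timer(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Read the Latency Timer of one device through its sysfs config file.

    pci_address is a placeholder of the form "0000:01:00.0"; use the
    address that lspci reports for the SATA card in question.
    """
    config_path = os.path.join(sysfs_root, pci_address, "config")
    with open(config_path, "rb") as f:
        return latency_timer(f.read(64))
```

Calling read_latency_timer() once for each card's address gives the two values to compare; "lspci -v" reports the same figure in its Flags line.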
> I have used a small C program to create a 2GB file, read it
> back and check the data for consistency. So far no errors have
> been found. I also ran the IO Meter test, which worked fine
> too. [ ... ]
This very interesting paper from CERN on silent corruptions may
be relevant too.
https://indico.desy.de/contributionDisplay.py?contribId=65&sessionId=42&confId=257