* ext4 filesystem corruption across partitions
From: Devrin Talen @ 2014-04-17 15:05 UTC
To: linux-ext4

Hi all,

I'm debugging an issue on my platform. In short, I can corrupt an ext4
filesystem on one partition by writing a file on a different one. I'm
suspecting something is off either with my partition table or filesystem
parameters, but I'm such an ext4 beginner that I thought I'd start here
to get some help in where to look.

If I run this (which writes a relatively large file to partition 12):

    dd if=/dev/zero of=/cache/goingtodie bs=4096 count=120000

then (after rebooting) I get an ext4 error like this on partition 13:

    EXT4-fs error (device mmcblk0p13): ext4_readdir:214: inode #102545: block 426479: comm er.ServerThread: path /data/app-private: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, na0

My Android system runs a slightly modified 3.0.31 kernel with a 4GB eMMC
as the block device. My partition table is set up as follows (located in
the first 34 blocks):

    lba size = 512
    lba_start   partition_size       name
    =========   ==================   ==============
           34        97280 (  95K)   environment
          224        16384 (  16K)   crypto
          256       393216 ( 384K)   xloader
         1024       524288 ( 512K)   bootloader
         2048       524288 ( 512K)   device_info
         3072       524288 ( 512K)   bootloader2
         4096       524288 ( 512K)   misc
         5120      8388608 (   8M)   recovery
        21504      8388608 (   8M)   boot
        37888     16777216 (  16M)   efs
        70656   1073741824 (1024M)   system
      2167808    536870912 ( 512M)   cache
      3216384   2195193856 (2093M)   userdata
    =========   ==================   ==============

The two filesystems in question are the cache and userdata partitions.
These are created with the following make_ext4fs[1] commands and then
flashed to eMMC:

    % make_ext4fs -s -L cache -l 536870912 cache.img cache
    Creating filesystem with parameters:
        Size: 536870912
        Block size: 4096
        Blocks per group: 32768
        Inodes per group: 8192
        Inode size: 256
        Journal blocks: 2048
        Label: cache
        Blocks: 131072
        Block groups: 4
        Reserved block group size: 31
    Created filesystem with 11/32768 inodes and 4206/131072 blocks

    % make_ext4fs -s -l 2143744K -a data userdata.img data/
    Creating filesystem with parameters:
        Size: 2195193856
        Block size: 4096
        Blocks per group: 32768
        Inodes per group: 7888
        Inode size: 256
        Journal blocks: 8374
        Label:
        Blocks: 535936
        Block groups: 17
        Reserved block group size: 135
    Created filesystem with 11/134096 inodes and 17614/535936 blocks

Any help is very much appreciated. Does anyone see anything amiss, or
anything I should look into? If any more information is needed, just let
me know. Or if you think there's a better mailing list for me to take
this to, I can do that too.

[1]: https://android.googlesource.com/platform/system/extras/+/fb109b894a5fc2891e49ec8e81c0dda171b45b7f/ext4_utils/make_ext4fs_main.c

--
Devrin Talen <dct23@cornell.edu>
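The make_ext4fs parameters in the first message can be cross-checked with plain arithmetic: the block-group count is the block count divided by blocks per group, rounded up (the last group may be partial); the total inode count is groups times inodes per group; and blocks times block size should equal the partition size from the table. The following C sketch hard-codes the numbers quoted above; `struct fs_geom` and the helper names are illustrative assumptions, not part of make_ext4fs.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Values copied from the make_ext4fs output (illustrative struct). */
struct fs_geom {
    const char *name;
    uint64_t partition_bytes;  /* partition_size column of the table */
    uint64_t block_size;       /* "Block size" */
    uint64_t blocks;           /* "Blocks" */
    uint64_t blocks_per_group; /* "Blocks per group" */
    uint64_t inodes_per_group; /* "Inodes per group" */
};

static const struct fs_geom cache_fs = {
    "cache", 536870912ULL, 4096, 131072, 32768, 8192
};
static const struct fs_geom userdata_fs = {
    "userdata", 2195193856ULL, 4096, 535936, 32768, 7888
};

/* The last block group may be partial, so round the division up. */
static uint64_t block_groups(const struct fs_geom *g)
{
    assert(g->blocks_per_group != 0);
    return (g->blocks + g->blocks_per_group - 1) / g->blocks_per_group;
}

/* Every block group carries the same number of inodes. */
static uint64_t total_inodes(const struct fs_geom *g)
{
    return block_groups(g) * g->inodes_per_group;
}

/* 1 if the filesystem exactly fills its partition, 0 otherwise. */
static int fills_partition(const struct fs_geom *g)
{
    return g->blocks * g->block_size == g->partition_bytes;
}

static void report_geom(const struct fs_geom *g)
{
    printf("%s: %" PRIu64 " groups, %" PRIu64 " inodes, %s its partition\n",
           g->name, block_groups(g), total_inodes(g),
           fills_partition(g) ? "exactly fills" : "does NOT match");
}
```

Calling `report_geom(&cache_fs)` and `report_geom(&userdata_fs)` from a `main()` should report 4 groups with 32768 inodes and 17 groups with 134096 inodes, both exactly filling their partitions, which agrees with the make_ext4fs output. For scale, the dd reproducer writes 120000 × 4096 = 491,520,000 bytes, roughly 92% of the 536,870,912-byte cache partition, so it nearly fills /cache.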
* Re: ext4 filesystem corruption across partitions
From: Theodore Ts'o @ 2014-04-17 16:12 UTC
To: Devrin Talen; +Cc: linux-ext4

On Thu, Apr 17, 2014 at 11:05:23AM -0400, Devrin Talen wrote:
> Hi all,
>
> I'm debugging an issue on my platform. In short, I can corrupt an ext4
> filesystem on one partition by writing a file on a different one. I'm
> suspecting something is off either with my partition table or filesystem
> parameters, but I'm such an ext4 beginner that I thought I'd start here
> to get some help in where to look.

The partition table looks fine. (What I did was to take the lba_start
and partition_size fields from your table, imported them into a
spreadsheet, and then verified that "lba_start + partition_size/512"
for each partition was the same as the lba_start of the next
partition. Obviously, there is no partition table overlap.)

The kernel is supposed to make sure that writes in one partition can't
affect another partition, so either you have a kernel bug in the block
device layer or driver, or you have a hardware problem.

I hate to ask this, but are you sure you have a quality 4GB SD card?
There are fraudulent cards out there where a card will be marked as
having X GB, but it only really has Y GB, or even Y MB worth of flash.
The people making these fraudulent cards rely on the fact that very
often people don't actually fill up their flash cards, so as long as
they don't write to more than Y GB worth of disk sectors, they won't
notice anything wrong. But if you do write to more sectors than there
is flash, then the (N+I)th unique disk sector write (where N is the
number of sectors the card really has) ends up going to the Ith disk
sector that had been written.

- Ted
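Ted's spreadsheet check can also be expressed as a short C sketch: for each partition, lba_start + partition_size/512 must equal the next partition's lba_start; a larger value would mean an overlap and a smaller one a gap. The table is hard-coded from the original report, and `struct part`, `first_mismatch`, and `table_end_lba` are hypothetical names. The end-of-table LBA is an extra check not discussed in the thread: it should not exceed the device's size in 512-byte sectors (on Linux, `/sys/block/mmcblk0/size` reports that number).

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

#define LBA_SIZE 512u

struct part {
    uint64_t lba_start;  /* first LBA of the partition */
    uint64_t size_bytes; /* partition size in bytes */
    const char *name;
};

/* Partition table from the original report. */
static const struct part table[] = {
    {34, 97280, "environment"},
    {224, 16384, "crypto"},
    {256, 393216, "xloader"},
    {1024, 524288, "bootloader"},
    {2048, 524288, "device_info"},
    {3072, 524288, "bootloader2"},
    {4096, 524288, "misc"},
    {5120, 8388608, "recovery"},
    {21504, 8388608, "boot"},
    {37888, 16777216, "efs"},
    {70656, 1073741824ULL, "system"},
    {2167808, 536870912ULL, "cache"},
    {3216384, 2195193856ULL, "userdata"},
};
#define NPARTS (sizeof table / sizeof table[0])

/* One past the last LBA used by partition p. */
static uint64_t end_lba(const struct part *p)
{
    assert(p->size_bytes % LBA_SIZE == 0);
    return p->lba_start + p->size_bytes / LBA_SIZE;
}

/*
 * Index of the first partition that does not end exactly where the next
 * one starts (gap or overlap), or -1 if the whole table is contiguous.
 */
static int first_mismatch(const struct part *p, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        uint64_t end = end_lba(&p[i]);
        if (end != p[i + 1].lba_start) {
            printf("%s ends at LBA %" PRIu64 " but %s starts at %" PRIu64
                   " (%s)\n", p[i].name, end, p[i + 1].name,
                   p[i + 1].lba_start,
                   end > p[i + 1].lba_start ? "overlap" : "gap");
            return (int)i;
        }
    }
    return -1;
}

/* One past the last LBA used by the table; compare with the device size. */
static uint64_t table_end_lba(const struct part *p, size_t n)
{
    return n ? end_lba(&p[n - 1]) : 0;
}
```

For this table, `first_mismatch(table, NPARTS)` prints nothing and returns -1, and `table_end_lba(table, NPARTS)` is 7503872, i.e. about 3.58 GiB of the nominally 4GB part.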
* Re: ext4 filesystem corruption across partitions
From: Devrin Talen @ 2014-05-06 2:01 UTC
To: Theodore Ts'o; +Cc: linux-ext4

On Thu, 17 Apr 2014 12:12:49 -0400
"Theodore Ts'o" <tytso@mit.edu> wrote:

> On Thu, Apr 17, 2014 at 11:05:23AM -0400, Devrin Talen wrote:
> > Hi all,
> >
> > I'm debugging an issue on my platform. In short, I can corrupt an
> > ext4 filesystem on one partition by writing a file on a different
> > one. I'm suspecting something is off either with my partition
> > table or filesystem parameters, but I'm such an ext4 beginner that
> > I thought I'd start here to get some help in where to look.
>
> The partition table looks fine. (What I did was to take the lba_start
> and partition_size fields from your table, imported them into a
> spreadsheet, and then verified that "lba_start + partition_size/512"
> for each partition was the same as the lba_start of the next
> partition. Obviously, there is no partition table overlap.)

Ted, thanks for the response. I wanted to reply sooner, but I had to make
sure I had a good way to reproduce the filesystem corruption before
getting back to you.

As for the partition table, that's what I thought too, but it helps to
have a second pair of eyes on it. Thanks!

> The kernel is supposed to make sure that writes in one partition can't
> affect another partition, so either you have a kernel bug in the block
> device layer or driver, or you have a hardware problem.

That could be. We're fairly certain it's not electrical, just because of
how simple the hookups to our CPU are, but it wouldn't be surprising if
there's some setting on the eMMC part that we're missing.

Anyway, here's how we've been able to reproduce this fairly reliably:

1. Run `ls -R *` in a loop from the root directory.
   The root is mounted from partition 11 (system) on the eMMC, and the ls
   will read the /cache (partition 12) and /data (partition 13)
   filesystems as well.
2. Write data to partition 12 via ADB (using `adb push ... /cache/`).

Doing these two things, we get ext4 errors reported on partition 13.
I'll get the exact error messages when I'm back at my desk tomorrow.

Fortunately, we managed to capture the failure while printing out the
trace of eMMC commands from the block driver. It's a large file, but if
someone would find it useful I think I can make it available somehow.

> I hate to ask this, but are you sure you have a quality 4GB SD card?
> There are fraudulent cards out there where a card will be marked as
> having X GB, but it only really has Y GB, or even Y MB worth of flash.
> The people making these fraudulent cards rely on the fact that very
> often people don't actually fill up their flash cards, so as long as
> they don't write to more than Y GB worth of disk sectors, they won't
> notice anything wrong. But if you do write to more sectors than there
> is flash, then the (N+I)th unique disk sector write ends up going to the
> Ith disk sector that had been written.

That's a good point, but we're actually using a Micron eMMC part soldered
to our board, so it had better be as big as they advertise :).

Again, thanks for the help so far, and I hope we can track this down.

--
Devrin Talen <dct23@cornell.edu>
* Re: ext4 filesystem corruption across partitions
From: Theodore Ts'o @ 2014-05-06 19:40 UTC
To: Devrin Talen; +Cc: linux-ext4

On Mon, May 05, 2014 at 10:01:30PM -0400, Devrin Talen wrote:
>
> 1. Run `ls -R *` in a loop from the root directory. The root is
> mounted from partition 11 (system) on the eMMC and the ls will read
> the /cache (partition 12) and /data (partition 13) filesystems as well.

Try mounting /data read-only. That should pretty much guarantee that
nothing is able to write to it. You can also use blktrace to capture
block I/O traces to the device, and use those to make sure nothing was
actually writing to it.

> 2. Write data to partition 12 via ADB (using `adb push ... /cache/`)

Instead of using ADB, I would suggest writing a test program which
writes a series of 512 byte sectors to a single large file in /cache.
At the beginning of each 512 byte sector include a 4 byte serial
number (which is incremented by one for each sector), a 4 byte testID
which is different for each run of your test program, a time stamp, a
CRC of these fields, and then fill the rest of the sector with some
text string to make it easy to recognize this pattern. It can be
anything from 0xDEADBEEF, to a string such as "DEBUGGING RANDOM HW
BUGS REALLY SUCKS". :-)

Now try to reproduce the problem with this write load. If you can
reproduce the problem, check and see if the corrupted file system
block shows evidence of the string that was supposed to be written
into /cache, showing up in /data. You can also check that the large
file being written in /cache has the expected serial numbers and
checksums.

This will allow you to see if the block writes are just going to the
wrong place on the SSD, or if something else more strange might be
going on.
Depending on the pattern of what blocks are ending up where they
shouldn't, it might point towards different possible causes (e.g., a
flaky solder joint, a buggy flash translation layer in the eMMC chip,
etc.)

Cheers,

- Ted
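Ted's suggested write load can be sketched in C roughly as follows. Ted only fixes the 4-byte serial number and the 4-byte test ID; the rest of this layout is an assumption: an 8-byte little-endian Unix timestamp, a standard CRC-32 over those first 16 bytes, and his example string repeated over the remaining 492 bytes. `fill_sector` and `write_pattern_file` are hypothetical helpers, not an existing tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

#define SECTOR_SIZE 512
#define HDR_SIZE 20 /* serial(4) + test id(4) + timestamp(8) + crc32(4) */

static const char MARKER[] = "DEBUGGING RANDOM HW BUGS REALLY SUCKS ";
static_assert(HDR_SIZE + sizeof MARKER - 1 <= SECTOR_SIZE,
              "marker must fit inside one sector");

static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Standard reflected CRC-32 (polynomial 0xEDB88320), bitwise for brevity. */
static uint32_t crc32_buf(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Build one self-describing 512-byte sector. */
static void fill_sector(uint8_t sec[SECTOR_SIZE], uint32_t serial,
                        uint32_t test_id, uint64_t timestamp)
{
    put_le32(sec, serial);
    put_le32(sec + 4, test_id);
    put_le32(sec + 8, (uint32_t)timestamp);
    put_le32(sec + 12, (uint32_t)(timestamp >> 32));
    put_le32(sec + 16, crc32_buf(sec, 16)); /* CRC of the 16 bytes above */
    for (size_t i = HDR_SIZE; i < SECTOR_SIZE; i++)
        sec[i] = (uint8_t)MARKER[(i - HDR_SIZE) % (sizeof MARKER - 1)];
}

/* Write sectors with serials 0..nsectors-1 to path; 0 on success. */
static int write_pattern_file(const char *path, uint32_t test_id,
                              uint32_t nsectors)
{
    uint8_t sec[SECTOR_SIZE];
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    for (uint32_t serial = 0; serial < nsectors; serial++) {
        fill_sector(sec, serial, test_id, (uint64_t)time(NULL));
        size_t done = 0;
        while (done < SECTOR_SIZE) { /* handle short writes */
            ssize_t n = write(fd, sec + done, SECTOR_SIZE - done);
            if (n < 0 && errno == EINTR)
                continue;
            if (n <= 0) {
                close(fd);
                return -1;
            }
            done += (size_t)n;
        }
    }
    int rc = fsync(fd); /* push the data out to the device */
    if (close(fd) != 0)
        rc = -1;
    return rc == 0 ? 0 : -1;
}
```

For example, `write_pattern_file("/cache/pattern.bin", (uint32_t)time(NULL), 819200)` would write 400 MiB of pattern sectors; the path and the use of the current time as the test ID are only illustrations. One `write()` per sector keeps the code close to Ted's description; buffering many sectors per call would be faster and would produce the same file contents.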
* Re: ext4 filesystem corruption across partitions
From: Andreas Dilger @ 2014-05-06 22:38 UTC
To: Devrin Talen; +Cc: Theodore Ts'o, Ext4 Developers List

On May 6, 2014, at 1:40 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Mon, May 05, 2014 at 10:01:30PM -0400, Devrin Talen wrote:
>> 2. Write data to partition 12 via ADB (using `adb push ... /cache/`)
>
> Instead of using ADB, I would suggest writing a test program which
> writes a series of 512 byte sectors to a single large file in /cache.
> At the beginning of each 512 byte sector include a 4 byte serial
> number (which is incremented by one for each sector), a 4 byte testID
> which is different for each run of your test program, a time stamp, a
> CRC of these fields, and then fill the rest of the sector with some
> text string to make it easy to recognize this pattern. It can be
> anything from 0xDEADBEEF, to a string such as "DEBUGGING RANDOM HW
> BUGS REALLY SUCKS". :-)

We wrote a tool, "llverfs", to do this years ago, for debugging problems
with >16TB LUN sizes and other 64/32-bit address truncation problems:

http://git.hpdd.intel.com/?p=fs/lustre-release.git;a=blob;f=lustre/utils/llverfs.c

It can either partially fill the filesystem (writing all of the files and
then reading them all back, with one write per MB) for a fast test of the
system, or optionally fill the filesystem completely, writing every 4kB
block and then reading it back and verifying the data. Each block contains
the inode number, block offset, and a timestamp (to distinguish between
separate runs) so that it can detect where badly written data is coming
from.
There is a companion tool for doing block-device testing:

http://git.hpdd.intel.com/?p=fs/lustre-release.git;a=blob;f=lustre/utils/llverdev.c

Caveat: we have only ever used llverfs on disposable filesystems, and while
I don't _think_ it will clobber other files that already exist, I've never
tested it that way. Obviously, llverdev overwrites the whole block device,
so it will erase all data on the device it is pointed at.

Cheers, Andreas
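The checking side can be sketched the same way. This is in the spirit of llverfs but not based on its source, and it assumes a hypothetical layout: a 4-byte little-endian serial, a 4-byte test ID, an 8-byte timestamp, a CRC-32 of those 16 bytes, and Ted's example string starting at byte 20 (adjust the constants to match whatever writer you use). In sequential mode it checks the /cache file, where sector i should be a valid pattern sector with serial i; otherwise it scans a raw dump of the damaged /data block, where any pattern sector is data that landed in the wrong place. `check_sector` and `scan_file` are illustrative names.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SECTOR_SIZE 512
#define HDR_SIZE 20 /* serial(4) + test id(4) + timestamp(8) + crc32(4) */

/* Must be byte-for-byte the string the writer used (an assumption). */
static const char MARKER[] = "DEBUGGING RANDOM HW BUGS REALLY SUCKS ";
static_assert(HDR_SIZE + sizeof MARKER - 1 <= SECTOR_SIZE,
              "marker must fit inside one sector");

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Standard reflected CRC-32 (polynomial 0xEDB88320), bitwise for brevity. */
static uint32_t crc32_buf(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/*
 * Classify one sector: 1 = valid pattern sector (serial and test_id are
 * filled in), 0 = no pattern text, -1 = pattern text but header CRC wrong.
 */
static int check_sector(const uint8_t *sec, uint32_t *serial,
                        uint32_t *test_id)
{
    if (memcmp(sec + HDR_SIZE, MARKER, sizeof MARKER - 1) != 0)
        return 0;
    if (get_le32(sec + 16) != crc32_buf(sec, 16))
        return -1;
    *serial = get_le32(sec);
    *test_id = get_le32(sec + 4);
    return 1;
}

/* Returns the number of suspicious sectors, or -1 on an I/O error. */
static long scan_file(const char *path, int expect_sequential)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    uint8_t sec[SECTOR_SIZE];
    long bad = 0;
    for (uint64_t idx = 0; fread(sec, 1, SECTOR_SIZE, f) == SECTOR_SIZE;
         idx++) {
        uint32_t serial = 0, test_id = 0;
        int r = check_sector(sec, &serial, &test_id);
        if (expect_sequential) {
            if (r != 1 || serial != idx) {
                printf("sector %" PRIu64 ": %s\n", idx,
                       r == 0 ? "pattern missing"
                       : r < 0 ? "header CRC mismatch"
                               : "unexpected serial number");
                bad++;
            }
        } else if (r > 0) {
            printf("sector %" PRIu64 ": test %" PRIu32 ", serial %" PRIu32
                   " (meant for byte %" PRIu64 " of the /cache file)\n",
                   idx, test_id, serial, (uint64_t)serial * SECTOR_SIZE);
            bad++;
        } else if (r < 0) {
            printf("sector %" PRIu64 ": pattern text, bad header CRC\n", idx);
            bad++;
        }
    }
    int err = ferror(f);
    fclose(f);
    return err ? -1 : bad;
}
```

A dump of the block named in the EXT4-fs error could be taken with dd, e.g. `dd if=<data partition device> of=blk.bin bs=4096 skip=426479 count=1`, assuming that 426479 is a filesystem block number in 4096-byte units. A nonzero result from `scan_file("blk.bin", 0)` would then show which test run and serial number ended up there; under this layout, serial × 512 is the byte offset that sector was meant for in the /cache file.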