linux-fsdevel.vger.kernel.org archive mirror
* hfsplus corruption, failed fsck, journalling and zero'ing extent record on delete
@ 2012-09-24  6:53 Hin-Tak Leung
  2012-09-24  7:30 ` hfsplus BUG: Bad page state in process du pfn:07759 (Re: hfsplus corruption, failed fsck, journalling and zero'ing extent record on delete) Hin-Tak Leung
  0 siblings, 1 reply; 8+ messages in thread
From: Hin-Tak Leung @ 2012-09-24  6:53 UTC (permalink / raw)
  To: Vyacheslav Dubeyko, linux-fsdevel
  Cc: Till Kamppeter, Naohiro Aota, Matthew Garrett

Hi Vyacheslav,

I mentioned briefly some days ago that I managed to corrupt an HFS+ partition while experimenting with the journalling code, to the extent that fsck_hfs/fsck.hfsplus (Apple's diskdev_cmds tool) refuses to fix it. That partition, used read-only with the unmodified module, can get the kernel to BUG() "reliably" just by running "du" on it (and I was wondering whether BUG()'ing on a corrupted disk is itself a bug to file...).

With a lot of reading-up, some C, some python and, in the end, some dd if=/of= and a hex editor with a calculator, I managed to get fsck to run successfully again. So now I have some idea of how to stress HFS+ to the point where fsck refuses to fix it. The recipe is something like this:

- a disk with a lot of small files and quite full. (I have a 105 GB partition, 75% full, with 600,000 leaf records in the catalog btree, or 400,000 inodes, depending on how you count... untar'ing a few kernel trees under it should do)

- try to delete the small files one by one very quickly. (I did essentially
    cat list | perl -ne 'chomp; if (-f $_) {unlink $_;}'
  after comparing the netgear code with the stock kernel's and generating an "uninteresting" file list.)

- probably an SMP system + a reasonable amount of memory for disk cache (dual core + 2GB RAM).
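For anyone who wants to repeat the deletion step without perl, here is a minimal Python sketch of the same loop. The function name unlink_listed_files and the one-path-per-line list format are assumptions for illustration only. It does nothing more than the one-liner above: test that each path is a regular file, then unlink it.

```python
import os


def unlink_listed_files(list_path):
    """Unlink every regular file named in list_path (one path per line).

    Mirrors the perl one-liner: `-f` becomes os.path.isfile(), and
    `unlink` becomes os.unlink(). Returns the number of files removed.
    """
    removed = 0
    # surrogateescape keeps odd (non-UTF-8) file names round-trippable
    with open(list_path, "r", encoding="utf-8", errors="surrogateescape") as listing:
        for line in listing:
            path = line.rstrip("\n")
            if path and os.path.isfile(path):
                os.unlink(path)
                removed += 1
    return removed
```

Calling unlink_listed_files("list") corresponds to the "cat list | perl ..." pipeline. The deletions are issued back to back with no sync in between, which is the point of the stress test.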

Under that combination of conditions, it may be possible to stress HFS+ in a way such that:

1. the Catalog B-Tree needs to be substantially re-written/re-located, rather than being updated in-place. i.e. a large number of changes to leaf records in a short time.

2. the re-written/re-located part of the Catalog B-Tree needs to re-use the extents which were recently "vacated" by the deletion. i.e. a fairly full disk is needed to see this.

It seems that when files are deleted, leaf records are made only *partially* invalid, and a partially up-to-date new Catalog B-Tree is written, and then further updates happen in-place to bring the whole thing (the extent bitmap & volume headers, etc) into a consistent state ... but in my case, for whatever reason, things were interrupted in the middle.

So I had a new Catalog B-Tree sitting on the same extents that the partially deleted file records still claimed. fsck thinks the files need to be "undeleted", but cannot read the B-Tree without error on those partially invalid leaf records, and cannot fix either of them.
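The overlap condition can be sketched as below. This is only an illustration, not the script actually used: extents are assumed to be (start_block, block_count) pairs already parsed out of the volume, and the names extents_overlap and find_overlapping_records are made up for this example.

```python
def extents_overlap(a, b):
    """Return True if two (start_block, block_count) extents share a block."""
    a_start, a_count = a
    b_start, b_count = b
    if a_count == 0 or b_count == 0:
        return False  # an empty extent descriptor cannot overlap anything
    return a_start < b_start + b_count and b_start < a_start + a_count


def find_overlapping_records(catalog_extents, file_records):
    """List the ids of file records whose extents collide with the catalog.

    catalog_extents: extents occupied by the Catalog B-Tree itself.
    file_records: dict mapping a record id to that record's extent list.
    """
    hits = []
    for record_id, extents in file_records.items():
        if any(extents_overlap(e, c) for e in extents for c in catalog_extents):
            hits.append(record_id)
    return hits
```

Any record this returns is one that still claims blocks now occupied by the Catalog B-Tree, i.e. a candidate for the kind of stale, partially deleted record described above.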

I pieced together the Catalog B-Tree (in 3 fragments - actually it was 4 to begin with; fsck in rebuild-Catalog mode gave me a new one which is "differently" broken - i.e. it overlaps with another set of ~140 partially deleted records), found all the overlapping leaf records - ~140 of them in 17 leaf nodes - used a hex editor to zero the extents and file sizes by hand, and voila, fsck was a lot happier afterwards.
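What the hex-editor step amounts to can be sketched in Python. The offsets below are my reading of the HFSPlusCatalogFile layout in Apple's TN1150 (a 248-byte big-endian record with the data fork at byte 88 and the resource fork at byte 168, each fork an 80-byte HFSPlusForkData) and should be checked against the spec before touching a real disk. The function name and the include_resource_fork switch are hypothetical. Note that clearing the whole fork structure also clears clumpSize and totalBlocks, which is slightly more than "extents and file sizes".

```python
import struct

# Assumed layout (from TN1150, to be verified): offsets are relative to the
# start of the record data, i.e. after the catalog key in the leaf node.
FILE_RECORD_TYPE = 0x0002      # kHFSPlusFileRecord
FILE_RECORD_SIZE = 248         # sizeof(HFSPlusCatalogFile)
DATA_FORK_OFFSET = 88
RESOURCE_FORK_OFFSET = 168
FORK_DATA_SIZE = 80            # logicalSize, clumpSize, totalBlocks, 8 extents


def zero_fork_data(record, include_resource_fork=False):
    """Return a copy of a catalog file record with its fork data zeroed."""
    if len(record) < FILE_RECORD_SIZE:
        raise ValueError("record too short for a catalog file record")
    (record_type,) = struct.unpack_from(">h", record, 0)
    if record_type != FILE_RECORD_TYPE:
        raise ValueError("not a file record (type %#06x)" % record_type)

    offsets = [DATA_FORK_OFFSET]
    if include_resource_fork:
        offsets.append(RESOURCE_FORK_OFFSET)

    patched = bytearray(record)
    for offset in offsets:
        patched[offset:offset + FORK_DATA_SIZE] = bytes(FORK_DATA_SIZE)
    return bytes(patched)
```

The function works on an in-memory copy of one record. Locating the record inside the leaf node and writing the patched bytes back (the dd part) are deliberately left out.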

- the linux hfsplus driver probably *should* zero the corresponding extent descriptor in the leaf record when a file is deleted?
I seem to remember years ago, between ext2 and ext3, one notable/advertised difference of ext3 was that ext3 zeroes inodes on delete (and makes low-level data recovery difficult) - and there was a reason for it... I should read that up... disk formatting & file deleting under Mac OS X seem to take much longer, compared to under linux - do they zero records *fully* on format/delete?

- whether this possibility of corruption is related to the experimental journalling code - it does work correctly under light use - i.e. fsck is fully happy after unmount.

- HFS+ is probably one of the rare minority of file systems where critical parts of it, the Catalog B-Tree (and the other 3-4(?) B-Trees), are regular files and subject to the same fragmentation and competition from normal file usage?! (instead of being in "dedicated" allocated areas, and also having multiple copies).

- oh, one last thing: there was one later version of the journalling code from netgear, which copied a lot of files from ext3 (the jbd part). Maybe they know about HFS+ needing a kernel daemon to do more regular syncs to disk than others...

More experiments... fixing things which fsck cannot, makes experimenting easier...

Hin-Tak



end of thread, other threads:[~2012-10-03 10:51 UTC | newest]

Thread overview: 8+ messages
2012-09-24  6:53 hfsplus corruption, failed fsck, journalling and zero'ing extent record on delete Hin-Tak Leung
2012-09-24  7:30 ` hfsplus BUG: Bad page state in process du pfn:07759 (Re: hfsplus corruption, failed fsck, journalling and zero'ing extent record on delete) Hin-Tak Leung
2012-09-24 10:35   ` Vyacheslav Dubeyko
2012-09-24 17:43     ` Hin-Tak Leung
2012-09-24 19:03       ` Vyacheslav Dubeyko
2012-09-24 20:10         ` Hin-Tak Leung
2012-10-01 19:09   ` Vyacheslav Dubeyko
2012-10-03 10:45     ` Hin-Tak Leung
