* minixfs bitmaps and associated lossage
From: Al Viro @ 2006-05-06 22:04 UTC
To: linux-kernel; +Cc: linux-fsdevel, Linus Torvalds

Warning: text below is a mild example of software coproarchaeology,
so if you are easily squicked by tangled mess of bugs and dumb lossage,
well... you've been warned.

This particular clusterfsck had begun when AST decided to store the
metadata in host-endian order. All of it; inode numbers in directories,
block numbers in inodes and indirect blocks, etc.

Ugly as it was, it would be more or less straightforward, if not for one
trap - the bitmaps. The rest of metadata had obvious element sizes, so it
was hard to get wrong. However, for bitmaps it was arbitrary. And it does
matter - mapping an array of bits to array of big-endian 16bit and to array
of big-endian 32bit gives different results. We get either
    8-15, 0-7, 24-31, 16-23, ...
or
    24-31, 16-23, 8-15, 0-7, ...
resp. For little-endian we get the same thing, though. AST had chosen to
make it an array of 16-bit host-endian.

Linux had minixfs support from the very beginning, but it started on
little-endian hosts, so that issue had been happily ignored - le16 or le32,
you get the same result. The second architecture to be merged also had
been little-endian (alpha), so it didn't cause any new problems.
fs/minix/inode.c used clear_bit(), etc. for bitmap access, which assumes
array of unsigned long in host-endian.

Then it had hit the fan, but nobody cared - sparc merge was 1.1.77, but
I'm not sure if minix even existed on sparc at that point. And it sure as
hell was not a concern with respect to sharing fs. Same for mips merge in
1.1.82 and ppc one in 1.3.45.

The next one was m68k in 1.3.94. And there it became serious - m68k boxen
with both minix and Linux on them did exist.
So behaviour of mainline minixfs was a real problem - it would eat
filesystems if it would ever build and run. m68k tree had that fixed,
though, by providing minix_test_bit() et al. that did the right thing.
As always with m68k, "fixed" and "cared to put the fix into mainline" had
been rather... loosely coupled events.

When the fix did go into the mainline (2.1.17), it had created an
interesting situation:
    * i386 and alpha: minix_test_bit() and friends added as wrappers
    * m68k: added, do the right thing.
    * sparc, mips and ppc: helpers absent, won't build with CONFIG_MINIX_FS

The real trouble was that the only non-trivial implementation had not been
documented - not even to say what it does and why it's needed. So when the
folks on other platforms started to fix the breakage, results had been ugly:
    - sparc: blindly defined as on i386 - i.e. host-endian 32bit (== be32).
      Compiles, still broken.
    - mips: defined as on i386 with bloody misguiding comment:
         * FIXME: These assume that Minix uses the native byte/bitorder.
      It _does_ use the native byte order. It's chunk size that doesn't
      match the native word size. Overall: be32.
    - ppc: perhaps due to the second-hand confusion induced by mips
      comment, perhaps independently, ppc went with _little-endian_ 32bit.
    - sparc64: blindly copied as on i386. That meant yet another variant:
      host-endian 64bit (== be64).

After that it went uglier and uglier. Little-endian architectures were
still all right, but big-endian had done everything except the correct
behaviour. Some did like ppc and used little-endian bitmaps. Some did
be32. Some be64.

The _ONLY_ big-endian that does the right thing is m68k. Everything else
is using layouts that never would be recognized by minix - on any platform.
Again, we are paying for the lack of description of original
minix_..._bit() family - and for the original mess in minix fs layout.

Minix recognizes two layouts:
    16bit values    32bit values    bitmaps
    01              0123            01234567...
    10              3210            10325476...
Little-endian architectures on Linux follow the first variant. m68k follows
the second one. ppc, parisc and big-endian arm and frv do
    10              3210            01234567...
The rest of 32bit big-endian goes with
    10              3210            32107654...
and 64bit big-endian do
    10              3210            76543210...

In effect, we've got three new layouts, thanks to aforementioned lossage.

But it gets even funnier: filesystem has to be created, after all. And
_that_ is not just broken, it's broken differently. We have:
    little-endian:   layout 1 (correct)
    m68k:            layout 2 (correct)
    everything else: layout 3

Of course, native minix mkfs always creates (1) or (2) and native minix
fsck gets quite unhappy with anything else. Amusingly enough, debian
util-linux has mkfs.minix and fsck.minix excluded on sparc, so the problem
_was_ noticed and duly papered over.

Recently all that crap got "regularized" kernel-side. About the only
effect was the loss of some warnings along the lines of "something's fishy
here".

So... What the hell can we do? Layouts (4) and (5) are clearly
broken and _never_ worked - there's nothing that would manage to create
such filesystem. So these are obvious candidates for switching - either
to (2) (correct) or to (3) (broken, but at least match util-linux fsck.minix
and mkfs.minix on such platforms). The question being, what do we do with
(3) (big-endian metadata, little-endian bitmaps) and what do we do with
Linux fsck.minix? Aside of repeating the mantra, that is ("All Software
Sucks, All Hardware Sucks")...
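The byte orders in the "bitmaps" column above can be reproduced mechanically.
What follows is only a sketch in Python, not kernel code: byte_for_bit() and
octet_order() are hypothetical names introduced here, and the model assumes
bit nr lives in word nr // width with value 1 << (nr % width), the word being
stored in the given byte order. nbytes must be a multiple of the word size.

```python
def byte_for_bit(nr, word_bytes, big_endian):
    """On-disk byte offset holding bit `nr` of a bitmap stored as an
    array of `word_bytes`-sized words (hypothetical helper)."""
    word, bit = divmod(nr, word_bytes * 8)
    in_word = bit // 8                      # byte index counted from the LSB
    if big_endian:
        in_word = word_bytes - 1 - in_word  # big-endian stores the MSB first
    return word * word_bytes + in_word


def octet_order(word_bytes, big_endian, nbytes=8):
    """For each on-disk byte, which group of 8 bits it holds, written in
    the notation of the table above (for example "10325476")."""
    holds = [None] * nbytes
    for octet in range(nbytes):
        holds[byte_for_bit(octet * 8, word_bytes, big_endian)] = octet
    return "".join(str(o) for o in holds)


for name, size, be in [("le (any size)", 2, False), ("be16", 2, True),
                       ("be32", 4, True), ("be64", 8, True)]:
    print(f"{name:14} {octet_order(size, be)}")
```

In this model the be16 row is the layout native Minix uses on big-endian
hosts, while the be32 and be64 rows are the accidental layouts referred to
above as (4) and (5).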
* Re: minixfs bitmaps and associated lossage
From: Matthew Wilcox @ 2006-05-06 22:25 UTC
To: Al Viro; +Cc: linux-kernel, linux-fsdevel, Linus Torvalds

On Sat, May 06, 2006 at 11:04:51PM +0100, Al Viro wrote:
> So... What the hell can we do? Layouts (4) and (5) are clearly
> broken and _never_ worked - there's nothing that would manage to create
> such filesystem. So these are obvious candidates for switching - either
> to (2) (correct) or to (3) (broken, but at least match util-linux fsck.minix
> and mkfs.minix on such platforms). The question being, what do we do with
> (3) (big-endian metadata, little-endian bitmaps) and what do we do with
> Linux fsck.minix? Aside of repeating the mantra, that is ("All Software
> Sucks, All Hardware Sucks")...

For parisc (and I suspect many other architectures), the situation is
clear. Nobody has ever used minixfs, and the only possible reason to
use it is for data transfer from another system. Now, there's more
i386/minix systems in existence than there are m68k/minix, so I'd
actually prefer to switch parisc to support the LE minix format.

Or, since that would involve doing work for something that nobody would
ever use, just disabling it on parisc. If anyone ever wants it, they
can do the work.
* Re: minixfs bitmaps and associated lossage
From: Linus Torvalds @ 2006-05-06 22:26 UTC
To: Al Viro; +Cc: linux-kernel, linux-fsdevel

On Sat, 6 May 2006, Al Viro wrote:
>
> Warning: text below is a mild example of software coproarchaeology,
> so if you are easily squicked by tangled mess of bugs and dumb lossage,
> well... you've been warned.

LOL.

Maybe the right thing to do is to just disable minixfs for anything
big-endian except for m68k.

It's not like it likely matters, and while we could save your description
of the problem as an amusing "how to really f*ck up" episode, I doubt
anybody really _cares_ in this case.

		Linus
* Re: minixfs bitmaps and associated lossage
From: Al Viro @ 2006-05-06 23:10 UTC
To: Linus Torvalds; +Cc: linux-kernel, linux-fsdevel

On Sat, May 06, 2006 at 03:26:21PM -0700, Linus Torvalds wrote:
>
> On Sat, 6 May 2006, Al Viro wrote:
> >
> > Warning: text below is a mild example of software coproarchaeology,
> > so if you are easily squicked by tangled mess of bugs and dumb lossage,
> > well... you've been warned.
>
> LOL.
>
> Maybe the right thing to do is to just disable minixfs for anything
> big-endian except for m68k.
>
> It's not like it likely matters, and while we could save your description
> of the problem as an amusing "how to really f*ck up" episode, I doubt
> anybody really _cares_ in this case.

Well... There's a minixfs v3 patch floating around, so somebody apparently
cares ;-)

FWIW, the only way to really deal with such structure would be to treat
on-disk values as "fs-endian" and make the conversion to and from
host-endian check the superblock. That would _really_ consolidate
minix_..._bit() (turning them into __test_bit(nr ^ sbi->mangle, p), etc.)
and would give support of big- and little-endian images for free. That's
what we do e.g. in fs/sysv and it's neither harder nor seriously bigger
than existing code.

Whether we care to do that is a separate question, of course, and I
certainly agree that not a lot of people care about the damn thing these
days, no matter which architecture it is.

If somebody wants to play with that code, they could just merge fs/minix
into fs/sysv - that might very well turn out to be the right thing and
a fun exercise. Codebases are very close - minixfs is a derivative of
v7 filesystem, after all, and our fs/minix and fs/sysv had been kept
mostly in sync. Might merge minix v3 into that while we are at it...
If there are any takers for that kind of work, go ahead and if you run
into problems - feel free to ask on fsdevel or l-k. I promise to review
and comment, but I'm not signing up for doing the entire thing myself.

If nobody picks that up, marking it broken on affected platforms is
probably the best solution. The only problem here is that we don't have
a uniform way to say "it's little-endian" in Kconfig, but that's something
we ought to do anyway - too many places have things like
    (BROKEN || !(SPARC || PPC || PARISC || M68K || FRV))
in Kconfig dependencies.
* Re: minixfs bitmaps and associated lossage
From: Linus Torvalds @ 2006-05-06 23:42 UTC
To: Al Viro; +Cc: linux-kernel, linux-fsdevel

On Sun, 7 May 2006, Al Viro wrote:
>
> FWIW, the only way to really deal with such structure would be to treat
> on-disk values as "fs-endian" and make the conversion to and from
> host-endian check the superblock. That would _really_ consolidate
> minix_..._bit() (turning them into __test_bit(nr ^ sbi->mangle, p), etc.)

Yeah, especially for bitmaps, it really _should_ be pretty simple, since
it's literally a bitwise xor of the bit number.

It's actually worse for things that truly have byte order dependencies
where the values span bytes and need re-ordering. For bits, that obviously
will never be the case.

> If somebody wants to play with that code, they could just merge fs/minix
> into fs/sysv - that might very well turn out to be the right thing and
> a fun exercise. Codebases are very close - minixfs is a derivative of
> v7 filesystem, after all, and our fs/minix and fs/sysv had been kept
> mostly in sync.

Heh. Yes. The physical filesystem layout of minix is close to the old sysv
one, and the implementation ends up being pretty closely related too,
although the genealogy there is the other way around.

However, I thought the direct sysv descendants used linked lists of
free-block lists, not bitmaps? So while a lot of the _other_ part of the
filesystem layout is similar, the actual free-block handling is very
different. No?

So there are things that are very similar (directory layout, inode
format), and could probably be shared, while other things (free block and
inode handling) are fundamentally different, no?

		Linus
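For the multi-byte values that this message contrasts with bitmaps, the
"fs-endian" approach means every on-disk field is decoded according to a
byte order recorded per filesystem. A minimal sketch in Python follows,
assuming a hypothetical SbInfo object whose big_endian flag would be decided
once, when the superblock is read; the helper names are illustrative and are
not claimed to match any kernel code.

```python
import struct


class SbInfo:
    """Stand-in for per-superblock info; `big_endian` is a hypothetical
    flag, set once when the filesystem is mounted."""

    def __init__(self, big_endian):
        self.big_endian = big_endian


def fs16_to_cpu(sbi, raw):
    """Decode a 2-byte on-disk field using the filesystem's byte order."""
    return struct.unpack(">H" if sbi.big_endian else "<H", raw)[0]


def cpu_to_fs16(sbi, value):
    """Encode a value into the filesystem's 2-byte on-disk form."""
    return struct.pack(">H" if sbi.big_endian else "<H", value)


# The same two bytes read as different numbers depending on the image.
raw_field = b"\x01\x02"
print(hex(fs16_to_cpu(SbInfo(big_endian=True), raw_field)))
print(hex(fs16_to_cpu(SbInfo(big_endian=False), raw_field)))
```

Unlike the bitmap case, the bytes of a value really are reordered here, which
is why this needs a conversion at every field access rather than a single
xor on an index.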
* Re: minixfs bitmaps and associated lossage
From: Al Viro @ 2006-05-07 7:37 UTC
To: Linus Torvalds; +Cc: linux-kernel, linux-fsdevel

On Sat, May 06, 2006 at 04:42:27PM -0700, Linus Torvalds wrote:
> > If somebody wants to play with that code, they could just merge fs/minix
> > into fs/sysv - that might very well turn out to be the right thing and
> > a fun exercise. Codebases are very close - minixfs is a derivative of
> > v7 filesystem, after all, and our fs/minix and fs/sysv had been kept
> > mostly in sync.
>
> Heh. Yes. The physical filesystem layout of minix is close to the old sysv
> one, and the implementation ends up being pretty closely related too,
> although the genealogy there is the other way around.

Actually, some things (e.g. indirect block tree handling and directory
handling via pagecache) went the other way - from fs/sysv to fs/minix.

> However, I thought the direct sysv descendants used linked lists of
> free-block lists, not bitmaps? So while a lot of the _other_ part of the
> filesystem layout is similar, the actual free-block handling is very
> different. No?

Yes and no - keep in mind that details of those lists are different for
various sysvfs flavours, so sysv_new_block() et al. check sbi->s_type
anyway. And the entry points into [ib]alloc are parallel, so it's not
hard to merge transparently for the rest of code. Superblock layouts are
very different, obviously, but they are just as different among sysv
flavours. Again, no big deal...

BTW, there's a sysv flavour that uses bitmaps (EAFS); we only do it
read-only, so that's not an issue with the current fs/sysv code.

Again, what I'm saying is that figuring out details of doing it clean way
would make a good exercise, not that we can't live without that.
* Re: minixfs bitmaps and associated lossage
From: Pavel Machek @ 2006-05-07 7:35 UTC
To: Al Viro; +Cc: linux-kernel, linux-fsdevel, Linus Torvalds

Hi!

> Warning: text below is a mild example of software coproarchaeology,
> so if you are easily squicked by tangled mess of bugs and dumb lossage,
> well... you've been warned.

:-)

> So... What the hell can we do? Layouts (4) and (5) are clearly
> broken and _never_ worked - there's nothing that would manage to create
> such filesystem. So these are obvious candidates for switching - either
> to (2) (correct) or to (3) (broken, but at least match util-linux fsck.minix
> and mkfs.minix on such platforms). The question being, what do we do with
> (3) (big-endian metadata, little-endian bitmaps) and what do we do with
> Linux fsck.minix? Aside of repeating the mantra, that is ("All Software
> Sucks, All Hardware Sucks")...

Remove minix write support? Only writers care about bitmap layout,
right?
								Pavel
--
Thanks for all the (sleeping) penguins.
Thread overview: 7+ messages
     [not found] <44560796.8010700@gmail.com>
     [not found] ` <20060506162956.GO27946@ftp.linux.org.uk>
     [not found]   ` <20060506163737.GP27946@ftp.linux.org.uk>
2006-05-06 22:04     ` minixfs bitmaps and associated lossage Al Viro
2006-05-06 22:25       ` Matthew Wilcox
2006-05-06 22:26       ` Linus Torvalds
2006-05-06 23:10         ` Al Viro
2006-05-06 23:42           ` Linus Torvalds
2006-05-07  7:37             ` Al Viro
2006-05-07  7:35       ` Pavel Machek