* RE: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes
@ 2001-12-20 19:06 Torrey Hoffman
2001-12-20 19:46 ` Linus Torvalds
0 siblings, 1 reply; 32+ messages in thread
From: Torrey Hoffman @ 2001-12-20 19:06 UTC (permalink / raw)
To: Tachino Nobuhiro; +Cc: andersen, linux-kernel, viro
Yes, this does fix the problem. Thank you very much!
Hopefully something like this will make it into 2.4.18?
(I have to admit I applied and tested your patch to
2.4.17-rc2, and did not test -rc2 without your patch.
However, -rc1 has the bug and I don't think any other changes
between -rc1 and -rc2 would have fixed this. If someone out
there wants me to, I can recompile and test -rc2 vanilla.)
Torrey Hoffman
Tachino Nobuhiro (tachino@open.nm.fujitsu.co.jp) wrote:
> Hello,
>
> Following patch may fix your problem.
>
> diff -r -u linux-2.4.17-rc2.org/drivers/block/rd.c
> linux-2.4.17-rc2/drivers/block/rd.c
> --- linux-2.4.17-rc2.org/drivers/block/rd.c Thu Dec 20 20:30:57 2001
> +++ linux-2.4.17-rc2/drivers/block/rd.c Thu Dec 20 20:46:53 2001
> @@ -194,9 +194,11 @@
> static int ramdisk_readpage(struct file *file, struct page * page)
> {
> if (!Page_Uptodate(page)) {
> - memset(kmap(page), 0, PAGE_CACHE_SIZE);
> - kunmap(page);
> - flush_dcache_page(page);
> + if (!page->buffers) {
> + memset(kmap(page), 0, PAGE_CACHE_SIZE);
> + kunmap(page);
> + flush_dcache_page(page);
> + }
> SetPageUptodate(page);
> }
> UnlockPage(page);
>
>
> grow_dev_page() creates not Uptodate page which has valid
> buffers, so
> it is wrong that ramdisk_readpage() clears whole page unconditionally.
>
>
> At Tue, 18 Dec 2001 12:14:49 -0800,
> Torrey Hoffman wrote:
> >
> > More information! And a workaround!
> >
> > I conjecture that the ramdisk driver (post 2.4.9) only grabs
> > VM pages properly if it is accessed directly, as a dd to
> > /dev/ram0 does. I further conjecture that accessing the
> > ramdisk through a mounted filesystem does not grab pages
> > properly.
> >
> > The reason I believe this is that removing the call to
> > "freeramdisk" from my original script avoids corruption.
> >
> > Another way to avoid ramdisk corruption is to
> > "dd if=/dev/zero of=/dev/ram0 bs=1k count=4000"
> > immediately after the call to freeramdisk.
> >
> > If my conjecture is right, then the corruption is caused
> > because mke2fs on a "freed" /dev/ram0 doesn't touch every
> > block of the fs, leaving "holes" where pages are not properly
> > grabbed from the VM. The resulting filesystem appears to work,
> > but dd'ing from /dev/ram0 gets a broken filesystem image.
> >
> > Note that "freeramdisk /dev/ram0" is pretty much just:
> > #define FLKFLSBUF _IO(0x12,97) /* flush buffer cache */
> > f = open("/dev/ram0", O_RDWR);
> > ioctl(f, BLKFLSBUF);
> >
> > To experiment for yourself, stick the following script in
> > a subdirectory which also contains a "testdir" directory
> > with about 3 MB of data.
> >
> > - - - - - - - -
> > #!/bin/bash
> >
> > # to tickle the bug, do the freeramdisk but not the
> > # dd from /dev/zero to /dev/ram0.
> >
> > freeramdisk /dev/ram0
> > #dd if=/dev/zero of=/dev/ram0 bs=1k count=4000
> >
> > mke2fs -m0 /dev/ram0 4000
> > mount -t ext2 /dev/ram0 /mnt/ramdisk
> > rm -rf /mnt/ramdisk/*
> >
> > cp -a ./testdir /mnt/ramdisk
> > umount /dev/ram0
> >
> > dd if=/dev/ram0 of=ram0.img bs=1k count=4000
> > dd if=ram0.img of=/dev/ram0 bs=1k count=4000
> >
> > mount -t ext2 /dev/ram0 /mnt/ramdisk
> > diff -q -r ./testdir /mnt/ramdisk/testdir
> >
> > # If diff reports mismatches, you saw the bug.
> >
> > umount /dev/ram0
> > - - - - - - - -
> >
> > If the gods of the VM and VFS don't bother to look at
> > it, I might take a peek at the relevant kernel code myself.
> > Might take two months of study before I know enough though.
> >
> > Torrey
^ permalink raw reply [flat|nested] 32+ messages in thread* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2001-12-20 19:06 ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes Torrey Hoffman @ 2001-12-20 19:46 ` Linus Torvalds 2001-12-20 22:56 ` Alexander Viro ` (2 more replies) 0 siblings, 3 replies; 32+ messages in thread From: Linus Torvalds @ 2001-12-20 19:46 UTC (permalink / raw) To: torrey.hoffman, linux-kernel, Alexander Viro, Marcelo Tosatti In article <D52B19A7284D32459CF20D579C4B0C0211CB0F@mail0.myrio.com> you write: >Yes, this does fix the problem. Thank you very much! > >Hopefully something like this will make it into 2.4.18? This does not seem quite right. >Tachino Nobuhiro (tachino@open.nm.fujitsu.co.jp) wrote: >> Hello, >> >> Following patch may fix your problem. >> >> diff -r -u linux-2.4.17-rc2.org/drivers/block/rd.c >> linux-2.4.17-rc2/drivers/block/rd.c >> --- linux-2.4.17-rc2.org/drivers/block/rd.c Thu Dec 20 20:30:57 2001 >> +++ linux-2.4.17-rc2/drivers/block/rd.c Thu Dec 20 20:46:53 2001 >> @@ -194,9 +194,11 @@ >> static int ramdisk_readpage(struct file *file, struct page * page) >> { >> if (!Page_Uptodate(page)) { >> - memset(kmap(page), 0, PAGE_CACHE_SIZE); >> - kunmap(page); >> - flush_dcache_page(page); >> + if (!page->buffers) { >> + memset(kmap(page), 0, PAGE_CACHE_SIZE); >> + kunmap(page); >> + flush_dcache_page(page); >> + } >> SetPageUptodate(page); >> } >> UnlockPage(page); >> >> >> grow_dev_page() creates not Uptodate page which has valid >> buffers, so >> it is wrong that ramdisk_readpage() clears whole page unconditionally. The problem is that having buffers doesn't necessarily always mean that they are valid, nor that _all_ of them are valid. Also, if the ramdisk "readpage" code is wrong, then so is the "prepare_write" code. They share the same logic, which basically says that "if the page isn't up-to-date, then it is zero". Which is always true for normal read/write accesses, but as you found out it's not true when parts of the page have been accessed by filesystems through the buffers. So the code _should_ use a common helper, something like static void ramdisk_updatepage(struct page * page) { if (!Page_Uptodate(page)) { struct buffer_head *bh = page->buffers; void * address = page_address(page); if (bh) { struct buffer_head *tmp = bh; do { if (!buffer_uptodate(tmp)) { memset(address, 0, tmp->b_size); set_buffer_uptodate(tmp); } address += tmp->b_size; tmp = tmp->b_this_page; } while (tmp != bh); } else memset(address, 0, PAGE_SIZE); flush_dcache_page(page); SetPageUptodate(page); } } and then ramdisk_readpage() would just be kmap(page); ramdisk_updatepage(page); UnlockPage(page); kunmap(page); return 0; while ramdisk_prepare_write() would be ramdisk_updatepage(page); SetPageDirty(page); return 0; NOTE NOTE NOTE! This is untested. Please somebody test it, and in particular verify that there aren't any stupid highmem bugs, and resubmit the patch to me and Marcelo, ok? Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2001-12-20 19:46 ` Linus Torvalds @ 2001-12-20 22:56 ` Alexander Viro 2001-12-20 23:42 ` Andrea Arcangeli 2001-12-21 1:38 ` Tachino Nobuhiro 2 siblings, 0 replies; 32+ messages in thread From: Alexander Viro @ 2001-12-20 22:56 UTC (permalink / raw) To: Linus Torvalds; +Cc: torrey.hoffman, linux-kernel, Marcelo Tosatti On Thu, 20 Dec 2001, Linus Torvalds wrote: > The problem is that having buffers doesn't necessarily always mean that > they are valid, nor that _all_ of them are valid. > > Also, if the ramdisk "readpage" code is wrong, then so is the > "prepare_write" code. They share the same logic, which basically says > that "if the page isn't up-to-date, then it is zero". Which is always > true for normal read/write accesses, but as you found out it's not true > when parts of the page have been accessed by filesystems through the > buffers. AFAICS, it's nastier than that. What's to stop buffer_heads to be freed under memory pressure? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2001-12-20 19:46 ` Linus Torvalds 2001-12-20 22:56 ` Alexander Viro @ 2001-12-20 23:42 ` Andrea Arcangeli 2001-12-21 1:49 ` Andrea Arcangeli 2001-12-21 1:38 ` Tachino Nobuhiro 2 siblings, 1 reply; 32+ messages in thread From: Andrea Arcangeli @ 2001-12-20 23:42 UTC (permalink / raw) To: Linus Torvalds Cc: torrey.hoffman, linux-kernel, Alexander Viro, Marcelo Tosatti On Thu, Dec 20, 2001 at 11:46:23AM -0800, Linus Torvalds wrote: > In article <D52B19A7284D32459CF20D579C4B0C0211CB0F@mail0.myrio.com> you write: > >Yes, this does fix the problem. Thank you very much! > > > >Hopefully something like this will make it into 2.4.18? > > This does not seem quite right. > > >Tachino Nobuhiro (tachino@open.nm.fujitsu.co.jp) wrote: > >> Hello, > >> > >> Following patch may fix your problem. > >> > >> diff -r -u linux-2.4.17-rc2.org/drivers/block/rd.c > >> linux-2.4.17-rc2/drivers/block/rd.c > >> --- linux-2.4.17-rc2.org/drivers/block/rd.c Thu Dec 20 20:30:57 2001 > >> +++ linux-2.4.17-rc2/drivers/block/rd.c Thu Dec 20 20:46:53 2001 > >> @@ -194,9 +194,11 @@ > >> static int ramdisk_readpage(struct file *file, struct page * page) > >> { > >> if (!Page_Uptodate(page)) { > >> - memset(kmap(page), 0, PAGE_CACHE_SIZE); > >> - kunmap(page); > >> - flush_dcache_page(page); > >> + if (!page->buffers) { > >> + memset(kmap(page), 0, PAGE_CACHE_SIZE); > >> + kunmap(page); > >> + flush_dcache_page(page); > >> + } > >> SetPageUptodate(page); > >> } > >> UnlockPage(page); > >> > >> > >> grow_dev_page() creates not Uptodate page which has valid > >> buffers, so > >> it is wrong that ramdisk_readpage() clears whole page unconditionally. > > The problem is that having buffers doesn't necessarily always mean that > they are valid, nor that _all_ of them are valid. > > Also, if the ramdisk "readpage" code is wrong, then so is the > "prepare_write" code. They share the same logic, which basically says > that "if the page isn't up-to-date, then it is zero". Which is always > true for normal read/write accesses, but as you found out it's not true > when parts of the page have been accessed by filesystems through the > buffers. subtle, while writing and testing that code the buffercache was not sharing the physical address space yet, so it was stable, then it kept to work somehow till today because only metadata writes into holes would trigger it. > > So the code _should_ use a common helper, something like > > static void ramdisk_updatepage(struct page * page) > { > if (!Page_Uptodate(page)) { > struct buffer_head *bh = page->buffers; > void * address = page_address(page); > if (bh) { > struct buffer_head *tmp = bh; > do { > if (!buffer_uptodate(tmp)) { > memset(address, 0, tmp->b_size); > set_buffer_uptodate(tmp); > } > address += tmp->b_size; > tmp = tmp->b_this_page; > } while (tmp != bh); > } else > memset(address, 0, PAGE_SIZE); > flush_dcache_page(page); > SetPageUptodate(page); > } > } > > and then ramdisk_readpage() would just be > > kmap(page); > ramdisk_updatepage(page); > UnlockPage(page); > kunmap(page); > return 0; > > while ramdisk_prepare_write() would be > > ramdisk_updatepage(page); > SetPageDirty(page); > return 0; agreed. rd_blkdev_pagecache_IO will as well do: err = 0; kmap(page); ramdisk_updatepage(page); kunmap(page); unlock = 1; Andrea ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2001-12-20 23:42 ` Andrea Arcangeli @ 2001-12-21 1:49 ` Andrea Arcangeli [not found] ` <3C22CF16.C78B1F19@zip.com.au> 2002-01-05 7:53 ` Andrew Morton 0 siblings, 2 replies; 32+ messages in thread From: Andrea Arcangeli @ 2001-12-21 1:49 UTC (permalink / raw) To: Linus Torvalds Cc: torrey.hoffman, linux-kernel, Alexander Viro, Marcelo Tosatti I diffed a more advanced version of it. It's not very well tested. --- 2.4.17rc2aa1/drivers/block/rd.c.~1~ Wed Dec 19 03:01:04 2001 +++ 2.4.17rc2aa1/drivers/block/rd.c Fri Dec 21 01:39:15 2001 @@ -191,26 +191,44 @@ * 2000 Transmeta Corp. * aops copied from ramfs. */ -static int ramdisk_readpage(struct file *file, struct page * page) +static void ramdisk_updatepage(struct page * page, int need_kmap) { if (!Page_Uptodate(page)) { - memset(kmap(page), 0, PAGE_CACHE_SIZE); - kunmap(page); + struct buffer_head *bh = page->buffers; + void * address; + + if (need_kmap) + kmap(page); + address = page_address(page); + if (bh) { + struct buffer_head *tmp = bh; + do { + if (!buffer_uptodate(tmp)) { + memset(address, 0, tmp->b_size); + mark_buffer_uptodate(tmp, 1); + } + address += tmp->b_size; + tmp = tmp->b_this_page; + } while (tmp != bh); + } else + memset(address, 0, PAGE_SIZE); + if (need_kmap) + kunmap(page); flush_dcache_page(page); SetPageUptodate(page); } +} + +static int ramdisk_readpage(struct file *file, struct page * page) +{ + ramdisk_updatepage(page, 1); UnlockPage(page); return 0; } static int ramdisk_prepare_write(struct file *file, struct page *page, unsigned offset, unsigned to) { - if (!Page_Uptodate(page)) { - void *addr = page_address(page); - memset(addr, 0, PAGE_CACHE_SIZE); - flush_dcache_page(page); - SetPageUptodate(page); - } + ramdisk_updatepage(page, 0); SetPageDirty(page); return 0; } @@ -233,10 +251,18 @@ unsigned long index; int offset, size, err; - err = -EIO; err = 0; mapping = rd_bdev[minor]->bd_inode->i_mapping; + /* writing a buffer cache not uptodate must not clear it */ + if (sbh->b_page->mapping == mapping) { + if (rw == WRITE) { + mark_buffer_uptodate(sbh, 1); + SetPageDirty(sbh->b_page); + } + goto out; + } + index = sbh->b_rsector >> (PAGE_CACHE_SHIFT - 9); offset = (sbh->b_rsector << 9) & ~PAGE_CACHE_MASK; size = sbh->b_size; @@ -262,11 +288,7 @@ goto out; err = 0; - if (!Page_Uptodate(page)) { - memset(kmap(page), 0, PAGE_CACHE_SIZE); - kunmap(page); - SetPageUptodate(page); - } + ramdisk_updatepage(page, 1); unlock = 1; } actually while testing it I unfortunately found also an fs corruption bug in the ->prepare_write/commit_write/writepage/direct_IO callbacks. It's a longstanding one, since get_block born. In short, if get_block fails while mapping on a certain page during prepare_write/writepage/direct_IO (like it can happen with a full filesystem, incidentally ext2 on /dev/ram0 during my testing that is only 4M and so it overflows fast), the blocks before the ENOSPC failure will be allocated, but the i_size won't be update accordingly (no commit write, because prepare_write failed in the middle). for the generic_file_write users it's easily fixable with an hack (truncating backwards because we don't know how far we got allocating blocks when we return from prepare_write), similar hack for the direct_IO one (that commits only once at the end in function of the direct_IO generated). But for writepage is quite a pain, infact I believe the writepage blocks should be reserved in-core, to guarantee we will never fail a truncate with ENOSPC. With the shared mappings we're effectively doing allocate on flush... but with the lack of space reservation that makes it unreliable :) So for now I did an hack to cure the other two (writepage can still corrupt the fs as said). I think the right fix (ala 2.5) is to change the API so we can use the last blocks too, but the below will cure 2.4 and for writepage the right fix IMHO is to do the reservation of the space. both holds the i_sem so calling truncate from there cannot race with any other inode-writer. --- 2.4.17rc2aa2/mm/filemap.c.~1~ Wed Dec 19 03:43:23 2001 +++ 2.4.17rc2aa2/mm/filemap.c Fri Dec 21 02:21:07 2001 @@ -3026,8 +3025,14 @@ kaddr = kmap(page); status = mapping->a_ops->prepare_write(file, page, offset, offset+bytes); - if (status) + if (status) { + if (status == -ENOSPC && inode->i_op && inode->i_op->truncate) { + lock_kernel(); + inode->i_op->truncate(inode); + unlock_kernel(); + } goto unlock; + } page_fault = __copy_from_user(kaddr+offset, buf, bytes); flush_dcache_page(page); status = mapping->a_ops->commit_write(file, page, offset, offset+bytes); @@ -3090,6 +3095,14 @@ if (end > inode->i_size && !S_ISBLK(inode->i_mode)) { inode->i_size = end; mark_inode_dirty(inode); + } + /* there's the possibility that some get_block left some leftover */ + if (written != count) { + if (inode->i_op && inode->i_op->truncate) { + lock_kernel(); + inode->i_op->truncate(inode); + unlock_kernel(); + } } *ppos = end; invalidate_inode_pages2(mapping); ah, and of course, other workaround is to use fs-blocksize == PAGE_SIZE 8) btw, I'll be away the next few days, so I will not be very responsive until after Christmas. Andrea ^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <3C22CF16.C78B1F19@zip.com.au>]
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes [not found] ` <3C22CF16.C78B1F19@zip.com.au> @ 2001-12-29 15:40 ` Andrea Arcangeli 2001-12-30 6:19 ` Andrew Morton 0 siblings, 1 reply; 32+ messages in thread From: Andrea Arcangeli @ 2001-12-29 15:40 UTC (permalink / raw) To: Andrew Morton Cc: Linus Torvalds, torrey.hoffman, linux-kernel, Alexander Viro, Marcelo Tosatti On Thu, Dec 20, 2001 at 09:56:40PM -0800, Andrew Morton wrote: > Andrea Arcangeli wrote: > > > > ... > > actually while testing it I unfortunately found also an fs corruption > > bug in the ->prepare_write/commit_write/writepage/direct_IO callbacks. > > It's a longstanding one, since get_block born. In short, if get_block > > fails while mapping on a certain page during > > prepare_write/writepage/direct_IO (like it can happen with a full > > filesystem, incidentally ext2 on /dev/ram0 during my testing that is > > only 4M and so it overflows fast), the blocks before the ENOSPC failure > > will be allocated, but the i_size won't be update accordingly (no commit > > write, because prepare_write failed in the middle). for the > > generic_file_write users it's easily fixable with an hack (truncating > > backwards because we don't know how far we got allocating blocks when we > > return from prepare_write), similar hack for the direct_IO one (that > > commits only once at the end in function of the direct_IO generated). > > But for writepage is quite a pain, infact I believe the writepage blocks > > should be reserved in-core, to guarantee we will never fail a truncate > > with ENOSPC. With the shared mappings we're effectively doing allocate > > on flush... but with the lack of space reservation that makes it > > unreliable :) > > The -ac kernels handled this by zeroing out the accidentally-allocated > blocks at __block_prepare_write. actually my fix seems cleaner because it puts the "clearing" in one single place. > > So for now I did an hack to cure the other two (writepage can still > > corrupt the fs as said). I think the right fix (ala 2.5) is to change > > the API so we can use the last blocks too, but the below will cure 2.4 > > and for writepage the right fix IMHO is to do the reservation of the > > space. > > This is better in a way because it reclaims the eztra few blocks. But > the -ac approach also works for writepage(). yes, it can solve the metadata corruption (assuming the locking is right, I can as well call ->truncate within writepage but it's not obvious at all that it won't race because we don't hold the i_sem within writepage), but the data corruption still holds. I mean, there's no failure path to notify userspace that a certain page fault is writing into a page over an hole, that we don't have space to later allocate on disk. so to me it sounds like MAP_SHARED should preallocate the space of the holes so you will know that the writes into the MAP_SHARED segments won't be lost (current state of things will lead to silent corruption and pinned dirty pages in ram, aka broken allocate on flush like previously said). > Why was that code not brought across? Who developed that code? Can the author of the code forward port it to 2.4.18pre and post a patch to the list so we can review? thanks, Avoiding the matadata corruption would be a good start at least for 2.4, then we should just focus on the writepage locking that could race with the other "create=1" get_blocks. If it doesn't race I will certainly agree on that approch for 2.4. Andrea ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2001-12-29 15:40 ` Andrea Arcangeli @ 2001-12-30 6:19 ` Andrew Morton 2001-12-30 6:33 ` Alexander Viro ` (3 more replies) 0 siblings, 4 replies; 32+ messages in thread From: Andrew Morton @ 2001-12-30 6:19 UTC (permalink / raw) To: Andrea Arcangeli Cc: Linus Torvalds, torrey.hoffman, linux-kernel, Alexander Viro, Marcelo Tosatti, Stephen C. Tweedie [ this thread started at http://www.lib.uaa.alaska.edu/linux-kernel/archive/2001-Week-51/1001.html ] Andrea Arcangeli wrote: > > On Thu, Dec 20, 2001 at 09:56:40PM -0800, Andrew Morton wrote: > > Andrea Arcangeli wrote: > > > > > > ... > > > actually while testing it I unfortunately found also an fs corruption > > > bug in the ->prepare_write/commit_write/writepage/direct_IO callbacks. > > > It's a longstanding one, since get_block born. In short, if get_block > > > fails while mapping on a certain page during > > > prepare_write/writepage/direct_IO (like it can happen with a full > > > filesystem, incidentally ext2 on /dev/ram0 during my testing that is > > > only 4M and so it overflows fast), the blocks before the ENOSPC failure > > > will be allocated, but the i_size won't be update accordingly (no commit > > > write, because prepare_write failed in the middle). for the > > > generic_file_write users it's easily fixable with an hack (truncating > > > backwards because we don't know how far we got allocating blocks when we > > > return from prepare_write), similar hack for the direct_IO one (that > > > commits only once at the end in function of the direct_IO generated). > > > But for writepage is quite a pain, infact I believe the writepage blocks > > > should be reserved in-core, to guarantee we will never fail a truncate > > > with ENOSPC. With the shared mappings we're effectively doing allocate > > > on flush... but with the lack of space reservation that makes it > > > unreliable :) > > > > The -ac kernels handled this by zeroing out the accidentally-allocated > > blocks at __block_prepare_write. > > actually my fix seems cleaner because it puts the "clearing" in one single > place. I think so too. Even for ext3, which has a very complex truncate, it appears to be OK. > > > So for now I did an hack to cure the other two (writepage can still > > > corrupt the fs as said). I think the right fix (ala 2.5) is to change > > > the API so we can use the last blocks too, but the below will cure 2.4 > > > and for writepage the right fix IMHO is to do the reservation of the > > > space. > > > > This is better in a way because it reclaims the eztra few blocks. But > > the -ac approach also works for writepage(). > > yes, it can solve the metadata corruption (assuming the locking is Where can metadata corruption occur? A few extra blocks outside i_size for ext2 directories isn't going to cause corruption, is it? Or are you referring to i_blocks accounting being incorrect? > right, I can as well call ->truncate within writepage but it's not > obvious at all that it won't race because we don't hold the i_sem within > writepage), but the data corruption still holds. For sure - holding i_sem on truncate is abolutely required. > I mean, there's no > failure path to notify userspace that a certain page fault is writing > into a page over an hole, that we don't have space to later allocate on > disk. so to me it sounds like MAP_SHARED should preallocate the space of > the holes so you will know that the writes into the MAP_SHARED segments > won't be lost (current state of things will lead to silent corruption > and pinned dirty pages in ram, aka broken allocate on flush like > previously said). Um. How does this differ from an I/O error on flush? Would it be necessary to preallocate the holes at mmap() time? Mad hand-waving: Could we not perform the instantiation at pagefault time, and give the caller SIGBUS if we cannot allocate the blocks? Or if there's an IO error, or quota exceeded. > > Why was that code not brought across? > > Who developed that code? Can the author of the code forward port it to > 2.4.18pre and post a patch to the list so we can review? thanks, > Avoiding the matadata corruption would be a good start at least for 2.4, > then we should just focus on the writepage locking that could race with the > other "create=1" get_blocks. If it doesn't race I will certainly agree > on that approch for 2.4. > It appeared in 2.4.2-ac25, and it looks like sct was the author: o Fix higmem block_prepare_write crash (Stephen Tweedie) Which is interesting - from the changelog it looks like he was fixing a different problem! I always though that code was there to prevent leakage of stale blocks. Stephen? A 2.4.18-pre1 version is below. It compiles, but I haven't actually exercised that code path. It's not a pretty fix, IMO. It leaves dangling blocks outside i_size which will make fsck unhappy. It also breaks ext3 a little bit - those blocks should be journalled and in theory we'd need to clone off private copies of __block_prepare_write() and unmap_underlying_metadata() to do this. Which would be irritating, but not the end of the world. (Should have done this in the -ac version of ext3, but I never noticed it). --- linux-2.4.18-pre1/fs/buffer.c Fri Dec 21 11:19:14 2001 +++ linux-akpm/fs/buffer.c Sat Dec 29 21:53:46 2001 @@ -1639,6 +1639,17 @@ static int __block_prepare_write(struct } return 0; out: + bh = head; + block_start = 0; + do { + if (buffer_new(bh) && !buffer_uptodate(bh)) { + memset(kaddr+block_start, 0, bh->b_size); + set_bit(BH_Uptodate, &bh->b_state); + mark_buffer_dirty(bh); + } + block_start += bh->b_size; + bh = bh->b_this_page; + } while (bh != head); return err; } Question: can someone please define BH_New? Its lifecycle seems very vague. We never actually seem to *clear* it anywhere for ext2, and it appears that the kernel will keep on treating a clearly non-new buffer as "new" all the time. ext3 explicitly clears BH_New in get_block(), if it finds the block was already present in the file. I did this because we need the newness info for internal purposes. - ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2001-12-30 6:19 ` Andrew Morton @ 2001-12-30 6:33 ` Alexander Viro 2001-12-30 6:38 ` Andrew Morton ` (2 more replies) 2001-12-30 23:56 ` Andrea Arcangeli ` (2 subsequent siblings) 3 siblings, 3 replies; 32+ messages in thread From: Alexander Viro @ 2001-12-30 6:33 UTC (permalink / raw) To: Andrew Morton Cc: Andrea Arcangeli, Linus Torvalds, torrey.hoffman, linux-kernel, Marcelo Tosatti, Stephen C. Tweedie On Sat, 29 Dec 2001, Andrew Morton wrote: > Would it be necessary to preallocate the holes at mmap() time? Mad > hand-waving: Could we not perform the instantiation at pagefault time, > and give the caller SIGBUS if we cannot allocate the blocks? Or if > there's an IO error, or quota exceeded. Allocation at mmap() Is Not Going To Happen. Consider it vetoed. There are applications that use mmap() on large and very sparse files. > Question: can someone please define BH_New? Its lifecycle seems > very vague. We never actually seem to *clear* it anywhere for > ext2, and it appears that the kernel will keep on treating a > clearly non-new buffer as "new" all the time. ext3 explicitly > clears BH_New in get_block(), if it finds the block was already > present in the file. I did this because we need the newness > info for internal purposes. It should be reset when we submit IO. Breakage related to failing allocation is indeed not new, but that's a long story. And no, "allocate on mmap()" is not a fix. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2001-12-30 6:33 ` Alexander Viro @ 2001-12-30 6:38 ` Andrew Morton 2001-12-30 7:17 ` Alexander Viro 2001-12-31 0:08 ` ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes Andrea Arcangeli 2001-12-30 7:08 ` Andrew Morton 2001-12-31 0:05 ` ramdisk corruption problems - was: RE: pivot_root and initrd kern " Andrea Arcangeli 2 siblings, 2 replies; 32+ messages in thread From: Andrew Morton @ 2001-12-30 6:38 UTC (permalink / raw) To: Alexander Viro Cc: Andrea Arcangeli, Linus Torvalds, torrey.hoffman, linux-kernel, Marcelo Tosatti, Stephen C. Tweedie Alexander Viro wrote: > > On Sat, 29 Dec 2001, Andrew Morton wrote: > > > Would it be necessary to preallocate the holes at mmap() time? Mad > > hand-waving: Could we not perform the instantiation at pagefault time, > > and give the caller SIGBUS if we cannot allocate the blocks? Or if > > there's an IO error, or quota exceeded. > > Allocation at mmap() Is Not Going To Happen. Consider it vetoed. > There are applications that use mmap() on large and very sparse > files. I think Andrea was referring to simply reserving the necessary amount of disk space, rather than actually instantiating the blocks. But even that would be a big problem for the applications which you describe. > > Question: can someone please define BH_New? Its lifecycle seems > > very vague. We never actually seem to *clear* it anywhere for > > ext2, and it appears that the kernel will keep on treating a > > clearly non-new buffer as "new" all the time. ext3 explicitly > > clears BH_New in get_block(), if it finds the block was already > > present in the file. I did this because we need the newness > > info for internal purposes. > > It should be reset when we submit IO. well... It isn't. And I'd like a chance to review/test any proposed changes in this area which are outside specific filesystems... > Breakage related to failing allocation is indeed not new, but > that's a long story. And no, "allocate on mmap()" is not a fix. Yup. But what *is* the fix? (filemap_nopage?) - ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2001-12-30 6:38 ` Andrew Morton @ 2001-12-30 7:17 ` Alexander Viro 2001-12-30 10:15 ` ramdisk corruption problems - was: RE: pivot_root and initrd Alan Cox 2001-12-31 0:08 ` ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes Andrea Arcangeli 1 sibling, 1 reply; 32+ messages in thread From: Alexander Viro @ 2001-12-30 7:17 UTC (permalink / raw) To: Andrew Morton Cc: Andrea Arcangeli, Linus Torvalds, torrey.hoffman, linux-kernel, Marcelo Tosatti, Stephen C. Tweedie On Sat, 29 Dec 2001, Andrew Morton wrote: > > Breakage related to failing allocation is indeed not new, but > > that's a long story. And no, "allocate on mmap()" is not a fix. > > Yup. But what *is* the fix? (filemap_nopage?) For ->writepage() - nothing. It _is_ asynchronous and it _can_ fail. Due to failing allocations, IO errors, whatever. Now, the fs consistency stuff is a different story. Fixes had been in -ac since before 2.4.0 and I distinctly remember at least one of 3 area getting synced with -linus. My fault - I assumed that the whole patch went there at that point. I'll try to dig the rest out. 2.4.9-ac* is probably a good starting point - they are in generic_file_write() and in __block_write_full_page() ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd 2001-12-30 7:17 ` Alexander Viro @ 2001-12-30 10:15 ` Alan Cox 0 siblings, 0 replies; 32+ messages in thread From: Alan Cox @ 2001-12-30 10:15 UTC (permalink / raw) To: Alexander Viro Cc: Andrew Morton, Andrea Arcangeli, Linus Torvalds, torrey.hoffman, linux-kernel, Marcelo Tosatti, Stephen C. Tweedie > Now, the fs consistency stuff is a different story. Fixes had been > in -ac since before 2.4.0 and I distinctly remember at least one of > 3 area getting synced with -linus. My fault - I assumed that the > whole patch went there at that point. I'll try to dig the rest out. I didnt feed Marcelo the final truncate stuff nor the file write fixes, it was getting too close to 2.4.17 and they are not completely risk free sort of changes. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2001-12-30 6:38 ` Andrew Morton 2001-12-30 7:17 ` Alexander Viro @ 2001-12-31 0:08 ` Andrea Arcangeli 1 sibling, 0 replies; 32+ messages in thread From: Andrea Arcangeli @ 2001-12-31 0:08 UTC (permalink / raw) To: Andrew Morton Cc: Alexander Viro, Linus Torvalds, torrey.hoffman, linux-kernel, Marcelo Tosatti, Stephen C. Tweedie On Sat, Dec 29, 2001 at 10:38:14PM -0800, Andrew Morton wrote: > Alexander Viro wrote: > > > > On Sat, 29 Dec 2001, Andrew Morton wrote: > > > > > Would it be necessary to preallocate the holes at mmap() time? Mad > > > hand-waving: Could we not perform the instantiation at pagefault time, > > > and give the caller SIGBUS if we cannot allocate the blocks? Or if > > > there's an IO error, or quota exceeded. > > > > Allocation at mmap() Is Not Going To Happen. Consider it vetoed. > > There are applications that use mmap() on large and very sparse > > files. > > I think Andrea was referring to simply reserving the necessary > amount of disk space, rather than actually instantiating the > blocks. But even that would be a big problem for the applications correct. > which you describe. Nod. > > > > Question: can someone please define BH_New? Its lifecycle seems > > > very vague. We never actually seem to *clear* it anywhere for > > > ext2, and it appears that the kernel will keep on treating a > > > clearly non-new buffer as "new" all the time. ext3 explicitly > > > clears BH_New in get_block(), if it finds the block was already > > > present in the file. I did this because we need the newness > > > info for internal purposes. > > > > It should be reset when we submit IO. > > well... It isn't. And I'd like a chance to review/test any see the other email, it's all right as far I can tell, it doesn't need to be resetted ever. only mapped bh are bh_new so you will never care about bh_new any longer because you will never need any further get_block on the same bh. > proposed changes in this area which are outside specific filesystems... > > > Breakage related to failing allocation is indeed not new, but > > that's a long story. And no, "allocate on mmap()" is not a fix. > > Yup. But what *is* the fix? (filemap_nopage?) > > - Andrea ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2001-12-30 6:33 ` Alexander Viro 2001-12-30 6:38 ` Andrew Morton @ 2001-12-30 7:08 ` Andrew Morton 2001-12-30 7:29 ` Alexander Viro 2001-12-31 0:05 ` ramdisk corruption problems - was: RE: pivot_root and initrd kern " Andrea Arcangeli 2 siblings, 1 reply; 32+ messages in thread From: Andrew Morton @ 2001-12-30 7:08 UTC (permalink / raw) To: Alexander Viro Cc: Andrea Arcangeli, Linus Torvalds, torrey.hoffman, linux-kernel, Marcelo Tosatti, Stephen C. Tweedie As you pointed out offline, the -ac kernels fixed the problem using *both* approaches. Here's a 2.4.18-pre1 version. generic_file_write() is getting rather icky. Please let me know if this looks like the way to proceed, and I'll find a way to actually test the thing. --- linux-2.4.18-pre1/fs/buffer.c Fri Dec 21 11:19:14 2001 +++ linux-akpm/fs/buffer.c Sat Dec 29 22:46:21 2001 @@ -1639,6 +1639,17 @@ static int __block_prepare_write(struct } return 0; out: + bh = head; + block_start = 0; + do { + if (buffer_new(bh) && !buffer_uptodate(bh)) { + memset(kaddr+block_start, 0, bh->b_size); + set_bit(BH_Uptodate, &bh->b_state); + mark_buffer_dirty(bh); + } + block_start += bh->b_size; + bh = bh->b_this_page; + } while (bh != head); return err; } --- linux-2.4.18-pre1/mm/filemap.c Wed Dec 26 11:47:41 2001 +++ linux-akpm/mm/filemap.c Sat Dec 29 23:04:29 2001 @@ -2856,6 +2856,7 @@ generic_file_write(struct file *file,con ssize_t written; long status = 0; int err; + int prepare_write_failed = 0; unsigned bytes; if ((ssize_t) count < 0) @@ -3003,8 +3004,10 @@ generic_file_write(struct file *file,con kaddr = kmap(page); status = mapping->a_ops->prepare_write(file, page, offset, offset+bytes); - if (status) + if (status) { + prepare_write_failed = 1; goto unlock; + } page_fault = __copy_from_user(kaddr+offset, buf, bytes); flush_dcache_page(page); status = mapping->a_ops->commit_write(file, page, offset, offset+bytes); @@ -3026,8 +3029,18 @@ unlock: UnlockPage(page); page_cache_release(page); - if (status < 0) + if (status < 0) { + if (prepare_write_failed) { + /* + * If blocksize < pagesize, prepare_write() may have + * instantiated a few blocks outside i_size. Trim + * these off again. + */ + if (pos + bytes > inode->i_size) + vmtruncate(inode, inode->i_size); + } break; + } } while (count); *ppos = pos; ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2001-12-30 7:08 ` Andrew Morton @ 2001-12-30 7:29 ` Alexander Viro 2001-12-30 7:59 ` ramdisk corruption problems - was: RE: pivot_root and initrdkern " Andrew Morton 0 siblings, 1 reply; 32+ messages in thread From: Alexander Viro @ 2001-12-30 7:29 UTC (permalink / raw) To: Andrew Morton Cc: Andrea Arcangeli, Linus Torvalds, torrey.hoffman, linux-kernel, Marcelo Tosatti, Stephen C. Tweedie On Sat, 29 Dec 2001, Andrew Morton wrote: > As you pointed out offline, the -ac kernels fixed the problem using > *both* approaches. Here's a 2.4.18-pre1 version. generic_file_write() > is getting rather icky. > > Please let me know if this looks like the way to proceed, and I'll > find a way to actually test the thing. Erm... I don't think so. Come on - vmtruncate() is obvious candidate for out-of-line. Please, look how it was done in 2.4.9-ac*. BTW, fs/buffer.c part of patch looks mangled. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrdkern el panic woes 2001-12-30 7:29 ` Alexander Viro @ 2001-12-30 7:59 ` Andrew Morton 2001-12-30 17:40 ` Linus Torvalds 0 siblings, 1 reply; 32+ messages in thread From: Andrew Morton @ 2001-12-30 7:59 UTC (permalink / raw) To: Alexander Viro Cc: Andrea Arcangeli, Linus Torvalds, torrey.hoffman, linux-kernel, Marcelo Tosatti, Stephen C. Tweedie Alexander Viro wrote: > > Erm... I don't think so. Come on - vmtruncate() is obvious candidate > for out-of-line. Please, look how it was done in 2.4.9-ac*. Yeah, I thought that was just too damn ugly to make it out of my ethernet port. Oh well, let's try it anyway. > BTW, fs/buffer.c part of patch looks mangled. Looks OK to me. Here's the latest. It includes the block_write_full_page() chunk (minus the access-bh-after-submit_bh bug which was in the original). So we have three chunks: generic_file_write(): truncate the blocks when prepare_write() fails. This is for ENOSPC on write-to-end-of-file. block_write_full_page(): submit the blocks which we managed to map. To prevent exposure of old data. __block_prepare_write(): zero out and dirty the blocks which we managed to instantiate. This is to prevent exposure of stale data on ENOSPC during write() to mid-file. Question: is the block_write_full_page() change correct? We mark the page not-up-to-date, but end_buffer_io_async() is going to mark it uptodate anyway. And what should we do with BH_New? --- linux-2.4.18-pre1/fs/buffer.c Fri Dec 21 11:19:14 2001 +++ linux-akpm/fs/buffer.c Sat Dec 29 23:39:11 2001 @@ -1512,6 +1512,7 @@ static int __block_write_full_page(struc int err, i; unsigned long block; struct buffer_head *bh, *head; + int need_unlock; if (!PageLocked(page)) BUG(); @@ -1567,8 +1568,34 @@ static int __block_write_full_page(struc return 0; out: + /* + * ENOSPC, or some other error. We may already have added some + * blocks to the file, so we need to write these out to avoid + * exposing stale data. + */ ClearPageUptodate(page); - UnlockPage(page); + bh = head; + need_unlock = 1; + /* Recovery: lock and submit the mapped buffers */ + do { + if (buffer_mapped(bh)) { + lock_buffer(bh); + need_unlock = 0; + } + bh = bh->b_this_page; + } while (bh != head); + do { + struct buffer_head *next = bh->b_this_page; + if (buffer_mapped(bh)) { + set_buffer_async_io(bh); + set_bit(BH_Uptodate, &bh->b_state); + clear_bit(BH_Dirty, &bh->b_state); + submit_bh(WRITE, bh); + } + bh = next; + } while (bh != head); + if (need_unlock) + UnlockPage(page); return err; } @@ -1639,6 +1666,17 @@ static int __block_prepare_write(struct } return 0; out: + bh = head; + block_start = 0; + do { + if (buffer_new(bh) && !buffer_uptodate(bh)) { + memset(kaddr+block_start, 0, bh->b_size); + set_bit(BH_Uptodate, &bh->b_state); + mark_buffer_dirty(bh); + } + block_start += bh->b_size; + bh = bh->b_this_page; + } while (bh != head); return err; } --- linux-2.4.18-pre1/mm/filemap.c Wed Dec 26 11:47:41 2001 +++ linux-akpm/mm/filemap.c Sat Dec 29 23:44:03 2001 @@ -3004,7 +3004,7 @@ generic_file_write(struct file *file,con kaddr = kmap(page); status = mapping->a_ops->prepare_write(file, page, offset, offset+bytes); if (status) - goto unlock; + goto sync_failure; page_fault = __copy_from_user(kaddr+offset, buf, bytes); flush_dcache_page(page); status = mapping->a_ops->commit_write(file, page, offset, offset+bytes); @@ -3029,6 +3029,7 @@ unlock: if (status < 0) break; } while (count); +done: *ppos = pos; if (cached_page) @@ -3050,6 +3051,18 @@ out: fail_write: status = -EFAULT; goto unlock; + +sync_failure: + /* + * If blocksize < pagesize, prepare_write() may have instantiated a + * few blocks outside i_size. Trim these off again. + */ + kunmap(page); + UnlockPage(page); + page_cache_release(page); + if (pos + bytes > inode->i_size) + vmtruncate(inode, inode->i_size); + goto done; o_direct: written = generic_file_direct_IO(WRITE, file, (char *) buf, count, pos); ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrdkern el panic woes 2001-12-30 7:59 ` ramdisk corruption problems - was: RE: pivot_root and initrdkern " Andrew Morton @ 2001-12-30 17:40 ` Linus Torvalds 2001-12-31 0:28 ` Andrea Arcangeli 0 siblings, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2001-12-30 17:40 UTC (permalink / raw) To: Andrew Morton Cc: Alexander Viro, Andrea Arcangeli, torrey.hoffman, linux-kernel, Marcelo Tosatti, Stephen C. Tweedie On Sat, 29 Dec 2001, Andrew Morton wrote: > > And what should we do with BH_New? The only point of BH_New was to not need this horror in three different places, and have the BH_New bit as a way of saying "this buffer has no contents yet", and fill it with zeroes in just _one_ place (ie the readpage path). However, I don't think it was ever implemented, so if you prefer the straightforward (brute-force but ugly) approach, just get rid of it. Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrdkern el panic woes 2001-12-30 17:40 ` Linus Torvalds @ 2001-12-31 0:28 ` Andrea Arcangeli 2001-12-31 0:35 ` Linus Torvalds 0 siblings, 1 reply; 32+ messages in thread From: Andrea Arcangeli @ 2001-12-31 0:28 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Alexander Viro, torrey.hoffman, linux-kernel, Marcelo Tosatti, Stephen C. Tweedie On Sun, Dec 30, 2001 at 09:40:11AM -0800, Linus Torvalds wrote: > > On Sat, 29 Dec 2001, Andrew Morton wrote: > > > > And what should we do with BH_New? > > The only point of BH_New was to not need this horror in three different > places, and have the BH_New bit as a way of saying "this buffer has no > contents yet", and fill it with zeroes in just _one_ place (ie the > readpage path). actually bh_new is needed also to serialize with the buffercache, a new bh mapped in pagecache must be dropped from the buffercache before we can start using it (unmap_underlying_metadata). so at least for 2.4 I wouldn't drop it :) > > However, I don't think it was ever implemented, so if you prefer the > straightforward (brute-force but ugly) approach, just get rid of it. > > Linus Andrea ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrdkern el panic woes 2001-12-31 0:28 ` Andrea Arcangeli @ 2001-12-31 0:35 ` Linus Torvalds 2001-12-31 1:00 ` Andrea Arcangeli 0 siblings, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2001-12-31 0:35 UTC (permalink / raw) To: Andrea Arcangeli Cc: Andrew Morton, Alexander Viro, torrey.hoffman, linux-kernel, Marcelo Tosatti, Stephen C. Tweedie On Mon, 31 Dec 2001, Andrea Arcangeli wrote: > > actually bh_new is needed also to serialize with the buffercache, a new > bh mapped in pagecache must be dropped from the buffercache before we > can start using it (unmap_underlying_metadata). You're right, although it's something of an optimization (ie we could as well just depend on the "mapped" bit and watch it change). Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrdkern el panic woes 2001-12-31 0:35 ` Linus Torvalds @ 2001-12-31 1:00 ` Andrea Arcangeli 0 siblings, 0 replies; 32+ messages in thread From: Andrea Arcangeli @ 2001-12-31 1:00 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Alexander Viro, torrey.hoffman, linux-kernel, Marcelo Tosatti, Stephen C. Tweedie On Sun, Dec 30, 2001 at 04:35:46PM -0800, Linus Torvalds wrote: > > On Mon, 31 Dec 2001, Andrea Arcangeli wrote: > > > > actually bh_new is needed also to serialize with the buffercache, a new > > bh mapped in pagecache must be dropped from the buffercache before we > > can start using it (unmap_underlying_metadata). > > You're right, although it's something of an optimization (ie we could as > well just depend on the "mapped" bit and watch it change). we could even hold the optimization (cache coherency only on new blocks) by pushing the cache coherency into the lowlevel just like the bh clearing, but the current buffer_new branch in the library code seems clean (and potentially a little faster with big softblocksize due the partial clearing). Andrea ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2001-12-30 6:33 ` Alexander Viro 2001-12-30 6:38 ` Andrew Morton 2001-12-30 7:08 ` Andrew Morton @ 2001-12-31 0:05 ` Andrea Arcangeli 2002-01-05 11:43 ` Andrew Morton 2 siblings, 1 reply; 32+ messages in thread From: Andrea Arcangeli @ 2001-12-31 0:05 UTC (permalink / raw) To: Alexander Viro Cc: Andrew Morton, Linus Torvalds, torrey.hoffman, linux-kernel, Marcelo Tosatti, Stephen C. Tweedie On Sun, Dec 30, 2001 at 01:33:24AM -0500, Alexander Viro wrote: > > > On Sat, 29 Dec 2001, Andrew Morton wrote: > > > Would it be necessary to preallocate the holes at mmap() time? Mad > > hand-waving: Could we not perform the instantiation at pagefault time, > > and give the caller SIGBUS if we cannot allocate the blocks? Or if > > there's an IO error, or quota exceeded. > > Allocation at mmap() Is Not Going To Happen. Consider it vetoed. > There are applications that use mmap() on large and very sparse > files. it's exactly this kind of apps that will be screwed up by silent data corruption. the point of the holes is to optimize performance and save space, but they shouldn't introduce the possibilty of data corruption. Note: I'm fine to introduce another way to notify the app about -ENOSPC, -ENOSPC on mmap is the most obvious one, but we could still allow the current "overcommit" behaviour with a kind of sigbus mentioned by Andrew (possibly not sigbus though, since it has just well defined semantics for MAP_SHARED, maybe they could be extended, anyways as said this is only a matter of API). My point is only that some API should be added because your mmap on sparse files are unreliable at the moment. > > Question: can someone please define BH_New? Its lifecycle seems > > very vague. We never actually seem to *clear* it anywhere for > > ext2, and it appears that the kernel will keep on treating a > > clearly non-new buffer as "new" all the time. ext3 explicitly > > clears BH_New in get_block(), if it finds the block was already > > present in the file. I did this because we need the newness > > info for internal purposes. > > It should be reset when we submit IO. Breakage related to failing > allocation is indeed not new, but that's a long story. And no, > "allocate on mmap()" is not a fix. it is a fix (assuming people will understand/expect this retval :), but yes, the current overcommit behaviour has some value so I agree some other async API ala sigbus may be more desiderable. Andrea ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2001-12-31 0:05 ` ramdisk corruption problems - was: RE: pivot_root and initrd kern " Andrea Arcangeli @ 2002-01-05 11:43 ` Andrew Morton 2002-01-05 14:04 ` Trond Myklebust 2002-01-07 3:08 ` Andrea Arcangeli 0 siblings, 2 replies; 32+ messages in thread From: Andrew Morton @ 2002-01-05 11:43 UTC (permalink / raw) To: Andrea Arcangeli Cc: Alexander Viro, Linus Torvalds, torrey.hoffman, linux-kernel, Marcelo Tosatti, Stephen C. Tweedie, Trond Myklebust Andrea Arcangeli wrote: > > Note: I'm fine to introduce another way to notify the app about -ENOSPC, > -ENOSPC on mmap is the most obvious one, but we could still allow the > current "overcommit" behaviour with a kind of sigbus mentioned by > Andrew (possibly not sigbus though, since it has just well defined > semantics for MAP_SHARED, maybe they could be extended, anyways as said > this is only a matter of API). My point is only that some API should be > added because your mmap on sparse files are unreliable at the moment. > The very least we can do is to return a sensible error code from msync(). At present, if you create a 200 meg mapping on a 100 meg disk, dirty it all and run msync(MS_SYNC), the damn thing returns zero and you don't know that you lost half your data. The patch makes filemap_fdatasync() and filemap_fdatawait() return an error code, and propagates that up through all callers, including fsync() and fdatasync(). Please review, especially the nfs and generic_file_direct_IO() changes. There's also a half-assed attempt to implement MS_ASYNC in here. If anyone thinks that's not worth the effort, I won't be offended. --- linux-2.4.18-pre1/include/linux/fs.h Fri Dec 21 11:19:23 2001 +++ linux-akpm/include/linux/fs.h Sat Jan 5 03:21:07 2002 @@ -1212,8 +1212,8 @@ extern int osync_inode_data_buffers(stru extern int fsync_inode_buffers(struct inode *); extern int fsync_inode_data_buffers(struct inode *); extern int inode_has_buffers(struct inode *); -extern void filemap_fdatasync(struct address_space *); -extern void filemap_fdatawait(struct address_space *); +extern int filemap_fdatasync(struct address_space *); +extern int filemap_fdatawait(struct address_space *); extern void sync_supers(kdev_t); extern int bmap(struct inode *, int); extern int notify_change(struct dentry *, struct iattr *); --- linux-2.4.18-pre1/mm/filemap.c Wed Dec 26 11:47:41 2001 +++ linux-akpm/mm/filemap.c Sat Jan 5 03:23:09 2002 @@ -582,8 +582,9 @@ EXPORT_SYMBOL(fail_writepage); * @mapping: address space structure to write * */ -void filemap_fdatasync(struct address_space * mapping) +int filemap_fdatasync(struct address_space * mapping) { + int ret = 0; int (*writepage)(struct page *) = mapping->a_ops->writepage; spin_lock(&pagecache_lock); @@ -603,8 +604,11 @@ void filemap_fdatasync(struct address_sp lock_page(page); if (PageDirty(page)) { + int err; ClearPageDirty(page); - writepage(page); + err = writepage(page); + if (err && !ret) + ret = err; } else UnlockPage(page); @@ -612,6 +616,7 @@ void filemap_fdatasync(struct address_sp spin_lock(&pagecache_lock); } spin_unlock(&pagecache_lock); + return ret; } /** @@ -621,8 +626,10 @@ void filemap_fdatasync(struct address_sp * @mapping: address space structure to wait for * */ -void filemap_fdatawait(struct address_space * mapping) +int filemap_fdatawait(struct address_space * mapping) { + int ret = 0; + spin_lock(&pagecache_lock); while (!list_empty(&mapping->locked_pages)) { @@ -638,11 +645,14 @@ void filemap_fdatawait(struct address_sp spin_unlock(&pagecache_lock); ___wait_on_page(page); + if (PageError(page)) + ret = -EIO; page_cache_release(page); spin_lock(&pagecache_lock); } spin_unlock(&pagecache_lock); + return ret; } /* @@ -1519,12 +1529,14 @@ static ssize_t generic_file_direct_IO(in goto out_free; /* - * Flush to disk exlusively the _data_, metadata must remains + * Flush to disk exclusively the _data_, metadata must remain * completly asynchronous or performance will go to /dev/null. */ - filemap_fdatasync(mapping); - retval = fsync_inode_data_buffers(inode); - filemap_fdatawait(mapping); + retval = filemap_fdatasync(mapping); + if (retval == 0) + retval = fsync_inode_data_buffers(inode); + if (retval == 0) + retval = filemap_fdatawait(mapping); if (retval < 0) goto out_free; @@ -2141,26 +2153,41 @@ int generic_file_mmap(struct file * file * The msync() system call. */ +/* + * We attempt to implement MS_ASYNC, but it's lame. There needs to be a way + * of starting async writeout of the metadata and inode. + */ static int msync_interval(struct vm_area_struct * vma, unsigned long start, unsigned long end, int flags) { + int ret = 0; struct file * file = vma->vm_file; + if (file && (vma->vm_flags & VM_SHARED)) { - int error; - error = filemap_sync(vma, start, end-start, flags); + /* filemap_sync() cannot fail */ + ret = filemap_sync(vma, start, end-start, flags); - if (!error && (flags & MS_SYNC)) { + if (flags & (MS_SYNC|MS_ASYNC)) { struct inode * inode = file->f_dentry->d_inode; + down(&inode->i_sem); - filemap_fdatasync(inode->i_mapping); - if (file->f_op && file->f_op->fsync) - error = file->f_op->fsync(file, file->f_dentry, 1); - filemap_fdatawait(inode->i_mapping); + ret = filemap_fdatasync(inode->i_mapping); + if (flags & MS_SYNC) { + int err; + + if (file->f_op && file->f_op->fsync) { + err = file->f_op->fsync(file, file->f_dentry, 1); + if (err && !ret) + ret = err; + } + err = filemap_fdatawait(inode->i_mapping); + if (err && !ret) + ret = err; + } up(&inode->i_sem); } - return error; } - return 0; + return ret; } asmlinkage long sys_msync(unsigned long start, size_t len, int flags) --- linux-2.4.18-pre1/fs/block_dev.c Fri Dec 21 11:19:14 2001 +++ linux-akpm/fs/block_dev.c Sat Jan 5 03:21:07 2002 @@ -171,11 +171,15 @@ static loff_t block_llseek(struct file * static int __block_fsync(struct inode * inode) { - int ret; + int ret, err; - filemap_fdatasync(inode->i_mapping); - ret = sync_buffers(inode->i_rdev, 1); - filemap_fdatawait(inode->i_mapping); + ret = filemap_fdatasync(inode->i_mapping); + err = sync_buffers(inode->i_rdev, 1); + if (err && !ret) + ret = err; + err = filemap_fdatawait(inode->i_mapping); + if (err && !ret) + ret = err; return ret; } --- linux-2.4.18-pre1/fs/buffer.c Fri Dec 21 11:19:14 2001 +++ linux-akpm/fs/buffer.c Sat Jan 5 03:21:07 2002 @@ -401,9 +401,9 @@ asmlinkage long sys_fsync(unsigned int f struct file * file; struct dentry * dentry; struct inode * inode; - int err; + int ret, err; - err = -EBADF; + ret = -EBADF; file = fget(fd); if (!file) goto out; @@ -411,21 +411,27 @@ asmlinkage long sys_fsync(unsigned int f dentry = file->f_dentry; inode = dentry->d_inode; - err = -EINVAL; - if (!file->f_op || !file->f_op->fsync) + ret = -EINVAL; + if (!file->f_op || !file->f_op->fsync) { + /* Why? We can still call filemap_fdatasync */ goto out_putf; + } /* We need to protect against concurrent writers.. */ down(&inode->i_sem); - filemap_fdatasync(inode->i_mapping); + ret = filemap_fdatasync(inode->i_mapping); err = file->f_op->fsync(file, dentry, 0); - filemap_fdatawait(inode->i_mapping); + if (err && !ret) + ret = err; + err = filemap_fdatawait(inode->i_mapping); + if (err && !ret) + ret = err; up(&inode->i_sem); out_putf: fput(file); out: - return err; + return ret; } asmlinkage long sys_fdatasync(unsigned int fd) @@ -433,9 +439,9 @@ asmlinkage long sys_fdatasync(unsigned i struct file * file; struct dentry * dentry; struct inode * inode; - int err; + int ret, err; - err = -EBADF; + ret = -EBADF; file = fget(fd); if (!file) goto out; @@ -443,14 +449,18 @@ asmlinkage long sys_fdatasync(unsigned i dentry = file->f_dentry; inode = dentry->d_inode; - err = -EINVAL; + ret = -EINVAL; if (!file->f_op || !file->f_op->fsync) goto out_putf; down(&inode->i_sem); - filemap_fdatasync(inode->i_mapping); + ret = filemap_fdatasync(inode->i_mapping); err = file->f_op->fsync(file, dentry, 1); - filemap_fdatawait(inode->i_mapping); + if (err && !ret) + ret = err; + err = filemap_fdatawait(inode->i_mapping); + if (err && !ret) + ret = err; up(&inode->i_sem); out_putf: --- linux-2.4.18-pre1/fs/nfs/file.c Wed Dec 26 11:47:41 2001 +++ linux-akpm/fs/nfs/file.c Sat Jan 5 03:21:07 2002 @@ -244,6 +244,7 @@ nfs_lock(struct file *filp, int cmd, str { struct inode * inode = filp->f_dentry->d_inode; int status = 0; + int status2; dprintk("NFS: nfs_lock(f=%4x/%ld, t=%x, fl=%x, r=%Ld:%Ld)\n", inode->i_dev, inode->i_ino, @@ -278,11 +279,18 @@ nfs_lock(struct file *filp, int cmd, str * Flush all pending writes before doing anything * with locks.. */ - filemap_fdatasync(inode->i_mapping); + /* + * Shouldn't filemap_fdatasync/wait be inside i_sem? + */ + status = filemap_fdatasync(inode->i_mapping); down(&inode->i_sem); - status = nfs_wb_all(inode); + status2 = nfs_wb_all(inode); + if (status2 && !status) + status = status2; up(&inode->i_sem); - filemap_fdatawait(inode->i_mapping); + status2 = filemap_fdatawait(inode->i_mapping); + if (status2 && !status) + status = status2; if (status < 0) return status; @@ -300,11 +308,17 @@ nfs_lock(struct file *filp, int cmd, str */ out_ok: if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->fl_type != F_UNLCK) { - filemap_fdatasync(inode->i_mapping); + status2 = filemap_fdatasync(inode->i_mapping); + if (status2 && !status) + status = status2; down(&inode->i_sem); - nfs_wb_all(inode); /* we may have slept */ + status2 = nfs_wb_all(inode); /* we may have slept */ + if (status2 && !status) + status2 = status; up(&inode->i_sem); - filemap_fdatawait(inode->i_mapping); + status2 = filemap_fdatawait(inode->i_mapping); + if (status2 && !status) + status = status2; nfs_zap_caches(inode); } return status; ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2002-01-05 11:43 ` Andrew Morton @ 2002-01-05 14:04 ` Trond Myklebust 2002-01-07 3:08 ` Andrea Arcangeli 1 sibling, 0 replies; 32+ messages in thread From: Trond Myklebust @ 2002-01-05 14:04 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel >>>>> " " == Andrew Morton <akpm@zip.com.au> writes: > out_ok: > if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->fl_type > != F_UNLCK) { > - filemap_fdatasync(inode->i_mapping); > + status2 = filemap_fdatasync(inode->i_mapping); > + if (status2 && !status) > + status = status2; > down(&inode->i_sem); > - nfs_wb_all(inode); /* we may have slept */ > + status2 = nfs_wb_all(inode); /* we may have slept */ > + if (status2 && !status) > + status2 = status; > up(&inode->i_sem); > - filemap_fdatawait(inode->i_mapping); > + status2 = filemap_fdatawait(inode->i_mapping); > + if (status2 && !status) > + status = status2; > nfs_zap_caches(inode); > } return status; Hmm. I'm not sure about this hunk... At this point in the code, we already know that we've been granted a lock by the server. All we are doing is to try to sync any data that may have been committed while we were waiting on the lock, in order to ensure that the act of locking provides a cache coherency point. IMHO it would be wrong to signal that the lock itself has failed just because some other process has lost data in the filemap_fdata* calls. It's a different matter with the nfs_wb_all() call: that indicates that the process has been signalled, so it may indeed make sense to return that particular error. Cheers, Trond ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2002-01-05 11:43 ` Andrew Morton 2002-01-05 14:04 ` Trond Myklebust @ 2002-01-07 3:08 ` Andrea Arcangeli 2002-01-07 3:49 ` Andrew Morton 1 sibling, 1 reply; 32+ messages in thread From: Andrea Arcangeli @ 2002-01-07 3:08 UTC (permalink / raw) To: Andrew Morton Cc: Alexander Viro, Linus Torvalds, torrey.hoffman, linux-kernel, Marcelo Tosatti, Stephen C. Tweedie, Trond Myklebust On Sat, Jan 05, 2002 at 03:43:36AM -0800, Andrew Morton wrote: > static int msync_interval(struct vm_area_struct * vma, > unsigned long start, unsigned long end, int flags) > { > + int ret = 0; > struct file * file = vma->vm_file; > + > if (file && (vma->vm_flags & VM_SHARED)) { > - int error; > - error = filemap_sync(vma, start, end-start, flags); > + /* filemap_sync() cannot fail */ :) yes, I guess it cames from some old debugging code trying to catch pte corruption, now dead in current kernels. > + ret = filemap_sync(vma, start, end-start, flags); > > - if (!error && (flags & MS_SYNC)) { > + if (flags & (MS_SYNC|MS_ASYNC)) { ok, it cannot fail but I prefer you either avoid the 'ret = filemap_sync' and you make filemap_sync return void, or you also left '(!err' over here. > struct inode * inode = file->f_dentry->d_inode; > + > down(&inode->i_sem); > - filemap_fdatasync(inode->i_mapping); > - if (file->f_op && file->f_op->fsync) > - error = file->f_op->fsync(file, file->f_dentry, 1); > - filemap_fdatawait(inode->i_mapping); > + ret = filemap_fdatasync(inode->i_mapping); > + if (flags & MS_SYNC) { > + int err; > + > + if (file->f_op && file->f_op->fsync) { > + err = file->f_op->fsync(file, file->f_dentry, 1); > + if (err && !ret) > + ret = err; > + } > + err = filemap_fdatawait(inode->i_mapping); > + if (err && !ret) > + ret = err; > + } sounds right (not something I'd love to do in 2.4 but it's strightforward enough so I'll take the risk). > @@ -433,9 +439,9 @@ asmlinkage long sys_fdatasync(unsigned i > struct file * file; > struct dentry * dentry; > struct inode * inode; > - int err; > + int ret, err; > > - err = -EBADF; > + ret = -EBADF; > file = fget(fd); > if (!file) > goto out; > @@ -443,14 +449,18 @@ asmlinkage long sys_fdatasync(unsigned i > dentry = file->f_dentry; > inode = dentry->d_inode; > > - err = -EINVAL; > + ret = -EINVAL; > if (!file->f_op || !file->f_op->fsync) > goto out_putf; > > down(&inode->i_sem); > - filemap_fdatasync(inode->i_mapping); > + ret = filemap_fdatasync(inode->i_mapping); > err = file->f_op->fsync(file, dentry, 1); > - filemap_fdatawait(inode->i_mapping); > + if (err && !ret) > + ret = err; > + err = filemap_fdatawait(inode->i_mapping); > + if (err && !ret) > + ret = err; > up(&inode->i_sem); > > out_putf: you forgot return ret here. > --- linux-2.4.18-pre1/fs/nfs/file.c Wed Dec 26 11:47:41 2001 > +++ linux-akpm/fs/nfs/file.c Sat Jan 5 03:21:07 2002 > @@ -244,6 +244,7 @@ nfs_lock(struct file *filp, int cmd, str > { > struct inode * inode = filp->f_dentry->d_inode; > int status = 0; > + int status2; > > dprintk("NFS: nfs_lock(f=%4x/%ld, t=%x, fl=%x, r=%Ld:%Ld)\n", > inode->i_dev, inode->i_ino, > @@ -278,11 +279,18 @@ nfs_lock(struct file *filp, int cmd, str > * Flush all pending writes before doing anything > * with locks.. > */ > - filemap_fdatasync(inode->i_mapping); > + /* > + * Shouldn't filemap_fdatasync/wait be inside i_sem? > + */ I think it's not necessary, because the list browse it's serialized by the pagecache_lock and writepage can run outside the i_sem. > + status = filemap_fdatasync(inode->i_mapping); > down(&inode->i_sem); > - status = nfs_wb_all(inode); > + status2 = nfs_wb_all(inode); > + if (status2 && !status) > + status = status2; > up(&inode->i_sem); > - filemap_fdatawait(inode->i_mapping); > + status2 = filemap_fdatawait(inode->i_mapping); > + if (status2 && !status) > + status = status2; > if (status < 0) > return status; > > @@ -300,11 +308,17 @@ nfs_lock(struct file *filp, int cmd, str > */ > out_ok: > if ((IS_SETLK(cmd) || IS_SETLKW(cmd)) && fl->fl_type != F_UNLCK) { > - filemap_fdatasync(inode->i_mapping); > + status2 = filemap_fdatasync(inode->i_mapping); > + if (status2 && !status) > + status = status2; > down(&inode->i_sem); > - nfs_wb_all(inode); /* we may have slept */ > + status2 = nfs_wb_all(inode); /* we may have slept */ > + if (status2 && !status) > + status2 = status; > up(&inode->i_sem); > - filemap_fdatawait(inode->i_mapping); > + status2 = filemap_fdatawait(inode->i_mapping); > + if (status2 && !status) > + status = status2; > nfs_zap_caches(inode); > } > return status; all right, thanks. Andrea ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2002-01-07 3:08 ` Andrea Arcangeli @ 2002-01-07 3:49 ` Andrew Morton 2002-01-07 4:31 ` Andrea Arcangeli 0 siblings, 1 reply; 32+ messages in thread From: Andrew Morton @ 2002-01-07 3:49 UTC (permalink / raw) To: Andrea Arcangeli Cc: Alexander Viro, Linus Torvalds, torrey.hoffman, linux-kernel, Marcelo Tosatti, Stephen C. Tweedie, Trond Myklebust Andrea Arcangeli wrote: > > > + ret = filemap_sync(vma, start, end-start, flags); > > > > - if (!error && (flags & MS_SYNC)) { > > + if (flags & (MS_SYNC|MS_ASYNC)) { > > ok, it cannot fail but I prefer you either avoid the 'ret = > filemap_sync' and you make filemap_sync return void, or you also left > '(!err' over here. OK, let's leave the test in place - filemap_sync() has global scope. > > struct inode * inode = file->f_dentry->d_inode; > > + > > down(&inode->i_sem); > > - filemap_fdatasync(inode->i_mapping); > > - if (file->f_op && file->f_op->fsync) > > - error = file->f_op->fsync(file, file->f_dentry, 1); > > - filemap_fdatawait(inode->i_mapping); > > + ret = filemap_fdatasync(inode->i_mapping); > > + if (flags & MS_SYNC) { > > + int err; > > + > > + if (file->f_op && file->f_op->fsync) { > > + err = file->f_op->fsync(file, file->f_dentry, 1); > > + if (err && !ret) > > + ret = err; > > + } > > + err = filemap_fdatawait(inode->i_mapping); > > + if (err && !ret) > > + ret = err; > > + } > > sounds right (not something I'd love to do in 2.4 but it's > strightforward enough so I'll take the risk). It works OK, but in practice makes little difference compared to MS_ASYNC. We run out of request slots very easily with this sort of thing. Another option would be to make MS_ASYNC behave the same as MS_SYNC, which is probably better than just ignoring it as we do at present. I'll leave it as shown above unless there be objections. > > you forgot return ret here. Whoop, thanks. It was returning `err'. > > --- linux-2.4.18-pre1/fs/nfs/file.c Wed Dec 26 11:47:41 2001 > > +++ linux-akpm/fs/nfs/file.c Sat Jan 5 03:21:07 2002 > > @@ -244,6 +244,7 @@ nfs_lock(struct file *filp, int cmd, str > > { > > struct inode * inode = filp->f_dentry->d_inode; > > int status = 0; > > + int status2; > > > > dprintk("NFS: nfs_lock(f=%4x/%ld, t=%x, fl=%x, r=%Ld:%Ld)\n", > > inode->i_dev, inode->i_ino, > > @@ -278,11 +279,18 @@ nfs_lock(struct file *filp, int cmd, str > > * Flush all pending writes before doing anything > > * with locks.. > > */ > > - filemap_fdatasync(inode->i_mapping); > > + /* > > + * Shouldn't filemap_fdatasync/wait be inside i_sem? > > + */ > > I think it's not necessary, because the list browse it's serialized by > the pagecache_lock and writepage can run outside the i_sem. Yup. I've changed this part of the patch based on some discussion with Trond. I'll wait until Marcelo looks like he has his head above water and then send out the final version of the ramdisk, truncate and fsync/msync patches, cc'ed to yourself and lkml. Thanks. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2002-01-07 3:49 ` Andrew Morton @ 2002-01-07 4:31 ` Andrea Arcangeli 0 siblings, 0 replies; 32+ messages in thread From: Andrea Arcangeli @ 2002-01-07 4:31 UTC (permalink / raw) To: Andrew Morton Cc: Alexander Viro, Linus Torvalds, torrey.hoffman, linux-kernel, Marcelo Tosatti, Stephen C. Tweedie, Trond Myklebust On Sun, Jan 06, 2002 at 07:49:05PM -0800, Andrew Morton wrote: > which is probably better than just ignoring it as we do at present. > I'll leave it as shown above unless there be objections. by design defintely better as shown than mainline. a MS_ASYNC currently doesn't work because the VM will never care flushing dirty mapped pages, first it will have to unmap them that will never happen unless there's low-on mem condition, while the filemap_fdatasync will ensure those pages will hit the disk asynchronously in a sane amount of time (depending on kupdate). MS_ASYNC manpage says an update is scheduled, currently it will instead never happen for example if the machine is idle and there's no vm swap etc... your change will fix it. > I'll wait until Marcelo looks like he has his head above water and > then send out the final version of the ramdisk, truncate and > fsync/msync patches, cc'ed to yourself and lkml. I'd like to release a new -aa within tomorrow afternoon with every known bug discussed in this thread fixed, so then I can concentrate back on another thing, so I'd like to get those new versions ASAP :) for the buffer_new thing I guess it's simpler to just change the semantics of it rather than allocating ram on the stack. Andrea ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2001-12-30 6:19 ` Andrew Morton 2001-12-30 6:33 ` Alexander Viro @ 2001-12-30 23:56 ` Andrea Arcangeli 2001-12-31 10:06 ` Daniel Phillips 2002-01-04 16:38 ` Stephen C. Tweedie 3 siblings, 0 replies; 32+ messages in thread From: Andrea Arcangeli @ 2001-12-30 23:56 UTC (permalink / raw) To: Andrew Morton Cc: Linus Torvalds, torrey.hoffman, linux-kernel, Alexander Viro, Marcelo Tosatti, Stephen C. Tweedie On Sat, Dec 29, 2001 at 10:19:52PM -0800, Andrew Morton wrote: > [ this thread started at http://www.lib.uaa.alaska.edu/linux-kernel/archive/2001-Week-51/1001.html ] > > > Andrea Arcangeli wrote: > > > > On Thu, Dec 20, 2001 at 09:56:40PM -0800, Andrew Morton wrote: > > > Andrea Arcangeli wrote: > > > > > > > > ... > > > > actually while testing it I unfortunately found also an fs corruption > > > > bug in the ->prepare_write/commit_write/writepage/direct_IO callbacks. > > > > It's a longstanding one, since get_block born. In short, if get_block > > > > fails while mapping on a certain page during > > > > prepare_write/writepage/direct_IO (like it can happen with a full > > > > filesystem, incidentally ext2 on /dev/ram0 during my testing that is > > > > only 4M and so it overflows fast), the blocks before the ENOSPC failure > > > > will be allocated, but the i_size won't be update accordingly (no commit > > > > write, because prepare_write failed in the middle). for the > > > > generic_file_write users it's easily fixable with an hack (truncating > > > > backwards because we don't know how far we got allocating blocks when we > > > > return from prepare_write), similar hack for the direct_IO one (that > > > > commits only once at the end in function of the direct_IO generated). > > > > But for writepage is quite a pain, infact I believe the writepage blocks > > > > should be reserved in-core, to guarantee we will never fail a truncate > > > > with ENOSPC. With the shared mappings we're effectively doing allocate > > > > on flush... but with the lack of space reservation that makes it > > > > unreliable :) > > > > > > The -ac kernels handled this by zeroing out the accidentally-allocated > > > blocks at __block_prepare_write. > > > > actually my fix seems cleaner because it puts the "clearing" in one single > > place. > > I think so too. Even for ext3, which has a very complex truncate, > it appears to be OK. > > > > > So for now I did an hack to cure the other two (writepage can still > > > > corrupt the fs as said). I think the right fix (ala 2.5) is to change > > > > the API so we can use the last blocks too, but the below will cure 2.4 > > > > and for writepage the right fix IMHO is to do the reservation of the > > > > space. > > > > > > This is better in a way because it reclaims the eztra few blocks. But > > > the -ac approach also works for writepage(). > > > > yes, it can solve the metadata corruption (assuming the locking is > > Where can metadata corruption occur? A few extra blocks outside > i_size for ext2 directories isn't going to cause corruption, is it? > > Or are you referring to i_blocks accounting being incorrect? yes, I'm referring to the further blocks wrongly left allocated into the inode. that's plain metadata corruption (of course, it's not completly random corruption at least) but just to mention one of the side effects it can exploited in order to sniff contents of old data on disk (ok, with ext2 that can happen anytime after an unclean unmount anyways, but you know ext3 ordered or data journaling would prevent that). > > right, I can as well call ->truncate within writepage but it's not > > obvious at all that it won't race because we don't hold the i_sem within > > writepage), but the data corruption still holds. > > For sure - holding i_sem on truncate is abolutely required. yes, this is also why it's not obvious the "undo" get_block on -ENOSPC failure within writepage (what I understood from you to be in -ac). > > I mean, there's no > > failure path to notify userspace that a certain page fault is writing > > into a page over an hole, that we don't have space to later allocate on > > disk. so to me it sounds like MAP_SHARED should preallocate the space of > > the holes so you will know that the writes into the MAP_SHARED segments > > won't be lost (current state of things will lead to silent corruption > > and pinned dirty pages in ram, aka broken allocate on flush like > > previously said). > > Um. How does this differ from an I/O error on flush? It differs because I/O error is an hardware error and we have no hope if the platter is corrupted. I mean, if the DB gets corrupted after a reboot because last night the harddisk created some badblocks is not our business. We cannot fix it, currently we cannot notify about it, but after all it wasn't our mistake :). we'll just left some dirty pages around if those write fails because the hardware couldn't relocate the badblock transparently during the writes. If instead the DB gets corrupted because somebody was writing to disk and the disk filled up, so the writepage allocate on flush couldn't find any space where to write the dirty data, that sounds like our mistake. There's no way the app can know about this unless we sigsomething or we return -ENOSPC on mmap. Checking SuSv2 there's no -ENOSPC contemplated as retval from mmap, but I remeber there are always the zillon of exceptions and "always available" error codes, so it's not obvious it's not contemplated. I mean, if the mmap API prevents us to return -ENOSPC there's no way we can build a reliable MAP_SHARED with filesystems that allows holes on files. You will never know if this MAP_SHARED updates hit the disk, unless you previously made sure to generate a file without holes (to get the -ENOSPC failure path with the write(2) syscall). > Would it be necessary to preallocate the holes at mmap() time? Mad Not preallocate, but just reserve space, just like we should do with write(2), the reason write(2) is synchronous with the lowlevel block allocation is just because we don't have an API with the lowlevel fs to reserve space. As soon as we'll add this API it will be possible to turn all the fs out there into allocate on flush for writes (not only for MAP_SHARED). so if we write a few bytes and delete the file, we won't hit the lowlevel fs ialloc path for example (but only the superblock space reservation callback that can also be threaded or maybe lockless with atomic ops). > hand-waving: Could we not perform the instantiation at pagefault time, > and give the caller SIGBUS if we cannot allocate the blocks? Or if > there's an IO error, or quota exceeded. sigbus is viable solution indeed, it's a matter of userspace API specifications actually, not something we can choose liberally. The fact is, currently we sigbus on MAP_SHARED if the updates are beyond the end of the i_size, so maybe using sigbus to notify about the other cases too would be confusing? (app thinks it's writing beyond the end of i_size, unless it explicitly checks for that) > > > > Why was that code not brought across? > > > > Who developed that code? Can the author of the code forward port it to > > 2.4.18pre and post a patch to the list so we can review? thanks, > > Avoiding the matadata corruption would be a good start at least for 2.4, > > then we should just focus on the writepage locking that could race with the > > other "create=1" get_blocks. If it doesn't race I will certainly agree > > on that approch for 2.4. > > > > It appeared in 2.4.2-ac25, and it looks like sct was the author: > > o Fix higmem block_prepare_write crash (Stephen Tweedie) > > Which is interesting - from the changelog it looks like he was > fixing a different problem! I always though that code was there > to prevent leakage of stale blocks. Stephen? Makes sense to me, I also didn't recall any specific fix for the i_block corruption before my report of last week, maybe it fixed both things at once? > A 2.4.18-pre1 version is below. It compiles, but I haven't actually > exercised that code path. > > It's not a pretty fix, IMO. It leaves dangling blocks outside i_size > which will make fsck unhappy. It also breaks ext3 a little bit - > those blocks should be journalled and in theory we'd need to > clone off private copies of __block_prepare_write() and > unmap_underlying_metadata() to do this. Which would be irritating, > but not the end of the world. (Should have done this in the -ac > version of ext3, but I never noticed it). > > > --- linux-2.4.18-pre1/fs/buffer.c Fri Dec 21 11:19:14 2001 > +++ linux-akpm/fs/buffer.c Sat Dec 29 21:53:46 2001 > @@ -1639,6 +1639,17 @@ static int __block_prepare_write(struct > } > return 0; > out: > + bh = head; > + block_start = 0; > + do { > + if (buffer_new(bh) && !buffer_uptodate(bh)) { > + memset(kaddr+block_start, 0, bh->b_size); > + set_bit(BH_Uptodate, &bh->b_state); > + mark_buffer_dirty(bh); > + } > + block_start += bh->b_size; > + bh = bh->b_this_page; > + } while (bh != head); > return err; > } yes, I doesn't deallocate the blocks that was the whole point of the corruption I mentioned, so it looks it's not covering our case. > > Question: can someone please define BH_New? Its lifecycle seems BH_New means the bh is just born while being mapped. It matters only if you called get_block with create=1 in order to map an unmapped bh. If the bh was mapped it doesn't matter, because if it was mapped you didn't need to call get_block in first place (so we left it set and that's fine). Calling get_block on mapped bh is a bug. a bh_new is always bh_mapped. > very vague. We never actually seem to *clear* it anywhere for > ext2, and it appears that the kernel will keep on treating a > clearly non-new buffer as "new" all the time. ext3 explicitly > clears BH_New in get_block(), if it finds the block was already > present in the file. I did this because we need the newness > info for internal purposes. > > - anyways about the sigbus/mmap returning -ENOSPC things, they're low prio, I think first we should avoid metadata corruption, that's our internal stuff and we have no constraints about userspace API in order to fix it. Convering writepage with the "get_block undo" as said doesn't look obvious due the lack of i_sem and the other writers depending on stuff not going away under them unless i_sem was acquired. (OTOH those floating unwriteable dirty pages could run the machine oom) Andrea ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2001-12-30 6:19 ` Andrew Morton 2001-12-30 6:33 ` Alexander Viro 2001-12-30 23:56 ` Andrea Arcangeli @ 2001-12-31 10:06 ` Daniel Phillips 2002-01-04 16:38 ` Stephen C. Tweedie 3 siblings, 0 replies; 32+ messages in thread From: Daniel Phillips @ 2001-12-31 10:06 UTC (permalink / raw) To: Andrew Morton, Andrea Arcangeli Cc: Linus Torvalds, torrey.hoffman, linux-kernel, Alexander Viro, Marcelo Tosatti, Stephen C. Tweedie On December 30, 2001 07:19 am, Andrew Morton wrote: > Question: can someone please define BH_New? Its lifecycle seems > very vague. That's because it is. (vague) In its current incarnation, BH_New is valid right after calling xxx_get_block, otherwise it's like a floating signal. This is just sloppy. > We never actually seem to *clear* it anywhere for > ext2, and it appears that the kernel will keep on treating a > clearly non-new buffer as "new" all the time. ext3 explicitly > clears BH_New in get_block(), if it finds the block was already > present in the file. I did this because we need the newness > info for internal purposes. As I understand it, BH_New is set because xxx_get_block created a block and didn't initialize it - the initialization is expected to be done by someone else. So somebody better pick it up before the block starts transitioning to other states. The correct model would use scalars for block states, not bits, and we would enumerate all the correct transitions. Since that isn't going to happen any time soon, we could clear BH_New every time we set some other state bit. Or maybe BUG if we ever change another state bit and BH_NEW is still set, indicating somebody forgot to initialize the buffer. Hmm, I'm just taking a look at ext3/inode.c, line 863 - you've just called getblk, and you will act on BH_New. Does that code ever get executed? -- Daniel ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2001-12-30 6:19 ` Andrew Morton ` (2 preceding siblings ...) 2001-12-31 10:06 ` Daniel Phillips @ 2002-01-04 16:38 ` Stephen C. Tweedie 3 siblings, 0 replies; 32+ messages in thread From: Stephen C. Tweedie @ 2002-01-04 16:38 UTC (permalink / raw) To: Andrew Morton Cc: Andrea Arcangeli, Linus Torvalds, torrey.hoffman, linux-kernel, Alexander Viro, Marcelo Tosatti, Stephen C. Tweedie Hi, [catching up from holidays] On Sat, Dec 29, 2001 at 10:19:52PM -0800, Andrew Morton wrote: > > It appeared in 2.4.2-ac25, and it looks like sct was the author: > > o Fix higmem block_prepare_write crash (Stephen Tweedie) > > Which is interesting - from the changelog it looks like he was > fixing a different problem! I always though that code was there > to prevent leakage of stale blocks. Stephen? The code I fixed was the - memset(bh->b_data, 0, bh->b_size); + memset(kaddr+block_start, 0, bh->b_size); chunk to prevent the block_prepare_write out: error path from oopsing on highmem pages -- that's unrelated to the problem at hand here. Cheers, Stephen ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2001-12-21 1:49 ` Andrea Arcangeli [not found] ` <3C22CF16.C78B1F19@zip.com.au> @ 2002-01-05 7:53 ` Andrew Morton 2002-01-07 1:08 ` Andrea Arcangeli 1 sibling, 1 reply; 32+ messages in thread From: Andrew Morton @ 2002-01-05 7:53 UTC (permalink / raw) To: Andrea Arcangeli Cc: Linus Torvalds, torrey.hoffman, linux-kernel, Alexander Viro, Marcelo Tosatti Andrea Arcangeli wrote: > > I diffed a more advanced version of it. It's not very well tested. > > [ your rd.c patch ] > Your patch is working OK for me. I made two changes: - s/PAGE_SIZE/PAGE_CACHE_SIZE/ in ramdisk_updatepage() - I think there's an SMP race in rd_blkdev_pagecache_IO() - it looks up the underlying page in the pagecache and if it is present, it simply proceeds, assuming that the page is uptodate. But another CPU could have just added the page and may be in the middle of initialising it. So I changed rd_blkdev_pagecache_IO() to always lock the page. It got simpler. BTW, here's some fun: set up /dev/ram0 and /dev/ram1, both 128 megabytes. Fill each of them with a big file and then try to umount /dev/ram1. The machine locks up for sixty seconds running the quadratic search in write_some_buffers(). Sigh. In the lowlatency patch I simply brute-forced this by always setting dev = NODEV in sync_buffers(). Please review - I'm trying to use the rd driver to test the truncate+ENOSPC patch and there's just rampant filesystem corruption all over the place without this patch. It's unusable. --- linux-2.4.18-pre1/drivers/block/rd.c Fri Dec 21 11:19:13 2001 +++ linux-akpm/drivers/block/rd.c Fri Jan 4 23:50:09 2002 @@ -191,26 +191,44 @@ __setup("ramdisk_blocksize=", ramdisk_bl * 2000 Transmeta Corp. * aops copied from ramfs. */ -static int ramdisk_readpage(struct file *file, struct page * page) +static void ramdisk_updatepage(struct page * page, int need_kmap) { if (!Page_Uptodate(page)) { - memset(kmap(page), 0, PAGE_CACHE_SIZE); - kunmap(page); + struct buffer_head *bh = page->buffers; + void * address; + + if (need_kmap) + kmap(page); + address = page_address(page); + if (bh) { + struct buffer_head *tmp = bh; + do { + if (!buffer_uptodate(tmp)) { + memset(address, 0, tmp->b_size); + mark_buffer_uptodate(tmp, 1); + } + address += tmp->b_size; + tmp = tmp->b_this_page; + } while (tmp != bh); + } else + memset(address, 0, PAGE_CACHE_SIZE); + if (need_kmap) + kunmap(page); flush_dcache_page(page); SetPageUptodate(page); } +} + +static int ramdisk_readpage(struct file *file, struct page * page) +{ + ramdisk_updatepage(page, 1); UnlockPage(page); return 0; } static int ramdisk_prepare_write(struct file *file, struct page *page, unsigned offset, unsigned to) { - if (!Page_Uptodate(page)) { - void *addr = page_address(page); - memset(addr, 0, PAGE_CACHE_SIZE); - flush_dcache_page(page); - SetPageUptodate(page); - } + ramdisk_updatepage(page, 0); SetPageDirty(page); return 0; } @@ -233,44 +251,40 @@ static int rd_blkdev_pagecache_IO(int rw unsigned long index; int offset, size, err; - err = -EIO; err = 0; mapping = rd_bdev[minor]->bd_inode->i_mapping; + /* writing a buffer cache not uptodate must not clear it */ + if (sbh->b_page->mapping == mapping) { + if (rw == WRITE) { + mark_buffer_uptodate(sbh, 1); + SetPageDirty(sbh->b_page); + } + goto out; + } + index = sbh->b_rsector >> (PAGE_CACHE_SHIFT - 9); offset = (sbh->b_rsector << 9) & ~PAGE_CACHE_MASK; size = sbh->b_size; do { int count; - struct page ** hash; struct page * page; char * src, * dst; - int unlock = 0; count = PAGE_CACHE_SIZE - offset; if (count > size) count = size; size -= count; - hash = page_hash(mapping, index); - page = __find_get_page(mapping, index, hash); + page = grab_cache_page(mapping, index); if (!page) { - page = grab_cache_page(mapping, index); err = -ENOMEM; - if (!page) - goto out; - err = 0; - - if (!Page_Uptodate(page)) { - memset(kmap(page), 0, PAGE_CACHE_SIZE); - kunmap(page); - SetPageUptodate(page); - } - - unlock = 1; + goto out; } + ramdisk_updatepage(page, 1); + index++; if (rw == READ) { @@ -294,8 +308,7 @@ static int rd_blkdev_pagecache_IO(int rw } else { SetPageDirty(page); } - if (unlock) - UnlockPage(page); + UnlockPage(page); __free_page(page); } while (size); ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2002-01-05 7:53 ` Andrew Morton @ 2002-01-07 1:08 ` Andrea Arcangeli 0 siblings, 0 replies; 32+ messages in thread From: Andrea Arcangeli @ 2002-01-07 1:08 UTC (permalink / raw) To: Andrew Morton Cc: Linus Torvalds, torrey.hoffman, linux-kernel, Alexander Viro, Marcelo Tosatti On Fri, Jan 04, 2002 at 11:53:59PM -0800, Andrew Morton wrote: > Andrea Arcangeli wrote: > > > > I diffed a more advanced version of it. It's not very well tested. > > > > [ your rd.c patch ] > > > > Your patch is working OK for me. I made two changes: > > - s/PAGE_SIZE/PAGE_CACHE_SIZE/ in ramdisk_updatepage() ok > > - I think there's an SMP race in rd_blkdev_pagecache_IO() - it looks up > the underlying page in the pagecache and if it is present, it simply > proceeds, assuming that the page is uptodate. But another CPU could have > just added the page and may be in the middle of initialising it. > So I changed rd_blkdev_pagecache_IO() to always lock the page. It > got simpler. certainly it makes smp simpler agreed, so also ramdisk_updatepage always runs on locked pages. the reason it was not locking down the page is that before the ramdisk_aops was introduced, a read via /dev/ram? would lock down the pagecache page, and then start the ll_rw_block on the pagecache page, but with the page locked, so it would deadlock with an unconditional grab_cache_page. Now with the ramdisk_aops it should be ok because the physical address space I/O never uses the ->make_request_fn so it shouldn't recurse on the page lock any longer. > Please review - I'm trying to use the rd driver to test the truncate+ENOSPC your patch seems all right to me now (of course with ramdisk_updatepage run unconditionally). (I'd only set -ENOMEM in the fast path, so the oom branch becomes an out of line goto, didn't checked the asm generated though) Andrea ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes 2001-12-20 19:46 ` Linus Torvalds 2001-12-20 22:56 ` Alexander Viro 2001-12-20 23:42 ` Andrea Arcangeli @ 2001-12-21 1:38 ` Tachino Nobuhiro 2001-12-21 1:51 ` Everyone else but TWO Andre Hedrick 2 siblings, 1 reply; 32+ messages in thread From: Tachino Nobuhiro @ 2001-12-21 1:38 UTC (permalink / raw) To: torvalds; +Cc: torrey.hoffman, linux-kernel, viro, marcelo, tachino At Thu, 20 Dec 2001 11:46:23 -0800, Linus Torvalds wrote: > > The problem is that having buffers doesn't necessarily always mean that > they are valid, nor that _all_ of them are valid. If following sequence occurs, ramdisk_readpage() may clear the valid buffer data. I'm not sure whether this really occurs, but if it does, I think lock_buffer()/unlock_buffer() may be required. CPU1 CPU2 --------------------------------------------------------------------------- call ext2_alloc_branch() call ramdisk_readpage() bh = getblk(); lock_buffer(bh); write data to bh. ramdisk_readpage() if (!buffer_uptodate(bh)) do memset() mark_buffer_uptodate() unlock_buffer(bh); ^ permalink raw reply [flat|nested] 32+ messages in thread
* Everyone else but TWO ... 2001-12-21 1:38 ` Tachino Nobuhiro @ 2001-12-21 1:51 ` Andre Hedrick 0 siblings, 0 replies; 32+ messages in thread From: Andre Hedrick @ 2001-12-21 1:51 UTC (permalink / raw) To: linux-kernel; +Cc: Linus Torvalds, Marcelo Tosatti Well now that I have proven the isolation technique to the folks IRC'ng, just maybe the two Lead Penguins will get catch a clue. I suspect one of the two is more on top of the issue and more willing to listen. Regards, Andre Hedrick CEO/President, LAD Storage Consulting Group Linux ATA Development Linux Disk Certification Project ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2002-01-07 4:32 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-12-20 19:06 ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes Torrey Hoffman
2001-12-20 19:46 ` Linus Torvalds
2001-12-20 22:56 ` Alexander Viro
2001-12-20 23:42 ` Andrea Arcangeli
2001-12-21 1:49 ` Andrea Arcangeli
[not found] ` <3C22CF16.C78B1F19@zip.com.au>
2001-12-29 15:40 ` Andrea Arcangeli
2001-12-30 6:19 ` Andrew Morton
2001-12-30 6:33 ` Alexander Viro
2001-12-30 6:38 ` Andrew Morton
2001-12-30 7:17 ` Alexander Viro
2001-12-30 10:15 ` ramdisk corruption problems - was: RE: pivot_root and initrd Alan Cox
2001-12-31 0:08 ` ramdisk corruption problems - was: RE: pivot_root and initrd kern el panic woes Andrea Arcangeli
2001-12-30 7:08 ` Andrew Morton
2001-12-30 7:29 ` Alexander Viro
2001-12-30 7:59 ` ramdisk corruption problems - was: RE: pivot_root and initrdkern " Andrew Morton
2001-12-30 17:40 ` Linus Torvalds
2001-12-31 0:28 ` Andrea Arcangeli
2001-12-31 0:35 ` Linus Torvalds
2001-12-31 1:00 ` Andrea Arcangeli
2001-12-31 0:05 ` ramdisk corruption problems - was: RE: pivot_root and initrd kern " Andrea Arcangeli
2002-01-05 11:43 ` Andrew Morton
2002-01-05 14:04 ` Trond Myklebust
2002-01-07 3:08 ` Andrea Arcangeli
2002-01-07 3:49 ` Andrew Morton
2002-01-07 4:31 ` Andrea Arcangeli
2001-12-30 23:56 ` Andrea Arcangeli
2001-12-31 10:06 ` Daniel Phillips
2002-01-04 16:38 ` Stephen C. Tweedie
2002-01-05 7:53 ` Andrew Morton
2002-01-07 1:08 ` Andrea Arcangeli
2001-12-21 1:38 ` Tachino Nobuhiro
2001-12-21 1:51 ` Everyone else but TWO Andre Hedrick
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox