* fs corruption with invalidate_buffers()
@ 2000-12-06 2:07 Jan Niehusmann
2000-12-07 19:05 ` Jan Niehusmann
0 siblings, 1 reply; 12+ messages in thread
From: Jan Niehusmann @ 2000-12-06 2:07 UTC (permalink / raw)
To: linux-kernel, adilger
Some days ago I saw filesystem corruptions while testing the ext2fs online
resize patches by Andreas Dilger. First I thought that the online resizing
caused the problems, but further investigations didn't support this.
The latest observation shows that the problem is probably neither ext2 nor
lvm related:
While resizing the filesystem, invalidate_buffers() is called from the
lvm code. (lvm.c, line 2251, in lvm_do_lv_extend_reduce())
If I remove this call, the corruption goes away. But this is probably not
the correct fix, as it can cause problems when reducing the lv size.
For reference, some details of the corruption:
- I reproduced it with kernels between 2.4.0-test9
and 2.4.0-test12-pre5
- It is easily reproducible immediately after rebooting, but goes
away after some uptime (perhaps simply related to the amount of
unused memory)
- example script follows (attention: absolute device names
like /dev/vg1/test3 hardcoded!)
---------------------------------------------------
#!/bin/bash
umount /dev/vg1/test3
lvremove -f /dev/vg1/test3
lvcreate -n test3 -L 100M vg1
mke2fs -b 1024 /dev/vg1/test3
ext2prepare -v /dev/vg1/test3 50G
mount /dev/vg1/test3 /mnt/test3
( sleep 20; echo resize1; e2fsadm -L+90M /dev/vg1/test3; echo resize1 done ;
sleep 10; echo resize2; e2fsadm -L+90M /dev/vg1/test3; echo resize2 done ) &
echo copy1
cp -a /mnt/test/linux /mnt/test3/linux
echo copy1 done
echo copy2
cp -a /mnt/test3/linux /mnt/test3/linux2
echo copy2 done
---------------------------------------------------
/mnt/test/linux contains (surprise) a linux source, but I don't think
the contents are too important :-). The sleep values are tuned in a way that
leads to the following sequence:
copy1, resize1, resize1 done, copy1 done,
copy2, resize2, resize2 done, copy2 done
After that, the first copy is corrupted in memory only (and is ok after
rebooting), and the second copy is corrupted in memory and on disk. The
corrupted files contain parts of other files or binary stuff that may come
from directory entries.
I guess that invalidate_buffers somehow marks the buffers that contain
the first copy as free, but the second cp still uses them to copy the
files again. I don't understand the source well enough to find out
how it happens.
Jan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: fs corruption with invalidate_buffers()
2000-12-06 2:07 fs corruption with invalidate_buffers() Jan Niehusmann
@ 2000-12-07 19:05 ` Jan Niehusmann
2000-12-07 21:30 ` [PATCH] " Jan Niehusmann
0 siblings, 1 reply; 12+ messages in thread
From: Jan Niehusmann @ 2000-12-07 19:05 UTC (permalink / raw)
To: linux-kernel, adilger
On Wed, Dec 06, 2000 at 03:07:23AM +0100, Jan Niehusmann wrote:
> While resizing the filesystem, invalidate_buffers() is called from the
> lvm code. (lvm.c, line 2251, in lvm_do_lv_extend_reduce())
> If I remove this call, the corruption goes away. But this is probably not
> the correct fix, as it can cause problems when reducing the lv size.
Some more details:
I added the following code to put_last_free(bh) in buffer.c:
--- buffer.c.orig Wed Dec 6 17:19:57 2000
+++ buffer.c Thu Dec 7 19:55:39 2000
@@ -500,6 +500,11 @@
struct bh_free_head *head = &free_list[BUFSIZE_INDEX(bh->b_size)];
struct buffer_head **bhp = &head->list;
+ if(bh->b_page && Page_Uptodate(bh->b_page)
+ && bh->b_page->mapping) { // XXX ???
+ BUG();
+ }
+
bh->b_state = 0;
spin_lock(&head->lock);
That is, if I want to put a buffer to the free list, I check if it is
mapped and uptodate. If I understand the memory management correctly, this
is a Bad Thing and should not happen. But guess what? It does, in
invalidate_buffers.
I think invalidate_buffers should check if the buffer belongs to a
mapped page, and if it does, invalidate this mapping.
Jan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 12+ messages in thread* [PATCH] Re: fs corruption with invalidate_buffers()
2000-12-07 19:05 ` Jan Niehusmann
@ 2000-12-07 21:30 ` Jan Niehusmann
2000-12-07 22:03 ` Udo A. Steinberg
0 siblings, 1 reply; 12+ messages in thread
From: Jan Niehusmann @ 2000-12-07 21:30 UTC (permalink / raw)
To: linux-kernel, adilger; +Cc: Udo A. Steinberg, Byron Stanoszek
The following patch actually prevents the corruption I described.
I'd like to hear from the people having problems with hdparm, if it helps
them, too.
Please note that the patch circumvents the problem more than it fixes it.
The true fix would invalidate the mappings, but I don't know how to do it.
Jan
--- linux-2.4.0-test11/fs/buffer.c Mon Nov 20 08:55:05 2000
+++ test/fs/buffer.c Thu Dec 7 22:28:24 2000
@@ -589,7 +589,7 @@
then an invalidate_buffers call that doesn't trash dirty buffers. */
void __invalidate_buffers(kdev_t dev, int destroy_dirty_buffers)
{
- int i, nlist, slept;
+ int i, nlist, slept, db_message=0;
struct buffer_head * bh, * bh_next;
retry:
@@ -615,8 +615,13 @@
write_lock(&hash_table_lock);
if (!atomic_read(&bh->b_count) &&
(destroy_dirty_buffers || !buffer_dirty(bh))) {
- __remove_from_queues(bh);
- put_last_free(bh);
+ if(bh->b_page
+ && bh->b_page->mapping) {
+ db_message=1;
+ } else {
+ __remove_from_queues(bh);
+ put_last_free(bh);
+ }
}
/* else complain loudly? */
@@ -629,6 +634,8 @@
spin_unlock(&lru_list_lock);
if (slept)
goto retry;
+ if(db_message)
+ printk("invalidate_buffer with mapped page\n");
}
void set_blocksize(kdev_t dev, int size)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] Re: fs corruption with invalidate_buffers()
2000-12-07 21:30 ` [PATCH] " Jan Niehusmann
@ 2000-12-07 22:03 ` Udo A. Steinberg
2000-12-07 22:26 ` Alexander Viro
0 siblings, 1 reply; 12+ messages in thread
From: Udo A. Steinberg @ 2000-12-07 22:03 UTC (permalink / raw)
To: Jan Niehusmann; +Cc: linux-kernel, adilger, Byron Stanoszek, Alexander Viro
Jan Niehusmann wrote:
>
> The following patch actually prevents the corruption I described.
>
> I'd like to hear from the people having problems with hdparm, if it helps
> them, too.
Yes, it prevents the issue.
> Please note that the patch circumvents the problem more than it fixes it.
> The true fix would invalidate the mappings, but I don't know how to do it.
I don't know either. What does Alexander Viro say to all of this?
-Udo.
Same debug patch adapted to test12-pre7 follows:
--- linux/fs/buffer.c Thu Dec 7 22:55:54 2000
+++ /usr/src/linux/fs/buffer.c Thu Dec 7 22:49:02 2000
@@ -627,7 +627,7 @@
then an invalidate_buffers call that doesn't trash dirty buffers. */
void __invalidate_buffers(kdev_t dev, int destroy_dirty_buffers)
{
- int i, nlist, slept;
+ int i, nlist, slept, db_message = 0;
struct buffer_head * bh, * bh_next;
retry:
@@ -653,9 +653,13 @@
write_lock(&hash_table_lock);
if (!atomic_read(&bh->b_count) &&
(destroy_dirty_buffers || !buffer_dirty(bh))) {
- remove_inode_queue(bh);
- __remove_from_queues(bh);
- put_last_free(bh);
+ if (bh->b_page && bh->b_page->mapping)
+ db_message = 1;
+ else {
+ remove_inode_queue(bh);
+ __remove_from_queues(bh);
+ put_last_free(bh);
+ }
}
/* else complain loudly? */
@@ -668,6 +672,8 @@
spin_unlock(&lru_list_lock);
if (slept)
goto retry;
+ if (db_message)
+ printk("invalidate_buffers with mapped page!\n");
}
void set_blocksize(kdev_t dev, int size)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] Re: fs corruption with invalidate_buffers()
2000-12-07 22:03 ` Udo A. Steinberg
@ 2000-12-07 22:26 ` Alexander Viro
2000-12-07 23:37 ` Jan Niehusmann
2000-12-22 0:03 ` Jan Niehusmann
0 siblings, 2 replies; 12+ messages in thread
From: Alexander Viro @ 2000-12-07 22:26 UTC (permalink / raw)
To: Udo A. Steinberg; +Cc: Jan Niehusmann, linux-kernel, adilger, Byron Stanoszek
On Thu, 7 Dec 2000, Udo A. Steinberg wrote:
> Jan Niehusmann wrote:
> >
> > The following patch actually prevents the corruption I described.
> >
> > I'd like to hear from the people having problems with hdparm, if it helps
> > them, too.
>
> Yes, it prevents the issue.
>
> > Please note that the patch circumvents the problem more than it fixes it.
> > The true fix would invalidate the mappings, but I don't know how to do it.
>
> I don't know either. What does Alexander Viro say to all of this?
That invalidate_buffers() should leave the unhashed ones alone. If it can't
be found via getblk() - just leave it as is.
IOW, let it skip bh if (bh->b_next == NULL && !destroy_dirty_buffers).
No warnings needed - it's a normal situation.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Re: fs corruption with invalidate_buffers()
2000-12-07 22:26 ` Alexander Viro
@ 2000-12-07 23:37 ` Jan Niehusmann
2000-12-22 0:03 ` Jan Niehusmann
1 sibling, 0 replies; 12+ messages in thread
From: Jan Niehusmann @ 2000-12-07 23:37 UTC (permalink / raw)
To: Alexander Viro; +Cc: Udo A. Steinberg, linux-kernel
On Thu, Dec 07, 2000 at 05:26:46PM -0500, Alexander Viro wrote:
> That invalidate_buffers() should leave the unhashed ones alone. If it can't
> be found via getblk() - just leave it as is.
>
> IOW, let it skip bh if (bh->b_next == NULL && !destroy_dirty_buffers).
> No warnings needed - it's a normal situation.
Yes, if(bh->b_next == NULL && !destroy_dirty_buffers) seems to work, too.
I wonder why, because, if I analysed the problem correctly, it was caused
by the page mapping. Or is it a general rule that bh->b_next==NULL if
bh->b_page->mapping != NULL, ie. a buffer is never directly hased and belongs
to a mapped page?
Is there a place one can look for documentation about these things?
Another question is what should happen with the mapped pages? Whoever calls
invalidate_buffers(), probably does it because the underlying device disappered
or changed, so the page mappings should be invalidated, too.
OTOH, pages are (normally) mapped through inodes, and if the inode stays valid
after the invalidate_buffers() (ie. if it's called by an lvm resize event),
the page mapping stays valid, too.
Jan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] Re: fs corruption with invalidate_buffers()
2000-12-07 22:26 ` Alexander Viro
2000-12-07 23:37 ` Jan Niehusmann
@ 2000-12-22 0:03 ` Jan Niehusmann
2000-12-22 0:37 ` Linus Torvalds
1 sibling, 1 reply; 12+ messages in thread
From: Jan Niehusmann @ 2000-12-22 0:03 UTC (permalink / raw)
To: Alexander Viro, linux-kernel, adilger, Linus Torvalds
The file corruption I reported on Dec 6 is still there in test13-pre3.
(I can only reproduce it easily with the ext2 online resizing patches,
but I really don't think it is caused by them)
The corruption happens if invalidate_buffers calls put_last_free() on
buffers that belong to mapped pages. These pages remain valid and may
get used later, while the buffers are marked free and may be reused
by something completely different, immediately causing corruption.
I changed my patch for the problem according to the following advice by
Al Viro:
On Thu, Dec 07, 2000 at 05:26:46PM -0500, Alexander Viro wrote:
> That invalidate_buffers() should leave the unhashed ones alone. If it can't
> be found via getblk() - just leave it as is.
>
> IOW, let it skip bh if (bh->b_next == NULL && !destroy_dirty_buffers).
> No warnings needed - it's a normal situation.
This is the result - against test12-pre7, but works well with
test13-pre3:
--- linux-2.4.0-test12-pre7-jn/fs/buffer.c.orig Fri Dec 8 14:59:28 2000
+++ linux-2.4.0-test12-pre7-jn/fs/buffer.c Fri Dec 8 15:05:11 2000
@@ -502,6 +502,10 @@
struct bh_free_head *head = &free_list[BUFSIZE_INDEX(bh->b_size)];
struct buffer_head **bhp = &head->list;
+ if(bh->b_page && bh->b_page->mapping) {
+ panic("put_last_free() on mapped buffer!");
+ }
+
bh->b_state = 0;
spin_lock(&head->lock);
@@ -652,7 +656,8 @@
write_lock(&hash_table_lock);
if (!atomic_read(&bh->b_count) &&
- (destroy_dirty_buffers || !buffer_dirty(bh))) {
+ (destroy_dirty_buffers ||
+ (!buffer_dirty(bh) && bh->b_next!=0) )) {
remove_inode_queue(bh);
__remove_from_queues(bh);
put_last_free(bh);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] Re: fs corruption with invalidate_buffers()
2000-12-22 0:03 ` Jan Niehusmann
@ 2000-12-22 0:37 ` Linus Torvalds
2000-12-22 0:48 ` Jan Niehusmann
2000-12-22 1:56 ` Alexander Viro
0 siblings, 2 replies; 12+ messages in thread
From: Linus Torvalds @ 2000-12-22 0:37 UTC (permalink / raw)
To: Jan Niehusmann; +Cc: Alexander Viro, linux-kernel, adilger
On Fri, 22 Dec 2000, Jan Niehusmann wrote:
>
> This is the result - against test12-pre7, but works well with
> test13-pre3:
This looks bogus.
You can't test "bh->b_next!=0", because that is entirely meaningless.
b_next can be NULL either because the buffer isn't hashed, or because the
buffer _is_ hashed, but just happens to be last on the hash chain.
So testing "bh->b_next" doesn't actually tell you anything at all.
Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Re: fs corruption with invalidate_buffers()
2000-12-22 0:37 ` Linus Torvalds
@ 2000-12-22 0:48 ` Jan Niehusmann
2000-12-22 1:01 ` Linus Torvalds
2000-12-22 1:56 ` Alexander Viro
1 sibling, 1 reply; 12+ messages in thread
From: Jan Niehusmann @ 2000-12-22 0:48 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Alexander Viro, linux-kernel, adilger
On Thu, Dec 21, 2000 at 04:37:30PM -0800, Linus Torvalds wrote:
> This looks bogus.
It may be - I just did what Al told me without really understanding it ;-)
The test I did initially was the following:
if(!atomic_read(&bh->b_count) &&
(destroy_dirty_buffers || !buffer_dirty(bh))
&& ! (bh->b_page && bh->b_page->mapping)
)
That is, I was explicitely checking for a mapped page. It worked well, too.
Is this more reasonable?
Jan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Re: fs corruption with invalidate_buffers()
2000-12-22 0:48 ` Jan Niehusmann
@ 2000-12-22 1:01 ` Linus Torvalds
2000-12-22 1:49 ` Jan Niehusmann
0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2000-12-22 1:01 UTC (permalink / raw)
To: Jan Niehusmann; +Cc: Alexander Viro, linux-kernel, adilger
On Fri, 22 Dec 2000, Jan Niehusmann wrote:
>
> The test I did initially was the following:
>
> if(!atomic_read(&bh->b_count) &&
> (destroy_dirty_buffers || !buffer_dirty(bh))
> && ! (bh->b_page && bh->b_page->mapping)
> )
>
> That is, I was explicitely checking for a mapped page. It worked well, too.
> Is this more reasonable?
I'd suggest just doing this instead (warning: cut-and-paste in xterm, so
white-space damage):
--- linux/fs/buffer.c.old Wed Dec 20 17:50:56 2000
+++ linux/fs/buffer.c Thu Dec 21 16:42:11 2000
@@ -639,8 +639,13 @@
continue;
for (i = nr_buffers_type[nlist]; i > 0 ; bh = bh_next, i--) {
bh_next = bh->b_next_free;
+
+ /* Another device? */
if (bh->b_dev != dev)
continue;
+ /* Part of a mapping? */
+ if (bh->b_page->mapping)
+ continue;
if (buffer_locked(bh)) {
atomic_inc(&bh->b_count);
spin_unlock(&lru_list_lock);
which just ignores mapped buffers entirely (and doesn't test for
bh->b_page being non-NULL, because that shouldn't be allowed anyway).
Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] Re: fs corruption with invalidate_buffers()
2000-12-22 1:01 ` Linus Torvalds
@ 2000-12-22 1:49 ` Jan Niehusmann
0 siblings, 0 replies; 12+ messages in thread
From: Jan Niehusmann @ 2000-12-22 1:49 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Alexander Viro, linux-kernel, adilger
On Thu, Dec 21, 2000 at 05:01:00PM -0800, Linus Torvalds wrote:
>
>
> On Fri, 22 Dec 2000, Jan Niehusmann wrote:
> >
> > The test I did initially was the following:
> >
> > if(!atomic_read(&bh->b_count) &&
> > (destroy_dirty_buffers || !buffer_dirty(bh))
> > && ! (bh->b_page && bh->b_page->mapping)
> > )
> >
> > That is, I was explicitely checking for a mapped page. It worked well, too.
> > Is this more reasonable?
>
> I'd suggest just doing this instead (warning: cut-and-paste in xterm, so
> white-space damage):
> which just ignores mapped buffers entirely (and doesn't test for
> bh->b_page being non-NULL, because that shouldn't be allowed anyway).
Yes, looks good to me, and passes some tests. Here is a patch that has not
been cut and pasted:
--- linux/fs/buffer.c.orig Thu Dec 21 20:30:03 2000
+++ linux/fs/buffer.c Fri Dec 22 02:11:29 2000
@@ -643,7 +643,12 @@
continue;
for (i = nr_buffers_type[nlist]; i > 0 ; bh = bh_next, i--) {
bh_next = bh->b_next_free;
+
+ /* Another device? */
if (bh->b_dev != dev)
+ continue;
+ /* Part of a mapping? */
+ if (bh->b_page->mapping)
continue;
if (buffer_locked(bh)) {
atomic_inc(&bh->b_count);
I have one additional question: invalidate_buffers normally gets called if
someone wants to make sure that, after the call, read accesses to a device
really go to the device and don't get served by a cache. Is there some
mechanismn that does the same thing to mapped pages?
Jan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] Re: fs corruption with invalidate_buffers()
2000-12-22 0:37 ` Linus Torvalds
2000-12-22 0:48 ` Jan Niehusmann
@ 2000-12-22 1:56 ` Alexander Viro
1 sibling, 0 replies; 12+ messages in thread
From: Alexander Viro @ 2000-12-22 1:56 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Jan Niehusmann, linux-kernel, adilger
On Thu, 21 Dec 2000, Linus Torvalds wrote:
>
>
> On Fri, 22 Dec 2000, Jan Niehusmann wrote:
> >
> > This is the result - against test12-pre7, but works well with
> > test13-pre3:
>
> This looks bogus.
It is bogus. My apologies.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2000-12-22 2:27 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2000-12-06 2:07 fs corruption with invalidate_buffers() Jan Niehusmann
2000-12-07 19:05 ` Jan Niehusmann
2000-12-07 21:30 ` [PATCH] " Jan Niehusmann
2000-12-07 22:03 ` Udo A. Steinberg
2000-12-07 22:26 ` Alexander Viro
2000-12-07 23:37 ` Jan Niehusmann
2000-12-22 0:03 ` Jan Niehusmann
2000-12-22 0:37 ` Linus Torvalds
2000-12-22 0:48 ` Jan Niehusmann
2000-12-22 1:01 ` Linus Torvalds
2000-12-22 1:49 ` Jan Niehusmann
2000-12-22 1:56 ` Alexander Viro
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox