2.4.11-pre2 fs/buffer.c: invalidate: busy buffer

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer
@ 2001-10-03  2:05 Shane Wegner
  2001-10-03  4:10 ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Shane Wegner @ 2001-10-03  2:05 UTC (permalink / raw)
  To: linux-kernel

Hi all,

I am getting the following out of fs/buffer.c immediately
after bootup.  The kernel is 2.4.11-pre2 when the message
was added.

Oct  2 17:35:08 continuum kernel: invalidate: busy buffer
Oct  2 17:35:08 continuum last message repeated 7 times

I assume this is an error though it doesn't seem to say so. 
It's consistent in that it happens on every bootup however
the repeat count can vary between 7 and 10 times.

Everything is ext2 here though LVM and MD (raid0/linear)
are both being used.

Regards,
Shane

-- 
Shane Wegner: shane@cm.nu
              http://www.cm.nu/~shane/
PGP:          1024D/FFE3035D
              A0ED DAC4 77EC D674 5487
              5B5C 4F89 9A4E FFE3 035D

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer
  2001-10-03  2:05 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer Shane Wegner
@ 2001-10-03  4:10 ` Linus Torvalds
  2001-10-03  4:21   ` Alexander Viro
  2001-10-03  4:57   ` Andreas Dilger
  0 siblings, 2 replies; 8+ messages in thread
From: Linus Torvalds @ 2001-10-03  4:10 UTC (permalink / raw)
  To: linux-kernel

In article <20011002190547.A3323@cm.nu>, Shane Wegner  <shane@cm.nu> wrote:
>
>I am getting the following out of fs/buffer.c immediately
>after bootup.  The kernel is 2.4.11-pre2 when the message
>was added.
>
>Oct  2 17:35:08 continuum kernel: invalidate: busy buffer
>Oct  2 17:35:08 continuum last message repeated 7 times
>
>I assume this is an error though it doesn't seem to say so. 

Well, it's an error, but it's an error in that LVM invalidates the block
devices a bit too much, and the message really tells you that the code
refused to invalidate stuff that must not be invalidated.

It's harmless, although I hope that the LVM people will become a bit
less invalidation-happy as a result of the warning (it's always happened
before, it just hasn't warned about it in earlier kernels).

		Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer
  2001-10-03  4:10 ` Linus Torvalds
@ 2001-10-03  4:21   ` Alexander Viro
  2001-10-03  6:41     ` Andreas Dilger
  2001-10-03  4:57   ` Andreas Dilger
  1 sibling, 1 reply; 8+ messages in thread
From: Alexander Viro @ 2001-10-03  4:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel



On Wed, 3 Oct 2001, Linus Torvalds wrote:

> It's harmless, although I hope that the LVM people will become a bit
> less invalidation-happy as a result of the warning (it's always happened
> before, it just hasn't warned about it in earlier kernels).

AFAICS, md.c also doesn't play nice with buffe cache.

        fsync_dev(dev);
        set_blocksize(dev, MD_SB_BYTES);
        bh = getblk(dev, sb_offset / MD_SB_BLOCKS, MD_SB_BYTES);
        if (!bh) {
                printk(GETBLK_FAILED, partition_name(dev));
                return 1;
        }
        memset(bh->b_data,0,bh->b_size);
        sb = (mdp_super_t *) bh->b_data;
        memcpy(sb, rdev->sb, MD_SB_BYTES);

        mark_buffer_uptodate(bh, 1);
        mark_buffer_dirty(bh);
        ll_rw_block(WRITE, 1, &bh);
        wait_on_buffer(bh);
        brelse(bh);
        fsync_dev(dev);

is not a good thing to do (write_disk_sb()).  Not to mention the fact that
set_blocksize() looks bogus, the code is obviously racy.  Think what will
happen if somebody is reading from device at that moment.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer
  2001-10-03  4:21   ` Alexander Viro
@ 2001-10-03  6:41     ` Andreas Dilger
  2001-10-03  6:54       ` Neil Brown
  0 siblings, 1 reply; 8+ messages in thread
From: Andreas Dilger @ 2001-10-03  6:41 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Linus Torvalds, linux-kernel, Neil Brown, Ingo Molnar

On Oct 03, 2001  00:21 -0400, Alexander Viro wrote:
> AFAICS, md.c also doesn't play nice with buffe cache.
> 
>         fsync_dev(dev);
>         set_blocksize(dev, MD_SB_BYTES);
>         bh = getblk(dev, sb_offset / MD_SB_BLOCKS, MD_SB_BYTES);
>         if (!bh) {
>                 printk(GETBLK_FAILED, partition_name(dev));
>                 return 1;
>         }
>         memset(bh->b_data,0,bh->b_size);
>         sb = (mdp_super_t *) bh->b_data;
>         memcpy(sb, rdev->sb, MD_SB_BYTES);
> 
>         mark_buffer_uptodate(bh, 1);
>         mark_buffer_dirty(bh);
>         ll_rw_block(WRITE, 1, &bh);
>         wait_on_buffer(bh);
>         brelse(bh);
>         fsync_dev(dev);
> 
> is not a good thing to do (write_disk_sb()).  Not to mention the fact that
> set_blocksize() looks bogus, the code is obviously racy.  Think what will
> happen if somebody is reading from device at that moment.

This is especially true since I think this function is called not only
during startup and shutdown of a device, but also whenever there is a
problem with a disk (e.g. hot_remove_disk(), md_do_recovery(), raid1/5
errors, etc).  Maybe people don't notice it much because MD_SB_BYTES=4096
is the default blocksize of ext2/ext3 for devices > 500MB (i.e. most
RAID devices), and the only blocksize for reiserfs.

Three possibilities:
1) Save the blocksize and restore it afterwards (ugly, not 100% fix)
2) Do I/O in the current blocksize, which adds slight complication to
   this function, but not too much (loop on the number of blocks to
   write, normally only one).  I don't know what impact this has
   on superblock consistency, but in theory none because you can't
   guarantee larger than single-sector I/O anyways, although there
   _may_ be a slightly increased chance of not getting a bh in OOM.
   Compiled but untested patch below.  I had also looked at the read
   path, but it is only used for new disks so no harm in setting blocksize.
3) Rewrite to do I/O directly via pagecache?

Cheers, Andreas

PS - Neil and Ingo CC'd
==========================================================================
--- linux.orig/drivers/md/md.c	Tue Jul 10 17:05:24 2001
+++ linux/drivers/md/md.c	Wed Oct  3 00:19:23 2001
@@ -913,10 +913,10 @@
 
 static int write_disk_sb(mdk_rdev_t * rdev)
 {
-	struct buffer_head *bh;
+	struct buffer_head *bh[MD_SB_BYTES/512];
 	kdev_t dev;
 	unsigned long sb_offset, size;
-	mdp_super_t *sb;
+	int blocksize, sb_chunks, i;
 
 	if (!rdev->sb) {
 		MD_BUG();
@@ -950,21 +950,30 @@
 
 	printk("(write) %s's sb offset: %ld\n", partition_name(dev), sb_offset);
 	fsync_dev(dev);
-	set_blocksize(dev, MD_SB_BYTES);
-	bh = getblk(dev, sb_offset / MD_SB_BLOCKS, MD_SB_BYTES);
-	if (!bh) {
-		printk(GETBLK_FAILED, partition_name(dev));
-		return 1;
+	blocksize = blksize_size[MAJOR(dev)][MINOR(dev)];
+	if (blocksize == 0)
+		blocksize = BLOCK_SIZE;
+	sb_chunks = blocksize >= MD_SB_BYTES ? 1 : MD_SB_BYTES / blocksize;
+	for (i = 0; i < sb_chunks; i++) {
+		bh[i] = getblk(dev, sb_offset / blocksize + i, blocksize);
+		if (!bh[i]) {
+			printk(GETBLK_FAILED, partition_name(dev));
+			while (i--)
+				brelse(bh[i]);
+			return 1;
+		}
+		if (blocksize > MD_SB_BYTES)
+			memset(bh[i]->b_data, 0, bh[i]->b_size);
+		memcpy(bh[i]->b_data, (char*)rdev->sb + i*blocksize, blocksize);
+
+		mark_buffer_uptodate(bh[i], 1);
+		mark_buffer_dirty(bh[i]);
+	}
+	ll_rw_block(WRITE, sb_chunks, bh);
+	while (i--) {
+		wait_on_buffer(bh[i]);
+		brelse(bh[i]);
 	}
-	memset(bh->b_data,0,bh->b_size);
-	sb = (mdp_super_t *) bh->b_data;
-	memcpy(sb, rdev->sb, MD_SB_BYTES);
-
-	mark_buffer_uptodate(bh, 1);
-	mark_buffer_dirty(bh);
-	ll_rw_block(WRITE, 1, &bh);
-	wait_on_buffer(bh);
-	brelse(bh);
 	fsync_dev(dev);
 skip:
 	return 0;
@@ -2745,7 +2764,7 @@
 		}
 
 		default:
-			printk(KERN_WARNING "md: %s(pid %d) used obsolete MD ioctl, upgrade your software to use new ictls.\n", current->comm, current->pid);
+			printk(KERN_WARNING "md: %s(pid %d) used obsolete MD ioctl, upgrade your software to use new ioctls.\n", current->comm, current->pid);
 			err = -EINVAL;
 			goto abort_unlock;
 	}
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer
  2001-10-03  6:41     ` Andreas Dilger
@ 2001-10-03  6:54       ` Neil Brown
  2001-10-03  7:05         ` Alexander Viro
  0 siblings, 1 reply; 8+ messages in thread
From: Neil Brown @ 2001-10-03  6:54 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Alexander Viro, Linus Torvalds, linux-kernel, Ingo Molnar

On Wednesday October 3, adilger@turbolabs.com wrote:
> On Oct 03, 2001  00:21 -0400, Alexander Viro wrote:
> > AFAICS, md.c also doesn't play nice with buffe cache.
  snip
> > 
snip
> 
> Three possibilities:
> 1) Save the blocksize and restore it afterwards (ugly, not 100% fix)
> 2) Do I/O in the current blocksize, which adds slight complication to
>    this function, but not too much (loop on the number of blocks to
>    write, normally only one).  I don't know what impact this has
>    on superblock consistency, but in theory none because you can't
>    guarantee larger than single-sector I/O anyways, although there
>    _may_ be a slightly increased chance of not getting a bh in OOM.
>    Compiled but untested patch below.  I had also looked at the read
>    path, but it is only used for new disks so no harm in setting blocksize.
> 3) Rewrite to do I/O directly via pagecache?

4) Rewrite to do I/O directly via generic_make_request just like it
   does for all other I/O to underlying devices.
   It is on my (mental) todo list, but doesn't have a high priority.
   At the same time, it would be good to get write_disk_sb to notice
   if the write failed.....

NeilBrown

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer
  2001-10-03  6:54       ` Neil Brown
@ 2001-10-03  7:05         ` Alexander Viro
  2001-10-03  7:13           ` Neil Brown
  0 siblings, 1 reply; 8+ messages in thread
From: Alexander Viro @ 2001-10-03  7:05 UTC (permalink / raw)
  To: Neil Brown; +Cc: Andreas Dilger, Linus Torvalds, linux-kernel, Ingo Molnar

On Wed, 3 Oct 2001, Neil Brown wrote:

> On Wednesday October 3, adilger@turbolabs.com wrote:

> > 3) Rewrite to do I/O directly via pagecache?
> 
> 4) Rewrite to do I/O directly via generic_make_request just like it
>    does for all other I/O to underlying devices.
>    It is on my (mental) todo list, but doesn't have a high priority.
>    At the same time, it would be good to get write_disk_sb to notice
>    if the write failed.....

Folks, there's a problem aside of set_blocksize().  You are doing memset
and memcpy on unlocked buffer returned by getblk().  That's a race -
if that buffer is currently being read into, we will get a random crap.
And write it to disk afterwards.

General rule: don't modify buffer you've got from getblk() unless you've
locked it and mark it uptodate before you unlock.  We had the same
bug in ext2, BTW.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer
  2001-10-03  7:05         ` Alexander Viro
@ 2001-10-03  7:13           ` Neil Brown
  0 siblings, 0 replies; 8+ messages in thread
From: Neil Brown @ 2001-10-03  7:13 UTC (permalink / raw)
  To: Alexander Viro; +Cc: Andreas Dilger, Linus Torvalds, linux-kernel, Ingo Molnar

On Wednesday October 3, viro@math.psu.edu wrote:
> 
> 
> On Wed, 3 Oct 2001, Neil Brown wrote:
> 
> > On Wednesday October 3, adilger@turbolabs.com wrote:
> 
> > > 3) Rewrite to do I/O directly via pagecache?
> > 
> > 4) Rewrite to do I/O directly via generic_make_request just like it
> >    does for all other I/O to underlying devices.
> >    It is on my (mental) todo list, but doesn't have a high priority.
> >    At the same time, it would be good to get write_disk_sb to notice
> >    if the write failed.....
> 
> Folks, there's a problem aside of set_blocksize().  You are doing memset
> and memcpy on unlocked buffer returned by getblk().  That's a race -
> if that buffer is currently being read into, we will get a random crap.
> And write it to disk afterwards.
> 
> General rule: don't modify buffer you've got from getblk() unless you've
> locked it and mark it uptodate before you unlock.  We had the same
> bug in ext2, BTW.

What I am saying is "don't use the buffer cache at all" or any cache.
Don't use getblk.
Just:
   bh = kmalloc(sizeof(*bh));
   bh->b_data = rdev->sb; /* it is always one page */
   bh->b_rdev = rdev->dev;
   bh->b_rsector = sb_offset*2;
   bh->b_end_io = md_just_set_a_flag_and_call_wakeup;
   init_wait_queue_head(&bh->b_wait);
   bh->b_flags = 1<<B_Locked;

   generic_make_request(WRITE,bh);
   wait_disk_event((bh->b_flags & (1<<B_Locked)), bh->b_wait);

   err = bh->b_flags & (1<<B_Uptodate);
   kfree(bh);
   return err;


with typos corrected and a fairly obvious definition of
 md_just_set_a_flag_and_call_wakeup
fairly similar to end_buffer_io_sync.

NeilBrown

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer
  2001-10-03  4:10 ` Linus Torvalds
  2001-10-03  4:21   ` Alexander Viro
@ 2001-10-03  4:57   ` Andreas Dilger
  1 sibling, 0 replies; 8+ messages in thread
From: Andreas Dilger @ 2001-10-03  4:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, lvm-devel

On Oct 03, 2001  04:10 +0000, Linus Torvalds wrote:
> In article <20011002190547.A3323@cm.nu>, Shane Wegner  <shane@cm.nu> wrote:
> >I am getting the following out of fs/buffer.c immediately
> >after bootup.  The kernel is 2.4.11-pre2 when the message
> >was added.
> >
> >Oct  2 17:35:08 continuum kernel: invalidate: busy buffer
> >Oct  2 17:35:08 continuum last message repeated 7 times
> >
> >I assume this is an error though it doesn't seem to say so. 
> 
> Well, it's an error, but it's an error in that LVM invalidates the block
> devices a bit too much, and the message really tells you that the code
> refused to invalidate stuff that must not be invalidated.
> 
> It's harmless, although I hope that the LVM people will become a bit
> less invalidation-happy as a result of the warning (it's always happened
> before, it just hasn't warned about it in earlier kernels).

Given that 2.4.10+ have devices in page cache, is there _any_ reason
why what the kernel sees on a device would be different than what user
space reads from a device?  I don't think it was ever an issue between
whole-disk-dev and partition-dev aliasing, since both user-space and
the kernel are accessing the same device.

If not, then we can just change the PV_FLUSH code to not do
invalidate_buffers() on the device for kernels 2.4.10+.  There never
was a very strong reason to do it for disk identification.

Cheers, Andreas

CC'd LVM folks to get their input on this.
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2001-10-03  7:13 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-10-03  2:05 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer Shane Wegner
2001-10-03  4:10 ` Linus Torvalds
2001-10-03  4:21   ` Alexander Viro
2001-10-03  6:41     ` Andreas Dilger
2001-10-03  6:54       ` Neil Brown
2001-10-03  7:05         ` Alexander Viro
2001-10-03  7:13           ` Neil Brown
2001-10-03  4:57   ` Andreas Dilger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox