* 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer
@ 2001-10-03 2:05 Shane Wegner
2001-10-03 4:10 ` Linus Torvalds
0 siblings, 1 reply; 8+ messages in thread
From: Shane Wegner @ 2001-10-03 2:05 UTC (permalink / raw)
To: linux-kernel
Hi all,
I am getting the following out of fs/buffer.c immediately
after bootup. The kernel is 2.4.11-pre2 when the message
was added.
Oct 2 17:35:08 continuum kernel: invalidate: busy buffer
Oct 2 17:35:08 continuum last message repeated 7 times
I assume this is an error though it doesn't seem to say so.
It's consistent in that it happens on every bootup however
the repeat count can vary between 7 and 10 times.
Everything is ext2 here though LVM and MD (raid0/linear)
are both being used.
Regards,
Shane
--
Shane Wegner: shane@cm.nu
http://www.cm.nu/~shane/
PGP: 1024D/FFE3035D
A0ED DAC4 77EC D674 5487
5B5C 4F89 9A4E FFE3 035D
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer
2001-10-03 2:05 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer Shane Wegner
@ 2001-10-03 4:10 ` Linus Torvalds
2001-10-03 4:21 ` Alexander Viro
2001-10-03 4:57 ` Andreas Dilger
0 siblings, 2 replies; 8+ messages in thread
From: Linus Torvalds @ 2001-10-03 4:10 UTC (permalink / raw)
To: linux-kernel
In article <20011002190547.A3323@cm.nu>, Shane Wegner <shane@cm.nu> wrote:
>
>I am getting the following out of fs/buffer.c immediately
>after bootup. The kernel is 2.4.11-pre2 when the message
>was added.
>
>Oct 2 17:35:08 continuum kernel: invalidate: busy buffer
>Oct 2 17:35:08 continuum last message repeated 7 times
>
>I assume this is an error though it doesn't seem to say so.
Well, it's an error, but it's an error in that LVM invalidates the block
devices a bit too much, and the message really tells you that the code
refused to invalidate stuff that must not be invalidated.
It's harmless, although I hope that the LVM people will become a bit
less invalidation-happy as a result of the warning (it's always happened
before, it just hasn't warned about it in earlier kernels).
Linus
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer
2001-10-03 4:10 ` Linus Torvalds
@ 2001-10-03 4:21 ` Alexander Viro
2001-10-03 6:41 ` Andreas Dilger
2001-10-03 4:57 ` Andreas Dilger
1 sibling, 1 reply; 8+ messages in thread
From: Alexander Viro @ 2001-10-03 4:21 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel
On Wed, 3 Oct 2001, Linus Torvalds wrote:
> It's harmless, although I hope that the LVM people will become a bit
> less invalidation-happy as a result of the warning (it's always happened
> before, it just hasn't warned about it in earlier kernels).
AFAICS, md.c also doesn't play nice with buffe cache.
fsync_dev(dev);
set_blocksize(dev, MD_SB_BYTES);
bh = getblk(dev, sb_offset / MD_SB_BLOCKS, MD_SB_BYTES);
if (!bh) {
printk(GETBLK_FAILED, partition_name(dev));
return 1;
}
memset(bh->b_data,0,bh->b_size);
sb = (mdp_super_t *) bh->b_data;
memcpy(sb, rdev->sb, MD_SB_BYTES);
mark_buffer_uptodate(bh, 1);
mark_buffer_dirty(bh);
ll_rw_block(WRITE, 1, &bh);
wait_on_buffer(bh);
brelse(bh);
fsync_dev(dev);
is not a good thing to do (write_disk_sb()). Not to mention the fact that
set_blocksize() looks bogus, the code is obviously racy. Think what will
happen if somebody is reading from device at that moment.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer
2001-10-03 4:10 ` Linus Torvalds
2001-10-03 4:21 ` Alexander Viro
@ 2001-10-03 4:57 ` Andreas Dilger
1 sibling, 0 replies; 8+ messages in thread
From: Andreas Dilger @ 2001-10-03 4:57 UTC (permalink / raw)
To: Linus Torvalds; +Cc: linux-kernel, lvm-devel
On Oct 03, 2001 04:10 +0000, Linus Torvalds wrote:
> In article <20011002190547.A3323@cm.nu>, Shane Wegner <shane@cm.nu> wrote:
> >I am getting the following out of fs/buffer.c immediately
> >after bootup. The kernel is 2.4.11-pre2 when the message
> >was added.
> >
> >Oct 2 17:35:08 continuum kernel: invalidate: busy buffer
> >Oct 2 17:35:08 continuum last message repeated 7 times
> >
> >I assume this is an error though it doesn't seem to say so.
>
> Well, it's an error, but it's an error in that LVM invalidates the block
> devices a bit too much, and the message really tells you that the code
> refused to invalidate stuff that must not be invalidated.
>
> It's harmless, although I hope that the LVM people will become a bit
> less invalidation-happy as a result of the warning (it's always happened
> before, it just hasn't warned about it in earlier kernels).
Given that 2.4.10+ have devices in page cache, is there _any_ reason
why what the kernel sees on a device would be different than what user
space reads from a device? I don't think it was ever an issue between
whole-disk-dev and partition-dev aliasing, since both user-space and
the kernel are accessing the same device.
If not, then we can just change the PV_FLUSH code to not do
invalidate_buffers() on the device for kernels 2.4.10+. There never
was a very strong reason to do it for disk identification.
Cheers, Andreas
CC'd LVM folks to get their input on this.
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer
2001-10-03 4:21 ` Alexander Viro
@ 2001-10-03 6:41 ` Andreas Dilger
2001-10-03 6:54 ` Neil Brown
0 siblings, 1 reply; 8+ messages in thread
From: Andreas Dilger @ 2001-10-03 6:41 UTC (permalink / raw)
To: Alexander Viro; +Cc: Linus Torvalds, linux-kernel, Neil Brown, Ingo Molnar
On Oct 03, 2001 00:21 -0400, Alexander Viro wrote:
> AFAICS, md.c also doesn't play nice with buffe cache.
>
> fsync_dev(dev);
> set_blocksize(dev, MD_SB_BYTES);
> bh = getblk(dev, sb_offset / MD_SB_BLOCKS, MD_SB_BYTES);
> if (!bh) {
> printk(GETBLK_FAILED, partition_name(dev));
> return 1;
> }
> memset(bh->b_data,0,bh->b_size);
> sb = (mdp_super_t *) bh->b_data;
> memcpy(sb, rdev->sb, MD_SB_BYTES);
>
> mark_buffer_uptodate(bh, 1);
> mark_buffer_dirty(bh);
> ll_rw_block(WRITE, 1, &bh);
> wait_on_buffer(bh);
> brelse(bh);
> fsync_dev(dev);
>
> is not a good thing to do (write_disk_sb()). Not to mention the fact that
> set_blocksize() looks bogus, the code is obviously racy. Think what will
> happen if somebody is reading from device at that moment.
This is especially true since I think this function is called not only
during startup and shutdown of a device, but also whenever there is a
problem with a disk (e.g. hot_remove_disk(), md_do_recovery(), raid1/5
errors, etc). Maybe people don't notice it much because MD_SB_BYTES=4096
is the default blocksize of ext2/ext3 for devices > 500MB (i.e. most
RAID devices), and the only blocksize for reiserfs.
Three possibilities:
1) Save the blocksize and restore it afterwards (ugly, not 100% fix)
2) Do I/O in the current blocksize, which adds slight complication to
this function, but not too much (loop on the number of blocks to
write, normally only one). I don't know what impact this has
on superblock consistency, but in theory none because you can't
guarantee larger than single-sector I/O anyways, although there
_may_ be a slightly increased chance of not getting a bh in OOM.
Compiled but untested patch below. I had also looked at the read
path, but it is only used for new disks so no harm in setting blocksize.
3) Rewrite to do I/O directly via pagecache?
Cheers, Andreas
PS - Neil and Ingo CC'd
==========================================================================
--- linux.orig/drivers/md/md.c Tue Jul 10 17:05:24 2001
+++ linux/drivers/md/md.c Wed Oct 3 00:19:23 2001
@@ -913,10 +913,10 @@
static int write_disk_sb(mdk_rdev_t * rdev)
{
- struct buffer_head *bh;
+ struct buffer_head *bh[MD_SB_BYTES/512];
kdev_t dev;
unsigned long sb_offset, size;
- mdp_super_t *sb;
+ int blocksize, sb_chunks, i;
if (!rdev->sb) {
MD_BUG();
@@ -950,21 +950,30 @@
printk("(write) %s's sb offset: %ld\n", partition_name(dev), sb_offset);
fsync_dev(dev);
- set_blocksize(dev, MD_SB_BYTES);
- bh = getblk(dev, sb_offset / MD_SB_BLOCKS, MD_SB_BYTES);
- if (!bh) {
- printk(GETBLK_FAILED, partition_name(dev));
- return 1;
+ blocksize = blksize_size[MAJOR(dev)][MINOR(dev)];
+ if (blocksize == 0)
+ blocksize = BLOCK_SIZE;
+ sb_chunks = blocksize >= MD_SB_BYTES ? 1 : MD_SB_BYTES / blocksize;
+ for (i = 0; i < sb_chunks; i++) {
+ bh[i] = getblk(dev, sb_offset / blocksize + i, blocksize);
+ if (!bh[i]) {
+ printk(GETBLK_FAILED, partition_name(dev));
+ while (i--)
+ brelse(bh[i]);
+ return 1;
+ }
+ if (blocksize > MD_SB_BYTES)
+ memset(bh[i]->b_data, 0, bh[i]->b_size);
+ memcpy(bh[i]->b_data, (char*)rdev->sb + i*blocksize, blocksize);
+
+ mark_buffer_uptodate(bh[i], 1);
+ mark_buffer_dirty(bh[i]);
+ }
+ ll_rw_block(WRITE, sb_chunks, bh);
+ while (i--) {
+ wait_on_buffer(bh[i]);
+ brelse(bh[i]);
}
- memset(bh->b_data,0,bh->b_size);
- sb = (mdp_super_t *) bh->b_data;
- memcpy(sb, rdev->sb, MD_SB_BYTES);
-
- mark_buffer_uptodate(bh, 1);
- mark_buffer_dirty(bh);
- ll_rw_block(WRITE, 1, &bh);
- wait_on_buffer(bh);
- brelse(bh);
fsync_dev(dev);
skip:
return 0;
@@ -2745,7 +2764,7 @@
}
default:
- printk(KERN_WARNING "md: %s(pid %d) used obsolete MD ioctl, upgrade your software to use new ictls.\n", current->comm, current->pid);
+ printk(KERN_WARNING "md: %s(pid %d) used obsolete MD ioctl, upgrade your software to use new ioctls.\n", current->comm, current->pid);
err = -EINVAL;
goto abort_unlock;
}
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer
2001-10-03 6:41 ` Andreas Dilger
@ 2001-10-03 6:54 ` Neil Brown
2001-10-03 7:05 ` Alexander Viro
0 siblings, 1 reply; 8+ messages in thread
From: Neil Brown @ 2001-10-03 6:54 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Alexander Viro, Linus Torvalds, linux-kernel, Ingo Molnar
On Wednesday October 3, adilger@turbolabs.com wrote:
> On Oct 03, 2001 00:21 -0400, Alexander Viro wrote:
> > AFAICS, md.c also doesn't play nice with buffe cache.
snip
> >
snip
>
> Three possibilities:
> 1) Save the blocksize and restore it afterwards (ugly, not 100% fix)
> 2) Do I/O in the current blocksize, which adds slight complication to
> this function, but not too much (loop on the number of blocks to
> write, normally only one). I don't know what impact this has
> on superblock consistency, but in theory none because you can't
> guarantee larger than single-sector I/O anyways, although there
> _may_ be a slightly increased chance of not getting a bh in OOM.
> Compiled but untested patch below. I had also looked at the read
> path, but it is only used for new disks so no harm in setting blocksize.
> 3) Rewrite to do I/O directly via pagecache?
4) Rewrite to do I/O directly via generic_make_request just like it
does for all other I/O to underlying devices.
It is on my (mental) todo list, but doesn't have a high priority.
At the same time, it would be good to get write_disk_sb to notice
if the write failed.....
NeilBrown
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer
2001-10-03 6:54 ` Neil Brown
@ 2001-10-03 7:05 ` Alexander Viro
2001-10-03 7:13 ` Neil Brown
0 siblings, 1 reply; 8+ messages in thread
From: Alexander Viro @ 2001-10-03 7:05 UTC (permalink / raw)
To: Neil Brown; +Cc: Andreas Dilger, Linus Torvalds, linux-kernel, Ingo Molnar
On Wed, 3 Oct 2001, Neil Brown wrote:
> On Wednesday October 3, adilger@turbolabs.com wrote:
> > 3) Rewrite to do I/O directly via pagecache?
>
> 4) Rewrite to do I/O directly via generic_make_request just like it
> does for all other I/O to underlying devices.
> It is on my (mental) todo list, but doesn't have a high priority.
> At the same time, it would be good to get write_disk_sb to notice
> if the write failed.....
Folks, there's a problem aside of set_blocksize(). You are doing memset
and memcpy on unlocked buffer returned by getblk(). That's a race -
if that buffer is currently being read into, we will get a random crap.
And write it to disk afterwards.
General rule: don't modify buffer you've got from getblk() unless you've
locked it and mark it uptodate before you unlock. We had the same
bug in ext2, BTW.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer
2001-10-03 7:05 ` Alexander Viro
@ 2001-10-03 7:13 ` Neil Brown
0 siblings, 0 replies; 8+ messages in thread
From: Neil Brown @ 2001-10-03 7:13 UTC (permalink / raw)
To: Alexander Viro; +Cc: Andreas Dilger, Linus Torvalds, linux-kernel, Ingo Molnar
On Wednesday October 3, viro@math.psu.edu wrote:
>
>
> On Wed, 3 Oct 2001, Neil Brown wrote:
>
> > On Wednesday October 3, adilger@turbolabs.com wrote:
>
> > > 3) Rewrite to do I/O directly via pagecache?
> >
> > 4) Rewrite to do I/O directly via generic_make_request just like it
> > does for all other I/O to underlying devices.
> > It is on my (mental) todo list, but doesn't have a high priority.
> > At the same time, it would be good to get write_disk_sb to notice
> > if the write failed.....
>
> Folks, there's a problem aside of set_blocksize(). You are doing memset
> and memcpy on unlocked buffer returned by getblk(). That's a race -
> if that buffer is currently being read into, we will get a random crap.
> And write it to disk afterwards.
>
> General rule: don't modify buffer you've got from getblk() unless you've
> locked it and mark it uptodate before you unlock. We had the same
> bug in ext2, BTW.
What I am saying is "don't use the buffer cache at all" or any cache.
Don't use getblk.
Just:
bh = kmalloc(sizeof(*bh));
bh->b_data = rdev->sb; /* it is always one page */
bh->b_rdev = rdev->dev;
bh->b_rsector = sb_offset*2;
bh->b_end_io = md_just_set_a_flag_and_call_wakeup;
init_wait_queue_head(&bh->b_wait);
bh->b_flags = 1<<B_Locked;
generic_make_request(WRITE,bh);
wait_disk_event((bh->b_flags & (1<<B_Locked)), bh->b_wait);
err = bh->b_flags & (1<<B_Uptodate);
kfree(bh);
return err;
with typos corrected and a fairly obvious definition of
md_just_set_a_flag_and_call_wakeup
fairly similar to end_buffer_io_sync.
NeilBrown
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2001-10-03 7:13 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-10-03 2:05 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer Shane Wegner
2001-10-03 4:10 ` Linus Torvalds
2001-10-03 4:21 ` Alexander Viro
2001-10-03 6:41 ` Andreas Dilger
2001-10-03 6:54 ` Neil Brown
2001-10-03 7:05 ` Alexander Viro
2001-10-03 7:13 ` Neil Brown
2001-10-03 4:57 ` Andreas Dilger
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox