* 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer
@ 2001-10-03 2:05 Shane Wegner
2001-10-03 4:10 ` Linus Torvalds
0 siblings, 1 reply; 8+ messages in thread
From: Shane Wegner @ 2001-10-03 2:05 UTC (permalink / raw)
To: linux-kernel
Hi all,
I am getting the following out of fs/buffer.c immediately
after bootup. The kernel is 2.4.11-pre2 when the message
was added.
Oct 2 17:35:08 continuum kernel: invalidate: busy buffer
Oct 2 17:35:08 continuum last message repeated 7 times
I assume this is an error though it doesn't seem to say so.
It's consistent in that it happens on every bootup however
the repeat count can vary between 7 and 10 times.
Everything is ext2 here though LVM and MD (raid0/linear)
are both being used.
Regards,
Shane
--
Shane Wegner: shane@cm.nu
http://www.cm.nu/~shane/
PGP: 1024D/FFE3035D
A0ED DAC4 77EC D674 5487
5B5C 4F89 9A4E FFE3 035D
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer 2001-10-03 2:05 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer Shane Wegner @ 2001-10-03 4:10 ` Linus Torvalds 2001-10-03 4:21 ` Alexander Viro 2001-10-03 4:57 ` Andreas Dilger 0 siblings, 2 replies; 8+ messages in thread From: Linus Torvalds @ 2001-10-03 4:10 UTC (permalink / raw) To: linux-kernel In article <20011002190547.A3323@cm.nu>, Shane Wegner <shane@cm.nu> wrote: > >I am getting the following out of fs/buffer.c immediately >after bootup. The kernel is 2.4.11-pre2 when the message >was added. > >Oct 2 17:35:08 continuum kernel: invalidate: busy buffer >Oct 2 17:35:08 continuum last message repeated 7 times > >I assume this is an error though it doesn't seem to say so. Well, it's an error, but it's an error in that LVM invalidates the block devices a bit too much, and the message really tells you that the code refused to invalidate stuff that must not be invalidated. It's harmless, although I hope that the LVM people will become a bit less invalidation-happy as a result of the warning (it's always happened before, it just hasn't warned about it in earlier kernels). Linus ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer 2001-10-03 4:10 ` Linus Torvalds @ 2001-10-03 4:21 ` Alexander Viro 2001-10-03 6:41 ` Andreas Dilger 2001-10-03 4:57 ` Andreas Dilger 1 sibling, 1 reply; 8+ messages in thread From: Alexander Viro @ 2001-10-03 4:21 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel On Wed, 3 Oct 2001, Linus Torvalds wrote: > It's harmless, although I hope that the LVM people will become a bit > less invalidation-happy as a result of the warning (it's always happened > before, it just hasn't warned about it in earlier kernels). AFAICS, md.c also doesn't play nice with buffe cache. fsync_dev(dev); set_blocksize(dev, MD_SB_BYTES); bh = getblk(dev, sb_offset / MD_SB_BLOCKS, MD_SB_BYTES); if (!bh) { printk(GETBLK_FAILED, partition_name(dev)); return 1; } memset(bh->b_data,0,bh->b_size); sb = (mdp_super_t *) bh->b_data; memcpy(sb, rdev->sb, MD_SB_BYTES); mark_buffer_uptodate(bh, 1); mark_buffer_dirty(bh); ll_rw_block(WRITE, 1, &bh); wait_on_buffer(bh); brelse(bh); fsync_dev(dev); is not a good thing to do (write_disk_sb()). Not to mention the fact that set_blocksize() looks bogus, the code is obviously racy. Think what will happen if somebody is reading from device at that moment. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer 2001-10-03 4:21 ` Alexander Viro @ 2001-10-03 6:41 ` Andreas Dilger 2001-10-03 6:54 ` Neil Brown 0 siblings, 1 reply; 8+ messages in thread From: Andreas Dilger @ 2001-10-03 6:41 UTC (permalink / raw) To: Alexander Viro; +Cc: Linus Torvalds, linux-kernel, Neil Brown, Ingo Molnar On Oct 03, 2001 00:21 -0400, Alexander Viro wrote: > AFAICS, md.c also doesn't play nice with buffe cache. > > fsync_dev(dev); > set_blocksize(dev, MD_SB_BYTES); > bh = getblk(dev, sb_offset / MD_SB_BLOCKS, MD_SB_BYTES); > if (!bh) { > printk(GETBLK_FAILED, partition_name(dev)); > return 1; > } > memset(bh->b_data,0,bh->b_size); > sb = (mdp_super_t *) bh->b_data; > memcpy(sb, rdev->sb, MD_SB_BYTES); > > mark_buffer_uptodate(bh, 1); > mark_buffer_dirty(bh); > ll_rw_block(WRITE, 1, &bh); > wait_on_buffer(bh); > brelse(bh); > fsync_dev(dev); > > is not a good thing to do (write_disk_sb()). Not to mention the fact that > set_blocksize() looks bogus, the code is obviously racy. Think what will > happen if somebody is reading from device at that moment. This is especially true since I think this function is called not only during startup and shutdown of a device, but also whenever there is a problem with a disk (e.g. hot_remove_disk(), md_do_recovery(), raid1/5 errors, etc). Maybe people don't notice it much because MD_SB_BYTES=4096 is the default blocksize of ext2/ext3 for devices > 500MB (i.e. most RAID devices), and the only blocksize for reiserfs. Three possibilities: 1) Save the blocksize and restore it afterwards (ugly, not 100% fix) 2) Do I/O in the current blocksize, which adds slight complication to this function, but not too much (loop on the number of blocks to write, normally only one). I don't know what impact this has on superblock consistency, but in theory none because you can't guarantee larger than single-sector I/O anyways, although there _may_ be a slightly increased chance of not getting a bh in OOM. Compiled but untested patch below. I had also looked at the read path, but it is only used for new disks so no harm in setting blocksize. 3) Rewrite to do I/O directly via pagecache? Cheers, Andreas PS - Neil and Ingo CC'd ========================================================================== --- linux.orig/drivers/md/md.c Tue Jul 10 17:05:24 2001 +++ linux/drivers/md/md.c Wed Oct 3 00:19:23 2001 @@ -913,10 +913,10 @@ static int write_disk_sb(mdk_rdev_t * rdev) { - struct buffer_head *bh; + struct buffer_head *bh[MD_SB_BYTES/512]; kdev_t dev; unsigned long sb_offset, size; - mdp_super_t *sb; + int blocksize, sb_chunks, i; if (!rdev->sb) { MD_BUG(); @@ -950,21 +950,30 @@ printk("(write) %s's sb offset: %ld\n", partition_name(dev), sb_offset); fsync_dev(dev); - set_blocksize(dev, MD_SB_BYTES); - bh = getblk(dev, sb_offset / MD_SB_BLOCKS, MD_SB_BYTES); - if (!bh) { - printk(GETBLK_FAILED, partition_name(dev)); - return 1; + blocksize = blksize_size[MAJOR(dev)][MINOR(dev)]; + if (blocksize == 0) + blocksize = BLOCK_SIZE; + sb_chunks = blocksize >= MD_SB_BYTES ? 1 : MD_SB_BYTES / blocksize; + for (i = 0; i < sb_chunks; i++) { + bh[i] = getblk(dev, sb_offset / blocksize + i, blocksize); + if (!bh[i]) { + printk(GETBLK_FAILED, partition_name(dev)); + while (i--) + brelse(bh[i]); + return 1; + } + if (blocksize > MD_SB_BYTES) + memset(bh[i]->b_data, 0, bh[i]->b_size); + memcpy(bh[i]->b_data, (char*)rdev->sb + i*blocksize, blocksize); + + mark_buffer_uptodate(bh[i], 1); + mark_buffer_dirty(bh[i]); + } + ll_rw_block(WRITE, sb_chunks, bh); + while (i--) { + wait_on_buffer(bh[i]); + brelse(bh[i]); } - memset(bh->b_data,0,bh->b_size); - sb = (mdp_super_t *) bh->b_data; - memcpy(sb, rdev->sb, MD_SB_BYTES); - - mark_buffer_uptodate(bh, 1); - mark_buffer_dirty(bh); - ll_rw_block(WRITE, 1, &bh); - wait_on_buffer(bh); - brelse(bh); fsync_dev(dev); skip: return 0; @@ -2745,7 +2764,7 @@ } default: - printk(KERN_WARNING "md: %s(pid %d) used obsolete MD ioctl, upgrade your software to use new ictls.\n", current->comm, current->pid); + printk(KERN_WARNING "md: %s(pid %d) used obsolete MD ioctl, upgrade your software to use new ioctls.\n", current->comm, current->pid); err = -EINVAL; goto abort_unlock; } -- Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto, \ would they cancel out, leaving him still hungry?" http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer 2001-10-03 6:41 ` Andreas Dilger @ 2001-10-03 6:54 ` Neil Brown 2001-10-03 7:05 ` Alexander Viro 0 siblings, 1 reply; 8+ messages in thread From: Neil Brown @ 2001-10-03 6:54 UTC (permalink / raw) To: Andreas Dilger; +Cc: Alexander Viro, Linus Torvalds, linux-kernel, Ingo Molnar On Wednesday October 3, adilger@turbolabs.com wrote: > On Oct 03, 2001 00:21 -0400, Alexander Viro wrote: > > AFAICS, md.c also doesn't play nice with buffe cache. snip > > snip > > Three possibilities: > 1) Save the blocksize and restore it afterwards (ugly, not 100% fix) > 2) Do I/O in the current blocksize, which adds slight complication to > this function, but not too much (loop on the number of blocks to > write, normally only one). I don't know what impact this has > on superblock consistency, but in theory none because you can't > guarantee larger than single-sector I/O anyways, although there > _may_ be a slightly increased chance of not getting a bh in OOM. > Compiled but untested patch below. I had also looked at the read > path, but it is only used for new disks so no harm in setting blocksize. > 3) Rewrite to do I/O directly via pagecache? 4) Rewrite to do I/O directly via generic_make_request just like it does for all other I/O to underlying devices. It is on my (mental) todo list, but doesn't have a high priority. At the same time, it would be good to get write_disk_sb to notice if the write failed..... NeilBrown ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer 2001-10-03 6:54 ` Neil Brown @ 2001-10-03 7:05 ` Alexander Viro 2001-10-03 7:13 ` Neil Brown 0 siblings, 1 reply; 8+ messages in thread From: Alexander Viro @ 2001-10-03 7:05 UTC (permalink / raw) To: Neil Brown; +Cc: Andreas Dilger, Linus Torvalds, linux-kernel, Ingo Molnar On Wed, 3 Oct 2001, Neil Brown wrote: > On Wednesday October 3, adilger@turbolabs.com wrote: > > 3) Rewrite to do I/O directly via pagecache? > > 4) Rewrite to do I/O directly via generic_make_request just like it > does for all other I/O to underlying devices. > It is on my (mental) todo list, but doesn't have a high priority. > At the same time, it would be good to get write_disk_sb to notice > if the write failed..... Folks, there's a problem aside of set_blocksize(). You are doing memset and memcpy on unlocked buffer returned by getblk(). That's a race - if that buffer is currently being read into, we will get a random crap. And write it to disk afterwards. General rule: don't modify buffer you've got from getblk() unless you've locked it and mark it uptodate before you unlock. We had the same bug in ext2, BTW. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer 2001-10-03 7:05 ` Alexander Viro @ 2001-10-03 7:13 ` Neil Brown 0 siblings, 0 replies; 8+ messages in thread From: Neil Brown @ 2001-10-03 7:13 UTC (permalink / raw) To: Alexander Viro; +Cc: Andreas Dilger, Linus Torvalds, linux-kernel, Ingo Molnar On Wednesday October 3, viro@math.psu.edu wrote: > > > On Wed, 3 Oct 2001, Neil Brown wrote: > > > On Wednesday October 3, adilger@turbolabs.com wrote: > > > > 3) Rewrite to do I/O directly via pagecache? > > > > 4) Rewrite to do I/O directly via generic_make_request just like it > > does for all other I/O to underlying devices. > > It is on my (mental) todo list, but doesn't have a high priority. > > At the same time, it would be good to get write_disk_sb to notice > > if the write failed..... > > Folks, there's a problem aside of set_blocksize(). You are doing memset > and memcpy on unlocked buffer returned by getblk(). That's a race - > if that buffer is currently being read into, we will get a random crap. > And write it to disk afterwards. > > General rule: don't modify buffer you've got from getblk() unless you've > locked it and mark it uptodate before you unlock. We had the same > bug in ext2, BTW. What I am saying is "don't use the buffer cache at all" or any cache. Don't use getblk. Just: bh = kmalloc(sizeof(*bh)); bh->b_data = rdev->sb; /* it is always one page */ bh->b_rdev = rdev->dev; bh->b_rsector = sb_offset*2; bh->b_end_io = md_just_set_a_flag_and_call_wakeup; init_wait_queue_head(&bh->b_wait); bh->b_flags = 1<<B_Locked; generic_make_request(WRITE,bh); wait_disk_event((bh->b_flags & (1<<B_Locked)), bh->b_wait); err = bh->b_flags & (1<<B_Uptodate); kfree(bh); return err; with typos corrected and a fairly obvious definition of md_just_set_a_flag_and_call_wakeup fairly similar to end_buffer_io_sync. NeilBrown ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer 2001-10-03 4:10 ` Linus Torvalds 2001-10-03 4:21 ` Alexander Viro @ 2001-10-03 4:57 ` Andreas Dilger 1 sibling, 0 replies; 8+ messages in thread From: Andreas Dilger @ 2001-10-03 4:57 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel, lvm-devel On Oct 03, 2001 04:10 +0000, Linus Torvalds wrote: > In article <20011002190547.A3323@cm.nu>, Shane Wegner <shane@cm.nu> wrote: > >I am getting the following out of fs/buffer.c immediately > >after bootup. The kernel is 2.4.11-pre2 when the message > >was added. > > > >Oct 2 17:35:08 continuum kernel: invalidate: busy buffer > >Oct 2 17:35:08 continuum last message repeated 7 times > > > >I assume this is an error though it doesn't seem to say so. > > Well, it's an error, but it's an error in that LVM invalidates the block > devices a bit too much, and the message really tells you that the code > refused to invalidate stuff that must not be invalidated. > > It's harmless, although I hope that the LVM people will become a bit > less invalidation-happy as a result of the warning (it's always happened > before, it just hasn't warned about it in earlier kernels). Given that 2.4.10+ have devices in page cache, is there _any_ reason why what the kernel sees on a device would be different than what user space reads from a device? I don't think it was ever an issue between whole-disk-dev and partition-dev aliasing, since both user-space and the kernel are accessing the same device. If not, then we can just change the PV_FLUSH code to not do invalidate_buffers() on the device for kernels 2.4.10+. There never was a very strong reason to do it for disk identification. Cheers, Andreas CC'd LVM folks to get their input on this. -- Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto, \ would they cancel out, leaving him still hungry?" http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2001-10-03 7:13 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2001-10-03 2:05 2.4.11-pre2 fs/buffer.c: invalidate: busy buffer Shane Wegner 2001-10-03 4:10 ` Linus Torvalds 2001-10-03 4:21 ` Alexander Viro 2001-10-03 6:41 ` Andreas Dilger 2001-10-03 6:54 ` Neil Brown 2001-10-03 7:05 ` Alexander Viro 2001-10-03 7:13 ` Neil Brown 2001-10-03 4:57 ` Andreas Dilger
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox