[PATCH 0/4] detect online disk resize


* [PATCH 0/4] detect online disk resize
@ 2008-05-05 23:04 Andrew Patterson
  2008-05-05 23:04 ` [PATCH 1/2] Added flush_disk to factor out common buffer cache flushing code Andrew Patterson
  2008-05-05 23:04 ` [PATCH 2/2] Wrapper for lower-level revalidate_disk routines Andrew Patterson
  0 siblings, 2 replies; 7+ messages in thread
From: Andrew Patterson @ 2008-05-05 23:04 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-kernel, viro, axboe, andmike

This patch series handles online disk resizes that are currently not
completely recognized by the kernel using the existing revalidate_disk
routines.  An online resize can occur when growing or shrinking a
Fibre Channel LUN or perhaps by adding a disk to an existing RAID
volume.

The kernel currently recognizes a device size change when
revalidate_disk() is called; however, the block layer does not use the
new size while it has any current openers on the device. So, for
example, if LVM has a volume open on the device, you will generally
not see the size change until after a reboot. We fix this problem by
creating a wrapper to be used with lower-level revalidate_disk
routines.  This wrapper first calls the lower-level driver's
revalidate_disk routine. It then compares the gendisk capacity to the
block device's inode size. If there is a difference, we adjust the
block device's size. If the size has changed, we then flush the disk
for safety.
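
The sequence described above can be sketched in user space as follows.
This is only an illustrative model, not the code from the patches: the
model_disk and model_bdev structures, MODEL_SECTOR_SHIFT, and all of the
model_* functions are hypothetical stand-ins for struct gendisk, struct
block_device, and the kernel routines, and the "flush" is reduced to a
counter.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_SECTOR_SHIFT 9	/* assumes 512-byte sectors */

/* Hypothetical stand-in for struct gendisk. */
struct model_disk {
	const char *name;
	uint64_t capacity;	/* in sectors, as last set by the driver */
	int (*revalidate)(struct model_disk *disk);	/* may be NULL */
};

/* Hypothetical stand-in for struct block_device. */
struct model_bdev {
	struct model_disk *disk;
	uint64_t size;		/* in bytes; models the bdev inode size */
	unsigned int flushes;	/* number of flushes performed so far */
};

/* Models flushing the disk; here it only counts the calls. */
static void model_flush(struct model_bdev *bdev)
{
	bdev->flushes++;
}

/*
 * Call the driver's callback first, then compare the disk capacity with
 * the cached block device size and resynchronize them if they differ.
 */
static int model_revalidate_disk(struct model_bdev *bdev)
{
	struct model_disk *disk = bdev->disk;
	uint64_t disk_size;
	int ret = 0;

	assert(disk != NULL);

	if (disk->revalidate)
		ret = disk->revalidate(disk);

	disk_size = disk->capacity << MODEL_SECTOR_SHIFT;
	if (disk_size != bdev->size) {
		printf("%s: size change from %" PRIu64 " to %" PRIu64 " bytes\n",
		       disk->name, bdev->size, disk_size);
		bdev->size = disk_size;
		model_flush(bdev);
	}

	return ret;
}

/* Example callback: pretend the LUN was grown to 4096 sectors. */
static int model_grow(struct model_disk *disk)
{
	disk->capacity = 4096;
	return 0;
}
```

In this model, a second call to model_revalidate_disk() with no further
capacity change leaves the size alone and does not flush again.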

This patch series only modifies the sd driver to use these changes as
that is all that I currently have to test with. Device drivers like cciss
and DAC960 should probably use it as well.

Diff stats:

 drivers/scsi/sd.c  |    4 +--
 fs/block_dev.c     |   76 +++++++++++++++++++++++++++++++++++++++++++++++++---
 include/linux/fs.h |    1 +
 3 files changed, 74 insertions(+), 7 deletions(-)

Commits:

 - Added flush_disk to factor out common buffer cache flushing code.
 - Wrapper for lower-level revalidate_disk routines.
 - Adjust block device size after an online resize of a disk.
 - SCSI sd driver calls revalidate_disk wrapper.

-- 
Andrew Patterson


* [PATCH 1/2] Added flush_disk to factor out common buffer cache flushing code.
  2008-05-05 23:04 [PATCH 0/4] detect online disk resize Andrew Patterson
@ 2008-05-05 23:04 ` Andrew Patterson
  2008-05-06  8:44   ` Christoph Hellwig
  2008-05-05 23:04 ` [PATCH 2/2] Wrapper for lower-level revalidate_disk routines Andrew Patterson
  1 sibling, 1 reply; 7+ messages in thread
From: Andrew Patterson @ 2008-05-05 23:04 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-kernel, viro, axboe, andmike, Andrew Patterson

Added flush_disk to factor out common buffer cache flushing code.

We need to be able to flush the buffer cache for more than just when a
disk is changed, so we factor out common cache flush code in
check_disk_change() to an internal flush_disk() routine.  This routine
will then be used for both disk changes and disk resizes (in a later
patch).

Include the disk name in the text indicating that there are busy
inodes on the device and increase the KERN severity of the message.

Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
---

 fs/block_dev.c |   33 ++++++++++++++++++++++++++++-----
 1 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7d822fa..fcd0398 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -867,6 +867,33 @@ struct block_device *open_by_devnum(dev_t dev, unsigned mode)
 
 EXPORT_SYMBOL(open_by_devnum);
 
+
+/**
+ * flush_disk - invalidates all buffer-cache entries on a disk
+ *
+ * @bdev:	struct block device to be flushed
+ *
+ * Invalidates all buffer-cache entries on a disk. It should be called
+ * when a disk has been changed -- either by a media change or online
+ * resize.
+ */
+static void flush_disk(struct block_device *bdev)
+{
+	if (__invalidate_device(bdev)) {
+		char name[BDEVNAME_SIZE] = "";
+
+		if (bdev->bd_disk)
+			disk_name(bdev->bd_disk, 0, name);
+		printk(KERN_WARNING "VFS: busy inodes on changed media %s\n",
+		       name);
+	}
+
+	if (!bdev->bd_disk)
+		return;
+	if (bdev->bd_disk->minors > 1)
+		bdev->bd_invalidated = 1;
+}
+
 /*
  * This routine checks whether a removable media has been changed,
  * and invalidates all buffer-cache-entries in that case. This
@@ -886,13 +913,9 @@ int check_disk_change(struct block_device *bdev)
 	if (!bdops->media_changed(bdev->bd_disk))
 		return 0;
 
-	if (__invalidate_device(bdev))
-		printk("VFS: busy inodes on changed media.\n");
-
+	flush_disk(bdev);
 	if (bdops->revalidate_disk)
 		bdops->revalidate_disk(bdev->bd_disk);
-	if (bdev->bd_disk->minors > 1)
-		bdev->bd_invalidated = 1;
 	return 1;
 }
 


* [PATCH 2/2] Wrapper for lower-level revalidate_disk routines.
  2008-05-05 23:04 [PATCH 0/4] detect online disk resize Andrew Patterson
  2008-05-05 23:04 ` [PATCH 1/2] Added flush_disk to factor out common buffer cache flushing code Andrew Patterson
@ 2008-05-05 23:04 ` Andrew Patterson
  1 sibling, 0 replies; 7+ messages in thread
From: Andrew Patterson @ 2008-05-05 23:04 UTC (permalink / raw)
  To: linux-scsi; +Cc: linux-kernel, viro, axboe, andmike, Andrew Patterson

Wrapper for lower-level revalidate_disk routines.

This is a wrapper for the lower-level revalidate_disk call-backs such
as sd_revalidate_disk(). It allows us to perform pre and post
operations when calling them.

We will use this wrapper in a later patch to adjust block device sizes
after an online resize (a _post_ operation).

Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
---

 fs/block_dev.c     |   21 +++++++++++++++++++++
 include/linux/fs.h |    1 +
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index fcd0398..b510451 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -894,6 +894,27 @@ static void flush_disk(struct block_device *bdev)
 		bdev->bd_invalidated = 1;
 }
 
+/**
+ * revalidate_disk - wrapper for lower-level driver's revalidate_disk
+ *                   call-back
+ *
+ * @disk: struct gendisk to be revalidated
+ *
+ * This routine is a wrapper for lower-level driver's revalidate_disk
+ * call-backs.  It is used to do common pre and post operations needed
+ * for all revalidate_disk operations.
+ */
+int revalidate_disk(struct gendisk *disk)
+{
+	int ret = 0;
+
+	if (disk->fops->revalidate_disk)
+		ret = disk->fops->revalidate_disk(disk);
+
+	return ret;
+}
+EXPORT_SYMBOL(revalidate_disk);
+
 /*
  * This routine checks whether a removable media has been changed,
  * and invalidates all buffer-cache-entries in that case. This
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b84b848..278172f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1665,6 +1665,7 @@ extern int fs_may_remount_ro(struct super_block *);
  */
 #define bio_data_dir(bio)	((bio)->bi_rw & 1)
 
+extern int revalidate_disk(struct gendisk *);
 extern int check_disk_change(struct block_device *);
 extern int __invalidate_device(struct block_device *);
 extern int invalidate_partition(struct gendisk *, int);



* Re: [PATCH 1/2] Added flush_disk to factor out common buffer cache flushing code.
  2008-05-05 23:04 ` [PATCH 1/2] Added flush_disk to factor out common buffer cache flushing code Andrew Patterson
@ 2008-05-06  8:44   ` Christoph Hellwig
  2008-05-07 17:59     ` James Bottomley
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2008-05-06  8:44 UTC (permalink / raw)
  To: Andrew Patterson; +Cc: linux-scsi, linux-kernel, viro, axboe, andmike

On Mon, May 05, 2008 at 05:04:19PM -0600, Andrew Patterson wrote:
> Added flush_disk to factor out common buffer cache flushing code.
> 
> We need to be able to flush the buffer cache for more than just when a
> disk is changed, so we factor out common cache flush code in
> check_disk_change() to an internal flush_disk() routine.  This routine
> will then be used for both disk changes and disk resizes (in a later
> patch).
> 
> Include the disk name in the text indicating that there are busy
> inodes on the device and increase the KERN severity of the message.

This doesn't make much sense to me.  When a disk has grown there's no
point in invalidating any buffers, and when it has shrunk it's too late
already.  Also, I suspect modern filesystems might be really allergic to
this kind of under-the-hood action.  That is, if they use the bdev
mapping at all, something that at least xfs, and I think btrfs as well,
don't do.



* Re: [PATCH 1/2] Added flush_disk to factor out common buffer cache flushing code.
  2008-05-06  8:44   ` Christoph Hellwig
@ 2008-05-07 17:59     ` James Bottomley
  2008-05-07 18:08       ` Andrew Patterson
  0 siblings, 1 reply; 7+ messages in thread
From: James Bottomley @ 2008-05-07 17:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Patterson, linux-scsi, linux-kernel, viro, axboe, andmike

On Tue, 2008-05-06 at 04:44 -0400, Christoph Hellwig wrote:
> On Mon, May 05, 2008 at 05:04:19PM -0600, Andrew Patterson wrote:
> > Added flush_disk to factor out common buffer cache flushing code.
> > 
> > We need to be able to flush the buffer cache for more than just when a
> > disk is changed, so we factor out common cache flush code in
> > check_disk_change() to an internal flush_disk() routine.  This routine
> > will then be used for both disk changes and disk resizes (in a later
> > patch).
> > 
> > Include the disk name in the text indicating that there are busy
> > inodes on the device and increase the KERN severity of the message.
> 
> This doesn't make much sense to me.  When a disk has grown there's no
> point in invalidating any buffers, and when it has shrunk it's too late
> already.  Also, I suspect modern filesystems might be really allergic to
> this kind of under-the-hood action.  That is, if they use the bdev
> mapping at all, something that at least xfs, and I think btrfs as well,
> don't do.

I agree on the grown disk case.  For the shrunk disk, we need at least
to invalidate the sectors that no longer physically exist.

The two use cases for shrinking I can see are

     1. planned: the fs is already shrunk to within the new boundaries
        and all data is relocated, so invalidate is fine (any dirty
        buffers that might exist in the shrunk region are there only
        because they were relocated but not yet written to their
        original location).
     2. unplanned:  In this case, the fs is probably toast, so whether
        we invalidate or not isn't going to make a whole lot of
        difference; it's still going to try to read or write from
        sectors beyond the new size and get I/O errors.

Unfortunately, we don't seem to have a partial invalidation function for
the page cache and filesystem, so should we have one?
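
As a rough illustration of what "partial invalidation" would mean here,
the following user-space sketch drops only the cached pages that lie
entirely beyond the new device size.  It is a toy model under stated
assumptions, not a proposal for the kernel interface: the toy_page
structure, TOY_PAGE_SIZE, and both toy_* functions are hypothetical, and
a page that straddles the new end of the device is deliberately kept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u	/* assumed page size for the model */

/* Hypothetical cached page; only the fields the sketch needs. */
struct toy_page {
	uint64_t index;		/* page index within the device */
	int valid;		/* nonzero while the page is cached */
	int dirty;		/* nonzero if it still needs writeback */
};

/* Index of the first page that lies entirely beyond new_size bytes. */
static uint64_t toy_first_stale_page(uint64_t new_size)
{
	/* Round up without overflowing for very large sizes. */
	return new_size / TOY_PAGE_SIZE + (new_size % TOY_PAGE_SIZE != 0);
}

/*
 * Drop every cached page at or past the first stale index, dirty or
 * not, and return how many pages were dropped.
 */
static size_t toy_invalidate_beyond(struct toy_page *pages, size_t n,
				    uint64_t new_size)
{
	uint64_t first = toy_first_stale_page(new_size);
	size_t dropped = 0;
	size_t i;

	assert(pages != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (pages[i].valid && pages[i].index >= first) {
			pages[i].valid = 0;
			pages[i].dirty = 0;
			dropped++;
		}
	}

	return dropped;
}
```

Pages below the new boundary are untouched, so on a grown disk the call
drops nothing.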

James




* Re: [PATCH 1/2] Added flush_disk to factor out common buffer cache flushing code.
  2008-05-07 17:59     ` James Bottomley
@ 2008-05-07 18:08       ` Andrew Patterson
  2008-05-07 18:21         ` James Bottomley
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Patterson @ 2008-05-07 18:08 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Hellwig, linux-scsi, linux-kernel, viro, axboe, andmike

On Wed, 2008-05-07 at 12:59 -0500, James Bottomley wrote:
> On Tue, 2008-05-06 at 04:44 -0400, Christoph Hellwig wrote:
> > On Mon, May 05, 2008 at 05:04:19PM -0600, Andrew Patterson wrote:
> > > Added flush_disk to factor out common buffer cache flushing code.
> > > 
> > > We need to be able to flush the buffer cache for more than just when a
> > > disk is changed, so we factor out common cache flush code in
> > > check_disk_change() to an internal flush_disk() routine.  This routine
> > > will then be used for both disk changes and disk resizes (in a later
> > > patch).
> > > 
> > > Include the disk name in the text indicating that there are busy
> > > inodes on the device and increase the KERN severity of the message.
> > 
> > This doesn't make much sense to me.  When a disk has grown there's no
> > point in invalidating any buffers, and when it has shrunk it's too late
> > already.  Also, I suspect modern filesystems might be really allergic to
> > this kind of under-the-hood action.  That is, if they use the bdev
> > mapping at all, something that at least xfs, and I think btrfs as well,
> > don't do.
> 
> I agree on the grown disk case.  For the shrunk disk, we need at least
> to invalidate the sectors that no longer physically exist.
> 
> The two use cases for shrinking I can see are
> 
>      1. planned: the fs is already shrunk to within the new boundaries
>         and all data is relocated, so invalidate is fine (any dirty
>         buffers that might exist in the shrunk region are there only
>         because they were relocated but not yet written to their
>         original location).

So why do we need to invalidate here if everything is fine?


>      2. unplanned:  In this case, the fs is probably toast, so whether
>         we invalidate or not isn't going to make a whole lot of
>         difference; it's still going to try to read or write from
>         sectors beyond the new size and get I/O errors.
> 

Invalidating here might be useful in that errors are reported earlier.

> Unfortunately, we don't seem to have a partial invalidation function for
> the page cache and filesystem, so should we have one?
> 

I have been having problems with my email, hence the two missing patches.
I'll resend the whole series and add the flush_disk() call in
revalidate_disk() as a separate patch, so that the flush code can be
optionally applied.


> James
> 
> 
-- 
Andrew Patterson



* Re: [PATCH 1/2] Added flush_disk to factor out common buffer cache flushing code.
  2008-05-07 18:08       ` Andrew Patterson
@ 2008-05-07 18:21         ` James Bottomley
  0 siblings, 0 replies; 7+ messages in thread
From: James Bottomley @ 2008-05-07 18:21 UTC (permalink / raw)
  To: Andrew Patterson
  Cc: Christoph Hellwig, linux-scsi, linux-kernel, viro, axboe, andmike

On Wed, 2008-05-07 at 18:08 +0000, Andrew Patterson wrote:
> On Wed, 2008-05-07 at 12:59 -0500, James Bottomley wrote:
> > On Tue, 2008-05-06 at 04:44 -0400, Christoph Hellwig wrote:
> > > On Mon, May 05, 2008 at 05:04:19PM -0600, Andrew Patterson wrote:
> > > > Added flush_disk to factor out common buffer cache flushing code.
> > > > 
> > > > We need to be able to flush the buffer cache for more than just when a
> > > > disk is changed, so we factor out common cache flush code in
> > > > check_disk_change() to an internal flush_disk() routine.  This routine
> > > > will then be used for both disk changes and disk resizes (in a later
> > > > patch).
> > > > 
> > > > Include the disk name in the text indicating that there are busy
> > > > inodes on the device and increase the KERN severity of the message.
> > > 
> > > This doesn't make much sense to me.  When a disk has grown there's no
> > > point in invalidating any buffers, and when it has shrunk it's too late
> > > already.  Also, I suspect modern filesystems might be really allergic to
> > > this kind of under-the-hood action.  That is, if they use the bdev
> > > mapping at all, something that at least xfs, and I think btrfs as well,
> > > don't do.
> > 
> > I agree on the grown disk case.  For the shrunk disk, we need at least
> > to invalidate the sectors that no longer physically exist.
> > 
> > The two use cases for shrinking I can see are
> > 
> >      1. planned: the fs is already shrunk to within the new boundaries
> >         and all data is relocated, so invalidate is fine (any dirty
> >         buffers that might exist in the shrunk region are there only
> >         because they were relocated but not yet written to their
> >         original location).
> 
> So why do we need to invalidate here if everything is fine?

We need to get rid of stray pages.  Obviously, dirty ones that would
cause write errors at some point need to be killed.  The dangerous ones
are read-only pages that can hang around for a long time.  The (perhaps
unlikely) scenario where they bite is if the disk is shrunk and then
expanded.  It's one of those annoying scenarios that most people don't
care about (the expanded space is empty, so what does it matter if we
get stray data?), but security people jump up and down and scream about
data leaking.

> >      2. unplanned:  In this case, the fs is probably toast, so whether
> >         we invalidate or not isn't going to make a whole lot of
> >         difference; it's still going to try to read or write from
> >         sectors beyond the new size and get I/O errors.
> > 
> 
> Invalidating here might be useful in that errors are reported earlier.

Yes ... force the filesystem to see the errors immediately, rather than
later on writeback, readahead, or some other delayed operation.

> > Unfortunately, we don't seem to have a partial invalidation function for
> > the page cache and filesystem, so should we have one?
> > 
> 
> I have been having problems with my email, hence the two missing patches.
> I'll resend the whole series and add the flush_disk() call in
> revalidate_disk() as a separate patch, so that the flush code can be
> optionally applied.

James



