From: NeilBrown <neilb@suse.de>
To: Christoph Hellwig <hch@infradead.org>
Cc: Andrew Patterson <andrew.patterson@hp.com>,
	Jens Axboe <axboe@kernel.dk>,
	linux-raid@vger.kernel.org, dm-devel@redhat.com,
	linux-kernel@vger.kernel.org, James.Bottomley@suse.de
Subject: Re: [PATCH] Fix over-zealous flush_disk when changing device size.
Date: Fri, 4 Mar 2011 11:16:24 +1100
Message-ID: <20110304111624.4be27aaf@notabene.brown>
In-Reply-To: <20110303143120.GA8134@infradead.org>

On Thu, 3 Mar 2011 09:31:20 -0500 Christoph Hellwig <hch@infradead.org> wrote:

> On Thu, Feb 17, 2011 at 04:50:57PM +1100, NeilBrown wrote:
> > 
> > Hi Andrew (and others)
> >  I wonder if you would review the following for me and comment.
> 
> Please send things in this area through -fsdevel next time, thanks!

Will try to remember - it is sometimes hard to get this sort of patch before
the right audience ... I thought "block layer" rather than "file systems" :-(

Thanks for finding it anyway.

> 
> > There are two cases when we call flush_disk.
> > In one, the device has disappeared (check_disk_change) so any
> > data we hold becomes irrelevant.
> > In the other, the device has changed size (check_disk_size_change)
> > so data we hold may be irrelevant.
> > 
> > In both cases it makes sense to discard any 'clean' buffers,
> > so they will be read back from the device if needed.
> 
> Does it?  If the device has disappeared we can't read them back anyway.

I think that is the point - return an error rather than stale data.

> If the device has resized to a smaller size the same is true about
> those buffers that have gone away, and if it has resized to a larger
> size invalidating anything doesn't make sense at all.  I think this
> area needs more love than a quick kill_dirty hackjob.

I tend to agree.  I wasn't entirely convinced by the changelog comments on
the original offending patch, but I couldn't convince myself there was no
justification either, and I wanted to fix the corruption I saw - while close
to the end of a release cycle - without introducing any new regressions.

> 
> > In the former case it makes sense to discard 'dirty' buffers
> > as there will never be anywhere safe to write the data.  In the
> > second case it *does*not* make sense to discard dirty buffers
> > as that will lead to file system corruption when you simply enlarge
> > the containing devices.
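
The distinction drawn above can be sketched as a small userspace model.
This is only an illustration, not the kernel code: struct toy_buffer and
the toy_* functions are hypothetical names, and the kill_dirty flag simply
mirrors the name used elsewhere in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a cached buffer; not a kernel structure. */
struct toy_buffer {
    bool valid;   /* still present in the cache */
    bool dirty;   /* holds data not yet written to the device */
};

/*
 * Drop cached buffers.  Clean buffers are always dropped; dirty
 * buffers are dropped only when kill_dirty is set.
 * Returns the number of buffers dropped.
 */
size_t toy_flush(struct toy_buffer *bufs, size_t n, bool kill_dirty)
{
    size_t dropped = 0;

    for (size_t i = 0; i < n; i++) {
        if (!bufs[i].valid)
            continue;               /* already gone */
        if (bufs[i].dirty && !kill_dirty)
            continue;               /* keep unwritten data */
        bufs[i].valid = false;
        dropped++;
    }
    return dropped;
}

/* Device disappeared: nothing can ever be written back, drop everything. */
size_t toy_device_gone(struct toy_buffer *bufs, size_t n)
{
    return toy_flush(bufs, n, true);
}

/* Device changed size: drop clean buffers only, keep dirty ones. */
size_t toy_size_changed(struct toy_buffer *bufs, size_t n)
{
    return toy_flush(bufs, n, false);
}
```

In this model a size change leaves every dirty buffer in place, so a later
write-back can still reach the (possibly larger) device.
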
> 
> Doing anything like this at the buffer cache layer or inode cache layer
> doesn't make any sense.  If a device goes away or shrinks below the
> filesystem size the filesystem simply needs to be shut down and in the
> former case the admin needs to start a manual repair.  Trying to do
> any botch jobs in lower layers never works in practice.

Amen.
What I personally would really like to see is an interface for the block
device to say to the filesystem (or more specifically: whatever has bdclaimed
it) "I am about to resize to $X - is that OK?" and also "I have resized -
deal with it".
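
A minimal sketch of what such an interface might look like, written as a
self-contained userspace model.  Everything here is an assumption made for
illustration: struct resize_ops, struct toy_bdev, toy_bdev_resize() and the
toy filesystem are hypothetical names, not existing kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callbacks supplied by whoever has claimed the device. */
struct resize_ops {
    /* "I am about to resize to new_size - is that OK?" */
    bool (*may_resize)(void *holder, uint64_t new_size);
    /* "I have resized - deal with it" */
    void (*resized)(void *holder, uint64_t new_size);
};

/* Toy block device: a size plus the claimant's callbacks. */
struct toy_bdev {
    uint64_t size;
    const struct resize_ops *ops;
    void *holder;
};

/* Returns 0 on success, -1 if the holder vetoed the new size. */
int toy_bdev_resize(struct toy_bdev *bdev, uint64_t new_size)
{
    if (bdev->ops && bdev->ops->may_resize &&
        !bdev->ops->may_resize(bdev->holder, new_size))
        return -1;                  /* size left unchanged */

    bdev->size = new_size;

    if (bdev->ops && bdev->ops->resized)
        bdev->ops->resized(bdev->holder, new_size);
    return 0;
}

/* Toy filesystem that refuses to have the device shrink beneath it. */
struct toy_fs {
    uint64_t fs_size;   /* space the filesystem occupies */
    uint64_t dev_size;  /* last device size it was told about */
};

static bool toy_fs_may_resize(void *holder, uint64_t new_size)
{
    const struct toy_fs *fs = holder;
    return new_size >= fs->fs_size;
}

static void toy_fs_resized(void *holder, uint64_t new_size)
{
    struct toy_fs *fs = holder;
    fs->dev_size = new_size;
}

static const struct resize_ops toy_fs_ops = {
    .may_resize = toy_fs_may_resize,
    .resized    = toy_fs_resized,
};
```

With this shape, growing the device succeeds and the holder is told the new
size, while shrinking it below the filesystem size is refused before
anything changes.
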

> 
> For now I think the best short term fix is to simply revert commit
> 608aeef17a91747d6303de4df5e2c2e6899a95e8
> 
> 	"Call flush_disk() after detecting an online resize."

You may be right, but I suspect that Andrew Patterson had a real issue to
solve which led to his submitting it, and I'd really like to understand that
issue before I would feel confident just reverting it.

Andrew:  are you out there?  Can you provide some background for your patch?

Thanks,
NeilBrown
