linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.de>
To: James Bottomley <James.Bottomley@suse.de>
Cc: device-mapper development <dm-devel@redhat.com>,
	Jens Axboe <axboe@kernel.dk>,
	linux-raid@vger.kernel.org, linux-kernel@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	linux-fsdevel@vger.kernel.org
Subject: Re: [dm-devel] [PATCH] Fix over-zealous flush_disk when changing device size.
Date: Tue, 8 Mar 2011 09:44:12 +1100	[thread overview]
Message-ID: <20110308094412.1c45b277@notabene.brown> (raw)
In-Reply-To: <1299516418.15258.4.camel@mulgrave.site>

(linux-fsdevel added - seems relevant)

On Mon, 07 Mar 2011 10:46:58 -0600 James Bottomley <James.Bottomley@suse.de>
wrote:

> On Sun, 2011-03-06 at 21:22 -0700, Andrew Patterson wrote:
> > On Sun, 2011-03-06 at 17:47 +1100, NeilBrown wrote:
> > >  Would you be uncomfortable if I asked Linus to revert both my fix and your
> > >  original patch??
> > 
> > James Bottomley wanted me to put this functionality in. I have no
> > problem with reverting it myself, especially if it is causing other
> > problems.  I would have to say that you need to ask him (or rather, I am
> > not qualified to render an opinion here).
> 
> So it seems we have a couple of problems: the first being that
> flush_disk() doesn't actually do what it says (flush the disk).  If it's
> just discarding all data, dirty or clean, then its use in the
> grow/shrink interface is definitely wrong.
> 
> The idea is that before we complete the grow/shrink, we make sure that
> the device doesn't have any errors, so we want to try to write out all
> dirty buffers to make sure they still have a home.  If flush_disk()
> doesn't do that, then we need to use a different interface ... what's
> the interface to flush a disk?
> 

Hi James,

I *always* want to make sure that my device doesn't have any errors, not just
when it changes size... but I'm not sure that regularly flushing out data is
the right way to do it.
But maybe I still misunderstand what the real point of this is.

As for the correct interface to flush a disk - there isn't one.
One doesn't flush a storage device, one flushes a cache - to a storage
device.  And there is not a 1-to-1 mapping.

A block device *is* associated with one cache - one which is used for
caching accesses through /dev/XXX and also by some filesystems to cache some
metadata.  You can flush this cache with sync_blockdev().  This just
flushes out dirty pages in that cache, it doesn't discard clean pages.
invalidate_bdev() can discard clean pages.  Call both, and get both outcomes.

If a filesystem is mounted directly on a given block_device, then it
should have a valid s_bdev pointer and it is possible to find that filesystem
from the block_device using get_active_super().  You could then call
sync_filesystem() to flush out dirty data.  There isn't really a good
interface to discard clean data. shrink_dcache_sb() does some of it, other
bits of code do other bits.

Note that a block_device also can have a pointer to a super_block (bd_super).
This does not seem to be widely used .... ext3 and ext4 use it so that memory
pressure felt by the block-device cache can transmitted to the fs, so the
fs can associate private data with the block device's cache I guess.
I don't think bd_super is sufficiently persistent reference to be usable
for sync_filesystems (it could race with unmount).


But if the block device is within a dm or md device, or is an external
journal for e.g. ext3, or is in some more complex relationship with a
filesystem, then there is no way to request that any particular cache get
flushed to the block device.  And if some user-space app has cached data
for the device .... there is no way to get at that either.

If we wanted a "proper" solution for this we probably need to leverage
the 'bd_claim' concept.  When a device is claimed, allow an 'operations'
structure to be provided with operations like:
   flush_caches  (writes out data)
   purge_caches  (discards clean data)
   discard_caches  (discards all data)
   prepare_resize (is allowed to fail)
   commit_resize
   freeze_io
   thaw_io

But flush_disk is definitely a wrong interface.  It has a meaningless name
(as discussed above), and purges some caches while discarding others.

What do we *really* lose if we just revert that original patch?

Thanks,
NeilBrown

       reply	other threads:[~2011-03-07 22:44 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20110217165057.5c50e566@notabene.brown>
     [not found] ` <20110303143120.GA8134@infradead.org>
     [not found]   ` <20110304111624.4be27aaf@notabene.brown>
     [not found]     ` <1299259506.2118.24.camel@grinch>
     [not found]       ` <20110306174755.49404c8e@notabene.brown>
     [not found]         ` <1299471771.2228.11.camel@grinch>
     [not found]           ` <1299516418.15258.4.camel@mulgrave.site>
2011-03-07 22:44             ` NeilBrown [this message]
2011-03-07 22:56               ` [dm-devel] [PATCH] Fix over-zealous flush_disk when changing device size James Bottomley
2011-03-08  0:04                 ` NeilBrown
2011-03-16 20:30                   ` Jeff Moyer
2011-03-17  1:28                     ` NeilBrown
2011-03-17 17:33                       ` Jeff Moyer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110308094412.1c45b277@notabene.brown \
    --to=neilb@suse.de \
    --cc=James.Bottomley@suse.de \
    --cc=axboe@kernel.dk \
    --cc=dm-devel@redhat.com \
    --cc=hch@infradead.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).