linux-ext4.vger.kernel.org archive mirror
From: Theodore Ts'o <tytso@mit.edu>
To: Eric Sandeen <sandeen@redhat.com>
Cc: "Huang Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"Juergens Dirk (CM-AI/ECO2)" <Dirk.Juergens@de.bosch.com>
Subject: Re: ext4 filesystem bad extent error review
Date: Fri, 3 Jan 2014 12:51:11 -0500
Message-ID: <20140103175111.GA4336@thunk.org>
In-Reply-To: <52C6F22A.4040202@redhat.com>

On Fri, Jan 03, 2014 at 11:23:54AM -0600, Eric Sandeen wrote:
> > The BLKFLSBUF ioctl does __not__ send a CACHE FLUSH command to the
> > hardware device.  It forces all of the dirty buffers in memory to the
> > storage device, and then it invalidates all the buffer cache, but it
> > does not send a CACHE FLUSH command to the hardware.  Hence, the
> > hardware is free to write it to its on-disk cache, and not necessarily
> > guarantee that the data is written to stable store.  (For an example
> > use case of BLKFLSBUF, we use it in e2fsck to drop the buffer cache
> > for benchmarking purposes.)
> 
> Are you sure?  for a bdev w/ ext4 on it:
> 
> BLKFLSBUF
> 	fsync_bdev
> 		sync_filesystem
> 			sync_fs
> 				ext4_sync_fs
> 					blkdev_issue_flush

This call chain only happens if the block device is mounted.

If you only have the block device open, and are doing reads and writes
directly to the block device, then BLKFLSBUF will not result in
blkdev_issue_flush() being called.

Actually, BLKFLSBUF is really a bit of a mess, and it's because it
conflates multiple meanings of the word "flush" (which is ambiguous).
For ram disks, it actually destroys the ram disk (due to an
implementation detail of the original ramdisk driver).  The original
meaning of the ioctl was to safely remove all of the buffers from the
buffer cache --- for example, to deal with a 5.25" floppy disk being
replaced, since there's no way for the hardware to signal this to the
OS, or for benchmarking purposes.

Adding things like the call to sync_fs() has made the BLKFLSBUF ioctl
more and more confused, and arguably we should add some new ioctls
which separate out some of these use cases.  For example, there is
currently no way to force all dirty buffers for an unmounted block
device in the buffer cache to be written to disk, without actually
dropping all of the clean buffers from the buffer cache (as would be
the case with BLKFLSBUF), and without causing a forced CACHE_FLUSH
command (as would be the case if you called fsync).
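
To make the contrast above concrete, here is a minimal userspace sketch
of the two calls, assuming a file descriptor opened on a block device
node.  The helper names drop_buffer_cache() and
flush_to_stable_storage() are hypothetical and exist only for this
illustration; the comments restate the behavior described in this
message and are not an independent guarantee.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKFLSBUF */

/*
 * Hypothetical helper: issue BLKFLSBUF on an open block device fd.
 * As described above, this writes dirty buffers to the device and
 * invalidates the buffer cache; on an unmounted device it does not
 * by itself end up in blkdev_issue_flush().
 * Returns 0 on success, or a negative errno value on failure.
 */
static int drop_buffer_cache(int fd)
{
    if (ioctl(fd, BLKFLSBUF, 0) != 0)
        return -errno;
    return 0;
}

/*
 * Hypothetical helper: fsync() the block device fd.
 * As described above, this writes dirty buffers and also forces a
 * CACHE FLUSH command to the hardware.
 * Returns 0 on success, or a negative errno value on failure.
 */
static int flush_to_stable_storage(int fd)
{
    if (fsync(fd) != 0)
        return -errno;
    return 0;
}
```

A caller that only has the raw device open and needs the data on
stable storage would, under these assumptions, use
flush_to_stable_storage() and not rely on drop_buffer_cache() alone.
Opening a block device node and issuing BLKFLSBUF normally requires
root privileges.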

The main reason why we haven't is that it's rare that people would
want to do these things in isolation, but the real problem is that
the exact semantics of BLKFLSBUF are a bit confused, and hence
confusing.  It's not even well documented --- I had to go diving
into the kernel sources to be sure, and even then, as you've pointed
out, what happens varies depending on whether the block device is
mounted or not.

					- Ted

