linux-fsdevel.vger.kernel.org archive mirror
From: Chris Mason <clm@fb.com>
To: Dan Williams <dan.j.williams@intel.com>,
	<lsf-pc@lists.linux-foundation.org>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	<jmoyer@redhat.com>, <david@fromorbit.com>,
	Jens Axboe <axboe@kernel.dk>,
	Bryan E Veal <bryan.e.veal@intel.com>,
	Annie Foong <annie.foong@intel.com>
Subject: Re: [LSF/MM TOPIC] atomic block device
Date: Mon, 17 Feb 2014 08:05:07 -0500
Message-ID: <53020903.1050006@fb.com>
In-Reply-To: <CAA9_cmf7Y1TL8XqR7dYUn=Pv-En2e0X0FM0zdpkiBkUuNBGKfQ@mail.gmail.com>

On 02/15/2014 10:04 AM, Dan Williams wrote:
> In response to Dave's call [1] and highlighting Jeff's attend request
> [2] I'd like to stoke a discussion on an emulation layer for atomic
> block commands.  Specifically, SNIA has laid out their position on the
> command set an atomic block device may support (NVM Programming Model
> [3]) and it is a good conversation piece for this effort.  The goal
> would be to review the proposed operations, identify the capabilities
> that would be readily useful to filesystems / existing use cases, and
> tear down a straw man implementation proposal.
>
> The SNIA defined capabilities that seem the highest priority to implement are:
> * ATOMIC_MULTIWRITE - discontiguous LBA ranges, power fail atomic, no
> ordering constraint relative to other i/o
>
> * ATOMIC_WRITE - contiguous LBA range, power fail atomic, no ordering
> constraint relative to other i/o
>
> * EXISTS - not an atomic command, but defined in the NPM.  It is akin
> to SEEK_{DATA|HOLE} to test whether an LBA is mapped or unmapped.  If
> the LBA is mapped, it additionally specifies whether data is present
> or the LBA is only allocated.
>
> * SCAR - again not an atomic command, but once we have metadata we
> can implement a bad block list, analogous to the bad-block-list
> support in md.
>
> Initial thought is that this functionality is better implemented as a
> library a block device driver (bio-based or request-based) can call to
> emulate these features.  In the case where the feature is directly
> supported by the underlying hardware device the emulation layer will
> stub out and pass it through.  The argument for not doing this as a
> device-mapper target or stacked block device driver is to ease
> provisioning and make the emulation transparent.  On the other hand,
> the argument for doing this as a virtual block device is that the
> "failed to parse device metadata" is a known failure scenario for
> dm/md, but not sd for example.

Hi Dan,

I'd suggest a dm device instead of a special library, mostly because the
emulated device is likely to need some kind of cleanup action after a
crash, and the dm model is best suited to cleanly provide that.  It's
also a good fit for people who want to duct tape a small amount of very
fast NVM onto relatively slower devices.
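
To make the "cleanup action after a crash" concrete, here is a minimal
sketch of one way an emulation layer could work, assuming a simple
redo journal.  Everything in it (the class name, the in-memory "media",
the crash_after knob) is hypothetical and for illustration only; it is
not the proposed implementation or any existing dm target.

```python
class EmulatedAtomicDevice:
    """Toy model of an emulated atomic multi-write (illustrative only).

    `blocks` and `journal` stand in for state on stable media: both
    survive a simulated crash.
    """

    def __init__(self):
        self.blocks = {}     # LBA -> data, the in-place copy
        self.journal = None  # pending intent record, or None

    def atomic_multiwrite(self, writes, crash_after=None):
        """Write discontiguous LBAs as one power-fail-atomic unit.

        `crash_after` (hypothetical test knob) simulates losing power
        after that many in-place block writes.
        """
        # 1. Persist the whole intent record first.  A real layer would
        #    need this step itself to be atomic (e.g. checksummed).
        self.journal = dict(writes)
        # 2. Apply the writes in place, possibly interrupted.
        for i, (lba, data) in enumerate(writes.items()):
            if crash_after is not None and i >= crash_after:
                return False  # power lost; journal is still present
            self.blocks[lba] = data
        # 3. Retire the journal once every block is in place.
        self.journal = None
        return True

    def recover(self):
        """Cleanup after a crash: replay any pending journal record."""
        if self.journal is not None:
            self.blocks.update(self.journal)
            self.journal = None
```

The recover() step is the part that needs a well-defined place to run
before the device is handed to its users, which is the role a dm
target's setup path would play in this sketch.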

The absolute minimum to provide something useful is a 16K discontiguous
atomic write.  That won't help the filesystems much, but it will allow
MySQL to turn off double buffering.  Oracle would benefit from ~64K,
mostly from a safety point of view since they don't double buffer.
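
As an illustration of why a 16K atomic write matters to a database, the
sketch below models a 16K page landing on a device that only guarantees
smaller writes.  The 4K sector size and the page layout (a CRC32 header
followed by the body) are assumptions made for this example; this is
not MySQL's or Oracle's on-disk format.

```python
import zlib

PAGE_SIZE = 16 * 1024    # database page size from the discussion
SECTOR_SIZE = 4 * 1024   # assumed atomic unit of the underlying device


def make_page(fill_byte):
    """Build a page: 4-byte CRC32 of the body, then the body."""
    body = bytes([fill_byte]) * (PAGE_SIZE - 4)
    return zlib.crc32(body).to_bytes(4, "little") + body


def page_is_intact(page):
    """Return True if the stored checksum matches the page body."""
    stored = int.from_bytes(page[:4], "little")
    return stored == zlib.crc32(page[4:])


def torn_write(old_page, new_page, sectors_written):
    """Simulate power loss after only some sectors of the new page land."""
    cut = sectors_written * SECTOR_SIZE
    return new_page[:cut] + old_page[cut:]
```

A page that is half old and half new fails its checksum, and without a
second copy of the page there is nothing to repair it from.  Double
buffering keeps that second copy at the cost of writing each page
twice; an atomic 16K write would rule out the torn state instead.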

Helping the filesystems is harder: we need atomics bigger than any
individual device is likely to provide.  But as Dave says elsewhere in
the thread, we can limit that for specific workloads.

I'm not sold on SCAR, since I'd expect the FTL or drive firmware to
provide that for us.  What use case do you have in mind there?

-chris

