From: Matthew Wilcox <willy@linux.intel.com>
To: Chris Mason <chris.mason@fusionio.com>
Cc: Jeff Moyer <jmoyer@redhat.com>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	Jens Axboe <axboe@kernel.dk>
Subject: Re: [PATCH 1/2] block: Add support for atomic writes
Date: Wed, 13 Nov 2013 16:35:54 -0500
Message-ID: <20131113213554.GL6900@linux.intel.com>
In-Reply-To: <20131113204438.3802.80855@localhost.localdomain>

On Wed, Nov 13, 2013 at 03:44:38PM -0500, Chris Mason wrote:
> Quoting Matthew Wilcox (2013-11-12 10:11:51)
> > On Thu, Nov 07, 2013 at 08:52:20AM -0500, Chris Mason wrote:
> > > Unfortunately, it's hard to say.  I think the fusionio cards are the
> > > only shipping devices that support this, but I've definitely heard that
> > > others plan to support it as well.  mariadb/percona already support the
> > > atomics via fusionio specific ioctls, and turning that into a real
> > > O_ATOMIC is a priority so other hardware can just hop on the train.
> > > 
> > > This feature in general is pretty natural for the log structured squirrels
> > > they stuff inside flash, so I'd expect everyone to support it.  Matthew,
> > > how do you feel about all of this?
> > 
> > NVMe doesn't have support for this functionality.  I know what stories I've
> > heard from our internal device teams about what they can and can't support
> > in the way of this kind of thing, but I obviously can't repeat them here!
> 
> There are some atomics in the NVMe spec though, correct?  The minimum
> needed for database use is only ~16-64K.

Yes, NVMe has limited atomic support.  It has two fields:

  Atomic Write Unit Normal (AWUN): This field indicates the atomic write
  size for the controller during normal operation. This field is specified
  in logical blocks and is a 0’s based value. If a write is submitted
  of this size or less, the host is guaranteed that the write is atomic
  to the NVM with respect to other read or write operations. If a write
  is submitted that is greater than this size, there is no guarantee
  of atomicity.

  A value of FFFFh indicates all commands are atomic as this is the
  largest command size. It is recommended that implementations support
  a minimum of 128KB (appropriately scaled based on LBA size).


  Atomic Write Unit Power Fail (AWUPF): This field indicates the atomic
  write size for the controller during a power fail condition. This
  field is specified in logical blocks and is a 0’s based value. If a
  write is submitted of this size or less, the host is guaranteed that
  the write is atomic to the NVM with respect to other read or write
  operations. If a write is submitted that is greater than this size,
  there is no guarantee of atomicity.


Basically just exposing what is assumed to be true for SCSI and generally
assumed to be lies for ATA drives :-)
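
To make the "0's based" encoding concrete, here is a minimal sketch of how
a host might turn an AWUN or AWUPF value into a byte count and check a
write against it.  The helper names are hypothetical and are not taken
from the NVMe driver or the spec; the only assumption is the encoding
described in the quoted text above (a field value of 0 means one logical
block).

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the NVMe driver): convert a 0's based
 * AWUN/AWUPF field value into a size in bytes for a given LBA size.
 */
static uint64_t atomic_write_bytes(uint16_t zero_based_blocks, uint32_t lba_size)
{
	/* 0's based: a field value of 0 means 1 logical block. */
	return ((uint64_t)zero_based_blocks + 1) * lba_size;
}

/*
 * Hypothetical helper: return 1 if a write of nr_blocks logical blocks
 * is no larger than the atomic write unit, 0 otherwise.
 */
static int write_is_atomic(uint32_t nr_blocks, uint16_t zero_based_blocks)
{
	return nr_blocks >= 1 && nr_blocks <= (uint32_t)zero_based_blocks + 1;
}
```

Under this reading, the recommended 128KB minimum corresponds to a field
value of 255 with 512-byte LBAs or 31 with 4096-byte LBAs, and a 16K
database page on 512-byte LBAs would need AWUPF to be at least 31.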

> > I took a look at the SCSI Block Command spec.  If I understand it
> > correctly, SCSI would implement this with the WRITE USING TOKEN command.
> > I don't see why it couldn't implement this API, though it seems like
> > SCSI would prefer a separate setup step before the write comes in.  I'm
> > not sure that's a reasonable request to make of the application (nor
> > am I sure I understand SBC correctly).
> 
> What kind of setup would we have to do?  We have all the IO in hand, so
> it can be organized in just about any way needed.

Someone who understands SCSI better than I do assures me this is NOT the
proposal that allows SCSI devices to do scattered writes.  Apparently that
proposal is still in progress.  This appears to be true; from the t10
NEW list:

12-087r6  SBC-4 Gathered reads, optionally atomic          Rob Elliott, Ashish Batwara, Walt Hubis  Missing
12-086r6  SBC-4 SPC-5 Scattered writes, optionally atomic  Rob Elliott, Ashish Batwara, Walt Hubis  Missing

> Grin, almost Btrfs already does this...COW means that btrfs needs to
> update metadata to point to new locations.  To avoid an ugly
> flush-all-the-io-every-commit mess, we track pending writes and update
> the metadata when the write is fully on media.
> 
> We're missing a firm line that makes sure all the metadata updates for a
> single write happen in the same transaction, but that part isn't hard.
> 
> We're missing good performance in database workloads, which is a
> slightly bigger trick.

Yeah ... if only you could find a database company to ... oh, wait :-)
