Re: API for multi-segment atomic IO

From: Bart Van Assche <bart.vanassche@sandisk.com>
To: doug@easyco.com, device-mapper development <dm-devel@redhat.com>,
	Christoph Hellwig <hch@infradead.org>
Subject: Re: API for multi-segment atomic IO
Date: Thu, 9 Jul 2015 09:34:37 -0700
Message-ID: <559EA29D.9000700@sandisk.com>
In-Reply-To: <CAFx4rwR980To5cK=UgW1-qe8psgeQFYYm0-g_sPz1cEs48JiYg@mail.gmail.com>

On 07/09/2015 08:41 AM, Doug Dumitru wrote:
> Mr. Hellwig,
>
> On Wed, Jul 8, 2015 at 12:38 PM, Christoph Hellwig <hch@infradead.org> wrote:
>
>     On Wed, Jul 08, 2015 at 09:21:21AM -0700, Doug Dumitru wrote:
>     > I have a "smart" block device that can implement multi-segment atomic
>     > writes.
>
>     How about submitting your driver upstream first, and then we can work
>     with you on an API that fits the devices' and the consumers' needs.
>
> I usually like to start with an interface and then implement the
> drivers from there.
>
> In this case, this is a block-level interface that supports new
> functionality (atomic writes).  In the past, you would approach this
> type of problem by having the atomic and user layers as a monolithic
> solution.  Consider database updates and the complexity that they go
> through to ensure database integrity.  If a block device could provide a
> database with an atomic update interface, the database would get a lot
> simpler.  The same discussion holds true for file systems.  Depending on
> the atomic update implementation, you might end up in the same place in
> terms of total code, but you might also end up somewhere completely
> different.
>
> The impetus for this is some research on file system "write
> amplification".  In general, file system design seems to be heading in
> the direction of higher and higher write amplification.  For example,
> the tree structure of zfs is shockingly inefficient in terms of write
> overhead.  This is happening at the same time as Flash is becoming
> popular but is also moving to smaller and smaller geometries.  So write
> efficiency is becoming more and more important.
>
> By decoupling the atomic update semantics from file system and other
> block device "users", this gives devices the opportunity to implement
> atomic updates internal to or in cooperation with Flash management
> algorithms.  In theory, you can implement atomic updates without any
> extra writes.  In practice, some devices will be better than others.
>
> I was hoping to stumble across someone interested in this as a concept,
> or someone who has researched this area, as I don't have any near
> production existing code.  I could pretty easily hack in a couple of
> extra fields in struct bio that would accomplish what I see, but others
> might have differing input.

Hello Doug,

When designing such an API, please try to stay close to the semantics of
the already standardized SCSI commands. As you probably know, the Linux
SCSI core has been implemented as a block driver. Any new command that
is added to the Linux block layer has to be translated by the Linux SCSI
core into a SCSI command. An example of a patch series that adds support
for a new block layer primitive is the one that adds compare-and-write
support (http://thread.gmane.org/gmane.linux.scsi/95869). Although that
patch series is not yet upstream, I think it is a good example of how to
add new functionality to the block layer and SCSI core.
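
To make the idea of a multi-segment atomic write concrete, here is a
minimal userspace sketch in C. Everything in it is hypothetical and
invented for illustration: struct atomic_seg, struct toy_dev,
toy_atomic_write() and the sizes are not part of struct bio, the block
layer or any SCSI command. It only models the "validate every segment,
then commit all of them or none" shape, against an in-memory buffer, and
says nothing about power-fail atomicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512u   /* assumed logical block size */
#define DEV_SECTORS 64u    /* size of the toy in-memory device */
#define MAX_SEGS    8u     /* assumed per-request segment limit */

/* Hypothetical descriptor for one contiguous piece of an atomic group. */
struct atomic_seg {
    uint64_t lba;          /* first sector of this segment */
    uint32_t nr_sectors;   /* length of this segment in sectors */
    const uint8_t *data;   /* nr_sectors * SECTOR_SIZE bytes of payload */
};

/* Toy stand-in for a device: just a flat buffer. */
struct toy_dev {
    uint8_t media[DEV_SECTORS * SECTOR_SIZE];
};

/*
 * Apply all segments or none of them.
 * Returns 0 on success, -1 if the group is rejected. On rejection the
 * media is left untouched, because nothing is written until every
 * segment has passed validation.
 */
int toy_atomic_write(struct toy_dev *dev, const struct atomic_seg *segs,
                     size_t nr_segs)
{
    size_t i;

    if (dev == NULL || segs == NULL || nr_segs == 0 || nr_segs > MAX_SEGS)
        return -1;

    /* Phase 1: validate the whole group before touching the media. */
    for (i = 0; i < nr_segs; i++) {
        if (segs[i].data == NULL || segs[i].nr_sectors == 0)
            return -1;
        if (segs[i].lba >= DEV_SECTORS ||
            segs[i].nr_sectors > DEV_SECTORS - segs[i].lba)
            return -1;
    }

    /* Phase 2: commit every segment. */
    for (i = 0; i < nr_segs; i++) {
        size_t off = (size_t)segs[i].lba * SECTOR_SIZE;
        size_t len = (size_t)segs[i].nr_sectors * SECTOR_SIZE;

        assert(off + len <= sizeof(dev->media));
        memcpy(dev->media + off, segs[i].data, len);
    }
    return 0;
}
```

A caller would fill an array of struct atomic_seg and pass it in one
call; if any segment is out of range, the whole group is refused. A real
interface would of course have to map onto whatever the standardized
commands allow.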

Bart.
