linux-fsdevel.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Jeff Moyer <jmoyer@redhat.com>
Cc: Chris Mason <chris.mason@fusionio.com>,
	Matthew Wilcox <willy@linux.intel.com>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	Jens Axboe <axboe@kernel.dk>
Subject: Re: [PATCH 1/2] block: Add support for atomic writes
Date: Thu, 14 Nov 2013 10:59:14 +1100
Message-ID: <20131113235914.GD11434@dastard>
In-Reply-To: <x4938n8tcj9.fsf@segfault.boston.devel.redhat.com>

On Thu, Nov 07, 2013 at 11:14:02AM -0500, Jeff Moyer wrote:
> Chris Mason <chris.mason@fusionio.com> writes:
> 
> >> Well, we have control over dm and md, so I'm not worried about that.
> >> For the storage vendors, we'll have to see about influencing the
> >> standards bodies.
> >> 
> >> The way I see it, there are 3 pieces of information that are required:
> >> 1) minimum size that is atomic (likely the physical block size, but
> >>    maybe the logical block size?)
> >> 2) maximum size that is atomic (multiple of minimum size)
> >> 3) whether or not discontiguous ranges are supported
> >> 
> >> Did I miss anything?
> >
> > It'll vary from vendor to vendor.  A discontig range of two 512KB areas
> > is different from 256 discontig 4KB areas.
> 
> Sure.
> 
> > And it's completely dependent on filesystem fragmentation.  So, a given
> > IO might pass for one file and fail for the next.
> 
> Worse, it could pass for one region of a file and fail for a different
> region of the same file.
> 
> I guess you could export the most conservative estimate, based on
> completely non-contiguous smallest sized segments.  Things larger may
> work, but they may not.  Perhaps this would be too limiting, I don't
> know.
> 
> > In a DM/MD configuration, an atomic IO inside a single stripe on raid0
> > could succeed while it will fail if it spans two stripes to two
> > different devices.
> 
> I'd say that if you are spanning multiple devices, you don't support
> O_ATOMIC.  You could write a specific dm target that allows it, but I
> don't think it's a priority to support it in the way your example does.

I would have thought this would be pretty simple to do - just
journal the atomic write so it can be recovered in full if there is
a power failure.
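
As a rough illustration of the journalling idea above, here is a minimal
sketch in Python. It is a toy model, not a dm target: the class name
JournaledStripe, the dict-backed "devices", and the crash_after test hook
are all hypothetical names introduced here, and the journal is simply
assumed to survive a power failure.

```python
class JournaledStripe:
    """Toy model: several 'devices' (dicts mapping block number -> bytes)
    plus a journal that is assumed to survive a crash."""

    def __init__(self, ndevs):
        self.devs = [dict() for _ in range(ndevs)]
        self.journal = []  # assumed persistent across power failure

    def atomic_write(self, updates, crash_after=None):
        """updates: list of (dev_index, block, data) tuples.
        crash_after: hypothetical test hook; simulate a power failure
        after this many in-place writes have been done."""
        # 1. Log the full payload, then a commit record.
        self.journal.append(("data", list(updates)))
        self.journal.append(("commit",))
        # 2. Write in place, possibly across several devices.
        for n, (dev, block, data) in enumerate(updates):
            if crash_after is not None and n >= crash_after:
                return False  # simulated power failure mid-write
            self.devs[dev][block] = data
        # 3. Everything is in place, so the journal entry can be retired.
        self.journal.clear()
        return True

    def recover(self):
        """Replay committed journal entries; drop uncommitted ones."""
        pending = None
        for entry in self.journal:
            if entry[0] == "data":
                pending = entry[1]
            elif entry[0] == "commit" and pending is not None:
                for dev, block, data in pending:
                    self.devs[dev][block] = data
                pending = None
        self.journal.clear()
```

After a simulated crash part-way through a two-device write, calling
recover() replays the committed payload so that both devices end up with
the new data. A payload that was logged without a commit record is
discarded, so the write is applied in full or not at all.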

Indeed, what I'd really like to be able to do from a filesystem
perspective is to issue a group of related metadata IO as an atomic
write rather than marshaling it through a journal and then issuing
it as unrelated IO. If we have a special dm-target underneath that
can either issue it as an atomic write (if the hardware supports it)
or emulate it via a journal to maintain multi-device atomicity
requirements, then we end up with a general atomic write solution
that filesystems can then depend on.
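
The choice such a target would have to make can be sketched as follows.
This is only an illustration in Python: AtomicLimits and choose_path are
hypothetical names, the three fields mirror the three pieces of
information listed in the quoted text (minimum atomic size, maximum
atomic size, discontiguous support), and treating the minimum size as an
alignment granularity is an assumption made here, not something stated in
the thread.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AtomicLimits:
    """Hypothetical device limits, modelled on the three items quoted above."""
    min_bytes: int    # smallest atomic unit
    max_bytes: int    # largest atomic write
    discontig: bool   # whether discontiguous ranges are supported


def is_contiguous(segments: List[Tuple[int, int]]) -> bool:
    """segments: (offset, length) pairs in issue order."""
    return all(a_off + a_len == b_off
               for (a_off, a_len), (b_off, _) in zip(segments, segments[1:]))


def choose_path(limits: Optional[AtomicLimits],
                segments: List[Tuple[int, int]]) -> str:
    """Return "native" if the hardware can take the write atomically,
    otherwise "journal" to fall back to emulation."""
    if limits is None:
        return "journal"  # no hardware support at all
    total = sum(length for _, length in segments)
    # Assumption: offsets and lengths must be multiples of min_bytes.
    aligned = all(off % limits.min_bytes == 0 and length % limits.min_bytes == 0
                  for off, length in segments)
    fits = 0 < total <= limits.max_bytes
    shape_ok = limits.discontig or is_contiguous(segments)
    return "native" if aligned and fits and shape_ok else "journal"
```

With limits of 4096 and 65536 bytes and no discontiguous support, two
adjacent 4KB segments would go down the native path, while the same two
segments with a gap between them would fall back to the journal. The
caller sees an atomic write either way, which is the point of hiding the
decision below the filesystem.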

Once we have guaranteed support for atomic writes, then we can
completely remove journalling from filesystem transaction engines
as the atomicity requirements can be met with atomic writes. And then
we can optimise things like fsync() for atomic writes.

IOWs, generic support for atomic writes will make a major difference
to filesystem algorithms. Hence, from my perspective, at this early
point in the development lifecycle having guaranteed atomic write
support via emulation is far more important than actually having
hardware that supports it... :)

> Given that there are applications using your implementation, what did
> they determine was a sane way to do things?  Only access the block
> device?  Preallocate files?  Fallback to non-atomic writes + fsync?
> Something else?

Avoiding these problems is, IMO, another good reason for having
generic, transparent support for atomic writes built into the IO
stack....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 17+ messages
2013-11-01 21:27 [PATCH 0/2] Support for atomic IOs Chris Mason
2013-11-01 21:28 ` [PATCH 1/2] block: Add support for atomic writes Chris Mason
2013-11-01 21:47   ` Shaohua Li
2013-11-05 17:43   ` Jeff Moyer
2013-11-07 13:52     ` Chris Mason
2013-11-07 15:43       ` Jeff Moyer
2013-11-07 15:55         ` Chris Mason
2013-11-07 16:14           ` Jeff Moyer
2013-11-07 16:52             ` Chris Mason
2013-11-13 23:59             ` Dave Chinner [this message]
2013-11-12 15:11       ` Matthew Wilcox
2013-11-13 20:44         ` Chris Mason
2013-11-13 20:53           ` Howard Chu
2013-11-13 21:35           ` Matthew Wilcox
2013-11-01 21:29 ` [PATCH 2/3] fs: Add O_ATOMIC support to direct IO Chris Mason
  -- strict thread matches above, loose matches on Subject: below --
2013-11-20  8:23 [PATCH 1/2] block: Add support for atomic writes Kishore Sampathkumar
2013-11-26  6:24 Kishore Sampathkumar
