All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@linux.intel.com>,
	"Wilcox, Matthew R" <matthew.r.wilcox@intel.com>,
	"ross.zwisler@linux.intel.com" <ross.zwisler@linux.intel.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"Dilger, Andreas" <andreas.dilger@intel.com>,
	"axboe@kernel.dk" <axboe@kernel.dk>,
	"boaz@plexistor.com" <boaz@plexistor.com>,
	"hch@lst.de" <hch@lst.de>,
	"kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>,
	"mathieu.desnoyers@efficios.com" <mathieu.desnoyers@efficios.com>,
	"rdunlap@infradead.org" <rdunlap@infradead.org>,
	"tytso@mit.edu" <tytso@mit.edu>,
	"mm-commits@vger.kernel.org" <mm-commits@vger.kernel.org>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	xfs@oss.sgi.com
Subject: Re: + ext4-add-dax-functionality.patch added to -mm tree
Date: Fri, 20 Feb 2015 08:12:10 +1100	[thread overview]
Message-ID: <20150219211210.GS12722@dastard> (raw)
In-Reply-To: <20150219154241.GC22712@quack.suse.cz>

On Thu, Feb 19, 2015 at 04:42:41PM +0100, Jan Kara wrote:
> On Thu 19-02-15 08:55:23, Dave Chinner wrote:
> > On Wed, Feb 18, 2015 at 11:40:09AM +0100, Jan Kara wrote:
> > > On Tue 17-02-15 08:37:45, Matthew Wilcox wrote:
> > > > On Tue, Feb 17, 2015 at 09:52:00AM +0100, Jan Kara wrote:
> > > > > > > This got added to fix a problem that Dave Chinner pointed out.  We need
> > > > > > > the allocated extent to either be zeroed (as ext2 does), or marked as
> > > > > > > unwritten (ext4, XFS) so that a racing read/page fault doesn't return
> > > > > > > uninitialized data.  If it's marked as unwritten, we need to convert it
> > > > > > > to a written extent after we've initialised the contents.  We use the
> > > > > > > b_end_io() callback to do this, and it's called from the DAX code, not in
> > > > > > > softirq context.
> > > > > >   OK, I see. But I didn't find where ->b_end_io gets called from dax code
> > > > > > (specifically I don't see it anywhere in dax_do_IO() or dax_io()). Can you
> > > > > > point me please?
> > > > 
> > > > For faults, we call it in dax_insert_mapping(), the very last thing
> > > > before returning in the fault path.  The normal I/O path gets to use
> > > > the dio_iodone_t for the same purpose.
> > >   I see. I didn't think of races with reads (hum, I actually wonder whether
> > > we don't have this data exposure problem for ext4 for mmapped write into
> > > a hole vs direct read as well). So I guess we do need those unwritten
> > > extent dances after all (or we would need to have a page covering hole when
> > > writing to it via mmap but I guess unwritten extent dances are somewhat
> > > more standard).
> > 
> > Right, that was the reason for doing it that way - it leveraged all
> > the existing methods we have for avoiding data exposure races in
> > XFS. but it's also not just for races - it's for ensuring that if we
> > crash between the allocation and the write to the persistent store
> > we don't expose the underlying contents when the system next comes
> > up.
>   Well, ext3/4 handles the crash situation differently - we make sure we
> flush data to allocated blocks before committing a transaction that
> allocates them. That works perfectly for crashes but doesn't avoid the
> race with DIO.

I was talking about direct IO, not buffered IO. DAX is modeled on
the direct IO stack, not buffered IO. I did go and look at the ext4
IO completion path, and I can see where ext4_end_io_dio() triggers a
commit outside of doing unwritten extent conversion. Can you clue me
in - IO completion in ext4 is a maze of twisty passages...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

WARNING: multiple messages have this Message-ID (diff)
From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: "axboe@kernel.dk" <axboe@kernel.dk>,
	"boaz@plexistor.com" <boaz@plexistor.com>,
	xfs@oss.sgi.com, "Dilger, Andreas" <andreas.dilger@intel.com>,
	"rdunlap@infradead.org" <rdunlap@infradead.org>,
	"tytso@mit.edu" <tytso@mit.edu>, "hch@lst.de" <hch@lst.de>,
	"mathieu.desnoyers@efficios.com" <mathieu.desnoyers@efficios.com>,
	"Wilcox, Matthew R" <matthew.r.wilcox@intel.com>,
	"mm-commits@vger.kernel.org" <mm-commits@vger.kernel.org>,
	Matthew Wilcox <willy@linux.intel.com>,
	"ross.zwisler@linux.intel.com" <ross.zwisler@linux.intel.com>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>
Subject: Re: + ext4-add-dax-functionality.patch added to -mm tree
Date: Fri, 20 Feb 2015 08:12:10 +1100	[thread overview]
Message-ID: <20150219211210.GS12722@dastard> (raw)
In-Reply-To: <20150219154241.GC22712@quack.suse.cz>

On Thu, Feb 19, 2015 at 04:42:41PM +0100, Jan Kara wrote:
> On Thu 19-02-15 08:55:23, Dave Chinner wrote:
> > On Wed, Feb 18, 2015 at 11:40:09AM +0100, Jan Kara wrote:
> > > On Tue 17-02-15 08:37:45, Matthew Wilcox wrote:
> > > > On Tue, Feb 17, 2015 at 09:52:00AM +0100, Jan Kara wrote:
> > > > > > > This got added to fix a problem that Dave Chinner pointed out.  We need
> > > > > > > the allocated extent to either be zeroed (as ext2 does), or marked as
> > > > > > > unwritten (ext4, XFS) so that a racing read/page fault doesn't return
> > > > > > > uninitialized data.  If it's marked as unwritten, we need to convert it
> > > > > > > to a written extent after we've initialised the contents.  We use the
> > > > > > > b_end_io() callback to do this, and it's called from the DAX code, not in
> > > > > > > softirq context.
> > > > > >   OK, I see. But I didn't find where ->b_end_io gets called from dax code
> > > > > > (specifically I don't see it anywhere in dax_do_IO() or dax_io()). Can you
> > > > > > point me please?
> > > > 
> > > > For faults, we call it in dax_insert_mapping(), the very last thing
> > > > before returning in the fault path.  The normal I/O path gets to use
> > > > the dio_iodone_t for the same purpose.
> > >   I see. I didn't think of races with reads (hum, I actually wonder whether
> > > we don't have this data exposure problem for ext4 for mmapped write into
> > > a hole vs direct read as well). So I guess we do need those unwritten
> > > extent dances after all (or we would need to have a page covering hole when
> > > writing to it via mmap but I guess unwritten extent dances are somewhat
> > > more standard).
> > 
> > Right, that was the reason for doing it that way - it leveraged all
> > the existing methods we have for avoiding data exposure races in
> > XFS. but it's also not just for races - it's for ensuring that if we
> > crash between the allocation and the write to the persistent store
> > we don't expose the underlying contents when the system next comes
> > up.
>   Well, ext3/4 handles the crash situation differently - we make sure we
> flush data to allocated blocks before committing a transaction that
> allocates them. That works perfectly for crashes but doesn't avoid the
> race with DIO.

I was talking about direct IO, not buffered IO. DAX is modeled on
the direct IO stack, not buffered IO. I did go and look at the ext4
IO completion path, and I can see where ext4_end_io_dio() triggers a
commit outside of doing unwritten extent conversion. Can you clue me
in - IO completion in ext4 is a maze of twisty passages...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2015-02-19 21:12 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-12 23:11 + ext4-add-dax-functionality.patch added to -mm tree akpm
2015-01-15 12:41 ` Jan Kara
2015-01-16 21:16   ` Wilcox, Matthew R
2015-01-19 14:18     ` Jan Kara
2015-01-19 14:18       ` Jan Kara
2015-02-17  8:52       ` Jan Kara
2015-02-17  8:52         ` Jan Kara
2015-02-17 13:37         ` Matthew Wilcox
2015-02-17 13:37           ` Matthew Wilcox
2015-02-18 10:40           ` Jan Kara
2015-02-18 10:40             ` Jan Kara
2015-02-18 21:55             ` Dave Chinner
2015-02-18 21:59               ` hch
2015-02-18 21:59                 ` hch
2015-02-19 15:42               ` Jan Kara
2015-02-19 21:12                 ` Dave Chinner [this message]
2015-02-19 21:12                   ` Dave Chinner
2015-02-19 23:08                   ` Dave Chinner
2015-02-19 23:08                     ` Dave Chinner
2015-02-20 12:05                   ` Jan Kara
2015-02-20 12:05                     ` Jan Kara
2015-02-20 22:15             ` Matthew Wilcox
2015-02-20 22:15               ` Matthew Wilcox
2015-02-23 12:52               ` Jan Kara
2015-02-23 12:52                 ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150219211210.GS12722@dastard \
    --to=david@fromorbit.com \
    --cc=akpm@linux-foundation.org \
    --cc=andreas.dilger@intel.com \
    --cc=axboe@kernel.dk \
    --cc=boaz@plexistor.com \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=matthew.r.wilcox@intel.com \
    --cc=mm-commits@vger.kernel.org \
    --cc=rdunlap@infradead.org \
    --cc=ross.zwisler@linux.intel.com \
    --cc=tytso@mit.edu \
    --cc=willy@linux.intel.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.