From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: "Wilcox, Matthew R" <matthew.r.wilcox@intel.com>,
Boaz Harrosh <boaz@plexistor.com>, Jan Kara <jack@suse.cz>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Davidlohr Bueso <dbueso@suse.de>
Subject: Re: [PATCH, RFC 2/2] dax: use range_lock instead of i_mmap_lock
Date: Thu, 13 Aug 2015 13:30:12 +0200
Message-ID: <20150813113012.GK26599@quack.suse.cz>
In-Reply-To: <20150811214822.GA20596@dastard>
On Wed 12-08-15 07:48:22, Dave Chinner wrote:
> On Tue, Aug 11, 2015 at 04:51:22PM +0000, Wilcox, Matthew R wrote:
> > The race that you're not seeing is page fault vs page fault. Two
> > threads each attempt to store a byte to different locations on the
> > same page. With a read-mutex to exclude truncates, each thread
> > calls ->get_block. One of the threads gets back a buffer marked
> > as BH_New and calls memset() to clear the page. The other thread
> > gets back a buffer which isn't marked as BH_New and simply inserts
> > the mapping, returning to userspace, which stores the byte ...
> > just in time for the other thread's memset() to write a zero over
> > the top of it.
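A minimal single-threaded C model of the interleaving described above. It replays the steps in a fixed order instead of racing real threads, so the lost store is reproducible. The names `struct model_block`, `model_get_block()` and `replay_racy_interleaving()` are hypothetical illustrations, not DAX or filesystem code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical model of one file block mapped via DAX. */
struct model_block {
	bool allocated;                       /* set once "get_block" allocated it */
	unsigned char data[MODEL_PAGE_SIZE];  /* block contents, possibly stale */
};

/*
 * Hypothetical stand-in for ->get_block with create=1. The filesystem
 * serialises allocation, so exactly one caller sees a new block (the
 * BH_New case); every later caller sees an already-allocated one.
 */
static bool model_get_block(struct model_block *b)
{
	if (b->allocated)
		return false;
	b->allocated = true;
	return true;
}

/*
 * Replays the bad ordering: A allocates, B maps the existing block and
 * stores, then A's late zeroing wipes out B's byte.
 * Returns the value left at B's offset afterwards.
 */
static unsigned char replay_racy_interleaving(struct model_block *b,
					      size_t off_a, size_t off_b)
{
	bool a_new = model_get_block(b);    /* thread A: block is new */
	bool b_new = model_get_block(b);    /* thread B: already allocated */

	assert(a_new && !b_new);
	b->data[off_b] = 0xBB;              /* B maps it; userspace stores */
	memset(b->data, 0, MODEL_PAGE_SIZE); /* A zeroes the "new" block late */
	b->data[off_a] = 0xAA;              /* A's own store */
	return b->data[off_b];              /* B's store is gone */
}
```

Running it on a block filled with stale bytes returns 0 for B's offset: the zeroing that belonged before any mapping was established happened after B's store.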
>
> So, this is not a truncate race that the XFS MMAPLOCK solves.
>
> However, that doesn't mean that the DAX code needs to add locking to
> solve it. The race here is caused by block initialisation being
> unserialised after a ->get_block call allocates the block (which the
> filesystem serialises via internal locking). Hence two simultaneous
> ->get_block calls for the same block are guaranteed to make the DAX
> block initialisation race with the second ->get_block call, which says
> the block is already allocated.
>
> IOWs, the way to handle this is to have the ->get_block call handle
> the block zeroing for new blocks instead of doing it after the fact
> in the generic DAX code where there is no fine-grained serialisation
> object available. By calling dax_clear_blocks() in the ->get_block
> callback, the filesystem can ensure that the second racing call will
> only make progress once the block has been fully initialised by the
> first call.
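A userspace sketch of the approach proposed above, assuming a pthread mutex stands in for the filesystem's internal allocation lock and `memset()` stands in for `dax_clear_blocks()`. `struct fs_block`, `get_block_zeroing()`, `write_fault()` and `run_two_faults()` are hypothetical names. The key point is that zeroing happens inside the locked allocation path, before any caller can observe the block as allocated.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_BLOCK_SIZE 4096

struct fs_block {
	pthread_mutex_t alloc_lock;            /* models fs-internal serialisation */
	bool allocated;
	unsigned char data[MODEL_BLOCK_SIZE];
};

/* Hypothetical ->get_block that initialises new blocks under the lock. */
static void get_block_zeroing(struct fs_block *b)
{
	pthread_mutex_lock(&b->alloc_lock);
	if (!b->allocated) {
		memset(b->data, 0, MODEL_BLOCK_SIZE); /* zero before publishing */
		b->allocated = true;
	}
	pthread_mutex_unlock(&b->alloc_lock);
}

struct fault_arg {
	struct fs_block *blk;
	size_t offset;
	unsigned char value;
};

/* Simulated write fault: map the block, then "userspace" stores a byte. */
static void *write_fault(void *p)
{
	struct fault_arg *a = p;

	get_block_zeroing(a->blk);
	a->blk->data[a->offset] = a->value;
	return NULL;
}

/* Two concurrent faults on the same block at different offsets. */
static int run_two_faults(struct fs_block *b, size_t off1, size_t off2)
{
	pthread_t t1, t2;
	struct fault_arg a1 = { b, off1, 0xAA };
	struct fault_arg a2 = { b, off2, 0xBB };

	if (pthread_create(&t1, NULL, write_fault, &a1) != 0)
		return -1;
	if (pthread_create(&t2, NULL, write_fault, &a2) != 0) {
		pthread_join(t1, NULL);
		return -1;
	}
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return 0;
}
```

Whichever thread takes the lock first does the zeroing and publishes `allocated` only afterwards. The other thread therefore either finds an already-zeroed block or zeroes it before either store can happen, so both bytes survive regardless of scheduling.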
>
> IMO the fix is - again - to move the functionality into the
> filesystem where we already have the necessary exclusion in place to
> avoid this race condition entirely.
I'm somewhat sad to add even more functionality to the already overloaded
block mapping interface - we can already allocate delalloc blocks, unwritten
blocks, uninitialized blocks, and now also pre-zeroed blocks. But I agree
that the fs already serializes block allocation for a given inode, so adding
the pre-zeroing there is pretty easy. Also, getting rid of unwritten extent
handling in the DAX code is a nice bonus, so all in all I'm for this approach.
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR