linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
To: Eric Sandeen <sandeen-+82itfer+wXR7s880joybQ@public.gmane.org>
Cc: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>,
	linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
	darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
	Theodore Ts'o <tytso-3s7WtUTddSA@public.gmane.org>,
	Ross Zwisler <zwisler-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	linux-fsdevel
	<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	lczerner-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	linux-ext4 <linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH v4 0/2] ext4: fix DAX dma vs truncate/hole-punch
Date: Tue, 11 Sep 2018 17:14:21 +0200	[thread overview]
Message-ID: <20180911151421.GD6104@quack2.suse.cz> (raw)
In-Reply-To: <8c70d61a-fc5c-b928-334a-fbb2567b8dea-+82itfer+wXR7s880joybQ@public.gmane.org>

On Mon 10-09-18 17:18:49, Eric Sandeen wrote:
> On 8/7/18 3:45 AM, Jan Kara wrote:
> > On Fri 27-07-18 10:28:51, Ross Zwisler wrote:
> >> + fsdevel and the xfs list.
> >>
> >> On Wed, Jul 25, 2018 at 4:28 PM Ross Zwisler
> >> <ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org> wrote:
> >>> On Wed, Jul 11, 2018 at 10:17:41AM +0200, Jan Kara wrote:
> >>>> On Tue 10-07-18 13:10:29, Ross Zwisler wrote:
> >>>>> Changes since v3:
> >>>>>  * Added an ext4_break_layouts() call to ext4_insert_range() to ensure
> >>>>>    that the {ext4,xfs}_break_layouts() calls have the same meaning.
> >>>>>    (Dave, Darrick and Jan)
> >>>>
> >>>> How about the occasional WARN_ON_ONCE you mention below. Were you able to
> >>>> hunt them down?
> >>>
> >>> The root cause of this issue is that while the ei->i_mmap_sem provides
> >>> synchronization between ext4_break_layouts() and page faults, it doesn't
> >>> provide synchronize us with the direct I/O path.  This exact same issue exists
> >>> in XFS AFAICT, with the synchronization tool there being the XFS_MMAPLOCK.
> >>>
> >>> This allows the direct I/O path to do I/O and raise & lower page->_refcount
> >>> while we're executing a truncate/hole punch.  This leads to us trying to free
> >>> a page with an elevated refcount.
> >>>
> >>> Here's one instance of the race:
> >>>
> >>> CPU 0                                   CPU 1
> >>> -----                                   -----
> >>> ext4_punch_hole()
> >>>   ext4_break_layouts() # all pages have refcount=1
> >>>
> >>>                                         ext4_direct_IO()
> >>>                                           ... lots of layers ...
> >>>                                           follow_page_pte()
> >>>                                             get_page() # elevates refcount
> >>>
> >>>   truncate_pagecache_range()
> >>>    ... a few layers ...
> >>>    dax_disassociate_entry() # sees elevated refcount, WARN_ON_ONCE()
> >>>
> > 
> > So this is a very different race from the one below. And it should be
> > impossible to happen. This race is exactly the reason why
> > dax_layout_busy_page() has unmap_mapping_range() call to force GUP to fault
> > which blocks on ei->i_mmap_sem / XFS_MMAPLOCK and thus avoids the race.
> > 
> >>> A similar race occurs when the refcount is being dropped while we're running
> >>> ext4_break_layouts(), and this is the one that my test was actually hitting:
> >>>
> >>> CPU 0                                   CPU 1
> >>> -----                                   -----
> >>>                                         ext4_direct_IO()
> >>>                                           ... lots of layers ...
> >>>                                           follow_page_pte()
> >>>                                             get_page()
> >>>                                             # elevates refcount of page X
> >>> ext4_punch_hole()
> >>>   ext4_break_layouts() # two pages, X & Y, have refcount == 2
> >>>     __wait_var_event() # called for page X
> >>>
> >>>                                           __put_devmap_managed_page()
> >>>                                           # drops refcount of X to 1
> >>>
> >>>    # __wait_var_events() checks X's refcount in "if (condition)", and breaks.
> >>>    # We never actually called ext4_wait_dax_page(), so 'retry' in
> >>>    # ext4_break_layouts() is still false.  Exit do/while loop in
> >>>    # ext4_break_layouts, never attempting to wait on page Y which still has an
> >>>    # elevated refcount of 2.
> >>>
> >>>   truncate_pagecache_range()
> >>>    ... a few layers ...
> >>>    dax_disassociate_entry() # sees elevated refcount for Y, WARN_ON_ONCE()
> >>>
> >>> This second race can be fixed with the patch at the end of this function,
> >>> which I think should go in, unless there is a benfit to the current retry
> >>> scheme which relies on the 'retry' variable in {ext4,xfs}_break_layouts()?
> >>> With this patch applied I've been able to run my unit test through
> >>> thousands of iterations, where it used to failed consistently within 10 or
> >>> so.
> >>>
> >>> Even so, I wonder if the real solution is to add synchronization between
> >>> the direct I/O path and {ext4,xfs}_break_layouts()?  Other ideas on how
> >>> this should be handled?
> >>>
> >>> --- >8 ---
> >>>
> >>> From a4519b0f40362f0a63ae96acaf986092aff0f0d3 Mon Sep 17 00:00:00 2001
> >>> From: Ross Zwisler <zwisler-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> >>> Date: Wed, 25 Jul 2018 16:16:05 -0600
> >>> Subject: [PATCH] ext4: Close race between direct IO and ext4_break_layouts()
> >>>
> >>> If the refcount of a page is lowered between the time that it is returned
> >>> by dax_busy_page() and when the refcount is again checked in
> >>> ext4_break_layouts() => ___wait_var_event(), the waiting function
> >>> ext4_wait_dax_page() will never be called.  This means that
> >>> ext4_break_layouts() will still have 'retry' set to false, so we'll stop
> >>> looping and never check the refcount of other pages in this inode.
> >>>
> >>> Instead, always continue looping as long as dax_layout_busy_page() gives us
> >>> a page which it found with an elevated refcount.
> >>>
> >>> Note that this works around the race exposed by my unit test, but I think
> >>> that there is another race that needs to be addressed, probably with
> >>> additional synchronization added between direct I/O and
> >>> {ext4,xfs}_break_layouts().
> >>>
> >>> Signed-off-by: Ross Zwisler <ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> > 
> > OK, this is a good catch and the patch looks good. You can add:
> > 
> > Reviewed-by: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
> > 
> > Also please post this fix officially to Ted to include it in his tree (I
> > can see that he has all your other patches queued for the merge window).
> 
> Did these ever get on Ted's radar?  I don't see it upstream yet.

Hum, it seems Ted never picked this patch up. I guess I'll gather the two
fixes you pointed out and resend them to Ted.

								Honza
-- 
Jan Kara <jack-IBi9RG/b67k@public.gmane.org>
SUSE Labs, CR

  parent reply	other threads:[~2018-09-11 15:14 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-10 19:10 [PATCH v4 0/2] ext4: fix DAX dma vs truncate/hole-punch Ross Zwisler
     [not found] ` <20180710191031.17919-1-ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2018-07-10 19:10   ` [PATCH v4 1/2] dax: dax_layout_busy_page() warn on !exceptional Ross Zwisler
     [not found]     ` <20180710191031.17919-2-ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2018-08-10 19:52       ` Eric Sandeen
     [not found]         ` <6ca60cf6-f1dc-27d5-fb79-7547aea39dfa-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-08-10 20:33           ` Theodore Y. Ts'o
     [not found]             ` <20180810203349.GH627-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2018-08-11  2:10               ` Theodore Y. Ts'o
     [not found]                 ` <20180811021053.GB14368-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2018-08-13 10:12                   ` Jan Kara
     [not found]                     ` <20180813101252.GC8927-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
2018-08-13 12:46                       ` Theodore Y. Ts'o
2018-08-24 15:44                       ` Jan Kara
2018-08-27 16:09                       ` Jan Kara
2018-07-10 19:10   ` [PATCH v4 2/2] ext4: handle layout changes to pinned DAX mappings Ross Zwisler
2018-07-11  8:17   ` [PATCH v4 0/2] ext4: fix DAX dma vs truncate/hole-punch Jan Kara
     [not found]     ` <20180711081741.lmr44sp4cmt3f6um-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
2018-07-11 15:41       ` Ross Zwisler
2018-07-25 22:28       ` Ross Zwisler
     [not found]         ` <20180725222839.GA28304-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2018-07-27 16:28           ` Ross Zwisler
     [not found]             ` <CAOxpaSWCT+XRtGxGQT_PVMcn79csMxbqb+cf4yycbV3eqjjnkQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-08-06  3:55               ` Dave Chinner
2018-08-06 15:49                 ` Christoph Hellwig
     [not found]                   ` <20180806154943.GA17666-jcswGhMUV9g@public.gmane.org>
2018-08-06 22:25                     ` Dave Chinner
2018-08-07  8:45               ` Jan Kara
     [not found]                 ` <20180807084545.6gzxrtrvb34hyhdq-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
2018-09-10 22:18                   ` Eric Sandeen
     [not found]                     ` <8c70d61a-fc5c-b928-334a-fbb2567b8dea-+82itfer+wXR7s880joybQ@public.gmane.org>
2018-09-11 15:14                       ` Jan Kara [this message]
     [not found]                         ` <20180911151421.GD6104-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
2018-09-11 15:20                           ` Jan Kara
     [not found]                             ` <20180911152024.GE6104-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
2018-09-11 17:28                               ` Theodore Y. Ts'o
     [not found]                                 ` <20180911172806.GC2225-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2018-09-11 18:21                                   ` Eric Sandeen
2018-07-31 19:44   ` Ross Zwisler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180911151421.GD6104@quack2.suse.cz \
    --to=jack-alswssmvlrq@public.gmane.org \
    --cc=darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org \
    --cc=lczerner-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org \
    --cc=sandeen-+82itfer+wXR7s880joybQ@public.gmane.org \
    --cc=tytso-3s7WtUTddSA@public.gmane.org \
    --cc=zwisler-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).