From: Dan Williams <dan.j.williams@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <mawilcox@microsoft.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Dave Chinner <david@fromorbit.com>,
	Christoph Hellwig <hch@lst.de>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Darrick J. Wong" <darrick.wong@oracle.com>
Subject: Re: [PATCH v4 17/18] mm, fs, dax: dax_flush_dma, handle dma vs block-map-change collisions
Date: Fri, 9 Mar 2018 08:15:06 -0800
Message-ID: <CAPcyv4iO1TVLXPPC=RXHWLfaSXO0wBcSOUMJevZgbeVmF+vrTA@mail.gmail.com>
In-Reply-To: <20180309125625.n6hpu6ooi6rhzqzs@quack2.suse.cz>

On Fri, Mar 9, 2018 at 4:56 AM, Jan Kara <jack@suse.cz> wrote:
> On Thu 08-03-18 09:02:30, Dan Williams wrote:
>> On Mon, Jan 8, 2018 at 5:50 AM, Jan Kara <jack@suse.cz> wrote:
>> > On Sun 07-01-18 13:58:42, Dan Williams wrote:
>> >> On Thu, Jan 4, 2018 at 3:12 AM, Jan Kara <jack@suse.cz> wrote:
>> >> > On Sat 23-12-17 16:57:31, Dan Williams wrote:
>> >> >
>> >> >> +     /*
>> >> >> +      * Flush dax_dma_lock() sections to ensure all possible page
>> >> >> +      * references have been taken, or will block on the fs
>> >> >> +      * 'mmap_lock'.
>> >> >> +      */
>> >> >> +     synchronize_rcu();
>> >> >
>> >> > Frankly, I don't like synchronize_rcu() in a relatively hot path like this.
>> >> > Cannot we just abuse get_dev_pagemap() to fail if truncation is in progress
>> >> > for the pfn? We could indicate that by some bit in struct page or something
>> >> > like that.
>> >>
>> >> We would need a lockless way to take a reference conditionally if the
>> >> page is not subject to truncation.
>> >>
>> >> I recall the raid5 code did something similar where it split a
>> >> reference count into 2 fields. I.e. take page->_refcount and use the
>> >> upper bits as a truncation count. Something like:
>> >>
>> >> do {
>> >>     old = atomic_read(&page->_refcount);
>> >>     if (old & trunc_mask) /* upper bits of _refcount */
>> >>         return false;
>> >>     new = old + 1;
>> >> } while (atomic_cmpxchg(&page->_refcount, old, new) != old);
>> >> return true; /* we incremented the _refcount while the truncation
>> >> count was zero */
>> >>
>> >> ...the only concern is teaching the put_page() path to consider that
>> >> 'trunc_mask' when determining that the page is idle.
>> >>
>> >> Other ideas?
>> >
>> > What I rather thought about was an update to GUP paths (like
>> > follow_page_pte()):
>> >
>> >         if (flags & FOLL_GET) {
>> >                 get_page(page);
>> >                 if (pte_devmap(pte)) {
>> >                         /*
>> >                          * Pairs with the barrier in the truncate path.
>> >                          * Could be possibly _after_atomic version of the
>> >                          * barrier.
>> >                          */
>> >                         smp_mb();
>> >                         if (PageTruncateInProgress(page)) {
>> >                                 put_page(page);
>> >                                 ..bail...
>> >                         }
>> >                 }
>> >         }
>> >
>> > and in the truncate path:
>> >
>> >         down_write(inode->i_mmap_sem);
>> >         walk all pages in the mapping and mark them PageTruncateInProgress().
>> >         unmap_mapping_range(...);
>> >         /*
>> >          * Pairs with the barrier in GUP path. In fact not necessary since
>> >          * unmap_mapping_range() provides us with the barrier already.
>> >          */
>> >         smp_mb();
>> >         /*
>> >          * By now we are either guaranteed to see grabbed page reference or
>> >          * GUP is guaranteed to see PageTruncateInProgress().
>> >          */
>> >         while ((page = dax_find_referenced_page(mapping))) {
>> >                 ...
>> >         }
>> >
>> > The barriers need some verification, I've opted for the conservative option
>> > but I guess you get the idea.
>>
>> [ Reviving this thread for the next rev of this patch set for 4.17
>> consideration ]
>>
>> I don't think this barrier scheme can work in the presence of
>> get_user_pages_fast(). The get_user_pages_fast() path can race
>> unmap_mapping_range() to take out an elevated reference count on a
>> page.
>
> Why the scheme cannot work? Sure you'd need to patch also gup_pte_range()
> and a similar thing for PMDs to recheck PageTruncateInProgress() after
> grabbing the page reference. But in principle I don't see anything
> fundamentally different between gup_fast() and plain gup().

Ah, yes, I didn't grok the abort on PageTruncateInProgress() until I
read this again (and again). I'll try that.
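
For reference, the get-then-recheck protocol quoted above can be
modelled in userspace with C11 atomics. This is only a sketch under
stated assumptions, not kernel code: struct model_page and the model_*()
helpers are hypothetical names, the sequentially consistent fences stand
in for smp_mb(), and an idle page is modelled as refcount == 0.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not the kernel's layout. */
struct model_page {
	atomic_int refcount;
	atomic_bool truncate_in_progress; /* models PageTruncateInProgress() */
};

/* Drop a reference previously taken by model_gup_get(). */
static void model_put(struct model_page *page)
{
	int old = atomic_fetch_sub(&page->refcount, 1);

	assert(old > 0); /* catch unbalanced puts in the model */
}

/* GUP side: take the reference first, then recheck the flag. */
static bool model_gup_get(struct model_page *page)
{
	atomic_fetch_add(&page->refcount, 1);      /* get_page() */
	atomic_thread_fence(memory_order_seq_cst); /* smp_mb() */
	if (atomic_load(&page->truncate_in_progress)) {
		model_put(page);                   /* put_page() and bail */
		return false;
	}
	return true;
}

/*
 * Truncate side: set the flag first, then look for references.
 * Returns true when the page is idle. A false result may be caused by
 * a transient reference that the GUP side is about to drop.
 */
static bool model_truncate_begin(struct model_page *page)
{
	atomic_store(&page->truncate_in_progress, true);
	atomic_thread_fence(memory_order_seq_cst); /* smp_mb() */
	return atomic_load(&page->refcount) == 0;
}
```

With both sides ordered this way, either the truncate side observes the
elevated count or the GUP side observes the flag and backs out. The same
recheck would apply to the gup_fast() paths.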

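The earlier split-count idea from the quoted text (upper bits of the
reference count used as a truncation count) can be sketched the same
way. Again this is a userspace model with C11 atomics and not what the
kernel does today: TRUNC_SHIFT, TRUNC_MASK and the model_*() names are
made up for illustration, and overflow of the low bits into the
truncation field is ignored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TRUNC_SHIFT 24                    /* assumed split point */
#define TRUNC_MASK  (~0u << TRUNC_SHIFT)  /* upper bits: truncation count */

/* Take a reference only while the truncation count is zero. */
static bool model_get_unless_truncating(atomic_uint *refcount)
{
	unsigned int old = atomic_load(refcount);

	do {
		if (old & TRUNC_MASK)
			return false;
		/* on failure the cmpxchg reloads 'old' and we recheck */
	} while (!atomic_compare_exchange_weak(refcount, &old, old + 1));
	return true;
}

/* Drop a reference taken by model_get_unless_truncating(). */
static void model_put_ref(atomic_uint *refcount)
{
	unsigned int old = atomic_fetch_sub(refcount, 1);

	assert((old & ~TRUNC_MASK) > 0); /* catch unbalanced puts */
}

/* Bump / drop the truncation count held in the upper bits. */
static void model_truncate_start(atomic_uint *refcount)
{
	atomic_fetch_add(refcount, 1u << TRUNC_SHIFT);
}

static void model_truncate_end(atomic_uint *refcount)
{
	atomic_fetch_sub(refcount, 1u << TRUNC_SHIFT);
}

/* The put path has to mask off the truncation count to detect idle. */
static bool model_page_idle(atomic_uint *refcount)
{
	return (atomic_load(refcount) & ~TRUNC_MASK) == 0;
}
```

The model_page_idle() helper shows the concern raised in the quoted
text: anything that decides the page is idle has to ignore the
truncation bits.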