All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>, Jerome Glisse <jglisse@redhat.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Matthew Wilcox <willy@infradead.org>,
	Dan Williams <dan.j.williams@intel.com>,
	John Hubbard <john.hubbard@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>,
	tom@talpey.com, Al Viro <viro@zeniv.linux.org.uk>,
	benve@cisco.com, Christoph Hellwig <hch@infradead.org>,
	Christopher Lameter <cl@linux.com>,
	"Dalessandro, Dennis" <dennis.dalessandro@intel.com>,
	Doug Ledford <dledford@redhat.com>,
	Jason Gunthorpe <jgg@ziepe.ca>, Michal Hocko <mhocko@kernel.org>,
	mike.marciniszyn@intel.com, rcampbell@nvidia.com,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
Date: Wed, 19 Dec 2018 14:24:49 +0100	[thread overview]
Message-ID: <20181219132449.GD18345@quack2.suse.cz> (raw)
In-Reply-To: <20181218234254.GC31274@dastard>

On Wed 19-12-18 10:42:54, Dave Chinner wrote:
> On Tue, Dec 18, 2018 at 11:33:06AM +0100, Jan Kara wrote:
> > On Mon 17-12-18 08:58:19, Dave Chinner wrote:
> > > On Fri, Dec 14, 2018 at 04:43:21PM +0100, Jan Kara wrote:
> > > > Yes, for filesystem it is too late. But the plan we figured back in October
> > > > was to do the bouncing in the block layer. I.e., mark the bio (or just the
> > > > particular page) as needing bouncing and then use the existing page
> > > > bouncing mechanism in the block layer to do the bouncing for us. Ext3 (when
> > > > it was still a separate fs driver) has been using a mechanism like this to
> > > > make DIF/DIX work with its metadata.
> > > 
> > > Sure, that's a possibility, but that doesn't close off any race
> > > conditions because there can be DMA into the page in progress while
> > > the page is being bounced, right? AFAICT this ext3+DIF/DIX case is
> > > different in that there is no 3rd-party access to the page while it
> > > is under IO (ext3 arbitrates all access to it's metadata), and so
> > > nothing can actually race for modification of the page between
> > > submission and bouncing at the block layer.
> > >
> > > In this case, the moment the page is unlocked, anyone else can map
> > > it and start (R)DMA on it, and that can happen before the bio is
> > > bounced by the block layer. So AFAICT, block layer bouncing doesn't
> > > solve the problem of racing writeback and DMA direct to the page we
> > > are doing IO on. Yes, it reduces the race window substantially, but
> > > it doesn't get rid of it.
> > 
> > The scenario you describe here cannot happen exactly because of the
> > wait_for_stable_page() in ->page_mkwrite() you mention below.
> 
> In general, no, because stable pages are controlled by block
> devices.
> 
> void wait_for_stable_page(struct page *page)
> {
>         if (bdi_cap_stable_pages_required(inode_to_bdi(page->mapping->host)))
>                 wait_on_page_writeback(page);
> }
> 
> 
> I have previously advocated for the filesystem to be in control of
> stable pages but, well, too many people shouted "but performance!"
> and so we still have all these holes I wanted to close in our
> code...
> 
> > If someone
> > will try to GUP a page that is under writeback (has already PageWriteback
> > set), GUP will have to do a write fault because the page is writeprotected
> > in page tables and go into ->page_mkwrite() which will wait.
> 
> Correct, but that doesn't close the problem down because stable
> pages are something we cannot rely on right now. We need to fix
> wait_for_stable_page() to always block on page writeback before
> this specific race condition goes away.

Right, all I said was assuming that someone actually cares about stable
pages so bdi_cap_stable_pages_required() is set. I agree with filesystem
having ability to control whether stable pages are required or not. But
when stable pages get enforced seems like a separate problem to me.

> > The problem rather is with someone mapping the page *before* writeback
> > starts, giving the page to HW. Then clear_page_dirty_for_io() writeprotects
> > the page in PTEs but the HW gives a damn about that. Then, after we add the
> > page to the bio but before the page gets bounced by the block layer, the HW
> > can still modify it.
> 
> Sure, that's yet another aspect of the same problem - not getting a
> write fault when the page is being written to. If we got a write
> fault, then the wait_for_stable_page() call in ->page_mkwrite would
> then solve the problem.
> 
> Essentially, what we are talking about is how to handle broken
> hardware. I say we should just brun it with napalm and thermite
> (i.e. taint the kernel with "unsupportable hardware") and force
> wait_for_stable_page() to trigger when there are GUP mappings if
> the underlying storage doesn't already require it.

As I wrote in other email, this is also about direct IO using file mapping
as a data buffer. So burn with napalm can hardly be a complete solution...
I agree that for the hardware that cannot support revoking of access /
fault on access and uses long-term page pins, we may just have to put up
with weird behavior in some corner cases.

> > > If it's permanently dirty, how do we trigger new COW operations
> > > after writeback has "cleaned" the page? i.e. we still need a
> > > ->page_mkwrite call to run before we allow the next write to the
> > > page to be done, regardless of whether the page is "permanently
> > > dirty" or not....
> > 
> > Interaction with COW is certainly an interesting problem. When the page
> > gets pinned, GUP will make sure the page is writeably mapped and trigger a
> > write fault if not. So at the moment the page is pinned, we are sure the
> > page is COWed. Now the question is what should happen when the file A
> > containing this pinned page gets reflinked to file B while the page is still
> > pinned.
> > 
> > Options I can see are:
> > 
> > 1) Fail the reflink.
> >   - difficult for sysadmin to discover the source of failure
> >
> > 2) Block reflink until the pin of the page is released.
> >   - can last for a long time, again difficult to discover
> > 
> > 3) Don't do anything special.
> >   - can corrupt data as read accesses through file B won't see
> >     modifications done to the page (and thus eventually the corresponding disk
> >     block) by the HW.
> > 
> > 4) Immediately COW the block during reflink when the corresponding page
> >    cache page is pinned.
> >   - seems as the best solution at this point, although sadly also requires
> >     the most per-filesystem work
> 
> None of the above are acceptable solutions - they all have nasty
> corner cases which are going to be difficult to get right, test,
> etc. IMO, the robust, reliable, testable solution is this:
> 
> 5) The reflink breaks the file lease, the userspace app releases the
> pinned pages on the file and drops the lease. The reflink proceeds,
> does it's work, and then the app gets a new lease on the file. When
> the app pins the pages again, it triggers new ->page_mkwrite calls
> to break any sharing that the reflink created. And if the app fails
> to drop the lease, then we can either fail with a lease related
> error or kill it....

This is certainly fine for the GUP users that are going to support leases.
But do you want GUP in direct IO to create a lease if the pages are from a
file mapping? I belive we need another option at least for GUP references
that are short-term in nature and sometimes also performance critical.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  parent reply	other threads:[~2018-12-19 13:24 UTC|newest]

Thread overview: 214+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-04  0:17 [PATCH 0/2] put_user_page*(): start converting the call sites john.hubbard
2018-12-04  0:17 ` [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions john.hubbard
2018-12-04  7:53   ` Mike Rapoport
2018-12-05  1:40     ` John Hubbard
2018-12-05  1:40       ` John Hubbard
2018-12-04 20:28   ` Dan Williams
2018-12-04 21:56     ` John Hubbard
2018-12-04 23:03       ` Dan Williams
2018-12-05  0:36         ` Jerome Glisse
2018-12-05  0:40           ` Dan Williams
2018-12-05  0:59             ` John Hubbard
2018-12-05  0:59               ` John Hubbard
2018-12-05  0:58         ` John Hubbard
2018-12-05  1:00           ` Dan Williams
2018-12-05  1:15           ` Matthew Wilcox
2018-12-05  1:44             ` Jerome Glisse
2018-12-05  1:57               ` John Hubbard
2018-12-05  1:57                 ` John Hubbard
2018-12-07  2:45                 ` John Hubbard
2018-12-07  2:45                   ` John Hubbard
2018-12-07 19:16                   ` Jerome Glisse
2018-12-07 19:26                     ` Dan Williams
2018-12-07 19:40                       ` Jerome Glisse
2018-12-08  0:52                     ` John Hubbard
2018-12-08  0:52                       ` John Hubbard
2018-12-08  2:24                       ` Jerome Glisse
2018-12-10 10:28                         ` Jan Kara
2018-12-12 15:03                           ` Jerome Glisse
2018-12-12 16:27                             ` Dan Williams
2018-12-12 17:02                               ` Jerome Glisse
2018-12-12 17:49                                 ` Dan Williams
2018-12-12 19:07                                   ` John Hubbard
2018-12-12 19:07                                     ` John Hubbard
2018-12-12 21:30                               ` Jerome Glisse
2018-12-12 21:40                                 ` Dan Williams
2018-12-12 21:53                                   ` Jerome Glisse
2018-12-12 22:11                                     ` Matthew Wilcox
2018-12-12 22:16                                       ` Jerome Glisse
2018-12-12 23:37                                     ` Jason Gunthorpe
2018-12-12 23:46                                       ` John Hubbard
2018-12-12 23:46                                         ` John Hubbard
2018-12-12 23:54                                       ` Dan Williams
2018-12-13  0:01                                       ` Jerome Glisse
2018-12-13  0:18                                         ` Dan Williams
2018-12-13  0:44                                           ` Jerome Glisse
2018-12-13  3:26                                             ` Jason Gunthorpe
2018-12-13  3:20                                         ` Jason Gunthorpe
2018-12-13 12:43                                           ` Jerome Glisse
2018-12-13 13:40                                             ` Tom Talpey
2018-12-13 14:18                                               ` Jerome Glisse
2018-12-13 14:51                                                 ` Tom Talpey
2018-12-13 15:18                                                   ` Jerome Glisse
2018-12-13 18:12                                                     ` Tom Talpey
2018-12-13 19:18                                                       ` Jerome Glisse
2018-12-14 10:41                                             ` Jan Kara
2018-12-14 15:25                                               ` Jerome Glisse
2018-12-12 21:56                                 ` John Hubbard
2018-12-12 21:56                                   ` John Hubbard
2018-12-12 22:04                                   ` Jerome Glisse
2018-12-12 22:11                                     ` John Hubbard
2018-12-12 22:11                                       ` John Hubbard
2018-12-12 22:14                                       ` Jerome Glisse
2018-12-12 22:17                                         ` John Hubbard
2018-12-12 22:17                                           ` John Hubbard
2018-12-12 21:46                             ` Dave Chinner
2018-12-12 21:59                               ` Jerome Glisse
2018-12-13  0:51                                 ` Dave Chinner
2018-12-13  2:02                                   ` Jerome Glisse
2018-12-13 15:56                                     ` Christopher Lameter
2018-12-13 16:02                                       ` Jerome Glisse
2018-12-14  6:00                                     ` Dave Chinner
2018-12-14 15:13                                       ` Jerome Glisse
2018-12-14  3:52                                   ` John Hubbard
2018-12-14  3:52                                     ` John Hubbard
2018-12-14  5:21                                     ` Dan Williams
2018-12-14  6:11                                       ` John Hubbard
2018-12-14 15:20                                         ` Jerome Glisse
2018-12-14 19:38                                         ` Dan Williams
2018-12-14 19:48                                           ` Matthew Wilcox
2018-12-14 19:53                                             ` Dave Hansen
2018-12-14 20:03                                               ` Matthew Wilcox
2018-12-14 20:17                                                 ` Dan Williams
2018-12-14 20:29                                                   ` Matthew Wilcox
2018-12-15  0:41                                                 ` John Hubbard
2018-12-17  8:56                                           ` Jan Kara
2018-12-17 18:28                                             ` Dan Williams
2018-12-14 15:43                               ` Jan Kara
2018-12-16 21:58                                 ` Dave Chinner
2018-12-17 18:11                                   ` Jerome Glisse
2018-12-17 18:34                                     ` Matthew Wilcox
2018-12-17 19:48                                       ` Jerome Glisse
2018-12-17 19:51                                         ` Matthew Wilcox
2018-12-17 19:54                                           ` Jerome Glisse
2018-12-17 19:59                                             ` Matthew Wilcox
2018-12-17 20:55                                               ` Jerome Glisse
2018-12-17 21:03                                                 ` Matthew Wilcox
2018-12-17 21:15                                                   ` Jerome Glisse
2018-12-18  1:09                                       ` Dave Chinner
2018-12-18  6:12                                       ` Darrick J. Wong
2018-12-18  9:30                                       ` Jan Kara
2018-12-18 23:29                                         ` John Hubbard
2018-12-18 23:29                                           ` John Hubbard
2018-12-19  2:07                                           ` Jerome Glisse
2018-12-19 11:08                                             ` Jan Kara
2018-12-20 10:54                                               ` John Hubbard
2018-12-20 10:54                                                 ` John Hubbard
2018-12-20 16:50                                                 ` Jerome Glisse
2018-12-20 16:50                                                   ` Jerome Glisse
2018-12-20 16:57                                                   ` Dan Williams
2018-12-20 16:49                                               ` Jerome Glisse
2018-12-20 16:49                                                 ` Jerome Glisse
2019-01-03  1:55                                               ` Jerome Glisse
2019-01-03  1:55                                                 ` Jerome Glisse
2019-01-03  3:27                                                 ` John Hubbard
2019-01-03  3:27                                                   ` John Hubbard
2019-01-03 14:57                                                   ` Jerome Glisse
2019-01-03 14:57                                                     ` Jerome Glisse
2019-01-03  9:26                                                 ` Jan Kara
2019-01-03  9:26                                                   ` Jan Kara
2019-01-03 14:44                                                   ` Jerome Glisse
2019-01-03 14:44                                                     ` Jerome Glisse
2019-01-11  2:59                                                     ` John Hubbard
2019-01-11  2:59                                                       ` John Hubbard
2019-01-11 16:51                                                       ` Jerome Glisse
2019-01-11 16:51                                                         ` Jerome Glisse
2019-01-12  1:04                                                         ` John Hubbard
2019-01-12  1:04                                                           ` John Hubbard
2019-01-12  2:02                                                           ` Jerome Glisse
2019-01-12  2:38                                                             ` John Hubbard
2019-01-12  2:38                                                               ` John Hubbard
2019-01-12  2:46                                                               ` Jerome Glisse
2019-01-12  3:06                                                                 ` John Hubbard
2019-01-12  3:06                                                                   ` John Hubbard
2019-01-12  3:25                                                                   ` Jerome Glisse
2019-01-12 20:46                                                                     ` John Hubbard
2019-01-12 20:46                                                                       ` John Hubbard
2019-01-14 14:54                                                                   ` Jan Kara
2019-01-14 17:21                                                                     ` Jerome Glisse
2019-01-14 19:09                                                                       ` John Hubbard
2019-01-14 19:09                                                                         ` John Hubbard
2019-01-14 19:09                                                                         ` John Hubbard
2019-01-15  8:34                                                                         ` Jan Kara
2019-01-15 21:39                                                                           ` John Hubbard
2019-01-15 21:39                                                                             ` John Hubbard
2019-01-15  8:07                                                                       ` Jan Kara
2019-01-15 17:15                                                                         ` Jerome Glisse
2019-01-15 21:56                                                                           ` John Hubbard
2019-01-15 21:56                                                                             ` John Hubbard
2019-01-15 22:12                                                                             ` Jerome Glisse
2019-01-16  0:44                                                                               ` John Hubbard
2019-01-16  0:44                                                                                 ` John Hubbard
2019-01-16  1:56                                                                                 ` Jerome Glisse
2019-01-16  2:01                                                                                   ` Dan Williams
2019-01-16  2:23                                                                                     ` Jerome Glisse
2019-01-16  4:34                                                                                       ` Dave Chinner
2019-01-16 14:50                                                                                         ` Jerome Glisse
2019-01-16 22:51                                                                                           ` Dave Chinner
2019-01-16 11:38                                                                         ` Jan Kara
2019-01-16 13:08                                                                           ` Jerome Glisse
2019-01-17  5:42                                                                             ` John Hubbard
2019-01-17  5:42                                                                               ` John Hubbard
2019-01-17 15:21                                                                               ` Jerome Glisse
2019-01-18  0:16                                                                                 ` Dave Chinner
2019-01-18  1:59                                                                                   ` Jerome Glisse
2019-01-17  9:30                                                                             ` Jan Kara
2019-01-17 15:17                                                                               ` Jerome Glisse
2019-01-22 15:24                                                                                 ` Jan Kara
2019-01-22 16:46                                                                                   ` Jerome Glisse
2019-01-23 18:02                                                                                     ` Jan Kara
2019-01-23 19:04                                                                                       ` Jerome Glisse
2019-01-29  0:22                                                                                         ` John Hubbard
2019-01-29  1:23                                                                                           ` Jerome Glisse
2019-01-29  6:41                                                                                             ` John Hubbard
2019-01-29 10:12                                                                                               ` Jan Kara
2019-01-30  2:21                                                                                                 ` John Hubbard
2019-01-17  5:25                                                                         ` John Hubbard
2019-01-17  5:25                                                                           ` John Hubbard
2019-01-17  9:04                                                                           ` Jan Kara
2019-01-12  3:14                                                               ` Jerome Glisse
2018-12-18 10:33                                   ` Jan Kara
2018-12-18 23:42                                     ` Dave Chinner
2018-12-19  3:03                                       ` Jason Gunthorpe
2018-12-19  5:26                                         ` Dan Williams
2018-12-19 11:19                                           ` Jan Kara
2018-12-19 10:28                                         ` Dave Chinner
2018-12-19 11:35                                           ` Jan Kara
2018-12-19 16:56                                             ` Jason Gunthorpe
2018-12-19 22:33                                             ` Dave Chinner
2018-12-20  9:07                                               ` Jan Kara
2018-12-20 16:54                                               ` Jerome Glisse
2018-12-20 16:54                                                 ` Jerome Glisse
2018-12-19 13:24                                       ` Jan Kara [this message]
2018-12-08  5:18                       ` Matthew Wilcox
2018-12-12 19:13                         ` John Hubbard
2018-12-12 19:13                           ` John Hubbard
2018-12-08  7:16                       ` Dan Williams
2018-12-08 16:33                         ` Jerome Glisse
2018-12-08 16:48                           ` Christoph Hellwig
2018-12-08 17:47                             ` Jerome Glisse
2018-12-08 18:26                               ` Christoph Hellwig
2018-12-08 18:45                                 ` Jerome Glisse
2018-12-08 18:09                             ` Dan Williams
2018-12-08 18:12                               ` Christoph Hellwig
2018-12-11  6:18                               ` Dave Chinner
2018-12-05  5:52             ` Dan Williams
2018-12-05 11:16       ` Jan Kara
2018-12-04  0:17 ` [PATCH 2/2] infiniband/mm: convert put_page() to put_user_page*() john.hubbard
2018-12-04 17:10 ` [PATCH 0/2] put_user_page*(): start converting the call sites David Laight
2018-12-05  1:05   ` John Hubbard
2018-12-05 14:08     ` David Laight
2018-12-28  8:37       ` Pavel Machek
  -- strict thread matches above, loose matches on Subject: below --
2019-02-08  7:56 [PATCH 0/2] mm: put_user_page() call site conversion first john.hubbard
2019-02-08  7:56 ` [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions john.hubbard
2019-02-08 10:32   ` Mike Rapoport
2019-02-08 20:44     ` John Hubbard

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181219132449.GD18345@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=benve@cisco.com \
    --cc=cl@linux.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@fromorbit.com \
    --cc=dennis.dalessandro@intel.com \
    --cc=dledford@redhat.com \
    --cc=hch@infradead.org \
    --cc=jgg@ziepe.ca \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=john.hubbard@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mike.marciniszyn@intel.com \
    --cc=rcampbell@nvidia.com \
    --cc=tom@talpey.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.