From: Jerome Glisse <jglisse@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>,
john.hubbard@gmail.com, Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, Al Viro <viro@zeniv.linux.org.uk>,
Christian Benvenuti <benve@cisco.com>,
Christoph Hellwig <hch@infradead.org>,
Christopher Lameter <cl@linux.com>,
Dan Williams <dan.j.williams@intel.com>,
Dennis Dalessandro <dennis.dalessandro@intel.com>,
Doug Ledford <dledford@redhat.com>,
Ira Weiny <ira.weiny@intel.com>, Jan Kara <jack@suse.cz>,
Jason Gunthorpe <jgg@ziepe.ca>,
Matthew Wilcox <willy@infradead.org>,
Michal Hocko <mhocko@kernel.org>,
Mike Rapoport <rppt@linux.ibm.com>,
Mike Marciniszyn <mike.marciniszyn@intel.com>,
Ralph Campbell <rcampbell@nvidia.com>,
Tom Talpey <tom@talpey.com>, LKML <linux-kernel@vger.kernel.org>,
linux-fsdevel@vger.kernel.org, John Hubbard <jhubbard@nvidia.com>,
Andrea Arcangeli <aarcange@redhat.com>
Subject: Re: [PATCH v4 1/1] mm: introduce put_user_page*(), placeholder versions
Date: Tue, 19 Mar 2019 20:08:39 -0400
Message-ID: <20190320000838.GA6364@redhat.com>
In-Reply-To: <20190319235752.GB26298@dastard>

On Wed, Mar 20, 2019 at 10:57:52AM +1100, Dave Chinner wrote:
> On Tue, Mar 19, 2019 at 06:06:55PM -0400, Jerome Glisse wrote:
> > On Wed, Mar 20, 2019 at 08:23:46AM +1100, Dave Chinner wrote:
> > > On Tue, Mar 19, 2019 at 10:14:16AM -0400, Jerome Glisse wrote:
> > > > On Tue, Mar 19, 2019 at 09:47:24AM -0400, Jerome Glisse wrote:
> > > > > On Tue, Mar 19, 2019 at 03:04:17PM +0300, Kirill A. Shutemov wrote:
> > > > > > On Fri, Mar 08, 2019 at 01:36:33PM -0800, john.hubbard@gmail.com wrote:
> > > > > > > From: John Hubbard <jhubbard@nvidia.com>
> > > > >
> > > > > [...]
> > > > >
> > > > > > > diff --git a/mm/gup.c b/mm/gup.c
> > > > > > > index f84e22685aaa..37085b8163b1 100644
> > > > > > > --- a/mm/gup.c
> > > > > > > +++ b/mm/gup.c
> > > > > > > @@ -28,6 +28,88 @@ struct follow_page_context {
> > > > > > >  	unsigned int page_mask;
> > > > > > >  };
> > > > > > >
> > > > > > > +typedef int (*set_dirty_func_t)(struct page *page);
> > > > > > > +
> > > > > > > +static void __put_user_pages_dirty(struct page **pages,
> > > > > > > +				   unsigned long npages,
> > > > > > > +				   set_dirty_func_t sdf)
> > > > > > > +{
> > > > > > > +	unsigned long index;
> > > > > > > +
> > > > > > > +	for (index = 0; index < npages; index++) {
> > > > > > > +		struct page *page = compound_head(pages[index]);
> > > > > > > +
> > > > > > > +		if (!PageDirty(page))
> > > > > > > +			sdf(page);
> > > > > >
> > > > > > How is this safe? What prevents the page to be cleared under you?
> > > > > >
> > > > > > If it's safe to race clear_page_dirty*() it has to be stated explicitly
> > > > > > with a reason why. It's not very clear to me as it is.
> > > > >
> > > > > The PageDirty() optimization above is fine to race with clear the
> > > > > page flag as it means it is racing after a page_mkclean() and the
> > > > > GUP user is done with the page so page is about to be write back
> > > > > ie if (!PageDirty(page)) see the page as dirty and skip the sdf()
> > > > > call while a split second after TestClearPageDirty() happens then
> > > > > it means the racing clear is about to write back the page so all
> > > > > is fine (the page was dirty and it is being clear for write back).
> > > > >
> > > > > If it does call the sdf() while racing with write back then we
> > > > > just redirtied the page just like clear_page_dirty_for_io() would
> > > > > do if page_mkclean() failed so nothing harmful will come of that
> > > > > neither. Page stays dirty despite write back it just means that
> > > > > the page might be write back twice in a row.
> > > >
> > > > Forgot to mention one thing, we had a discussion with Andrea and Jan
> > > > about set_page_dirty() and Andrea had the good idea of maybe doing
> > > > the set_page_dirty() at GUP time (when GUP with write) not when the
> > > > GUP user calls put_page(). We can do that by setting the dirty bit
> > > > in the pte for instance. They are few bonus of doing things that way:
> > > >     - amortize the cost of calling set_page_dirty() (ie one call for
> > > >       GUP and page_mkclean()
> > > >     - it is always safe to do so at GUP time (ie the pte has write
> > > >       permission and thus the page is in correct state)
> > > >     - safe from truncate race
> > > >     - no need to ever lock the page
> > >
> > > I seem to have missed this conversation, so please excuse me for
> >
> > The set_page_dirty() at GUP was in a private discussion (it started
> > on another topic and drifted away to set_page_dirty()).
> >
> > > asking a stupid question: if it's a file backed page, what prevents
> > > background writeback from cleaning the dirty page ~30s into a long
> > > term pin? i.e. I don't see anything in this proposal that prevents
> > > the page from being cleaned by writeback and putting us straight
> > > back into the situation where a long term RDMA is writing to a clean
> > > page....
> >
> > So this patchset does not solve this issue.
>
> OK, so it just kicks the can further down the road.
>
> > [3..N] decide what to do for GUPed page, so far the plans seems
> > to be to keep the page always dirty and never allow page
> > write back to restore the page in a clean state. This does
> > disable thing like COW and other fs feature but at least
> > it seems to be the best thing we can do.
>
> So the plan for GUP vs writeback so far is "break fsync()"? :)
>
> We might need to work on that a bit more...

Sorry, I forgot to say that we still do writeback using a bounce page,
so that at least we write something to disk: a snapshot of the GUPed
page every time writeback kicks in (whether through radix tree dirty
page writeback, fsync, or any other sync event).

So many little details that I forgot the big chunk :)
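
To make the bounce page idea a bit more concrete, here is a rough
userspace model of it. This is only a sketch and not kernel code: every
name in it (model_page, model_disk, model_submit_io, model_writeback,
MODEL_PAGE_SIZE) is made up for illustration, and the real thing would
have to deal with locking, bio submission and so on. It only shows the
intended behaviour: a GUPed page is snapshotted into a bounce buffer,
the snapshot is what goes to disk, and the page itself stays dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096 /* hypothetical, stands in for PAGE_SIZE */

/* Hypothetical userspace stand-in for a page cache page. */
struct model_page {
	unsigned char data[MODEL_PAGE_SIZE];
	bool dirty;      /* models PageDirty() */
	bool gup_pinned; /* models "page is held by a GUP user" */
};

/* Hypothetical stand-in for the backing store of that one page. */
struct model_disk {
	unsigned char block[MODEL_PAGE_SIZE];
	unsigned long writes;
};

/* Models the actual I/O: copy one page worth of data to "disk". */
static void model_submit_io(struct model_disk *disk, const unsigned char *buf)
{
	memcpy(disk->block, buf, MODEL_PAGE_SIZE);
	disk->writes++;
}

/*
 * Models one writeback pass over a single page (background writeback,
 * fsync, or any other sync event).
 */
static void model_writeback(struct model_page *page, struct model_disk *disk,
			    unsigned char *bounce)
{
	assert(page && disk && bounce);

	if (!page->dirty)
		return;

	if (page->gup_pinned) {
		/*
		 * The device may still be writing to the page, so take a
		 * snapshot and write that instead. The page is never
		 * marked clean while it is pinned.
		 */
		memcpy(bounce, page->data, MODEL_PAGE_SIZE);
		model_submit_io(disk, bounce);
		return;
	}

	/* Normal case: write the page itself and mark it clean. */
	model_submit_io(disk, page->data);
	page->dirty = false;
}
```

With this model, each writeback pass over a pinned page puts the
then-current contents on disk, and the page only goes back to clean on
the first pass after the pin is dropped.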
Cheers,
Jérôme