From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dave Chinner Subject: Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions Date: Wed, 16 Jan 2019 15:34:55 +1100 Message-ID: <20190116043455.GP4205@dastard> References: <20190114145447.GJ13316@quack2.suse.cz> <20190114172124.GA3702@redhat.com> <20190115080759.GC29524@quack2.suse.cz> <20190115171557.GB3696@redhat.com> <752839e6-6cb3-a6aa-94cb-63d3d4265934@nvidia.com> <20190115221205.GD3696@redhat.com> <99110c19-3168-f6a9-fbde-0a0e57f67279@nvidia.com> <20190116015610.GH3696@redhat.com> <20190116022312.GJ3696@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Dan Williams , John Hubbard , Jan Kara , Matthew Wilcox , John Hubbard , Andrew Morton , Linux MM , tom@talpey.com, Al Viro , benve@cisco.com, Christoph Hellwig , Christopher Lameter , "Dalessandro, Dennis" , Doug Ledford , Jason Gunthorpe , Michal Hocko , Mike Marciniszyn , rcampbell@nvidia.com, Linux Kernel Mailing List , linux-fsdevel To: Jerome Glisse Return-path: Content-Disposition: inline In-Reply-To: <20190116022312.GJ3696@redhat.com> Sender: linux-kernel-owner@vger.kernel.org List-Id: linux-fsdevel.vger.kernel.org On Tue, Jan 15, 2019 at 09:23:12PM -0500, Jerome Glisse wrote: > On Tue, Jan 15, 2019 at 06:01:09PM -0800, Dan Williams wrote: > > On Tue, Jan 15, 2019 at 5:56 PM Jerome Glisse wrote: > > > On Tue, Jan 15, 2019 at 04:44:41PM -0800, John Hubbard wrote: > > [..] > > > To make it clear. > > > > > > Lock code: > > > GUP() > > > ... > > > lock_page(page); > > > if (PageWriteback(page)) { > > > unlock_page(page); > > > wait_stable_page(page); > > > goto retry; > > > } > > > atomic_add(page->refcount, PAGE_PIN_BIAS); > > > unlock_page(page); > > > > > > test_set_page_writeback() > > > bool pinned = false; > > > ... > > > pinned = page_is_pin(page); // could be after TestSetPageWriteback > > > TestSetPageWriteback(page); > > > ... > > > return pinned; > > > > > > Memory barrier: > > > GUP() > > > ... > > > atomic_add(page->refcount, PAGE_PIN_BIAS); > > > smp_mb(); > > > if (PageWriteback(page)) { > > > atomic_add(page->refcount, -PAGE_PIN_BIAS); > > > wait_stable_page(page); > > > goto retry; > > > } > > > > > > test_set_page_writeback() > > > bool pinned = false; > > > ... > > > TestSetPageWriteback(page); > > > smp_wmb(); > > > pinned = page_is_pin(page); > > > ... > > > return pinned; > > > > > > > > > One is not more complex than the other. One can contend, the other > > > will _never_ contend. > > > > The complexity is in the validation of lockless algorithms. It's > > easier to reason about locks than barriers for the long term > > maintainability of this code. I'm with Jan and John on wanting to > > explore lock_page() before a barrier-based scheme. > > How is the above hard to validate ? Well, if you think it's so easy, then please write the test cases so we can add them to fstests and make sure that we don't break it in future. If you can't write filesystem test cases that exercise these race conditions reliably, then the answer to your question is "it is extremely hard to validate" and the correct thing to do is to start with the simple lock_page() based algorithm. Premature optimisation in code this complex is something we really, really need to avoid. -Dave. -- Dave Chinner david@fromorbit.com