From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753817AbcAEUdh (ORCPT ); Tue, 5 Jan 2016 15:33:37 -0500 Received: from bombadil.infradead.org ([198.137.202.9]:32875 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753639AbcAEUda (ORCPT ); Tue, 5 Jan 2016 15:33:30 -0500 Date: Tue, 5 Jan 2016 21:33:17 +0100 From: Peter Zijlstra To: Davidlohr Bueso Cc: Mel Gorman , Thomas Gleixner , Sebastian Andrzej Siewior , Chris Mason , Darren Hart , linux-kernel@vger.kernel.org, Davidlohr Bueso Subject: Re: [PATCH] futex: Reduce the scope of lock_page, aka lockless futex_get_key() Message-ID: <20160105203317.GQ6344@twins.programming.kicks-ass.net> References: <1452025435-15355-1-git-send-email-dave@stgolabs.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1452025435-15355-1-git-send-email-dave@stgolabs.net> User-Agent: Mutt/1.5.21 (2012-12-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jan 05, 2016 at 12:23:55PM -0800, Davidlohr Bueso wrote: > +++ b/kernel/futex.c > @@ -520,7 +520,18 @@ again: > else > err = 0; > > - lock_page(page); > + /* > + * The treatment of mapping from this point on is critical. The page > + * lock protects many things but in this context the page lock > + * stabilises mapping, prevents inode freeing in the shared > + * file-backed region case and guards against movement to swap cache. A little extra whitespace separating this into paragraphs might help readability. > + * Strictly speaking the page lock is not needed in all cases being > + * considered here and page lock forces unnecessarily serialisation. > + * From this point on, mapping will be reverified if necessary and > + * page lock will be acquired only if it is unavoiable. > + */ > + mapping = READ_ONCE(compound_head(page)->mapping); > + > /* > * If page->mapping is NULL, then it cannot be a PageAnon > * page; but it might be the ZERO_PAGE or in the gate area or > + if (unlikely(!mapping)) { > + int shmem_swizzled; > + > + /* > + * Page lock is required to identify which special case above > + * applies. If this is really a shmem page then the page lock > + * will prevent unexpected transitions. > + */ > + lock_page(page); > + shmem_swizzled = PageSwapCache(page); > unlock_page(page); > put_page(page); > + WARN_ON_ONCE(mapping); We've not re-loaded mapping, so how could this possibly be? > + /* > + * Take a reference unless it is about to be freed. Previously > + * this reference was taken by ihold under the page lock > + * pinning the inode in place so i_lock was unnecessary. The > + * only way for this check to fail is if the inode was > + * truncated in parallel so warn for now if this happens. > + * > + * TODO: VFS and/or filesystem people should review this check > + * and see if there is a safer or more reliable way to do this. > + */ > + if (WARN_ON(!atomic_inc_not_zero(&inode->i_count))) { > + rcu_read_unlock(); > + put_page(page); > + goto again; > + } > + > + /* > + * get_futex_key() must imply MB (B) and we are not going to > + * call into get_futex_key_refs() at this point. > + */ > + smp_mb__after_atomic(); I don't get this one, the above is a successful atomic op with return value, that _must_ imply a full barrier.