From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pb0-f53.google.com (mail-pb0-f53.google.com [209.85.160.53]) by kanga.kvack.org (Postfix) with ESMTP id EF5846B0035 for ; Mon, 25 Nov 2013 18:18:22 -0500 (EST) Received: by mail-pb0-f53.google.com with SMTP id ma3so6752756pbc.40 for ; Mon, 25 Nov 2013 15:18:22 -0800 (PST) Received: from ipmail05.adl6.internode.on.net (ipmail05.adl6.internode.on.net. [150.101.137.143]) by mx.google.com with ESMTP id fb6si18242534pab.8.2013.11.25.15.18.20 for ; Mon, 25 Nov 2013 15:18:21 -0800 (PST) Date: Tue, 26 Nov 2013 10:17:16 +1100 From: Dave Chinner Subject: Re: [patch 6/9] mm + fs: store shadow entries in page cache Message-ID: <20131125231716.GJ8803@dastard> References: <1385336308-27121-1-git-send-email-hannes@cmpxchg.org> <1385336308-27121-7-git-send-email-hannes@cmpxchg.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1385336308-27121-7-git-send-email-hannes@cmpxchg.org> Sender: owner-linux-mm@kvack.org List-ID: To: Johannes Weiner Cc: Andrew Morton , Rik van Riel , Jan Kara , Vlastimil Babka , Peter Zijlstra , Tejun Heo , Andi Kleen , Andrea Arcangeli , Greg Thelen , Christoph Hellwig , Hugh Dickins , KOSAKI Motohiro , Mel Gorman , Minchan Kim , Michel Lespinasse , Seth Jennings , Roman Gushchin , Ozgun Erdogan , Metin Doslu , linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org On Sun, Nov 24, 2013 at 06:38:25PM -0500, Johannes Weiner wrote: > Reclaim will be leaving shadow entries in the page cache radix tree > upon evicting the real page. As those pages are found from the LRU, > an iput() can lead to the inode being freed concurrently. At this > point, reclaim must no longer install shadow pages because the inode > freeing code needs to ensure the page tree is really empty. > > Add an address_space flag, AS_EXITING, that the inode freeing code > sets under the tree lock before doing the final truncate. Reclaim > will check for this flag before installing shadow pages. > > Signed-off-by: Johannes Weiner .... > @@ -545,10 +546,25 @@ static void evict(struct inode *inode) > */ > inode_wait_for_writeback(inode); > > + /* > + * Page reclaim can not do iput() and thus can race with the > + * inode teardown. Tell it when the address space is exiting, > + * so that it does not install eviction information after the > + * final truncate has begun. > + * > + * As truncation uses a lockless tree lookup, acquire the > + * spinlock to make sure any ongoing tree modification that > + * does not see AS_EXITING is completed before starting the > + * final truncate. > + */ > + spin_lock_irq(&inode->i_data.tree_lock); > + mapping_set_exiting(&inode->i_data); > + spin_unlock_irq(&inode->i_data.tree_lock); > + > if (op->evict_inode) { > op->evict_inode(inode); > } else { > - if (inode->i_data.nrpages) > + if (inode->i_data.nrpages || inode->i_data.nrshadows) > truncate_inode_pages(&inode->i_data, 0); > clear_inode(inode); > } Ok, so what I see here is that we need a wrapper function that handles setting the AS_EXITING flag and doing the "final" truncate_inode_pages() call, and the locking for the AS_EXITING flag moved into mapping_set_exiting() That is, because this AS_EXITING flag and it's locking constraints are directly related to the upcoming truncate_inode_pages() call, I'd prefer to see a helper that captures that relationship used in all the filesystem code. e.g: void truncate_inode_pages_final(struct address_space *mapping) { spin_lock_irq(&mapping->tree_lock); mapping_set_exiting(mapping); spin_unlock_irq(&mapping->tree_lock); if (inode->i_data.nrpages || inode->i_data.nrshadows) truncate_inode_pages_range(mapping, 0, (loff_t)-1); } And document it in Documentation/filesystems/porting as a mandatory function to be called from ->evict_inode() implementations before calling clear_inode(). You can then replace all the direct calls to truncate_inode_pages() in the evict_inode() path with a call to truncate_inode_pages_final(). As it is, I'd really like to see that unconditional irq disable go away from this code - disabling and enabling interrupts for every single inode we reclaim is going to add significant overhead to this hot code path. And given that: > +static inline void mapping_set_exiting(struct address_space *mapping) > +{ > + set_bit(AS_EXITING, &mapping->flags); > +} > + > +static inline int mapping_exiting(struct address_space *mapping) > +{ > + return test_bit(AS_EXITING, &mapping->flags); > +} these atomic bit ops, why do we need to take the tree_lock and disable irqs in evict() to set this bit if there's nothing to truncate on the inode? i.e. something like this: void truncate_inode_pages_final(struct address_space *mapping) { mapping_set_exiting(mapping); if (inode->i_data.nrpages || inode->i_data.nrshadows) { /* * spinlock barrier to ensure all modifications are * complete before we do the final truncate */ spin_lock_irq(&mapping->tree_lock); spin_unlock_irq(&mapping->tree_lock); truncate_inode_pages_range(mapping, 0, (loff_t)-1); } and thereby avoiding the mapping lock altogether for inodes that do not require it to be taken? Cheers, Dave. -- Dave Chinner david@fromorbit.com -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org