From: Dave Chinner <david@fromorbit.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Rik van Riel <riel@redhat.com>, Jan Kara <jack@suse.cz>,
Vlastimil Babka <vbabka@suse.cz>,
Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>,
Andi Kleen <andi@firstfloor.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Greg Thelen <gthelen@google.com>,
Christoph Hellwig <hch@infradead.org>,
Hugh Dickins <hughd@google.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Mel Gorman <mgorman@suse.de>, Minchan Kim <minchan.kim@gmail.com>,
Michel Lespinasse <walken@google.com>,
Seth Jennings <sjenning@linux.vnet.ibm.com>,
Roman Gushchin <klamm@yandex-team.ru>,
Ozgun Erdogan <ozgun@citusdata.com>,
Metin Doslu <metin@citusdata.com>,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [patch 6/9] mm + fs: store shadow entries in page cache
Date: Tue, 26 Nov 2013 10:17:16 +1100
Message-ID: <20131125231716.GJ8803@dastard>
In-Reply-To: <1385336308-27121-7-git-send-email-hannes@cmpxchg.org>

On Sun, Nov 24, 2013 at 06:38:25PM -0500, Johannes Weiner wrote:
> Reclaim will be leaving shadow entries in the page cache radix tree
> upon evicting the real page. As those pages are found from the LRU,
> an iput() can lead to the inode being freed concurrently. At this
> point, reclaim must no longer install shadow pages because the inode
> freeing code needs to ensure the page tree is really empty.
>
> Add an address_space flag, AS_EXITING, that the inode freeing code
> sets under the tree lock before doing the final truncate. Reclaim
> will check for this flag before installing shadow pages.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
....
> @@ -545,10 +546,25 @@ static void evict(struct inode *inode)
>  	 */
>  	inode_wait_for_writeback(inode);
>
> +	/*
> +	 * Page reclaim can not do iput() and thus can race with the
> +	 * inode teardown.  Tell it when the address space is exiting,
> +	 * so that it does not install eviction information after the
> +	 * final truncate has begun.
> +	 *
> +	 * As truncation uses a lockless tree lookup, acquire the
> +	 * spinlock to make sure any ongoing tree modification that
> +	 * does not see AS_EXITING is completed before starting the
> +	 * final truncate.
> +	 */
> +	spin_lock_irq(&inode->i_data.tree_lock);
> +	mapping_set_exiting(&inode->i_data);
> +	spin_unlock_irq(&inode->i_data.tree_lock);
> +
>  	if (op->evict_inode) {
>  		op->evict_inode(inode);
>  	} else {
> -		if (inode->i_data.nrpages)
> +		if (inode->i_data.nrpages || inode->i_data.nrshadows)
>  			truncate_inode_pages(&inode->i_data, 0);
>  		clear_inode(inode);
>  	}

Ok, so what I see here is that we need a wrapper function that
handles setting the AS_EXITING flag and doing the "final"
truncate_inode_pages() call, and the locking for the AS_EXITING flag
moved into mapping_set_exiting().

That is, because this AS_EXITING flag and its locking constraints
are directly related to the upcoming truncate_inode_pages() call,
I'd prefer to see a helper that captures that relationship used
in all the filesystem code. e.g.:

void truncate_inode_pages_final(struct address_space *mapping)
{
	spin_lock_irq(&mapping->tree_lock);
	mapping_set_exiting(mapping);
	spin_unlock_irq(&mapping->tree_lock);
	if (mapping->nrpages || mapping->nrshadows)
		truncate_inode_pages_range(mapping, 0, (loff_t)-1);
}

And document it in Documentation/filesystems/porting as a mandatory
function to be called from ->evict_inode() implementations before
calling clear_inode(). You can then replace all the direct calls to
truncate_inode_pages() in the evict_inode() path with a call to
truncate_inode_pages_final().
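
To illustrate the calling convention described above, here is a minimal
userspace sketch, not kernel code. The struct definitions, the `exiting`
field standing in for AS_EXITING, and examplefs_evict_inode() are
hypothetical stand-ins introduced for this example. Locking is left out on
purpose so that only the ordering is shown: final truncate first, then
clear_inode().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct address_space {
	unsigned long nrpages;   /* cached pages */
	unsigned long nrshadows; /* shadow entries left behind by reclaim */
	bool exiting;            /* models the AS_EXITING flag */
};

struct inode {
	struct address_space i_data;
	bool cleared;
};

/* Models the proposed helper: mark the mapping as exiting, then empty it. */
static void truncate_inode_pages_final(struct address_space *mapping)
{
	mapping->exiting = true;
	if (mapping->nrpages || mapping->nrshadows) {
		/* stands in for truncate_inode_pages_range(mapping, 0, -1) */
		mapping->nrpages = 0;
		mapping->nrshadows = 0;
	}
}

/* Models clear_inode(): the mapping is expected to be empty by now. */
static void clear_inode(struct inode *inode)
{
	assert(inode->i_data.nrpages == 0);
	assert(inode->i_data.nrshadows == 0);
	inode->cleared = true;
}

/* A hypothetical filesystem's ->evict_inode() after conversion. */
static void examplefs_evict_inode(struct inode *inode)
{
	truncate_inode_pages_final(&inode->i_data);
	/* filesystem-specific teardown would go here */
	clear_inode(inode);
}
```

Calling examplefs_evict_inode() on an inode with pages or shadow entries
leaves the mapping flagged as exiting and empty before clear_inode() runs.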
As it is, I'd really like to see that unconditional irq disable go
away from this code - disabling and enabling interrupts for every
single inode we reclaim is going to add significant overhead to this
hot code path. And given that:
> +static inline void mapping_set_exiting(struct address_space *mapping)
> +{
> +	set_bit(AS_EXITING, &mapping->flags);
> +}
> +
> +static inline int mapping_exiting(struct address_space *mapping)
> +{
> +	return test_bit(AS_EXITING, &mapping->flags);
> +}

these are atomic bit ops, why do we need to take the tree_lock and
disable irqs in evict() to set this bit if there's nothing to
truncate on the inode? i.e. something like this:

void truncate_inode_pages_final(struct address_space *mapping)
{
	mapping_set_exiting(mapping);
	if (mapping->nrpages || mapping->nrshadows) {
		/*
		 * spinlock barrier to ensure all modifications are
		 * complete before we do the final truncate
		 */
		spin_lock_irq(&mapping->tree_lock);
		spin_unlock_irq(&mapping->tree_lock);
		truncate_inode_pages_range(mapping, 0, (loff_t)-1);
	}
}

thereby avoiding the mapping lock altogether for inodes that do
not require it to be taken?
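
The lock-avoiding variant can be modelled in userspace as a rough sketch.
Everything here is an assumption made for illustration: the toy spinlock
built on C11 atomic_flag, the lock_count instrumentation field, the
AS_EXITING bit number, and reclaim_page() as a stand-in for the reclaim
side. It is a single-threaded model and says nothing about the memory
ordering the real kernel code would need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define AS_EXITING 0UL	/* hypothetical bit number for this model */

struct address_space {
	atomic_flag tree_lock;     /* toy spinlock standing in for tree_lock */
	atomic_ulong flags;
	unsigned long nrpages;
	unsigned long nrshadows;
	unsigned long lock_count;  /* instrumentation: times the lock was taken */
};

static void mapping_init(struct address_space *m, unsigned long nrpages,
			 unsigned long nrshadows)
{
	atomic_flag_clear(&m->tree_lock);
	atomic_init(&m->flags, 0);
	m->nrpages = nrpages;
	m->nrshadows = nrshadows;
	m->lock_count = 0;
}

static void lock_tree(struct address_space *m)
{
	while (atomic_flag_test_and_set_explicit(&m->tree_lock,
						 memory_order_acquire))
		; /* spin until the lock is free */
	m->lock_count++;
}

static void unlock_tree(struct address_space *m)
{
	atomic_flag_clear_explicit(&m->tree_lock, memory_order_release);
}

static void mapping_set_exiting(struct address_space *m)
{
	atomic_fetch_or(&m->flags, 1UL << AS_EXITING);
}

static bool mapping_exiting(struct address_space *m)
{
	return (atomic_load(&m->flags) & (1UL << AS_EXITING)) != 0;
}

/* Reclaim side: evict one page, leave a shadow only if teardown has not begun. */
static bool reclaim_page(struct address_space *m)
{
	bool installed = false;

	lock_tree(m);
	if (m->nrpages) {
		m->nrpages--;
		if (!mapping_exiting(m)) {
			m->nrshadows++;
			installed = true;
		}
	}
	unlock_tree(m);
	return installed;
}

static void truncate_inode_pages_final(struct address_space *m)
{
	mapping_set_exiting(m);
	if (m->nrpages || m->nrshadows) {
		/* lock/unlock pair: wait out any modifier that missed the flag */
		lock_tree(m);
		unlock_tree(m);
		m->nrpages = 0;	/* stands in for truncate_inode_pages_range() */
		m->nrshadows = 0;
	}
}
```

In this model an empty mapping is torn down without touching the lock at
all, while a populated one pays for exactly one lock/unlock pair before the
truncate.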
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com