From: Nick Piggin <nickpiggin@yahoo.com.au>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nick Piggin <npiggin@suse.de>, Hugh Dickins <hugh@veritas.com>,
Andrew Morton <akpm@osdl.org>,
Linux Memory Management <linux-mm@kvack.org>
Subject: Re: [patch 3/9] mm: speculative get page
Date: Tue, 26 Sep 2006 08:41:55 +1000 [thread overview]
Message-ID: <45185B33.6060109@yahoo.com.au> (raw)
In-Reply-To: <1159189453.5018.25.camel@lappy>
Peter Zijlstra wrote:
>On Mon, 2006-09-25 at 13:47 +0200, Nick Piggin wrote:
>
>
>>+/*
>>+ * speculatively take a reference to a page.
>>+ * If the page is free (_count == 0), then _count is untouched, and 0
>>+ * is returned. Otherwise, _count is incremented by 1 and 1 is returned.
>>+ *
>>+ * This function must be run in the same rcu_read_lock() section as has
>>+ * been used to lookup the page in the pagecache radix-tree: this allows
>>+ * allocators to use a synchronize_rcu() to stabilize _count.
>>+ *
>>+ * Unless an RCU grace period has passed, the count of all pages coming out
>>+ * of the allocator must be considered unstable. page_count may return higher
>>+ * than expected, and put_page must be able to do the right thing when the
>>+ * page has been finished with (because put_page is what is used to drop an
>>+ * invalid speculative reference).
>>+ *
>>+ * This forms the core of the lockless pagecache locking protocol, where
>>+ * the lookup-side (eg. find_get_page) has the following pattern:
>>+ * 1. find page in radix tree
>>+ * 2. conditionally increment refcount
>>+ * 3. check the page is still in pagecache (if no, goto 1)
>>+ *
>>+ * Remove-side that cares about stability of _count (eg. reclaim) has the
>>
> ^^^^^^^^^^^^^^^^^^^
>is that the reason that the following two code paths are good without
>change:
>
> truncate_inode_page_range()
> truncate_complete_page()
> remove_from_page_cache()
> radix_tree_delete()
>
^^^ Yes.
>and
>
> zap_pte_range()
> free_swap_and_cache() <-- does check page_count()
> delete_from_swap_cache()
> __delete_from_swap_cache()
> radix_tree_delete()
>
>>From the comments around the truncate bit it seems to be ok with keeping
>the page as anonymous, however the zap_pte_range() thing does seem to
>want to have a stable page_count().
>
However when I last looked at it, it count can be elevated there for other
reasons (I think it was swap IO or get_user_pages or something). Anyway,
those pages will remain on the LRU and eventually get reclaimed.
I did initially change that code around a little bit, but I remember
working through it with Hugh and we decided that it would be OK as it was.
It should indeed be commented though.
>>+ * following (with tree_lock held for write):
>>+ * A. atomically check refcount is correct and set it to 0 (atomic_cmpxchg)
>>+ * B. remove page from pagecache
>>+ * C. free the page
>>+ *
>>+ * There are 2 critical interleavings that matter:
>>+ * - 2 runs before A: in this case, A sees elevated refcount and bails out
>>+ * - A runs before 2: in this case, 2 sees zero refcount and retries;
>>+ * subsequently, B will complete and 1 will find no page, causing the
>>+ * lookup to return NULL.
>>+ *
>>+ * It is possible that between 1 and 2, the page is removed then the exact same
>>+ * page is inserted into the same position in pagecache. That's OK: the
>>+ * old find_get_page using tree_lock could equally have run before or after
>>+ * such a re-insertion, depending on order that locks are granted.
>>+ *
>>+ * Lookups racing against pagecache insertion isn't a big problem: either 1
>>+ * will find the page or it will not. Likewise, the old find_get_page could run
>>+ * either before the insertion or afterwards, depending on timing.
>>+ */
>>
>
>Awesome code ;-)
>
Thanks :) Well, thank Hugh.
--
Send instant messages to your online friends http://au.messenger.yahoo.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2006-09-25 22:41 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-09-22 19:22 [patch 0/4] lockless pagecache for 2.6.18-rc7-mm1 Nick Piggin
2006-09-22 19:22 ` [patch 2/4] radix-tree: use indirect bit Nick Piggin
2006-09-22 19:22 ` [patch 2/9] radix-tree: gang_lookup_slot Nick Piggin
2006-09-22 19:22 ` [patch 3/9] mm: speculative get page Nick Piggin
2006-09-23 10:01 ` Peter Zijlstra
2006-09-24 18:01 ` Hugh Dickins
2006-09-25 2:00 ` Nick Piggin
2006-09-25 11:47 ` Nick Piggin
2006-09-25 13:04 ` Peter Zijlstra
2006-09-25 22:41 ` Nick Piggin [this message]
2006-09-22 19:22 ` [patch 4/9] mm: lockless pagecache lookups Nick Piggin
2006-09-22 20:01 ` Lee Schermerhorn
2006-09-23 2:35 ` Nick Piggin
2006-09-22 19:24 ` [patch 0/4] lockless pagecache for 2.6.18-rc7-mm1 Nick Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=45185B33.6060109@yahoo.com.au \
--to=nickpiggin@yahoo.com.au \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@osdl.org \
--cc=hugh@veritas.com \
--cc=linux-mm@kvack.org \
--cc=npiggin@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.