From: Nick Piggin <nickpiggin@yahoo.com.au>
To: William Lee Irwin III <wli@holomorphy.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
Linux Memory Management <linux-mm@kvack.org>
Subject: Re: [patch 2] mm: speculative get_page
Date: Tue, 28 Jun 2005 10:03:00 +1000 [thread overview]
Message-ID: <42C093B4.3010707@yahoo.com.au> (raw)
In-Reply-To: <20050627141220.GM3334@holomorphy.com>
William Lee Irwin III wrote:
> On Mon, Jun 27, 2005 at 04:32:38PM +1000, Nick Piggin wrote:
>
>>+static inline struct page *page_cache_get_speculative(struct page **pagep)
>>+{
>>+ struct page *page;
>>+
>>+ preempt_disable();
>>+ page = *pagep;
>>+ if (!page)
>>+ goto out_failed;
>>+
>>+ if (unlikely(get_page_testone(page))) {
>>+ /* Picked up a freed page */
>>+ __put_page(page);
>>+ goto out_failed;
>>+ }
>
>
> So you pick up 0->1 refcount transitions.
>
Yep ie. a page that's freed or being freed.
>
> On Mon, Jun 27, 2005 at 04:32:38PM +1000, Nick Piggin wrote:
>
>>+ /*
>>+ * preempt can really be enabled here (only needs to be disabled
>>+ * because page allocation can spin on the elevated refcount, but
>>+ * we don't want to hold a reference on an unrelated page for too
>>+ * long, so keep preempt off until we know we have the right page
>>+ */
>>+
>>+ if (unlikely(PageFreeing(page)) ||
>
>
> SetPageFreeing is only done in shrink_list(), so other pages in the
> buddy bitmaps and/or pagecache pages freed by other methods may not
It is also done by remove_exclusive_swap_page, although that hunk
leaked into a later patch (#5), sorry.
Other methods (eg truncate) don't seem to have an atomicity guarantee
anyway - ie. it is valid to pick up a reference on a page that is
just about to get truncated. PageFreeing is only used when some code
is making an assumption about the number of users of the page.
> be found by this. There's also likely trouble with higher-order pages.
>
There isn't because higher order pages aren't used for pagecache.
>
> On Mon, Jun 27, 2005 at 04:32:38PM +1000, Nick Piggin wrote:
>
>>+ unlikely(page != *pagep)) {
>>+ /* Picked up a page being freed, or one that's been reused */
>>+ put_page(page);
>>+ goto out_failed;
>>+ }
>>+ preempt_enable();
>>+
>>+ return page;
>>+
>>+out_failed:
>>+ preempt_enable();
>>+ return NULL;
>>+}
>
>
> page != *pagep won't be reliably tripped unless the pagecache
> modification has the appropriate memory barriers.
>
There are appropriate memory barriers: the radix tree is
modified uner the rwlock/spinlock, and this function has
a memory barrier before testing page != *pagep.
> The lockless radix tree lookups are a harder problem than this, and
> the implementation didn't look promising. I have other problems to deal
> with so I'm not going to go very far into this.
>
What's wrong with the lockless radix tree lookups?
> While I agree that locklessness is the right direction for the
> pagecache to go, this RFC seems to have too far to go to use it to
> conclude anything about the subject.
>
You don't seem to have looked enough to conclude anything about it.
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
WARNING: multiple messages have this Message-ID (diff)
From: Nick Piggin <nickpiggin@yahoo.com.au>
To: William Lee Irwin III <wli@holomorphy.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
Linux Memory Management <linux-mm@kvack.org>
Subject: Re: [patch 2] mm: speculative get_page
Date: Tue, 28 Jun 2005 10:03:00 +1000 [thread overview]
Message-ID: <42C093B4.3010707@yahoo.com.au> (raw)
In-Reply-To: <20050627141220.GM3334@holomorphy.com>
William Lee Irwin III wrote:
> On Mon, Jun 27, 2005 at 04:32:38PM +1000, Nick Piggin wrote:
>
>>+static inline struct page *page_cache_get_speculative(struct page **pagep)
>>+{
>>+ struct page *page;
>>+
>>+ preempt_disable();
>>+ page = *pagep;
>>+ if (!page)
>>+ goto out_failed;
>>+
>>+ if (unlikely(get_page_testone(page))) {
>>+ /* Picked up a freed page */
>>+ __put_page(page);
>>+ goto out_failed;
>>+ }
>
>
> So you pick up 0->1 refcount transitions.
>
Yep ie. a page that's freed or being freed.
>
> On Mon, Jun 27, 2005 at 04:32:38PM +1000, Nick Piggin wrote:
>
>>+ /*
>>+ * preempt can really be enabled here (only needs to be disabled
>>+ * because page allocation can spin on the elevated refcount, but
>>+ * we don't want to hold a reference on an unrelated page for too
>>+ * long, so keep preempt off until we know we have the right page
>>+ */
>>+
>>+ if (unlikely(PageFreeing(page)) ||
>
>
> SetPageFreeing is only done in shrink_list(), so other pages in the
> buddy bitmaps and/or pagecache pages freed by other methods may not
It is also done by remove_exclusive_swap_page, although that hunk
leaked into a later patch (#5), sorry.
Other methods (eg truncate) don't seem to have an atomicity guarantee
anyway - ie. it is valid to pick up a reference on a page that is
just about to get truncated. PageFreeing is only used when some code
is making an assumption about the number of users of the page.
> be found by this. There's also likely trouble with higher-order pages.
>
There isn't because higher order pages aren't used for pagecache.
>
> On Mon, Jun 27, 2005 at 04:32:38PM +1000, Nick Piggin wrote:
>
>>+ unlikely(page != *pagep)) {
>>+ /* Picked up a page being freed, or one that's been reused */
>>+ put_page(page);
>>+ goto out_failed;
>>+ }
>>+ preempt_enable();
>>+
>>+ return page;
>>+
>>+out_failed:
>>+ preempt_enable();
>>+ return NULL;
>>+}
>
>
> page != *pagep won't be reliably tripped unless the pagecache
> modification has the appropriate memory barriers.
>
There are appropriate memory barriers: the radix tree is
modified uner the rwlock/spinlock, and this function has
a memory barrier before testing page != *pagep.
> The lockless radix tree lookups are a harder problem than this, and
> the implementation didn't look promising. I have other problems to deal
> with so I'm not going to go very far into this.
>
What's wrong with the lockless radix tree lookups?
> While I agree that locklessness is the right direction for the
> pagecache to go, this RFC seems to have too far to go to use it to
> conclude anything about the subject.
>
You don't seem to have looked enough to conclude anything about it.
--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"aart@kvack.org"> aart@kvack.org </a>
next prev parent reply other threads:[~2005-06-28 0:03 UTC|newest]
Thread overview: 106+ messages / expand[flat|nested] mbox.gz Atom feed top
2005-06-27 6:29 [rfc] lockless pagecache Nick Piggin
2005-06-27 6:29 ` Nick Piggin
2005-06-27 6:32 ` [patch 1] mm: PG_free flag Nick Piggin
2005-06-27 6:32 ` [patch 2] mm: speculative get_page Nick Piggin
2005-06-27 6:33 ` [patch 3] radix tree: lookup_slot Nick Piggin
2005-06-27 6:34 ` [patch 4] radix tree: lockless readside Nick Piggin
2005-06-27 6:34 ` [patch 5] mm: lockless pagecache lookups Nick Piggin
2005-06-27 6:35 ` [patch 6] mm: spinlock tree_lock Nick Piggin
2005-06-27 14:12 ` [patch 2] mm: speculative get_page William Lee Irwin III
2005-06-27 14:12 ` William Lee Irwin III
2005-06-28 0:03 ` Nick Piggin [this message]
2005-06-28 0:03 ` Nick Piggin
2005-06-28 0:56 ` Nick Piggin
2005-06-28 0:56 ` Nick Piggin
2005-06-28 1:22 ` William Lee Irwin III
2005-06-28 1:22 ` William Lee Irwin III
2005-06-28 1:42 ` Nick Piggin
2005-06-28 1:42 ` Nick Piggin
2005-06-28 4:06 ` William Lee Irwin III
2005-06-28 4:06 ` William Lee Irwin III
2005-06-28 4:50 ` Nick Piggin
2005-06-28 4:50 ` Nick Piggin
2005-06-28 5:08 ` David S. Miller
2005-06-28 5:08 ` [patch 2] mm: speculative get_page, " David S. Miller, Nick Piggin
2005-06-28 5:34 ` Nick Piggin
2005-06-28 5:34 ` Nick Piggin
2005-06-28 14:19 ` William Lee Irwin III
2005-06-28 14:19 ` William Lee Irwin III
2005-06-28 15:43 ` Nick Piggin
2005-06-28 15:43 ` Nick Piggin
2005-06-28 17:01 ` Christoph Lameter
2005-06-28 17:01 ` Christoph Lameter
2005-06-28 23:10 ` Nick Piggin
2005-06-28 23:10 ` Nick Piggin
2005-06-28 21:32 ` Jesse Barnes
2005-06-28 21:32 ` Jesse Barnes
2005-06-28 22:17 ` Christoph Lameter
2005-06-28 22:17 ` Christoph Lameter
2005-06-28 12:45 ` Andy Whitcroft
2005-06-28 12:45 ` Andy Whitcroft
2005-06-28 13:16 ` Nick Piggin
2005-06-28 13:16 ` Nick Piggin
2005-06-28 16:02 ` Dave Hansen
2005-06-28 16:02 ` Dave Hansen
2005-06-29 16:31 ` Pavel Machek
2005-06-29 16:31 ` Pavel Machek
2005-06-29 18:43 ` Dave Hansen
2005-06-29 18:43 ` Dave Hansen
2005-06-29 21:22 ` Pavel Machek
2005-06-29 21:22 ` Pavel Machek
2005-06-29 16:31 ` Pavel Machek
2005-06-29 16:31 ` Pavel Machek
2005-06-27 6:43 ` VFS scalability (was: [rfc] lockless pagecache) Nick Piggin
2005-06-27 6:43 ` Nick Piggin
2005-06-27 7:13 ` Andi Kleen
2005-06-27 7:13 ` Andi Kleen
2005-06-27 7:33 ` VFS scalability Nick Piggin
2005-06-27 7:33 ` Nick Piggin
2005-06-27 7:44 ` Andi Kleen
2005-06-27 7:44 ` Andi Kleen
2005-06-27 8:03 ` Nick Piggin
2005-06-27 8:03 ` Nick Piggin
2005-06-27 7:46 ` [rfc] lockless pagecache Andrew Morton
2005-06-27 7:46 ` Andrew Morton
2005-06-27 8:02 ` Nick Piggin
2005-06-27 8:02 ` Nick Piggin
2005-06-27 8:15 ` Andrew Morton
2005-06-27 8:15 ` Andrew Morton
2005-06-27 8:28 ` Nick Piggin
2005-06-27 8:28 ` Nick Piggin
2005-06-27 8:56 ` Lincoln Dale
2005-06-27 8:56 ` Lincoln Dale
2005-06-27 9:04 ` Nick Piggin
2005-06-27 9:04 ` Nick Piggin
2005-06-27 18:14 ` Chen, Kenneth W
2005-06-27 18:14 ` Chen, Kenneth W
2005-06-27 18:50 ` Badari Pulavarty
2005-06-27 18:50 ` Badari Pulavarty
2005-06-27 19:05 ` Chen, Kenneth W
2005-06-27 19:05 ` Chen, Kenneth W
2005-06-27 19:22 ` Christoph Lameter
2005-06-27 19:22 ` Christoph Lameter
2005-06-27 19:42 ` Chen, Kenneth W
2005-06-27 19:42 ` Chen, Kenneth W
2005-07-05 15:11 ` Sonny Rao
2005-07-05 15:11 ` Sonny Rao
2005-07-05 15:31 ` Martin J. Bligh
2005-07-05 15:31 ` Martin J. Bligh
2005-07-05 15:37 ` Sonny Rao
2005-07-05 15:37 ` Sonny Rao
2005-06-27 13:17 ` Benjamin LaHaise
2005-06-27 13:17 ` Benjamin LaHaise
2005-06-28 0:32 ` Nick Piggin
2005-06-28 0:32 ` Nick Piggin
2005-06-28 1:26 ` William Lee Irwin III
2005-06-28 1:26 ` William Lee Irwin III
2005-06-27 14:08 ` Martin J. Bligh
2005-06-27 14:08 ` Martin J. Bligh
2005-06-27 17:49 ` Christoph Lameter
2005-06-27 17:49 ` Christoph Lameter
2005-06-29 10:49 ` Hirokazu Takahashi
2005-06-29 10:49 ` Hirokazu Takahashi
2005-06-29 11:38 ` Nick Piggin
2005-06-29 11:38 ` Nick Piggin
2005-06-30 3:32 ` Hirokazu Takahashi
2005-06-30 3:32 ` Hirokazu Takahashi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=42C093B4.3010707@yahoo.com.au \
--to=nickpiggin@yahoo.com.au \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=wli@holomorphy.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.