All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>, Andrew Morton <akpm@osdl.org>,
	Linux Memory Management <linux-mm@kvack.org>
Subject: Re: [patch 3/9] mm: speculative get page
Date: Mon, 25 Sep 2006 15:04:13 +0200	[thread overview]
Message-ID: <1159189453.5018.25.camel@lappy> (raw)
In-Reply-To: <20060925114739.GA31148@wotan.suse.de>

On Mon, 2006-09-25 at 13:47 +0200, Nick Piggin wrote:

> +/*
> + * speculatively take a reference to a page.
> + * If the page is free (_count == 0), then _count is untouched, and 0
> + * is returned. Otherwise, _count is incremented by 1 and 1 is returned.
> + *
> + * This function must be run in the same rcu_read_lock() section as has
> + * been used to lookup the page in the pagecache radix-tree: this allows
> + * allocators to use a synchronize_rcu() to stabilize _count.
> + *
> + * Unless an RCU grace period has passed, the count of all pages coming out
> + * of the allocator must be considered unstable. page_count may return higher
> + * than expected, and put_page must be able to do the right thing when the
> + * page has been finished with (because put_page is what is used to drop an
> + * invalid speculative reference).
> + *
> + * This forms the core of the lockless pagecache locking protocol, where
> + * the lookup-side (eg. find_get_page) has the following pattern:
> + * 1. find page in radix tree
> + * 2. conditionally increment refcount
> + * 3. check the page is still in pagecache (if no, goto 1)
> + *
> + * Remove-side that cares about stability of _count (eg. reclaim) has the
                                   ^^^^^^^^^^^^^^^^^^^
is that the reason that the following two code paths are good without
change:

  truncate_inode_page_range()
   truncate_complete_page()
     remove_from_page_cache()
       radix_tree_delete()

and

  zap_pte_range()
    free_swap_and_cache()  <-- does check page_count()
      delete_from_swap_cache()
        __delete_from_swap_cache()
          radix_tree_delete()

>From the comments around the truncate bit it seems to be ok with keeping
the page as anonymous, however the zap_pte_range() thing does seem to
want to have a stable page_count().

> + * following (with tree_lock held for write):
> + * A. atomically check refcount is correct and set it to 0 (atomic_cmpxchg)
> + * B. remove page from pagecache
> + * C. free the page
> + *
> + * There are 2 critical interleavings that matter:
> + * - 2 runs before A: in this case, A sees elevated refcount and bails out
> + * - A runs before 2: in this case, 2 sees zero refcount and retries;
> + *   subsequently, B will complete and 1 will find no page, causing the
> + *   lookup to return NULL.
> + *
> + * It is possible that between 1 and 2, the page is removed then the exact same
> + * page is inserted into the same position in pagecache. That's OK: the
> + * old find_get_page using tree_lock could equally have run before or after
> + * such a re-insertion, depending on order that locks are granted.
> + *
> + * Lookups racing against pagecache insertion isn't a big problem: either 1
> + * will find the page or it will not. Likewise, the old find_get_page could run
> + * either before the insertion or afterwards, depending on timing.
> + */

Awesome code ;-)


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2006-09-25 13:04 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-09-22 19:22 [patch 0/4] lockless pagecache for 2.6.18-rc7-mm1 Nick Piggin
2006-09-22 19:22 ` [patch 2/4] radix-tree: use indirect bit Nick Piggin
2006-09-22 19:22 ` [patch 2/9] radix-tree: gang_lookup_slot Nick Piggin
2006-09-22 19:22 ` [patch 3/9] mm: speculative get page Nick Piggin
2006-09-23 10:01   ` Peter Zijlstra
2006-09-24 18:01   ` Hugh Dickins
2006-09-25  2:00     ` Nick Piggin
2006-09-25 11:47       ` Nick Piggin
2006-09-25 13:04         ` Peter Zijlstra [this message]
2006-09-25 22:41           ` Nick Piggin
2006-09-22 19:22 ` [patch 4/9] mm: lockless pagecache lookups Nick Piggin
2006-09-22 20:01   ` Lee Schermerhorn
2006-09-23  2:35     ` Nick Piggin
2006-09-22 19:24 ` [patch 0/4] lockless pagecache for 2.6.18-rc7-mm1 Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1159189453.5018.25.camel@lappy \
    --to=a.p.zijlstra@chello.nl \
    --cc=akpm@osdl.org \
    --cc=hugh@veritas.com \
    --cc=linux-mm@kvack.org \
    --cc=npiggin@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.