All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nick Piggin <npiggin@suse.de>
To: Hugh Dickins <hugh@veritas.com>
Cc: Andrew Morton <akpm@osdl.org>, linux-mm@kvack.org
Subject: Re: [patch 1/2] mm: speculative get_page
Date: Mon, 7 Aug 2006 16:51:01 +0200	[thread overview]
Message-ID: <20060807145101.GF4433@wotan.suse.de> (raw)
In-Reply-To: <Pine.LNX.4.64.0608071530210.10881@blonde.wat.veritas.com>

On Mon, Aug 07, 2006 at 03:37:12PM +0100, Hugh Dickins wrote:
> On Mon, 7 Aug 2006, Nick Piggin wrote:
> > On Mon, Aug 07, 2006 at 11:11:15AM +0100, Hugh Dickins wrote:
> > > 
> > > Yes, I understand why remove_mapping and migrate_page_move_mapping
> > > (on page) do the PageNoNewRefs business; but why do add_to_page_cache,
> > > __add_to_swap_cache and migrate_page_move_mapping (on newpage) do it?
> > 
> > add_to_*_cache(), because they insert the page *then* set up fields
> > in the page. Without the bit set, the page is visible to pagecache
> > as soon as it hits the radix tree.
> 
> Aha, thank you.
> 
> > In the page_cache case, I have a subsequent patch to rearrange this a bit,
> > and reduce the number of atomic ops. I thought it would just add too much
> > to review for now, though.
> 
> Well, it's a slightly different use for PageNoNewRefs, and would need
> to be commented if it stays: I'd recommend avoiding the need for that
> comment and the unnecessary atomics, doing your rearrangement in a
> preceding patch.

It's the same use in that when you combine tree_lock with the page
bit, you get the same semantics as the old write_lock(&tree_lock).

What I mean is: if the current slightly different uses of tree_lock
don't warrant different comments, then I'm don't see that PG_nnr
does either.

> 
> Though maybe cleaner to have mapping/index/SwapCache/private properly
> set before inserting page into radix tree, page_cache_get_speculative
> callers all have to check afterwards and repeat if wrong; so the only
> thing that's essential to do earlier is the SetPageLocked, isn't it?

Something like that, yes.

> 
> On the subject of mapping/index, I think there's potential for a very
> very unlikely race you're ignoring, a race you can blame on me and my
> passion for squeezing in alternative uses of struct page fields:
> 
> Isn't it conceivable that a page_cache_get_speculative finds a page
> in the radix tree, but by the time its callers do those mapping/index
> checks, that page is reused for some other purpose completely, which
> happens to set the field formerly known as page->mapping to something
> (perhaps a sequence of 4 or 8 random bytes) identical to what was
> there before (and leaves index untouched, or changes it to the same)?
> 
> I'm thinking particularly of the per-pagetable page spinlock, where
> what goes into page->mapping depends on CONFIG_DEBUG_SPINLOCK de jour.
> 
> I think we can probably (but I've not tried) satisfy ourselves that
> there's currently no way that can happen; but how shall we prevent
> ourselves from later making a change which opens up the possibility?
> (By passing my address to a hitman, perhaps.)
> 
> An alternative would be to go more the radix_tree_lookup_slot way,
> and the checks be on page remaining in slot; but I think you comment
> that it cannot be used for RCU lookups, I didn't investigate further.
> 
> This is not a grave concern: but (unless I'm plain wrong) we do need
> to be aware of it.

No, it is something I'm worried about too. And definitely the lookup_slot
approach would solve it. I'm inclined to go back to the lookup_slot
method which would solve the weird gang lookup problems that come about
with this approach. And as another bonus we don't need find_get_swap_page.

The problem is not so much with RCU lookups as with the direct-data
patch: one can take the address of a slot in a radix-tree node with the
knowledge that, under RCU lock, it will always dereference to either
NULL or a valid item.

However, direct data stores 0-height trees' item at ->rnode. But this
can also be switched to point to a radix-tree node or vice versa at
any time.

The solution is just a little bit more API to do the dereferencing work
for us. Shouldn't be a big problem.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2006-08-07 14:51 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-07-26  6:39 [patch 1/2] mm: speculative get_page Nick Piggin
2006-07-31 15:35 ` Andy Whitcroft
2006-08-01  8:45   ` Nick Piggin
2006-08-07 10:11 ` Hugh Dickins
     [not found]   ` <20060807132633.GD4433@wotan.suse.de>
2006-08-07 14:37     ` Hugh Dickins
2006-08-07 14:51       ` Nick Piggin [this message]
  -- strict thread matches above, loose matches on Subject: below --
2006-08-01 19:32 Oleg Nesterov
2006-08-01 19:32 ` Oleg Nesterov
2006-08-01 15:55 ` Dave Kleikamp
2006-08-01 15:55   ` Dave Kleikamp
2006-08-01 20:42   ` Oleg Nesterov
2006-08-01 20:42     ` Oleg Nesterov
2006-08-01 23:53     ` Nick Piggin
2006-08-01 23:53       ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060807145101.GF4433@wotan.suse.de \
    --to=npiggin@suse.de \
    --cc=akpm@osdl.org \
    --cc=hugh@veritas.com \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.