From: Jeremy Fitzhardinge <jeremy@goop.org>
To: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>,
Zachary Amsden <zach@vmware.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Rusty Russell <rusty@rustcorp.com.au>, Andi Kleen <ak@suse.de>,
Keir Fraser <keir@xensource.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: race with page_referenced_one->ptep_test_and_clear_young and pagetable setup/pulldown
Date: Fri, 05 Oct 2007 12:39:42 -0700 [thread overview]
Message-ID: <470692FE.6070203@goop.org> (raw)
In-Reply-To: <Pine.LNX.4.64.0710051146300.25518@blonde.wat.veritas.com>
Hugh Dickins wrote:
> I see the discussion has somehow moved on to pagetable locking -
> mysterious because the word "lock" never once appears in your
> otherwise very helpful elucidation below, for which many thanks.
>
It has to be taken with a grain of salt. The reason "lock" isn't
mentioned is because I mis-analyzed the situation, and overlooked that
page_referenced_one does actually take the pte's lock.
> Maybe what I have to add is now of historical interest only, or none,
> but I was prevented from answering your original mail earlier...
>
> On Thu, 4 Oct 2007, Jeremy Fitzhardinge wrote:
>
>
>> David's change 10a8d6ae4b3182d6588a5809a8366343bc295c20, "i386: add
>> ptep_test_and_clear_{dirty,young}" has introduced an SMP race which
>> affects the Xen pv-ops backend.
>>
>
> I think that race with Xen has been there all along, not introduced
> by David's commit. Take a look at ptep_clear_flush_young() in the
> 2.6.21 include/asm-i386/pgtable.h, it looks equally a problem to me.
>
Yes, but I don't think its a problem with correct locking. set_pte_at
is a red herring.
>> When a pagetable is first created (either in execve or fork), the the
>> Xen paravirt backend pins the pagetable, and conversely, on exit it is
>> unpinned; this is done via the arch_dup_mmap() and activate_mm() hooks.
>> Pinning is done in two phases: first the pagetable pages are marked RO,
>> and then the pagetable is registered with Xen; unpinning is the
>>
>
> To my naive mind, your problem actually lies in those two stages:
> whatever marks the pages RO should not be keeping Xen in ignorance.
>
No, its the Xen-specific kernel code which does the RO marking: it marks
the pagetables RO (using a hypercall to make the actual modifications),
and then tells the hypervisor to pin the whole pagetable. It can't be
done atomically, so there's always a window between the two phases.
>> It all worked OK before David's change, because asm-generic/pgtable.h
>> uses set_pte_at(), which ends up making a hypercall to update the
>> pagetable, which always works regardless of the state of the pagetable
>> pages.
>>
>
> Except ptep_clear_flush_young() didn't use set_pte_at().
>
Yes, my mistake.
>> 3. Restructure the pagetable setup code so that the mm is not added
>> to the prio tree until after arch_dup_mmap has been called (and
>> the converse for exit_mmap). This is arguably cleaner, but I
>> haven't looked to see how much trouble this would be.
>>
>
> No. It is intentional that we make those ptes visible as early as
> possible, so that concurrent pageout (and less importantly swapoff)
> has the best chance of finding all references to a page (or swap ent).
> If they only became visible at the final arch_dup_mmap stage, then
> it might become impossible to fork a large well-populated mm, if it
> contains those very pages which need to be freed to make space to
> allocate pagetables for the child.
Hm, OK.
J
next prev parent reply other threads:[~2007-10-05 19:40 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-10-05 1:43 race with page_referenced_one->ptep_test_and_clear_young and pagetable setup/pulldown Jeremy Fitzhardinge
2007-10-05 1:52 ` Rik van Riel
2007-10-05 4:15 ` Jeremy Fitzhardinge
2007-10-05 13:17 ` Rik van Riel
2007-10-05 2:44 ` Andrew Morton
2007-10-05 4:08 ` Jeremy Fitzhardinge
2007-10-07 9:52 ` Nick Piggin
2007-10-05 11:36 ` Hugh Dickins
2007-10-05 18:58 ` Rik van Riel
2007-10-05 19:40 ` Jeremy Fitzhardinge
2007-10-05 19:56 ` Rik van Riel
2007-10-05 19:39 ` Jeremy Fitzhardinge [this message]
[not found] <C32B9BEC.E711%keir@xensource.com>
2007-10-05 8:03 ` Andi Kleen
2007-10-05 9:05 ` Jeremy Fitzhardinge
2007-10-05 9:15 ` Keir Fraser
2007-10-05 15:33 ` Hugh Dickins
2007-10-05 15:46 ` Keir Fraser
2007-10-05 16:48 ` Jeremy Fitzhardinge
2007-10-05 20:35 ` Jeremy Fitzhardinge
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=470692FE.6070203@goop.org \
--to=jeremy@goop.org \
--cc=ak@suse.de \
--cc=akpm@linux-foundation.org \
--cc=hugh@veritas.com \
--cc=keir@xensource.com \
--cc=linux-kernel@vger.kernel.org \
--cc=rientjes@google.com \
--cc=rusty@rustcorp.com.au \
--cc=torvalds@linux-foundation.org \
--cc=zach@vmware.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.