* [PATCH] mm: fix hang on anon_vma->root->lock @ 2010-08-26 6:12 Hugh Dickins 2010-08-26 6:41 ` David Miller ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Hugh Dickins @ 2010-08-26 6:12 UTC (permalink / raw) To: Linus Torvalds Cc: Andrew Morton, Rik van Riel, Andrea Arcangeli, linux-kernel, linux-mm After several hours, kbuild tests hang with anon_vma_prepare() spinning on a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y (which makes this very much more likely, but it could happen without). The ever-subtle page_lock_anon_vma() now needs a further twist: since anon_vma_prepare() and anon_vma_fork() are liable to change the ->root of a reused anon_vma structure at any moment, page_lock_anon_vma() needs to check page_mapped() again before succeeding, otherwise page_unlock_anon_vma() might address a different root->lock. Signed-off-by: Hugh Dickins <hughd@google.com> --- mm/rmap.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) --- 2.6.36-rc2/mm/rmap.c 2010-08-16 00:18:01.000000000 -0700 +++ linux/mm/rmap.c 2010-08-24 03:22:17.000000000 -0700 @@ -316,7 +316,7 @@ void __init anon_vma_init(void) */ struct anon_vma *page_lock_anon_vma(struct page *page) { - struct anon_vma *anon_vma; + struct anon_vma *anon_vma, *root_anon_vma; unsigned long anon_mapping; rcu_read_lock(); @@ -327,8 +327,21 @@ struct anon_vma *page_lock_anon_vma(stru goto out; anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); - anon_vma_lock(anon_vma); - return anon_vma; + root_anon_vma = ACCESS_ONCE(anon_vma->root); + spin_lock(&root_anon_vma->lock); + + /* + * If this page is still mapped, then its anon_vma cannot have been + * freed. But if it has been unmapped, we have no security against + * the anon_vma structure being freed and reused (for another anon_vma: + * SLAB_DESTROY_BY_RCU guarantees that - so the spin_lock above cannot + * corrupt): with anon_vma_prepare() or anon_vma_fork() redirecting + * anon_vma->root before page_unlock_anon_vma() is called to unlock. + */ + if (page_mapped(page)) + return anon_vma; + + spin_unlock(&root_anon_vma->lock); out: rcu_read_unlock(); return NULL; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-26 6:12 [PATCH] mm: fix hang on anon_vma->root->lock Hugh Dickins @ 2010-08-26 6:41 ` David Miller 2010-08-26 10:54 ` Hugh Dickins 2010-08-26 13:32 ` Rik van Riel 2010-08-26 23:50 ` Andrea Arcangeli 2 siblings, 1 reply; 22+ messages in thread From: David Miller @ 2010-08-26 6:41 UTC (permalink / raw) To: hughd; +Cc: torvalds, akpm, riel, aarcange, linux-kernel, linux-mm From: Hugh Dickins <hughd@google.com> Date: Wed, 25 Aug 2010 23:12:54 -0700 (PDT) > After several hours, kbuild tests hang with anon_vma_prepare() spinning on > a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y > (which makes this very much more likely, but it could happen without). > > The ever-subtle page_lock_anon_vma() now needs a further twist: since > anon_vma_prepare() and anon_vma_fork() are liable to change the ->root > of a reused anon_vma structure at any moment, page_lock_anon_vma() > needs to check page_mapped() again before succeeding, otherwise > page_unlock_anon_vma() might address a different root->lock. > > Signed-off-by: Hugh Dickins <hughd@google.com> Interesting, is the condition which allows this to trigger specific to this merge window or was it always possible? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-26 6:41 ` David Miller @ 2010-08-26 10:54 ` Hugh Dickins 2010-08-26 19:00 ` David Miller 2010-08-27 0:19 ` Andrea Arcangeli 0 siblings, 2 replies; 22+ messages in thread From: Hugh Dickins @ 2010-08-26 10:54 UTC (permalink / raw) To: David Miller; +Cc: torvalds, akpm, riel, aarcange, linux-kernel, linux-mm On Wed, Aug 25, 2010 at 11:41 PM, David Miller <davem@davemloft.net> wrote: > From: Hugh Dickins <hughd@google.com> > Date: Wed, 25 Aug 2010 23:12:54 -0700 (PDT) > >> After several hours, kbuild tests hang with anon_vma_prepare() spinning on >> a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y >> (which makes this very much more likely, but it could happen without). >> >> The ever-subtle page_lock_anon_vma() now needs a further twist: since >> anon_vma_prepare() and anon_vma_fork() are liable to change the ->root >> of a reused anon_vma structure at any moment, page_lock_anon_vma() >> needs to check page_mapped() again before succeeding, otherwise >> page_unlock_anon_vma() might address a different root->lock. >> >> Signed-off-by: Hugh Dickins <hughd@google.com> > > Interesting, is the condition which allows this to trigger specific > to this merge window or was it always possible? Just specific to this merge window, which started using anon_vma->root->lock in place of anon_vma->lock (anon_vma->root is often anon_vma itself, but not always). I _think_ that change was itself a simplification of the locking in 2.6.35, rather than plugging a particular hole (it's not been backported to -stable), but I may be wrong on that - Rik? Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-26 10:54 ` Hugh Dickins @ 2010-08-26 19:00 ` David Miller 2010-08-27 0:19 ` Andrea Arcangeli 1 sibling, 0 replies; 22+ messages in thread From: David Miller @ 2010-08-26 19:00 UTC (permalink / raw) To: hughd; +Cc: torvalds, akpm, riel, aarcange, linux-kernel, linux-mm From: Hugh Dickins <hughd@google.com> Date: Thu, 26 Aug 2010 03:54:28 -0700 > On Wed, Aug 25, 2010 at 11:41 PM, David Miller <davem@davemloft.net> wrote: >> From: Hugh Dickins <hughd@google.com> >> Date: Wed, 25 Aug 2010 23:12:54 -0700 (PDT) >> >>> After several hours, kbuild tests hang with anon_vma_prepare() spinning on >>> a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y >>> (which makes this very much more likely, but it could happen without). >>> >>> The ever-subtle page_lock_anon_vma() now needs a further twist: since >>> anon_vma_prepare() and anon_vma_fork() are liable to change the ->root >>> of a reused anon_vma structure at any moment, page_lock_anon_vma() >>> needs to check page_mapped() again before succeeding, otherwise >>> page_unlock_anon_vma() might address a different root->lock. >>> >>> Signed-off-by: Hugh Dickins <hughd@google.com> >> >> Interesting, is the condition which allows this to trigger specific >> to this merge window or was it always possible? > > Just specific to this merge window, which started using > anon_vma->root->lock in place of anon_vma->lock (anon_vma->root is > often anon_vma itself, but not always). Ok, because I've been seeing mysterious hangs recently on one of my sparc64 boxes and I was wondering if this might explain them :-) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-26 10:54 ` Hugh Dickins 2010-08-26 19:00 ` David Miller @ 2010-08-27 0:19 ` Andrea Arcangeli 1 sibling, 0 replies; 22+ messages in thread From: Andrea Arcangeli @ 2010-08-27 0:19 UTC (permalink / raw) To: Hugh Dickins; +Cc: David Miller, torvalds, akpm, riel, linux-kernel, linux-mm On Thu, Aug 26, 2010 at 03:54:28AM -0700, Hugh Dickins wrote: > On Wed, Aug 25, 2010 at 11:41 PM, David Miller <davem@davemloft.net> wrote: > > From: Hugh Dickins <hughd@google.com> > > Date: Wed, 25 Aug 2010 23:12:54 -0700 (PDT) > > > >> After several hours, kbuild tests hang with anon_vma_prepare() spinning on > >> a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y > >> (which makes this very much more likely, but it could happen without). > >> > >> The ever-subtle page_lock_anon_vma() now needs a further twist: since > >> anon_vma_prepare() and anon_vma_fork() are liable to change the ->root > >> of a reused anon_vma structure at any moment, page_lock_anon_vma() > >> needs to check page_mapped() again before succeeding, otherwise > >> page_unlock_anon_vma() might address a different root->lock. > >> > >> Signed-off-by: Hugh Dickins <hughd@google.com> > > > > Interesting, is the condition which allows this to trigger specific > > to this merge window or was it always possible? > > Just specific to this merge window, which started using > anon_vma->root->lock in place of anon_vma->lock (anon_vma->root is > often anon_vma itself, but not always). I _think_ that change was > itself a simplification of the locking in 2.6.35, rather than plugging > a particular hole (it's not been backported to -stable), but I may be > wrong on that - Rik? rmap_walk_anon isn't stable without the anon_vma->root->lock. This is because it only locks the local anon_vma and but the "vma" can be still modified under it in vma_adjust, so vma_address can fail and migration will crash. swapping is not issue as it's ok to miss a pte once in a while if the vma is under vma_adjust by the time the VM tries to unmap the page. See the anon_vma_lock added as well to vma_adjust (without it, it'd be just an useless exercise, but instead it's really plugging a real hole thanks to the anon_vma_lock addition to vma_adjust). Again not an issue unless you use migration (incidentally what memory compaction uses, and you know who the primary user of memory compaction is :). Secondly ksm_does_need_to_copy can't be fixed to cow only pages that are nonlinear as the page->mapping != anon_vma can now generate false positives. It's now trivial to fix with page->mapping->root != anon_vma->root. About your patch, it's a noop in my view... A single page_mapping check after rcu_read_lock is enough. And "anon_vma->root" can't change if page->mapping points to "anon_vma" at any time after rcu_read_lock returns. rcu_read_lock works for all anon_vma including anon_vma->root. If you were running the page_mapped() check on a different "page" then it could make a difference, but repeating it on the same page in the same rcu_read_lock protected critical section won't make a difference as far as the anon_vma freeing is concerned. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-26 6:12 [PATCH] mm: fix hang on anon_vma->root->lock Hugh Dickins 2010-08-26 6:41 ` David Miller @ 2010-08-26 13:32 ` Rik van Riel 2010-08-26 23:50 ` Andrea Arcangeli 2 siblings, 0 replies; 22+ messages in thread From: Rik van Riel @ 2010-08-26 13:32 UTC (permalink / raw) To: Hugh Dickins Cc: Linus Torvalds, Andrew Morton, Andrea Arcangeli, linux-kernel, linux-mm On 08/26/2010 02:12 AM, Hugh Dickins wrote: > After several hours, kbuild tests hang with anon_vma_prepare() spinning on > a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y > (which makes this very much more likely, but it could happen without). > > The ever-subtle page_lock_anon_vma() now needs a further twist: since > anon_vma_prepare() and anon_vma_fork() are liable to change the ->root > of a reused anon_vma structure at any moment, page_lock_anon_vma() > needs to check page_mapped() again before succeeding, otherwise > page_unlock_anon_vma() might address a different root->lock. > > Signed-off-by: Hugh Dickins<hughd@google.com> Reviewed-by: Rik van Riel <riel@redhat.com> And yes, AFAIK this code lived just in -mm up to now. -- All rights reversed -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-26 6:12 [PATCH] mm: fix hang on anon_vma->root->lock Hugh Dickins 2010-08-26 6:41 ` David Miller 2010-08-26 13:32 ` Rik van Riel @ 2010-08-26 23:50 ` Andrea Arcangeli 2010-08-27 1:43 ` Hugh Dickins 2 siblings, 1 reply; 22+ messages in thread From: Andrea Arcangeli @ 2010-08-26 23:50 UTC (permalink / raw) To: Hugh Dickins Cc: Linus Torvalds, Andrew Morton, Rik van Riel, linux-kernel, linux-mm Hi Hugh, On Wed, Aug 25, 2010 at 11:12:54PM -0700, Hugh Dickins wrote: > After several hours, kbuild tests hang with anon_vma_prepare() spinning on > a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y > (which makes this very much more likely, but it could happen without). > > The ever-subtle page_lock_anon_vma() now needs a further twist: since > anon_vma_prepare() and anon_vma_fork() are liable to change the ->root > of a reused anon_vma structure at any moment, page_lock_anon_vma() > needs to check page_mapped() again before succeeding, otherwise > page_unlock_anon_vma() might address a different root->lock. I don't get it, the anon_vma can be freed and reused only after we run rcu_read_unlock(). And the anon_vma->root can't change unless the anon_vma is freed and reused. Last but not the least by the time page->mapping points to "anon_vma" the "anon_vma->root" is already initialized and stable. The page_mapped test is only relevant against the rcu_read_lock, not the spin_lock, so how it can make a difference to run it twice inside the same rcu_read_lock protected critical section? The first one still is valid also after the anon_vma_lock() returns, it's not like that anon_vma_lock drops the rcu_read_lock internally. Furthermore no need of ACCESS_ONCE on the anon_vma->root because it can't change from under us as the anon_vma can't be freed from under us until rcu_read_unlock returns (after we verified the first time that page_mapped is true under the rcu_read_lock, which we already do before trying to take the anon_vma_lock). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-26 23:50 ` Andrea Arcangeli @ 2010-08-27 1:43 ` Hugh Dickins 2010-08-27 9:55 ` Andrea Arcangeli 0 siblings, 1 reply; 22+ messages in thread From: Hugh Dickins @ 2010-08-27 1:43 UTC (permalink / raw) To: Andrea Arcangeli Cc: Linus Torvalds, Andrew Morton, Rik van Riel, linux-kernel, linux-mm On Thu, Aug 26, 2010 at 4:50 PM, Andrea Arcangeli <aarcange@redhat.com> wrote: > On Wed, Aug 25, 2010 at 11:12:54PM -0700, Hugh Dickins wrote: >> After several hours, kbuild tests hang with anon_vma_prepare() spinning on >> a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y >> (which makes this very much more likely, but it could happen without). >> >> The ever-subtle page_lock_anon_vma() now needs a further twist: since >> anon_vma_prepare() and anon_vma_fork() are liable to change the ->root >> of a reused anon_vma structure at any moment, page_lock_anon_vma() >> needs to check page_mapped() again before succeeding, otherwise >> page_unlock_anon_vma() might address a different root->lock. > > I don't get it, the anon_vma can be freed and reused only after we run > rcu_read_unlock(). No. Between rcu_read_lock and rcu_read_unlock, once we've done the first (original) page_mapped test to make sure that this isn't just a long-stale page->mapping left over in there, SLAB_DESTROY_BY_RCU ensures that the slab page on which this "struct anon_vma" resides cannot be freed and reused for some other purpose e.g. a page of user data. But that piece of slab holding this "struct anon_vma" is liable to be freed and reused for another struct anon_vma at any point, until we've got the right lock on it. > And the anon_vma->root can't change unless the > anon_vma is freed and reused. Yes, but RCU is not protecting against that: all it's doing is guaranteeing that when we "speculatively" spin_lock the anon_vma, we won't be corrupting some other kind of structure or data. > Last but not the least by the time > page->mapping points to "anon_vma" the "anon_vma->root" is already > initialized and stable. Yes, but two things to be careful of there: one, we leave page->mapping pointing to the anon_vma maybe long after that address has got reused for something else, it's only when the page is finally freed that it's cleared (and there certainly was a good racing reason for that, but I'd have to think long and hard to reconstruct the sequence - OTOH it was a race between page_remove_rmap and page_add_anon_rmap); and two, notice how anon_vma_prepare() sets anon_vma->root = anon_vma on a newly allocated anon_vma, before it gets anon_vma_lock - so anon_vma->root can change underneath us at any point there, until we've got anon_vma_lock _and_ checked stability with a second page_mapped test. > > The page_mapped test is only relevant against the rcu_read_lock, not > the spin_lock, so how it can make a difference to run it twice inside > the same rcu_read_lock protected critical section? The first one still > is valid also after the anon_vma_lock() returns, it's not like that > anon_vma_lock drops the rcu_read_lock internally. > > Furthermore no need of ACCESS_ONCE on the anon_vma->root because it > can't change from under us as the anon_vma can't be freed from under > us until rcu_read_unlock returns (after we verified the first time > that page_mapped is true under the rcu_read_lock, which we already do > before trying to take the anon_vma_lock). I must rush off for a few hours: hopefully my assertions above shed some light., I think you're mistaking the role that RCU plays here. You remark in other mail "About your patch, it's a noop in my view..." - but seems quite an effective noop ;) Without the patch my kbuilds would hang usually within 6 hours - though one time they did manage 20 hours, I admit. They ran with the patch for 52 hours before I switched the machine over to something else. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-27 1:43 ` Hugh Dickins @ 2010-08-27 9:55 ` Andrea Arcangeli 2010-08-27 16:43 ` Hugh Dickins 0 siblings, 1 reply; 22+ messages in thread From: Andrea Arcangeli @ 2010-08-27 9:55 UTC (permalink / raw) To: Hugh Dickins Cc: Linus Torvalds, Andrew Morton, Rik van Riel, linux-kernel, linux-mm On Thu, Aug 26, 2010 at 06:43:31PM -0700, Hugh Dickins wrote: > some light., I think you're mistaking the role that RCU plays here. That's exactly correct, I thought it prevented reuse of the slab entry, not only of the whole slab... SLAB_DESTROY_BY_RCU is a lot more tricky to use than I though... However at the light of this, I think page_lock_anon_vma could have returned a freed and reused anon_vma well before the anon-vma changes. The anon_vma could have been freed after the first page_mapped check succeed but before taking the spinlock. I think, it worked fine because the rmap walks are robust enough just not to fall apart on a reused anon_vma while the lock is hold. It become a visible problem now because we were unlocking the wrong lock leading to a deadlock. But I guess it wasn't too intentional to return a reused anon_vma out of page_lock_anon_vma. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-27 9:55 ` Andrea Arcangeli @ 2010-08-27 16:43 ` Hugh Dickins 2010-08-27 17:13 ` Christoph Lameter 0 siblings, 1 reply; 22+ messages in thread From: Hugh Dickins @ 2010-08-27 16:43 UTC (permalink / raw) To: Andrea Arcangeli Cc: Linus Torvalds, Andrew Morton, Rik van Riel, Christoph Lameter, Peter Zijlstra, linux-kernel, linux-mm On Fri, Aug 27, 2010 at 2:55 AM, Andrea Arcangeli <aarcange@redhat.com> wrote: > On Thu, Aug 26, 2010 at 06:43:31PM -0700, Hugh Dickins wrote: >> some light., I think you're mistaking the role that RCU plays here. > > That's exactly correct, I thought it prevented reuse of the slab > entry, not only of the whole slab... SLAB_DESTROY_BY_RCU is a lot more > tricky to use than I though... > > However at the light of this, I think page_lock_anon_vma could have > returned a freed and reused anon_vma well before the anon-vma changes. > > The anon_vma could have been freed after the first page_mapped check > succeed but before taking the spinlock. I think, it worked fine > because the rmap walks are robust enough just not to fall apart on a > reused anon_vma while the lock is hold. It become a visible problem > now because we were unlocking the wrong lock leading to a > deadlock. But I guess it wasn't too intentional to return a reused > anon_vma out of page_lock_anon_vma. What you say there is all exactly right, except for "I guess it wasn't too intentional": it was intentional, and known that it all worked out okay in the rare case when a reused anon_vma got fed into the loops - the anon_vma, after all, is nothing more than a list of places where you may find the page mapped, it has never asserted that a page will be found everywhere that the anon_vma lists. I would have liked to say "well known" above, but perhaps well known only to me: you're certainly not the first to be surprised by this. IIRC both Christoph and Peter have at different times proposed patches to tighten up page_lock_anon_vma() to avoid returning a stale/reused anon_vma, probably both were dropped because neither was actually necessary, until now: I guess it's a good thing for understandability that anon_vma->root->lock now requires that we weed out that case. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-27 16:43 ` Hugh Dickins @ 2010-08-27 17:13 ` Christoph Lameter 2010-08-27 17:55 ` Hugh Dickins 0 siblings, 1 reply; 22+ messages in thread From: Christoph Lameter @ 2010-08-27 17:13 UTC (permalink / raw) To: Hugh Dickins Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel, Peter Zijlstra, linux-kernel, linux-mm On Fri, 27 Aug 2010, Hugh Dickins wrote: > I would have liked to say "well known" above, but perhaps well known > only to me: you're certainly not the first to be surprised by this. Most people dealing with this for the first time get through a discovery period. The network guys had similar problems when they first tried to use SLAB_DESTROY_BY_RCU. > IIRC both Christoph and Peter have at different times proposed patches > to tighten up page_lock_anon_vma() to avoid returning a stale/reused > anon_vma, probably both were dropped because neither was actually > necessary, until now: I guess it's a good thing for understandability > that anon_vma->root->lock now requires that we weed out that case. Right. We need to verify that the object we have reached is the correct one. The basic problem with SLAB_DESTROY_BY_RCU is that you get a reference to an object that is guaranteed only to have the same type (the instance may fluctuate and be replaced from under you unless other measures are taken). Typically one must take a lock within the memory structure to pin down the object (or take a refcount). Only then can you follow pointers and such. It is only possible to verify that the right object has been reached *after* locking. Following a pointer without having determined that we hit the right object should not occur. A solution here would be to take the anon_vma->lock (prevents the object switching under us) and then verify that the mapping is the one we are looking for and that the pointer points to the right root. Then take the root lock. Hughs solution takes a global spinlock which will limit scalability. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-27 17:13 ` Christoph Lameter @ 2010-08-27 17:55 ` Hugh Dickins 2010-08-27 19:29 ` Christoph Lameter 0 siblings, 1 reply; 22+ messages in thread From: Hugh Dickins @ 2010-08-27 17:55 UTC (permalink / raw) To: Christoph Lameter Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel, Peter Zijlstra, linux-kernel, linux-mm On Fri, Aug 27, 2010 at 10:13 AM, Christoph Lameter <cl@linux.com> wrote: > The basic problem with SLAB_DESTROY_BY_RCU is that you get a reference to > an object that is guaranteed only to have the same type (the instance may > fluctuate and be replaced from under you unless other measures are taken). (I wouldn't describe that as a "problem with SLAB_DESTROY_BY_RCU": it's precisely the nature of SLAB_DESTROY_BY_RCU, what makes it useful in solving backward-locking problems elsewhere.) > > Typically one must take a lock within the memory structure to pin down > the object (or take a refcount). Only then can you follow pointers and > such. It is only possible to verify that the right object has been > reached *after* locking. Following a pointer without having determined > that we hit the right object should not occur. > > A solution here would be to take the anon_vma->lock (prevents the object > switching under us) and then verify that the mapping is the one we are > looking for and that the pointer points to the right root. Then take the > root lock. > > Hughs solution takes a global spinlock which will limit scalability. Eh? My solution was a second page_mapped(page) test i.e. testing an atomic. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-27 17:55 ` Hugh Dickins @ 2010-08-27 19:29 ` Christoph Lameter 2010-08-27 20:14 ` Hugh Dickins 0 siblings, 1 reply; 22+ messages in thread From: Christoph Lameter @ 2010-08-27 19:29 UTC (permalink / raw) To: Hugh Dickins Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel, Peter Zijlstra, linux-kernel, linux-mm On Fri, 27 Aug 2010, Hugh Dickins wrote: > Eh? My solution was a second page_mapped(page) test i.e. testing an atomic. Argh. Right. Looked like a global to me. Did not see the earlier local def. If you still use a pointer then what does insure that the root pointer was not changed after the ACCESS_ONCE? The free semantics of an anon_vma? Since there is no lock taken before the mapped check none of the earlier reads from the anon vma structure nor the page mapped check necessarily reflect a single state of the anon_vma. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-27 19:29 ` Christoph Lameter @ 2010-08-27 20:14 ` Hugh Dickins 2010-08-27 20:56 ` Christoph Lameter 0 siblings, 1 reply; 22+ messages in thread From: Hugh Dickins @ 2010-08-27 20:14 UTC (permalink / raw) To: Christoph Lameter Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel, Peter Zijlstra, linux-kernel, linux-mm On Fri, Aug 27, 2010 at 12:29 PM, Christoph Lameter <cl@linux.com> wrote: > On Fri, 27 Aug 2010, Hugh Dickins wrote: > >> Eh? My solution was a second page_mapped(page) test i.e. testing an atomic. > > Argh. Right. Looked like a global to me. Did not see the earlier local > def. > > If you still use a pointer then what does insure that the root > pointer was not changed after the ACCESS_ONCE? The free semantics > of an anon_vma? Nothing ensures that the root pointer was not changed after the ACCESS_ONCE, that's exactly why we use ACCESS_ONCE there: once we've got the lock and realize that what we've locked may not be what we wanted (or may change from what we were wanting at any moment, the page no longer being mapped there - but in that case we no longer want it), we have to be sure to unlock the one we locked, rather than the one which anon_vma->root might subsequently point to. (Umm, maybe I'm not the clearest of explainers, sorry! If you get my point, fine; if it's gibberish to you, please ask me to try again.) > > Since there is no lock taken before the mapped check none of the > earlier reads from the anon vma structure nor the page mapped check > necessarily reflect a single state of the anon_vma. There's no lock (other than RCU's read "lock") taken before the original mapped check, and that's important, otherwise our attempt to lock might actually spinon or corrupt something that was long ago an anon_vma. But we do take the anon_vma->root->lock before the second mapped check which I added. If the page is still mapped at the point of that second check, then we know that we got the right anon_vma, that the page might be mapped in it, and anon_vma->root is not going to change underneath us before the page_unlock_anon_vma(). (The page may get unmapped at any time, the lock does not protect against that; but if it's still mapped once we hold the lock, free_pgtables() cannot free the anon_vma until we're done.) Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-27 20:14 ` Hugh Dickins @ 2010-08-27 20:56 ` Christoph Lameter 2010-08-27 21:28 ` Hugh Dickins 0 siblings, 1 reply; 22+ messages in thread From: Christoph Lameter @ 2010-08-27 20:56 UTC (permalink / raw) To: Hugh Dickins Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel, Peter Zijlstra, linux-kernel, linux-mm On Fri, 27 Aug 2010, Hugh Dickins wrote: > Nothing ensures that the root pointer was not changed after the > ACCESS_ONCE, that's exactly why we use ACCESS_ONCE there: once we've > got the lock and realize that what we've locked may not be what we > wanted (or may change from what we were wanting at any moment, the > page no longer being mapped there - but in that case we no longer want > it), we have to be sure to unlock the one we locked, rather than the > one which anon_vma->root might subsequently point to. I do not see any check after we have taken the lock to verify that we locked the correct object. Was there a second version of the patch? > > Since there is no lock taken before the mapped check none of the > > earlier reads from the anon vma structure nor the page mapped check > > necessarily reflect a single state of the anon_vma. > > There's no lock (other than RCU's read "lock") taken before the > original mapped check, and that's important, otherwise our attempt to > lock might actually spinon or corrupt something that was long ago an > anon_vma. But we do take the anon_vma->root->lock before the second > mapped check which I added. If the page is still mapped at the point You then are using an object from the anon_vma (the pointer) without a lock! This is unstable therefore unless there are other constraints. The anon_vma->lock must be taken before derefencing that pointer. The page may have been unmapped and mapped again between the two checks. Unlikely but possible. > of that second check, then we know that we got the right anon_vma, I do not see a second check (*after* taking the lock) in the patch and the way the lock is taken can be a problem in itself. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-27 20:56 ` Christoph Lameter @ 2010-08-27 21:28 ` Hugh Dickins 2010-08-27 21:33 ` Hugh Dickins 0 siblings, 1 reply; 22+ messages in thread From: Hugh Dickins @ 2010-08-27 21:28 UTC (permalink / raw) To: Christoph Lameter Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel, Peter Zijlstra, linux-kernel, linux-mm On Fri, Aug 27, 2010 at 1:56 PM, Christoph Lameter <cl@linux.com> wrote: > On Fri, 27 Aug 2010, Hugh Dickins wrote: > >> Nothing ensures that the root pointer was not changed after the >> ACCESS_ONCE, that's exactly why we use ACCESS_ONCE there: once we've >> got the lock and realize that what we've locked may not be what we >> wanted (or may change from what we were wanting at any moment, the >> page no longer being mapped there - but in that case we no longer want >> it), we have to be sure to unlock the one we locked, rather than the >> one which anon_vma->root might subsequently point to. > > I do not see any check after we have taken the lock to verify that we > locked the correct object. Was there a second version of the patch? No second version of the patch, no. As I said already, it's that second page_mapped check which gives the guarantee that the anon_vma has not yet been freed, hence we've locked the correct object. > >> > Since there is no lock taken before the mapped check none of the >> > earlier reads from the anon vma structure nor the page mapped check >> > necessarily reflect a single state of the anon_vma. >> >> There's no lock (other than RCU's read "lock") taken before the >> original mapped check, and that's important, otherwise our attempt to >> lock might actually spinon or corrupt something that was long ago an >> anon_vma. But we do take the anon_vma->root->lock before the second >> mapped check which I added. If the page is still mapped at the point > > You then are using an object from the anon_vma (the pointer) without a > lock! Yes. (not counting RCU's read "lock" as a lock). > This is unstable therefore unless there are other constraints. The > anon_vma->lock must be taken before derefencing that pointer. No, SLAB_DESTROY_BY_RCU gives us just the stablity we need to take the lock ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-27 21:28 ` Hugh Dickins @ 2010-08-27 21:33 ` Hugh Dickins 2010-08-27 23:06 ` Christoph Lameter 0 siblings, 1 reply; 22+ messages in thread From: Hugh Dickins @ 2010-08-27 21:33 UTC (permalink / raw) To: Christoph Lameter Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel, Peter Zijlstra, linux-kernel, linux-mm Sorry, I seem to have hit some key which sent the mail off too soon, a tab perhaps: finishing up... On Fri, Aug 27, 2010 at 2:28 PM, Hugh Dickins <hughd@google.com> wrote: > On Fri, Aug 27, 2010 at 1:56 PM, Christoph Lameter <cl@linux.com> wrote: >> >>> of that second check, then we know that we got the right anon_vma, >> >> I do not see a second check (*after* taking the lock) in the patch if (page_mapped(page)) return anon_vma; >> and the way the lock is taken can be a problem in itself. No, that's what we rely upon SLAB_DESTROY_BY_RCU for. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-27 21:33 ` Hugh Dickins @ 2010-08-27 23:06 ` Christoph Lameter 2010-08-28 1:07 ` Hugh Dickins 0 siblings, 1 reply; 22+ messages in thread From: Christoph Lameter @ 2010-08-27 23:06 UTC (permalink / raw) To: Hugh Dickins Cc: Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel, Peter Zijlstra, linux-kernel, linux-mm On Fri, 27 Aug 2010, Hugh Dickins wrote: > >> I do not see a second check (*after* taking the lock) in the patch > > if (page_mapped(page)) > return anon_vma; As far as I can tell you would have to recheck the mapping pointer and the pointer to the root too after taking the lock because only taking the lock stabilitzes the object. Any other data you may have obtained before acquiring the lock may have changed. > >> and the way the lock is taken can be a problem in itself. > > No, that's what we rely upon SLAB_DESTROY_BY_RCU for. SLAB_DESTROY_BY_RCU does not guarantee that the object stays the same nor does it prevent any fields from changing. Going through a pointer with only SLAB_DESTROY_BY_RCU means that you can only rely on the atomicity guarantee for pointer updates. You get a valid pointer but pointer changes are not prevented by SLAB_DESTROY_BY_RCU. The only guarantee of that would be through other synchronization techniques. If you believe that the page lock provides sufficient synchronization that then this approach may be ok. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-27 23:06 ` Christoph Lameter @ 2010-08-28 1:07 ` Hugh Dickins 2010-08-28 2:47 ` Christoph Lameter 2010-08-28 15:54 ` Andrea Arcangeli 0 siblings, 2 replies; 22+ messages in thread From: Hugh Dickins @ 2010-08-28 1:07 UTC (permalink / raw) To: Christoph Lameter Cc: Naoya Horiguchi, Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel, Peter Zijlstra, linux-kernel, linux-mm On Fri, Aug 27, 2010 at 4:06 PM, Christoph Lameter <cl@linux.com> wrote: > On Fri, 27 Aug 2010, Hugh Dickins wrote: > >> >> I do not see a second check (*after* taking the lock) in the patch >> >> if (page_mapped(page)) >> return anon_vma; > > As far as I can tell you would have to recheck the mapping pointer and the That's a more interesting question than I'd realized. When page_lock_anon_vma() first came in (2.6.9) there was nothing which updated page->mapping of an anon page after it was set, until the page was freed. Since then we've gathered a few places which update it while holding the page lock (migrate.c, ksm.c) - no problem since the callers of page_lock_anon_vma() hold and must hold page lock. Well, there is the fairly recent call to page_lock_anon_vma() from memory-failure.c, and its even more recent use on hugepages: there's switching back and forth between p and hpage and page, but I think it does end up applying page_lock_anon_vma() to the very page that it locked earlier. Then there's the recently added page_move_anon_rmap(): fine in memory.c, the page lock is held; but apparently broken in hugetlb.c, where it's called only when the pagelock has not been taken! Horiguchi-san Cc'ed. __page_set_anon_rmap() looks like it might have changed anon page->mapping in 2.6.35, but Andrea has fixed that with PageAnon tests in 2.6.36-rc. Ah, but what if "exclusive" and non-exclusive calls to __page_set_anon_rmap() are racing? Not clear, it may be that Andrea has only narrowed a window not closed it (and I've not yet looked up the commit to see his intent); or it may be okay, that there cannot be a conflict of anon_vma in that case. Need to dig deeper. __hugepage_set_anon_rmap() appears to copy the 2.6.35 __page_set_anon_rmap(), and probably needs to add in Andrea's fix, or whatever else is needed there. This is a different problem (or it may turn out to be a non-existent problem, aside from the hugetlb.c case to be fixed there) than I was fixing with my patch, and can be patched separately; but It certainly looks as if it's worth adding a BUG_ON or VM_BUG_ON to check for a switch of anon_vma beneath us there. Plus a VM_BUG_ON(PageLocked(page)) going into page_lock_anon_vma(). > pointer to the root too after taking the lock because only taking the lock > stabilitzes the object. A change in the pointer to the root is covered by the ACCESS_ONCE: yes, it can change beneath us there, but only through the anon_vma being freed and reused, in which case the subsequent page_mapped test tells us the page is no longer mapped, whereupon we back out, unlocking what we locked. (I had at one point been tempted to check anon_vma->root == root_anon_vma there instead of page_mapped(), but that would not have been good enough: since anon_vma_prepare() sets anon_vma->root before taking the lock, anon_vma->root could change under us anywhere between the page_lock_anon_vma() and its page_unlock_anon_vma() in that case.) > Any other data you may have obtained before > acquiring the lock may have changed. > >> >> and the way the lock is taken can be a problem in itself. >> >> No, that's what we rely upon SLAB_DESTROY_BY_RCU for. > > SLAB_DESTROY_BY_RCU does not guarantee that the object stays the same nor > does it prevent any fields from changing. Going through a pointer with > only SLAB_DESTROY_BY_RCU means that you can only rely on the atomicity > guarantee for pointer updates. You get a valid pointer but pointer changes > are not prevented by SLAB_DESTROY_BY_RCU. You're speaking too generally there for me to understand its relevance! What specific problem do you see? > > The only guarantee of that would be through other synchronization > techniques. If you believe that the page lock provides sufficient > synchronization that then this approach may be ok. The page lock should be guaranteeing that page->mapping (anon_vma) cannot change underneath us; but there is some doubt on that above, I'll report back when I've had enough quiet time to think through the __set_page_anon_rmap() possibilities: thanks for uncovering those doubts. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-28 1:07 ` Hugh Dickins @ 2010-08-28 2:47 ` Christoph Lameter 2010-08-28 10:17 ` Peter Zijlstra 2010-08-28 15:54 ` Andrea Arcangeli 1 sibling, 1 reply; 22+ messages in thread From: Christoph Lameter @ 2010-08-28 2:47 UTC (permalink / raw) To: Hugh Dickins Cc: Naoya Horiguchi, Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel, Peter Zijlstra, linux-kernel, linux-mm On Fri, 27 Aug 2010, Hugh Dickins wrote: > >> No, that's what we rely upon SLAB_DESTROY_BY_RCU for. > > > > SLAB_DESTROY_BY_RCU does not guarantee that the object stays the same nor > > does it prevent any fields from changing. Going through a pointer with > > only SLAB_DESTROY_BY_RCU means that you can only rely on the atomicity > > guarantee for pointer updates. You get a valid pointer but pointer changes > > are not prevented by SLAB_DESTROY_BY_RCU. > > You're speaking too generally there for me to understand its > relevance! What specific problem do you see? I had the impression that you rely on SLAB_DESTROY_BY_RCU for more than what it gives you. If the lock taken is not directly in the structure that is managed by slab but only reachable by a pointer then potential pointer changes are also danger to consider. I'd be much more comfortable if the following would be done A. Pin the anon_vma by either I. Take a refcount on the anon vma II. Take a lock in the anon vma (something that is not pointed to) B. Either I. All values that have been used before the pinning are verified after the pinning (and the lock is reacquired if verification fails). II. Or all functions using page_lock_anon_vma() must securely work in the case that the anon_vma was reused for something else before the vma lock was acquired. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-28 2:47 ` Christoph Lameter @ 2010-08-28 10:17 ` Peter Zijlstra 0 siblings, 0 replies; 22+ messages in thread From: Peter Zijlstra @ 2010-08-28 10:17 UTC (permalink / raw) To: Christoph Lameter Cc: Hugh Dickins, Naoya Horiguchi, Andrea Arcangeli, Linus Torvalds, Andrew Morton, Rik van Riel, linux-kernel, linux-mm On Fri, 2010-08-27 at 21:47 -0500, Christoph Lameter wrote: > > I'd be much more comfortable if the following would be done > > A. Pin the anon_vma by either > I. Take a refcount on the anon vma My preemptible mmu patches do that.. > II. Take a lock in the anon vma (something that is not pointed to) > > B. Either > I. All values that have been used before the pinning are > verified after the pinning (and the lock is reacquired > if verification fails). > > II. Or all functions using page_lock_anon_vma() must securely > work in the case that the anon_vma was reused for > something else before the vma lock was acquired. Last time I looked they all work like that, they all use something akin to vma_address() which validates that the page we're interested in is indeed part of the vma we obtained from the rmap chain. Anyway, I'll try and refresh my preemptible mmu patch-set now that the merge window dust settled and post if again, hopefully we can stick it in -next. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] mm: fix hang on anon_vma->root->lock 2010-08-28 1:07 ` Hugh Dickins 2010-08-28 2:47 ` Christoph Lameter @ 2010-08-28 15:54 ` Andrea Arcangeli 1 sibling, 0 replies; 22+ messages in thread From: Andrea Arcangeli @ 2010-08-28 15:54 UTC (permalink / raw) To: Hugh Dickins Cc: Christoph Lameter, Naoya Horiguchi, Linus Torvalds, Andrew Morton, Rik van Riel, Peter Zijlstra, linux-kernel, linux-mm On Fri, Aug 27, 2010 at 06:07:23PM -0700, Hugh Dickins wrote: > __page_set_anon_rmap() looks like it might have changed anon > page->mapping in 2.6.35, but Andrea has fixed that with PageAnon tests > in 2.6.36-rc. Ah, but what if "exclusive" and non-exclusive calls to > __page_set_anon_rmap() are racing? Not clear, it may be that Andrea __page_set_anon_rmap doesn't require the PG_lock only if it's a newly allocated page and it is called from page_add_new_anon_rmap. So there cannot be concurrent __page_set_anon_rmap running on the same page. If there could be concurrent __page_set_anon_rmap running on the same page, the page_lock_anon_vma running on a changing page->mapping would be the last worry, as page_add_new_anon_rmap would overwrite _mapcount with 0 while do_page_add_anon_rmap runs, so corrupting the mapcount information leading to immediate crash during page freeing.... So definitely it must not happen and not only because of page_lock_anon_vma... and it's unlikely to go unnoticed. I think we're safe on that respect. > has only narrowed a window not closed it (and I've not yet looked up > the commit to see his intent); or it may be okay, that there cannot be > a conflict of anon_vma in that case. Need to dig deeper. That change is to avoid altering the page->mapping for anon pages. It's only an optimization. No need to set the page->mapping back to the anon_vma->root for AnonPages (that in turn have already their page->mapping set) if we've already more finegrained information into the page->mapping. If we've already information in page->mapping (page is Anon) then setting to anon_vma->root can only be a coarser setting losing anon_vma child granularity. We must set to the root anon vma however when the page is swapcache but not anon yet... and if it's not exclusive we've to use the anon_vma->root, otherwise if it's being taken over by the local process we can use the local vma->anon_vma. This should explain the logic in __set_page_anon_rmap. > __hugepage_set_anon_rmap() appears to copy the 2.6.35 > __page_set_anon_rmap(), and probably needs to add in Andrea's fix, or > whatever else is needed there. I think it's actually safe in anon_vma terms, setting the page->mapping to the anon_vma->root _always_ safe, but it should use anon_vma->root instead of list_entry (should still lead to the same result) and it can probably also optimize it if it's already an AnonPage like I did for the not-hugetlbfs case (which also includes transparent hugepages as they share the core VM paths). The lack of BUG_ON(!PageLocked(page)) in the hugetlb_add_anon_rmap is worth fixing... hugepage_add_new_anon_rmap runs lock_page before so most certainly is ok (maybe lock_page not needed if it's a new page?). hugepage_add_anon_rmap seems to run on a local new page too (just cowed) so I'm unsure why it's not using hugepage_add_new_anon_rmap too and maybe it's safe without the PG_lock too, but it should use hugepage_add_new_anon_rmap so we keep the same logic of the core VM. I didn't spend too much on this hugetlbfs code, this is just a short review. > switch of anon_vma beneath us there. Plus a > VM_BUG_ON(PageLocked(page)) going into page_lock_anon_vma(). !PageLocked Ok I think the concurrent writers of page->mapping needs the PG_lock (unless they're working on the newly allocated page like page_add_new_anon_rmap) but I really like page_lock_anon_vma to be safe without PG_lock and to rely only on RCU and the anon_vma lock. Do you think there's a window there? I think as long as the page is mapped it doesn't matter... any change that can happen from under us in page_lock_anon_vma, is still going to point to a valid anon_vma, if it is reused it can be reused regardless if it's the root or the local one and we've your fix to take care of that complexity of slab RCU freeing behavior. I need to think again and more deeply about page_lock_anon_vma running on a page not locked, but I was convinced it was safe and that only the writers needed the PG_lock (and even for the writers case, I think it is mostly needed for other reasons, notably to keep the mapcount coherent!). Yeah the write to page->mapping has to be atomic of course (I actually checked every page->mapping writer in mm/rmap.c to verify the gcc output to verify it is atomic, like we also relay on gcc to write 64bit atomic for the pte/spte writes... I worried a little about the |=1 but gcc is smart enough to do it in registers before writing to memory, again not so different from the pte updates ;). My reasoning is that any anon_vma we pick in page_lock_anon_vma is ok (the root or intermediate one can't go away until the local one goes away, if it's the local one it can't go away as long as the page is mapped). So as long as the page->mapping writer writes atomic, and we check page_mapped after taking the lock like your patch does, we should be ok without PG_lock in page_lock_anon_vma. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2010-08-28 15:54 UTC | newest] Thread overview: 22+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-08-26 6:12 [PATCH] mm: fix hang on anon_vma->root->lock Hugh Dickins 2010-08-26 6:41 ` David Miller 2010-08-26 10:54 ` Hugh Dickins 2010-08-26 19:00 ` David Miller 2010-08-27 0:19 ` Andrea Arcangeli 2010-08-26 13:32 ` Rik van Riel 2010-08-26 23:50 ` Andrea Arcangeli 2010-08-27 1:43 ` Hugh Dickins 2010-08-27 9:55 ` Andrea Arcangeli 2010-08-27 16:43 ` Hugh Dickins 2010-08-27 17:13 ` Christoph Lameter 2010-08-27 17:55 ` Hugh Dickins 2010-08-27 19:29 ` Christoph Lameter 2010-08-27 20:14 ` Hugh Dickins 2010-08-27 20:56 ` Christoph Lameter 2010-08-27 21:28 ` Hugh Dickins 2010-08-27 21:33 ` Hugh Dickins 2010-08-27 23:06 ` Christoph Lameter 2010-08-28 1:07 ` Hugh Dickins 2010-08-28 2:47 ` Christoph Lameter 2010-08-28 10:17 ` Peter Zijlstra 2010-08-28 15:54 ` Andrea Arcangeli
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).