* [PATCH] mm: fix page_lock_anon_vma leaving mutex locked @ 2011-05-28 20:20 Hugh Dickins 2011-05-28 21:14 ` Peter Zijlstra 0 siblings, 1 reply; 11+ messages in thread From: Hugh Dickins @ 2011-05-28 20:20 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrew Morton, Peter Zijlstra, linux-kernel, linux-mm On one machine I've been getting hangs, a page fault's anon_vma_prepare() waiting in anon_vma_lock(), other processes waiting for that page's lock. This is a replay of last year's f18194275c39 "mm: fix hang on anon_vma->root->lock". The new page_lock_anon_vma() places too much faith in its refcount: when it has acquired the mutex_trylock(), it's possible that a racing task in anon_vma_alloc() has just reallocated the struct anon_vma, set refcount to 1, and is about to reset its anon_vma->root. Fix this by saving anon_vma->root, and relying on the usual page_mapped() check instead of a refcount check: if page is still mapped, the anon_vma is still ours; if page is not still mapped, we're no longer interested. Signed-off-by: Hugh Dickins <hughd@google.com> --- mm/rmap.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) --- linux.orig/mm/rmap.c 2011-05-27 20:07:44.000000000 -0700 +++ linux/mm/rmap.c 2011-05-27 20:31:04.596303434 -0700 @@ -405,6 +405,7 @@ out: struct anon_vma *page_lock_anon_vma(struct page *page) { struct anon_vma *anon_vma = NULL; + struct anon_vma *root_anon_vma; unsigned long anon_mapping; rcu_read_lock(); @@ -415,13 +416,15 @@ struct anon_vma *page_lock_anon_vma(stru goto out; anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); - if (mutex_trylock(&anon_vma->root->mutex)) { + root_anon_vma = ACCESS_ONCE(anon_vma->root); + if (mutex_trylock(&root_anon_vma->mutex)) { /* - * If we observe a !0 refcount, then holding the lock ensures - * the anon_vma will not go away, see __put_anon_vma(). + * If the page is still mapped, then this anon_vma is still + * its anon_vma, and holding the mutex ensures that it will + * not go away, see __put_anon_vma(). */ - if (!atomic_read(&anon_vma->refcount)) { - anon_vma_unlock(anon_vma); + if (!page_mapped(page)) { + mutex_unlock(&root_anon_vma->mutex); anon_vma = NULL; } goto out; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: fix page_lock_anon_vma leaving mutex locked 2011-05-28 20:20 [PATCH] mm: fix page_lock_anon_vma leaving mutex locked Hugh Dickins @ 2011-05-28 21:14 ` Peter Zijlstra 2011-05-28 22:02 ` Hugh Dickins 0 siblings, 1 reply; 11+ messages in thread From: Peter Zijlstra @ 2011-05-28 21:14 UTC (permalink / raw) To: Hugh Dickins; +Cc: Linus Torvalds, Andrew Morton, linux-kernel, linux-mm On Sat, 2011-05-28 at 13:20 -0700, Hugh Dickins wrote: > On one machine I've been getting hangs, a page fault's anon_vma_prepare() > waiting in anon_vma_lock(), other processes waiting for that page's lock. > > This is a replay of last year's f18194275c39 > "mm: fix hang on anon_vma->root->lock". > > The new page_lock_anon_vma() places too much faith in its refcount: when > it has acquired the mutex_trylock(), it's possible that a racing task in > anon_vma_alloc() has just reallocated the struct anon_vma, set refcount > to 1, and is about to reset its anon_vma->root. > > Fix this by saving anon_vma->root, and relying on the usual page_mapped() > check instead of a refcount check: if page is still mapped, the anon_vma > is still ours; if page is not still mapped, we're no longer interested. Interesting race.. but can we guarantee that the page didn't get remapped meanwhile? The updated comment by page_get_anon_vma() describes the lack of serialization against page_remove_rmap() but fails to mention the page_add_anon_rmap cases (bad me, I know I checked at the time, but can't for the life of me remember what it was now). _IFF_ we are serialized, your patch should suffice, since then page_mapped() implies a >0 refcount, if not however, I think we need both tests since in that case the page might be mapped again against a different anon_vma and our current anon_vma (the one we locked against) might have refcount == 0 and already be past the mutex_is_locked() test in anon_vma_free(), at which point we're up shit creek since then the anon_vma we're returning can disappear the moment we do rcu_read_unlock(). Or am I delusional due to lack of sleep? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: fix page_lock_anon_vma leaving mutex locked 2011-05-28 21:14 ` Peter Zijlstra @ 2011-05-28 22:02 ` Hugh Dickins 2011-05-28 23:24 ` Linus Torvalds 0 siblings, 1 reply; 11+ messages in thread From: Hugh Dickins @ 2011-05-28 22:02 UTC (permalink / raw) To: Peter Zijlstra; +Cc: Linus Torvalds, Andrew Morton, linux-kernel, linux-mm On Sat, 28 May 2011, Peter Zijlstra wrote: > On Sat, 2011-05-28 at 13:20 -0700, Hugh Dickins wrote: > > On one machine I've been getting hangs, a page fault's anon_vma_prepare() > > waiting in anon_vma_lock(), other processes waiting for that page's lock. > > > > This is a replay of last year's f18194275c39 > > "mm: fix hang on anon_vma->root->lock". > > > > The new page_lock_anon_vma() places too much faith in its refcount: when > > it has acquired the mutex_trylock(), it's possible that a racing task in > > anon_vma_alloc() has just reallocated the struct anon_vma, set refcount > > to 1, and is about to reset its anon_vma->root. > > > > Fix this by saving anon_vma->root, and relying on the usual page_mapped() > > check instead of a refcount check: if page is still mapped, the anon_vma > > is still ours; if page is not still mapped, we're no longer interested. > > Interesting race.. but can we guarantee that the page didn't get > remapped meanwhile? Remapped how? If migrated, then this page will simply be unmapped, and another page take its place. I see now your comment above page_get_anon_vma(), saying the page might have been remapped to a different anon_vma, but how does tha come about? Oh, are you referring to page_move_anon_rmap()? Yikes, I wasn't paying attention when that came in: in my comfortable little world, once PageAnon page->mapping is set, it remains constant until the page is freed. That's long been a given, but now I see, not for the last year. I'll give some thought to that right now, it's hard to appreciate the ramifications. The page lock is held while that move is made, but I believe there's one (perhaps no more) path to page_lock_anon_vma(), when checking page_referenced(), when the page is not necessarily locked. But I'm not even sure if you're referring to page_move_anon_rmap(), or something else? > > The updated comment by page_get_anon_vma() describes the lack of > serialization against page_remove_rmap() but fails to mention the > page_add_anon_rmap cases (bad me, I know I checked at the time, but > can't for the life of me remember what it was now). What is the problematic page_add_anon_rmap() case? Ah, when do_swap_page() calls do_page_add_anon_rmap() with exclusive 1? Which is pretty much equivalent to the page_move_anon_rmap() case, but just from that slightly different place. > > _IFF_ we are serialized, your patch should suffice, since then > page_mapped() implies a >0 refcount, if not however, I think we need > both tests since in that case the page might be mapped again against a > different anon_vma and our current anon_vma (the one we locked against) > might have refcount == 0 and already be past the mutex_is_locked() test > in anon_vma_free(), at which point we're up shit creek since then the > anon_vma we're returning can disappear the moment we do > rcu_read_unlock(). I'm not convinced that the refcount helps at all there. If the page is now using a different anon_vma than the one it started out with, what prevents the old one from being freed and reused and now refcount 1? But I'm replying before I've given it enough thought, mainly to let you know that I am back on it now. > > Or am I delusional due to lack of sleep? Congratulations!? I fear you're not delusional: thank you for catching this. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: fix page_lock_anon_vma leaving mutex locked 2011-05-28 22:02 ` Hugh Dickins @ 2011-05-28 23:24 ` Linus Torvalds 2011-05-28 23:56 ` Linus Torvalds 2011-05-29 0:12 ` Hugh Dickins 0 siblings, 2 replies; 11+ messages in thread From: Linus Torvalds @ 2011-05-28 23:24 UTC (permalink / raw) To: Hugh Dickins; +Cc: Peter Zijlstra, Andrew Morton, linux-kernel, linux-mm On Sat, May 28, 2011 at 3:02 PM, Hugh Dickins <hughd@google.com> wrote: > > But I'm replying before I've given it enough thought, > mainly to let you know that I am back on it now. So I applied your other two patches as obvious, but not this one. I'm wondering - wouldn't it be nicer to just re-check (after getting the anon_vma lock) that page->mapping still matches anon_mapping? That said, I do agree with the "anon_vma_root" part of your patch. I just think you mixed up two independent issues with it: the fact that we may be unlocking a new root, and the precise check used to determine whether the anon_vma might have changed. So my gut feeling is that we should do the "anon_vma" root thing independently as a fix for the "maybe anon_vma->root changed" issue, and then as a separate patch decide on how to check whether anon_vma is still valid. Hmm? Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: fix page_lock_anon_vma leaving mutex locked 2011-05-28 23:24 ` Linus Torvalds @ 2011-05-28 23:56 ` Linus Torvalds 2011-05-29 0:12 ` Hugh Dickins 1 sibling, 0 replies; 11+ messages in thread From: Linus Torvalds @ 2011-05-28 23:56 UTC (permalink / raw) To: Hugh Dickins; +Cc: Peter Zijlstra, Andrew Morton, linux-kernel, linux-mm On Sat, May 28, 2011 at 4:24 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > That said, I do agree with the "anon_vma_root" part of your patch. I > just think you mixed up two independent issues with it: the fact that > we may be unlocking a new root, and the precise check used to > determine whether the anon_vma might have changed. Thinking some more about it, I end up agreeing with the whole patch. The "page_mapped()" test is what we use in the slow-path after incrementing the anon_vma count too when the trylock didn't work too, so it can't be too wrong. So I'm going to apply it as-is as an improvement (at least we won't be unlocking the wrong anon_vma root), and hope that you and Peter end up agreeing about what the sufficient test is for whether the anon_vma is the right one. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: fix page_lock_anon_vma leaving mutex locked 2011-05-28 23:24 ` Linus Torvalds 2011-05-28 23:56 ` Linus Torvalds @ 2011-05-29 0:12 ` Hugh Dickins 2011-05-29 0:23 ` Linus Torvalds 2011-05-29 8:33 ` Peter Zijlstra 1 sibling, 2 replies; 11+ messages in thread From: Hugh Dickins @ 2011-05-29 0:12 UTC (permalink / raw) To: Linus Torvalds Cc: Peter Zijlstra, Andrew Morton, Rik van Riel, linux-kernel, linux-mm On Sat, 28 May 2011, Linus Torvalds wrote: > On Sat, May 28, 2011 at 3:02 PM, Hugh Dickins <hughd@google.com> wrote: > > > > But I'm replying before I've given it enough thought, > > mainly to let you know that I am back on it now. > > So I applied your other two patches as obvious, but not this one. Thank you, that's right. Though I think I'm arriving at the conclusion that this patch is correct as is, despite the doubts that have arisen. One argument is by induction: since we've noticed no problems before Peter's patchset, and actually Peter's patchset plus my patch is not really making any difference to this "anon_vma changing beneath you" case, is it? (I've not relooked at what the situation was before his final optimized page_lock_anon_vma(): maybe that was fine, or maybe it could get into decrementing a different refcount than had been incremented, if we didn't have the protection of PageLocked against switching anon_vma.) > > I'm wondering - wouldn't it be nicer to just re-check (after getting > the anon_vma lock) that page->mapping still matches anon_mapping? I toyed with that: it seemed a better idea than relying on the refcount, which wasn't giving the guarantee we needed (the refcount is perfectly good in other respects, it just isn't good for this particular check). However, the problem (if there is one) goes a bit further than that: if we don't actually have serialization against page->anon_vma (okay, it's actually page->mapping, but simpler to express this way) being changed at any instant, i.e. we're serving the page_referenced() without PageLocked case, then what good is the "anon_vma" that page_lock_anon_vma() returns? If that can be freed and reused at any moment? I believe that although it may no longer be the anon_vma that the page is pointing to, it remains stable. Because even if page->anon_vma is updated, it will certainly have the same anon_vma->root as before (see the first BUG_ON in __page_check_anon_rmap() for reassurance), so the mutex locking holds good. And the structure itself won't be freed: although the page is now pointing to a less inclusive, more optimal anon_vma for reclaim to use, the anon_vma which was originally pointed to remains on the same vma's chains as it ever was, and only gets freed up when they're all gone. So, when there's this race with moving anon_vma, page_lock_anon_vma() may end up returning a less than optimal anon_vma, but it's still valid as a good though longer list of vmas to look through. The previous code would have broken horribly, wouldn't it, were that not the case? > > That said, I do agree with the "anon_vma_root" part of your patch. I > just think you mixed up two independent issues with it: the fact that > we may be unlocking a new root, and the precise check used to > determine whether the anon_vma might have changed. It's true that actually I first ran with just the page_mapped() instead of refcount part of the patch; and was disappointed to find that did not fix the hang on its own. Had to spend a few minutes yesterday morning actually thinking and remembering the root_anon_vma thing. So to that extent they must be separable; but I'm finding it hard to agree with putting in a broken half-patch. And the page_mapped() part of it is essential. Let's add Rik to the Cc in case he's around and might comment too. Hugh > > So my gut feeling is that we should do the "anon_vma" root thing > independently as a fix for the "maybe anon_vma->root changed" issue, > and then as a separate patch decide on how to check whether anon_vma > is still valid. > > Hmm? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: fix page_lock_anon_vma leaving mutex locked 2011-05-29 0:12 ` Hugh Dickins @ 2011-05-29 0:23 ` Linus Torvalds 2011-05-29 0:43 ` Hugh Dickins 2011-05-29 8:33 ` Peter Zijlstra 1 sibling, 1 reply; 11+ messages in thread From: Linus Torvalds @ 2011-05-29 0:23 UTC (permalink / raw) To: Hugh Dickins Cc: Peter Zijlstra, Andrew Morton, Rik van Riel, linux-kernel, linux-mm On Sat, May 28, 2011 at 5:12 PM, Hugh Dickins <hughd@google.com> wrote: > > Though I think I'm arriving at the conclusion that this patch > is correct as is, despite the doubts that have arisen. Well, you hopefully saw my second email where I had come to the same conclusion. So I applied the third patch as well, after all. I think it's at the very least at least "more correct" than what we have now. Whether that "page_mapped()" should then be extended to do something else is an additional thing, and I suspect it would affect the slow-path case too. I dunno. Linus -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: fix page_lock_anon_vma leaving mutex locked 2011-05-29 0:23 ` Linus Torvalds @ 2011-05-29 0:43 ` Hugh Dickins 2011-05-29 8:35 ` Peter Zijlstra 0 siblings, 1 reply; 11+ messages in thread From: Hugh Dickins @ 2011-05-29 0:43 UTC (permalink / raw) To: Linus Torvalds Cc: Peter Zijlstra, Andrew Morton, Rik van Riel, linux-kernel, linux-mm On Sat, 28 May 2011, Linus Torvalds wrote: > On Sat, May 28, 2011 at 5:12 PM, Hugh Dickins <hughd@google.com> wrote: > > > > Though I think I'm arriving at the conclusion that this patch > > is correct as is, despite the doubts that have arisen. > > Well, you hopefully saw my second email where I had come to the same conclusion. Yes, thanks. > > So I applied the third patch as well, after all. I think it's at the > very least at least "more correct" than what we have now. Whether that > "page_mapped()" should then be extended to do something else is an > additional thing, and I suspect it would affect the slow-path case > too. Yes, I agree it's certainly more correct (two machines running well with it for 33 hours now, one of them would hang in 7 or 17 hours). And I'm increasingly confident that it's complete too, but it will be interesting to see whether I've persuaded Peter. It was certainly a very good point that he raised, that I hadn't thought of at all. Hugh -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: fix page_lock_anon_vma leaving mutex locked 2011-05-29 0:43 ` Hugh Dickins @ 2011-05-29 8:35 ` Peter Zijlstra 0 siblings, 0 replies; 11+ messages in thread From: Peter Zijlstra @ 2011-05-29 8:35 UTC (permalink / raw) To: Hugh Dickins Cc: Linus Torvalds, Andrew Morton, Rik van Riel, linux-kernel, linux-mm On Sat, 2011-05-28 at 17:43 -0700, Hugh Dickins wrote: > > And I'm increasingly confident that it's complete too, but it will be > interesting to see whether I've persuaded Peter. It was certainly a > very good point that he raised, that I hadn't thought of at all. Yes you have. And hopefully we've now got enough comments there serve us next time.. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: fix page_lock_anon_vma leaving mutex locked 2011-05-29 0:12 ` Hugh Dickins 2011-05-29 0:23 ` Linus Torvalds @ 2011-05-29 8:33 ` Peter Zijlstra 2011-05-29 20:53 ` Hugh Dickins 1 sibling, 1 reply; 11+ messages in thread From: Peter Zijlstra @ 2011-05-29 8:33 UTC (permalink / raw) To: Hugh Dickins Cc: Linus Torvalds, Andrew Morton, Rik van Riel, linux-kernel, linux-mm On Sat, 2011-05-28 at 17:12 -0700, Hugh Dickins wrote: > Though I think I'm arriving at the conclusion that this patch > is correct as is, despite the doubts that have arisen. > > One argument is by induction: since we've noticed no problems before > Peter's patchset, and actually Peter's patchset plus my patch is not > really making any difference to this "anon_vma changing beneath you" > case, is it? I'll buy that argument, and on those grounds would have Acked the patch, but since Linus already committed it, that's a tad moot. Still understanding why the previous code was right (if so) is important. > > I'm wondering - wouldn't it be nicer to just re-check (after getting > > the anon_vma lock) that page->mapping still matches anon_mapping? > > I toyed with that: it seemed a better idea than relying on the refcount, > which wasn't giving the guarantee we needed (the refcount is perfectly > good in other respects, it just isn't good for this particular check). > > However, the problem (if there is one) goes a bit further than that: > if we don't actually have serialization against page->anon_vma (okay, > it's actually page->mapping, but simpler to express this way) being > changed at any instant, i.e. we're serving the page_referenced() without > PageLocked case, then what good is the "anon_vma" that page_lock_anon_vma() > returns? If that can be freed and reused at any moment? Agreed, all except page_referenced() are serialized using PageLock. > I believe that although it may no longer be the anon_vma that the page > is pointing to, it remains stable. Because even if page->anon_vma is > updated, it will certainly have the same anon_vma->root as before > (see the first BUG_ON in __page_check_anon_rmap() for reassurance), > so the mutex locking holds good. > > And the structure itself won't be freed: although the page is now > pointing to a less inclusive, more optimal anon_vma for reclaim to use, > the anon_vma which was originally pointed to remains on the same vma's > chains as it ever was, and only gets freed up when they're all gone. > > So, when there's this race with moving anon_vma, page_lock_anon_vma() > may end up returning a less than optimal anon_vma, but it's still valid > as a good though longer list of vmas to look through. Yes, and I think I see what you mean, if a page's anon_vma is changed while it remains mapped it will only ever be moved to a child of the original anon_vma. And because of the anon_vma ref-counting, the original anon_vma will stick around until that too is dead, which won't happen for as long as the page remains mapped. Therefore, for as long as we observe page_mapped(), any anon_vma obtained from it remains valid. Talk about tricky.. shees. I bet that wants a comment or so. > The previous code would have broken horribly, wouldn't it, were that > not the case? It would have, yes. --- Subject: mm, rmap: Add yet more comments to page_get_anon_vma/page_lock_anon_vma Inspired by an analysis from Hugh on why again all this doesn't explode in our face. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- mm/rmap.c | 9 +++++++-- 1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 6bada99..487d5cc 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -350,7 +350,12 @@ void __init anon_vma_init(void) * have been relevant to this page. * * The page might have been remapped to a different anon_vma or the anon_vma - * returned may already be freed (and even reused). + * returned may already be freed (and even reused). + * + * In case it was remapped to a different anon_vma, the new anon_vma will be a + * child of the old anon_vma, and the anon_vma lifetime rules will therefore + * ensure that any anon_vma obtained from the page will still be valid for as + * long as we observe page_mapped() [ hence all those page_mapped() tests ]. * * All users of this function must be very careful when walking the anon_vma * chain and verify that the page in question is indeed mapped in it @@ -421,7 +426,7 @@ struct anon_vma *page_lock_anon_vma(struct page *page) /* * If the page is still mapped, then this anon_vma is still * its anon_vma, and holding the mutex ensures that it will - * not go away, see __put_anon_vma(). + * not go away, see anon_vma_free(). */ if (!page_mapped(page)) { mutex_unlock(&root_anon_vma->mutex); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] mm: fix page_lock_anon_vma leaving mutex locked 2011-05-29 8:33 ` Peter Zijlstra @ 2011-05-29 20:53 ` Hugh Dickins 0 siblings, 0 replies; 11+ messages in thread From: Hugh Dickins @ 2011-05-29 20:53 UTC (permalink / raw) To: Peter Zijlstra Cc: Linus Torvalds, Andrew Morton, Rik van Riel, linux-kernel, linux-mm On Sun, 29 May 2011, Peter Zijlstra wrote: > On Sat, 2011-05-28 at 17:12 -0700, Hugh Dickins wrote: > > I believe that although it may no longer be the anon_vma that the page > > is pointing to, it remains stable. Because even if page->anon_vma is > > updated, it will certainly have the same anon_vma->root as before > > (see the first BUG_ON in __page_check_anon_rmap() for reassurance), > > so the mutex locking holds good. > > > > And the structure itself won't be freed: although the page is now > > pointing to a less inclusive, more optimal anon_vma for reclaim to use, > > the anon_vma which was originally pointed to remains on the same vma's > > chains as it ever was, and only gets freed up when they're all gone. > > > > So, when there's this race with moving anon_vma, page_lock_anon_vma() > > may end up returning a less than optimal anon_vma, but it's still valid > > as a good though longer list of vmas to look through. > > Yes, and I think I see what you mean, if a page's anon_vma is changed > while it remains mapped it will only ever be moved to a child of the > original anon_vma. And because of the anon_vma ref-counting, the > original anon_vma will stick around until that too is dead, which won't > happen for as long as the page remains mapped. Child or grandchild or more remote descendent, I think, yes. Actually, it's not the anon_vma ref-counting that keeps them around generally: I think it's the way we keep all those anon_vmas linked to their vmas - the anon_vma will be freed only when all its possible vmas have been unmapped. Just like when we didn't have ref-counting, and just like before the anon_vma_chains. For a while I thought that, if we were careful about the ordering of the lists, always freeing root last, we could have a naughty patch which even removes the additional counting on anon_vma->root. But no, precisely because there is some ref-counting, which may hold any anon_vma for a while, we cannot enforce such ordering and do need additional holds on the root. > > Therefore, for as long as we observe page_mapped(), any anon_vma > obtained from it remains valid. > > Talk about tricky.. shees. I bet that wants a comment or so. I don't think anybody understood how this was working, until you forced us to think about it yesterday: thanks a lot for doing so. > > > The previous code would have broken horribly, wouldn't it, were that > > not the case? > > It would have, yes. > > --- > Subject: mm, rmap: Add yet more comments to page_get_anon_vma/page_lock_anon_vma > > Inspired by an analysis from Hugh on why again all this doesn't explode > in our face. > > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Hugh Dickins <hughd@google.com> > --- > mm/rmap.c | 9 +++++++-- > 1 files changed, 7 insertions(+), 2 deletions(-) > > diff --git a/mm/rmap.c b/mm/rmap.c > index 6bada99..487d5cc 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -350,7 +350,12 @@ void __init anon_vma_init(void) > * have been relevant to this page. > * > * The page might have been remapped to a different anon_vma or the anon_vma > - * returned may already be freed (and even reused). > + * returned may already be freed (and even reused). > + * > + * In case it was remapped to a different anon_vma, the new anon_vma will be a > + * child of the old anon_vma, and the anon_vma lifetime rules will therefore > + * ensure that any anon_vma obtained from the page will still be valid for as > + * long as we observe page_mapped() [ hence all those page_mapped() tests ]. > * > * All users of this function must be very careful when walking the anon_vma > * chain and verify that the page in question is indeed mapped in it > @@ -421,7 +426,7 @@ struct anon_vma *page_lock_anon_vma(struct page *page) > /* > * If the page is still mapped, then this anon_vma is still > * its anon_vma, and holding the mutex ensures that it will > - * not go away, see __put_anon_vma(). > + * not go away, see anon_vma_free(). > */ > if (!page_mapped(page)) { > mutex_unlock(&root_anon_vma->mutex); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2011-05-29 20:53 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-05-28 20:20 [PATCH] mm: fix page_lock_anon_vma leaving mutex locked Hugh Dickins 2011-05-28 21:14 ` Peter Zijlstra 2011-05-28 22:02 ` Hugh Dickins 2011-05-28 23:24 ` Linus Torvalds 2011-05-28 23:56 ` Linus Torvalds 2011-05-29 0:12 ` Hugh Dickins 2011-05-29 0:23 ` Linus Torvalds 2011-05-29 0:43 ` Hugh Dickins 2011-05-29 8:35 ` Peter Zijlstra 2011-05-29 8:33 ` Peter Zijlstra 2011-05-29 20:53 ` Hugh Dickins
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).