* PREEMPT_RCU breaks anon_vma locking ?
@ 2007-02-23 21:23 Oleg Nesterov
2007-02-23 22:41 ` Paul E. McKenney
2007-02-24 22:04 ` Hugh Dickins
0 siblings, 2 replies; 10+ messages in thread
From: Oleg Nesterov @ 2007-02-23 21:23 UTC (permalink / raw)
To: Paul E. McKenney, Hugh Dickins; +Cc: dipankar, Andrew Morton, linux-kernel
If my understanding correct, vmscan can find a page which lives in a already
anon_vma_unlink'ed vma. This is ok, the page is pinned, and page->mapping is
not cleared until free_hot_cold_page().
So page_lock_anon_vma() works correctly due to SLAB_DESTROY_BY_RCU even if
anon_vma_unlink() has already freed anon_vma. In that case we should see
list_empty(&anon_vma->head), we are safe.
However, we are doing spin_unlock(anon_vma->lock) after page_lock_anon_vma(),
and this looks unsafe to me because page_lock_anon_vma() does rcu_read_unlock()
on return.
This worked before because spin_lock() implied rcu_read_lock(), so rcu was
blocked if page_lock_anon_vma() returns !NULL. With CONFIG_PREEMPT_RCU this
is not true (yes?), so it is possible that the slab returns the memory to
the system and it is re-used when we write to anon_vma->lock.
IOW, don't we need something like this
static struct anon_vma *page_lock_anon_vma(struct page *page)
{
struct anon_vma *anon_vma;
unsigned long anon_mapping;
rcu_read_lock();
anon_mapping = (unsigned long) page->mapping;
if (!(anon_mapping & PAGE_MAPPING_ANON))
goto out;
if (!page_mapped(page))
goto out;
anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
spin_lock(&anon_vma->lock);
return anon_vma;
out:
rcu_read_unlock();
return NULL;
}
static inline void page_lock_anon_vma(struct anon_vma *anon_vma)
{
spin_unlock(&anon_vma->lock);
rcu_read_unlock();
}
?
Oleg.
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: PREEMPT_RCU breaks anon_vma locking ? 2007-02-23 21:23 PREEMPT_RCU breaks anon_vma locking ? Oleg Nesterov @ 2007-02-23 22:41 ` Paul E. McKenney 2007-02-24 22:10 ` Hugh Dickins 2007-02-24 22:04 ` Hugh Dickins 1 sibling, 1 reply; 10+ messages in thread From: Paul E. McKenney @ 2007-02-23 22:41 UTC (permalink / raw) To: Oleg Nesterov; +Cc: Hugh Dickins, dipankar, Andrew Morton, linux-kernel On Sat, Feb 24, 2007 at 12:23:03AM +0300, Oleg Nesterov wrote: > If my understanding correct, vmscan can find a page which lives in a already > anon_vma_unlink'ed vma. This is ok, the page is pinned, and page->mapping is > not cleared until free_hot_cold_page(). > > So page_lock_anon_vma() works correctly due to SLAB_DESTROY_BY_RCU even if > anon_vma_unlink() has already freed anon_vma. In that case we should see > list_empty(&anon_vma->head), we are safe. > > However, we are doing spin_unlock(anon_vma->lock) after page_lock_anon_vma(), > and this looks unsafe to me because page_lock_anon_vma() does rcu_read_unlock() > on return. This would indeed be bad when using CONFIG_PREEMPT_RCU! Good catch!!! > This worked before because spin_lock() implied rcu_read_lock(), so rcu was > blocked if page_lock_anon_vma() returns !NULL. With CONFIG_PREEMPT_RCU this > is not true (yes?), so it is possible that the slab returns the memory to > the system and it is re-used when we write to anon_vma->lock. > > IOW, don't we need something like this > > static struct anon_vma *page_lock_anon_vma(struct page *page) > { > struct anon_vma *anon_vma; > unsigned long anon_mapping; > > rcu_read_lock(); > anon_mapping = (unsigned long) page->mapping; > if (!(anon_mapping & PAGE_MAPPING_ANON)) > goto out; > if (!page_mapped(page)) > goto out; > > anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); > spin_lock(&anon_vma->lock); > return anon_vma; > > out: > rcu_read_unlock(); > return NULL; > } > > static inline void page_lock_anon_vma(struct anon_vma *anon_vma) > { > spin_unlock(&anon_vma->lock); > rcu_read_unlock(); > } > ? This look like a valid fix to me, at least as long as the lock is never dropped in the meantime (e.g., to do I/O). If the lock -is- dropped in the meantime, then presumably whatever is done to keep the page from vanishing should allow an rcu_read_unlock() to be placed after each spin_unlock(&...->lock) and an rcu_read_lock() to be placed before each spin_lock(&...->lock). Thanx, Paul ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: PREEMPT_RCU breaks anon_vma locking ? 2007-02-23 22:41 ` Paul E. McKenney @ 2007-02-24 22:10 ` Hugh Dickins 2007-02-24 22:36 ` Paul E. McKenney 0 siblings, 1 reply; 10+ messages in thread From: Hugh Dickins @ 2007-02-24 22:10 UTC (permalink / raw) To: Paul E. McKenney; +Cc: Oleg Nesterov, dipankar, Andrew Morton, linux-kernel On Fri, 23 Feb 2007, Paul E. McKenney wrote: > > This look like a valid fix to me, at least as long as the lock is never > dropped in the meantime (e.g., to do I/O). If the lock -is- dropped in > the meantime, then presumably whatever is done to keep the page from > vanishing should allow an rcu_read_unlock() to be placed after each > spin_unlock(&...->lock) and an rcu_read_lock() to be placed before each > spin_lock(&...->lock). Thankfully no complications of that kind, page_lock_anon_vma is static to mm/rmap.c, and only used to hold the spin lock while examining page tables of the vmas in the list, never a need to drop that lock at all. (Until the day when someone reports such a long list that we start to worry about the latency.) Hugh ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: PREEMPT_RCU breaks anon_vma locking ? 2007-02-24 22:10 ` Hugh Dickins @ 2007-02-24 22:36 ` Paul E. McKenney 0 siblings, 0 replies; 10+ messages in thread From: Paul E. McKenney @ 2007-02-24 22:36 UTC (permalink / raw) To: Hugh Dickins; +Cc: Oleg Nesterov, dipankar, Andrew Morton, linux-kernel On Sat, Feb 24, 2007 at 10:10:57PM +0000, Hugh Dickins wrote: > On Fri, 23 Feb 2007, Paul E. McKenney wrote: > > > > This look like a valid fix to me, at least as long as the lock is never > > dropped in the meantime (e.g., to do I/O). If the lock -is- dropped in > > the meantime, then presumably whatever is done to keep the page from > > vanishing should allow an rcu_read_unlock() to be placed after each > > spin_unlock(&...->lock) and an rcu_read_lock() to be placed before each > > spin_lock(&...->lock). > > Thankfully no complications of that kind, page_lock_anon_vma is static > to mm/rmap.c, and only used to hold the spin lock while examining page > tables of the vmas in the list, never a need to drop that lock at all. > (Until the day when someone reports such a long list that we start to > worry about the latency.) Whew!!! For now, anyway! ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: PREEMPT_RCU breaks anon_vma locking ? 2007-02-23 21:23 PREEMPT_RCU breaks anon_vma locking ? Oleg Nesterov 2007-02-23 22:41 ` Paul E. McKenney @ 2007-02-24 22:04 ` Hugh Dickins 2007-02-24 22:53 ` Paul E. McKenney ` (2 more replies) 1 sibling, 3 replies; 10+ messages in thread From: Hugh Dickins @ 2007-02-24 22:04 UTC (permalink / raw) To: Oleg Nesterov Cc: Paul E. McKenney, dipankar, Andrew Morton, Christoph Lameter, linux-kernel On Sat, 24 Feb 2007, Oleg Nesterov wrote: > If my understanding correct, vmscan can find a page which lives in a already > anon_vma_unlink'ed vma. This is ok, the page is pinned, and page->mapping is > not cleared until free_hot_cold_page(). That's about right. The page_mapped checks, at several levels, make it very hard to hit this case in practice; but it is possible for the page to be unmapped and the anon_vma unlinked and kmem_cache_freed while vmscan is racing through page_lock_anon_vma. > > So page_lock_anon_vma() works correctly due to SLAB_DESTROY_BY_RCU even if > anon_vma_unlink() has already freed anon_vma. In that case we should see > list_empty(&anon_vma->head), we are safe. (It doesn't affect your argument, but we won't necessarily see list_empty there: the anon_vma slot may already have got reused for a different bundle of vmas completely; but its lock remains a lock and its list remains a list of vmas, and the worst that happens is that page_referenced_anon or try_to_unmap_anon wanders through an irrelevant bundle of vmas, looking for a page that cannot be found there.) > > However, we are doing spin_unlock(anon_vma->lock) after page_lock_anon_vma(), > and this looks unsafe to me because page_lock_anon_vma() does rcu_read_unlock() > on return. > > This worked before because spin_lock() implied rcu_read_lock(), so rcu was > blocked if page_lock_anon_vma() returns !NULL. With CONFIG_PREEMPT_RCU this > is not true (yes?), so it is possible that the slab returns the memory to > the system and it is re-used when we write to anon_vma->lock. I had been meaning to take a fresh look at page_lock_anon_vma, to see whether recent RCU developments in -mm affected it. Thanks for saving me the trouble. You and Paul know infinitely more about CONFIG_PREEMPT_RCU than I do, so if you believe that the change below is enough, that's great, it's much simpler than I'd feared might be needed. CONFIG_PREEMPT_RCU rather scares me, but perhaps it's less worrying than I'd imagined. Have you checked through the SLAB_DESTROY_BY_RCU end in slab.c? Is what that's doing still valid? > > IOW, don't we need something like this > > static struct anon_vma *page_lock_anon_vma(struct page *page) > { > struct anon_vma *anon_vma; > unsigned long anon_mapping; > > rcu_read_lock(); > anon_mapping = (unsigned long) page->mapping; > if (!(anon_mapping & PAGE_MAPPING_ANON)) > goto out; > if (!page_mapped(page)) > goto out; > > anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON); > spin_lock(&anon_vma->lock); > return anon_vma; > > out: > rcu_read_unlock(); > return NULL; > } > > static inline void page_lock_anon_vma(struct anon_vma *anon_vma) It might be wiser to call that one page_unlock_anon_vma ;) (I'm slightly disgruntled that page_lock_anon_vma takes a struct page *, but page_unlock_anon_vma no struct page *. But it would be silly to do it differently, or mess with the naming: besides, it's a static function and the prototype guards against error anyway.) > { > spin_unlock(&anon_vma->lock); > rcu_read_unlock(); > } > ? Looks fine to me: please go ahead a make a patch for -mm, Oleg: thanks. I've CC'ed Christoph for several reasons. One, I think he was trying to persuade me a year ago to change page_lock_anon_vma as you're now proposing; but I resisted, preferring to keep the RCU stuff self-contained within page_lock_anon_vma: let's credit him for prescience, and admit you've proved me wrong. Two, in his SLUB thread it sounds like he's toying with mixing up kmem caches of similar sizes: that seems a really bad idea to me (for reasons Andi and others have made) and I expect it'll get dropped; but in case not, let's note that the anon_vma cache is one which would fare very badly from getting mixed in with others (lock would no longer remain a lock, list would no longer remain a list, across that race). Three, he has a recurring interest in this unusual page_lock_anon_vma, mainly for page migration reasons, so let's keep him informed. Hugh ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: PREEMPT_RCU breaks anon_vma locking ? 2007-02-24 22:04 ` Hugh Dickins @ 2007-02-24 22:53 ` Paul E. McKenney 2007-03-02 16:27 ` Hugh Dickins 2007-02-25 0:13 ` Christoph Lameter 2007-02-25 20:05 ` Oleg Nesterov 2 siblings, 1 reply; 10+ messages in thread From: Paul E. McKenney @ 2007-02-24 22:53 UTC (permalink / raw) To: Hugh Dickins Cc: Oleg Nesterov, dipankar, Andrew Morton, Christoph Lameter, linux-kernel On Sat, Feb 24, 2007 at 10:04:04PM +0000, Hugh Dickins wrote: > On Sat, 24 Feb 2007, Oleg Nesterov wrote: > > Have you checked through the SLAB_DESTROY_BY_RCU end in slab.c? > Is what that's doing still valid? The only thing I see needed due to PREEMPT_RCU is the following comment change. For a terrified few minutes, I thought that the code assumed that struct rcu_head was the same size as struct list_head, but it turns out to only assume that struct slab is at least as large as struct slab_rcu. Thanx, Paul Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> --- diff -urpNa -X dontdiff linux-2.6.20/mm/slab.c linux-2.6.20-slabrcufix/mm/slab.c --- linux-2.6.20/mm/slab.c 2007-02-04 10:44:54.000000000 -0800 +++ linux-2.6.20-slabrcufix/mm/slab.c 2007-02-24 14:50:39.000000000 -0800 @@ -238,7 +238,7 @@ struct slab { * other kind of object (which our subsystem's lock might corrupt). * * rcu_read_lock before reading the address, then rcu_read_unlock after - * taking the spinlock within the structure expected at that address. + * releasing the spinlock within the structure expected at that address. * * We assume struct slab_rcu can overlay struct slab when destroying. */ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: PREEMPT_RCU breaks anon_vma locking ? 2007-02-24 22:53 ` Paul E. McKenney @ 2007-03-02 16:27 ` Hugh Dickins 0 siblings, 0 replies; 10+ messages in thread From: Hugh Dickins @ 2007-03-02 16:27 UTC (permalink / raw) To: Paul E. McKenney Cc: Oleg Nesterov, dipankar, Andrew Morton, Christoph Lameter, linux-kernel On Sat, 24 Feb 2007, Paul E. McKenney wrote: > On Sat, Feb 24, 2007 at 10:04:04PM +0000, Hugh Dickins wrote: > > > > Have you checked through the SLAB_DESTROY_BY_RCU end in slab.c? > > Is what that's doing still valid? > > The only thing I see needed due to PREEMPT_RCU is the following comment > change. > > For a terrified few minutes, I thought that the code assumed that struct > rcu_head was the same size as struct list_head, but it turns out to only > assume that struct slab is at least as large as struct slab_rcu. > > Thanx, Paul Thanks for enduring the terror, checking it out, and arriving at such a reassuring conclusion. Andrew, please add this to your -mm collection after (or folded into) Paul's rcu-preemptible-rcu.patch. PREEMPT_RCU has stricter needs: updated comment on SLAB_DESTROY_BY_RCU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Hugh Dickins <hugh@veritas.com> --- diff -urpNa -X dontdiff linux-2.6.20/mm/slab.c linux-2.6.20-slabrcufix/mm/slab.c --- linux-2.6.20/mm/slab.c 2007-02-04 10:44:54.000000000 -0800 +++ linux-2.6.20-slabrcufix/mm/slab.c 2007-02-24 14:50:39.000000000 -0800 @@ -238,7 +238,7 @@ struct slab { * other kind of object (which our subsystem's lock might corrupt). * * rcu_read_lock before reading the address, then rcu_read_unlock after - * taking the spinlock within the structure expected at that address. + * releasing the spinlock within the structure expected at that address. * * We assume struct slab_rcu can overlay struct slab when destroying. */ ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: PREEMPT_RCU breaks anon_vma locking ? 2007-02-24 22:04 ` Hugh Dickins 2007-02-24 22:53 ` Paul E. McKenney @ 2007-02-25 0:13 ` Christoph Lameter 2007-02-25 20:05 ` Oleg Nesterov 2 siblings, 0 replies; 10+ messages in thread From: Christoph Lameter @ 2007-02-25 0:13 UTC (permalink / raw) To: Hugh Dickins Cc: Oleg Nesterov, Paul E. McKenney, dipankar, Andrew Morton, linux-kernel On Sat, 24 Feb 2007, Hugh Dickins wrote: > Two, in his SLUB thread it sounds like he's toying with mixing up kmem > caches of similar sizes: that seems a really bad idea to me (for reasons > Andi and others have made) and I expect it'll get dropped; but in case > not, let's note that the anon_vma cache is one which would fare very > badly from getting mixed in with others (lock would no longer remain > a lock, list would no longer remain a list, across that race). Dont worry DESTROY_BY_RCU slabs wont be mixed with those not marked DESTROY_BY_RCU. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: PREEMPT_RCU breaks anon_vma locking ? 2007-02-24 22:04 ` Hugh Dickins 2007-02-24 22:53 ` Paul E. McKenney 2007-02-25 0:13 ` Christoph Lameter @ 2007-02-25 20:05 ` Oleg Nesterov 2007-02-26 1:53 ` Paul E. McKenney 2 siblings, 1 reply; 10+ messages in thread From: Oleg Nesterov @ 2007-02-25 20:05 UTC (permalink / raw) To: Hugh Dickins Cc: Paul E. McKenney, dipankar, Andrew Morton, Christoph Lameter, linux-kernel On 02/24, Hugh Dickins wrote: > > On Sat, 24 Feb 2007, Oleg Nesterov wrote: > > > So page_lock_anon_vma() works correctly due to SLAB_DESTROY_BY_RCU even if > > anon_vma_unlink() has already freed anon_vma. In that case we should see > > list_empty(&anon_vma->head), we are safe. > > (It doesn't affect your argument, but we won't necessarily see list_empty > there: the anon_vma slot may already have got reused for a different > bundle of vmas completely; but its lock remains a lock and its list > remains a list of vmas, and the worst that happens is that > page_referenced_anon or try_to_unmap_anon wanders through an irrelevant > bundle of vmas, looking for a page that cannot be found there.) Yes, but in that case we are safe, right? We hold the lock, anon_vma can't be freed. But thanks for clarification! Somehow I missed that not only unlock() is unsafe (in theory). If anon_vma's memory was re-used for something else, we can't assume that we will see list_empty(&anon_vma->head). > > static inline void page_lock_anon_vma(struct anon_vma *anon_vma) > > It might be wiser to call that one page_unlock_anon_vma ;) Congratulations, you passed the test! Paul didn't :) > (I'm slightly disgruntled that page_lock_anon_vma takes a struct page *, > but page_unlock_anon_vma no struct page *. But it would be silly to do > it differently, or mess with the naming: besides, it's a static function > and the prototype guards against error anyway.) OK. I thought about "unlock_anon_vma", but symmetry is good indeed. Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: PREEMPT_RCU breaks anon_vma locking ? 2007-02-25 20:05 ` Oleg Nesterov @ 2007-02-26 1:53 ` Paul E. McKenney 0 siblings, 0 replies; 10+ messages in thread From: Paul E. McKenney @ 2007-02-26 1:53 UTC (permalink / raw) To: Oleg Nesterov Cc: Hugh Dickins, dipankar, Andrew Morton, Christoph Lameter, linux-kernel On Sun, Feb 25, 2007 at 11:05:50PM +0300, Oleg Nesterov wrote: > On 02/24, Hugh Dickins wrote: > > > > On Sat, 24 Feb 2007, Oleg Nesterov wrote: > > > > > So page_lock_anon_vma() works correctly due to SLAB_DESTROY_BY_RCU even if > > > anon_vma_unlink() has already freed anon_vma. In that case we should see > > > list_empty(&anon_vma->head), we are safe. > > > > (It doesn't affect your argument, but we won't necessarily see list_empty > > there: the anon_vma slot may already have got reused for a different > > bundle of vmas completely; but its lock remains a lock and its list > > remains a list of vmas, and the worst that happens is that > > page_referenced_anon or try_to_unmap_anon wanders through an irrelevant > > bundle of vmas, looking for a page that cannot be found there.) > > Yes, but in that case we are safe, right? We hold the lock, anon_vma can't be > freed. But thanks for clarification! Somehow I missed that not only unlock() > is unsafe (in theory). If anon_vma's memory was re-used for something else, we > can't assume that we will see list_empty(&anon_vma->head). > > > > static inline void page_lock_anon_vma(struct anon_vma *anon_vma) > > > > It might be wiser to call that one page_unlock_anon_vma ;) > > Congratulations, you passed the test! Paul didn't :) What is in a name? ;-) Thanx, Paul > > (I'm slightly disgruntled that page_lock_anon_vma takes a struct page *, > > but page_unlock_anon_vma no struct page *. But it would be silly to do > > it differently, or mess with the naming: besides, it's a static function > > and the prototype guards against error anyway.) > > OK. I thought about "unlock_anon_vma", but symmetry is good indeed. > > Oleg. > ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2007-03-02 16:27 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-02-23 21:23 PREEMPT_RCU breaks anon_vma locking ? Oleg Nesterov 2007-02-23 22:41 ` Paul E. McKenney 2007-02-24 22:10 ` Hugh Dickins 2007-02-24 22:36 ` Paul E. McKenney 2007-02-24 22:04 ` Hugh Dickins 2007-02-24 22:53 ` Paul E. McKenney 2007-03-02 16:27 ` Hugh Dickins 2007-02-25 0:13 ` Christoph Lameter 2007-02-25 20:05 ` Oleg Nesterov 2007-02-26 1:53 ` Paul E. McKenney
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox