Race in pagevec code

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Race in pagevec code
@ 2002-08-21 15:45 Christian Ehrhardt
  2002-08-21 17:41 ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Christian Ehrhardt @ 2002-08-21 15:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rik van Riel, linux-kernel, Thomas Molina


Hi,

the following could explain Ooops reports which
BUG_ON(page->pte.chain != NULL) and possibly others. The kernel I'm
talking about is 2.5.31+akpm-everything.gz but others might be
affected as well:

Summary: Don't try to work with page_count == 0. 
	 Don't hold references to a page that don't count towards
	 the page count.


Processor 1                            Processor 2
refill_inactive grabs a page with      
page_count == 1 from the lru.
                                       __pagevec_release tries to release
					 the same page with page_count == 1
					 and does put_page_test_zero
					 bringing the page_count to 0
					 (No spinlocks involved!)

				       Interrupt or preempt or something
					 else that gives the other processor
					 some time.

refill_inactive calls page_cache_get
bringing the page_count back to 1
(this should ring some alarm bells)

refill_inactive continues all the way
down until it calls __pagevec_release
itself.

__pagevec_release calls put_page_test_zero
bringing the count back to 0

      Both processors succeeded in bringing the page_count to zero,
      i.e. both processors will add the page to their own
      pages_to_free_list.

actually frees the page and reallocates
it to a new process which immediatly
sets page->pte.chain to something useful.

                                          actully frees the page and sees
                                          BUGs on page->pte.chain beeing
					  non zero.
                                            
Depending on what the new process does with the page this may also
explain other Ooopses on one of the BUGs in free_pages_ok as well.
Also it probably isn't necessarily a pagevec only BUG. I think
__page_cache_release has very much the same Problem:

Processor 1                               Processor 2
page_cache_release frees a page
still on a lru list brining the
page count to to 0.
   => Interrupt to make the timeing
      possible

                                          Grabs page from lru list
					  and temporarily increments
					  page_count back to 1.
BUG_ON(page_count(page) != 0)
in __page_cache_release.

I don't have a fix but I think the only real solution is to
increment the page count if a page is on a lru list. After all
this is a reference to the page.

     regards    Christian Ehrhardt

-- 
THAT'S ALL FOLKS!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Race in pagevec code
  2002-08-21 15:45 Race in pagevec code Christian Ehrhardt
@ 2002-08-21 17:41 ` Andrew Morton
  2002-08-21 20:27   ` Rik van Riel
  2002-08-21 22:23   ` Christian Ehrhardt
  0 siblings, 2 replies; 5+ messages in thread
From: Andrew Morton @ 2002-08-21 17:41 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: Rik van Riel, linux-kernel, Thomas Molina

Christian Ehrhardt wrote:
> 
> ...
>       Both processors succeeded in bringing the page_count to zero,
>       i.e. both processors will add the page to their own
>       pages_to_free_list.

This is why __pagevec_release() has the refcount check inside the lock.
If someone else grabbed a ref to the page (also inside the lock) via
the LRU, __pagevec_release doesn't free it.

So the rule could be stated as: the page gets freed when there are
no references to it, presence on the LRU counts as a reference,
serialisation is via pagemap_lru_lock.

> ..
> 
> I don't have a fix but I think the only real solution is to
> increment the page count if a page is on a lru list. After all
> this is a reference to the page.

One would think so, but that doesn't really change anything.

I agree the locking and reffing in there is really nasty.  It 
doesn't help that I put four, repeat four bugs in the 20-line
__page_cache_release().  __pagevec_release() is, I think, OK.

It would be much simpler to grab the lock each time
page_cache_release() is executed, but our performance targets
for 2.5 preclude that.

The page->pte.chain != NULL problems predate the locking changes.
We haven't found that one yet.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Race in pagevec code
  2002-08-21 17:41 ` Andrew Morton
@ 2002-08-21 20:27   ` Rik van Riel
  2002-08-21 22:23   ` Christian Ehrhardt
  1 sibling, 0 replies; 5+ messages in thread
From: Rik van Riel @ 2002-08-21 20:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christian Ehrhardt, linux-kernel, Thomas Molina

On Wed, 21 Aug 2002, Andrew Morton wrote:

> The page->pte.chain != NULL problems predate the locking changes.
> We haven't found that one yet.

Yes, but we have 2 fixes for the page->pte.chain != NULL
problem that happened with mlock() and with discontigmem.

I don't remember seeing the other manifestation of the
bug without the locking changes...

regards,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/		http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Race in pagevec code
  2002-08-21 17:41 ` Andrew Morton
  2002-08-21 20:27   ` Rik van Riel
@ 2002-08-21 22:23   ` Christian Ehrhardt
  2002-08-21 22:52     ` Andrew Morton
  1 sibling, 1 reply; 5+ messages in thread
From: Christian Ehrhardt @ 2002-08-21 22:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rik van Riel, linux-kernel, Thomas Molina

On Wed, Aug 21, 2002 at 10:41:48AM -0700, Andrew Morton wrote:
> Christian Ehrhardt wrote:
> > 
> > ...
> >       Both processors succeeded in bringing the page_count to zero,
> >       i.e. both processors will add the page to their own
> >       pages_to_free_list.
> 
> This is why __pagevec_release() has the refcount check inside the lock.
> If someone else grabbed a ref to the page (also inside the lock) via
> the LRU, __pagevec_release doesn't free it.

I saw this check but this doesn't help. There is no guarantee that this
other reference that someone grabbed is still beeing held at the time
where we do the check:
The problem is if this newly grabbed reference is again dropped BEFORE
the check for page_count == 0 but AFTER put_page_test_zero. In this
case there can be TWO execution paths the BOTH think that they dropped
the last reference, i.e. both call __free_pages_ok for the same page.
See?

> So the rule could be stated as: the page gets freed when there are
> no references to it, presence on the LRU counts as a reference,
> serialisation is via pagemap_lru_lock.

I assume you mean zone->lru_lock here, pagemap_lru_lock is gone
in 2.5.31+everything.gz. Besides this would possibly help if all accesses
to the page_count were protected by the lru_lock (I know we don't want
this). Relying on the lru_lock to protect certain "critical" accesses to
page_count against each other sounds dangerous to me.

> > I don't have a fix but I think the only real solution is to
> > increment the page count if a page is on a lru list. After all
> > this is a reference to the page.
> 
> One would think so, but that doesn't really change anything.

We'd need yet another kernel thread that removes pages with no other
references from the lru_list. I agree that this is probably not an
option. What we really want is to make the lru_cache reference a weak
reference to the page.

    regards   Christian

-- 
THAT'S ALL FOLKS!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Race in pagevec code
  2002-08-21 22:23   ` Christian Ehrhardt
@ 2002-08-21 22:52     ` Andrew Morton
  0 siblings, 0 replies; 5+ messages in thread
From: Andrew Morton @ 2002-08-21 22:52 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: Rik van Riel, linux-kernel, Thomas Molina

Christian Ehrhardt wrote:
> 
> On Wed, Aug 21, 2002 at 10:41:48AM -0700, Andrew Morton wrote:
> > Christian Ehrhardt wrote:
> > >
> > > ...
> > >       Both processors succeeded in bringing the page_count to zero,
> > >       i.e. both processors will add the page to their own
> > >       pages_to_free_list.
> >
> > This is why __pagevec_release() has the refcount check inside the lock.
> > If someone else grabbed a ref to the page (also inside the lock) via
> > the LRU, __pagevec_release doesn't free it.
> 
> I saw this check but this doesn't help. There is no guarantee that this
> other reference that someone grabbed is still beeing held at the time
> where we do the check:
> The problem is if this newly grabbed reference is again dropped BEFORE
> the check for page_count == 0 but AFTER put_page_test_zero. In this
> case there can be TWO execution paths the BOTH think that they dropped
> the last reference, i.e. both call __free_pages_ok for the same page.
> See?

shrink_cache() detects that, inside the lock, and puts the page back
if it has a zero refcount.

refill_inactive doesn't need to do that, because it calls page_cache_release(),
which should look like this:

void __page_cache_release(struct page *page)
{
        unsigned long flags;

        spin_lock_irqsave(&_pagemap_lru_lock, flags);
        if (TestClearPageLRU(page)) {
                if (PageActive(page))
                        del_page_from_active_list(page);
                else
                        del_page_from_inactive_list(page);
        }
        if (page_count(page) != 0)
                page = NULL;
        spin_unlock_irqrestore(&_pagemap_lru_lock, flags);
        if (page)
                __free_pages_ok(page, 0);
}

If the page count and non-LRUness are both seen inside the lock,
the page is freeable.

We do a similar thing with inodes, via atomic_dec_and_lock.

Despite the transformations, it's based on the 2.4 approach.  But you've
successfully worried me, and I'm not really sure it's right, and I'm
dead sure it's too hairy.  Something simpler-but-not-sucky is needed.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2002-08-21 22:50 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2002-08-21 15:45 Race in pagevec code Christian Ehrhardt
2002-08-21 17:41 ` Andrew Morton
2002-08-21 20:27   ` Rik van Riel
2002-08-21 22:23   ` Christian Ehrhardt
2002-08-21 22:52     ` Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox