* Race in pagevec code
@ 2002-08-21 15:45 Christian Ehrhardt
2002-08-21 17:41 ` Andrew Morton
0 siblings, 1 reply; 5+ messages in thread
From: Christian Ehrhardt @ 2002-08-21 15:45 UTC (permalink / raw)
To: Andrew Morton; +Cc: Rik van Riel, linux-kernel, Thomas Molina
Hi,
the following could explain Ooops reports which
BUG_ON(page->pte.chain != NULL) and possibly others. The kernel I'm
talking about is 2.5.31+akpm-everything.gz but others might be
affected as well:
Summary: Don't try to work with page_count == 0.
Don't hold references to a page that don't count towards
the page count.
Processor 1 Processor 2
refill_inactive grabs a page with
page_count == 1 from the lru.
__pagevec_release tries to release
the same page with page_count == 1
and does put_page_test_zero
bringing the page_count to 0
(No spinlocks involved!)
Interrupt or preempt or something
else that gives the other processor
some time.
refill_inactive calls page_cache_get
bringing the page_count back to 1
(this should ring some alarm bells)
refill_inactive continues all the way
down until it calls __pagevec_release
itself.
__pagevec_release calls put_page_test_zero
bringing the count back to 0
Both processors succeeded in bringing the page_count to zero,
i.e. both processors will add the page to their own
pages_to_free_list.
actually frees the page and reallocates
it to a new process which immediatly
sets page->pte.chain to something useful.
actully frees the page and sees
BUGs on page->pte.chain beeing
non zero.
Depending on what the new process does with the page this may also
explain other Ooopses on one of the BUGs in free_pages_ok as well.
Also it probably isn't necessarily a pagevec only BUG. I think
__page_cache_release has very much the same Problem:
Processor 1 Processor 2
page_cache_release frees a page
still on a lru list brining the
page count to to 0.
=> Interrupt to make the timeing
possible
Grabs page from lru list
and temporarily increments
page_count back to 1.
BUG_ON(page_count(page) != 0)
in __page_cache_release.
I don't have a fix but I think the only real solution is to
increment the page count if a page is on a lru list. After all
this is a reference to the page.
regards Christian Ehrhardt
--
THAT'S ALL FOLKS!
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: Race in pagevec code 2002-08-21 15:45 Race in pagevec code Christian Ehrhardt @ 2002-08-21 17:41 ` Andrew Morton 2002-08-21 20:27 ` Rik van Riel 2002-08-21 22:23 ` Christian Ehrhardt 0 siblings, 2 replies; 5+ messages in thread From: Andrew Morton @ 2002-08-21 17:41 UTC (permalink / raw) To: Christian Ehrhardt; +Cc: Rik van Riel, linux-kernel, Thomas Molina Christian Ehrhardt wrote: > > ... > Both processors succeeded in bringing the page_count to zero, > i.e. both processors will add the page to their own > pages_to_free_list. This is why __pagevec_release() has the refcount check inside the lock. If someone else grabbed a ref to the page (also inside the lock) via the LRU, __pagevec_release doesn't free it. So the rule could be stated as: the page gets freed when there are no references to it, presence on the LRU counts as a reference, serialisation is via pagemap_lru_lock. > .. > > I don't have a fix but I think the only real solution is to > increment the page count if a page is on a lru list. After all > this is a reference to the page. One would think so, but that doesn't really change anything. I agree the locking and reffing in there is really nasty. It doesn't help that I put four, repeat four bugs in the 20-line __page_cache_release(). __pagevec_release() is, I think, OK. It would be much simpler to grab the lock each time page_cache_release() is executed, but our performance targets for 2.5 preclude that. The page->pte.chain != NULL problems predate the locking changes. We haven't found that one yet. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Race in pagevec code 2002-08-21 17:41 ` Andrew Morton @ 2002-08-21 20:27 ` Rik van Riel 2002-08-21 22:23 ` Christian Ehrhardt 1 sibling, 0 replies; 5+ messages in thread From: Rik van Riel @ 2002-08-21 20:27 UTC (permalink / raw) To: Andrew Morton; +Cc: Christian Ehrhardt, linux-kernel, Thomas Molina On Wed, 21 Aug 2002, Andrew Morton wrote: > The page->pte.chain != NULL problems predate the locking changes. > We haven't found that one yet. Yes, but we have 2 fixes for the page->pte.chain != NULL problem that happened with mlock() and with discontigmem. I don't remember seeing the other manifestation of the bug without the locking changes... regards, Rik -- Bravely reimplemented by the knights who say "NIH". http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Race in pagevec code 2002-08-21 17:41 ` Andrew Morton 2002-08-21 20:27 ` Rik van Riel @ 2002-08-21 22:23 ` Christian Ehrhardt 2002-08-21 22:52 ` Andrew Morton 1 sibling, 1 reply; 5+ messages in thread From: Christian Ehrhardt @ 2002-08-21 22:23 UTC (permalink / raw) To: Andrew Morton; +Cc: Rik van Riel, linux-kernel, Thomas Molina On Wed, Aug 21, 2002 at 10:41:48AM -0700, Andrew Morton wrote: > Christian Ehrhardt wrote: > > > > ... > > Both processors succeeded in bringing the page_count to zero, > > i.e. both processors will add the page to their own > > pages_to_free_list. > > This is why __pagevec_release() has the refcount check inside the lock. > If someone else grabbed a ref to the page (also inside the lock) via > the LRU, __pagevec_release doesn't free it. I saw this check but this doesn't help. There is no guarantee that this other reference that someone grabbed is still beeing held at the time where we do the check: The problem is if this newly grabbed reference is again dropped BEFORE the check for page_count == 0 but AFTER put_page_test_zero. In this case there can be TWO execution paths the BOTH think that they dropped the last reference, i.e. both call __free_pages_ok for the same page. See? > So the rule could be stated as: the page gets freed when there are > no references to it, presence on the LRU counts as a reference, > serialisation is via pagemap_lru_lock. I assume you mean zone->lru_lock here, pagemap_lru_lock is gone in 2.5.31+everything.gz. Besides this would possibly help if all accesses to the page_count were protected by the lru_lock (I know we don't want this). Relying on the lru_lock to protect certain "critical" accesses to page_count against each other sounds dangerous to me. > > I don't have a fix but I think the only real solution is to > > increment the page count if a page is on a lru list. After all > > this is a reference to the page. > > One would think so, but that doesn't really change anything. We'd need yet another kernel thread that removes pages with no other references from the lru_list. I agree that this is probably not an option. What we really want is to make the lru_cache reference a weak reference to the page. regards Christian -- THAT'S ALL FOLKS! ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Race in pagevec code 2002-08-21 22:23 ` Christian Ehrhardt @ 2002-08-21 22:52 ` Andrew Morton 0 siblings, 0 replies; 5+ messages in thread From: Andrew Morton @ 2002-08-21 22:52 UTC (permalink / raw) To: Christian Ehrhardt; +Cc: Rik van Riel, linux-kernel, Thomas Molina Christian Ehrhardt wrote: > > On Wed, Aug 21, 2002 at 10:41:48AM -0700, Andrew Morton wrote: > > Christian Ehrhardt wrote: > > > > > > ... > > > Both processors succeeded in bringing the page_count to zero, > > > i.e. both processors will add the page to their own > > > pages_to_free_list. > > > > This is why __pagevec_release() has the refcount check inside the lock. > > If someone else grabbed a ref to the page (also inside the lock) via > > the LRU, __pagevec_release doesn't free it. > > I saw this check but this doesn't help. There is no guarantee that this > other reference that someone grabbed is still beeing held at the time > where we do the check: > The problem is if this newly grabbed reference is again dropped BEFORE > the check for page_count == 0 but AFTER put_page_test_zero. In this > case there can be TWO execution paths the BOTH think that they dropped > the last reference, i.e. both call __free_pages_ok for the same page. > See? shrink_cache() detects that, inside the lock, and puts the page back if it has a zero refcount. refill_inactive doesn't need to do that, because it calls page_cache_release(), which should look like this: void __page_cache_release(struct page *page) { unsigned long flags; spin_lock_irqsave(&_pagemap_lru_lock, flags); if (TestClearPageLRU(page)) { if (PageActive(page)) del_page_from_active_list(page); else del_page_from_inactive_list(page); } if (page_count(page) != 0) page = NULL; spin_unlock_irqrestore(&_pagemap_lru_lock, flags); if (page) __free_pages_ok(page, 0); } If the page count and non-LRUness are both seen inside the lock, the page is freeable. We do a similar thing with inodes, via atomic_dec_and_lock. Despite the transformations, it's based on the 2.4 approach. But you've successfully worried me, and I'm not really sure it's right, and I'm dead sure it's too hairy. Something simpler-but-not-sucky is needed. ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2002-08-21 22:50 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2002-08-21 15:45 Race in pagevec code Christian Ehrhardt 2002-08-21 17:41 ` Andrew Morton 2002-08-21 20:27 ` Rik van Riel 2002-08-21 22:23 ` Christian Ehrhardt 2002-08-21 22:52 ` Andrew Morton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox