public inbox for linux-kernel@vger.kernel.org
* Stress testing 2.4.14-pre6
@ 2001-10-31 21:57 Bob Matthews
  2001-11-01  0:40 ` Alan Cox
  2001-11-01 16:24 ` Linus Torvalds
  0 siblings, 2 replies; 9+ messages in thread
From: Bob Matthews @ 2001-10-31 21:57 UTC (permalink / raw)
  To: torvalds; +Cc: linux-kernel

Hi Linus,

We've been doing some stress-testing on 2.4.14-pre6 and have encountered
a couple of problems.  The platform is an 8xPIII with 8G RAM and 32G
swap.  After running Cerberus for about 3 hours, the machine hung
completely.  I was not able to switch VC's.

Unfortunately, this is as detailed a bug report as I can submit.  It
looks like Magic SysRq is broken in this kernel.  <alt><SysRq>T will
print the column headings but nothing else.
-- 
Bob Matthews
Red Hat, Inc.


* Re: Stress testing 2.4.14-pre6
  2001-10-31 21:57 Stress testing 2.4.14-pre6 Bob Matthews
@ 2001-11-01  0:40 ` Alan Cox
  2001-11-01 16:24 ` Linus Torvalds
  1 sibling, 0 replies; 9+ messages in thread
From: Alan Cox @ 2001-11-01  0:40 UTC (permalink / raw)
  To: Bob Matthews; +Cc: torvalds, linux-kernel

> Unfortunately, this is as detailed a bug report as I can submit.  It
> looks like Magic SysRq is broken in this kernel.  <alt><SysRq>T will
> print the column headings but nothing else.

That's consistent with memory corruption trashing the task list.


* Re: Stress testing 2.4.14-pre6
  2001-11-01 16:24 ` Linus Torvalds
@ 2001-11-01 15:41   ` Marcelo Tosatti
  2001-11-01 17:08     ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Marcelo Tosatti @ 2001-11-01 15:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel



On Thu, 1 Nov 2001, Linus Torvalds wrote:

> In article <3BE073B6.BDCB3D56@redhat.com>,
> Bob Matthews  <bmatthews@redhat.com> wrote:
> >Hi Linus,
> >
> >We've been doing some stress-testing on 2.4.14-pre6 and have encountered
> >a couple of problems.  The platform is an 8xPIII with 8G RAM and 32G
> >swap.  After running Cerberus for about 3 hours, the machine hung
> >completely.  I was not able to switch VC's.
> 
> There is some race somewhere - I've found one interrupt race (that
> actually seems to exist in the 2.2.x VM too, but is probably _much_
> harder to trigger) where an interrupt at _just_ the right time will
> corrupt the per-process local page list.  That looks so unlikely that I
> doubt that is it, but I'm looking for others (the irq one wasn't even a
> SMP race - it's on UP too, surprise surprise). 
> 
> Working on it, in other words.

Would you mind describing this race?

Thanks




* Re: Stress testing 2.4.14-pre6
  2001-10-31 21:57 Stress testing 2.4.14-pre6 Bob Matthews
  2001-11-01  0:40 ` Alan Cox
@ 2001-11-01 16:24 ` Linus Torvalds
  2001-11-01 15:41   ` Marcelo Tosatti
  1 sibling, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2001-11-01 16:24 UTC (permalink / raw)
  To: linux-kernel

In article <3BE073B6.BDCB3D56@redhat.com>,
Bob Matthews  <bmatthews@redhat.com> wrote:
>Hi Linus,
>
>We've been doing some stress-testing on 2.4.14-pre6 and have encountered
>a couple of problems.  The platform is an 8xPIII with 8G RAM and 32G
>swap.  After running Cerberus for about 3 hours, the machine hung
>completely.  I was not able to switch VC's.

There is some race somewhere - I've found one interrupt race (that
actually seems to exist in the 2.2.x VM too, but is probably _much_
harder to trigger) where an interrupt at _just_ the right time will
corrupt the per-process local page list.  That looks so unlikely that I
doubt that is it, but I'm looking for others (the irq one wasn't even a
SMP race - it's on UP too, surprise surprise). 

Working on it, in other words.

		Linus


* Re: Stress testing 2.4.14-pre6
  2001-11-01 15:41   ` Marcelo Tosatti
@ 2001-11-01 17:08     ` Linus Torvalds
  2001-11-01 17:18       ` Jeff Garzik
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2001-11-01 17:08 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel


On Thu, 1 Nov 2001, Marcelo Tosatti wrote:
> >
> > There is some race somewhere - I've found one interrupt race (that
> > actually seems to exist in the 2.2.x VM too, but is probably _much_
> > harder to trigger) where an interrupt at _just_ the right time will
> > corrupt the per-process local page list.  That looks so unlikely that I
> > doubt that is it, but I'm looking for others (the irq one wasn't even a
> > SMP race - it's on UP too, surprise surprise).
> >
> > Working on it, in other words.
>
> Would you mind describing this race?

Both 2.2.x and the new VM (which, through Andrea, has a lot of the same
things) have this notion of a per-process "free pages list" that is
replenished by any freeing that the process does itself when it gets into
the "try_to_free_memory()" path.

The trigger for refilling this list is "current->flags & PF_FREE_PAGES".

The bug is that you can be in the middle of adding such a recently free'd
page to the per-process list of free pages, and an interrupt comes in.

The interrupt (or bottom half), in turn, might do something like

	page = get_free_page(GFP_ATOMIC);
	...
	free_page(page);

and now the free_page() inside the interrupt context will _also_ trigger
the PF_FREE_PAGES test, and _also_ add the page to the list. Except, of
course, the list is totally unprotected by any locks, so it may not be
valid at this point.

Fix is to only care about the PF_FREE_PAGES bit when not in an interrupt
context.
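
The fix described above can be sketched in user space. This is a minimal
model under stated assumptions, not the actual kernel patch: only
PF_FREE_PAGES and "current->flags" come from this thread. The structures,
the flag value, and the helpers sim_in_interrupt(), sim_free_page() and
sim_irq_free() are hypothetical names invented for illustration, and the
global free list stands in for the real allocator, whose own locking is
out of scope here.

```c
#include <assert.h>
#include <stddef.h>

#define PF_FREE_PAGES 0x1	/* name from the thread; value is illustrative */

struct page {
	struct page *next;
};

/* Hypothetical stand-in for the kernel's task structure. */
struct task {
	unsigned long flags;
	struct page *local_pages;	/* per-process free pages list, no lock */
	int nr_local_pages;
};

static struct task task0 = { PF_FREE_PAGES, NULL, 0 };
static struct task *current = &task0;

static int irq_depth;			/* > 0 while a simulated handler runs */
static struct page *global_free;	/* stand-in for the normal free path */
static int nr_global_free;

/* Hypothetical model of an "are we in interrupt context?" check. */
static int sim_in_interrupt(void)
{
	return irq_depth > 0;
}

static void sim_free_page(struct page *page)
{
	assert(page != NULL);

	/* The fix: honour PF_FREE_PAGES only outside interrupt context. */
	if (!sim_in_interrupt() && (current->flags & PF_FREE_PAGES)) {
		page->next = current->local_pages;
		current->local_pages = page;
		current->nr_local_pages++;
		return;
	}

	/* Interrupt context, or flag not set: take the ordinary path. */
	page->next = global_free;
	global_free = page;
	nr_global_free++;
}

/* Simulated interrupt handler that frees a page it allocated earlier. */
static void sim_irq_free(struct page *page)
{
	irq_depth++;
	sim_free_page(page);
	irq_depth--;
}
```

With this check in place, a page freed from the simulated handler never
touches the unlocked per-process list, so an interrupt arriving halfway
through a list update can no longer add to it.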

Anyway, I seriously doubt this explains any real-world bad behaviour: the
window for the interrupt hitting a half-way updated list is something like
two instructions long out of the whole memory freeing path. AND most
interrupts don't actually do any allocation.

		Linus



* Re: Stress testing 2.4.14-pre6
  2001-11-01 17:08     ` Linus Torvalds
@ 2001-11-01 17:18       ` Jeff Garzik
  2001-11-01 17:29         ` Arjan van de Ven
  2001-11-01 18:15         ` Jens Axboe
  0 siblings, 2 replies; 9+ messages in thread
From: Jeff Garzik @ 2001-11-01 17:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Marcelo Tosatti, linux-kernel

Linus Torvalds wrote:
> Anyway, I seriously doubt this explains any real-world bad behaviour: the
> window for the interrupt hitting a half-way updated list is something like
> two instructions long out of the whole memory freeing path. AND most
> interrupts don't actually do any allocation.

Network Rx interrupts do....  definitely not as frequent as IDE
interrupts, but not infrequent.

-- 
Jeff Garzik      | Only so many songs can be sung
Building 1024    | with two lips, two lungs, and one tongue.
MandrakeSoft     |         - nomeansno


* Re: Stress testing 2.4.14-pre6
  2001-11-01 17:18       ` Jeff Garzik
@ 2001-11-01 17:29         ` Arjan van de Ven
  2001-11-01 18:15         ` Jens Axboe
  1 sibling, 0 replies; 9+ messages in thread
From: Arjan van de Ven @ 2001-11-01 17:29 UTC (permalink / raw)
  To: Jeff Garzik, linux-kernel

Jeff Garzik wrote:
> 
> Linus Torvalds wrote:
> > Anyway, I seriously doubt this explains any real-world bad behaviour: the
> > window for the interrupt hitting a half-way updated list is something like
> > two instructions long out of the whole memory freeing path. AND most
> > interrupts don't actually do any allocation.
> 
> Network Rx interrupts do....  definitely not as frequent as IDE
> interrupts, but not infrequent.

Cerberus doesn't use networking in the tested setup iirc


* Re: Stress testing 2.4.14-pre6
  2001-11-01 17:18       ` Jeff Garzik
  2001-11-01 17:29         ` Arjan van de Ven
@ 2001-11-01 18:15         ` Jens Axboe
  2001-11-01 18:20           ` Jeff Garzik
  1 sibling, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2001-11-01 18:15 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linus Torvalds, Marcelo Tosatti, linux-kernel

On Thu, Nov 01 2001, Jeff Garzik wrote:
> Linus Torvalds wrote:
> > Anyway, I seriously doubt this explains any real-world bad behaviour: the
> > window for the interrupt hitting a half-way updated list is something like
> > two instructions long out of the whole memory freeing path. AND most
> > interrupts don't actually do any allocation.
> 
> Network Rx interrupts do....  definitely not as frequent as IDE
> interrupts, but not infrequent.

Which IDE interrupts allocate memory?!

-- 
Jens Axboe



* Re: Stress testing 2.4.14-pre6
  2001-11-01 18:15         ` Jens Axboe
@ 2001-11-01 18:20           ` Jeff Garzik
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff Garzik @ 2001-11-01 18:20 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linus Torvalds, Marcelo Tosatti, linux-kernel

Jens Axboe wrote:
> 
> On Thu, Nov 01 2001, Jeff Garzik wrote:
> > Linus Torvalds wrote:
> > > Anyway, I seriously doubt this explains any real-world bad behaviour: the
> > > window for the interrupt hitting a half-way updated list is something like
> > > two instructions long out of the whole memory freeing path. AND most
> > > interrupts don't actually do any allocation.
> >
> > Network Rx interrupts do....  definitely not as frequent as IDE
> > interrupts, but not infrequent.
> 
> Which IDE interrupts allocate memory?!

Sorry, I meant as in, IDE interrupts occur more frequently than Rx
interrupts.

English is my first language... really.

-- 
Jeff Garzik      | Only so many songs can be sung
Building 1024    | with two lips, two lungs, and one tongue.
MandrakeSoft     |         - nomeansno

