* [Bridge] stresstest.sh
From: Tom McNeal @ 2005-09-16 21:43 UTC
To: bridge
Hi -
When running the stress tests, after a few hours, a panic occurs
due to a kernel page fault for address 0x0 while executing one
of the brctl commands. We don't know which one, yet. Has anyone
run across this?
do_cpu invoked from kernel context! in traps.c:do_cpu, line 787:
...snip...
Process brctl (pid: 23999, stackpage=813d8000)
...snip...
note: brctl[23999] exited with preempt_count 2
Unable to handle kernel paging request at virtual address 00000000,
epc == 801131e8, ra == 8011c5d8
Oops in fault.c:do_page_fault, line 213:
...etc....
This is basically the 2.4.17 kernel, with some of the security
fixes applied.
Tom
--
Tom McNeal
MontaVista Software
408-992-4459
* Re: [Bridge] stresstest.sh
From: Stephen Hemminger @ 2005-09-16 22:18 UTC
To: Tom McNeal; +Cc: bridge
On Fri, 16 Sep 2005 14:43:42 -0700
Tom McNeal <tmcneal@mvista.com> wrote:
> Hi -
>
> When running the stress tests, after a few hours, a panic occurs
> due to a kernel page fault for address 0x0 while executing one
> of the brctl commands. We don't know which one, yet. Has anyone
> run across this?
>
> do_cpu invoked from kernel context! in traps.c:do_cpu, line 787:
> ...snip...
> Process brctl (pid: 23999, stackpage=813d8000)
> ...snip...
> note: brctl[23999] exited with preempt_count 2
> Unable to handle kernel paging request at virtual address 00000000,
> epc == 801131e8, ra == 8011c5d8
> Oops in fault.c:do_page_fault, line 213:
> ...etc....
>
> This is basically in the 2.4.17 kernel, with some of the security
> fixes.
>
> Tom
>
Some basics:
* 2.4.17 is pretty old; can you at least try 2.4.30 or later?
Better yet, 2.6.
* Why are you running bridge commands during the stress test?
It is pretty much a set-up-and-forget thing.
* Are you using SMP? Locking in the 2.4 bridge code is pretty
weak, and there are probably holes. I ended up reworking the whole
locking model of the bridge code for 2.6 for speed and correctness.
* Re: [Bridge] stresstest.sh
From: Tom McNeal @ 2005-09-19 15:49 UTC
To: Stephen Hemminger; +Cc: bridge
Stephen Hemminger wrote:
> Some basics:
> * 2.4.17 is pretty old, can you at least try 2.4.30 or later.
> Better yet, 2.6
> * Why are you doing bridge commands during the stress test?
> it is pretty much a setup and forget it thing.
> * are you using SMP? Locking in bridge code for 2.4 is pretty
> weak and there are probably holes. I ended up reworking the whole
> locking model of bridge code for 2.6 for speed and correctness.
>
By 'bridge commands' I meant the brctl commands used by the stress
tests in the bridge-utils-1.0.6 test directory. The tests add and
delete bridges while, in independent loops, adding and deleting
interfaces on a bridge that is supposed to exist. How real-world is that?
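The test structure described above can be sketched roughly as two concurrent loops. This is not the bridge-utils stresstest.sh script itself, only a minimal Python approximation under stated assumptions: the temporary bridge name brtmp1 comes from the newdel report in this thread, while the bridge br0 and interface eth1 are hypothetical, and only the standard brctl subcommands addbr, delbr, addif and delif are used. By default it runs in dry-run mode and only records the commands; really issuing them needs root and a kernel with bridge support.

```python
import subprocess
import threading

# brtmp1 appears in this thread; br0 and eth1 are hypothetical placeholders.
TEMP_BRIDGE = "brtmp1"
MAIN_BRIDGE = "br0"
IFACE = "eth1"


def newdel_cmds():
    """One iteration: create and then delete a temporary, unused bridge."""
    return [["brctl", "addbr", TEMP_BRIDGE], ["brctl", "delbr", TEMP_BRIDGE]]


def addif_cmds():
    """One iteration: attach and then detach an interface on a bridge assumed to exist."""
    return [["brctl", "addif", MAIN_BRIDGE, IFACE],
            ["brctl", "delif", MAIN_BRIDGE, IFACE]]


def run_loop(make_cmds, iterations, log, log_lock, dry_run):
    for _ in range(iterations):
        for cmd in make_cmds():
            with log_lock:  # protects the shared Python log list, not the kernel
                log.append(cmd)
            if not dry_run:
                # check=False: a failing brctl call does not stop the loop.
                subprocess.run(cmd, check=False)


def stress(iterations=1000, dry_run=True):
    """Run both loops concurrently and return every command issued."""
    log, log_lock = [], threading.Lock()
    threads = [
        threading.Thread(target=run_loop,
                         args=(make, iterations, log, log_lock, dry_run))
        for make in (newdel_cmds, addif_cmds)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    cmds = stress(iterations=3)
    print(len(cmds), "commands recorded")
```

Passing dry_run=False issues the commands for real; the interleaving between the two loops is then up to the scheduler, which is what makes this kind of test useful for shaking out locking holes on SMP.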
I'm pretty sure it is SMP. Are there fixes, like the ones you added
in 2.4.22 and 2.4.27, that are relevant? I'm trying to patch 2.4.17
right now (I can't upgrade, but I can patch). I do see some
locking-related changes that I'm going to look at now....
Thanks -
Tom
* Re: [Bridge] stresstest.sh
From: Stephen Hemminger @ 2005-09-19 16:34 UTC
To: Tom McNeal; +Cc: bridge
On Mon, 19 Sep 2005 08:49:22 -0700
Tom McNeal <tmcneal@mvista.com> wrote:
> Stephen Hemminger wrote:
> By 'bridge commands' I meant the brctl command, used by the stress
> tests posted in the bridge-utils-1.0.6 test directory. The tests
> add and delete bridges while independently adding and deleting
> interfaces to the supposedly existing bridge, in independent loops.
> How real world is that?
It isn't real world at all, but the test was written to make sure
the locking changes for 2.6 (especially the switch to RCU) were
safe.
A real-world test would be blasting lots of packets through (with
something like pktgen) and also testing with thousands of different
source addresses to make sure the forwarding table survives.
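The forwarding-table test suggested above needs many distinct source MAC addresses. A minimal Python sketch, assuming the addresses are later fed to pktgen or another traffic generator (that wiring is not shown): the hypothetical helper source_macs derives unique addresses from a counter under a fixed 02:00:00 prefix. A first octet of 0x02 sets the locally-administered bit and leaves the multicast bit clear, so every address is a valid unicast source that will not clash with vendor-assigned hardware addresses.

```python
# 0x02 in the first octet: locally administered, unicast.
LOCAL_UNICAST_PREFIX = 0x02_00_00_00_00_00


def source_macs(count):
    """Yield `count` distinct MAC address strings (hypothetical helper)."""
    if not 0 < count <= 1 << 24:
        # Only the low 24 bits vary, so the 02:00:00 prefix stays fixed.
        raise ValueError("count must be between 1 and 2**24")
    for i in range(count):
        value = LOCAL_UNICAST_PREFIX | i
        # Emit the six octets, most significant first.
        yield ":".join(f"{(value >> shift) & 0xFF:02x}"
                       for shift in range(40, -8, -8))


if __name__ == "__main__":
    macs = list(source_macs(5000))
    print(macs[0], macs[-1], len(set(macs)))
    # prints: 02:00:00:00:00:00 02:00:00:00:13:87 5000
```

Deriving addresses from a counter rather than at random guarantees that the table really sees `count` different entries, which makes it easier to tell whether the bridge learned all of them.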
> I'm pretty sure it is SMP; are there fixes, like the ones you added
> in 2.4.22 and 2.4.27, which are relevant? I'm looking at trying to
> patch 2.4.17 right now (I can't upgrade, but I can patch). I do
> seem some locking stuff that I'm going to look at now....
You could probably just copy the whole net/bridge directory over
from the current 2.4 tree.
* Re: [Bridge] stresstest.sh
From: Tom McNeal @ 2005-09-21 16:37 UTC
To: Stephen Hemminger; +Cc: bridge
Hi -
We are still hitting a panic while running the bridge-utils-1.0.6
stress tests, apparently due to a kernel page fault at address 0;
at least, we see that fault before the panic message
(Kernel panic: Aiee, killing interrupt handler!).
As suggested, the bridge code in the 2.4.17 kernel (net/bridge) was
upgraded to the 2.4.31 code. We have also found that the process
running at the time of the panic is the 'brctl delbr brtmp1' call
in the newdel function of the stress tests, which loops over addbr
and delbr calls to create and delete a temporary, unused bridge.
Any suggestions?
Tom
Stephen Hemminger wrote:
>
> You probably could just copy whole net/bridge directory over from
> current 2.4 tree.
>