linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* "event-scan failed" logflood
@ 2010-05-13  4:13 pacman
  2010-05-13  6:04 ` Michael Ellerman
  0 siblings, 1 reply; 10+ messages in thread
From: pacman @ 2010-05-13  4:13 UTC (permalink / raw)
  To: linuxppc-dev

I upgraded the kernel on my Pegasos from 2.6.32 to 2.6.33 and now it sends
the message "event-scan failed" to the kernel log about 60 times per second
as long as it's running.

The message comes from arch/powerpc/kernel/rtasd.c but I don't know what's
going on in there so I can't say much more about it.

How may I help find the cause of this problem?

(Or alternatively: what is the correct way to disable this "event-scan"? I
suspect it's not really necessary since it didn't exist in the previous
kernel version, and the system is running perfectly aside from the error
message flood.)

-- 
Alan Curry

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: "event-scan failed" logflood
  2010-05-13  4:13 "event-scan failed" logflood pacman
@ 2010-05-13  6:04 ` Michael Ellerman
  2010-05-13 10:46   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Ellerman @ 2010-05-13  6:04 UTC (permalink / raw)
  To: pacman; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 995 bytes --]

On Wed, 2010-05-12 at 23:13 -0500, pacman@kosh.dhis.org wrote:
> I upgraded the kernel on my Pegasos from 2.6.32 to 2.6.33 and now it sends
> the message "event-scan failed" to the kernel log about 60 times per second
> as long as it's running.
> 
> The message comes from arch/powerpc/kernel/rtasd.c but I don't know what's
> going on in there so I can't say much more about it.
> 
> How may I help find the cause of this problem?
> 
> (Or alternatively: what is the correct way to disable this "event-scan"? I
> suspect it's not really necessary since it didn't exist in the previous
> kernel version, and the system is running perfectly aside from the error
> message flood.)
> 

It's probably caused by 3d541c4b7f6efd55a98189afd1b2f1c9d048c1b3.

Is Pegasos a chrp, I can't remember. It did get called in the old chrp
code, and the comments suggest you have to call it.

It's not obvious looking at the diff why it would be failing now but
didn't previously.

cheers

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: "event-scan failed" logflood
  2010-05-13  6:04 ` Michael Ellerman
@ 2010-05-13 10:46   ` Benjamin Herrenschmidt
  2010-05-13 11:16     ` pacman
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-13 10:46 UTC (permalink / raw)
  To: michael; +Cc: pacman, linuxppc-dev

On Thu, 2010-05-13 at 16:04 +1000, Michael Ellerman wrote:
> It's probably caused by 3d541c4b7f6efd55a98189afd1b2f1c9d048c1b3.
> 
> Is Pegasos a chrp, I can't remember. It did get called in the old chrp
> code, and the comments suggest you have to call it.
> 
> It's not obvious looking at the diff why it would be failing now but
> didn't previously.

Well, first it should be called once per second, not 60 times per
second, so something is wrong there...

Then, it wouldn't surprise me if the Pegasos RTAS didn't implement
the event scan properly. Maybe we failed silently before that ? I
doubt it's necessary there anyways.

It would be relatively easy to avoid starting the event scan on a
pegasos, I think we have a _chrp_type variable that can be tested
for that.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: "event-scan failed" logflood
  2010-05-13 10:46   ` Benjamin Herrenschmidt
@ 2010-05-13 11:16     ` pacman
  2010-05-13 21:28       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 10+ messages in thread
From: pacman @ 2010-05-13 11:16 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev

Benjamin Herrenschmidt writes:
> 
> Well, first it should be called once per second, not 60 times per
> second, so something is wrong there...

Actually I think it was happening a lot more than 60 times per second, and
klogd was losing most of the messages because they came too fast. When
running the new kernel, vmstat shows 170000 context switches per second (and
CPU usage never going below 5%) even when everything should be idle.

My next experiment was to hack rtas_init to return 0 unconditionally. Now
everything still runs fine, and the context switches per second are back down
in the 2-digit range.

> 
> Then, it wouldn't surprise me if the Pegasos RTAS didn't implement
> the event scan properly. Maybe we failed silently before that ? I
> doubt it's necessary there anyways.

If I'm reading things correctly, rtas-event-scan-rate is the key to the
frequency of attempts. And rtasd.c didn't expect this:

$ od -t x1 /proc/device-tree/rtas/rtas-event-scan-rate
0000000 00 00 00 00
0000004

-- 
Alan Curry

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: "event-scan failed" logflood
  2010-05-13 11:16     ` pacman
@ 2010-05-13 21:28       ` Benjamin Herrenschmidt
  2010-05-17 12:58         ` nello martuscielli
  0 siblings, 1 reply; 10+ messages in thread
From: Benjamin Herrenschmidt @ 2010-05-13 21:28 UTC (permalink / raw)
  To: pacman; +Cc: linuxppc-dev

On Thu, 2010-05-13 at 06:16 -0500, pacman@kosh.dhis.org wrote:
> Benjamin Herrenschmidt writes:
> > 
> > Well, first it should be called once per second, not 60 times per
> > second, so something is wrong there...
> 
> Actually I think it was happening a lot more than 60 times per second, and
> klogd was losing most of the messages because they came too fast. When
> running the new kernel, vmstat shows 170000 context switches per second (and
> CPU usage never going below 5%) even when everything should be idle.
> 
> My next experiment was to hack rtas_init to return 0 unconditionally. Now
> everything still runs fine, and the context switches per second are back down
> in the 2-digit range.
> 
> > 
> > Then, it wouldn't surprise me if the Pegasos RTAS didn't implement
> > the event scan properly. Maybe we failed silently before that ? I
> > doubt it's necessary there anyways.
> 
> If I'm reading things correctly, rtas-event-scan-rate is the key to the
> frequency of attempts. And rtasd.c didn't expect this:
> 
> $ od -t x1 /proc/device-tree/rtas/rtas-event-scan-rate
> 0000000 00 00 00 00
> 0000004

Ok, that gives us a good thing to use to not do the scanning then. If
that's 0 then we assume it's a bogus OF and we disable the scan
completely. I'll do a patch later today, though feel free to beat me to
it.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: "event-scan failed" logflood
  2010-05-13 21:28       ` Benjamin Herrenschmidt
@ 2010-05-17 12:58         ` nello martuscielli
  2010-05-18 12:33           ` Michael Ellerman
  0 siblings, 1 reply; 10+ messages in thread
From: nello martuscielli @ 2010-05-17 12:58 UTC (permalink / raw)
  To: linuxppc-dev

Benjamin Herrenschmidt <benh <at> kernel.crashing.org> writes:

_omissis__
> 
> Ok, that gives us a good thing to use to not do the scanning then. If
> that's 0 then we assume it's a bogus OF and we disable the scan
> completely. I'll do a patch later today, though feel free to beat me to
> it.
> 
> Cheers,
> Ben.
> 

hi, is there available that patch?
With the fresh new 2.6.34 the logflood problem is still present.

thanks,
--nello

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: "event-scan failed" logflood
  2010-05-17 12:58         ` nello martuscielli
@ 2010-05-18 12:33           ` Michael Ellerman
  2010-05-18 16:30             ` nello martuscielli
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Ellerman @ 2010-05-18 12:33 UTC (permalink / raw)
  To: nello martuscielli; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 1274 bytes --]

On Mon, 2010-05-17 at 12:58 +0000, nello martuscielli wrote:
> Benjamin Herrenschmidt <benh <at> kernel.crashing.org> writes:
> 
> _omissis__
> > 
> > Ok, that gives us a good thing to use to not do the scanning then. If
> > that's 0 then we assume it's a bogus OF and we disable the scan
> > completely. I'll do a patch later today, though feel free to beat me to
> > it.
> > 
> > Cheers,
> > Ben.
> > 
> 
> hi, is there available that patch?
> With the fresh new 2.6.34 the logflood problem is still present.

You could try this, completely untested:

diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 4190eae..fd68bed 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -490,6 +490,12 @@ static int __init rtas_init(void)
                return -ENODEV;
        }
 
+       if (!rtas_event_scan_rate) {
+               /* Broken firmware: take a rate of zero to mean don't scan */
+               printk(KERN_DEBUG "rtasd: scan rate is 0, not scanning\n");
+               return 0;
+       }
+
        /* Make room for the sequence number */
        rtas_error_log_max = rtas_get_error_log_max();
        rtas_error_log_buffer_max = rtas_error_log_max + sizeof(int);


cheers


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: "event-scan failed" logflood
  2010-05-18 12:33           ` Michael Ellerman
@ 2010-05-18 16:30             ` nello martuscielli
  2010-05-19  1:39               ` Michael Ellerman
  0 siblings, 1 reply; 10+ messages in thread
From: nello martuscielli @ 2010-05-18 16:30 UTC (permalink / raw)
  To: linuxppc-dev

Michael Ellerman <michael <at> ellerman.id.au> writes:

_omissis__
> > 
> > hi, is there available that patch?
> > With the fresh new 2.6.34 the logflood problem is still present.
> 
> You could try this, completely untested:
> 
> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
> index 4190eae..fd68bed 100644
> --- a/arch/powerpc/kernel/rtasd.c
> +++ b/arch/powerpc/kernel/rtasd.c
> @@ -490,6 +490,12 @@ static int __init rtas_init(void)
>                 return -ENODEV;
>         }
> 
> +       if (!rtas_event_scan_rate) {
> +               /* Broken firmware: take a rate of zero to mean don't scan */
> +               printk(KERN_DEBUG "rtasd: scan rate is 0, not scanning\n");
> +               return 0;
> +       }
> +
>         /* Make room for the sequence number */
>         rtas_error_log_max = rtas_get_error_log_max();
>         rtas_error_log_buffer_max = rtas_error_log_max + sizeof(int);
> 


hi Michael, thanks for pointing me to that patch, now my Pegasos2 G4 seems to
work fine.
Anyway i got these dumps caused by mv643xx_eth:


[...]
sysfs: cannot create duplicate filename '/class/mdio_bus/0'
------------[ cut here ]------------
Badness at fs/sysfs/dir.c:451
NIP: c00efc70 LR: c00efc70 CTR: 00000000
REGS: ef117c70 TRAP: 0700   Not tainted  (2.6.34)
MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 22002488  XER: 00000000
TASK = ef110630[80] 'modprobe' THREAD: ef116000
GPR00: c00efc70 ef117d20 ef110630 00000042 c0009480 00000000 00134ac8 00000000
GPR08: 00000000 00000000 22002482 10000002 ffffffff 10023ab8 00000000 00000000
GPR16: 00000000 107db0b0 00000000 107db088 10005558 00000003 107db324 107db008
GPR24: 10005558 00000000 0000a9ce fffffffe ef117d48 ef0a4300 efb87000 ffffffef
NIP [c00efc70] sysfs_add_one+0x88/0xb8
LR [c00efc70] sysfs_add_one+0x88/0xb8
Call Trace:
[ef117d20] [c00efc70] sysfs_add_one+0x88/0xb8 (unreliable)
[ef117d40] [c00f0d9c] sysfs_do_create_link+0x104/0x1ac
[ef117d70] [c026b238] device_add+0x1c8/0x534
[ef117db0] [c02b2e40] mdiobus_register+0xb4/0x1d8
[ef117de0] [f222c510] mv643xx_eth_shared_probe+0x14c/0x420 [mv643xx_eth]
[ef117e00] [c026f4b4] platform_drv_probe+0x20/0x30
[ef117e10] [c026e044] driver_probe_device+0xdc/0x194
[ef117e30] [c026e174] __driver_attach+0x78/0xb4
[ef117e50] [c026d610] bus_for_each_dev+0x68/0xc0
[ef117e80] [c026de3c] driver_attach+0x24/0x34
[ef117e90] [c026cdbc] bus_add_driver+0xc8/0x278
[ef117eb0] [c026e59c] driver_register+0xd0/0x188
[ef117ed0] [c026f82c] platform_driver_register+0x6c/0x7c
[ef117ee0] [f223602c] mv643xx_eth_init_module+0x2c/0x90 [mv643xx_eth]
[ef117f00] [c0003eec] do_one_initcall+0x70/0x20c
[ef117f20] [c005ccb4] sys_init_module+0xf4/0x248
[ef117f40] [c0015b20] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff654b8
    LR = 0x10002e7c
Instruction dump:
807c0000 7fc4f378 4bffff3d 3c80c043 38843afa 4bf2bb8d 809d0010 4bf2bb85
7c641b78 3c60c043 38633ac7 482c7d91 <0fe00000> 7fc3f378 4bfa4f75 80010024
mii_bus 0 failed to register
mv643xx_eth: probe of mv643xx_eth.0 failed with error -12
Unable to handle kernel paging request for data at address 0x00000000
Faulting instruction address: 0xf222f26c
usbcore: registered new interface driver hiddev
Oops: Kernel access of bad area, sig: 11 [#1]
PREEMPT CHRP
last sysfs file: /sys/devices/virtual/tty/ptyv3/uevent
Modules linked in: snd_mpu401_uart usbhid(+) snd_rawmidi mv643xx_eth(+) firewire
_core via_rhine i2c_viapro
NIP: f222f26c LR: f222f264 CTR: c026daf0
REGS: ef117d10 TRAP: 0300   Tainted: G        W   (2.6.34)
MSR: 00009032 <EE,ME,IR,DR>  CR: 84002488  XER: 20000000
DAR: 00000000, DSISR: 40000000
TASK = ef110630[80] 'modprobe' THREAD: ef116000
GPR00: f222f258 ef117dc0 ef110630 00000000 efa9e320 efa9e005 efa9e155 00000000
GPR08: 00000400 ef856cc0 efa9e314 00000001 ffffffff 10023ab8 00000000 00000000
GPR16: 00000000 107db0b0 00000000 107db088 10005558 00000003 107db324 107db008
GPR24: 10005558 c04ca290 0000a9ce c04ca288 c04ca138 fffffff4 efa9e000 efa9e320
NIP [f222f26c] mv643xx_eth_probe+0xd8/0x690 [mv643xx_eth]
LR [f222f264] mv643xx_eth_probe+0xd0/0x690 [mv643xx_eth]
Call Trace:
[ef117dc0] [f222f258] mv643xx_eth_probe+0xc4/0x690 [mv643xx_eth] (unreliable)
[ef117e00] [c026f4b4] platform_drv_probe+0x20/0x30
[ef117e10] [c026e044] driver_probe_device+0xdc/0x194
[ef117e30] [c026e174] __driver_attach+0x78/0xb4
[ef117e50] [c026d610] bus_for_each_dev+0x68/0xc0
[ef117e80] [c026de3c] driver_attach+0x24/0x34
[ef117e90] [c026cdbc] bus_add_driver+0xc8/0x278
[ef117eb0] [c026e59c] driver_register+0xd0/0x188
[ef117ed0] [c026f82c] platform_driver_register+0x6c/0x7c
[ef117ee0] [f223603c] mv643xx_eth_init_module+0x3c/0x90 [mv643xx_eth]
[ef117f00] [c0003eec] do_one_initcall+0x70/0x20c
[ef117f20] [c005ccb4] sys_init_module+0xf4/0x248
[ef117f40] [c0015b20] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xff654b8
    LR = 0x10002e7c
Instruction dump:
7c7e1b79 41820544 3bfe0320 3b3b0008 7f23cb78 7fe4fb78 480019e5 807c0000
38630008 48001dc9 907e0320 817c0004 <80030000> 556b502a 396b0400 7d605a14
[...]

cheers,
nello

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: "event-scan failed" logflood
  2010-05-18 16:30             ` nello martuscielli
@ 2010-05-19  1:39               ` Michael Ellerman
  2010-06-11 13:32                 ` acrux@cruxppc.org
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Ellerman @ 2010-05-19  1:39 UTC (permalink / raw)
  To: nello martuscielli; +Cc: linuxppc-dev

[-- Attachment #1: Type: text/plain, Size: 2459 bytes --]

On Tue, 2010-05-18 at 16:30 +0000, nello martuscielli wrote:
> Michael Ellerman <michael <at> ellerman.id.au> writes:
> 
> _omissis__
> > > 
> > > hi, is there available that patch?
> > > With the fresh new 2.6.34 the logflood problem is still present.
> > 
> > You could try this, completely untested:
> > 
> > diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
> > index 4190eae..fd68bed 100644
> > --- a/arch/powerpc/kernel/rtasd.c
> > +++ b/arch/powerpc/kernel/rtasd.c
> > @@ -490,6 +490,12 @@ static int __init rtas_init(void)
> >                 return -ENODEV;
> >         }
> > 
> > +       if (!rtas_event_scan_rate) {
> > +               /* Broken firmware: take a rate of zero to mean don't scan */
> > +               printk(KERN_DEBUG "rtasd: scan rate is 0, not scanning\n");
> > +               return 0;
> > +       }
> > +
> >         /* Make room for the sequence number */
> >         rtas_error_log_max = rtas_get_error_log_max();
> >         rtas_error_log_buffer_max = rtas_error_log_max + sizeof(int);
> > 
> 
> 
> hi Michael, thanks for pointing me to that patch, now my Pegasos2 G4 seems to
> work fine.

Great. Can I add Tested-by you?

> Anyway i got these dumps caused by mv643xx_eth:
> 
> 
> [...]
> sysfs: cannot create duplicate filename '/class/mdio_bus/0'
> ------------[ cut here ]------------
> Badness at fs/sysfs/dir.c:451
> NIP: c00efc70 LR: c00efc70 CTR: 00000000
> REGS: ef117c70 TRAP: 0700   Not tainted  (2.6.34)
> MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 22002488  XER: 00000000
> TASK = ef110630[80] 'modprobe' THREAD: ef116000
> GPR00: c00efc70 ef117d20 ef110630 00000042 c0009480 00000000 00134ac8 00000000
> GPR08: 00000000 00000000 22002482 10000002 ffffffff 10023ab8 00000000 00000000
> GPR16: 00000000 107db0b0 00000000 107db088 10005558 00000003 107db324 107db008
> GPR24: 10005558 00000000 0000a9ce fffffffe ef117d48 ef0a4300 efb87000 ffffffef
> NIP [c00efc70] sysfs_add_one+0x88/0xb8
> LR [c00efc70] sysfs_add_one+0x88/0xb8
> Call Trace:
> [ef117d20] [c00efc70] sysfs_add_one+0x88/0xb8 (unreliable)
> [ef117d40] [c00f0d9c] sysfs_do_create_link+0x104/0x1ac
> [ef117d70] [c026b238] device_add+0x1c8/0x534
> [ef117db0] [c02b2e40] mdiobus_register+0xb4/0x1d8
> [ef117de0] [f222c510] mv643xx_eth_shared_probe+0x14c/0x420 [mv643xx_eth]

Can't really help with that, looks like it's trying to register the same
mdio bus twice?

cheers

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: "event-scan failed" logflood
  2010-05-19  1:39               ` Michael Ellerman
@ 2010-06-11 13:32                 ` acrux@cruxppc.org
  0 siblings, 0 replies; 10+ messages in thread
From: acrux@cruxppc.org @ 2010-06-11 13:32 UTC (permalink / raw)
  To: linuxppc-dev



Michael Ellerman wrote:
> 
> On Tue, 2010-05-18 at 16:30 +0000, nello martuscielli wrote:
>> Michael Ellerman <michael <at> ellerman.id.au> writes:
>> 
>> _omissis__
>> > > 
>> > > hi, is there available that patch?
>> > > With the fresh new 2.6.34 the logflood problem is still present.
>> > 
>> > You could try this, completely untested:
>> > 
>> > diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
>> > index 4190eae..fd68bed 100644
>> > --- a/arch/powerpc/kernel/rtasd.c
>> > +++ b/arch/powerpc/kernel/rtasd.c
>> > @@ -490,6 +490,12 @@ static int __init rtas_init(void)
>> >                 return -ENODEV;
>> >         }
>> > 
>> > +       if (!rtas_event_scan_rate) {
>> > +               /* Broken firmware: take a rate of zero to mean don't
>> scan */
>> > +               printk(KERN_DEBUG "rtasd: scan rate is 0, not
>> scanning\n");
>> > +               return 0;
>> > +       }
>> > +
>> >         /* Make room for the sequence number */
>> >         rtas_error_log_max = rtas_get_error_log_max();
>> >         rtas_error_log_buffer_max = rtas_error_log_max + sizeof(int);
>> > 
>> 
>> 
>> hi Michael, thanks for pointing me to that patch, now my Pegasos2 G4
>> seems to
>> work fine.
> 
> Great. Can I add Tested-by you?
> 
> 

hi Michael,
thanks, this patch seems to solve that problem on my Pegasos2 too.


thanks,
Nico

-- 
View this message in context: http://old.nabble.com/Re%3A-%22event-scan-failed%22-logflood-tp28585064p28855222.html
Sent from the linuxppc-dev mailing list archive at Nabble.com.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2010-06-11 13:54 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-13  4:13 "event-scan failed" logflood pacman
2010-05-13  6:04 ` Michael Ellerman
2010-05-13 10:46   ` Benjamin Herrenschmidt
2010-05-13 11:16     ` pacman
2010-05-13 21:28       ` Benjamin Herrenschmidt
2010-05-17 12:58         ` nello martuscielli
2010-05-18 12:33           ` Michael Ellerman
2010-05-18 16:30             ` nello martuscielli
2010-05-19  1:39               ` Michael Ellerman
2010-06-11 13:32                 ` acrux@cruxppc.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).