* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-19 7:29 ` Steven Noonan
@ 2008-09-19 8:17 ` Ingo Molnar
2008-09-19 8:22 ` Steven Noonan
2008-09-19 8:22 ` Steven Noonan
2008-09-19 14:28 ` Senthil Balasubramanian
2 siblings, 1 reply; 62+ messages in thread
From: Ingo Molnar @ 2008-09-19 8:17 UTC (permalink / raw)
To: Steven Noonan
Cc: Luis R. Rodriguez, Luis Rodriguez, ath9k-devel@lists.ath9k.org,
linux-wireless, LKML
* Steven Noonan <steven@uplinklabs.net> wrote:
> On Thu, Sep 18, 2008 at 8:01 PM, Luis R. Rodriguez
> <lrodriguez@atheros.com> wrote:
> > Thanks for testing, and glad to see this is resolved.
> >
>
> Damn. It's back. I was using wireless normally this evening. Browsing
> the web, that kind of thing, and then the wireless inexplicably
> dropped (even with the group rekeying patch), so I unloaded/reloaded
> the module. This popped up in dmesg:
>
> [ 3834.375658] vendor=8086 device=27d2
> [ 3834.375666] ath9k 0000:03:00.0: PCI INT A disabled
> [ 3834.375716] ath9k: driver unloaded
> [ 3838.552419] ath9k: 0.1
> [ 3838.552502] vendor=8086 device=27d2
> [ 3838.552511] ath9k 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> [ 3838.552532] ath9k 0000:03:00.0: setting latency timer to 64
> [ 3838.688924] phy1: Selected rate control algorithm 'ath9k_rate_control'
> [ 3838.693652] phy1: Atheros 5416: mem=0xffffc20000060000, irq=17
> [ 3839.427125] irq 17: nobody cared (try booting with the "irqpoll" option)
> [ 3839.427136] Pid: 0, comm: swapper Tainted: P
> 2.6.27-rc6-tip-00478-g74f1a36 #1
> [ 3839.427141] Call Trace:
> [ 3839.427145] <IRQ> [<ffffffff802219c5>] ? read_hpet+0x9/0x1c
> [ 3839.427165] [<ffffffff8026af71>] __report_bad_irq+0x3d/0x8c
> [ 3839.427172] [<ffffffff8026b0c6>] note_interrupt+0x106/0x160
> [ 3839.427180] [<ffffffff8026b801>] handle_fasteoi_irq+0xad/0xda
> [ 3839.427188] [<ffffffff8020f7b0>] do_IRQ+0x10c/0x190
> [ 3839.427194] [<ffffffff8020c551>] ret_from_intr+0x0/0xa
> [ 3839.427198] <EOI> [<ffffffff8026c6f0>] rcu_pending+0x62/0x6e
> [ 3839.427211] [<ffffffff8025bc11>] ? tick_nohz_stop_sched_tick+0x2e4/0x2f3
> [ 3839.427218] [<ffffffff8020ad94>] cpu_idle+0x7b/0xdb
> [ 3839.427226] [<ffffffff8060c921>] rest_init+0x75/0x77
> [ 3839.427231] handlers:
> [ 3839.427234] [<ffffffffa0240238>] (ath_isr+0x0/0x170 [ath9k])
> [ 3839.427263] Disabling IRQ #17
> [ 3842.263699] ADDRCONF(NETDEV_UP): wlan0: link is not ready
> [ 3848.035003] ADDRCONF(NETDEV_UP): wlan0: link is not ready
> [ 3848.432701] ADDRCONF(NETDEV_UP): wlan0: link is not ready
> [ 3850.216947] wlan0: authenticate with AP 00:1e:52:79:4d:01
> [ 3850.217027] wlan0: authenticate with AP 00:1e:52:79:4d:01
> [ 3850.228326] wlan0: authenticated
> [ 3850.228336] wlan0: associate with AP 00:1e:52:79:4d:01
> [ 3850.428140] wlan0: associate with AP 00:1e:52:79:4d:01
> [ 3850.628151] wlan0: associate with AP 00:1e:52:79:4d:01
> [ 3850.728305] wlan0: RX AssocResp from 00:1e:52:79:4d:01 (capab=0x431
> status=0 aid=1)
> [ 3850.728314] wlan0: associated
> [ 3850.728655] wlan0 (WE) : Wireless Event too big (320)
> [ 3850.743377] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
> [ 3860.855104] wlan0: no IPv6 routers present
>
> I rebuilt the module with DBG_ATH_INTERRUPT, but it somehow stumbled
> itself back into working order while I was compiling. I can't keep the
> interrupt debugging on all the time because it's just -too verbose-,
> and when I pop a debug version of the module in, then it's too late to
> track the issue....
in tip/master there's an ftrace_printk() facility. You can just replace
the verbose printk()s with ftrace_printk() and see the result in
/debug/tracing/trace.
This sort of tracing has very low overhead and can be used in an easy
ad-hoc manner with no extra infrastructure. Here's a few quick-start
links about how to enable the scheduler tracer:
http://redhat.com/~mingo/sched-devel.git/readme-tracer.txt
http://redhat.com/~mingo/sched-devel.git/howto-trace-latencies.txt
any ftrace_printk() you add in the kernel will show up in that trace.
(If the scheduling events are uninteresting and clutter the output then
you might want to remove the scheduler tracing entries from kernel/*.c
by removing the trace_sched_*() calls or use a less noisy tracer.)
to get enough of a trace history you might want to increase the number
of trace entries in /debug/tracing/trace_entries from 16K to 128K or so.
Ingo
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-19 8:17 ` Ingo Molnar
@ 2008-09-19 8:22 ` Steven Noonan
2008-09-19 8:32 ` Ingo Molnar
2008-09-19 8:45 ` Steven Noonan
0 siblings, 2 replies; 62+ messages in thread
From: Steven Noonan @ 2008-09-19 8:22 UTC (permalink / raw)
To: Ingo Molnar
Cc: Luis R. Rodriguez, Luis Rodriguez, ath9k-devel@lists.ath9k.org,
linux-wireless, LKML
On Fri, Sep 19, 2008 at 1:17 AM, Ingo Molnar <mingo@elte.hu> wrote:
> in tip/master there's an ftrace_printk() facility. You can just replace
> the verbose printk()s with ftrace_printk() and see the result in
> /debug/tracing/trace.
>
> This sort of tracing has very low overhead and can be used in an easy
> ad-hoc manner with no extra infrastructure. Here's a few quick-start
> links about how to enable the scheduler tracer:
>
> http://redhat.com/~mingo/sched-devel.git/readme-tracer.txt
> http://redhat.com/~mingo/sched-devel.git/howto-trace-latencies.txt
>
> any ftrace_printk() you add in the kernel will show up in that trace.
>
> (If the scheduling events are uninteresting and clutter the output then
> you might want to remove the scheduler tracing entries from kernel/*.c
> by removing the trace_sched_*() calls or use a less noisy tracer.)
>
> to get enough of a trace history you might want to increase the number
> of trace entries in /debug/tracing/trace_entries from 16K to 128K or so.
>
Thanks for the suggestion, I'll do that. :)
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-19 8:22 ` Steven Noonan
@ 2008-09-19 8:32 ` Ingo Molnar
2008-09-27 1:23 ` Steven Rostedt
2008-09-19 8:45 ` Steven Noonan
1 sibling, 1 reply; 62+ messages in thread
From: Ingo Molnar @ 2008-09-19 8:32 UTC (permalink / raw)
To: Steven Noonan
Cc: Luis R. Rodriguez, Luis Rodriguez, ath9k-devel@lists.ath9k.org,
linux-wireless, LKML, Steven Rostedt
* Steven Noonan <steven@uplinklabs.net> wrote:
> On Fri, Sep 19, 2008 at 1:17 AM, Ingo Molnar <mingo@elte.hu> wrote:
> > in tip/master there's an ftrace_printk() facility. You can just replace
> > the verbose printk()s with ftrace_printk() and see the result in
> > /debug/tracing/trace.
> >
> > This sort of tracing has very low overhead and can be used in an easy
> > ad-hoc manner with no extra infrastructure. Here's a few quick-start
> > links about how to enable the scheduler tracer:
> >
> > http://redhat.com/~mingo/sched-devel.git/readme-tracer.txt
> > http://redhat.com/~mingo/sched-devel.git/howto-trace-latencies.txt
> >
> > any ftrace_printk() you add in the kernel will show up in that trace.
> >
> > (If the scheduling events are uninteresting and clutter the output then
> > you might want to remove the scheduler tracing entries from kernel/*.c
> > by removing the trace_sched_*() calls or use a less noisy tracer.)
> >
> > to get enough of a trace history you might want to increase the number
> > of trace entries in /debug/tracing/trace_entries from 16K to 128K or so.
> >
>
> Thanks for the suggestion, I'll do that. :)
one more suggestion: you should first check how large the 'scope' of
your trace is - i.e. how long of a time span it can cover, in the
workload that you are going to trace.
if it's too short - say just a few hundred milliseconds then you need a
larger trace buffer or fewer trace entries.
another thing: you can auto-stop tracing on the failure condition. That
way you dont have to worry about whether you notice the badness.
Unfortunately there's no easy callback to just anonymously stop all
tracing that goes on. There's ftrace_kill_atomic() but i'm not sure it
properly stops the scheduler tracer.
What is needed is to set kernel/trace/trace.c's tracing_disabled
variable to 0. I've Cc:-ed Steve Rostedt - Steve, why is there no easy
ad-hoc way to trigger a stop-tracing event without elaborate changes?
Ingo
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-19 8:32 ` Ingo Molnar
@ 2008-09-27 1:23 ` Steven Rostedt
2008-09-27 19:20 ` Ingo Molnar
0 siblings, 1 reply; 62+ messages in thread
From: Steven Rostedt @ 2008-09-27 1:23 UTC (permalink / raw)
To: Ingo Molnar
Cc: Steven Noonan, Luis R. Rodriguez, Luis Rodriguez,
ath9k-devel@lists.ath9k.org, linux-wireless, LKML
I hate trips where my email box becomes filled. I skip over a lot of
emails that I should not have.
On Fri, 19 Sep 2008, Ingo Molnar wrote:
>
> What is needed is to set kernel/trace/trace.c's tracing_disabled
> variable to 0. I've Cc:-ed Steve Rostedt - Steve, why is there no easy
> ad-hoc way to trigger a stop-tracing event without elaborate changes?
I think I added a way to do this. Hmm, I think it was in -rt that I added
it, and have not gotten around to sending you the updates.
Yes, we need an easy way to disable the tracer. I think we have
ftrace_disable() function. Perhaps that only disables tracing that checks
the tracing_disabled variable.
Anyway, I made sure the new ring_buffer code has an easy way to disable
it.
-- Steve
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-27 1:23 ` Steven Rostedt
@ 2008-09-27 19:20 ` Ingo Molnar
2008-09-27 19:28 ` Steven Rostedt
0 siblings, 1 reply; 62+ messages in thread
From: Ingo Molnar @ 2008-09-27 19:20 UTC (permalink / raw)
To: Steven Rostedt
Cc: Steven Noonan, Luis R. Rodriguez, Luis Rodriguez,
ath9k-devel@lists.ath9k.org, linux-wireless, LKML
* Steven Rostedt <rostedt@goodmis.org> wrote:
>
> I hate trips where my email box becomes filled. I skip over a lot of
> emails that I should not have.
>
> On Fri, 19 Sep 2008, Ingo Molnar wrote:
> >
> > What is needed is to set kernel/trace/trace.c's tracing_disabled
> > variable to 0. I've Cc:-ed Steve Rostedt - Steve, why is there no easy
> > ad-hoc way to trigger a stop-tracing event without elaborate changes?
>
> I think I added a way to do this. Hmm, I think it was in -rt that I added
> it, and have not gotten around to sending you the updates.
>
> Yes, we need an easy way to disable the tracer. I think we have
> ftrace_disable() function. Perhaps that only disables tracing that
> checks the tracing_disabled variable.
we have kill_ftrace(), but that will permanently zap it.
Ingo
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-27 19:20 ` Ingo Molnar
@ 2008-09-27 19:28 ` Steven Rostedt
0 siblings, 0 replies; 62+ messages in thread
From: Steven Rostedt @ 2008-09-27 19:28 UTC (permalink / raw)
To: Ingo Molnar
Cc: Steven Noonan, Luis R. Rodriguez, Luis Rodriguez,
ath9k-devel@lists.ath9k.org, linux-wireless, LKML
On Sat, 27 Sep 2008, Ingo Molnar wrote:
>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> >
> > I hate trips where my email box becomes filled. I skip over a lot of
> > emails that I should not have.
> >
> > On Fri, 19 Sep 2008, Ingo Molnar wrote:
> > >
> > > What is needed is to set kernel/trace/trace.c's tracing_disabled
> > > variable to 0. I've Cc:-ed Steve Rostedt - Steve, why is there no easy
> > > ad-hoc way to trigger a stop-tracing event without elaborate changes?
> >
> > I think I added a way to do this. Hmm, I think it was in -rt that I added
> > it, and have not gotten around to sending you the updates.
> >
> > Yes, we need an easy way to disable the tracer. I think we have
> > ftrace_disable() function. Perhaps that only disables tracing that
> > checks the tracing_disabled variable.
>
> we have kill_ftrace(), but that will permanently zap it.
OK, on Monday, as I work on getting ftrace to use the ring buffer, I'll
also look to see which features are only in -rt that need to go to tip.
-- Steve
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-19 8:22 ` Steven Noonan
2008-09-19 8:32 ` Ingo Molnar
@ 2008-09-19 8:45 ` Steven Noonan
2008-09-19 8:51 ` Ingo Molnar
1 sibling, 1 reply; 62+ messages in thread
From: Steven Noonan @ 2008-09-19 8:45 UTC (permalink / raw)
To: Ingo Molnar
Cc: Luis R. Rodriguez, Luis Rodriguez, ath9k-devel@lists.ath9k.org,
linux-wireless, LKML
On Fri, Sep 19, 2008 at 1:22 AM, Steven Noonan <steven@uplinklabs.net> wrote:
> On Fri, Sep 19, 2008 at 1:17 AM, Ingo Molnar <mingo@elte.hu> wrote:
>> in tip/master there's an ftrace_printk() facility. You can just replace
>> the verbose printk()s with ftrace_printk() and see the result in
>> /debug/tracing/trace.
>>
>> This sort of tracing has very low overhead and can be used in an easy
>> ad-hoc manner with no extra infrastructure. Here's a few quick-start
>> links about how to enable the scheduler tracer:
>>
>> http://redhat.com/~mingo/sched-devel.git/readme-tracer.txt
>> http://redhat.com/~mingo/sched-devel.git/howto-trace-latencies.txt
>>
>> any ftrace_printk() you add in the kernel will show up in that trace.
>>
>> (If the scheduling events are uninteresting and clutter the output then
>> you might want to remove the scheduler tracing entries from kernel/*.c
>> by removing the trace_sched_*() calls or use a less noisy tracer.)
>>
>> to get enough of a trace history you might want to increase the number
>> of trace entries in /debug/tracing/trace_entries from 16K to 128K or so.
>>
>
> Thanks for the suggestion, I'll do that. :)
>
It'd be extremely convenient to have a tracer that did absolutely
nothing but display ftrace_printk()s.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-19 8:45 ` Steven Noonan
@ 2008-09-19 8:51 ` Ingo Molnar
0 siblings, 0 replies; 62+ messages in thread
From: Ingo Molnar @ 2008-09-19 8:51 UTC (permalink / raw)
To: Steven Noonan
Cc: Luis R. Rodriguez, Luis Rodriguez, ath9k-devel@lists.ath9k.org,
linux-wireless, LKML, Steven Rostedt
* Steven Noonan <steven@uplinklabs.net> wrote:
> On Fri, Sep 19, 2008 at 1:22 AM, Steven Noonan <steven@uplinklabs.net> wrote:
> > On Fri, Sep 19, 2008 at 1:17 AM, Ingo Molnar <mingo@elte.hu> wrote:
> >> in tip/master there's an ftrace_printk() facility. You can just replace
> >> the verbose printk()s with ftrace_printk() and see the result in
> >> /debug/tracing/trace.
> >>
> >> This sort of tracing has very low overhead and can be used in an easy
> >> ad-hoc manner with no extra infrastructure. Here's a few quick-start
> >> links about how to enable the scheduler tracer:
> >>
> >> http://redhat.com/~mingo/sched-devel.git/readme-tracer.txt
> >> http://redhat.com/~mingo/sched-devel.git/howto-trace-latencies.txt
> >>
> >> any ftrace_printk() you add in the kernel will show up in that trace.
> >>
> >> (If the scheduling events are uninteresting and clutter the output then
> >> you might want to remove the scheduler tracing entries from kernel/*.c
> >> by removing the trace_sched_*() calls or use a less noisy tracer.)
> >>
> >> to get enough of a trace history you might want to increase the number
> >> of trace entries in /debug/tracing/trace_entries from 16K to 128K or so.
> >>
> >
> > Thanks for the suggestion, I'll do that. :)
> >
>
> It'd be extremely convenient to have a tracer that did absolutely
> nothing but display ftrace_printk()s.
yeah, had the same thought.
Could you please contribute one? :) I'd suggest you take the smallest
tracer as a template: trace_sched_switch.c and turn that into
trace_nop.c - and change all the function names and ASCII and kconfig
glue to 'nop'/'NOP'?
That can then be used as a basis for additional ftrace plugins, or for
ad-hoc temporary tracing.
Ingo
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-19 7:29 ` Steven Noonan
2008-09-19 8:17 ` Ingo Molnar
@ 2008-09-19 8:22 ` Steven Noonan
2008-09-19 14:28 ` Senthil Balasubramanian
2 siblings, 0 replies; 62+ messages in thread
From: Steven Noonan @ 2008-09-19 8:22 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Luis Rodriguez, Ingo Molnar, ath9k-devel@lists.ath9k.org,
linux-wireless, LKML
On Fri, Sep 19, 2008 at 12:29 AM, Steven Noonan <steven@uplinklabs.net> wrote:
> On Thu, Sep 18, 2008 at 8:01 PM, Luis R. Rodriguez
> <lrodriguez@atheros.com> wrote:
>> Thanks for testing, and glad to see this is resolved.
>>
>
> Damn. It's back. I was using wireless normally this evening. Browsing
> the web, that kind of thing, and then the wireless inexplicably
> dropped (even with the group rekeying patch), so I unloaded/reloaded
> the module. This popped up in dmesg:
>
> [ 3834.375658] vendor=8086 device=27d2
> [ 3834.375666] ath9k 0000:03:00.0: PCI INT A disabled
> [ 3834.375716] ath9k: driver unloaded
> [ 3838.552419] ath9k: 0.1
> [ 3838.552502] vendor=8086 device=27d2
> [ 3838.552511] ath9k 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> [ 3838.552532] ath9k 0000:03:00.0: setting latency timer to 64
> [ 3838.688924] phy1: Selected rate control algorithm 'ath9k_rate_control'
> [ 3838.693652] phy1: Atheros 5416: mem=0xffffc20000060000, irq=17
> [ 3839.427125] irq 17: nobody cared (try booting with the "irqpoll" option)
> [ 3839.427136] Pid: 0, comm: swapper Tainted: P
> 2.6.27-rc6-tip-00478-g74f1a36 #1
> [ 3839.427141] Call Trace:
> [ 3839.427145] <IRQ> [<ffffffff802219c5>] ? read_hpet+0x9/0x1c
> [ 3839.427165] [<ffffffff8026af71>] __report_bad_irq+0x3d/0x8c
> [ 3839.427172] [<ffffffff8026b0c6>] note_interrupt+0x106/0x160
> [ 3839.427180] [<ffffffff8026b801>] handle_fasteoi_irq+0xad/0xda
> [ 3839.427188] [<ffffffff8020f7b0>] do_IRQ+0x10c/0x190
> [ 3839.427194] [<ffffffff8020c551>] ret_from_intr+0x0/0xa
> [ 3839.427198] <EOI> [<ffffffff8026c6f0>] rcu_pending+0x62/0x6e
> [ 3839.427211] [<ffffffff8025bc11>] ? tick_nohz_stop_sched_tick+0x2e4/0x2f3
> [ 3839.427218] [<ffffffff8020ad94>] cpu_idle+0x7b/0xdb
> [ 3839.427226] [<ffffffff8060c921>] rest_init+0x75/0x77
> [ 3839.427231] handlers:
> [ 3839.427234] [<ffffffffa0240238>] (ath_isr+0x0/0x170 [ath9k])
> [ 3839.427263] Disabling IRQ #17
> [ 3842.263699] ADDRCONF(NETDEV_UP): wlan0: link is not ready
> [ 3848.035003] ADDRCONF(NETDEV_UP): wlan0: link is not ready
> [ 3848.432701] ADDRCONF(NETDEV_UP): wlan0: link is not ready
> [ 3850.216947] wlan0: authenticate with AP 00:1e:52:79:4d:01
> [ 3850.217027] wlan0: authenticate with AP 00:1e:52:79:4d:01
> [ 3850.228326] wlan0: authenticated
> [ 3850.228336] wlan0: associate with AP 00:1e:52:79:4d:01
> [ 3850.428140] wlan0: associate with AP 00:1e:52:79:4d:01
> [ 3850.628151] wlan0: associate with AP 00:1e:52:79:4d:01
> [ 3850.728305] wlan0: RX AssocResp from 00:1e:52:79:4d:01 (capab=0x431
> status=0 aid=1)
> [ 3850.728314] wlan0: associated
> [ 3850.728655] wlan0 (WE) : Wireless Event too big (320)
> [ 3850.743377] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
> [ 3860.855104] wlan0: no IPv6 routers present
>
> I rebuilt the module with DBG_ATH_INTERRUPT, but it somehow stumbled
> itself back into working order while I was compiling. I can't keep the
> interrupt debugging on all the time because it's just -too verbose-,
> and when I pop a debug version of the module in, then it's too late to
> track the issue....
>
Do pardon me if this is incredibly stupid, but I have some thoughts
about ath9k interrupt handling. Let me know if there's any validity
here. And if there's any invalid points, please help me understand
what's going on. :)
First of all, I know for certain the ath9k interrupt handler is
registered before the driver prints the line saying the name of the
device ("phy1: Atheros 5416"). So any IRQs to IRQ 17 should be seen by
the ath9k handler (ath_isr) after that point, right? Shortly after
that point is where we're seeing the "nobody cared". This seems to
suggest that it registered the handler successfully (because we didn't
get an error message), but the handler returned IRQ_NONE.
Second of all, I'm looking at the ath9k interrupt handler right now,
and there are a few cases where it returns IRQ_NONE. And here's where
I'm a bit fuzzy. Since there could be any number of things using IRQ
17 (though in my case, ath9k is on its own dedicated IRQ), it seems
odd that you check the value of sc->sc_invalid, since the cookie
passed to the handler might not actually be ath9k's cookie if multiple
drivers have registered IRQ handlers for that particular IRQ. Who
knows if what you're reading is even valid? Heck, what if some driver
uses a NULL for their cookie (however unlikely)? You'd get a
segmentation fault on the second line of the interrupt handler. Of
course, I could be completely wrong: does parent interrupt handler
check to see which device driver owns the particular device signaling
an IRQ and then call the appropriate handler?
- Steven
^ permalink raw reply [flat|nested] 62+ messages in thread* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-19 7:29 ` Steven Noonan
2008-09-19 8:17 ` Ingo Molnar
2008-09-19 8:22 ` Steven Noonan
@ 2008-09-19 14:28 ` Senthil Balasubramanian
2008-09-19 16:42 ` Steven Noonan
2 siblings, 1 reply; 62+ messages in thread
From: Senthil Balasubramanian @ 2008-09-19 14:28 UTC (permalink / raw)
To: Steven Noonan
Cc: Luis Rodriguez, Ingo Molnar, ath9k-devel@lists.ath9k.org,
linux-wireless, LKML
On Fri, Sep 19, 2008 at 12:59:29PM +0530, Steven Noonan wrote:
> On Thu, Sep 18, 2008 at 8:01 PM, Luis R. Rodriguez
> <lrodriguez@atheros.com> wrote:
> > Thanks for testing, and glad to see this is resolved.
> >
>
> Damn. It's back. I was using wireless normally this evening. Browsing
> the web, that kind of thing, and then the wireless inexplicably
> dropped (even with the group rekeying patch), so I unloaded/reloaded
> the module. This popped up in dmesg:
>
> [ 3834.375658] vendor=8086 device=27d2
> [ 3834.375666] ath9k 0000:03:00.0: PCI INT A disabled
> [ 3834.375716] ath9k: driver unloaded
> [ 3838.552419] ath9k: 0.1
> [ 3838.552502] vendor=8086 device=27d2
> [ 3838.552511] ath9k 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> [ 3838.552532] ath9k 0000:03:00.0: setting latency timer to 64
> [ 3838.688924] phy1: Selected rate control algorithm 'ath9k_rate_control'
> [ 3838.693652] phy1: Atheros 5416: mem=0xffffc20000060000, irq=17
> [ 3839.427125] irq 17: nobody cared (try booting with the "irqpoll" option)
> [ 3839.427136] Pid: 0, comm: swapper Tainted: P
> 2.6.27-rc6-tip-00478-g74f1a36 #1
> [ 3839.427141] Call Trace:
> [ 3839.427145] <IRQ> [<ffffffff802219c5>] ? read_hpet+0x9/0x1c
> [ 3839.427165] [<ffffffff8026af71>] __report_bad_irq+0x3d/0x8c
> [ 3839.427172] [<ffffffff8026b0c6>] note_interrupt+0x106/0x160
> [ 3839.427180] [<ffffffff8026b801>] handle_fasteoi_irq+0xad/0xda
> [ 3839.427188] [<ffffffff8020f7b0>] do_IRQ+0x10c/0x190
> [ 3839.427194] [<ffffffff8020c551>] ret_from_intr+0x0/0xa
> [ 3839.427198] <EOI> [<ffffffff8026c6f0>] rcu_pending+0x62/0x6e
> [ 3839.427211] [<ffffffff8025bc11>] ? tick_nohz_stop_sched_tick+0x2e4/0x2f3
> [ 3839.427218] [<ffffffff8020ad94>] cpu_idle+0x7b/0xdb
> [ 3839.427226] [<ffffffff8060c921>] rest_init+0x75/0x77
> [ 3839.427231] handlers:
> [ 3839.427234] [<ffffffffa0240238>] (ath_isr+0x0/0x170 [ath9k])
> [ 3839.427263] Disabling IRQ #17
> [ 3842.263699] ADDRCONF(NETDEV_UP): wlan0: link is not ready
> [ 3848.035003] ADDRCONF(NETDEV_UP): wlan0: link is not ready
> [ 3848.432701] ADDRCONF(NETDEV_UP): wlan0: link is not ready
> [ 3850.216947] wlan0: authenticate with AP 00:1e:52:79:4d:01
> [ 3850.217027] wlan0: authenticate with AP 00:1e:52:79:4d:01
> [ 3850.228326] wlan0: authenticated
> [ 3850.228336] wlan0: associate with AP 00:1e:52:79:4d:01
> [ 3850.428140] wlan0: associate with AP 00:1e:52:79:4d:01
> [ 3850.628151] wlan0: associate with AP 00:1e:52:79:4d:01
> [ 3850.728305] wlan0: RX AssocResp from 00:1e:52:79:4d:01 (capab=0x431
> status=0 aid=1)
> [ 3850.728314] wlan0: associated
> [ 3850.728655] wlan0 (WE) : Wireless Event too big (320)
> [ 3850.743377] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
> [ 3860.855104] wlan0: no IPv6 routers present
>
> I rebuilt the module with DBG_ATH_INTERRUPT, but it somehow stumbled
> itself back into working order while I was compiling. I can't keep the
> interrupt debugging on all the time because it's just -too verbose-,
> and when I pop a debug version of the module in, then it's too late to
> track the issue....
I am able to reproduce this IRQ nobody cared issue in my setup and the
following patch seems to be fixing the issue. Please try it out and let
me know if it solves your issue in your setup.
**********
IRQs should be disabled before calling free_irq. Also clear pending IRQs.
Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com>
---
drivers/net/wireless/ath9k/core.c | 2 ++
drivers/net/wireless/ath9k/main.c | 8 +++++++-
2 files changed, 9 insertions(+), 1 deletions(-)
diff --git a/drivers/net/wireless/ath9k/core.c b/drivers/net/wireless/ath9k/core.c
index c262ef2..c007dd2 100644
--- a/drivers/net/wireless/ath9k/core.c
+++ b/drivers/net/wireless/ath9k/core.c
@@ -1183,6 +1183,8 @@ void ath_deinit(struct ath_softc *sc)
DPRINTF(sc, ATH_DBG_CONFIG, "%s\n", __func__);
+ tasklet_kill(&sc->intr_tq);
+ tasklet_kill(&sc->bcon_tasklet);
ath_stop(sc);
if (!(sc->sc_flags & SC_OP_INVALID))
ath9k_hw_setpower(sc->sc_ah, ATH9K_PM_AWAKE);
diff --git a/drivers/net/wireless/ath9k/main.c b/drivers/net/wireless/ath9k/main.c
index aca893a..559e0e8 100644
--- a/drivers/net/wireless/ath9k/main.c
+++ b/drivers/net/wireless/ath9k/main.c
@@ -1781,10 +1781,16 @@ static void ath_pci_remove(struct pci_dev *pdev)
{
struct ieee80211_hw *hw = pci_get_drvdata(pdev);
struct ath_softc *sc = hw->priv;
+ enum ath9k_int status;
- if (pdev->irq)
+ if (pdev->irq) {
+ ath9k_hw_set_interrupts(sc->sc_ah, 0);
+ ath9k_hw_getisr(sc->sc_ah, &status); /* NB: clears ISR too */
+ sc->sc_flags |= SC_OP_INVALID;
free_irq(pdev->irq, sc);
+ }
ath_detach(sc);
+
pci_iounmap(pdev, sc->mem);
pci_release_region(pdev, 0);
pci_disable_device(pdev);
--
1.5.5
^ permalink raw reply related [flat|nested] 62+ messages in thread* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-19 14:28 ` Senthil Balasubramanian
@ 2008-09-19 16:42 ` Steven Noonan
2008-09-19 17:58 ` Senthil Balasubramanian
0 siblings, 1 reply; 62+ messages in thread
From: Steven Noonan @ 2008-09-19 16:42 UTC (permalink / raw)
To: Senthil Balasubramanian
Cc: Luis Rodriguez, Ingo Molnar, ath9k-devel@lists.ath9k.org,
linux-wireless, LKML
On Fri, 19 Sep 2008, Senthil Balasubramanian wrote:
> On Fri, Sep 19, 2008 at 12:59:29PM +0530, Steven Noonan wrote:
> > On Thu, Sep 18, 2008 at 8:01 PM, Luis R. Rodriguez
> > <lrodriguez@atheros.com> wrote:
> > > Thanks for testing, and glad to see this is resolved.
> > >
> >
> > Damn. It's back. I was using wireless normally this evening. Browsing
> > the web, that kind of thing, and then the wireless inexplicably
> > dropped (even with the group rekeying patch), so I unloaded/reloaded
> > the module. This popped up in dmesg:
> >
> > [ 3834.375658] vendor=8086 device=27d2
> > [ 3834.375666] ath9k 0000:03:00.0: PCI INT A disabled
> > [ 3834.375716] ath9k: driver unloaded
> > [ 3838.552419] ath9k: 0.1
> > [ 3838.552502] vendor=8086 device=27d2
> > [ 3838.552511] ath9k 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> > [ 3838.552532] ath9k 0000:03:00.0: setting latency timer to 64
> > [ 3838.688924] phy1: Selected rate control algorithm 'ath9k_rate_control'
> > [ 3838.693652] phy1: Atheros 5416: mem=0xffffc20000060000, irq=17
> > [ 3839.427125] irq 17: nobody cared (try booting with the "irqpoll" option)
> > [ 3839.427136] Pid: 0, comm: swapper Tainted: P
> > 2.6.27-rc6-tip-00478-g74f1a36 #1
> > [ 3839.427141] Call Trace:
> > [ 3839.427145] <IRQ> [<ffffffff802219c5>] ? read_hpet+0x9/0x1c
> > [ 3839.427165] [<ffffffff8026af71>] __report_bad_irq+0x3d/0x8c
> > [ 3839.427172] [<ffffffff8026b0c6>] note_interrupt+0x106/0x160
> > [ 3839.427180] [<ffffffff8026b801>] handle_fasteoi_irq+0xad/0xda
> > [ 3839.427188] [<ffffffff8020f7b0>] do_IRQ+0x10c/0x190
> > [ 3839.427194] [<ffffffff8020c551>] ret_from_intr+0x0/0xa
> > [ 3839.427198] <EOI> [<ffffffff8026c6f0>] rcu_pending+0x62/0x6e
> > [ 3839.427211] [<ffffffff8025bc11>] ? tick_nohz_stop_sched_tick+0x2e4/0x2f3
> > [ 3839.427218] [<ffffffff8020ad94>] cpu_idle+0x7b/0xdb
> > [ 3839.427226] [<ffffffff8060c921>] rest_init+0x75/0x77
> > [ 3839.427231] handlers:
> > [ 3839.427234] [<ffffffffa0240238>] (ath_isr+0x0/0x170 [ath9k])
> > [ 3839.427263] Disabling IRQ #17
> > [ 3842.263699] ADDRCONF(NETDEV_UP): wlan0: link is not ready
> > [ 3848.035003] ADDRCONF(NETDEV_UP): wlan0: link is not ready
> > [ 3848.432701] ADDRCONF(NETDEV_UP): wlan0: link is not ready
> > [ 3850.216947] wlan0: authenticate with AP 00:1e:52:79:4d:01
> > [ 3850.217027] wlan0: authenticate with AP 00:1e:52:79:4d:01
> > [ 3850.228326] wlan0: authenticated
> > [ 3850.228336] wlan0: associate with AP 00:1e:52:79:4d:01
> > [ 3850.428140] wlan0: associate with AP 00:1e:52:79:4d:01
> > [ 3850.628151] wlan0: associate with AP 00:1e:52:79:4d:01
> > [ 3850.728305] wlan0: RX AssocResp from 00:1e:52:79:4d:01 (capab=0x431
> > status=0 aid=1)
> > [ 3850.728314] wlan0: associated
> > [ 3850.728655] wlan0 (WE) : Wireless Event too big (320)
> > [ 3850.743377] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
> > [ 3860.855104] wlan0: no IPv6 routers present
> >
> > I rebuilt the module with DBG_ATH_INTERRUPT, but it somehow stumbled
> > itself back into working order while I was compiling. I can't keep the
> > interrupt debugging on all the time because it's just -too verbose-,
> > and when I pop a debug version of the module in, then it's too late to
> > track the issue....
>
> I am able to reproduce this IRQ nobody cared issue in my setup and the
> following patch seems to be fixing the issue. Please try it out and let
> me know if it solves your issue in your setup.
The patch you prvide doesn't want to apply. What code did you base this
on?
The first change listed doesn't work because there is no tasklet_kill() in
core.c, and the line immediately after ath_stop there has
"!sc->sc_invalid" instead of the "!(sc->sc_flags & SC_OP_INVALID)".
The second fails because SC_OP_INVALID isn't defined.
However, if your patch did apply to my code, I bet it'd solve the issue,
based on what it says it does.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-19 16:42 ` Steven Noonan
@ 2008-09-19 17:58 ` Senthil Balasubramanian
2008-09-19 18:23 ` Luis R. Rodriguez
0 siblings, 1 reply; 62+ messages in thread
From: Senthil Balasubramanian @ 2008-09-19 17:58 UTC (permalink / raw)
To: Steven Noonan
Cc: Senthilkumar Balasubramanian, Luis Rodriguez, Ingo Molnar,
ath9k-devel@lists.ath9k.org, linux-wireless, LKML
On Fri, Sep 19, 2008 at 10:12:04PM +0530, Steven Noonan wrote:
>
>
> On Fri, 19 Sep 2008, Senthil Balasubramanian wrote:
>
> > On Fri, Sep 19, 2008 at 12:59:29PM +0530, Steven Noonan wrote:
> > > On Thu, Sep 18, 2008 at 8:01 PM, Luis R. Rodriguez
> > > <lrodriguez@atheros.com> wrote:
> > > > Thanks for testing, and glad to see this is resolved.
> > > >
> > >
> > > Damn. It's back. I was using wireless normally this evening. Browsing
> > > the web, that kind of thing, and then the wireless inexplicably
> > > dropped (even with the group rekeying patch), so I unloaded/reloaded
> > > the module. This popped up in dmesg:
> > >
> > > [ 3834.375658] vendor=8086 device=27d2
> > > [ 3834.375666] ath9k 0000:03:00.0: PCI INT A disabled
> > > [ 3834.375716] ath9k: driver unloaded
> > > [ 3838.552419] ath9k: 0.1
> > > [ 3838.552502] vendor=8086 device=27d2
> > > [ 3838.552511] ath9k 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> > > [ 3838.552532] ath9k 0000:03:00.0: setting latency timer to 64
> > > [ 3838.688924] phy1: Selected rate control algorithm 'ath9k_rate_control'
> > > [ 3838.693652] phy1: Atheros 5416: mem=0xffffc20000060000, irq=17
> > > [ 3839.427125] irq 17: nobody cared (try booting with the "irqpoll" option)
> > > [ 3839.427136] Pid: 0, comm: swapper Tainted: P
> > > 2.6.27-rc6-tip-00478-g74f1a36 #1
> > > [ 3839.427141] Call Trace:
> > > [ 3839.427145] <IRQ> [<ffffffff802219c5>] ? read_hpet+0x9/0x1c
> > > [ 3839.427165] [<ffffffff8026af71>] __report_bad_irq+0x3d/0x8c
> > > [ 3839.427172] [<ffffffff8026b0c6>] note_interrupt+0x106/0x160
> > > [ 3839.427180] [<ffffffff8026b801>] handle_fasteoi_irq+0xad/0xda
> > > [ 3839.427188] [<ffffffff8020f7b0>] do_IRQ+0x10c/0x190
> > > [ 3839.427194] [<ffffffff8020c551>] ret_from_intr+0x0/0xa
> > > [ 3839.427198] <EOI> [<ffffffff8026c6f0>] rcu_pending+0x62/0x6e
> > > [ 3839.427211] [<ffffffff8025bc11>] ? tick_nohz_stop_sched_tick+0x2e4/0x2f3
> > > [ 3839.427218] [<ffffffff8020ad94>] cpu_idle+0x7b/0xdb
> > > [ 3839.427226] [<ffffffff8060c921>] rest_init+0x75/0x77
> > > [ 3839.427231] handlers:
> > > [ 3839.427234] [<ffffffffa0240238>] (ath_isr+0x0/0x170 [ath9k])
> > > [ 3839.427263] Disabling IRQ #17
> > > [ 3842.263699] ADDRCONF(NETDEV_UP): wlan0: link is not ready
> > > [ 3848.035003] ADDRCONF(NETDEV_UP): wlan0: link is not ready
> > > [ 3848.432701] ADDRCONF(NETDEV_UP): wlan0: link is not ready
> > > [ 3850.216947] wlan0: authenticate with AP 00:1e:52:79:4d:01
> > > [ 3850.217027] wlan0: authenticate with AP 00:1e:52:79:4d:01
> > > [ 3850.228326] wlan0: authenticated
> > > [ 3850.228336] wlan0: associate with AP 00:1e:52:79:4d:01
> > > [ 3850.428140] wlan0: associate with AP 00:1e:52:79:4d:01
> > > [ 3850.628151] wlan0: associate with AP 00:1e:52:79:4d:01
> > > [ 3850.728305] wlan0: RX AssocResp from 00:1e:52:79:4d:01 (capab=0x431
> > > status=0 aid=1)
> > > [ 3850.728314] wlan0: associated
> > > [ 3850.728655] wlan0 (WE) : Wireless Event too big (320)
> > > [ 3850.743377] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
> > > [ 3860.855104] wlan0: no IPv6 routers present
> > >
> > > I rebuilt the module with DBG_ATH_INTERRUPT, but it somehow stumbled
> > > itself back into working order while I was compiling. I can't keep the
> > > interrupt debugging on all the time because it's just -too verbose-,
> > > and when I pop a debug version of the module in, then it's too late to
> > > track the issue....
> >
> > I am able to reproduce this IRQ nobody cared issue in my setup and the
> > following patch seems to be fixing the issue. Please try it out and let
> > me know if it solves your issue in your setup.
>
> The patch you prvide doesn't want to apply. What code did you base this
> on?
>
> The first change listed doesn't work because there is no tasklet_kill() in
> core.c, and the line immediately after ath_stop there has
> "!sc->sc_invalid" instead of the "!(sc->sc_flags & SC_OP_INVALID)".
>
> The second fails because SC_OP_INVALID isn't defined.
>
> However, if your patch did apply to my code, I bet it'd solve the issue,
> based on what it says it does.
I am on 2.6.27-rc6 and this patch is on top of my earlier patch
titled "[PATCH] ath9k: connectivity is lost after Group rekeying is
done". However this patch can be applied on top of latest wireless testing
too. I could apply this patch succesfully on top of wireless testing
git tree.
My git-describe says v2.6.27-rc6-1378-g34e512f.
There is no sc_invalid flag in "struct ath_softc" today. Where did
you get this variable from? It was removed in the following commit
-----------------------------------------------
commit f2c9705a05ecbc0d94216a3b042d5641e8bf70b1
Author: Sujith <Sujith.Manoharan@atheros.com>
Date: Mon Aug 11 14:05:08 2008 +0530
ath9k: Use bitfields for sc operations
Signed-off-by: Sujith Manoharan <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
-----------------------------------------------
Which code base are you using?
^ permalink raw reply [flat|nested] 62+ messages in thread* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-19 17:58 ` Senthil Balasubramanian
@ 2008-09-19 18:23 ` Luis R. Rodriguez
2008-09-19 23:31 ` Steven Noonan
0 siblings, 1 reply; 62+ messages in thread
From: Luis R. Rodriguez @ 2008-09-19 18:23 UTC (permalink / raw)
To: Senthilkumar Balasubramanian
Cc: Steven Noonan, Luis Rodriguez, Ingo Molnar,
ath9k-devel@lists.ath9k.org, linux-wireless, LKML
On Fri, Sep 19, 2008 at 10:58:24AM -0700, Senthilkumar Balasubramanian wrote:
>
> I am on 2.6.27-rc6 and this patch is on top of my earlier patch
> titled "[PATCH] ath9k: connectivity is lost after Group rekeying is
> done". However this patch can be applied on top of latest wireless testing
> too. I could apply this patch succesfully on top of wireless testing
> git tree.
>
> My git-describe says v2.6.27-rc6-1378-g34e512f.
>
> There is no sc_invalid flag in "struct ath_softc" today. Where did
> you get this variable from? It was removed in the following commit
>
> -----------------------------------------------
> commit f2c9705a05ecbc0d94216a3b042d5641e8bf70b1
> Author: Sujith <Sujith.Manoharan@atheros.com>
> Date: Mon Aug 11 14:05:08 2008 +0530
>
> ath9k: Use bitfields for sc operations
>
> Signed-off-by: Sujith Manoharan <Sujith.Manoharan@atheros.com>
> Signed-off-by: John W. Linville <linville@tuxdriver.com>
> -----------------------------------------------
>
> Which code base are you using?
Thanks for the patch Senthil. He is on Linus' v2.6.27-rc, you are on
wireless-testing so I've ported it (no changes, it just applies cleanly
now). This is also untested by me:
diff --git a/drivers/net/wireless/ath9k/core.c b/drivers/net/wireless/ath9k/core.c
index f6c4528..9a432e5 100644
--- a/drivers/net/wireless/ath9k/core.c
+++ b/drivers/net/wireless/ath9k/core.c
@@ -1336,6 +1336,8 @@ void ath_deinit(struct ath_softc *sc)
DPRINTF(sc, ATH_DBG_CONFIG, "%s\n", __func__);
+ tasklet_kill(&sc->intr_tq);
+ tasklet_kill(&sc->bcon_tasklet);
ath_stop(sc);
if (!sc->sc_invalid)
ath9k_hw_setpower(sc->sc_ah, ATH9K_PM_AWAKE);
diff --git a/drivers/net/wireless/ath9k/main.c b/drivers/net/wireless/ath9k/main.c
index c5107f2..75acf54 100644
--- a/drivers/net/wireless/ath9k/main.c
+++ b/drivers/net/wireless/ath9k/main.c
@@ -1400,10 +1400,16 @@ static void ath_pci_remove(struct pci_dev *pdev)
{
struct ieee80211_hw *hw = pci_get_drvdata(pdev);
struct ath_softc *sc = hw->priv;
+ enum ath9k_int status;
- if (pdev->irq)
+ if (pdev->irq) {
+ ath9k_hw_set_interrupts(sc->sc_ah, 0);
+ ath9k_hw_getisr(sc->sc_ah, &status); /* NB: clears ISR too */
+ sc->sc_flags |= SC_OP_INVALID;
free_irq(pdev->irq, sc);
+ }
ath_detach(sc);
+
pci_iounmap(pdev, sc->mem);
pci_release_region(pdev, 0);
pci_disable_device(pdev);
^ permalink raw reply related [flat|nested] 62+ messages in thread* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-19 18:23 ` Luis R. Rodriguez
@ 2008-09-19 23:31 ` Steven Noonan
2008-09-19 23:53 ` Luis R. Rodriguez
0 siblings, 1 reply; 62+ messages in thread
From: Steven Noonan @ 2008-09-19 23:31 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Senthilkumar Balasubramanian, Luis Rodriguez, Ingo Molnar,
ath9k-devel@lists.ath9k.org, linux-wireless, LKML
On Fri, 19 Sep 2008, Luis R. Rodriguez wrote:
> Thanks for the patch Senthil. He is on Linus' v2.6.27-rc, you are on
> wireless-testing so I've ported it (no changes, it just applies cleanly
> now). This is also untested by me:
I'm seeing the same latency issue I reported originally with the latest
-tip kernel and WITHOUT irqpoll.
Apparently there's something else to this...
- Steven
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-19 23:31 ` Steven Noonan
@ 2008-09-19 23:53 ` Luis R. Rodriguez
2008-09-20 0:01 ` Steven Noonan
0 siblings, 1 reply; 62+ messages in thread
From: Luis R. Rodriguez @ 2008-09-19 23:53 UTC (permalink / raw)
To: Steven Noonan
Cc: Luis Rodriguez, Senthilkumar Balasubramanian, Ingo Molnar,
ath9k-devel@lists.ath9k.org, linux-wireless, LKML
On Fri, Sep 19, 2008 at 04:31:20PM -0700, Steven Noonan wrote:
>
> On Fri, 19 Sep 2008, Luis R. Rodriguez wrote:
> > Thanks for the patch Senthil. He is on Linus' v2.6.27-rc, you are on
> > wireless-testing so I've ported it (no changes, it just applies cleanly
> > now). This is also untested by me:
>
> I'm seeing the same latency issue I reported originally with the latest
> -tip kernel and WITHOUT irqpoll.
Damn it.
> Apparently there's something else to this...
Steven,
Can you reproduce this? If so can you tell us how? If you can reproduce
can you try to test this with wireless-testing as well? That is still
based on 2.6.26-rc6 but its just has all the new wireless development
stuff queued up for 2.6.28 so far. Everything else is the same.
Luis
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-19 23:53 ` Luis R. Rodriguez
@ 2008-09-20 0:01 ` Steven Noonan
2008-09-20 1:13 ` Luis R. Rodriguez
0 siblings, 1 reply; 62+ messages in thread
From: Steven Noonan @ 2008-09-20 0:01 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Luis Rodriguez, Senthilkumar Balasubramanian, Ingo Molnar,
ath9k-devel@lists.ath9k.org, linux-wireless, LKML
On Fri, Sep 19, 2008 at 4:53 PM, Luis R. Rodriguez
<lrodriguez@atheros.com> wrote:
> On Fri, Sep 19, 2008 at 04:31:20PM -0700, Steven Noonan wrote:
>>
>> I'm seeing the same latency issue I reported originally with the latest
>> -tip kernel and WITHOUT irqpoll.
>
> Damn it.
>
>> Apparently there's something else to this...
>
> Steven,
>
> Can you reproduce this? If so can you tell us how? If you can reproduce
> can you try to test this with wireless-testing as well? That is still
> based on 2.6.26-rc6 but its just has all the new wireless development
> stuff queued up for 2.6.28 so far. Everything else is the same.
It's a weird problem, because I'm able to reproduce it by running a
kernel with ath9k loaded for a few hours. In this particular case, I
only ran this kernel for about 3.5 hours , but the delay when VT
switching was getting noticeable. Unload/reload of ath9k and the
latency disappeared again. It's definitely ath9k's doing, but I don't
know how. I'm finding the delay is most noticeable after 7 hours
uptime with ath9k loaded.
I'll pull wireless-testing and run that kernel for another 8 hours.
What's the URL for the repo? Surely you don't mean the
wireless-testing on git.kernel.org that hasn't been touched for almost
a week?
I kind of wish there was a better debugging facility in ath9k. With
ATH_DBG_INTERRUPTS, I see several hundred messages per second, which
fills log files up like crazy (I switched to ftrace_printk as Ingo
suggested, which has alleviated this somewhat). Is there a way to make
it less verbose (perhaps so that it prints data at a rate of no more
than 2 or 3 messages per second)? Or maybe the rate at which it prints
this stuff is indicative something's wrong?
- Steven
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-20 0:01 ` Steven Noonan
@ 2008-09-20 1:13 ` Luis R. Rodriguez
2008-09-20 1:15 ` Luis R. Rodriguez
2008-09-20 3:29 ` Steven Noonan
0 siblings, 2 replies; 62+ messages in thread
From: Luis R. Rodriguez @ 2008-09-20 1:13 UTC (permalink / raw)
To: Steven Noonan
Cc: linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Luis Rodriguez, Ingo Molnar
On Fri, Sep 19, 2008 at 5:01 PM, Steven Noonan <steven@uplinklabs.net> wrote:
> I'll pull wireless-testing and run that kernel for another 8 hours.
> What's the URL for the repo? Surely you don't mean the
> wireless-testing on git.kernel.org that hasn't been touched for almost
> a week?
Yes, that's the guy. This week was the Kernel Summit and the Linux
Plumbers Conference too so there is understandably some lag in pulls
as developers are getting drunk.
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-testing.git
> I kind of wish there was a better debugging facility in ath9k.
Patches are welcomed ;)
Also, we have a few interrupt levels you can use. In this case we are
debugging interrupts.
> With
> ATH_DBG_INTERRUPTS, I see several hundred messages per second, which
> fills log files up like crazy (I switched to ftrace_printk as Ingo
> suggested, which has alleviated this somewhat). Is there a way to make
> it less verbose (perhaps so that it prints data at a rate of no more
> than 2 or 3 messages per second)? Or maybe the rate at which it prints
> this stuff is indicative something's wrong?
No, interrupts do happen quite often so that is what you get, perhaps
we can enhance this by not printing on some interrupts but the point
is to debug interrupts after all. To help us debug in userspace in a
nicer way we can use sadc and sar. So you can try this:
On a window leave this running:
/usr/lib/sysstat/sadc -I 1 irqlog.sar
And when you are seeing your latency issues, run this:
sar -I 17 -f irqlog.sar | less
That is capital "i", this will show you the monitor results of number
of interrupts on IRQ 17. You want to see if the rate of change is huge
when you have latency issues. If not then its most likely an interrupt
storm by ath9k (which is my suspect right now) but something else.
Luis
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-20 1:13 ` Luis R. Rodriguez
@ 2008-09-20 1:15 ` Luis R. Rodriguez
2008-09-20 3:29 ` Steven Noonan
1 sibling, 0 replies; 62+ messages in thread
From: Luis R. Rodriguez @ 2008-09-20 1:15 UTC (permalink / raw)
To: Steven Noonan
Cc: linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Luis Rodriguez, Ingo Molnar
On Fri, Sep 19, 2008 at 6:13 PM, Luis R. Rodriguez
<lrodriguez@atheros.com> wrote:
> That is capital "i", this will show you the monitor results of number
> of interrupts on IRQ 17. You want to see if the rate of change is huge
> when you have latency issues. If not then its most likely
I meant most likely *NOT* an interrupt storm.
> an interrupt
> storm by ath9k (which is my suspect right now) but something else.
>
> Luis
>
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-20 1:13 ` Luis R. Rodriguez
2008-09-20 1:15 ` Luis R. Rodriguez
@ 2008-09-20 3:29 ` Steven Noonan
2008-09-20 3:40 ` Steven Noonan
2008-09-20 15:21 ` Steven Noonan
1 sibling, 2 replies; 62+ messages in thread
From: Steven Noonan @ 2008-09-20 3:29 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Luis Rodriguez, Ingo Molnar
On Fri, Sep 19, 2008 at 6:13 PM, Luis R. Rodriguez
<lrodriguez@atheros.com> wrote:
> On Fri, Sep 19, 2008 at 5:01 PM, Steven Noonan <steven@uplinklabs.net> wrote:
>> I'll pull wireless-testing and run that kernel for another 8 hours.
>> What's the URL for the repo? Surely you don't mean the
>> wireless-testing on git.kernel.org that hasn't been touched for almost
>> a week?
>
> Yes, that's the guy. This week was the Kernel Summit and the Linux
> Plumbers Conference too so there is understandably some lag in pulls
> as developers are getting drunk.
>
> git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-testing.git
>
>> I kind of wish there was a better debugging facility in ath9k.
>
> Patches are welcomed ;)
>
> Also, we have a few interrupt levels you can use. In this case we are
> debugging interrupts.
>
>> With
>> ATH_DBG_INTERRUPTS, I see several hundred messages per second, which
>> fills log files up like crazy (I switched to ftrace_printk as Ingo
>> suggested, which has alleviated this somewhat). Is there a way to make
>> it less verbose (perhaps so that it prints data at a rate of no more
>> than 2 or 3 messages per second)? Or maybe the rate at which it prints
>> this stuff is indicative something's wrong?
>
> No, interrupts do happen quite often so that is what you get, perhaps
> we can enhance this by not printing on some interrupts but the point
> is to debug interrupts after all. To help us debug in userspace in a
> nicer way we can use sadc and sar. So you can try this:
>
> On a window leave this running:
>
> /usr/lib/sysstat/sadc -I 1 irqlog.sar
>
> And when you are seeing your latency issues, run this:
>
> sar -I 17 -f irqlog.sar | less
Thanks for informing me about this useful utility. I wasn't aware of
its existence.
However, this profiler seems broken. When I check individual or 'ALL'
statistics, they're all 0.00 IRQs/sec, but SUM shows the actual number
of total IRQs per second. I suspect it's only broken because of some
kernel config option not being enabled.
Anyway, there definitely is an interrupt storm here. Let me demonstrate...
This is when the system is operating normally:
[...]
19:14:39 sum 1271.00
19:14:40 sum 1286.14
19:14:41 sum 1373.00
19:14:42 sum 1691.00
19:14:43 sum 1380.00
19:14:44 sum 902.00
19:14:45 sum 772.00
19:14:46 sum 1155.00
19:14:47 sum 804.00
19:14:48 sum 809.00
19:14:49 sum 965.00
19:14:50 sum 1077.23
Average: sum 615.69
Here's while the latency issue is occurring, which can now safely be
called an interrupt storm:
[...]
20:21:07 sum 18864.08
20:21:08 sum 21152.73
20:21:09 sum 19221.35
20:21:10 sum 20786.14
20:21:11 sum 19891.92
20:21:12 sum 19816.16
20:21:13 sum 19430.69
20:21:14 sum 20432.00
20:21:15 sum 19657.84
20:21:16 sum 19386.00
20:21:17 sum 19386.79
20:21:18 sum 20152.94
20:21:19 sum 19077.88
20:21:20 sum 19852.00
Average: sum 19958.91
And here's the change as I unload and reload ath9k:
[...]
20:21:38 sum 20138.83
20:21:40 sum 21576.09
20:21:41 sum 17678.38
20:21:42 sum 19834.34
20:21:43 sum 19845.10
20:21:44 sum 19470.00
20:21:45 sum 9273.00 <--- ath9k unloaded/reloaded
20:21:46 sum 187.00
20:21:47 sum 1066.00
20:21:48 sum 610.00
20:21:49 sum 106.93
20:21:50 sum 670.00
20:21:51 sum 742.00
20:21:52 sum 781.00
Average: sum 19562.56
- Steven
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-20 3:29 ` Steven Noonan
@ 2008-09-20 3:40 ` Steven Noonan
2008-09-20 20:56 ` Luis R. Rodriguez
2008-09-20 15:21 ` Steven Noonan
1 sibling, 1 reply; 62+ messages in thread
From: Steven Noonan @ 2008-09-20 3:40 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Luis Rodriguez, Ingo Molnar
On Fri, Sep 19, 2008 at 8:29 PM, Steven Noonan <steven@uplinklabs.net> wrote:
> Anyway, there definitely is an interrupt storm here. Let me demonstrate...
Worth noting that this change (modified version of the patch by
Senthil from earlier) made it so it didn't give an 'IRQ 17: nobody
cared' when I reloaded ath9k during the IRQ storm...
---
ath9k: flush pending IRQs before freeing IRQ
Signed-off-by: Steven Noonan <steven@uplinklabs.net>
diff --git a/drivers/net/wireless/ath9k/main.c
b/drivers/net/wireless/ath9k/main.c
index 0997196..591eec1 100644
--- a/drivers/net/wireless/ath9k/main.c
+++ b/drivers/net/wireless/ath9k/main.c
@@ -1411,10 +1411,17 @@ static void ath_pci_remove(struct pci_dev *pdev)
{
struct ieee80211_hw *hw = pci_get_drvdata(pdev);
struct ath_softc *sc = hw->priv;
+ enum ath9k_int status;
if (pdev->irq)
+ {
+ ath9k_hw_set_interrupts(sc->sc_ah, 0);
+ ath9k_hw_getisr(sc->sc_ah, &status); /* NB: clears ISR too */
+ sc->sc_invalid = 1;
free_irq(pdev->irq, sc);
+ }
ath_detach(sc);
+
pci_iounmap(pdev, sc->mem);
pci_release_region(pdev, 0);
pci_disable_device(pdev);
^ permalink raw reply related [flat|nested] 62+ messages in thread* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-20 3:40 ` Steven Noonan
@ 2008-09-20 20:56 ` Luis R. Rodriguez
0 siblings, 0 replies; 62+ messages in thread
From: Luis R. Rodriguez @ 2008-09-20 20:56 UTC (permalink / raw)
To: Steven Noonan
Cc: Luis Rodriguez, linux-wireless, LKML,
Senthilkumar Balasubramanian, ath9k-devel@lists.ath9k.org,
Ingo Molnar, John W. Linville
On Fri, Sep 19, 2008 at 08:40:04PM -0700, Steven Noonan wrote:
> On Fri, Sep 19, 2008 at 8:29 PM, Steven Noonan <steven@uplinklabs.net> wrote:
> > Anyway, there definitely is an interrupt storm here. Let me demonstrate...
>
> Worth noting that this change (modified version of the patch by
> Senthil from earlier) made it so it didn't give an 'IRQ 17: nobody
> cared' when I reloaded ath9k during the IRQ storm...
Thanks, for letting us know -- John, seems that IRQ patch by Senthil is
27 material as it fixes an IRQ issue (nobody cared complaint, and it
disables the IRQ otherwise) upon rmmod during an IRQ storm.
Luis
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-20 3:29 ` Steven Noonan
2008-09-20 3:40 ` Steven Noonan
@ 2008-09-20 15:21 ` Steven Noonan
2008-09-20 20:57 ` Luis R. Rodriguez
1 sibling, 1 reply; 62+ messages in thread
From: Steven Noonan @ 2008-09-20 15:21 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Luis Rodriguez, Ingo Molnar
On Fri, Sep 19, 2008 at 8:29 PM, Steven Noonan <steven@uplinklabs.net> wrote:
> And here's the change as I unload and reload ath9k:
>
> [...]
> 20:21:38 sum 20138.83
> 20:21:40 sum 21576.09
> 20:21:41 sum 17678.38
> 20:21:42 sum 19834.34
> 20:21:43 sum 19845.10
> 20:21:44 sum 19470.00
> 20:21:45 sum 9273.00 <--- ath9k unloaded/reloaded
> 20:21:46 sum 187.00
> 20:21:47 sum 1066.00
> 20:21:48 sum 610.00
> 20:21:49 sum 106.93
> 20:21:50 sum 670.00
> 20:21:51 sum 742.00
> 20:21:52 sum 781.00
> Average: sum 19562.56
>
I think it might help if you can see the .sar file yourself. I've
uploaded a gzipped copy (great compression ratio in this case for
merely using gzip!).
http://www.uplinklabs.net/~tycho/linux/irqlog-storm.sar.gz
- Steven
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-20 15:21 ` Steven Noonan
@ 2008-09-20 20:57 ` Luis R. Rodriguez
2008-09-20 20:58 ` Johannes Berg
2008-09-20 21:18 ` Steven Noonan
0 siblings, 2 replies; 62+ messages in thread
From: Luis R. Rodriguez @ 2008-09-20 20:57 UTC (permalink / raw)
To: Steven Noonan
Cc: linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Luis Rodriguez, Ingo Molnar
On Sat, Sep 20, 2008 at 8:21 AM, Steven Noonan <steven@uplinklabs.net> wrote:
> On Fri, Sep 19, 2008 at 8:29 PM, Steven Noonan <steven@uplinklabs.net> wrote:
>> And here's the change as I unload and reload ath9k:
>>
>> [...]
>> 20:21:38 sum 20138.83
>> 20:21:40 sum 21576.09
>> 20:21:41 sum 17678.38
>> 20:21:42 sum 19834.34
>> 20:21:43 sum 19845.10
>> 20:21:44 sum 19470.00
>> 20:21:45 sum 9273.00 <--- ath9k unloaded/reloaded
>> 20:21:46 sum 187.00
>> 20:21:47 sum 1066.00
>> 20:21:48 sum 610.00
>> 20:21:49 sum 106.93
>> 20:21:50 sum 670.00
>> 20:21:51 sum 742.00
>> 20:21:52 sum 781.00
>> Average: sum 19562.56
>>
>
> I think it might help if you can see the .sar file yourself. I've
> uploaded a gzipped copy (great compression ratio in this case for
> merely using gzip!).
>
> http://www.uplinklabs.net/~tycho/linux/irqlog-storm.sar.gz
mcgrof@tesla ~/tmp/irq $ sar -f irqlog-storm.sar -A
Invalid system activity file: irqlog-storm.sar
Perhaps we its a different format than what my sar can read...
So here's what we need to do -- try to zero on in what causes this IRQ
storm. Most likely its some strange event that triggers it, perhaps
some sort of noise which makes the device try to adapt and hits a path
we are not handling correctly yet. Can you provide the initial log of
when ath9k gets loaded? This tells us what card it is, like AR5416 or
AR9280, etc.
Luis
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-20 20:57 ` Luis R. Rodriguez
@ 2008-09-20 20:58 ` Johannes Berg
2008-09-20 21:03 ` Luis R. Rodriguez
2008-09-20 21:18 ` Steven Noonan
1 sibling, 1 reply; 62+ messages in thread
From: Johannes Berg @ 2008-09-20 20:58 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Steven Noonan, linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Luis Rodriguez, Ingo Molnar
[-- Attachment #1: Type: text/plain, Size: 317 bytes --]
On Sat, 2008-09-20 at 13:57 -0700, Luis R. Rodriguez wrote:
> mcgrof@tesla ~/tmp/irq $ sar -f irqlog-storm.sar -A
> Invalid system activity file: irqlog-storm.sar
>
> Perhaps we its a different format than what my sar can read...
It changes like every release... Not a very useful thing, that.
johannes
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-20 20:58 ` Johannes Berg
@ 2008-09-20 21:03 ` Luis R. Rodriguez
2008-09-20 21:22 ` Steven Noonan
0 siblings, 1 reply; 62+ messages in thread
From: Luis R. Rodriguez @ 2008-09-20 21:03 UTC (permalink / raw)
To: Johannes Berg
Cc: linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Luis Rodriguez, Ingo Molnar
2008/9/20 Johannes Berg <johannes@sipsolutions.net>:
> On Sat, 2008-09-20 at 13:57 -0700, Luis R. Rodriguez wrote:
>
>> mcgrof@tesla ~/tmp/irq $ sar -f irqlog-storm.sar -A
>> Invalid system activity file: irqlog-storm.sar
>>
>> Perhaps we its a different format than what my sar can read...
>
> It changes like every release... Not a very useful thing, that.
Ah -- oh well.
OK so yeah you have an interrupt storm, this is evident.
Do you think there is a way you can try to trigger the interrupt storm
so it happens < than after 7 hours or however long its taking you? If
so then it'd be good to get your debug out using the interrupt debug.
If that's not useful then we can enable all debug stuff (file will be
huge).
Try turning on the microwave, turn on all your bluetooth devices, use
your 2.4 GHz wireless phone, use your xbox 360 controllers and ps3
controllers all at once :D
Luis
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-20 21:03 ` Luis R. Rodriguez
@ 2008-09-20 21:22 ` Steven Noonan
2008-09-20 21:26 ` Luis R. Rodriguez
0 siblings, 1 reply; 62+ messages in thread
From: Steven Noonan @ 2008-09-20 21:22 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Johannes Berg, linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Luis Rodriguez, Ingo Molnar
On Sat, Sep 20, 2008 at 2:03 PM, Luis R. Rodriguez
<lrodriguez@atheros.com> wrote:
>
> OK so yeah you have an interrupt storm, this is evident.
Indeedy.
>
> Do you think there is a way you can try to trigger the interrupt storm
> so it happens < than after 7 hours or however long its taking you? If
> so then it'd be good to get your debug out using the interrupt debug.
> If that's not useful then we can enable all debug stuff (file will be
> huge).
It's apparently 7 idle hours and ~1-2 hours when I'm actively browsing
the web. I don't know what in particular triggers it, so I guess I'd
just need to enable debug logging and then start running through all
my webcomics, or perhaps perusing Wikipedia.
> Try turning on the microwave, turn on all your bluetooth devices, use
> your 2.4 GHz wireless phone, use your xbox 360 controllers and ps3
> controllers all at once :D
Heh. I don't have an Xbox 360 or PS3 (sad, I know). Do have several
2.4GHz phones. ;)
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-20 21:22 ` Steven Noonan
@ 2008-09-20 21:26 ` Luis R. Rodriguez
2008-09-20 21:48 ` Steven Noonan
0 siblings, 1 reply; 62+ messages in thread
From: Luis R. Rodriguez @ 2008-09-20 21:26 UTC (permalink / raw)
To: Steven Noonan
Cc: linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Luis Rodriguez, Johannes Berg,
Ingo Molnar
On Sat, Sep 20, 2008 at 2:22 PM, Steven Noonan <steven@uplinklabs.net> wrote:
> Heh. I don't have an Xbox 360 or PS3 (sad, I know). Do have several
> 2.4GHz phones. ;)
Microwaves are the best to make the signal wet poop BTW, if you have
that option available. Not sure if we can trigger the device that way
but I'm just brainstorming here.
Luis
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-20 21:26 ` Luis R. Rodriguez
@ 2008-09-20 21:48 ` Steven Noonan
2008-09-21 1:48 ` Steven Noonan
0 siblings, 1 reply; 62+ messages in thread
From: Steven Noonan @ 2008-09-20 21:48 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Luis Rodriguez, Johannes Berg,
Ingo Molnar
On Sat, Sep 20, 2008 at 2:26 PM, Luis R. Rodriguez
<lrodriguez@atheros.com> wrote:
> On Sat, Sep 20, 2008 at 2:22 PM, Steven Noonan <steven@uplinklabs.net> wrote:
>
>> Heh. I don't have an Xbox 360 or PS3 (sad, I know). Do have several
>> 2.4GHz phones. ;)
>
> Microwaves are the best to make the signal wet poop BTW, if you have
> that option available. Not sure if we can trigger the device that way
> but I'm just brainstorming here.
>
Well, I've got the thing set to ATH_DBG_ANY (I want this damn bug
found!), which will certainly be *interesting*. In 5 minutes, the log
has jumped 9 megabytes.
I've also got sadc monitoring IRQs so I know for certain when the
interrupt storm starts.
I'll let you know how this goes.
- Steven
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-20 21:48 ` Steven Noonan
@ 2008-09-21 1:48 ` Steven Noonan
2008-09-22 6:54 ` Luis R. Rodriguez
0 siblings, 1 reply; 62+ messages in thread
From: Steven Noonan @ 2008-09-21 1:48 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Luis Rodriguez, Johannes Berg,
Ingo Molnar
On Sat, Sep 20, 2008 at 2:48 PM, Steven Noonan <steven@uplinklabs.net> wrote:
> On Sat, Sep 20, 2008 at 2:26 PM, Luis R. Rodriguez
> <lrodriguez@atheros.com> wrote:
>> On Sat, Sep 20, 2008 at 2:22 PM, Steven Noonan <steven@uplinklabs.net> wrote:
>>
>>> Heh. I don't have an Xbox 360 or PS3 (sad, I know). Do have several
>>> 2.4GHz phones. ;)
>>
>> Microwaves are the best to make the signal wet poop BTW, if you have
>> that option available. Not sure if we can trigger the device that way
>> but I'm just brainstorming here.
>>
>
> Well, I've got the thing set to ATH_DBG_ANY (I want this damn bug
> found!), which will certainly be *interesting*. In 5 minutes, the log
> has jumped 9 megabytes.
>
> I've also got sadc monitoring IRQs so I know for certain when the
> interrupt storm starts.
>
> I'll let you know how this goes.
OK, here's the time index at which it b0rked:
18:28:38 sum 1106.00
18:28:39 sum 1037.62
18:28:40 sum 1069.00
18:28:41 sum 1167.00
18:28:42 sum 1155.00
18:28:43 sum 1339.00
18:28:44 sum 18355.00
18:28:45 sum 17845.45
18:28:46 sum 15285.00
18:28:47 sum 17511.00
18:28:48 sum 17568.69
18:28:49 sum 17704.04
18:28:50 sum 18566.67
18:28:51 sum 18913.13
And the entire log of the minute 18:28 is here:
http://www.uplinklabs.net/~tycho/linux/18.28.00-59.log.gz
It looks pretty crazy.
- Steven
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-21 1:48 ` Steven Noonan
@ 2008-09-22 6:54 ` Luis R. Rodriguez
2008-09-22 7:14 ` Luis R. Rodriguez
2008-09-22 7:26 ` Steven Noonan
0 siblings, 2 replies; 62+ messages in thread
From: Luis R. Rodriguez @ 2008-09-22 6:54 UTC (permalink / raw)
To: Steven Noonan
Cc: Luis Rodriguez, linux-wireless, LKML,
Senthilkumar Balasubramanian, ath9k-devel@lists.ath9k.org,
Johannes Berg, Ingo Molnar
On Sat, Sep 20, 2008 at 06:48:05PM -0700, Steven Noonan wrote:
>
> OK, here's the time index at which it b0rked:
>
> 18:28:42 sum 1155.00
> 18:28:43 sum 1339.00
> 18:28:44 sum 18355.00
> 18:28:45 sum 17845.45
>
> And the entire log of the minute 18:28 is here:
> http://www.uplinklabs.net/~tycho/linux/18.28.00-59.log.gz
>
> It looks pretty crazy.
Thanks, I'll take a look when I get a chance (remember aggretion
is in the queue too) unless someone beats me to it. Asking in advance
-- if you can also provide a log of minutes 43, 44 and 45 that would
be great, but this may just be enough.
Luis
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-22 6:54 ` Luis R. Rodriguez
@ 2008-09-22 7:14 ` Luis R. Rodriguez
2008-09-22 7:26 ` Steven Noonan
1 sibling, 0 replies; 62+ messages in thread
From: Luis R. Rodriguez @ 2008-09-22 7:14 UTC (permalink / raw)
To: Luis Rodriguez
Cc: Steven Noonan, linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Johannes Berg, Ingo Molnar
On Sun, Sep 21, 2008 at 11:54:53PM -0700, Luis Rodriguez wrote:
> On Sat, Sep 20, 2008 at 06:48:05PM -0700, Steven Noonan wrote:
> >
> > OK, here's the time index at which it b0rked:
> >
> > 18:28:42 sum 1155.00
> > 18:28:43 sum 1339.00
> > 18:28:44 sum 18355.00
> > 18:28:45 sum 17845.45
> >
> > And the entire log of the minute 18:28 is here:
> > http://www.uplinklabs.net/~tycho/linux/18.28.00-59.log.gz
> >
> > It looks pretty crazy.
>
> Thanks, I'll take a look when I get a chance (remember aggretion
> is in the queue too) unless someone beats me to it. Asking in advance
> -- if you can also provide a log of minutes 43, 44 and 45 that would
> be great, but this may just be enough.
Oh wait, those are seconds, nevermind :)
Luis
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-22 6:54 ` Luis R. Rodriguez
2008-09-22 7:14 ` Luis R. Rodriguez
@ 2008-09-22 7:26 ` Steven Noonan
2008-09-22 8:06 ` Luis R. Rodriguez
1 sibling, 1 reply; 62+ messages in thread
From: Steven Noonan @ 2008-09-22 7:26 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Luis Rodriguez, linux-wireless, LKML,
Senthilkumar Balasubramanian, ath9k-devel@lists.ath9k.org,
Johannes Berg, Ingo Molnar
On Sun, Sep 21, 2008 at 11:54 PM, Luis R. Rodriguez
<lrodriguez@atheros.com> wrote:
> On Sat, Sep 20, 2008 at 06:48:05PM -0700, Steven Noonan wrote:
>>
>> OK, here's the time index at which it b0rked:
>>
>> 18:28:42 sum 1155.00
>> 18:28:43 sum 1339.00
>> 18:28:44 sum 18355.00
>> 18:28:45 sum 17845.45
>>
>> And the entire log of the minute 18:28 is here:
>> http://www.uplinklabs.net/~tycho/linux/18.28.00-59.log.gz
>>
>> It looks pretty crazy.
>
> Thanks, I'll take a look when I get a chance (remember aggretion
> is in the queue too) unless someone beats me to it.
>
You may want to take a quick look at it now. I certainly don't mean
you should push aggregation aside, but the log looks kind of screwy at
18:28:43 (look at line 4972 to see what I mean). It looks as though it
did printk()s from multiple threads with no locking mechanism. It's
possible I will need to re-capture the error, and I'd like to know if
this log will help or not.
- Steven
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-22 7:26 ` Steven Noonan
@ 2008-09-22 8:06 ` Luis R. Rodriguez
2008-09-22 15:11 ` Steven Noonan
0 siblings, 1 reply; 62+ messages in thread
From: Luis R. Rodriguez @ 2008-09-22 8:06 UTC (permalink / raw)
To: Steven Noonan
Cc: Luis Rodriguez, linux-wireless, LKML,
Senthilkumar Balasubramanian, ath9k-devel@lists.ath9k.org,
Johannes Berg, Ingo Molnar
On Mon, Sep 22, 2008 at 12:26:02AM -0700, Steven Noonan wrote:
> On Sun, Sep 21, 2008 at 11:54 PM, Luis R. Rodriguez
> <lrodriguez@atheros.com> wrote:
> > On Sat, Sep 20, 2008 at 06:48:05PM -0700, Steven Noonan wrote:
> >>
> >> OK, here's the time index at which it b0rked:
> >>
> >> 18:28:42 sum 1155.00
> >> 18:28:43 sum 1339.00
> >> 18:28:44 sum 18355.00
> >> 18:28:45 sum 17845.45
> >>
> >> And the entire log of the minute 18:28 is here:
> >> http://www.uplinklabs.net/~tycho/linux/18.28.00-59.log.gz
> >>
> >> It looks pretty crazy.
> >
> > Thanks, I'll take a look when I get a chance (remember aggretion
> > is in the queue too) unless someone beats me to it.
> >
>
> You may want to take a quick look at it now.
Its Sunday night and I am looking at it :)
> I certainly don't mean
> you should push aggregation aside,
Trust me I want to get this resolved too ;)
> but the log looks kind of screwy at
> 18:28:43 (look at line 4972 to see what I mean). It looks as though it
> did printk()s from multiple threads with no locking mechanism. It's
> possible I will need to re-capture the error, and I'd like to know if
> this log will help or not.
At first glance I see tons of MIB interrupts being spewed, besides the
ackward prints you see too my main concern so far is the MIB interrupts
as we don't handle ANI. My initial thought is because don't hanld ANI
yet we are letting MIB interrupts to continue to spew out.
Luis
Please test the following patch.
diff --git a/drivers/net/wireless/ath9k/core.c b/drivers/net/wireless/ath9k/core.c
index c262ef2..9a51739 100644
--- a/drivers/net/wireless/ath9k/core.c
+++ b/drivers/net/wireless/ath9k/core.c
@@ -681,10 +681,12 @@ int ath_open(struct ath_softc *sc, struct ath9k_channel *initial_chan)
* Enable MIB interrupts when there are hardware phy counters.
* Note we only do this (at the moment) for station mode.
*/
+#if 0
if (ath9k_hw_phycounters(ah) &&
((sc->sc_ah->ah_opmode == ATH9K_M_STA) ||
(sc->sc_ah->ah_opmode == ATH9K_M_IBSS)))
sc->sc_imask |= ATH9K_INT_MIB;
+#endif
/*
* Some hardware processes the TIM IE and fires an
* interrupt when the TIM bit is set. For hardware
^ permalink raw reply related [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-22 8:06 ` Luis R. Rodriguez
@ 2008-09-22 15:11 ` Steven Noonan
2008-09-22 16:01 ` Luis R. Rodriguez
0 siblings, 1 reply; 62+ messages in thread
From: Steven Noonan @ 2008-09-22 15:11 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Luis Rodriguez, linux-wireless, LKML,
Senthilkumar Balasubramanian, ath9k-devel@lists.ath9k.org,
Johannes Berg, Ingo Molnar
On Mon, Sep 22, 2008 at 1:06 AM, Luis R. Rodriguez
<lrodriguez@atheros.com> wrote:
> Please test the following patch.
>
> diff --git a/drivers/net/wireless/ath9k/core.c b/drivers/net/wireless/ath9k/core.c
> index c262ef2..9a51739 100644
> --- a/drivers/net/wireless/ath9k/core.c
> +++ b/drivers/net/wireless/ath9k/core.c
> @@ -681,10 +681,12 @@ int ath_open(struct ath_softc *sc, struct ath9k_channel *initial_chan)
> * Enable MIB interrupts when there are hardware phy counters.
> * Note we only do this (at the moment) for station mode.
> */
> +#if 0
> if (ath9k_hw_phycounters(ah) &&
> ((sc->sc_ah->ah_opmode == ATH9K_M_STA) ||
> (sc->sc_ah->ah_opmode == ATH9K_M_IBSS)))
> sc->sc_imask |= ATH9K_INT_MIB;
> +#endif
> /*
> * Some hardware processes the TIM IE and fires an
> * interrupt when the TIM bit is set. For hardware
>
It didn't apply to -rc7, but I managed to apply it manually
(apparently you made this for wireless-testing?) and added the #if 0.
After 7 hours uptime with the driver, no apparent interrupt storm.
I'll do more thorough testing later, but I have a 3 hour drive today,
and some packing to do, so it'll need to wait a bit.
- Steven
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-22 15:11 ` Steven Noonan
@ 2008-09-22 16:01 ` Luis R. Rodriguez
2008-09-23 7:22 ` Steven Noonan
0 siblings, 1 reply; 62+ messages in thread
From: Luis R. Rodriguez @ 2008-09-22 16:01 UTC (permalink / raw)
To: Steven Noonan
Cc: linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Luis Rodriguez, Johannes Berg,
Ingo Molnar
On Mon, Sep 22, 2008 at 8:11 AM, Steven Noonan <steven@uplinklabs.net> wrote:
> On Mon, Sep 22, 2008 at 1:06 AM, Luis R. Rodriguez
> <lrodriguez@atheros.com> wrote:
>> Please test the following patch.
>>
>> diff --git a/drivers/net/wireless/ath9k/core.c b/drivers/net/wireless/ath9k/core.c
>> index c262ef2..9a51739 100644
>> --- a/drivers/net/wireless/ath9k/core.c
>> +++ b/drivers/net/wireless/ath9k/core.c
>> @@ -681,10 +681,12 @@ int ath_open(struct ath_softc *sc, struct ath9k_channel *initial_chan)
>> * Enable MIB interrupts when there are hardware phy counters.
>> * Note we only do this (at the moment) for station mode.
>> */
>> +#if 0
>> if (ath9k_hw_phycounters(ah) &&
>> ((sc->sc_ah->ah_opmode == ATH9K_M_STA) ||
>> (sc->sc_ah->ah_opmode == ATH9K_M_IBSS)))
>> sc->sc_imask |= ATH9K_INT_MIB;
>> +#endif
>> /*
>> * Some hardware processes the TIM IE and fires an
>> * interrupt when the TIM bit is set. For hardware
>>
>
> It didn't apply to -rc7, but I managed to apply it manually
> (apparently you made this for wireless-testing?) and added the #if 0.
> After 7 hours uptime with the driver, no apparent interrupt storm.
>
> I'll do more thorough testing later, but I have a 3 hour drive today,
> and some packing to do, so it'll need to wait a bit.
Yeah I did it for wireless-testing. Good to hear this so far has
solved the issue. I'll port it to 27 and post it once I get your
blessings that this fixed it by a Tested-by.
Luis
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-22 16:01 ` Luis R. Rodriguez
@ 2008-09-23 7:22 ` Steven Noonan
2008-09-23 7:55 ` Steven Noonan
2008-09-23 19:22 ` Luis R. Rodriguez
0 siblings, 2 replies; 62+ messages in thread
From: Steven Noonan @ 2008-09-23 7:22 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Luis Rodriguez, Johannes Berg,
Ingo Molnar
On Mon, Sep 22, 2008 at 9:01 AM, Luis R. Rodriguez<lrodriguez@atheros.com> wrote:> On Mon, Sep 22, 2008 at 8:11 AM, Steven Noonan <steven@uplinklabs.net> wrote:>> On Mon, Sep 22, 2008 at 1:06 AM, Luis R. Rodriguez>> <lrodriguez@atheros.com> wrote:>>> Please test the following patch.>>>>>> diff --git a/drivers/net/wireless/ath9k/core.c b/drivers/net/wireless/ath9k/core.c>>> index c262ef2..9a51739 100644>>> --- a/drivers/net/wireless/ath9k/core.c>>> +++ b/drivers/net/wireless/ath9k/core.c>>> @@ -681,10 +681,12 @@ int ath_open(struct ath_softc *sc, struct ath9k_channel *initial_chan)>>> * Enable MIB interrupts when there are hardware phy counters.>>> * Note we only do this (at the moment) for station mode.>>> */>>> +#if 0>>> if (ath9k_hw_phycounters(ah) &&>>> ((sc->sc_ah->ah_opmode == ATH9K_M_STA) ||>>> (sc->sc_ah->ah_opmode == ATH9K_M_IBSS)))>>> sc->sc_imask |= ATH9K_INT_MIB;>>> +#endif>>> /*>>> * Some hardware processes the TIM IE and fires an>>> * interrupt when the TIM bit is set. For hardware>>>>>>> It didn't apply to -rc7, but I managed to apply it manually>> (apparently you made this for wireless-testing?) and added the #if 0.>> After 7 hours uptime with the driver, no apparent interrupt storm.>>>> I'll do more thorough testing later, but I have a 3 hour drive today,>> and some packing to do, so it'll need to wait a bit.>> Yeah I did it for wireless-testing. Good to hear this so far has> solved the issue. I'll port it to 27 and post it once I get your> blessings that this fixed it by a Tested-by.>
Okay. I'm in Seattle now.
Yes, it solved the issue, but surely an #if 0 isn't a proper solution.What's the actual bug here? I'm afraid the meaning of what's going onhere isn't exactly intuitive.
Also, I'm doing an 8 hour run with this patch tonight. Maybe more,depends on when I wake up. ;)
I think if this resolves the issue on two separate ≈8 hour runs, itshould be considered the solution to the issue, unless it resurfaces.
- StevenЪТХ╨{.nг+┴╥÷╝┴╜├+%┼кЪ╠Ищ╤\x17╔┼wЪ╨{.nг+┴╥╔┼{╠ЧG╚²ИЪ┼{ay╨\x1dй┤з≥К,j\a╜╒fё╒╥h ▐О│ЙЪ▒ЙГz_Х╝\x03(╜И ▌┼щ╒j"²З\x1a╤^[m╖ЪЪ╬\a╚ЧG╚²ИЪ╒╦?≥╗Х╜з&ёЬ╖~▐А╤iO∙Ф╛z╥ vь^\x14\x04\x1a╤^[m╖ЪЪц\fЪ╤ЛЪ╒╦?√I╔
^ permalink raw reply [flat|nested] 62+ messages in thread* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-23 7:22 ` Steven Noonan
@ 2008-09-23 7:55 ` Steven Noonan
2008-09-23 16:20 ` Steven Noonan
2008-09-23 19:22 ` Luis R. Rodriguez
1 sibling, 1 reply; 62+ messages in thread
From: Steven Noonan @ 2008-09-23 7:55 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Luis Rodriguez, Johannes Berg,
Ingo Molnar
2008/9/23 Steven Noonan <steven@uplinklabs.net>:> On Mon, Sep 22, 2008 at 9:01 AM, Luis R. Rodriguez> <lrodriguez@atheros.com> wrote:>> On Mon, Sep 22, 2008 at 8:11 AM, Steven Noonan <steven@uplinklabs.net> wrote:>>> On Mon, Sep 22, 2008 at 1:06 AM, Luis R. Rodriguez>>> <lrodriguez@atheros.com> wrote:>>>> Please test the following patch.>>>>>>>> diff --git a/drivers/net/wireless/ath9k/core.c b/drivers/net/wireless/ath9k/core.c>>>> index c262ef2..9a51739 100644>>>> --- a/drivers/net/wireless/ath9k/core.c>>>> +++ b/drivers/net/wireless/ath9k/core.c>>>> @@ -681,10 +681,12 @@ int ath_open(struct ath_softc *sc, struct ath9k_channel *initial_chan)>>>> * Enable MIB interrupts when there are hardware phy counters.>>>> * Note we only do this (at the moment) for station mode.>>>> */>>>> +#if 0>>>> if (ath9k_hw_phycounters(ah) &&>>>> ((sc->sc_ah->ah_opmode == ATH9K_M_STA) ||>>>> (sc->sc_ah->ah_opmode == ATH9K_M_IBSS)))>>>> sc->sc_imask |= ATH9K_INT_MIB;>>>> +#endif>>>> /*>>>> * Some hardware processes the TIM IE and fires an>>>> * interrupt when the TIM bit is set. For hardware>>>>>>>>>> It didn't apply to -rc7, but I managed to apply it manually>>> (apparently you made this for wireless-testing?) and added the #if 0.>>> After 7 hours uptime with the driver, no apparent interrupt storm.>>>>>> I'll do more thorough testing later, but I have a 3 hour drive today,>>> and some packing to do, so it'll need to wait a bit.>>>> Yeah I did it for wireless-testing. Good to hear this so far has>> solved the issue. I'll port it to 27 and post it once I get your>> blessings that this fixed it by a Tested-by.>>>> Okay. I'm in Seattle now.>> Yes, it solved the issue, but surely an #if 0 isn't a proper solution.> What's the actual bug here? I'm afraid the meaning of what's going on> here isn't exactly intuitive.>> Also, I'm doing an 8 hour run with this patch tonight. Maybe more,> depends on when I wake up. ;)>> I think if this resolves the issue on two separate ≈8 hour runs, it> should be considered the solution to the issue, unless it resurfaces.>
I am only 25 minutes into the run, but I feel this is worth reportingbefore I forget. No IRQ storm thus far, but the connection keepsdropping. Or rather, it doesn't disassociate, but while I wastransferring a large file, it had points of time where it suddenlylost and reacquired the connection (signal strength fluctuates wildlybetween 70-90% and then 0% for a second, and then back tofluctuating). Perhaps this is related to the lack of aggregation?
- StevenЪТХ╨{.nг+┴╥÷╝┴╜├+%┼кЪ╠Ищ╤\x17╔┼wЪ╨{.nг+┴╥╔┼{╠ЧG╚²ИЪ┼{ay╨\x1dй┤з≥К,j\a╜╒fё╒╥h ▐О│ЙЪ▒ЙГz_Х╝\x03(╜И ▌┼щ╒j"²З\x1a╤^[m╖ЪЪ╬\a╚ЧG╚²ИЪ╒╦?≥╗Х╜з&ёЬ╖~▐А╤iO∙Ф╛z╥ vь^\x14\x04\x1a╤^[m╖ЪЪц\fЪ╤ЛЪ╒╦?√I╔
^ permalink raw reply [flat|nested] 62+ messages in thread* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-23 7:55 ` Steven Noonan
@ 2008-09-23 16:20 ` Steven Noonan
0 siblings, 0 replies; 62+ messages in thread
From: Steven Noonan @ 2008-09-23 16:20 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Luis Rodriguez, Johannes Berg,
Ingo Molnar
2008/9/23 Steven Noonan <steven@uplinklabs.net>:> 2008/9/23 Steven Noonan <steven@uplinklabs.net>:>> On Mon, Sep 22, 2008 at 9:01 AM, Luis R. Rodriguez>> <lrodriguez@atheros.com> wrote:>>> On Mon, Sep 22, 2008 at 8:11 AM, Steven Noonan <steven@uplinklabs.net> wrote:>>>> On Mon, Sep 22, 2008 at 1:06 AM, Luis R. Rodriguez>>>> <lrodriguez@atheros.com> wrote:>>>>> Please test the following patch.>>>>>>>>>> diff --git a/drivers/net/wireless/ath9k/core.c b/drivers/net/wireless/ath9k/core.c>>>>> index c262ef2..9a51739 100644>>>>> --- a/drivers/net/wireless/ath9k/core.c>>>>> +++ b/drivers/net/wireless/ath9k/core.c>>>>> @@ -681,10 +681,12 @@ int ath_open(struct ath_softc *sc, struct ath9k_channel *initial_chan)>>>>> * Enable MIB interrupts when there are hardware phy counters.>>>>> * Note we only do this (at the moment) for station mode.>>>>> */>>>>> +#if 0>>>>> if (ath9k_hw_phycounters(ah) &&>>>>> ((sc->sc_ah->ah_opmode == ATH9K_M_STA) ||>>>>> (sc->sc_ah->ah_opmode == ATH9K_M_IBSS)))>>>>> sc->sc_imask |= ATH9K_INT_MIB;>>>>> +#endif>>>>> /*>>>>> * Some hardware processes the TIM IE and fires an>>>>> * interrupt when the TIM bit is set. For hardware>>>>>>>>>>>>> It didn't apply to -rc7, but I managed to apply it manually>>>> (apparently you made this for wireless-testing?) and added the #if 0.>>>> After 7 hours uptime with the driver, no apparent interrupt storm.>>>>>>>> I'll do more thorough testing later, but I have a 3 hour drive today,>>>> and some packing to do, so it'll need to wait a bit.>>>>>> Yeah I did it for wireless-testing. Good to hear this so far has>>> solved the issue. I'll port it to 27 and post it once I get your>>> blessings that this fixed it by a Tested-by.>>>>>>> Okay. I'm in Seattle now.>>>> Yes, it solved the issue, but surely an #if 0 isn't a proper solution.>> What's the actual bug here? I'm afraid the meaning of what's going on>> here isn't exactly intuitive.>>>> Also, I'm doing an 8 hour run with this patch tonight. Maybe more,>> depends on when I wake up. ;)>>>> I think if this resolves the issue on two separate ≈8 hour runs, it>> should be considered the solution to the issue, unless it resurfaces.>>>> I am only 25 minutes into the run, but I feel this is worth reporting> before I forget. No IRQ storm thus far, but the connection keeps> dropping. Or rather, it doesn't disassociate, but while I was> transferring a large file, it had points of time where it suddenly> lost and reacquired the connection (signal strength fluctuates wildly> between 70-90% and then 0% for a second, and then back to> fluctuating). Perhaps this is related to the lack of aggregation?>
This 8 hour test passes.
- StevenЪТХ╨{.nг+┴╥÷╝┴╜├+%┼кЪ╠Ищ╤\x17╔┼wЪ╨{.nг+┴╥╔┼{╠ЧG╚²ИЪ┼{ay╨\x1dй┤з≥К,j\a╜╒fё╒╥h ▐О│ЙЪ▒ЙГz_Х╝\x03(╜И ▌┼щ╒j"²З\x1a╤^[m╖ЪЪ╬\a╚ЧG╚²ИЪ╒╦?≥╗Х╜з&ёЬ╖~▐А╤iO∙Ф╛z╥ vь^\x14\x04\x1a╤^[m╖ЪЪц\fЪ╤ЛЪ╒╦?√I╔
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-23 7:22 ` Steven Noonan
2008-09-23 7:55 ` Steven Noonan
@ 2008-09-23 19:22 ` Luis R. Rodriguez
1 sibling, 0 replies; 62+ messages in thread
From: Luis R. Rodriguez @ 2008-09-23 19:22 UTC (permalink / raw)
To: Steven Noonan
Cc: linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Luis Rodriguez, Johannes Berg,
Ingo Molnar
2008/9/23 Steven Noonan <steven@uplinklabs.net>:
> Okay. I'm in Seattle now.
>
> Yes, it solved the issue, but surely an #if 0 isn't a proper solution.
For 2.6.27 it will be unless people are willing to accept a patch to
enable ANI support for 2.6.27.
> What's the actual bug here? I'm afraid the meaning of what's going on
> here isn't exactly intuitive.
Check linux-wireless for my patch on MIB event comments. But I'll
summarize here for you. We get MIB interrupts to update statistics on
a number of PHY related elements. We then use this information to
process ANI -- some automatic noise immunity work. If you don't
process the information for ANI it seems you get the hardware treating
this as untreated and hence you get an interrupt storm. So since
2.6.27 doesn't have ANI implemented the MIB interrupts are not useful
except to collect internal statistics.
Since you are so eager to test this with a "proper" solution I will
send you a patch to test ANI for 2.6.27. We'll leave it up to the
maintainers and distributions to decide whether or not to pick this
up. But for that let's leave that for another thread.
Luis
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-20 20:57 ` Luis R. Rodriguez
2008-09-20 20:58 ` Johannes Berg
@ 2008-09-20 21:18 ` Steven Noonan
2008-09-20 21:24 ` Luis R. Rodriguez
1 sibling, 1 reply; 62+ messages in thread
From: Steven Noonan @ 2008-09-20 21:18 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Luis Rodriguez, Ingo Molnar
On Sat, Sep 20, 2008 at 1:57 PM, Luis R. Rodriguez
<lrodriguez@atheros.com> wrote:
> So here's what we need to do -- try to zero on in what causes this IRQ
> storm.
How is the best way to approach this? I assume ATH_DBG_INTERRUPT and
pray like hell my /var partition doesn't fill up first? Maybe I'll
just symlink /var/log/messages to a file on a substantially larger
pattition...
> Most likely its some strange event that triggers it, perhaps
> some sort of noise which makes the device try to adapt and hits a path
> we are not handling correctly yet. Can you provide the initial log of
> when ath9k gets loaded? This tells us what card it is, like AR5416 or
> AR9280, etc.
[10238.347956] ath9k: 0.1
[10238.348028] vendor=8086 device=27d2
[10238.348033] ath9k 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[10238.348050] ath9k 0000:03:00.0: setting latency timer to 64
[10238.480781] phy1: Selected rate control algorithm 'ath9k_rate_control'
[10238.483320] phy1: Atheros 5416: mem=0xffffc20000060000, irq=17
It claims to be a 5416, but I was under the impression that it was the
5418 that came with the MacBook Pro.
^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [ath9k-devel] ath9k: massive unexplained latency in 2.6.27 (rc5, rc6, probably others)
2008-09-20 21:18 ` Steven Noonan
@ 2008-09-20 21:24 ` Luis R. Rodriguez
0 siblings, 0 replies; 62+ messages in thread
From: Luis R. Rodriguez @ 2008-09-20 21:24 UTC (permalink / raw)
To: Steven Noonan
Cc: linux-wireless, LKML, Senthilkumar Balasubramanian,
ath9k-devel@lists.ath9k.org, Luis Rodriguez, Ingo Molnar
On Sat, Sep 20, 2008 at 2:18 PM, Steven Noonan <steven@uplinklabs.net> wrote:
> On Sat, Sep 20, 2008 at 1:57 PM, Luis R. Rodriguez
> <lrodriguez@atheros.com> wrote:
>> So here's what we need to do -- try to zero on in what causes this IRQ
>> storm.
>
> How is the best way to approach this? I assume ATH_DBG_INTERRUPT and
> pray like hell my /var partition doesn't fill up first? Maybe I'll
> just symlink /var/log/messages to a file on a substantially larger
> pattition...
Yeah see the other e-mail I sent.
>> Most likely its some strange event that triggers it, perhaps
>> some sort of noise which makes the device try to adapt and hits a path
>> we are not handling correctly yet. Can you provide the initial log of
>> when ath9k gets loaded? This tells us what card it is, like AR5416 or
>> AR9280, etc.
>
> [10238.347956] ath9k: 0.1
> [10238.348028] vendor=8086 device=27d2
> [10238.348033] ath9k 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> [10238.348050] ath9k 0000:03:00.0: setting latency timer to 64
> [10238.480781] phy1: Selected rate control algorithm 'ath9k_rate_control'
> [10238.483320] phy1: Atheros 5416: mem=0xffffc20000060000, irq=17
>
> It claims to be a 5416, but I was under the impression that it was the
> 5418 that came with the MacBook Pro.
lspci output is outdated, we have submitted some new updates to it.
The output of the driver should be the one you can rely on for chipset
information for now. The problem is besides the PCI device ID there is
also EEPROM information which further identifies the chipset like
devices can be dual band (2 GHz and 5 GHz capable) and some other
single band -- using the same PCI device:vendor ID. So for now we've
tried to update the PCI id stuff (not sure when it'll be merged) to
tell your "family chipset" and in parens the MAC or known model. It
can still use some cleanup but yeah, hope this helps understand this a
little better. You can see more info on this here:
http://wireless.kernel.org/en/users/Drivers/ath9k#supportedchips
Luis
^ permalink raw reply [flat|nested] 62+ messages in thread