On 09/05/2012 09:22 PM, Gregoire Gentil wrote:
>
>
> On 09/05/2012 08:48 PM, Josh Cartwright wrote:
>> CC'd Luciano to let him know about breakage of wl12xx on -rt.
>>
>> On Tue, Sep 04, 2012 at 06:22:33PM -0700, Gregoire Gentil wrote:
>>> On 09/04/2012 07:11 AM, Josh Cartwright wrote:
>>>> On Sun, Sep 02, 2012 at 11:24:47PM -0700, Gregoire Gentil wrote:
>>>>> Hello,
>>>>>
>>>>> I'm trying to debug a wifi bug with 3.4-rt17 applied, running on an
>>>>> OMAP4 ARM board such as Pandaboard.
>>>>>
>>>>> Wi-Fi works perfectly well without rt patches. It also works quite
>>>>> well with rt patches AND without wifi module loaded. But with both
>>>>> rt patches and wifi module, the system is very flaky and even if I
>>>>> manage to launch a big download, I get a kernel hang. I managed to
>>>>> get a trace:
>>>>>
>>>>> BUG: scheduling while atomic: irq/213-wl12xx/1588/0x00010002
>>>>> Modules linked in: omapdce(C) wl12xx wlcore omaprpc(C) mac80211 d
>>>>> [] (unwind_backtrace+0x0/0xf0) from [] (dump)
>>>>> [] (dump_stack+0x20/0x24) from [] (__schedul)
>>>>> [] (__schedule_bug+0x54/0x60) from [] (__sch)
>>>>> [] (__schedule+0x74/0x6c0) from [] (schedule)
>>>>> [] (schedule+0xa0/0xb8) from [] (rt_spin_loc)
>>>>> [] (rt_spin_lock_slowlock+0x198/0x288) from [
>>>>> [] (rt_spin_lock+0x18/0x1c) from [] (wl12xx_)
>>>>> [] (wl12xx_hardirq+0x2c/0xa4 [wlcore]) from [
>>>>> [] (handle_irq_event_percpu+0xac/0x24c) from [
>>>>> [] (handle_irq_event+0x7c/0x9c) from [] (han)
>>>>> [] (handle_level_irq+0xe4/0x134) from [] (ge)
>>>>> [] (generic_handle_irq+0x34/0x3c) from [] (g)
>>>>> [] (gpio_irq_handler+0x160/0x1a4) from [] (g)
>>>>> [] (generic_handle_irq+0x34/0x3c) from [] (h)
>>>>> [] (handle_IRQ+0x88/0xc8)
>>>>>
>>>>> Source code including the function wl12xx_hardirq is here:
>>>>> http://dev.omapzoom.org/?p=integration/kernel-ubuntu.git;a=blob;f=drivers/net/wireless/ti/wlcore/main.c;h=45fe911a6504f92dddff5a9415bb77a643b3c4a9;hb=f84c72f6b36418ff11d16808c16a7c3216730bb0
>>>>>
>>>>>
>>>>> Any idea what could be wrong and how I could debug and fix this
>>>>> situation?
>>>>
>>>> On first glance, it looks like the driver uses request_threaded_irq()
>>>> to register its handlers, but is trying to acquire a regular spin_lock
>>>> in its primary handler. That's bad news, since spin_locks can
>>>> schedule() when contended with CONFIG_PREEMPT_RT.
>>>>
>>>> And it's not just that, unfortunately, since the primary handler also
>>>> complete()s a completion, which also can schedule().
>>>>
>>>> It looks like the overall interrupt handling strategy of this driver
>>>> probably needs to be revisited. :(.
>>>
>>> Josh,
>>>
>>> I really appreciate the answer. Thank you. Though I'm definitely not
>>> an RT expert, I really would like to make this work... Could you
>>> provide a little more guidance on what I should patch to fix this
>>> situation? Could I change request_threaded_irq to something else?
>>> Would raw_spin* help?
>>
>> In general, the hardirq handler should be doing the absolute bare
>> minimum necessary to determine whether or not the device is interrupting
>> (and if so, quiet it down and return IRQ_WAKE_THREAD), since
>> longer-running handlers have a detrimental impact on system
>> determinism.
>>
>> The right solution for wl12xx seems to be pushing the logic currently
>> implemented in the hardirq handler into the threaded handler, but
>> without knowing too many details about the driver, it's difficult to
>> judge the viability/impact of this solution.
> [G2]. Luciano,
>
> Could you please comment on this suggestion? With the right guidance,
> I'm willing to patch and test and see if we can improve this buggy
> situation.
>
> Many thanks in advance,
>
> Grégoire

Please find attached a patch which seems to work, though I really don't
know what I'm doing here!

Any comments would be appreciated,

Grégoire
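
For readers following the thread, the split Josh describes (a minimal
primary handler that only quiets the device and returns IRQ_WAKE_THREAD,
with locking and completion work moved to the threaded handler) can be
sketched as a small userspace model. This is only an illustration under
stated assumptions: it is not the attached patch and not the wlcore
code. Every model_* name and the irqreturn_t enum below are hypothetical
stand-ins that mimic the kernel's names without using kernel headers.

```c
/* Userspace model only: these types mimic, but are not, the kernel's. */
#include <assert.h>
#include <stdbool.h>

typedef enum { IRQ_NONE, IRQ_HANDLED, IRQ_WAKE_THREAD } irqreturn_t;

/* Hypothetical device state; not the driver's real structure. */
struct model_dev {
	bool irq_pending;   /* device is asserting its interrupt line */
	bool irq_masked;    /* line quieted by the primary handler */
	bool in_hardirq;    /* true while the primary handler runs */
	int  lock_taken;    /* counts "sleeping lock" acquisitions */
	int  completions;   /* counts complete()-style wakeups */
};

/* Stands in for spin_lock() on PREEMPT_RT, which may sleep. */
static void model_sleeping_lock(struct model_dev *d)
{
	/* Sleeping in hardirq context is what the reported BUG flags. */
	assert(!d->in_hardirq);
	d->lock_taken++;
}

/* Primary handler: check the device, quiet it, defer everything else. */
static irqreturn_t model_hardirq(int irq, void *cookie)
{
	struct model_dev *d = cookie;
	irqreturn_t ret = IRQ_NONE;

	(void)irq;
	d->in_hardirq = true;
	if (d->irq_pending) {
		d->irq_masked = true;   /* no locks, no completions here */
		ret = IRQ_WAKE_THREAD;
	}
	d->in_hardirq = false;
	return ret;
}

/* Threaded handler: runs in a context where sleeping is allowed. */
static irqreturn_t model_irq_thread(int irq, void *cookie)
{
	struct model_dev *d = cookie;

	(void)irq;
	model_sleeping_lock(d);     /* safe: not in hardirq context */
	d->completions++;           /* stands in for complete() */
	d->irq_pending = false;     /* device serviced */
	d->irq_masked = false;      /* re-enable the line */
	return IRQ_HANDLED;
}

/* Models the IRQ core: run the primary, then the thread if requested. */
static irqreturn_t model_deliver_irq(struct model_dev *d)
{
	irqreturn_t ret = model_hardirq(0, d);

	if (ret == IRQ_WAKE_THREAD)
		ret = model_irq_thread(0, d);
	return ret;
}
```

In this model, calling model_sleeping_lock() from inside model_hardirq()
would trip the assert, which is the analogue of the "scheduling while
atomic" report in the trace. Whether the real driver can tolerate
deferring that work is exactly the open question raised for Luciano.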