From mboxrd@z Thu Jan 1 00:00:00 1970
From: yuweizheng at 139.com
Date: Mon, 16 Mar 2015 11:06:27 +0800
Subject: [ath9k-devel] [PATCHv2] ath9k_htc: add adaptive usb receive flow control to repair soft lockup with monitor mode
References: <1423528464-8433-1-git-send-email-yuweizheng@139.com>, <54E59A57.402@openwrt.org>
Message-ID: <201503161106270545805@139.com>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: ath9k-devel@lists.ath9k.org

I'm sorry; we have had a lot of issues to deal with since the traditional Chinese Spring Festival.
Now I can start following up on this patch.

The workqueue delays URB submission while the CPU has no resources to process the wireless packets buffered in rx_buf. In that situation the original driver keeps submitting URBs, which may consume even more resources. Furthermore, the tasklet is stuck in an endless loop, and a soft lockup may be detected.

I realize that the approach I used in the patch is not very graceful. I will try to merge the delayed-submit code into the original rx_tasklet.

yuweizheng at 139.com

From: Felix Fietkau
Date: 2015-02-19 16:09
To: Yuwei Zheng; linux-kernel; ath9k-devel; linux-wireless; kvalo; ath9k-devel
CC: netdev; zhengyuwei
Subject: Re: [PATCHv2] ath9k_htc: add adaptive usb receive flow control to repair soft lockup with monitor mode
On 2015-02-10 11:34, Yuwei Zheng wrote:
> The ath9k_hif_usb_rx_cb function executes in interrupt context, and ath9k_rx_tasklet executes
> in soft irq context. In other words, ath9k_hif_usb_rx_cb has more chances to execute than
> ath9k_rx_tasklet. So in the worst case, the rx.rxbuf receive list is always full,
> and the do {} while (true) loop never breaks. The kernel gets a soft lockup panic.
>
> [59011.007210] BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:0:30609]
> [59011.030560] BUG: scheduling while atomic: kworker/0:0/30609/0x40010100
> [59013.804486] BUG: scheduling while atomic: kworker/0:0/30609/0x40010100
> [59013.858522] Kernel panic - not syncing: softlockup: hung tasks
>
> [59014.038891] Exception stack(0xdf4bbc38 to 0xdf4bbc80)
> [59014.046834] bc20:                                                       de57b950 60000113
> [59014.059579] bc40: 00000000 bb32bb32 60000113 de57b948 de57b500 dc7bb440 df4bbcd0 00000000
> [59014.072337] bc60: de57b950 60000113 df4bbcd0 df4bbc80 c04c259d c04c25a0 60000133 ffffffff
> [59014.085233] [<c04c28db>] (__irq_svc+0x3b/0x5c) from [<c04c25a0>] (_raw_spin_unlock_irqrestore+0xc/0x10)
> [59014.100437] [<c04c25a0>] (_raw_spin_unlock_irqrestore+0xc/0x10) from [<bf9c2089>] (ath9k_rx_tasklet+0x290/0x490 [ath9k_htc])
> [59014.118267] [<bf9c2089>] (ath9k_rx_tasklet+0x290/0x490 [ath9k_htc]) from [<c0036d23>] (tasklet_action+0x3b/0x98)
> [59014.134132] [<c0036d23>] (tasklet_action+0x3b/0x98) from [<c0036709>] (__do_softirq+0x99/0x16c)
> [59014.147784] [<c0036709>] (__do_softirq+0x99/0x16c) from [<c00369f7>] (irq_exit+0x5b/0x5c)
> [59014.160653] [<c00369f7>] (irq_exit+0x5b/0x5c) from [<c000cfc3>] (handle_IRQ+0x37/0x78)
> [59014.173124] [<c000cfc3>] (handle_IRQ+0x37/0x78) from [<c00085df>] (omap3_intc_handle_irq+0x5f/0x68)
> [59014.187225] [<c00085df>] (omap3_intc_handle_irq+0x5f/0x68) from [<c04c28db>](__irq_svc+0x3b/0x5c)
>
> This bug can be seen on low-performance boards, such as the uniprocessor BeagleBone board. Adding some debug
> messages to the ath9k_hif_usb_rx_cb function may trigger this bug quickly.
>
> Signed-off-by: Yuwei Zheng <yuweizheng@139.com>
This approach of interaction between tasklet and workqueue processing
seems quite complex to me. Wouldn't it be simpler and better to simply
always run the rx processing code in workqueue context?
That way it can go on processing forever (as long as there is data to be
received), while the scheduler ensures that it doesn't interfere with
other critical work on the CPU.

- Felix
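
For readers following the thread, the starvation described in the patch description can be sketched as a small userspace simulation. This is not driver code: `rx_sim`, `sim_rx_callback`, `sim_rx_drain` and `SIM_RX_QUEUE_CAP` are hypothetical names made up for illustration, and the per-pass budget is a generic bounding technique, neither the flow control from the patch nor the workqueue conversion Felix suggests. It only shows why a drain loop whose producer refills the list after every packet cannot end on its own, and what a bounded pass looks like.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capacity of the receive list; not taken from the driver. */
#define SIM_RX_QUEUE_CAP 8

struct rx_sim {
    int queued;    /* packets waiting for the consumer */
    int arrivals;  /* packets the simulated device still has to deliver */
};

/*
 * Stand-in for the URB completion callback: it runs with higher priority
 * than the consumer and tops the list up whenever there is room and data.
 */
void sim_rx_callback(struct rx_sim *s)
{
    while (s->queued < SIM_RX_QUEUE_CAP && s->arrivals > 0) {
        s->queued++;
        s->arrivals--;
    }
}

/*
 * Stand-in for the consumer loop. It handles at most `budget` packets per
 * pass and reports whether work is left, so the caller can reschedule it
 * instead of spinning. Without the budget check the loop would only end
 * once `arrivals` reaches zero.
 */
bool sim_rx_drain(struct rx_sim *s, int budget, int *processed)
{
    assert(budget > 0);
    *processed = 0;
    while (s->queued > 0 && *processed < budget) {
        s->queued--;          /* consume one packet */
        (*processed)++;
        sim_rx_callback(s);   /* the producer gets in again right away */
    }
    return s->queued > 0;
}
```

With 1000 pending arrivals and a budget of 16, one call to `sim_rx_drain` handles 16 packets and returns `true` with the list still full, which mirrors the "receive list is always full" condition above. In the workqueue variant Felix proposes, the scheduler would play the role that the explicit budget plays in this sketch.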