* regression: soft lockup with ath9k on master-2010-04-14 @ 2010-04-15 19:44 Kalle Valo 2010-04-15 19:58 ` John W. Linville 0 siblings, 1 reply; 7+ messages in thread From: Kalle Valo @ 2010-04-15 19:44 UTC (permalink / raw) To: linux-wireless Hello, I just updated my laptop to latest wireless-testing and it everytime soft lockups few seconds after association. I haven't updated wireless-testing for few days, so I can't say when this bug was introduced. Info about my setup: samsung x120 ubuntu 10.04 x86-64 11n ap running openwrt and ath9k, wpa2 psk more info about chipset: [ 18.617974] ath9k 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16 [ 18.617989] ath9k 0000:02:00.0: setting latency timer to 64 [ 18.667465] ath: EEPROM regdomain: 0x65 [ 18.667468] ath: EEPROM indicates we should expect a direct regpair map [ 18.667473] ath: Country alpha2 being used: 00 [ 18.667475] ath: Regpair used: 0x65 [ 18.840350] phy0: Selected rate control algorithm 'ath9k_rate_control' [ 18.841346] Registered led device: ath9k-phy0::radio [ 18.841373] Registered led device: ath9k-phy0::assoc [ 18.841400] Registered led device: ath9k-phy0::tx [ 18.841425] Registered led device: ath9k-phy0::rx [ 18.841442] phy0: Atheros AR9285 MAC/BB Rev:2 AR5133 RF Rev:e0: mem=0xffffc90021e20000, irq=16 I will debug more tomorrow. -- Kalle Valo ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: regression: soft lockup with ath9k on master-2010-04-14 2010-04-15 19:44 regression: soft lockup with ath9k on master-2010-04-14 Kalle Valo @ 2010-04-15 19:58 ` John W. Linville 2010-04-19 6:29 ` Kalle Valo 0 siblings, 1 reply; 7+ messages in thread From: John W. Linville @ 2010-04-15 19:58 UTC (permalink / raw) To: Kalle Valo; +Cc: linux-wireless On Thu, Apr 15, 2010 at 10:44:52PM +0300, Kalle Valo wrote: > Hello, > > I just updated my laptop to latest wireless-testing and it everytime > soft lockups few seconds after association. > > I haven't updated wireless-testing for few days, so I can't say when > this bug was introduced. It might be useful to do a bisect. If you choose to do that, you might want to use wireless-next-2.6 instead, since that doesn't have the occasional pulls from Linus that make bisecting wireless-testing more painful. John -- John W. Linville Someday the world will need a hero, and you linville@tuxdriver.com might be all we have. Be ready. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: regression: soft lockup with ath9k on master-2010-04-14 2010-04-15 19:58 ` John W. Linville @ 2010-04-19 6:29 ` Kalle Valo 2010-04-19 6:43 ` Johannes Berg 0 siblings, 1 reply; 7+ messages in thread From: Kalle Valo @ 2010-04-19 6:29 UTC (permalink / raw) To: John W. Linville; +Cc: linux-wireless, johannes "John W. Linville" <linville@tuxdriver.com> writes: > On Thu, Apr 15, 2010 at 10:44:52PM +0300, Kalle Valo wrote: >> Hello, >> >> I just updated my laptop to latest wireless-testing and it everytime >> soft lockups few seconds after association. >> >> I haven't updated wireless-testing for few days, so I can't say when >> this bug was introduced. > > It might be useful to do a bisect. If you choose to do that, you > might want to use wireless-next-2.6 instead, since that doesn't have > the occasional pulls from Linus that make bisecting wireless-testing > more painful. Thanks for the tip, it helped a lot. My new laptop is really slow to compile kernels :/ I bisected it finally and found the culprit: 66b0470aeef10a3b0f9a6a1c60d908b5a06c62ae is the first bad commit commit 66b0470aeef10a3b0f9a6a1c60d908b5a06c62ae Author: Johannes Berg <johannes@sipsolutions.net> Date: Tue Apr 6 11:18:45 2010 +0200 mac80211: remove ieee80211_sta_stop_rx_ba_session All callers of ieee80211_sta_stop_rx_ba_session can just call __ieee80211_stop_rx_ba_session instead because they already have the station struct, so do that and remove ieee80211_sta_stop_rx_ba_session. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com> I verified that reverting these three patches make my laptop stable again: 54297e4d60b74e602138594c131097347d128b5a mac80211: fix some RX aggregation... 098a607091426e79178b9a6c318d993fea131791 mac80211: clean up/fix aggregation.. 66b0470aeef10a3b0f9a6a1c60d908b5a06c62ae mac80211: remove ieee80211_sta_... (I had to revert all three because of conflicts.) I took a quick peek of the patches but I wasn't able to immediately say what was wrong. This just made me suspicious: - ieee80211_sta_stop_rx_ba_session(sta->sdata, sta->sta.addr, - (u16)*ptid, WLAN_BACK_TIMER, - WLAN_REASON_QSTA_TIMEOUT); + __ieee80211_stop_rx_ba_session(sta, *ptid, - WLAN_BACK_RECIPIENT, + WLAN_REASON_QSTA_TIMEOUT); WLAN_BACK_TIMER was changed to WLAN_BACK_RECIPIENT, but I don't know if it was in purpose or not. Johannes, any ideas? -- Kalle Valo ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: regression: soft lockup with ath9k on master-2010-04-14 2010-04-19 6:29 ` Kalle Valo @ 2010-04-19 6:43 ` Johannes Berg 2010-04-19 7:19 ` Kalle Valo 0 siblings, 1 reply; 7+ messages in thread From: Johannes Berg @ 2010-04-19 6:43 UTC (permalink / raw) To: Kalle Valo; +Cc: John W. Linville, linux-wireless On Mon, 2010-04-19 at 09:29 +0300, Kalle Valo wrote: > > It might be useful to do a bisect. If you choose to do that, you > > might want to use wireless-next-2.6 instead, since that doesn't have > > the occasional pulls from Linus that make bisecting wireless-testing > > more painful. > > Thanks for the tip, it helped a lot. My new laptop is really slow to > compile kernels :/ > > I bisected it finally and found the culprit: > > 66b0470aeef10a3b0f9a6a1c60d908b5a06c62ae is the first bad commit > commit 66b0470aeef10a3b0f9a6a1c60d908b5a06c62ae > Author: Johannes Berg <johannes@sipsolutions.net> > Date: Tue Apr 6 11:18:45 2010 +0200 > > mac80211: remove ieee80211_sta_stop_rx_ba_session [...] > I took a quick peek of the patches but I wasn't able to immediately > say what was wrong. This just made me suspicious: > > - ieee80211_sta_stop_rx_ba_session(sta->sdata, sta->sta.addr, > - (u16)*ptid, WLAN_BACK_TIMER, > - WLAN_REASON_QSTA_TIMEOUT); > + __ieee80211_stop_rx_ba_session(sta, *ptid, > - WLAN_BACK_RECIPIENT, > + WLAN_REASON_QSTA_TIMEOUT); > > WLAN_BACK_TIMER was changed to WLAN_BACK_RECIPIENT, but I don't know > if it was in purpose or not. Johannes, any ideas? That was on purpose but belongs into 098a607091426e79178b9a6c318d993fea131791 not this patch ... :( However that shouldn't be the problem. Or rather, that could be the reason you're seeing the problem on this patch, rather than the 098a one. Try the patch below? johannes --- wireless-testing.orig/net/mac80211/agg-rx.c 2010-04-19 08:40:17.000000000 +0200 +++ wireless-testing/net/mac80211/agg-rx.c 2010-04-19 08:40:27.000000000 +0200 @@ -47,11 +47,6 @@ void __ieee80211_stop_rx_ba_session(stru printk(KERN_DEBUG "HW problem - can not stop rx " "aggregation for tid %d\n", tid); - /* check if this is a self generated aggregation halt */ - if (initiator == WLAN_BACK_RECIPIENT) - ieee80211_send_delba(sta->sdata, sta->sta.addr, - tid, 0, reason); - /* free the reordering buffer */ for (i = 0; i < tid_rx->buf_size; i++) { if (tid_rx->reorder_buf[i]) { @@ -69,6 +64,11 @@ void __ieee80211_stop_rx_ba_session(stru spin_unlock_bh(&sta->lock); + /* check if this is a self generated aggregation halt */ + if (initiator == WLAN_BACK_RECIPIENT) + ieee80211_send_delba(sta->sdata, sta->sta.addr, + tid, 0, reason); + del_timer_sync(&tid_rx->session_timer); kfree(tid_rx); } ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: regression: soft lockup with ath9k on master-2010-04-14 2010-04-19 6:43 ` Johannes Berg @ 2010-04-19 7:19 ` Kalle Valo 2010-04-19 9:00 ` [PATCH] mac80211: fix stopping RX BA session from timer Johannes Berg 0 siblings, 1 reply; 7+ messages in thread From: Kalle Valo @ 2010-04-19 7:19 UTC (permalink / raw) To: Johannes Berg; +Cc: John W. Linville, linux-wireless Johannes Berg <johannes@sipsolutions.net> writes: > However that shouldn't be the problem. Or rather, that could be the > reason you're seeing the problem on this patch, rather than the 098a > one. > > Try the patch below? Sorry, no luck. Still my laptop freezes. -- Kalle Valo ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] mac80211: fix stopping RX BA session from timer 2010-04-19 7:19 ` Kalle Valo @ 2010-04-19 9:00 ` Johannes Berg 2010-04-19 11:22 ` Kalle Valo 0 siblings, 1 reply; 7+ messages in thread From: Johannes Berg @ 2010-04-19 9:00 UTC (permalink / raw) To: Kalle Valo; +Cc: John W. Linville, linux-wireless Kalle reported that his system deadlocks since my recent work in this area. The reason quickly became apparent: we try to cancel_timer_sync() a timer from within itself. Fix that by making the function aware of the context it is called from. Reported-by: Kalle Valo <kvalo@adurom.com> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> --- Please apply after Kalle confirms it fixes the problem. net/mac80211/agg-rx.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) --- wireless-testing.orig/net/mac80211/agg-rx.c 2010-04-19 10:49:25.000000000 +0200 +++ wireless-testing/net/mac80211/agg-rx.c 2010-04-19 10:51:57.000000000 +0200 @@ -18,8 +18,9 @@ #include "ieee80211_i.h" #include "driver-ops.h" -void __ieee80211_stop_rx_ba_session(struct sta_info *sta, u16 tid, - u16 initiator, u16 reason) +static void ___ieee80211_stop_rx_ba_session(struct sta_info *sta, u16 tid, + u16 initiator, u16 reason, + bool from_timer) { struct ieee80211_local *local = sta->local; struct tid_ampdu_rx *tid_rx; @@ -69,10 +70,17 @@ void __ieee80211_stop_rx_ba_session(stru spin_unlock_bh(&sta->lock); - del_timer_sync(&tid_rx->session_timer); + if (!from_timer) + del_timer_sync(&tid_rx->session_timer); kfree(tid_rx); } +void __ieee80211_stop_rx_ba_session(struct sta_info *sta, u16 tid, + u16 initiator, u16 reason) +{ + ___ieee80211_stop_rx_ba_session(sta, tid, initiator, reason, false); +} + /* * After accepting the AddBA Request we activated a timer, * resetting it after each frame that arrives from the originator. @@ -91,8 +99,8 @@ static void sta_rx_agg_session_timer_exp #ifdef CONFIG_MAC80211_HT_DEBUG printk(KERN_DEBUG "rx session timer expired on tid %d\n", (u16)*ptid); #endif - __ieee80211_stop_rx_ba_session(sta, *ptid, WLAN_BACK_RECIPIENT, - WLAN_REASON_QSTA_TIMEOUT); + ___ieee80211_stop_rx_ba_session(sta, *ptid, WLAN_BACK_RECIPIENT, + WLAN_REASON_QSTA_TIMEOUT, true); } static void ieee80211_send_addba_resp(struct ieee80211_sub_if_data *sdata, u8 *da, u16 tid, ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] mac80211: fix stopping RX BA session from timer 2010-04-19 9:00 ` [PATCH] mac80211: fix stopping RX BA session from timer Johannes Berg @ 2010-04-19 11:22 ` Kalle Valo 0 siblings, 0 replies; 7+ messages in thread From: Kalle Valo @ 2010-04-19 11:22 UTC (permalink / raw) To: Johannes Berg; +Cc: John W. Linville, linux-wireless Johannes Berg <johannes@sipsolutions.net> writes: > Kalle reported that his system deadlocks since my > recent work in this area. The reason quickly became > apparent: we try to cancel_timer_sync() a timer > from within itself. Fix that by making the function > aware of the context it is called from. I have now tested this for an hour and I can't reproduce the problem anymore, earlier I was able to reproduce at least within a minute or so. So I'm confident that the problem I saw is this fixed by this patch. Thank you very much for fixing this so quickly. > Reported-by: Kalle Valo <kvalo@adurom.com> > Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Tested-by: Kalle Valo <kvalo@adurom.com> -- Kalle Valo ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2010-04-19 11:22 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-04-15 19:44 regression: soft lockup with ath9k on master-2010-04-14 Kalle Valo 2010-04-15 19:58 ` John W. Linville 2010-04-19 6:29 ` Kalle Valo 2010-04-19 6:43 ` Johannes Berg 2010-04-19 7:19 ` Kalle Valo 2010-04-19 9:00 ` [PATCH] mac80211: fix stopping RX BA session from timer Johannes Berg 2010-04-19 11:22 ` Kalle Valo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).