linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nils Radtke <lkml@Think-Future.de>
To: reinette chatre <reinette.chatre@intel.com>
Cc: linux-kernel@vger.kernel.org, linux-wireless@vger.kernel.org
Subject: Re: kernel BUG in iwl-agn-rs.c:2076, WAS: iwlagn + some accesspoint == hardlock
Date: Mon, 31 May 2010 22:12:25 +0200	[thread overview]
Message-ID: <20100531201225.GA6327@localhost> (raw)
In-Reply-To: <1274380408.2091.9272.camel@rchatre-DESK>

[-- Attachment #1: Type: text/plain, Size: 7370 bytes --]


  Hi Reinette,

First off: 
Linux mypole 2.6.33.4 #23 SMP PREEMPT Sat May 15 20:27:33 CEST 2010 i686 GNU/Linux

On Thu 2010-05-20 @ 11-33-28AM -0700, reinette chatre wrote: 
# On Thu, 2010-05-20 at 05:15 -0700, Nils Radtke wrote:
# > # 
# > # To address (1), could you please run with attached debug patch and also
# > # enable rate scaling debugging. That will be "modprobe iwlagn
# > # debug=0x143fff).
# > drivers/net/wireless/iwlwifi/iwl-agn-rs.c: In function ‘rs_collect_tx_data’:
# > drivers/net/wireless/iwlwifi/iwl-agn-rs.c:364: error: ‘priv’ undeclared (first use in this function)
# > drivers/net/wireless/iwlwifi/iwl-agn-rs.c:364: error: (Each undeclared identifier is reported only once
# > drivers/net/wireless/iwlwifi/iwl-agn-rs.c:364: error: for each function it appears in.)
# > 
# > This happens when compiling w/ the patch applied cleanly against .33.3
# > I'll try to fix it, then conduct the field test.
# 
# Sorry ... and thanks.
# 
# >  For the latter, do 
# > you need the same kind of log as for the previous one? 
# The goal of this patch is to find the reason behind the error
# "expected_tpt should have been calculated by now". From what I
# understand you only encountered that in one of your tests, not all. Any
# test you can run to reproduce that error will be welcome. 
Yep. This expected_tpt stuff happens IRC (see mails for certainty, though) 
exclusively on site B.

Ok, so that's the goal. What could I do to advance us an additional step
at a time avoiding pushing hundreds of kilobytes of logs uplink?

To reproduce it, I have to be on site B and just start surfing. It's a matter
of _short_ time until the driver hits the wall.

I'm suspecting still getting those expected_tpt thingies (see below). Though today
it seems I've been lucky, it (maybe) only happened once (or never today).

# Thinking about your question more ... I believe your previous debug logs
# were created with debug flag 0x43fff. For this iteration, please use
# debug flag 0x143fff.

If you insist on a test using .33.3, I will do so but that will have to wait.

Meanwhile I used this patch for .34 to fix the build err from your dbg patch:

 drivers/net/wireless/iwlwifi/iwl-agn-rs.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Index: linux/drivers/net/wireless/iwlwifi/iwl-agn-rs.c
===================================================================
--- linux.orig/drivers/net/wireless/iwlwifi/iwl-agn-rs.c        2010-05-31 13:21:48.000000000 +0200
+++ linux/drivers/net/wireless/iwlwifi/iwl-agn-rs.c     2010-05-31 13:25:52.000000000 +0200
@@ -365,7 +365,8 @@
  * packets.
  */
 static int rs_collect_tx_data(struct iwl_scale_tbl_info *tbl,
-                             int scale_index, int attempts, int successes)
+                             int scale_index, int attempts, int successes,
+                             struct iwl_priv *priv)
 {
        struct iwl_rate_scale_data *window = NULL;
        static const u64 mask = (((u64)1) << (IWL_RATE_MAX_WINDOW - 1));
@@ -868,7 +869,7 @@
                                &rs_index);
                rs_collect_tx_data(curr_tbl, rs_index,
                                   info->status.ampdu_ack_len,
-                                  info->status.ampdu_ack_map);
+                                  info->status.ampdu_ack_map, priv);
 
                /* Update success/fail counts if not searching for new mode */
                if (lq_sta->stay_in_tbl) {
@@ -902,7 +903,7 @@
                        else
                                continue;
                        rs_collect_tx_data(tmp_tbl, rs_index, 1,
-                                          i < retries ? 0 : legacy_success);
+                                          i < retries ? 0 : legacy_success, priv);
                }
 
                /* Update success/fail counts if not searching for new mode */
 

Test conducted using debug flag 0x143fff.

Test on Site B resulted immediately in a hard crash upon resume, that is I got to X,
activated wireless, echoed 0x143fff to the sysfile and that was it.
But I didn't count on that one so I had no console to watch for it. So right now 
I got no clue what caused the crash, there's nothing in the logs, of course..

Next thing was to redo test on site B, but this time I switched to console beforehand.
Sure enough, this time nothing happened..

Log appended as bz2.

Hopefully lines like the following don't indicate essential info getting dropped:
May 31 17:23:17 localhost kernel: [91800.091565] net_ratelimit: 70 callbacks suppressed

This line indicates the first timestamp _after_ the crash:
May 31 17:35:19 localhost kernel: [   69.488456]

The crash happened after site A and on site B. Just arrived, opened lid and *crash*.

I noticed in iwl-agn-rs.c:2080:
  BUG_ON(window->average_tpt != ((window->success_ratio *
        tbl->expected_tpt[index] + 64) / 128));
Could that be again the point that hit me today when the machine crashed once?
Would you mind changing this into a milder WARN? That way I wouldn't hit the wall 
that hard. And I would notice it anyway while skimming the logs as we still are on the
hunt. It's more maintainable if it's a WARN in the src instead of me patching it w/ any
update..

Wasn't this BUG_ON a WARNING in .33.3? (didn't check..)

The dbg log contains all types of the errs happening here:

  - "deauthenticated" msgs w/ "reason 2"

  - "request scan called when driver not ready"

  - iwl_tx_agg_start

  - of course the "expected_tpt should have been[..]" don't show up anymore, the source
    has no more WARN regarding this..

And the rest of the 9Mb dbg log..

Could you tell me a bit about your idea of how to track those down? Maybe we can
speed things a little up. The logging and testing stuff takes a lot of time and 
most of it I have no clue why that might help or what the goal is..
Are you online in some #chan? 
  
# > # Regarding (2): This is a common issue in busy environments where AP
# > # decides to deathenticate station after it does not receive an ack for
# > # data sent after a few retries. Was this test done in busy environment?
# > Both. This happens in busy environment as well as in an idle one. Can't tell
# > yet whether there're more of those msgs in busy env. I start to feel against 
# > Cisco APs..
# 
# I don't know ... perhaps these APs have been set up to be strict wrt
# delays.
Sure. May well be.. For sure I'm no fan of this config/policy.. 
But wait. I noticed no such delays or disassociations using another notebook.
Well, seems I got to investigate in this, too.
 
# > # Regarding (3): Seems like driver is getting a request to scan after a
# > # request to remove interface. I am still inquiring about this.
# > Probably due to me switching of via RF_KILLSWITCH. But anyway I assume this
# > msg should not happen..
# 
# Absolutely. What are the exact steps you run when you encounter this
# issue?
Nothing particular. I.e. after the tests conducted toggle the hw kill switch. Then
a script gets called via acpi callback that in turn 1) kills the respective dhclient3, 2)
terminates wpa_supplicant and 3) removes the wifi modules (for power saving).

        Cheers,

                    Nils


[-- Attachment #2: 2010-05-31_iwlwifi_dbg_filter.bz2 --]
[-- Type: application/octet-stream, Size: 365121 bytes --]

  reply	other threads:[~2010-05-31 20:12 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-05-03 19:17 kernel BUG in iwl-agn-rs.c:2076, WAS: iwlagn + some accesspoint == hardlock NilsRadtkelkml
2010-05-03 19:22 ` John W. Linville
2010-05-06  9:14   ` Christian Borntraeger
2010-05-06 16:28     ` reinette chatre
2010-05-11 15:50       ` Christian Borntraeger
2010-05-11 17:21         ` reinette chatre
2010-05-12 15:18           ` Christian Borntraeger
2010-05-10 18:36   ` Nils Radtke
2010-05-10 23:32     ` reinette chatre
2010-05-12 14:39       ` Nils Radtke
2010-05-12 23:14         ` reinette chatre
2010-05-13 10:34           ` Nils Radtke
2010-05-13 11:32           ` Nils Radtke
2010-05-13 16:31             ` reinette chatre
2010-05-14 17:45               ` Nils Radtke
2010-05-13 15:05           ` Nils Radtke
2010-05-17 23:19             ` reinette chatre
2010-05-20 12:15               ` Nils Radtke
2010-05-20 18:33                 ` reinette chatre
2010-05-31 20:12                   ` Nils Radtke [this message]
2010-06-02 17:51                     ` reinette chatre
2010-06-04 16:57                       ` Nils Radtke
2010-06-08 17:46                         ` reinette chatre
2010-06-10 14:22                           ` Nils Radtke
2010-06-10 16:19                             ` reinette chatre
2010-05-20 12:31               ` Nils Radtke
2010-05-20 18:26                 ` reinette chatre
2010-05-20 22:30                 ` David Miller
  -- strict thread matches above, loose matches on Subject: below --
2010-05-11  9:41 Nils Radtke
2010-04-29 18:26 kernel BUG in iwl-agn-rs.c:2076,was: iwlagn + some accesspoint = hardlock lkml

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100531201225.GA6327@localhost \
    --to=lkml@think-future.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=reinette.chatre@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).