linux-wireless.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Help debugging iwldvm / ath10k stalls
@ 2014-05-25 17:23 Andy Lutomirski
  2014-05-26  5:06 ` Andy Lutomirski
  0 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2014-05-25 17:23 UTC (permalink / raw)
  To: Linux Wireless List

I have an old iwldvm station (Intel(R) Centrino(R) Ultimate-N 6300
AGN, REV=0x74) and a new ath10k AP (QCA9880-BR4A, I think, as found in
the TP-Link Archer C7 vs).  The former is Fedora's 3.14 kernel and the
latter is OpenWRT trunk from a few days ago.

Every now and then, my connection drops out.  From wireshark on the
station, it looks like traffic from the AP to the STA is coming
through but traffic from the STA to the AP is not.  I could be wrong
here, though.  Nothing is logged in either machine's kernel log.
After 5 seconds to a couple minutes, everything starts working again.

A couple of Apple devices using the same radio on the same AP seem to work fine.

Any advice for debugging this?  I don't know where to start.

Thanks,
Andy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-05-25 17:23 Help debugging iwldvm / ath10k stalls Andy Lutomirski
@ 2014-05-26  5:06 ` Andy Lutomirski
  2014-05-26  5:39   ` Emmanuel Grumbach
  0 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2014-05-26  5:06 UTC (permalink / raw)
  To: Linux Wireless List

[-- Attachment #1: Type: text/plain, Size: 1244 bytes --]

On Sun, May 25, 2014 at 10:23 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> I have an old iwldvm station (Intel(R) Centrino(R) Ultimate-N 6300
> AGN, REV=0x74) and a new ath10k AP (QCA9880-BR4A, I think, as found in
> the TP-Link Archer C7 vs).  The former is Fedora's 3.14 kernel and the
> latter is OpenWRT trunk from a few days ago.
>
> Every now and then, my connection drops out.  From wireshark on the
> station, it looks like traffic from the AP to the STA is coming
> through but traffic from the STA to the AP is not.  I could be wrong
> here, though.  Nothing is logged in either machine's kernel log.
> After 5 seconds to a couple minutes, everything starts working again.
>
> A couple of Apple devices using the same radio on the same AP seem to work fine.
>
> Any advice for debugging this?  I don't know where to start.

I caught one of the stalls with iwlwifi debugging on.  I've attached
the dmesg.  The stall ended ten seconds or so before the end of this
trace.  Is this an indication that something's wrong with iwlwifi?

[151258.096091] iwlwifi 0000:03:00.0: U iwlagn_good_plcp_health plcp health thre
shold 50 delta 96 msecs 103
[151258.096095] iwlwifi 0000:03:00.0: U iwl_force_rf_reset perform radio reset.


--Andy

[-- Attachment #2: iwl_stall.txt.xz --]
[-- Type: application/x-xz, Size: 15560 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-05-26  5:06 ` Andy Lutomirski
@ 2014-05-26  5:39   ` Emmanuel Grumbach
  2014-05-26  6:11     ` Andy Lutomirski
  0 siblings, 1 reply; 25+ messages in thread
From: Emmanuel Grumbach @ 2014-05-26  5:39 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Linux Wireless List

On Mon, May 26, 2014 at 8:06 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Sun, May 25, 2014 at 10:23 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> I have an old iwldvm station (Intel(R) Centrino(R) Ultimate-N 6300
>> AGN, REV=0x74) and a new ath10k AP (QCA9880-BR4A, I think, as found in
>> the TP-Link Archer C7 vs).  The former is Fedora's 3.14 kernel and the
>> latter is OpenWRT trunk from a few days ago.
>>
>> Every now and then, my connection drops out.  From wireshark on the
>> station, it looks like traffic from the AP to the STA is coming
>> through but traffic from the STA to the AP is not.  I could be wrong
>> here, though.  Nothing is logged in either machine's kernel log.
>> After 5 seconds to a couple minutes, everything starts working again.
>>
>> A couple of Apple devices using the same radio on the same AP seem to work fine.
>>
>> Any advice for debugging this?  I don't know where to start.
>
> I caught one of the stalls with iwlwifi debugging on.  I've attached
> the dmesg.  The stall ended ten seconds or so before the end of this
> trace.  Is this an indication that something's wrong with iwlwifi?
>
> [151258.096091] iwlwifi 0000:03:00.0: U iwlagn_good_plcp_health plcp health thre
> shold 50 delta 96 msecs 103
> [151258.096095] iwlwifi 0000:03:00.0: U iwl_force_rf_reset perform radio reset.
>
Can you please disable powersave?
It seems that it should be disabled by default, but you enabled it?
I might be confused though... Looking again at the code...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-05-26  5:39   ` Emmanuel Grumbach
@ 2014-05-26  6:11     ` Andy Lutomirski
  2014-05-26  8:29       ` Emmanuel Grumbach
  0 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2014-05-26  6:11 UTC (permalink / raw)
  To: Emmanuel Grumbach; +Cc: Linux Wireless List

On Sun, May 25, 2014 at 10:39 PM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
> On Mon, May 26, 2014 at 8:06 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Sun, May 25, 2014 at 10:23 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> I have an old iwldvm station (Intel(R) Centrino(R) Ultimate-N 6300
>>> AGN, REV=0x74) and a new ath10k AP (QCA9880-BR4A, I think, as found in
>>> the TP-Link Archer C7 vs).  The former is Fedora's 3.14 kernel and the
>>> latter is OpenWRT trunk from a few days ago.
>>>
>>> Every now and then, my connection drops out.  From wireshark on the
>>> station, it looks like traffic from the AP to the STA is coming
>>> through but traffic from the STA to the AP is not.  I could be wrong
>>> here, though.  Nothing is logged in either machine's kernel log.
>>> After 5 seconds to a couple minutes, everything starts working again.
>>>
>>> A couple of Apple devices using the same radio on the same AP seem to work fine.
>>>
>>> Any advice for debugging this?  I don't know where to start.
>>
>> I caught one of the stalls with iwlwifi debugging on.  I've attached
>> the dmesg.  The stall ended ten seconds or so before the end of this
>> trace.  Is this an indication that something's wrong with iwlwifi?
>>
>> [151258.096091] iwlwifi 0000:03:00.0: U iwlagn_good_plcp_health plcp health thre
>> shold 50 delta 96 msecs 103
>> [151258.096095] iwlwifi 0000:03:00.0: U iwl_force_rf_reset perform radio reset.
>>
> Can you please disable powersave?
> It seems that it should be disabled by default, but you enabled it?
> I might be confused though... Looking again at the code...

This?

$ cat /sys/module/iwlwifi/parameters/power_save
N

I have pcie_aspm.policy=powersave, but I don't think that should
matter.  Or am I missing something?

--Andy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-05-26  6:11     ` Andy Lutomirski
@ 2014-05-26  8:29       ` Emmanuel Grumbach
  2014-05-26  8:39         ` Emmanuel Grumbach
  0 siblings, 1 reply; 25+ messages in thread
From: Emmanuel Grumbach @ 2014-05-26  8:29 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Linux Wireless List

>>>>
>>>> Every now and then, my connection drops out.  From wireshark on the
>>>> station, it looks like traffic from the AP to the STA is coming
>>>> through but traffic from the STA to the AP is not.  I could be wrong
>>>> here, though.  Nothing is logged in either machine's kernel log.
>>>> After 5 seconds to a couple minutes, everything starts working again.
>>>>
>>>> A couple of Apple devices using the same radio on the same AP seem to work fine.
>>>>
>>>> Any advice for debugging this?  I don't know where to start.
>>>
>>> I caught one of the stalls with iwlwifi debugging on.  I've attached
>>> the dmesg.  The stall ended ten seconds or so before the end of this
>>> trace.  Is this an indication that something's wrong with iwlwifi?
>>>
>>> [151258.096091] iwlwifi 0000:03:00.0: U iwlagn_good_plcp_health plcp health thre
>>> shold 50 delta 96 msecs 103
>>> [151258.096095] iwlwifi 0000:03:00.0: U iwl_force_rf_reset perform radio reset.
>>>
>> Can you please disable powersave?
>> It seems that it should be disabled by default, but you enabled it?
>> I might be confused though... Looking again at the code...
>
> This?
>
> $ cat /sys/module/iwlwifi/parameters/power_save
> N
>
> I have pcie_aspm.policy=powersave, but I don't think that should
> matter.  Or am I missing something?

That should be ok.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-05-26  8:29       ` Emmanuel Grumbach
@ 2014-05-26  8:39         ` Emmanuel Grumbach
  2014-05-26 17:28           ` Andy Lutomirski
  0 siblings, 1 reply; 25+ messages in thread
From: Emmanuel Grumbach @ 2014-05-26  8:39 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Linux Wireless List

On Mon, May 26, 2014 at 11:29 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>>>>>
>>>>> Every now and then, my connection drops out.  From wireshark on the
>>>>> station, it looks like traffic from the AP to the STA is coming
>>>>> through but traffic from the STA to the AP is not.  I could be wrong
>>>>> here, though.  Nothing is logged in either machine's kernel log.
>>>>> After 5 seconds to a couple minutes, everything starts working again.
>>>>>
>>>>> A couple of Apple devices using the same radio on the same AP seem to work fine.
>>>>>
>>>>> Any advice for debugging this?  I don't know where to start.
>>>>
>>>> I caught one of the stalls with iwlwifi debugging on.  I've attached
>>>> the dmesg.  The stall ended ten seconds or so before the end of this
>>>> trace.  Is this an indication that something's wrong with iwlwifi?
>>>>
>>>> [151258.096091] iwlwifi 0000:03:00.0: U iwlagn_good_plcp_health plcp health thre
>>>> shold 50 delta 96 msecs 103
>>>> [151258.096095] iwlwifi 0000:03:00.0: U iwl_force_rf_reset perform radio reset.
>>>>
>>> Can you please disable powersave?
>>> It seems that it should be disabled by default, but you enabled it?
>>> I might be confused though... Looking again at the code...
>>
>> This?
>>
>> $ cat /sys/module/iwlwifi/parameters/power_save
>> N
>>
>> I have pcie_aspm.policy=powersave, but I don't think that should
>> matter.  Or am I missing something?
>
> That should be ok.

Does this help?

diff --git a/drivers/net/wireless/iwlwifi/dvm/power.c
b/drivers/net/wireless/iwlwifi/dvm/power.c
index f2c1439..e5a6b70 100644
--- a/drivers/net/wireless/iwlwifi/dvm/power.c
+++ b/drivers/net/wireless/iwlwifi/dvm/power.c
@@ -288,6 +288,9 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
        bool enabled = priv->hw->conf.flags & IEEE80211_CONF_PS;
        int dtimper;

+       iwl_power_sleep_cam_cmd(priv, cmd);
+       return;
+
        dtimper = priv->hw->conf.ps_dtim_period ?: 1;

        if (priv->wowlan)

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-05-26  8:39         ` Emmanuel Grumbach
@ 2014-05-26 17:28           ` Andy Lutomirski
  2014-05-26 18:00             ` Emmanuel Grumbach
  0 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2014-05-26 17:28 UTC (permalink / raw)
  To: Emmanuel Grumbach; +Cc: Linux Wireless List

On Mon, May 26, 2014 at 1:39 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
> On Mon, May 26, 2014 at 11:29 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>>>>>>
>>>>>> Every now and then, my connection drops out.  From wireshark on the
>>>>>> station, it looks like traffic from the AP to the STA is coming
>>>>>> through but traffic from the STA to the AP is not.  I could be wrong
>>>>>> here, though.  Nothing is logged in either machine's kernel log.
>>>>>> After 5 seconds to a couple minutes, everything starts working again.
>>>>>>
>>>>>> A couple of Apple devices using the same radio on the same AP seem to work fine.
>>>>>>
>>>>>> Any advice for debugging this?  I don't know where to start.
>>>>>
>>>>> I caught one of the stalls with iwlwifi debugging on.  I've attached
>>>>> the dmesg.  The stall ended ten seconds or so before the end of this
>>>>> trace.  Is this an indication that something's wrong with iwlwifi?
>>>>>
>>>>> [151258.096091] iwlwifi 0000:03:00.0: U iwlagn_good_plcp_health plcp health thre
>>>>> shold 50 delta 96 msecs 103
>>>>> [151258.096095] iwlwifi 0000:03:00.0: U iwl_force_rf_reset perform radio reset.
>>>>>
>>>> Can you please disable powersave?
>>>> It seems that it should be disabled by default, but you enabled it?
>>>> I might be confused though... Looking again at the code...
>>>
>>> This?
>>>
>>> $ cat /sys/module/iwlwifi/parameters/power_save
>>> N
>>>
>>> I have pcie_aspm.policy=powersave, but I don't think that should
>>> matter.  Or am I missing something?
>>
>> That should be ok.
>
> Does this help?
>
> diff --git a/drivers/net/wireless/iwlwifi/dvm/power.c
> b/drivers/net/wireless/iwlwifi/dvm/power.c
> index f2c1439..e5a6b70 100644
> --- a/drivers/net/wireless/iwlwifi/dvm/power.c
> +++ b/drivers/net/wireless/iwlwifi/dvm/power.c
> @@ -288,6 +288,9 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
>         bool enabled = priv->hw->conf.flags & IEEE80211_CONF_PS;
>         int dtimper;
>
> +       iwl_power_sleep_cam_cmd(priv, cmd);
> +       return;
> +
>         dtimper = priv->hw->conf.ps_dtim_period ?: 1;
>
>         if (priv->wowlan)

This patch has survived for about 15 minutes on 3.15-rc.  It certainly
has some effect: pinging from the AP to the STA now takes a consistent
~3.8ms instead of taking varying amounts of time between 5 and 200 ms
or so.

I wonder if this is something 802.11n/802.11ac-specific?  I remember
having all kinds of problems with 802.11n on this laptop that were
resolved by turning off 11n or turning off power saving.

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-05-26 17:28           ` Andy Lutomirski
@ 2014-05-26 18:00             ` Emmanuel Grumbach
  2014-05-26 18:08               ` Andy Lutomirski
  0 siblings, 1 reply; 25+ messages in thread
From: Emmanuel Grumbach @ 2014-05-26 18:00 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Linux Wireless List

>>>>>>> Every now and then, my connection drops out.  From wireshark on the
>>>>>>> station, it looks like traffic from the AP to the STA is coming
>>>>>>> through but traffic from the STA to the AP is not.  I could be wrong
>>>>>>> here, though.  Nothing is logged in either machine's kernel log.
>>>>>>> After 5 seconds to a couple minutes, everything starts working again.
>>>>>>>
>>>>>>> A couple of Apple devices using the same radio on the same AP seem to work fine.
>>>>>>>
>>>>>>> Any advice for debugging this?  I don't know where to start.
>>>>>>
>>>>>> I caught one of the stalls with iwlwifi debugging on.  I've attached
>>>>>> the dmesg.  The stall ended ten seconds or so before the end of this
>>>>>> trace.  Is this an indication that something's wrong with iwlwifi?
>>>>>>
>>>>>> [151258.096091] iwlwifi 0000:03:00.0: U iwlagn_good_plcp_health plcp health thre
>>>>>> shold 50 delta 96 msecs 103
>>>>>> [151258.096095] iwlwifi 0000:03:00.0: U iwl_force_rf_reset perform radio reset.
>>>>>>
>>>>> Can you please disable powersave?
>>>>> It seems that it should be disabled by default, but you enabled it?
>>>>> I might be confused though... Looking again at the code...
>>>>
>>>> This?
>>>>
>>>> $ cat /sys/module/iwlwifi/parameters/power_save
>>>> N
>>>>
>>>> I have pcie_aspm.policy=powersave, but I don't think that should
>>>> matter.  Or am I missing something?
>>>
>>> That should be ok.
>>
>> Does this help?
>>
>> diff --git a/drivers/net/wireless/iwlwifi/dvm/power.c
>> b/drivers/net/wireless/iwlwifi/dvm/power.c
>> index f2c1439..e5a6b70 100644
>> --- a/drivers/net/wireless/iwlwifi/dvm/power.c
>> +++ b/drivers/net/wireless/iwlwifi/dvm/power.c
>> @@ -288,6 +288,9 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
>>         bool enabled = priv->hw->conf.flags & IEEE80211_CONF_PS;
>>         int dtimper;
>>
>> +       iwl_power_sleep_cam_cmd(priv, cmd);
>> +       return;
>> +
>>         dtimper = priv->hw->conf.ps_dtim_period ?: 1;
>>
>>         if (priv->wowlan)
>
> This patch has survived for about 15 minutes on 3.15-rc.  It certainly
> has some effect: pinging from the AP to the STA now takes a consistent
> ~3.8ms instead of taking varying amounts of time between 5 and 200 ms
> or so.
>
> I wonder if this is something 802.11n/802.11ac-specific?  I remember
> having all kinds of problems with 802.11n on this laptop that were
> resolved by turning off 11n or turning off power saving.
>

well... my patch really turns off power saving. The hard way.
But since you are having issues - I guess I'll post it and disable
power save for these devices. Note that today most of the power save
features are disabled anyway.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-05-26 18:00             ` Emmanuel Grumbach
@ 2014-05-26 18:08               ` Andy Lutomirski
  2014-05-26 18:13                 ` Emmanuel Grumbach
  0 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2014-05-26 18:08 UTC (permalink / raw)
  To: Emmanuel Grumbach; +Cc: Linux Wireless List

On Mon, May 26, 2014 at 11:00 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>>>>>>>> Every now and then, my connection drops out.  From wireshark on the
>>>>>>>> station, it looks like traffic from the AP to the STA is coming
>>>>>>>> through but traffic from the STA to the AP is not.  I could be wrong
>>>>>>>> here, though.  Nothing is logged in either machine's kernel log.
>>>>>>>> After 5 seconds to a couple minutes, everything starts working again.
>>>>>>>>
>>>>>>>> A couple of Apple devices using the same radio on the same AP seem to work fine.
>>>>>>>>
>>>>>>>> Any advice for debugging this?  I don't know where to start.
>>>>>>>
>>>>>>> I caught one of the stalls with iwlwifi debugging on.  I've attached
>>>>>>> the dmesg.  The stall ended ten seconds or so before the end of this
>>>>>>> trace.  Is this an indication that something's wrong with iwlwifi?
>>>>>>>
>>>>>>> [151258.096091] iwlwifi 0000:03:00.0: U iwlagn_good_plcp_health plcp health thre
>>>>>>> shold 50 delta 96 msecs 103
>>>>>>> [151258.096095] iwlwifi 0000:03:00.0: U iwl_force_rf_reset perform radio reset.
>>>>>>>
>>>>>> Can you please disable powersave?
>>>>>> It seems that it should be disabled by default, but you enabled it?
>>>>>> I might be confused though... Looking again at the code...
>>>>>
>>>>> This?
>>>>>
>>>>> $ cat /sys/module/iwlwifi/parameters/power_save
>>>>> N
>>>>>
>>>>> I have pcie_aspm.policy=powersave, but I don't think that should
>>>>> matter.  Or am I missing something?
>>>>
>>>> That should be ok.
>>>
>>> Does this help?
>>>
>>> diff --git a/drivers/net/wireless/iwlwifi/dvm/power.c
>>> b/drivers/net/wireless/iwlwifi/dvm/power.c
>>> index f2c1439..e5a6b70 100644
>>> --- a/drivers/net/wireless/iwlwifi/dvm/power.c
>>> +++ b/drivers/net/wireless/iwlwifi/dvm/power.c
>>> @@ -288,6 +288,9 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
>>>         bool enabled = priv->hw->conf.flags & IEEE80211_CONF_PS;
>>>         int dtimper;
>>>
>>> +       iwl_power_sleep_cam_cmd(priv, cmd);
>>> +       return;
>>> +
>>>         dtimper = priv->hw->conf.ps_dtim_period ?: 1;
>>>
>>>         if (priv->wowlan)
>>
>> This patch has survived for about 15 minutes on 3.15-rc.  It certainly
>> has some effect: pinging from the AP to the STA now takes a consistent
>> ~3.8ms instead of taking varying amounts of time between 5 and 200 ms
>> or so.
>>
>> I wonder if this is something 802.11n/802.11ac-specific?  I remember
>> having all kinds of problems with 802.11n on this laptop that were
>> resolved by turning off 11n or turning off power saving.
>>
>
> well... my patch really turns off power saving. The hard way.
> But since you are having issues - I guess I'll post it and disable
> power save for these devices. Note that today most of the power save
> features are disabled anyway.

Would it make sense to do this only when connected to a VHT-supporting
AP or whatever AP feature is probably triggering the problem?

*sigh* I'd just plug in a new PCIe wireless card, except that I think
that my evil BIOS will refuse to boot if I do that.

--Andy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-05-26 18:08               ` Andy Lutomirski
@ 2014-05-26 18:13                 ` Emmanuel Grumbach
  2014-05-26 18:47                   ` Andy Lutomirski
  0 siblings, 1 reply; 25+ messages in thread
From: Emmanuel Grumbach @ 2014-05-26 18:13 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Linux Wireless List

>>>>
>>>> diff --git a/drivers/net/wireless/iwlwifi/dvm/power.c
>>>> b/drivers/net/wireless/iwlwifi/dvm/power.c
>>>> index f2c1439..e5a6b70 100644
>>>> --- a/drivers/net/wireless/iwlwifi/dvm/power.c
>>>> +++ b/drivers/net/wireless/iwlwifi/dvm/power.c
>>>> @@ -288,6 +288,9 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
>>>>         bool enabled = priv->hw->conf.flags & IEEE80211_CONF_PS;
>>>>         int dtimper;
>>>>
>>>> +       iwl_power_sleep_cam_cmd(priv, cmd);
>>>> +       return;
>>>> +
>>>>         dtimper = priv->hw->conf.ps_dtim_period ?: 1;
>>>>
>>>>         if (priv->wowlan)
>>>
>>> This patch has survived for about 15 minutes on 3.15-rc.  It certainly
>>> has some effect: pinging from the AP to the STA now takes a consistent
>>> ~3.8ms instead of taking varying amounts of time between 5 and 200 ms
>>> or so.
>>>
>>> I wonder if this is something 802.11n/802.11ac-specific?  I remember
>>> having all kinds of problems with 802.11n on this laptop that were
>>> resolved by turning off 11n or turning off power saving.
>>>
>>
>> well... my patch really turns off power saving. The hard way.
>> But since you are having issues - I guess I'll post it and disable
>> power save for these devices. Note that today most of the power save
>> features are disabled anyway.
>
> Would it make sense to do this only when connected to a VHT-supporting
> AP or whatever AP feature is probably triggering the problem?

I am not sure it is related to some "VHT / whatever" feature. From
your logs, I'd say this bug could have happened with any AP. It is
down there in the PCI layer.

>
> *sigh* I'd just plug in a new PCIe wireless card, except that I think
> that my evil BIOS will refuse to boot if I do that.
>
I can recommend 7260 :)
Before you do that, can you check this one?

diff --git a/drivers/net/wireless/iwlwifi/dvm/power.c
b/drivers/net/wireless/iwlwifi/dvm/power.c
index f2c1439..047112c 100644
--- a/drivers/net/wireless/iwlwifi/dvm/power.c
+++ b/drivers/net/wireless/iwlwifi/dvm/power.c
@@ -292,6 +292,8 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,

        if (priv->wowlan)
                iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, dtimper);
+       if (!enabled)
+               iwl_power_sleep_cam_cmd(priv, cmd);
        else if (!priv->lib->no_idle_support &&
                 priv->hw->conf.flags & IEEE80211_CONF_IDLE)
                iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, 20);
@@ -299,9 +301,7 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
                /* in thermal throttling low power state */
                iwl_static_sleep_cmd(priv, cmd,
                    iwl_tt_current_power_mode(priv), dtimper);
-       } else if (!enabled)
-               iwl_power_sleep_cam_cmd(priv, cmd);
-       else if (priv->power_data.debug_sleep_level_override >= 0)
+       } else if (priv->power_data.debug_sleep_level_override >= 0)
                iwl_static_sleep_cmd(priv, cmd,

priv->power_data.debug_sleep_level_override,
                                     dtimper);

This is the final version I want to send.

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-05-26 18:13                 ` Emmanuel Grumbach
@ 2014-05-26 18:47                   ` Andy Lutomirski
  2014-05-27 17:09                     ` Andy Lutomirski
  0 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2014-05-26 18:47 UTC (permalink / raw)
  To: Emmanuel Grumbach; +Cc: Linux Wireless List

On Mon, May 26, 2014 at 11:13 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>>>>>
>>>>> diff --git a/drivers/net/wireless/iwlwifi/dvm/power.c
>>>>> b/drivers/net/wireless/iwlwifi/dvm/power.c
>>>>> index f2c1439..e5a6b70 100644
>>>>> --- a/drivers/net/wireless/iwlwifi/dvm/power.c
>>>>> +++ b/drivers/net/wireless/iwlwifi/dvm/power.c
>>>>> @@ -288,6 +288,9 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
>>>>>         bool enabled = priv->hw->conf.flags & IEEE80211_CONF_PS;
>>>>>         int dtimper;
>>>>>
>>>>> +       iwl_power_sleep_cam_cmd(priv, cmd);
>>>>> +       return;
>>>>> +
>>>>>         dtimper = priv->hw->conf.ps_dtim_period ?: 1;
>>>>>
>>>>>         if (priv->wowlan)
>>>>
>>>> This patch has survived for about 15 minutes on 3.15-rc.  It certainly
>>>> has some effect: pinging from the AP to the STA now takes a consistent
>>>> ~3.8ms instead of taking varying amounts of time between 5 and 200 ms
>>>> or so.
>>>>
>>>> I wonder if this is something 802.11n/802.11ac-specific?  I remember
>>>> having all kinds of problems with 802.11n on this laptop that were
>>>> resolved by turning off 11n or turning off power saving.
>>>>
>>>
>>> well... my patch really turns off power saving. The hard way.
>>> But since you are having issues - I guess I'll post it and disable
>>> power save for these devices. Note that today most of the power save
>>> features are disabled anyway.
>>
>> Would it make sense to do this only when connected to a VHT-supporting
>> AP or whatever AP feature is probably triggering the problem?
>
> I am not sure it is related to some "VHT / whatever" feature. From
> your logs, I'd say this bug could have happened with any AP. It is
> down there in the PCI layer.
>
>>
>> *sigh* I'd just plug in a new PCIe wireless card, except that I think
>> that my evil BIOS will refuse to boot if I do that.
>>
> I can recommend 7260 :)

http://www.thinkwiki.org/wiki/Problem_with_unauthorized_MiniPCI_network_card

I'm not sure how I feel about flashing my BIOS with random things off
the internet :(

> Before you do that, can you check this one?
>
> diff --git a/drivers/net/wireless/iwlwifi/dvm/power.c
> b/drivers/net/wireless/iwlwifi/dvm/power.c
> index f2c1439..047112c 100644
> --- a/drivers/net/wireless/iwlwifi/dvm/power.c
> +++ b/drivers/net/wireless/iwlwifi/dvm/power.c
> @@ -292,6 +292,8 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
>
>         if (priv->wowlan)
>                 iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, dtimper);
> +       if (!enabled)
> +               iwl_power_sleep_cam_cmd(priv, cmd);
>         else if (!priv->lib->no_idle_support &&
>                  priv->hw->conf.flags & IEEE80211_CONF_IDLE)
>                 iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, 20);
> @@ -299,9 +301,7 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
>                 /* in thermal throttling low power state */
>                 iwl_static_sleep_cmd(priv, cmd,
>                     iwl_tt_current_power_mode(priv), dtimper);
> -       } else if (!enabled)
> -               iwl_power_sleep_cam_cmd(priv, cmd);
> -       else if (priv->power_data.debug_sleep_level_override >= 0)
> +       } else if (priv->power_data.debug_sleep_level_override >= 0)
>                 iwl_static_sleep_cmd(priv, cmd,
>
> priv->power_data.debug_sleep_level_override,
>                                      dtimper);
>
> This is the final version I want to send.

I'll test it tonight.  I'm away from both the network and the laptop right now.

--Andy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-05-26 18:47                   ` Andy Lutomirski
@ 2014-05-27 17:09                     ` Andy Lutomirski
  2014-05-27 18:01                       ` Andy Lutomirski
  0 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2014-05-27 17:09 UTC (permalink / raw)
  To: Emmanuel Grumbach; +Cc: Linux Wireless List

[-- Attachment #1: Type: text/plain, Size: 4150 bytes --]

On Mon, May 26, 2014 at 11:47 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, May 26, 2014 at 11:13 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>>>>>>
>>>>>> diff --git a/drivers/net/wireless/iwlwifi/dvm/power.c
>>>>>> b/drivers/net/wireless/iwlwifi/dvm/power.c
>>>>>> index f2c1439..e5a6b70 100644
>>>>>> --- a/drivers/net/wireless/iwlwifi/dvm/power.c
>>>>>> +++ b/drivers/net/wireless/iwlwifi/dvm/power.c
>>>>>> @@ -288,6 +288,9 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
>>>>>>         bool enabled = priv->hw->conf.flags & IEEE80211_CONF_PS;
>>>>>>         int dtimper;
>>>>>>
>>>>>> +       iwl_power_sleep_cam_cmd(priv, cmd);
>>>>>> +       return;
>>>>>> +
>>>>>>         dtimper = priv->hw->conf.ps_dtim_period ?: 1;
>>>>>>
>>>>>>         if (priv->wowlan)
>>>>>
>>>>> This patch has survived for about 15 minutes on 3.15-rc.  It certainly
>>>>> has some effect: pinging from the AP to the STA now takes a consistent
>>>>> ~3.8ms instead of taking varying amounts of time between 5 and 200 ms
>>>>> or so.
>>>>>
>>>>> I wonder if this is something 802.11n/802.11ac-specific?  I remember
>>>>> having all kinds of problems with 802.11n on this laptop that were
>>>>> resolved by turning off 11n or turning off power saving.
>>>>>
>>>>
>>>> well... my patch really turns off power saving. The hard way.
>>>> But since you are having issues - I guess I'll post it and disable
>>>> power save for these devices. Note that today most of the power save
>>>> features are disabled anyway.
>>>
>>> Would it make sense to do this only when connected to a VHT-supporting
>>> AP or whatever AP feature is probably triggering the problem?
>>
>> I am not sure it is related to some "VHT / whatever" feature. From
>> your logs, I'd say this bug could have happened with any AP. It is
>> down there in the PCI layer.
>>
>>>
>>> *sigh* I'd just plug in a new PCIe wireless card, except that I think
>>> that my evil BIOS will refuse to boot if I do that.
>>>
>> I can recommend 7260 :)
>
> http://www.thinkwiki.org/wiki/Problem_with_unauthorized_MiniPCI_network_card
>
> I'm not sure how I feel about flashing my BIOS with random things off
> the internet :(
>
>> Before you do that, can you check this one?
>>
>> diff --git a/drivers/net/wireless/iwlwifi/dvm/power.c
>> b/drivers/net/wireless/iwlwifi/dvm/power.c
>> index f2c1439..047112c 100644
>> --- a/drivers/net/wireless/iwlwifi/dvm/power.c
>> +++ b/drivers/net/wireless/iwlwifi/dvm/power.c
>> @@ -292,6 +292,8 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
>>
>>         if (priv->wowlan)
>>                 iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, dtimper);
>> +       if (!enabled)
>> +               iwl_power_sleep_cam_cmd(priv, cmd);
>>         else if (!priv->lib->no_idle_support &&
>>                  priv->hw->conf.flags & IEEE80211_CONF_IDLE)
>>                 iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, 20);
>> @@ -299,9 +301,7 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
>>                 /* in thermal throttling low power state */
>>                 iwl_static_sleep_cmd(priv, cmd,
>>                     iwl_tt_current_power_mode(priv), dtimper);
>> -       } else if (!enabled)
>> -               iwl_power_sleep_cam_cmd(priv, cmd);
>> -       else if (priv->power_data.debug_sleep_level_override >= 0)
>> +       } else if (priv->power_data.debug_sleep_level_override >= 0)
>>                 iwl_static_sleep_cmd(priv, cmd,
>>
>> priv->power_data.debug_sleep_level_override,
>>                                      dtimper);
>>
>> This is the final version I want to send.
>
> I'll test it tonight.  I'm away from both the network and the laptop right now.

No good :(  I got the same issue again, but it still sames rarer than unpatched.

It looks like your patch allows both:

iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, dtimper);

and

iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, 20);

I don't know whether that's a problem.

Also, is this bad?

iwlwifi 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control

lspci seems to think that L1 is enabled (see attached logs).

--Andy

[-- Attachment #2: iwl_stall_2.txt.xz --]
[-- Type: application/x-xz, Size: 8636 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-05-27 17:09                     ` Andy Lutomirski
@ 2014-05-27 18:01                       ` Andy Lutomirski
  2014-05-27 20:30                         ` Emmanuel Grumbach
  0 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2014-05-27 18:01 UTC (permalink / raw)
  To: Emmanuel Grumbach; +Cc: Linux Wireless List

On Tue, May 27, 2014 at 10:09 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, May 26, 2014 at 11:47 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Mon, May 26, 2014 at 11:13 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>>>>>>>
>>>>>>> diff --git a/drivers/net/wireless/iwlwifi/dvm/power.c
>>>>>>> b/drivers/net/wireless/iwlwifi/dvm/power.c
>>>>>>> index f2c1439..e5a6b70 100644
>>>>>>> --- a/drivers/net/wireless/iwlwifi/dvm/power.c
>>>>>>> +++ b/drivers/net/wireless/iwlwifi/dvm/power.c
>>>>>>> @@ -288,6 +288,9 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
>>>>>>>         bool enabled = priv->hw->conf.flags & IEEE80211_CONF_PS;
>>>>>>>         int dtimper;
>>>>>>>
>>>>>>> +       iwl_power_sleep_cam_cmd(priv, cmd);
>>>>>>> +       return;
>>>>>>> +
>>>>>>>         dtimper = priv->hw->conf.ps_dtim_period ?: 1;
>>>>>>>
>>>>>>>         if (priv->wowlan)
>>>>>>
>>>>>> This patch has survived for about 15 minutes on 3.15-rc.  It certainly
>>>>>> has some effect: pinging from the AP to the STA now takes a consistent
>>>>>> ~3.8ms instead of taking varying amounts of time between 5 and 200 ms
>>>>>> or so.
>>>>>>
>>>>>> I wonder if this is something 802.11n/802.11ac-specific?  I remember
>>>>>> having all kinds of problems with 802.11n on this laptop that were
>>>>>> resolved by turning off 11n or turning off power saving.
>>>>>>
>>>>>
>>>>> well... my patch really turns off power saving. The hard way.
>>>>> But since you are having issues - I guess I'll post it and disable
>>>>> power save for these devices. Note that today most of the power save
>>>>> features are disabled anyway.
>>>>
>>>> Would it make sense to do this only when connected to a VHT-supporting
>>>> AP or whatever AP feature is probably triggering the problem?
>>>
>>> I am not sure it is related to some "VHT / whatever" feature. From
>>> your logs, I'd say this bug could have happened with any AP. It is
>>> down there in the PCI layer.
>>>
>>>>
>>>> *sigh* I'd just plug in a new PCIe wireless card, except that I think
>>>> that my evil BIOS will refuse to boot if I do that.
>>>>
>>> I can recommend 7260 :)
>>
>> http://www.thinkwiki.org/wiki/Problem_with_unauthorized_MiniPCI_network_card
>>
>> I'm not sure how I feel about flashing my BIOS with random things off
>> the internet :(
>>
>>> Before you do that, can you check this one?
>>>
>>> diff --git a/drivers/net/wireless/iwlwifi/dvm/power.c
>>> b/drivers/net/wireless/iwlwifi/dvm/power.c
>>> index f2c1439..047112c 100644
>>> --- a/drivers/net/wireless/iwlwifi/dvm/power.c
>>> +++ b/drivers/net/wireless/iwlwifi/dvm/power.c
>>> @@ -292,6 +292,8 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
>>>
>>>         if (priv->wowlan)
>>>                 iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, dtimper);
>>> +       if (!enabled)
>>> +               iwl_power_sleep_cam_cmd(priv, cmd);
>>>         else if (!priv->lib->no_idle_support &&
>>>                  priv->hw->conf.flags & IEEE80211_CONF_IDLE)
>>>                 iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, 20);
>>> @@ -299,9 +301,7 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
>>>                 /* in thermal throttling low power state */
>>>                 iwl_static_sleep_cmd(priv, cmd,
>>>                     iwl_tt_current_power_mode(priv), dtimper);
>>> -       } else if (!enabled)
>>> -               iwl_power_sleep_cam_cmd(priv, cmd);
>>> -       else if (priv->power_data.debug_sleep_level_override >= 0)
>>> +       } else if (priv->power_data.debug_sleep_level_override >= 0)
>>>                 iwl_static_sleep_cmd(priv, cmd,
>>>
>>> priv->power_data.debug_sleep_level_override,
>>>                                      dtimper);
>>>
>>> This is the final version I want to send.
>>
>> I'll test it tonight.  I'm away from both the network and the laptop right now.
>
> No good :(  I got the same issue again, but it still sames rarer than unpatched.
>
> It looks like your patch allows both:
>
> iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, dtimper);
>
> and
>
> iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, 20);
>
> I don't know whether that's a problem.
>
> Also, is this bad?
>
> iwlwifi 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
>
> lspci seems to think that L1 is enabled (see attached logs).
>

This seems surprisingly consistent: on first boot, wireless is okay w/
this AP, but after suspend/resume, it's unreliable.  I could be wrong
(the unreliability isn't *that* consistent), but I've just seen this a
couple more times.

Using pcie_aspm=force to forcibly disable L1 makes no difference.

Is there possibly a driver bug that causes some workaround to be
applied on boot but not on resume?  Or is BIOS doing something
different?

--Andy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-05-27 18:01                       ` Andy Lutomirski
@ 2014-05-27 20:30                         ` Emmanuel Grumbach
  2014-05-28  3:07                           ` Andy Lutomirski
  0 siblings, 1 reply; 25+ messages in thread
From: Emmanuel Grumbach @ 2014-05-27 20:30 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Linux Wireless List

>>>>>>>> diff --git a/drivers/net/wireless/iwlwifi/dvm/power.c
>>>>>>>> b/drivers/net/wireless/iwlwifi/dvm/power.c
>>>>>>>> index f2c1439..e5a6b70 100644
>>>>>>>> --- a/drivers/net/wireless/iwlwifi/dvm/power.c
>>>>>>>> +++ b/drivers/net/wireless/iwlwifi/dvm/power.c
>>>>>>>> @@ -288,6 +288,9 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
>>>>>>>>         bool enabled = priv->hw->conf.flags & IEEE80211_CONF_PS;
>>>>>>>>         int dtimper;
>>>>>>>>
>>>>>>>> +       iwl_power_sleep_cam_cmd(priv, cmd);
>>>>>>>> +       return;
>>>>>>>> +
>>>>>>>>         dtimper = priv->hw->conf.ps_dtim_period ?: 1;
>>>>>>>>
>>>>>>>>         if (priv->wowlan)
>>>>>>>
>>>>>>> This patch has survived for about 15 minutes on 3.15-rc.  It certainly
>>>>>>> has some effect: pinging from the AP to the STA now takes a consistent
>>>>>>> ~3.8ms instead of taking varying amounts of time between 5 and 200 ms
>>>>>>> or so.
>>>>>>>
>>>>>>> I wonder if this is something 802.11n/802.11ac-specific?  I remember
>>>>>>> having all kinds of problems with 802.11n on this laptop that were
>>>>>>> resolved by turning off 11n or turning off power saving.
>>>>>>>
>>>>>>
>>>>>> well... my patch really turns off power saving. The hard way.
>>>>>> But since you are having issues - I guess I'll post it and disable
>>>>>> power save for these devices. Note that today most of the power save
>>>>>> features are disabled anyway.
>>>>>
>>>>> Would it make sense to do this only when connected to a VHT-supporting
>>>>> AP or whatever AP feature is probably triggering the problem?
>>>>
>>>> I am not sure it is related to some "VHT / whatever" feature. From
>>>> your logs, I'd say this bug could have happened with any AP. It is
>>>> down there in the PCI layer.
>>>>
>>>>>
>>>>> *sigh* I'd just plug in a new PCIe wireless card, except that I think
>>>>> that my evil BIOS will refuse to boot if I do that.
>>>>>
>>>> I can recommend 7260 :)
>>>
>>> http://www.thinkwiki.org/wiki/Problem_with_unauthorized_MiniPCI_network_card
>>>
>>> I'm not sure how I feel about flashing my BIOS with random things off
>>> the internet :(
>>>
>>>> Before you do that, can you check this one?
>>>>
>>>> diff --git a/drivers/net/wireless/iwlwifi/dvm/power.c
>>>> b/drivers/net/wireless/iwlwifi/dvm/power.c
>>>> index f2c1439..047112c 100644
>>>> --- a/drivers/net/wireless/iwlwifi/dvm/power.c
>>>> +++ b/drivers/net/wireless/iwlwifi/dvm/power.c
>>>> @@ -292,6 +292,8 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
>>>>
>>>>         if (priv->wowlan)
>>>>                 iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, dtimper);
>>>> +       if (!enabled)
>>>> +               iwl_power_sleep_cam_cmd(priv, cmd);
>>>>         else if (!priv->lib->no_idle_support &&
>>>>                  priv->hw->conf.flags & IEEE80211_CONF_IDLE)
>>>>                 iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, 20);
>>>> @@ -299,9 +301,7 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
>>>>                 /* in thermal throttling low power state */
>>>>                 iwl_static_sleep_cmd(priv, cmd,
>>>>                     iwl_tt_current_power_mode(priv), dtimper);
>>>> -       } else if (!enabled)
>>>> -               iwl_power_sleep_cam_cmd(priv, cmd);
>>>> -       else if (priv->power_data.debug_sleep_level_override >= 0)
>>>> +       } else if (priv->power_data.debug_sleep_level_override >= 0)
>>>>                 iwl_static_sleep_cmd(priv, cmd,
>>>>
>>>> priv->power_data.debug_sleep_level_override,
>>>>                                      dtimper);
>>>>
>>>> This is the final version I want to send.
>>>
>>> I'll test it tonight.  I'm away from both the network and the laptop right now.
>>
>> No good :(  I got the same issue again, but it still sames rarer than unpatched.
>>
>> It looks like your patch allows both:
>>
>> iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, dtimper);
>>
>> and
>>
>> iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, 20);
>>
>> I don't know whether that's a problem.

I'd expect enable to be false... But I might have got lost in
mac80211's PS code... Can  you check the value of enable? Thanks
Anyway - I want to call the cam one. IOW, I want to disable power save
completely. This is what is causing trouble.
We want to add a frame on the ring, and since the NIC is asleep, we
schedule a wakeup of the NIC so that it can fetch the packet when it
wakes up... But it doesn't wake up...
Now - This is firmware / hardware related... which means that there
isn't much I can do for this old device.
 Bottom line, I guess we want to disable power save here.
If you can bisect this, this can be really helpful. This might allow
us to keep the feature.

>>
>> Also, is this bad?
>>
>> iwlwifi 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control
>>
>> lspci seems to think that L1 is enabled (see attached logs).
>>
>

yeah... that's a mess. I have suffered a lot from L1, so... I disable
it (not for newer devices). But here, your BIOS doesn't let me disable
it...

> This seems surprisingly consistent: on first boot, wireless is okay w/
> this AP, but after suspend/resume, it's unreliable.  I could be wrong
> (the unreliability isn't *that* consistent), but I've just seen this a
> couple more times.
>
> Using pcie_aspm=force to forcibly disable L1 makes no difference.

no wonder.

>
> Is there possibly a driver bug that causes some workaround to be
> applied on boot but not on resume?  Or is BIOS doing something
> different?
>

I find it hard to believe, but one never knows...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-05-27 20:30                         ` Emmanuel Grumbach
@ 2014-05-28  3:07                           ` Andy Lutomirski
  2014-05-28 12:09                             ` Emmanuel Grumbach
  0 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2014-05-28  3:07 UTC (permalink / raw)
  To: Emmanuel Grumbach; +Cc: Linux Wireless List

[-- Attachment #1: Type: text/plain, Size: 2892 bytes --]

On Tue, May 27, 2014 at 1:30 PM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>>>
>>> It looks like your patch allows both:
>>>
>>> iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, dtimper);
>>>
>>> and
>>>
>>> iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, 20);
>>>
>>> I don't know whether that's a problem.
>
> I'd expect enable to be false... But I might have got lost in
> mac80211's PS code... Can  you check the value of enable? Thanks
> Anyway - I want to call the cam one. IOW, I want to disable power save
> completely. This is what is causing trouble.
> We want to add a frame on the ring, and since the NIC is asleep, we
> schedule a wakeup of the NIC so that it can fetch the packet when it
> wakes up... But it doesn't wake up...
> Now - This is firmware / hardware related... which means that there
> isn't much I can do for this old device.
>  Bottom line, I guess we want to disable power save here.
> If you can bisect this, this can be really helpful. This might allow
> us to keep the feature.

I doubt I can bisect -- the trigger was a new AP, not a new kernel.  I
can't exactly cut the AP in half :)


Pre-suspend, i.e., working:

[   20.949900] enabled = 1, wowlan = 0
[   20.950177] enabled = 1, wowlan = 0
[   21.614016] enabled = 1, wowlan = 0
[   21.614658] enabled = 1, wowlan = 0
[   42.667586] enabled = 0, wowlan = 0
[   42.672514] enabled = 1, wowlan = 0
[   53.088165] fuse init (API version 7.23)
[   53.102082] SELinux: initialized (dev fuse, type fuse), uses genfs_contexts
[   53.130945] SELinux: initialized (dev fusectl, type fusectl), uses
genfs_contexts
[   85.627558] enabled = 0, wowlan = 0
[   85.631686] enabled = 1, wowlan = 0
[  134.649346] e1000e: em1 NIC Link is Down
[  137.682277] wlan0: deauthenticating from 02:c6:26:cc:b4:c7 by local
choice (Reason: 3=DEAUTH_LEAVING)
[  137.682780] enabled = 0, wowlan = 0
[  137.693889] enabled = 0, wowlan = 0

Post-suspend, i.e., not working:

[  144.406303] enabled = 1, wowlan = 0
[  144.406496] enabled = 1, wowlan = 0
[  145.026827] enabled = 1, wowlan = 0
[  145.028211] enabled = 1, wowlan = 0
[  165.688632] enabled = 0, wowlan = 0
[  165.689960] enabled = 0, wowlan = 0
[  165.693988] enabled = 1, wowlan = 0
[  165.694245] enabled = 1, wowlan = 0
[  208.641426] enabled = 0, wowlan = 0
[  208.641786] enabled = 0, wowlan = 0
[  208.647499] enabled = 1, wowlan = 0
[  208.647639] enabled = 1, wowlan = 0
[  271.435558] enabled = 0, wowlan = 0
[  271.435767] enabled = 0, wowlan = 0
[  271.440125] enabled = 1, wowlan = 0
[  271.440405] enabled = 1, wowlan = 0

With even more instrumentation added, I did get a glitch before
suspend/resume, but it came with more than two power setting updates.
Logs and patch attached, complete with call stacks.

Is it possible that part of the problem is that the firmware doesn't
like being spammed with power config changes so frequently?

--Andy

[-- Attachment #2: iwl_stall.txt.xz --]
[-- Type: application/x-xz, Size: 15560 bytes --]

[-- Attachment #3: iwlwifi_hack.patch --]
[-- Type: text/x-patch, Size: 1472 bytes --]

commit 5225b6b3256c399d29de1e0db1915e07005516e7
Author: Andy Lutomirski <luto@amacapital.net>
Date:   Tue May 27 10:06:09 2014 -0700

    iwlwifi pm hack

diff --git a/drivers/net/wireless/iwlwifi/dvm/power.c b/drivers/net/wireless/iwlwifi/dvm/power.c
index b4e6141..5d94162 100644
--- a/drivers/net/wireless/iwlwifi/dvm/power.c
+++ b/drivers/net/wireless/iwlwifi/dvm/power.c
@@ -288,10 +288,15 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
 	bool enabled = priv->hw->conf.flags & IEEE80211_CONF_PS;
 	int dtimper;
 
+	printk(KERN_INFO "enabled = %d, wowlan = %d\n", (int)enabled, (int)priv->wowlan);
+	dump_stack();
+
 	dtimper = priv->hw->conf.ps_dtim_period ?: 1;
 
 	if (priv->wowlan)
 		iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, dtimper);
+	if (!enabled)
+		iwl_power_sleep_cam_cmd(priv, cmd);
 	else if (!priv->lib->no_idle_support &&
 		 priv->hw->conf.flags & IEEE80211_CONF_IDLE)
 		iwl_static_sleep_cmd(priv, cmd, IWL_POWER_INDEX_5, 20);
@@ -299,9 +304,7 @@ static void iwl_power_build_cmd(struct iwl_priv *priv,
 		/* in thermal throttling low power state */
 		iwl_static_sleep_cmd(priv, cmd,
 		    iwl_tt_current_power_mode(priv), dtimper);
-	} else if (!enabled)
-		iwl_power_sleep_cam_cmd(priv, cmd);
-	else if (priv->power_data.debug_sleep_level_override >= 0)
+	} else if (priv->power_data.debug_sleep_level_override >= 0)
 		iwl_static_sleep_cmd(priv, cmd,
 				     priv->power_data.debug_sleep_level_override,
 				     dtimper);

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-05-28  3:07                           ` Andy Lutomirski
@ 2014-05-28 12:09                             ` Emmanuel Grumbach
  2014-06-02 16:54                               ` Andy Lutomirski
  0 siblings, 1 reply; 25+ messages in thread
From: Emmanuel Grumbach @ 2014-05-28 12:09 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Linux Wireless List

>
> I doubt I can bisect -- the trigger was a new AP, not a new kernel.  I
> can't exactly cut the AP in half :)

I see.. This is really weird though. Anyway.

>
>
> Pre-suspend, i.e., working:
>
> [   20.949900] enabled = 1, wowlan = 0
> [   20.950177] enabled = 1, wowlan = 0
> [   21.614016] enabled = 1, wowlan = 0
> [   21.614658] enabled = 1, wowlan = 0
> [   42.667586] enabled = 0, wowlan = 0
> [   42.672514] enabled = 1, wowlan = 0
> [   53.088165] fuse init (API version 7.23)
> [   53.102082] SELinux: initialized (dev fuse, type fuse), uses genfs_contexts
> [   53.130945] SELinux: initialized (dev fusectl, type fusectl), uses
> genfs_contexts
> [   85.627558] enabled = 0, wowlan = 0
> [   85.631686] enabled = 1, wowlan = 0
> [  134.649346] e1000e: em1 NIC Link is Down
> [  137.682277] wlan0: deauthenticating from 02:c6:26:cc:b4:c7 by local
> choice (Reason: 3=DEAUTH_LEAVING)
> [  137.682780] enabled = 0, wowlan = 0
> [  137.693889] enabled = 0, wowlan = 0
>
> Post-suspend, i.e., not working:
>
> [  144.406303] enabled = 1, wowlan = 0
> [  144.406496] enabled = 1, wowlan = 0
> [  145.026827] enabled = 1, wowlan = 0
> [  145.028211] enabled = 1, wowlan = 0
> [  165.688632] enabled = 0, wowlan = 0
> [  165.689960] enabled = 0, wowlan = 0
> [  165.693988] enabled = 1, wowlan = 0
> [  165.694245] enabled = 1, wowlan = 0
> [  208.641426] enabled = 0, wowlan = 0
> [  208.641786] enabled = 0, wowlan = 0
> [  208.647499] enabled = 1, wowlan = 0
> [  208.647639] enabled = 1, wowlan = 0
> [  271.435558] enabled = 0, wowlan = 0
> [  271.435767] enabled = 0, wowlan = 0
> [  271.440125] enabled = 1, wowlan = 0
> [  271.440405] enabled = 1, wowlan = 0
>
> With even more instrumentation added, I did get a glitch before
> suspend/resume, but it came with more than two power setting updates.
> Logs and patch attached, complete with call stacks.
>

I don't see any callstacks?
Doesn't matter though.

> Is it possible that part of the problem is that the firmware doesn't
> like being spammed with power config changes so frequently?
>
Don't think so.
But the suspend / resume thing is not the kind of things that are fun to debug.
I am afraid that I will disable power save completely since I don't
really have the time to dig into that.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-05-28 12:09                             ` Emmanuel Grumbach
@ 2014-06-02 16:54                               ` Andy Lutomirski
  2014-06-02 18:11                                 ` Andy Lutomirski
  0 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2014-06-02 16:54 UTC (permalink / raw)
  To: Emmanuel Grumbach; +Cc: Linux Wireless List

On Wed, May 28, 2014 at 5:09 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>>
>> I doubt I can bisect -- the trigger was a new AP, not a new kernel.  I
>> can't exactly cut the AP in half :)
>
> I see.. This is really weird though. Anyway.
>
>>
>>
>> Pre-suspend, i.e., working:
>>
>> [   20.949900] enabled = 1, wowlan = 0
>> [   20.950177] enabled = 1, wowlan = 0
>> [   21.614016] enabled = 1, wowlan = 0
>> [   21.614658] enabled = 1, wowlan = 0
>> [   42.667586] enabled = 0, wowlan = 0
>> [   42.672514] enabled = 1, wowlan = 0
>> [   53.088165] fuse init (API version 7.23)
>> [   53.102082] SELinux: initialized (dev fuse, type fuse), uses genfs_contexts
>> [   53.130945] SELinux: initialized (dev fusectl, type fusectl), uses
>> genfs_contexts
>> [   85.627558] enabled = 0, wowlan = 0
>> [   85.631686] enabled = 1, wowlan = 0
>> [  134.649346] e1000e: em1 NIC Link is Down
>> [  137.682277] wlan0: deauthenticating from 02:c6:26:cc:b4:c7 by local
>> choice (Reason: 3=DEAUTH_LEAVING)
>> [  137.682780] enabled = 0, wowlan = 0
>> [  137.693889] enabled = 0, wowlan = 0
>>
>> Post-suspend, i.e., not working:
>>
>> [  144.406303] enabled = 1, wowlan = 0
>> [  144.406496] enabled = 1, wowlan = 0
>> [  145.026827] enabled = 1, wowlan = 0
>> [  145.028211] enabled = 1, wowlan = 0
>> [  165.688632] enabled = 0, wowlan = 0
>> [  165.689960] enabled = 0, wowlan = 0
>> [  165.693988] enabled = 1, wowlan = 0
>> [  165.694245] enabled = 1, wowlan = 0
>> [  208.641426] enabled = 0, wowlan = 0
>> [  208.641786] enabled = 0, wowlan = 0
>> [  208.647499] enabled = 1, wowlan = 0
>> [  208.647639] enabled = 1, wowlan = 0
>> [  271.435558] enabled = 0, wowlan = 0
>> [  271.435767] enabled = 0, wowlan = 0
>> [  271.440125] enabled = 1, wowlan = 0
>> [  271.440405] enabled = 1, wowlan = 0
>>
>> With even more instrumentation added, I did get a glitch before
>> suspend/resume, but it came with more than two power setting updates.
>> Logs and patch attached, complete with call stacks.
>>
>
> I don't see any callstacks?
> Doesn't matter though.

I think the callstacks were in the attachment.  I could have messed up, though.

Anyway, I don't buy the theory that this is caused by the firmware
going out to lunch.  The queues files in debugfs show the rx queue
chugging along and all of the tx queues have read_ptr == write_ptr.
Wireshark shows incoming broadcast traffic, too.  I'd guess that the
problem is more likely to be that the card is failing to wake up and
notice pending data in the TIM.

--Andy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-06-02 16:54                               ` Andy Lutomirski
@ 2014-06-02 18:11                                 ` Andy Lutomirski
  2014-06-02 18:40                                   ` Emmanuel Grumbach
  0 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2014-06-02 18:11 UTC (permalink / raw)
  To: Emmanuel Grumbach; +Cc: Linux Wireless List

On Mon, Jun 2, 2014 at 9:54 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Wed, May 28, 2014 at 5:09 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>>>
>>> I doubt I can bisect -- the trigger was a new AP, not a new kernel.  I
>>> can't exactly cut the AP in half :)
>>
>> I see.. This is really weird though. Anyway.
>>
>>>
>>>
>>> Pre-suspend, i.e., working:
>>>
>>> [   20.949900] enabled = 1, wowlan = 0
>>> [   20.950177] enabled = 1, wowlan = 0
>>> [   21.614016] enabled = 1, wowlan = 0
>>> [   21.614658] enabled = 1, wowlan = 0
>>> [   42.667586] enabled = 0, wowlan = 0
>>> [   42.672514] enabled = 1, wowlan = 0
>>> [   53.088165] fuse init (API version 7.23)
>>> [   53.102082] SELinux: initialized (dev fuse, type fuse), uses genfs_contexts
>>> [   53.130945] SELinux: initialized (dev fusectl, type fusectl), uses
>>> genfs_contexts
>>> [   85.627558] enabled = 0, wowlan = 0
>>> [   85.631686] enabled = 1, wowlan = 0
>>> [  134.649346] e1000e: em1 NIC Link is Down
>>> [  137.682277] wlan0: deauthenticating from 02:c6:26:cc:b4:c7 by local
>>> choice (Reason: 3=DEAUTH_LEAVING)
>>> [  137.682780] enabled = 0, wowlan = 0
>>> [  137.693889] enabled = 0, wowlan = 0
>>>
>>> Post-suspend, i.e., not working:
>>>
>>> [  144.406303] enabled = 1, wowlan = 0
>>> [  144.406496] enabled = 1, wowlan = 0
>>> [  145.026827] enabled = 1, wowlan = 0
>>> [  145.028211] enabled = 1, wowlan = 0
>>> [  165.688632] enabled = 0, wowlan = 0
>>> [  165.689960] enabled = 0, wowlan = 0
>>> [  165.693988] enabled = 1, wowlan = 0
>>> [  165.694245] enabled = 1, wowlan = 0
>>> [  208.641426] enabled = 0, wowlan = 0
>>> [  208.641786] enabled = 0, wowlan = 0
>>> [  208.647499] enabled = 1, wowlan = 0
>>> [  208.647639] enabled = 1, wowlan = 0
>>> [  271.435558] enabled = 0, wowlan = 0
>>> [  271.435767] enabled = 0, wowlan = 0
>>> [  271.440125] enabled = 1, wowlan = 0
>>> [  271.440405] enabled = 1, wowlan = 0
>>>
>>> With even more instrumentation added, I did get a glitch before
>>> suspend/resume, but it came with more than two power setting updates.
>>> Logs and patch attached, complete with call stacks.
>>>
>>
>> I don't see any callstacks?
>> Doesn't matter though.
>
> I think the callstacks were in the attachment.  I could have messed up, though.
>
> Anyway, I don't buy the theory that this is caused by the firmware
> going out to lunch.  The queues files in debugfs show the rx queue
> chugging along and all of the tx queues have read_ptr == write_ptr.
> Wireshark shows incoming broadcast traffic, too.  I'd guess that the
> problem is more likely to be that the card is failing to wake up and
> notice pending data in the TIM.

OTOH, with iwlwifi.11n_disable=4 (no rx A-MPDU), I seem to be doing
pretty well.  I'll test a stock kernel configured like that for the
next few days.

--Andy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-06-02 18:11                                 ` Andy Lutomirski
@ 2014-06-02 18:40                                   ` Emmanuel Grumbach
  2014-06-02 18:45                                     ` Andy Lutomirski
  0 siblings, 1 reply; 25+ messages in thread
From: Emmanuel Grumbach @ 2014-06-02 18:40 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Linux Wireless List



On 06/02/2014 09:11 PM, Andy Lutomirski wrote:
> On Mon, Jun 2, 2014 at 9:54 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Wed, May 28, 2014 at 5:09 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>>>>
>>>> I doubt I can bisect -- the trigger was a new AP, not a new kernel.  I
>>>> can't exactly cut the AP in half :)
>>>
>>> I see.. This is really weird though. Anyway.
>>>
>>>>
>>>>
>>>> Pre-suspend, i.e., working:
>>>>
>>>> [   20.949900] enabled = 1, wowlan = 0
>>>> [   20.950177] enabled = 1, wowlan = 0
>>>> [   21.614016] enabled = 1, wowlan = 0
>>>> [   21.614658] enabled = 1, wowlan = 0
>>>> [   42.667586] enabled = 0, wowlan = 0
>>>> [   42.672514] enabled = 1, wowlan = 0
>>>> [   53.088165] fuse init (API version 7.23)
>>>> [   53.102082] SELinux: initialized (dev fuse, type fuse), uses genfs_contexts
>>>> [   53.130945] SELinux: initialized (dev fusectl, type fusectl), uses
>>>> genfs_contexts
>>>> [   85.627558] enabled = 0, wowlan = 0
>>>> [   85.631686] enabled = 1, wowlan = 0
>>>> [  134.649346] e1000e: em1 NIC Link is Down
>>>> [  137.682277] wlan0: deauthenticating from 02:c6:26:cc:b4:c7 by local
>>>> choice (Reason: 3=DEAUTH_LEAVING)
>>>> [  137.682780] enabled = 0, wowlan = 0
>>>> [  137.693889] enabled = 0, wowlan = 0
>>>>
>>>> Post-suspend, i.e., not working:
>>>>
>>>> [  144.406303] enabled = 1, wowlan = 0
>>>> [  144.406496] enabled = 1, wowlan = 0
>>>> [  145.026827] enabled = 1, wowlan = 0
>>>> [  145.028211] enabled = 1, wowlan = 0
>>>> [  165.688632] enabled = 0, wowlan = 0
>>>> [  165.689960] enabled = 0, wowlan = 0
>>>> [  165.693988] enabled = 1, wowlan = 0
>>>> [  165.694245] enabled = 1, wowlan = 0
>>>> [  208.641426] enabled = 0, wowlan = 0
>>>> [  208.641786] enabled = 0, wowlan = 0
>>>> [  208.647499] enabled = 1, wowlan = 0
>>>> [  208.647639] enabled = 1, wowlan = 0
>>>> [  271.435558] enabled = 0, wowlan = 0
>>>> [  271.435767] enabled = 0, wowlan = 0
>>>> [  271.440125] enabled = 1, wowlan = 0
>>>> [  271.440405] enabled = 1, wowlan = 0
>>>>
>>>> With even more instrumentation added, I did get a glitch before
>>>> suspend/resume, but it came with more than two power setting updates.
>>>> Logs and patch attached, complete with call stacks.
>>>>
>>>
>>> I don't see any callstacks?
>>> Doesn't matter though.
>>
>> I think the callstacks were in the attachment.  I could have messed up, though.
>>
>> Anyway, I don't buy the theory that this is caused by the firmware
>> going out to lunch.  The queues files in debugfs show the rx queue
>> chugging along and all of the tx queues have read_ptr == write_ptr.
>> Wireshark shows incoming broadcast traffic, too.  I'd guess that the
>> problem is more likely to be that the card is failing to wake up and
>> notice pending data in the TIM.

Well... I might have been unclear here (I never know how much detail I should share with the recipient :)).
>From your log it appears that the NIC is in power save. So we can't increment the write pointer of the Tx ring (add a packet for transmission). So we simply remember that we need to do so (increment the write pointer) and request a wakeup so that we will update the write pointer in the wakeup interrupt... which doesn't happen.
No power save - no need for wakeup interrupt.

> 
> OTOH, with iwlwifi.11n_disable=4 (no rx A-MPDU), I seem to be doing
> pretty well.  I'll test a stock kernel configured like that for the
> next few days.
> 

That's interesting...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-06-02 18:40                                   ` Emmanuel Grumbach
@ 2014-06-02 18:45                                     ` Andy Lutomirski
  2014-06-02 18:53                                       ` Emmanuel Grumbach
  0 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2014-06-02 18:45 UTC (permalink / raw)
  To: Emmanuel Grumbach; +Cc: Linux Wireless List

On Mon, Jun 2, 2014 at 11:40 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>
>
> On 06/02/2014 09:11 PM, Andy Lutomirski wrote:
>> On Mon, Jun 2, 2014 at 9:54 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>> On Wed, May 28, 2014 at 5:09 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>>>>>
>>>>> I doubt I can bisect -- the trigger was a new AP, not a new kernel.  I
>>>>> can't exactly cut the AP in half :)
>>>>
>>>> I see.. This is really weird though. Anyway.
>>>>
>>>>>
>>>>>
>>>>> Pre-suspend, i.e., working:
>>>>>
>>>>> [   20.949900] enabled = 1, wowlan = 0
>>>>> [   20.950177] enabled = 1, wowlan = 0
>>>>> [   21.614016] enabled = 1, wowlan = 0
>>>>> [   21.614658] enabled = 1, wowlan = 0
>>>>> [   42.667586] enabled = 0, wowlan = 0
>>>>> [   42.672514] enabled = 1, wowlan = 0
>>>>> [   53.088165] fuse init (API version 7.23)
>>>>> [   53.102082] SELinux: initialized (dev fuse, type fuse), uses genfs_contexts
>>>>> [   53.130945] SELinux: initialized (dev fusectl, type fusectl), uses
>>>>> genfs_contexts
>>>>> [   85.627558] enabled = 0, wowlan = 0
>>>>> [   85.631686] enabled = 1, wowlan = 0
>>>>> [  134.649346] e1000e: em1 NIC Link is Down
>>>>> [  137.682277] wlan0: deauthenticating from 02:c6:26:cc:b4:c7 by local
>>>>> choice (Reason: 3=DEAUTH_LEAVING)
>>>>> [  137.682780] enabled = 0, wowlan = 0
>>>>> [  137.693889] enabled = 0, wowlan = 0
>>>>>
>>>>> Post-suspend, i.e., not working:
>>>>>
>>>>> [  144.406303] enabled = 1, wowlan = 0
>>>>> [  144.406496] enabled = 1, wowlan = 0
>>>>> [  145.026827] enabled = 1, wowlan = 0
>>>>> [  145.028211] enabled = 1, wowlan = 0
>>>>> [  165.688632] enabled = 0, wowlan = 0
>>>>> [  165.689960] enabled = 0, wowlan = 0
>>>>> [  165.693988] enabled = 1, wowlan = 0
>>>>> [  165.694245] enabled = 1, wowlan = 0
>>>>> [  208.641426] enabled = 0, wowlan = 0
>>>>> [  208.641786] enabled = 0, wowlan = 0
>>>>> [  208.647499] enabled = 1, wowlan = 0
>>>>> [  208.647639] enabled = 1, wowlan = 0
>>>>> [  271.435558] enabled = 0, wowlan = 0
>>>>> [  271.435767] enabled = 0, wowlan = 0
>>>>> [  271.440125] enabled = 1, wowlan = 0
>>>>> [  271.440405] enabled = 1, wowlan = 0
>>>>>
>>>>> With even more instrumentation added, I did get a glitch before
>>>>> suspend/resume, but it came with more than two power setting updates.
>>>>> Logs and patch attached, complete with call stacks.
>>>>>
>>>>
>>>> I don't see any callstacks?
>>>> Doesn't matter though.
>>>
>>> I think the callstacks were in the attachment.  I could have messed up, though.
>>>
>>> Anyway, I don't buy the theory that this is caused by the firmware
>>> going out to lunch.  The queues files in debugfs show the rx queue
>>> chugging along and all of the tx queues have read_ptr == write_ptr.
>>> Wireshark shows incoming broadcast traffic, too.  I'd guess that the
>>> problem is more likely to be that the card is failing to wake up and
>>> notice pending data in the TIM.
>
> Well... I might have been unclear here (I never know how much detail I should share with the recipient :)).
> From your log it appears that the NIC is in power save. So we can't increment the write pointer of the Tx ring (add a packet for transmission). So we simply remember that we need to do so (increment the write pointer) and request a wakeup so that we will update the write pointer in the wakeup interrupt... which doesn't happen.
> No power save - no need for wakeup interrupt.

I'm still unconvinced.  One of the tx queues actually has a both
read_ptr and write_ptr incrementing once or twice per second even when
I can't ping the gateway.  Can you point me at the right code or log
stuff to look at?

>
>>
>> OTOH, with iwlwifi.11n_disable=4 (no rx A-MPDU), I seem to be doing
>> pretty well.  I'll test a stock kernel configured like that for the
>> next few days.
>>
>
> That's interesting...

--Andy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-06-02 18:45                                     ` Andy Lutomirski
@ 2014-06-02 18:53                                       ` Emmanuel Grumbach
  2014-06-02 18:56                                         ` Andy Lutomirski
  0 siblings, 1 reply; 25+ messages in thread
From: Emmanuel Grumbach @ 2014-06-02 18:53 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Linux Wireless List


>>>>
>>>> Anyway, I don't buy the theory that this is caused by the firmware
>>>> going out to lunch.  The queues files in debugfs show the rx queue
>>>> chugging along and all of the tx queues have read_ptr == write_ptr.
>>>> Wireshark shows incoming broadcast traffic, too.  I'd guess that the
>>>> problem is more likely to be that the card is failing to wake up and
>>>> notice pending data in the TIM.
>>
>> Well... I might have been unclear here (I never know how much detail I should share with the recipient :)).
>> From your log it appears that the NIC is in power save. So we can't increment the write pointer of the Tx ring (add a packet for transmission). So we simply remember that we need to do so (increment the write pointer) and request a wakeup so that we will update the write pointer in the wakeup interrupt... which doesn't happen.
>> No power save - no need for wakeup interrupt.
> 
> I'm still unconvinced.  One of the tx queues actually has a both
> read_ptr and write_ptr incrementing once or twice per second even when
> I can't ping the gateway.  Can you point me at the right code or log
> stuff to look at?

drivers/net/wireless/iwlwifi/pcie/tx.c:
static void iwl_pcie_txq_inc_wr_ptr(struct iwl_trans *trans,
                                    struct iwl_txq *txq)
[snip]

                if (reg & CSR_UCODE_DRV_GP1_BIT_MAC_SLEEP) {
                        IWL_DEBUG_INFO(trans, "Tx queue %d requesting wakeup, GP1 = 0x%x\n",
                                       txq_id, reg);
                        iwl_set_bit(trans, CSR_GP_CNTRL,
                                    CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ);
                        txq->need_update = true;
                        return;
                }
[snip]
}

ISR - drivers/net/wireless/iwlwifi/pcie/tx.c:
irqreturn_t iwl_pcie_irq_handler(int irq, void *dev_id)
{
[snip]
        /* uCode wakes up after power-down sleep */
        if (inta & CSR_INT_BIT_WAKEUP) {
                IWL_DEBUG_ISR(trans, "Wakeup interrupt\n");
                iwl_pcie_rxq_check_wrptr(trans);
                iwl_pcie_txq_check_wrptrs(trans);

                isr_stats->wakeup++;

                handled |= CSR_INT_BIT_WAKEUP;
        }
[snip]
}

drivers/net/wireless/iwlwifi/pcie/tx.c:
void iwl_pcie_txq_check_wrptrs(struct iwl_trans *trans)
{
        struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
        int i;

        for (i = 0; i < trans->cfg->base_params->num_of_queues; i++) {
                struct iwl_txq *txq = &trans_pcie->txq[i];

                spin_lock_bh(&txq->lock);
                if (trans_pcie->txq[i].need_update) {
                        iwl_pcie_txq_inc_wr_ptr(trans, txq);
                        trans_pcie->txq[i].need_update = false;
                }
                spin_unlock_bh(&txq->lock);
        }
}

Note that the CMD queue (9 or 4 depending on your configuration) - uses another mechanism that is safer but consumes more power. This queue is intended for commands and not for real Tx packets.


> 
>>
>>>
>>> OTOH, with iwlwifi.11n_disable=4 (no rx A-MPDU), I seem to be doing
>>> pretty well.  I'll test a stock kernel configured like that for the
>>> next few days.
>>>
>>
>> That's interesting...
> 
> --Andy
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-06-02 18:53                                       ` Emmanuel Grumbach
@ 2014-06-02 18:56                                         ` Andy Lutomirski
  2014-06-02 19:32                                           ` Emmanuel Grumbach
  0 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2014-06-02 18:56 UTC (permalink / raw)
  To: Emmanuel Grumbach; +Cc: Linux Wireless List

On Mon, Jun 2, 2014 at 11:53 AM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>
>>>>>
>>>>> Anyway, I don't buy the theory that this is caused by the firmware
>>>>> going out to lunch.  The queues files in debugfs show the rx queue
>>>>> chugging along and all of the tx queues have read_ptr == write_ptr.
>>>>> Wireshark shows incoming broadcast traffic, too.  I'd guess that the
>>>>> problem is more likely to be that the card is failing to wake up and
>>>>> notice pending data in the TIM.
>>>
>>> Well... I might have been unclear here (I never know how much detail I should share with the recipient :)).
>>> From your log it appears that the NIC is in power save. So we can't increment the write pointer of the Tx ring (add a packet for transmission). So we simply remember that we need to do so (increment the write pointer) and request a wakeup so that we will update the write pointer in the wakeup interrupt... which doesn't happen.
>>> No power save - no need for wakeup interrupt.
>>
>> I'm still unconvinced.  One of the tx queues actually has a both
>> read_ptr and write_ptr incrementing once or twice per second even when
>> I can't ping the gateway.  Can you point me at the right code or log
>> stuff to look at?
>
> drivers/net/wireless/iwlwifi/pcie/tx.c:
> static void iwl_pcie_txq_inc_wr_ptr(struct iwl_trans *trans,
>                                     struct iwl_txq *txq)
> [snip]
>
>                 if (reg & CSR_UCODE_DRV_GP1_BIT_MAC_SLEEP) {
>                         IWL_DEBUG_INFO(trans, "Tx queue %d requesting wakeup, GP1 = 0x%x\n",
>                                        txq_id, reg);
>                         iwl_set_bit(trans, CSR_GP_CNTRL,
>                                     CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ);
>                         txq->need_update = true;
>                         return;
>                 }
> [snip]
> }
>
> ISR - drivers/net/wireless/iwlwifi/pcie/tx.c:
> irqreturn_t iwl_pcie_irq_handler(int irq, void *dev_id)
> {
> [snip]
>         /* uCode wakes up after power-down sleep */
>         if (inta & CSR_INT_BIT_WAKEUP) {
>                 IWL_DEBUG_ISR(trans, "Wakeup interrupt\n");
>                 iwl_pcie_rxq_check_wrptr(trans);
>                 iwl_pcie_txq_check_wrptrs(trans);
>
>                 isr_stats->wakeup++;
>
>                 handled |= CSR_INT_BIT_WAKEUP;
>         }
> [snip]
> }
>
> drivers/net/wireless/iwlwifi/pcie/tx.c:
> void iwl_pcie_txq_check_wrptrs(struct iwl_trans *trans)
> {
>         struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
>         int i;
>
>         for (i = 0; i < trans->cfg->base_params->num_of_queues; i++) {
>                 struct iwl_txq *txq = &trans_pcie->txq[i];
>
>                 spin_lock_bh(&txq->lock);
>                 if (trans_pcie->txq[i].need_update) {
>                         iwl_pcie_txq_inc_wr_ptr(trans, txq);
>                         trans_pcie->txq[i].need_update = false;
>                 }
>                 spin_unlock_bh(&txq->lock);
>         }
> }
>
> Note that the CMD queue (9 or 4 depending on your configuration) - uses another mechanism that is safer but consumes more power. This queue is intended for commands and not for real Tx packets.

If that's queue 4 in the debugfs tx_queues file, then I think that
that's the queue that keeps moving when the card is dead.  I'll see if
I can improve the tx_queues file to show pending updates.

Yay troubleshooting old hardware.

--Andy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-06-02 18:56                                         ` Andy Lutomirski
@ 2014-06-02 19:32                                           ` Emmanuel Grumbach
  2014-06-02 19:57                                             ` Andy Lutomirski
  0 siblings, 1 reply; 25+ messages in thread
From: Emmanuel Grumbach @ 2014-06-02 19:32 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Linux Wireless List

>>
>>>>>>
>>>>>> Anyway, I don't buy the theory that this is caused by the firmware
>>>>>> going out to lunch.  The queues files in debugfs show the rx queue
>>>>>> chugging along and all of the tx queues have read_ptr == write_ptr.
>>>>>> Wireshark shows incoming broadcast traffic, too.  I'd guess that the
>>>>>> problem is more likely to be that the card is failing to wake up and
>>>>>> notice pending data in the TIM.
>>>>
>>>> Well... I might have been unclear here (I never know how much detail I should share with the recipient :)).
>>>> From your log it appears that the NIC is in power save. So we can't increment the write pointer of the Tx ring (add a packet for transmission). So we simply remember that we need to do so (increment the write pointer) and request a wakeup so that we will update the write pointer in the wakeup interrupt... which doesn't happen.
>>>> No power save - no need for wakeup interrupt.
>>>
>>> I'm still unconvinced.  One of the tx queues actually has a both
>>> read_ptr and write_ptr incrementing once or twice per second even when
>>> I can't ping the gateway.  Can you point me at the right code or log
>>> stuff to look at?
>>
>> drivers/net/wireless/iwlwifi/pcie/tx.c:
>> static void iwl_pcie_txq_inc_wr_ptr(struct iwl_trans *trans,
>>                                     struct iwl_txq *txq)
>> [snip]
>>
>>                 if (reg & CSR_UCODE_DRV_GP1_BIT_MAC_SLEEP) {
>>                         IWL_DEBUG_INFO(trans, "Tx queue %d requesting wakeup, GP1 = 0x%x\n",
>>                                        txq_id, reg);
>>                         iwl_set_bit(trans, CSR_GP_CNTRL,
>>                                     CSR_GP_CNTRL_REG_FLAG_MAC_ACCESS_REQ);
>>                         txq->need_update = true;
>>                         return;
>>                 }
>> [snip]
>> }
>>
>> ISR - drivers/net/wireless/iwlwifi/pcie/tx.c:
>> irqreturn_t iwl_pcie_irq_handler(int irq, void *dev_id)
>> {
>> [snip]
>>         /* uCode wakes up after power-down sleep */
>>         if (inta & CSR_INT_BIT_WAKEUP) {
>>                 IWL_DEBUG_ISR(trans, "Wakeup interrupt\n");
>>                 iwl_pcie_rxq_check_wrptr(trans);
>>                 iwl_pcie_txq_check_wrptrs(trans);
>>
>>                 isr_stats->wakeup++;
>>
>>                 handled |= CSR_INT_BIT_WAKEUP;
>>         }
>> [snip]
>> }
>>
>> drivers/net/wireless/iwlwifi/pcie/tx.c:
>> void iwl_pcie_txq_check_wrptrs(struct iwl_trans *trans)
>> {
>>         struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
>>         int i;
>>
>>         for (i = 0; i < trans->cfg->base_params->num_of_queues; i++) {
>>                 struct iwl_txq *txq = &trans_pcie->txq[i];
>>
>>                 spin_lock_bh(&txq->lock);
>>                 if (trans_pcie->txq[i].need_update) {
>>                         iwl_pcie_txq_inc_wr_ptr(trans, txq);
>>                         trans_pcie->txq[i].need_update = false;
>>                 }
>>                 spin_unlock_bh(&txq->lock);
>>         }
>> }
>>
>> Note that the CMD queue (9 or 4 depending on your configuration) - uses another mechanism that is safer but consumes more power. This queue is intended for commands and not for real Tx packets.
>
> If that's queue 4 in the debugfs tx_queues file, then I think that
> that's the queue that keeps moving when the card is dead.  I'll see if
> I can improve the tx_queues file to show pending updates.
>
> Yay troubleshooting old hardware.
>
If you plan to spend some time hacking on the code, please take into
account this code has been changed in 3.14 (mostly cleanups by
Johannes and myself).
Also note that long ago, I disabled shadow registers (you'll see ifs
all over the code). Shadow registers makes the whole wakup thing much
easier since it is handled internally in the HW. BUT we had tons of
issues with it so we disabled it. Most of the issues should have gone
away with the cmd_in_flight thing. So I would recommend you to give it
a try with shadow register enabled. Just check the relevant iwl-XXXX.c
file.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-06-02 19:32                                           ` Emmanuel Grumbach
@ 2014-06-02 19:57                                             ` Andy Lutomirski
  2014-06-02 20:15                                               ` Emmanuel Grumbach
  0 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2014-06-02 19:57 UTC (permalink / raw)
  To: Emmanuel Grumbach; +Cc: Linux Wireless List

On Mon, Jun 2, 2014 at 12:32 PM, Emmanuel Grumbach <egrumbach@gmail.com> wrote:
>>> Note that the CMD queue (9 or 4 depending on your configuration) - uses another mechanism that is safer but consumes more power. This queue is intended for commands and not for real Tx packets.
>>
>> If that's queue 4 in the debugfs tx_queues file, then I think that
>> that's the queue that keeps moving when the card is dead.  I'll see if
>> I can improve the tx_queues file to show pending updates.
>>
>> Yay troubleshooting old hardware.
>>
> If you plan to spend some time hacking on the code, please take into
> account this code has been changed in 3.14 (mostly cleanups by
> Johannes and myself).
> Also note that long ago, I disabled shadow registers (you'll see ifs
> all over the code). Shadow registers makes the whole wakup thing much
> easier since it is handled internally in the HW. BUT we had tons of
> issues with it so we disabled it. Most of the issues should have gone
> away with the cmd_in_flight thing. So I would recommend you to give it
> a try with shadow register enabled. Just check the relevant iwl-XXXX.c
> file.

Will do.

FWIW, I looked at the cmd_in_flight thing and, unless there's a bunch
of locking magic going on, it looks racy.  It seems like some code
expects it to be protected by reg_lock, and other code expects it to
be protected by txq_lock (and there are multiple txq->lock instances,
I think).

Also, what stops the card from going to sleep while processing a
wakeup interrupt?  For example, what prevents this chain of events?
The only thing I can see is that the only thing that clears ACCESS_REQ
is the rx interrupt processing, which happens after wakeup processing,
but this seems rather fragile.

--Andy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Help debugging iwldvm / ath10k stalls
  2014-06-02 19:57                                             ` Andy Lutomirski
@ 2014-06-02 20:15                                               ` Emmanuel Grumbach
  0 siblings, 0 replies; 25+ messages in thread
From: Emmanuel Grumbach @ 2014-06-02 20:15 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: Linux Wireless List

>>> If that's queue 4 in the debugfs tx_queues file, then I think that
>>> that's the queue that keeps moving when the card is dead.  I'll see if
>>> I can improve the tx_queues file to show pending updates.
>>>
>>> Yay troubleshooting old hardware.
>>>
>> If you plan to spend some time hacking on the code, please take into
>> account this code has been changed in 3.14 (mostly cleanups by
>> Johannes and myself).
>> Also note that long ago, I disabled shadow registers (you'll see ifs
>> all over the code). Shadow registers makes the whole wakup thing much
>> easier since it is handled internally in the HW. BUT we had tons of
>> issues with it so we disabled it. Most of the issues should have gone
>> away with the cmd_in_flight thing. So I would recommend you to give it
>> a try with shadow register enabled. Just check the relevant iwl-XXXX.c
>> file.
>
> Will do.
>
> FWIW, I looked at the cmd_in_flight thing and, unless there's a bunch
> of locking magic going on, it looks racy.  It seems like some code
> expects it to be protected by reg_lock, and other code expects it to
> be protected by txq_lock (and there are multiple txq->lock instances,
> I think).

:)
Yeah - there is lockdep magic there.
In any case, we set it when we enqueue and host command and clear it
when the last command is being responded.
When we set it, we request access, and vice versa.

>
> Also, what stops the card from going to sleep while processing a
> wakeup interrupt?  For example, what prevents this chain of events?
> The only thing I can see is that the only thing that clears ACCESS_REQ
> is the rx interrupt processing, which happens after wakeup processing,
> but this seems rather fragile.
>

As long as ACCESS_REQ isn't cleared, the NIC is awake. We set it when
we we want to wakeup the NIC.
I can't really find where we clear it right now, but I am half sleeping :)

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2014-06-02 20:15 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-25 17:23 Help debugging iwldvm / ath10k stalls Andy Lutomirski
2014-05-26  5:06 ` Andy Lutomirski
2014-05-26  5:39   ` Emmanuel Grumbach
2014-05-26  6:11     ` Andy Lutomirski
2014-05-26  8:29       ` Emmanuel Grumbach
2014-05-26  8:39         ` Emmanuel Grumbach
2014-05-26 17:28           ` Andy Lutomirski
2014-05-26 18:00             ` Emmanuel Grumbach
2014-05-26 18:08               ` Andy Lutomirski
2014-05-26 18:13                 ` Emmanuel Grumbach
2014-05-26 18:47                   ` Andy Lutomirski
2014-05-27 17:09                     ` Andy Lutomirski
2014-05-27 18:01                       ` Andy Lutomirski
2014-05-27 20:30                         ` Emmanuel Grumbach
2014-05-28  3:07                           ` Andy Lutomirski
2014-05-28 12:09                             ` Emmanuel Grumbach
2014-06-02 16:54                               ` Andy Lutomirski
2014-06-02 18:11                                 ` Andy Lutomirski
2014-06-02 18:40                                   ` Emmanuel Grumbach
2014-06-02 18:45                                     ` Andy Lutomirski
2014-06-02 18:53                                       ` Emmanuel Grumbach
2014-06-02 18:56                                         ` Andy Lutomirski
2014-06-02 19:32                                           ` Emmanuel Grumbach
2014-06-02 19:57                                             ` Andy Lutomirski
2014-06-02 20:15                                               ` Emmanuel Grumbach

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).