Linux wireless drivers development
 help / color / mirror / Atom feed
* Re: bcmdhd: Strange Power Save messages
From: Arend Van Spriel @ 2016-10-06  8:25 UTC (permalink / raw)
  To: Gucea Doru
  Cc: Krishna Chaitanya, Arend van Spriel, Andra Paraschiv,
	linux-wireless
In-Reply-To: <CANfLQrY1NahfQF0xheocVFdJHgxhSE5YbFUwcm3t6e0Fur-YGg@mail.gmail.com>

On 6-10-2016 10:07, Gucea Doru wrote:
> On Wed, Oct 5, 2016 at 11:12 AM, Arend Van Spriel
> <arend.vanspriel@broadcom.com> wrote:
>> On 4-10-2016 13:39, Gucea Doru wrote:
>>> On Sat, Oct 1, 2016 at 2:52 PM, Arend van Spriel
>>>> <arend.vanspriel@broadcom.com> wrote:
>>>>>
>>>>>
>>>>> On 29-09-16 13:32, Gucea Doru wrote:
>>>>>> On Tue, Sep 27, 2016 at 12:03 PM, Gucea Doru <gucea.doru@gmail.com> wrote:
>>>>>>> What is the decision triggering the exit from the PS mode immediately
>>>>>>> after the ping request? I am asking this because 802.11 PS legacy
>>>>>>> specifies that the client should wait for a beacon with TIM set in
>>>>>>> order to wake up: in my case, there is no beacon between the ping
>>>>>>> request message and the Null frame that announces the exit from the PS
>>>>>>> mode.
>>>>>>
>>>>>>
>>>>>> Any help would be highly appreciated :)
>>>>>
>>>>> Actually though I already sent you are reply, but alas here it is.
>>>>>
>>>>> bcmdhd is our aosp driver. I am maintaining the upstream brcm80211
>>>>> drivers. Regardless your question is more for firmware running on the
>>>>> device. So like the same behavior would be observed when using brcmfmac
>>>>> with same firmware.
>>>>>
>>>>>> IEEE Std 802.11-2012, section 10.2.1.8 specifies that "when the STA
>>>>>> detects that the bit corresponding to its AID is 1 i the TIM, the STA
>>>>>> shall issue a PS Poll". In my capture there are cases when the STA
>>>>>> exits the PS mode without waiting for a beacon.
>>>>>
>>>>> It is a bit tricky, but the standard does not explicitly say the STA
>>>>> should be in power-save at any other time. So it is difficult to say
>>>>> what event occurred on the STA side to exit PS mode. Also STA means
>>>>> P2P-Client as you say. That means that you have multiple interfaces:
>>>>> regular STA and P2P-Client. So is the STA connected to some other AP or
>>>>> just not connected. wpa_supplicant will do intermittent scan or initiate
>>>>> scheduled scan by which firmware will scan at a certain interval. That
>>>>> is just some things I can come up with and I am sure there are more.
>>>
>>> I agree that there may be some events belonging to the regular STA
>>> interface that could trigger the Null Frame (which includes the exit
>>> from PS Mode). However, I would expect to see some management frames
>>> in the air before/after the Null Packet (e.g.: a Probe request in case
>>> of a scheduled scan). But in my case the trigger for the Null frame
>>> seems to be the ping request packet, the scenario is the same every
>>> time: ping request -> Block ACK -> Null Frame (Wireshark trace
>>> confirms this behavior).
>>>
>>> I thought that you had a power save optimization algorithm that keeps
>>> the card on a few milliseconds just to see if we can have a fast reply
>>> from the peer. Does this ring a bell? :)
>>
>> It does not. That would be implemented in firmware. As said I am working
>> on brcmfmac/brcmsmac. So bcmdhd and firmware are not my expertise.
>>
> 
> Arend, could you please redirect me to a Broadcom firmware maintainer?

Can you please elaborate on your platform, broadcom chipset, and what
version of bcmdhd you are using.

Regards,
Arend

^ permalink raw reply

* Re: bcmdhd: Strange Power Save messages
From: Gucea Doru @ 2016-10-06  8:07 UTC (permalink / raw)
  To: Arend Van Spriel
  Cc: Krishna Chaitanya, Arend van Spriel, Andra Paraschiv,
	linux-wireless
In-Reply-To: <db6d5803-3fec-64d9-5290-32216d1bd031@broadcom.com>

On Wed, Oct 5, 2016 at 11:12 AM, Arend Van Spriel
<arend.vanspriel@broadcom.com> wrote:
> On 4-10-2016 13:39, Gucea Doru wrote:
>> On Sat, Oct 1, 2016 at 2:52 PM, Arend van Spriel
>>> <arend.vanspriel@broadcom.com> wrote:
>>>>
>>>>
>>>> On 29-09-16 13:32, Gucea Doru wrote:
>>>>> On Tue, Sep 27, 2016 at 12:03 PM, Gucea Doru <gucea.doru@gmail.com> wrote:
>>>>>> What is the decision triggering the exit from the PS mode immediately
>>>>>> after the ping request? I am asking this because 802.11 PS legacy
>>>>>> specifies that the client should wait for a beacon with TIM set in
>>>>>> order to wake up: in my case, there is no beacon between the ping
>>>>>> request message and the Null frame that announces the exit from the PS
>>>>>> mode.
>>>>>
>>>>>
>>>>> Any help would be highly appreciated :)
>>>>
>>>> Actually though I already sent you are reply, but alas here it is.
>>>>
>>>> bcmdhd is our aosp driver. I am maintaining the upstream brcm80211
>>>> drivers. Regardless your question is more for firmware running on the
>>>> device. So like the same behavior would be observed when using brcmfmac
>>>> with same firmware.
>>>>
>>>>> IEEE Std 802.11-2012, section 10.2.1.8 specifies that "when the STA
>>>>> detects that the bit corresponding to its AID is 1 i the TIM, the STA
>>>>> shall issue a PS Poll". In my capture there are cases when the STA
>>>>> exits the PS mode without waiting for a beacon.
>>>>
>>>> It is a bit tricky, but the standard does not explicitly say the STA
>>>> should be in power-save at any other time. So it is difficult to say
>>>> what event occurred on the STA side to exit PS mode. Also STA means
>>>> P2P-Client as you say. That means that you have multiple interfaces:
>>>> regular STA and P2P-Client. So is the STA connected to some other AP or
>>>> just not connected. wpa_supplicant will do intermittent scan or initiate
>>>> scheduled scan by which firmware will scan at a certain interval. That
>>>> is just some things I can come up with and I am sure there are more.
>>
>> I agree that there may be some events belonging to the regular STA
>> interface that could trigger the Null Frame (which includes the exit
>> from PS Mode). However, I would expect to see some management frames
>> in the air before/after the Null Packet (e.g.: a Probe request in case
>> of a scheduled scan). But in my case the trigger for the Null frame
>> seems to be the ping request packet, the scenario is the same every
>> time: ping request -> Block ACK -> Null Frame (Wireshark trace
>> confirms this behavior).
>>
>> I thought that you had a power save optimization algorithm that keeps
>> the card on a few milliseconds just to see if we can have a fast reply
>> from the peer. Does this ring a bell? :)
>
> It does not. That would be implemented in firmware. As said I am working
> on brcmfmac/brcmsmac. So bcmdhd and firmware are not my expertise.
>

Arend, could you please redirect me to a Broadcom firmware maintainer?

Thank you,
Doru

^ permalink raw reply

* Re: [PATCH] ath10k: cache calibration data when the core is stopped.
From: Valo, Kalle @ 2016-10-06  7:40 UTC (permalink / raw)
  To: Marty Faltesek; +Cc: ath10k@lists.infradead.org, linux-wireless@vger.kernel.org
In-Reply-To: <CAOiWkA_fuhzPeF7qPP0hLN3KTZUOi6_te5Zc3+vxKQ9GRj0VBw@mail.gmail.com>

Marty Faltesek <mfaltesek@google.com> writes:

> On Mon, Oct 3, 2016 at 3:46 AM, Valo, Kalle <kvalo@qca.qualcomm.com> wrot=
e:
>> Marty Faltesek <mfaltesek@google.com> writes:
>>
>>> Caching calibration data allows it to be accessed when the
>>> device is not active.
>>>
>>> Signed-off-by: Marty Faltesek <mfaltesek@google.com>
>>
>> No comma in the title, please.
>>
>> What tree did you use as the baseline? This doesn't seem to apply to
>> ath.git:
>
> We use backports 20160122 which has not been updated since earlier this y=
ear.
> I can forward port it to your tree, and make sure
> it builds but won't be able to test it. Will that be OK?

Sure, I can test it.

>> Also please note that this patch (which I'm queuing to 4.9) touches the
>> same area:
>>
>> ath10k: fix debug cal data file
>>
>> https://patchwork.kernel.org/patch/9340953/
>
> I've modified this too, and this won't be necessary, so can you drop
> it? If not, let me know and I'll pull it in and make sure I'm based
> off it too.

Actually I was first planning to push 9340953 to 4.9 and take your patch
to 4.10. But your patch is a cleaner approach to this and maybe I should
push that to 4.9 instead? Need to think a bit more.

--=20
Kalle Valo=

^ permalink raw reply

* Re: [1/2] ath10k: clean up HTT tx buffer allocation and free
From: Mohammed Shafi Shajakhan @ 2016-10-06  7:37 UTC (permalink / raw)
  To: Kalle Valo; +Cc: Mohammed Shafi Shajakhan, ath10k, linux-wireless
In-Reply-To: <dc6b3afe60574d7bb60e1913da3f1ad4@euamsexm01a.eu.qualcomm.com>

On Thu, Oct 06, 2016 at 09:31:41AM +0200, Kalle Valo wrote:
> Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com> wrote:
> > From: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
> > 
> > cleanup 'ath10k_htt_tx_alloc' by introducing the API's
> > 'ath10k_htt_tx_alloc/free_{cont_txbuf, txdone_fifo} and
> > re-use them whereever needed
> > 
> > Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
> 
> Patch 1 doesn't apply and the conflict was not trivial

[shafi] oops will rebase it, not sure how i missed it :-(

> 
> error: patch failed: drivers/net/wireless/ath/ath10k/htt_tx.c:388
> error: drivers/net/wireless/ath/ath10k/htt_tx.c: patch does not apply
> stg import: Diff does not apply cleanly
> error: patch failed: drivers/net/wireless/ath/ath10k/htt_tx.c:283
> error: drivers/net/wireless/ath/ath10k/htt_tx.c: patch does not apply
> stg import: Diff does not apply cleanly
> 
> 2 patches set to Changes Requested.
> 
> 9361741 [1/2] ath10k: clean up HTT tx buffer allocation and free
> 9361743 [2/2] ath10k: Remove extraneous error message in tx alloc
> 
> -- 
> https://patchwork.kernel.org/patch/9361741/
> 
> Documentation about submitting wireless patches and checking status
> from patchwork:
> 
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> 

^ permalink raw reply

* Re: [1/2] ath10k: clean up HTT tx buffer allocation and free
From: Kalle Valo @ 2016-10-06  7:31 UTC (permalink / raw)
  To: Mohammed Shafi Shajakhan
  Cc: ath10k, mohammed, linux-wireless, Mohammed Shafi Shajakhan
In-Reply-To: <1475584772-4091-2-git-send-email-mohammed@qca.qualcomm.com>

Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com> wrote:
> From: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
> 
> cleanup 'ath10k_htt_tx_alloc' by introducing the API's
> 'ath10k_htt_tx_alloc/free_{cont_txbuf, txdone_fifo} and
> re-use them whereever needed
> 
> Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>

Patch 1 doesn't apply and the conflict was not trivial

error: patch failed: drivers/net/wireless/ath/ath10k/htt_tx.c:388
error: drivers/net/wireless/ath/ath10k/htt_tx.c: patch does not apply
stg import: Diff does not apply cleanly
error: patch failed: drivers/net/wireless/ath/ath10k/htt_tx.c:283
error: drivers/net/wireless/ath/ath10k/htt_tx.c: patch does not apply
stg import: Diff does not apply cleanly

2 patches set to Changes Requested.

9361741 [1/2] ath10k: clean up HTT tx buffer allocation and free
9361743 [2/2] ath10k: Remove extraneous error message in tx alloc

-- 
https://patchwork.kernel.org/patch/9361741/

Documentation about submitting wireless patches and checking status
from patchwork:

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [1/3] ath6kl: fix busreqs so they can be reused when sg is cleaned up
From: Kalle Valo @ 2016-10-06  7:15 UTC (permalink / raw)
  To: James Minor
  Cc: linux-wireless, ath6kl, julia.cartwright, steve.derosier,
	James Minor
In-Reply-To: <1475517604-17710-2-git-send-email-james.minor@ni.com>

James Minor <james.minor@ni.com> wrote:
> To reuse the busreqs in case of hardware restart, they must be
> properly reinitialized.  If the scat_req pointer isn't reset to
> 0, __ath6kl_sdio_write_async() will assume there is sg work to be
> done (causing a kernel OOPS).
> 
> Signed-off-by: James Minor <james.minor@ni.com>
> Reviewed-by: Steve deRosier <steve.derosier@lairdtech.com>

3 patches applied to ath-next branch of ath.git, thanks.

3605d751d5dd ath6kl: fix busreqs so they can be reused when sg is cleaned up
db14b18a73a1 ath6kl: after cleanup properly reflect that sg is disabled
fdb6e4839e3a ath6kl: configure SDIO when power is reapplied

-- 
https://patchwork.kernel.org/patch/9360777/

Documentation about submitting wireless patches and checking status
from patchwork:

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: [RFC 0/3] of: add common bindings to (de)activate IEEE 802.11 bands
From: Felix Fietkau @ 2016-10-05 20:34 UTC (permalink / raw)
  To: Rob Herring
  Cc: Martin Blumenstingl, Mark Rutland, Frank Rowand,
	devicetree@vger.kernel.org, linux-wireless, ath9k-devel,
	ath9k-devel, Kalle Valo
In-Reply-To: <CAL_JsqJfxEZCtdNi-wRgrtdmkPsGw0-ZMGhdbcoQ=seyij5LiQ@mail.gmail.com>

On 2016-10-05 22:31, Rob Herring wrote:
> On Wed, Oct 5, 2016 at 1:36 PM, Felix Fietkau <nbd@nbd.name> wrote:
>> On 2016-10-05 20:25, Martin Blumenstingl wrote:
>>> On Mon, Oct 3, 2016 at 5:22 PM, Rob Herring <robh+dt@kernel.org> wrote:
>>>> On Sun, Oct 2, 2016 at 5:50 PM, Martin Blumenstingl
>>>> <martin.blumenstingl@googlemail.com> wrote:
>>>>> There are at least two drivers (ath9k and mt76) out there, where
>>>>> devicetree authors need to override the enabled bands.
>>>>>
>>>>> For ath9k there is only one use-case: disabling a band which has been
>>>>> incorrectly enabled by the vendor in the EEPROM (enabling a band is not
>>>>> possible because the calibration data would be missing in the EEPROM).
>>>>> The mt76 driver (currently pending for review) however allows enabling
>>>>> and disabling the 2.4GHz and 5GHz band, see [0].
>>>>>
>>>>> Based on the discussion of (earlier versions of) my ath9k devicetree
>>>>> patch it was suggested [1] that we use enable- and disable- properties.
>>>>> The current patch implements:
>>>>> - bands can be enabled or disabled with the corresponding property
>>>>> - if both (disable and enable) properties are given and a driver asks
>>>>>   whether the band is enabled then the logic will return false (= not
>>>>>   enabled, preferring the disabled flag)
>>>>> - if both (disable and enable) properties are given and a driver asks
>>>>>   whether the band is disabled then the logic will return true (again,
>>>>>   preferring the disabled flag over the enabled flag)
>>>>>
>>>>> We can still change the logic to do what the mt76 driver does (I am
>>>>> fine with both solutions):
>>>>> - property not available: driver decides which bands to enable
>>>>> - property set to 0: disable the band
>>>>> - property set to 1: enable the band
>>>>
>>>> I prefer this way. Really, I'd prefer just a boolean disable property.
>>>> I'm not sure why you need the enable. We usually have these tri-state
>>>> properties when you want not present to mean use the bootloader or
>>>> default setting. Try to make not present the most common case.
>>> Felix: could you please give a few details why mt76 can not only
>>> disable but also enable a specific band?
>>> There is no specific comment in the sources [0] why this is needed.
>>> All other drivers that I am aware of (ath9k, rt2x00) only allow
>>> disabling a specific band, they never enable it (this has to be done
>>> explicitly by the EEPROM).
>> None of the devices use it at the moment, I probably added it when I
>> played with a device that had broken EEPROM data. I guess I decided to
>> do it this way because on many devices the band capability field simply
>> cannot be trusted.
>> I guess I would be okay with just having a disable property and adding
>> an enable property later only if we end up actually needing it.
> 
> If EEPROM is commonly wrong or missing, then seems like you want to
> plan ahead and support both enable and disable.
> 
> The other approach I've mentioned before is just put raw EEPROM data
> into DT to override the EEPROM. This would be better if there's a
> large number of settings you need.
So far, other EEPROM settings didn't need to be changed.

- Felix

^ permalink raw reply

* Re: [RFC 0/3] of: add common bindings to (de)activate IEEE 802.11 bands
From: Rob Herring @ 2016-10-05 20:31 UTC (permalink / raw)
  To: Felix Fietkau
  Cc: Martin Blumenstingl, Mark Rutland, Frank Rowand,
	devicetree@vger.kernel.org, linux-wireless, ath9k-devel,
	ath9k-devel, Kalle Valo
In-Reply-To: <23f49efe-ddef-ea16-00de-7477e4872d82@nbd.name>

On Wed, Oct 5, 2016 at 1:36 PM, Felix Fietkau <nbd@nbd.name> wrote:
> On 2016-10-05 20:25, Martin Blumenstingl wrote:
>> On Mon, Oct 3, 2016 at 5:22 PM, Rob Herring <robh+dt@kernel.org> wrote:
>>> On Sun, Oct 2, 2016 at 5:50 PM, Martin Blumenstingl
>>> <martin.blumenstingl@googlemail.com> wrote:
>>>> There are at least two drivers (ath9k and mt76) out there, where
>>>> devicetree authors need to override the enabled bands.
>>>>
>>>> For ath9k there is only one use-case: disabling a band which has been
>>>> incorrectly enabled by the vendor in the EEPROM (enabling a band is not
>>>> possible because the calibration data would be missing in the EEPROM).
>>>> The mt76 driver (currently pending for review) however allows enabling
>>>> and disabling the 2.4GHz and 5GHz band, see [0].
>>>>
>>>> Based on the discussion of (earlier versions of) my ath9k devicetree
>>>> patch it was suggested [1] that we use enable- and disable- properties.
>>>> The current patch implements:
>>>> - bands can be enabled or disabled with the corresponding property
>>>> - if both (disable and enable) properties are given and a driver asks
>>>>   whether the band is enabled then the logic will return false (= not
>>>>   enabled, preferring the disabled flag)
>>>> - if both (disable and enable) properties are given and a driver asks
>>>>   whether the band is disabled then the logic will return true (again,
>>>>   preferring the disabled flag over the enabled flag)
>>>>
>>>> We can still change the logic to do what the mt76 driver does (I am
>>>> fine with both solutions):
>>>> - property not available: driver decides which bands to enable
>>>> - property set to 0: disable the band
>>>> - property set to 1: enable the band
>>>
>>> I prefer this way. Really, I'd prefer just a boolean disable property.
>>> I'm not sure why you need the enable. We usually have these tri-state
>>> properties when you want not present to mean use the bootloader or
>>> default setting. Try to make not present the most common case.
>> Felix: could you please give a few details why mt76 can not only
>> disable but also enable a specific band?
>> There is no specific comment in the sources [0] why this is needed.
>> All other drivers that I am aware of (ath9k, rt2x00) only allow
>> disabling a specific band, they never enable it (this has to be done
>> explicitly by the EEPROM).
> None of the devices use it at the moment, I probably added it when I
> played with a device that had broken EEPROM data. I guess I decided to
> do it this way because on many devices the band capability field simply
> cannot be trusted.
> I guess I would be okay with just having a disable property and adding
> an enable property later only if we end up actually needing it.

If EEPROM is commonly wrong or missing, then seems like you want to
plan ahead and support both enable and disable.

The other approach I've mentioned before is just put raw EEPROM data
into DT to override the EEPROM. This would be better if there's a
large number of settings you need.

Rob

^ permalink raw reply

* [PATCH v2] ath10k: cache calibration data when the core is stopped
From: Marty Faltesek @ 2016-10-05 20:23 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, Marty Faltesek

Caching calibration data allows it to be accessed when the
device is not active.

---
 drivers/net/wireless/ath/ath10k/core.h  |  1 +
 drivers/net/wireless/ath/ath10k/debug.c | 72 ++++++++++++++-------------------
 2 files changed, 31 insertions(+), 42 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 6e5aa2d..7274ebe 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -452,6 +452,7 @@ struct ath10k_debug {
 	u32 nf_cal_period;
 
 	struct ath10k_fw_crash_data *fw_crash_data;
+	void *cal_data;
 };
 
 enum ath10k_state {
diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c
index 832da6e..668074c 100644
--- a/drivers/net/wireless/ath/ath10k/debug.c
+++ b/drivers/net/wireless/ath/ath10k/debug.c
@@ -1451,75 +1451,58 @@ static const struct file_operations fops_fw_dbglog = {
 	.llseek = default_llseek,
 };
 
-static int ath10k_debug_cal_data_open(struct inode *inode, struct file *file)
+static int ath10k_debug_cal_data_fetch(struct ath10k *ar)
 {
-	struct ath10k *ar = inode->i_private;
-	void *buf;
 	u32 hi_addr;
 	__le32 addr;
 	int ret;
 
-	mutex_lock(&ar->conf_mutex);
-
-	if (ar->state != ATH10K_STATE_ON &&
-	    ar->state != ATH10K_STATE_UTF) {
-		ret = -ENETDOWN;
-		goto err;
-	}
-
-	buf = vmalloc(ar->hw_params.cal_data_len);
-	if (!buf) {
-		ret = -ENOMEM;
-		goto err;
-	}
-
 	hi_addr = host_interest_item_address(HI_ITEM(hi_board_data));
 
 	ret = ath10k_hif_diag_read(ar, hi_addr, &addr, sizeof(addr));
+
 	if (ret) {
-		ath10k_warn(ar, "failed to read hi_board_data address: %d\n", ret);
-		goto err_vfree;
+		ath10k_warn(ar, "failed to read hi_board_data address: %d\n",
+		            ret);
+		return ret;
 	}
 
-	ret = ath10k_hif_diag_read(ar, le32_to_cpu(addr), buf,
+	ret = ath10k_hif_diag_read(ar, le32_to_cpu(addr), ar->debug.cal_data,
 				   ar->hw_params.cal_data_len);
 	if (ret) {
 		ath10k_warn(ar, "failed to read calibration data: %d\n", ret);
-		goto err_vfree;
+		return ret;
 	}
 
-	file->private_data = buf;
+	return 0;
+}
 
-	mutex_unlock(&ar->conf_mutex);
+static int ath10k_debug_cal_data_open(struct inode *inode, struct file *file)
+{
+	struct ath10k *ar = inode->i_private;
 
-	return 0;
+	mutex_lock(&ar->conf_mutex);
 
-err_vfree:
-	vfree(buf);
+	if (ar->state == ATH10K_STATE_ON ||
+	    ar->state == ATH10K_STATE_UTF) {
+		ath10k_debug_cal_data_fetch(ar);
+	}
 
-err:
+	file->private_data = ar;
 	mutex_unlock(&ar->conf_mutex);
 
-	return ret;
+	return 0;
 }
 
 static ssize_t ath10k_debug_cal_data_read(struct file *file,
-					  char __user *user_buf,
-					  size_t count, loff_t *ppos)
+                                          char __user *user_buf,
+                                          size_t count, loff_t *ppos)
 {
 	struct ath10k *ar = file->private_data;
-	void *buf = file->private_data;
 
 	return simple_read_from_buffer(user_buf, count, ppos,
-				       buf, ar->hw_params.cal_data_len);
-}
-
-static int ath10k_debug_cal_data_release(struct inode *inode,
-					 struct file *file)
-{
-	vfree(file->private_data);
-
-	return 0;
+				       ar->debug.cal_data,
+				       ar->hw_params.cal_data_len);
 }
 
 static ssize_t ath10k_write_ani_enable(struct file *file,
@@ -1580,7 +1563,6 @@ static const struct file_operations fops_ani_enable = {
 static const struct file_operations fops_cal_data = {
 	.open = ath10k_debug_cal_data_open,
 	.read = ath10k_debug_cal_data_read,
-	.release = ath10k_debug_cal_data_release,
 	.owner = THIS_MODULE,
 	.llseek = default_llseek,
 };
@@ -1932,6 +1914,8 @@ void ath10k_debug_stop(struct ath10k *ar)
 {
 	lockdep_assert_held(&ar->conf_mutex);
 
+	ath10k_debug_cal_data_fetch(ar);
+
 	/* Must not use _sync to avoid deadlock, we do that in
 	 * ath10k_debug_destroy(). The check for htt_stats_mask is to avoid
 	 * warning from del_timer(). */
@@ -2343,7 +2327,9 @@ int ath10k_debug_create(struct ath10k *ar)
 	ar->debug.fw_crash_data = vzalloc(sizeof(*ar->debug.fw_crash_data));
 	if (!ar->debug.fw_crash_data)
 		return -ENOMEM;
-
+	ar->debug.cal_data = vzalloc(ar->hw_params.cal_data_len);
+	if (!ar->debug.cal_data)
+		return -ENOMEM;
 	INIT_LIST_HEAD(&ar->debug.fw_stats.pdevs);
 	INIT_LIST_HEAD(&ar->debug.fw_stats.vdevs);
 	INIT_LIST_HEAD(&ar->debug.fw_stats.peers);
@@ -2355,7 +2341,9 @@ int ath10k_debug_create(struct ath10k *ar)
 void ath10k_debug_destroy(struct ath10k *ar)
 {
 	vfree(ar->debug.fw_crash_data);
+	vfree(ar->debug.cal_data);
 	ar->debug.fw_crash_data = NULL;
+	ar->debug.cal_data = NULL;
 
 	ath10k_debug_fw_stats_reset(ar);
 
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* Re: [PATCH v4] ath9k: Switch to using mac80211 intermediate software queues.
From: Toke Høiland-Jørgensen @ 2016-10-05 19:56 UTC (permalink / raw)
  To: Kalle Valo
  Cc: linux-wireless, make-wifi-fast, ath9k-devel, Tim Shepard,
	Felix Fietkau
In-Reply-To: <874m4qof58.fsf@kamboji.qca.qualcomm.com>

Kalle Valo <kvalo@codeaurora.org> writes:

> Toke H=C3=B8iland-J=C3=B8rgensen <toke@toke.dk> writes:
>
>> Kalle Valo <kvalo@codeaurora.org> writes:
>>
>>> Toke H=C3=B8iland-J=C3=B8rgensen <toke@toke.dk> writes:
>>>
>>>> Kalle Valo <kvalo@codeaurora.org> writes:
>>>>
>>>>> Toke H=C3=B8iland-J=C3=B8rgensen <toke@toke.dk> writes:
>>>>>
>>>>> I understand your point, but I don't want to rush this to 4.9 and then
>>>>> start getting lots of bug reports and eventually forced to revert it.=
 If
>>>>> we just found a new serious regression the chances are that there are
>>>>> more lurking somewhere and this patch is just not ready yet.
>>>>
>>>> So, the changes to mac80211 that fixes the known regressions of this
>>>> patch have gone in.
>>>
>>> I guess you mean this commit:
>>>
>>> bb42f2d13ffc mac80211: Move reorder-sensitive TX handlers to after TXQ =
dequeue
>>>
>>> (Just making sure that I have the same commit in my tree when I apply
>>> this)
>>
>> Yup, that's the one :)
>>
>>>> Any chance of seeing this merged during the current merge window? :)
>>>
>>> I sent last new feature ("-next") patches for 4.9 last week, sorry. So
>>> this has to wait for 4.10.
>>
>> Ah, right, I think I got my merge windows confused. You already said you
>> wouldn't take it for 4.9. So I guess what I'm asking is for you to put
>> it into the appropriate -next tree so it can get some wider exposure
>> ahead of the *next* merge window...
>
> Yeah, we have plenty of time for 4.10 :) So my plan is to apply this
> after I open wireless-drivers-next in 2-3 weeks or so. That would mean
> that the patch would hit Linus' tree when 4.10-rc1 is released
> (estimated to happen on 2017-01-01). The timing is actually perfect as
> now we get maximal testing time on -next.

So the -next trees are those that are open outside the merge window.
Right, got it; thanks :)

>>> And I assume I need to take v5:
>>>
>>> https://patchwork.kernel.org/patch/9311037/
>>
>> Yes. Haven't noticed anything that changed since that might conflict
>> with it, but let me know if I missed something and you want a refreshed
>> version.
>
> Thanks, I'll let you know if there are any problems.

Cool.

-Toke

^ permalink raw reply

* Re: [PATCH RESEND v3] mwifiex: parse device tree node for PCIe
From: Brian Norris @ 2016-10-05 19:48 UTC (permalink / raw)
  To: Amitkumar Karwar; +Cc: linux-wireless, devicetree, rajatja, Xinming Hu
In-Reply-To: <1475167183-18376-1-git-send-email-akarwar@marvell.com>

Hi,

On Thu, Sep 29, 2016 at 10:09:43PM +0530, Amitkumar Karwar wrote:
> From: Xinming Hu <huxm@marvell.com>
> 
> This patch derives device tree node from pcie bus layer framework.
> Device tree bindings file has been renamed(marvell-sd8xxx.txt ->
> marvell-8xxx.txt) to accomodate PCIe changes.
> 
> Signed-off-by: Xinming Hu <huxm@marvell.com>
> Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
> ---
> v2: Included vendor and product IDs in compatible strings for PCIe
> chipsets(Rob Herring)
> v3: Patch is created using -M option so that it will only include diff of
> original and renamed files(Rob Herring)
> Resend v3: Resending the patch because I missed to include device tree mailing
> while sending v3.
> ---

Why is this patch so different than the SDIO DT handling for the same
driver?

>  .../{marvell-sd8xxx.txt => marvell-8xxx.txt}         |  8 +++++---
>  drivers/net/wireless/marvell/mwifiex/pcie.c          | 20 ++++++++++++++++++++
>  drivers/net/wireless/marvell/mwifiex/sta_cmd.c       |  3 ++-
>  3 files changed, 27 insertions(+), 4 deletions(-)
>  rename Documentation/devicetree/bindings/net/wireless/{marvell-sd8xxx.txt => marvell-8xxx.txt} (91%)
> 
> diff --git a/Documentation/devicetree/bindings/net/wireless/marvell-sd8xxx.txt b/Documentation/devicetree/bindings/net/wireless/marvell-8xxx.txt
> similarity index 91%
> rename from Documentation/devicetree/bindings/net/wireless/marvell-sd8xxx.txt
> rename to Documentation/devicetree/bindings/net/wireless/marvell-8xxx.txt
> index c421aba..dfe5f8e 100644
> --- a/Documentation/devicetree/bindings/net/wireless/marvell-sd8xxx.txt
> +++ b/Documentation/devicetree/bindings/net/wireless/marvell-8xxx.txt
> @@ -1,8 +1,8 @@
> -Marvell 8897/8997 (sd8897/sd8997) SDIO devices
> +Marvell 8897/8997 (sd8897/sd8997/pcie8997) SDIO/PCIE devices
>  ------
>  
> -This node provides properties for controlling the marvell sdio wireless device.
> -The node is expected to be specified as a child node to the SDIO controller that
> +This node provides properties for controlling the marvell sdio/pcie wireless device.
> +The node is expected to be specified as a child node to the SDIO/PCIE controller that
>  connects the device to the system.
>  
>  Required properties:
> @@ -10,6 +10,8 @@ Required properties:
>    - compatible : should be one of the following:
>  	* "marvell,sd8897"
>  	* "marvell,sd8997"
> +	* "pci11ab,2b42"
> +	* "pci1b4b,2b42"
>  
>  Optional properties:
>  
> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
> index 063c707..f1eeb73 100644
> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> @@ -37,6 +37,23 @@ static struct mwifiex_if_ops pcie_ops;
>  
>  static struct semaphore add_remove_card_sem;
>  
> +static const struct of_device_id mwifiex_pcie_of_match_table[] = {
> +	{ .compatible = "pci11ab,2b42" },
> +	{ .compatible = "pci1b4b,2b42" },
> +	{ }
> +};
> +
> +static int mwifiex_pcie_probe_of(struct device *dev)
> +{
> +	if (!dev->of_node ||

In sdio.c, this check is done silently; that way, we don't error out
(and print an error string) just because there's no DT node. And I think
that makes sense.

Maybe pull this condition out separately and return 0?

> +	    !of_match_node(mwifiex_pcie_of_match_table, dev->of_node)) {
> +		pr_err("pcie device node not available");
> +		return -1;
> +	}
> +

Just an FYI, the equivalent sdio.c code (mwifiex_sdio_probe_of()) also
parses a platform-specific wakeup pin. Rajat and I were considering
moving that to mwifiex_register(), so we can get the same handling in
pcie.c. But I think it's probably sane to leave the compatible table
here in the interface driver for now.

> +	return 0;
> +}
> +
>  static int
>  mwifiex_map_pci_memory(struct mwifiex_adapter *adapter, struct sk_buff *skb,
>  		       size_t size, int flags)
> @@ -206,6 +223,9 @@ static int mwifiex_pcie_probe(struct pci_dev *pdev,
>  		card->pcie.can_ext_scan = data->can_ext_scan;
>  	}
>  
> +	/* device tree node parsing and platform specific configuration*/
> +	mwifiex_pcie_probe_of(&pdev->dev);

You aren't catching the error code here. Probably because you're wrongly
returning an error above for the !of_node case.

Brian

> +
>  	if (mwifiex_add_card(card, &add_remove_card_sem, &pcie_ops,
>  			     MWIFIEX_PCIE)) {
>  		pr_err("%s failed\n", __func__);
> diff --git a/drivers/net/wireless/marvell/mwifiex/sta_cmd.c b/drivers/net/wireless/marvell/mwifiex/sta_cmd.c
> index 638d30a..f2a7a13 100644
> --- a/drivers/net/wireless/marvell/mwifiex/sta_cmd.c
> +++ b/drivers/net/wireless/marvell/mwifiex/sta_cmd.c
> @@ -2218,7 +2218,8 @@ int mwifiex_sta_init_cmd(struct mwifiex_private *priv, u8 first_sta, bool init)
>  		 * The cal-data can be read from device tree and/or
>  		 * a configuration file and downloaded to firmware.
>  		 */
> -		if (priv->adapter->iface_type == MWIFIEX_SDIO &&
> +		if ((priv->adapter->iface_type == MWIFIEX_SDIO ||
> +		    priv->adapter->iface_type == MWIFIEX_PCIE) &&
>  		    adapter->dev->of_node) {
>  			adapter->dt_node = adapter->dev->of_node;
>  			if (of_property_read_u32(adapter->dt_node,

^ permalink raw reply

* Re: ath10k: device stops working ("failed to install key for vdev" error in kernel log)
From: Ben Greear @ 2016-10-05 19:11 UTC (permalink / raw)
  To: Martin Blumenstingl; +Cc: ath10k, linux-wireless
In-Reply-To: <CAFBinCAv9PDYQ7J+0LtQg7vKnqp2HVpotqwjALrAUQf57YM4tg@mail.gmail.com>

On 10/05/2016 12:06 PM, Martin Blumenstingl wrote:
> On Wed, Oct 5, 2016 at 8:58 PM, Ben Greear <greearb@candelatech.com> wrote:
>> On 10/05/2016 11:51 AM, Martin Blumenstingl wrote:
>>>>
>>>> [54064.293597] ath10k_pci 0000:02:00.0: failed to install key for vdev
>>>> 0 peer [AP MAC addr]: -145
>>>> [54064.301234] wlan0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from
>>>> hardware (-145)
>>>> [54067.305703] ath10k_pci 0000:02:00.0: failed to install key for vdev
>>>> 0 peer [AP MAC addr]: -145
>>>> [54067.313307] wlan0: failed to set key (1, ff:ff:ff:ff:ff:ff) to
>>>> hardware (-145)
>>>
>>> it just happened again:
>>> ...
>>> [130266.948005] ath10k_pci 0000:02:00.0: failed to install key for
>>> vdev 0 peer [AP MAC address]: -145
>>> [130266.955697] wlan0: failed to remove key (2, ff:ff:ff:ff:ff:ff)
>>> from hardware (-145)
>>> [130269.964069] ath10k_pci 0000:02:00.0: failed to install key for
>>> vdev 0 peer [AP MAC address]: -145
>>> [130269.971775] wlan0: failed to set key (2, ff:ff:ff:ff:ff:ff) to
>>> hardware (-145)
>>> [172198.889700] ath10k_pci 0000:02:00.0: failed to send pdev bss chan
>>> info request
>>> [172201.897770] ath10k_pci 0000:02:00.0: failed to send pdev bss chan
>>> info request
>>>
>>> I tried to get more information from the firmware by looking at the
>>> fw_* debugfs files:
>>> # cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_reset_stats
>>> fw_crash_counter                0
>>> fw_warm_reset_counter           4
>>> fw_cold_reset_counter           0
>>> # cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_stats
>>> cat: can't open '/sys/kernel/debug/ieee80211/phy0/ath10k/fw_stats':
>>> Resource temporarily unavailable
>>> # cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_crash_dump
>>> cat: can't open
>>> '/sys/kernel/debug/ieee80211/phy0/ath10k/fw_crash_dump': No data
>>> available
>>> # cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_dbglog
>>> 0x00000000 0
>>> # cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_checksums
>>> firmware-N.bin          9d340dd9
>>> athwlan                 8d25deed
>>> otp                     f3efeb4f
>>> codeswap                00000000
>>> board-N.bin             bebc7c08
>>> board                   bebc7c08
>>>
>>> This is still with firmware 10.2.4.70.54.
>>> Please let me know if you need further information.
>>
>>
>> Not sure about your firmware exactly, but the timeout might happen because
>> firmware has leaked and/or run-out of resources, fails to insert the key,
>> and then it just doesn't respond instead of sending an event.  So, driver
>> gets the timeout message and who knows what state your system is in.
>>
>> I hit this when doing capacity tests, and I modified my firmware to always
>> send an event, and driver to deal with it.  I also fixed some resource leaks
>> and tuned firmware objects to make sure I do not hit the key exhaustion
>> state.
> That sounds bad.
> Especially as I would not describe my current setup as "high capacity" network.
> The worst-case I have is 5 devices:
> - Nexus 5
> - Sony Xperia Z3 Compact
> - Notebook with Intel AC 7260
> - QCA9880-2R4E in station (client) mode
> - BCM4330 based device
>
>> What is your test scenario in this case?
> with this specific crash it was pretty easy:
> - AP did not have any connections while I was at work
> - when I came back home two devices (Nexus 5 and Sony Xperia Z3
> Compact) tried to connect to the AP
> - device went into error state
>
> I already had days where only one phone was turned on and I was still
> able to reproduce it.

Firmware sometimes sends dbglog messages when it detects errors.  You can hack your driver with patches
I have posted to enable printing this out, and then maybe QCA firmware guys could debug this issue
for you.

Or, if you can reproduce this with my firmware and kernel (which enables the dbglog print
by default), email dmesg or similar output to me and maybe I will have a clue as to what
is going on.

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: ath10k: device stops working ("failed to install key for vdev" error in kernel log)
From: Martin Blumenstingl @ 2016-10-05 19:06 UTC (permalink / raw)
  To: Ben Greear; +Cc: ath10k, linux-wireless
In-Reply-To: <bdd2b4f1-114b-f114-c908-fc0f57b08ce8@candelatech.com>

On Wed, Oct 5, 2016 at 8:58 PM, Ben Greear <greearb@candelatech.com> wrote:
> On 10/05/2016 11:51 AM, Martin Blumenstingl wrote:
>>>
>>> [54064.293597] ath10k_pci 0000:02:00.0: failed to install key for vdev
>>> 0 peer [AP MAC addr]: -145
>>> [54064.301234] wlan0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from
>>> hardware (-145)
>>> [54067.305703] ath10k_pci 0000:02:00.0: failed to install key for vdev
>>> 0 peer [AP MAC addr]: -145
>>> [54067.313307] wlan0: failed to set key (1, ff:ff:ff:ff:ff:ff) to
>>> hardware (-145)
>>
>> it just happened again:
>> ...
>> [130266.948005] ath10k_pci 0000:02:00.0: failed to install key for
>> vdev 0 peer [AP MAC address]: -145
>> [130266.955697] wlan0: failed to remove key (2, ff:ff:ff:ff:ff:ff)
>> from hardware (-145)
>> [130269.964069] ath10k_pci 0000:02:00.0: failed to install key for
>> vdev 0 peer [AP MAC address]: -145
>> [130269.971775] wlan0: failed to set key (2, ff:ff:ff:ff:ff:ff) to
>> hardware (-145)
>> [172198.889700] ath10k_pci 0000:02:00.0: failed to send pdev bss chan
>> info request
>> [172201.897770] ath10k_pci 0000:02:00.0: failed to send pdev bss chan
>> info request
>>
>> I tried to get more information from the firmware by looking at the
>> fw_* debugfs files:
>> # cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_reset_stats
>> fw_crash_counter                0
>> fw_warm_reset_counter           4
>> fw_cold_reset_counter           0
>> # cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_stats
>> cat: can't open '/sys/kernel/debug/ieee80211/phy0/ath10k/fw_stats':
>> Resource temporarily unavailable
>> # cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_crash_dump
>> cat: can't open
>> '/sys/kernel/debug/ieee80211/phy0/ath10k/fw_crash_dump': No data
>> available
>> # cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_dbglog
>> 0x00000000 0
>> # cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_checksums
>> firmware-N.bin          9d340dd9
>> athwlan                 8d25deed
>> otp                     f3efeb4f
>> codeswap                00000000
>> board-N.bin             bebc7c08
>> board                   bebc7c08
>>
>> This is still with firmware 10.2.4.70.54.
>> Please let me know if you need further information.
>
>
> Not sure about your firmware exactly, but the timeout might happen because
> firmware has leaked and/or run-out of resources, fails to insert the key,
> and then it just doesn't respond instead of sending an event.  So, driver
> gets the timeout message and who knows what state your system is in.
>
> I hit this when doing capacity tests, and I modified my firmware to always
> send an event, and driver to deal with it.  I also fixed some resource leaks
> and tuned firmware objects to make sure I do not hit the key exhaustion
> state.
That sounds bad.
Especially as I would not describe my current setup as "high capacity" network.
The worst-case I have is 5 devices:
- Nexus 5
- Sony Xperia Z3 Compact
- Notebook with Intel AC 7260
- QCA9880-2R4E in station (client) mode
- BCM4330 based device

> What is your test scenario in this case?
with this specific crash it was pretty easy:
- AP did not have any connections while I was at work
- when I came back home two devices (Nexus 5 and Sony Xperia Z3
Compact) tried to connect to the AP
- device went into error state

I already had days where only one phone was turned on and I was still
able to reproduce it.


Regards,
Martin

^ permalink raw reply

* Re: ath10k: device stops working ("failed to install key for vdev" error in kernel log)
From: Ben Greear @ 2016-10-05 18:58 UTC (permalink / raw)
  To: Martin Blumenstingl, ath10k, linux-wireless
In-Reply-To: <CAFBinCAuCfTeWw3SMVLbeARkUs_66YT6MJVbsd9QH_CqW9SCtg@mail.gmail.com>

On 10/05/2016 11:51 AM, Martin Blumenstingl wrote:
>> [54064.293597] ath10k_pci 0000:02:00.0: failed to install key for vdev
>> 0 peer [AP MAC addr]: -145
>> [54064.301234] wlan0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from
>> hardware (-145)
>> [54067.305703] ath10k_pci 0000:02:00.0: failed to install key for vdev
>> 0 peer [AP MAC addr]: -145
>> [54067.313307] wlan0: failed to set key (1, ff:ff:ff:ff:ff:ff) to
>> hardware (-145)
> it just happened again:
> ...
> [130266.948005] ath10k_pci 0000:02:00.0: failed to install key for
> vdev 0 peer [AP MAC address]: -145
> [130266.955697] wlan0: failed to remove key (2, ff:ff:ff:ff:ff:ff)
> from hardware (-145)
> [130269.964069] ath10k_pci 0000:02:00.0: failed to install key for
> vdev 0 peer [AP MAC address]: -145
> [130269.971775] wlan0: failed to set key (2, ff:ff:ff:ff:ff:ff) to
> hardware (-145)
> [172198.889700] ath10k_pci 0000:02:00.0: failed to send pdev bss chan
> info request
> [172201.897770] ath10k_pci 0000:02:00.0: failed to send pdev bss chan
> info request
>
> I tried to get more information from the firmware by looking at the
> fw_* debugfs files:
> # cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_reset_stats
> fw_crash_counter                0
> fw_warm_reset_counter           4
> fw_cold_reset_counter           0
> # cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_stats
> cat: can't open '/sys/kernel/debug/ieee80211/phy0/ath10k/fw_stats':
> Resource temporarily unavailable
> # cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_crash_dump
> cat: can't open
> '/sys/kernel/debug/ieee80211/phy0/ath10k/fw_crash_dump': No data
> available
> # cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_dbglog
> 0x00000000 0
> # cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_checksums
> firmware-N.bin          9d340dd9
> athwlan                 8d25deed
> otp                     f3efeb4f
> codeswap                00000000
> board-N.bin             bebc7c08
> board                   bebc7c08
>
> This is still with firmware 10.2.4.70.54.
> Please let me know if you need further information.

Not sure about your firmware exactly, but the timeout might happen because
firmware has leaked and/or run-out of resources, fails to insert the key,
and then it just doesn't respond instead of sending an event.  So, driver
gets the timeout message and who knows what state your system is in.

I hit this when doing capacity tests, and I modified my firmware to always
send an event, and driver to deal with it.  I also fixed some resource leaks
and tuned firmware objects to make sure I do not hit the key exhaustion state.

What is your test scenario in this case?

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: ath10k: device stops working ("failed to install key for vdev" error in kernel log)
From: Martin Blumenstingl @ 2016-10-05 18:51 UTC (permalink / raw)
  To: ath10k, linux-wireless

> [54064.293597] ath10k_pci 0000:02:00.0: failed to install key for vdev
> 0 peer [AP MAC addr]: -145
> [54064.301234] wlan0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from
> hardware (-145)
> [54067.305703] ath10k_pci 0000:02:00.0: failed to install key for vdev
> 0 peer [AP MAC addr]: -145
> [54067.313307] wlan0: failed to set key (1, ff:ff:ff:ff:ff:ff) to
> hardware (-145)
it just happened again:
...
[130266.948005] ath10k_pci 0000:02:00.0: failed to install key for
vdev 0 peer [AP MAC address]: -145
[130266.955697] wlan0: failed to remove key (2, ff:ff:ff:ff:ff:ff)
from hardware (-145)
[130269.964069] ath10k_pci 0000:02:00.0: failed to install key for
vdev 0 peer [AP MAC address]: -145
[130269.971775] wlan0: failed to set key (2, ff:ff:ff:ff:ff:ff) to
hardware (-145)
[172198.889700] ath10k_pci 0000:02:00.0: failed to send pdev bss chan
info request
[172201.897770] ath10k_pci 0000:02:00.0: failed to send pdev bss chan
info request

I tried to get more information from the firmware by looking at the
fw_* debugfs files:
# cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_reset_stats
fw_crash_counter                0
fw_warm_reset_counter           4
fw_cold_reset_counter           0
# cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_stats
cat: can't open '/sys/kernel/debug/ieee80211/phy0/ath10k/fw_stats':
Resource temporarily unavailable
# cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_crash_dump
cat: can't open
'/sys/kernel/debug/ieee80211/phy0/ath10k/fw_crash_dump': No data
available
# cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_dbglog
0x00000000 0
# cat /sys/kernel/debug/ieee80211/phy0/ath10k/fw_checksums
firmware-N.bin          9d340dd9
athwlan                 8d25deed
otp                     f3efeb4f
codeswap                00000000
board-N.bin             bebc7c08
board                   bebc7c08

This is still with firmware 10.2.4.70.54.
Please let me know if you need further information.


Regards,
Martin

^ permalink raw reply

* Re: [RFC 0/3] of: add common bindings to (de)activate IEEE 802.11 bands
From: Felix Fietkau @ 2016-10-05 18:36 UTC (permalink / raw)
  To: Martin Blumenstingl, Rob Herring
  Cc: Mark Rutland, Frank Rowand, devicetree@vger.kernel.org,
	linux-wireless, ath9k-devel, ath9k-devel, Kalle Valo
In-Reply-To: <CAFBinCCDjZs6szRw0D1rwB89tUo+-RNFH2_g=K+Xj2iRJuUZDQ@mail.gmail.com>

On 2016-10-05 20:25, Martin Blumenstingl wrote:
> On Mon, Oct 3, 2016 at 5:22 PM, Rob Herring <robh+dt@kernel.org> wrote:
>> On Sun, Oct 2, 2016 at 5:50 PM, Martin Blumenstingl
>> <martin.blumenstingl@googlemail.com> wrote:
>>> There are at least two drivers (ath9k and mt76) out there, where
>>> devicetree authors need to override the enabled bands.
>>>
>>> For ath9k there is only one use-case: disabling a band which has been
>>> incorrectly enabled by the vendor in the EEPROM (enabling a band is not
>>> possible because the calibration data would be missing in the EEPROM).
>>> The mt76 driver (currently pending for review) however allows enabling
>>> and disabling the 2.4GHz and 5GHz band, see [0].
>>>
>>> Based on the discussion of (earlier versions of) my ath9k devicetree
>>> patch it was suggested [1] that we use enable- and disable- properties.
>>> The current patch implements:
>>> - bands can be enabled or disabled with the corresponding property
>>> - if both (disable and enable) properties are given and a driver asks
>>>   whether the band is enabled then the logic will return false (= not
>>>   enabled, preferring the disabled flag)
>>> - if both (disable and enable) properties are given and a driver asks
>>>   whether the band is disabled then the logic will return true (again,
>>>   preferring the disabled flag over the enabled flag)
>>>
>>> We can still change the logic to do what the mt76 driver does (I am
>>> fine with both solutions):
>>> - property not available: driver decides which bands to enable
>>> - property set to 0: disable the band
>>> - property set to 1: enable the band
>>
>> I prefer this way. Really, I'd prefer just a boolean disable property.
>> I'm not sure why you need the enable. We usually have these tri-state
>> properties when you want not present to mean use the bootloader or
>> default setting. Try to make not present the most common case.
> Felix: could you please give a few details why mt76 can not only
> disable but also enable a specific band?
> There is no specific comment in the sources [0] why this is needed.
> All other drivers that I am aware of (ath9k, rt2x00) only allow
> disabling a specific band, they never enable it (this has to be done
> explicitly by the EEPROM).
None of the devices use it at the moment, I probably added it when I
played with a device that had broken EEPROM data. I guess I decided to
do it this way because on many devices the band capability field simply
cannot be trusted.
I guess I would be okay with just having a disable property and adding
an enable property later only if we end up actually needing it.

- Felix

^ permalink raw reply

* Re: [RFC 0/3] of: add common bindings to (de)activate IEEE 802.11 bands
From: Martin Blumenstingl @ 2016-10-05 18:25 UTC (permalink / raw)
  To: Rob Herring, Felix Fietkau
  Cc: Mark Rutland, Frank Rowand, devicetree@vger.kernel.org,
	linux-wireless, ath9k-devel, ath9k-devel, Kalle Valo
In-Reply-To: <CAL_Jsq+cie=enhUYBF-OSrMdQ5S_048JF31ghEjtBNAmkGbB9Q@mail.gmail.com>

On Mon, Oct 3, 2016 at 5:22 PM, Rob Herring <robh+dt@kernel.org> wrote:
> On Sun, Oct 2, 2016 at 5:50 PM, Martin Blumenstingl
> <martin.blumenstingl@googlemail.com> wrote:
>> There are at least two drivers (ath9k and mt76) out there, where
>> devicetree authors need to override the enabled bands.
>>
>> For ath9k there is only one use-case: disabling a band which has been
>> incorrectly enabled by the vendor in the EEPROM (enabling a band is not
>> possible because the calibration data would be missing in the EEPROM).
>> The mt76 driver (currently pending for review) however allows enabling
>> and disabling the 2.4GHz and 5GHz band, see [0].
>>
>> Based on the discussion of (earlier versions of) my ath9k devicetree
>> patch it was suggested [1] that we use enable- and disable- properties.
>> The current patch implements:
>> - bands can be enabled or disabled with the corresponding property
>> - if both (disable and enable) properties are given and a driver asks
>>   whether the band is enabled then the logic will return false (= not
>>   enabled, preferring the disabled flag)
>> - if both (disable and enable) properties are given and a driver asks
>>   whether the band is disabled then the logic will return true (again,
>>   preferring the disabled flag over the enabled flag)
>>
>> We can still change the logic to do what the mt76 driver does (I am
>> fine with both solutions):
>> - property not available: driver decides which bands to enable
>> - property set to 0: disable the band
>> - property set to 1: enable the band
>
> I prefer this way. Really, I'd prefer just a boolean disable property.
> I'm not sure why you need the enable. We usually have these tri-state
> properties when you want not present to mean use the bootloader or
> default setting. Try to make not present the most common case.
Felix: could you please give a few details why mt76 can not only
disable but also enable a specific band?
There is no specific comment in the sources [0] why this is needed.
All other drivers that I am aware of (ath9k, rt2x00) only allow
disabling a specific band, they never enable it (this has to be done
explicitly by the EEPROM).

>> The new code has been integrated into ath9k to demonstrate how to use
>> it (with the benefit that the disable_2ghz and disable_5ghz settings
>> from the ath9k_platform_data can now also be configured via .dts).
>>
>> open questions/decisions needed:
>> - where to place this new code? I put it into drivers/of/of_ieee80211
>>   because that's where most other functions live.
>>   However, I found that this makes backporting the code harder (using
>>   wireless-backports from the driver-backports project [2])
>
> We are generally moving subsystem specific code like this out of
> drivers/of/, so please do that here. Maybe someone will get motivated
> to move the other networking code out too.
OK, thanks for the hint.

[0] https://github.com/openwrt/mt76/blob/master/eeprom.c

^ permalink raw reply

* Re: [PATCH v4] ath9k: Switch to using mac80211 intermediate software queues.
From: Kalle Valo @ 2016-10-05 17:54 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: linux-wireless, make-wifi-fast, ath9k-devel, Tim Shepard,
	Felix Fietkau
In-Reply-To: <87d1jeenwa.fsf@toke.dk>

Toke H=C3=B8iland-J=C3=B8rgensen <toke@toke.dk> writes:

> Kalle Valo <kvalo@codeaurora.org> writes:
>
>> Toke H=C3=B8iland-J=C3=B8rgensen <toke@toke.dk> writes:
>>
>>> Kalle Valo <kvalo@codeaurora.org> writes:
>>>
>>>> Toke H=C3=B8iland-J=C3=B8rgensen <toke@toke.dk> writes:
>>>>
>>>> I understand your point, but I don't want to rush this to 4.9 and then
>>>> start getting lots of bug reports and eventually forced to revert it. =
If
>>>> we just found a new serious regression the chances are that there are
>>>> more lurking somewhere and this patch is just not ready yet.
>>>
>>> So, the changes to mac80211 that fixes the known regressions of this
>>> patch have gone in.
>>
>> I guess you mean this commit:
>>
>> bb42f2d13ffc mac80211: Move reorder-sensitive TX handlers to after TXQ d=
equeue
>>
>> (Just making sure that I have the same commit in my tree when I apply
>> this)
>
> Yup, that's the one :)
>
>>> Any chance of seeing this merged during the current merge window? :)
>>
>> I sent last new feature ("-next") patches for 4.9 last week, sorry. So
>> this has to wait for 4.10.
>
> Ah, right, I think I got my merge windows confused. You already said you
> wouldn't take it for 4.9. So I guess what I'm asking is for you to put
> it into the appropriate -next tree so it can get some wider exposure
> ahead of the *next* merge window...

Yeah, we have plenty of time for 4.10 :) So my plan is to apply this
after I open wireless-drivers-next in 2-3 weeks or so. That would mean
that the patch would hit Linus' tree when 4.10-rc1 is released
(estimated to happen on 2017-01-01). The timing is actually perfect as
now we get maximal testing time on -next.

>> And I assume I need to take v5:
>>
>> https://patchwork.kernel.org/patch/9311037/
>
> Yes. Haven't noticed anything that changed since that might conflict
> with it, but let me know if I missed something and you want a refreshed
> version.

Thanks, I'll let you know if there are any problems.

--=20
Kalle Valo

^ permalink raw reply

* ath9k excessive delay in handling EAPOL frames
From: Daniel Drake @ 2016-10-05 17:45 UTC (permalink / raw)
  To: linux-wireless; +Cc: Linux Upstreaming Team, ath9k-devel

Hi,

As this is remote problem debugging I haven't gathered quite as much
info as I would like, and won't be investigating further immediately,
but I would like to share what I have found so far, maybe it is useful
knowledge and we can revisit later.

With the following hardware on Linux 4.4, we cannot connect to our
office WPA2-PSK network. Other networks seem fine.

02:00.0 Network controller [0280]: Qualcomm Atheros QCA9565 / AR9565
Wireless Network Adapter [168c:0036] (rev 01)
Subsystem: AzureWave Device [1a3b:218d]
Kernel driver in use: ath9k

The logs show:

wpa_supplicant[585]: wlp2s0: SME: Trying to authenticate with
0c:11:67:33:8d:50 (SSID='Endless' freq=2457 MHz)
kernel: wlp2s0: authenticate with 0c:11:67:33:8d:50
NetworkManager[620]: <info> [1474483556.0677] device (wlp2s0):
supplicant interface state: inactive -> authenticating
kernel: wlp2s0: send auth to 0c:11:67:33:8d:50 (try 1/3)
kernel: wlp2s0: send auth to 0c:11:67:33:8d:50 (try 2/3)
kernel: wlp2s0: send auth to 0c:11:67:33:8d:50 (try 3/3)
wpa_supplicant[585]: wlp2s0: Trying to associate with
0c:11:67:33:8d:50 (SSID='Endless' freq=2457 MHz)
kernel: wlp2s0: authenticated
NetworkManager[620]: <info> [1474483558.1078] device (wlp2s0):
supplicant interface state: authenticating -> associating
kernel: wlp2s0: associate with 0c:11:67:33:8d:50 (try 1/3)
kernel: wlp2s0: associate with 0c:11:67:33:8d:50 (try 2/3)
kernel: wlp2s0: associate with 0c:11:67:33:8d:50 (try 3/3)
wpa_supplicant[585]: wlp2s0: Associated with 0c:11:67:33:8d:50
kernel: wlp2s0: RX AssocResp from 0c:11:67:33:8d:50 (capab=0x431 status=0 aid=5)
kernel: wlp2s0: associated
kernel: wlp2s0: deauthenticated from 0c:11:67:33:8d:50 (Reason:
23=IEEE8021X_FAILED)

Using monitor mode from another station, I observe:

- STA sends association request
- AP sends association response 0.01s later, STA acks
- AP sends EAPOL 0.002s later, STA acks
- AP sends another EAPOL 0.1s later, STA acks
- AP sends deauthentication 0.3s later (presumably a timeout waiting
for EAPOL response), STA acks
- STA sends another association request 0.5s later
- AP replies with Deauthentication (can't associated as you are deauthed)
- STA sends another association request 1s later
- AP replies with Deauthentication again
- STA sends EAPOL response message, a full 2 seconds after the first
EAPOL was received

It is as if the processing of incoming frames is getting stuck for 2
seconds, even though they were already ACKed. i.e. The first
association requests succeeds immediately but the processing of the
AssocResp frame (and the following EAPOLs and deauth) is delayed by
more than 2 seconds, far longer than the AP is willing to wait.

I have confirmed this perspective in the wpa_supplicant debug logs
too, there is 2 seconds of RX silence after the first association
request is sent before all the frames come in at once.

Hope this partial info is useful in some way, I'll come back to this
problem as time permits.

Daniel

^ permalink raw reply

* RE: [PATCH v2 2/2] mwifiex: check hw_status in suspend and resume handlers
From: Cathy Luo @ 2016-10-05 17:19 UTC (permalink / raw)
  To: Brian Norris, Amitkumar Karwar
  Cc: linux-wireless@vger.kernel.org, Nishant Sarmukadam,
	rajatja@google.com, Xinming Hu
In-Reply-To: <20161005164138.GB54237@google.com>

Hi Brian


I think your concern is right, we will update patch to handle the case you mentioned below. 

If our firmware is dead or we init failure, we should allow the system to suspend. 

We have following hardware status supported in driver. We should return -EBUSY only when hardware status is 
MWIFIEX_HW_STATUS_INITIALIZING/MWIFIEX_HW_STATUS_INIT_DONE/ MWIFIEX_HW_STATUS_CLOSING. 

enum MWIFIEX_HARDWARE_STATUS {
	MWIFIEX_HW_STATUS_READY,
	MWIFIEX_HW_STATUS_INITIALIZING,
	MWIFIEX_HW_STATUS_INIT_DONE,
	MWIFIEX_HW_STATUS_RESET,
	MWIFIEX_HW_STATUS_CLOSING,
	MWIFIEX_HW_STATUS_NOT_READY
};

Amit will help prepare the new patch to handle this. 

Regards

Cathy

-----Original Message-----
From: Brian Norris [mailto:briannorris@chromium.org] 
Sent: Wednesday, October 05, 2016 9:42 AM
To: Amitkumar Karwar
Cc: linux-wireless@vger.kernel.org; Cathy Luo; Nishant Sarmukadam; rajatja@google.com; Xinming Hu
Subject: Re: [PATCH v2 2/2] mwifiex: check hw_status in suspend and resume handlers

Hi Amit,

On Wed, Oct 05, 2016 at 12:26:36PM +0000, Amitkumar Karwar wrote:
> > From: Brian Norris [mailto:briannorris@chromium.org]
> > Sent: Wednesday, October 05, 2016 2:35 AM
> > To: Amitkumar Karwar
> > Cc: linux-wireless@vger.kernel.org; Cathy Luo; Nishant Sarmukadam; 
> > rajatja@google.com; briannorris@google.com; Xinming Hu
> > Subject: Re: [PATCH v2 2/2] mwifiex: check hw_status in suspend and 
> > resume handlers
> > 
> > On Tue, Oct 04, 2016 at 10:38:25PM +0530, Amitkumar Karwar wrote:
> > > From: Xinming Hu <huxm@marvell.com>
> > >
> > > We have observed a kernel crash when system immediately suspends 
> > > after booting. There is a race between suspend and driver 
> > > initialization paths.
> > > This patch adds hw_status checks to fix the problem
> > >
> > > Signed-off-by: Xinming Hu <huxm@marvell.com>
> > > Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
> > > ---
> > > v2: Return failure in suspend/resume handler in this scenario.
> > > ---
> > >  drivers/net/wireless/marvell/mwifiex/pcie.c | 10 ++++++----
> > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > index ba9e068..fa6bf85 100644
> > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > @@ -122,9 +122,10 @@ static int mwifiex_pcie_suspend(struct device
> > > *dev)
> > >
> > >  	if (pdev) {
> > >  		card = pci_get_drvdata(pdev);
> > > -		if (!card || !card->adapter) {
> > > +		if (!card || !card->adapter ||
> > > +		    card->adapter->hw_status != MWIFIEX_HW_STATUS_READY) {
> > 
> > Wait, is there no locking on the 'hw_status' field? That is 
> > inherently an unsafe race all on its own; you're not guaranteed that 
> > this will be read/written atomically. And you also aren't guaranteed 
> > that writes to this happen in the order they appear in the code -- 
> > in other words, reading this flag doesn't necessarily guarantee that 
> > initialization is actually complete (even if that's very likely to 
> > be true, given that it's probably just a single-instruction 
> > word-access, and any prior HW polling or interrupts likely have done 
> > some synchronization and can't be reordered).
> > actually complete
> 
> Here is the brief info on how "hw_status" flag is updated.
> 1) It gets changed incrementally during initialization.
>     MWIFIEX_HW_STATUS_INITIALIZING -> MWIFIEX_HW_STATUS_INIT_DONE -> 
> MWIFIEX_HW_STATUS_READY
> 
> 2) Status will remain READY once driver+firmware is up and running.
> 
> 3) Below is status during teardown
> MWIFIEX_HW_STATUS_READY -> MWIFIEX_HW_STATUS_RESET -> 
> MWIFIEX_HW_STATUS_CLOSING -> MWIFIEX_HW_STATUS_NOT_READY
> 
> As the events occur one after another, we don't expect a race and 
> don't need locking here for the flag. Flag status 
> MWIFIEX_HW_STATUS_READY guarantees that initialization is completed.

It seems like, as with patch 1, you're mostly arguing about the writes to this variable. But writes race with reads as well; how do you guarantee that you're not seeing incorrect values of 'hw_status' here in
suspend() -- e.g., either old or new values of it, or even partially-written values, if for some reason the compiler decides it can't read/write this all in one go?

> In worst case scenario, only first system suspend attempt issued 
> immediately after system boot will be aborted with BUSY error. I 
> think, that should be fine.

(For the record, my concern about -EBUSY is separate from my concern about the potential race condition.)

> Let me know if you have any concerns.

Sorry, I probably didn't completely flesh out my thought here.

I think I was concerned about a failed initialization causing the system to never enter suspend again. So specifically: what happens if (e.g.) the firmware fails to load? AFAICT, the device doesn't actually unbind itself from the driver, so instead, you have a device in limbo that will always return -EBUSY in suspend(), and your system can never again enter suspend. Am I correct? If so, that doesn't sound great.

> > This is probably better than nothing, but it's not very good.
> > 
> > >  			pr_err("Card or adapter structure is not valid\n");
> > > -			return 0;
> > > +			return -EBUSY;
> > 
> > So the above cases all mean that the driver hasn't finished loading, 
> > right?
> > 
> > !card => can't happen (PCIe probe() would have failed 
> > mwifiex_add_card()), but fine to check
> 
> NULL "card" or "card->adapter" pointers are expected in below cases
> 1) Driver initialization failed after downloading the firmware. Driver is performing cleanup in init failure path. Now system suspends.
> 2) Race of teardown + suspend
>  
> > 
> > !card->adapter => only happens after patch 1; i.e., when tearing 
> > down the device and detaching it from the driver
> > 
> > card->adapter->hw_status != MWIFIEX_HW_STATUS_READY => FW is not 
> > card->adapter->loaded
> > (i.e., in the process of starting or stopping FW?)
> > 
> > I guess all of those cases make sense to be -EBUSY.

^^ I'm second-guessing my claim here then.

> Yes. we can keep -EBUSY for all of the cases.

Brian

^ permalink raw reply

* Re: [PATCH v4] ath9k: Switch to using mac80211 intermediate software queues.
From: Toke Høiland-Jørgensen @ 2016-10-05 16:55 UTC (permalink / raw)
  To: Kalle Valo
  Cc: linux-wireless, make-wifi-fast, ath9k-devel, Tim Shepard,
	Felix Fietkau
In-Reply-To: <878tu2okv7.fsf@kamboji.qca.qualcomm.com>

Kalle Valo <kvalo@codeaurora.org> writes:

> Toke H=C3=B8iland-J=C3=B8rgensen <toke@toke.dk> writes:
>
>> Kalle Valo <kvalo@codeaurora.org> writes:
>>
>>> Toke H=C3=B8iland-J=C3=B8rgensen <toke@toke.dk> writes:
>>>
>>> I understand your point, but I don't want to rush this to 4.9 and then
>>> start getting lots of bug reports and eventually forced to revert it. If
>>> we just found a new serious regression the chances are that there are
>>> more lurking somewhere and this patch is just not ready yet.
>>
>> So, the changes to mac80211 that fixes the known regressions of this
>> patch have gone in.
>
> I guess you mean this commit:
>
> bb42f2d13ffc mac80211: Move reorder-sensitive TX handlers to after TXQ de=
queue
>
> (Just making sure that I have the same commit in my tree when I apply
> this)

Yup, that's the one :)

>> Any chance of seeing this merged during the current merge window? :)
>
> I sent last new feature ("-next") patches for 4.9 last week, sorry. So
> this has to wait for 4.10.

Ah, right, I think I got my merge windows confused. You already said you
wouldn't take it for 4.9. So I guess what I'm asking is for you to put
it into the appropriate -next tree so it can get some wider exposure
ahead of the *next* merge window...

> And I assume I need to take v5:
>
> https://patchwork.kernel.org/patch/9311037/

Yes. Haven't noticed anything that changed since that might conflict
with it, but let me know if I missed something and you want a refreshed
version.

-Toke

^ permalink raw reply

* Re: [PATCH v2 2/2] mwifiex: check hw_status in suspend and resume handlers
From: Brian Norris @ 2016-10-05 16:41 UTC (permalink / raw)
  To: Amitkumar Karwar
  Cc: linux-wireless@vger.kernel.org, Cathy Luo, Nishant Sarmukadam,
	rajatja@google.com, Xinming Hu
In-Reply-To: <39b7881045ec42628a4ffa654840681e@SC-EXCH04.marvell.com>

Hi Amit,

On Wed, Oct 05, 2016 at 12:26:36PM +0000, Amitkumar Karwar wrote:
> > From: Brian Norris [mailto:briannorris@chromium.org]
> > Sent: Wednesday, October 05, 2016 2:35 AM
> > To: Amitkumar Karwar
> > Cc: linux-wireless@vger.kernel.org; Cathy Luo; Nishant Sarmukadam;
> > rajatja@google.com; briannorris@google.com; Xinming Hu
> > Subject: Re: [PATCH v2 2/2] mwifiex: check hw_status in suspend and
> > resume handlers
> > 
> > On Tue, Oct 04, 2016 at 10:38:25PM +0530, Amitkumar Karwar wrote:
> > > From: Xinming Hu <huxm@marvell.com>
> > >
> > > We have observed a kernel crash when system immediately suspends after
> > > booting. There is a race between suspend and driver initialization
> > > paths.
> > > This patch adds hw_status checks to fix the problem
> > >
> > > Signed-off-by: Xinming Hu <huxm@marvell.com>
> > > Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
> > > ---
> > > v2: Return failure in suspend/resume handler in this scenario.
> > > ---
> > >  drivers/net/wireless/marvell/mwifiex/pcie.c | 10 ++++++----
> > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > index ba9e068..fa6bf85 100644
> > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > @@ -122,9 +122,10 @@ static int mwifiex_pcie_suspend(struct device
> > > *dev)
> > >
> > >  	if (pdev) {
> > >  		card = pci_get_drvdata(pdev);
> > > -		if (!card || !card->adapter) {
> > > +		if (!card || !card->adapter ||
> > > +		    card->adapter->hw_status != MWIFIEX_HW_STATUS_READY) {
> > 
> > Wait, is there no locking on the 'hw_status' field? That is inherently
> > an unsafe race all on its own; you're not guaranteed that this will be
> > read/written atomically. And you also aren't guaranteed that writes to
> > this happen in the order they appear in the code -- in other words,
> > reading this flag doesn't necessarily guarantee that initialization is
> > actually complete (even if that's very likely to be true, given that
> > it's probably just a single-instruction word-access, and any prior HW
> > polling or interrupts likely have done some synchronization and can't be
> > reordered).
> > actually complete
> 
> Here is the brief info on how "hw_status" flag is updated.
> 1) It gets changed incrementally during initialization.
>     MWIFIEX_HW_STATUS_INITIALIZING -> MWIFIEX_HW_STATUS_INIT_DONE -> MWIFIEX_HW_STATUS_READY
> 
> 2) Status will remain READY once driver+firmware is up and running.
> 
> 3) Below is status during teardown
> MWIFIEX_HW_STATUS_READY -> MWIFIEX_HW_STATUS_RESET -> MWIFIEX_HW_STATUS_CLOSING -> MWIFIEX_HW_STATUS_NOT_READY
> 
> As the events occur one after another, we don't expect a race and
> don't need locking here for the flag. Flag status
> MWIFIEX_HW_STATUS_READY guarantees that initialization is completed.

It seems like, as with patch 1, you're mostly arguing about the writes
to this variable. But writes race with reads as well; how do you
guarantee that you're not seeing incorrect values of 'hw_status' here in
suspend() -- e.g., either old or new values of it, or even
partially-written values, if for some reason the compiler decides it
can't read/write this all in one go?

> In worst case scenario, only first system suspend attempt issued
> immediately after system boot will be aborted with BUSY error. I
> think, that should be fine.

(For the record, my concern about -EBUSY is separate from my concern
about the potential race condition.)

> Let me know if you have any concerns.

Sorry, I probably didn't completely flesh out my thought here.

I think I was concerned about a failed initialization causing the system
to never enter suspend again. So specifically: what happens if (e.g.)
the firmware fails to load? AFAICT, the device doesn't actually unbind
itself from the driver, so instead, you have a device in limbo that will
always return -EBUSY in suspend(), and your system can never again enter
suspend. Am I correct? If so, that doesn't sound great.

> > This is probably better than nothing, but it's not very good.
> > 
> > >  			pr_err("Card or adapter structure is not valid\n");
> > > -			return 0;
> > > +			return -EBUSY;
> > 
> > So the above cases all mean that the driver hasn't finished loading,
> > right?
> > 
> > !card => can't happen (PCIe probe() would have failed
> > mwifiex_add_card()), but fine to check
> 
> NULL "card" or "card->adapter" pointers are expected in below cases
> 1) Driver initialization failed after downloading the firmware. Driver is performing cleanup in init failure path. Now system suspends.
> 2) Race of teardown + suspend
>  
> > 
> > !card->adapter => only happens after patch 1; i.e., when tearing down
> > the device and detaching it from the driver
> > 
> > card->adapter->hw_status != MWIFIEX_HW_STATUS_READY => FW is not loaded
> > (i.e., in the process of starting or stopping FW?)
> > 
> > I guess all of those cases make sense to be -EBUSY.

^^ I'm second-guessing my claim here then.

> Yes. we can keep -EBUSY for all of the cases.

Brian

^ permalink raw reply

* Re: [PATCH] ath10k: cache calibration data when the core is stopped.
From: Marty Faltesek @ 2016-10-05 16:40 UTC (permalink / raw)
  To: Michal Kazior; +Cc: ath10k@lists.infradead.org, linux-wireless
In-Reply-To: <CA+BoTQkwSX954pPEh3oYzL62H9y4Y6UuPVLSxxxCkM5AznY7NA@mail.gmail.com>

On Mon, Oct 3, 2016 at 4:02 AM, Michal Kazior <michal.kazior@tieto.com> wro=
te:
> On 13 September 2016 at 23:11, Marty Faltesek <mfaltesek@google.com> wrot=
e:
> [...]
>> +int
>> +ath10k_cal_data_alloc(struct ath10k *ar, void **buf)
>> +{
>> +       u32 hi_addr;
>> +       __le32 addr;
>> +       int ret;
>> +
>> +       vfree(*buf);
>> +       *buf =3D vmalloc(QCA988X_CAL_DATA_LEN);
>
> Shouldn't you use ar->hw_params to get hw-specific caldata length?

yup, it was because we were based on backports from earlier this year.

>
>
> [...]
>> @@ -1714,6 +1750,12 @@ int ath10k_core_start(struct ath10k *ar, enum ath=
10k_firmware_mode mode)
>>
>>         INIT_LIST_HEAD(&ar->arvifs);
>>
>> +       /*
>> +        * We are up now, so no need to cache calibration data.
>> +        */
>
> The comment style is:
>
>  /* comment */
>
> If it's multi-line it should be:
>
>  /* comment
>   * comment
>   */
>
> Ditto for other instances.

fixed, thanks.

>
>
> [...]
>> @@ -1757,6 +1799,11 @@ void ath10k_core_stop(struct ath10k *ar)
>>         lockdep_assert_held(&ar->conf_mutex);
>>         ath10k_debug_stop(ar);
>>
>> +       /*
>> +        * Cache caclibration data while stopped.
>
> typo. "calibration"
>
>
> Micha=C5=82

fixed thanks.

^ permalink raw reply

* Re: [PATCH] ath10k: cache calibration data when the core is stopped.
From: Marty Faltesek @ 2016-10-05 16:39 UTC (permalink / raw)
  To: Valo, Kalle; +Cc: ath10k@lists.infradead.org, linux-wireless@vger.kernel.org
In-Reply-To: <87fuodswlu.fsf@kamboji.qca.qualcomm.com>

On Mon, Oct 3, 2016 at 3:46 AM, Valo, Kalle <kvalo@qca.qualcomm.com> wrote:
> Marty Faltesek <mfaltesek@google.com> writes:
>
>> Caching calibration data allows it to be accessed when the
>> device is not active.
>>
>> Signed-off-by: Marty Faltesek <mfaltesek@google.com>
>
> No comma in the title, please.
>
> What tree did you use as the baseline? This doesn't seem to apply to
> ath.git:

We use backports 20160122 which has not been updated since earlier this year.
I can forward port it to your tree, and make sure
it builds but won't be able to test it. Will that be OK?

>
> Applying: ath10k: cache calibration data when the core is stopped.
> fatal: sha1 information is lacking or useless (drivers/net/wireless/ath/ath10k/core.c).
> Repository lacks necessary blobs to fall back on 3-way merge.
> Cannot fall back to three-way merge.
> Patch failed at 0001 ath10k: cache calibration data when the core is stopped.
>
>> --- a/drivers/net/wireless/ath/ath10k/core.c
>> +++ b/drivers/net/wireless/ath/ath10k/core.c
>> @@ -1227,6 +1227,42 @@ success:
>>       return 0;
>>  }
>>
>> +int
>> +ath10k_cal_data_alloc(struct ath10k *ar, void **buf)
>> +{
>> +     u32 hi_addr;
>> +     __le32 addr;
>> +     int ret;
>
> I think this function should be in debug.c. That way the code is not
> wasting memory if DEBUGFS is disabled.

ok.

>
>> +     vfree(*buf);
>> +     *buf = vmalloc(QCA988X_CAL_DATA_LEN);
>> +     if (!*buf) {
>> +             return -EAGAIN;
>> +     }
>
> Is it really necessary to allocate memory every time? What if allocate
> it only once when module is loaded, just like with
> ar->debug.fw_crash_data?

yup, good suggestion.

>
> Also please note that this patch (which I'm queuing to 4.9) touches the
> same area:
>
> ath10k: fix debug cal data file
>
> https://patchwork.kernel.org/patch/9340953/

I've modified this too, and this won't be necessary, so can you drop
it? If not, let me know and I'll
pull it in and make sure I'm based off it too.

^ permalink raw reply

* Re: [PATCH v2 1/2] mwifiex: reset card->adapter during device unregister
From: Brian Norris @ 2016-10-05 16:30 UTC (permalink / raw)
  To: Amitkumar Karwar
  Cc: linux-wireless@vger.kernel.org, Cathy Luo, Nishant Sarmukadam,
	rajatja@google.com, Xinming Hu
In-Reply-To: <8ec683dc72a746909157fca4dcbd10e8@SC-EXCH04.marvell.com>

Hi,

On Wed, Oct 05, 2016 at 02:04:53PM +0000, Amitkumar Karwar wrote:
> > From: Brian Norris [mailto:briannorris@chromium.org]
> > Sent: Wednesday, October 05, 2016 3:28 AM
> > To: Amitkumar Karwar
> > Cc: linux-wireless@vger.kernel.org; Cathy Luo; Nishant Sarmukadam;
> > rajatja@google.com; briannorris@google.com; Xinming Hu
> > Subject: Re: [PATCH v2 1/2] mwifiex: reset card->adapter during device
> > unregister
> > 
> > Hi,
> > 
> > On Tue, Oct 04, 2016 at 10:38:24PM +0530, Amitkumar Karwar wrote:

> > > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
> > > @@ -3042,6 +3042,7 @@ static void mwifiex_unregister_dev(struct
> > mwifiex_adapter *adapter)
> > >  				pci_disable_msi(pdev);
> > >  	       }
> > >  	}
> > > +	card->adapter = NULL;
> > 
> > I think you have a similar problem here as in patch 2; there is no
> > locking to protect fields in struct pcie_service_card or struct
> > sdio_mmc_card below. That problem kind of already exists, except that
> > you only write the value of card->adapter once at registration time, so
> > it's not actually unsafe. But now that you're introducing a second
> > write, you have a problem.
> > 
> > Brian
> > 
> 
> We have a global "add_remove_card_sem" semaphore in our code for
> synchronizing initialization and teardown threads. Ideally "init +
> teardown/reboot" should not have a race issue with this logic
> 
> Later there was a loophole introduced in this after using async
> firmware download API. During initialization, firmware downloads
> asynchronously in a separate thread where might have released the
> semaphore. I am working on a patch to fix this.
> 
> So "card->adapter" doesn't need to have locking. Even if we have two
> write operations, those two threads can't run simultaneously due to
> above mentioned logic.

What about writes racing with reads? You have lots of unsynchronized
cases that read this, although most of them should be halted by now
(e.g., cmd processing). I was looking at suspend() in particular, which
I thought you were looking at in this patch series.

Brian

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox