* [ath9k-devel] Seem to have fried my AR9300 NIC?
@ 2012-07-23 21:10 Ben Greear
  2012-07-23 22:08 ` Ben Greear
  0 siblings, 1 reply; 4+ messages in thread
From: Ben Greear @ 2012-07-23 21:10 UTC (permalink / raw)
  To: ath9k-devel

Came back after a 1 week vacation and found the 3.3.8+ kernel spitting
timeout errors, and network devices will not 'ifconfig foo up'.

I rebooted into 3.5.0+, and see the same (or at least similar errors):

ath: wiphy0: timeout (100000 us) on reg 0xa2c4: 0x00158dd9 & 0x00000001 != 0x00000000
ath: wiphy0: Unable to reset hardware; reset status -5 (freq 2412 MHz)
ath: wiphy0: timeout (100000 us) on reg 0xa640: 0x00000001 & 0x00000001 != 0x00000000
ath: wiphy0: timeout (100000 us) on reg 0xa2c4: 0x00158dd9 & 0x00000001 != 0x00000000
ath: wiphy0: Unable to reset hardware; reset status -5 (freq 2412 MHz)
ath: wiphy0: timeout (100000 us) on reg 0xa640: 0x00000001 & 0x00000001 != 0x00000000
ath: wiphy0: timeout (100000 us) on reg 0xa2c4: 0x00158dd9 & 0x00000001 != 0x00000000
ath: wiphy0: Unable to reset hardware; reset status -5 (freq 2412 MHz)
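
For anyone decoding these lines: each timeout entry reads as `value & mask != expected`, meaning the register was polled until the masked value matched and the poll gave up after 100000 us. Status -5 corresponds to -EIO on Linux. Below is a minimal sketch of that kind of poll loop. It is not the ath9k source; `reg_read_fn`, `poll_reg`, and the two stub readers are hypothetical names used only for illustration, and the real driver would delay between reads.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register accessor; a real driver would read MMIO here. */
typedef uint32_t (*reg_read_fn)(uint32_t reg);

/* Stub that mimics the stuck value from the log: bit 0 never clears. */
static uint32_t stuck_reg(uint32_t reg)
{
    (void)reg;
    return 0x00158dd9u;
}

/* Stub that mimics a healthy device: bit 0 is already clear. */
static uint32_t cleared_reg(uint32_t reg)
{
    (void)reg;
    return 0x00158dd8u;
}

/*
 * Poll `reg` until (value & mask) == expected, giving up after
 * `max_polls` reads. Returns true on success, false on timeout.
 */
static bool poll_reg(reg_read_fn read_reg, uint32_t reg,
                     uint32_t mask, uint32_t expected,
                     unsigned int max_polls)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        if ((read_reg(reg) & mask) == expected)
            return true;
        /* A real implementation would sleep or busy-wait here. */
    }
    return false;
}
```

With the logged value 0x00158dd9 and mask 0x00000001, the masked result is 1, so a poll waiting for 0 can never succeed; that matches the repeated timeouts above.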

This system ran stable for several months..so maybe the hardware just died.

But if anyone has any suggestions for debugging this more, please let me know.


03:00.0 Network controller: Atheros Communications Inc. AR9300 Wireless LAN adaptor (rev 01)
	Subsystem: Atheros Communications Inc. Device 3116
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 17
	Region 0: Memory at f2500000 (64-bit, non-prefetchable) [size=128K]
	Expansion ROM at dfa00000 [disabled] [size=64K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/4 Maskable+ 64bit+
		Address: 0000000000000000  Data: 0000
		Masking: 00000000  Pending: 00000000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <4us, L1 <64us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr+ BadTLP- BadDLLP+ Rollover- Timeout- NonFatalErr+
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [140 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [300 v1] Device Serial Number 00-00-00-00-00-00-00-00
	Kernel driver in use: ath9k
	Kernel modules: ath9k

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [ath9k-devel] Seem to have fried my AR9300 NIC?
  2012-07-23 21:10 [ath9k-devel] Seem to have fried my AR9300 NIC? Ben Greear
@ 2012-07-23 22:08 ` Ben Greear
  2012-07-23 22:31   ` Ben Greear
  0 siblings, 1 reply; 4+ messages in thread
From: Ben Greear @ 2012-07-23 22:08 UTC (permalink / raw)
  To: ath9k-devel

On 07/23/2012 02:10 PM, Ben Greear wrote:
> Came back after a 1 week vacation and found the 3.3.8+ kernel spitting
> timeout errors, and network devices will not 'ifconfig foo up'.
>
> I rebooted into 3.5.0+, and see the same (or at least similar errors):

Well, I replaced the NIC and the problem remains.

Guess it's time to poke a bit deeper.

Ben



-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [ath9k-devel] Seem to have fried my AR9300 NIC?
  2012-07-23 22:08 ` Ben Greear
@ 2012-07-23 22:31   ` Ben Greear
  2012-07-24  5:03     ` Mohammed Shafi
  0 siblings, 1 reply; 4+ messages in thread
From: Ben Greear @ 2012-07-23 22:31 UTC (permalink / raw)
  To: ath9k-devel

On 07/23/2012 03:08 PM, Ben Greear wrote:
> On 07/23/2012 02:10 PM, Ben Greear wrote:
>> Came back after a 1 week vacation and found the 3.3.8+ kernel spitting
>> timeout errors, and network devices will not 'ifconfig foo up'.
>>
>> I rebooted into 3.5.0+, and see the same (or at least similar errors):
>
> Well, I replaced the NIC and the problem remains.
>
> Guess it's time to poke a bit deeper.

Ahh, so here's what happened.  I added code to set the rx-chainmask
and tx-chainmask from a user-space app (by writing to the appropriate
debugfs files).

The code assumed 0x7 by default, but this particular NIC is only 2x2.

When the chainmask is set wrong, the NIC gets into the broken state.

Changing it back to 0x3 fixes the problem.

Is that worth trying to fix in the driver, or should I just
fix it in user-space so that it never sets more than what
the eeprom reports as supported?
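
A rough sketch of what the user-space check could look like, assuming the app already knows the supported mask (how it obtains that from the EEPROM is not shown here). `chain_count` and `chainmask_ok` are hypothetical helper names, not anything from ath9k:

```c
#include <stdbool.h>
#include <stdint.h>

/* Count the chains enabled in a chainmask (one bit per chain). */
static unsigned int chain_count(uint8_t mask)
{
    unsigned int n = 0;

    while (mask) {
        n += mask & 1u;
        mask >>= 1;
    }
    return n;
}

/*
 * Hypothetical user-space guard: accept a requested chainmask only if
 * it is non-zero and sets no bit outside the supported mask.
 */
static bool chainmask_ok(uint8_t requested, uint8_t supported)
{
    return requested != 0 && (requested & ~supported) == 0;
}
```

With these helpers, 0x7 counts as three chains and is rejected against a supported mask of 0x3 (two chains), while 0x3 and 0x1 are accepted.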

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [ath9k-devel] Seem to have fried my AR9300 NIC?
  2012-07-23 22:31   ` Ben Greear
@ 2012-07-24  5:03     ` Mohammed Shafi
  0 siblings, 0 replies; 4+ messages in thread
From: Mohammed Shafi @ 2012-07-24  5:03 UTC (permalink / raw)
  To: ath9k-devel

Hi Ben,

On Tue, Jul 24, 2012 at 4:01 AM, Ben Greear <greearb@candelatech.com> wrote:
> On 07/23/2012 03:08 PM, Ben Greear wrote:
>> On 07/23/2012 02:10 PM, Ben Greear wrote:
>>> Came back after a 1 week vacation and found the 3.3.8+ kernel spitting
>>> timeout errors, and network devices will not 'ifconfig foo up'.
>>>
>>> I rebooted into 3.5.0+, and see the same (or at least similar errors):
>>
>> Well, I replaced the NIC and the problem remains.
>>
>> Guess it's time to poke a bit deeper.
>
> Ahh..so here's what happened.  I added code to set the rx-chainmask
> and tx-chainmask from user-space app (via writing to appropriate debugfs
> files).
>
> Code assumed 0x7 by default, but this particular NIC is only 2x2.
>
> When the chainmask is set wrong, the NIC gets into the broken state.

Great! This could be one root cause of chip reset failures.

>
> Changing it back to 0x3 fixes the problem.
>
> Is that worth trying to fix in the driver, or should I just
> fix it in user-space so that it never sets more than what
> the eeprom reports as supported?

Felix made a fix for broken EEPROM chainmasks.

commit 6054069a03f77ffa686e2dfd5f07cff8ee40b72d
Author: Felix Fietkau <nbd@openwrt.org>
Date:   Tue Jul 19 08:46:44 2011 +0200

    ath9k_hw: validate and fix broken eeprom chainmask settings

    Some devices (e.g. Ubiquiti AirRouter) ship with broken EEPROM chainmask
    data, which breaks the initial calibration after a hardware reset.
    To fix this, mask the eeprom chainmask with the chainmask of the chip,
    and use the chip chainmask if the result is zero.

    Signed-off-by: Felix Fietkau <nbd@openwrt.org>
    Signed-off-by: John W. Linville <linville@tuxdriver.com>
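
The rule described in that commit message can be sketched roughly as follows. This is an illustration of the stated logic only, not a copy of the kernel code; the function and parameter names are assumptions:

```c
#include <stdint.h>

/*
 * Sketch of the rule from the commit message: mask the EEPROM chainmask
 * with the chip's chainmask, and fall back to the chip's chainmask if
 * nothing is left.
 */
static uint8_t fix_chainmask(uint8_t chip_mask, uint8_t eeprom_mask)
{
    uint8_t masked = eeprom_mask & chip_mask;

    return masked ? masked : chip_mask;
}
```

For example, an EEPROM value of 0x7 on a chip whose mask is 0x3 yields 0x3, and an EEPROM value of 0x0 also yields 0x3. Note that this only corrects the EEPROM-derived value; it says nothing about masks written later from user space.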

The hard-coded chainmask comes into the picture only when the EEPROM
chainmask settings are zero.
If we validate the chainmask in the driver, we have to be sure to
validate it for all chipsets.
We also need to figure out how to differentiate the AR9382 (2x2) from
the AR9380 (3x3).




-- 
thanks,
shafi
