* RE: [E1000-devel] 82571EB: Detected Hardware Unit Hang
[not found] ` <9B4A1B1917080E46B64F07F2989DADD62F2D8070@ORSMSX102.amr.corp.intel.com>
@ 2012-11-27 18:10 ` Ben Hutchings
2012-11-27 18:24 ` Fujinaka, Todd
2012-11-28 8:31 ` Joe Jin
0 siblings, 2 replies; 9+ messages in thread
From: Ben Hutchings @ 2012-11-27 18:10 UTC
To: Fujinaka, Todd, Mary Mcgrath
Cc: Joe Jin, netdev@vger.kernel.org, e1000-devel@lists.sf.net,
linux-kernel@vger.kernel.org, linux-pci
On Tue, 2012-11-27 at 17:32 +0000, Fujinaka, Todd wrote:
> Forgive me if I'm being too repetitious as I think some of this has
> been mentioned in the past.
>
> We (and by we I mean the Ethernet part and driver) can only change the
> advertised availability of a larger MaxPayloadSize. The size is
> negotiated by both sides of the link when the link is established. The
> driver should not change the size of the link as it would be poking at
> registers outside of its scope and is controlled by the upstream
> bridge (not us).
[...]
MaxPayloadSize (MPS) is not negotiated between devices but is programmed
by the system firmware (at least for devices present at boot - the
kernel may be responsible in case of hotplug). You can use the kernel
parameter 'pci=pcie_bus_perf' (or one of several others) to set a policy
that overrides this, but no policy will allow setting MPS above the
device's MaxPayloadSizeSupported (MPSS).
(These parameters are not documented in
Documentation/kernel-parameters.txt! Someone ought to fix that.)
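To make the MPS/MPSS distinction concrete, here is a minimal sketch of how the two fields decode, assuming the standard PCIe layout (MPSS in Device Capabilities bits 2:0, MPS in Device Control bits 7:5, both encoded as 128 << code). The function names `mps_bytes` and `mpss_bytes` are made up for illustration and are not part of any tool.

```shell
# Decode PCIe payload-size fields from raw register values.
# Both fields share one encoding: bytes = 128 << code.
# Arguments must carry a 0x prefix so the shell parses them as hex.

# Max_Payload_Size (MPS): Device Control register, bits 7:5.
mps_bytes() {
    echo $(( 128 << (($1 >> 5) & 0x7) ))
}

# Max_Payload_Size Supported (MPSS): Device Capabilities register, bits 2:0.
mpss_bytes() {
    echo $(( 128 << ($1 & 0x7) ))
}
```

On a live system the same two values normally appear as "MaxPayload" on the DevCap and DevCtl lines of `lspci -vv`; whatever policy is chosen, the DevCtl value should never exceed the DevCap one.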
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* RE: [E1000-devel] 82571EB: Detected Hardware Unit Hang
2012-11-27 18:10 ` [E1000-devel] 82571EB: Detected Hardware Unit Hang Ben Hutchings
@ 2012-11-27 18:24 ` Fujinaka, Todd
2012-11-28 8:31 ` Joe Jin
1 sibling, 0 replies; 9+ messages in thread
From: Fujinaka, Todd @ 2012-11-27 18:24 UTC
To: Ben Hutchings, Mary Mcgrath
Cc: Joe Jin, netdev@vger.kernel.org, e1000-devel@lists.sf.net,
linux-kernel@vger.kernel.org, linux-pci
Thanks for the clarification. I was just going by the PCIe spec, which says the lowest value of both ends is used, and I figured SOMETHING had to be looking at that and doing some sort of negotiation. I'm no BIOS guy, so I'm not sure what's actually going on, whether something walks the PCIe tree or if the BIOS just sets all the values to the minimum.
Todd Fujinaka
Technical Marketing Engineer
LAN Access Division (LAD)
Intel Corporation
todd.fujinaka@intel.com
(503) 712-4565
-----Original Message-----
From: Ben Hutchings [mailto:bhutchings@solarflare.com]
Sent: Tuesday, November 27, 2012 10:11 AM
To: Fujinaka, Todd; Mary Mcgrath
Cc: Joe Jin; netdev@vger.kernel.org; e1000-devel@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
Subject: RE: [E1000-devel] 82571EB: Detected Hardware Unit Hang
On Tue, 2012-11-27 at 17:32 +0000, Fujinaka, Todd wrote:
> Forgive me if I'm being too repetitious as I think some of this has
> been mentioned in the past.
>
> We (and by we I mean the Ethernet part and driver) can only change the
> advertised availability of a larger MaxPayloadSize. The size is
> negotiated by both sides of the link when the link is established. The
> driver should not change the size of the link as it would be poking at
> registers outside of its scope and is controlled by the upstream
> bridge (not us).
[...]
MaxPayloadSize (MPS) is not negotiated between devices but is programmed by the system firmware (at least for devices present at boot - the kernel may be responsible in case of hotplug). You can use the kernel parameter 'pci=pcie_bus_perf' (or one of several others) to set a policy that overrides this, but no policy will allow setting MPS above the device's MaxPayloadSizeSupported (MPSS).
(These parameters are not documented in
Documentation/kernel-parameters.txt! Someone ought to fix that.)
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
2012-11-27 18:10 ` [E1000-devel] 82571EB: Detected Hardware Unit Hang Ben Hutchings
2012-11-27 18:24 ` Fujinaka, Todd
@ 2012-11-28 8:31 ` Joe Jin
2012-11-28 15:53 ` Fujinaka, Todd
1 sibling, 1 reply; 9+ messages in thread
From: Joe Jin @ 2012-11-28 8:31 UTC
To: Ben Hutchings
Cc: Fujinaka, Todd, Mary Mcgrath, netdev@vger.kernel.org,
e1000-devel@lists.sf.net, linux-kernel@vger.kernel.org, linux-pci
On 11/28/12 02:10, Ben Hutchings wrote:
> On Tue, 2012-11-27 at 17:32 +0000, Fujinaka, Todd wrote:
>> Forgive me if I'm being too repetitious as I think some of this has
>> been mentioned in the past.
>>
>> We (and by we I mean the Ethernet part and driver) can only change the
>> advertised availability of a larger MaxPayloadSize. The size is
>> negotiated by both sides of the link when the link is established. The
>> driver should not change the size of the link as it would be poking at
>> registers outside of its scope and is controlled by the upstream
>> bridge (not us).
> [...]
>
> MaxPayloadSize (MPS) is not negotiated between devices but is programmed
> by the system firmware (at least for devices present at boot - the
> kernel may be responsible in case of hotplug). You can use the kernel
> parameter 'pci=pcie_bus_perf' (or one of several others) to set a policy
> that overrides this, but no policy will allow setting MPS above the
> device's MaxPayloadSizeSupported (MPSS).
>
Ben,
Unfortunately I'm using a 3.0.x kernel, and this parameter is not included in that kernel.
So I'm trying to use ethtool to modify it in the EEPROM to see whether that helps.
Todd, I'll review MaxPayload for all devices, but I should point out that if there is a mismatch,
the customer cannot fix it from the BIOS because there is no setting for it there. To
test this, we have to find a way to verify whether it is the root cause, so I still
need to find the offset in the EEPROM.
Thanks in advance,
Joe
* RE: [E1000-devel] 82571EB: Detected Hardware Unit Hang
2012-11-28 8:31 ` Joe Jin
@ 2012-11-28 15:53 ` Fujinaka, Todd
2012-11-29 3:10 ` Ethan Zhao
0 siblings, 1 reply; 9+ messages in thread
From: Fujinaka, Todd @ 2012-11-28 15:53 UTC
To: Joe Jin, Ben Hutchings
Cc: Mary Mcgrath, netdev@vger.kernel.org, e1000-devel@lists.sf.net,
linux-kernel@vger.kernel.org, linux-pci
The only EEPROM I know about or can speak to is the one attached to the 82571 and it doesn't set the MaxPayloadSize. That's done by the BIOS.
Todd Fujinaka
Technical Marketing Engineer
LAN Access Division (LAD)
Intel Corporation
todd.fujinaka@intel.com
(503) 712-4565
-----Original Message-----
From: Joe Jin [mailto:joe.jin@oracle.com]
Sent: Wednesday, November 28, 2012 12:31 AM
To: Ben Hutchings
Cc: Fujinaka, Todd; Mary Mcgrath; netdev@vger.kernel.org; e1000-devel@lists.sf.net; linux-kernel@vger.kernel.org; linux-pci
Subject: Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
On 11/28/12 02:10, Ben Hutchings wrote:
> On Tue, 2012-11-27 at 17:32 +0000, Fujinaka, Todd wrote:
>> Forgive me if I'm being too repetitious as I think some of this has
>> been mentioned in the past.
>>
>> We (and by we I mean the Ethernet part and driver) can only change
>> the advertised availability of a larger MaxPayloadSize. The size is
>> negotiated by both sides of the link when the link is established.
>> The driver should not change the size of the link as it would be
>> poking at registers outside of its scope and is controlled by the
>> upstream bridge (not us).
> [...]
>
> MaxPayloadSize (MPS) is not negotiated between devices but is
> programmed by the system firmware (at least for devices present at
> boot - the kernel may be responsible in case of hotplug). You can use
> the kernel parameter 'pci=pcie_bus_perf' (or one of several others) to
> set a policy that overrides this, but no policy will allow setting MPS
> above the device's MaxPayloadSizeSupported (MPSS).
>
Ben,
Unfortunately I'm using a 3.0.x kernel, and this parameter is not included in that kernel.
So I'm trying to use ethtool to modify it in the EEPROM to see whether that helps.
Todd, I'll review MaxPayload for all devices, but I should point out that if there is a mismatch, the customer cannot fix it from the BIOS because there is no setting for it there. To test this, we have to find a way to verify whether it is the root cause, so I still need to find the offset in the EEPROM.
Thanks in advance,
Joe
* Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
2012-11-28 15:53 ` Fujinaka, Todd
@ 2012-11-29 3:10 ` Ethan Zhao
2012-11-29 15:52 ` Fujinaka, Todd
0 siblings, 1 reply; 9+ messages in thread
From: Ethan Zhao @ 2012-11-29 3:10 UTC
To: Fujinaka, Todd
Cc: Joe Jin, Ben Hutchings, Mary Mcgrath, netdev@vger.kernel.org,
e1000-devel@lists.sf.net, linux-kernel@vger.kernel.org, linux-pci
Joe,
Possibly your customer is running a kernel without source code on
a platform whose vendor won't fix the BIOS issue (is that an
HP/Dell server?).
Anyway, to see whether it is a payload issue or not, you could change the
payload size of those devices with the setpci tool and set the link
retrain bit to trigger link retraining, to debug the issue and
identify the root cause. I think that is much easier than modifying the
BIOS or the NIC's EEPROM.
e.g.
Set the Device Control register to 0x000f (128-byte payload size):
# setpci -v -s 00:02.0 98.w=000f
Set the Link Control register to 0x60 (retrain the link):
# setpci -v -s 00:02.0 a0.b=60
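Writing a fixed value such as 000f also overwrites every other Device Control bit. As a hedged sketch of a gentler read-modify-write variant, the helper below changes only the MPS field (bits 7:5, assuming the standard PCIe encoding) and leaves the rest alone. `devctl_with_mps` is a hypothetical name, and the 00:02.0 address and 0x98 offset in the usage note are taken from the example above and will differ on other devices.

```shell
# Return a Device Control value with only the MPS field (bits 7:5) changed.
# $1 = current Device Control value with 0x prefix (e.g. 0x2c3f)
# $2 = desired MPS in bytes (128, 256, 512, 1024, 2048 or 4096)
# Prints the new value as four hex digits, the form setpci accepts.
devctl_with_mps() {
    dm_cur=$1
    case "$2" in
        128)  dm_code=0 ;;
        256)  dm_code=1 ;;
        512)  dm_code=2 ;;
        1024) dm_code=3 ;;
        2048) dm_code=4 ;;
        4096) dm_code=5 ;;
        *) echo "unsupported MPS: $2" >&2; return 1 ;;
    esac
    # Clear bits 7:5, then insert the new code; keep the result to 16 bits.
    printf '%04x\n' $(( ($dm_cur & ~0xe0 & 0xffff) | ($dm_code << 5) ))
}
```

A possible use, run as root and only for debugging: `new=$(devctl_with_mps "0x$(setpci -s 00:02.0 98.w)" 128) && setpci -v -s 00:02.0 98.w="$new"`. Recent pciutils releases should also accept the symbolic form `CAP_EXP+8.w` in place of a hard-coded offset; check `setpci --dumpregs` on your system before relying on it.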
Hope it works; just my 2 cents.
Ethan.zhao@oracle.com
On Wed, Nov 28, 2012 at 11:53 PM, Fujinaka, Todd
<todd.fujinaka@intel.com> wrote:
> The only EEPROM I know about or can speak to is the one attached to the 82571 and it doesn't set the MaxPayloadSize. That's done by the BIOS.
* RE: [E1000-devel] 82571EB: Detected Hardware Unit Hang
2012-11-29 3:10 ` Ethan Zhao
@ 2012-11-29 15:52 ` Fujinaka, Todd
2012-12-19 3:04 ` Joe Jin
0 siblings, 1 reply; 9+ messages in thread
From: Fujinaka, Todd @ 2012-11-29 15:52 UTC
To: Ethan Zhao
Cc: Joe Jin, Ben Hutchings, Mary Mcgrath, netdev@vger.kernel.org,
e1000-devel@lists.sf.net, linux-kernel@vger.kernel.org, linux-pci
Someone else pointed this out to me locally. If you have a non-client BIOS, you should be able to set the MaxPayloadSize using setpci. You have to make sure that you're being consistent throughout all the associated links.
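The consistency requirement can be checked mechanically. The following is a small sketch, assuming you have already collected the MPS value (in bytes) of every device on one path from the root port down to the NIC, for example from `lspci -vv`; `check_mps_path` is a hypothetical helper name, not an existing tool.

```shell
# Given the MPS (in bytes) of every device on one path from root port to
# endpoint, print the smallest value and return non-zero if they differ.
check_mps_path() {
    cp_min=$1
    cp_status=0
    for cp_v in "$@"; do
        if [ "$cp_v" -ne "$1" ]; then
            cp_status=1          # at least one device disagrees with the first
        fi
        if [ "$cp_v" -lt "$cp_min" ]; then
            cp_min=$cp_v         # track the smallest value seen so far
        fi
    done
    echo "$cp_min"
    return $cp_status
}
```

For instance, `check_mps_path 256 128 128` prints 128 and returns a failure status, which would point at the first device as the one set larger than the devices below it. The printed minimum is the value every device on the path could safely be set to.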
Todd Fujinaka
Technical Marketing Engineer
LAN Access Division (LAD)
Intel Corporation
todd.fujinaka@intel.com
(503) 712-4565
* Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
2012-11-29 15:52 ` Fujinaka, Todd
@ 2012-12-19 3:04 ` Joe Jin
2012-12-19 5:52 ` Yijing Wang
0 siblings, 1 reply; 9+ messages in thread
From: Joe Jin @ 2012-12-19 3:04 UTC
To: Fujinaka, Todd
Cc: Ethan Zhao, Ben Hutchings, Mary Mcgrath, netdev@vger.kernel.org,
e1000-devel@lists.sf.net, linux-kernel@vger.kernel.org, linux-pci
Hi all,
I backported the MPS commits and asked the customer to pass "pci=pcie_bus_peer2peer" to the kernel
to limit MPS to 128, and the issue disappeared. It sounds like this is a BIOS bug.
Thanks for all of your help.
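To confirm that a booted system actually received the policy, one option is to inspect the kernel command line. The sketch below assumes the four policy names accepted by the pci= option (pcie_bus_tune_off, pcie_bus_safe, pcie_bus_perf, pcie_bus_peer2peer); `pcie_bus_policy` is a hypothetical helper written for this illustration.

```shell
# Print each PCIe bus MPS policy named in a kernel command line.
# $1 = command line text, e.g. the contents of /proc/cmdline.
pcie_bus_policy() {
    for pp_word in $1; do
        case "$pp_word" in
            pci=*)
                # pci= takes a comma-separated option list
                pp_old_ifs=$IFS
                IFS=,
                for pp_opt in ${pp_word#pci=}; do
                    case "$pp_opt" in
                        pcie_bus_tune_off|pcie_bus_safe|pcie_bus_perf|pcie_bus_peer2peer)
                            echo "$pp_opt" ;;
                    esac
                done
                IFS=$pp_old_ifs
                ;;
        esac
    done
}
```

Typical use would be `pcie_bus_policy "$(cat /proc/cmdline)"`. A misspelled policy name produces no output here, so an empty result is worth double-checking against the boot loader configuration.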
Best Regards,
Joe
--
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing
* Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
2012-12-19 3:04 ` Joe Jin
@ 2012-12-19 5:52 ` Yijing Wang
2012-12-19 6:13 ` Joe Jin
0 siblings, 1 reply; 9+ messages in thread
From: Yijing Wang @ 2012-12-19 5:52 UTC
To: Joe Jin
Cc: Fujinaka, Todd, Ethan Zhao, Ben Hutchings, Mary Mcgrath,
netdev@vger.kernel.org, e1000-devel@lists.sf.net,
linux-kernel@vger.kernel.org, linux-pci, Jon Mason, Bjorn Helgaas
On 2012/12/19 11:04, Joe Jin wrote:
> Hi all,
>
> I backported the MPS commits and asked the customer to pass "pci=pcie_bus_peer2peer" to the kernel
> to limit MPS to 128, and the issue disappeared. It sounds like this is a BIOS bug.
>
Hi Joe,
I found a similar problem when doing PCI hotplug; the discussion is here: http://marc.info/?l=linux-pci&m=134810569924220&w=2
We are trying to improve the Linux kernel so this problem is easier to debug, based on Bjorn's suggestion. Jon sent out the first version of the patch: http://marc.info/?l=linux-pci&m=135002016005274&w=2
I think we can go further, as discussed here: http://marc.info/?l=linux-pci&m=135115581307869&w=2
I hope this information helps.
Thanks!
Yijing.
--
Thanks!
Yijing
* Re: [E1000-devel] 82571EB: Detected Hardware Unit Hang
2012-12-19 5:52 ` Yijing Wang
@ 2012-12-19 6:13 ` Joe Jin
0 siblings, 0 replies; 9+ messages in thread
From: Joe Jin @ 2012-12-19 6:13 UTC
To: Yijing Wang
Cc: Fujinaka, Todd, Ethan Zhao, Ben Hutchings, Mary Mcgrath,
netdev@vger.kernel.org, e1000-devel@lists.sf.net,
linux-kernel@vger.kernel.org, linux-pci, Jon Mason, Bjorn Helgaas
Hi Yijing,
Thanks for the reference. The patch looks good to me, but I have no chance
to test it in the customer's environment.
Best Regards,
Joe
--
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing