From: Yishai Hadas <yishaih@dev.mellanox.co.il>
To: Don Dutile <ddutile@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
"Pandarathil, Vijaymohan R" <vijaymohan.pandarathil@hp.com>,
Myron Stowe <myron.stowe@redhat.com>,
"linux-rdma (linux-rdma@vger.kernel.org)"
<linux-rdma@vger.kernel.org>,
"yishaih@mellanox.com" <yishaih@mellanox.com>,
liranl@mellanox.com,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>
Subject: Re: PCI/AER: AER in SRIOV environment
Date: Tue, 24 Jun 2014 01:44:37 +0300
Message-ID: <53A8ADD5.7030207@dev.mellanox.co.il>
In-Reply-To: <53A88A32.4010406@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 5823 bytes --]
On 6/23/2014 11:12 PM, Don Dutile wrote:
> On 06/23/2014 03:09 PM, Bjorn Helgaas wrote:
>> [+cc linux-pci, Don]
>>
> Adding Alex Williamson in case he can add more to this conversation...
>
>> On Mon, Jun 23, 2014 at 8:29 AM, Yishai Hadas
>> <yishaih@dev.mellanox.co.il> wrote:
>>> Hi Vijay,
>>> Trying to add AER support for Mellanox NIC in SRIOV environment, while
>>> evaluating/testing encountered a problem which led me to your
>>> patch accepted as part of kernel 3.8, commit ID
>>> "918b4053184c0ca22236e70e299c5343eea35304".
>>>
>>> Have some concerns/questions on:
>>> When working in SRIOV environment VFs may be un-attached, having no
>>> driver
>>> assigned to, or may be attached to Virtual machine to work in some
>>> pass-through mode.
>>> Once working in KVM setup there is pci-stub driver which is loaded
>>> in the
>>> HYP/PF for a given attached VF.
> huh? 'loaded in the hyp/pf? .... um, loaded in the host, and a VF is
> detached from its host driver -- a VF can be used in the host w/o any
> virtualization,
> i.e., that's how guest VM is driving the VF: as if it was used by a
> guest (host) OS directly --
> and attached to pci-stub driver, when assigned to a KVM guest in
> pre-VFIO days/ways.
> If VFIO used, then VF is attached to vfio-pci driver.
>
>>>
>>> I'm using the aer-inject kernel module and its corresponding
>>> aer-inject tool
>>> to simulate an error in the HYP.
>>> In both cases your commit will cause the AER recovery to fail as
>>> there is no
>>> driver assigned to PF's VFs that supports AER, comparing the code
>>> before
>>> your change.
>>>
> Without VFIO, I believe that's correct. There was no AER-to-VF support
> pre-VFIO days.
> I believe with the recent VFIO support,
> and modifications to KVM, an AER that is associated with an assigned
> VF will
> force the crash/halt of the KVM guest -- can't depend on a guest VF
> driver clearing
> the AER in the hyp/host -- guest isn't privileged enough to clear the
> error.
> So, crashing the guest is the simple option at the moment, to contain
> the error.
> Alex: do I have that (vfio aer default) correct, or is that still
> site-under-construction?
What about the case where the VF is not attached to a KVM guest and
has no driver loaded on the host? From code review and some testing,
recovery fails in that case because the VF has no AER-aware driver.
What is the expected solution here?
Is any special QEMU setup needed to activate the VFIO support? I
would like to try it for the case where the VF is attached to a guest.
>
>>> How such cases should work ? my expectation was that the PF will
>>> get the
>>> error detected message then will recognize whether
>>> issue is its own or one of its VFs
> The AER packet will have the tag of the VF in if it was the source of
> the error;
> so the PF will never see it; although one could argue it should be
> 'promoted'
> to the PF if PF/VF needs to clear some state it has wrt the VF (the
> SRIOV spec is
> lacking of info in this space); _but_, VFIO resets the VF (sets FLR
> bit) when the
> device is deassigned and before re-attachment to the host, so that
> should clear out
> any state btwn PF & VF ('should' ... famous last words...).
In my test I used the aer-inject tool to simulate an error on the bus
where both the PF and the VF reside, with the function number set to
the PF's. It looks like both functions should be called by the AER
driver as part of pci_walk_bus(). As mentioned, only the PF got a
callback, and recovery failed because the VF has no AER-aware driver;
once I removed the VF, recovery succeeded.
I believe the packet should include some information about the source
of the error, shouldn't it?
In addition, the upstream ixgbe driver's ixgbe_error_detected()
appears to have code running on the PF that checks whether the source
was a VF.
By the way, when I tried to simulate a VF error using its function
number, I got the error below:
"Error: Failed to write, Inappropriate ioctl for device"
Any idea what causes it?
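To make the pci_walk_bus() voting concrete, here is a minimal
user-space sketch of the behavior as I understand it from reading the
code. It is a simplified model, not the kernel source: the vote enum,
struct fn_model, merge_vote() and walk_bus() are names made up for
illustration, and the real merge rules may differ in detail. The only
point it shows is that a single function without an AER-aware driver
overrides every other vote.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes, loosely modeled on pci_ers_result_t. */
enum vote {
    VOTE_NONE,          /* no opinion */
    VOTE_CAN_RECOVER,
    VOTE_NEED_RESET,
    VOTE_DISCONNECT,
    VOTE_RECOVERED,
    VOTE_NO_AER_DRIVER  /* function has no AER-aware driver */
};

/* Hypothetical stand-in for one PCI function on the walked bus. */
struct fn_model {
    const char *name;       /* e.g. "21:00.1", informational only */
    bool has_aer_handler;   /* driver bound and provides error handlers */
    enum vote handler_vote; /* used only when has_aer_handler is true */
};

/* Fold one function's vote into the running result. */
static enum vote merge_vote(enum vote acc, enum vote next)
{
    if (next == VOTE_NO_AER_DRIVER)
        return VOTE_NO_AER_DRIVER;  /* one missing driver fails the vote */
    if (next == VOTE_NONE)
        return acc;                 /* no opinion changes nothing */

    switch (acc) {
    case VOTE_CAN_RECOVER:
    case VOTE_RECOVERED:
        return next;                /* a stricter vote replaces these */
    case VOTE_DISCONNECT:
        return next == VOTE_NEED_RESET ? VOTE_NEED_RESET : acc;
    default:
        return acc;                 /* includes a sticky NO_AER_DRIVER */
    }
}

/* Model of walking every function on the bus and collecting votes. */
static enum vote walk_bus(const struct fn_model *fns, size_t n)
{
    enum vote acc = VOTE_CAN_RECOVER;

    for (size_t i = 0; i < n; i++) {
        enum vote v = fns[i].has_aer_handler ? fns[i].handler_vote
                                             : VOTE_NO_AER_DRIVER;
        acc = merge_vote(acc, v);
    }
    return acc;
}
```

With only the PF (driver bound) in the array, the model ends in
VOTE_CAN_RECOVER; adding a driverless VF such as 21:00.1 turns the
result into VOTE_NO_AER_DRIVER, which matches what I see in testing.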
>
>>
>> I'm really not an AER expert, so help me understand this question of
>> recognizing whether an error is associated with a PF or a VF.
>>
>> In terms of hardware, it looks like the device that detects an error
>> logs some information and sends an Error Message upstream. The Root
>> Complex receives the message, captures the source ID from the Error
>> Message, and may generate an interrupt. I expect this source ID can
>> be either a PF or a VF; there's no requirement that a VF error must be
>> reported as though it's from the PF, is there?
>>
>>> and work accordingly, in current code
>>> looks like recovery failed as part of "voting" once there is no AER
>>> handler
>>> assigned to the VFs.
>>
>> The commit you mentioned has to do with PCI_ERS_RESULT_NO_AER_DRIVER.
>> We use pci_walk_bus() to figure out whether all the devices in a
>> subtree have a driver. What subtree is involved here? I would expect
>> the VFs to be siblings of the PF, not children of it, so I'm not sure
>> where things went wrong.
> Well, VFs could be on virtual busses (ARI turned on), so not
> necessarily a
> sibling to PF ... and then we have the problem in PCI code of not
> being able
> to traverse these virtual busses (in some cases; not sure if
> pci_walk_bus(),
> which is going down the tree vs up the tree, has any problems here
> w/VFs on
> virtual busses).
>
>>
>> Can you collect "lspci -vvv" output and maybe add some debug so we can
>> see exactly where the error is detected and what devices we're looking
>> at to conclude that one of them doesn't have a driver?
The lspci -vvv output for both the PF and the VF is attached. The VF
(21:00.1) has no driver loaded, whereas the PF has one (Kernel driver
in use: mlx4_core).
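For reference, the VF address in the attached output follows from the
PF's SR-IOV capability. As far as I know, the SR-IOV spec derives the
routing ID of VF n (counting from 1) as PF routing ID + First VF
Offset + (n - 1) * VF Stride. Below is a small sketch of that
arithmetic; the helper names are mine, not kernel APIs, and the values
in the note after it are taken from the lspci dump (PF at 21:00.0, VF
offset 1, stride 1).

```c
#include <assert.h>
#include <stdint.h>

/* Compose a 16-bit routing ID from a bus number and an 8-bit devfn. */
static uint16_t rid_make(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)(((uint16_t)bus << 8) | devfn);
}

/* Bus number is the upper byte of the routing ID. */
static uint8_t rid_bus(uint16_t rid)
{
    return (uint8_t)(rid >> 8);
}

/* devfn is the lower byte; with ARI it is an 8-bit function number. */
static uint8_t rid_devfn(uint16_t rid)
{
    return (uint8_t)(rid & 0xff);
}

/* Routing ID of the vf_index-th VF (1-based) of a given PF. */
static uint16_t vf_rid(uint16_t pf_rid, uint16_t first_vf_offset,
                       uint16_t vf_stride, unsigned int vf_index)
{
    assert(vf_index >= 1);
    return (uint16_t)(pf_rid + first_vf_offset +
                      (vf_index - 1) * vf_stride);
}
```

With rid_make(0x21, 0x00) as the PF, offset 1 and stride 1, the first
VF comes out at bus 0x21, devfn 0x01, i.e. 21:00.1 as listed in the
attachment. All VFs here stay on the PF's bus, so they are walked
together with the PF.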
>>
>> Bjorn
>>
>
[-- Attachment #2: lspci.txt --]
[-- Type: text/plain, Size: 7889 bytes --]
21:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Subsystem: Hewlett-Packard Company Device 18cf
Physical Slot: 2
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 64
Region 0: Memory at fbf00000 (64-bit, non-prefetchable) [size=1M]
Region 2: Memory at f8000000 (64-bit, prefetchable) [size=32M]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] Vital Product Data
Product Name: HP ConnectX-3 Mezz
Read-only fields:
[PN] Part number: 644161-B21
[EC] Engineering changes: C4
[SN] Serial number: IL224202VW
[V0] Vendor specific: HP IB FDR/EN 10/40Gb 2P 544M Adptr
[RV] Reserved: checksum good, 0 byte(s) reserved
Read/write fields:
[V1] Vendor specific: N/A
[YA] Asset tag: N/A
[RW] Read-write area: 102 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 252 byte(s) free
End
Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
Vector table: BAR=0 offset=0007c000
PBA: BAR=0 offset=0007d000
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
MaxPayload 256 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #8, Speed 8GT/s, Width x8, ASPM L0s, Latency L0 unlimited, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [148 v1] Device Serial Number 24-be-05-ff-ff-8b-6b-d0
Capabilities: [108 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy+
IOVSta: Migration-
Initial VFs: 16, Total VFs: 16, Number of VFs: 1, Function Dependency Link: 00
VF offset: 1, stride: 1, Device ID: 1004
Supported Page Size: 000007ff, System Page Size: 00000001
Region 2: Memory at 00000000d8000000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [154 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [18c v1] #19
Kernel driver in use: mlx4_core
Kernel modules: mlx4_core
21:00.1 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]
Subsystem: Hewlett-Packard Company Device 61b0
Physical Slot: 2
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Region 2: [virtual] Memory at d8000000 (64-bit, prefetchable) [size=32M]
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset+
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM unknown, Latency L0 <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [9c] MSI-X: Enable- Count=256 Masked-
Vector table: BAR=2 offset=00002000
PBA: BAR=2 offset=00003000
Capabilities: [40] #00 [0000]
Kernel modules: mlx4_core