* Re: PCI/AER: AER in SRIOV environment
[not found] <53A839C6.5050102@dev.mellanox.co.il>
@ 2014-06-23 19:09 ` Bjorn Helgaas
2014-06-23 20:12 ` Don Dutile
0 siblings, 1 reply; 8+ messages in thread
From: Bjorn Helgaas @ 2014-06-23 19:09 UTC
To: Yishai Hadas
Cc: Pandarathil, Vijaymohan R, Myron Stowe,
linux-rdma (linux-rdma@vger.kernel.org), yishaih@mellanox.com,
liranl, linux-pci@vger.kernel.org, Don Dutile
[+cc linux-pci, Don]
On Mon, Jun 23, 2014 at 8:29 AM, Yishai Hadas
<yishaih@dev.mellanox.co.il> wrote:
> Hi Vijay,
> Trying to add AER support for Mellanox NIC in SRIOV environment, while
> evaluating/testing encountered a problem which led me to your
> patch accepted as part of kernel 3.8, commit ID
> "918b4053184c0ca22236e70e299c5343eea35304".
>
> Have some concerns/questions on:
> When working in SRIOV environment VFs may be un-attached, having no driver
> assigned to, or may be attached to Virtual machine to work in some
> pass-through mode.
> Once working in KVM setup there is pci-stub driver which is loaded in the
> HYP/PF for a given attached VF.
>
> I'm using the aer-inject kernel module and its corresponding aer-inject tool
> to simulate an error in the HYP.
> In both cases your commit will cause the AER recovery to fail as there is no
> driver assigned to PF's VFs that supports AER, comparing the code before
> your change.
>
> How such cases should work ? my expectation was that the PF will get the
> error detected message then will recognize whether
> issue is its own or one of its VFs
I'm really not an AER expert, so help me understand this question of
recognizing whether an error is associated with a PF or a VF.
In terms of hardware, it looks like the device that detects an error
logs some information and sends an Error Message upstream. The Root
Complex receives the message, captures the source ID from the Error
Message, and may generate an interrupt. I expect this source ID can
be either a PF or a VF; there's no requirement that a VF error must be
reported as though it's from the PF, is there?
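As a rough illustration of the source ID mentioned above: a PCIe requester ID is a 16-bit value with the bus number in bits 15:8, followed by either a 5-bit device number and a 3-bit function number or, with ARI, a single 8-bit function number. The sketch below decodes such a value. The names `decode_source_id` and `format_bdf` are only illustrative, and reading the real Error Source Identification register from the Root Port is not shown.

```python
def decode_source_id(source_id, ari=False):
    """Split a 16-bit PCIe requester ID into (bus, device, function)."""
    if not 0 <= source_id <= 0xFFFF:
        raise ValueError("source ID must fit in 16 bits")
    bus = source_id >> 8
    if ari:
        # With ARI the device number is implicitly 0 and the
        # function number occupies all 8 low bits.
        return bus, 0, source_id & 0xFF
    return bus, (source_id >> 3) & 0x1F, source_id & 0x07


def format_bdf(bus, device, function):
    """Render bus/device/function in lspci-style hex notation."""
    return "%02x:%02x.%x" % (bus, device, function)
```

For example, a source ID of 0x2101 decodes to bus 0x21, device 0, function 1, which `format_bdf` renders as `21:00.1`.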
> and work accordingly, in current code
> looks like recovery failed as part of "voting" once there is no AER handler
> assigned to the VFs.
The commit you mentioned has to do with PCI_ERS_RESULT_NO_AER_DRIVER.
We use pci_walk_bus() to figure out whether all the devices in a
subtree have a driver. What subtree is involved here? I would expect
the VFs to be siblings of the PF, not children of it, so I'm not sure
where things went wrong.
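To make the "voting" concrete, here is a simplified Python model of how per-device results could be merged during such a walk. It is a sketch of the behavior described in this thread, not the kernel's code: the `Device` class, the `merge` and `walk_and_vote` helpers, and the reduced set of results are assumptions made for illustration. The point it models is that a single function without an AER-aware driver makes the whole walk report NO_AER_DRIVER.

```python
from dataclasses import dataclass
from typing import List, Optional

# Reduced, illustrative set of results. The names are borrowed from the
# kernel's PCI_ERS_RESULT_* values; the merge logic is a simplification.
CAN_RECOVER = "CAN_RECOVER"
NEED_RESET = "NEED_RESET"
NO_AER_DRIVER = "NO_AER_DRIVER"


@dataclass
class Device:
    name: str
    # What the bound driver's error handler would answer, or None if the
    # function has no driver (or a driver without AER callbacks).
    handler_vote: Optional[str] = None


def merge(current: str, new: str) -> str:
    """Combine the running result with one more device's vote."""
    if NO_AER_DRIVER in (current, new):
        return NO_AER_DRIVER  # sticky: one such vote fails recovery
    if NEED_RESET in (current, new):
        return NEED_RESET
    return CAN_RECOVER


def walk_and_vote(devices: List[Device]) -> str:
    """Visit every device on the bus and merge the votes."""
    result = CAN_RECOVER
    for dev in devices:
        vote = dev.handler_vote if dev.handler_vote else NO_AER_DRIVER
        result = merge(result, vote)
    return result
```

In this model, a PF whose driver votes CAN_RECOVER plus a VF with no driver yields NO_AER_DRIVER, while the PF alone yields CAN_RECOVER.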
Can you collect "lspci -vvv" output and maybe add some debug so we can
see exactly where the error is detected and what devices we're looking
at to conclude that one of them doesn't have a driver?
Bjorn
* Re: PCI/AER: AER in SRIOV environment
2014-06-23 19:09 ` PCI/AER: AER in SRIOV environment Bjorn Helgaas
@ 2014-06-23 20:12 ` Don Dutile
2014-06-23 22:44 ` Yishai Hadas
2014-06-23 23:10 ` Alex Williamson
0 siblings, 2 replies; 8+ messages in thread
From: Don Dutile @ 2014-06-23 20:12 UTC
To: Bjorn Helgaas, Yishai Hadas
Cc: Pandarathil, Vijaymohan R, Myron Stowe,
linux-rdma (linux-rdma@vger.kernel.org), yishaih@mellanox.com,
liranl, linux-pci@vger.kernel.org
On 06/23/2014 03:09 PM, Bjorn Helgaas wrote:
> [+cc linux-pci, Don]
>
Adding Alex Williamson in case he can add more to this conversation...
> On Mon, Jun 23, 2014 at 8:29 AM, Yishai Hadas
> <yishaih@dev.mellanox.co.il> wrote:
>> Hi Vijay,
>> Trying to add AER support for Mellanox NIC in SRIOV environment, while
>> evaluating/testing encountered a problem which led me to your
>> patch accepted as part of kernel 3.8, commit ID
>> "918b4053184c0ca22236e70e299c5343eea35304".
>>
>> Have some concerns/questions on:
>> When working in SRIOV environment VFs may be un-attached, having no driver
>> assigned to, or may be attached to Virtual machine to work in some
>> pass-through mode.
>> Once working in KVM setup there is pci-stub driver which is loaded in the
>> HYP/PF for a given attached VF.
Huh? 'Loaded in the hyp/PF'? ... Um, it's loaded in the host: a VF is
detached from its host driver and attached to the pci-stub driver when it is
assigned to a KVM guest in the pre-VFIO days/ways. (A VF can also be used in the
host w/o any virtualization; that's how a guest VM drives the VF, as if it were
used by a guest (host) OS directly.)
If VFIO is used, then the VF is attached to the vfio-pci driver.
>>
>> I'm using the aer-inject kernel module and its corresponding aer-inject tool
>> to simulate an error in the HYP.
>> In both cases your commit will cause the AER recovery to fail as there is no
>> driver assigned to PF's VFs that supports AER, comparing the code before
>> your change.
>>
Without VFIO, I believe that's correct. There was no AER-to-VF support in the pre-VFIO days.
I believe that with the recent VFIO support and modifications to KVM, an AER
error associated with an assigned VF will force the crash/halt of the KVM guest --
we can't depend on a guest VF driver clearing the AER in the hyp/host, since the
guest isn't privileged enough to clear the error.
So, crashing the guest is the simple option at the moment, to contain the error.
Alex: do I have that (vfio aer default) correct, or is that still site-under-construction?
>> How such cases should work ? my expectation was that the PF will get the
>> error detected message then will recognize whether
>> issue is its own or one of its VFs
The AER packet will have the VF's tag in it if the VF was the source of the error,
so the PF will never see it; although one could argue it should be 'promoted'
to the PF if the PF/VF needs to clear some state it has wrt the VF (the SRIOV spec
is lacking info in this space); _but_, VFIO resets the VF (sets the FLR bit) when the
device is deassigned and before re-attachment to the host, so that should clear out
any state btwn PF & VF ('should' ... famous last words...).
>
> I'm really not an AER expert, so help me understand this question of
> recognizing whether an error is associated with a PF or a VF.
>
> In terms of hardware, it looks like the device that detects an error
> logs some information and sends an Error Message upstream. The Root
> Complex receives the message, captures the source ID from the Error
> Message, and may generate an interrupt. I expect this source ID can
> be either a PF or a VF; there's no requirement that a VF error must be
> reported as though it's from the PF, is there?
>
>> and work accordingly, in current code
>> looks like recovery failed as part of "voting" once there is no AER handler
>> assigned to the VFs.
>
> The commit you mentioned has to do with PCI_ERS_RESULT_NO_AER_DRIVER.
> We use pci_walk_bus() to figure out whether all the devices in a
> subtree have a driver. What subtree is involved here? I would expect
> the VFs to be siblings of the PF, not children of it, so I'm not sure
> where things went wrong.
Well, VFs could be on virtual busses (ARI turned on), so not necessarily a
sibling to PF ... and then we have the problem in PCI code of not being able
to traverse these virtual busses (in some cases; not sure if pci_walk_bus(),
which is going down the tree vs up the tree, has any problems here w/VFs on
virtual busses).
>
> Can you collect "lspci -vvv" output and maybe add some debug so we can
> see exactly where the error is detected and what devices we're looking
> at to conclude that one of them doesn't have a driver?
>
> Bjorn
>
* Re: PCI/AER: AER in SRIOV environment
2014-06-23 20:12 ` Don Dutile
@ 2014-06-23 22:44 ` Yishai Hadas
2014-06-23 23:17 ` Alex Williamson
2014-06-24 14:56 ` Don Dutile
2014-06-23 23:10 ` Alex Williamson
1 sibling, 2 replies; 8+ messages in thread
From: Yishai Hadas @ 2014-06-23 22:44 UTC
To: Don Dutile
Cc: Bjorn Helgaas, Pandarathil, Vijaymohan R, Myron Stowe,
linux-rdma (linux-rdma@vger.kernel.org), yishaih@mellanox.com,
liranl, linux-pci@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 5823 bytes --]
On 6/23/2014 11:12 PM, Don Dutile wrote:
> On 06/23/2014 03:09 PM, Bjorn Helgaas wrote:
>> [+cc linux-pci, Don]
>>
> Adding Alex Williamson in case he can add more to this conversation...
>
>> On Mon, Jun 23, 2014 at 8:29 AM, Yishai Hadas
>> <yishaih@dev.mellanox.co.il> wrote:
>>> Hi Vijay,
>>> Trying to add AER support for Mellanox NIC in SRIOV environment, while
>>> evaluating/testing encountered a problem which led me to your
>>> patch accepted as part of kernel 3.8, commit ID
>>> "918b4053184c0ca22236e70e299c5343eea35304".
>>>
>>> Have some concerns/questions on:
>>> When working in SRIOV environment VFs may be un-attached, having no
>>> driver
>>> assigned to, or may be attached to Virtual machine to work in some
>>> pass-through mode.
>>> Once working in KVM setup there is pci-stub driver which is loaded
>>> in the
>>> HYP/PF for a given attached VF.
> huh? 'loaded in the hyp/pf? .... um, loaded in the host, and a VF is
> detached from its host driver -- a VF can be used in the host w/o any
> virtualization,
> i.e., that's how guest VM is driving the VF: as if it was used by a
> guest (host) OS directly --
> and attached to pci-stub driver, when assigned to a KVM guest in
> pre-VFIO days/ways.
> If VFIO used, then VF is attached to vfio-pci driver.
>
>>>
>>> I'm using the aer-inject kernel module and its corresponding
>>> aer-inject tool
>>> to simulate an error in the HYP.
>>> In both cases your commit will cause the AER recovery to fail as
>>> there is no
>>> driver assigned to PF's VFs that supports AER, comparing the code
>>> before
>>> your change.
>>>
> Without VFIO, I believe that's correct. There was no AER-to-VF support
> pre-VFIO days.
> I believe with the recent VFIO support,
> and modifications to KVM, an AER that is associated with an assigned
> VF will
> force the crash/halt of the KVM guest -- can't depend on a guest VF
> driver clearing
> the AER in the hyp/host -- guest isn't privileged enough to clear the
> error.
> So, crashing the guest is the simple option at the moment, to contain
> the error.
> Alex: do I have that (vfio aer default) correct, or is that still
> site-under-construction?
What about the case where the VF is not attached to a KVM guest and
has no driver loaded on the host? From code review and some testing,
recovery fails in that case because there is no AER-aware driver.
What is the expected solution here?
Is any special QEMU setup needed to activate the VFIO support? I
would like to try it for the case where the VF is attached.
>
>>> How such cases should work ? my expectation was that the PF will
>>> get the
>>> error detected message then will recognize whether
>>> issue is its own or one of its VFs
> The AER packet will have the tag of the VF in if it was the source of
> the error;
> so the PF will never see it; although one could argue it should be
> 'promoted'
> to the PF if PF/VF needs to clear some state it has wrt the VF (the
> SRIOV spec is
> lacking of info in this space); _but_, VFIO resets the VF (sets FLR
> bit) when the
> device is deassigned and before re-attachment to the host, so that
> should clear out
> any state btwn PF & VF ('should' ... famous last words...).
In my test I used the aer-inject tool to simulate an error on the
bus that both the PF and the VF reside on, setting the function number
to that of the PF. It looks like both should be called by the AER driver
as part of pci_walk_bus(). As mentioned, I got a call only on the PF, and
recovery failed because the VF has no AER-aware driver; once I removed
the VF, recovery succeeded.
I believe the packet should include some information about the
source of the error, shouldn't it?
In addition, looking at the upstream ixgbe source code,
ixgbe_error_detected() appears to contain code running on the PF that
checks whether the source was a VF.
By the way, when I tried to simulate a VF error using its function
number, I got the error below:
"Error: Failed to write, Inappropriate ioctl for device". Any idea
what causes it?
>
>>
>> I'm really not an AER expert, so help me understand this question of
>> recognizing whether an error is associated with a PF or a VF.
>>
>> In terms of hardware, it looks like the device that detects an error
>> logs some information and sends an Error Message upstream. The Root
>> Complex receives the message, captures the source ID from the Error
>> Message, and may generate an interrupt. I expect this source ID can
>> be either a PF or a VF; there's no requirement that a VF error must be
>> reported as though it's from the PF, is there?
>>
>>> and work accordingly, in current code
>>> looks like recovery failed as part of "voting" once there is no AER
>>> handler
>>> assigned to the VFs.
>>
>> The commit you mentioned has to do with PCI_ERS_RESULT_NO_AER_DRIVER.
>> We use pci_walk_bus() to figure out whether all the devices in a
>> subtree have a driver. What subtree is involved here? I would expect
>> the VFs to be siblings of the PF, not children of it, so I'm not sure
>> where things went wrong.
> Well, VFs could be on virtual busses (ARI turned on), so not
> necessarily a
> sibling to PF ... and then we have the problem in PCI code of not
> being able
> to traverse these virtual busses (in some cases; not sure if
> pci_walk_bus(),
> which is going down the tree vs up the tree, has any problems here
> w/VFs on
> virtual busses).
>
>>
>> Can you collect "lspci -vvv" output and maybe add some debug so we can
>> see exactly where the error is detected and what devices we're looking
>> at to conclude that one of them doesn't have a driver?
lspci -vvv output for both the PF and the VF is attached. The VF
(21:00.1) has no driver loaded, unlike the PF (Kernel driver in use:
mlx4_core).
>>
>> Bjorn
>>
>
[-- Attachment #2: lspci.txt --]
[-- Type: text/plain, Size: 7889 bytes --]
21:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Subsystem: Hewlett-Packard Company Device 18cf
Physical Slot: 2
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 64
Region 0: Memory at fbf00000 (64-bit, non-prefetchable) [size=1M]
Region 2: Memory at f8000000 (64-bit, prefetchable) [size=32M]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [48] Vital Product Data
Product Name: HP ConnectX-3 Mezz
Read-only fields:
[PN] Part number: 644161-B21
[EC] Engineering changes: C4
[SN] Serial number: IL224202VW
[V0] Vendor specific: HP IB FDR/EN 10/40Gb 2P 544M Adptr
[RV] Reserved: checksum good, 0 byte(s) reserved
Read/write fields:
[V1] Vendor specific: N/A
[YA] Asset tag: N/A
[RW] Read-write area: 102 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 253 byte(s) free
[RW] Read-write area: 252 byte(s) free
End
Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
Vector table: BAR=0 offset=0007c000
PBA: BAR=0 offset=0007d000
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
MaxPayload 256 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #8, Speed 8GT/s, Width x8, ASPM L0s, Latency L0 unlimited, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
EqualizationPhase2+, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [148 v1] Device Serial Number 24-be-05-ff-ff-8b-6b-d0
Capabilities: [108 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy+
IOVSta: Migration-
Initial VFs: 16, Total VFs: 16, Number of VFs: 1, Function Dependency Link: 00
VF offset: 1, stride: 1, Device ID: 1004
Supported Page Size: 000007ff, System Page Size: 00000001
Region 2: Memory at 00000000d8000000 (64-bit, prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [154 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [18c v1] #19
Kernel driver in use: mlx4_core
Kernel modules: mlx4_core
21:00.1 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 Virtual Function]
Subsystem: Hewlett-Packard Company Device 61b0
Physical Slot: 2
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Region 2: [virtual] Memory at d8000000 (64-bit, prefetchable) [size=32M]
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset+
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM unknown, Latency L0 <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [9c] MSI-X: Enable- Count=256 Masked-
Vector table: BAR=2 offset=00002000
PBA: BAR=2 offset=00003000
Capabilities: [40] #00 [0000]
Kernel modules: mlx4_core
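The SR-IOV capability in the lspci output above reports "VF offset: 1, stride: 1" for the PF at 21:00.0. Per the SR-IOV specification, the routing ID of the n-th VF (counting from 1) is the PF's routing ID plus the First VF Offset plus (n - 1) times the VF Stride, in 16-bit arithmetic. The sketch below applies that formula; the helper names are illustrative, and the second call uses made-up values only to show that a VF can land on a different bus number than its PF.

```python
def vf_routing_id(pf_rid, offset, stride, n):
    """Routing ID of the n-th VF (n starts at 1), in 16-bit arithmetic."""
    if n < 1:
        raise ValueError("VF numbers start at 1")
    return (pf_rid + offset + (n - 1) * stride) & 0xFFFF


def rid_to_str(rid):
    """Format a routing ID as bus:device.function (non-ARI view)."""
    return "%02x:%02x.%x" % (rid >> 8, (rid >> 3) & 0x1F, rid & 0x07)


pf = 0x2100  # 21:00.0, the PF in the lspci output
print(rid_to_str(vf_routing_id(pf, offset=1, stride=1, n=1)))    # 21:00.1
# Hypothetical values, only to show a VF crossing onto the next bus number:
print(rid_to_str(vf_routing_id(pf, offset=256, stride=1, n=1)))  # 22:00.0
```

The first result matches the VF listed at 21:00.1, which in this setup shares bus 21 with the PF.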
* Re: PCI/AER: AER in SRIOV environment
2014-06-23 20:12 ` Don Dutile
2014-06-23 22:44 ` Yishai Hadas
@ 2014-06-23 23:10 ` Alex Williamson
1 sibling, 0 replies; 8+ messages in thread
From: Alex Williamson @ 2014-06-23 23:10 UTC
To: Don Dutile
Cc: Bjorn Helgaas, Yishai Hadas, Pandarathil, Vijaymohan R,
Myron Stowe, linux-rdma (linux-rdma@vger.kernel.org),
yishaih@mellanox.com, liranl, linux-pci@vger.kernel.org
On Mon, 2014-06-23 at 16:12 -0400, Don Dutile wrote:
> On 06/23/2014 03:09 PM, Bjorn Helgaas wrote:
> > [+cc linux-pci, Don]
> >
> Adding Alex Williamson in case he can add more to this conversation...
>
> > On Mon, Jun 23, 2014 at 8:29 AM, Yishai Hadas
> > <yishaih@dev.mellanox.co.il> wrote:
> >> Hi Vijay,
> >> Trying to add AER support for Mellanox NIC in SRIOV environment, while
> >> evaluating/testing encountered a problem which led me to your
> >> patch accepted as part of kernel 3.8, commit ID
> >> "918b4053184c0ca22236e70e299c5343eea35304".
> >>
> >> Have some concerns/questions on:
> >> When working in SRIOV environment VFs may be un-attached, having no driver
> >> assigned to, or may be attached to Virtual machine to work in some
> >> pass-through mode.
> >> Once working in KVM setup there is pci-stub driver which is loaded in the
> >> HYP/PF for a given attached VF.
> huh? 'loaded in the hyp/pf? .... um, loaded in the host, and a VF is
> detached from its host driver -- a VF can be used in the host w/o any virtualization,
> i.e., that's how guest VM is driving the VF: as if it was used by a guest (host) OS directly --
> and attached to pci-stub driver, when assigned to a KVM guest in pre-VFIO days/ways.
> If VFIO used, then VF is attached to vfio-pci driver.
>
> >>
> >> I'm using the aer-inject kernel module and its corresponding aer-inject tool
> >> to simulate an error in the HYP.
> >> In both cases your commit will cause the AER recovery to fail as there is no
> >> driver assigned to PF's VFs that supports AER, comparing the code before
> >> your change.
> >>
> Without VFIO, I believe that's correct. There was no AER-to-VF support pre-VFIO days.
> I believe with the recent VFIO support,
> and modifications to KVM, an AER that is associated with an assigned VF will
> force the crash/halt of the KVM guest -- can't depend on a guest VF driver clearing
> the AER in the hyp/host -- guest isn't privileged enough to clear the error.
> So, crashing the guest is the simple option at the moment, to contain the error.
> Alex: do I have that (vfio aer default) correct, or is that still site-under-construction?
Yep, any kind of recovery is TBD, we just send an eventfd signal that an
error occurred and QEMU handles it by stopping the guest. Not sure I
can add much more to the conversation, but this is exactly the sort of
thing that makes legacy kvm device assignment and pci-stub a bad design.
Thanks,
Alex
> >> How such cases should work ? my expectation was that the PF will get the
> >> error detected message then will recognize whether
> >> issue is its own or one of its VFs
> The AER packet will have the tag of the VF in if it was the source of the error;
> so the PF will never see it; although one could argue it should be 'promoted'
> to the PF if PF/VF needs to clear some state it has wrt the VF (the SRIOV spec is
> lacking of info in this space); _but_, VFIO resets the VF (sets FLR bit) when the
> device is deassigned and before re-attachment to the host, so that should clear out
> any state btwn PF & VF ('should' ... famous last words...).
>
> >
> > I'm really not an AER expert, so help me understand this question of
> > recognizing whether an error is associated with a PF or a VF.
> >
> > In terms of hardware, it looks like the device that detects an error
> > logs some information and sends an Error Message upstream. The Root
> > Complex receives the message, captures the source ID from the Error
> > Message, and may generate an interrupt. I expect this source ID can
> > be either a PF or a VF; there's no requirement that a VF error must be
> > reported as though it's from the PF, is there?
> >
> >> and work accordingly, in current code
> >> looks like recovery failed as part of "voting" once there is no AER handler
> >> assigned to the VFs.
> >
> > The commit you mentioned has to do with PCI_ERS_RESULT_NO_AER_DRIVER.
> > We use pci_walk_bus() to figure out whether all the devices in a
> > subtree have a driver. What subtree is involved here? I would expect
> > the VFs to be siblings of the PF, not children of it, so I'm not sure
> > where things went wrong.
> Well, VFs could be on virtual busses (ARI turned on), so not necessarily a
> sibling to PF ... and then we have the problem in PCI code of not being able
> to traverse these virtual busses (in some cases; not sure if pci_walk_bus(),
> which is going down the tree vs up the tree, has any problems here w/VFs on
> virtual busses).
>
> >
> > Can you collect "lspci -vvv" output and maybe add some debug so we can
> > see exactly where the error is detected and what devices we're looking
> > at to conclude that one of them doesn't have a driver?
> >
> > Bjorn
> >
>
* Re: PCI/AER: AER in SRIOV environment
2014-06-23 22:44 ` Yishai Hadas
@ 2014-06-23 23:17 ` Alex Williamson
2014-06-24 14:56 ` Don Dutile
1 sibling, 0 replies; 8+ messages in thread
From: Alex Williamson @ 2014-06-23 23:17 UTC
To: Yishai Hadas
Cc: Don Dutile, Bjorn Helgaas, Pandarathil, Vijaymohan R, Myron Stowe,
linux-rdma (linux-rdma@vger.kernel.org), yishaih@mellanox.com,
liranl, linux-pci@vger.kernel.org
On Tue, 2014-06-24 at 01:44 +0300, Yishai Hadas wrote:
> On 6/23/2014 11:12 PM, Don Dutile wrote:
> > On 06/23/2014 03:09 PM, Bjorn Helgaas wrote:
> >> [+cc linux-pci, Don]
> >>
> > Adding Alex Williamson in case he can add more to this conversation...
> >
> >> On Mon, Jun 23, 2014 at 8:29 AM, Yishai Hadas
> >> <yishaih@dev.mellanox.co.il> wrote:
> >>> Hi Vijay,
> >>> Trying to add AER support for Mellanox NIC in SRIOV environment, while
> >>> evaluating/testing encountered a problem which led me to your
> >>> patch accepted as part of kernel 3.8, commit ID
> >>> "918b4053184c0ca22236e70e299c5343eea35304".
> >>>
> >>> Have some concerns/questions on:
> >>> When working in SRIOV environment VFs may be un-attached, having no
> >>> driver
> >>> assigned to, or may be attached to Virtual machine to work in some
> >>> pass-through mode.
> >>> Once working in KVM setup there is pci-stub driver which is loaded
> >>> in the
> >>> HYP/PF for a given attached VF.
> > huh? 'loaded in the hyp/pf? .... um, loaded in the host, and a VF is
> > detached from its host driver -- a VF can be used in the host w/o any
> > virtualization,
> > i.e., that's how guest VM is driving the VF: as if it was used by a
> > guest (host) OS directly --
> > and attached to pci-stub driver, when assigned to a KVM guest in
> > pre-VFIO days/ways.
> > If VFIO used, then VF is attached to vfio-pci driver.
> >
> >>>
> >>> I'm using the aer-inject kernel module and its corresponding
> >>> aer-inject tool
> >>> to simulate an error in the HYP.
> >>> In both cases your commit will cause the AER recovery to fail as
> >>> there is no
> >>> driver assigned to PF's VFs that supports AER, comparing the code
> >>> before
> >>> your change.
> >>>
> > Without VFIO, I believe that's correct. There was no AER-to-VF support
> > pre-VFIO days.
> > I believe with the recent VFIO support,
> > and modifications to KVM, an AER that is associated with an assigned
> > VF will
> > force the crash/halt of the KVM guest -- can't depend on a guest VF
> > driver clearing
> > the AER in the hyp/host -- guest isn't privileged enough to clear the
> > error.
> > So, crashing the guest is the simple option at the moment, to contain
> > the error.
> > Alex: do I have that (vfio aer default) correct, or is that still
> > site-under-construction?
> How about the case that the VF is not attached to a KVM guest and
> has no driver loaded on host ? in such a case from code review and some
> testing the recovery will
> fail as there is no AER aware driver here. What is the expected
> solution here ?
> Any special qemu /stuff is needed to activate the VFIO support ?
> would like to give it a try for a case that VF is attached.
Just use recent QEMU (>=1.6, the newer the better) and it should be
automatic. Note that the VM won't exit on error, it's stopped with
state RUN_STATE_IO_ERROR to allow the possibility of collecting data.
Thanks,
Alex
> >
> >>> How such cases should work ? my expectation was that the PF will
> >>> get the
> >>> error detected message then will recognize whether
> >>> issue is its own or one of its VFs
> > The AER packet will have the tag of the VF in if it was the source of
> > the error;
> > so the PF will never see it; although one could argue it should be
> > 'promoted'
> > to the PF if PF/VF needs to clear some state it has wrt the VF (the
> > SRIOV spec is
> > lacking of info in this space); _but_, VFIO resets the VF (sets FLR
> > bit) when the
> > device is deassigned and before re-attachment to the host, so that
> > should clear out
> > any state btwn PF & VF ('should' ... famous last words...).
> In my test I have used the aer-inject tool simulating an error to
> the BUS that both PF/VF are residing on, putting the function number to
> be the PF one, looks like both should be called by the aer driver as part
> of the pci_walk_bus(). As mentioned I got a call only on the PF and
> recovery failed as of the VF doesn't include an AER aware driver, once
> removed the VF recovery succeeded.
> I believe that packet should include some info about the source of
> the error isn't it ?
> In addition, looking at IXGBE upstream source code at
> ixgbe_error_detected() looks like there is some code running on the PF
> that checks whether the source was a VF.
>
> By the way: when tried to simulate a VF error using its FN got
> below error:
> "Error: Failed to write, Inappropriate ioctl for device", any idea
> about that error ?
> >
> >>
> >> I'm really not an AER expert, so help me understand this question of
> >> recognizing whether an error is associated with a PF or a VF.
> >>
> >> In terms of hardware, it looks like the device that detects an error
> >> logs some information and sends an Error Message upstream. The Root
> >> Complex receives the message, captures the source ID from the Error
> >> Message, and may generate an interrupt. I expect this source ID can
> >> be either a PF or a VF; there's no requirement that a VF error must be
> >> reported as though it's from the PF, is there?
> >>
> >>> and work accordingly, in current code
> >>> looks like recovery failed as part of "voting" once there is no AER
> >>> handler
> >>> assigned to the VFs.
> >>
> >> The commit you mentioned has to do with PCI_ERS_RESULT_NO_AER_DRIVER.
> >> We use pci_walk_bus() to figure out whether all the devices in a
> >> subtree have a driver. What subtree is involved here? I would expect
> >> the VFs to be siblings of the PF, not children of it, so I'm not sure
> >> where things went wrong.
> > Well, VFs could be on virtual busses (ARI turned on), so not
> > necessarily a
> > sibling to PF ... and then we have the problem in PCI code of not
> > being able
> > to traverse these virtual busses (in some cases; not sure if
> > pci_walk_bus(),
> > which is going down the tree vs up the tree, has any problems here
> > w/VFs on
> > virtual busses).
> >
> >>
> >> Can you collect "lspci -vvv" output and maybe add some debug so we can
> >> see exactly where the error is detected and what devices we're looking
> >> at to conclude that one of them doesn't have a driver?
> lspci -vvv for both PF & VF is attached, we can see that VF
> (21:00.1) has no driver loaded comparing the PF (Kernel driver in use:
> mlx4_core).
> >>
> >> Bjorn
> >>
> >
>
* Re: PCI/AER: AER in SRIOV environment
2014-06-23 22:44 ` Yishai Hadas
2014-06-23 23:17 ` Alex Williamson
@ 2014-06-24 14:56 ` Don Dutile
2014-06-24 16:22 ` Yishai Hadas
1 sibling, 1 reply; 8+ messages in thread
From: Don Dutile @ 2014-06-24 14:56 UTC (permalink / raw)
To: Yishai Hadas
Cc: Bjorn Helgaas, Pandarathil, Vijaymohan R, Myron Stowe,
linux-rdma (linux-rdma@vger.kernel.org), yishaih@mellanox.com,
liranl, linux-pci@vger.kernel.org
On 06/23/2014 06:44 PM, Yishai Hadas wrote:
> On 6/23/2014 11:12 PM, Don Dutile wrote:
>> On 06/23/2014 03:09 PM, Bjorn Helgaas wrote:
>>> [+cc linux-pci, Don]
>>>
>> Adding Alex Williamson in case he can add more to this conversation...
>>
>>> On Mon, Jun 23, 2014 at 8:29 AM, Yishai Hadas
>>> <yishaih@dev.mellanox.co.il> wrote:
>>>> Hi Vijay,
>>>> Trying to add AER support for Mellanox NIC in SRIOV environment, while
>>>> evaluating/testing encountered a problem which led me to your
>>>> patch accepted as part of kernel 3.8, commit ID
>>>> "918b4053184c0ca22236e70e299c5343eea35304".
>>>>
>>>> Have some concerns/questions on:
>>>> When working in SRIOV environment VFs may be un-attached, having no driver
>>>> assigned to, or may be attached to Virtual machine to work in some
>>>> pass-through mode.
>>>> Once working in KVM setup there is pci-stub driver which is loaded in the
>>>> HYP/PF for a given attached VF.
>> huh? 'loaded in the hyp/pf? .... um, loaded in the host, and a VF is
>> detached from its host driver -- a VF can be used in the host w/o any virtualization,
>> i.e., that's how guest VM is driving the VF: as if it was used by a guest (host) OS directly --
>> and attached to pci-stub driver, when assigned to a KVM guest in pre-VFIO days/ways.
>> If VFIO used, then VF is attached to vfio-pci driver.
>>
>>>>
>>>> I'm using the aer-inject kernel module and its corresponding aer-inject tool
>>>> to simulate an error in the HYP.
>>>> In both cases your commit will cause the AER recovery to fail as there is no
>>>> driver assigned to PF's VFs that supports AER, comparing the code before
>>>> your change.
>>>>
>> Without VFIO, I believe that's correct. There was no AER-to-VF support pre-VFIO days.
>> I believe with the recent VFIO support,
>> and modifications to KVM, an AER that is associated with an assigned VF will
>> force the crash/halt of the KVM guest -- can't depend on a guest VF driver clearing
>> the AER in the hyp/host -- guest isn't privileged enough to clear the error.
>> So, crashing the guest is the simple option at the moment, to contain the error.
>> Alex: do I have that (vfio aer default) correct, or is that still site-under-construction?
> How about the case that the VF is not attached to a KVM guest and has no driver loaded on host ? in such a case from code review and some testing the recovery will
> fail as there is no AER aware driver here. What is the expected solution here ?
Well, how can an AER be attributed to a VF if it's not assigned to a guest and doesn't have a driver loaded for it in the host? I.e., if it's not configured, it's sitting idle, so how can it generate an AER?
If you are injecting an AER attributed to a device that isn't configured, then you are contriving an invalid system condition/state, and I'm not surprised that the AER handler fails. Maybe it needs an update/patch for the aer-inject case.
> Any special qemu /stuff is needed to activate the VFIO support ? would like to give it a try for a case that VF is attached.
I see Alex answered this question. VFIO rocks! ... Alex did a great job with it.
He definitely cleaned things up and architected a cleaner solution, one that will lend itself to incremental or
complete reconstruction/duplication for other arches, busses, iommus, etc. to follow.
>>
>>>> How such cases should work ? my expectation was that the PF will get the
>>>> error detected message then will recognize whether
>>>> issue is its own or one of its VFs
>> The AER packet will have the tag of the VF in if it was the source of the error;
>> so the PF will never see it; although one could argue it should be 'promoted'
>> to the PF if PF/VF needs to clear some state it has wrt the VF (the SRIOV spec is
>> lacking of info in this space); _but_, VFIO resets the VF (sets FLR bit) when the
>> device is deassigned and before re-attachment to the host, so that should clear out
>> any state btwn PF & VF ('should' ... famous last words...).
> In my test I have used the aer-inject tool simulating an error to the BUS that both PF/VF are residing on, putting the function number to be the PF one, looks like both should be called by the aer driver as part
> of the pci_walk_bus(). As mentioned I got a call only on the PF and recovery failed as of the VF doesn't include an AER aware driver, once removed the VF recovery succeeded.
> I believe that packet should include some info about the source of the error isn't it ?
yup.
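For reference, the source ID captured from an Error Message is a 16-bit requester ID. A minimal sketch of decoding it into the usual bus:dev.fn notation follows; the function name is hypothetical, and the PF address 21:00.0 is an assumption based on the "BUS 33 DEV 0 FN 0" injection record used in this thread (the VF address 21:00.1 is taken from the lspci output mentioned above).

```python
def decode_source_id(source_id: int, ari: bool = False) -> str:
    """Format a 16-bit PCIe requester/source ID as bus:dev.fn (hex)."""
    if not 0 <= source_id <= 0xFFFF:
        raise ValueError("source ID must fit in 16 bits")
    bus = source_id >> 8
    if ari:
        # With ARI the low 8 bits are all function number; device is 0.
        dev, fn = 0, source_id & 0xFF
    else:
        # Classic layout: 5-bit device number, 3-bit function number.
        dev, fn = (source_id >> 3) & 0x1F, source_id & 0x07
    return f"{bus:02x}:{dev:02x}.{fn:x}"


if __name__ == "__main__":
    print(decode_source_id(0x2100))  # assumed PF address
    print(decode_source_id(0x2101))  # VF address from the lspci output
```

The same 16-bit value reads differently depending on whether ARI is in use, which is why the flag is explicit here.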
> In addition, looking at IXGBE upstream source code at ixgbe_error_detected() looks like there is some code running on the PF that checks whether the source was a VF.
Ping the Intel gang listed in MAINTAINERS for ixgbe to see what was tested and what the intention of this code is wrt AER.
Alex Duyck & Greg Rose are the two I've worked with the most on design issues in the driver, but others @Intel may be more active in it now.
>
> By the way: when tried to simulate a VF error using its FN got below error:
> "Error: Failed to write, Inappropriate ioctl for device", any idea about that error ?
Details? How did you simulate the error? Send the cmdline used, or the code snippet used to inject the error, etc.
That (at a quick glance) looks like a sysfs error reply when an operation is not supported on a file/device under it.
>>
>>>
>>> I'm really not an AER expert, so help me understand this question of
>>> recognizing whether an error is associated with a PF or a VF.
>>>
>>> In terms of hardware, it looks like the device that detects an error
>>> logs some information and sends an Error Message upstream. The Root
>>> Complex receives the message, captures the source ID from the Error
>>> Message, and may generate an interrupt. I expect this source ID can
>>> be either a PF or a VF; there's no requirement that a VF error must be
>>> reported as though it's from the PF, is there?
>>>
>>>> and work accordingly, in current code
>>>> looks like recovery failed as part of "voting" once there is no AER handler
>>>> assigned to the VFs.
>>>
>>> The commit you mentioned has to do with PCI_ERS_RESULT_NO_AER_DRIVER.
>>> We use pci_walk_bus() to figure out whether all the devices in a
>>> subtree have a driver. What subtree is involved here? I would expect
>>> the VFs to be siblings of the PF, not children of it, so I'm not sure
>>> where things went wrong.
>> Well, VFs could be on virtual busses (ARI turned on), so not necessarily a
>> sibling to PF ... and then we have the problem in PCI code of not being able
>> to traverse these virtual busses (in some cases; not sure if pci_walk_bus(),
>> which is going down the tree vs up the tree, has any problems here w/VFs on
>> virtual busses).
>>
>>>
>>> Can you collect "lspci -vvv" output and maybe add some debug so we can
>>> see exactly where the error is detected and what devices we're looking
>>> at to conclude that one of them doesn't have a driver?
> lspci -vvv for both PF & VF is attached, we can see that VF (21:00.1) has no driver loaded comparing the PF (Kernel driver in use: mlx4_core).
Um, well, the VF doesn't contain an AER cap structure, so expecting AER support for a PCI device (VF or PF) w/o an AER cap is 'wishful'. So the VF can't generate an AER, b/c it doesn't have the appropriate cap regs for the AER handler to use to report the error & recover from it.
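Whether a function exposes an AER capability can also be checked by walking the PCIe extended capability list in its config space. The sketch below is illustrative only: the function names and the synthetic config buffer are hypothetical, and on a real system the bytes would come from the device's sysfs "config" file (reading beyond the first 256 bytes generally needs root, so a short read reports "absent" here).

```python
import struct

AER_EXT_CAP_ID = 0x0001  # PCIe extended capability ID for AER
EXT_CAP_START = 0x100    # extended capabilities start at this offset


def find_ext_cap(config: bytes, cap_id: int) -> int:
    """Return the offset of extended capability cap_id, or 0 if absent."""
    offset = EXT_CAP_START
    seen = set()
    while (offset >= EXT_CAP_START and offset + 4 <= len(config)
           and offset not in seen):
        seen.add(offset)  # guard against a looping list
        (header,) = struct.unpack_from("<I", config, offset)
        if header in (0, 0xFFFFFFFF):
            break  # no capability here
        if header & 0xFFFF == cap_id:
            return offset
        offset = (header >> 20) & 0xFFC  # bits 31:20 hold the next offset
    return 0


def demo_config() -> bytes:
    """Synthetic 4 KiB config space: ARI at 0x100, AER at 0x148."""
    buf = bytearray(4096)
    struct.pack_into("<I", buf, 0x100, (0x148 << 20) | (1 << 16) | 0x000E)
    struct.pack_into("<I", buf, 0x148, (1 << 16) | AER_EXT_CAP_ID)
    return bytes(buf)


if __name__ == "__main__":
    print(hex(find_ext_cap(demo_config(), AER_EXT_CAP_ID)))
    # Real device (path is an example, not taken from this thread):
    # with open("/sys/bus/pci/devices/0000:21:00.1/config", "rb") as f:
    #     print(hex(find_ext_cap(f.read(), AER_EXT_CAP_ID)))
```

A return value of 0 means the capability was not found in the bytes that were readable, which is what "lspci -vvv" would show as a missing Advanced Error Reporting entry.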
>>>
>>> Bjorn
>>>
>>
>
* Re: PCI/AER: AER in SRIOV environment
2014-06-24 14:56 ` Don Dutile
@ 2014-06-24 16:22 ` Yishai Hadas
2014-06-24 17:38 ` Alex Williamson
0 siblings, 1 reply; 8+ messages in thread
From: Yishai Hadas @ 2014-06-24 16:22 UTC (permalink / raw)
To: Don Dutile
Cc: Bjorn Helgaas, Pandarathil, Vijaymohan R, Myron Stowe,
linux-rdma (linux-rdma@vger.kernel.org), yishaih@mellanox.com,
liranl, linux-pci@vger.kernel.org
On 6/24/2014 5:56 PM, Don Dutile wrote:
> On 06/23/2014 06:44 PM, Yishai Hadas wrote:
>> On 6/23/2014 11:12 PM, Don Dutile wrote:
>>> On 06/23/2014 03:09 PM, Bjorn Helgaas wrote:
>>>> [+cc linux-pci, Don]
>>>>
>>> Adding Alex Williamson in case he can add more to this conversation...
>>>
>>>> On Mon, Jun 23, 2014 at 8:29 AM, Yishai Hadas
>>>> <yishaih@dev.mellanox.co.il> wrote:
>>>>> Hi Vijay,
>>>>> Trying to add AER support for Mellanox NIC in SRIOV environment,
>>>>> while
>>>>> evaluating/testing encountered a problem which led me to your
>>>>> patch accepted as part of kernel 3.8, commit ID
>>>>> "918b4053184c0ca22236e70e299c5343eea35304".
>>>>>
>>>>> Have some concerns/questions on:
>>>>> When working in SRIOV environment VFs may be un-attached, having
>>>>> no driver
>>>>> assigned to, or may be attached to Virtual machine to work in some
>>>>> pass-through mode.
>>>>> Once working in KVM setup there is pci-stub driver which is loaded
>>>>> in the
>>>>> HYP/PF for a given attached VF.
>>> huh? 'loaded in the hyp/pf? .... um, loaded in the host, and a VF is
>>> detached from its host driver -- a VF can be used in the host w/o
>>> any virtualization,
>>> i.e., that's how guest VM is driving the VF: as if it was used by a
>>> guest (host) OS directly --
>>> and attached to pci-stub driver, when assigned to a KVM guest in
>>> pre-VFIO days/ways.
>>> If VFIO used, then VF is attached to vfio-pci driver.
>>>
>>>>>
>>>>> I'm using the aer-inject kernel module and its corresponding
>>>>> aer-inject tool
>>>>> to simulate an error in the HYP.
>>>>> In both cases your commit will cause the AER recovery to fail as
>>>>> there is no
>>>>> driver assigned to PF's VFs that supports AER, comparing the code
>>>>> before
>>>>> your change.
>>>>>
>>> Without VFIO, I believe that's correct. There was no AER-to-VF
>>> support pre-VFIO days.
>>> I believe with the recent VFIO support,
>>> and modifications to KVM, an AER that is associated with an assigned
>>> VF will
>>> force the crash/halt of the KVM guest -- can't depend on a guest VF
>>> driver clearing
>>> the AER in the hyp/host -- guest isn't privileged enough to clear
>>> the error.
>>> So, crashing the guest is the simple option at the moment, to
>>> contain the error.
>>> Alex: do I have that (vfio aer default) correct, or is that still
>>> site-under-construction?
>> How about the case that the VF is not attached to a KVM guest
>> and has no driver loaded on host ? in such a case from code review
>> and some testing the recovery will
>> fail as there is no AER aware driver here. What is the expected
>> solution here ?
> Well, how can a VF be attributed to an AER if it's not assigned to a
> guest, and it doesn't have a driver loaded for it in the host? i.e.,
> if it's not configured, it's sitting idle, so how can it generate an AER?
The expectation is really that it will not participate and won't
vote. However, in the current code it looks like it gets a default vote of
PCI_ERS_RESULT_NO_AER_DRIVER and recovery fails; that's exactly my concern.
If pci_walk_bus() goes over both the PF & the VF, which have the same bus
number, this may happen. Agree?
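The concern above can be illustrated with a toy model of the voting. This is a simplified user-space sketch, loosely modelled on the behaviour described in this thread (a non-bridge device without an AER-aware driver contributes a "no AER driver" vote, and that vote overrides the others). All names are hypothetical and it is not the kernel implementation.

```python
from enum import Enum


class Vote(Enum):
    NONE = "none"
    CAN_RECOVER = "can_recover"
    NEED_RESET = "need_reset"
    DISCONNECT = "disconnect"
    RECOVERED = "recovered"
    NO_AER_DRIVER = "no_aer_driver"


def merge(orig: Vote, new: Vote) -> Vote:
    """Combine the running result with one device's vote (simplified)."""
    if new is Vote.NO_AER_DRIVER:
        return Vote.NO_AER_DRIVER  # one driverless device decides it all
    if new is Vote.NONE:
        return orig
    if orig in (Vote.CAN_RECOVER, Vote.RECOVERED):
        return new
    if orig is Vote.DISCONNECT and new is Vote.NEED_RESET:
        return Vote.NEED_RESET
    return orig


def walk_bus(devices) -> Vote:
    """devices: iterable of (name, driver_vote_or_None, is_bridge)."""
    result = Vote.CAN_RECOVER
    for _name, driver_vote, is_bridge in devices:
        if driver_vote is None:
            # No driver, or a driver without an error handler.
            vote = Vote.NONE if is_bridge else Vote.NO_AER_DRIVER
        else:
            vote = driver_vote
        result = merge(result, vote)
    return result


if __name__ == "__main__":
    pf = ("21:00.0 PF (mlx4_core)", Vote.CAN_RECOVER, False)
    vf = ("21:00.1 VF (no driver)", None, False)
    print(walk_bus([pf, vf]))  # the driverless VF forces NO_AER_DRIVER
    print(walk_bus([pf]))      # PF alone: recovery can proceed
```

In this model, removing the driverless VF from the walk is enough to let the PF's own vote stand, which matches the observation that recovery succeeded once the VF was removed.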
> if you are injecting an AER attributed to a device that isn't
> configured, then you are contriving a non-valid system
> condition/state, and I'm not surprised that the AER handler fails.
> Maybe it needs an update/patch for the aer-inject case.
>
The failure happens when I inject the error into its PF, which is
configured.
>> Any special qemu /stuff is needed to activate the VFIO support ?
>> would like to give it a try for a case that VF is attached.
> I see Alex answered this question. VFIO rocks! ... Alex did a great
> job with it.
> Definitely cleaned up and more cleanly architected a solution that
> will lend itself to incremental for
> complete reconstruction/duplication for other arches, busses, iommus,
> etc. to follow.
Alex - on my setup I have the QEMU that comes as part of RH 6.5; it uses
the pci-stub driver. I'm looking for a clear way to replace it with a newer one
that supports VFIO.
Should I just download it from git and run configure, make & make
install, or are there other steps required to fully replace it? Any pointer
to the required steps would help.
>
>>>
>>>>> How such cases should work ? my expectation was that the PF will
>>>>> get the
>>>>> error detected message then will recognize whether
>>>>> issue is its own or one of its VFs
>>> The AER packet will have the tag of the VF in if it was the source
>>> of the error;
>>> so the PF will never see it; although one could argue it should be
>>> 'promoted'
>>> to the PF if PF/VF needs to clear some state it has wrt the VF (the
>>> SRIOV spec is
>>> lacking of info in this space); _but_, VFIO resets the VF (sets FLR
>>> bit) when the
>>> device is deassigned and before re-attachment to the host, so that
>>> should clear out
>>> any state btwn PF & VF ('should' ... famous last words...).
>> In my test I have used the aer-inject tool simulating an error
>> to the BUS that both PF/VF are residing on, putting the function
>> number to be the PF one, looks like both should be called by the aer
>> driver as part
>> of the pci_walk_bus(). As mentioned I got a call only on the PF
>> and recovery failed as of the VF doesn't include an AER aware driver,
>> once removed the VF recovery succeeded.
>> I believe that packet should include some info about the source
>> of the error isn't it ?
> yup.
>> In addition, looking at IXGBE upstream source code at
>> ixgbe_error_detected() looks like there is some code running on the
>> PF that checks whether the source was a VF.
> Ping the INTEL gang listed in MAINTAINDERS for ixgbe to see what was
> tested, intention of this code wrt AER.
> Alex Duyck & Greg Rose are the two I've worked with the most on design
> issues in the driver, but others @INTEL may be more active in it now.
Thanks, that may be helpful.
>
>>
>> By the way: when tried to simulate a VF error using its FN got
>> below error:
>> "Error: Failed to write, Inappropriate ioctl for device", any
>> idea about that error ?
> details? how did you simulate the error? -- send cmdline used, or code
> snippet to inject error, etc.
> that (quickly) looks like a sysfs error reply when an operation is not
> supported to a file/device under it.
Working with the downloaded aer-inject-0.1 tool, the command line was
./aer-inject test/aer1
and the content of the aer1 file is as below
(33 is the decimal value of 0x21, the bus number of the PF & VF):
AER
BUS 33 DEV 0 FN 0
UNCOR_STATUS POISON_TLP
HEADER_LOG 0 1 2 3
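The hex-to-decimal step for the bus number is easy to get wrong by hand, so here is a small sketch that builds a record in the same layout as the aer1 file above from an lspci-style address. The helper name is hypothetical, and the record layout is assumed only from the file shown in this message, not from the tool's documentation.

```python
def aer_inject_record(bdf: str, status: str = "POISON_TLP",
                      header_log=(0, 1, 2, 3)) -> str:
    """Build an injection record from a hex 'bus:dev.fn' address."""
    bus_hex, rest = bdf.split(":")
    dev_hex, fn_hex = rest.split(".")
    # lspci prints hex; the record above uses decimal numbers.
    bus, dev, fn = int(bus_hex, 16), int(dev_hex, 16), int(fn_hex, 16)
    lines = [
        "AER",
        f"BUS {bus} DEV {dev} FN {fn}",
        f"UNCOR_STATUS {status}",
        "HEADER_LOG " + " ".join(str(word) for word in header_log),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(aer_inject_record("21:00.0"), end="")  # PF: BUS 33 DEV 0 FN 0
    print(aer_inject_record("21:00.1"), end="")  # VF: BUS 33 DEV 0 FN 1
```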
>
>>>
>>>>
>>>> I'm really not an AER expert, so help me understand this question of
>>>> recognizing whether an error is associated with a PF or a VF.
>>>>
>>>> In terms of hardware, it looks like the device that detects an error
>>>> logs some information and sends an Error Message upstream. The Root
>>>> Complex receives the message, captures the source ID from the Error
>>>> Message, and may generate an interrupt. I expect this source ID can
>>>> be either a PF or a VF; there's no requirement that a VF error must be
>>>> reported as though it's from the PF, is there?
>>>>
>>>>> and work accordingly, in current code
>>>>> looks like recovery failed as part of "voting" once there is no
>>>>> AER handler
>>>>> assigned to the VFs.
>>>>
>>>> The commit you mentioned has to do with PCI_ERS_RESULT_NO_AER_DRIVER.
>>>> We use pci_walk_bus() to figure out whether all the devices in a
>>>> subtree have a driver. What subtree is involved here? I would expect
>>>> the VFs to be siblings of the PF, not children of it, so I'm not sure
>>>> where things went wrong.
>>> Well, VFs could be on virtual busses (ARI turned on), so not
>>> necessarily a
>>> sibling to PF ... and then we have the problem in PCI code of not
>>> being able
>>> to traverse these virtual busses (in some cases; not sure if
>>> pci_walk_bus(),
>>> which is going down the tree vs up the tree, has any problems here
>>> w/VFs on
>>> virtual busses).
>>>
>>>>
>>>> Can you collect "lspci -vvv" output and maybe add some debug so we can
>>>> see exactly where the error is detected and what devices we're looking
>>>> at to conclude that one of them doesn't have a driver?
>> lspci -vvv for both PF & VF is attached, we can see that VF
>> (21:00.1) has no driver loaded comparing the PF (Kernel driver in
>> use: mlx4_core).
> um, well, the VF doesn't contain an AER cap strucuture, so expecting
> AER support for a pci device (VF or PF) w/o an AER cap is
> 'wishful'.... so, the VF can't generate an AER b/c it doesn't have the
> appropriate cap regs for the AER handler to use to report the error &
> recover from it.
If the VF doesn't have AER caps, we may expect that when there is
an error, its PF, which does have the AER caps, gets to know about it and
handles it on behalf of the VF. Doesn't that make sense?
Do you think that without AER caps the VF can work properly
under the VFIO driver?
>
>>>>
>>>> Bjorn
>>>>
>>>
>>
>
* Re: PCI/AER: AER in SRIOV environment
2014-06-24 16:22 ` Yishai Hadas
@ 2014-06-24 17:38 ` Alex Williamson
0 siblings, 0 replies; 8+ messages in thread
From: Alex Williamson @ 2014-06-24 17:38 UTC (permalink / raw)
To: Yishai Hadas
Cc: Don Dutile, Bjorn Helgaas, Pandarathil, Vijaymohan R, Myron Stowe,
linux-rdma (linux-rdma@vger.kernel.org), yishaih@mellanox.com,
liranl, linux-pci@vger.kernel.org
On Tue, 2014-06-24 at 19:22 +0300, Yishai Hadas wrote:
> On 6/24/2014 5:56 PM, Don Dutile wrote:
> > On 06/23/2014 06:44 PM, Yishai Hadas wrote:
> >> On 6/23/2014 11:12 PM, Don Dutile wrote:
> >>> On 06/23/2014 03:09 PM, Bjorn Helgaas wrote:
> >>>> [+cc linux-pci, Don]
> >>>>
> >>> Adding Alex Williamson in case he can add more to this conversation...
> >>>
> >>>> On Mon, Jun 23, 2014 at 8:29 AM, Yishai Hadas
> >>>> <yishaih@dev.mellanox.co.il> wrote:
> >>>>> Hi Vijay,
> >>>>> Trying to add AER support for Mellanox NIC in SRIOV environment,
> >>>>> while
> >>>>> evaluating/testing encountered a problem which led me to your
> >>>>> patch accepted as part of kernel 3.8, commit ID
> >>>>> "918b4053184c0ca22236e70e299c5343eea35304".
> >>>>>
> >>>>> Have some concerns/questions on:
> >>>>> When working in SRIOV environment VFs may be un-attached, having
> >>>>> no driver
> >>>>> assigned to, or may be attached to Virtual machine to work in some
> >>>>> pass-through mode.
> >>>>> Once working in KVM setup there is pci-stub driver which is loaded
> >>>>> in the
> >>>>> HYP/PF for a given attached VF.
> >>> huh? 'loaded in the hyp/pf? .... um, loaded in the host, and a VF is
> >>> detached from its host driver -- a VF can be used in the host w/o
> >>> any virtualization,
> >>> i.e., that's how guest VM is driving the VF: as if it was used by a
> >>> guest (host) OS directly --
> >>> and attached to pci-stub driver, when assigned to a KVM guest in
> >>> pre-VFIO days/ways.
> >>> If VFIO used, then VF is attached to vfio-pci driver.
> >>>
> >>>>>
> >>>>> I'm using the aer-inject kernel module and its corresponding
> >>>>> aer-inject tool
> >>>>> to simulate an error in the HYP.
> >>>>> In both cases your commit will cause the AER recovery to fail as
> >>>>> there is no
> >>>>> driver assigned to PF's VFs that supports AER, comparing the code
> >>>>> before
> >>>>> your change.
> >>>>>
> >>> Without VFIO, I believe that's correct. There was no AER-to-VF
> >>> support pre-VFIO days.
> >>> I believe with the recent VFIO support,
> >>> and modifications to KVM, an AER that is associated with an assigned
> >>> VF will
> >>> force the crash/halt of the KVM guest -- can't depend on a guest VF
> >>> driver clearing
> >>> the AER in the hyp/host -- guest isn't privileged enough to clear
> >>> the error.
> >>> So, crashing the guest is the simple option at the moment, to
> >>> contain the error.
> >>> Alex: do I have that (vfio aer default) correct, or is that still
> >>> site-under-construction?
> >> How about the case that the VF is not attached to a KVM guest
> >> and has no driver loaded on host ? in such a case from code review
> >> and some testing the recovery will
> >> fail as there is no AER aware driver here. What is the expected
> >> solution here ?
> > Well, how can a VF be attributed to an AER if it's not assigned to a
> > guest, and it doesn't have a driver loaded for it in the host? i.e.,
> > if it's not configured, it's sitting idle, so how can it generate an AER?
> The expectation is really that it will not participate and won't
> vote, however in current code looks like it gets a default vote of
> PCI_ERS_RESULT_NO_AER_DRIVER and recovery failed, that's exactly my concern.
> In case pci_walk_bus() go over both PF & VF which have same BUS
> number this may happen, agree ?
> > if you are injecting an AER attributed to a device that isn't
> > configured, then you are contriving a non-valid system
> > condition/state, and I'm not surprised that the AER handler fails.
> > Maybe it needs an update/patch for the aer-inject case.
> >
> The failure happens once I inject the error to its PF which is
> configured.
> >> Any special qemu /stuff is needed to activate the VFIO support ?
> >> would like to give it a try for a case that VF is attached.
> > I see Alex answered this question. VFIO rocks! ... Alex did a great
> > job with it.
> > Definitely cleaned up and more cleanly architected a solution that
> > will lend itself to incremental for
> > complete reconstruction/duplication for other arches, busses, iommus,
> > etc. to follow.
> Alex - have on my setup QEMU which comes as part of RH 6.5, it uses
> the pci-stub driver, looking for a clear way to replace with a newer one
> which supports VFIO.
> Should I just download from GIT and run configure, make & make
> install or there are other required steps to fully replace ? any pointer
> to the required steps may help.
Sure, you can compile from git and make install, just don't expect
libvirt on a RHEL6 system to know about vfio. That doesn't prevent you
from binding devices manually and running QEMU from the commandline.
Alternatively, RHEL7 and recent Fedora (and I assume other recent
distros) should have native vfio support. Thanks,
Alex
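Binding a device manually, as mentioned above, usually comes down to a couple of sysfs writes. The sketch below only computes the (path, value) pairs so they can be reviewed before anything is written. It assumes the vfio-pci module is already loaded; the function names are hypothetical, and the vendor/device IDs in the example are placeholders rather than values from this thread.

```python
def vfio_bind_steps(bdf: str, vendor: str, device: str,
                    has_driver: bool = True):
    """Return the sysfs writes that hand one PCI function to vfio-pci."""
    steps = []
    if has_driver:
        # Detach the function from whatever host driver currently owns it.
        steps.append((f"/sys/bus/pci/devices/{bdf}/driver/unbind", bdf))
    # Teach vfio-pci the vendor/device ID pair so it claims the function.
    steps.append(("/sys/bus/pci/drivers/vfio-pci/new_id",
                  f"{vendor} {device}"))
    return steps


def apply_steps(steps) -> None:
    """Perform the writes (needs root; not called in this sketch)."""
    for path, value in steps:
        with open(path, "w") as handle:
            handle.write(value)


if __name__ == "__main__":
    # "1234 5678" is a placeholder ID pair, not a real device.
    for path, value in vfio_bind_steps("0000:21:00.1", "1234", "5678",
                                       has_driver=False):
        print(f"echo {value} > {path}")
```

Note that adding an ID pair applies to every unbound function with that ID, so on a system with several identical VFs more than one of them may end up bound.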
> >>>>> How such cases should work ? my expectation was that the PF will
> >>>>> get the
> >>>>> error detected message then will recognize whether
> >>>>> issue is its own or one of its VFs
> >>> The AER packet will have the tag of the VF in if it was the source
> >>> of the error;
> >>> so the PF will never see it; although one could argue it should be
> >>> 'promoted'
> >>> to the PF if PF/VF needs to clear some state it has wrt the VF (the
> >>> SRIOV spec is
> >>> lacking of info in this space); _but_, VFIO resets the VF (sets FLR
> >>> bit) when the
> >>> device is deassigned and before re-attachment to the host, so that
> >>> should clear out
> >>> any state btwn PF & VF ('should' ... famous last words...).
> >> In my test I have used the aer-inject tool simulating an error
> >> to the BUS that both PF/VF are residing on, putting the function
> >> number to be the PF one, looks like both should be called by the aer
> >> driver as part
> >> of the pci_walk_bus(). As mentioned I got a call only on the PF
> >> and recovery failed as of the VF doesn't include an AER aware driver,
> >> once removed the VF recovery succeeded.
> >> I believe that packet should include some info about the source
> >> of the error isn't it ?
> > yup.
> >> In addition, looking at IXGBE upstream source code at
> >> ixgbe_error_detected() looks like there is some code running on the
> >> PF that checks whether the source was a VF.
> > Ping the INTEL gang listed in MAINTAINDERS for ixgbe to see what was
> > tested, intention of this code wrt AER.
> > Alex Duyck & Greg Rose are the two I've worked with the most on design
> > issues in the driver, but others @INTEL may be more active in it now.
> Thanks, may be helpful.
> >
> >>
> >> By the way: when tried to simulate a VF error using its FN got
> >> below error:
> >> "Error: Failed to write, Inappropriate ioctl for device", any
> >> idea about that error ?
> > details? how did you simulate the error? -- send cmdline used, or code
> > snippet to inject error, etc.
> > that (quickly) looks like a sysfs error reply when an operation is not
> > supported to a file/device under it.
> Working with the downloaded aer-inject-0.1 tool, command line was
> ./aer-inject test/aer1
> content of aer1 file is as below:
> AER
> BUS 33 DEV 0 FN 0 (33 is the decimal value of 21
> hex value of PF & VF bus number)
> UNCOR_STATUS POISON_TLP
> HEADER_LOG 0 1 2 3
> >
> >>>
> >>>>
> >>>> I'm really not an AER expert, so help me understand this question of
> >>>> recognizing whether an error is associated with a PF or a VF.
> >>>>
> >>>> In terms of hardware, it looks like the device that detects an error
> >>>> logs some information and sends an Error Message upstream. The Root
> >>>> Complex receives the message, captures the source ID from the Error
> >>>> Message, and may generate an interrupt. I expect this source ID can
> >>>> be either a PF or a VF; there's no requirement that a VF error must be
> >>>> reported as though it's from the PF, is there?
> >>>>
> >>>>> and work accordingly, in current code
> >>>>> looks like recovery failed as part of "voting" once there is no
> >>>>> AER handler
> >>>>> assigned to the VFs.
> >>>>
> >>>> The commit you mentioned has to do with PCI_ERS_RESULT_NO_AER_DRIVER.
> >>>> We use pci_walk_bus() to figure out whether all the devices in a
> >>>> subtree have a driver. What subtree is involved here? I would expect
> >>>> the VFs to be siblings of the PF, not children of it, so I'm not sure
> >>>> where things went wrong.
> >>> Well, VFs could be on virtual busses (ARI turned on), so not
> >>> necessarily a
> >>> sibling to PF ... and then we have the problem in PCI code of not
> >>> being able
> >>> to traverse these virtual busses (in some cases; not sure if
> >>> pci_walk_bus(),
> >>> which is going down the tree vs up the tree, has any problems here
> >>> w/VFs on
> >>> virtual busses).
> >>>
> >>>>
> >>>> Can you collect "lspci -vvv" output and maybe add some debug so we can
> >>>> see exactly where the error is detected and what devices we're looking
> >>>> at to conclude that one of them doesn't have a driver?
> >> lspci -vvv for both PF & VF is attached, we can see that VF
> >> (21:00.1) has no driver loaded comparing the PF (Kernel driver in
> >> use: mlx4_core).
> > um, well, the VF doesn't contain an AER cap strucuture, so expecting
> > AER support for a pci device (VF or PF) w/o an AER cap is
> > 'wishful'.... so, the VF can't generate an AER b/c it doesn't have the
> > appropriate cap regs for the AER handler to use to report the error &
> > recover from it.
> In case VF doesn't have AER caps we may expect that once there is
> an error let its PF that has the AER caps to know about and let it
> handle it on behalf of the VF, doesn't it make sense ?
> Do you think that without AER caps for the VF it can work properly
> under VFIO driver ?
> >
> >>>>
> >>>> Bjorn
> >>>>
> >>>
> >>
> >
>
end of thread, other threads:[~2014-06-24 17:38 UTC | newest]
Thread overview: 8+ messages
[not found] <53A839C6.5050102@dev.mellanox.co.il>
2014-06-23 19:09 ` PCI/AER: AER in SRIOV environment Bjorn Helgaas
2014-06-23 20:12 ` Don Dutile
2014-06-23 22:44 ` Yishai Hadas
2014-06-23 23:17 ` Alex Williamson
2014-06-24 14:56 ` Don Dutile
2014-06-24 16:22 ` Yishai Hadas
2014-06-24 17:38 ` Alex Williamson
2014-06-23 23:10 ` Alex Williamson