From: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
Alex Williamson <alex.williamson@redhat.com>,
Leon Romanovsky <leon@kernel.org>, Jason Gunthorpe <jgg@ziepe.ca>
Subject: Re: Write to srvio_numvfs triggers kernel panic
Date: Sun, 8 May 2022 11:07:40 +0000
Message-ID: <87ee14l1tx.fsf@epam.com>
In-Reply-To: <20220507154145.GA568412@bhelgaas>
Hello Bjorn,
Bjorn Helgaas <helgaas@kernel.org> writes:
> On Sat, May 07, 2022 at 10:22:32AM +0000, Volodymyr Babchuk wrote:
>> Bjorn Helgaas <helgaas@kernel.org> writes:
>> > On Wed, May 04, 2022 at 07:56:01PM +0000, Volodymyr Babchuk wrote:
>> >>
>> >> I have encountered issue when PCI code tries to use both fields in
>> >>
>> >> union {
>> >> struct pci_sriov *sriov; /* PF: SR-IOV info */
>> >> struct pci_dev *physfn; /* VF: related PF */
>> >> };
>> >>
>> >> (which are part of struct pci_dev) at the same time.
>> >>
>> >> Symptoms are following:
>> >>
>> >> # echo 1 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
>> >>
>> >> pci 0000:01:00.2: reg 0x20c: [mem 0x30018000-0x3001ffff 64bit]
>> >> pci 0000:01:00.2: VF(n) BAR0 space: [mem 0x30018000-0x30117fff 64bit] (contains BAR0 for 32 VFs)
>> >> Unable to handle kernel paging request at virtual address 0001000200000010
>
>> >> Debugging showed the following:
>> >>
>> >> pci_iov_add_virtfn() allocates new struct pci_dev:
>> >>
>> >> virtfn = pci_alloc_dev(bus);
>> >> and sets physfn:
>> >> virtfn->is_virtfn = 1;
>> >> virtfn->physfn = pci_dev_get(dev);
>> >>
>> >> then we will get into sriov_init() via the following call path:
>> >>
>> >> pci_device_add(virtfn, virtfn->bus);
>> >> pci_init_capabilities(dev);
>> >> pci_iov_init(dev);
>> >> sriov_init(dev, pos);
>> >
>> > We called pci_device_add() with the VF. pci_iov_init() only calls
>> > sriov_init() if it finds an SR-IOV capability on the device:
>> >
>> > pci_iov_init(struct pci_dev *dev)
>> > pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
>> > if (pos)
>> > return sriov_init(dev, pos);
>> >
>> > So this means the VF must have an SR-IOV capability, which sounds a
>> > little dubious. From PCIe r6.0:
>>
>> [...]
>>
>> Yes, I dived into debugging and came to the same conclusions. I'm still
>> investigating this, but looks like my PCIe controller (DesignWare-based)
>> incorrectly reads configuration space for VF. Looks like instead of
>> providing access VF config space, it reads PF's one.
>>
>> > Can you supply the output of "sudo lspci -vv" for your system?
>>
>> Sure:
>>
>> root@spider:~# lspci -vv
>> 00:00.0 PCI bridge: Renesas Technology Corp. Device 0031 (prog-if 00 [Normal decode])
>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>> Latency: 0
>> Interrupt: pin A routed to IRQ 189
>> Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
>> I/O behind bridge: [disabled]
>> Memory behind bridge: 30000000-301fffff [size=2M]
>> Prefetchable memory behind bridge: [disabled]
>> Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
>> BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
>> PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>> Capabilities: [40] Power Management version 3
>> Flags: PMEClk- DSI- D1+ D2- AuxCurrent=0mA PME(D0+,D1+,D2-,D3hot+,D3cold+)
>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>> Capabilities: [50] MSI: Enable+ Count=128/128 Maskable+ 64bit+
>> Address: 0000000004030040 Data: 0000
>> Masking: fffffffe Pending: 00000000
>> Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00
>> DevCap: MaxPayload 256 bytes, PhantFunc 0
>> ExtTag+ RBE+
>> DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
>> RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
>> MaxPayload 128 bytes, MaxReadReq 512 bytes
>> DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>> LnkCap: Port #0, Speed 5GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us
>> ClockPM- Surprise- LLActRep+ BwNot- ASPMOptComp+
>> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
>> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>> LnkSta: Speed 5GT/s (ok), Width x2 (ok)
>> TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt-
>> RootCap: CRSVisible-
>> RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
>> RootSta: PME ReqID 0000, PMEStatus- PMEPending-
>> DevCap2: Completion Timeout: Not Supported, TimeoutDis+, NROPrPrP+, LTR+
>> 10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
>> EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
>> FRS-, LN System CLS Not Supported, TPHComp-, ExtTPHComp-, ARIFwd-
>> AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
>> AtomicOpsCtl: ReqEn- EgressBlck-
>> LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
>> Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>> Compliance De-emphasis: -6dB
>> LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>> Capabilities: [100 v2] Advanced Error Reporting
>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>> AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
>> MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap-
>> HeaderLog: 00000000 00000000 00000000 00000000
>> RootCmd: CERptEn- NFERptEn- FERptEn-
>> RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
>> FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
>> ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
>> Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
>> Capabilities: [158 v1] Secondary PCI Express
>> LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
>> LaneErrStat: 0
>> Capabilities: [178 v1] Physical Layer 16.0 GT/s <?>
>> Capabilities: [19c v1] Lane Margining at the Receiver <?>
>> Capabilities: [1bc v1] L1 PM Substates
>> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
>> PortCommonModeRestoreTime=10us PortTPowerOnTime=14us
>> L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
>> T_CommonMode=0us LTR1.2_Threshold=0ns
>> L1SubCtl2: T_PwrOn=10us
>> Capabilities: [1cc v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
>> Capabilities: [2cc v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
>> Capabilities: [304 v1] Data Link Feature <?>
>> Capabilities: [310 v1] Precision Time Measurement
>> PTMCap: Requester:+ Responder:+ Root:+
>> PTMClockGranularity: 16ns
>> PTMControl: Enabled:- RootSelected:-
>> PTMEffectiveGranularity: Unknown
>> Capabilities: [31c v1] Vendor Specific Information: ID=0004 Rev=1 Len=054 <?>
>> Kernel driver in use: pcieport
>> Kernel modules: pci_endpoint_test
>>
>> 01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a824 (prog-if 02 [NVM Express])
>> Subsystem: Samsung Electronics Co Ltd Device a809
>> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>> Latency: 0
>> Interrupt: pin A routed to IRQ 0
>> NUMA node: 0
>> Region 0: Memory at 30010000 (64-bit, non-prefetchable) [size=32K]
>> Expansion ROM at 30000000 [virtual] [disabled] [size=64K]
>> Capabilities: [40] Power Management version 3
>> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>> Capabilities: [70] Express (v2) Endpoint, MSI 00 [8/5710]
>> DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
>> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
>> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>> RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
>> MaxPayload 128 bytes, MaxReadReq 512 bytes
>> DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
>> LnkCap: Port #0, Speed 16GT/s, Width x4, ASPM not supported
>> ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
>> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
>> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>> LnkSta: Speed 5GT/s (downgraded), Width x2 (downgraded)
>> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR-
>> 10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
>> EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
>> FRS-, TPHComp-, ExtTPHComp-
>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
>> AtomicOpsCtl: ReqEn-
>> LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
>> Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>> Compliance De-emphasis: -6dB
>> LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>> Capabilities: [b0] MSI-X: Enable+ Count=64 Masked-
>> Vector table: BAR=0 offset=00004000
>> PBA: BAR=0 offset=00003000
>> Capabilities: [100 v2] Advanced Error Reporting
>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
>> MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap-
>> HeaderLog: 00000000 00000000 00000000 00000000
>> Capabilities: [148 v1] Device Serial Number d3-42-50-11-99-38-25-00
>> Capabilities: [168 v1] Alternative Routing-ID Interpretation (ARI)
>> ARICap: MFVC- ACS-, Next Function: 0
>> ARICtl: MFVC- ACS-, Function Group: 0
>> Capabilities: [178 v1] Secondary PCI Express
>> LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
>> LaneErrStat: 0
>> Capabilities: [198 v1] Physical Layer 16.0 GT/s <?>
>> Capabilities: [1c0 v1] Lane Margining at the Receiver <?>
>> Capabilities: [1e8 v1] Single Root I/O Virtualization (SR-IOV)
>> IOVCap: Migration-, Interrupt Message Number: 000
>> IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy-
>> IOVSta: Migration-
>> Initial VFs: 32, Total VFs: 32, Number of VFs: 0, Function Dependency Link: 00
>> VF offset: 2, stride: 1, Device ID: a824
>> Supported Page Size: 00000553, System Page Size: 00000001
>> Region 0: Memory at 0000000030018000 (64-bit, non-prefetchable)
>> VF Migration: offset: 00000000, BIR: 0
>> Capabilities: [3a4 v1] Data Link Feature <?>
>> Kernel driver in use: nvme
>> Kernel modules: nvme
>
> I guess this is before enabling SR-IOV on 01:00.0, so it doesn't show
> the VFs themselves.
Yes, because the kernel crashed without your suggested patch.
>> > It could be that the device has an SR-IOV capability when it
>> > shouldn't. But even if it does, Linux could tolerate that better
>> > than it does today.
>>
>> Agree there. I can create simple patch that checks for is_virtfn
>> in sriov_init(). But what to do if it is set?
>
> Maybe something like this? It makes no sense to me that a VF would
> have an SR-IOV capability, but ...
>
> If the below avoids the problem, maybe collect another "lspci -vv"
> output including the VF(s).
>
I had another crash in nvme_pci_enable(), for which I made a quick
workaround. And now, yeah, it looks like I have some issues with
my root complex HW:

[skipping bridge info]
01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a824 (prog-if 02 [NVM Express])
Subsystem: Samsung Electronics Co Ltd Device a809
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 0
NUMA node: 0
Region 0: Memory at 30010000 (64-bit, non-prefetchable) [size=32K]
Expansion ROM at 30000000 [virtual] [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x4, ASPM not supported
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s (downgraded), Width x2 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR-
10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-, TPHComp-, ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
AtomicOpsCtl: ReqEn-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [b0] MSI-X: Enable- Count=64 Masked-
Vector table: BAR=0 offset=00004000
PBA: BAR=0 offset=00003000
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [148 v1] Device Serial Number d3-42-50-11-99-38-25-00
Capabilities: [168 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [178 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
LaneErrStat: 0
Capabilities: [198 v1] Physical Layer 16.0 GT/s <?>
Capabilities: [1c0 v1] Lane Margining at the Receiver <?>
Capabilities: [1e8 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy-
IOVSta: Migration-
Initial VFs: 32, Total VFs: 32, Number of VFs: 1, Function Dependency Link: 00
VF offset: 2, stride: 1, Device ID: a824
Supported Page Size: 00000553, System Page Size: 00000001
Region 0: Memory at 0000000030018000 (64-bit, non-prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [3a4 v1] Data Link Feature <?>
Kernel driver in use: nvme
Kernel modules: nvme
01:00.2 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a824 (prog-if 02 [NVM Express])
Subsystem: Samsung Electronics Co Ltd Device a809
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 0
NUMA node: 0
Region 0: Memory at 30018000 (64-bit, non-prefetchable) [size=32K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x4, ASPM not supported
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s (downgraded), Width x2 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR-
10BitTagComp+, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-, TPHComp-, ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
AtomicOpsCtl: ReqEn-
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [b0] MSI-X: Enable- Count=64 Masked-
Vector table: BAR=0 offset=00004000
PBA: BAR=0 offset=00003000
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap+ MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [148 v1] Device Serial Number d3-42-50-11-99-38-25-00
Capabilities: [168 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [178 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
LaneErrStat: 0
Capabilities: [198 v1] Physical Layer 16.0 GT/s <?>
Capabilities: [1c0 v1] Lane Margining at the Receiver <?>
Capabilities: [1e8 v1] Single Root I/O Virtualization (SR-IOV)
IOVCap: Migration-, Interrupt Message Number: 000
IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy-
IOVSta: Migration-
Initial VFs: 32, Total VFs: 32, Number of VFs: 1, Function Dependency Link: 00
VF offset: 2, stride: 1, Device ID: a824
Supported Page Size: 00000553, System Page Size: 00000001
Region 0: Memory at 0000000030018000 (64-bit, non-prefetchable)
VF Migration: offset: 00000000, BIR: 0
Capabilities: [3a4 v1] Data Link Feature <?>
Kernel modules: nvme
As you can see, the output for func 0 and func 2 is nearly identical, down
to the SR-IOV capability and the device serial number, so yeah, it looks
like my system reads the config space of func 0 in both cases.

Now at least I know where to look. Thank you for your help.
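For reference, func 2 is where the first VF should appear given the
"VF offset: 2, stride: 1" fields in the SR-IOV capability above. A minimal
user-space sketch of that arithmetic, assuming the usual First VF Offset /
VF Stride rule with VFs counted from 0 (the helper names are hypothetical
and this is not the kernel's code):

```c
#include <stdint.h>

/*
 * Hypothetical helper: 8-bit devfn of the VF with index vf_id
 * (counting from 0), derived from the PF's devfn and the
 * First VF Offset / VF Stride values of its SR-IOV capability.
 */
uint8_t vf_devfn(uint8_t pf_devfn, uint16_t offset, uint16_t stride,
		 unsigned int vf_id)
{
	return (uint8_t)(pf_devfn + offset + stride * vf_id);
}

/* Split a devfn into the classic 5-bit device / 3-bit function pair. */
unsigned int devfn_slot(uint8_t devfn)
{
	return devfn >> 3;
}

unsigned int devfn_func(uint8_t devfn)
{
	return devfn & 0x7;
}
```

With pf_devfn = 0, offset = 2, stride = 1 and vf_id = 0 this gives devfn 2,
i.e. 01:00.2, which matches the address the VF shows up at.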
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 952217572113..9c5184384a45 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -901,6 +901,10 @@ int pci_iov_init(struct pci_dev *dev)
>  	if (!pci_is_pcie(dev))
>  		return -ENODEV;
>
> +	/* Some devices include SR-IOV cap on VFs as well as PFs */
> +	if (dev->is_virtfn)
> +		return -ENODEV;
> +
>  	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
>  	if (pos)
>  		return sriov_init(dev, pos);
Thanks, this patch helped. You can have my
Tested-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
if you are going to include it in the kernel.
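
A toy user-space model of why the check helps, for illustration only. It is
not kernel code: every mock_* name is made up, and mock_sriov_init() merely
stands in for the real sriov_init() storing into the sriov/physfn union.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_sriov {
	int total_vfs;
};

struct mock_pci_dev {
	bool is_virtfn;
	bool has_sriov_cap;	/* what a config-space read reports */
	union {
		struct mock_sriov *sriov;	/* PF: SR-IOV info */
		struct mock_pci_dev *physfn;	/* VF: related PF */
	};
};

static struct mock_sriov mock_iov = { .total_vfs = 32 };

/* Stand-in for sriov_init(): writes the union unconditionally. */
static int mock_sriov_init(struct mock_pci_dev *dev)
{
	dev->sriov = &mock_iov;
	return 0;
}

/* Stand-in for pci_iov_init(); "guard" toggles the is_virtfn check. */
int mock_iov_init(struct mock_pci_dev *dev, bool guard)
{
	if (guard && dev->is_virtfn)
		return -ENODEV;

	if (dev->has_sriov_cap)
		return mock_sriov_init(dev);

	return -ENODEV;
}
```

Without the guard, a VF that appears to have an SR-IOV capability gets its
physfn pointer overwritten through the union; with the guard, physfn is
left untouched.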
On the other hand, I'm wondering if it is correct to have both is_virtfn
and is_physfn in the first place, as there can be 4 combinations and only
two (or three?) of them are valid. Maybe it is worth replacing them with
an enum?
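
A rough sketch of the idea as standalone C, not a proposed patch. All
sketch_* names are hypothetical, and it assumes three valid states
(neither, PF, VF):

```c
#include <stddef.h>

struct sketch_sriov {
	int num_vfs;
};

/* One tag instead of two independent flag bits. */
enum sketch_fn_kind {
	SKETCH_FN_NORMAL,	/* neither PF nor VF: union unused */
	SKETCH_FN_PHYSFN,	/* PF: union holds sriov */
	SKETCH_FN_VIRTFN,	/* VF: union holds physfn */
};

struct sketch_pci_dev {
	enum sketch_fn_kind kind;
	union {
		struct sketch_sriov *sriov;
		struct sketch_pci_dev *physfn;
	};
};

/* Accessors return NULL unless the tag says the member is the live one. */
struct sketch_sriov *sketch_dev_sriov(const struct sketch_pci_dev *dev)
{
	return dev->kind == SKETCH_FN_PHYSFN ? dev->sriov : NULL;
}

struct sketch_pci_dev *sketch_dev_physfn(const struct sketch_pci_dev *dev)
{
	return dev->kind == SKETCH_FN_VIRTFN ? dev->physfn : NULL;
}
```

With a single tag, the "both PF and VF" combination cannot be represented,
and the accessors make it harder to read the wrong union member by accident.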
--
Volodymyr Babchuk at EPAM