* Re: [E1000-devel] SFP+ EEPROM readouts fail on X722 (ethtool -m: Invalid argument)
From: Jakub Jankowski @ 2019-08-28 7:18 UTC
To: Fujinaka, Todd, e1000-devel@lists.sourceforge.net
Cc: netdev@vger.kernel.org, mhemsley@open-systems.com, Jeff Kirsher,
Lihong Yang
In-Reply-To: <9B4A1B1917080E46B64F07F2989DADD69B01402F@ORSMSX115.amr.corp.intel.com>
This commit suggests that it should be possible:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c271dd6c391b535226cf1a81aaad9f33cb5899d3
(It has been in upstream kernel since v4.12, so my test kernel does have
it, and so does the out-of-tree driver I'm testing with)
On 8/28/19 2:53 AM, Fujinaka, Todd wrote:
> Sorry about the top posting, but if I don't do it this way I can't read anything in Outlook (not my preferred MUA).
>
> I think I may have been wrong about things. I'm not as familiar with the x722, and the NVM versions are completely different than the x710 and I was confused.
>
> Even worse, I'm not sure if the x722 is able to read the data from the SFP/SFP+ EEPROM. I remembered that was a feature we requested internally but I don't remember what the progress was.
>
> I'm asking around to see if I can get clarification. I haven't heard anything yet.
>
> Todd Fujinaka
> Software Application Engineer
> Datacenter Engineering Group
> Intel Corporation
> todd.fujinaka@intel.com
>
>
> -----Original Message-----
> From: Jakub Jankowski [mailto:shasta@toxcorp.com]
> Sent: Tuesday, August 27, 2019 4:01 PM
> To: Fujinaka, Todd <todd.fujinaka@intel.com>; e1000-devel@lists.sourceforge.net
> Cc: netdev@vger.kernel.org; mhemsley@open-systems.com
> Subject: Re: [E1000-devel] SFP+ EEPROM readouts fail on X722 (ethtool -m: Invalid argument)
>
> Hi,
>
> On 8/27/19 7:56 PM, Fujinaka, Todd wrote:
>> The hints should be:
>> # ethtool -m eth10
>> Cannot get module EEPROM information: Invalid argument
>> # dmesg | tail -n 1
>> [ 445.971974] i40e 0000:3d:00.3 eth10: Module EEPROM memory read not supported. Please update the NVM image.
>>
>> # ethtool -i eth10
>> driver: i40e
>> version: 2.9.21
>> firmware-version: 3.31 0x80000d31 1.1767.0
>>
>> And the working case:
>> # ethtool -i eth8
>> driver: i40e
>> version: 2.9.21
>> firmware-version: 6.01 0x800035cf 1.1876.0
>>
>> If you don't see it, 6.01 > 3.31.
> The firmware versions differ (that much) because the non-working case is an X722 NIC, while the working one is an X710.
>
>> The NVM update tool should be available on downloadcenter.intel.com
> Thanks for the pointer to NVM updater. I'd like to offer some additional comments about my experience with the newest one (v4.00):
>
> a) running ./nvmupdate64e (from X722_NVMUpdate_Linux_x64 subdir) errors out without really saying what's wrong:
>
> # ./nvmupdate64e
>
> Intel(R) Ethernet NVM Update Tool
> NVMUpdate version 1.30.2.11
> Copyright (C) 2013 - 2017 Intel Corporation.
>
>
> WARNING: To avoid damage to your device, do not stop the update or reboot or power off the system during this update.
> Inventory in progress. Please wait [+.........]
> Tool execution completed with the following status: The configuration file could not be opened/read, or a syntax error was discovered in the file
> Press any key to exit.
>
> after enabling logging (-l out.log) a bit more is revealed:
>
> # tail -n 2 out.log
> Error: Config file line 2: Not supported config file version.
> Error: Missing CONFIG VERSION parameter in configuration file.
>
> but that's not entirely true, CONFIG VERSION is set in the default configuration file:
>
> # head -n 2 nvmupdate.cfg
> CURRENT FAMILY: 1.0.0
> CONFIG VERSION: 1.14.0
>
> so why isn't this understood?
> Manually editing nvmupdate.cfg and setting CONFIG VERSION: 1.11.0 seems to make this particular problem go away.
>
> b) Re-doing this with the downgraded config version exposes another problem:
>
> Config file read.
> Error: Can't open NVM map file [Immediate_offset_2.txt]
>
> and indeed, there is no Immediate_offset_2.txt in NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_Linux_x64/
> There is one, however, in
> NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_EFIx64/ subdir.
> Copying it over to the _Linux_x64 subdir resolves this particular problem.
>
> c) Re-doing this with Immediate_offset_2.txt in place exposes a third problem:
>
> Error: Can't open NVM image file
> [LBG_B2_Wolf_Pass_WFT_X557_P01_PHY_Auto_Detect_P23_NCSI_v3.31_800016DB.bin]
>
> and once again - same story. It exists in NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_EFIx64/ but not NVMUpdatePackage_WFT_WFQ&WF0_v4.00/X722_NVMUpdate_Linux_x64/ - had to copy it over.
>
>
> Once I managed to get all these out of the way, the tool finally ran:
>
> Num Description Ver. DevId S:B Status
> === ======================================== ===== ===== ====== ===============
> 01) Intel(R) Ethernet Server Adapter I350-T4 1.99 1521 00:024 Update not available
> 02) Intel(R) Ethernet Connection X722 for 3.49 37D2 00:061 Update
> 10GBASE-T available
> 03) Intel(R) Ethernet Server Adapter I350-T4 1.99 1521 00:175 Update not available
>
>
> The initial starting point was:
>
> 0) firmware-version: 3.31 0x80000d31 1.1767.0
>
> After first update+reboot, this was bumped to:
>
> 1) firmware-version: 3.1d 0x800016db 1.1767.0 (but ethtool -m ethX still doesn't work)
>
> So I ran the tool a second time; it said 'Update available' again, but this time:
>
> Num Description Ver. DevId S:B Status
> === ======================================== ===== ===== ====== ===============
> 01) Intel(R) Ethernet Server Adapter I350-T4 1.99 1521 00:024 Update not available
> 02) Intel(R) Ethernet Connection X722 for 3.29 37D2 00:061 Update
> 10GBASE-T available
> 03) Intel(R) Ethernet Server Adapter I350-T4 1.99 1521 00:175 Update not available
>
> Options: Adapter Index List (comma-separated), [A]ll, e[X]it
> Enter selection:02
> Would you like to back up the NVM images? [Y]es/[N]o: Y
> Update in progress. This operation may take several minutes.
> [*******+..]
> Tool execution completed with the following status: <---------- why is there no status printed?
> Press any key to exit.
>
>
> Checking output log:
>
> # cat out3.log
> Intel(R) Ethernet NVM Update Tool
> NVMUpdate version 1.30.2.11
> Copyright (C) 2013 - 2017 Intel Corporation.
>
> ./nvmupdate64e -c nvmupdate.cfg -l out3.log
>
> Config file read.
> Inventory
> [00:061:00:00]: Intel(R) Ethernet Connection X722 for 10GBASE-T
> Flash inventory started
> Shadow RAM inventory started
> Alternate MAC address is not set
> Shadow RAM inventory finished
> Flash inventory finished
> OROM inventory started
> OROM inventory finished
> PHY NVM inventory started
> PHY NVM inventory finished
> [00:061:00:01]: Intel(R) Ethernet Connection X722 for 10GBASE-T
> Device already inventoried.
> [00:061:00:02]: Intel(R) Ethernet Connection X722 for 10GbE SFP+
> Device already inventoried.
> PHY NVM inventory started
> PHY NVM inventory finished
> [00:061:00:03]: Intel(R) Ethernet Connection X722 for 10GbE SFP+
> Device already inventoried.
> Update
> [00:061:00:00]: Intel(R) Ethernet Connection X722 for 10GBASE-T
> Creating backup images in directory: A4BF0164884A
> Backup images created.
> Flash update started
> NVM image verification started
> Shadow RAM image verification started
>
> Image differences found at offset 0x3AE [Device=0xF, Buffer=0x0] -
> update required.
> Error: Flash update failed
> [00:061:00:02]: Intel(R) Ethernet Connection X722 for 10GbE SFP+
> #
>
> However, ethtool -i suggests that firmware was updated to:
>
> 2) firmware-version: 4.00 0x80001577 1.1580.0 <------- so it did
> _something_ after all?
>
> At this point, every subsequent attempt to run the NVM updater yields
> the same results: an update is available, but attempting to apply it
> fails with the same message in log.
>
> And my initial issue still persists - ethtool -m <iface> still returns
> "invalid argument" with "Module EEPROM memory read not supported. Please
> update the NVM image" logged in dmesg.
>
> How can I resolve this?
>
> Cheers,
> Jakub.
>
>> Todd Fujinaka
>> Software Application Engineer
>> Datacenter Engineering Group
>> Intel Corporation
>> todd.fujinaka@intel.com
>>
>>
>> -----Original Message-----
>> From: Jakub Jankowski [mailto:shasta@toxcorp.com]
>> Sent: Tuesday, August 27, 2019 4:03 AM
>> To: e1000-devel@lists.sourceforge.net
>> Cc: netdev@vger.kernel.org; shasta@toxcorp.com; mhemsley@open-systems.com
>> Subject: [E1000-devel] SFP+ EEPROM readouts fail on X722 (ethtool -m: Invalid argument)
>>
>> Hi,
>>
>> We can't get SFP+ EEPROM readouts for X722 to work at all:
>>
>> # ethtool -m eth10
>> Cannot get module EEPROM information: Invalid argument
>> # dmesg | tail -n 1
>> [ 445.971974] i40e 0000:3d:00.3 eth10: Module EEPROM memory read not supported. Please update the NVM image.
>> # lspci | grep 3d:00.3
>> 3d:00.3 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GbE SFP+ (rev 09)
>>
>>
>> We're running 4.19.65 kernel at the moment, testing using the newest out-of-tree Intel module
>>
>> # modinfo -F version i40e
>> 2.9.21
>>
>> We also tried:
>> - 4.19.65 with in-tree i40e (2.3.2-k)
>> - stock Arch Linux (kernel 5.2.5, driver 2.8.20-k) and the results are the same, as shown above.
>>
>> # ethtool -i eth10
>> driver: i40e
>> version: 2.9.21
>> firmware-version: 3.31 0x80000d31 1.1767.0
>> expansion-rom-version:
>> bus-info: 0000:3d:00.3
>> supports-statistics: yes
>> supports-test: yes
>> supports-eeprom-access: yes
>> supports-register-dump: yes
>> supports-priv-flags: yes
>> # dmidecode -s baseboard-manufacturer
>> Intel Corporation
>> # dmidecode -s baseboard-product-name
>> S2600WFT
>> # dmidecode -s baseboard-version
>> H48104-853
>>
>> # lspci -vvv
>> (...)
>> 3d:00.3 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GbE SFP+ (rev 09)
>> DeviceName: Intel PCH Integrated 10 Gigabit Ethernet Controller
>> Subsystem: Intel Corporation Ethernet Connection X722 for 10GbE SFP+
>> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>> Latency: 0, Cache Line Size: 32 bytes
>> Interrupt: pin A routed to IRQ 112
>> NUMA node: 0
>> Region 0: Memory at ab000000 (64-bit, prefetchable) [size=16M]
>> Region 3: Memory at b0000000 (64-bit, prefetchable) [size=32K]
>> Expansion ROM at <ignored> [disabled]
>> Capabilities: [40] Power Management version 3
>> Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>> Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
>> Address: 0000000000000000 Data: 0000
>> Masking: 00000000 Pending: 00000000
>> Capabilities: [70] MSI-X: Enable+ Count=129 Masked-
>> Vector table: BAR=3 offset=00000000
>> PBA: BAR=3 offset=00001000
>> Capabilities: [a0] Express (v2) Endpoint, MSI 00
>> DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
>> ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
>> DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
>> RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset-
>> MaxPayload 256 bytes, MaxReadReq 512 bytes
>> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
>> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
>> ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
>> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
>> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>> DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
>> AtomicOpsCtl: ReqEn-
>> LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>> Capabilities: [e0] Vital Product Data
>> Product Name: Example VPD
>> Read-only fields:
>> [V0] Vendor specific:
>> [RV] Reserved: checksum good, 0 byte(s) reserved
>> End
>> Capabilities: [100 v2] Advanced Error Reporting
>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
>> UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>> CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>> HeaderLog: 00000000 00000000 00000000 00000000
>> Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
>> ARICap: MFVC- ACS-, Next Function: 0
>> ARICtl: MFVC- ACS-, Function Group: 0
>> Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
>> IOVCap: Migration-, Interrupt Message Number: 000
>> IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy-
>> IOVSta: Migration-
>> Initial VFs: 32, Total VFs: 32, Number of VFs: 0, Function Dependency Link: 03
>> VF offset: 109, stride: 1, Device ID: 37cd
>> Supported Page Size: 00000553, System Page Size: 00000001
>> Region 0: Memory at 00000000af000000 (64-bit, prefetchable)
>> Region 3: Memory at 00000000b0020000 (64-bit, prefetchable)
>> VF Migration: offset: 00000000, BIR: 0
>> Capabilities: [1a0 v1] Transaction Processing Hints
>> Device specific mode supported
>> No steering table available
>> Capabilities: [1b0 v1] Access Control Services
>> ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
>> ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
>> Kernel driver in use: i40e
>> Kernel modules: i40e
>>
>>
>> Same kernel+i40e, same SFP+ module - but on Intel X710, works like a treat:
>>
>> # lspci | grep X7
>> 81:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
>> 81:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
>> # ethtool -m eth8
>> Identifier : 0x03 (SFP)
>> Extended identifier : 0x04 (GBIC/SFP defined by 2-wire interface ID)
>> Connector : 0x07 (LC)
>> Transceiver codes : 0x10 0x00 0x00 0x01 0x00 0x00 0x00 0x00 0x00
>> Transceiver type : 10G Ethernet: 10G Base-SR
>> Transceiver type : Ethernet: 1000BASE-SX
>> Encoding : 0x06 (64B/66B)
>> BR, Nominal : 10300MBd
>> (...)
>> # ethtool -i eth8
>> driver: i40e
>> version: 2.9.21
>> firmware-version: 6.01 0x800035cf 1.1876.0
>> expansion-rom-version:
>> bus-info: 0000:81:00.0
>> supports-statistics: yes
>> supports-test: yes
>> supports-eeprom-access: yes
>> supports-register-dump: yes
>> supports-priv-flags: yes
>> #
>>
>>
>> Is this a known problem?
>>
>>
>> Best regards,
>> Jakub
>>
>>
>>
>> _______________________________________________
>> E1000-devel mailing list
>> E1000-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/e1000-devel
>> To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired
--
Jakub Jankowski|shasta@toxcorp.com|http://toxcorp.com/
GPG: FCBF F03D 9ADB B768 8B92 BB52 0341 9037 A875 942D
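
A note on comparing the version strings quoted in this thread: ethtool -i reports 3.31 where nvmupdate64e lists the same port as 3.49, and later 3.1d where the tool lists 3.29. Those pairs line up if the minor field printed by ethtool is hexadecimal (0x31 = 49, 0x1d = 29). That reading is inferred from the output above and is not confirmed anywhere in the thread. Under that assumption, a minimal C sketch for parsing and comparing the leading major.minor field could look like the following. struct nvm_ver, nvm_ver_parse() and nvm_ver_cmp() are hypothetical names, not part of ethtool or i40e.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type; not part of ethtool or the i40e driver. */
struct nvm_ver {
	unsigned int major;
	unsigned int minor;
};

/*
 * Parse the leading "major.minor" of an ethtool firmware-version string
 * such as "3.31 0x80000d31 1.1767.0".
 * ASSUMPTION: both fields are hexadecimal (inferred from the thread).
 * Returns 0 on success, -1 if the string does not start with "x.y".
 */
static int nvm_ver_parse(const char *fw, struct nvm_ver *out)
{
	assert(fw != NULL && out != NULL);
	if (sscanf(fw, "%x.%x", &out->major, &out->minor) != 2)
		return -1;
	return 0;
}

/* Returns <0, 0 or >0 when a is older than, equal to or newer than b. */
static int nvm_ver_cmp(const struct nvm_ver *a, const struct nvm_ver *b)
{
	if (a->major != b->major)
		return a->major < b->major ? -1 : 1;
	if (a->minor != b->minor)
		return a->minor < b->minor ? -1 : 1;
	return 0;
}
```

With this reading, "3.1d" parses as 3.29 and compares lower than "3.31" (3.49), while "6.01" compares higher than both.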
* Re: [PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
From: Peter Zijlstra @ 2019-08-28 7:14 UTC
To: Andy Lutomirski
Cc: Alexei Starovoitov, Kees Cook, LSM List, James Morris, Jann Horn,
Masami Hiramatsu, Steven Rostedt, David S. Miller,
Daniel Borkmann, Network Development, bpf, kernel-team, Linux API
In-Reply-To: <CALCETrV8iJv9+Ai11_1_r6MapPhhwt9hjxi=6EoixytabTScqg@mail.gmail.com>
On Tue, Aug 27, 2019 at 04:01:08PM -0700, Andy Lutomirski wrote:
> > Tracing:
> >
> > CAP_BPF and perf_paranoid_tracepoint_raw() (which is kernel.perf_event_paranoid == -1)
> > are necessary to:
That's not tracing, that's perf.
> > +bool cap_bpf_tracing(void)
> > +{
> > + return capable(CAP_SYS_ADMIN) ||
> > + (capable(CAP_BPF) && !perf_paranoid_tracepoint_raw());
> > +}
A whole long time ago, I proposed we introduce CAP_PERF or something
along those lines; as a replacement for that horrible crap Android and
Debian ship. But nobody was ever interested enough.
The nice thing about that is that you can then disallow perf/tracing in
general, but tag the perf executable (and similar tools) with the
capability so that unpriv users can still use it, but only limited
through the tool, not the syscalls directly.
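
For readers following along, the quoted cap_bpf_tracing() check can be modelled in userspace to see which combinations pass. This is only a sketch: the kernel's capable() and perf_paranoid_tracepoint_raw() are replaced by plain struct fields, CAP_BPF is the capability proposed in the patch under discussion, and struct task_model plus the model_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; none of this is kernel code. */
struct task_model {
	bool cap_sys_admin;
	bool cap_bpf;            /* capability proposed in the quoted patch */
	int perf_event_paranoid; /* value of kernel.perf_event_paranoid */
};

/*
 * Following the quoted description, raw tracepoint access counts as
 * unrestricted only when the sysctl is -1, so anything above -1 is
 * treated as "paranoid".
 */
static bool model_paranoid_tracepoint_raw(const struct task_model *t)
{
	assert(t != NULL);
	return t->perf_event_paranoid > -1;
}

/* Same boolean structure as the quoted cap_bpf_tracing(). */
static bool model_cap_bpf_tracing(const struct task_model *t)
{
	assert(t != NULL);
	return t->cap_sys_admin ||
	       (t->cap_bpf && !model_paranoid_tracepoint_raw(t));
}
```

In this model a task holding only CAP_BPF passes when the sysctl is -1 and fails at the more restrictive settings, while CAP_SYS_ADMIN passes regardless of the sysctl.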
* Re: libbpf distro packaging
From: Jiri Olsa @ 2019-08-28 7:12 UTC
To: Julia Kartseva
Cc: Alexei Starovoitov, Andrii Nakryiko, labbott@redhat.com,
acme@kernel.org, debian-kernel@lists.debian.org,
netdev@vger.kernel.org, Andrey Ignatov, Yonghong Song,
jolsa@kernel.org, Daniel Borkmann
In-Reply-To: <A2E805DD-8237-4703-BE6F-CC96A4D4D909@fb.com>
On Tue, Aug 27, 2019 at 10:30:24PM +0000, Julia Kartseva wrote:
> On 8/25/19, 11:42 PM, "Jiri Olsa" <jolsa@redhat.com> wrote:
>
> > On Fri, Aug 23, 2019 at 04:00:01PM +0000, Alexei Starovoitov wrote:
> > >
> > > Technically we can bump it at any time.
> > > The goal was to bump it only when new kernel is released
> > > to capture a collection of new APIs in a given 0.0.X release.
> > > So that libbpf versions are synchronized with kernel versions
> > > in a somewhat loose way.
> > > In this case we can make an exception and bump it now.
> >
> > I see, I don't think it's worth the exception now,
> > the patch is simple or we'll start with 0.0.3
>
> PR introducing 0.0.5 ABI was merged:
> https://github.com/libbpf/libbpf/commit/476e158
> Jiri, if you'd like to avoid patching, you can start w/ 0.0.5.
> Also if you're planning to use *.spec from libbpf as a source of truth,
> It may be enhanced by syncing spec and ABI versions, similar to
> https://github.com/libbpf/libbpf/commit/d60f568
cool, anyway I started with v0.0.3 ;-) I'll update
to latest once we are merged in
the spec/srpm is currently under Fedora review:
https://bugzilla.redhat.com/show_bug.cgi?id=1745478
you can check it in here:
http://people.redhat.com/~jolsa/libbpf/v2/
I think it's a little different from what you have,
but not in the essential parts
jirka
* Re: [patch net-next rfc 3/7] net: rtnetlink: add commands to add and delete alternative ifnames
From: Jiri Pirko @ 2019-08-28 7:07 UTC
To: Roopa Prabhu
Cc: David Miller, Jakub Kicinski, David Ahern, netdev,
Stephen Hemminger, dcbw, Michal Kubecek, Andrew Lunn, parav,
Saeed Mahameed, mlxsw
In-Reply-To: <CAJieiUjpE+o-=x2hQcsKQJNxB8O7VLHYw2tSnqzTFRuy_vtOxw@mail.gmail.com>
Tue, Aug 27, 2019 at 05:14:49PM CEST, roopa@cumulusnetworks.com wrote:
>On Tue, Aug 27, 2019 at 2:35 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Tue, Aug 27, 2019 at 10:22:42AM CEST, davem@davemloft.net wrote:
>> >From: Jiri Pirko <jiri@resnulli.us>
>> >Date: Tue, 27 Aug 2019 09:08:08 +0200
>> >
>> >> Okay, so if I understand correctly, on top of separate commands for
>> >> add/del of alternative names, you suggest also get/dump to be separate
>> >> command and don't fill this up in existing newlink/getlink command.
>> >
>> >I'm not sure what to do yet.
>> >
>> >David has a point, because the only way these ifnames are useful is
>> >as ways to specify and choose net devices. So based upon that I'm
>> >slightly leaning towards not using separate commands.
>>
>> Well yeah, one can use it to handle existing commands instead of
>> IFLA_NAME.
>>
>> But why does it rule out separate commands? I think it is cleaner than
>> to put everything in poor setlink messages :/ The fact that we would
>> need to add "OP" to the setlink message just feels off. Other similar
>> needs may show up in the future and we may end up with ridiculous messages
>> like:
>>
>> SETLINK
>> IFLA_NAME eth0
>> IFLA_ALTNAME_LIST (nest)
>> IFLA_ALTNAME_OP add
>> IFLA_ALTNAME somereallylongname
>> IFLA_ALTNAME_OP del
>> IFLA_ALTNAME somereallyreallylongname
>> IFLA_ALTNAME_OP add
>> IFLA_ALTNAME someotherreallylongname
>> IFLA_SOMETHING_ELSE_LIST (nest)
>> IFLA_SOMETHING_ELSE_OP add
>> ...
>> IFLA_SOMETHING_ELSE_OP del
>> ...
>> IFLA_SOMETHING_ELSE_OP add
>> ...
>>
>> I don't know what to think about it. Rollbacks are going to be pure hell :/
>
>I don't see a huge problem with the above. We need a way to solve this
>anyways for other list types in the future correct ?.
>The approach taken by this series will not scale if we have to add a
>new msg type and header for every such list attribute in the future.
Do you have some other examples in mind? So far, this was not needed.
>
>A good parallel here is bridge vlan which uses RTM_SETLINK and
>RTM_DELLINK for vlan add and deletes. But it does have an advantage of
>a separate
>msg space under AF_BRIDGE which makes it cleaner. Maybe something
>closer to that can be made to work (possibly with a msg flag) ?.
1) Not sure if AF_BRIDGE is the right example of how to do things
2) See br_vlan_info(). It is not an OP-PER-VLAN. You either add or
delete all passed info, depending on the cmd (RTM_SETLINK/RTM_DELLINK).
>
>Would be good to have a consistent way to update list attributes for
>future needs too.
Okay. Do you suggest having a new set of commands to handle
adding/deleting lists of items? altNames now, others (other nests) later?
Something like:
CMD SETLISTS
IFLA_NAME eth0
IFLA_ALTNAME_LIST (nest)
IFLA_ALTNAME somereallylongname
IFLA_ALTNAME somereallyreallylongname
IFLA_ALTNAME someotherreallylongname
IFLA_SOMETHING_ELSE_LIST (nest)
IFLA_SOMETHING_ELSE
IFLA_SOMETHING_ELSE
IFLA_SOMETHING_ELSE
CMD DELLISTS
IFLA_NAME eth0
IFLA_ALTNAME_LIST (nest)
IFLA_ALTNAME somereallylongname
IFLA_ALTNAME somereallyreallylongname
IFLA_ALTNAME someotherreallylongname
IFLA_SOMETHING_ELSE_LIST (nest)
IFLA_SOMETHING_ELSE
IFLA_SOMETHING_ELSE
IFLA_SOMETHING_ELSE
How does this sound?
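
To make the proposal concrete, here is a rough userspace sketch of the "one command adds or deletes the whole list" semantics, in contrast to a per-item OP. It is an illustration only, not rtnetlink code: struct altnames, altnames_add_list() and altnames_del_list() are made-up names, and the fixed-size table is an arbitrary simplification. Each call validates the full list first and only then commits, so a rejected request leaves nothing behind to roll back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ALTNAMES 8
#define ALTNAME_LEN  64

/* Hypothetical in-memory model of one device's alternative names. */
struct altnames {
	char names[MAX_ALTNAMES][ALTNAME_LEN];
	size_t count;
};

static bool altnames_has(const struct altnames *s, const char *name)
{
	assert(s != NULL && name != NULL);
	for (size_t i = 0; i < s->count; i++)
		if (strcmp(s->names[i], name) == 0)
			return true;
	return false;
}

/* True if list[i] also appears earlier in the same request. */
static bool dup_in_list(const char *const *list, size_t i)
{
	for (size_t j = 0; j < i; j++)
		if (strcmp(list[i], list[j]) == 0)
			return true;
	return false;
}

/* "SETLISTS": add every name or none. Returns 0 on success, -1 otherwise. */
static int altnames_add_list(struct altnames *s, const char *const *list,
			     size_t n)
{
	/* Pass 1: validate without touching the table. */
	if (n > MAX_ALTNAMES - s->count)
		return -1;
	for (size_t i = 0; i < n; i++)
		if (strlen(list[i]) >= ALTNAME_LEN ||
		    altnames_has(s, list[i]) || dup_in_list(list, i))
			return -1;
	/* Pass 2: commit; nothing here can fail. */
	for (size_t i = 0; i < n; i++)
		strcpy(s->names[s->count++], list[i]);
	return 0;
}

/* "DELLISTS": delete every name or none. Returns 0 on success, -1 otherwise. */
static int altnames_del_list(struct altnames *s, const char *const *list,
			     size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!altnames_has(s, list[i]) || dup_in_list(list, i))
			return -1;
	for (size_t i = 0; i < n; i++) {
		for (size_t k = 0; k < s->count; k++) {
			if (strcmp(s->names[k], list[i]) != 0)
				continue;
			/* Fill the hole with the last entry. */
			s->count--;
			if (k != s->count)
				strcpy(s->names[k], s->names[s->count]);
			break;
		}
	}
	return 0;
}
```

A request mixing one valid and one already-present name is rejected as a whole, which is the property that would make rollbacks unnecessary in this model.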
* Re: [PATCH rdma-next v3 0/3] ODP support for mlx5 DC QPs
From: Leon Romanovsky @ 2019-08-28 7:06 UTC
To: Jason Gunthorpe
Cc: Doug Ledford, RDMA mailing list, Michael Guralnik, Saeed Mahameed,
linux-netdev
In-Reply-To: <20190827155140.GA15153@ziepe.ca>
On Tue, Aug 27, 2019 at 12:51:40PM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 19, 2019 at 03:08:12PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@mellanox.com>
> >
> > Changelog
> > v3:
> > * Rewrote patches to expose through DEVX without need to change mlx5-abi.h at all.
> > v2: https://lore.kernel.org/linux-rdma/20190806074807.9111-1-leon@kernel.org
> > * Fixed reserved_* field wrong name (Saeed M.)
> > * Split first patch to two patches, one for mlx5-next and one for rdma-next. (Saeed M.)
> > v1: https://lore.kernel.org/linux-rdma/20190804100048.32671-1-leon@kernel.org
> > * Fixed alignment to u64 in mlx5-abi.h (Gal P.)
> > v0: https://lore.kernel.org/linux-rdma/20190801122139.25224-1-leon@kernel.org
> >
> > From Michael,
> >
> > The series adds support for on-demand paging for DC transport.
> >
> > As DC is mlx-only transport, the capabilities are exposed
> > to the user using DEVX objects and later on through mlx5dv_query_device.
> >
> > Thanks
> >
> > Michael Guralnik (3):
> > net/mlx5: Set ODP capabilities for DC transport to max
> > IB/mlx5: Remove check of FW capabilities in ODP page fault handling
> > IB/mlx5: Add page fault handler for DC initiator WQE
>
> This seems fine, can you put the commit on the shared branch?
Thanks, applied to mlx5-next
00679b631edd net/mlx5: Set ODP capabilities for DC transport to max
>
> Thanks,
> Jason
* [net-next 04/15] igc: Remove useless forward declaration
From: Jeff Kirsher @ 2019-08-28 6:43 UTC
To: davem; +Cc: Sasha Neftin, netdev, nhorman, sassmann, Aaron Brown,
Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>
From: Sasha Neftin <sasha.neftin@intel.com>
Move igc_phy_setup_autoneg, igc_wait_autoneg and igc_set_fc_watermarks
up to avoid forward declaration.
It is not necessary to forward declare these static methods.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/igc/igc_mac.c | 73 +++++----
drivers/net/ethernet/intel/igc/igc_phy.c | 192 +++++++++++------------
2 files changed, 129 insertions(+), 136 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_mac.c b/drivers/net/ethernet/intel/igc/igc_mac.c
index ba4646737288..5eeb4c8caf4a 100644
--- a/drivers/net/ethernet/intel/igc/igc_mac.c
+++ b/drivers/net/ethernet/intel/igc/igc_mac.c
@@ -7,9 +7,6 @@
#include "igc_mac.h"
#include "igc_hw.h"
-/* forward declaration */
-static s32 igc_set_fc_watermarks(struct igc_hw *hw);
-
/**
* igc_disable_pcie_master - Disables PCI-express master access
* @hw: pointer to the HW structure
@@ -74,6 +71,41 @@ void igc_init_rx_addrs(struct igc_hw *hw, u16 rar_count)
hw->mac.ops.rar_set(hw, mac_addr, i);
}
+/**
+ * igc_set_fc_watermarks - Set flow control high/low watermarks
+ * @hw: pointer to the HW structure
+ *
+ * Sets the flow control high/low threshold (watermark) registers. If
+ * flow control XON frame transmission is enabled, then set XON frame
+ * transmission as well.
+ */
+static s32 igc_set_fc_watermarks(struct igc_hw *hw)
+{
+ u32 fcrtl = 0, fcrth = 0;
+
+ /* Set the flow control receive threshold registers. Normally,
+ * these registers will be set to a default threshold that may be
+ * adjusted later by the driver's runtime code. However, if the
+ * ability to transmit pause frames is not enabled, then these
+ * registers will be set to 0.
+ */
+ if (hw->fc.current_mode & igc_fc_tx_pause) {
+ /* We need to set up the Receive Threshold high and low water
+ * marks as well as (optionally) enabling the transmission of
+ * XON frames.
+ */
+ fcrtl = hw->fc.low_water;
+ if (hw->fc.send_xon)
+ fcrtl |= IGC_FCRTL_XONE;
+
+ fcrth = hw->fc.high_water;
+ }
+ wr32(IGC_FCRTL, fcrtl);
+ wr32(IGC_FCRTH, fcrth);
+
+ return 0;
+}
+
/**
* igc_setup_link - Setup flow control and link settings
* @hw: pointer to the HW structure
@@ -194,41 +226,6 @@ s32 igc_force_mac_fc(struct igc_hw *hw)
return ret_val;
}
-/**
- * igc_set_fc_watermarks - Set flow control high/low watermarks
- * @hw: pointer to the HW structure
- *
- * Sets the flow control high/low threshold (watermark) registers. If
- * flow control XON frame transmission is enabled, then set XON frame
- * transmission as well.
- */
-static s32 igc_set_fc_watermarks(struct igc_hw *hw)
-{
- u32 fcrtl = 0, fcrth = 0;
-
- /* Set the flow control receive threshold registers. Normally,
- * these registers will be set to a default threshold that may be
- * adjusted later by the driver's runtime code. However, if the
- * ability to transmit pause frames is not enabled, then these
- * registers will be set to 0.
- */
- if (hw->fc.current_mode & igc_fc_tx_pause) {
- /* We need to set up the Receive Threshold high and low water
- * marks as well as (optionally) enabling the transmission of
- * XON frames.
- */
- fcrtl = hw->fc.low_water;
- if (hw->fc.send_xon)
- fcrtl |= IGC_FCRTL_XONE;
-
- fcrth = hw->fc.high_water;
- }
- wr32(IGC_FCRTL, fcrtl);
- wr32(IGC_FCRTH, fcrth);
-
- return 0;
-}
-
/**
* igc_clear_hw_cntrs_base - Clear base hardware counters
* @hw: pointer to the HW structure
diff --git a/drivers/net/ethernet/intel/igc/igc_phy.c b/drivers/net/ethernet/intel/igc/igc_phy.c
index 4c8f96a9a148..f4b05af0dd2f 100644
--- a/drivers/net/ethernet/intel/igc/igc_phy.c
+++ b/drivers/net/ethernet/intel/igc/igc_phy.c
@@ -3,10 +3,6 @@
#include "igc_phy.h"
-/* forward declaration */
-static s32 igc_phy_setup_autoneg(struct igc_hw *hw);
-static s32 igc_wait_autoneg(struct igc_hw *hw);
-
/**
* igc_check_reset_block - Check if PHY reset is blocked
* @hw: pointer to the HW structure
@@ -207,100 +203,6 @@ s32 igc_phy_hw_reset(struct igc_hw *hw)
return ret_val;
}
-/**
- * igc_copper_link_autoneg - Setup/Enable autoneg for copper link
- * @hw: pointer to the HW structure
- *
- * Performs initial bounds checking on autoneg advertisement parameter, then
- * configure to advertise the full capability. Setup the PHY to autoneg
- * and restart the negotiation process between the link partner. If
- * autoneg_wait_to_complete, then wait for autoneg to complete before exiting.
- */
-static s32 igc_copper_link_autoneg(struct igc_hw *hw)
-{
- struct igc_phy_info *phy = &hw->phy;
- u16 phy_ctrl;
- s32 ret_val;
-
- /* Perform some bounds checking on the autoneg advertisement
- * parameter.
- */
- phy->autoneg_advertised &= phy->autoneg_mask;
-
- /* If autoneg_advertised is zero, we assume it was not defaulted
- * by the calling code so we set to advertise full capability.
- */
- if (phy->autoneg_advertised == 0)
- phy->autoneg_advertised = phy->autoneg_mask;
-
- hw_dbg("Reconfiguring auto-neg advertisement params\n");
- ret_val = igc_phy_setup_autoneg(hw);
- if (ret_val) {
- hw_dbg("Error Setting up Auto-Negotiation\n");
- goto out;
- }
- hw_dbg("Restarting Auto-Neg\n");
-
- /* Restart auto-negotiation by setting the Auto Neg Enable bit and
- * the Auto Neg Restart bit in the PHY control register.
- */
- ret_val = phy->ops.read_reg(hw, PHY_CONTROL, &phy_ctrl);
- if (ret_val)
- goto out;
-
- phy_ctrl |= (MII_CR_AUTO_NEG_EN | MII_CR_RESTART_AUTO_NEG);
- ret_val = phy->ops.write_reg(hw, PHY_CONTROL, phy_ctrl);
- if (ret_val)
- goto out;
-
- /* Does the user want to wait for Auto-Neg to complete here, or
- * check at a later time (for example, callback routine).
- */
- if (phy->autoneg_wait_to_complete) {
- ret_val = igc_wait_autoneg(hw);
- if (ret_val) {
- hw_dbg("Error while waiting for autoneg to complete\n");
- goto out;
- }
- }
-
- hw->mac.get_link_status = true;
-
-out:
- return ret_val;
-}
-
-/**
- * igc_wait_autoneg - Wait for auto-neg completion
- * @hw: pointer to the HW structure
- *
- * Waits for auto-negotiation to complete or for the auto-negotiation time
- * limit to expire, which ever happens first.
- */
-static s32 igc_wait_autoneg(struct igc_hw *hw)
-{
- u16 i, phy_status;
- s32 ret_val = 0;
-
- /* Break after autoneg completes or PHY_AUTO_NEG_LIMIT expires. */
- for (i = PHY_AUTO_NEG_LIMIT; i > 0; i--) {
- ret_val = hw->phy.ops.read_reg(hw, PHY_STATUS, &phy_status);
- if (ret_val)
- break;
- ret_val = hw->phy.ops.read_reg(hw, PHY_STATUS, &phy_status);
- if (ret_val)
- break;
- if (phy_status & MII_SR_AUTONEG_COMPLETE)
- break;
- msleep(100);
- }
-
- /* PHY_AUTO_NEG_TIME expiration doesn't guarantee auto-negotiation
- * has completed.
- */
- return ret_val;
-}
-
/**
* igc_phy_setup_autoneg - Configure PHY for auto-negotiation
* @hw: pointer to the HW structure
@@ -485,6 +387,100 @@ static s32 igc_phy_setup_autoneg(struct igc_hw *hw)
return ret_val;
}
+/**
+ * igc_wait_autoneg - Wait for auto-neg completion
+ * @hw: pointer to the HW structure
+ *
+ * Waits for auto-negotiation to complete or for the auto-negotiation time
+ * limit to expire, which ever happens first.
+ */
+static s32 igc_wait_autoneg(struct igc_hw *hw)
+{
+ u16 i, phy_status;
+ s32 ret_val = 0;
+
+ /* Break after autoneg completes or PHY_AUTO_NEG_LIMIT expires. */
+ for (i = PHY_AUTO_NEG_LIMIT; i > 0; i--) {
+ ret_val = hw->phy.ops.read_reg(hw, PHY_STATUS, &phy_status);
+ if (ret_val)
+ break;
+ ret_val = hw->phy.ops.read_reg(hw, PHY_STATUS, &phy_status);
+ if (ret_val)
+ break;
+ if (phy_status & MII_SR_AUTONEG_COMPLETE)
+ break;
+ msleep(100);
+ }
+
+ /* PHY_AUTO_NEG_TIME expiration doesn't guarantee auto-negotiation
+ * has completed.
+ */
+ return ret_val;
+}
+
+/**
+ * igc_copper_link_autoneg - Setup/Enable autoneg for copper link
+ * @hw: pointer to the HW structure
+ *
+ * Performs initial bounds checking on autoneg advertisement parameter, then
+ * configure to advertise the full capability. Setup the PHY to autoneg
+ * and restart the negotiation process between the link partner. If
+ * autoneg_wait_to_complete, then wait for autoneg to complete before exiting.
+ */
+static s32 igc_copper_link_autoneg(struct igc_hw *hw)
+{
+ struct igc_phy_info *phy = &hw->phy;
+ u16 phy_ctrl;
+ s32 ret_val;
+
+ /* Perform some bounds checking on the autoneg advertisement
+ * parameter.
+ */
+ phy->autoneg_advertised &= phy->autoneg_mask;
+
+ /* If autoneg_advertised is zero, we assume it was not defaulted
+ * by the calling code so we set to advertise full capability.
+ */
+ if (phy->autoneg_advertised == 0)
+ phy->autoneg_advertised = phy->autoneg_mask;
+
+ hw_dbg("Reconfiguring auto-neg advertisement params\n");
+ ret_val = igc_phy_setup_autoneg(hw);
+ if (ret_val) {
+ hw_dbg("Error Setting up Auto-Negotiation\n");
+ goto out;
+ }
+ hw_dbg("Restarting Auto-Neg\n");
+
+ /* Restart auto-negotiation by setting the Auto Neg Enable bit and
+ * the Auto Neg Restart bit in the PHY control register.
+ */
+ ret_val = phy->ops.read_reg(hw, PHY_CONTROL, &phy_ctrl);
+ if (ret_val)
+ goto out;
+
+ phy_ctrl |= (MII_CR_AUTO_NEG_EN | MII_CR_RESTART_AUTO_NEG);
+ ret_val = phy->ops.write_reg(hw, PHY_CONTROL, phy_ctrl);
+ if (ret_val)
+ goto out;
+
+ /* Does the user want to wait for Auto-Neg to complete here, or
+ * check at a later time (for example, callback routine).
+ */
+ if (phy->autoneg_wait_to_complete) {
+ ret_val = igc_wait_autoneg(hw);
+ if (ret_val) {
+ hw_dbg("Error while waiting for autoneg to complete\n");
+ goto out;
+ }
+ }
+
+ hw->mac.get_link_status = true;
+
+out:
+ return ret_val;
+}
+
/**
* igc_setup_copper_link - Configure copper link settings
* @hw: pointer to the HW structure
--
2.21.0
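
For readers skimming the diff, the polling pattern that igc_wait_autoneg
uses can be modelled outside the kernel roughly as below. This is only a
userspace sketch, not the driver code: fake_phy, fake_read_status,
STATUS_AUTONEG_COMPLETE and POLL_LIMIT are hypothetical stand-ins, and the
100 ms sleep between polls is left out. The model keeps the driver's double
read of the status register, but does not try to explain it.

```c
#include <stdint.h>

#define STATUS_AUTONEG_COMPLETE 0x0020u /* hypothetical "autoneg done" bit */
#define POLL_LIMIT 45                   /* hypothetical iteration budget */

/* Hypothetical PHY model: reports completion from read number ready_after on. */
struct fake_phy {
	int reads;
	int ready_after;
	int fail_on_read; /* 0 = never fail; otherwise fail on that read */
};

static int fake_read_status(struct fake_phy *phy, uint16_t *status)
{
	phy->reads++;
	if (phy->fail_on_read && phy->reads == phy->fail_on_read)
		return -1;
	*status = (phy->reads >= phy->ready_after) ? STATUS_AUTONEG_COMPLETE : 0;
	return 0;
}

/* Returns 0 unless a register read fails. As in the patch, running out of
 * iterations is not reported as an error, so the caller cannot tell
 * "completed" from "budget exhausted" by the return value alone.
 */
static int wait_autoneg(struct fake_phy *phy, int *polls)
{
	uint16_t status = 0;
	int ret = 0;
	int i;

	*polls = 0;
	for (i = POLL_LIMIT; i > 0; i--) {
		(*polls)++;
		ret = fake_read_status(phy, &status);
		if (ret)
			break;
		ret = fake_read_status(phy, &status);
		if (ret)
			break;
		if (status & STATUS_AUTONEG_COMPLETE)
			break;
		/* the driver sleeps 100 ms here; omitted in this model */
	}
	return ret;
}
```

Note that a caller which needs to know whether negotiation really finished
has to check the link state separately, which is what the comment before the
return statement in the patch points out.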
^ permalink raw reply related
* [net-next 05/15] Documentation: iavf: Update the Intel LAN driver doc for iavf
From: Jeff Kirsher @ 2019-08-28 6:43 UTC (permalink / raw)
To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, Aaron Brown
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>
Update the LAN driver documentation to include the latest feature
implementation and driver capabilities.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
---
.../networking/device_drivers/intel/iavf.rst | 115 +++++++++++++-----
1 file changed, 82 insertions(+), 33 deletions(-)
diff --git a/Documentation/networking/device_drivers/intel/iavf.rst b/Documentation/networking/device_drivers/intel/iavf.rst
index 2d0c3baa1752..cfc08842e32c 100644
--- a/Documentation/networking/device_drivers/intel/iavf.rst
+++ b/Documentation/networking/device_drivers/intel/iavf.rst
@@ -10,11 +10,15 @@ Copyright(c) 2013-2018 Intel Corporation.
Contents
========
+- Overview
- Identifying Your Adapter
- Additional Configurations
- Known Issues/Troubleshooting
- Support
+Overview
+========
+
This file describes the iavf Linux* Base Driver. This driver was formerly
called i40evf.
@@ -27,6 +31,7 @@ The guest OS loading the iavf driver must support MSI-X interrupts.
Identifying Your Adapter
========================
+
The driver in this kernel is compatible with devices based on the following:
* Intel(R) XL710 X710 Virtual Function
* Intel(R) X722 Virtual Function
@@ -50,9 +55,10 @@ Link messages will not be displayed to the console if the distribution is
restricting system messages. In order to see network driver link messages on
your console, set dmesg to eight by entering the following::
- dmesg -n 8
+ # dmesg -n 8
-NOTE: This setting is not saved across reboots.
+NOTE:
+ This setting is not saved across reboots.
ethtool
-------
@@ -72,11 +78,11 @@ then requests from that VF to set VLAN tag stripping will be ignored.
To enable/disable VLAN tag stripping for a VF, issue the following command
from inside the VM in which you are running the VF::
- ethtool -K <if_name> rxvlan on/off
+ # ethtool -K <if_name> rxvlan on/off
or alternatively::
- ethtool --offload <if_name> rxvlan on/off
+ # ethtool --offload <if_name> rxvlan on/off
Adaptive Virtual Function
-------------------------
@@ -91,21 +97,21 @@ additional features depending on what features are available in the PF with
which the AVF is associated. The following are base mode features:
- 4 Queue Pairs (QP) and associated Configuration Status Registers (CSRs)
- for Tx/Rx.
-- i40e descriptors and ring format.
-- Descriptor write-back completion.
-- 1 control queue, with i40e descriptors, CSRs and ring format.
-- 5 MSI-X interrupt vectors and corresponding i40e CSRs.
-- 1 Interrupt Throttle Rate (ITR) index.
-- 1 Virtual Station Interface (VSI) per VF.
+ for Tx/Rx
+- i40e descriptors and ring format
+- Descriptor write-back completion
+- 1 control queue, with i40e descriptors, CSRs and ring format
+- 5 MSI-X interrupt vectors and corresponding i40e CSRs
+- 1 Interrupt Throttle Rate (ITR) index
+- 1 Virtual Station Interface (VSI) per VF
- 1 Traffic Class (TC), TC0
- Receive Side Scaling (RSS) with 64 entry indirection table and key,
- configured through the PF.
-- 1 unicast MAC address reserved per VF.
-- 16 MAC address filters for each VF.
-- Stateless offloads - non-tunneled checksums.
-- AVF device ID.
-- HW mailbox is used for VF to PF communications (including on Windows).
+ configured through the PF
+- 1 unicast MAC address reserved per VF
+- 16 MAC address filters for each VF
+- Stateless offloads - non-tunneled checksums
+- AVF device ID
+- HW mailbox is used for VF to PF communications (including on Windows)
IEEE 802.1ad (QinQ) Support
---------------------------
@@ -117,8 +123,8 @@ VLAN ID, among other uses.
The following are examples of how to configure 802.1ad (QinQ)::
- ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
- ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371
+ # ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
+ # ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371
Where "24" and "371" are example VLAN IDs.
@@ -133,6 +139,19 @@ specific application. This can reduce latency for the specified application,
and allow Tx traffic to be rate limited per application. Follow the steps below
to set ADq.
+Requirements:
+
+- The sch_mqprio, act_mirred and cls_flower modules must be loaded
+- The latest version of iproute2
+- If another driver (for example, DPDK) has set cloud filters, you cannot
+ enable ADQ
+- Depending on the underlying PF device, ADQ cannot be enabled when the
+ following features are enabled:
+
+ + Data Center Bridging (DCB)
+ + Multiple Functions per Port (MFP)
+ + Sideband Filters
+
1. Create traffic classes (TCs). Maximum of 8 TCs can be created per interface.
The shaper bw_rlimit parameter is optional.
@@ -141,9 +160,9 @@ to 1Gbit for tc0 and 3Gbit for tc1.
::
- # tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1
- queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit min_rate 1Gbit 2Gbit
- max_rate 1Gbit 3Gbit
+ tc qdisc add dev <interface> root mqprio num_tc 2 map 0 0 0 0 1 1 1 1
+ queues 16@0 16@16 hw 1 mode channel shaper bw_rlimit min_rate 1Gbit 2Gbit
+ max_rate 1Gbit 3Gbit
map: priority mapping for up to 16 priorities to tcs (e.g. map 0 0 0 0 1 1 1 1
sets priorities 0-3 to use tc0 and 4-7 to use tc1)
@@ -162,6 +181,10 @@ Totals must be equal or less than port speed.
For example: min_rate 1Gbit 3Gbit: Verify bandwidth limit using network
monitoring tools such as ifstat or sar –n DEV [interval] [number of samples]
+NOTE:
+ Setting up channels via ethtool (ethtool -L) is not supported when the
+ TCs are configured using mqprio.
+
2. Enable HW TC offload on interface::
# ethtool -K <interface> hw-tc-offload on
@@ -171,16 +194,16 @@ monitoring tools such as ifstat or sar –n DEV [interval] [number of samples]
# tc qdisc add dev <interface> ingress
NOTES:
- - Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory.
- - ADq is not compatible with cloud filters.
+ - Run all tc commands from the iproute2 <pathtoiproute2>/tc/ directory
+ - ADq is not compatible with cloud filters
- Setting up channels via ethtool (ethtool -L) is not supported when the TCs
- are configured using mqprio.
+ are configured using mqprio
- You must have iproute2 latest version
- - NVM version 6.01 or later is required.
+ - NVM version 6.01 or later is required
- ADq cannot be enabled when any the following features are enabled: Data
- Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband Filters.
+ Center Bridging (DCB), Multiple Functions per Port (MFP), or Sideband Filters
- If another driver (for example, DPDK) has set cloud filters, you cannot
- enable ADq.
+ enable ADq
- Tunnel filters are not supported in ADq. If encapsulated packets do arrive
in non-tunnel mode, filtering will be done on the inner headers. For example,
for VXLAN traffic in non-tunnel mode, PCTYPE is identified as a VXLAN
@@ -198,6 +221,16 @@ NOTES:
Known Issues/Troubleshooting
============================
+Bonding fails with VFs bound to an Intel(R) Ethernet Controller 700 series device
+---------------------------------------------------------------------------------
+If you bind Virtual Functions (VFs) to an Intel(R) Ethernet Controller 700
+series based device, the VF slaves may fail when they become the active slave.
+If the MAC address of the VF is set by the PF (Physical Function) of the
+device, when you add a slave, or change the active-backup slave, Linux bonding
+tries to sync the backup slave's MAC address to the same MAC address as the
+active slave. Linux bonding will fail at this point. This issue will not occur
+if the VF's MAC address is not set by the PF.
+
Traffic Is Not Being Passed Between VM and Client
-------------------------------------------------
You may not be able to pass traffic between a client system and a
@@ -215,13 +248,28 @@ Do not unload a port's driver if a Virtual Function (VF) with an active Virtual
Machine (VM) is bound to it. Doing so will cause the port to appear to hang.
Once the VM shuts down, or otherwise releases the VF, the command will complete.
+Using four traffic classes fails
+--------------------------------
+Do not try to reserve more than three traffic classes in the iavf driver. Doing
+so will fail to set any traffic classes and will cause the driver to write
+errors to stdout. Use a maximum of three queues to avoid this issue.
+
+Multiple log error messages on iavf driver removal
+--------------------------------------------------
+If you have several VFs and you remove the iavf driver, several instances of
+the following log errors are written to the log::
+
+ Unable to send opcode 2 to PF, err I40E_ERR_QUEUE_EMPTY, aq_err ok
+ Unable to send the message to VF 2 aq_err 12
+ ARQ Overflow Error detected
+
Virtual machine does not get link
---------------------------------
If the virtual machine has more than one virtual port assigned to it, and those
virtual ports are bound to different physical ports, you may not get link on
all of the virtual ports. The following command may work around the issue::
- ethtool -r <PF>
+ # ethtool -r <PF>
Where <PF> is the PF interface in the host, for example: p5p1. You may need to
run the command more than once to get link on all virtual ports.
@@ -251,12 +299,13 @@ traffic.
If you have multiple interfaces in a server, either turn on ARP filtering by
entering::
- echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
+ # echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
-NOTE: This setting is not saved across reboots. The configuration change can be
-made permanent by adding the following line to the file /etc/sysctl.conf::
+NOTE:
+ This setting is not saved across reboots. The configuration change can be
+ made permanent by adding the following line to the file /etc/sysctl.conf::
- net.ipv4.conf.all.arp_filter = 1
+ net.ipv4.conf.all.arp_filter = 1
Another alternative is to install the interfaces in separate broadcast domains
(either in different switches or in a switch partitioned to VLANs).
--
2.21.0
^ permalink raw reply related
* [net-next 08/15] iavf: allow permanent MAC address to change
From: Jeff Kirsher @ 2019-08-28 6:44 UTC (permalink / raw)
To: davem
Cc: Mitch Williams, netdev, nhorman, sassmann, Andrew Bowers,
Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>
From: Mitch Williams <mitch.a.williams@intel.com>
Allow the VF to override the "permanent" MAC address set by the host.
This allows bonding to work in the case where the administrator has set
the VF MAC.
Note that the VF must still be set to Trusted on the host if this change
is to be accepted by the PF driver.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/iavf/iavf.h | 1 -
drivers/net/ethernet/intel/iavf/iavf_main.c | 4 ----
2 files changed, 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index 9fc635d816d2..29de3ae96ef2 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -253,7 +253,6 @@ struct iavf_adapter {
#define IAVF_FLAG_RESET_PENDING BIT(4)
#define IAVF_FLAG_RESET_NEEDED BIT(5)
#define IAVF_FLAG_WB_ON_ITR_CAPABLE BIT(6)
-#define IAVF_FLAG_ADDR_SET_BY_PF BIT(8)
#define IAVF_FLAG_SERVICE_CLIENT_REQUESTED BIT(9)
#define IAVF_FLAG_CLIENT_NEEDS_OPEN BIT(10)
#define IAVF_FLAG_CLIENT_NEEDS_CLOSE BIT(11)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 554aa619ff02..07f5541a0f01 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -790,9 +790,6 @@ static int iavf_set_mac(struct net_device *netdev, void *p)
if (ether_addr_equal(netdev->dev_addr, addr->sa_data))
return 0;
- if (adapter->flags & IAVF_FLAG_ADDR_SET_BY_PF)
- return -EPERM;
-
spin_lock_bh(&adapter->mac_vlan_list_lock);
f = iavf_find_filter(adapter, hw->mac.addr);
@@ -1811,7 +1808,6 @@ static int iavf_init_get_resources(struct iavf_adapter *adapter)
eth_hw_addr_random(netdev);
ether_addr_copy(adapter->hw.mac.addr, netdev->dev_addr);
} else {
- adapter->flags |= IAVF_FLAG_ADDR_SET_BY_PF;
ether_addr_copy(netdev->dev_addr, adapter->hw.mac.addr);
ether_addr_copy(netdev->perm_addr, adapter->hw.mac.addr);
}
--
2.21.0
^ permalink raw reply related
* [net-next 07/15] igc: Add NVM checksum validation
From: Jeff Kirsher @ 2019-08-28 6:43 UTC (permalink / raw)
To: davem; +Cc: Sasha Neftin, netdev, nhorman, sassmann, Aaron Brown,
Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>
From: Sasha Neftin <sasha.neftin@intel.com>
Add NVM checksum validation to the probe function.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/igc/igc_main.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 251552855c40..965d1c939f0f 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -4133,6 +4133,15 @@ static int igc_probe(struct pci_dev *pdev,
*/
hw->mac.ops.reset_hw(hw);
+ if (igc_get_flash_presence_i225(hw)) {
+ if (hw->nvm.ops.validate(hw) < 0) {
+ dev_err(&pdev->dev,
+ "The NVM Checksum Is Not Valid\n");
+ err = -EIO;
+ goto err_eeprom;
+ }
+ }
+
if (eth_platform_get_mac_address(&pdev->dev, hw->mac.addr)) {
/* copy the MAC address out of the NVM */
if (hw->mac.ops.read_mac_addr(hw))
--
2.21.0
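
The validate callback itself is not part of this patch. As a rough
illustration of what such a check usually looks like in Intel's
e1000-family drivers, the sketch below sums a fixed range of 16-bit NVM
words and compares the result with a magic value. The constants (64 words,
checksum word at 0x3F, target sum 0xBABA) and the in-memory word array are
assumptions made for the example; the igc code reads words through
hw->nvm.ops and may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

#define NVM_WORDS        64      /* assumed: words 0x00..0x3F are covered */
#define NVM_CHECKSUM_IDX 0x3F    /* assumed: last covered word holds the checksum */
#define NVM_SUM          0xBABAu /* assumed target sum */

/* Sum all covered words modulo 2^16. */
static uint16_t nvm_sum(const uint16_t *words)
{
	uint16_t sum = 0;
	size_t i;

	for (i = 0; i < NVM_WORDS; i++)
		sum = (uint16_t)(sum + words[i]);
	return sum;
}

/* Returns 0 when the covered words sum to NVM_SUM, -1 otherwise. */
static int nvm_validate(const uint16_t *words)
{
	return nvm_sum(words) == NVM_SUM ? 0 : -1;
}

/* Recompute the checksum word so that the covered words sum to NVM_SUM. */
static void nvm_update(uint16_t *words)
{
	words[NVM_CHECKSUM_IDX] = 0;
	words[NVM_CHECKSUM_IDX] = (uint16_t)(NVM_SUM - nvm_sum(words));
}
```

In the patch, a negative return from the validate callback makes igc_probe
fail with -EIO, so an image that fails this kind of check prevents the
device from coming up.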
^ permalink raw reply related
* [net-next 06/15] fm10k: use a local variable for the frag pointer
From: Jeff Kirsher @ 2019-08-28 6:43 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, Andrew Bowers,
Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
In the function fm10k_xmit_frame_ring, we recently switched to using
the skb_frag_size accessor instead of directly using the size member of
the skb fragment.
This made the for loop slightly harder to read because it created a very
long line that is difficult to split up. Avoid this by using a local
variable in the for loop, so that we do not have to break the line on an
open parenthesis.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index e0a2be534b20..2be9222510e7 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -1073,9 +1073,11 @@ netdev_tx_t fm10k_xmit_frame_ring(struct sk_buff *skb,
* + 2 desc gap to keep tail from touching head
* otherwise try next time
*/
- for (f = 0; f < skb_shinfo(skb)->nr_frags; f++)
- count += TXD_USE_COUNT(skb_frag_size(
- &skb_shinfo(skb)->frags[f]));
+ for (f = 0; f < skb_shinfo(skb)->nr_frags; f++) {
+ skb_frag_t *frag = &skb_shinfo(skb)->frags[f];
+
+ count += TXD_USE_COUNT(skb_frag_size(frag));
+ }
if (fm10k_maybe_stop_tx(tx_ring, count + 3)) {
tx_ring->tx_stats.tx_busy++;
--
2.21.0
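
To show the shape of the rewritten loop in isolation, here is a small
userspace sketch. It is not the fm10k code: struct frag and frag_size are
hypothetical stand-ins for skb_frag_t and skb_frag_size, and the 16 KB
per-descriptor limit behind TXD_USE_COUNT is an assumption chosen for the
example.

```c
#include <stddef.h>

#define MAX_DATA_PER_TXD (1u << 14) /* assumed per-descriptor data limit */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define TXD_USE_COUNT(s) DIV_ROUND_UP((s), MAX_DATA_PER_TXD)

/* Hypothetical stand-in for skb_frag_t: only the size matters here. */
struct frag {
	unsigned int size;
};

static unsigned int frag_size(const struct frag *frag)
{
	return frag->size;
}

/* Count descriptors for the linear part plus every fragment. */
static unsigned int count_descriptors(unsigned int headlen,
				      const struct frag *frags, size_t nr_frags)
{
	unsigned int count = TXD_USE_COUNT(headlen);
	size_t f;

	for (f = 0; f < nr_frags; f++) {
		/* the local pointer keeps the accessor call on one short line */
		const struct frag *frag = &frags[f];

		count += TXD_USE_COUNT(frag_size(frag));
	}
	return count;
}
```

The local variable does not change the result; it only avoids the line
break inside the macro argument that the commit message complains about.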
^ permalink raw reply related
* [net-next 09/15] igc: Remove unneeded PCI bus defines
From: Jeff Kirsher @ 2019-08-28 6:44 UTC (permalink / raw)
To: davem; +Cc: Sasha Neftin, netdev, nhorman, sassmann, Aaron Brown,
Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>
From: Sasha Neftin <sasha.neftin@intel.com>
The PCIe Device Control 2 defines are not used internally.
This patch removes them.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/igc/igc_defines.h | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
index 11b99acf4abe..549134ecd105 100644
--- a/drivers/net/ethernet/intel/igc/igc_defines.h
+++ b/drivers/net/ethernet/intel/igc/igc_defines.h
@@ -10,10 +10,6 @@
#define IGC_CTRL_EXT_DRV_LOAD 0x10000000 /* Drv loaded bit for FW */
-/* PCI Bus Info */
-#define PCIE_DEVICE_CONTROL2 0x28
-#define PCIE_DEVICE_CONTROL2_16ms 0x0005
-
/* Physical Func Reset Done Indication */
#define IGC_CTRL_EXT_LINK_MODE_MASK 0x00C00000
--
2.21.0
^ permalink raw reply related
* [net-next 10/15] i40e: fix hw_dbg usage in i40e_hmc_get_object_va
From: Jeff Kirsher @ 2019-08-28 6:44 UTC (permalink / raw)
To: davem
Cc: Mauro S. M. Rodrigues, netdev, nhorman, sassmann, Andrew Bowers,
Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>
From: "Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com>
i40e_hmc_get_object_va() references an i40e_hw pointer as a parameter to
hw_dbg, but no such pointer exists in the function's scope.
Fix this by changing the function's parameter from i40e_hmc_info to i40e_hw,
from which the necessary i40e_hmc_info can be retrieved.
Signed-off-by: "Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c b/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c
index 994011c38fb4..f059de33a0fd 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_lan_hmc.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright(c) 2013 - 2018 Intel Corporation. */
+#include "i40e.h"
#include "i40e_osdep.h"
#include "i40e_register.h"
#include "i40e_type.h"
@@ -963,7 +964,7 @@ static i40e_status i40e_set_hmc_context(u8 *context_bytes,
/**
* i40e_hmc_get_object_va - retrieves an object's virtual address
- * @hmc_info: pointer to i40e_hmc_info struct
+ * @hw: the hardware struct, from which we obtain the i40e_hmc_info pointer
* @object_base: pointer to u64 to get the va
* @rsrc_type: the hmc resource type
* @obj_idx: hmc object index
@@ -972,7 +973,7 @@ static i40e_status i40e_set_hmc_context(u8 *context_bytes,
* base pointer. This function is used for LAN Queue contexts.
**/
static
-i40e_status i40e_hmc_get_object_va(struct i40e_hmc_info *hmc_info,
+i40e_status i40e_hmc_get_object_va(struct i40e_hw *hw,
u8 **object_base,
enum i40e_hmc_lan_rsrc_type rsrc_type,
u32 obj_idx)
@@ -982,6 +983,7 @@ i40e_status i40e_hmc_get_object_va(struct i40e_hmc_info *hmc_info,
struct i40e_hmc_sd_entry *sd_entry;
struct i40e_hmc_pd_entry *pd_entry;
u32 pd_idx, pd_lmt, rel_pd_idx;
+ struct i40e_hmc_info *hmc_info = &hw->hmc;
u64 obj_offset_in_fpm;
u32 sd_idx, sd_lmt;
@@ -1047,7 +1049,7 @@ i40e_status i40e_clear_lan_tx_queue_context(struct i40e_hw *hw,
i40e_status err;
u8 *context_bytes;
- err = i40e_hmc_get_object_va(&hw->hmc, &context_bytes,
+ err = i40e_hmc_get_object_va(hw, &context_bytes,
I40E_HMC_LAN_TX, queue);
if (err < 0)
return err;
@@ -1068,7 +1070,7 @@ i40e_status i40e_set_lan_tx_queue_context(struct i40e_hw *hw,
i40e_status err;
u8 *context_bytes;
- err = i40e_hmc_get_object_va(&hw->hmc, &context_bytes,
+ err = i40e_hmc_get_object_va(hw, &context_bytes,
I40E_HMC_LAN_TX, queue);
if (err < 0)
return err;
@@ -1088,7 +1090,7 @@ i40e_status i40e_clear_lan_rx_queue_context(struct i40e_hw *hw,
i40e_status err;
u8 *context_bytes;
- err = i40e_hmc_get_object_va(&hw->hmc, &context_bytes,
+ err = i40e_hmc_get_object_va(hw, &context_bytes,
I40E_HMC_LAN_RX, queue);
if (err < 0)
return err;
@@ -1109,7 +1111,7 @@ i40e_status i40e_set_lan_rx_queue_context(struct i40e_hw *hw,
i40e_status err;
u8 *context_bytes;
- err = i40e_hmc_get_object_va(&hw->hmc, &context_bytes,
+ err = i40e_hmc_get_object_va(hw, &context_bytes,
I40E_HMC_LAN_RX, queue);
if (err < 0)
return err;
--
2.21.0
^ permalink raw reply related
* [net-next 13/15] ixgbe: sync the first fragment unconditionally
From: Jeff Kirsher @ 2019-08-28 6:44 UTC (permalink / raw)
To: davem
Cc: Firo Yang, netdev, nhorman, sassmann, Alexander Duyck,
Andrew Bowers, Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>
From: Firo Yang <firo.yang@suse.com>
In a Xen environment, if Xen-swiotlb is enabled, the ixgbe driver
could allocate a page (a DMA memory buffer) for the first
fragment that is not suitable for Xen-swiotlb to do DMA operations on.
Xen-swiotlb then has to internally allocate another page for doing DMA
operations. This mechanism requires syncing the data from the internal
page to the page which ixgbe sends to the upper network stack. However,
since commit f3213d932173 ("ixgbe: Update driver to make use of DMA
attributes in Rx path"), the unmap operation is performed with
DMA_ATTR_SKIP_CPU_SYNC. As a result, the sync is not performed.
Since the sync isn't performed, the upper network stack could receive
an incomplete network packet. Here, incomplete means that the linear data
on the first fragment (between skb->head and skb->end) is invalid. So
we have to copy the data from the internal Xen-swiotlb page to the page
which ixgbe sends to the upper network stack through the sync operation.
More details from Alexander Duyck:
Specifically since we are mapping the frame with
DMA_ATTR_SKIP_CPU_SYNC we have to unmap with that as well. As a result
a sync is not performed on an unmap and must be done manually as we
skipped it for the first frag. As such we need to always sync before
possibly performing a page unmap operation.
Fixes: f3213d932173 ("ixgbe: Update driver to make use of DMA
attributes in Rx path")
Signed-off-by: Firo Yang <firo.yang@suse.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 17b7ae9f46ec..f5fc5929a15d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1825,13 +1825,7 @@ static void ixgbe_pull_tail(struct ixgbe_ring *rx_ring,
static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
struct sk_buff *skb)
{
- /* if the page was released unmap it, else just sync our portion */
- if (unlikely(IXGBE_CB(skb)->page_released)) {
- dma_unmap_page_attrs(rx_ring->dev, IXGBE_CB(skb)->dma,
- ixgbe_rx_pg_size(rx_ring),
- DMA_FROM_DEVICE,
- IXGBE_RX_DMA_ATTR);
- } else if (ring_uses_build_skb(rx_ring)) {
+ if (ring_uses_build_skb(rx_ring)) {
unsigned long offset = (unsigned long)(skb->data) & ~PAGE_MASK;
dma_sync_single_range_for_cpu(rx_ring->dev,
@@ -1848,6 +1842,14 @@ static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
skb_frag_size(frag),
DMA_FROM_DEVICE);
}
+
+ /* If the page was released, just unmap it. */
+ if (unlikely(IXGBE_CB(skb)->page_released)) {
+ dma_unmap_page_attrs(rx_ring->dev, IXGBE_CB(skb)->dma,
+ ixgbe_rx_pg_size(rx_ring),
+ DMA_FROM_DEVICE,
+ IXGBE_RX_DMA_ATTR);
+ }
}
/**
--
2.21.0
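
The effect of the reordering can be illustrated with a toy model of a
bounce-buffered mapping. Everything below is hypothetical (struct mapping,
sync_for_cpu, unmap_skip_sync and both sync_frag variants are stand-ins,
not kernel APIs); it only assumes, as the commit message describes, that
the device writes into an internal page and that an unmap performed with
DMA_ATTR_SKIP_CPU_SYNC does not copy that data back.

```c
#include <stdbool.h>
#include <string.h>

#define BUF_LEN 8

/* Hypothetical bounce-buffered mapping: the device writes into bounce[],
 * and the CPU only sees the data once it has been copied to page[].
 */
struct mapping {
	char bounce[BUF_LEN];
	char page[BUF_LEN];
	bool mapped;
};

/* Stand-in for a sync-for-CPU call: copy device data to the CPU page. */
static void sync_for_cpu(struct mapping *m)
{
	memcpy(m->page, m->bounce, BUF_LEN);
}

/* Stand-in for an unmap that skips the CPU sync: no copy is made. */
static void unmap_skip_sync(struct mapping *m)
{
	m->mapped = false;
}

/* Old ordering: a released page was unmapped without any sync. */
static void sync_frag_old(struct mapping *m, bool page_released)
{
	if (page_released)
		unmap_skip_sync(m);
	else
		sync_for_cpu(m);
}

/* New ordering: always sync first, then unmap if the page was released. */
static void sync_frag_new(struct mapping *m, bool page_released)
{
	sync_for_cpu(m);
	if (page_released)
		unmap_skip_sync(m);
}
```

In this model the old ordering leaves page[] stale whenever the page was
released, while the new ordering always copies the data before the mapping
goes away.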
^ permalink raw reply related
* [net-next 12/15] i40e: Remove EMPR traces from debugfs facility
From: Jeff Kirsher @ 2019-08-28 6:44 UTC (permalink / raw)
To: davem
Cc: Mauro S. M. Rodrigues, netdev, nhorman, sassmann, Andrew Bowers,
Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>
From: "Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com>
Since commit 5098850c9b9b ("i40e/i40evf: i40e_register.h updates"),
it is no longer possible to trigger an EMP Reset from debugfs. The command
can still be issued, however, and it ends in a bad reset request:
echo empr > /sys/kernel/debug/i40e/0002\:01\:00.1/command
i40e 0002:01:00.1: debugfs: forcing EMPR
i40e 0002:01:00.1: bad reset request 0x00010000
So remove this piece of code. Issuing "empr" now shows the list of valid
commands, as happens when any other invalid command is issued.
Signed-off-by: "Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e.h | 1 -
drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 4 ----
2 files changed, 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 3e535d3263b3..f1a1bd324b50 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -131,7 +131,6 @@ enum i40e_state_t {
__I40E_PF_RESET_REQUESTED,
__I40E_CORE_RESET_REQUESTED,
__I40E_GLOBAL_RESET_REQUESTED,
- __I40E_EMP_RESET_REQUESTED,
__I40E_EMP_RESET_INTR_RECEIVED,
__I40E_SUSPENDED,
__I40E_PTP_TX_IN_PROGRESS,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 41232898d8ae..99ea543dd245 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -1125,10 +1125,6 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
dev_info(&pf->pdev->dev, "debugfs: forcing GlobR\n");
i40e_do_reset_safe(pf, BIT(__I40E_GLOBAL_RESET_REQUESTED));
- } else if (strncmp(cmd_buf, "empr", 4) == 0) {
- dev_info(&pf->pdev->dev, "debugfs: forcing EMPR\n");
- i40e_do_reset_safe(pf, BIT(__I40E_EMP_RESET_REQUESTED));
-
} else if (strncmp(cmd_buf, "read", 4) == 0) {
u32 address;
u32 value;
--
2.21.0
^ permalink raw reply related
* [net-next 15/15] i40e: Add support for X710 device
From: Jeff Kirsher @ 2019-08-28 6:44 UTC (permalink / raw)
To: davem
Cc: Mariusz Stachura, netdev, nhorman, sassmann, Andrew Bowers,
Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>
From: Mariusz Stachura <mariusz.stachura@intel.com>
Add I40E_DEV_ID_10G_BASE_T_BC to i40e_pci_tbl
Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index fdf43d87e983..a71369546c23 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -73,6 +73,7 @@ static const struct pci_device_id i40e_pci_tbl[] = {
{PCI_VDEVICE(INTEL, I40E_DEV_ID_QSFP_C), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_10G_BASE_T), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_10G_BASE_T4), 0},
+ {PCI_VDEVICE(INTEL, I40E_DEV_ID_10G_BASE_T_BC), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_10G_SFP), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_10G_B), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_KX_X722), 0},
--
2.21.0
^ permalink raw reply related
* [net-next 14/15] igc: Add tx_csum offload functionality
From: Jeff Kirsher @ 2019-08-28 6:44 UTC (permalink / raw)
To: davem; +Cc: Sasha Neftin, netdev, nhorman, sassmann, Aaron Brown,
Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>
From: Sasha Neftin <sasha.neftin@intel.com>
Add IP generic TX checksum offload functionality.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/igc/igc.h | 4 +
drivers/net/ethernet/intel/igc/igc_base.h | 8 ++
drivers/net/ethernet/intel/igc/igc_defines.h | 5 +
drivers/net/ethernet/intel/igc/igc_main.c | 97 ++++++++++++++++++++
4 files changed, 114 insertions(+)
diff --git a/drivers/net/ethernet/intel/igc/igc.h b/drivers/net/ethernet/intel/igc/igc.h
index 0f5534ce27b0..7e16345d836e 100644
--- a/drivers/net/ethernet/intel/igc/igc.h
+++ b/drivers/net/ethernet/intel/igc/igc.h
@@ -135,6 +135,9 @@ extern char igc_driver_version[];
/* How many Rx Buffers do we bundle into one write to the hardware ? */
#define IGC_RX_BUFFER_WRITE 16 /* Must be power of 2 */
+/* VLAN info */
+#define IGC_TX_FLAGS_VLAN_MASK 0xffff0000
+
/* igc_test_staterr - tests bits within Rx descriptor status and error fields */
static inline __le32 igc_test_staterr(union igc_adv_rx_desc *rx_desc,
const u32 stat_err_bits)
@@ -254,6 +257,7 @@ struct igc_ring {
u16 count; /* number of desc. in the ring */
u8 queue_index; /* logical index of the ring*/
u8 reg_idx; /* physical index of the ring */
+ bool launchtime_enable; /* true if LaunchTime is enabled */
/* everything past this point are written often */
u16 next_to_clean;
diff --git a/drivers/net/ethernet/intel/igc/igc_base.h b/drivers/net/ethernet/intel/igc/igc_base.h
index 58d1109d7f3f..ea627ce52525 100644
--- a/drivers/net/ethernet/intel/igc/igc_base.h
+++ b/drivers/net/ethernet/intel/igc/igc_base.h
@@ -22,6 +22,14 @@ union igc_adv_tx_desc {
} wb;
};
+/* Context descriptors */
+struct igc_adv_tx_context_desc {
+ __le32 vlan_macip_lens;
+ __le32 launch_time;
+ __le32 type_tucmd_mlhl;
+ __le32 mss_l4len_idx;
+};
+
/* Adv Transmit Descriptor Config Masks */
#define IGC_ADVTXD_MAC_TSTAMP 0x00080000 /* IEEE1588 Timestamp packet */
#define IGC_ADVTXD_DTYP_CTXT 0x00200000 /* Advanced Context Descriptor */
diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h
index 549134ecd105..f3f2325fe567 100644
--- a/drivers/net/ethernet/intel/igc/igc_defines.h
+++ b/drivers/net/ethernet/intel/igc/igc_defines.h
@@ -397,4 +397,9 @@
#define IGC_VLAPQF_P_VALID(_n) (0x1 << (3 + (_n) * 4))
#define IGC_VLAPQF_QUEUE_MASK 0x03
+#define IGC_ADVTXD_MACLEN_SHIFT 9 /* Adv ctxt desc mac len shift */
+#define IGC_ADVTXD_TUCMD_IPV4 0x00000400 /* IP Packet Type:1=IPv4 */
+#define IGC_ADVTXD_TUCMD_L4T_TCP 0x00000800 /* L4 Packet Type of TCP */
+#define IGC_ADVTXD_TUCMD_L4T_SCTP 0x00001000 /* L4 packet TYPE of SCTP */
+
#endif /* _IGC_DEFINES_H_ */
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index 965d1c939f0f..63b62d74f961 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -5,6 +5,11 @@
#include <linux/types.h>
#include <linux/if_vlan.h>
#include <linux/aer.h>
+#include <linux/tcp.h>
+#include <linux/udp.h>
+#include <linux/ip.h>
+
+#include <net/ipv6.h>
#include "igc.h"
#include "igc_hw.h"
@@ -790,8 +795,96 @@ static int igc_set_mac(struct net_device *netdev, void *p)
return 0;
}
+static void igc_tx_ctxtdesc(struct igc_ring *tx_ring,
+ struct igc_tx_buffer *first,
+ u32 vlan_macip_lens, u32 type_tucmd,
+ u32 mss_l4len_idx)
+{
+ struct igc_adv_tx_context_desc *context_desc;
+ u16 i = tx_ring->next_to_use;
+ struct timespec64 ts;
+
+ context_desc = IGC_TX_CTXTDESC(tx_ring, i);
+
+ i++;
+ tx_ring->next_to_use = (i < tx_ring->count) ? i : 0;
+
+ /* set bits to identify this as an advanced context descriptor */
+ type_tucmd |= IGC_TXD_CMD_DEXT | IGC_ADVTXD_DTYP_CTXT;
+
+ /* For 82575, context index must be unique per ring. */
+ if (test_bit(IGC_RING_FLAG_TX_CTX_IDX, &tx_ring->flags))
+ mss_l4len_idx |= tx_ring->reg_idx << 4;
+
+ context_desc->vlan_macip_lens = cpu_to_le32(vlan_macip_lens);
+ context_desc->type_tucmd_mlhl = cpu_to_le32(type_tucmd);
+ context_desc->mss_l4len_idx = cpu_to_le32(mss_l4len_idx);
+
+ /* We assume there is always a valid Tx time available. Invalid times
+ * should have been handled by the upper layers.
+ */
+ if (tx_ring->launchtime_enable) {
+ ts = ns_to_timespec64(first->skb->tstamp);
+ first->skb->tstamp = 0;
+ context_desc->launch_time = cpu_to_le32(ts.tv_nsec / 32);
+ } else {
+ context_desc->launch_time = 0;
+ }
+}
+
+static inline bool igc_ipv6_csum_is_sctp(struct sk_buff *skb)
+{
+ unsigned int offset = 0;
+
+ ipv6_find_hdr(skb, &offset, IPPROTO_SCTP, NULL, NULL);
+
+ return offset == skb_checksum_start_offset(skb);
+}
+
static void igc_tx_csum(struct igc_ring *tx_ring, struct igc_tx_buffer *first)
{
+ struct sk_buff *skb = first->skb;
+ u32 vlan_macip_lens = 0;
+ u32 type_tucmd = 0;
+
+ if (skb->ip_summed != CHECKSUM_PARTIAL) {
+csum_failed:
+ if (!(first->tx_flags & IGC_TX_FLAGS_VLAN) &&
+ !tx_ring->launchtime_enable)
+ return;
+ goto no_csum;
+ }
+
+ switch (skb->csum_offset) {
+ case offsetof(struct tcphdr, check):
+ type_tucmd = IGC_ADVTXD_TUCMD_L4T_TCP;
+ /* fall through */
+ case offsetof(struct udphdr, check):
+ break;
+ case offsetof(struct sctphdr, checksum):
+ /* validate that this is actually an SCTP request */
+ if ((first->protocol == htons(ETH_P_IP) &&
+ (ip_hdr(skb)->protocol == IPPROTO_SCTP)) ||
+ (first->protocol == htons(ETH_P_IPV6) &&
+ igc_ipv6_csum_is_sctp(skb))) {
+ type_tucmd = IGC_ADVTXD_TUCMD_L4T_SCTP;
+ break;
+ }
+ /* fall through */
+ default:
+ skb_checksum_help(skb);
+ goto csum_failed;
+ }
+
+ /* update TX checksum flag */
+ first->tx_flags |= IGC_TX_FLAGS_CSUM;
+ vlan_macip_lens = skb_checksum_start_offset(skb) -
+ skb_network_offset(skb);
+no_csum:
+ vlan_macip_lens |= skb_network_offset(skb) << IGC_ADVTXD_MACLEN_SHIFT;
+ vlan_macip_lens |= first->tx_flags & IGC_TX_FLAGS_VLAN_MASK;
+
+ igc_tx_ctxtdesc(tx_ring, first, vlan_macip_lens, type_tucmd, 0);
}
static int __igc_maybe_stop_tx(struct igc_ring *tx_ring, const u16 size)
@@ -4116,6 +4209,9 @@ static int igc_probe(struct pci_dev *pdev,
if (err)
goto err_sw_init;
+ /* Add supported features to the features list*/
+ netdev->features |= NETIF_F_HW_CSUM;
+
/* setup the private structure */
err = igc_sw_init(adapter);
if (err)
@@ -4123,6 +4219,7 @@ static int igc_probe(struct pci_dev *pdev,
/* copy netdev features into list of user selectable features */
netdev->hw_features |= NETIF_F_NTUPLE;
+ netdev->hw_features |= netdev->features;
/* MTU range: 68 - 9216 */
netdev->min_mtu = ETH_MIN_MTU;
--
2.21.0
^ permalink raw reply related
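
For readers following the igc patch above, the way igc_tx_csum() packs the first word of the context descriptor can be sketched in user-space C. The two constants are copied from the diff; the helper names are hypothetical and not part of the driver, and the bit layout in the comment is only what the shift and mask values imply.

```c
#include <stdint.h>

/* Values copied from the igc patch. */
#define IGC_ADVTXD_MACLEN_SHIFT 9
#define IGC_TX_FLAGS_VLAN_MASK  0xffff0000u

/*
 * Hypothetical helper: mirror the arithmetic in igc_tx_csum().
 *   low bits     : network header length (checksum start - network offset)
 *   from bit 9   : MAC header length (the network offset itself)
 *   upper 16 bits: VLAN tag, carried in the upper half of tx_flags
 */
static uint32_t pack_vlan_macip_lens(uint32_t csum_start, uint32_t net_off,
				     uint32_t tx_flags)
{
	uint32_t v = csum_start - net_off;

	v |= net_off << IGC_ADVTXD_MACLEN_SHIFT;
	v |= tx_flags & IGC_TX_FLAGS_VLAN_MASK;
	return v;
}

/*
 * Hypothetical helper: the patch stores tv_nsec / 32 in launch_time,
 * which suggests the field counts 32 ns units.
 */
static uint32_t launch_time_units(uint32_t tv_nsec)
{
	return tv_nsec / 32;
}
```

For an untagged IPv4/TCP frame with a 14-byte Ethernet header and a 20-byte IP header, pack_vlan_macip_lens(34, 14, 0) yields 0x1C14.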
* [net-next 11/15] i40e: Implement debug macro hw_dbg using pr_debug
From: Jeff Kirsher @ 2019-08-28 6:44 UTC (permalink / raw)
To: davem
Cc: Mauro S. M. Rodrigues, netdev, nhorman, sassmann, Andrew Bowers,
Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>
From: "Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com>
hw_dbg is used in several places in the code but is defined as an empty
macro, so it produces no output. This patch implements it using pr_debug.
Initially the intention was to implement it using netdev_dbg, analogously
to what is done in ixgbe, for instance. That approach was dropped because
some early uses of hw_dbg, such as in i40e_pf_reset, run before the VSI
structure is initialized, which would cause a NULL pointer dereference
during driver probe if the debug messages were turned on as soon as the
module is probed.
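
The macro technique used here (a fixed device prefix pasted in front of the caller's format string, with optional arguments) can be tried outside the kernel. The sketch below is a user-space stand-in, not the driver code: demo_hw, demo_bus and demo_hw_dbg are hypothetical names, snprintf replaces pr_debug, and the PCI domain is stored directly in the struct instead of being looked up through pci_domain_nr(). Like the kernel macro, it relies on the GNU `##` comma-swallowing extension.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the bus fields of struct i40e_hw. */
struct demo_bus {
	unsigned int domain, bus_id, device, func;
};

struct demo_hw {
	struct demo_bus bus;
};

/*
 * The literal prefix and the caller's format S are adjacent string
 * literals, so the compiler concatenates them into one format string.
 * ##__VA_ARGS__ drops the trailing comma when no extra arguments are given.
 */
#define demo_hw_dbg(buf, len, hw, S, ...)				\
	snprintf((buf), (len), "i40e %04x:%02x:%02x.%x " S,		\
		 (hw)->bus.domain, (hw)->bus.bus_id,			\
		 (hw)->bus.device, (hw)->bus.func, ##__VA_ARGS__)
```

With a demo_hw describing 0000:3d:00.3, demo_hw_dbg(buf, sizeof(buf), &hw, "reset %d\n", 1) formats "i40e 0000:3d:00.3 reset 1" followed by a newline.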
Signed-off-by: "Mauro S. M. Rodrigues" <maurosr@linux.vnet.ibm.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_common.c | 1 +
drivers/net/ethernet/intel/i40e/i40e_hmc.c | 1 +
drivers/net/ethernet/intel/i40e/i40e_osdep.h | 7 ++++++-
3 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 46e649c09f72..d37c6e0e5f08 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright(c) 2013 - 2018 Intel Corporation. */
+#include "i40e.h"
#include "i40e_type.h"
#include "i40e_adminq.h"
#include "i40e_prototype.h"
diff --git a/drivers/net/ethernet/intel/i40e/i40e_hmc.c b/drivers/net/ethernet/intel/i40e/i40e_hmc.c
index 19ce93d7fd0a..163ee8c6311c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_hmc.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_hmc.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright(c) 2013 - 2018 Intel Corporation. */
+#include "i40e.h"
#include "i40e_osdep.h"
#include "i40e_register.h"
#include "i40e_status.h"
diff --git a/drivers/net/ethernet/intel/i40e/i40e_osdep.h b/drivers/net/ethernet/intel/i40e/i40e_osdep.h
index a07574bff550..c0c9ce3eab23 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_osdep.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_osdep.h
@@ -18,7 +18,12 @@
* actual OS primitives
*/
-#define hw_dbg(hw, S, A...) do {} while (0)
+#define hw_dbg(hw, S, A...) \
+do { \
+ int domain = pci_domain_nr(((struct i40e_pf *)(hw)->back)->pdev->bus); \
+ pr_debug("i40e %04x:%02x:%02x.%x " S, domain, (hw)->bus.bus_id, \
+ (hw)->bus.device, (hw)->bus.func, ## A); \
+} while (0)
#define wr32(a, reg, value) writel((value), ((a)->hw_addr + (reg)))
#define rd32(a, reg) readl((a)->hw_addr + (reg))
--
2.21.0
^ permalink raw reply related
* [net-next 01/15] iavf: remove unused debug function iavf_debug_d
From: Jeff Kirsher @ 2019-08-28 6:43 UTC (permalink / raw)
To: davem
Cc: YueHaibing, netdev, nhorman, sassmann, Hulk Robot, Andrew Bowers,
Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>
From: YueHaibing <yuehaibing@huawei.com>
There is no caller of function iavf_debug_d() in tree since
commit 75051ce4c5d8 ("iavf: Fix up debug print macro"),
so it can be removed.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/iavf/iavf_main.c | 22 ---------------------
1 file changed, 22 deletions(-)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 9d2b50964a08..554aa619ff02 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -142,28 +142,6 @@ enum iavf_status iavf_free_virt_mem_d(struct iavf_hw *hw,
return 0;
}
-/**
- * iavf_debug_d - OS dependent version of debug printing
- * @hw: pointer to the HW structure
- * @mask: debug level mask
- * @fmt_str: printf-type format description
- **/
-void iavf_debug_d(void *hw, u32 mask, char *fmt_str, ...)
-{
- char buf[512];
- va_list argptr;
-
- if (!(mask & ((struct iavf_hw *)hw)->debug_mask))
- return;
-
- va_start(argptr, fmt_str);
- vsnprintf(buf, sizeof(buf), fmt_str, argptr);
- va_end(argptr);
-
- /* the debug string is already formatted with a newline */
- pr_info("%s", buf);
-}
-
/**
* iavf_schedule_reset - Set the flags and schedule a reset event
* @adapter: board private structure
--
2.21.0
^ permalink raw reply related
* [net-next 03/15] e1000e: Make speed detection on hotplugging cable more reliable
From: Jeff Kirsher @ 2019-08-28 6:43 UTC (permalink / raw)
To: davem; +Cc: Kai-Heng Feng, netdev, nhorman, sassmann, Aaron Brown,
Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>
From: Kai-Heng Feng <kai.heng.feng@canonical.com>
After hot plugging a 1Gbps Ethernet cable with a 1Gbps link partner, the
MII_BMSR may report 10Mbps, which renders the network rather slow.
The issue has a much lower failure rate after commit 59653e6497d1 ("e1000e:
Make watchdog use delayed work"), which essentially introduces some
delay before running the watchdog task.
But there's still a chance that the hot plugging event and the queued
watchdog task run at the same time, in which case the original issue can
be observed once again.
So let's use mod_delayed_work() to add a deterministic 1 second delay
before running the watchdog task after an interrupt.
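
The difference between the two workqueue calls can be illustrated with a toy user-space model. This is only a sketch: toy_dwork and the toy_* functions are hypothetical and model nothing beyond the "already pending" behaviour, namely that queue_delayed_work() leaves a pending item's timer alone while mod_delayed_work() re-arms it.

```c
#include <stdbool.h>

/* Toy delayed-work item: tracks only pending state and expiry (in jiffies). */
struct toy_dwork {
	bool pending;
	unsigned long expires;
};

/* Models queue_delayed_work(): does nothing if the item is already pending. */
static bool toy_queue_delayed_work(struct toy_dwork *w, unsigned long now,
				   unsigned long delay)
{
	if (w->pending)
		return false;
	w->pending = true;
	w->expires = now + delay;
	return true;
}

/*
 * Models mod_delayed_work(): always (re)arms the timer.
 * Returns true if the item was already pending.
 */
static bool toy_mod_delayed_work(struct toy_dwork *w, unsigned long now,
				 unsigned long delay)
{
	bool was_pending = w->pending;

	w->pending = true;
	w->expires = now + delay;
	return was_pending;
}
```

If the watchdog is already queued to fire one jiffy from now when the link interrupt arrives, the queue variant keeps that early expiry, whereas the mod variant pushes it out to now + HZ, which is the deterministic delay the patch is after.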
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/e1000e/netdev.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 8a3f035c3a5f..d7d56e42a6aa 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -1780,8 +1780,8 @@ static irqreturn_t e1000_intr_msi(int __always_unused irq, void *data)
}
/* guard against interrupt when we're going down */
if (!test_bit(__E1000_DOWN, &adapter->state))
- queue_delayed_work(adapter->e1000_workqueue,
- &adapter->watchdog_task, 1);
+ mod_delayed_work(adapter->e1000_workqueue,
+ &adapter->watchdog_task, HZ);
}
/* Reset on uncorrectable ECC error */
@@ -1861,8 +1861,8 @@ static irqreturn_t e1000_intr(int __always_unused irq, void *data)
}
/* guard against interrupt when we're going down */
if (!test_bit(__E1000_DOWN, &adapter->state))
- queue_delayed_work(adapter->e1000_workqueue,
- &adapter->watchdog_task, 1);
+ mod_delayed_work(adapter->e1000_workqueue,
+ &adapter->watchdog_task, HZ);
}
/* Reset on uncorrectable ECC error */
@@ -1907,8 +1907,8 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data)
hw->mac.get_link_status = true;
/* guard against interrupt when we're going down */
if (!test_bit(__E1000_DOWN, &adapter->state))
- queue_delayed_work(adapter->e1000_workqueue,
- &adapter->watchdog_task, 1);
+ mod_delayed_work(adapter->e1000_workqueue,
+ &adapter->watchdog_task, HZ);
}
if (!test_bit(__E1000_DOWN, &adapter->state))
--
2.21.0
^ permalink raw reply related
* [net-next 02/15] ixgbevf: Link lost in VM on ixgbevf when restoring from freeze or suspend
From: Jeff Kirsher @ 2019-08-28 6:43 UTC (permalink / raw)
To: davem; +Cc: Radoslaw Tyl, netdev, nhorman, sassmann, Andrew Bowers,
Jeff Kirsher
In-Reply-To: <20190828064407.30168-1-jeffrey.t.kirsher@intel.com>
From: Radoslaw Tyl <radoslawx.tyl@intel.com>
This patch fixes an issue where the VM shows no link after the hypervisor
is restored from a low-power state. The driver is responsible for re-enabling
any features of the device that had been disabled during suspend calls,
such as IRQs and bus mastering.
Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 8c011d4ce7a9..75e849a64db7 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -2517,6 +2517,7 @@ void ixgbevf_reinit_locked(struct ixgbevf_adapter *adapter)
msleep(1);
ixgbevf_down(adapter);
+ pci_set_master(adapter->pdev);
ixgbevf_up(adapter);
clear_bit(__IXGBEVF_RESETTING, &adapter->state);
--
2.21.0
^ permalink raw reply related
* [net-next 00/15][pull request] Intel Wired LAN Driver Updates 2019-08-27
From: Jeff Kirsher @ 2019-08-28 6:43 UTC (permalink / raw)
To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann
This series contains a variety of cold and hot savoury changes to Intel
drivers. Some of the fixes could be considered for stable even though
the author did not request it.
YueHaibing cleans up (i.e. removes) a function that has no caller in
the iavf driver.
Radoslaw fixes an issue when there is no link in the VM after the
hypervisor is restored from a low-power state due to the driver not
properly restoring features in the device that had been disabled during
the suspension for ixgbevf.
Kai-Heng Feng modified e1000e to use mod_delayed_work() to help resolve
a hot plug speed detection issue by adding a deterministic 1 second
delay before running watchdog task after an interrupt.
Sasha moves functions around to avoid forward declarations, since the
forward declarations are not necessary for these static functions in
igc. Also added a check for igc during driver probe to validate the NVM
checksum. Cleaned up code defines that were not being used in the igc
driver. Adds support for IP generic transmit checksum offload in the
igc driver.
Updated the iavf kernel documentation by a developer with no life.
Jake provides another fm10k update to a local variable for ease of code
readability.
Mitch fixes the iavf driver to allow the VF to override the MAC address
set by the host, if the VF is in "trusted" mode.
Mauro S. M. Rodrigues provides several changes for the i40e driver, first
with resolving hw_dbg usage and referencing an i40e_hw attribute. Also
implemented a debug macro using pr_debug, since the use of netdev_dbg
could cause a NULL pointer dereference during probe. Finally cleaned up
code that is no longer used or needed.
Firo Yang provides a change in the ixgbe driver to ensure we sync the
first fragment unconditionally to help resolve an issue seen in the XEN
environment when the upper network stack could receive an incomplete
network packet.
Mariusz adds a missing device to the i40e PCI table in the driver.
The following are changes since commit 68aaf4459556b1f9370c259fd486aecad2257552:
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
and are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 10GbE
Firo Yang (1):
ixgbe: sync the first fragment unconditionally
Jacob Keller (1):
fm10k: use a local variable for the frag pointer
Jeff Kirsher (1):
Documentation: iavf: Update the Intel LAN driver doc for iavf
Kai-Heng Feng (1):
e1000e: Make speed detection on hotplugging cable more reliable
Mariusz Stachura (1):
i40e: Add support for X710 device
Mauro S. M. Rodrigues (3):
i40e: fix hw_dbg usage in i40e_hmc_get_object_va
i40e: Implement debug macro hw_dbg using pr_debug
i40e: Remove EMPR traces from debugfs facility
Mitch Williams (1):
iavf: allow permanent MAC address to change
Radoslaw Tyl (1):
ixgbevf: Link lost in VM on ixgbevf when restoring from freeze or
suspend
Sasha Neftin (4):
igc: Remove useless forward declaration
igc: Add NVM checksum validation
igc: Remove unneeded PCI bus defines
igc: Add tx_csum offload functionality
YueHaibing (1):
iavf: remove unused debug function iavf_debug_d
.../networking/device_drivers/intel/iavf.rst | 115 ++++++++---
drivers/net/ethernet/intel/e1000e/netdev.c | 12 +-
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 8 +-
drivers/net/ethernet/intel/i40e/i40e.h | 1 -
drivers/net/ethernet/intel/i40e/i40e_common.c | 1 +
.../net/ethernet/intel/i40e/i40e_debugfs.c | 4 -
drivers/net/ethernet/intel/i40e/i40e_hmc.c | 1 +
.../net/ethernet/intel/i40e/i40e_lan_hmc.c | 14 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
drivers/net/ethernet/intel/i40e/i40e_osdep.h | 7 +-
drivers/net/ethernet/intel/iavf/iavf.h | 1 -
drivers/net/ethernet/intel/iavf/iavf_main.c | 26 ---
drivers/net/ethernet/intel/igc/igc.h | 4 +
drivers/net/ethernet/intel/igc/igc_base.h | 8 +
drivers/net/ethernet/intel/igc/igc_defines.h | 9 +-
drivers/net/ethernet/intel/igc/igc_mac.c | 73 ++++---
drivers/net/ethernet/intel/igc/igc_main.c | 106 ++++++++++
drivers/net/ethernet/intel/igc/igc_phy.c | 192 +++++++++---------
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 16 +-
.../net/ethernet/intel/ixgbevf/ixgbevf_main.c | 1 +
20 files changed, 372 insertions(+), 228 deletions(-)
--
2.21.0
^ permalink raw reply
* general protection fault in tls_sk_proto_close (2)
From: syzbot @ 2019-08-28 6:38 UTC (permalink / raw)
To: aviadye, borisp, daniel, davejwatson, davem, jakub.kicinski,
john.fastabend, linux-kernel, netdev, syzkaller-bugs
Hello,
syzbot found the following crash on:
HEAD commit: a55aa89a Linux 5.3-rc6
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16c26ebc600000
kernel config: https://syzkaller.appspot.com/x/.config?x=2a6a2b9826fdadf9
dashboard link: https://syzkaller.appspot.com/bug?extid=7a6ee4d0078eac6bf782
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1112a4de600000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+7a6ee4d0078eac6bf782@syzkaller.appspotmail.com
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 10290 Comm: syz-executor.0 Not tainted 5.3.0-rc6 #120
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:tls_sk_proto_close+0xe5/0x990 net/tls/tls_main.c:298
Code: 0f 85 3f 08 00 00 49 8b 84 24 c0 02 00 00 4d 8d 75 14 4c 89 f2 48 c1
ea 03 48 89 85 50 ff ff ff 48 b8 00 00 00 00 00 fc ff df <0f> b6 04 02 4c
89 f2 83 e2 07 38 d0 7f 08 84 c0 0f 85 2e 06 00 00
RSP: 0018:ffff88809b23fb90 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: ffffffff862cb8db
RDX: 0000000000000002 RSI: ffffffff862cb639 RDI: ffff8880a155ef00
RBP: ffff88809b23fc48 R08: ffff888094344640 R09: ffffed10142abd9a
R10: ffffed10142abd99 R11: ffff8880a155eccb R12: ffff8880a155ec40
R13: 0000000000000000 R14: 0000000000000014 R15: 0000000000000001
FS: 00005555556a8940(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f353458e000 CR3: 00000000a9174000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
tls_sk_proto_close+0x35b/0x990 net/tls/tls_main.c:321
tcp_bpf_close+0x17c/0x390 net/ipv4/tcp_bpf.c:582
inet_release+0xed/0x200 net/ipv4/af_inet.c:427
inet6_release+0x53/0x80 net/ipv6/af_inet6.c:470
__sock_release+0xce/0x280 net/socket.c:590
sock_close+0x1e/0x30 net/socket.c:1268
__fput+0x2ff/0x890 fs/file_table.c:280
____fput+0x16/0x20 fs/file_table.c:313
task_work_run+0x145/0x1c0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop+0x316/0x380 arch/x86/entry/common.c:163
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
do_syscall_64+0x5a9/0x6a0 arch/x86/entry/common.c:299
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x413540
Code: 01 f0 ff ff 0f 83 30 1b 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
44 00 00 83 3d 4d 2d 66 00 00 75 14 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff
ff 0f 83 04 1b 00 00 c3 48 83 ec 08 e8 0a fc ff ff
RSP: 002b:00007fff5d481778 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000413540
RDX: 0000001b2e520000 RSI: 0000000000000000 RDI: 0000000000000005
RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffffffffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000075bf20
R13: 0000000000000003 R14: 0000000000761220 R15: ffffffffffffffff
Modules linked in:
---[ end trace bdfd4385a0f1f76d ]---
RIP: 0010:tls_sk_proto_close+0xe5/0x990 net/tls/tls_main.c:298
Code: 0f 85 3f 08 00 00 49 8b 84 24 c0 02 00 00 4d 8d 75 14 4c 89 f2 48 c1
ea 03 48 89 85 50 ff ff ff 48 b8 00 00 00 00 00 fc ff df <0f> b6 04 02 4c
89 f2 83 e2 07 38 d0 7f 08 84 c0 0f 85 2e 06 00 00
RSP: 0018:ffff88809b23fb90 EFLAGS: 00010203
RAX: dffffc0000000000 RBX: dffffc0000000000 RCX: ffffffff862cb8db
RDX: 0000000000000002 RSI: ffffffff862cb639 RDI: ffff8880a155ef00
RBP: ffff88809b23fc48 R08: ffff888094344640 R09: ffffed10142abd9a
R10: ffffed10142abd99 R11: ffff8880a155eccb R12: ffff8880a155ec40
R13: 0000000000000000 R14: 0000000000000014 R15: 0000000000000001
FS: 00005555556a8940(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f353458e000 CR3: 00000000a9174000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches
^ permalink raw reply
* WARNING in smc_unhash_sk (3)
From: syzbot @ 2019-08-28 6:38 UTC (permalink / raw)
To: davem, kgraul, linux-kernel, linux-s390, netdev, syzkaller-bugs,
ubraun
Hello,
syzbot found the following crash on:
HEAD commit: a55aa89a Linux 5.3-rc6
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=112dd212600000
kernel config: https://syzkaller.appspot.com/x/.config?x=58485246ad14eafe
dashboard link: https://syzkaller.appspot.com/bug?extid=8488cc4cf1c9e09b8b86
compiler: clang version 9.0.0 (/home/glider/llvm/clang
80fee25776c2fb61e74c1ecb1a523375c2500b69)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=15426ebc600000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=116aca7a600000
IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+8488cc4cf1c9e09b8b86@syzkaller.appspotmail.com
------------[ cut here ]------------
WARNING: CPU: 0 PID: 9198 at ./include/net/sock.h:666 sk_del_node_init
include/net/sock.h:666 [inline]
WARNING: CPU: 0 PID: 9198 at ./include/net/sock.h:666
smc_unhash_sk+0x21b/0x240 net/smc/af_smc.c:96
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 9198 Comm: syz-executor057 Not tainted 5.3.0-rc6 #93
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1d8/0x2f8 lib/dump_stack.c:113
panic+0x25c/0x799 kernel/panic.c:219
__warn+0x22f/0x230 kernel/panic.c:576
report_bug+0x190/0x290 lib/bug.c:186
fixup_bug arch/x86/kernel/traps.c:179 [inline]
do_error_trap+0xd7/0x440 arch/x86/kernel/traps.c:272
do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:291
invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1028
RIP: 0010:sk_del_node_init include/net/sock.h:666 [inline]
RIP: 0010:smc_unhash_sk+0x21b/0x240 net/smc/af_smc.c:96
Code: 48 89 df e8 07 b1 39 00 48 83 c4 20 5b 41 5c 41 5d 41 5e 41 5f 5d c3
e8 03 d7 31 fa 48 c7 c7 f2 c3 3a 88 31 c0 e8 28 1d 1b fa <0f> 0b eb 85 44
89 f1 80 e1 07 80 c1 03 38 c1 0f 8c 5b ff ff ff 4c
RSP: 0018:ffff888094177b68 EFLAGS: 00010246
RAX: 0000000000000024 RBX: 0000000000000001 RCX: b964ece25f6b7c00
RDX: 0000000000000000 RSI: 0000000000000201 RDI: 0000000000000000
RBP: ffff888094177bb0 R08: ffffffff815cf7d4 R09: ffffed1015d46088
R10: ffffed1015d46088 R11: 0000000000000000 R12: ffff888098ccb240
R13: dffffc0000000000 R14: ffff888098ccb2c0 R15: ffff888098ccb268
__smc_release+0x1f8/0x3a0 net/smc/af_smc.c:146
smc_release+0x15b/0x2c0 net/smc/af_smc.c:185
__sock_release net/socket.c:590 [inline]
sock_close+0xe1/0x260 net/socket.c:1268
__fput+0x2e4/0x740 fs/file_table.c:280
____fput+0x15/0x20 fs/file_table.c:313
task_work_run+0x17e/0x1b0 kernel/task_work.c:113
exit_task_work include/linux/task_work.h:22 [inline]
do_exit+0x5e8/0x21a0 kernel/exit.c:879
do_group_exit+0x15c/0x2b0 kernel/exit.c:983
__do_sys_exit_group+0x17/0x20 kernel/exit.c:994
__se_sys_exit_group+0x14/0x20 kernel/exit.c:992
__x64_sys_exit_group+0x3b/0x40 kernel/exit.c:992
do_syscall_64+0xfe/0x140 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x43ff28
Code: 00 00 be 3c 00 00 00 eb 19 66 0f 1f 84 00 00 00 00 00 48 89 d7 89 f0
0f 05 48 3d 00 f0 ff ff 77 21 f4 48 89 d7 44 89 c0 0f 05 <48> 3d 00 f0 ff
ff 76 e0 f7 d8 64 41 89 01 eb d8 0f 1f 84 00 00 00
RSP: 002b:00007ffefacce238 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 000000000043ff28
RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
RBP: 00000000004bf750 R08: 00000000000000e7 R09: ffffffffffffffd0
R10: 00000000200000c0 R11: 0000000000000246 R12: 0000000000000001
R13: 00000000006d1180 R14: 0000000000000000 R15: 0000000000000000
Kernel Offset: disabled
Rebooting in 86400 seconds..
---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.
syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches
^ permalink raw reply
* Re: [PATCH net-next v2 2/3] dt-bindings: net: dsa: mt7530: Add support for port 5
From: René van Dorst @ 2019-08-28 6:35 UTC (permalink / raw)
To: Rob Herring
Cc: Sean Wang, Andrew Lunn, Vivien Didelot, Florian Fainelli,
David S . Miller, Matthias Brugger, netdev, linux-arm-kernel,
linux-mediatek, John Crispin, linux-mips, Frank Wunderlich,
devicetree
In-Reply-To: <20190827222251.GA30507@bogus>
Hi Rob,
Quoting Rob Herring <robh@kernel.org>:
> On Wed, Aug 21, 2019 at 04:45:46PM +0200, René van Dorst wrote:
>> MT7530 port 5 has many modes/configurations.
>> Update the documentation how to use port 5.
>>
>> Signed-off-by: René van Dorst <opensource@vdorst.com>
>> Cc: devicetree@vger.kernel.org
>> Cc: Rob Herring <robh@kernel.org>
>
>> v1->v2:
>> * Adding extra note about RGMII2 and gpio use.
>> rfc->v1:
>> * No change
>
> The changelog goes below the '---'
>
Thanks for the review,
I shall fix that.
>> ---
>> .../devicetree/bindings/net/dsa/mt7530.txt | 218 ++++++++++++++++++
>> 1 file changed, 218 insertions(+)
>>
>> diff --git a/Documentation/devicetree/bindings/net/dsa/mt7530.txt b/Documentation/devicetree/bindings/net/dsa/mt7530.txt
>> index 47aa205ee0bd..43993aae3f9c 100644
>> --- a/Documentation/devicetree/bindings/net/dsa/mt7530.txt
>> +++ b/Documentation/devicetree/bindings/net/dsa/mt7530.txt
>> @@ -35,6 +35,42 @@ Required properties for the child nodes within ports container:
>> - phy-mode: String, must be either "trgmii" or "rgmii" for port labeled
>> "cpu".
>>
>> +Port 5 of the switch is muxed between:
>> +1. GMAC5: GMAC5 can interface with another external MAC or PHY.
>> +2. PHY of port 0 or port 4: PHY interfaces with an external MAC like 2nd GMAC
>> + of the SOC. Used in many setups where port 0/4 becomes the WAN port.
>> + Note: On a MT7621 SOC with integrated switch: 2nd GMAC can only connected to
>> + GMAC5 when the gpios for RGMII2 (GPIO 22-33) are not used and not
>> + connected to external component!
>> +
>> +Port 5 modes/configurations:
>> +1. Port 5 is disabled and isolated: An external phy can interface to the 2nd
>> + GMAC of the SOC.
>> + In the case of a build-in MT7530 switch, port 5 shares the RGMII bus with 2nd
>> + GMAC and an optional external phy. Mind the GPIO/pinctl settings of the SOC!
>> +2. Port 5 is muxed to PHY of port 0/4: Port 0/4 interfaces with 2nd GMAC.
>> + It is a simple MAC to PHY interface, port 5 needs to be setup for xMII mode
>> + and RGMII delay.
>> +3. Port 5 is muxed to GMAC5 and can interface to an external phy.
>> + Port 5 becomes an extra switch port.
>> + Only works on platform where external phy TX<->RX lines are swapped.
>> + Like in the Ubiquiti ER-X-SFP.
>> +4. Port 5 is muxed to GMAC5 and interfaces with the 2nd GAMC as 2nd CPU port.
>> + Currently a 2nd CPU port is not supported by DSA code.
>> +
>> +Depending on how the external PHY is wired:
>> +1. normal: The PHY can only connect to 2nd GMAC but not to the switch
>> +2. swapped: RGMII TX, RX are swapped; external phy interface with the switch as
>> + a ethernet port. But can't interface to the 2nd GMAC.
>> + a ethernet port. But can't interface to the 2nd GMAC.
>> +
>> +Based on the DT the port 5 mode is configured.
>> +
>> +Driver tries to lookup the phy-handle of the 2nd GMAC of the master device.
>> +When phy-handle matches PHY of port 0 or 4 then port 5 set-up as mode 2.
>> +phy-mode must be set, see also example 2 below!
>> + * mt7621: phy-mode = "rgmii-txid";
>> + * mt7623: phy-mode = "rgmii";
>> +
>> See Documentation/devicetree/bindings/net/dsa/dsa.txt for a list of additional
>> required, optional properties and how the integrated switch subnodes must
>> be specified.
>> @@ -94,3 +130,185 @@ Example:
>> };
>> };
>> };
>> +
>> +Example 2: MT7621: Port 4 is WAN port: 2nd GMAC -> Port 5 -> PHY port 4.
>> +
>> +&eth {
>> + status = "okay";
>
> Don't show status in examples.
OK.
> This should show the complete node.
>
To be clear: I should take the ethernet node from mt7621.dtsi [0] or
mt7623.dtsi [1] and insert the example below, right?
Greets,
René
[0]:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/drivers/staging/mt7621-dts/mt7621.dtsi#n397
[1]:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/arch/arm/boot/dts/mt7623.dtsi#n1023
>> +
>> + gmac0: mac@0 {
>> + compatible = "mediatek,eth-mac";
>> + reg = <0>;
>> + phy-mode = "rgmii";
>> +
>> + fixed-link {
>> + speed = <1000>;
>> + full-duplex;
>> + pause;
>> + };
>> + };
>> +
>> + gmac1: mac@1 {
>> + compatible = "mediatek,eth-mac";
>> + reg = <1>;
>> + phy-mode = "rgmii-txid";
>> + phy-handle = <&phy4>;
>> + };
>> +
>> + mdio: mdio-bus {
>> + #address-cells = <1>;
>> + #size-cells = <0>;
>> +
>> + /* Internal phy */
>> + phy4: ethernet-phy@4 {
>> + reg = <4>;
>> + };
>> +
>> + mt7530: switch@1f {
>> + compatible = "mediatek,mt7621";
>> + #address-cells = <1>;
>> + #size-cells = <0>;
>> + reg = <0x1f>;
>> + pinctrl-names = "default";
>> + mediatek,mcm;
>> +
>> + resets = <&rstctrl 2>;
>> + reset-names = "mcm";
>> +
>> + ports {
>> + #address-cells = <1>;
>> + #size-cells = <0>;
>> +
>> + port@0 {
>> + reg = <0>;
>> + label = "lan0";
>> + };
>> +
>> + port@1 {
>> + reg = <1>;
>> + label = "lan1";
>> + };
>> +
>> + port@2 {
>> + reg = <2>;
>> + label = "lan2";
>> + };
>> +
>> + port@3 {
>> + reg = <3>;
>> + label = "lan3";
>> + };
>> +
>> +/* Commented out. Port 4 is handled by 2nd GMAC.
>> + port@4 {
>> + reg = <4>;
>> + label = "lan4";
>> + };
>> +*/
>> +
>> + cpu_port0: port@6 {
>> + reg = <6>;
>> + label = "cpu";
>> + ethernet = <&gmac0>;
>> + phy-mode = "rgmii";
>> +
>> + fixed-link {
>> + speed = <1000>;
>> + full-duplex;
>> + pause;
>> + };
>> + };
>> + };
>> + };
>> + };
>> +};
>> +
>> +Example 3: MT7621: Port 5 is connected to external PHY: Port 5 -> external PHY.
>> +
>> +&eth {
>> + status = "okay";
>> +
>> + gmac0: mac@0 {
>> + compatible = "mediatek,eth-mac";
>> + reg = <0>;
>> + phy-mode = "rgmii";
>> +
>> + fixed-link {
>> + speed = <1000>;
>> + full-duplex;
>> + pause;
>> + };
>> + };
>> +
>> + mdio: mdio-bus {
>> + #address-cells = <1>;
>> + #size-cells = <0>;
>> +
>> + /* External phy */
>> + ephy5: ethernet-phy@7 {
>> + reg = <7>;
>> + };
>> +
>> + mt7530: switch@1f {
>> + compatible = "mediatek,mt7621";
>> + #address-cells = <1>;
>> + #size-cells = <0>;
>> + reg = <0x1f>;
>> + pinctrl-names = "default";
>> + mediatek,mcm;
>> +
>> + resets = <&rstctrl 2>;
>> + reset-names = "mcm";
>> +
>> + ports {
>> + #address-cells = <1>;
>> + #size-cells = <0>;
>> +
>> + port@0 {
>> + reg = <0>;
>> + label = "lan0";
>> + };
>> +
>> + port@1 {
>> + reg = <1>;
>> + label = "lan1";
>> + };
>> +
>> + port@2 {
>> + reg = <2>;
>> + label = "lan2";
>> + };
>> +
>> + port@3 {
>> + reg = <3>;
>> + label = "lan3";
>> + };
>> +
>> + port@4 {
>> + reg = <4>;
>> + label = "lan4";
>> + };
>> +
>> + port@5 {
>> + reg = <5>;
>> + label = "lan5";
>> + phy-mode = "rgmii";
>> + phy-handle = <&ephy5>;
>> + };
>> +
>> + cpu_port0: port@6 {
>> + reg = <6>;
>> + label = "cpu";
>> + ethernet = <&gmac0>;
>> + phy-mode = "rgmii";
>> +
>> + fixed-link {
>> + speed = <1000>;
>> + full-duplex;
>> + pause;
>> + };
>> + };
>> + };
>> + };
>> + };
>> +};
>> --
>> 2.20.1
>>
* Re: [PATCH net] net/sched: pfifo_fast: fix wrong dereference in pfifo_fast_enqueue
From: Paolo Abeni @ 2019-08-28 6:31 UTC (permalink / raw)
To: Davide Caratti, Cong Wang, Jamal Hadi Salim, Jiri Pirko,
David S. Miller, netdev
Cc: Stefano Brivio, Li Shuang
In-Reply-To: <d5a7a167ab57e035685445ee641840a0c5fd39ae.1566940693.git.dcaratti@redhat.com>
On Tue, 2019-08-27 at 23:18 +0200, Davide Caratti wrote:
> Now that 'TCQ_F_CPUSTATS' bit can be cleared, depending on the value of
> 'TCQ_F_NOLOCK' bit in the parent qdisc, we can't assume anymore that
> per-cpu counters are there in the error path of skb_array_produce().
> Otherwise, the following splat can be seen:
>
> Unable to handle kernel paging request at virtual address 0000600dea430008
> Mem abort info:
> ESR = 0x96000005
> Exception class = DABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> Data abort info:
> ISV = 0, ISS = 0x00000005
> CM = 0, WnR = 0
> user pgtable: 64k pages, 48-bit VAs, pgdp = 000000007b97530e
> [0000600dea430008] pgd=0000000000000000, pud=0000000000000000
> Internal error: Oops: 96000005 [#1] SMP
> [...]
> pstate: 10000005 (nzcV daif -PAN -UAO)
> pc : pfifo_fast_enqueue+0x524/0x6e8
> lr : pfifo_fast_enqueue+0x46c/0x6e8
> sp : ffff800d39376fe0
> x29: ffff800d39376fe0 x28: 1ffff001a07d1e40
> x27: ffff800d03e8f188 x26: ffff800d03e8f200
> x25: 0000000000000062 x24: ffff800d393772f0
> x23: 0000000000000000 x22: 0000000000000403
> x21: ffff800cca569a00 x20: ffff800d03e8ee00
> x19: ffff800cca569a10 x18: 00000000000000bf
> x17: 0000000000000000 x16: 0000000000000000
> x15: 0000000000000000 x14: ffff1001a726edd0
> x13: 1fffe4000276a9a4 x12: 0000000000000000
> x11: dfff200000000000 x10: ffff800d03e8f1a0
> x9 : 0000000000000003 x8 : 0000000000000000
> x7 : 00000000f1f1f1f1 x6 : ffff1001a726edea
> x5 : ffff800cca56a53c x4 : 1ffff001bf9a8003
> x3 : 1ffff001bf9a8003 x2 : 1ffff001a07d1dcb
> x1 : 0000600dea430000 x0 : 0000600dea430008
> Process ping (pid: 6067, stack limit = 0x00000000dc0aa557)
> Call trace:
> pfifo_fast_enqueue+0x524/0x6e8
> htb_enqueue+0x660/0x10e0 [sch_htb]
> __dev_queue_xmit+0x123c/0x2de0
> dev_queue_xmit+0x24/0x30
> ip_finish_output2+0xc48/0x1720
> ip_finish_output+0x548/0x9d8
> ip_output+0x334/0x788
> ip_local_out+0x90/0x138
> ip_send_skb+0x44/0x1d0
> ip_push_pending_frames+0x5c/0x78
> raw_sendmsg+0xed8/0x28d0
> inet_sendmsg+0xc4/0x5c0
> sock_sendmsg+0xac/0x108
> __sys_sendto+0x1ac/0x2a0
> __arm64_sys_sendto+0xc4/0x138
> el0_svc_handler+0x13c/0x298
> el0_svc+0x8/0xc
> Code: f9402e80 d538d081 91002000 8b010000 (885f7c03)
>
> Fix this by testing the value of 'TCQ_F_CPUSTATS' bit in 'qdisc->flags',
> before dereferencing 'qdisc->cpu_qstats'.
>
> Fixes: 8a53e616de29 ("net: sched: when clearing NOLOCK, clear TCQ_F_CPUSTATS, too")
> CC: Paolo Abeni <pabeni@redhat.com>
> CC: Stefano Brivio <sbrivio@redhat.com>
> Reported-by: Li Shuang <shuali@redhat.com>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
> ---
> net/sched/sch_generic.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 099797e5409d..137db1cbde85 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -624,8 +624,12 @@ static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc,
>
> err = skb_array_produce(q, skb);
>
> - if (unlikely(err))
> - return qdisc_drop_cpu(skb, qdisc, to_free);
> + if (unlikely(err)) {
> + if (qdisc_is_percpu_stats(qdisc))
> + return qdisc_drop_cpu(skb, qdisc, to_free);
> + else
> + return qdisc_drop(skb, qdisc, to_free);
> + }
>
> qdisc_update_stats_at_enqueue(qdisc, pkt_len);
> return NET_XMIT_SUCCESS;
LGTM, thanks Davide!
I just did a code audit of the other pfifo_fast callbacks, and I think
this is the last spot in need of such a fix.
Acked-by: Paolo Abeni <pabeni@redhat.com>
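
For readers following along, the pattern the patch applies can be sketched
in userspace as below. This is only a minimal model under stated
assumptions, not kernel code: struct toy_qdisc, TOY_F_CPUSTATS,
toy_is_percpu_stats() and toy_account_drop() are hypothetical stand-ins for
struct Qdisc, TCQ_F_CPUSTATS, qdisc_is_percpu_stats() and the two drop
helpers. The point it illustrates is that the per-cpu stats pointer is only
dereferenced when the flag says it is valid.

```c
#include <stddef.h>

/* Hypothetical stand-in for TCQ_F_CPUSTATS; userspace model only. */
#define TOY_F_CPUSTATS 0x1u

struct toy_qstats {
	unsigned int drops;
};

/* Hypothetical stand-in for struct Qdisc, reduced to what the fix touches. */
struct toy_qdisc {
	unsigned int flags;
	struct toy_qstats qstats;       /* always present */
	struct toy_qstats *cpu_qstats;  /* only valid when TOY_F_CPUSTATS is set */
};

/* Mirrors the idea of qdisc_is_percpu_stats(): test the flag, not the pointer. */
static int toy_is_percpu_stats(const struct toy_qdisc *q)
{
	return (q->flags & TOY_F_CPUSTATS) != 0;
}

/*
 * Account a dropped packet. The per-cpu counter is used only when the flag
 * is set; otherwise the embedded counter is used, so cpu_qstats is never
 * dereferenced while it may be invalid.
 */
static void toy_account_drop(struct toy_qdisc *q)
{
	if (toy_is_percpu_stats(q))
		q->cpu_qstats->drops++;
	else
		q->qstats.drops++;
}
```

Calling toy_account_drop() on a qdisc whose flags field is 0 and whose
cpu_qstats is NULL only bumps the embedded qstats.drops counter, which is
the case the unconditional per-cpu drop path got wrong.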