* Re: r8169 Driver - Poor Network Performance Since Kernel 4.19
From: Heiner Kallweit @ 2019-02-05 18:50 UTC (permalink / raw)
To: David Chang; +Cc: Realtek linux nic maintainers, netdev
In-Reply-To: <20190131023240.GF25745@linux-kyyb.suse>
Hi David,
meanwhile there's the following bug report matching what reported.
It's even the same chip version (RTL8168h).
https://bugzilla.redhat.com/show_bug.cgi?id=1671958
Symptom there is also a significant number of rx_missed packets.
Could you try what I mentioned there last:
Try building a kernel with the call to rtl_hw_aspm_clkreq_enable(tp, true) at the
end of rtl_hw_start_8168h_1() being disabled.
Heiner
On 31.01.2019 03:32, David Chang wrote:
> Hi,
>
> We had a similr case here.
> - Realtek r8169 receive performance regression in kernel 4.19
> https://bugzilla.suse.com/show_bug.cgi?id=1119649
>
> kernel: r8169 0000:01:00.0 eth0: RTL8168h/8111h, XID 54100880
> The major symptom is there are many rx_missed count.
>
>
> On Jan 30, 2019 at 20:15:45 +0100, Heiner Kallweit wrote:
>> Hi Peter,
>>
>> recently I had somebody where pcie_aspm=off for whatever reason didn't
>> do the trick, can you also check with pcie_aspm.policy=performance.
>
> We will give it a try later.
>
>> And please check with "ethtool -S <if>" whether the chip statistics
>> show a significant number of errors.
>>
>> If this doesn't help you may have to bisect to find the offending commit.
>
> We had tried fallback driver to a few previous commits as following,
> but with no luck.
>
> 9675931e6b65 r8169: re-enable MSI-X on RTL8168g (v4.19)
> 098b01ad9837 r8169: don't include asm headers directly (v4.19-rc1)
> a2965f12fde6 r8169: remove rtl8169_set_speed_xmii (v4.19-rc1)
> 6fcf9b1d4d6c r8169: fix runtime suspend (v4.19-rc1)
> e397286b8e89 r8169: remove TBI 1000BaseX support (v4.19-rc1)
>
> Thanks,
> David Chang
>
>>
>> Heiner
>>
>>
>> On 30.01.2019 10:59, Peter Ceiley wrote:
>>> Hi Heiner,
>>>
>>> I tried disabling the ASPM using the pcie_aspm=off kernel parameter
>>> and this made no difference.
>>>
>>> I tried compiling the 4.18.16 r8169.c with the 4.19.18 source and
>>> subsequently loaded the module in the running 4.19.18 kernel. I can
>>> confirm that this immediately resolved the issue and access to the NFS
>>> shares operated as expected.
>>>
>>> I presume this means it is an issue with the r8169 driver included in
>>> 4.19 onwards?
>>>
>>> To answer your last questions:
>>>
>>> Base Board Information
>>> Manufacturer: Alienware
>>> Product Name: 0PGRP5
>>> Version: A02
>>>
>>> ... and yes, the RTL8168 is the onboard network chip.
>>>
>>> Regards,
>>>
>>> Peter.
>>>
>>> On Tue, 29 Jan 2019 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>
>>>> Hi Peter,
>>>>
>>>> I think the vendor driver doesn't enable ASPM per default.
>>>> So it's worth a try to disable ASPM in the BIOS or via sysfs.
>>>> Few older systems seem to have issues with ASPM, what kind of
>>>> system / mainboard are you using? The RTL8168 is the onboard
>>>> network chip?
>>>>
>>>> Rgds, Heiner
>>>>
>>>>
>>>> On 29.01.2019 07:20, Peter Ceiley wrote:
>>>>> Hi Heiner,
>>>>>
>>>>> Thanks, I'll do some more testing. It might not be the driver - I
>>>>> assumed it was due to the fact that using the r8168 driver 'resolves'
>>>>> the issue. I'll see if I can test the r8169.c on top of 4.19 - this is
>>>>> a good idea.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Peter.
>>>>>
>>>>> On Tue, 29 Jan 2019 at 17:16, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>
>>>>>> Hi Peter,
>>>>>>
>>>>>> at a first glance it doesn't look like a typical driver issue.
>>>>>> What you could do:
>>>>>>
>>>>>> - Test the r8169.c from 4.18 on top of 4.19.
>>>>>>
>>>>>> - Check whether disabling ASPM (/sys/module/pcie_aspm) has an effect.
>>>>>>
>>>>>> - Bisect between 4.18 and 4.19 to find the offending commit.
>>>>>>
>>>>>> Any specific reason why you think root cause is in the driver and not
>>>>>> elsewhere in the network subsystem?
>>>>>>
>>>>>> Heiner
>>>>>>
>>>>>>
>>>>>> On 28.01.2019 23:10, Peter Ceiley wrote:
>>>>>>> Hi Heiner,
>>>>>>>
>>>>>>> Thanks for getting back to me.
>>>>>>>
>>>>>>> No, I don't use jumbo packets.
>>>>>>>
>>>>>>> Bandwidth is *generally* good, and iperf results to my NAS provide
>>>>>>> over 900 Mbits/s in both circumstances. The issue seems to appear when
>>>>>>> establishing a connection and is most notable, for example, on my
>>>>>>> mounted NFS shares where it takes seconds (up to 10's of seconds on
>>>>>>> larger directories) to list the contents of each directory. Once a
>>>>>>> transfer begins on a file, I appear to get good bandwidth.
>>>>>>>
>>>>>>> I'm unsure of the best scientific data to provide you in order to
>>>>>>> troubleshoot this issue. Running the following
>>>>>>>
>>>>>>> netstat -s |grep retransmitted
>>>>>>>
>>>>>>> shows a steady increase in retransmitted segments each time I list the
>>>>>>> contents of a remote directory, for example, running 'ls' on a
>>>>>>> directory containing 345 media files did the following using kernel
>>>>>>> 4.19.18:
>>>>>>>
>>>>>>> increased retransmitted segments by 21 and the 'time' command showed
>>>>>>> the following:
>>>>>>> real 0m19.867s
>>>>>>> user 0m0.012s
>>>>>>> sys 0m0.036s
>>>>>>>
>>>>>>> The same command shows no retransmitted segments running kernel
>>>>>>> 4.18.16 and 'time' showed:
>>>>>>> real 0m0.300s
>>>>>>> user 0m0.004s
>>>>>>> sys 0m0.007s
>>>>>>>
>>>>>>> ifconfig does not show any RX/TX errors nor dropped packets in either case.
>>>>>>>
>>>>>>> dmesg XID:
>>>>>>> [ 2.979984] r8169 0000:03:00.0 eth0: RTL8168g/8111g,
>>>>>>> f8:b1:56:fe:67:e0, XID 4c000800, IRQ 32
>>>>>>>
>>>>>>> # lspci -vv
>>>>>>> 03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
>>>>>>> RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c)
>>>>>>> Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
>>>>>>> <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>>> Latency: 0, Cache Line Size: 64 bytes
>>>>>>> Interrupt: pin A routed to IRQ 19
>>>>>>> Region 0: I/O ports at d000 [size=256]
>>>>>>> Region 2: Memory at f7b00000 (64-bit, non-prefetchable) [size=4K]
>>>>>>> Region 4: Memory at f2100000 (64-bit, prefetchable) [size=16K]
>>>>>>> Capabilities: [40] Power Management version 3
>>>>>>> Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
>>>>>>> PME(D0+,D1+,D2+,D3hot+,D3cold+)
>>>>>>> Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>>>>>>> Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>>>>> Address: 0000000000000000 Data: 0000
>>>>>>> Capabilities: [70] Express (v2) Endpoint, MSI 01
>>>>>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
>>>>>>> <512ns, L1 <64us
>>>>>>> ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>>>>>> SlotPowerLimit 10.000W
>>>>>>> DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>>>>>>> RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>>> MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>>>>>> DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>>>>>>> LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit
>>>>>>> Latency L0s unlimited, L1 <64us
>>>>>>> ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>>>>>>> LnkCtl: ASPM L1 Enabled; RCB 64 bytes Disabled- CommClk+
>>>>>>> ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>>>>>>> LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
>>>>>>> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>>> DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+,
>>>>>>> OBFF Via message/WAKE#
>>>>>>> AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>>>>>>> DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+,
>>>>>>> OBFF Disabled
>>>>>>> AtomicOpsCtl: ReqEn-
>>>>>>> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
>>>>>>> Transmit Margin: Normal Operating Range,
>>>>>>> EnterModifiedCompliance- ComplianceSOS-
>>>>>>> Compliance De-emphasis: -6dB
>>>>>>> LnkSta2: Current De-emphasis Level: -6dB,
>>>>>>> EqualizationComplete-, EqualizationPhase1-
>>>>>>> EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
>>>>>>> Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
>>>>>>> Vector table: BAR=4 offset=00000000
>>>>>>> PBA: BAR=4 offset=00000800
>>>>>>> Capabilities: [d0] Vital Product Data
>>>>>>> pcilib: sysfs_read_vpd: read failed: Input/output error
>>>>>>> Not readable
>>>>>>> Capabilities: [100 v1] Advanced Error Reporting
>>>>>>> UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>> UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>> RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>>>>>>> UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
>>>>>>> RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>>>>>>> CESta: RxErr+ BadTLP+ BadDLLP+ Rollover- Timeout+ AdvNonFatalErr-
>>>>>>> CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>>>>>>> AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn-
>>>>>>> ECRCChkCap+ ECRCChkEn-
>>>>>>> MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>>>>>>> HeaderLog: 00000000 00000000 00000000 00000000
>>>>>>> Capabilities: [140 v1] Virtual Channel
>>>>>>> Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128-
>>>>>>> Ctrl: ArbSelect=Fixed
>>>>>>> Status: InProgress-
>>>>>>> VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>>>>>>> Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>>>>>>> Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
>>>>>>> Status: NegoPending- InProgress-
>>>>>>> Capabilities: [160 v1] Device Serial Number 01-00-00-00-68-4c-e0-00
>>>>>>> Capabilities: [170 v1] Latency Tolerance Reporting
>>>>>>> Max snoop latency: 71680ns
>>>>>>> Max no snoop latency: 71680ns
>>>>>>> Kernel driver in use: r8169
>>>>>>> Kernel modules: r8169
>>>>>>>
>>>>>>> Please let me know if you have any other ideas in terms of testing.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Peter.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 29 Jan 2019 at 05:28, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On 28.01.2019 12:13, Peter Ceiley wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have been experiencing very poor network performance since Kernel
>>>>>>>>> 4.19 and I'm confident it's related to the r8169 driver.
>>>>>>>>>
>>>>>>>>> I have no issue with kernel versions 4.18 and prior. I am experiencing
>>>>>>>>> this issue in kernels 4.19 and 4.20 (currently running/testing with
>>>>>>>>> 4.20.4 & 4.19.18).
>>>>>>>>>
>>>>>>>>> If someone could guide me in the right direction, I'm happy to help
>>>>>>>>> troubleshoot this issue. Note that I have been keeping an eye on one
>>>>>>>>> issue related to loading of the PHY driver, however, my symptoms
>>>>>>>>> differ in that I still have a network connection. I have attempted to
>>>>>>>>> reload the driver on a running system, but this does not improve the
>>>>>>>>> situation.
>>>>>>>>>
>>>>>>>>> Using the proprietary r8168 driver returns my device to proper working order.
>>>>>>>>>
>>>>>>>>> lshw shows:
>>>>>>>>> description: Ethernet interface
>>>>>>>>> product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>>>>>>>>> vendor: Realtek Semiconductor Co., Ltd.
>>>>>>>>> physical id: 0
>>>>>>>>> bus info: pci@0000:03:00.0
>>>>>>>>> logical name: enp3s0
>>>>>>>>> version: 0c
>>>>>>>>> serial:
>>>>>>>>> size: 1Gbit/s
>>>>>>>>> capacity: 1Gbit/s
>>>>>>>>> width: 64 bits
>>>>>>>>> clock: 33MHz
>>>>>>>>> capabilities: pm msi pciexpress msix vpd bus_master cap_list
>>>>>>>>> ethernet physical tp aui bnc mii fibre 10bt 10bt-fd 100bt 100bt-fd
>>>>>>>>> 1000bt-fd autonegotiation
>>>>>>>>> configuration: autonegotiation=on broadcast=yes driver=r8169
>>>>>>>>> duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=192.168.1.25
>>>>>>>>> latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
>>>>>>>>> resources: irq:19 ioport:d000(size=256)
>>>>>>>>> memory:f7b00000-f7b00fff memory:f2100000-f2103fff
>>>>>>>>>
>>>>>>>>> Kind Regards,
>>>>>>>>>
>>>>>>>>> Peter.
>>>>>>>>>
>>>>>>>> Hi Peter,
>>>>>>>>
>>>>>>>> the description "poor network performance" is quite vague, therefore:
>>>>>>>>
>>>>>>>> - Can you provide any measurements?
>>>>>>>> - iperf results before and after
>>>>>>>> - statistics about dropped packets (rx and/or tx)
>>>>>>>> - Do you use jumbo packets?
>>>>>>>>
>>>>>>>> Also help would be a "lspci -vv" output for the network card and
>>>>>>>> the dmesg output line with the chip XID.
>>>>>>>>
>>>>>>>> Heiner
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
^ permalink raw reply
* Re: [PATCH net-next v3] net: dsa: mv88e6xxx: Prevent suspend to RAM
From: Miquel Raynal @ 2019-02-05 18:47 UTC (permalink / raw)
To: Vivien Didelot
Cc: Andrew Lunn, Florian Fainelli, David S. Miller, netdev,
linux-kernel, Thomas Petazzoni, Gregory Clement, Antoine Tenart,
Maxime Chevallier, Nadav Haklai
In-Reply-To: <20190205112857.GB13620@t480s.localdomain>
Hi Vivien,
Vivien Didelot <vivien.didelot@gmail.com> wrote on Tue, 5 Feb 2019
11:28:57 -0500:
> Hi Miquel,
>
> On Tue, 5 Feb 2019 12:07:28 +0100, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> > +/* There is no suspend to RAM support at DSA level yet, the switch configuration
> > + * would be lost after a power cycle so prevent it to be suspended.
> > + */
> > +static int __maybe_unused mv88e6xxx_suspend(struct device *dev)
> > +{
> > + return -EOPNOTSUPP;
> > +}
> > +
> > +static int __maybe_unused mv88e6xxx_resume(struct device *dev)
> > +{
> > + return 0;
> > +}
>
> The code looks good but my only concern is -EOPNOTSUPP. In this
> context this code is specific to callbacks targeting bridge and
> switchdev, while the dev_pm_ops are completely parallel to DSA.
>
> It is intuitive but given Documentation/power/runtime_pm.txt, this
> will default to being interpreted as a fatal error, while -EBUSY
> seems to keep the device in an 'active' state in a saner way.
>
> I don't understand yet how to properly tell PM core that suspend to RAM
> isn't supported. If an error code different from -EAGAIN or -EBUSY
> is the way to go, I'm good with it:
I do share your concern and I went through the Documentation but I did
not find a unified way to tell the PM core the feature is unsupported.
By grepping code, I realized returning -EOPNOTSUPP was a recurrent
alternative so here we are. I also considered -EBUSY but it seems more
like a "I cannot right now" and -EAGAIN which is more a "try again
soon". Anyway, no matter the error code returned, I'm not sure if the PM
core actually cares?
> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
>
>
> Thanks,
>
> Vivien
Thanks,
Miquèl
^ permalink raw reply
* Re: [PATCH v2 2/2] r8169: Avoid pointer aliasing
From: Joe Perches @ 2019-02-05 18:42 UTC (permalink / raw)
To: David Miller, thierry.reding
Cc: hkallweit1, andrew, nic_swsd, netdev, linux-kernel
In-Reply-To: <20190204.192040.1074738183781998611.davem@davemloft.net>
On Mon, 2019-02-04 at 19:20 -0800, David Miller wrote:
> From: Thierry Reding <thierry.reding@gmail.com>
> Date: Mon, 4 Feb 2019 17:42:13 +0100
>
> > @@ -7316,7 +7325,7 @@ static int rtl_get_ether_clk(struct rtl8169_private *tp)
> > static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> > {
> > const struct rtl_cfg_info *cfg = rtl_cfg_infos + ent->driver_data;
> > - u8 mac_addr[ETH_ALEN] __aligned(4) = {};
> > + u8 mac_addr[ETH_ALEN] = {};
> > struct rtl8169_private *tp;
>
> I agree with Heiner, you have to provide at least 2 byte alignment for this
> buffer due to the reasons he stated.
It's declared after a pointer so it is already is 2 byte aligned.
A lot of drivers wouldn't work otherwise.
^ permalink raw reply
* Re: [PATCH net-next 2/2] net: marvell: mvpp2: fix lack of link interrupts
From: David Miller @ 2019-02-05 18:41 UTC (permalink / raw)
To: rmk+kernel; +Cc: antoine.tenart, maxime.chevallier, baruch, netdev
In-Reply-To: <E1gqnmR-0007eF-87@rmk-PC.armlinux.org.uk>
From: Russell King <rmk+kernel@armlinux.org.uk>
Date: Mon, 04 Feb 2019 23:35:59 +0000
> Sven Auhagen reports that if he changes a SFP+ module for a SFP module
> on the Macchiatobin Single Shot, the link does not come back up. For
> Sven, it is as easy as:
>
> - Insert a SFP+ module connected, and use ping6 to verify link is up.
> - Remove SFP+ module
> - Insert SFP 1000base-X module use ping6 to verify link is up: Link
> up event did not trigger and the link is down
>
> but that doesn't show the problem for me. Locally, this has been
> reproduced by:
>
> - Boot with no modules.
> - Insert SFP+ module, confirm link is up.
> - Replace module with 25000base-X module. Confirm link is up.
> - Set remote end down, link is reported as dropped at both ends.
> - Set remote end up, link is reported up at remote end, but not local
> end due to lack of link interrupt.
>
> Fix this by setting up both GMAC and XLG interrupts for port 0, but
> only unmasking the appropriate interrupt according to the current mode
> set in the mac_config() method. However, only do the mask/unmask
> dance when we are really changing the link mode to avoid missing any
> link interrupts.
>
> Tested-by: Sven Auhagen <sven.auhagen@voleatech.de>
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Applied.
^ permalink raw reply
* Re: [PATCH net-next 1/2] net: marvell: mvpp2: use phy_interface_mode_is_8023z() helper
From: David Miller @ 2019-02-05 18:40 UTC (permalink / raw)
To: rmk+kernel; +Cc: antoine.tenart, maxime.chevallier, baruch, netdev
In-Reply-To: <E1gqnmM-0007dy-3M@rmk-PC.armlinux.org.uk>
From: Russell King <rmk+kernel@armlinux.org.uk>
Date: Mon, 04 Feb 2019 23:35:54 +0000
> Use the phy_interface_mode_is_8023z() helper for detecting interface
> modes that use 802.3z serial encoding. This is equivalent to testing
> for both 1000base-X and 2500base-X.
>
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Applied.
^ permalink raw reply
* Re: [PATCH v2] arm64: dts: lx2160aqds: Add mdio mux nodes
From: Li Yang @ 2019-02-05 18:38 UTC (permalink / raw)
To: Pankaj Bansal
Cc: shawnguo@kernel.org, andrew@lunn.ch, f.fainelli@gmail.com,
netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
David Miller
In-Reply-To: <VI1PR0401MB2496085A7A23DF5CE86A4657F16E0@VI1PR0401MB2496.eurprd04.prod.outlook.com>
On Tue, Feb 5, 2019 at 6:26 AM Pankaj Bansal <pankaj.bansal@nxp.com> wrote:
>
> Hi Shawn/Leo,
>
> If you have no more comments, can you please merge this path in your branch?
> in same branch in which you have accepted LX2160AQDS board patches.
It can be merged through the ARM SoC tree. But there are still
comments remain to be addressed.
>
> Regards,
> Pankaj Bansal
>
> > -----Original Message-----
> > From: David Miller [mailto:davem@davemloft.net]
> > Sent: Monday, 4 February, 2019 10:21 PM
> > To: Pankaj Bansal <pankaj.bansal@nxp.com>
> > Cc: shawnguo@kernel.org; Leo Li <leoyang.li@nxp.com>; andrew@lunn.ch;
> > f.fainelli@gmail.com; netdev@vger.kernel.org; linux-arm-
> > kernel@lists.infradead.org
> > Subject: Re: [PATCH v2] arm64: dts: lx2160aqds: Add mdio mux nodes
> >
> > From: Pankaj Bansal <pankaj.bansal@nxp.com>
> > Date: Mon, 4 Feb 2019 08:51:57 +0000
> >
> > > The two external MDIO buses used to communicate with phy devices that
> > > are external to SOC are muxed in LX2160AQDS board.
> > >
> > > These buses can be routed to any one of the eight IO slots on
> > > LX2160AQDS board depending on value in fpga register 0x54.
> > >
> > > Additionally the external MDIO1 is used to communicate to the onboard
> > > RGMII phy devices.
> > >
> > > The mdio1 is controlled by bits 4-7 of fpga register and mdio2 is
> > > controlled by bits 0-3 of fpga register.
> > >
> > > Signed-off-by: Pankaj Bansal <pankaj.bansal@nxp.com>
> >
> > Am I applying this to my networking tree or are the ARM folks taking this?
^ permalink raw reply
* Re: [PATCH v2] net: dsa: mv88e6xxx: Revise irq setup ordering
From: David Miller @ 2019-02-05 18:37 UTC (permalink / raw)
To: dave.anglin; +Cc: andrew, linux, vivien.didelot, f.fainelli, netdev
In-Reply-To: <824d011b-3692-69c3-5e2c-58e950a80abf@bell.net>
Something is really wrong with how your patches are submitted.
The patch shows up in between the commit message and your signoff
tags, also the patch appears to be mangled by your email client.
Please fix this before submitting any future work.
Thank you.
^ permalink raw reply
* Re: [PATCH v2] arm64: dts: lx2160aqds: Add mdio mux nodes
From: Li Yang @ 2019-02-05 18:37 UTC (permalink / raw)
To: Pankaj Bansal
Cc: Shawn Guo, Andrew Lunn, Florian Fainelli, netdev@vger.kernel.org,
linux-arm-kernel@lists.infradead.org
In-Reply-To: <20190204141641.18272-1-pankaj.bansal@nxp.com>
On Mon, Feb 4, 2019 at 2:53 AM Pankaj Bansal <pankaj.bansal@nxp.com> wrote:
>
> The two external MDIO buses used to communicate with phy devices that are
> external to SOC are muxed in LX2160AQDS board.
>
> These buses can be routed to any one of the eight IO slots on LX2160AQDS
> board depending on value in fpga register 0x54.
>
> Additionally the external MDIO1 is used to communicate to the onboard
> RGMII phy devices.
>
> The mdio1 is controlled by bits 4-7 of fpga register and mdio2 is
> controlled by bits 0-3 of fpga register.
>
> Signed-off-by: Pankaj Bansal <pankaj.bansal@nxp.com>
> ---
>
> Notes:
> V2:
> - removed unnecassary TODO statements
> - removed device_type from mdio nodes
> - change the case of hex number to lowercase
> - removed board specific comments from soc file
There are still some comments from V1 haven't been addressed.
>
> .../boot/dts/freescale/fsl-lx2160a-qds.dts | 115 +++++++++++++++++
> .../boot/dts/freescale/fsl-lx2160a.dtsi | 18 +++
> 2 files changed, 133 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts b/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> index 99a22abbe725..2c3020a72d41 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a-qds.dts
> @@ -46,6 +46,121 @@
> &i2c0 {
> status = "okay";
>
> + fpga@66 {
> + compatible = "fsl,lx2160aqds-fpga", "fsl,fpga-qixis-i2c";
> + reg = <0x66>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + mdio-mux-1@54 {
Still no compatible string defined for the node. Probably should be
"mdio-mux-mmioreg", "mdio-mux"
> + mdio-parent-bus = <&emdio1>;
> + reg = <0x54>; /* BRDCFG4 */
> + mux-mask = <0xf8>; /* EMI1_MDIO */
> + #address-cells=<1>;
> + #size-cells = <0>;
> +
> + mdio@0 {
> + reg = <0x00>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> + mdio@40 {
> + reg = <0x40>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> + mdio@c0 {
> + reg = <0xc0>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> + mdio@c8 {
> + reg = <0xc8>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> + mdio@d0 {
> + reg = <0xd0>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> + mdio@d8 {
> + reg = <0xd8>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> + mdio@e0 {
> + reg = <0xe0>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> + mdio@e8 {
> + reg = <0xe8>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> + mdio@f0 {
> + reg = <0xf0>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> + mdio@f8 {
> + reg = <0xf8>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> + };
> +
> + mdio-mux-2@54 {
> + mdio-parent-bus = <&emdio2>;
> + reg = <0x54>; /* BRDCFG4 */
> + mux-mask = <0x07>; /* EMI2_MDIO */
> + #address-cells=<1>;
> + #size-cells = <0>;
> +
> + mdio@0 {
> + reg = <0x00>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> + mdio@1 {
> + reg = <0x01>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> + mdio@2 {
> + reg = <0x02>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> + mdio@3 {
> + reg = <0x03>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> + mdio@4 {
> + reg = <0x04>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> + mdio@5 {
> + reg = <0x05>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> + mdio@6 {
> + reg = <0x06>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> + mdio@7 {
> + reg = <0x07>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + };
> + };
> + };
> +
> i2c-mux@77 {
> compatible = "nxp,pca9547";
> reg = <0x77>;
> diff --git a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> index a79f5c1ea56d..a74045ad22ad 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-lx2160a.dtsi
> @@ -762,5 +762,23 @@
> <GIC_SPI 209 IRQ_TYPE_LEVEL_HIGH>;
> dma-coherent;
> };
> +
> + /* WRIOP0: 0x8b8_0000, E-MDIO1: 0x1_6000 */
> + emdio1: mdio@8b96000 {
> + compatible = "fsl,fman-memac-mdio";
> + reg = <0x0 0x8b96000 0x0 0x1000>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + little-endian; /* force the driver in LE mode */
Still doesn't fully align with the binding at
"Documentation/devicetree/bindings/powerpc/fsl/fman.txt".
It should either have the "interrupts" property for external MDIO or
"fsl,fman-internal-mdio" for internal MDIO.
> + };
> +
> + /* WRIOP0: 0x8b8_0000, E-MDIO2: 0x1_7000 */
> + emdio2: mdio@8b97000 {
> + compatible = "fsl,fman-memac-mdio";
> + reg = <0x0 0x8b97000 0x0 0x1000>;
> + #address-cells = <1>;
> + #size-cells = <0>;
> + little-endian; /* force the driver in LE mode */
> + };
> };
> };
> --
> 2.17.1
>
^ permalink raw reply
* Re: [PATCH v2] net: emac: remove IBM_EMAC_RX_SKB_HEADROOM
From: David Miller @ 2019-02-05 18:36 UTC (permalink / raw)
To: chunkeey; +Cc: netdev
In-Reply-To: <20190204215829.27063-1-chunkeey@gmail.com>
From: Christian Lamparter <chunkeey@gmail.com>
Date: Mon, 4 Feb 2019 22:58:29 +0100
> @@ -1195,6 +1195,27 @@ static inline int emac_alloc_rx_skb(struct emac_instance *dev, int slot,
> return 0;
> }
>
> +static inline int
> +emac_alloc_rx_skb(struct emac_instance *dev, int slot)
> +{
> + struct sk_buff *skb;
> +
> + skb = __netdev_alloc_skb_ip_align(dev->ndev, dev->rx_skb_size,
> + GFP_KERNEL);
> +
> + return __emac_prepare_rx_skb(skb, dev, slot);
> +}
> +
> +static inline int
> +emac_alloc_rx_skb_napi(struct emac_instance *dev, int slot)
> +{
> + struct sk_buff *skb;
> +
> + skb = napi_alloc_skb(&dev->mal->napi, dev->rx_skb_size);
> +
> + return __emac_prepare_rx_skb(skb, dev, slot);
> +}
> +
Do not use the inline keyword in foo.c files, let the compiler decide.
Thank you.
^ permalink raw reply
* Re: [PATCH net-next 0/3] nixge: Fixed-link support
From: David Miller @ 2019-02-05 18:34 UTC (permalink / raw)
To: moritz.fischer
Cc: netdev, devicetree, linux-kernel, alex.williams, andrew, robh+dt,
mdf
In-Reply-To: <20190204173040.5538-1-moritz.fischer@ettus.com>
From: Moritz Fischer <moritz.fischer@ettus.com>
Date: Mon, 4 Feb 2019 09:30:37 -0800
> From: Moritz Fischer <mdf@kernel.org>
>
> This series adds fixed-link support to nixge.
>
> The first patch corrects the binding to correctly reflect
> hardware that does not come with MDIO cores instantiated.
>
> The second patch adds fixed link support to the driver.
>
> The third patch updates the binding document with the now
> optional (formerly required) phy-handle property and references
> the fixed-link docs.
Series applied, thank you.
^ permalink raw reply
* Re: [PATCH net] rxrpc: bad unlock balance in rxrpc_recvmsg
From: David Miller @ 2019-02-05 3:19 UTC (permalink / raw)
To: edumazet; +Cc: netdev, eric.dumazet, dhowells, syzkaller
In-Reply-To: <20190204163606.236419-1-edumazet@google.com>
From: Eric Dumazet <edumazet@google.com>
Date: Mon, 4 Feb 2019 08:36:06 -0800
> When either "goto wait_interrupted;" or "goto wait_error;"
> paths are taken, socket lock has already been released.
>
> This patch fixes following syzbot splat :
...
> Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: David Howells <dhowells@redhat.com>
> Reported-by: syzbot <syzkaller@googlegroups.com>
David H., please take a look at this.
I can apply it directly and queue it up for -stable, just let me
know.
Thanks.
^ permalink raw reply
* Re: [PATCH v2] net: phy: fixed-phy: Drop GPIO from fixed_phy_add()
From: David Miller @ 2019-02-05 2:33 UTC (permalink / raw)
To: linus.walleij; +Cc: andrew, f.fainelli, netdev, Laurent.pinchart
In-Reply-To: <20190204102618.20583-1-linus.walleij@linaro.org>
From: Linus Walleij <linus.walleij@linaro.org>
Date: Mon, 4 Feb 2019 11:26:18 +0100
> All users of the fixed_phy_add() pass -1 as GPIO number
> to the fixed phy driver, and all users of fixed_phy_register()
> pass -1 as GPIO number as well, except for the device
> tree MDIO bus.
>
> Any new users should create a proper device and pass the
> GPIO as a descriptor associated with the device so delete
> the GPIO argument from the calls and drop the code looking
> requesting a GPIO in fixed_phy_add().
>
> In fixed phy_register(), investigate the "fixed-link"
> node and pick the GPIO descriptor from "link-gpios" if
> this property exists. Move the corresponding code out
> of of_mdio.c as the fixed phy code anyways requires
> OF to be in use.
>
> Tested-by: Andrew Lunn <andrew@lunn.ch>
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
This looks like net-next material, so that's where I have applied
this.
Thanks.
^ permalink raw reply
* Re: [PATCH v2 2/2] r8169: Avoid pointer aliasing
From: David Miller @ 2019-02-05 3:20 UTC (permalink / raw)
To: thierry.reding; +Cc: hkallweit1, andrew, nic_swsd, netdev, linux-kernel
In-Reply-To: <20190204164213.30727-2-thierry.reding@gmail.com>
From: Thierry Reding <thierry.reding@gmail.com>
Date: Mon, 4 Feb 2019 17:42:13 +0100
> @@ -7316,7 +7325,7 @@ static int rtl_get_ether_clk(struct rtl8169_private *tp)
> static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> {
> const struct rtl_cfg_info *cfg = rtl_cfg_infos + ent->driver_data;
> - u8 mac_addr[ETH_ALEN] __aligned(4) = {};
> + u8 mac_addr[ETH_ALEN] = {};
> struct rtl8169_private *tp;
I agree with Heiner, you have to provide at least 2 byte alignment for this
buffer due to the reasons he stated.
^ permalink raw reply
* Re: [PATCH] net: dsa: slave: Don't propagate flag changes on down slave interfaces
From: David Miller @ 2019-02-05 2:29 UTC (permalink / raw)
To: f.fainelli; +Cc: rdong.ge, andrew, vivien.didelot, netdev
In-Reply-To: <7f8fadc6-1bb5-03b5-4f5e-a407e9116399@gmail.com>
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Sat, 2 Feb 2019 09:05:11 -0800
> Le 2/2/19 à 6:29 AM, Rundong Ge a écrit :
>> The unbalance of master's promiscuity or allmulti will happen after ifdown
>> and ifup a slave interface which is in a bridge.
>>
>> When we ifdown a slave interface , both the 'dsa_slave_close' and
>> 'dsa_slave_change_rx_flags' will clear the master's flags. The flags
>> of master will be decrease twice.
>> In the other hand, if we ifup the slave interface again, since the
>> slave's flags were cleared the 'dsa_slave_open' won't set the master's
>> flag, only 'dsa_slave_change_rx_flags' that triggered by 'br_add_if'
>> will set the master's flags. The flags of master is increase once.
>>
>> Only propagating flag changes when a slave interface is up makes
>> sure this does not happen. The 'vlan_dev_change_rx_flags' had the
>> same problem and was fixed, and changes here follows that fix.
>
> VLAN code under net/8021q/vlan_dev.c::vlan_dev_change_rx_flags() appears
> to do the same thing that you are proposing, so this looks fine to me.
> Since that is a bugfix, we should probably add:
>
> Fixes: 91da11f870f0 ("net: Distributed Switch Architecture protocol
> support")
Applied with Fixes tag added, and queued up for -stable.
Thanks.
^ permalink raw reply
* Re: [PATCH 00/12] net: Introduce ndo_get_port_parent_id()
From: David Miller @ 2019-02-05 2:24 UTC (permalink / raw)
To: f.fainelli; +Cc: netdev
In-Reply-To: <20190204233633.20421-1-f.fainelli@gmail.com>
Florian, your email headers, particularly the CC: list, were too huge.
So vger.kernel.org blocked them.
Please trim the CC: list down and make it more reasonable.
Thank you.
^ permalink raw reply
* Re: [ovs-dev] [PATCH net-next V2 1/1] openvswitch: Declare ovs key structures using macros
From: Gregory Rose @ 2019-02-05 18:22 UTC (permalink / raw)
To: Eli Britstein, David Miller, yihung.wei@gmail.com
Cc: dev@openvswitch.org, netdev@vger.kernel.org,
simon.horman@netronome.com
In-Reply-To: <4f57a34c-55a3-ea39-b9cc-2e1dbebc99c1@mellanox.com>
On 2/5/2019 4:02 AM, Eli Britstein wrote:
> On 2/4/2019 10:07 PM, David Miller wrote:
>> From: Yi-Hung Wei <yihung.wei@gmail.com>
>> Date: Mon, 4 Feb 2019 10:47:18 -0800
>>
>>> For example, to see how 'struct ovs_key_ipv6' is defined, now we need
>>> to trace how OVS_KEY_IPV6_FIELDS is defined, and how OVS_KEY_FIELD_ARR
>>> and OVS_KEY_FIELD defined. I think it makes the header file to be
>>> more complicated.
>> I completely agree.
>>
>> Unless this is totally unavoidable, I do not want to apply a patch
>> which makes reading and auditing the networking code more difficult.
> This technique is discussed for example in
> https://stackoverflow.com/questions/6635851/real-world-use-of-x-macros,
> and I found existing examples of using it in the kernel tree:
>
> __BPF_FUNC_MAPPER in commit ebb676daa1a34 ("bpf: Print function name in
> addition to function id")
>
> __AAL_STAT_ITEMS and __SONET_ITEMS in commit 607ca46e97a1b ("UAPI:
> (Scripted) Disintegrate include/linux"), the successor of commit
> 1da177e4c3f4 ("Linux-2.6.12-rc2"). I didn't dig deeper to the past.
>
> I can agree it makes that H file a bit more complicated, but for sure
> less than ## macros that are widely used.
>
> However, I think the alternatives of generating such defines by some
> scripts, or having the fields in more than one place are even worse, so
> it is a kind of unavoidable.
Why is using a script to generate defines for the requirements of your
application or driver so bad? Your patch
turns openvswitch.h into some hardly readable code - using a script and
having a machine output the macros
your application or driver needs seems like a much better alternative to me.
- Greg
>
> Please reconsider regarding applying this patch.
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
^ permalink raw reply
* [linux-next:master 5947/6329] ERROR: "check_mmu_context" [arch/mips/kvm/kvm.ko] undefined!
From: kbuild test robot @ 2019-02-05 18:14 UTC (permalink / raw)
To: David S. Miller; +Cc: kbuild-all, netdev
[-- Attachment #1: Type: text/plain, Size: 1069 bytes --]
tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
head: 1ff5403385648b1554fd1aeffffdeec71d9cd41c
commit: bde8ef804c0dcf62b73cba01832e45f03f27694c [5947/6329] Merge remote-tracking branch 'net-next/master'
config: mips-malta_kvm_defconfig (attached as .config)
compiler: mipsel-linux-gnu-gcc (Debian 8.2.0-11) 8.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout bde8ef804c0dcf62b73cba01832e45f03f27694c
# save the attached .config to linux build tree
GCC_VERSION=8.2.0 make.cross ARCH=mips
All errors (new ones prefixed by >>):
>> ERROR: "check_mmu_context" [arch/mips/kvm/kvm.ko] undefined!
>> ERROR: "check_switch_mmu_context" [arch/mips/kvm/kvm.ko] undefined!
>> ERROR: "get_new_mmu_context" [arch/mips/kvm/kvm.ko] undefined!
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 19546 bytes --]
^ permalink raw reply
* [RFC bpf-next 7/7] net: flow_dissector: pass net argument to the eth_get_headlen
From: Stanislav Fomichev @ 2019-02-05 17:36 UTC (permalink / raw)
To: netdev; +Cc: davem, ast, daniel, simon.horman, willemb, Stanislav Fomichev
In-Reply-To: <20190205173629.160717-1-sdf@google.com>
This is an example and for RFC only, just to show what the end result
would look like. In a proper submission, we need to introduce new
helper and convert each driver one by one to make sure we don't
break 'git bisect'.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
drivers/net/ethernet/hisilicon/hns/hns_enet.c | 3 ++-
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 3 ++-
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 3 ++-
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 2 +-
drivers/net/ethernet/intel/ice/ice_txrx.c | 2 +-
drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
drivers/net/ethernet/intel/igc/igc_main.c | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 3 ++-
drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 3 ++-
drivers/net/tun.c | 3 ++-
include/linux/etherdevice.h | 2 +-
net/ethernet/eth.c | 5 +++--
15 files changed, 23 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 6a512871176b..efc3a0455967 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -889,7 +889,7 @@ static struct sk_buff *bnxt_rx_page_skb(struct bnxt *bp,
DMA_ATTR_WEAK_ORDERING);
if (unlikely(!payload))
- payload = eth_get_headlen(data_ptr, len);
+ payload = eth_get_headlen(bp->dev, data_ptr, len);
skb = napi_alloc_skb(&rxr->bnapi->napi, payload);
if (!skb) {
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_enet.c b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
index 60e7d7ae3787..caf075a7449c 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_enet.c
@@ -603,7 +603,8 @@ static int hns_nic_poll_rx_skb(struct hns_nic_ring_data *ring_data,
} else {
ring->stats.seg_pkt_cnt++;
- pull_len = eth_get_headlen(va, HNS_RX_HEAD_SIZE);
+ pull_len = eth_get_headlen(dev_net(ndev), va,
+ HNS_RX_HEAD_SIZE);
memcpy(__skb_put(skb, pull_len), va,
ALIGN(pull_len, sizeof(long)));
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 40b17223ee41..81308c7c7d1e 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2433,7 +2433,8 @@ static int hns3_alloc_skb(struct hns3_enet_ring *ring, int length,
ring->stats.seg_pkt_cnt++;
u64_stats_update_end(&ring->syncp);
- ring->pull_len = eth_get_headlen(va, HNS3_RX_HEAD_SIZE);
+ ring->pull_len = eth_get_headlen(dev_net(netdev), va,
+ HNS3_RX_HEAD_SIZE);
__skb_put(skb, ring->pull_len);
hns3_nic_reuse_page(skb, ring->frag_num++, ring, ring->pull_len,
desc_cb);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 6fd15a734324..b52dfad58b71 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -278,7 +278,7 @@ static bool fm10k_add_rx_frag(struct fm10k_rx_buffer *rx_buffer,
/* we need the header to contain the greater of either ETH_HLEN or
* 60 bytes if the skb->len is less than 60 for skb_pad.
*/
- pull_len = eth_get_headlen(va, FM10K_RX_HDR_LEN);
+ pull_len = eth_get_headlen(skb_net(skb), va, FM10K_RX_HDR_LEN);
/* align pull length to size of long to optimize memcpy performance */
memcpy(__skb_put(skb, pull_len), va, ALIGN(pull_len, sizeof(long)));
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index a7e14e98889f..7990f80736e2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2035,7 +2035,8 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
/* Determine available headroom for copy */
headlen = size;
if (headlen > I40E_RX_HDR_SIZE)
- headlen = eth_get_headlen(xdp->data, I40E_RX_HDR_SIZE);
+ headlen = eth_get_headlen(skb_net(skb), xdp->data,
+ I40E_RX_HDR_SIZE);
/* align pull length to size of long to optimize memcpy performance */
memcpy(__skb_put(skb, headlen), xdp->data,
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index 9b4d7cec2e18..d3c7fd521fd7 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -1315,7 +1315,7 @@ static struct sk_buff *iavf_construct_skb(struct iavf_ring *rx_ring,
/* Determine available headroom for copy */
headlen = size;
if (headlen > IAVF_RX_HDR_SIZE)
- headlen = eth_get_headlen(va, IAVF_RX_HDR_SIZE);
+ headlen = eth_get_headlen(skb_net(skb), va, IAVF_RX_HDR_SIZE);
/* align pull length to size of long to optimize memcpy performance */
memcpy(__skb_put(skb, headlen), va, ALIGN(headlen, sizeof(long)));
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 2357fcac996b..0876ddbc495c 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -708,7 +708,7 @@ static void ice_pull_tail(struct sk_buff *skb)
/* we need the header to contain the greater of either ETH_HLEN or
* 60 bytes if the skb->len is less than 60 for skb_pad.
*/
- pull_len = eth_get_headlen(va, ICE_RX_HDR_SIZE);
+ pull_len = eth_get_headlen(skb_net(skb), va, ICE_RX_HDR_SIZE);
/* align pull length to size of long to optimize memcpy performance */
skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index dfa357b1a9d6..924d91696a01 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8064,7 +8064,7 @@ static struct sk_buff *igb_construct_skb(struct igb_ring *rx_ring,
/* Determine available headroom for copy */
headlen = size;
if (headlen > IGB_RX_HDR_LEN)
- headlen = eth_get_headlen(va, IGB_RX_HDR_LEN);
+ headlen = eth_get_headlen(skb_net(skb), va, IGB_RX_HDR_LEN);
/* align pull length to size of long to optimize memcpy performance */
memcpy(__skb_put(skb, headlen), va, ALIGN(headlen, sizeof(long)));
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index f20183037fb2..71e14f309189 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -1148,7 +1148,7 @@ static struct sk_buff *igc_construct_skb(struct igc_ring *rx_ring,
/* Determine available headroom for copy */
headlen = size;
if (headlen > IGC_RX_HDR_LEN)
- headlen = eth_get_headlen(va, IGC_RX_HDR_LEN);
+ headlen = eth_get_headlen(skb_net(skb), va, IGC_RX_HDR_LEN);
/* align pull length to size of long to optimize memcpy performance */
memcpy(__skb_put(skb, headlen), va, ALIGN(headlen, sizeof(long)));
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index b53087a980ef..9e9e6e5e679a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1799,7 +1799,7 @@ static void ixgbe_pull_tail(struct ixgbe_ring *rx_ring,
* we need the header to contain the greater of either ETH_HLEN or
* 60 bytes if the skb->len is less than 60 for skb_pad.
*/
- pull_len = eth_get_headlen(va, IXGBE_RX_HDR_SIZE);
+ pull_len = eth_get_headlen(skb_net(skb), va, IXGBE_RX_HDR_SIZE);
/* align pull length to size of long to optimize memcpy performance */
skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 49e23afa05a2..c45be7487d57 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -895,7 +895,8 @@ struct sk_buff *ixgbevf_construct_skb(struct ixgbevf_ring *rx_ring,
/* Determine available headroom for copy */
headlen = size;
if (headlen > IXGBEVF_RX_HDR_SIZE)
- headlen = eth_get_headlen(xdp->data, IXGBEVF_RX_HDR_SIZE);
+ headlen = eth_get_headlen(skb_net(skb), xdp->data,
+ IXGBEVF_RX_HDR_SIZE);
/* align pull length to size of long to optimize memcpy performance */
memcpy(__skb_put(skb, headlen), xdp->data,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 598ad7e4d5c9..b9fc28787f44 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -167,7 +167,8 @@ static inline u16 mlx5e_calc_min_inline(enum mlx5_inline_modes mode,
case MLX5_INLINE_MODE_NONE:
return 0;
case MLX5_INLINE_MODE_TCP_UDP:
- hlen = eth_get_headlen(skb->data, skb_headlen(skb));
+ hlen = eth_get_headlen(skb_net(skb), skb->data,
+ skb_headlen(skb));
if (hlen == ETH_HLEN && !skb_vlan_tag_present(skb))
hlen += VLAN_HLEN;
break;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 18656c4094b3..26063a8daf75 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1961,7 +1961,8 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
if (frags) {
/* Exercise flow dissector code path. */
- u32 headlen = eth_get_headlen(skb->data, skb_headlen(skb));
+ u32 headlen = eth_get_headlen(skb_net(skb), skb->data,
+ skb_headlen(skb));
if (unlikely(headlen > skb_headlen(skb))) {
this_cpu_inc(tun->pcpu_stats->rx_dropped);
diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h
index 2c0af7b00715..2e2b5ed30bfc 100644
--- a/include/linux/etherdevice.h
+++ b/include/linux/etherdevice.h
@@ -33,7 +33,7 @@ struct device;
int eth_platform_get_mac_address(struct device *dev, u8 *mac_addr);
unsigned char *arch_get_platform_mac_address(void);
int nvmem_get_mac_address(struct device *dev, void *addrbuf);
-u32 eth_get_headlen(void *data, unsigned int max_len);
+u32 eth_get_headlen(struct net *net, void *data, unsigned int max_len);
__be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev);
extern const struct header_ops eth_header_ops;
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 155d55025bfc..6fbfd6e41548 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -119,13 +119,14 @@ EXPORT_SYMBOL(eth_header);
/**
* eth_get_headlen - determine the length of header for an ethernet frame
+ * @net: pointer to device network namespace
* @data: pointer to start of frame
* @len: total length of frame
*
* Make a best effort attempt to pull the length for all of the headers for
* a given frame in a linear buffer.
*/
-u32 eth_get_headlen(void *data, unsigned int len)
+u32 eth_get_headlen(struct net *net, void *data, unsigned int len)
{
const unsigned int flags = FLOW_DISSECTOR_F_PARSE_1ST_FRAG;
const struct ethhdr *eth = (const struct ethhdr *)data;
@@ -136,7 +137,7 @@ u32 eth_get_headlen(void *data, unsigned int len)
return len;
/* parse any remaining L2/L3 headers, check for L4 */
- if (!skb_flow_dissect_flow_keys_basic(NULL, NULL, &keys, data,
+ if (!skb_flow_dissect_flow_keys_basic(net, NULL, &keys, data,
eth->h_proto, sizeof(*eth),
len, flags))
return max_t(u32, keys.control.thoff, sizeof(*eth));
--
2.20.1.611.gfbb209baf1-goog
^ permalink raw reply related
* [RFC bpf-next 6/7] selftests/bpf: add flow dissector bpf_skb_load_bytes helper test
From: Stanislav Fomichev @ 2019-02-05 17:36 UTC (permalink / raw)
To: netdev; +Cc: davem, ast, daniel, simon.horman, willemb, Stanislav Fomichev
In-Reply-To: <20190205173629.160717-1-sdf@google.com>
With the on-stack skb, we want to make sure we don't trigger any
shinfo access. Add small test which tries to read the data past
the packet boundary.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
tools/testing/selftests/bpf/test_progs.c | 49 ++++++++++++++++++++++++
1 file changed, 49 insertions(+)
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index c52bd90fbb34..c12f61efc427 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -1995,6 +1995,54 @@ static void test_flow_dissector(void)
bpf_object__close(obj);
}
+static void test_flow_dissector_load_bytes(void)
+{
+ struct bpf_flow_keys flow_keys;
+ __u32 duration, retval, size;
+ struct bpf_insn prog[] = {
+ // BPF_REG_1 - 1st argument: context
+ // BPF_REG_2 - 2nd argument: offset, start at last byte + 1
+ BPF_MOV64_IMM(BPF_REG_2, sizeof(pkt_v4)),
+ // BPF_REG_3 - 3rd argument: destination, reserve byte on stack
+ BPF_ALU64_REG(BPF_MOV, BPF_REG_3, BPF_REG_10),
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_3, -1),
+ // BPF_REG_4 - 4th argument: copy one byte
+ BPF_MOV64_IMM(BPF_REG_4, 1),
+ // bpf_skb_load_bytes(ctx, sizeof(pkt_v4), ptr, 1)
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+ BPF_FUNC_skb_load_bytes),
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
+ // if (ret == 0) return BPF_DROP (2)
+ BPF_MOV64_IMM(BPF_REG_0, BPF_DROP),
+ BPF_EXIT_INSN(),
+ // if (ret != 0) return BPF_OK (0)
+ BPF_MOV64_IMM(BPF_REG_0, BPF_OK),
+ BPF_EXIT_INSN(),
+ };
+ int fd, err;
+
+ /* make sure bpf_skb_load_bytes helper doesn't cause any
+ * problems when used with the fake skb in the flow
+ * dissector (try to read past the last byte)
+ */
+ fd = bpf_load_program(BPF_PROG_TYPE_FLOW_DISSECTOR, prog,
+ ARRAY_SIZE(prog), "GPL", 0, NULL, 0);
+ CHECK(fd < 0,
+ "flow_dissector-bpf_skb_load_bytes-load",
+ "fd %d errno %d\n",
+ fd, errno);
+
+ err = bpf_prog_test_run(fd, 1, &pkt_v4, sizeof(pkt_v4),
+ &flow_keys, &size, &retval, &duration);
+ CHECK(size != sizeof(flow_keys) || err || retval != 1,
+ "flow_dissector-bpf_skb_load_bytes",
+ "err %d errno %d retval %d duration %d size %u/%lu\n",
+ err, errno, retval, duration, size, sizeof(flow_keys));
+
+ if (fd >= -1)
+ close(fd);
+}
+
static void *test_spin_lock(void *arg)
{
__u32 duration, retval;
@@ -2136,6 +2184,7 @@ int main(void)
test_queue_stack_map(QUEUE);
test_queue_stack_map(STACK);
test_flow_dissector();
+ test_flow_dissector_load_bytes();
test_spinlock();
test_map_lock();
--
2.20.1.611.gfbb209baf1-goog
^ permalink raw reply related
* [RFC bpf-next 5/7] bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode
From: Stanislav Fomichev @ 2019-02-05 17:36 UTC (permalink / raw)
To: netdev; +Cc: davem, ast, daniel, simon.horman, willemb, Stanislav Fomichev
In-Reply-To: <20190205173629.160717-1-sdf@google.com>
Now that we have __flow_bpf_dissect which works on raw data (by
constructing temporary on-stack skb), use it when doing
BPF_PROG_TEST_RUN for flow dissector.
This should help us catch any possible bugs due to missing shinfo on
the on-stack skb.
Note that existing __skb_flow_bpf_dissect swallows L2 headers and returns
nhoff=0, we need to preserve the existing behavior.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
net/bpf/test_run.c | 52 +++++++++++++++-------------------------------
1 file changed, 17 insertions(+), 35 deletions(-)
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 2c5172b33209..502ae0e866d3 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -249,10 +249,8 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
u32 repeat = kattr->test.repeat;
struct bpf_flow_keys flow_keys;
u64 time_start, time_spent = 0;
- struct bpf_skb_data_end *cb;
+ const struct ethhdr *eth;
u32 retval, duration;
- struct sk_buff *skb;
- struct sock *sk;
void *data;
int ret;
u32 i;
@@ -260,35 +258,14 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
if (prog->type != BPF_PROG_TYPE_FLOW_DISSECTOR)
return -EINVAL;
- data = bpf_test_init(kattr, size, NET_SKB_PAD + NET_IP_ALIGN,
- SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
+ if (size < ETH_HLEN)
+ return -EINVAL;
+
+ data = bpf_test_init(kattr, size, 0, 0);
if (IS_ERR(data))
return PTR_ERR(data);
- sk = kzalloc(sizeof(*sk), GFP_USER);
- if (!sk) {
- kfree(data);
- return -ENOMEM;
- }
- sock_net_set(sk, current->nsproxy->net_ns);
- sock_init_data(NULL, sk);
-
- skb = build_skb(data, 0);
- if (!skb) {
- kfree(data);
- kfree(sk);
- return -ENOMEM;
- }
- skb->sk = sk;
-
- skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
- __skb_put(skb, size);
- skb->protocol = eth_type_trans(skb,
- current->nsproxy->net_ns->loopback_dev);
- skb_reset_network_header(skb);
-
- cb = (struct bpf_skb_data_end *)skb->cb;
- cb->qdisc_cb.flow_keys = &flow_keys;
+ eth = (struct ethhdr *)data;
if (!repeat)
repeat = 1;
@@ -297,9 +274,15 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
for (i = 0; i < repeat; i++) {
preempt_disable();
rcu_read_lock();
- retval = __skb_flow_bpf_dissect(prog, skb,
- &flow_keys_dissector,
- &flow_keys);
+ retval = __flow_bpf_dissect(prog, data,
+ eth->h_proto, ETH_HLEN,
+ size,
+ &flow_keys_dissector,
+ &flow_keys);
+ if (flow_keys.nhoff >= ETH_HLEN)
+ flow_keys.nhoff -= ETH_HLEN;
+ if (flow_keys.thoff >= ETH_HLEN)
+ flow_keys.thoff -= ETH_HLEN;
rcu_read_unlock();
preempt_enable();
@@ -317,8 +300,7 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
ret = bpf_test_finish(kattr, uattr, &flow_keys, sizeof(flow_keys),
retval, duration);
-
- kfree_skb(skb);
- kfree(sk);
+ kfree(data);
return ret;
+
}
--
2.20.1.611.gfbb209baf1-goog
^ permalink raw reply related
* [RFC bpf-next 4/7] net: flow_dissector: handle no-skb use case
From: Stanislav Fomichev @ 2019-02-05 17:36 UTC (permalink / raw)
To: netdev; +Cc: davem, ast, daniel, simon.horman, willemb, Stanislav Fomichev
In-Reply-To: <20190205173629.160717-1-sdf@google.com>
When flow_dissector is called without skb (with only data and hlen),
construct on-stack skb (which has a linear chunk of data passed
to the flow dissector). This should let us handle eth_get_headlen
case where only data is provided and we don't want to (yet) allocate
an skb.
Since this on-stack skb doesn't allocate its own data, we can't
add shinfo and need to be careful to avoid any code paths that use
it. Flow dissector BPF programs can only call bpf_skb_load_bytes helper,
which doesn't touch shinfo in our case (skb->len is the length of the
linear header so it exits early).
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
include/linux/skbuff.h | 5 +++
net/core/flow_dissector.c | 95 +++++++++++++++++++++++++++++----------
2 files changed, 76 insertions(+), 24 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index aa9a9983de80..5f1c085cb34c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1227,6 +1227,11 @@ bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
const struct sk_buff *skb,
struct flow_dissector *flow_dissector,
struct bpf_flow_keys *flow_keys);
+bool __flow_bpf_dissect(struct bpf_prog *prog,
+ void *data, __be16 proto,
+ int nhoff, int hlen,
+ struct flow_dissector *flow_dissector,
+ struct bpf_flow_keys *flow_keys);
bool __skb_flow_dissect(struct net *net,
const struct sk_buff *skb,
struct flow_dissector *flow_dissector,
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index dddcc37c0462..87167b74f59a 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -683,6 +683,28 @@ static void __skb_flow_bpf_to_target(const struct bpf_flow_keys *flow_keys,
}
}
+static inline void init_flow_keys(struct bpf_flow_keys *flow_keys,
+ struct sk_buff *skb, int nhoff)
+{
+ struct bpf_skb_data_end *cb = (struct bpf_skb_data_end *)skb->cb;
+
+ memset(cb, 0, sizeof(*cb));
+ memset(flow_keys, 0, sizeof(*flow_keys));
+
+ flow_keys->nhoff = nhoff;
+ flow_keys->thoff = nhoff;
+
+ cb->qdisc_cb.flow_keys = flow_keys;
+}
+
+static inline void clamp_flow_keys(struct bpf_flow_keys *flow_keys,
+ int hlen)
+{
+ flow_keys->nhoff = clamp_t(u16, flow_keys->nhoff, 0, hlen);
+ flow_keys->thoff = clamp_t(u16, flow_keys->thoff,
+ flow_keys->nhoff, hlen);
+}
+
bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
const struct sk_buff *skb,
struct flow_dissector *flow_dissector,
@@ -702,13 +724,9 @@ bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
/* Save Control Block */
memcpy(&cb_saved, cb, sizeof(cb_saved));
- memset(cb, 0, sizeof(*cb));
/* Pass parameters to the BPF program */
- memset(flow_keys, 0, sizeof(*flow_keys));
- cb->qdisc_cb.flow_keys = flow_keys;
- flow_keys->nhoff = skb_network_offset(skb);
- flow_keys->thoff = flow_keys->nhoff;
+ init_flow_keys(flow_keys, skb, skb_network_offset(skb));
bpf_compute_data_pointers((struct sk_buff *)skb);
result = BPF_PROG_RUN(prog, skb);
@@ -716,9 +734,34 @@ bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
/* Restore state */
memcpy(cb, &cb_saved, sizeof(cb_saved));
- flow_keys->nhoff = clamp_t(u16, flow_keys->nhoff, 0, skb->len);
- flow_keys->thoff = clamp_t(u16, flow_keys->thoff,
- flow_keys->nhoff, skb->len);
+ clamp_flow_keys(flow_keys, skb->len);
+
+ return result == BPF_OK;
+}
+
+bool __flow_bpf_dissect(struct bpf_prog *prog,
+ void *data, __be16 proto,
+ int nhoff, int hlen,
+ struct flow_dissector *flow_dissector,
+ struct bpf_flow_keys *flow_keys)
+{
+ struct bpf_skb_data_end *cb;
+ struct sk_buff skb;
+ u32 result;
+
+ __init_skb(&skb, data, hlen);
+ skb_put(&skb, hlen);
+ skb.protocol = proto;
+
+ init_flow_keys(flow_keys, &skb, nhoff);
+
+ cb = (struct bpf_skb_data_end *)skb.cb;
+ cb->data_meta = skb.data;
+ cb->data_end = skb.data + skb_headlen(&skb);
+
+ result = BPF_PROG_RUN(prog, &skb);
+
+ clamp_flow_keys(flow_keys, hlen);
return result == BPF_OK;
}
@@ -754,8 +797,10 @@ bool __skb_flow_dissect(struct net *net,
struct flow_dissector_key_icmp *key_icmp;
struct flow_dissector_key_tags *key_tags;
struct flow_dissector_key_vlan *key_vlan;
- enum flow_dissect_ret fdret;
enum flow_dissector_key_id dissector_vlan = FLOW_DISSECTOR_KEY_MAX;
+ struct bpf_prog *attached = NULL;
+ struct bpf_flow_keys flow_keys;
+ enum flow_dissect_ret fdret;
int num_hdrs = 0;
u8 ip_proto = 0;
bool ret;
@@ -795,30 +840,32 @@ bool __skb_flow_dissect(struct net *net,
FLOW_DISSECTOR_KEY_BASIC,
target_container);
- if (skb) {
- struct bpf_flow_keys flow_keys;
- struct bpf_prog *attached = NULL;
+ rcu_read_lock();
- rcu_read_lock();
+ if (!net && skb)
+ net = skb_net(skb);
+ if (net)
+ attached = rcu_dereference(net->flow_dissector_prog);
- if (!net && skb)
- net = skb_net(skb);
- if (net)
- attached = rcu_dereference(net->flow_dissector_prog);
- WARN_ON_ONCE(!net);
+ WARN_ON_ONCE(!net);
- if (attached) {
+ if (attached) {
+ if (skb)
ret = __skb_flow_bpf_dissect(attached, skb,
flow_dissector,
&flow_keys);
- __skb_flow_bpf_to_target(&flow_keys, flow_dissector,
- target_container);
- rcu_read_unlock();
- return ret;
- }
+ else
+ ret = __flow_bpf_dissect(attached, data, proto, nhoff,
+ hlen, flow_dissector,
+ &flow_keys);
+ __skb_flow_bpf_to_target(&flow_keys, flow_dissector,
+ target_container);
rcu_read_unlock();
+ return ret;
}
+ rcu_read_unlock();
+
if (dissector_uses_key(flow_dissector,
FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
struct ethhdr *eth = eth_hdr(skb);
--
2.20.1.611.gfbb209baf1-goog
^ permalink raw reply related
* [RFC bpf-next 3/7] net: plumb network namespace into __skb_flow_dissect
From: Stanislav Fomichev @ 2019-02-05 17:36 UTC (permalink / raw)
To: netdev; +Cc: davem, ast, daniel, simon.horman, willemb, Stanislav Fomichev
In-Reply-To: <20190205173629.160717-1-sdf@google.com>
This new argument will be used in the next patches for the
eth_get_headlen use case. eth_get_headlen calls flow dissector
with only data (without skb) so there is currently no way to
pull attached BPF flow dissector program. With this new argument,
we can amend the callers to explicitly pass network namespace
so we can use attached BPF program.
Note: WARN_ON_ONCE(!net) will now trigger for eth_get_headlen users.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
include/linux/skbuff.h | 15 +++++++++------
net/core/flow_dissector.c | 20 +++++++++++---------
net/ethernet/eth.c | 5 +++--
3 files changed, 23 insertions(+), 17 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 28723a86efdf..aa9a9983de80 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1227,7 +1227,8 @@ bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
const struct sk_buff *skb,
struct flow_dissector *flow_dissector,
struct bpf_flow_keys *flow_keys);
-bool __skb_flow_dissect(const struct sk_buff *skb,
+bool __skb_flow_dissect(struct net *net,
+ const struct sk_buff *skb,
struct flow_dissector *flow_dissector,
void *target_container,
void *data, __be16 proto, int nhoff, int hlen,
@@ -1237,7 +1238,7 @@ static inline bool skb_flow_dissect(const struct sk_buff *skb,
struct flow_dissector *flow_dissector,
void *target_container, unsigned int flags)
{
- return __skb_flow_dissect(skb, flow_dissector, target_container,
+ return __skb_flow_dissect(NULL, skb, flow_dissector, target_container,
NULL, 0, 0, 0, flags);
}
@@ -1246,18 +1247,19 @@ static inline bool skb_flow_dissect_flow_keys(const struct sk_buff *skb,
unsigned int flags)
{
memset(flow, 0, sizeof(*flow));
- return __skb_flow_dissect(skb, &flow_keys_dissector, flow,
+ return __skb_flow_dissect(NULL, skb, &flow_keys_dissector, flow,
NULL, 0, 0, 0, flags);
}
static inline bool
-skb_flow_dissect_flow_keys_basic(const struct sk_buff *skb,
+skb_flow_dissect_flow_keys_basic(struct net *net,
+ const struct sk_buff *skb,
struct flow_keys_basic *flow, void *data,
__be16 proto, int nhoff, int hlen,
unsigned int flags)
{
memset(flow, 0, sizeof(*flow));
- return __skb_flow_dissect(skb, &flow_keys_basic_dissector, flow,
+ return __skb_flow_dissect(net, skb, &flow_keys_basic_dissector, flow,
data, proto, nhoff, hlen, flags);
}
@@ -2438,7 +2440,8 @@ static inline void skb_probe_transport_header(struct sk_buff *skb,
if (skb_transport_header_was_set(skb))
return;
- if (skb_flow_dissect_flow_keys_basic(skb, &keys, NULL, 0, 0, 0, 0))
+ if (skb_flow_dissect_flow_keys_basic(NULL, skb, &keys,
+ NULL, 0, 0, 0, 0))
skb_set_transport_header(skb, keys.control.thoff);
else
skb_set_transport_header(skb, offset_hint);
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index bb1a54747d64..dddcc37c0462 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -725,6 +725,7 @@ bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
/**
* __skb_flow_dissect - extract the flow_keys struct and return it
+ * @net: associated network namespace, if NULL pulled from skb
* @skb: sk_buff to extract the flow from, can be NULL if the rest are specified
* @flow_dissector: list of keys to dissect
* @target_container: target structure to put dissected values into
@@ -739,7 +740,8 @@ bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
*
* Caller must take care of zeroing target container memory.
*/
-bool __skb_flow_dissect(const struct sk_buff *skb,
+bool __skb_flow_dissect(struct net *net,
+ const struct sk_buff *skb,
struct flow_dissector *flow_dissector,
void *target_container,
void *data, __be16 proto, int nhoff, int hlen,
@@ -799,12 +801,11 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
rcu_read_lock();
- if (skb->dev)
- attached = rcu_dereference(dev_net(skb->dev)->flow_dissector_prog);
- else if (skb->sk)
- attached = rcu_dereference(sock_net(skb->sk)->flow_dissector_prog);
- else
- WARN_ON_ONCE(1);
+ if (!net && skb)
+ net = skb_net(skb);
+ if (net)
+ attached = rcu_dereference(net->flow_dissector_prog);
+ WARN_ON_ONCE(!net);
if (attached) {
ret = __skb_flow_bpf_dissect(attached, skb,
@@ -1406,7 +1407,7 @@ u32 __skb_get_hash_symmetric(const struct sk_buff *skb)
__flow_hash_secret_init();
memset(&keys, 0, sizeof(keys));
- __skb_flow_dissect(skb, &flow_keys_dissector_symmetric, &keys,
+ __skb_flow_dissect(NULL, skb, &flow_keys_dissector_symmetric, &keys,
NULL, 0, 0, 0,
FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL);
@@ -1508,7 +1509,8 @@ u32 skb_get_poff(const struct sk_buff *skb)
{
struct flow_keys_basic keys;
- if (!skb_flow_dissect_flow_keys_basic(skb, &keys, NULL, 0, 0, 0, 0))
+ if (!skb_flow_dissect_flow_keys_basic(NULL, skb, &keys,
+ NULL, 0, 0, 0, 0))
return 0;
return __skb_get_poff(skb, skb->data, &keys, skb_headlen(skb));
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 4c520110b04f..155d55025bfc 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -136,8 +136,9 @@ u32 eth_get_headlen(void *data, unsigned int len)
return len;
/* parse any remaining L2/L3 headers, check for L4 */
- if (!skb_flow_dissect_flow_keys_basic(NULL, &keys, data, eth->h_proto,
- sizeof(*eth), len, flags))
+ if (!skb_flow_dissect_flow_keys_basic(NULL, NULL, &keys, data,
+ eth->h_proto, sizeof(*eth),
+ len, flags))
return max_t(u32, keys.control.thoff, sizeof(*eth));
/* parse for any L4 headers */
--
2.20.1.611.gfbb209baf1-goog
^ permalink raw reply related
* [RFC bpf-next 2/7] net: introduce skb_net helper
From: Stanislav Fomichev @ 2019-02-05 17:36 UTC (permalink / raw)
To: netdev; +Cc: davem, ast, daniel, simon.horman, willemb, Stanislav Fomichev
In-Reply-To: <20190205173629.160717-1-sdf@google.com>
skb_net returns network namespace from the associated device or socket.
This will be used in the next commit.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
include/linux/skbuff.h | 2 ++
net/core/skbuff.c | 10 ++++++++++
2 files changed, 12 insertions(+)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ad883ab2762c..28723a86efdf 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4343,5 +4343,7 @@ static inline __wsum lco_csum(struct sk_buff *skb)
return csum_partial(l4_hdr, csum_start - l4_hdr, partial);
}
+struct net *skb_net(const struct sk_buff *skb);
+
#endif /* __KERNEL__ */
#endif /* _LINUX_SKBUFF_H */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 23c9cf100bd4..016db13fa2b6 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5585,6 +5585,16 @@ void skb_condense(struct sk_buff *skb)
skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
}
+struct net *skb_net(const struct sk_buff *skb)
+{
+ if (skb->dev)
+ return dev_net(skb->dev);
+ else if (skb->sk)
+ return sock_net(skb->sk);
+ return NULL;
+}
+EXPORT_SYMBOL(skb_net);
+
#ifdef CONFIG_SKB_EXTENSIONS
static void *skb_ext_get_ptr(struct skb_ext *ext, enum skb_ext_id id)
{
--
2.20.1.611.gfbb209baf1-goog
^ permalink raw reply related
* [RFC bpf-next 1/7] net: introduce __init_skb and __init_skb_shinfo helpers
From: Stanislav Fomichev @ 2019-02-05 17:36 UTC (permalink / raw)
To: netdev; +Cc: davem, ast, daniel, simon.horman, willemb, Stanislav Fomichev
In-Reply-To: <20190205173629.160717-1-sdf@google.com>
__init_skb is essentially a version of __build_skb which accepts skb as
an argument (instead of doing kmem_cache_alloc to allocate it).
__init_skb_shinfo initializes shinfo.
No functional changes.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
include/linux/skbuff.h | 1 +
net/core/skbuff.c | 68 ++++++++++++++++++++----------------------
2 files changed, 33 insertions(+), 36 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 831846617d07..ad883ab2762c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1001,6 +1001,7 @@ void kfree_skb_partial(struct sk_buff *skb, bool head_stolen);
bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
bool *fragstolen, int *delta_truesize);
+void __init_skb(struct sk_buff *skb, u8 *data, unsigned int size);
struct sk_buff *__alloc_skb(unsigned int size, gfp_t priority, int flags,
int node);
struct sk_buff *__build_skb(void *data, unsigned int frag_size);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 26d848484912..23c9cf100bd4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -160,6 +160,34 @@ static void *__kmalloc_reserve(size_t size, gfp_t flags, int node,
*
*/
+void __init_skb(struct sk_buff *skb, u8 *data, unsigned int size)
+{
+ /* Only clear those fields we need to clear, not those that we will
+ * actually initialise below. Hence, don't put any more fields after
+ * the tail pointer in struct sk_buff!
+ */
+ memset(skb, 0, offsetof(struct sk_buff, tail));
+ /* Account for allocated memory : skb + skb->head */
+ skb->truesize = SKB_TRUESIZE(size);
+ refcount_set(&skb->users, 1);
+ skb->head = data;
+ skb->data = data;
+ skb_reset_tail_pointer(skb);
+ skb->end = skb->tail + size;
+ skb->mac_header = (typeof(skb->mac_header))~0U;
+ skb->transport_header = (typeof(skb->transport_header))~0U;
+}
+
+static inline void __init_skb_shinfo(struct sk_buff *skb)
+{
+ struct skb_shared_info *shinfo;
+
+ /* make sure we initialize shinfo sequentially */
+ shinfo = skb_shinfo(skb);
+ memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
+ atomic_set(&shinfo->dataref, 1);
+}
+
/**
* __alloc_skb - allocate a network buffer
* @size: size to allocate
@@ -181,7 +209,6 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
int flags, int node)
{
struct kmem_cache *cache;
- struct skb_shared_info *shinfo;
struct sk_buff *skb;
u8 *data;
bool pfmemalloc;
@@ -215,27 +242,9 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
size = SKB_WITH_OVERHEAD(ksize(data));
prefetchw(data + size);
- /*
- * Only clear those fields we need to clear, not those that we will
- * actually initialise below. Hence, don't put any more fields after
- * the tail pointer in struct sk_buff!
- */
- memset(skb, 0, offsetof(struct sk_buff, tail));
- /* Account for allocated memory : skb + skb->head */
- skb->truesize = SKB_TRUESIZE(size);
+ __init_skb(skb, data, size);
+ __init_skb_shinfo(skb);
skb->pfmemalloc = pfmemalloc;
- refcount_set(&skb->users, 1);
- skb->head = data;
- skb->data = data;
- skb_reset_tail_pointer(skb);
- skb->end = skb->tail + size;
- skb->mac_header = (typeof(skb->mac_header))~0U;
- skb->transport_header = (typeof(skb->transport_header))~0U;
-
- /* make sure we initialize shinfo sequentially */
- shinfo = skb_shinfo(skb);
- memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
- atomic_set(&shinfo->dataref, 1);
if (flags & SKB_ALLOC_FCLONE) {
struct sk_buff_fclones *fclones;
@@ -277,7 +286,6 @@ EXPORT_SYMBOL(__alloc_skb);
*/
struct sk_buff *__build_skb(void *data, unsigned int frag_size)
{
- struct skb_shared_info *shinfo;
struct sk_buff *skb;
unsigned int size = frag_size ? : ksize(data);
@@ -287,20 +295,8 @@ struct sk_buff *__build_skb(void *data, unsigned int frag_size)
size -= SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
- memset(skb, 0, offsetof(struct sk_buff, tail));
- skb->truesize = SKB_TRUESIZE(size);
- refcount_set(&skb->users, 1);
- skb->head = data;
- skb->data = data;
- skb_reset_tail_pointer(skb);
- skb->end = skb->tail + size;
- skb->mac_header = (typeof(skb->mac_header))~0U;
- skb->transport_header = (typeof(skb->transport_header))~0U;
-
- /* make sure we initialize shinfo sequentially */
- shinfo = skb_shinfo(skb);
- memset(shinfo, 0, offsetof(struct skb_shared_info, dataref));
- atomic_set(&shinfo->dataref, 1);
+ __init_skb(skb, data, size);
+ __init_skb_shinfo(skb);
return skb;
}
--
2.20.1.611.gfbb209baf1-goog
^ permalink raw reply related
* [RFC bpf-next 0/7] net: flow_dissector: trigger BPF hook when called from eth_get_headlen
From: Stanislav Fomichev @ 2019-02-05 17:36 UTC (permalink / raw)
To: netdev; +Cc: davem, ast, daniel, simon.horman, willemb, Stanislav Fomichev
Currently, when eth_get_headlen calls flow dissector, it doesn't pass any
skb. Because we use passed skb to lookup associated networking namespace
to find whether we have a BPF program attached or not, we always use
C-based flow dissector in this case.
The goal of this patch series is to add new networking namespace argument
to the eth_get_headlen and make BPF flow dissector programs be able to
work in the skb-less case.
The series goes like this:
1. introduce __init_skb and __init_skb_shinfo; those will be used to
initialize temporary skb
2. introduce skb_net which can be used to get networking namespace
associated with an skb
3. add new optional network namespace argument to __skb_flow_dissect and
plumb through the callers
4. add new __flow_bpf_dissect which constructs temporary on-stack skb
(using __init_skb) and calls BPF flow dissector program
5. convert flow dissector BPF_PROG_TEST_RUN to skb-less mode to show that
it works
6. add selftest that makes sure going over the packet bounds in
bpf_skb_load_bytes with on-stack skb doesn't cause any problems
7. add new net namespace argument go eth_get_headlen and convert the
callers
Stanislav Fomichev (7):
net: introduce __init_skb and __init_skb_shinfo helpers
net: introduce skb_net helper
net: plumb network namespace into __skb_flow_dissect
net: flow_dissector: handle no-skb use case
bpf: when doing BPF_PROG_TEST_RUN for flow dissector use no-skb mode
selftests/bpf: add flow dissector bpf_skb_load_bytes helper test
net: flow_dissector: pass net argument to the eth_get_headlen
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
drivers/net/ethernet/hisilicon/hns/hns_enet.c | 3 +-
.../net/ethernet/hisilicon/hns3/hns3_enet.c | 3 +-
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 3 +-
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 2 +-
drivers/net/ethernet/intel/ice/ice_txrx.c | 2 +-
drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
drivers/net/ethernet/intel/igc/igc_main.c | 2 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
.../net/ethernet/intel/ixgbevf/ixgbevf_main.c | 3 +-
.../net/ethernet/mellanox/mlx5/core/en_tx.c | 3 +-
drivers/net/tun.c | 3 +-
include/linux/etherdevice.h | 2 +-
include/linux/skbuff.h | 23 +++-
net/bpf/test_run.c | 52 +++------
net/core/flow_dissector.c | 105 +++++++++++++-----
net/core/skbuff.c | 78 +++++++------
net/ethernet/eth.c | 8 +-
tools/testing/selftests/bpf/test_progs.c | 49 ++++++++
20 files changed, 227 insertions(+), 122 deletions(-)
--
2.20.1.611.gfbb209baf1-goog
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox