* Re: [PATCH net] net: rtnetlink: fix info leak in RTM_GETSTATS call
From: David Miller @ 2017-10-03 17:19 UTC (permalink / raw)
To: nikolay; +Cc: netdev, keescook, dvyukov, andreyknvl, kcc, roopa, glider,
edumazet
In-Reply-To: <1507026048-13734-1-git-send-email-nikolay@cumulusnetworks.com>
From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Date: Tue, 3 Oct 2017 13:20:48 +0300
> When RTM_GETSTATS was added the fields of its header struct were not all
> initialized when returning the result thus leaking 4 bytes of information
> to user-space per rtnl_fill_statsinfo call, so initialize them now. Thanks
> to Alexander Potapenko for the detailed report and bisection.
>
> Reported-by: Alexander Potapenko <glider@google.com>
> Fixes: 10c9ead9f3c6 ("rtnetlink: add new RTM_GETSTATS message to dump link stats")
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Applied and queued up for -stable, thanks.
^ permalink raw reply
* Re: [PATCH v2 net-next 06/12] qed: Add LL2 slowpath handling
From: David Miller @ 2017-10-03 17:17 UTC (permalink / raw)
To: Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
dledford-H+wXaHxf7aLQT0dZR+AlfA,
Ariel.Elior-YGCgFSpz5w/QT0dZR+AlfA
In-Reply-To: <1507020902-4952-7-git-send-email-Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
From: Michal Kalderon <Michal.Kalderon-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
Date: Tue, 3 Oct 2017 11:54:56 +0300
> @@ -423,6 +423,41 @@ static void qed_ll2_rxq_parse_reg(struct qed_hwfn *p_hwfn,
> }
>
> static int
> +qed_ll2_handle_slowpath(struct qed_hwfn *p_hwfn,
> + struct qed_ll2_info *p_ll2_conn,
> + union core_rx_cqe_union *p_cqe,
> + unsigned long *p_lock_flags)
> +{
...
> + spin_unlock_irqrestore(&p_rx->lock, *p_lock_flags);
> +
You can't drop this lock.
Another thread can enter the loop of our caller and process RX queue
entries, then we would return from here and try to process the same
entries again.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH] net: phy: DP83822 initial driver submission
From: Florian Fainelli @ 2017-10-03 17:15 UTC (permalink / raw)
To: Dan Murphy, andrew; +Cc: netdev
In-Reply-To: <20171003155316.12312-1-dmurphy@ti.com>
On 10/03/2017 08:53 AM, Dan Murphy wrote:
> Add support for the TI DP83822 10/100Mbit ethernet phy.
>
> The DP83822 provides flexibility to connect to a MAC through a
> standard MII, RMII or RGMII interface.
>
> Datasheet:
> http://www.ti.com/product/DP83822I/datasheet
This looks pretty good, just a few nits below.
[snip]
> +static int dp83822_set_wol(struct phy_device *phydev,
> + struct ethtool_wolinfo *wol)
> +{
> + struct net_device *ndev = phydev->attached_dev;
> + u16 value;
> + const u8 *mac;
> +
> + if (wol->wolopts & (WAKE_MAGIC | WAKE_MAGICSECURE)) {
> + mac = (const u8 *)ndev->dev_addr;
> +
> + if (!is_valid_ether_addr(mac))
> + return -EFAULT;
-EINVAL maybe?
> +
> + /* MAC addresses start with byte 5, but stored in mac[0].
> + * 822 PHYs store bytes 4|5, 2|3, 0|1
> + */
> + phy_write_mmd(phydev, DP83822_DEVADDR,
> + MII_DP83822_WOL_DA1, (mac[1] << 8) | mac[0]);
> + phy_write_mmd(phydev, DP83822_DEVADDR,
> + MII_DP83822_WOL_DA2, (mac[3] << 8) | mac[2]);
> + phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_DA3,
> + (mac[5] << 8) | mac[4]);
> +
> + value = phy_read_mmd(phydev, DP83822_DEVADDR,
> + MII_DP83822_WOL_CFG);
> + if (wol->wolopts & WAKE_MAGIC)
> + value |= DP83822_WOL_MAGIC_EN;
> + else
> + value &= ~DP83822_WOL_MAGIC_EN;
> +
> + if (wol->wolopts & WAKE_MAGICSECURE) {
> + value |= DP83822_WOL_SECURE_ON;
Just in case any of the writes below fail, you would probably want to
set this bit last, thus indicating that the password was successfully set.
> + phy_write_mmd(phydev, DP83822_DEVADDR,
> + MII_DP83822_RXSOP1,
> + (wol->sopass[1] << 8) | wol->sopass[0]);
> + phy_write_mmd(phydev, DP83822_DEVADDR,
> + MII_DP83822_RXSOP2,
> + (wol->sopass[3] << 8) | wol->sopass[2]);
> + phy_write_mmd(phydev, DP83822_DEVADDR,
> + MII_DP83822_RXSOP3,
> + (wol->sopass[5] << 8) | wol->sopass[4]);
In the else clause, you don't appear to be clearing the MagicPacket
SecureOn password, but your get_wol function does not check for
DP83822_WOL_SECURE_ON before returning the secure password, so either
one of these two should be fixed. I would go with fixing the get_wol
function see below.
> + } else {
> + value &= ~DP83822_WOL_SECURE_ON;
> + }
> +
> + value |= (DP83822_WOL_EN | DP83822_WOL_CLR_INDICATION |
> + DP83822_WOL_CLR_INDICATION);
The extra parenthesis should not be required here.
> + phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG,
> + value);
> + } else {
> + value =
> + phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG);
> + value &= (~DP83822_WOL_EN);
Same here, parenthesis should not be needed.
> + phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG,
> + value);
> + }
> +
> + return 0;
> +}
> +
> +static void dp83822_get_wol(struct phy_device *phydev,
> + struct ethtool_wolinfo *wol)
> +{
> + int value;
> +
> + wol->supported = (WAKE_MAGIC | WAKE_MAGICSECURE);
> + wol->wolopts = 0;
> +
> + value = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG);
> + if (value & DP83822_WOL_MAGIC_EN)
> + wol->wolopts |= WAKE_MAGIC;
> +
> + if (value & DP83822_WOL_SECURE_ON)
> + wol->wolopts |= WAKE_MAGICSECURE;
> +
> + if (~value & DP83822_WOL_CLR_INDICATION)
> + wol->wolopts = 0;
> +
> + wol->sopass[0] = (phy_read_mmd(phydev,
> + DP83822_DEVADDR,
> + MII_DP83822_RXSOP1) & 0xFF);
> + wol->sopass[1] =
> + (phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_RXSOP1) >> 8);
You can save about twice the amount of reads by using a temporary
variable to hold the 16-bit register ;)
> + wol->sopass[2] =
> + (phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_RXSOP2) & 0xFF);
> + wol->sopass[3] =
> + (phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_RXSOP2) >> 8);
> + wol->sopass[4] =
> + (phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_RXSOP3) & 0xFF);
> + wol->sopass[5] =
> + (phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_RXSOP3) >> 8);
Unless DP83822_WOL_SECURE_ON is set, you probably should not try reading
the password at all, because there is no guarantee it has been correctly
set.
> +}
> +static int dp83822_phy_reset(struct phy_device *phydev)
> +{
> + int err;
> +
> + err = phy_write(phydev, MII_DP83822_RESET_CTRL, DP83822_HW_RESET);
> + if (err < 0)
> + return err;
> +
> + return 0;
> +}
> +
> +static int dp83822_suspend(struct phy_device *phydev)
> +{
> + int value;
> +
> + mutex_lock(&phydev->lock);
> +
> + value = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG);
> + if (~value & DP83822_WOL_EN) {
> + value = phy_read(phydev, MII_BMCR);
> + phy_write(phydev, MII_BMCR, value | BMCR_PDOWN);
> + }
Can you use genphy_suspend() here along with careful locking of course?
> +
> + mutex_unlock(&phydev->lock);
> +
> + return 0;
> +}
> +
> +static int dp83822_resume(struct phy_device *phydev)
> +{
> + int value;
> +
> + mutex_lock(&phydev->lock);
> +
> + value = phy_read(phydev, MII_BMCR);
> + phy_write(phydev, MII_BMCR, value & ~BMCR_PDOWN);
And genphy_resume() here as well?
> +
> + value = phy_read_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG);
> +
> + phy_write_mmd(phydev, DP83822_DEVADDR, MII_DP83822_WOL_CFG, value |
> + DP83822_WOL_CLR_INDICATION);
> +
> + mutex_unlock(&phydev->lock);
> +
> + return 0;
> +}
> +
> +static struct phy_driver dp83822_driver[] = {
> + {
> + .phy_id = DP83822_PHY_ID,
> + .phy_id_mask = 0xfffffff0,
> + .name = "TI DP83822",
> + .features = PHY_BASIC_FEATURES,
> + .flags = PHY_HAS_INTERRUPT,
> +
> + .config_init = genphy_config_init,
> + .soft_reset = dp83822_phy_reset,
> +
> + .get_wol = dp83822_get_wol,
> + .set_wol = dp83822_set_wol,
> +
> + /* IRQ related */
> + .ack_interrupt = dp83822_ack_interrupt,
> + .config_intr = dp83822_config_intr,
> +
> + .config_aneg = genphy_config_aneg,
> + .read_status = genphy_read_status,
> + .suspend = dp83822_suspend,
> + .resume = dp83822_resume,
> + },
I would omit newlines between definitions of callbacks, but this is
really a personal preference. Unless you are planning on adding new IDs,
you could also avoid using an array of 1 element and just a plain
phy_driver structure, but that's not a big deal either.
--
Florian
^ permalink raw reply
* Re: [patch net-next v2 0/7] mlxsw: Add support for partial multicast route offload
From: David Miller @ 2017-10-03 17:07 UTC (permalink / raw)
To: jiri
Cc: netdev, yotamg, idosch, mlxsw, nikolay, andrew, dsa, edumazet,
willemb, johannes.berg, dcaratti, pabeni, daniel, f.fainelli, fw,
gfree.wind
In-Reply-To: <20171003075812.1540-1-jiri@resnulli.us>
From: Jiri Pirko <jiri@resnulli.us>
Date: Tue, 3 Oct 2017 09:58:05 +0200
> From: Jiri Pirko <jiri@mellanox.com>
>
> Yotam says:
>
> Previous patchset introduced support for offloading multicast MFC routes to
> the Spectrum hardware. As described in that patchset, no partial offloading
> is supported, i.e if a route has one output interface which is not a valid
> offloadable device (e.g. pimreg device, dummy device, management NIC), the
> route is trapped to the CPU and the forwarding is done in slow-path.
>
> Add support for partial offloading of multicast routes, by letting the
> hardware to forward the packet to all the in-hardware devices, while the
> kernel ipmr module will continue forwarding to all other interfaces.
...
Series applied, thanks.
^ permalink raw reply
* Re: [v2] cdc-ether: divorce initialisation with a filter reset and a generic method
From: David Miller @ 2017-10-03 16:59 UTC (permalink / raw)
To: vigneshr; +Cc: oneukum, netdev, bjorn
In-Reply-To: <fbe82c0f-c3d5-4dd6-62a7-b8fd2ea83302@ti.com>
From: Vignesh R <vigneshr@ti.com>
Date: Tue, 3 Oct 2017 13:39:18 +0530
> Hi Dave,
>
> On Monday 22 May 2017 06:20 PM, Oliver Neukum wrote:
>> Some devices need their multicast filter reset but others are crashed by that.
>> So the methods need to be separated.
>>
>> Signed-off-by: Oliver Neukum <oneukum@suse.com>
>> Reported-by: "Ridgway, Keith" <kridgway@harris.com>
>> ---
>
> I see this patch was merged and said to be queued for -stable, but I
> don't see this fix in any of the stable kernels. Would you please ask
> for this patch to be backported to -stable kernels?
I submit networking -stable patches at a time of my own choosing.
Usually 1 to 2 weeks after it gets applied to my tree.
Please be patient.
^ permalink raw reply
* [PATCH v3 2/2] ARM: sun8i: h2+: specify wifi interrupts for Orange Pi Zero
From: Icenowy Zheng @ 2017-10-03 16:59 UTC (permalink / raw)
To: Kalle Valo, Rob Herring, Maxime Ripard, Chen-Yu Tsai
Cc: linux-wireless, netdev, devicetree, linux-kernel,
linux-arm-kernel, linux-sunxi, Sergey Matyukevich, Icenowy Zheng
In-Reply-To: <20171003165944.13056-1-icenowy@aosc.io>
From: Sergey Matyukevich <geomatsi@gmail.com>
The Orange Pi Zero board has Allwinner XR819 SDIO wifi chip. The board
dts file provides a node enabling mmc1 controller, and a out-of-band
interrupt line of the chip is also connected, although the chip also
supports in-band interrupt.
The current out-of-tree driver is hardcoded to use out-of-band interrupt
as default, and it needs to be modified to use the in-band interrupt.
This commit adds the out-of-band interrupt line into the device tree.
Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com>
[Icenowy: Changed vendor prefix to allwinner and modify commit message]
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
---
Changes in v3 by Icenowy:
- Change the compatible string vendor prefix to "allwinner".
- Modify the commit message.
Changes in v2 by Sergey:
- Adds the compatible string.
arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts b/arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts
index b1502df7b509..6595617204b3 100644
--- a/arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts
+++ b/arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts
@@ -127,6 +127,9 @@
*/
xr819: sdio_wifi@1 {
reg = <1>;
+ compatible = "allwinner,xr819";
+ interrupt-parent = <&pio>;
+ interrupts = <6 10 IRQ_TYPE_EDGE_RISING>;
};
};
--
2.13.5
^ permalink raw reply related
* [PATCH v3 1/2] dt-bindings: add device tree binding for Allwinner XR819 SDIO Wi-Fi
From: Icenowy Zheng @ 2017-10-03 16:59 UTC (permalink / raw)
To: Kalle Valo, Rob Herring, Maxime Ripard, Chen-Yu Tsai
Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-sunxi-/JYPxA39Uh5TLH3MbocFFw, Icenowy Zheng
In-Reply-To: <20171003165944.13056-1-icenowy-h8G6r0blFSE@public.gmane.org>
Allwinner XR819 is a SDIO Wi-Fi chip, which has the functionality to use
an out-of-band interrupt pin instead of SDIO in-band interrupt.
Add the device tree binding of this chip, in order to make it possible
to add this interrupt pin to device trees.
Signed-off-by: Icenowy Zheng <icenowy-h8G6r0blFSE@public.gmane.org>
Acked-by: Rob Herring <robh-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
Changes in v3:
- Renames the node name.
- Adds ACK from Rob.
Changes in v2:
- Removed status property in example.
- Added required property reg.
.../bindings/net/wireless/allwinner,xr819.txt | 38 ++++++++++++++++++++++
1 file changed, 38 insertions(+)
create mode 100644 Documentation/devicetree/bindings/net/wireless/allwinner,xr819.txt
diff --git a/Documentation/devicetree/bindings/net/wireless/allwinner,xr819.txt b/Documentation/devicetree/bindings/net/wireless/allwinner,xr819.txt
new file mode 100644
index 000000000000..7ae40441e343
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/wireless/allwinner,xr819.txt
@@ -0,0 +1,38 @@
+Allwinner XRadio wireless SDIO devices
+
+This node provides properties for controlling the XRadio wireless device. The
+node is expected to be specified as a child node to the SDIO controller that
+connects the device to the system.
+
+Required properties:
+
+ - reg : The SDIO function number, see "Use of function subnodes" in
+ ../../mmc/mmc.txt.
+ - compatible : Should be "allwinner,xr819".
+
+Optional properties:
+ - interrupt-parent : the phandle for the interrupt controller to which the
+ device interrupts are connected.
+ - interrupts : specifies attributes for the out-of-band interrupt (host-wake).
+ When not specified the device will use in-band SDIO interrupts.
+
+Example:
+
+mmc1: mmc@01c10000 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ pinctrl-names = "default";
+ pinctrl-0 = <&mmc1_pins_a>;
+ vmmc-supply = <®_vcc_wifi>;
+ mmc-pwrseq = <&wifi_pwrseq>;
+ bus-width = <4>;
+ non-removable;
+
+ xr819: wifi@1 {
+ reg = <1>;
+ compatible = "allwinner,xr819";
+ interrupt-parent = <&pio>;
+ interrupts = <6 10 IRQ_TYPE_EDGE_RISING>;
+ };
+};
--
2.13.5
^ permalink raw reply related
* [PATCH v3 0/2] Allwinner XR819 SDIO Wi-Fi DT binding and OPi Zero XR819 IRQ
From: Icenowy Zheng @ 2017-10-03 16:59 UTC (permalink / raw)
To: Kalle Valo, Rob Herring, Maxime Ripard, Chen-Yu Tsai
Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-sunxi-/JYPxA39Uh5TLH3MbocFFw, Icenowy Zheng
The Allwinner XR819 SDIO Wi-Fi chip supports an out-of-band interrupt line,
and the in-band interrupt is also supported.
However the current out-of-tree driver uses the out-of-band interrupt by
default.
This patchset adds the device tree binding for the chip as well as the
out-of-band interrupt, then adds the interrupt to the device tree of
Orange Pi Zero.
Icenowy Zheng (1):
dt-bindings: add device tree binding for Allwinner XR819 SDIO Wi-Fi
Sergey Matyukevich (1):
ARM: sun8i: h2+: specify wifi interrupts for Orange Pi Zero
.../bindings/net/wireless/allwinner,xr819.txt | 38 ++++++++++++++++++++++
arch/arm/boot/dts/sun8i-h2-plus-orangepi-zero.dts | 3 ++
2 files changed, 41 insertions(+)
create mode 100644 Documentation/devicetree/bindings/net/wireless/allwinner,xr819.txt
--
2.13.5
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH net-next] cxgb4: Update comment for min_mtu
From: David Miller @ 2017-10-03 16:56 UTC (permalink / raw)
To: ganeshgr; +Cc: netdev, nirranjan, indranil, venkatesh, arjun
In-Reply-To: <1507011185-7447-1-git-send-email-ganeshgr@chelsio.com>
From: Ganesh Goudar <ganeshgr@chelsio.com>
Date: Tue, 3 Oct 2017 11:43:05 +0530
> From: Arjun Vynipadath <arjun@chelsio.com>
>
> We have lost a comment for minimum mtu value set for netdevice with
> 'commit d894be57ca92 ("ethernet: use net core MTU range checking in
> more drivers"). Updating it accordingly.
>
> Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Applied.
^ permalink raw reply
* Re: [PATCH V2] Fix a sleep-in-atomic bug in shash_setkey_unaligned
From: Andy Lutomirski @ 2017-10-03 16:46 UTC (permalink / raw)
To: Herbert Xu
Cc: Andy Lutomirski, Jia-Ju Bai, David S. Miller, Neil Horman,
vyasevich, Kalle Valo, Linux Crypto Mailing List,
Network Development, linux-sctp, Linux Wireless List
In-Reply-To: <20171003052643.GB22750@gondor.apana.org.au>
On Mon, Oct 2, 2017 at 10:26 PM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Mon, Oct 02, 2017 at 09:18:24PM -0700, Andy Lutomirski wrote:
>> > On Oct 2, 2017, at 7:25 PM, Jia-Ju Bai <baijiaju1990@163.com> wrote:
>> >
>> > The SCTP program may sleep under a spinlock, and the function call path is:
>> > sctp_generate_t3_rtx_event (acquire the spinlock)
>> > sctp_do_sm
>> > sctp_side_effects
>> > sctp_cmd_interpreter
>> > sctp_make_init_ack
>> > sctp_pack_cookie
>> > crypto_shash_setkey
>> > shash_setkey_unaligned
>> > kmalloc(GFP_KERNEL)
>>
>> I'm going to go out on a limb here: why on Earth is out crypto API so
>> full of indirection that we allocate memory at all here?
>
> The crypto API operates on a one key per-tfm basis. So normally
> tfm allocation and key setting is done once only and not done on
> the data path.
>
> I have looked at the SCTP code and it appears to fit this paradigm.
> That is, we should be able to allocate the tfm and set the key when
> the key is actually generated via get_random_bytes, rather than every
> time the key is used which is not only a waste but as you see runs
> into API issues.
It's a waste because it loses a pre-computation advantage.
The fact that it has memory allocation issues is crypto API's fault,
full stop. There is no legit reason to need to allocate anything.
^ permalink raw reply
* Re: [PATCH net] net: br: Fix igmp snooping offload with CONFIG_BRIDGE_VLAN_FILTERING
From: Ido Schimmel @ 2017-10-03 16:42 UTC (permalink / raw)
To: Vivien Didelot
Cc: Andrew Lunn, Toshiaki Makita, Toshiaki Makita, David Miller,
netdev
In-Reply-To: <87infwtd0b.fsf@weeman.i-did-not-set--mail-host-address--so-tickle-me>
On Tue, Oct 03, 2017 at 12:25:08PM -0400, Vivien Didelot wrote:
> Andrew Lunn <andrew@lunn.ch> writes:
>
> >> The vlan will be effective only when vlan_filtering is enabled.
> >> When vlan_filtering is disabled, vlan information is still kept in the
> >> bridge and gets effective later when vlan_filtering becomes enable.
> >
> > O.K, so things are starting to get clearer.
> >
> > So when vlan filtering is disabled, the hardware should just ignore
> > the requests to add the vlan to the hardware?
> >
> > When vlan_filtering is enabled, are all the vlans in the software
> > bridge again offloaded? Or do we need to remember all the vlans which
> > we ignored while vlan filtering was disabled? The average switch has
> > nowhere to store these disabled vlans. It can only store active vlans.
>
> When vlan_filtering is enabled on the bridge, the bridge code does
> propagates the default_pvid again if I recall correctly.
>
> In my opinion the hardware mustn't ignore the VLAN requests, because we
> seem to agree that vlan_filtering disabled means that the target ports
> should not care yet about 802.1Q. So having some unused hardware VLAN
> entries and some ports with disabled 802.1Q mode must work together.
>
> That being said we still have the wrong hardware FDB populated when
> CONFIG_BRIDGE_VLAN_FILTERING is enabled but not vlan_filtering...
The driver can make sure it's able to handle the configured
`vlan_filtering` state during port enslavement to the bridge and also
forbid it from being toggled once it's enslaved.
^ permalink raw reply
* Re: BUG in free_netdev() on ppp link deletion
From: Guillaume Nault @ 2017-10-03 16:40 UTC (permalink / raw)
To: Beniamino Galvani
Cc: linux-ppp, netdev, Paul Mackerras, David Ahern, Gao Feng
In-Reply-To: <20171003074413.GA26158@tp>
On Tue, Oct 03, 2017 at 09:44:14AM +0200, Beniamino Galvani wrote:
> Call Trace:
> ppp_destroy_interface+0xd8/0xe0 [ppp_generic]
> ppp_disconnect_channel+0xda/0x110 [ppp_generic]
> ppp_unregister_channel+0x5e/0x110 [ppp_generic]
> pppox_unbind_sock+0x23/0x30 [pppox]
> pppoe_connect+0x130/0x440 [pppoe]
> SYSC_connect+0x98/0x110
> ? do_fcntl+0x2c0/0x5d0
> SyS_connect+0xe/0x10
> entry_SYSCALL_64_fastpath+0x1a/0xa5
> RIP: 0033:0x7fa71f4af840
> RSP: 002b:00007ffe4ea40bf8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
> RAX: ffffffffffffffda RBX: 0000556d37ae0538 RCX: 00007fa71f4af840
> RDX: 000000000000001e RSI: 00007ffe4ea40c00 RDI: 0000000000000008
> RBP: 0000556d37b2a1b0 R08: 0000556d396e95b0 R09: 0000000000000008
> R10: 00000000aaaaaaab R11: 0000000000000246 R12: 0000556d37adc008
> R13: 0000556d37adc004 R14: 0000556d37b2a1a4 R15: 0000000000000000
> Code: 04 00 00 04 e8 cb 52 e3 ff 5b 41 5c 41 5d 5d c3 41 0f b7 84 24 32 02 00 00 4c 89 e7 48 29 c7 e8 80 8b aa ff 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 41
> RIP: free_netdev+0x107/0x110 RSP: ffffc28a40573d88
> ---[ end trace ed294ff0cc40eeff ]---
>
> To reproduce this, establish a PPP connection through pppd, then bring
> down and delete the ppp interface:
>
> # pppd nodetach lock user client plugin rp-pppoe.so ens11 noauth nodeflate password password &
> Plugin rp-pppoe.so loaded.
> RP-PPPoE plugin version 3.8p compiled against pppd 2.4.7
> PPP session is 16
> Connected to fe:54:00:5f:04:13 via interface ens11
> Using interface ppp0
> Connect: ppp0 <--> ens11
> CHAP authentication succeeded: Access granted
> CHAP authentication succeeded
> peer from calling number FE:54:00:5F:04:13 authorized
> local IP address 3.1.1.10
> remote IP address 3.1.1.1
>
> # ip l set ppp0 down
> # ip l del ppp0
>
> It does not happen every time but only when ppp_destroy_interface() is
> called with dev->reg_state = UNREGISTERING, set by the concurrent
> rtnl_delete_link().
>
Indeed, we have a race here: ppp_destroy_interface() can be called before
netdev_run_todo() completes. I'm working on it.
^ permalink raw reply
* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Eric Dumazet @ 2017-10-03 16:36 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: Mark Rutland, LKML, netdev, linux-arm-kernel, syzkaller,
David S. Miller, Willem de Bruijn
In-Reply-To: <CACT4Y+azruiH-oEXVFen8WiDHhz0PS3DF7OdrkSCJYVRmkhoiQ@mail.gmail.com>
On Tue, Oct 3, 2017 at 9:06 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> On Tue, Oct 3, 2017 at 5:38 PM, 'Eric Dumazet' via syzkaller
> <syzkaller@googlegroups.com> wrote:
>> On Tue, Oct 3, 2017 at 8:19 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
>>> On Mon, Oct 2, 2017 at 4:42 PM, 'Eric Dumazet' via syzkaller
>>> <syzkaller@googlegroups.com> wrote:
>>>> On Mon, Oct 2, 2017 at 7:21 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>>>>> Hi Eric,
>>>>>
>>>>> On Mon, Oct 02, 2017 at 06:36:32AM -0700, Eric Dumazet wrote:
>>>>>> On Mon, Oct 2, 2017 at 3:49 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>>>>>> > I hit the below splat at net/core/skbuff.c:2626 while fuzzing v4.14-rc2
>>>>>> > on arm64 with Syzkaller. This is the BUG_ON(len) at the end of
>>>>>> > skb_copy_and_csum_bits().
>>>>>
>>>>>> > kernel BUG at net/core/skbuff.c:2626!
>>>>>
>>>>>> > [<ffff200009e03214>] skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
>>>>>> > [<ffff20000a01d244>] icmp_glue_bits+0xa4/0x2a0 net/ipv4/icmp.c:357
>>>>>> > [<ffff200009f3f0d4>] __ip_append_data+0x10e4/0x20a8 net/ipv4/ip_output.c:1018
>>>>>> > [<ffff200009f41a88>] ip_append_data.part.3+0xe8/0x1a0 net/ipv4/ip_output.c:1170
>>>>>> > [<ffff200009f46e74>] ip_append_data+0xa4/0xb0 net/ipv4/ip_output.c:1173
>>>>>> > [<ffff20000a01ccc8>] icmp_push_reply+0x1b8/0x690 net/ipv4/icmp.c:375
>>>>>> > [<ffff20000a0211b0>] icmp_send+0x1070/0x1890 net/ipv4/icmp.c:741
>>>>>> > [<ffff200009f41d48>] ip_fragment.constprop.4+0x208/0x340 net/ipv4/ip_output.c:552
>>>>>> > [<ffff200009f42228>] ip_finish_output+0x3a8/0xab0 net/ipv4/ip_output.c:315
>>>>>> > [<ffff200009f468c4>] NF_HOOK_COND include/linux/netfilter.h:238 [inline]
>>>>>> > [<ffff200009f468c4>] ip_output+0x284/0x790 net/ipv4/ip_output.c:405
>>>>>> > [<ffff200009f43204>] dst_output include/net/dst.h:458 [inline]
>>>>>> > [<ffff200009f43204>] ip_local_out+0x9c/0x1b8 net/ipv4/ip_output.c:124
>>>>>> > [<ffff200009f445e8>] ip_queue_xmit+0x850/0x18e0 net/ipv4/ip_output.c:504
>>>>>> > [<ffff200009fb091c>] tcp_transmit_skb+0x107c/0x3338 net/ipv4/tcp_output.c:1123
>>>>>> > [<ffff200009fbbcc4>] __tcp_retransmit_skb+0x614/0x1d18 net/ipv4/tcp_output.c:2847
>>>>>> > [<ffff200009fbd840>] tcp_send_loss_probe+0x478/0x7d0 net/ipv4/tcp_output.c:2457
>>>>>> > [<ffff200009fc707c>] tcp_write_timer_handler+0x50c/0x7e8 net/ipv4/tcp_timer.c:557
>>>>>> > [<ffff200009fc73d0>] tcp_write_timer+0x78/0x170 net/ipv4/tcp_timer.c:579
>>>>>> > [<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
>>>>>> > [<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
>>>>>> > [<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
>>>>>> > [<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
>>>>>> > [<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
>>>>>> > [<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
>>>>>> > [<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
>>>>>> > [<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
>>>>>> > [<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
>>>>>> > [<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
>>>>>> > [<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367
>>>>>
>>>>>> This is most likely a bug caused by syzkaller setting a ridiculous MTU
>>>>>> on loopback device, below minimum size of ipv4 MTU.
>>>>>
>>>>>> I tried to track it in August [1], but it seems hard to find all the
>>>>>> issues with this.
>>>>>>
>>>>>> commit c780a049f9bf442314335372c9abc4548bfe3e44
>>>>>> Author: Eric Dumazet <edumazet@google.com>
>>>>>> Date: Wed Aug 16 11:09:12 2017 -0700
>>>>>>
>>>>>> ipv4: better IP_MAX_MTU enforcement
>>>>>>
>>>>>> While working on yet another syzkaller report, I found
>>>>>> that our IP_MAX_MTU enforcements were not properly done.
>>>>>>
>>>>>> gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
>>>>>> final result can be bigger than IP_MAX_MTU :/
>>>>>>
>>>>>> This is a problem because device mtu can be changed on other cpus or
>>>>>> threads.
>>>>>>
>>>>>> While this patch does not fix the issue I am working on, it is
>>>>>> probably worth addressing it.
>>>>>
>>>>> Just to check I've understood correctly, are you suggesting that the
>>>>> IPv4 code should also check the dev->mtu against a IP_MIN_MTU (which
>>>>> doesn't seem to exist today)?
>>>>
>>>> We have plenty of places this is checked.
>>>>
>>>> For example, trying to set MTU < 68 usually removes IPv4 addresses and routes.
>>>>
>>>> Problem is : these checks are not fool proof yet.
>>>>
>>>> ( Only the admin was supposed to play these games )
>>>>
>>>>>
>>>>> Otherwise, I do spot another potential issue. The writer side (e.g. most
>>>>> net_device::ndo_change_mtu implementations and the __dev_set_mtu()
>>>>> fallback) doesn't use WRITE_ONCE().
>>>>
>>>> It does not matter how many strange values can be observed by the reader :
>>>> We must be fool proof anyway from reader point of view, so the
>>>> WRITE_ONCE() is not strictly needed.
>>>
>>>
>>> Note if writer stores some temporal garbage there (which C language
>>> perfectly allows), it does not matter what we do on reader side --
>>> reader won't get correct data anyway. Say mtu changes from 1000 to
>>> 2000, but writer temporary stores 1 there, reader can observe 1 while
>>> it must not. Synchronization is always a game of two.
>>
>> Since we have no sync here, a reader _must_ cope with any MTU value.
>>
>> We need to care of any value, so we do not care how dummy writers can be.
>>
>> Sure, a WRITE_ONCE() will help avoiding some strange values being written,
>> but since we _allow_ writers to write such strange values,
>> there is really no point pretending to be safe here.
>>
>> Adding a WRITE_ONCE() will not fix the bug.
>
>
> Reader must cope with any value. But there is an additional
> requirement that it must behave correctly.
As soon as we allow admin to change MTU on a device, we _are_ dropping packets.
This is absolutely normal so far.
On most NIC , an MTU change closes and opens the port, typically
blocking the device for several seconds.
Networking is best effort, there is no 'requirement' that networking
stack is supposed to cope with any events,
including silly admins.
The following is still legal
while :
do
ifconfig lo mu random
done
And nothing asks us to deliver any packet while this loop is running.
We only have to prevent crashes ;)
> If mtu was 1000 and then
> reset to 2000 once (and not other manipulations with mtu), then
> correct behavior is either (1) sending packets with mtu 1000 or (2)
> sending packets with mtu 2000 (after mtu change) and nothing else.
> Sending packets with mtu 500, dropping packets because mtu is observed
> to be 1, or formatting hard drive are all incorrect behaviors and must
> not happen.
>
> What you say is valid for communication with user-space
> (copy_form_user, etc). Because there we don't control write side and
> racy writes are indistinguishable from intentional writes that do the
> same.
This is absolutely not relevant to the bug we are looking at at this
moment, even if true.
Unless you found the root cause, I will kindly ask you to open another
thread on lkml
to not pollute this one.
Thank you !
^ permalink raw reply
* Re: [PATCH iproute2] iproute: build more easily on Android
From: Lorenzo Colitti @ 2017-10-03 16:35 UTC (permalink / raw)
To: enh; +Cc: Stephen Hemminger, netdev@vger.kernel.org
In-Reply-To: <CAJgzZor7VMuTWgRUvVp_q=DHEm5bdgHNWxBJ3wG+X9GWTEwB7Q@mail.gmail.com>
On Tue, Oct 3, 2017 at 5:23 AM, enh <enh@google.com> wrote:
>> Rather than moving everything, why not make kernel headers directory
>> configurable as part of the configure script setup process.
>
> the problem is that C libraries with their our own uapi headers still
> need your app-specific headers. to build iproute2 we need to put
> iproute2's include/ on our include path, but then the fact that your
> different uapi headers are *under* that directory causes the conflict.
Right - when building iproute2 we must have .../iproute2/include in
the include paths.
So when, say, ip/link_iptnl.c does #include <linux/in.h>, that file is
in two places in the path - the C library includes and the iproute
includes, and those two files conflict with each other.
There's no way to tell the compiler "use external/iproute2/include but
not external/iproute2/include/linux". But if the iproute2 files are in
uapi/ , then a simple #include <linux/in.h> won't find the UAPI copy
and the files won't conflict.
^ permalink raw reply
* [net-next 9/9] fm10k: fix mis-ordered parameters in declaration for .ndo_set_vf_bw
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171003163138.47569-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
We've had support for setting both a minimum and maximum bandwidth via
.ndo_set_vf_bw since commit 883a9ccbae56 ("fm10k: Add support for SR-IOV
to driver", 2014-09-20).
Likely because we do not support minimum rates, the declaration
mis-ordered the "unused" parameter, which causes warnings when analyzed
with cppcheck.
Fix this warning by properly declaring the min_rate and max_rate
variables in the declaration and definition (rather than using
"unused"). Also rename "rate" to max_rate so as to clarify that we only
support setting the maximum rate.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/fm10k/fm10k.h | 4 ++--
drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 9 +++++----
2 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h b/drivers/net/ethernet/intel/fm10k/fm10k.h
index 40856bc0f3b9..46973fb234c5 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k.h
@@ -562,8 +562,8 @@ s32 fm10k_iov_update_pvid(struct fm10k_intfc *interface, u16 glort, u16 pvid);
int fm10k_ndo_set_vf_mac(struct net_device *netdev, int vf_idx, u8 *mac);
int fm10k_ndo_set_vf_vlan(struct net_device *netdev,
int vf_idx, u16 vid, u8 qos, __be16 vlan_proto);
-int fm10k_ndo_set_vf_bw(struct net_device *netdev, int vf_idx, int rate,
- int unused);
+int fm10k_ndo_set_vf_bw(struct net_device *netdev, int vf_idx,
+ int __always_unused min_rate, int max_rate);
int fm10k_ndo_get_vf_config(struct net_device *netdev,
int vf_idx, struct ifla_vf_info *ivi);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
index 4a17cc903eed..ea3ab24265ee 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
@@ -613,7 +613,7 @@ int fm10k_ndo_set_vf_vlan(struct net_device *netdev, int vf_idx, u16 vid,
}
int fm10k_ndo_set_vf_bw(struct net_device *netdev, int vf_idx,
- int __always_unused unused, int rate)
+ int __always_unused min_rate, int max_rate)
{
struct fm10k_intfc *interface = netdev_priv(netdev);
struct fm10k_iov_data *iov_data = interface->iov_data;
@@ -624,14 +624,15 @@ int fm10k_ndo_set_vf_bw(struct net_device *netdev, int vf_idx,
return -EINVAL;
/* rate limit cannot be less than 10Mbs or greater than link speed */
- if (rate && ((rate < FM10K_VF_TC_MIN) || rate > FM10K_VF_TC_MAX))
+ if (max_rate &&
+ (max_rate < FM10K_VF_TC_MIN || max_rate > FM10K_VF_TC_MAX))
return -EINVAL;
/* store values */
- iov_data->vf_info[vf_idx].rate = rate;
+ iov_data->vf_info[vf_idx].rate = max_rate;
/* update hardware configuration */
- hw->iov.ops.configure_tc(hw, vf_idx, rate);
+ hw->iov.ops.configure_tc(hw, vf_idx, max_rate);
return 0;
}
--
2.14.2
^ permalink raw reply related
* [net-next 8/9] fm10k: prefer %s and __func__ for diagnostic prints
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171003163138.47569-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
Don't hard code the function names in the diagnostic output when these
reset related routines fail. Instead, use %s and __func__ so that future
refactors don't need to change the print outs.
Additionally, while we are here, add missing function header comments
for the new reset_prepare and reset_done function handlers.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 58538ce997e1..1e9ae3197b17 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -2588,11 +2588,18 @@ static void fm10k_io_resume(struct pci_dev *pdev)
if (err)
dev_warn(&pdev->dev,
- "fm10k_io_resume failed: %d\n", err);
+ "%s failed: %d\n", __func__, err);
else
netif_device_attach(netdev);
}
+/**
+ * fm10k_io_reset_prepare - called when PCI function is about to be reset
+ * @pdev: Pointer to PCI device
+ *
+ * This callback is called when the PCI function is about to be reset,
+ * allowing the device driver to prepare for it.
+ */
static void fm10k_io_reset_prepare(struct pci_dev *pdev)
{
/* warn incase we have any active VF devices */
@@ -2602,6 +2609,13 @@ static void fm10k_io_reset_prepare(struct pci_dev *pdev)
fm10k_prepare_suspend(pci_get_drvdata(pdev));
}
+/**
+ * fm10k_io_reset_done - called when PCI function has finished resetting
+ * @pdev: Pointer to PCI device
+ *
+ * This callback is called just after the PCI function is reset, such as via
+ * /sys/class/net/<enpX>/device/reset or similar.
+ */
static void fm10k_io_reset_done(struct pci_dev *pdev)
{
struct fm10k_intfc *interface = pci_get_drvdata(pdev);
@@ -2609,7 +2623,7 @@ static void fm10k_io_reset_done(struct pci_dev *pdev)
if (err) {
dev_warn(&pdev->dev,
- "fm10k_io_reset_notify failed: %d\n", err);
+ "%s failed: %d\n", __func__, err);
netif_device_detach(interface->netdev);
}
}
--
2.14.2
^ permalink raw reply related
* [net-next 7/9] fm10k: Fix misuse of net_ratelimit()
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
To: davem; +Cc: Joe Perches, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171003163138.47569-1-jeffrey.t.kirsher@intel.com>
From: Joe Perches <joe@perches.com>
Correct the backward logic using !net_ratelimit()
Miscellanea:
o Add a blank line before the error return label
Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 5d56ed5ad7a6..dbd69310f263 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -806,9 +806,10 @@ static int fm10k_tso(struct fm10k_ring *tx_ring,
tx_desc->mss = cpu_to_le16(skb_shinfo(skb)->gso_size);
return 1;
+
err_vxlan:
tx_ring->netdev->features &= ~NETIF_F_GSO_UDP_TUNNEL;
- if (!net_ratelimit())
+ if (net_ratelimit())
netdev_err(tx_ring->netdev,
"TSO requested for unsupported tunnel, disabling offload\n");
return -1;
--
2.14.2
^ permalink raw reply related
* [net-next 6/9] fm10k: bump version number
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171003163138.47569-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 189d52a8a605..5d56ed5ad7a6 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -28,7 +28,7 @@
#include "fm10k.h"
-#define DRV_VERSION "0.21.7-k"
+#define DRV_VERSION "0.22.1-k"
#define DRV_SUMMARY "Intel(R) Ethernet Switch Host Interface Driver"
const char fm10k_driver_version[] = DRV_VERSION;
char fm10k_driver_name[] = "fm10k";
--
2.14.2
^ permalink raw reply related
* [net-next 4/9] fm10k: introduce a message queue for MAC/VLAN messages
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171003163138.47569-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
Under some circumstances, when dealing with a large number of MAC
address or VLAN updates at once, the fm10k driver, particularly the VFs
can overload the mailbox with too many messages at once.
This results in a mailbox timeout, which causes the driver to initiate
a reset. During the reset, we re-send all the same messages that
originally caused the timeout. This results in a cycle of resets each
triggering a future reset.
To fix or avoid this, we introduce a workqueue item which monitors
a queue of MAC and VLAN requests. These requests are queued to the end
of the list, and we process as a FIFO periodically.
Initially we only handle requests for the netdev, but we do handle
unicast MAC addresses, multicast MAC addresses, and update VLAN
requests.
A future patch will add support to use this queue for handling MAC
update requests from the VF<->PF mailbox.
The MAC/VLAN work item will keep checking to make sure that each request
does not overflow the mailbox and cause a timeout. If it might, then the
work item will reschedule itself a short time later. This avoids any
reset cycle, since we never send the message if the mailbox is not
ready.
As an alternative, we tried increasing the mailbox message FIFO, but
this just delays the problem and results in needless memory waste on the
system. Our new message queue is dynamically allocated so only uses as
much memory as it needs. Additionally, it need not be contiguous like
the Tx and Rx FIFOs.
Note that this patch chose to only create a queue for MAC and VLAN
messages, since these are the only messages sent in a large enough
volume to cause the reset loop. Other messages are very unlikely to
overflow the mailbox Tx FIFO so easily.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/fm10k/fm10k.h | 39 +++++
drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 199 ++++++++++++++++++-----
drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 201 ++++++++++++++++++++++++
3 files changed, 397 insertions(+), 42 deletions(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h b/drivers/net/ethernet/intel/fm10k/fm10k.h
index 74542e9d63c7..40856bc0f3b9 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k.h
@@ -248,6 +248,29 @@ struct fm10k_udp_port {
__be16 port;
};
+enum fm10k_macvlan_request_type {
+ FM10K_UC_MAC_REQUEST,
+ FM10K_MC_MAC_REQUEST,
+ FM10K_VLAN_REQUEST
+};
+
+struct fm10k_macvlan_request {
+ enum fm10k_macvlan_request_type type;
+ struct list_head list;
+ union {
+ struct fm10k_mac_request {
+ u8 addr[ETH_ALEN];
+ u16 glort;
+ u16 vid;
+ } mac;
+ struct fm10k_vlan_request {
+ u32 vid;
+ u8 vsi;
+ } vlan;
+ };
+ bool set;
+};
+
/* one work queue for entire driver */
extern struct workqueue_struct *fm10k_workqueue;
@@ -276,6 +299,9 @@ enum fm10k_state_t {
__FM10K_SERVICE_SCHED,
__FM10K_SERVICE_REQUEST,
__FM10K_SERVICE_DISABLE,
+ __FM10K_MACVLAN_SCHED,
+ __FM10K_MACVLAN_REQUEST,
+ __FM10K_MACVLAN_DISABLE,
__FM10K_LINK_DOWN,
__FM10K_UPDATING_STATS,
/* This value must be last and determines the BITMAP size */
@@ -368,6 +394,12 @@ struct fm10k_intfc {
struct list_head vxlan_port;
struct list_head geneve_port;
+ /* MAC/VLAN update queue */
+ struct list_head macvlan_requests;
+ struct delayed_work macvlan_task;
+ /* MAC/VLAN update queue lock */
+ spinlock_t macvlan_lock;
+
#ifdef CONFIG_DEBUG_FS
struct dentry *dbg_intfc;
#endif /* CONFIG_DEBUG_FS */
@@ -487,6 +519,7 @@ void fm10k_up(struct fm10k_intfc *interface);
void fm10k_down(struct fm10k_intfc *interface);
void fm10k_update_stats(struct fm10k_intfc *interface);
void fm10k_service_event_schedule(struct fm10k_intfc *interface);
+void fm10k_macvlan_schedule(struct fm10k_intfc *interface);
void fm10k_update_rx_drop_en(struct fm10k_intfc *interface);
#ifdef CONFIG_NET_POLL_CONTROLLER
void fm10k_netpoll(struct net_device *netdev);
@@ -507,6 +540,12 @@ void fm10k_reset_rx_state(struct fm10k_intfc *);
int fm10k_setup_tc(struct net_device *dev, u8 tc);
int fm10k_open(struct net_device *netdev);
int fm10k_close(struct net_device *netdev);
+int fm10k_queue_vlan_request(struct fm10k_intfc *interface, u32 vid,
+ u8 vsi, bool set);
+int fm10k_queue_mac_request(struct fm10k_intfc *interface, u16 glort,
+ const unsigned char *addr, u16 vid, bool set);
+void fm10k_clear_macvlan_queue(struct fm10k_intfc *interface,
+ u16 glort, bool vlans);
/* Ethtool */
void fm10k_set_ethtool_ops(struct net_device *dev);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
index 77d495fedced..81e4425f0529 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_netdev.c
@@ -758,11 +758,132 @@ static bool fm10k_host_mbx_ready(struct fm10k_intfc *interface)
return (hw->mac.type == fm10k_mac_vf || interface->host_ready);
}
+/**
+ * fm10k_queue_vlan_request - Queue a VLAN update request
+ * @interface: the fm10k interface structure
+ * @vid: the VLAN vid
+ * @vsi: VSI index number
+ * @set: whether to set or clear
+ *
+ * This function queues up a VLAN update. For VFs, this must be sent to the
+ * managing PF over the mailbox. For PFs, we'll use the same handling so that
+ * it's similar to the VF. This avoids storming the PF<->VF mailbox with too
+ * many VLAN updates during reset.
+ */
+int fm10k_queue_vlan_request(struct fm10k_intfc *interface,
+ u32 vid, u8 vsi, bool set)
+{
+ struct fm10k_macvlan_request *request;
+ unsigned long flags;
+
+ /* This must be atomic since we may be called while the netdev
+ * addr_list_lock is held
+ */
+ request = kzalloc(sizeof(*request), GFP_ATOMIC);
+ if (!request)
+ return -ENOMEM;
+
+ request->type = FM10K_VLAN_REQUEST;
+ request->vlan.vid = vid;
+ request->vlan.vsi = vsi;
+ request->set = set;
+
+ spin_lock_irqsave(&interface->macvlan_lock, flags);
+ list_add_tail(&request->list, &interface->macvlan_requests);
+ spin_unlock_irqrestore(&interface->macvlan_lock, flags);
+
+ fm10k_macvlan_schedule(interface);
+
+ return 0;
+}
+
+/**
+ * fm10k_queue_mac_request - Queue a MAC update request
+ * @interface: the fm10k interface structure
+ * @glort: the target glort for this update
+ * @addr: the address to update
+ * @vid: the vid to update
+ * @sync: whether to add or remove
+ *
+ * This function queues up a MAC request for sending to the switch manager.
+ * A separate thread monitors the queue and sends updates to the switch
+ * manager. Return 0 on success, and negative error code on failure.
+ **/
+int fm10k_queue_mac_request(struct fm10k_intfc *interface, u16 glort,
+ const unsigned char *addr, u16 vid, bool set)
+{
+ struct fm10k_macvlan_request *request;
+ unsigned long flags;
+
+ /* This must be atomic since we may be called while the netdev
+ * addr_list_lock is held
+ */
+ request = kzalloc(sizeof(*request), GFP_ATOMIC);
+ if (!request)
+ return -ENOMEM;
+
+ if (is_multicast_ether_addr(addr))
+ request->type = FM10K_MC_MAC_REQUEST;
+ else
+ request->type = FM10K_UC_MAC_REQUEST;
+
+ ether_addr_copy(request->mac.addr, addr);
+ request->mac.glort = glort;
+ request->mac.vid = vid;
+ request->set = set;
+
+ spin_lock_irqsave(&interface->macvlan_lock, flags);
+ list_add_tail(&request->list, &interface->macvlan_requests);
+ spin_unlock_irqrestore(&interface->macvlan_lock, flags);
+
+ fm10k_macvlan_schedule(interface);
+
+ return 0;
+}
+
+/**
+ * fm10k_clear_macvlan_queue - Cancel pending updates for a given glort
+ * @interface: the fm10k interface structure
+ * @glort: the target glort to clear
+ * @vlans: true to clear VLAN messages, false to ignore them
+ *
+ * Cancel any outstanding MAC/VLAN requests for a given glort. This is
+ * expected to be called when a logical port goes down.
+ **/
+void fm10k_clear_macvlan_queue(struct fm10k_intfc *interface,
+ u16 glort, bool vlans)
+
+{
+ struct fm10k_macvlan_request *r, *tmp;
+ unsigned long flags;
+
+ spin_lock_irqsave(&interface->macvlan_lock, flags);
+
+ /* Free any outstanding MAC/VLAN requests for this interface */
+ list_for_each_entry_safe(r, tmp, &interface->macvlan_requests, list) {
+ switch (r->type) {
+ case FM10K_MC_MAC_REQUEST:
+ case FM10K_UC_MAC_REQUEST:
+ /* Don't free requests for other interfaces */
+ if (r->mac.glort != glort)
+ break;
+ /* fall through */
+ case FM10K_VLAN_REQUEST:
+ if (vlans) {
+ list_del(&r->list);
+ kfree(r);
+ }
+ break;
+ }
+ }
+
+ spin_unlock_irqrestore(&interface->macvlan_lock, flags);
+}
+
static int fm10k_uc_vlan_unsync(struct net_device *netdev,
const unsigned char *uc_addr)
{
struct fm10k_intfc *interface = netdev_priv(netdev);
- struct fm10k_hw *hw = &interface->hw;
u16 glort = interface->glort;
u16 vid = interface->vid;
bool set = !!(vid / VLAN_N_VID);
@@ -771,10 +892,7 @@ static int fm10k_uc_vlan_unsync(struct net_device *netdev,
/* drop any leading bits on the VLAN ID */
vid &= VLAN_N_VID - 1;
- if (fm10k_host_mbx_ready(interface))
- err = hw->mac.ops.update_uc_addr(hw, glort, uc_addr,
- vid, set, 0);
-
+ err = fm10k_queue_mac_request(interface, glort, uc_addr, vid, set);
if (err)
return err;
@@ -786,7 +904,6 @@ static int fm10k_mc_vlan_unsync(struct net_device *netdev,
const unsigned char *mc_addr)
{
struct fm10k_intfc *interface = netdev_priv(netdev);
- struct fm10k_hw *hw = &interface->hw;
u16 glort = interface->glort;
u16 vid = interface->vid;
bool set = !!(vid / VLAN_N_VID);
@@ -795,9 +912,7 @@ static int fm10k_mc_vlan_unsync(struct net_device *netdev,
/* drop any leading bits on the VLAN ID */
vid &= VLAN_N_VID - 1;
- if (fm10k_host_mbx_ready(interface))
- err = hw->mac.ops.update_mc_addr(hw, glort, mc_addr, vid, set);
-
+ err = fm10k_queue_mac_request(interface, glort, mc_addr, vid, set);
if (err)
return err;
@@ -855,18 +970,14 @@ static int fm10k_update_vid(struct net_device *netdev, u16 vid, bool set)
/* only need to update the VLAN if not in promiscuous mode */
if (!(netdev->flags & IFF_PROMISC)) {
- err = hw->mac.ops.update_vlan(hw, vid, 0, set);
+ err = fm10k_queue_vlan_request(interface, vid, 0, set);
if (err)
goto err_out;
}
- /* update our base MAC address if host's mailbox is ready */
- if (fm10k_host_mbx_ready(interface))
- err = hw->mac.ops.update_uc_addr(hw, interface->glort,
- hw->mac.addr, vid, set, 0);
- else
- err = -EHOSTDOWN;
-
+ /* Update our base MAC address */
+ err = fm10k_queue_mac_request(interface, interface->glort,
+ hw->mac.addr, vid, set);
if (err)
goto err_out;
@@ -910,7 +1021,6 @@ static u16 fm10k_find_next_vlan(struct fm10k_intfc *interface, u16 vid)
static void fm10k_clear_unused_vlans(struct fm10k_intfc *interface)
{
- struct fm10k_hw *hw = &interface->hw;
u32 vid, prev_vid;
/* loop through and find any gaps in the table */
@@ -922,7 +1032,7 @@ static void fm10k_clear_unused_vlans(struct fm10k_intfc *interface)
/* send request to clear multiple bits at a time */
prev_vid += (vid - prev_vid - 1) << FM10K_VLAN_LENGTH_SHIFT;
- hw->mac.ops.update_vlan(hw, prev_vid, 0, false);
+ fm10k_queue_vlan_request(interface, prev_vid, 0, false);
}
}
@@ -937,15 +1047,11 @@ static int __fm10k_uc_sync(struct net_device *dev,
if (!is_valid_ether_addr(addr))
return -EADDRNOTAVAIL;
- /* update table with current entries if host's mailbox is ready */
- if (!fm10k_host_mbx_ready(interface))
- return -EHOSTDOWN;
-
for (vid = hw->mac.default_vid ? fm10k_find_next_vlan(interface, 0) : 1;
vid < VLAN_N_VID;
vid = fm10k_find_next_vlan(interface, vid)) {
- err = hw->mac.ops.update_uc_addr(hw, glort, addr,
- vid, sync, 0);
+ err = fm10k_queue_mac_request(interface, glort,
+ addr, vid, sync);
if (err)
return err;
}
@@ -1002,15 +1108,18 @@ static int __fm10k_mc_sync(struct net_device *dev,
struct fm10k_intfc *interface = netdev_priv(dev);
struct fm10k_hw *hw = &interface->hw;
u16 vid, glort = interface->glort;
+ s32 err;
- /* update table with current entries if host's mailbox is ready */
- if (!fm10k_host_mbx_ready(interface))
- return 0;
+ if (!is_multicast_ether_addr(addr))
+ return -EADDRNOTAVAIL;
for (vid = hw->mac.default_vid ? fm10k_find_next_vlan(interface, 0) : 1;
vid < VLAN_N_VID;
vid = fm10k_find_next_vlan(interface, vid)) {
- hw->mac.ops.update_mc_addr(hw, glort, addr, vid, sync);
+ err = fm10k_queue_mac_request(interface, glort,
+ addr, vid, sync);
+ if (err)
+ return err;
}
return 0;
@@ -1050,7 +1159,8 @@ static void fm10k_set_rx_mode(struct net_device *dev)
if (interface->xcast_mode != xcast_mode) {
/* update VLAN table */
if (xcast_mode == FM10K_XCAST_MODE_PROMISC)
- hw->mac.ops.update_vlan(hw, FM10K_VLAN_ALL, 0, true);
+ fm10k_queue_vlan_request(interface, FM10K_VLAN_ALL,
+ 0, true);
if (interface->xcast_mode == FM10K_XCAST_MODE_PROMISC)
fm10k_clear_unused_vlans(interface);
@@ -1098,22 +1208,20 @@ void fm10k_restore_rx_state(struct fm10k_intfc *interface)
interface->glort_count, true);
/* update VLAN table */
- hw->mac.ops.update_vlan(hw, FM10K_VLAN_ALL, 0,
- xcast_mode == FM10K_XCAST_MODE_PROMISC);
+ fm10k_queue_vlan_request(interface, FM10K_VLAN_ALL, 0,
+ xcast_mode == FM10K_XCAST_MODE_PROMISC);
/* Add filter for VLAN 0 */
- hw->mac.ops.update_vlan(hw, 0, 0, true);
+ fm10k_queue_vlan_request(interface, 0, 0, true);
/* update table with current entries */
for (vid = hw->mac.default_vid ? fm10k_find_next_vlan(interface, 0) : 1;
vid < VLAN_N_VID;
vid = fm10k_find_next_vlan(interface, vid)) {
- hw->mac.ops.update_vlan(hw, vid, 0, true);
+ fm10k_queue_vlan_request(interface, vid, 0, true);
- /* Update unicast entries if host's mailbox is ready */
- if (fm10k_host_mbx_ready(interface))
- hw->mac.ops.update_uc_addr(hw, glort, hw->mac.addr,
- vid, true, 0);
+ fm10k_queue_mac_request(interface, glort,
+ hw->mac.addr, vid, true);
}
/* update xcast mode before synchronizing addresses if host's mailbox
@@ -1140,6 +1248,13 @@ void fm10k_reset_rx_state(struct fm10k_intfc *interface)
struct net_device *netdev = interface->netdev;
struct fm10k_hw *hw = &interface->hw;
+ /* Wait for MAC/VLAN work to finish */
+ while (test_bit(__FM10K_MACVLAN_SCHED, interface->state))
+ usleep_range(1000, 2000);
+
+ /* Cancel pending MAC/VLAN requests */
+ fm10k_clear_macvlan_queue(interface, interface->glort, true);
+
fm10k_mbx_lock(interface);
/* clear the logical port state on lower device if host's mailbox is
@@ -1374,8 +1489,8 @@ static void *fm10k_dfwd_add_station(struct net_device *dev,
if (fm10k_host_mbx_ready(interface)) {
hw->mac.ops.update_xcast_mode(hw, glort,
FM10K_XCAST_MODE_MULTI);
- hw->mac.ops.update_uc_addr(hw, glort, sdev->dev_addr,
- 0, true, 0);
+ fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
+ 0, true);
}
fm10k_mbx_unlock(interface);
@@ -1414,8 +1529,8 @@ static void fm10k_dfwd_del_station(struct net_device *dev, void *priv)
if (fm10k_host_mbx_ready(interface)) {
hw->mac.ops.update_xcast_mode(hw, glort,
FM10K_XCAST_MODE_NONE);
- hw->mac.ops.update_uc_addr(hw, glort, sdev->dev_addr,
- 0, false, 0);
+ fm10k_queue_mac_request(interface, glort, sdev->dev_addr,
+ 0, false);
}
fm10k_mbx_unlock(interface);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index aef39909e4a2..58538ce997e1 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -91,6 +91,76 @@ static int fm10k_hw_ready(struct fm10k_intfc *interface)
return FM10K_REMOVED(hw->hw_addr) ? -ENODEV : 0;
}
+/**
+ * fm10k_macvlan_schedule - Schedule MAC/VLAN queue task
+ * @interface: fm10k private interface structure
+ *
+ * Schedule the MAC/VLAN queue monitor task. If the MAC/VLAN task cannot be
+ * started immediately, request that it be restarted when possible.
+ */
+void fm10k_macvlan_schedule(struct fm10k_intfc *interface)
+{
+ /* Avoid processing the MAC/VLAN queue when the service task is
+ * disabled, or when we're resetting the device.
+ */
+ if (!test_bit(__FM10K_MACVLAN_DISABLE, interface->state) &&
+ !test_and_set_bit(__FM10K_MACVLAN_SCHED, interface->state)) {
+ clear_bit(__FM10K_MACVLAN_REQUEST, interface->state);
+ /* We delay the actual start of execution in order to allow
+ * multiple MAC/VLAN updates to accumulate before handling
+ * them, and to allow some time to let the mailbox drain
+ * between runs.
+ */
+ queue_delayed_work(fm10k_workqueue,
+ &interface->macvlan_task, 10);
+ } else {
+ set_bit(__FM10K_MACVLAN_REQUEST, interface->state);
+ }
+}
+
+/**
+ * fm10k_stop_macvlan_task - Stop the MAC/VLAN queue monitor
+ * @interface: fm10k private interface structure
+ *
+ * Wait until the MAC/VLAN queue task has stopped, and cancel any future
+ * requests.
+ */
+static void fm10k_stop_macvlan_task(struct fm10k_intfc *interface)
+{
+ /* Disable the MAC/VLAN work item */
+ set_bit(__FM10K_MACVLAN_DISABLE, interface->state);
+
+ /* Make sure we waited until any current invocations have stopped */
+ cancel_delayed_work_sync(&interface->macvlan_task);
+
+ /* We set the __FM10K_MACVLAN_SCHED bit when we schedule the task.
+ * However, it may not be unset of the MAC/VLAN task never actually
+ * got a chance to run. Since we've canceled the task here, and it
+ * cannot be rescheuled right now, we need to ensure the scheduled bit
+ * gets unset.
+ */
+ clear_bit(__FM10K_MACVLAN_SCHED, interface->state);
+}
+
+/**
+ * fm10k_resume_macvlan_task - Restart the MAC/VLAN queue monitor
+ * @interface: fm10k private interface structure
+ *
+ * Clear the __FM10K_MACVLAN_DISABLE bit and, if a request occurred, schedule
+ * the MAC/VLAN work monitor.
+ */
+static void fm10k_resume_macvlan_task(struct fm10k_intfc *interface)
+{
+ /* Re-enable the MAC/VLAN work item */
+ clear_bit(__FM10K_MACVLAN_DISABLE, interface->state);
+
+ /* We might have received a MAC/VLAN request while disabled. If so,
+ * kick off the queue now.
+ */
+ if (test_bit(__FM10K_MACVLAN_REQUEST, interface->state))
+ fm10k_macvlan_schedule(interface);
+}
+
void fm10k_service_event_schedule(struct fm10k_intfc *interface)
{
if (!test_bit(__FM10K_SERVICE_DISABLE, interface->state) &&
@@ -174,6 +244,12 @@ static bool fm10k_prepare_for_reset(struct fm10k_intfc *interface)
if (test_and_set_bit(__FM10K_RESETTING, interface->state))
return false;
+ /* As the MAC/VLAN task will be accessing registers it must not be
+ * running while we reset. Although the task will not be scheduled
+ * once we start resetting it may already be running
+ */
+ fm10k_stop_macvlan_task(interface);
+
rtnl_lock();
fm10k_iov_suspend(interface->pdev);
@@ -258,6 +334,8 @@ static int fm10k_handle_reset(struct fm10k_intfc *interface)
rtnl_unlock();
+ fm10k_resume_macvlan_task(interface);
+
clear_bit(__FM10K_RESETTING, interface->state);
return err;
@@ -686,6 +764,112 @@ static void fm10k_service_task(struct work_struct *work)
fm10k_service_event_complete(interface);
}
+/**
+ * fm10k_macvlan_task - send queued MAC/VLAN requests to switch manager
+ * @work: pointer to work_struct containing our data
+ *
+ * This work item handles sending MAC/VLAN updates to the switch manager. When
+ * the interface is up, it will attempt to queue mailbox messages to the
+ * switch manager requesting updates for MAC/VLAN pairs. If the Tx fifo of the
+ * mailbox is full, it will reschedule itself to try again in a short while.
+ * This ensures that the driver does not overload the switch mailbox with too
+ * many simultaneous requests, causing an unnecessary reset.
+ **/
+static void fm10k_macvlan_task(struct work_struct *work)
+{
+ struct fm10k_macvlan_request *item;
+ struct fm10k_intfc *interface;
+ struct delayed_work *dwork;
+ struct list_head *requests;
+ struct fm10k_hw *hw;
+ unsigned long flags;
+
+ dwork = to_delayed_work(work);
+ interface = container_of(dwork, struct fm10k_intfc, macvlan_task);
+ hw = &interface->hw;
+ requests = &interface->macvlan_requests;
+
+ do {
+ /* Pop the first item off the list */
+ spin_lock_irqsave(&interface->macvlan_lock, flags);
+ item = list_first_entry_or_null(requests,
+ struct fm10k_macvlan_request,
+ list);
+ if (item)
+ list_del_init(&item->list);
+
+ spin_unlock_irqrestore(&interface->macvlan_lock, flags);
+
+ /* We have no more items to process */
+ if (!item)
+ goto done;
+
+ fm10k_mbx_lock(interface);
+
+ /* Check that we have plenty of space to send the message. We
+ * want to ensure that the mailbox stays low enough to avoid a
+ * change in the host state, otherwise we may see spurious
+ * link up / link down notifications.
+ */
+ if (!hw->mbx.ops.tx_ready(&hw->mbx, FM10K_VFMBX_MSG_MTU + 5)) {
+ hw->mbx.ops.process(hw, &hw->mbx);
+ set_bit(__FM10K_MACVLAN_REQUEST, interface->state);
+ fm10k_mbx_unlock(interface);
+
+ /* Put the request back on the list */
+ spin_lock_irqsave(&interface->macvlan_lock, flags);
+ list_add(&item->list, requests);
+ spin_unlock_irqrestore(&interface->macvlan_lock, flags);
+ break;
+ }
+
+ switch (item->type) {
+ case FM10K_MC_MAC_REQUEST:
+ hw->mac.ops.update_mc_addr(hw,
+ item->mac.glort,
+ item->mac.addr,
+ item->mac.vid,
+ item->set);
+ break;
+ case FM10K_UC_MAC_REQUEST:
+ hw->mac.ops.update_uc_addr(hw,
+ item->mac.glort,
+ item->mac.addr,
+ item->mac.vid,
+ item->set,
+ 0);
+ break;
+ case FM10K_VLAN_REQUEST:
+ hw->mac.ops.update_vlan(hw,
+ item->vlan.vid,
+ item->vlan.vsi,
+ item->set);
+ break;
+ default:
+ break;
+ }
+
+ fm10k_mbx_unlock(interface);
+
+ /* Free the item now that we've sent the update */
+ kfree(item);
+ } while (true);
+
+done:
+ WARN_ON(!test_bit(__FM10K_MACVLAN_SCHED, interface->state));
+
+ /* flush memory to make sure state is correct */
+ smp_mb__before_atomic();
+ clear_bit(__FM10K_MACVLAN_SCHED, interface->state);
+
+ /* If a MAC/VLAN request was scheduled since we started, we should
+ * re-schedule. However, there is no reason to re-schedule if there is
+ * no work to do.
+ */
+ if (test_bit(__FM10K_MACVLAN_REQUEST, interface->state))
+ fm10k_macvlan_schedule(interface);
+}
+
/**
* fm10k_configure_tx_ring - Configure Tx ring after Reset
* @interface: board private structure
@@ -1918,11 +2102,15 @@ static int fm10k_sw_init(struct fm10k_intfc *interface,
INIT_LIST_HEAD(&interface->vxlan_port);
INIT_LIST_HEAD(&interface->geneve_port);
+ /* Initialize the MAC/VLAN queue */
+ INIT_LIST_HEAD(&interface->macvlan_requests);
+
netdev_rss_key_fill(rss_key, sizeof(rss_key));
memcpy(interface->rssrk, rss_key, sizeof(rss_key));
/* Initialize the mailbox lock */
spin_lock_init(&interface->mbx_lock);
+ spin_lock_init(&interface->macvlan_lock);
/* Start off interface as being down */
set_bit(__FM10K_DOWN, interface->state);
@@ -2131,6 +2319,9 @@ static int fm10k_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
(unsigned long)interface);
INIT_WORK(&interface->service_task, fm10k_service_task);
+ /* Setup the MAC/VLAN queue */
+ INIT_DELAYED_WORK(&interface->macvlan_task, fm10k_macvlan_task);
+
/* kick off service timer now, even when interface is down */
mod_timer(&interface->service_timer, (HZ * 2) + jiffies);
@@ -2184,6 +2375,10 @@ static void fm10k_remove(struct pci_dev *pdev)
del_timer_sync(&interface->service_timer);
fm10k_stop_service_event(interface);
+ fm10k_stop_macvlan_task(interface);
+
+ /* Remove all pending MAC/VLAN requests */
+ fm10k_clear_macvlan_queue(interface, interface->glort, true);
/* free netdev, this may bounce the interrupts due to setup_tc */
if (netdev->reg_state == NETREG_REGISTERED)
@@ -2220,6 +2415,9 @@ static void fm10k_prepare_suspend(struct fm10k_intfc *interface)
* a surprise remove if the PCIe device is disabled while we're
* stopped. We stop the watchdog task until after we resume software
* activity.
+ *
+ * Note that the MAC/VLAN task will be stopped as part of preparing
+ * for reset so we don't need to handle it here.
*/
fm10k_stop_service_event(interface);
@@ -2259,6 +2457,9 @@ static int fm10k_handle_resume(struct fm10k_intfc *interface)
/* restart the service task */
fm10k_start_service_event(interface);
+ /* Restart the MAC/VLAN request queue in-case of outstanding events */
+ fm10k_macvlan_schedule(interface);
+
return err;
}
--
2.14.2
^ permalink raw reply related
* [net-next 5/9] fm10k: use the MAC/VLAN queue for VF<->PF MAC/VLAN requests
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171003163138.47569-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
Now that we have a working MAC/VLAN queue for handling MAC/VLAN messages
from the netdev, replace the default handler for the VF<->PF messages.
This new handler is very similar to the default code, but uses the
MAC/VLAN queue instead of sending the message directly. Unfortunately we
can't easily re-use the default code, so we'll just replace the entire
function.
This ensures that a VF requesting a large number of VLANs or MAC
addresses does not start a reset cycle, as explained in the commit which
introduced the message queue.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Ngai-mint Kwan <ngai-mint.kwan@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 132 ++++++++++++++++++++++++++-
drivers/net/ethernet/intel/fm10k/fm10k_pf.c | 2 +-
drivers/net/ethernet/intel/fm10k/fm10k_pf.h | 3 +-
3 files changed, 133 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
index 03897720bf0b..4a17cc903eed 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c
@@ -35,10 +35,133 @@ static s32 fm10k_iov_msg_error(struct fm10k_hw *hw, u32 **results,
return fm10k_tlv_msg_error(hw, results, mbx);
}
+/**
+ * fm10k_iov_msg_queue_mac_vlan - Message handler for MAC/VLAN request from VF
+ * @hw: Pointer to hardware structure
+ * @results: Pointer array to message, results[0] is pointer to message
+ * @mbx: Pointer to mailbox information structure
+ *
+ * This function is a custom handler for MAC/VLAN requests from the VF. The
+ * assumption is that it is acceptable to directly hand off the message from
+ * the VF to the PF's switch manager. However, we use a MAC/VLAN message
+ * queue to avoid overloading the mailbox when a large number of requests
+ * come in.
+ **/
+static s32 fm10k_iov_msg_queue_mac_vlan(struct fm10k_hw *hw, u32 **results,
+ struct fm10k_mbx_info *mbx)
+{
+ struct fm10k_vf_info *vf_info = (struct fm10k_vf_info *)mbx;
+ struct fm10k_intfc *interface = hw->back;
+ u8 mac[ETH_ALEN];
+ u32 *result;
+ int err = 0;
+ bool set;
+ u16 vlan;
+ u32 vid;
+
+ /* we shouldn't be updating rules on a disabled interface */
+ if (!FM10K_VF_FLAG_ENABLED(vf_info))
+ err = FM10K_ERR_PARAM;
+
+ if (!err && !!results[FM10K_MAC_VLAN_MSG_VLAN]) {
+ result = results[FM10K_MAC_VLAN_MSG_VLAN];
+
+ /* record VLAN id requested */
+ err = fm10k_tlv_attr_get_u32(result, &vid);
+ if (err)
+ return err;
+
+ set = !(vid & FM10K_VLAN_CLEAR);
+ vid &= ~FM10K_VLAN_CLEAR;
+
+ /* if the length field has been set, this is a multi-bit
+ * update request. For multi-bit requests, simply disallow
+ * them when the pf_vid has been set. In this case, the PF
+ * should have already cleared the VLAN_TABLE, and if we
+ * allowed them, it could allow a rogue VF to receive traffic
+ * on a VLAN it was not assigned. In the single-bit case, we
+ * need to modify requests for VLAN 0 to use the default PF or
+ * SW vid when assigned.
+ */
+
+ if (vid >> 16) {
+ /* prevent multi-bit requests when PF has
+ * administratively set the VLAN for this VF
+ */
+ if (vf_info->pf_vid)
+ return FM10K_ERR_PARAM;
+ } else {
+ err = fm10k_iov_select_vid(vf_info, (u16)vid);
+ if (err < 0)
+ return err;
+
+ vid = err;
+ }
+
+ /* update VSI info for VF in regards to VLAN table */
+ err = hw->mac.ops.update_vlan(hw, vid, vf_info->vsi, set);
+ }
+
+ if (!err && !!results[FM10K_MAC_VLAN_MSG_MAC]) {
+ result = results[FM10K_MAC_VLAN_MSG_MAC];
+
+ /* record unicast MAC address requested */
+ err = fm10k_tlv_attr_get_mac_vlan(result, mac, &vlan);
+ if (err)
+ return err;
+
+ /* block attempts to set MAC for a locked device */
+ if (is_valid_ether_addr(vf_info->mac) &&
+ !ether_addr_equal(mac, vf_info->mac))
+ return FM10K_ERR_PARAM;
+
+ set = !(vlan & FM10K_VLAN_CLEAR);
+ vlan &= ~FM10K_VLAN_CLEAR;
+
+ err = fm10k_iov_select_vid(vf_info, vlan);
+ if (err < 0)
+ return err;
+
+ vlan = (u16)err;
+
+ /* Add this request to the MAC/VLAN queue */
+ err = fm10k_queue_mac_request(interface, vf_info->glort,
+ mac, vlan, set);
+ }
+
+ if (!err && !!results[FM10K_MAC_VLAN_MSG_MULTICAST]) {
+ result = results[FM10K_MAC_VLAN_MSG_MULTICAST];
+
+ /* record multicast MAC address requested */
+ err = fm10k_tlv_attr_get_mac_vlan(result, mac, &vlan);
+ if (err)
+ return err;
+
+ /* verify that the VF is allowed to request multicast */
+ if (!(vf_info->vf_flags & FM10K_VF_FLAG_MULTI_ENABLED))
+ return FM10K_ERR_PARAM;
+
+ set = !(vlan & FM10K_VLAN_CLEAR);
+ vlan &= ~FM10K_VLAN_CLEAR;
+
+ err = fm10k_iov_select_vid(vf_info, vlan);
+ if (err < 0)
+ return err;
+
+ vlan = (u16)err;
+
+ /* Add this request to the MAC/VLAN queue */
+ err = fm10k_queue_mac_request(interface, vf_info->glort,
+ mac, vlan, set);
+ }
+
+ return err;
+}
+
static const struct fm10k_msg_data iov_mbx_data[] = {
FM10K_TLV_MSG_TEST_HANDLER(fm10k_tlv_msg_test),
FM10K_VF_MSG_MSIX_HANDLER(fm10k_iov_msg_msix_pf),
- FM10K_VF_MSG_MAC_VLAN_HANDLER(fm10k_iov_msg_mac_vlan_pf),
+ FM10K_VF_MSG_MAC_VLAN_HANDLER(fm10k_iov_msg_queue_mac_vlan),
FM10K_VF_MSG_LPORT_STATE_HANDLER(fm10k_iov_msg_lport_state_pf),
FM10K_TLV_MSG_ERROR_HANDLER(fm10k_iov_msg_error),
};
@@ -126,8 +249,10 @@ s32 fm10k_iov_mbx(struct fm10k_intfc *interface)
hw->mbx.ops.process(hw, &hw->mbx);
/* verify port mapping is valid, if not reset port */
- if (vf_info->vf_flags && !fm10k_glort_valid_pf(hw, glort))
+ if (vf_info->vf_flags && !fm10k_glort_valid_pf(hw, glort)) {
hw->iov.ops.reset_lport(hw, vf_info);
+ fm10k_clear_macvlan_queue(interface, glort, false);
+ }
/* reset VFs that have mailbox timed out */
if (!mbx->timeout) {
@@ -190,6 +315,7 @@ void fm10k_iov_suspend(struct pci_dev *pdev)
hw->iov.ops.reset_resources(hw, vf_info);
hw->iov.ops.reset_lport(hw, vf_info);
+ fm10k_clear_macvlan_queue(interface, vf_info->glort, false);
}
}
@@ -414,6 +540,8 @@ static inline void fm10k_reset_vf_info(struct fm10k_intfc *interface,
/* disable LPORT for this VF which clears switch rules */
hw->iov.ops.reset_lport(hw, vf_info);
+ fm10k_clear_macvlan_queue(interface, vf_info->glort, false);
+
/* assign new MAC+VLAN for this VF */
hw->iov.ops.assign_default_mac_vlan(hw, vf_info);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
index 9e4fb3a44376..425d814aed4d 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
@@ -1186,7 +1186,7 @@ s32 fm10k_iov_msg_msix_pf(struct fm10k_hw *hw, u32 **results,
* Will report an error if the VLAN ID is out of range. For VID = 0, it will
* return either the pf_vid or sw_vid depending on which one is set.
*/
-static s32 fm10k_iov_select_vid(struct fm10k_vf_info *vf_info, u16 vid)
+s32 fm10k_iov_select_vid(struct fm10k_vf_info *vf_info, u16 vid)
{
if (!vid)
return vf_info->pf_vid ? vf_info->pf_vid : vf_info->sw_vid;
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.h b/drivers/net/ethernet/intel/fm10k/fm10k_pf.h
index 3336d3c10760..e04d41f1a532 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.h
@@ -1,5 +1,5 @@
/* Intel(R) Ethernet Switch Host Interface Driver
- * Copyright(c) 2013 - 2016 Intel Corporation.
+ * Copyright(c) 2013 - 2017 Intel Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms and conditions of the GNU General Public License,
@@ -114,6 +114,7 @@ extern const struct fm10k_tlv_attr fm10k_err_msg_attr[];
#define FM10K_PF_MSG_ERR_HANDLER(msg, func) \
FM10K_MSG_HANDLER(FM10K_PF_MSG_ID_##msg, fm10k_err_msg_attr, func)
+s32 fm10k_iov_select_vid(struct fm10k_vf_info *vf_info, u16 vid);
s32 fm10k_iov_msg_msix_pf(struct fm10k_hw *, u32 **, struct fm10k_mbx_info *);
s32 fm10k_iov_msg_mac_vlan_pf(struct fm10k_hw *, u32 **,
struct fm10k_mbx_info *);
--
2.14.2
^ permalink raw reply related
* [net-next 0/9][pull request] 100GbE Intel Wired LAN Driver Updates 2017-10-03
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene
This series contains updates to fm10k only.
Jake provides majority of the changes in this series, starting with using
fm10k_prepare_for_reset() if we lose PCIe link. Before we would detach
the device and close the netdev, which left a lot of items still active,
such as the Tx/Rx resources. This could cause problems where register
reads would return potentially invalid values and would result in unknown
driver behavior, so call fm10k_prepare_for_reset() much like we do for
suspend/resume cycles. This will attempt to shutdown as much as possible
to prevent possible issues. Then replaced the PCI specific legacy power
management hooks with the new generic power management hooks for both
suspend and hibernate. Introduced a workqueue item which monitors a
queue of MAC and VLAN requests since a large number of MAC address or
VLAN updates at once can overload the mailbox with too many messages at
once. Fixed a cppcheck warning by properly declaring the min_rate and
max_rate variables in the declaration and definition for .ndo_set_vf_bw,
rather than using "unused" for the minimum rates.
Joe Perches fixes the backward logic when using net_ratelimit().
The following are changes since commit 4efac6ff4dc7d08ff3a63bf72172cb31323b6a8d:
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
and are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 100GbE
Jacob Keller (8):
fm10k: prepare_for_reset() when we lose PCIe Link
fm10k: use spinlock to implement mailbox lock
fm10k: use generic PM hooks instead of legacy PCIe power hooks
fm10k: introduce a message queue for MAC/VLAN messages
fm10k: use the MAC/VLAN queue for VF<->PF MAC/VLAN requests
fm10k: bump version number
fm10k: prefer %s and __func__ for diagnostic prints
fm10k: fix mis-ordered parameters in declaration for .ndo_set_vf_bw
Joe Perches (1):
fm10k: Fix misuse of net_ratelimit()
drivers/net/ethernet/intel/fm10k/fm10k.h | 60 +++-
drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 141 ++++++++-
drivers/net/ethernet/intel/fm10k/fm10k_main.c | 5 +-
drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 199 +++++++++---
drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 392 +++++++++++++++++++-----
drivers/net/ethernet/intel/fm10k/fm10k_pf.c | 2 +-
drivers/net/ethernet/intel/fm10k/fm10k_pf.h | 3 +-
7 files changed, 665 insertions(+), 137 deletions(-)
--
2.14.2
^ permalink raw reply
* [net-next 3/9] fm10k: use generic PM hooks instead of legacy PCIe power hooks
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171003163138.47569-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
Replace the PCI specific legacy power management hooks with the new
generic power management hooks which work properly for both suspend and
hibernate. The new generic system is better and properly handles the
lower level PCIe power management rather than forcing the driver to
handle it.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 67 +++++++++-------------------
1 file changed, 22 insertions(+), 45 deletions(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 240772ad5d69..aef39909e4a2 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -2264,36 +2264,19 @@ static int fm10k_handle_resume(struct fm10k_intfc *interface)
#ifdef CONFIG_PM
/**
- * fm10k_resume - Restore device to pre-sleep state
- * @pdev: PCI device information struct
+ * fm10k_resume - Generic PM resume hook
+ * @dev: generic device structure
*
- * fm10k_resume is called after the system has powered back up from a sleep
- * state and is ready to resume operation. This function is meant to restore
- * the device back to its pre-sleep state.
+ * Generic PM hook used when waking the device from a low power state after
+ * suspend or hibernation. This function does not need to handle lower PCIe
+ * device state as the stack takes care of that for us.
**/
-static int fm10k_resume(struct pci_dev *pdev)
+static int fm10k_resume(struct device *dev)
{
- struct fm10k_intfc *interface = pci_get_drvdata(pdev);
+ struct fm10k_intfc *interface = pci_get_drvdata(to_pci_dev(dev));
struct net_device *netdev = interface->netdev;
struct fm10k_hw *hw = &interface->hw;
- u32 err;
-
- pci_set_power_state(pdev, PCI_D0);
- pci_restore_state(pdev);
-
- /* pci_restore_state clears dev->state_saved so call
- * pci_save_state to restore it.
- */
- pci_save_state(pdev);
-
- err = pci_enable_device_mem(pdev);
- if (err) {
- dev_err(&pdev->dev, "Cannot enable PCI device from suspend\n");
- return err;
- }
- pci_set_master(pdev);
-
- pci_wake_from_d3(pdev, false);
+ int err;
/* refresh hw_addr in case it was dropped */
hw->hw_addr = interface->uc_addr;
@@ -2308,36 +2291,27 @@ static int fm10k_resume(struct pci_dev *pdev)
}
/**
- * fm10k_suspend - Prepare the device for a system sleep state
- * @pdev: PCI device information struct
+ * fm10k_suspend - Generic PM suspend hook
+ * @dev: generic device structure
*
- * fm10k_suspend is meant to shutdown the device prior to the system entering
- * a sleep state. The fm10k hardware does not support wake on lan so the
- * driver simply needs to shut down the device so it is in a low power state.
+ * Generic PM hook used when setting the device into a low power state for
+ * system suspend or hibernation. This function does not need to handle lower
+ * PCIe device state as the stack takes care of that for us.
**/
-static int fm10k_suspend(struct pci_dev *pdev,
- pm_message_t __always_unused state)
+static int fm10k_suspend(struct device *dev)
{
- struct fm10k_intfc *interface = pci_get_drvdata(pdev);
+ struct fm10k_intfc *interface = pci_get_drvdata(to_pci_dev(dev));
struct net_device *netdev = interface->netdev;
- int err = 0;
netif_device_detach(netdev);
fm10k_prepare_suspend(interface);
- err = pci_save_state(pdev);
- if (err)
- return err;
-
- pci_disable_device(pdev);
- pci_wake_from_d3(pdev, false);
- pci_set_power_state(pdev, PCI_D3hot);
-
return 0;
}
#endif /* CONFIG_PM */
+
/**
* fm10k_io_error_detected - called when PCI error is detected
* @pdev: Pointer to PCI device
@@ -2447,15 +2421,18 @@ static const struct pci_error_handlers fm10k_err_handler = {
.reset_done = fm10k_io_reset_done,
};
+static SIMPLE_DEV_PM_OPS(fm10k_pm_ops, fm10k_suspend, fm10k_resume);
+
static struct pci_driver fm10k_driver = {
.name = fm10k_driver_name,
.id_table = fm10k_pci_tbl,
.probe = fm10k_probe,
.remove = fm10k_remove,
#ifdef CONFIG_PM
- .suspend = fm10k_suspend,
- .resume = fm10k_resume,
-#endif
+ .driver = {
+ .pm = &fm10k_pm_ops,
+ },
+#endif /* CONFIG_PM */
.sriov_configure = fm10k_iov_configure,
.err_handler = &fm10k_err_handler
};
--
2.14.2
^ permalink raw reply related
* [net-next 1/9] fm10k: prepare_for_reset() when we lose PCIe Link
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171003163138.47569-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
If we lose PCIe link, such as when an unannounced PFLR event occurs, or
when a device is surprise removed, we currently detach the device and
close the netdev. This unfortunately leaves a lot of things still
active, such as the msix_mbx_pf IRQ, and Tx/Rx resources.
This can cause problems because the register reads will return
potentially invalid values which may result in unknown driver behavior.
Begin the process of resetting using fm10k_prepare_for_reset(), much in
the same way as the suspend and resume cycle does. This will attempt to
shutdown as much as possible, in order to prevent possible issues.
A naive implementation for this has issues, because there are now
multiple flows calling the reset logic and setting a reset bit. This
would cause problems, because the "re-attach" routine might call
fm10k_handle_reset() prior to the reset actually finishing. Instead,
we'll add state bits to indicate which flow actually initiated the
reset.
For the general reset flow, we'll assume that if someone else is
resetting that we do not need to handle it at all, so it does not need
its own state bit. For the suspend case, we will simply issue a warning
indicating that we are attempting to recover from this case when
resuming.
For the detached subtask, we'll simply refuse to re-attach until we've
actually initiated a reset as part of that flow.
Finally, we'll stop attempting to manage the mailbox subtask when we're
detached, since there's nothing we can do if we don't have a PCIe
address.
Overall this produces a much cleaner shutdown and recovery cycle for
a PCIe surprise remove event.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/fm10k/fm10k.h | 2 +
drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 103 ++++++++++++++++++++-------
2 files changed, 79 insertions(+), 26 deletions(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h b/drivers/net/ethernet/intel/fm10k/fm10k.h
index 689c413b7782..ba70c58ca920 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k.h
@@ -270,6 +270,8 @@ enum fm10k_flags_t {
enum fm10k_state_t {
__FM10K_RESETTING,
+ __FM10K_RESET_DETACHED,
+ __FM10K_RESET_SUSPENDED,
__FM10K_DOWN,
__FM10K_SERVICE_SCHED,
__FM10K_SERVICE_REQUEST,
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 9575f7c1862d..4e5e3e64beda 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -153,7 +153,15 @@ static void fm10k_service_timer(unsigned long data)
fm10k_service_event_schedule(interface);
}
-static void fm10k_prepare_for_reset(struct fm10k_intfc *interface)
+/**
+ * fm10k_prepare_for_reset - Prepare the driver and device for a pending reset
+ * @interface: fm10k private data structure
+ *
+ * This function prepares for a device reset by shutting as much down as we
+ * can. It does nothing and returns false if __FM10K_RESETTING was already set
+ * prior to calling this function. It returns true if it actually did work.
+ */
+static bool fm10k_prepare_for_reset(struct fm10k_intfc *interface)
{
struct net_device *netdev = interface->netdev;
@@ -162,8 +170,9 @@ static void fm10k_prepare_for_reset(struct fm10k_intfc *interface)
/* put off any impending NetWatchDogTimeout */
netif_trans_update(netdev);
- while (test_and_set_bit(__FM10K_RESETTING, interface->state))
- usleep_range(1000, 2000);
+ /* Nothing to do if a reset is already in progress */
+ if (test_and_set_bit(__FM10K_RESETTING, interface->state))
+ return false;
rtnl_lock();
@@ -181,6 +190,8 @@ static void fm10k_prepare_for_reset(struct fm10k_intfc *interface)
interface->last_reset = jiffies + (10 * HZ);
rtnl_unlock();
+
+ return true;
}
static int fm10k_handle_reset(struct fm10k_intfc *interface)
@@ -189,6 +200,8 @@ static int fm10k_handle_reset(struct fm10k_intfc *interface)
struct fm10k_hw *hw = &interface->hw;
int err;
+ WARN_ON(!test_bit(__FM10K_RESETTING, interface->state));
+
rtnl_lock();
pci_set_master(interface->pdev);
@@ -267,51 +280,75 @@ static void fm10k_detach_subtask(struct fm10k_intfc *interface)
struct net_device *netdev = interface->netdev;
u32 __iomem *hw_addr;
u32 value;
+ int err;
- /* do nothing if device is still present or hw_addr is set */
+ /* do nothing if netdev is still present or hw_addr is set */
if (netif_device_present(netdev) || interface->hw.hw_addr)
return;
+ /* We've lost the PCIe register space, and can no longer access the
+ * device. Shut everything except the detach subtask down and prepare
+ * to reset the device in case we recover. If we actually prepare for
+ * reset, indicate that we're detached.
+ */
+ if (fm10k_prepare_for_reset(interface))
+ set_bit(__FM10K_RESET_DETACHED, interface->state);
+
/* check the real address space to see if we've recovered */
hw_addr = READ_ONCE(interface->uc_addr);
value = readl(hw_addr);
if (~value) {
+ /* Make sure the reset was initiated because we detached,
+ * otherwise we might race with a different reset flow.
+ */
+ if (!test_and_clear_bit(__FM10K_RESET_DETACHED,
+ interface->state))
+ return;
+
+ /* Restore the hardware address */
interface->hw.hw_addr = interface->uc_addr;
+
+ /* PCIe link has been restored, and the device is active
+ * again. Restore everything and reset the device.
+ */
+ err = fm10k_handle_reset(interface);
+ if (err) {
+ netdev_err(netdev, "Unable to reset device: %d\n", err);
+ interface->hw.hw_addr = NULL;
+ return;
+ }
+
+ /* Re-attach the netdev */
netif_device_attach(netdev);
- set_bit(FM10K_FLAG_RESET_REQUESTED, interface->flags);
netdev_warn(netdev, "PCIe link restored, device now attached\n");
return;
}
-
- rtnl_lock();
-
- if (netif_running(netdev))
- dev_close(netdev);
-
- rtnl_unlock();
}
-static void fm10k_reinit(struct fm10k_intfc *interface)
+static void fm10k_reset_subtask(struct fm10k_intfc *interface)
{
int err;
- fm10k_prepare_for_reset(interface);
-
- err = fm10k_handle_reset(interface);
- if (err)
- dev_err(&interface->pdev->dev,
- "fm10k_handle_reset failed: %d\n", err);
-}
-
-static void fm10k_reset_subtask(struct fm10k_intfc *interface)
-{
if (!test_and_clear_bit(FM10K_FLAG_RESET_REQUESTED,
interface->flags))
return;
+ /* If another thread has already prepared to reset the device, we
+ * should not attempt to handle a reset here, since we'd race with
+ * that thread. This may happen if we suspend the device or if the
+ * PCIe link is lost. In this case, we'll just ignore the RESET
+ * request, as it will (eventually) be taken care of when the thread
+ * which actually started the reset is finished.
+ */
+ if (!fm10k_prepare_for_reset(interface))
+ return;
+
netdev_err(interface->netdev, "Reset interface\n");
- fm10k_reinit(interface);
+ err = fm10k_handle_reset(interface);
+ if (err)
+ dev_err(&interface->pdev->dev,
+ "fm10k_handle_reset failed: %d\n", err);
}
/**
@@ -381,6 +418,10 @@ static void fm10k_watchdog_update_host_state(struct fm10k_intfc *interface)
**/
static void fm10k_mbx_subtask(struct fm10k_intfc *interface)
{
+ /* If we're resetting, bail out */
+ if (test_bit(__FM10K_RESETTING, interface->state))
+ return;
+
/* process upstream mailbox and update device state */
fm10k_watchdog_update_host_state(interface);
@@ -630,9 +671,11 @@ static void fm10k_service_task(struct work_struct *work)
interface = container_of(work, struct fm10k_intfc, service_task);
+ /* Check whether we're detached first */
+ fm10k_detach_subtask(interface);
+
/* tasks run even when interface is down */
fm10k_mbx_subtask(interface);
- fm10k_detach_subtask(interface);
fm10k_reset_subtask(interface);
/* tasks only run when interface is up */
@@ -2177,7 +2220,8 @@ static void fm10k_prepare_suspend(struct fm10k_intfc *interface)
*/
fm10k_stop_service_event(interface);
- fm10k_prepare_for_reset(interface);
+ if (fm10k_prepare_for_reset(interface))
+ set_bit(__FM10K_RESET_SUSPENDED, interface->state);
}
static int fm10k_handle_resume(struct fm10k_intfc *interface)
@@ -2185,6 +2229,13 @@ static int fm10k_handle_resume(struct fm10k_intfc *interface)
struct fm10k_hw *hw = &interface->hw;
int err;
+ /* Even if we didn't properly prepare for reset in
+ * fm10k_prepare_suspend, we'll attempt to resume anyways.
+ */
+ if (!test_and_clear_bit(__FM10K_RESET_SUSPENDED, interface->state))
+ dev_warn(&interface->pdev->dev,
+ "Device was shut down as part of suspend... Attempting to recover\n");
+
/* reset statistics starting values */
hw->mac.ops.rebind_hw_stats(hw, &interface->stats);
--
2.14.2
^ permalink raw reply related
* [net-next 2/9] fm10k: use spinlock to implement mailbox lock
From: Jeff Kirsher @ 2017-10-03 16:31 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171003163138.47569-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
Lets not re-invent the locking wheel. Remove our bitlock and use
a proper spinlock instead.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/fm10k/fm10k.h | 15 +++++----------
drivers/net/ethernet/intel/fm10k/fm10k_pci.c | 3 +++
2 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h b/drivers/net/ethernet/intel/fm10k/fm10k.h
index ba70c58ca920..74542e9d63c7 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k.h
@@ -276,7 +276,6 @@ enum fm10k_state_t {
__FM10K_SERVICE_SCHED,
__FM10K_SERVICE_REQUEST,
__FM10K_SERVICE_DISABLE,
- __FM10K_MBX_LOCK,
__FM10K_LINK_DOWN,
__FM10K_UPDATING_STATS,
/* This value must be last and determines the BITMAP size */
@@ -346,6 +345,8 @@ struct fm10k_intfc {
struct fm10k_hw_stats stats;
struct fm10k_hw hw;
+ /* Mailbox lock */
+ spinlock_t mbx_lock;
u32 __iomem *uc_addr;
u32 __iomem *sw_addr;
u16 msg_enable;
@@ -386,23 +387,17 @@ struct fm10k_intfc {
static inline void fm10k_mbx_lock(struct fm10k_intfc *interface)
{
- /* busy loop if we cannot obtain the lock as some calls
- * such as ndo_set_rx_mode may be made in atomic context
- */
- while (test_and_set_bit(__FM10K_MBX_LOCK, interface->state))
- udelay(20);
+ spin_lock(&interface->mbx_lock);
}
static inline void fm10k_mbx_unlock(struct fm10k_intfc *interface)
{
- /* flush memory to make sure state is correct */
- smp_mb__before_atomic();
- clear_bit(__FM10K_MBX_LOCK, interface->state);
+ spin_unlock(&interface->mbx_lock);
}
static inline int fm10k_mbx_trylock(struct fm10k_intfc *interface)
{
- return !test_and_set_bit(__FM10K_MBX_LOCK, interface->state);
+ return spin_trylock(&interface->mbx_lock);
}
/* fm10k_test_staterr - test bits in Rx descriptor status and error fields */
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index 4e5e3e64beda..240772ad5d69 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1921,6 +1921,9 @@ static int fm10k_sw_init(struct fm10k_intfc *interface,
netdev_rss_key_fill(rss_key, sizeof(rss_key));
memcpy(interface->rssrk, rss_key, sizeof(rss_key));
+ /* Initialize the mailbox lock */
+ spin_lock_init(&interface->mbx_lock);
+
/* Start off interface as being down */
set_bit(__FM10K_DOWN, interface->state);
set_bit(__FM10K_UPDATING_STATS, interface->state);
--
2.14.2
^ permalink raw reply related
* Re: [PATCH net] net: br: Fix igmp snooping offload with CONFIG_BRIDGE_VLAN_FILTERING
From: Vivien Didelot @ 2017-10-03 16:25 UTC (permalink / raw)
To: Andrew Lunn, Toshiaki Makita; +Cc: Toshiaki Makita, David Miller, netdev
In-Reply-To: <20171003153031.GP17713@lunn.ch>
Andrew Lunn <andrew@lunn.ch> writes:
>> The vlan will be effective only when vlan_filtering is enabled.
>> When vlan_filtering is disabled, vlan information is still kept in the
>> bridge and gets effective later when vlan_filtering becomes enable.
>
> O.K, so things are starting to get clearer.
>
> So when vlan filtering is disabled, the hardware should just ignore
> the requests to add the vlan to the hardware?
>
> When vlan_filtering is enabled, are all the vlans in the software
> bridge again offloaded? Or do we need to remember all the vlans which
> we ignored while vlan filtering was disabled? The average switch has
> nowhere to store these disabled vlans. It can only store active vlans.
When vlan_filtering is enabled on the bridge, the bridge code does
propagates the default_pvid again if I recall correctly.
In my opinion the hardware mustn't ignore the VLAN requests, because we
seem to agree that vlan_filtering disabled means that the target ports
should not care yet about 802.1Q. So having some unused hardware VLAN
entries and some ports with disabled 802.1Q mode must work together.
That being said we still have the wrong hardware FDB populated when
CONFIG_BRIDGE_VLAN_FILTERING is enabled but not vlan_filtering...
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox