Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH net 3/3] ARM64: dts: meson: odroidc2: disable 1000t-eee advertisement
From: Jerome Brunet @ 2016-11-15 14:29 UTC (permalink / raw)
  To: netdev, devicetree, Florian Fainelli
  Cc: Jerome Brunet, Carlo Caione, Kevin Hilman, Giuseppe Cavallaro,
	Alexandre TORGUE, Martin Blumenstingl, Andre Roth, Neil Armstrong,
	linux-amlogic, linux-arm-kernel, linux-kernel
In-Reply-To: <1479220154-25851-1-git-send-email-jbrunet@baylibre.com>

Reported-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Tested-by: Andre Roth <neolynx@gmail.com>
---
 arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts b/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts
index e6e3491d48a5..1f4416ecb183 100644
--- a/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts
+++ b/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts
@@ -98,3 +98,18 @@
 	pinctrl-0 = <&i2c_a_pins>;
 	pinctrl-names = "default";
 };
+
+&ethmac {
+	phy-handle = <&eth_phy0>;
+
+	mdio {
+		compatible = "snps,dwmac-mdio";
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		eth_phy0: ethernet-phy@0 {
+			reg = <0>;
+			realtek,disable-eee-1000t;
+		};
+	};
+};
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 2/3] dt-bindings: net: add DT bindings for realtek phys
From: Jerome Brunet @ 2016-11-15 14:29 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Florian Fainelli
  Cc: Jerome Brunet, Carlo Caione, Kevin Hilman, Giuseppe Cavallaro,
	Alexandre TORGUE, Martin Blumenstingl, Andre Roth, Neil Armstrong,
	linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1479220154-25851-1-git-send-email-jbrunet-rdvid1DuHRBWk0Htik3J/w@public.gmane.org>

Signed-off-by: Jerome Brunet <jbrunet-rdvid1DuHRBWk0Htik3J/w@public.gmane.org>
Signed-off-by: Neil Armstrong <narmstrong-rdvid1DuHRBWk0Htik3J/w@public.gmane.org>
---
 .../devicetree/bindings/net/realtek-phy.txt          | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/realtek-phy.txt

diff --git a/Documentation/devicetree/bindings/net/realtek-phy.txt b/Documentation/devicetree/bindings/net/realtek-phy.txt
new file mode 100644
index 000000000000..dc2845a6b387
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/realtek-phy.txt
@@ -0,0 +1,20 @@
+Realtek Ethernet PHY
+
+Some boards require special tuning values of the phy.
+
+Optional properties:
+
+realtek,disable-eee-1000t:
+realtek,disable-eee-100tx:
+  If set, respectively disable 1000-BaseT and 100-BaseTx energy efficient
+  ethernet capabilty advertisement
+  default: Leave the phy default settings unchanged (capabilities advertised)
+
+Example:
+
+&mdio0 {
+	ethernetphy0: ethernet-phy@0 {
+		reg = <0>;
+		realtek,disable-eee-1000t;
+	};
+};
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH net 1/3] net: phy: realtek: add eee advertisement disable options
From: Jerome Brunet @ 2016-11-15 14:29 UTC (permalink / raw)
  To: netdev, devicetree, Florian Fainelli
  Cc: Jerome Brunet, Carlo Caione, Kevin Hilman, Giuseppe Cavallaro,
	Alexandre TORGUE, Martin Blumenstingl, Andre Roth, Neil Armstrong,
	linux-amlogic, linux-arm-kernel, linux-kernel
In-Reply-To: <1479220154-25851-1-git-send-email-jbrunet@baylibre.com>

On some platforms, energy efficient ethernet with rtl8211 devices is
causing issue, like throughput drop or broken link.

This was reported on the OdroidC2 (DWMAC + RTL8211F). While the issue root
cause is not fully understood yet, disabling EEE advertisement prevent auto
negotiation from enabling EEE.

This patch provides options to disable 1000T and 100TX EEE advertisement
individually for the realtek phys supporting this feature.

Reported-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Tested-by: Andre Roth <neolynx@gmail.com>
---
 drivers/net/phy/realtek.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 64 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index aadd6e9f54ad..77235fd5faaf 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -15,6 +15,12 @@
  */
 #include <linux/phy.h>
 #include <linux/module.h>
+#include <linux/of.h>
+
+struct rtl8211x_phy_priv {
+	bool eee_1000t_disable;
+	bool eee_100tx_disable;
+};
 
 #define RTL821x_PHYSR		0x11
 #define RTL821x_PHYSR_DUPLEX	0x2000
@@ -93,12 +99,44 @@ static int rtl8211f_config_intr(struct phy_device *phydev)
 	return err;
 }
 
+static void rtl8211x_clear_eee_adv(struct phy_device *phydev)
+{
+	struct rtl8211x_phy_priv *priv = phydev->priv;
+	u16 val;
+
+	if (priv->eee_1000t_disable || priv->eee_100tx_disable) {
+		val = phy_read_mmd_indirect(phydev, MDIO_AN_EEE_ADV,
+					    MDIO_MMD_AN);
+
+		if (priv->eee_1000t_disable)
+			val &= ~MDIO_AN_EEE_ADV_1000T;
+		if (priv->eee_100tx_disable)
+			val &= ~MDIO_AN_EEE_ADV_100TX;
+
+		phy_write_mmd_indirect(phydev, MDIO_AN_EEE_ADV,
+				       MDIO_MMD_AN, val);
+	}
+}
+
+static int rtl8211x_config_init(struct phy_device *phydev)
+{
+	int ret;
+
+	ret = genphy_config_init(phydev);
+	if (ret < 0)
+		return ret;
+
+	rtl8211x_clear_eee_adv(phydev);
+
+	return 0;
+}
+
 static int rtl8211f_config_init(struct phy_device *phydev)
 {
 	int ret;
 	u16 reg;
 
-	ret = genphy_config_init(phydev);
+	ret = rtl8211x_config_init(phydev);
 	if (ret < 0)
 		return ret;
 
@@ -115,6 +153,26 @@ static int rtl8211f_config_init(struct phy_device *phydev)
 	return 0;
 }
 
+static int rtl8211x_phy_probe(struct phy_device *phydev)
+{
+	struct device *dev = &phydev->mdio.dev;
+	struct device_node *of_node = dev->of_node;
+	struct rtl8211x_phy_priv *priv;
+
+	priv = devm_kzalloc(dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	priv->eee_1000t_disable =
+		of_property_read_bool(of_node, "realtek,disable-eee-1000t");
+	priv->eee_100tx_disable =
+		of_property_read_bool(of_node, "realtek,disable-eee-100tx");
+
+	phydev->priv = priv;
+
+	return 0;
+}
+
 static struct phy_driver realtek_drvs[] = {
 	{
 		.phy_id         = 0x00008201,
@@ -140,7 +198,9 @@ static struct phy_driver realtek_drvs[] = {
 		.phy_id_mask	= 0x001fffff,
 		.features	= PHY_GBIT_FEATURES,
 		.flags		= PHY_HAS_INTERRUPT,
+		.probe		= &rtl8211x_phy_probe,
 		.config_aneg	= genphy_config_aneg,
+		.config_init	= &rtl8211x_config_init,
 		.read_status	= genphy_read_status,
 		.ack_interrupt	= rtl821x_ack_interrupt,
 		.config_intr	= rtl8211e_config_intr,
@@ -152,7 +212,9 @@ static struct phy_driver realtek_drvs[] = {
 		.phy_id_mask	= 0x001fffff,
 		.features	= PHY_GBIT_FEATURES,
 		.flags		= PHY_HAS_INTERRUPT,
+		.probe		= &rtl8211x_phy_probe,
 		.config_aneg	= &genphy_config_aneg,
+		.config_init	= &rtl8211x_config_init,
 		.read_status	= &genphy_read_status,
 		.ack_interrupt	= &rtl821x_ack_interrupt,
 		.config_intr	= &rtl8211e_config_intr,
@@ -164,6 +226,7 @@ static struct phy_driver realtek_drvs[] = {
 		.phy_id_mask	= 0x001fffff,
 		.features	= PHY_GBIT_FEATURES,
 		.flags		= PHY_HAS_INTERRUPT,
+		.probe		= &rtl8211x_phy_probe,
 		.config_aneg	= &genphy_config_aneg,
 		.config_init	= &rtl8211f_config_init,
 		.read_status	= &genphy_read_status,
-- 
2.7.4

^ permalink raw reply related

* [PATCH net 0/3] Fix OdroidC2 Gigabit Tx link issue
From: Jerome Brunet @ 2016-11-15 14:29 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Florian Fainelli
  Cc: Jerome Brunet, Carlo Caione, Kevin Hilman, Giuseppe Cavallaro,
	Alexandre TORGUE, Martin Blumenstingl, Andre Roth, Neil Armstrong,
	linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

This patchset fixes an issue with the OdroidC2 board (DWMAC + RTL8211F).
Initially reported as a low Tx throughput issue at gigabit speed, the
platform enters LPI too often. This eventually break the link (both Tx
and Rx), and require to bring the interface down and up again to get the
Rx path working again.

The root cause of this issue is not fully understood yet but disabling EEE
advertisement on the PHY prevent this feature to be negotiated.
With this change, the link is stable and reliable, with the expected
throughput performance.

The patchset adds options in the realtek phy driver to disable EEE
advertisement, through device tree, for the phy version supporting EEE.
Then EEE is disabled in the OdroidC2 device tree for Gigabit speed.
100M is not affected by this issue.

Jerome Brunet (3):
  net: phy: realtek: add eee advertisement disable options
  dt-bindings: net: add DT bindings for realtek phys
  ARM64: dts: meson: odroidc2: disable 1000t-eee advertisement

 .../devicetree/bindings/net/realtek-phy.txt        | 20 +++++++
 .../arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts | 15 +++++
 drivers/net/phy/realtek.c                          | 65 +++++++++++++++++++++-
 3 files changed, 99 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/net/realtek-phy.txt

-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net][v2] bpf: fix range arithmetic for bpf map access
From: Josef Bacik @ 2016-11-15 14:20 UTC (permalink / raw)
  To: Jann Horn, Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, David S. Miller, netdev
In-Reply-To: <CAG48ez1z8wJstz84-ekY5Ed8oNgpT73Xc18Or7RboOoBnTE03w@mail.gmail.com>

On 11/15/2016 08:47 AM, Jann Horn wrote:
> On Tue, Nov 15, 2016 at 4:10 AM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> On Mon, Nov 14, 2016 at 03:45:36PM -0500, Josef Bacik wrote:
>>> I made some invalid assumptions with BPF_AND and BPF_MOD that could result in
>>> invalid accesses to bpf map entries.  Fix this up by doing a few things
>>>
>>> 1) Kill BPF_MOD support.  This doesn't actually get used by the compiler in real
>>> life and just adds extra complexity.
>>>
>>> 2) Fix the logic for BPF_AND, don't allow AND of negative numbers and set the
>>> minimum value to 0 for positive AND's.
>>>
>>> 3) Don't do operations on the ranges if they are set to the limits, as they are
>>> by definition undefined, and allowing arithmetic operations on those values
>>> could make them appear valid when they really aren't.
>>>
>>> This fixes the testcase provided by Jann as well as a few other theoretical
>>> problems.
>>>
>>> Reported-by: Jann Horn <jannh@google.com>
>>> Signed-off-by: Josef Bacik <jbacik@fb.com>
>>
>> lgtm.
>> Acked-by: Alexei Starovoitov <ast@kernel.org>
>>
>> Jann, could you please double check the logic.
>> Thanks!
>
> I found some more potential issues, maybe Josef and you can tell me whether I
> understood these correctly.
>
>
> /* If the source register is a random pointer then the
> * min_value/max_value values represent the range of the known
> * accesses into that value, not the actual min/max value of the
> * register itself.  In this case we have to reset the reg range
> * values so we know it is not safe to look at.
> */
> if (regs[insn->src_reg].type != CONST_IMM &&
>    regs[insn->src_reg].type != UNKNOWN_VALUE) {
> min_val = BPF_REGISTER_MIN_RANGE;
> max_val = BPF_REGISTER_MAX_RANGE;
> }
>
> Why only the source register? Why not the destination register?
>

The destination register is what we are doing arithmetic to, so we don't 
actually care about the type of register it is, as we'll interpret the values at 
a a later point.  If the source register however is a MAP or some other pointer, 
then we know that the min/max values only apply to the range on the actual value 
of the register, rather than the possible range of values.  Said in another way 
if src register is a MAP then the range is [SRC_REG.imm+SRC_REG.min_value, 
SRC_REG.imm+SRC_REG.max_value] instead of [SRC_REG.min_value, SRC_REG.max_value].

So this is the same behavior for the destination register for sure, but we don't 
actually care about it at this point.  If the src_reg meets these criteria then 
we certainly don't know anything about the dest_reg and reset it blow with this

         if (min_val == BPF_REGISTER_MIN_RANGE &&
             max_val == BPF_REGISTER_MAX_RANGE) {
                 reset_reg_range_values(regs, insn->dst_reg);
                 return;
         }

Then in check_alu_op() if we weren't doing our operation on a MAP_PTR then we 
set the register to unknown and carry on.

>
> /* We don't know anything about what was done to this register, mark it
> * as unknown.
> */
> if (min_val == BPF_REGISTER_MIN_RANGE &&
>    max_val == BPF_REGISTER_MAX_RANGE) {
> reset_reg_range_values(regs, insn->dst_reg);
> return;
> }
>
> Why have this special case at all? Since min_val and max_val are
> basically independent, this code shouldn't be necessary, right?
>

They are the value of the source register, as I explained above if we know 
nothing about the source register then don't even bother doing the math, we also 
know nothing about the destination register.  This is a shortcut to keep us from 
doing potentially dangerous things when we already know it's invalid.

>
> static void check_reg_overflow(struct bpf_reg_state *reg)
> {
> if (reg->max_value > BPF_REGISTER_MAX_RANGE)
> reg->max_value = BPF_REGISTER_MAX_RANGE;
> if (reg->min_value < BPF_REGISTER_MIN_RANGE ||
>    reg->min_value > BPF_REGISTER_MAX_RANGE)
> reg->min_value = BPF_REGISTER_MIN_RANGE;
> }
>
> Why is this asymmetric? Why is `reg->max_value <
> BPF_REGISTER_MIN_RANGE` not important, but `reg->min_value >
> BPF_REGISTER_MAX_RANGE` is?

Because max_value is unsigned, so if we do reg->max_value = 
BPF_REGISTER_MIN_RANGE; and then do this we'll still get max_value reset to 
MAX_RANGE.

>
>
> In states_equal():
> if (rold->type == NOT_INIT ||
>    (rold->type == UNKNOWN_VALUE && rcur->type != NOT_INIT))   <------------
> continue;
>
> I think this is broken in code like the following:
>
> int value;
> if (condition) {
>   value = 1; // visited first by verifier
> } else {
>   value = 1000000; // visited second by verifier
> }
> int dummy = 1; // states seem to converge here, but actually don't
> map[value] = 1234;
>
> `value` would be an UNKNOWN_VALUE for both paths, right? So
> states_equal() would decide that the states converge after the
> conditionally executed code?
>

Value would be CONST_IMM for both paths, and wouldn't match so they wouldn't 
converge.  I think I understood your question right, let me know if I'm 
addressing the wrong part of it.

Do my explanations make sense?  I'm doing this first thing in the morning so I'm 
still a little foggy, let me know if things still aren't clear.  Thanks,

Josef

^ permalink raw reply

* RE: [PATCH net-next v5] cadence: Add LSO support.
From: Rafal Ozieblo @ 2016-11-15 14:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, nicolas.ferre@atmel.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <1479215490.8455.122.camel@edumazet-glaptop3.roam.corp.google.com>

> From: Eric Dumazet [mailto:eric.dumazet@gmail.com] 
> Sent: 15 listopada 2016 14:12
> To: Rafal Ozieblo
> Cc: David Miller; nicolas.ferre@atmel.com; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH net-next v5] cadence: Add LSO support.
>
> On Tue, 2016-11-15 at 07:07 +0000, Rafal Ozieblo wrote:
> > > > > If UFO is in use it should not silently disable UDP checksums.
> > > > > 
> > > > > If you cannot support UFO with proper checksumming, then you cannot enable support for that feature.
> > > > 
> > > > According Cadence Gigabit Ethernet MAC documentation:
> > > > 
> > > > "Hardware will not calculate the UDP checksum or modify the UDP 
> > > > checksum field. Therefore software must set a value of zero in the 
> > > > checksum field in the UDP header (in the first payload buffer) to indicate to the receiver that the UDP datagram does not include a checksum."
> > > > 
> > > > It is hardware requirement.
> > >
> > > I do not doubt that it is a hardware restriction.
> > >
> > > But I am saying that you cannot enable this feature under Linux if this is how it operates on your hardware.
> > 
> > Would it be good to enable UFO conditionally with some internal define? Ex.:
> > 
> > +#ifdef MACB_ENABLE_UFO
> > +#define MACB_NETIF_LSO         (NETIF_F_TSO | NETIF_F_UFO)
> > +#else
> > +#define MACB_NETIF_LSO         (NETIF_F_TSO)
> > +#endif
> > 
> > I could add precise comment here that ufo is possible only without checksum.
> > 
> > Or maybe I could enable it from module_params or device-tree (like: drivers/net/ethernet/neterion/s2io.c).
>
> No you can not do that.
>
> 1) That would violate UDP specs.
> 2) Module params are no longer accepted.
> 3) Comments in a driver source code would only help the driver maintainer, not users to make their mind.
>
> Only way would be to propagate the intent of the sender.
>
> Only the sender application can decide to generate UDP checksums or not.
>
> Your driver ndo_features_check() could then force software segmentation fallback if the user did not asked to disable UDP checksums, and packet is UFO.
>
> (look for UDP_NO_CHECK6_TX, and SO_NO_CHECK )
>
> Problem is complex, because the skb has no marker, only the socket has.
>
> And socket state could change between packets, and packets can stay in an intermediate qdisc before hitting device driver. So looking at
> skb->sk from your ndo_features_check() would be racy.
>
> What use case would you have precisely ?

I have talked with hardware team who designed and created gem IP.  The conclusion is that there is no need to zeroed checksum for UFO. 
I have tested UFO without zeroing UDP checksum on cadence gem. It works fine.
I'll deeply investigate and test UFO again. I'll send version 6 of LSO PATCH either with UFO without zeroing or without UFO at all.

^ permalink raw reply

* Re: amd-xgbe: Add support for MDIO attached PHYs
From: Tom Lendacky @ 2016-11-15 14:17 UTC (permalink / raw)
  To: Colin Ian King; +Cc: netdev@vger.kernel.org, David S. Miller
In-Reply-To: <6e653058-6442-5bcf-7e02-3136780caffb@canonical.com>

On 11/15/2016 7:07 AM, Colin Ian King wrote:
> Hi,
> 
> Commit:
> 
> amd-xgbe: Add support for MDIO attached PHYs
> 
>     Use the phylib support in the kernel to communicate with and control an
>     MDIO attached PHY. Use the hardware's MDIO communication mechanism to
>     communicate with the PHY.
> 
> 
> +static int xgbe_clr_gpio(struct xgbe_prv_data *pdata, unsigned int gpio)
> +{
> +       unsigned int reg;
> +
> +       if (gpio > 16)
> +               return -EINVAL;
> 
> is gpio in the range 0..15?
> 
> 	if (gpio > 15)
> 		return -EINVAL;

Yes, the GPIO range is 0 to 15. I'll submit a patch to change the
constraint check.

Thanks,
Tom

> 
> +
> +       reg = XGMAC_IOREAD(pdata, MAC_GPIOSR);
> +
> +       reg &= ~(1 << (gpio + 16));
> 
> if gpio is 16, we get 1 << 32 which I believe is undefined behaviour.
> 
> +       XGMAC_IOWRITE(pdata, MAC_GPIOSR, reg);
> +
> +       return 0;
> +}
> 
> 
> Same applies for function xgbe_clr_gpio().
> 
> Colin
> 

^ permalink raw reply

* Re: [PATCH net-next v3 2/7] vxlan: avoid checking socket multiple times.
From: Jiri Benc @ 2016-11-15 14:15 UTC (permalink / raw)
  To: Pravin B Shelar; +Cc: netdev
In-Reply-To: <1479098638-4921-3-git-send-email-pshelar@ovn.org>

Pravin,

please CC reviewers of the previous version when submitting a new
version. You'll get faster reviews that way.

On Sun, 13 Nov 2016 20:43:53 -0800, Pravin B Shelar wrote:
> @@ -2069,11 +2069,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>  		struct dst_entry *ndst;
>  		u32 rt6i_flags;
>  
> -		if (!sock6)
> -			goto drop;
> -		sk = sock6->sock->sk;
> -
> -		ndst = vxlan6_get_route(vxlan, skb,
> +		ndst = vxlan6_get_route(vxlan, sock6, skb,
>  					rdst ? rdst->remote_ifindex : 0, tos,
>  					label, &dst->sin6.sin6_addr,
>  					&src->sin6.sin6_addr,
> @@ -2093,6 +2089,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>  			goto tx_error;
>  		}
>  
> +		sk = sock6->sock->sk;
>  		/* Bypass encapsulation if the destination is local */
>  		rt6i_flags = ((struct rt6_info *)ndst)->rt6i_flags;
>  		if (!info && rt6i_flags & RTF_LOCAL &&

This moves the sk assignment from one arbitrary place to a different
arbitrary place, while it would be best to just remove it and open code
sock6->sock->sk in the call to udp_tunnel6_xmit_skb. But patch 6 does
that later, so whatever.

Acked-by: Jiri Benc <jbenc@redhat.com>

^ permalink raw reply

* Re: [PATCH] amd-xgbe: fix unsigned comparison against less than zero
From: Tom Lendacky @ 2016-11-15 14:09 UTC (permalink / raw)
  To: Colin King, netdev; +Cc: linux-kernel
In-Reply-To: <20161115121842.5774-1-colin.king@canonical.com>

On 11/15/2016 6:18 AM, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> Comparing unsigned int ret to less than zero for an error status
> check is never true.  Fix this by making ret a signed int. Reduce
> scope of ret too.
> 
> Found with static analysis by CoverityScan, CID 1377750

Thanks Colin, this was already identified by someone else and I
submitted the patch yesterday.

Thanks,
Tom

> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
> index 4ba4332..168507e 100644
> --- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
> +++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
> @@ -2346,7 +2346,7 @@ static bool xgbe_phy_valid_speed(struct xgbe_prv_data *pdata, int speed)
>  static int xgbe_phy_link_status(struct xgbe_prv_data *pdata, int *an_restart)
>  {
>  	struct xgbe_phy_data *phy_data = pdata->phy_data;
> -	unsigned int ret, reg;
> +	unsigned int reg;
>  
>  	*an_restart = 0;
>  
> @@ -2365,7 +2365,8 @@ static int xgbe_phy_link_status(struct xgbe_prv_data *pdata, int *an_restart)
>  
>  	if (phy_data->phydev) {
>  		/* Check external PHY */
> -		ret = phy_read_status(phy_data->phydev);
> +		int ret = phy_read_status(phy_data->phydev);
> +
>  		if (ret < 0)
>  			return 0;
>  
> 

^ permalink raw reply

* Re: linux-next: BUG: unable to handle kernel NULL pointer dereference in __sk_mem_raise_allocated()
From: Paolo Abeni @ 2016-11-15 14:07 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Andrei Vagin, Linux Kernel Network Developers
In-Reply-To: <1479218524.8455.123.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, 2016-11-15 at 06:02 -0800, Eric Dumazet wrote:
> On Tue, 2016-11-15 at 10:26 +0100, Paolo Abeni wrote:
> > Hi,
> > 
> > On Mon, 2016-11-14 at 15:24 -0800, Andrei Vagin wrote:
> > > Our test system detected a kernel oops. Looks like a problem in the
> > > "udp: refactor memory accounting" series.
> > 
> > My fault: I missed udplite in my tests.
> > 
> > Thank you for reporting.
> > 
> > I'm fine with Eric's patch, setting both .memory_allocated
> > and .sysctl_mem.
> > We could also remove .backlog_rcv, but it's not strictly needed.
> 
> That is a good point, can you cook the official combined patch ?

Sure, I'll send ASAP, after a little testing.

^ permalink raw reply

* Re: linux-next: BUG: unable to handle kernel NULL pointer dereference in __sk_mem_raise_allocated()
From: Eric Dumazet @ 2016-11-15 14:02 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Andrei Vagin, Linux Kernel Network Developers
In-Reply-To: <1479202010.4660.11.camel@redhat.com>

On Tue, 2016-11-15 at 10:26 +0100, Paolo Abeni wrote:
> Hi,
> 
> On Mon, 2016-11-14 at 15:24 -0800, Andrei Vagin wrote:
> > Our test system detected a kernel oops. Looks like a problem in the
> > "udp: refactor memory accounting" series.
> 
> My fault: I missed udplite in my tests.
> 
> Thank you for reporting.
> 
> I'm fine with Eric's patch, setting both .memory_allocated
> and .sysctl_mem.
> We could also remove .backlog_rcv, but it's not strictly needed.

That is a good point, can you cook the official combined patch ?

Thanks !

^ permalink raw reply

* RE: [PATCH net-next v13 0/8] openvswitch: support for layer 3 encapsulated packets
From: Yang, Yi Y @ 2016-11-15 13:57 UTC (permalink / raw)
  To: Jiri Benc, netdev@vger.kernel.org
  Cc: dev@openvswitch.org, Pravin Shelar, Lorand Jakab, Simon Horman,
	Thadeu Lima de Souza Cascardo
In-Reply-To: <cover.1478791347.git.jbenc@redhat.com>

Hi, Jiri

I'm very glad to see you're continuing this work :-), I asked Simon about this twice, but nobody replies. I also remember Cascardo has a patch set to collaborate with this patch set, I asked Cascardo, but nobody responds, will you continue to do Cascardo's " create tunnel devices using rtnetlink interface" patch set? I test the old one v3, that can work with vxlan module in kernel, but if I build ovs with option " --with-linux=/lib/modules/`uname -r`/build", ovs vxlan module is built in vport_vxlan module, when I create vxlan-gpe port, kernel will automatically load vxlan module in the kernel instead of using the APIs in vport_vxlan module. 

Cascardo, are you still working on this?

-----Original Message-----
From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Jiri Benc
Sent: Thursday, November 10, 2016 11:28 PM
To: netdev@vger.kernel.org
Cc: dev@openvswitch.org; Pravin Shelar <pshelar@ovn.org>; Lorand Jakab <lojakab@cisco.com>; Simon Horman <simon.horman@netronome.com>
Subject: [PATCH net-next v13 0/8] openvswitch: support for layer 3 encapsulated packets

At the core of this patch set is removing the assumption in Open vSwitch datapath that all packets have Ethernet header.

The implementation relies on the presence of pop_eth and push_eth actions in datapath flows to facilitate adding and removing Ethernet headers as appropriate. The construction of such flows is left up to user-space.

This series is based on work by Simon Horman, Lorand Jakab, Thomas Morin and others. I kept Lorand's and Simon's s-o-b in the patches that are derived from v11 to record their authorship of parts of the code.

Changes from v12 to v13:

* Addressed Pravin's feedback.
* Removed the GRE vport conversion patch; L3 GRE ports should be created by
  rtnetlink instead.

Main changes from v11 to v12:

* The patches were restructured and split differently for easier review.
* They were rebased and adjusted to the current net-next. Especially MPLS
  handling is different (and easier) thanks to the recent MPLS GSO rework.
* Several bugs were discovered and fixed. The most notable is fragment
  handling: header adjustment for ARPHRD_NONE devices on tx needs to be done
  after refragmentation, not before it. This required significant changes in
  the patchset. Another one is stricter checking of attributes (match on L2
  vs. L3 packet) at the kernel level.
* Instead of is_layer3 bool, a mac_proto field is used.

Jiri Benc (8):
  openvswitch: use hard_header_len instead of hardcoded ETH_HLEN
  openvswitch: add mac_proto field to the flow key
  openvswitch: pass mac_proto to ovs_vport_send
  openvswitch: support MPLS push and pop for L3 packets
  openvswitch: add processing of L3 packets
  openvswitch: netlink: support L3 packets
  openvswitch: add Ethernet push and pop actions
  openvswitch: allow L3 netdev ports

 include/uapi/linux/openvswitch.h |  15 ++++
 net/openvswitch/actions.c        | 111 +++++++++++++++++-------
 net/openvswitch/datapath.c       |  13 +--
 net/openvswitch/flow.c           | 105 +++++++++++++++++------
 net/openvswitch/flow.h           |  22 +++++
 net/openvswitch/flow_netlink.c   | 179 ++++++++++++++++++++++++++-------------
 net/openvswitch/vport-netdev.c   |   9 +-
 net/openvswitch/vport.c          |  31 +++++--
 net/openvswitch/vport.h          |   2 +-
 9 files changed, 353 insertions(+), 134 deletions(-)

^ permalink raw reply

* Re: [PATCH net][v2] bpf: fix range arithmetic for bpf map access
From: Jann Horn @ 2016-11-15 13:47 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Josef Bacik, Alexei Starovoitov, Daniel Borkmann, David S. Miller,
	netdev
In-Reply-To: <20161115031016.GA10323@ast-mbp.thefacebook.com>

On Tue, Nov 15, 2016 at 4:10 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Mon, Nov 14, 2016 at 03:45:36PM -0500, Josef Bacik wrote:
>> I made some invalid assumptions with BPF_AND and BPF_MOD that could result in
>> invalid accesses to bpf map entries.  Fix this up by doing a few things
>>
>> 1) Kill BPF_MOD support.  This doesn't actually get used by the compiler in real
>> life and just adds extra complexity.
>>
>> 2) Fix the logic for BPF_AND, don't allow AND of negative numbers and set the
>> minimum value to 0 for positive AND's.
>>
>> 3) Don't do operations on the ranges if they are set to the limits, as they are
>> by definition undefined, and allowing arithmetic operations on those values
>> could make them appear valid when they really aren't.
>>
>> This fixes the testcase provided by Jann as well as a few other theoretical
>> problems.
>>
>> Reported-by: Jann Horn <jannh@google.com>
>> Signed-off-by: Josef Bacik <jbacik@fb.com>
>
> lgtm.
> Acked-by: Alexei Starovoitov <ast@kernel.org>
>
> Jann, could you please double check the logic.
> Thanks!

I found some more potential issues, maybe Josef and you can tell me whether I
understood these correctly.

/* If the source register is a random pointer then the
* min_value/max_value values represent the range of the known
* accesses into that value, not the actual min/max value of the
* register itself.  In this case we have to reset the reg range
* values so we know it is not safe to look at.
*/
if (regs[insn->src_reg].type != CONST_IMM &&
   regs[insn->src_reg].type != UNKNOWN_VALUE) {
min_val = BPF_REGISTER_MIN_RANGE;
max_val = BPF_REGISTER_MAX_RANGE;
}

Why only the source register? Why not the destination register?

/* We don't know anything about what was done to this register, mark it
* as unknown.
*/
if (min_val == BPF_REGISTER_MIN_RANGE &&
   max_val == BPF_REGISTER_MAX_RANGE) {
reset_reg_range_values(regs, insn->dst_reg);
return;
}

Why have this special case at all? Since min_val and max_val are
basically independent, this code shouldn't be necessary, right?

static void check_reg_overflow(struct bpf_reg_state *reg)
{
if (reg->max_value > BPF_REGISTER_MAX_RANGE)
reg->max_value = BPF_REGISTER_MAX_RANGE;
if (reg->min_value < BPF_REGISTER_MIN_RANGE ||
   reg->min_value > BPF_REGISTER_MAX_RANGE)
reg->min_value = BPF_REGISTER_MIN_RANGE;
}

Why is this asymmetric? Why is `reg->max_value <
BPF_REGISTER_MIN_RANGE` not important, but `reg->min_value >
BPF_REGISTER_MAX_RANGE` is?

In states_equal():
if (rold->type == NOT_INIT ||
   (rold->type == UNKNOWN_VALUE && rcur->type != NOT_INIT))   <------------
continue;

I think this is broken in code like the following:

int value;
if (condition) {
  value = 1; // visited first by verifier
} else {
  value = 1000000; // visited second by verifier
}
int dummy = 1; // states seem to converge here, but actually don't
map[value] = 1234;

`value` would be an UNKNOWN_VALUE for both paths, right? So
states_equal() would decide that the states converge after the
conditionally executed code?

^ permalink raw reply

* Re: [PATCH net-next v3 3/5] ethtool: (uapi) Add ETHTOOL_PHY_DOWNSHIFT to PHY tunables
From: Andrew Lunn @ 2016-11-15 13:45 UTC (permalink / raw)
  To: Allan W. Nielsen; +Cc: netdev, raju.lakkaraju
In-Reply-To: <1479205204-27768-4-git-send-email-allan.nielsen@microsemi.com>

On Tue, Nov 15, 2016 at 11:20:02AM +0100, Allan W. Nielsen wrote:
> From: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
> 
> For operation in cabling environments that are incompatible with
> 1000BASE-T, PHY device may provide an automatic link speed downshift
> operation. When enabled, the device automatically changes its 1000BASE-T
> auto-negotiation to the next slower speed after a configured number of
> failed attempts at 1000BASE-T.  This feature is useful in setting up in
> networks using older cable installations that include only pairs A and B,
> and not pairs C and D.
> 
> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
> Signed-off-by: Allan W. Nielsen <allan.nielsen@microsemi.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* [RFC PATCH 1/2] net: use cmpxchg instead of spinlock in ptr rings
From: Jesper Dangaard Brouer @ 2016-11-15 13:32 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: brouer, John Fastabend, Michael S. Tsirkin


(looks like my message didn't reach the netdev list, due to me sending
from the wrong email, forwarded message again):

On Thu, 10 Nov 2016 20:44:08 -0800 John Fastabend <john.fastabend@gmail.com> wrote:

> ---
>  include/linux/ptr_ring_ll.h |  136 +++++++++++++++++++++++++++++++++++++++++++
>  include/linux/skb_array.h   |   25 ++++++++
>  2 files changed, 161 insertions(+)
>  create mode 100644 include/linux/ptr_ring_ll.h
> 
> diff --git a/include/linux/ptr_ring_ll.h b/include/linux/ptr_ring_ll.h
> new file mode 100644
> index 0000000..bcb11f3
> --- /dev/null
> +++ b/include/linux/ptr_ring_ll.h
> @@ -0,0 +1,136 @@
> +/*
> + *	Definitions for the 'struct ptr_ring_ll' datastructure.
> + *
> + *	Author:
> + *		John Fastabend <john.r.fastabend@intel.com>  
[...]
> + *
> + *	This is a limited-size FIFO maintaining pointers in FIFO order, with
> + *	one CPU producing entries and another consuming entries from a FIFO.
> + *	extended from ptr_ring_ll to use cmpxchg over spin lock.  

It sounds like this is Single Producer Single Consumer (SPSC)
implementation, but your implementation actually is Multi Producer
Multi Consumer (MPMC) capable.

The implementation looks a lot like my alf_queue[1] implementation:
 [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/include/linux/alf_queue.h

If the primary use-case is one CPU producing and another consuming,
then the normal ptr_ring (skb_array) will actually be faster!

The reason is ptr_ring avoids bouncing a cache-line between the CPUs on
every ring access.  This is achieved by having the checks for full
(__ptr_ring_full) and empty (__ptr_ring_empty) use the contents of the
array (NULL value).

I actually implemented two micro-benchmarks to measure the difference
between skb_array[2] and alf_queue[3]:
 [2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/skb_array_parallel01.c
 [3] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/alf_queue_parallel01.c


> + */
> +
> +#ifndef _LINUX_PTR_RING_LL_H
> +#define _LINUX_PTR_RING_LL_H 1
> +  
[...]
> +
> +struct ptr_ring_ll {
> +	u32 prod_size;
> +	u32 prod_mask;
> +	u32 prod_head;
> +	u32 prod_tail;
> +	u32 cons_size;
> +	u32 cons_mask;
> +	u32 cons_head;
> +	u32 cons_tail;
> +
> +	void **queue;
> +};  

Your implementation doesn't even split the consumer and producer into
different cachelines (which in practice doesn't help much due to how
the empty/full checks are performed).

> +
> +/* Note: callers invoking this in a loop must use a compiler barrier,
> + * for example cpu_relax(). Callers must hold producer_lock.
> + */
> +static inline int __ptr_ring_ll_produce(struct ptr_ring_ll *r, void *ptr)
> +{
> +	u32 ret, head, tail, next, slots, mask;
> +
> +	do {
> +		head = READ_ONCE(r->prod_head);
> +		mask = READ_ONCE(r->prod_mask);
> +		tail = READ_ONCE(r->cons_tail);  

Problem occur here, as the producer need to access/read the consumers
tail, to determine if the queue is not already full (slots avail).
Thus, the next "consumer-CPU" will see the cacheline in wrong state
(Modified/Invalid or Shared).

> +
> +		slots = mask + tail - head;
> +		if (slots < 1)
> +			return -ENOMEM;
> +
> +		next = head + 1;
> +		ret = cmpxchg(&r->prod_head, head, next);
> +	} while (ret != head);
> +
> +	r->queue[head & mask] = ptr;
> +	smp_wmb();
> +
> +	while (r->prod_tail != head)
> +		cpu_relax();
> +
> +	r->prod_tail = next;
> +	return 0;
> +}
> +
> +static inline void *__ptr_ring_ll_consume(struct ptr_ring_ll *r)
> +{
> +	u32 ret, head, tail, next, slots, mask;
> +	void *ptr;
> +
> +	do {
> +		head = READ_ONCE(r->cons_head);
> +		mask = READ_ONCE(r->cons_mask);
> +		tail = READ_ONCE(r->prod_tail);  

Like wise the consumer is reading the producer tail (for the empty check).

> +
> +		slots = tail - head;
> +		if (slots < 1)
> +			return ERR_PTR(-ENOMEM);
> +
> +		next = head + 1;
> +		ret = cmpxchg(&r->cons_head, head, next);
> +	} while (ret != head);
> +
> +	ptr = r->queue[head & mask];
> +	smp_rmb();
> +
> +	while (r->cons_tail != head)
> +		cpu_relax();
> +
> +	r->cons_tail = next;
> +	return ptr;
> +}
> +
> +static inline void **__ptr_ring_ll_init_queue_alloc(int size, gfp_t gfp)
> +{
> +	return kzalloc(ALIGN(size * sizeof(void *), SMP_CACHE_BYTES), gfp);
> +}
> +
> +static inline int ptr_ring_ll_init(struct ptr_ring_ll *r, int size, gfp_t gfp)
> +{
> +	r->queue = __ptr_ring_init_queue_alloc(size, gfp);
> +	if (!r->queue)
> +		return -ENOMEM;
> +
> +	r->prod_size = r->cons_size = size;
> +	r->prod_mask = r->cons_mask = size - 1;  

Shouldn't we have some check like is_power_of_2(size), as this code
looks like it depend on this.

> +	r->prod_tail = r->prod_head = 0;
> +	r->cons_tail = r->prod_tail = 0;
> +
> +	return 0;
> +}
> +  
[...]
> +#endif /* _LINUX_PTR_RING_LL_H  */
> diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h
> index f4dfade..9b43dfd 100644
> --- a/include/linux/skb_array.h
> +++ b/include/linux/skb_array.h  
[...]
>  
> +static inline int skb_array_ll_produce(struct skb_array_ll *a, struct sk_buff *skb)
> +{
> +	return __ptr_ring_ll_produce(&a->ring, skb);
> +}
> +  
[...]
>  
> +static inline struct sk_buff *skb_array_ll_consume(struct skb_array_ll *a)
> +{
> +	return __ptr_ring_ll_consume(&a->ring);
> +}
> +  

Note in the Multi Producer Multi Consumer (MPMC) use-case this type of
queue can be faster than normal ptr_ring.  And in patch2 you implement
bulking, which is where the real benefit shows (in the MPMC case) for
this kind of queue.

What I would really like to see is a lock-free (locked cmpxchg) queue
implementation, what like ptr_ring use the array as empty/full check,
and still (somehow) support bulking.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH V2] mac80211: Ignore VHT IE from peer with wrong rx_mcs_map
From: Johannes Berg @ 2016-11-15 13:20 UTC (permalink / raw)
  To: Filip Matusiak, linux-wireless
  Cc: marek.kwaczynski, davem, netdev, linux-kernel
In-Reply-To: <1478077466-4308-1-git-send-email-filip.matusiak@tieto.com>

On Wed, 2016-11-02 at 10:04 +0100, Filip Matusiak wrote:
> This is a workaround for VHT-enabled STAs which break the spec
> and have the VHT-MCS Rx map filled in with value 3 for all eight
> spacial streams, an example is AR9462 in AP mode.
> 
> As per spec, in section 22.1.1 Introduction to the VHT PHY
> A VHT STA shall support at least single spactial stream VHT-MCSs
> 0 to 7 (transmit and receive) in all supported channel widths.
> 
> Some devices in STA mode will get firmware assert when trying to
> associate, examples are QCA9377 & QCA6174.
> 
> Packet example of broken VHT Cap IE of AR9462:
> 
> [...]

Applied, thanks.

johannes

^ permalink raw reply

* Re: [PATCH net-next v5] cadence: Add LSO support.
From: Eric Dumazet @ 2016-11-15 13:11 UTC (permalink / raw)
  To: Rafal Ozieblo
  Cc: David Miller, nicolas.ferre@atmel.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <BN3PR07MB251641C606D02892E2196960C9BF0@BN3PR07MB2516.namprd07.prod.outlook.com>

On Tue, 2016-11-15 at 07:07 +0000, Rafal Ozieblo wrote:
> > > > If UFO is in use it should not silently disable UDP checksums.
> > > > 
> > > > If you cannot support UFO with proper checksumming, then you cannot enable support for that feature.
> > > 
> > > According Cadence Gigabit Ethernet MAC documentation:
> > > 
> > > "Hardware will not calculate the UDP checksum or modify the UDP 
> > > checksum field. Therefore software must set a value of zero in the 
> > > checksum field in the UDP header (in the first payload buffer) to indicate to the receiver that the UDP datagram does not include a checksum."
> > > 
> > > It is hardware requirement.
> >
> > I do not doubt that it is a hardware restriction.
> >
> > But I am saying that you cannot enable this feature under Linux if this is how it operates on your hardware.
> 
> Would it be good to enable UFO conditionally with some internal define? Ex.:
> 
> +#ifdef MACB_ENABLE_UFO
> +#define MACB_NETIF_LSO         (NETIF_F_TSO | NETIF_F_UFO)
> +#else
> +#define MACB_NETIF_LSO         (NETIF_F_TSO)
> +#endif
> 
> I could add precise comment here that ufo is possible only without checksum.
> 
> Or maybe I could enable it from module_params or device-tree (like: drivers/net/ethernet/neterion/s2io.c).

No you can not do that.

1) That would violate UDP specs.
2) Module params are no longer accepted.
3) Comments in a driver source code would only help the driver
maintainer, not users to make their mind.

Only way would be to propagate the intent of the sender.

Only the sender application can decide to generate UDP checksums or not.

Your driver ndo_features_check() could then force software segmentation
fallback if the user did not asked to disable UDP checksums, and packet
is UFO.

(look for UDP_NO_CHECK6_TX, and SO_NO_CHECK )

Problem is complex, because the skb has no marker, only the socket has.

And socket state could change between packets, and packets can stay in
an intermediate qdisc before hitting device driver. So looking at
skb->sk from your ndo_features_check() would be racy.

What use case would you have precisely ?

^ permalink raw reply

* re: amd-xgbe: Add support for MDIO attached PHYs
From: Colin Ian King @ 2016-11-15 13:07 UTC (permalink / raw)
  To: Tom Lendacky; +Cc: netdev@vger.kernel.org, David S. Miller

Hi,

Commit:

amd-xgbe: Add support for MDIO attached PHYs

    Use the phylib support in the kernel to communicate with and control an
    MDIO attached PHY. Use the hardware's MDIO communication mechanism to
    communicate with the PHY.

+static int xgbe_clr_gpio(struct xgbe_prv_data *pdata, unsigned int gpio)
+{
+       unsigned int reg;
+
+       if (gpio > 16)
+               return -EINVAL;

is gpio in the range 0..15?

	if (gpio > 15)
		return -EINVAL;

+
+       reg = XGMAC_IOREAD(pdata, MAC_GPIOSR);
+
+       reg &= ~(1 << (gpio + 16));

if gpio is 16, we get 1 << 32 which I believe is undefined behaviour.

+       XGMAC_IOWRITE(pdata, MAC_GPIOSR, reg);
+
+       return 0;
+}

Same applies for function xgbe_clr_gpio().

Colin

^ permalink raw reply

* Re: [PATCH] vhost/vsock: Remove unused but set variable
From: Stefan Hajnoczi @ 2016-11-15 13:00 UTC (permalink / raw)
  To: Tobias Klauser
  Cc: Michael S. Tsirkin, Jason Wang, kvm, virtualization, netdev
In-Reply-To: <20161111132631.25708-1-tklauser@distanz.ch>

[-- Attachment #1: Type: text/plain, Size: 470 bytes --]

On Fri, Nov 11, 2016 at 02:26:31PM +0100, Tobias Klauser wrote:
> Remove the unused but set variable vq in vhost_transport_send_pkt() to
> fix the following GCC warning when building with 'W=1':
> 
>   drivers/vhost/vsock.c:198:26: warning: variable ‘vq’ set but not used
> 
> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
> ---
>  drivers/vhost/vsock.c | 3 ---
>  1 file changed, 3 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply

* [PATCH] net: ioctl SIOCSIFADDR minor cleanup
From: yuan linyu @ 2016-11-15 12:44 UTC (permalink / raw)
  To: netdev; +Cc: davem, yuan linyu

From: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>

1. set interface address label to ioctl request device name is enough
2. when address pass inet_abc_len check, prefixlen < 31 is always true

Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
---
 net/ipv4/devinet.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 062a67c..d491a7a 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1063,10 +1063,7 @@ int devinet_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 			if (!ifa)
 				break;
 			INIT_HLIST_NODE(&ifa->hash);
-			if (colon)
-				memcpy(ifa->ifa_label, ifr.ifr_name, IFNAMSIZ);
-			else
-				memcpy(ifa->ifa_label, dev->name, IFNAMSIZ);
+			memcpy(ifa->ifa_label, ifr.ifr_name, IFNAMSIZ);
 		} else {
 			ret = 0;
 			if (ifa->ifa_local == sin->sin_addr.s_addr)
@@ -1081,8 +1078,7 @@ int devinet_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 		if (!(dev->flags & IFF_POINTOPOINT)) {
 			ifa->ifa_prefixlen = inet_abc_len(ifa->ifa_address);
 			ifa->ifa_mask = inet_make_mask(ifa->ifa_prefixlen);
-			if ((dev->flags & IFF_BROADCAST) &&
-			    ifa->ifa_prefixlen < 31)
+			if (dev->flags & IFF_BROADCAST)
 				ifa->ifa_broadcast = ifa->ifa_address |
 						     ~ifa->ifa_mask;
 		} else {
-- 
2.7.4

^ permalink raw reply related

* [PATCH] amd-xgbe: fix unsigned comparison against less than zero
From: Colin King @ 2016-11-15 12:18 UTC (permalink / raw)
  To: Tom Lendacky, netdev; +Cc: linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Comparing unsigned int ret to less than zero for an error status
check is never true.  Fix this by making ret a signed int. Reduce
scope of ret too.

Found with static analysis by CoverityScan, CID 1377750

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
index 4ba4332..168507e 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
@@ -2346,7 +2346,7 @@ static bool xgbe_phy_valid_speed(struct xgbe_prv_data *pdata, int speed)
 static int xgbe_phy_link_status(struct xgbe_prv_data *pdata, int *an_restart)
 {
 	struct xgbe_phy_data *phy_data = pdata->phy_data;
-	unsigned int ret, reg;
+	unsigned int reg;
 
 	*an_restart = 0;
 
@@ -2365,7 +2365,8 @@ static int xgbe_phy_link_status(struct xgbe_prv_data *pdata, int *an_restart)
 
 	if (phy_data->phydev) {
 		/* Check external PHY */
-		ret = phy_read_status(phy_data->phydev);
+		int ret = phy_read_status(phy_data->phydev);
+
 		if (ret < 0)
 			return 0;
 
-- 
2.10.2

^ permalink raw reply related

* [PATCH v2 5/5] net: thunderx: Fix memory leak and other issues upon interface toggle
From: sunil.kovvuri @ 2016-11-15 12:08 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, linux-arm-kernel, Sunil Goutham

From: Sunil Goutham <sgoutham@cavium.com>

This patch fixes the following
1. When interface is being teardown and queues are being cleaned up,
   free pending SKBs that are in SQ which are either not transmitted
   or freed as NAPI is disabled by that time.
2. While interface initialization, delay CFG_DONE notification till
   the end to avoid corner cases where TXQs are enabled but CQ
   interrupts are not which results blocking transmission and kicking
   off watchdog.
3. Check for IFF_UP while re-enabling RBDR interrupts from tasklet.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   | 11 +++++------
 drivers/net/ethernet/cavium/thunder/nicvf_queues.c | 14 +++++++++++++-
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index 9dc79c05..8a37012 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -473,9 +473,6 @@ int nicvf_set_real_num_queues(struct net_device *netdev,
 static int nicvf_init_resources(struct nicvf *nic)
 {
 	int err;
-	union nic_mbx mbx = {};
-
-	mbx.msg.msg = NIC_MBOX_MSG_CFG_DONE;
 
 	/* Enable Qset */
 	nicvf_qset_config(nic, true);
@@ -488,9 +485,6 @@ static int nicvf_init_resources(struct nicvf *nic)
 		return err;
 	}
 
-	/* Send VF config done msg to PF */
-	nicvf_write_to_mbx(nic, &mbx);
-
 	return 0;
 }
 
@@ -1184,6 +1178,7 @@ int nicvf_open(struct net_device *netdev)
 	struct nicvf *nic = netdev_priv(netdev);
 	struct queue_set *qs = nic->qs;
 	struct nicvf_cq_poll *cq_poll = NULL;
+	union nic_mbx mbx = {};
 
 	netif_carrier_off(netdev);
 
@@ -1271,6 +1266,10 @@ int nicvf_open(struct net_device *netdev)
 	for (qidx = 0; qidx < qs->rbdr_cnt; qidx++)
 		nicvf_enable_intr(nic, NICVF_INTR_RBDR, qidx);
 
+	/* Send VF config done msg to PF */
+	mbx.msg.msg = NIC_MBOX_MSG_CFG_DONE;
+	nicvf_write_to_mbx(nic, &mbx);
+
 	return 0;
 cleanup:
 	nicvf_disable_intr(nic, NICVF_INTR_MBOX, 0);
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
index bdce591..747ef08 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
@@ -271,7 +271,8 @@ static void nicvf_refill_rbdr(struct nicvf *nic, gfp_t gfp)
 			      rbdr_idx, new_rb);
 next_rbdr:
 	/* Re-enable RBDR interrupts only if buffer allocation is success */
-	if (!nic->rb_alloc_fail && rbdr->enable)
+	if (!nic->rb_alloc_fail && rbdr->enable &&
+	    netif_running(nic->pnicvf->netdev))
 		nicvf_enable_intr(nic, NICVF_INTR_RBDR, rbdr_idx);
 
 	if (rbdr_idx)
@@ -362,6 +363,8 @@ static int nicvf_init_snd_queue(struct nicvf *nic,
 
 static void nicvf_free_snd_queue(struct nicvf *nic, struct snd_queue *sq)
 {
+	struct sk_buff *skb;
+
 	if (!sq)
 		return;
 	if (!sq->dmem.base)
@@ -372,6 +375,15 @@ static void nicvf_free_snd_queue(struct nicvf *nic, struct snd_queue *sq)
 				  sq->dmem.q_len * TSO_HEADER_SIZE,
 				  sq->tso_hdrs, sq->tso_hdrs_phys);
 
+	/* Free pending skbs in the queue */
+	smp_rmb();
+	while (sq->head != sq->tail) {
+		skb = (struct sk_buff *)sq->skbuff[sq->head];
+		if (skb)
+			dev_kfree_skb_any(skb);
+		sq->head++;
+		sq->head &= (sq->dmem.q_len - 1);
+	}
 	kfree(sq->skbuff);
 	nicvf_free_q_desc_mem(nic, &sq->dmem);
 }
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 4/5] net: thunderx: Fix VF driver's interface statistics
From: sunil.kovvuri @ 2016-11-15 12:08 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, linux-arm-kernel, Sunil Goutham

From: Sunil Goutham <sgoutham@cavium.com>

This patch fixes multiple issues
1. Convert all driver statistics to percpu counters for accuracy.
2. To avoid multiple CQEs posted by a TSO packet appended to HW,
   TSO pkt's SQE has 'post_cqe' not set but a dummy SQE is added
   for getting HW transmit completion notification. This dummy
   SQE has 'dont_send' set and HW drops the pkt pointed to in this
   thus Tx drop counter increases. This patch fixes this by subtracting
   SW tx tso counter from HW Tx drop counter for actual packet drop counter.
3. Reset all individual queue's and VNIC HW stats when interface is going down.
4. Getrid off unnecessary counters in hot path.
5. Bringout all CQE error stats i.e both Rx and Tx.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
---
 drivers/net/ethernet/cavium/thunder/nic.h          |  61 +++++++-----
 drivers/net/ethernet/cavium/thunder/nic_main.c     |   1 +
 .../net/ethernet/cavium/thunder/nicvf_ethtool.c    | 105 +++++++++++---------
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   | 106 +++++++++++----------
 drivers/net/ethernet/cavium/thunder/nicvf_queues.c |  96 +++++++++----------
 drivers/net/ethernet/cavium/thunder/nicvf_queues.h |  24 +----
 6 files changed, 197 insertions(+), 196 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h
index cd2d379..86bd93c 100644
--- a/drivers/net/ethernet/cavium/thunder/nic.h
+++ b/drivers/net/ethernet/cavium/thunder/nic.h
@@ -178,11 +178,11 @@ enum tx_stats_reg_offset {
 
 struct nicvf_hw_stats {
 	u64 rx_bytes;
+	u64 rx_frames;
 	u64 rx_ucast_frames;
 	u64 rx_bcast_frames;
 	u64 rx_mcast_frames;
-	u64 rx_fcs_errors;
-	u64 rx_l2_errors;
+	u64 rx_drops;
 	u64 rx_drop_red;
 	u64 rx_drop_red_bytes;
 	u64 rx_drop_overrun;
@@ -191,6 +191,19 @@ struct nicvf_hw_stats {
 	u64 rx_drop_mcast;
 	u64 rx_drop_l3_bcast;
 	u64 rx_drop_l3_mcast;
+	u64 rx_fcs_errors;
+	u64 rx_l2_errors;
+
+	u64 tx_bytes;
+	u64 tx_frames;
+	u64 tx_ucast_frames;
+	u64 tx_bcast_frames;
+	u64 tx_mcast_frames;
+	u64 tx_drops;
+};
+
+struct nicvf_drv_stats {
+	/* CQE Rx errs */
 	u64 rx_bgx_truncated_pkts;
 	u64 rx_jabber_errs;
 	u64 rx_fcs_errs;
@@ -216,34 +229,30 @@ struct nicvf_hw_stats {
 	u64 rx_l4_pclp;
 	u64 rx_truncated_pkts;
 
-	u64 tx_bytes_ok;
-	u64 tx_ucast_frames_ok;
-	u64 tx_bcast_frames_ok;
-	u64 tx_mcast_frames_ok;
-	u64 tx_drops;
-};
-
-struct nicvf_drv_stats {
-	/* Rx */
-	u64 rx_frames_ok;
-	u64 rx_frames_64;
-	u64 rx_frames_127;
-	u64 rx_frames_255;
-	u64 rx_frames_511;
-	u64 rx_frames_1023;
-	u64 rx_frames_1518;
-	u64 rx_frames_jumbo;
-	u64 rx_drops;
-
+	/* CQE Tx errs */
+	u64 tx_desc_fault;
+	u64 tx_hdr_cons_err;
+	u64 tx_subdesc_err;
+	u64 tx_max_size_exceeded;
+	u64 tx_imm_size_oflow;
+	u64 tx_data_seq_err;
+	u64 tx_mem_seq_err;
+	u64 tx_lock_viol;
+	u64 tx_data_fault;
+	u64 tx_tstmp_conflict;
+	u64 tx_tstmp_timeout;
+	u64 tx_mem_fault;
+	u64 tx_csum_overlap;
+	u64 tx_csum_overflow;
+
+	/* driver debug stats */
 	u64 rcv_buffer_alloc_failures;
-
-	/* Tx */
-	u64 tx_frames_ok;
-	u64 tx_drops;
 	u64 tx_tso;
 	u64 tx_timeout;
 	u64 txq_stop;
 	u64 txq_wake;
+
+	struct u64_stats_sync   syncp;
 };
 
 struct nicvf {
@@ -297,7 +306,7 @@ struct nicvf {
 
 	/* Stats */
 	struct nicvf_hw_stats   hw_stats;
-	struct nicvf_drv_stats  drv_stats;
+	struct nicvf_drv_stats  __percpu *drv_stats;
 	struct bgx_stats	bgx_stats;
 
 	/* MSI-X  */
diff --git a/drivers/net/ethernet/cavium/thunder/nic_main.c b/drivers/net/ethernet/cavium/thunder/nic_main.c
index 85c9e62..6677b96 100644
--- a/drivers/net/ethernet/cavium/thunder/nic_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nic_main.c
@@ -851,6 +851,7 @@ static int nic_reset_stat_counters(struct nicpf *nic,
 			nic_reg_write(nic, reg_addr, 0);
 		}
 	}
+
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c b/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c
index ad4fddb..432bf6b 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c
@@ -36,11 +36,11 @@ struct nicvf_stat {
 
 static const struct nicvf_stat nicvf_hw_stats[] = {
 	NICVF_HW_STAT(rx_bytes),
+	NICVF_HW_STAT(rx_frames),
 	NICVF_HW_STAT(rx_ucast_frames),
 	NICVF_HW_STAT(rx_bcast_frames),
 	NICVF_HW_STAT(rx_mcast_frames),
-	NICVF_HW_STAT(rx_fcs_errors),
-	NICVF_HW_STAT(rx_l2_errors),
+	NICVF_HW_STAT(rx_drops),
 	NICVF_HW_STAT(rx_drop_red),
 	NICVF_HW_STAT(rx_drop_red_bytes),
 	NICVF_HW_STAT(rx_drop_overrun),
@@ -49,50 +49,59 @@ static const struct nicvf_stat nicvf_hw_stats[] = {
 	NICVF_HW_STAT(rx_drop_mcast),
 	NICVF_HW_STAT(rx_drop_l3_bcast),
 	NICVF_HW_STAT(rx_drop_l3_mcast),
-	NICVF_HW_STAT(rx_bgx_truncated_pkts),
-	NICVF_HW_STAT(rx_jabber_errs),
-	NICVF_HW_STAT(rx_fcs_errs),
-	NICVF_HW_STAT(rx_bgx_errs),
-	NICVF_HW_STAT(rx_prel2_errs),
-	NICVF_HW_STAT(rx_l2_hdr_malformed),
-	NICVF_HW_STAT(rx_oversize),
-	NICVF_HW_STAT(rx_undersize),
-	NICVF_HW_STAT(rx_l2_len_mismatch),
-	NICVF_HW_STAT(rx_l2_pclp),
-	NICVF_HW_STAT(rx_ip_ver_errs),
-	NICVF_HW_STAT(rx_ip_csum_errs),
-	NICVF_HW_STAT(rx_ip_hdr_malformed),
-	NICVF_HW_STAT(rx_ip_payload_malformed),
-	NICVF_HW_STAT(rx_ip_ttl_errs),
-	NICVF_HW_STAT(rx_l3_pclp),
-	NICVF_HW_STAT(rx_l4_malformed),
-	NICVF_HW_STAT(rx_l4_csum_errs),
-	NICVF_HW_STAT(rx_udp_len_errs),
-	NICVF_HW_STAT(rx_l4_port_errs),
-	NICVF_HW_STAT(rx_tcp_flag_errs),
-	NICVF_HW_STAT(rx_tcp_offset_errs),
-	NICVF_HW_STAT(rx_l4_pclp),
-	NICVF_HW_STAT(rx_truncated_pkts),
-	NICVF_HW_STAT(tx_bytes_ok),
-	NICVF_HW_STAT(tx_ucast_frames_ok),
-	NICVF_HW_STAT(tx_bcast_frames_ok),
-	NICVF_HW_STAT(tx_mcast_frames_ok),
+	NICVF_HW_STAT(rx_fcs_errors),
+	NICVF_HW_STAT(rx_l2_errors),
+	NICVF_HW_STAT(tx_bytes),
+	NICVF_HW_STAT(tx_frames),
+	NICVF_HW_STAT(tx_ucast_frames),
+	NICVF_HW_STAT(tx_bcast_frames),
+	NICVF_HW_STAT(tx_mcast_frames),
+	NICVF_HW_STAT(tx_drops),
 };
 
 static const struct nicvf_stat nicvf_drv_stats[] = {
-	NICVF_DRV_STAT(rx_frames_ok),
-	NICVF_DRV_STAT(rx_frames_64),
-	NICVF_DRV_STAT(rx_frames_127),
-	NICVF_DRV_STAT(rx_frames_255),
-	NICVF_DRV_STAT(rx_frames_511),
-	NICVF_DRV_STAT(rx_frames_1023),
-	NICVF_DRV_STAT(rx_frames_1518),
-	NICVF_DRV_STAT(rx_frames_jumbo),
-	NICVF_DRV_STAT(rx_drops),
+	NICVF_DRV_STAT(rx_bgx_truncated_pkts),
+	NICVF_DRV_STAT(rx_jabber_errs),
+	NICVF_DRV_STAT(rx_fcs_errs),
+	NICVF_DRV_STAT(rx_bgx_errs),
+	NICVF_DRV_STAT(rx_prel2_errs),
+	NICVF_DRV_STAT(rx_l2_hdr_malformed),
+	NICVF_DRV_STAT(rx_oversize),
+	NICVF_DRV_STAT(rx_undersize),
+	NICVF_DRV_STAT(rx_l2_len_mismatch),
+	NICVF_DRV_STAT(rx_l2_pclp),
+	NICVF_DRV_STAT(rx_ip_ver_errs),
+	NICVF_DRV_STAT(rx_ip_csum_errs),
+	NICVF_DRV_STAT(rx_ip_hdr_malformed),
+	NICVF_DRV_STAT(rx_ip_payload_malformed),
+	NICVF_DRV_STAT(rx_ip_ttl_errs),
+	NICVF_DRV_STAT(rx_l3_pclp),
+	NICVF_DRV_STAT(rx_l4_malformed),
+	NICVF_DRV_STAT(rx_l4_csum_errs),
+	NICVF_DRV_STAT(rx_udp_len_errs),
+	NICVF_DRV_STAT(rx_l4_port_errs),
+	NICVF_DRV_STAT(rx_tcp_flag_errs),
+	NICVF_DRV_STAT(rx_tcp_offset_errs),
+	NICVF_DRV_STAT(rx_l4_pclp),
+	NICVF_DRV_STAT(rx_truncated_pkts),
+
+	NICVF_DRV_STAT(tx_desc_fault),
+	NICVF_DRV_STAT(tx_hdr_cons_err),
+	NICVF_DRV_STAT(tx_subdesc_err),
+	NICVF_DRV_STAT(tx_max_size_exceeded),
+	NICVF_DRV_STAT(tx_imm_size_oflow),
+	NICVF_DRV_STAT(tx_data_seq_err),
+	NICVF_DRV_STAT(tx_mem_seq_err),
+	NICVF_DRV_STAT(tx_lock_viol),
+	NICVF_DRV_STAT(tx_data_fault),
+	NICVF_DRV_STAT(tx_tstmp_conflict),
+	NICVF_DRV_STAT(tx_tstmp_timeout),
+	NICVF_DRV_STAT(tx_mem_fault),
+	NICVF_DRV_STAT(tx_csum_overlap),
+	NICVF_DRV_STAT(tx_csum_overflow),
+
 	NICVF_DRV_STAT(rcv_buffer_alloc_failures),
-	NICVF_DRV_STAT(tx_frames_ok),
 	NICVF_DRV_STAT(tx_tso),
-	NICVF_DRV_STAT(tx_drops),
 	NICVF_DRV_STAT(tx_timeout),
 	NICVF_DRV_STAT(txq_stop),
 	NICVF_DRV_STAT(txq_wake),
@@ -278,8 +287,8 @@ static void nicvf_get_ethtool_stats(struct net_device *netdev,
 				    struct ethtool_stats *stats, u64 *data)
 {
 	struct nicvf *nic = netdev_priv(netdev);
-	int stat;
-	int sqs;
+	int stat, tmp_stats;
+	int sqs, cpu;
 
 	nicvf_update_stats(nic);
 
@@ -289,9 +298,13 @@ static void nicvf_get_ethtool_stats(struct net_device *netdev,
 	for (stat = 0; stat < nicvf_n_hw_stats; stat++)
 		*(data++) = ((u64 *)&nic->hw_stats)
 				[nicvf_hw_stats[stat].index];
-	for (stat = 0; stat < nicvf_n_drv_stats; stat++)
-		*(data++) = ((u64 *)&nic->drv_stats)
-				[nicvf_drv_stats[stat].index];
+	for (stat = 0; stat < nicvf_n_drv_stats; stat++) {
+		tmp_stats = 0;
+		for_each_possible_cpu(cpu)
+			tmp_stats += ((u64 *)per_cpu_ptr(nic->drv_stats, cpu))
+				     [nicvf_drv_stats[stat].index];
+		*(data++) = tmp_stats;
+	}
 
 	nicvf_get_qset_stats(nic, stats, &data);
 
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index 8f83361..9dc79c05 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -69,25 +69,6 @@ static inline u8 nicvf_netdev_qidx(struct nicvf *nic, u8 qidx)
 		return qidx;
 }
 
-static inline void nicvf_set_rx_frame_cnt(struct nicvf *nic,
-					  struct sk_buff *skb)
-{
-	if (skb->len <= 64)
-		nic->drv_stats.rx_frames_64++;
-	else if (skb->len <= 127)
-		nic->drv_stats.rx_frames_127++;
-	else if (skb->len <= 255)
-		nic->drv_stats.rx_frames_255++;
-	else if (skb->len <= 511)
-		nic->drv_stats.rx_frames_511++;
-	else if (skb->len <= 1023)
-		nic->drv_stats.rx_frames_1023++;
-	else if (skb->len <= 1518)
-		nic->drv_stats.rx_frames_1518++;
-	else
-		nic->drv_stats.rx_frames_jumbo++;
-}
-
 /* The Cavium ThunderX network controller can *only* be found in SoCs
  * containing the ThunderX ARM64 CPU implementation.  All accesses to the device
  * registers on this platform are implicitly strongly ordered with respect
@@ -514,7 +495,6 @@ static int nicvf_init_resources(struct nicvf *nic)
 }
 
 static void nicvf_snd_pkt_handler(struct net_device *netdev,
-				  struct cmp_queue *cq,
 				  struct cqe_send_t *cqe_tx,
 				  int cqe_type, int budget,
 				  unsigned int *tx_pkts, unsigned int *tx_bytes)
@@ -536,7 +516,7 @@ static void nicvf_snd_pkt_handler(struct net_device *netdev,
 		   __func__, cqe_tx->sq_qs, cqe_tx->sq_idx,
 		   cqe_tx->sqe_ptr, hdr->subdesc_cnt);
 
-	nicvf_check_cqe_tx_errs(nic, cq, cqe_tx);
+	nicvf_check_cqe_tx_errs(nic, cqe_tx);
 	skb = (struct sk_buff *)sq->skbuff[cqe_tx->sqe_ptr];
 	if (skb) {
 		/* Check for dummy descriptor used for HW TSO offload on 88xx */
@@ -630,8 +610,6 @@ static void nicvf_rcv_pkt_handler(struct net_device *netdev,
 		return;
 	}
 
-	nicvf_set_rx_frame_cnt(nic, skb);
-
 	nicvf_set_rxhash(netdev, cqe_rx, skb);
 
 	skb_record_rx_queue(skb, rq_idx);
@@ -703,7 +681,7 @@ static int nicvf_cq_intr_handler(struct net_device *netdev, u8 cq_idx,
 			work_done++;
 		break;
 		case CQE_TYPE_SEND:
-			nicvf_snd_pkt_handler(netdev, cq,
+			nicvf_snd_pkt_handler(netdev,
 					      (void *)cq_desc, CQE_TYPE_SEND,
 					      budget, &tx_pkts, &tx_bytes);
 			tx_done++;
@@ -740,7 +718,7 @@ static int nicvf_cq_intr_handler(struct net_device *netdev, u8 cq_idx,
 		nic = nic->pnicvf;
 		if (netif_tx_queue_stopped(txq) && netif_carrier_ok(netdev)) {
 			netif_tx_start_queue(txq);
-			nic->drv_stats.txq_wake++;
+			this_cpu_inc(nic->drv_stats->txq_wake);
 			if (netif_msg_tx_err(nic))
 				netdev_warn(netdev,
 					    "%s: Transmit queue wakeup SQ%d\n",
@@ -1084,7 +1062,7 @@ static netdev_tx_t nicvf_xmit(struct sk_buff *skb, struct net_device *netdev)
 
 	if (!netif_tx_queue_stopped(txq) && !nicvf_sq_append_skb(nic, skb)) {
 		netif_tx_stop_queue(txq);
-		nic->drv_stats.txq_stop++;
+		this_cpu_inc(nic->drv_stats->txq_stop);
 		if (netif_msg_tx_err(nic))
 			netdev_warn(netdev,
 				    "%s: Transmit ring full, stopping SQ%d\n",
@@ -1202,7 +1180,7 @@ static int nicvf_update_hw_max_frs(struct nicvf *nic, int mtu)
 
 int nicvf_open(struct net_device *netdev)
 {
-	int err, qidx;
+	int cpu, err, qidx;
 	struct nicvf *nic = netdev_priv(netdev);
 	struct queue_set *qs = nic->qs;
 	struct nicvf_cq_poll *cq_poll = NULL;
@@ -1262,6 +1240,11 @@ int nicvf_open(struct net_device *netdev)
 		nicvf_rss_init(nic);
 		if (nicvf_update_hw_max_frs(nic, netdev->mtu))
 			goto cleanup;
+
+		/* Clear percpu stats */
+		for_each_possible_cpu(cpu)
+			memset(per_cpu_ptr(nic->drv_stats, cpu), 0,
+			       sizeof(struct nicvf_drv_stats));
 	}
 
 	err = nicvf_register_interrupts(nic);
@@ -1288,9 +1271,6 @@ int nicvf_open(struct net_device *netdev)
 	for (qidx = 0; qidx < qs->rbdr_cnt; qidx++)
 		nicvf_enable_intr(nic, NICVF_INTR_RBDR, qidx);
 
-	nic->drv_stats.txq_stop = 0;
-	nic->drv_stats.txq_wake = 0;
-
 	return 0;
 cleanup:
 	nicvf_disable_intr(nic, NICVF_INTR_MBOX, 0);
@@ -1383,9 +1363,10 @@ void nicvf_update_lmac_stats(struct nicvf *nic)
 
 void nicvf_update_stats(struct nicvf *nic)
 {
-	int qidx;
+	int qidx, cpu;
+	u64 tmp_stats = 0;
 	struct nicvf_hw_stats *stats = &nic->hw_stats;
-	struct nicvf_drv_stats *drv_stats = &nic->drv_stats;
+	struct nicvf_drv_stats *drv_stats;
 	struct queue_set *qs = nic->qs;
 
 #define GET_RX_STATS(reg) \
@@ -1408,21 +1389,33 @@ void nicvf_update_stats(struct nicvf *nic)
 	stats->rx_drop_l3_bcast = GET_RX_STATS(RX_DRP_L3BCAST);
 	stats->rx_drop_l3_mcast = GET_RX_STATS(RX_DRP_L3MCAST);
 
-	stats->tx_bytes_ok = GET_TX_STATS(TX_OCTS);
-	stats->tx_ucast_frames_ok = GET_TX_STATS(TX_UCAST);
-	stats->tx_bcast_frames_ok = GET_TX_STATS(TX_BCAST);
-	stats->tx_mcast_frames_ok = GET_TX_STATS(TX_MCAST);
+	stats->tx_bytes = GET_TX_STATS(TX_OCTS);
+	stats->tx_ucast_frames = GET_TX_STATS(TX_UCAST);
+	stats->tx_bcast_frames = GET_TX_STATS(TX_BCAST);
+	stats->tx_mcast_frames = GET_TX_STATS(TX_MCAST);
 	stats->tx_drops = GET_TX_STATS(TX_DROP);
 
-	drv_stats->tx_frames_ok = stats->tx_ucast_frames_ok +
-				  stats->tx_bcast_frames_ok +
-				  stats->tx_mcast_frames_ok;
-	drv_stats->rx_frames_ok = stats->rx_ucast_frames +
-				  stats->rx_bcast_frames +
-				  stats->rx_mcast_frames;
-	drv_stats->rx_drops = stats->rx_drop_red +
-			      stats->rx_drop_overrun;
-	drv_stats->tx_drops = stats->tx_drops;
+	/* On T88 pass 2.0, the dummy SQE added for TSO notification
+	 * via CQE has 'dont_send' set. Hence HW drops the pkt pointed
+	 * pointed by dummy SQE and results in tx_drops counter being
+	 * incremented. Subtracting it from tx_tso counter will give
+	 * exact tx_drops counter.
+	 */
+	if (nic->t88 && nic->hw_tso) {
+		for_each_possible_cpu(cpu) {
+			drv_stats = per_cpu_ptr(nic->drv_stats, cpu);
+			tmp_stats += drv_stats->tx_tso;
+		}
+		stats->tx_drops = tmp_stats - stats->tx_drops;
+	}
+	stats->tx_frames = stats->tx_ucast_frames +
+			   stats->tx_bcast_frames +
+			   stats->tx_mcast_frames;
+	stats->rx_frames = stats->rx_ucast_frames +
+			   stats->rx_bcast_frames +
+			   stats->rx_mcast_frames;
+	stats->rx_drops = stats->rx_drop_red +
+			  stats->rx_drop_overrun;
 
 	/* Update RQ and SQ stats */
 	for (qidx = 0; qidx < qs->rq_cnt; qidx++)
@@ -1436,18 +1429,17 @@ static struct rtnl_link_stats64 *nicvf_get_stats64(struct net_device *netdev,
 {
 	struct nicvf *nic = netdev_priv(netdev);
 	struct nicvf_hw_stats *hw_stats = &nic->hw_stats;
-	struct nicvf_drv_stats *drv_stats = &nic->drv_stats;
 
 	nicvf_update_stats(nic);
 
 	stats->rx_bytes = hw_stats->rx_bytes;
-	stats->rx_packets = drv_stats->rx_frames_ok;
-	stats->rx_dropped = drv_stats->rx_drops;
+	stats->rx_packets = hw_stats->rx_frames;
+	stats->rx_dropped = hw_stats->rx_drops;
 	stats->multicast = hw_stats->rx_mcast_frames;
 
-	stats->tx_bytes = hw_stats->tx_bytes_ok;
-	stats->tx_packets = drv_stats->tx_frames_ok;
-	stats->tx_dropped = drv_stats->tx_drops;
+	stats->tx_bytes = hw_stats->tx_bytes;
+	stats->tx_packets = hw_stats->tx_frames;
+	stats->tx_dropped = hw_stats->tx_drops;
 
 	return stats;
 }
@@ -1460,7 +1452,7 @@ static void nicvf_tx_timeout(struct net_device *dev)
 		netdev_warn(dev, "%s: Transmit timed out, resetting\n",
 			    dev->name);
 
-	nic->drv_stats.tx_timeout++;
+	this_cpu_inc(nic->drv_stats->tx_timeout);
 	schedule_work(&nic->reset_task);
 }
 
@@ -1594,6 +1586,12 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 		goto err_free_netdev;
 	}
 
+	nic->drv_stats = netdev_alloc_pcpu_stats(struct nicvf_drv_stats);
+	if (!nic->drv_stats) {
+		err = -ENOMEM;
+		goto err_free_netdev;
+	}
+
 	err = nicvf_set_qset_resources(nic);
 	if (err)
 		goto err_free_netdev;
@@ -1652,6 +1650,8 @@ static int nicvf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	nicvf_unregister_interrupts(nic);
 err_free_netdev:
 	pci_set_drvdata(pdev, NULL);
+	if (nic->drv_stats)
+		free_percpu(nic->drv_stats);
 	free_netdev(netdev);
 err_release_regions:
 	pci_release_regions(pdev);
@@ -1679,6 +1679,8 @@ static void nicvf_remove(struct pci_dev *pdev)
 		unregister_netdev(pnetdev);
 	nicvf_unregister_interrupts(nic);
 	pci_set_drvdata(pdev, NULL);
+	if (nic->drv_stats)
+		free_percpu(nic->drv_stats);
 	free_netdev(netdev);
 	pci_release_regions(pdev);
 	pci_disable_device(pdev);
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
index f914eef..bdce591 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
@@ -104,7 +104,8 @@ static inline int nicvf_alloc_rcv_buffer(struct nicvf *nic, gfp_t gfp,
 		nic->rb_page = alloc_pages(gfp | __GFP_COMP | __GFP_NOWARN,
 					   order);
 		if (!nic->rb_page) {
-			nic->drv_stats.rcv_buffer_alloc_failures++;
+			this_cpu_inc(nic->pnicvf->drv_stats->
+				     rcv_buffer_alloc_failures);
 			return -ENOMEM;
 		}
 		nic->rb_page_offset = 0;
@@ -483,9 +484,12 @@ static void nicvf_reset_rcv_queue_stats(struct nicvf *nic)
 {
 	union nic_mbx mbx = {};
 
-	/* Reset all RXQ's stats */
+	/* Reset all RQ/SQ and VF stats */
 	mbx.reset_stat.msg = NIC_MBOX_MSG_RESET_STAT_COUNTER;
+	mbx.reset_stat.rx_stat_mask = 0x3FFF;
+	mbx.reset_stat.tx_stat_mask = 0x1F;
 	mbx.reset_stat.rq_stat_mask = 0xFFFF;
+	mbx.reset_stat.sq_stat_mask = 0xFFFF;
 	nicvf_send_msg_to_pf(nic, &mbx);
 }
 
@@ -1032,7 +1036,7 @@ nicvf_sq_add_hdr_subdesc(struct nicvf *nic, struct snd_queue *sq, int qentry,
 		hdr->tso_max_paysize = skb_shinfo(skb)->gso_size;
 		/* For non-tunneled pkts, point this to L2 ethertype */
 		hdr->inner_l3_offset = skb_network_offset(skb) - 2;
-		nic->drv_stats.tx_tso++;
+		this_cpu_inc(nic->pnicvf->drv_stats->tx_tso);
 	}
 }
 
@@ -1164,7 +1168,7 @@ static int nicvf_sq_append_tso(struct nicvf *nic, struct snd_queue *sq,
 
 	nicvf_sq_doorbell(nic, skb, sq_num, desc_cnt);
 
-	nic->drv_stats.tx_tso++;
+	this_cpu_inc(nic->pnicvf->drv_stats->tx_tso);
 	return 1;
 }
 
@@ -1425,8 +1429,6 @@ void nicvf_update_sq_stats(struct nicvf *nic, int sq_idx)
 /* Check for errors in the receive cmp.queue entry */
 int nicvf_check_cqe_rx_errs(struct nicvf *nic, struct cqe_rx_t *cqe_rx)
 {
-	struct nicvf_hw_stats *stats = &nic->hw_stats;
-
 	if (!cqe_rx->err_level && !cqe_rx->err_opcode)
 		return 0;
 
@@ -1438,76 +1440,76 @@ int nicvf_check_cqe_rx_errs(struct nicvf *nic, struct cqe_rx_t *cqe_rx)
 
 	switch (cqe_rx->err_opcode) {
 	case CQ_RX_ERROP_RE_PARTIAL:
-		stats->rx_bgx_truncated_pkts++;
+		this_cpu_inc(nic->drv_stats->rx_bgx_truncated_pkts);
 		break;
 	case CQ_RX_ERROP_RE_JABBER:
-		stats->rx_jabber_errs++;
+		this_cpu_inc(nic->drv_stats->rx_jabber_errs);
 		break;
 	case CQ_RX_ERROP_RE_FCS:
-		stats->rx_fcs_errs++;
+		this_cpu_inc(nic->drv_stats->rx_fcs_errs);
 		break;
 	case CQ_RX_ERROP_RE_RX_CTL:
-		stats->rx_bgx_errs++;
+		this_cpu_inc(nic->drv_stats->rx_bgx_errs);
 		break;
 	case CQ_RX_ERROP_PREL2_ERR:
-		stats->rx_prel2_errs++;
+		this_cpu_inc(nic->drv_stats->rx_prel2_errs);
 		break;
 	case CQ_RX_ERROP_L2_MAL:
-		stats->rx_l2_hdr_malformed++;
+		this_cpu_inc(nic->drv_stats->rx_l2_hdr_malformed);
 		break;
 	case CQ_RX_ERROP_L2_OVERSIZE:
-		stats->rx_oversize++;
+		this_cpu_inc(nic->drv_stats->rx_oversize);
 		break;
 	case CQ_RX_ERROP_L2_UNDERSIZE:
-		stats->rx_undersize++;
+		this_cpu_inc(nic->drv_stats->rx_undersize);
 		break;
 	case CQ_RX_ERROP_L2_LENMISM:
-		stats->rx_l2_len_mismatch++;
+		this_cpu_inc(nic->drv_stats->rx_l2_len_mismatch);
 		break;
 	case CQ_RX_ERROP_L2_PCLP:
-		stats->rx_l2_pclp++;
+		this_cpu_inc(nic->drv_stats->rx_l2_pclp);
 		break;
 	case CQ_RX_ERROP_IP_NOT:
-		stats->rx_ip_ver_errs++;
+		this_cpu_inc(nic->drv_stats->rx_ip_ver_errs);
 		break;
 	case CQ_RX_ERROP_IP_CSUM_ERR:
-		stats->rx_ip_csum_errs++;
+		this_cpu_inc(nic->drv_stats->rx_ip_csum_errs);
 		break;
 	case CQ_RX_ERROP_IP_MAL:
-		stats->rx_ip_hdr_malformed++;
+		this_cpu_inc(nic->drv_stats->rx_ip_hdr_malformed);
 		break;
 	case CQ_RX_ERROP_IP_MALD:
-		stats->rx_ip_payload_malformed++;
+		this_cpu_inc(nic->drv_stats->rx_ip_payload_malformed);
 		break;
 	case CQ_RX_ERROP_IP_HOP:
-		stats->rx_ip_ttl_errs++;
+		this_cpu_inc(nic->drv_stats->rx_ip_ttl_errs);
 		break;
 	case CQ_RX_ERROP_L3_PCLP:
-		stats->rx_l3_pclp++;
+		this_cpu_inc(nic->drv_stats->rx_l3_pclp);
 		break;
 	case CQ_RX_ERROP_L4_MAL:
-		stats->rx_l4_malformed++;
+		this_cpu_inc(nic->drv_stats->rx_l4_malformed);
 		break;
 	case CQ_RX_ERROP_L4_CHK:
-		stats->rx_l4_csum_errs++;
+		this_cpu_inc(nic->drv_stats->rx_l4_csum_errs);
 		break;
 	case CQ_RX_ERROP_UDP_LEN:
-		stats->rx_udp_len_errs++;
+		this_cpu_inc(nic->drv_stats->rx_udp_len_errs);
 		break;
 	case CQ_RX_ERROP_L4_PORT:
-		stats->rx_l4_port_errs++;
+		this_cpu_inc(nic->drv_stats->rx_l4_port_errs);
 		break;
 	case CQ_RX_ERROP_TCP_FLAG:
-		stats->rx_tcp_flag_errs++;
+		this_cpu_inc(nic->drv_stats->rx_tcp_flag_errs);
 		break;
 	case CQ_RX_ERROP_TCP_OFFSET:
-		stats->rx_tcp_offset_errs++;
+		this_cpu_inc(nic->drv_stats->rx_tcp_offset_errs);
 		break;
 	case CQ_RX_ERROP_L4_PCLP:
-		stats->rx_l4_pclp++;
+		this_cpu_inc(nic->drv_stats->rx_l4_pclp);
 		break;
 	case CQ_RX_ERROP_RBDR_TRUNC:
-		stats->rx_truncated_pkts++;
+		this_cpu_inc(nic->drv_stats->rx_truncated_pkts);
 		break;
 	}
 
@@ -1515,56 +1517,52 @@ int nicvf_check_cqe_rx_errs(struct nicvf *nic, struct cqe_rx_t *cqe_rx)
 }
 
 /* Check for errors in the send cmp.queue entry */
-int nicvf_check_cqe_tx_errs(struct nicvf *nic,
-			    struct cmp_queue *cq, struct cqe_send_t *cqe_tx)
+int nicvf_check_cqe_tx_errs(struct nicvf *nic, struct cqe_send_t *cqe_tx)
 {
-	struct cmp_queue_stats *stats = &cq->stats;
-
 	switch (cqe_tx->send_status) {
 	case CQ_TX_ERROP_GOOD:
-		stats->tx.good++;
 		return 0;
 	case CQ_TX_ERROP_DESC_FAULT:
-		stats->tx.desc_fault++;
+		this_cpu_inc(nic->drv_stats->tx_desc_fault);
 		break;
 	case CQ_TX_ERROP_HDR_CONS_ERR:
-		stats->tx.hdr_cons_err++;
+		this_cpu_inc(nic->drv_stats->tx_hdr_cons_err);
 		break;
 	case CQ_TX_ERROP_SUBDC_ERR:
-		stats->tx.subdesc_err++;
+		this_cpu_inc(nic->drv_stats->tx_subdesc_err);
 		break;
 	case CQ_TX_ERROP_MAX_SIZE_VIOL:
-		stats->tx.max_size_exceeded++;
+		this_cpu_inc(nic->drv_stats->tx_max_size_exceeded);
 		break;
 	case CQ_TX_ERROP_IMM_SIZE_OFLOW:
-		stats->tx.imm_size_oflow++;
+		this_cpu_inc(nic->drv_stats->tx_imm_size_oflow);
 		break;
 	case CQ_TX_ERROP_DATA_SEQUENCE_ERR:
-		stats->tx.data_seq_err++;
+		this_cpu_inc(nic->drv_stats->tx_data_seq_err);
 		break;
 	case CQ_TX_ERROP_MEM_SEQUENCE_ERR:
-		stats->tx.mem_seq_err++;
+		this_cpu_inc(nic->drv_stats->tx_mem_seq_err);
 		break;
 	case CQ_TX_ERROP_LOCK_VIOL:
-		stats->tx.lock_viol++;
+		this_cpu_inc(nic->drv_stats->tx_lock_viol);
 		break;
 	case CQ_TX_ERROP_DATA_FAULT:
-		stats->tx.data_fault++;
+		this_cpu_inc(nic->drv_stats->tx_data_fault);
 		break;
 	case CQ_TX_ERROP_TSTMP_CONFLICT:
-		stats->tx.tstmp_conflict++;
+		this_cpu_inc(nic->drv_stats->tx_tstmp_conflict);
 		break;
 	case CQ_TX_ERROP_TSTMP_TIMEOUT:
-		stats->tx.tstmp_timeout++;
+		this_cpu_inc(nic->drv_stats->tx_tstmp_timeout);
 		break;
 	case CQ_TX_ERROP_MEM_FAULT:
-		stats->tx.mem_fault++;
+		this_cpu_inc(nic->drv_stats->tx_mem_fault);
 		break;
 	case CQ_TX_ERROP_CK_OVERLAP:
-		stats->tx.csum_overlap++;
+		this_cpu_inc(nic->drv_stats->tx_csum_overlap);
 		break;
 	case CQ_TX_ERROP_CK_OFLOW:
-		stats->tx.csum_overflow++;
+		this_cpu_inc(nic->drv_stats->tx_csum_overflow);
 		break;
 	}
 
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
index 8f4718e..2e3c940 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.h
@@ -172,26 +172,6 @@ enum CQ_TX_ERROP_E {
 	CQ_TX_ERROP_ENUM_LAST = 0x8a,
 };
 
-struct cmp_queue_stats {
-	struct tx_stats {
-		u64 good;
-		u64 desc_fault;
-		u64 hdr_cons_err;
-		u64 subdesc_err;
-		u64 max_size_exceeded;
-		u64 imm_size_oflow;
-		u64 data_seq_err;
-		u64 mem_seq_err;
-		u64 lock_viol;
-		u64 data_fault;
-		u64 tstmp_conflict;
-		u64 tstmp_timeout;
-		u64 mem_fault;
-		u64 csum_overlap;
-		u64 csum_overflow;
-	} tx;
-} ____cacheline_aligned_in_smp;
-
 enum RQ_SQ_STATS {
 	RQ_SQ_STATS_OCTS,
 	RQ_SQ_STATS_PKTS,
@@ -243,7 +223,6 @@ struct cmp_queue {
 	spinlock_t	lock;  /* lock to serialize processing CQEs */
 	void		*desc;
 	struct q_desc_mem   dmem;
-	struct cmp_queue_stats	stats;
 	int		irq;
 } ____cacheline_aligned_in_smp;
 
@@ -338,6 +317,5 @@ u64  nicvf_queue_reg_read(struct nicvf *nic,
 void nicvf_update_rq_stats(struct nicvf *nic, int rq_idx);
 void nicvf_update_sq_stats(struct nicvf *nic, int sq_idx);
 int nicvf_check_cqe_rx_errs(struct nicvf *nic, struct cqe_rx_t *cqe_rx);
-int nicvf_check_cqe_tx_errs(struct nicvf *nic,
-			    struct cmp_queue *cq, struct cqe_send_t *cqe_tx);
+int nicvf_check_cqe_tx_errs(struct nicvf *nic, struct cqe_send_t *cqe_tx);
 #endif /* NICVF_QUEUES_H */
-- 
2.7.4

^ permalink raw reply related

* Re: [CRIU] [PATCH net-next] tcp: allow to enable the repair mode for non-listening sockets
From: Pavel Emelyanov @ 2016-11-15 12:08 UTC (permalink / raw)
  To: Andrei Vagin, David S. Miller
  Cc: Hideaki YOSHIFUJI, netdev, James Morris, linux-kernel, criu,
	Alexey Kuznetsov, Patrick McHardy
In-Reply-To: <1479176114-12658-1-git-send-email-avagin@openvz.org>

On 11/15/2016 05:15 AM, Andrei Vagin wrote:
> The repair mode is used to get and restore sequence numbers and
> data from queues. It used to checkpoint/restore connections.
> 
> Currently the repair mode can be enabled for sockets in the established
> and closed states, but for other states we have to dump the same socket
> properties, so lets allow to enable repair mode for these sockets.
> 
> The repair mode reveals nothing more for sockets in other states.
> 
> Signed-off-by: Andrei Vagin <avagin@openvz.org>

Acked-by: Pavel Emelyanov <xemul@openvz.org>

> ---
>  net/ipv4/tcp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 3251fe7..a2a3a8c 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2302,7 +2302,7 @@ EXPORT_SYMBOL(tcp_disconnect);
>  static inline bool tcp_can_repair_sock(const struct sock *sk)
>  {
>  	return ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN) &&
> -		((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_ESTABLISHED));
> +		(sk->sk_state != TCP_LISTEN);
>  }
>  
>  static int tcp_repair_set_window(struct tcp_sock *tp, char __user *optbuf, int len)
> 

^ permalink raw reply

* [PATCH v2 3/5] net: thunderx: Fix configuration of L3/L4 length checking
From: sunil.kovvuri @ 2016-11-15 12:07 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, linux-arm-kernel, Sunil Goutham

From: Sunil Goutham <sgoutham@cavium.com>

This patch fixes enabling of HW verification of L3/L4 length and
TCP/UDP checksum which is currently being cleared. Also fixed VLAN
stripping config which is being cleared when multiqset is enabled.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
---
 drivers/net/ethernet/cavium/thunder/nicvf_queues.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
index f0e0ca6..f914eef 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
@@ -538,9 +538,12 @@ static void nicvf_rcv_queue_config(struct nicvf *nic, struct queue_set *qs,
 	mbx.rq.cfg = (1ULL << 62) | (RQ_CQ_DROP << 8);
 	nicvf_send_msg_to_pf(nic, &mbx);
 
-	nicvf_queue_reg_write(nic, NIC_QSET_RQ_GEN_CFG, 0, 0x00);
-	if (!nic->sqs_mode)
+	if (!nic->sqs_mode && (qidx == 0)) {
+		/* Enable checking L3/L4 length and TCP/UDP checksums */
+		nicvf_queue_reg_write(nic, NIC_QSET_RQ_GEN_CFG, 0,
+				      (BIT(24) | BIT(23) | BIT(21)));
 		nicvf_config_vlan_stripping(nic, nic->netdev->features);
+	}
 
 	/* Enable Receive queue */
 	memset(&rq_cfg, 0, sizeof(struct rq_cfg));
-- 
2.7.4

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox