Netdev List
 help / color / mirror / Atom feed
* [PATCH net-next] tcp: ignore rcv_rtt sample with old ts ecr value
From: Wei Wang @ 2018-06-20  4:42 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: Eric Dumazet, Neal Cardwell, Wei Wang

From: Wei Wang <weiwan@google.com>

When receiving multiple packets with the same ts ecr value, only try
to compute rcv_rtt sample with the earliest received packet.
This is because the rcv_rtt calculated by later received packets
could possibly include long idle time or other types of delay.
For example:
(1) server sends last packet of reply with TS val V1
(2) client ACKs last packet of reply with TS ecr V1
(3) long idle time passes
(4) client sends next request data packet with TS ecr V1 (again!)
At this time, the rcv_rtt computed on server with TS ecr V1 will be
inflated with the idle time and should get ignored.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/tcp.h  |  1 +
 net/ipv4/tcp.c       |  1 +
 net/ipv4/tcp_input.c | 14 +++++++++++---
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 72705eaf4b84..3dbea6610304 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -350,6 +350,7 @@ struct tcp_sock {
 #endif
 
 /* Receiver side RTT estimation */
+	u32 rcv_rtt_last_tsecr;
 	struct {
 		u32	rtt_us;
 		u32	seq;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 141acd92e58a..47c45d5be9f9 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2563,6 +2563,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	sk->sk_shutdown = 0;
 	sock_reset_flag(sk, SOCK_DONE);
 	tp->srtt_us = 0;
+	tp->rcv_rtt_last_tsecr = 0;
 	tp->write_seq += tp->max_window + 2;
 	if (tp->write_seq == 0)
 		tp->write_seq = 1;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 355d3dffd021..76ca88f63b70 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -582,9 +582,12 @@ static inline void tcp_rcv_rtt_measure_ts(struct sock *sk,
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	if (tp->rx_opt.rcv_tsecr &&
-	    (TCP_SKB_CB(skb)->end_seq -
-	     TCP_SKB_CB(skb)->seq >= inet_csk(sk)->icsk_ack.rcv_mss)) {
+	if (tp->rx_opt.rcv_tsecr == tp->rcv_rtt_last_tsecr)
+		return;
+	tp->rcv_rtt_last_tsecr = tp->rx_opt.rcv_tsecr;
+
+	if (TCP_SKB_CB(skb)->end_seq -
+	    TCP_SKB_CB(skb)->seq >= inet_csk(sk)->icsk_ack.rcv_mss) {
 		u32 delta = tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr;
 		u32 delta_us;
 
@@ -5475,6 +5478,11 @@ void tcp_rcv_established(struct sock *sk, struct sk_buff *skb)
 				tcp_ack(sk, skb, 0);
 				__kfree_skb(skb);
 				tcp_data_snd_check(sk);
+				/* When receiving pure ack in fast path, update
+				 * last ts ecr directly instead of calling
+				 * tcp_rcv_rtt_measure_ts()
+				 */
+				tp->rcv_rtt_last_tsecr = tp->rx_opt.rcv_tsecr;
 				return;
 			} else { /* Header too small */
 				TCP_INC_STATS(sock_net(sk), TCP_MIB_INERRS);
-- 
2.18.0.rc1.244.gcf134e6275-goog

^ permalink raw reply related

* Re: [PATCH net] ipvlan: call dev_change_flags when reset ipvlan mode
From: Hangbin Liu @ 2018-06-20  3:22 UTC (permalink / raw)
  To: Cong Wang
  Cc: Linux Kernel Network Developers, Stefano Brivio, Paolo Abeni,
	David Miller, Mahesh Bandewar
In-Reply-To: <CAM_iQpXeuU=Cxons5=centGNJzm0OaNU3Jj5hE91hvJH0o2-Eg@mail.gmail.com>

On Tue, Jun 19, 2018 at 02:10:18PM -0700, Cong Wang wrote:
> On Mon, Jun 18, 2018 at 7:04 AM, Hangbin Liu <liuhangbin@gmail.com> wrote:
> > @@ -94,10 +95,13 @@ static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval)
> >                         mdev->l3mdev_ops = NULL;
> >                 }
> >                 list_for_each_entry(ipvlan, &port->ipvlans, pnode) {
> > +                       flags = ipvlan->dev->flags;
> >                         if (nval == IPVLAN_MODE_L3 || nval == IPVLAN_MODE_L3S)
> > -                               ipvlan->dev->flags |= IFF_NOARP;
> > +                               dev_change_flags(ipvlan->dev,
> > +                                                flags | IFF_NOARP);
> >                         else
> > -                               ipvlan->dev->flags &= ~IFF_NOARP;
> > +                               dev_change_flags(ipvlan->dev,
> > +                                                flags & ~IFF_NOARP);
> 
> You need to check the return value of dev_change_flags().

Hi Wang Cong,

The only case dev_change_flags() return an err is when we change IFF_UP flag.
Since we only set/reset IFF_NOARP, do you think we still need to check the
return value?

Thanks
Hangbin

^ permalink raw reply

* Re: [PATCH] PCI: allow drivers to limit the number of VFs to 0
From: Jakub Kicinski @ 2018-06-20  2:56 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, linux-pci, netdev, Sathya Perla, Felix Manlunas,
	alexander.duyck, john.fastabend, Jacob Keller, Donald Dutile,
	oss-drivers, Christoph Hellwig
In-Reply-To: <20180619213715.GC33049@bhelgaas-glaptop.roam.corp.google.com>

On Tue, 19 Jun 2018 16:37:15 -0500, Bjorn Helgaas wrote:
> On Fri, May 25, 2018 at 09:02:23AM -0500, Bjorn Helgaas wrote:
> > On Thu, May 24, 2018 at 06:20:15PM -0700, Jakub Kicinski wrote:  
> > > Hi Bjorn!
> > > 
> > > On Thu, 24 May 2018 18:57:48 -0500, Bjorn Helgaas wrote:  
> > > > On Mon, Apr 02, 2018 at 03:46:52PM -0700, Jakub Kicinski wrote:  
> > > > > Some user space depends on enabling sriov_totalvfs number of VFs
> > > > > to not fail, e.g.:
> > > > > 
> > > > > $ cat .../sriov_totalvfs > .../sriov_numvfs
> > > > > 
> > > > > For devices which VF support depends on loaded FW we have the
> > > > > pci_sriov_{g,s}et_totalvfs() API.  However, this API uses 0 as
> > > > > a special "unset" value, meaning drivers can't limit sriov_totalvfs
> > > > > to 0.  Remove the special values completely and simply initialize
> > > > > driver_max_VFs to total_VFs.  Then always use driver_max_VFs.
> > > > > Add a helper for drivers to reset the VF limit back to total.    
> > > > 
> > > > I still can't really make sense out of the changelog.
> > > >
> > > > I think part of the reason it's confusing is because there are two
> > > > things going on:
> > > > 
> > > >   1) You want this:
> > > >   
> > > >        pci_sriov_set_totalvfs(dev, 0);
> > > >        x = pci_sriov_get_totalvfs(dev) 
> > > > 
> > > >      to return 0 instead of total_VFs.  That seems to connect with
> > > >      your subject line.  It means "sriov_totalvfs" in sysfs could be
> > > >      0, but I don't know how that is useful (I'm sure it is; just
> > > >      educate me :))  
> > > 
> > > Let me just quote the bug report that got filed on our internal bug
> > > tracker :)
> > > 
> > >   When testing Juju Openstack with Ubuntu 18.04, enabling SR-IOV causes
> > >   errors because Juju gets the sriov_totalvfs for SR-IOV-capable device
> > >   then tries to set that as the sriov_numvfs parameter.
> > > 
> > >   For SR-IOV incapable FW, the sriov_totalvfs parameter should be 0, 
> > >   but it's set to max.  When FW is switched to flower*, the correct 
> > >   sriov_totalvfs value is presented.
> > > 
> > > * flower is a project name  
> > 
> > From the point of view of the PCI core (which knows nothing about
> > device firmware and relies on the architected config space described
> > by the PCIe spec), this sounds like an erratum: with some firmware
> > installed, the device is not capable of SR-IOV, but still advertises
> > an SR-IOV capability with "TotalVFs > 0".
> > 
> > Regardless of whether that's an erratum, we do allow PF drivers to use
> > pci_sriov_set_totalvfs() to limit the number of VFs that may be
> > enabled by writing to the PF's "sriov_numvfs" sysfs file.
> > 
> > But the current implementation does not allow a PF driver to limit VFs
> > to 0, and that does seem nonsensical.
> >   
> > > My understanding is OpenStack uses sriov_totalvfs to determine how many
> > > VFs can be enabled, looks like this is the code:
> > > 
> > > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n464
> > >   
> > > >   2) You're adding the pci_sriov_reset_totalvfs() interface.  I'm not
> > > >      sure what you intend for this.  Is *every* driver supposed to
> > > >      call it in .remove()?  Could/should this be done in the core
> > > >      somehow instead of depending on every driver?  
> > > 
> > > Good question, I was just thinking yesterday we may want to call it
> > > from the core, but I don't think it's strictly necessary nor always
> > > sufficient (we may reload FW without re-probing).
> > > 
> > > We have a device which supports different number of VFs based on the FW
> > > loaded.  Some legacy FWs does not inform the driver how many VFs it can
> > > support, because it supports max.  So the flow in our driver is this:
> > > 
> > > load_fw(dev);
> > > ...
> > > max_vfs = ask_fw_for_max_vfs(dev);
> > > if (max_vfs >= 0)
> > > 	return pci_sriov_set_totalvfs(dev, max_vfs);
> > > else /* FW didn't tell us, assume max */
> > > 	return pci_sriov_reset_totalvfs(dev); 
> > > 
> > > We also reset the max on device remove, but that's not strictly
> > > necessary.
> > > 
> > > Other users of pci_sriov_set_totalvfs() always know the value to set
> > > the total to (either always get it from FW or it's a constant).
> > > 
> > > If you prefer we can work out the correct max for those legacy cases in
> > > the driver as well, although it seemed cleaner to just ask the core,
> > > since it already has total_VFs value handy :)
> > >   
> > > > I'm also having a hard time connecting your user-space command example
> > > > with the rest of this.  Maybe it will make more sense to me tomorrow
> > > > after some coffee.  
> > > 
> > > OpenStack assumes it will always be able to set sriov_numvfs to
> > > sriov_totalvfs, see this 'if':
> > > 
> > > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n512  
> > 
> > Thanks for educating me.  I think there are two issues here that we
> > can separate.  I extracted the patch below for the first.
> > 
> > The second is the question of resetting driver_max_VFs.  I think we
> > currently have a general issue in the core:
> > 
> >   - load PF driver 1
> >   - driver calls pci_sriov_set_totalvfs() to reduce driver_max_VFs
> >   - unload PF driver 1
> >   - load PF driver 2
> > 
> > Now driver_max_VFs is still stuck at the lower value set by driver 1.
> > I don't think that's the way this should work.
> > 
> > I guess this is partly a consequence of setting driver_max_VFs in
> > sriov_init(), which is called before driver attach and should only
> > depend on hardware characteristics, so it is related to the patch
> > below.  But I think we should fix it in general, not just for
> > netronome.  
> 
> Hi Jakub, the patch below is in v4.18-rc1 as 8d85a7a4f2c9 ("PCI/IOV:
> Allow PF drivers to limit total_VFs to 0").  If there's more we need
> to do here, would you mind rebasing what's left to v4.18-rc1 and
> reposting it?

Hi Bjorn!

Thanks a lot for looking into this!  My understanding is that we have
two ways forward:
 - add a pci_sriov_reset_totalvfs() helper for drivers to call;
 - make the core reset the totalVFs after driver is detached.

IMHO second option is better.  I went ahead and posted:

https://patchwork.ozlabs.org/patch/931210/

This works very well for nfp driver (modulo minor clean ups but I'd
rather route those via networking trees to avoid conflicts).

> > commit 4a338bc6f94b9ad824ac944f5dfc249d6838719c
> > Author: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Date:   Fri May 25 08:18:34 2018 -0500
> > 
> >     PCI/IOV: Allow PF drivers to limit total_VFs to 0
> >     
> >     Some SR-IOV PF drivers implement .sriov_configure(), which allows
> >     user-space to enable VFs by writing the desired number of VFs to the sysfs
> >     "sriov_numvfs" file (see sriov_numvfs_store()).
> >     
> >     The PCI core limits the number of VFs to the TotalVFs advertised by the
> >     device in its SR-IOV capability.  The PF driver can limit the number of VFs
> >     to even fewer (it may have pre-allocated data structures or knowledge of
> >     device limitations) by calling pci_sriov_set_totalvfs(), but previously it
> >     could not limit the VFs to 0.
> >     
> >     Change pci_sriov_get_totalvfs() so it always respects the VF limit imposed
> >     by the PF driver, even if the limit is 0.
> >     
> >     This sequence:
> >     
> >       pci_sriov_set_totalvfs(dev, 0);
> >       x = pci_sriov_get_totalvfs(dev);
> >     
> >     previously set "x" to TotalVFs from the SR-IOV capability.  Now it will set
> >     "x" to 0.
> >     
> >     Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> >     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > 
> > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > index 192b82898a38..d0d73dbbd5ca 100644
> > --- a/drivers/pci/iov.c
> > +++ b/drivers/pci/iov.c
> > @@ -469,6 +469,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
> >  	iov->nres = nres;
> >  	iov->ctrl = ctrl;
> >  	iov->total_VFs = total;
> > +	iov->driver_max_VFs = total;
> >  	pci_read_config_word(dev, pos + PCI_SRIOV_VF_DID, &iov->vf_device);
> >  	iov->pgsz = pgsz;
> >  	iov->self = dev;
> > @@ -827,10 +828,7 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
> >  	if (!dev->is_physfn)
> >  		return 0;
> >  
> > -	if (dev->sriov->driver_max_VFs)
> > -		return dev->sriov->driver_max_VFs;
> > -
> > -	return dev->sriov->total_VFs;
> > +	return dev->sriov->driver_max_VFs;
> >  }
> >  EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
> >    

^ permalink raw reply

* Re: [PATCH v2] net: ethernet: stmmac: dwmac-rk: Add GMAC support for PX30
From: David Wu @ 2018-06-20  2:40 UTC (permalink / raw)
  To: Heiko Stübner
  Cc: davem, robh+dt, mark.rutland, huangtao, netdev, linux-arm-kernel,
	linux-rockchip, linux-kernel, 张晴
In-Reply-To: <2582999.2hZx6CH9S6@diego>

Hi Heiko,

在 2018年06月14日 16:30, Heiko Stübner 写道:
> Am Donnerstag, 14. Juni 2018, 10:14:31 CEST schrieb David Wu:
>> Hi Heiko,
>>
>> 在 2018年06月14日 15:54, Heiko Stübner 写道:
>>> I don't see that new clock documented in the dt-binding.
>>> Also, which clock from the clock-controller does this connect to?
>>
>> The clock is the "SCLK_GMAC_RMII" at the clock-controller, which could
>> be set rate by the link speed.
> 
> Hmm, while these huge number of clocks are somewhat strange,
> shouldn't it be named something with _rmii instead of _speed then?

Okay, it is better to be named _speed.

> 
> Also, I don't see any clk_enable action for that new clock, so you could
> end up with being off?

The new speed is the parent of the clk_tx_rx, to enable/disable 
clk_tx_rx, the new clock would be also enabled/disabled.

> 
> And someone could convert the driver to use the new clk-bulk APIs [0],
> so the large number of clk_prepare_enable calls would be a bit
> trimmed down.
> 
> 
> Heiko
> 
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/clk/clk-bulk.c
> 
> 
> 
> 
> 

^ permalink raw reply

* [PATCH net] net: sungem: fix rx checksum support
From: Eric Dumazet @ 2018-06-20  2:18 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Meelis Roos, Mathieu Malaterre, Andreas Schwab,
	Eric Dumazet, Eric Dumazet

After commit 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE
are friends"), sungem owners reported the infamous "eth0: hw csum failure"
message.

CHECKSUM_COMPLETE has in fact never worked for this driver, but this
was masked by the fact that upper stacks had to strip the FCS, and
therefore skb->ip_summed was set back to CHECKSUM_NONE before
my recent change.

Driver configures a number of bytes to skip when the chip computes
the checksum, and for some reason only half of the Ethernet header
was skipped.

Then a second problem is that we should strip the FCS by default,
unless the driver is updated to eventually support NETIF_F_RXFCS in
the future.

Finally, a driver should check if NETIF_F_RXCSUM feature is enabled
or not, so that the admin can turn off rx checksum if wanted.

Many thanks to Andreas Schwab and Mathieu Malaterre for their
help in debugging this issue.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Mathieu Malaterre <malat@debian.org>
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Tested-by: Andreas Schwab <schwab@linux-m68k.org>
---
 drivers/net/ethernet/sun/sungem.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..b9221fc1674dfa0ef17a43f8ff86d700a1ae514f 100644
--- a/drivers/net/ethernet/sun/sungem.c
+++ b/drivers/net/ethernet/sun/sungem.c
@@ -60,8 +60,7 @@
 #include <linux/sungem_phy.h>
 #include "sungem.h"
 
-/* Stripping FCS is causing problems, disabled for now */
-#undef STRIP_FCS
+#define STRIP_FCS
 
 #define DEFAULT_MSG	(NETIF_MSG_DRV		| \
 			 NETIF_MSG_PROBE	| \
@@ -435,7 +434,7 @@ static int gem_rxmac_reset(struct gem *gp)
 	writel(desc_dma & 0xffffffff, gp->regs + RXDMA_DBLOW);
 	writel(RX_RING_SIZE - 4, gp->regs + RXDMA_KICK);
 	val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
-	       ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
+	       (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
 	writel(val, gp->regs + RXDMA_CFG);
 	if (readl(gp->regs + GREG_BIFCFG) & GREG_BIFCFG_M66EN)
 		writel(((5 & RXDMA_BLANK_IPKTS) |
@@ -760,7 +759,6 @@ static int gem_rx(struct gem *gp, int work_to_do)
 	struct net_device *dev = gp->dev;
 	int entry, drops, work_done = 0;
 	u32 done;
-	__sum16 csum;
 
 	if (netif_msg_rx_status(gp))
 		printk(KERN_DEBUG "%s: rx interrupt, done: %d, rx_new: %d\n",
@@ -855,9 +853,13 @@ static int gem_rx(struct gem *gp, int work_to_do)
 			skb = copy_skb;
 		}
 
-		csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
-		skb->csum = csum_unfold(csum);
-		skb->ip_summed = CHECKSUM_COMPLETE;
+		if (likely(dev->features & NETIF_F_RXCSUM)) {
+			__sum16 csum;
+
+			csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
+			skb->csum = csum_unfold(csum);
+			skb->ip_summed = CHECKSUM_COMPLETE;
+		}
 		skb->protocol = eth_type_trans(skb, gp->dev);
 
 		napi_gro_receive(&gp->napi, skb);
@@ -1761,7 +1763,7 @@ static void gem_init_dma(struct gem *gp)
 	writel(0, gp->regs + TXDMA_KICK);
 
 	val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
-	       ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
+	       (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
 	writel(val, gp->regs + RXDMA_CFG);
 
 	writel(desc_dma >> 32, gp->regs + RXDMA_DBHI);
@@ -2985,8 +2987,8 @@ static int gem_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	pci_set_drvdata(pdev, dev);
 
 	/* We can do scatter/gather and HW checksum */
-	dev->hw_features = NETIF_F_SG | NETIF_F_HW_CSUM;
-	dev->features |= dev->hw_features | NETIF_F_RXCSUM;
+	dev->hw_features = NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM;
+	dev->features = dev->hw_features;
 	if (pci_using_dac)
 		dev->features |= NETIF_F_HIGHDMA;
 
-- 
2.18.0.rc1.244.gcf134e6275-goog

^ permalink raw reply related

* Re: [PATCH] selftests: net: add config fragments
From: Shannon Nelson @ 2018-06-20  1:42 UTC (permalink / raw)
  To: Anders Roxell, davem, shuah, fw; +Cc: netdev, linux-kselftest, linux-kernel
In-Reply-To: <20180619164111.30785-1-anders.roxell@linaro.org>

On 6/19/2018 9:41 AM, Anders Roxell wrote:
> Add fragments to pass bridge and vlan tests.
> 
> Fixes: 33b01b7b4f19 ("selftests: add rtnetlink test script")
> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
> ---
> 
> Hi,
> 
> net/rtnetlink.sh still fails on tc hbt hierarchy, addrlabel and ipsec:
> Error: Specified qdisc not found.
> RTNETLINK answers: No such file or directory
> Error: Parent Qdisc doesn't exists.
> We have an error talking to the kernel, -1
> Error: Parent Qdisc doesn't exists.
> We have an error talking to the kernel, -1
> Error: Parent Qdisc doesn't exists.
> We have an error talking to the kernel, -1
> Error: Parent Qdisc doesn't exists.
> We have an error talking to the kernel, -1
> Error: Parent Qdisc doesn't exists.
> We have an error talking to the kernel, -1
> Error: Parent Qdisc doesn't exists.
> We have an error talking to the kernel, -1
> Error: Invalid handle.
> FAIL: tc htb hierarchy
> 
> FAIL: ipv6 addrlabel
> 
> FAIL: can't add fou port 7777, skipping test
> RTNETLINK answers: Operation not supported
> FAIL: can't add macsec interface, skipping test
> RTNETLINK answers: Protocol not supported
> RTNETLINK answers: No such process
> RTNETLINK answers: No such process
> ./rtnetlink.sh: line 527:  5356 Terminated              ip x m >
> $tmpfile
> FAIL: ipsec
> 
> 
> I'm using iproute2 tag: 4.17 and tried the qdisc command from the
> function kci_test_tc in net/rtnetlink.sh:
> $ tc qdisc add dev lo root handle 1: htb
> Error: Specified qdisc not found.
> 
> For kci_test_addrlabel it fails on this row:
> ip addrlabel list |grep -q "prefix dead::/64 dev lo label 1"
> 
> Any idea why these three fails?


The "Terminated" line is there because "ip x m" had been put into the 
background, and at the end of the ipsec test it is killed.  I can try to 
play some games with exec and redirection to make that go away.

The "FAIL: ipsec" is partly because the test isn't smart enough to look 
to see if there is any offload actually available to test.  I'm working 
on a patch to netdevsim to add the ipsec-offload in order to have a 
better test.  And yes, this should say "ipsec-offload", not "ipsec".

I don't know about the qdisk or addrlabel issues.

Cheers,
sln


> 
> Cheers,
> Anders
> 
>   tools/testing/selftests/net/config | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config
> index 7ba089b33e8b..cd3a2f1545b5 100644
> --- a/tools/testing/selftests/net/config
> +++ b/tools/testing/selftests/net/config
> @@ -12,3 +12,5 @@ CONFIG_NET_IPVTI=y
>   CONFIG_INET6_XFRM_MODE_TUNNEL=y
>   CONFIG_IPV6_VTI=y
>   CONFIG_DUMMY=y
> +CONFIG_BRIDGE=y
> +CONFIG_VLAN_8021Q=y
> 

^ permalink raw reply

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
From: Andrew Collins @ 2018-06-20  0:56 UTC (permalink / raw)
  To: David Miller; +Cc: f.fainelli, pablo, netfilter-devel, netdev, Steffen Klassert
In-Reply-To: <20180614.165549.1790121217625223670.davem@davemloft.net>

On Thu, Jun 14, 2018 at 5:55 PM, David Miller <davem@davemloft.net> wrote:
> And guess what?  Then millions of possibilities would have been
> openned up, rather than just this one special case.
>
> So, I ask, please see the larger picture.

+cc netdev/etc

This is perhaps unrelated to the topic at hand, but as someone who's shipped
a bunch of devices over the years using the linux kernel forwarding path and
needs performance but wants to avoid moving to out of tree userspace offload
for all the reasons that you and many others have stated, is the long
term vision that the existing kernel forwarding path will transparently take
advantage of eBPF (ala bpfilter), or that users will write custom/individualized
eBPF forwarding paths for their usecases as necessary?

I (and I suspect many others) will start on the latter anyways, I'm just curious
whether it's desired/expected that such custom fastpath users will eventually
be rolled back into/replaced by a transparent upstream in-kernel equivalent.

^ permalink raw reply

* Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
From: Eric Dumazet @ 2018-06-20  0:25 UTC (permalink / raw)
  To: Saeed Mahameed, eric.dumazet@gmail.com, kafai@fb.com,
	Tariq Toukan
  Cc: netdev@vger.kernel.org, edumazet@google.com
In-Reply-To: <deda277de538f2083fb23479f26afa03ffc79644.camel@mellanox.com>



On 06/19/2018 11:05 AM, Saeed Mahameed wrote:

> this is only true for XDP setup, for non XDP max stride_size can only
> be around ~3k and only for mtu > ~6k
> 
> For XDP setup you suggested:
> -               priv->frag_info[0].frag_size = eff_mtu;
> +               priv->frag_info[0].frag_size = PAGE_SIZE;
> 
> currently the condition is:
> 
> release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;
> 
> so my solution and yours have the same problem you described above.
> 
> the problem is not with the initial values or with stride/farg size
> math, it just that in XDP we shouldn't reuse at ALL. I agree with you
> that we need to optimize and maybe for PAGE_SIZE > 8k we need to allow
> XDP setup to reuses. but for now there is a data corruption to handle.


Sure, we all agree there is a bug to fix.

The way you are fixing it is kind of illogical.

The NIC can use a frag if its _size_ is big enough to receive the frame.

The _stride_  is an abstraction created by the driver to report an estimation of the _truesize_,
or memory consumption, so that linux can better track overall memory usage.

For example, if MTU=1500, the size of the fragment is 1536 bytes, but since we can put only
2 fragments per 4KB page (on x86), we declare the _stride_ to be 2048 bytes.

Declaring that a final blob of a page, being 1600 bytes, not able to receive a frame because
_stride_ is 2048 is illogical and waste resources.

^ permalink raw reply

* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Eric Dumazet @ 2018-06-20  0:19 UTC (permalink / raw)
  To: Michael Ellerman, Eric Dumazet, Mathieu Malaterre,
	David S. Miller
  Cc: Eric Dumazet, linux-kernel, netdev
In-Reply-To: <87muvqwc06.fsf@concordia.ellerman.id.au>



On 06/19/2018 05:13 PM, Michael Ellerman wrote:

> Just so I'm clear, this turned out to be a driver/hw problem rather than
> the arch csum implementation?


Yes, that was a driver bug.

I will send an official patch to fix this.

You guys will have faster RX, since CHECKSUM_COMPLETE will finally be used up to TCP/UDP stacks.

^ permalink raw reply

* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Michael Ellerman @ 2018-06-20  0:13 UTC (permalink / raw)
  To: Eric Dumazet, Mathieu Malaterre, David S. Miller
  Cc: Eric Dumazet, linux-kernel, netdev
In-Reply-To: <fbb95c11-240c-1a11-0a62-0483908c577e@gmail.com>

Eric Dumazet <eric.dumazet@gmail.com> writes:

> On 06/15/2018 11:56 AM, Mathieu Malaterre wrote:
>> This reverts commit 88078d98d1bb085d72af8437707279e203524fa5.
>> 
>> It causes regressions for people using chips driven by the sungem
>> driver. Suspicion is that the skb->csum value isn't being adjusted
>> properly.
>> 
>> Symptoms as seen on G4+sungem are:
>> 
>> [   34.023281] eth0: hw csum failure
>> [   34.023438] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #2
>> [   34.023618] Call Trace:
>> [   34.023707] [dffedbd0] [c069ddac] __skb_checksum_complete+0xf0/0x108 (unreliable)
>> [   34.023948] [dffedbf0] [c0777a70] tcp_v4_rcv+0x604/0xe00
>> [   34.024118] [dffedc70] [c0731624] ip_local_deliver_finish+0xa8/0x3c4
>> [   34.024315] [dffedcb0] [c0732430] ip_local_deliver+0xf0/0x154
>> [   34.024493] [dffedcf0] [c07328dc] ip_rcv+0x448/0x774
>> [   34.024653] [dffedd50] [c06aeae0] __netif_receive_skb_core+0x5e8/0x1184
>> [   34.024857] [dffedde0] [c06bba20] napi_gro_receive+0x160/0x22c
>> [   34.025044] [dffede10] [e14b2590] gem_poll+0x7fc/0x1ac0 [sungem]
>> [   34.025228] [dffedee0] [c06bacf0] net_rx_action+0x34c/0x618
>> [   34.025402] [dffedf60] [c07fd27c] __do_softirq+0x16c/0x5f0
>> [   34.025575] [dffedfd0] [c0064c7c] irq_exit+0x110/0x1a8
>> [   34.025738] [dffedff0] [c0016170] call_do_irq+0x24/0x3c
>> [   34.025903] [c0cf7e80] [c0009a84] do_IRQ+0x98/0x1a0
>> [   34.026055] [c0cf7eb0] [c001b474] ret_from_except+0x0/0x14
>> [   34.026225] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
>>                    LR = arch_cpu_idle+0x30/0x78
>> [   34.026510] [c0cf7f70] [c0cf6000] 0xc0cf6000 (unreliable)
>> [   34.026682] [c0cf7f80] [c00a3868] do_idle+0xc4/0x158
>> [   34.026835] [c0cf7fb0] [c00a3ab0] cpu_startup_entry+0x20/0x28
>> [   34.027013] [c0cf7fc0] [c0998820] start_kernel+0x47c/0x490
>> [   34.027181] [c0cf7ff0] [00003444] 0x3444
>> 
>> See commit 7ce5a27f2ef8 ("Revert "net: Handle CHECKSUM_COMPLETE more
>> adequately in pskb_trim_rcsum()."") for previous reference.
>
> This fix seems to hide a bug in csum functions on this architecture.
>
> Or a bug on this NIC when receiving a small packet (less than 60 bytes).
> Maybe the padding bytes are not included in NIC provided csum, and not 0.

Just so I'm clear, this turned out to be a driver/hw problem rather than
the arch csum implementation?

cheers

^ permalink raw reply

* Re: [PATCH] bpfilter: ignore binary files
From: David Miller @ 2018-06-20  0:08 UTC (permalink / raw)
  To: mcroce; +Cc: netdev, ast
In-Reply-To: <20180619152136.2340-1-mcroce@redhat.com>

From: Matteo Croce <mcroce@redhat.com>
Date: Tue, 19 Jun 2018 17:21:36 +0200

> net/bpfilter/bpfilter_umh is a binary file generated when bpfilter is
> enabled, add it to .gitignore to avoid committing it.
> 
> Fixes: d2ba09c17a064 ("net: add skeleton of bpfilter kernel module")
> Signed-off-by: Matteo Croce <mcroce@redhat.com>

Applied.

^ permalink raw reply

* Re: [PATCH] bpfilter: fix build error
From: David Miller @ 2018-06-20  0:08 UTC (permalink / raw)
  To: mcroce; +Cc: netdev, ast
In-Reply-To: <20180619151620.1912-1-mcroce@redhat.com>

From: Matteo Croce <mcroce@redhat.com>
Date: Tue, 19 Jun 2018 17:16:20 +0200

> bpfilter Makefile assumes that the system locale is en_US, and the
> parsing of objdump output fails.
> Set LC_ALL=C and, while at it, rewrite the objdump parsing so it spawns
> only 2 processes instead of 7.
> 
> Fixes: d2ba09c17a064 ("net: add skeleton of bpfilter kernel module")
> Signed-off-by: Matteo Croce <mcroce@redhat.com>

Applied.

^ permalink raw reply

* Re: [PATCH] net/usb/drivers: Remove useless hrtimer_active check
From: David Miller @ 2018-06-20  0:07 UTC (permalink / raw)
  To: daniel.lezcano; +Cc: oliver, linux-usb, netdev, linux-kernel
In-Reply-To: <1529417671-21384-1-git-send-email-daniel.lezcano@linaro.org>

From: Daniel Lezcano <daniel.lezcano@linaro.org>
Date: Tue, 19 Jun 2018 16:14:30 +0200

> The code does:
> 
>  if (hrtimer_active(&t))
>     hrtimer_cancel(&t);
> 
> However, hrtimer_cancel() checks if the timer is active, so the
> test above is pointless.
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

Applied.

^ permalink raw reply

* Re: [PATCH net] net/sched: act_ife: preserve the action control in case of error
From: David Miller @ 2018-06-20  0:06 UTC (permalink / raw)
  To: dcaratti; +Cc: jhs, xiyou.wangcong, netdev
In-Reply-To: <717d0682de4fb24ed6818f2bf264202d22e1e8be.1529415179.git.dcaratti@redhat.com>

From: Davide Caratti <dcaratti@redhat.com>
Date: Tue, 19 Jun 2018 15:45:50 +0200

> in the following script
> 
>  # tc actions add action ife encode allow prio pass index 42
>  # tc actions replace action ife encode allow tcindex drop index 42
> 
> the action control should remain equal to 'pass', if the kernel failed
> to replace the TC action. Pospone the assignment of the action control,
> to ensure it is not overwritten in the error path of tcf_ife_init().
> 
> Fixes: ef6980b6becb ("introduce IFE action")
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH net] net/sched: act_ife: fix recursive lock and idr leak
From: David Miller @ 2018-06-20  0:06 UTC (permalink / raw)
  To: dcaratti; +Cc: jhs, xiyou.wangcong, netdev
In-Reply-To: <40b45f70ef007b222b36a4676174e597c41d697f.1529415169.git.dcaratti@redhat.com>

From: Davide Caratti <dcaratti@redhat.com>
Date: Tue, 19 Jun 2018 15:39:46 +0200

> a recursive lock warning [1] can be observed with the following script,
 ...
> in case the kernel was unable to run the last command (e.g. because of
> the impossibility to load 'act_meta_skbtcindex'). For a similar reason,
> the kernel can leak idr in the error path of tcf_ife_init(), because
> tcf_idr_release() is not called after successful idr reservation:
 ...
> Since tcfa_lock is already taken when the action is being edited, a call
> to tcf_idr_release() wrongly makes tcf_idr_cleanup() take the same lock
> again. On the other hand, tcf_idr_release() needs to be called in the
> error path of tcf_ife_init(), to undo the last tcf_idr_create() invocation.
> Fix both problems in tcf_ife_init().
> Since the cleanup() routine can now be called when ife->params is NULL,
> also add a NULL pointer check to avoid calling kfree_rcu(NULL, rcu).
> 
>  [1]
 ...
> Fixes: 4e8c86155010 ("net sched: net sched: ife action fix late binding")
> Fixes: ef6980b6becb ("introduce IFE action")
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

Applied and queued up for -stable.

^ permalink raw reply

* [PATCH] 6lowpan: iphc: reset mac_header after decompress to fix panic
From: Michael Scott @ 2018-06-19 23:44 UTC (permalink / raw)
  Cc: Michael Scott, Alexander Aring, Jukka Rissanen, David S. Miller,
	linux-bluetooth, linux-wpan, netdev, linux-kernel

After decompression of 6lowpan socket data, an IPv6 header is inserted
before the existing socket payload.  After this, we reset the
network_header value of the skb to account for the difference in payload
size from prior to decompression + the addition of the IPv6 header.

However, we fail to reset the mac_header value.

Leaving the mac_header value untouched here, can cause a calculation
error in net/packet/af_packet.c packet_rcv() function when an
AF_PACKET socket is opened in SOCK_RAW mode for use on a 6lowpan
interface.

On line 2088, the data pointer is moved backward by the value returned
from skb_mac_header().  If skb->data is adjusted so that it is before
the skb->head pointer (which can happen when an old value of mac_header
is left in place) the kernel generates a panic in net/core/skbuff.c
line 1717.

This panic can be generated by BLE 6lowpan interfaces (such as bt0) and
802.15.4 interfaces (such as lowpan0) as they both use the same 6lowpan
sources for compression and decompression.

Signed-off-by: Michael Scott <michael@opensourcefoundries.com>
---
 net/6lowpan/iphc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/6lowpan/iphc.c b/net/6lowpan/iphc.c
index 6b1042e21656..52fad5dad9f7 100644
--- a/net/6lowpan/iphc.c
+++ b/net/6lowpan/iphc.c
@@ -770,6 +770,7 @@ int lowpan_header_decompress(struct sk_buff *skb, const struct net_device *dev,
 		hdr.hop_limit, &hdr.daddr);
 
 	skb_push(skb, sizeof(hdr));
+	skb_reset_mac_header(skb);
 	skb_reset_network_header(skb);
 	skb_copy_to_linear_data(skb, &hdr, sizeof(hdr));
 
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH] net: propagate dev_get_valid_name return code
From: David Miller @ 2018-06-19 23:38 UTC (permalink / raw)
  To: lirongqing; +Cc: netdev
In-Reply-To: <1529400197-24165-1-git-send-email-lirongqing@baidu.com>

From: Li RongQing <lirongqing@baidu.com>
Date: Tue, 19 Jun 2018 17:23:17 +0800

> if dev_get_valid_name failed, propagate its return code
> 
> and remove the setting err to ENODEV, it will be set to
> 0 again before dev_change_net_namespace exits.
> 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH] [net-next, v2] tcp: use monotonic timestamps for PAWS
From: Eric Dumazet @ 2018-06-19 23:38 UTC (permalink / raw)
  To: Arnd Bergmann, Herbert Xu, David S. Miller, Eric Dumazet,
	Alexey Kuznetsov, Hideaki YOSHIFUJI
  Cc: y2038, Harsh Jain, Atul Gupta, Michael Werner,
	Gustavo A. R. Silva, Yuchung Cheng, Neal Cardwell,
	Soheil Hassas Yeganeh, Florian Westphal, Christoph Paasch,
	Lawrence Brakmo, linux-crypto, linux-kernel, netdev
In-Reply-To: <20180619123724.2535981-1-arnd@arndb.de>



On 06/19/2018 05:36 AM, Arnd Bergmann wrote:
> Using get_seconds() for timestamps is deprecated since it can lead
> to overflows on 32-bit systems. While the interface generally doesn't
> overflow until year 2106, the specific implementation of the TCP PAWS
> algorithm breaks in 2038 when the intermediate signed 32-bit timestamps
> overflow.
> 
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> v2: use time_before32()/time_after32() everywhere as suggested
>     Eric Dumazet


Thanks a lot Arnd.

Signed-off-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* Re: Incomplete fix for be9c798 / 7b2ee50 hv_netvsc: common detach logic?
From: Stephen Hemminger @ 2018-06-19 23:14 UTC (permalink / raw)
  To: Greg KH; +Cc: devel, Stephen Hemminger, Thomas Walker, netdev
In-Reply-To: <20180619230150.GA24530@kroah.com>

On Wed, 20 Jun 2018 08:01:50 +0900
Greg KH <greg@kroah.com> wrote:

> On Tue, Jun 19, 2018 at 03:19:41PM -0400, Thomas Walker wrote:
> > Upon updating some internal kernels from 4.14.18 to 4.14.49, our hyper-v guests failed to bring up network interfaces on boot, logging "A link change request failed with some changes committed already. Interface eth0 may have been left with an inconsistent configuration, please check."  Running 'ifconfig eth0 up' appears to fix the problem temporarily so I went about bisecting which landed on:
> > 
> > commit be9c798d0d13ae609a91177323ac816545c39d28
> > Author: Stephen Hemminger <stephen@networkplumber.org>
> > Date:   Mon May 14 15:32:18 2018 -0700
> > 
> >     hv_netvsc: common detach logic
> >     
> >     [ Commit 7b2ee50c0cd513a176a26a71f2989facdd75bfea upstream. ]
> >     
> >     Make common function for detaching internals of device
> >     during changes to MTU and RSS. Make sure no more packets
> >     are transmitted and all packets have been received before
> >     doing device teardown.
> > 
> >     Change the wait logic to be common and use usleep_range().
> > 
> >     Changes transmit enabling logic so that transmit queues are disabled
> >     during the period when lower device is being changed. And enabled
> >     only after sub channels are setup. This avoids issue where it could
> >     be that a packet was being sent while subchannel was not initialized.
> > 
> >     Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
> >     Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> >     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > 
> > 
> > The removal of which does indeed fix the problem.  That led me to:
> > 
> > commit 52acf73b6e9a6962045feb2ba5a8921da2201915
> > Author: Dexuan Cui <decui@microsoft.com>
> > Date:   Wed Jun 6 21:32:51 2018 +0000
> > 
> >     hv_netvsc: Fix a network regression after ifdown/ifup
> >     
> >     Recently people reported the NIC stops working after
> >     "ifdown eth0; ifup eth0". It turns out in this case the TX queues are not
> >     enabled, after the refactoring of the common detach logic: when the NIC
> >     has sub-channels, usually we enable all the TX queues after all
> >     sub-channels are set up: see rndis_set_subchannel() ->
> >     netif_device_attach(), but in the case of "ifdown eth0; ifup eth0" where
> >     the number of channels doesn't change, we also must make sure the TX queues
> >     are enabled. The patch fixes the regression.
> >     
> >     Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
> >     Signed-off-by: Dexuan Cui <decui@microsoft.com>
> >     Cc: Stephen Hemminger <sthemmin@microsoft.com>
> >     Cc: K. Y. Srinivasan <kys@microsoft.com>
> >     Cc: Haiyang Zhang <haiyangz@microsoft.com>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> > 
> > Which sounded very promising, but does not seem to fully fix the issue.  Doing some more digging, I was able to determine that the message coincides with 'ip link set dev eth0 mtu 1300 up' very shortly (>~1 second) after the hv_netvsc driver loads.  If I delay the mtu change until well after the driver loads, everything works fine.  If I unload hv_netvsc and then reload it and apply the mtu change immediately, the failure re-occurs.  So something is racy here, and the above doesn't entirely address it.
> > 
> > I'm happy to test out any suggested patches and/or do additional debugging if anyone has any suggestions.
> > 
> > (oh, and I did also try 4.18-rc1 and the problem still persists)  
> 
> Always cc: the authors of the patch you are having problems with, along
> with the mailing list for the networking subsystem, so that the right
> people know to look at this.

How are you changing the MTU? and starting the device.
When MTU changes the device has to reconnect with the host. This takes a small amount of time
and no changes to device state are possible then.

If MTU change happens and at the same time some other script tries to bring up the connection,
then that script will get an error.

^ permalink raw reply

* Re: bpfilter compile failure on parisc
From: John David Anglin @ 2018-06-19 23:05 UTC (permalink / raw)
  To: Meelis Roos, netdev, linux-parisc
In-Reply-To: <alpine.LRH.2.21.1806192036240.10716@math.ut.ee>

On 2018-06-19 1:38 PM, Meelis Roos wrote:
> Tried enabling bpfilter on parisc, got this:
>
>    HOSTCC  net/bpfilter/main.o
> net/bpfilter/main.c:3:21: fatal error: sys/uio.h: No such file or directory
>   #include <sys/uio.h>
Probably has something to do with the include directories searched by 
HOSTCC.  The location of
the file is "/usr/include/hppa-linux-gnu/sys/uio.h".

Dave

-- 
John David Anglin  dave.anglin@bell.net

^ permalink raw reply

* davinci_emac failures in 4.18-rc1 on AM3517-EVM
From: Adam Ford @ 2018-06-19 23:11 UTC (permalink / raw)
  To: linux-omap, netdev

I am not sure if anyone else has seen this.  If not, I'll bisect, but
the AM3517 ethernet seems to have died, and it's throwing some errors

[    2.708933] davinci_mdio davinci_mdio.0: failed to get device clock
[    2.715363] davinci_mdio: probe of davinci_mdio.0 failed with error -2

[snip]

[    3.054552] davinci_emac davinci_emac.0: failed to get EMAC clock
[    3.061195] davinci_emac: probe of davinci_emac.0 failed with error -16


If no one has seen this, I'll look into it, but I didn't want to waste
time if someone is already aware of it.

adam

^ permalink raw reply

* Re: [PATCH net] enic: do not overwrite error code
From: David Miller @ 2018-06-19 23:10 UTC (permalink / raw)
  To: gvaradar; +Cc: netdev, ben.hutchings, benve, gregkh, alexander.levin
In-Reply-To: <20180618170105.1301-1-gvaradar@cisco.com>

From: Govindarajulu Varadarajan <gvaradar@cisco.com>
Date: Mon, 18 Jun 2018 10:01:05 -0700

> In failure path, we overwrite err to what vnic_rq_disable() returns. In
> case it returns 0, enic_open() returns success in case of error.
> 
> Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
> Fixes: e8588e268509 ("enic: enable rq before updating rq descriptors")
> Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: Incomplete fix for be9c798 / 7b2ee50 hv_netvsc: common detach logic?
From: Stephen Hemminger @ 2018-06-19 23:07 UTC (permalink / raw)
  To: Greg KH; +Cc: devel, Stephen Hemminger, Thomas Walker, netdev
In-Reply-To: <20180619230150.GA24530@kroah.com>

On Wed, 20 Jun 2018 08:01:50 +0900
Greg KH <greg@kroah.com> wrote:

> On Tue, Jun 19, 2018 at 03:19:41PM -0400, Thomas Walker wrote:
> > Upon updating some internal kernels from 4.14.18 to 4.14.49, our hyper-v guests failed to bring up network interfaces on boot, logging "A link change request failed with some changes committed already. Interface eth0 may have been left with an inconsistent configuration, please check."  Running 'ifconfig eth0 up' appears to fix the problem temporarily so I went about bisecting which landed on:
> > 
> > commit be9c798d0d13ae609a91177323ac816545c39d28
> > Author: Stephen Hemminger <stephen@networkplumber.org>
> > Date:   Mon May 14 15:32:18 2018 -0700
> > 
> >     hv_netvsc: common detach logic
> >     
> >     [ Commit 7b2ee50c0cd513a176a26a71f2989facdd75bfea upstream. ]
> >     
> >     Make common function for detaching internals of device
> >     during changes to MTU and RSS. Make sure no more packets
> >     are transmitted and all packets have been received before
> >     doing device teardown.
> > 
> >     Change the wait logic to be common and use usleep_range().
> > 
> >     Changes transmit enabling logic so that transmit queues are disabled
> >     during the period when lower device is being changed. And enabled
> >     only after sub channels are setup. This avoids issue where it could
> >     be that a packet was being sent while subchannel was not initialized.
> > 
> >     Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
> >     Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> >     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > 
> > 
> > The removal of which does indeed fix the problem.  That led me to:
> > 
> > commit 52acf73b6e9a6962045feb2ba5a8921da2201915
> > Author: Dexuan Cui <decui@microsoft.com>
> > Date:   Wed Jun 6 21:32:51 2018 +0000
> > 
> >     hv_netvsc: Fix a network regression after ifdown/ifup
> >     
> >     Recently people reported the NIC stops working after
> >     "ifdown eth0; ifup eth0". It turns out in this case the TX queues are not
> >     enabled, after the refactoring of the common detach logic: when the NIC
> >     has sub-channels, usually we enable all the TX queues after all
> >     sub-channels are set up: see rndis_set_subchannel() ->
> >     netif_device_attach(), but in the case of "ifdown eth0; ifup eth0" where
> >     the number of channels doesn't change, we also must make sure the TX queues
> >     are enabled. The patch fixes the regression.
> >     
> >     Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
> >     Signed-off-by: Dexuan Cui <decui@microsoft.com>
> >     Cc: Stephen Hemminger <sthemmin@microsoft.com>
> >     Cc: K. Y. Srinivasan <kys@microsoft.com>
> >     Cc: Haiyang Zhang <haiyangz@microsoft.com>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> > 
> > Which sounded very promising, but does not seem to fully fix the issue.  Doing some more digging, I was able to determine that the message coincides with 'ip link set dev eth0 mtu 1300 up' very shortly (>~1 second) after the hv_netvsc driver loads.  If I delay the mtu change until well after the driver loads, everything works fine.  If I unload hv_netvsc and then reload it and apply the mtu change immediately, the failure re-occurs.  So something is racy here, and the above doesn't entirely address it.
> > 
> > I'm happy to test out any suggested patches and/or do additional debugging if anyone has any suggestions.
> > 
> > (oh, and I did also try 4.18-rc1 and the problem still persists)  
> 
> Always cc: the authors of the patch you are having problems with, along
> with the mailing list for the networking subsystem, so that the right
> people know to look at this.

Let me take a look at it, and log it into the bug system.

^ permalink raw reply

* Re: [PATCH net] net/tcp: Fix socket lookups with SO_BINDTODEVICE
From: David Miller @ 2018-06-19 23:04 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, lberger, renato, dsahern
In-Reply-To: <20180618193037.3365-1-dsahern@kernel.org>

From: dsahern@kernel.org
Date: Mon, 18 Jun 2018 12:30:37 -0700

> From: David Ahern <dsahern@gmail.com>
> 
> Similar to 69678bcd4d2d ("udp: fix SO_BINDTODEVICE"), TCP socket lookups
> need to fail if dev_match is not true. Currently, a packet to a given port
> can match a socket bound to device when it should not. In the VRF case,
> this causes the lookup to hit a VRF socket and not a global socket
> resulting in a response trying to go through the VRF when it should it.
                                                                      ^^^

"not", I fixed this up for you.

> Fixes: 3fa6f616a7a4d ("net: ipv4: add second dif to inet socket lookups")
> Fixes: 4297a0ef08572 ("net: ipv6: add second dif to inet6 socket lookups")
> Reported-by: Lou Berger <lberger@labn.net>
> Diagnosed-by: Renato Westphal <renato@opensourcerouting.org>
> Tested-by: Renato Westphal <renato@opensourcerouting.org>
> Signed-off-by: David Ahern <dsahern@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: Incomplete fix for be9c798 / 7b2ee50 hv_netvsc: common detach logic?
From: Greg KH @ 2018-06-19 23:01 UTC (permalink / raw)
  To: Thomas Walker; +Cc: devel, Stephen Hemminger, netdev
In-Reply-To: <20180619191941.GD32145@twosigma.com>

On Tue, Jun 19, 2018 at 03:19:41PM -0400, Thomas Walker wrote:
> Upon updating some internal kernels from 4.14.18 to 4.14.49, our hyper-v guests failed to bring up network interfaces on boot, logging "A link change request failed with some changes committed already. Interface eth0 may have been left with an inconsistent configuration, please check."  Running 'ifconfig eth0 up' appears to fix the problem temporarily so I went about bisecting which landed on:
> 
> commit be9c798d0d13ae609a91177323ac816545c39d28
> Author: Stephen Hemminger <stephen@networkplumber.org>
> Date:   Mon May 14 15:32:18 2018 -0700
> 
>     hv_netvsc: common detach logic
>     
>     [ Commit 7b2ee50c0cd513a176a26a71f2989facdd75bfea upstream. ]
>     
>     Make common function for detaching internals of device
>     during changes to MTU and RSS. Make sure no more packets
>     are transmitted and all packets have been received before
>     doing device teardown.
> 
>     Change the wait logic to be common and use usleep_range().
> 
>     Changes transmit enabling logic so that transmit queues are disabled
>     during the period when lower device is being changed. And enabled
>     only after sub channels are setup. This avoids issue where it could
>     be that a packet was being sent while subchannel was not initialized.
> 
>     Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
>     Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> 
> The removal of which does indeed fix the problem.  That led me to:
> 
> commit 52acf73b6e9a6962045feb2ba5a8921da2201915
> Author: Dexuan Cui <decui@microsoft.com>
> Date:   Wed Jun 6 21:32:51 2018 +0000
> 
>     hv_netvsc: Fix a network regression after ifdown/ifup
>     
>     Recently people reported the NIC stops working after
>     "ifdown eth0; ifup eth0". It turns out in this case the TX queues are not
>     enabled, after the refactoring of the common detach logic: when the NIC
>     has sub-channels, usually we enable all the TX queues after all
>     sub-channels are set up: see rndis_set_subchannel() ->
>     netif_device_attach(), but in the case of "ifdown eth0; ifup eth0" where
>     the number of channels doesn't change, we also must make sure the TX queues
>     are enabled. The patch fixes the regression.
>     
>     Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
>     Signed-off-by: Dexuan Cui <decui@microsoft.com>
>     Cc: Stephen Hemminger <sthemmin@microsoft.com>
>     Cc: K. Y. Srinivasan <kys@microsoft.com>
>     Cc: Haiyang Zhang <haiyangz@microsoft.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> Which sounded very promising, but does not seem to fully fix the issue.  Doing some more digging, I was able to determine that the message coincides with 'ip link set dev eth0 mtu 1300 up' very shortly (>~1 second) after the hv_netvsc driver loads.  If I delay the mtu change until well after the driver loads, everything works fine.  If I unload hv_netvsc and then reload it and apply the mtu change immediately, the failure re-occurs.  So something is racy here, and the above doesn't entirely address it.
> 
> I'm happy to test out any suggested patches and/or do additional debugging if anyone has any suggestions.
> 
> (oh, and I did also try 4.18-rc1 and the problem still persists)

Always cc: the authors of the patch you are having problems with, along
with the mailing list for the networking subsystem, so that the right
people know to look at this.

Now fixed...

thanks,

greg k-h

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox