Netdev List
 help / color / mirror / Atom feed
* Re: Incomplete fix for be9c798 / 7b2ee50 hv_netvsc: common detach logic?
From: Stephen Hemminger @ 2018-06-19 23:07 UTC (permalink / raw)
  To: Greg KH; +Cc: devel, Stephen Hemminger, Thomas Walker, netdev
In-Reply-To: <20180619230150.GA24530@kroah.com>

On Wed, 20 Jun 2018 08:01:50 +0900
Greg KH <greg@kroah.com> wrote:

> On Tue, Jun 19, 2018 at 03:19:41PM -0400, Thomas Walker wrote:
> > Upon updating some internal kernels from 4.14.18 to 4.14.49, our hyper-v guests failed to bring up network interfaces on boot, logging "A link change request failed with some changes committed already. Interface eth0 may have been left with an inconsistent configuration, please check."  Running 'ifconfig eth0 up' appears to fix the problem temporarily so I went about bisecting which landed on:
> > 
> > commit be9c798d0d13ae609a91177323ac816545c39d28
> > Author: Stephen Hemminger <stephen@networkplumber.org>
> > Date:   Mon May 14 15:32:18 2018 -0700
> > 
> >     hv_netvsc: common detach logic
> >     
> >     [ Commit 7b2ee50c0cd513a176a26a71f2989facdd75bfea upstream. ]
> >     
> >     Make common function for detaching internals of device
> >     during changes to MTU and RSS. Make sure no more packets
> >     are transmitted and all packets have been received before
> >     doing device teardown.
> > 
> >     Change the wait logic to be common and use usleep_range().
> > 
> >     Changes transmit enabling logic so that transmit queues are disabled
> >     during the period when lower device is being changed. And enabled
> >     only after sub channels are setup. This avoids issue where it could
> >     be that a packet was being sent while subchannel was not initialized.
> > 
> >     Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
> >     Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> >     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > 
> > 
> > The removal of which does indeed fix the problem.  That led me to:
> > 
> > commit 52acf73b6e9a6962045feb2ba5a8921da2201915
> > Author: Dexuan Cui <decui@microsoft.com>
> > Date:   Wed Jun 6 21:32:51 2018 +0000
> > 
> >     hv_netvsc: Fix a network regression after ifdown/ifup
> >     
> >     Recently people reported the NIC stops working after
> >     "ifdown eth0; ifup eth0". It turns out in this case the TX queues are not
> >     enabled, after the refactoring of the common detach logic: when the NIC
> >     has sub-channels, usually we enable all the TX queues after all
> >     sub-channels are set up: see rndis_set_subchannel() ->
> >     netif_device_attach(), but in the case of "ifdown eth0; ifup eth0" where
> >     the number of channels doesn't change, we also must make sure the TX queues
> >     are enabled. The patch fixes the regression.
> >     
> >     Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
> >     Signed-off-by: Dexuan Cui <decui@microsoft.com>
> >     Cc: Stephen Hemminger <sthemmin@microsoft.com>
> >     Cc: K. Y. Srinivasan <kys@microsoft.com>
> >     Cc: Haiyang Zhang <haiyangz@microsoft.com>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> > 
> > Which sounded very promising, but does not seem to fully fix the issue.  Doing some more digging, I was able to determine that the message coincides with 'ip link set dev eth0 mtu 1300 up' very shortly (>~1 second) after the hv_netvsc driver loads.  If I delay the mtu change until well after the driver loads, everything works fine.  If I unload hv_netvsc and then reload it and apply the mtu change immediately, the failure re-occurs.  So something is racy here, and the above doesn't entirely address it.
> > 
> > I'm happy to test out any suggested patches and/or do additional debugging if anyone has any suggestions.
> > 
> > (oh, and I did also try 4.18-rc1 and the problem still persists)  
> 
> Always cc: the authors of the patch you are having problems with, along
> with the mailing list for the networking subsystem, so that the right
> people know to look at this.

Let me take a look at it, and log it into the bug system.

^ permalink raw reply

* Re: [PATCH net] net/tcp: Fix socket lookups with SO_BINDTODEVICE
From: David Miller @ 2018-06-19 23:04 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, lberger, renato, dsahern
In-Reply-To: <20180618193037.3365-1-dsahern@kernel.org>

From: dsahern@kernel.org
Date: Mon, 18 Jun 2018 12:30:37 -0700

> From: David Ahern <dsahern@gmail.com>
> 
> Similar to 69678bcd4d2d ("udp: fix SO_BINDTODEVICE"), TCP socket lookups
> need to fail if dev_match is not true. Currently, a packet to a given port
> can match a socket bound to device when it should not. In the VRF case,
> this causes the lookup to hit a VRF socket and not a global socket
> resulting in a response trying to go through the VRF when it should it.
                                                                      ^^^

"not", I fixed this up for you.

> Fixes: 3fa6f616a7a4d ("net: ipv4: add second dif to inet socket lookups")
> Fixes: 4297a0ef08572 ("net: ipv6: add second dif to inet6 socket lookups")
> Reported-by: Lou Berger <lberger@labn.net>
> Diagnosed-by: Renato Westphal <renato@opensourcerouting.org>
> Tested-by: Renato Westphal <renato@opensourcerouting.org>
> Signed-off-by: David Ahern <dsahern@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: Incomplete fix for be9c798 / 7b2ee50 hv_netvsc: common detach logic?
From: Greg KH @ 2018-06-19 23:01 UTC (permalink / raw)
  To: Thomas Walker; +Cc: devel, Stephen Hemminger, netdev
In-Reply-To: <20180619191941.GD32145@twosigma.com>

On Tue, Jun 19, 2018 at 03:19:41PM -0400, Thomas Walker wrote:
> Upon updating some internal kernels from 4.14.18 to 4.14.49, our hyper-v guests failed to bring up network interfaces on boot, logging "A link change request failed with some changes committed already. Interface eth0 may have been left with an inconsistent configuration, please check."  Running 'ifconfig eth0 up' appears to fix the problem temporarily so I went about bisecting which landed on:
> 
> commit be9c798d0d13ae609a91177323ac816545c39d28
> Author: Stephen Hemminger <stephen@networkplumber.org>
> Date:   Mon May 14 15:32:18 2018 -0700
> 
>     hv_netvsc: common detach logic
>     
>     [ Commit 7b2ee50c0cd513a176a26a71f2989facdd75bfea upstream. ]
>     
>     Make common function for detaching internals of device
>     during changes to MTU and RSS. Make sure no more packets
>     are transmitted and all packets have been received before
>     doing device teardown.
> 
>     Change the wait logic to be common and use usleep_range().
> 
>     Changes transmit enabling logic so that transmit queues are disabled
>     during the period when lower device is being changed. And enabled
>     only after sub channels are setup. This avoids issue where it could
>     be that a packet was being sent while subchannel was not initialized.
> 
>     Fixes: 8195b1396ec8 ("hv_netvsc: fix deadlock on hotplug")
>     Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>     Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> 
> The removal of which does indeed fix the problem.  That led me to:
> 
> commit 52acf73b6e9a6962045feb2ba5a8921da2201915
> Author: Dexuan Cui <decui@microsoft.com>
> Date:   Wed Jun 6 21:32:51 2018 +0000
> 
>     hv_netvsc: Fix a network regression after ifdown/ifup
>     
>     Recently people reported the NIC stops working after
>     "ifdown eth0; ifup eth0". It turns out in this case the TX queues are not
>     enabled, after the refactoring of the common detach logic: when the NIC
>     has sub-channels, usually we enable all the TX queues after all
>     sub-channels are set up: see rndis_set_subchannel() ->
>     netif_device_attach(), but in the case of "ifdown eth0; ifup eth0" where
>     the number of channels doesn't change, we also must make sure the TX queues
>     are enabled. The patch fixes the regression.
>     
>     Fixes: 7b2ee50c0cd5 ("hv_netvsc: common detach logic")
>     Signed-off-by: Dexuan Cui <decui@microsoft.com>
>     Cc: Stephen Hemminger <sthemmin@microsoft.com>
>     Cc: K. Y. Srinivasan <kys@microsoft.com>
>     Cc: Haiyang Zhang <haiyangz@microsoft.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> Which sounded very promising, but does not seem to fully fix the issue.  Doing some more digging, I was able to determine that the message coincides with 'ip link set dev eth0 mtu 1300 up' very shortly (>~1 second) after the hv_netvsc driver loads.  If I delay the mtu change until well after the driver loads, everything works fine.  If I unload hv_netvsc and then reload it and apply the mtu change immediately, the failure re-occurs.  So something is racy here, and the above doesn't entirely address it.
> 
> I'm happy to test out any suggested patches and/or do additional debugging if anyone has any suggestions.
> 
> (oh, and I did also try 4.18-rc1 and the problem still persists)

Always cc: the authors of the patch you are having problems with, along
with the mailing list for the networking subsystem, so that the right
people know to look at this.

Now fixed...

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH net-next] liquidio: use ktime_get_real_ts64() instead of getnstimeofday64()
From: David Miller @ 2018-06-19 23:01 UTC (permalink / raw)
  To: arnd
  Cc: weilin.chang, dvlasenk, y2038, veerasenareddy.burru,
	ricardo.farrington, linux-kernel, derek.chickles, satananda.burla,
	intiyaz.basha, vijaya.guvva, felix.manlunas, raghu.vatsavayi,
	netdev
In-Reply-To: <20180618151724.1406034-1-arnd@arndb.de>


Arnd, please resubmit these when I open net-next up again, which
should be over the upcoming weekend.

Thank you.
_______________________________________________
Y2038 mailing list
Y2038@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/y2038

^ permalink raw reply

* Re: [PATCH] ptp: replace getnstimeofday64() with ktime_get_real_ts64()
From: David Miller @ 2018-06-19 23:00 UTC (permalink / raw)
  To: arnd; +Cc: richardcochran, yangbo.lu, y2038, fabio.estevam, netdev,
	linux-kernel
In-Reply-To: <20180618142109.3445025-1-arnd@arndb.de>

From: Arnd Bergmann <arnd@arndb.de>
Date: Mon, 18 Jun 2018 16:20:39 +0200

> getnstimeofday64() is deprecated and getting replaced throughout
> the kernel with ktime_get_*() based helpers for a more consistent
> interface.
> 
> The two functions do the exact same thing, so this is just
> a cosmetic change.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Applied.

^ permalink raw reply

* Re: [PATCH net] net/ipv6: respect rcu grace period before freeing fib6_info
From: David Miller @ 2018-06-19 22:58 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet, dsahern
In-Reply-To: <20180618122431.131265-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Mon, 18 Jun 2018 05:24:31 -0700

> syzbot reported use after free that is caused by fib6_info being
> freed without a proper RCU grace period.
 ...
> Fixes: a64efe142f5e ("net/ipv6: introduce fib6_info struct and helpers")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: David Ahern <dsahern@gmail.com>
> Reported-by: syzbot+9e6d75e3edef427ee888@syzkaller.appspotmail.com

Applied.

^ permalink raw reply

* Re: [PATCH] net: net_failover: fix typo in net_failover_slave_register()
From: David Miller @ 2018-06-19 22:57 UTC (permalink / raw)
  To: liran.alon; +Cc: netdev, linux-kernel, idan.brown
In-Reply-To: <1529323445-23777-1-git-send-email-liran.alon@oracle.com>

From: Liran Alon <liran.alon@oracle.com>
Date: Mon, 18 Jun 2018 15:04:05 +0300

> Sync both unicast and multicast lists instead of unicast twice.
> 
> Fixes: cfc80d9a116 ("net: Introduce net_failover driver")
> Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
> Signed-off-by: Liran Alon <liran.alon@oracle.com>

Applied.

^ permalink raw reply

* Re: [PATCH 0/5] Fixes coding style in xilinx_emaclite.c
From: David Miller @ 2018-06-19 22:55 UTC (permalink / raw)
  To: radhey.shyam.pandey
  Cc: andrew, michal.simek, netdev, linux-arm-kernel, linux-kernel
In-Reply-To: <1529322610-27215-1-git-send-email-radhey.shyam.pandey@xilinx.com>

From: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Date: Mon, 18 Jun 2018 17:20:05 +0530

> This patchset fixes checkpatch and kernel-doc warnings in
> xilinx emaclite driver. No functional change.

Such cleanups are only appropriate for net-next, which is closed at this
time.

^ permalink raw reply

* Re: [PATCH 1/1] tcp: tcp_mtup_probe -> tcp_mtu_probe
From: David Miller @ 2018-06-19 22:55 UTC (permalink / raw)
  To: jianjian.wang1; +Cc: edumazet, kuznet, yoshfuji, netdev
In-Reply-To: <CAP4sYWVqrAyE6sXerWe9Vn7N-jeMtDL_2xZGbu_ScebPY3iK3w@mail.gmail.com>

From: Wang Jian <jianjian.wang1@gmail.com>
Date: Mon, 18 Jun 2018 18:58:50 +0800

> Comment correction.
> 
> Signed-off-by: Jian Wang <jianjian.wang1@gmail.com>

This patch was corrupted by your email client.

Please fix this, email the patch to yourself, and only resubmit this patch
to the mailing list when you can successfull apply the patch yourself that
you send in the test email.

Thank you.

^ permalink raw reply

* Re: [PATCH net] ipvlan: use ETH_MAX_MTU as max mtu
From: David Miller @ 2018-06-19 22:54 UTC (permalink / raw)
  To: lucien.xin; +Cc: netdev, jarod, maheshb
In-Reply-To: <3a944c2cbcff6df280847770decca4875b671959.1529309757.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Mon, 18 Jun 2018 16:15:57 +0800

> Similar to the fixes on team and bonding, this restores the ability
> to set an ipvlan device's mtu to anything higher than 1500.
> 
> Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH v2] net: hamradio: use eth_broadcast_addr
From: David Miller @ 2018-06-19 22:52 UTC (permalink / raw)
  To: stefan; +Cc: netdev, linux-kernel
In-Reply-To: <20180617214058.6926-1-stefan@agner.ch>

From: Stefan Agner <stefan@agner.ch>
Date: Sun, 17 Jun 2018 23:40:53 +0200

> The array bpq_eth_addr is only used to get the size of an
> address, whereas the bcast_addr is used to set the broadcast
> address. This leads to a warning when using clang:
> drivers/net/hamradio/bpqether.c:94:13: warning: variable 'bpq_eth_addr' is not
>       needed and will not be emitted [-Wunneeded-internal-declaration]
> static char bpq_eth_addr[6];
>             ^
> 
> Remove both variables and use the common eth_broadcast_addr
> to set the broadcast address.
> 
> Signed-off-by: Stefan Agner <stefan@agner.ch>

Applied.

^ permalink raw reply

* Re: [PATCH net] enic: initialize enic->rfs_h.lock in enic_probe
From: David Miller @ 2018-06-19 22:49 UTC (permalink / raw)
  To: gvaradar; +Cc: benve, netdev
In-Reply-To: <20180619151524.1326-1-gvaradar@cisco.com>

From: Govindarajulu Varadarajan <gvaradar@cisco.com>
Date: Tue, 19 Jun 2018 08:15:24 -0700

> lockdep spotted that we are using rfs_h.lock in enic_get_rxnfc() without
> initializing. rfs_h.lock is initialized in enic_open(). But ethtool_ops
> can be called when interface is down.
> 
> Move enic_rfs_flw_tbl_init to enic_probe.
 ...
> Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>

Applied, thanks.

^ permalink raw reply

* Re: [net] bpf, xdp, i40e: fix i40e_build_skb skb reserve and truesize
From: David Miller @ 2018-06-19 22:41 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: daniel, netdev, nhorman, sassmann, jogreene, bjorn.topel,
	john.fastabend
In-Reply-To: <20180619213354.30986-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 19 Jun 2018 14:33:54 -0700

> From: Daniel Borkmann <daniel@iogearbox.net>
> 
> Using skb_reserve(skb, I40E_SKB_PAD + (xdp->data - xdp->data_hard_start))
> is clearly wrong since I40E_SKB_PAD already points to the offset where
> the original xdp->data was sitting since xdp->data_hard_start is defined
> as xdp->data - i40e_rx_offset(rx_ring) where latter offsets to I40E_SKB_PAD
> when build skb is used.
> 
> However, also before cc5b114dcf98 ("bpf, i40e: add meta data support")
> this seems broken since bpf_xdp_adjust_head() helper could have been used
> to alter headroom and enlarge / shrink the frame and with that the assumption
> that the xdp->data remains unchanged does not hold and would push a bogus
> packet to upper stack.
> 
> ixgbe got this right in 924708081629 ("ixgbe: add XDP support for pass and
> drop actions"). In any case, fix it by removing the I40E_SKB_PAD from both
> skb_reserve() and truesize calculation.
> 
> Fixes: cc5b114dcf98 ("bpf, i40e: add meta data support")
> Fixes: 0c8493d90b6b ("i40e: add XDP support for pass and drop actions")
> Reported-by: Keith Busch <keith.busch@linux.intel.com>
> Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Björn Töpel <bjorn.topel@intel.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Tested-by: Keith Busch <keith.busch@linux.intel.com>
> Acked-by: John Fastabend <john.fastabend@gmail.com>
> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Eric Dumazet @ 2018-06-19 22:40 UTC (permalink / raw)
  To: Andreas Schwab, Eric Dumazet
  Cc: Mathieu Malaterre, David S. Miller, Eric Dumazet, LKML,
	Christophe LEROY, Meelis Roos, netdev, linuxppc-dev
In-Reply-To: <87lgbav24y.fsf@igel.home>



On 06/19/2018 03:32 PM, Andreas Schwab wrote:
> On Jun 19 2018, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
>> diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
>> index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..672d6748ab44f0890e92d5ca55d6ff6834c20dc9 100644
>> --- a/drivers/net/ethernet/sun/sungem.c
>> +++ b/drivers/net/ethernet/sun/sungem.c
>> @@ -60,8 +60,7 @@
>>  #include <linux/sungem_phy.h>
>>  #include "sungem.h"
>>  
>> -/* Stripping FCS is causing problems, disabled for now */
>> -#undef STRIP_FCS
>> +#define STRIP_FCS
>>  
>>  #define DEFAULT_MSG    (NETIF_MSG_DRV          | \
>>                          NETIF_MSG_PROBE        | \
>> @@ -435,7 +434,7 @@ static int gem_rxmac_reset(struct gem *gp)
>>         writel(desc_dma & 0xffffffff, gp->regs + RXDMA_DBLOW);
>>         writel(RX_RING_SIZE - 4, gp->regs + RXDMA_KICK);
>>         val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
>> -              ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
>> +              (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
>>         writel(val, gp->regs + RXDMA_CFG);
>>         if (readl(gp->regs + GREG_BIFCFG) & GREG_BIFCFG_M66EN)
>>                 writel(((5 & RXDMA_BLANK_IPKTS) |
>> @@ -857,6 +856,14 @@ static int gem_rx(struct gem *gp, int work_to_do)
>>  
>>                 csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
>>                 skb->csum = csum_unfold(csum);
>> +               {
>> +               __wsum rsum = csum_partial(skb->data + ETH_HLEN, len - ETH_HLEN, 0);
>> +               if (csum != csum_fold(rsum) && net_ratelimit())
>> +                       pr_err("sungem wrong csum : %x/%x, len %u bytes\n",
>> +                               csum, csum_fold(rsum), len);
>> +                       print_hex_dump(KERN_ERR, "raw data: ", DUMP_PREFIX_OFFSET,
>> +                                           16, 1, skb->data, len, true);
>> +               }
>>                 skb->ip_summed = CHECKSUM_COMPLETE;
>>                 skb->protocol = eth_type_trans(skb, gp->dev);
>>  
>> @@ -1761,7 +1768,7 @@ static void gem_init_dma(struct gem *gp)
>>         writel(0, gp->regs + TXDMA_KICK);
>>  
>>         val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
>> -              ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
>> +              (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
>>         writel(val, gp->regs + RXDMA_CFG);
>>  
>>         writel(desc_dma >> 32, gp->regs + RXDMA_DBHI);
> 
> With that patch I still get the wrong csum messages, but no longer the
> hw csum failure messages (tested on a PowerMac G5).
> 
> [  662.659767] sungem: sungem wrong csum : 8359/7ca6, len 86 bytes, c0000001fee9cc02
> [  662.659775] raw data: 00000000: 00 0d 93 43 81 62 d4 3d 7e 4c 48 b7 86 dd 61 01  ...C.b.=~LH...a.
> [  662.659778] raw data: 00000010: 1c 1e 00 20 06 40 20 01 0a 62 17 11 88 01 00 00  ... .@ ..b......
> [  662.659780] raw data: 00000020: 00 00 00 00 0a 38 20 01 0a 62 17 11 88 01 00 00  .....8 ..b......
> [  662.659783] raw data: 00000030: 00 00 00 00 00 07 9a 18 00 16 c1 9a 7e ea ea 44  ............~..D
> [  662.659785] raw data: 00000040: fb 4a 80 10 05 93 44 08 00 00 01 01 08 0a 59 68  .J....D.......Yh
> [  662.659788] raw data: 00000050: ba e2 0e bb ac ae                                ......
> 
> Andreas.
> 

Note that 8359 and 7ca6 are the same really (a missing ~ to invert csum_partial())

So the bug was that :

1) Driver programmed a wrong start offset for the csum  (7 bytes instead of 14 bytes to skip Ethernet Header)

2) FCS was not stripped.

Basically CHECKSUM_COMPLETE support never worked, but this was hidden by the fact that linux stack
had to throw away this CHECKSUM_COMPLETE because the FCS had to be removed.

Thanks !

^ permalink raw reply

* Re: [PATCH] net: nixge: Add __packed attribute to DMA descriptor struct
From: David Miller @ 2018-06-19 22:37 UTC (permalink / raw)
  To: f.fainelli; +Cc: mdf, keescook, netdev, linux-kernel
In-Reply-To: <832bebb8-300d-e911-2946-5edfe82dc30a@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue, 19 Jun 2018 10:13:55 -0700

> How could padding be inserted given than all of the structure members
> are naturally aligned (all u32 type). Compiler bug?

Agreed, this looks completely unnecessary.

__packed should only be used when absolutely necessary because using
it generates less efficient code on some architectures.

^ permalink raw reply

* Re: [PATCH 1/5] net: emaclite: Use __func__ instead of hardcoded name
From: Joe Perches @ 2018-06-19 22:37 UTC (permalink / raw)
  To: Andy Shevchenko, Radhey Shyam Pandey
  Cc: David S. Miller, Andrew Lunn, Michal Simek, netdev,
	linux-arm Mailing List, Linux Kernel Mailing List
In-Reply-To: <CAHp75VcKuqnUgynqXvACS7zY5=6aRxqymGE1w3Yh8sshehSm2g@mail.gmail.com>

On Wed, 2018-06-20 at 00:36 +0300, Andy Shevchenko wrote:
> On Mon, Jun 18, 2018 at 2:08 PM, Radhey Shyam Pandey
> <radhey.shyam.pandey@xilinx.com> wrote:
> > Switch hardcoded function name with a reference to __func__ making
> > the code more maintainable. Address below checkpatch warning:
> > 
> > WARNING: Prefer using '"%s...", __func__' to using 'xemaclite_mdio_read',
> > this function's name, in a string
> > +               "xemaclite_mdio_read(phy_id=%i, reg=%x) == %x\n",
> > 
> > WARNING: Prefer using '"%s...", __func__' to using 'xemaclite_mdio_write',
> > this function's name, in a string
> > +               "xemaclite_mdio_write(phy_id=%i, reg=%x, val=%x)\n",
> > 
> 
> For dev_dbg() the __func__ should be completely dropped away.

Not really the same.

dev_dbg without CONFIG_DYNAMIC_DEBUG does not have
the ability to prefix __func__.

^ permalink raw reply

* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Andreas Schwab @ 2018-06-19 22:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mathieu Malaterre, David S. Miller, Eric Dumazet, LKML,
	Christophe LEROY, Meelis Roos, netdev, linuxppc-dev
In-Reply-To: <05645b90-d3bc-466d-116f-548f3ee39de9@gmail.com>

On Jun 19 2018, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
> index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..672d6748ab44f0890e92d5ca55d6ff6834c20dc9 100644
> --- a/drivers/net/ethernet/sun/sungem.c
> +++ b/drivers/net/ethernet/sun/sungem.c
> @@ -60,8 +60,7 @@
>  #include <linux/sungem_phy.h>
>  #include "sungem.h"
>  
> -/* Stripping FCS is causing problems, disabled for now */
> -#undef STRIP_FCS
> +#define STRIP_FCS
>  
>  #define DEFAULT_MSG    (NETIF_MSG_DRV          | \
>                          NETIF_MSG_PROBE        | \
> @@ -435,7 +434,7 @@ static int gem_rxmac_reset(struct gem *gp)
>         writel(desc_dma & 0xffffffff, gp->regs + RXDMA_DBLOW);
>         writel(RX_RING_SIZE - 4, gp->regs + RXDMA_KICK);
>         val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
> -              ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
> +              (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
>         writel(val, gp->regs + RXDMA_CFG);
>         if (readl(gp->regs + GREG_BIFCFG) & GREG_BIFCFG_M66EN)
>                 writel(((5 & RXDMA_BLANK_IPKTS) |
> @@ -857,6 +856,14 @@ static int gem_rx(struct gem *gp, int work_to_do)
>  
>                 csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
>                 skb->csum = csum_unfold(csum);
> +               {
> +               __wsum rsum = csum_partial(skb->data + ETH_HLEN, len - ETH_HLEN, 0);
> +               if (csum != csum_fold(rsum) && net_ratelimit())
> +                       pr_err("sungem wrong csum : %x/%x, len %u bytes\n",
> +                               csum, csum_fold(rsum), len);
> +                       print_hex_dump(KERN_ERR, "raw data: ", DUMP_PREFIX_OFFSET,
> +                                           16, 1, skb->data, len, true);
> +               }
>                 skb->ip_summed = CHECKSUM_COMPLETE;
>                 skb->protocol = eth_type_trans(skb, gp->dev);
>  
> @@ -1761,7 +1768,7 @@ static void gem_init_dma(struct gem *gp)
>         writel(0, gp->regs + TXDMA_KICK);
>  
>         val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
> -              ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
> +              (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
>         writel(val, gp->regs + RXDMA_CFG);
>  
>         writel(desc_dma >> 32, gp->regs + RXDMA_DBHI);

With that patch I still get the wrong csum messages, but no longer the
hw csum failure messages (tested on a PowerMac G5).

[  662.659767] sungem: sungem wrong csum : 8359/7ca6, len 86 bytes, c0000001fee9cc02
[  662.659775] raw data: 00000000: 00 0d 93 43 81 62 d4 3d 7e 4c 48 b7 86 dd 61 01  ...C.b.=~LH...a.
[  662.659778] raw data: 00000010: 1c 1e 00 20 06 40 20 01 0a 62 17 11 88 01 00 00  ... .@ ..b......
[  662.659780] raw data: 00000020: 00 00 00 00 0a 38 20 01 0a 62 17 11 88 01 00 00  .....8 ..b......
[  662.659783] raw data: 00000030: 00 00 00 00 00 07 9a 18 00 16 c1 9a 7e ea ea 44  ............~..D
[  662.659785] raw data: 00000040: fb 4a 80 10 05 93 44 08 00 00 01 01 08 0a 59 68  .J....D.......Yh
[  662.659788] raw data: 00000050: ba e2 0e bb ac ae                                ......

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply

* Re: [PATCH v2 0/4] Slience NCSI logging
From: David Miller @ 2018-06-19 22:27 UTC (permalink / raw)
  To: joel; +Cc: sam, netdev, joe
In-Reply-To: <20180619053834.12257-1-joel@jms.id.au>

From: Joel Stanley <joel@jms.id.au>
Date: Tue, 19 Jun 2018 15:08:30 +0930

> v2:
>   Fix indent issue and commit message based on Joe's feedback
>   Add Sam's acks
> 
> Here are three changes to silence unnecessary warnings in the ncsi code.
> 
> The final patch adds Sam as the maintainer for NCSI.

Series applied.

^ permalink raw reply

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
From: Daniel Borkmann @ 2018-06-19 22:22 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, pablo, netfilter-devel, netdev
In-Reply-To: <20180617092304.fviy36bvgnhexwtb@gauss3.secunet.de>

On 06/17/2018 11:23 AM, Steffen Klassert wrote:
[...]
>> Would be curious about
>> the numbers. You'd get implicit batching for the forwarding via devmap
>> as well if you're required to flush it out via different device with
>> XDP_REDIRECT; otherwise XDP_TX of course. Given we have recently
>> integrated helpers for XDP to do a FIB and neighbor lookup from the
>> kernel tables, where it's thus shared and integrated with the rest of
>> the stack and tooling, it would be awesome to get to the same point
>> with xfrm as well. Eyal recently did a start on that for xfrm for tc
>> progs; would be nice to have integration on XDP as well, potentially
>> it might also result in a bigger plus on the forwarding numbers.
> 
> It might make sense to intrgrate XDP with xfrm to
> be able to compare numbers etc. But I need a working
> XDP setup and some understanding about it first, what
> could take some time.

Okay, no prob. If you have any questions feel free to shoot an email.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH net 0/3] qed*: Fix series.
From: David Miller @ 2018-06-19 22:18 UTC (permalink / raw)
  To: sudarsana.kalluru; +Cc: netdev, Ariel.Elior, Michal.Kalderon
In-Reply-To: <20180619045802.24050-1-sudarsana.kalluru@cavium.com>

From: Sudarsana Reddy Kalluru <sudarsana.kalluru@cavium.com>
Date: Mon, 18 Jun 2018 21:57:59 -0700

> From: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
> 
> The patch series fixes few issues in the qed/qede drivers.
> Please consider applying this series to "net".

Series applied.

^ permalink raw reply

* [PATCH net] enic: initialize enic->rfs_h.lock in enic_probe
From: Govindarajulu Varadarajan @ 2018-06-19 15:15 UTC (permalink / raw)
  To: davem, benve, netdev; +Cc: Govindarajulu Varadarajan

lockdep spotted that we are using rfs_h.lock in enic_get_rxnfc() without
initializing. rfs_h.lock is initialized in enic_open(). But ethtool_ops
can be called when interface is down.

Move enic_rfs_flw_tbl_init to enic_probe.

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 18 PID: 1189 Comm: ethtool Not tainted 4.17.0-rc7-devel+ #27
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
Call Trace:
dump_stack+0x85/0xc0
register_lock_class+0x550/0x560
? __handle_mm_fault+0xa8b/0x1100
__lock_acquire+0x81/0x670
lock_acquire+0xb9/0x1e0
?  enic_get_rxnfc+0x139/0x2b0 [enic]
_raw_spin_lock_bh+0x38/0x80
? enic_get_rxnfc+0x139/0x2b0 [enic]
enic_get_rxnfc+0x139/0x2b0 [enic]
ethtool_get_rxnfc+0x8d/0x1c0
dev_ethtool+0x16c8/0x2400
? __mutex_lock+0x64d/0xa00
? dev_load+0x6a/0x150
dev_ioctl+0x253/0x4b0
sock_do_ioctl+0x9a/0x130
sock_ioctl+0x1af/0x350
do_vfs_ioctl+0x8e/0x670
? syscall_trace_enter+0x1e2/0x380
ksys_ioctl+0x60/0x90
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x5a/0x170
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
---
 drivers/net/ethernet/cisco/enic/enic_clsf.c | 3 +--
 drivers/net/ethernet/cisco/enic/enic_main.c | 3 ++-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic_clsf.c b/drivers/net/ethernet/cisco/enic/enic_clsf.c
index 973c1fb70d09..99038dfc7fbe 100644
--- a/drivers/net/ethernet/cisco/enic/enic_clsf.c
+++ b/drivers/net/ethernet/cisco/enic/enic_clsf.c
@@ -79,7 +79,6 @@ void enic_rfs_flw_tbl_init(struct enic *enic)
 	enic->rfs_h.max = enic->config.num_arfs;
 	enic->rfs_h.free = enic->rfs_h.max;
 	enic->rfs_h.toclean = 0;
-	enic_rfs_timer_start(enic);
 }
 
 void enic_rfs_flw_tbl_free(struct enic *enic)
@@ -88,7 +87,6 @@ void enic_rfs_flw_tbl_free(struct enic *enic)
 
 	enic_rfs_timer_stop(enic);
 	spin_lock_bh(&enic->rfs_h.lock);
-	enic->rfs_h.free = 0;
 	for (i = 0; i < (1 << ENIC_RFS_FLW_BITSHIFT); i++) {
 		struct hlist_head *hhead;
 		struct hlist_node *tmp;
@@ -99,6 +97,7 @@ void enic_rfs_flw_tbl_free(struct enic *enic)
 			enic_delfltr(enic, n->fltr_id);
 			hlist_del(&n->node);
 			kfree(n);
+			enic->rfs_h.free++;
 		}
 	}
 	spin_unlock_bh(&enic->rfs_h.lock);
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index 30d2eaa18c04..e6ad581eadd8 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -1971,7 +1971,7 @@ static int enic_open(struct net_device *netdev)
 		vnic_intr_unmask(&enic->intr[i]);
 
 	enic_notify_timer_start(enic);
-	enic_rfs_flw_tbl_init(enic);
+	enic_rfs_timer_start(enic);
 
 	return 0;
 
@@ -2904,6 +2904,7 @@ static int enic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	timer_setup(&enic->notify_timer, enic_notify_timer, 0);
 
+	enic_rfs_flw_tbl_init(enic);
 	enic_set_rx_coal_setting(enic);
 	INIT_WORK(&enic->reset, enic_reset);
 	INIT_WORK(&enic->tx_hang_reset, enic_tx_hang_reset);
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH] tc, bpf: add option to dump bpf verifier as C program fragment
From: Daniel Borkmann @ 2018-06-19 22:13 UTC (permalink / raw)
  To: David Ahern, Jakub Kicinski, Ophir Munk
  Cc: netdev, Stephen Hemminger, Thomas Monjalon, Olga Shern, ast
In-Reply-To: <30ac6974-a434-926b-5749-25c594e5841c@gmail.com>

On 06/18/2018 11:44 PM, David Ahern wrote:
> On 6/18/18 2:18 PM, Jakub Kicinski wrote:
>> On Sun, 17 Jun 2018 08:48:41 +0000, Ophir Munk wrote:
>>> Similar to cbpf used within tcpdump utility with a "-d" option to dump
>>> the compiled packet-matching code in a human readable form - tc has the
>>> "verbose" option to dump ebpf verifier output.
>>> Another useful option of cbpf using tcpdump "-dd" option is to dump
>>> packet-matching code a C program fragment. Similar to this - this commit
>>> adds a new tc ebpf option named "code" to dump ebpf verifier as C program
>>> fragment.
>>>
>>> Existing "verbose" option sample output:
>>>
>>> Verifier analysis:
>>> 0: (61) r2 = *(u32 *)(r1 +52)
>>> 1: (18) r3 = 0xdeadbeef
>>> 3: (63) *(u32 *)(r10 -4) = r3
>>> .
>>> .
>>> 11: (63) *(u32 *)(r1 +52) = r2
>>> 12: (18) r0 = 0xffffffff
>>> 14: (95) exit
>>>
>>> New "code" option sample output:
>>>
>>> /* struct bpf_insn cls_q_code[] = { */
>>> {0x61,    2,    1,       52, 0x00000000},
>>> {0x18,    3,    0,        0, 0xdeadbeef},
>>> {0x00,    0,    0,        0, 0x00000000},
>>> .
>>> .
>>> {0x63,    1,    2,       52, 0x00000000},
>>> {0x18,    0,    0,        0, 0xffffffff},
>>> {0x00,    0,    0,        0, 0x00000000},
>>> {0x95,    0,    0,        0, 0x00000000},
>>>
>>> Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
>>
>> Hmm... printing C arrays looks like hacky integration with some C
>> code...  Would you not be better served by simply using libbpf in
>> whatever is consuming this output?
> 
> I was thinking the same. bpftool would provide options too -- print the
> above, print in macro encodings and verifier. I gave an example of this
> side by side dump at netconf 2.1. Does not look like the slides made it
> online; see attached.

+1, I would also doubt that this adds a lot in terms of debuggability
when you're trying to load an object file with thousands of insns. Better
way would be to use llvm-objdump on the obj file to get to the annotated
disassembly, see also example in [0]. A .o to .c converter is wip for
libbpf/bpftool as presented in [1], which should provide the flexibility
to embedd an obj file.

Cheers,
Daniel

  [0] http://cilium.readthedocs.io/en/latest/bpf/#llvm
  [1] http://vger.kernel.org/netconf2018_files/AlexeiStarovoitov_netconf2018.pdf page 22

^ permalink raw reply

* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Eric Dumazet @ 2018-06-19 22:10 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Mathieu Malaterre, David S. Miller, Eric Dumazet, LKML,
	Christophe LEROY, Meelis Roos, netdev, linuxppc-dev
In-Reply-To: <e1790720-911b-4d04-2215-752ec612e3d7@gmail.com>



On 06/19/2018 01:10 PM, Eric Dumazet wrote:
> 
> 
> On 06/19/2018 12:10 PM, Andreas Schwab wrote:
>> On Jun 18 2018, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>> DUMP_PREFIX_ADDRESS might give us more information (say alignment problem, or crossing page boundaries)
>>
>> DUMP_PREFIX_ADDRESS is useless for that purpose.
>>
>> Here are some samples of broken csums:
>>
>> [  853.849225] sungem: sungem wrong csum : 9886/07be, len 94 bytes, c0000001fa187e02
>> [  853.849232] raw data: 00000000: 00 0d 93 43 81 62 18 d6 c7 51 b8 1c 08 00 45 10  ...C.b...Q....E.
>> [  853.849235] raw data: 00000010: 00 4c cb a0 40 00 40 11 d9 97 c0 a8 0a 01 c0 a8  .L..@.@.........
>> [  853.849237] raw data: 00000020: 0a 07 00 7b 00 7b 00 38 69 e1 1c 03 0c f7 00 00  ...{.{.8i.......
>> [  853.849240] raw data: 00000030: 08 f0 00 00 15 f0 c0 35 67 67 de d3 ca c9 d9 5b  .......5gg.....[
>> [  853.849242] raw data: 00000040: 1f ff de d3 d2 86 8f 67 fa f2 de d3 d2 86 8f 38  .......g.......8
>> [  853.849244] raw data: 00000050: 2f ff de d3 d2 86 8f 3b ff ff d1 93 bc 50        /......;.....P
> 
> Thanks.
> 
> 4 bytes in excess.
> 
> Might be the FCS, and it does not look like provided csum has a relation with it.
> 
> For some reason FCS stripping was disabled by :
> 
> commit 3e32011d4da6424b3bc65b1e1a047e30ac9882c7
> Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date:   Mon May 19 09:39:11 2003 -0700
> 
>     [SUNGEM]: Updates from PowerPC people.
>     
>     Support more chips and split out all the complex PHY
>     handling into a seperate file.
> 
> 
> Since this NIC never had CHECKSUM_COMPLETE support (since we have to trim each skb,
> thus were forcing ip_summed to CHECKSUM_NONE) we probably should remove it and be happy.
> 
> Unless you guys find a way to let the NIC strip the FCS, and double check the csum is a real csum ;)
> 
> diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
> index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..85439308375e95c3854e4a1561697d69ec85399b 100644
> --- a/drivers/net/ethernet/sun/sungem.c
> +++ b/drivers/net/ethernet/sun/sungem.c
> @@ -760,7 +760,6 @@ static int gem_rx(struct gem *gp, int work_to_do)
>         struct net_device *dev = gp->dev;
>         int entry, drops, work_done = 0;
>         u32 done;
> -       __sum16 csum;
>  
>         if (netif_msg_rx_status(gp))
>                 printk(KERN_DEBUG "%s: rx interrupt, done: %d, rx_new: %d\n",
> @@ -855,9 +854,6 @@ static int gem_rx(struct gem *gp, int work_to_do)
>                         skb = copy_skb;
>                 }
>  
> -               csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
> -               skb->csum = csum_unfold(csum);
> -               skb->ip_summed = CHECKSUM_COMPLETE;
>                 skb->protocol = eth_type_trans(skb, gp->dev);
>  
>                 napi_gro_receive(&gp->napi, skb);
> 


If you have time, you also could check if changing the suspect (14 / 2) to ETH_HLEN would help
(and also enabling STRIP_FCS)



diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..672d6748ab44f0890e92d5ca55d6ff6834c20dc9 100644
--- a/drivers/net/ethernet/sun/sungem.c
+++ b/drivers/net/ethernet/sun/sungem.c
@@ -60,8 +60,7 @@
 #include <linux/sungem_phy.h>
 #include "sungem.h"
 
-/* Stripping FCS is causing problems, disabled for now */
-#undef STRIP_FCS
+#define STRIP_FCS
 
 #define DEFAULT_MSG    (NETIF_MSG_DRV          | \
                         NETIF_MSG_PROBE        | \
@@ -435,7 +434,7 @@ static int gem_rxmac_reset(struct gem *gp)
        writel(desc_dma & 0xffffffff, gp->regs + RXDMA_DBLOW);
        writel(RX_RING_SIZE - 4, gp->regs + RXDMA_KICK);
        val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
-              ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
+              (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
        writel(val, gp->regs + RXDMA_CFG);
        if (readl(gp->regs + GREG_BIFCFG) & GREG_BIFCFG_M66EN)
                writel(((5 & RXDMA_BLANK_IPKTS) |
@@ -857,6 +856,14 @@ static int gem_rx(struct gem *gp, int work_to_do)
 
                csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
                skb->csum = csum_unfold(csum);
+               {
+               __wsum rsum = csum_partial(skb->data + ETH_HLEN, len - ETH_HLEN, 0);
+               if (csum != csum_fold(rsum) && net_ratelimit())
+                       pr_err("sungem wrong csum : %x/%x, len %u bytes\n",
+                               csum, csum_fold(rsum), len);
+                       print_hex_dump(KERN_ERR, "raw data: ", DUMP_PREFIX_OFFSET,
+                                           16, 1, skb->data, len, true);
+               }
                skb->ip_summed = CHECKSUM_COMPLETE;
                skb->protocol = eth_type_trans(skb, gp->dev);
 
@@ -1761,7 +1768,7 @@ static void gem_init_dma(struct gem *gp)
        writel(0, gp->regs + TXDMA_KICK);
 
        val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
-              ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
+              (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
        writel(val, gp->regs + RXDMA_CFG);
 
        writel(desc_dma >> 32, gp->regs + RXDMA_DBHI);

^ permalink raw reply related

* Re: [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss
From: Joe Perches @ 2018-06-19 21:56 UTC (permalink / raw)
  To: Andy Chittenden, 'Trond Myklebust'
  Cc: 'Andrew Morton', 'David Miller', kuznet, pekkas,
	jmorris, yoshfuji, kaber, eric.dumazet, William.Allen.Simpson,
	gilad, ilpo.jarvinen, netdev, linux-kernel, linux-nfs,
	'J. Bruce Fields', 'Neil Brown',
	'Chuck Lever', 'Benny Halevy',
	'Alexandros Batsakis', Andy Chittenden
In-Reply-To: <4c5fc9f3.487e0e0a.2960.ffffe4ec@mx.google.com>

On Mon, 2010-08-09 at 10:27 +0100, Andy Chittenden wrote:

You really need to check your clock.
Mail sent in the year 2010?

^ permalink raw reply

* Re: [PATCH] PCI: allow drivers to limit the number of VFs to 0
From: Bjorn Helgaas @ 2018-06-19 21:37 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Bjorn Helgaas, linux-pci, netdev, Sathya Perla, Felix Manlunas,
	alexander.duyck, john.fastabend, Jacob Keller, Donald Dutile,
	oss-drivers, Christoph Hellwig
In-Reply-To: <20180525140223.GA45098@bhelgaas-glaptop.roam.corp.google.com>

On Fri, May 25, 2018 at 09:02:23AM -0500, Bjorn Helgaas wrote:
> On Thu, May 24, 2018 at 06:20:15PM -0700, Jakub Kicinski wrote:
> > Hi Bjorn!
> > 
> > On Thu, 24 May 2018 18:57:48 -0500, Bjorn Helgaas wrote:
> > > On Mon, Apr 02, 2018 at 03:46:52PM -0700, Jakub Kicinski wrote:
> > > > Some user space depends on enabling sriov_totalvfs number of VFs
> > > > to not fail, e.g.:
> > > > 
> > > > $ cat .../sriov_totalvfs > .../sriov_numvfs
> > > > 
> > > > For devices which VF support depends on loaded FW we have the
> > > > pci_sriov_{g,s}et_totalvfs() API.  However, this API uses 0 as
> > > > a special "unset" value, meaning drivers can't limit sriov_totalvfs
> > > > to 0.  Remove the special values completely and simply initialize
> > > > driver_max_VFs to total_VFs.  Then always use driver_max_VFs.
> > > > Add a helper for drivers to reset the VF limit back to total.  
> > > 
> > > I still can't really make sense out of the changelog.
> > >
> > > I think part of the reason it's confusing is because there are two
> > > things going on:
> > > 
> > >   1) You want this:
> > >   
> > >        pci_sriov_set_totalvfs(dev, 0);
> > >        x = pci_sriov_get_totalvfs(dev) 
> > > 
> > >      to return 0 instead of total_VFs.  That seems to connect with
> > >      your subject line.  It means "sriov_totalvfs" in sysfs could be
> > >      0, but I don't know how that is useful (I'm sure it is; just
> > >      educate me :))
> > 
> > Let me just quote the bug report that got filed on our internal bug
> > tracker :)
> > 
> >   When testing Juju Openstack with Ubuntu 18.04, enabling SR-IOV causes
> >   errors because Juju gets the sriov_totalvfs for SR-IOV-capable device
> >   then tries to set that as the sriov_numvfs parameter.
> > 
> >   For SR-IOV incapable FW, the sriov_totalvfs parameter should be 0, 
> >   but it's set to max.  When FW is switched to flower*, the correct 
> >   sriov_totalvfs value is presented.
> > 
> > * flower is a project name
> 
> From the point of view of the PCI core (which knows nothing about
> device firmware and relies on the architected config space described
> by the PCIe spec), this sounds like an erratum: with some firmware
> installed, the device is not capable of SR-IOV, but still advertises
> an SR-IOV capability with "TotalVFs > 0".
> 
> Regardless of whether that's an erratum, we do allow PF drivers to use
> pci_sriov_set_totalvfs() to limit the number of VFs that may be
> enabled by writing to the PF's "sriov_numvfs" sysfs file.
> 
> But the current implementation does not allow a PF driver to limit VFs
> to 0, and that does seem nonsensical.
> 
> > My understanding is OpenStack uses sriov_totalvfs to determine how many
> > VFs can be enabled, looks like this is the code:
> > 
> > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n464
> > 
> > >   2) You're adding the pci_sriov_reset_totalvfs() interface.  I'm not
> > >      sure what you intend for this.  Is *every* driver supposed to
> > >      call it in .remove()?  Could/should this be done in the core
> > >      somehow instead of depending on every driver?
> > 
> > Good question, I was just thinking yesterday we may want to call it
> > from the core, but I don't think it's strictly necessary nor always
> > sufficient (we may reload FW without re-probing).
> > 
> > We have a device which supports different number of VFs based on the FW
> > loaded.  Some legacy FWs does not inform the driver how many VFs it can
> > support, because it supports max.  So the flow in our driver is this:
> > 
> > load_fw(dev);
> > ...
> > max_vfs = ask_fw_for_max_vfs(dev);
> > if (max_vfs >= 0)
> > 	return pci_sriov_set_totalvfs(dev, max_vfs);
> > else /* FW didn't tell us, assume max */
> > 	return pci_sriov_reset_totalvfs(dev); 
> > 
> > We also reset the max on device remove, but that's not strictly
> > necessary.
> > 
> > Other users of pci_sriov_set_totalvfs() always know the value to set
> > the total to (either always get it from FW or it's a constant).
> > 
> > If you prefer we can work out the correct max for those legacy cases in
> > the driver as well, although it seemed cleaner to just ask the core,
> > since it already has total_VFs value handy :)
> > 
> > > I'm also having a hard time connecting your user-space command example
> > > with the rest of this.  Maybe it will make more sense to me tomorrow
> > > after some coffee.
> > 
> > OpenStack assumes it will always be able to set sriov_numvfs to
> > sriov_totalvfs, see this 'if':
> > 
> > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n512
> 
> Thanks for educating me.  I think there are two issues here that we
> can separate.  I extracted the patch below for the first.
> 
> The second is the question of resetting driver_max_VFs.  I think we
> currently have a general issue in the core:
> 
>   - load PF driver 1
>   - driver calls pci_sriov_set_totalvfs() to reduce driver_max_VFs
>   - unload PF driver 1
>   - load PF driver 2
> 
> Now driver_max_VFs is still stuck at the lower value set by driver 1.
> I don't think that's the way this should work.
> 
> I guess this is partly a consequence of setting driver_max_VFs in
> sriov_init(), which is called before driver attach and should only
> depend on hardware characteristics, so it is related to the patch
> below.  But I think we should fix it in general, not just for
> netronome.

Hi Jakub, the patch below is in v4.18-rc1 as 8d85a7a4f2c9 ("PCI/IOV:
Allow PF drivers to limit total_VFs to 0").  If there's more we need
to do here, would you mind rebasing what's left to v4.18-rc1 and
reposting it?

> commit 4a338bc6f94b9ad824ac944f5dfc249d6838719c
> Author: Jakub Kicinski <jakub.kicinski@netronome.com>
> Date:   Fri May 25 08:18:34 2018 -0500
> 
>     PCI/IOV: Allow PF drivers to limit total_VFs to 0
>     
>     Some SR-IOV PF drivers implement .sriov_configure(), which allows
>     user-space to enable VFs by writing the desired number of VFs to the sysfs
>     "sriov_numvfs" file (see sriov_numvfs_store()).
>     
>     The PCI core limits the number of VFs to the TotalVFs advertised by the
>     device in its SR-IOV capability.  The PF driver can limit the number of VFs
>     to even fewer (it may have pre-allocated data structures or knowledge of
>     device limitations) by calling pci_sriov_set_totalvfs(), but previously it
>     could not limit the VFs to 0.
>     
>     Change pci_sriov_get_totalvfs() so it always respects the VF limit imposed
>     by the PF driver, even if the limit is 0.
>     
>     This sequence:
>     
>       pci_sriov_set_totalvfs(dev, 0);
>       x = pci_sriov_get_totalvfs(dev);
>     
>     previously set "x" to TotalVFs from the SR-IOV capability.  Now it will set
>     "x" to 0.
>     
>     Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 192b82898a38..d0d73dbbd5ca 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -469,6 +469,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
>  	iov->nres = nres;
>  	iov->ctrl = ctrl;
>  	iov->total_VFs = total;
> +	iov->driver_max_VFs = total;
>  	pci_read_config_word(dev, pos + PCI_SRIOV_VF_DID, &iov->vf_device);
>  	iov->pgsz = pgsz;
>  	iov->self = dev;
> @@ -827,10 +828,7 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
>  	if (!dev->is_physfn)
>  		return 0;
>  
> -	if (dev->sriov->driver_max_VFs)
> -		return dev->sriov->driver_max_VFs;
> -
> -	return dev->sriov->total_VFs;
> +	return dev->sriov->driver_max_VFs;
>  }
>  EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
>  

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox