Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 0/5] Fixes coding style in xilinx_emaclite.c
From: David Miller @ 2018-06-19 22:55 UTC (permalink / raw)
  To: radhey.shyam.pandey
  Cc: andrew, michal.simek, netdev, linux-arm-kernel, linux-kernel
In-Reply-To: <1529322610-27215-1-git-send-email-radhey.shyam.pandey@xilinx.com>

From: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Date: Mon, 18 Jun 2018 17:20:05 +0530

> This patchset fixes checkpatch and kernel-doc warnings in
> xilinx emaclite driver. No functional change.

Such cleanups are only appropriate for net-next, which is closed at this
time.

^ permalink raw reply

* Re: [PATCH 1/1] tcp: tcp_mtup_probe -> tcp_mtu_probe
From: David Miller @ 2018-06-19 22:55 UTC (permalink / raw)
  To: jianjian.wang1; +Cc: edumazet, kuznet, yoshfuji, netdev
In-Reply-To: <CAP4sYWVqrAyE6sXerWe9Vn7N-jeMtDL_2xZGbu_ScebPY3iK3w@mail.gmail.com>

From: Wang Jian <jianjian.wang1@gmail.com>
Date: Mon, 18 Jun 2018 18:58:50 +0800

> Comment correction.
> 
> Signed-off-by: Jian Wang <jianjian.wang1@gmail.com>

This patch was corrupted by your email client.

Please fix this, email the patch to yourself, and only resubmit this patch
to the mailing list when you can successfull apply the patch yourself that
you send in the test email.

Thank you.

^ permalink raw reply

* Re: [PATCH net] ipvlan: use ETH_MAX_MTU as max mtu
From: David Miller @ 2018-06-19 22:54 UTC (permalink / raw)
  To: lucien.xin; +Cc: netdev, jarod, maheshb
In-Reply-To: <3a944c2cbcff6df280847770decca4875b671959.1529309757.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Mon, 18 Jun 2018 16:15:57 +0800

> Similar to the fixes on team and bonding, this restores the ability
> to set an ipvlan device's mtu to anything higher than 1500.
> 
> Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH v2] net: hamradio: use eth_broadcast_addr
From: David Miller @ 2018-06-19 22:52 UTC (permalink / raw)
  To: stefan; +Cc: netdev, linux-kernel
In-Reply-To: <20180617214058.6926-1-stefan@agner.ch>

From: Stefan Agner <stefan@agner.ch>
Date: Sun, 17 Jun 2018 23:40:53 +0200

> The array bpq_eth_addr is only used to get the size of an
> address, whereas the bcast_addr is used to set the broadcast
> address. This leads to a warning when using clang:
> drivers/net/hamradio/bpqether.c:94:13: warning: variable 'bpq_eth_addr' is not
>       needed and will not be emitted [-Wunneeded-internal-declaration]
> static char bpq_eth_addr[6];
>             ^
> 
> Remove both variables and use the common eth_broadcast_addr
> to set the broadcast address.
> 
> Signed-off-by: Stefan Agner <stefan@agner.ch>

Applied.

^ permalink raw reply

* Re: [PATCH net] enic: initialize enic->rfs_h.lock in enic_probe
From: David Miller @ 2018-06-19 22:49 UTC (permalink / raw)
  To: gvaradar; +Cc: benve, netdev
In-Reply-To: <20180619151524.1326-1-gvaradar@cisco.com>

From: Govindarajulu Varadarajan <gvaradar@cisco.com>
Date: Tue, 19 Jun 2018 08:15:24 -0700

> lockdep spotted that we are using rfs_h.lock in enic_get_rxnfc() without
> initializing. rfs_h.lock is initialized in enic_open(). But ethtool_ops
> can be called when interface is down.
> 
> Move enic_rfs_flw_tbl_init to enic_probe.
 ...
> Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>

Applied, thanks.

^ permalink raw reply

* Re: [net] bpf, xdp, i40e: fix i40e_build_skb skb reserve and truesize
From: David Miller @ 2018-06-19 22:41 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: daniel, netdev, nhorman, sassmann, jogreene, bjorn.topel,
	john.fastabend
In-Reply-To: <20180619213354.30986-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 19 Jun 2018 14:33:54 -0700

> From: Daniel Borkmann <daniel@iogearbox.net>
> 
> Using skb_reserve(skb, I40E_SKB_PAD + (xdp->data - xdp->data_hard_start))
> is clearly wrong since I40E_SKB_PAD already points to the offset where
> the original xdp->data was sitting since xdp->data_hard_start is defined
> as xdp->data - i40e_rx_offset(rx_ring) where latter offsets to I40E_SKB_PAD
> when build skb is used.
> 
> However, also before cc5b114dcf98 ("bpf, i40e: add meta data support")
> this seems broken since bpf_xdp_adjust_head() helper could have been used
> to alter headroom and enlarge / shrink the frame and with that the assumption
> that the xdp->data remains unchanged does not hold and would push a bogus
> packet to upper stack.
> 
> ixgbe got this right in 924708081629 ("ixgbe: add XDP support for pass and
> drop actions"). In any case, fix it by removing the I40E_SKB_PAD from both
> skb_reserve() and truesize calculation.
> 
> Fixes: cc5b114dcf98 ("bpf, i40e: add meta data support")
> Fixes: 0c8493d90b6b ("i40e: add XDP support for pass and drop actions")
> Reported-by: Keith Busch <keith.busch@linux.intel.com>
> Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Björn Töpel <bjorn.topel@intel.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Tested-by: Keith Busch <keith.busch@linux.intel.com>
> Acked-by: John Fastabend <john.fastabend@gmail.com>
> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Eric Dumazet @ 2018-06-19 22:40 UTC (permalink / raw)
  To: Andreas Schwab, Eric Dumazet
  Cc: Mathieu Malaterre, David S. Miller, Eric Dumazet, LKML,
	Christophe LEROY, Meelis Roos, netdev, linuxppc-dev
In-Reply-To: <87lgbav24y.fsf@igel.home>



On 06/19/2018 03:32 PM, Andreas Schwab wrote:
> On Jun 19 2018, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
>> diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
>> index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..672d6748ab44f0890e92d5ca55d6ff6834c20dc9 100644
>> --- a/drivers/net/ethernet/sun/sungem.c
>> +++ b/drivers/net/ethernet/sun/sungem.c
>> @@ -60,8 +60,7 @@
>>  #include <linux/sungem_phy.h>
>>  #include "sungem.h"
>>  
>> -/* Stripping FCS is causing problems, disabled for now */
>> -#undef STRIP_FCS
>> +#define STRIP_FCS
>>  
>>  #define DEFAULT_MSG    (NETIF_MSG_DRV          | \
>>                          NETIF_MSG_PROBE        | \
>> @@ -435,7 +434,7 @@ static int gem_rxmac_reset(struct gem *gp)
>>         writel(desc_dma & 0xffffffff, gp->regs + RXDMA_DBLOW);
>>         writel(RX_RING_SIZE - 4, gp->regs + RXDMA_KICK);
>>         val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
>> -              ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
>> +              (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
>>         writel(val, gp->regs + RXDMA_CFG);
>>         if (readl(gp->regs + GREG_BIFCFG) & GREG_BIFCFG_M66EN)
>>                 writel(((5 & RXDMA_BLANK_IPKTS) |
>> @@ -857,6 +856,14 @@ static int gem_rx(struct gem *gp, int work_to_do)
>>  
>>                 csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
>>                 skb->csum = csum_unfold(csum);
>> +               {
>> +               __wsum rsum = csum_partial(skb->data + ETH_HLEN, len - ETH_HLEN, 0);
>> +               if (csum != csum_fold(rsum) && net_ratelimit())
>> +                       pr_err("sungem wrong csum : %x/%x, len %u bytes\n",
>> +                               csum, csum_fold(rsum), len);
>> +                       print_hex_dump(KERN_ERR, "raw data: ", DUMP_PREFIX_OFFSET,
>> +                                           16, 1, skb->data, len, true);
>> +               }
>>                 skb->ip_summed = CHECKSUM_COMPLETE;
>>                 skb->protocol = eth_type_trans(skb, gp->dev);
>>  
>> @@ -1761,7 +1768,7 @@ static void gem_init_dma(struct gem *gp)
>>         writel(0, gp->regs + TXDMA_KICK);
>>  
>>         val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
>> -              ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
>> +              (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
>>         writel(val, gp->regs + RXDMA_CFG);
>>  
>>         writel(desc_dma >> 32, gp->regs + RXDMA_DBHI);
> 
> With that patch I still get the wrong csum messages, but no longer the
> hw csum failure messages (tested on a PowerMac G5).
> 
> [  662.659767] sungem: sungem wrong csum : 8359/7ca6, len 86 bytes, c0000001fee9cc02
> [  662.659775] raw data: 00000000: 00 0d 93 43 81 62 d4 3d 7e 4c 48 b7 86 dd 61 01  ...C.b.=~LH...a.
> [  662.659778] raw data: 00000010: 1c 1e 00 20 06 40 20 01 0a 62 17 11 88 01 00 00  ... .@ ..b......
> [  662.659780] raw data: 00000020: 00 00 00 00 0a 38 20 01 0a 62 17 11 88 01 00 00  .....8 ..b......
> [  662.659783] raw data: 00000030: 00 00 00 00 00 07 9a 18 00 16 c1 9a 7e ea ea 44  ............~..D
> [  662.659785] raw data: 00000040: fb 4a 80 10 05 93 44 08 00 00 01 01 08 0a 59 68  .J....D.......Yh
> [  662.659788] raw data: 00000050: ba e2 0e bb ac ae                                ......
> 
> Andreas.
> 

Note that 8359 and 7ca6 are the same really (a missing ~ to invert csum_partial())

So the bug was that :

1) Driver programmed a wrong start offset for the csum  (7 bytes instead of 14 bytes to skip Ethernet Header)

2) FCS was not stripped.

Basically CHECKSUM_COMPLETE support never worked, but this was hidden by the fact that linux stack
had to throw away this CHECKSUM_COMPLETE because the FCS had to be removed.

Thanks !

^ permalink raw reply

* Re: [PATCH] net: nixge: Add __packed attribute to DMA descriptor struct
From: David Miller @ 2018-06-19 22:37 UTC (permalink / raw)
  To: f.fainelli; +Cc: mdf, keescook, netdev, linux-kernel
In-Reply-To: <832bebb8-300d-e911-2946-5edfe82dc30a@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue, 19 Jun 2018 10:13:55 -0700

> How could padding be inserted given than all of the structure members
> are naturally aligned (all u32 type). Compiler bug?

Agreed, this looks completely unnecessary.

__packed should only be used when absolutely necessary because using
it generates less efficient code on some architectures.

^ permalink raw reply

* Re: [PATCH 1/5] net: emaclite: Use __func__ instead of hardcoded name
From: Joe Perches @ 2018-06-19 22:37 UTC (permalink / raw)
  To: Andy Shevchenko, Radhey Shyam Pandey
  Cc: David S. Miller, Andrew Lunn, Michal Simek, netdev,
	linux-arm Mailing List, Linux Kernel Mailing List
In-Reply-To: <CAHp75VcKuqnUgynqXvACS7zY5=6aRxqymGE1w3Yh8sshehSm2g@mail.gmail.com>

On Wed, 2018-06-20 at 00:36 +0300, Andy Shevchenko wrote:
> On Mon, Jun 18, 2018 at 2:08 PM, Radhey Shyam Pandey
> <radhey.shyam.pandey@xilinx.com> wrote:
> > Switch hardcoded function name with a reference to __func__ making
> > the code more maintainable. Address below checkpatch warning:
> > 
> > WARNING: Prefer using '"%s...", __func__' to using 'xemaclite_mdio_read',
> > this function's name, in a string
> > +               "xemaclite_mdio_read(phy_id=%i, reg=%x) == %x\n",
> > 
> > WARNING: Prefer using '"%s...", __func__' to using 'xemaclite_mdio_write',
> > this function's name, in a string
> > +               "xemaclite_mdio_write(phy_id=%i, reg=%x, val=%x)\n",
> > 
> 
> For dev_dbg() the __func__ should be completely dropped away.

Not really the same.

dev_dbg without CONFIG_DYNAMIC_DEBUG does not have
the ability to prefix __func__.

^ permalink raw reply

* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Andreas Schwab @ 2018-06-19 22:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mathieu Malaterre, David S. Miller, Eric Dumazet, LKML,
	Christophe LEROY, Meelis Roos, netdev, linuxppc-dev
In-Reply-To: <05645b90-d3bc-466d-116f-548f3ee39de9@gmail.com>

On Jun 19 2018, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
> index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..672d6748ab44f0890e92d5ca55d6ff6834c20dc9 100644
> --- a/drivers/net/ethernet/sun/sungem.c
> +++ b/drivers/net/ethernet/sun/sungem.c
> @@ -60,8 +60,7 @@
>  #include <linux/sungem_phy.h>
>  #include "sungem.h"
>  
> -/* Stripping FCS is causing problems, disabled for now */
> -#undef STRIP_FCS
> +#define STRIP_FCS
>  
>  #define DEFAULT_MSG    (NETIF_MSG_DRV          | \
>                          NETIF_MSG_PROBE        | \
> @@ -435,7 +434,7 @@ static int gem_rxmac_reset(struct gem *gp)
>         writel(desc_dma & 0xffffffff, gp->regs + RXDMA_DBLOW);
>         writel(RX_RING_SIZE - 4, gp->regs + RXDMA_KICK);
>         val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
> -              ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
> +              (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
>         writel(val, gp->regs + RXDMA_CFG);
>         if (readl(gp->regs + GREG_BIFCFG) & GREG_BIFCFG_M66EN)
>                 writel(((5 & RXDMA_BLANK_IPKTS) |
> @@ -857,6 +856,14 @@ static int gem_rx(struct gem *gp, int work_to_do)
>  
>                 csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
>                 skb->csum = csum_unfold(csum);
> +               {
> +               __wsum rsum = csum_partial(skb->data + ETH_HLEN, len - ETH_HLEN, 0);
> +               if (csum != csum_fold(rsum) && net_ratelimit())
> +                       pr_err("sungem wrong csum : %x/%x, len %u bytes\n",
> +                               csum, csum_fold(rsum), len);
> +                       print_hex_dump(KERN_ERR, "raw data: ", DUMP_PREFIX_OFFSET,
> +                                           16, 1, skb->data, len, true);
> +               }
>                 skb->ip_summed = CHECKSUM_COMPLETE;
>                 skb->protocol = eth_type_trans(skb, gp->dev);
>  
> @@ -1761,7 +1768,7 @@ static void gem_init_dma(struct gem *gp)
>         writel(0, gp->regs + TXDMA_KICK);
>  
>         val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
> -              ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
> +              (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
>         writel(val, gp->regs + RXDMA_CFG);
>  
>         writel(desc_dma >> 32, gp->regs + RXDMA_DBHI);

With that patch I still get the wrong csum messages, but no longer the
hw csum failure messages (tested on a PowerMac G5).

[  662.659767] sungem: sungem wrong csum : 8359/7ca6, len 86 bytes, c0000001fee9cc02
[  662.659775] raw data: 00000000: 00 0d 93 43 81 62 d4 3d 7e 4c 48 b7 86 dd 61 01  ...C.b.=~LH...a.
[  662.659778] raw data: 00000010: 1c 1e 00 20 06 40 20 01 0a 62 17 11 88 01 00 00  ... .@ ..b......
[  662.659780] raw data: 00000020: 00 00 00 00 0a 38 20 01 0a 62 17 11 88 01 00 00  .....8 ..b......
[  662.659783] raw data: 00000030: 00 00 00 00 00 07 9a 18 00 16 c1 9a 7e ea ea 44  ............~..D
[  662.659785] raw data: 00000040: fb 4a 80 10 05 93 44 08 00 00 01 01 08 0a 59 68  .J....D.......Yh
[  662.659788] raw data: 00000050: ba e2 0e bb ac ae                                ......

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply

* Re: [PATCH v2 0/4] Slience NCSI logging
From: David Miller @ 2018-06-19 22:27 UTC (permalink / raw)
  To: joel; +Cc: sam, netdev, joe
In-Reply-To: <20180619053834.12257-1-joel@jms.id.au>

From: Joel Stanley <joel@jms.id.au>
Date: Tue, 19 Jun 2018 15:08:30 +0930

> v2:
>   Fix indent issue and commit message based on Joe's feedback
>   Add Sam's acks
> 
> Here are three changes to silence unnecessary warnings in the ncsi code.
> 
> The final patch adds Sam as the maintainer for NCSI.

Series applied.

^ permalink raw reply

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
From: Daniel Borkmann @ 2018-06-19 22:22 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, pablo, netfilter-devel, netdev
In-Reply-To: <20180617092304.fviy36bvgnhexwtb@gauss3.secunet.de>

On 06/17/2018 11:23 AM, Steffen Klassert wrote:
[...]
>> Would be curious about
>> the numbers. You'd get implicit batching for the forwarding via devmap
>> as well if you're required to flush it out via different device with
>> XDP_REDIRECT; otherwise XDP_TX of course. Given we have recently
>> integrated helpers for XDP to do a FIB and neighbor lookup from the
>> kernel tables, where it's thus shared and integrated with the rest of
>> the stack and tooling, it would be awesome to get to the same point
>> with xfrm as well. Eyal recently did a start on that for xfrm for tc
>> progs; would be nice to have integration on XDP as well, potentially
>> it might also result in a bigger plus on the forwarding numbers.
> 
> It might make sense to intrgrate XDP with xfrm to
> be able to compare numbers etc. But I need a working
> XDP setup and some understanding about it first, what
> could take some time.

Okay, no prob. If you have any questions feel free to shoot an email.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH net 0/3] qed*: Fix series.
From: David Miller @ 2018-06-19 22:18 UTC (permalink / raw)
  To: sudarsana.kalluru; +Cc: netdev, Ariel.Elior, Michal.Kalderon
In-Reply-To: <20180619045802.24050-1-sudarsana.kalluru@cavium.com>

From: Sudarsana Reddy Kalluru <sudarsana.kalluru@cavium.com>
Date: Mon, 18 Jun 2018 21:57:59 -0700

> From: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
> 
> The patch series fixes few issues in the qed/qede drivers.
> Please consider applying this series to "net".

Series applied.

^ permalink raw reply

* [PATCH net] enic: initialize enic->rfs_h.lock in enic_probe
From: Govindarajulu Varadarajan @ 2018-06-19 15:15 UTC (permalink / raw)
  To: davem, benve, netdev; +Cc: Govindarajulu Varadarajan

lockdep spotted that we are using rfs_h.lock in enic_get_rxnfc() without
initializing. rfs_h.lock is initialized in enic_open(). But ethtool_ops
can be called when interface is down.

Move enic_rfs_flw_tbl_init to enic_probe.

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 18 PID: 1189 Comm: ethtool Not tainted 4.17.0-rc7-devel+ #27
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
Call Trace:
dump_stack+0x85/0xc0
register_lock_class+0x550/0x560
? __handle_mm_fault+0xa8b/0x1100
__lock_acquire+0x81/0x670
lock_acquire+0xb9/0x1e0
?  enic_get_rxnfc+0x139/0x2b0 [enic]
_raw_spin_lock_bh+0x38/0x80
? enic_get_rxnfc+0x139/0x2b0 [enic]
enic_get_rxnfc+0x139/0x2b0 [enic]
ethtool_get_rxnfc+0x8d/0x1c0
dev_ethtool+0x16c8/0x2400
? __mutex_lock+0x64d/0xa00
? dev_load+0x6a/0x150
dev_ioctl+0x253/0x4b0
sock_do_ioctl+0x9a/0x130
sock_ioctl+0x1af/0x350
do_vfs_ioctl+0x8e/0x670
? syscall_trace_enter+0x1e2/0x380
ksys_ioctl+0x60/0x90
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x5a/0x170
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
---
 drivers/net/ethernet/cisco/enic/enic_clsf.c | 3 +--
 drivers/net/ethernet/cisco/enic/enic_main.c | 3 ++-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic_clsf.c b/drivers/net/ethernet/cisco/enic/enic_clsf.c
index 973c1fb70d09..99038dfc7fbe 100644
--- a/drivers/net/ethernet/cisco/enic/enic_clsf.c
+++ b/drivers/net/ethernet/cisco/enic/enic_clsf.c
@@ -79,7 +79,6 @@ void enic_rfs_flw_tbl_init(struct enic *enic)
 	enic->rfs_h.max = enic->config.num_arfs;
 	enic->rfs_h.free = enic->rfs_h.max;
 	enic->rfs_h.toclean = 0;
-	enic_rfs_timer_start(enic);
 }

 void enic_rfs_flw_tbl_free(struct enic *enic)
@@ -88,7 +87,6 @@ void enic_rfs_flw_tbl_free(struct enic *enic)

 	enic_rfs_timer_stop(enic);
 	spin_lock_bh(&enic->rfs_h.lock);
-	enic->rfs_h.free = 0;
 	for (i = 0; i < (1 << ENIC_RFS_FLW_BITSHIFT); i++) {
 		struct hlist_head *hhead;
 		struct hlist_node *tmp;
@@ -99,6 +97,7 @@ void enic_rfs_flw_tbl_free(struct enic *enic)
 			enic_delfltr(enic, n->fltr_id);
 			hlist_del(&n->node);
 			kfree(n);
+			enic->rfs_h.free++;
 		}
 	}
 	spin_unlock_bh(&enic->rfs_h.lock);
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index 30d2eaa18c04..e6ad581eadd8 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -1971,7 +1971,7 @@ static int enic_open(struct net_device *netdev)
 		vnic_intr_unmask(&enic->intr[i]);

 	enic_notify_timer_start(enic);
-	enic_rfs_flw_tbl_init(enic);
+	enic_rfs_timer_start(enic);

 	return 0;

@@ -2904,6 +2904,7 @@ static int enic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)

 	timer_setup(&enic->notify_timer, enic_notify_timer, 0);

+	enic_rfs_flw_tbl_init(enic);
 	enic_set_rx_coal_setting(enic);
 	INIT_WORK(&enic->reset, enic_reset);
 	INIT_WORK(&enic->tx_hang_reset, enic_tx_hang_reset);
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH] tc, bpf: add option to dump bpf verifier as C program fragment
From: Daniel Borkmann @ 2018-06-19 22:13 UTC (permalink / raw)
  To: David Ahern, Jakub Kicinski, Ophir Munk
  Cc: netdev, Stephen Hemminger, Thomas Monjalon, Olga Shern, ast
In-Reply-To: <30ac6974-a434-926b-5749-25c594e5841c@gmail.com>

On 06/18/2018 11:44 PM, David Ahern wrote:
> On 6/18/18 2:18 PM, Jakub Kicinski wrote:
>> On Sun, 17 Jun 2018 08:48:41 +0000, Ophir Munk wrote:
>>> Similar to cbpf used within tcpdump utility with a "-d" option to dump
>>> the compiled packet-matching code in a human readable form - tc has the
>>> "verbose" option to dump ebpf verifier output.
>>> Another useful option of cbpf using tcpdump "-dd" option is to dump
>>> packet-matching code a C program fragment. Similar to this - this commit
>>> adds a new tc ebpf option named "code" to dump ebpf verifier as C program
>>> fragment.
>>>
>>> Existing "verbose" option sample output:
>>>
>>> Verifier analysis:
>>> 0: (61) r2 = *(u32 *)(r1 +52)
>>> 1: (18) r3 = 0xdeadbeef
>>> 3: (63) *(u32 *)(r10 -4) = r3
>>> .
>>> .
>>> 11: (63) *(u32 *)(r1 +52) = r2
>>> 12: (18) r0 = 0xffffffff
>>> 14: (95) exit
>>>
>>> New "code" option sample output:
>>>
>>> /* struct bpf_insn cls_q_code[] = { */
>>> {0x61,    2,    1,       52, 0x00000000},
>>> {0x18,    3,    0,        0, 0xdeadbeef},
>>> {0x00,    0,    0,        0, 0x00000000},
>>> .
>>> .
>>> {0x63,    1,    2,       52, 0x00000000},
>>> {0x18,    0,    0,        0, 0xffffffff},
>>> {0x00,    0,    0,        0, 0x00000000},
>>> {0x95,    0,    0,        0, 0x00000000},
>>>
>>> Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
>>
>> Hmm... printing C arrays looks like hacky integration with some C
>> code...  Would you not be better served by simply using libbpf in
>> whatever is consuming this output?
> 
> I was thinking the same. bpftool would provide options too -- print the
> above, print in macro encodings and verifier. I gave an example of this
> side by side dump at netconf 2.1. Does not look like the slides made it
> online; see attached.

+1, I would also doubt that this adds a lot in terms of debuggability
when you're trying to load an object file with thousands of insns. Better
way would be to use llvm-objdump on the obj file to get to the annotated
disassembly, see also example in [0]. A .o to .c converter is wip for
libbpf/bpftool as presented in [1], which should provide the flexibility
to embedd an obj file.

Cheers,
Daniel

  [0] http://cilium.readthedocs.io/en/latest/bpf/#llvm
  [1] http://vger.kernel.org/netconf2018_files/AlexeiStarovoitov_netconf2018.pdf page 22

^ permalink raw reply

* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Eric Dumazet @ 2018-06-19 22:10 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Mathieu Malaterre, David S. Miller, Eric Dumazet, LKML,
	Christophe LEROY, Meelis Roos, netdev, linuxppc-dev
In-Reply-To: <e1790720-911b-4d04-2215-752ec612e3d7@gmail.com>



On 06/19/2018 01:10 PM, Eric Dumazet wrote:
> 
> 
> On 06/19/2018 12:10 PM, Andreas Schwab wrote:
>> On Jun 18 2018, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>> DUMP_PREFIX_ADDRESS might give us more information (say alignment problem, or crossing page boundaries)
>>
>> DUMP_PREFIX_ADDRESS is useless for that purpose.
>>
>> Here are some samples of broken csums:
>>
>> [  853.849225] sungem: sungem wrong csum : 9886/07be, len 94 bytes, c0000001fa187e02
>> [  853.849232] raw data: 00000000: 00 0d 93 43 81 62 18 d6 c7 51 b8 1c 08 00 45 10  ...C.b...Q....E.
>> [  853.849235] raw data: 00000010: 00 4c cb a0 40 00 40 11 d9 97 c0 a8 0a 01 c0 a8  .L..@.@.........
>> [  853.849237] raw data: 00000020: 0a 07 00 7b 00 7b 00 38 69 e1 1c 03 0c f7 00 00  ...{.{.8i.......
>> [  853.849240] raw data: 00000030: 08 f0 00 00 15 f0 c0 35 67 67 de d3 ca c9 d9 5b  .......5gg.....[
>> [  853.849242] raw data: 00000040: 1f ff de d3 d2 86 8f 67 fa f2 de d3 d2 86 8f 38  .......g.......8
>> [  853.849244] raw data: 00000050: 2f ff de d3 d2 86 8f 3b ff ff d1 93 bc 50        /......;.....P
> 
> Thanks.
> 
> 4 bytes in excess.
> 
> Might be the FCS, and it does not look like provided csum has a relation with it.
> 
> For some reason FCS stripping was disabled by :
> 
> commit 3e32011d4da6424b3bc65b1e1a047e30ac9882c7
> Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date:   Mon May 19 09:39:11 2003 -0700
> 
>     [SUNGEM]: Updates from PowerPC people.
>     
>     Support more chips and split out all the complex PHY
>     handling into a seperate file.
> 
> 
> Since this NIC never had CHECKSUM_COMPLETE support (since we have to trim each skb,
> thus were forcing ip_summed to CHECKSUM_NONE) we probably should remove it and be happy.
> 
> Unless you guys find a way to let the NIC strip the FCS, and double check the csum is a real csum ;)
> 
> diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
> index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..85439308375e95c3854e4a1561697d69ec85399b 100644
> --- a/drivers/net/ethernet/sun/sungem.c
> +++ b/drivers/net/ethernet/sun/sungem.c
> @@ -760,7 +760,6 @@ static int gem_rx(struct gem *gp, int work_to_do)
>         struct net_device *dev = gp->dev;
>         int entry, drops, work_done = 0;
>         u32 done;
> -       __sum16 csum;
>  
>         if (netif_msg_rx_status(gp))
>                 printk(KERN_DEBUG "%s: rx interrupt, done: %d, rx_new: %d\n",
> @@ -855,9 +854,6 @@ static int gem_rx(struct gem *gp, int work_to_do)
>                         skb = copy_skb;
>                 }
>  
> -               csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
> -               skb->csum = csum_unfold(csum);
> -               skb->ip_summed = CHECKSUM_COMPLETE;
>                 skb->protocol = eth_type_trans(skb, gp->dev);
>  
>                 napi_gro_receive(&gp->napi, skb);
> 


If you have time, you also could check if changing the suspect (14 / 2) to ETH_HLEN would help
(and also enabling STRIP_FCS)



diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..672d6748ab44f0890e92d5ca55d6ff6834c20dc9 100644
--- a/drivers/net/ethernet/sun/sungem.c
+++ b/drivers/net/ethernet/sun/sungem.c
@@ -60,8 +60,7 @@
 #include <linux/sungem_phy.h>
 #include "sungem.h"
 
-/* Stripping FCS is causing problems, disabled for now */
-#undef STRIP_FCS
+#define STRIP_FCS
 
 #define DEFAULT_MSG    (NETIF_MSG_DRV          | \
                         NETIF_MSG_PROBE        | \
@@ -435,7 +434,7 @@ static int gem_rxmac_reset(struct gem *gp)
        writel(desc_dma & 0xffffffff, gp->regs + RXDMA_DBLOW);
        writel(RX_RING_SIZE - 4, gp->regs + RXDMA_KICK);
        val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
-              ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
+              (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
        writel(val, gp->regs + RXDMA_CFG);
        if (readl(gp->regs + GREG_BIFCFG) & GREG_BIFCFG_M66EN)
                writel(((5 & RXDMA_BLANK_IPKTS) |
@@ -857,6 +856,14 @@ static int gem_rx(struct gem *gp, int work_to_do)
 
                csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
                skb->csum = csum_unfold(csum);
+               {
+               __wsum rsum = csum_partial(skb->data + ETH_HLEN, len - ETH_HLEN, 0);
+               if (csum != csum_fold(rsum) && net_ratelimit())
+                       pr_err("sungem wrong csum : %x/%x, len %u bytes\n",
+                               csum, csum_fold(rsum), len);
+                       print_hex_dump(KERN_ERR, "raw data: ", DUMP_PREFIX_OFFSET,
+                                           16, 1, skb->data, len, true);
+               }
                skb->ip_summed = CHECKSUM_COMPLETE;
                skb->protocol = eth_type_trans(skb, gp->dev);
 
@@ -1761,7 +1768,7 @@ static void gem_init_dma(struct gem *gp)
        writel(0, gp->regs + TXDMA_KICK);
 
        val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
-              ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
+              (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
        writel(val, gp->regs + RXDMA_CFG);
 
        writel(desc_dma >> 32, gp->regs + RXDMA_DBHI);

^ permalink raw reply related

* Re: [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss
From: Joe Perches @ 2018-06-19 21:56 UTC (permalink / raw)
  To: Andy Chittenden, 'Trond Myklebust'
  Cc: 'Andrew Morton', 'David Miller', kuznet, pekkas,
	jmorris, yoshfuji, kaber, eric.dumazet, William.Allen.Simpson,
	gilad, ilpo.jarvinen, netdev, linux-kernel, linux-nfs,
	'J. Bruce Fields', 'Neil Brown',
	'Chuck Lever', 'Benny Halevy',
	'Alexandros Batsakis', Andy Chittenden
In-Reply-To: <4c5fc9f3.487e0e0a.2960.ffffe4ec@mx.google.com>

On Mon, 2010-08-09 at 10:27 +0100, Andy Chittenden wrote:

You really need to check your clock.
Mail sent in the year 2010?

^ permalink raw reply

* Re: [PATCH] PCI: allow drivers to limit the number of VFs to 0
From: Bjorn Helgaas @ 2018-06-19 21:37 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Bjorn Helgaas, linux-pci, netdev, Sathya Perla, Felix Manlunas,
	alexander.duyck, john.fastabend, Jacob Keller, Donald Dutile,
	oss-drivers, Christoph Hellwig
In-Reply-To: <20180525140223.GA45098@bhelgaas-glaptop.roam.corp.google.com>

On Fri, May 25, 2018 at 09:02:23AM -0500, Bjorn Helgaas wrote:
> On Thu, May 24, 2018 at 06:20:15PM -0700, Jakub Kicinski wrote:
> > Hi Bjorn!
> > 
> > On Thu, 24 May 2018 18:57:48 -0500, Bjorn Helgaas wrote:
> > > On Mon, Apr 02, 2018 at 03:46:52PM -0700, Jakub Kicinski wrote:
> > > > Some user space depends on enabling sriov_totalvfs number of VFs
> > > > to not fail, e.g.:
> > > > 
> > > > $ cat .../sriov_totalvfs > .../sriov_numvfs
> > > > 
> > > > For devices which VF support depends on loaded FW we have the
> > > > pci_sriov_{g,s}et_totalvfs() API.  However, this API uses 0 as
> > > > a special "unset" value, meaning drivers can't limit sriov_totalvfs
> > > > to 0.  Remove the special values completely and simply initialize
> > > > driver_max_VFs to total_VFs.  Then always use driver_max_VFs.
> > > > Add a helper for drivers to reset the VF limit back to total.  
> > > 
> > > I still can't really make sense out of the changelog.
> > >
> > > I think part of the reason it's confusing is because there are two
> > > things going on:
> > > 
> > >   1) You want this:
> > >   
> > >        pci_sriov_set_totalvfs(dev, 0);
> > >        x = pci_sriov_get_totalvfs(dev) 
> > > 
> > >      to return 0 instead of total_VFs.  That seems to connect with
> > >      your subject line.  It means "sriov_totalvfs" in sysfs could be
> > >      0, but I don't know how that is useful (I'm sure it is; just
> > >      educate me :))
> > 
> > Let me just quote the bug report that got filed on our internal bug
> > tracker :)
> > 
> >   When testing Juju Openstack with Ubuntu 18.04, enabling SR-IOV causes
> >   errors because Juju gets the sriov_totalvfs for SR-IOV-capable device
> >   then tries to set that as the sriov_numvfs parameter.
> > 
> >   For SR-IOV incapable FW, the sriov_totalvfs parameter should be 0, 
> >   but it's set to max.  When FW is switched to flower*, the correct 
> >   sriov_totalvfs value is presented.
> > 
> > * flower is a project name
> 
> From the point of view of the PCI core (which knows nothing about
> device firmware and relies on the architected config space described
> by the PCIe spec), this sounds like an erratum: with some firmware
> installed, the device is not capable of SR-IOV, but still advertises
> an SR-IOV capability with "TotalVFs > 0".
> 
> Regardless of whether that's an erratum, we do allow PF drivers to use
> pci_sriov_set_totalvfs() to limit the number of VFs that may be
> enabled by writing to the PF's "sriov_numvfs" sysfs file.
> 
> But the current implementation does not allow a PF driver to limit VFs
> to 0, and that does seem nonsensical.
> 
> > My understanding is OpenStack uses sriov_totalvfs to determine how many
> > VFs can be enabled, looks like this is the code:
> > 
> > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n464
> > 
> > >   2) You're adding the pci_sriov_reset_totalvfs() interface.  I'm not
> > >      sure what you intend for this.  Is *every* driver supposed to
> > >      call it in .remove()?  Could/should this be done in the core
> > >      somehow instead of depending on every driver?
> > 
> > Good question, I was just thinking yesterday we may want to call it
> > from the core, but I don't think it's strictly necessary nor always
> > sufficient (we may reload FW without re-probing).
> > 
> > We have a device which supports different number of VFs based on the FW
> > loaded.  Some legacy FWs does not inform the driver how many VFs it can
> > support, because it supports max.  So the flow in our driver is this:
> > 
> > load_fw(dev);
> > ...
> > max_vfs = ask_fw_for_max_vfs(dev);
> > if (max_vfs >= 0)
> > 	return pci_sriov_set_totalvfs(dev, max_vfs);
> > else /* FW didn't tell us, assume max */
> > 	return pci_sriov_reset_totalvfs(dev); 
> > 
> > We also reset the max on device remove, but that's not strictly
> > necessary.
> > 
> > Other users of pci_sriov_set_totalvfs() always know the value to set
> > the total to (either always get it from FW or it's a constant).
> > 
> > If you prefer we can work out the correct max for those legacy cases in
> > the driver as well, although it seemed cleaner to just ask the core,
> > since it already has total_VFs value handy :)
> > 
> > > I'm also having a hard time connecting your user-space command example
> > > with the rest of this.  Maybe it will make more sense to me tomorrow
> > > after some coffee.
> > 
> > OpenStack assumes it will always be able to set sriov_numvfs to
> > sriov_totalvfs, see this 'if':
> > 
> > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n512
> 
> Thanks for educating me.  I think there are two issues here that we
> can separate.  I extracted the patch below for the first.
> 
> The second is the question of resetting driver_max_VFs.  I think we
> currently have a general issue in the core:
> 
>   - load PF driver 1
>   - driver calls pci_sriov_set_totalvfs() to reduce driver_max_VFs
>   - unload PF driver 1
>   - load PF driver 2
> 
> Now driver_max_VFs is still stuck at the lower value set by driver 1.
> I don't think that's the way this should work.
> 
> I guess this is partly a consequence of setting driver_max_VFs in
> sriov_init(), which is called before driver attach and should only
> depend on hardware characteristics, so it is related to the patch
> below.  But I think we should fix it in general, not just for
> netronome.

Hi Jakub, the patch below is in v4.18-rc1 as 8d85a7a4f2c9 ("PCI/IOV:
Allow PF drivers to limit total_VFs to 0").  If there's more we need
to do here, would you mind rebasing what's left to v4.18-rc1 and
reposting it?

> commit 4a338bc6f94b9ad824ac944f5dfc249d6838719c
> Author: Jakub Kicinski <jakub.kicinski@netronome.com>
> Date:   Fri May 25 08:18:34 2018 -0500
> 
>     PCI/IOV: Allow PF drivers to limit total_VFs to 0
>     
>     Some SR-IOV PF drivers implement .sriov_configure(), which allows
>     user-space to enable VFs by writing the desired number of VFs to the sysfs
>     "sriov_numvfs" file (see sriov_numvfs_store()).
>     
>     The PCI core limits the number of VFs to the TotalVFs advertised by the
>     device in its SR-IOV capability.  The PF driver can limit the number of VFs
>     to even fewer (it may have pre-allocated data structures or knowledge of
>     device limitations) by calling pci_sriov_set_totalvfs(), but previously it
>     could not limit the VFs to 0.
>     
>     Change pci_sriov_get_totalvfs() so it always respects the VF limit imposed
>     by the PF driver, even if the limit is 0.
>     
>     This sequence:
>     
>       pci_sriov_set_totalvfs(dev, 0);
>       x = pci_sriov_get_totalvfs(dev);
>     
>     previously set "x" to TotalVFs from the SR-IOV capability.  Now it will set
>     "x" to 0.
>     
>     Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 192b82898a38..d0d73dbbd5ca 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -469,6 +469,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
>  	iov->nres = nres;
>  	iov->ctrl = ctrl;
>  	iov->total_VFs = total;
> +	iov->driver_max_VFs = total;
>  	pci_read_config_word(dev, pos + PCI_SRIOV_VF_DID, &iov->vf_device);
>  	iov->pgsz = pgsz;
>  	iov->self = dev;
> @@ -827,10 +828,7 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
>  	if (!dev->is_physfn)
>  		return 0;
>  
> -	if (dev->sriov->driver_max_VFs)
> -		return dev->sriov->driver_max_VFs;
> -
> -	return dev->sriov->total_VFs;
> +	return dev->sriov->driver_max_VFs;
>  }
>  EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
>  

^ permalink raw reply

* Re: [PATCH 1/5] net: emaclite: Use __func__ instead of hardcoded name
From: Andy Shevchenko @ 2018-06-19 21:36 UTC (permalink / raw)
  To: Radhey Shyam Pandey
  Cc: David S. Miller, Andrew Lunn, Michal Simek, netdev,
	linux-arm Mailing List, Linux Kernel Mailing List
In-Reply-To: <1529320103-7711-2-git-send-email-radhey.shyam.pandey@xilinx.com>

On Mon, Jun 18, 2018 at 2:08 PM, Radhey Shyam Pandey
<radhey.shyam.pandey@xilinx.com> wrote:
> Switch hardcoded function name with a reference to __func__ making
> the code more maintainable. Address below checkpatch warning:
>
> WARNING: Prefer using '"%s...", __func__' to using 'xemaclite_mdio_read',
> this function's name, in a string
> +               "xemaclite_mdio_read(phy_id=%i, reg=%x) == %x\n",
>
> WARNING: Prefer using '"%s...", __func__' to using 'xemaclite_mdio_write',
> this function's name, in a string
> +               "xemaclite_mdio_write(phy_id=%i, reg=%x, val=%x)\n",
>

For dev_dbg() the __func__ should be completely dropped away.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply

* [net] bpf, xdp, i40e: fix i40e_build_skb skb reserve and truesize
From: Jeff Kirsher @ 2018-06-19 21:33 UTC (permalink / raw)
  To: davem
  Cc: Daniel Borkmann, netdev, nhorman, sassmann, jogreene,
	Björn Töpel, John Fastabend, Jeff Kirsher

From: Daniel Borkmann <daniel@iogearbox.net>

Using skb_reserve(skb, I40E_SKB_PAD + (xdp->data - xdp->data_hard_start))
is clearly wrong since I40E_SKB_PAD already points to the offset where
the original xdp->data was sitting since xdp->data_hard_start is defined
as xdp->data - i40e_rx_offset(rx_ring) where latter offsets to I40E_SKB_PAD
when build skb is used.

However, also before cc5b114dcf98 ("bpf, i40e: add meta data support")
this seems broken since bpf_xdp_adjust_head() helper could have been used
to alter headroom and enlarge / shrink the frame and with that the assumption
that the xdp->data remains unchanged does not hold and would push a bogus
packet to upper stack.

ixgbe got this right in 924708081629 ("ixgbe: add XDP support for pass and
drop actions"). In any case, fix it by removing the I40E_SKB_PAD from both
skb_reserve() and truesize calculation.

Fixes: cc5b114dcf98 ("bpf, i40e: add meta data support")
Fixes: 0c8493d90b6b ("i40e: add XDP support for pass and drop actions")
Reported-by: Keith Busch <keith.busch@linux.intel.com>
Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Tested-by: Keith Busch <keith.busch@linux.intel.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 8ffb7454e67c..ed6dbcfd4e96 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2103,9 +2103,8 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
 	unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
 #else
 	unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +
-				SKB_DATA_ALIGN(I40E_SKB_PAD +
-					       (xdp->data_end -
-						xdp->data_hard_start));
+				SKB_DATA_ALIGN(xdp->data_end -
+					       xdp->data_hard_start);
 #endif
 	struct sk_buff *skb;
 
@@ -2124,7 +2123,7 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
 		return NULL;
 
 	/* update pointers within the skb to store the data */
-	skb_reserve(skb, I40E_SKB_PAD + (xdp->data - xdp->data_hard_start));
+	skb_reserve(skb, xdp->data - xdp->data_hard_start);
 	__skb_put(skb, xdp->data_end - xdp->data);
 	if (metasize)
 		skb_metadata_set(skb, metasize);
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH bpf-net] bpf: Change bpf_fib_lookup to return lookup status
From: Martin KaFai Lau @ 2018-06-19 21:24 UTC (permalink / raw)
  To: David Ahern; +Cc: dsahern, netdev, borkmann, ast, davem
In-Reply-To: <7697c6fd-7871-4d96-bd1c-c81a7d70ec39@gmail.com>

On Tue, Jun 19, 2018 at 02:16:53PM -0600, David Ahern wrote:
> On 6/19/18 10:36 AM, Martin KaFai Lau wrote:
> > On Tue, Jun 19, 2018 at 09:34:28AM -0600, David Ahern wrote:
> >> On 6/19/18 9:25 AM, Martin KaFai Lau wrote:
> >>> On Mon, Jun 18, 2018 at 03:35:25PM -0600, David Ahern wrote:
> >>>> On 6/18/18 2:55 PM, Martin KaFai Lau wrote:
> >>>>>> 	/* rc > 0 case */
> >>>>>> 	switch(rc) {
> >>>>>> 	case BPF_FIB_LKUP_RET_BLACKHOLE:
> >>>>>> 	case BPF_FIB_LKUP_RET_UNREACHABLE:
> >>>>>> 	case BPF_FIB_LKUP_RET_PROHIBIT:
> >>>>>> 		return XDP_DROP;
> >>>>>> 	}
> >>>>>>
> >>>>>> For the others it becomes a question of do we share why the stack needs
> >>>>>> to be involved? Maybe the program wants to collect stats to show traffic
> >>>>>> patterns that can be improved (BPF_FIB_LKUP_RET_FRAG_NEEDED) or support
> >>>>>> in the kernel needs to be improved (BPF_FIB_LKUP_RET_UNSUPP_LWT) or an
> >>>>>> interface is misconfigured (BPF_FIB_LKUP_RET_FWD_DISABLED).
> >>>>> Thanks for the explanation.
> >>>>>
> >>>>> Agree on the bpf able to collect stats will be useful.
> >>>>>
> >>>>> I am wondering, if a new BPF_FIB_LKUP_RET_XYZ is added later,
> >>>>> how may the old xdp_prog work/not-work?  As of now, the return value
> >>>>> is straight forward, FWD, PASS (to stack) or DROP (error).
> >>>>> With this change, the xdp_prog needs to match/switch() the
> >>>>> BPF_FIB_LKUP_RET_* to at least PASS and DROP.
> >>>>
> >>>> IMO, programs should only call XDP_DROP for known reasons - like the 3
> >>>> above. Anything else punt to the stack.
> >>>>
> >>>> If a new RET_XYZ comes along:
> >>>> 1. the new XYZ is a new ACL response where the packet is to be dropped.
> >>>> If the program does not understand XYZ and punts to the stack
> >>>> (recommendation), then a second lookup is done during normal packet
> >>>> processing and the stack drops it.
> >>>>
> >>>> 2. the new XYZ is a new path in the kernel that is unsupported with
> >>>> respect to XDP forwarding, nothing new for the program to do.
> >>>>
> >>>> Either way I would expect stats on BPF_FIB_LKUP_RET_* to give a hint to
> >>>> the program writer.
> >>>>
> >>>> Worst case of punting packets to the stack for any rc != 0 means the
> >>>> stack is doing 2 lookups - 1 in XDP based on its lookup parameters and 1
> >>>> in normal stack processing - to handle the packet.
> >>> Instead of having the xdp_prog to follow the meaning of what RET_SYZ is,
> >>> should the bpf_*_fib_lookup() return value be kept as is such that
> >>> the xdp_prog is clear what to do.  The reason can be returned in
> >>> the 'struct bpf_fib_lookup'.  The number of reasons can be extended.
> >>> If the xdp_prog does not understand a reason, it still will not
> >>> affect its decision because the return value is clear.
> >>> I think the situation here is similar to regular syscall which usually
> >>> uses -1 to clearly states error and errno to spells out the reason.
> >>>
> >>
> >> I did consider returning the status in struct bpf_fib_lookup. However,
> >> it is 64 bytes and can not be extended without a big performance
> >> penalty, so the only option there is to make an existing entry a union
> >> the most logical of which is the ifindex. It seemed odd to me to have
> >> the result by hidden in the struct as a union on ifindex and returning
> >> the egress index from the function:
> >>
> >> @@ -2625,7 +2636,11 @@ struct bpf_fib_lookup {
> >>
> >>         /* total length of packet from network header - used for MTU
> >> check */
> >>         __u16   tot_len;
> >> -       __u32   ifindex;  /* L3 device index for lookup */
> >> +
> >> +       union {
> >> +               __u32   ifindex;  /* input: L3 device index for lookup */
> >> +               __u32   result;   /* output: one of BPF_FIB_LKUP_RET_* */
> >> +       };
> >>
> >>
> >> It seemed more natural to have ifindex stay ifindex and only change
> >> value on return:
> >>
> >> @@ -2625,7 +2639,11 @@ struct bpf_fib_lookup {
> >>
> >>  	/* total length of packet from network header - used for MTU check */
> >>  	__u16	tot_len;
> >> -	__u32	ifindex;  /* L3 device index for lookup */
> >> +
> >> +	/* input: L3 device index for lookup
> >> +	 * output: nexthop device index from FIB lookup
> >> +	 */
> >> +	__u32	ifindex;
> >>
> >>  	union {
> >>  		/* inputs to lookup */
> >>
> >>
> >> From a program's perspective:
> >>
> >> rc < 0  -- program is passing incorrect data
> >> rc == 0 -- packet can be forwarded
> >> rc > 0  -- packet can not be forwarded.
> >>
> >> BPF programs are not required to track the LKUP_RET values any more than
> >> a function returning multiple negative values - the caller just checks
> >> rc < 0 means failure. If the program cares it can look at specific
> >> values of rc to see the specific value.
> >>
> >> The same applies with the LKUP_RET values - they are there to provide
> >> insight into why the packet is not forwarded directly if the program
> >> cares to know why.
> > hmm...ic. My concern is, the prog can interpret rc > 0 (in this patch) to be
> > drop vs pass (although we can advise them in bpf.h to always pass if it does
> > not understand a rc but it is not a strong contract),  it may catch people
> > a surprise if a xdp_prog suddenly drops everything when running in a
> > newer kernel where the upper stack can actually handle it.
> > 
> > while the current behavior (i.e. before this patch, rc == 0) is always pass
> > to the stack.
> > 
> > I think at least comments should be put in the enum such that
> > the xdp/tc_prog should expect the enum could be extended later, so
> > the suggested behavior should be a pass for unknown LKUP_RET and let
> > the stack to decide.
> > 
> 
> All APIs with enums have the inherent quality that more can be added.
> Nothing about rc > 0 says it is ok to drop the packet and nothing in the
> documentation says it is ok to drop the packet. The program author needs
> to look at the extra information provided by the rc. Specific values are
> a hint that yes the packet can be dropped; others merely say 'packet
> needs help from the stack' with a reason why it needs help.
Right, I understand there are values that hint drop (as your earlier example)
and there are values that hint pass to stack.  and either set of these values
could be extended later.

> 
> My intention is to allow the XDP program to understand FIB based ACLs.
> To that end a fib lookup result of blackhole, unreachable and prohibit
> needs to be returned to the xdp program with the effective summary of
> "can drop in the xdp program".
> 
> If the other return codes are going to cause confusion then less shorten
> the list:
> 
> enum {
> 	BPF_FIB_LKUP_RET_SUCCESS,      /* packet is to be forwareded */
> 	BPF_FIB_LKUP_RET_CAN_DROP,     /* XDP program can drop the packet */
> 	BPF_FIB_LKUP_RET_NEED_STACK,   /* packet needs full stack assist */
> };
> 
> But, that still does not solve the problem of rc > 0 means xdp program
> can drop the packet, but then that is not the intention of rc > 0. The
> intention is only "here's more information about why this packet can not
> be forwarded at this layer"
ok. Please spell out this intention in bpf.h so that the writer will not
default to DROP for the reason that it does not understand.

^ permalink raw reply

* Re: [PATCH net] ipvlan: call dev_change_flags when reset ipvlan mode
From: Cong Wang @ 2018-06-19 21:10 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: Linux Kernel Network Developers, Stefano Brivio, Paolo Abeni,
	David Miller, Mahesh Bandewar
In-Reply-To: <1529330677-15328-1-git-send-email-liuhangbin@gmail.com>

On Mon, Jun 18, 2018 at 7:04 AM, Hangbin Liu <liuhangbin@gmail.com> wrote:
> @@ -94,10 +95,13 @@ static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval)
>                         mdev->l3mdev_ops = NULL;
>                 }
>                 list_for_each_entry(ipvlan, &port->ipvlans, pnode) {
> +                       flags = ipvlan->dev->flags;
>                         if (nval == IPVLAN_MODE_L3 || nval == IPVLAN_MODE_L3S)
> -                               ipvlan->dev->flags |= IFF_NOARP;
> +                               dev_change_flags(ipvlan->dev,
> +                                                flags | IFF_NOARP);
>                         else
> -                               ipvlan->dev->flags &= ~IFF_NOARP;
> +                               dev_change_flags(ipvlan->dev,
> +                                                flags & ~IFF_NOARP);

You need to check the return value of dev_change_flags().

^ permalink raw reply

* Re: iproute2 won't compile without AF_VSOCK
From: Steve Wise @ 2018-06-19 20:41 UTC (permalink / raw)
  To: David Ahern, Stephen Hemminger; +Cc: netdev
In-Reply-To: <d6f6b60d-6dad-fd0c-cdc5-b560fe6da12d@gmail.com>



On 6/19/2018 3:29 PM, David Ahern wrote:
> On 6/19/18 2:27 PM, David Ahern wrote:
>> On 6/19/18 9:47 AM, Stephen Hemminger wrote:
>>> On Tue, 19 Jun 2018 10:17:45 -0500
>>> Steve Wise <swise@opengridcomputing.com> wrote:
>>>
>>>> Hey David,
>>>>
>>>> I'm trying to compile the latest iproute2 on an RHEL-7.3 distro, and it
>>>> fails to compile because AF_VSOCK is not defined.  Should this
>>>> functionality be a configure option to disable it on older distros?
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Steve.
>>>>
>>>> ----
>>>>
>>>> misc
>>>>     CC       ss.o
>>>> ss.c:301:27: error: ‘AF_VSOCK’ undeclared here (not in a function)
>>>>    .families = FAMILY_MASK(AF_VSOCK),
>>>>                            ^
>>>> ss.c:252:46: note: in definition of macro ‘FAMILY_MASK’
>>>>  #define FAMILY_MASK(family) ((uint64_t)1 << (family))
>>>>                                               ^
>>>> ss.c:334:2: error: array index in initializer not of integer type
>>>>   [AF_VSOCK] = {
>>>>   ^
>>>> ss.c:334:2: error: (near initialization for ‘default_afs’)
>>>> make[1]: *** [ss.o] Error 1
>>>> make: *** [all] Error 2
>>>>
>>> Probably should just add an #ifdef to takeout that if not present
>>>
>> Most userspace tools have a compat header for cases like this.
>>
>> #ifndef AF_VSOCK
>> #define AF_VSOCK 	40
>> #endif
>>
> Add the above to include//utils.h; AF_MPLS is already there.

I'll send out a patch.

Thanks,

Steve.

^ permalink raw reply

* Re: [bug] cxgb4: vrf stopped working with cxgb4 card
From: AMG Zollner Robert @ 2018-06-19 20:32 UTC (permalink / raw)
  To: Ganesh Goudar; +Cc: office, dsa, netdev
In-Reply-To: <20180619132425.GA6576@chelsio.com>

On 19.06.2018 16:24, Ganesh Goudar wrote:
> On Monday, June 06/11/18, 2018 at 14:47:55 +0530, Ganesh Goudar wrote:
>> On Saturday, June 06/09/18, 2018 at 18:47:55 -0600, David Ahern wrote:
>>> Ganesh:
>>>
>>> On 6/4/18 9:03 AM, AMG Zollner Robert wrote:
>>>> I have noticed that vrf is not working with kernel v4.15.0 but was
>>>> working with v4.13.0 when using cxgb4 Chelsio driver (T520-cr)
>>>>
>>>> Setup:
>>>> Two metal servers with a T520-cr card each, directly connected without a
>>>> switch in between.
>>>>
>>>>         SVR1  only ipfwd                 SVR2     with vrf
>>>> .----------------------------. .----------------------------------.
>>>> |                            |         |             |
>>>> |    192.168.8.1 [  ens2f4]--|---------|--[ens1f4] 192.168.8.2   |
>>>> |    192.168.9.1 [ens2f4d1]--|---------|--<ens1f4d1> 192.168.9.2 VRF=10   |
>>>> `----------------------------' `----------------------------------'
>>>>
>>>> When vrf is not working there are no error messages (dmesg or iproute
>>>> commands), tcpdump on the interface (SVR2.ens1f4d1) enslaved in vrf 10
>>>> shows packets(arp req/reply) coming in and going out, but outgoing
>>>> packets(arp reply) do not reach the other server SVR1.ens2f4d1
>>>>
>>>>
>>>> Bisect:
>>>> Found this commit to be the problem after doing a git bisect between
>>>> v4.13..v4.15:
>>>>
>>>> commit ba581f77df23c8ee70b372966e69cf10bc5453d8
>>>> Author: Ganesh Goudar <ganeshgr@chelsio.com>
>>>> Date:   Sat Sep 23 16:07:28 2017 +0530
>>>>
>>>>      cxgb4: do DCB state reset in couple of places
>>>>
>>>>      reset the driver's DCB state in couple of places
>>>>      where it was missing.
>>>
>>> Are you working on a fix for this or should a revert of the above patch
>>> be sent?
>> Will look into it and fix/revert it soon, Thanks for responding to Robert.
>>>
>>>
>>>>
>>>>
>>>> A bisect step was considered good when:
>>>> - successful ping from SVR1 to SVR2.ens1f4d1 vrf interface
>>>> - successful ping from SVR2 global to SVR2 vrf interface trough SVR1(l3
>>>> forwarding) (this check was redundant,both tests fail or pass simultaneous)
>>>>
>>>> The problem is still present on recent kernels also, checked v4.16.0 and
>>>> v4.17.rc7
>>>>
>>>> Disabling DCB for the card support fixes the problem ( Compiling kernel
>>>> with "CONFIG_CHELSIO_T4_DCB=n")
>>>>
>>>>
>>>>
>>>> This is my first time reporting a bug to the linux kernel and hope I
>>>> have included the right amount of information. Please let me know if I
>>>> have missed something.
>>>>
>>>>
>>>>
>>>> Thank you,
>>>> Zollner Robert
>>>>
>>>>
>>>> --------
>>>> Logs:
>>>>
>>>> VRF configured using folowing commands:
>>>>
>>>> #!/bin/sh
>>>>
>>>> CHDEV=ens1f4
>>>> VRF=vrf-recv
>>>>
>>>> sysctl -w net.ipv4.tcp_l3mdev_accept=1
>>>> sysctl -w net.ipv4.udp_l3mdev_accept=1
>>>> sysctl -w net.ipv4.conf.all.accept_local=1
>>>>
>>>> ifconfig ${CHDEV}   192.168.8.2/24
>>>> ifconfig ${CHDEV}d1 192.168.9.2/24
>>>>
>>>> ip link add ${VRF} type vrf table 10
>>>> ip link set dev ${VRF} up
>>>>
>>>> ip rule add pref 32765 table local
>>>> ip rule del pref 0
>>>>
>>>> ip route add table 10 unreachable default metric 4278198272
>>>>
>>>> ip link set dev ${CHDEV}d1 master ${VRF}
>>>>
>>>> ip route add table 10 default via 192.168.9.1
>>>> ip route add 192.168.9.0/24 via 192.168.8.1
>>>>
>>>>
>>>>
>>>>
>>>
> -netdev, Please feel free to add if needed.
> 
> Hi Robert,
> 
> My knowledge of VRF is very limited, I am trying to bring
> up VRF setup, I just wanted to check if you are doing anything
> related DCB and also please let me know how did you setup SRV1.
> 
> Thanks
> 

Hello Ganesh,

SRV1 is just forwarding(l3) between the two physical ports of the 
T520-CR card.

ifconfig ens1f4   192.168.8.1/24
ifconfig ens1f4d1 192.168.9.1/24
sysctl -w net.ipv4.ip_forward=1

- No VRF is configured on this box
- DCB is also not used


SVR2 is using VRF and is configured with the script inlined in the first 
email.

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-19 20:32 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Siwei Liu, Samudrala, Sridhar, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <20180619125453.2d2dfb2d.cohuck@redhat.com>

On Tue, Jun 19, 2018 at 12:54:53PM +0200, Cornelia Huck wrote:
> Sorry about dragging mainframes into this, but this will only work for
> homogenous device coupling, not for heterogenous. Consider my vfio-pci
> + virtio-net-ccw example again: The guest cannot find out that the two
> belong together by checking some group ID, it has to either use the MAC
> or some needs-to-be-architectured property.
> 
> Alternatively, we could propose that mechanism as pci-only, which means
> we can rely on mechanisms that won't necessarily work on non-pci
> transports. (FWIW, I don't see a use case for using vfio-ccw to pass
> through a network card anytime in the near future, due to the nature of
> network cards currently in use on s390.)

That's what it boils down to, yes.  If there's need to have this for
non-pci devices, then we should put it in config space.
Cornelia, what do you think?

-- 
MST

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox