Netdev List
 help / color / mirror / Atom feed
* Re: iproute2 won't compile without AF_VSOCK
From: David Ahern @ 2018-06-19 20:27 UTC (permalink / raw)
  To: Stephen Hemminger, Steve Wise; +Cc: netdev
In-Reply-To: <20180619084732.0de6d75d@xeon-e3>

On 6/19/18 9:47 AM, Stephen Hemminger wrote:
> On Tue, 19 Jun 2018 10:17:45 -0500
> Steve Wise <swise@opengridcomputing.com> wrote:
> 
>> Hey David,
>>
>> I'm trying to compile the latest iproute2 on an RHEL-7.3 distro, and it
>> fails to compile because AF_VSOCK is not defined.  Should this
>> functionality be a configure option to disable it on older distros?
>>
>>
>> Thanks,
>>
>> Steve.
>>
>> ----
>>
>> misc
>>     CC       ss.o
>> ss.c:301:27: error: ‘AF_VSOCK’ undeclared here (not in a function)
>>    .families = FAMILY_MASK(AF_VSOCK),
>>                            ^
>> ss.c:252:46: note: in definition of macro ‘FAMILY_MASK’
>>  #define FAMILY_MASK(family) ((uint64_t)1 << (family))
>>                                               ^
>> ss.c:334:2: error: array index in initializer not of integer type
>>   [AF_VSOCK] = {
>>   ^
>> ss.c:334:2: error: (near initialization for ‘default_afs’)
>> make[1]: *** [ss.o] Error 1
>> make: *** [all] Error 2
>>
> 
> Probably should just add an #ifdef to takeout that if not present
> 

Most userspace tools have a compat header for cases like this.

#ifndef AF_VSOCK
#define AF_VSOCK 	40
#endif

^ permalink raw reply

* Re: iproute2 won't compile without AF_VSOCK
From: David Ahern @ 2018-06-19 20:29 UTC (permalink / raw)
  To: Stephen Hemminger, Steve Wise; +Cc: netdev
In-Reply-To: <f8c68944-57b6-18a6-aa50-c019baa2a63b@gmail.com>

On 6/19/18 2:27 PM, David Ahern wrote:
> On 6/19/18 9:47 AM, Stephen Hemminger wrote:
>> On Tue, 19 Jun 2018 10:17:45 -0500
>> Steve Wise <swise@opengridcomputing.com> wrote:
>>
>>> Hey David,
>>>
>>> I'm trying to compile the latest iproute2 on an RHEL-7.3 distro, and it
>>> fails to compile because AF_VSOCK is not defined.  Should this
>>> functionality be a configure option to disable it on older distros?
>>>
>>>
>>> Thanks,
>>>
>>> Steve.
>>>
>>> ----
>>>
>>> misc
>>>     CC       ss.o
>>> ss.c:301:27: error: ‘AF_VSOCK’ undeclared here (not in a function)
>>>    .families = FAMILY_MASK(AF_VSOCK),
>>>                            ^
>>> ss.c:252:46: note: in definition of macro ‘FAMILY_MASK’
>>>  #define FAMILY_MASK(family) ((uint64_t)1 << (family))
>>>                                               ^
>>> ss.c:334:2: error: array index in initializer not of integer type
>>>   [AF_VSOCK] = {
>>>   ^
>>> ss.c:334:2: error: (near initialization for ‘default_afs’)
>>> make[1]: *** [ss.o] Error 1
>>> make: *** [all] Error 2
>>>
>>
>> Probably should just add an #ifdef to takeout that if not present
>>
> 
> Most userspace tools have a compat header for cases like this.
> 
> #ifndef AF_VSOCK
> #define AF_VSOCK 	40
> #endif
> 

Add the above to include//utils.h; AF_MPLS is already there.

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-19 20:32 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Siwei Liu, Samudrala, Sridhar, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <20180619125453.2d2dfb2d.cohuck@redhat.com>

On Tue, Jun 19, 2018 at 12:54:53PM +0200, Cornelia Huck wrote:
> Sorry about dragging mainframes into this, but this will only work for
> homogenous device coupling, not for heterogenous. Consider my vfio-pci
> + virtio-net-ccw example again: The guest cannot find out that the two
> belong together by checking some group ID, it has to either use the MAC
> or some needs-to-be-architectured property.
> 
> Alternatively, we could propose that mechanism as pci-only, which means
> we can rely on mechanisms that won't necessarily work on non-pci
> transports. (FWIW, I don't see a use case for using vfio-ccw to pass
> through a network card anytime in the near future, due to the nature of
> network cards currently in use on s390.)

That's what it boils down to, yes.  If there's need to have this for
non-pci devices, then we should put it in config space.
Cornelia, what do you think?

-- 
MST

^ permalink raw reply

* Re: [bug] cxgb4: vrf stopped working with cxgb4 card
From: AMG Zollner Robert @ 2018-06-19 20:32 UTC (permalink / raw)
  To: Ganesh Goudar; +Cc: office, dsa, netdev
In-Reply-To: <20180619132425.GA6576@chelsio.com>

On 19.06.2018 16:24, Ganesh Goudar wrote:
> On Monday, June 06/11/18, 2018 at 14:47:55 +0530, Ganesh Goudar wrote:
>> On Saturday, June 06/09/18, 2018 at 18:47:55 -0600, David Ahern wrote:
>>> Ganesh:
>>>
>>> On 6/4/18 9:03 AM, AMG Zollner Robert wrote:
>>>> I have noticed that vrf is not working with kernel v4.15.0 but was
>>>> working with v4.13.0 when using cxgb4 Chelsio driver (T520-cr)
>>>>
>>>> Setup:
>>>> Two metal servers with a T520-cr card each, directly connected without a
>>>> switch in between.
>>>>
>>>>         SVR1  only ipfwd                 SVR2     with vrf
>>>> .----------------------------. .----------------------------------.
>>>> |                            |         |             |
>>>> |    192.168.8.1 [  ens2f4]--|---------|--[ens1f4] 192.168.8.2   |
>>>> |    192.168.9.1 [ens2f4d1]--|---------|--<ens1f4d1> 192.168.9.2 VRF=10   |
>>>> `----------------------------' `----------------------------------'
>>>>
>>>> When vrf is not working there are no error messages (dmesg or iproute
>>>> commands), tcpdump on the interface (SVR2.ens1f4d1) enslaved in vrf 10
>>>> shows packets(arp req/reply) coming in and going out, but outgoing
>>>> packets(arp reply) do not reach the other server SVR1.ens2f4d1
>>>>
>>>>
>>>> Bisect:
>>>> Found this commit to be the problem after doing a git bisect between
>>>> v4.13..v4.15:
>>>>
>>>> commit ba581f77df23c8ee70b372966e69cf10bc5453d8
>>>> Author: Ganesh Goudar <ganeshgr@chelsio.com>
>>>> Date:   Sat Sep 23 16:07:28 2017 +0530
>>>>
>>>>      cxgb4: do DCB state reset in couple of places
>>>>
>>>>      reset the driver's DCB state in couple of places
>>>>      where it was missing.
>>>
>>> Are you working on a fix for this or should a revert of the above patch
>>> be sent?
>> Will look into it and fix/revert it soon, Thanks for responding to Robert.
>>>
>>>
>>>>
>>>>
>>>> A bisect step was considered good when:
>>>> - successful ping from SVR1 to SVR2.ens1f4d1 vrf interface
>>>> - successful ping from SVR2 global to SVR2 vrf interface trough SVR1(l3
>>>> forwarding) (this check was redundant,both tests fail or pass simultaneous)
>>>>
>>>> The problem is still present on recent kernels also, checked v4.16.0 and
>>>> v4.17.rc7
>>>>
>>>> Disabling DCB for the card support fixes the problem ( Compiling kernel
>>>> with "CONFIG_CHELSIO_T4_DCB=n")
>>>>
>>>>
>>>>
>>>> This is my first time reporting a bug to the linux kernel and hope I
>>>> have included the right amount of information. Please let me know if I
>>>> have missed something.
>>>>
>>>>
>>>>
>>>> Thank you,
>>>> Zollner Robert
>>>>
>>>>
>>>> --------
>>>> Logs:
>>>>
>>>> VRF configured using folowing commands:
>>>>
>>>> #!/bin/sh
>>>>
>>>> CHDEV=ens1f4
>>>> VRF=vrf-recv
>>>>
>>>> sysctl -w net.ipv4.tcp_l3mdev_accept=1
>>>> sysctl -w net.ipv4.udp_l3mdev_accept=1
>>>> sysctl -w net.ipv4.conf.all.accept_local=1
>>>>
>>>> ifconfig ${CHDEV}   192.168.8.2/24
>>>> ifconfig ${CHDEV}d1 192.168.9.2/24
>>>>
>>>> ip link add ${VRF} type vrf table 10
>>>> ip link set dev ${VRF} up
>>>>
>>>> ip rule add pref 32765 table local
>>>> ip rule del pref 0
>>>>
>>>> ip route add table 10 unreachable default metric 4278198272
>>>>
>>>> ip link set dev ${CHDEV}d1 master ${VRF}
>>>>
>>>> ip route add table 10 default via 192.168.9.1
>>>> ip route add 192.168.9.0/24 via 192.168.8.1
>>>>
>>>>
>>>>
>>>>
>>>
> -netdev, Please feel free to add if needed.
> 
> Hi Robert,
> 
> My knowledge of VRF is very limited, I am trying to bring
> up VRF setup, I just wanted to check if you are doing anything
> related DCB and also please let me know how did you setup SRV1.
> 
> Thanks
> 

Hello Ganesh,

SRV1 is just forwarding(l3) between the two physical ports of the 
T520-CR card.

ifconfig ens1f4   192.168.8.1/24
ifconfig ens1f4d1 192.168.9.1/24
sysctl -w net.ipv4.ip_forward=1

- No VRF is configured on this box
- DCB is also not used


SVR2 is using VRF and is configured with the script inlined in the first 
email.

^ permalink raw reply

* Re: iproute2 won't compile without AF_VSOCK
From: Steve Wise @ 2018-06-19 20:41 UTC (permalink / raw)
  To: David Ahern, Stephen Hemminger; +Cc: netdev
In-Reply-To: <d6f6b60d-6dad-fd0c-cdc5-b560fe6da12d@gmail.com>



On 6/19/2018 3:29 PM, David Ahern wrote:
> On 6/19/18 2:27 PM, David Ahern wrote:
>> On 6/19/18 9:47 AM, Stephen Hemminger wrote:
>>> On Tue, 19 Jun 2018 10:17:45 -0500
>>> Steve Wise <swise@opengridcomputing.com> wrote:
>>>
>>>> Hey David,
>>>>
>>>> I'm trying to compile the latest iproute2 on an RHEL-7.3 distro, and it
>>>> fails to compile because AF_VSOCK is not defined.  Should this
>>>> functionality be a configure option to disable it on older distros?
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Steve.
>>>>
>>>> ----
>>>>
>>>> misc
>>>>     CC       ss.o
>>>> ss.c:301:27: error: ‘AF_VSOCK’ undeclared here (not in a function)
>>>>    .families = FAMILY_MASK(AF_VSOCK),
>>>>                            ^
>>>> ss.c:252:46: note: in definition of macro ‘FAMILY_MASK’
>>>>  #define FAMILY_MASK(family) ((uint64_t)1 << (family))
>>>>                                               ^
>>>> ss.c:334:2: error: array index in initializer not of integer type
>>>>   [AF_VSOCK] = {
>>>>   ^
>>>> ss.c:334:2: error: (near initialization for ‘default_afs’)
>>>> make[1]: *** [ss.o] Error 1
>>>> make: *** [all] Error 2
>>>>
>>> Probably should just add an #ifdef to takeout that if not present
>>>
>> Most userspace tools have a compat header for cases like this.
>>
>> #ifndef AF_VSOCK
>> #define AF_VSOCK 	40
>> #endif
>>
> Add the above to include//utils.h; AF_MPLS is already there.

I'll send out a patch.

Thanks,

Steve.

^ permalink raw reply

* Re: [PATCH net] ipvlan: call dev_change_flags when reset ipvlan mode
From: Cong Wang @ 2018-06-19 21:10 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: Linux Kernel Network Developers, Stefano Brivio, Paolo Abeni,
	David Miller, Mahesh Bandewar
In-Reply-To: <1529330677-15328-1-git-send-email-liuhangbin@gmail.com>

On Mon, Jun 18, 2018 at 7:04 AM, Hangbin Liu <liuhangbin@gmail.com> wrote:
> @@ -94,10 +95,13 @@ static int ipvlan_set_port_mode(struct ipvl_port *port, u16 nval)
>                         mdev->l3mdev_ops = NULL;
>                 }
>                 list_for_each_entry(ipvlan, &port->ipvlans, pnode) {
> +                       flags = ipvlan->dev->flags;
>                         if (nval == IPVLAN_MODE_L3 || nval == IPVLAN_MODE_L3S)
> -                               ipvlan->dev->flags |= IFF_NOARP;
> +                               dev_change_flags(ipvlan->dev,
> +                                                flags | IFF_NOARP);
>                         else
> -                               ipvlan->dev->flags &= ~IFF_NOARP;
> +                               dev_change_flags(ipvlan->dev,
> +                                                flags & ~IFF_NOARP);

You need to check the return value of dev_change_flags().

^ permalink raw reply

* Re: [PATCH bpf-net] bpf: Change bpf_fib_lookup to return lookup status
From: Martin KaFai Lau @ 2018-06-19 21:24 UTC (permalink / raw)
  To: David Ahern; +Cc: dsahern, netdev, borkmann, ast, davem
In-Reply-To: <7697c6fd-7871-4d96-bd1c-c81a7d70ec39@gmail.com>

On Tue, Jun 19, 2018 at 02:16:53PM -0600, David Ahern wrote:
> On 6/19/18 10:36 AM, Martin KaFai Lau wrote:
> > On Tue, Jun 19, 2018 at 09:34:28AM -0600, David Ahern wrote:
> >> On 6/19/18 9:25 AM, Martin KaFai Lau wrote:
> >>> On Mon, Jun 18, 2018 at 03:35:25PM -0600, David Ahern wrote:
> >>>> On 6/18/18 2:55 PM, Martin KaFai Lau wrote:
> >>>>>> 	/* rc > 0 case */
> >>>>>> 	switch(rc) {
> >>>>>> 	case BPF_FIB_LKUP_RET_BLACKHOLE:
> >>>>>> 	case BPF_FIB_LKUP_RET_UNREACHABLE:
> >>>>>> 	case BPF_FIB_LKUP_RET_PROHIBIT:
> >>>>>> 		return XDP_DROP;
> >>>>>> 	}
> >>>>>>
> >>>>>> For the others it becomes a question of do we share why the stack needs
> >>>>>> to be involved? Maybe the program wants to collect stats to show traffic
> >>>>>> patterns that can be improved (BPF_FIB_LKUP_RET_FRAG_NEEDED) or support
> >>>>>> in the kernel needs to be improved (BPF_FIB_LKUP_RET_UNSUPP_LWT) or an
> >>>>>> interface is misconfigured (BPF_FIB_LKUP_RET_FWD_DISABLED).
> >>>>> Thanks for the explanation.
> >>>>>
> >>>>> Agree on the bpf able to collect stats will be useful.
> >>>>>
> >>>>> I am wondering, if a new BPF_FIB_LKUP_RET_XYZ is added later,
> >>>>> how may the old xdp_prog work/not-work?  As of now, the return value
> >>>>> is straight forward, FWD, PASS (to stack) or DROP (error).
> >>>>> With this change, the xdp_prog needs to match/switch() the
> >>>>> BPF_FIB_LKUP_RET_* to at least PASS and DROP.
> >>>>
> >>>> IMO, programs should only call XDP_DROP for known reasons - like the 3
> >>>> above. Anything else punt to the stack.
> >>>>
> >>>> If a new RET_XYZ comes along:
> >>>> 1. the new XYZ is a new ACL response where the packet is to be dropped.
> >>>> If the program does not understand XYZ and punts to the stack
> >>>> (recommendation), then a second lookup is done during normal packet
> >>>> processing and the stack drops it.
> >>>>
> >>>> 2. the new XYZ is a new path in the kernel that is unsupported with
> >>>> respect to XDP forwarding, nothing new for the program to do.
> >>>>
> >>>> Either way I would expect stats on BPF_FIB_LKUP_RET_* to give a hint to
> >>>> the program writer.
> >>>>
> >>>> Worst case of punting packets to the stack for any rc != 0 means the
> >>>> stack is doing 2 lookups - 1 in XDP based on its lookup parameters and 1
> >>>> in normal stack processing - to handle the packet.
> >>> Instead of having the xdp_prog to follow the meaning of what RET_SYZ is,
> >>> should the bpf_*_fib_lookup() return value be kept as is such that
> >>> the xdp_prog is clear what to do.  The reason can be returned in
> >>> the 'struct bpf_fib_lookup'.  The number of reasons can be extended.
> >>> If the xdp_prog does not understand a reason, it still will not
> >>> affect its decision because the return value is clear.
> >>> I think the situation here is similar to regular syscall which usually
> >>> uses -1 to clearly states error and errno to spells out the reason.
> >>>
> >>
> >> I did consider returning the status in struct bpf_fib_lookup. However,
> >> it is 64 bytes and can not be extended without a big performance
> >> penalty, so the only option there is to make an existing entry a union
> >> the most logical of which is the ifindex. It seemed odd to me to have
> >> the result by hidden in the struct as a union on ifindex and returning
> >> the egress index from the function:
> >>
> >> @@ -2625,7 +2636,11 @@ struct bpf_fib_lookup {
> >>
> >>         /* total length of packet from network header - used for MTU
> >> check */
> >>         __u16   tot_len;
> >> -       __u32   ifindex;  /* L3 device index for lookup */
> >> +
> >> +       union {
> >> +               __u32   ifindex;  /* input: L3 device index for lookup */
> >> +               __u32   result;   /* output: one of BPF_FIB_LKUP_RET_* */
> >> +       };
> >>
> >>
> >> It seemed more natural to have ifindex stay ifindex and only change
> >> value on return:
> >>
> >> @@ -2625,7 +2639,11 @@ struct bpf_fib_lookup {
> >>
> >>  	/* total length of packet from network header - used for MTU check */
> >>  	__u16	tot_len;
> >> -	__u32	ifindex;  /* L3 device index for lookup */
> >> +
> >> +	/* input: L3 device index for lookup
> >> +	 * output: nexthop device index from FIB lookup
> >> +	 */
> >> +	__u32	ifindex;
> >>
> >>  	union {
> >>  		/* inputs to lookup */
> >>
> >>
> >> From a program's perspective:
> >>
> >> rc < 0  -- program is passing incorrect data
> >> rc == 0 -- packet can be forwarded
> >> rc > 0  -- packet can not be forwarded.
> >>
> >> BPF programs are not required to track the LKUP_RET values any more than
> >> a function returning multiple negative values - the caller just checks
> >> rc < 0 means failure. If the program cares it can look at specific
> >> values of rc to see the specific value.
> >>
> >> The same applies with the LKUP_RET values - they are there to provide
> >> insight into why the packet is not forwarded directly if the program
> >> cares to know why.
> > hmm...ic. My concern is, the prog can interpret rc > 0 (in this patch) to be
> > drop vs pass (although we can advise them in bpf.h to always pass if it does
> > not understand a rc but it is not a strong contract),  it may catch people
> > a surprise if a xdp_prog suddenly drops everything when running in a
> > newer kernel where the upper stack can actually handle it.
> > 
> > while the current behavior (i.e. before this patch, rc == 0) is always pass
> > to the stack.
> > 
> > I think at least comments should be put in the enum such that
> > the xdp/tc_prog should expect the enum could be extended later, so
> > the suggested behavior should be a pass for unknown LKUP_RET and let
> > the stack to decide.
> > 
> 
> All APIs with enums have the inherent quality that more can be added.
> Nothing about rc > 0 says it is ok to drop the packet and nothing in the
> documentation says it is ok to drop the packet. The program author needs
> to look at the extra information provided by the rc. Specific values are
> a hint that yes the packet can be dropped; others merely say 'packet
> needs help from the stack' with a reason why it needs help.
Right, I understand there are values that hint drop (as your earlier example)
and there are values that hint pass to stack.  and either set of these values
could be extended later.

> 
> My intention is to allow the XDP program to understand FIB based ACLs.
> To that end a fib lookup result of blackhole, unreachable and prohibit
> needs to be returned to the xdp program with the effective summary of
> "can drop in the xdp program".
> 
> If the other return codes are going to cause confusion then less shorten
> the list:
> 
> enum {
> 	BPF_FIB_LKUP_RET_SUCCESS,      /* packet is to be forwareded */
> 	BPF_FIB_LKUP_RET_CAN_DROP,     /* XDP program can drop the packet */
> 	BPF_FIB_LKUP_RET_NEED_STACK,   /* packet needs full stack assist */
> };
> 
> But, that still does not solve the problem of rc > 0 means xdp program
> can drop the packet, but then that is not the intention of rc > 0. The
> intention is only "here's more information about why this packet can not
> be forwarded at this layer"
ok. Please spell out this intention in bpf.h so that the writer will not
default to DROP for the reason that it does not understand.

^ permalink raw reply

* [net] bpf, xdp, i40e: fix i40e_build_skb skb reserve and truesize
From: Jeff Kirsher @ 2018-06-19 21:33 UTC (permalink / raw)
  To: davem
  Cc: Daniel Borkmann, netdev, nhorman, sassmann, jogreene,
	Björn Töpel, John Fastabend, Jeff Kirsher

From: Daniel Borkmann <daniel@iogearbox.net>

Using skb_reserve(skb, I40E_SKB_PAD + (xdp->data - xdp->data_hard_start))
is clearly wrong since I40E_SKB_PAD already points to the offset where
the original xdp->data was sitting since xdp->data_hard_start is defined
as xdp->data - i40e_rx_offset(rx_ring) where latter offsets to I40E_SKB_PAD
when build skb is used.

However, also before cc5b114dcf98 ("bpf, i40e: add meta data support")
this seems broken since bpf_xdp_adjust_head() helper could have been used
to alter headroom and enlarge / shrink the frame and with that the assumption
that the xdp->data remains unchanged does not hold and would push a bogus
packet to upper stack.

ixgbe got this right in 924708081629 ("ixgbe: add XDP support for pass and
drop actions"). In any case, fix it by removing the I40E_SKB_PAD from both
skb_reserve() and truesize calculation.

Fixes: cc5b114dcf98 ("bpf, i40e: add meta data support")
Fixes: 0c8493d90b6b ("i40e: add XDP support for pass and drop actions")
Reported-by: Keith Busch <keith.busch@linux.intel.com>
Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Tested-by: Keith Busch <keith.busch@linux.intel.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 8ffb7454e67c..ed6dbcfd4e96 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2103,9 +2103,8 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
 	unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
 #else
 	unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +
-				SKB_DATA_ALIGN(I40E_SKB_PAD +
-					       (xdp->data_end -
-						xdp->data_hard_start));
+				SKB_DATA_ALIGN(xdp->data_end -
+					       xdp->data_hard_start);
 #endif
 	struct sk_buff *skb;
 
@@ -2124,7 +2123,7 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
 		return NULL;
 
 	/* update pointers within the skb to store the data */
-	skb_reserve(skb, I40E_SKB_PAD + (xdp->data - xdp->data_hard_start));
+	skb_reserve(skb, xdp->data - xdp->data_hard_start);
 	__skb_put(skb, xdp->data_end - xdp->data);
 	if (metasize)
 		skb_metadata_set(skb, metasize);
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH 1/5] net: emaclite: Use __func__ instead of hardcoded name
From: Andy Shevchenko @ 2018-06-19 21:36 UTC (permalink / raw)
  To: Radhey Shyam Pandey
  Cc: David S. Miller, Andrew Lunn, Michal Simek, netdev,
	linux-arm Mailing List, Linux Kernel Mailing List
In-Reply-To: <1529320103-7711-2-git-send-email-radhey.shyam.pandey@xilinx.com>

On Mon, Jun 18, 2018 at 2:08 PM, Radhey Shyam Pandey
<radhey.shyam.pandey@xilinx.com> wrote:
> Switch hardcoded function name with a reference to __func__ making
> the code more maintainable. Address below checkpatch warning:
>
> WARNING: Prefer using '"%s...", __func__' to using 'xemaclite_mdio_read',
> this function's name, in a string
> +               "xemaclite_mdio_read(phy_id=%i, reg=%x) == %x\n",
>
> WARNING: Prefer using '"%s...", __func__' to using 'xemaclite_mdio_write',
> this function's name, in a string
> +               "xemaclite_mdio_write(phy_id=%i, reg=%x, val=%x)\n",
>

For dev_dbg() the __func__ should be completely dropped away.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply

* Re: [PATCH] PCI: allow drivers to limit the number of VFs to 0
From: Bjorn Helgaas @ 2018-06-19 21:37 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Bjorn Helgaas, linux-pci, netdev, Sathya Perla, Felix Manlunas,
	alexander.duyck, john.fastabend, Jacob Keller, Donald Dutile,
	oss-drivers, Christoph Hellwig
In-Reply-To: <20180525140223.GA45098@bhelgaas-glaptop.roam.corp.google.com>

On Fri, May 25, 2018 at 09:02:23AM -0500, Bjorn Helgaas wrote:
> On Thu, May 24, 2018 at 06:20:15PM -0700, Jakub Kicinski wrote:
> > Hi Bjorn!
> > 
> > On Thu, 24 May 2018 18:57:48 -0500, Bjorn Helgaas wrote:
> > > On Mon, Apr 02, 2018 at 03:46:52PM -0700, Jakub Kicinski wrote:
> > > > Some user space depends on enabling sriov_totalvfs number of VFs
> > > > to not fail, e.g.:
> > > > 
> > > > $ cat .../sriov_totalvfs > .../sriov_numvfs
> > > > 
> > > > For devices which VF support depends on loaded FW we have the
> > > > pci_sriov_{g,s}et_totalvfs() API.  However, this API uses 0 as
> > > > a special "unset" value, meaning drivers can't limit sriov_totalvfs
> > > > to 0.  Remove the special values completely and simply initialize
> > > > driver_max_VFs to total_VFs.  Then always use driver_max_VFs.
> > > > Add a helper for drivers to reset the VF limit back to total.  
> > > 
> > > I still can't really make sense out of the changelog.
> > >
> > > I think part of the reason it's confusing is because there are two
> > > things going on:
> > > 
> > >   1) You want this:
> > >   
> > >        pci_sriov_set_totalvfs(dev, 0);
> > >        x = pci_sriov_get_totalvfs(dev) 
> > > 
> > >      to return 0 instead of total_VFs.  That seems to connect with
> > >      your subject line.  It means "sriov_totalvfs" in sysfs could be
> > >      0, but I don't know how that is useful (I'm sure it is; just
> > >      educate me :))
> > 
> > Let me just quote the bug report that got filed on our internal bug
> > tracker :)
> > 
> >   When testing Juju Openstack with Ubuntu 18.04, enabling SR-IOV causes
> >   errors because Juju gets the sriov_totalvfs for SR-IOV-capable device
> >   then tries to set that as the sriov_numvfs parameter.
> > 
> >   For SR-IOV incapable FW, the sriov_totalvfs parameter should be 0, 
> >   but it's set to max.  When FW is switched to flower*, the correct 
> >   sriov_totalvfs value is presented.
> > 
> > * flower is a project name
> 
> From the point of view of the PCI core (which knows nothing about
> device firmware and relies on the architected config space described
> by the PCIe spec), this sounds like an erratum: with some firmware
> installed, the device is not capable of SR-IOV, but still advertises
> an SR-IOV capability with "TotalVFs > 0".
> 
> Regardless of whether that's an erratum, we do allow PF drivers to use
> pci_sriov_set_totalvfs() to limit the number of VFs that may be
> enabled by writing to the PF's "sriov_numvfs" sysfs file.
> 
> But the current implementation does not allow a PF driver to limit VFs
> to 0, and that does seem nonsensical.
> 
> > My understanding is OpenStack uses sriov_totalvfs to determine how many
> > VFs can be enabled, looks like this is the code:
> > 
> > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n464
> > 
> > >   2) You're adding the pci_sriov_reset_totalvfs() interface.  I'm not
> > >      sure what you intend for this.  Is *every* driver supposed to
> > >      call it in .remove()?  Could/should this be done in the core
> > >      somehow instead of depending on every driver?
> > 
> > Good question, I was just thinking yesterday we may want to call it
> > from the core, but I don't think it's strictly necessary nor always
> > sufficient (we may reload FW without re-probing).
> > 
> > We have a device which supports different number of VFs based on the FW
> > loaded.  Some legacy FWs does not inform the driver how many VFs it can
> > support, because it supports max.  So the flow in our driver is this:
> > 
> > load_fw(dev);
> > ...
> > max_vfs = ask_fw_for_max_vfs(dev);
> > if (max_vfs >= 0)
> > 	return pci_sriov_set_totalvfs(dev, max_vfs);
> > else /* FW didn't tell us, assume max */
> > 	return pci_sriov_reset_totalvfs(dev); 
> > 
> > We also reset the max on device remove, but that's not strictly
> > necessary.
> > 
> > Other users of pci_sriov_set_totalvfs() always know the value to set
> > the total to (either always get it from FW or it's a constant).
> > 
> > If you prefer we can work out the correct max for those legacy cases in
> > the driver as well, although it seemed cleaner to just ask the core,
> > since it already has total_VFs value handy :)
> > 
> > > I'm also having a hard time connecting your user-space command example
> > > with the rest of this.  Maybe it will make more sense to me tomorrow
> > > after some coffee.
> > 
> > OpenStack assumes it will always be able to set sriov_numvfs to
> > sriov_totalvfs, see this 'if':
> > 
> > http://git.openstack.org/cgit/openstack/charm-neutron-openvswitch/tree/hooks/neutron_ovs_utils.py#n512
> 
> Thanks for educating me.  I think there are two issues here that we
> can separate.  I extracted the patch below for the first.
> 
> The second is the question of resetting driver_max_VFs.  I think we
> currently have a general issue in the core:
> 
>   - load PF driver 1
>   - driver calls pci_sriov_set_totalvfs() to reduce driver_max_VFs
>   - unload PF driver 1
>   - load PF driver 2
> 
> Now driver_max_VFs is still stuck at the lower value set by driver 1.
> I don't think that's the way this should work.
> 
> I guess this is partly a consequence of setting driver_max_VFs in
> sriov_init(), which is called before driver attach and should only
> depend on hardware characteristics, so it is related to the patch
> below.  But I think we should fix it in general, not just for
> netronome.

Hi Jakub, the patch below is in v4.18-rc1 as 8d85a7a4f2c9 ("PCI/IOV:
Allow PF drivers to limit total_VFs to 0").  If there's more we need
to do here, would you mind rebasing what's left to v4.18-rc1 and
reposting it?

> commit 4a338bc6f94b9ad824ac944f5dfc249d6838719c
> Author: Jakub Kicinski <jakub.kicinski@netronome.com>
> Date:   Fri May 25 08:18:34 2018 -0500
> 
>     PCI/IOV: Allow PF drivers to limit total_VFs to 0
>     
>     Some SR-IOV PF drivers implement .sriov_configure(), which allows
>     user-space to enable VFs by writing the desired number of VFs to the sysfs
>     "sriov_numvfs" file (see sriov_numvfs_store()).
>     
>     The PCI core limits the number of VFs to the TotalVFs advertised by the
>     device in its SR-IOV capability.  The PF driver can limit the number of VFs
>     to even fewer (it may have pre-allocated data structures or knowledge of
>     device limitations) by calling pci_sriov_set_totalvfs(), but previously it
>     could not limit the VFs to 0.
>     
>     Change pci_sriov_get_totalvfs() so it always respects the VF limit imposed
>     by the PF driver, even if the limit is 0.
>     
>     This sequence:
>     
>       pci_sriov_set_totalvfs(dev, 0);
>       x = pci_sriov_get_totalvfs(dev);
>     
>     previously set "x" to TotalVFs from the SR-IOV capability.  Now it will set
>     "x" to 0.
>     
>     Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 192b82898a38..d0d73dbbd5ca 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -469,6 +469,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
>  	iov->nres = nres;
>  	iov->ctrl = ctrl;
>  	iov->total_VFs = total;
> +	iov->driver_max_VFs = total;
>  	pci_read_config_word(dev, pos + PCI_SRIOV_VF_DID, &iov->vf_device);
>  	iov->pgsz = pgsz;
>  	iov->self = dev;
> @@ -827,10 +828,7 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
>  	if (!dev->is_physfn)
>  		return 0;
>  
> -	if (dev->sriov->driver_max_VFs)
> -		return dev->sriov->driver_max_VFs;
> -
> -	return dev->sriov->total_VFs;
> +	return dev->sriov->driver_max_VFs;
>  }
>  EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
>  

^ permalink raw reply

* Re: [PATCH] [Bug 16494] NFS client over TCP hangs due to packet loss
From: Joe Perches @ 2018-06-19 21:56 UTC (permalink / raw)
  To: Andy Chittenden, 'Trond Myklebust'
  Cc: 'Andrew Morton', 'David Miller', kuznet, pekkas,
	jmorris, yoshfuji, kaber, eric.dumazet, William.Allen.Simpson,
	gilad, ilpo.jarvinen, netdev, linux-kernel, linux-nfs,
	'J. Bruce Fields', 'Neil Brown',
	'Chuck Lever', 'Benny Halevy',
	'Alexandros Batsakis', Andy Chittenden
In-Reply-To: <4c5fc9f3.487e0e0a.2960.ffffe4ec@mx.google.com>

On Mon, 2010-08-09 at 10:27 +0100, Andy Chittenden wrote:

You really need to check your clock.
Mail sent in the year 2010?

^ permalink raw reply

* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Eric Dumazet @ 2018-06-19 22:10 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Mathieu Malaterre, David S. Miller, Eric Dumazet, LKML,
	Christophe LEROY, Meelis Roos, netdev, linuxppc-dev
In-Reply-To: <e1790720-911b-4d04-2215-752ec612e3d7@gmail.com>



On 06/19/2018 01:10 PM, Eric Dumazet wrote:
> 
> 
> On 06/19/2018 12:10 PM, Andreas Schwab wrote:
>> On Jun 18 2018, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>> DUMP_PREFIX_ADDRESS might give us more information (say alignment problem, or crossing page boundaries)
>>
>> DUMP_PREFIX_ADDRESS is useless for that purpose.
>>
>> Here are some samples of broken csums:
>>
>> [  853.849225] sungem: sungem wrong csum : 9886/07be, len 94 bytes, c0000001fa187e02
>> [  853.849232] raw data: 00000000: 00 0d 93 43 81 62 18 d6 c7 51 b8 1c 08 00 45 10  ...C.b...Q....E.
>> [  853.849235] raw data: 00000010: 00 4c cb a0 40 00 40 11 d9 97 c0 a8 0a 01 c0 a8  .L..@.@.........
>> [  853.849237] raw data: 00000020: 0a 07 00 7b 00 7b 00 38 69 e1 1c 03 0c f7 00 00  ...{.{.8i.......
>> [  853.849240] raw data: 00000030: 08 f0 00 00 15 f0 c0 35 67 67 de d3 ca c9 d9 5b  .......5gg.....[
>> [  853.849242] raw data: 00000040: 1f ff de d3 d2 86 8f 67 fa f2 de d3 d2 86 8f 38  .......g.......8
>> [  853.849244] raw data: 00000050: 2f ff de d3 d2 86 8f 3b ff ff d1 93 bc 50        /......;.....P
> 
> Thanks.
> 
> 4 bytes in excess.
> 
> Might be the FCS, and it does not look like provided csum has a relation with it.
> 
> For some reason FCS stripping was disabled by :
> 
> commit 3e32011d4da6424b3bc65b1e1a047e30ac9882c7
> Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date:   Mon May 19 09:39:11 2003 -0700
> 
>     [SUNGEM]: Updates from PowerPC people.
>     
>     Support more chips and split out all the complex PHY
>     handling into a seperate file.
> 
> 
> Since this NIC never had CHECKSUM_COMPLETE support (since we have to trim each skb,
> thus were forcing ip_summed to CHECKSUM_NONE) we probably should remove it and be happy.
> 
> Unless you guys find a way to let the NIC strip the FCS, and double check the csum is a real csum ;)
> 
> diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
> index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..85439308375e95c3854e4a1561697d69ec85399b 100644
> --- a/drivers/net/ethernet/sun/sungem.c
> +++ b/drivers/net/ethernet/sun/sungem.c
> @@ -760,7 +760,6 @@ static int gem_rx(struct gem *gp, int work_to_do)
>         struct net_device *dev = gp->dev;
>         int entry, drops, work_done = 0;
>         u32 done;
> -       __sum16 csum;
>  
>         if (netif_msg_rx_status(gp))
>                 printk(KERN_DEBUG "%s: rx interrupt, done: %d, rx_new: %d\n",
> @@ -855,9 +854,6 @@ static int gem_rx(struct gem *gp, int work_to_do)
>                         skb = copy_skb;
>                 }
>  
> -               csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
> -               skb->csum = csum_unfold(csum);
> -               skb->ip_summed = CHECKSUM_COMPLETE;
>                 skb->protocol = eth_type_trans(skb, gp->dev);
>  
>                 napi_gro_receive(&gp->napi, skb);
> 


If you have time, you also could check if changing the suspect (14 / 2) to ETH_HLEN would help
(and also enabling STRIP_FCS)



diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..672d6748ab44f0890e92d5ca55d6ff6834c20dc9 100644
--- a/drivers/net/ethernet/sun/sungem.c
+++ b/drivers/net/ethernet/sun/sungem.c
@@ -60,8 +60,7 @@
 #include <linux/sungem_phy.h>
 #include "sungem.h"
 
-/* Stripping FCS is causing problems, disabled for now */
-#undef STRIP_FCS
+#define STRIP_FCS
 
 #define DEFAULT_MSG    (NETIF_MSG_DRV          | \
                         NETIF_MSG_PROBE        | \
@@ -435,7 +434,7 @@ static int gem_rxmac_reset(struct gem *gp)
        writel(desc_dma & 0xffffffff, gp->regs + RXDMA_DBLOW);
        writel(RX_RING_SIZE - 4, gp->regs + RXDMA_KICK);
        val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
-              ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
+              (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
        writel(val, gp->regs + RXDMA_CFG);
        if (readl(gp->regs + GREG_BIFCFG) & GREG_BIFCFG_M66EN)
                writel(((5 & RXDMA_BLANK_IPKTS) |
@@ -857,6 +856,14 @@ static int gem_rx(struct gem *gp, int work_to_do)
 
                csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
                skb->csum = csum_unfold(csum);
+               {
+               __wsum rsum = csum_partial(skb->data + ETH_HLEN, len - ETH_HLEN, 0);
+               if (csum != csum_fold(rsum) && net_ratelimit())
+                       pr_err("sungem wrong csum : %x/%x, len %u bytes\n",
+                               csum, csum_fold(rsum), len);
+                       print_hex_dump(KERN_ERR, "raw data: ", DUMP_PREFIX_OFFSET,
+                                           16, 1, skb->data, len, true);
+               }
                skb->ip_summed = CHECKSUM_COMPLETE;
                skb->protocol = eth_type_trans(skb, gp->dev);
 
@@ -1761,7 +1768,7 @@ static void gem_init_dma(struct gem *gp)
        writel(0, gp->regs + TXDMA_KICK);
 
        val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
-              ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
+              (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
        writel(val, gp->regs + RXDMA_CFG);
 
        writel(desc_dma >> 32, gp->regs + RXDMA_DBHI);

^ permalink raw reply related

* Re: [PATCH] tc, bpf: add option to dump bpf verifier as C program fragment
From: Daniel Borkmann @ 2018-06-19 22:13 UTC (permalink / raw)
  To: David Ahern, Jakub Kicinski, Ophir Munk
  Cc: netdev, Stephen Hemminger, Thomas Monjalon, Olga Shern, ast
In-Reply-To: <30ac6974-a434-926b-5749-25c594e5841c@gmail.com>

On 06/18/2018 11:44 PM, David Ahern wrote:
> On 6/18/18 2:18 PM, Jakub Kicinski wrote:
>> On Sun, 17 Jun 2018 08:48:41 +0000, Ophir Munk wrote:
>>> Similar to cbpf used within tcpdump utility with a "-d" option to dump
>>> the compiled packet-matching code in a human readable form - tc has the
>>> "verbose" option to dump ebpf verifier output.
>>> Another useful option of cbpf using tcpdump "-dd" option is to dump
>>> packet-matching code a C program fragment. Similar to this - this commit
>>> adds a new tc ebpf option named "code" to dump ebpf verifier as C program
>>> fragment.
>>>
>>> Existing "verbose" option sample output:
>>>
>>> Verifier analysis:
>>> 0: (61) r2 = *(u32 *)(r1 +52)
>>> 1: (18) r3 = 0xdeadbeef
>>> 3: (63) *(u32 *)(r10 -4) = r3
>>> .
>>> .
>>> 11: (63) *(u32 *)(r1 +52) = r2
>>> 12: (18) r0 = 0xffffffff
>>> 14: (95) exit
>>>
>>> New "code" option sample output:
>>>
>>> /* struct bpf_insn cls_q_code[] = { */
>>> {0x61,    2,    1,       52, 0x00000000},
>>> {0x18,    3,    0,        0, 0xdeadbeef},
>>> {0x00,    0,    0,        0, 0x00000000},
>>> .
>>> .
>>> {0x63,    1,    2,       52, 0x00000000},
>>> {0x18,    0,    0,        0, 0xffffffff},
>>> {0x00,    0,    0,        0, 0x00000000},
>>> {0x95,    0,    0,        0, 0x00000000},
>>>
>>> Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
>>
>> Hmm... printing C arrays looks like hacky integration with some C
>> code...  Would you not be better served by simply using libbpf in
>> whatever is consuming this output?
> 
> I was thinking the same. bpftool would provide options too -- print the
> above, print in macro encodings and verifier. I gave an example of this
> side by side dump at netconf 2.1. Does not look like the slides made it
> online; see attached.

+1, I would also doubt that this adds a lot in terms of debuggability
when you're trying to load an object file with thousands of insns. Better
way would be to use llvm-objdump on the obj file to get to the annotated
disassembly, see also example in [0]. A .o to .c converter is wip for
libbpf/bpftool as presented in [1], which should provide the flexibility
to embedd an obj file.

Cheers,
Daniel

  [0] http://cilium.readthedocs.io/en/latest/bpf/#llvm
  [1] http://vger.kernel.org/netconf2018_files/AlexeiStarovoitov_netconf2018.pdf page 22

^ permalink raw reply

* [PATCH net] enic: initialize enic->rfs_h.lock in enic_probe
From: Govindarajulu Varadarajan @ 2018-06-19 15:15 UTC (permalink / raw)
  To: davem, benve, netdev; +Cc: Govindarajulu Varadarajan

lockdep spotted that we are using rfs_h.lock in enic_get_rxnfc() without
initializing. rfs_h.lock is initialized in enic_open(). But ethtool_ops
can be called when interface is down.

Move enic_rfs_flw_tbl_init to enic_probe.

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 18 PID: 1189 Comm: ethtool Not tainted 4.17.0-rc7-devel+ #27
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
Call Trace:
dump_stack+0x85/0xc0
register_lock_class+0x550/0x560
? __handle_mm_fault+0xa8b/0x1100
__lock_acquire+0x81/0x670
lock_acquire+0xb9/0x1e0
?  enic_get_rxnfc+0x139/0x2b0 [enic]
_raw_spin_lock_bh+0x38/0x80
? enic_get_rxnfc+0x139/0x2b0 [enic]
enic_get_rxnfc+0x139/0x2b0 [enic]
ethtool_get_rxnfc+0x8d/0x1c0
dev_ethtool+0x16c8/0x2400
? __mutex_lock+0x64d/0xa00
? dev_load+0x6a/0x150
dev_ioctl+0x253/0x4b0
sock_do_ioctl+0x9a/0x130
sock_ioctl+0x1af/0x350
do_vfs_ioctl+0x8e/0x670
? syscall_trace_enter+0x1e2/0x380
ksys_ioctl+0x60/0x90
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x5a/0x170
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
---
 drivers/net/ethernet/cisco/enic/enic_clsf.c | 3 +--
 drivers/net/ethernet/cisco/enic/enic_main.c | 3 ++-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cisco/enic/enic_clsf.c b/drivers/net/ethernet/cisco/enic/enic_clsf.c
index 973c1fb70d09..99038dfc7fbe 100644
--- a/drivers/net/ethernet/cisco/enic/enic_clsf.c
+++ b/drivers/net/ethernet/cisco/enic/enic_clsf.c
@@ -79,7 +79,6 @@ void enic_rfs_flw_tbl_init(struct enic *enic)
 	enic->rfs_h.max = enic->config.num_arfs;
 	enic->rfs_h.free = enic->rfs_h.max;
 	enic->rfs_h.toclean = 0;
-	enic_rfs_timer_start(enic);
 }
 
 void enic_rfs_flw_tbl_free(struct enic *enic)
@@ -88,7 +87,6 @@ void enic_rfs_flw_tbl_free(struct enic *enic)
 
 	enic_rfs_timer_stop(enic);
 	spin_lock_bh(&enic->rfs_h.lock);
-	enic->rfs_h.free = 0;
 	for (i = 0; i < (1 << ENIC_RFS_FLW_BITSHIFT); i++) {
 		struct hlist_head *hhead;
 		struct hlist_node *tmp;
@@ -99,6 +97,7 @@ void enic_rfs_flw_tbl_free(struct enic *enic)
 			enic_delfltr(enic, n->fltr_id);
 			hlist_del(&n->node);
 			kfree(n);
+			enic->rfs_h.free++;
 		}
 	}
 	spin_unlock_bh(&enic->rfs_h.lock);
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index 30d2eaa18c04..e6ad581eadd8 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -1971,7 +1971,7 @@ static int enic_open(struct net_device *netdev)
 		vnic_intr_unmask(&enic->intr[i]);
 
 	enic_notify_timer_start(enic);
-	enic_rfs_flw_tbl_init(enic);
+	enic_rfs_timer_start(enic);
 
 	return 0;
 
@@ -2904,6 +2904,7 @@ static int enic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	timer_setup(&enic->notify_timer, enic_notify_timer, 0);
 
+	enic_rfs_flw_tbl_init(enic);
 	enic_set_rx_coal_setting(enic);
 	INIT_WORK(&enic->reset, enic_reset);
 	INIT_WORK(&enic->tx_hang_reset, enic_tx_hang_reset);
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH net 0/3] qed*: Fix series.
From: David Miller @ 2018-06-19 22:18 UTC (permalink / raw)
  To: sudarsana.kalluru; +Cc: netdev, Ariel.Elior, Michal.Kalderon
In-Reply-To: <20180619045802.24050-1-sudarsana.kalluru@cavium.com>

From: Sudarsana Reddy Kalluru <sudarsana.kalluru@cavium.com>
Date: Mon, 18 Jun 2018 21:57:59 -0700

> From: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
> 
> The patch series fixes few issues in the qed/qede drivers.
> Please consider applying this series to "net".

Series applied.

^ permalink raw reply

* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
From: Daniel Borkmann @ 2018-06-19 22:22 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: David Miller, pablo, netfilter-devel, netdev
In-Reply-To: <20180617092304.fviy36bvgnhexwtb@gauss3.secunet.de>

On 06/17/2018 11:23 AM, Steffen Klassert wrote:
[...]
>> Would be curious about
>> the numbers. You'd get implicit batching for the forwarding via devmap
>> as well if you're required to flush it out via different device with
>> XDP_REDIRECT; otherwise XDP_TX of course. Given we have recently
>> integrated helpers for XDP to do a FIB and neighbor lookup from the
>> kernel tables, where it's thus shared and integrated with the rest of
>> the stack and tooling, it would be awesome to get to the same point
>> with xfrm as well. Eyal recently did a start on that for xfrm for tc
>> progs; would be nice to have integration on XDP as well, potentially
>> it might also result in a bigger plus on the forwarding numbers.
> 
> It might make sense to intrgrate XDP with xfrm to
> be able to compare numbers etc. But I need a working
> XDP setup and some understanding about it first, what
> could take some time.

Okay, no prob. If you have any questions feel free to shoot an email.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH v2 0/4] Slience NCSI logging
From: David Miller @ 2018-06-19 22:27 UTC (permalink / raw)
  To: joel; +Cc: sam, netdev, joe
In-Reply-To: <20180619053834.12257-1-joel@jms.id.au>

From: Joel Stanley <joel@jms.id.au>
Date: Tue, 19 Jun 2018 15:08:30 +0930

> v2:
>   Fix indent issue and commit message based on Joe's feedback
>   Add Sam's acks
> 
> Here are three changes to silence unnecessary warnings in the ncsi code.
> 
> The final patch adds Sam as the maintainer for NCSI.

Series applied.

^ permalink raw reply

* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Andreas Schwab @ 2018-06-19 22:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mathieu Malaterre, David S. Miller, Eric Dumazet, LKML,
	Christophe LEROY, Meelis Roos, netdev, linuxppc-dev
In-Reply-To: <05645b90-d3bc-466d-116f-548f3ee39de9@gmail.com>

On Jun 19 2018, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
> index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..672d6748ab44f0890e92d5ca55d6ff6834c20dc9 100644
> --- a/drivers/net/ethernet/sun/sungem.c
> +++ b/drivers/net/ethernet/sun/sungem.c
> @@ -60,8 +60,7 @@
>  #include <linux/sungem_phy.h>
>  #include "sungem.h"
>  
> -/* Stripping FCS is causing problems, disabled for now */
> -#undef STRIP_FCS
> +#define STRIP_FCS
>  
>  #define DEFAULT_MSG    (NETIF_MSG_DRV          | \
>                          NETIF_MSG_PROBE        | \
> @@ -435,7 +434,7 @@ static int gem_rxmac_reset(struct gem *gp)
>         writel(desc_dma & 0xffffffff, gp->regs + RXDMA_DBLOW);
>         writel(RX_RING_SIZE - 4, gp->regs + RXDMA_KICK);
>         val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
> -              ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
> +              (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
>         writel(val, gp->regs + RXDMA_CFG);
>         if (readl(gp->regs + GREG_BIFCFG) & GREG_BIFCFG_M66EN)
>                 writel(((5 & RXDMA_BLANK_IPKTS) |
> @@ -857,6 +856,14 @@ static int gem_rx(struct gem *gp, int work_to_do)
>  
>                 csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
>                 skb->csum = csum_unfold(csum);
> +               {
> +               __wsum rsum = csum_partial(skb->data + ETH_HLEN, len - ETH_HLEN, 0);
> +               if (csum != csum_fold(rsum) && net_ratelimit())
> +                       pr_err("sungem wrong csum : %x/%x, len %u bytes\n",
> +                               csum, csum_fold(rsum), len);
> +                       print_hex_dump(KERN_ERR, "raw data: ", DUMP_PREFIX_OFFSET,
> +                                           16, 1, skb->data, len, true);
> +               }
>                 skb->ip_summed = CHECKSUM_COMPLETE;
>                 skb->protocol = eth_type_trans(skb, gp->dev);
>  
> @@ -1761,7 +1768,7 @@ static void gem_init_dma(struct gem *gp)
>         writel(0, gp->regs + TXDMA_KICK);
>  
>         val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
> -              ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
> +              (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
>         writel(val, gp->regs + RXDMA_CFG);
>  
>         writel(desc_dma >> 32, gp->regs + RXDMA_DBHI);

With that patch I still get the wrong csum messages, but no longer the
hw csum failure messages (tested on a PowerMac G5).

[  662.659767] sungem: sungem wrong csum : 8359/7ca6, len 86 bytes, c0000001fee9cc02
[  662.659775] raw data: 00000000: 00 0d 93 43 81 62 d4 3d 7e 4c 48 b7 86 dd 61 01  ...C.b.=~LH...a.
[  662.659778] raw data: 00000010: 1c 1e 00 20 06 40 20 01 0a 62 17 11 88 01 00 00  ... .@ ..b......
[  662.659780] raw data: 00000020: 00 00 00 00 0a 38 20 01 0a 62 17 11 88 01 00 00  .....8 ..b......
[  662.659783] raw data: 00000030: 00 00 00 00 00 07 9a 18 00 16 c1 9a 7e ea ea 44  ............~..D
[  662.659785] raw data: 00000040: fb 4a 80 10 05 93 44 08 00 00 01 01 08 0a 59 68  .J....D.......Yh
[  662.659788] raw data: 00000050: ba e2 0e bb ac ae                                ......

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply

* Re: [PATCH 1/5] net: emaclite: Use __func__ instead of hardcoded name
From: Joe Perches @ 2018-06-19 22:37 UTC (permalink / raw)
  To: Andy Shevchenko, Radhey Shyam Pandey
  Cc: David S. Miller, Andrew Lunn, Michal Simek, netdev,
	linux-arm Mailing List, Linux Kernel Mailing List
In-Reply-To: <CAHp75VcKuqnUgynqXvACS7zY5=6aRxqymGE1w3Yh8sshehSm2g@mail.gmail.com>

On Wed, 2018-06-20 at 00:36 +0300, Andy Shevchenko wrote:
> On Mon, Jun 18, 2018 at 2:08 PM, Radhey Shyam Pandey
> <radhey.shyam.pandey@xilinx.com> wrote:
> > Switch hardcoded function name with a reference to __func__ making
> > the code more maintainable. Address below checkpatch warning:
> > 
> > WARNING: Prefer using '"%s...", __func__' to using 'xemaclite_mdio_read',
> > this function's name, in a string
> > +               "xemaclite_mdio_read(phy_id=%i, reg=%x) == %x\n",
> > 
> > WARNING: Prefer using '"%s...", __func__' to using 'xemaclite_mdio_write',
> > this function's name, in a string
> > +               "xemaclite_mdio_write(phy_id=%i, reg=%x, val=%x)\n",
> > 
> 
> For dev_dbg() the __func__ should be completely dropped away.

Not really the same.

dev_dbg without CONFIG_DYNAMIC_DEBUG does not have
the ability to prefix __func__.

^ permalink raw reply

* Re: [PATCH] net: nixge: Add __packed attribute to DMA descriptor struct
From: David Miller @ 2018-06-19 22:37 UTC (permalink / raw)
  To: f.fainelli; +Cc: mdf, keescook, netdev, linux-kernel
In-Reply-To: <832bebb8-300d-e911-2946-5edfe82dc30a@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue, 19 Jun 2018 10:13:55 -0700

> How could padding be inserted given than all of the structure members
> are naturally aligned (all u32 type). Compiler bug?

Agreed, this looks completely unnecessary.

__packed should only be used when absolutely necessary because using
it generates less efficient code on some architectures.

^ permalink raw reply

* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Eric Dumazet @ 2018-06-19 22:40 UTC (permalink / raw)
  To: Andreas Schwab, Eric Dumazet
  Cc: Mathieu Malaterre, David S. Miller, Eric Dumazet, LKML,
	Christophe LEROY, Meelis Roos, netdev, linuxppc-dev
In-Reply-To: <87lgbav24y.fsf@igel.home>



On 06/19/2018 03:32 PM, Andreas Schwab wrote:
> On Jun 19 2018, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
>> diff --git a/drivers/net/ethernet/sun/sungem.c b/drivers/net/ethernet/sun/sungem.c
>> index 7a16d40a72d13cf1d522e8a3a396c826fe76f9b9..672d6748ab44f0890e92d5ca55d6ff6834c20dc9 100644
>> --- a/drivers/net/ethernet/sun/sungem.c
>> +++ b/drivers/net/ethernet/sun/sungem.c
>> @@ -60,8 +60,7 @@
>>  #include <linux/sungem_phy.h>
>>  #include "sungem.h"
>>  
>> -/* Stripping FCS is causing problems, disabled for now */
>> -#undef STRIP_FCS
>> +#define STRIP_FCS
>>  
>>  #define DEFAULT_MSG    (NETIF_MSG_DRV          | \
>>                          NETIF_MSG_PROBE        | \
>> @@ -435,7 +434,7 @@ static int gem_rxmac_reset(struct gem *gp)
>>         writel(desc_dma & 0xffffffff, gp->regs + RXDMA_DBLOW);
>>         writel(RX_RING_SIZE - 4, gp->regs + RXDMA_KICK);
>>         val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
>> -              ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
>> +              (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
>>         writel(val, gp->regs + RXDMA_CFG);
>>         if (readl(gp->regs + GREG_BIFCFG) & GREG_BIFCFG_M66EN)
>>                 writel(((5 & RXDMA_BLANK_IPKTS) |
>> @@ -857,6 +856,14 @@ static int gem_rx(struct gem *gp, int work_to_do)
>>  
>>                 csum = (__force __sum16)htons((status & RXDCTRL_TCPCSUM) ^ 0xffff);
>>                 skb->csum = csum_unfold(csum);
>> +               {
>> +               __wsum rsum = csum_partial(skb->data + ETH_HLEN, len - ETH_HLEN, 0);
>> +               if (csum != csum_fold(rsum) && net_ratelimit())
>> +                       pr_err("sungem wrong csum : %x/%x, len %u bytes\n",
>> +                               csum, csum_fold(rsum), len);
>> +                       print_hex_dump(KERN_ERR, "raw data: ", DUMP_PREFIX_OFFSET,
>> +                                           16, 1, skb->data, len, true);
>> +               }
>>                 skb->ip_summed = CHECKSUM_COMPLETE;
>>                 skb->protocol = eth_type_trans(skb, gp->dev);
>>  
>> @@ -1761,7 +1768,7 @@ static void gem_init_dma(struct gem *gp)
>>         writel(0, gp->regs + TXDMA_KICK);
>>  
>>         val = (RXDMA_CFG_BASE | (RX_OFFSET << 10) |
>> -              ((14 / 2) << 13) | RXDMA_CFG_FTHRESH_128);
>> +              (ETH_HLEN << 13) | RXDMA_CFG_FTHRESH_128);
>>         writel(val, gp->regs + RXDMA_CFG);
>>  
>>         writel(desc_dma >> 32, gp->regs + RXDMA_DBHI);
> 
> With that patch I still get the wrong csum messages, but no longer the
> hw csum failure messages (tested on a PowerMac G5).
> 
> [  662.659767] sungem: sungem wrong csum : 8359/7ca6, len 86 bytes, c0000001fee9cc02
> [  662.659775] raw data: 00000000: 00 0d 93 43 81 62 d4 3d 7e 4c 48 b7 86 dd 61 01  ...C.b.=~LH...a.
> [  662.659778] raw data: 00000010: 1c 1e 00 20 06 40 20 01 0a 62 17 11 88 01 00 00  ... .@ ..b......
> [  662.659780] raw data: 00000020: 00 00 00 00 0a 38 20 01 0a 62 17 11 88 01 00 00  .....8 ..b......
> [  662.659783] raw data: 00000030: 00 00 00 00 00 07 9a 18 00 16 c1 9a 7e ea ea 44  ............~..D
> [  662.659785] raw data: 00000040: fb 4a 80 10 05 93 44 08 00 00 01 01 08 0a 59 68  .J....D.......Yh
> [  662.659788] raw data: 00000050: ba e2 0e bb ac ae                                ......
> 
> Andreas.
> 

Note that 8359 and 7ca6 are the same really (a missing ~ to invert csum_partial())

So the bug was that :

1) Driver programmed a wrong start offset for the csum  (7 bytes instead of 14 bytes to skip Ethernet Header)

2) FCS was not stripped.

Basically CHECKSUM_COMPLETE support never worked, but this was hidden by the fact that linux stack
had to throw away this CHECKSUM_COMPLETE because the FCS had to be removed.

Thanks !

^ permalink raw reply

* Re: [net] bpf, xdp, i40e: fix i40e_build_skb skb reserve and truesize
From: David Miller @ 2018-06-19 22:41 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: daniel, netdev, nhorman, sassmann, jogreene, bjorn.topel,
	john.fastabend
In-Reply-To: <20180619213354.30986-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 19 Jun 2018 14:33:54 -0700

> From: Daniel Borkmann <daniel@iogearbox.net>
> 
> Using skb_reserve(skb, I40E_SKB_PAD + (xdp->data - xdp->data_hard_start))
> is clearly wrong since I40E_SKB_PAD already points to the offset where
> the original xdp->data was sitting since xdp->data_hard_start is defined
> as xdp->data - i40e_rx_offset(rx_ring) where latter offsets to I40E_SKB_PAD
> when build skb is used.
> 
> However, also before cc5b114dcf98 ("bpf, i40e: add meta data support")
> this seems broken since bpf_xdp_adjust_head() helper could have been used
> to alter headroom and enlarge / shrink the frame and with that the assumption
> that the xdp->data remains unchanged does not hold and would push a bogus
> packet to upper stack.
> 
> ixgbe got this right in 924708081629 ("ixgbe: add XDP support for pass and
> drop actions"). In any case, fix it by removing the I40E_SKB_PAD from both
> skb_reserve() and truesize calculation.
> 
> Fixes: cc5b114dcf98 ("bpf, i40e: add meta data support")
> Fixes: 0c8493d90b6b ("i40e: add XDP support for pass and drop actions")
> Reported-by: Keith Busch <keith.busch@linux.intel.com>
> Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Björn Töpel <bjorn.topel@intel.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Tested-by: Keith Busch <keith.busch@linux.intel.com>
> Acked-by: John Fastabend <john.fastabend@gmail.com>
> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] enic: initialize enic->rfs_h.lock in enic_probe
From: David Miller @ 2018-06-19 22:49 UTC (permalink / raw)
  To: gvaradar; +Cc: benve, netdev
In-Reply-To: <20180619151524.1326-1-gvaradar@cisco.com>

From: Govindarajulu Varadarajan <gvaradar@cisco.com>
Date: Tue, 19 Jun 2018 08:15:24 -0700

> lockdep spotted that we are using rfs_h.lock in enic_get_rxnfc() without
> initializing. rfs_h.lock is initialized in enic_open(). But ethtool_ops
> can be called when interface is down.
> 
> Move enic_rfs_flw_tbl_init to enic_probe.
 ...
> Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH v2] net: hamradio: use eth_broadcast_addr
From: David Miller @ 2018-06-19 22:52 UTC (permalink / raw)
  To: stefan; +Cc: netdev, linux-kernel
In-Reply-To: <20180617214058.6926-1-stefan@agner.ch>

From: Stefan Agner <stefan@agner.ch>
Date: Sun, 17 Jun 2018 23:40:53 +0200

> The array bpq_eth_addr is only used to get the size of an
> address, whereas the bcast_addr is used to set the broadcast
> address. This leads to a warning when using clang:
> drivers/net/hamradio/bpqether.c:94:13: warning: variable 'bpq_eth_addr' is not
>       needed and will not be emitted [-Wunneeded-internal-declaration]
> static char bpq_eth_addr[6];
>             ^
> 
> Remove both variables and use the common eth_broadcast_addr
> to set the broadcast address.
> 
> Signed-off-by: Stefan Agner <stefan@agner.ch>

Applied.

^ permalink raw reply

* Re: [PATCH net] ipvlan: use ETH_MAX_MTU as max mtu
From: David Miller @ 2018-06-19 22:54 UTC (permalink / raw)
  To: lucien.xin; +Cc: netdev, jarod, maheshb
In-Reply-To: <3a944c2cbcff6df280847770decca4875b671959.1529309757.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Mon, 18 Jun 2018 16:15:57 +0800

> Similar to the fixes on team and bonding, this restores the ability
> to set an ipvlan device's mtu to anything higher than 1500.
> 
> Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Applied, thanks.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox