Netdev List
 help / color / mirror / Atom feed
* Re: [RFC PATCH net-next 03/19] ipv6: Clear nexthop flags upon netdev up
From: David Ahern @ 2018-01-03 23:08 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: Ido Schimmel, netdev, davem, roopa, nicolas.dichtel, mlxsw
In-Reply-To: <20180103205321.GA13248@splinter>

On 1/3/18 1:53 PM, Ido Schimmel wrote:
> On Wed, Jan 03, 2018 at 11:47:16AM -0700, David Ahern wrote:
>> On 1/3/18 10:40 AM, Ido Schimmel wrote:
>>> David, can we please get back to the issue at hand? What's the problem
>>> with the location of the call to rt6_sync_up()?
>>
>> My original comment was asking why do it on NETDEV_CHANGE when it should
>> only be needed on NETDEV_UP.
> 
> I can condition the call to rt6_sync_up() on the event being NETDEV_UP,
> but the location needs to stay the same. Before that the interface still
> doesn't have an IP address.

it's fine as is. The NETDEV_CHANGE section has a break after calling
rt6_sync_up(RTNH_F_LINKDOWN) which is added later. To your point,
addrconf_dev_config will add a linklocal address.

^ permalink raw reply

* Re: KASAN: slab-out-of-bounds Write in tcp_v6_syn_recv_sock
From: Cong Wang @ 2018-01-03 23:31 UTC (permalink / raw)
  To: Ozgur
  Cc: syzbot, David Miller, Alexey Kuznetsov, LKML,
	Linux Kernel Network Developers, syzkaller-bugs@googlegroups.com,
	Hideaki YOSHIFUJI, avejwatson@fb.com, ilyal@mellanox.com,
	aviadye@mellanox.com
In-Reply-To: <681471515012954@web19j.yandex.ru>

On Wed, Jan 3, 2018 at 12:55 PM, Ozgur <ozgur@goosey.org> wrote:
>
>
> 03.01.2018, 21:57, "Cong Wang" <xiyou.wangcong@gmail.com>:
>> On Tue, Jan 2, 2018 at 3:58 PM, syzbot
>> <syzbot+6dc95bddc6976b800b0b@syzkaller.appspotmail.com> wrote:
>>>  Hello,
>>>
>>>  syzkaller hit the following crash on
>>>  61233580f1f33c50e159c50e24d80ffd2ba2e06b
>>>  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
>>>  compiler: gcc (GCC) 7.1.1 20170620
>>>  .config is attached
>>>  Raw console output is attached.
>>>  C reproducer is attached
>>>  syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>>>  for information about syzkaller reproducers
>>>
>>>  IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>>  Reported-by: syzbot+6dc95bddc6976b800b0b@syzkaller.appspotmail.com
>>>  It will help syzbot understand when the bug is fixed. See footer for
>>>  details.
>>>  If you forward the report, please keep this part and the footer.
>>>
>>>  TCP: request_sock_TCPv6: Possible SYN flooding on port 20002. Sending
>>>  cookies. Check SNMP counters.
>>>  ==================================================================
>>>  BUG: KASAN: slab-out-of-bounds in memcpy include/linux/string.h:344 [inline]
>>>  BUG: KASAN: slab-out-of-bounds in tcp_v6_syn_recv_sock+0x628/0x23a0
>>>  net/ipv6/tcp_ipv6.c:1144
>>>  Write of size 160 at addr ffff8801cbdd7460 by task syzkaller545407/3196
>>>
>>>  CPU: 1 PID: 3196 Comm: syzkaller545407 Not tainted 4.15.0-rc5+ #241
>>>  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>>>  Google 01/01/2011
>>>  Call Trace:
>>>   <IRQ>
>>>   __dump_stack lib/dump_stack.c:17 [inline]
>>>   dump_stack+0x194/0x257 lib/dump_stack.c:53
>>>   print_address_description+0x73/0x250 mm/kasan/report.c:252
>>>   kasan_report_error mm/kasan/report.c:351 [inline]
>>>   kasan_report+0x25b/0x340 mm/kasan/report.c:409
>>>   check_memory_region_inline mm/kasan/kasan.c:260 [inline]
>>>   check_memory_region+0x137/0x190 mm/kasan/kasan.c:267
>>>   memcpy+0x37/0x50 mm/kasan/kasan.c:303
>>>   memcpy include/linux/string.h:344 [inline]
>>>   tcp_v6_syn_recv_sock+0x628/0x23a0 net/ipv6/tcp_ipv6.c:1144
>>
>> tls_init() changes sk->sk_prot from IPv6 to IPv4, which leads
>> to this bug. I guess IPv6 is not supported for TLS? If so, need
>> a check on proto in tls_init()...
>
> Hello,
>
> I think IPv6 supports with TLS.
> There was a previously posted commit by Mellanox:
>
> https://patchwork.ozlabs.org/patch/801530/

Good to know.

Can you resend the fix? It could probably fix another warning
reported by syzbot too.

^ permalink raw reply

* Re: "lockless" qdisc breaks tx_queue_len change too?
From: Cong Wang @ 2018-01-03 23:41 UTC (permalink / raw)
  To: John Fastabend; +Cc: Linux Kernel Network Developers
In-Reply-To: <cb5e1c62-9f71-b6ef-8a67-d094ce5c8f16@gmail.com>

On Wed, Jan 3, 2018 at 10:09 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
> On 01/02/2018 08:41 PM, Cong Wang wrote:
>> Hi, John
>>
>> While reviewing your ptr_ring fix again today, it looks like your
>> "lockless" qdisc patchset breaks dev->tx_queue_len behavior.
>>
>> Before your patchset, dev->tx_queue_len is merely an integer to read,
>> after your patchset, the skb array has to be resized when
>> dev->tx_queue_len changes, but I don't see any qdisc code handles
>> this...
>>
>> Also, because of that, I doubt __skb_array_empty() in
>> pfifo_fast_dequeue() can be safe any more even with your ptr_ring fix.
>>
>> What am I missing?
>>
>
> I dropped support for tx_queue_len changes after qdisc has been
> created. The only check is at init time when building the qdisc.

This is where it breaks.


>
> Before this series teql and pfifo_fast were the only qdiscs that
> used tx_queue_len other qdiscs used other mechanisms or copied
> tx_queue_len at init time. So the API is inconsistent.

Yeah, pfifo_fast was able to drop based on latest value of tx_queue_len
before your patchset, this is why I am complaining.


>
> OK, but arguably its kAPI now and needs to be supported on live
> qdiscs. So couple options drop the __skb_array_empty() check,
> stop supporting changes on running qdiscs, or do a qdisc swap
> with the new array.

I don't think we can break the old behavior of tx_queue_len change
for pfifo_fast, people may already rely on it.

Doing a swap seems reasonable.

>
> I'm tempted to make the qdisc swap work, still need benchmarks
> I guess without the empty check. Either way to get it working
> we need a callback from tx_queue_len code paths.

Right, probably need a new ops in Qdisc_ops.


>
> Unfortunately, I guess someone somewhere probably uses pfifo_fast
> and changes there queue length with a script after creating the
> qdisc and expects it to work.
>

This is my concern as well. I will work on some patches, this doesn't
look trivial to solve at all.

Thanks.

^ permalink raw reply

* Re: WARNING in sk_stream_kill_queues (2)
From: Cong Wang @ 2018-01-03 23:44 UTC (permalink / raw)
  To: syzbot
  Cc: David Miller, Greg KH, Kate Stewart, LKML, Ingo Molnar,
	Linux Kernel Network Developers, Philippe Ombredanne,
	syzkaller-bugs, Thomas Gleixner
In-Reply-To: <94eb2c18a1d87f79230561d3de2c@google.com>

#syz dup: KASAN: slab-out-of-bounds Write in tcp_v6_syn_recv_sock

^ permalink raw reply

* Re: [patch net-next v4 00/10] net: sched: allow qdiscs to share filter block instances
From: Jakub Kicinski @ 2018-01-03 23:51 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: David Ahern, netdev, davem, jhs, xiyou.wangcong, mlxsw, andrew,
	vivien.didelot, f.fainelli, michael.chan, ganeshgr, saeedm,
	matanb, leonro, idosch, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, ogerlitz, john.fastabend, daniel
In-Reply-To: <20180103172209.GD2067@nanopsycho.orion>

On Wed, 3 Jan 2018 18:22:09 +0100, Jiri Pirko wrote:
> However I don't agree about breaking the existing filter add and show
> and also imposibility to make not-shared block shared in the runtime
> before defining it first.

FWIW I would agree with David that allowing add on a shared block
modify filters on another interface can break existing users.  (No
opinion on dump and lifetime).

^ permalink raw reply

* Re: [PATCH bpf-next v4 1/3] libbpf: add function to setup XDP
From: Eric Leblond @ 2018-01-03 23:59 UTC (permalink / raw)
  To: daniel, Toshiaki Makita, Philippe Ombredanne
  Cc: Alexei Starovoitov, netdev, linux-kernel
In-Reply-To: <20171230204116.30871-2-eric@regit.org>

Hello,

On Sat, 2017-12-30 at 21:41 +0100, Eric Leblond wrote:
> Most of the code is taken from set_link_xdp_fd() in bpf_load.c and
> slightly modified to be library compliant.

I've just discovered this patch is breaking the build of samples/bpf/
(nlattr not included at least and some int type problem). I'm going to
resubmit a patchset fixing this.

Sorry for the noise.

Best regards,
-- 
Eric Leblond <eric@regit.org>
Blog: https://home.regit.org/

^ permalink raw reply

* Re: [patch net-next v2 00/10] Add support for resource abstraction
From: Arkadi Sharshevsky @ 2018-01-04  0:07 UTC (permalink / raw)
  To: David Ahern, Jiri Pirko
  Cc: netdev, roopa, davem, mlxsw, andrew, vivien.didelot, f.fainelli,
	michael.chan, ganeshgr, saeedm, matanb, leonro, idosch,
	jakub.kicinski, ast, daniel, simon.horman, pieter.jansenvanvuuren,
	john.hurley, alexander.h.duyck, linville, gospo, steven.lin1,
	yuvalm, ogerlitz
In-Reply-To: <49c72225-6437-54d5-a046-96fff5b65ce9@cumulusnetworks.com>



On 01/03/2018 08:29 PM, David Ahern wrote:
> On 1/3/18 11:17 AM, Jiri Pirko wrote:
>> Wed, Jan 03, 2018 at 07:14:16PM CET, dsa@cumulusnetworks.com wrote:
>>> On 1/3/18 11:05 AM, Arkadi Sharshevsky wrote:
>>>> As I stated this is a user-space bug which I fixed, and updated my repo
>>>> so please pull. Devlink uses mnl,and currently mnl does not support
>>>> extended ack. I added support for this in my local ver of libmnl:
>>>>
>>>> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Farkadis%2Flibmnl.git&data=02%7C01%7Carkadis%40mellanox.com%7C5c86b6240eb84459c6ae08d552d7f9a4%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636506009929977440&sdata=sgrNzMhPwe63BIVxexZTjl%2FXqW51kpuRiHVhTDNaa70%3D&reserved=0
>>>>
>>>> On branch master, so you can check it out. Besides this bugs, which were
>>>> userspace, can please specify what are the pending problems from your
>>>> point of view? Thanks!
>>>
>>> devlink is in iproute2 package and it has extack support. See 'git log
>>> lib/libnetlink.c'
>>
>> Dave, devlink uses libmnl.
>>
> 
> Now I remember. You wrote it independently and but needed iproute2 be a
> delivery vehicle. It uses none of the common infrastructure from
> iproute2. Could we make this more difficult ....
> 
> Sometime in the next day I will jump through the hoops to get a proper
> devlink command.
> 

This actually was very confusing, I think the extack should be
handled by libmnl and iproute should use mnl_cb_run() routines
and not to implement its own. That way we could both benefit
from that.

You actually do use libmnl in libnetlink.c only for parsing
the headers, and its a dependency for extack handling.

I see this as a completely independent user space issue, which
doesn't have to do anything with this patchset. Not to mention
that everything is working right now.

^ permalink raw reply

* Re: [PATCH net-next 0/2] Enable virtio to act as a master for a passthru device
From: Samudrala, Sridhar @ 2018-01-04  0:22 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jakub Kicinski, Brandeburg, Jesse, Michael S. Tsirkin,
	Stephen Hemminger, Netdev, virtualization, virtio-dev,
	Alexander Duyck
In-Reply-To: <CAKgT0UdtHVq3WWpjHggNabw_+2piaRQAKguivV690HdzKrg62w@mail.gmail.com>

On 1/3/2018 10:28 AM, Alexander Duyck wrote:
> On Wed, Jan 3, 2018 at 10:14 AM, Samudrala, Sridhar
> <sridhar.samudrala@intel.com> wrote:
>>
>> On 1/3/2018 8:59 AM, Alexander Duyck wrote:
>>> On Tue, Jan 2, 2018 at 6:16 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>>>> On Tue,  2 Jan 2018 16:35:36 -0800, Sridhar Samudrala wrote:
>>>>> This patch series enables virtio to switch over to a VF datapath when a
>>>>> VF
>>>>> netdev is present with the same MAC address. It allows live migration of
>>>>> a VM
>>>>> with a direct attached VF without the need to setup a bond/team between
>>>>> a
>>>>> VF and virtio net device in the guest.
>>>>>
>>>>> The hypervisor needs to unplug the VF device from the guest on the
>>>>> source
>>>>> host and reset the MAC filter of the VF to initiate failover of datapath
>>>>> to
>>>>> virtio before starting the migration. After the migration is completed,
>>>>> the
>>>>> destination hypervisor sets the MAC filter on the VF and plugs it back
>>>>> to
>>>>> the guest to switch over to VF datapath.
>>>>>
>>>>> It is based on netvsc implementation and it may be possible to make this
>>>>> code
>>>>> generic and move it to a common location that can be shared by netvsc
>>>>> and virtio.
>>>>>
>>>>> This patch series is based on the discussion initiated by Jesse on this
>>>>> thread.
>>>>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>>>> How does the notion of a device which is both a bond and a leg of a
>>>> bond fit with Alex's recent discussions about feature propagation?
>>>> Which propagation rules will apply to VirtIO master?  Meaning of the
>>>> flags on a software upper device may be different.  Why muddy the
>>>> architecture like this and not introduce a synthetic bond device?
>>> It doesn't really fit with the notion I had. I think there may have
>>> been a bit of a disconnect as I have been out for the last week or so
>>> for the holidays.
>>>
>>> My thought on this was that the feature bit should be spawning a new
>>> para-virtual bond device and that bond should have the virto and the
>>> VF as slaves. Also I thought there was some discussion about trying to
>>> reuse as much of the netvsc code as possible for this so that we could
>>> avoid duplication of effort and have the two drivers use the same
>>> approach. It seems like it should be pretty straight forward since you
>>> would have the feature bit in the case of virto, and netvsc just does
>>> this sort of thing by default if I am not mistaken.
>> This patch is mostly based on netvsc implementation. The only change is
>> avoiding the
>> explicit dev_open() call of the VF netdev after a delay. I am assuming that
>> the guest userspace
>> will bring up the VF netdev and the hypervisor will update the MAC filters
>> to switch to
>> the right data path.
>> We could commonize the code and make it shared between netvsc and virtio. Do
>> we want
>> to do this right away or later? If so, what would be a good location for
>> these shared functions?
>> Is it net/core/dev.c?
> No, I would think about starting a new driver file in "/drivers/net/".
> The idea is this driver would be utilized to create a bond
> automatically and set the appropriate registration hooks. If nothing
> else you could probably just call it something generic like virt-bond
> or vbond or whatever.

We are trying to avoid creating another driver or a device.  Can we look 
into
consolidation of the 2 implementations(virtio & netvsc) as a later patch?
>
>> Also, if we want to go with a solution that creates a bond device, do we
>> want virtio_net/netvsc
>> drivers to create a upper device?  Such a solution is already possible via
>> config scripts that can
>> create a bond with virtio and a VF net device as slaves.  netvsc and this
>> patch series is trying to
>> make it as simple as possible for the VM to use directly attached devices
>> and support live migration
>> by switching to virtio datapath as a backup during the migration process
>> when the VF device
>> is unplugged.
> We all understand that. But you are making the solution very virtio
> specific. We want to see this be usable for other interfaces such as
> netsc and whatever other virtual interfaces are floating around out
> there.
>
> Also I haven't seen us address what happens as far as how we will
> handle this on the host. My thought was we should have a paired
> interface. Something like veth, but made up of a bond on each end. So
> in the host we should have one bond that has a tap/vhost interface and
> a VF port representor, and on the other we would be looking at the
> virtio interface and the VF. Attaching the tap/vhost to the bond could
> be a way of triggering the feature bit to be set in the virtio. That
> way communication between the guest and the host won't get too
> confusing as you will see all traffic from the bonded MAC address
> always show up on the host side bond instead of potentially showing up
> on two unrelated interfaces. It would also make for a good way to
> resolve the east/west traffic problem on hosts since you could just
> send the broadcast/multicast traffic via the tap/vhost/virtio channel
> instead of having to send it back through the port representor and eat
> up all that PCIe bus traffic.
 From the host point of view, here is a simple script that needs to be 
run to do the
live migration. We don't need any bond configuration on the host.

virsh detach-interface $DOMAIN hostdev --mac $MAC
ip link set $PF vf $VF_NUM mac $ZERO_MAC

virsh migrate --live $DOMAIN qemu+ssh://$REMOTE_HOST/system

ssh $REMOTE_HOST ip link set $PF vf $VF_NUM mac $MAC
ssh $REMOTE_HOST virsh attach-interface $DOMAIN hostdev $REMOTE_HOSTDEV 
--mac $MAC

^ permalink raw reply

* Re: [wireless-testsing2:master 1/4] drivers/net/netdevsim/bpf.c:130:14: sparse: incompatible types for 'case' statement
From: Jakub Kicinski @ 2018-01-04  1:02 UTC (permalink / raw)
  To: kbuild test robot; +Cc: David S. Miller, kbuild-all, netdev, Bob Copeland
In-Reply-To: <201801040317.tEFQay7W%fengguang.wu@intel.com>

On Thu, 4 Jan 2018 03:53:20 +0800, kbuild test robot wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-testing.git master
> head:   6b3b30d0c31ddb2f4d8208c90bc2b4adef47204d
> commit: af2cae39f6ab9dc596616d6a28c7772e1dd55e91 [1/4] Merge remote-tracking branch 'wireless-drivers-next/master'
> reproduce:
>         # apt-get install sparse
>         git checkout af2cae39f6ab9dc596616d6a28c7772e1dd55e91
>         make ARCH=x86_64 allmodconfig
>         make C=1 CF=-D__CHECK_ENDIAN__

>    drivers/net/netdevsim/bpf.c: In function 'nsim_bpf_setup_tc_block_cb':
> >> drivers/net/netdevsim/bpf.c:130:7: error: 'TC_CLSBPF_REPLACE' undeclared (first use in this function); did you mean 'TC_RED_REPLACE'?  
>      case TC_CLSBPF_REPLACE:
>           ^~~~~~~~~~~~~~~~~
>           TC_RED_REPLACE

FWIW looks like the tree contains old net-next code and latest net
(linux/master) code.  Pulling from net-next will solve this.

> :::::: TO: Jakub Kicinski <jakub.kicinski@netronome.com>
> :::::: CC: Daniel Borkmann <daniel@iogearbox.net>

Interestingly Daniel and I were not CCed on the report, is this
intentional?

^ permalink raw reply

* Re: [PATCH] net/mlx5e: hide an unused variable
From: Saeed Mahameed @ 2018-01-04  1:04 UTC (permalink / raw)
  To: Arnd Bergmann, Matan Barak, Leon Romanovsky
  Cc: Or Gerlitz, Hadar Hen Zion, David S. Miller, Paul Blakey,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20180103224022.3737385-1-arnd-r2nGTMty4D4@public.gmane.org>



On 1/3/2018 2:40 PM, Arnd Bergmann wrote:
> The uplink_rpriv variable was added at the start of the function but
> only used inside of an #ifdef:
> 
> drivers/net/ethernet/mellanox/mlx5/core/en_tc.c: In function 'mlx5e_route_lookup_ipv6':
> drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:1549:25: error: unused variable 'uplink_rpriv' [-Werror=unused-variable]
> 
> This moves the declaration into that #ifdef as well.
> 
> Fixes: 5ed99fb421d4 ("net/mlx5e: Move ethernet representors data into separate struct")
> Signed-off-by: Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>

Acked-by: Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Thank you Arnd.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-next] net: sched: fix tcf_block_get_ext() in case CONFIG_NET_CLS is not set
From: Jakub Kicinski @ 2018-01-04  1:30 UTC (permalink / raw)
  To: netdev; +Cc: oss-drivers, jiri, aring, Quentin Monnet

From: Quentin Monnet <quentin.monnet@netronome.com>

The definition of functions tcf_block_get() and tcf_block_get_ext()
depends of CONFIG_NET_CLS being set. When those functions gained extack
support, only one version of the declaration of those functions was
updated. Function tcf_block_get() was later fixed with commit
3c1490913f3b ("net: sch: api: fix tcf_block_get").

Change arguments of tcf_block_get_ext() for the case when CONFIG_NET_CLS
is not set.

Fixes: 8d1a77f974ca ("net: sch: api: add extack support in tcf_block_get")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
---
 include/net/pkt_cls.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 5cd3cf51cb35..c4f4e46ea8d6 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -87,7 +87,8 @@ int tcf_block_get(struct tcf_block **p_block,
 
 static inline
 int tcf_block_get_ext(struct tcf_block **p_block, struct Qdisc *q,
-		      struct tcf_block_ext_info *ei)
+		      struct tcf_block_ext_info *ei,
+		      struct netlink_ext_ack *extack)
 {
 	return 0;
 }
-- 
2.15.1

^ permalink raw reply related

* [net-next PATCH 1/2] bpf: sockmap remove unused function
From: John Fastabend @ 2018-01-04  1:57 UTC (permalink / raw)
  To: borkmann, alexei.starovoitov; +Cc: netdev

This was added for some work that was eventually factored out but the
helper call was missed. Remove it now and add it back later if needed.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 kernel/bpf/sockmap.c |    8 --------
 1 file changed, 8 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 5ee2e41..3f662ee 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -96,14 +96,6 @@ static inline struct smap_psock *smap_psock_sk(const struct sock *sk)
 	return rcu_dereference_sk_user_data(sk);
 }
 
-/* compute the linear packet data range [data, data_end) for skb when
- * sk_skb type programs are in use.
- */
-static inline void bpf_compute_data_end_sk_skb(struct sk_buff *skb)
-{
-	TCP_SKB_CB(skb)->bpf.data_end = skb->data + skb_headlen(skb);
-}
-
 enum __sk_action {
 	__SK_DROP = 0,
 	__SK_PASS,

^ permalink raw reply related

* [net-next PATCH 2/2] bpf: only build sockmap with CONFIG_INET
From: John Fastabend @ 2018-01-04  1:57 UTC (permalink / raw)
  To: borkmann, alexei.starovoitov; +Cc: netdev
In-Reply-To: <20180104015739.14160.96127.stgit@john-Precision-Tower-5810>

The sockmap infrastructure is only aware of TCP sockets at the
moment. In the future we plan to add UDP. In both cases CONFIG_NET
should be built-in.

So lets only build sockmap if CONFIG_INET is enabled.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/linux/bpf.h       |    2 +-
 include/linux/bpf_types.h |    2 +-
 kernel/bpf/Makefile       |    2 ++
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7810ae5..9e03046 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -554,7 +554,7 @@ static inline bool bpf_prog_is_dev_bound(struct bpf_prog_aux *aux)
 }
 #endif /* CONFIG_NET && CONFIG_BPF_SYSCALL */
 
-#if defined(CONFIG_STREAM_PARSER) && defined(CONFIG_BPF_SYSCALL)
+#if defined(CONFIG_STREAM_PARSER) && defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_INET)
 struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key);
 int sock_map_prog(struct bpf_map *map, struct bpf_prog *prog, u32 type);
 #else
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 978c1d9..19b8349 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -42,7 +42,7 @@
 BPF_MAP_TYPE(BPF_MAP_TYPE_HASH_OF_MAPS, htab_of_maps_map_ops)
 #ifdef CONFIG_NET
 BPF_MAP_TYPE(BPF_MAP_TYPE_DEVMAP, dev_map_ops)
-#ifdef CONFIG_STREAM_PARSER
+#if defined(CONFIG_STREAM_PARSER) && defined(CONFIG_INET)
 BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops)
 #endif
 BPF_MAP_TYPE(BPF_MAP_TYPE_CPUMAP, cpu_map_ops)
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index e691da0..a713fd2 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -9,9 +9,11 @@ obj-$(CONFIG_BPF_SYSCALL) += devmap.o
 obj-$(CONFIG_BPF_SYSCALL) += cpumap.o
 obj-$(CONFIG_BPF_SYSCALL) += offload.o
 ifeq ($(CONFIG_STREAM_PARSER),y)
+ifeq ($(CONFIG_INET),y)
 obj-$(CONFIG_BPF_SYSCALL) += sockmap.o
 endif
 endif
+endif
 ifeq ($(CONFIG_PERF_EVENTS),y)
 obj-$(CONFIG_BPF_SYSCALL) += stackmap.o
 endif

^ permalink raw reply related

* RE: [PATCH v2 net,stable 0/2] net: fec: clean up in the cases of probe error
From: Andy Duan @ 2018-01-04  2:07 UTC (permalink / raw)
  To: David Miller
  Cc: festevam@gmail.com, netdev@vger.kernel.org,
	troy.kisky@boundarydevices.com, andrew@lunn.ch
In-Reply-To: <20180103.114057.1176360024955452896.davem@davemloft.net>

From: David Miller <davem@davemloft.net> Sent: Thursday, January 04, 2018 12:41 AM
>> The simple patches just clean up in the cases of probe error like
>> restore dev_id and handle the defer probe when regulator is still not ready.
>>
>> v2:
>> * Fabio Estevam's comment to suggest split v1 to separate patches.
>
>Series applied and queued up for -stable, thanks.

Sorry, pls hold on apply these series, there have one comment from Troy kisky, sorry for the  inconvenience due to my mistake.

^ permalink raw reply

* Re: [PATCH net-next] net: sched: fix tcf_block_get_ext() in case CONFIG_NET_CLS is not set
From: Cong Wang @ 2018-01-04  2:08 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Linux Kernel Network Developers, oss-drivers, Jiri Pirko,
	Alexander Aring, Quentin Monnet
In-Reply-To: <20180104013045.29510-1-jakub.kicinski@netronome.com>

On Wed, Jan 3, 2018 at 5:30 PM, Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
> From: Quentin Monnet <quentin.monnet@netronome.com>
>
> The definition of functions tcf_block_get() and tcf_block_get_ext()
> depends of CONFIG_NET_CLS being set. When those functions gained extack
> support, only one version of the declaration of those functions was
> updated. Function tcf_block_get() was later fixed with commit
> 3c1490913f3b ("net: sch: api: fix tcf_block_get").
>
> Change arguments of tcf_block_get_ext() for the case when CONFIG_NET_CLS
> is not set.

There is one already:
https://patchwork.kernel.org/patch/10130849/

^ permalink raw reply

* [PATCH v3 net,stable 0/2] net: fec: clean up in the cases of probe error
From: Fugang Duan @ 2018-01-04  2:15 UTC (permalink / raw)
  To: festevam, davem; +Cc: netdev, troy.kisky, andrew, fugang.duan

The simple patches just clean up in the cases of probe error like restore dev_id and
handle the defer probe when regulator is still not ready.

v2:
* Fabio Estevam's comment to suggest split v1 to separate patches.
v3:
* Restore dev_id before failed_ioremap path from Troy Kisky's comment.

Fugang Duan (2):
  net: fec: restore dev_id in the cases of probe error
  net: fec: defer probe if regulator is not ready

 drivers/net/ethernet/freescale/fec_main.c | 5 +++++
 1 file changed, 5 insertions(+)

-- 
1.9.1

^ permalink raw reply

* [PATCH v3 net,stable 1/2] net: fec: restore dev_id in the cases of probe error
From: Fugang Duan @ 2018-01-04  2:15 UTC (permalink / raw)
  To: festevam, davem; +Cc: netdev, troy.kisky, andrew, fugang.duan
In-Reply-To: <1515032129-7899-1-git-send-email-fugang.duan@nxp.com>

The static variable dev_id always plus one before netdev registerred.
It should restore the dev_id value in the cases of probe error.

Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
---
 drivers/net/ethernet/freescale/fec_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index e17d10b..732a8e3 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3574,6 +3574,7 @@ static int fec_enet_get_irq_cnt(struct platform_device *pdev)
 		of_phy_deregister_fixed_link(np);
 failed_phy:
 	of_node_put(phy_node);
+	dev_id--;
 failed_ioremap:
 	free_netdev(ndev);
 
-- 
1.9.1

^ permalink raw reply related

* [PATCH v3 net,stable 2/2] net: fec: defer probe if regulator is not ready
From: Fugang Duan @ 2018-01-04  2:15 UTC (permalink / raw)
  To: festevam, davem; +Cc: netdev, troy.kisky, andrew, fugang.duan
In-Reply-To: <1515032129-7899-1-git-send-email-fugang.duan@nxp.com>

Defer probe if regulator is not ready. E.g. some regulator is fixed
regulator controlled by i2c expander gpio, the i2c device may be probed
after the driver, then it should handle the case of defer probe error.

Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
---
 drivers/net/ethernet/freescale/fec_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 732a8e3..a32fbf5 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3489,6 +3489,10 @@ static int fec_enet_get_irq_cnt(struct platform_device *pdev)
 			goto failed_regulator;
 		}
 	} else {
+		if (PTR_ERR(fep->reg_phy) == -EPROBE_DEFER) {
+			ret = -EPROBE_DEFER;
+			goto failed_regulator;
+		}
 		fep->reg_phy = NULL;
 	}
 
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH v2 net,stable 0/2] net: fec: clean up in the cases of probe error
From: David Miller @ 2018-01-04  2:20 UTC (permalink / raw)
  To: fugang.duan; +Cc: festevam, netdev, troy.kisky, andrew
In-Reply-To: <AM4PR0401MB2260428F5AAE1C6E3A6EE175FF1F0@AM4PR0401MB2260.eurprd04.prod.outlook.com>

From: Andy Duan <fugang.duan@nxp.com>
Date: Thu, 4 Jan 2018 02:07:40 +0000

> Sorry, pls hold on apply these series, there have one comment from
> Troy kisky, sorry for the inconvenience due to my mistake.

Once I have applied patches, they are there in my GIT tree and and
cannot be "undone".

You have to send me relative fixes on top your changes.

^ permalink raw reply

* Re: [PATCH v3 net,stable 0/2] net: fec: clean up in the cases of probe error
From: David Miller @ 2018-01-04  2:22 UTC (permalink / raw)
  To: fugang.duan; +Cc: festevam, netdev, troy.kisky, andrew
In-Reply-To: <1515032129-7899-1-git-send-email-fugang.duan@nxp.com>

From: Fugang Duan <fugang.duan@nxp.com>
Date: Thu, 4 Jan 2018 10:15:27 +0800

> The simple patches just clean up in the cases of probe error like restore dev_id and
> handle the defer probe when regulator is still not ready.

As I stated, v2 of these patches are already in my tree, so
this patch series will not apply.

You need to send me fixes relative to v2.

^ permalink raw reply

* RE: [PATCH v3 net,stable 0/2] net: fec: clean up in the cases of probe error
From: Andy Duan @ 2018-01-04  2:25 UTC (permalink / raw)
  To: David Miller
  Cc: festevam@gmail.com, netdev@vger.kernel.org,
	troy.kisky@boundarydevices.com, andrew@lunn.ch
In-Reply-To: <20180103.212241.1465013589762600791.davem@davemloft.net>

From: David Miller <davem@davemloft.net> Sent: Thursday, January 04, 2018 10:23 AM
>> The simple patches just clean up in the cases of probe error like
>> restore dev_id and handle the defer probe when regulator is still not ready.
>
>As I stated, v2 of these patches are already in my tree, so this patch series will
>not apply.
>
>You need to send me fixes relative to v2.

Okay, got it.  I will submit another patch to fix it. Sorry for the inconvenience.
Thanks very much.

^ permalink raw reply

* Re: [patch net-next v2 00/10] Add support for resource abstraction
From: David Ahern @ 2018-01-04  2:28 UTC (permalink / raw)
  To: Arkadi Sharshevsky, Jiri Pirko, netdev, roopa
  Cc: davem, mlxsw, andrew, vivien.didelot, f.fainelli, michael.chan,
	ganeshgr, saeedm, matanb, leonro, idosch, jakub.kicinski, ast,
	daniel, simon.horman, pieter.jansenvanvuuren, john.hurley,
	alexander.h.duyck, linville, gospo, steven.lin1, yuvalm, ogerlitz
In-Reply-To: <0f861e90-63d3-2666-ef2d-0fc91beae957@mellanox.com>

On 1/3/18 11:05 AM, Arkadi Sharshevsky wrote:
> 
> 
> On 01/02/2018 08:05 PM, David Ahern wrote:
>> On 1/1/18 7:58 AM, Arkadi Sharshevsky wrote:
>>>
>>> Just to summarize the current fixes required:
>>>
>>> 1. ERIF dpipe table size is reporting wrong size. More precisely the
>>>    ERIF table does not take rifs, so it should not be linked to the rif
>>>    bank resource (is not part of this patchset, future extension).
>>> 2. Extended ACK user-space bug.
>>> 3. ABI documentation- Not sure we agreed upon it, Jiri?
>>>
>>> If I missed something please respond. Nothing of the fixes mentioned
>>> above is relevant for this patchset actually.
>>>
>>
>> Can you fix the userspace command and then we come back to what else is
>> needed? Right now, it is hard to tell what is a user space bug and what
>> is a kernel space bug.
>>
>> For example:
>> $ devlink resource set pci/0000:03:00.0 path /kvd/linear size 10000
>> $ devlink resource show pci/0000:03:00.0
>> pci/0000:03:00.0:
>>   name kvd size 245760 size_valid true
>>   resources:
>>     name linear size 98304 occ 0
>>     name hash_double size 60416
>>     name hash_single size 87040
>>
>> The set command did not fail, yet there is no size_new arg in the output
>> like there is for this change:
>>
>> $ devlink resource set pci/0000:03:00.0 path /kvd/linear size 0
>> $ devlink resource show pci/0000:03:00.0
>> pci/0000:03:00.0:
>>   name kvd size 245760 size_valid true
>>   resources:
>>     name linear size 98304 size_new 0 occ 0
>>     name hash_double size 60416
>>     name hash_single size 87040
>>
> 
> As I stated this is a user-space bug which I fixed, and updated my repo
> so please pull. Devlink uses mnl,and currently mnl does not support
> extended ack. I added support for this in my local ver of libmnl:
> 
> https://github.com/arkadis/libmnl.git
> 
> On branch master, so you can check it out. Besides this bugs, which were
> userspace, can please specify what are the pending problems from your
> point of view? Thanks!
> 

Again, my comments all stem from user experience.

Can you explain what "double_word" means for a unit? I would expect a
units to be kB or count (or items or entries).

$ devlink resource show pci/0000:03:00.0
pci/0000:03:00.0:
  name kvd size 245760 unit double_word size_valid true
  resources:
    name linear size 98304 occ 0 unit double_word
    name hash_double size 60416 unit double_word
    name hash_single size 87040 unit double_word

While that is confusing here from the userspace command it goes hand in
hand with patch 2 and:

+enum devlink_resource_unit {
+	DEVLINK_RESOURCE_UNIT_DOUBLE_WORD,
+};


Also, it seems like the occ of 0 is wrong since we know from past
responses that if I set linear to 0 all of networking breaks.



How does a user learn the granularity of a resource:

$ devlink resource set pci/0000:03:00.0 path /kvd/hash_double size 50000
Error: mlxsw_spectrum: resource set with wrong granularity.

Try again with 51000 and then 52000 and ... Why not export the
granularity read-only? I don't see it in the proposed UAPI.


And then on the reload:

$ devlink reload pci/0000:03:00.0
Error: devlink: resources size validation failed.

Since the reload is not doing any resource sizing that error message is
confusing. Maybe something like "Sum of the resource components exceeds
total size."


Finally, I still contend a 1-line description of each of the resources
goes a long way to improving the user friendliness of this set.

^ permalink raw reply

* [PATCH net-next 1/2] ip: do not set RFS core on error queue reads
From: Soheil Hassas Yeganeh @ 2018-01-04  2:47 UTC (permalink / raw)
  To: davem, netdev
  Cc: pjt, ycheng, Soheil Hassas Yeganeh, Willem de Bruijn,
	Eric Dumazet, Neal Cardwell

From: Soheil Hassas Yeganeh <soheil@google.com>

We should only record RPS on normal reads and writes.
In single threaded processes, all calls record the same state. In
multi-threaded processes where a separate thread processes
errors, the RFS table mispredicts.

Note that, when CONFIG_RPS is disabled, sock_rps_record_flow
is a noop and no branch is added as a result of this patch.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/af_inet.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index bab98a4fedad..54cccdd8b1e3 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -790,7 +790,8 @@ int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 	int addr_len = 0;
 	int err;
 
-	sock_rps_record_flow(sk);
+	if (likely(!(flags & MSG_ERRQUEUE)))
+		sock_rps_record_flow(sk);
 
 	err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT,
 				   flags & ~MSG_DONTWAIT, &addr_len);
-- 
2.16.0.rc0.223.g4a4ac83678-goog

^ permalink raw reply related

* [PATCH net-next 2/2] net: revert "Update RFS target at poll for tcp/udp"
From: Soheil Hassas Yeganeh @ 2018-01-04  2:47 UTC (permalink / raw)
  To: davem, netdev
  Cc: pjt, ycheng, Soheil Hassas Yeganeh, Willem de Bruijn,
	Eric Dumazet, Neal Cardwell
In-Reply-To: <20180104024711.257600-1-soheil.kdev@gmail.com>

From: Soheil Hassas Yeganeh <soheil@google.com>

On multi-threaded processes, one common architecture is to have
one (or a small number of) threads polling sockets, and a
considerably larger pool of threads reading form and writing to the
sockets. When we set RPS core on tcp_poll() or udp_poll() we essentially
steer all packets of all the polled FDs to one (or small number of)
cores, creaing a bottleneck and/or RPS misprediction.

Another common architecture is to shard FDs among threads pinned
to cores. In such a setting, setting RPS core in tcp_poll() and
udp_poll() is redundant because the RFS core is correctly
set in recvmsg and sendmsg.

Thus, revert the following commit:
c3f1dbaf6e28 ("net: Update RFS target at poll for tcp/udp").

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/tcp.c | 2 --
 net/ipv4/udp.c | 2 --
 2 files changed, 4 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 7ac583a2b9fe..f68cb33d50d1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -498,8 +498,6 @@ unsigned int tcp_poll(struct file *file, struct socket *sock, poll_table *wait)
 	const struct tcp_sock *tp = tcp_sk(sk);
 	int state;
 
-	sock_rps_record_flow(sk);
-
 	sock_poll_wait(file, sk_sleep(sk), wait);
 
 	state = inet_sk_state_load(sk);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e9c0d1e1772e..db72619e07e4 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2490,8 +2490,6 @@ unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait)
 	if (!skb_queue_empty(&udp_sk(sk)->reader_queue))
 		mask |= POLLIN | POLLRDNORM;
 
-	sock_rps_record_flow(sk);
-
 	/* Check for false positives due to checksum errors */
 	if ((mask & POLLRDNORM) && !(file->f_flags & O_NONBLOCK) &&
 	    !(sk->sk_shutdown & RCV_SHUTDOWN) && first_packet_length(sk) == -1)
-- 
2.16.0.rc0.223.g4a4ac83678-goog

^ permalink raw reply related

* [PATCH net,stable 1/1] net: fec: free/restore resource in related probe error pathes
From: Fugang Duan @ 2018-01-04  2:47 UTC (permalink / raw)
  To: troy.kisky, davem; +Cc: netdev, festevam, fugang.duan

Fixes in probe error path:
- Restore dev_id before failed_ioremap path.
  Fixes: ("net: fec: restore dev_id in the cases of probe error")
- Call of_node_put(phy_node) before failed_phy path.
  Fixes: ("net: fec: Support phys probed from devicetree and fixed-link")

Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
---
 drivers/net/ethernet/freescale/fec_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index feed383..90aa69a 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3576,11 +3576,11 @@ static int fec_enet_get_irq_cnt(struct platform_device *pdev)
 failed_clk:
 	if (of_phy_is_fixed_link(np))
 		of_phy_deregister_fixed_link(np);
-failed_phy:
 	of_node_put(phy_node);
+failed_phy:
+	dev_id--;
 failed_ioremap:
 	free_netdev(ndev);
-	dev_id--;
 
 	return ret;
 }
-- 
1.9.1

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox