Netdev List


* [iproute PATCH v3 0/3] Check user supplied interface name lengths
From: Phil Sutter @ 2017-10-02 11:46 UTC
  To: Stephen Hemminger; +Cc: netdev

This series adds explicit checks for user-supplied interface names to
make sure they fit Linux's requirements.
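
As a rough illustration of the kind of check meant here (a minimal sketch, not the code from patch 3; the helper name ifname_len_ok is made up for this example), a name can be compared against IFNAMSIZ, which on Linux is 16 including the terminating NUL:

```c
#include <net/if.h>	/* IFNAMSIZ */
#include <string.h>

#ifndef IFNAMSIZ
#define IFNAMSIZ 16	/* fallback for strict build modes; matches the Linux value */
#endif

/*
 * Hypothetical helper, for illustration only.
 * Returns 0 if name is non-empty and fits into IFNAMSIZ bytes
 * including the terminating NUL, -1 otherwise.
 */
static int ifname_len_ok(const char *name)
{
	size_t len;

	if (name == NULL)
		return -1;

	len = strlen(name);
	if (len == 0 || len >= IFNAMSIZ)
		return -1;

	return 0;
}
```

With this sketch, "eth0" passes, while an empty string or a 16-character name is rejected.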

The first two patches simplify interface name parsing in some places -
these are side-effects of working on the actual implementation provided
in patch three.

Changes since v2:
- Changed patch 3 as suggested in review.

Changes since v1:
- Patches 1 and 2 introduced.
- Changes to patch 3 are listed in there.

Phil Sutter (3):
  ip{6,}tunnel: Avoid copying user-supplied interface name around
  tc: flower: No need to cache indev arg
  Check user supplied interface name lengths

 include/utils.h |  2 ++
 ip/ip6tunnel.c  |  9 +++++----
 ip/ipl2tp.c     |  4 +++-
 ip/iplink.c     | 31 ++++++++++++-------------------
 ip/ipmaddr.c    |  3 ++-
 ip/iprule.c     | 10 ++++++++--
 ip/iptunnel.c   | 29 +++++++++++++++--------------
 ip/iptuntap.c   |  6 ++++--
 lib/utils.c     | 29 +++++++++++++++++++++++++++++
 misc/arpd.c     |  3 ++-
 tc/f_flower.c   |  7 +++----
 11 files changed, 85 insertions(+), 48 deletions(-)

-- 
2.13.1


* v4.14-rc2/arm64 misaligned atomic in ip_expire() / skb_clone()
From: Mark Rutland @ 2017-10-02 11:57 UTC
  To: linux-kernel, netdev, linux-arm-kernel, syzkaller
  Cc: David S. Miller, Willem de Bruijn, Eric Dumazet

Hi all,

I'm intermittently hitting splats like below in skb_clone() while
fuzzing v4.14-rc2 on arm64 with Syzkaller. It looks like the
atomic_inc() at the end of __skb_clone() is being passed a misaligned
pointer.
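
For reference, the faulting address in the splat below (also the value in x1) can be checked by hand. A minimal sketch, assuming the counter being incremented is a 4-byte atomic_t; is_aligned is a helper made up for this illustration:

```c
#include <stdint.h>

/*
 * Returns non-zero if addr is a multiple of align.
 * align must be a power of two.
 */
static int is_aligned(uint64_t addr, uint64_t align)
{
	return (addr & (align - 1)) == 0;
}
```

For 0xffff80002fd714a2 the low two bits are 0b10, so the address sits two bytes past a 4-byte boundary.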

I've uploaded a number of splats and their associated (full) Syzkaller
logs, along with my kernel config to my kernel.org webspace [1]. It
might take a while for that to appear.

This isn't a pure v4.14-rc2, as I have a not-yet-upstream fix [2]
applied to avoid a userfaultfd bug. The userfaultfd syscall appears in
all of the Syzkaller logs, so there is the chance that this is related,
but as I've not seen any other issues I suspect that's unlikely.

Thanks,
Mark.

[1] https://www.kernel.org/pub/linux/kernel/people/mark/bugs/20171002-skb_clone-misaligned-atomic
[2] https://lkml.kernel.org/r/20170920180413.26713-1-aarcange@redhat.com

Unable to handle kernel paging request at virtual address ffff80002fd714a2
Mem abort info:
  Exception class = DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x00000033
  CM = 0, WnR = 0
swapper pgtable: 4k pages, 48-bit VAs, pgd = ffff20000eeb2000
[ffff80002fd714a2] *pgd=000000007eff7003, *pud=000000007eff6003, *pmd=00f800006fc00711
Internal error: Oops: 96000021 [#1] PREEMPT SMP
Modules linked in:
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.0-rc2-00001-gd7ad33d #115
Hardware name: linux,dummy-virt (DT)
task: ffff80003a901a80 task.stack: ffff80003a908000
PC is at __ll_sc_atomic_add+0x4/0x18 arch/arm64/include/asm/atomic_ll_sc.h:113
LR is at atomic_add arch/arm64/include/asm/atomic_lse.h:45 [inline]
LR is at __skb_clone+0x4a8/0x6c0 net/core/skbuff.c:873
pc : [<ffff20000a30ce44>] lr : [<ffff200009dffb58>] pstate: 10000145
sp : ffff80003efd86e0
x29: ffff80003efd86e0 x28: 000060003418b000 
x27: ffff20000ae55360 x26: ffff8000182c1608 
x25: ffff80002fd7137e x24: ffff8000182c1610 
x23: ffff20000ae60000 x22: ffff80001577871c 
x21: 1ffff00007dfb0e8 x20: ffff8000182c1540 
x19: ffff800015778640 x18: ffff20000da58140 
x17: 0000000000000000 x16: 0000000000000002 
x15: ffff20000e1485a0 x14: ffff2000082f912c 
x13: ffff2000082f8dcc x12: ffff2000082f8980 
x11: 1ffff00002aef0df x10: ffff100002aef0df 
x9 : dfff200000000000 x8 : 0082009000a40008 
x7 : 0000000000000000 x6 : ffff800015778700 
x5 : ffff100002aef0e0 x4 : 0000000000000000 
x3 : 1ffff00002aef0e3 x2 : ffff80002fd7147e 
x1 : ffff80002fd714a2 x0 : 0000000000000001 
Process swapper/3 (pid: 0, stack limit = 0xffff80003a908000)
Call trace:
Exception stack(0xffff80003efd85a0 to 0xffff80003efd86e0)
85a0: 0000000000000001 ffff80002fd714a2 ffff80002fd7147e 1ffff00002aef0e3
85c0: 0000000000000000 ffff100002aef0e0 ffff800015778700 0000000000000000
85e0: 0082009000a40008 dfff200000000000 ffff100002aef0df 1ffff00002aef0df
8600: ffff2000082f8980 ffff2000082f8dcc ffff2000082f912c ffff20000e1485a0
8620: 0000000000000002 0000000000000000 ffff20000da58140 ffff800015778640
8640: ffff8000182c1540 1ffff00007dfb0e8 ffff80001577871c ffff20000ae60000
8660: ffff8000182c1610 ffff80002fd7137e ffff8000182c1608 ffff20000ae55360
8680: 000060003418b000 ffff80003efd86e0 ffff200009dffb58 ffff80003efd86e0
86a0: ffff20000a30ce44 0000000010000145 ffff800015778640 ffff8000182c1540
86c0: 0001000000000000 ffff8000182c15ce ffff80003efd86e0 ffff20000a30ce44
[<ffff20000a30ce44>] __ll_sc_atomic_add+0x4/0x18 arch/arm64/include/asm/atomic_ll_sc.h:113
[<ffff200009e1009c>] skb_clone+0x1c4/0x3b0 net/core/skbuff.c:1286
[<ffff200009f2ff80>] ip_expire+0x4e8/0x7c0 net/ipv4/ip_fragment.c:239
[<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
[<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
[<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
[<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
[<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
[<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
[<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
[<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
[<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
[<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
[<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367
Exception stack(0xffff80003a90bd70 to 0xffff80003a90beb0)
bd60:                                   ffff80003a90234c 0000000000000007
bd80: 0000000000000000 1ffff00007520469 1fffe400017ad00c ffffffffffffe540
bda0: 0000000000000000 0000000000000000 ffff80003a902350 1ffff00007520469
bdc0: ffff80003a902348 ffff80003a902368 1ffff0000752046c 1ffff0000752046e
bde0: 1ffff0000752046d ffff20000e1485a0 0000000000000000 0000000000029d44
be00: ffff20000da58140 ffff80003a901a80 ffff80003a901a80 dfff200000000000
be20: ffff20000ae60e98 ffff0400015cc1d3 0000000000000000 ffff20000ae60df8
be40: ffff20000ae60df8 0000000000000000 0000000000000000 ffff80003a90beb0
be60: ffff200008089b50 ffff80003a90beb0 ffff200008089b54 0000000010000145
be80: ffff80003a901a80 ffff80003a901a80 ffffffffffffffff 01f6cee936b5bc00
bea0: ffff80003a90beb0 ffff200008089b54
[<ffff200008084034>] el1_irq+0xb4/0x12c arch/arm64/kernel/entry.S:569
[<ffff200008089b54>] arch_local_irq_enable arch/arm64/include/asm/irqflags.h:40 [inline]
[<ffff200008089b54>] arch_cpu_idle+0x1c/0x28 arch/arm64/kernel/process.c:87
[<ffff20000a360a94>] default_idle_call+0x34/0x78 kernel/sched/idle.c:98
[<ffff200008254a34>] cpuidle_idle_call kernel/sched/idle.c:156 [inline]
[<ffff200008254a34>] do_idle+0x20c/0x370 kernel/sched/idle.c:246
[<ffff20000825513c>] cpu_startup_entry+0x24/0x28 kernel/sched/idle.c:351
[<ffff2000080a2f4c>] secondary_start_kernel+0x2fc/0x498 arch/arm64/kernel/smp.c:280
Code: 978b7cfd 17ffff91 00000000 f9800031 (885f7c31) 
---[ end trace e4e9a51ab15d3a5f ]---


* Re: [PATCH] mac80211: aead api to reduce redundancy
From: Johannes Berg @ 2017-10-02 12:04 UTC
  To: Xiang Gao, davem, linux-kernel, linux-wireless, netdev
In-Reply-To: <20170926131945.3962-1-qasdfgtyuiop@gmail.com>

Please use "v2" tag or so in the subject line, having the same patch
again is really not helpful.

The next should be v3, obviously.

> +++ b/net/mac80211/aead_api.c
> @@ -1,7 +1,4 @@
> -/*
> - * Copyright 2014-2015, Qualcomm Atheros, Inc.
> - *
> - * This program is free software; you can redistribute it and/or modify
> +/* This program is free software; you can redistribute it and/or modify

I see no reason to make this change, why remove copyright?

> +++ b/net/mac80211/wpa.c
> @@ -464,7 +464,8 @@ static int ccmp_encrypt_skb(struct ieee80211_tx_data *tx, struct sk_buff *skb,
>  	pos += IEEE80211_CCMP_HDR_LEN;
>  	ccmp_special_blocks(skb, pn, b_0, aad);
>  	return ieee80211_aes_ccm_encrypt(key->u.ccmp.tfm, b_0, aad, pos, len,
> -					 skb_put(skb, mic_len), mic_len);
> +					 skb_put(skb,
> +						 key->u.ccmp.tfm->authsize));
>  }

I see no reason for the change from mic_len to authsize here?

> @@ -540,10 +541,11 @@ ieee80211_crypto_ccmp_decrypt(struct ieee80211_rx_data *rx,
>  			ccmp_special_blocks(skb, pn, b_0, aad);
>  
>  			if (ieee80211_aes_ccm_decrypt(
> -				    key->u.ccmp.tfm, b_0, aad,
> -				    skb->data + hdrlen + IEEE80211_CCMP_HDR_LEN,
> -				    data_len,
> -				    skb->data + skb->len - mic_len, mic_len))
> +				key->u.ccmp.tfm, b_0, aad,
> +				skb->data + hdrlen + IEEE80211_CCMP_HDR_LEN,
> +				data_len,
> +				skb->data + skb->len - key->u.ccmp.tfm->authsize
> +			))
>  				return RX_DROP_UNUSABLE;

That's a really really strange way of writing this ...

Please reformat.

johannes


* Re: cross namespace interface notification for tun devices
From: Nicolas Dichtel @ 2017-10-02 12:06 UTC
  To: Jason A. Donenfeld; +Cc: Netdev, Mathias
In-Reply-To: <CAHmME9pAvv7ebKC-uZGPJRi9Jasgrd2tgCvS1Lji+cgM1mV2qw@mail.gmail.com>

On 02/10/2017 at 13:11, Jason A. Donenfeld wrote:
> On Mon, Oct 2, 2017 at 11:32 AM, Nicolas Dichtel
> <nicolas.dichtel@6wind.com> wrote:
>> 1. Move the process to netns B, open the netlink socket and move back the
>> process to netns A. The socket will remain in netns B and you will receive all
>> netlink messages related to netns B.
>>
>> 2. Assign a nsid to netns B in netns A and use NETLINK_LISTEN_ALL_NSID on your
>> netlink socket (see iproute2).
> 
> Both of these seem to rely on the process knowing where the device is
> being moved and having access to that namespace. I don't think these
> two things are a given though. Unless I'm missing something?
I had not understood correctly.
Your control process cannot monitor or control an interface which is in an
unknown/hidden netns. But x-netns interfaces are special. We already added a way to
identify the peer netns for this kind of interface.
If a get_link_net handler were added to the rtnl_link_ops of the tun driver, it
would help to identify netns A when you are in netns B. But you need the opposite.
I already tried a patch to advertise the dst netns via netlink when an interface
moves to a new netns. I think that it is valid for x-netns interfaces.
As soon as you can identify the dst netns, your problem is solved, right?


Nicolas


* Re: [net-next V2 PATCH 5/5] samples/bpf: add cpumap sample program xdp_redirect_cpu
From: Jesper Dangaard Brouer @ 2017-10-02 12:07 UTC
  To: Alexei Starovoitov
  Cc: netdev, jakub.kicinski, Michael S. Tsirkin, Jason Wang, mchan,
	John Fastabend, peter.waskiewicz.jr, Daniel Borkmann,
	Andy Gospodarek, brouer
In-Reply-To: <20170930030607.sk2wzjxxlbhkkt7k@ast-mbp>

On Fri, 29 Sep 2017 20:06:09 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> > +/*** Trace point code ***/
> > +
> > +/* Tracepoint format: /sys/kernel/debug/tracing/events/xdp/xdp_redirect/format
> > + * Code in:                kernel/include/trace/events/xdp.h
> > + */
> > +struct xdp_redirect_ctx {
> > +	unsigned short common_type;	//	offset:0;  size:2; signed:0;
> > +	unsigned char common_flags;	//	offset:2;  size:1; signed:0;
> > +	unsigned char common_preempt_count;//	offset:3;  size:1; signed:0;
> > +	int common_pid;			//	offset:4;  size:4; signed:1;  
> 
> this part is not right. First 8 bytes are not accessible by bpf code.
> Please use __u64 pad; or similar here.

I've corrected this in V3.

Can you explain why BPF cannot access these (first 8 bytes) struct members?


> Just noticed that samples/bpf/xdp_monitor_kern.c has the same problem.
> 
> > +
> > +	int prog_id;			//	offset:8;  size:4; signed:1;
> > +	u32 act;			//	offset:12  size:4; signed:0;
> > +	int ifindex;			//	offset:16  size:4; signed:1;
> > +	int err;			//	offset:20  size:4; signed:1;
> > +	int to_ifindex;			//	offset:24  size:4; signed:1;
> > +	u32 map_id;			//	offset:28  size:4; signed:0;
> > +	int map_index;			//	offset:32  size:4; signed:1;
> > +};					//	offset:36  
> 
> the second part of fields is correct.
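
One way to read the suggestion above, as a sketch only (this is not the V3 code; the struct name is made up, and userspace fixed-width types stand in for __u64/u32): replace the four common_* members with a single 8-byte pad so the remaining fields keep the offsets listed in the tracepoint format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration; field names follow the quoted struct. */
struct xdp_redirect_ctx_sketch {
	uint64_t pad;		/* offset:0;  size:8; covers the common_* fields */
	int      prog_id;	/* offset:8;  size:4; signed:1; */
	uint32_t act;		/* offset:12; size:4; signed:0; */
	int      ifindex;	/* offset:16; size:4; signed:1; */
	int      err;		/* offset:20; size:4; signed:1; */
	int      to_ifindex;	/* offset:24; size:4; signed:1; */
	uint32_t map_id;	/* offset:28; size:4; signed:0; */
	int      map_index;	/* offset:32; size:4; signed:1; */
};

/* Compile-time checks that the pad keeps the first and last field in place. */
static_assert(offsetof(struct xdp_redirect_ctx_sketch, prog_id) == 8,
	      "prog_id must start right after the 8-byte pad");
static_assert(offsetof(struct xdp_redirect_ctx_sketch, map_index) == 32,
	      "map_index must match the tracepoint format offset");
```

The static_assert lines make the build fail if the layout ever drifts from the offsets in the format file.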


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [RFC net-next 0/5] net: dsa: LAG support
From: Andrew Lunn @ 2017-10-02 12:51 UTC
  To: Florian Fainelli
  Cc: netdev, vivien.didelot, jiri, idosch, Woojung.Huh, john,
	sean.wang
In-Reply-To: <20171001194639.8647-1-f.fainelli@gmail.com>

> - not sure what to do with a switch fabric, naively, if adding two ports
>   of two distinct switches as a LAG group, we may have to propagate that
>   to "dsa" cross-chip interfaces as well

Hi Florian

Marvell switches do support this. If I remember correctly, it requires
some setup for forwarding over the DSA ports.

But for a first implementation, I would be tempted to disallow such
setups. Force the LAG members to be on the same switch.

	Andrew


* Re: [jkirsher/next-queue PATCH] ixgbe: Update adaptive ITR algorithm
From: Jesper Dangaard Brouer @ 2017-10-02 12:56 UTC
  To: Alexander Duyck; +Cc: netdev, intel-wired-lan, john.fastabend, brouer
In-Reply-To: <20170925215225.15616.63705.stgit@localhost.localdomain>

On Mon, 25 Sep 2017 14:55:36 -0700
Alexander Duyck <alexander.duyck@gmail.com> wrote:

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> The following change is meant to update the adaptive ITR algorithm to
> better support the needs of the network. Specifically with this change what
> I have done is make it so that our ITR algorithm will try to prevent either
> starving a socket buffer for memory in the case of Tx, or overrunning an Rx
> socket buffer on receive.
> 
> In addition a side effect of the calculations used is that we should
> function better with new features such as XDP which can handle small
> packets at high rates without needing to lock us into NAPI polling mode.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
> 
> So I am putting this out to a wider distribution list than normal for a
> patch like this in order to get feedback on if there are any areas I may
> have overlooked. With this patch it should address many of the performance
> limitations seen with pktgen and XDP in terms of workloads that the old
> adaptive scheme wasn't handling.

Thanks a lot Alex!

I've tested the patch with XDP redirect (map), and the issue I reported
in [1] is solved with this patch.

[1] Subject: "XDP redirect measurements, gotchas and tracepoints"
 http://lkml.kernel.org/r/20170821212506.1cb0d5d6@redhat.com

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [RFC net-next 0/5] net: dsa: LAG support
From: Andrew Lunn @ 2017-10-02 12:59 UTC
  To: Ido Schimmel
  Cc: Florian Fainelli, netdev, vivien.didelot, jiri, idosch,
	Woojung.Huh, john, sean.wang
In-Reply-To: <20171002065023.GA11832@shredder.mtl.com>

> > - not sure what to do with a switch fabric, naively, if adding two ports
> >   of two distinct switches as a LAG group, we may have to propagate that
> >   to "dsa" cross-chip interfaces as well
> 
> At least in mlxsw case, enslaving switch and non-switch ports to the
> same LAG doesn't make sense. Any traffic routed by the switch will only
> be load-balanced between the switch ports. One way to solve that is to
> forbid such enslavements during NETDEV_PRECHANGEUPPER in case the lower
> devices in the adjacency list of the LAG don't belong to the same
> switch.
> 
> Note that such configurations are bound to fail anyway, as the
> non-switch ports will not have `switchdev_ops` configured and thus fail
> during __switchdev_port_obj_add() / __switchdev_port_attr_set().

Hi Ido

Here Florian is thinking about the D in DSA. Marvell switches have the
capabilities of building a switch fabric out of multiple
interconnected switches. To switchdev, they appear as a single switch.
switchdev has no idea of the mapping of interfaces to switches, nor
the routing of frames between switches. This all happens in the layers
below. The hardware does support LAG members on different switches
within the same fabric. But it requires some additional setup for the
ports which link switches together. We have the same issues with MDB,
where additional setup is required for group members spread over the
switch fabric.

      Andrew


* Re: [RFC net-next 0/5] net: dsa: LAG support
From: Ido Schimmel @ 2017-10-02 13:05 UTC
  To: Andrew Lunn
  Cc: Florian Fainelli, netdev, vivien.didelot, jiri, idosch,
	Woojung.Huh, john, sean.wang
In-Reply-To: <20171002125932.GB4765@lunn.ch>

Hi Andrew,

On Mon, Oct 02, 2017 at 02:59:32PM +0200, Andrew Lunn wrote:
> > > - not sure what to do with a switch fabric, naively, if adding two ports
> > >   of two distinct switches as a LAG group, we may have to propagate that
> > >   to "dsa" cross-chip interfaces as well
> > 
> > At least in mlxsw case, enslaving switch and non-switch ports to the
> > same LAG doesn't make sense. Any traffic routed by the switch will only
> > be load-balanced between the switch ports. One way to solve that is to
> > forbid such enslavements during NETDEV_PRECHANGEUPPER in case the lower
> > devices in the adjacency list of the LAG don't belong to the same
> > switch.
> > 
> > Note that such configurations are bound to fail anyway, as the
> > non-switch ports will not have `switchdev_ops` configured and thus fail
> > during __switchdev_port_obj_add() / __switchdev_port_attr_set().
> 
> Hi Ido
> 
> Here Florian is thinking about the D in DSA. Marvell switches have the
> capabilities of building a switch fabric out of multiple
> interconnected switches. To switchdev, they appear as a single switch.
> switchdev has no idea of the mapping of interfaces to switches, nor
> the routing of frames between switches. This all happens in the layers
> below. The hardware does support LAG members on different switches
> within the same fabric. But it requires some additional setup for the
> ports which link switches together. We have the same issues with MDB,
> where additional setup is required for group members spread over the
> switch fabric.

Yes, I understood that. I was simply referring to the more general
problem of any two net devices and how to solve it. Not currently
implemented in mlxsw, but should be necessary for DSA as well.

Agree with your previous mail about keeping it simple for the first
implementation.


* Re: [PATCH 05/18] net: use ARRAY_SIZE
From: Andy Shevchenko @ 2017-10-02 13:07 UTC
  To: Jérémy Lefaure
  Cc: Sathya Perla, Ajit Khaparde, Sriharsha Basavapatna, Somnath Kotur,
	Jeff Kirsher, Arend van Spriel, Franky Lin, Hante Meuleman,
	Chi-Hsien Lin, Wright Feng, Kalle Valo, Larry Finger, Chaoming Li,
	David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev,
	"linux-kernel@vger.kernel.org" <
In-Reply-To: <20171001193101.8898-6-jeremy.lefaure@lse.epita.fr>

On Sun, Oct 1, 2017 at 10:30 PM, Jérémy Lefaure
<jeremy.lefaure@lse.epita.fr> wrote:
> Using the ARRAY_SIZE macro improves the readability of the code. Also,
> it is not always useful to use a variable to store this constant
> calculated at compile time.
>

> +       {&gainctrl_lut_core0_rev0, ARRAY_SIZE(gainctrl_lut_core0_rev0), 26, 192,
> +        32},

For all such cases I would rather put it on one line and disregard the
checkpatch warning, for better readability.
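
For readers unfamiliar with the macro, a simplified userspace sketch of what ARRAY_SIZE computes (the kernel version additionally rejects non-array arguments at compile time; example_lut and example_lut_len are made up for this illustration):

```c
#include <stddef.h>

/* Simplified form: number of elements in a true array (not a pointer). */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical table, standing in for a lookup table like the quoted one. */
static const unsigned int example_lut[] = { 26, 192, 32 };

/* The element count follows the initializer, so no separate length
 * variable has to be kept in sync by hand. */
static size_t example_lut_len(void)
{
	return ARRAY_SIZE(example_lut);
}
```

Because the count is a compile-time constant, it can be used directly in a table initializer, as in the quoted hunk.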

-- 
With Best Regards,
Andy Shevchenko


* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Eric Dumazet @ 2017-10-02 13:36 UTC
  To: Mark Rutland
  Cc: LKML, netdev, linux-arm-kernel, syzkaller, David S. Miller,
	Willem de Bruijn
In-Reply-To: <20171002104947.GE20737@leverpostej>

On Mon, Oct 2, 2017 at 3:49 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi all,
>
> I hit the below splat at net/core/skbuff.c:2626 while fuzzing v4.14-rc2
> on arm64 with Syzkaller. This is the BUG_ON(len) at the end of
> skb_copy_and_csum_bits().
>
> I've uploaded a copy of the splat, my config, and (full) Syzkaller log
> to my kernel.org web space [1]. I haven't had the opportunity to
> reproduce this yet.
>
> This isn't a pure v4.14-rc2, as I have a not-yet-upstream fix [2]
> applied to avoid a userfaultfd bug. However, per the Syzkaller log, the
> userfaultfd syscall wasn't invoked, so I don't believe that should
> matter.
>
> Thanks,
> Mark.
>
> [1] https://www.kernel.org/pub/linux/kernel/people/mark/bugs/20171002-skbuff-bug/
> [2] https://lkml.kernel.org/r/20170920180413.26713-1-aarcange@redhat.com
>
> ------------[ cut here ]------------
> kernel BUG at net/core/skbuff.c:2626!
> Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.0-rc2-00001-gd7ad33d #115
> Hardware name: linux,dummy-virt (DT)
> task: ffff80003a901a80 task.stack: ffff80003a908000
> PC is at skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
> LR is at skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
> pc : [<ffff200009e03214>] lr : [<ffff200009e03214>] pstate: 00000145
> sp : ffff80003efd7b50
> x29: ffff80003efd7b50 x28: 000000000000003c
> x27: 00000000000001e8 x26: ffff80003a901a90
> x25: 000000000000003c x24: dfff200000000000
> x23: ffff800035723a80 x22: 000000000000003c
> x21: 0000000000000000 x20: 0000000000000000
> x19: 0000000000003a6d x18: ffff20000da58140
> x17: 0000000000000000 x16: 0000000000000001
> x15: ffff20000e1485a0 x14: ffff2000082f8980
> x13: ffff200009fc73d0 x12: ffff200009fc707c
> x11: 1ffff00002c2a3fc x10: ffff100002c2a3fc
> x9 : dfff200000000000 x8 : 07030301a8ff1127
> x7 : edff11270a080204 x6 : ffff800016151fe8
> x5 : ffff100002c2a3fd x4 : 000000000000000c
> x3 : 0000000000000030 x2 : 1ffff00006ae47a1
> x1 : 01f6cee936b5bc00 x0 : 0000000000000000
> Process swapper/3 (pid: 0, stack limit = 0xffff80003a908000)
> Call trace:
> Exception stack(0xffff80003efd7a10 to 0xffff80003efd7b50)
> 7a00:                                   0000000000000000 01f6cee936b5bc00
> 7a20: 1ffff00006ae47a1 0000000000000030 000000000000000c ffff100002c2a3fd
> 7a40: ffff800016151fe8 edff11270a080204 07030301a8ff1127 dfff200000000000
> 7a60: ffff100002c2a3fc 1ffff00002c2a3fc ffff200009fc707c ffff200009fc73d0
> 7a80: ffff2000082f8980 ffff20000e1485a0 0000000000000001 0000000000000000
> 7aa0: ffff20000da58140 0000000000003a6d 0000000000000000 0000000000000000
> 7ac0: 000000000000003c ffff800035723a80 dfff200000000000 000000000000003c
> 7ae0: ffff80003a901a90 00000000000001e8 000000000000003c ffff80003efd7b50
> 7b00: ffff200009e03214 ffff80003efd7b50 ffff200009e03214 0000000000000145
> 7b20: 0000000000003a6d 0000000000000000 0001000000000000 000000000000003c
> 7b40: ffff80003efd7b50 ffff200009e03214
> [<ffff200009e03214>] skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
> [<ffff20000a01d244>] icmp_glue_bits+0xa4/0x2a0 net/ipv4/icmp.c:357
> [<ffff200009f3f0d4>] __ip_append_data+0x10e4/0x20a8 net/ipv4/ip_output.c:1018
> [<ffff200009f41a88>] ip_append_data.part.3+0xe8/0x1a0 net/ipv4/ip_output.c:1170
> [<ffff200009f46e74>] ip_append_data+0xa4/0xb0 net/ipv4/ip_output.c:1173
> [<ffff20000a01ccc8>] icmp_push_reply+0x1b8/0x690 net/ipv4/icmp.c:375
> [<ffff20000a0211b0>] icmp_send+0x1070/0x1890 net/ipv4/icmp.c:741
> [<ffff200009f41d48>] ip_fragment.constprop.4+0x208/0x340 net/ipv4/ip_output.c:552
> [<ffff200009f42228>] ip_finish_output+0x3a8/0xab0 net/ipv4/ip_output.c:315
> [<ffff200009f468c4>] NF_HOOK_COND include/linux/netfilter.h:238 [inline]
> [<ffff200009f468c4>] ip_output+0x284/0x790 net/ipv4/ip_output.c:405
> [<ffff200009f43204>] dst_output include/net/dst.h:458 [inline]
> [<ffff200009f43204>] ip_local_out+0x9c/0x1b8 net/ipv4/ip_output.c:124
> [<ffff200009f445e8>] ip_queue_xmit+0x850/0x18e0 net/ipv4/ip_output.c:504
> [<ffff200009fb091c>] tcp_transmit_skb+0x107c/0x3338 net/ipv4/tcp_output.c:1123
> [<ffff200009fbbcc4>] __tcp_retransmit_skb+0x614/0x1d18 net/ipv4/tcp_output.c:2847
> [<ffff200009fbd840>] tcp_send_loss_probe+0x478/0x7d0 net/ipv4/tcp_output.c:2457
> [<ffff200009fc707c>] tcp_write_timer_handler+0x50c/0x7e8 net/ipv4/tcp_timer.c:557
> [<ffff200009fc73d0>] tcp_write_timer+0x78/0x170 net/ipv4/tcp_timer.c:579
> [<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
> [<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
> [<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
> [<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
> [<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
> [<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
> [<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
> [<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
> [<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
> [<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
> [<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367
> Exception stack(0xffff80003a90bb70 to 0xffff80003a90bcb0)
> bb60:                                   ffff80003a90234c 0000000000000007
> bb80: 0000000000000000 1ffff00007520469 1fffe400017ad00c dfff200000000000
> bba0: dfff200000000000 0000000000000000 ffff80003a902350 1ffff00007520469
> bbc0: ffff80003a902348 ffff80003a902368 1ffff0000752046c 1ffff0000752046e
> bbe0: 1ffff0000752046d ffff20000e1485a0 0000000000000000 0000000000000001
> bc00: ffff20000da58140 ffff80003efd9800 ffff80003efd9800 ffff20000ae60000
> bc20: ffff80003a971a80 1ffff000075217aa 0000000000000000 ffff20000ae60000
> bc40: 0000000000000001 ffff20000a34fce0 0000dffff519f438 ffff80003a90bcb0
> bc60: ffff20000a36134c ffff80003a90bcb0 ffff20000a361350 0000000010000145
> bc80: ffff80003efd9800 ffff80003efd9800 ffffffffffffffff ffff80003efd9800
> bca0: ffff80003a90bcb0 ffff20000a361350
> [<ffff200008084034>] el1_irq+0xb4/0x12c arch/arm64/kernel/entry.S:569
> [<ffff20000a361350>] arch_local_irq_enable arch/arm64/include/asm/irqflags.h:40 [inline]
> [<ffff20000a361350>] __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:168 [inline]
> [<ffff20000a361350>] _raw_spin_unlock_irq+0x30/0x100 kernel/locking/spinlock.c:199
> [<ffff2000081e0850>] finish_lock_switch kernel/sched/sched.h:1335 [inline]
> [<ffff2000081e0850>] finish_task_switch+0x1d8/0x950 kernel/sched/core.c:2657
> [<ffff20000a34fce0>] context_switch kernel/sched/core.c:2793 [inline]
> [<ffff20000a34fce0>] __schedule+0x518/0x17b0 kernel/sched/core.c:3366
> [<ffff20000a3520e8>] schedule_idle+0x58/0xc8 kernel/sched/core.c:3452
> [<ffff200008254a00>] do_idle+0x1d8/0x370 kernel/sched/idle.c:269
> [<ffff200008255138>] cpu_startup_entry+0x20/0x28 kernel/sched/idle.c:351
> [<ffff2000080a2f4c>] secondary_start_kernel+0x2fc/0x498 arch/arm64/kernel/smp.c:280
> Code: 97bcbfac 17fffe19 d503201f 97974258 (d4210000)
> ---[ end trace 3359b414c3a12466 ]---

This is most likely a bug caused by syzkaller setting a ridiculous MTU
on the loopback device, below the minimum IPv4 MTU.

I tried to track it in August [1], but it seems hard to find all the
issues with this.

commit c780a049f9bf442314335372c9abc4548bfe3e44
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Aug 16 11:09:12 2017 -0700

    ipv4: better IP_MAX_MTU enforcement

    While working on yet another syzkaller report, I found
    that our IP_MAX_MTU enforcements were not properly done.

    gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
    final result can be bigger than IP_MAX_MTU :/

    This is a problem because device mtu can be changed on other cpus or
    threads.

    While this patch does not fix the issue I am working on, it is
    probably worth addressing it.
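
To illustrate the reload problem described in that commit message, a userspace sketch (dev_sketch, IP_MAX_MTU_SKETCH and clamped_mtu are names made up here, and the limit is assumed to be 0xFFFF as in the kernel headers; the kernel expresses the same idea with READ_ONCE()): read the MTU once into a local, then clamp the local.

```c
#define IP_MAX_MTU_SKETCH 0xFFFFU	/* assumed to mirror IP_MAX_MTU */

/* Hypothetical stand-in for the one field of struct net_device used here. */
struct dev_sketch {
	unsigned int mtu;	/* may be changed concurrently by another CPU */
};

static unsigned int clamped_mtu(const struct dev_sketch *dev)
{
	/*
	 * Load the field exactly once through a volatile access. Without
	 * this, the compiler may read dev->mtu twice (once for the
	 * comparison, once for the result), and a concurrent update
	 * between the two reads could yield a value above the limit.
	 */
	unsigned int mtu = *(const volatile unsigned int *)&dev->mtu;

	return mtu < IP_MAX_MTU_SKETCH ? mtu : IP_MAX_MTU_SKETCH;
}
```

Both the comparison and the returned value use the same local copy, so the result cannot exceed the limit even if dev->mtu changes mid-call.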


* Re: Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
From: James Chapman @ 2017-10-02 13:32 UTC
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20171001102110.24184f1b@xeon-e3>

This seems to be a NULL pointer exception caused by tunnel->sock being
NULL at the call to bh_lock_sock() in l2tp_xmit_skb() at
l2tp_core.c:1135.

tunnel->sock is set NULL in l2tp_core's tunnel socket destructor.

At the moment, I don't understand how this happens because
pppol2tp_xmit() does a sock_hold() on the tunnel socket before
l2tp_xmit_skb() is called. I'm still looking at this.

Has this problem only recently started happening?





On 1 October 2017 at 18:21, Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
>
> Begin forwarded message:
>
> Date: Sun, 01 Oct 2017 16:22:33 +0000
> From: bugzilla-daemon@bugzilla.kernel.org
> To: stephen@networkplumber.org
> Subject: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
>
>
> https://bugzilla.kernel.org/show_bug.cgi?id=197099
>
>             Bug ID: 197099
>            Summary: Kernel panic in interrupt [l2tp_ppp]
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 4.8.13-1.el6.elrepo.x86_64
>           Hardware: x86-64
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>           Assignee: stephen@networkplumber.org
>           Reporter: svimik@gmail.com
>         Regression: No
>
> Created attachment 258685
>   --> https://bugzilla.kernel.org/attachment.cgi?id=258685&action=edit
> stacktrace screenshot
>
> Hello!
>
> Getting kernel panics on multiple servers. Since it mentions l2tp_core,
> l2tp_ppp and ppp_generic, I decided to report it to Networking (correct me if
> I'm wrong).
>
> Unfortunately I'm still struggling with making kdump work, so the trace
> screenshot is all I have at this moment. The only hope is that this stacktrace
> means something to the guys that wrote the code.
>
> --
> You are receiving this mail because:
> You are the assignee for the bug.


* Re: v4.14-rc2/arm64 misaligned atomic in ip_expire() / skb_clone()
From: Eric Dumazet @ 2017-10-02 13:44 UTC
  To: Mark Rutland
  Cc: linux-kernel, netdev, linux-arm-kernel, syzkaller,
	David S. Miller, Willem de Bruijn, Eric Dumazet
In-Reply-To: <20171002115730.GA21696@leverpostej>

On Mon, 2017-10-02 at 12:57 +0100, Mark Rutland wrote:
> Hi all,
> 
> I'm intermittently hitting splats like below in skb_clone() while
> fuzzing v4.14-rc2 on arm64 with Syzkaller. It looks like the
> atomic_inc() at the end of __skb_clone() is being passed a misaligned
> pointer.
> 
> I've uploaded a number of splats and their associated (full) Syzkaller
> logs, along with my kernel config to my kernel.org webspace [1]. It
> might take a while for that to appear.
> 
> This isn't a pure v4.14-rc2, as I have a not-yet-upstream fix [2]
> applied to avoid a userfaultfd bug. The userfaultfd syscall appears in
> all of the Syzkaller logs, so there is the chance that this is related,
> but as I've not seen any other issues I suspect that's unlikely.
> 
> Thanks,
> Mark.
> 
> [1] https://www.kernel.org/pub/linux/kernel/people/mark/bugs/20171002-skb_clone-misaligned-atomic
> [2] https://lkml.kernel.org/r/20170920180413.26713-1-aarcange@redhat.com
> 
> Unable to handle kernel paging request at virtual address ffff80002fd714a2
> Mem abort info:
>   Exception class = DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
> Data abort info:
>   ISV = 0, ISS = 0x00000033
>   CM = 0, WnR = 0
> swapper pgtable: 4k pages, 48-bit VAs, pgd = ffff20000eeb2000
> [ffff80002fd714a2] *pgd=000000007eff7003, *pud=000000007eff6003, *pmd=00f800006fc00711
> Internal error: Oops: 96000021 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.0-rc2-00001-gd7ad33d #115
> Hardware name: linux,dummy-virt (DT)
> task: ffff80003a901a80 task.stack: ffff80003a908000
> PC is at __ll_sc_atomic_add+0x4/0x18 arch/arm64/include/asm/atomic_ll_sc.h:113
> LR is at atomic_add arch/arm64/include/asm/atomic_lse.h:45 [inline]
> LR is at __skb_clone+0x4a8/0x6c0 net/core/skbuff.c:873
> pc : [<ffff20000a30ce44>] lr : [<ffff200009dffb58>] pstate: 10000145
> sp : ffff80003efd86e0
> x29: ffff80003efd86e0 x28: 000060003418b000 
> x27: ffff20000ae55360 x26: ffff8000182c1608 
> x25: ffff80002fd7137e x24: ffff8000182c1610 
> x23: ffff20000ae60000 x22: ffff80001577871c 
> x21: 1ffff00007dfb0e8 x20: ffff8000182c1540 
> x19: ffff800015778640 x18: ffff20000da58140 
> x17: 0000000000000000 x16: 0000000000000002 
> x15: ffff20000e1485a0 x14: ffff2000082f912c 
> x13: ffff2000082f8dcc x12: ffff2000082f8980 
> x11: 1ffff00002aef0df x10: ffff100002aef0df 
> x9 : dfff200000000000 x8 : 0082009000a40008 
> x7 : 0000000000000000 x6 : ffff800015778700 
> x5 : ffff100002aef0e0 x4 : 0000000000000000 
> x3 : 1ffff00002aef0e3 x2 : ffff80002fd7147e 
> x1 : ffff80002fd714a2 x0 : 0000000000000001 
> Process swapper/3 (pid: 0, stack limit = 0xffff80003a908000)
> Call trace:
> Exception stack(0xffff80003efd85a0 to 0xffff80003efd86e0)
> 85a0: 0000000000000001 ffff80002fd714a2 ffff80002fd7147e 1ffff00002aef0e3
> 85c0: 0000000000000000 ffff100002aef0e0 ffff800015778700 0000000000000000
> 85e0: 0082009000a40008 dfff200000000000 ffff100002aef0df 1ffff00002aef0df
> 8600: ffff2000082f8980 ffff2000082f8dcc ffff2000082f912c ffff20000e1485a0
> 8620: 0000000000000002 0000000000000000 ffff20000da58140 ffff800015778640
> 8640: ffff8000182c1540 1ffff00007dfb0e8 ffff80001577871c ffff20000ae60000
> 8660: ffff8000182c1610 ffff80002fd7137e ffff8000182c1608 ffff20000ae55360
> 8680: 000060003418b000 ffff80003efd86e0 ffff200009dffb58 ffff80003efd86e0
> 86a0: ffff20000a30ce44 0000000010000145 ffff800015778640 ffff8000182c1540
> 86c0: 0001000000000000 ffff8000182c15ce ffff80003efd86e0 ffff20000a30ce44
> [<ffff20000a30ce44>] __ll_sc_atomic_add+0x4/0x18 arch/arm64/include/asm/atomic_ll_sc.h:113
> [<ffff200009e1009c>] skb_clone+0x1c4/0x3b0 net/core/skbuff.c:1286
> [<ffff200009f2ff80>] ip_expire+0x4e8/0x7c0 net/ipv4/ip_fragment.c:239
> [<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
> [<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
> [<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
> [<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
> [<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
> [<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
> [<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
> [<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
> [<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
> [<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
> [<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367
> Exception stack(0xffff80003a90bd70 to 0xffff80003a90beb0)
> bd60:                                   ffff80003a90234c 0000000000000007
> bd80: 0000000000000000 1ffff00007520469 1fffe400017ad00c ffffffffffffe540
> bda0: 0000000000000000 0000000000000000 ffff80003a902350 1ffff00007520469
> bdc0: ffff80003a902348 ffff80003a902368 1ffff0000752046c 1ffff0000752046e
> bde0: 1ffff0000752046d ffff20000e1485a0 0000000000000000 0000000000029d44
> be00: ffff20000da58140 ffff80003a901a80 ffff80003a901a80 dfff200000000000
> be20: ffff20000ae60e98 ffff0400015cc1d3 0000000000000000 ffff20000ae60df8
> be40: ffff20000ae60df8 0000000000000000 0000000000000000 ffff80003a90beb0
> be60: ffff200008089b50 ffff80003a90beb0 ffff200008089b54 0000000010000145
> be80: ffff80003a901a80 ffff80003a901a80 ffffffffffffffff 01f6cee936b5bc00
> bea0: ffff80003a90beb0 ffff200008089b54
> [<ffff200008084034>] el1_irq+0xb4/0x12c arch/arm64/kernel/entry.S:569
> [<ffff200008089b54>] arch_local_irq_enable arch/arm64/include/asm/irqflags.h:40 [inline]
> [<ffff200008089b54>] arch_cpu_idle+0x1c/0x28 arch/arm64/kernel/process.c:87
> [<ffff20000a360a94>] default_idle_call+0x34/0x78 kernel/sched/idle.c:98
> [<ffff200008254a34>] cpuidle_idle_call kernel/sched/idle.c:156 [inline]
> [<ffff200008254a34>] do_idle+0x20c/0x370 kernel/sched/idle.c:246
> [<ffff20000825513c>] cpu_startup_entry+0x24/0x28 kernel/sched/idle.c:351
> [<ffff2000080a2f4c>] secondary_start_kernel+0x2fc/0x498 arch/arm64/kernel/smp.c:280
> Code: 978b7cfd 17ffff91 00000000 f9800031 (885f7c31) 
> ---[ end trace e4e9a51ab15d3a5f ]---
> 

skb->head is allocated by a kmalloc() call or similar.

This would happen if skb->end is mangled so that it is no longer a multiple of
NET_SKB_PAD (or at least of 4 in your case).
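
A minimal user-space sketch of that alignment argument, not kernel code. It assumes skb->end is stored as an offset from skb->head, so the shared-info block (which holds the atomic counter) starts at head + end. The helper names is_aligned() and shinfo_addr() are hypothetical; the only value taken from the report is the faulting address 0xffff80002fd714a2.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of align; align must be a power of two. */
static bool is_aligned(uint64_t addr, uint64_t align)
{
	return (addr & (align - 1)) == 0;
}

/*
 * Hypothetical model: where the shared-info block would start if
 * skb->end is an offset added to skb->head. A kmalloc()ed head is
 * suitably aligned, so a misaligned result points at a bad 'end'.
 */
static uint64_t shinfo_addr(uint64_t head, uint32_t end)
{
	return head + end;
}
```

With these helpers, is_aligned(0xffff80002fd714a2ULL, 4) is false, which matches the observation that the atomic add was handed an address that is not 4-byte aligned.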

^ permalink raw reply

* Re: [PATCH 05/18] net: use ARRAY_SIZE
From: Kalle Valo @ 2017-10-02 13:46 UTC (permalink / raw)
  To: Jérémy Lefaure
  Cc: Sathya Perla, Ajit Khaparde, Sriharsha Basavapatna, Somnath Kotur,
	Jeff Kirsher, Arend van Spriel, Franky Lin, Hante Meuleman,
	Chi-Hsien Lin, Wright Feng, Larry Finger, Chaoming Li,
	David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev,
	linux-kernel, intel-wired-lan, linux-usb
In-Reply-To: <20171001193101.8898-6-jeremy.lefaure@lse.epita.fr>

Jérémy Lefaure <jeremy.lefaure@lse.epita.fr> writes:

> Using the ARRAY_SIZE macro improves the readability of the code. Also,
> it is not always useful to use a variable to store this constant
> calculated at compile time.
>
> Found with Coccinelle with the following semantic patch:
> @r depends on (org || report)@
> type T;
> T[] E;
> position p;
> @@
> (
>  (sizeof(E)@p /sizeof(*E))
> |
>  (sizeof(E)@p /sizeof(E[...]))
> |
>  (sizeof(E)@p /sizeof(T))
> )
>
> Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
> ---
>  drivers/net/ethernet/emulex/benet/be_cmds.c        |   4 +-
>  drivers/net/ethernet/intel/i40e/i40e_adminq.h      |   3 +-
>  drivers/net/ethernet/intel/i40evf/i40e_adminq.h    |   3 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c      |   3 +-
>  drivers/net/ethernet/intel/ixgbevf/vf.c            |  17 +-
>  drivers/net/usb/kalmia.c                           |   9 +-
>  .../broadcom/brcm80211/brcmsmac/phy/phytbl_n.c     | 473 ++++++---------------
>  .../net/wireless/realtek/rtlwifi/rtl8723be/hw.c    |   9 +-
>  .../net/wireless/realtek/rtlwifi/rtl8723be/phy.c   |  12 +-
>  .../net/wireless/realtek/rtlwifi/rtl8723be/table.c |  14 +-
>  .../net/wireless/realtek/rtlwifi/rtl8821ae/table.c |  34 +-
>  include/net/bond_3ad.h                             |   3 +-
>  net/ipv6/seg6_local.c                              |   6 +-
>  13 files changed, 177 insertions(+), 413 deletions(-)

We have a tree for wireless so usually it's better to submit wireless
changes on their own but here I assume Dave will apply this to his tree.
If not, please resubmit the wireless part in a separate patch.

-- 
Kalle Valo

^ permalink raw reply

* Re: [v4, 1/9] brcmsmac: make some local variables 'static const' to reduce stack size
From: Kalle Valo @ 2017-10-02 13:53 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Arend van Spriel, Franky Lin, Hante Meuleman, Chi-Hsien Lin,
	Wright Feng, Arnd Bergmann, Mauro Carvalho Chehab, Jiri Pirko,
	David S. Miller, Andrey Ryabinin, Alexander Potapenko,
	Dmitry Vyukov, Masahiro Yamada, Michal Marek, Andrew Morton,
	Kees Cook, Geert Uytterhoeven, Greg Kroah-Hartman, linux-media
In-Reply-To: <20170922212930.620249-2-arnd@arndb.de>

Arnd Bergmann <arnd@arndb.de> wrote:

> With KASAN and a couple of other patches applied, this driver is one
> of the few remaining ones that actually use more than 2048 bytes of
> kernel stack:
> 
> broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 'wlc_phy_workarounds_nphy_gainctrl':
> broadcom/brcm80211/brcmsmac/phy/phy_n.c:16065:1: warning: the frame size of 3264 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 'wlc_phy_workarounds_nphy':
> broadcom/brcm80211/brcmsmac/phy/phy_n.c:17138:1: warning: the frame size of 2864 bytes is larger than 2048 bytes [-Wframe-larger-than=]
> 
> Here, I'm reducing the stack size by marking as many local variables as
> 'static const' as I can without changing the actual code.
> 
> This is the first of three patches to improve the stack usage in this
> driver. It would be good to have this backported to stable kernels
> to get all drivers in 'allmodconfig' below the 2048 byte limit so
> we can turn on the frame warning again globally, but I realize that
> the patch is larger than the normal limit for stable backports.
> 
> The other two patches do not need to be backported.
> 
> Cc: <stable@vger.kernel.org>
> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Patch applied to wireless-drivers.git, thanks.

c503dd38f850 brcmsmac: make some local variables 'static const' to reduce stack size

-- 
https://patchwork.kernel.org/patch/9967145/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


^ permalink raw reply

* Re: [v4,2/9] brcmsmac: split up wlc_phy_workarounds_nphy
From: Kalle Valo @ 2017-10-02 13:55 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Arend van Spriel, Franky Lin, Hante Meuleman, Chi-Hsien Lin,
	Wright Feng, Arnd Bergmann, Mauro Carvalho Chehab, Jiri Pirko,
	David S. Miller, Andrey Ryabinin, Alexander Potapenko,
	Dmitry Vyukov, Masahiro Yamada, Michal Marek, Andrew Morton,
	Kees Cook, Geert Uytterhoeven, Greg Kroah-Hartman, linux-media
In-Reply-To: <20170922212930.620249-3-arnd@arndb.de>

Arnd Bergmann <arnd@arndb.de> wrote:

> The stack consumption in this driver is still relatively high, with one
> remaining warning if the warning level is lowered to 1536 bytes:
> 
> drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c:17135:1: error: the frame size of 1880 bytes is larger than 1536 bytes [-Werror=frame-larger-than=]
> 
> The affected function is actually a collection of three separate implementations,
> and each of them is fairly large by itself. Splitting them up is done easily
> and improves readability at the same time.
> 
> I'm leaving the original indentation to make the review easier.
> 
> Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

I'll queue this for v4.15. Depends on:

c503dd38f850 brcmsmac: make some local variables 'static const' to reduce stack size

-- 
https://patchwork.kernel.org/patch/9967141/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply

* Re: Fw: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
From: Eric Dumazet @ 2017-10-02 13:56 UTC (permalink / raw)
  To: James Chapman, svimik; +Cc: Stephen Hemminger, netdev
In-Reply-To: <CAEwTi7QkF73cc1hShzOoM9bzb5mBzyFW8t68OCn+XECrJ9d61Q@mail.gmail.com>

CC svimik@gmail.com so that he is aware of this netdev thread.

On Mon, 2017-10-02 at 14:32 +0100, James Chapman wrote:
> This seems to be a NULL pointer exception caused by tunnel->sock being
> NULL at the call to bh_lock_sock() in l2tp_xmit_skb() at
> l2tp_core.c:1135.
> 
> tunnel->sock is set NULL in l2tp_core's tunnel socket destructor.
> 
> At the moment, I don't understand how this happens because
> pppol2tp_xmit() does a sock_hold() on the tunnel socket before
> l2tp_xmit_skb() is called. I'm still looking at this.
> 
> Has this problem only recently started happening?
> 
> 
> 
> 
> 
> On 1 October 2017 at 18:21, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> >
> > Begin forwarded message:
> >
> > Date: Sun, 01 Oct 2017 16:22:33 +0000
> > From: bugzilla-daemon@bugzilla.kernel.org
> > To: stephen@networkplumber.org
> > Subject: [Bug 197099] New: Kernel panic in interrupt [l2tp_ppp]
> >
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=197099
> >
> >             Bug ID: 197099
> >            Summary: Kernel panic in interrupt [l2tp_ppp]
> >            Product: Networking
> >            Version: 2.5
> >     Kernel Version: 4.8.13-1.el6.elrepo.x86_64
> >           Hardware: x86-64
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Other
> >           Assignee: stephen@networkplumber.org
> >           Reporter: svimik@gmail.com
> >         Regression: No
> >
> > Created attachment 258685
> >   --> https://bugzilla.kernel.org/attachment.cgi?id=258685&action=edit
> > stacktrace screenshot
> >
> > Hello!
> >
> > Getting kernel panics on multiple servers. Since it mentions l2tp_core,
> > l2tp_ppp and ppp_generic, I decided to report it to Networking (correct me if
> > I'm wrong).
> >
> > Unfortunately I'm still struggling with making kdump work, so the trace
> > screenshot is all I have at this moment. The only hope is that this stacktrace
> > means something to the guys that wrote the code.
> >
> > --
> > You are receiving this mail because:
> > You are the assignee for the bug.

^ permalink raw reply

* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Mark Rutland @ 2017-10-02 14:21 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: LKML, netdev, linux-arm-kernel, syzkaller, David S. Miller,
	Willem de Bruijn
In-Reply-To: <CANn89i++X2rzsaHkTRayMpCED_exL8WPq12=RwD_hXsZ6cN-wQ@mail.gmail.com>

Hi Eric,

On Mon, Oct 02, 2017 at 06:36:32AM -0700, Eric Dumazet wrote:
> On Mon, Oct 2, 2017 at 3:49 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> > I hit the below splat at net/core/skbuff.c:2626 while fuzzing v4.14-rc2
> > on arm64 with Syzkaller. This is the BUG_ON(len) at the end of
> > skb_copy_and_csum_bits().

> > kernel BUG at net/core/skbuff.c:2626!

> > [<ffff200009e03214>] skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
> > [<ffff20000a01d244>] icmp_glue_bits+0xa4/0x2a0 net/ipv4/icmp.c:357
> > [<ffff200009f3f0d4>] __ip_append_data+0x10e4/0x20a8 net/ipv4/ip_output.c:1018
> > [<ffff200009f41a88>] ip_append_data.part.3+0xe8/0x1a0 net/ipv4/ip_output.c:1170
> > [<ffff200009f46e74>] ip_append_data+0xa4/0xb0 net/ipv4/ip_output.c:1173
> > [<ffff20000a01ccc8>] icmp_push_reply+0x1b8/0x690 net/ipv4/icmp.c:375
> > [<ffff20000a0211b0>] icmp_send+0x1070/0x1890 net/ipv4/icmp.c:741
> > [<ffff200009f41d48>] ip_fragment.constprop.4+0x208/0x340 net/ipv4/ip_output.c:552
> > [<ffff200009f42228>] ip_finish_output+0x3a8/0xab0 net/ipv4/ip_output.c:315
> > [<ffff200009f468c4>] NF_HOOK_COND include/linux/netfilter.h:238 [inline]
> > [<ffff200009f468c4>] ip_output+0x284/0x790 net/ipv4/ip_output.c:405
> > [<ffff200009f43204>] dst_output include/net/dst.h:458 [inline]
> > [<ffff200009f43204>] ip_local_out+0x9c/0x1b8 net/ipv4/ip_output.c:124
> > [<ffff200009f445e8>] ip_queue_xmit+0x850/0x18e0 net/ipv4/ip_output.c:504
> > [<ffff200009fb091c>] tcp_transmit_skb+0x107c/0x3338 net/ipv4/tcp_output.c:1123
> > [<ffff200009fbbcc4>] __tcp_retransmit_skb+0x614/0x1d18 net/ipv4/tcp_output.c:2847
> > [<ffff200009fbd840>] tcp_send_loss_probe+0x478/0x7d0 net/ipv4/tcp_output.c:2457
> > [<ffff200009fc707c>] tcp_write_timer_handler+0x50c/0x7e8 net/ipv4/tcp_timer.c:557
> > [<ffff200009fc73d0>] tcp_write_timer+0x78/0x170 net/ipv4/tcp_timer.c:579
> > [<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
> > [<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
> > [<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
> > [<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
> > [<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
> > [<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
> > [<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
> > [<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
> > [<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
> > [<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
> > [<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367

> This is most likely a bug caused by syzkaller setting a ridiculous MTU
> on loopback device, below minimum size of ipv4 MTU.

> I tried to track it in August [1], but it seems hard to find all the
> issues with this.
> 
> commit c780a049f9bf442314335372c9abc4548bfe3e44
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Wed Aug 16 11:09:12 2017 -0700
> 
>     ipv4: better IP_MAX_MTU enforcement
> 
>     While working on yet another syzkaller report, I found
>     that our IP_MAX_MTU enforcements were not properly done.
> 
>     gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
>     final result can be bigger than IP_MAX_MTU :/
> 
>     This is a problem because device mtu can be changed on other cpus or
>     threads.
> 
>     While this patch does not fix the issue I am working on, it is
>     probably worth addressing it.

Just to check I've understood correctly, are you suggesting that the
IPv4 code should also check the dev->mtu against a IP_MIN_MTU (which
doesn't seem to exist today)?

Otherwise, I do spot another potential issue. The writer side (e.g. most
net_device::ndo_change_mtu implementations and the __dev_set_mtu()
fallback) doesn't use WRITE_ONCE().

IIUC, that means that the write could be torn across multiple accesses,
and we could see dev->mtu < dev->min_mtu on the read side, even if we
use READ_ONCE(), and sanity check the mtu value before calling
__dev_set_mtu().

Thanks,
Mark.

^ permalink raw reply

* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Eric Dumazet @ 2017-10-02 14:42 UTC (permalink / raw)
  To: Mark Rutland
  Cc: LKML, netdev, linux-arm-kernel, syzkaller, David S. Miller,
	Willem de Bruijn
In-Reply-To: <20171002142156.GB21696@leverpostej>

On Mon, Oct 2, 2017 at 7:21 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> Hi Eric,
>
> On Mon, Oct 02, 2017 at 06:36:32AM -0700, Eric Dumazet wrote:
>> On Mon, Oct 2, 2017 at 3:49 AM, Mark Rutland <mark.rutland@arm.com> wrote:
>> > I hit the below splat at net/core/skbuff.c:2626 while fuzzing v4.14-rc2
>> > on arm64 with Syzkaller. This is the BUG_ON(len) at the end of
>> > skb_copy_and_csum_bits().
>
>> > kernel BUG at net/core/skbuff.c:2626!
>
>> > [<ffff200009e03214>] skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
>> > [<ffff20000a01d244>] icmp_glue_bits+0xa4/0x2a0 net/ipv4/icmp.c:357
>> > [<ffff200009f3f0d4>] __ip_append_data+0x10e4/0x20a8 net/ipv4/ip_output.c:1018
>> > [<ffff200009f41a88>] ip_append_data.part.3+0xe8/0x1a0 net/ipv4/ip_output.c:1170
>> > [<ffff200009f46e74>] ip_append_data+0xa4/0xb0 net/ipv4/ip_output.c:1173
>> > [<ffff20000a01ccc8>] icmp_push_reply+0x1b8/0x690 net/ipv4/icmp.c:375
>> > [<ffff20000a0211b0>] icmp_send+0x1070/0x1890 net/ipv4/icmp.c:741
>> > [<ffff200009f41d48>] ip_fragment.constprop.4+0x208/0x340 net/ipv4/ip_output.c:552
>> > [<ffff200009f42228>] ip_finish_output+0x3a8/0xab0 net/ipv4/ip_output.c:315
>> > [<ffff200009f468c4>] NF_HOOK_COND include/linux/netfilter.h:238 [inline]
>> > [<ffff200009f468c4>] ip_output+0x284/0x790 net/ipv4/ip_output.c:405
>> > [<ffff200009f43204>] dst_output include/net/dst.h:458 [inline]
>> > [<ffff200009f43204>] ip_local_out+0x9c/0x1b8 net/ipv4/ip_output.c:124
>> > [<ffff200009f445e8>] ip_queue_xmit+0x850/0x18e0 net/ipv4/ip_output.c:504
>> > [<ffff200009fb091c>] tcp_transmit_skb+0x107c/0x3338 net/ipv4/tcp_output.c:1123
>> > [<ffff200009fbbcc4>] __tcp_retransmit_skb+0x614/0x1d18 net/ipv4/tcp_output.c:2847
>> > [<ffff200009fbd840>] tcp_send_loss_probe+0x478/0x7d0 net/ipv4/tcp_output.c:2457
>> > [<ffff200009fc707c>] tcp_write_timer_handler+0x50c/0x7e8 net/ipv4/tcp_timer.c:557
>> > [<ffff200009fc73d0>] tcp_write_timer+0x78/0x170 net/ipv4/tcp_timer.c:579
>> > [<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
>> > [<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
>> > [<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
>> > [<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
>> > [<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
>> > [<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
>> > [<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
>> > [<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
>> > [<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
>> > [<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
>> > [<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367
>
>> This is most likely a bug caused by syzkaller setting a ridiculous MTU
>> on loopback device, below minimum size of ipv4 MTU.
>
>> I tried to track it in August [1], but it seems hard to find all the
>> issues with this.
>>
>> commit c780a049f9bf442314335372c9abc4548bfe3e44
>> Author: Eric Dumazet <edumazet@google.com>
>> Date:   Wed Aug 16 11:09:12 2017 -0700
>>
>>     ipv4: better IP_MAX_MTU enforcement
>>
>>     While working on yet another syzkaller report, I found
>>     that our IP_MAX_MTU enforcements were not properly done.
>>
>>     gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
>>     final result can be bigger than IP_MAX_MTU :/
>>
>>     This is a problem because device mtu can be changed on other cpus or
>>     threads.
>>
>>     While this patch does not fix the issue I am working on, it is
>>     probably worth addressing it.
>
> Just to check I've understood correctly, are you suggesting that the
> IPv4 code should also check the dev->mtu against a IP_MIN_MTU (which
> doesn't seem to exist today)?

We have plenty of places this is checked.

For example, trying to set MTU < 68 usually removes IPv4 addresses and routes.

The problem is that these checks are not foolproof yet.

(Only the admin was supposed to play these games.)

>
> Otherwise, I do spot another potential issue. The writer side (e.g. most
> net_device::ndo_change_mtu implementations and the __dev_set_mtu()
> fallback) doesn't use WRITE_ONCE().

It does not matter how many strange values the reader can observe:
we must be foolproof from the reader's point of view anyway, so the
WRITE_ONCE() is not strictly needed.


>
> IIUC, that means that the write could be torn across multiple accesses,
> and we could see dev->mtu < dev->min_mtu on the read side, even if we
> use READ_ONCE(), and sanity check the mtu value before calling
> __dev_set_mtu().
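
To illustrate the reader-side point, here is a user-space sketch, not kernel code. It reads the MTU exactly once through a volatile access (standing in for READ_ONCE()) and then only uses that snapshot, so a concurrent change cannot slip in between the check and the use. The struct fake_dev, both function names and both *_SKETCH constants are hypothetical; 68 is the minimum mentioned above, and 0xFFFF is assumed to mirror IP_MAX_MTU. Clamping to a minimum is this sketch's own choice, not something the thread settles on.

```c
#define IPV4_MIN_MTU_SKETCH 68U      /* hypothetical name; value from the mail */
#define IP_MAX_MTU_SKETCH   0xFFFFU  /* assumed to mirror IP_MAX_MTU */

/* Hypothetical stand-in for the one net_device field we care about. */
struct fake_dev {
	unsigned int mtu;
};

/* Single volatile load, so the compiler cannot re-read dev->mtu later. */
static unsigned int read_mtu_once(const struct fake_dev *dev)
{
	return *(const volatile unsigned int *)&dev->mtu;
}

/* Validate the snapshot only; never go back to dev->mtu after the check. */
static unsigned int sane_ipv4_mtu(const struct fake_dev *dev)
{
	unsigned int mtu = read_mtu_once(dev);

	if (mtu < IPV4_MIN_MTU_SKETCH)
		return IPV4_MIN_MTU_SKETCH;
	if (mtu > IP_MAX_MTU_SKETCH)
		return IP_MAX_MTU_SKETCH;
	return mtu;
}
```

Whatever value a writer leaves in the field, the caller of sane_ipv4_mtu() only ever sees a result inside the accepted range.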

^ permalink raw reply

* Re: v4.14-rc2/arm64 kernel BUG at net/core/skbuff.c:2626
From: Eric Dumazet @ 2017-10-02 14:48 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Eric Dumazet, LKML, netdev, linux-arm-kernel, syzkaller,
	David S. Miller, Willem de Bruijn
In-Reply-To: <20171002142156.GB21696@leverpostej>

On Mon, 2017-10-02 at 15:21 +0100, Mark Rutland wrote:
> Hi Eric,
> 
> On Mon, Oct 02, 2017 at 06:36:32AM -0700, Eric Dumazet wrote:
> > On Mon, Oct 2, 2017 at 3:49 AM, Mark Rutland <mark.rutland@arm.com> wrote:
> > > I hit the below splat at net/core/skbuff.c:2626 while fuzzing v4.14-rc2
> > > on arm64 with Syzkaller. This is the BUG_ON(len) at the end of
> > > skb_copy_and_csum_bits().
> 
> > > kernel BUG at net/core/skbuff.c:2626!
> 
> > > [<ffff200009e03214>] skb_copy_and_csum_bits+0x8dc/0xae0 net/core/skbuff.c:2626
> > > [<ffff20000a01d244>] icmp_glue_bits+0xa4/0x2a0 net/ipv4/icmp.c:357
> > > [<ffff200009f3f0d4>] __ip_append_data+0x10e4/0x20a8 net/ipv4/ip_output.c:1018
> > > [<ffff200009f41a88>] ip_append_data.part.3+0xe8/0x1a0 net/ipv4/ip_output.c:1170
> > > [<ffff200009f46e74>] ip_append_data+0xa4/0xb0 net/ipv4/ip_output.c:1173
> > > [<ffff20000a01ccc8>] icmp_push_reply+0x1b8/0x690 net/ipv4/icmp.c:375
> > > [<ffff20000a0211b0>] icmp_send+0x1070/0x1890 net/ipv4/icmp.c:741
> > > [<ffff200009f41d48>] ip_fragment.constprop.4+0x208/0x340 net/ipv4/ip_output.c:552
> > > [<ffff200009f42228>] ip_finish_output+0x3a8/0xab0 net/ipv4/ip_output.c:315
> > > [<ffff200009f468c4>] NF_HOOK_COND include/linux/netfilter.h:238 [inline]
> > > [<ffff200009f468c4>] ip_output+0x284/0x790 net/ipv4/ip_output.c:405
> > > [<ffff200009f43204>] dst_output include/net/dst.h:458 [inline]
> > > [<ffff200009f43204>] ip_local_out+0x9c/0x1b8 net/ipv4/ip_output.c:124
> > > [<ffff200009f445e8>] ip_queue_xmit+0x850/0x18e0 net/ipv4/ip_output.c:504
> > > [<ffff200009fb091c>] tcp_transmit_skb+0x107c/0x3338 net/ipv4/tcp_output.c:1123
> > > [<ffff200009fbbcc4>] __tcp_retransmit_skb+0x614/0x1d18 net/ipv4/tcp_output.c:2847
> > > [<ffff200009fbd840>] tcp_send_loss_probe+0x478/0x7d0 net/ipv4/tcp_output.c:2457
> > > [<ffff200009fc707c>] tcp_write_timer_handler+0x50c/0x7e8 net/ipv4/tcp_timer.c:557
> > > [<ffff200009fc73d0>] tcp_write_timer+0x78/0x170 net/ipv4/tcp_timer.c:579
> > > [<ffff2000082f8980>] call_timer_fn+0x1b8/0x430 kernel/time/timer.c:1281
> > > [<ffff2000082f8dcc>] expire_timers+0x1d4/0x320 kernel/time/timer.c:1320
> > > [<ffff2000082f912c>] __run_timers kernel/time/timer.c:1620 [inline]
> > > [<ffff2000082f912c>] run_timer_softirq+0x214/0x5f0 kernel/time/timer.c:1646
> > > [<ffff2000080826c0>] __do_softirq+0x350/0xc0c kernel/softirq.c:284
> > > [<ffff200008170af4>] do_softirq_own_stack include/linux/interrupt.h:498 [inline]
> > > [<ffff200008170af4>] invoke_softirq kernel/softirq.c:371 [inline]
> > > [<ffff200008170af4>] irq_exit+0x1dc/0x2f8 kernel/softirq.c:405
> > > [<ffff2000082a95bc>] __handle_domain_irq+0xdc/0x230 kernel/irq/irqdesc.c:647
> > > [<ffff2000080820ac>] handle_domain_irq include/linux/irqdesc.h:175 [inline]
> > > [<ffff2000080820ac>] gic_handle_irq+0x6c/0xe0 drivers/irqchip/irq-gic.c:367

Please try the following foolproof patch.

This is what I had in my local tree back in August but could not
conclude on the syzkaller bug I was working on.


diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 681e33998e03b609fdca83a83e0fc62a3fee8c39..e51d777797a927058760a1ab7af00579f7488cb5 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -732,7 +732,8 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
 		room = 576;
 	room -= sizeof(struct iphdr) + icmp_param.replyopts.opt.opt.optlen;
 	room -= sizeof(struct icmphdr);
-
+	if (room < 0)
+		goto ende;
 	icmp_param.data_len = skb_in->len - icmp_param.offset;
 	if (icmp_param.data_len > room)
 		icmp_param.data_len = room;
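
The arithmetic guarded by the patch above can be sketched in user space as follows. This is an illustration under assumptions, not the kernel function: icmp_quote_len() and the two *_LEN constants are hypothetical names, the header sizes are the usual 20 bytes (IPv4 header without options) and 8 bytes (ICMP header), and a return value of -1 stands in for the "goto ende" bail-out.

```c
#define IPHDR_LEN   20  /* IPv4 header without options */
#define ICMPHDR_LEN 8   /* ICMP header */

/*
 * Returns how many bytes of the offending packet would be quoted in
 * the ICMP error, or -1 when the MTU leaves no room at all.
 */
static int icmp_quote_len(int mtu, int optlen, int skb_len, int offset)
{
	int room = mtu;
	int data_len;

	if (room > 576)
		room = 576;
	room -= IPHDR_LEN + optlen;
	room -= ICMPHDR_LEN;
	if (room < 0)           /* the check added by the patch */
		return -1;

	data_len = skb_len - offset;
	if (data_len > room)
		data_len = room;
	return data_len;
}
```

Without the room < 0 check, a tiny MTU makes room negative and data_len is then set to that negative value, which is the kind of bogus length that can trip a BUG_ON(len) further down.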

^ permalink raw reply related

* Re: [PATCH net-next 0/3] bridge: neigh msg proxy and flood suppression support
From: Roopa Prabhu @ 2017-10-02 14:49 UTC (permalink / raw)
  To: davem@davemloft.net
  Cc: netdev@vger.kernel.org, Nikolay Aleksandrov,
	stephen@networkplumber.org, bridge
In-Reply-To: <1506919018-27875-1-git-send-email-roopa@cumulusnetworks.com>

On Sun, Oct 1, 2017 at 9:36 PM, Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>
> This series implements arp and nd suppression in the bridge
> driver for ethernet vpns. It implements rfc7432, section 10
> https://tools.ietf.org/html/rfc7432#section-10
> for ethernet VPN deployments. It is similar to the existing
> BR_ARP_PROXY flag but has a few semantic differences to conform
> to EVPN standard. In case of EVPN, it is mainly used to avoid flooding to
> tunnel ports like vxlan/mpls. Unlike the existing flags it suppresses flood
> of all neigh discovery packets (arp, nd) to tunnel ports.
>
> Roopa Prabhu (3):
>   bridge: add new BR_NEIGH_SUPPRESS port flag to suppress arp and nd
>     flood
>   neigh arp suppress first
>   bridge: suppress nd messages from going to BR_NEIGH_SUPPRESS ports
>

Please ignore; this series conflicts when applied over recent net-next
bridge changes. Will rebase and submit v2.

^ permalink raw reply

* Re: [PATCH RFC] flow_dissector: Add FLOW_DISSECTOR_F_FLOWER
From: Jiri Pirko @ 2017-10-02 14:49 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, hannes, netdev, rohit
In-Reply-To: <20170929191343.11318-1-tom@quantonium.net>

Fri, Sep 29, 2017 at 09:13:42PM CEST, tom@quantonium.net wrote:
>This patch is RFC and would be applied after "flow_dissector:
>Protocol specific flow dissector offload"
>
>In order to maintain uAPI in flower, the FLOW_DISSECTOR_F_FLOWER flag
>is added to indicate to flow_dissector that the caller is flower.
>As new functionality is added to flow_dissector that would break
>the flower uAPI, the code can be wrapped in "if (!(flags &
>FLOW_DISSECTOR_F_FLOWER))".
>
>In this patch the conditional is used around protocol specific
>dissection (e.g. DPI into VXLAN) as well as the code that
>enforces a depth of parsing to prevent DPI. The latter was a
>recent patch that would introduce a parsing limit to flower that
>did not exist before (i.e. would break uAPI).
>
>Signed-off-by: Tom Herbert <tom@quantonium.net>
>---
> include/net/flow_dissector.h |  1 +
> net/core/flow_dissector.c    | 17 +++++++++++------
> net/sched/cls_flow.c         |  3 ++-
> 3 files changed, 14 insertions(+), 7 deletions(-)
>
>diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
>index ad75bbfd1c9c..ca315107d147 100644
>--- a/include/net/flow_dissector.h
>+++ b/include/net/flow_dissector.h
>@@ -214,6 +214,7 @@ enum flow_dissector_key_id {
> #define FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL	BIT(2)
> #define FLOW_DISSECTOR_F_STOP_AT_ENCAP		BIT(3)
> #define FLOW_DISSECTOR_F_STOP_AT_L4		BIT(4)
>+#define FLOW_DISSECTOR_F_FLOWER			BIT(5)

I don't like flow_dissector to have any user-specific bits. Note that
the same dissection may be used not only from flower, but from other
code as well (OVS). Flow dissector should not care who the caller is.

^ permalink raw reply

* Re: [BUG] bpf is broken in net-next
From: Stephen Hemminger @ 2017-10-02 14:52 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Martin KaFai Lau, netdev
In-Reply-To: <20171002013424.tlaictbu7btc4z56@ast-mbp>

On Sun, 1 Oct 2017 18:34:25 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> On Sun, Oct 01, 2017 at 02:02:30PM -0700, Stephen Hemminger wrote:
> > Recent regression in net-next building bpf.c in samples/bpf now broken.
> > 
> > $ make samples/bpf/
> > 
> > 
> >   HOSTCC  samples/bpf/../../tools/lib/bpf/bpf.o
> > samples/bpf/../../tools/lib/bpf/bpf.c: In function ‘bpf_create_map_node’:
> > samples/bpf/../../tools/lib/bpf/bpf.c:76:13: error: ‘union bpf_attr’ has no member named ‘map_name’; did you mean ‘map_type’?
> >   memcpy(attr.map_name, name, min(name_len, BPF_OBJ_NAME_LEN - 1));
> >              ^
> > samples/bpf/../../tools/lib/bpf/bpf.c:76:44: error: ‘BPF_OBJ_NAME_LEN’ undeclared (first use in this function)
> >   memcpy(attr.map_name, name, min(name_len, BPF_OBJ_NAME_LEN - 1));  
> 
> everything works fine for me...
> did you do 'make headers_install' ?
> 

Yes, that was the problem. I had done make mrproper after changing branches.

^ permalink raw reply

* Re: [PATCH net-next v2] net: core: decouple ifalias get/set from rtnl lock
From: Eric Dumazet @ 2017-10-02 14:53 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev
In-Reply-To: <20171002102745.3047-1-fw@strlen.de>

On Mon, 2017-10-02 at 12:27 +0200, Florian Westphal wrote:
> Device alias can be set by either rtnetlink (rtnl is held) or sysfs.
> 
> rtnetlink holds the rtnl mutex, sysfs acquires it for this purpose.
> Add an extra mutex for it plus a seqcount to get a consistent snapshot
> of the alias buffer.


> +int dev_get_alias(const struct net_device *dev, char *alias, size_t len)
> +{
> +	unsigned int seq;
> +	int ret;
> +
> +	for (;;) {
> +		const char *name;
> +
> +		ret = 0;
> +		rcu_read_lock();
> +		name = rcu_dereference(dev->ifalias);
> +		seq = raw_seqcount_begin(&ifalias_rename_seq);
> +		if (name)
> +			ret = snprintf(alias, len, "%s", name);
> +		rcu_read_unlock();
> +
> +		if (!read_seqcount_retry(&ifalias_rename_seq, seq))
> +			break;
> +
> +		cond_resched();
> +	}
> +
> +	return ret;
> +}

I believe this is too complex and not needed.

Just use RCU: a writer is supposed to work on a private copy, and
_then_ publish the new pointer, so that a reader cannot see a mangled
string.

We either copy the 'old' name or the 'new' one.

A seqcount is not needed, and won't prevent you from reading the value
right before a change anyway.
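
As a rough illustration of the idea, here is a userspace sketch (not
kernel code, and not a proposed patch). C11 atomics stand in for
rcu_assign_pointer() and rcu_dereference(), and the fake_* names are
made up for the example. Freeing the old string is left to the caller;
in the kernel that would have to wait for a grace period (for instance
via kfree_rcu()), which this sketch does not model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the ifalias pointer in struct net_device. */
struct fake_dev {
	_Atomic(char *) ifalias;	/* published alias, NULL if unset */
};

/*
 * Writer: build a private copy first, then publish it with a single
 * pointer exchange. The previous string is handed back through *old;
 * the caller must only free it once no reader can still be using it.
 * Returns 0 on success, -1 if the copy could not be allocated.
 */
static int fake_set_alias(struct fake_dev *dev, const char *alias, char **old)
{
	char *copy = NULL;

	if (alias) {
		size_t n = strlen(alias) + 1;

		copy = malloc(n);
		if (!copy)
			return -1;
		memcpy(copy, alias, n);
	}
	*old = atomic_exchange_explicit(&dev->ifalias, copy,
					memory_order_acq_rel);
	return 0;
}

/*
 * Reader: load the pointer once and copy from it. The string behind a
 * published pointer is never modified, so the reader sees either the
 * old alias or the new one, never a mix.
 */
static int fake_get_alias(struct fake_dev *dev, char *buf, size_t len)
{
	const char *name = atomic_load_explicit(&dev->ifalias,
						memory_order_acquire);

	return name ? snprintf(buf, len, "%s", name) : 0;
}
```

The reader needs neither a retry loop nor a sequence counter, because
the only shared state it touches is the pointer itself.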


* Re: [PATCH net 3/3] net: skb_queue_purge(): lock/unlock the queue only once
From: Stephen Hemminger @ 2017-10-02 14:55 UTC (permalink / raw)
  To: Michael Witten
  Cc: David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	Eric Dumazet, netdev, linux-kernel
In-Reply-To: <057dd5367241468691b2b9adbc38a3ba-mfwitten@gmail.com>

On Mon, 02 Oct 2017 05:15:32 -0000
Michael Witten <mfwitten@gmail.com> wrote:

> On Sun, 1 Oct 2017 17:59:09 -0700, Stephen Hemminger wrote:
> 
> > On Sun, 01 Oct 2017 22:19:20 -0000 Michael Witten wrote:
> >  
> >> +	spin_lock_irqsave(&q->lock, flags);
> >> +	skb = q->next;
> >> +	__skb_queue_head_init(q);
> >> +	spin_unlock_irqrestore(&q->lock, flags);  
> >
> > Other code manipulating lists uses splice operation and
> > a sk_buff_head temporary on the stack. That would be easier
> > to understand.
> >
> > 	struct sk_buff_head head;
> >
> > 	__skb_queue_head_init(&head);
> > 	spin_lock_irqsave(&q->lock, flags);
> > 	skb_queue_splice_init(q, &head);
> > 	spin_unlock_irqrestore(&q->lock, flags);
> >
> >  
> >> +	while (skb != head) {
> >> +		next = skb->next;
> >>  		kfree_skb(skb);
> >> +		skb = next;
> >> +	}  
> >
> > It would be cleaner if you could use
> > skb_queue_walk_safe rather than open coding the loop.
> >
> > 	skb_queue_walk_safe(&head, skb,  tmp)
> > 		kfree_skb(skb);  
> 
> I appreciate abstraction as much as anybody, but I do not believe
> that such abstractions would actually be an improvement here.
> 
> * Splice-initing seems more like an idiom than an abstraction;
>   at first blush, it wouldn't be clear to me what the intention
>   is.
> 
> * Such abstractions are fairly unnecessary.
> 
>     * The function as written is already so short as to be
>       easily digested.
> 
>     * More to the point, this function is not some generic,
>       higher-level algorithm that just happens to employ the
>       socket buffer interface; rather, it is a function that
>       implements part of that very interface, and may thus
>       twiddle the intimate bits of these data structures
>       without being accused of abusing a leaky abstraction.
> 
> * Such abstractions add overhead, if only conceptually. In this
>   case, a temporary socket buffer queue allocates *3* unnecessary
>   struct members, including a whole `spinlock_t' member:
>   
>     prev
>     qlen
>     lock
> 
>   It's possible that the compiler will be smart enough to leave
>   those out, but I have my suspicions that it won't, not only
>   given that the interface contract requires that the temporary
>   socket buffer queue be properly initialized before use, but
>   also because splicing into the temporary will manipulate its
>   `qlen'. Yet, why worry whether optimization happens? The whole
>   issue can simply be avoided by exploiting the intimate details
>   that are already philosophically available to us.
> 
>   Similarly, the function `skb_queue_walk_safe' is nice, but it
>   loses value both because a temporary queue loses value (as just
>   described), and because it ignores the fact that legitimate
>   access to the internals of these data structures allows for
>   setting up the requested loop in advance; that is to say, the
>   two parts of the function that we are now debating can be woven
>   together more tightly than `skb_queue_walk_safe' allows.
> 
> For these reasons, I stand by the way that the patch currently
> implements this function; it does exactly what is desired, no more
> or less.
> 
> Sincerely,
> Michael Witten

The point is that there was discussion in the past of replacing
the next/prev pointers used in skb with the more generic code from list.h.
If the abstraction were used here, this code would keep working after
such a change.

The temporary sk_buff_head is on the stack, so any updates to
fields like qlen hit the CPU cache and therefore have very little
impact on performance.
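
To make the pattern concrete, here is a small userspace sketch of
"splice onto a stack temporary, then walk and free". It is only an
illustration: struct node, struct queue and the queue_* helpers are
made-up stand-ins for sk_buff, sk_buff_head and the skb_queue_*
helpers, and the locking is reduced to comments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct sk_buff and struct sk_buff_head. */
struct node {
	struct node *next, *prev;
};

struct queue {
	struct node head;	/* sentinel of a circular list */
	unsigned int qlen;
};

static void queue_init(struct queue *q)
{
	q->head.next = q->head.prev = &q->head;
	q->qlen = 0;
}

static void queue_tail(struct queue *q, struct node *n)
{
	n->next = &q->head;
	n->prev = q->head.prev;
	q->head.prev->next = n;
	q->head.prev = n;
	q->qlen++;
}

/* Move every element of src to the front of dst and reinitialize src. */
static void queue_splice_init(struct queue *src, struct queue *dst)
{
	struct node *first = src->head.next;
	struct node *last = src->head.prev;

	if (src->qlen == 0)
		return;
	first->prev = &dst->head;
	last->next = dst->head.next;
	dst->head.next->prev = last;
	dst->head.next = first;
	dst->qlen += src->qlen;
	queue_init(src);
}

/* Empty q while "holding the lock" only for the splice itself. */
static unsigned int queue_purge(struct queue *q)
{
	struct queue tmp;
	struct node *n, *next;
	unsigned int freed = 0;

	queue_init(&tmp);
	/* the queue lock would be taken here ... */
	queue_splice_init(q, &tmp);
	/* ... and released here, before any element is freed */

	/* "safe" walk: remember the successor before freeing the node */
	for (n = tmp.head.next; n != &tmp.head; n = next) {
		next = n->next;
		free(n);
		freed++;
	}
	return freed;
}
```

The temporary only costs a few stores to stack memory, and the purge
loop never has to know how the list is linked internally.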

