* [PATCH net-next 0/6] qed/qede: VF tunnelling support
From: Manish Chopra @ 2017-04-24 17:00 UTC
To: davem; +Cc: netdev, Yuval.Mintz
Hi David,
With this series, VFs can run VXLAN/GENEVE/GRE tunnels.
Please consider applying this series to "net-next".
Thanks,
Manish
Manish Chopra (6):
qed: refactor tunnelling - API/Structs
qed/qede: Enable tunnel offloads based on hw configuration
qede: Disable tunnel offloads for non offloaded UDP ports
qede: Configure UDP ports in local context.
qed/qede: Add UDP ports in bulletin board
qed - VF tunnelling support [VXLAN/GENEVE/GRE]
drivers/net/ethernet/qlogic/qed/qed.h | 31 +-
drivers/net/ethernet/qlogic/qed/qed_dev.c | 17 +-
drivers/net/ethernet/qlogic/qed/qed_dev_api.h | 2 +-
drivers/net/ethernet/qlogic/qed/qed_l2.c | 37 ++-
drivers/net/ethernet/qlogic/qed/qed_main.c | 47 ++-
drivers/net/ethernet/qlogic/qed/qed_sp.h | 4 +-
drivers/net/ethernet/qlogic/qed/qed_sp_commands.c | 330 +++++++++++-----------
drivers/net/ethernet/qlogic/qed/qed_sriov.c | 240 ++++++++++++++++
drivers/net/ethernet/qlogic/qed/qed_sriov.h | 9 +
drivers/net/ethernet/qlogic/qed/qed_vf.c | 165 +++++++++++
drivers/net/ethernet/qlogic/qed/qed_vf.h | 58 +++-
drivers/net/ethernet/qlogic/qede/qede.h | 3 +-
drivers/net/ethernet/qlogic/qede/qede_filter.c | 84 +++++-
drivers/net/ethernet/qlogic/qede/qede_fp.c | 23 +-
drivers/net/ethernet/qlogic/qede/qede_main.c | 56 ++--
include/linux/qed/qed_eth_if.h | 1 +
include/linux/qed/qed_if.h | 5 +
17 files changed, 857 insertions(+), 255 deletions(-)
--
2.7.2
* Re: [PATCH net v2] ipv4: Avoid caching l3mdev dst on mismatched local route
From: David Miller @ 2017-04-24 16:52 UTC
To: rshearma; +Cc: netdev, dsa
In-Reply-To: <1492806899-6215-1-git-send-email-rshearma@brocade.com>
From: Robert Shearman <rshearma@brocade.com>
Date: Fri, 21 Apr 2017 21:34:59 +0100
> David reported that doing the following:
>
> ip li add red type vrf table 10
> ip link set dev eth1 vrf red
> ip addr add 127.0.0.1/8 dev red
> ip link set dev eth1 up
> ip li set red up
> ping -c1 -w1 -I red 127.0.0.1
> ip li del red
>
> when either policy routing IP rules are present or the local table
> lookup ip rule is before the l3mdev lookup results in a hang with
> these messages:
>
> unregister_netdevice: waiting for red to become free. Usage count = 1
>
> The problem is caused by caching the dst used for sending the packet
> out of the specified interface on a local route with a different
> nexthop interface. Thus the dst could stay around until the route in
> the table the lookup was done is deleted which may be never.
>
> Address the problem by not forcing output device to be the l3mdev in
> the flow's output interface if the lookup didn't use the l3mdev. This
> then results in the dst using the right device according to the route.
>
> Changes in v2:
> - make the dev_out passed in by __ip_route_output_key_hash correct
> instead of checking the nh dev if FLOWI_FLAG_SKIP_NH_OIF is set as
> suggested by David.
>
> Fixes: 5f02ce24c2696 ("net: l3mdev: Allow the l3mdev to be a loopback")
> Reported-by: David Ahern <dsa@cumulusnetworks.com>
> Suggested-by: David Ahern <dsa@cumulusnetworks.com>
> Signed-off-by: Robert Shearman <rshearma@brocade.com>
Applied, thanks.
* Re: [PATCH net-next 0/7] ibmvnic: Additional updates and bug fixes
From: David Miller @ 2017-04-24 16:52 UTC
To: nfont; +Cc: netdev, brking, jallen, muvic, tlfalcon
In-Reply-To: <20170421193627.11030.34813.stgit@ltcalpine2-lp23.aus.stglabs.ibm.com>
From: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Date: Fri, 21 Apr 2017 15:38:29 -0400
> This set of patches is an additional set of updates and bug fixes to
> the ibmvnic driver which applies on top of the previous set of updates
> sent out on 4/19.
Series applied, thanks.
* cpsw regression in mainline with "cpsw/netcp: cpts depends on posix_timers"
From: Tony Lindgren @ 2017-04-24 16:51 UTC
To: Grygorii Strashko, Arnd Bergmann; +Cc: netdev, linux-omap
Hi,
Looks like commit 07fef3623407 ("cpsw/netcp: cpts depends on posix_timers")
in mainline started triggering the following oops at least on j5eco-evm.
Adding CONFIG_PTP_1588_CLOCK to .config solves it, but the oops hints
that something is wrong with the dependencies. CONFIG_TI_CPTS defaults
to N, and not selecting it causes the oops.
Any ideas what's needed to properly fix this?
Regards,
Tony
8< -------------------------
Unhandled fault: external abort on non-linefetch (0x1008) at 0xf09a3018
pgd = c0004000
[f09a3018] *pgd=ae82c811, *pte=4a103653, *ppte=4a103453
Internal error: : 1008 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 14 Comm: kworker/0:1 Not tainted 4.11.0-rc8 #418
Hardware name: Generic ti814x (Flattened Device Tree)
Workqueue: rpciod rpc_async_schedule
task: ee8f41c0 task.stack: ee93c000
PC is at cpdma_chan_submit+0x154/0x304
LR is at __dma_page_cpu_to_dev+0x28/0xac
pc : [<c0639664>] lr : [<c011523c>] psr: a0000093
sp : ee93d958 ip : ef6f9000 fp : 00000000
r10: eeebee40 r9 : 80000013 r8 : eed956f0
r7 : 00000000 r6 : 0000003c r5 : f09a3000 r4 : eed956d0
r3 : f09a3018 r2 : aeebb302 r1 : aeebb33e r0 : ee950410
Flags: NzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 80004019 DAC: 00000051
Process kworker/0:1 (pid: 14, stack limit = 0xee93c218)
Stack: (0xee93d958 to 0xee93e000)
d940: 00000001 00000000
d960: ee8f47f0 00000000 eeebee40 eeebee40 eedfc000 eedf4810 00000000 eed956d0
d980: 00000000 eee01800 ee93c000 c063e35c 00000000 c01975d8 eeebee40 eedfc000
d9a0: 00000000 eedfc000 00000000 c06b4c6c 00000000 eee01840 eeebee40 0000002a
d9c0: eedfc000 ee93d9f4 eeeb2e50 eeeb2e00 eeebee40 eee01800 eedfc000 00000000
d9e0: eeeb2e50 00000000 00000008 c06db9f4 eeeb2e00 00000010 00000000 eeeb2e00
da00: 00000000 00000000 00000000 c06b574c eeeb2e50 00000001 c072af98 60000013
da20: 00000000 c072af98 00000878 00000000 00000000 00000000 00000000 eeeb2e80
da40: fffffff4 00000000 ffffe000 eeebee40 c0db2600 c0dbe34a 00000000 c0dbe34a
da60: c06c06bc 00000003 c0dbe34a c072af98 00000000 00000000 c072ae2c 00000000
da80: 3e6fa8c0 426fa8c0 c06c06bc 00000003 c0dbe34a c072b048 3e6fa8c0 eedfc21c
daa0: 00000000 eedfc000 00000000 3e6fa8c0 426fa8c0 c072bdb8 3e6fa8c0 00000000
dac0: eeded088 00000000 00000000 c01b424c c0d59933 c029ed04 ef6db580 c080d190
dae0: eeebdc68 60000013 c06c06a0 00000003 01080020 eeebdc14 eeebef00 eeebdc00
db00: eeebef00 00000001 eeeac000 eeebdc38 eeeac000 00000000 00000000 c06c06bc
db20: eeebdc00 eeebdc14 00000001 c06c4144 ffff8d80 eeebdc00 eeeac000 00000000
db40: c0dbe2e3 c06f3550 eeeac000 00000000 00000000 c06c4f78 c0db6e78 c0d07da0
db60: 00000000 00000000 ee8f41c0 00000006 eeebde10 eeebdc00 eedfc000 426fa8c0
db80: c0dbe2e3 c0db2600 eeeac000 00000000 00000000 c06f3550 00000000 00000000
dba0: c06f32c0 00000000 00000000 00000010 c06f6b6c 60000013 c06f6b44 426fa8c0
dbc0: ffffe000 00000000 eeeac000 c0dbe2e3 c0db2600 00000001 c06f75b0 00000074
dbe0: 00000000 c06f6b6c 00000000 00000000 c06f6a70 00000000 eedfc000 eea36040
dc00: ee93dc54 c06f51b0 ee93ddb8 3e6fa8c0 eea36340 c06f777c ee8f4e30 eeeac000
dc20: c0db2600 c0db2600 eea36040 eeebde24 00000071 ee93dd00 c0725248 3e6fa8c0
dc40: eeeac000 c06f75b0 00000000 ffff0000 ee93dd04 ee93dc54 ee93dc54 eeeac000
dc60: eea36040 eeebde24 00000071 ee93dd00 00000000 3e6fa8c0 eeeac000 c0725248
dc80: eea36040 ee93ddb8 ee93dd00 426fa8c0 00000058 00000060 3e6fa8c0 c0726f3c
dca0: 00000060 00000008 ee93dce4 ee93dce0 00004040 60000013 c07222fc 00000000
dcc0: 00000000 00000058 00000000 c06f51b0 00006f00 c0691c74 c0db2600 ee93de5c
dce0: 00000000 ee8f47f0 c0ee0000 426fa8c0 00000000 00000000 ffff0000 00000004
dd00: 00000002 00000001 00000000 00110000 00000000 00000007 00000000 00000000
dd20: 00000000 ee93de5c 3e6fa8c0 426fa8c0 d5866f00 ee8f41c0 00000001 eeeb100c
dd40: 00000000 c019a0d8 00000001 ffffe000 c0733718 c013e764 eea36040 ee93ddb8
dd60: 00000058 c0733718 eea36040 c0734310 00000000 00000000 00001944 00000000
dd80: ee32a900 00000010 eeeb100c 00000000 eeebb404 ee93de5c c0dc5a04 c0691c74
dda0: 00000000 c07d8460 00000058 00000000 eeeb6804 00000058 eeeb100c 00000010
ddc0: 00000003 00000000 00000000 ee93ddb8 00000000 00000000 00000000 00000000
dde0: 00004040 00000000 00000000 00000001 00000000 eeebb404 ee32a900 00000058
de00: 00000000 c07d84d4 00000000 00000000 c1566b80 00000000 c1566bc4 c01c7b3c
de20: 00000001 eeeb1000 eeebb400 eeeac0c0 eeebb474 eeeb1344 00000000 c0dc5874
de40: c0dc5a04 c07d8a10 00000000 00000001 ee93de5c eeeac0c0 eeebb474 00000000
de60: eeeb1000 eeebb400 eeeac0c0 c07d5240 eeeb685c eeebb404 eeeb6954 eeeac0c0
de80: eeebb400 eeebb400 00000681 fef5f746 c0d0796c c0dc5874 ee93c000 c07d1fe4
dea0: eee4c800 c01975d8 eeeac0c0 ee8ff080 ef6dfa40 c07d1e8c c07d1e8c c07dde98
dec0: 00000000 00000681 00000000 00000000 ffffe000 00000000 60000013 eeeac0e4
dee0: ee8ff080 ef6dfa40 ee93df20 ff7f0c00 c0d0796c c0dc2d54 c0d4fa8e c0154dac
df00: 00000001 00000000 c0154cf4 00000001 00000000 00000000 c0155e80 00000008
df20: c15a3644 c0f151cc 00000000 c0b00010 ee8ff080 ef6dfa40 ee8ff098 00000008
df40: ef6dfa74 ee93c000 c0d04900 ef6dfa40 ee8ff080 c0155e0c 00000000 ee92a700
df60: ffffe000 ee8ff000 00000000 ee92a700 ee93c000 ee8ff080 c0155dd0 ee8ff038
df80: ee891e90 c015bad8 ee93c000 ee92a700 c015b9c8 00000000 00000000 00000000
dfa0: 00000000 00000000 00000000 c01077b0 00000000 00000000 00000000 00000000
dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 bde4cdce 6d799827
[<c0639664>] (cpdma_chan_submit) from [<c063e35c>] (cpsw_ndo_start_xmit+0x84/0x178)
[<c063e35c>] (cpsw_ndo_start_xmit) from [<c06b4c6c>] (dev_hard_start_xmit+0xc4/0x3bc)
[<c06b4c6c>] (dev_hard_start_xmit) from [<c06db9f4>] (sch_direct_xmit+0xcc/0x194)
[<c06db9f4>] (sch_direct_xmit) from [<c06b574c>] (__dev_queue_xmit+0x6f4/0x984)
[<c06b574c>] (__dev_queue_xmit) from [<c072af98>] (arp_xmit+0x16c/0x1f0)
[<c072af98>] (arp_xmit) from [<c072bdb8>] (arp_solicit+0x17c/0x44c)
[<c072bdb8>] (arp_solicit) from [<c06c06bc>] (neigh_probe+0x54/0x80)
[<c06c06bc>] (neigh_probe) from [<c06c4144>] (__neigh_event_send+0x224/0x32c)
[<c06c4144>] (__neigh_event_send) from [<c06c4f78>] (neigh_resolve_output+0x16c/0x1e8)
[<c06c4f78>] (neigh_resolve_output) from [<c06f3550>] (ip_finish_output2+0x304/0x754)
[<c06f3550>] (ip_finish_output2) from [<c06f6b6c>] (ip_output+0x1fc/0x328)
[<c06f6b6c>] (ip_output) from [<c06f75b0>] (ip_send_skb+0x1c/0xf0)
[<c06f75b0>] (ip_send_skb) from [<c0725248>] (udp_send_skb+0xe0/0x2bc)
[<c0725248>] (udp_send_skb) from [<c0726f3c>] (udp_sendmsg+0x2b4/0x9c8)
[<c0726f3c>] (udp_sendmsg) from [<c0691c74>] (sock_sendmsg+0x14/0x24)
[<c0691c74>] (sock_sendmsg) from [<c07d8460>] (xs_send_kvec+0x84/0x94)
[<c07d8460>] (xs_send_kvec) from [<c07d84d4>] (xs_sendpages+0x64/0x21c)
[<c07d84d4>] (xs_sendpages) from [<c07d8a10>] (xs_udp_send_request+0x50/0x10c)
[<c07d8a10>] (xs_udp_send_request) from [<c07d5240>] (xprt_transmit+0x4c/0x3ac)
[<c07d5240>] (xprt_transmit) from [<c07d1fe4>] (call_transmit+0x158/0x208)
[<c07d1fe4>] (call_transmit) from [<c07dde98>] (__rpc_execute+0x9c/0x524)
[<c07dde98>] (__rpc_execute) from [<c0154dac>] (process_one_work+0x2b0/0x774)
[<c0154dac>] (process_one_work) from [<c0155e0c>] (worker_thread+0x3c/0x540)
[<c0155e0c>] (worker_thread) from [<c015bad8>] (kthread+0x110/0x150)
[<c015bad8>] (kthread) from [<c01077b0>] (ret_from_fork+0x14/0x24)
Code: e585a010 e5852014 e2853018 e5836000 (e5933000)
* Re: net/ipv6: slab-out-of-bounds in ip6_tnl_xmit
From: Cong Wang @ 2017-04-24 16:47 UTC
To: Andrey Konovalov
Cc: David S. Miller, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Eric Dumazet,
Dmitry Vyukov, Kostya Serebryany, syzkaller
In-Reply-To: <CAAeHK+w-EYicrLHhBR0LHMMTc22GjsB=PTmgswm44w4VpfQ_hA@mail.gmail.com>
On Mon, Apr 24, 2017 at 8:03 AM, Andrey Konovalov <andreyknvl@google.com> wrote:
> Hi,
>
> I've got the following error report while fuzzing the kernel with syzkaller.
>
> On commit 5a7ad1146caa895ad718a534399e38bd2ba721b7 (4.11-rc8).
>
> Unfortunately it's not reproducible.
>
> The issue might be similar to this one:
> https://groups.google.com/forum/#!topic/syzkaller/IDoQHFmrnRI
>
> ==================================================================
> BUG: KASAN: slab-out-of-bounds in ip6_tnl_xmit+0x25dd/0x28f0
> net/ipv6/ip6_tunnel.c:1078 at addr ffff88005dcc5f98
> Read of size 16 by task syz-executor7/8076
> CPU: 3 PID: 8076 Comm: syz-executor7 Not tainted 4.11.0-rc8+ #266
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
> __dump_stack lib/dump_stack.c:16 [inline]
> dump_stack+0x192/0x22d lib/dump_stack.c:52
> kasan_object_err+0x1c/0x70 mm/kasan/report.c:164
> print_address_description mm/kasan/report.c:202 [inline]
> kasan_report_error mm/kasan/report.c:291 [inline]
> kasan_report+0x252/0x510 mm/kasan/report.c:347
> __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:378
> ip6_tnl_xmit+0x25dd/0x28f0 net/ipv6/ip6_tunnel.c:1078
> ip4ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1268 [inline]
> ip6_tnl_start_xmit+0xc1e/0x1890 net/ipv6/ip6_tunnel.c:1370
We use ipv4 dst in ip6_tunnel and cast an IPv4 neigh key as an
IPv6 address...
neigh = dst_neigh_lookup(skb_dst(skb),
&ipv6_hdr(skb)->daddr);
if (!neigh)
goto tx_err_link_failure;
addr6 = (struct in6_addr *)&neigh->primary_key; // <=== HERE
addr_type = ipv6_addr_type(addr6);
if (addr_type == IPV6_ADDR_ANY)
addr6 = &ipv6_hdr(skb)->daddr;
memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));
Also, shouldn't the network header of the skb still be IPv4 at this point?
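
The mismatch described above can be sketched in userspace. This is only an illustration, not the kernel code or a proposed fix: `struct fake_neigh`, its `key_len` field and `key_to_ipv6()` are hypothetical names introduced here. The assumption is that a neighbour key is 4 bytes long for IPv4 and 16 bytes long for IPv6, so treating an IPv4 key as an IPv6 address reads 16 bytes from a shorter key, which matches the "Read of size 16" in the KASAN report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IPV4_KEY_LEN 4
#define IPV6_KEY_LEN 16

/* Hypothetical stand-in for a neighbour entry. Only the first key_len
 * bytes of key[] are meaningful. */
struct fake_neigh {
	size_t key_len;			/* 4 for IPv4, 16 for IPv6 */
	uint8_t key[IPV6_KEY_LEN];
};

/* Copy the key out as an IPv6 address only when it really is one.
 * Returns 0 on success, -1 when the key belongs to another family. */
int key_to_ipv6(const struct fake_neigh *n, uint8_t out[IPV6_KEY_LEN])
{
	assert(n != NULL && out != NULL);

	if (n->key_len != IPV6_KEY_LEN)
		return -1;	/* an IPv4 key must not be read as 16 bytes */

	memcpy(out, n->key, IPV6_KEY_LEN);
	return 0;
}
```

A caller that gets -1 back would fall back to another address source instead of reading past the end of the key.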
* Re: [PATCH v3 07/29] x86: bpf_jit, use ENTRY+ENDPROC
From: Alexei Starovoitov @ 2017-04-24 16:47 UTC
To: Jiri Slaby
Cc: Ingo Molnar, David Miller, mingo, tglx, hpa, x86, jpoimboe,
linux-kernel, netdev, daniel, edumazet
In-Reply-To: <71301a81-1c61-fd4d-5b1b-5154fa723859@suse.cz>
On Mon, Apr 24, 2017 at 06:02:51PM +0200, Jiri Slaby wrote:
> On 04/24/2017, 05:55 PM, Ingo Molnar wrote:
> > * Jiri Slaby <jslaby@suse.cz> wrote:
> >
> >> On 04/24/2017, 05:08 PM, David Miller wrote:
> >>> If you align the entry points, then the code sequence as a whole is
> >>> no longer densely packed.
> >>
> >> Sure.
> >>
> >>> Or do I misunderstand how your macros work?
> >>
> >> Perhaps. So the suggested macros for the code are:
> >> #define BPF_FUNC_START_LOCAL(name) \
> >> SYM_START(name, SYM_V_LOCAL, SYM_A_NONE)
> >> #define BPF_FUNC_START(name) \
> >> SYM_START(name, SYM_V_GLOBAL, SYM_A_NONE)
> >>
> >> and they differ from the standard ones:
> >> #define SYM_FUNC_START_LOCAL(name) \
> >> SYM_START(name, SYM_V_LOCAL, SYM_A_ALIGN)
> >> #define SYM_FUNC_START(name) \
> >> SYM_START(name, SYM_V_GLOBAL, SYM_A_ALIGN)
> >>
> >>
> >> The difference is SYM_A_NONE vs. SYM_A_ALIGN, which means:
> >> #define SYM_A_ALIGN ALIGN
> >> #define SYM_A_NONE /* nothing */
> >>
> >> Does it look OK now?
> >
> > No, the patch changes alignment which is undesirable, it needs to preserve the
> > existing (non-)alignment of the symbols!
>
> OK, so I am not expressing myself explicitly enough, it seems.
>
> So, correct, the patch v3 adds alignments. I suggested in the discussion
> the macros above. They do not add alignments. If everybody is OK with
> that, v4 of the patch won't add alignments. OK?
Can we go back to what problem this patch set is trying to solve?
Sounds like you want to add _function_ start/end marks to aid debugging?
Debugging with what? What tool will recognize this stuff?
Take a look at what your patch does:
+ENTRY(sk_load_word)
test %esi,%esi
js bpf_slow_path_word_neg
+ENDPROC(sk_load_word)
Do the above two assembler instructions look like a function?
or this:
+ENTRY(sk_load_byte_positive_offset)
cmp %esi,%r9d /* if (offset >= hlen) goto bpf_slow_path_byte */
jle bpf_slow_path_byte
movzbl (SKBDATA,%rsi),%eax
ret
+ENDPROC(sk_load_byte_positive_offset)
This assembler code doesn't represent functions. There is no prologue/epilogue
and no stack frame. JITed code uses 'call' insn to jump into them, but they're
not your typical C functions.
Take a look at bpf_slow_path_common() macro that creates the frame before
calling into C code with 'call skb_copy_bits;'
I still think that this code should be left alone.
Even the macro names you're proposing:
#define BPF_FUNC_START_LOCAL
don't sound right. These are not functions.
* Re: [PATCH net-next 0/3] packet: Add option to create new fanout group with unique id.
From: David Miller @ 2017-04-24 16:46 UTC
To: willemdebruijn.kernel; +Cc: netdev, maloneykernel, maloney
In-Reply-To: <20170421145612.106869-1-willemdebruijn.kernel@gmail.com>
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Fri, 21 Apr 2017 10:56:09 -0400
> Fanout uses a per net global namespace. A process that intends to create a
> new fanout group can accidentally join an existing group. It is
> not possible to detect this.
>
> Add a socket option to specify on the first call to
> setsockopt(..., PACKET_FANOUT, ...) to ensure that a new group is created.
> Also add tests.
Series applied, thanks.
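
For readers unfamiliar with the option, here is a minimal sketch of how the integer passed to setsockopt(..., PACKET_FANOUT, ...) is composed, assuming the usual layout of group id in the low 16 bits and type plus flags in the high 16 bits. The SKETCH_* constants are local to this example. Their values are assumptions meant to mirror what linux/if_packet.h is understood to define for the hash mode and for the unique-id flag added by this series, so check them against the header before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, local to this sketch; verify against linux/if_packet.h. */
#define SKETCH_FANOUT_HASH		0x0000
#define SKETCH_FANOUT_FLAG_UNIQUEID	0x2000

/* Build the PACKET_FANOUT option value: group id in the low 16 bits,
 * fanout type and flags in the high 16 bits. */
uint32_t fanout_arg(uint16_t group_id, uint16_t type_and_flags)
{
	uint32_t arg = (uint32_t)group_id | ((uint32_t)type_and_flags << 16);

	/* The two halves must not overlap. */
	assert((arg & 0xffffu) == group_id);
	return arg;
}
```

With the unique-id flag set, the caller would pass a group id of 0 and let the kernel pick an unused one, which is how the accidental join described in the cover letter is avoided.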
* Re: [PATCH net-next v2 2/5] virtio-net: transmit napi
From: Michael S. Tsirkin @ 2017-04-24 16:40 UTC
To: Willem de Bruijn
Cc: Willem de Bruijn, Network Development, David Miller,
virtualization
In-Reply-To: <CAF=yD-KdtvXV+=ZROcLPHJxB55wbnJOmV0MJn2AJPHugoofR0g@mail.gmail.com>
On Fri, Apr 21, 2017 at 10:50:12AM -0400, Willem de Bruijn wrote:
> >>> Maybe I was wrong, but according to Michael's comment it looks like he
> >>> wants to check affinity_hint_set just for speculative tx polling on rx
> >>> napi instead of disabling it at all.
> >>>
> >>> And I'm not convinced this is really needed, driver only provide affinity
> >>> hint instead of affinity, so it's not guaranteed that tx and rx interrupt
> >>> are in the same vcpus.
> >>
> >> You're right. I made the restriction broader than the request, to really
> >> err on the side of caution for the initial merge of napi tx. And enabling
> >> the optimization is always a win over keeping it off, even without irq
> >> affinity.
> >>
> >> The cycle cost is significant without affinity regardless of whether the
> >> optimization is used.
> >
> >
> > Yes, I noticed this in the past too.
> >
> >> Though this is not limited to napi-tx, it is more
> >> pronounced in that mode than without napi.
> >>
> >> 1x TCP_RR for affinity configuration {process, rx_irq, tx_irq}:
> >>
> >> upstream:
> >>
> >> 1,1,1: 28985 Mbps, 278 Gcyc
> >> 1,0,2: 30067 Mbps, 402 Gcyc
> >>
> >> napi tx:
> >>
> >> 1,1,1: 34492 Mbps, 269 Gcyc
> >> 1,0,2: 36527 Mbps, 537 Gcyc (!)
> >> 1,0,1: 36269 Mbps, 394 Gcyc
> >> 1,0,0: 34674 Mbps, 402 Gcyc
> >>
> >> This is a particularly strong example. It is also representative
> >> of most RR tests. It is less pronounced in other streaming tests.
> >> 10x TCP_RR, for instance:
> >>
> >> upstream:
> >>
> >> 1,1,1: 42267 Mbps, 301 Gcyc
> >> 1,0,2: 40663 Mbps, 445 Gcyc
> >>
> >> napi tx:
> >>
> >> 1,1,1: 42420 Mbps, 303 Gcyc
> >> 1,0,2: 42267 Mbps, 431 Gcyc
> >>
> >> These numbers were obtained with the virtqueue_enable_cb_delayed
> >> optimization after xmit_skb, btw. It turns out that moving that before
> >> increases 1x TCP_RR further to ~39 Gbps, at the cost of reducing
> >> 100x TCP_RR a bit.
> >
> >
> > I see, so I think we can leave the affinity hint optimization/check for
> > future investigation:
> >
> > - to avoid endless optimization (e.g we may want to share a single
> > vector/napi for tx/rx queue pairs in the future) for this series.
> > - tx napi is disabled by default which means we can do optimization on top.
>
> Okay. I'll drop the vi->affinity_hint_set from the patch set for now.
I kind of like it, let's be conservative. But I'd prefer a comment
near it explaining why it's there.
--
MST
* Re: [PATCH v4 net-next] mdio_bus: Issue GPIO RESET to PHYs.
From: David Miller @ 2017-04-24 16:40 UTC
To: rogerq
Cc: andrew, f.fainelli, tony, nsekhar, jsarha, netdev, linux-omap,
linux-kernel
In-Reply-To: <fe0b7ec5-2f83-ba06-579f-3ea0d5c4990c@ti.com>
From: Roger Quadros <rogerq@ti.com>
Date: Fri, 21 Apr 2017 16:15:38 +0300
> Some boards [1] leave the PHYs at an invalid state
> during system power-up or reset thus causing unreliability
> issues with the PHY which manifests as PHY not being detected
> or link not functional. To fix this, these PHYs need to be RESET
> via a GPIO connected to the PHY's RESET pin.
>
> Some boards have a single GPIO controlling the PHY RESET pin of all
> PHYs on the bus whereas some others have separate GPIOs controlling
> individual PHY RESETs.
>
> In both cases, the RESET de-assertion cannot be done in the PHY driver
> as the PHY will not probe till its reset is de-asserted.
> So do the RESET de-assertion in the MDIO bus driver.
>
> [1] - am572x-idk, am571x-idk, a437x-idk
>
> Signed-off-by: Roger Quadros <rogerq@ti.com>
Applied, thanks.
* Re: [PATCH v3 07/29] x86: bpf_jit, use ENTRY+ENDPROC
From: Ingo Molnar @ 2017-04-24 16:40 UTC
To: Jiri Slaby
Cc: David Miller, alexei.starovoitov, mingo, tglx, hpa, x86, jpoimboe,
linux-kernel, netdev, daniel, edumazet
In-Reply-To: <71301a81-1c61-fd4d-5b1b-5154fa723859@suse.cz>
* Jiri Slaby <jslaby@suse.cz> wrote:
> On 04/24/2017, 05:55 PM, Ingo Molnar wrote:
> > * Jiri Slaby <jslaby@suse.cz> wrote:
> >
> >> On 04/24/2017, 05:08 PM, David Miller wrote:
> >>> If you align the entry points, then the code sequence as a whole is
> >>> no longer densely packed.
> >>
> >> Sure.
> >>
> >>> Or do I misunderstand how your macros work?
> >>
> >> Perhaps. So the suggested macros for the code are:
> >> #define BPF_FUNC_START_LOCAL(name) \
> >> SYM_START(name, SYM_V_LOCAL, SYM_A_NONE)
> >> #define BPF_FUNC_START(name) \
> >> SYM_START(name, SYM_V_GLOBAL, SYM_A_NONE)
> >>
> >> and they differ from the standard ones:
> >> #define SYM_FUNC_START_LOCAL(name) \
> >> SYM_START(name, SYM_V_LOCAL, SYM_A_ALIGN)
> >> #define SYM_FUNC_START(name) \
> >> SYM_START(name, SYM_V_GLOBAL, SYM_A_ALIGN)
> >>
> >>
> >> The difference is SYM_A_NONE vs. SYM_A_ALIGN, which means:
> >> #define SYM_A_ALIGN ALIGN
> >> #define SYM_A_NONE /* nothing */
> >>
> >> Does it look OK now?
> >
> > No, the patch changes alignment which is undesirable, it needs to preserve the
> > existing (non-)alignment of the symbols!
>
> OK, so I am not expressing myself explicitly enough, it seems.
>
> So, correct, the patch v3 adds alignments. I suggested in the discussion
> the macros above. They do not add alignments. If everybody is OK with
> that, v4 of the patch won't add alignments. OK?
Yes.
Thanks,
Ingo
* Re: [PATCH v3 net] net: ipv6: regenerate host route if moved to gc list
From: Eric Dumazet @ 2017-04-24 16:39 UTC
To: David Ahern; +Cc: netdev, dvyukov, andreyknvl, mmanning, kafai
In-Reply-To: <1493046549-17420-1-git-send-email-dsa@cumulusnetworks.com>
On Mon, 2017-04-24 at 08:09 -0700, David Ahern wrote:
> Taking down the loopback device wreaks havoc on IPv6 routing. By
> extension, taking down a VRF device wreaks havoc on its table.
>
> Dmitry and Andrey both reported heap out-of-bounds reports in the IPv6
> FIB code while running syzkaller fuzzer. The root cause is a dead dst
> that is on the garbage list gets reinserted into the IPv6 FIB. While on
> the gc (or perhaps when it gets added to the gc list) the dst->next is
> set to an IPv4 dst. A subsequent walk of the ipv6 tables causes the
> out-of-bounds access.
>
> Andrey's reproducer was the key to getting to the bottom of this.
>
> With IPv6, host routes for an address have the dst->dev set to the
> loopback device. When the 'lo' device is taken down, rt6_ifdown initiates
> a walk of the fib evicting routes with the 'lo' device which means all
> host routes are removed. That process moves the dst which is attached to
> an inet6_ifaddr to the gc list and marks it as dead.
>
> The recent change to keep global IPv6 addresses added a new function,
> fixup_permanent_addr, that is called on admin up. That function restarts
> dad for an inet6_ifaddr and when it completes the host route attached
> to it is inserted into the fib. Since the route was marked dead and
> moved to the gc list, re-inserting the route causes the reported
> out-of-bounds accesses. If the device with the address is taken down
> or the address is removed, the WARN_ON in fib6_del is triggered.
>
> All of those faults are fixed by regenerating the host route if the
> existing one has been moved to the gc list, something that can be
> determined by checking if the rt6i_ref counter is 0.
Very nice changelog !
>
> Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Reported-by: Andrey Konovalov <andreyknvl@google.com>
> Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
> ---
> v3
> - removed 'if (prev)' and just call ip6_rt_put; added comment about spinlock
>
> v2
> - change ifp->rt under spinlock vs cmpxchg
> - add comment about rt6i_ref == 0
>
> net/ipv6/addrconf.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 80ce478c4851..93f81d9cd85f 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -3271,14 +3271,25 @@ static void addrconf_gre_config(struct net_device *dev)
> static int fixup_permanent_addr(struct inet6_dev *idev,
> struct inet6_ifaddr *ifp)
> {
> - if (!ifp->rt) {
> - struct rt6_info *rt;
> + /* rt6i_ref == 0 means the host route was removed from the
> + * FIB, for example, if 'lo' device is taken down. In that
> + * case regenerate the host route.
> + */
> + if (!ifp->rt || !atomic_read(&ifp->rt->rt6i_ref)) {
> + struct rt6_info *rt, *prev;
>
> rt = addrconf_dst_alloc(idev, &ifp->addr, false);
> if (unlikely(IS_ERR(rt)))
> return PTR_ERR(rt);
>
> + prev = ifp->rt;
I would feel more comfortable if this was moved after the spin_lock() ?
> +
> + /* ifp->rt can be accessed outside of rtnl */
> + spin_lock(&ifp->lock);
> ifp->rt = rt;
> + spin_unlock(&ifp->lock);
> +
> + ip6_rt_put(prev);
> }
>
> if (!(ifp->flags & IFA_F_NOPREFIXROUTE)) {
Thanks !
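
The ordering suggested above (read the old pointer only after taking the lock) can be sketched in userspace as follows. This is an illustration under stated assumptions, not the patch: `struct fake_spinlock`, `struct fake_ifaddr` and `swap_route()` are hypothetical names, and the "lock" is a single-threaded stand-in that only checks that lock and unlock are paired.

```c
#include <assert.h>
#include <stddef.h>

/* Single-threaded stand-in for a spinlock; it only tracks ownership. */
struct fake_spinlock {
	int held;
};

static void fake_spin_lock(struct fake_spinlock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void fake_spin_unlock(struct fake_spinlock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Hypothetical stand-in for the address object holding a route pointer. */
struct fake_ifaddr {
	struct fake_spinlock lock;
	void *rt;
};

/* Publish new_rt and return the route it replaced. Both the read of the
 * old pointer and the write of the new one happen inside the critical
 * section, so no other lock holder can change ifp->rt in between. */
void *swap_route(struct fake_ifaddr *ifp, void *new_rt)
{
	void *prev;

	fake_spin_lock(&ifp->lock);
	prev = ifp->rt;
	ifp->rt = new_rt;
	fake_spin_unlock(&ifp->lock);

	return prev;	/* the caller drops its reference outside the lock */
}
```

Returning the previous pointer keeps the release of the old route outside the critical section, as the quoted patch does with ip6_rt_put(prev).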
* Re: [PATCH] net: tc35815: move free after the dereference
From: David Miller @ 2017-04-24 16:38 UTC
To: dan.carpenter; +Cc: anemo, tremyfr, jarod, netdev, kernel-janitors
In-Reply-To: <20170421104937.7sxibmruc6pvp26p@mwanda>
From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Fri, 21 Apr 2017 13:49:37 +0300
> We dereference "skb" to get "skb->len" so we should probably do that
> step before freeing the skb.
>
> Fixes: eea221ce4880 ("tc35815 driver update (take 2)")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Applied, thanks Dan.
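
A minimal userspace sketch of the ordering this fix is about: take everything needed from the buffer first, then free it. `struct fake_skb`, `struct fake_stats` and `complete_tx()` are hypothetical names for illustration only, and the byte and packet counters are an assumption about what the length is used for, not a copy of the driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a packet buffer and per-device counters. */
struct fake_skb {
	size_t len;
};

struct fake_stats {
	unsigned long tx_packets;
	unsigned long tx_bytes;
};

/* Allocate a buffer of the given length; returns NULL on failure. */
struct fake_skb *fake_skb_alloc(size_t len)
{
	struct fake_skb *skb = malloc(sizeof(*skb));

	if (skb)
		skb->len = len;
	return skb;
}

/* Account for a completed transmit, then release the buffer.
 * skb->len is read before free(), so freed memory is never touched. */
void complete_tx(struct fake_stats *st, struct fake_skb *skb)
{
	assert(st != NULL && skb != NULL);

	st->tx_bytes += skb->len;	/* dereference first */
	st->tx_packets++;
	free(skb);			/* free last */
}
```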
* Re: [PATCH v5 0/3] VSOCK: vsockmon virtual device to monitor AF_VSOCK sockets.
From: David Miller @ 2017-04-24 16:36 UTC
To: stefanha; +Cc: netdev, zyjzyj2000, mst, ggarcia, jhansen
In-Reply-To: <20170421091046.5599-1-stefanha@redhat.com>
From: Stefan Hajnoczi <stefanha@redhat.com>
Date: Fri, 21 Apr 2017 10:10:43 +0100
> This is a continuation of Gerard Garcia's work on the vsockmon packet capture
> interface for AF_VSOCK. Packet capture is an essential feature for network
> communication. Gerard began addressing this feature gap in his Google Summer
> of Code 2016 project. I have cleaned up, rebased, and retested the v2 series
> he posted previously.
>
> The design follows the nlmon packet capture interface closely. This is because
> vsock has the same problem as netlink: there is no netdev on which packets can
> be captured. The nlmon driver is a synthetic netdev purely for the purpose of
> enabling packet capture. We follow the same approach here with vsockmon.
>
> See include/uapi/linux/vsockmon.h in this series for details on the packet
> layout.
Series applied, thanks.
* Re: [PATCH v4 net-next] mdio_bus: Issue GPIO RESET to PHYs.
From: Florian Fainelli @ 2017-04-24 16:32 UTC
To: Roger Quadros, Andrew Lunn, Lars-Peter Clausen
Cc: davem, tony, nsekhar, jsarha, netdev, linux-omap, linux-kernel
In-Reply-To: <551ee7a0-d153-3a36-e025-2c1f70f866b7@ti.com>
On 04/24/2017 02:04 AM, Roger Quadros wrote:
> On 24/04/17 02:35, Andrew Lunn wrote:
>> On Fri, Apr 21, 2017 at 03:31:09PM +0200, Lars-Peter Clausen wrote:
>>> On 04/21/2017 03:15 PM, Roger Quadros wrote:
>>>> diff --git a/Documentation/devicetree/bindings/net/mdio.txt b/Documentation/devicetree/bindings/net/mdio.txt
>>>> new file mode 100644
>>>> index 0000000..4ffbbac
>>>> --- /dev/null
>>>> +++ b/Documentation/devicetree/bindings/net/mdio.txt
>>>> @@ -0,0 +1,33 @@
>>>> +Common MDIO bus properties.
>>>> +
>>>> +These are generic properties that can apply to any MDIO bus.
>>>> +
>>>> +Optional properties:
>>>> +- reset-gpios: List of one or more GPIOs that control the RESET lines
>>>> + of the PHYs on that MDIO bus.
>>>> +- reset-delay-us: RESET pulse width in microseconds as per PHY datasheet.
>>>> +
>>>> +A list of child nodes, one per device on the bus is expected. These
>>>> +should follow the generic phy.txt, or a device specific binding document.
>>>> +
>>>> +Example :
>>>> +This example shows these optional properties, plus other properties
>>>> +required for the TI Davinci MDIO driver.
>>>> +
>>>> + davinci_mdio: ethernet@0x5c030000 {
>>>> + compatible = "ti,davinci_mdio";
>>>> + reg = <0x5c030000 0x1000>;
>>>> + #address-cells = <1>;
>>>> + #size-cells = <0>;
>>>> +
>>>> + reset-gpios = <&gpio2 5 GPIO_ACTIVE_LOW>;
>>>> + reset-delay-us = <2>; /* PHY datasheet states 1us min */
>>>
>>> If this is the reset line of the PHY shouldn't it be a property of the PHY
>>> node rather than of the MDIO controller node (which might have a reset on
>>> its own)?
>>>> +
>>>> + ethphy0: ethernet-phy@1 {
>>>> + reg = <1>;
>>>> + };
>>>> +
>>>> + ethphy1: ethernet-phy@3 {
>>>> + reg = <3>;
>>>> + };
>>
>> Hi Lars-Peter
>>
>> We discussed this when the first proposal was made. There are two
>> cases, to consider.
>>
>> 1) Here, one GPIO line resets all PHYs on the same MDIO bus. In this
>> example, two PHYs.
>>
>> 2) There is one GPIO line per PHY. That is a separate case, and as you
>> say, the reset line should probably be considered a PHY property, not
>> an MDIO property. However, it can be messy, since in order to probe
>> the MDIO bus, you probably need to take the PHY out of reset.
>>
>> Anyway, this patch addresses the first case, so should be accepted. If
>> anybody wants to address the second case, they are free to do so.
>
> Thanks for the explanation Andrew.
>
> For the second case, even if the RESET GPIO property is specified
> in the PHY node, the RESET *will* have to be done by the MDIO bus driver
> else the PHY might not be probed at all.
>
> Whether we need additional code just to make the DT look prettier is
> questionable and if required can come as a separate patch.
Well, it's not about prettier vs. uglier, it's about correct vs.
incorrect. The binding document you propose here is correct for a single
reset line controlling all PHYs, and that's why such a reset line needs
to be placed at the MDIO controller level, because it's a property of
such a node.
If you need to support individual reset lines per-PHY, then there should
be some kind of amendment to the Ethernet PHY Device Tree binding
document which specifies optional reset-gpio properties for these nodes.
Until that happens, I think your v4 is good to go.
--
Florian
* Re: pull-request: wireless-drivers-next 2017-04-21
From: David Miller @ 2017-04-24 16:25 UTC
To: kvalo; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <87h91i3z7p.fsf@kamboji.qca.qualcomm.com>
From: Kalle Valo <kvalo@codeaurora.org>
Date: Fri, 21 Apr 2017 12:08:10 +0300
> here's most likely the last pull request to net-next for 4.12, unless
> Linus delays the start of the merge window. More info in the signed tag
> below and please let me know if there are any problems.
Pulled, thanks Kalle.
^ permalink raw reply
* Re: net/ipv6: slab-out-of-bounds in ip6_tnl_xmit
From: Andrey Konovalov @ 2017-04-24 16:24 UTC (permalink / raw)
To: David S. Miller, Alexey Kuznetsov, James Morris,
Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML
In-Reply-To: <CAAeHK+w-EYicrLHhBR0LHMMTc22GjsB=PTmgswm44w4VpfQ_hA@mail.gmail.com>
On Mon, Apr 24, 2017 at 5:03 PM, Andrey Konovalov <andreyknvl@google.com> wrote:
> Hi,
>
> I've got the following error report while fuzzing the kernel with syzkaller.
>
> On commit 5a7ad1146caa895ad718a534399e38bd2ba721b7 (4.11-rc8).
>
> Unfortunately it's not reproducible.
>
> The issue might be similar to this one:
> https://groups.google.com/forum/#!topic/syzkaller/IDoQHFmrnRI
>
> ==================================================================
> BUG: KASAN: slab-out-of-bounds in ip6_tnl_xmit+0x25dd/0x28f0
> net/ipv6/ip6_tunnel.c:1078 at addr ffff88005dcc5f98
> Read of size 16 by task syz-executor7/8076
> CPU: 3 PID: 8076 Comm: syz-executor7 Not tainted 4.11.0-rc8+ #266
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
> __dump_stack lib/dump_stack.c:16 [inline]
> dump_stack+0x192/0x22d lib/dump_stack.c:52
> kasan_object_err+0x1c/0x70 mm/kasan/report.c:164
> print_address_description mm/kasan/report.c:202 [inline]
> kasan_report_error mm/kasan/report.c:291 [inline]
> kasan_report+0x252/0x510 mm/kasan/report.c:347
> __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:378
> ip6_tnl_xmit+0x25dd/0x28f0 net/ipv6/ip6_tunnel.c:1078
> ip4ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1268 [inline]
> ip6_tnl_start_xmit+0xc1e/0x1890 net/ipv6/ip6_tunnel.c:1370
> __netdev_start_xmit include/linux/netdevice.h:3980 [inline]
> netdev_start_xmit include/linux/netdevice.h:3989 [inline]
> xmit_one net/core/dev.c:2908 [inline]
> dev_hard_start_xmit+0x213/0x800 net/core/dev.c:2924
> __dev_queue_xmit+0x1abc/0x2580 net/core/dev.c:3391
> dev_queue_xmit+0x17/0x20 net/core/dev.c:3424
> neigh_direct_output+0x15/0x20 net/core/neighbour.c:1349
> neigh_output include/net/neighbour.h:478 [inline]
> ip_finish_output2+0x7cd/0x1020 net/ipv4/ip_output.c:228
> ip_finish_output+0x83d/0xc30 net/ipv4/ip_output.c:316
> NF_HOOK_COND include/linux/netfilter.h:246 [inline]
> ip_output+0x1e7/0x5d0 net/ipv4/ip_output.c:404
> dst_output include/net/dst.h:486 [inline]
> ip_local_out+0x82/0xb0 net/ipv4/ip_output.c:124
> ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
> ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
> ping_v4_push_pending_frames net/ipv4/ping.c:653 [inline]
> ping_v4_sendmsg+0x1b35/0x23e0 net/ipv4/ping.c:840
> inet_sendmsg+0x164/0x490 net/ipv4/af_inet.c:762
> sock_sendmsg_nosec net/socket.c:633 [inline]
> sock_sendmsg+0xca/0x110 net/socket.c:643
> SYSC_sendto+0x660/0x810 net/socket.c:1696
> SyS_sendto+0x40/0x50 net/socket.c:1664
> entry_SYSCALL_64_fastpath+0x1a/0xa9
> RIP: 0033:0x4458d9
> RSP: 002b:00007f853159db58 EFLAGS: 00000282 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 0000000000708000 RCX: 00000000004458d9
> RDX: 0000000000000008 RSI: 00000000204f9fe1 RDI: 0000000000000017
> RBP: 0000000000003410 R08: 0000000020235000 R09: 0000000000000010
> R10: 0000000000000000 R11: 0000000000000282 R12: 00000000006e24d0
> R13: 0000000020ef8000 R14: 0000000000001000 R15: 0000000000000003
> Object at ffff88005dcc5e20, in cache kmalloc-512 size: 512
> Allocated:
> PID = 8076
> save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
> save_stack+0x43/0xd0 mm/kasan/kasan.c:513
> set_track mm/kasan/kasan.c:525 [inline]
> kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:616
> __kmalloc+0x7c/0x1c0 mm/slub.c:3745
> kmalloc include/linux/slab.h:495 [inline]
> kzalloc include/linux/slab.h:663 [inline]
> neigh_alloc net/core/neighbour.c:286 [inline]
> __neigh_create+0x386/0x1da0 net/core/neighbour.c:458
> neigh_create include/net/neighbour.h:313 [inline]
> ipv4_neigh_lookup+0x4bb/0x730 net/ipv4/route.c:463
> dst_neigh_lookup include/net/dst.h:447 [inline]
> ip6_tnl_xmit+0x1598/0x28f0 net/ipv6/ip6_tunnel.c:1067
> ip4ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1268 [inline]
> ip6_tnl_start_xmit+0xc1e/0x1890 net/ipv6/ip6_tunnel.c:1370
> __netdev_start_xmit include/linux/netdevice.h:3980 [inline]
> netdev_start_xmit include/linux/netdevice.h:3989 [inline]
> xmit_one net/core/dev.c:2908 [inline]
> dev_hard_start_xmit+0x213/0x800 net/core/dev.c:2924
> __dev_queue_xmit+0x1abc/0x2580 net/core/dev.c:3391
> dev_queue_xmit+0x17/0x20 net/core/dev.c:3424
> neigh_direct_output+0x15/0x20 net/core/neighbour.c:1349
> neigh_output include/net/neighbour.h:478 [inline]
> ip_finish_output2+0x7cd/0x1020 net/ipv4/ip_output.c:228
> ip_finish_output+0x83d/0xc30 net/ipv4/ip_output.c:316
> NF_HOOK_COND include/linux/netfilter.h:246 [inline]
> ip_output+0x1e7/0x5d0 net/ipv4/ip_output.c:404
> dst_output include/net/dst.h:486 [inline]
> ip_local_out+0x82/0xb0 net/ipv4/ip_output.c:124
> ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
> ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
> ping_v4_push_pending_frames net/ipv4/ping.c:653 [inline]
> ping_v4_sendmsg+0x1b35/0x23e0 net/ipv4/ping.c:840
> inet_sendmsg+0x164/0x490 net/ipv4/af_inet.c:762
> sock_sendmsg_nosec net/socket.c:633 [inline]
> sock_sendmsg+0xca/0x110 net/socket.c:643
> SYSC_sendto+0x660/0x810 net/socket.c:1696
> SyS_sendto+0x40/0x50 net/socket.c:1664
> entry_SYSCALL_64_fastpath+0x1a/0xa9
> Freed:
> PID = 7604
> save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
> save_stack+0x43/0xd0 mm/kasan/kasan.c:513
> set_track mm/kasan/kasan.c:525 [inline]
> kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:589
> slab_free_hook mm/slub.c:1357 [inline]
> slab_free_freelist_hook mm/slub.c:1379 [inline]
> slab_free mm/slub.c:2961 [inline]
> kfree+0x91/0x190 mm/slub.c:3882
> skb_free_head+0x74/0xb0 net/core/skbuff.c:579
> skb_release_data+0x37c/0x440 net/core/skbuff.c:610
> skb_release_all+0x4a/0x60 net/core/skbuff.c:669
> __kfree_skb net/core/skbuff.c:683 [inline]
> consume_skb+0x130/0x2f0 net/core/skbuff.c:756
> netlink_broadcast_filtered+0x5fa/0x1420 net/netlink/af_netlink.c:1473
> netlink_broadcast net/netlink/af_netlink.c:1495 [inline]
> nlmsg_multicast include/net/netlink.h:577 [inline]
> nlmsg_notify+0x9c/0x140 net/netlink/af_netlink.c:2382
> rtnl_notify+0xbb/0xe0 net/core/rtnetlink.c:674
> rtmsg_fib+0x3a7/0x4b0 net/ipv4/fib_semantics.c:422
> fib_table_delete+0x836/0x1140 net/ipv4/fib_trie.c:1659
> fib_magic.isra.14+0x4b3/0x890 net/ipv4/fib_frontend.c:840
> fib_del_ifaddr+0xb20/0xe10 net/ipv4/fib_frontend.c:1013
> fib_inetaddr_event+0xaf/0x200 net/ipv4/fib_frontend.c:1150
> notifier_call_chain+0x145/0x2f0 kernel/notifier.c:93
> __blocking_notifier_call_chain kernel/notifier.c:317 [inline]
> blocking_notifier_call_chain+0x109/0x1a0 kernel/notifier.c:328
> __inet_del_ifa+0x4b5/0xb00 net/ipv4/devinet.c:402
> inet_del_ifa net/ipv4/devinet.c:432 [inline]
> devinet_ioctl+0xa75/0x1a10 net/ipv4/devinet.c:1073
> inet_ioctl+0x117/0x1c0 net/ipv4/af_inet.c:900
> sock_do_ioctl+0x65/0xb0 net/socket.c:906
> sock_ioctl+0x27a/0x410 net/socket.c:1004
> vfs_ioctl fs/ioctl.c:45 [inline]
> do_vfs_ioctl+0x1cd/0x15a0 fs/ioctl.c:685
> SYSC_ioctl fs/ioctl.c:700 [inline]
> SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
> entry_SYSCALL_64_fastpath+0x1a/0xa9
> Memory state around the buggy address:
> ffff88005dcc5e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ffff88005dcc5f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>ffff88005dcc5f80: 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc
> ^
> ffff88005dcc6000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ffff88005dcc6080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ==================================================================
Another report which is probably caused by the same issue:
==================================================================
BUG: KASAN: slab-out-of-bounds in __ipv6_addr_type+0x273/0x280
net/ipv6/addrconf_core.c:68 at addr ffff88006a8ed598
Read of size 4 by task syz-executor2/10023
CPU: 3 PID: 10023 Comm: syz-executor2 Not tainted 4.11.0-rc8+ #266
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x192/0x22d lib/dump_stack.c:52
kasan_object_err+0x1c/0x70 mm/kasan/report.c:164
print_address_description mm/kasan/report.c:202 [inline]
kasan_report_error mm/kasan/report.c:291 [inline]
kasan_report+0x252/0x510 mm/kasan/report.c:347
__asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:367
__ipv6_addr_type+0x273/0x280 net/ipv6/addrconf_core.c:68
ipv6_addr_type include/net/ipv6.h:353 [inline]
ip6_tnl_xmit+0x15d2/0x28f0 net/ipv6/ip6_tunnel.c:1073
ip4ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1268 [inline]
ip6_tnl_start_xmit+0xc1e/0x1890 net/ipv6/ip6_tunnel.c:1370
__netdev_start_xmit include/linux/netdevice.h:3980 [inline]
netdev_start_xmit include/linux/netdevice.h:3989 [inline]
xmit_one net/core/dev.c:2908 [inline]
dev_hard_start_xmit+0x213/0x800 net/core/dev.c:2924
__dev_queue_xmit+0x1abc/0x2580 net/core/dev.c:3391
dev_queue_xmit+0x17/0x20 net/core/dev.c:3424
neigh_direct_output+0x15/0x20 net/core/neighbour.c:1349
neigh_output include/net/neighbour.h:478 [inline]
ip_finish_output2+0x7cd/0x1020 net/ipv4/ip_output.c:228
ip_finish_output+0x83d/0xc30 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:246 [inline]
ip_output+0x1e7/0x5d0 net/ipv4/ip_output.c:404
dst_output include/net/dst.h:486 [inline]
ip_local_out+0x82/0xb0 net/ipv4/ip_output.c:124
ip_queue_xmit+0x927/0x1730 net/ipv4/ip_output.c:503
sctp_v4_xmit+0x10d/0x140 net/sctp/protocol.c:994
sctp_packet_transmit+0x22ea/0x3030 net/sctp/output.c:637
sctp_outq_flush+0xad8/0x3f90 net/sctp/outqueue.c:885
sctp_outq_uncork+0x5a/0x70 net/sctp/outqueue.c:750
sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1773 [inline]
sctp_side_effects net/sctp/sm_sideeffect.c:1175 [inline]
sctp_do_sm+0x5a0/0x6a50 net/sctp/sm_sideeffect.c:1147
sctp_primitive_ASSOCIATE+0x9d/0xd0 net/sctp/primitive.c:88
sctp_sendmsg+0x2707/0x3b50 net/sctp/socket.c:1954
inet_sendmsg+0x164/0x490 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
___sys_sendmsg+0x4fa/0xb70 net/socket.c:1997
__sys_sendmmsg+0x25b/0x730 net/socket.c:2087
SYSC_sendmmsg net/socket.c:2118 [inline]
SyS_sendmmsg+0x35/0x60 net/socket.c:2113
entry_SYSCALL_64_fastpath+0x1a/0xa9
RIP: 0033:0x4458d9
RSP: 002b:00007f0939404b58 EFLAGS: 00000292 ORIG_RAX: 0000000000000133
RAX: ffffffffffffffda RBX: 00007f0939405700 RCX: 00000000004458d9
RDX: 0000000000000001 RSI: 00000000204bcfc8 RDI: 0000000000000016
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000008001 R11: 0000000000000292 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f09394059c0 R15: 00007f0939405700
Object at ffff88006a8ed418, in cache kmalloc-512 size: 512
Allocated:
PID = 10023
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:513
set_track mm/kasan/kasan.c:525 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:616
__kmalloc+0x7c/0x1c0 mm/slub.c:3745
kmalloc include/linux/slab.h:495 [inline]
kzalloc include/linux/slab.h:663 [inline]
neigh_alloc net/core/neighbour.c:286 [inline]
__neigh_create+0x386/0x1da0 net/core/neighbour.c:458
neigh_create include/net/neighbour.h:313 [inline]
ipv4_neigh_lookup+0x4bb/0x730 net/ipv4/route.c:463
dst_neigh_lookup include/net/dst.h:447 [inline]
ip6_tnl_xmit+0x1598/0x28f0 net/ipv6/ip6_tunnel.c:1067
ip4ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1268 [inline]
ip6_tnl_start_xmit+0xc1e/0x1890 net/ipv6/ip6_tunnel.c:1370
__netdev_start_xmit include/linux/netdevice.h:3980 [inline]
netdev_start_xmit include/linux/netdevice.h:3989 [inline]
xmit_one net/core/dev.c:2908 [inline]
dev_hard_start_xmit+0x213/0x800 net/core/dev.c:2924
__dev_queue_xmit+0x1abc/0x2580 net/core/dev.c:3391
dev_queue_xmit+0x17/0x20 net/core/dev.c:3424
neigh_direct_output+0x15/0x20 net/core/neighbour.c:1349
neigh_output include/net/neighbour.h:478 [inline]
ip_finish_output2+0x7cd/0x1020 net/ipv4/ip_output.c:228
ip_finish_output+0x83d/0xc30 net/ipv4/ip_output.c:316
NF_HOOK_COND include/linux/netfilter.h:246 [inline]
ip_output+0x1e7/0x5d0 net/ipv4/ip_output.c:404
dst_output include/net/dst.h:486 [inline]
ip_local_out+0x82/0xb0 net/ipv4/ip_output.c:124
ip_queue_xmit+0x927/0x1730 net/ipv4/ip_output.c:503
sctp_v4_xmit+0x10d/0x140 net/sctp/protocol.c:994
sctp_packet_transmit+0x22ea/0x3030 net/sctp/output.c:637
sctp_outq_flush+0xad8/0x3f90 net/sctp/outqueue.c:885
sctp_outq_uncork+0x5a/0x70 net/sctp/outqueue.c:750
sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1773 [inline]
sctp_side_effects net/sctp/sm_sideeffect.c:1175 [inline]
sctp_do_sm+0x5a0/0x6a50 net/sctp/sm_sideeffect.c:1147
sctp_primitive_ASSOCIATE+0x9d/0xd0 net/sctp/primitive.c:88
sctp_sendmsg+0x2707/0x3b50 net/sctp/socket.c:1954
inet_sendmsg+0x164/0x490 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
___sys_sendmsg+0x4fa/0xb70 net/socket.c:1997
__sys_sendmmsg+0x25b/0x730 net/socket.c:2087
SYSC_sendmmsg net/socket.c:2118 [inline]
SyS_sendmmsg+0x35/0x60 net/socket.c:2113
entry_SYSCALL_64_fastpath+0x1a/0xa9
Freed:
PID = 9423
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:513
set_track mm/kasan/kasan.c:525 [inline]
kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:589
slab_free_hook mm/slub.c:1357 [inline]
slab_free_freelist_hook mm/slub.c:1379 [inline]
slab_free mm/slub.c:2961 [inline]
kfree+0x91/0x190 mm/slub.c:3882
skb_free_head+0x74/0xb0 net/core/skbuff.c:579
skb_release_data+0x37c/0x440 net/core/skbuff.c:610
skb_release_all+0x4a/0x60 net/core/skbuff.c:669
__kfree_skb net/core/skbuff.c:683 [inline]
consume_skb+0x130/0x2f0 net/core/skbuff.c:756
sctp_chunk_destroy net/sctp/sm_make_chunk.c:1442 [inline]
sctp_chunk_put+0x2b6/0x430 net/sctp/sm_make_chunk.c:1469
sctp_chunk_free+0x53/0x60 net/sctp/sm_make_chunk.c:1456
__sctp_outq_teardown+0xb03/0x15b0 net/sctp/outqueue.c:264
sctp_outq_free+0x15/0x20 net/sctp/outqueue.c:284
sctp_association_free+0x2cf/0x970 net/sctp/associola.c:357
sctp_cmd_delete_tcb net/sctp/sm_sideeffect.c:895 [inline]
sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1310 [inline]
sctp_side_effects net/sctp/sm_sideeffect.c:1175 [inline]
sctp_do_sm+0x2732/0x6a50 net/sctp/sm_sideeffect.c:1147
sctp_assoc_bh_rcv+0x27f/0x4b0 net/sctp/associola.c:1066
sctp_inq_push+0x22b/0x2f0 net/sctp/inqueue.c:95
sctp_backlog_rcv+0x185/0xc10 net/sctp/input.c:350
sk_backlog_rcv include/net/sock.h:898 [inline]
__release_sock+0x189/0x300 net/core/sock.c:2069
release_sock+0xa5/0x2a0 net/core/sock.c:2564
sctp_wait_for_connect+0x363/0x630 net/sctp/socket.c:7717
sctp_sendmsg+0x32c7/0x3b50 net/sctp/socket.c:1999
inet_sendmsg+0x164/0x490 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
SYSC_sendto+0x660/0x810 net/socket.c:1696
SyS_sendto+0x40/0x50 net/socket.c:1664
entry_SYSCALL_64_fastpath+0x1a/0xa9
Memory state around the buggy address:
ffff88006a8ed480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88006a8ed500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88006a8ed580: 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff88006a8ed600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88006a8ed680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
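For anyone decoding the two "Memory state" dumps in this message, here is a
small sketch of the arithmetic. It assumes the generic KASAN layout, in which
each shadow byte describes 8 bytes of memory, 00 marks a fully accessible
granule and fc marks a kmalloc redzone. The helper names are made up for
illustration and are not kernel APIs.

```c
#include <stdint.h>
#include <stddef.h>

/* Assumption: generic KASAN, one shadow byte per 8 bytes of memory. */
#define SHADOW_GRANULE 8u

/*
 * Given the address printed at the start of a "Memory state" row and the
 * number of leading 00 shadow bytes in that row, return the first address
 * that is no longer accessible.
 */
static uint64_t first_poisoned_addr(uint64_t row_addr, unsigned int zero_bytes)
{
	return row_addr + (uint64_t)zero_bytes * SHADOW_GRANULE;
}

/* Number of bytes of [addr, addr + size) that lie at or beyond limit. */
static uint64_t bytes_past_limit(uint64_t addr, uint64_t size, uint64_t limit)
{
	uint64_t end = addr + size;

	if (end <= limit)
		return 0;
	return addr >= limit ? size : end - limit;
}
```

Applied to the first report, the row at ffff88005dcc5f80 starts with four 00
bytes, so the accessible area ends at ffff88005dcc5fa0, which is 0x180 (384)
bytes past the object start at ffff88005dcc5e20; the 16-byte read at
ffff88005dcc5f98 therefore runs 8 bytes into the redzone. In the second
report the limit is ffff88006a8ed598, again 0x180 bytes past the object
start, and the 4-byte read begins exactly there.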
^ permalink raw reply
* Re: [PATCH net v2 0/3] net: hns: bug fix for HNS driver
From: David Miller @ 2017-04-24 16:24 UTC (permalink / raw)
To: yankejian
Cc: salil.mehta, yisen.zhuang, lipeng321, huangdaode, zhouhuiru,
netdev, charles.chenxin, linuxarm
In-Reply-To: <1492760684-117205-1-git-send-email-yankejian@huawei.com>
From: Yankejian <yankejian@huawei.com>
Date: Fri, 21 Apr 2017 15:44:41 +0800
> From: lipeng <lipeng321@huawei.com>
>
> This series adds support for deferred probe when the mdio or mbigen
> module is loaded after the HNS driver, and fixes a bug where an skb that
> has been freed may still be used in the driver.
>
> change log:
> V1 -> V2:
> 1. Return appropriate errno in hns_mac_register_phy;
At a minimum you're going to have to expand your commit message in
patch #1 so that it has more detail and explains the situation better.
^ permalink raw reply
* Re: [PATCH net-next 0/5] qed*: Dcbx/dcbnl enhancements.
From: David Miller @ 2017-04-24 16:20 UTC (permalink / raw)
To: sudarsana.kalluru; +Cc: netdev, Yuval.Mintz
In-Reply-To: <20170421053120.12980-1-sudarsana.kalluru@cavium.com>
From: Sudarsana Reddy Kalluru <sudarsana.kalluru@cavium.com>
Date: Thu, 20 Apr 2017 22:31:15 -0700
> From: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
>
> The series has set of enhancements for dcbx/dcbnl implementation of
> qed/qede drivers.
> - Patches (1) & (3) capture the semantic and debug changes.
> - Patch (2) adds the driver support for populating RoCEv2 dcb data.
> - Patch (4) adds the required support for reading/configuring the
> IEEE selection field (SF).
> - Patch (5) adds the support for configuring the static dcbx mode.
>
> Please consider applying this to 'net-next' branch.
Series applied, thanks.
^ permalink raw reply
* Re: [PATCH net v3] net/mlx5e: Fix race in mlx5e_sw_stats and mlx5e_vport_stats
From: David Miller @ 2017-04-24 16:17 UTC (permalink / raw)
To: kafai; +Cc: netdev, saeedm, eric.dumazet, kernel-team
In-Reply-To: <20170421044012.1955130-1-kafai@fb.com>
From: Martin KaFai Lau <kafai@fb.com>
Date: Thu, 20 Apr 2017 21:40:12 -0700
> We have observed a sudden spike in rx/tx_packets and rx/tx_bytes
> reported under /proc/net/dev. There is a race in mlx5e_update_stats()
> and some of the get-stats functions (the one that we hit is the
> mlx5e_get_stats() which is called by ndo_get_stats64()).
>
> In particular, the very first thing mlx5e_update_sw_counters()
> does is 'memset(s, 0, sizeof(*s))'. For example, if mlx5e_get_stats()
> is unlucky at one point, rx_bytes and rx_packets could be 0. One second
> later, a normal (and much bigger than 0) value will be reported.
>
> This patch is to use a 'struct mlx5e_sw_stats temp' to avoid
> a direct memset zero on priv->stats.sw.
>
> mlx5e_update_vport_counters() has a similar race. Hence, addressed
> together. However, memset zero is removed instead because
> it is not needed.
>
> I am lucky enough to catch this 0-reset in rx multicast:
> eth0: 41457665 76804 70 0 0 70 0 47085 15586634 87502 3 0 0 0 3 0
> eth0: 41459860 76815 70 0 0 70 0 47094 15588376 87516 3 0 0 0 3 0
> eth0: 41460577 76822 70 0 0 70 0 0 15589083 87521 3 0 0 0 3 0
> eth0: 41463293 76838 70 0 0 70 0 47108 15595872 87538 3 0 0 0 3 0
> eth0: 41463379 76839 70 0 0 70 0 47116 15596138 87539 3 0 0 0 3 0
>
> v2: Remove memset zero from mlx5e_update_vport_counters()
> v1: Use temp and memcpy
>
> Fixes: 9218b44dcc05 ("net/mlx5e: Statistics handling refactoring")
> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
> Suggested-by: Saeed Mahameed <saeedm@mellanox.com>
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Applied, thanks Martin.
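The temp-and-copy idea from the quoted commit message can be sketched in
plain C roughly as follows. This is a simplified illustration, not the
driver code: struct sw_stats, struct ring_stats and update_sw_stats() are
hypothetical stand-ins, and only two counters are modelled.

```c
#include <string.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct sw_stats {
	uint64_t rx_packets;
	uint64_t rx_bytes;
};

struct ring_stats {
	uint64_t packets;
	uint64_t bytes;
};

/*
 * Accumulate into a local copy and publish it in one step, so a reader of
 * *shared never observes the all-zero intermediate state that an up-front
 * memset(shared, 0, ...) would expose.
 */
static void update_sw_stats(struct sw_stats *shared,
			    const struct ring_stats *rings, int nrings)
{
	struct sw_stats temp;
	int i;

	memset(&temp, 0, sizeof(temp));
	for (i = 0; i < nrings; i++) {
		temp.rx_packets += rings[i].packets;
		temp.rx_bytes += rings[i].bytes;
	}
	memcpy(shared, &temp, sizeof(temp));
}
```

Note that memcpy() is not atomic, so a concurrent reader may presumably
still see a mix of old and new counters; what the pattern removes is the
window in which the counters read as 0.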
^ permalink raw reply
* Re: [PATCH] qed: fix kzalloc-simple.cocci warnings
From: David Miller @ 2017-04-24 16:03 UTC (permalink / raw)
To: fengguang.wu
Cc: sudarsana.kalluru, kbuild-all, netdev, Yuval.Mintz, Ariel.Elior,
everest-linux-l2, linux-kernel
In-Reply-To: <20170421002007.GA8379@lkp-sbx04>
From: kbuild test robot <fengguang.wu@intel.com>
Date: Fri, 21 Apr 2017 08:20:07 +0800
> drivers/net/ethernet/qlogic/qed/qed_dcbx.c:1267:13-20: WARNING: kzalloc should be used for dcbx_info, instead of kmalloc/memset
>
>
> Use kzalloc rather than kmalloc followed by memset with 0
>
> This considers some simple cases that are common and easy to validate
> Note in particular that there are no ...s in the rule, so all of the
> matched code has to be contiguous
>
> Generated by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci
>
> CC: sudarsana.kalluru@cavium.com <sudarsana.kalluru@cavium.com>
> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
I intentionally let this change happen.
It was less risky than asking the submitter who introduced it to
make another respin to keep the kzalloc().
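For readers unfamiliar with the pattern the cocci script flags, here is a
userspace analogy, with calloc() standing in for kzalloc() and malloc() plus
memset() standing in for kmalloc() plus memset(). The struct and function
names are invented for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure, used only for illustration. */
struct info {
	int enabled;
	char name[32];
};

/* Two-step form: allocate, then zero by hand. */
static struct info *info_alloc_two_step(void)
{
	struct info *p = malloc(sizeof(*p));

	if (p)
		memset(p, 0, sizeof(*p));
	return p;
}

/* One-step form: the allocator returns zeroed memory. */
static struct info *info_alloc_zeroed(void)
{
	return calloc(1, sizeof(struct info));
}
```

Both helpers return the same zero-filled object; the one-step form is
shorter and leaves no gap between allocation and initialization.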
^ permalink raw reply
* Re: [PATCH v3 07/29] x86: bpf_jit, use ENTRY+ENDPROC
From: Jiri Slaby @ 2017-04-24 16:02 UTC (permalink / raw)
To: Ingo Molnar
Cc: David Miller, alexei.starovoitov, mingo, tglx, hpa, x86, jpoimboe,
linux-kernel, netdev, daniel, edumazet
In-Reply-To: <20170424155507.miyqef7ld4hbmsej@gmail.com>
On 04/24/2017, 05:55 PM, Ingo Molnar wrote:
> * Jiri Slaby <jslaby@suse.cz> wrote:
>
>> On 04/24/2017, 05:08 PM, David Miller wrote:
>>> If you align the entry points, then the code sequence as a whole is
>>> no longer densely packed.
>>
>> Sure.
>>
>>> Or do I misunderstand how your macros work?
>>
>> Perhaps. So the suggested macros for the code are:
>> #define BPF_FUNC_START_LOCAL(name) \
>> SYM_START(name, SYM_V_LOCAL, SYM_A_NONE)
>> #define BPF_FUNC_START(name) \
>> SYM_START(name, SYM_V_GLOBAL, SYM_A_NONE)
>>
>> and they differ from the standard ones:
>> #define SYM_FUNC_START_LOCAL(name) \
>> SYM_START(name, SYM_V_LOCAL, SYM_A_ALIGN)
>> #define SYM_FUNC_START(name) \
>> SYM_START(name, SYM_V_GLOBAL, SYM_A_ALIGN)
>>
>>
>> The difference is SYM_A_NONE vs. SYM_A_ALIGN, which means:
>> #define SYM_A_ALIGN ALIGN
>> #define SYM_A_NONE /* nothing */
>>
>> Does it look OK now?
>
> No, the patch changes alignment which is undesirable, it needs to preserve the
> existing (non-)alignment of the symbols!
OK, so I am not expressing myself explicitly enough, it seems.
So, correct, the patch v3 adds alignments. I suggested in the discussion
the macros above. They do not add alignments. If everybody is OK with
that, v4 of the patch won't add alignments. OK?
thanks,
--
js
suse labs
^ permalink raw reply
* Re: [PATCH v2] net: core: Prevent from dereferencing null pointer when
From: David Miller @ 2017-04-24 16:02 UTC (permalink / raw)
To: mhjungk; +Cc: edumazet, netdev
In-Reply-To: <1492732760-25081-1-git-send-email-mhjungk@gmail.com>
From: Myungho Jung <mhjungk@gmail.com>
Date: Thu, 20 Apr 2017 16:59:20 -0700
> Added NULL check to make __dev_kfree_skb_irq consistent with kfree
> family of functions.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=195289
>
> Signed-off-by: Myungho Jung <mhjungk@gmail.com>
> ---
> Changes in v2:
> - Correct category in subject
This subject line is an incomplete sentence.
This patch prevents dereferencing a null pointer when "what"?
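As background for the quoted description, the kfree()-style convention is
that a release function accepts NULL and does nothing. A minimal userspace
sketch of that convention follows; struct buf and buf_put() are hypothetical
and do not reflect the actual patch.

```c
#include <stdlib.h>

/* Hypothetical reference-counted buffer, for illustration only. */
struct buf {
	unsigned char *data;
	int refcnt;
};

/*
 * Drop one reference. Returns 1 if the buffer was freed, 0 otherwise
 * (NULL argument or references still outstanding).
 */
static int buf_put(struct buf *b)
{
	if (!b)		/* tolerate NULL, as free() and kfree() do */
		return 0;
	if (--b->refcnt > 0)
		return 0;
	free(b->data);
	free(b);
	return 1;
}
```

With the NULL check at the top, callers on error paths can call buf_put()
unconditionally instead of guarding every call site.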
^ permalink raw reply
* [PATCH RFC v2] ptr_ring: add ptr_ring_unconsume
From: Michael S. Tsirkin @ 2017-04-24 16:01 UTC (permalink / raw)
To: linux-kernel; +Cc: netdev, Jason Wang
Applications that consume a batch of entries in one go
can benefit from the ability to return some of them
to the ring.
Add an API for that, assuming there's space. If there's no space, we
naturally can't do this and have to drop entries, but this implies the
ring is full, so we'd likely drop some anyway.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
Jason, if you add this and unconsume the outstanding packets
on backend disconnect, vhost close and reset, I think
we should apply your patch even if we don't yet know 100%
why it helps.
changes from v1:
- fix up coding style issues reported by Sergei Shtylyov
include/linux/ptr_ring.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 56 insertions(+)
diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 783e7f5..902afc2 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -457,6 +457,62 @@ static inline int ptr_ring_init(struct ptr_ring *r, int size, gfp_t gfp)
 	return 0;
 }
+/*
+ * Return entries into ring. Destroy entries that don't fit.
+ *
+ * Note: this is expected to be a rare slow path operation.
+ *
+ * Note: producer lock is nested within consumer lock, so if you
+ * resize you must make sure all uses nest correctly.
+ * In particular if you consume ring in interrupt or BH context, you must
+ * disable interrupts/BH when doing so.
+ */
+static inline void ptr_ring_unconsume(struct ptr_ring *r, void **batch, int n,
+				      void (*destroy)(void *))
+{
+	unsigned long flags;
+	int head;
+
+	spin_lock_irqsave(&r->consumer_lock, flags);
+	spin_lock(&r->producer_lock);
+
+	if (!r->size)
+		goto done;
+
+	/*
+	 * Clean out buffered entries (for simplicity). This way following code
+	 * can test entries for NULL and if not assume they are valid.
+	 */
+	head = r->consumer_head - 1;
+	while (likely(head >= r->consumer_tail))
+		r->queue[head--] = NULL;
+	r->consumer_tail = r->consumer_head;
+
+	/*
+	 * Go over entries in batch, start moving head back and copy entries.
+	 * Stop when we run into previously unconsumed entries.
+	 */
+	while (n--) {
+		head = r->consumer_head - 1;
+		if (head < 0)
+			head = r->size - 1;
+		if (r->queue[head]) {
+			/* This batch entry will have to be destroyed. */
+			++n;
+			goto done;
+		}
+		r->queue[head] = batch[n];
+		r->consumer_tail = r->consumer_head = head;
+	}
+
+done:
+	/* Destroy all entries left in the batch. */
+	while (n--)
+		destroy(batch[n]);
+	spin_unlock(&r->producer_lock);
+	spin_unlock_irqrestore(&r->consumer_lock, flags);
+}
+
 static inline void **__ptr_ring_swap_queue(struct ptr_ring *r, void **queue,
 					   int size, gfp_t gfp,
 					   void (*destroy)(void *))
--
MST
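To make the intended semantics easier to follow, here is a lock-free,
single-threaded userspace model of the operation. It is only a sketch: the
struct ring type, RING_SIZE and all function names are invented, the
batched-zeroing consumer_head/consumer_tail split of the real ptr_ring is
collapsed into a single consumer index, and locking is left out entirely.

```c
#include <stddef.h>

#define RING_SIZE 4

/* Simplified model: a NULL slot is empty, as in ptr_ring. */
struct ring {
	void *queue[RING_SIZE];
	int producer;	/* next slot to fill */
	int consumer;	/* next slot to drain */
};

static int ring_produce(struct ring *r, void *ptr)
{
	if (r->queue[r->producer])
		return -1;	/* ring is full */
	r->queue[r->producer] = ptr;
	r->producer = (r->producer + 1) % RING_SIZE;
	return 0;
}

static void *ring_consume(struct ring *r)
{
	void *ptr = r->queue[r->consumer];

	if (ptr) {
		r->queue[r->consumer] = NULL;
		r->consumer = (r->consumer + 1) % RING_SIZE;
	}
	return ptr;
}

/*
 * Put batch[0..n-1] (oldest first) back in front of the consumer,
 * newest entry first. Entries that do not fit are handed to destroy().
 * Returns the number of destroyed entries.
 */
static int ring_unconsume(struct ring *r, void **batch, int n,
			  void (*destroy)(void *))
{
	int dropped;

	while (n > 0) {
		int head = (r->consumer + RING_SIZE - 1) % RING_SIZE;

		if (r->queue[head])
			break;	/* hit an unconsumed entry: no space left */
		r->queue[head] = batch[--n];
		r->consumer = head;
	}
	dropped = n;
	while (n > 0)
		destroy(batch[--n]);
	return dropped;
}

/* Example destroy callback that only counts its invocations. */
static int destroyed;

static void count_destroy(void *ptr)
{
	(void)ptr;
	destroyed++;
}
```

As in the patch, the most recently consumed entry goes back first, so when
the ring fills up it is the oldest entries of the batch that get destroyed.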
^ permalink raw reply related
* Re: [PATCH net] bridge: shutdown bridge device before removing it
From: Nikolay Aleksandrov @ 2017-04-24 15:55 UTC (permalink / raw)
To: Xin Long
Cc: network dev, bridge@lists.linux-foundation.org, David S. Miller,
Herbert Xu
In-Reply-To: <CADvbK_fJAqpcspMO7SPhRcbia9KLBA=Q5E2ySTsV0A6+EfDzFg@mail.gmail.com>
On 24/04/17 18:21, Xin Long wrote:
> On Mon, Apr 24, 2017 at 10:53 PM, Nikolay Aleksandrov
> <nikolay@cumulusnetworks.com> wrote:
>> On 24/04/17 17:41, Xin Long wrote:
>>> On Mon, Apr 24, 2017 at 8:07 PM, Nikolay Aleksandrov
>>> <nikolay@cumulusnetworks.com> wrote:
>>>> On 24/04/17 14:01, Nikolay Aleksandrov wrote:
>>>>> On 24/04/17 10:25, Xin Long wrote:
>>>>>> While removing a bridge device, if the bridge is still up, a new mdb entry
>>>>>> still can be added in br_multicast_add_group() after all mdb entries are
>>>>>> removed in br_multicast_dev_del(). Like the path:
>>>>>>
>>>>>> mld_ifc_timer_expire ->
>>>>>> mld_sendpack -> ...
>>>>>> br_multicast_rcv ->
>>>>>> br_multicast_add_group
>>>>>>
>>>>>> The new mp's timer will be set up. If the timer expires after the bridge
>>>>>> is freed, it may cause use-after-free panic in br_multicast_group_expired.
>>>>>> This can happen when 'ip link' removes a bridge or when a netns with a
>>>>>> bridge device inside is destroyed.
>>>>>>
>>>>>> As we can see in br_del_bridge, brctl is also supposed to remove a bridge
>>>>>> device only after it has been shut down.
>>>>>>
>>>>>> This patch calls dev_close at the beginning of br_dev_delete so that the
>>>>>> netif_running check in br_multicast_add_group can avoid this issue. But
>>>>>> to stay consistent with the previous behavior, it does not remove the
>>>>>> IFF_UP check in br_del_bridge for brctl.
>>>>>>
>>>>>> Reported-by: Jianwen Ji <jiji@redhat.com>
>>>>>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>>>>>> ---
>>>>>> net/bridge/br_if.c | 2 ++
>>>>>> 1 file changed, 2 insertions(+)
>>>>>>
>>>>>
>>>>> +CC bridge maintainers
>>>>>
>>>>> I can see how this could happen, could you also provide the traceback ?
>>>>>
>>>>> The patch looks good to me, actually I think it fixes another issue with
>>>>> mcast stats where the percpu pointer can be accessed after it's freed if
>>>>> an mcast packet can get sent via br->dev after the br_multicast_dev_del() call.
>>>>> This is definitely stable material, if I'm not mistaken the issue is there since
>>>>> the introduction of br_dev_delete:
>>>>> commit e10177abf842
>>>>> Author: Satish Ashok <sashok@cumulusnetworks.com>
>>>>> Date: Wed Jul 15 07:16:51 2015 -0700
>>>>>
>>>>> bridge: multicast: fix handling of temp and perm entries
>>>>>
>>>>>
>>>>>
>>>>> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>>>>
>>>>
>>>> Actually I have a better idea for a fix because dev_close() for a single device is rather heavy.
>>>> Why don't you move the mdb flush logic in the bridge's ndo_uninit() callback ?
>>>> That should have the same effect and be much faster.
>>> Yes. But it seems that all cleanups for bridge should be done after
>>> it's shutdown since beginning according to brctl. I'm not sure if there
>>> are still other problems caused by this. maybe safer to use dev_close.
>>> I need to check more to confirm this.
>>>
>>
>> ndo_uninit() is after the device has been stopped, so it is the same as
>> your fix as I said.
> got that your suggestion can fix this issue. what I'm afraid of is there
> are still other problems like this issue, like "the percpu pointer" one
> you just mentioned above, though it's already fixed by ndo_uninit.
> dev_close would just avoid ALL this kind of issues if there still are. :)
>
> But if you can be sure no more issue like this one, I'm all for that,
> will improve this patch with your suggestion.
>
Please fix it with ndo_uninit(), avoiding another synchronize_net() call
is worth the trouble.
>
>>
>>> I also have another question about mp->timer removing.
>>> As we can see, now it removes this timer with del_timer, instead of
>>> del_timer_sync. What if the timer is running when del_timer ?
>>> How can we be sure that br_multicast_group_expired will be done
>>> before the bridge dev is freed. synchronize_net ?
>>>
>>
>> Yeah, I've been thinking about that and the only race is that the timer
>> might have fired and be waiting for the lock while the mdb is being flushed
>> thus the cancel_timer() won't affect it and then it will enter and see
>> that !netif_running(br->dev), but unfortunately there's a bug because we
>> cannot guarantee that br->dev still exists at that point.
>> This is a different bug though.
> Exactly, the bad thing is that it's pretty hard to reproduce even if this
> bug exists, since the timer handler cannot be preempted. synchronize_net
> probably could avoid it (not sure).
I think the _bh rcu barrier in br_multicast_dev_del() should wait for
all currently executing BHs to finish before executing the callbacks to
free the groups, so it should be fine if any timer is waiting for the
lock at the same time: it will get it, see br->dev as not running and exit.
This is the part I'm talking about (br_multicast.c, 2023 - 2025):
	spin_unlock_bh(&br->multicast_lock);
	rcu_barrier_bh();
	spin_lock_bh(&br->multicast_lock);
At this point either the timer has fired and has been waiting for the
lock or got deleted by the flush.
If anyone could check the logic above it'd be great, adding the original
bridge multicast author as well and I'll keep digging.
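
As a rough illustration of the handler-side check described above, here
is a single-threaded userspace model (not kernel code). The model_*
names are hypothetical; the only behaviour taken from this thread is
that the expiry handler, once it gets the lock, sees the device as not
running and exits without touching the group.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bridge state guarded by multicast_lock. */
struct model_bridge {
	bool running;      /* stands in for netif_running(br->dev) */
	int nr_groups;     /* live mdb entries */
	int expired_calls; /* times the handler did real work */
};

/* Teardown: the device is already stopped, then every group is flushed. */
static void model_flush(struct model_bridge *br)
{
	br->running = false;
	br->nr_groups = 0;
}

/*
 * Timer handler that may have been waiting for the lock during the flush.
 * In the kernel the check would happen under br->multicast_lock.
 * Returns true only if it actually expired a group.
 */
static bool model_group_expired(struct model_bridge *br)
{
	if (!br->running)
		return false; /* device is down: exit without touching groups */
	if (br->nr_groups > 0)
		br->nr_groups--;
	br->expired_calls++;
	return true;
}
```

The model does not cover the separate concern raised earlier, namely
whether br->dev itself is still valid when the late handler runs.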
>
>>
>>>>
>>>> By the way I just noticed that there's also a memory leak - the mdb hash is reallocated
>>>> and not freed due to the mdb rehash, here's also kmemleak's object:
>>>>
>>> yeps, ;-)
>>>
>>>> unreferenced object 0xffff8800540ba800 (size 2048):
>>>> comm "softirq", pid 0, jiffies 4520588901 (age 5787.284s)
>>>> hex dump (first 32 bytes):
>>>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>>>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>>>> backtrace:
>>>> [<ffffffff816e2287>] kmemleak_alloc+0x67/0xc0
>>>> [<ffffffff81260bea>] __kmalloc+0x1ba/0x3e0
>>>> [<ffffffffa05c60ee>] br_mdb_rehash+0x5e/0x340 [bridge]
>>>> [<ffffffffa05c74af>] br_multicast_new_group+0x43f/0x6e0 [bridge]
>>>> [<ffffffffa05c7aa3>] br_multicast_add_group+0x203/0x260 [bridge]
>>>> [<ffffffffa05ca4b5>] br_multicast_rcv+0x945/0x11d0 [bridge]
>>>> [<ffffffffa05b6b10>] br_dev_xmit+0x180/0x470 [bridge]
>>>> [<ffffffff815c781b>] dev_hard_start_xmit+0xbb/0x3d0
>>>> [<ffffffff815c8743>] __dev_queue_xmit+0xb13/0xc10
>>>> [<ffffffff815c8850>] dev_queue_xmit+0x10/0x20
>>>> [<ffffffffa02f8d7a>] ip6_finish_output2+0x5ca/0xac0 [ipv6]
>>>> [<ffffffffa02fbfc6>] ip6_finish_output+0x126/0x2c0 [ipv6]
>>>> [<ffffffffa02fc245>] ip6_output+0xe5/0x390 [ipv6]
>>>> [<ffffffffa032b92c>] NF_HOOK.constprop.44+0x6c/0x240 [ipv6]
>>>> [<ffffffffa032bd16>] mld_sendpack+0x216/0x3e0 [ipv6]
>>>> [<ffffffffa032d5eb>] mld_ifc_timer_expire+0x18b/0x2b0 [ipv6]
>>>>
>>>>
>>>>
>>
^ permalink raw reply