* Re: WARNING: kernel stack frame pointer at ffff880156a5fea0 in bash:2103 has bad value 00007ffec7d87e50
From: Daniel Borkmann @ 2017-09-26 21:59 UTC (permalink / raw)
To: Richard Weinberger, Alexei Starovoitov
Cc: ast, netdev, linux-kernel, jpoimboe, mingo
In-Reply-To: <1598510.AHGpDp18sh@blindfold>
On 09/26/2017 11:51 PM, Richard Weinberger wrote:
> Alexei,
>
> CC'ing Josh and Ingo.
>
> On Tuesday, 26 September 2017, 06:09:02 CEST, Alexei Starovoitov wrote:
>> On Mon, Sep 25, 2017 at 11:23:31PM +0200, Richard Weinberger wrote:
>>> Hi!
>>>
>>> While playing with bcc's opensnoop tool on Linux 4.14-rc2 I managed to
>>> trigger this splat:
>>>
>>> [ 297.629773] WARNING: kernel stack frame pointer at ffff880156a5fea0 in
>>> bash:2103 has bad value 00007ffec7d87e50
>>> [ 297.629777] unwind stack type:0 next_sp: (null) mask:0x6
>>> graph_idx:0
>>> [ 297.629783] ffff88015b207ae0: ffff88015b207b68 (0xffff88015b207b68)
>>> [ 297.629790] ffff88015b207ae8: ffffffffb163c00e
>>> (__save_stack_trace+0x6e/
>>> 0xd0)
>>> [ 297.629792] ffff88015b207af0: 0000000000000000 ...
>>> [ 297.629795] ffff88015b207af8: ffff880156a58000 (0xffff880156a58000)
>>> [ 297.629799] ffff88015b207b00: ffff880156a60000 (0xffff880156a60000)
>>> [ 297.629800] ffff88015b207b08: 0000000000000000 ...
>>> [ 297.629803] ffff88015b207b10: 0000000000000006 (0x6)
>>> [ 297.629806] ffff88015b207b18: ffff880151b02700 (0xffff880151b02700)
>>> [ 297.629809] ffff88015b207b20: 0000010100000000 (0x10100000000)
>>> [ 297.629812] ffff88015b207b28: ffff880156a5fea0 (0xffff880156a5fea0)
>>> [ 297.629815] ffff88015b207b30: ffff88015b207ae0 (0xffff88015b207ae0)
>>> [ 297.629818] ffff88015b207b38: ffffffffc0050282 (0xffffffffc0050282)
>>> [ 297.629819] ffff88015b207b40: 0000000000000000 ...
>>> [ 297.629822] ffff88015b207b48: 0000000001000000 (0x1000000)
>>> [ 297.629825] ffff88015b207b50: ffff880157b98280 (0xffff880157b98280)
>>> [ 297.629828] ffff88015b207b58: ffff880157b98380 (0xffff880157b98380)
>>> [ 297.629831] ffff88015b207b60: ffff88015ad2b500 (0xffff88015ad2b500)
>>> [ 297.629834] ffff88015b207b68: ffff88015b207b78 (0xffff88015b207b78)
>>> [ 297.629838] ffff88015b207b70: ffffffffb163c086 (save_stack_trace+0x16/0x20)
>>> [ 297.629841] ffff88015b207b78: ffff88015b207da8 (0xffff88015b207da8)
>>> [ 297.629847] ffff88015b207b80: ffffffffb18a8ed6 (save_stack+0x46/0xd0)
>>> [ 297.629850] ffff88015b207b88: 000000400000000c (0x400000000c)
>>> [ 297.629852] ffff88015b207b90: ffff88015b207ba0 (0xffff88015b207ba0)
>>> [ 297.629855] ffff88015b207b98: ffff880100000000 (0xffff880100000000)
>>> [ 297.629859] ffff88015b207ba0: ffffffffb163c086 (save_stack_trace+0x16/0x20)
>>> [ 297.629864] ffff88015b207ba8: ffffffffb18a8ed6 (save_stack+0x46/0xd0)
>>> [ 297.629868] ffff88015b207bb0: ffffffffb18a9752 (kasan_slab_free+0x72/0xc0)
>> Thanks for the report!
>> I'm not sure I understand what's going on here.
>> It seems you have kasan enabled and it's trying to do save_stack()
>> and something crashing?
>> I don't see any bpf related helpers in the stack trace.
>> Which architecture is this? and .config ?
>> Is bpf jit enabled? If so, make sure that net.core.bpf_jit_kallsyms=1
>
> I found some time to dig a little further.
> It seems to happen only when CONFIG_DEBUG_SPINLOCK is enabled, please see the
> attached .config. The JIT is off.
> KAsan is also not involved at all, the regular stack saving machinery from the
> trace framework initiates the stack unwinder.
>
> The issue arises as soon as raw_spin_lock_irqsave() is called in
> pre_handler_kretprobe().
> It happens on all releases that have commit c32c47c68a0a ("x86/unwind: Warn on
> bad frame pointer").
> Interestingly it does not happen when I run
> samples/kprobes/kretprobe_example.ko. So, BPF must be involved somehow.
Some time ago, Josh fixed this one here, seems perhaps related in
some way; it was triggerable back then from one of the BPF tracing
samples if I recall correctly:
commit a8b7a92318b6d7779f6d8e9aa6ba0e3de01a8943
Author: Josh Poimboeuf <jpoimboe@redhat.com>
Date: Wed Apr 12 13:47:12 2017 -0500
x86/unwind: Silence entry-related warnings
A few people have reported unwinder warnings like the following:
WARNING: kernel stack frame pointer at ffffc90000fe7ff0 in rsync:1157 has bad value (null)
unwind stack type:0 next_sp: (null) mask:2 graph_idx:0
ffffc90000fe7f98: ffffc90000fe7ff0 (0xffffc90000fe7ff0)
ffffc90000fe7fa0: ffffffffb7000f56 (trace_hardirqs_off_thunk+0x1a/0x1c)
ffffc90000fe7fa8: 0000000000000246 (0x246)
ffffc90000fe7fb0: 0000000000000000 ...
ffffc90000fe7fc0: 00007ffe3af639bc (0x7ffe3af639bc)
ffffc90000fe7fc8: 0000000000000006 (0x6)
ffffc90000fe7fd0: 00007f80af433fc5 (0x7f80af433fc5)
ffffc90000fe7fd8: 00007ffe3af638e0 (0x7ffe3af638e0)
ffffc90000fe7fe0: 00007ffe3af638e0 (0x7ffe3af638e0)
ffffc90000fe7fe8: 00007ffe3af63970 (0x7ffe3af63970)
ffffc90000fe7ff0: 0000000000000000 ...
ffffc90000fe7ff8: ffffffffb7b74b9a (entry_SYSCALL_64_after_swapgs+0x17/0x4f)
This warning can happen when unwinding a code path where an interrupt
occurred in x86 entry code before it set up the first stack frame.
Silently ignore any warnings for this case.
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: c32c47c68a0a ("x86/unwind: Warn on bad frame pointer")
Link: http://lkml.kernel.org/r/dbd6838826466a60dc23a52098185bc973ce2f1e.1492020577.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
> Here is another variant of the warning, it matches the attached .config:
>
> [ 42.729039] WARNING: kernel stack frame pointer at ffff99ef4076bea0 in
> opensnoop:2008 has bad value 0000000000000008
> [ 42.729041] unwind stack type:0 next_sp: (null) mask:0x2
> graph_idx:0
> [ 42.729042] ffff99ef4076bcb0: ffff99ef4076bd38 (0xffff99ef4076bd38)
> [ 42.729044] ffff99ef4076bcb8: ffffffffac42781e (__save_stack_trace+0x6e/
> 0xd0)
> [ 42.729044] ffff99ef4076bcc0: 0000000000000000 ...
> [ 42.729045] ffff99ef4076bcc8: ffff99ef40768000 (0xffff99ef40768000)
> [ 42.729045] ffff99ef4076bcd0: ffff99ef4076c000 (0xffff99ef4076c000)
> [ 42.729045] ffff99ef4076bcd8: 0000000000000000 ...
> [ 42.729046] ffff99ef4076bce0: 0000000000000002 (0x2)
> [ 42.729046] ffff99ef4076bce8: ffff8a1c39163fc0 (0xffff8a1c39163fc0)
> [ 42.729047] ffff99ef4076bcf0: 0000000100000000 (0x100000000)
> [ 42.729047] ffff99ef4076bcf8: ffff99ef4076bea0 (0xffff99ef4076bea0)
> [ 42.729048] ffff99ef4076bd00: ffff99ef4076bcb0 (0xffff99ef4076bcb0)
> [ 42.729048] ffff99ef4076bd08: ffffffffc00b302f (0xffffffffc00b302f)
> [ 42.729048] ffff99ef4076bd10: 0000000000000000 ...
> [ 42.729049] ffff99ef4076bd18: ffff8a1c39163fc0 (0xffff8a1c39163fc0)
> [ 42.729049] ffff99ef4076bd20: 0000000000000000 ...
> [ 42.729052] ffff99ef4076bd28: ffffffffadb9ccc0 (lock_classes
> +0x55500/0x29fec0)
> [ 42.729052] ffff99ef4076bd30: 0000000000000000 ...
> [ 42.729052] ffff99ef4076bd38: ffff99ef4076bd48 (0xffff99ef4076bd48)
> [ 42.729053] ffff99ef4076bd40: ffffffffac427896 (save_stack_trace+0x16/0x20)
> [ 42.729054] ffff99ef4076bd48: ffff99ef4076bd98 (0xffff99ef4076bd98)
> [ 42.729055] ffff99ef4076bd50: ffffffffac4a18d5 (__lock_acquire.isra.
> 34+0x525/0x700)
> [ 42.729055] ffff99ef4076bd58: 0000000000000000 ...
> [ 42.729055] ffff99ef4076bd68: ffff99ef00000411 (0xffff99ef00000411)
> [ 42.729056] ffff99ef4076bd70: 0000000000000046 (0x46)
> [ 42.729056] ffff99ef4076bd78: 0000000000000000 ...
> [ 42.729057] ffff99ef4076bd98: ffff99ef4076be00 (0xffff99ef4076be00)
> [ 42.729057] ffff99ef4076bda0: ffffffffac4a224a (lock_acquire+0xca/0x170)
> [ 42.729059] ffff99ef4076bda8: ffffffffac50a2cd (pre_handler_kretprobe+0x3d/
> 0x1b0)
> [ 42.729059] ffff99ef4076bdb0: 0000000100000000 (0x100000000)
> [ 42.729060] ffff99ef4076bdb8: ffff8a1c00000000 (0xffff8a1c00000000)
> [ 42.729063] ffff99ef4076bdc0: 0000000000000046 (0x46)
> [ 42.729063] ffff99ef4076bdc8: 00000001ac47ee61 (0x1ac47ee61)
> [ 42.729064] ffff99ef4076bdd0: ffff8a1c37b0e0d0 (0xffff8a1c37b0e0d0)
> [ 42.729064] ffff99ef4076bdd8: ffff8a1c37b0e0b8 (0xffff8a1c37b0e0b8)
> [ 42.729067] ffff99ef4076bde0: 0000000000000082 (0x82)
> [ 42.729067] ffff99ef4076bde8: ffff8a1c37b0e0b8 (0xffff8a1c37b0e0b8)
> [ 42.729067] ffff99ef4076bdf0: ffff99ef4076beb0 (0xffff99ef4076beb0)
> [ 42.729068] ffff99ef4076bdf8: ffff8a1c39163fc0 (0xffff8a1c39163fc0)
> [ 42.729068] ffff99ef4076be00: ffff99ef4076be28 (0xffff99ef4076be28)
> [ 42.729070] ffff99ef4076be08: fffffffface13e56 (_raw_spin_lock_irqsave
> +0x46/0x60)
> [ 42.729071] ffff99ef4076be10: ffffffffac50a2cd (pre_handler_kretprobe+0x3d/
> 0x1b0)
> [ 42.729072] ffff99ef4076be18: ffff8a1c37b0e010 (0xffff8a1c37b0e010)
> [ 42.729072] ffff99ef4076be20: ffff8a1c37b0e010 (0xffff8a1c37b0e010)
> [ 42.729073] ffff99ef4076be28: ffff99ef4076be60 (0xffff99ef4076be60)
> [ 42.729074] ffff99ef4076be30: ffffffffac50a2cd (pre_handler_kretprobe+0x3d/
> 0x1b0)
> [ 42.729074] ffff99ef4076be38: ffff8a1c37b0e010 (0xffff8a1c37b0e010)
> [ 42.729074] ffff99ef4076be40: ffff8a1c38cc1780 (0xffff8a1c38cc1780)
> [ 42.729075] ffff99ef4076be48: ffff99ef4076beb0 (0xffff99ef4076beb0)
> [ 42.729075] ffff99ef4076be50: 000055b4ef12d1b0 (0x55b4ef12d1b0)
> [ 42.729076] ffff99ef4076be58: 000055b4ee9920a0 (0x55b4ee9920a0)
> [ 42.729076] ffff99ef4076be60: ffff99ef4076be88 (0xffff99ef4076be88)
> [ 42.729077] ffff99ef4076be68: ffffffffac509f6a (opt_pre_handler+0x3a/0x60)
> [ 42.729078] ffff99ef4076be70: 0000000000000246 (0x246)
> [ 42.729078] ffff99ef4076be78: 000055b4ef12cd70 (0x55b4ef12cd70)
> [ 42.729079] ffff99ef4076be80: 0000000000000001 (0x1)
> [ 42.729079] ffff99ef4076be88: ffff99ef4076bea0 (0xffff99ef4076bea0)
> [ 42.729080] ffff99ef4076be90: ffffffffac442721 (optimized_callback
> +0x81/0x90)
> [ 42.729081] ffff99ef4076be98: 000055b4ef134d50 (0x55b4ef134d50)
> [ 42.729081] ffff99ef4076bea0: 0000000000000008 (0x8)
> [ 42.729082] ffff99ef4076bea8: ffffffffc00b302f (0xffffffffc00b302f)
> [ 42.729082] ffff99ef4076beb0: 000055b4ee9920a0 (0x55b4ee9920a0)
> [ 42.729083] ffff99ef4076beb8: 000055b4ef12d1b0 (0x55b4ef12d1b0)
> [ 42.729083] ffff99ef4076bec0: 0000000000000001 (0x1)
> [ 42.729084] ffff99ef4076bec8: 000055b4ef12cd70 (0x55b4ef12cd70)
> [ 42.729084] ffff99ef4076bed0: 0000000000000008 (0x8)
> [ 42.729084] ffff99ef4076bed8: 000055b4ef134d50 (0x55b4ef134d50)
> [ 42.729085] ffff99ef4076bee0: ffff8a1c39163fc0 (0xffff8a1c39163fc0)
> [ 42.729085] ffff99ef4076bee8: 0000000000000000 ...
> [ 42.729086] ffff99ef4076bef0: 0000000000000001 (0x1)
> [ 42.729086] ffff99ef4076bef8: 0000000000000008 (0x8)
> [ 42.729086] ffff99ef4076bf00: 0000000000000002 (0x2)
> [ 42.729087] ffff99ef4076bf08: 0000000000000000 ...
> [ 42.729087] ffff99ef4076bf10: 00000000000001b6 (0x1b6)
> [ 42.729087] ffff99ef4076bf18: 0000000000000000 ...
> [ 42.729088] ffff99ef4076bf20: 000055b4ef12d1b0 (0x55b4ef12d1b0)
> [ 42.729088] ffff99ef4076bf28: ffffffffffffffff (0xffffffffffffffff)
> [ 42.729090] ffff99ef4076bf30: ffffffffac5c5031 (SyS_open+0x1/0x20)
> [ 42.729090] ffff99ef4076bf38: 0000000000000010 (0x10)
> [ 42.729090] ffff99ef4076bf40: 0000000000000293 (0x293)
> [ 42.729091] ffff99ef4076bf48: ffff99ef4076bf50 (0xffff99ef4076bf50)
> [ 42.729092] ffff99ef4076bf50: fffffffface13f77 (entry_SYSCALL_64_fastpath
> +0x1a/0xaa)
> [ 42.729092] ffff99ef4076bf58: 0000000000000026 (0x26)
> [ 42.729093] ffff99ef4076bf60: 00007f276f5e2600 (0x7f276f5e2600)
> [ 42.729093] ffff99ef4076bf68: 0000000000000001 (0x1)
> [ 42.729094] ffff99ef4076bf70: 0000000000000026 (0x26)
> [ 42.729094] ffff99ef4076bf78: 000055b4ef1035d0 (0x55b4ef1035d0)
> [ 42.729094] ffff99ef4076bf80: 0000000000000026 (0x26)
> [ 42.729095] ffff99ef4076bf88: 0000000000000246 (0x246)
> [ 42.729095] ffff99ef4076bf90: 0000000000000000 ...
> [ 42.729095] ffff99ef4076bf98: 0000000000000001 (0x1)
> [ 42.729096] ffff99ef4076bfa0: 0000000000000008 (0x8)
> [ 42.729096] ffff99ef4076bfa8: ffffffffffffffda (0xffffffffffffffda)
> [ 42.729097] ffff99ef4076bfb0: 00007f276f3234e0 (0x7f276f3234e0)
> [ 42.729097] ffff99ef4076bfb8: 00000000000001b6 (0x1b6)
> [ 42.729097] ffff99ef4076bfc0: 0000000000000000 ...
> [ 42.729098] ffff99ef4076bfc8: 000055b4ef12d1b0 (0x55b4ef12d1b0)
> [ 42.729098] ffff99ef4076bfd0: 0000000000000002 (0x2)
> [ 42.729099] ffff99ef4076bfd8: 00007f276f3234e0 (0x7f276f3234e0)
> [ 42.729099] ffff99ef4076bfe0: 0000000000000033 (0x33)
> [ 42.729100] ffff99ef4076bfe8: 0000000000000246 (0x246)
> [ 42.729100] ffff99ef4076bff0: 00007ffd98082448 (0x7ffd98082448)
> [ 42.729100] ffff99ef4076bff8: 000000000000002b (0x2b)
>
> Thanks,
> //richard
>
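For readers following the dumps above, the kind of sanity test that makes a saved frame pointer "bad" can be sketched in plain C. This is only a simplified model, not the x86 unwinder's code: toy_frame_pointer_ok() and its parameters are hypothetical, and the stack bounds passed to it are assumptions read off the dumps.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check, not the kernel's implementation: a saved frame
 * pointer is treated as plausible only if it is word-aligned, lies above
 * the frame it was read from, and points into the same kernel stack.
 */
static bool toy_frame_pointer_ok(uint64_t fp, uint64_t next_fp,
				 uint64_t stack_begin, uint64_t stack_end)
{
	/* A frame pointer must be aligned to the machine word size. */
	if (next_fp & (sizeof(uint64_t) - 1))
		return false;

	/* The stack grows down, so the caller's frame sits at a higher address. */
	if (next_fp <= fp)
		return false;

	return next_fp >= stack_begin && next_fp < stack_end;
}
```

With the values from the second splat, the slot at ffff99ef4076bea0 holds 0x8, which fails the "above the current frame" test, while the slot at ffff99ef4076be88 holds ffff99ef4076bea0 and passes.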
* Re: [PATCH net-next 1/5] net: dsa: return -ENODEV is there is no slave PHY
From: Florian Fainelli @ 2017-09-26 21:59 UTC (permalink / raw)
To: Vivien Didelot, netdev; +Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn
In-Reply-To: <20170926211535.21273-2-vivien.didelot@savoirfairelinux.com>
On 09/26/2017 02:15 PM, Vivien Didelot wrote:
> Instead of returning -EOPNOTSUPP when a slave device has no PHY,
> directly return -ENODEV as ethtool and phylib do.
>
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
* Re: [PATCH net-next 3/5] net: dsa: use phy_ethtool_get_link_ksettings
From: Florian Fainelli @ 2017-09-26 22:00 UTC (permalink / raw)
To: Vivien Didelot, netdev; +Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn
In-Reply-To: <20170926211535.21273-4-vivien.didelot@savoirfairelinux.com>
On 09/26/2017 02:15 PM, Vivien Didelot wrote:
> Use phy_ethtool_get_link_ksettings now that dsa_slave_get_link_ksettings
> does exactly the same.
>
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
* Re: [PATCH net-next 5/5] net: dsa: use phy_ethtool_nway_reset
From: Florian Fainelli @ 2017-09-26 22:05 UTC (permalink / raw)
To: Vivien Didelot, netdev; +Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn
In-Reply-To: <20170926211535.21273-6-vivien.didelot@savoirfairelinux.com>
On 09/26/2017 02:15 PM, Vivien Didelot wrote:
> Use phy_ethtool_nway_reset now that dsa_slave_nway_reset does exactly
> the same.
>
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
* Re: [PATCH net-next 4/5] net: dsa: use phy_ethtool_set_link_ksettings
From: Florian Fainelli @ 2017-09-26 22:05 UTC (permalink / raw)
To: Vivien Didelot, netdev; +Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn
In-Reply-To: <20170926211535.21273-5-vivien.didelot@savoirfairelinux.com>
On 09/26/2017 02:15 PM, Vivien Didelot wrote:
> Use phy_ethtool_set_link_ksettings now that dsa_slave_set_link_ksettings
> does exactly the same.
>
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
* Re: [PATCH net-next 1/2] tools: rename tools/net directory to tools/bpf
From: Alexei Starovoitov @ 2017-09-26 22:19 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: netdev, daniel, davem, hannes, dsahern, oss-drivers
In-Reply-To: <20170926153522.31500-2-jakub.kicinski@netronome.com>
On Tue, Sep 26, 2017 at 08:35:21AM -0700, Jakub Kicinski wrote:
> We currently only have BPF tools in the tools/net directory.
> We are about to add more BPF tools there, not necessarily
> networking related, so rename the directory and related Makefile
> targets to bpf.
>
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Simon Horman <simon.horman@netronome.com>
makes sense.
Acked-by: Alexei Starovoitov <ast@kernel.org>
* Re: [PATCH net-next 2/2] tools: bpf: add bpftool
From: Alexei Starovoitov @ 2017-09-26 22:24 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: netdev, daniel, davem, hannes, dsahern, oss-drivers
In-Reply-To: <20170926153522.31500-3-jakub.kicinski@netronome.com>
On Tue, Sep 26, 2017 at 08:35:22AM -0700, Jakub Kicinski wrote:
> Add a simple tool for querying and updating BPF objects on the system.
>
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Reviewed-by: Simon Horman <simon.horman@netronome.com>
> ---
> tools/bpf/Makefile | 18 +-
> tools/bpf/bpftool/Makefile | 80 +++++
> tools/bpf/bpftool/common.c | 214 ++++++++++++
> tools/bpf/bpftool/jit_disasm.c | 83 +++++
> tools/bpf/bpftool/main.c | 212 ++++++++++++
> tools/bpf/bpftool/main.h | 99 ++++++
> tools/bpf/bpftool/map.c | 742 +++++++++++++++++++++++++++++++++++++++++
> tools/bpf/bpftool/prog.c | 392 ++++++++++++++++++++++
> 8 files changed, 1837 insertions(+), 3 deletions(-)
...
> +static int do_help(int argc, char **argv)
> +{
> + fprintf(stderr,
> + "Usage: %s %s show [MAP]\n"
> + " %s %s dump MAP\n"
> + " %s %s update MAP key BYTES value VALUE [UPDATE_FLAGS]\n"
> + " %s %s lookup MAP key BYTES\n"
> + " %s %s getnext MAP [key BYTES]\n"
> + " %s %s delete MAP key BYTES\n"
> + " %s %s pin MAP FILE\n"
> + " %s %s help\n"
> + "\n"
> + " MAP := { id MAP_ID | pinned FILE }\n"
> + " " HELP_SPEC_PROGRAM "\n"
> + " VALUE := { BYTES | MAP | PROG }\n"
> + " UPDATE_FLAGS := { any | exist | noexist }\n"
> + "",
overall looks good to me, but still difficult to grasp how to use it.
Can you add README with example usage and expected output?
Acked-by: Alexei Starovoitov <ast@kernel.org>
You also realize that you're signing up maintaining this tool, right? ;)
* [PATCH net-next 3/6] net: dsa: mv88e6xxx: Fixed port netdev check for VLANs
From: Andrew Lunn @ 2017-09-26 22:26 UTC (permalink / raw)
To: David Miller; +Cc: Vivien Didelot, netdev, Andrew Lunn
In-Reply-To: <1506464764-12699-1-git-send-email-andrew@lunn.ch>
Having the same VLAN on multiple bridges is currently unsupported as
an offload. mv88e6xxx_port_check_hw_vlan() is used to ensure that a
VLAN is not on multiple bridges when adding a VLAN range to a port. It
loops the ports and checks to see if there are ports in a different
bridge with the same VLAN.
While walking all switch ports, the code was checking if the new port
has a netdev attached to it. If not, skip checking the port being
walked. This seems like a typo. If the new port does not have a
netdev, how has a VLAN been added to it in the first place, requiring
this check be performed at all? More likely, we should be checking if
the port being walked has a netdev. Without it having a netdev, it
cannot have a VLAN on it, so there is no need to check further for
that particular port.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index c6678aa9b4ef..884f0507cf48 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1120,7 +1120,7 @@ static int mv88e6xxx_port_check_hw_vlan(struct dsa_switch *ds, int port,
if (dsa_is_dsa_port(ds, i) || dsa_is_cpu_port(ds, i))
continue;
- if (!ds->ports[port].netdev)
+ if (!ds->ports[i].netdev)
continue;
if (vlan.member[i] ==
--
2.14.1
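The effect of the one-character change can be seen in a small stand-alone model. This is only a sketch: struct toy_port and toy_count_checked_ports() are hypothetical names that mimic the shape of the loop and are not part of the driver.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_PORTS 4

/* Hypothetical stand-in for a switch port; only the field the check needs. */
struct toy_port {
	const char *netdev;	/* NULL when no netdev is attached */
};

/*
 * Count how many walked ports would go on to the VLAN membership check.
 * With use_walked_port == false the test looks at ports[port], the port
 * the VLAN is being added to, so it gives the same answer for every i.
 * With use_walked_port == true the test looks at ports[i], the port
 * currently being walked.
 */
static int toy_count_checked_ports(const struct toy_port *ports, int port,
				   bool use_walked_port)
{
	int checked = 0;
	int i;

	for (i = 0; i < TOY_NUM_PORTS; i++) {
		const struct toy_port *p = use_walked_port ? &ports[i] : &ports[port];

		if (!p->netdev)
			continue;
		checked++;
	}

	return checked;
}
```

With netdevs on ports 0 and 2 only and the VLAN being added to port 0, the old test lets all four walked ports through, while the new test skips the two ports that have no netdev.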
* [PATCH net-next 4/6] net: dsa: mv88e6xxx: Print offending port when vlan check fails
From: Andrew Lunn @ 2017-09-26 22:26 UTC (permalink / raw)
To: David Miller; +Cc: Vivien Didelot, netdev, Andrew Lunn
In-Reply-To: <1506464764-12699-1-git-send-email-andrew@lunn.ch>
When testing if a VLAN is on more than one bridge, we print an error
message that the VLAN is already in use somewhere else. Print both the
new port which would like the VLAN, and the port which already has it,
to aid debugging.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
drivers/net/dsa/mv88e6xxx/chip.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 884f0507cf48..8a4756490a5a 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1134,8 +1134,8 @@ static int mv88e6xxx_port_check_hw_vlan(struct dsa_switch *ds, int port,
if (!ds->ports[i].bridge_dev)
continue;
- dev_err(ds->dev, "p%d: hw VLAN %d already used by %s\n",
- port, vlan.vid,
+ dev_err(ds->dev, "p%d: hw VLAN %d already used by port %d in %s\n",
+ port, vlan.vid, i,
netdev_name(ds->ports[i].bridge_dev));
err = -EOPNOTSUPP;
goto unlock;
--
2.14.1
* [PATCH net-next 6/6] net: dsa: mv88e6xxx: Flood broadcast frames in hardware
From: Andrew Lunn @ 2017-09-26 22:26 UTC (permalink / raw)
To: David Miller; +Cc: Vivien Didelot, netdev, Andrew Lunn
In-Reply-To: <1506464764-12699-1-git-send-email-andrew@lunn.ch>
By default, the switch does not flood broadcast frames. Instead the
broadcast address is unknown in the ATU, so the frame gets forwarded
out the cpu port. The software bridge then floods it back to the
individual switch ports which are members of the bridge.
Add an ATU entry in the switch so that it floods broadcast frames out
ports, rather than have the software bridge do it. Also, send a copy
out the cpu port and any dsa ports. Rely on the port vectors to
prevent broadcast frames leaking between bridges, and separated ports.
Additionally, when a VLAN is added, a new FID is allocated. This
represents a new table of ATU entries. A broadcast entry is added to
the new FID.
With offload_fwd_mark being set, the software bridge will not flood
the frames it receives back to the switch.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
drivers/net/dsa/mv88e6xxx/chip.c | 34 +++++++++++++++++++++++++++++++++-
1 file changed, 33 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 9fbe1f02b9ce..908bb867df3b 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1235,6 +1235,30 @@ static int mv88e6xxx_port_db_load_purge(struct mv88e6xxx_chip *chip, int port,
return mv88e6xxx_g1_atu_loadpurge(chip, vlan.fid, &entry);
}
+static int mv88e6xxx_port_add_broadcast(struct mv88e6xxx_chip *chip, int port,
+ u16 vid)
+{
+ const char broadcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
+
+ return mv88e6xxx_port_db_load_purge(
+ chip, port, broadcast, vid,
+ MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC);
+}
+
+static int mv88e6xxx_broadcast_setup(struct mv88e6xxx_chip *chip, u16 vid)
+{
+ int port;
+ int err;
+
+ for (port = 0; port < mv88e6xxx_num_ports(chip); ++port) {
+ err = mv88e6xxx_port_add_broadcast(chip, port, vid);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
static int _mv88e6xxx_port_vlan_add(struct mv88e6xxx_chip *chip, int port,
u16 vid, u8 member)
{
@@ -1247,7 +1271,11 @@ static int _mv88e6xxx_port_vlan_add(struct mv88e6xxx_chip *chip, int port,
vlan.member[port] = member;
- return mv88e6xxx_vtu_loadpurge(chip, &vlan);
+ err = mv88e6xxx_vtu_loadpurge(chip, &vlan);
+ if (err)
+ return err;
+
+ return mv88e6xxx_broadcast_setup(chip, vid);
}
static void mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int port,
@@ -2025,6 +2053,10 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
if (err)
goto unlock;
+ err = mv88e6xxx_broadcast_setup(chip, 0);
+ if (err)
+ goto unlock;
+
err = mv88e6xxx_pot_setup(chip);
if (err)
goto unlock;
--
2.14.1
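As a rough illustration of what the setup loop in this patch builds, the following sketch models the broadcast ATU entry as a MAC address plus a port bitmap. The toy_* names and the port count are assumptions made for the example; the real driver goes through mv88e6xxx_port_db_load_purge() and the hardware ATU.

```c
#include <stdint.h>
#include <string.h>

#define TOY_NUM_PORTS 7	/* assumed port count, for illustration only */

/* Hypothetical ATU entry: a MAC address plus a bitmap of destination ports. */
struct toy_atu_entry {
	uint8_t mac[6];
	uint16_t portvec;
};

/* Add one port to the broadcast entry, mirroring the per-port helper. */
static void toy_port_add_broadcast(struct toy_atu_entry *entry, int port)
{
	memset(entry->mac, 0xff, sizeof(entry->mac));
	entry->portvec |= (uint16_t)(1u << port);
}

/* Walk every port so the broadcast entry ends up covering all of them. */
static void toy_broadcast_setup(struct toy_atu_entry *entry)
{
	int port;

	for (port = 0; port < TOY_NUM_PORTS; port++)
		toy_port_add_broadcast(entry, port);
}
```

After toy_broadcast_setup() the entry for ff:ff:ff:ff:ff:ff has every port bit set; in the real switch the per-port vectors then restrict which of those ports a given frame may actually leave on.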
* [PATCH net-next 0/6] mv88e6xxx broadcast flooding in hardware
From: Andrew Lunn @ 2017-09-26 22:25 UTC (permalink / raw)
To: David Miller; +Cc: Vivien Didelot, netdev, Andrew Lunn
This patchset makes the mv88e6xxx driver perform flooding in hardware,
rather than let the software bridge perform the flooding. This is a
prerequisite for IGMP snooping on the bridge interface.
In order to make hardware broadcasting work, a few other issues need
fixing or improving. SWITCHDEV_ATTR_ID_PORT_PARENT_ID is broken, which
is apparent when testing on the ZII devel board with multiple
switches.
Some of these patches are taken from a previous RFC patchset of IGMP
support. Hence the v2 comments...
Andrew Lunn (6):
net: dsa: Fix SWITCHDEV_ATTR_ID_PORT_PARENT_ID
net: dsa: {e}dsa: set offload_fwd_mark on received packets
net: dsa: mv88e6xxx: Fixed port netdev check for VLANs
net: dsa: mv88e6xxx: Print offending port when vlan check fails
net: dsa: mv88e6xxx: Move mv88e6xxx_port_db_load_purge()
net: dsa: mv88e6xxx: Flood broadcast frames in hardware
drivers/net/dsa/mv88e6xxx/chip.c | 128 ++++++++++++++++++++++++---------------
net/dsa/slave.c | 11 ++--
net/dsa/tag_dsa.c | 1 +
net/dsa/tag_edsa.c | 1 +
4 files changed, 89 insertions(+), 52 deletions(-)
--
2.14.1
* [PATCH net-next 5/6] net: dsa: mv88e6xxx: Move mv88e6xxx_port_db_load_purge()
From: Andrew Lunn @ 2017-09-26 22:26 UTC (permalink / raw)
To: David Miller; +Cc: Vivien Didelot, netdev, Andrew Lunn
In-Reply-To: <1506464764-12699-1-git-send-email-andrew@lunn.ch>
This function is going to be needed by a soon to be added new
function. Move it earlier so we can avoid a forward declaration.
No code changes.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
drivers/net/dsa/mv88e6xxx/chip.c | 88 ++++++++++++++++++++--------------------
1 file changed, 44 insertions(+), 44 deletions(-)
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 8a4756490a5a..9fbe1f02b9ce 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1191,6 +1191,50 @@ mv88e6xxx_port_vlan_prepare(struct dsa_switch *ds, int port,
return 0;
}
+static int mv88e6xxx_port_db_load_purge(struct mv88e6xxx_chip *chip, int port,
+ const unsigned char *addr, u16 vid,
+ u8 state)
+{
+ struct mv88e6xxx_vtu_entry vlan;
+ struct mv88e6xxx_atu_entry entry;
+ int err;
+
+ /* Null VLAN ID corresponds to the port private database */
+ if (vid == 0)
+ err = mv88e6xxx_port_get_fid(chip, port, &vlan.fid);
+ else
+ err = mv88e6xxx_vtu_get(chip, vid, &vlan, false);
+ if (err)
+ return err;
+
+ entry.state = MV88E6XXX_G1_ATU_DATA_STATE_UNUSED;
+ ether_addr_copy(entry.mac, addr);
+ eth_addr_dec(entry.mac);
+
+ err = mv88e6xxx_g1_atu_getnext(chip, vlan.fid, &entry);
+ if (err)
+ return err;
+
+ /* Initialize a fresh ATU entry if it isn't found */
+ if (entry.state == MV88E6XXX_G1_ATU_DATA_STATE_UNUSED ||
+ !ether_addr_equal(entry.mac, addr)) {
+ memset(&entry, 0, sizeof(entry));
+ ether_addr_copy(entry.mac, addr);
+ }
+
+ /* Purge the ATU entry only if no port is using it anymore */
+ if (state == MV88E6XXX_G1_ATU_DATA_STATE_UNUSED) {
+ entry.portvec &= ~BIT(port);
+ if (!entry.portvec)
+ entry.state = MV88E6XXX_G1_ATU_DATA_STATE_UNUSED;
+ } else {
+ entry.portvec |= BIT(port);
+ entry.state = state;
+ }
+
+ return mv88e6xxx_g1_atu_loadpurge(chip, vlan.fid, &entry);
+}
+
static int _mv88e6xxx_port_vlan_add(struct mv88e6xxx_chip *chip, int port,
u16 vid, u8 member)
{
@@ -1307,50 +1351,6 @@ static int mv88e6xxx_port_vlan_del(struct dsa_switch *ds, int port,
return err;
}
-static int mv88e6xxx_port_db_load_purge(struct mv88e6xxx_chip *chip, int port,
- const unsigned char *addr, u16 vid,
- u8 state)
-{
- struct mv88e6xxx_vtu_entry vlan;
- struct mv88e6xxx_atu_entry entry;
- int err;
-
- /* Null VLAN ID corresponds to the port private database */
- if (vid == 0)
- err = mv88e6xxx_port_get_fid(chip, port, &vlan.fid);
- else
- err = mv88e6xxx_vtu_get(chip, vid, &vlan, false);
- if (err)
- return err;
-
- entry.state = MV88E6XXX_G1_ATU_DATA_STATE_UNUSED;
- ether_addr_copy(entry.mac, addr);
- eth_addr_dec(entry.mac);
-
- err = mv88e6xxx_g1_atu_getnext(chip, vlan.fid, &entry);
- if (err)
- return err;
-
- /* Initialize a fresh ATU entry if it isn't found */
- if (entry.state == MV88E6XXX_G1_ATU_DATA_STATE_UNUSED ||
- !ether_addr_equal(entry.mac, addr)) {
- memset(&entry, 0, sizeof(entry));
- ether_addr_copy(entry.mac, addr);
- }
-
- /* Purge the ATU entry only if no port is using it anymore */
- if (state == MV88E6XXX_G1_ATU_DATA_STATE_UNUSED) {
- entry.portvec &= ~BIT(port);
- if (!entry.portvec)
- entry.state = MV88E6XXX_G1_ATU_DATA_STATE_UNUSED;
- } else {
- entry.portvec |= BIT(port);
- entry.state = state;
- }
-
- return mv88e6xxx_g1_atu_loadpurge(chip, vlan.fid, &entry);
-}
-
static int mv88e6xxx_port_fdb_add(struct dsa_switch *ds, int port,
const unsigned char *addr, u16 vid)
{
--
2.14.1
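The port-vector handling at the end of the moved function can be modelled on its own. This is a minimal sketch on an in-memory entry: struct toy_entry, toy_load_purge() and the TOY_STATE_UNUSED value are hypothetical, and the ATU lookup and write-back steps are left out.

```c
#include <stdint.h>

#define TOY_STATE_UNUSED 0x0	/* hypothetical value for "entry not in use" */

/* Hypothetical in-memory entry: a state plus a bitmap of member ports. */
struct toy_entry {
	uint8_t state;
	uint16_t portvec;
};

/*
 * Apply a load (state != TOY_STATE_UNUSED) or a purge (state ==
 * TOY_STATE_UNUSED) request for one port to the entry.
 */
static void toy_load_purge(struct toy_entry *entry, int port, uint8_t state)
{
	if (state == TOY_STATE_UNUSED) {
		entry->portvec &= (uint16_t)~(1u << port);
		/* Only drop the entry once the last port has left it. */
		if (!entry->portvec)
			entry->state = TOY_STATE_UNUSED;
	} else {
		entry->portvec |= (uint16_t)(1u << port);
		entry->state = state;
	}
}
```

Loading the same address on ports 1 and 3 and then purging port 1 leaves the entry in place for port 3; it only becomes unused once port 3 is purged as well.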
* [PATCH net-next 2/6] net: dsa: {e}dsa: set offload_fwd_mark on received packets
From: Andrew Lunn @ 2017-09-26 22:25 UTC (permalink / raw)
To: David Miller; +Cc: Vivien Didelot, netdev, Andrew Lunn
In-Reply-To: <1506464764-12699-1-git-send-email-andrew@lunn.ch>
The software bridge needs to know if a packet has already been bridged
by hardware offload to ports in the same hardware offload, in order
that it does not re-flood them, causing duplicates. This is
particularly true for broadcast and multicast traffic which the host
has requested.
By setting offload_fwd_mark in the skb the bridge will only flood to
ports in other offloads and other netifs. Set this flag in the DSA and
EDSA tag driver.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
v2
--
For the moment, do this in the tag drivers, not the generic code.
Once we get more test results from other switches, maybe move it back
again.
---
net/dsa/tag_dsa.c | 1 +
net/dsa/tag_edsa.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/net/dsa/tag_dsa.c b/net/dsa/tag_dsa.c
index fbf9ca954773..ea6ada9d5016 100644
--- a/net/dsa/tag_dsa.c
+++ b/net/dsa/tag_dsa.c
@@ -154,6 +154,7 @@ static struct sk_buff *dsa_rcv(struct sk_buff *skb, struct net_device *dev,
}
skb->dev = ds->ports[source_port].netdev;
+ skb->offload_fwd_mark = 1;
return skb;
}
diff --git a/net/dsa/tag_edsa.c b/net/dsa/tag_edsa.c
index 76367ba1b2e2..a961b22a7018 100644
--- a/net/dsa/tag_edsa.c
+++ b/net/dsa/tag_edsa.c
@@ -173,6 +173,7 @@ static struct sk_buff *edsa_rcv(struct sk_buff *skb, struct net_device *dev,
}
skb->dev = ds->ports[source_port].netdev;
+ skb->offload_fwd_mark = 1;
return skb;
}
--
2.14.1
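The flooding decision described in the commit message can be pictured with a simplified model. The structures and toy_allowed_egress() below are hypothetical and only capture the idea; they are not the bridge's data structures.

```c
#include <stdbool.h>

/* Hypothetical, simplified packet description. */
struct toy_skb {
	bool offload_fwd_mark;	/* set by the tag driver on receive */
	int ingress_offload;	/* offload domain the packet arrived from */
};

/* Hypothetical, simplified bridge port description. */
struct toy_bridge_port {
	int offload;		/* offload domain of this egress port, 0 = none */
};

/*
 * Decide whether the software bridge still has to send the packet out of
 * the given port. A marked packet is skipped on ports that belong to the
 * offload domain it came from, because the hardware already flooded it.
 */
static bool toy_allowed_egress(const struct toy_skb *skb,
			       const struct toy_bridge_port *p)
{
	return !skb->offload_fwd_mark || skb->ingress_offload != p->offload;
}
```

A marked packet from offload domain 1 is therefore not re-sent to other ports in domain 1, but is still sent to ports in other domains and to ports with no offload at all.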
* [PATCH net-next 1/6] net: dsa: Fix SWITCHDEV_ATTR_ID_PORT_PARENT_ID
From: Andrew Lunn @ 2017-09-26 22:25 UTC (permalink / raw)
To: David Miller; +Cc: Vivien Didelot, netdev, Andrew Lunn
In-Reply-To: <1506464764-12699-1-git-send-email-andrew@lunn.ch>
SWITCHDEV_ATTR_ID_PORT_PARENT_ID is used by the software bridge when
determining which ports to flood a packet out. If the packet
originated from a switch, it assumes the switch has already flooded
the packet out the switch's ports, so the bridge should not flood the
packet itself out switch ports. Ports on the same switch are expected
to return the same parent ID when SWITCHDEV_ATTR_ID_PORT_PARENT_ID is
called.
DSA gets this wrong with clusters of switches. As far as the software
bridge is concerned, the cluster is all one switch. A packet from any
switch in the cluster can be assumed to have been flooded as needed out
all ports of the cluster, not just the switch it originated
from. Hence all ports of a cluster should return the same parent. The
old implementation did not, each switch in the cluster had its own ID.
Also wrong was that the ID was not unique if multiple DSA instances
are in operation.
Use the tree ID as the parent ID, which is the same for all switches
in a cluster and unique across switch clusters.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
v2: Swap from MAC address to dst->tree
---
net/dsa/slave.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index bd51ef56ec5b..ee72aa164956 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -354,13 +354,16 @@ static int dsa_slave_port_attr_get(struct net_device *dev,
struct switchdev_attr *attr)
{
struct dsa_slave_priv *p = netdev_priv(dev);
- struct dsa_switch *ds = p->dp->ds;
switch (attr->id) {
- case SWITCHDEV_ATTR_ID_PORT_PARENT_ID:
- attr->u.ppid.id_len = sizeof(ds->index);
- memcpy(&attr->u.ppid.id, &ds->index, attr->u.ppid.id_len);
+ case SWITCHDEV_ATTR_ID_PORT_PARENT_ID: {
+ struct dsa_switch *ds = p->dp->ds;
+ struct dsa_switch_tree *dst = ds->dst;
+
+ attr->u.ppid.id_len = sizeof(dst->tree);
+ memcpy(&attr->u.ppid.id, &dst->tree, sizeof(dst->tree));
break;
+ }
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT:
attr->u.brport_flags_support = 0;
break;
--
2.14.1
* [PATCH 6/6] net: dsa: mv88e6xxx: Forward broadcast frames to cpu and dsa ports
From: Andrew Lunn @ 2017-09-26 22:26 UTC (permalink / raw)
To: David Miller; +Cc: Vivien Didelot, netdev, Andrew Lunn
In-Reply-To: <1506464764-12699-1-git-send-email-andrew@lunn.ch>
By default, the switch does not flood broadcast frames. Instead the
broadcast address is unknown in the ATU, so the frame gets forwarded
out the cpu port. The software bridge then floods it back to the
individual switch ports which are members of the bridge.
Add an ATU entry in the switch so that it floods broadcast frames out
ports, rather than have the software bridge do it. Also, send a copy
out the cpu port and any dsa ports. Rely on the port vectors to
prevent broadcast frames leaking between bridges, and separated ports.
Additionally, when a VLAN is added, a new FID is allocated. This
represents a new table of ATU entries. A broadcast entry is added to
the new FID.
With offload_fwd_mark being set, the software bridge will not flood
the frames it receives back to the switch.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
---
drivers/net/dsa/mv88e6xxx/chip.c | 34 +++++++++++++++++++++++++++++++++-
1 file changed, 33 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 9fbe1f02b9ce..908bb867df3b 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1235,6 +1235,30 @@ static int mv88e6xxx_port_db_load_purge(struct mv88e6xxx_chip *chip, int port,
return mv88e6xxx_g1_atu_loadpurge(chip, vlan.fid, &entry);
}
+static int mv88e6xxx_port_add_broadcast(struct mv88e6xxx_chip *chip, int port,
+ u16 vid)
+{
+ const char broadcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
+
+ return mv88e6xxx_port_db_load_purge(
+ chip, port, broadcast, vid,
+ MV88E6XXX_G1_ATU_DATA_STATE_MC_STATIC);
+}
+
+static int mv88e6xxx_broadcast_setup(struct mv88e6xxx_chip *chip, u16 vid)
+{
+ int port;
+ int err;
+
+ for (port = 0; port < mv88e6xxx_num_ports(chip); ++port) {
+ err = mv88e6xxx_port_add_broadcast(chip, port, vid);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
static int _mv88e6xxx_port_vlan_add(struct mv88e6xxx_chip *chip, int port,
u16 vid, u8 member)
{
@@ -1247,7 +1271,11 @@ static int _mv88e6xxx_port_vlan_add(struct mv88e6xxx_chip *chip, int port,
vlan.member[port] = member;
- return mv88e6xxx_vtu_loadpurge(chip, &vlan);
+ err = mv88e6xxx_vtu_loadpurge(chip, &vlan);
+ if (err)
+ return err;
+
+ return mv88e6xxx_broadcast_setup(chip, vid);
}
static void mv88e6xxx_port_vlan_add(struct dsa_switch *ds, int port,
@@ -2025,6 +2053,10 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
if (err)
goto unlock;
+ err = mv88e6xxx_broadcast_setup(chip, 0);
+ if (err)
+ goto unlock;
+
err = mv88e6xxx_pot_setup(chip);
if (err)
goto unlock;
--
2.14.1
* Re: [PATCH v2 net-next 1/2] bpf/verifier: improve disassembly of BPF_END instructions
From: Alexei Starovoitov @ 2017-09-26 22:33 UTC (permalink / raw)
To: Edward Cree; +Cc: davem, netdev, daniel, ys114321
In-Reply-To: <b0a84ccf-8842-876c-ec82-b4b1da3d6efa@solarflare.com>
On Tue, Sep 26, 2017 at 04:35:13PM +0100, Edward Cree wrote:
> print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a
> different structure: it has a size in insn->imm (even if it's BPF_X) and
> uses the BPF_SRC (X or K) to indicate which endianness to use. So it
> needs different code to print it.
>
> Signed-off-by: Edward Cree <ecree@solarflare.com>
well, it's an improvement over what we have today, so
Acked-by: Alexei Starovoitov <ast@kernel.org>
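For readers unfamiliar with the encoding described in the quoted
commit message, a minimal Python sketch of decoding a BPF_END
instruction follows. The opcode constants are the ones from the BPF
instruction encoding. The disasm_end helper and the exact output
format are assumptions for illustration, not necessarily what
print_bpf_insn() emits.

```python
# Opcode field values from the BPF instruction encoding.
BPF_ALU = 0x04
BPF_END = 0xD0
BPF_TO_LE = 0x00  # same bit position as BPF_K
BPF_TO_BE = 0x08  # same bit position as BPF_X


def disasm_end(code, dst_reg, imm):
    """Format a BPF_ALU | BPF_END instruction (illustrative format only)."""
    if (code & 0x07) != BPF_ALU or (code & 0xF0) != BPF_END:
        raise ValueError("not a BPF_ALU | BPF_END instruction")
    # The source bit selects the endianness; imm carries the width in bits.
    endian = "be" if (code & 0x08) == BPF_TO_BE else "le"
    return "(%02x) r%d = %s%d r%d" % (code, dst_reg, endian, imm, dst_reg)


print(disasm_end(0xDC, 1, 16))  # (dc) r1 = be16 r1
print(disasm_end(0xD4, 2, 64))  # (d4) r2 = le64 r2
```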
* Re: [PATCH v2 net-next 2/2] bpf/verifier: improve disassembly of BPF_NEG instructions
From: Alexei Starovoitov @ 2017-09-26 22:34 UTC (permalink / raw)
To: Edward Cree; +Cc: davem, netdev, daniel, ys114321
In-Reply-To: <f27b1dfe-4ea2-5737-d3d5-21cb581d1927@solarflare.com>
On Tue, Sep 26, 2017 at 04:35:29PM +0100, Edward Cree wrote:
> BPF_NEG takes only one operand, unlike the bulk of BPF_ALU[64] which are
> compound-assignments. So give it its own format in print_bpf_insn().
>
> Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
thank you for the cleanup.
* Re: WARNING: kernel stack frame pointer at ffff880156a5fea0 in bash:2103 has bad value 00007ffec7d87e50
From: Josh Poimboeuf @ 2017-09-26 22:42 UTC (permalink / raw)
To: Richard Weinberger
Cc: Alexei Starovoitov, ast, daniel, netdev, linux-kernel, mingo
In-Reply-To: <1598510.AHGpDp18sh@blindfold>
On Tue, Sep 26, 2017 at 11:51:31PM +0200, Richard Weinberger wrote:
> Alexei,
>
> CC'ing Josh and Ingo.
>
> Am Dienstag, 26. September 2017, 06:09:02 CEST schrieb Alexei Starovoitov:
> > On Mon, Sep 25, 2017 at 11:23:31PM +0200, Richard Weinberger wrote:
> > > Hi!
> > >
> > > While playing with bcc's opensnoop tool on Linux 4.14-rc2 I managed to
> > > trigger this splat:
> > >
> > > [ 297.629773] WARNING: kernel stack frame pointer at ffff880156a5fea0 in bash:2103 has bad value 00007ffec7d87e50
> > > [ 297.629777] unwind stack type:0 next_sp: (null) mask:0x6 graph_idx:0
> > > [ 297.629783] ffff88015b207ae0: ffff88015b207b68 (0xffff88015b207b68)
> > > [ 297.629790] ffff88015b207ae8: ffffffffb163c00e (__save_stack_trace+0x6e/0xd0)
> > > [ 297.629792] ffff88015b207af0: 0000000000000000 ...
> > > [ 297.629795] ffff88015b207af8: ffff880156a58000 (0xffff880156a58000)
> > > [ 297.629799] ffff88015b207b00: ffff880156a60000 (0xffff880156a60000)
> > > [ 297.629800] ffff88015b207b08: 0000000000000000 ...
> > > [ 297.629803] ffff88015b207b10: 0000000000000006 (0x6)
> > > [ 297.629806] ffff88015b207b18: ffff880151b02700 (0xffff880151b02700)
> > > [ 297.629809] ffff88015b207b20: 0000010100000000 (0x10100000000)
> > > [ 297.629812] ffff88015b207b28: ffff880156a5fea0 (0xffff880156a5fea0)
> > > [ 297.629815] ffff88015b207b30: ffff88015b207ae0 (0xffff88015b207ae0)
> > > [ 297.629818] ffff88015b207b38: ffffffffc0050282 (0xffffffffc0050282)
> > > [ 297.629819] ffff88015b207b40: 0000000000000000 ...
> > > [ 297.629822] ffff88015b207b48: 0000000001000000 (0x1000000)
> > > [ 297.629825] ffff88015b207b50: ffff880157b98280 (0xffff880157b98280)
> > > [ 297.629828] ffff88015b207b58: ffff880157b98380 (0xffff880157b98380)
> > > [ 297.629831] ffff88015b207b60: ffff88015ad2b500 (0xffff88015ad2b500)
> > > [ 297.629834] ffff88015b207b68: ffff88015b207b78 (0xffff88015b207b78)
> > > [ 297.629838] ffff88015b207b70: ffffffffb163c086 (save_stack_trace+0x16/0x20)
> > > [ 297.629841] ffff88015b207b78: ffff88015b207da8 (0xffff88015b207da8)
> > > [ 297.629847] ffff88015b207b80: ffffffffb18a8ed6 (save_stack+0x46/0xd0)
> > > [ 297.629850] ffff88015b207b88: 000000400000000c (0x400000000c)
> > > [ 297.629852] ffff88015b207b90: ffff88015b207ba0 (0xffff88015b207ba0)
> > > [ 297.629855] ffff88015b207b98: ffff880100000000 (0xffff880100000000)
> > > [ 297.629859] ffff88015b207ba0: ffffffffb163c086 (save_stack_trace+0x16/0x20)
> > > [ 297.629864] ffff88015b207ba8: ffffffffb18a8ed6 (save_stack+0x46/0xd0)
> > > [ 297.629868] ffff88015b207bb0: ffffffffb18a9752 (kasan_slab_free+0x72/0xc0)
> > Thanks for the report!
> > I'm not sure I understand what's going on here.
> > It seems you have kasan enabled and it's trying to do save_stack()
> > and something crashing?
> > I don't see any bpf related helpers in the stack trace.
> > Which architecture is this? and .config ?
> > Is bpf jit enabled? If so, make sure that net.core.bpf_jit_kallsyms=1
>
> I found some time to dig a little further.
> It seems to happen only when CONFIG_DEBUG_SPINLOCK is enabled, please see the
> attached .config. The JIT is off.
> KAsan is also not involved at all, the regular stack saving machinery from the
> trace framework initiates the stack unwinder.
>
> The issue arises as soon as in pre_handler_kretprobe() raw_spin_lock_irqsave()
> is being called.
> It happens on all releases that have commit c32c47c68a0a ("x86/unwind: Warn on
> bad frame pointer").
> Interestingly it does not happen when I run
> samples/kprobes/kretprobe_example.ko. So, BPF must be involved somehow.
>
> Here is another variant of the warning, it matches the attached .config:
I can take a look at it. Unfortunately, for these types of issues I
often need the vmlinux file to be able to make sense of the unwinder
dump. So if you happen to have somewhere to copy the vmlinux to, that
would be helpful. Or if you give me your GCC version I can try to
rebuild it locally.
--
Josh
* Re: [PATCH 6/6] net: dsa: mv88e6xxx: Forward broadcast frames to cpu and dsa ports
From: Andrew Lunn @ 2017-09-26 22:43 UTC (permalink / raw)
To: David Miller; +Cc: Vivien Didelot, netdev
In-Reply-To: <1506464764-12699-8-git-send-email-andrew@lunn.ch>
Ah, twice patch 6. Not good.
I will wait for a few days for comments, and then repost without the
duplication.
Andrew
* Re: [PATCH v2 net-next 1/2] bpf/verifier: improve disassembly of BPF_END instructions
From: Daniel Borkmann @ 2017-09-26 22:53 UTC (permalink / raw)
To: Edward Cree, davem; +Cc: netdev, alexei.starovoitov, ys114321
In-Reply-To: <b0a84ccf-8842-876c-ec82-b4b1da3d6efa@solarflare.com>
On 09/26/2017 05:35 PM, Edward Cree wrote:
> print_bpf_insn() was treating all BPF_ALU[64] the same, but BPF_END has a
> different structure: it has a size in insn->imm (even if it's BPF_X) and
> uses the BPF_SRC (X or K) to indicate which endianness to use. So it
> needs different code to print it.
>
> Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
* Re: [PATCH v2 net-next 2/2] bpf/verifier: improve disassembly of BPF_NEG instructions
From: Daniel Borkmann @ 2017-09-26 22:53 UTC (permalink / raw)
To: Edward Cree, davem; +Cc: netdev, alexei.starovoitov, ys114321
In-Reply-To: <f27b1dfe-4ea2-5737-d3d5-21cb581d1927@solarflare.com>
On 09/26/2017 05:35 PM, Edward Cree wrote:
> BPF_NEG takes only one operand, unlike the bulk of BPF_ALU[64] which are
> compound-assignments. So give it its own format in print_bpf_insn().
>
> Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
* Re: [PATCH net-next 2/2] tools: bpf: add bpftool
From: Jakub Kicinski @ 2017-09-26 23:02 UTC (permalink / raw)
To: Alexei Starovoitov; +Cc: netdev, daniel, davem, hannes, dsahern, oss-drivers
In-Reply-To: <20170926222405.nq23enzudbjklczb@ast-mbp>
On Tue, 26 Sep 2017 15:24:06 -0700, Alexei Starovoitov wrote:
> On Tue, Sep 26, 2017 at 08:35:22AM -0700, Jakub Kicinski wrote:
> > Add a simple tool for querying and updating BPF objects on the system.
> >
> > Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> > Reviewed-by: Simon Horman <simon.horman@netronome.com>
> > ---
> > tools/bpf/Makefile | 18 +-
> > tools/bpf/bpftool/Makefile | 80 +++++
> > tools/bpf/bpftool/common.c | 214 ++++++++++++
> > tools/bpf/bpftool/jit_disasm.c | 83 +++++
> > tools/bpf/bpftool/main.c | 212 ++++++++++++
> > tools/bpf/bpftool/main.h | 99 ++++++
> > tools/bpf/bpftool/map.c | 742 +++++++++++++++++++++++++++++++++++++++++
> > tools/bpf/bpftool/prog.c | 392 ++++++++++++++++++++++
> > 8 files changed, 1837 insertions(+), 3 deletions(-)
> ...
> > +static int do_help(int argc, char **argv)
> > +{
> > + fprintf(stderr,
> > + "Usage: %s %s show [MAP]\n"
> > + " %s %s dump MAP\n"
> > + " %s %s update MAP key BYTES value VALUE [UPDATE_FLAGS]\n"
> > + " %s %s lookup MAP key BYTES\n"
> > + " %s %s getnext MAP [key BYTES]\n"
> > + " %s %s delete MAP key BYTES\n"
> > + " %s %s pin MAP FILE\n"
> > + " %s %s help\n"
> > + "\n"
> > + " MAP := { id MAP_ID | pinned FILE }\n"
> > + " " HELP_SPEC_PROGRAM "\n"
> > + " VALUE := { BYTES | MAP | PROG }\n"
> > + " UPDATE_FLAGS := { any | exist | noexist }\n"
> > + "",
>
> overall looks good to me, but still difficult to grasp how to use it.
> Can you add README with example usage and expected output?
I have a README on GitHub, but I was thinking about perhaps writing a
proper man page? Do you prefer one over the other?
> Acked-by: Alexei Starovoitov <ast@kernel.org>
Thanks!
> You also realize that you're signing up maintaining this tool, right? ;)
Yes :)
* Re: [PATCH net-next 0/2] tools: add bpftool
From: David Ahern @ 2017-09-26 23:32 UTC (permalink / raw)
To: Jakub Kicinski, netdev
Cc: daniel, alexei.starovoitov, davem, hannes, oss-drivers
In-Reply-To: <20170926153522.31500-1-jakub.kicinski@netronome.com>
On 9/26/17 9:35 AM, Jakub Kicinski wrote:
> I'm looking for a home for bpftool, Daniel suggested that
> tools/net could be a good place, since there are only BPF
> utilities there already.
>
> The tool should be complete for simple use cases and we
> will continue extending it as we go along. E.g. providing
> disassembly of loaded programs directly using LLVM library
> and JSON output are high on the priority list.
I have found this to be a very useful tool. Thanks for working on it.
Moving it into the kernel will make it easier to build since it relies
on libbpf and other files from the kernel tree.
One change I have made locally is to link against libbpf.a. That way I
only need to copy one file to a system to use it.
* [next-queue PATCH 0/3] TSN: Add qdisc based config interface for CBS
From: Vinicius Costa Gomes @ 2017-09-26 23:39 UTC (permalink / raw)
To: netdev, intel-wired-lan
Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, andre.guedes,
ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
richardcochran, henrik
Hi,
Changes from the RFC:
- Fixed comments from Henrik Austad;
- Simplified the Qdisc, using the generic implementation of callbacks
where possible;
- Small refactor on the driver (igb) code;
This patchset is a proposal of how the Traffic Control subsystem can
be used to offload the configuration of the Credit Based Shaper
(defined in the IEEE 802.1Q-2014 Section 8.6.8.2) into supported
network devices.
As part of this work, we've assessed previous public discussions
related to TSN enabling: patches from Henrik Austad (Cisco), the
presentation from Eric Mann at Linux Plumbers 2012, patches from
Gangfeng Huang (National Instruments) and the current state of the
OpenAVNU project (https://github.com/AVnu/OpenAvnu/).
Overview
========
Time-Sensitive Networking (TSN) is a set of standards that aim to
address resource availability for providing bandwidth reservation and
bounded latency on Ethernet-based LANs. The proposal described here
aims to cover mainly what is needed to enable the following standards:
802.1Qat and 802.1Qav.
The initial target of this work is the Intel i210 NIC, but other
controllers' datasheets were also taken into account, like the Renesas
RZ/A1H RZ/A1M group and the Synopsys DesignWare Ethernet QoS
controller.
Proposal
========
Feature-wise, what is covered here is the configuration interfaces for
HW implementations of the Credit-Based shaper (CBS, 802.1Qav). CBS is
a per-queue shaper. Given that this feature is related to traffic
shaping, and that the traffic control subsystem already provides a
queueing discipline that offloads config into the device driver (i.e.
mqprio), designing a new qdisc for the specific purpose of offloading
the config for the CBS shaper seemed like a good fit.
For steering traffic into the correct queues, we use the socket option
SO_PRIORITY and then a mechanism to map priority to traffic classes /
Tx queues. The qdisc mqprio is currently used in our tests.
As for the CBS config interface, this patchset is proposing a new
qdisc called 'cbs'. Its 'tc' cmd line is:
$ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \
idleslope I
Note that the parameters for this qdisc are the ones defined by the
802.1Q-2014 spec, so no hardware specific functionality is exposed here.
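As a rough illustration of how the four parameters relate, the sketch
below derives them for a single SR class A stream, using the same
relations as the calc_cbs_params.py script attached to this cover
letter. The function name cbs_params_class_a is hypothetical, and the
1 Gbit/s link speed, the 20 Mbit/s reservation and the 1500-byte frame
sizes are assumed example values.

```python
import math


def cbs_params_class_a(idleslope, link_speed, max_frame_a, max_frame_non_sr):
    """Slopes in kbit/s, frame sizes in bytes; credits come out in bytes."""
    # While sending, credit drains at the reserved rate minus the link rate.
    sendslope = idleslope - link_speed
    # Highest credit: built up while one maximum non-SR frame blocks the port.
    hicredit = math.ceil(idleslope * max_frame_non_sr / link_speed)
    # Lowest credit: reached after sending one maximum class A frame.
    locredit = math.ceil(sendslope * max_frame_a / link_speed)
    return {"idleslope": idleslope, "sendslope": sendslope,
            "hicredit": hicredit, "locredit": locredit}


params = cbs_params_class_a(idleslope=20000, link_speed=1000000,
                            max_frame_a=1500, max_frame_non_sr=1500)
print("tc qdisc add dev IFACE parent ID cbs locredit %(locredit)d "
      "hicredit %(hicredit)d sendslope %(sendslope)d "
      "idleslope %(idleslope)d" % params)
```

Note that sendslope is always negative and idleslope minus sendslope
equals the link speed, so only the idleslope and the frame sizes are
really free choices here.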
Testing this RFC
================
Attached to this cover letter are:
- calc_cbs_params.py: A Python script to calculate the
parameters for the CBS queueing discipline;
- tsn-talker.c: A sample C implementation of the talker side of a stream;
- tsn-listener.c: A sample C implementation of the listener side of a
stream;
For testing the patches of this series, you may want to use the
samples attached to this cover letter and use the 'mqprio' qdisc to
set up the priority to Tx queue mapping, together with the 'cbs'
qdisc to configure the HW shaper of the i210 controller:
1) Set up the mapping of priorities to traffic classes to hardware queues
$ tc qdisc replace dev ens4 handle 100: parent root mqprio num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
For a more detailed explanation, see mqprio(8). In short, this command
maps traffic with priority 3 to hardware queue 0, traffic with
priority 2 to hardware queue 1, and the rest to hardware queues 2
and 3.
2) Check the resulting scheme. You want to get the inner qdisc IDs from the bottom up
$ tc -g class show dev ens4
Ex.:
+---(100:3) mqprio
| +---(100:6) mqprio
| +---(100:7) mqprio
|
+---(100:2) mqprio
| +---(100:5) mqprio
|
+---(100:1) mqprio
+---(100:4) mqprio
* Here '100:4' is Tx Queue #0 and '100:5' is Tx Queue #1.
3) Calculate the CBS parameters for classes A and B, e.g. with a bandwidth of
20 Mbps for A and 10 Mbps for B:
$ calc_cbs_params.py -A 20000 -a 1500 -B 10000 -b 1500
4) Configure CBS for traffic class A (priority 3) as provided by the script:
$ tc qdisc replace dev ens4 parent 100:4 cbs locredit -1470 \
hicredit 30 sendslope -980000 idleslope 20000
5) Configure CBS for traffic class B (priority 2):
$ tc qdisc replace dev ens4 parent 100:5 cbs \
locredit -1485 hicredit 31 sendslope -990000 idleslope 10000
6) Run Listener:
$ ./tsn-listener -d 01:AA:AA:AA:AA:AA -i ens4 -s 1500
7) Run Talker for class A (prio 3 here), compiled from samples/tsn/talker.c
$ ./tsn-talker -d 01:AA:AA:AA:AA:AA -i ens4 -p 3 -s 1500
* The bandwidth displayed on the listener output at this stage should be very
close to the one configured for class A.
8) You can also run a Talker for class B (prio 2 here and using a
different address):
$ ./tsn-talker -d 01:BB:BB:BB:BB:BB -i ens4 -p 2 -s 1500
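The priority mapping set up in step 1 can be double-checked with a
small sketch. The helper below (mqprio_queues is a hypothetical name,
not part of tc) expands the 'map' and 'queues' arguments of that
mqprio command into a priority to hardware queue table.

```python
def mqprio_queues(prio_map, queues):
    """Map each priority to its list of hardware queues.

    prio_map: traffic class for each priority (the 'map' argument).
    queues: 'count@offset' strings, one per traffic class.
    """
    tc_queues = []
    for spec in queues:
        count, offset = (int(x) for x in spec.split("@"))
        tc_queues.append(list(range(offset, offset + count)))
    return {prio: tc_queues[tc] for prio, tc in enumerate(prio_map)}


prio_map = [2, 2, 1, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
mapping = mqprio_queues(prio_map, ["1@0", "1@1", "2@2"])
print(mapping[3])  # [0]
print(mapping[2])  # [1]
print(mapping[0])  # [2, 3]
```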
Known Issues
============
- There is an implicit dependency on how mqprio assigns handles to
hardware queues;
- There is a problem with how mqprio assigns hardware queues to its
child qdiscs. A separate patchset is being worked on to solve
this.
Authors
=======
- Andre Guedes <andre.guedes@intel.com>
- Ivan Briano <ivan.briano@intel.com>
- Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
- Vinicius Gomes <vinicius.gomes@intel.com>
Andre Guedes (1):
igb: Add support for CBS offload
Vinicius Costa Gomes (2):
net/sched: Introduce the user API for the CBS shaper
net/sched: Introduce Credit Based Shaper (CBS) qdisc
drivers/net/ethernet/intel/igb/e1000_defines.h | 23 ++
drivers/net/ethernet/intel/igb/e1000_regs.h | 8 +
drivers/net/ethernet/intel/igb/igb.h | 6 +
drivers/net/ethernet/intel/igb/igb_main.c | 347 +++++++++++++++++++++++++
include/linux/netdevice.h | 1 +
include/net/pkt_sched.h | 9 +
include/uapi/linux/pkt_sched.h | 17 ++
net/sched/Kconfig | 12 +
net/sched/Makefile | 1 +
net/sched/sch_cbs.c | 229 ++++++++++++++++
10 files changed, 653 insertions(+)
create mode 100644 net/sched/sch_cbs.c
Annex: Sample files
===================
calc_cbs_params.py
--8<---------------cut here---------------start------------->8---
#!/usr/bin/env python
#
# Copyright (c) 2017, Intel Corporation
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice,
# this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of Intel Corporation nor the names of its contributors
# may be used to endorse or promote products derived from this software
# without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
import argparse
import math


def print_cbs_params_for_class_a(args):
    idleslope = args.idleslope_a
    sendslope = idleslope - args.link_speed

    # According to 802.1Q-2014 spec, Annex L, hiCredit and
    # loCredit for SR class A are calculated following the
    # equations L-10 and L-12, respectively.
    hicredit = math.ceil(idleslope * args.frame_non_sr / args.link_speed)
    locredit = math.ceil(sendslope * args.frame_a / args.link_speed)

    print("tc qdisc add dev <IFNAME> parent <QDISC-ID> cbs idleslope %d sendslope %d hicredit %d locredit %d" % \
          (idleslope, sendslope, hicredit, locredit))


def print_cbs_params_for_class_b(args):
    idleslope = args.idleslope_b
    sendslope = idleslope - args.link_speed

    # Annex L doesn't present a straightforward equation to
    # calculate hiCredit for Class B so we have to derive it
    # based on generic equations presented in that Annex.
    #
    # L-3 is the primary equation to calculate hiCredit. Section
    # L.2 states that the 'maxInterferenceSize' for SR class B
    # is the maximum burst size for SR class A plus the
    # maxInterferenceSize from SR class A (which is equal to the
    # maximum frame from non-SR traffic).
    #
    # The maximum burst size for SR class A equation is shown in
    # L-16. Merging L-16 into L-3 we get the resulting equation
    # which calculates hiCredit B (refer to section L.3 in case
    # you're not familiar with the legend):
    #
    #                    (     Mo         Ma   )
    #  hiCredit B = Rb * ( ---------- + ------ )
    #                    (  Ro - Ra       Ro   )
    #
    hicredit = math.ceil(idleslope * \
                         ((args.frame_non_sr / (args.link_speed - args.idleslope_a)) + \
                          (args.frame_a / args.link_speed)))

    # loCredit B is calculated following equation L-2.
    locredit = math.ceil(sendslope * args.frame_b / args.link_speed)

    print("tc qdisc add dev <IFNAME> parent <QDISC-ID> cbs idleslope %d sendslope %d hicredit %d locredit %d" % \
          (idleslope, sendslope, hicredit, locredit))


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-S', dest='link_speed', default=1000000.0, type=float,
                        help='Link speed in kbps')
    parser.add_argument('-s', dest='frame_non_sr', default=1500.0, type=float,
                        help='Maximum frame size from non-SR traffic (MTU size '
                             'usually)')
    parser.add_argument('-A', dest='idleslope_a', default=0, type=float,
                        help='Idleslope for SR class A in kbps')
    parser.add_argument('-a', dest='frame_a', default=0, type=float,
                        help='Maximum frame size for SR class A traffic')
    parser.add_argument('-B', dest='idleslope_b', default=0, type=float,
                        help='Idleslope for SR class B in kbps')
    parser.add_argument('-b', dest='frame_b', default=0, type=float,
                        help='Maximum frame size for SR class B traffic')

    args = parser.parse_args()

    if args.idleslope_a > 0:
        print_cbs_params_for_class_a(args)

    if args.idleslope_b > 0:
        print_cbs_params_for_class_b(args)


if __name__ == "__main__":
    main()
--8<---------------cut here---------------end--------------->8---
tsn-talker.c
--8<---------------cut here---------------start------------->8---
/*
* Copyright (c) 2017, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
* SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
* STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include <alloca.h>
#include <argp.h>
#include <arpa/inet.h>
#include <inttypes.h>
#include <linux/if.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#define MAGIC 0xCC
static char ifname[IFNAMSIZ]; /* char, so it can be passed to strncpy() */
static uint8_t macaddr[ETH_ALEN];
static int priority = -1;
static size_t size = 1500;
static uint64_t seq;
static int delay = -1;
static struct argp_option options[] = {
{"dst-addr", 'd', "MACADDR", 0, "Stream Destination MAC address" },
{"delay", 'D', "NUM", 0, "Delay (in us) between packet transmission" },
{"ifname", 'i', "IFNAME", 0, "Network Interface" },
{"prio", 'p', "NUM", 0, "SO_PRIORITY to be set in socket" },
{"packet-size", 's', "NUM", 0, "Size of packets to be transmitted" },
{ 0 }
};
static error_t parser(int key, char *arg, struct argp_state *state)
{
int res;
switch (key) {
case 'd':
res = sscanf(arg, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
&macaddr[0], &macaddr[1], &macaddr[2],
&macaddr[3], &macaddr[4], &macaddr[5]);
if (res != 6) {
printf("Invalid address\n");
exit(EXIT_FAILURE);
}
break;
case 'D':
delay = atoi(arg);
break;
case 'i':
strncpy(ifname, arg, sizeof(ifname) - 1);
break;
case 'p':
priority = atoi(arg);
break;
case 's':
size = atoi(arg);
break;
}
return 0;
}
static struct argp argp = { options, parser };
int main(int argc, char *argv[])
{
int fd, res;
struct ifreq req;
uint8_t *data;
struct sockaddr_ll sk_addr = {
.sll_family = AF_PACKET,
.sll_protocol = htons(ETH_P_TSN),
.sll_halen = ETH_ALEN,
};
argp_parse(&argp, argc, argv, 0, NULL, NULL);
fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_TSN));
if (fd < 0) {
perror("Couldn't open socket");
return 1;
}
strncpy(req.ifr_name, ifname, sizeof(req.ifr_name));
res = ioctl(fd, SIOCGIFINDEX, &req);
if (res < 0) {
perror("Couldn't get interface index");
goto err;
}
sk_addr.sll_ifindex = req.ifr_ifindex;
memcpy(&sk_addr.sll_addr, macaddr, ETH_ALEN);
if (priority != -1) {
res = setsockopt(fd, SOL_SOCKET, SO_PRIORITY, &priority,
sizeof(priority));
if (res < 0) {
perror("Couldn't set priority");
goto err;
}
}
data = alloca(size);
memset(data, MAGIC, size);
printf("Sending packets...\n");
while (1) {
uint64_t *seq_ptr = (uint64_t *) &data[0];
ssize_t n;
*seq_ptr = seq++;
n = sendto(fd, data, size, 0, (struct sockaddr *) &sk_addr,
sizeof(sk_addr));
if (n < 0)
perror("Failed to send data");
if (delay > 0)
usleep(delay);
}
close(fd);
return 0;
err:
close(fd);
return 1;
}
--8<---------------cut here---------------end--------------->8---
tsn-listener.c
--8<---------------cut here---------------start------------->8---
/*
* Copyright (c) 2017, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
* (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
* SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
* STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
* OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include <alloca.h>
#include <argp.h>
#include <arpa/inet.h>
#include <inttypes.h>
#include <linux/if.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <poll.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/timerfd.h>
#include <unistd.h>
static char ifname[IFNAMSIZ]; /* char, so it can be passed to strncpy() */
static uint8_t macaddr[ETH_ALEN];
static uint64_t data_count;
static int size = 1500;
static time_t interval = 1;
static bool check_seq = false;
static uint64_t expected_seq;
static struct argp_option options[] = {
{"check-seq", 'c', NULL, 0, "Check sequence number within packet" },
{"dst-addr", 'd', "MACADDR", 0, "Stream Destination MAC address" },
{"ifname", 'i', "IFNAME", 0, "Network Interface" },
{"interval", 'I', "SEC", 0, "Interval between bandwidth reports" },
{"packet-size", 's', "NUM", 0, "Expected packet size" },
{ 0 }
};
static error_t parser(int key, char *arg, struct argp_state *state)
{
int res;
switch (key) {
case 'c':
check_seq = true;
break;
case 'd':
res = sscanf(arg, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
&macaddr[0], &macaddr[1], &macaddr[2],
&macaddr[3], &macaddr[4], &macaddr[5]);
if (res != 6) {
printf("Invalid address\n");
exit(EXIT_FAILURE);
}
break;
case 'i':
strncpy(ifname, arg, sizeof(ifname) - 1);
break;
case 'I':
interval = atoi(arg);
break;
case 's':
size = atoi(arg);
break;
}
return 0;
}
static struct argp argp = { options, parser };
static int setup_timer(void)
{
int fd, res;
struct itimerspec tspec = { 0 };
fd = timerfd_create(CLOCK_MONOTONIC, 0);
if (fd < 0) {
perror("Couldn't create timer");
return -1;
}
tspec.it_value.tv_sec = interval;
tspec.it_interval.tv_sec = interval;
res = timerfd_settime(fd, 0, &tspec, NULL);
if (res < 0) {
perror("Couldn't set timer");
close(fd);
return -1;
}
return fd;
}
static int setup_socket(void)
{
int fd, res;
struct sockaddr_ll sk_addr = {
.sll_family = AF_PACKET,
.sll_protocol = htons(ETH_P_TSN),
};
fd = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_TSN));
if (fd < 0) {
perror("Couldn't open socket");
return -1;
}
/* If user provided a network interface, bind() to it. */
if (ifname[0] != '\0') {
struct ifreq req;
strncpy(req.ifr_name, ifname, sizeof(req.ifr_name));
res = ioctl(fd, SIOCGIFINDEX, &req);
if (res < 0) {
perror("Couldn't get interface index");
goto err;
}
sk_addr.sll_ifindex = req.ifr_ifindex;
res = bind(fd, (struct sockaddr *) &sk_addr, sizeof(sk_addr));
if (res < 0) {
perror("Couldn't bind() to interface");
goto err;
}
}
/* If user provided the stream destination address, set it as multicast
* address.
*/
if (macaddr[0] != '\0') {
struct packet_mreq mreq;
mreq.mr_ifindex = sk_addr.sll_ifindex;
mreq.mr_type = PACKET_MR_MULTICAST;
mreq.mr_alen = ETH_ALEN;
memcpy(&mreq.mr_address, macaddr, ETH_ALEN);
res = setsockopt(fd, SOL_PACKET, PACKET_ADD_MEMBERSHIP,
&mreq, sizeof(struct packet_mreq));
if (res < 0) {
perror("Couldn't set PACKET_ADD_MEMBERSHIP");
goto err;
}
}
return fd;
err:
close(fd);
return -1;
}
static void recv_packet(int fd)
{
uint8_t *data = alloca(size);
ssize_t n = recv(fd, data, size, 0);
if (n < 0) {
perror("Failed to receive data");
return;
}
if (n != size)
printf("Size mismatch: expected %d, got %zd\n", size, n);
if (check_seq && n >= (ssize_t) sizeof(uint64_t)) {
uint64_t *seq = (uint64_t *) &data[0];
/* If 'expected_seq' is equal to zero, it means this is the
* first packet we received so we don't know what sequence
* number to expect.
*/
if (expected_seq == 0)
expected_seq = *seq;
if (*seq != expected_seq) {
printf("Sequence mismatch: expected %" PRIu64 ", got %" PRIu64 "\n",
expected_seq, *seq);
expected_seq = *seq;
}
expected_seq++;
}
data_count += n;
}
static void report_bw(int fd)
{
uint64_t expirations;
ssize_t n = read(fd, &expirations, sizeof(uint64_t));
if (n < 0) {
perror("Couldn't read timerfd");
return;
}
if (expirations != 1)
printf("Something went wrong with timerfd\n");
printf("Receiving data rate: %" PRIu64 " kbps\n",
(uint64_t) ((data_count * 8) / (1000 * interval)));
data_count = 0;
}
int main(int argc, char *argv[])
{
int sk_fd, timer_fd, res;
struct pollfd fds[2];
argp_parse(&argp, argc, argv, 0, NULL, NULL);
sk_fd = setup_socket();
if (sk_fd < 0)
return 1;
timer_fd = setup_timer();
if (timer_fd < 0) {
close(sk_fd);
return 1;
}
fds[0].fd = sk_fd;
fds[0].events = POLLIN;
fds[1].fd = timer_fd;
fds[1].events = POLLIN;
printf("Waiting for packets...\n");
while (1) {
res = poll(fds, 2, -1);
if (res < 0) {
perror("Error on poll()");
goto err;
}
if (fds[0].revents & POLLIN)
recv_packet(fds[0].fd);
if (fds[1].revents & POLLIN) {
report_bw(fds[1].fd);
}
}
close(timer_fd);
close(sk_fd);
return 0;
err:
close(timer_fd);
close(sk_fd);
return 1;
}
--8<---------------cut here---------------end--------------->8---
* [next-queue PATCH 1/3] net/sched: Introduce the user API for the CBS shaper
From: Vinicius Costa Gomes @ 2017-09-26 23:39 UTC (permalink / raw)
To: netdev, intel-wired-lan
Cc: Vinicius Costa Gomes, jhs, xiyou.wangcong, jiri, andre.guedes,
ivan.briano, jesus.sanchez-palencia, boon.leong.ong,
richardcochran, henrik
In-Reply-To: <20170926233916.11774-1-vinicius.gomes@intel.com>
Export the API necessary for configuring the CBS shaper (implemented
in the next patch) via the tc tool.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
---
include/uapi/linux/pkt_sched.h | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 099bf5528fed..27c849c053cf 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -871,4 +871,21 @@ struct tc_pie_xstats {
__u32 maxq; /* maximum queue size */
__u32 ecn_mark; /* packets marked with ecn*/
};
+
+/* CBS */
+struct tc_cbs_qopt {
+ __s32 hicredit;
+ __s32 locredit;
+ __s32 idleslope;
+ __s32 sendslope;
+};
+
+enum {
+ TCA_CBS_UNSPEC,
+ TCA_CBS_PARMS,
+ __TCA_CBS_MAX,
+};
+
+#define TCA_CBS_MAX (__TCA_CBS_MAX - 1)
+
#endif
--
2.14.2
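
For readers who want to see how the four fields of struct tc_cbs_qopt
relate to each other, here is a minimal sketch. It is not part of the
patch. It uses the usual credit-based shaper relations from IEEE 802.1Q
(sendslope = idleslope - port rate, credits scaled by frame size), and it
assumes slopes in kbit/s and credits in bytes, which the patch itself does
not state. The struct cbs_qopt_sketch and the helper cbs_params() are
hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the struct added by the patch, so the sketch builds
 * without the patched <linux/pkt_sched.h>. int32_t stands in for __s32.
 */
struct cbs_qopt_sketch {
	int32_t hicredit;
	int32_t locredit;
	int32_t idleslope;
	int32_t sendslope;
};

/* Hypothetical helper, not part of the patch. Assumed units: slopes and
 * port rate in kbit/s, frame sizes and credits in bytes.
 */
static struct cbs_qopt_sketch cbs_params(int32_t idleslope_kbps,
					 int32_t port_rate_kbps,
					 int32_t max_frame_bytes,
					 int32_t max_interference_bytes)
{
	struct cbs_qopt_sketch q;

	assert(port_rate_kbps > 0);
	assert(idleslope_kbps > 0 && idleslope_kbps <= port_rate_kbps);

	q.idleslope = idleslope_kbps;
	/* Credit drains at this (negative) rate while the queue transmits. */
	q.sendslope = idleslope_kbps - port_rate_kbps;
	/* 64-bit intermediates avoid overflow in bytes * kbit/s. */
	q.hicredit = (int32_t) (((int64_t) max_interference_bytes *
				 q.idleslope) / port_rate_kbps);
	q.locredit = (int32_t) (((int64_t) max_frame_bytes *
				 q.sendslope) / port_rate_kbps);
	return q;
}
```

Under those assumptions, reserving 20 Mbit/s on a 1 Gbit/s port with
1500-byte frames would be cbs_params(20000, 1000000, 1500, 1500). The
resulting values would then be copied into a struct tc_cbs_qopt and sent
as the TCA_CBS_PARMS attribute.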