* Re: [RFC bpf-next] bpf: document eBPF helpers and add a script to generate man page
From: Alexei Starovoitov @ 2018-04-09 3:48 UTC
To: Quentin Monnet; +Cc: daniel, ast, netdev, oss-drivers, linux-doc, linux-man
In-Reply-To: <20180406111122.11038-1-quentin.monnet@netronome.com>
On Fri, Apr 06, 2018 at 12:11:22PM +0100, Quentin Monnet wrote:
> eBPF helper functions can be called from within eBPF programs to perform
> a variety of tasks that would be otherwise hard or impossible to do with
> eBPF itself. There is a growing number of such helper functions in the
> kernel, but documentation is scarce. The main user space header file
> does contain a short commented description of most helpers, but it is
> somewhat outdated and not complete. It is more a "cheat sheet" than a
> real documentation accessible to new eBPF developers.
>
> This commit attempts to improve the situation by replacing the existing
> overview for the helpers with a more developed description. Furthermore,
> a Python script is added to generate a manual page for eBPF helpers. The
> workflow is the following, and requires the rst2man utility:
>
> $ ./scripts/bpf_helpers_doc.py \
> --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
> $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
> $ man /tmp/bpf-helpers.7
>
> The objective is to keep all documentation related to the helpers in a
> single place, and to be able to generate from here a manual page that
> could be packaged in the man-pages repository and shipped with most
> distributions [1].
>
> Additionally, parsing the prototypes of the helper functions could
> hopefully be reused, with a different Printer object, to generate
> header files needed in some eBPF-related projects.
>
> Regarding the description of each helper, it comprises several items:
>
> - The function prototype.
> - A description of the function and of its arguments (except for a
> couple of cases, when there are no arguments and the return value
> makes the function usage really obvious).
> - A description of return values (if not void).
> - A listing of eBPF program types (if relevant, map types) compatible
> with the helper.
> - Information about the helper being restricted to GPL programs, or not.
> - The kernel version in which the helper was introduced.
> - The commit that introduced the helper (this is mostly to have it in
> the source of the man page, as it can be used to track changes and
> update the page).
>
> For several helpers, descriptions are inspired (at times, nearly copied)
> from the commit logs introducing them in the kernel. Many thanks to
> their respective authors! They were completed as much as possible, the
> objective being to have something easily accessible even for people just
> starting with eBPF. There is probably a bit more work to do in this
> direction for some helpers.
>
> Some RST formatting is used in the descriptions (not in function
> prototypes, to keep them readable, but the Python script provided in
> order to generate the RST for the manual page does add formatting to
> prototypes, to produce something pretty) to get "bold" and "italics" in
> manual pages. Hopefully, the descriptions in the bpf.h file remain
> perfectly readable. Note that the few trailing white spaces are
> intentional; removing them would break paragraphs for rst2man.
>
> The descriptions should ideally be updated each time someone adds a new
> helper, or updates the behaviour (compatibility extended to new program
> types, new socket option supported...) or the interface (new flags
> available, ...) of existing ones.
>
> [1] I have not contacted people from the man-pages project prior to
> sending this RFC, so I can offer no guarantee at this time that they
> would agree to take the generated man page.
>
> Cc: linux-doc@vger.kernel.org
> Cc: linux-man@vger.kernel.org
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
> ---
> include/uapi/linux/bpf.h | 2237 ++++++++++++++++++++++++++++++++++++--------
> scripts/bpf_helpers_doc.py | 568 +++++++++++
> 2 files changed, 2429 insertions(+), 376 deletions(-)
> create mode 100755 scripts/bpf_helpers_doc.py
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c5ec89732a8d..f47aeddbbe0a 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -367,394 +367,1879 @@ union bpf_attr {
>
> /* BPF helper function descriptions:
> *
> - * void *bpf_map_lookup_elem(&map, &key)
> - * Return: Map value or NULL
> - *
> - * int bpf_map_update_elem(&map, &key, &value, flags)
> - * Return: 0 on success or negative error
> - *
> - * int bpf_map_delete_elem(&map, &key)
> - * Return: 0 on success or negative error
> - *
> - * int bpf_probe_read(void *dst, int size, void *src)
> - * Return: 0 on success or negative error
> + * void *bpf_map_lookup_elem(struct bpf_map *map, void *key)
> + * Description
> + * Perform a lookup in *map* for an entry associated to *key*.
> + * Return
> + * Map value associated to *key*, or **NULL** if no entry was
> + * found.
> + * For
> + * All types of programs. Limited to maps of types
> + * **BPF_MAP_TYPE_HASH**,
> + * **BPF_MAP_TYPE_ARRAY**,
> + * **BPF_MAP_TYPE_PERCPU_HASH**,
> + * **BPF_MAP_TYPE_PERCPU_ARRAY**,
> + * **BPF_MAP_TYPE_LRU_HASH**,
> + * **BPF_MAP_TYPE_LRU_PERCPU_HASH**,
> + * **BPF_MAP_TYPE_LPM_TRIE**,
> + * **BPF_MAP_TYPE_ARRAY_OF_MAPS**,
> + * and **BPF_MAP_TYPE_HASH_OF_MAPS**.
> + * GPL only
> + * No
> + * Since
> + * Linux 3.19
I think we should give it a try. There is a risk that it will become stale,
and to reduce that I'd propose removing 'For', 'GPL only' and 'Since':
for new helpers 'Since' is kinda hard to fill in before it lands
all the way, and 'For' keeps changing as new types are added.
'Description' is the most useful and it needs a separate, thorough review
for every helper.
* Re: [PATCH net 2/6] ip_tunnel: better validate user provided tunnel names
From: Sasha Levin @ 2018-04-09 3:37 UTC
To: Sasha Levin, Eric Dumazet, David S . Miller
Cc: netdev, stable@vger.kernel.org
In-Reply-To: <20180405133931.207634-3-edumazet@google.com>
Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: c54419321455 GRE: Refactor GRE tunneling code..
The bot has also determined it's probably a bug fixing patch. (score: 46.6256)
The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.
v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!
* Re: [PATCH net 0/6] net: better validate user provided tunnel names
From: Sasha Levin @ 2018-04-09 3:37 UTC
To: Sasha Levin, Eric Dumazet, David S . Miller
Cc: netdev, stable@vger.kernel.org
In-Reply-To: <20180405133931.207634-1-edumazet@google.com>
Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: ed1efb2aefbb ipv6: Add support for IPsec virtual tunnel interfaces.
The bot has also determined it's probably a bug fixing patch. (score: 53.6463)
The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.
v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!
* Re: [PATCH] net: phy: marvell: Enable interrupt function on LED2 pin
From: Sasha Levin @ 2018-04-09 3:37 UTC
To: Sasha Levin, Esben Haabendal, Esben Haabendal,
netdev@vger.kernel.org
Cc: Esben Haabendal, Rasmus Villemoes, stable@vger.kernel.org
In-Reply-To: <20180405133504.12257-1-esben.haabendal@gmail.com>
Hi,
[This is an automated email]
This commit has been processed by the -stable helper bot and determined
to be a high probability candidate for -stable trees. (score: 7.3040)
The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.
v4.16: Build OK!
v4.15.15: Build failed! Errors:
drivers/net/phy/marvell.c:472:9: error: implicit declaration of function ‘phy_modify’; did you mean ‘pmd_modify’? [-Werror=implicit-function-declaration]
v4.14.32: Build failed! Errors:
drivers/net/phy/marvell.c:472:9: error: implicit declaration of function ‘phy_modify’; did you mean ‘pmd_modify’? [-Werror=implicit-function-declaration]
v4.9.92: Failed to apply! Possible dependencies:
864dc729d528 ("net: phy: marvell: Refactor m88e1121 RGMII delay configuration")
v4.4.126: Failed to apply! Possible dependencies:
864dc729d528 ("net: phy: marvell: Refactor m88e1121 RGMII delay configuration")
Please let us know if you'd like to have this patch included in a stable tree.
--
Thanks,
Sasha
* Re: [PATCH net 6/6] vti6: better validate user provided tunnel names
From: Sasha Levin @ 2018-04-09 3:37 UTC
To: Sasha Levin, Eric Dumazet, David S . Miller
Cc: netdev, Steffen Klassert, stable@vger.kernel.org
In-Reply-To: <20180405133931.207634-7-edumazet@google.com>
Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: ed1efb2aefbb ipv6: Add support for IPsec virtual tunnel interfaces.
The bot has also determined it's probably a bug fixing patch. (score: 65.4654)
The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.
v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!
* Re: [PATCH net 4/6] ip6_gre: better validate user provided tunnel names
From: Sasha Levin @ 2018-04-09 3:37 UTC
To: Sasha Levin, Eric Dumazet, David S . Miller
Cc: netdev, stable@vger.kernel.org
In-Reply-To: <20180405133931.207634-5-edumazet@google.com>
Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: c12b395a4664 gre: Support GRE over IPv6.
The bot has also determined it's probably a bug fixing patch. (score: 52.9896)
The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.
v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!
* Re: [PATCH net 5/6] ip6_tunnel: better validate user provided tunnel names
From: Sasha Levin @ 2018-04-09 3:37 UTC
To: Sasha Levin, Eric Dumazet, David S . Miller
Cc: netdev, stable@vger.kernel.org
In-Reply-To: <20180405133931.207634-6-edumazet@google.com>
Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 1da177e4c3f4 Linux-2.6.12-rc2.
The bot has also determined it's probably a bug fixing patch. (score: 24.0820)
The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.
v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!
* Re: [PATCH net 3/6] ipv6: sit: better validate user provided tunnel names
From: Sasha Levin @ 2018-04-09 3:37 UTC
To: Sasha Levin, Eric Dumazet, David S . Miller
Cc: netdev, stable@vger.kernel.org
In-Reply-To: <20180405133931.207634-4-edumazet@google.com>
Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 1da177e4c3f4 Linux-2.6.12-rc2.
The bot has also determined it's probably a bug fixing patch. (score: 53.2877)
The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.
v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!
* Re: [PATCH net] arp: fix arp_filter on l3slave devices
From: Sasha Levin @ 2018-04-09 3:36 UTC
To: Sasha Levin, Miguel Fadon Perlines, netdev@vger.kernel.org
Cc: David Ahern, stable@vger.kernel.org
In-Reply-To: <1522916738-192046-1-git-send-email-mfadon@teldat.com>
Hi,
[This is an automated email]
This commit has been processed by the -stable helper bot and determined
to be a high probability candidate for -stable trees. (score: 33.5930)
The bot has tested the following trees: v4.16, v4.15.15, v4.14.32, v4.9.92, v4.4.126.
v4.16: Build OK!
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Build OK!
v4.4.126: Build OK!
Please let us know if you'd like to have this patch included in a stable tree.
* Re: [RFC PATCH bpf-next 2/6] bpf: add bpf_get_stack helper
From: Alexei Starovoitov @ 2018-04-09 3:34 UTC
To: Yonghong Song, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20180406214846.916265-3-yhs@fb.com>
On 4/6/18 2:48 PM, Yonghong Song wrote:
> Currently, stackmap and the bpf_get_stackid helper are provided
> for bpf programs to get the stack trace. This approach has
> a limitation though. If two stack traces have the same hash,
> only one will get stored in the stackmap table,
> so some stack traces are missing from the user's perspective.
>
> This patch implements a new helper, bpf_get_stack, which will
> send stack traces directly to the bpf program. The bpf program
> is able to see all stack traces, and then can do in-kernel
> processing or send stack traces to user space through
> a shared map or bpf_perf_event_output.
>
> Signed-off-by: Yonghong Song <yhs@fb.com>
> ---
> include/linux/bpf.h | 1 +
> include/linux/filter.h | 3 ++-
> include/uapi/linux/bpf.h | 17 +++++++++++++--
> kernel/bpf/stackmap.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
> kernel/bpf/syscall.c | 12 ++++++++++-
> kernel/bpf/verifier.c | 3 +++
> kernel/trace/bpf_trace.c | 50 +++++++++++++++++++++++++++++++++++++++++-
> 7 files changed, 137 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 95a7abd..72ccb9a 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -676,6 +676,7 @@ extern const struct bpf_func_proto bpf_get_current_comm_proto;
> extern const struct bpf_func_proto bpf_skb_vlan_push_proto;
> extern const struct bpf_func_proto bpf_skb_vlan_pop_proto;
> extern const struct bpf_func_proto bpf_get_stackid_proto;
> +extern const struct bpf_func_proto bpf_get_stack_proto;
> extern const struct bpf_func_proto bpf_sock_map_update_proto;
>
> /* Shared helpers among cBPF and eBPF. */
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index fc4e8f9..9b64f63 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -467,7 +467,8 @@ struct bpf_prog {
> dst_needed:1, /* Do we need dst entry? */
> blinded:1, /* Was blinded */
> is_func:1, /* program is a bpf function */
> - kprobe_override:1; /* Do we override a kprobe? */
> + kprobe_override:1, /* Do we override a kprobe? */
> + need_callchain_buf:1; /* Needs callchain buffer? */
> enum bpf_prog_type type; /* Type of BPF program */
> enum bpf_attach_type expected_attach_type; /* For some prog types */
> u32 len; /* Number of filter blocks */
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index c5ec897..a4ff5b7 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -517,6 +517,17 @@ union bpf_attr {
> * other bits - reserved
> * Return: >= 0 stackid on success or negative error
> *
> + * int bpf_get_stack(ctx, buf, size, flags)
> + * walk user or kernel stack and store the ips in buf
> + * @ctx: struct pt_regs*
> + * @buf: user buffer to fill stack
> + * @size: the buf size
> + * @flags: bits 0-7 - number of stack frames to skip
> + * bit 8 - collect user stack instead of kernel
> + * bit 11 - get build-id as well if user stack
> + * other bits - reserved
> + * Return: >= 0 size copied on success or negative error
> + *
> * s64 bpf_csum_diff(from, from_size, to, to_size, seed)
> * calculate csum diff
> * @from: raw from buffer
> @@ -821,7 +832,8 @@ union bpf_attr {
> FN(msg_apply_bytes), \
> FN(msg_cork_bytes), \
> FN(msg_pull_data), \
> - FN(bind),
> + FN(bind), \
> + FN(get_stack),
>
> /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> * function eBPF program intends to call
> @@ -855,11 +867,12 @@ enum bpf_func_id {
> /* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
> #define BPF_F_TUNINFO_IPV6 (1ULL << 0)
>
> -/* BPF_FUNC_get_stackid flags. */
> +/* BPF_FUNC_get_stackid and BPF_FUNC_get_stack flags. */
> #define BPF_F_SKIP_FIELD_MASK 0xffULL
> #define BPF_F_USER_STACK (1ULL << 8)
> #define BPF_F_FAST_STACK_CMP (1ULL << 9)
> #define BPF_F_REUSE_STACKID (1ULL << 10)
> +#define BPF_F_USER_BUILD_ID (1ULL << 11)
the comment above is not quite correct.
This new flag is only available for the new helper.
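To illustrate, here is a minimal userspace sketch of keeping a separate
accepted-flags mask per helper. The flag values are copied from the hunk
quoted above, and the bpf_get_stack mask mirrors the check in the quoted
stackmap.c hunk. The mask and function names (STACKID_FLAGS,
GET_STACK_FLAGS, stackid_flags_ok, get_stack_flags_ok) are hypothetical
and not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in the quoted include/uapi/linux/bpf.h hunk. */
#define BPF_F_SKIP_FIELD_MASK	0xffULL
#define BPF_F_USER_STACK	(1ULL << 8)
#define BPF_F_FAST_STACK_CMP	(1ULL << 9)
#define BPF_F_REUSE_STACKID	(1ULL << 10)
#define BPF_F_USER_BUILD_ID	(1ULL << 11)

/* Hypothetical per-helper masks: each helper accepts a different subset. */
#define STACKID_FLAGS	(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK | \
			 BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID)
#define GET_STACK_FLAGS	(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK | \
			 BPF_F_USER_BUILD_ID)

/* True when no bit outside the bpf_get_stackid mask is set. */
static bool stackid_flags_ok(uint64_t flags)
{
	return !(flags & ~STACKID_FLAGS);
}

/* True when no bit outside the bpf_get_stack mask is set. */
static bool get_stack_flags_ok(uint64_t flags)
{
	return !(flags & ~GET_STACK_FLAGS);
}
```

With masks like these, BPF_F_USER_BUILD_ID passes only the bpf_get_stack
check, and BPF_F_REUSE_STACKID passes only the bpf_get_stackid check.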
>
> /* BPF_FUNC_skb_set_tunnel_key flags. */
> #define BPF_F_ZERO_CSUM_TX (1ULL << 1)
> diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
> index 04f6ec1..371c72e 100644
> --- a/kernel/bpf/stackmap.c
> +++ b/kernel/bpf/stackmap.c
> @@ -402,6 +402,62 @@ const struct bpf_func_proto bpf_get_stackid_proto = {
> .arg3_type = ARG_ANYTHING,
> };
>
> +BPF_CALL_4(bpf_get_stack, struct pt_regs *, regs, void *, buf, u32, size,
> + u64, flags)
> +{
> + u32 init_nr, trace_nr, copy_len, elem_size, num_elem;
> + bool user_build_id = flags & BPF_F_USER_BUILD_ID;
> + u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
> + bool user = flags & BPF_F_USER_STACK;
> + struct perf_callchain_entry *trace;
> + bool kernel = !user;
> + u64 *ips;
> +
> + if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
> + BPF_F_USER_BUILD_ID)))
> + return -EINVAL;
> +
> + elem_size = (user && user_build_id) ? sizeof(struct bpf_stack_build_id)
> + : sizeof(u64);
> + if (unlikely(size % elem_size))
> + return -EINVAL;
> +
> + num_elem = size / elem_size;
> + if (sysctl_perf_event_max_stack < num_elem)
> + init_nr = 0;
the prog's buffer should be zero-padded in this case, since it
points to uninit_mem.
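A rough userspace sketch of the zero-padding idea follows. As far as I
understand it, a buffer passed as ARG_PTR_TO_UNINIT_MEM is treated as
fully written after the call, so any tail the helper leaves untouched
should be cleared. The function name copy_stack_zero_pad and its
signature are hypothetical, only meant to show the memcpy plus memset
pattern.

```c
#include <stdint.h>
#include <string.h>

/*
 * Copy as many whole entries from ips[] as fit in buf, then clear the
 * remainder of buf. Returns the number of bytes of stack data copied.
 */
static size_t copy_stack_zero_pad(void *buf, size_t size,
				  const uint64_t *ips, size_t nr_ips)
{
	size_t copy_len = nr_ips * sizeof(*ips);

	if (copy_len > size)
		copy_len = size - size % sizeof(*ips);	/* whole entries only */

	memcpy(buf, ips, copy_len);
	/* Zero whatever the trace did not fill. */
	memset((char *)buf + copy_len, 0, size - copy_len);
	return copy_len;
}
```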
> + else
> + init_nr = sysctl_perf_event_max_stack - num_elem;
> + trace = get_perf_callchain(regs, init_nr, kernel, user,
> + sysctl_perf_event_max_stack, false, false);
> + if (unlikely(!trace))
> + return -EFAULT;
> +
> + trace_nr = trace->nr - init_nr;
> + if (trace_nr <= skip)
> + return -EFAULT;
> +
> + trace_nr -= skip;
> + trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
> + copy_len = trace_nr * elem_size;
> + ips = trace->ip + skip + init_nr;
> + if (user && user_build_id)
the combination of kernel stack + user_build_id should probably be
rejected earlier with -EINVAL.
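Something along these lines, as a userspace sketch only. The function
name check_build_id_flags is hypothetical. The flag values are the ones
from the quoted uapi hunk.

```c
#include <errno.h>
#include <stdint.h>

#define BPF_F_USER_STACK	(1ULL << 8)
#define BPF_F_USER_BUILD_ID	(1ULL << 11)

/*
 * Hypothetical early check: a build-id request only makes sense for a
 * user stack, so reject it when the kernel stack was requested.
 */
static int check_build_id_flags(uint64_t flags)
{
	if ((flags & BPF_F_USER_BUILD_ID) && !(flags & BPF_F_USER_STACK))
		return -EINVAL;
	return 0;
}
```

Doing this next to the other flag checks would let the later
"user && user_build_id" tests reduce to "user_build_id".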
> + stack_map_get_build_id_offset(buf, ips, trace_nr, user);
> + else
> + memcpy(buf, ips, copy_len);
> +
> + return copy_len;
> +}
> +
> +const struct bpf_func_proto bpf_get_stack_proto = {
> + .func = bpf_get_stack,
> + .gpl_only = true,
> + .ret_type = RET_INTEGER,
> + .arg1_type = ARG_PTR_TO_CTX,
> + .arg2_type = ARG_PTR_TO_UNINIT_MEM,
> + .arg3_type = ARG_CONST_SIZE_OR_ZERO,
why allow zero size?
I'm not sure the helper will work correctly when size=0
> + .arg4_type = ARG_ANYTHING,
> +};
> +
> /* Called from eBPF program */
> static void *stack_map_lookup_elem(struct bpf_map *map, void *key)
> {
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 0244973..2aa3a65 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -984,10 +984,13 @@ void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
> static void __bpf_prog_put_rcu(struct rcu_head *rcu)
> {
> struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);
> + bool need_callchain_buf = aux->prog->need_callchain_buf;
>
> free_used_maps(aux);
> bpf_prog_uncharge_memlock(aux->prog);
> security_bpf_prog_free(aux);
> + if (need_callchain_buf)
> + put_callchain_buffers();
> bpf_prog_free(aux->prog);
> }
>
> @@ -1004,7 +1007,8 @@ static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
> bpf_prog_kallsyms_del(prog->aux->func[i]);
> bpf_prog_kallsyms_del(prog);
>
> - call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
> + synchronize_rcu();
> + __bpf_prog_put_rcu(&prog->aux->rcu);
there should have been a lockdep splat.
We cannot call synchronize_rcu here, since we cannot sleep
in some cases.
* Re: kernel BUG at drivers/vhost/vhost.c:LINE! (2)
From: Stefan Hajnoczi @ 2018-04-09 3:28 UTC
To: Michael S. Tsirkin
Cc: syzbot, Jason Wang, kvm, linux-kernel, netdev, syzkaller-bugs,
Linux Virtualization
In-Reply-To: <20180409054316-mutt-send-email-mst@kernel.org>
On Mon, Apr 09, 2018 at 05:44:36AM +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 09, 2018 at 10:37:45AM +0800, Stefan Hajnoczi wrote:
> > On Sat, Apr 7, 2018 at 3:02 AM, syzbot
> > <syzbot+65a84dde0214b0387ccd@syzkaller.appspotmail.com> wrote:
> > > syzbot hit the following crash on upstream commit
> > > 38c23685b273cfb4ccf31a199feccce3bdcb5d83 (Fri Apr 6 04:29:35 2018 +0000)
> > > Merge tag 'armsoc-drivers' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
> > > syzbot dashboard link:
> > > https://syzkaller.appspot.com/bug?extid=65a84dde0214b0387ccd
> >
> > To prevent duplicated work: I am working on this one.
> >
> > Stefan
>
> Do you want to try this patchset:
> https://lkml.org/lkml/2018/4/5/665
>
> ?
Thanks, I'll give it a shot.
I also noticed a regression in commit
d65026c6c62e7d9616c8ceb5a53b68bcdc050525 ("vhost: validate log when
IOTLB is enabled") and am currently testing a fix.
Stefan
* Re: DPAA TX Issues
From: Jacob S. Moroni @ 2018-04-09 3:20 UTC
To: madalin.bucur; +Cc: netdev
In-Reply-To: <1523231216.3843066.1330879848.1F0E863A@webmail.messagingengine.com>
On Sun, Apr 8, 2018, at 7:46 PM, Jacob S. Moroni wrote:
> Hello Madalin,
>
> I've been experiencing some issues with the DPAA Ethernet driver,
> specifically related to frame transmission. Hopefully you can point
> me in the right direction.
>
> TLDR: Attempting to transmit faster than a few frames per second causes
> the TX FQ CGR to enter into the congested state and remain there forever,
> even after transmission stops.
>
> The hardware is a T2080RDB, running from the tip of net-next, using
> the standard t2080rdb device tree and corenet64_smp_defconfig kernel
> config. No changes were made to any of the files. The issue occurs
> with 4.16.1 stable as well. In fact, the only time I've been able
> to achieve reliable frame transmission was with the SDK 4.1 kernel.
>
> For my tests, I'm running iperf3 both with and without the -R
> option (send/receive). When using a USB Ethernet adapter, there
> are no issues.
>
> The issue is that it seems like the TX frame queues are getting
> "stuck" when attempting to transmit at rates greater than a few frames
> per second. Ping works fine, but it seems like anything that could
> potentially cause multiple TX frames to be enqueued causes issues.
>
> If I run iperf3 in reverse mode (with the T2080RDB receiving), then
> I can achieve ~940 Mbps, but this is also somewhat unreliable.
>
> If I run it with the T2080RDB transmitting, the test will never
> complete. Sometimes it starts transmitting for a few seconds then stops,
> and other times it never even starts. This also seems to force the
> interface into a bad state.
>
> The ethtool stats show that the interface has entered
> congestion a few times, and that it's currently congested. The fact
> that it's currently congested even after stopping transmission
> indicates that the FQ somehow stopped being drained. I've also
> noticed that whenever this issue occurs, the TX confirmation
> counters are always less than the TX packet counters.
>
> When it gets into this state, I can see that the memory usage is
> climbing, up until about the point of where the CGR threshold
> is (about 100 MB).
>
> Any idea what could prevent the TX FQ from being drained? My first
> guess was flow control, but it's completely disabled.
>
> I tried messing with the egress congestion threshold, workqueue
> assignments, etc., but nothing seemed to have any effect.
>
> If you need any more information or want me to run any tests,
> please let me know.
>
> Thanks,
> --
> Jacob S. Moroni
> mail@jakemoroni.com
It turns out that irqbalance was causing all of the issues. After
disabling it and rebooting, the interfaces worked perfectly.
Perhaps there's an issue with how the qman/bman portals are defined
as per-cpu variables.
During the portal's probe, the CPUs are assigned one-by-one and
subsequently passed into request_irq as the argument.
However, it seems like if the IRQ affinity changes, then the ISR could be
passed a reference to a per-cpu variable belonging to another CPU.
At least I know where to look now.
- Jake
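To make the suspected failure mode concrete, here is a small userspace
simulation. It is only a sketch of the hypothesis above, not the actual
qman/bman code: struct portal, probe_portal and isr_uses_local_portal are
made-up names, and a plain array stands in for the per-cpu variables.

```c
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical stand-in for a per-CPU portal object. */
struct portal {
	int owner_cpu;
};

/* Simulated per-CPU storage: one portal per CPU. */
static struct portal portals[NR_CPUS];

/*
 * Simulated probe: the returned pointer plays the role of the cookie
 * that is handed to request_irq() as its last argument.
 */
static struct portal *probe_portal(int cpu)
{
	portals[cpu].owner_cpu = cpu;
	return &portals[cpu];
}

/*
 * Simulated ISR: running_cpu is where the handler executes, dev_id is
 * the cookie captured at probe time. Returns true only when the cookie
 * is the running CPU's own portal.
 */
static bool isr_uses_local_portal(int running_cpu, const struct portal *dev_id)
{
	return dev_id == &portals[running_cpu];
}
```

If the portal was probed for CPU 1 and the IRQ is later moved to CPU 2,
the check fails: the handler on CPU 2 is still operating on CPU 1's
portal, which is the mismatch suspected above.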
* [GIT] Networking
From: David Miller @ 2018-04-09 2:50 UTC
To: torvalds; +Cc: akpm, netdev, linux-kernel
1) The sockmap code has to free socket memory on close if
there is corked data, from John Fastabend.
2) Tunnel names coming from userspace need to be length
validated. From Eric Dumazet.
3) arp_filter() has to take VRFs properly into account, from
Miguel Fadon Perlines.
4) Fix oops in error path of tcf_bpf_init(), from Davide Caratti.
5) Missing idr_remove() in u32_delete_key(), from Cong Wang.
6) More syzbot stuff. Several use-of-uninitialized-value fixes all
over, from Eric Dumazet.
7) Do not leak kernel memory to userspace in sctp, also from Eric
Dumazet.
8) Discard frames from unused ports in DSA, from Andrew Lunn.
9) Fix DMA mapping and reset/failover problems in ibmvnic, from Thomas
Falcon.
10) Do not access dp83640 PHY registers prematurely after reset, from
Esben Haabendal.
Please pull, thanks a lot!
The following changes since commit 06dd3dfeea60e2a6457a6aedf97afc8e6d2ba497:
Merge tag 'char-misc-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc (2018-04-04 20:07:20 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
for you to fetch changes up to 76327a35caabd1a932e83d6a42b967aa08584e5d:
dp83640: Ensure against premature access to PHY registers after reset (2018-04-08 19:58:52 -0400)
----------------------------------------------------------------
Anders Roxell (1):
kernel/bpf/syscall: fix warning defined but not used
Andrew Lunn (1):
net: dsa: Discard frames from unused ports
Anirudh Venkataramanan (1):
ice: Bug fixes in ethtool code
Cong Wang (2):
net_sched: fix a missing idr_remove() in u32_delete_key()
tipc: use the right skb in tipc_sk_fill_sock_diag()
David S. Miller (7):
Merge branch 'net-tunnel-name-validate'
Merge branch 'hv_netvsc-Fix-shutdown-issues-on-older-Windows-hosts'
Merge branch '100GbE' of git://git.kernel.org/.../jkirsher/net-queue
Merge branch 'net-fix-uninit-values-in-networking-stack'
Merge branch 'ibmvnic-Fix-driver-reset-and-DMA-bugs'
Merge branch 'for-upstream' of git://git.kernel.org/.../bluetooth/bluetooth
Merge git://git.kernel.org/.../bpf/bpf
Davide Caratti (1):
net/sched: fix NULL dereference in the error path of tcf_bpf_init()
Eric Dumazet (16):
net: fool proof dev_valid_name()
ip_tunnel: better validate user provided tunnel names
ipv6: sit: better validate user provided tunnel names
ip6_gre: better validate user provided tunnel names
ip6_tunnel: better validate user provided tunnel names
vti6: better validate user provided tunnel names
crypto: af_alg - fix possible uninit-value in alg_bind()
netlink: fix uninit-value in netlink_sendmsg
net: fix rtnh_ok()
net: initialize skb->peeked when cloning
net: fix uninit-value in __hw_addr_add_ex()
dccp: initialize ireq->ir_mark
ipv4: fix uninit-value in ip_route_output_key_hash_rcu()
soreuseport: initialise timewait reuseport field
sctp: do not leak kernel memory to user space
sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6
Esben Haabendal (4):
net: phy: marvell: Enable interrupt function on LED2 pin
net/fsl_pq_mdio: Allow explicit speficition of TBIPA address
ARM: dts: ls1021a: Specify TBIPA register address
dp83640: Ensure against premature access to PHY registers after reset
Jeff Barnhill (1):
net/ipv6: Increment OUTxxx counters after netfilter hook
Jiri Pirko (1):
devlink: convert occ_get op to separate registration
John Fastabend (2):
bpf: sockmap, free memory on sock close with cork data
bpf: sockmap, duplicates release calls may NULL sk_prot
Maxime Chevallier (1):
net: mvpp2: Fix parser entry init boundary check
Miguel Fadon Perlines (1):
arp: fix arp_filter on l3slave devices
Mohammed Gamal (4):
hv_netvsc: Use Windows version instead of NVSP version on GPAD teardown
hv_netvsc: Split netvsc_revoke_buf() and netvsc_teardown_gpadl()
hv_netvsc: Ensure correct teardown message sequence order
hv_netvsc: Pass net_device parameter to revoke and teardown functions
Nathan Fontenot (1):
ibmvnic: Do not reset CRQ for Mobility driver resets
Szymon Janc (1):
Bluetooth: Fix connection if directed advertising and privacy is used
Thomas Falcon (4):
ibmvnic: Fix DMA mapping mistakes
ibmvnic: Zero used TX descriptor counter on reset
ibmvnic: Fix reset scheduler error handling
ibmvnic: Fix failover case for non-redundant configuration
Wei Yongjun (1):
ice: Fix error return code in ice_init_hw()
Documentation/devicetree/bindings/net/fsl-tsec-phy.txt | 6 +++-
arch/arm/boot/dts/ls1021a.dtsi | 3 +-
crypto/af_alg.c | 8 ++---
drivers/net/ethernet/freescale/fsl_pq_mdio.c | 50 ++++++++++++++++++++----------
drivers/net/ethernet/ibm/ibmvnic.c | 146 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------------------
drivers/net/ethernet/ibm/ibmvnic.h | 1 +
drivers/net/ethernet/intel/ice/ice_common.c | 4 ++-
drivers/net/ethernet/intel/ice/ice_ethtool.c | 4 +--
drivers/net/ethernet/marvell/mvpp2.c | 2 +-
drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 24 +++------------
drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 1 -
drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c | 67 +++++++++++++++++++++++-----------------
drivers/net/hyperv/netvsc.c | 60 ++++++++++++++++++++++++++----------
drivers/net/netdevsim/devlink.c | 65 +++++++++++++++++++--------------------
drivers/net/phy/dp83640.c | 18 +++++++++++
drivers/net/phy/marvell.c | 20 ++++++++++--
include/net/bluetooth/hci_core.h | 2 +-
include/net/devlink.h | 40 +++++++++++++++---------
include/net/inet_timewait_sock.h | 1 +
include/net/nexthop.h | 2 +-
kernel/bpf/sockmap.c | 12 ++++++--
kernel/bpf/syscall.c | 24 +++++++--------
net/bluetooth/hci_conn.c | 29 +++++++++++++-----
net/bluetooth/hci_event.c | 15 ++++++---
net/bluetooth/l2cap_core.c | 2 +-
net/core/dev.c | 2 +-
net/core/dev_addr_lists.c | 4 +--
net/core/devlink.c | 74 ++++++++++++++++++++++++++++++++++++++------
net/core/skbuff.c | 1 +
net/dccp/ipv4.c | 1 +
net/dccp/ipv6.c | 1 +
net/dsa/dsa_priv.h | 8 ++++-
net/ipv4/arp.c | 2 +-
net/ipv4/inet_timewait_sock.c | 1 +
net/ipv4/ip_tunnel.c | 11 ++++---
net/ipv4/route.c | 11 ++++---
net/ipv6/ip6_gre.c | 8 +++--
net/ipv6/ip6_output.c | 7 +++--
net/ipv6/ip6_tunnel.c | 11 ++++---
net/ipv6/ip6_vti.c | 7 +++--
net/ipv6/sit.c | 8 +++--
net/netlink/af_netlink.c | 2 ++
net/sched/act_bpf.c | 12 +++++---
net/sched/cls_u32.c | 1 +
net/sctp/ipv6.c | 4 ++-
net/sctp/socket.c | 13 +++++---
net/tipc/diag.c | 2 +-
net/tipc/socket.c | 6 ++--
net/tipc/socket.h | 4 +--
49 files changed, 534 insertions(+), 273 deletions(-)
* Re: kernel BUG at drivers/vhost/vhost.c:LINE! (2)
From: Michael S. Tsirkin @ 2018-04-09 2:44 UTC
To: Stefan Hajnoczi
Cc: syzbot, Jason Wang, kvm, linux-kernel, netdev, syzkaller-bugs,
Linux Virtualization
In-Reply-To: <CAJSP0QVHk7NSQcjfLWq13=1Vzm=_vtmKZqp4dDjuh8ETLO5g_g@mail.gmail.com>
On Mon, Apr 09, 2018 at 10:37:45AM +0800, Stefan Hajnoczi wrote:
> On Sat, Apr 7, 2018 at 3:02 AM, syzbot
> <syzbot+65a84dde0214b0387ccd@syzkaller.appspotmail.com> wrote:
> > syzbot hit the following crash on upstream commit
> > 38c23685b273cfb4ccf31a199feccce3bdcb5d83 (Fri Apr 6 04:29:35 2018 +0000)
> > Merge tag 'armsoc-drivers' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
> > syzbot dashboard link:
> > https://syzkaller.appspot.com/bug?extid=65a84dde0214b0387ccd
>
> To prevent duplicated work: I am working on this one.
>
> Stefan
Do you want to try this patchset:
https://lkml.org/lkml/2018/4/5/665
?
--
MST
* Re: [PATCH] vhost-net: set packet weight of tx polling to 2 * vq size
From: Michael S. Tsirkin @ 2018-04-09 2:42 UTC (permalink / raw)
To: haibinzhang(张海斌)
Cc: Jason Wang, kvm@vger.kernel.org,
virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org,
lidongchen(陈立东),
yunfangtai(台运方)
In-Reply-To: <88D661ADF6AFBF42B2AB88D8E7682B0901FC47D3@EXMBX-SZMAIL011.tencent.com>
On Fri, Apr 06, 2018 at 08:22:37AM +0000, haibinzhang(张海斌) wrote:
> handle_tx can delay rx for tens or even hundreds of milliseconds when tx is
> busy polling small UDP packets (e.g. 1-byte UDP payload), because
> VHOST_NET_WEIGHT accounts only for bytes sent, not for the number of packets.
>
> The ping latencies below were measured while two virtual machines ran
> netperf (UDP_STREAM, len=1) and a third machine pinged the client:
>
> Packet-Weight    Ping-Latencies(millisecond)
>                  min      avg      max
> Origin           3.319   18.489   57.303
> 64               1.643    2.021    2.552
> 128              1.825    2.600    3.224
> 256              1.997    2.710    4.295
> 512              1.860    3.171    4.631
> 1024             2.002    4.173    9.056
> 2048             2.257    5.650    9.688
> 4096             2.093    8.508   15.943
And this is with Q size 256 right?
> Ring size is a hint from the device about the burst size it can tolerate.
> Based on benchmarks, set the weight to 2 * vq size.
>
> To evaluate this change, further tests were run using netperf (RR, TX) between
> two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, with the vq size
> adjusted through qemu. The results below show no obvious change.
What I asked for is ping latency with different VQ sizes;
the streaming results below do not show anything.
> vq size=256 TCP_RR                     vq size=512 TCP_RR
> size/sessions/+thu%/+normalize%        size/sessions/+thu%/+normalize%
>    1/ 1/  -7%/  -2%                       1/ 1/   0%/  -2%
>    1/ 4/  +1%/   0%                       1/ 4/  +1%/   0%
>    1/ 8/  +1%/  -2%                       1/ 8/   0%/  +1%
>   64/ 1/  -6%/   0%                      64/ 1/  +7%/  +3%
>   64/ 4/   0%/  +2%                      64/ 4/  -1%/  +1%
>   64/ 8/   0%/   0%                      64/ 8/  -1%/  -2%
>  256/ 1/  -3%/  -4%                     256/ 1/  -4%/  -2%
>  256/ 4/  +3%/  +4%                     256/ 4/  +1%/  +2%
>  256/ 8/  +2%/   0%                     256/ 8/  +1%/  -1%
>
> vq size=256 UDP_RR                     vq size=512 UDP_RR
> size/sessions/+thu%/+normalize%        size/sessions/+thu%/+normalize%
>    1/ 1/  -5%/  +1%                       1/ 1/  -3%/  -2%
>    1/ 4/  +4%/  +1%                       1/ 4/  -2%/  +2%
>    1/ 8/  -1%/  -1%                       1/ 8/  -1%/   0%
>   64/ 1/  -2%/  -3%                      64/ 1/  +1%/  +1%
>   64/ 4/  -5%/  -1%                      64/ 4/  +2%/   0%
>   64/ 8/   0%/  -1%                      64/ 8/  -2%/  +1%
>  256/ 1/  +7%/  +1%                     256/ 1/  -7%/   0%
>  256/ 4/  +1%/  +1%                     256/ 4/  -3%/  -4%
>  256/ 8/  +2%/  +2%                     256/ 8/  +1%/  +1%
>
> vq size=256 TCP_STREAM                 vq size=512 TCP_STREAM
> size/sessions/+thu%/+normalize%        size/sessions/+thu%/+normalize%
>   64/ 1/   0%/  -3%                      64/ 1/   0%/   0%
>   64/ 4/  +3%/  -1%                      64/ 4/  -2%/  +4%
>   64/ 8/  +9%/  -4%                      64/ 8/  -1%/  +2%
>  256/ 1/  +1%/  -4%                     256/ 1/  +1%/  +1%
>  256/ 4/  -1%/  -1%                     256/ 4/  -3%/   0%
>  256/ 8/  +7%/  +5%                     256/ 8/  -3%/   0%
>  512/ 1/  +1%/   0%                     512/ 1/  -1%/  -1%
>  512/ 4/  +1%/  -1%                     512/ 4/   0%/   0%
>  512/ 8/  +7%/  -5%                     512/ 8/  +6%/  -1%
> 1024/ 1/   0%/  -1%                    1024/ 1/   0%/  +1%
> 1024/ 4/  +3%/   0%                    1024/ 4/  +1%/   0%
> 1024/ 8/  +8%/  +5%                    1024/ 8/  -1%/   0%
> 2048/ 1/  +2%/  +2%                    2048/ 1/  -1%/   0%
> 2048/ 4/  +1%/   0%                    2048/ 4/   0%/  -1%
> 2048/ 8/  -2%/   0%                    2048/ 8/   5%/  -1%
> 4096/ 1/  -2%/   0%                    4096/ 1/  -2%/   0%
> 4096/ 4/  +2%/   0%                    4096/ 4/   0%/   0%
> 4096/ 8/  +9%/  -2%                    4096/ 8/  -5%/  -1%
>
> Signed-off-by: Haibin Zhang <haibinzhang@tencent.com>
> Signed-off-by: Yunfang Tai <yunfangtai@tencent.com>
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Code is fine but I'd like to see validation of the heuristic
2*vq->num with another vq size.
> ---
> drivers/vhost/net.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 8139bc70ad7d..3563a305cc0a 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -44,6 +44,10 @@ MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
> * Using this limit prevents one virtqueue from starving others. */
> #define VHOST_NET_WEIGHT 0x80000
>
> +/* Max number of packets transferred before requeueing the job.
> + * Using this limit prevents one virtqueue from starving rx. */
> +#define VHOST_NET_PKT_WEIGHT(vq) ((vq)->num * 2)
> +
> /* MAX number of TX used buffers for outstanding zerocopy */
> #define VHOST_MAX_PEND 128
> #define VHOST_GOODCOPY_LEN 256
> @@ -473,6 +477,7 @@ static void handle_tx(struct vhost_net *net)
> struct socket *sock;
> struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
> bool zcopy, zcopy_used;
> + int sent_pkts = 0;
>
> mutex_lock(&vq->mutex);
> sock = vq->private_data;
> @@ -580,7 +585,8 @@ static void handle_tx(struct vhost_net *net)
> else
> vhost_zerocopy_signal_used(net, vq);
> vhost_net_tx_packet(net);
> - if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
> + if (unlikely(total_len >= VHOST_NET_WEIGHT) ||
> + unlikely(++sent_pkts >= VHOST_NET_PKT_WEIGHT(vq))) {
> vhost_poll_queue(&vq->poll);
> break;
> }
> --
> 2.12.3
>
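The effect of the added packet budget can be sketched in user space. The model below is only an illustration, not the vhost code: tx_burst_pkts() is a hypothetical helper, and it assumes every packet in a run has the same length. It counts how many packets one handle_tx() run would send before requeueing, using the byte budget from the quoted context (0x80000) and, optionally, the proposed 2 * vq size packet budget.

```c
#include <stdbool.h>
#include <stddef.h>

/* Byte budget per handle_tx() run, value taken from the quoted context. */
#define VHOST_NET_WEIGHT 0x80000u

/* Packet budget proposed by the patch: twice the ring size. */
#define VHOST_NET_PKT_WEIGHT(vq_num) ((vq_num) * 2)

/*
 * Hypothetical model, not kernel code: count how many packets of
 * pkt_len bytes one run would send before it requeues itself.
 */
size_t tx_burst_pkts(size_t pkt_len, size_t vq_num, bool use_pkt_weight)
{
	size_t total_len = 0;
	size_t sent_pkts = 0;

	if (pkt_len == 0)
		return 0; /* avoid an endless loop in the model */

	for (;;) {
		total_len += pkt_len;
		sent_pkts++;
		/* Existing limit: stop once enough bytes were sent. */
		if (total_len >= VHOST_NET_WEIGHT)
			break;
		/* Proposed limit: also stop after enough packets. */
		if (use_pkt_weight && sent_pkts >= VHOST_NET_PKT_WEIGHT(vq_num))
			break;
	}
	return sent_pkts;
}
```

In this model, 1-byte packets on a 256-entry ring run for 524288 packets under the byte budget alone, but only 512 packets once the packet budget applies. Large packets (4096 bytes) still hit the byte budget first, after 128 packets.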
* Re: [PATCH net-next] net/ncsi: Refactor MAC, VLAN filters
From: David Miller @ 2018-04-09 2:40 UTC (permalink / raw)
To: sam; +Cc: netdev, linux-kernel, openbmc
In-Reply-To: <20180409021128.10055-1-sam@mendozajonas.com>
The net-next tree is closed at this time, please resend this when the
merge window is over and the net-next tree opens back up.
Thank you.
* Re: kernel BUG at drivers/vhost/vhost.c:LINE! (2)
From: Stefan Hajnoczi @ 2018-04-09 2:37 UTC (permalink / raw)
To: syzbot
Cc: Jason Wang, kvm, linux-kernel, Michael S. Tsirkin, netdev,
syzkaller-bugs, Linux Virtualization
In-Reply-To: <001a11449424f11322056932b09c@google.com>
On Sat, Apr 7, 2018 at 3:02 AM, syzbot
<syzbot+65a84dde0214b0387ccd@syzkaller.appspotmail.com> wrote:
> syzbot hit the following crash on upstream commit
> 38c23685b273cfb4ccf31a199feccce3bdcb5d83 (Fri Apr 6 04:29:35 2018 +0000)
> Merge tag 'armsoc-drivers' of
> git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
> syzbot dashboard link:
> https://syzkaller.appspot.com/bug?extid=65a84dde0214b0387ccd
To prevent duplicated work: I am working on this one.
Stefan
>
> So far this crash happened 4 times on upstream.
> C reproducer: https://syzkaller.appspot.com/x/repro.c?id=6586748079439872
> syzkaller reproducer:
> https://syzkaller.appspot.com/x/repro.syz?id=5974272052822016
> Raw console output:
> https://syzkaller.appspot.com/x/log.txt?id=6224632407392256
> Kernel config:
> https://syzkaller.appspot.com/x/.config?id=-5813481738265533882
> compiler: gcc (GCC) 8.0.1 20180301 (experimental)
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+65a84dde0214b0387ccd@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
> ------------[ cut here ]------------
> kernel BUG at drivers/vhost/vhost.c:1652!
> invalid opcode: 0000 [#1] SMP KASAN
> ------------[ cut here ]------------
> Dumping ftrace buffer:
> kernel BUG at drivers/vhost/vhost.c:1652!
> (ftrace buffer empty)
> Modules linked in:
> CPU: 1 PID: 4461 Comm: syzkaller684218 Not tainted 4.16.0+ #3
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> RIP: 0010:set_bit_to_user drivers/vhost/vhost.c:1652 [inline]
> RIP: 0010:log_write+0x42a/0x4d0 drivers/vhost/vhost.c:1676
> RSP: 0018:ffff8801b256f920 EFLAGS: 00010293
> RAX: ffff8801adc9e2c0 RBX: dffffc0000000000 RCX: ffffffff85924a0f
> RDX: 0000000000000000 RSI: ffffffff85924cea RDI: 0000000000000005
> RBP: ffff8801b256fa58 R08: ffff8801adc9e2c0 R09: ffffed003962412d
> R10: ffff8801b256fad8 R11: ffff8801cb12096f R12: 0001ffffffffffff
> R13: ffffed00364adf36 R14: 0000000000000000 R15: ffff8801b256fa30
> FS: 00007fdf24b19700(0000) GS:ffff8801db100000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000020bf6000 CR3: 00000001ae6a7000 CR4: 00000000001406e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> vhost_update_used_flags+0x3af/0x4a0 drivers/vhost/vhost.c:1723
> vhost_vq_init_access+0x117/0x590 drivers/vhost/vhost.c:1763
> vhost_vsock_start drivers/vhost/vsock.c:446 [inline]
> vhost_vsock_dev_ioctl+0x751/0x920 drivers/vhost/vsock.c:678
> vfs_ioctl fs/ioctl.c:46 [inline]
> file_ioctl fs/ioctl.c:500 [inline]
> do_vfs_ioctl+0x1cf/0x1650 fs/ioctl.c:684
> ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
> SYSC_ioctl fs/ioctl.c:708 [inline]
> SyS_ioctl+0x24/0x30 fs/ioctl.c:706
> do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
> entry_SYSCALL_64_after_hwframe+0x42/0xb7
> RIP: 0033:0x4456c9
> RSP: 002b:00007fdf24b18da8 EFLAGS: 00000297 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00000000006dac24 RCX: 00000000004456c9
> RDX: 0000000020f82ffc RSI: 000000004004af61 RDI: 000000000000001b
> RBP: 00000000006dac20 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000297 R12: 6b636f73762d7473
> R13: 6f68762f7665642f R14: fffffffffffffffc R15: 0000000000000007
> Code: e8 7c 5e e4 fb 4c 89 ef e8 e4 16 06 fc 48 8d 85 58 ff ff ff 48 c1 e8
> 03 c6 04 18 f8 e9 46 ff ff ff 45 31 f6 eb 91 e8 56 5e e4 fb <0f> 0b e8 4f 5e
> e4 fb 48 c7 c6 a0 a3 24 88 4c 89 ef e8 60 b6 10
> RIP: set_bit_to_user drivers/vhost/vhost.c:1652 [inline] RSP:
> ffff8801b256f920
> RIP: log_write+0x42a/0x4d0 drivers/vhost/vhost.c:1676 RSP: ffff8801b256f920
> invalid opcode: 0000 [#2] SMP KASAN
> ---[ end trace 0d0ff45aa44d8a23 ]---
> Dumping ftrace buffer:
> (ftrace buffer empty)
> Modules linked in:
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is
> merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> If you want to test a patch for this bug, please reply with:
> #syz test: git://repo/address.git branch
> and provide the patch inline or as an attachment.
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.
* [PATCH net-next] net/ncsi: Refactor MAC, VLAN filters
From: Samuel Mendoza-Jonas @ 2018-04-09 2:11 UTC (permalink / raw)
To: netdev; +Cc: Samuel Mendoza-Jonas, David S . Miller, linux-kernel, openbmc
The NCSI driver defines a generic ncsi_channel_filter struct that can be
used to store arbitrarily formatted filters, and several generic methods
of accessing data stored in such a filter.
However, both in the driver and in the NCSI specification there are only
two actual filters: VLAN ID filters and MAC address filters. Splitting
the MAC filter into unicast, multicast, and mixed is also technically
unnecessary, as these are stored in the same location in hardware.
To reduce complexity, particularly in setting up and accessing these
generic filters, remove them in favour of two specific structs. These
can be acted on directly and do not need several generic helper
functions.
This also fixes a memory error found by KASAN on ARM32 (which is not
upstream yet), where response handlers accessing a filter's data field
could write past allocated memory.
[ 114.926512] ==================================================================
[ 114.933861] BUG: KASAN: slab-out-of-bounds in ncsi_configure_channel+0x4b8/0xc58
[ 114.941304] Read of size 2 at addr 94888558 by task kworker/0:2/546
[ 114.947593]
[ 114.949146] CPU: 0 PID: 546 Comm: kworker/0:2 Not tainted 4.16.0-rc6-00119-ge156398bfcad #13
...
[ 115.170233] The buggy address belongs to the object at 94888540
[ 115.170233] which belongs to the cache kmalloc-32 of size 32
[ 115.181917] The buggy address is located 24 bytes inside of
[ 115.181917] 32-byte region [94888540, 94888560)
[ 115.192115] The buggy address belongs to the page:
[ 115.196943] page:9eeac100 count:1 mapcount:0 mapping:94888000 index:0x94888fc1
[ 115.204200] flags: 0x100(slab)
[ 115.207330] raw: 00000100 94888000 94888fc1 0000003f 00000001 9eea2014 9eecaa74 96c003e0
[ 115.215444] page dumped because: kasan: bad access detected
[ 115.221036]
[ 115.222544] Memory state around the buggy address:
[ 115.227384] 94888400: fb fb fb fb fc fc fc fc 04 fc fc fc fc fc fc fc
[ 115.233959] 94888480: 00 00 00 fc fc fc fc fc 00 04 fc fc fc fc fc fc
[ 115.240529] >94888500: 00 00 04 fc fc fc fc fc 00 00 04 fc fc fc fc fc
[ 115.247077] ^
[ 115.252523] 94888580: 00 04 fc fc fc fc fc fc 06 fc fc fc fc fc fc fc
[ 115.259093] 94888600: 00 00 06 fc fc fc fc fc 00 00 04 fc fc fc fc fc
[ 115.265639] ==================================================================
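One plausible reading of the overflow, which the changelog itself does not spell out: the removed helpers index the table as "ncf->data + size * index" where data is declared u32 [], so each step advances four bytes, while the table was allocated with cnt * entry_size bytes. The sketch below only demonstrates that C pointer-arithmetic effect; demo_offset_u32() and demo_offset_u8() are hypothetical names, not driver functions.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_WORDS 64

/*
 * Byte offset reached by "data + size * index" when data points to u32,
 * which is how the removed helpers indexed the table.
 * Returns (size_t)-1 if the result would leave the demo buffer.
 */
size_t demo_offset_u32(size_t size, size_t index)
{
	static uint32_t data[DEMO_WORDS];

	if (size * index > DEMO_WORDS)
		return (size_t)-1;
	return (size_t)((unsigned char *)(data + size * index) -
			(unsigned char *)data);
}

/* The same expression over a byte pointer: one step is one byte. */
size_t demo_offset_u8(size_t size, size_t index)
{
	static unsigned char data[DEMO_WORDS];

	if (size * index > DEMO_WORDS)
		return (size_t)-1;
	return (size_t)((data + size * index) - data);
}
```

For a 2-byte VLAN entry at index 1 the u32 form lands 8 bytes into the data area rather than 2. That would be consistent with the access at offset 24 in the report above if the struct header occupies 16 bytes, though that is an assumption about the layout.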
Reported-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
---
net/ncsi/internal.h | 34 +++---
net/ncsi/ncsi-manage.c | 226 +++++++++-------------------------------
net/ncsi/ncsi-netlink.c | 20 ++--
net/ncsi/ncsi-rsp.c | 178 +++++++++++++------------------
4 files changed, 147 insertions(+), 311 deletions(-)
diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h
index 8da84312cd3b..8055e3965cef 100644
--- a/net/ncsi/internal.h
+++ b/net/ncsi/internal.h
@@ -68,15 +68,6 @@ enum {
NCSI_MODE_MAX
};
-enum {
- NCSI_FILTER_BASE = 0,
- NCSI_FILTER_VLAN = 0,
- NCSI_FILTER_UC,
- NCSI_FILTER_MC,
- NCSI_FILTER_MIXED,
- NCSI_FILTER_MAX
-};
-
struct ncsi_channel_version {
u32 version; /* Supported BCD encoded NCSI version */
u32 alpha2; /* Supported BCD encoded NCSI version */
@@ -98,11 +89,18 @@ struct ncsi_channel_mode {
u32 data[8]; /* Data entries */
};
-struct ncsi_channel_filter {
- u32 index; /* Index of channel filters */
- u32 total; /* Total entries in the filter table */
- u64 bitmap; /* Bitmap of valid entries */
- u32 data[]; /* Data for the valid entries */
+struct ncsi_channel_mac_filter {
+ u8 n_uc;
+ u8 n_mc;
+ u8 n_mixed;
+ u64 bitmap;
+ unsigned char *addrs;
+};
+
+struct ncsi_channel_vlan_filter {
+ u8 n_vids;
+ u64 bitmap;
+ u16 *vids;
};
struct ncsi_channel_stats {
@@ -186,7 +184,9 @@ struct ncsi_channel {
struct ncsi_channel_version version;
struct ncsi_channel_cap caps[NCSI_CAP_MAX];
struct ncsi_channel_mode modes[NCSI_MODE_MAX];
- struct ncsi_channel_filter *filters[NCSI_FILTER_MAX];
+ /* Filtering Settings */
+ struct ncsi_channel_mac_filter mac_filter;
+ struct ncsi_channel_vlan_filter vlan_filter;
struct ncsi_channel_stats stats;
struct {
struct timer_list timer;
@@ -320,10 +320,6 @@ extern spinlock_t ncsi_dev_lock;
list_for_each_entry_rcu(nc, &np->channels, node)
/* Resources */
-u32 *ncsi_get_filter(struct ncsi_channel *nc, int table, int index);
-int ncsi_find_filter(struct ncsi_channel *nc, int table, void *data);
-int ncsi_add_filter(struct ncsi_channel *nc, int table, void *data);
-int ncsi_remove_filter(struct ncsi_channel *nc, int table, int index);
void ncsi_start_channel_monitor(struct ncsi_channel *nc);
void ncsi_stop_channel_monitor(struct ncsi_channel *nc);
struct ncsi_channel *ncsi_find_channel(struct ncsi_package *np,
diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c
index c3695ba0cf94..5561e221b71f 100644
--- a/net/ncsi/ncsi-manage.c
+++ b/net/ncsi/ncsi-manage.c
@@ -27,125 +27,6 @@
LIST_HEAD(ncsi_dev_list);
DEFINE_SPINLOCK(ncsi_dev_lock);
-static inline int ncsi_filter_size(int table)
-{
- int sizes[] = { 2, 6, 6, 6 };
-
- BUILD_BUG_ON(ARRAY_SIZE(sizes) != NCSI_FILTER_MAX);
- if (table < NCSI_FILTER_BASE || table >= NCSI_FILTER_MAX)
- return -EINVAL;
-
- return sizes[table];
-}
-
-u32 *ncsi_get_filter(struct ncsi_channel *nc, int table, int index)
-{
- struct ncsi_channel_filter *ncf;
- int size;
-
- ncf = nc->filters[table];
- if (!ncf)
- return NULL;
-
- size = ncsi_filter_size(table);
- if (size < 0)
- return NULL;
-
- return ncf->data + size * index;
-}
-
-/* Find the first active filter in a filter table that matches the given
- * data parameter. If data is NULL, this returns the first active filter.
- */
-int ncsi_find_filter(struct ncsi_channel *nc, int table, void *data)
-{
- struct ncsi_channel_filter *ncf;
- void *bitmap;
- int index, size;
- unsigned long flags;
-
- ncf = nc->filters[table];
- if (!ncf)
- return -ENXIO;
-
- size = ncsi_filter_size(table);
- if (size < 0)
- return size;
-
- spin_lock_irqsave(&nc->lock, flags);
- bitmap = (void *)&ncf->bitmap;
- index = -1;
- while ((index = find_next_bit(bitmap, ncf->total, index + 1))
- < ncf->total) {
- if (!data || !memcmp(ncf->data + size * index, data, size)) {
- spin_unlock_irqrestore(&nc->lock, flags);
- return index;
- }
- }
- spin_unlock_irqrestore(&nc->lock, flags);
-
- return -ENOENT;
-}
-
-int ncsi_add_filter(struct ncsi_channel *nc, int table, void *data)
-{
- struct ncsi_channel_filter *ncf;
- int index, size;
- void *bitmap;
- unsigned long flags;
-
- size = ncsi_filter_size(table);
- if (size < 0)
- return size;
-
- index = ncsi_find_filter(nc, table, data);
- if (index >= 0)
- return index;
-
- ncf = nc->filters[table];
- if (!ncf)
- return -ENODEV;
-
- spin_lock_irqsave(&nc->lock, flags);
- bitmap = (void *)&ncf->bitmap;
- do {
- index = find_next_zero_bit(bitmap, ncf->total, 0);
- if (index >= ncf->total) {
- spin_unlock_irqrestore(&nc->lock, flags);
- return -ENOSPC;
- }
- } while (test_and_set_bit(index, bitmap));
-
- memcpy(ncf->data + size * index, data, size);
- spin_unlock_irqrestore(&nc->lock, flags);
-
- return index;
-}
-
-int ncsi_remove_filter(struct ncsi_channel *nc, int table, int index)
-{
- struct ncsi_channel_filter *ncf;
- int size;
- void *bitmap;
- unsigned long flags;
-
- size = ncsi_filter_size(table);
- if (size < 0)
- return size;
-
- ncf = nc->filters[table];
- if (!ncf || index >= ncf->total)
- return -ENODEV;
-
- spin_lock_irqsave(&nc->lock, flags);
- bitmap = (void *)&ncf->bitmap;
- if (test_and_clear_bit(index, bitmap))
- memset(ncf->data + size * index, 0, size);
- spin_unlock_irqrestore(&nc->lock, flags);
-
- return 0;
-}
-
static void ncsi_report_link(struct ncsi_dev_priv *ndp, bool force_down)
{
struct ncsi_dev *nd = &ndp->ndev;
@@ -339,20 +220,13 @@ struct ncsi_channel *ncsi_add_channel(struct ncsi_package *np, unsigned char id)
static void ncsi_remove_channel(struct ncsi_channel *nc)
{
struct ncsi_package *np = nc->package;
- struct ncsi_channel_filter *ncf;
unsigned long flags;
- int i;
- /* Release filters */
spin_lock_irqsave(&nc->lock, flags);
- for (i = 0; i < NCSI_FILTER_MAX; i++) {
- ncf = nc->filters[i];
- if (!ncf)
- continue;
- nc->filters[i] = NULL;
- kfree(ncf);
- }
+ /* Release filters */
+ kfree(nc->mac_filter.addrs);
+ kfree(nc->vlan_filter.vids);
nc->state = NCSI_CHANNEL_INACTIVE;
spin_unlock_irqrestore(&nc->lock, flags);
@@ -670,32 +544,26 @@ static void ncsi_suspend_channel(struct ncsi_dev_priv *ndp)
static int clear_one_vid(struct ncsi_dev_priv *ndp, struct ncsi_channel *nc,
struct ncsi_cmd_arg *nca)
{
+ struct ncsi_channel_vlan_filter *ncf;
+ unsigned long flags;
+ void *bitmap;
int index;
- u32 *data;
u16 vid;
- index = ncsi_find_filter(nc, NCSI_FILTER_VLAN, NULL);
- if (index < 0) {
- /* Filter table empty */
- return -1;
- }
+ ncf = &nc->vlan_filter;
+ bitmap = &ncf->bitmap;
- data = ncsi_get_filter(nc, NCSI_FILTER_VLAN, index);
- if (!data) {
- netdev_err(ndp->ndev.dev,
- "NCSI: failed to retrieve filter %d\n", index);
- /* Set the VLAN id to 0 - this will still disable the entry in
- * the filter table, but we won't know what it was.
- */
- vid = 0;
- } else {
- vid = *(u16 *)data;
+ spin_lock_irqsave(&nc->lock, flags);
+ index = find_next_bit(bitmap, ncf->n_vids, 0);
+ if (index >= ncf->n_vids) {
+ spin_unlock_irqrestore(&nc->lock, flags);
+ return -1;
}
+ vid = ncf->vids[index];
- netdev_printk(KERN_DEBUG, ndp->ndev.dev,
- "NCSI: removed vlan tag %u at index %d\n",
- vid, index + 1);
- ncsi_remove_filter(nc, NCSI_FILTER_VLAN, index);
+ clear_bit(index, bitmap);
+ ncf->vids[index] = 0;
+ spin_unlock_irqrestore(&nc->lock, flags);
nca->type = NCSI_PKT_CMD_SVF;
nca->words[1] = vid;
@@ -711,45 +579,55 @@ static int clear_one_vid(struct ncsi_dev_priv *ndp, struct ncsi_channel *nc,
static int set_one_vid(struct ncsi_dev_priv *ndp, struct ncsi_channel *nc,
struct ncsi_cmd_arg *nca)
{
+ struct ncsi_channel_vlan_filter *ncf;
struct vlan_vid *vlan = NULL;
- int index = 0;
+ unsigned long flags;
+ int i, index;
+ void *bitmap;
+ u16 vid;
+ if (list_empty(&ndp->vlan_vids))
+ return -1;
+
+ ncf = &nc->vlan_filter;
+ bitmap = &ncf->bitmap;
+
+ spin_lock_irqsave(&nc->lock, flags);
+
+ rcu_read_lock();
list_for_each_entry_rcu(vlan, &ndp->vlan_vids, list) {
- index = ncsi_find_filter(nc, NCSI_FILTER_VLAN, &vlan->vid);
- if (index < 0) {
- /* New tag to add */
- netdev_printk(KERN_DEBUG, ndp->ndev.dev,
- "NCSI: new vlan id to set: %u\n",
- vlan->vid);
+ vid = vlan->vid;
+ for (i = 0; i < ncf->n_vids; i++)
+ if (ncf->vids[i] == vid) {
+ vid = 0;
+ break;
+ }
+ if (vid)
break;
- }
- netdev_printk(KERN_DEBUG, ndp->ndev.dev,
- "vid %u already at filter pos %d\n",
- vlan->vid, index);
}
+ rcu_read_unlock();
- if (!vlan || index >= 0) {
- netdev_printk(KERN_DEBUG, ndp->ndev.dev,
- "no vlan ids left to set\n");
+ if (!vid) {
+ /* No VLAN ID is not set */
+ spin_unlock_irqrestore(&nc->lock, flags);
return -1;
}
- index = ncsi_add_filter(nc, NCSI_FILTER_VLAN, &vlan->vid);
- if (index < 0) {
+ index = find_next_zero_bit(bitmap, ncf->n_vids, 0);
+ if (index < 0 || index >= ncf->n_vids) {
netdev_err(ndp->ndev.dev,
- "Failed to add new VLAN tag, error %d\n", index);
- if (index == -ENOSPC)
- netdev_err(ndp->ndev.dev,
- "Channel %u already has all VLAN filters set\n",
- nc->id);
+ "Channel %u already has all VLAN filters set\n",
+ nc->id);
+ spin_unlock_irqrestore(&nc->lock, flags);
return -1;
}
- netdev_printk(KERN_DEBUG, ndp->ndev.dev,
- "NCSI: set vid %u in packet, index %u\n",
- vlan->vid, index + 1);
+ ncf->vids[index] = vid;
+ set_bit(index, bitmap);
+ spin_unlock_irqrestore(&nc->lock, flags);
+
nca->type = NCSI_PKT_CMD_SVF;
- nca->words[1] = vlan->vid;
+ nca->words[1] = vid;
/* HW filter index starts at 1 */
nca->bytes[6] = index + 1;
nca->bytes[7] = 0x01;
diff --git a/net/ncsi/ncsi-netlink.c b/net/ncsi/ncsi-netlink.c
index 8d7e849d4825..b09ef77bf4cd 100644
--- a/net/ncsi/ncsi-netlink.c
+++ b/net/ncsi/ncsi-netlink.c
@@ -58,10 +58,9 @@ static int ncsi_write_channel_info(struct sk_buff *skb,
struct ncsi_dev_priv *ndp,
struct ncsi_channel *nc)
{
- struct nlattr *vid_nest;
- struct ncsi_channel_filter *ncf;
+ struct ncsi_channel_vlan_filter *ncf;
struct ncsi_channel_mode *m;
- u32 *data;
+ struct nlattr *vid_nest;
int i;
nla_put_u32(skb, NCSI_CHANNEL_ATTR_ID, nc->id);
@@ -79,18 +78,13 @@ static int ncsi_write_channel_info(struct sk_buff *skb,
vid_nest = nla_nest_start(skb, NCSI_CHANNEL_ATTR_VLAN_LIST);
if (!vid_nest)
return -ENOMEM;
- ncf = nc->filters[NCSI_FILTER_VLAN];
+ ncf = &nc->vlan_filter;
i = -1;
- if (ncf) {
- while ((i = find_next_bit((void *)&ncf->bitmap, ncf->total,
- i + 1)) < ncf->total) {
- data = ncsi_get_filter(nc, NCSI_FILTER_VLAN, i);
- /* Uninitialised channels will have 'zero' vlan ids */
- if (!data || !*data)
- continue;
+ while ((i = find_next_bit((void *)&ncf->bitmap, ncf->n_vids,
+ i + 1)) < ncf->n_vids) {
+ if (ncf->vids[i])
nla_put_u16(skb, NCSI_CHANNEL_ATTR_VLAN_ID,
- *(u16 *)data);
- }
+ ncf->vids[i]);
}
nla_nest_end(skb, vid_nest);
diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index efd933ff5570..ce9497966ebe 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -334,9 +334,9 @@ static int ncsi_rsp_handler_svf(struct ncsi_request *nr)
struct ncsi_rsp_pkt *rsp;
struct ncsi_dev_priv *ndp = nr->ndp;
struct ncsi_channel *nc;
- struct ncsi_channel_filter *ncf;
- unsigned short vlan;
- int ret;
+ struct ncsi_channel_vlan_filter *ncf;
+ unsigned long flags;
+ void *bitmap;
/* Find the package and channel */
rsp = (struct ncsi_rsp_pkt *)skb_network_header(nr->rsp);
@@ -346,22 +346,23 @@ static int ncsi_rsp_handler_svf(struct ncsi_request *nr)
return -ENODEV;
cmd = (struct ncsi_cmd_svf_pkt *)skb_network_header(nr->cmd);
- ncf = nc->filters[NCSI_FILTER_VLAN];
- if (!ncf)
- return -ENOENT;
- if (cmd->index >= ncf->total)
+ ncf = &nc->vlan_filter;
+ if (cmd->index > ncf->n_vids)
return -ERANGE;
- /* Add or remove the VLAN filter */
+ /* Add or remove the VLAN filter. Remember HW indexes from 1 */
+ spin_lock_irqsave(&nc->lock, flags);
+ bitmap = &ncf->bitmap;
if (!(cmd->enable & 0x1)) {
- /* HW indexes from 1 */
- ret = ncsi_remove_filter(nc, NCSI_FILTER_VLAN, cmd->index - 1);
+ if (test_and_clear_bit(cmd->index - 1, bitmap))
+ ncf->vids[cmd->index - 1] = 0;
} else {
- vlan = ntohs(cmd->vlan);
- ret = ncsi_add_filter(nc, NCSI_FILTER_VLAN, &vlan);
+ set_bit(cmd->index - 1, bitmap);
+ ncf->vids[cmd->index - 1] = ntohs(cmd->vlan);
}
+ spin_unlock_irqrestore(&nc->lock, flags);
- return ret;
+ return 0;
}
static int ncsi_rsp_handler_ev(struct ncsi_request *nr)
@@ -422,8 +423,12 @@ static int ncsi_rsp_handler_sma(struct ncsi_request *nr)
struct ncsi_rsp_pkt *rsp;
struct ncsi_dev_priv *ndp = nr->ndp;
struct ncsi_channel *nc;
- struct ncsi_channel_filter *ncf;
+ struct ncsi_channel_mac_filter *ncf;
+ unsigned long flags;
void *bitmap;
+ bool enabled;
+ int index;
+
/* Find the package and channel */
rsp = (struct ncsi_rsp_pkt *)skb_network_header(nr->rsp);
@@ -436,31 +441,23 @@ static int ncsi_rsp_handler_sma(struct ncsi_request *nr)
* isn't supported yet.
*/
cmd = (struct ncsi_cmd_sma_pkt *)skb_network_header(nr->cmd);
- switch (cmd->at_e >> 5) {
- case 0x0: /* UC address */
- ncf = nc->filters[NCSI_FILTER_UC];
- break;
- case 0x1: /* MC address */
- ncf = nc->filters[NCSI_FILTER_MC];
- break;
- default:
- return -EINVAL;
- }
+ enabled = cmd->at_e & 0x1;
+ ncf = &nc->mac_filter;
+ bitmap = &ncf->bitmap;
- /* Sanity check on the filter */
- if (!ncf)
- return -ENOENT;
- else if (cmd->index >= ncf->total)
+ if (cmd->index > ncf->n_uc + ncf->n_mc + ncf->n_mixed)
return -ERANGE;
- bitmap = &ncf->bitmap;
- if (cmd->at_e & 0x1) {
- set_bit(cmd->index, bitmap);
- memcpy(ncf->data + 6 * cmd->index, cmd->mac, 6);
+ index = (cmd->index - 1) * ETH_ALEN;
+ spin_lock_irqsave(&nc->lock, flags);
+ if (enabled) {
+ set_bit(cmd->index - 1, bitmap);
+ memcpy(&ncf->addrs[index], cmd->mac, ETH_ALEN);
} else {
- clear_bit(cmd->index, bitmap);
- memset(ncf->data + 6 * cmd->index, 0, 6);
+ clear_bit(cmd->index - 1, bitmap);
+ memset(&ncf->addrs[index], 0, ETH_ALEN);
}
+ spin_unlock_irqrestore(&nc->lock, flags);
return 0;
}
@@ -631,9 +628,7 @@ static int ncsi_rsp_handler_gc(struct ncsi_request *nr)
struct ncsi_rsp_gc_pkt *rsp;
struct ncsi_dev_priv *ndp = nr->ndp;
struct ncsi_channel *nc;
- struct ncsi_channel_filter *ncf;
- size_t size, entry_size;
- int cnt, i;
+ size_t size;
/* Find the channel */
rsp = (struct ncsi_rsp_gc_pkt *)skb_network_header(nr->rsp);
@@ -655,64 +650,40 @@ static int ncsi_rsp_handler_gc(struct ncsi_request *nr)
nc->caps[NCSI_CAP_VLAN].cap = rsp->vlan_mode &
NCSI_CAP_VLAN_MASK;
- /* Build filters */
- for (i = 0; i < NCSI_FILTER_MAX; i++) {
- switch (i) {
- case NCSI_FILTER_VLAN:
- cnt = rsp->vlan_cnt;
- entry_size = 2;
- break;
- case NCSI_FILTER_MIXED:
- cnt = rsp->mixed_cnt;
- entry_size = 6;
- break;
- case NCSI_FILTER_MC:
- cnt = rsp->mc_cnt;
- entry_size = 6;
- break;
- case NCSI_FILTER_UC:
- cnt = rsp->uc_cnt;
- entry_size = 6;
- break;
- default:
- continue;
- }
-
- if (!cnt || nc->filters[i])
- continue;
-
- size = sizeof(*ncf) + cnt * entry_size;
- ncf = kzalloc(size, GFP_ATOMIC);
- if (!ncf) {
- pr_warn("%s: Cannot alloc filter table (%d)\n",
- __func__, i);
- return -ENOMEM;
- }
-
- ncf->index = i;
- ncf->total = cnt;
- if (i == NCSI_FILTER_VLAN) {
- /* Set VLAN filters active so they are cleared in
- * first configuration state
- */
- ncf->bitmap = U64_MAX;
- } else {
- ncf->bitmap = 0x0ul;
- }
- nc->filters[i] = ncf;
- }
+ size = (rsp->uc_cnt + rsp->mc_cnt + rsp->mixed_cnt) * ETH_ALEN;
+ nc->mac_filter.addrs = kzalloc(size, GFP_KERNEL);
+ if (!nc->mac_filter.addrs)
+ return -ENOMEM;
+ nc->mac_filter.n_uc = rsp->uc_cnt;
+ nc->mac_filter.n_mc = rsp->mc_cnt;
+ nc->mac_filter.n_mixed = rsp->mixed_cnt;
+
+ nc->vlan_filter.vids = kcalloc(rsp->vlan_cnt,
+ sizeof(*nc->vlan_filter.vids),
+ GFP_KERNEL);
+ if (!nc->vlan_filter.vids)
+ return -ENOMEM;
+ /* Set VLAN filters active so they are cleared in the first
+ * configuration state
+ */
+ nc->vlan_filter.bitmap = U64_MAX;
+ nc->vlan_filter.n_vids = rsp->vlan_cnt;
return 0;
}
static int ncsi_rsp_handler_gp(struct ncsi_request *nr)
{
- struct ncsi_rsp_gp_pkt *rsp;
+ struct ncsi_channel_vlan_filter *ncvf;
+ struct ncsi_channel_mac_filter *ncmf;
struct ncsi_dev_priv *ndp = nr->ndp;
+ struct ncsi_rsp_gp_pkt *rsp;
struct ncsi_channel *nc;
- unsigned short enable, vlan;
+ unsigned short enable;
unsigned char *pdata;
- int table, i;
+ unsigned long flags;
+ void *bitmap;
+ int i;
/* Find the channel */
rsp = (struct ncsi_rsp_gp_pkt *)skb_network_header(nr->rsp);
@@ -746,36 +717,33 @@ static int ncsi_rsp_handler_gp(struct ncsi_request *nr)
/* MAC addresses filter table */
pdata = (unsigned char *)rsp + 48;
enable = rsp->mac_enable;
+ ncmf = &nc->mac_filter;
+ spin_lock_irqsave(&nc->lock, flags);
+ bitmap = &ncmf->bitmap;
for (i = 0; i < rsp->mac_cnt; i++, pdata += 6) {
- if (i >= (nc->filters[NCSI_FILTER_UC]->total +
- nc->filters[NCSI_FILTER_MC]->total))
- table = NCSI_FILTER_MIXED;
- else if (i >= nc->filters[NCSI_FILTER_UC]->total)
- table = NCSI_FILTER_MC;
- else
- table = NCSI_FILTER_UC;
-
if (!(enable & (0x1 << i)))
- continue;
-
- if (ncsi_find_filter(nc, table, pdata) >= 0)
- continue;
+ clear_bit(i, bitmap);
+ else
+ set_bit(i, bitmap);
- ncsi_add_filter(nc, table, pdata);
+ memcpy(&ncmf->addrs[i * ETH_ALEN], pdata, ETH_ALEN);
}
+ spin_unlock_irqrestore(&nc->lock, flags);
/* VLAN filter table */
enable = ntohs(rsp->vlan_enable);
+ ncvf = &nc->vlan_filter;
+ bitmap = &ncvf->bitmap;
+ spin_lock_irqsave(&nc->lock, flags);
for (i = 0; i < rsp->vlan_cnt; i++, pdata += 2) {
if (!(enable & (0x1 << i)))
- continue;
-
- vlan = ntohs(*(__be16 *)pdata);
- if (ncsi_find_filter(nc, NCSI_FILTER_VLAN, &vlan) >= 0)
- continue;
+ clear_bit(i, bitmap);
+ else
+ set_bit(i, bitmap);
- ncsi_add_filter(nc, NCSI_FILTER_VLAN, &vlan);
+ ncvf->vids[i] = ntohs(*(__be16 *)pdata);
}
+ spin_unlock_irqrestore(&nc->lock, flags);
return 0;
}
--
2.17.0
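The VLAN filter handling in this patch boils down to a bitmap of used slots plus a flat array of IDs, with the hardware numbering slots from 1. A minimal user-space sketch of that idea follows. All demo_* names are hypothetical, the 64-slot cap mirrors the u64 bitmap, and rejecting hardware index 0 is an addition of this sketch rather than something the patch does.

```c
#include <stdint.h>

#define DEMO_MAX_VIDS 64

/* Hypothetical user-space stand-in for struct ncsi_channel_vlan_filter. */
struct demo_vlan_filter {
	uint8_t  n_vids;               /* slots reported by the channel */
	uint64_t bitmap;               /* bit i set => slot i in use */
	uint16_t vids[DEMO_MAX_VIDS];  /* VLAN ID per slot, 0 when unused */
};

/* Number of usable slots, capped so that shifts stay within 64 bits. */
static int demo_slots(const struct demo_vlan_filter *f)
{
	return f->n_vids < DEMO_MAX_VIDS ? f->n_vids : DEMO_MAX_VIDS;
}

/*
 * Store vid in the first free slot. Returns the 1-based hardware index,
 * or -1 if vid is 0, already present, or the table is full.
 */
int demo_vlan_add(struct demo_vlan_filter *f, uint16_t vid)
{
	int i;

	if (vid == 0)
		return -1;
	for (i = 0; i < demo_slots(f); i++)
		if ((f->bitmap & (UINT64_C(1) << i)) && f->vids[i] == vid)
			return -1;
	for (i = 0; i < demo_slots(f); i++) {
		if (!(f->bitmap & (UINT64_C(1) << i))) {
			f->bitmap |= UINT64_C(1) << i;
			f->vids[i] = vid;
			return i + 1; /* HW filter index starts at 1 */
		}
	}
	return -1;
}

/* Clear a slot by 1-based hardware index; 0 and out-of-range fail. */
int demo_vlan_clear(struct demo_vlan_filter *f, unsigned int hw_index)
{
	if (hw_index == 0 || hw_index > (unsigned int)demo_slots(f))
		return -1;
	f->bitmap &= ~(UINT64_C(1) << (hw_index - 1));
	f->vids[hw_index - 1] = 0;
	return 0;
}
```

With n_vids = 2, adding VLAN IDs 100 and 200 returns hardware indexes 1 and 2, a third add fails, and clearing index 1 frees that slot for reuse. Locking is left out of the sketch; the patch takes nc->lock around the equivalent updates.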
* Re: KASAN: use-after-free Read in inet_create
From: Sowmini Varadhan @ 2018-04-09 1:04 UTC (permalink / raw)
To: Eric Biggers
Cc: linux-rdma, rds-devel, Santosh Shilimkar, syzbot, davem, kuznet,
linux-kernel, netdev, syzkaller-bugs, yoshfuji
In-Reply-To: <20180408231756.GI685@sol.localdomain>
#syz dup: KASAN: use-after-free Read in rds_cong_queue_updates
There are a number of manifestations of this bug. Basically,
all of them suggest that the connect/reconnect etc. workqs are
somehow being scheduled after the netns is deleted, despite the
code refactoring in commit 3db6e0d172c (and it looks like
the WARN_ONs in that commit are not even being triggered).
We've not been able to reproduce this issue, and without
a crash dump (or some hint of other threads that were running
at the time of the problem) we are working on figuring out
the root cause by code inspection.
--Sowmini
* [PATCH AUTOSEL for 4.9 160/293] MIPS: Give __secure_computing() access to syscall arguments.
From: Sasha Levin @ 2018-04-09 0:24 UTC (permalink / raw)
To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: David Daney, Alexei Starovoitov, Daniel Borkmann, Matt Redfearn,
netdev@vger.kernel.org, linux-mips@linux-mips.org, Ralf Baechle,
Sasha Levin
In-Reply-To: <20180409002239.163177-1-alexander.levin@microsoft.com>
From: David Daney <david.daney@cavium.com>
[ Upstream commit 669c4092225f0ed5df12ebee654581b558a5e3ed ]
KProbes of __seccomp_filter() are not very useful without access to
the syscall arguments.
Do what x86 does, and populate a struct seccomp_data to be passed to
__secure_computing(). This allows samples/bpf/tracex5 to extract a
sensible trace.
Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/16368/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
arch/mips/kernel/ptrace.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/arch/mips/kernel/ptrace.c b/arch/mips/kernel/ptrace.c
index 0c8ae2cc6380..956dae7e6a69 100644
--- a/arch/mips/kernel/ptrace.c
+++ b/arch/mips/kernel/ptrace.c
@@ -1011,8 +1011,26 @@ asmlinkage long syscall_trace_enter(struct pt_regs *regs, long syscall)
tracehook_report_syscall_entry(regs))
return -1;
- if (secure_computing(NULL) == -1)
- return -1;
+#ifdef CONFIG_SECCOMP
+ if (unlikely(test_thread_flag(TIF_SECCOMP))) {
+ int ret, i;
+ struct seccomp_data sd;
+
+ sd.nr = syscall;
+ sd.arch = syscall_get_arch();
+ for (i = 0; i < 6; i++) {
+ unsigned long v, r;
+
+ r = mips_get_syscall_arg(&v, current, regs, i);
+ sd.args[i] = r ? 0 : v;
+ }
+ sd.instruction_pointer = KSTK_EIP(current);
+
+ ret = __secure_computing(&sd);
+ if (ret == -1)
+ return ret;
+ }
+#endif
if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
trace_sys_enter(regs, regs->regs[2]);
--
2.15.1
* Re: [PATCH v3] dp83640: Ensure against premature access to PHY registers after reset
From: David Miller @ 2018-04-08 23:59 UTC (permalink / raw)
To: esben.haabendal
Cc: netdev, eha, richardcochran, andrew, f.fainelli, linux-kernel
In-Reply-To: <20180408201702.23299-1-esben.haabendal@gmail.com>
From: Esben Haabendal <esben.haabendal@gmail.com>
Date: Sun, 8 Apr 2018 22:17:01 +0200
> From: Esben Haabendal <eha@deif.com>
>
> The datasheet specifies a 3uS pause after performing a software
> reset. The default implementation of genphy_soft_reset() does not
> provide this, so implement soft_reset with the needed pause.
>
> Signed-off-by: Esben Haabendal <eha@deif.com>
> Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Applied, thank you.
* Re: pull-request: bpf 2018-04-09
From: David Miller @ 2018-04-08 23:58 UTC (permalink / raw)
To: daniel; +Cc: ast, netdev
In-Reply-To: <20180408222847.27289-1-daniel@iogearbox.net>
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Mon, 9 Apr 2018 00:28:47 +0200
> The following pull-request contains BPF updates for your *net* tree.
>
> The main changes are:
>
> 1) Two sockmap fixes: i) fix a potential warning when a socket with
> pending cork data is closed by freeing the memory right when the
> socket is closed instead of seeing still outstanding memory at
> garbage collector time, ii) fix a NULL pointer deref in case of
> duplicate release calls, so make sure to only reset the sk_prot
> pointer when it's in a valid state to do so, both from John.
>
> 2) Fix a compilation warning in bpf_prog_attach_check_attach_type()
> by moving the function under CONFIG_CGROUP_BPF ifdef since only
> used there, from Anders.
>
> Please consider pulling these changes from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
Pulled, thanks Daniel.
* Re: [RFC] connector: add group_exit_code and signal_flags fields to exit_proc_event
From: Evgeniy Polyakov @ 2018-04-08 23:49 UTC (permalink / raw)
To: Stefan Strogin, matthltc@us.ibm.com
Cc: netdev@vger.kernel.org, xe-linux-external@cisco.com,
Victor Kamensky, Taras Kondratiuk, Ruslan Bilovol
In-Reply-To: <307aa191-05b6-5227-19a0-13741c839fd1@cisco.com>
Hi everyone
Sorry for the late reply
01.03.2018, 21:58, "Stefan Strogin" <sstrogin@cisco.com>:
> So I was thinking to add these two fields to union event_data:
> task->signal->group_exit_code
> task->signal->flags
> This won't increase size of struct proc_event (because of comm_proc_event)
> and shouldn't break backward compatibility for the user-space. But it will
> add some useful information about what caused the process death.
> What do you think, is it an acceptable approach?
As I saw in the other discussion: doesn't this break the userspace API, or are you sure that no sizes have increased?
You are using the same structure that is used for plain signals and adding group status there. How will userspace react
if it was compiled against older headers? What if it uses zero-field alignment, i.e. allocates exactly the size of the structure with byte precision?
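To make the size question concrete, here is a minimal sketch. It assumes hypothetical stand-in structs (old_exit_event, new_exit_event, comm_event), not the real layout from the proc connector header. It only shows the general point: a union keeps its size as long as the grown member stays no larger than the largest existing member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for illustration only; these are NOT the real
 * definitions from the kernel's proc connector UAPI header.
 */
struct old_exit_event {
	int32_t  pid;
	int32_t  tgid;
	uint32_t exit_code;
	uint32_t exit_signal;
};

struct new_exit_event {
	int32_t  pid;
	int32_t  tgid;
	uint32_t exit_code;
	uint32_t exit_signal;
	int32_t  group_exit_code;	/* proposed field (assumed type) */
	uint32_t signal_flags;		/* proposed field (assumed type) */
};

/* Stand-in for the largest existing union member. */
struct comm_event {
	int32_t pid;
	int32_t tgid;
	char    comm[16];
};

union old_event_data {
	struct old_exit_event exit;
	struct comm_event     comm;
};

union new_event_data {
	struct new_exit_event exit;
	struct comm_event     comm;
};

/* Returns 1 when growing the exit member leaves the union size unchanged. */
int union_size_unchanged(void)
{
	return sizeof(union old_event_data) == sizeof(union new_event_data);
}
```

Even when the union size is unchanged, a reader built against the old headers would still treat the bytes after exit_signal as padding or garbage, so the size check answers only part of the question.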
* DPAA TX Issues
From: Jacob S. Moroni @ 2018-04-08 23:46 UTC (permalink / raw)
To: madalin.bucur; +Cc: netdev
Hello Madalin,
I've been experiencing some issues with the DPAA Ethernet driver,
specifically related to frame transmission. Hopefully you can point
me in the right direction.
TLDR: Attempting to transmit faster than a few frames per second causes
the TX FQ CGR to enter into the congested state and remain there forever,
even after transmission stops.
The hardware is a T2080RDB, running from the tip of net-next, using
the standard t2080rdb device tree and corenet64_smp_defconfig kernel
config. No changes were made to any of the files. The issue occurs
with 4.16.1 stable as well. In fact, the only time I've been able
to achieve reliable frame transmission was with the SDK 4.1 kernel.
For my tests, I'm running iperf3 both with and without the -R
option (send/receive). When using a USB Ethernet adapter, there
are no issues.
The issue is that it seems like the TX frame queues are getting
"stuck" when attempting to transmit at rates greater than a few frames
per second. Ping works fine, but it seems like anything that could
potentially cause multiple TX frames to be enqueued causes issues.
If I run iperf3 in reverse mode (with the T2080RDB receiving), then
I can achieve ~940 Mbps, but this is also somewhat unreliable.
If I run it with the T2080RDB transmitting, the test will never
complete. Sometimes it starts transmitting for a few seconds then stops,
and other times it never even starts. This also seems to force the
interface into a bad state.
The ethtool stats show that the interface has entered
congestion a few times, and that it's currently congested. The fact
that it's currently congested even after stopping transmission
indicates that the FQ somehow stopped being drained. I've also
noticed that whenever this issue occurs, the TX confirmation
counters are always less than the TX packet counters.
When it gets into this state, I can see the memory usage
climbing until it reaches roughly the CGR threshold
(about 100 MB).
Any idea what could prevent the TX FQ from being drained? My first
guess was flow control, but it's completely disabled.
I tried messing with the egress congestion threshold, workqueue
assignments, etc., but nothing seemed to have any effect.
If you need any more information or want me to run any tests,
please let me know.
Thanks,
--
Jacob S. Moroni
mail@jakemoroni.com
* Re: KASAN: use-after-free Read in inet_create
From: Eric Biggers @ 2018-04-08 23:17 UTC (permalink / raw)
To: linux-rdma, rds-devel, Santosh Shilimkar
Cc: syzbot, davem, kuznet, linux-kernel, netdev, syzkaller-bugs,
yoshfuji
In-Reply-To: <001a1144d1c8e819f6055fee7118@google.com>
[+RDS list and maintainer]
On Sat, Dec 09, 2017 at 12:50:01PM -0800, syzbot wrote:
> Hello,
>
> syzkaller hit the following crash on
> 82bcf1def3b5f1251177ad47c44f7e17af039b4b
> git://git.cmpxchg.org/linux-mmots.git/master
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached
> Raw console output is attached.
>
> Unfortunately, I don't have any reproducer for this bug yet.
>
>
> ==================================================================
> BUG: KASAN: use-after-free in inet_create+0xda0/0xf50 net/ipv4/af_inet.c:338
> Read of size 4 at addr ffff8801bde28554 by task kworker/u4:5/3492
>
> CPU: 0 PID: 3492 Comm: kworker/u4:5 Not tainted 4.15.0-rc2-mm1+ #39
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Workqueue: krdsd rds_connect_worker
> Call Trace:
> __dump_stack lib/dump_stack.c:17 [inline]
> dump_stack+0x194/0x257 lib/dump_stack.c:53
> print_address_description+0x73/0x250 mm/kasan/report.c:252
> kasan_report_error mm/kasan/report.c:351 [inline]
> kasan_report+0x25b/0x340 mm/kasan/report.c:409
> __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
> inet_create+0xda0/0xf50 net/ipv4/af_inet.c:338
> __sock_create+0x4d4/0x850 net/socket.c:1265
> sock_create_kern+0x3f/0x50 net/socket.c:1311
> rds_tcp_conn_path_connect+0x26f/0x920 net/rds/tcp_connect.c:108
> rds_connect_worker+0x156/0x1f0 net/rds/threads.c:165
> process_one_work+0xbfd/0x1bc0 kernel/workqueue.c:2113
> worker_thread+0x223/0x1990 kernel/workqueue.c:2247
> kthread+0x37a/0x440 kernel/kthread.c:238
> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:524
>
> Allocated by task 3362:
> save_stack+0x43/0xd0 mm/kasan/kasan.c:447
> set_track mm/kasan/kasan.c:459 [inline]
> kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
> kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
> kmem_cache_alloc+0x12e/0x760 mm/slab.c:3548
> kmem_cache_zalloc include/linux/slab.h:695 [inline]
> net_alloc net/core/net_namespace.c:362 [inline]
> copy_net_ns+0x196/0x580 net/core/net_namespace.c:402
> create_new_namespaces+0x425/0x880 kernel/nsproxy.c:107
> unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:206
> SYSC_unshare kernel/fork.c:2421 [inline]
> SyS_unshare+0x653/0xfa0 kernel/fork.c:2371
> entry_SYSCALL_64_fastpath+0x1f/0x96
>
> Freed by task 35:
> save_stack+0x43/0xd0 mm/kasan/kasan.c:447
> set_track mm/kasan/kasan.c:459 [inline]
> kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
> __cache_free mm/slab.c:3492 [inline]
> kmem_cache_free+0x77/0x280 mm/slab.c:3750
> net_free+0xca/0x110 net/core/net_namespace.c:378
> net_drop_ns.part.11+0x26/0x30 net/core/net_namespace.c:385
> net_drop_ns net/core/net_namespace.c:384 [inline]
> cleanup_net+0x895/0xb60 net/core/net_namespace.c:502
> process_one_work+0xbfd/0x1bc0 kernel/workqueue.c:2113
> worker_thread+0x223/0x1990 kernel/workqueue.c:2247
> kthread+0x37a/0x440 kernel/kthread.c:238
> ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:524
>
> The buggy address belongs to the object at ffff8801bde28080
> which belongs to the cache net_namespace of size 6272
> The buggy address is located 1236 bytes inside of
> 6272-byte region [ffff8801bde28080, ffff8801bde29900)
> The buggy address belongs to the page:
> page:00000000df6a4dc0 count:1 mapcount:0 mapping:00000000553659f1 index:0x0
> compound_mapcount: 0
> flags: 0x2fffc0000008100(slab|head)
> raw: 02fffc0000008100 ffff8801bde28080 0000000000000000 0000000100000001
> raw: ffffea0006f75da0 ffffea0006f60220 ffff8801d989fe00 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
> ffff8801bde28400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8801bde28480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > ffff8801bde28500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ^
> ffff8801bde28580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8801bde28600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==================================================================
>
>
> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller@googlegroups.com.
> Please credit me with: Reported-by: syzbot <syzkaller@googlegroups.com>
>
> syzbot will keep track of this bug report.
> Once a fix for this bug is merged into any tree, reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug
> report.
> Note: all commands must start from beginning of the line in the email body.
>
This is still happening regularly, though syzbot hasn't been able to generate a
reproducer yet. All the reports seem to involve rds_connect_worker()
encountering a freed network namespace (struct net) when calling
sock_create_kern() from rds_tcp_conn_path_connect(). Probably something in RDS
needs to be taking a reference to the network namespace and isn't, or the RDS
workqueue isn't being shut down correctly. You can see all reports of this on
the syzbot dashboard at
https://syzkaller.appspot.com/bug?id=1f45ae538a0453220337ccb84962249fdd67107f.
Last one was April 5 on Linus' tree (commit 3e968c9f1401088).
- Eric