* Re: ath6kl: fix ath6kl_data_tx()'s return type
From: Kalle Valo @ 2018-04-27 11:35 UTC (permalink / raw)
To: Luc Van Oostenryck
Cc: linux-kernel, Luc Van Oostenryck, Kalle Valo, linux-wireless,
netdev
In-Reply-To: <20180424131900.5718-1-luc.vanoostenryck@gmail.com>
Luc Van Oostenryck <luc.vanoostenryck@gmail.com> wrote:
> The method ndo_start_xmit() is defined as returning a 'netdev_tx_t',
> which is a typedef for an enum type, but the implementation in this
> driver returns an 'int'.
>
> Fix this by returning 'netdev_tx_t' in this driver too.
>
> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Patch applied to ath-next branch of ath.git, thanks.
378b1d65070f ath6kl: fix ath6kl_data_tx()'s return type
--
https://patchwork.kernel.org/patch/10359823/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
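A minimal userspace sketch of why the return type matters, assuming a simplified ops table. The names netdev_tx_sketch_t, net_device_ops_sketch and data_tx_sketch are made up for illustration and are not the kernel's definitions; the enum values are only illustrative.

```c
#include <stddef.h>

/* Illustrative stand-in for the kernel's netdev_tx_t; not the real definition. */
typedef enum {
	TX_OK_SKETCH   = 0x00,	/* packet accepted by the driver */
	TX_BUSY_SKETCH = 0x10	/* driver could not take the packet */
} netdev_tx_sketch_t;

/* Hypothetical ops table: the callback slot is declared with the enum type. */
struct net_device_ops_sketch {
	netdev_tx_sketch_t (*start_xmit)(const void *pkt, size_t len);
};

/* Declaring the handler with the same enum return type (instead of int)
 * keeps its signature identical to the slot it is assigned to. */
static netdev_tx_sketch_t data_tx_sketch(const void *pkt, size_t len)
{
	if (pkt == NULL || len == 0)
		return TX_BUSY_SKETCH;	/* arbitrary choice for this sketch */
	return TX_OK_SKETCH;
}

static const struct net_device_ops_sketch ops_sketch = {
	.start_xmit = data_tx_sketch,
};
```

With an int-returning handler the assignment in ops_sketch would mix two different function types, which is the kind of mismatch the patch removes.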
* RE: [PATCH] DT: net: can: rcar_canfd: document R8A77970 bindings
From: Ramesh Shanmugasundaram @ 2018-04-27 11:33 UTC (permalink / raw)
To: Sergei Shtylyov, Marc Kleine-Budde, Rob Herring,
linux-can@vger.kernel.org, netdev@vger.kernel.org,
devicetree@vger.kernel.org
Cc: Wolfgang Grandegger, Mark Rutland,
linux-renesas-soc@vger.kernel.org
In-Reply-To: <7a3d170f-1d3a-a807-9256-15fe1a78ca4e@cogentembedded.com>
Hello Sergei,
Thanks for your patch.
> Subject: [PATCH] DT: net: can: rcar_canfd: document R8A77970 bindings
>
> Document the R-Car V3M (R8A77970) SoC support in the R-Car CAN-FD
> bindings.
>
> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Ramesh Shanmugasundaram <ramesh.shanmugasundaram@bp.renesas.com>
Thanks,
Ramesh
* Re: ip6-in-ip{4,6} ipsec tunnel issues with 1280 MTU
From: Ashwanth Goli @ 2018-04-27 11:02 UTC (permalink / raw)
To: Paolo Abeni; +Cc: netdev, maloney, edumazet, David Ahern, netdev-owner
In-Reply-To: <1524743477.2658.38.camel@redhat.com>
On 2018-04-26 17:21, Paolo Abeni wrote:
> Hi,
>
> [fixed CC list]
>
> On Wed, 2018-04-25 at 21:43 +0530, Ashwanth Goli wrote:
>> Hi Pablo,
>
> Actually I'm Paolo, but yours is a recurring mistake ;)
>
>> I am noticing an issue similar to the one reported by Alexis Perez
>> [Regression for ip6-in-ip4 IPsec tunnel in 4.14.16]
>>
>> In my IPsec setup outer MTU is set to 1280, ip6_setup_cork sees an MTU
>> less than IPV6_MIN_MTU because of the tunnel headers. -EINVAL is being
>> returned as a result of the MTU check that got added with below patch.
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/net/ipv6/ip6_output.c?h=v4.14.34&id=8278804e05f6bcfe3fdfea4a404020752ead15a6
>>
>> Can we remove this MTU check since your recent patch [ipv6: the entire
>> IPv6 header chain must fit the first fragment] fixes a similar issue?
>
> AFAICS, RFC 2473 implies we can have MTU below 1280 for tunnel devices
> so we can probably relax the MTU check for such devices, but I think we
> would still need it in the general case.
>
> Cheers,
>
> Paolo
Should I send out the following change as a patch?
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 2e891d2..c4c3313 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1235,7 +1235,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
if (np->frag_size)
mtu = np->frag_size;
}
- if (mtu < IPV6_MIN_MTU)
+ if (!(rt->dst.flags & DST_XFRM_TUNNEL) && mtu < IPV6_MIN_MTU)
return -EINVAL;
cork->base.fragsize = mtu;
if (dst_allfrag(xfrm_dst_path(&rt->dst)))
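The effect of the proposed condition can be sketched in plain userspace C. This is only a model of the check, not the kernel function: cork_mtu_check_sketch and its is_xfrm_tunnel parameter are hypothetical stand-ins for the `rt->dst.flags & DST_XFRM_TUNNEL` test in the diff above.

```c
#include <errno.h>
#include <stdbool.h>

#define IPV6_MIN_MTU_SKETCH 1280	/* minimum IPv6 link MTU */

/* Model of the relaxed check: an MTU below the IPv6 minimum is rejected
 * only when the route is not an xfrm tunnel (hypothetical flag). */
static int cork_mtu_check_sketch(unsigned int mtu, bool is_xfrm_tunnel)
{
	if (!is_xfrm_tunnel && mtu < IPV6_MIN_MTU_SKETCH)
		return -EINVAL;
	return 0;
}
```

Under this model a 1280-byte outer MTU reduced by tunnel headers still passes for tunnel routes, while non-tunnel routes keep the existing rejection.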
* Re: [PATCH net-next v3] Add Common Applications Kept Enhanced (cake) qdisc
From: kbuild test robot @ 2018-04-27 10:54 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: kbuild-all, netdev, cake, Toke Høiland-Jørgensen,
Dave Taht
In-Reply-To: <20180425134249.21300-1-toke@toke.dk>
Hi Toke,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on net-next/master]
url: https://github.com/0day-ci/linux/commits/Toke-H-iland-J-rgensen/Add-Common-Applications-Kept-Enhanced-cake-qdisc/20180427-175308
config: i386-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All errors (new ones prefixed by >>):
>> net/sched/sch_cake.c:68:10: fatal error: pkt_sched.h: No such file or directory
#include "pkt_sched.h"
^~~~~~~~~~~~~
compilation terminated.
vim +68 net/sched/sch_cake.c
2
3 /* COMMON Applications Kept Enhanced (CAKE) discipline
4 *
5 * Copyright (C) 2014-2018 Jonathan Morton <chromatix99@gmail.com>
6 * Copyright (C) 2015-2018 Toke Høiland-Jørgensen <toke@toke.dk>
7 * Copyright (C) 2014-2018 Dave Täht <dave.taht@gmail.com>
8 * Copyright (C) 2015-2018 Sebastian Moeller <moeller0@gmx.de>
9 * (C) 2015-2018 Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk>
10 * Copyright (C) 2017 Ryan Mounce <ryan@mounce.com.au>
11 *
12 * The CAKE Principles:
13 * (or, how to have your cake and eat it too)
14 *
15 * This is a combination of several shaping, AQM and FQ techniques into one
16 * easy-to-use package:
17 *
18 * - An overall bandwidth shaper, to move the bottleneck away from dumb CPE
19 * equipment and bloated MACs. This operates in deficit mode (as in sch_fq),
20 * eliminating the need for any sort of burst parameter (eg. token bucket
21 * depth). Burst support is limited to that necessary to overcome scheduling
22 * latency.
23 *
24 * - A Diffserv-aware priority queue, giving more priority to certain classes,
25 * up to a specified fraction of bandwidth. Above that bandwidth threshold,
26 * the priority is reduced to avoid starving other tins.
27 *
28 * - Each priority tin has a separate Flow Queue system, to isolate traffic
29 * flows from each other. This prevents a burst on one flow from increasing
30 * the delay to another. Flows are distributed to queues using a
31 * set-associative hash function.
32 *
33 * - Each queue is actively managed by Cobalt, which is a combination of the
34 * Codel and Blue AQM algorithms. This serves flows fairly, and signals
35 * congestion early via ECN (if available) and/or packet drops, to keep
36 * latency low. The codel parameters are auto-tuned based on the bandwidth
37 * setting, as is necessary at low bandwidths.
38 *
39 * The configuration parameters are kept deliberately simple for ease of use.
40 * Everything has sane defaults. Complete generality of configuration is *not*
41 * a goal.
42 *
43 * The priority queue operates according to a weighted DRR scheme, combined with
44 * a bandwidth tracker which reuses the shaper logic to detect which side of the
45 * bandwidth sharing threshold the tin is operating. This determines whether a
46 * priority-based weight (high) or a bandwidth-based weight (low) is used for
47 * that tin in the current pass.
48 *
49 * This qdisc was inspired by Eric Dumazet's fq_codel code, which he kindly
50 * granted us permission to leverage.
51 */
52
53 #include <linux/module.h>
54 #include <linux/types.h>
55 #include <linux/kernel.h>
56 #include <linux/jiffies.h>
57 #include <linux/string.h>
58 #include <linux/in.h>
59 #include <linux/errno.h>
60 #include <linux/init.h>
61 #include <linux/skbuff.h>
62 #include <linux/jhash.h>
63 #include <linux/slab.h>
64 #include <linux/vmalloc.h>
65 #include <linux/reciprocal_div.h>
66 #include <net/netlink.h>
67 #include <linux/version.h>
> 68 #include "pkt_sched.h"
69 #include <linux/if_vlan.h>
70 #include <net/pkt_sched.h>
71 #include <net/tcp.h>
72 #include <net/flow_dissector.h>
73
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 62952 bytes --]
* Re: [PATCH net-next 03/13] sctp: remove an if() that is always true
From: Neil Horman @ 2018-04-27 10:50 UTC (permalink / raw)
To: Marcelo Ricardo Leitner; +Cc: netdev, linux-sctp, Vlad Yasevich, Xin Long
In-Reply-To: <b083bd9240e25ad84cda4d9212b886da2373ec11.1524772453.git.marcelo.leitner@gmail.com>
On Thu, Apr 26, 2018 at 04:58:52PM -0300, Marcelo Ricardo Leitner wrote:
> As noticed by Xin Long, the if() here is always true as PMTU can never
> be 0.
>
> Reported-by: Xin Long <lucien.xin@gmail.com>
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> ---
> net/sctp/associola.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index b3aa95222bd52113295cb246c503c903bdd5c353..c5ed09cfa8423b17546e3d45f6d06db03af66384 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -1397,10 +1397,8 @@ void sctp_assoc_sync_pmtu(struct sctp_association *asoc)
> pmtu = t->pathmtu;
> }
>
> - if (pmtu) {
> - asoc->pathmtu = pmtu;
> - asoc->frag_point = sctp_frag_point(asoc, pmtu);
> - }
> + asoc->pathmtu = pmtu;
> + asoc->frag_point = sctp_frag_point(asoc, pmtu);
>
Can you double check this? It seems far-fetched, but if someone sends a
crafted ICMP destination-unreachable message to the host, pmtu_sending might
get set for an association (which may have no transports established yet).
If so, sctp_assoc_sync_pmtu can be called on the first packet send, the loop
over all transports falls through, and pmtu stays zero. I know that is an
unlikely set of circumstances, but if it can happen, I think you might see a
crash in sctp_frag_point due to an underflow of the frag value.
Neil
> pr_debug("%s: asoc:%p, pmtu:%d, frag_point:%d\n", __func__, asoc,
> asoc->pathmtu, asoc->frag_point);
> --
> 2.14.3
>
>
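The underflow concern can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the real sctp_frag_point(): the struct, both helpers and the 52-byte overhead constant are hypothetical, chosen only to show what happens when a zero pmtu reaches an unconditional subtraction.

```c
#include <stdint.h>

#define HYPOTHETICAL_OVERHEAD 52u	/* illustrative header overhead, not SCTP's */

struct assoc_sketch {
	uint32_t pathmtu;
	uint32_t frag_point;
};

/* Old shape: a zero pmtu leaves the association untouched. */
static void sync_pmtu_guarded_sketch(struct assoc_sketch *a, uint32_t pmtu)
{
	if (pmtu) {
		a->pathmtu = pmtu;
		a->frag_point = pmtu - HYPOTHETICAL_OVERHEAD;
	}
}

/* New shape: unconditional update; wraps around if pmtu is below the overhead. */
static void sync_pmtu_unguarded_sketch(struct assoc_sketch *a, uint32_t pmtu)
{
	a->pathmtu = pmtu;
	a->frag_point = pmtu - HYPOTHETICAL_OVERHEAD;
}
```

Whether a zero pmtu can actually reach this point in the kernel is exactly the open question raised in the reply; the sketch only shows the arithmetic consequence if it does.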
* [net-next] ipv6: sr: Add documentation for seg_flowlabel sysctl
From: Ahmed Abdelsalam @ 2018-04-27 10:35 UTC (permalink / raw)
To: davem, linux-doc, netdev; +Cc: Ahmed Abdelsalam
This patch adds documentation for the seg6_flowlabel sysctl to
Documentation/networking/ip-sysctl.txt.
Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
---
Documentation/networking/ip-sysctl.txt | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 5dc1a04..7528f71 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -1428,6 +1428,19 @@ ip6frag_low_thresh - INTEGER
ip6frag_time - INTEGER
Time in seconds to keep an IPv6 fragment in memory.
+IPv6 Segment Routing:
+
+seg6_flowlabel - INTEGER
+ Controls the behaviour of computing the flowlabel of outer
+ IPv6 header in case of SR T.encaps
+
+ -1 set flowlabel to zero.
+ 0 copy flowlabel from Inner packet in case of Inner IPv6
+ (Set flowlabel to 0 in case IPv4/L2)
+ 1 Compute the flowlabel using seg6_make_flowlabel()
+
+ Default is 0.
+
conf/default/*:
Change the interface-specific default settings.
--
2.1.4
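The three documented modes can be sketched as a small selection function. This is a userspace illustration only: outer_flowlabel_sketch and its parameters are hypothetical, and computed_label stands in for whatever seg6_make_flowlabel() would return.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLOWLABEL_MASK_SKETCH 0x000FFFFFu	/* the IPv6 flow label is 20 bits wide */

/* Pick the outer header's flow label according to the seg6_flowlabel mode:
 *  -1: always zero
 *   0: copy from the inner packet if it is IPv6, else zero (IPv4/L2)
 *   1: use a computed label (placeholder for seg6_make_flowlabel())
 * Unknown modes fall back to zero in this sketch. */
static uint32_t outer_flowlabel_sketch(int mode, bool inner_is_ipv6,
				       uint32_t inner_label,
				       uint32_t computed_label)
{
	switch (mode) {
	case -1:
		return 0;
	case 0:
		return inner_is_ipv6 ? (inner_label & FLOWLABEL_MASK_SKETCH) : 0;
	case 1:
		return computed_label & FLOWLABEL_MASK_SKETCH;
	default:
		return 0;
	}
}
```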
* [PATCH net-next 2/2 v3] netns: restrict uevents
From: Christian Brauner @ 2018-04-27 10:23 UTC (permalink / raw)
To: ebiederm, davem, netdev, linux-kernel
Cc: avagin, ktkhai, serge, gregkh, Christian Brauner
In-Reply-To: <20180427102306.8617-1-christian.brauner@ubuntu.com>
commit 07e98962fa77 ("kobject: Send hotplug events in all network namespaces")
enabled sending hotplug events into all network namespaces back in 2010.
Over time the set of uevents that get sent into all network namespaces has
shrunk. We have now reached the point where hotplug events for all devices
that carry a namespace tag are filtered according to that namespace.
Specifically, they are filtered whenever the namespace tag of the kobject
does not match the namespace tag of the netlink socket.
Currently, only network devices carry namespace tags (i.e. network
namespace tags). Hence, uevents for network devices only show up in the
network namespace such devices are created in or moved to.
However, any uevent for a kobject that does not have a namespace tag
associated with it will not be filtered and we will broadcast it into all
network namespaces. This behavior stopped making sense when user namespaces
were introduced.
This patch simplifies and fixes a couple of things:
- Split codepath for sending uevents by kobject namespace tags:
1. Untagged kobjects - uevent_net_broadcast_untagged():
Untagged kobjects will be broadcast into all uevent sockets recorded
in uevent_sock_list, i.e. into all network namespaces owned by the
initial user namespace.
2. Tagged kobjects - uevent_net_broadcast_tagged():
Tagged kobjects will only be broadcast into the network namespace they
were tagged with.
Handling of tagged kobjects in 2. does not cause any semantic changes.
This is just splitting out the filtering logic that was handled by
kobj_bcast_filter() before.
Handling of untagged kobjects in 1. will cause a semantic change. The
reasons why this is needed and ok have been discussed in [1]. Here is a
short summary:
- Userspace ignores uevents from network namespaces that are not owned by
the initial user namespace:
Uevents are filtered by userspace in a user namespace because the
received uid != 0. Instead the uid associated with the event will be
65534 == "nobody" because the global root uid is not mapped.
This means we can safely and without introducing regressions modify the
kernel to not send uevents into all network namespaces whose owning
user namespace is not the initial user namespace because we know that
userspace will ignore the message because of the uid anyway.
I have verified a) that this is true for every udev implementation out
there and b) that this behavior has been present in all udev
implementations from the very beginning.
- Thundering herd:
Broadcasting uevents into all network namespaces introduces significant
overhead.
All processes that listen to uevents running in non-initial user
namespaces will end up responding to uevents that will be meaningless
to them. Mainly, because non-initial user namespaces cannot easily
manage devices unless they have a privileged host-process helping them
out. This means that there will be a thundering herd of activity when
there shouldn't be any.
- Removing needless overhead/Increasing performance:
Currently, the uevent socket for each network namespace is added to the
global variable uevent_sock_list. The list itself needs to be protected
by a mutex. So every time a uevent is generated the mutex is taken on
the list. The mutex is held *from the creation of the uevent (memory
allocation, string creation etc.) until all uevent sockets have been
handled*. This is aggravated by the fact that for each uevent socket
that has listeners the mc_list must be walked as well which means we're
talking O(n^2) here. Given that a standard Linux workload usually has
quite a lot of network namespaces and - in the face of containers - a
lot of user namespaces this quickly becomes a performance problem (see
"Thundering herd" above). By just recording uevent sockets of network
namespaces that are owned by the initial user namespace we
significantly increase performance in this codepath.
- Injecting uevents:
There's a valid argument that containers might be interested in
receiving device events especially if they are delegated to them by a
privileged userspace process. One prime example is SR-IOV enabled
devices, which are explicitly designed to be handed off to other users
such as VMs or containers.
This use-case can now be correctly handled since
commit 692ec06d7c92 ("netns: send uevent messages"). This commit
introduced the ability to send uevents from userspace. As such we can
let a sufficiently privileged (CAP_SYS_ADMIN in the owning user
namespace of the network namespace of the netlink socket) userspace
process make a decision what uevents should be sent. This removes the
need to blindly broadcast uevents into all user namespaces and provides
a performant and safe solution to this problem.
- Filtering logic:
This patch filters by *owning user namespace of the network namespace a
given task resides in* and not by user namespace of the task per se.
This means if the user namespace of a given task is unshared but the
network namespace is kept and is owned by the initial user namespace a
listener that is opening the uevent socket in that network namespace
can still listen to uevents.
- Fix permission for tagged kobjects:
Network devices that are created or moved into a network namespace that
is owned by a non-initial user namespace are currently sent with
INVALID_{G,U}ID in their credentials. This means that all current udev
implementations in userspace will ignore the uevent they receive for
them. This has led to weird bugs whereby new devices showing up in such
network namespaces were not recognized and did not get IPs assigned etc.
This patch adjusts the permission to the appropriate {g,u}id in the
respective user namespace. This way udevd is able to correctly handle
such devices.
- Simplify filtering logic:
do_one_broadcast() already ensures that only listeners in mc_list receive
uevents that have the same network namespace as the uevent socket itself.
So the filtering logic in kobj_bcast_filter is not needed (see [3]). This
patch therefore removes kobj_bcast_filter() and replaces
netlink_broadcast_filtered() with the simpler netlink_broadcast()
everywhere.
[1]: https://lkml.org/lkml/2018/4/4/739
[2]: https://lkml.org/lkml/2018/4/26/767
[3]: https://lkml.org/lkml/2018/4/26/738
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
lib/kobject_uevent.c | 140 ++++++++++++++++++++++++++++++-------------
1 file changed, 99 insertions(+), 41 deletions(-)
diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index c3cb110f663b..d8ce5e6d83af 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -22,6 +22,7 @@
#include <linux/socket.h>
#include <linux/skbuff.h>
#include <linux/netlink.h>
+#include <linux/uidgid.h>
#include <linux/uuid.h>
#include <linux/ctype.h>
#include <net/sock.h>
@@ -231,30 +232,6 @@ int kobject_synth_uevent(struct kobject *kobj, const char *buf, size_t count)
return r;
}
-#ifdef CONFIG_NET
-static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
-{
- struct kobject *kobj = data, *ksobj;
- const struct kobj_ns_type_operations *ops;
-
- ops = kobj_ns_ops(kobj);
- if (!ops && kobj->kset) {
- ksobj = &kobj->kset->kobj;
- if (ksobj->parent != NULL)
- ops = kobj_ns_ops(ksobj->parent);
- }
-
- if (ops && ops->netlink_ns && kobj->ktype->namespace) {
- const void *sock_ns, *ns;
- ns = kobj->ktype->namespace(kobj);
- sock_ns = ops->netlink_ns(dsk);
- return sock_ns != ns;
- }
-
- return 0;
-}
-#endif
-
#ifdef CONFIG_UEVENT_HELPER
static int kobj_usermode_filter(struct kobject *kobj)
{
@@ -296,6 +273,7 @@ static void cleanup_uevent_env(struct subprocess_info *info)
}
#endif
+#ifdef CONFIG_NET
static struct sk_buff *alloc_uevent_skb(struct kobj_uevent_env *env,
const char *action_string,
const char *devpath)
@@ -321,15 +299,13 @@ static struct sk_buff *alloc_uevent_skb(struct kobj_uevent_env *env,
return skb;
}
-static int kobject_uevent_net_broadcast(struct kobject *kobj,
- struct kobj_uevent_env *env,
- const char *action_string,
- const char *devpath)
+static int uevent_net_broadcast_untagged(struct kobj_uevent_env *env,
+ const char *action_string,
+ const char *devpath)
{
- int retval = 0;
-#if defined(CONFIG_NET)
struct sk_buff *skb = NULL;
struct uevent_sock *ue_sk;
+ int retval = 0;
/* send netlink message */
list_for_each_entry(ue_sk, &uevent_sock_list, list) {
@@ -345,19 +321,95 @@ static int kobject_uevent_net_broadcast(struct kobject *kobj,
continue;
}
- retval = netlink_broadcast_filtered(uevent_sock, skb_get(skb),
- 0, 1, GFP_KERNEL,
- kobj_bcast_filter,
- kobj);
+ retval = netlink_broadcast(uevent_sock, skb_get(skb), 0, 1,
+ GFP_KERNEL);
/* ENOBUFS should be handled in userspace */
if (retval == -ENOBUFS || retval == -ESRCH)
retval = 0;
}
consume_skb(skb);
-#endif
+
return retval;
}
+static int uevent_net_broadcast_tagged(struct sock *usk,
+ struct kobj_uevent_env *env,
+ const char *action_string,
+ const char *devpath)
+{
+ struct user_namespace *owning_user_ns = sock_net(usk)->user_ns;
+ struct sk_buff *skb = NULL;
+ int ret;
+
+ skb = alloc_uevent_skb(env, action_string, devpath);
+ if (!skb)
+ return -ENOMEM;
+
+ /* fix credentials */
+ if (owning_user_ns != &init_user_ns) {
+ struct netlink_skb_parms *parms = &NETLINK_CB(skb);
+ kuid_t root_uid;
+ kgid_t root_gid;
+
+ /* fix uid */
+ root_uid = make_kuid(owning_user_ns, 0);
+ if (!uid_valid(root_uid))
+ root_uid = GLOBAL_ROOT_UID;
+ parms->creds.uid = root_uid;
+
+ /* fix gid */
+ root_gid = make_kgid(owning_user_ns, 0);
+ if (!gid_valid(root_gid))
+ root_gid = GLOBAL_ROOT_GID;
+ parms->creds.gid = root_gid;
+ }
+
+ ret = netlink_broadcast(usk, skb, 0, 1, GFP_KERNEL);
+ /* ENOBUFS should be handled in userspace */
+ if (ret == -ENOBUFS || ret == -ESRCH)
+ ret = 0;
+
+ return ret;
+}
+#endif
+
+static int kobject_uevent_net_broadcast(struct kobject *kobj,
+ struct kobj_uevent_env *env,
+ const char *action_string,
+ const char *devpath)
+{
+ int ret = 0;
+
+#ifdef CONFIG_NET
+ const struct kobj_ns_type_operations *ops;
+ const struct net *net = NULL;
+
+ ops = kobj_ns_ops(kobj);
+ if (!ops && kobj->kset) {
+ struct kobject *ksobj = &kobj->kset->kobj;
+ if (ksobj->parent != NULL)
+ ops = kobj_ns_ops(ksobj->parent);
+ }
+
+ /* kobjects currently only carry network namespace tags and they
+ * are the only tag relevant here since we want to decide which
+ * network namespaces to broadcast the uevent into.
+ */
+ if (ops && ops->netlink_ns && kobj->ktype->namespace)
+ if (ops->type == KOBJ_NS_TYPE_NET)
+ net = kobj->ktype->namespace(kobj);
+
+ if (!net)
+ ret = uevent_net_broadcast_untagged(env, action_string,
+ devpath);
+ else
+ ret = uevent_net_broadcast_tagged(net->uevent_sock->sk, env,
+ action_string, devpath);
+#endif
+
+ return ret;
+}
+
static void zap_modalias_env(struct kobj_uevent_env *env)
{
static const char modalias_prefix[] = "MODALIAS=";
@@ -716,9 +768,13 @@ static int uevent_net_init(struct net *net)
net->uevent_sock = ue_sk;
- mutex_lock(&uevent_sock_mutex);
- list_add_tail(&ue_sk->list, &uevent_sock_list);
- mutex_unlock(&uevent_sock_mutex);
+ /* Restrict uevents to initial user namespace. */
+ if (sock_net(ue_sk->sk)->user_ns == &init_user_ns) {
+ mutex_lock(&uevent_sock_mutex);
+ list_add_tail(&ue_sk->list, &uevent_sock_list);
+ mutex_unlock(&uevent_sock_mutex);
+ }
+
return 0;
}
@@ -726,9 +782,11 @@ static void uevent_net_exit(struct net *net)
{
struct uevent_sock *ue_sk = net->uevent_sock;
- mutex_lock(&uevent_sock_mutex);
- list_del(&ue_sk->list);
- mutex_unlock(&uevent_sock_mutex);
+ if (sock_net(ue_sk->sk)->user_ns == &init_user_ns) {
+ mutex_lock(&uevent_sock_mutex);
+ list_del(&ue_sk->list);
+ mutex_unlock(&uevent_sock_mutex);
+ }
netlink_kernel_release(ue_sk->sk);
kfree(ue_sk);
--
2.17.0
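The two decisions made by this patch (which sockets get recorded in uevent_sock_list, and which broadcast path a kobject takes) can be modeled in userspace as follows. This is a sketch under simplifying assumptions: every type and function name ending in _sketch is hypothetical, and namespaces are reduced to plain pointers.

```c
#include <stdbool.h>
#include <stddef.h>

struct user_ns_sketch { int id; };
struct net_sketch { const struct user_ns_sketch *user_ns; };

/* Stand-in for init_user_ns. */
static const struct user_ns_sketch init_user_ns_sketch = { 0 };

/* Mirrors the uevent_net_init()/uevent_net_exit() change: only sockets of
 * network namespaces owned by the initial user namespace are listed. */
static bool record_in_sock_list_sketch(const struct net_sketch *net)
{
	return net->user_ns == &init_user_ns_sketch;
}

enum broadcast_path_sketch {
	BROADCAST_UNTAGGED_SKETCH,	/* walk the recorded socket list */
	BROADCAST_TAGGED_SKETCH		/* send only into the tagged netns */
};

/* Mirrors kobject_uevent_net_broadcast(): a kobject without a network
 * namespace tag (NULL here) takes the untagged path. */
static enum broadcast_path_sketch pick_path_sketch(const struct net_sketch *tag)
{
	return tag ? BROADCAST_TAGGED_SKETCH : BROADCAST_UNTAGGED_SKETCH;
}
```

Note that the ownership test applies to the network namespace, not to the listening task, which matches the "Filtering logic" point in the commit message.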
* [PATCH net-next 1/2 v3] uevent: add alloc_uevent_skb() helper
From: Christian Brauner @ 2018-04-27 10:23 UTC (permalink / raw)
To: ebiederm, davem, netdev, linux-kernel
Cc: avagin, ktkhai, serge, gregkh, Christian Brauner
In-Reply-To: <20180427102306.8617-1-christian.brauner@ubuntu.com>
This patch adds alloc_uevent_skb() in preparation for follow up patches.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
lib/kobject_uevent.c | 39 ++++++++++++++++++++++++++-------------
1 file changed, 26 insertions(+), 13 deletions(-)
diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 15ea216a67ce..c3cb110f663b 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -296,6 +296,31 @@ static void cleanup_uevent_env(struct subprocess_info *info)
}
#endif
+static struct sk_buff *alloc_uevent_skb(struct kobj_uevent_env *env,
+ const char *action_string,
+ const char *devpath)
+{
+ struct sk_buff *skb = NULL;
+ char *scratch;
+ size_t len;
+
+ /* allocate message with maximum possible size */
+ len = strlen(action_string) + strlen(devpath) + 2;
+ skb = alloc_skb(len + env->buflen, GFP_KERNEL);
+ if (!skb)
+ return NULL;
+
+ /* add header */
+ scratch = skb_put(skb, len);
+ sprintf(scratch, "%s@%s", action_string, devpath);
+
+ skb_put_data(skb, env->buf, env->buflen);
+
+ NETLINK_CB(skb).dst_group = 1;
+
+ return skb;
+}
+
static int kobject_uevent_net_broadcast(struct kobject *kobj,
struct kobj_uevent_env *env,
const char *action_string,
@@ -314,22 +339,10 @@ static int kobject_uevent_net_broadcast(struct kobject *kobj,
continue;
if (!skb) {
- /* allocate message with the maximum possible size */
- size_t len = strlen(action_string) + strlen(devpath) + 2;
- char *scratch;
-
retval = -ENOMEM;
- skb = alloc_skb(len + env->buflen, GFP_KERNEL);
+ skb = alloc_uevent_skb(env, action_string, devpath);
if (!skb)
continue;
-
- /* add header */
- scratch = skb_put(skb, len);
- sprintf(scratch, "%s@%s", action_string, devpath);
-
- skb_put_data(skb, env->buf, env->buflen);
-
- NETLINK_CB(skb).dst_group = 1;
}
retval = netlink_broadcast_filtered(uevent_sock, skb_get(skb),
--
2.17.0
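The message layout built by alloc_uevent_skb() (an "action@devpath" header including its terminating NUL, followed by the environment buffer) can be reproduced in a userspace sketch. build_uevent_msg_sketch is a hypothetical helper that uses malloc() in place of an skb; it is not part of the patch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "action@devpath\0" followed by env_len bytes of environment data.
 * Returns a malloc'd buffer (caller frees) and stores the total length in
 * *out_len, or returns NULL on allocation failure. */
static char *build_uevent_msg_sketch(const char *action, const char *devpath,
				     const char *env, size_t env_len,
				     size_t *out_len)
{
	/* +2 accounts for the '@' separator and the terminating NUL */
	size_t hdr_len = strlen(action) + strlen(devpath) + 2;
	char *msg = malloc(hdr_len + env_len);

	if (!msg)
		return NULL;

	/* header: exactly hdr_len bytes including the NUL */
	snprintf(msg, hdr_len, "%s@%s", action, devpath);

	/* environment strings follow directly after the header */
	if (env_len)
		memcpy(msg + hdr_len, env, env_len);

	*out_len = hdr_len + env_len;
	return msg;
}
```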
* [PATCH net-next 0/2] netns: uevent filtering
From: Christian Brauner @ 2018-04-27 10:23 UTC (permalink / raw)
To: ebiederm, davem, netdev, linux-kernel
Cc: avagin, ktkhai, serge, gregkh, Christian Brauner
Hey everyone,
This is the new approach to uevent filtering as discussed (see the
threads in [1], [2], and [3]).
This series fixes up the uevent filtering logic:
- uevent filtering logic is simplified
- locking time on uevent_sock_list is minimized
- tagged and untagged kobjects are handled in separate codepaths
- permissions for userspace are fixed for network device uevents in
network namespaces owned by non-initial user namespaces
Udev is now able to see those events correctly which it wasn't before.
For example, moving a physical device into a network namespace not
owned by the initial user namespace previously gave:
root@xen1:~# udevadm --debug monitor -k
calling: monitor
monitor will print the received events for:
KERNEL - the kernel uevent
sender uid=65534, message ignored
sender uid=65534, message ignored
sender uid=65534, message ignored
sender uid=65534, message ignored
sender uid=65534, message ignored
and now after the discussion and solution in [3] correctly gives:
root@xen1:~# udevadm --debug monitor -k
calling: monitor
monitor will print the received events for:
KERNEL - the kernel uevent
KERNEL[625.301042] add /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/enp1s0f1 (net)
KERNEL[625.301109] move /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/enp1s0f1 (net)
KERNEL[625.301138] move /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/eth1 (net)
KERNEL[655.333272] remove /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/eth1 (net)
Thanks!
Christian
[1]: https://lkml.org/lkml/2018/4/4/739
[2]: https://lkml.org/lkml/2018/4/26/767
[3]: https://lkml.org/lkml/2018/4/26/738
Christian Brauner (2):
uevent: add alloc_uevent_skb() helper
netns: restrict uevents
lib/kobject_uevent.c | 175 ++++++++++++++++++++++++++++++-------------
1 file changed, 123 insertions(+), 52 deletions(-)
--
2.17.0
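The "sender uid=65534, message ignored" behavior and its fix can be modeled with a single-extent uid map. This is a simplified userspace sketch: the struct and all functions are hypothetical stand-ins for make_kuid(), the kernel-to-namespace translation, and udev's sender check, and real user namespaces may carry several map extents.

```c
#include <stdbool.h>
#include <stdint.h>

#define INVALID_UID_SKETCH	((uint32_t)-1)
#define GLOBAL_ROOT_UID_SKETCH	0u
#define OVERFLOW_UID_SKETCH	65534u	/* "nobody" */

/* Namespace ids [0, count) map to kernel ids [base, base + count). */
struct uid_map_sketch {
	uint32_t base;
	uint32_t count;
};

/* Namespace uid -> kernel uid, or invalid if unmapped. */
static uint32_t make_kuid_sketch(const struct uid_map_sketch *m, uint32_t ns_uid)
{
	return ns_uid < m->count ? m->base + ns_uid : INVALID_UID_SKETCH;
}

/* Kernel uid -> uid as seen inside the namespace; unmapped ids show up
 * as the overflow uid. */
static uint32_t seen_uid_sketch(const struct uid_map_sketch *m, uint32_t kuid)
{
	if (kuid >= m->base && kuid - m->base < m->count)
		return kuid - m->base;
	return OVERFLOW_UID_SKETCH;
}

/* Credential fix-up: send as the namespace's root, falling back to the
 * global root uid when uid 0 is not mapped. */
static uint32_t uevent_sender_kuid_sketch(const struct uid_map_sketch *m)
{
	uint32_t root = make_kuid_sketch(m, 0);

	return root == INVALID_UID_SKETCH ? GLOBAL_ROOT_UID_SKETCH : root;
}

/* udev-style receiver: only messages that appear to come from uid 0 count. */
static bool udev_accepts_sketch(uint32_t seen_uid)
{
	return seen_uid == 0;
}
```

In this model, a message stamped with the global root uid is seen as 65534 inside a namespace mapped at base 100000 and is ignored, while one stamped with the namespace's own root is accepted.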
* Re: [dm-devel] [PATCH v5] fault-injection: introduce kvmalloc fallback options
From: Mikulas Patocka @ 2018-04-27 10:20 UTC (permalink / raw)
To: Michal Hocko
Cc: Michael S. Tsirkin, John Stoffel, James Bottomley, Michal,
eric.dumazet, netdev, jasowang, Randy Dunlap, linux-kernel,
Matthew Wilcox, linux-mm, dm-devel, Vlastimil Babka, Andrew,
David Rientjes, Morton, virtualization, David Miller, edumazet
In-Reply-To: <20180427082555.GC17484@dhcp22.suse.cz>
On Fri, 27 Apr 2018, Michal Hocko wrote:
> On Thu 26-04-18 18:52:05, Mikulas Patocka wrote:
> >
> >
> > On Fri, 27 Apr 2018, Michael S. Tsirkin wrote:
> [...]
> > > But assuming it's important for this kind of
> > > fault injection to be controlled from
> > > a dedicated menuconfig option, why not the rest of
> > > faults?
> >
> > The injected faults cause damage to the user, so there's no point to
> > enable them by default. vmalloc fallback should not cause any damage
> > (assuming that the code is correctly written).
>
> But you want to find those bugs which would BUG_ON easier, so there is a
> risk of harm IIUC
Yes, I want to harm them, but I only want to harm the users using the
debugging kernel. Testers should be "harmed" by crashes - so that the
users of production kernels are harmed less.
If someone hits this, he should report it, use the kernel parameter to
turn it off and continue with the testing.
> and this is not much different than other fault injecting paths.
Fault injection causes misbehavior even on completely bug-free code (for
example, syscalls randomly returning -ENOMEM). This won't cause
misbehavior on bug-free code.
Mikulas
* Re: [PATCH bpf-next v2] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON
From: Daniel Borkmann @ 2018-04-27 10:10 UTC (permalink / raw)
To: Leo Yan, Alexei Starovoitov, David S. Miller, Jonathan Corbet,
netdev, linux-kernel, linux-doc
In-Reply-To: <1524823374-6174-1-git-send-email-leo.yan@linaro.org>
On 04/27/2018 12:02 PM, Leo Yan wrote:
> When CONFIG_BPF_JIT_ALWAYS_ON is enabled, kernel has limitation for
> bpf_jit_enable, so it has fixed value 1 and we cannot set it to 2
> for JIT opcode dumping; this patch is to update the doc for it.
>
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
Applied to bpf-next, thanks Leo!
* Re: [PATCH 1/3] selftests/bpf: Makefile: add includes to fix broken test build
From: Daniel Borkmann @ 2018-04-27 10:04 UTC (permalink / raw)
To: Sirio Balmelli, ast; +Cc: netdev
In-Reply-To: <20180426083107.GA13908@vm4>
On 04/26/2018 10:31 AM, Sirio Balmelli wrote:
> several bpf tests fail to build with clang 7.0.0:
> ...
> In file included from ../../../include/uapi/linux/bpf.h:11:
> In file included from ./include/uapi/linux/types.h:5:
> /usr/include/asm-generic/int-ll64.h:11:10: fatal error: 'asm/bitsperlong.h' file not found
>
> /usr/include/asm-generic/int-ll64.h is from outside the kernel repo,
> probably a good idea to repoint to -I$(ROOT)/include/uapi.
> asm/bitsperlong.h is architecture-specific, cater for this with an
> architecture-specific include -I$(ROOT)/$(ARCH)/include/uapi.
>
> Re-building now yields:
> ../../../../include/uapi/linux/stddef.h:2:10: fatal error: 'linux/compiler_types.h' file not found
>
> Fix this with -I$(ROOT)/include
>
> Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>
> ---
> tools/testing/selftests/bpf/Makefile | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 0b72cc7..6a8cfaf 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -80,8 +80,14 @@ else
> CPU ?= generic
> endif
>
> -CLANG_FLAGS = -I. -I./include/uapi -I../../../include/uapi \
> - -Wno-compare-distinct-pointer-types
> +ARCH := arch/$(subst _64,,$(shell uname -p))
> +ROOT :=../../../..
> +TOOLS :=../../..
> +CLANG_FLAGS = -I. -I./include/uapi \
> + -I$(TOOLS)/include/uapi -I$(TOOLS)/include \
> + -I$(ROOT)/$(ARCH)/include/uapi \
> + -I$(ROOT)/include/uapi -I$(ROOT)/include \
> + -Wno-compare-distinct-pointer-types
The problem is that this will now pull in all sorts of kernel headers, whereas
before the includes were limited to tools/include/ and tools/arch/*/include/.
The tools/ infrastructure deliberately keeps the headers it needs under these
locations, and a bitsperlong.h is already present there, so please change and
respin your fix to reuse that one.
Thanks Sirio!
* Re: [PATCH net-next 00/13] sctp: refactor MTU handling
From: Xin Long @ 2018-04-27 10:04 UTC (permalink / raw)
To: Marcelo Ricardo Leitner
Cc: network dev, linux-sctp, Vlad Yasevich, Neil Horman
In-Reply-To: <cover.1524772453.git.marcelo.leitner@gmail.com>
On Fri, Apr 27, 2018 at 3:58 AM, Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
> Currently MTU handling is spread over SCTP stack. There are multiple
> places doing same/similar calculations and updating them is error prone
> as one spot can easily be left out.
>
> This patchset converges it into a more concise and consistent code. In
> general, it moves MTU handling from functions with bigger objectives,
> such as sctp_assoc_add_peer(), to specific functions.
>
> It's also a preparation for the next patchset, which removes the
> duplication between sctp_make_op_error_space and
> sctp_make_op_error_fixed and relies on sctp_mtu_payload introduced here.
>
> More details on each patch.
>
> Marcelo Ricardo Leitner (13):
> sctp: remove old and unused SCTP_MIN_PMTU
> sctp: move transport pathmtu calc away of sctp_assoc_add_peer
> sctp: remove an if() that is always true
> sctp: introduce sctp_assoc_set_pmtu
> sctp: introduce sctp_mtu_payload
> sctp: introduce sctp_assoc_update_frag_point
> sctp: remove sctp_assoc_pending_pmtu
> sctp: introduce sctp_dst_mtu
> sctp: remove sctp_transport_pmtu_check
> sctp: re-use sctp_transport_pmtu in sctp_transport_route
> sctp: honor PMTU_DISABLED when handling icmp
> sctp: consider idata chunks when setting SCTP_MAXSEG
> sctp: allow unsetting sockopt MAXSEG
>
> include/net/sctp/constants.h | 5 ++--
> include/net/sctp/sctp.h | 52 ++++++++++++++------------------------
> include/net/sctp/structs.h | 2 ++
> net/sctp/associola.c | 60 +++++++++++++++++++++++---------------------
> net/sctp/chunk.c | 12 +--------
> net/sctp/output.c | 28 ++++++++-------------
> net/sctp/socket.c | 43 ++++++++++++++-----------------
> net/sctp/transport.c | 37 ++++++++++++++-------------
> 8 files changed, 105 insertions(+), 134 deletions(-)
>
> --
> 2.14.3
>
Series
Reviewed-by: Xin Long <lucien.xin@gmail.com>
^ permalink raw reply
* [PATCH bpf-next v2] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON
From: Leo Yan @ 2018-04-27 10:02 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, David S. Miller,
Jonathan Corbet, netdev, linux-kernel, linux-doc
Cc: Leo Yan
When CONFIG_BPF_JIT_ALWAYS_ON is enabled, the kernel restricts
bpf_jit_enable to the fixed value 1, so it cannot be set to 2 for
JIT opcode dumping; this patch updates the doc accordingly.
Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
Documentation/networking/filter.txt | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index fd55c7d..5032e12 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -483,6 +483,12 @@ Example output from dmesg:
[ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
[ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3
+When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
+setting any other value than that will return in failure. This is even the case for
+setting bpf_jit_enable to 2, since dumping the final JIT image into the kernel log
+is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is the
+generally recommended approach instead.
+
In the kernel source tree under tools/bpf/, there's bpf_jit_disasm for
generating disassembly out of the kernel log's hexdump:
--
1.9.1
^ permalink raw reply related
* Re: [PATCH bpf-next] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON
From: Daniel Borkmann @ 2018-04-27 9:59 UTC (permalink / raw)
To: Leo Yan
Cc: Alexei Starovoitov, David S. Miller, Jonathan Corbet, netdev,
linux-kernel, linux-doc
In-Reply-To: <20180427094910.GA31015@leoy-ThinkPad-X240s>
On 04/27/2018 11:49 AM, Leo Yan wrote:
> On Fri, Apr 27, 2018 at 11:44:44AM +0200, Daniel Borkmann wrote:
>> On 04/26/2018 04:26 AM, Leo Yan wrote:
>>> When CONFIG_BPF_JIT_ALWAYS_ON is enabled, kernel has limitation for
>>> bpf_jit_enable, so it has fixed value 1 and we cannot set it to 2
>>> for JIT opcode dumping; this patch is to update the doc for it.
>>>
>>> Signed-off-by: Leo Yan <leo.yan@linaro.org>
>>> ---
>>> Documentation/networking/filter.txt | 6 ++++++
>>> 1 file changed, 6 insertions(+)
>>>
>>> diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
>>> index fd55c7d..feddab9 100644
>>> --- a/Documentation/networking/filter.txt
>>> +++ b/Documentation/networking/filter.txt
>>> @@ -483,6 +483,12 @@ Example output from dmesg:
>>> [ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
>>> [ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3
>>>
>>> +When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is set to 1 by default
>>> +and it returns failure if change to any other value from proc node; this is
>>> +for security consideration to avoid leaking info to unprivileged users. In this
>>> +case, we can't directly dump JIT opcode image from kernel log, alternatively we
>>> +need to use bpf tool for the dumping.
>>> +
>>
>> Could you change this doc text a bit, I think it's slightly misleading. From the first
>> sentence one could also interpret that value 0 would leak info to unprivileged users
>> whereas here we're only talking about the case of value 2. Maybe something roughly like
>> this to make it more clear:
>>
>> When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
>> setting any other value than that will return in failure. This is even the case for
>> setting bpf_jit_enable to 2, since dumping the final JIT image into the kernel log
>> is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is the
>> generally recommended approach instead.
>
> Yeah, your rephrasing is clearer and better. Will do this and send a
> new patch soon. Thanks for your help.
Awesome, thank you!
^ permalink raw reply
* Re: [PATCH 2/3] selftests/bpf: test_xdp_noinline.c: fix 'noinline' macro expansion
From: Daniel Borkmann @ 2018-04-27 9:58 UTC (permalink / raw)
To: Sirio Balmelli, ast; +Cc: netdev
In-Reply-To: <20180426083125.GA13968@vm4>
On 04/26/2018 10:31 AM, Sirio Balmelli wrote:
> Compiling with clang 7.0.0 yields:
> test_xdp_noinline.c:470:24: warning: unknown attribute '__attribute__' ignored [-Wunknown-attributes]
> ../../../include/linux/compiler-gcc.h:24:19: note: expanded from macro 'noinline'
> ^
> test_xdp_noinline.c:494:24: error: use of undeclared identifier 'noinline'; did you mean 'inline'?
> static __attribute__ ((noinline))
>
> This appears to be the 'noinline' attribute being itself macro-expanded,
> so the compiler sees '__attribute__ ((__attribute__((noinline))))'.
>
> Fix using an #ifndef.
> Homogenize function declarations.
>
> Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>
I think this error is a result of your previous patch, which suddenly pulls
in kernel headers. Otherwise include/linux/compiler-gcc.h should never
have been included. That's why you see the wrong expansion of ...
__attribute__ ((noinline))
... into ...
__attribute__ ((__attribute__ ((noinline))))
... since noinline is additionally defined in include/linux/compiler-gcc.h.
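As a rough illustration of the collision described above (a minimal sketch,
not code from the selftest): once some header defines noinline as a macro,
spelling the attribute out by hand gets expanded a second time, while using
the macro name directly does not. SIMULATE_KERNEL_HEADER and rol32_demo are
hypothetical names used only for this example, and the guarded definition is
written without outer parentheses.

```c
/*
 * Hypothetical stand-in for a kernel-internal header that already
 * defines noinline as a macro (as include/linux/compiler-gcc.h does).
 */
#ifdef SIMULATE_KERNEL_HEADER
#define noinline __attribute__((noinline))
#endif

/*
 * With the macro above active, a hand-written
 *     static __attribute__ ((noinline)) ...
 * is seen by the compiler as
 *     static __attribute__ ((__attribute__((noinline)))) ...
 * because the inner 'noinline' token is itself macro-expanded.
 */

/* Only define the macro when no earlier header has done so. */
#ifndef noinline
#define noinline __attribute__((noinline))
#endif

/* Use the macro name, never the spelled-out attribute. */
static noinline unsigned int rol32_demo(unsigned int word, unsigned int shift)
{
	return (word << (shift & 31)) | (word >> ((-shift) & 31));
}
```

Building with -DSIMULATE_KERNEL_HEADER exercises the other branch. The guard
only hides the symptom; keeping kernel-internal headers off the include path
avoids the collision in the first place.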
> ---
> tools/testing/selftests/bpf/test_xdp_noinline.c | 79 +++++++++++++------------
> 1 file changed, 42 insertions(+), 37 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/test_xdp_noinline.c b/tools/testing/selftests/bpf/test_xdp_noinline.c
> index 5e4aac7..5b5f3f2 100644
> --- a/tools/testing/selftests/bpf/test_xdp_noinline.c
> +++ b/tools/testing/selftests/bpf/test_xdp_noinline.c
> @@ -15,6 +15,11 @@
> #include <linux/udp.h>
> #include "bpf_helpers.h"
>
> +/* some compiler-specific header might define this */
> +#ifndef noinline
> +#define noinline (__attribute__ ((noinline)))
> +#endif
> +
> #define bpf_printk(fmt, ...) \
> ({ \
> char ____fmt[] = fmt; \
> @@ -55,7 +60,7 @@ static __u32 rol32(__u32 word, unsigned int shift)
>
> typedef unsigned int u32;
>
> -static __attribute__ ((noinline))
> +static noinline
> u32 jhash(const void *key, u32 length, u32 initval)
> {
> u32 a, b, c;
> @@ -92,7 +97,7 @@ u32 jhash(const void *key, u32 length, u32 initval)
> return c;
> }
>
> -static __attribute__ ((noinline))
> +static noinline
> u32 __jhash_nwords(u32 a, u32 b, u32 c, u32 initval)
> {
> a += initval;
> @@ -102,7 +107,7 @@ u32 __jhash_nwords(u32 a, u32 b, u32 c, u32 initval)
> return c;
> }
>
> -static __attribute__ ((noinline))
> +static noinline
> u32 jhash_2words(u32 a, u32 b, u32 initval)
> {
> return __jhash_nwords(a, b, 0, initval + JHASH_INITVAL + (2 << 2));
> @@ -239,7 +244,7 @@ static inline __u64 calc_offset(bool is_ipv6, bool is_icmp)
> return off;
> }
>
> -static __attribute__ ((noinline))
> +static noinline
> bool parse_udp(void *data, void *data_end,
> bool is_ipv6, struct packet_description *pckt)
> {
> @@ -261,7 +266,7 @@ bool parse_udp(void *data, void *data_end,
> return 1;
> }
>
> -static __attribute__ ((noinline))
> +static noinline
> bool parse_tcp(void *data, void *data_end,
> bool is_ipv6, struct packet_description *pckt)
> {
> @@ -285,7 +290,7 @@ bool parse_tcp(void *data, void *data_end,
> return 1;
> }
>
> -static __attribute__ ((noinline))
> +static noinline
> bool encap_v6(struct xdp_md *xdp, struct ctl_value *cval,
> struct packet_description *pckt,
> struct real_definition *dst, __u32 pkt_bytes)
> @@ -328,7 +333,7 @@ bool encap_v6(struct xdp_md *xdp, struct ctl_value *cval,
> return 1;
> }
>
> -static __attribute__ ((noinline))
> +static noinline
> bool encap_v4(struct xdp_md *xdp, struct ctl_value *cval,
> struct packet_description *pckt,
> struct real_definition *dst, __u32 pkt_bytes)
> @@ -382,7 +387,7 @@ bool encap_v4(struct xdp_md *xdp, struct ctl_value *cval,
> return 1;
> }
>
> -static __attribute__ ((noinline))
> +static noinline
> bool decap_v6(struct xdp_md *xdp, void **data, void **data_end, bool inner_v4)
> {
> struct eth_hdr *new_eth;
> @@ -403,7 +408,7 @@ bool decap_v6(struct xdp_md *xdp, void **data, void **data_end, bool inner_v4)
> return 1;
> }
>
> -static __attribute__ ((noinline))
> +static noinline
> bool decap_v4(struct xdp_md *xdp, void **data, void **data_end)
> {
> struct eth_hdr *new_eth;
> @@ -421,7 +426,7 @@ bool decap_v4(struct xdp_md *xdp, void **data, void **data_end)
> return 1;
> }
>
> -static __attribute__ ((noinline))
> +static noinline
> int swap_mac_and_send(void *data, void *data_end)
> {
> unsigned char tmp_mac[6];
> @@ -434,7 +439,7 @@ int swap_mac_and_send(void *data, void *data_end)
> return XDP_TX;
> }
>
> -static __attribute__ ((noinline))
> +static noinline
> int send_icmp_reply(void *data, void *data_end)
> {
> struct icmphdr *icmp_hdr;
> @@ -467,7 +472,7 @@ int send_icmp_reply(void *data, void *data_end)
> return swap_mac_and_send(data, data_end);
> }
>
> -static __attribute__ ((noinline))
> +static noinline
> int send_icmp6_reply(void *data, void *data_end)
> {
> struct icmp6hdr *icmp_hdr;
> @@ -491,7 +496,7 @@ int send_icmp6_reply(void *data, void *data_end)
> return swap_mac_and_send(data, data_end);
> }
>
> -static __attribute__ ((noinline))
> +static noinline
> int parse_icmpv6(void *data, void *data_end, __u64 off,
> struct packet_description *pckt)
> {
> @@ -516,7 +521,7 @@ int parse_icmpv6(void *data, void *data_end, __u64 off,
> return -1;
> }
>
> -static __attribute__ ((noinline))
> +static noinline
> int parse_icmp(void *data, void *data_end, __u64 off,
> struct packet_description *pckt)
> {
> @@ -543,7 +548,7 @@ int parse_icmp(void *data, void *data_end, __u64 off,
> return -1;
> }
>
> -static __attribute__ ((noinline))
> +static noinline
> __u32 get_packet_hash(struct packet_description *pckt,
> bool hash_16bytes)
> {
> @@ -555,11 +560,11 @@ __u32 get_packet_hash(struct packet_description *pckt,
> 24);
> }
>
> -__attribute__ ((noinline))
> -static bool get_packet_dst(struct real_definition **real,
> - struct packet_description *pckt,
> - struct vip_meta *vip_info,
> - bool is_ipv6, void *lru_map)
> +static noinline
> +bool get_packet_dst(struct real_definition **real,
> + struct packet_description *pckt,
> + struct vip_meta *vip_info,
> + bool is_ipv6, void *lru_map)
> {
> struct real_pos_lru new_dst_lru = { };
> bool hash_16bytes = is_ipv6;
> @@ -608,10 +613,10 @@ static bool get_packet_dst(struct real_definition **real,
> return 1;
> }
>
> -__attribute__ ((noinline))
> -static void connection_table_lookup(struct real_definition **real,
> - struct packet_description *pckt,
> - void *lru_map)
> +static noinline
> +void connection_table_lookup(struct real_definition **real,
> + struct packet_description *pckt,
> + void *lru_map)
> {
>
> struct real_pos_lru *dst_lru;
> @@ -635,11 +640,11 @@ static void connection_table_lookup(struct real_definition **real,
> * below function has 6 arguments whereas bpf and llvm allow maximum of 5
> * but since it's _static_ llvm can optimize one argument away
> */
> -__attribute__ ((noinline))
> -static int process_l3_headers_v6(struct packet_description *pckt,
> - __u8 *protocol, __u64 off,
> - __u16 *pkt_bytes, void *data,
> - void *data_end)
> +static noinline
> +int process_l3_headers_v6(struct packet_description *pckt,
> + __u8 *protocol, __u64 off,
> + __u16 *pkt_bytes, void *data,
> + void *data_end)
> {
> struct ipv6hdr *ip6h;
> __u64 iph_len;
> @@ -666,11 +671,11 @@ static int process_l3_headers_v6(struct packet_description *pckt,
> return -1;
> }
>
> -__attribute__ ((noinline))
> -static int process_l3_headers_v4(struct packet_description *pckt,
> - __u8 *protocol, __u64 off,
> - __u16 *pkt_bytes, void *data,
> - void *data_end)
> +static noinline
> +int process_l3_headers_v4(struct packet_description *pckt,
> + __u8 *protocol, __u64 off,
> + __u16 *pkt_bytes, void *data,
> + void *data_end)
> {
> struct iphdr *iph;
> __u64 iph_len;
> @@ -698,9 +703,9 @@ static int process_l3_headers_v4(struct packet_description *pckt,
> return -1;
> }
>
> -__attribute__ ((noinline))
> -static int process_packet(void *data, __u64 off, void *data_end,
> - bool is_ipv6, struct xdp_md *xdp)
> +static inline
s/inline/noinline/
> +int process_packet(void *data, __u64 off, void *data_end,
> + bool is_ipv6, struct xdp_md *xdp)
> {
>
> struct real_definition *dst = NULL;
>
^ permalink raw reply
* Re: [PATCH 3/3] selftests/bpf: .gitignore: add test_btf
From: Daniel Borkmann @ 2018-04-27 9:53 UTC (permalink / raw)
To: Sirio Balmelli, ast; +Cc: netdev
In-Reply-To: <20180426083146.GA14025@vm4>
Hi Sirio,
thanks for your patch!
On 04/26/2018 10:31 AM, Sirio Balmelli wrote:
> Signed-off-by: Sirio Balmelli <sirio@b-ad.ch>
> ---
> tools/testing/selftests/bpf/.gitignore | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
> index 5e1ab2f..9513c77 100644
> --- a/tools/testing/selftests/bpf/.gitignore
> +++ b/tools/testing/selftests/bpf/.gitignore
> @@ -12,6 +12,7 @@ test_tcpbpf_user
> test_verifier_log
> feature
> test_libbpf_open
> +test_btf
This one is already part of bpf-next tree, please rebase:
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/.gitignore
> test_sock
> test_sock_addr
> urandom_read
Thanks,
Daniel
^ permalink raw reply
* [PATCH v1 net-next] microchip_t1: Add driver for Microchip LAN87XX T1 PHYs
From: Nisar Sayed @ 2018-04-27 15:10 UTC (permalink / raw)
To: davem; +Cc: UNGLinuxDriver, netdev
Add driver for Microchip LAN87XX T1 PHYs
This patch adds a driver for Microchip T1 PHYs.
There will be follow-up patches to this driver to support T1 PHY
features such as cable diagnostics, signal quality indicator (SQI),
sleep and wakeup (TC10) support.
Signed-off-by: Nisar Sayed <Nisar.Sayed@microchip.com>
---
v0 - v1:
* Rename microchipT1phy.c file to microchip_t1.c
* Remove microchipT1phy.h include file
* Add SPDX license identifier
* Remove probe and remove functions
* Update LAN87XX_INTERRUPT_MASK write as suggested
---
drivers/net/phy/Kconfig | 5 +++
drivers/net/phy/Makefile | 1 +
drivers/net/phy/microchip_t1.c | 88 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 94 insertions(+)
create mode 100644 drivers/net/phy/microchip_t1.c
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index bdfbabb..7b0b351 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -354,6 +354,11 @@ config MICROCHIP_PHY
help
Supports the LAN88XX PHYs.
+config MICROCHIP_T1_PHY
+ tristate "Microchip T1 PHYs"
+ ---help---
+ Supports the LAN87XX PHYs.
+
config MICROSEMI_PHY
tristate "Microsemi PHYs"
---help---
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 01acbcb..3d0550b 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -70,6 +70,7 @@ obj-$(CONFIG_MESON_GXL_PHY) += meson-gxl.o
obj-$(CONFIG_MICREL_KS8995MA) += spi_ks8995.o
obj-$(CONFIG_MICREL_PHY) += micrel.o
obj-$(CONFIG_MICROCHIP_PHY) += microchip.o
+obj-$(CONFIG_MICROCHIP_T1_PHY) += microchip_t1.o
obj-$(CONFIG_MICROSEMI_PHY) += mscc.o
obj-$(CONFIG_NATIONAL_PHY) += national.o
obj-$(CONFIG_QSEMI_PHY) += qsemi.o
diff --git a/drivers/net/phy/microchip_t1.c b/drivers/net/phy/microchip_t1.c
new file mode 100644
index 0000000..1f6f299
--- /dev/null
+++ b/drivers/net/phy/microchip_t1.c
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2018 Microchip Technology
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mii.h>
+#include <linux/phy.h>
+
+/* Interrupt Source Register */
+#define LAN87XX_INTERRUPT_SOURCE (0x18)
+
+/* Interrupt Mask Register */
+#define LAN87XX_INTERRUPT_MASK (0x19)
+#define LAN87XX_MASK_LINK_UP (0x0004)
+#define LAN87XX_MASK_LINK_DOWN (0x0002)
+
+#define DRIVER_AUTHOR "Nisar Sayed <nisar.sayed@microchip.com>"
+#define DRIVER_DESC "Microchip LAN87XX T1 PHY driver"
+
+static int lan87xx_phy_config_intr(struct phy_device *phydev)
+{
+ int rc, val = 0;
+
+ if (phydev->interrupts == PHY_INTERRUPT_ENABLED) {
+ /* unmask all source and clear them before enable */
+ rc = phy_write(phydev, LAN87XX_INTERRUPT_MASK, 0x7FFF);
+ rc = phy_read(phydev, LAN87XX_INTERRUPT_SOURCE);
+ val = (LAN87XX_MASK_LINK_UP | LAN87XX_MASK_LINK_DOWN);
+ }
+
+ rc = phy_write(phydev, LAN87XX_INTERRUPT_MASK, val);
+
+ return rc < 0 ? rc : 0;
+}
+
+static int lan87xx_phy_ack_interrupt(struct phy_device *phydev)
+{
+ int rc = phy_read(phydev, LAN87XX_INTERRUPT_SOURCE);
+
+ return rc < 0 ? rc : 0;
+}
+
+static struct phy_driver microchip_t1_phy_driver[] = {
+ {
+ .phy_id = 0x0007c150,
+ .phy_id_mask = 0xfffffff0,
+ .name = "Microchip LAN87xx",
+
+ .features = SUPPORTED_100baseT_Full,
+ .flags = PHY_HAS_INTERRUPT,
+
+ .config_init = genphy_config_init,
+ .config_aneg = genphy_config_aneg,
+
+ .ack_interrupt = lan87xx_phy_ack_interrupt,
+ .config_intr = lan87xx_phy_config_intr,
+
+ .suspend = genphy_suspend,
+ .resume = genphy_resume,
+ }
+};
+
+module_phy_driver(microchip_t1_phy_driver);
+
+static struct mdio_device_id __maybe_unused microchip_t1_tbl[] = {
+ { 0x0007c150, 0xfffffff0 },
+ { }
+};
+
+MODULE_DEVICE_TABLE(mdio, microchip_t1_tbl);
+
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
+MODULE_LICENSE("GPL");
--
2.14.1
^ permalink raw reply related
* Re: [PATCH bpf-next] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON
From: Leo Yan @ 2018-04-27 9:49 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Alexei Starovoitov, David S. Miller, Jonathan Corbet, netdev,
linux-kernel, linux-doc
In-Reply-To: <275e03a2-b74e-8f60-4ffe-26c9a79fae9d@iogearbox.net>
On Fri, Apr 27, 2018 at 11:44:44AM +0200, Daniel Borkmann wrote:
> On 04/26/2018 04:26 AM, Leo Yan wrote:
> > When CONFIG_BPF_JIT_ALWAYS_ON is enabled, kernel has limitation for
> > bpf_jit_enable, so it has fixed value 1 and we cannot set it to 2
> > for JIT opcode dumping; this patch is to update the doc for it.
> >
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> > Documentation/networking/filter.txt | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
> > index fd55c7d..feddab9 100644
> > --- a/Documentation/networking/filter.txt
> > +++ b/Documentation/networking/filter.txt
> > @@ -483,6 +483,12 @@ Example output from dmesg:
> > [ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
> > [ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3
> >
> > +When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is set to 1 by default
> > +and it returns failure if change to any other value from proc node; this is
> > +for security consideration to avoid leaking info to unprivileged users. In this
> > +case, we can't directly dump JIT opcode image from kernel log, alternatively we
> > +need to use bpf tool for the dumping.
> > +
>
> Could you change this doc text a bit, I think it's slightly misleading. From the first
> sentence one could also interpret that value 0 would leak info to unprivileged users
> whereas here we're only talking about the case of value 2. Maybe something roughly like
> this to make it more clear:
>
> When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
> setting any other value than that will return in failure. This is even the case for
> setting bpf_jit_enable to 2, since dumping the final JIT image into the kernel log
> is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is the
> generally recommended approach instead.
Yeah, your rephrasing is clearer and better. Will do this and send a
new patch soon. Thanks for your help.
> Thanks,
> Daniel
^ permalink raw reply
* Re: [PATCH] netfilter: ebtables: handle string from userspace with care
From: Dmitry Vyukov @ 2018-04-27 9:46 UTC (permalink / raw)
To: Florian Westphal
Cc: Paolo Abeni, netfilter-devel, syzbot, coreteam, syzkaller-bugs,
netdev
In-Reply-To: <20180427092622.4ifhb4zjoncwawmi@breakpoint.cc>
On Fri, Apr 27, 2018 at 11:26 AM, Florian Westphal <fw@strlen.de> wrote:
> Paolo Abeni <pabeni@redhat.com> wrote:
>> strlcpy() can't be safely used on a user-space provided string,
>> as it can try to read beyond the buffer's end, if the latter is
>> not NULL terminated.
>
> Yes.
>
>> Leveraging the above, syzbot has been able to trigger the following
>> splat:
>>
>> BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300
>> [inline]
>> BUG: KASAN: stack-out-of-bounds in compat_mtw_from_user
>> net/bridge/netfilter/ebtables.c:1957 [inline]
>> BUG: KASAN: stack-out-of-bounds in ebt_size_mwt
>> net/bridge/netfilter/ebtables.c:2059 [inline]
>> BUG: KASAN: stack-out-of-bounds in size_entry_mwt
>> net/bridge/netfilter/ebtables.c:2155 [inline]
>> BUG: KASAN: stack-out-of-bounds in compat_copy_entries+0x96c/0x14a0
>> net/bridge/netfilter/ebtables.c:2194
>> Write of size 33 at addr ffff8801b0abf888 by task syz-executor0/4504
>
> Which is weird, I don't understand this report.
> The code IS wrong, but it should cause out-of-bounds read (strlen on
> src), but not out-of-bounds write.
Please see this for explanation:
https://groups.google.com/d/msg/syzkaller-bugs/-Jyti8zBWjU/6n-fkmXeBAAJ
The stack overwrite actually happens here.
> Yes, I sent a recent patch (dceb48d86b4871984b8ce9ad5057fb2c01aa33de in
> nf.git) that would now allow to get rid of the strlcpy and use the
> source directly.
^ permalink raw reply
* Re: [PATCH bpf-next] bpf, doc: Update bpf_jit_enable limitation for CONFIG_BPF_JIT_ALWAYS_ON
From: Daniel Borkmann @ 2018-04-27 9:44 UTC (permalink / raw)
To: Leo Yan, Alexei Starovoitov, David S. Miller, Jonathan Corbet,
netdev, linux-kernel, linux-doc
In-Reply-To: <1524709611-29437-1-git-send-email-leo.yan@linaro.org>
On 04/26/2018 04:26 AM, Leo Yan wrote:
> When CONFIG_BPF_JIT_ALWAYS_ON is enabled, kernel has limitation for
> bpf_jit_enable, so it has fixed value 1 and we cannot set it to 2
> for JIT opcode dumping; this patch is to update the doc for it.
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
> Documentation/networking/filter.txt | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
> index fd55c7d..feddab9 100644
> --- a/Documentation/networking/filter.txt
> +++ b/Documentation/networking/filter.txt
> @@ -483,6 +483,12 @@ Example output from dmesg:
> [ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00
> [ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3
>
> +When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is set to 1 by default
> +and it returns failure if change to any other value from proc node; this is
> +for security consideration to avoid leaking info to unprivileged users. In this
> +case, we can't directly dump JIT opcode image from kernel log, alternatively we
> +need to use bpf tool for the dumping.
> +
Could you change this doc text a bit, I think it's slightly misleading. From the first
sentence one could also interpret that value 0 would leak info to unprivileged users
whereas here we're only talking about the case of value 2. Maybe something roughly like
this to make it more clear:
When CONFIG_BPF_JIT_ALWAYS_ON is enabled, bpf_jit_enable is permanently set to 1 and
setting any other value than that will return in failure. This is even the case for
setting bpf_jit_enable to 2, since dumping the final JIT image into the kernel log
is discouraged and introspection through bpftool (under tools/bpf/bpftool/) is the
generally recommended approach instead.
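
For illustration, a minimal userspace sketch of what a rejected write looks
like to a caller. write_sysctl is a hypothetical helper; the sysctl normally
lives at /proc/sys/net/core/bpf_jit_enable, and the exact errno the kernel
returns is deliberately not assumed here.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write 'value' to the file at 'path'.
 * Returns 0 on success or a negative errno value if the file cannot be
 * opened or the write is rejected.
 */
static int write_sysctl(const char *path, const char *value)
{
	int err = 0;
	ssize_t n;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	n = write(fd, value, strlen(value));
	if (n < 0)
		err = -errno;

	close(fd);
	return err;
}
```

A caller could try write_sysctl("/proc/sys/net/core/bpf_jit_enable", "2")
and, on a negative return, fall back to bpftool for introspection instead of
relying on the kernel log dump.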
Thanks,
Daniel
^ permalink raw reply
* Re: [PATCH net-next] selftests: pmtu: Minimum MTU for vti6 is 68
From: Xin Long @ 2018-04-27 9:33 UTC (permalink / raw)
To: Stefano Brivio
Cc: David S . Miller, Steffen Klassert, Alexey Kodanev, Jarod Wilson,
Sabrina Dubroca, network dev
In-Reply-To: <c2369c8f004006b33007bad40b63c35f50ff3c23.1524764073.git.sbrivio@redhat.com>
On Fri, Apr 27, 2018 at 1:41 AM, Stefano Brivio <sbrivio@redhat.com> wrote:
> A vti6 interface can carry IPv4 packets too.
>
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---
> tools/testing/selftests/net/pmtu.sh | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
> index 1e428781a625..7651fd4d86fe 100755
> --- a/tools/testing/selftests/net/pmtu.sh
> +++ b/tools/testing/selftests/net/pmtu.sh
> @@ -368,7 +368,7 @@ test_pmtu_vti6_link_add_mtu() {
>
> fail=0
>
> - min=1280
> + min=68 # vti6 can carry IPv4 packets too
> max=$((65535 - 40))
> # Check invalid values first
> for v in $((min - 1)) $((max + 1)); do
> @@ -384,7 +384,7 @@ test_pmtu_vti6_link_add_mtu() {
> done
>
> # Now check valid values
> - for v in 1280 1300 $((65535 - 40)); do
> + for v in 68 1280 1300 $((65535 - 40)); do
> ${ns_a} ip link add vti6_a mtu ${v} type vti6 local ${veth6_a_addr} remote ${veth6_b_addr} key 10
> mtu="$(link_get_mtu "${ns_a}" vti6_a)"
> ${ns_a} ip link del vti6_a
> --
> 2.15.1
>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
^ permalink raw reply
* Re: [PATCH] netfilter: ebtables: handle string from userspace with care
From: Florian Westphal @ 2018-04-27 9:26 UTC (permalink / raw)
To: Paolo Abeni; +Cc: netfilter-devel, syzbot, fw, coreteam, syzkaller-bugs, netdev
In-Reply-To: <8710122d42aa1f3e081812f2abf406973f834982.1524818458.git.pabeni@redhat.com>
Paolo Abeni <pabeni@redhat.com> wrote:
> strlcpy() can't be safely used on a user-space provided string,
> as it can try to read beyond the buffer's end, if the latter is
> not NULL terminated.
Yes.
> Leveraging the above, syzbot has been able to trigger the following
> splat:
>
> BUG: KASAN: stack-out-of-bounds in strlcpy include/linux/string.h:300
> [inline]
> BUG: KASAN: stack-out-of-bounds in compat_mtw_from_user
> net/bridge/netfilter/ebtables.c:1957 [inline]
> BUG: KASAN: stack-out-of-bounds in ebt_size_mwt
> net/bridge/netfilter/ebtables.c:2059 [inline]
> BUG: KASAN: stack-out-of-bounds in size_entry_mwt
> net/bridge/netfilter/ebtables.c:2155 [inline]
> BUG: KASAN: stack-out-of-bounds in compat_copy_entries+0x96c/0x14a0
> net/bridge/netfilter/ebtables.c:2194
> Write of size 33 at addr ffff8801b0abf888 by task syz-executor0/4504
Which is weird, I don't understand this report.
The code IS wrong, but it should cause out-of-bounds read (strlen on
src), but not out-of-bounds write.
Yes, I sent a recent patch (dceb48d86b4871984b8ce9ad5057fb2c01aa33de in
nf.git) that would now allow to get rid of the strlcpy and use the
source directly.
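
To make the read-side problem concrete, here is a simplified userspace model
(a sketch under assumptions, not the kernel's implementation or the actual
fix). model_strlcpy mimics the strlen(src) step that walks past an
unterminated buffer; bounded_copy is a hypothetical alternative that never
reads more than src_size bytes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Simplified model of strlcpy(): the return value is strlen(src), so the
 * whole source is scanned for a NUL no matter how small 'size' is. If src
 * is not NUL terminated, that scan runs off the end of the buffer.
 */
static size_t model_strlcpy(char *dst, const char *src, size_t size)
{
	size_t ret = strlen(src);

	if (size) {
		size_t len = (ret >= size) ? size - 1 : ret;

		memcpy(dst, src, len);
		dst[len] = '\0';
	}
	return ret;
}

/*
 * Bounded variant: looks at no more than src_size bytes of src and always
 * NUL terminates dst (when dst_size is non-zero). Returns bytes copied.
 */
static size_t bounded_copy(char *dst, size_t dst_size,
			   const char *src, size_t src_size)
{
	const char *nul;
	size_t len;

	if (!dst_size)
		return 0;

	nul = memchr(src, '\0', src_size);
	len = nul ? (size_t)(nul - src) : src_size;
	if (len >= dst_size)
		len = dst_size - 1;

	memcpy(dst, src, len);
	dst[len] = '\0';
	return len;
}
```

The point of the second helper is only that the source length is an explicit
input; whether to truncate or reject an unterminated name is a separate
policy decision.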
^ permalink raw reply
* Re: [PATCH] [PATCH bpf-next] samples/bpf/bpf_load.c: remove redundant ret assignment in bpf_load_program()
From: Daniel Borkmann @ 2018-04-27 9:15 UTC (permalink / raw)
To: Wang Sheng-Hui, ast, netdev
In-Reply-To: <20180425020713.1795-1-shhuiw@foxmail.com>
On 04/25/2018 04:07 AM, Wang Sheng-Hui wrote:
> 2 redundant ret assignments removed:
> * 'ret = 1' before the logic 'if (data_maps)', and if any errors jump to
> label 'done'. No 'ret = 1' needed before the error jump.
> * After the '/* load programs */' part, if everything goes well, then
> the BPF code will be loaded and 'ret' set to 0 by load_and_attach().
> If something goes wrong, 'ret' is set to non-zero, but the redundant
> 'ret = 0' after the for clause will cause the error to be skipped.
> For example, if some BPF code cannot provide supported program types
> in ELF SEC("unknown"), the for clause will not call load_and_attach()
> to load the BPF code. 1 should be returned to callers instead of 0.
>
> Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Applied yesterday to bpf-next (and now in net-next), thanks Nikita!
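
A simplified model of the second point in the commit message, assuming a
loader that only runs its attach step for recognised section names.
load_sections, is_known_section and the reset_ret_after_loop flag are
hypothetical and only mimic the control flow; this is not the samples/bpf
code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check standing in for the loader's section-name matching. */
static bool is_known_section(const char *name)
{
	return strncmp(name, "kprobe/", 7) == 0 || strncmp(name, "xdp", 3) == 0;
}

/*
 * Returns 0 if at least one section was loaded, 1 otherwise.
 * When reset_ret_after_loop is true, the trailing 'ret = 0' models the
 * redundant assignment that hid the "nothing was loaded" case.
 */
static int load_sections(const char *const *names, size_t n,
			 bool reset_ret_after_loop)
{
	int ret = 1;	/* pessimistic default: nothing loaded yet */
	size_t i;

	for (i = 0; i < n; i++) {
		if (!is_known_section(names[i]))
			continue;	/* unsupported type: attach step skipped */
		ret = 0;		/* stands in for a successful attach */
	}

	if (reset_ret_after_loop)
		ret = 0;	/* masks the case where no section matched */

	return ret;
}
```

With a single SEC("unknown") section the model returns 0 when the trailing
assignment is present and 1 when it is removed, which matches the behaviour
the commit message describes.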
^ permalink raw reply
* Re: [RFC v3 0/5] virtio: support packed ring
From: Tiwei Bie @ 2018-04-27 9:12 UTC (permalink / raw)
To: Jason Wang
Cc: Michael S. Tsirkin, virtualization, linux-kernel, netdev, wexu,
jfreimann
In-Reply-To: <5c712aa2-f00e-b472-cdfc-48175aea790d@redhat.com>
On Fri, Apr 27, 2018 at 02:17:51PM +0800, Jason Wang wrote:
> On 2018年04月27日 12:18, Michael S. Tsirkin wrote:
> > On Fri, Apr 27, 2018 at 11:56:05AM +0800, Jason Wang wrote:
> > > On 2018年04月25日 13:15, Tiwei Bie wrote:
> > > > Hello everyone,
> > > >
> > > > This RFC implements packed ring support in virtio driver.
> > > >
> > > > Some simple functional tests have been done with Jason's
> > > > packed ring implementation in vhost:
> > > >
> > > > https://lkml.org/lkml/2018/4/23/12
> > > >
> > > > Both of ping and netperf worked as expected (with EVENT_IDX
> > > > disabled). But there are below known issues:
> > > >
> > > > 1. Reloading the guest driver will break the Tx/Rx;
> > > Will have a look at this issue.
> > >
> > > > 2. Zeroing the flags when detaching a used desc will
> > > > break the guest -> host path.
> > > I still think zeroing flags is unnecessary or even a bug. At host, I track
> > > last observed avail wrap counter and detect avail like (what is suggested in
> > > the example code in the spec):
> > >
> > > static bool desc_is_avail(struct vhost_virtqueue *vq, __virtio16 flags)
> > > {
> > > bool avail = flags & cpu_to_vhost16(vq, DESC_AVAIL);
> > >
> > > return avail == vq->avail_wrap_counter;
> > > }
> > >
> > > So zeroing wrap can not work with this obviously.
> > >
> > > Thanks
> > I agree. I think what one should do is flip the available bit.
> >
>
> But is this flipping a must?
>
> Thanks
Yeah, that's my question too. It seems to be a requirement
for the driver that the only change it can make to a desc's
status at runtime is to mark the desc as avail; any other
change to the desc status is not allowed. Similarly, the
device can only mark the desc as used, and any other change
to the desc status is not allowed either.
So the question is, are there such requirements?
Based on the following text in the spec:
"""
Thus VIRTQ_DESC_F_AVAIL and VIRTQ_DESC_F_USED bits are different
for an available descriptor and equal for a used descriptor.
Note that this observation is mostly useful for sanity-checking
as these are necessary but not sufficient conditions
"""
It seems that it's necessary for devices to check whether
the AVAIL bit and USED bit are different.
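
A rough sketch of a check that looks at both bits, in plain C rather than
vhost code. The bit positions (7 for AVAIL, 15 for USED) follow my reading
of the packed ring layout and should be treated as an assumption; the
function names and the bare uint16_t flags argument are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag bit positions for a packed ring descriptor. */
#define DESC_F_AVAIL	(1u << 7)
#define DESC_F_USED	(1u << 15)

/*
 * Available: AVAIL and USED differ, and AVAIL matches the wrap counter
 * the device last observed for available descriptors.
 */
static bool desc_is_avail(uint16_t flags, bool avail_wrap_counter)
{
	bool avail = !!(flags & DESC_F_AVAIL);
	bool used = !!(flags & DESC_F_USED);

	return avail != used && avail == avail_wrap_counter;
}

/*
 * Used: AVAIL and USED are equal, and both match the wrap counter the
 * driver tracks for used descriptors.
 */
static bool desc_is_used(uint16_t flags, bool used_wrap_counter)
{
	bool avail = !!(flags & DESC_F_AVAIL);
	bool used = !!(flags & DESC_F_USED);

	return avail == used && used == used_wrap_counter;
}
```

Under this stricter check a descriptor whose flags were zeroed has both bits
equal, so it is never reported as available for either wrap counter value,
whereas a check on the AVAIL bit alone would accept it when the wrap counter
is 0.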
Best regards,
Tiwei Bie
^ permalink raw reply