Netdev List
* Re: [PATCH] net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets
From: Christoph Paasch @ 2018-06-03 19:54 UTC
  To: Maciej Żenczykowski
  Cc: Maciej Żenczykowski, David S . Miller, Eric Dumazet, netdev
In-Reply-To: <20180603174705.51802-1-zenczykowski@gmail.com>

Hello,

On Sun, Jun 3, 2018 at 10:47 AM, Maciej Żenczykowski
<zenczykowski@gmail.com> wrote:
> From: Maciej Żenczykowski <maze@google.com>
>
> It is not safe to do so because such sockets are already in the
> hash tables and changing these options can result in invalidating
> the tb->fastreuse(port) caching.
>
> This can later have far-reaching consequences for the bind conflict
> checks, which rely on these caches (for optimization purposes).
>
> Not to mention that you can currently end up with two identical
> non-reuseport listening sockets bound to the same local ip:port
> by clearing reuseport on them after they've already both been bound.

As a side note: some time back I realized that one can also, on the
active-opener side, create two TCP connections with the same 5-tuple
going out over the same interface.

One simply needs to first create a connection with a socket that has
SO_BINDTODEVICE set to the same interface as the default route. The
second socket (which doesn't use SO_BINDTODEVICE) can then end up
using the same source port if the range of available ports has been
exhausted.
This makes for some interesting packet traces! :)

This is because INET_MATCH in __inet_check_established() only checks
for !(sk->sk_bound_dev_if). inet_hash_connect() would probably need to
know the outgoing interface of the new socket's route to decide
whether or not there is a match.

But even that wouldn't be failsafe as the routing could change later
on... So, I dropped the ball on that.

Not sure if it's a big deal or not...
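
The lookup behaviour described above can be modelled in a few lines of
user-space C. This is only a toy sketch: the struct and the toy_*
function names are invented for illustration and are not kernel code.
It assumes, as the message describes, that the lookup is done with the
new socket's own bound interface index (0 when unbound) rather than the
interface its route would select.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for an established-hash entry; NOT the kernel's struct sock. */
struct toy_sock {
	uint32_t saddr, daddr;	/* local and remote IPv4 address */
	uint16_t sport, dport;	/* local and remote port */
	int bound_dev_if;	/* 0 = not bound to a device */
};

/* Simplified device test: an existing socket matches if it is unbound
 * or bound to the interface index the lookup asks about (dif).
 */
static bool toy_dev_match(const struct toy_sock *sk, int dif)
{
	return !sk->bound_dev_if || sk->bound_dev_if == dif;
}

/* Returns true if 'newsk' would collide with 'old' in this model.
 * The lookup uses the new socket's own bound_dev_if as dif, so an
 * unbound new socket never matches an old socket bound to a device.
 */
static bool toy_conflict(const struct toy_sock *old,
			 const struct toy_sock *newsk)
{
	return old->saddr == newsk->saddr && old->daddr == newsk->daddr &&
	       old->sport == newsk->sport && old->dport == newsk->dport &&
	       toy_dev_match(old, newsk->bound_dev_if);
}
```

In this model, an old socket bound to interface 2 and a new unbound
socket with the same addresses and ports do not conflict, which is the
duplicate 5-tuple case. With the roles swapped (old unbound, new
bound), the conflict is detected.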


Cheers,
Christoph



>
> There is unfortunately no EISBOUND error or anything similar,
> and EISCONN seems to be misleading for a bound-but-not-connected
> socket, so use EUCLEAN 'Structure needs cleaning' which AFAICT
> is the closest you can get to meaning 'socket in bad state'.
> (although perhaps EINVAL wouldn't be a bad choice either?)
>
> This does unfortunately run the risk of breaking buggy
> userspace programs...
>
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
> Cc: Eric Dumazet <edumazet@google.com>
>
> Change-Id: I77c2b3429b2fdf42671eee0fa7a8ba721c94963b
> ---
>  net/core/sock.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 435a0ba85e52..feca4c98f8a0 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -728,9 +728,22 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
>                         sock_valbool_flag(sk, SOCK_DBG, valbool);
>                 break;
>         case SO_REUSEADDR:
> -               sk->sk_reuse = (valbool ? SK_CAN_REUSE : SK_NO_REUSE);
> +               val = (valbool ? SK_CAN_REUSE : SK_NO_REUSE);
> +               if ((sk->sk_family == PF_INET || sk->sk_family == PF_INET6) &&
> +                   inet_sk(sk)->inet_num &&
> +                   (sk->sk_reuse != val)) {
> +                       ret = (sk->sk_state == TCP_ESTABLISHED) ? -EISCONN : -EUCLEAN;
> +                       break;
> +               }
> +               sk->sk_reuse = val;
>                 break;
>         case SO_REUSEPORT:
> +               if ((sk->sk_family == PF_INET || sk->sk_family == PF_INET6) &&
> +                   inet_sk(sk)->inet_num &&
> +                   (sk->sk_reuseport != valbool)) {
> +                       ret = (sk->sk_state == TCP_ESTABLISHED) ? -EISCONN : -EUCLEAN;
> +                       break;
> +               }
>                 sk->sk_reuseport = valbool;
>                 break;
>         case SO_TYPE:
> --
> 2.17.1.1185.g55be947832-goog
>


* Re: [PATCH net-next 0/3] bpf: implement bpf_get_current_cgroup_id() helper
From: Alexei Starovoitov @ 2018-06-03 20:00 UTC
  To: Yonghong Song; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180603073654.3600598-1-yhs@fb.com>

On Sun, Jun 03, 2018 at 12:36:51AM -0700, Yonghong Song wrote:
> bpf has been used extensively for tracing. For example, bcc
> contains an almost full set of bpf-based tools to trace kernel
> and user functions/events. Most tracing tools are currently
> either filtered based on pid or system-wide.
> 
> Containers have been used quite extensively in industry and
> cgroup is often used together to provide resource isolation
> and protection. Several processes may run inside the same
> container. It is often desirable to get container-level tracing
> results as well, e.g. syscall count, function count, I/O
> activity, etc.
> 
> This patch implements a new helper, bpf_get_current_cgroup_id(),
> which will return cgroup id based on the cgroup within which
> the current task is running.
> 
> Patch #1 implements the new helper in the kernel.
> Patch #2 syncs the uapi bpf.h header and helper between tools
> and kernel.
> Patch #3 shows how to get the same cgroup id in user space,
> so a filter or policy could be configured in the bpf program
> based on current task cgroup.

for all patches:
Acked-by: Alexei Starovoitov <ast@kernel.org>

please rebase, so it can be applied and s/net-next/bpf-next/ in subj.
Thanks!


* Re: ANNOUNCE: Enhanced IP v1.4
From: Tom Herbert @ 2018-06-03 20:37 UTC
  To: Sam Patton; +Cc: Willy Tarreau, Linux Kernel Network Developers
In-Reply-To: <330e58f3-61d3-6abc-4f7c-1726e0ce852d@enhancedip.org>

On Sat, Jun 2, 2018 at 9:17 AM, Sam Patton <sam@enhancedip.org> wrote:
> Hello Willy, netdev,
>
> Thank you for your reply and advice.  I couldn't agree more with you
> about containers and the exciting prospects there,
>
> as well as the ADSL scenario you mention.
>
> As far as application examples, check out this simple netcat-like
> program I use for testing:
>
> https://github.com/EnIP/enhancedip/blob/master/userspace/netcat/netcat.c
>
> Lines 61-67 show how to connect directly via an EnIP address.  The
> netcat-like application uses
>
> a header file called eip.h.  You can look at it here:
>
> https://github.com/EnIP/enhancedip/blob/master/userspace/include/eip.h
>
> EnIP makes use of IPv6 AAAA records for DNS lookup.  We simply put
> 2001:0101 (which is an IPv6 experimental prefix) and
>
> then we put the 64-bit EnIP address into the next 8 bytes of the
> address.  The remaining bytes are set to zero.
>
> In the kernel, if you want to see how we convert the IPv6 DNS lookup
> into something connect() can manage,
>
> check out the add_enhanced_ip() routine found here:
>
> https://github.com/EnIP/enhancedip/blob/master/kernel/4.9.28/socket.c
>
> The reason we had to do changes for openssh and not other applications
> (that use DNS) is openssh has a check to
>
> see if the socket is using IP options.  If the socket does, sshd drops
> the connection.  I had to work around that to get openssh working
>
> with EnIP.  The result: if you want to connect the netcat-like program
> with IP addresses you'll end up doing something like the
>
> example above.  If you're using DNS (getaddrinfo) to connect(), it
> should just work (except for sshd as noted).
>
> Here's the draft experimental RFC:
> https://tools.ietf.org/html/draft-chimiak-enhanced-ipv4-03
> I'll also note that I have been working on this code part time as a hobby
> for a long time, so I appreciate your help and support.  It would be really
>
> great if the kernel community decided to pick this up, but if it's not a
> reality please let me know soonest so I can move on to a
>
Hi Sam,

This is not an inconsequential mechanism that is being proposed. It's
a modification to the IP protocol that is intended to work on the
Internet, but it looks like the draft hasn't been updated for two
years and it has not been adopted by any IETF working group. I don't
see how this can go anywhere without IETF support. Also, I suggest
that you look at the IPv10 proposal, since that was very similar in
intent. One of the reasons that IPv10 was shot down is that protocol
transition mechanisms were more interesting ten years ago than they
are today. IPv6 has good traction now. In fact, it's probably now
easier to bring up IPv6 than to try to make IPv4 options work over
the Internet.

Tom


> different hobby.  :)
>
> Thank you.
>
> Sam Patton
>
> On 6/2/18 1:57 AM, Willy Tarreau wrote:
>> Hello Sam,
>>
>> On Fri, Jun 01, 2018 at 09:48:28PM -0400, Sam Patton wrote:
>>> Hello!
>>>
>>> If you do not know what Enhanced IP is, read this post on netdev first:
>>>
>>> https://www.spinics.net/lists/netdev/msg327242.html
>>>
>>>
>>> The Enhanced IP project presents:
>>>
>>>              Enhanced IP v1.4
>>>
>>> The Enhanced IP (EnIP) code has been updated.  It now builds with OpenWRT barrier breaker (for 148 different devices). We've been testing with the Western Digital N600 and N750 wireless home routers.
>> (...) First note, please think about breaking your lines if you want your
>> mails to be read by the widest audience, as for some of us here, reading
>> lines wider than a terminal is really annoying, and often not considered
>> worth spending time on them considering there are so many easier ones
>> left to read.
>>
>>> Interested in seeing Enhanced IP in the Linux kernel, read on.  Not
>>> interested in seeing Enhanced IP in the Linux kernel read on.
>> (...)
>>
>> So I personally find the concept quite interesting. It reminds me of the
>> previous IPv5/IPv7/IPv8 initiatives, which in my opinion were a bit hopeless.
>> Here the fact that you decide to consider the IPv4 address as a network opens
>> new perspectives. For containerized environments it could be considered that
>> each server, with one IPv4, can host 2^32 guests and that NAT is not needed
>> anymore for example. It could also open the possibility that enthusiasts
>> can more easily host some services at home behind their ADSL line without
>> having to run on strange ports.
>>
>> However I think your approach is not the most efficient to encourage adoption.
>> It's important to understand that there will be little incentive for people
>> to patch their kernels to run some code if they don't have the applications
>> on top of it. The kernel is not the end goal for most users, the kernel is
>> just the lower layer needed to run applications on top. I looked at your site
>> and the github repo, and all I could find was a pre-patched openssh, no simple
>> explanation of what to change in an application.
>>
>> What you need to do first is to *explain* how to modify userland applications
>> to support En-IP, provide an echo server and show the parts which have to be
>> changed. Write a simple client and do the same. Provide your changes to
>> existing programs as patches, not as pre-patched code. This way anyone can
>> use your patches on top of other versions, and can use these patches to
>> understand what has to be modified in their applications.
>>
>> Once applications are easy to patch, the incentive to install patched kernels
>> everywhere will be higher. For many enthusiasts, knowing that they only have
>> to modify the ADSL router to automatically make their internal IoT stuff
>> accessible from outside indeed becomes appealing.
>>
>> Then you'll need to provide patches for well known applications like curl,
>> wget, DNS servers (bind...), then browsers.
>>
>> In my case I could be interested in adding support for En-ip into haproxy,
>> and only once I don't see any showstopper in doing this, I'd be willing to
>> patch my kernel to support it.
>>
>> Last advice, provide links to your drafts in future e-mails, they are not
>> easy to find on your site, we have to navigate through various pages to
>> finally find them.
>>
>> Regards,
>> Willy
>
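
The AAAA encoding described in the quoted message (the 2001:0101
prefix, then the 64-bit EnIP address, then zeros) can be sketched as
below. This is an illustration under stated assumptions, not code from
the EnIP repository: the function names are invented, and the layout
assumes the prefix occupies exactly the first four bytes of the 16-byte
address.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: bytes 0-3 = 2001:0101, bytes 4-11 = EnIP address,
 * bytes 12-15 = zero.
 */
static const uint8_t enip_prefix[4] = { 0x20, 0x01, 0x01, 0x01 };

/* Pack an 8-byte EnIP address into a 16-byte AAAA value. */
static void enip_to_aaaa(const uint8_t enip[8], uint8_t out[16])
{
	memset(out, 0, 16);
	memcpy(out, enip_prefix, sizeof(enip_prefix));
	memcpy(out + 4, enip, 8);
}

/* Reverse step: returns 0 and fills enip[] if the prefix matches,
 * or -1 if the value is not an EnIP-style AAAA record.
 */
static int aaaa_to_enip(const uint8_t in[16], uint8_t enip[8])
{
	if (memcmp(in, enip_prefix, sizeof(enip_prefix)) != 0)
		return -1;
	memcpy(enip, in + 4, 8);
	return 0;
}
```

An application that resolves names through getaddrinfo() would never
call anything like this itself; per the quoted message, the kernel side
performs the conversion before connect().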


* [PATCH net-next] net: ipv6: Generate random IID for addresses on RAWIP devices
From: Subash Abhinov Kasiviswanathan @ 2018-06-03 21:54 UTC
  To: davem, netdev; +Cc: Subash Abhinov Kasiviswanathan

RAWIP devices such as rmnet do not have a hardware address and
instead require the kernel to generate a random IID for the
temporary addresses. For permanent addresses, the device IID is
used along with the received prefix.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 net/ipv6/addrconf.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index f09afc2..e4c4540 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2230,6 +2230,18 @@ static int addrconf_ifid_ip6tnl(u8 *eui, struct net_device *dev)
 	return 0;
 }
 
+static int addrconf_ifid_rawip(u8 *eui, struct net_device *dev)
+{
+	struct in6_addr lladdr;
+
+	if (ipv6_get_lladdr(dev, &lladdr, IFA_F_TENTATIVE))
+		get_random_bytes(eui, 8);
+	else
+		memcpy(eui, lladdr.s6_addr + 8, 8);
+
+	return 0;
+}
+
 static int ipv6_generate_eui64(u8 *eui, struct net_device *dev)
 {
 	switch (dev->type) {
@@ -2252,6 +2264,8 @@ static int ipv6_generate_eui64(u8 *eui, struct net_device *dev)
 	case ARPHRD_TUNNEL6:
 	case ARPHRD_IP6GRE:
 		return addrconf_ifid_ip6tnl(eui, dev);
+	case ARPHRD_RAWIP:
+		return addrconf_ifid_rawip(eui, dev);
 	}
 	return -1;
 }
@@ -3286,7 +3300,8 @@ static void addrconf_dev_config(struct net_device *dev)
 	    (dev->type != ARPHRD_IP6GRE) &&
 	    (dev->type != ARPHRD_IPGRE) &&
 	    (dev->type != ARPHRD_TUNNEL) &&
-	    (dev->type != ARPHRD_NONE)) {
+	    (dev->type != ARPHRD_NONE) &&
+	    (dev->type != ARPHRD_RAWIP)) {
 		/* Alas, we support only Ethernet autoconfiguration. */
 		return;
 	}
-- 
1.9.1
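
The identifier selection in the patch above can be sketched in plain
user-space C. This is a simplified illustration, not the kernel code:
the toy_* names are invented, the random bytes are supplied by the
caller instead of coming from get_random_bytes(), and the prefix is
assumed to be a /64.

```c
#include <stdint.h>
#include <string.h>

/* Choose the 8-byte interface identifier: reuse the low 64 bits of the
 * link-local address if one exists (lladdr != NULL), otherwise take the
 * caller-supplied random bytes.
 */
static void toy_rawip_iid(const uint8_t *lladdr, const uint8_t rnd[8],
			  uint8_t iid[8])
{
	if (lladdr)
		memcpy(iid, lladdr + 8, 8);
	else
		memcpy(iid, rnd, 8);
}

/* Combine a /64 prefix with the identifier into a 16-byte address. */
static void toy_make_addr(const uint8_t prefix[8], const uint8_t iid[8],
			  uint8_t addr[16])
{
	memcpy(addr, prefix, 8);
	memcpy(addr + 8, iid, 8);
}
```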


* [PATCH net] net: qualcomm: rmnet: Fix use after free while sending command ack
From: Subash Abhinov Kasiviswanathan @ 2018-06-03 22:17 UTC
  To: davem, netdev; +Cc: Subash Abhinov Kasiviswanathan

When sending an ack to a command packet, skb->dev is still dereferenced
after the skb has been sent to the real device. Since the real device
could free the skb, the device pointer read from it would be invalid.

Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
index 78fdad0..f530b07 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
@@ -67,6 +67,7 @@ static void rmnet_map_send_ack(struct sk_buff *skb,
 			       struct rmnet_port *port)
 {
 	struct rmnet_map_control_command *cmd;
+	struct net_device *dev = skb->dev;
 	int xmit_status;
 
 	if (port->data_format & RMNET_FLAGS_INGRESS_MAP_CKSUMV4) {
@@ -86,9 +87,9 @@ static void rmnet_map_send_ack(struct sk_buff *skb,
 	cmd = RMNET_MAP_GET_CMD_START(skb);
 	cmd->cmd_type = type & 0x03;
 
-	netif_tx_lock(skb->dev);
-	xmit_status = skb->dev->netdev_ops->ndo_start_xmit(skb, skb->dev);
-	netif_tx_unlock(skb->dev);
+	netif_tx_lock(dev);
+	xmit_status = dev->netdev_ops->ndo_start_xmit(skb, dev);
+	netif_tx_unlock(dev);
 }
 
 /* Process MAP command frame and send N/ACK message as appropriate. Message cmd
-- 
1.9.1
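
The pattern behind this fix can be shown with a small user-space model.
It is a sketch only: toy_dev, toy_buf and the functions are invented
stand-ins for net_device, sk_buff and the transmit path, and the
counter merely models taking and releasing the TX lock.

```c
#include <stdlib.h>

struct toy_dev {
	int lock_depth;	/* models netif_tx_lock()/netif_tx_unlock() */
	int sent;	/* number of buffers handed to the device */
};

struct toy_buf {
	struct toy_dev *dev;
};

/* Stand-in for ndo_start_xmit(): the lower device consumes the buffer. */
static void toy_xmit(struct toy_buf *buf, struct toy_dev *dev)
{
	dev->sent++;
	free(buf);	/* after this, buf->dev must not be read */
}

static void toy_send_ack(struct toy_buf *buf)
{
	/* Cache the device pointer before the buffer can be freed. */
	struct toy_dev *dev = buf->dev;

	dev->lock_depth++;
	toy_xmit(buf, dev);
	dev->lock_depth--;	/* uses the cached pointer, not buf->dev */
}
```

Reading buf->dev for the final unlock instead of the cached pointer
would touch freed memory, which is the access the patch removes.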


* Re: ANNOUNCE: Enhanced IP v1.4
From: Eric Dumazet @ 2018-06-03 22:41 UTC
  To: Tom Herbert, Sam Patton; +Cc: Willy Tarreau, Linux Kernel Network Developers
In-Reply-To: <CALx6S35Ci6edOGz6Fbge7EuY1pPcK+yziudPi=EZvSbyQt+uRQ@mail.gmail.com>



On 06/03/2018 01:37 PM, Tom Herbert wrote:

> This is not an inconsequential mechanism that is being proposed. It's
> a modification to the IP protocol that is intended to work on the
> Internet, but it looks like the draft hasn't been updated for two
> years and it has not been adopted by any IETF working group. I don't
> see how this can go anywhere without IETF support. Also, I suggest
> that you look at the IPv10 proposal, since that was very similar in
> intent. One of the reasons that IPv10 was shot down is that protocol
> transition mechanisms were more interesting ten years ago than they
> are today. IPv6 has good traction now. In fact, it's probably now
> easier to bring up IPv6 than to try to make IPv4 options work over
> the Internet.

+1

Many hosts do not use IPv4 anymore.

We even have a project to make IPv4 support in Linux optional.


* [PATCH net-next v2 1/3] bpf: implement bpf_get_current_cgroup_id() helper
From: Yonghong Song @ 2018-06-03 22:59 UTC
  To: ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20180603225943.2370719-1-yhs@fb.com>

bpf has been used extensively for tracing. For example, bcc
contains an almost full set of bpf-based tools to trace kernel
and user functions/events. Most tracing tools are currently
either filtered based on pid or system-wide.

Containers have been used quite extensively in industry and
cgroup is often used together to provide resource isolation
and protection. Several processes may run inside the same
container. It is often desirable to get container-level tracing
results as well, e.g. syscall count, function count, I/O
activity, etc.

This patch implements a new helper, bpf_get_current_cgroup_id(),
which will return cgroup id based on the cgroup within which
the current task is running.

A later patch provides an example showing that user space
can obtain the same cgroup id, so it can configure a filter
or policy in the bpf program based on the task's cgroup id.

The helper is currently implemented for tracing. It can
be added to other program types as well when needed.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 include/linux/bpf.h      |  1 +
 include/uapi/linux/bpf.h |  8 +++++++-
 kernel/bpf/core.c        |  1 +
 kernel/bpf/helpers.c     | 15 +++++++++++++++
 kernel/trace/bpf_trace.c |  2 ++
 5 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bbe2974..995c3b1 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -746,6 +746,7 @@ extern const struct bpf_func_proto bpf_get_stackid_proto;
 extern const struct bpf_func_proto bpf_get_stack_proto;
 extern const struct bpf_func_proto bpf_sock_map_update_proto;
 extern const struct bpf_func_proto bpf_sock_hash_update_proto;
+extern const struct bpf_func_proto bpf_get_current_cgroup_id_proto;
 
 /* Shared helpers among cBPF and eBPF. */
 void bpf_user_rnd_init_once(void);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f0b6608..18712b0 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2070,6 +2070,11 @@ union bpf_attr {
  * 		**CONFIG_SOCK_CGROUP_DATA** configuration option.
  * 	Return
  * 		The id is returned or 0 in case the id could not be retrieved.
+ *
+ * u64 bpf_get_current_cgroup_id(void)
+ * 	Return
+ * 		A 64-bit integer containing the current cgroup id based
+ * 		on the cgroup within which the current task is running.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2151,7 +2156,8 @@ union bpf_attr {
 	FN(lwt_seg6_action),		\
 	FN(rc_repeat),			\
 	FN(rc_keydown),			\
-	FN(skb_cgroup_id),
+	FN(skb_cgroup_id),		\
+	FN(get_current_cgroup_id),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 527587d..9f14937 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1765,6 +1765,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
 const struct bpf_func_proto bpf_get_current_comm_proto __weak;
 const struct bpf_func_proto bpf_sock_map_update_proto __weak;
 const struct bpf_func_proto bpf_sock_hash_update_proto __weak;
+const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
 
 const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
 {
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 3d24e23..73065e2 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -179,3 +179,18 @@ const struct bpf_func_proto bpf_get_current_comm_proto = {
 	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
 	.arg2_type	= ARG_CONST_SIZE,
 };
+
+#ifdef CONFIG_CGROUPS
+BPF_CALL_0(bpf_get_current_cgroup_id)
+{
+	struct cgroup *cgrp = task_dfl_cgroup(current);
+
+	return cgrp->kn->id.id;
+}
+
+const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
+	.func		= bpf_get_current_cgroup_id,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+};
+#endif
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 752992c..e2ab5b7 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -564,6 +564,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_prandom_u32_proto;
 	case BPF_FUNC_probe_read_str:
 		return &bpf_probe_read_str_proto;
+	case BPF_FUNC_get_current_cgroup_id:
+		return &bpf_get_current_cgroup_id_proto;
 	default:
 		return NULL;
 	}
-- 
2.9.5


* [PATCH net-next v2 0/3] bpf: implement bpf_get_current_cgroup_id() helper
From: Yonghong Song @ 2018-06-03 22:59 UTC
  To: ast, daniel, netdev; +Cc: kernel-team

bpf has been used extensively for tracing. For example, bcc
contains an almost full set of bpf-based tools to trace kernel
and user functions/events. Most tracing tools are currently
either filtered based on pid or system-wide.

Containers have been used quite extensively in industry and
cgroup is often used together to provide resource isolation
and protection. Several processes may run inside the same
container. It is often desirable to get container-level tracing
results as well, e.g. syscall count, function count, I/O
activity, etc.

This patch implements a new helper, bpf_get_current_cgroup_id(),
which will return cgroup id based on the cgroup within which
the current task is running.

Patch #1 implements the new helper in the kernel.
Patch #2 syncs the uapi bpf.h header and helper between tools
and kernel.
Patch #3 shows how to get the same cgroup id in user space,
so a filter or policy could be configured in the bpf program
based on current task cgroup.

Changelog:
  v1 -> v2:
     . rebase to resolve merge conflict with latest bpf-next.

Yonghong Song (3):
  bpf: implement bpf_get_current_cgroup_id() helper
  tools/bpf: sync uapi bpf.h for bpf_get_current_cgroup_id() helper
  tools/bpf: add a selftest for bpf_get_current_cgroup_id() helper

 include/linux/bpf.h                              |   1 +
 include/uapi/linux/bpf.h                         |   8 +-
 kernel/bpf/core.c                                |   1 +
 kernel/bpf/helpers.c                             |  15 +++
 kernel/trace/bpf_trace.c                         |   2 +
 tools/include/uapi/linux/bpf.h                   |   8 +-
 tools/testing/selftests/bpf/.gitignore           |   1 +
 tools/testing/selftests/bpf/Makefile             |   6 +-
 tools/testing/selftests/bpf/bpf_helpers.h        |   2 +
 tools/testing/selftests/bpf/cgroup_helpers.c     |  57 +++++++++
 tools/testing/selftests/bpf/cgroup_helpers.h     |   1 +
 tools/testing/selftests/bpf/get_cgroup_id_kern.c |  28 +++++
 tools/testing/selftests/bpf/get_cgroup_id_user.c | 141 +++++++++++++++++++++++
 13 files changed, 267 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/get_cgroup_id_kern.c
 create mode 100644 tools/testing/selftests/bpf/get_cgroup_id_user.c

-- 
2.9.5


* [PATCH net-next v2 2/3] tools/bpf: sync uapi bpf.h for bpf_get_current_cgroup_id() helper
From: Yonghong Song @ 2018-06-03 22:59 UTC
  To: ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20180603225943.2370719-1-yhs@fb.com>

Sync kernel uapi/linux/bpf.h with tools uapi/linux/bpf.h.
Also add the necessary helper define in bpf_helpers.h.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/include/uapi/linux/bpf.h            | 8 +++++++-
 tools/testing/selftests/bpf/bpf_helpers.h | 2 ++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f0b6608..18712b0 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2070,6 +2070,11 @@ union bpf_attr {
  * 		**CONFIG_SOCK_CGROUP_DATA** configuration option.
  * 	Return
  * 		The id is returned or 0 in case the id could not be retrieved.
+ *
+ * u64 bpf_get_current_cgroup_id(void)
+ * 	Return
+ * 		A 64-bit integer containing the current cgroup id based
+ * 		on the cgroup within which the current task is running.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2151,7 +2156,8 @@ union bpf_attr {
 	FN(lwt_seg6_action),		\
 	FN(rc_repeat),			\
 	FN(rc_keydown),			\
-	FN(skb_cgroup_id),
+	FN(skb_cgroup_id),		\
+	FN(get_current_cgroup_id),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index a66a9d9..f2f28b6 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -131,6 +131,8 @@ static int (*bpf_rc_repeat)(void *ctx) =
 static int (*bpf_rc_keydown)(void *ctx, unsigned int protocol,
 			     unsigned long long scancode, unsigned int toggle) =
 	(void *) BPF_FUNC_rc_keydown;
+static unsigned long long (*bpf_get_current_cgroup_id)(void) =
+	(void *) BPF_FUNC_get_current_cgroup_id;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
-- 
2.9.5


* [PATCH net-next v2 3/3] tools/bpf: add a selftest for bpf_get_current_cgroup_id() helper
From: Yonghong Song @ 2018-06-03 22:59 UTC
  To: ast, daniel, netdev; +Cc: kernel-team
In-Reply-To: <20180603225943.2370719-1-yhs@fb.com>

The name_to_handle_at() syscall can be used to get the cgroup id
for a particular cgroup path in user space. The selftest gets
the cgroup id from both user space and the kernel, and compares
them to ensure they are equal.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 tools/testing/selftests/bpf/.gitignore           |   1 +
 tools/testing/selftests/bpf/Makefile             |   6 +-
 tools/testing/selftests/bpf/cgroup_helpers.c     |  57 +++++++++
 tools/testing/selftests/bpf/cgroup_helpers.h     |   1 +
 tools/testing/selftests/bpf/get_cgroup_id_kern.c |  28 +++++
 tools/testing/selftests/bpf/get_cgroup_id_user.c | 141 +++++++++++++++++++++++
 6 files changed, 232 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/get_cgroup_id_kern.c
 create mode 100644 tools/testing/selftests/bpf/get_cgroup_id_user.c

diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index 6ea8359..49938d7 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -18,3 +18,4 @@ urandom_read
 test_btf
 test_sockmap
 test_lirc_mode2_user
+get_cgroup_id_user
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 553d181..607ed87 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -24,7 +24,7 @@ urandom_read: urandom_read.c
 # Order correspond to 'make run_tests' order
 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
 	test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \
-	test_sock test_btf test_sockmap test_lirc_mode2_user
+	test_sock test_btf test_sockmap test_lirc_mode2_user get_cgroup_id_user
 
 TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \
 	test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o     \
@@ -34,7 +34,8 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
 	sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o test_adjust_tail.o \
 	test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o test_tunnel_kern.o \
 	test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \
-	test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o
+	test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
+	get_cgroup_id_kern.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
@@ -63,6 +64,7 @@ $(OUTPUT)/test_sock: cgroup_helpers.c
 $(OUTPUT)/test_sock_addr: cgroup_helpers.c
 $(OUTPUT)/test_sockmap: cgroup_helpers.c
 $(OUTPUT)/test_progs: trace_helpers.c
+$(OUTPUT)/get_cgroup_id_user: cgroup_helpers.c
 
 .PHONY: force
 
diff --git a/tools/testing/selftests/bpf/cgroup_helpers.c b/tools/testing/selftests/bpf/cgroup_helpers.c
index f3bca3a..c87b4e0 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.c
+++ b/tools/testing/selftests/bpf/cgroup_helpers.c
@@ -6,6 +6,7 @@
 #include <sys/types.h>
 #include <linux/limits.h>
 #include <stdio.h>
+#include <stdlib.h>
 #include <linux/sched.h>
 #include <fcntl.h>
 #include <unistd.h>
@@ -176,3 +177,59 @@ int create_and_get_cgroup(char *path)
 
 	return fd;
 }
+
+/**
+ * get_cgroup_id() - Get cgroup id for a particular cgroup path
+ * @path: The cgroup path, relative to the workdir, to look up
+ *
+ * On success, it returns the cgroup id. On failure it returns 0,
+ * which is an invalid cgroup id.
+ * If there is a failure, it prints the error to stderr.
+ */
+unsigned long long get_cgroup_id(char *path)
+{
+	int dirfd, err, flags, mount_id, fhsize;
+	union {
+		unsigned long long cgid;
+		unsigned char raw_bytes[8];
+	} id;
+	char cgroup_workdir[PATH_MAX + 1];
+	struct file_handle *fhp, *fhp2;
+	unsigned long long ret = 0;
+
+	format_cgroup_path(cgroup_workdir, path);
+
+	dirfd = AT_FDCWD;
+	flags = 0;
+	fhsize = sizeof(*fhp);
+	fhp = calloc(1, fhsize);
+	if (!fhp) {
+		log_err("calloc");
+		return 0;
+	}
+	err = name_to_handle_at(dirfd, cgroup_workdir, fhp, &mount_id, flags);
+	if (err >= 0 || fhp->handle_bytes != 8) {
+		log_err("name_to_handle_at");
+		goto free_mem;
+	}
+
+	fhsize = sizeof(struct file_handle) + fhp->handle_bytes;
+	fhp2 = realloc(fhp, fhsize);
+	if (!fhp2) {
+		log_err("realloc");
+		goto free_mem;
+	}
+	err = name_to_handle_at(dirfd, cgroup_workdir, fhp2, &mount_id, flags);
+	fhp = fhp2;
+	if (err < 0) {
+		log_err("name_to_handle_at");
+		goto free_mem;
+	}
+
+	memcpy(id.raw_bytes, fhp->f_handle, 8);
+	ret = id.cgid;
+
+free_mem:
+	free(fhp);
+	return ret;
+}
diff --git a/tools/testing/selftests/bpf/cgroup_helpers.h b/tools/testing/selftests/bpf/cgroup_helpers.h
index 06485e0..20a4a5d 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.h
+++ b/tools/testing/selftests/bpf/cgroup_helpers.h
@@ -13,5 +13,6 @@ int create_and_get_cgroup(char *path);
 int join_cgroup(char *path);
 int setup_cgroup_environment(void);
 void cleanup_cgroup_environment(void);
+unsigned long long get_cgroup_id(char *path);
 
 #endif
diff --git a/tools/testing/selftests/bpf/get_cgroup_id_kern.c b/tools/testing/selftests/bpf/get_cgroup_id_kern.c
new file mode 100644
index 0000000..2cf8cb2
--- /dev/null
+++ b/tools/testing/selftests/bpf/get_cgroup_id_kern.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Facebook
+
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") cg_ids = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u64),
+	.max_entries = 1,
+};
+
+SEC("tracepoint/syscalls/sys_enter_nanosleep")
+int trace(void *ctx)
+{
+	__u32 key = 0;
+	__u64 *val;
+
+	val = bpf_map_lookup_elem(&cg_ids, &key);
+	if (val)
+		*val = bpf_get_current_cgroup_id();
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1; /* ignored by tracepoints, required by libbpf.a */
diff --git a/tools/testing/selftests/bpf/get_cgroup_id_user.c b/tools/testing/selftests/bpf/get_cgroup_id_user.c
new file mode 100644
index 0000000..ea19a42
--- /dev/null
+++ b/tools/testing/selftests/bpf/get_cgroup_id_user.c
@@ -0,0 +1,141 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Facebook
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <syscall.h>
+#include <unistd.h>
+#include <linux/perf_event.h>
+#include <sys/ioctl.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include <linux/bpf.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "cgroup_helpers.h"
+#include "bpf_rlimit.h"
+
+#define CHECK(condition, tag, format...) ({		\
+	int __ret = !!(condition);			\
+	if (__ret) {					\
+		printf("%s:FAIL:%s ", __func__, tag);	\
+		printf(format);				\
+	} else {					\
+		printf("%s:PASS:%s\n", __func__, tag);	\
+	}						\
+	__ret;						\
+})
+
+static int bpf_find_map(const char *test, struct bpf_object *obj,
+			const char *name)
+{
+	struct bpf_map *map;
+
+	map = bpf_object__find_map_by_name(obj, name);
+	if (!map)
+		return -1;
+	return bpf_map__fd(map);
+}
+
+#define TEST_CGROUP "/test-bpf-get-cgroup-id/"
+
+int main(int argc, char **argv)
+{
+	const char *probe_name = "syscalls/sys_enter_nanosleep";
+	const char *file = "get_cgroup_id_kern.o";
+	int err, bytes, efd, prog_fd, pmu_fd;
+	struct perf_event_attr attr = {};
+	int cgroup_fd, cgidmap_fd;
+	struct bpf_object *obj;
+	__u64 kcgid = 0, ucgid;
+	int exit_code = 1;
+	char buf[256];
+	__u32 key = 0;
+
+	err = setup_cgroup_environment();
+	if (CHECK(err, "setup_cgroup_environment", "err %d errno %d\n", err,
+		  errno))
+		return 1;
+
+	cgroup_fd = create_and_get_cgroup(TEST_CGROUP);
+	if (CHECK(cgroup_fd < 0, "create_and_get_cgroup", "err %d errno %d\n",
+		  cgroup_fd, errno))
+		goto cleanup_cgroup_env;
+
+	err = join_cgroup(TEST_CGROUP);
+	if (CHECK(err, "join_cgroup", "err %d errno %d\n", err, errno))
+		goto cleanup_cgroup_env;
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
+	if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno))
+		goto cleanup_cgroup_env;
+
+	cgidmap_fd = bpf_find_map(__func__, obj, "cg_ids");
+	if (CHECK(cgidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
+		  cgidmap_fd, errno))
+		goto close_prog;
+
+	snprintf(buf, sizeof(buf),
+		 "/sys/kernel/debug/tracing/events/%s/id", probe_name);
+	efd = open(buf, O_RDONLY, 0);
+	if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno))
+		goto close_prog;
+	bytes = read(efd, buf, sizeof(buf));
+	close(efd);
+	if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read",
+		  "bytes %d errno %d\n", bytes, errno))
+		goto close_prog;
+
+	attr.config = strtol(buf, NULL, 0);
+	attr.type = PERF_TYPE_TRACEPOINT;
+	attr.sample_type = PERF_SAMPLE_RAW;
+	attr.sample_period = 1;
+	attr.wakeup_events = 1;
+
+	/* attach to this pid so that all bpf invocations will be in the
+	 * cgroup associated with this pid.
+	 */
+	pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0);
+	if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd,
+		  errno))
+		goto close_prog;
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
+	if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err,
+		  errno))
+		goto close_pmu;
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
+	if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err,
+		  errno))
+		goto close_pmu;
+
+	/* trigger some syscalls */
+	sleep(1);
+
+	err = bpf_map_lookup_elem(cgidmap_fd, &key, &kcgid);
+	if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno))
+		goto close_pmu;
+
+	ucgid = get_cgroup_id(TEST_CGROUP);
+	if (CHECK(kcgid != ucgid, "compare_cgroup_id",
+		  "kern cgid %llx user cgid %llx\n", kcgid, ucgid))
+		goto close_pmu;
+
+	exit_code = 0;
+	printf("%s:PASS\n", argv[0]);
+
+close_pmu:
+	close(pmu_fd);
+close_prog:
+	bpf_object__close(obj);
+cleanup_cgroup_env:
+	cleanup_cgroup_environment();
+	return exit_code;
+}
-- 
2.9.5

^ permalink raw reply related

* Re: [PATCH net-next 0/3] bpf: implement bpf_get_current_cgroup_id() helper
From: Yonghong Song @ 2018-06-03 23:03 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180603195959.jvo54so66mhkpvww@ast-mbp>



On 6/3/18 1:00 PM, Alexei Starovoitov wrote:
> On Sun, Jun 03, 2018 at 12:36:51AM -0700, Yonghong Song wrote:
>> bpf has been used extensively for tracing. For example, bcc
>> contains an almost full set of bpf-based tools to trace kernel
>> and user functions/events. Most tracing tools are currently
>> either filtered based on pid or system-wide.
>>
>> Containers have been used quite extensively in industry and
>> cgroup is often used together to provide resource isolation
>> and protection. Several processes may run inside the same
>> container. It is often desirable to get container-level tracing
>> results as well, e.g. syscall count, function count, I/O
>> activity, etc.
>>
>> This patch implements a new helper, bpf_get_current_cgroup_id(),
>> which will return cgroup id based on the cgroup within which
>> the current task is running.
>>
>> Patch #1 implements the new helper in the kernel.
>> Patch #2 syncs the uapi bpf.h header and helper between tools
>> and kernel.
>> Patch #3 shows how to get the same cgroup id in user space,
>> so a filter or policy could be configured in the bpf program
>> based on current task cgroup.
> 
> for all patches:
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> 
> please rebase, so it can be applied and s/net-next/bpf-next/ in subj.

Sorry, I forgot to change the subject line from "net-next" to "bpf-next".
Do you want me to submit another revision?

> Thanks!
> 

^ permalink raw reply

* KASAN: slab-out-of-bounds Read in bpf_csum_update
From: syzbot @ 2018-06-03 23:36 UTC (permalink / raw)
  To: ast, daniel, davem, linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    0512e0134582 Merge tag 'xfs-4.17-fixes-3' of git://git.ker..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=17eb2d7b800000
kernel config:  https://syzkaller.appspot.com/x/.config?x=968b0b23c7854c0b
dashboard link: https://syzkaller.appspot.com/bug?extid=efae31b384d5badbd620
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=162c6def800000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=14fe3db7800000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+efae31b384d5badbd620@syzkaller.appspotmail.com

random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
==================================================================
BUG: KASAN: slab-out-of-bounds in ____bpf_csum_update net/core/filter.c:1679 [inline]
BUG: KASAN: slab-out-of-bounds in bpf_csum_update+0xb4/0xc0 net/core/filter.c:1673
Read of size 1 at addr ffff8801d9235b50 by task syz-executor507/4513

CPU: 0 PID: 4513 Comm: syz-executor507 Not tainted 4.17.0-rc7+ #78
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
  __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
  ____bpf_csum_update net/core/filter.c:1679 [inline]
  bpf_csum_update+0xb4/0xc0 net/core/filter.c:1673

Allocated by task 0:
(stack is not available)

Freed by task 0:
(stack is not available)

The buggy address belongs to the object at ffff8801d9235a40
  which belongs to the cache skbuff_head_cache of size 232
The buggy address is located 40 bytes to the right of
  232-byte region [ffff8801d9235a40, ffff8801d9235b28)
The buggy address belongs to the page:
page:ffffea0007648d40 count:1 mapcount:0 mapping:ffff8801d9235040 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffff8801d9235040 0000000000000000 000000010000000c
raw: ffffea00074360a0 ffff8801d944d848 ffff8801d9bdd6c0 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801d9235a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff8801d9235a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ffff8801d9235b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                  ^
  ffff8801d9235b80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff8801d9235c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================
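For readers less used to KASAN output, the "40 bytes to the right" figure in the report above can be re-derived from the quoted addresses. The following is only a user-space sketch of that arithmetic; the helper name bytes_past_object() is made up for illustration and is not part of KASAN.

```c
#include <stdint.h>

/* Hypothetical helper (not a KASAN API): how far an access lies past
 * the exclusive end of an object that starts at obj_start and spans
 * obj_size bytes. Assumes access_addr is at or beyond that end.
 */
static uint64_t bytes_past_object(uint64_t obj_start, uint64_t obj_size,
				  uint64_t access_addr)
{
	uint64_t obj_end = obj_start + obj_size;	/* exclusive end */

	return access_addr - obj_end;
}
```

With the values from the report, 0xffff8801d9235a40 + 232 gives the region end 0xffff8801d9235b28, and the read at 0xffff8801d9235b50 is 0x28 = 40 bytes beyond it.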


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* KASAN: use-after-free Read in skb_ensure_writable
From: syzbot @ 2018-06-03 23:36 UTC (permalink / raw)
  To: ast, daniel, davem, linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    bcece5dc40b9 bpf: Change bpf_fib_lookup to return -EAFNOSU..
git tree:       bpf-next
console output: https://syzkaller.appspot.com/x/log.txt?x=10ee76b7800000
kernel config:  https://syzkaller.appspot.com/x/.config?x=e4078980b886800c
dashboard link: https://syzkaller.appspot.com/bug?extid=c8504affd4fdd0c1b626
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=10d926df800000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1778c26f800000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+c8504affd4fdd0c1b626@syzkaller.appspotmail.com

random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
==================================================================
BUG: KASAN: use-after-free in pskb_may_pull include/linux/skbuff.h:2108 [inline]
BUG: KASAN: use-after-free in skb_ensure_writable+0x554/0x620 net/core/skbuff.c:5118
Read of size 4 at addr ffff8801b0b40fc0 by task syz-executor258/4479

CPU: 0 PID: 4479 Comm: syz-executor258 Not tainted 4.17.0-rc6+ #29
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
  __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
  pskb_may_pull include/linux/skbuff.h:2108 [inline]
  skb_ensure_writable+0x554/0x620 net/core/skbuff.c:5118
  __bpf_try_make_writable net/core/filter.c:1606 [inline]
  bpf_try_make_writable net/core/filter.c:1612 [inline]
  ____bpf_l3_csum_replace net/core/filter.c:1774 [inline]
  bpf_l3_csum_replace+0x8c/0x4d0 net/core/filter.c:1765

The buggy address belongs to the page:
page:ffffea0006c2d000 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x2fffc0000000000()
raw: 02fffc0000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: ffffea00075ea760 ffffea0006c39660 ffff8801b5848738 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801b0b40e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  ffff8801b0b40f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ffff8801b0b40f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                            ^
  ffff8801b0b41000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff8801b0b41080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* KASAN: slab-out-of-bounds Read in skb_ensure_writable
From: syzbot @ 2018-06-03 23:36 UTC (permalink / raw)
  To: ast, daniel, davem, linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    0512e0134582 Merge tag 'xfs-4.17-fixes-3' of git://git.ker..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=14956af7800000
kernel config:  https://syzkaller.appspot.com/x/.config?x=968b0b23c7854c0b
dashboard link: https://syzkaller.appspot.com/bug?extid=e5190cb881d8660fb1a3
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=123d9d7b800000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=100329d7800000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+e5190cb881d8660fb1a3@syzkaller.appspotmail.com

random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
==================================================================
BUG: KASAN: slab-out-of-bounds in pskb_may_pull include/linux/skbuff.h:2104 [inline]
BUG: KASAN: slab-out-of-bounds in skb_ensure_writable+0x554/0x620 net/core/skbuff.c:5101
Read of size 4 at addr ffff8801aefc1780 by task syz-executor159/4509

CPU: 0 PID: 4509 Comm: syz-executor159 Not tainted 4.17.0-rc7+ #78
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
  __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432
  pskb_may_pull include/linux/skbuff.h:2104 [inline]
  skb_ensure_writable+0x554/0x620 net/core/skbuff.c:5101
  __bpf_try_make_writable net/core/filter.c:1419 [inline]
  bpf_try_make_writable net/core/filter.c:1425 [inline]
  ____bpf_l3_csum_replace net/core/filter.c:1546 [inline]
  bpf_l3_csum_replace+0x8c/0x4d0 net/core/filter.c:1537

Allocated by task 0:
(stack is not available)

Freed by task 0:
(stack is not available)

The buggy address belongs to the object at ffff8801aefc1680
  which belongs to the cache skbuff_head_cache of size 232
The buggy address is located 24 bytes to the right of
  232-byte region [ffff8801aefc1680, ffff8801aefc1768)
The buggy address belongs to the page:
page:ffffea0006bbf040 count:1 mapcount:0 mapping:ffff8801aefc1040 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffff8801aefc1040 0000000000000000 000000010000000c
raw: ffffea0006b61060 ffffea0006bd69e0 ffff8801d9bdd6c0 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801aefc1680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff8801aefc1700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ffff8801aefc1780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                    ^
  ffff8801aefc1800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff8801aefc1880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* KASAN: use-after-free Read in bpf_csum_update
From: syzbot @ 2018-06-03 23:36 UTC (permalink / raw)
  To: ast, daniel, davem, linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    bcece5dc40b9 bpf: Change bpf_fib_lookup to return -EAFNOSU..
git tree:       bpf-next
console output: https://syzkaller.appspot.com/x/log.txt?x=161e2c6f800000
kernel config:  https://syzkaller.appspot.com/x/.config?x=e4078980b886800c
dashboard link: https://syzkaller.appspot.com/bug?extid=3d0b2441dbb71751615e
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=17cb5adf800000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=17ebf19f800000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+3d0b2441dbb71751615e@syzkaller.appspotmail.com

random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
random: sshd: uninitialized urandom read (32 bytes read)
==================================================================
BUG: KASAN: use-after-free in ____bpf_csum_update net/core/filter.c:1907 [inline]
BUG: KASAN: use-after-free in bpf_csum_update+0xb4/0xc0 net/core/filter.c:1901
Read of size 1 at addr ffff8801ad062f10 by task syz-executor354/4488

CPU: 1 PID: 4488 Comm: syz-executor354 Not tainted 4.17.0-rc6+ #29
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
  __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
  ____bpf_csum_update net/core/filter.c:1907 [inline]
  bpf_csum_update+0xb4/0xc0 net/core/filter.c:1901

The buggy address belongs to the page:
page:ffffea0006b41880 count:0 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x2fffc0000000000()
raw: 02fffc0000000000 0000000000000000 0000000000000000 00000000ffffffff
raw: ffffea0006b41820 ffffea0006b671a0 0000000000000000 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801ad062e00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  ffff8801ad062e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ffff8801ad062f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                          ^
  ffff8801ad062f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
  ffff8801ad063000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH net-next] net: ipv6: Generate random IID for addresses on RAWIP devices
From: 吉藤英明 @ 2018-06-03 23:50 UTC (permalink / raw)
  To: Subash Abhinov Kasiviswanathan; +Cc: davem, netdev, yoshfuji
In-Reply-To: <1528062874-19250-1-git-send-email-subashab@codeaurora.org>

Hello,

2018-06-04 6:54 GMT+09:00 Subash Abhinov Kasiviswanathan
<subashab@codeaurora.org>:
> RAWIP devices such as rmnet do not have a hardware address and
> instead require the kernel to generate a random IID for the
> temporary addresses. For permanent addresses, the device IID is
> used along with prefix received.
>
> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
> ---
>  net/ipv6/addrconf.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index f09afc2..e4c4540 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -2230,6 +2230,18 @@ static int addrconf_ifid_ip6tnl(u8 *eui, struct net_device *dev)
>         return 0;
>  }
>
> +static int addrconf_ifid_rawip(u8 *eui, struct net_device *dev)
> +{
> +       struct in6_addr lladdr;
> +
> +       if (ipv6_get_lladdr(dev, &lladdr, IFA_F_TENTATIVE))
> +               get_random_bytes(eui, 8);

Please be aware of I/G bit and G/L bit.
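
As a rough illustration of this point (an assumption about what is being asked for, not code from the patch): in a modified EUI-64 interface identifier, the 0x02 bit of the first byte is the universal/local (global/local) bit and the 0x01 bit is the individual/group bit, so a randomly generated IID would normally have both cleared. The helper name iid_clear_ug_bits() is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper, not from the patch: mark an 8-byte interface
 * identifier as locally assigned and individual by clearing the
 * universal/local bit (0x02) and the individual/group bit (0x01) of
 * the first byte. The other seven bytes are left untouched.
 */
static void iid_clear_ug_bits(uint8_t eui[8])
{
	eui[0] &= (uint8_t)~0x03;
}
```

In the patch this would presumably be applied right after the random bytes are generated, but where exactly it belongs is for the author to decide.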

--yoshfuji

^ permalink raw reply

* Re: [PATCH 15/18] rhashtable: use bit_spin_locks to protect hash bucket.
From: NeilBrown @ 2018-06-04  0:25 UTC (permalink / raw)
  To: Eric Dumazet, Herbert Xu
  Cc: Thomas Graf, netdev, linux-kernel, Eric Dumazet, David S. Miller
In-Reply-To: <9bea77df-e7db-677a-31b2-710dc6d956ee@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 677 bytes --]

On Sat, Jun 02 2018, Eric Dumazet wrote:

> On 06/02/2018 01:03 AM, Herbert Xu wrote:
>  
>> Yes the concept looks good to me.  But I would like to hear from
>> Eric/Dave as to whether this would be acceptable for existing
>> network hash tables such as the ones in inet.
>
>
> What about lockdep support ?

bitlocks don't have native lockdep support.
It would be fairly easy to add lockdep support to
rht_{lock,unlock,unlocked} if you think it is worthwhile.
It could only really help if a hash-function or cmp-function took a
lock, but it is not a great cost so we may as well just do it.
I'll try to have a patch in the next day or so.
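
For context, a bit lock keeps the lock in one spare bit of an existing word (here, bit 0 of a bucket pointer), so there is no dedicated lock structure that a lockdep map could be attached to. The following user-space sketch with C11 atomics only illustrates that idea; the names are made up, and it is neither the kernel's bit_spin_lock() nor the rht_lock() from this series.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumes stored pointers are at least 2-byte aligned, so bit 0 is free. */
#define BUCKET_LOCK_BIT ((uintptr_t)1)

/* Spin until this caller is the one that atomically set bit 0. */
static void bucket_lock(_Atomic uintptr_t *bucket)
{
	while (atomic_fetch_or_explicit(bucket, BUCKET_LOCK_BIT,
					memory_order_acquire) & BUCKET_LOCK_BIT)
		;	/* busy-wait: someone else holds the bit */
}

/* Clear the lock bit, leaving the pointer bits untouched. */
static void bucket_unlock(_Atomic uintptr_t *bucket)
{
	atomic_fetch_and_explicit(bucket, ~BUCKET_LOCK_BIT,
				  memory_order_release);
}

/* Read the stored pointer with the lock bit masked off. */
static void *bucket_ptr(_Atomic uintptr_t *bucket)
{
	return (void *)(atomic_load_explicit(bucket, memory_order_relaxed) &
			~BUCKET_LOCK_BIT);
}
```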

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* Re: [PATCH 10/18] rhashtable: remove rhashtable_walk_peek()
From: NeilBrown @ 2018-06-04  0:30 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Thomas Graf, netdev, linux-kernel, Tom Herbert
In-Reply-To: <20180602154851.pfy4wryezuhxp76v@gondor.apana.org.au>

[-- Attachment #1: Type: text/plain, Size: 1530 bytes --]

On Sat, Jun 02 2018, Herbert Xu wrote:

> On Fri, Jun 01, 2018 at 02:44:09PM +1000, NeilBrown wrote:
>> This function has a somewhat confused behavior that is not properly
>> described by the documentation.
>> Sometimes it returns the previous object, sometimes it returns the
>> next one.
>> Sometimes it changes the iterator, sometimes it doesn't.
>> 
>> This function is not currently used and is not worth keeping, so
>> remove it.
>> 
>> A future patch will introduce a new function with a
>> simpler interface which can meet the same need that
>> this was added for.
>> 
>> Signed-off-by: NeilBrown <neilb@suse.com>
>
> Please keep Tom Herbert in the loop.  IIRC he had an issue with
> this patch.

Yes you are right - sorry for forgetting to add Tom.

My understanding of where this issue stands is that Tom raised an issue and
asked for clarification, I replied, and nothing further happened.

In summary, my position is that:
- most users of my new rhashtable_walk_prev() will use it like
   rhashtable_walk_prev() ?: rhashtable_walk_next()
  which is close to what rhashtable_walk_peek() does
- I know of no use-case that could not be solved if we only had
  the combined operation
- BUT it is hard to document the combined operation, as it really
  does two things.  If it is hard to document, then it might be
  hard to understand.

So to provide the most understandable/maintainable solution, I think
we should provide rhashtable_walk_prev() as a separate interface.
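
To make the "prev ?: next" pattern concrete, here is a toy model over a plain array. It is only a sketch under simplifying assumptions (no concurrent removal, no resize); the toy_* names are invented and this is not the rhashtable API or the proposed implementation.

```c
#include <stddef.h>

/* Toy walker over a fixed array; NOT the rhashtable API. */
struct toy_walker {
	const int *items;
	size_t count;
	size_t pos;	/* index of the next item to hand out */
};

/* Return the next item and advance, or NULL at the end. */
static const int *toy_walk_next(struct toy_walker *w)
{
	if (w->pos >= w->count)
		return NULL;
	return &w->items[w->pos++];
}

/* Return the item most recently handed out without moving the walker;
 * NULL if nothing has been handed out yet.
 */
static const int *toy_walk_prev(struct toy_walker *w)
{
	if (w->pos == 0)
		return NULL;
	return &w->items[w->pos - 1];
}

/* The combined operation: try "prev" first, fall back to "next". */
static const int *toy_walk_peek(struct toy_walker *w)
{
	const int *obj = toy_walk_prev(w);

	return obj ? obj : toy_walk_next(w);
}
```

In this model the combined call advances the walker only when nothing has been handed out yet, which is the "sometimes it changes the iterator" behaviour that makes the combined operation awkward to document.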

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply

* Re: [PATCH v5 net] stmmac: 802.1ad tag stripping fix
From: Toshiaki Makita @ 2018-06-04  0:49 UTC (permalink / raw)
  To: David Miller, eladv6
  Cc: Jose.Abreu, f.fainelli, netdev, peppe.cavallaro, alexandre.torgue
In-Reply-To: <20180603.103313.1827081859745223157.davem@davemloft.net>

On 2018/06/03 23:33, David Miller wrote:
> From: Elad Nachman <eladv6@gmail.com>
> Date: Wed, 30 May 2018 08:48:25 +0300
> 
>>  static void stmmac_rx_vlan(struct net_device *dev, struct sk_buff *skb)
>>  {
>> -	struct ethhdr *ehdr;
>> +	struct vlan_ethhdr *veth;
>>  	u16 vlanid;
>> +	__be16 vlan_proto;
> 
> Please order local variables from longest to shortest line.
> 
>>  
>> -	if ((dev->features & NETIF_F_HW_VLAN_CTAG_RX) ==
>> -	    NETIF_F_HW_VLAN_CTAG_RX &&
>> -	    !__vlan_get_tag(skb, &vlanid)) {
>> +	if (!__vlan_get_tag(skb, &vlanid)) {
>>  		/* pop the vlan tag */
>> -		ehdr = (struct ethhdr *)skb->data;
>> -		memmove(skb->data + VLAN_HLEN, ehdr, ETH_ALEN * 2);
>> +		veth = (struct vlan_ethhdr *)skb->data;
>> +		vlan_proto = veth->h_vlan_proto;
>> +		memmove(skb->data + VLAN_HLEN, veth, ETH_ALEN * 2);
>>  		skb_pull(skb, VLAN_HLEN);
>> -		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlanid);
>> +		__vlan_hwaccel_put_tag(skb, vlan_proto, vlanid);
>>  	}
>>  }
> 
> I can't see how it is valid to do an unconditional software VLAN
> untagging even when VLAN is disabled in the kernel config or the
> NETIF_F_* feature bits are not set.

Right. It is not valid.

> 
> At a minimum that feature test has to stay there, and when it's clear
> we let the generic VLAN code untag the packet.

Since the NETIF_F_HW_VLAN_*_RX flags are not protocol agnostic, we need two
similar checks here, one per protocol.

veth = (struct vlan_ethhdr *)skb->data;
vlan_proto = veth->h_vlan_proto;
if ((vlan_proto == htons(ETH_P_8021Q) &&
     dev->features & NETIF_F_HW_VLAN_CTAG_RX) ||
    (vlan_proto == htons(ETH_P_8021AD) &&
     dev->features & NETIF_F_HW_VLAN_STAG_RX)) {
	vlanid = ntohs(veth->h_vlan_TCI);
	memmove(...);
	skb_pull(...);
	__vlan_hwaccel_put_tag(skb, vlan_proto, vlanid);
}
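
The per-protocol check above can also be expressed as a small predicate, which is easy to exercise outside the kernel. This is only a sketch: the SKETCH_* constants are stand-ins defined for this example (the ethertype values 0x8100 and 0x88A8 are the standard ones, but the feature bits here are not the kernel's NETIF_F_* values), and should_strip_vlan() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ETH_P_8021Q	0x8100	/* 802.1Q C-tag ethertype */
#define SKETCH_ETH_P_8021AD	0x88A8	/* 802.1ad S-tag ethertype */

/* Stand-in feature bits for this sketch only; the kernel's
 * NETIF_F_HW_VLAN_*_RX values are defined elsewhere and differ.
 */
#define SKETCH_F_HW_VLAN_CTAG_RX	(1u << 0)
#define SKETCH_F_HW_VLAN_STAG_RX	(1u << 1)

/* Decide whether the driver should strip the tag itself. vlan_proto is
 * taken in host byte order here to keep the example free of htons().
 */
static bool should_strip_vlan(uint16_t vlan_proto, uint32_t features)
{
	if (vlan_proto == SKETCH_ETH_P_8021Q)
		return (features & SKETCH_F_HW_VLAN_CTAG_RX) != 0;
	if (vlan_proto == SKETCH_ETH_P_8021AD)
		return (features & SKETCH_F_HW_VLAN_STAG_RX) != 0;
	return false;	/* not a VLAN ethertype handled here */
}
```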

An alternative is to skip the vlan_proto and features checks here and
instead compile this code only when VLAN is enabled in the kernel config.
This is valid only because this driver does not have NETIF_F_HW_VLAN_*_RX
in hw_features, so they cannot be toggled for now.

static void stmmac_rx_vlan(struct net_device *dev, struct sk_buff *skb)
{
#ifdef STMMAC_VLAN_TAG_USED
	...
	if (!__vlan_get_tag(skb, &vlanid)) {
		...
		__vlan_hwaccel_put_tag(skb, vlan_proto, vlanid);
	}
#endif
}

-- 
Toshiaki Makita

^ permalink raw reply

* Re: [PATCH net-next] netfilter: fix null-ptr-deref in nf_nat_decode_session
From: Prashant Bhole @ 2018-06-04  1:10 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Florian Westphal, David S . Miller, Jozsef Kadlecsik, netdev,
	netfilter-devel
In-Reply-To: <20180528105249.36pvoyo5loqjjmsa@breakpoint.cc>

CC netfilter-devel

On 5/28/2018 7:52 PM, Florian Westphal wrote:
> Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> wrote:
>> Add null check for nat_hook in nf_nat_decode_session()
> 
> Acked-by: Florian Westphal <fw@strlen.de>

Hi Pablo,
Just pinging in case this patch was missed.

-Prashant

^ permalink raw reply

* Re: [PATCH iproute2] iplink_vrf: Save device index from response for return code
From: Hangbin Liu @ 2018-06-04  1:11 UTC (permalink / raw)
  To: dsahern; +Cc: stephen, netdev, David Ahern, Phil Sutter
In-Reply-To: <20180601155016.3524-1-dsahern@kernel.org>

On Fri, Jun 01, 2018 at 08:50:16AM -0700, dsahern@kernel.org wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> A recent commit changed rtnl_talk_* to return the response message in
> allocated memory so callers need to free it. The change to name_is_vrf
> did not save the device index which is pointing to a struct inside the
> now allocated and freed memory resulting in garbage getting returned
> in some cases.
> 
> Fix by using a stack variable to save the return value and only set
> it to ifi->ifi_index after all checks are done and before the answer
> buffer is freed.
> 
> Fixes: 86bf43c7c2fdc ("lib/libnetlink: update rtnl_talk to support malloc buff at run time")
> Cc: Hangbin Liu <liuhangbin@gmail.com>
> Cc: Phil Sutter <phil@nwl.cc>
> Signed-off-by: David Ahern <dsahern@gmail.com>
> ---
>  ip/iplink_vrf.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/ip/iplink_vrf.c b/ip/iplink_vrf.c
> index e9dd0df98412..6004bb4f305e 100644
> --- a/ip/iplink_vrf.c
> +++ b/ip/iplink_vrf.c
> @@ -191,6 +191,7 @@ int name_is_vrf(const char *name)
>  	struct rtattr *tb[IFLA_MAX+1];
>  	struct rtattr *li[IFLA_INFO_MAX+1];
>  	struct ifinfomsg *ifi;
> +	int ifindex = 0;
>  	int len;
>  
>  	addattr_l(&req.n, sizeof(req), IFLA_IFNAME, name, strlen(name) + 1);
> @@ -218,7 +219,8 @@ int name_is_vrf(const char *name)
>  	if (strcmp(RTA_DATA(li[IFLA_INFO_KIND]), "vrf"))
>  		goto out;
>  
> +	ifindex = ifi->ifi_index;
>  out:
>  	free(answer);
> -	return ifi->ifi_index;
> +	return ifindex;
>  }
> -- 
> 2.11.0
> 

Thanks for the fix.

Acked-by: Hangbin Liu <liuhangbin@gmail.com>
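
For anyone skimming the thread, the bug pattern being fixed is reading a field through a pointer into a buffer that has already been freed. A minimal stand-alone sketch of the corrected pattern follows; the sketch_* names are invented for illustration and are not iproute2 functions.

```c
#include <stdlib.h>

struct sketch_reply {
	int index;
};

/* Hypothetical stand-in for a call that returns a malloc'ed answer. */
static struct sketch_reply *sketch_query(int index)
{
	struct sketch_reply *r = malloc(sizeof(*r));

	if (r)
		r->index = index;
	return r;
}

/* Copy the field into a local before free(); reading r->index after
 * the free would be a use-after-free. Returns 0 if no answer was
 * obtained.
 */
static int sketch_lookup_index(int index)
{
	struct sketch_reply *r = sketch_query(index);
	int ret = 0;

	if (!r)
		return 0;
	ret = r->index;
	free(r);
	return ret;
}
```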

^ permalink raw reply

* Re: [PATCH 10/18] rhashtable: remove rhashtable_walk_peek()
From: Tom Herbert @ 2018-06-04  1:18 UTC (permalink / raw)
  To: NeilBrown
  Cc: Herbert Xu, Thomas Graf, Linux Kernel Network Developers, LKML,
	Tom Herbert
In-Reply-To: <87y3fvpf40.fsf@notabene.neil.brown.name>

On Sun, Jun 3, 2018 at 5:30 PM, NeilBrown <neilb@suse.com> wrote:
> On Sat, Jun 02 2018, Herbert Xu wrote:
>
>> On Fri, Jun 01, 2018 at 02:44:09PM +1000, NeilBrown wrote:
>>> This function has a somewhat confused behavior that is not properly
>>> described by the documentation.
>>> Sometimes it returns the previous object, sometimes it returns the
>>> next one.
>>> Sometimes it changes the iterator, sometimes it doesn't.
>>>
>>> This function is not currently used and is not worth keeping, so
>>> remove it.
>>>
>>> A future patch will introduce a new function with a
>>> simpler interface which can meet the same need that
>>> this was added for.
>>>
>>> Signed-off-by: NeilBrown <neilb@suse.com>
>>
>> Please keep Tom Herbert in the loop.  IIRC he had an issue with
>> this patch.
>
> Yes you are right - sorry for forgetting to add Tom.
>
> My understanding of where this issue stands is that Tom raised an issue and
> asked for clarification, I replied, and nothing further happened.
>
> In summary, my position is that:
> - most users of my new rhashtable_walk_prev() will use it like
>    rhashtable_walk_prev() ?: rhashtable_walk_next()
>   which is close to what rhashtable_walk_peek() does
> - I know of no use-case that could not be solved if we only had
>   the combined operation
> - BUT it is hard to document the combined operation, as it really
>   does two things.  If it is hard to document, then it might be
>   hard to understand.
>
> So to provide the most understandable/maintainable solution, I think
> we should provide rhashtable_walk_prev() as a separate interface.
>
I'm still missing why requiring two API operations instead of one is
simpler or easier to document. Also, I disagree that
rhashtable_walk_peek does two things-- it just does one which is to
return the current element in the walk without advancing to the next
one. The fact that the iterator may or may not move is immaterial in
the API, that is an implementation detail. In fact, it's conceivable
that we might completely reimplement this someday such that the
iterator works with completely different implementation semantics but the
API doesn't change. Also the naming in your proposal is confusing,
we'd have operations to get the previous and the next object--
so the user may ask where's the API to get the current object in the
walk? The idea that we get it by first trying to get the previous
object, and then if that fails getting the next object seems
counterintuitive.
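
To make the two readings concrete, here is a minimal sketch over a toy
array walk. It is not the rhashtable code: struct toy_walk and every
toy_* function are hypothetical names invented for illustration, and the
cursor model (an index of the next element to hand out) is an
assumption. It only shows how a "prev, else next" combination behaves
like "return the current element", moving the cursor only when nothing
has been returned yet.

```c
#include <assert.h>
#include <stddef.h>

/* Toy walk over a fixed array; NOT the kernel's struct rhashtable_iter. */
struct toy_walk {
	const int *items;
	size_t len;
	size_t next;	/* index of the element toy_walk_next() hands out next */
};

/* Return the next element and advance the cursor, or NULL at the end. */
static const int *toy_walk_next(struct toy_walk *w)
{
	assert(w != NULL);
	if (w->next >= w->len)
		return NULL;
	return &w->items[w->next++];
}

/* Return the most recently handed-out element, or NULL if there is none. */
static const int *toy_walk_prev(const struct toy_walk *w)
{
	assert(w != NULL);
	if (w->next == 0)
		return NULL;
	return &w->items[w->next - 1];
}

/*
 * The combined form discussed above, written as a standard ternary
 * instead of the GNU "?:" shorthand. The cursor moves only when no
 * element has been handed out yet.
 */
static const int *toy_walk_prev_or_next(struct toy_walk *w)
{
	const int *obj = toy_walk_prev(w);

	return obj ? obj : toy_walk_next(w);
}
```

On a fresh walk over {10, 20, 30}, toy_walk_prev_or_next() returns 10
and advances the cursor; calling it again returns 10 without moving
anything. Whether that counts as one operation or two is exactly the
disagreement in this thread.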

Tom

> Thanks,
> NeilBrown


* Re: [PATCH net-next v2 0/3] bpf: implement bpf_get_current_cgroup_id() helper
From: Alexei Starovoitov @ 2018-06-04  1:27 UTC (permalink / raw)
  To: Yonghong Song; +Cc: ast, daniel, netdev, kernel-team
In-Reply-To: <20180603225943.2370719-1-yhs@fb.com>

On Sun, Jun 03, 2018 at 03:59:40PM -0700, Yonghong Song wrote:
> bpf has been used extensively for tracing. For example, bcc
> contains an almost full set of bpf-based tools to trace kernel
> and user functions/events. Most tracing tools are currently
> either filtered based on pid or system-wide.
> 
> Containers have been used quite extensively in industry and
> cgroup is often used together to provide resource isolation
> and protection. Several processes may run inside the same
> container. It is often desirable to get container-level tracing
> results as well, e.g. syscall count, function count, I/O
> activity, etc.
> 
> This patch implements a new helper, bpf_get_current_cgroup_id(),
> which will return cgroup id based on the cgroup within which
> the current task is running.
> 
> Patch #1 implements the new helper in the kernel.
> Patch #2 syncs the uapi bpf.h header and helper between tools
> and kernel.
> Patch #3 shows how to get the same cgroup id in user space,
> so a filter or policy could be configured in the bpf program
> based on current task cgroup.
> 
> Changelog:
>   v1 -> v2:
>      . rebase to resolve merge conflict with latest bpf-next.

Applied, Thanks.


* Re: [PATCH bpf-next] bpf: flowlabel in bpf_fib_lookup should be flowinfo
From: Alexei Starovoitov @ 2018-06-04  1:41 UTC (permalink / raw)
  To: dsahern; +Cc: netdev, borkmann, ast, David Ahern, Michal Kubecek
In-Reply-To: <20180603151519.10205-1-dsahern@kernel.org>

On Sun, Jun 03, 2018 at 08:15:19AM -0700, dsahern@kernel.org wrote:
> From: David Ahern <dsahern@gmail.com>
> 
> As Michal noted the flow struct takes both the flow label and priority.
> Update the bpf_fib_lookup API to note that it is flowinfo and not just
> the flow label.
> 
> Cc: Michal Kubecek <mkubecek@suse.cz>
> Signed-off-by: David Ahern <dsahern@gmail.com>

Applied, Thanks
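
For readers following the flowinfo vs. flow label distinction in the
quoted commit message, a small sketch of the bit layout may help. It
assumes the standard IPv6 header layout (4-bit version, 8-bit traffic
class, 20-bit flow label) and works on host-order values for
readability, whereas the kernel keeps these fields in network byte
order. The TOY_* macros and toy_* functions are hypothetical names and
are not part of the bpf_fib_lookup API.

```c
#include <assert.h>
#include <stdint.h>

/* Host-order masks over the first 32-bit word of the IPv6 header. */
#define TOY_FLOWINFO_MASK	0x0FFFFFFFu	/* traffic class + flow label */
#define TOY_FLOWLABEL_MASK	0x000FFFFFu	/* flow label only */

/* Combine the priority (traffic class) and flow label into flowinfo. */
static uint32_t toy_flowinfo(uint8_t tclass, uint32_t label)
{
	return ((uint32_t)tclass << 20) | (label & TOY_FLOWLABEL_MASK);
}

/* Extract the 8-bit traffic class from a flowinfo value. */
static uint8_t toy_tclass(uint32_t flowinfo)
{
	return (uint8_t)((flowinfo & TOY_FLOWINFO_MASK) >> 20);
}

/* Extract the 20-bit flow label from a flowinfo value. */
static uint32_t toy_label(uint32_t flowinfo)
{
	return flowinfo & TOY_FLOWLABEL_MASK;
}

/* Usage example: round-trip one value through the helpers. */
static void toy_flowinfo_selfcheck(void)
{
	uint32_t fi = toy_flowinfo(0xB8, 0x12345);

	assert(toy_tclass(fi) == 0xB8);
	assert(toy_label(fi) == 0x12345);
}
```

A field that carried only the flow label would lose the upper eight
bits here, which is why the name flowinfo describes the field more
accurately.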


* Re: [PATCH bpf-next] bpf: flowlabel in bpf_fib_lookup should be flowinfo
From: David Ahern @ 2018-06-04  1:47 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: netdev, borkmann, ast, Michal Kubecek
In-Reply-To: <20180604014118.5nazn4s43443esio@ast-mbp>

On 6/3/18 7:41 PM, Alexei Starovoitov wrote:
> On Sun, Jun 03, 2018 at 08:15:19AM -0700, dsahern@kernel.org wrote:
>> From: David Ahern <dsahern@gmail.com>
>>
>> As Michal noted the flow struct takes both the flow label and priority.
>> Update the bpf_fib_lookup API to note that it is flowinfo and not just
>> the flow label.
>>
>> Cc: Michal Kubecek <mkubecek@suse.cz>
>> Signed-off-by: David Ahern <dsahern@gmail.com>
> 
> Applied, Thanks
> 

I noticed 4.17 was released. Just to make sure we are on the same page,
this patch needs to go into 4.18.


