Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH bpf-next 3/3] bpf: Add mtu checking to FIB forwarding helper
From: Daniel Borkmann @ 2018-05-18 14:01 UTC (permalink / raw)
  To: David Ahern, netdev, borkmann, ast; +Cc: davem
In-Reply-To: <a61bc84d-a413-47fd-77c8-11f793e35b84@gmail.com>

On 05/18/2018 02:34 AM, David Ahern wrote:
> On 5/17/18 4:22 PM, Daniel Borkmann wrote:
>> On 05/17/2018 06:09 PM, David Ahern wrote:
>>> Add check that egress MTU can handle packet to be forwarded. If
>>> the MTU is less than the packet lenght, return 0 meaning the
>>> packet is expected to continue up the stack for help - eg.,
>>> fragmenting the packet or sending an ICMP.
>>>
>>> Signed-off-by: David Ahern <dsahern@gmail.com>
>>> ---
>>>  net/core/filter.c | 10 ++++++++++
>>>  1 file changed, 10 insertions(+)
>>>
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index 6d0d1560bd70..c47c47a75d4b 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -4098,6 +4098,7 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>>>  	struct fib_nh *nh;
>>>  	struct flowi4 fl4;
>>>  	int err;
>>> +	u32 mtu;
>>>  
>>>  	dev = dev_get_by_index_rcu(net, params->ifindex);
>>>  	if (unlikely(!dev))
>>> @@ -4149,6 +4150,10 @@ static int bpf_ipv4_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>>>  	if (res.fi->fib_nhs > 1)
>>>  		fib_select_path(net, &res, &fl4, NULL);
>>>  
>>> +	mtu = ip_mtu_from_fib_result(&res, params->ipv4_dst);
>>> +	if (params->tot_len > mtu)
>>> +		return 0;
>>> +
>>>  	nh = &res.fi->fib_nh[res.nh_sel];
>>>  
>>>  	/* do not handle lwt encaps right now */
>>> @@ -4188,6 +4193,7 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>>>  	struct flowi6 fl6;
>>>  	int strict = 0;
>>>  	int oif;
>>> +	u32 mtu;
>>>  
>>>  	/* link local addresses are never forwarded */
>>>  	if (rt6_need_strict(dst) || rt6_need_strict(src))
>>> @@ -4250,6 +4256,10 @@ static int bpf_ipv6_fib_lookup(struct net *net, struct bpf_fib_lookup *params,
>>>  						       fl6.flowi6_oif, NULL,
>>>  						       strict);
>>>  
>>> +	mtu = ip6_mtu_from_fib6(f6i, dst, src);
>>> +	if (params->tot_len > mtu)
>>> +		return 0;
>>> +
>>>  	if (f6i->fib6_nh.nh_lwtstate)
>>>  		return 0;
>>
>> Could you elaborate how this interacts in tc BPF use case where you have e.g.
>> GSO packets and tot_len from aggregated packets would definitely be larger
>> than MTU (e.g. see is_skb_forwardable() as one example on such checks)? Should
>> this be an opt-in via a new flag for the helper?
> 
> It should not be opt-in for XDP.

Yes, correct, for XDP it should not.

> I could add a flag to the internal call -- bpf_skb_fib_lookup sets the
> flag to skip the MTU check in bpf_ipv4_fib_lookup and bpf_ipv6_fib_lookup.
> 
> For the skb case do you want bpf_skb_fib_lookup call is_skb_forwardable
> or leave that to the BPF program?

I think it probably makes sense to add an internal (unexposed) flag or bool
where we propagate skb_is_gso(skb) from bpf_skb_fib_lookup() call-site and
have similar logic where we first check this bool and if false do the MTU
check (so it still can get enforced for control packets). Thus probably nothing
of that implementation detail needs to be exposed to the program author.

^ permalink raw reply

* Re: [PATCH net] tuntap: raise EPOLLOUT on device up
From: Michael S. Tsirkin @ 2018-05-18 14:06 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, Hannes Frederic Sowa, Eric Dumazet
In-Reply-To: <6159b23e-eeda-1d4b-6a30-63b1a30666eb@redhat.com>

On Fri, May 18, 2018 at 10:00:31PM +0800, Jason Wang wrote:
> 
> 
> On 2018年05月18日 21:26, Jason Wang wrote:
> > 
> > 
> > On 2018年05月18日 21:13, Michael S. Tsirkin wrote:
> > > On Fri, May 18, 2018 at 09:00:43PM +0800, Jason Wang wrote:
> > > > We return -EIO on device down but can not raise EPOLLOUT after it was
> > > > up. This may confuse user like vhost which expects tuntap to raise
> > > > EPOLLOUT to re-enable its TX routine after tuntap is down. This could
> > > > be easily reproduced by transmitting packets from VM while down and up
> > > > the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.
> > > > 
> > > > Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > > > Cc: Eric Dumazet <edumazet@google.com>
> > > > Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > ---
> > > >   drivers/net/tun.c | 4 +++-
> > > >   1 file changed, 3 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > > index d45ac37..1b29761 100644
> > > > --- a/drivers/net/tun.c
> > > > +++ b/drivers/net/tun.c
> > > > @@ -1734,8 +1734,10 @@ static ssize_t tun_get_user(struct
> > > > tun_struct *tun, struct tun_file *tfile,
> > > >       int skb_xdp = 1;
> > > >       bool frags = tun_napi_frags_enabled(tun);
> > > >   -    if (!(tun->dev->flags & IFF_UP))
> > > > +    if (!(tun->dev->flags & IFF_UP)) {
> > > Isn't this racy?  What if flag is cleared at this point?
> > 
> > I think you mean "set at this point"? Then yes, so we probably need to
> > set the bit during tun_net_close().
> > 
> > Thanks
> 
> Looks no need, vhost will poll socket after it see EIO. So we are ok here?
> 
> Thanks

In fact I don't even understand why does this help any longer.

-- 
MST

^ permalink raw reply

* [PATCH v4 0/3] IR decoding using BPF
From: Sean Young @ 2018-05-18 14:07 UTC (permalink / raw)
  To: linux-media, linux-kernel, Alexei Starovoitov,
	Mauro Carvalho Chehab, Daniel Borkmann, netdev, Matthias Reichl,
	Devin Heitmueller, Y Song, Quentin Monnet

The kernel IR decoders (drivers/media/rc/ir-*-decoder.c) support the most
widely used IR protocols, but there are many protocols which are not
supported[1]. For example, the lirc-remotes[2] repo has over 2700 remotes,
many of which are not supported by rc-core. There is a "long tail" of
unsupported IR protocols, for which lircd is need to decode the IR .

IR encoding is done in such a way that some simple circuit can decode it;
therefore, bpf is ideal.

In order to support all these protocols, here we have bpf based IR decoding.
The idea is that user-space can define a decoder in bpf, attach it to
the rc device through the lirc chardev.

Separate work is underway to extend ir-keytable to have an extensive library
of bpf-based decoders, and a much expanded library of rc keymaps.

Another future application would be to compile IRP[3] to a IR BPF program, and
so support virtually every remote without having to write a decoder for each.
It might also be possible to support non-button devices such as analog
directional pads or air conditioning remote controls and decode the target
temperature in bpf, and pass that to an input device.

Thanks,

Sean Young

[1] http://www.hifi-remote.com/wiki/index.php?title=DecodeIR
[2] https://sourceforge.net/p/lirc-remotes/code/ci/master/tree/remotes/
[3] http://www.hifi-remote.com/wiki/index.php?title=IRP_Notation

Changes since v3:
 - Implemented review comments from Quentin Monnet and Y Song (thanks!)
 - More helpful and better formatted bpf helper documentation
 - Changed back to bpf_prog_array rather than open-coded implementation
 - scancodes can be 64 bit
 - bpf gets passed values in microseconds, not nanoseconds.
   microseconds is more than than enough (IR receivers support carriers upto
   70kHz, at which point a single period is already 14 microseconds). Also,
   this makes it much more consistent with lirc mode2.
 - Since it looks much more like lirc mode2, rename the program type to
   BPF_PROG_TYPE_LIRC_MODE2.
 - Rebased on bpf-next

Changes since v2:
 - Fixed locking issues
 - Improved self-test to cover more cases
 - Rebased on bpf-next again

Changes since v1:
 - Code review comments from Y Song <ys114321@gmail.com> and
   Randy Dunlap <rdunlap@infradead.org>
 - Re-wrote sample bpf to be selftest
 - Renamed RAWIR_DECODER -> RAWIR_EVENT (Kconfig, context, bpf prog type)
 - Rebase on bpf-next
 - Introduced bpf_rawir_event context structure with simpler access checking

Sean Young (3):
  bpf: bpf_prog_array_copy() should return -ENOENT if exclude_prog not
    found
  media: rc: introduce BPF_PROG_LIRC_MODE2
  bpf: add selftest for lirc_mode2 type program

 drivers/media/rc/Kconfig                      |  13 +
 drivers/media/rc/Makefile                     |   1 +
 drivers/media/rc/bpf-lirc.c                   | 308 ++++++++++++++++++
 drivers/media/rc/lirc_dev.c                   |  30 ++
 drivers/media/rc/rc-core-priv.h               |  22 ++
 drivers/media/rc/rc-ir-raw.c                  |  12 +-
 include/linux/bpf_rcdev.h                     |  30 ++
 include/linux/bpf_types.h                     |   3 +
 include/uapi/linux/bpf.h                      |  53 ++-
 kernel/bpf/core.c                             |  11 +-
 kernel/bpf/syscall.c                          |   7 +
 kernel/trace/bpf_trace.c                      |   2 +
 tools/bpf/bpftool/prog.c                      |   1 +
 tools/include/uapi/linux/bpf.h                |  53 ++-
 tools/include/uapi/linux/lirc.h               | 217 ++++++++++++
 tools/lib/bpf/libbpf.c                        |   1 +
 tools/testing/selftests/bpf/Makefile          |   8 +-
 tools/testing/selftests/bpf/bpf_helpers.h     |   6 +
 .../testing/selftests/bpf/test_lirc_mode2.sh  |  28 ++
 .../selftests/bpf/test_lirc_mode2_kern.c      |  23 ++
 .../selftests/bpf/test_lirc_mode2_user.c      | 154 +++++++++
 21 files changed, 974 insertions(+), 9 deletions(-)
 create mode 100644 drivers/media/rc/bpf-lirc.c
 create mode 100644 include/linux/bpf_rcdev.h
 create mode 100644 tools/include/uapi/linux/lirc.h
 create mode 100755 tools/testing/selftests/bpf/test_lirc_mode2.sh
 create mode 100644 tools/testing/selftests/bpf/test_lirc_mode2_kern.c
 create mode 100644 tools/testing/selftests/bpf/test_lirc_mode2_user.c

-- 
2.17.0

^ permalink raw reply

* [PATCH v4 1/3] bpf: bpf_prog_array_copy() should return -ENOENT if exclude_prog not found
From: Sean Young @ 2018-05-18 14:07 UTC (permalink / raw)
  To: linux-media, linux-kernel, Alexei Starovoitov,
	Mauro Carvalho Chehab, Daniel Borkmann, netdev, Matthias Reichl,
	Devin Heitmueller, Y Song, Quentin Monnet
In-Reply-To: <cover.1526651592.git.sean@mess.org>

This makes is it possible for bpf prog detach to return -ENOENT.

Signed-off-by: Sean Young <sean@mess.org>
---
 kernel/bpf/core.c        | 11 +++++++++--
 kernel/trace/bpf_trace.c |  2 ++
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 2194c6a9df42..198848837783 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1566,6 +1566,7 @@ int bpf_prog_array_copy(struct bpf_prog_array __rcu *old_array,
 	int new_prog_cnt, carry_prog_cnt = 0;
 	struct bpf_prog **existing_prog;
 	struct bpf_prog_array *array;
+	bool found_exclude = false;
 	int new_prog_idx = 0;
 
 	/* Figure out how many existing progs we need to carry over to
@@ -1574,14 +1575,20 @@ int bpf_prog_array_copy(struct bpf_prog_array __rcu *old_array,
 	if (old_array) {
 		existing_prog = old_array->progs;
 		for (; *existing_prog; existing_prog++) {
-			if (*existing_prog != exclude_prog &&
-			    *existing_prog != &dummy_bpf_prog.prog)
+			if (*existing_prog == exclude_prog) {
+				found_exclude = true;
+				continue;
+			}
+			if (*existing_prog != &dummy_bpf_prog.prog)
 				carry_prog_cnt++;
 			if (*existing_prog == include_prog)
 				return -EEXIST;
 		}
 	}
 
+	if (exclude_prog && !found_exclude)
+		return -ENOENT;
+
 	/* How many progs (not NULL) will be in the new array? */
 	new_prog_cnt = carry_prog_cnt;
 	if (include_prog)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ce2cbbff27e4..daf36acb2a17 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1005,6 +1005,8 @@ void perf_event_detach_bpf_prog(struct perf_event *event)
 
 	old_array = event->tp_event->prog_array;
 	ret = bpf_prog_array_copy(old_array, event->prog, NULL, &new_array);
+	if (ret == -ENOENT)
+		goto unlock;
 	if (ret < 0) {
 		bpf_prog_array_delete_safe(old_array, event->prog);
 	} else {
-- 
2.17.0

^ permalink raw reply related

* [PATCH v4 3/3] bpf: add selftest for lirc_mode2 type program
From: Sean Young @ 2018-05-18 14:07 UTC (permalink / raw)
  To: linux-media, linux-kernel, Alexei Starovoitov,
	Mauro Carvalho Chehab, Daniel Borkmann, netdev, Matthias Reichl,
	Devin Heitmueller, Y Song, Quentin Monnet
In-Reply-To: <cover.1526651592.git.sean@mess.org>

This is simple test over rc-loopback.

Signed-off-by: Sean Young <sean@mess.org>
---
 tools/bpf/bpftool/prog.c                      |   1 +
 tools/include/uapi/linux/bpf.h                |  53 ++++-
 tools/include/uapi/linux/lirc.h               | 217 ++++++++++++++++++
 tools/lib/bpf/libbpf.c                        |   1 +
 tools/testing/selftests/bpf/Makefile          |   8 +-
 tools/testing/selftests/bpf/bpf_helpers.h     |   6 +
 .../testing/selftests/bpf/test_lirc_mode2.sh  |  28 +++
 .../selftests/bpf/test_lirc_mode2_kern.c      |  23 ++
 .../selftests/bpf/test_lirc_mode2_user.c      | 154 +++++++++++++
 9 files changed, 487 insertions(+), 4 deletions(-)
 create mode 100644 tools/include/uapi/linux/lirc.h
 create mode 100755 tools/testing/selftests/bpf/test_lirc_mode2.sh
 create mode 100644 tools/testing/selftests/bpf/test_lirc_mode2_kern.c
 create mode 100644 tools/testing/selftests/bpf/test_lirc_mode2_user.c

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 9bdfdf2d3fbe..07f1ace39a46 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -71,6 +71,7 @@ static const char * const prog_type_name[] = {
 	[BPF_PROG_TYPE_SK_MSG]		= "sk_msg",
 	[BPF_PROG_TYPE_RAW_TRACEPOINT]	= "raw_tracepoint",
 	[BPF_PROG_TYPE_CGROUP_SOCK_ADDR] = "cgroup_sock_addr",
+	[BPF_PROG_TYPE_LIRC_MODE2]	= "lirc_mode2",
 };
 
 static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index d94d333a8225..8227832b713e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -141,6 +141,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SK_MSG,
 	BPF_PROG_TYPE_RAW_TRACEPOINT,
 	BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
+	BPF_PROG_TYPE_LIRC_MODE2,
 };
 
 enum bpf_attach_type {
@@ -158,6 +159,7 @@ enum bpf_attach_type {
 	BPF_CGROUP_INET6_CONNECT,
 	BPF_CGROUP_INET4_POST_BIND,
 	BPF_CGROUP_INET6_POST_BIND,
+	BPF_LIRC_MODE2,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -1902,6 +1904,53 @@ union bpf_attr {
  *		egress otherwise). This is the only flag supported for now.
  *	Return
  *		**SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle)
+ *	Description
+ *		This helper is used in programs implementing IR decoding, to
+ *		report a successfully decoded key press with *scancode*,
+ *		*toggle* value in the given *protocol*. The scancode will be
+ *		translated to a keycode using the rc keymap, and reported as
+ *		an input key down event. After a period a key up event is
+ *		generated. This period can be extended by calling either
+ *		**bpf_rc_keydown** () with the same values, or calling
+ *		**bpf_rc_repeat** ().
+ *
+ *		Some protocols include a toggle bit, in case the button
+ *		was released and pressed again between consecutive scancodes
+ *
+ *		The *ctx* should point to the lirc sample as passed into
+ *		the program.
+ *
+ *		The *protocol* is the decoded protocol number (see
+ *		**enum rc_proto** for some predefined values).
+ *
+ *		This helper is only available is the kernel was compiled with
+ *		the **CONFIG_BPF_LIRC_MODE2** configuration option set to
+ *		"**y**".
+ *
+ *	Return
+ *		0
+ *
+ * int bpf_rc_repeat(void *ctx)
+ *	Description
+ *		This helper is used in programs implementing IR decoding, to
+ *		report a successfully decoded repeat key message. This delays
+ *		the generation of a key up event for previously generated
+ *		key down event.
+ *
+ *		Some IR protocols like NEC have a special IR message for
+ *		repeating last button, for when a button is held down.
+ *
+ *		The *ctx* should point to the lirc sample as passed into
+ *		the program.
+ *
+ *		This helper is only available is the kernel was compiled with
+ *		the **CONFIG_BPF_LIRC_MODE2** configuration option set to
+ *		"**y**".
+ *
+ *	Return
+ *		0
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -1976,7 +2025,9 @@ union bpf_attr {
 	FN(fib_lookup),			\
 	FN(sock_hash_update),		\
 	FN(msg_redirect_hash),		\
-	FN(sk_redirect_hash),
+	FN(sk_redirect_hash),		\
+	FN(rc_repeat),			\
+	FN(rc_keydown),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/include/uapi/linux/lirc.h b/tools/include/uapi/linux/lirc.h
new file mode 100644
index 000000000000..f189931042a7
--- /dev/null
+++ b/tools/include/uapi/linux/lirc.h
@@ -0,0 +1,217 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * lirc.h - linux infrared remote control header file
+ * last modified 2010/07/13 by Jarod Wilson
+ */
+
+#ifndef _LINUX_LIRC_H
+#define _LINUX_LIRC_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#define PULSE_BIT       0x01000000
+#define PULSE_MASK      0x00FFFFFF
+
+#define LIRC_MODE2_SPACE     0x00000000
+#define LIRC_MODE2_PULSE     0x01000000
+#define LIRC_MODE2_FREQUENCY 0x02000000
+#define LIRC_MODE2_TIMEOUT   0x03000000
+
+#define LIRC_VALUE_MASK      0x00FFFFFF
+#define LIRC_MODE2_MASK      0xFF000000
+
+#define LIRC_SPACE(val) (((val)&LIRC_VALUE_MASK) | LIRC_MODE2_SPACE)
+#define LIRC_PULSE(val) (((val)&LIRC_VALUE_MASK) | LIRC_MODE2_PULSE)
+#define LIRC_FREQUENCY(val) (((val)&LIRC_VALUE_MASK) | LIRC_MODE2_FREQUENCY)
+#define LIRC_TIMEOUT(val) (((val)&LIRC_VALUE_MASK) | LIRC_MODE2_TIMEOUT)
+
+#define LIRC_VALUE(val) ((val)&LIRC_VALUE_MASK)
+#define LIRC_MODE2(val) ((val)&LIRC_MODE2_MASK)
+
+#define LIRC_IS_SPACE(val) (LIRC_MODE2(val) == LIRC_MODE2_SPACE)
+#define LIRC_IS_PULSE(val) (LIRC_MODE2(val) == LIRC_MODE2_PULSE)
+#define LIRC_IS_FREQUENCY(val) (LIRC_MODE2(val) == LIRC_MODE2_FREQUENCY)
+#define LIRC_IS_TIMEOUT(val) (LIRC_MODE2(val) == LIRC_MODE2_TIMEOUT)
+
+/* used heavily by lirc userspace */
+#define lirc_t int
+
+/*** lirc compatible hardware features ***/
+
+#define LIRC_MODE2SEND(x) (x)
+#define LIRC_SEND2MODE(x) (x)
+#define LIRC_MODE2REC(x) ((x) << 16)
+#define LIRC_REC2MODE(x) ((x) >> 16)
+
+#define LIRC_MODE_RAW                  0x00000001
+#define LIRC_MODE_PULSE                0x00000002
+#define LIRC_MODE_MODE2                0x00000004
+#define LIRC_MODE_SCANCODE             0x00000008
+#define LIRC_MODE_LIRCCODE             0x00000010
+
+
+#define LIRC_CAN_SEND_RAW              LIRC_MODE2SEND(LIRC_MODE_RAW)
+#define LIRC_CAN_SEND_PULSE            LIRC_MODE2SEND(LIRC_MODE_PULSE)
+#define LIRC_CAN_SEND_MODE2            LIRC_MODE2SEND(LIRC_MODE_MODE2)
+#define LIRC_CAN_SEND_LIRCCODE         LIRC_MODE2SEND(LIRC_MODE_LIRCCODE)
+
+#define LIRC_CAN_SEND_MASK             0x0000003f
+
+#define LIRC_CAN_SET_SEND_CARRIER      0x00000100
+#define LIRC_CAN_SET_SEND_DUTY_CYCLE   0x00000200
+#define LIRC_CAN_SET_TRANSMITTER_MASK  0x00000400
+
+#define LIRC_CAN_REC_RAW               LIRC_MODE2REC(LIRC_MODE_RAW)
+#define LIRC_CAN_REC_PULSE             LIRC_MODE2REC(LIRC_MODE_PULSE)
+#define LIRC_CAN_REC_MODE2             LIRC_MODE2REC(LIRC_MODE_MODE2)
+#define LIRC_CAN_REC_SCANCODE          LIRC_MODE2REC(LIRC_MODE_SCANCODE)
+#define LIRC_CAN_REC_LIRCCODE          LIRC_MODE2REC(LIRC_MODE_LIRCCODE)
+
+#define LIRC_CAN_REC_MASK              LIRC_MODE2REC(LIRC_CAN_SEND_MASK)
+
+#define LIRC_CAN_SET_REC_CARRIER       (LIRC_CAN_SET_SEND_CARRIER << 16)
+#define LIRC_CAN_SET_REC_DUTY_CYCLE    (LIRC_CAN_SET_SEND_DUTY_CYCLE << 16)
+
+#define LIRC_CAN_SET_REC_DUTY_CYCLE_RANGE 0x40000000
+#define LIRC_CAN_SET_REC_CARRIER_RANGE    0x80000000
+#define LIRC_CAN_GET_REC_RESOLUTION       0x20000000
+#define LIRC_CAN_SET_REC_TIMEOUT          0x10000000
+#define LIRC_CAN_SET_REC_FILTER           0x08000000
+
+#define LIRC_CAN_MEASURE_CARRIER          0x02000000
+#define LIRC_CAN_USE_WIDEBAND_RECEIVER    0x04000000
+
+#define LIRC_CAN_SEND(x) ((x)&LIRC_CAN_SEND_MASK)
+#define LIRC_CAN_REC(x) ((x)&LIRC_CAN_REC_MASK)
+
+#define LIRC_CAN_NOTIFY_DECODE            0x01000000
+
+/*** IOCTL commands for lirc driver ***/
+
+#define LIRC_GET_FEATURES              _IOR('i', 0x00000000, __u32)
+
+#define LIRC_GET_SEND_MODE             _IOR('i', 0x00000001, __u32)
+#define LIRC_GET_REC_MODE              _IOR('i', 0x00000002, __u32)
+#define LIRC_GET_REC_RESOLUTION        _IOR('i', 0x00000007, __u32)
+
+#define LIRC_GET_MIN_TIMEOUT           _IOR('i', 0x00000008, __u32)
+#define LIRC_GET_MAX_TIMEOUT           _IOR('i', 0x00000009, __u32)
+
+/* code length in bits, currently only for LIRC_MODE_LIRCCODE */
+#define LIRC_GET_LENGTH                _IOR('i', 0x0000000f, __u32)
+
+#define LIRC_SET_SEND_MODE             _IOW('i', 0x00000011, __u32)
+#define LIRC_SET_REC_MODE              _IOW('i', 0x00000012, __u32)
+/* Note: these can reset the according pulse_width */
+#define LIRC_SET_SEND_CARRIER          _IOW('i', 0x00000013, __u32)
+#define LIRC_SET_REC_CARRIER           _IOW('i', 0x00000014, __u32)
+#define LIRC_SET_SEND_DUTY_CYCLE       _IOW('i', 0x00000015, __u32)
+#define LIRC_SET_TRANSMITTER_MASK      _IOW('i', 0x00000017, __u32)
+
+/*
+ * when a timeout != 0 is set the driver will send a
+ * LIRC_MODE2_TIMEOUT data packet, otherwise LIRC_MODE2_TIMEOUT is
+ * never sent, timeout is disabled by default
+ */
+#define LIRC_SET_REC_TIMEOUT           _IOW('i', 0x00000018, __u32)
+
+/* 1 enables, 0 disables timeout reports in MODE2 */
+#define LIRC_SET_REC_TIMEOUT_REPORTS   _IOW('i', 0x00000019, __u32)
+
+/*
+ * if enabled from the next key press on the driver will send
+ * LIRC_MODE2_FREQUENCY packets
+ */
+#define LIRC_SET_MEASURE_CARRIER_MODE	_IOW('i', 0x0000001d, __u32)
+
+/*
+ * to set a range use LIRC_SET_REC_CARRIER_RANGE with the
+ * lower bound first and later LIRC_SET_REC_CARRIER with the upper bound
+ */
+#define LIRC_SET_REC_CARRIER_RANGE     _IOW('i', 0x0000001f, __u32)
+
+#define LIRC_SET_WIDEBAND_RECEIVER     _IOW('i', 0x00000023, __u32)
+
+/*
+ * struct lirc_scancode - decoded scancode with protocol for use with
+ *	LIRC_MODE_SCANCODE
+ *
+ * @timestamp: Timestamp in nanoseconds using CLOCK_MONOTONIC when IR
+ *	was decoded.
+ * @flags: should be 0 for transmit. When receiving scancodes,
+ *	LIRC_SCANCODE_FLAG_TOGGLE or LIRC_SCANCODE_FLAG_REPEAT can be set
+ *	depending on the protocol
+ * @rc_proto: see enum rc_proto
+ * @keycode: the translated keycode. Set to 0 for transmit.
+ * @scancode: the scancode received or to be sent
+ */
+struct lirc_scancode {
+	__u64	timestamp;
+	__u16	flags;
+	__u16	rc_proto;
+	__u32	keycode;
+	__u64	scancode;
+};
+
+/* Set if the toggle bit of rc-5 or rc-6 is enabled */
+#define LIRC_SCANCODE_FLAG_TOGGLE	1
+/* Set if this is a nec or sanyo repeat */
+#define LIRC_SCANCODE_FLAG_REPEAT	2
+
+/**
+ * enum rc_proto - the Remote Controller protocol
+ *
+ * @RC_PROTO_UNKNOWN: Protocol not known
+ * @RC_PROTO_OTHER: Protocol known but proprietary
+ * @RC_PROTO_RC5: Philips RC5 protocol
+ * @RC_PROTO_RC5X_20: Philips RC5x 20 bit protocol
+ * @RC_PROTO_RC5_SZ: StreamZap variant of RC5
+ * @RC_PROTO_JVC: JVC protocol
+ * @RC_PROTO_SONY12: Sony 12 bit protocol
+ * @RC_PROTO_SONY15: Sony 15 bit protocol
+ * @RC_PROTO_SONY20: Sony 20 bit protocol
+ * @RC_PROTO_NEC: NEC protocol
+ * @RC_PROTO_NECX: Extended NEC protocol
+ * @RC_PROTO_NEC32: NEC 32 bit protocol
+ * @RC_PROTO_SANYO: Sanyo protocol
+ * @RC_PROTO_MCIR2_KBD: RC6-ish MCE keyboard
+ * @RC_PROTO_MCIR2_MSE: RC6-ish MCE mouse
+ * @RC_PROTO_RC6_0: Philips RC6-0-16 protocol
+ * @RC_PROTO_RC6_6A_20: Philips RC6-6A-20 protocol
+ * @RC_PROTO_RC6_6A_24: Philips RC6-6A-24 protocol
+ * @RC_PROTO_RC6_6A_32: Philips RC6-6A-32 protocol
+ * @RC_PROTO_RC6_MCE: MCE (Philips RC6-6A-32 subtype) protocol
+ * @RC_PROTO_SHARP: Sharp protocol
+ * @RC_PROTO_XMP: XMP protocol
+ * @RC_PROTO_CEC: CEC protocol
+ * @RC_PROTO_IMON: iMon Pad protocol
+ */
+enum rc_proto {
+	RC_PROTO_UNKNOWN	= 0,
+	RC_PROTO_OTHER		= 1,
+	RC_PROTO_RC5		= 2,
+	RC_PROTO_RC5X_20	= 3,
+	RC_PROTO_RC5_SZ		= 4,
+	RC_PROTO_JVC		= 5,
+	RC_PROTO_SONY12		= 6,
+	RC_PROTO_SONY15		= 7,
+	RC_PROTO_SONY20		= 8,
+	RC_PROTO_NEC		= 9,
+	RC_PROTO_NECX		= 10,
+	RC_PROTO_NEC32		= 11,
+	RC_PROTO_SANYO		= 12,
+	RC_PROTO_MCIR2_KBD	= 13,
+	RC_PROTO_MCIR2_MSE	= 14,
+	RC_PROTO_RC6_0		= 15,
+	RC_PROTO_RC6_6A_20	= 16,
+	RC_PROTO_RC6_6A_24	= 17,
+	RC_PROTO_RC6_6A_32	= 18,
+	RC_PROTO_RC6_MCE	= 19,
+	RC_PROTO_SHARP		= 20,
+	RC_PROTO_XMP		= 21,
+	RC_PROTO_CEC		= 22,
+	RC_PROTO_IMON		= 23,
+};
+
+#endif
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 3dbe217bf23e..01e514479f6b 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1461,6 +1461,7 @@ static bool bpf_prog_type__needs_kver(enum bpf_prog_type type)
 	case BPF_PROG_TYPE_CGROUP_DEVICE:
 	case BPF_PROG_TYPE_SK_MSG:
 	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
+	case BPF_PROG_TYPE_LIRC_MODE2:
 		return false;
 	case BPF_PROG_TYPE_UNSPEC:
 	case BPF_PROG_TYPE_KPROBE:
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 1eb0fa2aba92..ee6d49f18be5 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -24,7 +24,7 @@ urandom_read: urandom_read.c
 # Order correspond to 'make run_tests' order
 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
 	test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \
-	test_sock test_btf test_sockmap
+	test_sock test_btf test_sockmap test_lirc_mode2_user
 
 TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \
 	test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o     \
@@ -33,7 +33,8 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
 	sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o \
 	sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o test_adjust_tail.o \
 	test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o test_tunnel_kern.o \
-	test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o
+	test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \
+	test_lirc_mode2_kern.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
@@ -42,7 +43,8 @@ TEST_PROGS := test_kmod.sh \
 	test_xdp_meta.sh \
 	test_offload.py \
 	test_sock_addr.sh \
-	test_tunnel.sh
+	test_tunnel.sh \
+	test_lirc_mode2.sh
 
 # Compile but not part of 'make run_tests'
 TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 8f143dfb3700..a6864827ed34 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -114,6 +114,12 @@ static int (*bpf_get_stack)(void *ctx, void *buf, int size, int flags) =
 static int (*bpf_fib_lookup)(void *ctx, struct bpf_fib_lookup *params,
 			     int plen, __u32 flags) =
 	(void *) BPF_FUNC_fib_lookup;
+static int (*bpf_rc_repeat)(void *ctx) =
+	(void *) BPF_FUNC_rc_repeat;
+static int (*bpf_rc_keydown)(void *ctx, unsigned int protocol,
+			     unsigned long long scancode, unsigned int toggle) =
+	(void *) BPF_FUNC_rc_keydown;
+
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/tools/testing/selftests/bpf/test_lirc_mode2.sh b/tools/testing/selftests/bpf/test_lirc_mode2.sh
new file mode 100755
index 000000000000..ce2e15e4f976
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_lirc_mode2.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+GREEN='\033[0;92m'
+RED='\033[0;31m'
+NC='\033[0m' # No Color
+
+modprobe rc-loopback
+
+for i in /sys/class/rc/rc*
+do
+	if grep -q DRV_NAME=rc-loopback $i/uevent
+	then
+		LIRCDEV=$(grep DEVNAME= $i/lirc*/uevent | sed sQDEVNAME=Q/dev/Q)
+	fi
+done
+
+if [ -n $LIRCDEV ];
+then
+	TYPE=lirc_mode2
+	./test_lirc_mode2_user $LIRCDEV
+	ret=$?
+	if [ $ret -ne 0 ]; then
+		echo -e ${RED}"FAIL: $TYPE"${NC}
+	else
+		echo -e ${GREEN}"PASS: $TYPE"${NC}
+	fi
+fi
diff --git a/tools/testing/selftests/bpf/test_lirc_mode2_kern.c b/tools/testing/selftests/bpf/test_lirc_mode2_kern.c
new file mode 100644
index 000000000000..ba26855563a5
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_lirc_mode2_kern.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+// test ir decoder
+//
+// Copyright (C) 2018 Sean Young <sean@mess.org>
+
+#include <linux/bpf.h>
+#include <linux/lirc.h>
+#include "bpf_helpers.h"
+
+SEC("lirc_mode2")
+int bpf_decoder(unsigned int *sample)
+{
+	if (LIRC_IS_PULSE(*sample)) {
+		unsigned int duration = LIRC_VALUE(*sample);
+
+		if (duration & 0x10000)
+			bpf_rc_keydown(sample, 0x40, duration & 0xffff, 0);
+	}
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_lirc_mode2_user.c b/tools/testing/selftests/bpf/test_lirc_mode2_user.c
new file mode 100644
index 000000000000..bd77688c8277
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_lirc_mode2_user.c
@@ -0,0 +1,154 @@
+// SPDX-License-Identifier: GPL-2.0
+// test ir decoder
+//
+// Copyright (C) 2018 Sean Young <sean@mess.org>
+
+// A lirc chardev is a device representing a consumer IR (cir) device which
+// can receive infrared signals from remote control (and/or transmit IR)
+//
+// IR is sent as a series of pulses and space somewhat like morse code. The
+// BPF program can decode this into scancodes so that rc-core can translate
+// this into input key codes using the rc keymap
+//
+// This test works by sending IR over rc-loopback, so the IR is processed by
+// BPF and then decoded into scancodes. The/ lirc chardev must be the one
+// associated with rc-loopback, see the output of ir-keytable(1)".
+//
+// The following CONFIG options must be enabled for the test to succeed:
+// CONFIG_RC_CORE=y
+// CONFIG_BPF_RAWIR_EVENT=y
+// CONFIG_RC_LOOPBACK=y
+
+// Steps:
+// 1. Open the /dev/lircN device for rc-loopback (given on command line)
+// 2. Attach bpf_lirc_mode2 program which decodes some IR.
+// 3. Send some IR to the same IR device; since it is loopback, this will
+//    end up in the bpf program
+// 4. bpf program should decode IR and report keycode
+// 5. We can read keycode from same /dev/lirc device
+
+#include <linux/bpf.h>
+#include <linux/lirc.h>
+#include <assert.h>
+#include <errno.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <unistd.h>
+#include <poll.h>
+#include <libgen.h>
+#include <sys/resource.h>
+#include <sys/types.h>
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+
+#include "bpf_util.h"
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+int main(int argc, char **argv)
+{
+	struct bpf_object *obj;
+	int ret, lircfd, progfd, mode;
+	int testir = 0x1dead;
+	u32 prog_ids[10], prog_flags[10], prog_cnt;
+
+	if (argc != 2) {
+		printf("Usage: %s /dev/lircN\n", argv[0]);
+		return 2;
+	}
+
+	ret = bpf_prog_load("test_lirc_mode2_kern.o",
+			    BPF_PROG_TYPE_LIRC_MODE2, &obj, &progfd);
+	if (ret) {
+		printf("Failed to load bpf program\n");
+		return 1;
+	}
+
+	lircfd = open(argv[1], O_RDWR | O_NONBLOCK);
+	if (lircfd == -1) {
+		printf("failed to open lirc device %s: %m\n", argv[1]);
+		return 1;
+	}
+
+	/* Let's try detach it before it was ever attached */
+	ret = bpf_prog_detach2(progfd, lircfd, BPF_LIRC_MODE2);
+	if (ret != -1 || errno != ENOENT) {
+		printf("bpf_prog_detach2 not attached should fail: %m\n");
+		return 1;
+	}
+
+	mode = LIRC_MODE_SCANCODE;
+	if (ioctl(lircfd, LIRC_SET_REC_MODE, &mode)) {
+		printf("failed to set rec mode: %m\n");
+		return 1;
+	}
+
+	prog_cnt = 10;
+	ret = bpf_prog_query(lircfd, BPF_LIRC_MODE2, 0, prog_flags, prog_ids,
+			     &prog_cnt);
+	if (ret) {
+		printf("Failed to query bpf programs on lirc device: %m\n");
+		return 1;
+	}
+
+	if (prog_cnt != 0) {
+		printf("Expected nothing to be attached\n");
+		return 1;
+	}
+
+	ret = bpf_prog_attach(progfd, lircfd, BPF_LIRC_MODE2, 0);
+	if (ret) {
+		printf("Failed to attach bpf to lirc device: %m\n");
+		return 1;
+	}
+
+	/* Write raw IR */
+	ret = write(lircfd, &testir, sizeof(testir));
+	if (ret != sizeof(testir)) {
+		printf("Failed to send test IR message: %m\n");
+		return 1;
+	}
+
+	struct pollfd pfd = { .fd = lircfd, .events = POLLIN };
+	struct lirc_scancode lsc;
+
+	poll(&pfd, 1, 100);
+
+	/* Read decoded IR */
+	ret = read(lircfd, &lsc, sizeof(lsc));
+	if (ret != sizeof(lsc)) {
+		printf("Failed to read decoded IR: %m\n");
+		return 1;
+	}
+
+	if (lsc.scancode != 0xdead || lsc.rc_proto != 64) {
+		printf("Incorrect scancode decoded\n");
+		return 1;
+	}
+
+	prog_cnt = 10;
+	ret = bpf_prog_query(lircfd, BPF_LIRC_MODE2, 0, prog_flags, prog_ids,
+			     &prog_cnt);
+	if (ret) {
+		printf("Failed to query bpf programs on lirc device: %m\n");
+		return 1;
+	}
+
+	if (prog_cnt != 1) {
+		printf("Expected one program to be attached\n");
+		return 1;
+	}
+
+	/* Let's try detaching it now it is actually attached */
+	ret = bpf_prog_detach2(progfd, lircfd, BPF_LIRC_MODE2);
+	if (ret) {
+		printf("bpf_prog_detach2: returned %m\n");
+		return 1;
+	}
+
+	return 0;
+}
-- 
2.17.0

^ permalink raw reply related

* [PATCH v4 2/3] media: rc: introduce BPF_PROG_LIRC_MODE2
From: Sean Young @ 2018-05-18 14:07 UTC (permalink / raw)
  To: linux-media, linux-kernel, Alexei Starovoitov,
	Mauro Carvalho Chehab, Daniel Borkmann, netdev, Matthias Reichl,
	Devin Heitmueller, Y Song, Quentin Monnet
In-Reply-To: <cover.1526651592.git.sean@mess.org>

Add support for BPF_PROG_LIRC_MODE2. This type of BPF program can call
rc_keydown() to reported decoded IR scancodes, or rc_repeat() to report
that the last key should be repeated.

The bpf program can be attached to using the bpf(BPF_PROG_ATTACH) syscall;
the target_fd must be the /dev/lircN device.

Signed-off-by: Sean Young <sean@mess.org>
---
 drivers/media/rc/Kconfig        |  13 ++
 drivers/media/rc/Makefile       |   1 +
 drivers/media/rc/bpf-lirc.c     | 308 ++++++++++++++++++++++++++++++++
 drivers/media/rc/lirc_dev.c     |  30 ++++
 drivers/media/rc/rc-core-priv.h |  22 +++
 drivers/media/rc/rc-ir-raw.c    |  12 +-
 include/linux/bpf_rcdev.h       |  30 ++++
 include/linux/bpf_types.h       |   3 +
 include/uapi/linux/bpf.h        |  53 +++++-
 kernel/bpf/syscall.c            |   7 +
 10 files changed, 476 insertions(+), 3 deletions(-)
 create mode 100644 drivers/media/rc/bpf-lirc.c
 create mode 100644 include/linux/bpf_rcdev.h

diff --git a/drivers/media/rc/Kconfig b/drivers/media/rc/Kconfig
index eb2c3b6eca7f..d5b35a6ba899 100644
--- a/drivers/media/rc/Kconfig
+++ b/drivers/media/rc/Kconfig
@@ -25,6 +25,19 @@ config LIRC
 	   passes raw IR to and from userspace, which is needed for
 	   IR transmitting (aka "blasting") and for the lirc daemon.
 
+config BPF_LIRC_MODE2
+	bool "Support for eBPF programs attached to lirc devices"
+	depends on BPF_SYSCALL
+	depends on RC_CORE=y
+	depends on LIRC
+	help
+	   Allow attaching eBPF programs to a lirc device using the bpf(2)
+	   syscall command BPF_PROG_ATTACH. This is supported for raw IR
+	   receivers.
+
+	   These eBPF programs can be used to decode IR into scancodes, for
+	   IR protocols not supported by the kernel decoders.
+
 menuconfig RC_DECODERS
 	bool "Remote controller decoders"
 	depends on RC_CORE
diff --git a/drivers/media/rc/Makefile b/drivers/media/rc/Makefile
index 2e1c87066f6c..e0340d043fe8 100644
--- a/drivers/media/rc/Makefile
+++ b/drivers/media/rc/Makefile
@@ -5,6 +5,7 @@ obj-y += keymaps/
 obj-$(CONFIG_RC_CORE) += rc-core.o
 rc-core-y := rc-main.o rc-ir-raw.o
 rc-core-$(CONFIG_LIRC) += lirc_dev.o
+rc-core-$(CONFIG_BPF_LIRC_MODE2) += bpf-lirc.o
 obj-$(CONFIG_IR_NEC_DECODER) += ir-nec-decoder.o
 obj-$(CONFIG_IR_RC5_DECODER) += ir-rc5-decoder.o
 obj-$(CONFIG_IR_RC6_DECODER) += ir-rc6-decoder.o
diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
new file mode 100644
index 000000000000..c9673df2d9cd
--- /dev/null
+++ b/drivers/media/rc/bpf-lirc.c
@@ -0,0 +1,308 @@
+// SPDX-License-Identifier: GPL-2.0
+// bpf-lirc.c - handles bpf
+//
+// Copyright (C) 2018 Sean Young <sean@mess.org>
+
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <linux/bpf_rcdev.h>
+#include "rc-core-priv.h"
+
+/*
+ * BPF interface for raw IR
+ */
+const struct bpf_prog_ops lirc_mode2_prog_ops = {
+};
+
+BPF_CALL_1(bpf_rc_repeat, u32*, sample)
+{
+	struct ir_raw_event_ctrl *ctrl;
+
+	ctrl = container_of(sample, struct ir_raw_event_ctrl, bpf_sample);
+
+	rc_repeat(ctrl->dev);
+
+	return 0;
+}
+
+static const struct bpf_func_proto rc_repeat_proto = {
+	.func	   = bpf_rc_repeat,
+	.gpl_only  = true, /* rc_repeat is EXPORT_SYMBOL_GPL */
+	.ret_type  = RET_INTEGER,
+	.arg1_type = ARG_PTR_TO_CTX,
+};
+
+/*
+ * Currently rc-core does not support 64-bit scancodes, but there are many
+ * known protocols with more than 32 bits. So, define the interface as u64
+ * as a future-proof.
+ */
+BPF_CALL_4(bpf_rc_keydown, u32*, sample, u32, protocol, u64, scancode,
+	   u32, toggle)
+{
+	struct ir_raw_event_ctrl *ctrl;
+
+	ctrl = container_of(sample, struct ir_raw_event_ctrl, bpf_sample);
+
+	rc_keydown(ctrl->dev, protocol, scancode, toggle != 0);
+
+	return 0;
+}
+
+static const struct bpf_func_proto rc_keydown_proto = {
+	.func	   = bpf_rc_keydown,
+	.gpl_only  = true, /* rc_keydown is EXPORT_SYMBOL_GPL */
+	.ret_type  = RET_INTEGER,
+	.arg1_type = ARG_PTR_TO_CTX,
+	.arg2_type = ARG_ANYTHING,
+	.arg3_type = ARG_ANYTHING,
+	.arg4_type = ARG_ANYTHING,
+};
+
+static const struct bpf_func_proto *
+lirc_mode2_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	switch (func_id) {
+	case BPF_FUNC_rc_repeat:
+		return &rc_repeat_proto;
+	case BPF_FUNC_rc_keydown:
+		return &rc_keydown_proto;
+	case BPF_FUNC_map_lookup_elem:
+		return &bpf_map_lookup_elem_proto;
+	case BPF_FUNC_map_update_elem:
+		return &bpf_map_update_elem_proto;
+	case BPF_FUNC_map_delete_elem:
+		return &bpf_map_delete_elem_proto;
+	case BPF_FUNC_ktime_get_ns:
+		return &bpf_ktime_get_ns_proto;
+	case BPF_FUNC_tail_call:
+		return &bpf_tail_call_proto;
+	case BPF_FUNC_get_prandom_u32:
+		return &bpf_get_prandom_u32_proto;
+	case BPF_FUNC_trace_printk:
+		if (capable(CAP_SYS_ADMIN))
+			return bpf_get_trace_printk_proto();
+		/* fall through */
+	default:
+		return NULL;
+	}
+}
+
+static bool lirc_mode2_is_valid_access(int off, int size,
+				       enum bpf_access_type type,
+				       const struct bpf_prog *prog,
+				       struct bpf_insn_access_aux *info)
+{
+	/* We have one field of u32 */
+	return type == BPF_READ && off == 0 && size == sizeof(u32);
+}
+
+const struct bpf_verifier_ops lirc_mode2_verifier_ops = {
+	.get_func_proto  = lirc_mode2_func_proto,
+	.is_valid_access = lirc_mode2_is_valid_access
+};
+
+#define BPF_MAX_PROGS 64
+
+static int rc_dev_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog)
+{
+	struct bpf_prog_array __rcu *old_array;
+	struct bpf_prog_array *new_array;
+	struct ir_raw_event_ctrl *raw;
+	int ret;
+
+	if (rcdev->driver_type != RC_DRIVER_IR_RAW)
+		return -EINVAL;
+
+	ret = mutex_lock_interruptible(&ir_raw_handler_lock);
+	if (ret)
+		return ret;
+
+	raw = rcdev->raw;
+	if (!raw) {
+		ret = -ENODEV;
+		goto unlock;
+	}
+
+	if (raw->progs && bpf_prog_array_length(raw->progs) >= BPF_MAX_PROGS) {
+		ret = -E2BIG;
+		goto unlock;
+	}
+
+	old_array = raw->progs;
+	ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
+	if (ret < 0)
+		goto unlock;
+
+	rcu_assign_pointer(raw->progs, new_array);
+	bpf_prog_array_free(old_array);
+
+unlock:
+	mutex_unlock(&ir_raw_handler_lock);
+	return ret;
+}
+
+static int rc_dev_bpf_detach(struct rc_dev *rcdev, struct bpf_prog *prog)
+{
+	struct bpf_prog_array __rcu *old_array;
+	struct bpf_prog_array *new_array;
+	struct ir_raw_event_ctrl *raw;
+	int ret;
+
+	if (rcdev->driver_type != RC_DRIVER_IR_RAW)
+		return -EINVAL;
+
+	ret = mutex_lock_interruptible(&ir_raw_handler_lock);
+	if (ret)
+		return ret;
+
+	raw = rcdev->raw;
+	if (!raw) {
+		ret = -ENODEV;
+		goto unlock;
+	}
+
+	old_array = raw->progs;
+	ret = bpf_prog_array_copy(old_array, prog, NULL, &new_array);
+	/*
+	 * Do not use bpf_prog_array_delete_safe() as we would end up
+	 * with a dummy entry in the array, and the we would free the
+	 * dummy in rc_dev_bpf_free()
+	 */
+	if (ret == 0) {
+		rcu_assign_pointer(raw->progs, new_array);
+		bpf_prog_array_free(old_array);
+	}
+unlock:
+	mutex_unlock(&ir_raw_handler_lock);
+	return ret;
+}
+
+void rc_dev_bpf_run(struct rc_dev *rcdev, u32 sample)
+{
+	struct ir_raw_event_ctrl *raw = rcdev->raw;
+
+	raw->bpf_sample = sample;
+
+	if (raw->progs)
+		BPF_PROG_RUN_ARRAY(raw->progs, &raw->bpf_sample, BPF_PROG_RUN);
+}
+
+/*
+ * This should be called once the rc thread has been stopped, so there can be
+ * no concurrent bpf execution.
+ */
+void rc_dev_bpf_free(struct rc_dev *rcdev)
+{
+	struct bpf_prog **progs;
+
+	if (!rcdev->raw->progs)
+		return;
+
+	progs = rcu_dereference(rcdev->raw->progs)->progs;
+	while (*progs)
+		bpf_prog_put(*progs++);
+
+	bpf_prog_array_free(rcdev->raw->progs);
+}
+
+int rc_dev_prog_attach(const union bpf_attr *attr)
+{
+	struct bpf_prog *prog;
+	struct rc_dev *rcdev;
+	int ret;
+
+	if (attr->attach_flags)
+		return -EINVAL;
+
+	prog = bpf_prog_get_type(attr->attach_bpf_fd,
+				 BPF_PROG_TYPE_LIRC_MODE2);
+	if (IS_ERR(prog))
+		return PTR_ERR(prog);
+
+	rcdev = rc_dev_get_from_fd(attr->target_fd);
+	if (IS_ERR(rcdev)) {
+		bpf_prog_put(prog);
+		return PTR_ERR(rcdev);
+	}
+
+	ret = rc_dev_bpf_attach(rcdev, prog);
+	if (ret)
+		bpf_prog_put(prog);
+
+	put_device(&rcdev->dev);
+
+	return ret;
+}
+
+int rc_dev_prog_detach(const union bpf_attr *attr)
+{
+	struct bpf_prog *prog;
+	struct rc_dev *rcdev;
+	int ret;
+
+	if (attr->attach_flags)
+		return -EINVAL;
+
+	prog = bpf_prog_get_type(attr->attach_bpf_fd,
+				 BPF_PROG_TYPE_LIRC_MODE2);
+	if (IS_ERR(prog))
+		return PTR_ERR(prog);
+
+	rcdev = rc_dev_get_from_fd(attr->target_fd);
+	if (IS_ERR(rcdev)) {
+		bpf_prog_put(prog);
+		return PTR_ERR(rcdev);
+	}
+
+	ret = rc_dev_bpf_detach(rcdev, prog);
+
+	bpf_prog_put(prog);
+	put_device(&rcdev->dev);
+
+	return ret;
+}
+
+int rc_dev_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr)
+{
+	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
+	struct bpf_prog_array __rcu *progs;
+	struct rc_dev *rcdev;
+	u32 cnt, flags = 0;
+	int ret;
+
+	if (attr->query.query_flags)
+		return -EINVAL;
+
+	rcdev = rc_dev_get_from_fd(attr->query.target_fd);
+	if (IS_ERR(rcdev))
+		return PTR_ERR(rcdev);
+
+	if (rcdev->driver_type != RC_DRIVER_IR_RAW)
+		return -EINVAL;
+
+	ret = mutex_lock_interruptible(&ir_raw_handler_lock);
+	if (ret)
+		return ret;
+
+	progs = rcdev->raw->progs;
+	cnt = progs ? bpf_prog_array_length(progs) : 0;
+
+	if (copy_to_user(&uattr->query.prog_cnt, &cnt, sizeof(cnt))) {
+		ret = -EFAULT;
+		goto unlock;
+	}
+	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags))) {
+		ret = -EFAULT;
+		goto unlock;
+	}
+
+	if (attr->query.prog_cnt != 0 && prog_ids && cnt)
+		ret = bpf_prog_array_copy_to_user(progs, prog_ids, cnt);
+
+unlock:
+	mutex_unlock(&ir_raw_handler_lock);
+	put_device(&rcdev->dev);
+
+	return ret;
+}
diff --git a/drivers/media/rc/lirc_dev.c b/drivers/media/rc/lirc_dev.c
index 24e9fbb80e81..7e760bf11a51 100644
--- a/drivers/media/rc/lirc_dev.c
+++ b/drivers/media/rc/lirc_dev.c
@@ -20,6 +20,7 @@
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/device.h>
+#include <linux/file.h>
 #include <linux/idr.h>
 #include <linux/poll.h>
 #include <linux/sched.h>
@@ -104,6 +105,12 @@ void ir_lirc_raw_event(struct rc_dev *dev, struct ir_raw_event ev)
 			TO_US(ev.duration), TO_STR(ev.pulse));
 	}
 
+	/*
+	 * bpf does not care about the gap generated above; that exists
+	 * for backwards compatibility
+	 */
+	rc_dev_bpf_run(dev, sample);
+
 	spin_lock_irqsave(&dev->lirc_fh_lock, flags);
 	list_for_each_entry(fh, &dev->lirc_fh, list) {
 		if (LIRC_IS_TIMEOUT(sample) && !fh->send_timeout_reports)
@@ -816,4 +823,27 @@ void __exit lirc_dev_exit(void)
 	unregister_chrdev_region(lirc_base_dev, RC_DEV_MAX);
 }
 
+struct rc_dev *rc_dev_get_from_fd(int fd)
+{
+	struct fd f = fdget(fd);
+	struct lirc_fh *fh;
+	struct rc_dev *dev;
+
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+
+	if (f.file->f_op != &lirc_fops) {
+		fdput(f);
+		return ERR_PTR(-EINVAL);
+	}
+
+	fh = f.file->private_data;
+	dev = fh->rc;
+
+	get_device(&dev->dev);
+	fdput(f);
+
+	return dev;
+}
+
 MODULE_ALIAS("lirc_dev");
diff --git a/drivers/media/rc/rc-core-priv.h b/drivers/media/rc/rc-core-priv.h
index e0e6a17460f6..511e4a2dc2d5 100644
--- a/drivers/media/rc/rc-core-priv.h
+++ b/drivers/media/rc/rc-core-priv.h
@@ -13,6 +13,7 @@
 #define	MAX_IR_EVENT_SIZE	512
 
 #include <linux/slab.h>
+#include <uapi/linux/bpf.h>
 #include <media/rc-core.h>
 
 /**
@@ -57,6 +58,11 @@ struct ir_raw_event_ctrl {
 	/* raw decoder state follows */
 	struct ir_raw_event prev_ev;
 	struct ir_raw_event this_ev;
+
+#ifdef CONFIG_BPF_LIRC_MODE2
+	u32				bpf_sample;
+	struct bpf_prog_array __rcu	*progs;
+#endif
 	struct nec_dec {
 		int state;
 		unsigned count;
@@ -126,6 +132,9 @@ struct ir_raw_event_ctrl {
 	} imon;
 };
 
+/* Mutex for locking raw IR processing and handler change */
+extern struct mutex ir_raw_handler_lock;
+
 /* macros for IR decoders */
 static inline bool geq_margin(unsigned d1, unsigned d2, unsigned margin)
 {
@@ -288,6 +297,7 @@ void ir_lirc_raw_event(struct rc_dev *dev, struct ir_raw_event ev);
 void ir_lirc_scancode_event(struct rc_dev *dev, struct lirc_scancode *lsc);
 int ir_lirc_register(struct rc_dev *dev);
 void ir_lirc_unregister(struct rc_dev *dev);
+struct rc_dev *rc_dev_get_from_fd(int fd);
 #else
 static inline int lirc_dev_init(void) { return 0; }
 static inline void lirc_dev_exit(void) {}
@@ -299,4 +309,16 @@ static inline int ir_lirc_register(struct rc_dev *dev) { return 0; }
 static inline void ir_lirc_unregister(struct rc_dev *dev) { }
 #endif
 
+/*
+ * bpf interface
+ */
+#ifdef CONFIG_BPF_LIRC_MODE2
+void rc_dev_bpf_free(struct rc_dev *dev);
+void rc_dev_bpf_run(struct rc_dev *dev, u32 sample);
+#else
+static inline void rc_dev_bpf_free(struct rc_dev *dev) { }
+static inline void rc_dev_bpf_run(struct rc_dev *dev, u32 sample)
+{ }
+#endif
+
 #endif /* _RC_CORE_PRIV */
diff --git a/drivers/media/rc/rc-ir-raw.c b/drivers/media/rc/rc-ir-raw.c
index 374f83105a23..a3131d4236b3 100644
--- a/drivers/media/rc/rc-ir-raw.c
+++ b/drivers/media/rc/rc-ir-raw.c
@@ -14,7 +14,7 @@
 static LIST_HEAD(ir_raw_client_list);
 
 /* Used to handle IR raw handler extensions */
-static DEFINE_MUTEX(ir_raw_handler_lock);
+DEFINE_MUTEX(ir_raw_handler_lock);
 static LIST_HEAD(ir_raw_handler_list);
 static atomic64_t available_protocols = ATOMIC64_INIT(0);
 
@@ -621,9 +621,17 @@ void ir_raw_event_unregister(struct rc_dev *dev)
 	list_for_each_entry(handler, &ir_raw_handler_list, list)
 		if (handler->raw_unregister)
 			handler->raw_unregister(dev);
-	mutex_unlock(&ir_raw_handler_lock);
+
+	rc_dev_bpf_free(dev);
 
 	ir_raw_event_free(dev);
+
+	/*
+	 * A user can be calling bpf(BPF_PROG_{QUERY|ATTACH|DETACH}), so
+	 * ensure that the raw member is null on unlock; this is how
+	 * "device gone" is checked.
+	 */
+	mutex_unlock(&ir_raw_handler_lock);
 }
 
 /*
diff --git a/include/linux/bpf_rcdev.h b/include/linux/bpf_rcdev.h
new file mode 100644
index 000000000000..570ca0036cf5
--- /dev/null
+++ b/include/linux/bpf_rcdev.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _BPF_RCDEV_H
+#define _BPF_RCDEV_H
+
+#include <linux/bpf.h>
+#include <uapi/linux/bpf.h>
+
+#ifdef CONFIG_BPF_LIRC_MODE2
+int rc_dev_prog_attach(const union bpf_attr *attr);
+int rc_dev_prog_detach(const union bpf_attr *attr);
+int rc_dev_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr);
+#else
+static inline int rc_dev_prog_attach(const union bpf_attr *attr)
+{
+	return -EINVAL;
+}
+
+static inline int rc_dev_prog_detach(const union bpf_attr *attr)
+{
+	return -EINVAL;
+}
+
+static inline int rc_dev_prog_query(const union bpf_attr *attr,
+				    union bpf_attr __user *uattr)
+{
+	return -EINVAL;
+}
+#endif
+
+#endif /* _BPF_RCDEV_H */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index b67f8793de0d..47b771421d40 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -25,6 +25,9 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT, raw_tracepoint)
 #ifdef CONFIG_CGROUP_BPF
 BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
 #endif
+#ifdef CONFIG_BPF_LIRC_MODE2
+BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2)
+#endif
 
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d94d333a8225..8227832b713e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -141,6 +141,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SK_MSG,
 	BPF_PROG_TYPE_RAW_TRACEPOINT,
 	BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
+	BPF_PROG_TYPE_LIRC_MODE2,
 };
 
 enum bpf_attach_type {
@@ -158,6 +159,7 @@ enum bpf_attach_type {
 	BPF_CGROUP_INET6_CONNECT,
 	BPF_CGROUP_INET4_POST_BIND,
 	BPF_CGROUP_INET6_POST_BIND,
+	BPF_LIRC_MODE2,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -1902,6 +1904,53 @@ union bpf_attr {
  *		egress otherwise). This is the only flag supported for now.
  *	Return
  *		**SK_PASS** on success, or **SK_DROP** on error.
+ *
+ * int bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle)
+ *	Description
+ *		This helper is used in programs implementing IR decoding, to
+ *		report a successfully decoded key press with *scancode*,
+ *		*toggle* value in the given *protocol*. The scancode will be
+ *		translated to a keycode using the rc keymap, and reported as
+ *		an input key down event. After a period a key up event is
+ *		generated. This period can be extended by calling either
+ *		**bpf_rc_keydown** () with the same values, or calling
+ *		**bpf_rc_repeat** ().
+ *
+ *		Some protocols include a toggle bit, in case the button
+ *		was released and pressed again between consecutive scancodes
+ *
+ *		The *ctx* should point to the lirc sample as passed into
+ *		the program.
+ *
+ *		The *protocol* is the decoded protocol number (see
+ *		**enum rc_proto** for some predefined values).
+ *
+ *		This helper is only available is the kernel was compiled with
+ *		the **CONFIG_BPF_LIRC_MODE2** configuration option set to
+ *		"**y**".
+ *
+ *	Return
+ *		0
+ *
+ * int bpf_rc_repeat(void *ctx)
+ *	Description
+ *		This helper is used in programs implementing IR decoding, to
+ *		report a successfully decoded repeat key message. This delays
+ *		the generation of a key up event for previously generated
+ *		key down event.
+ *
+ *		Some IR protocols like NEC have a special IR message for
+ *		repeating last button, for when a button is held down.
+ *
+ *		The *ctx* should point to the lirc sample as passed into
+ *		the program.
+ *
+ *		This helper is only available is the kernel was compiled with
+ *		the **CONFIG_BPF_LIRC_MODE2** configuration option set to
+ *		"**y**".
+ *
+ *	Return
+ *		0
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -1976,7 +2025,9 @@ union bpf_attr {
 	FN(fib_lookup),			\
 	FN(sock_hash_update),		\
 	FN(msg_redirect_hash),		\
-	FN(sk_redirect_hash),
+	FN(sk_redirect_hash),		\
+	FN(rc_repeat),			\
+	FN(rc_keydown),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index bfcde949c7f8..6e9a4cbc14b9 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -11,6 +11,7 @@
  */
 #include <linux/bpf.h>
 #include <linux/bpf_trace.h>
+#include <linux/bpf_rcdev.h>
 #include <linux/btf.h>
 #include <linux/syscalls.h>
 #include <linux/slab.h>
@@ -1576,6 +1577,8 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_SK_SKB_STREAM_PARSER:
 	case BPF_SK_SKB_STREAM_VERDICT:
 		return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, true);
+	case BPF_LIRC_MODE2:
+		return rc_dev_prog_attach(attr);
 	default:
 		return -EINVAL;
 	}
@@ -1646,6 +1649,8 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	case BPF_SK_SKB_STREAM_PARSER:
 	case BPF_SK_SKB_STREAM_VERDICT:
 		return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, false);
+	case BPF_LIRC_MODE2:
+		return rc_dev_prog_detach(attr);
 	default:
 		return -EINVAL;
 	}
@@ -1693,6 +1698,8 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	case BPF_CGROUP_SOCK_OPS:
 	case BPF_CGROUP_DEVICE:
 		break;
+	case BPF_LIRC_MODE2:
+		return rc_dev_prog_query(attr, uattr);
 	default:
 		return -EINVAL;
 	}
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH bpf-next 0/4] AF_XDP follow-up patches, cosmetics
From: Daniel Borkmann @ 2018-05-18 14:10 UTC (permalink / raw)
  To: Björn Töpel, magnus.karlsson, magnus.karlsson, ast,
	netdev
  Cc: Björn Töpel
In-Reply-To: <20180518120024.8588-1-bjorn.topel@gmail.com>

On 05/18/2018 02:00 PM, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> This series contain "cosmetics only" follow-up patches for AF_XDP.
> 
> Thanks to Daniel for suggesting them!
> 
> Björn Töpel (4):
>   xsk: clean up SPDX headers
>   xsk: remove newline at end of file
>   xsk: fixed some cases of unnecessary parentheses
>   xsk: proper '=' alignment

Applied to bpf-next, thanks Björn!

^ permalink raw reply

* Re: [PATCH net] tuntap: raise EPOLLOUT on device up
From: Jason Wang @ 2018-05-18 14:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, linux-kernel, Hannes Frederic Sowa, Eric Dumazet
In-Reply-To: <20180518170155-mutt-send-email-mst@kernel.org>



On 2018年05月18日 22:06, Michael S. Tsirkin wrote:
> On Fri, May 18, 2018 at 10:00:31PM +0800, Jason Wang wrote:
>>
>> On 2018年05月18日 21:26, Jason Wang wrote:
>>>
>>> On 2018年05月18日 21:13, Michael S. Tsirkin wrote:
>>>> On Fri, May 18, 2018 at 09:00:43PM +0800, Jason Wang wrote:
>>>>> We return -EIO on device down but can not raise EPOLLOUT after it was
>>>>> up. This may confuse user like vhost which expects tuntap to raise
>>>>> EPOLLOUT to re-enable its TX routine after tuntap is down. This could
>>>>> be easily reproduced by transmitting packets from VM while down and up
>>>>> the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.
>>>>>
>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>>>> Cc: Eric Dumazet <edumazet@google.com>
>>>>> Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>> ---
>>>>>    drivers/net/tun.c | 4 +++-
>>>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>>> index d45ac37..1b29761 100644
>>>>> --- a/drivers/net/tun.c
>>>>> +++ b/drivers/net/tun.c
>>>>> @@ -1734,8 +1734,10 @@ static ssize_t tun_get_user(struct
>>>>> tun_struct *tun, struct tun_file *tfile,
>>>>>        int skb_xdp = 1;
>>>>>        bool frags = tun_napi_frags_enabled(tun);
>>>>>    -    if (!(tun->dev->flags & IFF_UP))
>>>>> +    if (!(tun->dev->flags & IFF_UP)) {
>>>> Isn't this racy?  What if flag is cleared at this point?
>>> I think you mean "set at this point"? Then yes, so we probably need to
>>> set the bit during tun_net_close().
>>>
>>> Thanks
>> Looks no need, vhost will poll socket after it see EIO. So we are ok here?
>>
>> Thanks
> In fact I don't even understand why does this help any longer.
>

We disable tx polling and only enable it on demand for a better rx 
performance. You may want to have a look at :

commit feb8892cb441c742d4220cf7ced001e7fa070731
Author: Jason Wang <jasowang@redhat.com>
Date:   Mon Nov 13 11:45:34 2017 +0800

     vhost_net: conditionally enable tx polling

Thanks

^ permalink raw reply

* Re: [PATCH v3 net-next 00/12] net: stmmac: Clean-up and tune-up
From: Corentin Labbe @ 2018-05-18 14:12 UTC (permalink / raw)
  To: Jose Abreu
  Cc: netdev, David S. Miller, Joao Pinto, Vitor Soares,
	Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <cover.1526651009.git.joabreu@synopsys.com>

On Fri, May 18, 2018 at 02:55:57PM +0100, Jose Abreu wrote:
> This targets to uniformize the handling of the different GMAC versions in
> stmmac_main.c file and also tune-up the HW.
> 
> Currently there are some if/else conditions in the main source file which
> calls different callbacks depending on the ID of GMAC.
> 
> With the introducion of a generic HW interface handling which automatically
> selects the GMAC callbacks to be used, it is now unpleasant to see if
> conditions in the main code because this should be completely agnostic of the
> GMAC version.
> 
> This series removes most of these conditions. There are some if conditions
> that remain untouched but the callbacks handling are now uniformized.
> 
> Tested in GMAC5, hope I didn't break any previous versions.
> 
> Please check [1] for performance analisys of patches 3-12.
> 
> ---
> David,
> 
> This will probably generate a merge conflict with [2] (which was not merged
> yet). I'm waiting for Corentin input and then, if this series is merged
> before, I will rebase [2]. Or the other way around if you prefer :D
> 
> Thanks
> ---
> 
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Joao Pinto <jpinto@synopsys.com>
> Cc: Vitor Soares <soares@synopsys.com>
> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> Cc: Alexandre Torgue <alexandre.torgue@st.com>
> 
> [1] https://marc.info/?l=linux-netdev&m=152656352607905&w=2
> [2] https://patchwork.ozlabs.org/patch/915286/
> 
> Jose Abreu (12):
>   net: stmmac: Enable OSP for GMAC4
>   net: stmmac: Do not keep rearming the coalesce timer in stmmac_xmit
>   net: stmmac: Let descriptor code set skbuff address
>   net: stmmac: Let descriptor code clear the descriptor
>   net: stmmac: Uniformize the use of dma_{rx/tx}_mode callbacks
>   net: stmmac: Remove uneeded checks for GMAC version
>   net: stmmac: Move PTP and MMC base address calculation to hwif.c
>   net: stmmac: Uniformize the use of dma_init_* callbacks
>   net: stmmac: Remove uneeded check for GMAC version in stmmac_xmit
>   net: stmmac: Uniformize set_rx_owner()
>   net: stmmac: Let descriptor code get skbuff address
>   net: stmmac: Remove if condition by taking advantage of hwif return
>     code
> 
>  drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c  |   82 +++++---
>  .../net/ethernet/stmicro/stmmac/dwmac1000_dma.c    |   92 ++++++----
>  drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c |   35 +++--
>  drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c |   34 +++-
>  drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c   |    7 +-
>  drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h   |    1 -
>  drivers/net/ethernet/stmicro/stmmac/enh_desc.c     |   20 ++-
>  drivers/net/ethernet/stmicro/stmmac/hwif.c         |   34 ++++
>  drivers/net/ethernet/stmicro/stmmac/hwif.h         |   27 ++-
>  drivers/net/ethernet/stmicro/stmmac/norm_desc.c    |   20 ++-
>  drivers/net/ethernet/stmicro/stmmac/stmmac.h       |    1 +
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  198 +++++++-------------
>  12 files changed, 323 insertions(+), 228 deletions(-)
> 
> 

Hello

You didnt have put in CC linux-kernel@vger.kernel.org as required by get_maintener.pl letting more people to see this series.
Since this series touch dwmac-sun8i.c you should have also added Chen-Yu Tsai/Maxime Ripard (as also asked by get_maintainer).

Regards

^ permalink raw reply

* Re: [PATCH net] net: sched: red: avoid hashing NULL child
From: Jiri Kosina @ 2018-05-18 14:22 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko, David S. Miller,
	Hangbin Liu
In-Reply-To: <286744ab66eaa07d61d5d8ce8a07cb0b1615d35c.1526647813.git.pabeni@redhat.com>

On Fri, 18 May 2018, Paolo Abeni wrote:

> Hangbin reported an Oops triggered by the syzkaller qdisc rules:
> 
>  kasan: GPF could be caused by NULL-ptr deref or user memory access
>  general protection fault: 0000 [#1] SMP KASAN PTI
>  Modules linked in: sch_red
>  CPU: 0 PID: 28699 Comm: syz-executor5 Not tainted 4.17.0-rc4.kcov #1
>  Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>  RIP: 0010:qdisc_hash_add+0x26/0xa0
>  RSP: 0018:ffff8800589cf470 EFLAGS: 00010203
>  RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff824ad971
>  RDX: 0000000000000007 RSI: ffffc9000ce9f000 RDI: 000000000000003c
>  RBP: 0000000000000001 R08: ffffed000b139ea2 R09: ffff8800589cf4f0
>  R10: ffff8800589cf50f R11: ffffed000b139ea2 R12: ffff880054019fc0
>  R13: ffff880054019fb4 R14: ffff88005c0af600 R15: ffff880054019fb0
>  FS:  00007fa6edcb1700(0000) GS:ffff88005ce00000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 0000000020000740 CR3: 000000000fc16000 CR4: 00000000000006f0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  Call Trace:
>   red_change+0x2d2/0xed0 [sch_red]
>   qdisc_create+0x57e/0xef0
>   tc_modify_qdisc+0x47f/0x14e0
>   rtnetlink_rcv_msg+0x6a8/0x920
>   netlink_rcv_skb+0x2a2/0x3c0
>   netlink_unicast+0x511/0x740
>   netlink_sendmsg+0x825/0xc30
>   sock_sendmsg+0xc5/0x100
>   ___sys_sendmsg+0x778/0x8e0
>   __sys_sendmsg+0xf5/0x1b0
>   do_syscall_64+0xbd/0x3b0
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
>  RIP: 0033:0x450869
>  RSP: 002b:00007fa6edcb0c48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
>  RAX: ffffffffffffffda RBX: 00007fa6edcb16b4 RCX: 0000000000450869
>  RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000013
>  RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
>  R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
>  R13: 0000000000008778 R14: 0000000000702838 R15: 00007fa6edcb1700
>  Code: e9 0b fe ff ff 0f 1f 44 00 00 55 53 48 89 fb 89 f5 e8 3f 07 f3 fe 48 8d 7b 3c 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 51
>  RIP: qdisc_hash_add+0x26/0xa0 RSP: ffff8800589cf470
> 
> When a red qdisc is updated with a 0 limit, the child qdisc is left
> unmodified, no additional scheduler is created in red_change(),
> the 'child' local variable is rightfully NULL and must not add it
> to the hash table.
> 
> This change addresses the above issue moving qdisc_hash_add() right
> after the child qdisc creation. It additionally removes unneeded checks
> for noop_qdisc.
> 
> Reported-by: Hangbin Liu <liuhangbin@gmail.com>
> Fixes: 49b499718fa1 ("net: sched: make default fifo qdiscs appear in the dump")

Acked-by: Jiri Kosina <jkosina@suse.cz>

Thanks for fixing this Paolo.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply

* Re: [PATCH v3 net-next 00/12] net: stmmac: Clean-up and tune-up
From: Jose Abreu @ 2018-05-18 14:23 UTC (permalink / raw)
  To: Corentin Labbe, Jose Abreu
  Cc: netdev, David S. Miller, Joao Pinto, Vitor Soares,
	Giuseppe Cavallaro, Alexandre Torgue, Maxime Ripard, Chen-Yu Tsai
In-Reply-To: <20180518141201.GA11284@Red>

Hi Corentin,

On 18-05-2018 15:12, Corentin Labbe wrote:
> On Fri, May 18, 2018 at 02:55:57PM +0100, Jose Abreu wrote:
>> This targets to uniformize the handling of the different GMAC versions in
>> stmmac_main.c file and also tune-up the HW.
>>
>> Currently there are some if/else conditions in the main source file which
>> calls different callbacks depending on the ID of GMAC.
>>
>> With the introducion of a generic HW interface handling which automatically
>> selects the GMAC callbacks to be used, it is now unpleasant to see if
>> conditions in the main code because this should be completely agnostic of the
>> GMAC version.
>>
>> This series removes most of these conditions. There are some if conditions
>> that remain untouched but the callbacks handling are now uniformized.
>>
>> Tested in GMAC5, hope I didn't break any previous versions.
>>
>> Please check [1] for performance analisys of patches 3-12.
>>
>> ---
>> David,
>>
>> This will probably generate a merge conflict with [2] (which was not merged
>> yet). I'm waiting for Corentin input and then, if this series is merged
>> before, I will rebase [2]. Or the other way around if you prefer :D
>>
>> Thanks
>> ---
>>
>> Cc: David S. Miller <davem@davemloft.net>
>> Cc: Joao Pinto <jpinto@synopsys.com>
>> Cc: Vitor Soares <soares@synopsys.com>
>> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
>> Cc: Alexandre Torgue <alexandre.torgue@st.com>
>>
>> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__marc.info_-3Fl-3Dlinux-2Dnetdev-26m-3D152656352607905-26w-3D2&d=DwIBAg&c=DPL6_X_6JkXFx7AXWqB0tg&r=WHDsc6kcWAl4i96Vm5hJ_19IJiuxx_p_Rzo2g-uHDKw&m=a7bgrSQpisaMSa5fT-je94smZ_TM7QTxNFKqkvI5Nns&s=Tr23Xj_UCR_PaJp8AYiy18hfhbILnsaCsKDT5_4m2z4&e=
>> [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.ozlabs.org_patch_915286_&d=DwIBAg&c=DPL6_X_6JkXFx7AXWqB0tg&r=WHDsc6kcWAl4i96Vm5hJ_19IJiuxx_p_Rzo2g-uHDKw&m=a7bgrSQpisaMSa5fT-je94smZ_TM7QTxNFKqkvI5Nns&s=Q0SV-ZR35zIJWjiaLNXqlOWchppQ2CsO-Fh-BFCjCB8&e=
>>
>> Jose Abreu (12):
>>   net: stmmac: Enable OSP for GMAC4
>>   net: stmmac: Do not keep rearming the coalesce timer in stmmac_xmit
>>   net: stmmac: Let descriptor code set skbuff address
>>   net: stmmac: Let descriptor code clear the descriptor
>>   net: stmmac: Uniformize the use of dma_{rx/tx}_mode callbacks
>>   net: stmmac: Remove uneeded checks for GMAC version
>>   net: stmmac: Move PTP and MMC base address calculation to hwif.c
>>   net: stmmac: Uniformize the use of dma_init_* callbacks
>>   net: stmmac: Remove uneeded check for GMAC version in stmmac_xmit
>>   net: stmmac: Uniformize set_rx_owner()
>>   net: stmmac: Let descriptor code get skbuff address
>>   net: stmmac: Remove if condition by taking advantage of hwif return
>>     code
>>
>>  drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c  |   82 +++++---
>>  .../net/ethernet/stmicro/stmmac/dwmac1000_dma.c    |   92 ++++++----
>>  drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c |   35 +++--
>>  drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c |   34 +++-
>>  drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c   |    7 +-
>>  drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h   |    1 -
>>  drivers/net/ethernet/stmicro/stmmac/enh_desc.c     |   20 ++-
>>  drivers/net/ethernet/stmicro/stmmac/hwif.c         |   34 ++++
>>  drivers/net/ethernet/stmicro/stmmac/hwif.h         |   27 ++-
>>  drivers/net/ethernet/stmicro/stmmac/norm_desc.c    |   20 ++-
>>  drivers/net/ethernet/stmicro/stmmac/stmmac.h       |    1 +
>>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  198 +++++++-------------
>>  12 files changed, 323 insertions(+), 228 deletions(-)
>>
>>
> Hello
>
> You didnt have put in CC linux-kernel@vger.kernel.org as required by get_maintener.pl letting more people to see this series.
> Since this series touch dwmac-sun8i.c you should have also added Chen-Yu Tsai/Maxime Ripard (as also asked by get_maintainer).

Usually I just cc according to MAINTAINERS file but thanks for
noticing. Added in cc now.

Thanks and Best Regards,
Jose Miguel Abreu

>
> Regards

^ permalink raw reply

* Re: [PATCH net-next] net: stmmac: Populate missing callbacks in HWIF initialization
From: Corentin Labbe @ 2018-05-18 14:25 UTC (permalink / raw)
  To: Jose Abreu
  Cc: netdev, David S. Miller, Joao Pinto, Giuseppe Cavallaro,
	Alexandre Torgue
In-Reply-To: <a08345ae8576d5ebddcd31ee73beea18484a671d.1526550924.git.joabreu@synopsys.com>

On Thu, May 17, 2018 at 10:57:28AM +0100, Jose Abreu wrote:
> Some HW specific setusp, like sun8i, do not populate all the necessary
> callbacks, which is what HWIF helpers were expecting.
> 
> Fix this by always trying to get the generic helpers and populate them
> if they were not previously populated by HW specific setup.
> 
> Signed-off-by: Jose Abreu <joabreu@synopsys.com>
> Fixes: 5f0456b43140 ("net: stmmac: Implement logic to automatically
> select HW Interface")
> Reported-by: Corentin Labbe <clabbe.montjoie@gmail.com>
> Cc: Corentin Labbe <clabbe.montjoie@gmail.com>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Joao Pinto <jpinto@synopsys.com>
> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> Cc: Alexandre Torgue <alexandre.torgue@st.com>
> ---
> Hi Corentin,
> 
> Please check if this patch makes sun8i work again.
> 
> Thanks and Best Regards,
> Jose Miguel Abreu
> ---

Hello

Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>

Thanks for the quick fix.

Note that this patch conflict with your next v3 serie

Regards

^ permalink raw reply

* Re: [PATCH v3 net-next 00/12] net: stmmac: Clean-up and tune-up
From: Corentin Labbe @ 2018-05-18 14:28 UTC (permalink / raw)
  To: Jose Abreu
  Cc: netdev, David S. Miller, Joao Pinto, Vitor Soares,
	Giuseppe Cavallaro, Alexandre Torgue, Maxime Ripard, Chen-Yu Tsai
In-Reply-To: <a42375f2-6521-f959-75ea-2a5be1680be9@synopsys.com>

On Fri, May 18, 2018 at 03:23:44PM +0100, Jose Abreu wrote:
> Hi Corentin,
> 
> On 18-05-2018 15:12, Corentin Labbe wrote:
> > On Fri, May 18, 2018 at 02:55:57PM +0100, Jose Abreu wrote:
> >> This targets to uniformize the handling of the different GMAC versions in
> >> stmmac_main.c file and also tune-up the HW.
> >>
> >> Currently there are some if/else conditions in the main source file which
> >> calls different callbacks depending on the ID of GMAC.
> >>
> >> With the introducion of a generic HW interface handling which automatically
> >> selects the GMAC callbacks to be used, it is now unpleasant to see if
> >> conditions in the main code because this should be completely agnostic of the
> >> GMAC version.
> >>
> >> This series removes most of these conditions. There are some if conditions
> >> that remain untouched but the callbacks handling are now uniformized.
> >>
> >> Tested in GMAC5, hope I didn't break any previous versions.
> >>
> >> Please check [1] for performance analisys of patches 3-12.
> >>
> >> ---
> >> David,
> >>
> >> This will probably generate a merge conflict with [2] (which was not merged
> >> yet). I'm waiting for Corentin input and then, if this series is merged
> >> before, I will rebase [2]. Or the other way around if you prefer :D
> >>
> >> Thanks
> >> ---
> >>
> >> Cc: David S. Miller <davem@davemloft.net>
> >> Cc: Joao Pinto <jpinto@synopsys.com>
> >> Cc: Vitor Soares <soares@synopsys.com>
> >> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
> >> Cc: Alexandre Torgue <alexandre.torgue@st.com>
> >>
> >> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__marc.info_-3Fl-3Dlinux-2Dnetdev-26m-3D152656352607905-26w-3D2&d=DwIBAg&c=DPL6_X_6JkXFx7AXWqB0tg&r=WHDsc6kcWAl4i96Vm5hJ_19IJiuxx_p_Rzo2g-uHDKw&m=a7bgrSQpisaMSa5fT-je94smZ_TM7QTxNFKqkvI5Nns&s=Tr23Xj_UCR_PaJp8AYiy18hfhbILnsaCsKDT5_4m2z4&e=
> >> [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.ozlabs.org_patch_915286_&d=DwIBAg&c=DPL6_X_6JkXFx7AXWqB0tg&r=WHDsc6kcWAl4i96Vm5hJ_19IJiuxx_p_Rzo2g-uHDKw&m=a7bgrSQpisaMSa5fT-je94smZ_TM7QTxNFKqkvI5Nns&s=Q0SV-ZR35zIJWjiaLNXqlOWchppQ2CsO-Fh-BFCjCB8&e=
> >>
> >> Jose Abreu (12):
> >>   net: stmmac: Enable OSP for GMAC4
> >>   net: stmmac: Do not keep rearming the coalesce timer in stmmac_xmit
> >>   net: stmmac: Let descriptor code set skbuff address
> >>   net: stmmac: Let descriptor code clear the descriptor
> >>   net: stmmac: Uniformize the use of dma_{rx/tx}_mode callbacks
> >>   net: stmmac: Remove uneeded checks for GMAC version
> >>   net: stmmac: Move PTP and MMC base address calculation to hwif.c
> >>   net: stmmac: Uniformize the use of dma_init_* callbacks
> >>   net: stmmac: Remove uneeded check for GMAC version in stmmac_xmit
> >>   net: stmmac: Uniformize set_rx_owner()
> >>   net: stmmac: Let descriptor code get skbuff address
> >>   net: stmmac: Remove if condition by taking advantage of hwif return
> >>     code
> >>
> >>  drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c  |   82 +++++---
> >>  .../net/ethernet/stmicro/stmmac/dwmac1000_dma.c    |   92 ++++++----
> >>  drivers/net/ethernet/stmicro/stmmac/dwmac100_dma.c |   35 +++--
> >>  drivers/net/ethernet/stmicro/stmmac/dwmac4_descs.c |   34 +++-
> >>  drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c   |    7 +-
> >>  drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h   |    1 -
> >>  drivers/net/ethernet/stmicro/stmmac/enh_desc.c     |   20 ++-
> >>  drivers/net/ethernet/stmicro/stmmac/hwif.c         |   34 ++++
> >>  drivers/net/ethernet/stmicro/stmmac/hwif.h         |   27 ++-
> >>  drivers/net/ethernet/stmicro/stmmac/norm_desc.c    |   20 ++-
> >>  drivers/net/ethernet/stmicro/stmmac/stmmac.h       |    1 +
> >>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  198 +++++++-------------
> >>  12 files changed, 323 insertions(+), 228 deletions(-)
> >>
> >>
> > Hello
> >
> > You didnt have put in CC linux-kernel@vger.kernel.org as required by get_maintener.pl letting more people to see this series.
> > Since this series touch dwmac-sun8i.c you should have also added Chen-Yu Tsai/Maxime Ripard (as also asked by get_maintainer).
> 
> Usually I just cc according to MAINTAINERS file but thanks for
> noticing. Added in cc now.
> 

./scripts/get_maintainer.pl does this for you (and it use MAINTAINERS).
You have to use it at least, since it handle regex that could be easily unseen like for dwmac-sun8i.

^ permalink raw reply

* Re: [RFC v4 3/5] virtio_ring: add packed ring support
From: Tiwei Bie @ 2018-05-18 14:33 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann
In-Reply-To: <e247b46a-183f-3959-859b-aebb0bf81c07@redhat.com>

On Fri, May 18, 2018 at 09:17:05PM +0800, Jason Wang wrote:
> On 2018年05月18日 19:29, Tiwei Bie wrote:
> > On Thu, May 17, 2018 at 08:01:52PM +0800, Jason Wang wrote:
> > > On 2018年05月16日 22:33, Tiwei Bie wrote:
> > > > On Wed, May 16, 2018 at 10:05:44PM +0800, Jason Wang wrote:
> > > > > On 2018年05月16日 21:45, Tiwei Bie wrote:
> > > > > > On Wed, May 16, 2018 at 08:51:43PM +0800, Jason Wang wrote:
> > > > > > > On 2018年05月16日 20:39, Tiwei Bie wrote:
> > > > > > > > On Wed, May 16, 2018 at 07:50:16PM +0800, Jason Wang wrote:
> > > > > > > > > On 2018年05月16日 16:37, Tiwei Bie wrote:
> > > > [...]
> > > > > > > > > > +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
> > > > > > > > > > +			      unsigned int id, void **ctx)
> > > > > > > > > > +{
> > > > > > > > > > +	struct vring_packed_desc *desc;
> > > > > > > > > > +	unsigned int i, j;
> > > > > > > > > > +
> > > > > > > > > > +	/* Clear data ptr. */
> > > > > > > > > > +	vq->desc_state[id].data = NULL;
> > > > > > > > > > +
> > > > > > > > > > +	i = head;
> > > > > > > > > > +
> > > > > > > > > > +	for (j = 0; j < vq->desc_state[id].num; j++) {
> > > > > > > > > > +		desc = &vq->vring_packed.desc[i];
> > > > > > > > > > +		vring_unmap_one_packed(vq, desc);
> > > > > > > > > As mentioned in previous discussion, this probably won't work for the case
> > > > > > > > > of out of order completion since it depends on the information in the
> > > > > > > > > descriptor ring. We probably need to extend ctx to record such information.
> > > > > > > > Above code doesn't depend on the information in the descriptor
> > > > > > > > ring. The vq->desc_state[] is the extended ctx.
> > > > > > > > 
> > > > > > > > Best regards,
> > > > > > > > Tiwei Bie
> > > > > > > Yes, but desc is a pointer to descriptor ring I think so
> > > > > > > vring_unmap_one_packed() still depends on the content of descriptor ring?
> > > > > > > 
> > > > > > I got your point now. I think it makes sense to reserve
> > > > > > the bits of the addr field. Driver shouldn't try to get
> > > > > > addrs from the descriptors when cleanup the descriptors
> > > > > > no matter whether we support out-of-order or not.
> > > > > Maybe I was wrong, but I remember spec mentioned something like this.
> > > > You're right. Spec mentioned this. I was just repeating
> > > > the spec to emphasize that it does make sense. :)
> > > > 
> > > > > > But combining it with the out-of-order support, it will
> > > > > > mean that the driver still needs to maintain a desc/ctx
> > > > > > list that is very similar to the desc ring in the split
> > > > > > ring. I'm not quite sure whether it's something we want.
> > > > > > If it is true, I'll do it. So do you think we also want
> > > > > > to maintain such a desc/ctx list for packed ring?
> > > > > To make it work for OOO backends I think we need something like this
> > > > > (hardware NIC drivers are usually have something like this).
> > > > Which hardware NIC drivers have this?
> > > It's quite common I think, e.g driver track e.g dma addr and page frag
> > > somewhere. e.g the ring->rx_info in mlx4 driver.
> > It seems that I had a misunderstanding on your
> > previous comments. I know it's quite common for
> > drivers to track e.g. DMA addrs somewhere (and
> > I think one reason behind this is that they want
> > to reuse the bits of addr field).
> 
> Yes, we may want this for virtio-net as well in the future.
> 
> >   But tracking
> > addrs somewhere doesn't means supporting OOO.
> > I thought you were saying it's quite common for
> > hardware NIC drivers to support OOO (i.e. NICs
> > will return the descriptors OOO):
> > 
> > I'm not familiar with mlx4, maybe I'm wrong.
> > I just had a quick glance. And I found below
> > comments in mlx4_en_process_rx_cq():
> > 
> > ```
> > /* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
> >   * descriptor offset can be deduced from the CQE index instead of
> >   * reading 'cqe->index' */
> > index = cq->mcq.cons_index & ring->size_mask;
> > cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor;
> > ```
> > 
> > It seems that although they have a completion
> > queue, they are still using the ring in order.
> 
> I guess so (at least from the above bits). Git grep -i "out of order" in
> drivers/net gives some hints. Looks like there're few deivces do this.
> 
> > I guess maybe storage device may want OOO.
> 
> Right, some iSCSI did.
> 
> But tracking them elsewhere is not only for OOO.
> 
> Spec said:
> 
> for element address
> 
> "
> In a used descriptor, Element Address is unused.
> "
> 
> for Next flag:
> 
> "
> For example, if descriptors are used in the same order in which they are
> made available, this will result in
> the used descriptor overwriting the first available descriptor in the list,
> the used descriptor for the next list
> overwriting the first available descriptor in the next list, etc.
> "
> 
> for in order completion:
> 
> "
> This will result in the used descriptor overwriting the first available
> descriptor in the batch, the used descriptor
> for the next batch overwriting the first available descriptor in the next
> batch, etc.
> "
> 
> So:
> 
> - It's an alignment to the spec
> - device may (or should) overwrite the descriptor make also make address
> field useless.

You didn't get my point...

I agreed driver should track the DMA addrs or some
other necessary things from the very beginning. And
I also repeated the spec to emphasize that it does
make sense. And I'd like to do that.

What I was saying is that, to support OOO, we may
need to manage these context (which saves DMA addrs
etc) via a list which is similar to the desc list
maintained via `next` in split ring instead of an
array whose elements always can be indexed directly.

The desc ring in split ring is an array, but its
free entries are managed as list via next. I was
just wondering, do we want to manage such a list
because of OOO. It's just a very simple question
that I want to hear your opinion... (It doesn't
means anything, e.g. It doesn't mean I don't want
to support OOO. It's just a simple question...)

Best regards,
Tiwei Bie

> 
> Thanks
> 
> > 
> > Best regards,
> > Tiwei Bie
> > 
> > > Thanks
> > > 
> > > > > Not for the patch, but it looks like having a OUT_OF_ORDER feature bit is
> > > > > much more simpler to be started with.
> > > > +1
> > > > 
> > > > Best regards,
> > > > Tiwei Bie
> 

^ permalink raw reply

* Re: [PATCH] hippi: fix spelling mistake: "Framming" -> "Framing"
From: Jes Sorensen @ 2018-05-18 14:33 UTC (permalink / raw)
  To: Colin King, David S . Miller, netdev; +Cc: kernel-janitors, linux-kernel
In-Reply-To: <20180518100922.28790-1-colin.king@canonical.com>

On 05/18/2018 06:09 AM, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> Trivial fix to spelling mistake in printk message text
> 
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
> ---
>  drivers/net/hippi/rrunner.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/hippi/rrunner.c b/drivers/net/hippi/rrunner.c
> index 1ab97d99b9ba..f41116488079 100644
> --- a/drivers/net/hippi/rrunner.c
> +++ b/drivers/net/hippi/rrunner.c
> @@ -867,7 +867,7 @@ static u32 rr_handle_event(struct net_device *dev, u32 prodidx, u32 eidx)
>  			       dev->name);
>  			goto drop;
>  		case E_FRM_ERR:
> -			printk(KERN_WARNING "%s: Framming Error\n",
> +			printk(KERN_WARNING "%s: Framing Error\n",
>  			       dev->name);
>  			goto drop;
>  		case E_FLG_SYN_ERR:
> 

Fine with me! I do wonder if it's time to retire this driver and the
HIPPI code. I haven't had access to hardware for over a decade so I have
no idea if it even works anymore.

Jes

^ permalink raw reply

* Re: [PATCH 1/2] bpf: sockmap, double free in __sock_map_ctx_update_elem()
From: Dan Carpenter @ 2018-05-18 14:39 UTC (permalink / raw)
  To: Daniel Borkmann, Gustavo A. R. Silva
  Cc: Alexei Starovoitov, John Fastabend, netdev, kernel-janitors
In-Reply-To: <e9dc40fb-ffa6-58af-d9df-eaab2d0a9259@iogearbox.net>

On Fri, May 18, 2018 at 10:27:18AM +0200, Daniel Borkmann wrote:
> 
> Thanks for the two fixes, appreciate it! There were two similar ones that
> fix the same issues which were already applied yesterday to bpf-next:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=0e4364560361d57e8cd873a8990327f3471d7d8a
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=a78622932c27e8ec33e5ba180f3d2e87fb806b28

Hey Gustavo,

We're sort of duplicating each other's work.  Could you CC
kernel-janitors@vger.kernel.org for static checker fixes so that I can
see what you're working on?

We'll probably still send the occasional duplicate which is fine...

regards,
dan carpenter

^ permalink raw reply

* Re: [PATCH net] tuntap: raise EPOLLOUT on device up
From: Michael S. Tsirkin @ 2018-05-18 14:46 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, Hannes Frederic Sowa, Eric Dumazet
In-Reply-To: <ac404f5e-22ea-00e7-8415-63116ce4881e@redhat.com>

On Fri, May 18, 2018 at 10:11:54PM +0800, Jason Wang wrote:
> 
> 
> On 2018年05月18日 22:06, Michael S. Tsirkin wrote:
> > On Fri, May 18, 2018 at 10:00:31PM +0800, Jason Wang wrote:
> > > 
> > > On 2018年05月18日 21:26, Jason Wang wrote:
> > > > 
> > > > On 2018年05月18日 21:13, Michael S. Tsirkin wrote:
> > > > > On Fri, May 18, 2018 at 09:00:43PM +0800, Jason Wang wrote:
> > > > > > We return -EIO on device down but can not raise EPOLLOUT after it was
> > > > > > up. This may confuse user like vhost which expects tuntap to raise
> > > > > > EPOLLOUT to re-enable its TX routine after tuntap is down. This could
> > > > > > be easily reproduced by transmitting packets from VM while down and up
> > > > > > the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.
> > > > > > 
> > > > > > Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > > > > > Cc: Eric Dumazet <edumazet@google.com>
> > > > > > Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
> > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > ---
> > > > > >    drivers/net/tun.c | 4 +++-
> > > > > >    1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > > > > index d45ac37..1b29761 100644
> > > > > > --- a/drivers/net/tun.c
> > > > > > +++ b/drivers/net/tun.c
> > > > > > @@ -1734,8 +1734,10 @@ static ssize_t tun_get_user(struct
> > > > > > tun_struct *tun, struct tun_file *tfile,
> > > > > >        int skb_xdp = 1;
> > > > > >        bool frags = tun_napi_frags_enabled(tun);
> > > > > >    -    if (!(tun->dev->flags & IFF_UP))
> > > > > > +    if (!(tun->dev->flags & IFF_UP)) {
> > > > > Isn't this racy?  What if flag is cleared at this point?
> > > > I think you mean "set at this point"? Then yes, so we probably need to
> > > > set the bit during tun_net_close().
> > > > 
> > > > Thanks
> > > Looks no need, vhost will poll socket after it see EIO. So we are ok here?
> > > 
> > > Thanks
> > In fact I don't even understand why does this help any longer.
> > 
> 
> We disable tx polling and only enable it on demand for a better rx
> performance. You may want to have a look at :
> 
> commit feb8892cb441c742d4220cf7ced001e7fa070731
> Author: Jason Wang <jasowang@redhat.com>
> Date:   Mon Nov 13 11:45:34 2017 +0800
> 
>     vhost_net: conditionally enable tx polling
> 
> Thanks


Question is, what looks at SOCKWQ_ASYNC_NOSPACE.
I think it's tested when packet is transmitted,
but there is no guarantee here any packet will
ever be transmitted.

-- 
MST

^ permalink raw reply

* Re: [PATCH v3 net-next 00/12] net: stmmac: Clean-up and tune-up
From: David Miller @ 2018-05-18 15:09 UTC (permalink / raw)
  To: Jose.Abreu
  Cc: netdev, Joao.Pinto, Vitor.Soares, peppe.cavallaro,
	alexandre.torgue
In-Reply-To: <cover.1526651009.git.joabreu@synopsys.com>

From: Jose Abreu <Jose.Abreu@synopsys.com>
Date: Fri, 18 May 2018 14:55:57 +0100

> This targets to uniformize the handling of the different GMAC versions in
> stmmac_main.c file and also tune-up the HW.
> 
> Currently there are some if/else conditions in the main source file which
> calls different callbacks depending on the ID of GMAC.
> 
> With the introducion of a generic HW interface handling which automatically
> selects the GMAC callbacks to be used, it is now unpleasant to see if
> conditions in the main code because this should be completely agnostic of the
> GMAC version.
> 
> This series removes most of these conditions. There are some if conditions
> that remain untouched but the callbacks handling are now uniformized.
> 
> Tested in GMAC5, hope I didn't break any previous versions.
> 
> Please check [1] for performance analisys of patches 3-12.

This looks a lot better, series applied, thanks.


> This will probably generate a merge conflict with [2] (which was not merged
> yet). I'm waiting for Corentin input and then, if this series is merged
> before, I will rebase [2]. Or the other way around if you prefer :D

Since I just merged this, please rebase 2.

Thank you.

^ permalink raw reply

* Re: VLAN performance issues with Open vSwitch when 8021q not loaded
From: Eric Dumazet @ 2018-05-18 15:12 UTC (permalink / raw)
  To: Mikhail Sennikovsky, netdev
In-Reply-To: <CALHVEJaaY=DeavKjBkcuDT19EpLhmCyJunP1V7hyNJsCkqvrdw@mail.gmail.com>



On 05/18/2018 06:24 AM, Mikhail Sennikovsky wrote:
> Hi all,
> 
> We were recently experiencing network performance issues with VLAN
> networking setup with Open vSwitch, for the ingress traffic coming to
> VLAN trunk port of the ovs.
> As we discovered, the issue was caused by gro not working for it,
> which in turn was because the gro receive callbacks for 802.1Q payload
> type are defined in the 8021q module (see
> https://elixir.bootlin.com/linux/v4.16.9/source/net/8021q/vlan.c#L762
> ), which was not loaded.
> This resulted in a significant bandwidth performance drop, having
> ~3Gbps instead of the expected ~7Gbps for a simple iperf3 test in our
> case.
> 
> The obvious work-around would be to load the 8021q module, which
> indeed makes bandwidth performance back to the expected numbers.
> This seems like a hidden and not obvious magic however.
> 
> So I'm questioning, whether it makes sense to have the gro receive
> callbacks for 802.1Q and 802.1ad moved to some common place, that
> would be used/enabled by both 8021q and openvswitch modules.


Sure, GRO should do whatever it can.

TCP IPv6 traffic can be aggregated by GRO even if IPv6 module is not loaded.

net/ipv6/Makefile :

ipv6-offload := ip6_offload.o tcpv6_offload.o exthdrs_offload.o
...
obj-$(CONFIG_INET) += output_core.o protocol.o $(ipv6-offload)

^ permalink raw reply

* Re: [PATCH bpf v2 1/6] bpf: support 64-bit offsets for bpf function calls
From: Daniel Borkmann @ 2018-05-18 15:15 UTC (permalink / raw)
  To: Sandipan Das, ast; +Cc: netdev, linuxppc-dev, naveen.n.rao, mpe, jakub.kicinski
In-Reply-To: <20180518125039.6500-2-sandipan@linux.vnet.ibm.com>

On 05/18/2018 02:50 PM, Sandipan Das wrote:
> The imm field of a bpf instruction is a signed 32-bit integer.
> For JIT bpf-to-bpf function calls, it stores the offset of the
> start address of the callee's JITed image from __bpf_call_base.
> 
> For some architectures, such as powerpc64, this offset may be
> as large as 64 bits and cannot be accomodated in the imm field
> without truncation.
> 
> We resolve this by:
> 
> [1] Additionally using the auxillary data of each function to
>     keep a list of start addresses of the JITed images for all
>     functions determined by the verifier.
> 
> [2] Retaining the subprog id inside the off field of the call
>     instructions and using it to index into the list mentioned
>     above and lookup the callee's address.
> 
> To make sure that the existing JIT compilers continue to work
> without requiring changes, we keep the imm field as it is.
> 
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
>  kernel/bpf/verifier.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index a9e4b1372da6..6c56cce9c4e3 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5383,11 +5383,24 @@ static int jit_subprogs(struct bpf_verifier_env *env)
>  			    insn->src_reg != BPF_PSEUDO_CALL)
>  				continue;
>  			subprog = insn->off;
> -			insn->off = 0;
>  			insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
>  				func[subprog]->bpf_func -
>  				__bpf_call_base;
>  		}
> +
> +		/* we use the aux data to keep a list of the start addresses
> +		 * of the JITed images for each function in the program
> +		 *
> +		 * for some architectures, such as powerpc64, the imm field
> +		 * might not be large enough to hold the offset of the start
> +		 * address of the callee's JITed image from __bpf_call_base
> +		 *
> +		 * in such cases, we can lookup the start address of a callee
> +		 * by using its subprog id, available from the off field of
> +		 * the call instruction, as an index for this list
> +		 */
> +		func[i]->aux->func = func;
> +		func[i]->aux->func_cnt = env->subprog_cnt + 1;

The target tree you have here is infact bpf, since in bpf-next there was a
cleanup where the + 1 is removed. Just for the record that we need to keep
this in mind for bpf into bpf-next merge since this would otherwise subtly
break.

>  	}
>  	for (i = 0; i < env->subprog_cnt; i++) {
>  		old_bpf_func = func[i]->bpf_func;
> 

^ permalink raw reply

* Re: [PATCH bpf-next v3 00/15] Introducing AF_XDP support
From: Björn Töpel @ 2018-05-18 15:18 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Alexei Starovoitov, Karlsson, Magnus,
	Duyck, Alexander H, Alexander Duyck, John Fastabend,
	Jesper Dangaard Brouer, Willem de Bruijn, Michael S. Tsirkin,
	Netdev, Björn Töpel, michael.lundkvist,
	Brandeburg, Jesse, Singhai, Anjali, Zhang, Qi Z
In-Reply-To: <3cc841af-d950-c961-7865-a5694eb5c618@iogearbox.net>

2018-05-18 15:43 GMT+02:00 Daniel Borkmann <daniel@iogearbox.net>:
> On 05/18/2018 05:38 AM, Alexei Starovoitov wrote:
>> On 5/16/18 11:46 PM, Björn Töpel wrote:
>>> 2018-05-04 1:38 GMT+02:00 Alexei Starovoitov <alexei.starovoitov@gmail.com>:
>>>> On Fri, May 04, 2018 at 12:49:09AM +0200, Daniel Borkmann wrote:
>>>>> On 05/02/2018 01:01 PM, Björn Töpel wrote:
>>>>>> From: Björn Töpel <bjorn.topel@intel.com>
>>>>>>
>>>>>> This patch set introduces a new address family called AF_XDP that is
>>>>>> optimized for high performance packet processing and, in upcoming
>>>>>> patch sets, zero-copy semantics. In this patch set, we have removed
>>>>>> all zero-copy related code in order to make it smaller, simpler and
>>>>>> hopefully more review friendly. This patch set only supports copy-mode
>>>>>> for the generic XDP path (XDP_SKB) for both RX and TX and copy-mode
>>>>>> for RX using the XDP_DRV path. Zero-copy support requires XDP and
>>>>>> driver changes that Jesper Dangaard Brouer is working on. Some of his
>>>>>> work has already been accepted. We will publish our zero-copy support
>>>>>> for RX and TX on top of his patch sets at a later point in time.
>>>>>
>>>>> +1, would be great to see it land this cycle. Saw few minor nits here
>>>>> and there but nothing to hold it up, for the series:
>>>>>
>>>>> Acked-by: Daniel Borkmann <daniel@iogearbox.net>
>>>>>
>>>>> Thanks everyone!
>>>>
>>>> Great stuff!
>>>>
>>>> Applied to bpf-next, with one condition.
>>>> Upcoming zero-copy patches for both RX and TX need to be posted
>>>> and reviewed within this release window.
>>>> If netdev community as a whole won't be able to agree on the zero-copy
>>>> bits we'd need to revert this feature before the next merge window.
>>>>
>>>> Few other minor nits:
>>>> patch 3:
>>>> +struct xdp_ring {
>>>> +       __u32 producer __attribute__((aligned(64)));
>>>> +       __u32 consumer __attribute__((aligned(64)));
>>>> +};
>>>> It kinda begs for ____cacheline_aligned_in_smp to be introduced for uapi headers.
>>>
>>> Hmm, I need some guidance on what a sane uapi variant would be. We
>>> can't have the uapi depend on the kernel build. ARM64, e.g., can have
>>> both 64B and 128B according to the specs. Contemporary IA processors
>>> have 64B.
>>>
>>> The simplest, and maybe most future-proof, would be 128B aligned for
>>> all. Another is having 128B for ARM and 64B for all IA. A third option
>>> is having a hand-shaking API (I think virtio has that) for determine
>>> the cache line size, but I'd rather not go down that route.
>>>
>>> Thoughts/ideas on how a uapi ____cacheline_aligned_in_smp version
>>> would look like?
>>
>> I suspect i40e+arm combination wasn't tested anyway.
>> The api may have endianness issues too on something like sparc.
>> I think the way to be backwards compatible in this area
>> is to make the api usable on x86 only by adding
>> to include/uapi/linux/if_xdp.h
>> #if defined(__x86_64__)
>> #define AF_XDP_CACHE_BYTES 64
>> #else
>> #error "AF_XDP support is not yet available for this architecture"
>> #endif
>> and doing:
>>     __u32 producer __attribute__((aligned(AF_XDP_CACHE_BYTES)));
>>     __u32 consumer __attribute__((aligned(AF_XDP_CACHE_BYTES)));
>>
>> And progressively add to this for arm64 and few other archs.
>> Eventually removing #error and adding some generic define
>> that's good enough for long tail of architectures that
>> we really cannot test.
>
> Been looking into this yesterday as well a bit, and it's a bit of a mess what
> uapi headers do on this regard (though there are just a handful of such headers).
> Some of the kernel uapi headers hard-code generally 64 bytes regardless of the
> underlying arch. In general, the kernel does expose it to user space via sysfs
> (coherency_line_size). Here's what perf does to retrieve it:
>
> #ifdef _SC_LEVEL1_DCACHE_LINESIZE
> #define cache_line_size(cacheline_sizep) *cacheline_sizep = sysconf(_SC_LEVEL1_DCACHE_LINESIZE)
> #else
> static void cache_line_size(int *cacheline_sizep)
> {
>         if (sysfs__read_int("devices/system/cpu/cpu0/cache/index0/coherency_line_size", cacheline_sizep))
>                 pr_debug("cannot determine cache line size");
> }
> #endif
>
> The sysconf() implementation for _SC_LEVEL1_DCACHE_LINESIZE seems also only
> available for x86, arm64, s390 and ppc on a cursory glance in the glibc code.
> In the x86 case it retrieves the info from cpuid insn. In order to generically
> use it in combination with the header you'd have some probe which would then
> set this as a define before including the header.
>

But as a uapi we cannot depend on the L1 cache line size for what's
currently running on the system, right? So, either a "one cache line
size for all flavors of an arch", e.g. for ARMv8 that would be 128,
even though there can be 64 flavors out there.

Another way would be to remove the ring structure completely, and
leave that to the user-space application to figure out. So there's a
runtime interface (getsockopt) to probe the offsets the head and tail
pointer is after/before the mmap call, and only expose the descriptor
format explicitly in if_xdp.h. Don't know if that is too unorthodox or
not...

> Then projects like urcu, they do ...
>
> #define ____cacheline_internodealigned_in_smp \
>         __attribute__((__aligned__(CAA_CACHE_LINE_SIZE)))
>
> ... and then hard code CAA_CACHE_LINE_SIZE for x86 (== 128), s390 (== 128),
> ppc (== 256) and sparc64 (== 256) with a generic fallback to 64.
>
> Hmm, perhaps a combination of the two would make sense where in case of known
> cacheline size it can still be used and we only have the fallback in such way.
> Like:
>
> #ifndef XDP_CACHE_BYTES
> # if defined(__x86_64__)
> #  define XDP_CACHE_BYTES       64
> # else
> #  error "Please define XDP_CACHE_BYTES for this architecture!"
> # endif
> #endif
>
> Too bad there's no asm uapi header at least for the archs where it's fixed
> anyway such that not every project out there has to redefine all of it from
> scratch and we could just include it (and the generic-asm one would throw
> a compile error if it's not externally defined or such).
>

I started out with adding a cache.h to the arch/XXX/include/uapi, but
then realized that this didn't really work. :-)

> Cheers,
> Daniel

^ permalink raw reply

* [PATCH] powerpc/32: Implement csum_ipv6_magic in assembly
From: Christophe Leroy @ 2018-05-18 15:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linux-kernel, linuxppc-dev, netdev

The generic csum_ipv6_magic() generates a pretty bad result

00000000 <csum_ipv6_magic>:
   0:	81 23 00 00 	lwz     r9,0(r3)
   4:	81 03 00 04 	lwz     r8,4(r3)
   8:	7c e7 4a 14 	add     r7,r7,r9
   c:	7d 29 38 10 	subfc   r9,r9,r7
  10:	7d 4a 51 10 	subfe   r10,r10,r10
  14:	7d 27 42 14 	add     r9,r7,r8
  18:	7d 2a 48 50 	subf    r9,r10,r9
  1c:	80 e3 00 08 	lwz     r7,8(r3)
  20:	7d 08 48 10 	subfc   r8,r8,r9
  24:	7d 4a 51 10 	subfe   r10,r10,r10
  28:	7d 29 3a 14 	add     r9,r9,r7
  2c:	81 03 00 0c 	lwz     r8,12(r3)
  30:	7d 2a 48 50 	subf    r9,r10,r9
  34:	7c e7 48 10 	subfc   r7,r7,r9
  38:	7d 4a 51 10 	subfe   r10,r10,r10
  3c:	7d 29 42 14 	add     r9,r9,r8
  40:	7d 2a 48 50 	subf    r9,r10,r9
  44:	80 e4 00 00 	lwz     r7,0(r4)
  48:	7d 08 48 10 	subfc   r8,r8,r9
  4c:	7d 4a 51 10 	subfe   r10,r10,r10
  50:	7d 29 3a 14 	add     r9,r9,r7
  54:	7d 2a 48 50 	subf    r9,r10,r9
  58:	81 04 00 04 	lwz     r8,4(r4)
  5c:	7c e7 48 10 	subfc   r7,r7,r9
  60:	7d 4a 51 10 	subfe   r10,r10,r10
  64:	7d 29 42 14 	add     r9,r9,r8
  68:	7d 2a 48 50 	subf    r9,r10,r9
  6c:	80 e4 00 08 	lwz     r7,8(r4)
  70:	7d 08 48 10 	subfc   r8,r8,r9
  74:	7d 4a 51 10 	subfe   r10,r10,r10
  78:	7d 29 3a 14 	add     r9,r9,r7
  7c:	7d 2a 48 50 	subf    r9,r10,r9
  80:	81 04 00 0c 	lwz     r8,12(r4)
  84:	7c e7 48 10 	subfc   r7,r7,r9
  88:	7d 4a 51 10 	subfe   r10,r10,r10
  8c:	7d 29 42 14 	add     r9,r9,r8
  90:	7d 2a 48 50 	subf    r9,r10,r9
  94:	7d 08 48 10 	subfc   r8,r8,r9
  98:	7d 4a 51 10 	subfe   r10,r10,r10
  9c:	7d 29 2a 14 	add     r9,r9,r5
  a0:	7d 2a 48 50 	subf    r9,r10,r9
  a4:	7c a5 48 10 	subfc   r5,r5,r9
  a8:	7c 63 19 10 	subfe   r3,r3,r3
  ac:	7d 29 32 14 	add     r9,r9,r6
  b0:	7d 23 48 50 	subf    r9,r3,r9
  b4:	7c c6 48 10 	subfc   r6,r6,r9
  b8:	7c 63 19 10 	subfe   r3,r3,r3
  bc:	7c 63 48 50 	subf    r3,r3,r9
  c0:	54 6a 80 3e 	rotlwi  r10,r3,16
  c4:	7c 63 52 14 	add     r3,r3,r10
  c8:	7c 63 18 f8 	not     r3,r3
  cc:	54 63 84 3e 	rlwinm  r3,r3,16,16,31
  d0:	4e 80 00 20 	blr

This patch implements it in assembly for PPC32

Link: https://github.com/linuxppc/linux/issues/9
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/include/asm/checksum.h |  8 ++++++++
 arch/powerpc/lib/checksum_32.S      | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/arch/powerpc/include/asm/checksum.h b/arch/powerpc/include/asm/checksum.h
index 54065caa40b3..c41c280c252f 100644
--- a/arch/powerpc/include/asm/checksum.h
+++ b/arch/powerpc/include/asm/checksum.h
@@ -13,6 +13,7 @@
 #include <asm-generic/checksum.h>
 #else
 #include <linux/bitops.h>
+#include <linux/in6.h>
 /*
  * Computes the checksum of a memory block at src, length len,
  * and adds in "sum" (32-bit), while copying the block to dst.
@@ -211,6 +212,13 @@ static inline __sum16 ip_compute_csum(const void *buff, int len)
 	return csum_fold(csum_partial(buff, len, 0));
 }
 
+#ifdef CONFIG_PPC32
+#define _HAVE_ARCH_IPV6_CSUM
+__sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+			const struct in6_addr *daddr,
+			__u32 len, __u8 proto, __wsum sum);
+#endif
+
 #endif
 #endif /* __KERNEL__ */
 #endif
diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
index 9a671c774b22..f9c047145696 100644
--- a/arch/powerpc/lib/checksum_32.S
+++ b/arch/powerpc/lib/checksum_32.S
@@ -293,3 +293,36 @@ dst_error:
 	EX_TABLE(51b, dst_error);
 
 EXPORT_SYMBOL(csum_partial_copy_generic)
+
+/*
+ * static inline __sum16 csum_ipv6_magic(const struct in6_addr *saddr,
+ *				      const struct in6_addr *daddr,
+ *				      __u32 len, __u8 proto, __wsum sum)
+ */
+
+_GLOBAL(csum_ipv6_magic)
+	lwz	r8, 0(r3)
+	lwz	r9, 4(r3)
+	lwz	r10, 8(r3)
+	lwz	r11, 12(r3)
+	addc	r0, r5, r6
+	adde	r0, r0, r7
+	adde	r0, r0, r8
+	adde	r0, r0, r9
+	adde	r0, r0, r10
+	adde	r0, r0, r11
+	lwz	r8, 0(r4)
+	lwz	r9, 4(r4)
+	lwz	r10, 8(r4)
+	lwz	r11, 12(r4)
+	adde	r0, r0, r8
+	adde	r0, r0, r9
+	adde	r0, r0, r10
+	adde	r0, r0, r11
+	addze	r0
+	rotlwi	r3, r0, 16
+	add	r3, r0, r3
+	not	r3, r3
+	rlwinm	r3, r3, 16, 16, 31
+	blr
+EXPORT_SYMBOL(csum_ipv6_magic)
-- 
2.13.3

^ permalink raw reply related

* Re: [Cake] [PATCH net-next v12 3/7] sch_cake: Add optional ACK filter
From: Eric Dumazet @ 2018-05-18 15:20 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant, Cong Wang
  Cc: Toke Høiland-Jørgensen, Cake List,
	Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <05E1D675-B73B-4409-8991-EB89D5538EAB@darbyshire-bryant.me.uk>

On 05/18/2018 04:18 AM, Kevin Darbyshire-Bryant wrote:
> 

> Speaking as a user of cake’s ack filtering, although it may be an odd place, it is incredibly useful in my linux based home router middle box that usefully extracts extra usable bandwidth from my asymmetric link.  And whilst ack compression/reduction/filtering call it what you will, will come to the linux TCP stack, as yet other OS stacks are less enlightened and benefit from the router’s tweaking/meddling/interference.
> 
> 

Kevin, there is no doubt the feature might be useful.

But it has to be properly implemented, or we risk to block future TCP improvements.

An ACK filter MUST parse the TCP options.

1) If some options can not be understood, ACK MUST not be dropped.

2) SACK options MUST be carefully analyzed

3) Timestamps also need some care.

Too many middle-boxes are breaking TCP, we do not want to carry another TCP blackhole in upstream linux,
that would be a shame.

conntrack had bugs that seriously impacted TCP Fastopen deployment for example.

^ permalink raw reply

* Re: [RFC PATCH ghak32 V2 01/13] audit: add container id
From: Richard Guy Briggs @ 2018-05-18 15:21 UTC (permalink / raw)
  To: Steve Grubb
  Cc: cgroups, containers, linux-api, Linux-Audit Mailing List,
	linux-fsdevel, LKML, netdev, ebiederm, luto, jlayton, carlos,
	dhowells, viro, simo, eparis, serge
In-Reply-To: <20180518095636.56ff322d@ivy-bridge>

On 2018-05-18 09:56, Steve Grubb wrote:
> On Thu, 17 May 2018 17:56:00 -0400
> Richard Guy Briggs <rgb@redhat.com> wrote:
> 
> > > During syscall events, the path info is returned in a a record
> > > simply called AUDIT_PATH, cwd info is returned in AUDIT_CWD. So,
> > > rather than calling the record that gets attached to everything
> > > AUDIT_CONTAINER_INFO, how about simply AUDIT_CONTAINER.  
> > 
> > Considering the container initiation record is different than the
> > record to document the container involved in an otherwise normal
> > syscall, we need two names.  I don't have a strong opinion what they
> > are.
> > 
> > I'd prefer AUDIT_CONTAIN and AUDIT_CONTAINER_INFO so that the two
> > are different enough to be visually distinct while leaving
> > AUDIT_CONTAINERID for the field type in patch 4 ("audit: add
> > containerid filtering")

(Sorry, I had intended AUDIT_CONTAINER for the first in that paragraph
above.)

> How about AUDIT_CONTAINER for the auxiliary record? The one that starts
> the container, I don't have a strong opinion on. Could be
> AUDIT_CONTAINER_INIT, AUDIT_CONTAINER_START, AUDIT_CONTAINERID,
> AUDIT_CONTAINER_ID, or something else. The API call that sets the ID
> for filtering could be AUDIT_CID or AUDIT_CONTID if that helps decide
> what the initial event might be. Normally, it should match the field
> being filtered.

Ok, I had shortened the record field name to "contid=" to be unique
enough while not using too much netlink bandwidth.  I could have used
"cid=" but that could be unobvious or ambiguous.  I didn't want to use
the full "containerid=" due to that.  I suppose I could change the
field name macro to AUDIT_CONTID.

For the one that starts the container, I'd prefer to leave the name a
bit more general than "_INIT", "_START", so maybe I'll swap them around
and use AUDIT_CONTAINER_INFO for the startup record, and use
AUDIT_CONTAINER for the syscall auxiliary record.

Does that work?

> -Steve

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: [PATCH] hippi: fix spelling mistake: "Framming" -> "Framing"
From: David Miller @ 2018-05-18 15:26 UTC (permalink / raw)
  To: jes.sorensen; +Cc: colin.king, netdev, kernel-janitors, linux-kernel
In-Reply-To: <bdb779e6-1705-3607-7ad8-3b57dfcba31a@gmail.com>

From: Jes Sorensen <jes.sorensen@gmail.com>
Date: Fri, 18 May 2018 10:33:50 -0400

> I do wonder if it's time to retire this driver and the HIPPI code. I
> haven't had access to hardware for over a decade so I have no idea
> if it even works anymore.

That's a good question.

If you really think nobody is using this stuff, yes we can remove it.

Then if someone brings it to our attention that there are users, we
can very easily bring it back.

But as the HIPPI maintiner it is really up to you.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox