* [PATCH bpf 1/2] bpf: fix alignment of netns_dev/netns_ino fields in bpf_{map,prog}_info
From: Eugene Syromiatnikov @ 2018-05-27 11:28 UTC
To: netdev
Cc: linux-kernel, Martin KaFai Lau, Daniel Borkmann,
Alexei Starovoitov, David S. Miller, Jiri Olsa, Ingo Molnar,
Lawrence Brakmo, Andrey Ignatov, Jakub Kicinski, John Fastabend,
Dmitry V. Levin
The recent introduction of netns_dev/netns_ino to bpf_map_info/bpf_prog_info
has broken compat, as the offsets of these fields differ between the 32-bit
and 64-bit ABIs. One fix (other than implementing compat support in the
syscall in order to handle this discrepancy) is to use __aligned_u64
instead of __u64 for these fields.
Reported-by: Dmitry V. Levin <ldv@altlinux.org>
Fixes: 52775b33bb507 ("bpf: offload: report device information about offloaded maps")
Fixes: 675fc275a3a2d ("bpf: offload: report device information for offloaded programs")
Signed-off-by: Eugene Syromiatnikov <esyr@redhat.com>
---
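Note for readers: the offset difference described in the commit message can
be reproduced with a minimal sketch. The structs below are hypothetical
stand-ins for the tail of bpf_map_info (they are not the UAPI definitions),
and demo_aligned_u64 is an illustrative typedef that mirrors what
__aligned_u64 is assumed to do, namely force 8-byte alignment. On i386 a
plain 64-bit member inside a struct only needs 4-byte alignment, so it lands
at offset 4 here, while x86_64 places it at offset 8; the aligned variant
lands at offset 8 on both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: a 32-bit field followed by plain 64-bit fields. */
struct tail_plain {
	uint32_t ifindex;
	uint64_t netns_dev;	/* offset 4 on i386, offset 8 on x86_64 */
	uint64_t netns_ino;
} __attribute__((aligned(8)));

/* Illustrative typedef (an assumption): a 64-bit type with forced 8-byte alignment. */
typedef uint64_t demo_aligned_u64 __attribute__((aligned(8)));

/* Same fields, but the 64-bit members carry explicit alignment. */
struct tail_aligned {
	uint32_t ifindex;
	demo_aligned_u64 netns_dev;	/* offset 8 on every ABI */
	demo_aligned_u64 netns_ino;	/* offset 16 on every ABI */
} __attribute__((aligned(8)));

/* These hold for both 32-bit and 64-bit builds. */
static_assert(offsetof(struct tail_aligned, netns_dev) == 8,
	      "netns_dev must sit at an ABI-independent offset");
static_assert(offsetof(struct tail_aligned, netns_ino) == 16,
	      "netns_ino must sit at an ABI-independent offset");
```

Building this once with -m32 and once with -m64 and printing
offsetof(struct tail_plain, netns_dev) should show the mismatch, while the
static_assert lines pass in both builds.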
include/uapi/linux/bpf.h | 8 ++++----
tools/include/uapi/linux/bpf.h | 8 ++++----
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c5ec897..903010a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1017,8 +1017,8 @@ struct bpf_prog_info {
__aligned_u64 map_ids;
char name[BPF_OBJ_NAME_LEN];
__u32 ifindex;
- __u64 netns_dev;
- __u64 netns_ino;
+ __aligned_u64 netns_dev;
+ __aligned_u64 netns_ino;
} __attribute__((aligned(8)));
struct bpf_map_info {
@@ -1030,8 +1030,8 @@ struct bpf_map_info {
__u32 map_flags;
char name[BPF_OBJ_NAME_LEN];
__u32 ifindex;
- __u64 netns_dev;
- __u64 netns_ino;
+ __aligned_u64 netns_dev;
+ __aligned_u64 netns_ino;
} __attribute__((aligned(8)));
/* User bpf_sock_addr struct to access socket fields and sockaddr struct passed
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c5ec897..903010a 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1017,8 +1017,8 @@ struct bpf_prog_info {
__aligned_u64 map_ids;
char name[BPF_OBJ_NAME_LEN];
__u32 ifindex;
- __u64 netns_dev;
- __u64 netns_ino;
+ __aligned_u64 netns_dev;
+ __aligned_u64 netns_ino;
} __attribute__((aligned(8)));
struct bpf_map_info {
@@ -1030,8 +1030,8 @@ struct bpf_map_info {
__u32 map_flags;
char name[BPF_OBJ_NAME_LEN];
__u32 ifindex;
- __u64 netns_dev;
- __u64 netns_ino;
+ __aligned_u64 netns_dev;
+ __aligned_u64 netns_ino;
} __attribute__((aligned(8)));
/* User bpf_sock_addr struct to access socket fields and sockaddr struct passed
--
2.1.4
* [PATCH bpf 0/2] Use __aligned_u64 in UAPI fields
From: Eugene Syromiatnikov @ 2018-05-27 11:28 UTC
To: netdev
Cc: linux-kernel, Martin KaFai Lau, Daniel Borkmann,
Alexei Starovoitov, David S. Miller, Jiri Olsa, Ingo Molnar,
Lawrence Brakmo, Andrey Ignatov, Jakub Kicinski, John Fastabend,
Dmitry V. Levin
Hello.
It was discovered during strace development that struct bpf_map_info and
struct bpf_prog_info now have different layouts on i386/compat and x86_64.
Since it's already broken and the bpf syscall has no separate compat handling
(as far as I can see), and the offending change was introduced recently (in
Linux 4.16), it's proposed to change the layout of these structures
on 32-bit architectures by using __aligned_u64.
To guard against this problem in the future, an
approach similar to the one implemented in the RDMA subsystem recently
is proposed: use __aligned_u64 consistently throughout the UAPI header.
Eugene Syromiatnikov (2):
bpf: fix alignment of netns_dev/netns_ino fields in
bpf_{map,prog}_info
bpf: enforce usage of __aligned_u64 in the UAPI header
include/uapi/linux/bpf.h | 30 +++++++++++++++---------------
tools/include/uapi/linux/bpf.h | 30 +++++++++++++++---------------
2 files changed, 30 insertions(+), 30 deletions(-)
--
2.1.4
* [PATCH v5 3/3] bpf: add selftest for lirc_mode2 type program
From: Sean Young @ 2018-05-27 11:24 UTC
To: linux-media, linux-kernel, Alexei Starovoitov,
Mauro Carvalho Chehab, Daniel Borkmann, netdev, Matthias Reichl,
Devin Heitmueller, Y Song, Quentin Monnet
In-Reply-To: <cover.1527419762.git.sean@mess.org>
This is a simple test over rc-loopback.
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Sean Young <sean@mess.org>
---
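The check the selftest relies on can be modelled in plain userspace C. This
is only a sketch: model_decode() is a hypothetical helper that mirrors the
logic of bpf_decoder() in test_lirc_mode2_kern.c without calling
bpf_rc_keydown(), the macros are copied from the lirc.h added by this patch,
and it assumes rc-loopback hands the written 0x1dead duration back to the
program as a pulse sample.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the tools/include/uapi/linux/lirc.h added by this patch. */
#define LIRC_MODE2_PULSE	0x01000000
#define LIRC_VALUE_MASK		0x00FFFFFF
#define LIRC_MODE2_MASK		0xFF000000
#define LIRC_PULSE(val)		(((val) & LIRC_VALUE_MASK) | LIRC_MODE2_PULSE)
#define LIRC_VALUE(val)		((val) & LIRC_VALUE_MASK)
#define LIRC_MODE2(val)		((val) & LIRC_MODE2_MASK)
#define LIRC_IS_PULSE(val)	(LIRC_MODE2(val) == LIRC_MODE2_PULSE)

/*
 * Hypothetical userspace model of the test decoder.
 * Returns 1 and stores the scancode when the BPF program would report a
 * key down event, 0 when it would ignore the sample.
 */
int model_decode(uint32_t sample, uint64_t *scancode)
{
	if (LIRC_IS_PULSE(sample)) {
		uint32_t duration = LIRC_VALUE(sample);

		/* Bit 16 of the duration marks a sample the test wants decoded. */
		if (duration & 0x10000) {
			*scancode = duration & 0xffff;
			return 1;
		}
	}

	return 0;
}
```

With sample = LIRC_PULSE(0x1dead) the model reports scancode 0xdead, which is
the value test_lirc_mode2_user.c compares against; a space sample or a pulse
without bit 16 set is ignored.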
tools/bpf/bpftool/prog.c | 1 +
tools/include/uapi/linux/bpf.h | 53 ++++-
tools/include/uapi/linux/lirc.h | 217 ++++++++++++++++++
tools/lib/bpf/libbpf.c | 1 +
tools/testing/selftests/bpf/.gitignore | 1 +
tools/testing/selftests/bpf/Makefile | 7 +-
tools/testing/selftests/bpf/bpf_helpers.h | 5 +
.../testing/selftests/bpf/test_lirc_mode2.sh | 28 +++
.../selftests/bpf/test_lirc_mode2_kern.c | 23 ++
.../selftests/bpf/test_lirc_mode2_user.c | 149 ++++++++++++
10 files changed, 481 insertions(+), 4 deletions(-)
create mode 100644 tools/include/uapi/linux/lirc.h
create mode 100755 tools/testing/selftests/bpf/test_lirc_mode2.sh
create mode 100644 tools/testing/selftests/bpf/test_lirc_mode2_kern.c
create mode 100644 tools/testing/selftests/bpf/test_lirc_mode2_user.c
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 39b88e760367..a4f435203fef 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -71,6 +71,7 @@ static const char * const prog_type_name[] = {
[BPF_PROG_TYPE_SK_MSG] = "sk_msg",
[BPF_PROG_TYPE_RAW_TRACEPOINT] = "raw_tracepoint",
[BPF_PROG_TYPE_CGROUP_SOCK_ADDR] = "cgroup_sock_addr",
+ [BPF_PROG_TYPE_LIRC_MODE2] = "lirc_mode2",
};
static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9b8c6e310e9a..4636c596096d 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -143,6 +143,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_RAW_TRACEPOINT,
BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
BPF_PROG_TYPE_LWT_SEG6LOCAL,
+ BPF_PROG_TYPE_LIRC_MODE2,
};
enum bpf_attach_type {
@@ -160,6 +161,7 @@ enum bpf_attach_type {
BPF_CGROUP_INET6_CONNECT,
BPF_CGROUP_INET4_POST_BIND,
BPF_CGROUP_INET6_POST_BIND,
+ BPF_LIRC_MODE2,
__MAX_BPF_ATTACH_TYPE
};
@@ -2004,6 +2006,53 @@ union bpf_attr {
* direct packet access.
* Return
* 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle)
+ * Description
+ * This helper is used in programs implementing IR decoding, to
+ * report a successfully decoded key press with *scancode*,
+ * *toggle* value in the given *protocol*. The scancode will be
+ * translated to a keycode using the rc keymap, and reported as
+ * an input key down event. After a period a key up event is
+ * generated. This period can be extended by calling either
+ * **bpf_rc_keydown** () again with the same values, or calling
+ * **bpf_rc_repeat** ().
+ *
+ * Some protocols include a toggle bit, in case the button was
+ * released and pressed again between consecutive scancodes.
+ *
+ * The *ctx* should point to the lirc sample as passed into
+ * the program.
+ *
+ * The *protocol* is the decoded protocol number (see
+ * **enum rc_proto** for some predefined values).
+ *
+ * This helper is only available if the kernel was compiled with
+ * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
+ * "**y**".
+ *
+ * Return
+ * 0
+ *
+ * int bpf_rc_repeat(void *ctx)
+ * Description
+ * This helper is used in programs implementing IR decoding, to
+ * report a successfully decoded repeat key message. This delays
+ * the generation of a key up event for previously generated
+ * key down event.
+ *
+ * Some IR protocols like NEC have a special IR message for
+ * repeating last button, for when a button is held down.
+ *
+ * The *ctx* should point to the lirc sample as passed into
+ * the program.
+ *
+ * This helper is only available if the kernel was compiled with
+ * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
+ * "**y**".
+ *
+ * Return
+ * 0
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -2082,7 +2131,9 @@ union bpf_attr {
FN(lwt_push_encap), \
FN(lwt_seg6_store_bytes), \
FN(lwt_seg6_adjust_srh), \
- FN(lwt_seg6_action),
+ FN(lwt_seg6_action), \
+ FN(rc_repeat), \
+ FN(rc_keydown),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
diff --git a/tools/include/uapi/linux/lirc.h b/tools/include/uapi/linux/lirc.h
new file mode 100644
index 000000000000..f189931042a7
--- /dev/null
+++ b/tools/include/uapi/linux/lirc.h
@@ -0,0 +1,217 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * lirc.h - linux infrared remote control header file
+ * last modified 2010/07/13 by Jarod Wilson
+ */
+
+#ifndef _LINUX_LIRC_H
+#define _LINUX_LIRC_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#define PULSE_BIT 0x01000000
+#define PULSE_MASK 0x00FFFFFF
+
+#define LIRC_MODE2_SPACE 0x00000000
+#define LIRC_MODE2_PULSE 0x01000000
+#define LIRC_MODE2_FREQUENCY 0x02000000
+#define LIRC_MODE2_TIMEOUT 0x03000000
+
+#define LIRC_VALUE_MASK 0x00FFFFFF
+#define LIRC_MODE2_MASK 0xFF000000
+
+#define LIRC_SPACE(val) (((val)&LIRC_VALUE_MASK) | LIRC_MODE2_SPACE)
+#define LIRC_PULSE(val) (((val)&LIRC_VALUE_MASK) | LIRC_MODE2_PULSE)
+#define LIRC_FREQUENCY(val) (((val)&LIRC_VALUE_MASK) | LIRC_MODE2_FREQUENCY)
+#define LIRC_TIMEOUT(val) (((val)&LIRC_VALUE_MASK) | LIRC_MODE2_TIMEOUT)
+
+#define LIRC_VALUE(val) ((val)&LIRC_VALUE_MASK)
+#define LIRC_MODE2(val) ((val)&LIRC_MODE2_MASK)
+
+#define LIRC_IS_SPACE(val) (LIRC_MODE2(val) == LIRC_MODE2_SPACE)
+#define LIRC_IS_PULSE(val) (LIRC_MODE2(val) == LIRC_MODE2_PULSE)
+#define LIRC_IS_FREQUENCY(val) (LIRC_MODE2(val) == LIRC_MODE2_FREQUENCY)
+#define LIRC_IS_TIMEOUT(val) (LIRC_MODE2(val) == LIRC_MODE2_TIMEOUT)
+
+/* used heavily by lirc userspace */
+#define lirc_t int
+
+/*** lirc compatible hardware features ***/
+
+#define LIRC_MODE2SEND(x) (x)
+#define LIRC_SEND2MODE(x) (x)
+#define LIRC_MODE2REC(x) ((x) << 16)
+#define LIRC_REC2MODE(x) ((x) >> 16)
+
+#define LIRC_MODE_RAW 0x00000001
+#define LIRC_MODE_PULSE 0x00000002
+#define LIRC_MODE_MODE2 0x00000004
+#define LIRC_MODE_SCANCODE 0x00000008
+#define LIRC_MODE_LIRCCODE 0x00000010
+
+
+#define LIRC_CAN_SEND_RAW LIRC_MODE2SEND(LIRC_MODE_RAW)
+#define LIRC_CAN_SEND_PULSE LIRC_MODE2SEND(LIRC_MODE_PULSE)
+#define LIRC_CAN_SEND_MODE2 LIRC_MODE2SEND(LIRC_MODE_MODE2)
+#define LIRC_CAN_SEND_LIRCCODE LIRC_MODE2SEND(LIRC_MODE_LIRCCODE)
+
+#define LIRC_CAN_SEND_MASK 0x0000003f
+
+#define LIRC_CAN_SET_SEND_CARRIER 0x00000100
+#define LIRC_CAN_SET_SEND_DUTY_CYCLE 0x00000200
+#define LIRC_CAN_SET_TRANSMITTER_MASK 0x00000400
+
+#define LIRC_CAN_REC_RAW LIRC_MODE2REC(LIRC_MODE_RAW)
+#define LIRC_CAN_REC_PULSE LIRC_MODE2REC(LIRC_MODE_PULSE)
+#define LIRC_CAN_REC_MODE2 LIRC_MODE2REC(LIRC_MODE_MODE2)
+#define LIRC_CAN_REC_SCANCODE LIRC_MODE2REC(LIRC_MODE_SCANCODE)
+#define LIRC_CAN_REC_LIRCCODE LIRC_MODE2REC(LIRC_MODE_LIRCCODE)
+
+#define LIRC_CAN_REC_MASK LIRC_MODE2REC(LIRC_CAN_SEND_MASK)
+
+#define LIRC_CAN_SET_REC_CARRIER (LIRC_CAN_SET_SEND_CARRIER << 16)
+#define LIRC_CAN_SET_REC_DUTY_CYCLE (LIRC_CAN_SET_SEND_DUTY_CYCLE << 16)
+
+#define LIRC_CAN_SET_REC_DUTY_CYCLE_RANGE 0x40000000
+#define LIRC_CAN_SET_REC_CARRIER_RANGE 0x80000000
+#define LIRC_CAN_GET_REC_RESOLUTION 0x20000000
+#define LIRC_CAN_SET_REC_TIMEOUT 0x10000000
+#define LIRC_CAN_SET_REC_FILTER 0x08000000
+
+#define LIRC_CAN_MEASURE_CARRIER 0x02000000
+#define LIRC_CAN_USE_WIDEBAND_RECEIVER 0x04000000
+
+#define LIRC_CAN_SEND(x) ((x)&LIRC_CAN_SEND_MASK)
+#define LIRC_CAN_REC(x) ((x)&LIRC_CAN_REC_MASK)
+
+#define LIRC_CAN_NOTIFY_DECODE 0x01000000
+
+/*** IOCTL commands for lirc driver ***/
+
+#define LIRC_GET_FEATURES _IOR('i', 0x00000000, __u32)
+
+#define LIRC_GET_SEND_MODE _IOR('i', 0x00000001, __u32)
+#define LIRC_GET_REC_MODE _IOR('i', 0x00000002, __u32)
+#define LIRC_GET_REC_RESOLUTION _IOR('i', 0x00000007, __u32)
+
+#define LIRC_GET_MIN_TIMEOUT _IOR('i', 0x00000008, __u32)
+#define LIRC_GET_MAX_TIMEOUT _IOR('i', 0x00000009, __u32)
+
+/* code length in bits, currently only for LIRC_MODE_LIRCCODE */
+#define LIRC_GET_LENGTH _IOR('i', 0x0000000f, __u32)
+
+#define LIRC_SET_SEND_MODE _IOW('i', 0x00000011, __u32)
+#define LIRC_SET_REC_MODE _IOW('i', 0x00000012, __u32)
+/* Note: these can reset the according pulse_width */
+#define LIRC_SET_SEND_CARRIER _IOW('i', 0x00000013, __u32)
+#define LIRC_SET_REC_CARRIER _IOW('i', 0x00000014, __u32)
+#define LIRC_SET_SEND_DUTY_CYCLE _IOW('i', 0x00000015, __u32)
+#define LIRC_SET_TRANSMITTER_MASK _IOW('i', 0x00000017, __u32)
+
+/*
+ * when a timeout != 0 is set the driver will send a
+ * LIRC_MODE2_TIMEOUT data packet, otherwise LIRC_MODE2_TIMEOUT is
+ * never sent, timeout is disabled by default
+ */
+#define LIRC_SET_REC_TIMEOUT _IOW('i', 0x00000018, __u32)
+
+/* 1 enables, 0 disables timeout reports in MODE2 */
+#define LIRC_SET_REC_TIMEOUT_REPORTS _IOW('i', 0x00000019, __u32)
+
+/*
+ * if enabled from the next key press on the driver will send
+ * LIRC_MODE2_FREQUENCY packets
+ */
+#define LIRC_SET_MEASURE_CARRIER_MODE _IOW('i', 0x0000001d, __u32)
+
+/*
+ * to set a range use LIRC_SET_REC_CARRIER_RANGE with the
+ * lower bound first and later LIRC_SET_REC_CARRIER with the upper bound
+ */
+#define LIRC_SET_REC_CARRIER_RANGE _IOW('i', 0x0000001f, __u32)
+
+#define LIRC_SET_WIDEBAND_RECEIVER _IOW('i', 0x00000023, __u32)
+
+/*
+ * struct lirc_scancode - decoded scancode with protocol for use with
+ * LIRC_MODE_SCANCODE
+ *
+ * @timestamp: Timestamp in nanoseconds using CLOCK_MONOTONIC when IR
+ * was decoded.
+ * @flags: should be 0 for transmit. When receiving scancodes,
+ * LIRC_SCANCODE_FLAG_TOGGLE or LIRC_SCANCODE_FLAG_REPEAT can be set
+ * depending on the protocol
+ * @rc_proto: see enum rc_proto
+ * @keycode: the translated keycode. Set to 0 for transmit.
+ * @scancode: the scancode received or to be sent
+ */
+struct lirc_scancode {
+ __u64 timestamp;
+ __u16 flags;
+ __u16 rc_proto;
+ __u32 keycode;
+ __u64 scancode;
+};
+
+/* Set if the toggle bit of rc-5 or rc-6 is enabled */
+#define LIRC_SCANCODE_FLAG_TOGGLE 1
+/* Set if this is a nec or sanyo repeat */
+#define LIRC_SCANCODE_FLAG_REPEAT 2
+
+/**
+ * enum rc_proto - the Remote Controller protocol
+ *
+ * @RC_PROTO_UNKNOWN: Protocol not known
+ * @RC_PROTO_OTHER: Protocol known but proprietary
+ * @RC_PROTO_RC5: Philips RC5 protocol
+ * @RC_PROTO_RC5X_20: Philips RC5x 20 bit protocol
+ * @RC_PROTO_RC5_SZ: StreamZap variant of RC5
+ * @RC_PROTO_JVC: JVC protocol
+ * @RC_PROTO_SONY12: Sony 12 bit protocol
+ * @RC_PROTO_SONY15: Sony 15 bit protocol
+ * @RC_PROTO_SONY20: Sony 20 bit protocol
+ * @RC_PROTO_NEC: NEC protocol
+ * @RC_PROTO_NECX: Extended NEC protocol
+ * @RC_PROTO_NEC32: NEC 32 bit protocol
+ * @RC_PROTO_SANYO: Sanyo protocol
+ * @RC_PROTO_MCIR2_KBD: RC6-ish MCE keyboard
+ * @RC_PROTO_MCIR2_MSE: RC6-ish MCE mouse
+ * @RC_PROTO_RC6_0: Philips RC6-0-16 protocol
+ * @RC_PROTO_RC6_6A_20: Philips RC6-6A-20 protocol
+ * @RC_PROTO_RC6_6A_24: Philips RC6-6A-24 protocol
+ * @RC_PROTO_RC6_6A_32: Philips RC6-6A-32 protocol
+ * @RC_PROTO_RC6_MCE: MCE (Philips RC6-6A-32 subtype) protocol
+ * @RC_PROTO_SHARP: Sharp protocol
+ * @RC_PROTO_XMP: XMP protocol
+ * @RC_PROTO_CEC: CEC protocol
+ * @RC_PROTO_IMON: iMon Pad protocol
+ */
+enum rc_proto {
+ RC_PROTO_UNKNOWN = 0,
+ RC_PROTO_OTHER = 1,
+ RC_PROTO_RC5 = 2,
+ RC_PROTO_RC5X_20 = 3,
+ RC_PROTO_RC5_SZ = 4,
+ RC_PROTO_JVC = 5,
+ RC_PROTO_SONY12 = 6,
+ RC_PROTO_SONY15 = 7,
+ RC_PROTO_SONY20 = 8,
+ RC_PROTO_NEC = 9,
+ RC_PROTO_NECX = 10,
+ RC_PROTO_NEC32 = 11,
+ RC_PROTO_SANYO = 12,
+ RC_PROTO_MCIR2_KBD = 13,
+ RC_PROTO_MCIR2_MSE = 14,
+ RC_PROTO_RC6_0 = 15,
+ RC_PROTO_RC6_6A_20 = 16,
+ RC_PROTO_RC6_6A_24 = 17,
+ RC_PROTO_RC6_6A_32 = 18,
+ RC_PROTO_RC6_MCE = 19,
+ RC_PROTO_SHARP = 20,
+ RC_PROTO_XMP = 21,
+ RC_PROTO_CEC = 22,
+ RC_PROTO_IMON = 23,
+};
+
+#endif
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index d20411ebfa2f..6cb45898afa7 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -1462,6 +1462,7 @@ static bool bpf_prog_type__needs_kver(enum bpf_prog_type type)
case BPF_PROG_TYPE_CGROUP_DEVICE:
case BPF_PROG_TYPE_SK_MSG:
case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
+ case BPF_PROG_TYPE_LIRC_MODE2:
return false;
case BPF_PROG_TYPE_UNSPEC:
case BPF_PROG_TYPE_KPROBE:
diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index adc8e5474b66..6ea835982464 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -17,3 +17,4 @@ test_sock_addr
urandom_read
test_btf
test_sockmap
+test_lirc_mode2_user
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 85044448bbc7..06400291e6e2 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -24,7 +24,7 @@ urandom_read: urandom_read.c
# Order correspond to 'make run_tests' order
TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test_progs \
test_align test_verifier_log test_dev_cgroup test_tcpbpf_user \
- test_sock test_btf test_sockmap
+ test_sock test_btf test_sockmap test_lirc_mode2_user
TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test_obj_id.o \
test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o \
@@ -34,7 +34,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
sockmap_tcp_msg_prog.o connect4_prog.o connect6_prog.o test_adjust_tail.o \
test_btf_haskv.o test_btf_nokv.o test_sockmap_kern.o test_tunnel_kern.o \
test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \
- test_lwt_seg6local.o
+ test_lwt_seg6local.o test_lirc_mode2_kern.o
# Order correspond to 'make run_tests' order
TEST_PROGS := test_kmod.sh \
@@ -44,7 +44,8 @@ TEST_PROGS := test_kmod.sh \
test_offload.py \
test_sock_addr.sh \
test_tunnel.sh \
- test_lwt_seg6local.sh
+ test_lwt_seg6local.sh \
+ test_lirc_mode2.sh
# Compile but not part of 'make run_tests'
TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 334d3e8c5e89..a66a9d91acf4 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -126,6 +126,11 @@ static int (*bpf_lwt_seg6_action)(void *ctx, unsigned int action, void *param,
static int (*bpf_lwt_seg6_adjust_srh)(void *ctx, unsigned int offset,
unsigned int len) =
(void *) BPF_FUNC_lwt_seg6_adjust_srh;
+static int (*bpf_rc_repeat)(void *ctx) =
+ (void *) BPF_FUNC_rc_repeat;
+static int (*bpf_rc_keydown)(void *ctx, unsigned int protocol,
+ unsigned long long scancode, unsigned int toggle) =
+ (void *) BPF_FUNC_rc_keydown;
/* llvm builtin functions that eBPF C program may use to
* emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/tools/testing/selftests/bpf/test_lirc_mode2.sh b/tools/testing/selftests/bpf/test_lirc_mode2.sh
new file mode 100755
index 000000000000..ce2e15e4f976
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_lirc_mode2.sh
@@ -0,0 +1,28 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+GREEN='\033[0;92m'
+RED='\033[0;31m'
+NC='\033[0m' # No Color
+
+modprobe rc-loopback
+
+for i in /sys/class/rc/rc*
+do
+ if grep -q DRV_NAME=rc-loopback $i/uevent
+ then
+ LIRCDEV=$(grep DEVNAME= $i/lirc*/uevent | sed sQDEVNAME=Q/dev/Q)
+ fi
+done
+
+if [ -n "$LIRCDEV" ];
+then
+ TYPE=lirc_mode2
+ ./test_lirc_mode2_user $LIRCDEV
+ ret=$?
+ if [ $ret -ne 0 ]; then
+ echo -e ${RED}"FAIL: $TYPE"${NC}
+ else
+ echo -e ${GREEN}"PASS: $TYPE"${NC}
+ fi
+fi
diff --git a/tools/testing/selftests/bpf/test_lirc_mode2_kern.c b/tools/testing/selftests/bpf/test_lirc_mode2_kern.c
new file mode 100644
index 000000000000..ba26855563a5
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_lirc_mode2_kern.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0
+// test ir decoder
+//
+// Copyright (C) 2018 Sean Young <sean@mess.org>
+
+#include <linux/bpf.h>
+#include <linux/lirc.h>
+#include "bpf_helpers.h"
+
+SEC("lirc_mode2")
+int bpf_decoder(unsigned int *sample)
+{
+ if (LIRC_IS_PULSE(*sample)) {
+ unsigned int duration = LIRC_VALUE(*sample);
+
+ if (duration & 0x10000)
+ bpf_rc_keydown(sample, 0x40, duration & 0xffff, 0);
+ }
+
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_lirc_mode2_user.c b/tools/testing/selftests/bpf/test_lirc_mode2_user.c
new file mode 100644
index 000000000000..d470d63c33db
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_lirc_mode2_user.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0
+// test ir decoder
+//
+// Copyright (C) 2018 Sean Young <sean@mess.org>
+
+// A lirc chardev is a device representing a consumer IR (cir) device which
+// can receive infrared signals from remote control and/or transmit IR.
+//
+// IR is sent as a series of pulses and spaces, somewhat like Morse code. The
+// BPF program can decode this into scancodes so that rc-core can translate
+// this into input key codes using the rc keymap.
+//
+// This test works by sending IR over rc-loopback, so the IR is processed by
+// BPF and then decoded into scancodes. The lirc chardev must be the one
+// associated with rc-loopback, see the output of ir-keytable(1).
+//
+// The following CONFIG options must be enabled for the test to succeed:
+// CONFIG_RC_CORE=y
+// CONFIG_BPF_LIRC_MODE2=y
+// CONFIG_RC_LOOPBACK=y
+
+// Steps:
+// 1. Open the /dev/lircN device for rc-loopback (given on command line)
+// 2. Attach bpf_lirc_mode2 program which decodes some IR.
+// 3. Send some IR to the same IR device; since it is loopback, this will
+// end up in the bpf program
+// 4. bpf program should decode IR and report keycode
+// 5. We can read keycode from same /dev/lirc device
+
+#include <linux/bpf.h>
+#include <linux/lirc.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <poll.h>
+#include <sys/types.h>
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+
+#include "bpf_util.h"
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+int main(int argc, char **argv)
+{
+ struct bpf_object *obj;
+ int ret, lircfd, progfd, mode;
+ int testir = 0x1dead;
+ u32 prog_ids[10], prog_flags[10], prog_cnt;
+
+ if (argc != 2) {
+ printf("Usage: %s /dev/lircN\n", argv[0]);
+ return 2;
+ }
+
+ ret = bpf_prog_load("test_lirc_mode2_kern.o",
+ BPF_PROG_TYPE_LIRC_MODE2, &obj, &progfd);
+ if (ret) {
+ printf("Failed to load bpf program\n");
+ return 1;
+ }
+
+ lircfd = open(argv[1], O_RDWR | O_NONBLOCK);
+ if (lircfd == -1) {
+ printf("failed to open lirc device %s: %m\n", argv[1]);
+ return 1;
+ }
+
+ /* Let's try detach it before it was ever attached */
+ ret = bpf_prog_detach2(progfd, lircfd, BPF_LIRC_MODE2);
+ if (ret != -1 || errno != ENOENT) {
+ printf("bpf_prog_detach2 not attached should fail: %m\n");
+ return 1;
+ }
+
+ mode = LIRC_MODE_SCANCODE;
+ if (ioctl(lircfd, LIRC_SET_REC_MODE, &mode)) {
+ printf("failed to set rec mode: %m\n");
+ return 1;
+ }
+
+ prog_cnt = 10;
+ ret = bpf_prog_query(lircfd, BPF_LIRC_MODE2, 0, prog_flags, prog_ids,
+ &prog_cnt);
+ if (ret) {
+ printf("Failed to query bpf programs on lirc device: %m\n");
+ return 1;
+ }
+
+ if (prog_cnt != 0) {
+ printf("Expected nothing to be attached\n");
+ return 1;
+ }
+
+ ret = bpf_prog_attach(progfd, lircfd, BPF_LIRC_MODE2, 0);
+ if (ret) {
+ printf("Failed to attach bpf to lirc device: %m\n");
+ return 1;
+ }
+
+ /* Write raw IR */
+ ret = write(lircfd, &testir, sizeof(testir));
+ if (ret != sizeof(testir)) {
+ printf("Failed to send test IR message: %m\n");
+ return 1;
+ }
+
+ struct pollfd pfd = { .fd = lircfd, .events = POLLIN };
+ struct lirc_scancode lsc;
+
+ poll(&pfd, 1, 100);
+
+ /* Read decoded IR */
+ ret = read(lircfd, &lsc, sizeof(lsc));
+ if (ret != sizeof(lsc)) {
+ printf("Failed to read decoded IR: %m\n");
+ return 1;
+ }
+
+ if (lsc.scancode != 0xdead || lsc.rc_proto != 64) {
+ printf("Incorrect scancode decoded\n");
+ return 1;
+ }
+
+ prog_cnt = 10;
+ ret = bpf_prog_query(lircfd, BPF_LIRC_MODE2, 0, prog_flags, prog_ids,
+ &prog_cnt);
+ if (ret) {
+ printf("Failed to query bpf programs on lirc device: %m\n");
+ return 1;
+ }
+
+ if (prog_cnt != 1) {
+ printf("Expected one program to be attached\n");
+ return 1;
+ }
+
+ /* Let's try detaching it now it is actually attached */
+ ret = bpf_prog_detach2(progfd, lircfd, BPF_LIRC_MODE2);
+ if (ret) {
+ printf("bpf_prog_detach2: returned %m\n");
+ return 1;
+ }
+
+ return 0;
+}
--
2.17.0
* [PATCH v5 2/3] media: rc: introduce BPF_PROG_LIRC_MODE2
From: Sean Young @ 2018-05-27 11:24 UTC
To: linux-media, linux-kernel, Alexei Starovoitov,
Mauro Carvalho Chehab, Daniel Borkmann, netdev, Matthias Reichl,
Devin Heitmueller, Y Song, Quentin Monnet
In-Reply-To: <cover.1527419762.git.sean@mess.org>
Add support for BPF_PROG_LIRC_MODE2. This type of BPF program can call
rc_keydown() to report decoded IR scancodes, or rc_repeat() to report
that the last key should be repeated.
The bpf program can be attached using the bpf(BPF_PROG_ATTACH) syscall;
the target_fd must be the /dev/lircN device.
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Sean Young <sean@mess.org>
---
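For reference, attaching through the raw syscall might look like the sketch
below. lirc_fill_attach_attr() and lirc_attach() are hypothetical helper
names, not part of this patch; the field usage follows lirc_prog_attach()
(attach_bpf_fd is the program, target_fd is the open /dev/lircN descriptor,
attach_flags must be zero). It assumes installed UAPI headers that already
define BPF_LIRC_MODE2.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Hypothetical helper: prepare the bpf_attr for BPF_PROG_ATTACH on a lirc device. */
void lirc_fill_attach_attr(union bpf_attr *attr, int prog_fd, int lirc_fd)
{
	memset(attr, 0, sizeof(*attr));		/* unused fields must be zero */
	attr->target_fd = lirc_fd;		/* open /dev/lircN descriptor */
	attr->attach_bpf_fd = prog_fd;		/* loaded BPF_PROG_TYPE_LIRC_MODE2 program */
	attr->attach_type = BPF_LIRC_MODE2;
	/* attach_flags stays 0: lirc_prog_attach() rejects anything else */
}

/* Hypothetical helper: returns 0 on success, -1 with errno set on failure. */
long lirc_attach(int prog_fd, int lirc_fd)
{
	union bpf_attr attr;

	lirc_fill_attach_attr(&attr, prog_fd, lirc_fd);
	return syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr));
}
```

The selftest in patch 3/3 does the same through libbpf, calling
bpf_prog_attach(progfd, lircfd, BPF_LIRC_MODE2, 0).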
drivers/media/rc/Kconfig | 13 ++
drivers/media/rc/Makefile | 1 +
drivers/media/rc/bpf-lirc.c | 313 ++++++++++++++++++++++++++++++++
drivers/media/rc/lirc_dev.c | 30 +++
drivers/media/rc/rc-core-priv.h | 21 +++
drivers/media/rc/rc-ir-raw.c | 12 +-
include/linux/bpf_lirc.h | 29 +++
include/linux/bpf_types.h | 3 +
include/uapi/linux/bpf.h | 53 +++++-
kernel/bpf/syscall.c | 7 +
10 files changed, 479 insertions(+), 3 deletions(-)
create mode 100644 drivers/media/rc/bpf-lirc.c
create mode 100644 include/linux/bpf_lirc.h
diff --git a/drivers/media/rc/Kconfig b/drivers/media/rc/Kconfig
index eb2c3b6eca7f..d5b35a6ba899 100644
--- a/drivers/media/rc/Kconfig
+++ b/drivers/media/rc/Kconfig
@@ -25,6 +25,19 @@ config LIRC
passes raw IR to and from userspace, which is needed for
IR transmitting (aka "blasting") and for the lirc daemon.
+config BPF_LIRC_MODE2
+ bool "Support for eBPF programs attached to lirc devices"
+ depends on BPF_SYSCALL
+ depends on RC_CORE=y
+ depends on LIRC
+ help
+ Allow attaching eBPF programs to a lirc device using the bpf(2)
+ syscall command BPF_PROG_ATTACH. This is supported for raw IR
+ receivers.
+
+ These eBPF programs can be used to decode IR into scancodes, for
+ IR protocols not supported by the kernel decoders.
+
menuconfig RC_DECODERS
bool "Remote controller decoders"
depends on RC_CORE
diff --git a/drivers/media/rc/Makefile b/drivers/media/rc/Makefile
index 2e1c87066f6c..e0340d043fe8 100644
--- a/drivers/media/rc/Makefile
+++ b/drivers/media/rc/Makefile
@@ -5,6 +5,7 @@ obj-y += keymaps/
obj-$(CONFIG_RC_CORE) += rc-core.o
rc-core-y := rc-main.o rc-ir-raw.o
rc-core-$(CONFIG_LIRC) += lirc_dev.o
+rc-core-$(CONFIG_BPF_LIRC_MODE2) += bpf-lirc.o
obj-$(CONFIG_IR_NEC_DECODER) += ir-nec-decoder.o
obj-$(CONFIG_IR_RC5_DECODER) += ir-rc5-decoder.o
obj-$(CONFIG_IR_RC6_DECODER) += ir-rc6-decoder.o
diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
new file mode 100644
index 000000000000..40826bba06b6
--- /dev/null
+++ b/drivers/media/rc/bpf-lirc.c
@@ -0,0 +1,313 @@
+// SPDX-License-Identifier: GPL-2.0
+// bpf-lirc.c - handles bpf
+//
+// Copyright (C) 2018 Sean Young <sean@mess.org>
+
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <linux/bpf_lirc.h>
+#include "rc-core-priv.h"
+
+/*
+ * BPF interface for raw IR
+ */
+const struct bpf_prog_ops lirc_mode2_prog_ops = {
+};
+
+BPF_CALL_1(bpf_rc_repeat, u32*, sample)
+{
+ struct ir_raw_event_ctrl *ctrl;
+
+ ctrl = container_of(sample, struct ir_raw_event_ctrl, bpf_sample);
+
+ rc_repeat(ctrl->dev);
+
+ return 0;
+}
+
+static const struct bpf_func_proto rc_repeat_proto = {
+ .func = bpf_rc_repeat,
+ .gpl_only = true, /* rc_repeat is EXPORT_SYMBOL_GPL */
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+};
+
+/*
+ * Currently rc-core does not support 64-bit scancodes, but there are many
+ * known protocols with more than 32 bits. So, define the interface as u64
+ * to be future-proof.
+ */
+BPF_CALL_4(bpf_rc_keydown, u32*, sample, u32, protocol, u64, scancode,
+ u32, toggle)
+{
+ struct ir_raw_event_ctrl *ctrl;
+
+ ctrl = container_of(sample, struct ir_raw_event_ctrl, bpf_sample);
+
+ rc_keydown(ctrl->dev, protocol, scancode, toggle != 0);
+
+ return 0;
+}
+
+static const struct bpf_func_proto rc_keydown_proto = {
+ .func = bpf_rc_keydown,
+ .gpl_only = true, /* rc_keydown is EXPORT_SYMBOL_GPL */
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_ANYTHING,
+ .arg3_type = ARG_ANYTHING,
+ .arg4_type = ARG_ANYTHING,
+};
+
+static const struct bpf_func_proto *
+lirc_mode2_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+ switch (func_id) {
+ case BPF_FUNC_rc_repeat:
+ return &rc_repeat_proto;
+ case BPF_FUNC_rc_keydown:
+ return &rc_keydown_proto;
+ case BPF_FUNC_map_lookup_elem:
+ return &bpf_map_lookup_elem_proto;
+ case BPF_FUNC_map_update_elem:
+ return &bpf_map_update_elem_proto;
+ case BPF_FUNC_map_delete_elem:
+ return &bpf_map_delete_elem_proto;
+ case BPF_FUNC_ktime_get_ns:
+ return &bpf_ktime_get_ns_proto;
+ case BPF_FUNC_tail_call:
+ return &bpf_tail_call_proto;
+ case BPF_FUNC_get_prandom_u32:
+ return &bpf_get_prandom_u32_proto;
+ case BPF_FUNC_trace_printk:
+ if (capable(CAP_SYS_ADMIN))
+ return bpf_get_trace_printk_proto();
+ /* fall through */
+ default:
+ return NULL;
+ }
+}
+
+static bool lirc_mode2_is_valid_access(int off, int size,
+ enum bpf_access_type type,
+ const struct bpf_prog *prog,
+ struct bpf_insn_access_aux *info)
+{
+ /* We have one field of u32 */
+ return type == BPF_READ && off == 0 && size == sizeof(u32);
+}
+
+const struct bpf_verifier_ops lirc_mode2_verifier_ops = {
+ .get_func_proto = lirc_mode2_func_proto,
+ .is_valid_access = lirc_mode2_is_valid_access
+};
+
+#define BPF_MAX_PROGS 64
+
+static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog)
+{
+ struct bpf_prog_array __rcu *old_array;
+ struct bpf_prog_array *new_array;
+ struct ir_raw_event_ctrl *raw;
+ int ret;
+
+ if (rcdev->driver_type != RC_DRIVER_IR_RAW)
+ return -EINVAL;
+
+ ret = mutex_lock_interruptible(&ir_raw_handler_lock);
+ if (ret)
+ return ret;
+
+ raw = rcdev->raw;
+ if (!raw) {
+ ret = -ENODEV;
+ goto unlock;
+ }
+
+ if (raw->progs && bpf_prog_array_length(raw->progs) >= BPF_MAX_PROGS) {
+ ret = -E2BIG;
+ goto unlock;
+ }
+
+ old_array = raw->progs;
+ ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
+ if (ret < 0)
+ goto unlock;
+
+ rcu_assign_pointer(raw->progs, new_array);
+ bpf_prog_array_free(old_array);
+
+unlock:
+ mutex_unlock(&ir_raw_handler_lock);
+ return ret;
+}
+
+static int lirc_bpf_detach(struct rc_dev *rcdev, struct bpf_prog *prog)
+{
+ struct bpf_prog_array __rcu *old_array;
+ struct bpf_prog_array *new_array;
+ struct ir_raw_event_ctrl *raw;
+ int ret;
+
+ if (rcdev->driver_type != RC_DRIVER_IR_RAW)
+ return -EINVAL;
+
+ ret = mutex_lock_interruptible(&ir_raw_handler_lock);
+ if (ret)
+ return ret;
+
+ raw = rcdev->raw;
+ if (!raw) {
+ ret = -ENODEV;
+ goto unlock;
+ }
+
+ old_array = raw->progs;
+ ret = bpf_prog_array_copy(old_array, prog, NULL, &new_array);
+ /*
+ * Do not use bpf_prog_array_delete_safe() as we would end up
+ * with a dummy entry in the array, and then we would free the
+ * dummy in lirc_bpf_free()
+ */
+ if (ret)
+ goto unlock;
+
+ rcu_assign_pointer(raw->progs, new_array);
+ bpf_prog_array_free(old_array);
+unlock:
+ mutex_unlock(&ir_raw_handler_lock);
+ return ret;
+}
+
+void lirc_bpf_run(struct rc_dev *rcdev, u32 sample)
+{
+ struct ir_raw_event_ctrl *raw = rcdev->raw;
+
+ raw->bpf_sample = sample;
+
+ if (raw->progs)
+ BPF_PROG_RUN_ARRAY(raw->progs, &raw->bpf_sample, BPF_PROG_RUN);
+}
+
+/*
+ * This should be called once the rc thread has been stopped, so there can be
+ * no concurrent bpf execution.
+ */
+void lirc_bpf_free(struct rc_dev *rcdev)
+{
+ struct bpf_prog **progs;
+
+ if (!rcdev->raw->progs)
+ return;
+
+ progs = rcu_dereference(rcdev->raw->progs)->progs;
+ while (*progs)
+ bpf_prog_put(*progs++);
+
+ bpf_prog_array_free(rcdev->raw->progs);
+}
+
+int lirc_prog_attach(const union bpf_attr *attr)
+{
+ struct bpf_prog *prog;
+ struct rc_dev *rcdev;
+ int ret;
+
+ if (attr->attach_flags)
+ return -EINVAL;
+
+ prog = bpf_prog_get_type(attr->attach_bpf_fd,
+ BPF_PROG_TYPE_LIRC_MODE2);
+ if (IS_ERR(prog))
+ return PTR_ERR(prog);
+
+ rcdev = rc_dev_get_from_fd(attr->target_fd);
+ if (IS_ERR(rcdev)) {
+ bpf_prog_put(prog);
+ return PTR_ERR(rcdev);
+ }
+
+ ret = lirc_bpf_attach(rcdev, prog);
+ if (ret)
+ bpf_prog_put(prog);
+
+ put_device(&rcdev->dev);
+
+ return ret;
+}
+
+int lirc_prog_detach(const union bpf_attr *attr)
+{
+ struct bpf_prog *prog;
+ struct rc_dev *rcdev;
+ int ret;
+
+ if (attr->attach_flags)
+ return -EINVAL;
+
+ prog = bpf_prog_get_type(attr->attach_bpf_fd,
+ BPF_PROG_TYPE_LIRC_MODE2);
+ if (IS_ERR(prog))
+ return PTR_ERR(prog);
+
+ rcdev = rc_dev_get_from_fd(attr->target_fd);
+ if (IS_ERR(rcdev)) {
+ bpf_prog_put(prog);
+ return PTR_ERR(rcdev);
+ }
+
+ ret = lirc_bpf_detach(rcdev, prog);
+
+ bpf_prog_put(prog);
+ put_device(&rcdev->dev);
+
+ return ret;
+}
+
+int lirc_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr)
+{
+ __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
+ struct bpf_prog_array __rcu *progs;
+ struct rc_dev *rcdev;
+ u32 cnt, flags = 0;
+ int ret;
+
+ if (attr->query.query_flags)
+ return -EINVAL;
+
+ rcdev = rc_dev_get_from_fd(attr->query.target_fd);
+ if (IS_ERR(rcdev))
+ return PTR_ERR(rcdev);
+
+ if (rcdev->driver_type != RC_DRIVER_IR_RAW) {
+ ret = -EINVAL;
+ goto put;
+ }
+
+ ret = mutex_lock_interruptible(&ir_raw_handler_lock);
+ if (ret)
+ goto put;
+
+ progs = rcdev->raw->progs;
+ cnt = progs ? bpf_prog_array_length(progs) : 0;
+
+ if (copy_to_user(&uattr->query.prog_cnt, &cnt, sizeof(cnt))) {
+ ret = -EFAULT;
+ goto unlock;
+ }
+
+ if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags))) {
+ ret = -EFAULT;
+ goto unlock;
+ }
+
+ if (attr->query.prog_cnt != 0 && prog_ids && cnt)
+ ret = bpf_prog_array_copy_to_user(progs, prog_ids, cnt);
+
+unlock:
+ mutex_unlock(&ir_raw_handler_lock);
+put:
+ put_device(&rcdev->dev);
+
+ return ret;
+}
diff --git a/drivers/media/rc/lirc_dev.c b/drivers/media/rc/lirc_dev.c
index 24e9fbb80e81..da7013a12a58 100644
--- a/drivers/media/rc/lirc_dev.c
+++ b/drivers/media/rc/lirc_dev.c
@@ -20,6 +20,7 @@
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/device.h>
+#include <linux/file.h>
#include <linux/idr.h>
#include <linux/poll.h>
#include <linux/sched.h>
@@ -104,6 +105,12 @@ void ir_lirc_raw_event(struct rc_dev *dev, struct ir_raw_event ev)
TO_US(ev.duration), TO_STR(ev.pulse));
}
+ /*
+ * bpf does not care about the gap generated above; that exists
+ * for backwards compatibility
+ */
+ lirc_bpf_run(dev, sample);
+
spin_lock_irqsave(&dev->lirc_fh_lock, flags);
list_for_each_entry(fh, &dev->lirc_fh, list) {
if (LIRC_IS_TIMEOUT(sample) && !fh->send_timeout_reports)
@@ -816,4 +823,27 @@ void __exit lirc_dev_exit(void)
unregister_chrdev_region(lirc_base_dev, RC_DEV_MAX);
}
+struct rc_dev *rc_dev_get_from_fd(int fd)
+{
+ struct fd f = fdget(fd);
+ struct lirc_fh *fh;
+ struct rc_dev *dev;
+
+ if (!f.file)
+ return ERR_PTR(-EBADF);
+
+ if (f.file->f_op != &lirc_fops) {
+ fdput(f);
+ return ERR_PTR(-EINVAL);
+ }
+
+ fh = f.file->private_data;
+ dev = fh->rc;
+
+ get_device(&dev->dev);
+ fdput(f);
+
+ return dev;
+}
+
MODULE_ALIAS("lirc_dev");
diff --git a/drivers/media/rc/rc-core-priv.h b/drivers/media/rc/rc-core-priv.h
index e0e6a17460f6..eb004757038b 100644
--- a/drivers/media/rc/rc-core-priv.h
+++ b/drivers/media/rc/rc-core-priv.h
@@ -13,6 +13,7 @@
#define MAX_IR_EVENT_SIZE 512
#include <linux/slab.h>
+#include <uapi/linux/bpf.h>
#include <media/rc-core.h>
/**
@@ -57,6 +58,11 @@ struct ir_raw_event_ctrl {
/* raw decoder state follows */
struct ir_raw_event prev_ev;
struct ir_raw_event this_ev;
+
+#ifdef CONFIG_BPF_LIRC_MODE2
+ u32 bpf_sample;
+ struct bpf_prog_array __rcu *progs;
+#endif
struct nec_dec {
int state;
unsigned count;
@@ -126,6 +132,9 @@ struct ir_raw_event_ctrl {
} imon;
};
+/* Mutex for locking raw IR processing and handler change */
+extern struct mutex ir_raw_handler_lock;
+
/* macros for IR decoders */
static inline bool geq_margin(unsigned d1, unsigned d2, unsigned margin)
{
@@ -288,6 +297,7 @@ void ir_lirc_raw_event(struct rc_dev *dev, struct ir_raw_event ev);
void ir_lirc_scancode_event(struct rc_dev *dev, struct lirc_scancode *lsc);
int ir_lirc_register(struct rc_dev *dev);
void ir_lirc_unregister(struct rc_dev *dev);
+struct rc_dev *rc_dev_get_from_fd(int fd);
#else
static inline int lirc_dev_init(void) { return 0; }
static inline void lirc_dev_exit(void) {}
@@ -299,4 +309,15 @@ static inline int ir_lirc_register(struct rc_dev *dev) { return 0; }
static inline void ir_lirc_unregister(struct rc_dev *dev) { }
#endif
+/*
+ * bpf interface
+ */
+#ifdef CONFIG_BPF_LIRC_MODE2
+void lirc_bpf_free(struct rc_dev *dev);
+void lirc_bpf_run(struct rc_dev *dev, u32 sample);
+#else
+static inline void lirc_bpf_free(struct rc_dev *dev) { }
+static inline void lirc_bpf_run(struct rc_dev *dev, u32 sample) { }
+#endif
+
#endif /* _RC_CORE_PRIV */
diff --git a/drivers/media/rc/rc-ir-raw.c b/drivers/media/rc/rc-ir-raw.c
index 374f83105a23..7675b7ee5bc7 100644
--- a/drivers/media/rc/rc-ir-raw.c
+++ b/drivers/media/rc/rc-ir-raw.c
@@ -14,7 +14,7 @@
static LIST_HEAD(ir_raw_client_list);
/* Used to handle IR raw handler extensions */
-static DEFINE_MUTEX(ir_raw_handler_lock);
+DEFINE_MUTEX(ir_raw_handler_lock);
static LIST_HEAD(ir_raw_handler_list);
static atomic64_t available_protocols = ATOMIC64_INIT(0);
@@ -621,9 +621,17 @@ void ir_raw_event_unregister(struct rc_dev *dev)
list_for_each_entry(handler, &ir_raw_handler_list, list)
if (handler->raw_unregister)
handler->raw_unregister(dev);
- mutex_unlock(&ir_raw_handler_lock);
+
+ lirc_bpf_free(dev);
ir_raw_event_free(dev);
+
+ /*
+ * A user can be calling bpf(BPF_PROG_{QUERY|ATTACH|DETACH}), so
+ * ensure that the raw member is null on unlock; this is how
+ * "device gone" is checked.
+ */
+ mutex_unlock(&ir_raw_handler_lock);
}
/*
diff --git a/include/linux/bpf_lirc.h b/include/linux/bpf_lirc.h
new file mode 100644
index 000000000000..5f8a4283092d
--- /dev/null
+++ b/include/linux/bpf_lirc.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _BPF_LIRC_H
+#define _BPF_LIRC_H
+
+#include <uapi/linux/bpf.h>
+
+#ifdef CONFIG_BPF_LIRC_MODE2
+int lirc_prog_attach(const union bpf_attr *attr);
+int lirc_prog_detach(const union bpf_attr *attr);
+int lirc_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr);
+#else
+static inline int lirc_prog_attach(const union bpf_attr *attr)
+{
+ return -EINVAL;
+}
+
+static inline int lirc_prog_detach(const union bpf_attr *attr)
+{
+ return -EINVAL;
+}
+
+static inline int lirc_prog_query(const union bpf_attr *attr,
+ union bpf_attr __user *uattr)
+{
+ return -EINVAL;
+}
+#endif
+
+#endif /* _BPF_LIRC_H */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index b161e506dcfc..c5700c2d5549 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -26,6 +26,9 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_RAW_TRACEPOINT, raw_tracepoint)
#ifdef CONFIG_CGROUP_BPF
BPF_PROG_TYPE(BPF_PROG_TYPE_CGROUP_DEVICE, cg_dev)
#endif
+#ifdef CONFIG_BPF_LIRC_MODE2
+BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2)
+#endif
BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9b8c6e310e9a..4636c596096d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -143,6 +143,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_RAW_TRACEPOINT,
BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
BPF_PROG_TYPE_LWT_SEG6LOCAL,
+ BPF_PROG_TYPE_LIRC_MODE2,
};
enum bpf_attach_type {
@@ -160,6 +161,7 @@ enum bpf_attach_type {
BPF_CGROUP_INET6_CONNECT,
BPF_CGROUP_INET4_POST_BIND,
BPF_CGROUP_INET6_POST_BIND,
+ BPF_LIRC_MODE2,
__MAX_BPF_ATTACH_TYPE
};
@@ -2004,6 +2006,53 @@ union bpf_attr {
* direct packet access.
* Return
* 0 on success, or a negative error in case of failure.
+ *
+ * int bpf_rc_keydown(void *ctx, u32 protocol, u64 scancode, u32 toggle)
+ * Description
+ * This helper is used in programs implementing IR decoding, to
+ * report a successfully decoded key press with *scancode*,
+ * *toggle* value in the given *protocol*. The scancode will be
+ * translated to a keycode using the rc keymap, and reported as
+ * an input key down event. After a period a key up event is
+ * generated. This period can be extended by calling either
+ * **bpf_rc_keydown** () again with the same values, or calling
+ * **bpf_rc_repeat** ().
+ *
+ * Some protocols include a toggle bit, in case the button was
+ * released and pressed again between consecutive scancodes.
+ *
+ * The *ctx* should point to the lirc sample as passed into
+ * the program.
+ *
+ * The *protocol* is the decoded protocol number (see
+ * **enum rc_proto** for some predefined values).
+ *
+ * This helper is only available if the kernel was compiled with
+ * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
+ * "**y**".
+ *
+ * Return
+ * 0
+ *
+ * int bpf_rc_repeat(void *ctx)
+ * Description
+ * This helper is used in programs implementing IR decoding, to
+ * report a successfully decoded repeat key message. This delays
+ * the generation of a key up event for previously generated
+ * key down event.
+ *
+ * Some IR protocols like NEC have a special IR message for
+ * repeating last button, for when a button is held down.
+ *
+ * The *ctx* should point to the lirc sample as passed into
+ * the program.
+ *
+ * This helper is only available if the kernel was compiled with
+ * the **CONFIG_BPF_LIRC_MODE2** configuration option set to
+ * "**y**".
+ *
+ * Return
+ * 0
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -2082,7 +2131,9 @@ union bpf_attr {
FN(lwt_push_encap), \
FN(lwt_seg6_store_bytes), \
FN(lwt_seg6_adjust_srh), \
- FN(lwt_seg6_action),
+ FN(lwt_seg6_action), \
+ FN(rc_repeat), \
+ FN(rc_keydown),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 388d4feda348..3c104113d040 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -11,6 +11,7 @@
*/
#include <linux/bpf.h>
#include <linux/bpf_trace.h>
+#include <linux/bpf_lirc.h>
#include <linux/btf.h>
#include <linux/syscalls.h>
#include <linux/slab.h>
@@ -1578,6 +1579,8 @@ static int bpf_prog_attach(const union bpf_attr *attr)
case BPF_SK_SKB_STREAM_PARSER:
case BPF_SK_SKB_STREAM_VERDICT:
return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, true);
+ case BPF_LIRC_MODE2:
+ return lirc_prog_attach(attr);
default:
return -EINVAL;
}
@@ -1648,6 +1651,8 @@ static int bpf_prog_detach(const union bpf_attr *attr)
case BPF_SK_SKB_STREAM_PARSER:
case BPF_SK_SKB_STREAM_VERDICT:
return sockmap_get_from_fd(attr, BPF_PROG_TYPE_SK_SKB, false);
+ case BPF_LIRC_MODE2:
+ return lirc_prog_detach(attr);
default:
return -EINVAL;
}
@@ -1695,6 +1700,8 @@ static int bpf_prog_query(const union bpf_attr *attr,
case BPF_CGROUP_SOCK_OPS:
case BPF_CGROUP_DEVICE:
break;
+ case BPF_LIRC_MODE2:
+ return lirc_prog_query(attr, uattr);
default:
return -EINVAL;
}
--
2.17.0
* [PATCH v5 1/3] bpf: bpf_prog_array_copy() should return -ENOENT if exclude_prog not found
From: Sean Young @ 2018-05-27 11:24 UTC (permalink / raw)
To: linux-media, linux-kernel, Alexei Starovoitov,
Mauro Carvalho Chehab, Daniel Borkmann, netdev, Matthias Reichl,
Devin Heitmueller, Y Song, Quentin Monnet
In-Reply-To: <cover.1527419762.git.sean@mess.org>
This makes it possible for bpf prog detach to return -ENOENT.
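The counting pass changed by this patch can be modelled in userspace roughly as follows. This is a minimal sketch, not the kernel code: `struct prog` and `count_carried()` are hypothetical stand-ins for `struct bpf_prog` and the first loop of bpf_prog_array_copy(), and the dummy-program handling is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_prog; only pointer identity matters. */
struct prog { int id; };

/*
 * Model of the counting pass over a NULL-terminated program array.
 * Returns the number of entries that would be carried into the new
 * array, or a negative errno.
 */
static int count_carried(struct prog **old, const struct prog *exclude,
			 const struct prog *include)
{
	int carried = 0;
	int found_exclude = 0;

	if (old) {
		for (; *old; old++) {
			if (*old == exclude) {
				found_exclude = 1;
				continue;	/* dropped from the new array */
			}
			if (*old == include)
				return -EEXIST;	/* already attached */
			carried++;
		}
	}

	/* Asked to remove a program that was never in the array. */
	if (exclude && !found_exclude)
		return -ENOENT;

	return carried;
}
```

With an array holding programs `a` and `b`, excluding `a` yields 1 carried entry, while excluding a program that is not in the array yields -ENOENT instead of a silent success.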
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Sean Young <sean@mess.org>
---
kernel/bpf/core.c | 11 +++++++++--
kernel/trace/bpf_trace.c | 2 ++
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index b574dddc05b8..527587de8a67 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1616,6 +1616,7 @@ int bpf_prog_array_copy(struct bpf_prog_array __rcu *old_array,
int new_prog_cnt, carry_prog_cnt = 0;
struct bpf_prog **existing_prog;
struct bpf_prog_array *array;
+ bool found_exclude = false;
int new_prog_idx = 0;
/* Figure out how many existing progs we need to carry over to
@@ -1624,14 +1625,20 @@ int bpf_prog_array_copy(struct bpf_prog_array __rcu *old_array,
if (old_array) {
existing_prog = old_array->progs;
for (; *existing_prog; existing_prog++) {
- if (*existing_prog != exclude_prog &&
- *existing_prog != &dummy_bpf_prog.prog)
+ if (*existing_prog == exclude_prog) {
+ found_exclude = true;
+ continue;
+ }
+ if (*existing_prog != &dummy_bpf_prog.prog)
carry_prog_cnt++;
if (*existing_prog == include_prog)
return -EEXIST;
}
}
+ if (exclude_prog && !found_exclude)
+ return -ENOENT;
+
/* How many progs (not NULL) will be in the new array? */
new_prog_cnt = carry_prog_cnt;
if (include_prog)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 81fdf2fc94ac..af1486d9a0ed 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1006,6 +1006,8 @@ void perf_event_detach_bpf_prog(struct perf_event *event)
old_array = event->tp_event->prog_array;
ret = bpf_prog_array_copy(old_array, event->prog, NULL, &new_array);
+ if (ret == -ENOENT)
+ goto unlock;
if (ret < 0) {
bpf_prog_array_delete_safe(old_array, event->prog);
} else {
--
2.17.0
* [PATCH v5 0/3] IR decoding using BPF
From: Sean Young @ 2018-05-27 11:24 UTC (permalink / raw)
To: linux-media, linux-kernel, Alexei Starovoitov,
Mauro Carvalho Chehab, Daniel Borkmann, netdev, Matthias Reichl,
Devin Heitmueller, Y Song, Quentin Monnet
The kernel IR decoders (drivers/media/rc/ir-*-decoder.c) support the most
widely used IR protocols, but there are many protocols which are not
supported[1]. For example, the lirc-remotes[2] repo has over 2700 remotes,
many of which are not supported by rc-core. There is a "long tail" of
unsupported IR protocols, for which lircd is needed to decode the IR.
IR encoding is done in such a way that some simple circuit can decode it;
therefore, bpf is ideal.
In order to support all these protocols, here we have bpf based IR decoding.
The idea is that user-space can define a decoder in bpf and attach it to
the rc device through the lirc chardev.
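To give a feel for what such a decoder works with, here is a userspace sketch of how a lirc mode2 sample can be split into its type and duration. It is an illustration only, not one of the decoders from this series: the macro values mirror include/uapi/linux/lirc.h but are redefined locally so the sketch builds standalone, and `is_header_pulse()` is a hypothetical toy check modelled on the 9000 microsecond NEC leader pulse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the lirc mode2 encoding: top 8 bits type, low 24 bits value. */
#define MODE2_SPACE	0x00000000u
#define MODE2_PULSE	0x01000000u
#define MODE2_MASK	0xFF000000u
#define VALUE_MASK	0x00FFFFFFu

static bool sample_is_pulse(uint32_t sample)
{
	return (sample & MODE2_MASK) == MODE2_PULSE;
}

/* Duration in microseconds for pulse and space samples. */
static uint32_t sample_duration_us(uint32_t sample)
{
	return sample & VALUE_MASK;
}

/*
 * Hypothetical toy check: accept a pulse within 25% of 9000us as a
 * protocol header. A real decoder would go on to track state across
 * the following samples.
 */
static bool is_header_pulse(uint32_t sample)
{
	uint32_t d = sample_duration_us(sample);

	return sample_is_pulse(sample) && d >= 6750 && d <= 11250;
}
```

A BPF decoder attached through the lirc chardev would see one such sample per invocation and would have to keep its protocol state elsewhere, for example in a map.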
Separate work is underway to extend ir-keytable to have an extensive library
of bpf-based decoders, and a much expanded library of rc keymaps.
Another future application would be to compile IRP[3] to an IR BPF program, and
so support virtually every remote without having to write a decoder for each.
It might also be possible to support non-button devices such as analog
directional pads or air conditioning remote controls and decode the target
temperature in bpf, and pass that to an input device.
Thanks,
Sean Young
[1] http://www.hifi-remote.com/wiki/index.php?title=DecodeIR
[2] https://sourceforge.net/p/lirc-remotes/code/ci/master/tree/remotes/
[3] http://www.hifi-remote.com/wiki/index.php?title=IRP_Notation
Changes since v4:
- Renamed rc_dev_bpf_{attach,detach,query} to lirc_bpf_{attach,detach,query}
- Fixed error path in lirc_bpf_query
- Rebased on bpf-next
Changes since v3:
- Implemented review comments from Quentin Monnet and Y Song (thanks!)
- More helpful and better formatted bpf helper documentation
- Changed back to bpf_prog_array rather than open-coded implementation
- scancodes can be 64 bit
- bpf gets passed values in microseconds, not nanoseconds.
microseconds is more than enough (IR receivers support carriers up to
70kHz, at which point a single period is already 14 microseconds). Also,
this makes it much more consistent with lirc mode2.
- Since it looks much more like lirc mode2, rename the program type to
BPF_PROG_TYPE_LIRC_MODE2.
- Rebased on bpf-next
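The carrier figure quoted in the v3 changelog for the switch to microseconds can be checked with integer arithmetic. A small sketch; `carrier_period_us()` is a hypothetical helper, not part of the series:

```c
#include <assert.h>
#include <stdint.h>

/* Length of one carrier period in whole microseconds (rounded down). */
static uint32_t carrier_period_us(uint32_t carrier_hz)
{
	assert(carrier_hz != 0);	/* avoid division by zero */
	return 1000000u / carrier_hz;
}
```

For a 70 kHz carrier this gives 14, so even a single carrier period at the upper end of what receivers support is well above the 1 microsecond resolution.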
Changes since v2:
- Fixed locking issues
- Improved self-test to cover more cases
- Rebased on bpf-next again
Changes since v1:
- Code review comments from Y Song <ys114321@gmail.com> and
Randy Dunlap <rdunlap@infradead.org>
- Re-wrote sample bpf to be selftest
- Renamed RAWIR_DECODER -> RAWIR_EVENT (Kconfig, context, bpf prog type)
- Rebase on bpf-next
- Introduced bpf_rawir_event context structure with simpler access checking
Sean Young (3):
bpf: bpf_prog_array_copy() should return -ENOENT if exclude_prog not
found
media: rc: introduce BPF_PROG_LIRC_MODE2
bpf: add selftest for lirc_mode2 type program
drivers/media/rc/Kconfig | 13 +
drivers/media/rc/Makefile | 1 +
drivers/media/rc/bpf-lirc.c | 313 ++++++++++++++++++
drivers/media/rc/lirc_dev.c | 30 ++
drivers/media/rc/rc-core-priv.h | 21 ++
drivers/media/rc/rc-ir-raw.c | 12 +-
include/linux/bpf_lirc.h | 29 ++
include/linux/bpf_types.h | 3 +
include/uapi/linux/bpf.h | 53 ++-
kernel/bpf/core.c | 11 +-
kernel/bpf/syscall.c | 7 +
kernel/trace/bpf_trace.c | 2 +
tools/bpf/bpftool/prog.c | 1 +
tools/include/uapi/linux/bpf.h | 53 ++-
tools/include/uapi/linux/lirc.h | 217 ++++++++++++
tools/lib/bpf/libbpf.c | 1 +
tools/testing/selftests/bpf/.gitignore | 1 +
tools/testing/selftests/bpf/Makefile | 7 +-
tools/testing/selftests/bpf/bpf_helpers.h | 5 +
.../testing/selftests/bpf/test_lirc_mode2.sh | 28 ++
.../selftests/bpf/test_lirc_mode2_kern.c | 23 ++
.../selftests/bpf/test_lirc_mode2_user.c | 149 +++++++++
22 files changed, 971 insertions(+), 9 deletions(-)
create mode 100644 drivers/media/rc/bpf-lirc.c
create mode 100644 include/linux/bpf_lirc.h
create mode 100644 tools/include/uapi/linux/lirc.h
create mode 100755 tools/testing/selftests/bpf/test_lirc_mode2.sh
create mode 100644 tools/testing/selftests/bpf/test_lirc_mode2_kern.c
create mode 100644 tools/testing/selftests/bpf/test_lirc_mode2_user.c
--
2.17.0
* [PATCH rdma-next 3/3] IB/mlx5: Introduce a new mini-CQE format
From: Leon Romanovsky @ 2018-05-27 10:42 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Guy Levi, Yishai Hadas,
Yonatan Cohen, Saeed Mahameed, linux-netdev
In-Reply-To: <20180527104234.17261-1-leon@kernel.org>
From: Yonatan Cohen <yonatanc@mellanox.com>
The new mini-CQE format includes the stride index, byte count and
packet checksum.
The stride index is needed for the striding WQ feature.
This patch exposes this capability and enables its setting
via mlx5 UHW data as part of query device and cq creation.
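The diff below replaces an ilog2() of the user-supplied flag with an explicit mapping. A userspace sketch of why that is needed, using the values from the patch: the new uapi flag is 1 << 2, whose log2 is 2, but the hardware encoding for that format is 3. The names `format_to_hw()` and `ilog2_u32()` and the boolean capability argument are assumptions made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* uapi bit flags, as in include/uapi/rdma/mlx5-abi.h after this patch. */
enum {
	UAPI_FORMAT_HASH	= 1 << 0,
	UAPI_FORMAT_CSUM	= 1 << 1,
	UAPI_FORMAT_CSUM_STRIDX	= 1 << 2,
};

/* Hardware encodings, as in the enum added to cq.c by this patch. */
enum {
	HW_FORMAT_HASH		= 0,
	HW_FORMAT_CSUM		= 1,
	HW_FORMAT_CSUM_STRIDX	= 3,
};

/* Map one uapi flag to its hardware value, or return a negative errno. */
static int format_to_hw(unsigned int format, bool stride_index_cap)
{
	switch (format) {
	case UAPI_FORMAT_HASH:
		return HW_FORMAT_HASH;
	case UAPI_FORMAT_CSUM:
		return HW_FORMAT_CSUM;
	case UAPI_FORMAT_CSUM_STRIDX:
		return stride_index_cap ? HW_FORMAT_CSUM_STRIDX : -EOPNOTSUPP;
	default:
		return -EINVAL;	/* zero, several bits set, or unknown */
	}
}

/* Integer log2 for comparison with the code being removed. */
static int ilog2_u32(unsigned int v)
{
	int n = -1;

	while (v) {
		v >>= 1;
		n++;
	}
	return n;
}
```

Here `ilog2_u32(UAPI_FORMAT_CSUM_STRIDX)` is 2 while `format_to_hw(UAPI_FORMAT_CSUM_STRIDX, true)` is 3, so the old computation cannot express the new format.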
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/hw/mlx5/cq.c | 42 +++++++++++++++++++++++++++++----------
drivers/infiniband/hw/mlx5/main.c | 4 ++++
include/uapi/rdma/mlx5-abi.h | 2 +-
3 files changed, 37 insertions(+), 11 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 7b4ce1a19de0..ad39d64b8108 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -751,6 +751,28 @@ static int alloc_cq_frag_buf(struct mlx5_ib_dev *dev,
return 0;
}
+enum {
+ MLX5_CQE_RES_FORMAT_HASH = 0,
+ MLX5_CQE_RES_FORMAT_CSUM = 1,
+ MLX5_CQE_RES_FORMAT_CSUM_STRIDX = 3,
+};
+
+static int mini_cqe_res_format_to_hw(struct mlx5_ib_dev *dev, u8 format)
+{
+ switch (format) {
+ case MLX5_IB_CQE_RES_FORMAT_HASH:
+ return MLX5_CQE_RES_FORMAT_HASH;
+ case MLX5_IB_CQE_RES_FORMAT_CSUM:
+ return MLX5_CQE_RES_FORMAT_CSUM;
+ case MLX5_IB_CQE_RES_FORMAT_CSUM_STRIDX:
+ if (MLX5_CAP_GEN(dev->mdev, mini_cqe_resp_stride_index))
+ return MLX5_CQE_RES_FORMAT_CSUM_STRIDX;
+ return -EOPNOTSUPP;
+ default:
+ return -EINVAL;
+ }
+}
+
static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
struct ib_ucontext *context, struct mlx5_ib_cq *cq,
int entries, u32 **cqb,
@@ -816,6 +838,8 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
*index = to_mucontext(context)->bfregi.sys_pages[0];
if (ucmd.cqe_comp_en == 1) {
+ int mini_cqe_format;
+
if (!((*cqe_size == 128 &&
MLX5_CAP_GEN(dev->mdev, cqe_compression_128)) ||
(*cqe_size == 64 &&
@@ -826,20 +850,18 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
goto err_cqb;
}
- if (unlikely(!ucmd.cqe_comp_res_format ||
- !(ucmd.cqe_comp_res_format <
- MLX5_IB_CQE_RES_RESERVED) ||
- (ucmd.cqe_comp_res_format &
- (ucmd.cqe_comp_res_format - 1)))) {
- err = -EOPNOTSUPP;
- mlx5_ib_warn(dev, "CQE compression res format %d is not supported!\n",
- ucmd.cqe_comp_res_format);
+ mini_cqe_format =
+ mini_cqe_res_format_to_hw(dev,
+ ucmd.cqe_comp_res_format);
+ if (mini_cqe_format < 0) {
+ err = mini_cqe_format;
+ mlx5_ib_dbg(dev, "CQE compression res format %d error: %d\n",
+ ucmd.cqe_comp_res_format, err);
goto err_cqb;
}
MLX5_SET(cqc, cqc, cqe_comp_en, 1);
- MLX5_SET(cqc, cqc, mini_cqe_res_format,
- ilog2(ucmd.cqe_comp_res_format));
+ MLX5_SET(cqc, cqc, mini_cqe_res_format, mini_cqe_format);
}
if (ucmd.flags & MLX5_IB_CREATE_CQ_FLAGS_CQE_128B_PAD) {
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 95e67a85078c..238f1eed714c 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -992,6 +992,10 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
resp.cqe_comp_caps.supported_format =
MLX5_IB_CQE_RES_FORMAT_HASH |
MLX5_IB_CQE_RES_FORMAT_CSUM;
+
+ if (MLX5_CAP_GEN(dev->mdev, mini_cqe_resp_stride_index))
+ resp.cqe_comp_caps.supported_format |=
+ MLX5_IB_CQE_RES_FORMAT_CSUM_STRIDX;
}
}
diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
index beec971effef..a03b68b3e26c 100644
--- a/include/uapi/rdma/mlx5-abi.h
+++ b/include/uapi/rdma/mlx5-abi.h
@@ -166,7 +166,7 @@ struct mlx5_ib_rss_caps {
enum mlx5_ib_cqe_comp_res_format {
MLX5_IB_CQE_RES_FORMAT_HASH = 1 << 0,
MLX5_IB_CQE_RES_FORMAT_CSUM = 1 << 1,
- MLX5_IB_CQE_RES_RESERVED = 1 << 2,
+ MLX5_IB_CQE_RES_FORMAT_CSUM_STRIDX = 1 << 2,
};
struct mlx5_ib_cqe_comp_caps {
--
2.14.3
* [PATCH rdma-next 2/3] IB/mlx5: Refactor CQE compression response
From: Leon Romanovsky @ 2018-05-27 10:42 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Guy Levi, Yishai Hadas,
Yonatan Cohen, Saeed Mahameed, linux-netdev
In-Reply-To: <20180527104234.17261-1-leon@kernel.org>
From: Yonatan Cohen <yonatanc@mellanox.com>
Refactor CQE compression response to be fully set only
when it's really supported. There is no change from user
perspective because anyway resp.cqe_comp_caps.max_num was
set to zero.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/hw/mlx5/main.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 59e080da32aa..95e67a85078c 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -982,13 +982,17 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
}
if (field_avail(typeof(resp), cqe_comp_caps, uhw->outlen)) {
- resp.cqe_comp_caps.max_num =
- MLX5_CAP_GEN(dev->mdev, cqe_compression) ?
- MLX5_CAP_GEN(dev->mdev, cqe_compression_max_num) : 0;
- resp.cqe_comp_caps.supported_format =
- MLX5_IB_CQE_RES_FORMAT_HASH |
- MLX5_IB_CQE_RES_FORMAT_CSUM;
resp.response_length += sizeof(resp.cqe_comp_caps);
+
+ if (MLX5_CAP_GEN(dev->mdev, cqe_compression)) {
+ resp.cqe_comp_caps.max_num =
+ MLX5_CAP_GEN(dev->mdev,
+ cqe_compression_max_num);
+
+ resp.cqe_comp_caps.supported_format =
+ MLX5_IB_CQE_RES_FORMAT_HASH |
+ MLX5_IB_CQE_RES_FORMAT_CSUM;
+ }
}
if (field_avail(typeof(resp), packet_pacing_caps, uhw->outlen) &&
--
2.14.3
* [PATCH mlx5-next 1/3] net/mlx5: Exposing a new mini-CQE format
From: Leon Romanovsky @ 2018-05-27 10:42 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Guy Levi, Yishai Hadas,
Yonatan Cohen, Saeed Mahameed, linux-netdev
In-Reply-To: <20180527104234.17261-1-leon@kernel.org>
From: Yonatan Cohen <yonatanc@mellanox.com>
The new mini-CQE format includes byte-count, checksum
and stride index.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
include/linux/mlx5/mlx5_ifc.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index ee8e5a0d0acf..1ea2827e94be 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1160,7 +1160,8 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 flex_parser_protocols[0x20];
u8 reserved_at_560[0x20];
- u8 reserved_at_580[0x3d];
+ u8 reserved_at_580[0x3c];
+ u8 mini_cqe_resp_stride_index[0x1];
u8 cqe_128_always[0x1];
u8 cqe_compression_128[0x1];
u8 cqe_compression[0x1];
--
2.14.3
* [PATCH rdma-next 0/3] Introduce new mlx5 CQE format
From: Leon Romanovsky @ 2018-05-27 10:42 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Guy Levi, Yishai Hadas,
Yonatan Cohen, Saeed Mahameed, linux-netdev
From: Leon Romanovsky <leonro@mellanox.com>
Introduce a new mlx5-internal CQE format, the mini-CQE. It is a CQE in
compressed form that holds the data needed to extract a single full CQE.
It includes the stride index, byte count and packet checksum.
Thanks
Yonatan Cohen (3):
net/mlx5: Exposing a new mini-CQE format
IB/mlx5: Refactor CQE compression response
IB/mlx5: Introduce a new mini-CQE format
drivers/infiniband/hw/mlx5/cq.c | 42 +++++++++++++++++++++++++++++----------
drivers/infiniband/hw/mlx5/main.c | 20 +++++++++++++------
include/linux/mlx5/mlx5_ifc.h | 3 ++-
include/uapi/rdma/mlx5-abi.h | 2 +-
4 files changed, 49 insertions(+), 18 deletions(-)
* [PATCH rdma-next v1 08/13] IB/core: Add support for flow counters
From: Leon Romanovsky @ 2018-05-27 10:23 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
Raed Salem, Yishai Hadas, Saeed Mahameed, linux-netdev
In-Reply-To: <20180527102346.15149-1-leon@kernel.org>
From: Raed Salem <raeds@mellanox.com>
A counters object can be attached to a flow on creation
by providing the counter specification action.
General counter descriptions, which count packets and bytes, are
introduced; downstream patches in this series will use them
as part of flow counters binding.
In addition, increase the number of supported flow specification
layers to 10 to account for the new count specification and the
previously added drop specification.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
include/rdma/ib_verbs.h | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 80956b1c9f4d..3acf7a9fa452 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1859,9 +1859,10 @@ enum ib_flow_spec_type {
IB_FLOW_SPEC_ACTION_TAG = 0x1000,
IB_FLOW_SPEC_ACTION_DROP = 0x1001,
IB_FLOW_SPEC_ACTION_HANDLE = 0x1002,
+ IB_FLOW_SPEC_ACTION_COUNT = 0x1003,
};
#define IB_FLOW_SPEC_LAYER_MASK 0xF0
-#define IB_FLOW_SPEC_SUPPORT_LAYERS 8
+#define IB_FLOW_SPEC_SUPPORT_LAYERS 10
/* Flow steering rule priority is set according to it's domain.
* Lower domain value means higher priority.
@@ -2041,6 +2042,17 @@ struct ib_flow_spec_action_handle {
struct ib_flow_action *act;
};
+enum ib_counters_description {
+ IB_COUNTER_PACKETS,
+ IB_COUNTER_BYTES,
+};
+
+struct ib_flow_spec_action_count {
+ enum ib_flow_spec_type type;
+ u16 size;
+ struct ib_counters *counters;
+};
+
union ib_flow_spec {
struct {
u32 type;
@@ -2058,6 +2070,7 @@ union ib_flow_spec {
struct ib_flow_spec_action_tag flow_tag;
struct ib_flow_spec_action_drop drop;
struct ib_flow_spec_action_handle action;
+ struct ib_flow_spec_action_count flow_count;
};
struct ib_flow_attr {
--
2.14.3
* [PATCH rdma-next v1 13/13] IB/mlx5: Add counters read support
From: Leon Romanovsky @ 2018-05-27 10:23 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
Raed Salem, Yishai Hadas, Saeed Mahameed, linux-netdev
In-Reply-To: <20180527102346.15149-1-leon@kernel.org>
From: Raed Salem <raeds@mellanox.com>
This patch implements the uverbs counters read API. It uses the read
function specific to the given counters type to accomplish its
task.
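The final loop of the read path, which assigns hardware values to user buffer slots according to (description, index) pairs, can be sketched in userspace as follows. `scatter_counters()` is a hypothetical name, and the flat pair layout is taken from how the diff walks `counters_data`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * desc holds npairs (description, index) pairs laid out flat:
 * desc[2 * k] selects a hardware counter in hw_out, and
 * desc[2 * k + 1] selects the slot in user_buf it is added to.
 * The caller must ensure every index is within bounds.
 */
static void scatter_counters(const uint32_t *desc, size_t npairs,
			     const uint64_t *hw_out, uint64_t *user_buf)
{
	size_t i;

	for (i = 0; i < npairs * 2; i += 2)
		user_buf[desc[i + 1]] += hw_out[desc[i]];
}
```

With pairs (0, 2) and (1, 0), hardware counter 0 (packets) lands in slot 2 and hardware counter 1 (bytes) lands in slot 0. Because the loop accumulates with `+=`, the user buffer is expected to start out zeroed.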
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/hw/mlx5/main.c | 44 +++++++++++++++++++++++++++++++++++++++
1 file changed, 44 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 2044d9f69a83..ef688a265d47 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -5316,6 +5316,49 @@ static void depopulate_specs_root(struct mlx5_ib_dev *dev)
uverbs_free_spec_tree(dev->ib_dev.specs_root);
}
+static int mlx5_ib_read_counters(struct ib_counters *counters,
+ struct ib_counters_read_attr *read_attr,
+ struct uverbs_attr_bundle *attrs)
+{
+ struct mlx5_ib_mcounters *mcounters = to_mcounters(counters);
+ struct mlx5_read_counters_attr mread_attr = {};
+ u32 *desc;
+ int ret, i;
+
+ mutex_lock(&mcounters->mcntrs_mutex);
+ if (mcounters->cntrs_max_index > read_attr->ncounters) {
+ ret = -EINVAL;
+ goto err_bound;
+ }
+
+ mread_attr.out = kcalloc(mcounters->counters_num, sizeof(u64),
+ GFP_KERNEL);
+ if (!mread_attr.out) {
+ ret = -ENOMEM;
+ goto err_bound;
+ }
+
+ mread_attr.hw_cntrs_hndl = mcounters->hw_cntrs_hndl;
+ mread_attr.flags = read_attr->flags;
+ ret = mcounters->read_counters(counters->device, &mread_attr);
+ if (ret)
+ goto err_read;
+
+ /*
+ * We pass over the counters data array to assign according
+ * to the descriptions and indexing pairs.
+ */
+ desc = mcounters->counters_data;
+ for (i = 0; i < mcounters->ncounters * 2; i += 2)
+ read_attr->counters_buff[desc[i+1]] += mread_attr.out[desc[i]];
+
+err_read:
+ kfree(mread_attr.out);
+err_bound:
+ mutex_unlock(&mcounters->mcntrs_mutex);
+ return ret;
+}
+
static int mlx5_ib_destroy_counters(struct ib_counters *counters)
{
struct mlx5_ib_mcounters *mcounters = to_mcounters(counters);
@@ -5589,6 +5632,7 @@ int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev)
dev->ib_dev.driver_id = RDMA_DRIVER_MLX5;
dev->ib_dev.create_counters = mlx5_ib_create_counters;
dev->ib_dev.destroy_counters = mlx5_ib_destroy_counters;
+ dev->ib_dev.read_counters = mlx5_ib_read_counters;
err = init_node_data(dev);
if (err)
--
2.14.3
* [PATCH rdma-next v1 12/13] IB/mlx5: Add flow counters read support
From: Leon Romanovsky @ 2018-05-27 10:23 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
Raed Salem, Yishai Hadas, Saeed Mahameed, linux-netdev
In-Reply-To: <20180527102346.15149-1-leon@kernel.org>
From: Raed Salem <raeds@mellanox.com>
Implement the flow counters read wrapper.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/hw/mlx5/main.c | 16 ++++++++++++++++
drivers/infiniband/hw/mlx5/mlx5_ib.h | 11 +++++++++++
2 files changed, 27 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 3f1e957946e6..2044d9f69a83 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3150,7 +3150,21 @@ static void set_underlay_qp(struct mlx5_ib_dev *dev,
}
}
+static int read_flow_counters(struct ib_device *ibdev,
+ struct mlx5_read_counters_attr *read_attr)
+{
+ struct mlx5_fc *fc = (struct mlx5_fc *)(read_attr->hw_cntrs_hndl);
+ struct mlx5_ib_dev *dev = to_mdev(ibdev);
+
+ return mlx5_fc_query(dev->mdev, fc->id,
+ &read_attr->out[IB_COUNTER_PACKETS],
+ &read_attr->out[IB_COUNTER_BYTES]);
+}
+
#define MAX_COUNTERS_NUM (USHRT_MAX / (sizeof(u32) * 2))
+/* flow counters currently expose two counters packets and bytes */
+#define FLOW_COUNTERS_NUM 2
+
static int counters_set_description(struct ib_counters *counters,
enum mlx5_ib_counters_type counters_type,
const void __user *cntrs_data,
@@ -3182,6 +3196,8 @@ static int counters_set_description(struct ib_counters *counters,
/* init the fields for the object */
mcounters->type = counters_type;
+ mcounters->read_counters = read_flow_counters;
+ mcounters->counters_num = FLOW_COUNTERS_NUM;
mcounters->ncounters = ncounters;
desc = mcounters->counters_data;
for (i = 0; i < ncounters * 2; i += 2) {
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 7313d3cd04f0..1baee579d84b 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -814,6 +814,12 @@ struct mlx5_memic {
DECLARE_BITMAP(memic_alloc_pages, MLX5_MAX_MEMIC_PAGES);
};
+struct mlx5_read_counters_attr {
+ void *hw_cntrs_hndl;
+ u64 *out;
+ u32 flags;
+};
+
enum mlx5_ib_counters_type {
MLX5_IB_COUNTERS_FLOW,
};
@@ -821,7 +827,12 @@ enum mlx5_ib_counters_type {
struct mlx5_ib_mcounters {
struct ib_counters ibcntrs;
enum mlx5_ib_counters_type type;
+ /* number of counters supported for this counters type */
+ u32 counters_num;
void *hw_cntrs_hndl;
+ /* read function for this counters type */
+ int (*read_counters)(struct ib_device *ibdev,
+ struct mlx5_read_counters_attr *read_attr);
/* max index set as part of create_flow */
u32 cntrs_max_index;
/* number of counters data entries (<description,index> pair) */
--
2.14.3
* [PATCH rdma-next v1 11/13] IB/mlx5: Add flow counters binding support
From: Leon Romanovsky @ 2018-05-27 10:23 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
Raed Salem, Yishai Hadas, Saeed Mahameed, linux-netdev
In-Reply-To: <20180527102346.15149-1-leon@kernel.org>
From: Raed Salem <raeds@mellanox.com>
Associates a counters object with a flow when IB_FLOW_SPEC_ACTION_COUNT
is part of the flow specifications.
User space passes the placement of each counter as (description, index)
pairs, carried as private data of the
counters flow specification.
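As a rough illustration of how such pairs might be checked, here is a
stand-alone user-space sketch. It is not the driver code: validate_pairs,
COUNTER_PACKETS and COUNTER_BYTES are hypothetical names, and it assumes
each pair is stored as a description followed by an output index, which
is how counters_set_description() in this patch reads the array.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for IB_COUNTER_PACKETS / IB_COUNTER_BYTES. */
enum { COUNTER_PACKETS = 0, COUNTER_BYTES = 1 };

/*
 * Walk a flat array of <description, index> pairs, reject unknown
 * descriptions, and report one past the largest output index used.
 */
static int validate_pairs(const uint32_t *pairs, size_t npairs,
			  uint32_t *max_index)
{
	size_t i;

	assert(pairs && max_index);
	*max_index = 0;
	for (i = 0; i < npairs * 2; i += 2) {
		if (pairs[i] > COUNTER_BYTES)
			return -EINVAL;
		if (*max_index <= pairs[i + 1])
			*max_index = pairs[i + 1] + 1;
	}
	return 0;
}
```

With the pairs {COUNTER_PACKETS, 3} and {COUNTER_BYTES, 0}, the sketch
reports a max index of 4, so a later read would need at least four
64-bit slots in the output buffer.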
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/hw/mlx5/main.c | 207 +++++++++++++++++++++++++++++++++--
drivers/infiniband/hw/mlx5/mlx5_ib.h | 15 +++
include/linux/mlx5/fs.h | 1 +
include/uapi/rdma/mlx5-abi.h | 14 +++
4 files changed, 225 insertions(+), 12 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 18bfee86fa52..3f1e957946e6 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2472,7 +2472,7 @@ static int check_mpls_supp_fields(u32 field_support, const __be32 *set_mask)
#define LAST_TUNNEL_FIELD tunnel_id
#define LAST_FLOW_TAG_FIELD tag_id
#define LAST_DROP_FIELD size
-#define LAST_DROP_FIELD size
+#define LAST_COUNTERS_FIELD counters
/* Field is the last supported field */
#define FIELDS_NOT_SUPPORTED(filter, field)\
@@ -2836,6 +2836,18 @@ static int parse_flow_attr(struct mlx5_core_dev *mdev, u32 *match_c,
if (ret)
return ret;
break;
+ case IB_FLOW_SPEC_ACTION_COUNT:
+ if (FIELDS_NOT_SUPPORTED(ib_spec->flow_count,
+ LAST_COUNTERS_FIELD))
+ return -EOPNOTSUPP;
+
+ /* for now support only one counters spec per flow */
+ if (action->action & MLX5_FLOW_CONTEXT_ACTION_COUNT)
+ return -EINVAL;
+
+ action->counters = ib_spec->flow_count.counters;
+ action->action |= MLX5_FLOW_CONTEXT_ACTION_COUNT;
+ break;
default:
return -EINVAL;
}
@@ -2983,6 +2995,17 @@ static void put_flow_table(struct mlx5_ib_dev *dev,
}
}
+static void counters_clear_description(struct ib_counters *counters)
+{
+ struct mlx5_ib_mcounters *mcounters = to_mcounters(counters);
+
+ mutex_lock(&mcounters->mcntrs_mutex);
+ kfree(mcounters->counters_data);
+ mcounters->counters_data = NULL;
+ mcounters->cntrs_max_index = 0;
+ mutex_unlock(&mcounters->mcntrs_mutex);
+}
+
static int mlx5_ib_destroy_flow(struct ib_flow *flow_id)
{
struct mlx5_ib_dev *dev = to_mdev(flow_id->qp->device);
@@ -3004,6 +3027,10 @@ static int mlx5_ib_destroy_flow(struct ib_flow *flow_id)
put_flow_table(dev, handler->prio, true);
mutex_unlock(&dev->flow_db->lock);
+ if (handler->ibcounters &&
+ atomic_read(&handler->ibcounters->usecnt) == 1)
+ counters_clear_description(handler->ibcounters);
+
kfree(handler);
return 0;
@@ -3123,22 +3150,119 @@ static void set_underlay_qp(struct mlx5_ib_dev *dev,
}
}
+#define MAX_COUNTERS_NUM (USHRT_MAX / (sizeof(u32) * 2))
+static int counters_set_description(struct ib_counters *counters,
+ enum mlx5_ib_counters_type counters_type,
+ const void __user *cntrs_data,
+ u32 ncounters)
+{
+ struct mlx5_ib_mcounters *mcounters = to_mcounters(counters);
+ u32 *desc;
+ int ret;
+ int i;
+
+ if (counters_type != MLX5_IB_COUNTERS_FLOW)
+ return -EINVAL;
+
+ if (ncounters > MAX_COUNTERS_NUM)
+ return -EINVAL;
+
+ /* each counter entry have both description and index pair */
+ mcounters->counters_data = kcalloc(ncounters,
+ sizeof(u32) * 2,
+ GFP_KERNEL);
+ if (!mcounters->counters_data)
+ return -ENOMEM;
+
+ if (copy_from_user(mcounters->counters_data, cntrs_data,
+ sizeof(u32) * ncounters * 2)) {
+ ret = -EFAULT;
+ goto data_err;
+ }
+
+ /* init the fields for the object */
+ mcounters->type = counters_type;
+ mcounters->ncounters = ncounters;
+ desc = mcounters->counters_data;
+ for (i = 0; i < ncounters * 2; i += 2) {
+ if (desc[i] > IB_COUNTER_BYTES) {
+ ret = -EINVAL;
+ goto data_err;
+ }
+
+ if (mcounters->cntrs_max_index <= desc[i+1])
+ mcounters->cntrs_max_index = desc[i+1] + 1;
+ }
+
+ return 0;
+
+data_err:
+ counters_clear_description(counters);
+
+ return ret;
+}
+
+static int flow_counters_set_data(struct ib_counters *ibcounters,
+ struct mlx5_ib_create_flow *ucmd)
+{
+ struct mlx5_ib_mcounters *mcounters = to_mcounters(ibcounters);
+ struct mlx5_ib_flow_counters_data *cntrs_data = NULL;
+ int err = 0;
+
+ mutex_lock(&mcounters->mcntrs_mutex);
+ if (ucmd && ucmd->ncounters_data != 0) {
+ cntrs_data = ucmd->data;
+ /* counters already bound to at least one flow */
+ if (mcounters->cntrs_max_index) {
+ err = -EINVAL;
+ goto err;
+ }
+
+ err = counters_set_description(ibcounters,
+ MLX5_IB_COUNTERS_FLOW,
+ u64_to_user_ptr(cntrs_data->counters_data),
+ cntrs_data->ncounters);
+ if (err)
+ goto err;
+
+ } else if (!mcounters->cntrs_max_index) {
+ /* counters not bound yet, must have udata passed */
+ err = -EINVAL;
+ goto err;
+ }
+
+ if (!mcounters->hw_cntrs_hndl) {
+ mcounters->hw_cntrs_hndl =
+ (void *)mlx5_fc_create(to_mdev(ibcounters->device)->mdev,
+ false);
+ if (!mcounters->hw_cntrs_hndl)
+ err = -ENOMEM;
+ }
+
+err:
+ mutex_unlock(&mcounters->mcntrs_mutex);
+
+ return err;
+}
+
static struct mlx5_ib_flow_handler *_create_flow_rule(struct mlx5_ib_dev *dev,
struct mlx5_ib_flow_prio *ft_prio,
const struct ib_flow_attr *flow_attr,
struct mlx5_flow_destination *dst,
- u32 underlay_qpn)
+ u32 underlay_qpn,
+ struct mlx5_ib_create_flow *ucmd)
{
struct mlx5_flow_table *ft = ft_prio->flow_table;
struct mlx5_ib_flow_handler *handler;
struct mlx5_flow_act flow_act = {.flow_tag = MLX5_FS_DEFAULT_FLOW_TAG};
struct mlx5_flow_spec *spec;
- struct mlx5_flow_destination *rule_dst = dst;
+ struct mlx5_flow_destination dest_arr[2] = {};
+ struct mlx5_flow_destination *rule_dst = dest_arr;
const void *ib_flow = (const void *)flow_attr + sizeof(*flow_attr);
unsigned int spec_index;
u32 prev_type = 0;
int err = 0;
- int dest_num = 1;
+ int dest_num = 0;
bool is_egress = flow_attr->flags & IB_FLOW_ATTR_FLAGS_EGRESS;
if (!is_valid_attr(dev->mdev, flow_attr))
@@ -3152,6 +3276,10 @@ static struct mlx5_ib_flow_handler *_create_flow_rule(struct mlx5_ib_dev *dev,
}
INIT_LIST_HEAD(&handler->list);
+ if (dst) {
+ memcpy(&dest_arr[0], dst, sizeof(*dst));
+ dest_num++;
+ }
for (spec_index = 0; spec_index < flow_attr->num_of_specs; spec_index++) {
err = parse_flow_attr(dev->mdev, spec->match_criteria,
@@ -3188,15 +3316,30 @@ static struct mlx5_ib_flow_handler *_create_flow_rule(struct mlx5_ib_dev *dev,
goto free;
}
+ if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_COUNT) {
+ err = flow_counters_set_data(flow_act.counters, ucmd);
+ if (err)
+ goto free;
+
+ handler->ibcounters = flow_act.counters;
+ dest_arr[dest_num].type =
+ MLX5_FLOW_DESTINATION_TYPE_COUNTER;
+ dest_arr[dest_num].counter =
+ (struct mlx5_fc *)(to_mcounters(flow_act.counters)->hw_cntrs_hndl);
+ dest_num++;
+ }
+
if (flow_act.action & MLX5_FLOW_CONTEXT_ACTION_DROP) {
- rule_dst = NULL;
- dest_num = 0;
+ if (!(flow_act.action & MLX5_FLOW_CONTEXT_ACTION_COUNT)) {
+ rule_dst = NULL;
+ dest_num = 0;
+ }
} else {
if (is_egress)
flow_act.action |= MLX5_FLOW_CONTEXT_ACTION_ALLOW;
else
flow_act.action |=
- dst ? MLX5_FLOW_CONTEXT_ACTION_FWD_DEST :
+ dest_num ? MLX5_FLOW_CONTEXT_ACTION_FWD_DEST :
MLX5_FLOW_CONTEXT_ACTION_FWD_NEXT_PRIO;
}
@@ -3233,7 +3376,7 @@ static struct mlx5_ib_flow_handler *create_flow_rule(struct mlx5_ib_dev *dev,
const struct ib_flow_attr *flow_attr,
struct mlx5_flow_destination *dst)
{
- return _create_flow_rule(dev, ft_prio, flow_attr, dst, 0);
+ return _create_flow_rule(dev, ft_prio, flow_attr, dst, 0, NULL);
}
static struct mlx5_ib_flow_handler *create_dont_trap_rule(struct mlx5_ib_dev *dev,
@@ -3373,12 +3516,43 @@ static struct ib_flow *mlx5_ib_create_flow(struct ib_qp *qp,
struct mlx5_ib_flow_prio *ft_prio_tx = NULL;
struct mlx5_ib_flow_prio *ft_prio;
bool is_egress = flow_attr->flags & IB_FLOW_ATTR_FLAGS_EGRESS;
+ struct mlx5_ib_create_flow *ucmd = NULL, ucmd_hdr;
+ size_t min_ucmd_sz, required_ucmd_sz;
int err;
int underlay_qpn;
- if (udata &&
- udata->inlen && !ib_is_udata_cleared(udata, 0, udata->inlen))
- return ERR_PTR(-EOPNOTSUPP);
+ if (udata && udata->inlen) {
+ min_ucmd_sz = offsetof(typeof(ucmd_hdr), reserved) +
+ sizeof(ucmd_hdr.reserved);
+ if (udata->inlen < min_ucmd_sz)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ err = ib_copy_from_udata(&ucmd_hdr, udata, min_ucmd_sz);
+ if (err)
+ return ERR_PTR(err);
+
+ /* currently supports only one counters data */
+ if (ucmd_hdr.ncounters_data > 1)
+ return ERR_PTR(-EINVAL);
+
+ required_ucmd_sz = min_ucmd_sz +
+ sizeof(struct mlx5_ib_flow_counters_data) *
+ ucmd_hdr.ncounters_data;
+ if (udata->inlen > required_ucmd_sz &&
+ !ib_is_udata_cleared(udata, required_ucmd_sz,
+ udata->inlen - required_ucmd_sz))
+ return ERR_PTR(-EOPNOTSUPP);
+
+ ucmd = kzalloc(required_ucmd_sz, GFP_KERNEL);
+ if (!ucmd)
+ return ERR_PTR(-ENOMEM);
+
+ err = ib_copy_from_udata(ucmd, udata, required_ucmd_sz);
+ if (err) {
+ kfree(ucmd);
+ return ERR_PTR(err);
+ }
+ }
if (flow_attr->priority > MLX5_IB_FLOW_LAST_PRIO)
return ERR_PTR(-ENOMEM);
@@ -3433,7 +3607,7 @@ static struct ib_flow *mlx5_ib_create_flow(struct ib_qp *qp,
underlay_qpn = (mqp->flags & MLX5_IB_QP_UNDERLAY) ?
mqp->underlay_qpn : 0;
handler = _create_flow_rule(dev, ft_prio, flow_attr,
- dst, underlay_qpn);
+ dst, underlay_qpn, ucmd);
}
} else if (flow_attr->type == IB_FLOW_ATTR_ALL_DEFAULT ||
flow_attr->type == IB_FLOW_ATTR_MC_DEFAULT) {
@@ -3454,6 +3628,7 @@ static struct ib_flow *mlx5_ib_create_flow(struct ib_qp *qp,
mutex_unlock(&dev->flow_db->lock);
kfree(dst);
+ kfree(ucmd);
return &handler->ibflow;
@@ -3464,6 +3639,7 @@ static struct ib_flow *mlx5_ib_create_flow(struct ib_qp *qp,
unlock:
mutex_unlock(&dev->flow_db->lock);
kfree(dst);
+ kfree(ucmd);
kfree(handler);
return ERR_PTR(err);
}
@@ -5128,6 +5304,11 @@ static int mlx5_ib_destroy_counters(struct ib_counters *counters)
{
struct mlx5_ib_mcounters *mcounters = to_mcounters(counters);
+ counters_clear_description(counters);
+ if (mcounters->hw_cntrs_hndl)
+ mlx5_fc_destroy(to_mdev(counters->device)->mdev,
+ (struct mlx5_fc *)mcounters->hw_cntrs_hndl);
+
kfree(mcounters);
return 0;
@@ -5142,6 +5323,8 @@ static struct ib_counters *mlx5_ib_create_counters(struct ib_device *device,
if (!mcounters)
return ERR_PTR(-ENOMEM);
+ mutex_init(&mcounters->mcntrs_mutex);
+
return &mcounters->ibcntrs;
}
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index fd27ec1aed08..7313d3cd04f0 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -175,6 +175,7 @@ struct mlx5_ib_flow_handler {
struct ib_flow ibflow;
struct mlx5_ib_flow_prio *prio;
struct mlx5_flow_handle *rule;
+ struct ib_counters *ibcounters;
};
struct mlx5_ib_flow_db {
@@ -813,8 +814,22 @@ struct mlx5_memic {
DECLARE_BITMAP(memic_alloc_pages, MLX5_MAX_MEMIC_PAGES);
};
+enum mlx5_ib_counters_type {
+ MLX5_IB_COUNTERS_FLOW,
+};
+
struct mlx5_ib_mcounters {
struct ib_counters ibcntrs;
+ enum mlx5_ib_counters_type type;
+ void *hw_cntrs_hndl;
+ /* max index set as part of create_flow */
+ u32 cntrs_max_index;
+ /* number of counters data entries (<description,index> pair) */
+ u32 ncounters;
+ /* counters data array for descriptions and indexes */
+ u32 *counters_data;
+ /* protects access to mcounters internal data */
+ struct mutex mcntrs_mutex;
};
static inline struct mlx5_ib_mcounters *
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 93aab0f055b4..4612e0ad688b 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -160,6 +160,7 @@ struct mlx5_flow_act {
u32 modify_id;
uintptr_t esp_id;
struct mlx5_fs_vlan vlan;
+ struct ib_counters *counters;
};
#define MLX5_DECLARE_FLOW_ACT(name) \
diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
index 508ea8c82da7..ef3f430a7050 100644
--- a/include/uapi/rdma/mlx5-abi.h
+++ b/include/uapi/rdma/mlx5-abi.h
@@ -443,4 +443,18 @@ enum {
enum {
MLX5_IB_CLOCK_INFO_V1 = 0,
};
+
+struct mlx5_ib_flow_counters_data {
+ __aligned_u64 counters_data;
+ __u32 ncounters;
+ __u32 reserved;
+};
+
+struct mlx5_ib_create_flow {
+ __u32 ncounters_data;
+ __u32 reserved;
+ /* Following are counters data based on ncounters_data */
+ struct mlx5_ib_flow_counters_data data[];
+};
+
#endif /* MLX5_ABI_USER_H */
--
2.14.3
* [PATCH rdma-next v1 10/13] IB/mlx5: Add counters create and destroy support
From: Leon Romanovsky @ 2018-05-27 10:23 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
Raed Salem, Yishai Hadas, Saeed Mahameed, linux-netdev
In-Reply-To: <20180527102346.15149-1-leon@kernel.org>
From: Raed Salem <raeds@mellanox.com>
This patch implements the device counters create and destroy APIs
and introduces some internal management structures.
Downstream patches in this series will add the functionality to
support flow counters binding and reading.
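The management structure embeds the core object and recovers the driver
object from it with container_of(). A minimal user-space sketch of that
pattern follows; container_of_sketch, core_counters, drv_counters and
to_drv are hypothetical names chosen for illustration and are not part
of the kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space variant of the kernel's container_of(). */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for the core object handed out to generic code. */
struct core_counters {
	int usecnt;
};

/* Stand-in for the driver object that embeds the core object. */
struct drv_counters {
	uint32_t type;
	struct core_counters core;
};

/* Map a pointer to the embedded member back to its container. */
static struct drv_counters *to_drv(struct core_counters *core)
{
	assert(core);
	return container_of_sketch(core, struct drv_counters, core);
}
```

The create path can then return a pointer to the embedded member, and
every later callback gets back to the driver state without a lookup.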
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/hw/mlx5/main.c | 23 +++++++++++++++++++++++
drivers/infiniband/hw/mlx5/mlx5_ib.h | 10 ++++++++++
2 files changed, 33 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 59f86198eb3b..18bfee86fa52 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -5124,6 +5124,27 @@ static void depopulate_specs_root(struct mlx5_ib_dev *dev)
uverbs_free_spec_tree(dev->ib_dev.specs_root);
}
+static int mlx5_ib_destroy_counters(struct ib_counters *counters)
+{
+ struct mlx5_ib_mcounters *mcounters = to_mcounters(counters);
+
+ kfree(mcounters);
+
+ return 0;
+}
+
+static struct ib_counters *mlx5_ib_create_counters(struct ib_device *device,
+ struct uverbs_attr_bundle *attrs)
+{
+ struct mlx5_ib_mcounters *mcounters;
+
+ mcounters = kzalloc(sizeof(*mcounters), GFP_KERNEL);
+ if (!mcounters)
+ return ERR_PTR(-ENOMEM);
+
+ return &mcounters->ibcntrs;
+}
+
void mlx5_ib_stage_init_cleanup(struct mlx5_ib_dev *dev)
{
mlx5_ib_cleanup_multiport_master(dev);
@@ -5367,6 +5388,8 @@ int mlx5_ib_stage_caps_init(struct mlx5_ib_dev *dev)
dev->ib_dev.destroy_flow_action = mlx5_ib_destroy_flow_action;
dev->ib_dev.modify_flow_action_esp = mlx5_ib_modify_flow_action_esp;
dev->ib_dev.driver_id = RDMA_DRIVER_MLX5;
+ dev->ib_dev.create_counters = mlx5_ib_create_counters;
+ dev->ib_dev.destroy_counters = mlx5_ib_destroy_counters;
err = init_node_data(dev);
if (err)
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 49a1aa0ff429..fd27ec1aed08 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -813,6 +813,16 @@ struct mlx5_memic {
DECLARE_BITMAP(memic_alloc_pages, MLX5_MAX_MEMIC_PAGES);
};
+struct mlx5_ib_mcounters {
+ struct ib_counters ibcntrs;
+};
+
+static inline struct mlx5_ib_mcounters *
+to_mcounters(struct ib_counters *ibcntrs)
+{
+ return container_of(ibcntrs, struct mlx5_ib_mcounters, ibcntrs);
+}
+
struct mlx5_ib_dev {
struct ib_device ib_dev;
struct mlx5_core_dev *mdev;
--
2.14.3
* [PATCH rdma-next v1 09/13] IB/uverbs: Add support for flow counters
From: Leon Romanovsky @ 2018-05-27 10:23 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
Raed Salem, Yishai Hadas, Saeed Mahameed, linux-netdev
In-Reply-To: <20180527102346.15149-1-leon@kernel.org>
From: Raed Salem <raeds@mellanox.com>
The struct ib_uverbs_flow_spec_action_count associates
a counters object with the flow.
After this association, the flow counters can be read via
the counters object.
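For readers checking the layout, the following user-space sketch mirrors
the new spec with fixed-width types. The names spec_hdr_sketch,
count_spec_sketch and init_count_spec are hypothetical, and the header
is simplified (it omits the trailing flexible array of the real
ib_uverbs_flow_spec_hdr), so treat the asserted sizes as properties of
this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the common flow spec header. */
struct spec_hdr_sketch {
	uint32_t type;
	uint16_t size;
	uint16_t reserved;
};

/* Mirror of the count action spec: header, handle, padding. */
struct count_spec_sketch {
	union {
		struct spec_hdr_sketch hdr;
		struct {
			uint32_t type;
			uint16_t size;
			uint16_t reserved;
		};
	};
	uint32_t handle;
	uint32_t reserved1;
};

_Static_assert(sizeof(struct count_spec_sketch) == 16,
	       "sketch is expected to be 16 bytes");
_Static_assert(offsetof(struct count_spec_sketch, handle) == 8,
	       "handle is expected right after the header");

/* Fill a spec so that size matches what a strict size check expects. */
static void init_count_spec(struct count_spec_sketch *spec,
			    uint32_t type, uint32_t handle)
{
	assert(spec);
	memset(spec, 0, sizeof(*spec));
	spec->type = type;
	spec->size = (uint16_t)sizeof(*spec);
	spec->handle = handle;
}
```

Zeroing the whole structure first keeps both reserved fields clear,
and setting size from sizeof matters because the kernel side in this
patch rejects a spec whose size field differs from the structure size.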
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/uverbs.h | 1 +
drivers/infiniband/core/uverbs_cmd.c | 81 +++++++++++++++++++++++++++++++-----
include/uapi/rdma/ib_user_verbs.h | 13 ++++++
3 files changed, 84 insertions(+), 11 deletions(-)
diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index 5b2461fa634d..c0d40fc3a53a 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -263,6 +263,7 @@ struct ib_uverbs_flow_spec {
struct ib_uverbs_flow_spec_action_tag flow_tag;
struct ib_uverbs_flow_spec_action_drop drop;
struct ib_uverbs_flow_spec_action_handle action;
+ struct ib_uverbs_flow_spec_action_count flow_count;
};
};
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index ddb9d79691be..3179a95c6f5e 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2748,43 +2748,82 @@ ssize_t ib_uverbs_detach_mcast(struct ib_uverbs_file *file,
struct ib_uflow_resources {
size_t max;
size_t num;
- struct ib_flow_action *collection[0];
+ size_t collection_num;
+ size_t counters_num;
+ struct ib_counters **counters;
+ struct ib_flow_action **collection;
};
static struct ib_uflow_resources *flow_resources_alloc(size_t num_specs)
{
struct ib_uflow_resources *resources;
- resources =
- kmalloc(sizeof(*resources) +
- num_specs * sizeof(*resources->collection), GFP_KERNEL);
+ resources = kzalloc(sizeof(*resources), GFP_KERNEL);
if (!resources)
- return NULL;
+ goto err_res;
+
+ resources->counters =
+ kcalloc(num_specs, sizeof(*resources->counters), GFP_KERNEL);
+
+ if (!resources->counters)
+ goto err_cnt;
+
+ resources->collection =
+ kcalloc(num_specs, sizeof(*resources->collection), GFP_KERNEL);
+
+ if (!resources->collection)
+ goto err_collection;
- resources->num = 0;
resources->max = num_specs;
return resources;
+
+err_collection:
+ kfree(resources->counters);
+err_cnt:
+ kfree(resources);
+err_res:
+ return NULL;
}
void ib_uverbs_flow_resources_free(struct ib_uflow_resources *uflow_res)
{
unsigned int i;
- for (i = 0; i < uflow_res->num; i++)
+ for (i = 0; i < uflow_res->collection_num; i++)
atomic_dec(&uflow_res->collection[i]->usecnt);
+ for (i = 0; i < uflow_res->counters_num; i++)
+ atomic_dec(&uflow_res->counters[i]->usecnt);
+
+ kfree(uflow_res->collection);
+ kfree(uflow_res->counters);
kfree(uflow_res);
}
static void flow_resources_add(struct ib_uflow_resources *uflow_res,
- struct ib_flow_action *action)
+ enum ib_flow_spec_type type,
+ void *ibobj)
{
WARN_ON(uflow_res->num >= uflow_res->max);
- atomic_inc(&action->usecnt);
- uflow_res->collection[uflow_res->num++] = action;
+ switch (type) {
+ case IB_FLOW_SPEC_ACTION_HANDLE:
+ atomic_inc(&((struct ib_flow_action *)ibobj)->usecnt);
+ uflow_res->collection[uflow_res->collection_num++] =
+ (struct ib_flow_action *)ibobj;
+ break;
+ case IB_FLOW_SPEC_ACTION_COUNT:
+ atomic_inc(&((struct ib_counters *)ibobj)->usecnt);
+ uflow_res->counters[uflow_res->counters_num++] =
+ (struct ib_counters *)ibobj;
+ break;
+ default:
+ WARN_ON(1);
+ }
+
+ uflow_res->num++;
}
static int kern_spec_to_ib_spec_action(struct ib_ucontext *ucontext,
@@ -2821,9 +2860,29 @@ static int kern_spec_to_ib_spec_action(struct ib_ucontext *ucontext,
return -EINVAL;
ib_spec->action.size =
sizeof(struct ib_flow_spec_action_handle);
- flow_resources_add(uflow_res, ib_spec->action.act);
+ flow_resources_add(uflow_res,
+ IB_FLOW_SPEC_ACTION_HANDLE,
+ ib_spec->action.act);
uobj_put_obj_read(ib_spec->action.act);
break;
+ case IB_FLOW_SPEC_ACTION_COUNT:
+ if (kern_spec->flow_count.size !=
+ sizeof(struct ib_uverbs_flow_spec_action_count))
+ return -EINVAL;
+ ib_spec->flow_count.counters =
+ uobj_get_obj_read(counters,
+ UVERBS_OBJECT_COUNTERS,
+ kern_spec->flow_count.handle,
+ ucontext);
+ if (!ib_spec->flow_count.counters)
+ return -EINVAL;
+ ib_spec->flow_count.size =
+ sizeof(struct ib_flow_spec_action_count);
+ flow_resources_add(uflow_res,
+ IB_FLOW_SPEC_ACTION_COUNT,
+ ib_spec->flow_count.counters);
+ uobj_put_obj_read(ib_spec->flow_count.counters);
+ break;
default:
return -EINVAL;
}
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 409507f83b91..4f9991de8e3a 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -998,6 +998,19 @@ struct ib_uverbs_flow_spec_action_handle {
__u32 reserved1;
};
+struct ib_uverbs_flow_spec_action_count {
+ union {
+ struct ib_uverbs_flow_spec_hdr hdr;
+ struct {
+ __u32 type;
+ __u16 size;
+ __u16 reserved;
+ };
+ };
+ __u32 handle;
+ __u32 reserved1;
+};
+
struct ib_uverbs_flow_tunnel_filter {
__be32 tunnel_id;
};
--
2.14.3
* [PATCH rdma-next v1 07/13] IB/core: Support passing uhw for create_flow
From: Leon Romanovsky @ 2018-05-27 10:23 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
Raed Salem, Yishai Hadas, Saeed Mahameed, linux-netdev
In-Reply-To: <20180527102346.15149-1-leon@kernel.org>
From: Matan Barak <matanb@mellanox.com>
This is required when user-space drivers need to pass extra information
regarding how to handle this flow steering specification.
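Drivers that do not understand the extra data reject any input that is
not all zeroes, which this patch does through ib_is_udata_cleared().
The idea can be sketched in user space as below; region_is_cleared is a
hypothetical helper working on a plain byte buffer, not the kernel
function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true when buf[offset .. offset + len) lies inside the buffer
 * and contains only zero bytes.
 */
static bool region_is_cleared(const unsigned char *buf, size_t buf_len,
			      size_t offset, size_t len)
{
	size_t i;

	assert(buf);
	if (offset > buf_len || len > buf_len - offset)
		return false;

	for (i = 0; i < len; i++)
		if (buf[offset + i])
			return false;

	return true;
}
```

Treating non-zero unknown bytes as an error lets the structure grow
later: an old driver fails cleanly instead of silently ignoring a
request it cannot honour.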
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/uverbs_cmd.c | 7 ++++++-
drivers/infiniband/core/verbs.c | 2 +-
drivers/infiniband/hw/mlx4/main.c | 6 +++++-
drivers/infiniband/hw/mlx5/main.c | 7 ++++++-
include/rdma/ib_verbs.h | 3 ++-
5 files changed, 20 insertions(+), 5 deletions(-)
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index e74262ee104c..ddb9d79691be 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3542,11 +3542,16 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
err = -EINVAL;
goto err_free;
}
- flow_id = ib_create_flow(qp, flow_attr, IB_FLOW_DOMAIN_USER);
+
+ flow_id = qp->device->create_flow(qp, flow_attr,
+ IB_FLOW_DOMAIN_USER, uhw);
+
if (IS_ERR(flow_id)) {
err = PTR_ERR(flow_id);
goto err_free;
}
+ atomic_inc(&qp->usecnt);
+ flow_id->qp = qp;
flow_id->uobject = uobj;
uobj->object = flow_id;
uflow = container_of(uobj, typeof(*uflow), uobject);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 6ddfb1fade79..0b56828c1319 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1983,7 +1983,7 @@ struct ib_flow *ib_create_flow(struct ib_qp *qp,
if (!qp->device->create_flow)
return ERR_PTR(-EOPNOTSUPP);
- flow_id = qp->device->create_flow(qp, flow_attr, domain);
+ flow_id = qp->device->create_flow(qp, flow_attr, domain, NULL);
if (!IS_ERR(flow_id)) {
atomic_inc(&qp->usecnt);
flow_id->qp = qp;
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index bf12394c13c1..6fe5d5d1d1d9 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1848,7 +1848,7 @@ static int mlx4_ib_add_dont_trap_rule(struct mlx4_dev *dev,
static struct ib_flow *mlx4_ib_create_flow(struct ib_qp *qp,
struct ib_flow_attr *flow_attr,
- int domain)
+ int domain, struct ib_udata *udata)
{
int err = 0, i = 0, j = 0;
struct mlx4_ib_flow *mflow;
@@ -1866,6 +1866,10 @@ static struct ib_flow *mlx4_ib_create_flow(struct ib_qp *qp,
(flow_attr->type != IB_FLOW_ATTR_NORMAL))
return ERR_PTR(-EOPNOTSUPP);
+ if (udata &&
+ udata->inlen && !ib_is_udata_cleared(udata, 0, udata->inlen))
+ return ERR_PTR(-EOPNOTSUPP);
+
memset(type, 0, sizeof(type));
mflow = kzalloc(sizeof(*mflow), GFP_KERNEL);
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 25a271ef8374..59f86198eb3b 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3363,7 +3363,8 @@ static struct mlx5_ib_flow_handler *create_sniffer_rule(struct mlx5_ib_dev *dev,
static struct ib_flow *mlx5_ib_create_flow(struct ib_qp *qp,
struct ib_flow_attr *flow_attr,
- int domain)
+ int domain,
+ struct ib_udata *udata)
{
struct mlx5_ib_dev *dev = to_mdev(qp->device);
struct mlx5_ib_qp *mqp = to_mqp(qp);
@@ -3375,6 +3376,10 @@ static struct ib_flow *mlx5_ib_create_flow(struct ib_qp *qp,
int err;
int underlay_qpn;
+ if (udata &&
+ udata->inlen && !ib_is_udata_cleared(udata, 0, udata->inlen))
+ return ERR_PTR(-EOPNOTSUPP);
+
if (flow_attr->priority > MLX5_IB_FLOW_LAST_PRIO)
return ERR_PTR(-ENOMEM);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index f6bd3b97b971..80956b1c9f4d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2459,7 +2459,8 @@ struct ib_device {
struct ib_flow * (*create_flow)(struct ib_qp *qp,
struct ib_flow_attr
*flow_attr,
- int domain);
+ int domain,
+ struct ib_udata *udata);
int (*destroy_flow)(struct ib_flow *flow_id);
int (*check_mr_status)(struct ib_mr *mr, u32 check_mask,
struct ib_mr_status *mr_status);
--
2.14.3
* [PATCH rdma-next v1 06/13] IB/uverbs: Add read counters support
From: Leon Romanovsky @ 2018-05-27 10:23 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
Raed Salem, Yishai Hadas, Saeed Mahameed, linux-netdev
In-Reply-To: <20180527102346.15149-1-leon@kernel.org>
From: Raed Salem <raeds@mellanox.com>
This patch exposes the read counters verb to user space
applications.
With this verb, the user can read the hardware counters that
are associated with the counters object.
The application must provide sufficient memory to
hold the statistics.
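What "sufficient" means can be sketched as follows, assuming (as the
handler in this patch does) that the buffer length is divided into
64-bit slots and that a driver compares its highest bound index against
that slot count. counter_slots and buffer_is_sufficient are hypothetical
helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Number of whole 64-bit slots that fit in a buffer of len_bytes. */
static size_t counter_slots(size_t len_bytes)
{
	return len_bytes / sizeof(uint64_t);
}

/*
 * max_index is one past the largest output index bound to the
 * counters object; the buffer must hold at least that many slots.
 */
static bool buffer_is_sufficient(size_t len_bytes, uint32_t max_index)
{
	return (size_t)max_index <= counter_slots(len_bytes);
}
```

A buffer of 32 bytes gives four slots, so it is enough for a max index
of 4 but not 5; trailing bytes that do not make up a full slot are
not counted.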
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
.../infiniband/core/uverbs_std_types_counters.c | 59 +++++++++++++++++++++-
include/uapi/rdma/ib_user_ioctl_cmds.h | 7 +++
2 files changed, 65 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/uverbs_std_types_counters.c b/drivers/infiniband/core/uverbs_std_types_counters.c
index a5bc50ceee13..b35fcd3718c8 100644
--- a/drivers/infiniband/core/uverbs_std_types_counters.c
+++ b/drivers/infiniband/core/uverbs_std_types_counters.c
@@ -80,6 +80,49 @@ static int UVERBS_HANDLER(UVERBS_METHOD_COUNTERS_CREATE)(struct ib_device *ib_de
return ret;
}
+static int UVERBS_HANDLER(UVERBS_METHOD_COUNTERS_READ)(struct ib_device *ib_dev,
+ struct ib_uverbs_file *file,
+ struct uverbs_attr_bundle *attrs)
+{
+ struct ib_counters_read_attr read_attr = {};
+ const struct uverbs_attr *uattr;
+ struct ib_counters *counters =
+ uverbs_attr_get_obj(attrs, UVERBS_ATTR_READ_COUNTERS_HANDLE);
+ int ret;
+
+ if (!ib_dev->read_counters)
+ return -EOPNOTSUPP;
+
+ if (!atomic_read(&counters->usecnt))
+ return -EINVAL;
+
+ ret = uverbs_copy_from(&read_attr.flags, attrs,
+ UVERBS_ATTR_READ_COUNTERS_FLAGS);
+ if (ret)
+ return ret;
+
+ uattr = uverbs_attr_get(attrs, UVERBS_ATTR_READ_COUNTERS_BUFF);
+ read_attr.ncounters = uattr->ptr_attr.len / sizeof(u64);
+ read_attr.counters_buff = kcalloc(read_attr.ncounters,
+ sizeof(u64), GFP_KERNEL);
+ if (!read_attr.counters_buff)
+ return -ENOMEM;
+
+ ret = ib_dev->read_counters(counters,
+ &read_attr,
+ attrs);
+ if (ret)
+ goto err_read;
+
+ ret = uverbs_copy_to(attrs, UVERBS_ATTR_READ_COUNTERS_BUFF,
+ read_attr.counters_buff,
+ read_attr.ncounters * sizeof(u64));
+
+err_read:
+ kfree(read_attr.counters_buff);
+ return ret;
+}
+
static DECLARE_UVERBS_NAMED_METHOD(UVERBS_METHOD_COUNTERS_CREATE,
&UVERBS_ATTR_IDR(UVERBS_ATTR_CREATE_COUNTERS_HANDLE,
UVERBS_OBJECT_COUNTERS,
@@ -93,8 +136,22 @@ static DECLARE_UVERBS_NAMED_METHOD_WITH_HANDLER(UVERBS_METHOD_COUNTERS_DESTROY,
UVERBS_ACCESS_DESTROY,
UA_FLAGS(UVERBS_ATTR_SPEC_F_MANDATORY)));
+#define MAX_COUNTERS_BUFF_SIZE USHRT_MAX
+static DECLARE_UVERBS_NAMED_METHOD(UVERBS_METHOD_COUNTERS_READ,
+ &UVERBS_ATTR_IDR(UVERBS_ATTR_READ_COUNTERS_HANDLE,
+ UVERBS_OBJECT_COUNTERS,
+ UVERBS_ACCESS_READ,
+ UA_FLAGS(UVERBS_ATTR_SPEC_F_MANDATORY)),
+ &UVERBS_ATTR_PTR_OUT(UVERBS_ATTR_READ_COUNTERS_BUFF,
+ UVERBS_ATTR_SIZE(0, MAX_COUNTERS_BUFF_SIZE),
+ UA_FLAGS(UVERBS_ATTR_SPEC_F_MANDATORY)),
+ &UVERBS_ATTR_PTR_IN(UVERBS_ATTR_READ_COUNTERS_FLAGS,
+ UVERBS_ATTR_TYPE(__u32),
+ UA_FLAGS(UVERBS_ATTR_SPEC_F_MANDATORY)));
+
DECLARE_UVERBS_NAMED_OBJECT(UVERBS_OBJECT_COUNTERS,
&UVERBS_TYPE_ALLOC_IDR(0, uverbs_free_counters),
&UVERBS_METHOD(UVERBS_METHOD_COUNTERS_CREATE),
- &UVERBS_METHOD(UVERBS_METHOD_COUNTERS_DESTROY));
+ &UVERBS_METHOD(UVERBS_METHOD_COUNTERS_DESTROY),
+ &UVERBS_METHOD(UVERBS_METHOD_COUNTERS_READ));
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index c28ce62d2e40..888ac5975a6c 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -140,9 +140,16 @@ enum uverbs_attrs_destroy_counters_cmd_attr_ids {
UVERBS_ATTR_DESTROY_COUNTERS_HANDLE,
};
+enum uverbs_attrs_read_counters_cmd_attr_ids {
+ UVERBS_ATTR_READ_COUNTERS_HANDLE,
+ UVERBS_ATTR_READ_COUNTERS_BUFF,
+ UVERBS_ATTR_READ_COUNTERS_FLAGS,
+};
+
enum uverbs_methods_actions_counters_ops {
UVERBS_METHOD_COUNTERS_CREATE,
UVERBS_METHOD_COUNTERS_DESTROY,
+ UVERBS_METHOD_COUNTERS_READ,
};
#endif
--
2.14.3
* [PATCH rdma-next v1 05/13] IB/core: Introduce counters read verb
From: Leon Romanovsky @ 2018-05-27 10:23 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
Raed Salem, Yishai Hadas, Saeed Mahameed, linux-netdev
In-Reply-To: <20180527102346.15149-1-leon@kernel.org>
From: Raed Salem <raeds@mellanox.com>
The user supplies a counters instance and a reference to an output
array of uint64_t.
The driver reads the hardware counter values and writes each one to
its output index in the user-supplied array.
All counter values are represented as uint64_t.
To read the data successfully, the counters must first be bound to
an IB object.
Downstream patches will introduce the binding method for
flow counters.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
include/rdma/ib_verbs.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index ce3d39725966..f6bd3b97b971 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2219,6 +2219,17 @@ struct ib_counters {
atomic_t usecnt;
};
+enum ib_read_counters_flags {
+ /* prefer read values from driver cache */
+ IB_READ_COUNTERS_ATTR_PREFER_CACHED = 1 << 0,
+};
+
+struct ib_counters_read_attr {
+ u64 *counters_buff;
+ u32 ncounters;
+ u32 flags; /* use enum ib_read_counters_flags */
+};
+
struct uverbs_attr_bundle;
struct ib_device {
@@ -2493,6 +2504,9 @@ struct ib_device {
struct ib_counters * (*create_counters)(struct ib_device *device,
struct uverbs_attr_bundle *attrs);
int (*destroy_counters)(struct ib_counters *counters);
+ int (*read_counters)(struct ib_counters *counters,
+ struct ib_counters_read_attr *counters_read_attr,
+ struct uverbs_attr_bundle *attrs);
/**
* rdma netdev operation
--
2.14.3
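As an illustration of the read verb described in this patch, the following is a minimal user-space sketch of how a driver's read_counters callback might fill the user buffer. Only the flag and the layout of struct ib_counters_read_attr come from the patch (with u64/u32 swapped for stdint types). The names demo_counters, demo_read_counters and DEMO_NCOUNTERS, the two-counter layout, and the choice of -EINVAL for an unbound object are assumptions made for the example, not part of the series.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Mirrors enum ib_read_counters_flags from the patch. */
enum ib_read_counters_flags {
	/* prefer read values from driver cache */
	IB_READ_COUNTERS_ATTR_PREFER_CACHED = 1 << 0,
};

/* User-space stand-in for struct ib_counters_read_attr. */
struct ib_counters_read_attr {
	uint64_t *counters_buff;
	uint32_t ncounters;
	uint32_t flags; /* use enum ib_read_counters_flags */
};

#define DEMO_NCOUNTERS 2 /* hypothetical layout: [0] packets, [1] bytes */

/* Hypothetical driver-private state, not part of this series. */
struct demo_counters {
	int bound;                       /* set once bound to an IB object */
	uint64_t hw[DEMO_NCOUNTERS];     /* what a hardware query would return */
	uint64_t cached[DEMO_NCOUNTERS]; /* what the driver cache holds */
};

/* Hypothetical callback body: copy the counters to their output indexes. */
static int demo_read_counters(const struct demo_counters *c,
			      struct ib_counters_read_attr *attr)
{
	const uint64_t *src;
	uint32_t i;

	assert(c && attr && attr->counters_buff);

	/* Reading only makes sense after the counters were bound (assumed errno). */
	if (!c->bound)
		return -EINVAL;
	/* Never write past what the caller asked for or what we can provide. */
	if (attr->ncounters > DEMO_NCOUNTERS)
		return -EINVAL;

	src = (attr->flags & IB_READ_COUNTERS_ATTR_PREFER_CACHED) ?
		c->cached : c->hw;
	for (i = 0; i < attr->ncounters; i++)
		attr->counters_buff[i] = src[i];

	return 0;
}
```

A real driver would query the device instead of copying from an array, and may map descriptors to indexes differently. The sketch only shows the contract: bounded writes into counters_buff and an optional cached read.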
* [PATCH rdma-next v1 04/13] IB/uverbs: Add create/destroy counters support
From: Leon Romanovsky @ 2018-05-27 10:23 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
Raed Salem, Yishai Hadas, Saeed Mahameed, linux-netdev
In-Reply-To: <20180527102346.15149-1-leon@kernel.org>
From: Raed Salem <raeds@mellanox.com>
A user space application that uses the counters functionality
is expected to allocate and release the counters resources by
calling the create/destroy verbs. Create returns a unique handle
that can be used to attach the counters to its counted type.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/Makefile | 2 +-
drivers/infiniband/core/uverbs.h | 1 +
drivers/infiniband/core/uverbs_std_types.c | 3 +-
.../infiniband/core/uverbs_std_types_counters.c | 100 +++++++++++++++++++++
include/uapi/rdma/ib_user_ioctl_cmds.h | 14 +++
5 files changed, 118 insertions(+), 2 deletions(-)
create mode 100644 drivers/infiniband/core/uverbs_std_types_counters.c
diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index 8d42373a2d8a..61667705d746 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -37,4 +37,4 @@ ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
rdma_core.o uverbs_std_types.o uverbs_ioctl.o \
uverbs_ioctl_merge.o uverbs_std_types_cq.o \
uverbs_std_types_flow_action.o uverbs_std_types_dm.o \
- uverbs_std_types_mr.o
+ uverbs_std_types_mr.o uverbs_std_types_counters.o
diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
index cfb51618ab7a..5b2461fa634d 100644
--- a/drivers/infiniband/core/uverbs.h
+++ b/drivers/infiniband/core/uverbs.h
@@ -287,6 +287,7 @@ extern const struct uverbs_object_def UVERBS_OBJECT(UVERBS_OBJECT_RWQ_IND_TBL);
extern const struct uverbs_object_def UVERBS_OBJECT(UVERBS_OBJECT_XRCD);
extern const struct uverbs_object_def UVERBS_OBJECT(UVERBS_OBJECT_FLOW_ACTION);
extern const struct uverbs_object_def UVERBS_OBJECT(UVERBS_OBJECT_DM);
+extern const struct uverbs_object_def UVERBS_OBJECT(UVERBS_OBJECT_COUNTERS);
#define IB_UVERBS_DECLARE_CMD(name) \
ssize_t ib_uverbs_##name(struct ib_uverbs_file *file, \
diff --git a/drivers/infiniband/core/uverbs_std_types.c b/drivers/infiniband/core/uverbs_std_types.c
index 569f48bd821e..b570acbd94af 100644
--- a/drivers/infiniband/core/uverbs_std_types.c
+++ b/drivers/infiniband/core/uverbs_std_types.c
@@ -302,7 +302,8 @@ static DECLARE_UVERBS_OBJECT_TREE(uverbs_default_objects,
&UVERBS_OBJECT(UVERBS_OBJECT_RWQ_IND_TBL),
&UVERBS_OBJECT(UVERBS_OBJECT_XRCD),
&UVERBS_OBJECT(UVERBS_OBJECT_FLOW_ACTION),
- &UVERBS_OBJECT(UVERBS_OBJECT_DM));
+ &UVERBS_OBJECT(UVERBS_OBJECT_DM),
+ &UVERBS_OBJECT(UVERBS_OBJECT_COUNTERS));
const struct uverbs_object_tree_def *uverbs_default_get_objects(void)
{
diff --git a/drivers/infiniband/core/uverbs_std_types_counters.c b/drivers/infiniband/core/uverbs_std_types_counters.c
new file mode 100644
index 000000000000..a5bc50ceee13
--- /dev/null
+++ b/drivers/infiniband/core/uverbs_std_types_counters.c
@@ -0,0 +1,100 @@
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/*
+ * Copyright (c) 2018, Mellanox Technologies inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "uverbs.h"
+#include <rdma/uverbs_std_types.h>
+
+static int uverbs_free_counters(struct ib_uobject *uobject,
+ enum rdma_remove_reason why)
+{
+ struct ib_counters *counters = uobject->object;
+
+ if (why == RDMA_REMOVE_DESTROY &&
+ atomic_read(&counters->usecnt))
+ return -EBUSY;
+
+ return counters->device->destroy_counters(counters);
+}
+
+static int UVERBS_HANDLER(UVERBS_METHOD_COUNTERS_CREATE)(struct ib_device *ib_dev,
+ struct ib_uverbs_file *file,
+ struct uverbs_attr_bundle *attrs)
+{
+ struct ib_counters *counters;
+ struct ib_uobject *uobj;
+ int ret;
+
+ /*
+ * This check should be removed once the infrastructure
+ * have the ability to remove methods from parse tree once
+ * such condition is met.
+ */
+ if (!ib_dev->create_counters)
+ return -EOPNOTSUPP;
+
+ uobj = uverbs_attr_get_uobject(attrs, UVERBS_ATTR_CREATE_COUNTERS_HANDLE);
+ counters = ib_dev->create_counters(ib_dev, attrs);
+ if (IS_ERR(counters)) {
+ ret = PTR_ERR(counters);
+ goto err_create_counters;
+ }
+
+ counters->device = ib_dev;
+ counters->uobject = uobj;
+ uobj->object = counters;
+ atomic_set(&counters->usecnt, 0);
+
+ return 0;
+
+err_create_counters:
+ return ret;
+}
+
+static DECLARE_UVERBS_NAMED_METHOD(UVERBS_METHOD_COUNTERS_CREATE,
+ &UVERBS_ATTR_IDR(UVERBS_ATTR_CREATE_COUNTERS_HANDLE,
+ UVERBS_OBJECT_COUNTERS,
+ UVERBS_ACCESS_NEW,
+ UA_FLAGS(UVERBS_ATTR_SPEC_F_MANDATORY)));
+
+static DECLARE_UVERBS_NAMED_METHOD_WITH_HANDLER(UVERBS_METHOD_COUNTERS_DESTROY,
+ uverbs_destroy_def_handler,
+ &UVERBS_ATTR_IDR(UVERBS_ATTR_DESTROY_COUNTERS_HANDLE,
+ UVERBS_OBJECT_COUNTERS,
+ UVERBS_ACCESS_DESTROY,
+ UA_FLAGS(UVERBS_ATTR_SPEC_F_MANDATORY)));
+
+DECLARE_UVERBS_NAMED_OBJECT(UVERBS_OBJECT_COUNTERS,
+ &UVERBS_TYPE_ALLOC_IDR(0, uverbs_free_counters),
+ &UVERBS_METHOD(UVERBS_METHOD_COUNTERS_CREATE),
+ &UVERBS_METHOD(UVERBS_METHOD_COUNTERS_DESTROY));
+
diff --git a/include/uapi/rdma/ib_user_ioctl_cmds.h b/include/uapi/rdma/ib_user_ioctl_cmds.h
index 83e3890eef20..c28ce62d2e40 100644
--- a/include/uapi/rdma/ib_user_ioctl_cmds.h
+++ b/include/uapi/rdma/ib_user_ioctl_cmds.h
@@ -55,6 +55,7 @@ enum uverbs_default_objects {
UVERBS_OBJECT_WQ,
UVERBS_OBJECT_FLOW_ACTION,
UVERBS_OBJECT_DM,
+ UVERBS_OBJECT_COUNTERS,
};
enum {
@@ -131,4 +132,17 @@ enum uverbs_methods_mr {
UVERBS_METHOD_DM_MR_REG,
};
+enum uverbs_attrs_create_counters_cmd_attr_ids {
+ UVERBS_ATTR_CREATE_COUNTERS_HANDLE,
+};
+
+enum uverbs_attrs_destroy_counters_cmd_attr_ids {
+ UVERBS_ATTR_DESTROY_COUNTERS_HANDLE,
+};
+
+enum uverbs_methods_actions_counters_ops {
+ UVERBS_METHOD_COUNTERS_CREATE,
+ UVERBS_METHOD_COUNTERS_DESTROY,
+};
+
#endif
--
2.14.3
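The destroy rule implemented by uverbs_free_counters() in this patch can be sketched in user space as follows. This is a simplified model under stated assumptions: demo_counters, demo_remove_reason and demo_free_counters are hypothetical names, a plain int stands in for the kernel's atomic_t, and a flag replaces the driver's destroy_counters callback.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for enum rdma_remove_reason (two cases only). */
enum demo_remove_reason {
	DEMO_REMOVE_DESTROY, /* explicit destroy requested by the user */
	DEMO_REMOVE_CLOSE,   /* context teardown */
};

/* Hypothetical stand-in for struct ib_counters. */
struct demo_counters {
	int usecnt;    /* number of objects attached (atomic_t in the kernel) */
	int destroyed; /* set where the driver callback would be invoked */
};

static int demo_free_counters(struct demo_counters *c,
			      enum demo_remove_reason why)
{
	assert(c);

	/* An explicit destroy is refused while something still uses the counters. */
	if (why == DEMO_REMOVE_DESTROY && c->usecnt)
		return -EBUSY;

	/* Any other reason proceeds regardless of usecnt, as in the patch. */
	c->destroyed = 1;
	return 0;
}
```

The point of the asymmetry is that a user-requested destroy can fail and be retried after detaching, while cleanup on context teardown is not blocked by the use count.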
* [PATCH rdma-next v1 03/13] IB/core: Introduce counters object and its create/destroy
From: Leon Romanovsky @ 2018-05-27 10:23 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
Raed Salem, Yishai Hadas, Saeed Mahameed, linux-netdev
In-Reply-To: <20180527102346.15149-1-leon@kernel.org>
From: Raed Salem <raeds@mellanox.com>
A verbs application may need statistics and other information on
various aspects of a verbs object (e.g. Flow, QP, ...). In the general
case the application states which of the object's counters it is
interested in (we refer to this action as attach), binds the new
counters object to the appropriate verbs object, and at a later stage
reads the values through the counters object.
This series introduces a general API for a counters object that may
accumulate any IB object counter type, bound and read on demand.
A counters instance is allocated on an IB context and belongs to
that context.
Upon successful creation the counters can be bound to a verbs
object so that hardware counter instances can be created and read.
Downstream patches in this series will introduce the attach, bind
and read functionality.
A counters instance can be de-allocated; upon successful
destruction the related hardware resources are released.
Before calling destroy, the user must make sure that the counters
object is not in use by any IB object, e.g. not attached to any of
its counted types; otherwise EBUSY is returned.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
include/rdma/ib_verbs.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index e849bd0fc618..ce3d39725966 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2212,6 +2212,13 @@ struct ib_port_pkey_list {
struct list_head pkey_list;
};
+struct ib_counters {
+ struct ib_device *device;
+ struct ib_uobject *uobject;
+ /* num of objects attached */
+ atomic_t usecnt;
+};
+
struct uverbs_attr_bundle;
struct ib_device {
@@ -2483,6 +2490,10 @@ struct ib_device {
struct ib_mr * (*reg_dm_mr)(struct ib_pd *pd, struct ib_dm *dm,
struct ib_dm_mr_attr *attr,
struct uverbs_attr_bundle *attrs);
+ struct ib_counters * (*create_counters)(struct ib_device *device,
+ struct uverbs_attr_bundle *attrs);
+ int (*destroy_counters)(struct ib_counters *counters);
+
/**
* rdma netdev operation
*
--
2.14.3
* [PATCH mlx5-next v1 02/13] net/mlx5: Export flow counter related API
From: Leon Romanovsky @ 2018-05-27 10:23 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
Raed Salem, Yishai Hadas, Saeed Mahameed, linux-netdev
In-Reply-To: <20180527102346.15149-1-leon@kernel.org>
From: Raed Salem <raeds@mellanox.com>
Export the flow counters API so that it can be used by both IB and EN.
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Raed Salem <raeds@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/fs_core.h | 23 ----------------------
.../net/ethernet/mellanox/mlx5/core/fs_counters.c | 3 +++
include/linux/mlx5/fs.h | 22 +++++++++++++++++++++
3 files changed, 25 insertions(+), 23 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
index b6da322a8016..40992aed1791 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
@@ -131,29 +131,6 @@ struct mlx5_flow_table {
struct rhltable fgs_hash;
};
-struct mlx5_fc_cache {
- u64 packets;
- u64 bytes;
- u64 lastuse;
-};
-
-struct mlx5_fc {
- struct rb_node node;
- struct list_head list;
-
- /* last{packets,bytes} members are used when calculating the delta since
- * last reading
- */
- u64 lastpackets;
- u64 lastbytes;
-
- u32 id;
- bool deleted;
- bool aging;
-
- struct mlx5_fc_cache cache ____cacheline_aligned_in_smp;
-};
-
struct mlx5_ft_underlay_qp {
struct list_head list;
u32 qpn;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
index b7ab929d5f8e..10f407843e03 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_counters.c
@@ -243,6 +243,7 @@ struct mlx5_fc *mlx5_fc_create(struct mlx5_core_dev *dev, bool aging)
return ERR_PTR(err);
}
+EXPORT_SYMBOL(mlx5_fc_create);
void mlx5_fc_destroy(struct mlx5_core_dev *dev, struct mlx5_fc *counter)
{
@@ -260,6 +261,7 @@ void mlx5_fc_destroy(struct mlx5_core_dev *dev, struct mlx5_fc *counter)
mlx5_cmd_fc_free(dev, counter->id);
kfree(counter);
}
+EXPORT_SYMBOL(mlx5_fc_destroy);
int mlx5_init_fc_stats(struct mlx5_core_dev *dev)
{
@@ -317,6 +319,7 @@ int mlx5_fc_query(struct mlx5_core_dev *dev, u16 id,
{
return mlx5_cmd_fc_query(dev, id, packets, bytes);
}
+EXPORT_SYMBOL(mlx5_fc_query);
void mlx5_fc_query_cached(struct mlx5_fc *counter,
u64 *bytes, u64 *packets, u64 *lastuse)
diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h
index 9f4d32e41c06..93aab0f055b4 100644
--- a/include/linux/mlx5/fs.h
+++ b/include/linux/mlx5/fs.h
@@ -186,6 +186,28 @@ struct mlx5_fc *mlx5_fc_create(struct mlx5_core_dev *dev, bool aging);
void mlx5_fc_destroy(struct mlx5_core_dev *dev, struct mlx5_fc *counter);
void mlx5_fc_query_cached(struct mlx5_fc *counter,
u64 *bytes, u64 *packets, u64 *lastuse);
+int mlx5_fc_query(struct mlx5_core_dev *dev, u16 id,
+ u64 *packets, u64 *bytes);
+
+struct mlx5_fc_cache {
+ u64 packets;
+ u64 bytes;
+ u64 lastuse;
+};
+
+struct mlx5_fc {
+ struct rb_node node;
+ struct list_head list;
+
+ u64 lastpackets;
+ u64 lastbytes;
+
+ u32 id;
+ bool deleted;
+ bool aging;
+ struct mlx5_fc_cache cache ____cacheline_aligned_in_smp;
+};
+
int mlx5_fs_add_rx_underlay_qpn(struct mlx5_core_dev *dev, u32 underlay_qpn);
int mlx5_fs_remove_rx_underlay_qpn(struct mlx5_core_dev *dev, u32 underlay_qpn);
--
2.14.3
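The comment removed from fs_core.h in this patch says that the last{packets,bytes} members are used when calculating the delta since the last reading. The sketch below shows one way such a delta read can work. It is an illustration under assumptions, not a copy of the mlx5 code: demo_fc, demo_fc_cache and demo_fc_query_delta are hypothetical names, and only the fields relevant to the delta are kept.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct mlx5_fc_cache. */
struct demo_fc_cache {
	uint64_t packets;
	uint64_t bytes;
};

/* Hypothetical, reduced stand-in for struct mlx5_fc. */
struct demo_fc {
	uint64_t lastpackets; /* cache value at the previous reading */
	uint64_t lastbytes;
	struct demo_fc_cache cache; /* most recent values seen from hardware */
};

/* Report what accumulated since the previous call, then remember the new base. */
static void demo_fc_query_delta(struct demo_fc *fc,
				uint64_t *packets, uint64_t *bytes)
{
	struct demo_fc_cache c;

	assert(fc && packets && bytes);

	c = fc->cache; /* snapshot so both fields come from the same state */

	*packets = c.packets - fc->lastpackets;
	*bytes = c.bytes - fc->lastbytes;

	fc->lastpackets = c.packets;
	fc->lastbytes = c.bytes;
}
```

Two consecutive calls with no cache update in between return zero on the second call, which is the behaviour a periodic poller typically wants.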
* [PATCH rdma-next v1 01/13] IB/uverbs: Add an ib_uobject getter to ioctl() infrastructure
From: Leon Romanovsky @ 2018-05-27 10:23 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
Raed Salem, Yishai Hadas, Saeed Mahameed, linux-netdev
In-Reply-To: <20180527102346.15149-1-leon@kernel.org>
From: Matan Barak <matanb@mellanox.com>
Previously, the caller had to dig inside the attribute to get the uobject.
Add a helper function that extracts it correctly (and does the required
checks) on the caller's behalf.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/uverbs_std_types_cq.c | 23 +++++++++++-----------
.../infiniband/core/uverbs_std_types_flow_action.c | 4 ++--
include/rdma/uverbs_ioctl.h | 11 +++++++++++
3 files changed, 25 insertions(+), 13 deletions(-)
diff --git a/drivers/infiniband/core/uverbs_std_types_cq.c b/drivers/infiniband/core/uverbs_std_types_cq.c
index b0dbae9dd0d7..3d293d01afea 100644
--- a/drivers/infiniband/core/uverbs_std_types_cq.c
+++ b/drivers/infiniband/core/uverbs_std_types_cq.c
@@ -65,7 +65,6 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(struct ib_device *ib_dev,
struct ib_cq_init_attr attr = {};
struct ib_cq *cq;
struct ib_uverbs_completion_event_file *ev_file = NULL;
- const struct uverbs_attr *ev_file_attr;
struct ib_uobject *ev_file_uobj;
if (!(ib_dev->uverbs_cmd_mask & 1ULL << IB_USER_VERBS_CMD_CREATE_CQ))
@@ -87,10 +86,8 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(struct ib_device *ib_dev,
UVERBS_ATTR_CREATE_CQ_FLAGS)))
return -EFAULT;
- ev_file_attr = uverbs_attr_get(attrs, UVERBS_ATTR_CREATE_CQ_COMP_CHANNEL);
- if (!IS_ERR(ev_file_attr)) {
- ev_file_uobj = ev_file_attr->obj_attr.uobject;
-
+ ev_file_uobj = uverbs_attr_get_uobject(attrs, UVERBS_ATTR_CREATE_CQ_COMP_CHANNEL);
+ if (!IS_ERR(ev_file_uobj)) {
ev_file = container_of(ev_file_uobj,
struct ib_uverbs_completion_event_file,
uobj_file.uobj);
@@ -102,8 +99,8 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_CREATE)(struct ib_device *ib_dev,
goto err_event_file;
}
- obj = container_of(uverbs_attr_get(attrs,
- UVERBS_ATTR_CREATE_CQ_HANDLE)->obj_attr.uobject,
+ obj = container_of(uverbs_attr_get_uobject(attrs,
+ UVERBS_ATTR_CREATE_CQ_HANDLE),
typeof(*obj), uobject);
obj->uverbs_file = ucontext->ufile;
obj->comp_events_reported = 0;
@@ -170,13 +167,17 @@ static int UVERBS_HANDLER(UVERBS_METHOD_CQ_DESTROY)(struct ib_device *ib_dev,
struct ib_uverbs_file *file,
struct uverbs_attr_bundle *attrs)
{
- struct ib_uverbs_destroy_cq_resp resp;
struct ib_uobject *uobj =
- uverbs_attr_get(attrs, UVERBS_ATTR_DESTROY_CQ_HANDLE)->obj_attr.uobject;
- struct ib_ucq_object *obj = container_of(uobj, struct ib_ucq_object,
- uobject);
+ uverbs_attr_get_uobject(attrs, UVERBS_ATTR_DESTROY_CQ_HANDLE);
+ struct ib_uverbs_destroy_cq_resp resp;
+ struct ib_ucq_object *obj;
int ret;
+ if (IS_ERR(uobj))
+ return PTR_ERR(uobj);
+
+ obj = container_of(uobj, struct ib_ucq_object, uobject);
+
if (!(ib_dev->uverbs_cmd_mask & 1ULL << IB_USER_VERBS_CMD_DESTROY_CQ))
return -EOPNOTSUPP;
diff --git a/drivers/infiniband/core/uverbs_std_types_flow_action.c b/drivers/infiniband/core/uverbs_std_types_flow_action.c
index b4f016dfa23d..a7be51cf2e42 100644
--- a/drivers/infiniband/core/uverbs_std_types_flow_action.c
+++ b/drivers/infiniband/core/uverbs_std_types_flow_action.c
@@ -320,7 +320,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_FLOW_ACTION_ESP_CREATE)(struct ib_device
return ret;
/* No need to check as this attribute is marked as MANDATORY */
- uobj = uverbs_attr_get(attrs, UVERBS_ATTR_FLOW_ACTION_ESP_HANDLE)->obj_attr.uobject;
+ uobj = uverbs_attr_get_uobject(attrs, UVERBS_ATTR_FLOW_ACTION_ESP_HANDLE);
action = ib_dev->create_flow_action_esp(ib_dev, &esp_attr.hdr, attrs);
if (IS_ERR(action))
return PTR_ERR(action);
@@ -350,7 +350,7 @@ static int UVERBS_HANDLER(UVERBS_METHOD_FLOW_ACTION_ESP_MODIFY)(struct ib_device
if (ret)
return ret;
- uobj = uverbs_attr_get(attrs, UVERBS_ATTR_FLOW_ACTION_ESP_HANDLE)->obj_attr.uobject;
+ uobj = uverbs_attr_get_uobject(attrs, UVERBS_ATTR_FLOW_ACTION_ESP_HANDLE);
action = uobj->object;
if (action->type != IB_FLOW_ACTION_ESP)
diff --git a/include/rdma/uverbs_ioctl.h b/include/rdma/uverbs_ioctl.h
index 4a4201d997a7..7ac6271a5ee0 100644
--- a/include/rdma/uverbs_ioctl.h
+++ b/include/rdma/uverbs_ioctl.h
@@ -420,6 +420,17 @@ static inline void *uverbs_attr_get_obj(const struct uverbs_attr_bundle *attrs_b
return uobj->object;
}
+static inline struct ib_uobject *uverbs_attr_get_uobject(const struct uverbs_attr_bundle *attrs_bundle,
+ u16 idx)
+{
+ const struct uverbs_attr *attr = uverbs_attr_get(attrs_bundle, idx);
+
+ if (IS_ERR(attr))
+ return ERR_CAST(attr);
+
+ return attr->obj_attr.uobject;
+}
+
static inline int uverbs_copy_to(const struct uverbs_attr_bundle *attrs_bundle,
size_t idx, const void *from, size_t size)
{
--
2.14.3
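The new getter relies on the kernel's error-pointer convention (IS_ERR, PTR_ERR, ERR_CAST). For readers unfamiliar with it, here is a self-contained user-space sketch of the same pattern. All demo_* names are hypothetical re-implementations written for this example, the four-slot bundle is an assumption, and returning -ENOENT for a missing attribute is assumed rather than taken from this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095
#define DEMO_NATTRS 4

/* Simplified user-space versions of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *demo_err_ptr(long err) { return (void *)err; }
static inline long demo_ptr_err(const void *p) { return (long)p; }
static inline int demo_is_err(const void *p)
{
	/* Error codes occupy the last DEMO_MAX_ERRNO values of the address space. */
	return (uintptr_t)p >= (uintptr_t)(-(long)DEMO_MAX_ERRNO);
}

/* Hypothetical stand-ins for ib_uobject, uverbs_attr and uverbs_attr_bundle. */
struct demo_uobject { int id; };
struct demo_attr { struct demo_uobject *uobject; };
struct demo_bundle { struct demo_attr *attrs[DEMO_NATTRS]; }; /* NULL: not supplied */

static struct demo_attr *demo_attr_get(const struct demo_bundle *b, unsigned int idx)
{
	assert(b);
	if (idx >= DEMO_NATTRS || !b->attrs[idx])
		return demo_err_ptr(-ENOENT);
	return b->attrs[idx];
}

/* Same shape as uverbs_attr_get_uobject(): propagate the error or unwrap. */
static struct demo_uobject *demo_attr_get_uobject(const struct demo_bundle *b,
						  unsigned int idx)
{
	const struct demo_attr *attr = demo_attr_get(b, idx);

	if (demo_is_err(attr))
		return demo_err_ptr(demo_ptr_err(attr)); /* what ERR_CAST() does */

	return attr->uobject;
}
```

Callers can then test the result with a single IS_ERR-style check, as the CQ destroy handler in this patch does, instead of dereferencing a possibly invalid attribute.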
* [PATCH rdma-next v1 00/13] Verbs flow counters support
From: Leon Romanovsky @ 2018-05-27 10:23 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, RDMA mailing list, Boris Pismenny, Matan Barak,
Raed Salem, Yishai Hadas, Saeed Mahameed, linux-netdev
From: Leon Romanovsky <leonro@mellanox.com>
Changelog v0->v1:
* Decouple from DevX submission
* Use uverbs_attr_get_obj at counters read method
* Added define for max read buffer size (MAX_COUNTERS_BUFF_SIZE)
* Removed the struct mlx5_ib_flow_counter basic_flow_cnts and
the related structs used, used define instead
* Took Matan's patch from DevX
* uverbs_free_counters removed void* casting
* Added check to bound ncounters value (added define
* Changed user supplied data buffer structure to be array of
struct <desc,index> pair (applied this change to user space also)
Not changed:
* UAPI files
* Addition of uhw to flow
Thanks
----------------------------------------------------------------------
From Raed:
This series allows user space applications to monitor real-time
traffic activity and events of the verbs objects they manage, e.g.
ibv_qp, ibv_wq, ibv_flow.
The API enables generic counters creation and defines the mapping
that associates counters with a verbs object. The mlx5 driver currently
uses this API for flow counters.
With this API, an application can monitor the entire life cycle of
an object's activity, defined here as a static counters attachment.
The API also allows dynamic monitoring of measurement points
for a partial period of the verbs object's life cycle.
In addition, the series presents an implementation of the generic counters
interface. This is achieved by extending flow creation with a new flow count
specification type, which lets the user associate flow counters previously
created through the generic verbs counters interface with the created flow.
Once associated, the user can read statistics with the read function of
the generic counters interface.
The API includes:
1. create and destroy API for a new counters object
2. read the counters values from HW
Note:
The attach API, which lets an application define the measurement points per
object, is a user space only API; this data is passed to the kernel when the
counted object (e.g. flow) is created with the counters object.
Thanks
Matan Barak (2):
IB/uverbs: Add an ib_uobject getter to ioctl() infrastructure
IB/core: Support passing uhw for create_flow
Raed Salem (11):
net/mlx5: Export flow counter related API
IB/core: Introduce counters object and its create/destroy
IB/uverbs: Add create/destroy counters support
IB/core: Introduce counters read verb
IB/uverbs: Add read counters support
IB/core: Add support for flow counters
IB/uverbs: Add support for flow counters
IB/mlx5: Add counters create and destroy support
IB/mlx5: Add flow counters binding support
IB/mlx5: Add flow counters read support
IB/mlx5: Add counters read support
drivers/infiniband/core/Makefile | 2 +-
drivers/infiniband/core/uverbs.h | 2 +
drivers/infiniband/core/uverbs_cmd.c | 88 ++++++-
drivers/infiniband/core/uverbs_std_types.c | 3 +-
.../infiniband/core/uverbs_std_types_counters.c | 157 +++++++++++
drivers/infiniband/core/uverbs_std_types_cq.c | 23 +-
.../infiniband/core/uverbs_std_types_flow_action.c | 4 +-
drivers/infiniband/core/verbs.c | 2 +-
drivers/infiniband/hw/mlx4/main.c | 6 +-
drivers/infiniband/hw/mlx5/main.c | 291 ++++++++++++++++++++-
drivers/infiniband/hw/mlx5/mlx5_ib.h | 36 +++
drivers/net/ethernet/mellanox/mlx5/core/fs_core.h | 23 --
.../net/ethernet/mellanox/mlx5/core/fs_counters.c | 3 +
include/linux/mlx5/fs.h | 23 ++
include/rdma/ib_verbs.h | 43 ++-
include/rdma/uverbs_ioctl.h | 11 +
include/uapi/rdma/ib_user_ioctl_cmds.h | 21 ++
include/uapi/rdma/ib_user_verbs.h | 13 +
include/uapi/rdma/mlx5-abi.h | 14 +
19 files changed, 701 insertions(+), 64 deletions(-)
create mode 100644 drivers/infiniband/core/uverbs_std_types_counters.c
* Re: [PATCH net] sctp: not allow to set rto_min with a value below 200 msecs
From: Michael Tuexen @ 2018-05-27 8:58 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: Neil Horman, Xin Long, network dev, linux-sctp, David Miller,
David Ahern, Eric Dumazet, Marcelo Ricardo Leitner, syzkaller
In-Reply-To: <CACT4Y+YozRSfcoUoKHOWy5wujhVdks38vcfNGhwNj-REWcd-hw@mail.gmail.com>
> On 26. May 2018, at 17:50, Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Sat, May 26, 2018 at 5:42 PM, Michael Tuexen
> <michael.tuexen@lurchi.franken.de> wrote:
>>> On 25. May 2018, at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote:
>>>
>>> On Sat, May 26, 2018 at 01:41:02AM +0800, Xin Long wrote:
>>>> syzbot reported a rcu_sched self-detected stall on CPU which is caused
>>>> by too small value set on rto_min with SCTP_RTOINFO sockopt. With this
>>>> value, hb_timer will get stuck there, as in its timer handler it starts
>>>> this timer again with this value, then goes to the timer handler again.
>>>>
>>>> This problem is there since very beginning, and thanks to Eric for the
>>>> reproducer shared from a syzbot mail.
>>>>
>>>> This patch fixes it by not allowing to set rto_min with a value below
>>>> 200 msecs, which is based on TCP's, by either setsockopt or sysctl.
>>>>
>>>> Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com
>>>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>>>> ---
>>>> include/net/sctp/constants.h | 1 +
>>>> net/sctp/socket.c | 10 +++++++---
>>>> net/sctp/sysctl.c | 3 ++-
>>>> 3 files changed, 10 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/include/net/sctp/constants.h b/include/net/sctp/constants.h
>>>> index 20ff237..2ee7a7b 100644
>>>> --- a/include/net/sctp/constants.h
>>>> +++ b/include/net/sctp/constants.h
>>>> @@ -277,6 +277,7 @@ enum { SCTP_MAX_GABS = 16 };
>>>> #define SCTP_RTO_INITIAL (3 * 1000)
>>>> #define SCTP_RTO_MIN (1 * 1000)
>>>> #define SCTP_RTO_MAX (60 * 1000)
>>>> +#define SCTP_RTO_HARD_MIN 200
>>>>
>>>> #define SCTP_RTO_ALPHA 3 /* 1/8 when converted to right shifts. */
>>>> #define SCTP_RTO_BETA 2 /* 1/4 when converted to right shifts. */
>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>>> index ae7e7c6..6ef12c7 100644
>>>> --- a/net/sctp/socket.c
>>>> +++ b/net/sctp/socket.c
>>>> @@ -3029,7 +3029,8 @@ static int sctp_setsockopt_nodelay(struct sock *sk, char __user *optval,
>>>> * be changed.
>>>> *
>>>> */
>>>> -static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigned int optlen)
>>>> +static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval,
>>>> + unsigned int optlen)
>>>> {
>>>> struct sctp_rtoinfo rtoinfo;
>>>> struct sctp_association *asoc;
>>>> @@ -3056,10 +3057,13 @@ static int sctp_setsockopt_rtoinfo(struct sock *sk, char __user *optval, unsigne
>>>> else
>>>> rto_max = asoc ? asoc->rto_max : sp->rtoinfo.srto_max;
>>>>
>>>> - if (rto_min)
>>>> + if (rto_min) {
>>>> + if (rto_min < SCTP_RTO_HARD_MIN)
>>>> + return -EINVAL;
>>>> rto_min = asoc ? msecs_to_jiffies(rto_min) : rto_min;
>>>> - else
>>>> + } else {
>>>> rto_min = asoc ? asoc->rto_min : sp->rtoinfo.srto_min;
>>>> + }
>>>>
>>>> if (rto_min > rto_max)
>>>> return -EINVAL;
>>>> diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
>>>> index 33ca5b7..7ec854a 100644
>>>> --- a/net/sctp/sysctl.c
>>>> +++ b/net/sctp/sysctl.c
>>>> @@ -52,6 +52,7 @@ static int rto_alpha_min = 0;
>>>> static int rto_beta_min = 0;
>>>> static int rto_alpha_max = 1000;
>>>> static int rto_beta_max = 1000;
>>>> +static int rto_hard_min = SCTP_RTO_HARD_MIN;
>>>>
>>>> static unsigned long max_autoclose_min = 0;
>>>> static unsigned long max_autoclose_max =
>>>> @@ -116,7 +117,7 @@ static struct ctl_table sctp_net_table[] = {
>>>> .maxlen = sizeof(unsigned int),
>>>> .mode = 0644,
>>>> .proc_handler = proc_sctp_do_rto_min,
>>>> - .extra1 = &one,
>>>> + .extra1 = &rto_hard_min,
>>>> .extra2 = &init_net.sctp.rto_max
>>>> },
>>>> {
>>>> --
>>>> 2.1.0
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>> Patch looks fine, you probably want to note this hard minimum in man(7) sctp as
>>> well
>>>
>> I'm aware of some signalling networks which use RTO.min of smaller values than 200ms.
>> So could this be reduced?
>
> Hi Michael,
>
> What value do they use?
I have seen values of
RTO.Min = 50ms
RTO.Max = 200ms
RTO.Initial = 100ms
Best regards
Michael
>
> Xin, Neil, is there more principled way of ensuring that a timer won't
> cause a hard CPU stall? There are slow machines and there are slow
> kernels (in particular syzbot kernel has tons of debug configs
> enabled). 200ms _should_ not cause problems because we did not see
> them with tcp. But it's hard to say what's the low limit as we are
> trying to put a hard upper bound on execution time of a complex
> section of code. Is there something like cond_resched for timers?
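To make the effect of the proposed hard minimum concrete, the check from the quoted patch can be modelled in user space as below. This is a sketch under assumptions: demo_check_rto is a hypothetical helper that works in milliseconds only and leaves out the jiffies conversion and the fallback to the association's current values that the real sctp_setsockopt_rtoinfo() performs. A zero rto_min means "keep the current value" in the kernel; here it is simply accepted.

```c
#include <assert.h>
#include <errno.h>

#define SCTP_RTO_HARD_MIN 200 /* msecs, value proposed in the quoted patch */

static_assert(SCTP_RTO_HARD_MIN > 0, "hard minimum must be positive");

/* Hypothetical helper mirroring the order of checks in the quoted patch. */
static int demo_check_rto(unsigned long rto_min, unsigned long rto_max)
{
	/* A non-zero request below the hard minimum is rejected outright. */
	if (rto_min && rto_min < SCTP_RTO_HARD_MIN)
		return -EINVAL;

	/* The existing consistency check is unchanged. */
	if (rto_min > rto_max)
		return -EINVAL;

	return 0;
}
```

With this model, the configuration mentioned above (RTO.Min = 50ms, RTO.Max = 200ms) yields -EINVAL, which is why a 200 msec floor would break such deployments.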