* [PATCH net-next] once: switch to new jump label API
From: Eric Biggers @ 2017-10-09 21:30 UTC (permalink / raw)
To: netdev
Cc: David S . Miller, linux-kernel, Hannes Frederic Sowa, Jason Baron,
Peter Zijlstra, Eric Biggers
From: Eric Biggers <ebiggers@google.com>
Switch the DO_ONCE() macro from the deprecated jump label API to the new
one. The new one is more readable, and for DO_ONCE() it also makes the
generated code more icache-friendly: now the one-time initialization
code is placed out-of-line at the jump target, rather than at the inline
fallthrough case.
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
include/linux/once.h | 6 +++---
lib/once.c | 8 ++++----
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/include/linux/once.h b/include/linux/once.h
index 9c98aaa87cbc..724724918e8b 100644
--- a/include/linux/once.h
+++ b/include/linux/once.h
@@ -5,7 +5,7 @@
#include <linux/jump_label.h>
bool __do_once_start(bool *done, unsigned long *flags);
-void __do_once_done(bool *done, struct static_key *once_key,
+void __do_once_done(bool *done, struct static_key_true *once_key,
unsigned long *flags);
/* Call a function exactly once. The idea of DO_ONCE() is to perform
@@ -38,8 +38,8 @@ void __do_once_done(bool *done, struct static_key *once_key,
({ \
bool ___ret = false; \
static bool ___done = false; \
- static struct static_key ___once_key = STATIC_KEY_INIT_TRUE; \
- if (static_key_true(&___once_key)) { \
+ static DEFINE_STATIC_KEY_TRUE(___once_key); \
+ if (static_branch_unlikely(&___once_key)) { \
unsigned long ___flags; \
___ret = __do_once_start(&___done, &___flags); \
if (unlikely(___ret)) { \
diff --git a/lib/once.c b/lib/once.c
index 05c8604627eb..831c5a6b0bb2 100644
--- a/lib/once.c
+++ b/lib/once.c
@@ -5,7 +5,7 @@
struct once_work {
struct work_struct work;
- struct static_key *key;
+ struct static_key_true *key;
};
static void once_deferred(struct work_struct *w)
@@ -14,11 +14,11 @@ static void once_deferred(struct work_struct *w)
work = container_of(w, struct once_work, work);
BUG_ON(!static_key_enabled(work->key));
- static_key_slow_dec(work->key);
+ static_branch_disable(work->key);
kfree(work);
}
-static void once_disable_jump(struct static_key *key)
+static void once_disable_jump(struct static_key_true *key)
{
struct once_work *w;
@@ -51,7 +51,7 @@ bool __do_once_start(bool *done, unsigned long *flags)
}
EXPORT_SYMBOL(__do_once_start);
-void __do_once_done(bool *done, struct static_key *once_key,
+void __do_once_done(bool *done, struct static_key_true *once_key,
unsigned long *flags)
__releases(once_lock)
{
--
2.14.2.920.gcf0c67979c-goog
^ permalink raw reply related
* RE: [PATCH RFC v1 0/3] Support for tap user-space access with veth interfaces
From: Grandhi, Sainath @ 2017-10-09 21:51 UTC (permalink / raw)
To: netdev@vger.kernel.org; +Cc: davem@davemloft.net
In-Reply-To: <1504744467-79590-1-git-send-email-sainath.grandhi@intel.com>
Hello,
Just a reminder for feedback. Please let me know your comments.
> -----Original Message-----
> From: Grandhi, Sainath
> Sent: Wednesday, September 06, 2017 5:34 PM
> To: netdev@vger.kernel.org
> Cc: davem@davemloft.net; Grandhi, Sainath <sainath.grandhi@intel.com>
> Subject: [PATCH RFC v1 0/3] Support for tap user-space access with veth
> interfaces
>
> From: Sainath Grandhi <sainath.grandhi@intel.com>
>
> This patchset adds a tap device driver for veth virtual network interface.
> With this implementation, tap character interface can be added only to the peer
> veth interface. Adding tap interface to veth is for usecases that forwards
> packets between host and VMs. This eliminates the need for an additional
> software bridge. This can be extended to create both the peer interfaces as tap
> interfaces. These patches are a step in that direction.
>
> Sainath Grandhi (3):
> net: Adding API to parse IFLA_LINKINFO attribute
> net: Abstracting out common routines from veth for use by vethtap
> vethtap: veth based tap driver
>
> drivers/net/Kconfig | 1 +
> drivers/net/Makefile | 2 +
> drivers/net/{veth.c => veth_main.c} | 80 ++++++++++---
> drivers/net/vethtap.c | 216 ++++++++++++++++++++++++++++++++++++
> include/linux/if_veth.h | 13 +++
> include/net/rtnetlink.h | 3 +
> net/core/rtnetlink.c | 8 ++
> 7 files changed, 308 insertions(+), 15 deletions(-) rename drivers/net/{veth.c =>
> veth_main.c} (89%) create mode 100644 drivers/net/vethtap.c create mode
> 100644 include/linux/if_veth.h
>
> --
> 2.7.4
^ permalink raw reply
* Re: [PATCH net-next RFC 4/9] net: dsa: mv88e6xxx: add support for event capture
From: Levi Pearson @ 2017-10-09 22:08 UTC (permalink / raw)
To: Richard Cochran
Cc: Brandon Streiff, Linux Kernel Network Developers, linux-kernel,
David S. Miller, Florian Fainelli, Andrew Lunn, Vivien Didelot,
Erik Hons
In-Reply-To: <20171008150634.hb7nxwbbmi5xff7m@localhost>
On Sun, Oct 8, 2017 at 9:06 AM, Richard Cochran
<richardcochran@gmail.com> wrote:
>> +static int mv88e6xxx_ptp_enable_perout(struct mv88e6xxx_chip *chip,
>> + struct ptp_clock_request *rq, int on)
>> +{
>> + struct timespec ts;
>> + u64 ns;
>> + int pin;
>> + int err;
>> +
>> + pin = ptp_find_pin(chip->ptp_clock, PTP_PF_PEROUT, rq->extts.index);
>> +
>> + if (pin < 0)
>> + return -EBUSY;
>> +
>> + ts.tv_sec = rq->perout.period.sec;
>> + ts.tv_nsec = rq->perout.period.nsec;
>> + ns = timespec_to_ns(&ts);
>> +
>> + if (ns > U32_MAX)
>> + return -ERANGE;
>> +
>> + mutex_lock(&chip->reg_lock);
>> +
>> + err = mv88e6xxx_config_periodic_trig(chip, (u32)ns, 0);
>
> Here you ignore the phase of the signal given in the trq->perout.start
> field. That is not what the user expects. For periodic outputs where
> the phase cannot be set, we really would need a new ioctl.
>
> However, in this case, you should just drop this functionality. I
> understand that this works with your adjustable external oscillator,
> but we cannot support that in mainline (at least, not yet).
I've been working with this patchset and just came across this
limitation as well. The periodic timer output is the basis of the Qbv
t0 timer in devices that support Qbv, and setting this up with the
correct start time is pretty important in that context. The hardware
does support setting a start time, but it must be specified according
to the cycle count of the free-running timer rather than a nanosecond
value. I think this can be worked out from the values stored in the
timecounter struct and I'm writing some code for it now, but if you've
already written something I'd be happy to integrate that instead.
Another issue related to this is that while the free-running counter
in the hardware can't be easily adjusted, the periodic event generator
*can* be finely adjusted (via picosecond and sub-picosecond
accumulators) to correct for drift between the local clock and the PTP
grandmaster time. So to be semantically correct, this needs to be both
started at the right time *and* it needs to have the periodic
corrections made so that the fine correction parameters in the
hardware keep it adjusted to be synchronous with PTP grandmaster time.
So, taking this functionality out in the first pass seems like a good
move for Brandon to take, but I'm working on a complete implementation
for it. I think I've got a handle on how to do it, but if you have any
suggestions, I'm open to them.
Levi
^ permalink raw reply
* [PATCH net-next v2 0/5] bpf: security: New file mode and LSM hooks for eBPF object permission control
From: Chenbo Feng @ 2017-10-09 22:20 UTC (permalink / raw)
To: linux-security-module, netdev, SELinux
Cc: Jeffrey Vander Stoep, Alexei Starovoitov, lorenzo,
Daniel Borkmann, Stephen Smalley, Chenbo Feng
From: Chenbo Feng <fengc@google.com>
Much like files and sockets, eBPF objects are accessed, controlled, and
shared via a file descriptor (FD). Unlike files and sockets, the
existing mechanism for eBPF object access control is very limited.
Currently there are two options for granting accessing to eBPF
operations: grant access to all processes, or only CAP_SYS_ADMIN
processes. The CAP_SYS_ADMIN-only mode is not ideal because most users
do not have this capability and granting a user CAP_SYS_ADMIN grants too
many other security-sensitive permissions. It also unnecessarily allows
all CAP_SYS_ADMIN processes access to eBPF functionality. Allowing all
processes to access to eBPF objects is also undesirable since it has
potential to allow unprivileged processes to consume kernel memory, and
opens up attack surface to the kernel.
Adding LSM hooks maintains the status quo for systems which do not use
an LSM, preserving compatibility with userspace, while allowing security
modules to choose how best to handle permissions on eBPF objects. Here
is a possible use case for the lsm hooks with selinux module:
The network-control daemon (netd) creates and loads an eBPF object for
network packet filtering and analysis. It passes the object FD to an
unprivileged network monitor app (netmonitor), which is not allowed to
create, modify or load eBPF objects, but is allowed to read the traffic
stats from the map.
Selinux could use these hooks to grant the following permissions:
allow netd self:bpf_map { create read write};
allow netmonitor netd:fd use;
allow netmonitor netd:bpf_map read;
In this patch series, A file mode is added to bpf map to store the
accessing mode. With this file mode flags, the map can be obtained read
only, write only or read and write. With the help of this file mode,
several security hooks can be added to the eBPF syscall implementations
to do permissions checks. These LSM hooks are mainly focused on checking
the process privileges before it obtains the fd for a specific bpf
object. No matter from a file location or from a eBPF id. Besides that,
a general check hook is also implemented at the start of bpf syscalls so
that each security module can have their own implementation on the reset
of bpf object related functionalities.
In order to store the ownership and security information about eBPF
maps, a security field pointer is added to the struct bpf_map. And the
last two patch set are implementation of selinux check on these hooks
introduced, plus an additional check when eBPF object is passed between
processes using unix socket as well as binder IPC.
Change since V1:
- Whitelist the new bpf flags in the map allocate check.
- Added bpf selftest for the new flags.
- Added two new security hooks for copying the security information from
the bpf object security struct to file security struct
- Simplified the checking action when bpf fd is passed between processes.
Chenbo Feng (5):
bpf: Add file mode configuration into bpf maps
bpf: Add tests for eBPF file mode
security: bpf: Add LSM hooks for bpf object related syscall
selinux: bpf: Add selinux check for eBPF syscall operations
selinux: bpf: Add addtional check for bpf object file receive
include/linux/bpf.h | 15 ++-
include/linux/lsm_hooks.h | 71 +++++++++++++
include/linux/security.h | 54 ++++++++++
include/uapi/linux/bpf.h | 6 ++
kernel/bpf/arraymap.c | 7 +-
kernel/bpf/devmap.c | 5 +-
kernel/bpf/hashtab.c | 5 +-
kernel/bpf/inode.c | 15 ++-
kernel/bpf/lpm_trie.c | 3 +-
kernel/bpf/sockmap.c | 5 +-
kernel/bpf/stackmap.c | 5 +-
kernel/bpf/syscall.c | 112 ++++++++++++++++++---
security/security.c | 40 ++++++++
security/selinux/hooks.c | 172 ++++++++++++++++++++++++++++++++
security/selinux/include/classmap.h | 2 +
security/selinux/include/objsec.h | 4 +
tools/testing/selftests/bpf/test_maps.c | 48 +++++++++
17 files changed, 542 insertions(+), 27 deletions(-)
--
2.14.2.920.gcf0c67979c-goog
^ permalink raw reply
* [PATCH net-next v2 2/5] bpf: Add tests for eBPF file mode
From: Chenbo Feng @ 2017-10-09 22:20 UTC (permalink / raw)
To: linux-security-module, netdev, SELinux
Cc: Jeffrey Vander Stoep, Alexei Starovoitov, lorenzo,
Daniel Borkmann, Stephen Smalley, Chenbo Feng
In-Reply-To: <20171009222028.13096-1-chenbofeng.kernel@gmail.com>
From: Chenbo Feng <fengc@google.com>
Two related tests are added into bpf selftest to test read only map and
write only map. The tests verified the read only and write only flags
are working on hash maps.
Signed-off-by: Chenbo Feng <fengc@google.com>
---
tools/testing/selftests/bpf/test_maps.c | 48 +++++++++++++++++++++++++++++++++
1 file changed, 48 insertions(+)
diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c
index fe3a443a1102..896f23cfe918 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -1033,6 +1033,51 @@ static void test_map_parallel(void)
assert(bpf_map_get_next_key(fd, &key, &key) == -1 && errno == ENOENT);
}
+static void test_map_rdonly(void)
+{
+ int i, fd, key = 0, value = 0;
+
+ fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value),
+ MAP_SIZE, map_flags | BPF_F_RDONLY);
+ if (fd < 0) {
+ printf("Failed to create map for read only test '%s'!\n",
+ strerror(errno));
+ exit(1);
+ }
+
+ key = 1;
+ value = 1234;
+ /* Insert key=1 element. */
+ assert(bpf_map_update_elem(fd, &key, &value, BPF_ANY) == -1 &&
+ errno == EPERM);
+
+ /* Check that key=2 is not found. */
+ assert(bpf_map_lookup_elem(fd, &key, &value) == -1 && errno == ENOENT);
+ assert(bpf_map_get_next_key(fd, &key, &value) == -1 && errno == ENOENT);
+}
+
+static void test_map_wronly(void)
+{
+ int i, fd, key = 0, value = 0;
+
+ fd = bpf_create_map(BPF_MAP_TYPE_HASH, sizeof(key), sizeof(value),
+ MAP_SIZE, map_flags | BPF_F_WRONLY);
+ if (fd < 0) {
+ printf("Failed to create map for read only test '%s'!\n",
+ strerror(errno));
+ exit(1);
+ }
+
+ key = 1;
+ value = 1234;
+ /* Insert key=1 element. */
+ assert(bpf_map_update_elem(fd, &key, &value, BPF_ANY) == 0)
+
+ /* Check that key=2 is not found. */
+ assert(bpf_map_lookup_elem(fd, &key, &value) == -1 && errno == EPERM);
+ assert(bpf_map_get_next_key(fd, &key, &value) == -1 && errno == EPERM);
+}
+
static void run_all_tests(void)
{
test_hashmap(0, NULL);
@@ -1050,6 +1095,9 @@ static void run_all_tests(void)
test_map_large();
test_map_parallel();
test_map_stress();
+
+ test_map_rdonly();
+ test_map_wronly();
}
int main(void)
--
2.14.2.920.gcf0c67979c-goog
^ permalink raw reply related
* [PATCH net-next v2 1/5] bpf: Add file mode configuration into bpf maps
From: Chenbo Feng @ 2017-10-09 22:20 UTC (permalink / raw)
To: linux-security-module, netdev, SELinux
Cc: Jeffrey Vander Stoep, Alexei Starovoitov, lorenzo,
Daniel Borkmann, Stephen Smalley, Chenbo Feng
In-Reply-To: <20171009222028.13096-1-chenbofeng.kernel@gmail.com>
From: Chenbo Feng <fengc@google.com>
Introduce the map read/write flags to the eBPF syscalls that returns the
map fd. The flags is used to set up the file mode when construct a new
file descriptor for bpf maps. To not break the backward capability, the
f_flags is set to O_RDWR if the flag passed by syscall is 0. Otherwise
it should be O_RDONLY or O_WRONLY. When the userspace want to modify or
read the map content, it will check the file mode to see if it is
allowed to make the change.
Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
include/linux/bpf.h | 6 ++--
include/uapi/linux/bpf.h | 6 ++++
kernel/bpf/arraymap.c | 7 +++--
kernel/bpf/devmap.c | 5 ++-
kernel/bpf/hashtab.c | 5 +--
kernel/bpf/inode.c | 15 ++++++---
kernel/bpf/lpm_trie.c | 3 +-
kernel/bpf/sockmap.c | 5 ++-
kernel/bpf/stackmap.c | 5 ++-
kernel/bpf/syscall.c | 80 +++++++++++++++++++++++++++++++++++++++++++-----
10 files changed, 114 insertions(+), 23 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bc7da2ddfcaf..0e9ca2555d7f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -308,11 +308,11 @@ void bpf_map_area_free(void *base);
extern int sysctl_unprivileged_bpf_disabled;
-int bpf_map_new_fd(struct bpf_map *map);
+int bpf_map_new_fd(struct bpf_map *map, int flags);
int bpf_prog_new_fd(struct bpf_prog *prog);
int bpf_obj_pin_user(u32 ufd, const char __user *pathname);
-int bpf_obj_get_user(const char __user *pathname);
+int bpf_obj_get_user(const char __user *pathname, int flags);
int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value);
int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value);
@@ -331,6 +331,8 @@ int bpf_fd_htab_map_update_elem(struct bpf_map *map, struct file *map_file,
void *key, void *value, u64 map_flags);
int bpf_fd_htab_map_lookup_elem(struct bpf_map *map, void *key, u32 *value);
+int bpf_get_file_flag(int flags);
+
/* memcpy that is used with 8-byte aligned pointers, power-of-8 size and
* forced to use 'long' read/writes to try to atomically copy long counters.
* Best-effort only. No barriers here, since it _will_ race with concurrent
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6db9e1d679cd..9cb50a228c39 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -217,6 +217,10 @@ enum bpf_attach_type {
#define BPF_OBJ_NAME_LEN 16U
+/* Flags for accessing BPF object */
+#define BPF_F_RDONLY (1U << 3)
+#define BPF_F_WRONLY (1U << 4)
+
union bpf_attr {
struct { /* anonymous struct used by BPF_MAP_CREATE command */
__u32 map_type; /* one of enum bpf_map_type */
@@ -259,6 +263,7 @@ union bpf_attr {
struct { /* anonymous struct used by BPF_OBJ_* commands */
__aligned_u64 pathname;
__u32 bpf_fd;
+ __u32 file_flags;
};
struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
@@ -286,6 +291,7 @@ union bpf_attr {
__u32 map_id;
};
__u32 next_id;
+ __u32 open_flags;
};
struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 68d866628be0..f869e48ef2f6 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -19,6 +19,9 @@
#include "map_in_map.h"
+#define ARRAY_CREATE_FLAG_MASK \
+ (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+
static void bpf_array_free_percpu(struct bpf_array *array)
{
int i;
@@ -56,8 +59,8 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
/* check sanity of attributes */
if (attr->max_entries == 0 || attr->key_size != 4 ||
- attr->value_size == 0 || attr->map_flags & ~BPF_F_NUMA_NODE ||
- (percpu && numa_node != NUMA_NO_NODE))
+ attr->value_size == 0 || attr->map_flags &
+ ~ARRAY_CREATE_FLAG_MASK || (percpu && numa_node != NUMA_NO_NODE))
return ERR_PTR(-EINVAL);
if (attr->value_size > KMALLOC_MAX_SIZE)
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index e093d9a2c4dd..e5d3de7cff2e 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -50,6 +50,9 @@
#include <linux/bpf.h>
#include <linux/filter.h>
+#define DEV_CREATE_FLAG_MASK \
+ (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+
struct bpf_dtab_netdev {
struct net_device *dev;
struct bpf_dtab *dtab;
@@ -80,7 +83,7 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
/* check sanity of attributes */
if (attr->max_entries == 0 || attr->key_size != 4 ||
- attr->value_size != 4 || attr->map_flags & ~BPF_F_NUMA_NODE)
+ attr->value_size != 4 || attr->map_flags & ~DEV_CREATE_FLAG_MASK)
return ERR_PTR(-EINVAL);
dtab = kzalloc(sizeof(*dtab), GFP_USER);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 431126f31ea3..919955236e63 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -18,8 +18,9 @@
#include "bpf_lru_list.h"
#include "map_in_map.h"
-#define HTAB_CREATE_FLAG_MASK \
- (BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE)
+#define HTAB_CREATE_FLAG_MASK \
+ (BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \
+ BPF_F_RDONLY | BPF_F_WRONLY)
struct bucket {
struct hlist_nulls_head head;
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index e833ed914358..7d8c6dd8dd5d 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -295,7 +295,7 @@ int bpf_obj_pin_user(u32 ufd, const char __user *pathname)
}
static void *bpf_obj_do_get(const struct filename *pathname,
- enum bpf_type *type)
+ enum bpf_type *type, int flags)
{
struct inode *inode;
struct path path;
@@ -307,7 +307,7 @@ static void *bpf_obj_do_get(const struct filename *pathname,
return ERR_PTR(ret);
inode = d_backing_inode(path.dentry);
- ret = inode_permission(inode, MAY_WRITE);
+ ret = inode_permission(inode, ACC_MODE(flags));
if (ret)
goto out;
@@ -326,18 +326,23 @@ static void *bpf_obj_do_get(const struct filename *pathname,
return ERR_PTR(ret);
}
-int bpf_obj_get_user(const char __user *pathname)
+int bpf_obj_get_user(const char __user *pathname, int flags)
{
enum bpf_type type = BPF_TYPE_UNSPEC;
struct filename *pname;
int ret = -ENOENT;
+ int f_flags;
void *raw;
+ f_flags = bpf_get_file_flag(flags);
+ if (f_flags < 0)
+ return f_flags;
+
pname = getname(pathname);
if (IS_ERR(pname))
return PTR_ERR(pname);
- raw = bpf_obj_do_get(pname, &type);
+ raw = bpf_obj_do_get(pname, &type, f_flags);
if (IS_ERR(raw)) {
ret = PTR_ERR(raw);
goto out;
@@ -346,7 +351,7 @@ int bpf_obj_get_user(const char __user *pathname)
if (type == BPF_TYPE_PROG)
ret = bpf_prog_new_fd(raw);
else if (type == BPF_TYPE_MAP)
- ret = bpf_map_new_fd(raw);
+ ret = bpf_map_new_fd(raw, f_flags);
else
goto out;
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 34d8a690ea05..885e45479680 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -495,7 +495,8 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
#define LPM_KEY_SIZE_MAX LPM_KEY_SIZE(LPM_DATA_SIZE_MAX)
#define LPM_KEY_SIZE_MIN LPM_KEY_SIZE(LPM_DATA_SIZE_MIN)
-#define LPM_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_NUMA_NODE)
+#define LPM_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_NUMA_NODE | \
+ BPF_F_RDONLY | BPF_F_WRONLY)
static struct bpf_map *trie_alloc(union bpf_attr *attr)
{
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index a298d6666698..86ec846f2d5e 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -40,6 +40,9 @@
#include <linux/list.h>
#include <net/strparser.h>
+#define SOCK_CREATE_FLAG_MASK \
+ (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+
struct bpf_stab {
struct bpf_map map;
struct sock **sock_map;
@@ -489,7 +492,7 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
/* check sanity of attributes */
if (attr->max_entries == 0 || attr->key_size != 4 ||
- attr->value_size != 4 || attr->map_flags & ~BPF_F_NUMA_NODE)
+ attr->value_size != 4 || attr->map_flags & ~SOCK_CREATE_FLAG_MASK)
return ERR_PTR(-EINVAL);
if (attr->value_size > KMALLOC_MAX_SIZE)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 135be433e9a0..a15bc636cc98 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -11,6 +11,9 @@
#include <linux/perf_event.h>
#include "percpu_freelist.h"
+#define STACK_CREATE_FLAG_MASK \
+ (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+
struct stack_map_bucket {
struct pcpu_freelist_node fnode;
u32 hash;
@@ -60,7 +63,7 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
if (!capable(CAP_SYS_ADMIN))
return ERR_PTR(-EPERM);
- if (attr->map_flags & ~BPF_F_NUMA_NODE)
+ if (attr->map_flags & ~STACK_CREATE_FLAG_MASK)
return ERR_PTR(-EINVAL);
/* check sanity of attributes */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d124e702e040..b02582ead9a4 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -294,17 +294,48 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
}
#endif
+static ssize_t bpf_dummy_read(struct file *filp, char __user *buf, size_t siz,
+ loff_t *ppos)
+{
+ /* We need this handler such that alloc_file() enables
+ * f_mode with FMODE_CAN_READ.
+ */
+ return -EINVAL;
+}
+
+static ssize_t bpf_dummy_write(struct file *filp, const char __user *buf,
+ size_t siz, loff_t *ppos)
+{
+ /* We need this handler such that alloc_file() enables
+ * f_mode with FMODE_CAN_WRITE.
+ */
+ return -EINVAL;
+}
+
static const struct file_operations bpf_map_fops = {
#ifdef CONFIG_PROC_FS
.show_fdinfo = bpf_map_show_fdinfo,
#endif
.release = bpf_map_release,
+ .read = bpf_dummy_read,
+ .write = bpf_dummy_write,
};
-int bpf_map_new_fd(struct bpf_map *map)
+int bpf_map_new_fd(struct bpf_map *map, int flags)
{
return anon_inode_getfd("bpf-map", &bpf_map_fops, map,
- O_RDWR | O_CLOEXEC);
+ flags | O_CLOEXEC);
+}
+
+int bpf_get_file_flag(int flags)
+{
+ if ((flags & BPF_F_RDONLY) && (flags & BPF_F_WRONLY))
+ return -EINVAL;
+ if (flags & BPF_F_RDONLY)
+ return O_RDONLY;
+ if (flags & BPF_F_WRONLY)
+ return O_WRONLY;
+ return O_RDWR;
}
/* helper macro to check that unused fields 'union bpf_attr' are zero */
@@ -344,12 +375,17 @@ static int map_create(union bpf_attr *attr)
{
int numa_node = bpf_map_attr_numa_node(attr);
struct bpf_map *map;
+ int f_flags;
int err;
err = CHECK_ATTR(BPF_MAP_CREATE);
if (err)
return -EINVAL;
+ f_flags = bpf_get_file_flag(attr->map_flags);
+ if (f_flags < 0)
+ return f_flags;
+
if (numa_node != NUMA_NO_NODE &&
((unsigned int)numa_node >= nr_node_ids ||
!node_online(numa_node)))
@@ -375,7 +411,7 @@ static int map_create(union bpf_attr *attr)
if (err)
goto free_map;
- err = bpf_map_new_fd(map);
+ err = bpf_map_new_fd(map, f_flags);
if (err < 0) {
/* failed to allocate fd.
* bpf_map_put() is needed because the above
@@ -490,6 +526,11 @@ static int map_lookup_elem(union bpf_attr *attr)
if (IS_ERR(map))
return PTR_ERR(map);
+ if (!(f.file->f_mode & FMODE_CAN_READ)) {
+ err = -EPERM;
+ goto err_put;
+ }
+
key = memdup_user(ukey, map->key_size);
if (IS_ERR(key)) {
err = PTR_ERR(key);
@@ -570,6 +611,11 @@ static int map_update_elem(union bpf_attr *attr)
if (IS_ERR(map))
return PTR_ERR(map);
+ if (!(f.file->f_mode & FMODE_CAN_WRITE)) {
+ err = -EPERM;
+ goto err_put;
+ }
+
key = memdup_user(ukey, map->key_size);
if (IS_ERR(key)) {
err = PTR_ERR(key);
@@ -653,6 +699,11 @@ static int map_delete_elem(union bpf_attr *attr)
if (IS_ERR(map))
return PTR_ERR(map);
+ if (!(f.file->f_mode & FMODE_CAN_WRITE)) {
+ err = -EPERM;
+ goto err_put;
+ }
+
key = memdup_user(ukey, map->key_size);
if (IS_ERR(key)) {
err = PTR_ERR(key);
@@ -696,6 +747,11 @@ static int map_get_next_key(union bpf_attr *attr)
if (IS_ERR(map))
return PTR_ERR(map);
+ if (!(f.file->f_mode & FMODE_CAN_READ)) {
+ err = -EPERM;
+ goto err_put;
+ }
+
if (ukey) {
key = memdup_user(ukey, map->key_size);
if (IS_ERR(key)) {
@@ -902,6 +958,8 @@ static const struct file_operations bpf_prog_fops = {
.show_fdinfo = bpf_prog_show_fdinfo,
#endif
.release = bpf_prog_release,
+ .read = bpf_dummy_read,
+ .write = bpf_dummy_write,
};
int bpf_prog_new_fd(struct bpf_prog *prog)
@@ -1111,11 +1169,11 @@ static int bpf_prog_load(union bpf_attr *attr)
return err;
}
-#define BPF_OBJ_LAST_FIELD bpf_fd
+#define BPF_OBJ_LAST_FIELD file_flags
static int bpf_obj_pin(const union bpf_attr *attr)
{
- if (CHECK_ATTR(BPF_OBJ))
+ if (CHECK_ATTR(BPF_OBJ) || attr->file_flags != 0)
return -EINVAL;
return bpf_obj_pin_user(attr->bpf_fd, u64_to_user_ptr(attr->pathname));
@@ -1126,7 +1184,8 @@ static int bpf_obj_get(const union bpf_attr *attr)
if (CHECK_ATTR(BPF_OBJ) || attr->bpf_fd != 0)
return -EINVAL;
- return bpf_obj_get_user(u64_to_user_ptr(attr->pathname));
+ return bpf_obj_get_user(u64_to_user_ptr(attr->pathname),
+ attr->file_flags);
}
#ifdef CONFIG_CGROUP_BPF
@@ -1386,12 +1445,13 @@ static int bpf_prog_get_fd_by_id(const union bpf_attr *attr)
return fd;
}
-#define BPF_MAP_GET_FD_BY_ID_LAST_FIELD map_id
+#define BPF_MAP_GET_FD_BY_ID_LAST_FIELD open_flags
static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
{
struct bpf_map *map;
u32 id = attr->map_id;
+ int f_flags;
int fd;
if (CHECK_ATTR(BPF_MAP_GET_FD_BY_ID))
@@ -1400,6 +1460,10 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
+ f_flags = bpf_get_file_flag(attr->open_flags);
+ if (f_flags < 0)
+ return f_flags;
+
spin_lock_bh(&map_idr_lock);
map = idr_find(&map_idr, id);
if (map)
@@ -1411,7 +1475,7 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
if (IS_ERR(map))
return PTR_ERR(map);
- fd = bpf_map_new_fd(map);
+ fd = bpf_map_new_fd(map, f_flags);
if (fd < 0)
bpf_map_put(map);
--
2.14.2.920.gcf0c67979c-goog
^ permalink raw reply related
* [PATCH net-next v2 3/5] security: bpf: Add LSM hooks for bpf object related syscall
From: Chenbo Feng @ 2017-10-09 22:20 UTC (permalink / raw)
To: linux-security-module, netdev, SELinux
Cc: Jeffrey Vander Stoep, Alexei Starovoitov, lorenzo,
Daniel Borkmann, Stephen Smalley, Chenbo Feng
In-Reply-To: <20171009222028.13096-1-chenbofeng.kernel@gmail.com>
From: Chenbo Feng <fengc@google.com>
Introduce several LSM hooks for the syscalls that will allow the
userspace to access to eBPF object such as eBPF programs and eBPF maps.
The security check is aimed to enforce a per object security protection
for eBPF object so only processes with the right priviliges can
read/write to a specific map or use a specific eBPF program. Besides
that, a general security hook is added before the multiplexer of bpf
syscall to check the cmd and the attribute used for the command. The
actual security module can decide which command need to be checked and
how the cmd should be checked.
Signed-off-by: Chenbo Feng <fengc@google.com>
---
include/linux/bpf.h | 6 ++++++
include/linux/lsm_hooks.h | 54 +++++++++++++++++++++++++++++++++++++++++++++++
include/linux/security.h | 45 +++++++++++++++++++++++++++++++++++++++
kernel/bpf/syscall.c | 28 ++++++++++++++++++++++--
security/security.c | 32 ++++++++++++++++++++++++++++
5 files changed, 163 insertions(+), 2 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0e9ca2555d7f..225740688ab7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -57,6 +57,9 @@ struct bpf_map {
atomic_t usercnt;
struct bpf_map *inner_map_meta;
char name[BPF_OBJ_NAME_LEN];
+#ifdef CONFIG_SECURITY
+ void *security;
+#endif
};
/* function argument constraints */
@@ -190,6 +193,9 @@ struct bpf_prog_aux {
struct user_struct *user;
u64 load_time; /* ns since boottime */
char name[BPF_OBJ_NAME_LEN];
+#ifdef CONFIG_SECURITY
+ void *security;
+#endif
union {
struct work_struct work;
struct rcu_head rcu;
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index c9258124e417..7161d8e7ee79 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1351,6 +1351,40 @@
* @inode we wish to get the security context of.
* @ctx is a pointer in which to place the allocated security context.
* @ctxlen points to the place to put the length of @ctx.
+ *
+ * Security hooks for using the eBPF maps and programs functionalities through
+ * eBPF syscalls.
+ *
+ * @bpf:
+ * Do a initial check for all bpf syscalls after the attribute is copied
+ * into the kernel. The actual security module can implement their own
+ * rules to check the specific cmd they need.
+ *
+ * @bpf_map:
+ * Do a check when the kernel generate and return a file descriptor for
+ * eBPF maps.
+ *
+ * @map: bpf map that we want to access
+ * @mask: the access flags
+ *
+ * @bpf_prog:
+ * Do a check when the kernel generate and return a file descriptor for
+ * eBPF programs.
+ *
+ * @prog: bpf prog that userspace want to use.
+ *
+ * @bpf_map_alloc_security:
+ * Initialize the security field inside bpf map.
+ *
+ * @bpf_map_free_security:
+ * Clean up the security information stored inside bpf map.
+ *
+ * @bpf_prog_alloc_security:
+ * Initialize the security field inside bpf program.
+ *
+ * @bpf_prog_free_security:
+ * Clean up the security information stored inside bpf prog.
+ *
*/
union security_list_options {
int (*binder_set_context_mgr)(struct task_struct *mgr);
@@ -1682,6 +1716,17 @@ union security_list_options {
struct audit_context *actx);
void (*audit_rule_free)(void *lsmrule);
#endif /* CONFIG_AUDIT */
+
+#ifdef CONFIG_BPF_SYSCALL
+ int (*bpf)(int cmd, union bpf_attr *attr,
+ unsigned int size);
+ int (*bpf_map)(struct bpf_map *map, fmode_t fmode);
+ int (*bpf_prog)(struct bpf_prog *prog);
+ int (*bpf_map_alloc_security)(struct bpf_map *map);
+ void (*bpf_map_free_security)(struct bpf_map *map);
+ int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux);
+ void (*bpf_prog_free_security)(struct bpf_prog_aux *aux);
+#endif /* CONFIG_BPF_SYSCALL */
};
struct security_hook_heads {
@@ -1901,6 +1946,15 @@ struct security_hook_heads {
struct list_head audit_rule_match;
struct list_head audit_rule_free;
#endif /* CONFIG_AUDIT */
+#ifdef CONFIG_BPF_SYSCALL
+ struct list_head bpf;
+ struct list_head bpf_map;
+ struct list_head bpf_prog;
+ struct list_head bpf_map_alloc_security;
+ struct list_head bpf_map_free_security;
+ struct list_head bpf_prog_alloc_security;
+ struct list_head bpf_prog_free_security;
+#endif /* CONFIG_BPF_SYSCALL */
} __randomize_layout;
/*
diff --git a/include/linux/security.h b/include/linux/security.h
index ce6265960d6c..18800b0911e5 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -31,6 +31,7 @@
#include <linux/string.h>
#include <linux/mm.h>
#include <linux/fs.h>
+#include <linux/bpf.h>
struct linux_binprm;
struct cred;
@@ -1730,6 +1731,50 @@ static inline void securityfs_remove(struct dentry *dentry)
#endif
+#ifdef CONFIG_BPF_SYSCALL
+#ifdef CONFIG_SECURITY
+extern int security_bpf(int cmd, union bpf_attr *attr, unsigned int size);
+extern int security_bpf_map(struct bpf_map *map, fmode_t fmode);
+extern int security_bpf_prog(struct bpf_prog *prog);
+extern int security_bpf_map_alloc(struct bpf_map *map);
+extern void security_bpf_map_free(struct bpf_map *map);
+extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux);
+extern void security_bpf_prog_free(struct bpf_prog_aux *aux);
+#else
+static inline int security_bpf(int cmd, union bpf_attr *attr,
+ unsigned int size)
+{
+ return 0;
+}
+
+static inline int security_bpf_map(struct bpf_map *map, fmode_t fmode)
+{
+ return 0;
+}
+
+static inline int security_bpf_prog(struct bpf_prog *prog)
+{
+ return 0;
+}
+
+static inline int security_bpf_map_alloc(struct bpf_map *map)
+{
+ return 0;
+}
+
+static inline void security_bpf_map_free(struct bpf_map *map)
+{ }
+
+static inline int security_bpf_prog_alloc(struct bpf_prog_aux *aux)
+{
+ return 0;
+}
+
+static inline void security_bpf_prog_free(struct bpf_prog_aux *aux)
+{ }
+#endif /* CONFIG_SECURITY */
+#endif /* CONFIG_BPF_SYSCALL */
+
#ifdef CONFIG_SECURITY
static inline char *alloc_secdata(void)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b02582ead9a4..1cf31ddd7616 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -210,6 +210,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
struct bpf_map *map = container_of(work, struct bpf_map, work);
bpf_map_uncharge_memlock(map);
+ security_bpf_map_free(map);
/* implementation dependent freeing */
map->ops->map_free(map);
}
@@ -323,6 +324,9 @@ static const struct file_operations bpf_map_fops = {
int bpf_map_new_fd(struct bpf_map *map, int flags)
{
+ if (security_bpf_map(map, OPEN_FMODE(flags)))
+ return -EPERM;
+
return anon_inode_getfd("bpf-map", &bpf_map_fops, map,
flags | O_CLOEXEC);
}
@@ -403,10 +407,14 @@ static int map_create(union bpf_attr *attr)
atomic_set(&map->refcnt, 1);
atomic_set(&map->usercnt, 1);
- err = bpf_map_charge_memlock(map);
+ err = security_bpf_map_alloc(map);
if (err)
goto free_map_nouncharge;
+ err = bpf_map_charge_memlock(map);
+ if (err)
+ goto free_map_sec;
+
err = bpf_map_alloc_id(map);
if (err)
goto free_map;
@@ -428,6 +436,8 @@ static int map_create(union bpf_attr *attr)
free_map:
bpf_map_uncharge_memlock(map);
+free_map_sec:
+ security_bpf_map_free(map);
free_map_nouncharge:
map->ops->map_free(map);
return err;
@@ -906,6 +916,7 @@ static void __bpf_prog_put_rcu(struct rcu_head *rcu)
free_used_maps(aux);
bpf_prog_uncharge_memlock(aux->prog);
+ security_bpf_prog_free(aux);
bpf_prog_free(aux->prog);
}
@@ -964,6 +975,9 @@ static const struct file_operations bpf_prog_fops = {
int bpf_prog_new_fd(struct bpf_prog *prog)
{
+ if (security_bpf_prog(prog))
+ return -EPERM;
+
return anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog,
O_RDWR | O_CLOEXEC);
}
@@ -1103,10 +1117,14 @@ static int bpf_prog_load(union bpf_attr *attr)
if (!prog)
return -ENOMEM;
- err = bpf_prog_charge_memlock(prog);
+ err = security_bpf_prog_alloc(prog->aux);
if (err)
goto free_prog_nouncharge;
+ err = bpf_prog_charge_memlock(prog);
+ if (err)
+ goto free_prog_sec;
+
prog->len = attr->insn_cnt;
err = -EFAULT;
@@ -1164,6 +1182,8 @@ static int bpf_prog_load(union bpf_attr *attr)
free_used_maps(prog->aux);
free_prog:
bpf_prog_uncharge_memlock(prog);
+free_prog_sec:
+ security_bpf_prog_free(prog->aux);
free_prog_nouncharge:
bpf_prog_free(prog);
return err;
@@ -1630,6 +1650,10 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
if (copy_from_user(&attr, uattr, size) != 0)
return -EFAULT;
+ err = security_bpf(cmd, &attr, size);
+ if (err)
+ return -EPERM;
+
switch (cmd) {
case BPF_MAP_CREATE:
err = map_create(&attr);
diff --git a/security/security.c b/security/security.c
index 4bf0f571b4ef..1cd8526cb0b7 100644
--- a/security/security.c
+++ b/security/security.c
@@ -12,6 +12,7 @@
* (at your option) any later version.
*/
+#include <linux/bpf.h>
#include <linux/capability.h>
#include <linux/dcache.h>
#include <linux/module.h>
@@ -1703,3 +1704,34 @@ int security_audit_rule_match(u32 secid, u32 field, u32 op, void *lsmrule,
actx);
}
#endif /* CONFIG_AUDIT */
+
+#ifdef CONFIG_BPF_SYSCALL
+int security_bpf(int cmd, union bpf_attr *attr, unsigned int size)
+{
+ return call_int_hook(bpf, 0, cmd, attr, size);
+}
+int security_bpf_map(struct bpf_map *map, fmode_t fmode)
+{
+ return call_int_hook(bpf_map, 0, map, fmode);
+}
+int security_bpf_prog(struct bpf_prog *prog)
+{
+ return call_int_hook(bpf_prog, 0, prog);
+}
+int security_bpf_map_alloc(struct bpf_map *map)
+{
+ return call_int_hook(bpf_map_alloc_security, 0, map);
+}
+int security_bpf_prog_alloc(struct bpf_prog_aux *aux)
+{
+ return call_int_hook(bpf_prog_alloc_security, 0, aux);
+}
+void security_bpf_map_free(struct bpf_map *map)
+{
+ call_void_hook(bpf_map_free_security, map);
+}
+void security_bpf_prog_free(struct bpf_prog_aux *aux)
+{
+ call_void_hook(bpf_prog_free_security, aux);
+}
+#endif /* CONFIG_BPF_SYSCALL */
--
2.14.2.920.gcf0c67979c-goog
^ permalink raw reply related
* [PATCH net-next v2 4/5] selinux: bpf: Add selinux check for eBPF syscall operations
From: Chenbo Feng @ 2017-10-09 22:20 UTC (permalink / raw)
To: linux-security-module, netdev, SELinux
Cc: Jeffrey Vander Stoep, Alexei Starovoitov, lorenzo,
Daniel Borkmann, Stephen Smalley, Chenbo Feng
In-Reply-To: <20171009222028.13096-1-chenbofeng.kernel@gmail.com>
From: Chenbo Feng <fengc@google.com>
Implement the actual checks introduced to eBPF related syscalls. This
implementation use the security field inside bpf object to store a sid that
identify the bpf object. And when processes try to access the object,
selinux will check if processes have the right privileges. The creation
of eBPF object are also checked at the general bpf check hook and new
cmd introduced to eBPF domain can also be checked there.
Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
---
security/selinux/hooks.c | 111 ++++++++++++++++++++++++++++++++++++
security/selinux/include/classmap.h | 2 +
security/selinux/include/objsec.h | 4 ++
3 files changed, 117 insertions(+)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index f5d304736852..41aba4e3d57c 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -85,6 +85,7 @@
#include <linux/export.h>
#include <linux/msg.h>
#include <linux/shm.h>
+#include <linux/bpf.h>
#include "avc.h"
#include "objsec.h"
@@ -6252,6 +6253,106 @@ static void selinux_ib_free_security(void *ib_sec)
}
#endif
+#ifdef CONFIG_BPF_SYSCALL
+static int selinux_bpf(int cmd, union bpf_attr *attr,
+ unsigned int size)
+{
+ u32 sid = current_sid();
+ int ret;
+
+ switch (cmd) {
+ case BPF_MAP_CREATE:
+ ret = avc_has_perm(sid, sid, SECCLASS_BPF_MAP, BPF_MAP__CREATE,
+ NULL);
+ break;
+ case BPF_PROG_LOAD:
+ ret = avc_has_perm(sid, sid, SECCLASS_BPF_PROG, BPF_PROG__LOAD,
+ NULL);
+ break;
+ default:
+ ret = 0;
+ break;
+ }
+
+ return ret;
+}
+
+static u32 bpf_map_fmode_to_av(fmode_t fmode)
+{
+ u32 av = 0;
+
+ if (f_mode & FMODE_READ)
+ av |= BPF_MAP__READ;
+ if (f_mode & FMODE_WRITE)
+ av |= BPF_MAP__WRITE;
+ return av;
+}
+
+static int selinux_bpf_map(struct bpf_map *map, fmode_t fmode)
+{
+ u32 sid = current_sid();
+ struct bpf_security_struct *bpfsec;
+
+ bpfsec = map->security;
+ return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_MAP,
+ bpf_map_fmode_to_av(fmode), NULL);
+}
+
+static int selinux_bpf_prog(struct bpf_prog *prog)
+{
+ u32 sid = current_sid();
+ struct bpf_security_struct *bpfsec;
+
+ bpfsec = prog->aux->security;
+ return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_PROG,
+ BPF_PROG__USE, NULL);
+}
+
+static int selinux_bpf_map_alloc(struct bpf_map *map)
+{
+ struct bpf_security_struct *bpfsec;
+
+ bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
+ if (!bpfsec)
+ return -ENOMEM;
+
+ bpfsec->sid = current_sid();
+ map->security = bpfsec;
+
+ return 0;
+}
+
+static void selinux_bpf_map_free(struct bpf_map *map)
+{
+ struct bpf_security_struct *bpfsec = map->security;
+
+ map->security = NULL;
+ kfree(bpfsec);
+}
+
+static int selinux_bpf_prog_alloc(struct bpf_prog_aux *aux)
+{
+ struct bpf_security_struct *bpfsec;
+
+ bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
+ if (!bpfsec)
+ return -ENOMEM;
+
+ bpfsec->sid = current_sid();
+ aux->security = bpfsec;
+
+ return 0;
+}
+
+static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
+{
+ struct bpf_security_struct *bpfsec = aux->security;
+
+ aux->security = NULL;
+ kfree(bpfsec);
+}
+#endif
+
static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
LSM_HOOK_INIT(binder_set_context_mgr, selinux_binder_set_context_mgr),
LSM_HOOK_INIT(binder_transaction, selinux_binder_transaction),
@@ -6471,6 +6572,16 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
LSM_HOOK_INIT(audit_rule_match, selinux_audit_rule_match),
LSM_HOOK_INIT(audit_rule_free, selinux_audit_rule_free),
#endif
+
+#ifdef CONFIG_BPF_SYSCALL
+ LSM_HOOK_INIT(bpf, selinux_bpf),
+ LSM_HOOK_INIT(bpf_map, selinux_bpf_map),
+ LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog),
+ LSM_HOOK_INIT(bpf_map_alloc_security, selinux_bpf_map_alloc),
+ LSM_HOOK_INIT(bpf_prog_alloc_security, selinux_bpf_prog_alloc),
+ LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
+ LSM_HOOK_INIT(bpf_prog_free_security, selinux_bpf_prog_free),
+#endif
};
static __init int selinux_init(void)
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 35ffb29a69cb..7253c5eea59c 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -237,6 +237,8 @@ struct security_class_mapping secclass_map[] = {
{ "access", NULL } },
{ "infiniband_endport",
{ "manage_subnet", NULL } },
+ { "bpf_map", {"create", "read", "write"} },
+ { "bpf_prog", {"load", "use"} },
{ NULL }
};
diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
index 1649cd18eb0b..3d54468ce334 100644
--- a/security/selinux/include/objsec.h
+++ b/security/selinux/include/objsec.h
@@ -150,6 +150,10 @@ struct pkey_security_struct {
u32 sid; /* SID of pkey */
};
+struct bpf_security_struct {
+ u32 sid; /*SID of bpf obj creater*/
+};
+
extern unsigned int selinux_checkreqprot;
#endif /* _SELINUX_OBJSEC_H_ */
--
2.14.2.920.gcf0c67979c-goog
^ permalink raw reply related
* [PATCH net-next v2 5/5] selinux: bpf: Add addtional check for bpf object file receive
From: Chenbo Feng @ 2017-10-09 22:20 UTC (permalink / raw)
To: linux-security-module, netdev, SELinux
Cc: Jeffrey Vander Stoep, Alexei Starovoitov, lorenzo,
Daniel Borkmann, Stephen Smalley, Chenbo Feng
In-Reply-To: <20171009222028.13096-1-chenbofeng.kernel@gmail.com>
From: Chenbo Feng <fengc@google.com>
Introduce a bpf object related check when sending and receiving files
through unix domain socket as well as binder. It checks if the receiving
process have privilege to read/write the bpf map or use the bpf program.
This check is necessary because the bpf maps and programs are using a
anonymous inode as their shared inode so the normal way of checking the
files and sockets when passing between processes cannot work properly on
eBPF object. This check only works when the BPF_SYSCALL is configured.
The information stored inside the file security struct is the same as
the information in bpf object security struct.
Signed-off-by: Chenbo Feng <fengc@google.com>
---
include/linux/bpf.h | 3 +++
include/linux/lsm_hooks.h | 17 +++++++++++++
include/linux/security.h | 9 +++++++
kernel/bpf/syscall.c | 4 ++--
security/security.c | 8 +++++++
security/selinux/hooks.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++
6 files changed, 100 insertions(+), 2 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 225740688ab7..81d6c01b8825 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -285,6 +285,9 @@ int bpf_prog_array_copy_to_user(struct bpf_prog_array __rcu *progs,
#ifdef CONFIG_BPF_SYSCALL
DECLARE_PER_CPU(int, bpf_prog_active);
+extern const struct file_operations bpf_map_fops;
+extern const struct file_operations bpf_prog_fops;
+
#define BPF_PROG_TYPE(_id, _ops) \
extern const struct bpf_verifier_ops _ops;
#define BPF_MAP_TYPE(_id, _ops) \
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 7161d8e7ee79..517dea60b87b 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1385,6 +1385,19 @@
* @bpf_prog_free_security:
* Clean up the security information stored inside bpf prog.
*
+ * @bpf_map_file:
+ * When creating a bpf map fd, set up the file security information with
+ * the bpf security information stored in the map struct. So when the map
+ * fd is passed between processes, the security module can directly read
+ * the security information from file security struct rather than the bpf
+ * security struct.
+ *
+ * @bpf_prog_file:
+ * When creating a bpf prog fd, set up the file security information with
+ * the bpf security information stored in the prog struct. So when the prog
+ * fd is passed between processes, the security module can directly read
+ * the security information from file security struct rather than the bpf
+ * security struct.
*/
union security_list_options {
int (*binder_set_context_mgr)(struct task_struct *mgr);
@@ -1726,6 +1739,8 @@ union security_list_options {
void (*bpf_map_free_security)(struct bpf_map *map);
int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux);
void (*bpf_prog_free_security)(struct bpf_prog_aux *aux);
+ void (*bpf_map_file)(struct bpf_map *map, struct file *file);
+ void (*bpf_prog_file)(struct bpf_prog_aux *aux, struct file *file);
#endif /* CONFIG_BPF_SYSCALL */
};
@@ -1954,6 +1969,8 @@ struct security_hook_heads {
struct list_head bpf_map_free_security;
struct list_head bpf_prog_alloc_security;
struct list_head bpf_prog_free_security;
+ struct list_head bpf_map_file;
+ struct list_head bpf_prog_file;
#endif /* CONFIG_BPF_SYSCALL */
} __randomize_layout;
diff --git a/include/linux/security.h b/include/linux/security.h
index 18800b0911e5..57573b794e2d 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1740,6 +1740,8 @@ extern int security_bpf_map_alloc(struct bpf_map *map);
extern void security_bpf_map_free(struct bpf_map *map);
extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux);
extern void security_bpf_prog_free(struct bpf_prog_aux *aux);
+extern void security_bpf_map_file(struct bpf_map *map, struct file *file);
+extern void security_bpf_prog_file(struct bpf_prog_aux *aux, struct file *file);
#else
static inline int security_bpf(int cmd, union bpf_attr *attr,
unsigned int size)
@@ -1772,6 +1774,13 @@ static inline int security_bpf_prog_alloc(struct bpf_prog_aux *aux)
static inline void security_bpf_prog_free(struct bpf_prog_aux *aux)
{ }
+
+static inline void security_bpf_map_file(struct bpf_map *map, struct file *file)
+{ }
+
+static inline void security_bpf_prog_file(struct bpf_prog_aux *aux,
+ struct file *file)
+{ }
#endif /* CONFIG_SECURITY */
#endif /* CONFIG_BPF_SYSCALL */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1cf31ddd7616..b144181d3f3a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -313,7 +313,7 @@ static ssize_t bpf_dummy_write(struct file *filp, const char __user *buf,
return -EINVAL;
}
-static const struct file_operations bpf_map_fops = {
+const struct file_operations bpf_map_fops = {
#ifdef CONFIG_PROC_FS
.show_fdinfo = bpf_map_show_fdinfo,
#endif
@@ -964,7 +964,7 @@ static void bpf_prog_show_fdinfo(struct seq_file *m, struct file *filp)
}
#endif
-static const struct file_operations bpf_prog_fops = {
+const struct file_operations bpf_prog_fops = {
#ifdef CONFIG_PROC_FS
.show_fdinfo = bpf_prog_show_fdinfo,
#endif
diff --git a/security/security.c b/security/security.c
index 1cd8526cb0b7..dacf649b8cfa 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1734,4 +1734,12 @@ void security_bpf_prog_free(struct bpf_prog_aux *aux)
{
call_void_hook(bpf_prog_free_security, aux);
}
+void security_bpf_map_file(struct bpf_map *map, struct file *file)
+{
+ call_void_hook(bpf_map_file, map, file);
+}
+void security_bpf_prog_file(struct bpf_prog_aux *aux, struct file *file)
+{
+ call_void_hook(bpf_prog_file, aux, file);
+}
#endif /* CONFIG_BPF_SYSCALL */
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 41aba4e3d57c..fea88655e0ee 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1815,6 +1815,10 @@ static inline int file_path_has_perm(const struct cred *cred,
return inode_has_perm(cred, file_inode(file), av, &ad);
}
+#ifdef CONFIG_BPF_SYSCALL
+static int bpf_file_check(struct file *file, u32 sid);
+#endif
+
/* Check whether a task can use an open file descriptor to
access an inode in a given way. Check access to the
descriptor itself, and then use dentry_has_perm to
@@ -1845,6 +1849,12 @@ static int file_has_perm(const struct cred *cred,
goto out;
}
+#ifdef CONFIG_BPF_SYSCALL
+ rc = bpf_file_check(file, cred_sid(cred));
+ if (rc)
+ goto out;
+#endif
+
/* av is zero if only checking access to the descriptor. */
rc = 0;
if (av)
@@ -2165,6 +2175,12 @@ static int selinux_binder_transfer_file(struct task_struct *from,
return rc;
}
+#ifdef CONFIG_BPF_SYSCALL
+ rc = bpf_file_check(file, sid);
+ if (rc)
+ return rc;
+#endif
+
if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
return 0;
@@ -6288,6 +6304,33 @@ static u32 bpf_map_fmode_to_av(fmode_t fmode)
return av;
}
+/* This function will check the file pass through unix socket or binder to see
+ * if it is a bpf related object. And apply correspinding checks on the bpf
+ * object based on the type. The bpf maps and programs, not like other files and
+ * socket, are using a shared anonymous inode inside the kernel as their inode.
+ * So checking that inode cannot identify if the process have privilege to
+ * access the bpf object and that's why we have to add this additional check in
+ * selinux_file_receive and selinux_binder_transfer_files.
+ */
+static int bpf_file_check(struct file *file, u32 sid)
+{
+ struct file_security_struct *fsec = file->f_security;
+ int ret;
+
+ if (file->f_op == &bpf_map_fops) {
+ ret = avc_has_perm(sid, fsec->sid, SECCLASS_BPF_MAP,
+ bpf_map_fmode_to_av(file->f_mode), NULL);
+ if (ret)
+ return ret;
+ } else if (file->f_op == &bpf_prog_fops) {
+ ret = avc_has_perm(sid, fsec->sid, SECCLASS_BPF_PROG,
+ BPF_PROG__USE, NULL);
+ if (ret)
+ return ret;
+ }
+ return 0;
+}
+
static int selinux_bpf_map(struct bpf_map *map, fmode_t fmode)
{
u32 sid = current_sid();
@@ -6351,6 +6394,22 @@ static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
aux->security = NULL;
kfree(bpfsec);
}
+
+static void selinux_bpf_map_file(struct bpf_map *map, struct file *file)
+{
+ struct bpf_security_struct *bpfsec = map->security;
+ struct file_security_struct *fsec = file->f_security;
+
+ fsec->sid = bpfsec->sid;
+}
+
+static void selinux_bpf_prog_file(struct bpf_prog_aux *aux, struct file *file)
+{
+ struct bpf_security_struct *bpfsec = aux->security;
+ struct file_security_struct *fsec = file->f_security;
+
+ fsec->sid = bpfsec->sid;
+}
#endif
static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
@@ -6581,6 +6640,8 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
LSM_HOOK_INIT(bpf_prog_alloc_security, selinux_bpf_prog_alloc),
LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
LSM_HOOK_INIT(bpf_prog_free_security, selinux_bpf_prog_free),
+ LSM_HOOK_INIT(bpf_map_file, selinux_bpf_map_file),
+ LSM_HOOK_INIT(bpf_prog_file, selinux_bpf_prog_file),
#endif
};
--
2.14.2.920.gcf0c67979c-goog
^ permalink raw reply related
* [net-next 00/14][pull request] 40GbE Intel Wired LAN Driver Updates 2017-10-09
From: Jeff Kirsher @ 2017-10-09 22:38 UTC (permalink / raw)
To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene
This series contains updates to i40e and i40evf only.
Jake fixes missed flag conversion from u64 to u32. Fixes a deafult ITR
value issue where the driver defaults to an ITR value of half the
expected value (in terms of minimum microseconds between interrupts). So
fix this by changing the default values to be calculated using the
ITR_REG_TO_USEC() macro which indicates that we are converting from the
register units into microseconds. Updates the drivers to bump the tail in
increments of 8 and double the number of descriptors we will bundle into
one tail bump when receiving. With the recent kernel support for
enabling XPS and QoS at the same time, we no longer need to worry about
the number of traffic classes when enabling XPS.
Lihong converts the use of hash_for_each() to hash_for_each_safe() to
safely remove a hash entry. Adds a check for the return value for
find_first_bit() in the case that it returns the size passed to search.
Alan fixes a bug in which filters are erroneously removed if they are
removed and then added again. So make sure that when adding a filter, if
we find it already existed in our list, make sure it is not marked to be
removed.
Jayaprakash adds the retrying of PHY reads when the I2C is busy for a
maximum period of 500ms.
Rami fixes code comment typo.
Stefano Brivio simplifies the code by removing the use of a local
return code variable and simply return the results of the read function.
The following are changes since commit 2e997d8b12d2933d7640bb3a43af8eb6857a73af:
Merge branch 'ipv6-addrlabel-avoid-dirtying-ip6addrlbl_entry'
and are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE
Alan Brady (1):
i40evf: fix mac filter removal timing issue
Jacob Keller (7):
i40e: fix flags declaration
i40e/i40evf: fix incorrect default ITR values on driver load
i40e/i40evf: always set the CLEARPBA flag when re-enabling interrupts
i40e: reduce lrxqthresh from 2 to 1
i40e/i40evf: bump tail only in multiples of 8
i40e/i40evf: bundle more descriptors when allocating buffers
i40e: allow XPS with QoS enabled
Jayaprakash Shanmugam (1):
i40e: Retry AQC GetPhyAbilities to overcome I2CRead hangs
Lihong Yang (3):
i40e: use the safe hash table iterator when deleting mac filters
i40e: add check for return from find_first_bit call
i40e: use a local variable instead of calculating multiple times
Rami Rosen (1):
i40e: fix a typo
Stefano Brivio (1):
i40e: Avoid some useless variables and initializers in NVM functions
drivers/net/ethernet/intel/i40e/i40e.h | 7 ++--
drivers/net/ethernet/intel/i40e/i40e_common.c | 42 +++++++++++++++-------
drivers/net/ethernet/intel/i40e/i40e_main.c | 36 ++++++++-----------
drivers/net/ethernet/intel/i40e/i40e_nvm.c | 20 ++++-------
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 15 +++++---
drivers/net/ethernet/intel/i40e/i40e_txrx.h | 8 +++--
drivers/net/ethernet/intel/i40e/i40e_type.h | 3 ++
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 27 +++++++-------
drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 13 +++++--
drivers/net/ethernet/intel/i40evf/i40e_txrx.h | 8 +++--
drivers/net/ethernet/intel/i40evf/i40e_type.h | 3 ++
drivers/net/ethernet/intel/i40evf/i40evf_main.c | 6 ++--
12 files changed, 107 insertions(+), 81 deletions(-)
--
2.14.2
^ permalink raw reply
* [net-next 03/14] i40evf: fix mac filter removal timing issue
From: Jeff Kirsher @ 2017-10-09 22:38 UTC (permalink / raw)
To: davem; +Cc: Alan Brady, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171009223841.2557-1-jeffrey.t.kirsher@intel.com>
From: Alan Brady <alan.brady@intel.com>
Due to the asynchronous nature in which mac filters are added and
deleted, there exists a bug in which filters are erroneously removed if
removed then added again quickly.
The events are as such:
- filter marked for removal
- same filter is re-added before watchdog that cleans up filters
- we skip re-adding the filter because we have it already in the
list
- watchdog filter cleanup kicks off and filter is removed
So when we were re-adding the same filter, it didn't actually get added
because it already existed in the list, but was marked for removal and
had yet to actually be removed.
This patch fixes the issue by making sure that when adding a filter, if
we find it already existing in our list, make sure it is not marked to
be removed.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40evf/i40evf_main.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 1d2fc898b664..f62d9565c7b5 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -880,6 +880,8 @@ i40evf_mac_filter *i40evf_add_filter(struct i40evf_adapter *adapter,
list_add_tail(&f->list, &adapter->mac_filter_list);
f->add = true;
adapter->aq_required |= I40EVF_FLAG_AQ_ADD_MAC_FILTER;
+ } else {
+ f->remove = false;
}
clear_bit(__I40EVF_IN_CRITICAL_TASK, &adapter->crit_section);
--
2.14.2
^ permalink raw reply related
* [net-next 01/14] i40e: fix flags declaration
From: Jeff Kirsher @ 2017-10-09 22:38 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171009223841.2557-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
Since we don't yet have more than 32 flags, we'll use a u32 for both the
hw_features and flag field. Should we gain more flags in the future, we
may need to convert to a u64 or separate flags out into two fields.
This was overlooked in the previous commit 2781de2134c4 ("i40e/i40evf:
organize and re-number feature flags"), where the feature flag was not
converted form u64 to u32.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 18c453a3e728..7baf6d8a84dd 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -424,7 +424,7 @@ struct i40e_pf {
#define I40E_HW_PORT_ID_VALID BIT(17)
#define I40E_HW_RESTART_AUTONEG BIT(18)
- u64 flags;
+ u32 flags;
#define I40E_FLAG_RX_CSUM_ENABLED BIT(0)
#define I40E_FLAG_MSI_ENABLED BIT(1)
#define I40E_FLAG_MSIX_ENABLED BIT(2)
--
2.14.2
^ permalink raw reply related
* [net-next 02/14] i40e: use the safe hash table iterator when deleting mac filters
From: Jeff Kirsher @ 2017-10-09 22:38 UTC (permalink / raw)
To: davem; +Cc: Lihong Yang, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171009223841.2557-1-jeffrey.t.kirsher@intel.com>
From: Lihong Yang <lihong.yang@intel.com>
This patch replaces hash_for_each function with hash_for_each_safe
when calling __i40e_del_filter. The hash_for_each_safe function is
the right one to use when iterating over a hash table to safely remove
a hash entry. Otherwise, incorrect values may be read from freed memory.
Detected by CoverityScan, CID 1402048 Read from pointer after free
Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 04568137e029..c062d74d21f3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -2883,6 +2883,7 @@ int i40e_ndo_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
struct i40e_mac_filter *f;
struct i40e_vf *vf;
int ret = 0;
+ struct hlist_node *h;
int bkt;
/* validate the request */
@@ -2921,7 +2922,7 @@ int i40e_ndo_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac)
/* Delete all the filters for this VSI - we're going to kill it
* anyway.
*/
- hash_for_each(vsi->mac_filter_hash, bkt, f, hlist)
+ hash_for_each_safe(vsi->mac_filter_hash, bkt, h, f, hlist)
__i40e_del_filter(vsi, f);
spin_unlock_bh(&vsi->mac_filter_hash_lock);
--
2.14.2
^ permalink raw reply related
* [net-next 04/14] i40e/i40evf: fix incorrect default ITR values on driver load
From: Jeff Kirsher @ 2017-10-09 22:38 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171009223841.2557-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
The ITR register expects to be programmed in units of 2 microseconds.
Because of this, all of the drivers I40E_ITR_* constants are in terms of
this 2 microsecond register.
Unfortunately, the rx_itr_default value is expected to be programmed in
microseconds.
Effectively the driver defaults to an ITR value of half the expected
value (in terms of minimum microseconds between interrupts).
Fix this by changing the default values to be calculated using
ITR_REG_TO_USEC macro which indicates that we're converting from the
register units into microseconds.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 4 ++--
drivers/net/ethernet/intel/i40e/i40e_txrx.h | 6 ++++--
drivers/net/ethernet/intel/i40evf/i40e_txrx.h | 6 ++++--
drivers/net/ethernet/intel/i40evf/i40evf_main.c | 4 ++--
4 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 60b11fdeca2d..d4b0cc36afb1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8983,8 +8983,8 @@ static int i40e_sw_init(struct i40e_pf *pf)
I40E_FLAG_MSIX_ENABLED;
/* Set default ITR */
- pf->rx_itr_default = I40E_ITR_DYNAMIC | I40E_ITR_RX_DEF;
- pf->tx_itr_default = I40E_ITR_DYNAMIC | I40E_ITR_TX_DEF;
+ pf->rx_itr_default = I40E_ITR_RX_DEF;
+ pf->tx_itr_default = I40E_ITR_TX_DEF;
/* Depending on PF configurations, it is possible that the RSS
* maximum might end up larger than the available queues
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index a4e3e665a1a1..c3156aa3f709 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -38,8 +38,10 @@
#define I40E_ITR_8K 0x003E
#define I40E_ITR_4K 0x007A
#define I40E_MAX_INTRL 0x3B /* reg uses 4 usec resolution */
-#define I40E_ITR_RX_DEF I40E_ITR_20K
-#define I40E_ITR_TX_DEF I40E_ITR_20K
+#define I40E_ITR_RX_DEF (ITR_REG_TO_USEC(I40E_ITR_20K) | \
+ I40E_ITR_DYNAMIC)
+#define I40E_ITR_TX_DEF (ITR_REG_TO_USEC(I40E_ITR_20K) | \
+ I40E_ITR_DYNAMIC)
#define I40E_ITR_DYNAMIC 0x8000 /* use top bit as a flag */
#define I40E_MIN_INT_RATE 250 /* ~= 1000000 / (I40E_MAX_ITR * 2) */
#define I40E_MAX_INT_RATE 500000 /* == 1000000 / (I40E_MIN_ITR * 2) */
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
index d8ca802a71a9..8f9830d7649a 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
@@ -38,8 +38,10 @@
#define I40E_ITR_8K 0x003E
#define I40E_ITR_4K 0x007A
#define I40E_MAX_INTRL 0x3B /* reg uses 4 usec resolution */
-#define I40E_ITR_RX_DEF I40E_ITR_20K
-#define I40E_ITR_TX_DEF I40E_ITR_20K
+#define I40E_ITR_RX_DEF (ITR_REG_TO_USEC(I40E_ITR_20K) | \
+ I40E_ITR_DYNAMIC)
+#define I40E_ITR_TX_DEF (ITR_REG_TO_USEC(I40E_ITR_20K) | \
+ I40E_ITR_DYNAMIC)
#define I40E_ITR_DYNAMIC 0x8000 /* use top bit as a flag */
#define I40E_MIN_INT_RATE 250 /* ~= 1000000 / (I40E_MAX_ITR * 2) */
#define I40E_MAX_INT_RATE 500000 /* == 1000000 / (I40E_MIN_ITR * 2) */
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index f62d9565c7b5..5bcbd46e2f6c 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -1223,7 +1223,7 @@ static int i40evf_alloc_queues(struct i40evf_adapter *adapter)
tx_ring->netdev = adapter->netdev;
tx_ring->dev = &adapter->pdev->dev;
tx_ring->count = adapter->tx_desc_count;
- tx_ring->tx_itr_setting = (I40E_ITR_DYNAMIC | I40E_ITR_TX_DEF);
+ tx_ring->tx_itr_setting = I40E_ITR_TX_DEF;
if (adapter->flags & I40EVF_FLAG_WB_ON_ITR_CAPABLE)
tx_ring->flags |= I40E_TXR_FLAGS_WB_ON_ITR;
@@ -1232,7 +1232,7 @@ static int i40evf_alloc_queues(struct i40evf_adapter *adapter)
rx_ring->netdev = adapter->netdev;
rx_ring->dev = &adapter->pdev->dev;
rx_ring->count = adapter->rx_desc_count;
- rx_ring->rx_itr_setting = (I40E_ITR_DYNAMIC | I40E_ITR_RX_DEF);
+ rx_ring->rx_itr_setting = I40E_ITR_RX_DEF;
}
adapter->num_active_queues = num_active_queues;
--
2.14.2
^ permalink raw reply related
* [net-next 05/14] i40e/i40evf: always set the CLEARPBA flag when re-enabling interrupts
From: Jeff Kirsher @ 2017-10-09 22:38 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171009223841.2557-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
In the past we changed driver behavior to not clear the PBA when
re-enabling interrupts. This change was motivated by the flawed belief
that clearing the PBA would cause a lost interrupt if a receive
interrupt occurred while interrupts were disabled.
According to empirical testing this isn't the case. Additionally, the
data sheet specifically says that we should set the CLEARPBA bit when
re-enabling interrupts in a polling setup.
This reverts commit 40d72a509862 ("i40e/i40evf: don't lose interrupts")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e.h | 5 +----
drivers/net/ethernet/intel/i40e/i40e_main.c | 11 +++++------
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 6 ++----
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 2 +-
drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 4 +---
5 files changed, 10 insertions(+), 18 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 7baf6d8a84dd..8139b4ee1dc3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -949,9 +949,6 @@ static inline void i40e_irq_dynamic_enable(struct i40e_vsi *vsi, int vector)
struct i40e_hw *hw = &pf->hw;
u32 val;
- /* definitely clear the PBA here, as this function is meant to
- * clean out all previous interrupts AND enable the interrupt
- */
val = I40E_PFINT_DYN_CTLN_INTENA_MASK |
I40E_PFINT_DYN_CTLN_CLEARPBA_MASK |
(I40E_ITR_NONE << I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT);
@@ -960,7 +957,7 @@ static inline void i40e_irq_dynamic_enable(struct i40e_vsi *vsi, int vector)
}
void i40e_irq_dynamic_disable_icr0(struct i40e_pf *pf);
-void i40e_irq_dynamic_enable_icr0(struct i40e_pf *pf, bool clearpba);
+void i40e_irq_dynamic_enable_icr0(struct i40e_pf *pf);
int i40e_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd);
int i40e_open(struct net_device *netdev);
int i40e_close(struct net_device *netdev);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index d4b0cc36afb1..00a83afb02e9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3403,15 +3403,14 @@ void i40e_irq_dynamic_disable_icr0(struct i40e_pf *pf)
/**
* i40e_irq_dynamic_enable_icr0 - Enable default interrupt generation for icr0
* @pf: board private structure
- * @clearpba: true when all pending interrupt events should be cleared
**/
-void i40e_irq_dynamic_enable_icr0(struct i40e_pf *pf, bool clearpba)
+void i40e_irq_dynamic_enable_icr0(struct i40e_pf *pf)
{
struct i40e_hw *hw = &pf->hw;
u32 val;
val = I40E_PFINT_DYN_CTL0_INTENA_MASK |
- (clearpba ? I40E_PFINT_DYN_CTL0_CLEARPBA_MASK : 0) |
+ I40E_PFINT_DYN_CTL0_CLEARPBA_MASK |
(I40E_ITR_NONE << I40E_PFINT_DYN_CTL0_ITR_INDX_SHIFT);
wr32(hw, I40E_PFINT_DYN_CTL0, val);
@@ -3597,7 +3596,7 @@ static int i40e_vsi_enable_irq(struct i40e_vsi *vsi)
for (i = 0; i < vsi->num_q_vectors; i++)
i40e_irq_dynamic_enable(vsi, i);
} else {
- i40e_irq_dynamic_enable_icr0(pf, true);
+ i40e_irq_dynamic_enable_icr0(pf);
}
i40e_flush(&pf->hw);
@@ -3746,7 +3745,7 @@ static irqreturn_t i40e_intr(int irq, void *data)
wr32(hw, I40E_PFINT_ICR0_ENA, ena_mask);
if (!test_bit(__I40E_DOWN, pf->state)) {
i40e_service_event_schedule(pf);
- i40e_irq_dynamic_enable_icr0(pf, false);
+ i40e_irq_dynamic_enable_icr0(pf);
}
return ret;
@@ -8455,7 +8454,7 @@ static int i40e_setup_misc_vector(struct i40e_pf *pf)
i40e_flush(hw);
- i40e_irq_dynamic_enable_icr0(pf, true);
+ i40e_irq_dynamic_enable_icr0(pf);
return err;
}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 3bd176606c09..616abf79253e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2202,9 +2202,7 @@ static u32 i40e_buildreg_itr(const int type, const u16 itr)
u32 val;
val = I40E_PFINT_DYN_CTLN_INTENA_MASK |
- /* Don't clear PBA because that can cause lost interrupts that
- * came in while we were cleaning/polling
- */
+ I40E_PFINT_DYN_CTLN_CLEARPBA_MASK |
(type << I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT) |
(itr << I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT);
@@ -2241,7 +2239,7 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
/* If we don't have MSIX, then we only need to re-enable icr0 */
if (!(vsi->back->flags & I40E_FLAG_MSIX_ENABLED)) {
- i40e_irq_dynamic_enable_icr0(vsi->back, false);
+ i40e_irq_dynamic_enable_icr0(vsi->back);
return;
}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index c062d74d21f3..10298956a81b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1358,7 +1358,7 @@ int i40e_alloc_vfs(struct i40e_pf *pf, u16 num_alloc_vfs)
i40e_free_vfs(pf);
err_iov:
/* Re-enable interrupt 0. */
- i40e_irq_dynamic_enable_icr0(pf, false);
+ i40e_irq_dynamic_enable_icr0(pf);
return ret;
}
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 37e1de886d48..fe817e2b6fef 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1409,9 +1409,7 @@ static u32 i40e_buildreg_itr(const int type, const u16 itr)
u32 val;
val = I40E_VFINT_DYN_CTLN1_INTENA_MASK |
- /* Don't clear PBA because that can cause lost interrupts that
- * came in while we were cleaning/polling
- */
+ I40E_VFINT_DYN_CTLN1_CLEARPBA_MASK |
(type << I40E_VFINT_DYN_CTLN1_ITR_INDX_SHIFT) |
(itr << I40E_VFINT_DYN_CTLN1_INTERVAL_SHIFT);
--
2.14.2
^ permalink raw reply related
* [net-next 06/14] i40e: reduce lrxqthresh from 2 to 1
From: Jeff Kirsher @ 2017-10-09 22:38 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171009223841.2557-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
The lrxq thresh value tells hardware to immediately interrupt when there
are fewer than N*64 packets left in the ring.
Counter intuitively, empirical testing has shown that decreasing this
value from 2 to 1, and thus changing from an immediate interrupt at
fewer than 128 descriptors down to 64 descriptors causes a small
increase in the maximum total packets per second we can receive. This
increase occurs even when we're polling with interrupts masked, as the
hardware must still handle interrupts internally even if we've disabled
them in software.
Also reduce the value for any VFs we allocate.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 00a83afb02e9..74875ddaeb33 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3030,7 +3030,7 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
if (hw->revision_id == 0)
rx_ctx.lrxqthresh = 0;
else
- rx_ctx.lrxqthresh = 2;
+ rx_ctx.lrxqthresh = 1;
rx_ctx.crcstrip = 1;
rx_ctx.l2tsel = 1;
/* this controls whether VLAN is stripped from inner headers */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 10298956a81b..83727906a386 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -639,7 +639,7 @@ static int i40e_config_vsi_rx_queue(struct i40e_vf *vf, u16 vsi_id,
rx_ctx.dsize = 1;
/* default values */
- rx_ctx.lrxqthresh = 2;
+ rx_ctx.lrxqthresh = 1;
rx_ctx.crcstrip = 1;
rx_ctx.prefena = 1;
rx_ctx.l2tsel = 1;
--
2.14.2
^ permalink raw reply related
* [net-next 07/14] i40e/i40evf: bump tail only in multiples of 8
From: Jeff Kirsher @ 2017-10-09 22:38 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171009223841.2557-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
Hardware only fetches descriptors on cachelines of 8, essentially
ignoring the lower 3 bits of the tail register. Thus, it is pointless to
bump tail by an unaligned access as the hardware will ignore some of the
new descriptors we allocated. Thus, it's ideal if we can ensure tail
writes are always aligned to 8.
At first, it seems like we'd already do this, since we allocate
descriptors in batches which are a multiple of 8. Since we'd always
increment by a multiple of 8, it seems like the value should always be
aligned.
However, this ignores allocation failures. If we fail to allocate
a buffer, our tail register will become unaligned. Once it has become
unaligned it will essentially be stuck unaligned until a buffer
allocation happens to fail at the exact amount necessary to re-align it.
We can do better, by simply rounding down the number of buffers we're
about to allocate (cleaned_count) such that "next_to_clean
+ cleaned_count" is rounded to the nearest multiple of 8.
We do this by calculating how far off that value is and subtracting it
from the cleaned_count. This essentially defers allocation of buffers if
they're going to be ignored by hardware anyways, and re-aligns our
next_to_use and tail values after a failure to allocate a descriptor.
This calculation ensures that we always align the tail writes in a way
the hardware expects and don't unnecessarily allocate buffers which
won't be fetched immediately.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 9 +++++++++
drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 9 +++++++++
2 files changed, 18 insertions(+)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 616abf79253e..a23306f04e00 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1372,6 +1372,15 @@ bool i40e_alloc_rx_buffers(struct i40e_ring *rx_ring, u16 cleaned_count)
union i40e_rx_desc *rx_desc;
struct i40e_rx_buffer *bi;
+ /* Hardware only fetches new descriptors in cache lines of 8,
+ * essentially ignoring the lower 3 bits of the tail register. We want
+ * to ensure our tail writes are aligned to avoid unnecessary work. We
+ * can't simply round down the cleaned count, since we might fail to
+ * allocate some buffers. What we really want is to ensure that
+ * next_to_used + cleaned_count produces an aligned value.
+ */
+ cleaned_count -= (ntu + cleaned_count) & 0x7;
+
/* do nothing if no valid netdev defined */
if (!rx_ring->netdev || !cleaned_count)
return false;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index fe817e2b6fef..6806ada11490 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -711,6 +711,15 @@ bool i40evf_alloc_rx_buffers(struct i40e_ring *rx_ring, u16 cleaned_count)
union i40e_rx_desc *rx_desc;
struct i40e_rx_buffer *bi;
+ /* Hardware only fetches new descriptors in cache lines of 8,
+ * essentially ignoring the lower 3 bits of the tail register. We want
+ * to ensure our tail writes are aligned to avoid unnecessary work. We
+ * can't simply round down the cleaned count, since we might fail to
+ * allocate some buffers. What we really want is to ensure that
+ * next_to_used + cleaned_count produces an aligned value.
+ */
+ cleaned_count -= (ntu + cleaned_count) & 0x7;
+
/* do nothing if no valid netdev defined */
if (!rx_ring->netdev || !cleaned_count)
return false;
--
2.14.2
^ permalink raw reply related
* [net-next 10/14] i40e: add check for return from find_first_bit call
From: Jeff Kirsher @ 2017-10-09 22:38 UTC (permalink / raw)
To: davem; +Cc: Lihong Yang, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171009223841.2557-1-jeffrey.t.kirsher@intel.com>
From: Lihong Yang <lihong.yang@intel.com>
The find_first_bit function will return the size passed to search
if the first set bit is not found. This patch adds the check in case
that happens as the return value would be used as the index in an array
and that would have caused the out-of-bounds access.
Detected by CoverityScan, CID 1295969 Out-of-bounds access
Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 83727906a386..125dcd1d2233 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -306,6 +306,10 @@ static void i40e_config_irq_link_list(struct i40e_vf *vf, u16 vsi_id,
next_q = find_first_bit(&linklistmap,
(I40E_MAX_VSI_QP *
I40E_VIRTCHNL_SUPPORTED_QTYPES));
+ if (unlikely(next_q == (I40E_MAX_VSI_QP *
+ I40E_VIRTCHNL_SUPPORTED_QTYPES)))
+ goto irq_list_done;
+
vsi_queue_id = next_q / I40E_VIRTCHNL_SUPPORTED_QTYPES;
qtype = next_q % I40E_VIRTCHNL_SUPPORTED_QTYPES;
pf_queue_id = i40e_vc_get_pf_queue_id(vf, vsi_id, vsi_queue_id);
--
2.14.2
^ permalink raw reply related
* [net-next 09/14] i40e: allow XPS with QoS enabled
From: Jeff Kirsher @ 2017-10-09 22:38 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171009223841.2557-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
Recently, the kernel gained support for enabling XPS and QoS at the
same time. Thus, we no longer need to worry about the number of
traffic classes when enabling XPS.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 17 ++++++-----------
1 file changed, 6 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 74875ddaeb33..b26f615bed5a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2879,23 +2879,18 @@ static void i40e_vsi_free_rx_resources(struct i40e_vsi *vsi)
**/
static void i40e_config_xps_tx_ring(struct i40e_ring *ring)
{
- struct i40e_vsi *vsi = ring->vsi;
int cpu;
if (!ring->q_vector || !ring->netdev)
return;
- if ((vsi->tc_config.numtc <= 1) &&
- !test_and_set_bit(__I40E_TX_XPS_INIT_DONE, ring->state)) {
- cpu = cpumask_local_spread(ring->q_vector->v_idx, -1);
- netif_set_xps_queue(ring->netdev, get_cpu_mask(cpu),
- ring->queue_index);
- }
+ /* We only initialize XPS once, so as not to overwrite user settings */
+ if (test_and_set_bit(__I40E_TX_XPS_INIT_DONE, ring->state))
+ return;
- /* schedule our worker thread which will take care of
- * applying the new filter changes
- */
- i40e_service_event_schedule(vsi->back);
+ cpu = cpumask_local_spread(ring->q_vector->v_idx, -1);
+ netif_set_xps_queue(ring->netdev, get_cpu_mask(cpu),
+ ring->queue_index);
}
/**
--
2.14.2
^ permalink raw reply related
* [net-next 08/14] i40e/i40evf: bundle more descriptors when allocating buffers
From: Jeff Kirsher @ 2017-10-09 22:38 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171009223841.2557-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
Double the number of descriptors we'll bundle into one tail bump when
receiving. Empirical testing has shown that we reduce CPU utilization
and don't appear to reduce throughput or packet rate. 32 seems to be the
sweet spot, as it's half the default polling budget, so we'd essentially
reduce from 4 tail writes when polling down to 2. Increasing this up to
64 appears to have negative impacts as it may become possible that we
don't bump the tail each time we get polled, which could cause a long
delay between returning descriptors to the hardware.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_txrx.h | 2 +-
drivers/net/ethernet/intel/i40evf/i40e_txrx.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index c3156aa3f709..ff57ae451524 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -208,7 +208,7 @@ static inline bool i40e_test_staterr(union i40e_rx_desc *rx_desc,
}
/* How many Rx Buffers do we bundle into one write to the hardware ? */
-#define I40E_RX_BUFFER_WRITE 16 /* Must be power of 2 */
+#define I40E_RX_BUFFER_WRITE 32 /* Must be power of 2 */
#define I40E_RX_INCREMENT(r, i) \
do { \
(i)++; \
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
index 8f9830d7649a..8d26c85d12e1 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
@@ -191,7 +191,7 @@ static inline bool i40e_test_staterr(union i40e_rx_desc *rx_desc,
}
/* How many Rx Buffers do we bundle into one write to the hardware ? */
-#define I40E_RX_BUFFER_WRITE 16 /* Must be power of 2 */
+#define I40E_RX_BUFFER_WRITE 32 /* Must be power of 2 */
#define I40E_RX_INCREMENT(r, i) \
do { \
(i)++; \
--
2.14.2
^ permalink raw reply related
* [net-next 12/14] i40e: use a local variable instead of calculating multiple times
From: Jeff Kirsher @ 2017-10-09 22:38 UTC (permalink / raw)
To: davem; +Cc: Lihong Yang, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171009223841.2557-1-jeffrey.t.kirsher@intel.com>
From: Lihong Yang <lihong.yang@intel.com>
The computed result of I40E_MAX_VSI_QP * I40E_VIRTCHNL_SUPPORTED_QTYPES
is used more than three times in function i40e_config_irq_link_list.
Simply declare a local variable to store it to improve readability.
Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 20 +++++++-------------
1 file changed, 7 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 125dcd1d2233..0c4fa225c7be 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -273,7 +273,7 @@ static void i40e_config_irq_link_list(struct i40e_vf *vf, u16 vsi_id,
struct i40e_hw *hw = &pf->hw;
u16 vsi_queue_id, pf_queue_id;
enum i40e_queue_type qtype;
- u16 next_q, vector_id;
+ u16 next_q, vector_id, size;
u32 reg, reg_idx;
u16 itr_idx = 0;
@@ -303,11 +303,9 @@ static void i40e_config_irq_link_list(struct i40e_vf *vf, u16 vsi_id,
vsi_queue_id + 1));
}
- next_q = find_first_bit(&linklistmap,
- (I40E_MAX_VSI_QP *
- I40E_VIRTCHNL_SUPPORTED_QTYPES));
- if (unlikely(next_q == (I40E_MAX_VSI_QP *
- I40E_VIRTCHNL_SUPPORTED_QTYPES)))
+ size = I40E_MAX_VSI_QP * I40E_VIRTCHNL_SUPPORTED_QTYPES;
+ next_q = find_first_bit(&linklistmap, size);
+ if (unlikely(next_q == size))
goto irq_list_done;
vsi_queue_id = next_q / I40E_VIRTCHNL_SUPPORTED_QTYPES;
@@ -317,7 +315,7 @@ static void i40e_config_irq_link_list(struct i40e_vf *vf, u16 vsi_id,
wr32(hw, reg_idx, reg);
- while (next_q < (I40E_MAX_VSI_QP * I40E_VIRTCHNL_SUPPORTED_QTYPES)) {
+ while (next_q < size) {
switch (qtype) {
case I40E_QUEUE_TYPE_RX:
reg_idx = I40E_QINT_RQCTL(pf_queue_id);
@@ -331,12 +329,8 @@ static void i40e_config_irq_link_list(struct i40e_vf *vf, u16 vsi_id,
break;
}
- next_q = find_next_bit(&linklistmap,
- (I40E_MAX_VSI_QP *
- I40E_VIRTCHNL_SUPPORTED_QTYPES),
- next_q + 1);
- if (next_q <
- (I40E_MAX_VSI_QP * I40E_VIRTCHNL_SUPPORTED_QTYPES)) {
+ next_q = find_next_bit(&linklistmap, size, next_q + 1);
+ if (next_q < size) {
vsi_queue_id = next_q / I40E_VIRTCHNL_SUPPORTED_QTYPES;
qtype = next_q % I40E_VIRTCHNL_SUPPORTED_QTYPES;
pf_queue_id = i40e_vc_get_pf_queue_id(vf, vsi_id,
--
2.14.2
^ permalink raw reply related
* [net-next 13/14] i40e: fix a typo
From: Jeff Kirsher @ 2017-10-09 22:38 UTC (permalink / raw)
To: davem; +Cc: Rami Rosen, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171009223841.2557-1-jeffrey.t.kirsher@intel.com>
From: Rami Rosen <rami.rosen@intel.com>
This patch fixes a typo in i40e_vsi_alloc_arrays() documentation.
The first parameter name should be "vsi" instead of "type".
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b26f615bed5a..4de52001a2b9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7688,7 +7688,7 @@ static int i40e_set_num_rings_in_vsi(struct i40e_vsi *vsi)
/**
* i40e_vsi_alloc_arrays - Allocate queue and vector pointer arrays for the vsi
- * @type: VSI pointer
+ * @vsi: VSI pointer
* @alloc_qvectors: a bool to specify if q_vectors need to be allocated.
*
* On error: returns error code (negative)
--
2.14.2
^ permalink raw reply related
* [net-next 11/14] i40e: Retry AQC GetPhyAbilities to overcome I2CRead hangs
From: Jeff Kirsher @ 2017-10-09 22:38 UTC (permalink / raw)
To: davem
Cc: Jayaprakash Shanmugam, netdev, nhorman, sassmann, jogreene,
Jeff Kirsher
In-Reply-To: <20171009223841.2557-1-jeffrey.t.kirsher@intel.com>
From: Jayaprakash Shanmugam <jayaprakash.shanmugam@intel.com>
- When the I2C is busy, the PHY reads are delayed. The firmware will
return EGAIN in these cases with an expectation that the SW will
trigger the reads again
- This patch retries the operation for a maximum period of 500ms
Signed-off-by: Jayaprakash Shanmugam <jayaprakash.shanmugam@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_common.c | 42 ++++++++++++++++++---------
drivers/net/ethernet/intel/i40e/i40e_type.h | 3 ++
drivers/net/ethernet/intel/i40evf/i40e_type.h | 3 ++
3 files changed, 35 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 60542beda7ad..53aad378d49c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -1567,30 +1567,46 @@ i40e_status i40e_aq_get_phy_capabilities(struct i40e_hw *hw,
struct i40e_aq_desc desc;
i40e_status status;
u16 abilities_size = sizeof(struct i40e_aq_get_phy_abilities_resp);
+ u16 max_delay = I40E_MAX_PHY_TIMEOUT, total_delay = 0;
if (!abilities)
return I40E_ERR_PARAM;
- i40e_fill_default_direct_cmd_desc(&desc,
- i40e_aqc_opc_get_phy_abilities);
+ do {
+ i40e_fill_default_direct_cmd_desc(&desc,
+ i40e_aqc_opc_get_phy_abilities);
- desc.flags |= cpu_to_le16((u16)I40E_AQ_FLAG_BUF);
- if (abilities_size > I40E_AQ_LARGE_BUF)
- desc.flags |= cpu_to_le16((u16)I40E_AQ_FLAG_LB);
+ desc.flags |= cpu_to_le16((u16)I40E_AQ_FLAG_BUF);
+ if (abilities_size > I40E_AQ_LARGE_BUF)
+ desc.flags |= cpu_to_le16((u16)I40E_AQ_FLAG_LB);
- if (qualified_modules)
- desc.params.external.param0 |=
+ if (qualified_modules)
+ desc.params.external.param0 |=
cpu_to_le32(I40E_AQ_PHY_REPORT_QUALIFIED_MODULES);
- if (report_init)
- desc.params.external.param0 |=
+ if (report_init)
+ desc.params.external.param0 |=
cpu_to_le32(I40E_AQ_PHY_REPORT_INITIAL_VALUES);
- status = i40e_asq_send_command(hw, &desc, abilities, abilities_size,
- cmd_details);
+ status = i40e_asq_send_command(hw, &desc, abilities,
+ abilities_size, cmd_details);
- if (hw->aq.asq_last_status == I40E_AQ_RC_EIO)
- status = I40E_ERR_UNKNOWN_PHY;
+ if (status)
+ break;
+
+ if (hw->aq.asq_last_status == I40E_AQ_RC_EIO) {
+ status = I40E_ERR_UNKNOWN_PHY;
+ break;
+ } else if (hw->aq.asq_last_status == I40E_AQ_RC_EAGAIN) {
+ usleep_range(1000, 2000);
+ total_delay++;
+ status = I40E_ERR_TIMEOUT;
+ }
+ } while ((hw->aq.asq_last_status != I40E_AQ_RC_OK) &&
+ (total_delay < max_delay));
+
+ if (status)
+ return status;
if (report_init) {
if (hw->mac.type == I40E_MAC_XL710 &&
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index 4b32b1d38a66..0410fcbdbb94 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -46,6 +46,9 @@
/* Max default timeout in ms, */
#define I40E_MAX_NVM_TIMEOUT 18000
+/* Max timeout in ms for the phy to respond */
+#define I40E_MAX_PHY_TIMEOUT 500
+
/* Switch from ms to the 1usec global time (this is the GTIME resolution) */
#define I40E_MS_TO_GTIME(time) ((time) * 1000)
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_type.h b/drivers/net/ethernet/intel/i40evf/i40e_type.h
index 9364b67fff9c..213b773dfad6 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_type.h
@@ -46,6 +46,9 @@
/* Max default timeout in ms, */
#define I40E_MAX_NVM_TIMEOUT 18000
+/* Max timeout in ms for the phy to respond */
+#define I40E_MAX_PHY_TIMEOUT 500
+
/* Switch from ms to the 1usec global time (this is the GTIME resolution) */
#define I40E_MS_TO_GTIME(time) ((time) * 1000)
--
2.14.2
^ permalink raw reply related
* [net-next 14/14] i40e: Avoid some useless variables and initializers in NVM functions
From: Jeff Kirsher @ 2017-10-09 22:38 UTC (permalink / raw)
To: davem; +Cc: Stefano Brivio, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <20171009223841.2557-1-jeffrey.t.kirsher@intel.com>
From: Stefano Brivio <sbrivio@redhat.com>
Fixes: 09f79fd49d94 ("i40e: avoid NVM acquire deadlock during NVM update")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_nvm.c | 20 +++++++-------------
1 file changed, 7 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_nvm.c b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
index 57505b1df98d..151d9cfb6ea4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_nvm.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_nvm.c
@@ -311,13 +311,10 @@ static i40e_status i40e_read_nvm_word_aq(struct i40e_hw *hw, u16 offset,
static i40e_status __i40e_read_nvm_word(struct i40e_hw *hw,
u16 offset, u16 *data)
{
- i40e_status ret_code = 0;
-
if (hw->flags & I40E_HW_FLAG_AQ_SRCTL_ACCESS_ENABLE)
- ret_code = i40e_read_nvm_word_aq(hw, offset, data);
- else
- ret_code = i40e_read_nvm_word_srctl(hw, offset, data);
- return ret_code;
+ return i40e_read_nvm_word_aq(hw, offset, data);
+
+ return i40e_read_nvm_word_srctl(hw, offset, data);
}
/**
@@ -331,7 +328,7 @@ static i40e_status __i40e_read_nvm_word(struct i40e_hw *hw,
i40e_status i40e_read_nvm_word(struct i40e_hw *hw, u16 offset,
u16 *data)
{
- i40e_status ret_code = 0;
+ i40e_status ret_code;
ret_code = i40e_acquire_nvm(hw, I40E_RESOURCE_READ);
if (ret_code)
@@ -446,13 +443,10 @@ static i40e_status __i40e_read_nvm_buffer(struct i40e_hw *hw,
u16 offset, u16 *words,
u16 *data)
{
- i40e_status ret_code = 0;
-
if (hw->flags & I40E_HW_FLAG_AQ_SRCTL_ACCESS_ENABLE)
- ret_code = i40e_read_nvm_buffer_aq(hw, offset, words, data);
- else
- ret_code = i40e_read_nvm_buffer_srctl(hw, offset, words, data);
- return ret_code;
+ return i40e_read_nvm_buffer_aq(hw, offset, words, data);
+
+ return i40e_read_nvm_buffer_srctl(hw, offset, words, data);
}
/**
--
2.14.2
^ permalink raw reply related
* Re: [PATCH] cdc_ether: flag the u-blox TOBY-L2 and SARA-U2 as wwan
From: David Miller @ 2017-10-09 23:03 UTC (permalink / raw)
To: aleksander
Cc: oliver, marco.demarco, stefano.godeas, linux-usb, netdev,
linux-kernel
In-Reply-To: <20171009120512.16681-1-aleksander@aleksander.es>
From: Aleksander Morgado <aleksander@aleksander.es>
Date: Mon, 9 Oct 2017 14:05:12 +0200
> The u-blox TOBY-L2 is a LTE Cat 4 module with HSPA+ and 2G fallback.
> This module allows switching to different USB profiles with the
> 'AT+UUSBCONF' command, and provides a ECM network interface when the
> 'AT+UUSBCONF=2' profile is selected.
>
> The u-blox SARA-U2 is a HSPA module with 2G fallback. The default USB
> configuration includes a ECM network interface.
>
> Both these modules are controlled via AT commands through one of the
> TTYs exposed. Connecting these modules may be done just by activating
> the desired PDP context with 'AT+CGACT=1,<cid>' and then running DHCP
> on the ECM interface.
>
> Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
Applied, thank you.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox