* Re: [PATCH 06/11] evm: Allow setxattr() and setattr() if metadata digest won't change
From: Mimi Zohar @ 2020-08-24 12:17 UTC (permalink / raw)
To: Roberto Sassu, mjg59
Cc: linux-integrity, linux-security-module, linux-kernel,
silviu.vlasceanu
In-Reply-To: <20200618160458.1579-6-roberto.sassu@huawei.com>
On Thu, 2020-06-18 at 18:04 +0200, Roberto Sassu wrote:
> If metadata are immutable, they cannot be changed. If metadata are already
> set to the final value before cp and tar restore the value from the source,
> those applications display an error even if the operation is legitimate
> (they don't change the value).
"metadata" is singular. The first sentence would be clearer by using
the specific metadata. What problem are you trying to solve? It
doesn't look like you're trying to solve the problem of writing the EVM
portable signatures without an exiting HMAC.
>
> This patch determines whether setxattr()/setattr() change metadata and, if
> not, allows the operations even if metadata are immutable.
Doesn't setxattr/setattr always change file metadata? Please describe
the real problem.
>
> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
> ---
> security/integrity/evm/evm_main.c | 72 +++++++++++++++++++++++++++++++
> 1 file changed, 72 insertions(+)
>
> diff --git a/security/integrity/evm/evm_main.c b/security/integrity/evm/evm_main.c
> index 30072030f05d..41cc6a4aaaab 100644
> --- a/security/integrity/evm/evm_main.c
> +++ b/security/integrity/evm/evm_main.c
> @@ -18,6 +18,7 @@
> #include <linux/integrity.h>
> #include <linux/evm.h>
> #include <linux/magic.h>
> +#include <linux/posix_acl_xattr.h>
>
> #include <crypto/hash.h>
> #include <crypto/hash_info.h>
> @@ -305,6 +306,56 @@ static enum integrity_status evm_verify_current_integrity(struct dentry *dentry)
> return evm_verify_hmac(dentry, NULL, NULL, 0, NULL);
> }
>
> +static int evm_xattr_acl_change(struct dentry *dentry, const char *xattr_name,
> + const void *xattr_value, size_t xattr_value_len)
> +{
> + umode_t mode;
> + struct posix_acl *acl = NULL, *acl_res;
> + struct inode *inode = d_backing_inode(dentry);
> + int rc;
> +
> + /* UID/GID in ACL have been already converted from user to init ns */
> + acl = posix_acl_from_xattr(&init_user_ns, xattr_value, xattr_value_len);
> + if (!acl)
> + return 1;
> +
> + acl_res = acl;
> + rc = posix_acl_update_mode(inode, &mode, &acl_res);
> +
> + posix_acl_release(acl);
> +
> + if (rc)
> + return 1;
> +
> + if (acl_res && inode->i_mode != mode)
> + return 1;
> +
> + return 0;
> +}
> +
> +static int evm_xattr_change(struct dentry *dentry, const char *xattr_name,
> + const void *xattr_value, size_t xattr_value_len)
> +{
> + char *xattr_data = NULL;
> + int rc = 0;
> +
> + if (posix_xattr_acl(xattr_name))
> + return evm_xattr_acl_change(dentry, xattr_name, xattr_value,
> + xattr_value_len);
> +
> + rc = vfs_getxattr_alloc(dentry, xattr_name, &xattr_data, 0, GFP_NOFS);
> + if (rc < 0)
> + return 1;
> +
> + if (rc == xattr_value_len)
> + rc = memcmp(xattr_value, xattr_data, rc);
> + else
> + rc = 1;
> +
> + kfree(xattr_data);
> + return rc;
> +}
> +
> /*
> * evm_protect_xattr - protect the EVM extended attribute
> *
> @@ -361,6 +412,10 @@ static int evm_protect_xattr(struct dentry *dentry, const char *xattr_name,
> if (evm_status == INTEGRITY_FAIL_IMMUTABLE)
> return 0;
>
> + if (evm_status == INTEGRITY_PASS_IMMUTABLE &&
> + !evm_xattr_change(dentry, xattr_name, xattr_value, xattr_value_len))
> + return 0;
> +
> if (evm_status != INTEGRITY_PASS)
> integrity_audit_msg(AUDIT_INTEGRITY_METADATA, d_backing_inode(dentry),
> dentry->d_name.name, "appraise_metadata",
> @@ -477,6 +532,19 @@ void evm_inode_post_removexattr(struct dentry *dentry, const char *xattr_name)
> evm_update_evmxattr(dentry, xattr_name, NULL, 0);
> }
>
> +static int evm_attr_change(struct dentry *dentry, struct iattr *attr)
static functions don't normally require a function comment, but in this
case it wouldn't hurt to clarify why the uid, gid, mode bits are set,
but aren't being modified.
Similarly a function comment would help with the readability of
evm_xattr_acl_change().
Mimi
> +{
> + struct inode *inode = d_backing_inode(dentry);
> + unsigned int ia_valid = attr->ia_valid;
> +
> + if ((!(ia_valid & ATTR_UID) || uid_eq(attr->ia_uid, inode->i_uid)) &&
> + (!(ia_valid & ATTR_GID) || gid_eq(attr->ia_gid, inode->i_gid)) &&
> + (!(ia_valid & ATTR_MODE) || attr->ia_mode == inode->i_mode))
> + return 0;
> +
> + return 1;
> +}
> +
> /**
> * evm_inode_setattr - prevent updating an invalid EVM extended attribute
> * @dentry: pointer to the affected dentry
> @@ -506,6 +574,10 @@ int evm_inode_setattr(struct dentry *dentry, struct iattr *attr)
> (evm_status == INTEGRITY_FAIL_IMMUTABLE))
> return 0;
>
> + if (evm_status == INTEGRITY_PASS_IMMUTABLE &&
> + !evm_attr_change(dentry, attr))
> + return 0;
> +
> integrity_audit_msg(AUDIT_INTEGRITY_METADATA, d_backing_inode(dentry),
> dentry->d_name.name, "appraise_metadata",
> integrity_status_msg[evm_status], -EPERM, 0);
^ permalink raw reply
* Re: [PATCH 05/11] evm: Allow xattr/attr operations for portable signatures if check fails
From: Mimi Zohar @ 2020-08-24 12:16 UTC (permalink / raw)
To: Roberto Sassu, mjg59; +Cc: linux-integrity, linux-security-module, linux-kernel
In-Reply-To: <20200618160133.937-5-roberto.sassu@huawei.com>
On Thu, 2020-06-18 at 18:01 +0200, Roberto Sassu wrote:
> If files with portable signatures are copied from one location to another
> or are extracted from an archive, verification can temporarily fail until
> all xattrs/attrs are set in the destination. Portable signatures are the
> only ones that can be moved to different files, as they don't depend on
> system-specific information such as the inode generation.
^Only portable signatures may be moved or copied from one file to
another, as they ... Instead portable signatures must include
"security.ima".
> Unlike other security.evm types, portable signatures
^, EVM portable signatures are also immutable. They
> can never be replaced
> even if an xattr/attr operation is granted, as once evm_update_evmxattr()
> detects this type, it returns without updating the HMAC. Thus, it wouldn't
> be a problem to allow those operations so that verification passes on the
> destination after all xattrs/attrs are copied.
This needs to be reworded a bit.
>
> This patch first introduces a new integrity status called
> INTEGRITY_FAIL_IMMUTABLE, that allows callers of
> evm_verify_current_integrity() to detect that a portable signature didn't
> pass verification and then adds an exception in evm_protect_xattr() and
> evm_inode_setattr() for this status and returns 0 instead of -EPERM.
>
> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
After this patch, nothing prevents modifying the xattrs after all of
them are in place and the signature verification would be successful.
(Ok, that is being addressed in subsequent patches.)
> ---
> include/linux/integrity.h | 1 +
> security/integrity/evm/evm_main.c | 25 ++++++++++++++++++++-----
> security/integrity/ima/ima_appraise.c | 1 +
> 3 files changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/integrity.h b/include/linux/integrity.h
> index 2271939c5c31..2ea0f2f65ab6 100644
> --- a/include/linux/integrity.h
> +++ b/include/linux/integrity.h
> @@ -13,6 +13,7 @@ enum integrity_status {
> INTEGRITY_PASS = 0,
> INTEGRITY_PASS_IMMUTABLE,
> INTEGRITY_FAIL,
> + INTEGRITY_FAIL_IMMUTABLE,
> INTEGRITY_NOLABEL,
> INTEGRITY_NOXATTRS,
> INTEGRITY_UNKNOWN,
> diff --git a/security/integrity/evm/evm_main.c b/security/integrity/evm/evm_main.c
> index 4e9f5e8b21d5..30072030f05d 100644
> --- a/security/integrity/evm/evm_main.c
> +++ b/security/integrity/evm/evm_main.c
> @@ -27,7 +27,8 @@
> int evm_initialized;
>
> static const char * const integrity_status_msg[] = {
> - "pass", "pass_immutable", "fail", "no_label", "no_xattrs", "unknown"
> + "pass", "pass_immutable", "fail", "fail_immutable", "no_label",
> + "no_xattrs", "unknown"
> };
> int evm_hmac_attrs;
>
> @@ -134,7 +135,7 @@ static enum integrity_status evm_verify_hmac(struct dentry *dentry,
> enum integrity_status evm_status = INTEGRITY_PASS;
> struct evm_digest digest;
> struct inode *inode;
> - int rc, xattr_len;
> + int rc, xattr_len, evm_immutable = 0;
>
> if (iint && (iint->evm_status == INTEGRITY_PASS ||
> iint->evm_status == INTEGRITY_PASS_IMMUTABLE))
> @@ -179,8 +180,10 @@ static enum integrity_status evm_verify_hmac(struct dentry *dentry,
> if (rc)
> rc = -EINVAL;
> break;
> - case EVM_IMA_XATTR_DIGSIG:
> case EVM_XATTR_PORTABLE_DIGSIG:
> + evm_immutable = 1;
> + fallthrough;
> + case EVM_IMA_XATTR_DIGSIG:
> /* accept xattr with non-empty signature field */
> if (xattr_len <= sizeof(struct signature_v2_hdr)) {
> evm_status = INTEGRITY_FAIL;
> @@ -219,7 +222,8 @@ static enum integrity_status evm_verify_hmac(struct dentry *dentry,
>
> if (rc)
> evm_status = (rc == -ENODATA) ?
> - INTEGRITY_NOXATTRS : INTEGRITY_FAIL;
> + INTEGRITY_NOXATTRS : evm_immutable ?
> + INTEGRITY_FAIL_IMMUTABLE : INTEGRITY_FAIL;
Embedded ternary operator should be replaced with normal C syntax.
> out:
> if (iint)
> iint->evm_status = evm_status;
> @@ -351,6 +355,12 @@ static int evm_protect_xattr(struct dentry *dentry, const char *xattr_name,
> -EPERM, 0);
> }
> out:
> + /* It is safe to allow fail_immutable, portable signatures can never be
> + * updated
> + */
Replace "It" with "Writing other xattrs". Writing other xattrs is
safe for portable signatures, as portable signatures are immutable and
...."
> + if (evm_status == INTEGRITY_FAIL_IMMUTABLE)
> + return 0;
> +
> if (evm_status != INTEGRITY_PASS)
> integrity_audit_msg(AUDIT_INTEGRITY_METADATA, d_backing_inode(dentry),
> dentry->d_name.name, "appraise_metadata",
> @@ -488,9 +498,14 @@ int evm_inode_setattr(struct dentry *dentry, struct iattr *attr)
> if (!(ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID)))
> return 0;
> evm_status = evm_verify_current_integrity(dentry);
> + /* It is safe to allow fail_immutable, portable signatures can never
> + * be updated
> + */
Replace "It" with what is safe.
Mimi
> if ((evm_status == INTEGRITY_PASS) ||
> - (evm_status == INTEGRITY_NOXATTRS))
> + (evm_status == INTEGRITY_NOXATTRS) ||
> + (evm_status == INTEGRITY_FAIL_IMMUTABLE))
> return 0;
> +
> integrity_audit_msg(AUDIT_INTEGRITY_METADATA, d_backing_inode(dentry),
> dentry->d_name.name, "appraise_metadata",
> integrity_status_msg[evm_status], -EPERM, 0);
> diff --git a/security/integrity/ima/ima_appraise.c b/security/integrity/ima/ima_appraise.c
> index a9649b04b9f1..21bda264fc30 100644
> --- a/security/integrity/ima/ima_appraise.c
> +++ b/security/integrity/ima/ima_appraise.c
> @@ -393,6 +393,7 @@ int ima_appraise_measurement(enum ima_hooks func,
> case INTEGRITY_NOLABEL: /* No security.evm xattr. */
> cause = "missing-HMAC";
> goto out;
> + case INTEGRITY_FAIL_IMMUTABLE:
> case INTEGRITY_FAIL: /* Invalid HMAC/signature. */
> cause = "invalid-HMAC";
> goto out;
^ permalink raw reply
* Re: [PATCH 04/11] evm: Check size of security.evm before using it
From: Mimi Zohar @ 2020-08-24 12:14 UTC (permalink / raw)
To: Roberto Sassu, mjg59
Cc: linux-integrity, linux-security-module, linux-kernel, stable
In-Reply-To: <20200618160133.937-4-roberto.sassu@huawei.com>
On Thu, 2020-06-18 at 18:01 +0200, Roberto Sassu wrote:
> This patch checks the size for the EVM_IMA_XATTR_DIGSIG and
> EVM_XATTR_PORTABLE_DIGSIG types to ensure that the algorithm is read from
> the buffer returned by vfs_getxattr_alloc().
>
> Cc: stable@vger.kernel.org # 4.19.x
> Fixes: 5feeb61183dde ("evm: Allow non-SHA1 digital signatures")
> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
^ permalink raw reply
* [PATCH bpf-next v9 2/7] bpf: Generalize caching for sk_storage.
From: KP Singh @ 2020-08-23 16:56 UTC (permalink / raw)
To: linux-kernel, bpf, linux-security-module
Cc: Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann,
Paul Turner, Jann Horn, Florent Revest
In-Reply-To: <20200823165612.404892-1-kpsingh@chromium.org>
From: KP Singh <kpsingh@google.com>
Provide the a ability to define local storage caches on a per-object
type basis. The caches and caching indices for different objects should
not be inter-mixed as suggested in:
https://lore.kernel.org/bpf/20200630193441.kdwnkestulg5erii@kafai-mbp.dhcp.thefacebook.com/
"Caching a sk-storage at idx=0 of a sk should not stop an
inode-storage to be cached at the same idx of a inode."
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: KP Singh <kpsingh@google.com>
---
include/net/bpf_sk_storage.h | 19 +++++++++++++++++++
net/core/bpf_sk_storage.c | 31 +++++++++++++++----------------
2 files changed, 34 insertions(+), 16 deletions(-)
diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h
index 5036c94c0503..950c5aaba15e 100644
--- a/include/net/bpf_sk_storage.h
+++ b/include/net/bpf_sk_storage.h
@@ -3,6 +3,9 @@
#ifndef _BPF_SK_STORAGE_H
#define _BPF_SK_STORAGE_H
+#include <linux/types.h>
+#include <linux/spinlock.h>
+
struct sock;
void bpf_sk_storage_free(struct sock *sk);
@@ -15,6 +18,22 @@ struct sk_buff;
struct nlattr;
struct sock;
+#define BPF_LOCAL_STORAGE_CACHE_SIZE 16
+
+struct bpf_local_storage_cache {
+ spinlock_t idx_lock;
+ u64 idx_usage_counts[BPF_LOCAL_STORAGE_CACHE_SIZE];
+};
+
+#define DEFINE_BPF_STORAGE_CACHE(name) \
+static struct bpf_local_storage_cache name = { \
+ .idx_lock = __SPIN_LOCK_UNLOCKED(name.idx_lock), \
+}
+
+u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache);
+void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache,
+ u16 idx);
+
#ifdef CONFIG_BPF_SYSCALL
int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk);
struct bpf_sk_storage_diag *
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index f975e2d01207..ec61ee7c7ee4 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -14,6 +14,8 @@
#define BPF_LOCAL_STORAGE_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_CLONE)
+DEFINE_BPF_STORAGE_CACHE(sk_cache);
+
struct bpf_local_storage_map_bucket {
struct hlist_head list;
raw_spinlock_t lock;
@@ -78,10 +80,6 @@ struct bpf_local_storage_elem {
#define SELEM(_SDATA) \
container_of((_SDATA), struct bpf_local_storage_elem, sdata)
#define SDATA(_SELEM) (&(_SELEM)->sdata)
-#define BPF_LOCAL_STORAGE_CACHE_SIZE 16
-
-static DEFINE_SPINLOCK(cache_idx_lock);
-static u64 cache_idx_usage_counts[BPF_LOCAL_STORAGE_CACHE_SIZE];
struct bpf_local_storage {
struct bpf_local_storage_data __rcu *cache[BPF_LOCAL_STORAGE_CACHE_SIZE];
@@ -521,16 +519,16 @@ static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
return 0;
}
-static u16 cache_idx_get(void)
+u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache)
{
u64 min_usage = U64_MAX;
u16 i, res = 0;
- spin_lock(&cache_idx_lock);
+ spin_lock(&cache->idx_lock);
for (i = 0; i < BPF_LOCAL_STORAGE_CACHE_SIZE; i++) {
- if (cache_idx_usage_counts[i] < min_usage) {
- min_usage = cache_idx_usage_counts[i];
+ if (cache->idx_usage_counts[i] < min_usage) {
+ min_usage = cache->idx_usage_counts[i];
res = i;
/* Found a free cache_idx */
@@ -538,18 +536,19 @@ static u16 cache_idx_get(void)
break;
}
}
- cache_idx_usage_counts[res]++;
+ cache->idx_usage_counts[res]++;
- spin_unlock(&cache_idx_lock);
+ spin_unlock(&cache->idx_lock);
return res;
}
-static void cache_idx_free(u16 idx)
+void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache,
+ u16 idx)
{
- spin_lock(&cache_idx_lock);
- cache_idx_usage_counts[idx]--;
- spin_unlock(&cache_idx_lock);
+ spin_lock(&cache->idx_lock);
+ cache->idx_usage_counts[idx]--;
+ spin_unlock(&cache->idx_lock);
}
/* Called by __sk_destruct() & bpf_sk_storage_clone() */
@@ -601,7 +600,7 @@ static void bpf_local_storage_map_free(struct bpf_map *map)
smap = (struct bpf_local_storage_map *)map;
- cache_idx_free(smap->cache_idx);
+ bpf_local_storage_cache_idx_free(&sk_cache, smap->cache_idx);
/* Note that this map might be concurrently cloned from
* bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone
@@ -718,7 +717,7 @@ static struct bpf_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
smap->elem_size =
sizeof(struct bpf_local_storage_elem) + attr->value_size;
- smap->cache_idx = cache_idx_get();
+ smap->cache_idx = bpf_local_storage_cache_idx_get(&sk_cache);
return &smap->map;
}
--
2.28.0.297.g1956fa8f8d-goog
^ permalink raw reply related
* [PATCH bpf-next v9 7/7] bpf: Add selftests for local_storage
From: KP Singh @ 2020-08-23 16:56 UTC (permalink / raw)
To: linux-kernel, bpf, linux-security-module
Cc: Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
Martin KaFai Lau, Paul Turner, Jann Horn, Florent Revest
In-Reply-To: <20200823165612.404892-1-kpsingh@chromium.org>
From: KP Singh <kpsingh@google.com>
inode_local_storage:
* Hook to the file_open and inode_unlink LSM hooks.
* Create and unlink a temporary file.
* Store some information in the inode's bpf_local_storage during
file_open.
* Verify that this information exists when the file is unlinked.
sk_local_storage:
* Hook to the socket_post_create and socket_bind LSM hooks.
* Open and bind a socket and set the sk_storage in the
socket_post_create hook using the start_server helper.
* Verify if the information is set in the socket_bind hook.
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: KP Singh <kpsingh@google.com>
---
.../bpf/prog_tests/test_local_storage.c | 60 ++++++++
.../selftests/bpf/progs/local_storage.c | 140 ++++++++++++++++++
2 files changed, 200 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_local_storage.c
create mode 100644 tools/testing/selftests/bpf/progs/local_storage.c
diff --git a/tools/testing/selftests/bpf/prog_tests/test_local_storage.c b/tools/testing/selftests/bpf/prog_tests/test_local_storage.c
new file mode 100644
index 000000000000..91cd6f357246
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_local_storage.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (C) 2020 Google LLC.
+ */
+
+#include <test_progs.h>
+#include <linux/limits.h>
+
+#include "local_storage.skel.h"
+#include "network_helpers.h"
+
+int create_and_unlink_file(void)
+{
+ char fname[PATH_MAX] = "/tmp/fileXXXXXX";
+ int fd;
+
+ fd = mkstemp(fname);
+ if (fd < 0)
+ return fd;
+
+ close(fd);
+ unlink(fname);
+ return 0;
+}
+
+void test_test_local_storage(void)
+{
+ struct local_storage *skel = NULL;
+ int err, duration = 0, serv_sk = -1;
+
+ skel = local_storage__open_and_load();
+ if (CHECK(!skel, "skel_load", "lsm skeleton failed\n"))
+ goto close_prog;
+
+ err = local_storage__attach(skel);
+ if (CHECK(err, "attach", "lsm attach failed: %d\n", err))
+ goto close_prog;
+
+ skel->bss->monitored_pid = getpid();
+
+ err = create_and_unlink_file();
+ if (CHECK(err < 0, "exec_cmd", "err %d errno %d\n", err, errno))
+ goto close_prog;
+
+ CHECK(skel->data->inode_storage_result != 0, "inode_storage_result",
+ "inode_local_storage not set\n");
+
+ serv_sk = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0);
+ if (CHECK(serv_sk < 0, "start_server", "failed to start server\n"))
+ goto close_prog;
+
+ CHECK(skel->data->sk_storage_result != 0, "sk_storage_result",
+ "sk_local_storage not set\n");
+
+ close(serv_sk);
+
+close_prog:
+ local_storage__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/local_storage.c b/tools/testing/selftests/bpf/progs/local_storage.c
new file mode 100644
index 000000000000..0758ba229ae0
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/local_storage.c
@@ -0,0 +1,140 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2020 Google LLC.
+ */
+
+#include <errno.h>
+#include <linux/bpf.h>
+#include <stdbool.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+#define DUMMY_STORAGE_VALUE 0xdeadbeef
+
+int monitored_pid = 0;
+int inode_storage_result = -1;
+int sk_storage_result = -1;
+
+struct dummy_storage {
+ __u32 value;
+};
+
+struct {
+ __uint(type, BPF_MAP_TYPE_INODE_STORAGE);
+ __uint(map_flags, BPF_F_NO_PREALLOC);
+ __type(key, int);
+ __type(value, struct dummy_storage);
+} inode_storage_map SEC(".maps");
+
+struct {
+ __uint(type, BPF_MAP_TYPE_SK_STORAGE);
+ __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_CLONE);
+ __type(key, int);
+ __type(value, struct dummy_storage);
+} sk_storage_map SEC(".maps");
+
+/* TODO Use vmlinux.h once BTF pruning for embedded types is fixed.
+ */
+struct sock {} __attribute__((preserve_access_index));
+struct sockaddr {} __attribute__((preserve_access_index));
+struct socket {
+ struct sock *sk;
+} __attribute__((preserve_access_index));
+
+struct inode {} __attribute__((preserve_access_index));
+struct dentry {
+ struct inode *d_inode;
+} __attribute__((preserve_access_index));
+struct file {
+ struct inode *f_inode;
+} __attribute__((preserve_access_index));
+
+
+SEC("lsm/inode_unlink")
+int BPF_PROG(unlink_hook, struct inode *dir, struct dentry *victim)
+{
+ __u32 pid = bpf_get_current_pid_tgid() >> 32;
+ struct dummy_storage *storage;
+
+ if (pid != monitored_pid)
+ return 0;
+
+ storage = bpf_inode_storage_get(&inode_storage_map, victim->d_inode, 0,
+ BPF_SK_STORAGE_GET_F_CREATE);
+ if (!storage)
+ return 0;
+
+ if (storage->value == DUMMY_STORAGE_VALUE)
+ inode_storage_result = -1;
+
+ inode_storage_result =
+ bpf_inode_storage_delete(&inode_storage_map, victim->d_inode);
+
+ return 0;
+}
+
+SEC("lsm/socket_bind")
+int BPF_PROG(socket_bind, struct socket *sock, struct sockaddr *address,
+ int addrlen)
+{
+ __u32 pid = bpf_get_current_pid_tgid() >> 32;
+ struct dummy_storage *storage;
+
+ if (pid != monitored_pid)
+ return 0;
+
+ storage = bpf_sk_storage_get(&sk_storage_map, sock->sk, 0,
+ BPF_SK_STORAGE_GET_F_CREATE);
+ if (!storage)
+ return 0;
+
+ if (storage->value == DUMMY_STORAGE_VALUE)
+ sk_storage_result = -1;
+
+ sk_storage_result = bpf_sk_storage_delete(&sk_storage_map, sock->sk);
+ return 0;
+}
+
+SEC("lsm/socket_post_create")
+int BPF_PROG(socket_post_create, struct socket *sock, int family, int type,
+ int protocol, int kern)
+{
+ __u32 pid = bpf_get_current_pid_tgid() >> 32;
+ struct dummy_storage *storage;
+
+ if (pid != monitored_pid)
+ return 0;
+
+ storage = bpf_sk_storage_get(&sk_storage_map, sock->sk, 0,
+ BPF_SK_STORAGE_GET_F_CREATE);
+ if (!storage)
+ return 0;
+
+ storage->value = DUMMY_STORAGE_VALUE;
+
+ return 0;
+}
+
+SEC("lsm/file_open")
+int BPF_PROG(file_open, struct file *file)
+{
+ __u32 pid = bpf_get_current_pid_tgid() >> 32;
+ struct dummy_storage *storage;
+
+ if (pid != monitored_pid)
+ return 0;
+
+ if (!file->f_inode)
+ return 0;
+
+ storage = bpf_inode_storage_get(&inode_storage_map, file->f_inode, 0,
+ BPF_LOCAL_STORAGE_GET_F_CREATE);
+ if (!storage)
+ return 0;
+
+ storage->value = DUMMY_STORAGE_VALUE;
+ return 0;
+}
--
2.28.0.297.g1956fa8f8d-goog
^ permalink raw reply related
* [PATCH bpf-next v9 6/7] bpf: Allow local storage to be used from LSM programs
From: KP Singh @ 2020-08-23 16:56 UTC (permalink / raw)
To: linux-kernel, bpf, linux-security-module
Cc: Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann,
Paul Turner, Jann Horn, Florent Revest
In-Reply-To: <20200823165612.404892-1-kpsingh@chromium.org>
From: KP Singh <kpsingh@google.com>
Adds support for both bpf_{sk, inode}_storage_{get, delete} to be used
in LSM programs. These helpers are not used for tracing programs
(currently) as their usage is tied to the life-cycle of the object and
should only be used where the owning object won't be freed (when the
owning object is passed as an argument to the LSM hook). Thus, they
are safer to use in LSM hooks than tracing. Usage of local storage in
tracing programs will probably follow a per function based whitelist
approach.
Since the UAPI helper signature for bpf_sk_storage expect a bpf_sock,
it, leads to a compilation warning for LSM programs, it's also updated
to accept a void * pointer instead.
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: KP Singh <kpsingh@google.com>
---
include/net/bpf_sk_storage.h | 2 ++
include/uapi/linux/bpf.h | 7 +++++--
kernel/bpf/bpf_lsm.c | 21 ++++++++++++++++++++-
net/core/bpf_sk_storage.c | 25 +++++++++++++++++++++++++
tools/include/uapi/linux/bpf.h | 7 +++++--
5 files changed, 57 insertions(+), 5 deletions(-)
diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h
index 3c516dd07caf..119f4c9c3a9c 100644
--- a/include/net/bpf_sk_storage.h
+++ b/include/net/bpf_sk_storage.h
@@ -20,6 +20,8 @@ void bpf_sk_storage_free(struct sock *sk);
extern const struct bpf_func_proto bpf_sk_storage_get_proto;
extern const struct bpf_func_proto bpf_sk_storage_delete_proto;
+extern const struct bpf_func_proto sk_storage_get_btf_proto;
+extern const struct bpf_func_proto sk_storage_delete_btf_proto;
struct bpf_local_storage_elem;
struct bpf_sk_storage_diag;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b0504ab59a2e..c7f0a806932d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2808,7 +2808,7 @@ union bpf_attr {
*
* **-ERANGE** if resulting value was out of range.
*
- * void *bpf_sk_storage_get(struct bpf_map *map, struct bpf_sock *sk, void *value, u64 flags)
+ * void *bpf_sk_storage_get(struct bpf_map *map, void *sk, void *value, u64 flags)
* Description
* Get a bpf-local-storage from a *sk*.
*
@@ -2824,6 +2824,9 @@ union bpf_attr {
* "type". The bpf-local-storage "type" (i.e. the *map*) is
* searched against all bpf-local-storages residing at *sk*.
*
+ * *sk* is a kernel **struct sock** pointer for LSM program.
+ * *sk* is a **struct bpf_sock** pointer for other program types.
+ *
* An optional *flags* (**BPF_SK_STORAGE_GET_F_CREATE**) can be
* used such that a new bpf-local-storage will be
* created if one does not exist. *value* can be used
@@ -2836,7 +2839,7 @@ union bpf_attr {
* **NULL** if not found or there was an error in adding
* a new bpf-local-storage.
*
- * long bpf_sk_storage_delete(struct bpf_map *map, struct bpf_sock *sk)
+ * long bpf_sk_storage_delete(struct bpf_map *map, void *sk)
* Description
* Delete a bpf-local-storage from a *sk*.
* Return
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index fb278144e9fd..9cd1428c7199 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -11,6 +11,8 @@
#include <linux/bpf_lsm.h>
#include <linux/kallsyms.h>
#include <linux/bpf_verifier.h>
+#include <net/bpf_sk_storage.h>
+#include <linux/bpf_local_storage.h>
/* For every LSM hook that allows attachment of BPF programs, declare a nop
* function where a BPF program can be attached.
@@ -45,10 +47,27 @@ int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
return 0;
}
+static const struct bpf_func_proto *
+bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+ switch (func_id) {
+ case BPF_FUNC_inode_storage_get:
+ return &bpf_inode_storage_get_proto;
+ case BPF_FUNC_inode_storage_delete:
+ return &bpf_inode_storage_delete_proto;
+ case BPF_FUNC_sk_storage_get:
+ return &sk_storage_get_btf_proto;
+ case BPF_FUNC_sk_storage_delete:
+ return &sk_storage_delete_btf_proto;
+ default:
+ return tracing_prog_func_proto(func_id, prog);
+ }
+}
+
const struct bpf_prog_ops lsm_prog_ops = {
};
const struct bpf_verifier_ops lsm_verifier_ops = {
- .get_func_proto = tracing_prog_func_proto,
+ .get_func_proto = bpf_lsm_func_proto,
.is_valid_access = btf_ctx_access,
};
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index f29d9a9b4ea4..55fae03b4cc3 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -12,6 +12,7 @@
#include <net/sock.h>
#include <uapi/linux/sock_diag.h>
#include <uapi/linux/btf.h>
+#include <linux/btf_ids.h>
DEFINE_BPF_STORAGE_CACHE(sk_cache);
@@ -377,6 +378,30 @@ const struct bpf_func_proto bpf_sk_storage_delete_proto = {
.arg2_type = ARG_PTR_TO_SOCKET,
};
+BTF_ID_LIST(sk_storage_btf_ids)
+BTF_ID_UNUSED
+BTF_ID(struct, sock)
+
+const struct bpf_func_proto sk_storage_get_btf_proto = {
+ .func = bpf_sk_storage_get,
+ .gpl_only = false,
+ .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID,
+ .arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
+ .arg4_type = ARG_ANYTHING,
+ .btf_id = sk_storage_btf_ids,
+};
+
+const struct bpf_func_proto sk_storage_delete_btf_proto = {
+ .func = bpf_sk_storage_delete,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID,
+ .btf_id = sk_storage_btf_ids,
+};
+
struct bpf_sk_storage_diag {
u32 nr_maps;
struct bpf_map *maps[];
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index b0504ab59a2e..c7f0a806932d 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2808,7 +2808,7 @@ union bpf_attr {
*
* **-ERANGE** if resulting value was out of range.
*
- * void *bpf_sk_storage_get(struct bpf_map *map, struct bpf_sock *sk, void *value, u64 flags)
+ * void *bpf_sk_storage_get(struct bpf_map *map, void *sk, void *value, u64 flags)
* Description
* Get a bpf-local-storage from a *sk*.
*
@@ -2824,6 +2824,9 @@ union bpf_attr {
* "type". The bpf-local-storage "type" (i.e. the *map*) is
* searched against all bpf-local-storages residing at *sk*.
*
+ * *sk* is a kernel **struct sock** pointer for LSM program.
+ * *sk* is a **struct bpf_sock** pointer for other program types.
+ *
* An optional *flags* (**BPF_SK_STORAGE_GET_F_CREATE**) can be
* used such that a new bpf-local-storage will be
* created if one does not exist. *value* can be used
@@ -2836,7 +2839,7 @@ union bpf_attr {
* **NULL** if not found or there was an error in adding
* a new bpf-local-storage.
*
- * long bpf_sk_storage_delete(struct bpf_map *map, struct bpf_sock *sk)
+ * long bpf_sk_storage_delete(struct bpf_map *map, void *sk)
* Description
* Delete a bpf-local-storage from a *sk*.
* Return
--
2.28.0.297.g1956fa8f8d-goog
^ permalink raw reply related
* [PATCH bpf-next v9 5/7] bpf: Implement bpf_local_storage for inodes
From: KP Singh @ 2020-08-23 16:56 UTC (permalink / raw)
To: linux-kernel, bpf, linux-security-module
Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Paul Turner, Jann Horn, Florent Revest
In-Reply-To: <20200823165612.404892-1-kpsingh@chromium.org>
From: KP Singh <kpsingh@google.com>
Similar to bpf_local_storage for sockets, add local storage for inodes.
The life-cycle of storage is managed with the life-cycle of the inode.
i.e. the storage is destroyed along with the owning inode.
The BPF LSM allocates an __rcu pointer to the bpf_local_storage in the
security blob which are now stackable and can co-exist with other LSMs.
Signed-off-by: KP Singh <kpsingh@google.com>
---
include/linux/bpf_lsm.h | 29 ++
include/linux/bpf_types.h | 3 +
include/uapi/linux/bpf.h | 38 +++
kernel/bpf/Makefile | 1 +
kernel/bpf/bpf_inode_storage.c | 265 ++++++++++++++++++
kernel/bpf/syscall.c | 3 +-
kernel/bpf/verifier.c | 10 +
security/bpf/hooks.c | 7 +
.../bpf/bpftool/Documentation/bpftool-map.rst | 2 +-
tools/bpf/bpftool/bash-completion/bpftool | 3 +-
tools/bpf/bpftool/map.c | 3 +-
tools/include/uapi/linux/bpf.h | 38 +++
tools/lib/bpf/libbpf_probes.c | 5 +-
13 files changed, 401 insertions(+), 6 deletions(-)
create mode 100644 kernel/bpf/bpf_inode_storage.c
diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
index af74712af585..aaacb6aafc87 100644
--- a/include/linux/bpf_lsm.h
+++ b/include/linux/bpf_lsm.h
@@ -17,9 +17,28 @@
#include <linux/lsm_hook_defs.h>
#undef LSM_HOOK
+struct bpf_storage_blob {
+ struct bpf_local_storage __rcu *storage;
+};
+
+extern struct lsm_blob_sizes bpf_lsm_blob_sizes;
+
int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
const struct bpf_prog *prog);
+static inline struct bpf_storage_blob *bpf_inode(
+ const struct inode *inode)
+{
+ if (unlikely(!inode->i_security))
+ return NULL;
+
+ return inode->i_security + bpf_lsm_blob_sizes.lbs_inode;
+}
+
+extern const struct bpf_func_proto bpf_inode_storage_get_proto;
+extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
+void bpf_inode_storage_free(struct inode *inode);
+
#else /* !CONFIG_BPF_LSM */
static inline int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
@@ -28,6 +47,16 @@ static inline int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
return -EOPNOTSUPP;
}
+static inline struct bpf_storage_blob *bpf_inode(
+ const struct inode *inode)
+{
+ return NULL;
+}
+
+static inline void bpf_inode_storage_free(struct inode *inode)
+{
+}
+
#endif /* CONFIG_BPF_LSM */
#endif /* _LINUX_BPF_LSM_H */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index a52a5688418e..2e6f568377f1 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -107,6 +107,9 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_SK_STORAGE, sk_storage_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKHASH, sock_hash_ops)
#endif
+#ifdef CONFIG_BPF_LSM
+BPF_MAP_TYPE(BPF_MAP_TYPE_INODE_STORAGE, inode_storage_map_ops)
+#endif
BPF_MAP_TYPE(BPF_MAP_TYPE_CPUMAP, cpu_map_ops)
#if defined(CONFIG_XDP_SOCKETS)
BPF_MAP_TYPE(BPF_MAP_TYPE_XSKMAP, xsk_map_ops)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f1a4a476586b..b0504ab59a2e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -155,6 +155,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_DEVMAP_HASH,
BPF_MAP_TYPE_STRUCT_OPS,
BPF_MAP_TYPE_RINGBUF,
+ BPF_MAP_TYPE_INODE_STORAGE,
};
/* Note that tracing related programs such as
@@ -3395,6 +3396,41 @@ union bpf_attr {
* A non-negative value equal to or less than *size* on success,
* or a negative error in case of failure.
*
+ * void *bpf_inode_storage_get(struct bpf_map *map, void *inode, void *value, u64 flags)
+ * Description
+ * Get a bpf_local_storage from an *inode*.
+ *
+ * Logically, it could be thought of as getting the value from
+ * a *map* with *inode* as the **key**. From this
+ * perspective, the usage is not much different from
+ * **bpf_map_lookup_elem**\ (*map*, **&**\ *inode*) except this
+ * helper enforces the key must be an inode and the map must also
+ * be a **BPF_MAP_TYPE_INODE_STORAGE**.
+ *
+ * Underneath, the value is stored locally at *inode* instead of
+ * the *map*. The *map* is used as the bpf-local-storage
+ * "type". The bpf-local-storage "type" (i.e. the *map*) is
+ * searched against all bpf_local_storage residing at *inode*.
+ *
+ * An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be
+ * used such that a new bpf_local_storage will be
+ * created if one does not exist. *value* can be used
+ * together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify
+ * the initial value of a bpf_local_storage. If *value* is
+ * **NULL**, the new bpf_local_storage will be zero initialized.
+ * Return
+ * A bpf_local_storage pointer is returned on success.
+ *
+ * **NULL** if not found or there was an error in adding
+ * a new bpf_local_storage.
+ *
+ * int bpf_inode_storage_delete(struct bpf_map *map, void *inode)
+ * Description
+ * Delete a bpf_local_storage from an *inode*.
+ * Return
+ * 0 on success.
+ *
+ * **-ENOENT** if the bpf_local_storage cannot be found.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -3539,6 +3575,8 @@ union bpf_attr {
FN(skc_to_tcp_request_sock), \
FN(skc_to_udp6_sock), \
FN(get_task_stack), \
+ FN(inode_storage_get), \
+ FN(inode_storage_delete), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 6961ff400cba..bdc8cd1b6767 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -5,6 +5,7 @@ CFLAGS_core.o += $(call cc-disable-warning, override-init)
obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o
obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
+obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
obj-$(CONFIG_BPF_SYSCALL) += disasm.o
obj-$(CONFIG_BPF_JIT) += trampoline.o
obj-$(CONFIG_BPF_SYSCALL) += btf.o
diff --git a/kernel/bpf/bpf_inode_storage.c b/kernel/bpf/bpf_inode_storage.c
new file mode 100644
index 000000000000..b0b283c224c1
--- /dev/null
+++ b/kernel/bpf/bpf_inode_storage.c
@@ -0,0 +1,265 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2019 Facebook
+ * Copyright 2020 Google LLC.
+ */
+
+#include <linux/rculist.h>
+#include <linux/list.h>
+#include <linux/hash.h>
+#include <linux/types.h>
+#include <linux/spinlock.h>
+#include <linux/bpf.h>
+#include <linux/bpf_local_storage.h>
+#include <net/sock.h>
+#include <uapi/linux/sock_diag.h>
+#include <uapi/linux/btf.h>
+#include <linux/bpf_lsm.h>
+#include <linux/btf_ids.h>
+#include <linux/fdtable.h>
+
+DEFINE_BPF_STORAGE_CACHE(inode_cache);
+
+static struct bpf_local_storage __rcu **
+inode_storage_ptr(void *owner)
+{
+ struct inode *inode = owner;
+ struct bpf_storage_blob *bsb;
+
+ bsb = bpf_inode(inode);
+ if (!bsb)
+ return NULL;
+ return &bsb->storage;
+}
+
+static struct bpf_local_storage_data *inode_storage_lookup(struct inode *inode,
+ struct bpf_map *map,
+ bool cacheit_lockit)
+{
+ struct bpf_local_storage *inode_storage;
+ struct bpf_local_storage_map *smap;
+ struct bpf_storage_blob *bsb;
+
+ bsb = bpf_inode(inode);
+ if (!bsb)
+ return NULL;
+
+ inode_storage = rcu_dereference(bsb->storage);
+ if (!inode_storage)
+ return NULL;
+
+ smap = (struct bpf_local_storage_map *)map;
+ return bpf_local_storage_lookup(inode_storage, smap, cacheit_lockit);
+}
+
+void bpf_inode_storage_free(struct inode *inode)
+{
+ struct bpf_local_storage_elem *selem;
+ struct bpf_local_storage *local_storage;
+ bool free_inode_storage = false;
+ struct bpf_storage_blob *bsb;
+ struct hlist_node *n;
+
+ bsb = bpf_inode(inode);
+ if (!bsb)
+ return;
+
+ rcu_read_lock();
+
+ local_storage = rcu_dereference(bsb->storage);
+ if (!local_storage) {
+ rcu_read_unlock();
+ return;
+ }
+
+ /* Netiher the bpf_prog nor the bpf-map's syscall
+ * could be modifying the local_storage->list now.
+ * Thus, no elem can be added-to or deleted-from the
+ * local_storage->list by the bpf_prog or by the bpf-map's syscall.
+ *
+ * It is racing with bpf_local_storage_map_free() alone
+ * when unlinking elem from the local_storage->list and
+ * the map's bucket->list.
+ */
+ raw_spin_lock_bh(&local_storage->lock);
+ hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
+ /* Always unlink from map before unlinking from
+ * local_storage.
+ */
+ bpf_selem_unlink_map(selem);
+ free_inode_storage = bpf_selem_unlink_storage_nolock(
+ local_storage, selem, false);
+ }
+ raw_spin_unlock_bh(&local_storage->lock);
+ rcu_read_unlock();
+
+ /* free_inoode_storage should always be true as long as
+ * local_storage->list was non-empty.
+ */
+ if (free_inode_storage)
+ kfree_rcu(local_storage, rcu);
+}
+
+static void *bpf_fd_inode_storage_lookup_elem(struct bpf_map *map, void *key)
+{
+ struct bpf_local_storage_data *sdata;
+ struct file *f;
+ int fd;
+
+ fd = *(int *)key;
+ f = fget_raw(fd);
+ if (!f)
+ return NULL;
+
+ sdata = inode_storage_lookup(f->f_inode, map, true);
+ fput(f);
+ return sdata ? sdata->data : NULL;
+}
+
+static int bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
+ void *value, u64 map_flags)
+{
+ struct bpf_local_storage_data *sdata;
+ struct file *f;
+ int fd;
+
+ fd = *(int *)key;
+ f = fget_raw(fd);
+ if (!f)
+ return -EBADF;
+
+ sdata = bpf_local_storage_update(f->f_inode,
+ (struct bpf_local_storage_map *)map,
+ value, map_flags);
+ fput(f);
+ return PTR_ERR_OR_ZERO(sdata);
+}
+
+static int inode_storage_delete(struct inode *inode, struct bpf_map *map)
+{
+ struct bpf_local_storage_data *sdata;
+
+ sdata = inode_storage_lookup(inode, map, false);
+ if (!sdata)
+ return -ENOENT;
+
+ bpf_selem_unlink(SELEM(sdata));
+
+ return 0;
+}
+
+static int bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key)
+{
+ struct file *f;
+ int fd, err;
+
+ fd = *(int *)key;
+ f = fget_raw(fd);
+ if (!f)
+ return -EBADF;
+
+ err = inode_storage_delete(f->f_inode, map);
+ fput(f);
+ return err;
+}
+
+BPF_CALL_4(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
+ void *, value, u64, flags)
+{
+ struct bpf_local_storage_data *sdata;
+
+ if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE))
+ return (unsigned long)NULL;
+
+ sdata = inode_storage_lookup(inode, map, true);
+ if (sdata)
+ return (unsigned long)sdata->data;
+
+ /* This helper must only called from where the inode is gurranteed
+ * to have a refcount and cannot be freed.
+ */
+ if (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) {
+ sdata = bpf_local_storage_update(
+ inode, (struct bpf_local_storage_map *)map, value,
+ BPF_NOEXIST);
+ return IS_ERR(sdata) ? (unsigned long)NULL :
+ (unsigned long)sdata->data;
+ }
+
+ return (unsigned long)NULL;
+}
+
+BPF_CALL_2(bpf_inode_storage_delete,
+ struct bpf_map *, map, struct inode *, inode)
+{
+ /* This helper must only called from where the inode is gurranteed
+ * to have a refcount and cannot be freed.
+ */
+ return inode_storage_delete(inode, map);
+}
+
+static int notsupp_get_next_key(struct bpf_map *map, void *key,
+ void *next_key)
+{
+ return -ENOTSUPP;
+}
+
+static struct bpf_map *inode_storage_map_alloc(union bpf_attr *attr)
+{
+ struct bpf_local_storage_map *smap;
+
+ smap = bpf_local_storage_map_alloc(attr);
+ if (IS_ERR(smap))
+ return ERR_CAST(smap);
+
+ smap->cache_idx = bpf_local_storage_cache_idx_get(&inode_cache);
+ return &smap->map;
+}
+
+static void inode_storage_map_free(struct bpf_map *map)
+{
+ struct bpf_local_storage_map *smap;
+
+ smap = (struct bpf_local_storage_map *)map;
+ bpf_local_storage_cache_idx_free(&inode_cache, smap->cache_idx);
+ bpf_local_storage_map_free(smap);
+}
+
+static int inode_storage_map_btf_id;
+const struct bpf_map_ops inode_storage_map_ops = {
+ .map_alloc_check = bpf_local_storage_map_alloc_check,
+ .map_alloc = inode_storage_map_alloc,
+ .map_free = inode_storage_map_free,
+ .map_get_next_key = notsupp_get_next_key,
+ .map_lookup_elem = bpf_fd_inode_storage_lookup_elem,
+ .map_update_elem = bpf_fd_inode_storage_update_elem,
+ .map_delete_elem = bpf_fd_inode_storage_delete_elem,
+ .map_check_btf = bpf_local_storage_map_check_btf,
+ .map_btf_name = "bpf_local_storage_map",
+ .map_btf_id = &inode_storage_map_btf_id,
+ .map_owner_storage_ptr = inode_storage_ptr,
+};
+
+BTF_ID_LIST(bpf_inode_storage_btf_ids)
+BTF_ID_UNUSED
+BTF_ID(struct, inode)
+
+const struct bpf_func_proto bpf_inode_storage_get_proto = {
+ .func = bpf_inode_storage_get,
+ .gpl_only = false,
+ .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID,
+ .arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
+ .arg4_type = ARG_ANYTHING,
+ .btf_id = bpf_inode_storage_btf_ids,
+};
+
+const struct bpf_func_proto bpf_inode_storage_delete_proto = {
+ .func = bpf_inode_storage_delete,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID,
+ .btf_id = bpf_inode_storage_btf_ids,
+};
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b46e973faee9..5443cea86cef 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -769,7 +769,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
if (map->map_type != BPF_MAP_TYPE_HASH &&
map->map_type != BPF_MAP_TYPE_ARRAY &&
map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE &&
- map->map_type != BPF_MAP_TYPE_SK_STORAGE)
+ map->map_type != BPF_MAP_TYPE_SK_STORAGE &&
+ map->map_type != BPF_MAP_TYPE_INODE_STORAGE)
return -ENOTSUPP;
if (map->spin_lock_off + sizeof(struct bpf_spin_lock) >
map->value_size) {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index dd24503ab3d3..38748794518e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4311,6 +4311,11 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
func_id != BPF_FUNC_sk_storage_delete)
goto error;
break;
+ case BPF_MAP_TYPE_INODE_STORAGE:
+ if (func_id != BPF_FUNC_inode_storage_get &&
+ func_id != BPF_FUNC_inode_storage_delete)
+ goto error;
+ break;
default:
break;
}
@@ -4384,6 +4389,11 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
if (map->map_type != BPF_MAP_TYPE_SK_STORAGE)
goto error;
break;
+ case BPF_FUNC_inode_storage_get:
+ case BPF_FUNC_inode_storage_delete:
+ if (map->map_type != BPF_MAP_TYPE_INODE_STORAGE)
+ goto error;
+ break;
default:
break;
}
diff --git a/security/bpf/hooks.c b/security/bpf/hooks.c
index 32d32d485451..35f9b19259e5 100644
--- a/security/bpf/hooks.c
+++ b/security/bpf/hooks.c
@@ -3,6 +3,7 @@
/*
* Copyright (C) 2020 Google LLC.
*/
+#include <linux/bpf_local_storage.h>
#include <linux/lsm_hooks.h>
#include <linux/bpf_lsm.h>
@@ -11,6 +12,7 @@ static struct security_hook_list bpf_lsm_hooks[] __lsm_ro_after_init = {
LSM_HOOK_INIT(NAME, bpf_lsm_##NAME),
#include <linux/lsm_hook_defs.h>
#undef LSM_HOOK
+ LSM_HOOK_INIT(inode_free_security, bpf_inode_storage_free),
};
static int __init bpf_lsm_init(void)
@@ -20,7 +22,12 @@ static int __init bpf_lsm_init(void)
return 0;
}
+struct lsm_blob_sizes bpf_lsm_blob_sizes __lsm_ro_after_init = {
+ .lbs_inode = sizeof(struct bpf_storage_blob),
+};
+
DEFINE_LSM(bpf) = {
.name = "bpf",
.init = bpf_lsm_init,
+ .blobs = &bpf_lsm_blob_sizes
};
diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index 41e2a74252d0..083db6c2fc67 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -49,7 +49,7 @@ MAP COMMANDS
| | **lru_percpu_hash** | **lpm_trie** | **array_of_maps** | **hash_of_maps**
| | **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash**
| | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
-| | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** }
+| | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage** }
DESCRIPTION
===========
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index f53ed2f1a4aa..7b68e3c0a5fb 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -704,7 +704,8 @@ _bpftool()
lru_percpu_hash lpm_trie array_of_maps \
hash_of_maps devmap devmap_hash sockmap cpumap \
xskmap sockhash cgroup_storage reuseport_sockarray \
- percpu_cgroup_storage queue stack' -- \
+ percpu_cgroup_storage queue stack sk_storage \
+ struct_ops inode_storage' -- \
"$cur" ) )
return 0
;;
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 3a27d31a1856..bc0071228f88 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -50,6 +50,7 @@ const char * const map_type_name[] = {
[BPF_MAP_TYPE_SK_STORAGE] = "sk_storage",
[BPF_MAP_TYPE_STRUCT_OPS] = "struct_ops",
[BPF_MAP_TYPE_RINGBUF] = "ringbuf",
+ [BPF_MAP_TYPE_INODE_STORAGE] = "inode_storage",
};
const size_t map_type_name_size = ARRAY_SIZE(map_type_name);
@@ -1442,7 +1443,7 @@ static int do_help(int argc, char **argv)
" lru_percpu_hash | lpm_trie | array_of_maps | hash_of_maps |\n"
" devmap | devmap_hash | sockmap | cpumap | xskmap | sockhash |\n"
" cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n"
- " queue | stack | sk_storage | struct_ops | ringbuf }\n"
+ " queue | stack | sk_storage | struct_ops | ringbuf | inode_storage }\n"
" " HELP_SPEC_OPTIONS "\n"
"",
bin_name, argv[-2]);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f1a4a476586b..b0504ab59a2e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -155,6 +155,7 @@ enum bpf_map_type {
BPF_MAP_TYPE_DEVMAP_HASH,
BPF_MAP_TYPE_STRUCT_OPS,
BPF_MAP_TYPE_RINGBUF,
+ BPF_MAP_TYPE_INODE_STORAGE,
};
/* Note that tracing related programs such as
@@ -3395,6 +3396,41 @@ union bpf_attr {
* A non-negative value equal to or less than *size* on success,
* or a negative error in case of failure.
*
+ * void *bpf_inode_storage_get(struct bpf_map *map, void *inode, void *value, u64 flags)
+ * Description
+ * Get a bpf_local_storage from an *inode*.
+ *
+ * Logically, it could be thought of as getting the value from
+ * a *map* with *inode* as the **key**. From this
+ * perspective, the usage is not much different from
+ * **bpf_map_lookup_elem**\ (*map*, **&**\ *inode*) except this
+ * helper enforces the key must be an inode and the map must also
+ * be a **BPF_MAP_TYPE_INODE_STORAGE**.
+ *
+ * Underneath, the value is stored locally at *inode* instead of
+ * the *map*. The *map* is used as the bpf-local-storage
+ * "type". The bpf-local-storage "type" (i.e. the *map*) is
+ * searched against all bpf_local_storage residing at *inode*.
+ *
+ * An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be
+ * used such that a new bpf_local_storage will be
+ * created if one does not exist. *value* can be used
+ * together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify
+ * the initial value of a bpf_local_storage. If *value* is
+ * **NULL**, the new bpf_local_storage will be zero initialized.
+ * Return
+ * A bpf_local_storage pointer is returned on success.
+ *
+ * **NULL** if not found or there was an error in adding
+ * a new bpf_local_storage.
+ *
+ * int bpf_inode_storage_delete(struct bpf_map *map, void *inode)
+ * Description
+ * Delete a bpf_local_storage from an *inode*.
+ * Return
+ * 0 on success.
+ *
+ * **-ENOENT** if the bpf_local_storage cannot be found.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -3539,6 +3575,8 @@ union bpf_attr {
FN(skc_to_tcp_request_sock), \
FN(skc_to_udp6_sock), \
FN(get_task_stack), \
+ FN(inode_storage_get), \
+ FN(inode_storage_delete), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index 010c9a76fd2b..5482a9b7ae2d 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -170,7 +170,7 @@ int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
return btf_fd;
}
-static int load_sk_storage_btf(void)
+static int load_local_storage_btf(void)
{
const char strs[] = "\0bpf_spin_lock\0val\0cnt\0l";
/* struct bpf_spin_lock {
@@ -229,12 +229,13 @@ bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex)
key_size = 0;
break;
case BPF_MAP_TYPE_SK_STORAGE:
+ case BPF_MAP_TYPE_INODE_STORAGE:
btf_key_type_id = 1;
btf_value_type_id = 3;
value_size = 8;
max_entries = 0;
map_flags = BPF_F_NO_PREALLOC;
- btf_fd = load_sk_storage_btf();
+ btf_fd = load_local_storage_btf();
if (btf_fd < 0)
return false;
break;
--
2.28.0.297.g1956fa8f8d-goog
^ permalink raw reply related
* [PATCH bpf-next v9 4/7] bpf: Split bpf_local_storage to bpf_sk_storage
From: KP Singh @ 2020-08-23 16:56 UTC (permalink / raw)
To: linux-kernel, bpf, linux-security-module
Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Paul Turner, Jann Horn, Florent Revest
In-Reply-To: <20200823165612.404892-1-kpsingh@chromium.org>
From: KP Singh <kpsingh@google.com>
A purely mechanical change:
bpf_sk_storage.c = bpf_sk_storage.c + bpf_local_storage.c
bpf_sk_storage.h = bpf_sk_storage.h + bpf_local_storage.h
Signed-off-by: KP Singh <kpsingh@google.com>
---
include/linux/bpf_local_storage.h | 163 ++++++++
include/net/bpf_sk_storage.h | 61 +--
kernel/bpf/Makefile | 1 +
kernel/bpf/bpf_local_storage.c | 600 ++++++++++++++++++++++++++
net/core/bpf_sk_storage.c | 672 +-----------------------------
5 files changed, 766 insertions(+), 731 deletions(-)
create mode 100644 include/linux/bpf_local_storage.h
create mode 100644 kernel/bpf/bpf_local_storage.c
diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
new file mode 100644
index 000000000000..b2c9463f36a1
--- /dev/null
+++ b/include/linux/bpf_local_storage.h
@@ -0,0 +1,163 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2019 Facebook
+ * Copyright 2020 Google LLC.
+ */
+
+#ifndef _BPF_LOCAL_STORAGE_H
+#define _BPF_LOCAL_STORAGE_H
+
+#include <linux/bpf.h>
+#include <linux/rculist.h>
+#include <linux/list.h>
+#include <linux/hash.h>
+#include <linux/types.h>
+#include <uapi/linux/btf.h>
+
+#define BPF_LOCAL_STORAGE_CACHE_SIZE 16
+
+struct bpf_local_storage_map_bucket {
+ struct hlist_head list;
+ raw_spinlock_t lock;
+};
+
+/* Thp map is not the primary owner of a bpf_local_storage_elem.
+ * Instead, the container object (eg. sk->sk_bpf_storage) is.
+ *
+ * The map (bpf_local_storage_map) is for two purposes
+ * 1. Define the size of the "local storage". It is
+ * the map's value_size.
+ *
+ * 2. Maintain a list to keep track of all elems such
+ * that they can be cleaned up during the map destruction.
+ *
+ * When a bpf local storage is being looked up for a
+ * particular object, the "bpf_map" pointer is actually used
+ * as the "key" to search in the list of elem in
+ * the respective bpf_local_storage owned by the object.
+ *
+ * e.g. sk->sk_bpf_storage is the mini-map with the "bpf_map" pointer
+ * as the searching key.
+ */
+struct bpf_local_storage_map {
+ struct bpf_map map;
+ /* Lookup elem does not require accessing the map.
+ *
+ * Updating/Deleting requires a bucket lock to
+ * link/unlink the elem from the map. Having
+ * multiple buckets to improve contention.
+ */
+ struct bpf_local_storage_map_bucket *buckets;
+ u32 bucket_log;
+ u16 elem_size;
+ u16 cache_idx;
+};
+
+struct bpf_local_storage_data {
+ /* smap is used as the searching key when looking up
+ * from the object's bpf_local_storage.
+ *
+ * Put it in the same cacheline as the data to minimize
+ * the number of cachelines access during the cache hit case.
+ */
+ struct bpf_local_storage_map __rcu *smap;
+ u8 data[] __aligned(8);
+};
+
+/* Linked to bpf_local_storage and bpf_local_storage_map */
+struct bpf_local_storage_elem {
+ struct hlist_node map_node; /* Linked to bpf_local_storage_map */
+ struct hlist_node snode; /* Linked to bpf_local_storage */
+ struct bpf_local_storage __rcu *local_storage;
+ struct rcu_head rcu;
+ /* 8 bytes hole */
+ /* The data is stored in aother cacheline to minimize
+ * the number of cachelines access during a cache hit.
+ */
+ struct bpf_local_storage_data sdata ____cacheline_aligned;
+};
+
+struct bpf_local_storage {
+ struct bpf_local_storage_data __rcu *cache[BPF_LOCAL_STORAGE_CACHE_SIZE];
+ struct hlist_head list; /* List of bpf_local_storage_elem */
+ void *owner; /* The object that owns the above "list" of
+ * bpf_local_storage_elem.
+ */
+ struct rcu_head rcu;
+ raw_spinlock_t lock; /* Protect adding/removing from the "list" */
+};
+
+/* U16_MAX is much more than enough for sk local storage
+ * considering a tcp_sock is ~2k.
+ */
+#define BPF_LOCAL_STORAGE_MAX_VALUE_SIZE \
+ min_t(u32, \
+ (KMALLOC_MAX_SIZE - MAX_BPF_STACK - \
+ sizeof(struct bpf_local_storage_elem)), \
+ (U16_MAX - sizeof(struct bpf_local_storage_elem)))
+
+#define SELEM(_SDATA) \
+ container_of((_SDATA), struct bpf_local_storage_elem, sdata)
+#define SDATA(_SELEM) (&(_SELEM)->sdata)
+
+#define BPF_LOCAL_STORAGE_CACHE_SIZE 16
+
+struct bpf_local_storage_cache {
+ spinlock_t idx_lock;
+ u64 idx_usage_counts[BPF_LOCAL_STORAGE_CACHE_SIZE];
+};
+
+#define DEFINE_BPF_STORAGE_CACHE(name) \
+static struct bpf_local_storage_cache name = { \
+ .idx_lock = __SPIN_LOCK_UNLOCKED(name.idx_lock), \
+}
+
+u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache);
+void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache,
+ u16 idx);
+
+/* Helper functions for bpf_local_storage */
+int bpf_local_storage_map_alloc_check(union bpf_attr *attr);
+
+struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr);
+
+struct bpf_local_storage_data *
+bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
+ struct bpf_local_storage_map *smap,
+ bool cacheit_lockit);
+
+void bpf_local_storage_map_free(struct bpf_local_storage_map *smap);
+
+int bpf_local_storage_map_check_btf(const struct bpf_map *map,
+ const struct btf *btf,
+ const struct btf_type *key_type,
+ const struct btf_type *value_type);
+
+void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
+ struct bpf_local_storage_elem *selem);
+
+bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
+ struct bpf_local_storage_elem *selem,
+ bool uncharge_omem);
+
+void bpf_selem_unlink(struct bpf_local_storage_elem *selem);
+
+void bpf_selem_link_map(struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *selem);
+
+void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem);
+
+struct bpf_local_storage_elem *
+bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value,
+ bool charge_mem);
+
+int
+bpf_local_storage_alloc(void *owner,
+ struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *first_selem);
+
+struct bpf_local_storage_data *
+bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
+ void *value, u64 map_flags);
+
+#endif /* _BPF_LOCAL_STORAGE_H */
diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h
index 9e631b5466e3..3c516dd07caf 100644
--- a/include/net/bpf_sk_storage.h
+++ b/include/net/bpf_sk_storage.h
@@ -12,6 +12,7 @@
#include <net/sock.h>
#include <uapi/linux/sock_diag.h>
#include <uapi/linux/btf.h>
+#include <linux/bpf_local_storage.h>
struct sock;
@@ -26,66 +27,6 @@ struct sk_buff;
struct nlattr;
struct sock;
-#define BPF_LOCAL_STORAGE_CACHE_SIZE 16
-
-struct bpf_local_storage_cache {
- spinlock_t idx_lock;
- u64 idx_usage_counts[BPF_LOCAL_STORAGE_CACHE_SIZE];
-};
-
-#define DEFINE_BPF_STORAGE_CACHE(name) \
-static struct bpf_local_storage_cache name = { \
- .idx_lock = __SPIN_LOCK_UNLOCKED(name.idx_lock), \
-}
-
-u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache);
-void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache,
- u16 idx);
-
-/* Helper functions for bpf_local_storage */
-int bpf_local_storage_map_alloc_check(union bpf_attr *attr);
-
-struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr);
-
-struct bpf_local_storage_data *
-bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
- struct bpf_local_storage_map *smap,
- bool cacheit_lockit);
-
-void bpf_local_storage_map_free(struct bpf_local_storage_map *smap);
-
-int bpf_local_storage_map_check_btf(const struct bpf_map *map,
- const struct btf *btf,
- const struct btf_type *key_type,
- const struct btf_type *value_type);
-
-void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
- struct bpf_local_storage_elem *selem);
-
-bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
- struct bpf_local_storage_elem *selem,
- bool uncharge_omem);
-
-void bpf_selem_unlink(struct bpf_local_storage_elem *selem);
-
-void bpf_selem_link_map(struct bpf_local_storage_map *smap,
- struct bpf_local_storage_elem *selem);
-
-void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem);
-
-struct bpf_local_storage_elem *
-bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value,
- bool charge_mem);
-
-int
-bpf_local_storage_alloc(void *owner,
- struct bpf_local_storage_map *smap,
- struct bpf_local_storage_elem *first_selem);
-
-struct bpf_local_storage_data *
-bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
- void *value, u64 map_flags);
-
#ifdef CONFIG_BPF_SYSCALL
int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk);
struct bpf_sk_storage_diag *
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 19e137aae40e..6961ff400cba 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_BPF_JIT) += dispatcher.o
ifeq ($(CONFIG_NET),y)
obj-$(CONFIG_BPF_SYSCALL) += devmap.o
obj-$(CONFIG_BPF_SYSCALL) += cpumap.o
+obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o
obj-$(CONFIG_BPF_SYSCALL) += offload.o
obj-$(CONFIG_BPF_SYSCALL) += net_namespace.o
endif
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
new file mode 100644
index 000000000000..ffa7d11fc2bd
--- /dev/null
+++ b/kernel/bpf/bpf_local_storage.c
@@ -0,0 +1,600 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2019 Facebook */
+#include <linux/rculist.h>
+#include <linux/list.h>
+#include <linux/hash.h>
+#include <linux/types.h>
+#include <linux/spinlock.h>
+#include <linux/bpf.h>
+#include <linux/btf_ids.h>
+#include <linux/bpf_local_storage.h>
+#include <net/sock.h>
+#include <uapi/linux/sock_diag.h>
+#include <uapi/linux/btf.h>
+
+#define BPF_LOCAL_STORAGE_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_CLONE)
+
+static struct bpf_local_storage_map_bucket *
+select_bucket(struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *selem)
+{
+ return &smap->buckets[hash_ptr(selem, smap->bucket_log)];
+}
+
+static int mem_charge(struct bpf_local_storage_map *smap, void *owner, u32 size)
+{
+ struct bpf_map *map = &smap->map;
+
+ if (!map->ops->map_local_storage_charge)
+ return 0;
+
+ return map->ops->map_local_storage_charge(smap, owner, size);
+}
+
+static void mem_uncharge(struct bpf_local_storage_map *smap, void *owner,
+ u32 size)
+{
+ struct bpf_map *map = &smap->map;
+
+ if (map->ops->map_local_storage_uncharge)
+ map->ops->map_local_storage_uncharge(smap, owner, size);
+}
+
+static struct bpf_local_storage __rcu **
+owner_storage(struct bpf_local_storage_map *smap, void *owner)
+{
+ struct bpf_map *map = &smap->map;
+
+ return map->ops->map_owner_storage_ptr(owner);
+}
+
+static bool selem_linked_to_storage(const struct bpf_local_storage_elem *selem)
+{
+ return !hlist_unhashed(&selem->snode);
+}
+
+static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
+{
+ return !hlist_unhashed(&selem->map_node);
+}
+
+struct bpf_local_storage_elem *
+bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
+ void *value, bool charge_mem)
+{
+ struct bpf_local_storage_elem *selem;
+
+ if (charge_mem && mem_charge(smap, owner, smap->elem_size))
+ return NULL;
+
+ selem = kzalloc(smap->elem_size, GFP_ATOMIC | __GFP_NOWARN);
+ if (selem) {
+ if (value)
+ memcpy(SDATA(selem)->data, value, smap->map.value_size);
+ return selem;
+ }
+
+ if (charge_mem)
+ mem_uncharge(smap, owner, smap->elem_size);
+
+ return NULL;
+}
+
+/* local_storage->lock must be held and selem->local_storage == local_storage.
+ * The caller must ensure selem->smap is still valid to be
+ * dereferenced for its smap->elem_size and smap->cache_idx.
+ */
+bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
+ struct bpf_local_storage_elem *selem,
+ bool uncharge_mem)
+{
+ struct bpf_local_storage_map *smap;
+ bool free_local_storage;
+ void *owner;
+
+ smap = rcu_dereference(SDATA(selem)->smap);
+ owner = local_storage->owner;
+
+ /* All uncharging on the owner must be done first.
+ * The owner may be freed once the last selem is unlinked
+ * from local_storage.
+ */
+ if (uncharge_mem)
+ mem_uncharge(smap, owner, smap->elem_size);
+
+ free_local_storage = hlist_is_singular_node(&selem->snode,
+ &local_storage->list);
+ if (free_local_storage) {
+ mem_uncharge(smap, owner, sizeof(struct bpf_local_storage));
+ local_storage->owner = NULL;
+
+ /* After this RCU_INIT, owner may be freed and cannot be used */
+ RCU_INIT_POINTER(*owner_storage(smap, owner), NULL);
+
+ /* local_storage is not freed now. local_storage->lock is
+ * still held and raw_spin_unlock_bh(&local_storage->lock)
+ * will be done by the caller.
+ *
+ * Although the unlock will be done under
+ * rcu_read_lock(), it is more intutivie to
+ * read if kfree_rcu(local_storage, rcu) is done
+ * after the raw_spin_unlock_bh(&local_storage->lock).
+ *
+ * Hence, a "bool free_local_storage" is returned
+ * to the caller which then calls the kfree_rcu()
+ * after unlock.
+ */
+ }
+ hlist_del_init_rcu(&selem->snode);
+ if (rcu_access_pointer(local_storage->cache[smap->cache_idx]) ==
+ SDATA(selem))
+ RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
+
+ kfree_rcu(selem, rcu);
+
+ return free_local_storage;
+}
+
+static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem)
+{
+ struct bpf_local_storage *local_storage;
+ bool free_local_storage = false;
+
+ if (unlikely(!selem_linked_to_storage(selem)))
+ /* selem has already been unlinked from sk */
+ return;
+
+ local_storage = rcu_dereference(selem->local_storage);
+ raw_spin_lock_bh(&local_storage->lock);
+ if (likely(selem_linked_to_storage(selem)))
+ free_local_storage = bpf_selem_unlink_storage_nolock(
+ local_storage, selem, true);
+ raw_spin_unlock_bh(&local_storage->lock);
+
+ if (free_local_storage)
+ kfree_rcu(local_storage, rcu);
+}
+
+void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
+ struct bpf_local_storage_elem *selem)
+{
+ RCU_INIT_POINTER(selem->local_storage, local_storage);
+ hlist_add_head(&selem->snode, &local_storage->list);
+}
+
+void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
+{
+ struct bpf_local_storage_map *smap;
+ struct bpf_local_storage_map_bucket *b;
+
+ if (unlikely(!selem_linked_to_map(selem)))
+ /* selem has already be unlinked from smap */
+ return;
+
+ smap = rcu_dereference(SDATA(selem)->smap);
+ b = select_bucket(smap, selem);
+ raw_spin_lock_bh(&b->lock);
+ if (likely(selem_linked_to_map(selem)))
+ hlist_del_init_rcu(&selem->map_node);
+ raw_spin_unlock_bh(&b->lock);
+}
+
+void bpf_selem_link_map(struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *selem)
+{
+ struct bpf_local_storage_map_bucket *b = select_bucket(smap, selem);
+
+ raw_spin_lock_bh(&b->lock);
+ RCU_INIT_POINTER(SDATA(selem)->smap, smap);
+ hlist_add_head_rcu(&selem->map_node, &b->list);
+ raw_spin_unlock_bh(&b->lock);
+}
+
+void bpf_selem_unlink(struct bpf_local_storage_elem *selem)
+{
+ /* Always unlink from map before unlinking from local_storage
+ * because selem will be freed after successfully unlinked from
+ * the local_storage.
+ */
+ bpf_selem_unlink_map(selem);
+ __bpf_selem_unlink_storage(selem);
+}
+
+struct bpf_local_storage_data *
+bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
+ struct bpf_local_storage_map *smap,
+ bool cacheit_lockit)
+{
+ struct bpf_local_storage_data *sdata;
+ struct bpf_local_storage_elem *selem;
+
+ /* Fast path (cache hit) */
+ sdata = rcu_dereference(local_storage->cache[smap->cache_idx]);
+ if (sdata && rcu_access_pointer(sdata->smap) == smap)
+ return sdata;
+
+ /* Slow path (cache miss) */
+ hlist_for_each_entry_rcu(selem, &local_storage->list, snode)
+ if (rcu_access_pointer(SDATA(selem)->smap) == smap)
+ break;
+
+ if (!selem)
+ return NULL;
+
+ sdata = SDATA(selem);
+ if (cacheit_lockit) {
+ /* spinlock is needed to avoid racing with the
+ * parallel delete. Otherwise, publishing an already
+ * deleted sdata to the cache will become a use-after-free
+ * problem in the next bpf_local_storage_lookup().
+ */
+ raw_spin_lock_bh(&local_storage->lock);
+ if (selem_linked_to_storage(selem))
+ rcu_assign_pointer(local_storage->cache[smap->cache_idx],
+ sdata);
+ raw_spin_unlock_bh(&local_storage->lock);
+ }
+
+ return sdata;
+}
+
+static int check_flags(const struct bpf_local_storage_data *old_sdata,
+ u64 map_flags)
+{
+ if (old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_NOEXIST)
+ /* elem already exists */
+ return -EEXIST;
+
+ if (!old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_EXIST)
+ /* elem doesn't exist, cannot update it */
+ return -ENOENT;
+
+ return 0;
+}
+
+int bpf_local_storage_alloc(void *owner,
+ struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *first_selem)
+{
+ struct bpf_local_storage *prev_storage, *storage;
+ struct bpf_local_storage **owner_storage_ptr;
+ int err;
+
+ err = mem_charge(smap, owner, sizeof(*storage));
+ if (err)
+ return err;
+
+ storage = kzalloc(sizeof(*storage), GFP_ATOMIC | __GFP_NOWARN);
+ if (!storage) {
+ err = -ENOMEM;
+ goto uncharge;
+ }
+
+ INIT_HLIST_HEAD(&storage->list);
+ raw_spin_lock_init(&storage->lock);
+ storage->owner = owner;
+
+ bpf_selem_link_storage_nolock(storage, first_selem);
+ bpf_selem_link_map(smap, first_selem);
+
+ owner_storage_ptr =
+ (struct bpf_local_storage **)owner_storage(smap, owner);
+ /* Publish storage to the owner.
+ * Instead of using any lock of the kernel object (i.e. owner),
+ * cmpxchg will work with any kernel object regardless what
+ * the running context is, bh, irq...etc.
+ *
+ * From now on, the owner->storage pointer (e.g. sk->sk_bpf_storage)
+ * is protected by the storage->lock. Hence, when freeing
+ * the owner->storage, the storage->lock must be held before
+ * setting owner->storage ptr to NULL.
+ */
+ prev_storage = cmpxchg(owner_storage_ptr, NULL, storage);
+ if (unlikely(prev_storage)) {
+ bpf_selem_unlink_map(first_selem);
+ err = -EAGAIN;
+ goto uncharge;
+
+ /* Note that even first_selem was linked to smap's
+ * bucket->list, first_selem can be freed immediately
+ * (instead of kfree_rcu) because
+ * bpf_local_storage_map_free() does a
+ * synchronize_rcu() before walking the bucket->list.
+ * Hence, no one is accessing selem from the
+ * bucket->list under rcu_read_lock().
+ */
+ }
+
+ return 0;
+
+uncharge:
+ kfree(storage);
+ mem_uncharge(smap, owner, sizeof(*storage));
+ return err;
+}
+
+/* sk cannot be going away because it is linking new elem
+ * to sk->sk_bpf_storage. (i.e. sk->sk_refcnt cannot be 0).
+ * Otherwise, it will become a leak (and other memory issues
+ * during map destruction).
+ */
+struct bpf_local_storage_data *
+bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
+ void *value, u64 map_flags)
+{
+ struct bpf_local_storage_data *old_sdata = NULL;
+ struct bpf_local_storage_elem *selem;
+ struct bpf_local_storage *local_storage;
+ int err;
+
+ /* BPF_EXIST and BPF_NOEXIST cannot be both set */
+ if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) ||
+ /* BPF_F_LOCK can only be used in a value with spin_lock */
+ unlikely((map_flags & BPF_F_LOCK) &&
+ !map_value_has_spin_lock(&smap->map)))
+ return ERR_PTR(-EINVAL);
+
+ local_storage = rcu_dereference(*owner_storage(smap, owner));
+ if (!local_storage || hlist_empty(&local_storage->list)) {
+ /* Very first elem for the owner */
+ err = check_flags(NULL, map_flags);
+ if (err)
+ return ERR_PTR(err);
+
+ selem = bpf_selem_alloc(smap, owner, value, true);
+ if (!selem)
+ return ERR_PTR(-ENOMEM);
+
+ err = bpf_local_storage_alloc(owner, smap, selem);
+ if (err) {
+ kfree(selem);
+ mem_uncharge(smap, owner, smap->elem_size);
+ return ERR_PTR(err);
+ }
+
+ return SDATA(selem);
+ }
+
+ if ((map_flags & BPF_F_LOCK) && !(map_flags & BPF_NOEXIST)) {
+ /* Hoping to find an old_sdata to do inline update
+ * such that it can avoid taking the local_storage->lock
+ * and changing the lists.
+ */
+ old_sdata =
+ bpf_local_storage_lookup(local_storage, smap, false);
+ err = check_flags(old_sdata, map_flags);
+ if (err)
+ return ERR_PTR(err);
+ if (old_sdata && selem_linked_to_storage(SELEM(old_sdata))) {
+ copy_map_value_locked(&smap->map, old_sdata->data,
+ value, false);
+ return old_sdata;
+ }
+ }
+
+ raw_spin_lock_bh(&local_storage->lock);
+
+ /* Recheck local_storage->list under local_storage->lock */
+ if (unlikely(hlist_empty(&local_storage->list))) {
+ /* A parallel del is happening and local_storage is going
+ * away. It has just been checked before, so very
+ * unlikely. Return instead of retry to keep things
+ * simple.
+ */
+ err = -EAGAIN;
+ goto unlock_err;
+ }
+
+ old_sdata = bpf_local_storage_lookup(local_storage, smap, false);
+ err = check_flags(old_sdata, map_flags);
+ if (err)
+ goto unlock_err;
+
+ if (old_sdata && (map_flags & BPF_F_LOCK)) {
+ copy_map_value_locked(&smap->map, old_sdata->data, value,
+ false);
+ selem = SELEM(old_sdata);
+ goto unlock;
+ }
+
+ /* local_storage->lock is held. Hence, we are sure
+ * we can unlink and uncharge the old_sdata successfully
+ * later. Hence, instead of charging the new selem now
+ * and then uncharge the old selem later (which may cause
+ * a potential but unnecessary charge failure), avoid taking
+ * a charge at all here (the "!old_sdata" check) and the
+ * old_sdata will not be uncharged later during
+ * bpf_selem_unlink_storage_nolock().
+ */
+ selem = bpf_selem_alloc(smap, owner, value, !old_sdata);
+ if (!selem) {
+ err = -ENOMEM;
+ goto unlock_err;
+ }
+
+ /* First, link the new selem to the map */
+ bpf_selem_link_map(smap, selem);
+
+ /* Second, link (and publish) the new selem to local_storage */
+ bpf_selem_link_storage_nolock(local_storage, selem);
+
+ /* Third, remove old selem, SELEM(old_sdata) */
+ if (old_sdata) {
+ bpf_selem_unlink_map(SELEM(old_sdata));
+ bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata),
+ false);
+ }
+
+unlock:
+ raw_spin_unlock_bh(&local_storage->lock);
+ return SDATA(selem);
+
+unlock_err:
+ raw_spin_unlock_bh(&local_storage->lock);
+ return ERR_PTR(err);
+}
+
+u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache)
+{
+ u64 min_usage = U64_MAX;
+ u16 i, res = 0;
+
+ spin_lock(&cache->idx_lock);
+
+ for (i = 0; i < BPF_LOCAL_STORAGE_CACHE_SIZE; i++) {
+ if (cache->idx_usage_counts[i] < min_usage) {
+ min_usage = cache->idx_usage_counts[i];
+ res = i;
+
+ /* Found a free cache_idx */
+ if (!min_usage)
+ break;
+ }
+ }
+ cache->idx_usage_counts[res]++;
+
+ spin_unlock(&cache->idx_lock);
+
+ return res;
+}
+
+void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache,
+ u16 idx)
+{
+ spin_lock(&cache->idx_lock);
+ cache->idx_usage_counts[idx]--;
+ spin_unlock(&cache->idx_lock);
+}
+
+void bpf_local_storage_map_free(struct bpf_local_storage_map *smap)
+{
+ struct bpf_local_storage_elem *selem;
+ struct bpf_local_storage_map_bucket *b;
+ unsigned int i;
+
+ /* Note that this map might be concurrently cloned from
+ * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone
+ * RCU read section to finish before proceeding. New RCU
+ * read sections should be prevented via bpf_map_inc_not_zero.
+ */
+ synchronize_rcu();
+
+ /* bpf prog and the userspace can no longer access this map
+ * now. No new selem (of this map) can be added
+ * to the owner->storage or to the map bucket's list.
+ *
+ * The elem of this map can be cleaned up here
+ * or when the storage is freed e.g.
+ * by bpf_sk_storage_free() during __sk_destruct().
+ */
+ for (i = 0; i < (1U << smap->bucket_log); i++) {
+ b = &smap->buckets[i];
+
+ rcu_read_lock();
+ /* No one is adding to b->list now */
+ while ((selem = hlist_entry_safe(
+ rcu_dereference_raw(hlist_first_rcu(&b->list)),
+ struct bpf_local_storage_elem, map_node))) {
+ bpf_selem_unlink(selem);
+ cond_resched_rcu();
+ }
+ rcu_read_unlock();
+ }
+
+ /* While freeing the storage we may still need to access the map.
+ *
+ * e.g. when bpf_sk_storage_free() has unlinked selem from the map
+ * which then made the above while((selem = ...)) loop
+ * exit immediately.
+ *
+ * However, while freeing the storage one still needs to access the
+ * smap->elem_size to do the uncharging in
+ * bpf_selem_unlink_storage_nolock().
+ *
+ * Hence, wait another rcu grace period for the storage to be freed.
+ */
+ synchronize_rcu();
+
+ kvfree(smap->buckets);
+ kfree(smap);
+}
+
+int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
+{
+ if (attr->map_flags & ~BPF_LOCAL_STORAGE_CREATE_FLAG_MASK ||
+ !(attr->map_flags & BPF_F_NO_PREALLOC) ||
+ attr->max_entries ||
+ attr->key_size != sizeof(int) || !attr->value_size ||
+ /* Enforce BTF for userspace sk dumping */
+ !attr->btf_key_type_id || !attr->btf_value_type_id)
+ return -EINVAL;
+
+ if (!bpf_capable())
+ return -EPERM;
+
+ if (attr->value_size > BPF_LOCAL_STORAGE_MAX_VALUE_SIZE)
+ return -E2BIG;
+
+ return 0;
+}
+
+struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
+{
+ struct bpf_local_storage_map *smap;
+ unsigned int i;
+ u32 nbuckets;
+ u64 cost;
+ int ret;
+
+ smap = kzalloc(sizeof(*smap), GFP_USER | __GFP_NOWARN);
+ if (!smap)
+ return ERR_PTR(-ENOMEM);
+ bpf_map_init_from_attr(&smap->map, attr);
+
+ nbuckets = roundup_pow_of_two(num_possible_cpus());
+ /* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */
+ nbuckets = max_t(u32, 2, nbuckets);
+ smap->bucket_log = ilog2(nbuckets);
+ cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap);
+
+ ret = bpf_map_charge_init(&smap->map.memory, cost);
+ if (ret < 0) {
+ kfree(smap);
+ return ERR_PTR(ret);
+ }
+
+ smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
+ GFP_USER | __GFP_NOWARN);
+ if (!smap->buckets) {
+ bpf_map_charge_finish(&smap->map.memory);
+ kfree(smap);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ for (i = 0; i < nbuckets; i++) {
+ INIT_HLIST_HEAD(&smap->buckets[i].list);
+ raw_spin_lock_init(&smap->buckets[i].lock);
+ }
+
+ smap->elem_size =
+ sizeof(struct bpf_local_storage_elem) + attr->value_size;
+
+ return smap;
+}
+
+int bpf_local_storage_map_check_btf(const struct bpf_map *map,
+ const struct btf *btf,
+ const struct btf_type *key_type,
+ const struct btf_type *value_type)
+{
+ u32 int_data;
+
+ if (BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
+ return -EINVAL;
+
+ int_data = *(u32 *)(key_type + 1);
+ if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data))
+ return -EINVAL;
+
+ return 0;
+}
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index cd8b7017913b..f29d9a9b4ea4 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -7,97 +7,14 @@
#include <linux/spinlock.h>
#include <linux/bpf.h>
#include <linux/btf_ids.h>
+#include <linux/bpf_local_storage.h>
#include <net/bpf_sk_storage.h>
#include <net/sock.h>
#include <uapi/linux/sock_diag.h>
#include <uapi/linux/btf.h>
-#define BPF_LOCAL_STORAGE_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_CLONE)
-
DEFINE_BPF_STORAGE_CACHE(sk_cache);
-struct bpf_local_storage_map_bucket {
- struct hlist_head list;
- raw_spinlock_t lock;
-};
-
-/* Thp map is not the primary owner of a bpf_local_storage_elem.
- * Instead, the container object (eg. sk->sk_bpf_storage) is.
- *
- * The map (bpf_local_storage_map) is for two purposes
- * 1. Define the size of the "local storage". It is
- * the map's value_size.
- *
- * 2. Maintain a list to keep track of all elems such
- * that they can be cleaned up during the map destruction.
- *
- * When a bpf local storage is being looked up for a
- * particular object, the "bpf_map" pointer is actually used
- * as the "key" to search in the list of elem in
- * the respective bpf_local_storage owned by the object.
- *
- * e.g. sk->sk_bpf_storage is the mini-map with the "bpf_map" pointer
- * as the searching key.
- */
-struct bpf_local_storage_map {
- struct bpf_map map;
- /* Lookup elem does not require accessing the map.
- *
- * Updating/Deleting requires a bucket lock to
- * link/unlink the elem from the map. Having
- * multiple buckets to improve contention.
- */
- struct bpf_local_storage_map_bucket *buckets;
- u32 bucket_log;
- u16 elem_size;
- u16 cache_idx;
-};
-
-struct bpf_local_storage_data {
- /* smap is used as the searching key when looking up
- * from the object's bpf_local_storage.
- *
- * Put it in the same cacheline as the data to minimize
- * the number of cachelines access during the cache hit case.
- */
- struct bpf_local_storage_map __rcu *smap;
- u8 data[] __aligned(8);
-};
-
-/* Linked to bpf_local_storage and bpf_local_storage_map */
-struct bpf_local_storage_elem {
- struct hlist_node map_node; /* Linked to bpf_local_storage_map */
- struct hlist_node snode; /* Linked to bpf_local_storage */
- struct bpf_local_storage __rcu *local_storage;
- struct rcu_head rcu;
- /* 8 bytes hole */
- /* The data is stored in aother cacheline to minimize
- * the number of cachelines access during a cache hit.
- */
- struct bpf_local_storage_data sdata ____cacheline_aligned;
-};
-
-#define SELEM(_SDATA) \
- container_of((_SDATA), struct bpf_local_storage_elem, sdata)
-#define SDATA(_SELEM) (&(_SELEM)->sdata)
-
-struct bpf_local_storage {
- struct bpf_local_storage_data __rcu *cache[BPF_LOCAL_STORAGE_CACHE_SIZE];
- struct hlist_head list; /* List of bpf_local_storage_elem */
- void *owner; /* The object that owns the above "list" of
- * bpf_local_storage_elem.
- */
- struct rcu_head rcu;
- raw_spinlock_t lock; /* Protect adding/removing from the "list" */
-};
-
-static struct bpf_local_storage_map_bucket *
-select_bucket(struct bpf_local_storage_map *smap,
- struct bpf_local_storage_elem *selem)
-{
- return &smap->buckets[hash_ptr(selem, smap->bucket_log)];
-}
-
static int omem_charge(struct sock *sk, unsigned int size)
{
/* same check as in sock_kmalloc() */
@@ -110,223 +27,6 @@ static int omem_charge(struct sock *sk, unsigned int size)
return -ENOMEM;
}
-static int mem_charge(struct bpf_local_storage_map *smap, void *owner, u32 size)
-{
- struct bpf_map *map = &smap->map;
-
- if (!map->ops->map_local_storage_charge)
- return 0;
-
- return map->ops->map_local_storage_charge(smap, owner, size);
-}
-
-static void mem_uncharge(struct bpf_local_storage_map *smap, void *owner,
- u32 size)
-{
- struct bpf_map *map = &smap->map;
-
- if (map->ops->map_local_storage_uncharge)
- map->ops->map_local_storage_uncharge(smap, owner, size);
-}
-
-static struct bpf_local_storage __rcu **
-owner_storage(struct bpf_local_storage_map *smap, void *owner)
-{
- struct bpf_map *map = &smap->map;
-
- return map->ops->map_owner_storage_ptr(owner);
-}
-
-static bool selem_linked_to_storage(const struct bpf_local_storage_elem *selem)
-{
- return !hlist_unhashed(&selem->snode);
-}
-
-static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
-{
- return !hlist_unhashed(&selem->map_node);
-}
-
-struct bpf_local_storage_elem *
-bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
- void *value, bool charge_mem)
-{
- struct bpf_local_storage_elem *selem;
-
- if (charge_mem && mem_charge(smap, owner, smap->elem_size))
- return NULL;
-
- selem = kzalloc(smap->elem_size, GFP_ATOMIC | __GFP_NOWARN);
- if (selem) {
- if (value)
- memcpy(SDATA(selem)->data, value, smap->map.value_size);
- return selem;
- }
-
- if (charge_mem)
- mem_uncharge(smap, owner, smap->elem_size);
-
- return NULL;
-}
-
-/* local_storage->lock must be held and selem->local_storage == local_storage.
- * The caller must ensure selem->smap is still valid to be
- * dereferenced for its smap->elem_size and smap->cache_idx.
- */
-bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
- struct bpf_local_storage_elem *selem,
- bool uncharge_mem)
-{
- struct bpf_local_storage_map *smap;
- bool free_local_storage;
- void *owner;
-
- smap = rcu_dereference(SDATA(selem)->smap);
- owner = local_storage->owner;
-
- /* All uncharging on the owner must be done first.
- * The owner may be freed once the last selem is unlinked
- * from local_storage.
- */
- if (uncharge_mem)
- mem_uncharge(smap, owner, smap->elem_size);
-
- free_local_storage = hlist_is_singular_node(&selem->snode,
- &local_storage->list);
- if (free_local_storage) {
- mem_uncharge(smap, owner, sizeof(struct bpf_local_storage));
- local_storage->owner = NULL;
-
- /* After this RCU_INIT, owner may be freed and cannot be used */
- RCU_INIT_POINTER(*owner_storage(smap, owner), NULL);
-
- /* local_storage is not freed now. local_storage->lock is
- * still held and raw_spin_unlock_bh(&local_storage->lock)
- * will be done by the caller.
- *
- * Although the unlock will be done under
- * rcu_read_lock(), it is more intutivie to
- * read if kfree_rcu(local_storage, rcu) is done
- * after the raw_spin_unlock_bh(&local_storage->lock).
- *
- * Hence, a "bool free_local_storage" is returned
- * to the caller which then calls the kfree_rcu()
- * after unlock.
- */
- }
- hlist_del_init_rcu(&selem->snode);
- if (rcu_access_pointer(local_storage->cache[smap->cache_idx]) ==
- SDATA(selem))
- RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
-
- kfree_rcu(selem, rcu);
-
- return free_local_storage;
-}
-
-static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem)
-{
- struct bpf_local_storage *local_storage;
- bool free_local_storage = false;
-
- if (unlikely(!selem_linked_to_storage(selem)))
- /* selem has already been unlinked from sk */
- return;
-
- local_storage = rcu_dereference(selem->local_storage);
- raw_spin_lock_bh(&local_storage->lock);
- if (likely(selem_linked_to_storage(selem)))
- free_local_storage = bpf_selem_unlink_storage_nolock(
- local_storage, selem, true);
- raw_spin_unlock_bh(&local_storage->lock);
-
- if (free_local_storage)
- kfree_rcu(local_storage, rcu);
-}
-
-void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
- struct bpf_local_storage_elem *selem)
-{
- RCU_INIT_POINTER(selem->local_storage, local_storage);
- hlist_add_head(&selem->snode, &local_storage->list);
-}
-
-void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
-{
- struct bpf_local_storage_map *smap;
- struct bpf_local_storage_map_bucket *b;
-
- if (unlikely(!selem_linked_to_map(selem)))
- /* selem has already be unlinked from smap */
- return;
-
- smap = rcu_dereference(SDATA(selem)->smap);
- b = select_bucket(smap, selem);
- raw_spin_lock_bh(&b->lock);
- if (likely(selem_linked_to_map(selem)))
- hlist_del_init_rcu(&selem->map_node);
- raw_spin_unlock_bh(&b->lock);
-}
-
-void bpf_selem_link_map(struct bpf_local_storage_map *smap,
- struct bpf_local_storage_elem *selem)
-{
- struct bpf_local_storage_map_bucket *b = select_bucket(smap, selem);
-
- raw_spin_lock_bh(&b->lock);
- RCU_INIT_POINTER(SDATA(selem)->smap, smap);
- hlist_add_head_rcu(&selem->map_node, &b->list);
- raw_spin_unlock_bh(&b->lock);
-}
-
-void bpf_selem_unlink(struct bpf_local_storage_elem *selem)
-{
- /* Always unlink from map before unlinking from local_storage
- * because selem will be freed after successfully unlinked from
- * the local_storage.
- */
- bpf_selem_unlink_map(selem);
- __bpf_selem_unlink_storage(selem);
-}
-
-struct bpf_local_storage_data *
-bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
- struct bpf_local_storage_map *smap,
- bool cacheit_lockit)
-{
- struct bpf_local_storage_data *sdata;
- struct bpf_local_storage_elem *selem;
-
- /* Fast path (cache hit) */
- sdata = rcu_dereference(local_storage->cache[smap->cache_idx]);
- if (sdata && rcu_access_pointer(sdata->smap) == smap)
- return sdata;
-
- /* Slow path (cache miss) */
- hlist_for_each_entry_rcu(selem, &local_storage->list, snode)
- if (rcu_access_pointer(SDATA(selem)->smap) == smap)
- break;
-
- if (!selem)
- return NULL;
-
- sdata = SDATA(selem);
- if (cacheit_lockit) {
- /* spinlock is needed to avoid racing with the
- * parallel delete. Otherwise, publishing an already
- * deleted sdata to the cache will become a use-after-free
- * problem in the next bpf_local_storage_lookup().
- */
- raw_spin_lock_bh(&local_storage->lock);
- if (selem_linked_to_storage(selem))
- rcu_assign_pointer(local_storage->cache[smap->cache_idx],
- sdata);
- raw_spin_unlock_bh(&local_storage->lock);
- }
-
- return sdata;
-}
-
static struct bpf_local_storage_data *
sk_storage_lookup(struct sock *sk, struct bpf_map *map, bool cacheit_lockit)
{
@@ -341,202 +41,6 @@ sk_storage_lookup(struct sock *sk, struct bpf_map *map, bool cacheit_lockit)
return bpf_local_storage_lookup(sk_storage, smap, cacheit_lockit);
}
-static int check_flags(const struct bpf_local_storage_data *old_sdata,
- u64 map_flags)
-{
- if (old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_NOEXIST)
- /* elem already exists */
- return -EEXIST;
-
- if (!old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_EXIST)
- /* elem doesn't exist, cannot update it */
- return -ENOENT;
-
- return 0;
-}
-
-int bpf_local_storage_alloc(void *owner,
- struct bpf_local_storage_map *smap,
- struct bpf_local_storage_elem *first_selem)
-{
- struct bpf_local_storage *prev_storage, *storage;
- struct bpf_local_storage **owner_storage_ptr;
- int err;
-
- err = mem_charge(smap, owner, sizeof(*storage));
- if (err)
- return err;
-
- storage = kzalloc(sizeof(*storage), GFP_ATOMIC | __GFP_NOWARN);
- if (!storage) {
- err = -ENOMEM;
- goto uncharge;
- }
-
- INIT_HLIST_HEAD(&storage->list);
- raw_spin_lock_init(&storage->lock);
- storage->owner = owner;
-
- bpf_selem_link_storage_nolock(storage, first_selem);
- bpf_selem_link_map(smap, first_selem);
-
- owner_storage_ptr =
- (struct bpf_local_storage **)owner_storage(smap, owner);
- /* Publish storage to the owner.
- * Instead of using any lock of the kernel object (i.e. owner),
- * cmpxchg will work with any kernel object regardless what
- * the running context is, bh, irq...etc.
- *
- * From now on, the owner->storage pointer (e.g. sk->sk_bpf_storage)
- * is protected by the storage->lock. Hence, when freeing
- * the owner->storage, the storage->lock must be held before
- * setting owner->storage ptr to NULL.
- */
- prev_storage = cmpxchg(owner_storage_ptr, NULL, storage);
- if (unlikely(prev_storage)) {
- bpf_selem_unlink_map(first_selem);
- err = -EAGAIN;
- goto uncharge;
-
- /* Note that even first_selem was linked to smap's
- * bucket->list, first_selem can be freed immediately
- * (instead of kfree_rcu) because
- * bpf_local_storage_map_free() does a
- * synchronize_rcu() before walking the bucket->list.
- * Hence, no one is accessing selem from the
- * bucket->list under rcu_read_lock().
- */
- }
-
- return 0;
-
-uncharge:
- kfree(storage);
- mem_uncharge(smap, owner, sizeof(*storage));
- return err;
-}
-
-/* sk cannot be going away because it is linking new elem
- * to sk->sk_bpf_storage. (i.e. sk->sk_refcnt cannot be 0).
- * Otherwise, it will become a leak (and other memory issues
- * during map destruction).
- */
-struct bpf_local_storage_data *
-bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
- void *value, u64 map_flags)
-{
- struct bpf_local_storage_data *old_sdata = NULL;
- struct bpf_local_storage_elem *selem;
- struct bpf_local_storage *local_storage;
- int err;
-
- /* BPF_EXIST and BPF_NOEXIST cannot be both set */
- if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) ||
- /* BPF_F_LOCK can only be used in a value with spin_lock */
- unlikely((map_flags & BPF_F_LOCK) &&
- !map_value_has_spin_lock(&smap->map)))
- return ERR_PTR(-EINVAL);
-
- local_storage = rcu_dereference(*owner_storage(smap, owner));
- if (!local_storage || hlist_empty(&local_storage->list)) {
- /* Very first elem for the owner */
- err = check_flags(NULL, map_flags);
- if (err)
- return ERR_PTR(err);
-
- selem = bpf_selem_alloc(smap, owner, value, true);
- if (!selem)
- return ERR_PTR(-ENOMEM);
-
- err = bpf_local_storage_alloc(owner, smap, selem);
- if (err) {
- kfree(selem);
- mem_uncharge(smap, owner, smap->elem_size);
- return ERR_PTR(err);
- }
-
- return SDATA(selem);
- }
-
- if ((map_flags & BPF_F_LOCK) && !(map_flags & BPF_NOEXIST)) {
- /* Hoping to find an old_sdata to do inline update
- * such that it can avoid taking the local_storage->lock
- * and changing the lists.
- */
- old_sdata =
- bpf_local_storage_lookup(local_storage, smap, false);
- err = check_flags(old_sdata, map_flags);
- if (err)
- return ERR_PTR(err);
- if (old_sdata && selem_linked_to_storage(SELEM(old_sdata))) {
- copy_map_value_locked(&smap->map, old_sdata->data,
- value, false);
- return old_sdata;
- }
- }
-
- raw_spin_lock_bh(&local_storage->lock);
-
- /* Recheck local_storage->list under local_storage->lock */
- if (unlikely(hlist_empty(&local_storage->list))) {
- /* A parallel del is happening and local_storage is going
- * away. It has just been checked before, so very
- * unlikely. Return instead of retry to keep things
- * simple.
- */
- err = -EAGAIN;
- goto unlock_err;
- }
-
- old_sdata = bpf_local_storage_lookup(local_storage, smap, false);
- err = check_flags(old_sdata, map_flags);
- if (err)
- goto unlock_err;
-
- if (old_sdata && (map_flags & BPF_F_LOCK)) {
- copy_map_value_locked(&smap->map, old_sdata->data, value,
- false);
- selem = SELEM(old_sdata);
- goto unlock;
- }
-
- /* local_storage->lock is held. Hence, we are sure
- * we can unlink and uncharge the old_sdata successfully
- * later. Hence, instead of charging the new selem now
- * and then uncharge the old selem later (which may cause
- * a potential but unnecessary charge failure), avoid taking
- * a charge at all here (the "!old_sdata" check) and the
- * old_sdata will not be uncharged later during
- * bpf_selem_unlink_storage_nolock().
- */
- selem = bpf_selem_alloc(smap, owner, value, !old_sdata);
- if (!selem) {
- err = -ENOMEM;
- goto unlock_err;
- }
-
- /* First, link the new selem to the map */
- bpf_selem_link_map(smap, selem);
-
- /* Second, link (and publish) the new selem to local_storage */
- bpf_selem_link_storage_nolock(local_storage, selem);
-
- /* Third, remove old selem, SELEM(old_sdata) */
- if (old_sdata) {
- bpf_selem_unlink_map(SELEM(old_sdata));
- bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata),
- false);
- }
-
-unlock:
- raw_spin_unlock_bh(&local_storage->lock);
- return SDATA(selem);
-
-unlock_err:
- raw_spin_unlock_bh(&local_storage->lock);
- return ERR_PTR(err);
-}
-
static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
{
struct bpf_local_storage_data *sdata;
@@ -550,38 +54,6 @@ static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
return 0;
}
-u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache)
-{
- u64 min_usage = U64_MAX;
- u16 i, res = 0;
-
- spin_lock(&cache->idx_lock);
-
- for (i = 0; i < BPF_LOCAL_STORAGE_CACHE_SIZE; i++) {
- if (cache->idx_usage_counts[i] < min_usage) {
- min_usage = cache->idx_usage_counts[i];
- res = i;
-
- /* Found a free cache_idx */
- if (!min_usage)
- break;
- }
- }
- cache->idx_usage_counts[res]++;
-
- spin_unlock(&cache->idx_lock);
-
- return res;
-}
-
-void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache,
- u16 idx)
-{
- spin_lock(&cache->idx_lock);
- cache->idx_usage_counts[idx]--;
- spin_unlock(&cache->idx_lock);
-}
-
/* Called by __sk_destruct() & bpf_sk_storage_clone() */
void bpf_sk_storage_free(struct sock *sk)
{
@@ -622,59 +94,6 @@ void bpf_sk_storage_free(struct sock *sk)
kfree_rcu(sk_storage, rcu);
}
-void bpf_local_storage_map_free(struct bpf_local_storage_map *smap)
-{
- struct bpf_local_storage_elem *selem;
- struct bpf_local_storage_map_bucket *b;
- unsigned int i;
-
- /* Note that this map might be concurrently cloned from
- * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone
- * RCU read section to finish before proceeding. New RCU
- * read sections should be prevented via bpf_map_inc_not_zero.
- */
- synchronize_rcu();
-
- /* bpf prog and the userspace can no longer access this map
- * now. No new selem (of this map) can be added
- * to the owner->storage or to the map bucket's list.
- *
- * The elem of this map can be cleaned up here
- * or when the storage is freed e.g.
- * by bpf_sk_storage_free() during __sk_destruct().
- */
- for (i = 0; i < (1U << smap->bucket_log); i++) {
- b = &smap->buckets[i];
-
- rcu_read_lock();
- /* No one is adding to b->list now */
- while ((selem = hlist_entry_safe(
- rcu_dereference_raw(hlist_first_rcu(&b->list)),
- struct bpf_local_storage_elem, map_node))) {
- bpf_selem_unlink(selem);
- cond_resched_rcu();
- }
- rcu_read_unlock();
- }
-
- /* While freeing the storage we may still need to access the map.
- *
- * e.g. when bpf_sk_storage_free() has unlinked selem from the map
- * which then made the above while((selem = ...)) loop
- * exit immediately.
- *
- * However, while freeing the storage one still needs to access the
- * smap->elem_size to do the uncharging in
- * bpf_selem_unlink_storage_nolock().
- *
- * Hence, wait another rcu grace period for the storage to be freed.
- */
- synchronize_rcu();
-
- kvfree(smap->buckets);
- kfree(smap);
-}
-
static void sk_storage_map_free(struct bpf_map *map)
{
struct bpf_local_storage_map *smap;
@@ -684,78 +103,6 @@ static void sk_storage_map_free(struct bpf_map *map)
bpf_local_storage_map_free(smap);
}
-/* U16_MAX is much more than enough for sk local storage
- * considering a tcp_sock is ~2k.
- */
-#define BPF_LOCAL_STORAGE_MAX_VALUE_SIZE \
- min_t(u32, \
- (KMALLOC_MAX_SIZE - MAX_BPF_STACK - \
- sizeof(struct bpf_local_storage_elem)), \
- (U16_MAX - sizeof(struct bpf_local_storage_elem)))
-
-int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
-{
- if (attr->map_flags & ~BPF_LOCAL_STORAGE_CREATE_FLAG_MASK ||
- !(attr->map_flags & BPF_F_NO_PREALLOC) ||
- attr->max_entries ||
- attr->key_size != sizeof(int) || !attr->value_size ||
- /* Enforce BTF for userspace sk dumping */
- !attr->btf_key_type_id || !attr->btf_value_type_id)
- return -EINVAL;
-
- if (!bpf_capable())
- return -EPERM;
-
- if (attr->value_size > BPF_LOCAL_STORAGE_MAX_VALUE_SIZE)
- return -E2BIG;
-
- return 0;
-}
-
-struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
-{
- struct bpf_local_storage_map *smap;
- unsigned int i;
- u32 nbuckets;
- u64 cost;
- int ret;
-
- smap = kzalloc(sizeof(*smap), GFP_USER | __GFP_NOWARN);
- if (!smap)
- return ERR_PTR(-ENOMEM);
- bpf_map_init_from_attr(&smap->map, attr);
-
- nbuckets = roundup_pow_of_two(num_possible_cpus());
- /* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */
- nbuckets = max_t(u32, 2, nbuckets);
- smap->bucket_log = ilog2(nbuckets);
- cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap);
-
- ret = bpf_map_charge_init(&smap->map.memory, cost);
- if (ret < 0) {
- kfree(smap);
- return ERR_PTR(ret);
- }
-
- smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
- GFP_USER | __GFP_NOWARN);
- if (!smap->buckets) {
- bpf_map_charge_finish(&smap->map.memory);
- kfree(smap);
- return ERR_PTR(-ENOMEM);
- }
-
- for (i = 0; i < nbuckets; i++) {
- INIT_HLIST_HEAD(&smap->buckets[i].list);
- raw_spin_lock_init(&smap->buckets[i].lock);
- }
-
- smap->elem_size =
- sizeof(struct bpf_local_storage_elem) + attr->value_size;
-
- return smap;
-}
-
static struct bpf_map *sk_storage_map_alloc(union bpf_attr *attr)
{
struct bpf_local_storage_map *smap;
@@ -774,23 +121,6 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key,
return -ENOTSUPP;
}
-int bpf_local_storage_map_check_btf(const struct bpf_map *map,
- const struct btf *btf,
- const struct btf_type *key_type,
- const struct btf_type *value_type)
-{
- u32 int_data;
-
- if (BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
- return -EINVAL;
-
- int_data = *(u32 *)(key_type + 1);
- if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data))
- return -EINVAL;
-
- return 0;
-}
-
static void *bpf_fd_sk_storage_lookup_elem(struct bpf_map *map, void *key)
{
struct bpf_local_storage_data *sdata;
--
2.28.0.297.g1956fa8f8d-goog
^ permalink raw reply related
* [PATCH bpf-next v9 3/7] bpf: Generalize bpf_sk_storage
From: KP Singh @ 2020-08-23 16:56 UTC (permalink / raw)
To: linux-kernel, bpf, linux-security-module
Cc: Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann,
Paul Turner, Jann Horn, Florent Revest
In-Reply-To: <20200823165612.404892-1-kpsingh@chromium.org>
From: KP Singh <kpsingh@google.com>
Refactor the functionality in bpf_sk_storage.c so that concept of
storage linked to kernel objects can be extended to other objects like
inode, task_struct etc.
Each new local storage will still be a separate map and provide its own
set of helpers. This allows for future object specific extensions and
still share a lot of the underlying implementation.
This includes the changes suggested by Martin in:
https://lore.kernel.org/bpf/20200725013047.4006241-1-kafai@fb.com/
adding new map operations to support bpf_local_storage maps:
* storages for different kernel objects to optionally have different
memory charging strategy (map_local_storage_charge,
map_local_storage_uncharge)
* Functionality to extract the storage pointer from a pointer to the
owning object (map_owner_storage_ptr)
Co-developed-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: KP Singh <kpsingh@google.com>
---
include/linux/bpf.h | 8 ++
include/net/bpf_sk_storage.h | 52 +++++++
include/uapi/linux/bpf.h | 8 +-
net/core/bpf_sk_storage.c | 238 +++++++++++++++++++++------------
tools/include/uapi/linux/bpf.h | 8 +-
5 files changed, 228 insertions(+), 86 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 81f38e2fda78..8c443b93ac11 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -34,6 +34,8 @@ struct btf_type;
struct exception_table_entry;
struct seq_operations;
struct bpf_iter_aux_info;
+struct bpf_local_storage;
+struct bpf_local_storage_map;
extern struct idr btf_idr;
extern spinlock_t btf_idr_lock;
@@ -104,6 +106,12 @@ struct bpf_map_ops {
__poll_t (*map_poll)(struct bpf_map *map, struct file *filp,
struct poll_table_struct *pts);
+ /* Functions called by bpf_local_storage maps */
+ int (*map_local_storage_charge)(struct bpf_local_storage_map *smap,
+ void *owner, u32 size);
+ void (*map_local_storage_uncharge)(struct bpf_local_storage_map *smap,
+ void *owner, u32 size);
+ struct bpf_local_storage __rcu ** (*map_owner_storage_ptr)(void *owner);
/* BTF name and id of struct allocated by map_alloc */
const char * const map_btf_name;
int *map_btf_id;
diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h
index 950c5aaba15e..9e631b5466e3 100644
--- a/include/net/bpf_sk_storage.h
+++ b/include/net/bpf_sk_storage.h
@@ -3,8 +3,15 @@
#ifndef _BPF_SK_STORAGE_H
#define _BPF_SK_STORAGE_H
+#include <linux/rculist.h>
+#include <linux/list.h>
+#include <linux/hash.h>
#include <linux/types.h>
#include <linux/spinlock.h>
+#include <linux/bpf.h>
+#include <net/sock.h>
+#include <uapi/linux/sock_diag.h>
+#include <uapi/linux/btf.h>
struct sock;
@@ -13,6 +20,7 @@ void bpf_sk_storage_free(struct sock *sk);
extern const struct bpf_func_proto bpf_sk_storage_get_proto;
extern const struct bpf_func_proto bpf_sk_storage_delete_proto;
+struct bpf_local_storage_elem;
struct bpf_sk_storage_diag;
struct sk_buff;
struct nlattr;
@@ -34,6 +42,50 @@ u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache);
void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache,
u16 idx);
+/* Helper functions for bpf_local_storage */
+int bpf_local_storage_map_alloc_check(union bpf_attr *attr);
+
+struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr);
+
+struct bpf_local_storage_data *
+bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
+ struct bpf_local_storage_map *smap,
+ bool cacheit_lockit);
+
+void bpf_local_storage_map_free(struct bpf_local_storage_map *smap);
+
+int bpf_local_storage_map_check_btf(const struct bpf_map *map,
+ const struct btf *btf,
+ const struct btf_type *key_type,
+ const struct btf_type *value_type);
+
+void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
+ struct bpf_local_storage_elem *selem);
+
+bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
+ struct bpf_local_storage_elem *selem,
+ bool uncharge_omem);
+
+void bpf_selem_unlink(struct bpf_local_storage_elem *selem);
+
+void bpf_selem_link_map(struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *selem);
+
+void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem);
+
+struct bpf_local_storage_elem *
+bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value,
+ bool charge_mem);
+
+int
+bpf_local_storage_alloc(void *owner,
+ struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *first_selem);
+
+struct bpf_local_storage_data *
+bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
+ void *value, u64 map_flags);
+
#ifdef CONFIG_BPF_SYSCALL
int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk);
struct bpf_sk_storage_diag *
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index a1bbaff7a0af..f1a4a476586b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3648,9 +3648,13 @@ enum {
BPF_F_SYSCTL_BASE_NAME = (1ULL << 0),
};
-/* BPF_FUNC_sk_storage_get flags */
+/* BPF_FUNC_<kernel_obj>_storage_get flags */
enum {
- BPF_SK_STORAGE_GET_F_CREATE = (1ULL << 0),
+ BPF_LOCAL_STORAGE_GET_F_CREATE = (1ULL << 0),
+ /* BPF_SK_STORAGE_GET_F_CREATE is only kept for backward compatibility
+ * and BPF_LOCAL_STORAGE_GET_F_CREATE must be used instead.
+ */
+ BPF_SK_STORAGE_GET_F_CREATE = BPF_LOCAL_STORAGE_GET_F_CREATE,
};
/* BPF_FUNC_read_branch_records flags. */
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index ec61ee7c7ee4..cd8b7017913b 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -84,7 +84,7 @@ struct bpf_local_storage_elem {
struct bpf_local_storage {
struct bpf_local_storage_data __rcu *cache[BPF_LOCAL_STORAGE_CACHE_SIZE];
struct hlist_head list; /* List of bpf_local_storage_elem */
- struct sock *owner; /* The object that owns the above "list" of
+ void *owner; /* The object that owns the above "list" of
* bpf_local_storage_elem.
*/
struct rcu_head rcu;
@@ -110,6 +110,33 @@ static int omem_charge(struct sock *sk, unsigned int size)
return -ENOMEM;
}
+static int mem_charge(struct bpf_local_storage_map *smap, void *owner, u32 size)
+{
+ struct bpf_map *map = &smap->map;
+
+ if (!map->ops->map_local_storage_charge)
+ return 0;
+
+ return map->ops->map_local_storage_charge(smap, owner, size);
+}
+
+static void mem_uncharge(struct bpf_local_storage_map *smap, void *owner,
+ u32 size)
+{
+ struct bpf_map *map = &smap->map;
+
+ if (map->ops->map_local_storage_uncharge)
+ map->ops->map_local_storage_uncharge(smap, owner, size);
+}
+
+static struct bpf_local_storage __rcu **
+owner_storage(struct bpf_local_storage_map *smap, void *owner)
+{
+ struct bpf_map *map = &smap->map;
+
+ return map->ops->map_owner_storage_ptr(owner);
+}
+
static bool selem_linked_to_storage(const struct bpf_local_storage_elem *selem)
{
return !hlist_unhashed(&selem->snode);
@@ -120,13 +147,13 @@ static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
return !hlist_unhashed(&selem->map_node);
}
-static struct bpf_local_storage_elem *
-bpf_selem_alloc(struct bpf_local_storage_map *smap, struct sock *sk,
- void *value, bool charge_omem)
+struct bpf_local_storage_elem *
+bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
+ void *value, bool charge_mem)
{
struct bpf_local_storage_elem *selem;
- if (charge_omem && omem_charge(sk, smap->elem_size))
+ if (charge_mem && mem_charge(smap, owner, smap->elem_size))
return NULL;
selem = kzalloc(smap->elem_size, GFP_ATOMIC | __GFP_NOWARN);
@@ -136,8 +163,8 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, struct sock *sk,
return selem;
}
- if (charge_omem)
- atomic_sub(smap->elem_size, &sk->sk_omem_alloc);
+ if (charge_mem)
+ mem_uncharge(smap, owner, smap->elem_size);
return NULL;
}
@@ -146,32 +173,32 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, struct sock *sk,
* The caller must ensure selem->smap is still valid to be
* dereferenced for its smap->elem_size and smap->cache_idx.
*/
-static bool
-bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
- struct bpf_local_storage_elem *selem,
- bool uncharge_omem)
+bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
+ struct bpf_local_storage_elem *selem,
+ bool uncharge_mem)
{
struct bpf_local_storage_map *smap;
bool free_local_storage;
- struct sock *sk;
+ void *owner;
smap = rcu_dereference(SDATA(selem)->smap);
- sk = local_storage->owner;
+ owner = local_storage->owner;
/* All uncharging on the owner must be done first.
* The owner may be freed once the last selem is unlinked
* from local_storage.
*/
- if (uncharge_omem)
- atomic_sub(smap->elem_size, &sk->sk_omem_alloc);
+ if (uncharge_mem)
+ mem_uncharge(smap, owner, smap->elem_size);
free_local_storage = hlist_is_singular_node(&selem->snode,
&local_storage->list);
if (free_local_storage) {
- atomic_sub(sizeof(struct bpf_local_storage), &sk->sk_omem_alloc);
+ mem_uncharge(smap, owner, sizeof(struct bpf_local_storage));
local_storage->owner = NULL;
- /* After this RCU_INIT, sk may be freed and cannot be used */
- RCU_INIT_POINTER(sk->sk_bpf_storage, NULL);
+
+ /* After this RCU_INIT, owner may be freed and cannot be used */
+ RCU_INIT_POINTER(*owner_storage(smap, owner), NULL);
/* local_storage is not freed now. local_storage->lock is
* still held and raw_spin_unlock_bh(&local_storage->lock)
@@ -209,23 +236,22 @@ static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem)
local_storage = rcu_dereference(selem->local_storage);
raw_spin_lock_bh(&local_storage->lock);
if (likely(selem_linked_to_storage(selem)))
- free_local_storage =
- bpf_selem_unlink_storage_nolock(local_storage, selem, true);
+ free_local_storage = bpf_selem_unlink_storage_nolock(
+ local_storage, selem, true);
raw_spin_unlock_bh(&local_storage->lock);
if (free_local_storage)
kfree_rcu(local_storage, rcu);
}
-static void
-bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
- struct bpf_local_storage_elem *selem)
+void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
+ struct bpf_local_storage_elem *selem)
{
RCU_INIT_POINTER(selem->local_storage, local_storage);
hlist_add_head(&selem->snode, &local_storage->list);
}
-static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
+void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
{
struct bpf_local_storage_map *smap;
struct bpf_local_storage_map_bucket *b;
@@ -242,8 +268,8 @@ static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
raw_spin_unlock_bh(&b->lock);
}
-static void bpf_selem_link_map(struct bpf_local_storage_map *smap,
- struct bpf_local_storage_elem *selem)
+void bpf_selem_link_map(struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *selem)
{
struct bpf_local_storage_map_bucket *b = select_bucket(smap, selem);
@@ -253,7 +279,7 @@ static void bpf_selem_link_map(struct bpf_local_storage_map *smap,
raw_spin_unlock_bh(&b->lock);
}
-static void bpf_selem_unlink(struct bpf_local_storage_elem *selem)
+void bpf_selem_unlink(struct bpf_local_storage_elem *selem)
{
/* Always unlink from map before unlinking from local_storage
* because selem will be freed after successfully unlinked from
@@ -263,7 +289,7 @@ static void bpf_selem_unlink(struct bpf_local_storage_elem *selem)
__bpf_selem_unlink_storage(selem);
}
-static struct bpf_local_storage_data *
+struct bpf_local_storage_data *
bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
struct bpf_local_storage_map *smap,
bool cacheit_lockit)
@@ -329,40 +355,45 @@ static int check_flags(const struct bpf_local_storage_data *old_sdata,
return 0;
}
-static int sk_storage_alloc(struct sock *sk,
+int bpf_local_storage_alloc(void *owner,
struct bpf_local_storage_map *smap,
struct bpf_local_storage_elem *first_selem)
{
- struct bpf_local_storage *prev_sk_storage, *sk_storage;
+ struct bpf_local_storage *prev_storage, *storage;
+ struct bpf_local_storage **owner_storage_ptr;
int err;
- err = omem_charge(sk, sizeof(*sk_storage));
+ err = mem_charge(smap, owner, sizeof(*storage));
if (err)
return err;
- sk_storage = kzalloc(sizeof(*sk_storage), GFP_ATOMIC | __GFP_NOWARN);
- if (!sk_storage) {
+ storage = kzalloc(sizeof(*storage), GFP_ATOMIC | __GFP_NOWARN);
+ if (!storage) {
err = -ENOMEM;
goto uncharge;
}
- INIT_HLIST_HEAD(&sk_storage->list);
- raw_spin_lock_init(&sk_storage->lock);
- sk_storage->owner = sk;
- bpf_selem_link_storage_nolock(sk_storage, first_selem);
+ INIT_HLIST_HEAD(&storage->list);
+ raw_spin_lock_init(&storage->lock);
+ storage->owner = owner;
+
+ bpf_selem_link_storage_nolock(storage, first_selem);
bpf_selem_link_map(smap, first_selem);
- /* Publish sk_storage to sk. sk->sk_lock cannot be acquired.
- * Hence, atomic ops is used to set sk->sk_bpf_storage
- * from NULL to the newly allocated sk_storage ptr.
+
+ owner_storage_ptr =
+ (struct bpf_local_storage **)owner_storage(smap, owner);
+ /* Publish storage to the owner.
+ * Instead of using any lock of the kernel object (i.e. owner),
+ * cmpxchg will work with any kernel object regardless what
+ * the running context is, bh, irq...etc.
*
- * From now on, the sk->sk_bpf_storage pointer is protected
- * by the sk_storage->lock. Hence, when freeing
- * the sk->sk_bpf_storage, the sk_storage->lock must
- * be held before setting sk->sk_bpf_storage to NULL.
+ * From now on, the owner->storage pointer (e.g. sk->sk_bpf_storage)
+ * is protected by the storage->lock. Hence, when freeing
+ * the owner->storage, the storage->lock must be held before
+ * setting owner->storage ptr to NULL.
*/
- prev_sk_storage = cmpxchg((struct bpf_local_storage **)&sk->sk_bpf_storage,
- NULL, sk_storage);
- if (unlikely(prev_sk_storage)) {
+ prev_storage = cmpxchg(owner_storage_ptr, NULL, storage);
+ if (unlikely(prev_storage)) {
bpf_selem_unlink_map(first_selem);
err = -EAGAIN;
goto uncharge;
@@ -380,8 +411,8 @@ static int sk_storage_alloc(struct sock *sk,
return 0;
uncharge:
- kfree(sk_storage);
- atomic_sub(sizeof(*sk_storage), &sk->sk_omem_alloc);
+ kfree(storage);
+ mem_uncharge(smap, owner, sizeof(*storage));
return err;
}
@@ -390,38 +421,37 @@ static int sk_storage_alloc(struct sock *sk,
* Otherwise, it will become a leak (and other memory issues
* during map destruction).
*/
-static struct bpf_local_storage_data *
-bpf_local_storage_update(struct sock *sk, struct bpf_map *map, void *value,
- u64 map_flags)
+struct bpf_local_storage_data *
+bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
+ void *value, u64 map_flags)
{
struct bpf_local_storage_data *old_sdata = NULL;
struct bpf_local_storage_elem *selem;
struct bpf_local_storage *local_storage;
- struct bpf_local_storage_map *smap;
int err;
/* BPF_EXIST and BPF_NOEXIST cannot be both set */
if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) ||
/* BPF_F_LOCK can only be used in a value with spin_lock */
- unlikely((map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
+ unlikely((map_flags & BPF_F_LOCK) &&
+ !map_value_has_spin_lock(&smap->map)))
return ERR_PTR(-EINVAL);
- smap = (struct bpf_local_storage_map *)map;
- local_storage = rcu_dereference(sk->sk_bpf_storage);
+ local_storage = rcu_dereference(*owner_storage(smap, owner));
if (!local_storage || hlist_empty(&local_storage->list)) {
/* Very first elem for the owner */
err = check_flags(NULL, map_flags);
if (err)
return ERR_PTR(err);
- selem = bpf_selem_alloc(smap, sk, value, true);
+ selem = bpf_selem_alloc(smap, owner, value, true);
if (!selem)
return ERR_PTR(-ENOMEM);
- err = sk_storage_alloc(sk, smap, selem);
+ err = bpf_local_storage_alloc(owner, smap, selem);
if (err) {
kfree(selem);
- atomic_sub(smap->elem_size, &sk->sk_omem_alloc);
+ mem_uncharge(smap, owner, smap->elem_size);
return ERR_PTR(err);
}
@@ -439,7 +469,7 @@ bpf_local_storage_update(struct sock *sk, struct bpf_map *map, void *value,
if (err)
return ERR_PTR(err);
if (old_sdata && selem_linked_to_storage(SELEM(old_sdata))) {
- copy_map_value_locked(map, old_sdata->data,
+ copy_map_value_locked(&smap->map, old_sdata->data,
value, false);
return old_sdata;
}
@@ -464,7 +494,8 @@ bpf_local_storage_update(struct sock *sk, struct bpf_map *map, void *value,
goto unlock_err;
if (old_sdata && (map_flags & BPF_F_LOCK)) {
- copy_map_value_locked(map, old_sdata->data, value, false);
+ copy_map_value_locked(&smap->map, old_sdata->data, value,
+ false);
selem = SELEM(old_sdata);
goto unlock;
}
@@ -478,7 +509,7 @@ bpf_local_storage_update(struct sock *sk, struct bpf_map *map, void *value,
* old_sdata will not be uncharged later during
* bpf_selem_unlink_storage_nolock().
*/
- selem = bpf_selem_alloc(smap, sk, value, !old_sdata);
+ selem = bpf_selem_alloc(smap, owner, value, !old_sdata);
if (!selem) {
err = -ENOMEM;
goto unlock_err;
@@ -591,17 +622,12 @@ void bpf_sk_storage_free(struct sock *sk)
kfree_rcu(sk_storage, rcu);
}
-static void bpf_local_storage_map_free(struct bpf_map *map)
+void bpf_local_storage_map_free(struct bpf_local_storage_map *smap)
{
struct bpf_local_storage_elem *selem;
- struct bpf_local_storage_map *smap;
struct bpf_local_storage_map_bucket *b;
unsigned int i;
- smap = (struct bpf_local_storage_map *)map;
-
- bpf_local_storage_cache_idx_free(&sk_cache, smap->cache_idx);
-
/* Note that this map might be concurrently cloned from
* bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone
* RCU read section to finish before proceeding. New RCU
@@ -646,7 +672,16 @@ static void bpf_local_storage_map_free(struct bpf_map *map)
synchronize_rcu();
kvfree(smap->buckets);
- kfree(map);
+ kfree(smap);
+}
+
+static void sk_storage_map_free(struct bpf_map *map)
+{
+ struct bpf_local_storage_map *smap;
+
+ smap = (struct bpf_local_storage_map *)map;
+ bpf_local_storage_cache_idx_free(&sk_cache, smap->cache_idx);
+ bpf_local_storage_map_free(smap);
}
/* U16_MAX is much more than enough for sk local storage
@@ -658,7 +693,7 @@ static void bpf_local_storage_map_free(struct bpf_map *map)
sizeof(struct bpf_local_storage_elem)), \
(U16_MAX - sizeof(struct bpf_local_storage_elem)))
-static int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
+int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
{
if (attr->map_flags & ~BPF_LOCAL_STORAGE_CREATE_FLAG_MASK ||
!(attr->map_flags & BPF_F_NO_PREALLOC) ||
@@ -677,7 +712,7 @@ static int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
return 0;
}
-static struct bpf_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
+struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
{
struct bpf_local_storage_map *smap;
unsigned int i;
@@ -717,8 +752,19 @@ static struct bpf_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
smap->elem_size =
sizeof(struct bpf_local_storage_elem) + attr->value_size;
- smap->cache_idx = bpf_local_storage_cache_idx_get(&sk_cache);
+ return smap;
+}
+
+static struct bpf_map *sk_storage_map_alloc(union bpf_attr *attr)
+{
+ struct bpf_local_storage_map *smap;
+
+ smap = bpf_local_storage_map_alloc(attr);
+ if (IS_ERR(smap))
+ return ERR_CAST(smap);
+
+ smap->cache_idx = bpf_local_storage_cache_idx_get(&sk_cache);
return &smap->map;
}
@@ -728,10 +774,10 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key,
return -ENOTSUPP;
}
-static int bpf_local_storage_map_check_btf(const struct bpf_map *map,
- const struct btf *btf,
- const struct btf_type *key_type,
- const struct btf_type *value_type)
+int bpf_local_storage_map_check_btf(const struct bpf_map *map,
+ const struct btf *btf,
+ const struct btf_type *key_type,
+ const struct btf_type *value_type)
{
u32 int_data;
@@ -772,8 +818,9 @@ static int bpf_fd_sk_storage_update_elem(struct bpf_map *map, void *key,
fd = *(int *)key;
sock = sockfd_lookup(fd, &err);
if (sock) {
- sdata = bpf_local_storage_update(sock->sk, map, value,
- map_flags);
+ sdata = bpf_local_storage_update(
+ sock->sk, (struct bpf_local_storage_map *)map, value,
+ map_flags);
sockfd_put(sock);
return PTR_ERR_OR_ZERO(sdata);
}
@@ -862,7 +909,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
bpf_selem_link_map(smap, copy_selem);
bpf_selem_link_storage_nolock(new_sk_storage, copy_selem);
} else {
- ret = sk_storage_alloc(newsk, smap, copy_selem);
+ ret = bpf_local_storage_alloc(newsk, smap, copy_selem);
if (ret) {
kfree(copy_selem);
atomic_sub(smap->elem_size,
@@ -906,7 +953,9 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
* destruction).
*/
refcount_inc_not_zero(&sk->sk_refcnt)) {
- sdata = bpf_local_storage_update(sk, map, value, BPF_NOEXIST);
+ sdata = bpf_local_storage_update(
+ sk, (struct bpf_local_storage_map *)map, value,
+ BPF_NOEXIST);
/* sk must be a fullsock (guaranteed by verifier),
* so sock_gen_put() is unnecessary.
*/
@@ -931,11 +980,33 @@ BPF_CALL_2(bpf_sk_storage_delete, struct bpf_map *, map, struct sock *, sk)
return -ENOENT;
}
+static int sk_storage_charge(struct bpf_local_storage_map *smap,
+ void *owner, u32 size)
+{
+ return omem_charge(owner, size);
+}
+
+static void sk_storage_uncharge(struct bpf_local_storage_map *smap,
+ void *owner, u32 size)
+{
+ struct sock *sk = owner;
+
+ atomic_sub(size, &sk->sk_omem_alloc);
+}
+
+static struct bpf_local_storage __rcu **
+sk_storage_ptr(void *owner)
+{
+ struct sock *sk = owner;
+
+ return &sk->sk_bpf_storage;
+}
+
static int sk_storage_map_btf_id;
const struct bpf_map_ops sk_storage_map_ops = {
.map_alloc_check = bpf_local_storage_map_alloc_check,
- .map_alloc = bpf_local_storage_map_alloc,
- .map_free = bpf_local_storage_map_free,
+ .map_alloc = sk_storage_map_alloc,
+ .map_free = sk_storage_map_free,
.map_get_next_key = notsupp_get_next_key,
.map_lookup_elem = bpf_fd_sk_storage_lookup_elem,
.map_update_elem = bpf_fd_sk_storage_update_elem,
@@ -943,6 +1014,9 @@ const struct bpf_map_ops sk_storage_map_ops = {
.map_check_btf = bpf_local_storage_map_check_btf,
.map_btf_name = "bpf_local_storage_map",
.map_btf_id = &sk_storage_map_btf_id,
+ .map_local_storage_charge = sk_storage_charge,
+ .map_local_storage_uncharge = sk_storage_uncharge,
+ .map_owner_storage_ptr = sk_storage_ptr,
};
const struct bpf_func_proto bpf_sk_storage_get_proto = {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index a1bbaff7a0af..f1a4a476586b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3648,9 +3648,13 @@ enum {
BPF_F_SYSCTL_BASE_NAME = (1ULL << 0),
};
-/* BPF_FUNC_sk_storage_get flags */
+/* BPF_FUNC_<kernel_obj>_storage_get flags */
enum {
- BPF_SK_STORAGE_GET_F_CREATE = (1ULL << 0),
+ BPF_LOCAL_STORAGE_GET_F_CREATE = (1ULL << 0),
+ /* BPF_SK_STORAGE_GET_F_CREATE is only kept for backward compatibility
+ * and BPF_LOCAL_STORAGE_GET_F_CREATE must be used instead.
+ */
+ BPF_SK_STORAGE_GET_F_CREATE = BPF_LOCAL_STORAGE_GET_F_CREATE,
};
/* BPF_FUNC_read_branch_records flags. */
--
2.28.0.297.g1956fa8f8d-goog
^ permalink raw reply related
* [PATCH bpf-next v9 1/7] bpf: Renames in preparation for bpf_local_storage
From: KP Singh @ 2020-08-23 16:56 UTC (permalink / raw)
To: linux-kernel, bpf, linux-security-module
Cc: Martin KaFai Lau, Alexei Starovoitov, Daniel Borkmann,
Paul Turner, Jann Horn, Florent Revest
In-Reply-To: <20200823165612.404892-1-kpsingh@chromium.org>
From: KP Singh <kpsingh@google.com>
A purely mechanical change to split the renaming from the actual
generalization.
Flags/consts:
SK_STORAGE_CREATE_FLAG_MASK BPF_LOCAL_STORAGE_CREATE_FLAG_MASK
BPF_SK_STORAGE_CACHE_SIZE BPF_LOCAL_STORAGE_CACHE_SIZE
MAX_VALUE_SIZE BPF_LOCAL_STORAGE_MAX_VALUE_SIZE
Structs:
bucket bpf_local_storage_map_bucket
bpf_sk_storage_map bpf_local_storage_map
bpf_sk_storage_data bpf_local_storage_data
bpf_sk_storage_elem bpf_local_storage_elem
bpf_sk_storage bpf_local_storage
The "sk" member in bpf_local_storage is also updated to "owner"
in preparation for changing the type to void * in a subsequent patch.
Functions:
selem_linked_to_sk selem_linked_to_storage
selem_alloc bpf_selem_alloc
__selem_unlink_sk bpf_selem_unlink_storage_nolock
__selem_link_sk bpf_selem_link_storage_nolock
selem_unlink_sk __bpf_selem_unlink_storage
sk_storage_update bpf_local_storage_update
__sk_storage_lookup bpf_local_storage_lookup
bpf_sk_storage_map_free bpf_local_storage_map_free
bpf_sk_storage_map_alloc bpf_local_storage_map_alloc
bpf_sk_storage_map_alloc_check bpf_local_storage_map_alloc_check
bpf_sk_storage_map_check_btf bpf_local_storage_map_check_btf
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: KP Singh <kpsingh@google.com>
---
include/net/sock.h | 4 +-
net/core/bpf_sk_storage.c | 488 +++++++++++++++++++-------------------
2 files changed, 252 insertions(+), 240 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 064637d1ddf6..18423cc9cde8 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -246,7 +246,7 @@ struct sock_common {
/* public: */
};
-struct bpf_sk_storage;
+struct bpf_local_storage;
/**
* struct sock - network layer representation of sockets
@@ -517,7 +517,7 @@ struct sock {
void (*sk_destruct)(struct sock *sk);
struct sock_reuseport __rcu *sk_reuseport_cb;
#ifdef CONFIG_BPF_SYSCALL
- struct bpf_sk_storage __rcu *sk_bpf_storage;
+ struct bpf_local_storage __rcu *sk_bpf_storage;
#endif
struct rcu_head sk_rcu;
};
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 281200dc0a01..f975e2d01207 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -12,33 +12,32 @@
#include <uapi/linux/sock_diag.h>
#include <uapi/linux/btf.h>
-#define SK_STORAGE_CREATE_FLAG_MASK \
- (BPF_F_NO_PREALLOC | BPF_F_CLONE)
+#define BPF_LOCAL_STORAGE_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_CLONE)
-struct bucket {
+struct bpf_local_storage_map_bucket {
struct hlist_head list;
raw_spinlock_t lock;
};
-/* Thp map is not the primary owner of a bpf_sk_storage_elem.
- * Instead, the sk->sk_bpf_storage is.
+/* Thp map is not the primary owner of a bpf_local_storage_elem.
+ * Instead, the container object (eg. sk->sk_bpf_storage) is.
*
- * The map (bpf_sk_storage_map) is for two purposes
- * 1. Define the size of the "sk local storage". It is
+ * The map (bpf_local_storage_map) is for two purposes
+ * 1. Define the size of the "local storage". It is
* the map's value_size.
*
* 2. Maintain a list to keep track of all elems such
* that they can be cleaned up during the map destruction.
*
* When a bpf local storage is being looked up for a
- * particular sk, the "bpf_map" pointer is actually used
+ * particular object, the "bpf_map" pointer is actually used
* as the "key" to search in the list of elem in
- * sk->sk_bpf_storage.
+ * the respective bpf_local_storage owned by the object.
*
- * Hence, consider sk->sk_bpf_storage is the mini-map
- * with the "bpf_map" pointer as the searching key.
+ * e.g. sk->sk_bpf_storage is the mini-map with the "bpf_map" pointer
+ * as the searching key.
*/
-struct bpf_sk_storage_map {
+struct bpf_local_storage_map {
struct bpf_map map;
/* Lookup elem does not require accessing the map.
*
@@ -46,55 +45,57 @@ struct bpf_sk_storage_map {
* link/unlink the elem from the map. Having
* multiple buckets to improve contention.
*/
- struct bucket *buckets;
+ struct bpf_local_storage_map_bucket *buckets;
u32 bucket_log;
u16 elem_size;
u16 cache_idx;
};
-struct bpf_sk_storage_data {
+struct bpf_local_storage_data {
/* smap is used as the searching key when looking up
- * from sk->sk_bpf_storage.
+ * from the object's bpf_local_storage.
*
* Put it in the same cacheline as the data to minimize
* the number of cachelines access during the cache hit case.
*/
- struct bpf_sk_storage_map __rcu *smap;
+ struct bpf_local_storage_map __rcu *smap;
u8 data[] __aligned(8);
};
-/* Linked to bpf_sk_storage and bpf_sk_storage_map */
-struct bpf_sk_storage_elem {
- struct hlist_node map_node; /* Linked to bpf_sk_storage_map */
- struct hlist_node snode; /* Linked to bpf_sk_storage */
- struct bpf_sk_storage __rcu *sk_storage;
+/* Linked to bpf_local_storage and bpf_local_storage_map */
+struct bpf_local_storage_elem {
+ struct hlist_node map_node; /* Linked to bpf_local_storage_map */
+ struct hlist_node snode; /* Linked to bpf_local_storage */
+ struct bpf_local_storage __rcu *local_storage;
struct rcu_head rcu;
/* 8 bytes hole */
/* The data is stored in aother cacheline to minimize
* the number of cachelines access during a cache hit.
*/
- struct bpf_sk_storage_data sdata ____cacheline_aligned;
+ struct bpf_local_storage_data sdata ____cacheline_aligned;
};
-#define SELEM(_SDATA) container_of((_SDATA), struct bpf_sk_storage_elem, sdata)
+#define SELEM(_SDATA) \
+ container_of((_SDATA), struct bpf_local_storage_elem, sdata)
#define SDATA(_SELEM) (&(_SELEM)->sdata)
-#define BPF_SK_STORAGE_CACHE_SIZE 16
+#define BPF_LOCAL_STORAGE_CACHE_SIZE 16
static DEFINE_SPINLOCK(cache_idx_lock);
-static u64 cache_idx_usage_counts[BPF_SK_STORAGE_CACHE_SIZE];
+static u64 cache_idx_usage_counts[BPF_LOCAL_STORAGE_CACHE_SIZE];
-struct bpf_sk_storage {
- struct bpf_sk_storage_data __rcu *cache[BPF_SK_STORAGE_CACHE_SIZE];
- struct hlist_head list; /* List of bpf_sk_storage_elem */
- struct sock *sk; /* The sk that owns the the above "list" of
- * bpf_sk_storage_elem.
+struct bpf_local_storage {
+ struct bpf_local_storage_data __rcu *cache[BPF_LOCAL_STORAGE_CACHE_SIZE];
+ struct hlist_head list; /* List of bpf_local_storage_elem */
+ struct sock *owner; /* The object that owns the above "list" of
+ * bpf_local_storage_elem.
*/
struct rcu_head rcu;
raw_spinlock_t lock; /* Protect adding/removing from the "list" */
};
-static struct bucket *select_bucket(struct bpf_sk_storage_map *smap,
- struct bpf_sk_storage_elem *selem)
+static struct bpf_local_storage_map_bucket *
+select_bucket(struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *selem)
{
return &smap->buckets[hash_ptr(selem, smap->bucket_log)];
}
@@ -111,21 +112,21 @@ static int omem_charge(struct sock *sk, unsigned int size)
return -ENOMEM;
}
-static bool selem_linked_to_sk(const struct bpf_sk_storage_elem *selem)
+static bool selem_linked_to_storage(const struct bpf_local_storage_elem *selem)
{
return !hlist_unhashed(&selem->snode);
}
-static bool selem_linked_to_map(const struct bpf_sk_storage_elem *selem)
+static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
{
return !hlist_unhashed(&selem->map_node);
}
-static struct bpf_sk_storage_elem *selem_alloc(struct bpf_sk_storage_map *smap,
- struct sock *sk, void *value,
- bool charge_omem)
+static struct bpf_local_storage_elem *
+bpf_selem_alloc(struct bpf_local_storage_map *smap, struct sock *sk,
+ void *value, bool charge_omem)
{
- struct bpf_sk_storage_elem *selem;
+ struct bpf_local_storage_elem *selem;
if (charge_omem && omem_charge(sk, smap->elem_size))
return NULL;
@@ -143,89 +144,93 @@ static struct bpf_sk_storage_elem *selem_alloc(struct bpf_sk_storage_map *smap,
return NULL;
}
-/* sk_storage->lock must be held and selem->sk_storage == sk_storage.
+/* local_storage->lock must be held and selem->local_storage == local_storage.
* The caller must ensure selem->smap is still valid to be
* dereferenced for its smap->elem_size and smap->cache_idx.
*/
-static bool __selem_unlink_sk(struct bpf_sk_storage *sk_storage,
- struct bpf_sk_storage_elem *selem,
- bool uncharge_omem)
+static bool
+bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
+ struct bpf_local_storage_elem *selem,
+ bool uncharge_omem)
{
- struct bpf_sk_storage_map *smap;
- bool free_sk_storage;
+ struct bpf_local_storage_map *smap;
+ bool free_local_storage;
struct sock *sk;
smap = rcu_dereference(SDATA(selem)->smap);
- sk = sk_storage->sk;
+ sk = local_storage->owner;
- /* All uncharging on sk->sk_omem_alloc must be done first.
- * sk may be freed once the last selem is unlinked from sk_storage.
+ /* All uncharging on the owner must be done first.
+ * The owner may be freed once the last selem is unlinked
+ * from local_storage.
*/
if (uncharge_omem)
atomic_sub(smap->elem_size, &sk->sk_omem_alloc);
- free_sk_storage = hlist_is_singular_node(&selem->snode,
- &sk_storage->list);
- if (free_sk_storage) {
- atomic_sub(sizeof(struct bpf_sk_storage), &sk->sk_omem_alloc);
- sk_storage->sk = NULL;
+ free_local_storage = hlist_is_singular_node(&selem->snode,
+ &local_storage->list);
+ if (free_local_storage) {
+ atomic_sub(sizeof(struct bpf_local_storage), &sk->sk_omem_alloc);
+ local_storage->owner = NULL;
/* After this RCU_INIT, sk may be freed and cannot be used */
RCU_INIT_POINTER(sk->sk_bpf_storage, NULL);
- /* sk_storage is not freed now. sk_storage->lock is
- * still held and raw_spin_unlock_bh(&sk_storage->lock)
+ /* local_storage is not freed now. local_storage->lock is
+ * still held and raw_spin_unlock_bh(&local_storage->lock)
* will be done by the caller.
*
* Although the unlock will be done under
* rcu_read_lock(), it is more intutivie to
- * read if kfree_rcu(sk_storage, rcu) is done
- * after the raw_spin_unlock_bh(&sk_storage->lock).
+ * read if kfree_rcu(local_storage, rcu) is done
+ * after the raw_spin_unlock_bh(&local_storage->lock).
*
- * Hence, a "bool free_sk_storage" is returned
+ * Hence, a "bool free_local_storage" is returned
* to the caller which then calls the kfree_rcu()
* after unlock.
*/
}
hlist_del_init_rcu(&selem->snode);
- if (rcu_access_pointer(sk_storage->cache[smap->cache_idx]) ==
+ if (rcu_access_pointer(local_storage->cache[smap->cache_idx]) ==
SDATA(selem))
- RCU_INIT_POINTER(sk_storage->cache[smap->cache_idx], NULL);
+ RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
kfree_rcu(selem, rcu);
- return free_sk_storage;
+ return free_local_storage;
}
-static void selem_unlink_sk(struct bpf_sk_storage_elem *selem)
+static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem)
{
- struct bpf_sk_storage *sk_storage;
- bool free_sk_storage = false;
+ struct bpf_local_storage *local_storage;
+ bool free_local_storage = false;
- if (unlikely(!selem_linked_to_sk(selem)))
+ if (unlikely(!selem_linked_to_storage(selem)))
/* selem has already been unlinked from sk */
return;
- sk_storage = rcu_dereference(selem->sk_storage);
- raw_spin_lock_bh(&sk_storage->lock);
- if (likely(selem_linked_to_sk(selem)))
- free_sk_storage = __selem_unlink_sk(sk_storage, selem, true);
- raw_spin_unlock_bh(&sk_storage->lock);
+ local_storage = rcu_dereference(selem->local_storage);
+ raw_spin_lock_bh(&local_storage->lock);
+ if (likely(selem_linked_to_storage(selem)))
+ free_local_storage =
+ bpf_selem_unlink_storage_nolock(local_storage, selem, true);
+ raw_spin_unlock_bh(&local_storage->lock);
- if (free_sk_storage)
- kfree_rcu(sk_storage, rcu);
+ if (free_local_storage)
+ kfree_rcu(local_storage, rcu);
}
-static void __selem_link_sk(struct bpf_sk_storage *sk_storage,
- struct bpf_sk_storage_elem *selem)
+static void
+bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
+ struct bpf_local_storage_elem *selem)
{
- RCU_INIT_POINTER(selem->sk_storage, sk_storage);
- hlist_add_head(&selem->snode, &sk_storage->list);
+ RCU_INIT_POINTER(selem->local_storage, local_storage);
+ hlist_add_head(&selem->snode, &local_storage->list);
}
-static void selem_unlink_map(struct bpf_sk_storage_elem *selem)
+static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
{
- struct bpf_sk_storage_map *smap;
- struct bucket *b;
+ struct bpf_local_storage_map *smap;
+ struct bpf_local_storage_map_bucket *b;
if (unlikely(!selem_linked_to_map(selem)))
/* selem has already be unlinked from smap */
@@ -239,10 +244,10 @@ static void selem_unlink_map(struct bpf_sk_storage_elem *selem)
raw_spin_unlock_bh(&b->lock);
}
-static void selem_link_map(struct bpf_sk_storage_map *smap,
- struct bpf_sk_storage_elem *selem)
+static void bpf_selem_link_map(struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *selem)
{
- struct bucket *b = select_bucket(smap, selem);
+ struct bpf_local_storage_map_bucket *b = select_bucket(smap, selem);
raw_spin_lock_bh(&b->lock);
RCU_INIT_POINTER(SDATA(selem)->smap, smap);
@@ -250,31 +255,31 @@ static void selem_link_map(struct bpf_sk_storage_map *smap,
raw_spin_unlock_bh(&b->lock);
}
-static void selem_unlink(struct bpf_sk_storage_elem *selem)
+static void bpf_selem_unlink(struct bpf_local_storage_elem *selem)
{
- /* Always unlink from map before unlinking from sk_storage
+ /* Always unlink from map before unlinking from local_storage
* because selem will be freed after successfully unlinked from
- * the sk_storage.
+ * the local_storage.
*/
- selem_unlink_map(selem);
- selem_unlink_sk(selem);
+ bpf_selem_unlink_map(selem);
+ __bpf_selem_unlink_storage(selem);
}
-static struct bpf_sk_storage_data *
-__sk_storage_lookup(struct bpf_sk_storage *sk_storage,
- struct bpf_sk_storage_map *smap,
- bool cacheit_lockit)
+static struct bpf_local_storage_data *
+bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
+ struct bpf_local_storage_map *smap,
+ bool cacheit_lockit)
{
- struct bpf_sk_storage_data *sdata;
- struct bpf_sk_storage_elem *selem;
+ struct bpf_local_storage_data *sdata;
+ struct bpf_local_storage_elem *selem;
/* Fast path (cache hit) */
- sdata = rcu_dereference(sk_storage->cache[smap->cache_idx]);
+ sdata = rcu_dereference(local_storage->cache[smap->cache_idx]);
if (sdata && rcu_access_pointer(sdata->smap) == smap)
return sdata;
/* Slow path (cache miss) */
- hlist_for_each_entry_rcu(selem, &sk_storage->list, snode)
+ hlist_for_each_entry_rcu(selem, &local_storage->list, snode)
if (rcu_access_pointer(SDATA(selem)->smap) == smap)
break;
@@ -286,33 +291,33 @@ __sk_storage_lookup(struct bpf_sk_storage *sk_storage,
/* spinlock is needed to avoid racing with the
* parallel delete. Otherwise, publishing an already
* deleted sdata to the cache will become a use-after-free
- * problem in the next __sk_storage_lookup().
+ * problem in the next bpf_local_storage_lookup().
*/
- raw_spin_lock_bh(&sk_storage->lock);
- if (selem_linked_to_sk(selem))
- rcu_assign_pointer(sk_storage->cache[smap->cache_idx],
+ raw_spin_lock_bh(&local_storage->lock);
+ if (selem_linked_to_storage(selem))
+ rcu_assign_pointer(local_storage->cache[smap->cache_idx],
sdata);
- raw_spin_unlock_bh(&sk_storage->lock);
+ raw_spin_unlock_bh(&local_storage->lock);
}
return sdata;
}
-static struct bpf_sk_storage_data *
+static struct bpf_local_storage_data *
sk_storage_lookup(struct sock *sk, struct bpf_map *map, bool cacheit_lockit)
{
- struct bpf_sk_storage *sk_storage;
- struct bpf_sk_storage_map *smap;
+ struct bpf_local_storage *sk_storage;
+ struct bpf_local_storage_map *smap;
sk_storage = rcu_dereference(sk->sk_bpf_storage);
if (!sk_storage)
return NULL;
- smap = (struct bpf_sk_storage_map *)map;
- return __sk_storage_lookup(sk_storage, smap, cacheit_lockit);
+ smap = (struct bpf_local_storage_map *)map;
+ return bpf_local_storage_lookup(sk_storage, smap, cacheit_lockit);
}
-static int check_flags(const struct bpf_sk_storage_data *old_sdata,
+static int check_flags(const struct bpf_local_storage_data *old_sdata,
u64 map_flags)
{
if (old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_NOEXIST)
@@ -327,10 +332,10 @@ static int check_flags(const struct bpf_sk_storage_data *old_sdata,
}
static int sk_storage_alloc(struct sock *sk,
- struct bpf_sk_storage_map *smap,
- struct bpf_sk_storage_elem *first_selem)
+ struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *first_selem)
{
- struct bpf_sk_storage *prev_sk_storage, *sk_storage;
+ struct bpf_local_storage *prev_sk_storage, *sk_storage;
int err;
err = omem_charge(sk, sizeof(*sk_storage));
@@ -344,10 +349,10 @@ static int sk_storage_alloc(struct sock *sk,
}
INIT_HLIST_HEAD(&sk_storage->list);
raw_spin_lock_init(&sk_storage->lock);
- sk_storage->sk = sk;
+ sk_storage->owner = sk;
- __selem_link_sk(sk_storage, first_selem);
- selem_link_map(smap, first_selem);
+ bpf_selem_link_storage_nolock(sk_storage, first_selem);
+ bpf_selem_link_map(smap, first_selem);
/* Publish sk_storage to sk. sk->sk_lock cannot be acquired.
* Hence, atomic ops is used to set sk->sk_bpf_storage
* from NULL to the newly allocated sk_storage ptr.
@@ -357,17 +362,17 @@ static int sk_storage_alloc(struct sock *sk,
* the sk->sk_bpf_storage, the sk_storage->lock must
* be held before setting sk->sk_bpf_storage to NULL.
*/
- prev_sk_storage = cmpxchg((struct bpf_sk_storage **)&sk->sk_bpf_storage,
+ prev_sk_storage = cmpxchg((struct bpf_local_storage **)&sk->sk_bpf_storage,
NULL, sk_storage);
if (unlikely(prev_sk_storage)) {
- selem_unlink_map(first_selem);
+ bpf_selem_unlink_map(first_selem);
err = -EAGAIN;
goto uncharge;
/* Note that even first_selem was linked to smap's
* bucket->list, first_selem can be freed immediately
* (instead of kfree_rcu) because
- * bpf_sk_storage_map_free() does a
+ * bpf_local_storage_map_free() does a
* synchronize_rcu() before walking the bucket->list.
* Hence, no one is accessing selem from the
* bucket->list under rcu_read_lock().
@@ -387,15 +392,14 @@ static int sk_storage_alloc(struct sock *sk,
* Otherwise, it will become a leak (and other memory issues
* during map destruction).
*/
-static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk,
- struct bpf_map *map,
- void *value,
- u64 map_flags)
+static struct bpf_local_storage_data *
+bpf_local_storage_update(struct sock *sk, struct bpf_map *map, void *value,
+ u64 map_flags)
{
- struct bpf_sk_storage_data *old_sdata = NULL;
- struct bpf_sk_storage_elem *selem;
- struct bpf_sk_storage *sk_storage;
- struct bpf_sk_storage_map *smap;
+ struct bpf_local_storage_data *old_sdata = NULL;
+ struct bpf_local_storage_elem *selem;
+ struct bpf_local_storage *local_storage;
+ struct bpf_local_storage_map *smap;
int err;
/* BPF_EXIST and BPF_NOEXIST cannot be both set */
@@ -404,15 +408,15 @@ static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk,
unlikely((map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
return ERR_PTR(-EINVAL);
- smap = (struct bpf_sk_storage_map *)map;
- sk_storage = rcu_dereference(sk->sk_bpf_storage);
- if (!sk_storage || hlist_empty(&sk_storage->list)) {
- /* Very first elem for this sk */
+ smap = (struct bpf_local_storage_map *)map;
+ local_storage = rcu_dereference(sk->sk_bpf_storage);
+ if (!local_storage || hlist_empty(&local_storage->list)) {
+ /* Very first elem for the owner */
err = check_flags(NULL, map_flags);
if (err)
return ERR_PTR(err);
- selem = selem_alloc(smap, sk, value, true);
+ selem = bpf_selem_alloc(smap, sk, value, true);
if (!selem)
return ERR_PTR(-ENOMEM);
@@ -428,25 +432,26 @@ static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk,
if ((map_flags & BPF_F_LOCK) && !(map_flags & BPF_NOEXIST)) {
/* Hoping to find an old_sdata to do inline update
- * such that it can avoid taking the sk_storage->lock
+ * such that it can avoid taking the local_storage->lock
* and changing the lists.
*/
- old_sdata = __sk_storage_lookup(sk_storage, smap, false);
+ old_sdata =
+ bpf_local_storage_lookup(local_storage, smap, false);
err = check_flags(old_sdata, map_flags);
if (err)
return ERR_PTR(err);
- if (old_sdata && selem_linked_to_sk(SELEM(old_sdata))) {
+ if (old_sdata && selem_linked_to_storage(SELEM(old_sdata))) {
copy_map_value_locked(map, old_sdata->data,
value, false);
return old_sdata;
}
}
- raw_spin_lock_bh(&sk_storage->lock);
+ raw_spin_lock_bh(&local_storage->lock);
- /* Recheck sk_storage->list under sk_storage->lock */
- if (unlikely(hlist_empty(&sk_storage->list))) {
- /* A parallel del is happening and sk_storage is going
+ /* Recheck local_storage->list under local_storage->lock */
+ if (unlikely(hlist_empty(&local_storage->list))) {
+ /* A parallel del is happening and local_storage is going
* away. It has just been checked before, so very
* unlikely. Return instead of retry to keep things
* simple.
@@ -455,7 +460,7 @@ static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk,
goto unlock_err;
}
- old_sdata = __sk_storage_lookup(sk_storage, smap, false);
+ old_sdata = bpf_local_storage_lookup(local_storage, smap, false);
err = check_flags(old_sdata, map_flags);
if (err)
goto unlock_err;
@@ -466,50 +471,52 @@ static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk,
goto unlock;
}
- /* sk_storage->lock is held. Hence, we are sure
+ /* local_storage->lock is held. Hence, we are sure
* we can unlink and uncharge the old_sdata successfully
* later. Hence, instead of charging the new selem now
* and then uncharge the old selem later (which may cause
* a potential but unnecessary charge failure), avoid taking
* a charge at all here (the "!old_sdata" check) and the
- * old_sdata will not be uncharged later during __selem_unlink_sk().
+ * old_sdata will not be uncharged later during
+ * bpf_selem_unlink_storage_nolock().
*/
- selem = selem_alloc(smap, sk, value, !old_sdata);
+ selem = bpf_selem_alloc(smap, sk, value, !old_sdata);
if (!selem) {
err = -ENOMEM;
goto unlock_err;
}
/* First, link the new selem to the map */
- selem_link_map(smap, selem);
+ bpf_selem_link_map(smap, selem);
- /* Second, link (and publish) the new selem to sk_storage */
- __selem_link_sk(sk_storage, selem);
+ /* Second, link (and publish) the new selem to local_storage */
+ bpf_selem_link_storage_nolock(local_storage, selem);
/* Third, remove old selem, SELEM(old_sdata) */
if (old_sdata) {
- selem_unlink_map(SELEM(old_sdata));
- __selem_unlink_sk(sk_storage, SELEM(old_sdata), false);
+ bpf_selem_unlink_map(SELEM(old_sdata));
+ bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata),
+ false);
}
unlock:
- raw_spin_unlock_bh(&sk_storage->lock);
+ raw_spin_unlock_bh(&local_storage->lock);
return SDATA(selem);
unlock_err:
- raw_spin_unlock_bh(&sk_storage->lock);
+ raw_spin_unlock_bh(&local_storage->lock);
return ERR_PTR(err);
}
static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
{
- struct bpf_sk_storage_data *sdata;
+ struct bpf_local_storage_data *sdata;
sdata = sk_storage_lookup(sk, map, false);
if (!sdata)
return -ENOENT;
- selem_unlink(SELEM(sdata));
+ bpf_selem_unlink(SELEM(sdata));
return 0;
}
@@ -521,7 +528,7 @@ static u16 cache_idx_get(void)
spin_lock(&cache_idx_lock);
- for (i = 0; i < BPF_SK_STORAGE_CACHE_SIZE; i++) {
+ for (i = 0; i < BPF_LOCAL_STORAGE_CACHE_SIZE; i++) {
if (cache_idx_usage_counts[i] < min_usage) {
min_usage = cache_idx_usage_counts[i];
res = i;
@@ -548,8 +555,8 @@ static void cache_idx_free(u16 idx)
/* Called by __sk_destruct() & bpf_sk_storage_clone() */
void bpf_sk_storage_free(struct sock *sk)
{
- struct bpf_sk_storage_elem *selem;
- struct bpf_sk_storage *sk_storage;
+ struct bpf_local_storage_elem *selem;
+ struct bpf_local_storage *sk_storage;
bool free_sk_storage = false;
struct hlist_node *n;
@@ -565,7 +572,7 @@ void bpf_sk_storage_free(struct sock *sk)
* Thus, no elem can be added-to or deleted-from the
* sk_storage->list by the bpf_prog or by the bpf-map's syscall.
*
- * It is racing with bpf_sk_storage_map_free() alone
+ * It is racing with bpf_local_storage_map_free() alone
* when unlinking elem from the sk_storage->list and
* the map's bucket->list.
*/
@@ -574,8 +581,9 @@ void bpf_sk_storage_free(struct sock *sk)
/* Always unlink from map before unlinking from
* sk_storage.
*/
- selem_unlink_map(selem);
- free_sk_storage = __selem_unlink_sk(sk_storage, selem, true);
+ bpf_selem_unlink_map(selem);
+ free_sk_storage = bpf_selem_unlink_storage_nolock(sk_storage,
+ selem, true);
}
raw_spin_unlock_bh(&sk_storage->lock);
rcu_read_unlock();
@@ -584,14 +592,14 @@ void bpf_sk_storage_free(struct sock *sk)
kfree_rcu(sk_storage, rcu);
}
-static void bpf_sk_storage_map_free(struct bpf_map *map)
+static void bpf_local_storage_map_free(struct bpf_map *map)
{
- struct bpf_sk_storage_elem *selem;
- struct bpf_sk_storage_map *smap;
- struct bucket *b;
+ struct bpf_local_storage_elem *selem;
+ struct bpf_local_storage_map *smap;
+ struct bpf_local_storage_map_bucket *b;
unsigned int i;
- smap = (struct bpf_sk_storage_map *)map;
+ smap = (struct bpf_local_storage_map *)map;
cache_idx_free(smap->cache_idx);
@@ -604,10 +612,10 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
/* bpf prog and the userspace can no longer access this map
* now. No new selem (of this map) can be added
- * to the sk->sk_bpf_storage or to the map bucket's list.
+ * to the owner->storage or to the map bucket's list.
*
* The elem of this map can be cleaned up here
- * or
+ * or when the storage is freed e.g.
* by bpf_sk_storage_free() during __sk_destruct().
*/
for (i = 0; i < (1U << smap->bucket_log); i++) {
@@ -615,26 +623,26 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
rcu_read_lock();
/* No one is adding to b->list now */
- while ((selem = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(&b->list)),
- struct bpf_sk_storage_elem,
- map_node))) {
- selem_unlink(selem);
+ while ((selem = hlist_entry_safe(
+ rcu_dereference_raw(hlist_first_rcu(&b->list)),
+ struct bpf_local_storage_elem, map_node))) {
+ bpf_selem_unlink(selem);
cond_resched_rcu();
}
rcu_read_unlock();
}
- /* bpf_sk_storage_free() may still need to access the map.
- * e.g. bpf_sk_storage_free() has unlinked selem from the map
+ /* While freeing the storage we may still need to access the map.
+ *
+ * e.g. when bpf_sk_storage_free() has unlinked selem from the map
* which then made the above while((selem = ...)) loop
- * exited immediately.
+ * exit immediately.
*
- * However, the bpf_sk_storage_free() still needs to access
- * the smap->elem_size to do the uncharging in
- * __selem_unlink_sk().
+ * However, while freeing the storage one still needs to access the
+ * smap->elem_size to do the uncharging in
+ * bpf_selem_unlink_storage_nolock().
*
- * Hence, wait another rcu grace period for the
- * bpf_sk_storage_free() to finish.
+ * Hence, wait another rcu grace period for the storage to be freed.
*/
synchronize_rcu();
@@ -645,14 +653,15 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
/* U16_MAX is much more than enough for sk local storage
* considering a tcp_sock is ~2k.
*/
-#define MAX_VALUE_SIZE \
+#define BPF_LOCAL_STORAGE_MAX_VALUE_SIZE \
min_t(u32, \
- (KMALLOC_MAX_SIZE - MAX_BPF_STACK - sizeof(struct bpf_sk_storage_elem)), \
- (U16_MAX - sizeof(struct bpf_sk_storage_elem)))
+ (KMALLOC_MAX_SIZE - MAX_BPF_STACK - \
+ sizeof(struct bpf_local_storage_elem)), \
+ (U16_MAX - sizeof(struct bpf_local_storage_elem)))
-static int bpf_sk_storage_map_alloc_check(union bpf_attr *attr)
+static int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
{
- if (attr->map_flags & ~SK_STORAGE_CREATE_FLAG_MASK ||
+ if (attr->map_flags & ~BPF_LOCAL_STORAGE_CREATE_FLAG_MASK ||
!(attr->map_flags & BPF_F_NO_PREALLOC) ||
attr->max_entries ||
attr->key_size != sizeof(int) || !attr->value_size ||
@@ -663,15 +672,15 @@ static int bpf_sk_storage_map_alloc_check(union bpf_attr *attr)
if (!bpf_capable())
return -EPERM;
- if (attr->value_size > MAX_VALUE_SIZE)
+ if (attr->value_size > BPF_LOCAL_STORAGE_MAX_VALUE_SIZE)
return -E2BIG;
return 0;
}
-static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
+static struct bpf_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
{
- struct bpf_sk_storage_map *smap;
+ struct bpf_local_storage_map *smap;
unsigned int i;
u32 nbuckets;
u64 cost;
@@ -707,7 +716,8 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
raw_spin_lock_init(&smap->buckets[i].lock);
}
- smap->elem_size = sizeof(struct bpf_sk_storage_elem) + attr->value_size;
+ smap->elem_size =
+ sizeof(struct bpf_local_storage_elem) + attr->value_size;
smap->cache_idx = cache_idx_get();
return &smap->map;
@@ -719,10 +729,10 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key,
return -ENOTSUPP;
}
-static int bpf_sk_storage_map_check_btf(const struct bpf_map *map,
- const struct btf *btf,
- const struct btf_type *key_type,
- const struct btf_type *value_type)
+static int bpf_local_storage_map_check_btf(const struct bpf_map *map,
+ const struct btf *btf,
+ const struct btf_type *key_type,
+ const struct btf_type *value_type)
{
u32 int_data;
@@ -738,7 +748,7 @@ static int bpf_sk_storage_map_check_btf(const struct bpf_map *map,
static void *bpf_fd_sk_storage_lookup_elem(struct bpf_map *map, void *key)
{
- struct bpf_sk_storage_data *sdata;
+ struct bpf_local_storage_data *sdata;
struct socket *sock;
int fd, err;
@@ -756,14 +766,15 @@ static void *bpf_fd_sk_storage_lookup_elem(struct bpf_map *map, void *key)
static int bpf_fd_sk_storage_update_elem(struct bpf_map *map, void *key,
void *value, u64 map_flags)
{
- struct bpf_sk_storage_data *sdata;
+ struct bpf_local_storage_data *sdata;
struct socket *sock;
int fd, err;
fd = *(int *)key;
sock = sockfd_lookup(fd, &err);
if (sock) {
- sdata = sk_storage_update(sock->sk, map, value, map_flags);
+ sdata = bpf_local_storage_update(sock->sk, map, value,
+ map_flags);
sockfd_put(sock);
return PTR_ERR_OR_ZERO(sdata);
}
@@ -787,14 +798,14 @@ static int bpf_fd_sk_storage_delete_elem(struct bpf_map *map, void *key)
return err;
}
-static struct bpf_sk_storage_elem *
+static struct bpf_local_storage_elem *
bpf_sk_storage_clone_elem(struct sock *newsk,
- struct bpf_sk_storage_map *smap,
- struct bpf_sk_storage_elem *selem)
+ struct bpf_local_storage_map *smap,
+ struct bpf_local_storage_elem *selem)
{
- struct bpf_sk_storage_elem *copy_selem;
+ struct bpf_local_storage_elem *copy_selem;
- copy_selem = selem_alloc(smap, newsk, NULL, true);
+ copy_selem = bpf_selem_alloc(smap, newsk, NULL, true);
if (!copy_selem)
return NULL;
@@ -810,9 +821,9 @@ bpf_sk_storage_clone_elem(struct sock *newsk,
int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
{
- struct bpf_sk_storage *new_sk_storage = NULL;
- struct bpf_sk_storage *sk_storage;
- struct bpf_sk_storage_elem *selem;
+ struct bpf_local_storage *new_sk_storage = NULL;
+ struct bpf_local_storage *sk_storage;
+ struct bpf_local_storage_elem *selem;
int ret = 0;
RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
@@ -824,8 +835,8 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
goto out;
hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
- struct bpf_sk_storage_elem *copy_selem;
- struct bpf_sk_storage_map *smap;
+ struct bpf_local_storage_elem *copy_selem;
+ struct bpf_local_storage_map *smap;
struct bpf_map *map;
smap = rcu_dereference(SDATA(selem)->smap);
@@ -833,7 +844,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
continue;
/* Note that for lockless listeners adding new element
- * here can race with cleanup in bpf_sk_storage_map_free.
+ * here can race with cleanup in bpf_local_storage_map_free.
* Try to grab map refcnt to make sure that it's still
* alive and prevent concurrent removal.
*/
@@ -849,8 +860,8 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
}
if (new_sk_storage) {
- selem_link_map(smap, copy_selem);
- __selem_link_sk(new_sk_storage, copy_selem);
+ bpf_selem_link_map(smap, copy_selem);
+ bpf_selem_link_storage_nolock(new_sk_storage, copy_selem);
} else {
ret = sk_storage_alloc(newsk, smap, copy_selem);
if (ret) {
@@ -861,7 +872,8 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
goto out;
}
- new_sk_storage = rcu_dereference(copy_selem->sk_storage);
+ new_sk_storage =
+ rcu_dereference(copy_selem->local_storage);
}
bpf_map_put(map);
}
@@ -879,7 +891,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
void *, value, u64, flags)
{
- struct bpf_sk_storage_data *sdata;
+ struct bpf_local_storage_data *sdata;
if (flags > BPF_SK_STORAGE_GET_F_CREATE)
return (unsigned long)NULL;
@@ -895,7 +907,7 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
* destruction).
*/
refcount_inc_not_zero(&sk->sk_refcnt)) {
- sdata = sk_storage_update(sk, map, value, BPF_NOEXIST);
+ sdata = bpf_local_storage_update(sk, map, value, BPF_NOEXIST);
/* sk must be a fullsock (guaranteed by verifier),
* so sock_gen_put() is unnecessary.
*/
@@ -922,15 +934,15 @@ BPF_CALL_2(bpf_sk_storage_delete, struct bpf_map *, map, struct sock *, sk)
static int sk_storage_map_btf_id;
const struct bpf_map_ops sk_storage_map_ops = {
- .map_alloc_check = bpf_sk_storage_map_alloc_check,
- .map_alloc = bpf_sk_storage_map_alloc,
- .map_free = bpf_sk_storage_map_free,
+ .map_alloc_check = bpf_local_storage_map_alloc_check,
+ .map_alloc = bpf_local_storage_map_alloc,
+ .map_free = bpf_local_storage_map_free,
.map_get_next_key = notsupp_get_next_key,
.map_lookup_elem = bpf_fd_sk_storage_lookup_elem,
.map_update_elem = bpf_fd_sk_storage_update_elem,
.map_delete_elem = bpf_fd_sk_storage_delete_elem,
- .map_check_btf = bpf_sk_storage_map_check_btf,
- .map_btf_name = "bpf_sk_storage_map",
+ .map_check_btf = bpf_local_storage_map_check_btf,
+ .map_btf_name = "bpf_local_storage_map",
.map_btf_id = &sk_storage_map_btf_id,
};
@@ -1022,7 +1034,7 @@ bpf_sk_storage_diag_alloc(const struct nlattr *nla_stgs)
u32 nr_maps = 0;
int rem, err;
- /* bpf_sk_storage_map is currently limited to CAP_SYS_ADMIN as
+ /* bpf_local_storage_map is currently limited to CAP_SYS_ADMIN as
* the map_alloc_check() side also does.
*/
if (!bpf_capable())
@@ -1072,13 +1084,13 @@ bpf_sk_storage_diag_alloc(const struct nlattr *nla_stgs)
}
EXPORT_SYMBOL_GPL(bpf_sk_storage_diag_alloc);
-static int diag_get(struct bpf_sk_storage_data *sdata, struct sk_buff *skb)
+static int diag_get(struct bpf_local_storage_data *sdata, struct sk_buff *skb)
{
struct nlattr *nla_stg, *nla_value;
- struct bpf_sk_storage_map *smap;
+ struct bpf_local_storage_map *smap;
/* It cannot exceed max nlattr's payload */
- BUILD_BUG_ON(U16_MAX - NLA_HDRLEN < MAX_VALUE_SIZE);
+ BUILD_BUG_ON(U16_MAX - NLA_HDRLEN < BPF_LOCAL_STORAGE_MAX_VALUE_SIZE);
nla_stg = nla_nest_start(skb, SK_DIAG_BPF_STORAGE);
if (!nla_stg)
@@ -1114,9 +1126,9 @@ static int bpf_sk_storage_diag_put_all(struct sock *sk, struct sk_buff *skb,
{
/* stg_array_type (e.g. INET_DIAG_BPF_SK_STORAGES) */
unsigned int diag_size = nla_total_size(0);
- struct bpf_sk_storage *sk_storage;
- struct bpf_sk_storage_elem *selem;
- struct bpf_sk_storage_map *smap;
+ struct bpf_local_storage *sk_storage;
+ struct bpf_local_storage_elem *selem;
+ struct bpf_local_storage_map *smap;
struct nlattr *nla_stgs;
unsigned int saved_len;
int err = 0;
@@ -1169,8 +1181,8 @@ int bpf_sk_storage_diag_put(struct bpf_sk_storage_diag *diag,
{
/* stg_array_type (e.g. INET_DIAG_BPF_SK_STORAGES) */
unsigned int diag_size = nla_total_size(0);
- struct bpf_sk_storage *sk_storage;
- struct bpf_sk_storage_data *sdata;
+ struct bpf_local_storage *sk_storage;
+ struct bpf_local_storage_data *sdata;
struct nlattr *nla_stgs;
unsigned int saved_len;
int err = 0;
@@ -1197,8 +1209,8 @@ int bpf_sk_storage_diag_put(struct bpf_sk_storage_diag *diag,
saved_len = skb->len;
for (i = 0; i < diag->nr_maps; i++) {
- sdata = __sk_storage_lookup(sk_storage,
- (struct bpf_sk_storage_map *)diag->maps[i],
+ sdata = bpf_local_storage_lookup(sk_storage,
+ (struct bpf_local_storage_map *)diag->maps[i],
false);
if (!sdata)
@@ -1235,19 +1247,19 @@ struct bpf_iter_seq_sk_storage_map_info {
unsigned skip_elems;
};
-static struct bpf_sk_storage_elem *
+static struct bpf_local_storage_elem *
bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info,
- struct bpf_sk_storage_elem *prev_selem)
+ struct bpf_local_storage_elem *prev_selem)
{
- struct bpf_sk_storage *sk_storage;
- struct bpf_sk_storage_elem *selem;
+ struct bpf_local_storage *sk_storage;
+ struct bpf_local_storage_elem *selem;
u32 skip_elems = info->skip_elems;
- struct bpf_sk_storage_map *smap;
+ struct bpf_local_storage_map *smap;
u32 bucket_id = info->bucket_id;
u32 i, count, n_buckets;
- struct bucket *b;
+ struct bpf_local_storage_map_bucket *b;
- smap = (struct bpf_sk_storage_map *)info->map;
+ smap = (struct bpf_local_storage_map *)info->map;
n_buckets = 1U << smap->bucket_log;
if (bucket_id >= n_buckets)
return NULL;
@@ -1257,7 +1269,7 @@ bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info,
count = 0;
while (selem) {
selem = hlist_entry_safe(selem->map_node.next,
- struct bpf_sk_storage_elem, map_node);
+ struct bpf_local_storage_elem, map_node);
if (!selem) {
/* not found, unlock and go to the next bucket */
b = &smap->buckets[bucket_id++];
@@ -1265,7 +1277,7 @@ bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info,
skip_elems = 0;
break;
}
- sk_storage = rcu_dereference_raw(selem->sk_storage);
+ sk_storage = rcu_dereference_raw(selem->local_storage);
if (sk_storage) {
info->skip_elems = skip_elems + count;
return selem;
@@ -1278,7 +1290,7 @@ bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info,
raw_spin_lock_bh(&b->lock);
count = 0;
hlist_for_each_entry(selem, &b->list, map_node) {
- sk_storage = rcu_dereference_raw(selem->sk_storage);
+ sk_storage = rcu_dereference_raw(selem->local_storage);
if (sk_storage && count >= skip_elems) {
info->bucket_id = i;
info->skip_elems = count;
@@ -1297,7 +1309,7 @@ bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info,
static void *bpf_sk_storage_map_seq_start(struct seq_file *seq, loff_t *pos)
{
- struct bpf_sk_storage_elem *selem;
+ struct bpf_local_storage_elem *selem;
selem = bpf_sk_storage_map_seq_find_next(seq->private, NULL);
if (!selem)
@@ -1330,11 +1342,11 @@ DEFINE_BPF_ITER_FUNC(bpf_sk_storage_map, struct bpf_iter_meta *meta,
void *value)
static int __bpf_sk_storage_map_seq_show(struct seq_file *seq,
- struct bpf_sk_storage_elem *selem)
+ struct bpf_local_storage_elem *selem)
{
struct bpf_iter_seq_sk_storage_map_info *info = seq->private;
struct bpf_iter__bpf_sk_storage_map ctx = {};
- struct bpf_sk_storage *sk_storage;
+ struct bpf_local_storage *sk_storage;
struct bpf_iter_meta meta;
struct bpf_prog *prog;
int ret = 0;
@@ -1345,8 +1357,8 @@ static int __bpf_sk_storage_map_seq_show(struct seq_file *seq,
ctx.meta = &meta;
ctx.map = info->map;
if (selem) {
- sk_storage = rcu_dereference_raw(selem->sk_storage);
- ctx.sk = sk_storage->sk;
+ sk_storage = rcu_dereference_raw(selem->local_storage);
+ ctx.sk = sk_storage->owner;
ctx.value = SDATA(selem)->data;
}
ret = bpf_iter_run_prog(prog, &ctx);
@@ -1363,13 +1375,13 @@ static int bpf_sk_storage_map_seq_show(struct seq_file *seq, void *v)
static void bpf_sk_storage_map_seq_stop(struct seq_file *seq, void *v)
{
struct bpf_iter_seq_sk_storage_map_info *info = seq->private;
- struct bpf_sk_storage_map *smap;
- struct bucket *b;
+ struct bpf_local_storage_map *smap;
+ struct bpf_local_storage_map_bucket *b;
if (!v) {
(void)__bpf_sk_storage_map_seq_show(seq, v);
} else {
- smap = (struct bpf_sk_storage_map *)info->map;
+ smap = (struct bpf_local_storage_map *)info->map;
b = &smap->buckets[info->bucket_id];
raw_spin_unlock_bh(&b->lock);
}
--
2.28.0.297.g1956fa8f8d-goog
^ permalink raw reply related
* [PATCH bpf-next v9 0/7] Generalizing bpf_local_storage
From: KP Singh @ 2020-08-23 16:56 UTC (permalink / raw)
To: linux-kernel, bpf, linux-security-module
Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
Paul Turner, Jann Horn, Florent Revest
From: KP Singh <kpsingh@google.com>
# v8 -> v9
- Fixed reference count logic for files for inode maps.
- Other fixes suggested by Martin
- Rebase
# v7 -> v8
- Fixed an issue with BTF IDs for helpers and added
bpf_<>_storage_delete to selftests to catch this issue.
- Update comments about refcounts and grabbed a refcount to the open
file for userspace inode helpers.
- Rebase.
# v6 -> v7
- Updated the series to use Martin's POC patch:
https://lore.kernel.org/bpf/20200725013047.4006241-1-kafai@fb.com/
I added a Co-developed-by: tag, but would need Martin's Signoff
(was not sure of the procedure here).
- Rebase.
# v5 -> v6
- Fixed a build warning.
- Rebase.
# v4 -> v5
- Split non-functional changes into separate commits.
- Updated the cache macros to be simpler.
- Fixed some bugs noticed by Martin.
- Updated the userspace map functions to use an fd for lookups, updates
and deletes.
- Rebase.
# v3 -> v4
- Fixed a missing include to bpf_sk_storage.h in bpf_sk_storage.c
- Fixed some functions that were not marked as static which led to
W=1 compilation warnings.
# v2 -> v3
* Restructured the code as per Martin's suggestions:
- Common functionality in bpf_local_storage.c
- bpf_sk_storage functionality remains in net/bpf_sk_storage.
- bpf_inode_storage is kept separate as it is enabled only with
CONFIG_BPF_LSM.
* A separate cache for inode and sk storage with macros to define it.
* Use the ops style approach as suggested by Martin instead of the
enum + switch style.
* Added the inode map to bpftool bash completion and docs.
* Rebase and indentation fixes.
# v1 -> v2
* Use the security blob pointer instead of dedicated member in
struct inode.
* Better code re-use as suggested by Alexei.
* Dropped the inode count arithmetic as pointed out by Alexei.
* Minor bug fixes and rebase.
bpf_sk_storage can already be used by some BPF program types to annotate
socket objects. These annotations are managed with the life-cycle of the
object (i.e. freed when the object is freed) which makes BPF programs
much simpler and less prone to errors and leaks.
This patch series:
* Generalizes the bpf_sk_storage infrastructure to allow easy
implementation of local storage for other objects
* Implements local storage for inodes
* Makes both bpf_{sk, inode}_storage available to LSM programs.
Local storage is safe to use in LSM programs as the attachment sites are
limited and the owning object won't be freed, however, this is not the
case for tracing. Usage in tracing is expected to follow a white-list
based approach similar to the d_path helper
(https://lore.kernel.org/bpf/20200506132946.2164578-1-jolsa@kernel.org).
Access to local storage would allow LSM programs to implement stateful
detections like detecting the unlink of a running executable from the
examples shared as a part of the KRSI series
https://lore.kernel.org/bpf/20200329004356.27286-1-kpsingh@chromium.org/
and
https://github.com/sinkap/linux-krsi/blob/patch/v1/examples/samples/bpf/lsm_detect_exec_unlink.c
KP Singh (7):
bpf: Renames in preparation for bpf_local_storage
bpf: Generalize caching for sk_storage.
bpf: Generalize bpf_sk_storage
bpf: Split bpf_local_storage to bpf_sk_storage
bpf: Implement bpf_local_storage for inodes
bpf: Allow local storage to be used from LSM programs
bpf: Add selftests for local_storage
include/linux/bpf.h | 8 +
include/linux/bpf_local_storage.h | 163 ++++
include/linux/bpf_lsm.h | 29 +
include/linux/bpf_types.h | 3 +
include/net/bpf_sk_storage.h | 14 +
include/net/sock.h | 4 +-
include/uapi/linux/bpf.h | 53 +-
kernel/bpf/Makefile | 2 +
kernel/bpf/bpf_inode_storage.c | 265 ++++++
kernel/bpf/bpf_local_storage.c | 600 +++++++++++++
kernel/bpf/bpf_lsm.c | 21 +-
kernel/bpf/syscall.c | 3 +-
kernel/bpf/verifier.c | 10 +
net/core/bpf_sk_storage.c | 830 +++---------------
security/bpf/hooks.c | 7 +
.../bpf/bpftool/Documentation/bpftool-map.rst | 2 +-
tools/bpf/bpftool/bash-completion/bpftool | 3 +-
tools/bpf/bpftool/map.c | 3 +-
tools/include/uapi/linux/bpf.h | 53 +-
tools/lib/bpf/libbpf_probes.c | 5 +-
.../bpf/prog_tests/test_local_storage.c | 60 ++
.../selftests/bpf/progs/local_storage.c | 140 +++
22 files changed, 1566 insertions(+), 712 deletions(-)
create mode 100644 include/linux/bpf_local_storage.h
create mode 100644 kernel/bpf/bpf_inode_storage.c
create mode 100644 kernel/bpf/bpf_local_storage.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_local_storage.c
create mode 100644 tools/testing/selftests/bpf/progs/local_storage.c
--
2.28.0.297.g1956fa8f8d-goog
^ permalink raw reply
* [PATCH] security: fix some spelling mistakes in the comments by codespell
From: Xiaoming Ni @ 2020-08-22 3:05 UTC (permalink / raw)
To: serge, jmorris, zohar, dmitry.kasatkin, casey, takedakn,
penguin-kernel, linux-security-module, linux-kernel,
linux-integrity
Cc: wangle6, nixiaoming
ecurity/commoncap.c: capabilties ==> capabilities
security/lsm_audit.c: programers ==> programmers
security/tomoyo/audit.c: stuct ==> struct
security/tomoyo/common.c: Poiter ==> Pointer
security/smack/smack_lsm.c: overriden ==> overridden overridden
security/smack/smackfs.c: overriden ==> overridden
security/integrity/ima/ima_template_lib.c: algoritm ==> algorithm
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
---
security/commoncap.c | 2 +-
security/integrity/ima/ima_template_lib.c | 2 +-
security/lsm_audit.c | 2 +-
security/smack/smack_lsm.c | 2 +-
security/smack/smackfs.c | 2 +-
security/tomoyo/audit.c | 2 +-
security/tomoyo/common.c | 2 +-
7 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/security/commoncap.c b/security/commoncap.c
index 59bf3c1674c8..2c3a0f1b6876 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -1046,7 +1046,7 @@ int cap_task_fix_setuid(struct cred *new, const struct cred *old, int flags)
break;
case LSM_SETID_FS:
- /* juggle the capabilties to follow FSUID changes, unless
+ /* juggle the capabilities to follow FSUID changes, unless
* otherwise suppressed
*
* FIXME - is fsuser used for all CAP_FS_MASK capabilities?
diff --git a/security/integrity/ima/ima_template_lib.c b/security/integrity/ima/ima_template_lib.c
index c022ee9e2a4e..9513564ee0b2 100644
--- a/security/integrity/ima/ima_template_lib.c
+++ b/security/integrity/ima/ima_template_lib.c
@@ -231,7 +231,7 @@ static int ima_eventdigest_init_common(const u8 *digest, u32 digestsize,
* digest formats:
* - DATA_FMT_DIGEST: digest
* - DATA_FMT_DIGEST_WITH_ALGO: [<hash algo>] + ':' + '\0' + digest,
- * where <hash algo> is provided if the hash algoritm is not
+ * where <hash algo> is provided if the hash algorithm is not
* SHA1 or MD5
*/
u8 buffer[CRYPTO_MAX_ALG_NAME + 2 + IMA_MAX_DIGEST_SIZE] = { 0 };
diff --git a/security/lsm_audit.c b/security/lsm_audit.c
index 53d0d183db8f..a0ff2e441069 100644
--- a/security/lsm_audit.c
+++ b/security/lsm_audit.c
@@ -212,7 +212,7 @@ static void dump_common_audit_data(struct audit_buffer *ab,
char comm[sizeof(current->comm)];
/*
- * To keep stack sizes in check force programers to notice if they
+ * To keep stack sizes in check force programmers to notice if they
* start making this union too large! See struct lsm_network_audit
* as an example of how to deal with large data.
*/
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 8c0893eb5aa8..960d4641239c 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -1784,7 +1784,7 @@ static int smack_file_send_sigiotask(struct task_struct *tsk,
*/
file = container_of(fown, struct file, f_owner);
- /* we don't log here as rc can be overriden */
+ /* we don't log here as rc can be overridden */
blob = smack_file(file);
skp = *blob;
rc = smk_access(skp, tkp, MAY_DELIVER, NULL);
diff --git a/security/smack/smackfs.c b/security/smack/smackfs.c
index 9c4308077574..5c581ec814ac 100644
--- a/security/smack/smackfs.c
+++ b/security/smack/smackfs.c
@@ -111,7 +111,7 @@ struct smack_known *smack_syslog_label;
/*
* Ptrace current rule
* SMACK_PTRACE_DEFAULT regular smack ptrace rules (/proc based)
- * SMACK_PTRACE_EXACT labels must match, but can be overriden with
+ * SMACK_PTRACE_EXACT labels must match, but can be overridden with
* CAP_SYS_PTRACE
* SMACK_PTRACE_DRACONIAN lables must match, CAP_SYS_PTRACE has no effect
*/
diff --git a/security/tomoyo/audit.c b/security/tomoyo/audit.c
index 3c96e8402e94..b51bad121c11 100644
--- a/security/tomoyo/audit.c
+++ b/security/tomoyo/audit.c
@@ -311,7 +311,7 @@ static LIST_HEAD(tomoyo_log);
/* Lock for "struct list_head tomoyo_log". */
static DEFINE_SPINLOCK(tomoyo_log_lock);
-/* Length of "stuct list_head tomoyo_log". */
+/* Length of "struct list_head tomoyo_log". */
static unsigned int tomoyo_log_count;
/**
diff --git a/security/tomoyo/common.c b/security/tomoyo/common.c
index 4bee32bfe16d..38bdc0ffc312 100644
--- a/security/tomoyo/common.c
+++ b/security/tomoyo/common.c
@@ -2608,7 +2608,7 @@ ssize_t tomoyo_read_control(struct tomoyo_io_buffer *head, char __user *buffer,
/**
* tomoyo_parse_policy - Parse a policy line.
*
- * @head: Poiter to "struct tomoyo_io_buffer".
+ * @head: Pointer to "struct tomoyo_io_buffer".
* @line: Line to parse.
*
* Returns 0 on success, negative value otherwise.
--
2.27.0
^ permalink raw reply related
* [PATCH] SELinux: Measure state and hash of policy using IMA
From: Lakshmi Ramasubramanian @ 2020-08-22 1:00 UTC (permalink / raw)
To: zohar, stephen.smalley.work, casey
Cc: tyhicks, tusharsu, sashal, jmorris, linux-integrity, selinux,
linux-security-module, linux-kernel
Critical data structures of security modules are currently not measured.
Therefore an attestation service, for instance, would not be able to
attest whether the security modules are always operating with the policies
and configuration that the system administrator had setup. The policies
and configuration for the security modules could be tampered with by
rogue user mode agents or modified through some inadvertent actions on
the system. Measuring such critical data would enable an attestation
service to reliably assess the security configuration of the system.
SELinux configuration and policy are some of the critical data for this
security module that needs to be measured. This measurement can be used
by an attestation service, for instance, to verify if the configuration
and policies have been setup correctly and that they haven't been tampered
with at runtime.
Measure SELinux configuration, policy capabilities settings, and the hash
of the loaded policy by calling the IMA hook ima_measure_critical_data().
Since the size of the loaded policy can be quite large, hash of the policy
is measured instead of the entire policy to avoid bloating the IMA log.
Enable early boot measurement for SELinux in IMA since SELinux
initializes its state and policy before custom IMA policy is loaded.
Sample measurement of SELinux state and hash of the policy:
10 e32e...5ac3 ima-buf sha256:86e8...4594 selinux-state-1595389364:287899386 696e697469616c697a65643d313b656e61626c65643d313b656e666f7263696e673d303b636865636b72657170726f743d313b6e6574776f726b5f706565725f636f6e74726f6c733d313b6f70656e5f7065726d733d313b657874656e6465645f736f636b65745f636c6173733d313b616c776179735f636865636b5f6e6574776f726b3d303b6367726f75705f7365636c6162656c3d313b6e6e705f6e6f737569645f7472616e736974696f6e3d313b67656e66735f7365636c6162656c5f73796d6c696e6b733d303
10 9e81...0857 ima-buf sha256:4941...68fc selinux-policy-hash-1597335667:462051628 8d1d...1834
To verify the measurement check the following:
Execute the following command to extract the measured data
from the IMA log for SELinux configuration (selinux-state).
grep -m 1 "selinux-state" /sys/kernel/security/integrity/ima/ascii_runtime_measurements | cut -d' ' -f 6 | xxd -r -p
The output should be the list of key-value pairs. For example,
initialized=1;enabled=1;enforcing=0;checkreqprot=1;network_peer_controls=1;open_perms=1;extended_socket_class=1;always_check_network=0;cgroup_seclabel=1;nnp_nosuid_transition=1;genfs_seclabel_symlinks=0;
To verify the measured data with the current SELinux state:
=> enabled should be set to 1 if /sys/fs/selinux folder exists,
0 otherwise
For other entries, compare the integer value in the files
=> /sys/fs/selinux/enforce
=> /sys/fs/selinux/checkreqprot
And, each of the policy capabilities files under
=> /sys/fs/selinux/policy_capabilities
For selinux-policy-hash, the hash of SELinux policy is included
in the IMA log entry.
To verify the measured data with the current SELinux policy run
the following commands and verify the output hash values match.
sha256sum /sys/fs/selinux/policy | cut -d' ' -f 1
grep -m 1 "selinux-policy-hash" /sys/kernel/security/integrity/ima/ascii_runtime_measurements | cut -d' ' -f 6
This patch is dependent on the following patch series:
https://patchwork.kernel.org/patch/11709527/
https://patchwork.kernel.org/patch/11730193/
https://patchwork.kernel.org/patch/11730757/
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Suggested-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Reported-by: kernel test robot <lkp@intel.com> # error: implicit declaration of function 'vfree'
Reported-by: kernel test robot <lkp@intel.com> # error: implicit declaration of function 'crypto_alloc_shash'
Reported-by: kernel test robot <lkp@intel.com> # sparse: symbol 'security_read_selinux_policy' was not declared. Should it be static?
---
security/integrity/ima/Kconfig | 3 +-
security/selinux/Makefile | 2 +
security/selinux/hooks.c | 1 +
security/selinux/include/security.h | 15 +++
security/selinux/measure.c | 149 ++++++++++++++++++++++++++++
security/selinux/selinuxfs.c | 3 +
security/selinux/ss/services.c | 71 +++++++++++--
7 files changed, 233 insertions(+), 11 deletions(-)
create mode 100644 security/selinux/measure.c
diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
index bc2adab7bae2..e4fb1761d64a 100644
--- a/security/integrity/ima/Kconfig
+++ b/security/integrity/ima/Kconfig
@@ -324,8 +324,7 @@ config IMA_MEASURE_ASYMMETRIC_KEYS
config IMA_QUEUE_EARLY_BOOT_DATA
bool
- depends on IMA_MEASURE_ASYMMETRIC_KEYS
- depends on SYSTEM_TRUSTED_KEYRING
+ depends on SECURITY_SELINUX || (IMA_MEASURE_ASYMMETRIC_KEYS && SYSTEM_TRUSTED_KEYRING)
default y
config IMA_SECURE_AND_OR_TRUSTED_BOOT
diff --git a/security/selinux/Makefile b/security/selinux/Makefile
index 4d8e0e8adf0b..83d512116341 100644
--- a/security/selinux/Makefile
+++ b/security/selinux/Makefile
@@ -16,6 +16,8 @@ selinux-$(CONFIG_NETLABEL) += netlabel.o
selinux-$(CONFIG_SECURITY_INFINIBAND) += ibpkey.o
+selinux-$(CONFIG_IMA) += measure.o
+
ccflags-y := -I$(srctree)/security/selinux -I$(srctree)/security/selinux/include
$(addprefix $(obj)/,$(selinux-y)): $(obj)/flask.h
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index efa6108b1ce9..5521dfc1900b 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -7394,6 +7394,7 @@ int selinux_disable(struct selinux_state *state)
}
selinux_mark_disabled(state);
+ selinux_measure_state(state);
pr_info("SELinux: Disabled at runtime.\n");
diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
index b0e02cfe3ce1..77f42d9b544b 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -222,16 +222,31 @@ static inline bool selinux_policycap_genfs_seclabel_symlinks(void)
return state->policycap[POLICYDB_CAPABILITY_GENFS_SECLABEL_SYMLINKS];
}
+static inline bool selinux_checkreqprot(const struct selinux_state *state)
+{
+ return READ_ONCE(state->checkreqprot);
+}
+
int security_mls_enabled(struct selinux_state *state);
int security_load_policy(struct selinux_state *state,
void *data, size_t len);
int security_read_policy(struct selinux_state *state,
void **data, size_t *len);
+int security_read_policy_kernel(struct selinux_state *state,
+ void **data, size_t *len);
size_t security_policydb_len(struct selinux_state *state);
int security_policycap_supported(struct selinux_state *state,
unsigned int req_cap);
+#ifdef CONFIG_IMA
+extern void selinux_measure_state(struct selinux_state *selinux_state);
+#else
+static inline void selinux_measure_state(struct selinux_state *selinux_state)
+{
+}
+#endif
+
#define SEL_VEC_MAX 32
struct av_decision {
u32 allowed;
diff --git a/security/selinux/measure.c b/security/selinux/measure.c
new file mode 100644
index 000000000000..33594c92f35c
--- /dev/null
+++ b/security/selinux/measure.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Measure SELinux state using IMA subsystem.
+ */
+#include <linux/vmalloc.h>
+#include <linux/ktime.h>
+#include <linux/ima.h>
+#include "security.h"
+
+/*
+ * This function creates an unique name by appending the timestamp to
+ * the given string. This string is passed as "event name" to the IMA
+ * hook to measure the given SELinux data.
+ *
+ * The data provided by SELinux to the IMA subsystem for measuring may have
+ * already been measured (for instance the same state existed earlier).
+ * But for SELinux the current data represents a state change and hence
+ * needs to be measured again. To enable this, pass an unique "Event Name"
+ * to the IMA hook so that IMA subsystem will always measure the given data.
+ *
+ * For example,
+ * At time T0 SELinux data to be measured is "foo". IMA measures it.
+ * At time T1 the data is changed to "bar". IMA measures it.
+ * At time T2 the data is changed to "foo" again. IMA will not measure it
+ * (since it was already measured) unless the event name, for instance,
+ * is different in this call.
+ */
+static char *selinux_event_name(const char *name_prefix)
+{
+ char *event_name = NULL;
+ struct timespec64 cur_time;
+
+ ktime_get_real_ts64(&cur_time);
+ event_name = kasprintf(GFP_KERNEL, "%s-%lld:%09ld", name_prefix,
+ cur_time.tv_sec, cur_time.tv_nsec);
+ if (!event_name) {
+ pr_err("%s: event name not allocated.\n", __func__);
+ return NULL;
+ }
+
+ return event_name;
+}
+
+static int read_selinux_state(char **state_str, int *state_str_len,
+ struct selinux_state *state)
+{
+ char *buf, *str_fmt = "%s=%d;";
+ int i, buf_len, curr;
+ bool initialized = selinux_initialized(state);
+ bool enabled = !selinux_disabled(state);
+ bool enforcing = enforcing_enabled(state);
+ bool checkreqprot = selinux_checkreqprot(state);
+
+ buf_len = snprintf(NULL, 0, str_fmt, "initialized", initialized);
+ buf_len += snprintf(NULL, 0, str_fmt, "enabled", enabled);
+ buf_len += snprintf(NULL, 0, str_fmt, "enforcing", enforcing);
+ buf_len += snprintf(NULL, 0, str_fmt, "checkreqprot", checkreqprot);
+
+ for (i = 0; i < __POLICYDB_CAPABILITY_MAX; i++) {
+ buf_len += snprintf(NULL, 0, str_fmt,
+ selinux_policycap_names[i],
+ state->policycap[i]);
+ }
+ ++buf_len;
+
+ buf = kzalloc(buf_len, GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ curr = snprintf(buf, buf_len, str_fmt,
+ "initialized", initialized);
+ curr += snprintf((buf + curr), (buf_len - curr), str_fmt,
+ "enabled", enabled);
+ curr += snprintf((buf + curr), (buf_len - curr), str_fmt,
+ "enforcing", enforcing);
+ curr += snprintf((buf + curr), (buf_len - curr), str_fmt,
+ "checkreqprot", checkreqprot);
+
+ for (i = 0; i < __POLICYDB_CAPABILITY_MAX; i++) {
+ curr += snprintf((buf + curr), (buf_len - curr), str_fmt,
+ selinux_policycap_names[i],
+ state->policycap[i]);
+ }
+
+ *state_str = buf;
+ *state_str_len = curr;
+
+ return 0;
+}
+
+void selinux_measure_state(struct selinux_state *state)
+{
+ void *policy = NULL;
+ char *state_event_name = NULL;
+ char *policy_event_name = NULL;
+ char *state_str = NULL;
+ size_t policy_len;
+ int state_str_len, rc = 0;
+ bool initialized = selinux_initialized(state);
+
+ rc = read_selinux_state(&state_str, &state_str_len, state);
+ if (rc) {
+ pr_err("%s: Failed to read selinux state.\n", __func__);
+ return;
+ }
+
+ /*
+ * Get an unique string for measuring the current SELinux state.
+ */
+ state_event_name = selinux_event_name("selinux-state");
+ if (!state_event_name) {
+ pr_err("%s: Event name for state not allocated.\n",
+ __func__);
+ rc = -ENOMEM;
+ goto out;
+ }
+
+ rc = ima_measure_critical_data(state_event_name, "selinux",
+ state_str, state_str_len, false);
+ if (rc)
+ goto out;
+
+ /*
+ * Measure SELinux policy only after initialization is completed.
+ */
+ if (!initialized)
+ goto out;
+
+ rc = security_read_policy_kernel(state, &policy, &policy_len);
+ if (rc)
+ goto out;
+
+ policy_event_name = selinux_event_name("selinux-policy-hash");
+ if (!policy_event_name) {
+ pr_err("%s: Event name for policy not allocated.\n",
+ __func__);
+ rc = -ENOMEM;
+ goto out;
+ }
+
+ rc = ima_measure_critical_data(policy_event_name, "selinux",
+ policy, policy_len, true);
+
+out:
+ kfree(state_event_name);
+ kfree(policy_event_name);
+ kfree(state_str);
+ vfree(policy);
+}
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 4781314c2510..6d46eaef5c92 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -173,6 +173,7 @@ static ssize_t sel_write_enforce(struct file *file, const char __user *buf,
from_kuid(&init_user_ns, audit_get_loginuid(current)),
audit_get_sessionid(current));
enforcing_set(state, new_value);
+ selinux_measure_state(state);
if (new_value)
avc_ss_reset(state->avc, 0);
selnl_notify_setenforce(new_value);
@@ -678,6 +679,8 @@ static ssize_t sel_write_checkreqprot(struct file *file, const char __user *buf,
fsi->state->checkreqprot = new_value ? 1 : 0;
length = count;
+
+ selinux_measure_state(fsi->state);
out:
kfree(page);
return length;
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index ef0afd878bfc..3978c804c361 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -2182,6 +2182,7 @@ int security_load_policy(struct selinux_state *state, void *data, size_t len)
selinux_status_update_policyload(state, seqno);
selinux_netlbl_cache_invalidate();
selinux_xfrm_notify_policyload();
+ selinux_measure_state(state);
return 0;
}
@@ -2270,6 +2271,7 @@ int security_load_policy(struct selinux_state *state, void *data, size_t len)
selinux_status_update_policyload(state, seqno);
selinux_netlbl_cache_invalidate();
selinux_xfrm_notify_policyload();
+ selinux_measure_state(state);
rc = 0;
goto out;
@@ -2941,6 +2943,7 @@ int security_set_bools(struct selinux_state *state, u32 len, int *values)
selnl_notify_policyload(seqno);
selinux_status_update_policyload(state, seqno);
selinux_xfrm_notify_policyload();
+ selinux_measure_state(state);
}
return rc;
}
@@ -3720,14 +3723,23 @@ int security_netlbl_sid_to_secattr(struct selinux_state *state,
}
#endif /* CONFIG_NETLABEL */
+static int security_read_policy_len(struct selinux_state *state, size_t *len)
+{
+ if (!selinux_initialized(state))
+ return -EINVAL;
+
+ *len = security_policydb_len(state);
+ return 0;
+}
+
/**
- * security_read_policy - read the policy.
+ * security_read_selinux_policy - read the policy.
* @data: binary policy data
* @len: length of data in bytes
*
*/
-int security_read_policy(struct selinux_state *state,
- void **data, size_t *len)
+static int security_read_selinux_policy(struct selinux_state *state,
+ void **data, size_t *len)
{
struct policydb *policydb = &state->ss->policydb;
int rc;
@@ -3736,12 +3748,6 @@ int security_read_policy(struct selinux_state *state,
if (!selinux_initialized(state))
return -EINVAL;
- *len = security_policydb_len(state);
-
- *data = vmalloc_user(*len);
- if (!*data)
- return -ENOMEM;
-
fp.data = *data;
fp.len = *len;
@@ -3754,5 +3760,52 @@ int security_read_policy(struct selinux_state *state,
*len = (unsigned long)fp.data - (unsigned long)*data;
return 0;
+}
+
+/**
+ * security_read_policy - read the policy.
+ * @data: binary policy data
+ * @len: length of data in bytes
+ *
+ */
+int security_read_policy(struct selinux_state *state,
+ void **data, size_t *len)
+{
+ int rc;
+
+ rc = security_read_policy_len(state, len);
+ if (rc)
+ return rc;
+
+ *data = vmalloc_user(*len);
+ if (!*data)
+ return -ENOMEM;
+
+ return security_read_selinux_policy(state, data, len);
+}
+
+/**
+ * security_read_policy_kernel - read the policy.
+ * @data: binary policy data
+ * @len: length of data in bytes
+ *
+ * Allocates kernel memory for reading SELinux policy.
+ * This function is for internal use only and should not
+ * be used for returning data to user space
+ *
+ */
+int security_read_policy_kernel(struct selinux_state *state,
+ void **data, size_t *len)
+{
+ int rc;
+
+ rc = security_read_policy_len(state, len);
+ if (rc)
+ return rc;
+
+ *data = vmalloc(*len);
+ if (!*data)
+ return -ENOMEM;
+ return security_read_selinux_policy(state, data, len);
}
--
2.28.0
^ permalink raw reply related
* [PATCH] IMA: Handle early boot data measurement
From: Lakshmi Ramasubramanian @ 2020-08-21 23:12 UTC (permalink / raw)
To: zohar, stephen.smalley.work, casey
Cc: tyhicks, tusharsu, sashal, jmorris, linux-integrity, selinux,
linux-security-module, linux-kernel
The current implementation of early boot measurement in
the IMA subsystem is very specific to asymmetric keys. It does not
handle early boot measurement of data from other subsystems such as
Linux Security Module (LSM), Device-Mapper, etc. As a result data,
provided by these subsystems during system boot are not measured by IMA.
Update the early boot key measurement to handle any early boot data.
Refactor the code from ima_queue_keys.c to a new file ima_queue_data.c.
Rename the kernel configuration CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS to
CONFIG_IMA_QUEUE_EARLY_BOOT_DATA so it can be used for enabling any
early boot data measurement. Since measurement of asymmetric keys is
the first consumer of early boot measurement, this kernel configuration
is enabled if IMA_MEASURE_ASYMMETRIC_KEYS and SYSTEM_TRUSTED_KEYRING are
both enabled.
Update the IMA hook ima_measure_critical_data() to utilize early boot
measurement support.
This patch is dependent on the following patch series:
https://patchwork.kernel.org/patch/11709527/
https://patchwork.kernel.org/patch/11730193/
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
---
security/integrity/ima/Kconfig | 2 +-
security/integrity/ima/Makefile | 2 +-
security/integrity/ima/ima.h | 37 ++--
security/integrity/ima/ima_asymmetric_keys.c | 7 +-
security/integrity/ima/ima_init.c | 2 +-
security/integrity/ima/ima_main.c | 10 +
security/integrity/ima/ima_policy.c | 2 +-
security/integrity/ima/ima_queue_data.c | 191 +++++++++++++++++++
security/integrity/ima/ima_queue_keys.c | 175 -----------------
9 files changed, 231 insertions(+), 197 deletions(-)
create mode 100644 security/integrity/ima/ima_queue_data.c
delete mode 100644 security/integrity/ima/ima_queue_keys.c
diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
index 080c53545ff0..bc2adab7bae2 100644
--- a/security/integrity/ima/Kconfig
+++ b/security/integrity/ima/Kconfig
@@ -322,7 +322,7 @@ config IMA_MEASURE_ASYMMETRIC_KEYS
depends on ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
default y
-config IMA_QUEUE_EARLY_BOOT_KEYS
+config IMA_QUEUE_EARLY_BOOT_DATA
bool
depends on IMA_MEASURE_ASYMMETRIC_KEYS
depends on SYSTEM_TRUSTED_KEYRING
diff --git a/security/integrity/ima/Makefile b/security/integrity/ima/Makefile
index 67dabca670e2..6a0f9ef93bf0 100644
--- a/security/integrity/ima/Makefile
+++ b/security/integrity/ima/Makefile
@@ -13,4 +13,4 @@ ima-$(CONFIG_IMA_APPRAISE_MODSIG) += ima_modsig.o
ima-$(CONFIG_HAVE_IMA_KEXEC) += ima_kexec.o
ima-$(CONFIG_IMA_BLACKLIST_KEYRING) += ima_mok.o
ima-$(CONFIG_IMA_MEASURE_ASYMMETRIC_KEYS) += ima_asymmetric_keys.o
-ima-$(CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS) += ima_queue_keys.o
+ima-$(CONFIG_IMA_QUEUE_EARLY_BOOT_DATA) += ima_queue_data.o
\ No newline at end of file
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 00b84052c8f1..366d7cb5c22e 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -228,29 +228,36 @@ extern const char *const func_tokens[];
struct modsig;
-#ifdef CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS
+#ifdef CONFIG_IMA_QUEUE_EARLY_BOOT_DATA
/*
* To track keys that need to be measured.
*/
-struct ima_key_entry {
+struct ima_data_entry {
struct list_head list;
void *payload;
size_t payload_len;
- char *keyring_name;
+ const char *event_name;
+ const char *event_data;
+ enum ima_hooks func;
+ bool measure_payload_hash;
};
-void ima_init_key_queue(void);
-bool ima_should_queue_key(void);
-bool ima_queue_key(struct key *keyring, const void *payload,
- size_t payload_len);
-void ima_process_queued_keys(void);
+void ima_init_data_queue(void);
+bool ima_should_queue_data(void);
+bool ima_queue_data(const char *event_name, const void *payload,
+ size_t payload_len, const char *event_data,
+ enum ima_hooks func, bool measure_payload_hash);
+void ima_process_queued_data(void);
#else
-static inline void ima_init_key_queue(void) {}
-static inline bool ima_should_queue_key(void) { return false; }
-static inline bool ima_queue_key(struct key *keyring,
- const void *payload,
- size_t payload_len) { return false; }
-static inline void ima_process_queued_keys(void) {}
-#endif /* CONFIG_IMA_QUEUE_EARLY_BOOT_KEYS */
+static inline void ima_init_data_queue(void) {}
+static inline bool ima_should_queue_data(void) { return false; }
+static inline bool ima_queue_data(const char *event_name,
+ const void *payload,
+ size_t payload_len,
+ const char *event_data,
+ enum ima_hooks func,
+ bool measure_payload_hash) { return false; }
+static inline void ima_process_queued_data(void) {}
+#endif /* CONFIG_IMA_QUEUE_EARLY_BOOT_DATA */
/* LIM API function definitions */
int ima_get_action(struct inode *inode, const struct cred *cred, u32 secid,
diff --git a/security/integrity/ima/ima_asymmetric_keys.c b/security/integrity/ima/ima_asymmetric_keys.c
index a74095793936..65423754765f 100644
--- a/security/integrity/ima/ima_asymmetric_keys.c
+++ b/security/integrity/ima/ima_asymmetric_keys.c
@@ -37,9 +37,10 @@ void ima_post_key_create_or_update(struct key *keyring, struct key *key,
if (!payload || (payload_len == 0))
return;
- if (ima_should_queue_key())
- queued = ima_queue_key(keyring, payload, payload_len);
-
+ if (ima_should_queue_data())
+ queued = ima_queue_data(keyring->description, payload,
+ payload_len, keyring->description,
+ KEY_CHECK, false);
if (queued)
return;
diff --git a/security/integrity/ima/ima_init.c b/security/integrity/ima/ima_init.c
index 4902fe7bd570..892894bf4af3 100644
--- a/security/integrity/ima/ima_init.c
+++ b/security/integrity/ima/ima_init.c
@@ -145,7 +145,7 @@ int __init ima_init(void)
if (rc != 0)
return rc;
- ima_init_key_queue();
+ ima_init_data_queue();
return rc;
}
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 5db6dc3d45aa..728557662903 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -885,9 +885,19 @@ int ima_measure_critical_data(const char *event_name,
const void *buf, int buf_len,
bool measure_buf_hash)
{
+ bool queued = false;
+
if (!event_name || !event_data_source || !buf || !buf_len)
return -EINVAL;
+ if (ima_should_queue_data())
+ queued = ima_queue_data(event_name, buf, buf_len,
+ event_data_source, CRITICAL_DATA,
+ measure_buf_hash);
+
+ if (queued)
+ return 0;
+
return process_buffer_measurement(NULL, buf, buf_len, event_name,
CRITICAL_DATA, 0, event_data_source,
measure_buf_hash);
diff --git a/security/integrity/ima/ima_policy.c b/security/integrity/ima/ima_policy.c
index 59058ce7906b..f4f095ae6f8f 100644
--- a/security/integrity/ima/ima_policy.c
+++ b/security/integrity/ima/ima_policy.c
@@ -906,7 +906,7 @@ void ima_update_policy(void)
ima_update_policy_flag();
/* Custom IMA policy has been loaded */
- ima_process_queued_keys();
+ ima_process_queued_data();
}
/* Keep the enumeration in sync with the policy_tokens! */
diff --git a/security/integrity/ima/ima_queue_data.c b/security/integrity/ima/ima_queue_data.c
new file mode 100644
index 000000000000..d466ee742377
--- /dev/null
+++ b/security/integrity/ima/ima_queue_data.c
@@ -0,0 +1,191 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright (C) 2020 Microsoft Corporation
+ *
+ * Author: Lakshmi Ramasubramanian (nramas@linux.microsoft.com)
+ *
+ * File: ima_queue_data.c
+ * Enables deferred processing of data
+ */
+
+#include <linux/workqueue.h>
+#include "ima.h"
+
+/*
+ * Flag to indicate whether data can be processed
+ * right away or should be queued for processing later.
+ */
+static bool ima_process_data;
+
+/*
+ * To synchronize access to the list of data that need to be measured
+ */
+static DEFINE_MUTEX(ima_data_lock);
+static LIST_HEAD(ima_queued_data);
+
+/*
+ * If custom IMA policy is not loaded then data queued up
+ * for measurement should be freed. This worker is used
+ * for handling this scenario.
+ */
+static long ima_data_queue_timeout = 300000; /* 5 Minutes */
+static void ima_data_handler(struct work_struct *work);
+static DECLARE_DELAYED_WORK(ima_data_delayed_work, ima_data_handler);
+static bool timer_expired;
+
+/*
+ * This worker function frees data that may still be
+ * queued up in case custom IMA policy was not loaded.
+ */
+static void ima_data_handler(struct work_struct *work)
+{
+ timer_expired = true;
+ ima_process_queued_data();
+}
+
+/*
+ * This function sets up a worker to free queued data in case
+ * custom IMA policy was never loaded.
+ */
+void ima_init_data_queue(void)
+{
+ schedule_delayed_work(&ima_data_delayed_work,
+ msecs_to_jiffies(ima_data_queue_timeout));
+}
+
+static void ima_free_data_entry(struct ima_data_entry *entry)
+{
+ if (!entry)
+ return;
+
+ kvfree(entry->payload);
+ kfree(entry->event_name);
+ kfree(entry->event_data);
+ kfree(entry);
+}
+
+static void *ima_kvmemdup(const void *src, size_t len)
+{
+ void *p = kvmalloc(len, GFP_KERNEL);
+
+ if (p)
+ memcpy(p, src, len);
+ return p;
+}
+
+static struct ima_data_entry *ima_alloc_data_entry(const char *event_name,
+ const void *payload,
+ size_t payload_len,
+ const char *event_data,
+ enum ima_hooks func,
+ bool measure_payload_hash)
+{
+ struct ima_data_entry *entry;
+
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry)
+ goto out;
+
+ /*
+ * Payload size may exceed the limit supported by kmalloc.
+ * So use kvmalloc instead.
+ */
+ entry->payload = ima_kvmemdup(payload, payload_len);
+ entry->event_name = kstrdup(event_name, GFP_KERNEL);
+ if (event_data)
+ entry->event_data = kstrdup(event_data, GFP_KERNEL);
+
+ entry->payload_len = payload_len;
+ entry->func = func;
+ entry->measure_payload_hash = measure_payload_hash;
+
+ if (!entry->payload || !entry->event_name ||
+ (event_data && !entry->event_data))
+ goto out;
+
+ INIT_LIST_HEAD(&entry->list);
+ return entry;
+
+out:
+ integrity_audit_message(AUDIT_INTEGRITY_PCR, NULL,
+ event_name, func_measure_str(func),
+ "ENOMEM", -ENOMEM, 0, -ENOMEM);
+ ima_free_data_entry(entry);
+ return NULL;
+}
+
+bool ima_queue_data(const char *event_name, const void *payload,
+ size_t payload_len, const char *event_data,
+ enum ima_hooks func, bool measure_payload_hash)
+{
+ bool queued = false;
+ struct ima_data_entry *entry;
+
+ entry = ima_alloc_data_entry(event_name, payload, payload_len,
+ event_data, func, measure_payload_hash);
+ if (!entry)
+ return false;
+
+ mutex_lock(&ima_data_lock);
+ if (!ima_process_data) {
+ list_add_tail(&entry->list, &ima_queued_data);
+ queued = true;
+ }
+ mutex_unlock(&ima_data_lock);
+
+ if (!queued)
+ ima_free_data_entry(entry);
+
+ return queued;
+}
+
+/*
+ * ima_process_queued_data() - process data queued for measurement
+ *
+ * This function sets ima_process_data to true and processes queued data.
+ * From here on data will be processed right away (not queued).
+ */
+void ima_process_queued_data(void)
+{
+ struct ima_data_entry *entry, *tmp;
+ bool process = false;
+
+ if (ima_process_data)
+ return;
+
+ /*
+ * Since ima_process_data is set to true, any new data will be
+ * processed immediately and not be queued to ima_queued_data list.
+ * First one setting the ima_process_data flag to true will
+ * process the queued data.
+ */
+ mutex_lock(&ima_data_lock);
+ if (!ima_process_data) {
+ ima_process_data = true;
+ process = true;
+ }
+ mutex_unlock(&ima_data_lock);
+
+ if (!process)
+ return;
+
+ if (!timer_expired)
+ cancel_delayed_work_sync(&ima_data_delayed_work);
+
+ list_for_each_entry_safe(entry, tmp, &ima_queued_data, list) {
+ if (!timer_expired)
+ process_buffer_measurement(NULL, entry->payload,
+ entry->payload_len,
+ entry->event_name,
+ entry->func, 0,
+ entry->event_data,
+ entry->measure_payload_hash);
+ list_del(&entry->list);
+ ima_free_data_entry(entry);
+ }
+}
+
+inline bool ima_should_queue_data(void)
+{
+ return !ima_process_data;
+}
diff --git a/security/integrity/ima/ima_queue_keys.c b/security/integrity/ima/ima_queue_keys.c
deleted file mode 100644
index c2f2ad34f9b7..000000000000
--- a/security/integrity/ima/ima_queue_keys.c
+++ /dev/null
@@ -1,175 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0+
-/*
- * Copyright (C) 2019 Microsoft Corporation
- *
- * Author: Lakshmi Ramasubramanian (nramas@linux.microsoft.com)
- *
- * File: ima_queue_keys.c
- * Enables deferred processing of keys
- */
-
-#include <linux/workqueue.h>
-#include <keys/asymmetric-type.h>
-#include "ima.h"
-
-/*
- * Flag to indicate whether a key can be processed
- * right away or should be queued for processing later.
- */
-static bool ima_process_keys;
-
-/*
- * To synchronize access to the list of keys that need to be measured
- */
-static DEFINE_MUTEX(ima_keys_lock);
-static LIST_HEAD(ima_keys);
-
-/*
- * If custom IMA policy is not loaded then keys queued up
- * for measurement should be freed. This worker is used
- * for handling this scenario.
- */
-static long ima_key_queue_timeout = 300000; /* 5 Minutes */
-static void ima_keys_handler(struct work_struct *work);
-static DECLARE_DELAYED_WORK(ima_keys_delayed_work, ima_keys_handler);
-static bool timer_expired;
-
-/*
- * This worker function frees keys that may still be
- * queued up in case custom IMA policy was not loaded.
- */
-static void ima_keys_handler(struct work_struct *work)
-{
- timer_expired = true;
- ima_process_queued_keys();
-}
-
-/*
- * This function sets up a worker to free queued keys in case
- * custom IMA policy was never loaded.
- */
-void ima_init_key_queue(void)
-{
- schedule_delayed_work(&ima_keys_delayed_work,
- msecs_to_jiffies(ima_key_queue_timeout));
-}
-
-static void ima_free_key_entry(struct ima_key_entry *entry)
-{
- if (entry) {
- kfree(entry->payload);
- kfree(entry->keyring_name);
- kfree(entry);
- }
-}
-
-static struct ima_key_entry *ima_alloc_key_entry(struct key *keyring,
- const void *payload,
- size_t payload_len)
-{
- int rc = 0;
- const char *audit_cause = "ENOMEM";
- struct ima_key_entry *entry;
-
- entry = kzalloc(sizeof(*entry), GFP_KERNEL);
- if (entry) {
- entry->payload = kmemdup(payload, payload_len, GFP_KERNEL);
- entry->keyring_name = kstrdup(keyring->description,
- GFP_KERNEL);
- entry->payload_len = payload_len;
- }
-
- if ((entry == NULL) || (entry->payload == NULL) ||
- (entry->keyring_name == NULL)) {
- rc = -ENOMEM;
- goto out;
- }
-
- INIT_LIST_HEAD(&entry->list);
-
-out:
- if (rc) {
- integrity_audit_message(AUDIT_INTEGRITY_PCR, NULL,
- keyring->description,
- func_measure_str(KEY_CHECK),
- audit_cause, rc, 0, rc);
- ima_free_key_entry(entry);
- entry = NULL;
- }
-
- return entry;
-}
-
-bool ima_queue_key(struct key *keyring, const void *payload,
- size_t payload_len)
-{
- bool queued = false;
- struct ima_key_entry *entry;
-
- entry = ima_alloc_key_entry(keyring, payload, payload_len);
- if (!entry)
- return false;
-
- mutex_lock(&ima_keys_lock);
- if (!ima_process_keys) {
- list_add_tail(&entry->list, &ima_keys);
- queued = true;
- }
- mutex_unlock(&ima_keys_lock);
-
- if (!queued)
- ima_free_key_entry(entry);
-
- return queued;
-}
-
-/*
- * ima_process_queued_keys() - process keys queued for measurement
- *
- * This function sets ima_process_keys to true and processes queued keys.
- * From here on keys will be processed right away (not queued).
- */
-void ima_process_queued_keys(void)
-{
- struct ima_key_entry *entry, *tmp;
- bool process = false;
-
- if (ima_process_keys)
- return;
-
- /*
- * Since ima_process_keys is set to true, any new key will be
- * processed immediately and not be queued to ima_keys list.
- * First one setting the ima_process_keys flag to true will
- * process the queued keys.
- */
- mutex_lock(&ima_keys_lock);
- if (!ima_process_keys) {
- ima_process_keys = true;
- process = true;
- }
- mutex_unlock(&ima_keys_lock);
-
- if (!process)
- return;
-
- if (!timer_expired)
- cancel_delayed_work_sync(&ima_keys_delayed_work);
-
- list_for_each_entry_safe(entry, tmp, &ima_keys, list) {
- if (!timer_expired)
- process_buffer_measurement(NULL, entry->payload,
- entry->payload_len,
- entry->keyring_name,
- KEY_CHECK, 0,
- entry->keyring_name,
- false);
- list_del(&entry->list);
- ima_free_key_entry(entry);
- }
-}
-
-inline bool ima_should_queue_key(void)
-{
- return !ima_process_keys;
-}
--
2.28.0
^ permalink raw reply related
* Re: [PATCH v7 1/3] Add a new LSM-supporting anonymous inode interface
From: Lokesh Gidra @ 2020-08-21 21:20 UTC (permalink / raw)
To: Alexander Viro, James Morris, Stephen Smalley, Casey Schaufler,
Eric Biggers
Cc: Serge E. Hallyn, Paul Moore, Eric Paris, Daniel Colascione,
Kees Cook, Eric W. Biederman, KP Singh, David Howells,
Thomas Cedeno, Anders Roxell, Sami Tolvanen, Matthew Garrett,
Aaron Goidel, Randy Dunlap, Joel Fernandes (Google), YueHaibing,
Christian Brauner, Alexei Starovoitov, Alexey Budankov,
Adrian Reber, Aleksa Sarai, Linux FS Devel, linux-kernel,
LSM List, SElinux list, Kalesh Singh, Calin Juravle,
Suren Baghdasaryan, Nick Kralevich, Jeffrey Vander Stoep,
kernel-team
In-Reply-To: <20200821185645.801971-2-lokeshgidra@google.com>
On Fri, Aug 21, 2020 at 11:57 AM Lokesh Gidra <lokeshgidra@google.com> wrote:
>
> From: Daniel Colascione <dancol@google.com>
>
> This change adds a new function, anon_inode_getfd_secure, that creates
> anonymous-node file with individual non-S_PRIVATE inode to which security
> modules can apply policy. Existing callers continue using the original
> singleton-inode kind of anonymous-inode file. We can transition anonymous
> inode users to the new kind of anonymous inode in individual patches for
> the sake of bisection and review.
>
> The new function accepts an optional context_inode parameter that
> callers can use to provide additional contextual information to
> security modules for granting/denying permission to create an anon inode
> of the same type.
>
> For example, in case of userfaultfd, the created inode is a
> 'logical child' of the context_inode (userfaultfd inode of the
> parent process) in the sense that it provides the security context
> required during creation of the child process' userfaultfd inode.
>
> Signed-off-by: Daniel Colascione <dancol@google.com>
>
> [Fix comment documenting return values of inode_init_security_anon()]
> [Add context_inode description in comments to anon_inode_getfd_secure()]
> [Remove definition of anon_inode_getfile_secure() as there are no callers]
> [Make _anon_inode_getfile() static]
> [Use correct error cast in _anon_inode_getfile()]
>
> Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
> ---
> fs/anon_inodes.c | 148 ++++++++++++++++++++++++----------
> include/linux/anon_inodes.h | 13 +++
> include/linux/lsm_hook_defs.h | 2 +
> include/linux/lsm_hooks.h | 7 ++
> include/linux/security.h | 3 +
> security/security.c | 9 +++
> 6 files changed, 141 insertions(+), 41 deletions(-)
>
> diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
> index 89714308c25b..2aa8b57be895 100644
> --- a/fs/anon_inodes.c
> +++ b/fs/anon_inodes.c
> @@ -55,61 +55,78 @@ static struct file_system_type anon_inode_fs_type = {
> .kill_sb = kill_anon_super,
> };
>
> -/**
> - * anon_inode_getfile - creates a new file instance by hooking it up to an
> - * anonymous inode, and a dentry that describe the "class"
> - * of the file
> - *
> - * @name: [in] name of the "class" of the new file
> - * @fops: [in] file operations for the new file
> - * @priv: [in] private data for the new file (will be file's private_data)
> - * @flags: [in] flags
> - *
> - * Creates a new file by hooking it on a single inode. This is useful for files
> - * that do not need to have a full-fledged inode in order to operate correctly.
> - * All the files created with anon_inode_getfile() will share a single inode,
> - * hence saving memory and avoiding code duplication for the file/inode/dentry
> - * setup. Returns the newly created file* or an error pointer.
> - */
> -struct file *anon_inode_getfile(const char *name,
> - const struct file_operations *fops,
> - void *priv, int flags)
> +static struct inode *anon_inode_make_secure_inode(
> + const char *name,
> + const struct inode *context_inode)
> +{
> + struct inode *inode;
> + const struct qstr qname = QSTR_INIT(name, strlen(name));
> + int error;
> +
> + inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
> + if (IS_ERR(inode))
> + return inode;
> + inode->i_flags &= ~S_PRIVATE;
> + error = security_inode_init_security_anon(
> + inode, &qname, context_inode);
> + if (error) {
> + iput(inode);
> + return ERR_PTR(error);
> + }
> + return inode;
> +}
> +
> +static struct file *_anon_inode_getfile(const char *name,
> + const struct file_operations *fops,
> + void *priv, int flags,
> + const struct inode *context_inode,
> + bool secure)
> {
> + struct inode *inode;
> struct file *file;
>
> - if (IS_ERR(anon_inode_inode))
> - return ERR_PTR(-ENODEV);
> + if (secure) {
> + inode = anon_inode_make_secure_inode(
> + name, context_inode);
> + if (IS_ERR(inode))
> + return ERR_CAST(inode);
> + } else {
> + inode = anon_inode_inode;
> + if (IS_ERR(inode))
> + return ERR_PTR(-ENODEV);
> + /*
> + * We know the anon_inode inode count is always
> + * greater than zero, so ihold() is safe.
> + */
> + ihold(inode);
> + }
>
> - if (fops->owner && !try_module_get(fops->owner))
> - return ERR_PTR(-ENOENT);
> + if (fops->owner && !try_module_get(fops->owner)) {
> + file = ERR_PTR(-ENOENT);
> + goto err;
> + }
>
> - /*
> - * We know the anon_inode inode count is always greater than zero,
> - * so ihold() is safe.
> - */
> - ihold(anon_inode_inode);
> - file = alloc_file_pseudo(anon_inode_inode, anon_inode_mnt, name,
> + file = alloc_file_pseudo(inode, anon_inode_mnt, name,
> flags & (O_ACCMODE | O_NONBLOCK), fops);
> if (IS_ERR(file))
> goto err;
>
> - file->f_mapping = anon_inode_inode->i_mapping;
> + file->f_mapping = inode->i_mapping;
>
> file->private_data = priv;
>
> return file;
>
> err:
> - iput(anon_inode_inode);
> + iput(inode);
> module_put(fops->owner);
> return file;
> }
> -EXPORT_SYMBOL_GPL(anon_inode_getfile);
>
> /**
> - * anon_inode_getfd - creates a new file instance by hooking it up to an
> - * anonymous inode, and a dentry that describe the "class"
> - * of the file
> + * anon_inode_getfile - creates a new file instance by hooking it up to an
> + * anonymous inode, and a dentry that describe the "class"
> + * of the file
> *
> * @name: [in] name of the "class" of the new file
> * @fops: [in] file operations for the new file
> @@ -118,12 +135,23 @@ EXPORT_SYMBOL_GPL(anon_inode_getfile);
> *
> * Creates a new file by hooking it on a single inode. This is useful for files
> * that do not need to have a full-fledged inode in order to operate correctly.
> - * All the files created with anon_inode_getfd() will share a single inode,
> + * All the files created with anon_inode_getfile() will share a single inode,
> * hence saving memory and avoiding code duplication for the file/inode/dentry
> - * setup. Returns new descriptor or an error code.
> + * setup. Returns the newly created file* or an error pointer.
> */
> -int anon_inode_getfd(const char *name, const struct file_operations *fops,
> - void *priv, int flags)
> +struct file *anon_inode_getfile(const char *name,
> + const struct file_operations *fops,
> + void *priv, int flags)
> +{
> + return _anon_inode_getfile(name, fops, priv, flags, NULL, false);
> +}
> +EXPORT_SYMBOL_GPL(anon_inode_getfile);
> +
> +static int _anon_inode_getfd(const char *name,
> + const struct file_operations *fops,
> + void *priv, int flags,
> + const struct inode *context_inode,
> + bool secure)
> {
> int error, fd;
> struct file *file;
> @@ -133,7 +161,8 @@ int anon_inode_getfd(const char *name, const struct file_operations *fops,
> return error;
> fd = error;
>
> - file = anon_inode_getfile(name, fops, priv, flags);
> + file = _anon_inode_getfile(name, fops, priv, flags, context_inode,
> + secure);
> if (IS_ERR(file)) {
> error = PTR_ERR(file);
> goto err_put_unused_fd;
> @@ -146,8 +175,46 @@ int anon_inode_getfd(const char *name, const struct file_operations *fops,
> put_unused_fd(fd);
> return error;
> }
> +
> +/**
> + * anon_inode_getfd - creates a new file instance by hooking it up to
> + * an anonymous inode and a dentry that describe
> + * the "class" of the file
> + *
> + * @name: [in] name of the "class" of the new file
> + * @fops: [in] file operations for the new file
> + * @priv: [in] private data for the new file (will be file's private_data)
> + * @flags: [in] flags
> + *
> + * Creates a new file by hooking it on a single inode. This is
> + * useful for files that do not need to have a full-fledged inode in
> + * order to operate correctly. All the files created with
> + * anon_inode_getfile() will use the same singleton inode, reducing
> + * memory use and avoiding code duplication for the file/inode/dentry
> + * setup. Returns a newly created file descriptor or an error code.
> + */
> +int anon_inode_getfd(const char *name, const struct file_operations *fops,
> + void *priv, int flags)
> +{
> + return _anon_inode_getfd(name, fops, priv, flags, NULL, false);
> +}
> EXPORT_SYMBOL_GPL(anon_inode_getfd);
>
> +/**
> + * Like anon_inode_getfd(), but adds the @context_inode argument to
> + * allow security modules to control creation of the new file. Once the
> + * security module makes the decision, this inode is no longer needed
> + * and hence reference to it is not held.
> + */
> +int anon_inode_getfd_secure(const char *name, const struct file_operations *fops,
> + void *priv, int flags,
> + const struct inode *context_inode)
> +{
> + return _anon_inode_getfd(name, fops, priv, flags,
> + context_inode, true);
> +}
> +EXPORT_SYMBOL_GPL(anon_inode_getfd_secure);
> +
> static int __init anon_inode_init(void)
> {
> anon_inode_mnt = kern_mount(&anon_inode_fs_type);
> @@ -162,4 +229,3 @@ static int __init anon_inode_init(void)
> }
>
> fs_initcall(anon_inode_init);
> -
> diff --git a/include/linux/anon_inodes.h b/include/linux/anon_inodes.h
> index d0d7d96261ad..67bd85d92dca 100644
> --- a/include/linux/anon_inodes.h
> +++ b/include/linux/anon_inodes.h
> @@ -10,12 +10,25 @@
> #define _LINUX_ANON_INODES_H
>
> struct file_operations;
> +struct inode;
> +
> +struct file *anon_inode_getfile_secure(const char *name,
> + const struct file_operations *fops,
> + void *priv, int flags,
> + const struct inode *context_inode);
>
oops, I forgot to remove the function prototype. I'll fix it in the
next revision. Meanwhile please let me know if any other issues are
noticed.
> struct file *anon_inode_getfile(const char *name,
> const struct file_operations *fops,
> void *priv, int flags);
> +
> +int anon_inode_getfd_secure(const char *name,
> + const struct file_operations *fops,
> + void *priv, int flags,
> + const struct inode *context_inode);
> +
> int anon_inode_getfd(const char *name, const struct file_operations *fops,
> void *priv, int flags);
>
> +
> #endif /* _LINUX_ANON_INODES_H */
>
> diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
> index 2a8c74d99015..35ff75c43de4 100644
> --- a/include/linux/lsm_hook_defs.h
> +++ b/include/linux/lsm_hook_defs.h
> @@ -113,6 +113,8 @@ LSM_HOOK(void, LSM_RET_VOID, inode_free_security, struct inode *inode)
> LSM_HOOK(int, 0, inode_init_security, struct inode *inode,
> struct inode *dir, const struct qstr *qstr, const char **name,
> void **value, size_t *len)
> +LSM_HOOK(int, 0, inode_init_security_anon, struct inode *inode,
> + const struct qstr *name, const struct inode *context_inode)
> LSM_HOOK(int, 0, inode_create, struct inode *dir, struct dentry *dentry,
> umode_t mode)
> LSM_HOOK(int, 0, inode_link, struct dentry *old_dentry, struct inode *dir,
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index 9e2e3e63719d..2d590d2689b9 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -233,6 +233,13 @@
> * Returns 0 if @name and @value have been successfully set,
> * -EOPNOTSUPP if no security attribute is needed, or
> * -ENOMEM on memory allocation failure.
> + * @inode_init_security_anon:
> + * Set up a secure anonymous inode.
> + * @inode contains the inode structure
> + * @name name of the anonymous inode class
> + * @context_inode optional related inode
> + * Returns 0 on success, -EACCESS if the security module denies the
> + * creation of this inode, or another -errno upon other errors.
> * @inode_create:
> * Check permission to create a regular file.
> * @dir contains inode structure of the parent of the new file.
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 0a0a03b36a3b..95c133a8f8bb 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -322,6 +322,9 @@ void security_inode_free(struct inode *inode);
> int security_inode_init_security(struct inode *inode, struct inode *dir,
> const struct qstr *qstr,
> initxattrs initxattrs, void *fs_data);
> +int security_inode_init_security_anon(struct inode *inode,
> + const struct qstr *name,
> + const struct inode *context_inode);
> int security_old_inode_init_security(struct inode *inode, struct inode *dir,
> const struct qstr *qstr, const char **name,
> void **value, size_t *len);
> diff --git a/security/security.c b/security/security.c
> index 70a7ad357bc6..149b3f024e2d 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -1057,6 +1057,15 @@ int security_inode_init_security(struct inode *inode, struct inode *dir,
> }
> EXPORT_SYMBOL(security_inode_init_security);
>
> +int
> +security_inode_init_security_anon(struct inode *inode,
> + const struct qstr *name,
> + const struct inode *context_inode)
> +{
> + return call_int_hook(inode_init_security_anon, 0, inode, name,
> + context_inode);
> +}
> +
> int security_old_inode_init_security(struct inode *inode, struct inode *dir,
> const struct qstr *qstr, const char **name,
> void **value, size_t *len)
> --
> 2.28.0.297.g1956fa8f8d-goog
>
^ permalink raw reply
* Re: [net-next PATCH] netlabel: fix problems with mapping removal
From: Paul Moore @ 2020-08-21 20:36 UTC (permalink / raw)
To: David Miller; +Cc: netdev, selinux, linux-security-module, Stephen Smalley
In-Reply-To: <20200821.113844.1413413632075759126.davem@davemloft.net>
On Fri, Aug 21, 2020 at 2:38 PM David Miller <davem@davemloft.net> wrote:
> From: Paul Moore <paul@paul-moore.com>
> Date: Thu, 20 Aug 2020 21:46:14 -0400
>
> > This patch fixes two main problems seen when removing NetLabel
> > mappings: memory leaks and potentially extra audit noise.
>
> These are bug fixes therefore this needs to target the 'net' tree
> and you must also provide appropriate "Fixes:" tags.
>
> Thank you.
Reposted as requested:
https://lore.kernel.org/selinux/159804209207.16190.14955035148979265114.stgit@sifl
--
paul moore
www.paul-moore.com
^ permalink raw reply
* [net PATCH] netlabel: fix problems with mapping removal
From: Paul Moore @ 2020-08-21 20:34 UTC (permalink / raw)
To: netdev; +Cc: linux-security-module, selinux, Stephen Smalley
This patch fixes two main problems seen when removing NetLabel
mappings: memory leaks and potentially extra audit noise.
The memory leaks are caused by not properly free'ing the mapping's
address selector struct when free'ing the entire entry as well as
not properly cleaning up a temporary mapping entry when adding new
address selectors to an existing entry. This patch fixes both these
problems such that kmemleak reports no NetLabel associated leaks
after running the SELinux test suite.
The potentially extra audit noise was caused by the auditing code in
netlbl_domhsh_remove_entry() being called regardless of the entry's
validity. If another thread had already marked the entry as invalid,
but not removed/free'd it from the list of mappings, then it was
possible that an additional mapping removal audit record would be
generated. This patch fixes this by returning early from the removal
function when the entry was previously marked invalid. This change
also had the side benefit of improving the code by decreasing the
indentation level of large chunk of code by one (accounting for most
of the diffstat).
Fixes: 63c416887437 ("netlabel: Add network address selectors to the NetLabel/LSM domain mapping")
Reported-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
---
net/netlabel/netlabel_domainhash.c | 59 ++++++++++++++++++------------------
1 file changed, 30 insertions(+), 29 deletions(-)
diff --git a/net/netlabel/netlabel_domainhash.c b/net/netlabel/netlabel_domainhash.c
index d07de2c0fbc7..f73a8382c275 100644
--- a/net/netlabel/netlabel_domainhash.c
+++ b/net/netlabel/netlabel_domainhash.c
@@ -85,6 +85,7 @@ static void netlbl_domhsh_free_entry(struct rcu_head *entry)
kfree(netlbl_domhsh_addr6_entry(iter6));
}
#endif /* IPv6 */
+ kfree(ptr->def.addrsel);
}
kfree(ptr->domain);
kfree(ptr);
@@ -537,6 +538,8 @@ int netlbl_domhsh_add(struct netlbl_dom_map *entry,
goto add_return;
}
#endif /* IPv6 */
+ /* cleanup the new entry since we've moved everything over */
+ netlbl_domhsh_free_entry(&entry->rcu);
} else
ret_val = -EINVAL;
@@ -580,6 +583,12 @@ int netlbl_domhsh_remove_entry(struct netlbl_dom_map *entry,
{
int ret_val = 0;
struct audit_buffer *audit_buf;
+ struct netlbl_af4list *iter4;
+ struct netlbl_domaddr4_map *map4;
+#if IS_ENABLED(CONFIG_IPV6)
+ struct netlbl_af6list *iter6;
+ struct netlbl_domaddr6_map *map6;
+#endif /* IPv6 */
if (entry == NULL)
return -ENOENT;
@@ -597,6 +606,9 @@ int netlbl_domhsh_remove_entry(struct netlbl_dom_map *entry,
ret_val = -ENOENT;
spin_unlock(&netlbl_domhsh_lock);
+ if (ret_val)
+ return ret_val;
+
audit_buf = netlbl_audit_start_common(AUDIT_MAC_MAP_DEL, audit_info);
if (audit_buf != NULL) {
audit_log_format(audit_buf,
@@ -606,40 +618,29 @@ int netlbl_domhsh_remove_entry(struct netlbl_dom_map *entry,
audit_log_end(audit_buf);
}
- if (ret_val == 0) {
- struct netlbl_af4list *iter4;
- struct netlbl_domaddr4_map *map4;
-#if IS_ENABLED(CONFIG_IPV6)
- struct netlbl_af6list *iter6;
- struct netlbl_domaddr6_map *map6;
-#endif /* IPv6 */
-
- switch (entry->def.type) {
- case NETLBL_NLTYPE_ADDRSELECT:
- netlbl_af4list_foreach_rcu(iter4,
- &entry->def.addrsel->list4) {
- map4 = netlbl_domhsh_addr4_entry(iter4);
- cipso_v4_doi_putdef(map4->def.cipso);
- }
+ switch (entry->def.type) {
+ case NETLBL_NLTYPE_ADDRSELECT:
+ netlbl_af4list_foreach_rcu(iter4, &entry->def.addrsel->list4) {
+ map4 = netlbl_domhsh_addr4_entry(iter4);
+ cipso_v4_doi_putdef(map4->def.cipso);
+ }
#if IS_ENABLED(CONFIG_IPV6)
- netlbl_af6list_foreach_rcu(iter6,
- &entry->def.addrsel->list6) {
- map6 = netlbl_domhsh_addr6_entry(iter6);
- calipso_doi_putdef(map6->def.calipso);
- }
+ netlbl_af6list_foreach_rcu(iter6, &entry->def.addrsel->list6) {
+ map6 = netlbl_domhsh_addr6_entry(iter6);
+ calipso_doi_putdef(map6->def.calipso);
+ }
#endif /* IPv6 */
- break;
- case NETLBL_NLTYPE_CIPSOV4:
- cipso_v4_doi_putdef(entry->def.cipso);
- break;
+ break;
+ case NETLBL_NLTYPE_CIPSOV4:
+ cipso_v4_doi_putdef(entry->def.cipso);
+ break;
#if IS_ENABLED(CONFIG_IPV6)
- case NETLBL_NLTYPE_CALIPSO:
- calipso_doi_putdef(entry->def.calipso);
- break;
+ case NETLBL_NLTYPE_CALIPSO:
+ calipso_doi_putdef(entry->def.calipso);
+ break;
#endif /* IPv6 */
- }
- call_rcu(&entry->rcu, netlbl_domhsh_free_entry);
}
+ call_rcu(&entry->rcu, netlbl_domhsh_free_entry);
return ret_val;
}
^ permalink raw reply related
* Re: [PATCH 03/11] evm: Refuse EVM_ALLOW_METADATA_WRITES only if the HMAC key is loaded
From: Mimi Zohar @ 2020-08-21 20:14 UTC (permalink / raw)
To: Roberto Sassu, mjg59
Cc: linux-integrity, linux-security-module, linux-kernel, stable
In-Reply-To: <20200618160133.937-3-roberto.sassu@huawei.com>
Hi Roberto,
On Thu, 2020-06-18 at 18:01 +0200, Roberto Sassu wrote:
> Granting metadata write is safe if the HMAC key is not loaded, as it won't
> let an attacker obtain a valid HMAC from corrupted xattrs. evm_write_key()
> however does not allow it if any key is loaded, including a public key,
> which should not be a problem.
>
Why is the existing hebavior a problem? What is the problem being
solved?
> This patch allows setting EVM_ALLOW_METADATA_WRITES if the EVM_INIT_HMAC
> flag is not set.
>
> Cc: stable@vger.kernel.org # 4.16.x
> Fixes: ae1ba1676b88e ("EVM: Allow userland to permit modification of EVM-protected metadata")
> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
> ---
> security/integrity/evm/evm_secfs.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/security/integrity/evm/evm_secfs.c b/security/integrity/evm/evm_secfs.c
> index cfc3075769bb..92fe26ace797 100644
> --- a/security/integrity/evm/evm_secfs.c
> +++ b/security/integrity/evm/evm_secfs.c
> @@ -84,7 +84,7 @@ static ssize_t evm_write_key(struct file *file, const char __user *buf,
> * keys are loaded.
> */
> if ((i & EVM_ALLOW_METADATA_WRITES) &&
> - ((evm_initialized & EVM_KEY_MASK) != 0) &&
> + ((evm_initialized & EVM_INIT_HMAC) != 0) &&
> !(evm_initialized & EVM_ALLOW_METADATA_WRITES))
> return -EPERM;
>
Documentation/ABI/testing/evm needs to be updated as well.
thanks,
Mimi
^ permalink raw reply
* [PATCH v7 2/3] Teach SELinux about anonymous inodes
From: Lokesh Gidra @ 2020-08-21 18:56 UTC (permalink / raw)
To: Alexander Viro, James Morris, Stephen Smalley, Casey Schaufler,
Eric Biggers
Cc: Serge E. Hallyn, Paul Moore, Eric Paris, Lokesh Gidra,
Daniel Colascione, Kees Cook, Eric W. Biederman, KP Singh,
David Howells, Thomas Cedeno, Anders Roxell, Sami Tolvanen,
Matthew Garrett, Aaron Goidel, Randy Dunlap,
Joel Fernandes (Google), YueHaibing, Christian Brauner,
Alexei Starovoitov, Alexey Budankov, Adrian Reber, Aleksa Sarai,
linux-fsdevel, linux-kernel, linux-security-module, selinux,
kaleshsingh, calin, surenb, nnk, jeffv, kernel-team,
Daniel Colascione, Andrew Morton
In-Reply-To: <20200821185645.801971-1-lokeshgidra@google.com>
From: Daniel Colascione <dancol@google.com>
This change uses the anon_inodes and LSM infrastructure introduced in
the previous patch to give SELinux the ability to control
anonymous-inode files that are created using the new anon_inode_getfd_secure()
function.
A SELinux policy author detects and controls these anonymous inodes by
adding a name-based type_transition rule that assigns a new security
type to anonymous-inode files created in some domain. The name used
for the name-based transition is the name associated with the
anonymous inode for file listings --- e.g., "[userfaultfd]" or
"[perf_event]".
Example:
type uffd_t;
type_transition sysadm_t sysadm_t : anon_inode uffd_t "[userfaultfd]";
allow sysadm_t uffd_t:anon_inode { create };
(The next patch in this series is necessary for making userfaultfd
support this new interface. The example above is just
for exposition.)
Signed-off-by: Daniel Colascione <dancol@google.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
security/selinux/hooks.c | 53 +++++++++++++++++++++++++++++
security/selinux/include/classmap.h | 2 ++
2 files changed, 55 insertions(+)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index ca901025802a..5b403ad44aad 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2926,6 +2926,58 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir,
return 0;
}
+static int selinux_inode_init_security_anon(struct inode *inode,
+ const struct qstr *name,
+ const struct inode *context_inode)
+{
+ const struct task_security_struct *tsec = selinux_cred(current_cred());
+ struct common_audit_data ad;
+ struct inode_security_struct *isec;
+ int rc;
+
+ if (unlikely(!selinux_state.initialized))
+ return 0;
+
+ isec = selinux_inode(inode);
+
+ /*
+ * We only get here once per ephemeral inode. The inode has
+ * been initialized via inode_alloc_security but is otherwise
+ * untouched.
+ */
+
+ if (context_inode) {
+ struct inode_security_struct *context_isec =
+ selinux_inode(context_inode);
+ isec->sclass = context_isec->sclass;
+ isec->sid = context_isec->sid;
+ } else {
+ isec->sclass = SECCLASS_ANON_INODE;
+ rc = security_transition_sid(
+ &selinux_state, tsec->sid, tsec->sid,
+ isec->sclass, name, &isec->sid);
+ if (rc)
+ return rc;
+ }
+
+ isec->initialized = LABEL_INITIALIZED;
+
+ /*
+ * Now that we've initialized security, check whether we're
+ * allowed to actually create this type of anonymous inode.
+ */
+
+ ad.type = LSM_AUDIT_DATA_INODE;
+ ad.u.inode = inode;
+
+ return avc_has_perm(&selinux_state,
+ tsec->sid,
+ isec->sid,
+ isec->sclass,
+ FILE__CREATE,
+ &ad);
+}
+
static int selinux_inode_create(struct inode *dir, struct dentry *dentry, umode_t mode)
{
return may_create(dir, dentry, SECCLASS_FILE);
@@ -6993,6 +7045,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
LSM_HOOK_INIT(inode_free_security, selinux_inode_free_security),
LSM_HOOK_INIT(inode_init_security, selinux_inode_init_security),
+ LSM_HOOK_INIT(inode_init_security_anon, selinux_inode_init_security_anon),
LSM_HOOK_INIT(inode_create, selinux_inode_create),
LSM_HOOK_INIT(inode_link, selinux_inode_link),
LSM_HOOK_INIT(inode_unlink, selinux_inode_unlink),
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 40cebde62856..ba2e01a6955c 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -249,6 +249,8 @@ struct security_class_mapping secclass_map[] = {
{"open", "cpu", "kernel", "tracepoint", "read", "write"} },
{ "lockdown",
{ "integrity", "confidentiality", NULL } },
+ { "anon_inode",
+ { COMMON_FILE_PERMS, NULL } },
{ NULL }
};
--
2.28.0.297.g1956fa8f8d-goog
^ permalink raw reply related
* [PATCH v7 3/3] Wire UFFD up to SELinux
From: Lokesh Gidra @ 2020-08-21 18:56 UTC (permalink / raw)
To: Alexander Viro, James Morris, Stephen Smalley, Casey Schaufler,
Eric Biggers
Cc: Serge E. Hallyn, Paul Moore, Eric Paris, Lokesh Gidra,
Daniel Colascione, Kees Cook, Eric W. Biederman, KP Singh,
David Howells, Thomas Cedeno, Anders Roxell, Sami Tolvanen,
Matthew Garrett, Aaron Goidel, Randy Dunlap,
Joel Fernandes (Google), YueHaibing, Christian Brauner,
Alexei Starovoitov, Alexey Budankov, Adrian Reber, Aleksa Sarai,
linux-fsdevel, linux-kernel, linux-security-module, selinux,
kaleshsingh, calin, surenb, nnk, jeffv, kernel-team,
Daniel Colascione
In-Reply-To: <20200821185645.801971-1-lokeshgidra@google.com>
From: Daniel Colascione <dancol@google.com>
This change gives userfaultfd file descriptors a real security
context, allowing policy to act on them.
Signed-off-by: Daniel Colascione <dancol@google.com>
[Remove owner inode from userfaultfd_ctx]
[Use anon_inode_getfd_secure() instead of anon_inode_getfile_secure()
in userfaultfd syscall]
[Use inode of file in userfaultfd_read() in resolve_userfault_fork()]
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
---
fs/userfaultfd.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0e4a3837da52..46ea552fe7c4 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -978,14 +978,16 @@ static __poll_t userfaultfd_poll(struct file *file, poll_table *wait)
static const struct file_operations userfaultfd_fops;
-static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
- struct userfaultfd_ctx *new,
+static int resolve_userfault_fork(struct userfaultfd_ctx *new,
+ struct inode *inode,
struct uffd_msg *msg)
{
int fd;
- fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
- O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));
+ fd = anon_inode_getfd_secure(
+ "[userfaultfd]", &userfaultfd_fops, new,
+ O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS),
+ inode);
if (fd < 0)
return fd;
@@ -995,7 +997,7 @@ static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
}
static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
- struct uffd_msg *msg)
+ struct uffd_msg *msg, struct inode *inode)
{
ssize_t ret;
DECLARE_WAITQUEUE(wait, current);
@@ -1106,7 +1108,7 @@ static ssize_t userfaultfd_ctx_read(struct userfaultfd_ctx *ctx, int no_wait,
spin_unlock_irq(&ctx->fd_wqh.lock);
if (!ret && msg->event == UFFD_EVENT_FORK) {
- ret = resolve_userfault_fork(ctx, fork_nctx, msg);
+ ret = resolve_userfault_fork(fork_nctx, inode, msg);
spin_lock_irq(&ctx->event_wqh.lock);
if (!list_empty(&fork_event)) {
/*
@@ -1166,6 +1168,7 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
ssize_t _ret, ret = 0;
struct uffd_msg msg;
int no_wait = file->f_flags & O_NONBLOCK;
+ struct inode *inode = file_inode(file);
if (ctx->state == UFFD_STATE_WAIT_API)
return -EINVAL;
@@ -1173,7 +1176,7 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
for (;;) {
if (count < sizeof(msg))
return ret ? ret : -EINVAL;
- _ret = userfaultfd_ctx_read(ctx, no_wait, &msg);
+ _ret = userfaultfd_ctx_read(ctx, no_wait, &msg, inode);
if (_ret < 0)
return ret ? ret : _ret;
if (copy_to_user((__u64 __user *) buf, &msg, sizeof(msg)))
@@ -1995,8 +1998,10 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
/* prevent the mm struct to be freed */
mmgrab(ctx->mm);
- fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, ctx,
- O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS));
+ fd = anon_inode_getfd_secure("[userfaultfd]",
+ &userfaultfd_fops, ctx,
+ O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS),
+ NULL);
if (fd < 0) {
mmdrop(ctx->mm);
kmem_cache_free(userfaultfd_ctx_cachep, ctx);
--
2.28.0.297.g1956fa8f8d-goog
^ permalink raw reply related
* [PATCH v7 1/3] Add a new LSM-supporting anonymous inode interface
From: Lokesh Gidra @ 2020-08-21 18:56 UTC (permalink / raw)
To: Alexander Viro, James Morris, Stephen Smalley, Casey Schaufler,
Eric Biggers
Cc: Serge E. Hallyn, Paul Moore, Eric Paris, Lokesh Gidra,
Daniel Colascione, Kees Cook, Eric W. Biederman, KP Singh,
David Howells, Thomas Cedeno, Anders Roxell, Sami Tolvanen,
Matthew Garrett, Aaron Goidel, Randy Dunlap,
Joel Fernandes (Google), YueHaibing, Christian Brauner,
Alexei Starovoitov, Alexey Budankov, Adrian Reber, Aleksa Sarai,
linux-fsdevel, linux-kernel, linux-security-module, selinux,
kaleshsingh, calin, surenb, nnk, jeffv, kernel-team,
Daniel Colascione
In-Reply-To: <20200821185645.801971-1-lokeshgidra@google.com>
From: Daniel Colascione <dancol@google.com>
This change adds a new function, anon_inode_getfd_secure, that creates
anonymous-node file with individual non-S_PRIVATE inode to which security
modules can apply policy. Existing callers continue using the original
singleton-inode kind of anonymous-inode file. We can transition anonymous
inode users to the new kind of anonymous inode in individual patches for
the sake of bisection and review.
The new function accepts an optional context_inode parameter that
callers can use to provide additional contextual information to
security modules for granting/denying permission to create an anon inode
of the same type.
For example, in case of userfaultfd, the created inode is a
'logical child' of the context_inode (userfaultfd inode of the
parent process) in the sense that it provides the security context
required during creation of the child process' userfaultfd inode.
Signed-off-by: Daniel Colascione <dancol@google.com>
[Fix comment documenting return values of inode_init_security_anon()]
[Add context_inode description in comments to anon_inode_getfd_secure()]
[Remove definition of anon_inode_getfile_secure() as there are no callers]
[Make _anon_inode_getfile() static]
[Use correct error cast in _anon_inode_getfile()]
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
---
fs/anon_inodes.c | 148 ++++++++++++++++++++++++----------
include/linux/anon_inodes.h | 13 +++
include/linux/lsm_hook_defs.h | 2 +
include/linux/lsm_hooks.h | 7 ++
include/linux/security.h | 3 +
security/security.c | 9 +++
6 files changed, 141 insertions(+), 41 deletions(-)
diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 89714308c25b..2aa8b57be895 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -55,61 +55,78 @@ static struct file_system_type anon_inode_fs_type = {
.kill_sb = kill_anon_super,
};
-/**
- * anon_inode_getfile - creates a new file instance by hooking it up to an
- * anonymous inode, and a dentry that describe the "class"
- * of the file
- *
- * @name: [in] name of the "class" of the new file
- * @fops: [in] file operations for the new file
- * @priv: [in] private data for the new file (will be file's private_data)
- * @flags: [in] flags
- *
- * Creates a new file by hooking it on a single inode. This is useful for files
- * that do not need to have a full-fledged inode in order to operate correctly.
- * All the files created with anon_inode_getfile() will share a single inode,
- * hence saving memory and avoiding code duplication for the file/inode/dentry
- * setup. Returns the newly created file* or an error pointer.
- */
-struct file *anon_inode_getfile(const char *name,
- const struct file_operations *fops,
- void *priv, int flags)
+static struct inode *anon_inode_make_secure_inode(
+ const char *name,
+ const struct inode *context_inode)
+{
+ struct inode *inode;
+ const struct qstr qname = QSTR_INIT(name, strlen(name));
+ int error;
+
+ inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
+ if (IS_ERR(inode))
+ return inode;
+ inode->i_flags &= ~S_PRIVATE;
+ error = security_inode_init_security_anon(
+ inode, &qname, context_inode);
+ if (error) {
+ iput(inode);
+ return ERR_PTR(error);
+ }
+ return inode;
+}
+
+static struct file *_anon_inode_getfile(const char *name,
+ const struct file_operations *fops,
+ void *priv, int flags,
+ const struct inode *context_inode,
+ bool secure)
{
+ struct inode *inode;
struct file *file;
- if (IS_ERR(anon_inode_inode))
- return ERR_PTR(-ENODEV);
+ if (secure) {
+ inode = anon_inode_make_secure_inode(
+ name, context_inode);
+ if (IS_ERR(inode))
+ return ERR_CAST(inode);
+ } else {
+ inode = anon_inode_inode;
+ if (IS_ERR(inode))
+ return ERR_PTR(-ENODEV);
+ /*
+ * We know the anon_inode inode count is always
+ * greater than zero, so ihold() is safe.
+ */
+ ihold(inode);
+ }
- if (fops->owner && !try_module_get(fops->owner))
- return ERR_PTR(-ENOENT);
+ if (fops->owner && !try_module_get(fops->owner)) {
+ file = ERR_PTR(-ENOENT);
+ goto err;
+ }
- /*
- * We know the anon_inode inode count is always greater than zero,
- * so ihold() is safe.
- */
- ihold(anon_inode_inode);
- file = alloc_file_pseudo(anon_inode_inode, anon_inode_mnt, name,
+ file = alloc_file_pseudo(inode, anon_inode_mnt, name,
flags & (O_ACCMODE | O_NONBLOCK), fops);
if (IS_ERR(file))
goto err;
- file->f_mapping = anon_inode_inode->i_mapping;
+ file->f_mapping = inode->i_mapping;
file->private_data = priv;
return file;
err:
- iput(anon_inode_inode);
+ iput(inode);
module_put(fops->owner);
return file;
}
-EXPORT_SYMBOL_GPL(anon_inode_getfile);
/**
- * anon_inode_getfd - creates a new file instance by hooking it up to an
- * anonymous inode, and a dentry that describe the "class"
- * of the file
+ * anon_inode_getfile - creates a new file instance by hooking it up to an
+ * anonymous inode, and a dentry that describe the "class"
+ * of the file
*
* @name: [in] name of the "class" of the new file
* @fops: [in] file operations for the new file
@@ -118,12 +135,23 @@ EXPORT_SYMBOL_GPL(anon_inode_getfile);
*
* Creates a new file by hooking it on a single inode. This is useful for files
* that do not need to have a full-fledged inode in order to operate correctly.
- * All the files created with anon_inode_getfd() will share a single inode,
+ * All the files created with anon_inode_getfile() will share a single inode,
* hence saving memory and avoiding code duplication for the file/inode/dentry
- * setup. Returns new descriptor or an error code.
+ * setup. Returns the newly created file* or an error pointer.
*/
-int anon_inode_getfd(const char *name, const struct file_operations *fops,
- void *priv, int flags)
+struct file *anon_inode_getfile(const char *name,
+ const struct file_operations *fops,
+ void *priv, int flags)
+{
+ return _anon_inode_getfile(name, fops, priv, flags, NULL, false);
+}
+EXPORT_SYMBOL_GPL(anon_inode_getfile);
+
+static int _anon_inode_getfd(const char *name,
+ const struct file_operations *fops,
+ void *priv, int flags,
+ const struct inode *context_inode,
+ bool secure)
{
int error, fd;
struct file *file;
@@ -133,7 +161,8 @@ int anon_inode_getfd(const char *name, const struct file_operations *fops,
return error;
fd = error;
- file = anon_inode_getfile(name, fops, priv, flags);
+ file = _anon_inode_getfile(name, fops, priv, flags, context_inode,
+ secure);
if (IS_ERR(file)) {
error = PTR_ERR(file);
goto err_put_unused_fd;
@@ -146,8 +175,46 @@ int anon_inode_getfd(const char *name, const struct file_operations *fops,
put_unused_fd(fd);
return error;
}
+
+/**
+ * anon_inode_getfd - creates a new file instance by hooking it up to
+ * an anonymous inode and a dentry that describe
+ * the "class" of the file
+ *
+ * @name: [in] name of the "class" of the new file
+ * @fops: [in] file operations for the new file
+ * @priv: [in] private data for the new file (will be file's private_data)
+ * @flags: [in] flags
+ *
+ * Creates a new file by hooking it on a single inode. This is
+ * useful for files that do not need to have a full-fledged inode in
+ * order to operate correctly. All the files created with
+ * anon_inode_getfile() will use the same singleton inode, reducing
+ * memory use and avoiding code duplication for the file/inode/dentry
+ * setup. Returns a newly created file descriptor or an error code.
+ */
+int anon_inode_getfd(const char *name, const struct file_operations *fops,
+ void *priv, int flags)
+{
+ return _anon_inode_getfd(name, fops, priv, flags, NULL, false);
+}
EXPORT_SYMBOL_GPL(anon_inode_getfd);
+/**
+ * Like anon_inode_getfd(), but adds the @context_inode argument to
+ * allow security modules to control creation of the new file. Once the
+ * security module makes the decision, this inode is no longer needed
+ * and hence reference to it is not held.
+ */
+int anon_inode_getfd_secure(const char *name, const struct file_operations *fops,
+ void *priv, int flags,
+ const struct inode *context_inode)
+{
+ return _anon_inode_getfd(name, fops, priv, flags,
+ context_inode, true);
+}
+EXPORT_SYMBOL_GPL(anon_inode_getfd_secure);
+
static int __init anon_inode_init(void)
{
anon_inode_mnt = kern_mount(&anon_inode_fs_type);
@@ -162,4 +229,3 @@ static int __init anon_inode_init(void)
}
fs_initcall(anon_inode_init);
-
diff --git a/include/linux/anon_inodes.h b/include/linux/anon_inodes.h
index d0d7d96261ad..67bd85d92dca 100644
--- a/include/linux/anon_inodes.h
+++ b/include/linux/anon_inodes.h
@@ -10,12 +10,25 @@
#define _LINUX_ANON_INODES_H
struct file_operations;
+struct inode;
+
+struct file *anon_inode_getfile_secure(const char *name,
+ const struct file_operations *fops,
+ void *priv, int flags,
+ const struct inode *context_inode);
struct file *anon_inode_getfile(const char *name,
const struct file_operations *fops,
void *priv, int flags);
+
+int anon_inode_getfd_secure(const char *name,
+ const struct file_operations *fops,
+ void *priv, int flags,
+ const struct inode *context_inode);
+
int anon_inode_getfd(const char *name, const struct file_operations *fops,
void *priv, int flags);
+
#endif /* _LINUX_ANON_INODES_H */
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 2a8c74d99015..35ff75c43de4 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -113,6 +113,8 @@ LSM_HOOK(void, LSM_RET_VOID, inode_free_security, struct inode *inode)
LSM_HOOK(int, 0, inode_init_security, struct inode *inode,
struct inode *dir, const struct qstr *qstr, const char **name,
void **value, size_t *len)
+LSM_HOOK(int, 0, inode_init_security_anon, struct inode *inode,
+ const struct qstr *name, const struct inode *context_inode)
LSM_HOOK(int, 0, inode_create, struct inode *dir, struct dentry *dentry,
umode_t mode)
LSM_HOOK(int, 0, inode_link, struct dentry *old_dentry, struct inode *dir,
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 9e2e3e63719d..2d590d2689b9 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -233,6 +233,13 @@
* Returns 0 if @name and @value have been successfully set,
* -EOPNOTSUPP if no security attribute is needed, or
* -ENOMEM on memory allocation failure.
+ * @inode_init_security_anon:
+ * Set up a secure anonymous inode.
+ * @inode contains the inode structure
+ * @name name of the anonymous inode class
+ * @context_inode optional related inode
+ * Returns 0 on success, -EACCESS if the security module denies the
+ * creation of this inode, or another -errno upon other errors.
* @inode_create:
* Check permission to create a regular file.
* @dir contains inode structure of the parent of the new file.
diff --git a/include/linux/security.h b/include/linux/security.h
index 0a0a03b36a3b..95c133a8f8bb 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -322,6 +322,9 @@ void security_inode_free(struct inode *inode);
int security_inode_init_security(struct inode *inode, struct inode *dir,
const struct qstr *qstr,
initxattrs initxattrs, void *fs_data);
+int security_inode_init_security_anon(struct inode *inode,
+ const struct qstr *name,
+ const struct inode *context_inode);
int security_old_inode_init_security(struct inode *inode, struct inode *dir,
const struct qstr *qstr, const char **name,
void **value, size_t *len);
diff --git a/security/security.c b/security/security.c
index 70a7ad357bc6..149b3f024e2d 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1057,6 +1057,15 @@ int security_inode_init_security(struct inode *inode, struct inode *dir,
}
EXPORT_SYMBOL(security_inode_init_security);
+int
+security_inode_init_security_anon(struct inode *inode,
+ const struct qstr *name,
+ const struct inode *context_inode)
+{
+ return call_int_hook(inode_init_security_anon, 0, inode, name,
+ context_inode);
+}
+
int security_old_inode_init_security(struct inode *inode, struct inode *dir,
const struct qstr *qstr, const char **name,
void **value, size_t *len)
--
2.28.0.297.g1956fa8f8d-goog
^ permalink raw reply related
* [PATCH v7 0/3] SELinux support for anonymous inodes and UFFD
From: Lokesh Gidra @ 2020-08-21 18:56 UTC (permalink / raw)
To: Alexander Viro, James Morris, Stephen Smalley, Casey Schaufler,
Eric Biggers
Cc: Serge E. Hallyn, Paul Moore, Eric Paris, Lokesh Gidra,
Daniel Colascione, Kees Cook, Eric W. Biederman, KP Singh,
David Howells, Thomas Cedeno, Anders Roxell, Sami Tolvanen,
Matthew Garrett, Aaron Goidel, Randy Dunlap,
Joel Fernandes (Google), YueHaibing, Christian Brauner,
Alexei Starovoitov, Alexey Budankov, Adrian Reber, Aleksa Sarai,
linux-fsdevel, linux-kernel, linux-security-module, selinux,
kaleshsingh, calin, surenb, nnk, jeffv, kernel-team
Userfaultfd in unprivileged contexts could be potentially very
useful. We'd like to harden userfaultfd to make such unprivileged use
less risky. This patch series allows SELinux to manage userfaultfd
file descriptors and in the future, other kinds of
anonymous-inode-based file descriptor. SELinux policy authors can
apply policy types to anonymous inodes by providing name-based
transition rules keyed off the anonymous inode internal name (
"[userfaultfd]" in the case of userfaultfd(2) file descriptors) and
applying policy to the new SIDs thus produced.
With SELinux managed userfaultfd, an admin can control creation and
movement of the file descriptors. In particular, handling of
a userfaultfd descriptor by a different process is essentially a
ptrace access into the process, without any of the corresponding
security_ptrace_access_check() checks. For privacy, the admin may
want to deny such accesses, which is possible with SELinux support.
Inside the kernel, a new anon_inode interface, anon_inode_getfd_secure,
allows callers to opt into this SELinux management. In this new "secure"
mode, anon_inodes create new ephemeral inodes for anonymous file objects
instead of reusing the normal anon_inodes singleton dummy inode. A new
LSM hook gives security modules an opportunity to configure and veto
these ephemeral inodes.
This patch series is one of two fork of [1] and is an
alternative to [2].
The primary difference between the two patch series is that this
partch series creates a unique inode for each "secure" anonymous
inode, while the other patch series ([2]) continues using the
singleton dummy anonymous inode and adds a way to attach SELinux
security information directly to file objects.
I prefer the approach in this patch series because 1) it's a smaller
patch than [2], and 2) it produces a more regular security
architecture: in this patch series, secure anonymous inodes aren't
S_PRIVATE and they maintain the SELinux property that the label for a
file is in its inode. We do need an additional inode per anonymous
file, but per-struct-file inode creation doesn't seem to be a problem
for pipes and sockets.
The previous version of this feature ([1]) created a new SELinux
security class for userfaultfd file descriptors. This version adopts
the generic transition-based approach of [2].
This patch series also differs from [2] in that it doesn't affect all
anonymous inodes right away --- instead requiring anon_inodes callers
to opt in --- but this difference isn't one of basic approach. The
important question to resolve is whether we should be creating new
inodes or enhancing per-file data.
Changes from the first version of the patch:
- Removed some error checks
- Defined a new anon_inode SELinux class to resolve the
ambiguity in [3]
- Inherit sclass as well as descriptor from context inode
Changes from the second version of the patch:
- Fixed example policy in the commit message to reflect the use of
the new anon_inode class.
Changes from the third version of the patch:
- Dropped the fops parameter to the LSM hook
- Documented hook parameters
- Fixed incorrect class used for SELinux transition
- Removed stray UFFD changed early in the series
- Removed a redundant ERR_PTR(PTR_ERR())
Changes from the fourth version of the patch:
- Removed an unused parameter from an internal function
- Fixed function documentation
Changes from the fifth version of the patch:
- Fixed function documentation in fs/anon_inodes.c and
include/linux/lsm_hooks.h
- Used anon_inode_getfd_secure() in userfaultfd() syscall and removed
owner from userfaultfd_ctx.
Changed from the sixth version of the patch:
- Removed definition of anon_inode_getfile_secure() as there are no
callers.
- Simplified function description of anon_inode_getfd_secure().
- Elaborated more on the purpose of 'context_inode' in commit message.
[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dancol@google.com/
[2] https://lore.kernel.org/linux-fsdevel/20200213194157.5877-1-sds@tycho.nsa.gov/
[3] https://lore.kernel.org/lkml/23f725ca-5b5a-5938-fcc8-5bbbfc9ba9bc@tycho.nsa.gov/
Daniel Colascione (3):
Add a new LSM-supporting anonymous inode interface
Teach SELinux about anonymous inodes
Wire UFFD up to SELinux
fs/anon_inodes.c | 148 ++++++++++++++++++++--------
fs/userfaultfd.c | 23 +++--
include/linux/anon_inodes.h | 13 +++
include/linux/lsm_hook_defs.h | 2 +
include/linux/lsm_hooks.h | 7 ++
include/linux/security.h | 3 +
security/security.c | 9 ++
security/selinux/hooks.c | 53 ++++++++++
security/selinux/include/classmap.h | 2 +
9 files changed, 210 insertions(+), 50 deletions(-)
--
2.28.0.297.g1956fa8f8d-goog
^ permalink raw reply
* Re: [PATCH 02/11] evm: Load EVM key in ima_load_x509() to avoid appraisal
From: Mimi Zohar @ 2020-08-21 18:45 UTC (permalink / raw)
To: Roberto Sassu, mjg59; +Cc: linux-integrity, linux-security-module, linux-kernel
In-Reply-To: <20200618160133.937-2-roberto.sassu@huawei.com>
On Thu, 2020-06-18 at 18:01 +0200, Roberto Sassu wrote:
> Public keys do not need to be appraised by IMA as the restriction on the
> IMA/EVM keyrings ensures that a key is loaded only if it is signed with a
> key in the primary or secondary keyring.
>
> However, when evm_load_x509() is loaded, appraisal is already enabled and
> a valid IMA signature must be added to the EVM key to pass verification.
>
> Since the restriction is applied on both IMA and EVM keyrings, it is safe
> to disable appraisal also when the EVM key is loaded. This patch calls
> evm_load_x509() inside ima_load_x509() if CONFIG_IMA_LOAD_X509 is defined.
>
> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
> ---
> security/integrity/iint.c | 2 ++
> security/integrity/ima/ima_init.c | 4 ++++
> 2 files changed, 6 insertions(+)
>
> diff --git a/security/integrity/iint.c b/security/integrity/iint.c
> index e12c4900510f..4765a266ba96 100644
> --- a/security/integrity/iint.c
> +++ b/security/integrity/iint.c
> @@ -212,7 +212,9 @@ int integrity_kernel_read(struct file *file, loff_t offset,
> void __init integrity_load_keys(void)
> {
> ima_load_x509();
> +#ifndef CONFIG_IMA_LOAD_X509
> evm_load_x509();
> +#endif
> }
>
> static int __init integrity_fs_init(void)
> diff --git a/security/integrity/ima/ima_init.c b/security/integrity/ima/ima_init.c
> index 4902fe7bd570..9d29a1680da8 100644
> --- a/security/integrity/ima/ima_init.c
> +++ b/security/integrity/ima/ima_init.c
> @@ -106,6 +106,10 @@ void __init ima_load_x509(void)
>
> ima_policy_flag &= ~unset_flags;
> integrity_load_x509(INTEGRITY_KEYRING_IMA, CONFIG_IMA_X509_PATH);
> +
> + /* load also EVM key to avoid appraisal */
> + evm_load_x509();
> +
> ima_policy_flag |= unset_flags;
> }
> #endif
As much as possible IMA and EVM should remain independent of each
other. Modifying integrity_load_x509() doesn't help. This looks like
a good reason for calling another EVM function from within IMA.
Mimi
^ permalink raw reply
* Re: [net-next PATCH] netlabel: fix problems with mapping removal
From: David Miller @ 2020-08-21 18:38 UTC (permalink / raw)
To: paul; +Cc: netdev, selinux, linux-security-module, stephen.smalley.work
In-Reply-To: <159797437409.20181.15427109610194880479.stgit@sifl>
From: Paul Moore <paul@paul-moore.com>
Date: Thu, 20 Aug 2020 21:46:14 -0400
> This patch fixes two main problems seen when removing NetLabel
> mappings: memory leaks and potentially extra audit noise.
These are bug fixes therefore this needs to target the 'net' tree
and you must also provide appropriate "Fixes:" tags.
Thank you.
^ permalink raw reply
* Re: [PATCH 01/11] evm: Execute evm_inode_init_security() only when the HMAC key is loaded
From: Mimi Zohar @ 2020-08-21 18:30 UTC (permalink / raw)
To: Roberto Sassu, mjg59
Cc: linux-integrity, linux-security-module, linux-kernel, stable
In-Reply-To: <20200618160133.937-1-roberto.sassu@huawei.com>
Hi Roberto,
Sorry for the delay in reviewing these patches. Missing from this
patch set is a cover letter with an explanation for grouping these
patches into a patch set, other than for convenience. In this case, it
would be along the lines that the original use case for EVM portable
and immutable keys support was for a few critical files, not combined
with an EVM encrypted key type. This patch set more fully integrates
the initial EVM portable and immutable signature support.
On Thu, 2020-06-18 at 18:01 +0200, Roberto Sassu wrote:
> evm_inode_init_security() requires the HMAC key to calculate the HMAC on
> initial xattrs provided by LSMs. Unfortunately, with the evm_key_loaded()
> check, the function continues even if the HMAC key is not loaded
> (evm_key_loaded() returns true also if EVM has been initialized only with a
> public key). If the HMAC key is not loaded, evm_inode_init_security()
> returns an error later when it calls evm_init_hmac().
>
> Thus, this patch replaces the evm_key_loaded() check with a check of the
> EVM_INIT_HMAC flag in evm_initialized, so that evm_inode_init_security()
> returns 0 instead of an error.
>
> Cc: stable@vger.kernel.org # 4.5.x
> Fixes: 26ddabfe96b ("evm: enable EVM when X509 certificate is loaded")
> Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Reviewed-by: Mimi Zohar <zohar@linux.ibm.com>
> ---
> security/integrity/evm/evm_main.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/security/integrity/evm/evm_main.c b/security/integrity/evm/evm_main.c
> index 0d36259b690d..744c105b48d1 100644
> --- a/security/integrity/evm/evm_main.c
> +++ b/security/integrity/evm/evm_main.c
> @@ -521,7 +521,8 @@ int evm_inode_init_security(struct inode *inode,
> struct evm_xattr *xattr_data;
> int rc;
>
> - if (!evm_key_loaded() || !evm_protected_xattr(lsm_xattr->name))
> + if (!(evm_initialized & EVM_INIT_HMAC) ||
> + !evm_protected_xattr(lsm_xattr->name))
> return 0;
>
> xattr_data = kzalloc(sizeof(*xattr_data), GFP_NOFS);
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox