Linux Security Modules development


* [PATCH] KEYS: remove redundant memsets
From: trix @ 2020-07-21 14:15 UTC
  To: dhowells, jarkko.sakkinen, jmorris, serge, denkenz, marcel
  Cc: keyrings, linux-security-module, linux-kernel, Tom Rix

From: Tom Rix <trix@redhat.com>

Reviewing the use of memset in keyctl_pkey.c

keyctl_pkey_params_get has prologue code to set params up:

	memset(params, 0, sizeof(*params));
	params->encoding = "raw";

keyctl_pkey_params_get_2 and keyctl_pkey_query have the same
prologue and they call keyctl_pkey_params_get.

So remove the prologue from the callers.

In keyctl_pkey_params_get_2, move the copy_from_user
of uparams closer to its use to ensure that
keyctl_pkey_params_get is called first.

Fixes: 00d60fd3b932 ("KEYS: Provide keyctls to drive the new key type ops for asymmetric keys [ver #2]")

Signed-off-by: Tom Rix <trix@redhat.com>
---
 security/keys/keyctl_pkey.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/security/keys/keyctl_pkey.c b/security/keys/keyctl_pkey.c
index 931d8dfb4a7f..60b504681388 100644
--- a/security/keys/keyctl_pkey.c
+++ b/security/keys/keyctl_pkey.c
@@ -119,12 +119,6 @@ static int keyctl_pkey_params_get_2(const struct keyctl_pkey_params __user *_par
 	struct kernel_pkey_query info;
 	int ret;
 
-	memset(params, 0, sizeof(*params));
-	params->encoding = "raw";
-
-	if (copy_from_user(&uparams, _params, sizeof(uparams)) != 0)
-		return -EFAULT;
-
 	ret = keyctl_pkey_params_get(uparams.key_id, _info, params);
 	if (ret < 0)
 		return ret;
@@ -133,6 +127,9 @@ static int keyctl_pkey_params_get_2(const struct keyctl_pkey_params __user *_par
 	if (ret < 0)
 		return ret;
 
+	if (copy_from_user(&uparams, _params, sizeof(uparams)) != 0)
+		return -EFAULT;
+
 	switch (op) {
 	case KEYCTL_PKEY_ENCRYPT:
 	case KEYCTL_PKEY_DECRYPT:
@@ -166,8 +163,6 @@ long keyctl_pkey_query(key_serial_t id,
 	struct kernel_pkey_query res;
 	long ret;
 
-	memset(&params, 0, sizeof(params));
-
 	ret = keyctl_pkey_params_get(id, _info, &params);
 	if (ret < 0)
 		goto error;
-- 
2.18.1



* Re: [PATCH ghak84 v4] audit: purge audit_log_string from the intra-kernel audit API
From: Paul Moore @ 2020-07-21 15:19 UTC
  To: Richard Guy Briggs
  Cc: john.johansen, Linux-Audit Mailing List, LKML,
	Linux Security Module list, Eric Paris
In-Reply-To: <20200714210027.me2ieywjfcsf4v5r@madcap2.tricolour.ca>

On Tue, Jul 14, 2020 at 5:00 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2020-07-14 16:29, Paul Moore wrote:
> > On Tue, Jul 14, 2020 at 1:44 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > > On 2020-07-14 12:21, Paul Moore wrote:
> > > > On Mon, Jul 13, 2020 at 3:52 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> > > > >
> > > > > > audit_log_string() was intended to be an internal audit function and
> > > > > since there are only two internal uses, remove them.  Purge all external
> > > > > uses of it by restructuring code to use an existing audit_log_format()
> > > > > or using audit_log_format().
> > > > >
> > > > > Please see the upstream issue
> > > > > https://github.com/linux-audit/audit-kernel/issues/84
> > > > >
> > > > > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> > > > > ---
> > > > > Passes audit-testsuite.
> > > > >
> > > > > Changelog:
> > > > > v4
> > > > > - use double quotes in all replaced audit_log_string() calls
> > > > >
> > > > > v3
> > > > > - fix two warning: non-void function does not return a value in all control paths
> > > > >         Reported-by: kernel test robot <lkp@intel.com>
> > > > >
> > > > > v2
> > > > > - restructure to piggyback on existing audit_log_format() calls, checking quoting needs for each.
> > > > >
> > > > > v1 Vlad Dronov
> > > > > - https://github.com/nefigtut/audit-kernel/commit/dbbcba46335a002f44b05874153a85b9cc18aebf
> > > > >
> > > > >  include/linux/audit.h     |  5 -----
> > > > >  kernel/audit.c            |  4 ++--
> > > > >  security/apparmor/audit.c | 10 ++++------
> > > > >  security/apparmor/file.c  | 25 +++++++------------------
> > > > >  security/apparmor/ipc.c   | 46 +++++++++++++++++++++++-----------------------
> > > > >  security/apparmor/net.c   | 14 ++++++++------
> > > > >  security/lsm_audit.c      |  4 ++--
> > > > >  7 files changed, 46 insertions(+), 62 deletions(-)
> > > >
> > > > Thanks for restoring the quotes, just one question below ...
> > > >
> > > > > diff --git a/security/apparmor/ipc.c b/security/apparmor/ipc.c
> > > > > index 4ecedffbdd33..fe36d112aad9 100644
> > > > > --- a/security/apparmor/ipc.c
> > > > > +++ b/security/apparmor/ipc.c
> > > > > @@ -20,25 +20,23 @@
> > > > >
> > > > >  /**
> > > > >   * audit_ptrace_mask - convert mask to permission string
> > > > > - * @buffer: buffer to write string to (NOT NULL)
> > > > >   * @mask: permission mask to convert
> > > > > + *
> > > > > + * Returns: pointer to static string
> > > > >   */
> > > > > -static void audit_ptrace_mask(struct audit_buffer *ab, u32 mask)
> > > > > +static const char *audit_ptrace_mask(u32 mask)
> > > > >  {
> > > > >         switch (mask) {
> > > > >         case MAY_READ:
> > > > > -               audit_log_string(ab, "read");
> > > > > -               break;
> > > > > +               return "read";
> > > > >         case MAY_WRITE:
> > > > > -               audit_log_string(ab, "trace");
> > > > > -               break;
> > > > > +               return "trace";
> > > > >         case AA_MAY_BE_READ:
> > > > > -               audit_log_string(ab, "readby");
> > > > > -               break;
> > > > > +               return "readby";
> > > > >         case AA_MAY_BE_TRACED:
> > > > > -               audit_log_string(ab, "tracedby");
> > > > > -               break;
> > > > > +               return "tracedby";
> > > > >         }
> > > > > +       return "";
> > > >
> > > > Are we okay with this returning an empty string ("") in this case?
> > > > Should it be a question mark ("?")?
> > > >
> > > > My guess is that userspace parsing should be okay since it still has
> > > > quotes, I'm just not sure if we wanted to use a question mark as we do
> > > > in other cases where the field value is empty/unknown.
> > >
> > > Previously, it would have been an empty value, not even double quotes.
> > > "?" might be an improvement.
> >
> > Did you want to fix that now in this patch, or leave it to later?  As
> > I said above, I'm not too bothered by it with the quotes so either way
> > is fine by me.
>
> I'd defer to Steve, otherwise I'd say leave it, since there wasn't
> anything there before and this makes that more evident.
>
> > John, I'm assuming you are okay with this patch?

With no comments from John or Steve in the past week, I've gone ahead
and merged the patch into audit/next.

-- 
paul moore
www.paul-moore.com


* Re: [PATCH 2/2] LSM: SafeSetID: Add GID security policy handling
From: Thomas Cedeno @ 2020-07-21 17:01 UTC
  To: Serge E. Hallyn
  Cc: Micah Morton, linux-security-module, keescook, casey, paul,
	stephen.smalley.work, jmorris
In-Reply-To: <20200721024202.GB28125@mail.hallyn.com>

On Mon, Jul 20, 2020 at 10:42 PM Serge E. Hallyn <serge@hallyn.com> wrote:
>
> On Mon, Jul 20, 2020 at 11:12:03AM -0700, Micah Morton wrote:
> > From: Thomas Cedeno <thomascedeno@google.com>
> >
> > The SafeSetID LSM has functionality for restricting setuid() calls based
> > on its configured security policies. This patch adds the analogous
> > functionality for setgid() calls. This is mostly a copy-and-paste change
> > with some code deduplication, plus slight modifications/name changes to
> > the policy-rule-related structs (now contain GID rules in addition to
> > the UID ones) and some type generalization since SafeSetID now needs to
> > deal with kgid_t and kuid_t types.
> >
> > Signed-off-by: Thomas Cedeno <thomascedeno@google.com>
> > Signed-off-by: Micah Morton <mortonm@chromium.org>
>
> Just one question and two little comments below.
>
> > ---
> > NOTE: Looks like some userns-related lines in the selftest for SafeSetID
> > recently had some kind of regression. We won't be sending a patch to
> > update the selftest until we can get to the bottom of that. However, we
> > have a WIP (due to the userns regression) update to the selftest which
> > tests the GID stuff and we have used it to ensure this patch is correct.
> >
> >  security/safesetid/lsm.c        | 178 +++++++++++++++++++++-------
> >  security/safesetid/lsm.h        |  38 ++++--
> >  security/safesetid/securityfs.c | 198 +++++++++++++++++++++++---------
> >  3 files changed, 309 insertions(+), 105 deletions(-)
> >
> > diff --git a/security/safesetid/lsm.c b/security/safesetid/lsm.c
> > index 7760019ad35d..787a98e82f1e 100644
> > --- a/security/safesetid/lsm.c
> > +++ b/security/safesetid/lsm.c
> > @@ -24,20 +24,36 @@
> >  /* Flag indicating whether initialization completed */
> >  int safesetid_initialized;
> >
> > -struct setuid_ruleset __rcu *safesetid_setuid_rules;
> > +struct setid_ruleset __rcu *safesetid_setuid_rules;
> > +struct setid_ruleset __rcu *safesetid_setgid_rules;
> > +
> >
> >  /* Compute a decision for a transition from @src to @dst under @policy. */
> > -enum sid_policy_type _setuid_policy_lookup(struct setuid_ruleset *policy,
> > -             kuid_t src, kuid_t dst)
> > +enum sid_policy_type _setid_policy_lookup(struct setid_ruleset *policy,
> > +             kid_t src, kid_t dst)
> >  {
> > -     struct setuid_rule *rule;
> > +     struct setid_rule *rule;
> >       enum sid_policy_type result = SIDPOL_DEFAULT;
> >
> > -     hash_for_each_possible(policy->rules, rule, next, __kuid_val(src)) {
> > -             if (!uid_eq(rule->src_uid, src))
> > -                     continue;
> > -             if (uid_eq(rule->dst_uid, dst))
> > -                     return SIDPOL_ALLOWED;
> > +     if (policy->type == UID) {
> > +             hash_for_each_possible(policy->rules, rule, next, __kuid_val(src.uid)) {
> > +                     if (!uid_eq(rule->src_id.uid, src.uid))
> > +                             continue;
> > +                     if (uid_eq(rule->dst_id.uid, dst.uid))
> > +                             return SIDPOL_ALLOWED;
> > +                     result = SIDPOL_CONSTRAINED;
>
> Can you describe precisely under which conditions SIDPOL_CONSTRAINED should
> be returned vs. SIDPOL_DEFAULT?
>


For calculating ID transitions, SafeSetID takes a src and dst UID
or GID. If an existing policy lists the src ID in one or more of its
rules, but never paired with the dst ID, we need to constrain the ID
and thus return SIDPOL_CONSTRAINED. If no rule even mentions the src
ID, the lookup passes through as SIDPOL_DEFAULT, where the ID is not
constrained and can be used for other purposes.
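
To make the three outcomes concrete, here is a minimal user-space sketch
of the same decision. It is not the kernel code: demo_rule, demo_policy
and demo_lookup are hypothetical names, IDs are plain integers, and a
flat array stands in for the hash table.

```c
#include <assert.h>
#include <stddef.h>

/* Same three outcomes as the patch's enum sid_policy_type. */
enum sid_policy_type {
	SIDPOL_DEFAULT,     /* no rule mentions the source ID */
	SIDPOL_CONSTRAINED, /* source ID is listed, but not with this target */
	SIDPOL_ALLOWED      /* a rule lists exactly src -> dst */
};

/* Hypothetical stand-in for struct setid_rule (no hlist, no kid_t). */
struct demo_rule {
	unsigned int src;
	unsigned int dst;
};

/* Example policy: ID 1000 may only become 1001 or 1002. */
static const struct demo_rule demo_policy[] = {
	{ 1000, 1001 },
	{ 1000, 1002 },
};
#define DEMO_POLICY_LEN (sizeof(demo_policy) / sizeof(demo_policy[0]))

static enum sid_policy_type demo_lookup(const struct demo_rule *rules,
					size_t n, unsigned int src,
					unsigned int dst)
{
	enum sid_policy_type result = SIDPOL_DEFAULT;
	size_t i;

	assert(rules != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (rules[i].src != src)
			continue;		/* rule is about another ID */
		if (rules[i].dst == dst)
			return SIDPOL_ALLOWED;	/* exact transition listed */
		result = SIDPOL_CONSTRAINED;	/* src is under policy */
	}
	return result;
}
```

With this example policy, 1000 -> 1001 is allowed, 1000 -> 2000 is
constrained, and any lookup for a source ID such as 3000 falls through
as default.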


> > +             }
> > +     } else if (policy->type == GID) {
> > +             hash_for_each_possible(policy->rules, rule, next, __kgid_val(src.gid)) {
> > +                     if (!gid_eq(rule->src_id.gid, src.gid))
> > +                             continue;
> > +                     if (gid_eq(rule->dst_id.gid, dst.gid)){
> > +                             return SIDPOL_ALLOWED;
> > +                     }
> > +                     result = SIDPOL_CONSTRAINED;
> > +             }
> > +     } else {
> > +             /* Should not reach here, report the ID as contrainsted */
> >               result = SIDPOL_CONSTRAINED;
> >       }
> >       return result;
> > @@ -47,15 +63,26 @@ enum sid_policy_type _setuid_policy_lookup(struct setuid_ruleset *policy,
> >   * Compute a decision for a transition from @src to @dst under the active
> >   * policy.
> >   */
> > -static enum sid_policy_type setuid_policy_lookup(kuid_t src, kuid_t dst)
> > +static enum sid_policy_type setid_policy_lookup(kid_t src, kid_t dst, enum setid_type new_type)
> >  {
> >       enum sid_policy_type result = SIDPOL_DEFAULT;
> > -     struct setuid_ruleset *pol;
> > +     struct setid_ruleset *pol;
> >
> >       rcu_read_lock();
> > -     pol = rcu_dereference(safesetid_setuid_rules);
> > -     if (pol)
> > -             result = _setuid_policy_lookup(pol, src, dst);
> > +     if (new_type == UID)
> > +             pol = rcu_dereference(safesetid_setuid_rules);
> > +     else if (new_type == GID)
> > +             pol = rcu_dereference(safesetid_setgid_rules);
> > +     else { /* Should not reach here */
> > +             result = SIDPOL_CONSTRAINED;
> > +             rcu_read_unlock();
> > +             return result;
> > +     }
> > +
> > +     if (pol) {
> > +             pol->type = new_type;
> > +             result = _setid_policy_lookup(pol, src, dst);
> > +     }
> >       rcu_read_unlock();
> >       return result;
> >  }
> > @@ -65,8 +92,8 @@ static int safesetid_security_capable(const struct cred *cred,
> >                                     int cap,
> >                                     unsigned int opts)
> >  {
> > -     /* We're only interested in CAP_SETUID. */
> > -     if (cap != CAP_SETUID)
> > +     /* We're only interested in CAP_SETUID and CAP_SETGID. */
> > +     if (cap != CAP_SETUID && cap != CAP_SETGID)
> >               return 0;
> >
> >       /*
> > @@ -77,45 +104,83 @@ static int safesetid_security_capable(const struct cred *cred,
> >       if ((opts & CAP_OPT_INSETID) != 0)
> >               return 0;
> >
> > -     /*
> > -      * If no policy applies to this task, allow the use of CAP_SETUID for
> > -      * other purposes.
> > -      */
> > -     if (setuid_policy_lookup(cred->uid, INVALID_UID) == SIDPOL_DEFAULT)
> > +     switch (cap) {
> > +     case CAP_SETUID:
> > +             /*
> > +             * If no policy applies to this task, allow the use of CAP_SETUID for
> > +             * other purposes.
> > +             */
> > +             if (setid_policy_lookup((kid_t)cred->uid, INVALID_ID, UID) == SIDPOL_DEFAULT)
> > +                     return 0;
> > +             /*
> > +              * Reject use of CAP_SETUID for functionality other than calling
> > +              * set*uid() (e.g. setting up userns uid mappings).
> > +              */
> > +             pr_warn("Operation requires CAP_SETUID, which is not available to UID %u for operations besides approved set*uid transitions\n",
> > +                     __kuid_val(cred->uid));
> > +             return -EPERM;
> > +             break;
> > +     case CAP_SETGID:
> > +             /*
> > +             * If no policy applies to this task, allow the use of CAP_SETGID for
> > +             * other purposes.
> > +             */
> > +             if (setid_policy_lookup((kid_t)cred->gid, INVALID_ID, GID) == SIDPOL_DEFAULT)
> > +                     return 0;
> > +             /*
> > +              * Reject use of CAP_SETUID for functionality other than calling
> > +              * set*gid() (e.g. setting up userns gid mappings).
> > +              */
> > +             pr_warn("Operation requires CAP_SETGID, which is not available to GID %u for operations besides approved set*gid transitions\n",
> > +                     __kuid_val(cred->uid));
> > +             return -EPERM;
> > +             break;
> > +     default:
> > +             /* Error, the only capabilities were checking for is CAP_SETUID/GID */
> >               return 0;
> > -
> > -     /*
> > -      * Reject use of CAP_SETUID for functionality other than calling
> > -      * set*uid() (e.g. setting up userns uid mappings).
> > -      */
> > -     pr_warn("Operation requires CAP_SETUID, which is not available to UID %u for operations besides approved set*uid transitions\n",
> > -             __kuid_val(cred->uid));
> > -     return -EPERM;
> > +             break;
> > +     }
> > +     return 0;
> >  }
> >
> >  /*
> >   * Check whether a caller with old credentials @old is allowed to switch to
> > - * credentials that contain @new_uid.
> > + * credentials that contain @new_id.
> >   */
> > -static bool uid_permitted_for_cred(const struct cred *old, kuid_t new_uid)
> > +static bool id_permitted_for_cred(const struct cred *old, kid_t new_id, enum setid_type new_type)
> >  {
> >       bool permitted;
> >
> > -     /* If our old creds already had this UID in it, it's fine. */
> > -     if (uid_eq(new_uid, old->uid) || uid_eq(new_uid, old->euid) ||
> > -         uid_eq(new_uid, old->suid))
> > -             return true;
> > +     /* If our old creds already had this ID in it, it's fine. */
> > +     if (new_type == UID) {
> > +             if (uid_eq(new_id.uid, old->uid) || uid_eq(new_id.uid, old->euid) ||
> > +                     uid_eq(new_id.uid, old->suid))
> > +                     return true;
> > +     } else if (new_type == GID){
> > +             if (gid_eq(new_id.gid, old->gid) || gid_eq(new_id.gid, old->egid) ||
> > +                     gid_eq(new_id.gid, old->sgid))
> > +                     return true;
> > +     } else /* Error, new_type is an invalid type */
> > +             return false;
> >
> >       /*
> >        * Transitions to new UIDs require a check against the policy of the old
> >        * RUID.
> >        */
> >       permitted =
> > -         setuid_policy_lookup(old->uid, new_uid) != SIDPOL_CONSTRAINED;
> > +         setid_policy_lookup((kid_t)old->uid, new_id, new_type) != SIDPOL_CONSTRAINED;
> > +
> >       if (!permitted) {
> > -             pr_warn("UID transition ((%d,%d,%d) -> %d) blocked\n",
> > -                     __kuid_val(old->uid), __kuid_val(old->euid),
> > -                     __kuid_val(old->suid), __kuid_val(new_uid));
> > +             if (new_type == UID) {
> > +                     pr_warn("UID transition ((%d,%d,%d) -> %d) blocked\n",
> > +                             __kuid_val(old->uid), __kuid_val(old->euid),
> > +                             __kuid_val(old->suid), __kuid_val(new_id.uid));
> > +             } else if (new_type == GID) {
> > +                     pr_warn("GID transition ((%d,%d,%d) -> %d) blocked\n",
> > +                             __kgid_val(old->gid), __kgid_val(old->egid),
> > +                             __kgid_val(old->sgid), __kgid_val(new_id.gid));
> > +             } else /* Error, new_type is an invalid type */
> > +                     return false;
> >       }
> >       return permitted;
> >  }
> > @@ -131,13 +196,37 @@ static int safesetid_task_fix_setuid(struct cred *new,
> >  {
> >
> >       /* Do nothing if there are no setuid restrictions for our old RUID. */
> > -     if (setuid_policy_lookup(old->uid, INVALID_UID) == SIDPOL_DEFAULT)
> > +     if (setid_policy_lookup((kid_t)old->uid, INVALID_ID, UID) == SIDPOL_DEFAULT)
> > +             return 0;
> > +
> > +     if (id_permitted_for_cred(old, (kid_t)new->uid, UID) &&
> > +         id_permitted_for_cred(old, (kid_t)new->euid, UID) &&
> > +         id_permitted_for_cred(old, (kid_t)new->suid, UID) &&
> > +         id_permitted_for_cred(old, (kid_t)new->fsuid, UID))
> > +             return 0;
> > +
> > +     /*
> > +      * Kill this process to avoid potential security vulnerabilities
> > +      * that could arise from a missing whitelist entry preventing a
> > +      * privileged process from dropping to a lesser-privileged one.
> > +      */
> > +     force_sig(SIGKILL);
> > +     return -EACCES;
> > +}
> > +
> > +static int safesetid_task_fix_setgid(struct cred *new,
> > +                                  const struct cred *old,
> > +                                  int flags)
> > +{
> > +
> > +     /* Do nothing if there are no setgid restrictions for our old RGID. */
> > +     if (setid_policy_lookup((kid_t)old->gid, INVALID_ID, GID) == SIDPOL_DEFAULT)
> >               return 0;
> >
> > -     if (uid_permitted_for_cred(old, new->uid) &&
> > -         uid_permitted_for_cred(old, new->euid) &&
> > -         uid_permitted_for_cred(old, new->suid) &&
> > -         uid_permitted_for_cred(old, new->fsuid))
> > +     if (id_permitted_for_cred(old, (kid_t)new->gid, GID) &&
> > +         id_permitted_for_cred(old, (kid_t)new->egid, GID) &&
> > +         id_permitted_for_cred(old, (kid_t)new->sgid, GID) &&
> > +         id_permitted_for_cred(old, (kid_t)new->fsgid, GID))
> >               return 0;
> >
> >       /*
> > @@ -151,6 +240,7 @@ static int safesetid_task_fix_setuid(struct cred *new,
> >
> >  static struct security_hook_list safesetid_security_hooks[] = {
> >       LSM_HOOK_INIT(task_fix_setuid, safesetid_task_fix_setuid),
> > +     LSM_HOOK_INIT(task_fix_setgid, safesetid_task_fix_setgid),
> >       LSM_HOOK_INIT(capable, safesetid_security_capable)
> >  };
> >
> > diff --git a/security/safesetid/lsm.h b/security/safesetid/lsm.h
> > index db6d16e6bbc3..bde8c43a3767 100644
> > --- a/security/safesetid/lsm.h
> > +++ b/security/safesetid/lsm.h
> > @@ -27,27 +27,47 @@ enum sid_policy_type {
> >       SIDPOL_ALLOWED /* target ID explicitly allowed */
> >  };
> >
> > +typedef union {
> > +     kuid_t uid;
> > +     kgid_t gid;
> > +} kid_t;
> > +
> > +enum setid_type {
> > +     UID,
> > +     GID
> > +};
> > +
> >  /*
> > - * Hash table entry to store safesetid policy signifying that 'src_uid'
> > - * can setuid to 'dst_uid'.
> > + * Hash table entry to store safesetid policy signifying that 'src_id'
> > + * can set*id to 'dst_id'.
> >   */
> > -struct setuid_rule {
> > +struct setid_rule {
> >       struct hlist_node next;
> > -     kuid_t src_uid;
> > -     kuid_t dst_uid;
> > +     kid_t src_id;
> > +     kid_t dst_id;
> > +
> > +     /* Flag to signal if rule is for UID's or GID's */
> > +     enum setid_type type;
> >  };
> >
> >  #define SETID_HASH_BITS 8 /* 256 buckets in hash table */
> >
> > -struct setuid_ruleset {
> > +/* Extension of INVALID_UID/INVALID_GID for kid_t type */
> > +#define INVALID_ID (kid_t){.uid = INVALID_UID}
> > +
> > +struct setid_ruleset {
> >       DECLARE_HASHTABLE(rules, SETID_HASH_BITS);
> >       char *policy_str;
> >       struct rcu_head rcu;
> > +
> > +     //Flag to signal if ruleset is for UID's or GID's
> > +     enum setid_type type;
> >  };
> >
> > -enum sid_policy_type _setuid_policy_lookup(struct setuid_ruleset *policy,
> > -             kuid_t src, kuid_t dst);
> > +enum sid_policy_type _setid_policy_lookup(struct setid_ruleset *policy,
> > +             kid_t src, kid_t dst);
> >
> > -extern struct setuid_ruleset __rcu *safesetid_setuid_rules;
> > +extern struct setid_ruleset __rcu *safesetid_setuid_rules;
> > +extern struct setid_ruleset __rcu *safesetid_setgid_rules;
> >
> >  #endif /* _SAFESETID_H */
> > diff --git a/security/safesetid/securityfs.c b/security/safesetid/securityfs.c
> > index f8bc574cea9c..211050d0a922 100644
> > --- a/security/safesetid/securityfs.c
> > +++ b/security/safesetid/securityfs.c
> > @@ -19,22 +19,23 @@
> >
> >  #include "lsm.h"
> >
> > -static DEFINE_MUTEX(policy_update_lock);
> > +static DEFINE_MUTEX(uid_policy_update_lock);
> > +static DEFINE_MUTEX(gid_policy_update_lock);
> >
> >  /*
> > - * In the case the input buffer contains one or more invalid UIDs, the kuid_t
> > + * In the case the input buffer contains one or more invalid IDs, the kid_t
> >   * variables pointed to by @parent and @child will get updated but this
> >   * function will return an error.
> >   * Contents of @buf may be modified.
> >   */
> >  static int parse_policy_line(struct file *file, char *buf,
> > -     struct setuid_rule *rule)
> > +     struct setid_rule *rule)
> >  {
> >       char *child_str;
> >       int ret;
> >       u32 parsed_parent, parsed_child;
> >
> > -     /* Format of |buf| string should be <UID>:<UID>. */
> > +     /* Format of |buf| string should be <UID>:<UID> or <GID>:<GID> */
> >       child_str = strchr(buf, ':');
> >       if (child_str == NULL)
> >               return -EINVAL;
> > @@ -49,20 +50,36 @@ static int parse_policy_line(struct file *file, char *buf,
> >       if (ret)
> >               return ret;
> >
> > -     rule->src_uid = make_kuid(file->f_cred->user_ns, parsed_parent);
> > -     rule->dst_uid = make_kuid(file->f_cred->user_ns, parsed_child);
> > -     if (!uid_valid(rule->src_uid) || !uid_valid(rule->dst_uid))
> > +     if (rule->type == UID){
> > +             rule->src_id.uid = make_kuid(file->f_cred->user_ns, parsed_parent);
> > +             rule->dst_id.uid = make_kuid(file->f_cred->user_ns, parsed_child);
> > +     } else if (rule->type == GID){
> > +             rule->src_id.gid = make_kgid(file->f_cred->user_ns, parsed_parent);
> > +             rule->dst_id.gid = make_kgid(file->f_cred->user_ns, parsed_child);
> > +     } else {
> > +             /* Error, rule->type is an invalid type */
> >               return -EINVAL;
> > +     }
>
> Is there any reason not to have the below if/else actions go into the above
> if/else block?
>


As for the two if/else if/else chains in the parse_policy_line
function, I think you're right that they can be collapsed
together. We'll send another patch to change this.
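
As a rough illustration of the collapsed shape (not the follow-up patch
itself), the conversion and the validity check can share one branch per
type. This is a user-space sketch: the demo_* types and helpers are
hypothetical stand-ins for kuid_t/kgid_t, make_kuid()/make_kgid() and
uid_valid()/gid_valid(), and the namespace mapping is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum setid_type { UID, GID };

/* Hypothetical user-space stand-ins for kuid_t, kgid_t and kid_t. */
typedef struct { uint32_t val; } demo_kuid_t;
typedef struct { uint32_t val; } demo_kgid_t;
typedef union { demo_kuid_t uid; demo_kgid_t gid; } demo_kid_t;

#define DEMO_INVALID ((uint32_t)-1)

static int demo_uid_valid(demo_kuid_t u) { return u.val != DEMO_INVALID; }
static int demo_gid_valid(demo_kgid_t g) { return g.val != DEMO_INVALID; }

struct demo_rule {
	demo_kid_t src_id;
	demo_kid_t dst_id;
	enum setid_type type;
};

/* Fill in and validate both IDs in a single branch per type. */
static int demo_fill_rule(struct demo_rule *rule, uint32_t parent,
			  uint32_t child)
{
	assert(rule != NULL);

	switch (rule->type) {
	case UID:
		rule->src_id.uid.val = parent;
		rule->dst_id.uid.val = child;
		if (!demo_uid_valid(rule->src_id.uid) ||
		    !demo_uid_valid(rule->dst_id.uid))
			return -EINVAL;
		return 0;
	case GID:
		rule->src_id.gid.val = parent;
		rule->dst_id.gid.val = child;
		if (!demo_gid_valid(rule->src_id.gid) ||
		    !demo_gid_valid(rule->dst_id.gid))
			return -EINVAL;
		return 0;
	default:
		/* rule->type is neither UID nor GID */
		return -EINVAL;
	}
}

/* Small wrapper so the behaviour can be checked in one call. */
static int demo_check(enum setid_type type, uint32_t parent, uint32_t child)
{
	struct demo_rule rule = { .type = type };

	return demo_fill_rule(&rule, parent, child);
}
```

The invalid-type case is then handled once instead of twice.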



> > +     if (rule->type == UID) {
> > +             if (!uid_valid(rule->src_id.uid) || !uid_valid(rule->dst_id.uid))
> > +                     return -EINVAL;
> > +     } else if (rule->type == GID) {
> > +             if (!gid_valid(rule->src_id.gid) || !gid_valid(rule->dst_id.gid))
> > +                     return -EINVAL;
> > +     } else {
> > +             /* Error, rule->type is an invalid type */
> > +             return -EINVAL;
> > +     }
> >       return 0;
> >  }
> >
> >  static void __release_ruleset(struct rcu_head *rcu)
> >  {
> > -     struct setuid_ruleset *pol =
> > -             container_of(rcu, struct setuid_ruleset, rcu);
> > +     struct setid_ruleset *pol =
> > +             container_of(rcu, struct setid_ruleset, rcu);
> >       int bucket;
> > -     struct setuid_rule *rule;
> > +     struct setid_rule *rule;
> >       struct hlist_node *tmp;
> >
> >       hash_for_each_safe(pol->rules, bucket, tmp, rule, next)
> > @@ -71,36 +88,58 @@ static void __release_ruleset(struct rcu_head *rcu)
> >       kfree(pol);
> >  }
> >
> > -static void release_ruleset(struct setuid_ruleset *pol)
> > -{
> > +static void release_ruleset(struct setid_ruleset *pol){
> >       call_rcu(&pol->rcu, __release_ruleset);
> >  }
> >
> > -static void insert_rule(struct setuid_ruleset *pol, struct setuid_rule *rule)
> > +static void insert_rule(struct setid_ruleset *pol, struct setid_rule *rule)
> >  {
> > -     hash_add(pol->rules, &rule->next, __kuid_val(rule->src_uid));
> > +     if (pol->type == UID)
> > +             hash_add(pol->rules, &rule->next, __kuid_val(rule->src_id.uid));
> > +     else if (pol->type == GID)
> > +             hash_add(pol->rules, &rule->next, __kgid_val(rule->src_id.gid));
> > +     else /* Error, pol->type is neither UID or GID */
> > +             return;
> >  }
> >
> > -static int verify_ruleset(struct setuid_ruleset *pol)
> > +static int verify_ruleset(struct setid_ruleset *pol)
> >  {
> >       int bucket;
> > -     struct setuid_rule *rule, *nrule;
> > +     struct setid_rule *rule, *nrule;
> >       int res = 0;
> >
> >       hash_for_each(pol->rules, bucket, rule, next) {
> > -             if (_setuid_policy_lookup(pol, rule->dst_uid, INVALID_UID) ==
> > -                 SIDPOL_DEFAULT) {
> > -                     pr_warn("insecure policy detected: uid %d is constrained but transitively unconstrained through uid %d\n",
> > -                             __kuid_val(rule->src_uid),
> > -                             __kuid_val(rule->dst_uid));
> > +             if (_setid_policy_lookup(pol, rule->dst_id, INVALID_ID) == SIDPOL_DEFAULT) {
> > +                     if (pol->type == UID) {
> > +                             pr_warn("insecure policy detected: uid %d is constrained but transitively unconstrained through uid %d\n",
> > +                                     __kuid_val(rule->src_id.uid),
> > +                                     __kuid_val(rule->dst_id.uid));
> > +                     } else if (pol->type == GID) {
> > +                             pr_warn("insecure policy detected: gid %d is constrained but transitively unconstrained through gid %d\n",
> > +                                     __kgid_val(rule->src_id.gid),
> > +                                     __kgid_val(rule->dst_id.gid));
> > +                     } else { /* pol->type is an invalid type */
> > +                             res = -EINVAL;
> > +                             return res;
> > +                     }
> >                       res = -EINVAL;
> >
> >                       /* fix it up */
> > -                     nrule = kmalloc(sizeof(struct setuid_rule), GFP_KERNEL);
> > +                     nrule = kmalloc(sizeof(struct setid_rule), GFP_KERNEL);
> >                       if (!nrule)
> >                               return -ENOMEM;
> > -                     nrule->src_uid = rule->dst_uid;
> > -                     nrule->dst_uid = rule->dst_uid;
> > +                     if (pol->type == UID){
> > +                             nrule->src_id.uid = rule->dst_id.uid;
> > +                             nrule->dst_id.uid = rule->dst_id.uid;
> > +                             nrule->type = UID;
> > +                     } else if (pol->type == GID){
> > +                             nrule->src_id.gid = rule->dst_id.gid;
> > +                             nrule->dst_id.gid = rule->dst_id.gid;
> > +                             nrule->type = GID;
> > +                     } else { /* Error, pol->type is neither UID or GID */
> > +                             kfree(nrule);
>
> Why not check this before the kmalloc?
>


The whole else branch is a sanity check and will probably never be
taken. Looking at it a second time, it duplicates the check done by
the else statement in the branch above, so this else statement
can be dropped in the new patch.
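
For reference, the ordering you suggest would look roughly like the
sketch below: reject a bad type first, so the error path never has to
free anything. This is user-space C with malloc() standing in for
kmalloc(), and demo_rule, demo_make_fixup and demo_fixup_status are
hypothetical names, not code from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

enum setid_type { UID, GID };

/* Hypothetical stand-in for struct setid_rule with plain integer IDs. */
struct demo_rule {
	unsigned int src;
	unsigned int dst;
	enum setid_type type;
};

/*
 * Build the dst -> dst self-transition rule used as the fix-up.
 * The type is validated before allocating, so no free is needed on error.
 */
static int demo_make_fixup(enum setid_type type, unsigned int dst,
			   struct demo_rule **out)
{
	struct demo_rule *nrule;

	assert(out != NULL);
	*out = NULL;

	if (type != UID && type != GID)
		return -EINVAL;

	nrule = malloc(sizeof(*nrule));
	if (!nrule)
		return -ENOMEM;

	nrule->src = dst;
	nrule->dst = dst;
	nrule->type = type;
	*out = nrule;
	return 0;
}

/* Helper for checking: returns the status and releases the rule. */
static int demo_fixup_status(enum setid_type type, unsigned int dst)
{
	struct demo_rule *rule;
	int ret = demo_make_fixup(type, dst, &rule);

	if (ret == 0)
		assert(rule->src == dst && rule->dst == dst);
	free(rule);
	return ret;
}
```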



> > +                             return res;
> > +                     }
> >                       insert_rule(pol, nrule);
> >               }
> >       }
> > @@ -108,16 +147,17 @@ static int verify_ruleset(struct setuid_ruleset *pol)
> >  }
> >
> >  static ssize_t handle_policy_update(struct file *file,
> > -                                 const char __user *ubuf, size_t len)
> > +                                 const char __user *ubuf, size_t len, enum setid_type policy_type)
> >  {
> > -     struct setuid_ruleset *pol;
> > +     struct setid_ruleset *pol;
> >       char *buf, *p, *end;
> >       int err;
> >
> > -     pol = kmalloc(sizeof(struct setuid_ruleset), GFP_KERNEL);
> > +     pol = kmalloc(sizeof(struct setid_ruleset), GFP_KERNEL);
> >       if (!pol)
> >               return -ENOMEM;
> >       pol->policy_str = NULL;
> > +     pol->type = policy_type;
> >       hash_init(pol->rules);
> >
> >       p = buf = memdup_user_nul(ubuf, len);
> > @@ -133,7 +173,7 @@ static ssize_t handle_policy_update(struct file *file,
> >
> >       /* policy lines, including the last one, end with \n */
> >       while (*p != '\0') {
> > -             struct setuid_rule *rule;
> > +             struct setid_rule *rule;
> >
> >               end = strchr(p, '\n');
> >               if (end == NULL) {
> > @@ -142,18 +182,18 @@ static ssize_t handle_policy_update(struct file *file,
> >               }
> >               *end = '\0';
> >
> > -             rule = kmalloc(sizeof(struct setuid_rule), GFP_KERNEL);
> > +             rule = kmalloc(sizeof(struct setid_rule), GFP_KERNEL);
> >               if (!rule) {
> >                       err = -ENOMEM;
> >                       goto out_free_buf;
> >               }
> >
> > +             rule->type = policy_type;
> >               err = parse_policy_line(file, p, rule);
> >               if (err)
> >                       goto out_free_rule;
> >
> > -             if (_setuid_policy_lookup(pol, rule->src_uid, rule->dst_uid) ==
> > -                 SIDPOL_ALLOWED) {
> > +             if (_setid_policy_lookup(pol, rule->src_id, rule->dst_id) == SIDPOL_ALLOWED) {
> >                       pr_warn("bad policy: duplicate entry\n");
> >                       err = -EEXIST;
> >                       goto out_free_rule;
> > @@ -178,21 +218,45 @@ static ssize_t handle_policy_update(struct file *file,
> >        * What we really want here is an xchg() wrapper for RCU, but since that
> >        * doesn't currently exist, just use a spinlock for now.
> >        */
> > -     mutex_lock(&policy_update_lock);
> > -     pol = rcu_replace_pointer(safesetid_setuid_rules, pol,
> > -                               lockdep_is_held(&policy_update_lock));
> > -     mutex_unlock(&policy_update_lock);
> > +     if (policy_type == UID) {
> > +             mutex_lock(&uid_policy_update_lock);
> > +             pol = rcu_replace_pointer(safesetid_setuid_rules, pol,
> > +                                       lockdep_is_held(&uid_policy_update_lock));
> > +             mutex_unlock(&uid_policy_update_lock);
> > +     } else if (policy_type == GID) {
> > +             mutex_lock(&gid_policy_update_lock);
> > +             pol = rcu_replace_pointer(safesetid_setgid_rules, pol,
> > +                                       lockdep_is_held(&gid_policy_update_lock));
> > +             mutex_unlock(&gid_policy_update_lock);
> > +     } else {
> > +             /* Error, policy type is neither UID or GID */
> > +             pr_warn("error: bad policy type");
> > +     }
> >       err = len;
> >
> >  out_free_buf:
> >       kfree(buf);
> >  out_free_pol:
> >       if (pol)
> > -                release_ruleset(pol);
> > +             release_ruleset(pol);
> >       return err;
> >  }
> >
> > -static ssize_t safesetid_file_write(struct file *file,
> > +static ssize_t safesetid_uid_file_write(struct file *file,
> > +                                 const char __user *buf,
> > +                                 size_t len,
> > +                                 loff_t *ppos)
> > +{
> > +     if (!file_ns_capable(file, &init_user_ns, CAP_MAC_ADMIN))
> > +             return -EPERM;
> > +
> > +     if (*ppos != 0)
> > +             return -EINVAL;
> > +
> > +     return handle_policy_update(file, buf, len, UID);
> > +}
> > +
> > +static ssize_t safesetid_gid_file_write(struct file *file,
> >                                   const char __user *buf,
> >                                   size_t len,
> >                                   loff_t *ppos)
> > @@ -203,38 +267,60 @@ static ssize_t safesetid_file_write(struct file *file,
> >       if (*ppos != 0)
> >               return -EINVAL;
> >
> > -     return handle_policy_update(file, buf, len);
> > +     return handle_policy_update(file, buf, len, GID);
> >  }
> >
> >  static ssize_t safesetid_file_read(struct file *file, char __user *buf,
> > -                                size_t len, loff_t *ppos)
> > +                                size_t len, loff_t *ppos, struct mutex *policy_update_lock, struct setid_ruleset* ruleset)
> >  {
> >       ssize_t res = 0;
> > -     struct setuid_ruleset *pol;
> > +     struct setid_ruleset *pol;
> >       const char *kbuf;
> >
> > -     mutex_lock(&policy_update_lock);
> > -     pol = rcu_dereference_protected(safesetid_setuid_rules,
> > -                                     lockdep_is_held(&policy_update_lock));
> > +     mutex_lock(policy_update_lock);
> > +     pol = rcu_dereference_protected(ruleset, lockdep_is_held(policy_update_lock));
> >       if (pol) {
> >               kbuf = pol->policy_str;
> >               res = simple_read_from_buffer(buf, len, ppos,
> >                                             kbuf, strlen(kbuf));
> >       }
> > -     mutex_unlock(&policy_update_lock);
> > +     mutex_unlock(policy_update_lock);
> > +
> >       return res;
> >  }
> >
> > -static const struct file_operations safesetid_file_fops = {
> > -     .read = safesetid_file_read,
> > -     .write = safesetid_file_write,
> > +static ssize_t safesetid_uid_file_read(struct file *file, char __user *buf,
> > +                                size_t len, loff_t *ppos)
> > +{
> > +     return safesetid_file_read(file, buf, len, ppos,
> > +                                &uid_policy_update_lock, safesetid_setuid_rules);
> > +}
> > +
> > +static ssize_t safesetid_gid_file_read(struct file *file, char __user *buf,
> > +                                size_t len, loff_t *ppos)
> > +{
> > +     return safesetid_file_read(file, buf, len, ppos,
> > +                                &gid_policy_update_lock, safesetid_setgid_rules);
> > +}
> > +
> > +
> > +
> > +static const struct file_operations safesetid_uid_file_fops = {
> > +     .read = safesetid_uid_file_read,
> > +     .write = safesetid_uid_file_write,
> > +};
> > +
> > +static const struct file_operations safesetid_gid_file_fops = {
> > +     .read = safesetid_gid_file_read,
> > +     .write = safesetid_gid_file_write,
> >  };
> >
> >  static int __init safesetid_init_securityfs(void)
> >  {
> >       int ret;
> >       struct dentry *policy_dir;
> > -     struct dentry *policy_file;
> > +     struct dentry *uid_policy_file;
> > +     struct dentry *gid_policy_file;
> >
> >       if (!safesetid_initialized)
> >               return 0;
> > @@ -245,13 +331,21 @@ static int __init safesetid_init_securityfs(void)
> >               goto error;
> >       }
> >
> > -     policy_file = securityfs_create_file("whitelist_policy", 0600,
> > -                     policy_dir, NULL, &safesetid_file_fops);
> > -     if (IS_ERR(policy_file)) {
> > -             ret = PTR_ERR(policy_file);
> > +     uid_policy_file = securityfs_create_file("uid_whitelist_policy", 0600,
> > +                     policy_dir, NULL, &safesetid_uid_file_fops);
> > +     if (IS_ERR(uid_policy_file)) {
> > +             ret = PTR_ERR(uid_policy_file);
> >               goto error;
> >       }
> >
> > +     gid_policy_file = securityfs_create_file("gid_whitelist_policy", 0600,
> > +                     policy_dir, NULL, &safesetid_gid_file_fops);
> > +     if (IS_ERR(gid_policy_file)) {
> > +             ret = PTR_ERR(gid_policy_file);
> > +             goto error;
> > +     }
> > +
> > +
> >       return 0;
> >
> >  error:
> > --
> > 2.28.0.rc0.105.gf9edc3c819-goog


* Re: [PATCH 2/2] LSM: SafeSetID: Add GID security policy handling
From: Serge E. Hallyn @ 2020-07-21 17:05 UTC (permalink / raw)
  To: Thomas Cedeno
  Cc: Serge E. Hallyn, Micah Morton, linux-security-module, keescook,
	casey, paul, stephen.smalley.work, jmorris
In-Reply-To: <CADgn5cO8ezsEfoWQBwQ=JPvGQ9nuTcPpTWcgr1YjDC=j60AKBw@mail.gmail.com>

On Tue, Jul 21, 2020 at 01:01:20PM -0400, Thomas Cedeno wrote:
> On Mon, Jul 20, 2020 at 10:42 PM Serge E. Hallyn <serge@hallyn.com> wrote:
> >
> > On Mon, Jul 20, 2020 at 11:12:03AM -0700, Micah Morton wrote:
> > > From: Thomas Cedeno <thomascedeno@google.com>
> > >
> > > The SafeSetID LSM has functionality for restricting setuid() calls based
> > > on its configured security policies. This patch adds the analogous
> > > functionality for setgid() calls. This is mostly a copy-and-paste change
> > > with some code deduplication, plus slight modifications/name changes to
> > > the policy-rule-related structs (now contain GID rules in addition to
> > > the UID ones) and some type generalization since SafeSetID now needs to
> > > deal with kgid_t and kuid_t types.
> > >
> > > Signed-off-by: Thomas Cedeno <thomascedeno@google.com>
> > > Signed-off-by: Micah Morton <mortonm@chromium.org>
> >
> > Just one question and two little comments below.

thanks for the explanation.

Reviewed-by: Serge Hallyn <serge@hallyn.com>

> >
> > > ---
> > > NOTE: Looks like some userns-related lines in the selftest for SafeSetID
> > > recently had some kind of regression. We won't be sending a patch to
> > > update the selftest until we can get to the bottom of that. However, we
> > > have a WIP (due to the userns regression) update to the selftest which
> > > tests the GID stuff and we have used it to ensure this patch is correct.
> > >
> > >  security/safesetid/lsm.c        | 178 +++++++++++++++++++++-------
> > >  security/safesetid/lsm.h        |  38 ++++--
> > >  security/safesetid/securityfs.c | 198 +++++++++++++++++++++++---------
> > >  3 files changed, 309 insertions(+), 105 deletions(-)
> > >
> > > diff --git a/security/safesetid/lsm.c b/security/safesetid/lsm.c
> > > index 7760019ad35d..787a98e82f1e 100644
> > > --- a/security/safesetid/lsm.c
> > > +++ b/security/safesetid/lsm.c
> > > @@ -24,20 +24,36 @@
> > >  /* Flag indicating whether initialization completed */
> > >  int safesetid_initialized;
> > >
> > > -struct setuid_ruleset __rcu *safesetid_setuid_rules;
> > > +struct setid_ruleset __rcu *safesetid_setuid_rules;
> > > +struct setid_ruleset __rcu *safesetid_setgid_rules;
> > > +
> > >
> > >  /* Compute a decision for a transition from @src to @dst under @policy. */
> > > -enum sid_policy_type _setuid_policy_lookup(struct setuid_ruleset *policy,
> > > -             kuid_t src, kuid_t dst)
> > > +enum sid_policy_type _setid_policy_lookup(struct setid_ruleset *policy,
> > > +             kid_t src, kid_t dst)
> > >  {
> > > -     struct setuid_rule *rule;
> > > +     struct setid_rule *rule;
> > >       enum sid_policy_type result = SIDPOL_DEFAULT;
> > >
> > > -     hash_for_each_possible(policy->rules, rule, next, __kuid_val(src)) {
> > > -             if (!uid_eq(rule->src_uid, src))
> > > -                     continue;
> > > -             if (uid_eq(rule->dst_uid, dst))
> > > -                     return SIDPOL_ALLOWED;
> > > +     if (policy->type == UID) {
> > > +             hash_for_each_possible(policy->rules, rule, next, __kuid_val(src.uid)) {
> > > +                     if (!uid_eq(rule->src_id.uid, src.uid))
> > > +                             continue;
> > > +                     if (uid_eq(rule->dst_id.uid, dst.uid))
> > > +                             return SIDPOL_ALLOWED;
> > > +                     result = SIDPOL_CONSTRAINED;
> >
> > Can you describe precisely under which conditions SIDPOL_CONSTRAINED should
> > be returned vs. SIDPOL_DEFAULT?
> >
> 
> 
> For calculating ID transitions, SafeSetID takes in a src and dst UID
> or GID. If an existing policy lists the src ID in one or more of its
> rules but none of those rules names the dst ID, we need to constrain
> the ID and thus return SIDPOL_CONSTRAINED. If no policy even mentions
> the src ID, it passes through as SIDPOL_DEFAULT, where the ID is not
> constrained and can be used for other purposes.
> 
> 
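As an illustration of the rule described above, here is a minimal
user-space sketch of the three-way decision. It is not the kernel code:
`struct rule`, `lookup()` and the flat array are hypothetical stand-ins
for struct setid_rule, _setid_policy_lookup() and the hash table, and
plain unsigned ints replace kuid_t/kgid_t.

```c
#include <stddef.h>

/* The three outcomes named in the patch. */
enum sid_policy_type {
	SIDPOL_DEFAULT,     /* no rule mentions the source ID */
	SIDPOL_CONSTRAINED, /* source ID is listed, but not with this target */
	SIDPOL_ALLOWED      /* target ID explicitly allowed */
};

/* Hypothetical flat rule; the kernel keeps these in a hash table. */
struct rule {
	unsigned int src;
	unsigned int dst;
};

/* Decide a src -> dst transition against an array of n rules. */
static enum sid_policy_type lookup(const struct rule *rules, size_t n,
				   unsigned int src, unsigned int dst)
{
	enum sid_policy_type result = SIDPOL_DEFAULT;
	size_t i;

	for (i = 0; i < n; i++) {
		if (rules[i].src != src)
			continue;
		if (rules[i].dst == dst)
			return SIDPOL_ALLOWED;
		/* src is mentioned, but not (so far) together with dst. */
		result = SIDPOL_CONSTRAINED;
	}
	return result;
}
```

With the rules 1000:1001 and 1000:1002, a lookup of 1000 -> 1002 is
allowed, 1000 -> 0 is constrained, and 2000 -> 0 falls through to the
default because no rule mentions 2000 as a source.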
> > > +             }
> > > +     } else if (policy->type == GID) {
> > > +             hash_for_each_possible(policy->rules, rule, next, __kgid_val(src.gid)) {
> > > +                     if (!gid_eq(rule->src_id.gid, src.gid))
> > > +                             continue;
> > > +                     if (gid_eq(rule->dst_id.gid, dst.gid)){
> > > +                             return SIDPOL_ALLOWED;
> > > +                     }
> > > +                     result = SIDPOL_CONSTRAINED;
> > > +             }
> > > +     } else {
> > > +             /* Should not reach here, report the ID as constrained */
> > >               result = SIDPOL_CONSTRAINED;
> > >       }
> > >       return result;
> > > @@ -47,15 +63,26 @@ enum sid_policy_type _setuid_policy_lookup(struct setuid_ruleset *policy,
> > >   * Compute a decision for a transition from @src to @dst under the active
> > >   * policy.
> > >   */
> > > -static enum sid_policy_type setuid_policy_lookup(kuid_t src, kuid_t dst)
> > > +static enum sid_policy_type setid_policy_lookup(kid_t src, kid_t dst, enum setid_type new_type)
> > >  {
> > >       enum sid_policy_type result = SIDPOL_DEFAULT;
> > > -     struct setuid_ruleset *pol;
> > > +     struct setid_ruleset *pol;
> > >
> > >       rcu_read_lock();
> > > -     pol = rcu_dereference(safesetid_setuid_rules);
> > > -     if (pol)
> > > -             result = _setuid_policy_lookup(pol, src, dst);
> > > +     if (new_type == UID)
> > > +             pol = rcu_dereference(safesetid_setuid_rules);
> > > +     else if (new_type == GID)
> > > +             pol = rcu_dereference(safesetid_setgid_rules);
> > > +     else { /* Should not reach here */
> > > +             result = SIDPOL_CONSTRAINED;
> > > +             rcu_read_unlock();
> > > +             return result;
> > > +     }
> > > +
> > > +     if (pol) {
> > > +             pol->type = new_type;
> > > +             result = _setid_policy_lookup(pol, src, dst);
> > > +     }
> > >       rcu_read_unlock();
> > >       return result;
> > >  }
> > > @@ -65,8 +92,8 @@ static int safesetid_security_capable(const struct cred *cred,
> > >                                     int cap,
> > >                                     unsigned int opts)
> > >  {
> > > -     /* We're only interested in CAP_SETUID. */
> > > -     if (cap != CAP_SETUID)
> > > +     /* We're only interested in CAP_SETUID and CAP_SETGID. */
> > > +     if (cap != CAP_SETUID && cap != CAP_SETGID)
> > >               return 0;
> > >
> > >       /*
> > > @@ -77,45 +104,83 @@ static int safesetid_security_capable(const struct cred *cred,
> > >       if ((opts & CAP_OPT_INSETID) != 0)
> > >               return 0;
> > >
> > > -     /*
> > > -      * If no policy applies to this task, allow the use of CAP_SETUID for
> > > -      * other purposes.
> > > -      */
> > > -     if (setuid_policy_lookup(cred->uid, INVALID_UID) == SIDPOL_DEFAULT)
> > > +     switch (cap) {
> > > +     case CAP_SETUID:
> > > +             /*
> > > +             * If no policy applies to this task, allow the use of CAP_SETUID for
> > > +             * other purposes.
> > > +             */
> > > +             if (setid_policy_lookup((kid_t)cred->uid, INVALID_ID, UID) == SIDPOL_DEFAULT)
> > > +                     return 0;
> > > +             /*
> > > +              * Reject use of CAP_SETUID for functionality other than calling
> > > +              * set*uid() (e.g. setting up userns uid mappings).
> > > +              */
> > > +             pr_warn("Operation requires CAP_SETUID, which is not available to UID %u for operations besides approved set*uid transitions\n",
> > > +                     __kuid_val(cred->uid));
> > > +             return -EPERM;
> > > +             break;
> > > +     case CAP_SETGID:
> > > +             /*
> > > +             * If no policy applies to this task, allow the use of CAP_SETGID for
> > > +             * other purposes.
> > > +             */
> > > +             if (setid_policy_lookup((kid_t)cred->gid, INVALID_ID, GID) == SIDPOL_DEFAULT)
> > > +                     return 0;
> > > +             /*
> > > +              * Reject use of CAP_SETGID for functionality other than calling
> > > +              * set*gid() (e.g. setting up userns gid mappings).
> > > +              */
> > > +             pr_warn("Operation requires CAP_SETGID, which is not available to GID %u for operations besides approved set*gid transitions\n",
> > > +                     __kgid_val(cred->gid));
> > > +             return -EPERM;
> > > +             break;
> > > +     default:
> > > +             /* Error, the only capabilities we're checking for are CAP_SETUID/GID */
> > >               return 0;
> > > -
> > > -     /*
> > > -      * Reject use of CAP_SETUID for functionality other than calling
> > > -      * set*uid() (e.g. setting up userns uid mappings).
> > > -      */
> > > -     pr_warn("Operation requires CAP_SETUID, which is not available to UID %u for operations besides approved set*uid transitions\n",
> > > -             __kuid_val(cred->uid));
> > > -     return -EPERM;
> > > +             break;
> > > +     }
> > > +     return 0;
> > >  }
> > >
> > >  /*
> > >   * Check whether a caller with old credentials @old is allowed to switch to
> > > - * credentials that contain @new_uid.
> > > + * credentials that contain @new_id.
> > >   */
> > > -static bool uid_permitted_for_cred(const struct cred *old, kuid_t new_uid)
> > > +static bool id_permitted_for_cred(const struct cred *old, kid_t new_id, enum setid_type new_type)
> > >  {
> > >       bool permitted;
> > >
> > > -     /* If our old creds already had this UID in it, it's fine. */
> > > -     if (uid_eq(new_uid, old->uid) || uid_eq(new_uid, old->euid) ||
> > > -         uid_eq(new_uid, old->suid))
> > > -             return true;
> > > +     /* If our old creds already had this ID in it, it's fine. */
> > > +     if (new_type == UID) {
> > > +             if (uid_eq(new_id.uid, old->uid) || uid_eq(new_id.uid, old->euid) ||
> > > +                     uid_eq(new_id.uid, old->suid))
> > > +                     return true;
> > > +     } else if (new_type == GID){
> > > +             if (gid_eq(new_id.gid, old->gid) || gid_eq(new_id.gid, old->egid) ||
> > > +                     gid_eq(new_id.gid, old->sgid))
> > > +                     return true;
> > > +     } else /* Error, new_type is an invalid type */
> > > +             return false;
> > >
> > >       /*
> > >        * Transitions to new UIDs require a check against the policy of the old
> > >        * RUID.
> > >        */
> > >       permitted =
> > > -         setuid_policy_lookup(old->uid, new_uid) != SIDPOL_CONSTRAINED;
> > > +         setid_policy_lookup((kid_t)old->uid, new_id, new_type) != SIDPOL_CONSTRAINED;
> > > +
> > >       if (!permitted) {
> > > -             pr_warn("UID transition ((%d,%d,%d) -> %d) blocked\n",
> > > -                     __kuid_val(old->uid), __kuid_val(old->euid),
> > > -                     __kuid_val(old->suid), __kuid_val(new_uid));
> > > +             if (new_type == UID) {
> > > +                     pr_warn("UID transition ((%d,%d,%d) -> %d) blocked\n",
> > > +                             __kuid_val(old->uid), __kuid_val(old->euid),
> > > +                             __kuid_val(old->suid), __kuid_val(new_id.uid));
> > > +             } else if (new_type == GID) {
> > > +                     pr_warn("GID transition ((%d,%d,%d) -> %d) blocked\n",
> > > +                             __kgid_val(old->gid), __kgid_val(old->egid),
> > > +                             __kgid_val(old->sgid), __kgid_val(new_id.gid));
> > > +             } else /* Error, new_type is an invalid type */
> > > +                     return false;
> > >       }
> > >       return permitted;
> > >  }
> > > @@ -131,13 +196,37 @@ static int safesetid_task_fix_setuid(struct cred *new,
> > >  {
> > >
> > >       /* Do nothing if there are no setuid restrictions for our old RUID. */
> > > -     if (setuid_policy_lookup(old->uid, INVALID_UID) == SIDPOL_DEFAULT)
> > > +     if (setid_policy_lookup((kid_t)old->uid, INVALID_ID, UID) == SIDPOL_DEFAULT)
> > > +             return 0;
> > > +
> > > +     if (id_permitted_for_cred(old, (kid_t)new->uid, UID) &&
> > > +         id_permitted_for_cred(old, (kid_t)new->euid, UID) &&
> > > +         id_permitted_for_cred(old, (kid_t)new->suid, UID) &&
> > > +         id_permitted_for_cred(old, (kid_t)new->fsuid, UID))
> > > +             return 0;
> > > +
> > > +     /*
> > > +      * Kill this process to avoid potential security vulnerabilities
> > > +      * that could arise from a missing whitelist entry preventing a
> > > +      * privileged process from dropping to a lesser-privileged one.
> > > +      */
> > > +     force_sig(SIGKILL);
> > > +     return -EACCES;
> > > +}
> > > +
> > > +static int safesetid_task_fix_setgid(struct cred *new,
> > > +                                  const struct cred *old,
> > > +                                  int flags)
> > > +{
> > > +
> > > +     /* Do nothing if there are no setgid restrictions for our old RGID. */
> > > +     if (setid_policy_lookup((kid_t)old->gid, INVALID_ID, GID) == SIDPOL_DEFAULT)
> > >               return 0;
> > >
> > > -     if (uid_permitted_for_cred(old, new->uid) &&
> > > -         uid_permitted_for_cred(old, new->euid) &&
> > > -         uid_permitted_for_cred(old, new->suid) &&
> > > -         uid_permitted_for_cred(old, new->fsuid))
> > > +     if (id_permitted_for_cred(old, (kid_t)new->gid, GID) &&
> > > +         id_permitted_for_cred(old, (kid_t)new->egid, GID) &&
> > > +         id_permitted_for_cred(old, (kid_t)new->sgid, GID) &&
> > > +         id_permitted_for_cred(old, (kid_t)new->fsgid, GID))
> > >               return 0;
> > >
> > >       /*
> > > @@ -151,6 +240,7 @@ static int safesetid_task_fix_setuid(struct cred *new,
> > >
> > >  static struct security_hook_list safesetid_security_hooks[] = {
> > >       LSM_HOOK_INIT(task_fix_setuid, safesetid_task_fix_setuid),
> > > +     LSM_HOOK_INIT(task_fix_setgid, safesetid_task_fix_setgid),
> > >       LSM_HOOK_INIT(capable, safesetid_security_capable)
> > >  };
> > >
> > > diff --git a/security/safesetid/lsm.h b/security/safesetid/lsm.h
> > > index db6d16e6bbc3..bde8c43a3767 100644
> > > --- a/security/safesetid/lsm.h
> > > +++ b/security/safesetid/lsm.h
> > > @@ -27,27 +27,47 @@ enum sid_policy_type {
> > >       SIDPOL_ALLOWED /* target ID explicitly allowed */
> > >  };
> > >
> > > +typedef union {
> > > +     kuid_t uid;
> > > +     kgid_t gid;
> > > +} kid_t;
> > > +
> > > +enum setid_type {
> > > +     UID,
> > > +     GID
> > > +};
> > > +
> > >  /*
> > > - * Hash table entry to store safesetid policy signifying that 'src_uid'
> > > - * can setuid to 'dst_uid'.
> > > + * Hash table entry to store safesetid policy signifying that 'src_id'
> > > + * can set*id to 'dst_id'.
> > >   */
> > > -struct setuid_rule {
> > > +struct setid_rule {
> > >       struct hlist_node next;
> > > -     kuid_t src_uid;
> > > -     kuid_t dst_uid;
> > > +     kid_t src_id;
> > > +     kid_t dst_id;
> > > +
> > > +     /* Flag to signal if rule is for UID's or GID's */
> > > +     enum setid_type type;
> > >  };
> > >
> > >  #define SETID_HASH_BITS 8 /* 256 buckets in hash table */
> > >
> > > -struct setuid_ruleset {
> > > +/* Extension of INVALID_UID/INVALID_GID for kid_t type */
> > > +#define INVALID_ID (kid_t){.uid = INVALID_UID}
> > > +
> > > +struct setid_ruleset {
> > >       DECLARE_HASHTABLE(rules, SETID_HASH_BITS);
> > >       char *policy_str;
> > >       struct rcu_head rcu;
> > > +
> > > +     //Flag to signal if ruleset is for UID's or GID's
> > > +     enum setid_type type;
> > >  };
> > >
> > > -enum sid_policy_type _setuid_policy_lookup(struct setuid_ruleset *policy,
> > > -             kuid_t src, kuid_t dst);
> > > +enum sid_policy_type _setid_policy_lookup(struct setid_ruleset *policy,
> > > +             kid_t src, kid_t dst);
> > >
> > > -extern struct setuid_ruleset __rcu *safesetid_setuid_rules;
> > > +extern struct setid_ruleset __rcu *safesetid_setuid_rules;
> > > +extern struct setid_ruleset __rcu *safesetid_setgid_rules;
> > >
> > >  #endif /* _SAFESETID_H */
> > > diff --git a/security/safesetid/securityfs.c b/security/safesetid/securityfs.c
> > > index f8bc574cea9c..211050d0a922 100644
> > > --- a/security/safesetid/securityfs.c
> > > +++ b/security/safesetid/securityfs.c
> > > @@ -19,22 +19,23 @@
> > >
> > >  #include "lsm.h"
> > >
> > > -static DEFINE_MUTEX(policy_update_lock);
> > > +static DEFINE_MUTEX(uid_policy_update_lock);
> > > +static DEFINE_MUTEX(gid_policy_update_lock);
> > >
> > >  /*
> > > - * In the case the input buffer contains one or more invalid UIDs, the kuid_t
> > > + * In the case the input buffer contains one or more invalid IDs, the kid_t
> > >   * variables pointed to by @parent and @child will get updated but this
> > >   * function will return an error.
> > >   * Contents of @buf may be modified.
> > >   */
> > >  static int parse_policy_line(struct file *file, char *buf,
> > > -     struct setuid_rule *rule)
> > > +     struct setid_rule *rule)
> > >  {
> > >       char *child_str;
> > >       int ret;
> > >       u32 parsed_parent, parsed_child;
> > >
> > > -     /* Format of |buf| string should be <UID>:<UID>. */
> > > +     /* Format of |buf| string should be <UID>:<UID> or <GID>:<GID> */
> > >       child_str = strchr(buf, ':');
> > >       if (child_str == NULL)
> > >               return -EINVAL;
> > > @@ -49,20 +50,36 @@ static int parse_policy_line(struct file *file, char *buf,
> > >       if (ret)
> > >               return ret;
> > >
> > > -     rule->src_uid = make_kuid(file->f_cred->user_ns, parsed_parent);
> > > -     rule->dst_uid = make_kuid(file->f_cred->user_ns, parsed_child);
> > > -     if (!uid_valid(rule->src_uid) || !uid_valid(rule->dst_uid))
> > > +     if (rule->type == UID){
> > > +             rule->src_id.uid = make_kuid(file->f_cred->user_ns, parsed_parent);
> > > +             rule->dst_id.uid = make_kuid(file->f_cred->user_ns, parsed_child);
> > > +     } else if (rule->type == GID){
> > > +             rule->src_id.gid = make_kgid(file->f_cred->user_ns, parsed_parent);
> > > +             rule->dst_id.gid = make_kgid(file->f_cred->user_ns, parsed_child);
> > > +     } else {
> > > +             /* Error, rule->type is an invalid type */
> > >               return -EINVAL;
> > > +     }
> >
> > Is there any reason not to have the below if/else actions go into the
> > above if/else block?
> >
> 
> 
> As for the two if/else-if/else chains in the parse_policy_line()
> function, I think you're right that they can be collapsed together.
> We'll send another patch to do this.
> 
> 
> 
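A rough sketch of what the collapsed form could look like, written as
self-contained user-space C rather than as the follow-up patch itself.
The kuid_t/kgid_t stand-ins, `INVALID_VAL`, `fake_make_kuid()`,
`fake_make_kgid()` and `fill_rule()` are all hypothetical; the real code
would keep using make_kuid()/make_kgid() and uid_valid()/gid_valid().

```c
#include <errno.h>
#include <stdint.h>

enum setid_type { UID, GID };

/* Stand-ins for the kernel types; (uint32_t)-1 plays the invalid ID. */
typedef struct { uint32_t val; } kuid_t;
typedef struct { uint32_t val; } kgid_t;
typedef union { kuid_t uid; kgid_t gid; } kid_t;

#define INVALID_VAL ((uint32_t)-1)

struct setid_rule {
	kid_t src_id;
	kid_t dst_id;
	enum setid_type type;
};

/* Hypothetical identity mappings in place of make_kuid()/make_kgid(). */
static kuid_t fake_make_kuid(uint32_t v) { kuid_t k = { v }; return k; }
static kgid_t fake_make_kgid(uint32_t v) { kgid_t k = { v }; return k; }

/* Fill in and validate both IDs in a single branch per type. */
static int fill_rule(struct setid_rule *rule, uint32_t parent, uint32_t child)
{
	switch (rule->type) {
	case UID:
		rule->src_id.uid = fake_make_kuid(parent);
		rule->dst_id.uid = fake_make_kuid(child);
		if (rule->src_id.uid.val == INVALID_VAL ||
		    rule->dst_id.uid.val == INVALID_VAL)
			return -EINVAL;
		return 0;
	case GID:
		rule->src_id.gid = fake_make_kgid(parent);
		rule->dst_id.gid = fake_make_kgid(child);
		if (rule->src_id.gid.val == INVALID_VAL ||
		    rule->dst_id.gid.val == INVALID_VAL)
			return -EINVAL;
		return 0;
	default:
		/* rule->type is neither UID nor GID */
		return -EINVAL;
	}
}
```

Folding the validity check into the branch that does the conversion
leaves a single place where an unknown type is rejected.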
> > > +     if (rule->type == UID) {
> > > +             if (!uid_valid(rule->src_id.uid) || !uid_valid(rule->dst_id.uid))
> > > +                     return -EINVAL;
> > > +     } else if (rule->type == GID) {
> > > +             if (!gid_valid(rule->src_id.gid) || !gid_valid(rule->dst_id.gid))
> > > +                     return -EINVAL;
> > > +     } else {
> > > +             /* Error, rule->type is an invalid type */
> > > +             return -EINVAL;
> > > +     }
> > >       return 0;
> > >  }
> > >
> > >  static void __release_ruleset(struct rcu_head *rcu)
> > >  {
> > > -     struct setuid_ruleset *pol =
> > > -             container_of(rcu, struct setuid_ruleset, rcu);
> > > +     struct setid_ruleset *pol =
> > > +             container_of(rcu, struct setid_ruleset, rcu);
> > >       int bucket;
> > > -     struct setuid_rule *rule;
> > > +     struct setid_rule *rule;
> > >       struct hlist_node *tmp;
> > >
> > >       hash_for_each_safe(pol->rules, bucket, tmp, rule, next)
> > > @@ -71,36 +88,58 @@ static void __release_ruleset(struct rcu_head *rcu)
> > >       kfree(pol);
> > >  }
> > >
> > > -static void release_ruleset(struct setuid_ruleset *pol)
> > > -{
> > > +static void release_ruleset(struct setid_ruleset *pol){
> > >       call_rcu(&pol->rcu, __release_ruleset);
> > >  }
> > >
> > > -static void insert_rule(struct setuid_ruleset *pol, struct setuid_rule *rule)
> > > +static void insert_rule(struct setid_ruleset *pol, struct setid_rule *rule)
> > >  {
> > > -     hash_add(pol->rules, &rule->next, __kuid_val(rule->src_uid));
> > > +     if (pol->type == UID)
> > > +             hash_add(pol->rules, &rule->next, __kuid_val(rule->src_id.uid));
> > > +     else if (pol->type == GID)
> > > +             hash_add(pol->rules, &rule->next, __kgid_val(rule->src_id.gid));
> > > +     else /* Error, pol->type is neither UID or GID */
> > > +             return;
> > >  }
> > >
> > > -static int verify_ruleset(struct setuid_ruleset *pol)
> > > +static int verify_ruleset(struct setid_ruleset *pol)
> > >  {
> > >       int bucket;
> > > -     struct setuid_rule *rule, *nrule;
> > > +     struct setid_rule *rule, *nrule;
> > >       int res = 0;
> > >
> > >       hash_for_each(pol->rules, bucket, rule, next) {
> > > -             if (_setuid_policy_lookup(pol, rule->dst_uid, INVALID_UID) ==
> > > -                 SIDPOL_DEFAULT) {
> > > -                     pr_warn("insecure policy detected: uid %d is constrained but transitively unconstrained through uid %d\n",
> > > -                             __kuid_val(rule->src_uid),
> > > -                             __kuid_val(rule->dst_uid));
> > > +             if (_setid_policy_lookup(pol, rule->dst_id, INVALID_ID) == SIDPOL_DEFAULT) {
> > > +                     if (pol->type == UID) {
> > > +                             pr_warn("insecure policy detected: uid %d is constrained but transitively unconstrained through uid %d\n",
> > > +                                     __kuid_val(rule->src_id.uid),
> > > +                                     __kuid_val(rule->dst_id.uid));
> > > +                     } else if (pol->type == GID) {
> > > +                             pr_warn("insecure policy detected: gid %d is constrained but transitively unconstrained through gid %d\n",
> > > +                                     __kgid_val(rule->src_id.gid),
> > > +                                     __kgid_val(rule->dst_id.gid));
> > > +                     } else { /* pol->type is an invalid type */
> > > +                             res = -EINVAL;
> > > +                             return res;
> > > +                     }
> > >                       res = -EINVAL;
> > >
> > >                       /* fix it up */
> > > -                     nrule = kmalloc(sizeof(struct setuid_rule), GFP_KERNEL);
> > > +                     nrule = kmalloc(sizeof(struct setid_rule), GFP_KERNEL);
> > >                       if (!nrule)
> > >                               return -ENOMEM;
> > > -                     nrule->src_uid = rule->dst_uid;
> > > -                     nrule->dst_uid = rule->dst_uid;
> > > +                     if (pol->type == UID){
> > > +                             nrule->src_id.uid = rule->dst_id.uid;
> > > +                             nrule->dst_id.uid = rule->dst_id.uid;
> > > +                             nrule->type = UID;
> > > +                     } else if (pol->type == GID){
> > > +                             nrule->src_id.gid = rule->dst_id.gid;
> > > +                             nrule->dst_id.gid = rule->dst_id.gid;
> > > +                             nrule->type = GID;
> > > +                     } else { /* Error, pol->type is neither UID or GID */
> > > +                             kfree(nrule);
> >
> > Why not check this before the kmalloc?
> >
> 
> 
> The whole else branch is a sanity check and will probably never be
> taken, but looking at it a second time, it duplicates the check done
> by the else statement in the branch above, so this else statement
> can be dropped in the new patch.
> 
> 
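For reference, the general pattern behind the question (validate before
allocating, so the error path has nothing to free) can be sketched in
user-space C as below. `struct rule` and `make_fixup_rule()` are
hypothetical names, malloc() stands in for kmalloc(), and this is not a
proposal for the actual verify_ruleset() code.

```c
#include <errno.h>
#include <stdlib.h>

enum setid_type { UID, GID };

/* Hypothetical simplified rule with plain integer IDs. */
struct rule {
	unsigned int src;
	unsigned int dst;
	enum setid_type type;
};

/* Build the self-referencing fix-up rule dst -> dst for the given type. */
static int make_fixup_rule(enum setid_type type, unsigned int dst,
			   struct rule **out)
{
	struct rule *nrule;

	/* Reject a bad type first; nothing has been allocated yet. */
	if (type != UID && type != GID)
		return -EINVAL;

	nrule = malloc(sizeof(*nrule));
	if (!nrule)
		return -ENOMEM;

	nrule->src = dst;
	nrule->dst = dst;
	nrule->type = type;
	*out = nrule;
	return 0;
}
```

On the -EINVAL path `*out` is left untouched, so the caller does not
need a matching free.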
> 
> > > +                             return res;
> > > +                     }
> > >                       insert_rule(pol, nrule);
> > >               }
> > >       }
> > > @@ -108,16 +147,17 @@ static int verify_ruleset(struct setuid_ruleset *pol)
> > >  }
> > >
> > >  static ssize_t handle_policy_update(struct file *file,
> > > -                                 const char __user *ubuf, size_t len)
> > > +                                 const char __user *ubuf, size_t len, enum setid_type policy_type)
> > >  {
> > > -     struct setuid_ruleset *pol;
> > > +     struct setid_ruleset *pol;
> > >       char *buf, *p, *end;
> > >       int err;
> > >
> > > -     pol = kmalloc(sizeof(struct setuid_ruleset), GFP_KERNEL);
> > > +     pol = kmalloc(sizeof(struct setid_ruleset), GFP_KERNEL);
> > >       if (!pol)
> > >               return -ENOMEM;
> > >       pol->policy_str = NULL;
> > > +     pol->type = policy_type;
> > >       hash_init(pol->rules);
> > >
> > >       p = buf = memdup_user_nul(ubuf, len);
> > > @@ -133,7 +173,7 @@ static ssize_t handle_policy_update(struct file *file,
> > >
> > >       /* policy lines, including the last one, end with \n */
> > >       while (*p != '\0') {
> > > -             struct setuid_rule *rule;
> > > +             struct setid_rule *rule;
> > >
> > >               end = strchr(p, '\n');
> > >               if (end == NULL) {
> > > @@ -142,18 +182,18 @@ static ssize_t handle_policy_update(struct file *file,
> > >               }
> > >               *end = '\0';
> > >
> > > -             rule = kmalloc(sizeof(struct setuid_rule), GFP_KERNEL);
> > > +             rule = kmalloc(sizeof(struct setid_rule), GFP_KERNEL);
> > >               if (!rule) {
> > >                       err = -ENOMEM;
> > >                       goto out_free_buf;
> > >               }
> > >
> > > +             rule->type = policy_type;
> > >               err = parse_policy_line(file, p, rule);
> > >               if (err)
> > >                       goto out_free_rule;
> > >
> > > -             if (_setuid_policy_lookup(pol, rule->src_uid, rule->dst_uid) ==
> > > -                 SIDPOL_ALLOWED) {
> > > +             if (_setid_policy_lookup(pol, rule->src_id, rule->dst_id) == SIDPOL_ALLOWED) {
> > >                       pr_warn("bad policy: duplicate entry\n");
> > >                       err = -EEXIST;
> > >                       goto out_free_rule;
> > > @@ -178,21 +218,45 @@ static ssize_t handle_policy_update(struct file *file,
> > >        * What we really want here is an xchg() wrapper for RCU, but since that
> > >        * doesn't currently exist, just use a spinlock for now.
> > >        */
> > > -     mutex_lock(&policy_update_lock);
> > > -     pol = rcu_replace_pointer(safesetid_setuid_rules, pol,
> > > -                               lockdep_is_held(&policy_update_lock));
> > > -     mutex_unlock(&policy_update_lock);
> > > +     if (policy_type == UID) {
> > > +             mutex_lock(&uid_policy_update_lock);
> > > +             pol = rcu_replace_pointer(safesetid_setuid_rules, pol,
> > > +                                       lockdep_is_held(&uid_policy_update_lock));
> > > +             mutex_unlock(&uid_policy_update_lock);
> > > +     } else if (policy_type == GID) {
> > > +             mutex_lock(&gid_policy_update_lock);
> > > +             pol = rcu_replace_pointer(safesetid_setgid_rules, pol,
> > > +                                       lockdep_is_held(&gid_policy_update_lock));
> > > +             mutex_unlock(&gid_policy_update_lock);
> > > +     } else {
> > > +             /* Error, policy type is neither UID or GID */
> > > +             pr_warn("error: bad policy type");
> > > +     }
> > >       err = len;
> > >
> > >  out_free_buf:
> > >       kfree(buf);
> > >  out_free_pol:
> > >       if (pol)
> > > -                release_ruleset(pol);
> > > +             release_ruleset(pol);
> > >       return err;
> > >  }
> > >
> > > -static ssize_t safesetid_file_write(struct file *file,
> > > +static ssize_t safesetid_uid_file_write(struct file *file,
> > > +                                 const char __user *buf,
> > > +                                 size_t len,
> > > +                                 loff_t *ppos)
> > > +{
> > > +     if (!file_ns_capable(file, &init_user_ns, CAP_MAC_ADMIN))
> > > +             return -EPERM;
> > > +
> > > +     if (*ppos != 0)
> > > +             return -EINVAL;
> > > +
> > > +     return handle_policy_update(file, buf, len, UID);
> > > +}
> > > +
> > > +static ssize_t safesetid_gid_file_write(struct file *file,
> > >                                   const char __user *buf,
> > >                                   size_t len,
> > >                                   loff_t *ppos)
> > > @@ -203,38 +267,60 @@ static ssize_t safesetid_file_write(struct file *file,
> > >       if (*ppos != 0)
> > >               return -EINVAL;
> > >
> > > -     return handle_policy_update(file, buf, len);
> > > +     return handle_policy_update(file, buf, len, GID);
> > >  }
> > >
> > >  static ssize_t safesetid_file_read(struct file *file, char __user *buf,
> > > -                                size_t len, loff_t *ppos)
> > > +                                size_t len, loff_t *ppos, struct mutex *policy_update_lock, struct setid_ruleset* ruleset)
> > >  {
> > >       ssize_t res = 0;
> > > -     struct setuid_ruleset *pol;
> > > +     struct setid_ruleset *pol;
> > >       const char *kbuf;
> > >
> > > -     mutex_lock(&policy_update_lock);
> > > -     pol = rcu_dereference_protected(safesetid_setuid_rules,
> > > -                                     lockdep_is_held(&policy_update_lock));
> > > +     mutex_lock(policy_update_lock);
> > > +     pol = rcu_dereference_protected(ruleset, lockdep_is_held(&policy_update_lock));
> > >       if (pol) {
> > >               kbuf = pol->policy_str;
> > >               res = simple_read_from_buffer(buf, len, ppos,
> > >                                             kbuf, strlen(kbuf));
> > >       }
> > > -     mutex_unlock(&policy_update_lock);
> > > +     mutex_unlock(policy_update_lock);
> > > +
> > >       return res;
> > >  }
> > >
> > > -static const struct file_operations safesetid_file_fops = {
> > > -     .read = safesetid_file_read,
> > > -     .write = safesetid_file_write,
> > > +static ssize_t safesetid_uid_file_read(struct file *file, char __user *buf,
> > > +                                size_t len, loff_t *ppos)
> > > +{
> > > +     return safesetid_file_read(file, buf, len, ppos,
> > > +                                &uid_policy_update_lock, safesetid_setuid_rules);
> > > +}
> > > +
> > > +static ssize_t safesetid_gid_file_read(struct file *file, char __user *buf,
> > > +                                size_t len, loff_t *ppos)
> > > +{
> > > +     return safesetid_file_read(file, buf, len, ppos,
> > > +                                &gid_policy_update_lock, safesetid_setgid_rules);
> > > +}
> > > +
> > > +
> > > +
> > > +static const struct file_operations safesetid_uid_file_fops = {
> > > +     .read = safesetid_uid_file_read,
> > > +     .write = safesetid_uid_file_write,
> > > +};
> > > +
> > > +static const struct file_operations safesetid_gid_file_fops = {
> > > +     .read = safesetid_gid_file_read,
> > > +     .write = safesetid_gid_file_write,
> > >  };
> > >
> > >  static int __init safesetid_init_securityfs(void)
> > >  {
> > >       int ret;
> > >       struct dentry *policy_dir;
> > > -     struct dentry *policy_file;
> > > +     struct dentry *uid_policy_file;
> > > +     struct dentry *gid_policy_file;
> > >
> > >       if (!safesetid_initialized)
> > >               return 0;
> > > @@ -245,13 +331,21 @@ static int __init safesetid_init_securityfs(void)
> > >               goto error;
> > >       }
> > >
> > > -     policy_file = securityfs_create_file("whitelist_policy", 0600,
> > > -                     policy_dir, NULL, &safesetid_file_fops);
> > > -     if (IS_ERR(policy_file)) {
> > > -             ret = PTR_ERR(policy_file);
> > > +     uid_policy_file = securityfs_create_file("uid_whitelist_policy", 0600,
> > > +                     policy_dir, NULL, &safesetid_uid_file_fops);
> > > +     if (IS_ERR(uid_policy_file)) {
> > > +             ret = PTR_ERR(uid_policy_file);
> > >               goto error;
> > >       }
> > >
> > > +     gid_policy_file = securityfs_create_file("gid_whitelist_policy", 0600,
> > > +                     policy_dir, NULL, &safesetid_gid_file_fops);
> > > +     if (IS_ERR(gid_policy_file)) {
> > > +             ret = PTR_ERR(gid_policy_file);
> > > +             goto error;
> > > +     }
> > > +
> > > +
> > >       return 0;
> > >
> > >  error:
> > > --
> > > 2.28.0.rc0.105.gf9edc3c819-goog

* Re: [PATCH ghak84 v4] audit: purge audit_log_string from the intra-kernel audit API
From: John Johansen @ 2020-07-21 19:30 UTC
  To: Paul Moore, Richard Guy Briggs
  Cc: Linux-Audit Mailing List, LKML, Linux Security Module list,
	Eric Paris
In-Reply-To: <CAHC9VhQgDGPutYxQawMPmezm1a+i1nXO5KSn9_7KPDZsRBJ4pw@mail.gmail.com>

On 7/21/20 8:19 AM, Paul Moore wrote:
> On Tue, Jul 14, 2020 at 5:00 PM Richard Guy Briggs <rgb@redhat.com> wrote:
>> On 2020-07-14 16:29, Paul Moore wrote:
>>> On Tue, Jul 14, 2020 at 1:44 PM Richard Guy Briggs <rgb@redhat.com> wrote:
>>>> On 2020-07-14 12:21, Paul Moore wrote:
>>>>> On Mon, Jul 13, 2020 at 3:52 PM Richard Guy Briggs <rgb@redhat.com> wrote:
>>>>>>
>>>>>> audit_log_string() was inteded to be an internal audit function and
>>>>>> since there are only two internal uses, remove them.  Purge all external
>>>>>> uses of it by restructuring code to use an existing audit_log_format()
>>>>>> or using audit_log_format().
>>>>>>
>>>>>> Please see the upstream issue
>>>>>> https://github.com/linux-audit/audit-kernel/issues/84
>>>>>>
>>>>>> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
>>>>>> ---
>>>>>> Passes audit-testsuite.
>>>>>>
>>>>>> Changelog:
>>>>>> v4
>>>>>> - use double quotes in all replaced audit_log_string() calls
>>>>>>
>>>>>> v3
>>>>>> - fix two warning: non-void function does not return a value in all control paths
>>>>>>         Reported-by: kernel test robot <lkp@intel.com>
>>>>>>
>>>>>> v2
>>>>>> - restructure to piggyback on existing audit_log_format() calls, checking quoting needs for each.
>>>>>>
>>>>>> v1 Vlad Dronov
>>>>>> - https://github.com/nefigtut/audit-kernel/commit/dbbcba46335a002f44b05874153a85b9cc18aebf
>>>>>>
>>>>>>  include/linux/audit.h     |  5 -----
>>>>>>  kernel/audit.c            |  4 ++--
>>>>>>  security/apparmor/audit.c | 10 ++++------
>>>>>>  security/apparmor/file.c  | 25 +++++++------------------
>>>>>>  security/apparmor/ipc.c   | 46 +++++++++++++++++++++++-----------------------
>>>>>>  security/apparmor/net.c   | 14 ++++++++------
>>>>>>  security/lsm_audit.c      |  4 ++--
>>>>>>  7 files changed, 46 insertions(+), 62 deletions(-)
>>>>>
>>>>> Thanks for restoring the quotes, just one question below ...
>>>>>
>>>>>> diff --git a/security/apparmor/ipc.c b/security/apparmor/ipc.c
>>>>>> index 4ecedffbdd33..fe36d112aad9 100644
>>>>>> --- a/security/apparmor/ipc.c
>>>>>> +++ b/security/apparmor/ipc.c
>>>>>> @@ -20,25 +20,23 @@
>>>>>>
>>>>>>  /**
>>>>>>   * audit_ptrace_mask - convert mask to permission string
>>>>>> - * @buffer: buffer to write string to (NOT NULL)
>>>>>>   * @mask: permission mask to convert
>>>>>> + *
>>>>>> + * Returns: pointer to static string
>>>>>>   */
>>>>>> -static void audit_ptrace_mask(struct audit_buffer *ab, u32 mask)
>>>>>> +static const char *audit_ptrace_mask(u32 mask)
>>>>>>  {
>>>>>>         switch (mask) {
>>>>>>         case MAY_READ:
>>>>>> -               audit_log_string(ab, "read");
>>>>>> -               break;
>>>>>> +               return "read";
>>>>>>         case MAY_WRITE:
>>>>>> -               audit_log_string(ab, "trace");
>>>>>> -               break;
>>>>>> +               return "trace";
>>>>>>         case AA_MAY_BE_READ:
>>>>>> -               audit_log_string(ab, "readby");
>>>>>> -               break;
>>>>>> +               return "readby";
>>>>>>         case AA_MAY_BE_TRACED:
>>>>>> -               audit_log_string(ab, "tracedby");
>>>>>> -               break;
>>>>>> +               return "tracedby";
>>>>>>         }
>>>>>> +       return "";
>>>>>
>>>>> Are we okay with this returning an empty string ("") in this case?
>>>>> Should it be a question mark ("?")?
>>>>>
>>>>> My guess is that userspace parsing should be okay since it still has
>>>>> quotes, I'm just not sure if we wanted to use a question mark as we do
>>>>> in other cases where the field value is empty/unknown.
>>>>
>>>> Previously, it would have been an empty value, not even double quotes.
>>>> "?" might be an improvement.
>>>
>>> Did you want to fix that now in this patch, or leave it to later?  As
>>> I said above, I'm not too bothered by it with the quotes so either way
>>> is fine by me.
>>
>> I'd defer to Steve, otherwise I'd say leave it, since there wasn't
>> anything there before and this makes that more evident.
>>
>>> John, I'm assuming you are okay with this patch?
> 
> With no comments from John or Steve in the past week, I've gone ahead
> and merged the patch into audit/next.
> 


sorry, for some reason I thought a new iteration of this was coming.

the patch is fine, the empty unknown value should be possible here
so changing it to "?" won't affect anything.

* Re: [PATCH ghak84 v4] audit: purge audit_log_string from the intra-kernel audit API
From: Paul Moore @ 2020-07-21 21:16 UTC
  To: John Johansen
  Cc: Richard Guy Briggs, Linux-Audit Mailing List, LKML,
	Linux Security Module list, Eric Paris
In-Reply-To: <e6eb37d5-ec6b-852a-74df-bbf453607fbe@canonical.com>

On Tue, Jul 21, 2020 at 3:31 PM John Johansen
<john.johansen@canonical.com> wrote:
> On 7/21/20 8:19 AM, Paul Moore wrote:
> > On Tue, Jul 14, 2020 at 5:00 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> >> On 2020-07-14 16:29, Paul Moore wrote:
> >>> On Tue, Jul 14, 2020 at 1:44 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> >>>> On 2020-07-14 12:21, Paul Moore wrote:
> >>>>> On Mon, Jul 13, 2020 at 3:52 PM Richard Guy Briggs <rgb@redhat.com> wrote:
> >>>>>>
> >>>>>> audit_log_string() was inteded to be an internal audit function and
> >>>>>> since there are only two internal uses, remove them.  Purge all external
> >>>>>> uses of it by restructuring code to use an existing audit_log_format()
> >>>>>> or using audit_log_format().
> >>>>>>
> >>>>>> Please see the upstream issue
> >>>>>> https://github.com/linux-audit/audit-kernel/issues/84
> >>>>>>
> >>>>>> Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
> >>>>>> ---
> >>>>>> Passes audit-testsuite.
> >>>>>>
> >>>>>> Changelog:
> >>>>>> v4
> >>>>>> - use double quotes in all replaced audit_log_string() calls
> >>>>>>
> >>>>>> v3
> >>>>>> - fix two warning: non-void function does not return a value in all control paths
> >>>>>>         Reported-by: kernel test robot <lkp@intel.com>
> >>>>>>
> >>>>>> v2
> >>>>>> - restructure to piggyback on existing audit_log_format() calls, checking quoting needs for each.
> >>>>>>
> >>>>>> v1 Vlad Dronov
> >>>>>> - https://github.com/nefigtut/audit-kernel/commit/dbbcba46335a002f44b05874153a85b9cc18aebf
> >>>>>>
> >>>>>>  include/linux/audit.h     |  5 -----
> >>>>>>  kernel/audit.c            |  4 ++--
> >>>>>>  security/apparmor/audit.c | 10 ++++------
> >>>>>>  security/apparmor/file.c  | 25 +++++++------------------
> >>>>>>  security/apparmor/ipc.c   | 46 +++++++++++++++++++++++-----------------------
> >>>>>>  security/apparmor/net.c   | 14 ++++++++------
> >>>>>>  security/lsm_audit.c      |  4 ++--
> >>>>>>  7 files changed, 46 insertions(+), 62 deletions(-)
> >>>>>
> >>>>> Thanks for restoring the quotes, just one question below ...
> >>>>>
> >>>>>> diff --git a/security/apparmor/ipc.c b/security/apparmor/ipc.c
> >>>>>> index 4ecedffbdd33..fe36d112aad9 100644
> >>>>>> --- a/security/apparmor/ipc.c
> >>>>>> +++ b/security/apparmor/ipc.c
> >>>>>> @@ -20,25 +20,23 @@
> >>>>>>
> >>>>>>  /**
> >>>>>>   * audit_ptrace_mask - convert mask to permission string
> >>>>>> - * @buffer: buffer to write string to (NOT NULL)
> >>>>>>   * @mask: permission mask to convert
> >>>>>> + *
> >>>>>> + * Returns: pointer to static string
> >>>>>>   */
> >>>>>> -static void audit_ptrace_mask(struct audit_buffer *ab, u32 mask)
> >>>>>> +static const char *audit_ptrace_mask(u32 mask)
> >>>>>>  {
> >>>>>>         switch (mask) {
> >>>>>>         case MAY_READ:
> >>>>>> -               audit_log_string(ab, "read");
> >>>>>> -               break;
> >>>>>> +               return "read";
> >>>>>>         case MAY_WRITE:
> >>>>>> -               audit_log_string(ab, "trace");
> >>>>>> -               break;
> >>>>>> +               return "trace";
> >>>>>>         case AA_MAY_BE_READ:
> >>>>>> -               audit_log_string(ab, "readby");
> >>>>>> -               break;
> >>>>>> +               return "readby";
> >>>>>>         case AA_MAY_BE_TRACED:
> >>>>>> -               audit_log_string(ab, "tracedby");
> >>>>>> -               break;
> >>>>>> +               return "tracedby";
> >>>>>>         }
> >>>>>> +       return "";
> >>>>>
> >>>>> Are we okay with this returning an empty string ("") in this case?
> >>>>> Should it be a question mark ("?")?
> >>>>>
> >>>>> My guess is that userspace parsing should be okay since it still has
> >>>>> quotes, I'm just not sure if we wanted to use a question mark as we do
> >>>>> in other cases where the field value is empty/unknown.
> >>>>
> >>>> Previously, it would have been an empty value, not even double quotes.
> >>>> "?" might be an improvement.
> >>>
> >>> Did you want to fix that now in this patch, or leave it to later?  As
> >>> I said above, I'm not too bothered by it with the quotes so either way
> >>> is fine by me.
> >>
> >> I'd defer to Steve, otherwise I'd say leave it, since there wasn't
> >> anything there before and this makes that more evident.
> >>
> >>> John, I'm assuming you are okay with this patch?
> >
> > With no comments from John or Steve in the past week, I've gone ahead
> > and merged the patch into audit/next.
>
> sorry, for some reason I thought a new iteration of this was coming.
>
> the patch is fine, the empty unknown value should be possible here
> so changing it to "?" won't affect anything.

Yeah, I was kind of on the fence about requiring a new version from
Richard.  I think "?" is arguably the right approach, but I don't
think it matters enough to force the issue.  If it proves to be
problematic we can fix it later.

Regardless, it's in audit/next now.

-- 
paul moore
www.paul-moore.com
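
For readers following the "" versus "?" question above, here is a minimal
userspace sketch of the mask-to-string helper with a "?" fallback. It is
not the merged code (which returns ""). The DEMO_* bit values, the
demo_* function names and the requested_mask field name are assumptions
made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel permission bits; the real values
 * live in the kernel headers and may differ. */
#define DEMO_MAY_WRITE		0x02u
#define DEMO_MAY_READ		0x04u
#define DEMO_MAY_BE_READ	0x10u
#define DEMO_MAY_BE_TRACED	0x20u

/* Map a single permission bit to the string logged for it.  Unknown
 * masks yield "?" here; returning "" instead is the other option
 * discussed in the thread. */
const char *demo_ptrace_mask_name(uint32_t mask)
{
	switch (mask) {
	case DEMO_MAY_READ:
		return "read";
	case DEMO_MAY_WRITE:
		return "trace";
	case DEMO_MAY_BE_READ:
		return "readby";
	case DEMO_MAY_BE_TRACED:
		return "tracedby";
	}
	return "?";
}

/* Format the field the way a single format call would, with the value
 * always wrapped in double quotes.  Returns what snprintf() returns. */
int demo_format_mask_field(char *buf, size_t len, uint32_t mask)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "requested_mask=\"%s\"",
			demo_ptrace_mask_name(mask));
}
```

Either fallback keeps the value quoted, which is why the thread treats the
choice as cosmetic for userspace parsers.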

* Re: [PATCH 06/13] fs/kernel_read_file: Remove redundant size argument
From: Scott Branden @ 2020-07-21 21:43 UTC
  To: Kees Cook
  Cc: Mimi Zohar, Matthew Wilcox, James Morris, Luis Chamberlain,
	Greg Kroah-Hartman, Rafael J. Wysocki, Alexander Viro, Jessica Yu,
	Dmitry Kasatkin, Serge E. Hallyn, Casey Schaufler,
	Eric W. Biederman, Peter Zijlstra, Matthew Garrett, David Howells,
	Mauro Carvalho Chehab, Randy Dunlap, Joel Fernandes (Google),
	KP Singh, Dave Olsthoorn, Hans de Goede, Peter Jones,
	Andrew Morton, Stephen Boyd, Paul Moore, Stephen Smalley,
	linux-security-module, linux-integrity, selinux, linux-fsdevel,
	kexec, linux-kernel
In-Reply-To: <20200717174309.1164575-7-keescook@chromium.org>

Hi Kees,

On 2020-07-17 10:43 a.m., Kees Cook wrote:
> In preparation for refactoring kernel_read_file*(), remove the redundant
> "size" argument which is not needed: it can be included in the return
> code, with callers adjusted. (VFS reads already cannot be larger than
> INT_MAX.)
>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>   drivers/base/firmware_loader/main.c |  8 ++++----
>   fs/kernel_read_file.c               | 20 +++++++++-----------
>   include/linux/kernel_read_file.h    |  8 ++++----
>   kernel/kexec_file.c                 | 13 ++++++-------
>   kernel/module.c                     |  7 +++----
>   security/integrity/digsig.c         |  5 +++--
>   security/integrity/ima/ima_fs.c     |  5 +++--
>   7 files changed, 32 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c
> index d4a413ea48ce..ea419c7d3d34 100644
> --- a/drivers/base/firmware_loader/main.c
> +++ b/drivers/base/firmware_loader/main.c
> @@ -462,7 +462,7 @@ fw_get_filesystem_firmware(struct device *device, struct fw_priv *fw_priv,
>   					     size_t in_size,
>   					     const void *in_buffer))
>   {
> -	loff_t size;
> +	size_t size;
>   	int i, len;
>   	int rc = -ENOENT;
>   	char *path;
> @@ -494,10 +494,9 @@ fw_get_filesystem_firmware(struct device *device, struct fw_priv *fw_priv,
>   		fw_priv->size = 0;
>   
>   		/* load firmware files from the mount namespace of init */
> -		rc = kernel_read_file_from_path_initns(path, &buffer,
> -						       &size, msize,
> +		rc = kernel_read_file_from_path_initns(path, &buffer, msize,
>   						       READING_FIRMWARE);
> -		if (rc) {
> +		if (rc < 0) {
>   			if (rc != -ENOENT)
>   				dev_warn(device, "loading %s failed with error %d\n",
>   					 path, rc);
> @@ -506,6 +505,7 @@ fw_get_filesystem_firmware(struct device *device, struct fw_priv *fw_priv,
>   					 path);
>   			continue;
>   		}
> +		size = rc;
Change fails to return 0.  Need rc = 0; here.
>   		dev_dbg(device, "Loading firmware from %s\n", path);
>   		if (decompress) {
>   			dev_dbg(device, "f/w decompressing %s\n",
>


* Re: [PATCH 06/13] fs/kernel_read_file: Remove redundant size argument
From: Kees Cook @ 2020-07-21 21:50 UTC
  To: Scott Branden
  Cc: Mimi Zohar, Matthew Wilcox, James Morris, Luis Chamberlain,
	Greg Kroah-Hartman, Rafael J. Wysocki, Alexander Viro, Jessica Yu,
	Dmitry Kasatkin, Serge E. Hallyn, Casey Schaufler,
	Eric W. Biederman, Peter Zijlstra, Matthew Garrett, David Howells,
	Mauro Carvalho Chehab, Randy Dunlap, Joel Fernandes (Google),
	KP Singh, Dave Olsthoorn, Hans de Goede, Peter Jones,
	Andrew Morton, Stephen Boyd, Paul Moore, Stephen Smalley,
	linux-security-module, linux-integrity, selinux, linux-fsdevel,
	kexec, linux-kernel
In-Reply-To: <ec326654-c43b-259c-409c-63929ad5b217@broadcom.com>

On Tue, Jul 21, 2020 at 02:43:07PM -0700, Scott Branden wrote:
> On 2020-07-17 10:43 a.m., Kees Cook wrote:
> > In preparation for refactoring kernel_read_file*(), remove the redundant
> > "size" argument which is not needed: it can be included in the return
> > code, with callers adjusted. (VFS reads already cannot be larger than
> > INT_MAX.)
> > 
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> > ---
> >   drivers/base/firmware_loader/main.c |  8 ++++----
> >   fs/kernel_read_file.c               | 20 +++++++++-----------
> >   include/linux/kernel_read_file.h    |  8 ++++----
> >   kernel/kexec_file.c                 | 13 ++++++-------
> >   kernel/module.c                     |  7 +++----
> >   security/integrity/digsig.c         |  5 +++--
> >   security/integrity/ima/ima_fs.c     |  5 +++--
> >   7 files changed, 32 insertions(+), 34 deletions(-)
> > 
> > diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c
> > index d4a413ea48ce..ea419c7d3d34 100644
> > --- a/drivers/base/firmware_loader/main.c
> > +++ b/drivers/base/firmware_loader/main.c
> > @@ -462,7 +462,7 @@ fw_get_filesystem_firmware(struct device *device, struct fw_priv *fw_priv,
> >   					     size_t in_size,
> >   					     const void *in_buffer))
> >   {
> > -	loff_t size;
> > +	size_t size;
> >   	int i, len;
> >   	int rc = -ENOENT;
> >   	char *path;
> > @@ -494,10 +494,9 @@ fw_get_filesystem_firmware(struct device *device, struct fw_priv *fw_priv,
> >   		fw_priv->size = 0;
> >   		/* load firmware files from the mount namespace of init */
> > -		rc = kernel_read_file_from_path_initns(path, &buffer,
> > -						       &size, msize,
> > +		rc = kernel_read_file_from_path_initns(path, &buffer, msize,
> >   						       READING_FIRMWARE);
> > -		if (rc) {
> > +		if (rc < 0) {
> >   			if (rc != -ENOENT)
> >   				dev_warn(device, "loading %s failed with error %d\n",
> >   					 path, rc);
> > @@ -506,6 +505,7 @@ fw_get_filesystem_firmware(struct device *device, struct fw_priv *fw_priv,
> >   					 path);
> >   			continue;
> >   		}
> > +		size = rc;
> Change fails to return 0.  Need rc = 0; here.

Oh nice; good catch! I'll fix this.

-- 
Kees Cook
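
The bug Scott points out can be sketched in userspace as follows. Once a
helper reports the byte count through its return value, a caller that is
expected to return 0 on success has to reset rc after saving the size.
demo_read() and demo_load() are hypothetical names, not kernel functions,
and the "file size" is faked from the path length.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernel_read_file*()-style helper after the
 * change: returns the number of bytes read, or a negative errno. */
int demo_read(const char *path)
{
	if (path == NULL || path[0] == '\0')
		return -ENOENT;
	return (int)strlen(path);	/* pretend the file is this long */
}

/* Caller whose own contract is "0 on success, negative errno on error". */
int demo_load(const char *path, size_t *out_size)
{
	int rc;

	assert(out_size != NULL);

	rc = demo_read(path);
	if (rc < 0)
		return rc;

	*out_size = (size_t)rc;
	rc = 0;		/* without this, the byte count leaks out as rc */
	return rc;
}
```

Dropping the "rc = 0" line makes demo_load() return a positive byte count,
which callers testing "if (rc)" would treat as a failure.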

* Re: [PATCH] KEYS: remove redundant memsets
From: kernel test robot @ 2020-07-22  6:42 UTC
  To: trix, dhowells, jarkko.sakkinen, jmorris, serge, denkenz, marcel
  Cc: kbuild-all, keyrings, linux-security-module, linux-kernel,
	Tom Rix
In-Reply-To: <20200721141516.20335-1-trix@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 4930 bytes --]

Hi,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on security/next-testing]
[also build test WARNING on linus/master dhowells-fs/fscache-next v5.8-rc6 next-20200721]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/trix-redhat-com/KEYS-remove-redundant-memsets/20200721-221633
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git next-testing
config: microblaze-randconfig-r022-20200719 (attached as .config)
compiler: microblaze-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=microblaze 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   security/keys/keyctl_pkey.c: In function 'keyctl_pkey_params_get_2':
>> security/keys/keyctl_pkey.c:122:8: warning: 'uparams.key_id' is used uninitialized in this function [-Wuninitialized]
     122 |  ret = keyctl_pkey_params_get(uparams.key_id, _info, params);
         |        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

vim +122 security/keys/keyctl_pkey.c

00d60fd3b93219 David Howells 2018-10-09  108  
00d60fd3b93219 David Howells 2018-10-09  109  /*
00d60fd3b93219 David Howells 2018-10-09  110   * Get parameters from userspace.  Callers must always call the free function
00d60fd3b93219 David Howells 2018-10-09  111   * on params, even if an error is returned.
00d60fd3b93219 David Howells 2018-10-09  112   */
00d60fd3b93219 David Howells 2018-10-09  113  static int keyctl_pkey_params_get_2(const struct keyctl_pkey_params __user *_params,
00d60fd3b93219 David Howells 2018-10-09  114  				    const char __user *_info,
00d60fd3b93219 David Howells 2018-10-09  115  				    int op,
00d60fd3b93219 David Howells 2018-10-09  116  				    struct kernel_pkey_params *params)
00d60fd3b93219 David Howells 2018-10-09  117  {
00d60fd3b93219 David Howells 2018-10-09  118  	struct keyctl_pkey_params uparams;
00d60fd3b93219 David Howells 2018-10-09  119  	struct kernel_pkey_query info;
00d60fd3b93219 David Howells 2018-10-09  120  	int ret;
00d60fd3b93219 David Howells 2018-10-09  121  
00d60fd3b93219 David Howells 2018-10-09 @122  	ret = keyctl_pkey_params_get(uparams.key_id, _info, params);
00d60fd3b93219 David Howells 2018-10-09  123  	if (ret < 0)
00d60fd3b93219 David Howells 2018-10-09  124  		return ret;
00d60fd3b93219 David Howells 2018-10-09  125  
00d60fd3b93219 David Howells 2018-10-09  126  	ret = params->key->type->asym_query(params, &info);
00d60fd3b93219 David Howells 2018-10-09  127  	if (ret < 0)
00d60fd3b93219 David Howells 2018-10-09  128  		return ret;
00d60fd3b93219 David Howells 2018-10-09  129  
bb67c86855f477 Tom Rix       2020-07-21  130  	if (copy_from_user(&uparams, _params, sizeof(uparams)) != 0)
bb67c86855f477 Tom Rix       2020-07-21  131  		return -EFAULT;
bb67c86855f477 Tom Rix       2020-07-21  132  
00d60fd3b93219 David Howells 2018-10-09  133  	switch (op) {
00d60fd3b93219 David Howells 2018-10-09  134  	case KEYCTL_PKEY_ENCRYPT:
00d60fd3b93219 David Howells 2018-10-09  135  	case KEYCTL_PKEY_DECRYPT:
00d60fd3b93219 David Howells 2018-10-09  136  		if (uparams.in_len  > info.max_enc_size ||
00d60fd3b93219 David Howells 2018-10-09  137  		    uparams.out_len > info.max_dec_size)
00d60fd3b93219 David Howells 2018-10-09  138  			return -EINVAL;
00d60fd3b93219 David Howells 2018-10-09  139  		break;
00d60fd3b93219 David Howells 2018-10-09  140  	case KEYCTL_PKEY_SIGN:
00d60fd3b93219 David Howells 2018-10-09  141  	case KEYCTL_PKEY_VERIFY:
00d60fd3b93219 David Howells 2018-10-09  142  		if (uparams.in_len  > info.max_sig_size ||
00d60fd3b93219 David Howells 2018-10-09  143  		    uparams.out_len > info.max_data_size)
00d60fd3b93219 David Howells 2018-10-09  144  			return -EINVAL;
00d60fd3b93219 David Howells 2018-10-09  145  		break;
00d60fd3b93219 David Howells 2018-10-09  146  	default:
00d60fd3b93219 David Howells 2018-10-09  147  		BUG();
00d60fd3b93219 David Howells 2018-10-09  148  	}
00d60fd3b93219 David Howells 2018-10-09  149  
00d60fd3b93219 David Howells 2018-10-09  150  	params->in_len  = uparams.in_len;
00d60fd3b93219 David Howells 2018-10-09  151  	params->out_len = uparams.out_len;
00d60fd3b93219 David Howells 2018-10-09  152  	return 0;
00d60fd3b93219 David Howells 2018-10-09  153  }
00d60fd3b93219 David Howells 2018-10-09  154  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 24894 bytes --]

* Re: [PATCH] KEYS: remove redundant memsets
From: David Howells @ 2020-07-22  8:01 UTC
  To: trix
  Cc: dhowells, jarkko.sakkinen, jmorris, serge, denkenz, marcel,
	keyrings, linux-security-module, linux-kernel
In-Reply-To: <20200721141516.20335-1-trix@redhat.com>

trix@redhat.com wrote:

> -	if (copy_from_user(&uparams, _params, sizeof(uparams)) != 0)
> -		return -EFAULT;
> -
>  	ret = keyctl_pkey_params_get(uparams.key_id, _info, params);

Erm...  uparams is used on the very next statement after the copy_from_user().

David
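
The ordering problem can be reproduced outside the kernel with a small
sketch: the user structure has to be copied in before any of its fields
is read. Everything named demo_* is a hypothetical userspace stand-in;
demo_copy_from_user() only imitates the "returns the number of bytes not
copied" convention of copy_from_user().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

struct demo_params {
	int32_t key_id;
	uint32_t in_len;
	uint32_t out_len;
};

/* Userspace imitation of copy_from_user(): returns bytes NOT copied. */
unsigned long demo_copy_from_user(void *dst, const void *src,
				  unsigned long n)
{
	if (src == NULL)
		return n;
	memcpy(dst, src, n);
	return 0;
}

/* Hypothetical key lookup: only positive ids exist. */
int demo_lookup(int32_t key_id)
{
	return key_id > 0 ? 0 : -EINVAL;
}

int demo_params_get_2(const struct demo_params *user,
		      struct demo_params *out)
{
	struct demo_params uparams;
	int ret;

	assert(out != NULL);

	/* Copy first: uparams.key_id is read on the very next statement. */
	if (demo_copy_from_user(&uparams, user, sizeof(uparams)) != 0)
		return -EFAULT;

	ret = demo_lookup(uparams.key_id);
	if (ret < 0)
		return ret;

	*out = uparams;
	return 0;
}
```

Moving the copy below demo_lookup() would reproduce the -Wuninitialized
warning reported against the v1 patch.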


* Re: [PATCH v8 00/12] Introduce CAP_PERFMON to secure system performance monitoring and observability
From: Arnaldo Carvalho de Melo @ 2020-07-22 11:30 UTC
  To: Alexey Budankov
  Cc: Peter Zijlstra, Ravi Bangoria, Alexei Starovoitov, Ingo Molnar,
	James Morris, Namhyung Kim, Serge Hallyn, Jiri Olsa, Song Liu,
	Andi Kleen, Stephane Eranian, Igor Lubashev, Thomas Gleixner,
	linux-kernel, linux-security-module@vger.kernel.org,
	selinux@vger.kernel.org, intel-gfx@lists.freedesktop.org,
	linux-doc@vger.kernel.org, linux-man
In-Reply-To: <8d6030a4-ff2c-230c-c36e-d0a8c68832ac@linux.intel.com>

Em Tue, Jul 21, 2020 at 04:06:34PM +0300, Alexey Budankov escreveu:
> 
> On 13.07.2020 21:51, Arnaldo Carvalho de Melo wrote:
> > Em Mon, Jul 13, 2020 at 03:37:51PM +0300, Alexey Budankov escreveu:
> >>
> >> On 13.07.2020 15:17, Arnaldo Carvalho de Melo wrote:
> >>> Em Mon, Jul 13, 2020 at 12:48:25PM +0300, Alexey Budankov escreveu:
> >> If it had that patch below then message change would not be required.

> > Sure, but the tool should continue to work and provide useful messages
> > when running on kernels without that change. Pointing to the document is
> > valid and should be done, that is an agreed point. But the tool can do
> > some checks, narrow down the possible causes for the error message and
> > provide something that in most cases will make the user make progress.

> >> However this two sentences in the end of whole message would still add up:
> >> "Please read the 'Perf events and tool security' document:
> >>  https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html"

> > We're in violent agreement here. :-)
 
> Here is the message draft mentioning a) CAP_SYS_PTRACE, for kernels prior
> v5.8, and b) Perf security document link. The plan is to send a patch extending
> perf_events with CAP_PERFMON check [1] for ptrace_may_access() and extending
> the tool with this message.
 
> "Access to performance monitoring and observability operations is limited.
>  Enforced MAC policy settings (SELinux) can limit access to performance
>  monitoring and observability operations. Inspect system audit records for
>  more perf_event access control information and adjusting the policy.
>  Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
>  access to performance monitoring and observability operations for processes
>  without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
>  More information can be found at 'Perf events and tool security' document:
>  https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
>  perf_event_paranoid setting is -1:
>      -1: Allow use of (almost) all events by all users
>            Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>  >= 0: Disallow raw and ftrace function tracepoint access
>  >= 1: Disallow CPU event access
>  >= 2: Disallow kernel profiling
>  To make the adjusted perf_event_paranoid setting permanent preserve it
>  in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)"

Looks ok! Lots of knobs to control access as one needs.

- Arnaldo
 
> Alexei
> 
> [1] https://lore.kernel.org/lkml/20200713121746.GA7029@kernel.org/
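
The perf_event_paranoid levels listed in the draft message can be read as
a cumulative set of restrictions. A rough sketch of that reading follows.
The demo_* names and flag values are invented for illustration, and the
single "privileged" flag is a simplification standing in for the
capability checks (CAP_PERFMON, CAP_SYS_PTRACE, CAP_SYS_ADMIN) the draft
mentions.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical flags, one per line of the draft message. */
enum demo_restriction {
	DEMO_NO_RAW_TRACEPOINT   = 1 << 0,	/* level >= 0 */
	DEMO_NO_CPU_EVENTS       = 1 << 1,	/* level >= 1 */
	DEMO_NO_KERNEL_PROFILING = 1 << 2,	/* level >= 2 */
};

/* Restrictions an unprivileged process sees at a given level.  Level -1
 * (or any capability-holding process, modelled by 'privileged') sees
 * none. */
unsigned int demo_paranoid_restrictions(int level, int privileged)
{
	unsigned int r = 0;

	if (privileged)
		return 0;
	if (level >= 0)
		r |= DEMO_NO_RAW_TRACEPOINT;
	if (level >= 1)
		r |= DEMO_NO_CPU_EVENTS;
	if (level >= 2)
		r |= DEMO_NO_KERNEL_PROFILING;
	return r;
}

/* Read the setting from a sysctl-style file such as
 * /proc/sys/kernel/perf_event_paranoid.  Returns 0 and stores the level
 * on success, -1 if the file is missing or does not hold an integer. */
int demo_read_paranoid(const char *path, int *level)
{
	FILE *f;
	int ok;

	assert(path != NULL && level != NULL);

	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	ok = (fscanf(f, "%d", level) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

A tool could call demo_read_paranoid() on the /proc path and then use the
returned flags to narrow down which hint to print.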

* [PATCH v2] KEYS: remove redundant memset
From: trix @ 2020-07-22 13:46 UTC
  To: dhowells, jarkko.sakkinen, jmorris, serge, denkenz, marcel
  Cc: keyrings, linux-security-module, linux-kernel, Tom Rix

From: Tom Rix <trix@redhat.com>

Reviewing use of memset in keyctl_pkey.c

keyctl_pkey_params_get has prologue code to set params up:

	memset(params, 0, sizeof(*params));
	params->encoding = "raw";

keyctl_pkey_query has the same prologue
and calls keyctl_pkey_params_get.

So remove the prologue.
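
The redundancy can be illustrated with a small userspace sketch. The
names below (demo_params, demo_params_get, demo_query) are hypothetical
stand-ins and not the kernel code; the sketch assumes, as described
above, that the callee zeroes the structure and sets the default
encoding before doing anything else, so the caller needs no memset of
its own.

```c
#include <string.h>

/* Hypothetical stand-in for the kernel's params structure. */
struct demo_params {
	int key_id;
	const char *encoding;
	unsigned int in_len;
};

/* The callee owns initialization: zero everything, then set defaults. */
static int demo_params_get(int id, struct demo_params *params)
{
	memset(params, 0, sizeof(*params));
	params->encoding = "raw";
	params->key_id = id;
	return 0;
}

/* The caller passes an uninitialized struct; the callee sets it up. */
static int demo_query(int id, struct demo_params *out)
{
	struct demo_params params;	/* no memset needed here */
	int ret;

	ret = demo_params_get(id, &params);
	if (ret < 0)
		return ret;

	*out = params;
	return 0;
}
```

A caller-side memset would only be needed if some path read the struct
before the callee's own memset ran, which is not the case in this sketch.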

Fixes: 00d60fd3b932 ("KEYS: Provide keyctls to drive the new key type ops for asymmetric keys [ver #2]")

Signed-off-by: Tom Rix <trix@redhat.com>
---
v2: remove change to keyctl_pkey_params_get_2

 security/keys/keyctl_pkey.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/security/keys/keyctl_pkey.c b/security/keys/keyctl_pkey.c
index 931d8dfb4a7f..5de0d599a274 100644
--- a/security/keys/keyctl_pkey.c
+++ b/security/keys/keyctl_pkey.c
@@ -166,8 +166,6 @@ long keyctl_pkey_query(key_serial_t id,
 	struct kernel_pkey_query res;
 	long ret;
 
-	memset(&params, 0, sizeof(params));
-
 	ret = keyctl_pkey_params_get(id, _info, &params);
 	if (ret < 0)
 		goto error;
-- 
2.18.1



* Re: [PATCH v6 5/7] fs,doc: Enable to enforce noexec mounts or file exec through O_MAYEXEC
From: Thibaut Sautereau @ 2020-07-22 16:16 UTC (permalink / raw)
  To: Kees Cook, Mickaël Salaün
  Cc: linux-kernel, Aleksa Sarai, Alexei Starovoitov, Al Viro,
	Andrew Morton, Andy Lutomirski, Christian Brauner,
	Christian Heimes, Daniel Borkmann, Deven Bowers, Dmitry Vyukov,
	Eric Biggers, Eric Chiang, Florian Weimer, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Lakshmi Ramasubramanian,
	Matthew Garrett, Matthew Wilcox, Michael Kerrisk,
	Mickaël Salaün, Mimi Zohar, Philippe Trébuchet,
	Scott Shell, Sean Christopherson, Shuah Khan, Steve Dower,
	Steve Grubb, Tetsuo Handa, Thibaut Sautereau, Vincent Strubel,
	kernel-hardening, linux-api, linux-integrity,
	linux-security-module, linux-fsdevel
In-Reply-To: <35ea0914-7360-43ab-e381-9614d18cceba@digikod.net>

On Thu, Jul 16, 2020 at 04:39:14PM +0200, Mickaël Salaün wrote:
> 
> On 15/07/2020 22:37, Kees Cook wrote:
> > On Tue, Jul 14, 2020 at 08:16:36PM +0200, Mickaël Salaün wrote:
> >> @@ -2849,7 +2855,7 @@ static int may_open(const struct path *path, int acc_mode, int flag)
> >>  	case S_IFLNK:
> >>  		return -ELOOP;
> >>  	case S_IFDIR:
> >> -		if (acc_mode & (MAY_WRITE | MAY_EXEC))
> >> +		if (acc_mode & (MAY_WRITE | MAY_EXEC | MAY_OPENEXEC))
> >>  			return -EISDIR;
> >>  		break;
> > 
> > (I need to figure out where "open for reading" rejects S_IFDIR, since
> > it's clearly not here...)

Doesn't it come from generic_read_dir() in fs/libfs.c?

> > 
> >>  	case S_IFBLK:
> >> @@ -2859,13 +2865,26 @@ static int may_open(const struct path *path, int acc_mode, int flag)
> >>  		fallthrough;
> >>  	case S_IFIFO:
> >>  	case S_IFSOCK:
> >> -		if (acc_mode & MAY_EXEC)
> >> +		if (acc_mode & (MAY_EXEC | MAY_OPENEXEC))
> >>  			return -EACCES;
> >>  		flag &= ~O_TRUNC;
> >>  		break;
> > 
> > This will immediately break a system that runs code with MAY_OPENEXEC
> > set but reads from a block, char, fifo, or socket, even in the case of
> > a sysadmin leaving the "file" sysctl disabled.
> 
> As documented, O_MAYEXEC is for regular files. The only legitimate use
> case seems to be with pipes, which should probably be allowed when
> enforcement is disabled.

By the way Kees, while we fix that for the next series, do you think it
would be relevant, at least for the sake of clarity, to add a
WARN_ON_ONCE(acc_mode & MAY_OPENEXEC) for the S_IFSOCK case, since a
socket cannot be opened anyway?

-- 
Thibaut Sautereau
CLIP OS developer


* [PATCH bpf-next v5 0/7] Generalizing bpf_local_storage
From: KP Singh @ 2020-07-22 17:14 UTC (permalink / raw)
  To: linux-kernel, bpf, linux-security-module
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Paul Turner, Jann Horn, Florent Revest

From: KP Singh <kpsingh@google.com>

# v4 -> v5

- Split non-functional changes into separate commits.
- Updated the cache macros to be simpler.
- Fixed some bugs noticed by Martin.
- Updated the userspace map functions to use an fd for lookups, updates
  and deletes.
- Rebase.

# v3 -> v4

- Fixed a missing include to bpf_sk_storage.h in bpf_sk_storage.c
- Fixed some functions that were not marked as static which led to
  W=1 compilation warnings.

# v2 -> v3

* Restructured the code as per Martin's suggestions:
  - Common functionality in bpf_local_storage.c
  - bpf_sk_storage functionality remains in net/bpf_sk_storage.
  - bpf_inode_storage is kept separate as it is enabled only with
    CONFIG_BPF_LSM.
* A separate cache for inode and sk storage with macros to define it.
* Use the ops style approach as suggested by Martin instead of the
  enum + switch style.
* Added the inode map to bpftool bash completion and docs.
* Rebase and indentation fixes.

# v1 -> v2

* Use the security blob pointer instead of dedicated member in
  struct inode.
* Better code re-use as suggested by Alexei.
* Dropped the inode count arithmetic as pointed out by Alexei.
* Minor bug fixes and rebase.

bpf_sk_storage can already be used by some BPF program types to annotate
socket objects. These annotations are managed with the life-cycle of the
object (i.e. freed when the object is freed) which makes BPF programs
much simpler and less prone to errors and leaks.
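
As a rough userspace model of this idea (all names here are
hypothetical and none of them are kernel APIs), the values can be
thought of as living on the owning object, with the map acting only as
the "type" used to find the right value. Freeing the owner then frees
everything attached to it:

```c
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct demo_map { int id; };		/* plays the role of the storage "type" */

struct demo_elem {
	const struct demo_map *map;	/* which map this value belongs to */
	unsigned int value;
	struct demo_elem *next;
};

struct demo_owner {			/* stands in for a sock or an inode */
	struct demo_elem *storage;	/* values live with the owner */
};

/* Look up the value for @map on @owner; optionally create it zeroed. */
static unsigned int *demo_storage_get(struct demo_owner *owner,
				      const struct demo_map *map, int create)
{
	struct demo_elem *e;

	for (e = owner->storage; e; e = e->next)
		if (e->map == map)
			return &e->value;

	if (!create)
		return NULL;

	e = calloc(1, sizeof(*e));
	if (!e)
		return NULL;
	e->map = map;
	e->next = owner->storage;
	owner->storage = e;
	return &e->value;
}

/* Freeing the owner frees every value attached to it; returns the count. */
static int demo_owner_free(struct demo_owner *owner)
{
	int freed = 0;

	while (owner->storage) {
		struct demo_elem *e = owner->storage;

		owner->storage = e->next;
		free(e);
		freed++;
	}
	return freed;
}
```

The model leaves out locking, RCU and the per-map cache; it is only
meant to show why a program using such storage does not have to track
object destruction itself.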

This patch series:

* Generalizes the bpf_sk_storage infrastructure to allow easy
  implementation of local storage for other objects
* Implements local storage for inodes
* Makes both bpf_{sk, inode}_storage available to LSM programs.

Local storage is safe to use in LSM programs as the attachment sites are
limited and the owning object won't be freed, however, this is not the
case for tracing. Usage in tracing is expected to follow a white-list
based approach similar to the d_path helper
(https://lore.kernel.org/bpf/20200506132946.2164578-1-jolsa@kernel.org).

Access to local storage would allow LSM programs to implement stateful
detections like detecting the unlink of a running executable from the
examples shared as a part of the KRSI series
https://lore.kernel.org/bpf/20200329004356.27286-1-kpsingh@chromium.org/
and
https://github.com/sinkap/linux-krsi/blob/patch/v1/examples/samples/bpf/lsm_detect_exec_unlink.c


KP Singh (7):
  bpf: Renames to prepare for generalizing sk_storage.
  bpf: Generalize caching for sk_storage.
  bpf: Generalize bpf_sk_storage
  bpf: Split bpf_local_storage to bpf_sk_storage
  bpf: Implement bpf_local_storage for inodes
  bpf: Allow local storage to be used from LSM programs
  bpf: Add selftests for local_storage

 include/linux/bpf.h                           |  13 +
 include/linux/bpf_local_storage.h             | 175 ++++
 include/linux/bpf_lsm.h                       |  21 +
 include/linux/bpf_types.h                     |   3 +
 include/net/bpf_sk_storage.h                  |  12 +
 include/net/sock.h                            |   4 +-
 include/uapi/linux/bpf.h                      |  54 +-
 kernel/bpf/Makefile                           |   2 +
 kernel/bpf/bpf_inode_storage.c                | 356 ++++++++
 kernel/bpf/bpf_local_storage.c                | 519 ++++++++++++
 kernel/bpf/bpf_lsm.c                          |  21 +-
 kernel/bpf/syscall.c                          |   3 +-
 kernel/bpf/verifier.c                         |  10 +
 net/core/bpf_sk_storage.c                     | 759 ++++--------------
 security/bpf/hooks.c                          |   7 +
 .../bpf/bpftool/Documentation/bpftool-map.rst |   2 +-
 tools/bpf/bpftool/bash-completion/bpftool     |   3 +-
 tools/bpf/bpftool/map.c                       |   3 +-
 tools/include/uapi/linux/bpf.h                |  54 +-
 tools/lib/bpf/libbpf_probes.c                 |   5 +-
 .../bpf/prog_tests/test_local_storage.c       |  60 ++
 .../selftests/bpf/progs/local_storage.c       | 136 ++++
 22 files changed, 1596 insertions(+), 626 deletions(-)
 create mode 100644 include/linux/bpf_local_storage.h
 create mode 100644 kernel/bpf/bpf_inode_storage.c
 create mode 100644 kernel/bpf/bpf_local_storage.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_local_storage.c
 create mode 100644 tools/testing/selftests/bpf/progs/local_storage.c

-- 
2.28.0.rc0.105.gf9edc3c819-goog



* [PATCH bpf-next v5 6/7] bpf: Allow local storage to be used from LSM programs
From: KP Singh @ 2020-07-22 17:14 UTC (permalink / raw)
  To: linux-kernel, bpf, linux-security-module
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Paul Turner, Jann Horn, Florent Revest
In-Reply-To: <20200722171409.102949-1-kpsingh@chromium.org>

From: KP Singh <kpsingh@google.com>

Adds support for both bpf_{sk, inode}_storage_{get, delete} to be used
in LSM programs. These helpers are not used for tracing programs
(currently) as their usage is tied to the life-cycle of the object and
should only be used where the owning object won't be freed (when the
owning object is passed as an argument to the LSM hook). Thus, they
are safer to use in LSM hooks than tracing. Usage of local storage in
tracing programs will probably follow a per function based whitelist
approach.

Since the UAPI helper signature for bpf_sk_storage expects a bpf_sock,
which leads to a compilation warning for LSM programs, it is also
updated to accept a void * pointer instead.

Signed-off-by: KP Singh <kpsingh@google.com>
---
 include/net/bpf_sk_storage.h   |  2 ++
 include/uapi/linux/bpf.h       |  8 ++++++--
 kernel/bpf/bpf_lsm.c           | 21 ++++++++++++++++++++-
 net/core/bpf_sk_storage.c      | 27 +++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  8 ++++++--
 5 files changed, 61 insertions(+), 5 deletions(-)

diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h
index 4cdf37ac278c..d123807b5083 100644
--- a/include/net/bpf_sk_storage.h
+++ b/include/net/bpf_sk_storage.h
@@ -19,6 +19,8 @@ void bpf_sk_storage_free(struct sock *sk);
 
 extern const struct bpf_func_proto bpf_sk_storage_get_proto;
 extern const struct bpf_func_proto bpf_sk_storage_delete_proto;
+extern const struct bpf_func_proto sk_storage_get_btf_proto;
+extern const struct bpf_func_proto sk_storage_delete_btf_proto;
 
 struct bpf_sk_storage_diag;
 struct sk_buff;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 0bdfbe6067be..5be19f93b159 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2790,7 +2790,7 @@ union bpf_attr {
  *
  *		**-ERANGE** if resulting value was out of range.
  *
- * void *bpf_sk_storage_get(struct bpf_map *map, struct bpf_sock *sk, void *value, u64 flags)
+ * void *bpf_sk_storage_get(struct bpf_map *map, void *sk, void *value, u64 flags)
  *	Description
  *		Get a bpf-local-storage from a *sk*.
  *
@@ -2806,6 +2806,10 @@ union bpf_attr {
  *		"type". The bpf-local-storage "type" (i.e. the *map*) is
  *		searched against all bpf-local-storages residing at *sk*.
  *
+ *		For socket programs, *sk* should be a **struct bpf_sock** pointer
+ *		and an **ARG_PTR_TO_BTF_ID** of type **struct sock** for LSM
+ *		programs.
+ *
  *		An optional *flags* (**BPF_SK_STORAGE_GET_F_CREATE**) can be
  *		used such that a new bpf-local-storage will be
  *		created if one does not exist.  *value* can be used
@@ -2818,7 +2822,7 @@ union bpf_attr {
  *		**NULL** if not found or there was an error in adding
  *		a new bpf-local-storage.
  *
- * long bpf_sk_storage_delete(struct bpf_map *map, struct bpf_sock *sk)
+ * long bpf_sk_storage_delete(struct bpf_map *map, void *sk)
  *	Description
  *		Delete a bpf-local-storage from a *sk*.
  *	Return
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index fb278144e9fd..9cd1428c7199 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -11,6 +11,8 @@
 #include <linux/bpf_lsm.h>
 #include <linux/kallsyms.h>
 #include <linux/bpf_verifier.h>
+#include <net/bpf_sk_storage.h>
+#include <linux/bpf_local_storage.h>
 
 /* For every LSM hook that allows attachment of BPF programs, declare a nop
  * function where a BPF program can be attached.
@@ -45,10 +47,27 @@ int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
 	return 0;
 }
 
+static const struct bpf_func_proto *
+bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	switch (func_id) {
+	case BPF_FUNC_inode_storage_get:
+		return &bpf_inode_storage_get_proto;
+	case BPF_FUNC_inode_storage_delete:
+		return &bpf_inode_storage_delete_proto;
+	case BPF_FUNC_sk_storage_get:
+		return &sk_storage_get_btf_proto;
+	case BPF_FUNC_sk_storage_delete:
+		return &sk_storage_delete_btf_proto;
+	default:
+		return tracing_prog_func_proto(func_id, prog);
+	}
+}
+
 const struct bpf_prog_ops lsm_prog_ops = {
 };
 
 const struct bpf_verifier_ops lsm_verifier_ops = {
-	.get_func_proto = tracing_prog_func_proto,
+	.get_func_proto = bpf_lsm_func_proto,
 	.is_valid_access = btf_ctx_access,
 };
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index be0ed44d0887..17efb8a9196d 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -11,6 +11,7 @@
 #include <net/sock.h>
 #include <uapi/linux/sock_diag.h>
 #include <uapi/linux/btf.h>
+#include <linux/btf_ids.h>
 
 DEFINE_BPF_STORAGE_CACHE(sk_cache);
 
@@ -465,6 +466,32 @@ const struct bpf_func_proto bpf_sk_storage_delete_proto = {
 	.arg2_type	= ARG_PTR_TO_SOCKET,
 };
 
+BTF_ID_LIST(sk_storage_get_btf_ids)
+BTF_ID(struct, sock)
+
+const struct bpf_func_proto sk_storage_get_btf_proto = {
+	.func		= bpf_sk_storage_get,
+	.gpl_only	= false,
+	.ret_type	= RET_PTR_TO_MAP_VALUE_OR_NULL,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_BTF_ID,
+	.arg3_type	= ARG_PTR_TO_MAP_VALUE_OR_NULL,
+	.arg4_type	= ARG_ANYTHING,
+	.btf_id		= sk_storage_get_btf_ids,
+};
+
+BTF_ID_LIST(sk_storage_delete_btf_ids)
+BTF_ID(struct, sock)
+
+const struct bpf_func_proto sk_storage_delete_btf_proto = {
+	.func		= bpf_sk_storage_delete,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_BTF_ID,
+	.btf_id		= sk_storage_delete_btf_ids,
+};
+
 struct bpf_sk_storage_diag {
 	u32 nr_maps;
 	struct bpf_map *maps[];
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 0bdfbe6067be..5be19f93b159 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2790,7 +2790,7 @@ union bpf_attr {
  *
  *		**-ERANGE** if resulting value was out of range.
  *
- * void *bpf_sk_storage_get(struct bpf_map *map, struct bpf_sock *sk, void *value, u64 flags)
+ * void *bpf_sk_storage_get(struct bpf_map *map, void *sk, void *value, u64 flags)
  *	Description
  *		Get a bpf-local-storage from a *sk*.
  *
@@ -2806,6 +2806,10 @@ union bpf_attr {
  *		"type". The bpf-local-storage "type" (i.e. the *map*) is
  *		searched against all bpf-local-storages residing at *sk*.
  *
+ *		For socket programs, *sk* should be a **struct bpf_sock** pointer
+ *		and an **ARG_PTR_TO_BTF_ID** of type **struct sock** for LSM
+ *		programs.
+ *
  *		An optional *flags* (**BPF_SK_STORAGE_GET_F_CREATE**) can be
  *		used such that a new bpf-local-storage will be
  *		created if one does not exist.  *value* can be used
@@ -2818,7 +2822,7 @@ union bpf_attr {
  *		**NULL** if not found or there was an error in adding
  *		a new bpf-local-storage.
  *
- * long bpf_sk_storage_delete(struct bpf_map *map, struct bpf_sock *sk)
+ * long bpf_sk_storage_delete(struct bpf_map *map, void *sk)
  *	Description
  *		Delete a bpf-local-storage from a *sk*.
  *	Return
-- 
2.28.0.rc0.105.gf9edc3c819-goog



* [PATCH bpf-next v5 7/7] bpf: Add selftests for local_storage
From: KP Singh @ 2020-07-22 17:14 UTC (permalink / raw)
  To: linux-kernel, bpf, linux-security-module
  Cc: Andrii Nakryiko, Alexei Starovoitov, Daniel Borkmann,
	Martin KaFai Lau, Paul Turner, Jann Horn, Florent Revest
In-Reply-To: <20200722171409.102949-1-kpsingh@chromium.org>

From: KP Singh <kpsingh@google.com>

inode_local_storage:

* Hook to the file_open and inode_unlink LSM hooks.
* Create and unlink a temporary file.
* Store some information in the inode's bpf_local_storage during
  file_open.
* Verify that this information exists when the file is unlinked.

sk_local_storage:

* Hook to the socket_post_create and socket_bind LSM hooks.
* Open and bind a socket and set the sk_storage in the
  socket_post_create hook using the start_server helper.
* Verify if the information is set in the socket_bind hook.

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: KP Singh <kpsingh@google.com>
---
 .../bpf/prog_tests/test_local_storage.c       |  60 ++++++++
 .../selftests/bpf/progs/local_storage.c       | 136 ++++++++++++++++++
 2 files changed, 196 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_local_storage.c
 create mode 100644 tools/testing/selftests/bpf/progs/local_storage.c

diff --git a/tools/testing/selftests/bpf/prog_tests/test_local_storage.c b/tools/testing/selftests/bpf/prog_tests/test_local_storage.c
new file mode 100644
index 000000000000..d4ba89195c43
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_local_storage.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (C) 2020 Google LLC.
+ */
+
+#include <test_progs.h>
+#include <linux/limits.h>
+
+#include "local_storage.skel.h"
+#include "network_helpers.h"
+
+int create_and_unlink_file(void)
+{
+	char fname[PATH_MAX] = "/tmp/fileXXXXXX";
+	int fd;
+
+	fd = mkstemp(fname);
+	if (fd < 0)
+		return fd;
+
+	close(fd);
+	unlink(fname);
+	return 0;
+}
+
+void test_test_local_storage(void)
+{
+	struct local_storage *skel = NULL;
+	int err, duration = 0, serv_sk = -1;
+
+	skel = local_storage__open_and_load();
+	if (CHECK(!skel, "skel_load", "lsm skeleton failed\n"))
+		goto close_prog;
+
+	err = local_storage__attach(skel);
+	if (CHECK(err, "attach", "lsm attach failed: %d\n", err))
+		goto close_prog;
+
+	skel->bss->monitored_pid = getpid();
+
+	err = create_and_unlink_file();
+	if (CHECK(err < 0, "exec_cmd", "err %d errno %d\n", err, errno))
+		goto close_prog;
+
+	CHECK(!skel->bss->inode_storage_result, "inode_storage_result",
+	      "inode_local_storage not set\n");
+
+	serv_sk = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0);
+	if (CHECK(serv_sk < 0, "start_server", "failed to start server\n"))
+		goto close_prog;
+
+	CHECK(!skel->bss->sk_storage_result, "sk_storage_result",
+	      "sk_local_storage not set\n");
+
+	close(serv_sk);
+
+close_prog:
+	local_storage__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/local_storage.c b/tools/testing/selftests/bpf/progs/local_storage.c
new file mode 100644
index 000000000000..cb608b7b90f0
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/local_storage.c
@@ -0,0 +1,136 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2020 Google LLC.
+ */
+
+#include <errno.h>
+#include <linux/bpf.h>
+#include <stdbool.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+#define DUMMY_STORAGE_VALUE 0xdeadbeef
+
+int monitored_pid = 0;
+bool inode_storage_result = false;
+bool sk_storage_result = false;
+
+struct dummy_storage {
+	__u32 value;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_INODE_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, struct dummy_storage);
+} inode_storage_map SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_CLONE);
+	__type(key, int);
+	__type(value, struct dummy_storage);
+} sk_storage_map SEC(".maps");
+
+/* TODO Use vmlinux.h once BTF pruning for embedded types is fixed.
+ */
+struct sock {} __attribute__((preserve_access_index));
+struct sockaddr {} __attribute__((preserve_access_index));
+struct socket {
+	struct sock *sk;
+} __attribute__((preserve_access_index));
+
+struct inode {} __attribute__((preserve_access_index));
+struct dentry {
+	struct inode *d_inode;
+} __attribute__((preserve_access_index));
+struct file {
+	struct inode *f_inode;
+} __attribute__((preserve_access_index));
+
+
+SEC("lsm/inode_unlink")
+int BPF_PROG(unlink_hook, struct inode *dir, struct dentry *victim)
+{
+	__u32 pid = bpf_get_current_pid_tgid() >> 32;
+	struct dummy_storage *storage;
+
+	if (pid != monitored_pid)
+		return 0;
+
+	storage = bpf_inode_storage_get(&inode_storage_map, victim->d_inode, 0,
+					BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (!storage)
+		return 0;
+
+	if (storage->value == DUMMY_STORAGE_VALUE)
+		inode_storage_result = true;
+
+	return 0;
+}
+
+SEC("lsm/socket_bind")
+int BPF_PROG(socket_bind, struct socket *sock, struct sockaddr *address,
+	     int addrlen)
+{
+	__u32 pid = bpf_get_current_pid_tgid() >> 32;
+	struct dummy_storage *storage;
+
+	if (pid != monitored_pid)
+		return 0;
+
+	storage = bpf_sk_storage_get(&sk_storage_map, sock->sk, 0,
+				     BPF_SK_STORAGE_GET_F_CREATE);
+	if (!storage)
+		return 0;
+
+	if (storage->value == DUMMY_STORAGE_VALUE)
+		sk_storage_result = true;
+
+	return 0;
+}
+
+SEC("lsm/socket_post_create")
+int BPF_PROG(socket_post_create, struct socket *sock, int family, int type,
+	     int protocol, int kern)
+{
+	__u32 pid = bpf_get_current_pid_tgid() >> 32;
+	struct dummy_storage *storage;
+
+	if (pid != monitored_pid)
+		return 0;
+
+	storage = bpf_sk_storage_get(&sk_storage_map, sock->sk, 0,
+				     BPF_SK_STORAGE_GET_F_CREATE);
+	if (!storage)
+		return 0;
+
+	storage->value = DUMMY_STORAGE_VALUE;
+
+	return 0;
+}
+
+SEC("lsm/file_open")
+int BPF_PROG(test_int_hook, struct file *file)
+{
+	__u32 pid = bpf_get_current_pid_tgid() >> 32;
+	struct dummy_storage *storage;
+
+	if (pid != monitored_pid)
+		return 0;
+
+	if (!file->f_inode)
+		return 0;
+
+	storage = bpf_inode_storage_get(&inode_storage_map, file->f_inode, 0,
+				     BPF_LOCAL_STORAGE_GET_F_CREATE);
+	if (!storage)
+		return 0;
+
+	storage->value = DUMMY_STORAGE_VALUE;
+	return 0;
+}
-- 
2.28.0.rc0.105.gf9edc3c819-goog



* [PATCH bpf-next v5 5/7] bpf: Implement bpf_local_storage for inodes
From: KP Singh @ 2020-07-22 17:14 UTC (permalink / raw)
  To: linux-kernel, bpf, linux-security-module
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Paul Turner, Jann Horn, Florent Revest
In-Reply-To: <20200722171409.102949-1-kpsingh@chromium.org>

From: KP Singh <kpsingh@google.com>

Similar to bpf_local_storage for sockets, add local storage for inodes.
The life-cycle of the storage is tied to the life-cycle of the inode,
i.e. the storage is destroyed along with the owning inode.

The BPF LSM allocates an __rcu pointer to the bpf_local_storage in the
security blob. Security blobs are now stackable, so this can co-exist
with other LSMs.
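
For readers unfamiliar with blob stacking, here is a simplified
userspace sketch of the offset scheme. The names (demo_blob_sizes,
demo_reserve, demo_lsm_inode) are hypothetical, and alignment handling
is left out. Each LSM declares how many bytes it needs, is assigned an
offset into one shared allocation, and later finds its slice by adding
that offset to the base pointer, in the same way bpf_inode() adds
lbs_inode to i_security in this patch.

```c
#include <stdlib.h>

/* Hypothetical model of LSM blob stacking; not the kernel's structures. */
struct demo_blob_sizes {
	size_t lbs_inode;	/* in: size wanted; out: assigned offset */
};

static size_t demo_total_inode_blob;	/* running size of the shared blob */

/* Reserve a slice of the shared blob, turning the size into an offset. */
static void demo_reserve(struct demo_blob_sizes *lsm)
{
	size_t need = lsm->lbs_inode;

	lsm->lbs_inode = demo_total_inode_blob;
	demo_total_inode_blob += need;
}

struct demo_inode {
	void *i_security;	/* one allocation shared by all LSMs */
};

/* Each LSM finds its slice by adding its own offset to the base pointer. */
static void *demo_lsm_inode(const struct demo_inode *inode,
			    const struct demo_blob_sizes *lsm)
{
	if (!inode->i_security)
		return NULL;
	return (char *)inode->i_security + lsm->lbs_inode;
}
```

With two LSMs reserving 16 bytes and sizeof(void *) bytes in that order,
the first gets offset 0 and the second offset 16, so neither needs a
dedicated member in the inode.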

Signed-off-by: KP Singh <kpsingh@google.com>
---
 include/linux/bpf_local_storage.h             |  10 +
 include/linux/bpf_lsm.h                       |  21 ++
 include/linux/bpf_types.h                     |   3 +
 include/uapi/linux/bpf.h                      |  38 ++
 kernel/bpf/Makefile                           |   1 +
 kernel/bpf/bpf_inode_storage.c                | 356 ++++++++++++++++++
 kernel/bpf/syscall.c                          |   3 +-
 kernel/bpf/verifier.c                         |  10 +
 security/bpf/hooks.c                          |   7 +
 .../bpf/bpftool/Documentation/bpftool-map.rst |   2 +-
 tools/bpf/bpftool/bash-completion/bpftool     |   3 +-
 tools/bpf/bpftool/map.c                       |   3 +-
 tools/include/uapi/linux/bpf.h                |  38 ++
 tools/lib/bpf/libbpf_probes.c                 |   5 +-
 14 files changed, 494 insertions(+), 6 deletions(-)
 create mode 100644 kernel/bpf/bpf_inode_storage.c

diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index d80573b11d4c..49eaae723aa2 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -162,4 +162,14 @@ bpf_local_storage_update(void *owner, struct bpf_map *map,
 			 struct bpf_local_storage *local_storage, void *value,
 			 u64 map_flags);
 
+#ifdef CONFIG_BPF_LSM
+extern const struct bpf_func_proto bpf_inode_storage_get_proto;
+extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
+void bpf_inode_storage_free(struct inode *inode);
+#else
+static inline void bpf_inode_storage_free(struct inode *inode)
+{
+}
+#endif /* CONFIG_BPF_LSM */
+
 #endif /* _BPF_LOCAL_STORAGE_H */
diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
index af74712af585..d0683ada1e49 100644
--- a/include/linux/bpf_lsm.h
+++ b/include/linux/bpf_lsm.h
@@ -17,9 +17,24 @@
 #include <linux/lsm_hook_defs.h>
 #undef LSM_HOOK
 
+struct bpf_storage_blob {
+	struct bpf_local_storage __rcu *storage;
+};
+
+extern struct lsm_blob_sizes bpf_lsm_blob_sizes;
+
 int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
 			const struct bpf_prog *prog);
 
+static inline struct bpf_storage_blob *bpf_inode(
+	const struct inode *inode)
+{
+	if (unlikely(!inode->i_security))
+		return NULL;
+
+	return inode->i_security + bpf_lsm_blob_sizes.lbs_inode;
+}
+
 #else /* !CONFIG_BPF_LSM */
 
 static inline int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
@@ -28,6 +43,12 @@ static inline int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
 	return -EOPNOTSUPP;
 }
 
+static inline struct bpf_storage_blob *bpf_inode(
+	const struct inode *inode)
+{
+	return NULL;
+}
+
 #endif /* CONFIG_BPF_LSM */
 
 #endif /* _LINUX_BPF_LSM_H */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index a52a5688418e..2e6f568377f1 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -107,6 +107,9 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_SK_STORAGE, sk_storage_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKHASH, sock_hash_ops)
 #endif
+#ifdef CONFIG_BPF_LSM
+BPF_MAP_TYPE(BPF_MAP_TYPE_INODE_STORAGE, inode_storage_map_ops)
+#endif
 BPF_MAP_TYPE(BPF_MAP_TYPE_CPUMAP, cpu_map_ops)
 #if defined(CONFIG_XDP_SOCKETS)
 BPF_MAP_TYPE(BPF_MAP_TYPE_XSKMAP, xsk_map_ops)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b9d2e4792d08..0bdfbe6067be 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -148,6 +148,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_DEVMAP_HASH,
 	BPF_MAP_TYPE_STRUCT_OPS,
 	BPF_MAP_TYPE_RINGBUF,
+	BPF_MAP_TYPE_INODE_STORAGE,
 };
 
 /* Note that tracing related programs such as
@@ -3377,6 +3378,41 @@ union bpf_attr {
  *		A non-negative value equal to or less than *size* on success,
  *		or a negative error in case of failure.
  *
+ * void *bpf_inode_storage_get(struct bpf_map *map, void *inode, void *value, u64 flags)
+ *	Description
+ *		Get a bpf_local_storage from an *inode*.
+ *
+ *		Logically, it could be thought of as getting the value from
+ *		a *map* with *inode* as the **key**.  From this
+ *		perspective,  the usage is not much different from
+ *		**bpf_map_lookup_elem**\ (*map*, **&**\ *inode*) except this
+ *		helper enforces the key must be an inode and the map must also
+ *		be a **BPF_MAP_TYPE_INODE_STORAGE**.
+ *
+ *		Underneath, the value is stored locally at *inode* instead of
+ *		the *map*.  The *map* is used as the bpf-local-storage
+ *		"type". The bpf-local-storage "type" (i.e. the *map*) is
+ *		searched against all bpf_local_storage residing at *inode*.
+ *
+ *		An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be
+ *		used such that a new bpf_local_storage will be
+ *		created if one does not exist.  *value* can be used
+ *		together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify
+ *		the initial value of a bpf_local_storage.  If *value* is
+ *		**NULL**, the new bpf_local_storage will be zero initialized.
+ *	Return
+ *		A bpf_local_storage pointer is returned on success.
+ *
+ *		**NULL** if not found or there was an error in adding
+ *		a new bpf_local_storage.
+ *
+ * int bpf_inode_storage_delete(struct bpf_map *map, void *inode)
+ *	Description
+ *		Delete a bpf_local_storage from an *inode*.
+ *	Return
+ *		0 on success.
+ *
+ *		**-ENOENT** if the bpf_local_storage cannot be found.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3521,6 +3557,8 @@ union bpf_attr {
 	FN(skc_to_tcp_request_sock),	\
 	FN(skc_to_udp6_sock),		\
 	FN(get_task_stack),		\
+	FN(inode_storage_get),		\
+	FN(inode_storage_delete),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 0acb8f8a6042..0ea9fd15977c 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -5,6 +5,7 @@ CFLAGS_core.o += $(call cc-disable-warning, override-init)
 obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
+obj-${CONFIG_BPF_LSM}	  += bpf_inode_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o
 obj-$(CONFIG_BPF_JIT) += trampoline.o
 obj-$(CONFIG_BPF_SYSCALL) += btf.o
diff --git a/kernel/bpf/bpf_inode_storage.c b/kernel/bpf/bpf_inode_storage.c
new file mode 100644
index 000000000000..85eebaf2135b
--- /dev/null
+++ b/kernel/bpf/bpf_inode_storage.c
@@ -0,0 +1,356 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2019 Facebook
+ * Copyright 2020 Google LLC.
+ */
+
+#include <linux/rculist.h>
+#include <linux/list.h>
+#include <linux/hash.h>
+#include <linux/types.h>
+#include <linux/spinlock.h>
+#include <linux/bpf.h>
+#include <linux/bpf_local_storage.h>
+#include <net/sock.h>
+#include <uapi/linux/sock_diag.h>
+#include <uapi/linux/btf.h>
+#include <linux/bpf_lsm.h>
+#include <linux/btf_ids.h>
+#include <linux/fdtable.h>
+
+DEFINE_BPF_STORAGE_CACHE(inode_cache);
+
+static struct bpf_local_storage_elem *
+inode_selem_alloc(struct bpf_local_storage_map *smap, void *owner,
+		  void *value, bool charge_omem)
+{
+	return bpf_selem_alloc(smap, value);
+}
+
+static bool unlink_inode_storage(struct bpf_local_storage *local_storage,
+				 struct bpf_local_storage_elem *selem,
+				 bool uncharge_omem)
+{
+	struct bpf_local_storage_map *smap;
+	struct bpf_storage_blob *bsb;
+	bool free_local_storage;
+	struct inode *inode;
+
+	inode = local_storage->owner;
+	bsb = bpf_inode(inode);
+	if (!bsb)
+		return false;
+
+	smap = rcu_dereference(SDATA(selem)->smap);
+	/* All uncharging on sk->sk_omem_alloc must be done first.
+	 * sk may be freed once the last selem is unlinked from sk_storage.
+	 */
+
+	free_local_storage = hlist_is_singular_node(&selem->snode,
+						    &local_storage->list);
+
+	if (free_local_storage) {
+		/* After this RCU_INIT, inode may be freed and cannot be used */
+		RCU_INIT_POINTER(bsb->storage, NULL);
+		local_storage->owner = NULL;
+	}
+
+	return free_local_storage;
+
+}
+
+static struct bpf_local_storage_data *inode_storage_lookup(struct inode *inode,
+							   struct bpf_map *map,
+							   bool cacheit_lockit)
+{
+	struct bpf_local_storage *inode_storage;
+	struct bpf_local_storage_map *smap;
+	struct bpf_storage_blob *bsb;
+
+	bsb = bpf_inode(inode);
+	if (!bsb)
+		return ERR_PTR(-ENOENT);
+
+	inode_storage = rcu_dereference(bsb->storage);
+	if (!inode_storage)
+		return NULL;
+
+	smap = (struct bpf_local_storage_map *)map;
+	return bpf_local_storage_lookup(inode_storage, smap, cacheit_lockit);
+}
+
+static int inode_storage_alloc(void *owner, struct bpf_local_storage_map *smap,
+			       struct bpf_local_storage_elem *first_selem)
+{
+	struct bpf_local_storage *curr;
+	struct bpf_storage_blob *bsb;
+	struct inode *inode = owner;
+	int err;
+
+	bsb = bpf_inode(inode);
+	if (!bsb)
+		return -EINVAL;
+
+	curr = bpf_local_storage_alloc(smap);
+	if (!curr)
+		return -ENOMEM;
+
+	curr->owner = inode;
+
+	bpf_selem_link_storage(curr, first_selem);
+	bpf_selem_link_map(smap, first_selem);
+
+	err = bpf_local_storage_publish(first_selem,
+		(struct bpf_local_storage **)&bsb->storage, curr);
+	if (err) {
+		kfree(curr);
+		return err;
+	}
+
+	return 0;
+}
+
+static struct bpf_local_storage_data *inode_storage_update(void *owner,
+							   struct bpf_map *map,
+							   void *value,
+							   u64 map_flags)
+{
+	struct bpf_local_storage_data *old_sdata = NULL;
+	struct bpf_local_storage_elem *selem;
+	struct bpf_local_storage *local_storage;
+	struct bpf_local_storage_map *smap;
+	struct bpf_storage_blob *bsb;
+	struct inode *inode;
+	int err;
+
+	err = bpf_local_storage_check_update_flags(map, map_flags);
+	if (err)
+		return ERR_PTR(err);
+
+	inode = owner;
+	bsb = bpf_inode(inode);
+	local_storage = rcu_dereference(bsb->storage);
+	smap = (struct bpf_local_storage_map *)map;
+
+	if (!local_storage || hlist_empty(&local_storage->list)) {
+		/* Very first elem */
+		selem = map->ops->map_selem_alloc(smap, owner, value, !old_sdata);
+		if (!selem)
+			return ERR_PTR(-ENOMEM);
+
+		err = inode_storage_alloc(owner, smap, selem);
+		if (err) {
+			kfree(selem);
+			return ERR_PTR(err);
+		}
+
+		return SDATA(selem);
+	}
+
+	return bpf_local_storage_update(owner, map, local_storage, value,
+					map_flags);
+}
+
+
+void bpf_inode_storage_free(struct inode *inode)
+{
+	struct bpf_local_storage_elem *selem;
+	struct bpf_local_storage *local_storage;
+	bool free_inode_storage = false;
+	struct bpf_storage_blob *bsb;
+	struct hlist_node *n;
+
+	bsb = bpf_inode(inode);
+	if (!bsb)
+		return;
+
+	rcu_read_lock();
+
+	local_storage = rcu_dereference(bsb->storage);
+	if (!local_storage) {
+		rcu_read_unlock();
+		return;
+	}
+
+	/* Neither the bpf_prog nor the bpf-map's syscall
+	 * could be modifying the local_storage->list now.
+	 * Thus, no elem can be added-to or deleted-from the
+	 * local_storage->list by the bpf_prog or by the bpf-map's syscall.
+	 *
+	 * It is racing with bpf_local_storage_map_free() alone
+	 * when unlinking elem from the local_storage->list and
+	 * the map's bucket->list.
+	 */
+	raw_spin_lock_bh(&local_storage->lock);
+	hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
+		/* Always unlink from map before unlinking from
+		 * local_storage.
+		 */
+		bpf_selem_unlink_map(selem);
+		free_inode_storage =
+			bpf_selem_unlink_storage(local_storage, selem, false);
+	}
+	raw_spin_unlock_bh(&local_storage->lock);
+	rcu_read_unlock();
+
+	/* free_inode_storage should always be true as long as
+	 * local_storage->list was non-empty.
+	 */
+	if (free_inode_storage)
+		kfree_rcu(local_storage, rcu);
+}
+
+
+static void *bpf_fd_inode_storage_lookup_elem(struct bpf_map *map, void *key)
+{
+	struct bpf_local_storage_data *sdata;
+	struct file *f;
+	int fd;
+
+	fd = *(int *)key;
+	f = fcheck(fd);
+	if (!f)
+		return ERR_PTR(-EINVAL);
+
+	sdata = inode_storage_lookup(f->f_inode, map, true);
+	return sdata ? sdata->data : NULL;
+}
+
+static int bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
+					 void *value, u64 map_flags)
+{
+	struct bpf_local_storage_data *sdata;
+	struct file *f;
+	int fd;
+
+	fd = *(int *)key;
+	f = fcheck(fd);
+	if (!f)
+		return -EINVAL;
+
+	sdata = inode_storage_update(f->f_inode, map, value, map_flags);
+	return PTR_ERR_OR_ZERO(sdata);
+}
+
+static int inode_storage_delete(struct inode *inode, struct bpf_map *map)
+{
+	struct bpf_local_storage_data *sdata;
+
+	sdata = inode_storage_lookup(inode, map, false);
+	if (!sdata)
+		return -ENOENT;
+
+	bpf_selem_unlink(SELEM(sdata));
+
+	return 0;
+}
+
+static int bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key)
+{
+	struct file *f;
+	int fd;
+
+	fd = *(int *)key;
+	f = fcheck(fd);
+	if (!f)
+		return -EINVAL;
+
+	return inode_storage_delete(f->f_inode, map);
+}
+
+BPF_CALL_4(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode,
+	   void *, value, u64, flags)
+{
+	struct bpf_local_storage_data *sdata;
+
+	if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE))
+		return (unsigned long)NULL;
+
+	sdata = inode_storage_lookup(inode, map, true);
+	if (sdata)
+		return (unsigned long)sdata->data;
+
+	if (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) {
+		sdata = inode_storage_update(inode, map, value, BPF_NOEXIST);
+		return IS_ERR(sdata) ?
+			(unsigned long)NULL : (unsigned long)sdata->data;
+	}
+
+	return (unsigned long)NULL;
+}
+
+BPF_CALL_2(bpf_inode_storage_delete,
+	   struct bpf_map *, map, struct inode *, inode)
+{
+	return inode_storage_delete(inode, map);
+}
+
+static int notsupp_get_next_key(struct bpf_map *map, void *key,
+				void *next_key)
+{
+	return -ENOTSUPP;
+}
+
+static struct bpf_map *inode_storage_map_alloc(union bpf_attr *attr)
+{
+	struct bpf_local_storage_map *smap;
+
+	smap = bpf_local_storage_map_alloc(attr);
+	if (IS_ERR(smap))
+		return ERR_CAST(smap);
+
+	smap->cache_idx = bpf_local_storage_cache_idx_get(&inode_cache);
+	return &smap->map;
+}
+
+static void inode_storage_map_free(struct bpf_map *map)
+{
+	struct bpf_local_storage_map *smap;
+
+	smap = (struct bpf_local_storage_map *)map;
+	bpf_local_storage_cache_idx_free(&inode_cache, smap->cache_idx);
+	bpf_local_storage_map_free(smap);
+}
+
+static int inode_storage_map_btf_id;
+const struct bpf_map_ops inode_storage_map_ops = {
+	.map_alloc_check = bpf_local_storage_map_alloc_check,
+	.map_alloc = inode_storage_map_alloc,
+	.map_free = inode_storage_map_free,
+	.map_get_next_key = notsupp_get_next_key,
+	.map_lookup_elem = bpf_fd_inode_storage_lookup_elem,
+	.map_update_elem = bpf_fd_inode_storage_update_elem,
+	.map_delete_elem = bpf_fd_inode_storage_delete_elem,
+	.map_check_btf = bpf_local_storage_map_check_btf,
+	.map_btf_name = "bpf_local_storage_map",
+	.map_btf_id = &inode_storage_map_btf_id,
+	.map_selem_alloc = inode_selem_alloc,
+	.map_local_storage_update = inode_storage_update,
+	.map_local_storage_unlink = unlink_inode_storage,
+};
+
+BTF_ID_LIST(bpf_inode_storage_get_btf_ids)
+BTF_ID(struct, inode)
+
+const struct bpf_func_proto bpf_inode_storage_get_proto = {
+	.func		= bpf_inode_storage_get,
+	.gpl_only	= false,
+	.ret_type	= RET_PTR_TO_MAP_VALUE_OR_NULL,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_BTF_ID,
+	.arg3_type	= ARG_PTR_TO_MAP_VALUE_OR_NULL,
+	.arg4_type	= ARG_ANYTHING,
+	.btf_id		= bpf_inode_storage_get_btf_ids,
+};
+
+BTF_ID_LIST(bpf_inode_storage_delete_btf_ids)
+BTF_ID(struct, inode)
+
+const struct bpf_func_proto bpf_inode_storage_delete_proto = {
+	.func		= bpf_inode_storage_delete,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_CONST_MAP_PTR,
+	.arg2_type	= ARG_PTR_TO_BTF_ID,
+	.btf_id		= bpf_inode_storage_delete_btf_ids,
+};
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d07417d17712..b64b39dd7d56 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -768,7 +768,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 		if (map->map_type != BPF_MAP_TYPE_HASH &&
 		    map->map_type != BPF_MAP_TYPE_ARRAY &&
 		    map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE &&
-		    map->map_type != BPF_MAP_TYPE_SK_STORAGE)
+		    map->map_type != BPF_MAP_TYPE_SK_STORAGE &&
+		    map->map_type != BPF_MAP_TYPE_INODE_STORAGE)
 			return -ENOTSUPP;
 		if (map->spin_lock_off + sizeof(struct bpf_spin_lock) >
 		    map->value_size) {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 9a6703bc3f36..238bf6c9fd05 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4167,6 +4167,11 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		    func_id != BPF_FUNC_sk_storage_delete)
 			goto error;
 		break;
+	case BPF_MAP_TYPE_INODE_STORAGE:
+		if (func_id != BPF_FUNC_inode_storage_get &&
+		    func_id != BPF_FUNC_inode_storage_delete)
+			goto error;
+		break;
 	default:
 		break;
 	}
@@ -4240,6 +4245,11 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		if (map->map_type != BPF_MAP_TYPE_SK_STORAGE)
 			goto error;
 		break;
+	case BPF_FUNC_inode_storage_get:
+	case BPF_FUNC_inode_storage_delete:
+		if (map->map_type != BPF_MAP_TYPE_INODE_STORAGE)
+			goto error;
+		break;
 	default:
 		break;
 	}
diff --git a/security/bpf/hooks.c b/security/bpf/hooks.c
index 32d32d485451..35f9b19259e5 100644
--- a/security/bpf/hooks.c
+++ b/security/bpf/hooks.c
@@ -3,6 +3,7 @@
 /*
  * Copyright (C) 2020 Google LLC.
  */
+#include <linux/bpf_local_storage.h>
 #include <linux/lsm_hooks.h>
 #include <linux/bpf_lsm.h>
 
@@ -11,6 +12,7 @@ static struct security_hook_list bpf_lsm_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(NAME, bpf_lsm_##NAME),
 	#include <linux/lsm_hook_defs.h>
 	#undef LSM_HOOK
+	LSM_HOOK_INIT(inode_free_security, bpf_inode_storage_free),
 };
 
 static int __init bpf_lsm_init(void)
@@ -20,7 +22,12 @@ static int __init bpf_lsm_init(void)
 	return 0;
 }
 
+struct lsm_blob_sizes bpf_lsm_blob_sizes __lsm_ro_after_init = {
+	.lbs_inode = sizeof(struct bpf_storage_blob),
+};
+
 DEFINE_LSM(bpf) = {
 	.name = "bpf",
 	.init = bpf_lsm_init,
+	.blobs = &bpf_lsm_blob_sizes
 };
diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst
index 41e2a74252d0..083db6c2fc67 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-map.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst
@@ -49,7 +49,7 @@ MAP COMMANDS
 |		| **lru_percpu_hash** | **lpm_trie** | **array_of_maps** | **hash_of_maps**
 |		| **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash**
 |		| **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage**
-|		| **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** }
+|		| **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage** }
 
 DESCRIPTION
 ===========
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 7b137264ea3a..bccdffb70e23 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -688,7 +688,8 @@ _bpftool()
                                 lru_percpu_hash lpm_trie array_of_maps \
                                 hash_of_maps devmap devmap_hash sockmap cpumap \
                                 xskmap sockhash cgroup_storage reuseport_sockarray \
-                                percpu_cgroup_storage queue stack' -- \
+                                percpu_cgroup_storage queue stack sk_storage \
+                                struct_ops inode_storage' -- \
                                                    "$cur" ) )
                             return 0
                             ;;
diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
index 3a27d31a1856..bc0071228f88 100644
--- a/tools/bpf/bpftool/map.c
+++ b/tools/bpf/bpftool/map.c
@@ -50,6 +50,7 @@ const char * const map_type_name[] = {
 	[BPF_MAP_TYPE_SK_STORAGE]		= "sk_storage",
 	[BPF_MAP_TYPE_STRUCT_OPS]		= "struct_ops",
 	[BPF_MAP_TYPE_RINGBUF]			= "ringbuf",
+	[BPF_MAP_TYPE_INODE_STORAGE]		= "inode_storage",
 };
 
 const size_t map_type_name_size = ARRAY_SIZE(map_type_name);
@@ -1442,7 +1443,7 @@ static int do_help(int argc, char **argv)
 		"                 lru_percpu_hash | lpm_trie | array_of_maps | hash_of_maps |\n"
 		"                 devmap | devmap_hash | sockmap | cpumap | xskmap | sockhash |\n"
 		"                 cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n"
-		"                 queue | stack | sk_storage | struct_ops | ringbuf }\n"
+		"                 queue | stack | sk_storage | struct_ops | ringbuf | inode_storage }\n"
 		"       " HELP_SPEC_OPTIONS "\n"
 		"",
 		bin_name, argv[-2]);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index b9d2e4792d08..0bdfbe6067be 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -148,6 +148,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_DEVMAP_HASH,
 	BPF_MAP_TYPE_STRUCT_OPS,
 	BPF_MAP_TYPE_RINGBUF,
+	BPF_MAP_TYPE_INODE_STORAGE,
 };
 
 /* Note that tracing related programs such as
@@ -3377,6 +3378,41 @@ union bpf_attr {
  *		A non-negative value equal to or less than *size* on success,
  *		or a negative error in case of failure.
  *
+ * void *bpf_inode_storage_get(struct bpf_map *map, void *inode, void *value, u64 flags)
+ *	Description
+ *		Get a bpf_local_storage from an *inode*.
+ *
+ *		Logically, it could be thought of as getting the value from
+ *		a *map* with *inode* as the **key**.  From this
+ *		perspective,  the usage is not much different from
+ *		**bpf_map_lookup_elem**\ (*map*, **&**\ *inode*) except this
+ *		helper enforces the key must be an inode and the map must also
+ *		be a **BPF_MAP_TYPE_INODE_STORAGE**.
+ *
+ *		Underneath, the value is stored locally at *inode* instead of
+ *		the *map*.  The *map* is used as the bpf-local-storage
+ *		"type". The bpf-local-storage "type" (i.e. the *map*) is
+ *		searched against all bpf_local_storage residing at *inode*.
+ *
+ *		An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be
+ *		used such that a new bpf_local_storage will be
+ *		created if one does not exist.  *value* can be used
+ *		together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify
+ *		the initial value of a bpf_local_storage.  If *value* is
+ *		**NULL**, the new bpf_local_storage will be zero initialized.
+ *	Return
+ *		A bpf_local_storage pointer is returned on success.
+ *
+ *		**NULL** if not found or there was an error in adding
+ *		a new bpf_local_storage.
+ *
+ * int bpf_inode_storage_delete(struct bpf_map *map, void *inode)
+ *	Description
+ *		Delete a bpf_local_storage from an *inode*.
+ *	Return
+ *		0 on success.
+ *
+ *		**-ENOENT** if the bpf_local_storage cannot be found.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -3521,6 +3557,8 @@ union bpf_attr {
 	FN(skc_to_tcp_request_sock),	\
 	FN(skc_to_udp6_sock),		\
 	FN(get_task_stack),		\
+	FN(inode_storage_get),		\
+	FN(inode_storage_delete),	\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index 5a3d3f078408..daaad635d0ed 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -173,7 +173,7 @@ int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
 	return btf_fd;
 }
 
-static int load_sk_storage_btf(void)
+static int load_local_storage_btf(void)
 {
 	const char strs[] = "\0bpf_spin_lock\0val\0cnt\0l";
 	/* struct bpf_spin_lock {
@@ -232,12 +232,13 @@ bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex)
 		key_size	= 0;
 		break;
 	case BPF_MAP_TYPE_SK_STORAGE:
+	case BPF_MAP_TYPE_INODE_STORAGE:
 		btf_key_type_id = 1;
 		btf_value_type_id = 3;
 		value_size = 8;
 		max_entries = 0;
 		map_flags = BPF_F_NO_PREALLOC;
-		btf_fd = load_sk_storage_btf();
+		btf_fd = load_local_storage_btf();
 		if (btf_fd < 0)
 			return false;
 		break;
-- 
2.28.0.rc0.105.gf9edc3c819-goog



* [PATCH bpf-next v5 3/7] bpf: Generalize bpf_sk_storage
From: KP Singh @ 2020-07-22 17:14 UTC
  To: linux-kernel, bpf, linux-security-module
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Paul Turner, Jann Horn, Florent Revest
In-Reply-To: <20200722171409.102949-1-kpsingh@chromium.org>

From: KP Singh <kpsingh@google.com>

Refactor the functionality in bpf_sk_storage.c so that the concept of
storage linked to kernel objects can be extended to other objects like
inode, task_struct etc.

Each new local storage will still be a separate map and provide its own
set of helpers. This allows for future object-specific extensions while
still sharing much of the underlying implementation.
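
As a rough illustration of the split described above, here is a minimal
user-space sketch (not kernel code) of generic storage logic calling an
owner-specific callback. All names in it (struct storage, struct
storage_ops, fake_sock, fake_inode, storage_unlink_elem,
local_storage_demo) are hypothetical and exist only for this example;
the real callbacks are the map ops added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel API. */
struct storage;

struct storage_ops {
	/* Owner-specific part of unlinking one element.
	 * Returns true when the element being removed is the last one.
	 */
	bool (*unlink)(struct storage *st, size_t elem_size);
};

struct storage {
	const struct storage_ops *ops;
	void *owner;		/* opaque owner, like the void *owner above */
	size_t nr_elems;
};

struct fake_sock { long omem; };	/* stands in for socket accounting */
struct fake_inode { int ino; };		/* inodes have no such accounting */

static bool sock_unlink(struct storage *st, size_t elem_size)
{
	struct fake_sock *sk = st->owner;

	sk->omem -= (long)elem_size;	/* socket-only: uncharge memory */
	return st->nr_elems == 1;
}

static bool inode_unlink(struct storage *st, size_t elem_size)
{
	(void)elem_size;		/* nothing to uncharge for inodes */
	return st->nr_elems == 1;
}

static const struct storage_ops sock_ops = { .unlink = sock_unlink };
static const struct storage_ops inode_ops = { .unlink = inode_unlink };

/* Generic part: shared by every owner type. */
static bool storage_unlink_elem(struct storage *st, size_t elem_size)
{
	bool last;

	assert(st->nr_elems > 0);
	last = st->ops->unlink(st, elem_size);
	st->nr_elems--;
	if (last)
		st->owner = NULL;	/* storage is detached from its owner */
	return last;
}

/* Returns 0 when every step behaves as described in the comments. */
int local_storage_demo(void)
{
	struct fake_sock sk = { .omem = 64 };
	struct fake_inode inode = { .ino = 1 };
	struct storage sk_st = { .ops = &sock_ops, .owner = &sk, .nr_elems = 2 };
	struct storage in_st = { .ops = &inode_ops, .owner = &inode, .nr_elems = 1 };

	if (storage_unlink_elem(&sk_st, 32))	/* first of two: not last */
		return 1;
	if (sk.omem != 32)
		return 2;
	if (!storage_unlink_elem(&sk_st, 32))	/* second of two: last */
		return 3;
	if (sk.omem != 0 || sk_st.owner != NULL)
		return 4;
	if (!storage_unlink_elem(&in_st, 32))	/* only element: last */
		return 5;
	if (in_st.owner != NULL)
		return 6;
	return 0;
}
```

The sketch keeps the list bookkeeping in one place and leaves only the
accounting difference to the callback, which is the kind of sharing this
patch aims for. Calling local_storage_demo() from a small main() is
enough to exercise it.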

Signed-off-by: KP Singh <kpsingh@google.com>
---
 include/linux/bpf.h            |  13 ++
 include/net/bpf_sk_storage.h   |  55 +++++
 include/uapi/linux/bpf.h       |   8 +-
 net/core/bpf_sk_storage.c      | 387 ++++++++++++++++++++-------------
 tools/include/uapi/linux/bpf.h |   8 +-
 5 files changed, 320 insertions(+), 151 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bae557ff2da8..9b83665b56e4 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -33,6 +33,9 @@ struct btf;
 struct btf_type;
 struct exception_table_entry;
 struct seq_operations;
+struct bpf_local_storage;
+struct bpf_local_storage_map;
+struct bpf_local_storage_elem;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
@@ -93,6 +96,16 @@ struct bpf_map_ops {
 	__poll_t (*map_poll)(struct bpf_map *map, struct file *filp,
 			     struct poll_table_struct *pts);
 
+	/* Functions called by bpf_local_storage maps */
+	bool (*map_local_storage_unlink)(struct bpf_local_storage *local_storage,
+					 struct bpf_local_storage_elem *selem,
+					 bool uncharge_omem);
+	struct bpf_local_storage_elem *(*map_selem_alloc)(
+		struct bpf_local_storage_map *smap, void *owner, void *value,
+		bool charge_omem);
+	struct bpf_local_storage_data *(*map_local_storage_update)(
+		void  *owner, struct bpf_map *map, void *value, u64 flags);
+
 	/* BTF name and id of struct allocated by map_alloc */
 	const char * const map_btf_name;
 	int *map_btf_id;
diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h
index 950c5aaba15e..e3185cfb91da 100644
--- a/include/net/bpf_sk_storage.h
+++ b/include/net/bpf_sk_storage.h
@@ -3,8 +3,15 @@
 #ifndef _BPF_SK_STORAGE_H
 #define _BPF_SK_STORAGE_H
 
+#include <linux/rculist.h>
+#include <linux/list.h>
+#include <linux/hash.h>
 #include <linux/types.h>
 #include <linux/spinlock.h>
+#include <linux/bpf.h>
+#include <net/sock.h>
+#include <uapi/linux/sock_diag.h>
+#include <uapi/linux/btf.h>
 
 struct sock;
 
@@ -34,6 +41,54 @@ u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache);
 void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache,
 				      u16 idx);
 
+/* Helper functions for bpf_local_storage */
+int bpf_local_storage_map_alloc_check(union bpf_attr *attr);
+
+struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr);
+
+struct bpf_local_storage_data *
+bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
+			 struct bpf_local_storage_map *smap,
+			 bool cacheit_lockit);
+
+void bpf_local_storage_map_free(struct bpf_local_storage_map *smap);
+
+int bpf_local_storage_map_check_btf(const struct bpf_map *map,
+				    const struct btf *btf,
+				    const struct btf_type *key_type,
+				    const struct btf_type *value_type);
+
+void bpf_selem_link_storage(struct bpf_local_storage *local_storage,
+			    struct bpf_local_storage_elem *selem);
+
+bool bpf_selem_unlink_storage(struct bpf_local_storage *local_storage,
+			      struct bpf_local_storage_elem *selem,
+			      bool uncharge_omem);
+
+void bpf_selem_unlink(struct bpf_local_storage_elem *selem);
+
+void bpf_selem_link_map(struct bpf_local_storage_map *smap,
+			struct bpf_local_storage_elem *selem);
+
+void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem);
+
+struct bpf_local_storage_elem *
+bpf_selem_alloc(struct bpf_local_storage_map *smap, void *value);
+
+struct bpf_local_storage *
+bpf_local_storage_alloc(struct bpf_local_storage_map *smap);
+
+int bpf_local_storage_publish(struct bpf_local_storage_elem *first_selem,
+			      struct bpf_local_storage **addr,
+			      struct bpf_local_storage *curr);
+
+int bpf_local_storage_check_update_flags(struct bpf_map *map, u64 map_flags);
+
+struct bpf_local_storage_data *
+bpf_local_storage_update(void *owner, struct bpf_map *map,
+			 struct bpf_local_storage *local_storage, void *value,
+			 u64 map_flags);
+
 #ifdef CONFIG_BPF_SYSCALL
 int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk);
 struct bpf_sk_storage_diag *
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 54d0c886e3ba..b9d2e4792d08 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3630,9 +3630,13 @@ enum {
 	BPF_F_SYSCTL_BASE_NAME		= (1ULL << 0),
 };
 
-/* BPF_FUNC_sk_storage_get flags */
+/* BPF_FUNC_<local>_storage_get flags */
 enum {
-	BPF_SK_STORAGE_GET_F_CREATE	= (1ULL << 0),
+	BPF_LOCAL_STORAGE_GET_F_CREATE	= (1ULL << 0),
+	/* BPF_SK_STORAGE_GET_F_CREATE is only kept for backward compatibility
+	 * and BPF_LOCAL_STORAGE_GET_F_CREATE must be used instead.
+	 */
+	BPF_SK_STORAGE_GET_F_CREATE  = BPF_LOCAL_STORAGE_GET_F_CREATE,
 };
 
 /* BPF_FUNC_read_branch_records flags. */
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index aa3e3a47acb5..f6bb02f076ad 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -83,7 +83,7 @@ struct bpf_local_storage_elem {
 struct bpf_local_storage {
 	struct bpf_local_storage_data __rcu *cache[BPF_LOCAL_STORAGE_CACHE_SIZE];
 	struct hlist_head list; /* List of bpf_local_storage_elem */
-	struct sock *owner;	/* The object that owns the the above "list" of
+	void *owner;		/* The object that owns the above "list" of
 				 * bpf_local_storage_elem.
 				 */
 	struct rcu_head rcu;
@@ -119,15 +119,11 @@ static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
 	return !hlist_unhashed(&selem->map_node);
 }
 
-static struct bpf_local_storage_elem *
-bpf_selem_alloc(struct bpf_local_storage_map *smap, struct sock *sk,
-		void *value, bool charge_omem)
+struct bpf_local_storage_elem *
+bpf_selem_alloc(struct bpf_local_storage_map *smap, void *value)
 {
 	struct bpf_local_storage_elem *selem;
 
-	if (charge_omem && omem_charge(sk, smap->elem_size))
-		return NULL;
-
 	selem = kzalloc(smap->elem_size, GFP_ATOMIC | __GFP_NOWARN);
 	if (selem) {
 		if (value)
@@ -135,6 +131,23 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, struct sock *sk,
 		return selem;
 	}
 
+	return NULL;
+}
+
+static struct bpf_local_storage_elem *
+sk_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value,
+	       bool charge_omem)
+{
+	struct bpf_local_storage_elem *selem;
+	struct sock *sk = owner;
+
+	if (charge_omem && omem_charge(sk, smap->elem_size))
+		return NULL;
+
+	selem = bpf_selem_alloc(smap, value);
+	if (selem)
+		return selem;
+
 	if (charge_omem)
 		atomic_sub(smap->elem_size, &sk->sk_omem_alloc);
 
@@ -145,51 +158,65 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, struct sock *sk,
  * The caller must ensure selem->smap is still valid to be
  * dereferenced for its smap->elem_size and smap->cache_idx.
  */
-static bool bpf_selem_unlink_storage(struct bpf_local_storage *local_storage,
-				     struct bpf_local_storage_elem *selem,
-				     bool uncharge_omem)
+bool bpf_selem_unlink_storage(struct bpf_local_storage *local_storage,
+			      struct bpf_local_storage_elem *selem,
+			      bool uncharge_omem)
 {
 	struct bpf_local_storage_map *smap;
 	bool free_local_storage;
-	struct sock *sk;
 
 	smap = rcu_dereference(SDATA(selem)->smap);
-	sk = local_storage->owner;
+	/* local_storage is not freed now. local_storage->lock is
+	 * still held and raw_spin_unlock_bh(&local_storage->lock)
+	 * will be done by the caller.
+	 * Although the unlock will be done under
+	 * rcu_read_lock(),  it is more intuitive to
+	 * read if kfree_rcu(local_storage, rcu) is done
+	 * after the raw_spin_unlock_bh(&local_storage->lock).
+	 *
+	 * Hence, a "bool free_local_storage" is returned
+	 * to the caller which then calls the kfree_rcu()
+	 * after unlock.
+	 */
+	free_local_storage = smap->map.ops->map_local_storage_unlink(
+		local_storage, selem, uncharge_omem);
+	hlist_del_init_rcu(&selem->snode);
+	if (rcu_access_pointer(local_storage->cache[smap->cache_idx]) ==
+	    SDATA(selem))
+		RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
+
+	kfree_rcu(selem, rcu);
+
+	return free_local_storage;
+}
 
+static bool unlink_sk_storage(struct bpf_local_storage *local_storage,
+			      struct bpf_local_storage_elem *selem,
+			      bool uncharge_omem)
+{
+	struct bpf_local_storage_map *smap;
+	struct sock *sk = local_storage->owner;
+	bool free_local_storage;
+
+	smap = rcu_dereference(SDATA(selem)->smap);
 	/* All uncharging on sk->sk_omem_alloc must be done first.
 	 * sk may be freed once the last selem is unlinked from local_storage.
 	 */
-	if (uncharge_omem)
-		atomic_sub(smap->elem_size, &sk->sk_omem_alloc);
 
 	free_local_storage = hlist_is_singular_node(&selem->snode,
 						    &local_storage->list);
+
+	if (uncharge_omem)
+		atomic_sub(smap->elem_size, &sk->sk_omem_alloc);
+
 	if (free_local_storage) {
-		atomic_sub(sizeof(struct bpf_local_storage), &sk->sk_omem_alloc);
-		local_storage->owner = NULL;
+		atomic_sub(sizeof(struct bpf_local_storage),
+			   &sk->sk_omem_alloc);
+
 		/* After this RCU_INIT, sk may be freed and cannot be used */
 		RCU_INIT_POINTER(sk->sk_bpf_storage, NULL);
-
-		/* local_storage is not freed now.  local_storage->lock is
-		 * still held and raw_spin_unlock_bh(&local_storage->lock)
-		 * will be done by the caller.
-		 *
-		 * Although the unlock will be done under
-		 * rcu_read_lock(),  it is more intutivie to
-		 * read if kfree_rcu(local_storage, rcu) is done
-		 * after the raw_spin_unlock_bh(&local_storage->lock).
-		 *
-		 * Hence, a "bool free_local_storage" is returned
-		 * to the caller which then calls the kfree_rcu()
-		 * after unlock.
-		 */
+		local_storage->owner = NULL;
 	}
-	hlist_del_init_rcu(&selem->snode);
-	if (rcu_access_pointer(local_storage->cache[smap->cache_idx]) ==
-	    SDATA(selem))
-		RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
-
-	kfree_rcu(selem, rcu);
 
 	return free_local_storage;
 }
@@ -214,14 +241,14 @@ static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem)
 		kfree_rcu(local_storage, rcu);
 }
 
-static void bpf_selem_link_storage(struct bpf_local_storage *local_storage,
-				   struct bpf_local_storage_elem *selem)
+void bpf_selem_link_storage(struct bpf_local_storage *local_storage,
+			    struct bpf_local_storage_elem *selem)
 {
 	RCU_INIT_POINTER(selem->local_storage, local_storage);
 	hlist_add_head(&selem->snode, &local_storage->list);
 }
 
-static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
+void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
 {
 	struct bpf_local_storage_map *smap;
 	struct bpf_local_storage_map_bucket *b;
@@ -238,8 +265,8 @@ static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
 	raw_spin_unlock_bh(&b->lock);
 }
 
-static void bpf_selem_link_map(struct bpf_local_storage_map *smap,
-			       struct bpf_local_storage_elem *selem)
+void bpf_selem_link_map(struct bpf_local_storage_map *smap,
+			struct bpf_local_storage_elem *selem)
 {
 	struct bpf_local_storage_map_bucket *b = select_bucket(smap, selem);
 
@@ -249,7 +276,7 @@ static void bpf_selem_link_map(struct bpf_local_storage_map *smap,
 	raw_spin_unlock_bh(&b->lock);
 }
 
-static void bpf_selem_unlink(struct bpf_local_storage_elem *selem)
+void bpf_selem_unlink(struct bpf_local_storage_elem *selem)
 {
 	/* Always unlink from map before unlinking from local_storage
 	 * because selem will be freed after successfully unlinked from
@@ -259,7 +286,7 @@ static void bpf_selem_unlink(struct bpf_local_storage_elem *selem)
 	__bpf_selem_unlink_storage(selem);
 }
 
-static struct bpf_local_storage_data *
+struct bpf_local_storage_data *
 bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
 			 struct bpf_local_storage_map *smap,
 			 bool cacheit_lockit)
@@ -325,59 +352,53 @@ static int check_flags(const struct bpf_local_storage_data *old_sdata,
 	return 0;
 }
 
-static int sk_storage_alloc(struct sock *sk,
+struct bpf_local_storage *
+bpf_local_storage_alloc(struct bpf_local_storage_map *smap)
+{
+	struct bpf_local_storage *storage;
+
+	storage = kzalloc(sizeof(*storage), GFP_ATOMIC | __GFP_NOWARN);
+	if (!storage)
+		return NULL;
+
+	INIT_HLIST_HEAD(&storage->list);
+	raw_spin_lock_init(&storage->lock);
+	return storage;
+}
+
+static int sk_storage_alloc(void *owner,
 			    struct bpf_local_storage_map *smap,
 			    struct bpf_local_storage_elem *first_selem)
 {
-	struct bpf_local_storage *prev_sk_storage, *sk_storage;
+	struct bpf_local_storage *curr;
+	struct sock *sk = owner;
 	int err;
 
-	err = omem_charge(sk, sizeof(*sk_storage));
+	err = omem_charge(sk, sizeof(*curr));
 	if (err)
 		return err;
 
-	sk_storage = kzalloc(sizeof(*sk_storage), GFP_ATOMIC | __GFP_NOWARN);
-	if (!sk_storage) {
+	curr = bpf_local_storage_alloc(smap);
+	if (!curr) {
 		err = -ENOMEM;
 		goto uncharge;
 	}
-	INIT_HLIST_HEAD(&sk_storage->list);
-	raw_spin_lock_init(&sk_storage->lock);
-	sk_storage->owner = sk;
 
-	bpf_selem_link_storage(sk_storage, first_selem);
+	curr->owner = sk;
+
+	bpf_selem_link_storage(curr, first_selem);
 	bpf_selem_link_map(smap, first_selem);
-	/* Publish sk_storage to sk.  sk->sk_lock cannot be acquired.
-	 * Hence, atomic ops is used to set sk->sk_bpf_storage
-	 * from NULL to the newly allocated sk_storage ptr.
-	 *
-	 * From now on, the sk->sk_bpf_storage pointer is protected
-	 * by the sk_storage->lock.  Hence,  when freeing
-	 * the sk->sk_bpf_storage, the sk_storage->lock must
-	 * be held before setting sk->sk_bpf_storage to NULL.
-	 */
-	prev_sk_storage = cmpxchg((struct bpf_local_storage **)&sk->sk_bpf_storage,
-				  NULL, sk_storage);
-	if (unlikely(prev_sk_storage)) {
-		bpf_selem_unlink_map(first_selem);
-		err = -EAGAIN;
-		goto uncharge;
 
-		/* Note that even first_selem was linked to smap's
-		 * bucket->list, first_selem can be freed immediately
-		 * (instead of kfree_rcu) because
-		 * bpf_sk_storage_map_free() does a
-		 * synchronize_rcu() before walking the bucket->list.
-		 * Hence, no one is accessing selem from the
-		 * bucket->list under rcu_read_lock().
-		 */
-	}
+	err = bpf_local_storage_publish(first_selem,
+		(struct bpf_local_storage **)&sk->sk_bpf_storage, curr);
+	if (err)
+		goto uncharge;
 
 	return 0;
 
 uncharge:
-	kfree(sk_storage);
-	atomic_sub(sizeof(*sk_storage), &sk->sk_omem_alloc);
+	kfree(curr);
+	atomic_sub(sizeof(*curr), &sk->sk_omem_alloc);
 	return err;
 }
 
@@ -386,54 +407,28 @@ static int sk_storage_alloc(struct sock *sk,
  * Otherwise, it will become a leak (and other memory issues
  * during map destruction).
  */
-static struct bpf_local_storage_data *
-bpf_local_storage_update(struct sock *sk, struct bpf_map *map, void *value,
+struct bpf_local_storage_data *
+bpf_local_storage_update(void *owner, struct bpf_map *map,
+			 struct bpf_local_storage *local_storage, void *value,
 			 u64 map_flags)
 {
 	struct bpf_local_storage_data *old_sdata = NULL;
 	struct bpf_local_storage_elem *selem;
-	struct bpf_local_storage *local_storage;
 	struct bpf_local_storage_map *smap;
 	int err;
 
-	/* BPF_EXIST and BPF_NOEXIST cannot be both set */
-	if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) ||
-	    /* BPF_F_LOCK can only be used in a value with spin_lock */
-	    unlikely((map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
-		return ERR_PTR(-EINVAL);
-
 	smap = (struct bpf_local_storage_map *)map;
-	local_storage = rcu_dereference(sk->sk_bpf_storage);
-	if (!local_storage || hlist_empty(&local_storage->list)) {
-		/* Very first elem for this object */
-		err = check_flags(NULL, map_flags);
-		if (err)
-			return ERR_PTR(err);
-
-		selem = bpf_selem_alloc(smap, sk, value, true);
-		if (!selem)
-			return ERR_PTR(-ENOMEM);
-
-		err = sk_storage_alloc(sk, smap, selem);
-		if (err) {
-			kfree(selem);
-			atomic_sub(smap->elem_size, &sk->sk_omem_alloc);
-			return ERR_PTR(err);
-		}
-
-		return SDATA(selem);
-	}
 
 	if ((map_flags & BPF_F_LOCK) && !(map_flags & BPF_NOEXIST)) {
 		/* Hoping to find an old_sdata to do inline update
 		 * such that it can avoid taking the local_storage->lock
 		 * and changing the lists.
 		 */
-		old_sdata =
-			bpf_local_storage_lookup(local_storage, smap, false);
+		old_sdata = bpf_local_storage_lookup(local_storage, smap, false);
 		err = check_flags(old_sdata, map_flags);
 		if (err)
 			return ERR_PTR(err);
+
 		if (old_sdata && selem_linked_to_storage(SELEM(old_sdata))) {
 			copy_map_value_locked(map, old_sdata->data,
 					      value, false);
@@ -471,10 +466,9 @@ bpf_local_storage_update(struct sock *sk, struct bpf_map *map, void *value,
 	 * and then uncharge the old selem later (which may cause
 	 * a potential but unnecessary charge failure),  avoid taking
 	 * a charge at all here (the "!old_sdata" check) and the
-	 * old_sdata will not be uncharged later during
-	 * bpf_selem_unlink_storage().
+	 * old_sdata will not be uncharged later during bpf_selem_unlink().
 	 */
-	selem = bpf_selem_alloc(smap, sk, value, !old_sdata);
+	selem = map->ops->map_selem_alloc(smap, owner, value, !old_sdata);
 	if (!selem) {
 		err = -ENOMEM;
 		goto unlock_err;
@@ -501,6 +495,87 @@ bpf_local_storage_update(struct sock *sk, struct bpf_map *map, void *value,
 	return ERR_PTR(err);
 }
 
+int bpf_local_storage_check_update_flags(struct bpf_map *map, u64 map_flags)
+{
+	/* BPF_EXIST and BPF_NOEXIST cannot be both set */
+	if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) ||
+	    /* BPF_F_LOCK can only be used in a value with spin_lock */
+	    unlikely((map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
+		return -EINVAL;
+
+	return 0;
+}
+
+/* Publish local_storage to the address.  This is used because we are already
+ * in a region where we cannot grab a lock on the object owning the storage
+ * (e.g. sk->sk_lock). Hence, atomic ops are used.
+ *
+ * From now on, the addr pointer is protected
+ * by the local_storage->lock.  Hence, upon freeing,
+ * the local_storage->lock must be held before unlinking the storage from the
+ * owner.
+ */
+int bpf_local_storage_publish(struct bpf_local_storage_elem *first_selem,
+			      struct bpf_local_storage **addr,
+			      struct bpf_local_storage *curr)
+{
+	struct bpf_local_storage *prev;
+
+	prev = cmpxchg(addr, NULL, curr);
+	if (unlikely(prev)) {
+		/* Note that even though first_selem was linked to smap's
+		 * bucket->list, first_selem can be freed immediately
+		 * (instead of kfree_rcu) because
+		 * bpf_local_storage_map_free() does a
+		 * synchronize_rcu() before walking the bucket->list.
+		 * Hence, no one is accessing selem from the
+		 * bucket->list under rcu_read_lock().
+		 */
+		bpf_selem_unlink_map(first_selem);
+		return -EAGAIN;
+	}
+
+	return 0;
+}
+
+static struct bpf_local_storage_data *
+sk_storage_update(void *owner, struct bpf_map *map, void *value, u64 map_flags)
+{
+	struct bpf_local_storage_data *old_sdata = NULL;
+	struct bpf_local_storage_elem *selem;
+	struct bpf_local_storage *local_storage;
+	struct bpf_local_storage_map *smap;
+	struct sock *sk;
+	int err;
+
+	err = bpf_local_storage_check_update_flags(map, map_flags);
+	if (err)
+		return ERR_PTR(err);
+
+	sk = owner;
+	local_storage = rcu_dereference(sk->sk_bpf_storage);
+	smap = (struct bpf_local_storage_map *)map;
+
+	if (!local_storage || hlist_empty(&local_storage->list)) {
+		/* Very first elem */
+		selem = map->ops->map_selem_alloc(smap, owner, value, !old_sdata);
+		if (!selem)
+			return ERR_PTR(-ENOMEM);
+
+		err = sk_storage_alloc(owner, smap, selem);
+		if (err) {
+			kfree(selem);
+			atomic_sub(smap->elem_size, &sk->sk_omem_alloc);
+			return ERR_PTR(err);
+		}
+
+		return SDATA(selem);
+	}
+
+	return bpf_local_storage_update(owner, map, local_storage, value,
+					map_flags);
+}
+
 static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
 {
 	struct bpf_local_storage_data *sdata;
@@ -566,7 +641,7 @@ void bpf_sk_storage_free(struct sock *sk)
 	 * Thus, no elem can be added-to or deleted-from the
 	 * sk_storage->list by the bpf_prog or by the bpf-map's syscall.
 	 *
-	 * It is racing with bpf_sk_storage_map_free() alone
+	 * It is racing with bpf_local_storage_map_free() alone
 	 * when unlinking elem from the sk_storage->list and
 	 * the map's bucket->list.
 	 */
@@ -586,17 +661,12 @@ void bpf_sk_storage_free(struct sock *sk)
 		kfree_rcu(sk_storage, rcu);
 }
 
-static void bpf_local_storage_map_free(struct bpf_map *map)
+void bpf_local_storage_map_free(struct bpf_local_storage_map *smap)
 {
 	struct bpf_local_storage_elem *selem;
-	struct bpf_local_storage_map *smap;
 	struct bpf_local_storage_map_bucket *b;
 	unsigned int i;
 
-	smap = (struct bpf_local_storage_map *)map;
-
-	bpf_local_storage_cache_idx_free(&sk_cache, smap->cache_idx);
-
 	/* Note that this map might be concurrently cloned from
 	 * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone
 	 * RCU read section to finish before proceeding. New RCU
@@ -606,42 +676,51 @@ static void bpf_local_storage_map_free(struct bpf_map *map)
 
 	/* bpf prog and the userspace can no longer access this map
 	 * now.  No new selem (of this map) can be added
-	 * to the sk->sk_bpf_storage or to the map bucket's list.
+	 * to the bpf_local_storage or to the map bucket's list.
 	 *
 	 * The elem of this map can be cleaned up here
-	 * or
-	 * by bpf_sk_storage_free() during __sk_destruct().
+	 * or by bpf_local_storage_free() during the destruction of the
+	 * owner object, e.g. __sk_destruct().
 	 */
 	for (i = 0; i < (1U << smap->bucket_log); i++) {
 		b = &smap->buckets[i];
 
 		rcu_read_lock();
 		/* No one is adding to b->list now */
-		while ((selem = hlist_entry_safe(
-				rcu_dereference_raw(hlist_first_rcu(&b->list)),
-				struct bpf_local_storage_elem, map_node))) {
+		while ((selem = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(&b->list)),
+						 struct bpf_local_storage_elem,
+						 map_node))) {
 			bpf_selem_unlink(selem);
 			cond_resched_rcu();
 		}
 		rcu_read_unlock();
 	}
 
-	/* bpf_sk_storage_free() may still need to access the map.
-	 * e.g. bpf_sk_storage_free() has unlinked selem from the map
+	/* bpf_local_storage_free() may still need to access the map.
+	 * e.g. bpf_local_storage_free() has unlinked selem from the map
 	 * which then made the above while((selem = ...)) loop
 	 * exited immediately.
 	 *
-	 * However, the bpf_sk_storage_free() still needs to access
+	 * However, the bpf_local_storage_free() still needs to access
 	 * the smap->elem_size to do the uncharging in
-	 * bpf_selem_unlink_storage().
+	 * bpf_selem_unlink().
 	 *
 	 * Hence, wait another rcu grace period for the
-	 * bpf_sk_storage_free() to finish.
+	 * bpf_local_storage_free() to finish.
 	 */
 	synchronize_rcu();
 
 	kvfree(smap->buckets);
-	kfree(map);
+	kfree(smap);
+}
+
+static void sk_storage_map_free(struct bpf_map *map)
+{
+	struct bpf_local_storage_map *smap;
+
+	smap = (struct bpf_local_storage_map *)map;
+	bpf_local_storage_cache_idx_free(&sk_cache, smap->cache_idx);
+	bpf_local_storage_map_free(smap);
 }
 
 /* U16_MAX is much more than enough for sk local storage
@@ -653,7 +732,7 @@ static void bpf_local_storage_map_free(struct bpf_map *map)
 	       sizeof(struct bpf_local_storage_elem)),			\
 	      (U16_MAX - sizeof(struct bpf_local_storage_elem)))
 
-static int bpf_sk_storage_map_alloc_check(union bpf_attr *attr)
+int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
 {
 	if (attr->map_flags & ~BPF_LOCAL_STORAGE_CREATE_FLAG_MASK ||
 	    !(attr->map_flags & BPF_F_NO_PREALLOC) ||
@@ -672,7 +751,7 @@ static int bpf_sk_storage_map_alloc_check(union bpf_attr *attr)
 	return 0;
 }
 
-static struct bpf_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
+struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 {
 	struct bpf_local_storage_map *smap;
 	unsigned int i;
@@ -710,9 +789,21 @@ static struct bpf_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 		raw_spin_lock_init(&smap->buckets[i].lock);
 	}
 
-	smap->elem_size = sizeof(struct bpf_local_storage_elem) + attr->value_size;
-	smap->cache_idx = bpf_local_storage_cache_idx_get(&sk_cache);
+	smap->elem_size =
+		sizeof(struct bpf_local_storage_elem) + attr->value_size;
+
+	return smap;
+}
 
+static struct bpf_map *sk_storage_map_alloc(union bpf_attr *attr)
+{
+	struct bpf_local_storage_map *smap;
+
+	smap = bpf_local_storage_map_alloc(attr);
+	if (IS_ERR(smap))
+		return ERR_CAST(smap);
+
+	smap->cache_idx = bpf_local_storage_cache_idx_get(&sk_cache);
 	return &smap->map;
 }
 
@@ -722,10 +813,10 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key,
 	return -ENOTSUPP;
 }
 
-static int bpf_sk_storage_map_check_btf(const struct bpf_map *map,
-					const struct btf *btf,
-					const struct btf_type *key_type,
-					const struct btf_type *value_type)
+int bpf_local_storage_map_check_btf(const struct bpf_map *map,
+				    const struct btf *btf,
+				    const struct btf_type *key_type,
+				    const struct btf_type *value_type)
 {
 	u32 int_data;
 
@@ -766,8 +857,7 @@ static int bpf_fd_sk_storage_update_elem(struct bpf_map *map, void *key,
 	fd = *(int *)key;
 	sock = sockfd_lookup(fd, &err);
 	if (sock) {
-		sdata = bpf_local_storage_update(sock->sk, map, value,
-						 map_flags);
+		sdata = sk_storage_update(sock->sk, map, value, map_flags);
 		sockfd_put(sock);
 		return PTR_ERR_OR_ZERO(sdata);
 	}
@@ -798,7 +888,7 @@ bpf_sk_storage_clone_elem(struct sock *newsk,
 {
 	struct bpf_local_storage_elem *copy_selem;
 
-	copy_selem = bpf_selem_alloc(smap, newsk, NULL, true);
+	copy_selem = sk_selem_alloc(smap, newsk, NULL, true);
 	if (!copy_selem)
 		return NULL;
 
@@ -900,7 +990,7 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
 	     *  destruction).
 	     */
 	    refcount_inc_not_zero(&sk->sk_refcnt)) {
-		sdata = bpf_local_storage_update(sk, map, value, BPF_NOEXIST);
+		sdata = sk_storage_update(sk, map, value, BPF_NOEXIST);
 		/* sk must be a fullsock (guaranteed by verifier),
 		 * so sock_gen_put() is unnecessary.
 		 */
@@ -927,16 +1017,19 @@ BPF_CALL_2(bpf_sk_storage_delete, struct bpf_map *, map, struct sock *, sk)
 
 static int sk_storage_map_btf_id;
 const struct bpf_map_ops sk_storage_map_ops = {
-	.map_alloc_check = bpf_sk_storage_map_alloc_check,
-	.map_alloc = bpf_local_storage_map_alloc,
-	.map_free = bpf_local_storage_map_free,
+	.map_alloc_check = bpf_local_storage_map_alloc_check,
+	.map_alloc = sk_storage_map_alloc,
+	.map_free = sk_storage_map_free,
 	.map_get_next_key = notsupp_get_next_key,
 	.map_lookup_elem = bpf_fd_sk_storage_lookup_elem,
 	.map_update_elem = bpf_fd_sk_storage_update_elem,
 	.map_delete_elem = bpf_fd_sk_storage_delete_elem,
-	.map_check_btf = bpf_sk_storage_map_check_btf,
+	.map_check_btf = bpf_local_storage_map_check_btf,
 	.map_btf_name = "bpf_local_storage_map",
 	.map_btf_id = &sk_storage_map_btf_id,
+	.map_selem_alloc = sk_selem_alloc,
+	.map_local_storage_update = sk_storage_update,
+	.map_local_storage_unlink = unlink_sk_storage,
 };
 
 const struct bpf_func_proto bpf_sk_storage_get_proto = {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 54d0c886e3ba..b9d2e4792d08 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3630,9 +3630,13 @@ enum {
 	BPF_F_SYSCTL_BASE_NAME		= (1ULL << 0),
 };
 
-/* BPF_FUNC_sk_storage_get flags */
+/* BPF_FUNC_<local>_storage_get flags */
 enum {
-	BPF_SK_STORAGE_GET_F_CREATE	= (1ULL << 0),
+	BPF_LOCAL_STORAGE_GET_F_CREATE	= (1ULL << 0),
+	/* BPF_SK_STORAGE_GET_F_CREATE is only kept for backward compatibility
+	 * and BPF_LOCAL_STORAGE_GET_F_CREATE must be used instead.
+	 */
+	BPF_SK_STORAGE_GET_F_CREATE  = BPF_LOCAL_STORAGE_GET_F_CREATE,
 };
 
 /* BPF_FUNC_read_branch_records flags. */
-- 
2.28.0.rc0.105.gf9edc3c819-goog
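
A note on the bpf_local_storage_publish() helper added in the patch above:
its core is a compare-and-swap that installs the new storage only if the
slot is still NULL. A minimal userspace sketch of that idea follows, using
C11 atomics in place of the kernel's cmpxchg(). The names storage and
storage_publish are hypothetical, and the sketch leaves out the
bpf_selem_unlink_map() cleanup that the real helper performs on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_local_storage. */
struct storage {
	int id;
};

/* Hypothetical userspace analogue of bpf_local_storage_publish():
 * install curr at *addr only if *addr is still NULL.
 * Returns 0 on success, -EAGAIN if another storage was published first.
 */
static int storage_publish(struct storage *_Atomic *addr, struct storage *curr)
{
	struct storage *expected = NULL;

	assert(curr != NULL);

	/* Succeeds only when *addr == NULL; otherwise *addr is left alone. */
	if (!atomic_compare_exchange_strong(addr, &expected, curr))
		return -EAGAIN;	/* lost the race; the caller cleans up curr */

	return 0;
}
```

Only the first publisher wins; a later caller sees -EAGAIN and the slot
keeps pointing at the first storage.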



* [PATCH bpf-next v5 1/7] bpf: Renames to prepare for generalizing sk_storage.
From: KP Singh @ 2020-07-22 17:14 UTC (permalink / raw)
  To: linux-kernel, bpf, linux-security-module
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Paul Turner, Jann Horn, Florent Revest
In-Reply-To: <20200722171409.102949-1-kpsingh@chromium.org>

From: KP Singh <kpsingh@google.com>

A purely mechanical change to split the renaming from the actual
generalization.

Flags/consts:

  SK_STORAGE_CREATE_FLAG_MASK	BPF_LOCAL_STORAGE_CREATE_FLAG_MASK
  BPF_SK_STORAGE_CACHE_SIZE	BPF_LOCAL_STORAGE_CACHE_SIZE
  MAX_VALUE_SIZE		BPF_LOCAL_STORAGE_MAX_VALUE_SIZE

Structs:

  bucket			bpf_local_storage_map_bucket
  bpf_sk_storage_map		bpf_local_storage_map
  bpf_sk_storage_data		bpf_local_storage_data
  bpf_sk_storage_elem		bpf_local_storage_elem
  bpf_sk_storage		bpf_local_storage

The "sk" member in bpf_local_storage is also renamed to "owner"
in preparation for changing its type to void * in a subsequent patch.

Functions:

  selem_linked_to_sk			selem_linked_to_storage
  selem_alloc				bpf_selem_alloc
  __selem_unlink_sk			bpf_selem_unlink_storage
  __selem_link_sk			bpf_selem_link_storage
  selem_unlink_sk			__bpf_selem_unlink_storage
  selem_unlink_map			bpf_selem_unlink_map
  selem_link_map			bpf_selem_link_map
  selem_unlink				bpf_selem_unlink
  sk_storage_update			bpf_local_storage_update
  __sk_storage_lookup			bpf_local_storage_lookup
  bpf_sk_storage_map_free		bpf_local_storage_map_free
  bpf_sk_storage_map_alloc		bpf_local_storage_map_alloc
  bpf_sk_storage_map_alloc_check	bpf_local_storage_map_alloc_check
  bpf_sk_storage_map_check_btf		bpf_local_storage_map_check_btf

Signed-off-by: KP Singh <kpsingh@google.com>
---
 include/net/sock.h        |   4 +-
 net/core/bpf_sk_storage.c | 405 +++++++++++++++++++-------------------
 2 files changed, 208 insertions(+), 201 deletions(-)
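
For orientation while reading the renames: the structures touched by this
patch implement a per-object "mini-map" in which the map pointer is the
search key, with a small cache in front of the element list. The sketch
below is a simplified userspace model of that lookup, not the kernel code.
The names map, elem, local_storage and storage_lookup are hypothetical
stand-ins. It uses a plain singly linked list, always caches a hit, and
has no locking or RCU, whereas bpf_local_storage_lookup() caches only
when asked to (cacheit_lockit) and does so under local_storage->lock.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SIZE 16	/* mirrors BPF_LOCAL_STORAGE_CACHE_SIZE */

/* Hypothetical stand-in for struct bpf_local_storage_map. */
struct map {
	unsigned int cache_idx;
};

/* Hypothetical stand-in for struct bpf_local_storage_elem. */
struct elem {
	struct elem *next;
	const struct map *smap;	/* the search key */
	int value;
};

/* Hypothetical stand-in for struct bpf_local_storage. */
struct local_storage {
	struct elem *cache[CACHE_SIZE];
	struct elem *list;
	void *owner;	/* owning object; a struct sock * in this patch */
};

static struct elem *storage_lookup(struct local_storage *ls,
				   const struct map *m)
{
	struct elem *e;

	assert(m->cache_idx < CACHE_SIZE);

	/* Fast path: the cache slot holds an elem belonging to this map. */
	e = ls->cache[m->cache_idx];
	if (e && e->smap == m)
		return e;

	/* Slow path: walk the owner's list, comparing map pointers. */
	for (e = ls->list; e; e = e->next)
		if (e->smap == m)
			break;

	if (e)
		ls->cache[m->cache_idx] = e;	/* remember for next time */

	return e;
}
```

Two maps may share a cache slot, which is why the fast path still compares
the elem's map pointer before trusting the cached entry.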

diff --git a/include/net/sock.h b/include/net/sock.h
index 4bf884165148..43a23e5ff6ac 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -245,7 +245,7 @@ struct sock_common {
 	/* public: */
 };
 
-struct bpf_sk_storage;
+struct bpf_local_storage;
 
 /**
   *	struct sock - network layer representation of sockets
@@ -516,7 +516,7 @@ struct sock {
 	void                    (*sk_destruct)(struct sock *sk);
 	struct sock_reuseport __rcu	*sk_reuseport_cb;
 #ifdef CONFIG_BPF_SYSCALL
-	struct bpf_sk_storage __rcu	*sk_bpf_storage;
+	struct bpf_local_storage __rcu	*sk_bpf_storage;
 #endif
 	struct rcu_head		sk_rcu;
 };
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 6f921c4ddc2c..fb7ea6f4174d 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -11,33 +11,32 @@
 #include <uapi/linux/sock_diag.h>
 #include <uapi/linux/btf.h>
 
-#define SK_STORAGE_CREATE_FLAG_MASK					\
-	(BPF_F_NO_PREALLOC | BPF_F_CLONE)
+#define BPF_LOCAL_STORAGE_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_CLONE)
 
-struct bucket {
+struct bpf_local_storage_map_bucket {
 	struct hlist_head list;
 	raw_spinlock_t lock;
 };
 
-/* Thp map is not the primary owner of a bpf_sk_storage_elem.
- * Instead, the sk->sk_bpf_storage is.
+/* The map is not the primary owner of a bpf_local_storage_elem.
+ * Instead, the container object (e.g. sk->sk_bpf_storage) is.
  *
- * The map (bpf_sk_storage_map) is for two purposes
- * 1. Define the size of the "sk local storage".  It is
+ * The map (bpf_local_storage_map) is for two purposes
+ * 1. Define the size of the "local storage".  It is
  *    the map's value_size.
  *
  * 2. Maintain a list to keep track of all elems such
  *    that they can be cleaned up during the map destruction.
  *
  * When a bpf local storage is being looked up for a
- * particular sk,  the "bpf_map" pointer is actually used
+ * particular object,  the "bpf_map" pointer is actually used
  * as the "key" to search in the list of elem in
- * sk->sk_bpf_storage.
+ * the respective bpf_local_storage owned by the object.
  *
- * Hence, consider sk->sk_bpf_storage is the mini-map
- * with the "bpf_map" pointer as the searching key.
+ * e.g. sk->sk_bpf_storage is the mini-map with the "bpf_map" pointer
+ * as the searching key.
  */
-struct bpf_sk_storage_map {
+struct bpf_local_storage_map {
 	struct bpf_map map;
 	/* Lookup elem does not require accessing the map.
 	 *
@@ -45,55 +44,57 @@ struct bpf_sk_storage_map {
 	 * link/unlink the elem from the map.  Having
 	 * multiple buckets to improve contention.
 	 */
-	struct bucket *buckets;
+	struct bpf_local_storage_map_bucket *buckets;
 	u32 bucket_log;
 	u16 elem_size;
 	u16 cache_idx;
 };
 
-struct bpf_sk_storage_data {
+struct bpf_local_storage_data {
 	/* smap is used as the searching key when looking up
-	 * from sk->sk_bpf_storage.
+	 * from the object's bpf_local_storage.
 	 *
 	 * Put it in the same cacheline as the data to minimize
 	 * the number of cachelines access during the cache hit case.
 	 */
-	struct bpf_sk_storage_map __rcu *smap;
+	struct bpf_local_storage_map __rcu *smap;
 	u8 data[] __aligned(8);
 };
 
-/* Linked to bpf_sk_storage and bpf_sk_storage_map */
-struct bpf_sk_storage_elem {
-	struct hlist_node map_node;	/* Linked to bpf_sk_storage_map */
-	struct hlist_node snode;	/* Linked to bpf_sk_storage */
-	struct bpf_sk_storage __rcu *sk_storage;
+/* Linked to bpf_local_storage and bpf_local_storage_map */
+struct bpf_local_storage_elem {
+	struct hlist_node map_node;	/* Linked to bpf_local_storage_map */
+	struct hlist_node snode;	/* Linked to bpf_local_storage */
+	struct bpf_local_storage __rcu *local_storage;
 	struct rcu_head rcu;
 	/* 8 bytes hole */
 	/* The data is stored in aother cacheline to minimize
 	 * the number of cachelines access during a cache hit.
 	 */
-	struct bpf_sk_storage_data sdata ____cacheline_aligned;
+	struct bpf_local_storage_data sdata ____cacheline_aligned;
 };
 
-#define SELEM(_SDATA) container_of((_SDATA), struct bpf_sk_storage_elem, sdata)
+#define SELEM(_SDATA)							\
+	container_of((_SDATA), struct bpf_local_storage_elem, sdata)
 #define SDATA(_SELEM) (&(_SELEM)->sdata)
-#define BPF_SK_STORAGE_CACHE_SIZE	16
+#define BPF_LOCAL_STORAGE_CACHE_SIZE	16
 
 static DEFINE_SPINLOCK(cache_idx_lock);
-static u64 cache_idx_usage_counts[BPF_SK_STORAGE_CACHE_SIZE];
+static u64 cache_idx_usage_counts[BPF_LOCAL_STORAGE_CACHE_SIZE];
 
-struct bpf_sk_storage {
-	struct bpf_sk_storage_data __rcu *cache[BPF_SK_STORAGE_CACHE_SIZE];
-	struct hlist_head list;	/* List of bpf_sk_storage_elem */
-	struct sock *sk;	/* The sk that owns the the above "list" of
-				 * bpf_sk_storage_elem.
+struct bpf_local_storage {
+	struct bpf_local_storage_data __rcu *cache[BPF_LOCAL_STORAGE_CACHE_SIZE];
+	struct hlist_head list; /* List of bpf_local_storage_elem */
+	struct sock *owner;	/* The object that owns the above "list" of
+				 * bpf_local_storage_elem.
 				 */
 	struct rcu_head rcu;
 	raw_spinlock_t lock;	/* Protect adding/removing from the "list" */
 };
 
-static struct bucket *select_bucket(struct bpf_sk_storage_map *smap,
-				    struct bpf_sk_storage_elem *selem)
+static struct bpf_local_storage_map_bucket *
+select_bucket(struct bpf_local_storage_map *smap,
+	      struct bpf_local_storage_elem *selem)
 {
 	return &smap->buckets[hash_ptr(selem, smap->bucket_log)];
 }
@@ -110,21 +111,21 @@ static int omem_charge(struct sock *sk, unsigned int size)
 	return -ENOMEM;
 }
 
-static bool selem_linked_to_sk(const struct bpf_sk_storage_elem *selem)
+static bool selem_linked_to_storage(const struct bpf_local_storage_elem *selem)
 {
 	return !hlist_unhashed(&selem->snode);
 }
 
-static bool selem_linked_to_map(const struct bpf_sk_storage_elem *selem)
+static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
 {
 	return !hlist_unhashed(&selem->map_node);
 }
 
-static struct bpf_sk_storage_elem *selem_alloc(struct bpf_sk_storage_map *smap,
-					       struct sock *sk, void *value,
-					       bool charge_omem)
+static struct bpf_local_storage_elem *
+bpf_selem_alloc(struct bpf_local_storage_map *smap, struct sock *sk,
+		void *value, bool charge_omem)
 {
-	struct bpf_sk_storage_elem *selem;
+	struct bpf_local_storage_elem *selem;
 
 	if (charge_omem && omem_charge(sk, smap->elem_size))
 		return NULL;
@@ -146,85 +147,86 @@ static struct bpf_sk_storage_elem *selem_alloc(struct bpf_sk_storage_map *smap,
  * The caller must ensure selem->smap is still valid to be
  * dereferenced for its smap->elem_size and smap->cache_idx.
  */
-static bool __selem_unlink_sk(struct bpf_sk_storage *sk_storage,
-			      struct bpf_sk_storage_elem *selem,
-			      bool uncharge_omem)
+static bool bpf_selem_unlink_storage(struct bpf_local_storage *local_storage,
+				     struct bpf_local_storage_elem *selem,
+				     bool uncharge_omem)
 {
-	struct bpf_sk_storage_map *smap;
-	bool free_sk_storage;
+	struct bpf_local_storage_map *smap;
+	bool free_local_storage;
 	struct sock *sk;
 
 	smap = rcu_dereference(SDATA(selem)->smap);
-	sk = sk_storage->sk;
+	sk = local_storage->owner;
 
 	/* All uncharging on sk->sk_omem_alloc must be done first.
-	 * sk may be freed once the last selem is unlinked from sk_storage.
+	 * sk may be freed once the last selem is unlinked from local_storage.
 	 */
 	if (uncharge_omem)
 		atomic_sub(smap->elem_size, &sk->sk_omem_alloc);
 
-	free_sk_storage = hlist_is_singular_node(&selem->snode,
-						 &sk_storage->list);
-	if (free_sk_storage) {
-		atomic_sub(sizeof(struct bpf_sk_storage), &sk->sk_omem_alloc);
-		sk_storage->sk = NULL;
+	free_local_storage = hlist_is_singular_node(&selem->snode,
+						    &local_storage->list);
+	if (free_local_storage) {
+		atomic_sub(sizeof(struct bpf_local_storage), &sk->sk_omem_alloc);
+		local_storage->owner = NULL;
 		/* After this RCU_INIT, sk may be freed and cannot be used */
 		RCU_INIT_POINTER(sk->sk_bpf_storage, NULL);
 
-		/* sk_storage is not freed now.  sk_storage->lock is
-		 * still held and raw_spin_unlock_bh(&sk_storage->lock)
+		/* local_storage is not freed now.  local_storage->lock is
+		 * still held and raw_spin_unlock_bh(&local_storage->lock)
 		 * will be done by the caller.
 		 *
 		 * Although the unlock will be done under
 		 * rcu_read_lock(),  it is more intutivie to
-		 * read if kfree_rcu(sk_storage, rcu) is done
-		 * after the raw_spin_unlock_bh(&sk_storage->lock).
+		 * read if kfree_rcu(local_storage, rcu) is done
+		 * after the raw_spin_unlock_bh(&local_storage->lock).
 		 *
-		 * Hence, a "bool free_sk_storage" is returned
+		 * Hence, a "bool free_local_storage" is returned
 		 * to the caller which then calls the kfree_rcu()
 		 * after unlock.
 		 */
 	}
 	hlist_del_init_rcu(&selem->snode);
-	if (rcu_access_pointer(sk_storage->cache[smap->cache_idx]) ==
+	if (rcu_access_pointer(local_storage->cache[smap->cache_idx]) ==
 	    SDATA(selem))
-		RCU_INIT_POINTER(sk_storage->cache[smap->cache_idx], NULL);
+		RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
 
 	kfree_rcu(selem, rcu);
 
-	return free_sk_storage;
+	return free_local_storage;
 }
 
-static void selem_unlink_sk(struct bpf_sk_storage_elem *selem)
+static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem)
 {
-	struct bpf_sk_storage *sk_storage;
-	bool free_sk_storage = false;
+	struct bpf_local_storage *local_storage;
+	bool free_local_storage = false;
 
-	if (unlikely(!selem_linked_to_sk(selem)))
+	if (unlikely(!selem_linked_to_storage(selem)))
 		/* selem has already been unlinked from sk */
 		return;
 
-	sk_storage = rcu_dereference(selem->sk_storage);
-	raw_spin_lock_bh(&sk_storage->lock);
-	if (likely(selem_linked_to_sk(selem)))
-		free_sk_storage = __selem_unlink_sk(sk_storage, selem, true);
-	raw_spin_unlock_bh(&sk_storage->lock);
+	local_storage = rcu_dereference(selem->local_storage);
+	raw_spin_lock_bh(&local_storage->lock);
+	if (likely(selem_linked_to_storage(selem)))
+		free_local_storage =
+			bpf_selem_unlink_storage(local_storage, selem, true);
+	raw_spin_unlock_bh(&local_storage->lock);
 
-	if (free_sk_storage)
-		kfree_rcu(sk_storage, rcu);
+	if (free_local_storage)
+		kfree_rcu(local_storage, rcu);
 }
 
-static void __selem_link_sk(struct bpf_sk_storage *sk_storage,
-			    struct bpf_sk_storage_elem *selem)
+static void bpf_selem_link_storage(struct bpf_local_storage *local_storage,
+				   struct bpf_local_storage_elem *selem)
 {
-	RCU_INIT_POINTER(selem->sk_storage, sk_storage);
-	hlist_add_head(&selem->snode, &sk_storage->list);
+	RCU_INIT_POINTER(selem->local_storage, local_storage);
+	hlist_add_head(&selem->snode, &local_storage->list);
 }
 
-static void selem_unlink_map(struct bpf_sk_storage_elem *selem)
+static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
 {
-	struct bpf_sk_storage_map *smap;
-	struct bucket *b;
+	struct bpf_local_storage_map *smap;
+	struct bpf_local_storage_map_bucket *b;
 
 	if (unlikely(!selem_linked_to_map(selem)))
 		/* selem has already be unlinked from smap */
@@ -238,10 +240,10 @@ static void selem_unlink_map(struct bpf_sk_storage_elem *selem)
 	raw_spin_unlock_bh(&b->lock);
 }
 
-static void selem_link_map(struct bpf_sk_storage_map *smap,
-			   struct bpf_sk_storage_elem *selem)
+static void bpf_selem_link_map(struct bpf_local_storage_map *smap,
+			       struct bpf_local_storage_elem *selem)
 {
-	struct bucket *b = select_bucket(smap, selem);
+	struct bpf_local_storage_map_bucket *b = select_bucket(smap, selem);
 
 	raw_spin_lock_bh(&b->lock);
 	RCU_INIT_POINTER(SDATA(selem)->smap, smap);
@@ -249,31 +251,31 @@ static void selem_link_map(struct bpf_sk_storage_map *smap,
 	raw_spin_unlock_bh(&b->lock);
 }
 
-static void selem_unlink(struct bpf_sk_storage_elem *selem)
+static void bpf_selem_unlink(struct bpf_local_storage_elem *selem)
 {
-	/* Always unlink from map before unlinking from sk_storage
+	/* Always unlink from map before unlinking from local_storage
 	 * because selem will be freed after successfully unlinked from
-	 * the sk_storage.
+	 * the local_storage.
 	 */
-	selem_unlink_map(selem);
-	selem_unlink_sk(selem);
+	bpf_selem_unlink_map(selem);
+	__bpf_selem_unlink_storage(selem);
 }
 
-static struct bpf_sk_storage_data *
-__sk_storage_lookup(struct bpf_sk_storage *sk_storage,
-		    struct bpf_sk_storage_map *smap,
-		    bool cacheit_lockit)
+static struct bpf_local_storage_data *
+bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
+			 struct bpf_local_storage_map *smap,
+			 bool cacheit_lockit)
 {
-	struct bpf_sk_storage_data *sdata;
-	struct bpf_sk_storage_elem *selem;
+	struct bpf_local_storage_data *sdata;
+	struct bpf_local_storage_elem *selem;
 
 	/* Fast path (cache hit) */
-	sdata = rcu_dereference(sk_storage->cache[smap->cache_idx]);
+	sdata = rcu_dereference(local_storage->cache[smap->cache_idx]);
 	if (sdata && rcu_access_pointer(sdata->smap) == smap)
 		return sdata;
 
 	/* Slow path (cache miss) */
-	hlist_for_each_entry_rcu(selem, &sk_storage->list, snode)
+	hlist_for_each_entry_rcu(selem, &local_storage->list, snode)
 		if (rcu_access_pointer(SDATA(selem)->smap) == smap)
 			break;
 
@@ -285,33 +287,33 @@ __sk_storage_lookup(struct bpf_sk_storage *sk_storage,
 		/* spinlock is needed to avoid racing with the
 		 * parallel delete.  Otherwise, publishing an already
 		 * deleted sdata to the cache will become a use-after-free
-		 * problem in the next __sk_storage_lookup().
+		 * problem in the next bpf_local_storage_lookup().
 		 */
-		raw_spin_lock_bh(&sk_storage->lock);
-		if (selem_linked_to_sk(selem))
-			rcu_assign_pointer(sk_storage->cache[smap->cache_idx],
+		raw_spin_lock_bh(&local_storage->lock);
+		if (selem_linked_to_storage(selem))
+			rcu_assign_pointer(local_storage->cache[smap->cache_idx],
 					   sdata);
-		raw_spin_unlock_bh(&sk_storage->lock);
+		raw_spin_unlock_bh(&local_storage->lock);
 	}
 
 	return sdata;
 }
 
-static struct bpf_sk_storage_data *
+static struct bpf_local_storage_data *
 sk_storage_lookup(struct sock *sk, struct bpf_map *map, bool cacheit_lockit)
 {
-	struct bpf_sk_storage *sk_storage;
-	struct bpf_sk_storage_map *smap;
+	struct bpf_local_storage *sk_storage;
+	struct bpf_local_storage_map *smap;
 
 	sk_storage = rcu_dereference(sk->sk_bpf_storage);
 	if (!sk_storage)
 		return NULL;
 
-	smap = (struct bpf_sk_storage_map *)map;
-	return __sk_storage_lookup(sk_storage, smap, cacheit_lockit);
+	smap = (struct bpf_local_storage_map *)map;
+	return bpf_local_storage_lookup(sk_storage, smap, cacheit_lockit);
 }
 
-static int check_flags(const struct bpf_sk_storage_data *old_sdata,
+static int check_flags(const struct bpf_local_storage_data *old_sdata,
 		       u64 map_flags)
 {
 	if (old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_NOEXIST)
@@ -326,10 +328,10 @@ static int check_flags(const struct bpf_sk_storage_data *old_sdata,
 }
 
 static int sk_storage_alloc(struct sock *sk,
-			    struct bpf_sk_storage_map *smap,
-			    struct bpf_sk_storage_elem *first_selem)
+			    struct bpf_local_storage_map *smap,
+			    struct bpf_local_storage_elem *first_selem)
 {
-	struct bpf_sk_storage *prev_sk_storage, *sk_storage;
+	struct bpf_local_storage *prev_sk_storage, *sk_storage;
 	int err;
 
 	err = omem_charge(sk, sizeof(*sk_storage));
@@ -343,10 +345,10 @@ static int sk_storage_alloc(struct sock *sk,
 	}
 	INIT_HLIST_HEAD(&sk_storage->list);
 	raw_spin_lock_init(&sk_storage->lock);
-	sk_storage->sk = sk;
+	sk_storage->owner = sk;
 
-	__selem_link_sk(sk_storage, first_selem);
-	selem_link_map(smap, first_selem);
+	bpf_selem_link_storage(sk_storage, first_selem);
+	bpf_selem_link_map(smap, first_selem);
 	/* Publish sk_storage to sk.  sk->sk_lock cannot be acquired.
 	 * Hence, atomic ops is used to set sk->sk_bpf_storage
 	 * from NULL to the newly allocated sk_storage ptr.
@@ -356,10 +358,10 @@ static int sk_storage_alloc(struct sock *sk,
 	 * the sk->sk_bpf_storage, the sk_storage->lock must
 	 * be held before setting sk->sk_bpf_storage to NULL.
 	 */
-	prev_sk_storage = cmpxchg((struct bpf_sk_storage **)&sk->sk_bpf_storage,
+	prev_sk_storage = cmpxchg((struct bpf_local_storage **)&sk->sk_bpf_storage,
 				  NULL, sk_storage);
 	if (unlikely(prev_sk_storage)) {
-		selem_unlink_map(first_selem);
+		bpf_selem_unlink_map(first_selem);
 		err = -EAGAIN;
 		goto uncharge;
 
@@ -386,15 +388,14 @@ static int sk_storage_alloc(struct sock *sk,
  * Otherwise, it will become a leak (and other memory issues
  * during map destruction).
  */
-static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk,
-						     struct bpf_map *map,
-						     void *value,
-						     u64 map_flags)
+static struct bpf_local_storage_data *
+bpf_local_storage_update(struct sock *sk, struct bpf_map *map, void *value,
+			 u64 map_flags)
 {
-	struct bpf_sk_storage_data *old_sdata = NULL;
-	struct bpf_sk_storage_elem *selem;
-	struct bpf_sk_storage *sk_storage;
-	struct bpf_sk_storage_map *smap;
+	struct bpf_local_storage_data *old_sdata = NULL;
+	struct bpf_local_storage_elem *selem;
+	struct bpf_local_storage *local_storage;
+	struct bpf_local_storage_map *smap;
 	int err;
 
 	/* BPF_EXIST and BPF_NOEXIST cannot be both set */
@@ -403,15 +404,15 @@ static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk,
 	    unlikely((map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
 		return ERR_PTR(-EINVAL);
 
-	smap = (struct bpf_sk_storage_map *)map;
-	sk_storage = rcu_dereference(sk->sk_bpf_storage);
-	if (!sk_storage || hlist_empty(&sk_storage->list)) {
-		/* Very first elem for this sk */
+	smap = (struct bpf_local_storage_map *)map;
+	local_storage = rcu_dereference(sk->sk_bpf_storage);
+	if (!local_storage || hlist_empty(&local_storage->list)) {
+		/* Very first elem for this object */
 		err = check_flags(NULL, map_flags);
 		if (err)
 			return ERR_PTR(err);
 
-		selem = selem_alloc(smap, sk, value, true);
+		selem = bpf_selem_alloc(smap, sk, value, true);
 		if (!selem)
 			return ERR_PTR(-ENOMEM);
 
@@ -427,25 +428,26 @@ static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk,
 
 	if ((map_flags & BPF_F_LOCK) && !(map_flags & BPF_NOEXIST)) {
 		/* Hoping to find an old_sdata to do inline update
-		 * such that it can avoid taking the sk_storage->lock
+		 * such that it can avoid taking the local_storage->lock
 		 * and changing the lists.
 		 */
-		old_sdata = __sk_storage_lookup(sk_storage, smap, false);
+		old_sdata =
+			bpf_local_storage_lookup(local_storage, smap, false);
 		err = check_flags(old_sdata, map_flags);
 		if (err)
 			return ERR_PTR(err);
-		if (old_sdata && selem_linked_to_sk(SELEM(old_sdata))) {
+		if (old_sdata && selem_linked_to_storage(SELEM(old_sdata))) {
 			copy_map_value_locked(map, old_sdata->data,
 					      value, false);
 			return old_sdata;
 		}
 	}
 
-	raw_spin_lock_bh(&sk_storage->lock);
+	raw_spin_lock_bh(&local_storage->lock);
 
-	/* Recheck sk_storage->list under sk_storage->lock */
-	if (unlikely(hlist_empty(&sk_storage->list))) {
-		/* A parallel del is happening and sk_storage is going
+	/* Recheck local_storage->list under local_storage->lock */
+	if (unlikely(hlist_empty(&local_storage->list))) {
+		/* A parallel del is happening and local_storage is going
 		 * away.  It has just been checked before, so very
 		 * unlikely.  Return instead of retry to keep things
 		 * simple.
@@ -454,7 +456,7 @@ static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk,
 		goto unlock_err;
 	}
 
-	old_sdata = __sk_storage_lookup(sk_storage, smap, false);
+	old_sdata = bpf_local_storage_lookup(local_storage, smap, false);
 	err = check_flags(old_sdata, map_flags);
 	if (err)
 		goto unlock_err;
@@ -465,50 +467,51 @@ static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk,
 		goto unlock;
 	}
 
-	/* sk_storage->lock is held.  Hence, we are sure
+	/* local_storage->lock is held.  Hence, we are sure
 	 * we can unlink and uncharge the old_sdata successfully
 	 * later.  Hence, instead of charging the new selem now
 	 * and then uncharge the old selem later (which may cause
 	 * a potential but unnecessary charge failure),  avoid taking
 	 * a charge at all here (the "!old_sdata" check) and the
-	 * old_sdata will not be uncharged later during __selem_unlink_sk().
+	 * old_sdata will not be uncharged later during
+	 * bpf_selem_unlink_storage().
 	 */
-	selem = selem_alloc(smap, sk, value, !old_sdata);
+	selem = bpf_selem_alloc(smap, sk, value, !old_sdata);
 	if (!selem) {
 		err = -ENOMEM;
 		goto unlock_err;
 	}
 
 	/* First, link the new selem to the map */
-	selem_link_map(smap, selem);
+	bpf_selem_link_map(smap, selem);
 
-	/* Second, link (and publish) the new selem to sk_storage */
-	__selem_link_sk(sk_storage, selem);
+	/* Second, link (and publish) the new selem to local_storage */
+	bpf_selem_link_storage(local_storage, selem);
 
 	/* Third, remove old selem, SELEM(old_sdata) */
 	if (old_sdata) {
-		selem_unlink_map(SELEM(old_sdata));
-		__selem_unlink_sk(sk_storage, SELEM(old_sdata), false);
+		bpf_selem_unlink_map(SELEM(old_sdata));
+		bpf_selem_unlink_storage(local_storage, SELEM(old_sdata), false);
 	}
 
 unlock:
-	raw_spin_unlock_bh(&sk_storage->lock);
+	raw_spin_unlock_bh(&local_storage->lock);
 	return SDATA(selem);
 
 unlock_err:
-	raw_spin_unlock_bh(&sk_storage->lock);
+	raw_spin_unlock_bh(&local_storage->lock);
 	return ERR_PTR(err);
 }
 
 static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
 {
-	struct bpf_sk_storage_data *sdata;
+	struct bpf_local_storage_data *sdata;
 
 	sdata = sk_storage_lookup(sk, map, false);
 	if (!sdata)
 		return -ENOENT;
 
-	selem_unlink(SELEM(sdata));
+	bpf_selem_unlink(SELEM(sdata));
 
 	return 0;
 }
@@ -520,7 +523,7 @@ static u16 cache_idx_get(void)
 
 	spin_lock(&cache_idx_lock);
 
-	for (i = 0; i < BPF_SK_STORAGE_CACHE_SIZE; i++) {
+	for (i = 0; i < BPF_LOCAL_STORAGE_CACHE_SIZE; i++) {
 		if (cache_idx_usage_counts[i] < min_usage) {
 			min_usage = cache_idx_usage_counts[i];
 			res = i;
@@ -547,8 +550,8 @@ static void cache_idx_free(u16 idx)
 /* Called by __sk_destruct() & bpf_sk_storage_clone() */
 void bpf_sk_storage_free(struct sock *sk)
 {
-	struct bpf_sk_storage_elem *selem;
-	struct bpf_sk_storage *sk_storage;
+	struct bpf_local_storage_elem *selem;
+	struct bpf_local_storage *sk_storage;
 	bool free_sk_storage = false;
 	struct hlist_node *n;
 
@@ -573,8 +576,9 @@ void bpf_sk_storage_free(struct sock *sk)
 		/* Always unlink from map before unlinking from
 		 * sk_storage.
 		 */
-		selem_unlink_map(selem);
-		free_sk_storage = __selem_unlink_sk(sk_storage, selem, true);
+		bpf_selem_unlink_map(selem);
+		free_sk_storage =
+			bpf_selem_unlink_storage(sk_storage, selem, true);
 	}
 	raw_spin_unlock_bh(&sk_storage->lock);
 	rcu_read_unlock();
@@ -583,14 +587,14 @@ void bpf_sk_storage_free(struct sock *sk)
 		kfree_rcu(sk_storage, rcu);
 }
 
-static void bpf_sk_storage_map_free(struct bpf_map *map)
+static void bpf_local_storage_map_free(struct bpf_map *map)
 {
-	struct bpf_sk_storage_elem *selem;
-	struct bpf_sk_storage_map *smap;
-	struct bucket *b;
+	struct bpf_local_storage_elem *selem;
+	struct bpf_local_storage_map *smap;
+	struct bpf_local_storage_map_bucket *b;
 	unsigned int i;
 
-	smap = (struct bpf_sk_storage_map *)map;
+	smap = (struct bpf_local_storage_map *)map;
 
 	cache_idx_free(smap->cache_idx);
 
@@ -614,10 +618,10 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
 
 		rcu_read_lock();
 		/* No one is adding to b->list now */
-		while ((selem = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(&b->list)),
-						 struct bpf_sk_storage_elem,
-						 map_node))) {
-			selem_unlink(selem);
+		while ((selem = hlist_entry_safe(
+				rcu_dereference_raw(hlist_first_rcu(&b->list)),
+				struct bpf_local_storage_elem, map_node))) {
+			bpf_selem_unlink(selem);
 			cond_resched_rcu();
 		}
 		rcu_read_unlock();
@@ -630,7 +634,7 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
 	 *
 	 * However, the bpf_sk_storage_free() still needs to access
 	 * the smap->elem_size to do the uncharging in
-	 * __selem_unlink_sk().
+	 * bpf_selem_unlink_storage().
 	 *
 	 * Hence, wait another rcu grace period for the
 	 * bpf_sk_storage_free() to finish.
@@ -644,14 +648,15 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
 /* U16_MAX is much more than enough for sk local storage
  * considering a tcp_sock is ~2k.
  */
-#define MAX_VALUE_SIZE							\
+#define BPF_LOCAL_STORAGE_MAX_VALUE_SIZE				\
 	min_t(u32,							\
-	      (KMALLOC_MAX_SIZE - MAX_BPF_STACK - sizeof(struct bpf_sk_storage_elem)), \
-	      (U16_MAX - sizeof(struct bpf_sk_storage_elem)))
+	      (KMALLOC_MAX_SIZE - MAX_BPF_STACK -			\
+	       sizeof(struct bpf_local_storage_elem)),			\
+	      (U16_MAX - sizeof(struct bpf_local_storage_elem)))
 
 static int bpf_sk_storage_map_alloc_check(union bpf_attr *attr)
 {
-	if (attr->map_flags & ~SK_STORAGE_CREATE_FLAG_MASK ||
+	if (attr->map_flags & ~BPF_LOCAL_STORAGE_CREATE_FLAG_MASK ||
 	    !(attr->map_flags & BPF_F_NO_PREALLOC) ||
 	    attr->max_entries ||
 	    attr->key_size != sizeof(int) || !attr->value_size ||
@@ -662,15 +667,15 @@ static int bpf_sk_storage_map_alloc_check(union bpf_attr *attr)
 	if (!bpf_capable())
 		return -EPERM;
 
-	if (attr->value_size > MAX_VALUE_SIZE)
+	if (attr->value_size > BPF_LOCAL_STORAGE_MAX_VALUE_SIZE)
 		return -E2BIG;
 
 	return 0;
 }
 
-static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
+static struct bpf_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 {
-	struct bpf_sk_storage_map *smap;
+	struct bpf_local_storage_map *smap;
 	unsigned int i;
 	u32 nbuckets;
 	u64 cost;
@@ -706,7 +711,7 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr)
 		raw_spin_lock_init(&smap->buckets[i].lock);
 	}
 
-	smap->elem_size = sizeof(struct bpf_sk_storage_elem) + attr->value_size;
+	smap->elem_size = sizeof(struct bpf_local_storage_elem) + attr->value_size;
 	smap->cache_idx = cache_idx_get();
 
 	return &smap->map;
@@ -737,7 +742,7 @@ static int bpf_sk_storage_map_check_btf(const struct bpf_map *map,
 
 static void *bpf_fd_sk_storage_lookup_elem(struct bpf_map *map, void *key)
 {
-	struct bpf_sk_storage_data *sdata;
+	struct bpf_local_storage_data *sdata;
 	struct socket *sock;
 	int fd, err;
 
@@ -755,14 +760,15 @@ static void *bpf_fd_sk_storage_lookup_elem(struct bpf_map *map, void *key)
 static int bpf_fd_sk_storage_update_elem(struct bpf_map *map, void *key,
 					 void *value, u64 map_flags)
 {
-	struct bpf_sk_storage_data *sdata;
+	struct bpf_local_storage_data *sdata;
 	struct socket *sock;
 	int fd, err;
 
 	fd = *(int *)key;
 	sock = sockfd_lookup(fd, &err);
 	if (sock) {
-		sdata = sk_storage_update(sock->sk, map, value, map_flags);
+		sdata = bpf_local_storage_update(sock->sk, map, value,
+						 map_flags);
 		sockfd_put(sock);
 		return PTR_ERR_OR_ZERO(sdata);
 	}
@@ -786,14 +792,14 @@ static int bpf_fd_sk_storage_delete_elem(struct bpf_map *map, void *key)
 	return err;
 }
 
-static struct bpf_sk_storage_elem *
+static struct bpf_local_storage_elem *
 bpf_sk_storage_clone_elem(struct sock *newsk,
-			  struct bpf_sk_storage_map *smap,
-			  struct bpf_sk_storage_elem *selem)
+			  struct bpf_local_storage_map *smap,
+			  struct bpf_local_storage_elem *selem)
 {
-	struct bpf_sk_storage_elem *copy_selem;
+	struct bpf_local_storage_elem *copy_selem;
 
-	copy_selem = selem_alloc(smap, newsk, NULL, true);
+	copy_selem = bpf_selem_alloc(smap, newsk, NULL, true);
 	if (!copy_selem)
 		return NULL;
 
@@ -809,9 +815,9 @@ bpf_sk_storage_clone_elem(struct sock *newsk,
 
 int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
 {
-	struct bpf_sk_storage *new_sk_storage = NULL;
-	struct bpf_sk_storage *sk_storage;
-	struct bpf_sk_storage_elem *selem;
+	struct bpf_local_storage *new_sk_storage = NULL;
+	struct bpf_local_storage *sk_storage;
+	struct bpf_local_storage_elem *selem;
 	int ret = 0;
 
 	RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
@@ -823,8 +829,8 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
 		goto out;
 
 	hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
-		struct bpf_sk_storage_elem *copy_selem;
-		struct bpf_sk_storage_map *smap;
+		struct bpf_local_storage_elem *copy_selem;
+		struct bpf_local_storage_map *smap;
 		struct bpf_map *map;
 
 		smap = rcu_dereference(SDATA(selem)->smap);
@@ -848,8 +854,8 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
 		}
 
 		if (new_sk_storage) {
-			selem_link_map(smap, copy_selem);
-			__selem_link_sk(new_sk_storage, copy_selem);
+			bpf_selem_link_map(smap, copy_selem);
+			bpf_selem_link_storage(new_sk_storage, copy_selem);
 		} else {
 			ret = sk_storage_alloc(newsk, smap, copy_selem);
 			if (ret) {
@@ -860,7 +866,8 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
 				goto out;
 			}
 
-			new_sk_storage = rcu_dereference(copy_selem->sk_storage);
+			new_sk_storage =
+				rcu_dereference(copy_selem->local_storage);
 		}
 		bpf_map_put(map);
 	}
@@ -878,7 +885,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
 BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
 	   void *, value, u64, flags)
 {
-	struct bpf_sk_storage_data *sdata;
+	struct bpf_local_storage_data *sdata;
 
 	if (flags > BPF_SK_STORAGE_GET_F_CREATE)
 		return (unsigned long)NULL;
@@ -894,7 +901,7 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
 	     *  destruction).
 	     */
 	    refcount_inc_not_zero(&sk->sk_refcnt)) {
-		sdata = sk_storage_update(sk, map, value, BPF_NOEXIST);
+		sdata = bpf_local_storage_update(sk, map, value, BPF_NOEXIST);
 		/* sk must be a fullsock (guaranteed by verifier),
 		 * so sock_gen_put() is unnecessary.
 		 */
@@ -922,14 +929,14 @@ BPF_CALL_2(bpf_sk_storage_delete, struct bpf_map *, map, struct sock *, sk)
 static int sk_storage_map_btf_id;
 const struct bpf_map_ops sk_storage_map_ops = {
 	.map_alloc_check = bpf_sk_storage_map_alloc_check,
-	.map_alloc = bpf_sk_storage_map_alloc,
-	.map_free = bpf_sk_storage_map_free,
+	.map_alloc = bpf_local_storage_map_alloc,
+	.map_free = bpf_local_storage_map_free,
 	.map_get_next_key = notsupp_get_next_key,
 	.map_lookup_elem = bpf_fd_sk_storage_lookup_elem,
 	.map_update_elem = bpf_fd_sk_storage_update_elem,
 	.map_delete_elem = bpf_fd_sk_storage_delete_elem,
 	.map_check_btf = bpf_sk_storage_map_check_btf,
-	.map_btf_name = "bpf_sk_storage_map",
+	.map_btf_name = "bpf_local_storage_map",
 	.map_btf_id = &sk_storage_map_btf_id,
 };
 
@@ -1011,7 +1018,7 @@ bpf_sk_storage_diag_alloc(const struct nlattr *nla_stgs)
 	u32 nr_maps = 0;
 	int rem, err;
 
-	/* bpf_sk_storage_map is currently limited to CAP_SYS_ADMIN as
+	/* bpf_local_storage_map is currently limited to CAP_SYS_ADMIN as
 	 * the map_alloc_check() side also does.
 	 */
 	if (!bpf_capable())
@@ -1061,13 +1068,13 @@ bpf_sk_storage_diag_alloc(const struct nlattr *nla_stgs)
 }
 EXPORT_SYMBOL_GPL(bpf_sk_storage_diag_alloc);
 
-static int diag_get(struct bpf_sk_storage_data *sdata, struct sk_buff *skb)
+static int diag_get(struct bpf_local_storage_data *sdata, struct sk_buff *skb)
 {
 	struct nlattr *nla_stg, *nla_value;
-	struct bpf_sk_storage_map *smap;
+	struct bpf_local_storage_map *smap;
 
 	/* It cannot exceed max nlattr's payload */
-	BUILD_BUG_ON(U16_MAX - NLA_HDRLEN < MAX_VALUE_SIZE);
+	BUILD_BUG_ON(U16_MAX - NLA_HDRLEN < BPF_LOCAL_STORAGE_MAX_VALUE_SIZE);
 
 	nla_stg = nla_nest_start(skb, SK_DIAG_BPF_STORAGE);
 	if (!nla_stg)
@@ -1103,9 +1110,9 @@ static int bpf_sk_storage_diag_put_all(struct sock *sk, struct sk_buff *skb,
 {
 	/* stg_array_type (e.g. INET_DIAG_BPF_SK_STORAGES) */
 	unsigned int diag_size = nla_total_size(0);
-	struct bpf_sk_storage *sk_storage;
-	struct bpf_sk_storage_elem *selem;
-	struct bpf_sk_storage_map *smap;
+	struct bpf_local_storage *sk_storage;
+	struct bpf_local_storage_elem *selem;
+	struct bpf_local_storage_map *smap;
 	struct nlattr *nla_stgs;
 	unsigned int saved_len;
 	int err = 0;
@@ -1158,8 +1165,8 @@ int bpf_sk_storage_diag_put(struct bpf_sk_storage_diag *diag,
 {
 	/* stg_array_type (e.g. INET_DIAG_BPF_SK_STORAGES) */
 	unsigned int diag_size = nla_total_size(0);
-	struct bpf_sk_storage *sk_storage;
-	struct bpf_sk_storage_data *sdata;
+	struct bpf_local_storage *sk_storage;
+	struct bpf_local_storage_data *sdata;
 	struct nlattr *nla_stgs;
 	unsigned int saved_len;
 	int err = 0;
@@ -1186,8 +1193,8 @@ int bpf_sk_storage_diag_put(struct bpf_sk_storage_diag *diag,
 
 	saved_len = skb->len;
 	for (i = 0; i < diag->nr_maps; i++) {
-		sdata = __sk_storage_lookup(sk_storage,
-				(struct bpf_sk_storage_map *)diag->maps[i],
+		sdata = bpf_local_storage_lookup(sk_storage,
+				(struct bpf_local_storage_map *)diag->maps[i],
 				false);
 
 		if (!sdata)
-- 
2.28.0.rc0.105.gf9edc3c819-goog



* [PATCH bpf-next v5 2/7] bpf: Generalize caching for sk_storage.
From: KP Singh @ 2020-07-22 17:14 UTC (permalink / raw)
  To: linux-kernel, bpf, linux-security-module
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Paul Turner, Jann Horn, Florent Revest
In-Reply-To: <20200722171409.102949-1-kpsingh@chromium.org>

From: KP Singh <kpsingh@google.com>

Provide the ability to define local storage caches on a per-object
type basis. The caches and caching indices for different objects should
not be inter-mixed as suggested in:

  https://lore.kernel.org/bpf/20200630193441.kdwnkestulg5erii@kafai-mbp.dhcp.thefacebook.com/

  "Caching a sk-storage at idx=0 of a sk should not stop an
  inode-storage to be cached at the same idx of a inode."

Signed-off-by: KP Singh <kpsingh@google.com>
---
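
Note: the following is a minimal userspace sketch of the per-object
cache index allocation described above, not the kernel code itself.
The names storage_cache, DEFINE_STORAGE_CACHE, cache_idx_get,
cache_idx_free and inode_cache are hypothetical stand-ins, and the
idx_lock taken by the kernel version is left out to keep the example
single-threaded and dependency-free. It shows that two object types
with separate caches can be handed the same index without affecting
each other.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_SIZE 16	/* mirrors BPF_LOCAL_STORAGE_CACHE_SIZE */

/* Hypothetical stand-in for struct bpf_local_storage_cache.
 * The kernel struct also carries a spinlock (idx_lock); it is
 * omitted here because this sketch is single-threaded.
 */
struct storage_cache {
	uint64_t idx_usage_counts[CACHE_SIZE];
};

/* One cache per object type, zero-initialized like a static kernel object. */
#define DEFINE_STORAGE_CACHE(name) static struct storage_cache name

/* Pick the least-used index, stopping early at an unused one. */
static uint16_t cache_idx_get(struct storage_cache *cache)
{
	uint64_t min_usage = UINT64_MAX;
	uint16_t i, res = 0;

	for (i = 0; i < CACHE_SIZE; i++) {
		if (cache->idx_usage_counts[i] < min_usage) {
			min_usage = cache->idx_usage_counts[i];
			res = i;

			/* Found a free index */
			if (!min_usage)
				break;
		}
	}
	cache->idx_usage_counts[res]++;

	return res;
}

/* Drop one user of idx so that a later map can reuse it. */
static void cache_idx_free(struct storage_cache *cache, uint16_t idx)
{
	assert(idx < CACHE_SIZE);
	cache->idx_usage_counts[idx]--;
}

DEFINE_STORAGE_CACHE(sk_cache);
DEFINE_STORAGE_CACHE(inode_cache);	/* hypothetical second object type */
```

Because each cache keeps its own usage counts, an index handed out from
sk_cache says nothing about which indices inode_cache considers free.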
 include/net/bpf_sk_storage.h | 19 +++++++++++++++++++
 net/core/bpf_sk_storage.c    | 31 +++++++++++++++----------------
 2 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h
index 5036c94c0503..950c5aaba15e 100644
--- a/include/net/bpf_sk_storage.h
+++ b/include/net/bpf_sk_storage.h
@@ -3,6 +3,9 @@
 #ifndef _BPF_SK_STORAGE_H
 #define _BPF_SK_STORAGE_H
 
+#include <linux/types.h>
+#include <linux/spinlock.h>
+
 struct sock;
 
 void bpf_sk_storage_free(struct sock *sk);
@@ -15,6 +18,22 @@ struct sk_buff;
 struct nlattr;
 struct sock;
 
+#define BPF_LOCAL_STORAGE_CACHE_SIZE	16
+
+struct bpf_local_storage_cache {
+	spinlock_t idx_lock;
+	u64 idx_usage_counts[BPF_LOCAL_STORAGE_CACHE_SIZE];
+};
+
+#define DEFINE_BPF_STORAGE_CACHE(name)				\
+static struct bpf_local_storage_cache name = {			\
+	.idx_lock = __SPIN_LOCK_UNLOCKED(name.idx_lock),	\
+}
+
+u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache);
+void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache,
+				      u16 idx);
+
 #ifdef CONFIG_BPF_SYSCALL
 int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk);
 struct bpf_sk_storage_diag *
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index fb7ea6f4174d..aa3e3a47acb5 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -13,6 +13,8 @@
 
 #define BPF_LOCAL_STORAGE_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_CLONE)
 
+DEFINE_BPF_STORAGE_CACHE(sk_cache);
+
 struct bpf_local_storage_map_bucket {
 	struct hlist_head list;
 	raw_spinlock_t lock;
@@ -77,10 +79,6 @@ struct bpf_local_storage_elem {
 #define SELEM(_SDATA)							\
 	container_of((_SDATA), struct bpf_local_storage_elem, sdata)
 #define SDATA(_SELEM) (&(_SELEM)->sdata)
-#define BPF_LOCAL_STORAGE_CACHE_SIZE	16
-
-static DEFINE_SPINLOCK(cache_idx_lock);
-static u64 cache_idx_usage_counts[BPF_LOCAL_STORAGE_CACHE_SIZE];
 
 struct bpf_local_storage {
 	struct bpf_local_storage_data __rcu *cache[BPF_LOCAL_STORAGE_CACHE_SIZE];
@@ -516,16 +514,16 @@ static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
 	return 0;
 }
 
-static u16 cache_idx_get(void)
+u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache)
 {
 	u64 min_usage = U64_MAX;
 	u16 i, res = 0;
 
-	spin_lock(&cache_idx_lock);
+	spin_lock(&cache->idx_lock);
 
 	for (i = 0; i < BPF_LOCAL_STORAGE_CACHE_SIZE; i++) {
-		if (cache_idx_usage_counts[i] < min_usage) {
-			min_usage = cache_idx_usage_counts[i];
+		if (cache->idx_usage_counts[i] < min_usage) {
+			min_usage = cache->idx_usage_counts[i];
 			res = i;
 
 			/* Found a free cache_idx */
@@ -533,18 +531,19 @@ static u16 cache_idx_get(void)
 				break;
 		}
 	}
-	cache_idx_usage_counts[res]++;
+	cache->idx_usage_counts[res]++;
 
-	spin_unlock(&cache_idx_lock);
+	spin_unlock(&cache->idx_lock);
 
 	return res;
 }
 
-static void cache_idx_free(u16 idx)
+void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache,
+				      u16 idx)
 {
-	spin_lock(&cache_idx_lock);
-	cache_idx_usage_counts[idx]--;
-	spin_unlock(&cache_idx_lock);
+	spin_lock(&cache->idx_lock);
+	cache->idx_usage_counts[idx]--;
+	spin_unlock(&cache->idx_lock);
 }
 
 /* Called by __sk_destruct() & bpf_sk_storage_clone() */
@@ -596,7 +595,7 @@ static void bpf_local_storage_map_free(struct bpf_map *map)
 
 	smap = (struct bpf_local_storage_map *)map;
 
-	cache_idx_free(smap->cache_idx);
+	bpf_local_storage_cache_idx_free(&sk_cache, smap->cache_idx);
 
 	/* Note that this map might be concurrently cloned from
 	 * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone
@@ -712,7 +711,7 @@ static struct bpf_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
 	}
 
 	smap->elem_size = sizeof(struct bpf_local_storage_elem) + attr->value_size;
-	smap->cache_idx = cache_idx_get();
+	smap->cache_idx = bpf_local_storage_cache_idx_get(&sk_cache);
 
 	return &smap->map;
 }
-- 
2.28.0.rc0.105.gf9edc3c819-goog



* [PATCH bpf-next v5 4/7] bpf: Split bpf_local_storage to bpf_sk_storage
From: KP Singh @ 2020-07-22 17:14 UTC (permalink / raw)
  To: linux-kernel, bpf, linux-security-module
  Cc: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau,
	Paul Turner, Jann Horn, Florent Revest
In-Reply-To: <20200722171409.102949-1-kpsingh@chromium.org>

From: KP Singh <kpsingh@google.com>

A purely mechanical change:

	bpf_sk_storage.c = bpf_sk_storage.c + bpf_local_storage.c
	bpf_sk_storage.h = bpf_sk_storage.h + bpf_local_storage.h

Signed-off-by: KP Singh <kpsingh@google.com>
---
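
Note: the header comment in bpf_local_storage.h describes the object's
storage as a mini-map searched with the "bpf_map" pointer as the key.
The following is a rough userspace sketch of that lookup, assuming
plain pointers and a singly linked list in place of RCU and hlist.
The names smap, selem, local_storage, storage_link and storage_lookup
are hypothetical stand-ins, and the locking done by the kernel when it
caches an entry is omitted.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SIZE 16	/* mirrors BPF_LOCAL_STORAGE_CACHE_SIZE */

/* Stand-in for struct bpf_local_storage_map: only the cache slot. */
struct smap {
	unsigned short cache_idx;
};

/* Stand-in for struct bpf_local_storage_elem plus its data. */
struct selem {
	struct selem *next;		/* linkage in the owner's list */
	const struct smap *smap;	/* the map pointer acts as the key */
	long value;
};

/* Stand-in for struct bpf_local_storage, owned by one object. */
struct local_storage {
	struct selem *cache[CACHE_SIZE];
	struct selem *list;
};

/* Add an element at the head of the owner's list. */
static void storage_link(struct local_storage *ls, struct selem *e)
{
	e->next = ls->list;
	ls->list = e;
}

/* Look up the element that belongs to smap in this owner's storage. */
static struct selem *storage_lookup(struct local_storage *ls,
				    const struct smap *smap, int cacheit)
{
	struct selem *e;

	/* Fast path (cache hit): the slot holds an elem of this map */
	e = ls->cache[smap->cache_idx];
	if (e && e->smap == smap)
		return e;

	/* Slow path (cache miss): walk the list, comparing map pointers */
	for (e = ls->list; e; e = e->next)
		if (e->smap == smap)
			break;

	/* Optionally remember the result for the next lookup */
	if (e && cacheit)
		ls->cache[smap->cache_idx] = e;

	return e;
}
```

Two maps may share a cache_idx; the pointer comparison in the fast path
is what keeps a slot filled by one map from being returned for another.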
 include/linux/bpf_local_storage.h | 165 +++++++++
 include/net/bpf_sk_storage.h      |  64 ----
 kernel/bpf/Makefile               |   1 +
 kernel/bpf/bpf_local_storage.c    | 519 ++++++++++++++++++++++++++
 net/core/bpf_sk_storage.c         | 587 +-----------------------------
 5 files changed, 686 insertions(+), 650 deletions(-)
 create mode 100644 include/linux/bpf_local_storage.h
 create mode 100644 kernel/bpf/bpf_local_storage.c

diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
new file mode 100644
index 000000000000..d80573b11d4c
--- /dev/null
+++ b/include/linux/bpf_local_storage.h
@@ -0,0 +1,165 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2019 Facebook
+ * Copyright 2020 Google LLC.
+ */
+
+#ifndef _BPF_LOCAL_STORAGE_H
+#define _BPF_LOCAL_STORAGE_H
+
+#include <linux/bpf.h>
+#include <linux/rculist.h>
+#include <linux/list.h>
+#include <linux/hash.h>
+#include <linux/types.h>
+#include <uapi/linux/btf.h>
+
+#define BPF_LOCAL_STORAGE_CACHE_SIZE	16
+
+struct bpf_local_storage_map_bucket {
+	struct hlist_head list;
+	raw_spinlock_t lock;
+};
+
+/* The map is not the primary owner of a bpf_local_storage_elem.
+ * Instead, the container object (eg. sk->sk_bpf_storage) is.
+ *
+ * The map (bpf_local_storage_map) is for two purposes
+ * 1. Define the size of the "local storage".  It is
+ *    the map's value_size.
+ *
+ * 2. Maintain a list to keep track of all elems such
+ *    that they can be cleaned up during the map destruction.
+ *
+ * When a bpf local storage is being looked up for a
+ * particular object,  the "bpf_map" pointer is actually used
+ * as the "key" to search in the list of elem in
+ * the respective bpf_local_storage owned by the object.
+ *
+ * e.g. sk->sk_bpf_storage is the mini-map with the "bpf_map" pointer
+ * as the searching key.
+ */
+struct bpf_local_storage_map {
+	struct bpf_map map;
+	/* Lookup elem does not require accessing the map.
+	 *
+	 * Updating/Deleting requires a bucket lock to
+	 * link/unlink the elem from the map.  Having
+	 * multiple buckets to reduce contention.
+	 */
+	struct bpf_local_storage_map_bucket *buckets;
+	u32 bucket_log;
+	u16 elem_size;
+	u16 cache_idx;
+};
+
+struct bpf_local_storage_data {
+	/* smap is used as the searching key when looking up
+	 * from the object's bpf_local_storage.
+	 *
+	 * Put it in the same cacheline as the data to minimize
+	 * the number of cachelines access during the cache hit case.
+	 */
+	struct bpf_local_storage_map __rcu *smap;
+	u8 data[] __aligned(8);
+};
+
+/* Linked to bpf_local_storage and bpf_local_storage_map */
+struct bpf_local_storage_elem {
+	struct hlist_node map_node;	/* Linked to bpf_local_storage_map */
+	struct hlist_node snode;	/* Linked to bpf_local_storage */
+	struct bpf_local_storage __rcu *local_storage;
+	struct rcu_head rcu;
+	/* 8 bytes hole */
+	/* The data is stored in another cacheline to minimize
+	 * the number of cachelines access during a cache hit.
+	 */
+	struct bpf_local_storage_data sdata ____cacheline_aligned;
+};
+
+struct bpf_local_storage {
+	struct bpf_local_storage_data __rcu *cache[BPF_LOCAL_STORAGE_CACHE_SIZE];
+	struct hlist_head list; /* List of bpf_local_storage_elem */
+	void *owner;		/* The object that owns the above "list" of
+				 * bpf_local_storage_elem.
+				 */
+	struct rcu_head rcu;
+	raw_spinlock_t lock;	/* Protect adding/removing from the "list" */
+};
+
+struct bpf_local_storage_cache {
+	spinlock_t idx_lock;
+	u64 idx_usage_counts[BPF_LOCAL_STORAGE_CACHE_SIZE];
+};
+
+#define DEFINE_BPF_STORAGE_CACHE(name)				\
+static struct bpf_local_storage_cache name = {			\
+	.idx_lock = __SPIN_LOCK_UNLOCKED(name.idx_lock),	\
+}
+
+u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache);
+void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache,
+				      u16 idx);
+
+/* U16_MAX is much more than enough for sk local storage
+ * considering a tcp_sock is ~2k.
+ */
+#define BPF_LOCAL_STORAGE_MAX_VALUE_SIZE				       \
+	min_t(u32,                                                             \
+	      (KMALLOC_MAX_SIZE - MAX_BPF_STACK -                              \
+	       sizeof(struct bpf_local_storage_elem)),                         \
+	      (U16_MAX - sizeof(struct bpf_local_storage_elem)))
+
+#define SELEM(_SDATA)                                                          \
+	container_of((_SDATA), struct bpf_local_storage_elem, sdata)
+#define SDATA(_SELEM) (&(_SELEM)->sdata)
+
+/* Helper functions for bpf_local_storage */
+int bpf_local_storage_map_alloc_check(union bpf_attr *attr);
+
+struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr);
+
+struct bpf_local_storage_data *
+bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
+			 struct bpf_local_storage_map *smap,
+			 bool cacheit_lockit);
+
+void bpf_local_storage_map_free(struct bpf_local_storage_map *smap);
+
+int bpf_local_storage_map_check_btf(const struct bpf_map *map,
+				    const struct btf *btf,
+				    const struct btf_type *key_type,
+				    const struct btf_type *value_type);
+
+void bpf_selem_link_storage(struct bpf_local_storage *local_storage,
+			    struct bpf_local_storage_elem *selem);
+
+bool bpf_selem_unlink_storage(struct bpf_local_storage *local_storage,
+			      struct bpf_local_storage_elem *selem,
+			      bool uncharge_omem);
+
+void bpf_selem_unlink(struct bpf_local_storage_elem *selem);
+
+void bpf_selem_link_map(struct bpf_local_storage_map *smap,
+			struct bpf_local_storage_elem *selem);
+
+void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem);
+
+struct bpf_local_storage_elem *
+bpf_selem_alloc(struct bpf_local_storage_map *smap, void *value);
+
+struct bpf_local_storage *
+bpf_local_storage_alloc(struct bpf_local_storage_map *smap);
+
+int bpf_local_storage_publish(struct bpf_local_storage_elem *first_selem,
+			      struct bpf_local_storage **addr,
+			      struct bpf_local_storage *curr);
+
+int bpf_local_storage_check_update_flags(struct bpf_map *map, u64 map_flags);
+
+struct bpf_local_storage_data *
+bpf_local_storage_update(void *owner, struct bpf_map *map,
+			 struct bpf_local_storage *local_storage, void *value,
+			 u64 map_flags);
+
+#endif /* _BPF_LOCAL_STORAGE_H */
diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h
index e3185cfb91da..4cdf37ac278c 100644
--- a/include/net/bpf_sk_storage.h
+++ b/include/net/bpf_sk_storage.h
@@ -25,70 +25,6 @@ struct sk_buff;
 struct nlattr;
 struct sock;
 
-#define BPF_LOCAL_STORAGE_CACHE_SIZE	16
-
-struct bpf_local_storage_cache {
-	spinlock_t idx_lock;
-	u64 idx_usage_counts[BPF_LOCAL_STORAGE_CACHE_SIZE];
-};
-
-#define DEFINE_BPF_STORAGE_CACHE(name)				\
-static struct bpf_local_storage_cache name = {			\
-	.idx_lock = __SPIN_LOCK_UNLOCKED(name.idx_lock),	\
-}
-
-u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache);
-void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache,
-				      u16 idx);
-
-/* Helper functions for bpf_local_storage */
-int bpf_local_storage_map_alloc_check(union bpf_attr *attr);
-
-struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr);
-
-struct bpf_local_storage_data *
-bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
-			 struct bpf_local_storage_map *smap,
-			 bool cacheit_lockit);
-
-void bpf_local_storage_map_free(struct bpf_local_storage_map *smap);
-
-int bpf_local_storage_map_check_btf(const struct bpf_map *map,
-				    const struct btf *btf,
-				    const struct btf_type *key_type,
-				    const struct btf_type *value_type);
-
-void bpf_selem_link_storage(struct bpf_local_storage *local_storage,
-			    struct bpf_local_storage_elem *selem);
-
-bool bpf_selem_unlink_storage(struct bpf_local_storage *local_storage,
-			      struct bpf_local_storage_elem *selem,
-			      bool uncharge_omem);
-
-void bpf_selem_unlink(struct bpf_local_storage_elem *selem);
-
-void bpf_selem_link_map(struct bpf_local_storage_map *smap,
-			struct bpf_local_storage_elem *selem);
-
-void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem);
-
-struct bpf_local_storage_elem *
-bpf_selem_alloc(struct bpf_local_storage_map *smap, void *value);
-
-struct bpf_local_storage *
-bpf_local_storage_alloc(struct bpf_local_storage_map *smap);
-
-int bpf_local_storage_publish(struct bpf_local_storage_elem *first_selem,
-			      struct bpf_local_storage **addr,
-			      struct bpf_local_storage *curr);
-
-int bpf_local_storage_check_update_flags(struct bpf_map *map, u64 map_flags);
-
-struct bpf_local_storage_data *
-bpf_local_storage_update(void *owner, struct bpf_map *map,
-			 struct bpf_local_storage *local_storage, void *value,
-			 u64 map_flags);
-
 #ifdef CONFIG_BPF_SYSCALL
 int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk);
 struct bpf_sk_storage_diag *
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index 1131a921e1a6..0acb8f8a6042 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_BPF_JIT) += dispatcher.o
 ifeq ($(CONFIG_NET),y)
 obj-$(CONFIG_BPF_SYSCALL) += devmap.o
 obj-$(CONFIG_BPF_SYSCALL) += cpumap.o
+obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += offload.o
 obj-$(CONFIG_BPF_SYSCALL) += net_namespace.o
 endif
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
new file mode 100644
index 000000000000..cefda1f6dd24
--- /dev/null
+++ b/kernel/bpf/bpf_local_storage.c
@@ -0,0 +1,519 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2019 Facebook
+ * Copyright 2020 Google LLC.
+ */
+
+#include <linux/rculist.h>
+#include <linux/list.h>
+#include <linux/hash.h>
+#include <linux/types.h>
+#include <linux/spinlock.h>
+#include <linux/bpf.h>
+#include <linux/bpf_local_storage.h>
+#include <net/sock.h>
+#include <uapi/linux/sock_diag.h>
+#include <uapi/linux/btf.h>
+
+#define BPF_LOCAL_STORAGE_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_CLONE)
+
+static struct bpf_local_storage_map_bucket *
+select_bucket(struct bpf_local_storage_map *smap,
+	      struct bpf_local_storage_elem *selem)
+{
+	return &smap->buckets[hash_ptr(selem, smap->bucket_log)];
+}
+
+static bool selem_linked_to_storage(const struct bpf_local_storage_elem *selem)
+{
+	return !hlist_unhashed(&selem->snode);
+}
+
+static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
+{
+	return !hlist_unhashed(&selem->map_node);
+}
+
+struct bpf_local_storage_elem *
+bpf_selem_alloc(struct bpf_local_storage_map *smap, void *value)
+{
+	struct bpf_local_storage_elem *selem;
+
+	selem = kzalloc(smap->elem_size, GFP_ATOMIC | __GFP_NOWARN);
+	if (selem) {
+		if (value)
+			memcpy(SDATA(selem)->data, value, smap->map.value_size);
+		return selem;
+	}
+
+	return NULL;
+}
+
+/* local_storage->lock must be held and selem->local_storage == local_storage.
+ * The caller must ensure selem->smap is still valid to be
+ * dereferenced for its smap->elem_size and smap->cache_idx.
+ */
+bool bpf_selem_unlink_storage(struct bpf_local_storage *local_storage,
+			      struct bpf_local_storage_elem *selem,
+			      bool uncharge_omem)
+{
+	struct bpf_local_storage_map *smap;
+	bool free_local_storage;
+
+	smap = rcu_dereference(SDATA(selem)->smap);
+	/* local_storage is not freed now. local_storage->lock is
+	 * still held and raw_spin_unlock_bh(&local_storage->lock)
+	 * will be done by the caller.
+	 * Although the unlock will be done under
+	 * rcu_read_lock(),  it is more intuitive to
+	 * read if kfree_rcu(local_storage, rcu) is done
+	 * after the raw_spin_unlock_bh(&local_storage->lock).
+	 *
+	 * Hence, a "bool free_local_storage" is returned
+	 * to the caller which then calls the kfree_rcu()
+	 * after unlock.
+	 */
+	free_local_storage = smap->map.ops->map_local_storage_unlink(
+		local_storage, selem, uncharge_omem);
+	hlist_del_init_rcu(&selem->snode);
+	if (rcu_access_pointer(local_storage->cache[smap->cache_idx]) ==
+	    SDATA(selem))
+		RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
+
+	kfree_rcu(selem, rcu);
+
+	return free_local_storage;
+}
+
+
+static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem)
+{
+	struct bpf_local_storage *local_storage;
+	bool free_local_storage = false;
+
+	if (unlikely(!selem_linked_to_storage(selem)))
+		/* selem has already been unlinked from sk */
+		return;
+
+	local_storage = rcu_dereference(selem->local_storage);
+	raw_spin_lock_bh(&local_storage->lock);
+	if (likely(selem_linked_to_storage(selem)))
+		free_local_storage =
+			bpf_selem_unlink_storage(local_storage, selem, true);
+	raw_spin_unlock_bh(&local_storage->lock);
+
+	if (free_local_storage)
+		kfree_rcu(local_storage, rcu);
+}
+
+void bpf_selem_link_storage(struct bpf_local_storage *local_storage,
+			    struct bpf_local_storage_elem *selem)
+{
+	RCU_INIT_POINTER(selem->local_storage, local_storage);
+	hlist_add_head(&selem->snode, &local_storage->list);
+}
+
+void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
+{
+	struct bpf_local_storage_map *smap;
+	struct bpf_local_storage_map_bucket *b;
+
+	if (unlikely(!selem_linked_to_map(selem)))
+		/* selem has already been unlinked from smap */
+		return;
+
+	smap = rcu_dereference(SDATA(selem)->smap);
+	b = select_bucket(smap, selem);
+	raw_spin_lock_bh(&b->lock);
+	if (likely(selem_linked_to_map(selem)))
+		hlist_del_init_rcu(&selem->map_node);
+	raw_spin_unlock_bh(&b->lock);
+}
+
+void bpf_selem_link_map(struct bpf_local_storage_map *smap,
+			struct bpf_local_storage_elem *selem)
+{
+	struct bpf_local_storage_map_bucket *b = select_bucket(smap, selem);
+
+	raw_spin_lock_bh(&b->lock);
+	RCU_INIT_POINTER(SDATA(selem)->smap, smap);
+	hlist_add_head_rcu(&selem->map_node, &b->list);
+	raw_spin_unlock_bh(&b->lock);
+}
+
+void bpf_selem_unlink(struct bpf_local_storage_elem *selem)
+{
+	/* Always unlink from map before unlinking from local_storage
+	 * because selem will be freed after it is successfully unlinked
+	 * from the local_storage.
+	 */
+	bpf_selem_unlink_map(selem);
+	__bpf_selem_unlink_storage(selem);
+}
+
+struct bpf_local_storage_data *
+bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
+			 struct bpf_local_storage_map *smap,
+			 bool cacheit_lockit)
+{
+	struct bpf_local_storage_data *sdata;
+	struct bpf_local_storage_elem *selem;
+
+	/* Fast path (cache hit) */
+	sdata = rcu_dereference(local_storage->cache[smap->cache_idx]);
+	if (sdata && rcu_access_pointer(sdata->smap) == smap)
+		return sdata;
+
+	/* Slow path (cache miss) */
+	hlist_for_each_entry_rcu(selem, &local_storage->list, snode)
+		if (rcu_access_pointer(SDATA(selem)->smap) == smap)
+			break;
+
+	if (!selem)
+		return NULL;
+
+	sdata = SDATA(selem);
+	if (cacheit_lockit) {
+		/* spinlock is needed to avoid racing with the
+		 * parallel delete.  Otherwise, publishing an already
+		 * deleted sdata to the cache will become a use-after-free
+		 * problem in the next bpf_local_storage_lookup().
+		 */
+		raw_spin_lock_bh(&local_storage->lock);
+		if (selem_linked_to_storage(selem))
+			rcu_assign_pointer(local_storage->cache[smap->cache_idx],
+					   sdata);
+		raw_spin_unlock_bh(&local_storage->lock);
+	}
+
+	return sdata;
+}
+
+struct bpf_local_storage *
+bpf_local_storage_alloc(struct bpf_local_storage_map *smap)
+{
+	struct bpf_local_storage *storage;
+
+	storage = kzalloc(sizeof(*storage), GFP_ATOMIC | __GFP_NOWARN);
+	if (!storage)
+		return NULL;
+
+	INIT_HLIST_HEAD(&storage->list);
+	raw_spin_lock_init(&storage->lock);
+	return storage;
+}
+
+static int check_flags(const struct bpf_local_storage_data *old_sdata,
+		       u64 map_flags)
+{
+	if (old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_NOEXIST)
+		/* elem already exists */
+		return -EEXIST;
+
+	if (!old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_EXIST)
+		/* elem doesn't exist, cannot update it */
+		return -ENOENT;
+
+	return 0;
+}
+
+/* sk cannot be going away because it is linking new elem
+ * to sk->sk_bpf_storage. (i.e. sk->sk_refcnt cannot be 0).
+ * Otherwise, it will become a leak (and other memory issues
+ * during map destruction).
+ */
+struct bpf_local_storage_data *
+bpf_local_storage_update(void *owner, struct bpf_map *map,
+			 struct bpf_local_storage *local_storage, void *value,
+			 u64 map_flags)
+{
+	struct bpf_local_storage_data *old_sdata = NULL;
+	struct bpf_local_storage_elem *selem;
+	struct bpf_local_storage_map *smap;
+	int err;
+
+	smap = (struct bpf_local_storage_map *)map;
+
+	if ((map_flags & BPF_F_LOCK) && !(map_flags & BPF_NOEXIST)) {
+		/* Hoping to find an old_sdata to do inline update
+		 * such that it can avoid taking the local_storage->lock
+		 * and changing the lists.
+		 */
+		old_sdata = bpf_local_storage_lookup(local_storage, smap, false);
+		err = check_flags(old_sdata, map_flags);
+		if (err)
+			return ERR_PTR(err);
+
+		if (old_sdata && selem_linked_to_storage(SELEM(old_sdata))) {
+			copy_map_value_locked(map, old_sdata->data,
+					      value, false);
+			return old_sdata;
+		}
+	}
+
+	raw_spin_lock_bh(&local_storage->lock);
+
+	/* Recheck local_storage->list under local_storage->lock */
+	if (unlikely(hlist_empty(&local_storage->list))) {
+		/* A parallel del is happening and local_storage is going
+		 * away.  It has just been checked before, so very
+		 * unlikely.  Return instead of retry to keep things
+		 * simple.
+		 */
+		err = -EAGAIN;
+		goto unlock_err;
+	}
+
+	old_sdata = bpf_local_storage_lookup(local_storage, smap, false);
+	err = check_flags(old_sdata, map_flags);
+	if (err)
+		goto unlock_err;
+
+	if (old_sdata && (map_flags & BPF_F_LOCK)) {
+		copy_map_value_locked(map, old_sdata->data, value, false);
+		selem = SELEM(old_sdata);
+		goto unlock;
+	}
+
+	/* local_storage->lock is held.  Hence, we are sure
+	 * we can unlink and uncharge the old_sdata successfully
+	 * later.  Hence, instead of charging the new selem now
+	 * and then uncharge the old selem later (which may cause
+	 * a potential but unnecessary charge failure),  avoid taking
+	 * a charge at all here (the "!old_sdata" check) and the
+	 * old_sdata will not be uncharged later during bpf_selem_unlink().
+	 */
+	selem = map->ops->map_selem_alloc(smap, owner, value, !old_sdata);
+	if (!selem) {
+		err = -ENOMEM;
+		goto unlock_err;
+	}
+
+	/* First, link the new selem to the map */
+	bpf_selem_link_map(smap, selem);
+
+	/* Second, link (and publish) the new selem to local_storage */
+	bpf_selem_link_storage(local_storage, selem);
+
+	/* Third, remove old selem, SELEM(old_sdata) */
+	if (old_sdata) {
+		bpf_selem_unlink_map(SELEM(old_sdata));
+		bpf_selem_unlink_storage(local_storage, SELEM(old_sdata), false);
+	}
+
+unlock:
+	raw_spin_unlock_bh(&local_storage->lock);
+	return SDATA(selem);
+
+unlock_err:
+	raw_spin_unlock_bh(&local_storage->lock);
+	return ERR_PTR(err);
+}
+
+int bpf_local_storage_check_update_flags(struct bpf_map *map, u64 map_flags)
+{
+	/* BPF_EXIST and BPF_NOEXIST cannot be both set */
+	if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) ||
+	    /* BPF_F_LOCK can only be used in a value with spin_lock */
+	    unlikely((map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
+		return -EINVAL;
+
+	return 0;
+}
+
+/* Publish local_storage to the address.  This is used because we are already
+ * in a region where we cannot grab a lock on the object owning the storage
+ * (e.g. sk->sk_lock). Hence, atomic ops are used.
+ *
+ * From now on, the addr pointer is protected
+ * by the local_storage->lock.  Hence, upon freeing,
+ * the local_storage->lock must be held before unlinking the storage from the
+ * owner.
+ */
+int bpf_local_storage_publish(struct bpf_local_storage_elem *first_selem,
+			      struct bpf_local_storage **addr,
+			      struct bpf_local_storage *curr)
+{
+	struct bpf_local_storage *prev;
+
+	prev = cmpxchg(addr, NULL, curr);
+	if (unlikely(prev)) {
+		/* Note that even if first_selem was linked to smap's
+		 * bucket->list, first_selem can be freed immediately
+		 * (instead of kfree_rcu) because
+		 * bpf_local_storage_map_free() does a
+		 * synchronize_rcu() before walking the bucket->list.
+		 * Hence, no one is accessing selem from the
+		 * bucket->list under rcu_read_lock().
+		 */
+		bpf_selem_unlink_map(first_selem);
+		return -EAGAIN;
+	}
+
+	return 0;
+}
+
+u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache)
+{
+	u64 min_usage = U64_MAX;
+	u16 i, res = 0;
+
+	spin_lock(&cache->idx_lock);
+	for (i = 0; i < BPF_LOCAL_STORAGE_CACHE_SIZE; i++) {
+		if (cache->idx_usage_counts[i] < min_usage) {
+			min_usage = cache->idx_usage_counts[i];
+			res = i;
+
+			/* Found a free cache_idx */
+			if (!min_usage)
+				break;
+		}
+	}
+
+	cache->idx_usage_counts[res]++;
+
+	spin_unlock(&cache->idx_lock);
+
+	return res;
+}
+
+void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache,
+				      u16 idx)
+{
+	spin_lock(&cache->idx_lock);
+	cache->idx_usage_counts[idx]--;
+	spin_unlock(&cache->idx_lock);
+}
+
+void bpf_local_storage_map_free(struct bpf_local_storage_map *smap)
+{
+	struct bpf_local_storage_elem *selem;
+	struct bpf_local_storage_map_bucket *b;
+	unsigned int i;
+
+	/* Note that this map might be concurrently cloned from
+	 * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone
+	 * RCU read section to finish before proceeding. New RCU
+	 * read sections should be prevented via bpf_map_inc_not_zero.
+	 */
+	synchronize_rcu();
+
+	/* bpf prog and the userspace can no longer access this map
+	 * now.  No new selem (of this map) can be added
+	 * to the bpf_local_storage or to the map bucket's list.
+	 *
+	 * The elem of this map can be cleaned up here
+	 * or by bpf_local_storage_free() during the destruction of the
+	 * owner object. eg. __sk_destruct.
+	 */
+	for (i = 0; i < (1U << smap->bucket_log); i++) {
+		b = &smap->buckets[i];
+
+		rcu_read_lock();
+		/* No one is adding to b->list now */
+		while ((selem = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(&b->list)),
+						 struct bpf_local_storage_elem,
+						 map_node))) {
+			bpf_selem_unlink(selem);
+			cond_resched_rcu();
+		}
+		rcu_read_unlock();
+	}
+
+	/* bpf_local_storage_free() may still need to access the map.
+	 * e.g. bpf_local_storage_free() has unlinked selem from the map
+	 * which then made the above while((selem = ...)) loop
+	 * exit immediately.
+	 *
+	 * However, the bpf_local_storage_free() still needs to access
+	 * the smap->elem_size to do the uncharging in
+	 * bpf_selem_unlink().
+	 *
+	 * Hence, wait another rcu grace period for the
+	 * bpf_local_storage_free() to finish.
+	 */
+	synchronize_rcu();
+
+	kvfree(smap->buckets);
+	kfree(smap);
+}
+
+int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
+{
+	if (attr->map_flags & ~BPF_LOCAL_STORAGE_CREATE_FLAG_MASK ||
+	    !(attr->map_flags & BPF_F_NO_PREALLOC) ||
+	    attr->max_entries ||
+	    attr->key_size != sizeof(int) || !attr->value_size ||
+	    /* Enforce BTF for userspace sk dumping */
+	    !attr->btf_key_type_id || !attr->btf_value_type_id)
+		return -EINVAL;
+
+	if (!bpf_capable())
+		return -EPERM;
+
+	if (attr->value_size > BPF_LOCAL_STORAGE_MAX_VALUE_SIZE)
+		return -E2BIG;
+
+	return 0;
+}
+
+struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
+{
+	struct bpf_local_storage_map *smap;
+	unsigned int i;
+	u32 nbuckets;
+	u64 cost;
+	int ret;
+
+	smap = kzalloc(sizeof(*smap), GFP_USER | __GFP_NOWARN);
+	if (!smap)
+		return ERR_PTR(-ENOMEM);
+	bpf_map_init_from_attr(&smap->map, attr);
+
+	nbuckets = roundup_pow_of_two(num_possible_cpus());
+	/* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */
+	nbuckets = max_t(u32, 2, nbuckets);
+	smap->bucket_log = ilog2(nbuckets);
+	cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap);
+
+	ret = bpf_map_charge_init(&smap->map.memory, cost);
+	if (ret < 0) {
+		kfree(smap);
+		return ERR_PTR(ret);
+	}
+
+	smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
+				 GFP_USER | __GFP_NOWARN);
+	if (!smap->buckets) {
+		bpf_map_charge_finish(&smap->map.memory);
+		kfree(smap);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	for (i = 0; i < nbuckets; i++) {
+		INIT_HLIST_HEAD(&smap->buckets[i].list);
+		raw_spin_lock_init(&smap->buckets[i].lock);
+	}
+
+	smap->elem_size =
+		sizeof(struct bpf_local_storage_elem) + attr->value_size;
+
+	return smap;
+}
+
+int bpf_local_storage_map_check_btf(const struct bpf_map *map,
+				    const struct btf *btf,
+				    const struct btf_type *key_type,
+				    const struct btf_type *value_type)
+{
+	u32 int_data;
+
+	if (BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
+		return -EINVAL;
+
+	int_data = *(u32 *)(key_type + 1);
+	if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data))
+		return -EINVAL;
+
+	return 0;
+}
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index f6bb02f076ad..be0ed44d0887 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -7,96 +7,13 @@
 #include <linux/spinlock.h>
 #include <linux/bpf.h>
 #include <net/bpf_sk_storage.h>
+#include <linux/bpf_local_storage.h>
 #include <net/sock.h>
 #include <uapi/linux/sock_diag.h>
 #include <uapi/linux/btf.h>
 
-#define BPF_LOCAL_STORAGE_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_CLONE)
-
 DEFINE_BPF_STORAGE_CACHE(sk_cache);
 
-struct bpf_local_storage_map_bucket {
-	struct hlist_head list;
-	raw_spinlock_t lock;
-};
-
-/* Thp map is not the primary owner of a bpf_local_storage_elem.
- * Instead, the container object (eg. sk->sk_bpf_storage) is.
- *
- * The map (bpf_local_storage_map) is for two purposes
- * 1. Define the size of the "local storage".  It is
- *    the map's value_size.
- *
- * 2. Maintain a list to keep track of all elems such
- *    that they can be cleaned up during the map destruction.
- *
- * When a bpf local storage is being looked up for a
- * particular object,  the "bpf_map" pointer is actually used
- * as the "key" to search in the list of elem in
- * the respective bpf_local_storage owned by the object.
- *
- * e.g. sk->sk_bpf_storage is the mini-map with the "bpf_map" pointer
- * as the searching key.
- */
-struct bpf_local_storage_map {
-	struct bpf_map map;
-	/* Lookup elem does not require accessing the map.
-	 *
-	 * Updating/Deleting requires a bucket lock to
-	 * link/unlink the elem from the map.  Having
-	 * multiple buckets to improve contention.
-	 */
-	struct bpf_local_storage_map_bucket *buckets;
-	u32 bucket_log;
-	u16 elem_size;
-	u16 cache_idx;
-};
-
-struct bpf_local_storage_data {
-	/* smap is used as the searching key when looking up
-	 * from the object's bpf_local_storage.
-	 *
-	 * Put it in the same cacheline as the data to minimize
-	 * the number of cachelines access during the cache hit case.
-	 */
-	struct bpf_local_storage_map __rcu *smap;
-	u8 data[] __aligned(8);
-};
-
-/* Linked to bpf_local_storage and bpf_local_storage_map */
-struct bpf_local_storage_elem {
-	struct hlist_node map_node;	/* Linked to bpf_local_storage_map */
-	struct hlist_node snode;	/* Linked to bpf_local_storage */
-	struct bpf_local_storage __rcu *local_storage;
-	struct rcu_head rcu;
-	/* 8 bytes hole */
-	/* The data is stored in aother cacheline to minimize
-	 * the number of cachelines access during a cache hit.
-	 */
-	struct bpf_local_storage_data sdata ____cacheline_aligned;
-};
-
-#define SELEM(_SDATA)							\
-	container_of((_SDATA), struct bpf_local_storage_elem, sdata)
-#define SDATA(_SELEM) (&(_SELEM)->sdata)
-
-struct bpf_local_storage {
-	struct bpf_local_storage_data __rcu *cache[BPF_LOCAL_STORAGE_CACHE_SIZE];
-	struct hlist_head list; /* List of bpf_local_storage_elem */
-	void *owner;		/* The object that owns the the above "list" of
-				 * bpf_local_storage_elem.
-				 */
-	struct rcu_head rcu;
-	raw_spinlock_t lock;	/* Protect adding/removing from the "list" */
-};
-
-static struct bpf_local_storage_map_bucket *
-select_bucket(struct bpf_local_storage_map *smap,
-	      struct bpf_local_storage_elem *selem)
-{
-	return &smap->buckets[hash_ptr(selem, smap->bucket_log)];
-}
-
 static int omem_charge(struct sock *sk, unsigned int size)
 {
 	/* same check as in sock_kmalloc() */
@@ -109,31 +26,6 @@ static int omem_charge(struct sock *sk, unsigned int size)
 	return -ENOMEM;
 }
 
-static bool selem_linked_to_storage(const struct bpf_local_storage_elem *selem)
-{
-	return !hlist_unhashed(&selem->snode);
-}
-
-static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem)
-{
-	return !hlist_unhashed(&selem->map_node);
-}
-
-struct bpf_local_storage_elem *
-bpf_selem_alloc(struct bpf_local_storage_map *smap, void *value)
-{
-	struct bpf_local_storage_elem *selem;
-
-	selem = kzalloc(smap->elem_size, GFP_ATOMIC | __GFP_NOWARN);
-	if (selem) {
-		if (value)
-			memcpy(SDATA(selem)->data, value, smap->map.value_size);
-		return selem;
-	}
-
-	return NULL;
-}
-
 static struct bpf_local_storage_elem *
 sk_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value,
 	       bool charge_omem)
@@ -154,42 +46,6 @@ sk_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value,
 	return NULL;
 }
 
-/* sk_storage->lock must be held and selem->sk_storage == sk_storage.
- * The caller must ensure selem->smap is still valid to be
- * dereferenced for its smap->elem_size and smap->cache_idx.
- */
-bool bpf_selem_unlink_storage(struct bpf_local_storage *local_storage,
-			      struct bpf_local_storage_elem *selem,
-			      bool uncharge_omem)
-{
-	struct bpf_local_storage_map *smap;
-	bool free_local_storage;
-
-	smap = rcu_dereference(SDATA(selem)->smap);
-	/* local_storage is not freed now. local_storage->lock is
-	 * still held and raw_spin_unlock_bh(&local_storage->lock)
-	 * will be done by the caller.
-	 * Although the unlock will be done under
-	 * rcu_read_lock(),  it is more intutivie to
-	 * read if kfree_rcu(local_storage, rcu) is done
-	 * after the raw_spin_unlock_bh(&local_storage->lock).
-	 *
-	 * Hence, a "bool free_local_storage" is returned
-	 * to the caller which then calls the kfree_rcu()
-	 * after unlock.
-	 */
-	free_local_storage = smap->map.ops->map_local_storage_unlink(
-		local_storage, selem, uncharge_omem);
-	hlist_del_init_rcu(&selem->snode);
-	if (rcu_access_pointer(local_storage->cache[smap->cache_idx]) ==
-	    SDATA(selem))
-		RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL);
-
-	kfree_rcu(selem, rcu);
-
-	return free_local_storage;
-}
-
 static bool unlink_sk_storage(struct bpf_local_storage *local_storage,
 			      struct bpf_local_storage_elem *selem,
 			      bool uncharge_omem)
@@ -221,109 +77,6 @@ static bool unlink_sk_storage(struct bpf_local_storage *local_storage,
 	return free_local_storage;
 }
 
-static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem)
-{
-	struct bpf_local_storage *local_storage;
-	bool free_local_storage = false;
-
-	if (unlikely(!selem_linked_to_storage(selem)))
-		/* selem has already been unlinked from sk */
-		return;
-
-	local_storage = rcu_dereference(selem->local_storage);
-	raw_spin_lock_bh(&local_storage->lock);
-	if (likely(selem_linked_to_storage(selem)))
-		free_local_storage =
-			bpf_selem_unlink_storage(local_storage, selem, true);
-	raw_spin_unlock_bh(&local_storage->lock);
-
-	if (free_local_storage)
-		kfree_rcu(local_storage, rcu);
-}
-
-void bpf_selem_link_storage(struct bpf_local_storage *local_storage,
-			    struct bpf_local_storage_elem *selem)
-{
-	RCU_INIT_POINTER(selem->local_storage, local_storage);
-	hlist_add_head(&selem->snode, &local_storage->list);
-}
-
-void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
-{
-	struct bpf_local_storage_map *smap;
-	struct bpf_local_storage_map_bucket *b;
-
-	if (unlikely(!selem_linked_to_map(selem)))
-		/* selem has already be unlinked from smap */
-		return;
-
-	smap = rcu_dereference(SDATA(selem)->smap);
-	b = select_bucket(smap, selem);
-	raw_spin_lock_bh(&b->lock);
-	if (likely(selem_linked_to_map(selem)))
-		hlist_del_init_rcu(&selem->map_node);
-	raw_spin_unlock_bh(&b->lock);
-}
-
-void bpf_selem_link_map(struct bpf_local_storage_map *smap,
-			struct bpf_local_storage_elem *selem)
-{
-	struct bpf_local_storage_map_bucket *b = select_bucket(smap, selem);
-
-	raw_spin_lock_bh(&b->lock);
-	RCU_INIT_POINTER(SDATA(selem)->smap, smap);
-	hlist_add_head_rcu(&selem->map_node, &b->list);
-	raw_spin_unlock_bh(&b->lock);
-}
-
-void bpf_selem_unlink(struct bpf_local_storage_elem *selem)
-{
-	/* Always unlink from map before unlinking from local_storage
-	 * because selem will be freed after successfully unlinked from
-	 * the local_storage.
-	 */
-	bpf_selem_unlink_map(selem);
-	__bpf_selem_unlink_storage(selem);
-}
-
-struct bpf_local_storage_data *
-bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
-			 struct bpf_local_storage_map *smap,
-			 bool cacheit_lockit)
-{
-	struct bpf_local_storage_data *sdata;
-	struct bpf_local_storage_elem *selem;
-
-	/* Fast path (cache hit) */
-	sdata = rcu_dereference(local_storage->cache[smap->cache_idx]);
-	if (sdata && rcu_access_pointer(sdata->smap) == smap)
-		return sdata;
-
-	/* Slow path (cache miss) */
-	hlist_for_each_entry_rcu(selem, &local_storage->list, snode)
-		if (rcu_access_pointer(SDATA(selem)->smap) == smap)
-			break;
-
-	if (!selem)
-		return NULL;
-
-	sdata = SDATA(selem);
-	if (cacheit_lockit) {
-		/* spinlock is needed to avoid racing with the
-		 * parallel delete.  Otherwise, publishing an already
-		 * deleted sdata to the cache will become a use-after-free
-		 * problem in the next bpf_local_storage_lookup().
-		 */
-		raw_spin_lock_bh(&local_storage->lock);
-		if (selem_linked_to_storage(selem))
-			rcu_assign_pointer(local_storage->cache[smap->cache_idx],
-					   sdata);
-		raw_spin_unlock_bh(&local_storage->lock);
-	}
-
-	return sdata;
-}
-
 static struct bpf_local_storage_data *
 sk_storage_lookup(struct sock *sk, struct bpf_map *map, bool cacheit_lockit)
 {
@@ -338,34 +91,6 @@ sk_storage_lookup(struct sock *sk, struct bpf_map *map, bool cacheit_lockit)
 	return bpf_local_storage_lookup(sk_storage, smap, cacheit_lockit);
 }
 
-static int check_flags(const struct bpf_local_storage_data *old_sdata,
-		       u64 map_flags)
-{
-	if (old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_NOEXIST)
-		/* elem already exists */
-		return -EEXIST;
-
-	if (!old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_EXIST)
-		/* elem doesn't exist, cannot update it */
-		return -ENOENT;
-
-	return 0;
-}
-
-struct bpf_local_storage *
-bpf_local_storage_alloc(struct bpf_local_storage_map *smap)
-{
-	struct bpf_local_storage *storage;
-
-	storage = kzalloc(sizeof(*storage), GFP_ATOMIC | __GFP_NOWARN);
-	if (!storage)
-		return NULL;
-
-	INIT_HLIST_HEAD(&storage->list);
-	raw_spin_lock_init(&storage->lock);
-	return storage;
-}
-
 static int sk_storage_alloc(void *owner,
 			    struct bpf_local_storage_map *smap,
 			    struct bpf_local_storage_elem *first_selem)
@@ -402,142 +127,6 @@ static int sk_storage_alloc(void *owner,
 	return err;
 }
 
-/* sk cannot be going away because it is linking new elem
- * to sk->sk_bpf_storage. (i.e. sk->sk_refcnt cannot be 0).
- * Otherwise, it will become a leak (and other memory issues
- * during map destruction).
- */
-struct bpf_local_storage_data *
-bpf_local_storage_update(void *owner, struct bpf_map *map,
-			 struct bpf_local_storage *local_storage, void *value,
-			 u64 map_flags)
-{
-	struct bpf_local_storage_data *old_sdata = NULL;
-	struct bpf_local_storage_elem *selem;
-	struct bpf_local_storage_map *smap;
-	int err;
-
-	smap = (struct bpf_local_storage_map *)map;
-
-	if ((map_flags & BPF_F_LOCK) && !(map_flags & BPF_NOEXIST)) {
-		/* Hoping to find an old_sdata to do inline update
-		 * such that it can avoid taking the local_storage->lock
-		 * and changing the lists.
-		 */
-		old_sdata = bpf_local_storage_lookup(local_storage, smap, false);
-		err = check_flags(old_sdata, map_flags);
-		if (err)
-			return ERR_PTR(err);
-
-		if (old_sdata && selem_linked_to_storage(SELEM(old_sdata))) {
-			copy_map_value_locked(map, old_sdata->data,
-					      value, false);
-			return old_sdata;
-		}
-	}
-
-	raw_spin_lock_bh(&local_storage->lock);
-
-	/* Recheck local_storage->list under local_storage->lock */
-	if (unlikely(hlist_empty(&local_storage->list))) {
-		/* A parallel del is happening and local_storage is going
-		 * away.  It has just been checked before, so very
-		 * unlikely.  Return instead of retry to keep things
-		 * simple.
-		 */
-		err = -EAGAIN;
-		goto unlock_err;
-	}
-
-	old_sdata = bpf_local_storage_lookup(local_storage, smap, false);
-	err = check_flags(old_sdata, map_flags);
-	if (err)
-		goto unlock_err;
-
-	if (old_sdata && (map_flags & BPF_F_LOCK)) {
-		copy_map_value_locked(map, old_sdata->data, value, false);
-		selem = SELEM(old_sdata);
-		goto unlock;
-	}
-
-	/* local_storage->lock is held.  Hence, we are sure
-	 * we can unlink and uncharge the old_sdata successfully
-	 * later.  Hence, instead of charging the new selem now
-	 * and then uncharge the old selem later (which may cause
-	 * a potential but unnecessary charge failure),  avoid taking
-	 * a charge at all here (the "!old_sdata" check) and the
-	 * old_sdata will not be uncharged later during bpf_selem_unlink().
-	 */
-	selem = map->ops->map_selem_alloc(smap, owner, value, !old_sdata);
-	if (!selem) {
-		err = -ENOMEM;
-		goto unlock_err;
-	}
-
-	/* First, link the new selem to the map */
-	bpf_selem_link_map(smap, selem);
-
-	/* Second, link (and publish) the new selem to local_storage */
-	bpf_selem_link_storage(local_storage, selem);
-
-	/* Third, remove old selem, SELEM(old_sdata) */
-	if (old_sdata) {
-		bpf_selem_unlink_map(SELEM(old_sdata));
-		bpf_selem_unlink_storage(local_storage, SELEM(old_sdata), false);
-	}
-
-unlock:
-	raw_spin_unlock_bh(&local_storage->lock);
-	return SDATA(selem);
-
-unlock_err:
-	raw_spin_unlock_bh(&local_storage->lock);
-	return ERR_PTR(err);
-}
-
-int bpf_local_storage_check_update_flags(struct bpf_map *map, u64 map_flags)
-{
-	/* BPF_EXIST and BPF_NOEXIST cannot be both set */
-	if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) ||
-	    /* BPF_F_LOCK can only be used in a value with spin_lock */
-	    unlikely((map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map)))
-		return -EINVAL;
-
-	return 0;
-}
-
-/* Publish local_storage to the address.  This is used because we are already
- * in a region where we cannot grab a lock on the object owning the storage (
- * (e.g sk->sk_lock). Hence, atomic ops is used.
- *
- * From now on, the addr pointer is protected
- * by the local_storage->lock.  Hence, upon freeing,
- * the local_storage->lock must be held before unlinking the storage from the
- * owner.
- */
-int bpf_local_storage_publish(struct bpf_local_storage_elem *first_selem,
-			      struct bpf_local_storage **addr,
-			      struct bpf_local_storage *curr)
-{
-	struct bpf_local_storage *prev;
-
-	prev = cmpxchg(addr, NULL, curr);
-	if (unlikely(prev)) {
-		/* Note that even first_selem was linked to smap's
-		 * bucket->list, first_selem can be freed immediately
-		 * (instead of kfree_rcu) because
-		 * bpf_local_storage_map_free() does a
-		 * synchronize_rcu() before walking the bucket->list.
-		 * Hence, no one is accessing selem from the
-		 * bucket->list under rcu_read_lock().
-		 */
-		bpf_selem_unlink_map(first_selem);
-		return -EAGAIN;
-	}
-
-	return 0;
-}
-
 static struct bpf_local_storage_data *
 sk_storage_update(void *owner, struct bpf_map *map, void *value, u64 map_flags)
 {
@@ -589,38 +178,6 @@ static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
 	return 0;
 }
 
-u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache)
-{
-	u64 min_usage = U64_MAX;
-	u16 i, res = 0;
-
-	spin_lock(&cache->idx_lock);
-
-	for (i = 0; i < BPF_LOCAL_STORAGE_CACHE_SIZE; i++) {
-		if (cache->idx_usage_counts[i] < min_usage) {
-			min_usage = cache->idx_usage_counts[i];
-			res = i;
-
-			/* Found a free cache_idx */
-			if (!min_usage)
-				break;
-		}
-	}
-	cache->idx_usage_counts[res]++;
-
-	spin_unlock(&cache->idx_lock);
-
-	return res;
-}
-
-void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache,
-				      u16 idx)
-{
-	spin_lock(&cache->idx_lock);
-	cache->idx_usage_counts[idx]--;
-	spin_unlock(&cache->idx_lock);
-}
-
 /* Called by __sk_destruct() & bpf_sk_storage_clone() */
 void bpf_sk_storage_free(struct sock *sk)
 {
@@ -661,59 +218,6 @@ void bpf_sk_storage_free(struct sock *sk)
 		kfree_rcu(sk_storage, rcu);
 }
 
-void bpf_local_storage_map_free(struct bpf_local_storage_map *smap)
-{
-	struct bpf_local_storage_elem *selem;
-	struct bpf_local_storage_map_bucket *b;
-	unsigned int i;
-
-	/* Note that this map might be concurrently cloned from
-	 * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone
-	 * RCU read section to finish before proceeding. New RCU
-	 * read sections should be prevented via bpf_map_inc_not_zero.
-	 */
-	synchronize_rcu();
-
-	/* bpf prog and the userspace can no longer access this map
-	 * now.  No new selem (of this map) can be added
-	 * to the bpf_local_storage or to the map bucket's list.
-	 *
-	 * The elem of this map can be cleaned up here
-	 * or by bpf_local_storage_free() during the destruction of the
-	 * owner object. eg. __sk_destruct.
-	 */
-	for (i = 0; i < (1U << smap->bucket_log); i++) {
-		b = &smap->buckets[i];
-
-		rcu_read_lock();
-		/* No one is adding to b->list now */
-		while ((selem = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(&b->list)),
-						 struct bpf_local_storage_elem,
-						 map_node))) {
-			bpf_selem_unlink(selem);
-			cond_resched_rcu();
-		}
-		rcu_read_unlock();
-	}
-
-	/* bpf_local_storage_free() may still need to access the map.
-	 * e.g. bpf_local_storage_free() has unlinked selem from the map
-	 * which then made the above while((selem = ...)) loop
-	 * exited immediately.
-	 *
-	 * However, the bpf_local_storage_free() still needs to access
-	 * the smap->elem_size to do the uncharging in
-	 * bpf_selem_unlink().
-	 *
-	 * Hence, wait another rcu grace period for the
-	 * bpf_local_storage_free() to finish.
-	 */
-	synchronize_rcu();
-
-	kvfree(smap->buckets);
-	kfree(smap);
-}
-
 static void sk_storage_map_free(struct bpf_map *map)
 {
 	struct bpf_local_storage_map *smap;
@@ -723,78 +227,6 @@ static void sk_storage_map_free(struct bpf_map *map)
 	bpf_local_storage_map_free(smap);
 }
 
-/* U16_MAX is much more than enough for sk local storage
- * considering a tcp_sock is ~2k.
- */
-#define BPF_LOCAL_STORAGE_MAX_VALUE_SIZE				\
-	min_t(u32,							\
-	      (KMALLOC_MAX_SIZE - MAX_BPF_STACK -			\
-	       sizeof(struct bpf_local_storage_elem)),			\
-	      (U16_MAX - sizeof(struct bpf_local_storage_elem)))
-
-int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
-{
-	if (attr->map_flags & ~BPF_LOCAL_STORAGE_CREATE_FLAG_MASK ||
-	    !(attr->map_flags & BPF_F_NO_PREALLOC) ||
-	    attr->max_entries ||
-	    attr->key_size != sizeof(int) || !attr->value_size ||
-	    /* Enforce BTF for userspace sk dumping */
-	    !attr->btf_key_type_id || !attr->btf_value_type_id)
-		return -EINVAL;
-
-	if (!bpf_capable())
-		return -EPERM;
-
-	if (attr->value_size > BPF_LOCAL_STORAGE_MAX_VALUE_SIZE)
-		return -E2BIG;
-
-	return 0;
-}
-
-struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
-{
-	struct bpf_local_storage_map *smap;
-	unsigned int i;
-	u32 nbuckets;
-	u64 cost;
-	int ret;
-
-	smap = kzalloc(sizeof(*smap), GFP_USER | __GFP_NOWARN);
-	if (!smap)
-		return ERR_PTR(-ENOMEM);
-	bpf_map_init_from_attr(&smap->map, attr);
-
-	nbuckets = roundup_pow_of_two(num_possible_cpus());
-	/* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */
-	nbuckets = max_t(u32, 2, nbuckets);
-	smap->bucket_log = ilog2(nbuckets);
-	cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap);
-
-	ret = bpf_map_charge_init(&smap->map.memory, cost);
-	if (ret < 0) {
-		kfree(smap);
-		return ERR_PTR(ret);
-	}
-
-	smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
-				 GFP_USER | __GFP_NOWARN);
-	if (!smap->buckets) {
-		bpf_map_charge_finish(&smap->map.memory);
-		kfree(smap);
-		return ERR_PTR(-ENOMEM);
-	}
-
-	for (i = 0; i < nbuckets; i++) {
-		INIT_HLIST_HEAD(&smap->buckets[i].list);
-		raw_spin_lock_init(&smap->buckets[i].lock);
-	}
-
-	smap->elem_size =
-		sizeof(struct bpf_local_storage_elem) + attr->value_size;
-
-	return smap;
-}
-
 static struct bpf_map *sk_storage_map_alloc(union bpf_attr *attr)
 {
 	struct bpf_local_storage_map *smap;
@@ -813,23 +245,6 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key,
 	return -ENOTSUPP;
 }
 
-int bpf_local_storage_map_check_btf(const struct bpf_map *map,
-				    const struct btf *btf,
-				    const struct btf_type *key_type,
-				    const struct btf_type *value_type)
-{
-	u32 int_data;
-
-	if (BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
-		return -EINVAL;
-
-	int_data = *(u32 *)(key_type + 1);
-	if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data))
-		return -EINVAL;
-
-	return 0;
-}
-
 static void *bpf_fd_sk_storage_lookup_elem(struct bpf_map *map, void *key)
 {
 	struct bpf_local_storage_data *sdata;
-- 
2.28.0.rc0.105.gf9edc3c819-goog


^ permalink raw reply related

* KASAN: slab-out-of-bounds Read in vsscanf (2)
From: syzbot @ 2020-07-22 18:32 UTC (permalink / raw)
  To: casey, jmorris, linux-kernel, linux-security-module, serge,
	syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    f932d58a Merge tag 'scsi-fixes' of git://git.kernel.org/pu..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10337320900000
kernel config:  https://syzkaller.appspot.com/x/.config?x=e944500a36bc4d55
dashboard link: https://syzkaller.appspot.com/bug?extid=a22c6092d003d6fe1122
compiler:       clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15ce6d7d100000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1207c827100000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+a22c6092d003d6fe1122@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: slab-out-of-bounds in vsscanf+0x2666/0x2ef0 lib/vsprintf.c:3321
Read of size 1 at addr ffff888097d682b8 by task syz-executor980/6804

CPU: 0 PID: 6804 Comm: syz-executor980 Not tainted 5.8.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1f0/0x31e lib/dump_stack.c:118
 print_address_description+0x66/0x5a0 mm/kasan/report.c:383
 __kasan_report mm/kasan/report.c:513 [inline]
 kasan_report+0x132/0x1d0 mm/kasan/report.c:530
 vsscanf+0x2666/0x2ef0 lib/vsprintf.c:3321
 sscanf+0x6c/0x90 lib/vsprintf.c:3527
 smk_set_cipso+0x374/0x6c0 security/smack/smackfs.c:908
 vfs_write+0x2dd/0xc70 fs/read_write.c:576
 ksys_write+0x11b/0x220 fs/read_write.c:631
 do_syscall_64+0x73/0xe0 arch/x86/entry/common.c:384
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x4402d9
Code: Bad RIP value.
RSP: 002b:00007ffe89010db8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004402d9
RDX: 0000000000000037 RSI: 0000000020000040 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 0000000000000014 R09: 00000000004002c8
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401ae0
R13: 0000000000401b70 R14: 0000000000000000 R15: 0000000000000000

Allocated by task 6804:
 save_stack mm/kasan/common.c:48 [inline]
 set_track mm/kasan/common.c:56 [inline]
 __kasan_kmalloc+0x103/0x140 mm/kasan/common.c:494
 __do_kmalloc mm/slab.c:3656 [inline]
 __kmalloc_track_caller+0x249/0x320 mm/slab.c:3671
 memdup_user_nul+0x26/0xf0 mm/util.c:259
 smk_set_cipso+0xff/0x6c0 security/smack/smackfs.c:859
 vfs_write+0x2dd/0xc70 fs/read_write.c:576
 ksys_write+0x11b/0x220 fs/read_write.c:631
 do_syscall_64+0x73/0xe0 arch/x86/entry/common.c:384
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Freed by task 4906:
 save_stack mm/kasan/common.c:48 [inline]
 set_track mm/kasan/common.c:56 [inline]
 kasan_set_free_info mm/kasan/common.c:316 [inline]
 __kasan_slab_free+0x114/0x170 mm/kasan/common.c:455
 __cache_free mm/slab.c:3426 [inline]
 kfree+0x10a/0x220 mm/slab.c:3757
 tomoyo_path_number_perm+0x525/0x690 security/tomoyo/file.c:736
 tomoyo_path_mknod+0x128/0x150 security/tomoyo/tomoyo.c:240
 security_path_mknod+0xdc/0x160 security/security.c:1077
 may_o_create fs/namei.c:2919 [inline]
 lookup_open fs/namei.c:3060 [inline]
 open_last_lookups fs/namei.c:3169 [inline]
 path_openat+0xbe8/0x37f0 fs/namei.c:3357
 do_filp_open+0x191/0x3a0 fs/namei.c:3387
 do_sys_openat2+0x463/0x770 fs/open.c:1179
 do_sys_open fs/open.c:1195 [inline]
 ksys_open include/linux/syscalls.h:1388 [inline]
 __do_sys_open fs/open.c:1201 [inline]
 __se_sys_open fs/open.c:1199 [inline]
 __x64_sys_open+0x1af/0x1e0 fs/open.c:1199
 do_syscall_64+0x73/0xe0 arch/x86/entry/common.c:384
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The buggy address belongs to the object at ffff888097d68280
 which belongs to the cache kmalloc-64 of size 64
The buggy address is located 56 bytes inside of
 64-byte region [ffff888097d68280, ffff888097d682c0)
The buggy address belongs to the page:
page:ffffea00025f5a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888097d68c80
flags: 0xfffe0000000200(slab)
raw: 00fffe0000000200 ffffea000288fe08 ffffea00026f38c8 ffff8880aa400380
raw: ffff888097d68c80 ffff888097d68000 000000010000001e 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff888097d68180: 00 00 00 00 01 fc fc fc fc fc fc fc fc fc fc fc
 ffff888097d68200: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>ffff888097d68280: 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fc
                                        ^
 ffff888097d68300: 00 00 00 00 01 fc fc fc fc fc fc fc fc fc fc fc
 ffff888097d68380: 00 00 00 00 01 fc fc fc fc fc fc fc fc fc fc fc
==================================================================


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH v6 5/7] fs,doc: Enable to enforce noexec mounts or file exec through O_MAYEXEC
From: Mickaël Salaün @ 2020-07-22 19:04 UTC (permalink / raw)
  To: Thibaut Sautereau, Kees Cook
  Cc: linux-kernel, Aleksa Sarai, Alexei Starovoitov, Al Viro,
	Andrew Morton, Andy Lutomirski, Christian Brauner,
	Christian Heimes, Daniel Borkmann, Deven Bowers, Dmitry Vyukov,
	Eric Biggers, Eric Chiang, Florian Weimer, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Lakshmi Ramasubramanian,
	Matthew Garrett, Matthew Wilcox, Michael Kerrisk,
	Mickaël Salaün, Mimi Zohar, Philippe Trébuchet,
	Scott Shell, Sean Christopherson, Shuah Khan, Steve Dower,
	Steve Grubb, Tetsuo Handa, Thibaut Sautereau, Vincent Strubel,
	kernel-hardening, linux-api, linux-integrity,
	linux-security-module, linux-fsdevel
In-Reply-To: <20200722161639.GA24129@gandi.net>


On 22/07/2020 18:16, Thibaut Sautereau wrote:
> On Thu, Jul 16, 2020 at 04:39:14PM +0200, Mickaël Salaün wrote:
>>
>> On 15/07/2020 22:37, Kees Cook wrote:
>>> On Tue, Jul 14, 2020 at 08:16:36PM +0200, Mickaël Salaün wrote:
>>>> @@ -2849,7 +2855,7 @@ static int may_open(const struct path *path, int acc_mode, int flag)
>>>>  	case S_IFLNK:
>>>>  		return -ELOOP;
>>>>  	case S_IFDIR:
>>>> -		if (acc_mode & (MAY_WRITE | MAY_EXEC))
>>>> +		if (acc_mode & (MAY_WRITE | MAY_EXEC | MAY_OPENEXEC))
>>>>  			return -EISDIR;
>>>>  		break;
>>>
>>> (I need to figure out where "open for reading" rejects S_IFDIR, since
>>> it's clearly not here...)
> 
> Doesn't it come from generic_read_dir() in fs/libfs.c?
> 
>>>
>>>>  	case S_IFBLK:
>>>> @@ -2859,13 +2865,26 @@ static int may_open(const struct path *path, int acc_mode, int flag)
>>>>  		fallthrough;
>>>>  	case S_IFIFO:
>>>>  	case S_IFSOCK:
>>>> -		if (acc_mode & MAY_EXEC)
>>>> +		if (acc_mode & (MAY_EXEC | MAY_OPENEXEC))
>>>>  			return -EACCES;
>>>>  		flag &= ~O_TRUNC;
>>>>  		break;
>>>
>>> This will immediately break a system that runs code with MAY_OPENEXEC
>>> set but reads from a block, char, fifo, or socket, even in the case of
>>> a sysadmin leaving the "file" sysctl disabled.
>>
>> As documented, O_MAYEXEC is for regular files. The only legitimate use
>> case seems to be with pipes, which should probably be allowed when
>> enforcement is disabled.
> 
> By the way Kees, while we fix that for the next series, do you think it
> would be relevant, at least for the sake of clarity, to add a
> WARN_ON_ONCE(acc_mode & MAY_OPENEXEC) for the S_IFSOCK case, since a
> socket cannot be opened anyway?
> 

We just did some more tests (for the next patch series) and it turns out
that may_open() can return EACCES before another part returns ENXIO.

As a reminder, the next series will deny access to block devices,
character devices, FIFOs and sockets when they are opened with O_MAYEXEC
*and* a policy is enforced (via the sysctl).

The question is then: do we prefer to return EACCES when a policy is
enforced (on a socket), or do we stick to ENXIO? The EACCES approach
will be more consistent with devices and fifo handling, and seems safer
(belt and suspenders) though.

^ permalink raw reply

* Re: [PATCH v6 5/7] fs,doc: Enable to enforce noexec mounts or file exec through O_MAYEXEC
From: Kees Cook @ 2020-07-22 19:40 UTC (permalink / raw)
  To: Mickaël Salaün
  Cc: Thibaut Sautereau, linux-kernel, Aleksa Sarai, Alexei Starovoitov,
	Al Viro, Andrew Morton, Andy Lutomirski, Christian Brauner,
	Christian Heimes, Daniel Borkmann, Deven Bowers, Dmitry Vyukov,
	Eric Biggers, Eric Chiang, Florian Weimer, James Morris, Jan Kara,
	Jann Horn, Jonathan Corbet, Lakshmi Ramasubramanian,
	Matthew Garrett, Matthew Wilcox, Michael Kerrisk,
	Mickaël Salaün, Mimi Zohar, Philippe Trébuchet,
	Scott Shell, Sean Christopherson, Shuah Khan, Steve Dower,
	Steve Grubb, Tetsuo Handa, Thibaut Sautereau, Vincent Strubel,
	kernel-hardening, linux-api, linux-integrity,
	linux-security-module, linux-fsdevel
In-Reply-To: <efb88aab-f9f9-4b66-e7ab-3aa054eec96e@digikod.net>

On Wed, Jul 22, 2020 at 09:04:28PM +0200, Mickaël Salaün wrote:
> 
> On 22/07/2020 18:16, Thibaut Sautereau wrote:
> > On Thu, Jul 16, 2020 at 04:39:14PM +0200, Mickaël Salaün wrote:
> >>
> >> On 15/07/2020 22:37, Kees Cook wrote:
> >>> On Tue, Jul 14, 2020 at 08:16:36PM +0200, Mickaël Salaün wrote:
> >>>> @@ -2849,7 +2855,7 @@ static int may_open(const struct path *path, int acc_mode, int flag)
> >>>>  	case S_IFLNK:
> >>>>  		return -ELOOP;
> >>>>  	case S_IFDIR:
> >>>> -		if (acc_mode & (MAY_WRITE | MAY_EXEC))
> >>>> +		if (acc_mode & (MAY_WRITE | MAY_EXEC | MAY_OPENEXEC))
> >>>>  			return -EISDIR;
> >>>>  		break;
> >>>
> >>> (I need to figure out where "open for reading" rejects S_IFDIR, since
> >>> it's clearly not here...)
> > 
> > Doesn't it come from generic_read_dir() in fs/libfs.c?
> > 
> >>>
> >>>>  	case S_IFBLK:
> >>>> @@ -2859,13 +2865,26 @@ static int may_open(const struct path *path, int acc_mode, int flag)
> >>>>  		fallthrough;
> >>>>  	case S_IFIFO:
> >>>>  	case S_IFSOCK:
> >>>> -		if (acc_mode & MAY_EXEC)
> >>>> +		if (acc_mode & (MAY_EXEC | MAY_OPENEXEC))
> >>>>  			return -EACCES;
> >>>>  		flag &= ~O_TRUNC;
> >>>>  		break;
> >>>
> >>> This will immediately break a system that runs code with MAY_OPENEXEC
> >>> set but reads from a block, char, fifo, or socket, even in the case of
> >>> a sysadmin leaving the "file" sysctl disabled.
> >>
> >> As documented, O_MAYEXEC is for regular files. The only legitimate use
> >> case seems to be with pipes, which should probably be allowed when
> >> enforcement is disabled.
> > 
> > By the way Kees, while we fix that for the next series, do you think it
> > would be relevant, at least for the sake of clarity, to add a
> > WARN_ON_ONCE(acc_mode & MAY_OPENEXEC) for the S_IFSOCK case, since a
> > socket cannot be opened anyway?

If it's a state that userspace should never be able to reach, then yes,
I think a WARN_ON_ONCE() would be nice.

> We just did some more tests (for the next patch series) and it turns out
> that may_open() can return EACCES before another part returns ENXIO.
> 
> As a reminder, the next series will deny access to block devices,
> character devices, FIFOs and sockets when they are opened with O_MAYEXEC
> *and* a policy is enforced (via the sysctl).
> 
> The question is then: do we prefer to return EACCES when a policy is
> enforced (on a socket), or do we stick to ENXIO? The EACCES approach
> will be more consistent with devices and fifo handling, and seems safer
> (belt and suspenders) though.

I think EACCES is correct for these cases, since it's a new flag, etc.

-- 
Kees Cook

^ permalink raw reply

* Re: [PATCH v2] KEYS: remove redundant memset
From: Joe Perches @ 2020-07-22 20:02 UTC (permalink / raw)
  To: trix, dhowells, jarkko.sakkinen, jmorris, serge, denkenz, marcel
  Cc: keyrings, linux-security-module, linux-kernel
In-Reply-To: <20200722134610.31947-1-trix@redhat.com>

On Wed, 2020-07-22 at 06:46 -0700, trix@redhat.com wrote:
> From: Tom Rix <trix@redhat.com>
> 
> Reviewing use of memset in keyctl_pkey.c
> 
> keyctl_pkey_params_get has prologue code to set params up:
> 
> 	memset(params, 0, sizeof(*params));
> 	params->encoding = "raw";
> 
> keyctl_pkey_query has the same prologue
> and calls keyctl_pkey_params_get.
> 
> So remove the prologue.
> 
> Fixes: 00d60fd3b932 ("KEYS: Provide keyctls to drive the new key type ops for asymmetric keys [ver #2]")

At best, this is a micro optimization.

How is this appropriate for a Fixes: line?

> diff --git a/security/keys/keyctl_pkey.c b/security/keys/keyctl_pkey.c
[]
> @@ -166,8 +166,6 @@ long keyctl_pkey_query(key_serial_t id,
>  	struct kernel_pkey_query res;
>  	long ret;
>  
> -	memset(&params, 0, sizeof(params));
> -
>  	ret = keyctl_pkey_params_get(id, _info, &params);
>  	if (ret < 0)
>  		goto error;
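
To make the point concrete, here is a minimal user-space sketch of the
pattern described in the commit message. The struct and the sketch_*
function names are hypothetical stand-ins, not the kernel code; the only
thing taken from the patch is that the helper starts with a memset() and
sets the encoding to "raw".

```c
#include <string.h>

/* Hypothetical stand-in for the kernel's params structure. */
struct sketch_params {
	const char *encoding;
	int in_len;
	int out_len;
};

/* Helper: fully initialises *params before doing anything else. */
static int sketch_params_get(struct sketch_params *params)
{
	memset(params, 0, sizeof(*params));
	params->encoding = "raw";
	return 0;
}

/* Caller before the patch: clears the struct, then calls the helper. */
static int sketch_query_before(struct sketch_params *params)
{
	memset(params, 0, sizeof(*params));	/* redundant */
	return sketch_params_get(params);
}

/* Caller after the patch: relies on the helper's own initialisation. */
static int sketch_query_after(struct sketch_params *params)
{
	return sketch_params_get(params);
}
```

Both callers leave the structure in the same state, so dropping the extra
memset() changes no behaviour; it only saves one clear of the structure.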


^ permalink raw reply
