public inbox for linux-unionfs@vger.kernel.org
From: Christian Brauner <brauner@kernel.org>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>,
	hu1.chen@intel.com, miklos@szeredi.hu, malini.bhandaru@intel.com,
	tim.c.chen@intel.com, mikko.ylinen@intel.com,
	lizhen.you@intel.com, linux-unionfs@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	David Howells <dhowells@redhat.com>,
	Seth Forshee <sforshee@kernel.org>
Subject: Re: [RFC] HACK: overlayfs: Optimize overlay/restore creds
Date: Tue, 19 Dec 2023 14:35:27 +0100
Message-ID: <20231219-marken-pochen-26d888fb9bb9@brauner>
In-Reply-To: <CAOQ4uxibYMQw0iszKhE5uxBnyayHWjqp4ZnOOiugO3GxMRS1eA@mail.gmail.com>

On Tue, Dec 19, 2023 at 09:15:52AM +0200, Amir Goldstein wrote:
> On Mon, Dec 18, 2023 at 11:57 PM Vinicius Costa Gomes
> <vinicius.gomes@intel.com> wrote:
> >
> > Christian Brauner <brauner@kernel.org> writes:
> >
> > >> > Yes, the important thing is that an object cannot change
> > >> > its non_refcount property during its lifetime -
> > >>
> > >> ... which means that put_creds_ref() should assert that
> > >> there is only a single refcount - the one handed out by
> > >> prepare_creds_ref() before removing non_refcount or
> > >> directly freeing the cred object.
> > >>
> > >> I must say that the semantics of making a non-refcounted copy
> > >> to an object whose lifetime is managed by the caller sounds a lot
> > >> less confusing to me.
> > >
> > > So can't we do an override_creds() variant that is effectively just:
> 
> Yes, I think that we can....
> 
> > >
> > > /* caller guarantees lifetime of @new */
> > > const struct cred *foo_override_cred(const struct cred *new)
> > > {
> > >       const struct cred *old = current->cred;
> > >       rcu_assign_pointer(current->cred, new);
> > >       return old;
> > > }
> > >
> > > /* caller guarantees lifetime of @old */
> > > void foo_revert_creds(const struct cred *old)
> > > {
> > >       const struct cred *override = current->cred;
> > >       rcu_assign_pointer(current->cred, old);
> > > }
> > >
> 
> Even better(?), we can do this in the actual guard helpers to
> discourage use without a guard:
> 
> struct override_cred {
>         struct cred *cred;
> };
> 
> DEFINE_GUARD(override_cred, struct override_cred *,
>             override_cred_save(_T),
>             override_cred_restore(_T));
> 
> ...
> 
> void override_cred_save(struct override_cred *new)
> {
>         new->cred = rcu_replace_pointer(current->cred, new->cred, true);
> }
> 
> void override_cred_restore(struct override_cred *old)
> {
>         rcu_assign_pointer(current->cred, old->cred);
> }

The main thing we want is that it's somewhat clear that this is a
special-purpose interface. (Sometimes I jokingly feel we should have
include/linux/quirky_overlayfs_helpers.h, or actually working
module-specific exports so we can export a helper to only a single
module. Whatever happened to that?)

If you do the cred guard thing then maybe name it:

{override,revert}_cred_light()

and then use them to implement the replace portion for:

{override,revert}_cred().

Yes, the {override,revert}_cred() naming isn't optimal but unless we
rename them as well to *_{save,restore} I don't see the point in making
the new helpers deviate from that pattern. They basically do the same
thing.

So my point is to just let them mirror the naming in __fget_light().
To a regular VFS developer the *_light() will give away that it probably
doesn't take a reference.

But I'm not married to that.

So I'd probably just do something like the following COMPLETELY UNTESTED
AND UNCOMPILED thing:

diff --git a/include/linux/cred.h b/include/linux/cred.h
index 2976f534a7a3..c975eb47e691 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -165,6 +165,24 @@ extern int cred_fscmp(const struct cred *, const struct cred *);
 extern void __init cred_init(void);
 extern int set_cred_ucounts(struct cred *);

+/*
+ * Override creds without bumping reference count. Caller must ensure
+ * reference remains valid or has taken reference. Almost always not the
+ * interface you want. Use override_creds()/revert_creds() instead.
+ */
+#define override_creds_light(override_cred)                       \
+       ({                                                        \
+               const struct cred *__old_cred = current->cred;    \
+               rcu_assign_pointer(current->cred, override_cred); \
+               __old_cred;                                       \
+       })
+
+#define revert_creds_light(revert_cred) \
+       rcu_assign_pointer(current->cred, revert_cred)
+
+DEFINE_GUARD(cred, struct cred *, override_creds_light(_T),
+            revert_creds_light(_T));
+
 static inline bool cap_ambient_invariant_ok(const struct cred *cred)
 {
        return cap_issubset(cred->cap_ambient,
diff --git a/kernel/cred.c b/kernel/cred.c
index c033a201c808..d6713edaee37 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -485,7 +485,7 @@ EXPORT_SYMBOL(abort_creds);
  */
 const struct cred *override_creds(const struct cred *new)
 {
-       const struct cred *old = current->cred;
+       const struct cred *old;

        kdebug("override_creds(%p{%ld})", new,
               atomic_long_read(&new->usage));
@@ -499,8 +499,7 @@ const struct cred *override_creds(const struct cred *new)
         * visible to other threads under RCU.
         */
        get_new_cred((struct cred *)new);
-       rcu_assign_pointer(current->cred, new);
-
+       old = override_creds_light(new);
        kdebug("override_creds() = %p{%ld}", old,
               atomic_long_read(&old->usage));
        return old;
@@ -521,7 +520,7 @@ void revert_creds(const struct cred *old)
        kdebug("revert_creds(%p{%ld})", old,
               atomic_long_read(&old->usage));

-       rcu_assign_pointer(current->cred, old);
+       revert_creds_light(old);
        put_cred(override);
 }
 EXPORT_SYMBOL(revert_creds);
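
To make the layering in the diff above easier to follow, here is a
minimal userspace sketch of the same idea: the refcounted helpers take
or drop a reference and delegate the pointer swap to the light helpers.
Everything named model_* is a hypothetical stand-in invented for
illustration (a plain long instead of the atomic usage counter, a plain
pointer instead of the RCU-protected current->cred); it is not kernel
code and says nothing about what the final patch should look like.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of this is kernel code. */
struct model_cred {
	long usage;			/* models the cred reference count */
};

struct model_task {
	const struct model_cred *cred;	/* models current->cred */
};

/* Swap the pointer only; the caller guarantees the lifetime of @new. */
static const struct model_cred *
model_override_creds_light(struct model_task *task,
			   const struct model_cred *new)
{
	const struct model_cred *old = task->cred;

	task->cred = new;
	return old;
}

/* Restore the pointer only; no reference is dropped. */
static void model_revert_creds_light(struct model_task *task,
				     const struct model_cred *old)
{
	task->cred = old;
}

/* Refcounted variant: take a reference, then reuse the light helper. */
static const struct model_cred *
model_override_creds(struct model_task *task, const struct model_cred *new)
{
	((struct model_cred *)new)->usage++;
	return model_override_creds_light(task, new);
}

/* Refcounted variant: restore, then drop the override's reference. */
static void model_revert_creds(struct model_task *task,
			       const struct model_cred *old)
{
	const struct model_cred *override = task->cred;

	model_revert_creds_light(task, old);
	((struct model_cred *)override)->usage--;
}
```

In this model the light pair leaves usage untouched, which is the whole
point: a caller that already holds a long-lived reference pays only for
the pointer swap.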

> 
> > > Maybe I really fail to understand this problem or the proposed solution:
> > > the single reference that overlayfs keeps in ovl->creator_cred is tied
> > > to the lifetime of the overlayfs superblock, no? And anyone who needs a
> > > long-term cred reference, e.g., file->f_cred, will take its own reference
> > > anyway. So it should be safe to just keep that reference alive until
> > > overlayfs is unmounted, no? I'm sure it's something quite obvious why
> > > that doesn't work but I'm just not seeing it currently.
> >
> > My read of the code says that what you are proposing should work. (what
> > I am seeing is that in the "optimized" cases, the only practical effect
> > of override/revert is the rcu_assign_pointer() dance)
> >
> > I guess that the question becomes: Do we want this property (that the
> > 'cred' associated with a superblock/similar is long lived and the
> > "inner" refcount can be omitted) to be encoded in the constructor? Or do
> > we want it to be "encoded" on a call-by-call basis?
> >
> 
> Neither.
> 
> Christian's proposal does not involve marking the cred object as
> long lived, which looks like a much better idea to me.
> 
> The performance issues you observed are (probably) due to get/put
> of cred refcount in the helpers {override,revert}_creds().

Most likely they are. I don't see what else would be expensive. But I
may lack details.

> 
> Christian suggested lightweight variants of {override,revert}_creds()
> that do not change refcount. Combine those with a guard and
> I don't see what can go wrong (TM).

Place a nice comment explaining lifetime expectations in the commit
message. Then someone can always tell us why we're wrong.
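
For readers less familiar with guards, the following is a rough
userspace sketch of the scope-based pattern being discussed, built on
the GCC/Clang cleanup attribute (the same compiler feature the kernel's
guard helpers rely on). All model_* names and the MODEL_WITH_CRED macro
are hypothetical and exist only for this illustration. The sketch
assumes the caller keeps the overriding cred alive for the whole scope,
which is exactly the lifetime expectation that would need documenting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of this is kernel code. */
struct model_cred {
	int id;
};

/* Models current->cred for a single thread. */
static const struct model_cred *model_current_cred;

/* The guard remembers the previous pointer so it can be restored. */
struct model_cred_guard {
	const struct model_cred *old;
};

/* Install @new without taking a reference; the caller owns its lifetime. */
static struct model_cred_guard
model_cred_guard_enter(const struct model_cred *new)
{
	struct model_cred_guard g = { .old = model_current_cred };

	model_current_cred = new;
	return g;
}

/* Called automatically when the guard variable goes out of scope. */
static void model_cred_guard_exit(struct model_cred_guard *g)
{
	model_current_cred = g->old;
}

#define MODEL_WITH_CRED(new)						\
	struct model_cred_guard model_guard_				\
		__attribute__((cleanup(model_cred_guard_exit), unused)) = \
		model_cred_guard_enter(new)

/* Every return path below restores the previous cred pointer. */
static int model_do_work(const struct model_cred *creator, int fail_early)
{
	MODEL_WITH_CRED(creator);

	if (fail_early)
		return -1;

	assert(model_current_cred == creator);
	return model_current_cred->id;
}
```

The early-return path in model_do_work() is where a guard helps most: a
forgotten revert on an error path cannot happen because the compiler
emits the restore for every exit from the scope.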
