From: Jann Horn <jannh@google.com>
To: Laurent Vivier <laurent@vivier.eu>
Cc: "kernel list" <linux-kernel@vger.kernel.org>,
"Greg Kurz" <groug@kaod.org>, "Andrei Vagin" <avagin@gmail.com>,
"Linux API" <linux-api@vger.kernel.org>,
"Dmitry Safonov" <dima@arista.com>,
"James Bottomley" <James.Bottomley@hansenpartnership.com>,
"Jan Kiszka" <jan.kiszka@siemens.com>,
"Christian Brauner" <christian.brauner@ubuntu.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
"Linux Containers" <containers@lists.linux-foundation.org>,
"Alexander Viro" <viro@zeniv.linux.org.uk>,
"Eric Biederman" <ebiederm@xmission.com>,
"Henning Schild" <henning.schild@siemens.com>,
"Cédric Le Goater" <clg@kaod.org>
Subject: Re: [PATCH v8 1/1] ns: add binfmt_misc to the user namespace
Date: Mon, 16 Dec 2019 23:53:54 +0100 [thread overview]
Message-ID: <CAG48ez2YE33KiuhnHa=cq_DymqWLAv9CyeD3BOrjsStKfb_dBQ@mail.gmail.com> (raw)
In-Reply-To: <15d270a6-2264-adc5-3f56-fdb8b67ad267@vivier.eu>
On Mon, Dec 16, 2019 at 9:05 PM Laurent Vivier <laurent@vivier.eu> wrote:
> Le 16/12/2019 à 20:08, Jann Horn a écrit :
> > On Mon, Dec 16, 2019 at 10:12 AM Laurent Vivier <laurent@vivier.eu> wrote:
> >> This patch allows to have a different binfmt_misc configuration
> >> for each new user namespace. By default, the binfmt_misc configuration
> >> is the one of the previous level, but if the binfmt_misc filesystem is
> >> mounted in the new namespace a new empty binfmt instance is created and
> >> used in this namespace.
> >>
> >> For instance, using "unshare" we can start a chroot of another
> >> architecture and configure the binfmt_misc interpreter without being root
> >> to run the binaries in this chroot.
> >
> > How do you ensure that when userspace is no longer using the user
> > namespace and mount namespace, the entries and the binfmt_misc
> > superblock are deleted? As far as I can tell from looking at the code,
> > at the moment, if I create a user namespace+mount namespace, mount
> > binfmt_misc in there, register a file format and then let all
> > processes inside the namespaces exit, the binfmt_misc mount will be
> > kept alive by the simple_pin_fs() stuff, and the binfmt_misc entries
> > will also stay in memory.
> >
> > [...]
>
> Do you have any idea how I can fix this issue?
I think the easiest way (keeping in mind that we want to avoid having
to fiddle around with reference loops, where e.g. interpreter
executable files opened by binfmt_misc have references back to the
user namespace through ->f_cred) would be to add a new patch in front
of this one that changes the semantics such that when binfmt_misc is
unmounted, all the existing format registrations are deleted. That's
probably also nicer from the perspective of inspectability. It could
in theory break stuff, but I think that's probably somewhat unlikely.
Still, it'd be an API change, and therefore you should CC linux-api@
on such a change.
> >> @@ -718,7 +736,9 @@ static ssize_t bm_register_write(struct file *file, const char __user *buffer,
> >> if (!inode)
> >> goto out2;
> >>
> >> - err = simple_pin_fs(&bm_fs_type, &bm_mnt, &entry_count);
> >> + ns = binfmt_ns(file_dentry(file)->d_sb->s_user_ns);
> >> + err = simple_pin_fs(&bm_fs_type, &ns->bm_mnt,
> >> + &ns->entry_count);
> >
> > When you call simple_pin_fs() here, the user namespace of `current`
> > and the user namespace of the superblock are not necessarily related.
> > So simple_pin_fs() may end up taking a reference on the mountpoint for
> > a user namespace that has nothing to do with the namespace for which
> > an entry is being created.
>
> Do you have any idea how I can fix this issue?
If you fix the memory leak the way I suggested, this wouldn't be a
problem anymore either.
> >> +static void bm_free(struct fs_context *fc)
> >> +{
> >> + if (fc->s_fs_info)
> >> + put_user_ns(fc->s_fs_info);
> >> +}
> >
> > Silly question: Why the "if"? Can you ever reach this with fc->s_fs_info==NULL?
>
> So I understand the if is unnecessary and I will remove it.
Your code was actually exactly right, I didn't understand how
fc->s_fs_info works.
> >> static int bm_get_tree(struct fs_context *fc)
> >> {
> >> - return get_tree_single(fc, bm_fill_super);
> >> + return get_tree_keyed(fc, bm_fill_super, get_user_ns(fc->user_ns));
> >
> > get_user_ns() increments the refcount of the namespace, but in the
> > case where a binfmt_misc mount already exists, that refcount is never
> > dropped, right? That would be a security bug, since an attacker could
> > overflow the refcount of the user namespace and then trigger a UAF.
> > (And the refcount hardening won't catch it because user namespaces
> > still use raw atomics instead of refcount_t.)
>
> Do you have any idea how I can fix this issue?
Ah, this was actually fine. I missed that get_tree_keyed() stashes
that pointer in fc->s_fs_info.
> >> +#if IS_ENABLED(CONFIG_BINFMT_MISC)
> >
> > Nit: Isn't this kind of check normally written as "#ifdef"?
> >
>
> What is the difference?
As explained in Documentation/process/coding-style.rst and the
relevant header, IS_ENABLED() is for inline use in C expressions.
next prev parent reply other threads:[~2019-12-16 22:54 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-12-16 9:12 [PATCH v8 0/1] ns: introduce binfmt_misc namespace Laurent Vivier
2019-12-16 9:12 ` [PATCH v8 1/1] ns: add binfmt_misc to the user namespace Laurent Vivier
2019-12-16 19:08 ` Jann Horn
2019-12-16 20:05 ` Laurent Vivier
2019-12-16 22:53 ` Jann Horn [this message]
2021-01-08 8:22 ` Jan Kiszka
2021-01-18 19:51 ` Laurent Vivier
2023-06-30 8:38 ` Norbert Lange
2023-06-30 8:52 ` Laurent Vivier
2023-06-30 9:06 ` Christian Brauner
2023-07-12 19:40 ` Kees Cook
2023-09-06 10:28 ` Norbert Lange
2023-10-11 0:36 ` Kees Cook
2019-12-16 9:46 ` [PATCH v8 0/1] ns: introduce binfmt_misc namespace Christian Brauner
2019-12-16 9:53 ` Laurent Vivier
2019-12-16 10:06 ` Christian Brauner
2019-12-16 10:08 ` Laurent Vivier
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAG48ez2YE33KiuhnHa=cq_DymqWLAv9CyeD3BOrjsStKfb_dBQ@mail.gmail.com' \
--to=jannh@google.com \
--cc=James.Bottomley@hansenpartnership.com \
--cc=avagin@gmail.com \
--cc=christian.brauner@ubuntu.com \
--cc=clg@kaod.org \
--cc=containers@lists.linux-foundation.org \
--cc=dima@arista.com \
--cc=ebiederm@xmission.com \
--cc=groug@kaod.org \
--cc=henning.schild@siemens.com \
--cc=jan.kiszka@siemens.com \
--cc=laurent@vivier.eu \
--cc=linux-api@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).