From: Christian Brauner <christian.brauner@canonical.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>,
davem@davemloft.net, gregkh@linuxfoundation.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
avagin@virtuozzo.com, serge@hallyn.com
Subject: Re: [PATCH net-next] netns: filter uevents correctly
Date: Wed, 11 Apr 2018 21:57:43 +0200
Message-ID: <20180411195742.GA640@gmail.com>
In-Reply-To: <871sflk0zc.fsf@xmission.com>
On Wed, Apr 11, 2018 at 02:16:23PM -0500, Eric W. Biederman wrote:
> Christian Brauner <christian.brauner@canonical.com> writes:
>
> > On Wed, Apr 11, 2018 at 01:37:18PM -0500, Eric W. Biederman wrote:
> >> Christian Brauner <christian.brauner@canonical.com> writes:
> >>
> >> > On Wed, Apr 11, 2018 at 11:40:14AM -0500, Eric W. Biederman wrote:
> >> >> Christian Brauner <christian.brauner@canonical.com> writes:
> >> >> > Yeah, agreed.
> >> >> > But I think the patch is not complete. To guarantee that no non-initial
> >> >> > user namespace actually receives uevents we need to:
> >> >> > 1. only send uevents to uevent sockets that are located in network
> >> >> > namespaces that are owned by init_user_ns
> >> >> > 2. filter uevents that are sent to sockets in mc_list that have opened a
> >> >> > uevent socket that is owned by init_user_ns *from* a
> >> >> > non-init_user_ns
> >> >> >
> >> >> > We account for 1. by only recording uevent sockets in the global uevent
> >> >> > socket list who are owned by init_user_ns.
> >> >> > But to account for 2. we need to filter by the user namespace who owns
> >> >> > the socket in mc_list. So in addition to that we also need to slightly
> >> >> > change the filter logic in kobj_bcast_filter() I think:
> >> >> >
> >> >> > diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> >> >> > index 22a2c1a98b8f..064d7d29ace5 100644
> >> >> > --- a/lib/kobject_uevent.c
> >> >> > +++ b/lib/kobject_uevent.c
> >> >> > @@ -251,7 +251,8 @@ static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
> >> >> > return sock_ns != ns;
> >> >> > }
> >> >> >
> >> >> > - return 0;
> >> >> > + /* Check if socket was opened from non-initial user namespace. */
> >> >> > + return sk_user_ns(dsk) != &init_user_ns;
> >> >> > }
> >> >> > #endif
> >> >> >
> >> >> >
> >> >> > But correct me if I'm wrong.
> >> >>
> >> >> You are worrying about NETLINK_LISTEN_ALL_NSID sockets. That has
> >> >> permissions and an explicit opt-in to receiving packets from multiple
> >> >> network namespaces.
> >> >
> >> > I don't think that's what I'm talking about unless that is somehow the
> >> > default for NETLINK_KOBJECT_UEVENT sockets. What I'm worried about is
> >> > doing
> >> >
> >> > unshare -U --map-root
> >> >
> >> > then opening a NETLINK_KOBJECT_UEVENT socket and starting to listen to
> >> > uevents. Imho, this should not be possible because I'm in a
> >> > non-init_user_ns. But currently I'm able to - even with the patch to
> >> > come - since the uevent socket in the kernel was created when init_net
> >> > was created and hence is *owned* by the init_user_ns which means it is
> >> > in the list of uevent sockets. Here's a demo of what I mean:
> >> >
> >> > https://asciinema.org/a/175632
> >>
> >> Why do you care about this case?
> >
> > It's not so much that I care about this case since any workload that
> > wants to run a separate udevd will have to unshare the network namespace
> > and the user namespace for it to make complete sense.
> > What I do care about is that the two of us are absolutely in the clear
> > about what semantics we are going to expose to userspace and it seems
> > that we were talking past each other wrt this "corner case".
> > For userspace, it needs to be very clear that the intention is to filter
> > by *owning user namespace of the network namespace a given task resides
> > in* and not by user namespace of the task per se. This is what this
> > corner case basically shows, I think.
>
> If this is just a clarification of semantics then yes this is a
> productive question. I almost agree with your definition above.
>
> I would make the definition very simple. Uevents will not be broadcast
> via netlink in a network namespace where net->user_ns != &init_user_ns,
> with the exception of uevents for network devices in that network
> namespace.
Well, for the sake of posterity :) I should add that I'd prefer that we
add what I suggested above:
-	return 0;
+	/* Check if socket was opened from non-initial user namespace. */
+	return sk_user_ns(dsk) != &init_user_ns;
 }
to slam the door shut once and for all for all non-init_user_ns
namespaces because it *seems* like the cleanest solution: uevents are
owned by init_user_ns; period. Because it is the only user namespace
that can do anything interesting with them *by default*.
But what we have right now with my upcoming patch is at least
sufficient and safe.
Christian
>
> The existing filtering by the sending uid and verifying that it is uid 0
> gives a little more room to filter if we want (as udev & friends will
> ignore the uevent), but I don't see the point.
>
> Eric
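
To make the difference between the two semantics in this message
concrete, here is a minimal userspace sketch. It is not kernel code: the
model_* types and both deliver_* functions are hypothetical stand-ins,
and the model assumes that uevents for network devices keep following
the existing per-netns check (sock_ns != ns) under both variants. The
first function follows the definition quoted above (filter by
net->user_ns); the second adds the sk_user_ns(dsk) check from the
suggested diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct model_user_ns {
	const char *name;
};

struct model_net {
	const struct model_user_ns *user_ns;   /* models net->user_ns */
};

struct model_sock {
	const struct model_net *net;           /* netns the socket lives in */
	const struct model_user_ns *opener_ns; /* models sk_user_ns(sk) */
};

struct model_event {
	const struct model_net *netdev_net;    /* netns of the device, or NULL */
};

const struct model_user_ns model_init_user_ns = { "init_user_ns" };

/*
 * Definition from the quoted text: no broadcast into a netns whose owner
 * is not init_user_ns, except uevents for network devices in that netns.
 */
bool deliver_by_netns_owner(const struct model_sock *sk,
			    const struct model_event *ev)
{
	assert(sk != NULL && sk->net != NULL && ev != NULL);

	if (ev->netdev_net != NULL)
		return sk->net == ev->netdev_net;

	return sk->net->user_ns == &model_init_user_ns;
}

/*
 * Stricter variant: additionally drop the uevent when the socket was
 * opened from a non-initial user namespace.
 */
bool deliver_strict(const struct model_sock *sk,
		    const struct model_event *ev)
{
	assert(sk != NULL && sk->net != NULL && ev != NULL);

	if (ev->netdev_net != NULL)
		return sk->net == ev->netdev_net;

	return deliver_by_netns_owner(sk, ev) &&
	       sk->opener_ns == &model_init_user_ns;
}
```

In this model the "unshare -U --map-root" corner case is a socket whose
net is owned by init_user_ns while opener_ns is a child user namespace:
deliver_by_netns_owner() returns true for a non-netdev uevent, whereas
deliver_strict() returns false.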