netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: task_diag: add a new interface to get information about processes
       [not found] ` <CANaxB-w2RZ8WZ0ACP-3M9OJ0vgWRM0d74S2WAVwRRY=gXoR5Zg@mail.gmail.com>
@ 2016-05-05  2:14   ` Stephen Hemminger
  2016-05-05  3:39     ` Andy Lutomirski
  0 siblings, 1 reply; 3+ messages in thread
From: Stephen Hemminger @ 2016-05-05  2:14 UTC (permalink / raw)
  To: Andrey Vagin; +Cc: Andy Lutomirski, netdev, David Miller, Eric W. Biedermanm

On Wed, 4 May 2016 15:34:21 -0700
Andrey Vagin <avagin@openvz.org> wrote:

> Hi Stephen,
> 
> On Wed, May 4, 2016 at 1:22 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> > I understand how reading /proc or /sys can be a bottleneck, but this
> > proposed method using a system call is the wrong way to do this.
> >
> > Why not use netlink like other systems do which allows a message
> > based response which allows for future changes (no fixed datastructures),
> > and is message based.
> >
> > Generic netlink has already been used by several other subsystems.
> 
> I used netlink in two first versions of task_diag, but then Andy
> convinced me that netlink interfaces are not ideal for this case. I
> added him into Cс.
> 
> Here is a thread with our discussion about using netlink for
> task_diag: https://lkml.org/lkml/2015/12/15/520
> Can I ask you to read it and give your comments? It would be nice to
> find a way how to use netlink sockets instead of creating a new
> interface.
> 
> Thanks,
> Andrew

LKML is too busy, no one reads it anymore :-)
Since this is netlink related you need to discuss it on netdev.

The objection seems to be time or creation versus time of use and permissions.
Netlink internally is not really message based all responses are generated
in the context of the send().  You need credentials to create
the socket, but the actual response will occur in the context of the calling
process. I don't see how that is substantially different than a system call.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: task_diag: add a new interface to get information about processes
  2016-05-05  2:14   ` task_diag: add a new interface to get information about processes Stephen Hemminger
@ 2016-05-05  3:39     ` Andy Lutomirski
  2016-05-16 22:29       ` Andrew Vagin
  0 siblings, 1 reply; 3+ messages in thread
From: Andy Lutomirski @ 2016-05-05  3:39 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Eric W. Biederman, David S. Miller, Network Development,
	Andrey Vagin, Linus Torvalds

On May 4, 2016 7:14 PM, "Stephen Hemminger" <stephen@networkplumber.org> wrote:
>
> On Wed, 4 May 2016 15:34:21 -0700
> Andrey Vagin <avagin@openvz.org> wrote:
>
> > Hi Stephen,
> >
> > On Wed, May 4, 2016 at 1:22 PM, Stephen Hemminger
> > <stephen@networkplumber.org> wrote:
> > > I understand how reading /proc or /sys can be a bottleneck, but this
> > > proposed method using a system call is the wrong way to do this.
> > >
> > > Why not use netlink like other systems do which allows a message
> > > based response which allows for future changes (no fixed datastructures),
> > > and is message based.
> > >
> > > Generic netlink has already been used by several other subsystems.
> >
> > I used netlink in two first versions of task_diag, but then Andy
> > convinced me that netlink interfaces are not ideal for this case. I
> > added him into Cс.
> >
> > Here is a thread with our discussion about using netlink for
> > task_diag: https://lkml.org/lkml/2015/12/15/520
> > Can I ask you to read it and give your comments? It would be nice to
> > find a way how to use netlink sockets instead of creating a new
> > interface.
> >
> > Thanks,
> > Andrew
>
> LKML is too busy, no one reads it anymore :-)
> Since this is netlink related you need to discuss it on netdev.
>
> The objection seems to be time or creation versus time of use and permissions.
> Netlink internally is not really message based all responses are generated
> in the context of the send().  You need credentials to create
> the socket, but the actual response will occur in the context of the calling
> process. I don't see how that is substantially different than a system call.
>
>

Linus, this is Yet Another Credential Fuckup, except that it hasn't
happened yet, so it's okay.  The tl;dr is that Andrey wants to add an
interface to ask a pidns some questions, and netlink looks natural,
except that using netlink sockets to interrogate a pidns seems rather
problematic.  I would also love to see a decent interface for
interrogating user namespaces, and again, netlink would be great,
except that it's a socket and makes no sense in this context.

Netlink had, and possibly still has, tons of serious security bugs
involving code checking send() callers' creds.  I found and fixed a
few a couple years ago.  To reiterate once again, send() CANNOT use
caller creds safely.  (I feel like I say this once every few weeks.
It's getting old.)

I realize that it's convenient to use a socket as a context to keep
state between syscalls, but it has some annoying side effects:

 - It makes people want to rely on send()'s caller's creds.

 - It's miserable in combination with seccomp.

 - It doesn't play nicely with namespaces.

 - It makes me wonder why things like task_diag, which have nothing to
do with networking, seem to get tangled up with networking.


Would it be worth considering adding a parallel interface, using it
for new things, and slowly migrating old use cases over?

int issue_kernel_command(int ns, int command, const struct iovec *iov,
int iovcnt, int flags);

ns is an actual namespace fd or:

KERNEL_COMMAND_CURRENT_NETNS
KERNEL_COMMAND_CURRENT_PIDNS
etc, or a special one:
KERNEL_COMMAND_GLOBAL.  KERNEL_COMMAND_GLOBAL can't be used in a
non-root namespace.

KERNEL_COMMAND_GLOBAL works even for namespaced things, if the
relevant current ns is the init namespace.  (This feature is optional,
but it would allow gradually namespacing global things.)

command is an enumerated command.  Each command implies a namespace
type, and, if you feed this thing the wrong namespace type, you get
EINVAL.  The high bit of command indicates whether it's read-only
command.

iov gives a command in the format expected, which, for the most part,
would be a netlink message.

The return value is an fd that you can call read/readv on to read the
response.  It's not a socket (or at least you can't do normal socket
operations on it if it is a socket behind the scenes).  The
implementation of read() promises *not* to look at caller creds.  The
returned fd is unconditionally cloexec -- it's 2016 already.  Sheesh.

When you've read all the data, all you can do is close the fd.  You
can't issue another command on the same fd.  You also can't call
write() or send() on the fd unless someone has a good reason why you
should be able to and why it's safe.  You can't issue another command
on the same fd.


I imagine that the implementation could re-use a bunch of netlink code
under the hood.


--Andy

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: task_diag: add a new interface to get information about processes
  2016-05-05  3:39     ` Andy Lutomirski
@ 2016-05-16 22:29       ` Andrew Vagin
  0 siblings, 0 replies; 3+ messages in thread
From: Andrew Vagin @ 2016-05-16 22:29 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Stephen Hemminger, Eric W. Biederman, David S. Miller,
	Network Development, Andrey Vagin, Linus Torvalds

On Wed, May 04, 2016 at 08:39:51PM -0700, Andy Lutomirski wrote:
> 
> Linus, this is Yet Another Credential Fuckup, except that it hasn't
> happened yet, so it's okay.  The tl;dr is that Andrey wants to add an
> interface to ask a pidns some questions, and netlink looks natural,
> except that using netlink sockets to interrogate a pidns seems rather
> problematic.  I would also love to see a decent interface for
> interrogating user namespaces, and again, netlink would be great,
> except that it's a socket and makes no sense in this context.
> 
> Netlink had, and possibly still has, tons of serious security bugs
> involving code checking send() callers' creds.  I found and fixed a
> few a couple years ago.  To reiterate once again, send() CANNOT use
> caller creds safely.  (I feel like I say this once every few weeks.
> It's getting old.)
> 
> I realize that it's convenient to use a socket as a context to keep
> state between syscalls, but it has some annoying side effects:
> 
>  - It makes people want to rely on send()'s caller's creds.
> 
>  - It's miserable in combination with seccomp.
> 
>  - It doesn't play nicely with namespaces.
> 
>  - It makes me wonder why things like task_diag, which have nothing to
> do with networking, seem to get tangled up with networking.
> 
> 
> Would it be worth considering adding a parallel interface, using it
> for new things, and slowly migrating old use cases over?
> 
> int issue_kernel_command(int ns, int command, const struct iovec *iov,
> int iovcnt, int flags);
> 
> ns is an actual namespace fd or:
> 
> KERNEL_COMMAND_CURRENT_NETNS
> KERNEL_COMMAND_CURRENT_PIDNS
> etc, or a special one:
> KERNEL_COMMAND_GLOBAL.  KERNEL_COMMAND_GLOBAL can't be used in a
> non-root namespace.

An request can depend on a few namespaces. For example, we can request
credentials for a specified task. In this case we may want to specify
pid and user namespace.

> 
> KERNEL_COMMAND_GLOBAL works even for namespaced things, if the
> relevant current ns is the init namespace.  (This feature is optional,
> but it would allow gradually namespacing global things.)
> 
> command is an enumerated command.  Each command implies a namespace
> type, and, if you feed this thing the wrong namespace type, you get
> EINVAL.  The high bit of command indicates whether it's read-only
> command.
> 
> iov gives a command in the format expected, which, for the most part,
> would be a netlink message.
> 
> The return value is an fd that you can call read/readv on to read the
> response.  It's not a socket (or at least you can't do normal socket
> operations on it if it is a socket behind the scenes).  The
> implementation of read() promises *not* to look at caller creds.  The
> returned fd is unconditionally cloexec -- it's 2016 already.  Sheesh.
> 
> When you've read all the data, all you can do is close the fd.  You
> can't issue another command on the same fd.  You also can't call
> write() or send() on the fd unless someone has a good reason why you
> should be able to and why it's safe.  You can't issue another command
> on the same fd.
> 
> 
> I imagine that the implementation could re-use a bunch of netlink code
> under the hood.

I'm agree with this interface. For me it's interesting to know an
opinion from the other side. Stephen, could you share you comments
about these netlink issues and this new interface?

Thanks,
Andrew

> 
> 
> --Andy

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-05-16 22:44 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20160504132245.1d04c131@xeon-e3>
     [not found] ` <CANaxB-w2RZ8WZ0ACP-3M9OJ0vgWRM0d74S2WAVwRRY=gXoR5Zg@mail.gmail.com>
2016-05-05  2:14   ` task_diag: add a new interface to get information about processes Stephen Hemminger
2016-05-05  3:39     ` Andy Lutomirski
2016-05-16 22:29       ` Andrew Vagin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).