From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Alexei Starovoitov <ast@fb.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"David S . Miller" <davem@davemloft.net>,
	Daniel Borkmann <daniel@iogearbox.net>,
	David Ahern <dsa@cumulusnetworks.com>, Tejun Heo <tj@kernel.org>,
	Thomas Graf <tgraf@suug.ch>,
	Network Development <netdev@vger.kernel.org>
Subject: Re: [PATCH net] bpf: expose netns inode to bpf programs
Date: Thu, 2 Feb 2017 22:05:57 -0800
Message-ID: <20170203060554.GA80764@ast-mbp.thefacebook.com>
In-Reply-To: <87r33fevva.fsf@xmission.com>

On Fri, Feb 03, 2017 at 05:33:45PM +1300, Eric W. Biederman wrote:
> 
> The point is that we can make the inode number stable across migration
> and the user space API for namespaces has been designed with that
> possibility in mind.
> 
> What you have proposed is the equivalent of reporting a file name, and
> instead of reporting /dir1/file1 /dir2/file1 just reporting file1 for
> both cases.
> 
> That is problematic.
> 
> It doesn't matter that eBPF and CRIU do not mix.  When we implement
> migration of the namespace file descriptors and can move them from
> one system to another preserving the device number and inode number
> so that criu of other parts of userspace can function better there will
> be a problem.  There is not one unique inode number per namespace and
> the proposed interface in your eBPF programs is broken.
> 
> I don't know when inode numbers are going to be the bottleneck we decide
> to make migratable to make CRIU work better but things have been
> designed and maintained very carefully so that we can do that.
> 
> Inode numbers are in the namespace of the filesystem they reside in.

I saw that iproute2 is doing:
  if ((st.st_dev == netst.st_dev) &&
      (st.st_ino == netst.st_ino)) {
but proc_alloc_inum() uses a global ida,
so I figured that the extra st_dev check in iproute2 must be obsolete.
So the long-term plan is to make /proc namespace-aware?
That's fair. In that case exposing only the inode will
lead to wrong assumptions.
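
For illustration, a minimal sketch of identifying a namespace file by the
(st_dev, st_ino) pair rather than by inode alone. This is not the iproute2
code; struct ns_id and both helper names are hypothetical, and the only
real interface used is stat(2).

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical type: identity of a namespace file as (device, inode). */
struct ns_id {
	dev_t dev;
	ino_t ino;
};

/* Fill *id from stat(2) on path. Returns 0 on success, -1 on error. */
static int ns_id_from_path(const char *path, struct ns_id *id)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	id->dev = st.st_dev;
	id->ino = st.st_ino;
	return 0;
}

/* Two ids match only when both the device and the inode match. */
static bool ns_id_equal(const struct ns_id *a, const struct ns_id *b)
{
	return a->dev == b->dev && a->ino == b->ino;
}
```

A caller could fill one ns_id from /proc/self/ns/net and another from a
bind-mounted netns file, then compare them with ns_id_equal(). Equal
inode numbers on different devices are treated as different namespaces.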

> >> But you told Eric that his nack doesn't matter, and maybe it would be
> >> nice to ask him to clarify instead.
> >
> > Fair enough. Eric, thoughts?
> 
> In very short terms exporting just the inode number would require
> implementing a namespace of namespaces, and that is NOT happening.
> We are not going to design our kernel interfaces so badly that we need
> to do that.
> 
> At a bare minimum you need to export the device number of the filesystem
> as well as the inode number.

Agree. Will do.

> My expectation would be that now you are starting to look at concepts
> that are namespaced the way you would proceed would be to associate a
> full set of namespaces with your ebpf program.  Those namespaces would
> come from the submitter of your ebpf program.  Namespaced values
> would be in the terms of your associated namespaces.
> 
> That keeps things working the way userspace would expect.
> 
> The easy way to build such an association is to not allow your
> contextless ebpf programs from being submitted to kernel in anything
> other than the initial set of namespaces.
> 
> But please assume all global identifiers are namespaced.  If they aren't
> that needs to be fixed because not having them namespaced will break
> process migration at some point.
> 
> In short the fix here is to export both the inode number and the device
> number.  That is what it takes to uniquely identify a file.  It would be

Agree. Will respin.

> good if you went farther and limited your contextless ebpf programs to
> only being installed by programs in the initial set of namespaces.

You mean limit it to init_net only? This might break existing users.

> Does that make things clearer?

yep. thanks for the feedback.
