From: ebiederm@xmission.com (Eric W. Biederman)
To: Alun Evans <alun@badgerous.net>
Cc: linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 05/27] containers: Open a socket inside a container
Date: Mon, 30 Sep 2019 05:02:57 -0500
Message-ID: <87o8z2m78u.fsf@x220.int.ebiederm.org>
In-Reply-To: <m2d0fkt5pj.fsf@badgerous.net> (Alun Evans's message of "Sat, 28 Sep 2019 15:29:44 -0700")

Alun Evans <alun@badgerous.net> writes:

> On Fri 27 Sep '19 at 07:46 ebiederm@xmission.com (Eric W. Biederman) wrote:
>> 
>> Alun Evans <alun@badgerous.net> writes:
>>
>>> Hi Eric,
>>>
>>>
>>> On Tue, 19 Feb 2019, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>>>
>>>> David Howells <dhowells@redhat.com> writes:
>>>>
>>>> > Provide a system call to open a socket inside of a container, using that
>>>> > container's network namespace.  This allows netlink to be used to manage
>>>> > the container.
>>>> >
>>>> > 	fd = container_socket(int container_fd,
>>>> > 			      int domain, int type, int protocol);
>>>> >
>>>>
>>>> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>>>
>>>> Use a namespace file descriptor if you need this.  So far we have not
>>>> added this system call as it is just a performance optimization.  And it
>>>> has been too niche to matter.
>>>>
>>>> If that has changed we can add this separately from everything else
>>>> you are doing here.
>>>
>>> I think I've found the niche.
>>>
>>>
>>> I'm trying to use network namespaces from Go.
>>
>> Yes. Go sucks for this.
>
> Haha... Neither confirm nor deny.
>
>>> Since setns is thread
>>> specific, I'm forced to use this pattern:
>>>
>>>     runtime.LockOSThread()
>>>     defer runtime.UnlockOSThread()
>>>     …
>>>     err = netns.Set(newns)
>>>
>>>
>>> This is only safe recently:
>>> https://github.com/vishvananda/netns/issues/17#issuecomment-367325770
>>>
>>> - but is still less than ideal performance wise, as it locks out other
>>>   socket operations.
>>>
>>> The socketat() / socketns() would be ideal:
>>>
>>>   https://lwn.net/Articles/406684/
>>>   https://lwn.net/Articles/407495/
>>>   https://lkml.org/lkml/2011/10/3/220
>>>
>>>
>>> One thing that is interesting, the LockOSThread works pretty well for
>>> receiving, since I can wrap it around the socket()/bind()/listen() at
>>> startup. Then accept() can run outside of the lock.
>>>
>>> It's creating new outbound tcp connections via socket()/connect() pairs
>>> that is the issue.
>>
>> As I understand it you should be able to write socketat in go something like:
>>
>>         runtime.LockOSThread()
>>         err = netns.Set(newns);
>>         fd = socket(...);
>>         err = netns.Set(defaultns);
>>         runtime.UnlockOSThread()
>
> Yeah, this is currently what I'm having to do. It's painful because due
> to the Go runtime model of a single OS netpoller thread, locking the OS
> thread to the current goroutine blocks out the other goroutines doing
> network I/O.

Just to be clear, you know that only the setns and the socket calls need
to block out switching threads, and all of those should currently be
quite fast.
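
For reference, a minimal C sketch of that sequence, under some assumptions
of mine: the name socketns_emulated is hypothetical, /proc/thread-self
needs a reasonably recent kernel, and switching network namespaces
typically requires CAP_SYS_ADMIN.  It only illustrates which calls have to
stay on one kernel thread; it is not a proposed implementation.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical userspace stand-in for a socketns()-style call: open a
 * socket in the network namespace referred to by netns_fd, then return
 * the calling thread to the namespace it started in.
 * Returns the new socket fd, or -1 with errno set.
 */
int socketns_emulated(int netns_fd, int domain, int type, int protocol)
{
    int home_fd, fd, saved_errno;

    /* Remember the calling thread's current network namespace. */
    home_fd = open("/proc/thread-self/ns/net", O_RDONLY | O_CLOEXEC);
    if (home_fd < 0)
        return -1;

    /* From here until the second setns() the thread must not change. */
    if (setns(netns_fd, CLONE_NEWNET) < 0) {
        saved_errno = errno;
        close(home_fd);
        errno = saved_errno;
        return -1;
    }

    fd = socket(domain, type, protocol);
    saved_errno = errno;

    /* If we cannot get back, the thread is stranded in the wrong
     * namespace, so give up rather than continue silently. */
    if (setns(home_fd, CLONE_NEWNET) < 0)
        abort();

    close(home_fd);
    errno = saved_errno;
    return fd;
}
```

The socket keeps the namespace it was created in, so connect() and later
I/O on the returned fd can run on any thread.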

Hmm.  So this is a global Go lock and not simply locking the current
goroutine onto its current kernel thread?  Yes, that does sound quite
painful.

It would be very nice if Go could provide an idiom where a series of
calls could be fixed to a single kernel thread.

>> I have no real objections to a kernel system call doing that.  It has
>> just never risen to the level where it was necessary to optimize
>> userspace yet.
>
> Would you be able to accept the patch from this thread with the
> container API?
>
>     fd = container_socket(int container_fd,
>                           int domain, int type, int protocol);
>
> I think that seems more coherent with the rest of the container world
> than a follow up of https://lkml.org/lkml/2011/10/3/220 :
>

Given container_socket implies the need to create a namespace of
namespaces. No.

Given that container_socket can't be used in iptools because it has
a different concept of container.  No.

Given that no one has ever proposed solving the entire migration story
when they have wanted to define a container and thus all of this
implies breaking CRIU.  No.

>     int socketns(int netns_fd, int domain, int type, int protocol)
>

Yes please.

I suspect in the current world where system calls are much more
expensive (because of mitigations for speculative execution bugs) with a
little bit of timing we could come up with a reasonable case even for
non-Go runtimes.

To that end I would like to see performance numbers from at least a
microbenchmark in C, just so we can quantify the improvement.
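
Something along these lines might serve as a starting point.  This is only
a sketch: bench_socket and its parameters are hypothetical names, and it
measures the setns()/socket()/setns() round trip against a plain socket()
call rather than any real socketns() implementation, which does not exist
yet.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: time `iters` rounds of socket() + close().
 * When use_setns is nonzero, each socket() call is bracketed by
 * setns(netns_fd) and setns(home_fd), which is the extra work a
 * single socketns() system call would avoid.
 * Returns mean nanoseconds per round, or -1.0 on any failure.
 */
double bench_socket(int use_setns, int netns_fd, int home_fd, long iters)
{
    struct timespec t0, t1;
    long i;

    if (iters <= 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &t0) < 0)
        return -1.0;

    for (i = 0; i < iters; i++) {
        int fd;

        if (use_setns && setns(netns_fd, CLONE_NEWNET) < 0)
            return -1.0;
        fd = socket(AF_INET, SOCK_STREAM, 0);
        /* Always try to switch back before checking the socket. */
        if (use_setns && setns(home_fd, CLONE_NEWNET) < 0)
            return -1.0;
        if (fd < 0)
            return -1.0;
        close(fd);
    }

    if (clock_gettime(CLOCK_MONOTONIC, &t1) < 0)
        return -1.0;

    return ((t1.tv_sec - t0.tv_sec) * 1e9 +
            (t1.tv_nsec - t0.tv_nsec)) / iters;
}
```

Calling it once with use_setns set to 0 and once with it set to 1 (passing
an fd for the target namespace and one for the thread's own namespace)
gives the per-call overhead of the two extra setns() calls.  The setns
variant generally needs CAP_SYS_ADMIN to run.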

Eric

Thread overview: 5+ messages
     [not found] <m2o8z7t2w5.fsf@badgerous.net>
2019-09-27 14:46 ` [RFC PATCH 05/27] containers: Open a socket inside a container Eric W. Biederman
2019-09-28 22:29   ` Alun Evans
2019-09-30 10:02     ` Eric W. Biederman [this message]
2019-02-15 16:07 [RFC PATCH 00/27] Containers and using authenticated filesystems David Howells
2019-02-15 16:07 ` [RFC PATCH 05/27] containers: Open a socket inside a container David Howells
2019-02-19 16:41   ` Eric W. Biederman
