Re: opening tap devices that are created in a container

From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Jason Baron <jbaron@akamai.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	libvir-list@redhat.com, rmohr@redhat.com,
	Fabian Deutsch <fdeutsch@redhat.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>
Subject: Re: opening tap devices that are created in a container
Date: Thu, 5 Jul 2018 17:10:15 +0100
Message-ID: <20180705160938.GK3814@redhat.com>
In-Reply-To: <6a8d7673-0ed7-5920-cc3a-d5d68dbc547c@akamai.com>

On Thu, Jul 05, 2018 at 10:20:16AM -0400, Jason Baron wrote:
> Hi,
> 
> Opening tap devices, such as macvtap, that are created in containers is
> problematic because the interface for opening tap devices is via
> /dev/tapNN and devtmpfs is not typically mounted inside a container as
> it's not namespace aware. It is possible to do a mknod() in the
> container, once the tap devices are created, however, since the tap
> devices are created dynamically it's not possible to a priori allow access
> to certain major/minor numbers, since we don't know what these are going
> to be. In addition, it's desirable to not allow the mknod capability in
> containers. This behavior, I think, is somewhat inconsistent with the
> tuntap driver where one can create tuntap devices inside a container by
> first opening /dev/net/tun and then using them by supplying the tuntap
> device name via the ioctl(TUNSETIFF). And since TUNSETIFF validates the
> network namespace, one is limited to opening network devices that belong
> to your current network namespace.
> 
> Here are some options to this issue, that I wanted to get feedback
> about, and just wondering if anybody else has run into this.
> 
> 1)
> 
> Don't create the tap device, such as macvtap in the container. Instead,
> create the tap device outside of the container and then move it into the
> desired container network namespace. In addition, do a mknod() for the
> corresponding /dev/tapNN device from outside the container before doing
> chroot().
> 
> This solution still doesn't allow tap devices to be created inside the
> container. Thus, in the case of kubevirt, which runs libvirtd inside of
> a container, it would mean changing libvirtd to open existing tap
> devices (as opposed to the current behavior of creating new ones). This
> would not require any kernel changes, but as mentioned seems
> inconsistent with the tuntap interface.

Presumably the /dev/tapNN device name also changes when you rename
the tap device interface using SIOCSIFNAME?

e.g. if it was /dev/tap24 in the host and you called SIOCSIFNAME(eth0)
when moving it into the container, it would be /dev/eth0 inside the
container?

Anyway, given that this /dev/tapNN approach is what exists today,
libvirt will likely want to implement support for this regardless
in order to support existing kernels.
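
For illustration, the host-side mknod() step from option 1 might look
roughly like the sketch below. It assumes the device numbers are
exposed in sysfs as "major:minor" in a file at
/sys/class/net/<ifname>/tap<N>/dev; that layout is an assumption to
verify on the target kernel. The helper names (tap_devnum,
make_tap_node) and the 0600 mode are hypothetical, and the mknod()
call itself needs CAP_MKNOD.

```python
import glob
import os
import stat


def tap_devnum(ifname, sysfs_root="/sys/class/net"):
    """Return (major, minor) of the tap char device behind `ifname`.

    Assumes sysfs exposes <sysfs_root>/<ifname>/tap<N>/dev containing
    "major:minor"; this layout is an assumption, not taken from the thread.
    """
    base = glob.escape(os.path.join(sysfs_root, ifname))
    matches = glob.glob(os.path.join(base, "tap*", "dev"))
    if len(matches) != 1:
        raise FileNotFoundError("no unique tap node found for %s" % ifname)
    with open(matches[0]) as f:
        major, minor = f.read().strip().split(":")
    return int(major), int(minor)


def make_tap_node(ifname, dest, sysfs_root="/sys/class/net"):
    """Create a char device node at `dest` for `ifname` (needs CAP_MKNOD)."""
    major, minor = tap_devnum(ifname, sysfs_root)
    os.mknod(dest, stat.S_IFCHR | 0o600, os.makedev(major, minor))
```

Here `dest` would be a path under the container's root filesystem,
created before the chroot().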

> 2)
> 
> Add a new kernel interface for tap devices similar to how /dev/net/tun
> currently works. It might be nice to use TUNSETIFF for tap devices, but
> because tap devices have different fops they can't be easily switched
> after open(). So the suggestion is a new ioctl (TUNGETFDBYNAME?), where
> the tap device name is supplied and a new fd (distinct from the fd
> returned by the open of /dev/net/tun) is returned as an output field as
> part of the new ioctl parameter.
> 
> It may not make sense to have this new ioctl call for /dev/net/tun since
> it's really about opening a tap device, so it may make sense to introduce
> it as part of a new device, such as /dev/net/tap. This new ioctl could
> be used for macvtap and ipvtap (or any tap device). I think it might
> also improve performance for tuntap devices themselves, if they are
> opened this way since currently all tun operations such as read() and
> write() take a reference count on the underlying tuntap device, since it
> can be changed via TUNSETIFF. I tested this interface out, so I can
> provide the kernel changes if that's helpful for clarification.

Either /dev/net/tun with a new ioctl, or /dev/net/tap with TUNSETIFF,
would be workable from libvirt's POV.

One slight complication with either of the solutions above is that
libvirt won't know whether it is given a TAP or a MACVTAP device.
It'll only be given the device name. So with today's code we would
probably have to first try /dev/tapNN and if that doesn't exist
then try /dev/net/tun with TUNSETIFF.
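
The fallback described above could be sketched as follows. This is
only an illustration, not libvirt code: it assumes the NN in
/dev/tapNN is the interface index read from sysfs, and that the
TUNSETIFF flags IFF_TAP | IFF_NO_PI are what the caller wants. The
names open_tap and pack_ifreq are hypothetical. The ioctl number and
flag values are the ones from <linux/if_tun.h>.

```python
import fcntl
import os
import struct

TUNSETIFF = 0x400454CA  # _IOW('T', 202, int)
IFF_TAP = 0x0002
IFF_NO_PI = 0x1000
IFNAMSIZ = 16


def pack_ifreq(ifname, flags):
    """Pack ifr_name and ifr_flags, padded to 40 bytes (64-bit struct ifreq)."""
    name = ifname.encode()
    if len(name) >= IFNAMSIZ:
        raise ValueError("interface name too long: %r" % ifname)
    return struct.pack("16sH22x", name, flags)


def open_tap(ifname, sysfs_root="/sys/class/net", dev_root="/dev",
             tun_path="/dev/net/tun"):
    """Return (fd, kind) for `ifname`, where kind is "macvtap" or "tuntap"."""
    # Both kinds of device have an ifindex in sysfs.
    with open(os.path.join(sysfs_root, ifname, "ifindex")) as f:
        ifindex = int(f.read())

    # First try the per-device node (assumed to be named tap<ifindex>).
    try:
        fd = os.open(os.path.join(dev_root, "tap%d" % ifindex), os.O_RDWR)
        return fd, "macvtap"
    except FileNotFoundError:
        pass

    # No such node: fall back to /dev/net/tun and attach by name.
    fd = os.open(tun_path, os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, pack_ifreq(ifname, IFF_TAP | IFF_NO_PI))
    except OSError:
        os.close(fd)
        raise
    return fd, "tuntap"
```

The order matters only in that a missing /dev/tapNN node is taken as
the signal that the name refers to a plain TAP device.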

If adding a new /dev/net/tap, something that could seamlessly accept
either a TAP or MACVTAP NIC name would be nice.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
