From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Jason Baron <jbaron@akamai.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	libvir-list@redhat.com, rmohr@redhat.com,
	Fabian Deutsch <fdeutsch@redhat.com>,
	"Eric W. Biederman" <ebiederm@xmission.com>
Subject: Re: opening tap devices that are created in a container
Date: Tue, 10 Jul 2018 09:46:14 +0100
Message-ID: <20180710084614.GA1612@redhat.com>
In-Reply-To: <72dfae4e-876a-729e-6dc6-61219f8d0294@akamai.com>

On Mon, Jul 09, 2018 at 04:56:04PM -0400, Jason Baron wrote:
> 
> 
> On 07/05/2018 12:10 PM, Daniel P. Berrangé wrote:
> > On Thu, Jul 05, 2018 at 10:20:16AM -0400, Jason Baron wrote:
> >> Hi,
> >>
> >> Opening tap devices, such as macvtap, that are created in containers is
> >> problematic because the interface for opening tap devices is via
> >> /dev/tapNN, and devtmpfs is not typically mounted inside a container as
> >> it's not namespace aware. It is possible to do a mknod() in the
> >> container once the tap devices are created. However, since the tap
> >> devices are created dynamically, it's not possible to allow access a
> >> priori to certain major/minor numbers, since we don't know what these
> >> are going to be. In addition, it's desirable not to allow the mknod
> >> capability in containers. This behavior, I think, is somewhat
> >> inconsistent with the tuntap driver, where one can create tuntap devices
> >> inside a container by first opening /dev/net/tun and then using them by
> >> supplying the tuntap device name via the ioctl(TUNSETIFF). And since
> >> TUNSETIFF validates the network namespace, one is limited to opening
> >> network devices that belong to one's current network namespace.
> >>
> >> Here are some options for addressing this issue that I wanted to get
> >> feedback on. I am also wondering whether anybody else has run into this.
> >>
> >> 1)
> >>
> >> Don't create the tap device, such as macvtap, in the container. Instead,
> >> create the tap device outside of the container and then move it into the
> >> desired container network namespace. In addition, do a mknod() for the
> >> corresponding /dev/tapNN device from outside the container before doing
> >> chroot().
> >>
> >> This solution still doesn't allow tap devices to be created inside the
> >> container. Thus, in the case of kubevirt, which runs libvirtd inside of
> >> a container, it would mean changing libvirtd to open existing tap
> >> devices (as opposed to the current behavior of creating new ones). This
> >> would not require any kernel changes but, as mentioned, seems
> >> inconsistent with the tuntap interface.
> > 
> > Presumably the /dev/tapNN device name also changes when you rename
> > the tap device interface using SIOCSIFNAME?
> > 
> 
> I don't think so. The NN is the ifindex of the device; changing the
> device name does not affect the ifindex.

Ah right, that makes sense.
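
To make the naming rule concrete, here is a minimal sketch in Python of
how a caller could derive the /dev/tapNN path from an interface name. The
helper names tap_node_for_ifindex and tap_node_for_ifname are hypothetical
and are not libvirt functions; the only real call is
socket.if_nametoindex(), which resolves the name in the caller's current
network namespace.

```python
import socket


def tap_node_for_ifindex(ifindex: int) -> str:
    """Return the /dev/tapNN path for an ifindex (NN is the ifindex)."""
    if ifindex <= 0:
        raise ValueError("ifindex must be a positive integer")
    return f"/dev/tap{ifindex}"


def tap_node_for_ifname(ifname: str) -> str:
    """Resolve an interface name to its /dev/tapNN path.

    The lookup happens in the current network namespace, so the result
    depends only on the ifindex there, not on the interface name.
    Raises OSError if no such interface exists.
    """
    return tap_node_for_ifindex(socket.if_nametoindex(ifname))
```

Renaming the interface leaves the result of tap_node_for_ifname()
unchanged as long as the ifindex is unchanged.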

> > e.g. if it was /dev/tap24 in the host and you called SIOCSIFNAME(eth0)
> > when moving it into the container, it would be /dev/eth0 inside the
> > container?
> > 
> 
> When moving it into the container the ifindex can change since the
> ifindex range is per-namespace (not global).

Oh, that's interesting. I hadn't realized that.

> > Anyway, given that this /dev/tapNN approach is what exists today,
> > libvirt will likely want to implement support for this regardless
> > in order to support existing kernels.
> 
> OK, in this case whatever created the tap device outside of the
> container would pass the name of the device to libvirt and make sure
> that the /dev/tapNN device was set up correctly in the container. I
> believe this differs from how libvirt works today in that libvirt would
> need to be modified to open an existing device (I think it currently
> always creates new ones).

Libvirt can use a pre-created TAP device today, but not a pre-created
MACVTAP, so supporting the latter is new code for us no matter what.
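
For reference, a rough sketch in Python of the node setup that the
external tool would have to perform for a pre-created MACVTAP. It assumes
the tool has already obtained the device's "major:minor" string (for
example from the device's dev attribute in sysfs; the exact path is not
covered in this thread) and that it runs with CAP_MKNOD. The helper names
parse_dev and make_tap_node are hypothetical.

```python
import os
import stat


def parse_dev(text: str) -> tuple[int, int]:
    """Parse a 'major:minor' string into a pair of integers."""
    major, minor = text.strip().split(":")
    return int(major), int(minor)


def make_tap_node(path: str, dev_text: str, mode: int = 0o600) -> None:
    """Create a character device node for a tap device.

    path is the node to create (e.g. /dev/tapNN as seen by the container);
    dev_text is the 'major:minor' string for the device. Requires
    CAP_MKNOD, which is why this is meant to run outside the container.
    """
    major, minor = parse_dev(dev_text)
    os.mknod(path, stat.S_IFCHR | mode, os.makedev(major, minor))
```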

> > One slight complication with either of the solutions above is that
> > libvirt won't know whether it is given a TAP or a MACVTAP device.
> > It'll only be given the device name. So with code today we would
> > probably have to first try /dev/tapNNN and if that doesn't exist
> > then try /dev/net/tun with TUNSETIFF.
> >
> 
> Hmmm, doesn't libvirt make this distinction today?

No need to make the distinction yet, since we only support pre-created
TAP devices right now. In cases where we create the devices ourselves,
we already know what is what.
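
The probing order described above (try /dev/tapNN first, then fall back
to /dev/net/tun with TUNSETIFF) could look roughly like the following
Python sketch. This is an illustration only, not libvirt code:
open_precreated and build_ifreq are hypothetical names, the TUNSETIFF
value is the one used on x86 and arm (it differs on some architectures),
and the IFF_TAP | IFF_NO_PI flags are an assumption about how the device
was created.

```python
import fcntl
import os
import socket
import struct

TUNSETIFF = 0x400454CA  # _IOW('T', 202, int) on x86/arm; an assumption elsewhere
IFF_TAP = 0x0002
IFF_NO_PI = 0x1000
IFNAMSIZ = 16


def build_ifreq(ifname: str, flags: int) -> bytes:
    """Pack a struct ifreq carrying an interface name and ifr_flags."""
    name = ifname.encode()
    if len(name) >= IFNAMSIZ:
        raise ValueError("interface name too long")
    # 16-byte name, 16-bit flags, then padding up to 40 bytes so the
    # buffer is at least sizeof(struct ifreq) on 64-bit systems.
    return struct.pack("16sH22x", name, flags)


def open_precreated(ifname: str) -> int:
    """Open an existing TAP or MACVTAP device given only its name."""
    # MACVTAP case: the node is named after the ifindex, not the name.
    ifindex = socket.if_nametoindex(ifname)
    try:
        return os.open(f"/dev/tap{ifindex}", os.O_RDWR)
    except FileNotFoundError:
        pass  # no /dev/tapNN node, so assume a plain TAP device

    # TAP case: attach to the named device through the clone device.
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, build_ifreq(ifname, IFF_TAP | IFF_NO_PI))
    except OSError:
        os.close(fd)
        raise
    return fd
```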

> > If adding a new /dev/net/tap, something that could seamlessly accept
> > either a TAP or MACVTAP nic name would be nice.
> > 
> >
> 
> I think if we added a new ioctl() as I proposed it could accept either
> type of nic.

OK, that would be nice.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
