From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Jason Baron <jbaron@akamai.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"David S. Miller" <davem@davemloft.net>,
libvir-list@redhat.com, rmohr@redhat.com,
Fabian Deutsch <fdeutsch@redhat.com>,
"Eric W. Biederman" <ebiederm@xmission.com>
Subject: Re: opening tap devices that are created in a container
Date: Tue, 10 Jul 2018 09:46:14 +0100
Message-ID: <20180710084614.GA1612@redhat.com>
In-Reply-To: <72dfae4e-876a-729e-6dc6-61219f8d0294@akamai.com>
On Mon, Jul 09, 2018 at 04:56:04PM -0400, Jason Baron wrote:
>
>
> On 07/05/2018 12:10 PM, Daniel P. Berrangé wrote:
> > On Thu, Jul 05, 2018 at 10:20:16AM -0400, Jason Baron wrote:
> >> Hi,
> >>
> >> Opening tap devices, such as macvtap, that are created in containers is
> >> problematic because the interface for opening tap devices is via
> >> /dev/tapNN, and devtmpfs is not typically mounted inside a container as
> >> it's not namespace aware. It is possible to do a mknod() in the
> >> container once the tap devices are created; however, since the tap
> >> devices are created dynamically, it's not possible to allow access to
> >> specific major/minor numbers a priori, since we don't know what these
> >> are going to be. In addition, it's desirable not to allow the mknod
> >> capability in containers. This behavior, I think, is somewhat
> >> inconsistent with the tuntap driver, where one can create tuntap
> >> devices inside a container by first opening /dev/net/tun and then
> >> using them by supplying the tuntap device name via ioctl(TUNSETIFF).
> >> And since TUNSETIFF validates the network namespace, one is limited to
> >> opening network devices that belong to your current network namespace.
> >>
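For reference, the tuntap flow described above can be sketched in C roughly as follows. This is a minimal illustration assuming Linux headers; the helper name `tun_attach` is hypothetical. `TUNSETIFF` either attaches the fd to an existing TAP interface of that name or creates a new one (creating one needs CAP_NET_ADMIN in the namespace), and the name is looked up in the caller's network namespace, so no per-interface /dev node is involved.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <net/if.h>
#include <linux/if_tun.h>

/* Hypothetical helper: open /dev/net/tun and bind the fd to the TAP
 * interface `name` (attach if it exists, create it otherwise).
 * Returns the fd on success, or -1 with errno set on failure. */
int tun_attach(const char *name)
{
    struct ifreq ifr;
    int fd;

    /* ifr_name holds at most IFNAMSIZ - 1 characters plus the NUL */
    if (name == NULL || strlen(name) >= IFNAMSIZ) {
        errno = EINVAL;
        return -1;
    }

    fd = open("/dev/net/tun", O_RDWR);
    if (fd < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI;   /* Ethernet frames, no packet-info header */
    memcpy(ifr.ifr_name, name, strlen(name) + 1);

    /* The kernel resolves ifr_name in the caller's network namespace */
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

The returned fd is then used with read()/write() to exchange frames with the interface.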
> >> Here are some options for this issue that I wanted to get feedback
> >> on. I'm also wondering whether anybody else has run into this.
> >>
> >> 1)
> >>
> >> Don't create the tap device (such as macvtap) in the container. Instead,
> >> create the tap device outside of the container and then move it into the
> >> desired container network namespace. In addition, do a mknod() for the
> >> corresponding /dev/tapNN device from outside the container before doing
> >> chroot().
> >>
> >> This solution still doesn't allow tap devices to be created inside the
> >> container. Thus, in the case of kubevirt, which runs libvirtd inside of
> >> a container, it would mean changing libvirtd to open existing tap
> >> devices (as opposed to the current behavior of creating new ones). This
> >> would not require any kernel changes, but, as mentioned, it seems
> >> inconsistent with the tuntap interface.
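Because the macvtap character device's major number is allocated dynamically, the mknod() step in option 1 would have to read the device number at runtime rather than hardcode it. Below is a minimal C sketch under stated assumptions: the sysfs attribute location (something like /sys/class/net/<ifname>/macvtap/tap<ifindex>/dev, containing "MAJ:MIN") is an assumption to verify on the target kernel, the helper names are hypothetical, and the caller needs CAP_MKNOD. Since the ifindex can differ once the interface is in another namespace, the values should be read after the move, as seen from the target network namespace.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* makedev(), major(), minor() */
#include <sys/types.h>

/* Parse the "MAJ:MIN\n" text of a sysfs `dev` attribute into a dev_t. */
int parse_devnum(const char *text, dev_t *out)
{
    unsigned int maj, min;
    char trailing;

    /* Expect two numbers, optionally followed by a newline */
    int n = sscanf(text, "%u:%u%c", &maj, &min, &trailing);
    if (n < 2 || (n == 3 && trailing != '\n')) {
        errno = EINVAL;
        return -1;
    }
    *out = makedev(maj, min);
    return 0;
}

/* Read the device number from `sysfs_dev` (assumed path, see above) and
 * create a character device node at `node_path` with permissions `perm`. */
int mknod_from_sysfs(const char *sysfs_dev, const char *node_path, mode_t perm)
{
    char buf[32];
    dev_t devnum;
    FILE *f = fopen(sysfs_dev, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        errno = EIO;
        return -1;
    }
    fclose(f);

    if (parse_devnum(buf, &devnum) < 0)
        return -1;
    return mknod(node_path, S_IFCHR | perm, devnum);
}
```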
> >
> > Presumably the /dev/tapNN device name also changes when you rename
> > the tap device interface using SIOCSIFNAME ?
> >
>
> I don't think so. The NN is the ifindex of the device; changing the
> device name does not affect the ifindex.
Ah, right, that makes sense.
> > e.g. if it was /dev/tap24 on the host and you called SIOCSIFNAME(eth0)
> > when moving it into the container, it would be /dev/eth0 inside the
> > container ?
> >
>
> When moving it into the container the ifindex can change since the
> ifindex range is per-namespace (not global).
Oh, that's interesting; I hadn't realized that.
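Since /dev/tapNN is keyed on the ifindex and the ifindex is per-namespace, a consumer would have to resolve the index from inside the network namespace the interface currently lives in. A minimal C sketch of that lookup follows; the helper names are hypothetical, and it assumes a matching /dev/tapNN node has already been made available in the container.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <net/if.h>

/* Format the macvtap character device path for a given ifindex. */
int tap_path_for_index(unsigned int ifindex, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/dev/tap%u", ifindex);
    if (n < 0 || (size_t)n >= len) {
        errno = ENAMETOOLONG;
        return -1;
    }
    return 0;
}

/* Open the macvtap device node for `ifname`. if_nametoindex() resolves the
 * name in the caller's current network namespace, so the resulting NN
 * matches that namespace, not the one the device was created in. */
int macvtap_open(const char *ifname)
{
    char path[32];
    unsigned int idx = if_nametoindex(ifname);   /* 0 on failure, errno set */

    if (idx == 0)
        return -1;
    if (tap_path_for_index(idx, path, sizeof(path)) < 0)
        return -1;
    return open(path, O_RDWR);
}
```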
> > Anyway, given that this /dev/tapNN approach is what exists today,
> > libvirt will likely want to implement support for this regardless
> > in order to support existing kernels.
>
> OK, in this case whatever created the tap device outside of the
> container would pass the name of the device to libvirt and make sure
> that the /dev/tapNN device was set up correctly in the container. I
> believe this differs from how libvirt works today in that libvirt would
> need to be modified to open an existing device (I think it currently
> always creates new ones).
Libvirt can use a pre-created TAP device today, but not a pre-created
MACVTAP, so supporting the latter is new code for us no matter what.
> > One slight complication with either of the solutions above is that
> > libvirt won't know whether it is given a TAP or a MACVTAP device.
> > It'll only be given the device name. So with code today we would
> > probably have to first try /dev/tapNNN and if that doesn't exist
> > then try /dev/net/tun with TUNSETIFF.
> >
>
> Hmm, doesn't libvirt make this distinction today?
No need to make the distinction yet, since we only support pre-created
TAP devices right now. In cases where we create the devices ourselves,
we already know what is what.
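The probe order suggested earlier in this exchange (try /dev/tapNNN first, then /dev/net/tun with TUNSETIFF) could be sketched in C as follows, assuming Linux headers; `open_precreated_nic` is a hypothetical name, not existing libvirt code. If the name belongs to a MACVTAP whose /dev node is missing, the TUNSETIFF fallback should fail with EINVAL, because the tun driver only attaches to names that are genuine tun/tap devices.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <net/if.h>
#include <linux/if_tun.h>

/* Open a pre-created NIC whose type (TAP vs MACVTAP) is unknown.
 * Returns an fd, or -1 with errno set. */
int open_precreated_nic(const char *ifname)
{
    char path[32];
    struct ifreq ifr;
    unsigned int idx;
    int fd;

    if (ifname == NULL || strlen(ifname) >= IFNAMSIZ) {
        errno = EINVAL;
        return -1;
    }
    idx = if_nametoindex(ifname);          /* resolved in the current netns */
    if (idx == 0)
        return -1;

    /* Step 1: a /dev/tap<ifindex> node means MACVTAP */
    snprintf(path, sizeof(path), "/dev/tap%u", idx);
    fd = open(path, O_RDWR);
    if (fd >= 0 || errno != ENOENT)
        return fd;                          /* opened, or a real error */

    /* Step 2: fall back to attaching to an existing TAP by name.
     * (If the interface vanished after the lookup above, TUNSETIFF
     * would create a new one instead of failing.) */
    fd = open("/dev/net/tun", O_RDWR);
    if (fd < 0)
        return -1;
    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
    memcpy(ifr.ifr_name, ifname, strlen(ifname) + 1);
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```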
> > If we add a new /dev/net/tap, something that could seamlessly accept
> > either a TAP or MACVTAP NIC name would be nice.
> >
> >
>
> I think that if we added a new ioctl() as I proposed, it could accept
> either type of NIC.
OK, that would be nice.
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|