From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Jason Baron <jbaron@akamai.com>
Cc: fabiand@sni.github.map.fastly.net, libvir-list@redhat.com,
netdev@vger.kernel.org, Roman Mohr <rmohr@redhat.com>,
ebiederm@xmission.com, Martin Kletzander <mkletzan@redhat.com>,
davem@davemloft.net
Subject: Re: [libvirt] opening tap devices that are created in a container
Date: Tue, 10 Jul 2018 09:47:45 +0100
Message-ID: <20180710084745.GB1612@redhat.com>
In-Reply-To: <6f5f40b6-3637-c7a9-44f8-81352ece2bef@akamai.com>

On Mon, Jul 09, 2018 at 05:00:49PM -0400, Jason Baron wrote:
>
>
> On 07/08/2018 02:01 AM, Martin Kletzander wrote:
> > On Thu, Jul 05, 2018 at 06:24:20PM +0200, Roman Mohr wrote:
> >> On Thu, Jul 5, 2018 at 4:20 PM Jason Baron <jbaron@akamai.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> Opening tap devices, such as macvtap, that are created in containers is
> >>> problematic because the interface for opening tap devices is via
> >>> /dev/tapNN and devtmpfs is not typically mounted inside a container as
> >>> it's not namespace aware. It is possible to do a mknod() in the
> >>> container once the tap devices are created; however, since the tap
> >>> devices are created dynamically, it's not possible to allow access to
> >>> certain major/minor numbers a priori, since we don't know what these
> >>> are going to be. In addition, it's desirable to not allow the mknod
> >>> capability in containers. This behavior, I think, is somewhat
> >>> inconsistent with the tuntap driver, where one can create tuntap
> >>> devices inside a container by first opening /dev/net/tun and then
> >>> using them by supplying the tuntap device name via
> >>> ioctl(TUNSETIFF). And since TUNSETIFF validates the network
> >>> namespace, one is limited to opening network devices that belong
> >>> to your current network namespace.
> >>>
> >>> Here are some options for this issue that I wanted to get feedback
> >>> on; I'm also wondering whether anybody else has run into this.
> >>>
> >>> 1)
> >>>
> >>> Don't create the tap device, such as macvtap in the container. Instead,
> >>> create the tap device outside of the container and then move it into the
> >>> desired container network namespace. In addition, do a mknod() for the
> >>> corresponding /dev/tapNN device from outside the container before doing
> >>> chroot().
> >>>
> >>> This solution still doesn't allow tap devices to be created inside the
> >>> container. Thus, in the case of kubevirt, which runs libvirtd inside of
> >>> a container, it would mean changing libvirtd to open existing tap
> >>> devices (as opposed to the current behavior of creating new ones). This
> >>> would not require any kernel changes, but as mentioned seems
> >>> inconsistent with the tuntap interface.
> >>>
> >>
> >> For KubeVirt, apart from how exactly the device ends up in the
> >> container, I would want to pursue a way where all network
> >> preparations which require privileges happen from a privileged
> >> process *outside* of the container, like CNI solutions do it. They
> >> run outside, have privileges and then create devices in the right
> >> network/mount namespace or move them there. The final goal for
> >> KubeVirt is that our pod with the qemu process is completely
> >> unprivileged and privileged setup happens from outside.
> >>
> >> As a consequence, and depending on which route Dan pursues with the
> >> restructured libvirt, I would assume that either a privileged
> >> libvirtd-part outside of containers creates the devices by entering
> >> the right namespaces, or that libvirt in the container can consume
> >> pre-created tun/tap devices, like qemu.
> >>
> >
> > That would be nice, but as far as I understand there will always be a
> > need for some privileges if you want to use a tap device. It's nice
> > that CNI does that and all the containers can run unprivileged, but
> > that's because they do not open the tap device and they do not do any
> > privileged operations on it. But QEMU needs to. So the only way would
> > be passing an opened fd to the container or opening the tap device
> > there and making the fd usable for one process in the container. Is
> > this already supported for some type of containers in some way?
> >
> > Martin
>
> Hi,
>
> So another option here, call it #3, is to pass open fds via unix sockets.
> If there are privileged operations that QEMU is trying to do with the fd
> though, how will opening it first and then passing it to an unprivileged
> QEMU address that? Is the opener doing those operations first?
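A minimal sketch of option #3, assuming Linux AF_UNIX sockets: the privileged side sends an already-opened tap fd as SCM_RIGHTS ancillary data and the unprivileged side receives it. send_fd and recv_fd are hypothetical helper names, and how the two processes obtain the connected socket is left out.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one fd over a connected AF_UNIX socket. */
int send_fd(int sock, int fd)
{
    char byte = 'F'; /* at least one byte of normal data must accompany the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align; /* guarantees correct alignment of buf */
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS; /* kernel duplicates the fd into the receiver */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd; returns the new fd or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int fd;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received fd refers to the same open file description as the sender's, so any setup the opener performed on it before sending (for a tap fd, e.g. TUNSETIFF) stays in effect, and plain read()/write() on it needs no extra capability in the receiver.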

From libvirt's POV, it would be preferable to be able to open the
macvtap device by name inside the container, rather than having to
accept a pre-opened FD from the application.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list