* Guest bridge setup variations
From: Arnd Bergmann @ 2009-12-08 16:07 UTC
To: virtualization; +Cc: qemu-devel
As promised, here is my small writeup on which setups I feel
are important in the long run for server-type guests. This
does not cover -net user, which is really for desktop kinds
of applications where you do not want to connect into the
guest from another IP address.
I can see four separate setups that we may or may not want to
support, the main difference being how the forwarding between
guests happens:
1. The current setup, with a bridge and tun/tap devices on ports
of the bridge. This is what Gerhard's work on access controls is
focused on and the only option where the hypervisor actually
is in full control of the traffic between guests. CPU utilization should
be highest this way, and network management can be a burden,
because the controls are done through a Linux, libvirt and/or Director
specific interface.
2. Using macvlan as a bridging mechanism, replacing the bridge
and tun/tap entirely. This should offer the best performance on
inter-guest communication, both in terms of throughput and
CPU utilization, but offer no access control for this traffic at all.
Performance of guest-external traffic should be slightly better
than bridge/tap.
3. Doing the bridging in the NIC using macvlan in passthrough
mode. This lowers the CPU utilization further compared to 2,
at the expense of limiting throughput by the performance of
the PCIe interconnect to the adapter. Whether or not this
is a win is workload dependent. Access controls now happen
in the NIC. Currently, this is not supported yet, due to lack of
device drivers, but it will be an important scenario in the future
according to some people.
4. Using macvlan for actual VEPA on the outbound interface.
This is mostly interesting because it makes the network access
controls visible in an external switch that is already managed.
CPU utilization and guest-external throughput should be
identical to 3, but inter-guest latency can only be worse because
all frames go through the external switch.
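To make setups 1, 2 and 4 more concrete, here is a rough shell sketch of how
they are typically configured with standard tools. The interface names are
examples, the macvlan mode selection assumes a recent enough kernel and
iproute2, and setup 3 is omitted because the driver support does not exist yet:

  # 1. bridge plus tun/tap, one tap device per guest
  brctl addbr br0
  brctl addif br0 eth0
  tunctl -t tap0              # or: ip tuntap add dev tap0 mode tap (newer iproute2)
  brctl addif br0 tap0
  ip link set br0 up
  ip link set tap0 up
  qemu -net nic,model=virtio -net tap,ifname=tap0,script=no ...

  # 2. macvlan in bridge mode, one macvlan device per guest, no bridge needed
  ip link add link eth0 name macvlan0 type macvlan mode bridge
  ip link set macvlan0 up

  # 4. macvlan in VEPA mode, inter-guest frames are reflected by the external switch
  ip link add link eth0 name macvlan0 type macvlan mode vepa
  ip link set macvlan0 up

How the macvlan devices in setups 2 and 4 are then connected to qemu (raw
socket or macvtap) is the choice discussed in the next paragraph.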
In cases 2 through 4, we have the choice between macvtap and
the raw packet interface for connecting macvlan to qemu.
Raw sockets are better tested right now, while macvtap has
better permission management (i.e. it does not require
CAP_NET_ADMIN). Neither one is upstream though at the
moment. The raw driver only requires qemu patches, while
macvtap requires both a new kernel driver and a trivial change
in qemu.
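To illustrate the macvtap variant, and assuming the out-of-tree macvtap driver
and matching iproute2 support mentioned above, the hookup could look roughly
like this (the raw-socket variant instead opens a packet socket bound to the
macvlan interface, which is where the elevated privileges come in):

  # create a macvtap device on top of the physical NIC (VEPA mode as an example)
  ip link add link eth0 name macvtap0 type macvtap mode vepa
  ip link set macvtap0 up
  # macvtap exposes a character device named after the interface index
  cat /sys/class/net/macvtap0/ifindex      # prints e.g. 12 -> /dev/tap12
  # hand the already-open file descriptor to qemu's existing tap backend
  qemu -net nic,model=virtio -net tap,fd=3 3<>/dev/tap12 ...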
In all four cases, vhost-net could be used to move the workload
from user space into the kernel, which may be an advantage.
The decision for or against vhost-net is entirely independent of
the other decisions.
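As a sketch of how this might be wired up, assuming vhost-net gets merged with
an option along the lines currently being discussed (the option names here are
an assumption, not the final syntax):

  # virtio-net frontend with the in-kernel vhost-net backend doing the data path
  qemu -netdev tap,id=net0,ifname=tap0,script=no,vhost=on \
       -device virtio-net-pci,netdev=net0 ...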
Arnd
* RE: Guest bridge setup variations
From: Fischer, Anna @ 2009-12-10 12:26 UTC
To: Arnd Bergmann
Cc: qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org
> Subject: Guest bridge setup variations
>
> As promised, here is my small writeup on which setups I feel
> are important in the long run for server-type guests. This
> does not cover -net user, which is really for desktop kinds
> of applications where you do not want to connect into the
> guest from another IP address.
>
> I can see four separate setups that we may or may not want to
> support, the main difference being how the forwarding between
> guests happens:
>
> 1. The current setup, with a bridge and tun/tap devices on ports
> of the bridge. This is what Gerhard's work on access controls is
> focused on and the only option where the hypervisor actually
> is in full control of the traffic between guests. CPU utilization should
> be highest this way, and network management can be a burden,
> because the controls are done through a Linux, libvirt and/or Director
> specific interface.
>
> 2. Using macvlan as a bridging mechanism, replacing the bridge
> and tun/tap entirely. This should offer the best performance on
> inter-guest communication, both in terms of throughput and
> CPU utilization, but offer no access control for this traffic at all.
> Performance of guest-external traffic should be slightly better
> than bridge/tap.
>
> 3. Doing the bridging in the NIC using macvlan in passthrough
> mode. This lowers the CPU utilization further compared to 2,
> at the expense of limiting throughput by the performance of
> the PCIe interconnect to the adapter. Whether or not this
> is a win is workload dependent. Access controls now happen
> in the NIC. Currently, this is not supported yet, due to lack of
> device drivers, but it will be an important scenario in the future
> according to some people.
Can you differentiate this option from typical PCI pass-through mode? It is not clear to me where macvlan sits in a setup where the NIC does bridging.
Typically, in a PCI pass-through configuration, all configuration goes through the physical function device driver (and all data goes directly to the NIC). Are you suggesting to use macvlan as a common configuration layer that then configures the underlying NIC? I could see some benefit in such a model, though I am not certain I understand you correctly.
Thanks,
Anna
* Re: Guest bridge setup variations
From: Arnd Bergmann @ 2009-12-10 14:18 UTC
To: Fischer, Anna
Cc: qemu-devel@nongnu.org, virtualization@lists.linux-foundation.org
On Thursday 10 December 2009, Fischer, Anna wrote:
> >
> > 3. Doing the bridging in the NIC using macvlan in passthrough
> > mode. This lowers the CPU utilization further compared to 2,
> > at the expense of limiting throughput by the performance of
> > the PCIe interconnect to the adapter. Whether or not this
> > is a win is workload dependent. Access controls now happen
> > in the NIC. Currently, this is not supported yet, due to lack of
> > device drivers, but it will be an important scenario in the future
> > according to some people.
>
> Can you differentiate this option from typical PCI pass-through mode?
> It is not clear to me where macvlan sits in a setup where the NIC does
> bridging.
In this setup (hypothetical so far, the code doesn't exist yet), we use
the configuration logic of macvlan, but not the forwarding. This also
doesn't do PCI pass-through but instead gives all the logical interfaces
to the host, using only the bridging and traffic separation capabilities
of the NIC, but not the PCI-separation.
Intel calls this mode VMDq, as opposed to SR-IOV, which implies
the assignment of the adapter to a guest.
It was confusing of me to call it passthrough above, sorry for that.
> Typically, in a PCI pass-through configuration, all configuration goes
> through the physical function device driver (and all data goes directly
> to the NIC). Are you suggesting to use macvlan as a common
> configuration layer that then configures the underlying NIC?
> I could see some benefit in such a model, though I am not certain I
> understand you correctly.
This is something I also have been thinking about, but it is not what
I was referring to above. I think it would be good to keep the three
cases (macvlan, VMDq, SR-IOV) as similar as possible from the user
perspective, so using macvlan as an infrastructure for all of them
sounds reasonable to me.
The difference between VMDq and SR-IOV in that case would be
that the former uses a virtio-net driver in the guest and a hardware
driver in the host, while the latter uses a hardware driver in the guest
only. The data flow in these two cases would be identical, though, while
in the classic macvlan setup the forwarding decisions are made in
the host kernel.
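To illustrate the intended symmetry, a sketch under the assumption that the
VMDq-style driver support materializes; the device-assignment option shown for
SR-IOV is the qemu-kvm syntax of the day and the PCI address is just an example:

  # classic macvlan: forwarding decided in the host kernel, virtio-net in the guest
  ip link add link eth0 name guest0 type macvlan mode bridge

  # VMDq-style (hypothetical driver support): same host-side macvlan configuration,
  # but the NIC sorts frames into per-guest queues; the guest still sees virtio-net

  # SR-IOV: the VF is assigned to the guest, which runs the hardware driver itself
  qemu-kvm ... -pcidevice host=01:10.0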
Arnd
* Re: Guest bridge setup variations
From: Alexander Graf @ 2009-12-10 19:14 UTC
To: Arnd Bergmann
Cc: Fischer, Anna, qemu-devel@nongnu.org,
virtualization@lists.linux-foundation.org
On 10.12.2009, at 15:18, Arnd Bergmann wrote:
> On Thursday 10 December 2009, Fischer, Anna wrote:
>>>
>>> 3. Doing the bridging in the NIC using macvlan in passthrough
>>> mode. This lowers the CPU utilization further compared to 2,
>>> at the expense of limiting throughput by the performance of
>>> the PCIe interconnect to the adapter. Whether or not this
>>> is a win is workload dependent. Access controls now happen
>>> in the NIC. Currently, this is not supported yet, due to lack of
>>> device drivers, but it will be an important scenario in the future
>>> according to some people.
>>
>> Can you differentiate this option from typical PCI pass-through mode?
>> It is not clear to me where macvlan sits in a setup where the NIC does
>> bridging.
>
> In this setup (hypothetical so far, the code doesn't exist yet), we use
> the configuration logic of macvlan, but not the forwarding. This also
> doesn't do PCI pass-through but instead gives all the logical interfaces
> to the host, using only the bridging and traffic separation capabilities
> of the NIC, but not the PCI-separation.
>
> Intel calls this mode VMDq, as opposed to SR-IOV, which implies
> the assignment of the adapter to a guest.
>
> It was confusing of me to call it passthrough above, sorry for that.
>
>> Typically, in a PCI pass-through configuration, all configuration goes
>> through the physical function device driver (and all data goes directly
>> to the NIC). Are you suggesting to use macvlan as a common
>> configuration layer that then configures the underlying NIC?
>> I could see some benefit in such a model, though I am not certain I
>> understand you correctly.
>
> This is something I also have been thinking about, but it is not what
> I was referring to above. I think it would be good to keep the three
> cases (macvlan, VMDq, SR-IOV) as similar as possible from the user
> perspective, so using macvlan as an infrastructure for all of them
> sounds reasonable to me.
Oh, so you'd basically do -net vt-d,if=eth0 and the rest would automatically work? That's a pretty slick idea!
Alex
* RE: Guest bridge setup variations
From: Leonid Grossman @ 2009-12-16 0:55 UTC
To: virtualization; +Cc: qemu-devel
> > -----Original Message-----
> > From: virtualization-bounces@lists.linux-foundation.org
> > [mailto:virtualization-bounces@lists.linux-foundation.org] On Behalf Of
> > Arnd Bergmann
> > Sent: Tuesday, December 08, 2009 8:08 AM
> > To: virtualization@lists.linux-foundation.org
> > Cc: qemu-devel@nongnu.org
> > Subject: Guest bridge setup variations
> >
> > As promised, here is my small writeup on which setups I feel
> > are important in the long run for server-type guests. This
> > does not cover -net user, which is really for desktop kinds
> > of applications where you do not want to connect into the
> > guest from another IP address.
> >
> > I can see four separate setups that we may or may not want to
> > support, the main difference being how the forwarding between
> > guests happens:
> >
> > 1. The current setup, with a bridge and tun/tap devices on ports
> > of the bridge. This is what Gerhard's work on access controls is
> > focused on and the only option where the hypervisor actually
> > is in full control of the traffic between guests. CPU utilization should
> > be highest this way, and network management can be a burden,
> > because the controls are done through a Linux, libvirt and/or Director
> > specific interface.
> >
> > 2. Using macvlan as a bridging mechanism, replacing the bridge
> > and tun/tap entirely. This should offer the best performance on
> > inter-guest communication, both in terms of throughput and
> > CPU utilization, but offer no access control for this traffic at all.
> > Performance of guest-external traffic should be slightly better
> > than bridge/tap.
> >
> > 3. Doing the bridging in the NIC using macvlan in passthrough
> > mode. This lowers the CPU utilization further compared to 2,
> > at the expense of limiting throughput by the performance of
> > the PCIe interconnect to the adapter. Whether or not this
> > is a win is workload dependent.
This is certainly true today for pci-e 1.1 and 2.0 devices, but
as NICs move to pci-e 3.0 (while remaining almost exclusively dual port
10GbE for a long while),
EVB internal bandwidth will significantly exceed external bandwidth.
So, #3 can become a win for most inter-guest workloads.
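Rough, illustrative numbers behind this (per direction, raw lane rates minus
encoding overhead, ignoring protocol overhead):

  dual-port 10GbE:  2 x 10 Gbit/s            = 20 Gbit/s external
  PCIe 1.1 x8:      8 x 2.5 GT/s x 8b/10b    = 16 Gbit/s to the host
  PCIe 2.0 x8:      8 x 5 GT/s x 8b/10b      = 32 Gbit/s to the host
  PCIe 3.0 x8:      8 x 8 GT/s x 128b/130b   = ~63 Gbit/s to the host

So with PCIe 3.0 x8 the host-facing path offers roughly three times the
bandwidth of the external ports, and inter-guest traffic through the adapter
is much less likely to be throttled by the interconnect.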
> > Access controls now happen
> > in the NIC. Currently, this is not supported yet, due to lack of
> > device drivers, but it will be an important scenario in the future
> > according to some people.
Actually, the x3100 10GbE drivers support this today via a sysfs interface
to the host driver, which can choose to control the VEB tables (and therefore
MAC addresses, VLAN memberships, etc. for all passthru interfaces behind the
VEB).
Of course a more generic, vendor-independent interface will be important
in the future.
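For illustration only, host-driver control of the VEB through sysfs could look
something like the following; the paths and attribute names here are made up
for the example and are not the actual x3100 interface:

  # hypothetical sysfs attributes -- not the real x3100 paths
  echo 52:54:00:12:34:56 > /sys/class/net/eth0/vf0/mac_list   # MACs the VEB accepts for VF 0
  echo 100               > /sys/class/net/eth0/vf0/vlans      # VLAN membership of VF 0
  cat /sys/class/net/eth0/vf0/stats                           # per-VF counters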
> >
> > 4. Using macvlan for actual VEPA on the outbound interface.
> > This is mostly interesting because it makes the network access
> > controls visible in an external switch that is already managed.
> > CPU utilization and guest-external throughput should be
> > identical to 3, but inter-guest latency can only be worse because
> > all frames go through the external switch.
> >
> > In cases 2 through 4, we have the choice between macvtap and
> > the raw packet interface for connecting macvlan to qemu.
> > Raw sockets are better tested right now, while macvtap has
> > better permission management (i.e. it does not require
> > CAP_NET_ADMIN). Neither one is upstream though at the
> > moment. The raw driver only requires qemu patches, while
> > macvtap requires both a new kernel driver and a trivial change
> > in qemu.
> >
> > In all four cases, vhost-net could be used to move the workload
> > from user space into the kernel, which may be an advantage.
> > The decision for or against vhost-net is entirely independent of
> > the other decisions.
> >
> > Arnd
* Re: Guest bridge setup variations
From: Arnd Bergmann @ 2009-12-16 14:15 UTC
To: virtualization; +Cc: Leonid Grossman, qemu-devel
On Wednesday 16 December 2009, Leonid Grossman wrote:
> > > 3. Doing the bridging in the NIC using macvlan in passthrough
> > > mode. This lowers the CPU utilization further compared to 2,
> > > at the expense of limiting throughput by the performance of
> > > the PCIe interconnect to the adapter. Whether or not this
> > > is a win is workload dependent.
>
> This is certainly true today for pci-e 1.1 and 2.0 devices, but
> as NICs move to pci-e 3.0 (while remaining almost exclusively dual port
> 10GbE for a long while),
> EVB internal bandwidth will significantly exceed external bandwidth.
> So, #3 can become a win for most inter-guest workloads.
Right, it's also hardware dependent, but it usually comes down
to whether it's cheaper to spend CPU cycles or to spend IO bandwidth.
I would be surprised if all future machines with PCIe 3.0 suddenly have
a huge surplus of bandwidth but no CPU to keep up with that.
> > > Access controls now happen
> > > in the NIC. Currently, this is not supported yet, due to lack of
> > > device drivers, but it will be an important scenario in the future
> > > according to some people.
>
> Actually, the x3100 10GbE drivers support this today via a sysfs interface
> to the host driver, which can choose to control the VEB tables (and therefore
> MAC addresses, VLAN memberships, etc. for all passthru interfaces behind the
> VEB).
Ok, I didn't know about that.
> Of course a more generic, vendor-independent interface will be important
> in the future.
Right. I hope we can come up with something soon. I'll have a look at
what your driver does and see if that can be abstracted in some way.
I expect that if we can find an interface between the kernel and the
device driver that works for two or three NIC implementations, it will be
good enough to adapt to everyone else as well.
Arnd
* RE: Guest bridge setup variations
From: Leonid Grossman @ 2009-12-17 6:18 UTC
To: Arnd Bergmann, virtualization; +Cc: qemu-devel
> -----Original Message-----
> From: Arnd Bergmann [mailto:arnd@arndb.de]
> Sent: Wednesday, December 16, 2009 6:16 AM
> To: virtualization@lists.linux-foundation.org
> Cc: Leonid Grossman; qemu-devel@nongnu.org
> Subject: Re: Guest bridge setup variations
>
> On Wednesday 16 December 2009, Leonid Grossman wrote:
> > > > 3. Doing the bridging in the NIC using macvlan in passthrough
> > > > mode. This lowers the CPU utilization further compared to 2,
> > > > at the expense of limiting throughput by the performance of
> > > > the PCIe interconnect to the adapter. Whether or not this
> > > > is a win is workload dependent.
> >
> > This is certainly true today for pci-e 1.1 and 2.0 devices, but
> > as NICs move to pci-e 3.0 (while remaining almost exclusively dual port
> > 10GbE for a long while),
> > EVB internal bandwidth will significantly exceed external bandwidth.
> > So, #3 can become a win for most inter-guest workloads.
>
> Right, it's also hardware dependent, but it usually comes down
> to whether it's cheaper to spend CPU cycles or to spend IO bandwidth.
>
> I would be surprised if all future machines with PCIe 3.0 suddenly have
> a huge surplus of bandwidth but no CPU to keep up with that.
>
> > > > Access controls now happen
> > > > in the NIC. Currently, this is not supported yet, due to lack of
> > > > device drivers, but it will be an important scenario in the future
> > > > according to some people.
> >
> > Actually, the x3100 10GbE drivers support this today via a sysfs interface
> > to the host driver, which can choose to control the VEB tables (and
> > therefore MAC addresses, VLAN memberships, etc. for all passthru
> > interfaces behind the VEB).
>
> Ok, I didn't know about that.
>
> > Of course a more generic, vendor-independent interface will be important
> > in the future.
>
> Right. I hope we can come up with something soon. I'll have a look at
> what your driver does and see if that can be abstracted in some way.
Sounds good. Please let us know whether looking at the code/documentation
will suffice, or whether you need a couple of cards to go along with the code.
> I expect that if we can find an interface between the kernel and the
> device driver that works for two or three NIC implementations, it will be
> good enough to adapt to everyone else as well.
The interface will likely evolve along with the EVB standards and other
developments, but an initial implementation can be pretty basic (and
vendor-independent).
Early IOV NIC deployments can benefit from an interface that sets a couple
of VF parameters missing from the "legacy" NIC interface - things like a
bandwidth limit and a list of MAC addresses (since putting a NIC in promisc
mode doesn't work well for a VEB, the VEB is currently forced to learn the
addresses it is configured for).
The interface could also include querying IOV NIC capabilities such as the
number of VFs and support for VEB and/or VEPA mode, as well as getting VF
stats and MAC/VLAN tables - all in all, it is not a long list.
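As a sketch of the kind of per-VF controls meant here, using an
'ip link ... vf' style of syntax (shown purely as an illustration of the list
above, not as an existing interface):

  ip link set eth0 vf 0 mac 52:54:00:12:34:56   # MAC address the VEB accepts for VF 0
  ip link set eth0 vf 0 vlan 100                # VLAN membership for VF 0
  ip link set eth0 vf 0 rate 1000               # transmit rate limit in Mbit/s
  ip link show eth0                             # also reports the per-VF MAC/VLAN settings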
>
> Arnd