* discussion questions: SR-IOV, virtualization, and bonding
From: Chris Friesen @ 2012-08-02 19:21 UTC
To: e1000-devel@lists.sourceforge.net, netdev
Hi all,
I wanted to just highlight some issues that we're seeing and see what
others are doing in this area.
Our configuration is that we have a host with SR-IOV-capable NICs with
bonding enabled on the PF. Depending on the exact system it could be
active/standby or some form of active/active.
In the guests we generally have several VFs (corresponding to several
PFs) and we want to bond them for reliability.
We're seeing a number of issues:
1) If the guests use arp monitoring then broadcast arp packets from the
guests are visible on the other guests and on the host, and can cause
them to think the link is good even if we aren't receiving arp packets
from the external network. (I'm assuming carrier is up.)
2) If both the host and guest use active/backup but pick different
devices as the active, there is no traffic between host/guest over the
bond link. Packets are sent out the active and looped back internally
to arrive on the inactive, then skb_bond_should_drop() suppresses them.
3) For active/standby the default is to set the standby to the MAC
address of the bond. If the host has already set the MAC address (using
some algorithm to ensure uniqueness within the local network) then the
guest is not allowed to change it.
So far the solutions to 1 seem to be either to use arp validation (which
currently doesn't exist for the load-balancing modes) or else to have the
underlying ethernet driver distinguish between packets coming from the
wire and packets looped back internally, and have the bonding driver only
set last_rx for external packets.
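For concreteness, the sort of guest-side setup I have in mind -- a
sketch only, with placeholder interface names and ARP target:

    # arp monitor with validation; arp_validate currently only
    # exists for active-backup mode
    modprobe bonding mode=active-backup arp_interval=100 \
        arp_ip_target=192.0.2.1 arp_validate=all
    ifconfig bond0 up
    ifenslave bond0 eth0 eth1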
For issue 2, it would seem beneficial for the host to be able to ensure
that the guest uses the same link as the active. I don't see a tidy
solution. One somewhat messy possibility is to have bonding send a
message to the standby PF, which then tells all its VFs to fake loss of
carrier.
For issue 3, the logical solution would seem to be some way of assigning
a list of "valid" MAC addresses to a given VF--like maybe all MAC
addresses assigned to a VM or something. Anyone have any bright ideas?
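Today the host side can pin exactly one MAC per VF; what seems to be
missing is a way to allow a set. Roughly (a sketch, with made-up
addresses and device names):

    # existing: host pins a single MAC for VF 0 of PF eth2
    ip link set eth2 vf 0 mac 52:54:00:aa:bb:cc
    # wanted (hypothetical, doesn't exist today): some way to pass a
    # list of allowed MACs, so the guest's bond can move an address
    # between its VFs without being blocked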
I'm sure we're not the only ones running into this, so what are others
doing? Is the only current option to use active/active with miimon?
Chris
--
Chris Friesen
Software Designer
3500 Carling Avenue
Ottawa, Ontario K2H 8E9
www.genband.com
* Re: discussion questions: SR-IOV, virtualization, and bonding
From: Jay Vosburgh @ 2012-08-02 20:30 UTC
To: Chris Friesen; +Cc: e1000-devel@lists.sourceforge.net, netdev
Chris Friesen <chris.friesen@genband.com> wrote:
>Hi all,
>
>I wanted to just highlight some issues that we're seeing and see what
>others are doing in this area.
>
>Our configuration is that we have a host with SR-IOV-capable NICs with
>bonding enabled on the PF. Depending on the exact system it could be
>active/standby or some form of active/active.
>
>In the guests we generally have several VFs (corresponding to several
>PFs) and we want to bond them for reliability.
>
>We're seeing a number of issues:
>
>1) If the guests use arp monitoring then broadcast arp packets from the
>guests are visible on the other guests and on the host, and can cause
>them to think the link is good even if we aren't receiving arp packets
>from the external network. (I'm assuming carrier is up.)
>
>2) If both the host and guest use active/backup but pick different
>devices as the active, there is no traffic between host/guest over the
>bond link. Packets are sent out the active and looped back internally
>to arrive on the inactive, then skb_bond_should_drop() suppresses them.
Just to be sure that I'm following this correctly, you're
setting up active-backup bonds on the guest and the host. The guest
sets its active slave to be a VF from "SR-IOV Device A," but the host
sets its active slave to a PF from "SR-IOV Device B." Traffic from the
guest to the host then arrives at the host's inactive slave (it's PF for
"SR-IOV Device A") and is then dropped.
Correct?
>3) For active/standby the default is to set the standby to the MAC
>address of the bond. If the host has already set the MAC address (using
>some algorithm to ensure uniqueness within the local network) then the
>guest is not allowed to change it.
>
>
>So far the solutions to 1 seem to be either to use arp validation (which
>currently doesn't exist for the load-balancing modes) or else to have the
>underlying ethernet driver distinguish between packets coming from the
>wire and packets looped back internally, and have the bonding driver only
>set last_rx for external packets.
As discussed previously, e.g.,:
http://marc.info/?l=linux-netdev&m=134316327912154&w=2
implementing arp_validate for load balance modes is tricky at
best, regardless of SR-IOV issues.
This is really a variation on the situation that led to the
arp_validate functionality in the first place (that multiple instances
of ARP monitor on a subnet can fool one another), except that the switch
here is within the SR-IOV device and the various hosts are guests.
The best long term solution is to have a user space API that
provides link state input to bonding on a per-slave basis, and then some
user space entity can perform whatever link monitoring method is
appropriate (e.g., LLDP) and pass the results to bonding.
>For issue 2, it would seem beneficial for the host to be able to ensure
>that the guest uses the same link as the active. I don't see a tidy
>solution. One somewhat messy possibility is to have bonding send a
>message to the standby PF, which then tells all its VFs to fake loss of
>carrier.
There is no tidy solution here that I'm aware of; this has been
a long-standing concern in bladecenter-type network environments,
wherein all blade "eth0" interfaces connect to one chassis switch, and
all blade "eth1" interfaces connect to a different chassis switch. If
those switches are not connected, then there may not be a path from
blade A:eth0 to blade B:eth1. There is no simple mechanism to force a
gang failover across multiple hosts.
That said, I've seen a slight rub on this using virtualized
network devices (pseries ehea, which is similar in principle to SR-IOV,
although implemented differently). In that case, the single ehea card
provides all "eth0" devices for all lpars (logical partitions,
"guests"). A separate card (or individual per-lpar cards) provides the
"eth1" devices.
In this configuration, the bonding primary option is used to
make eth0 the primary, and thus all lpars use eth0 preferentially, and
there is no connectivity issue. If the ehea card itself fails, all of
the bonds will fail over simultaneously to the backup devices, and
again, there is no connectivity issue. This works because the ehea is a
single point of failure for all of the partitions.
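In bonding terms, each lpar's setup is roughly the following (a
sketch; device names are illustrative, with eth0 being the shared
ehea device):

    # all lpars prefer the same side, so there is no connectivity issue
    modprobe bonding mode=active-backup miimon=100 primary=eth0
    ifconfig bond0 up
    ifenslave bond0 eth0 eth1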
Note that the ehea can propagate link failure of its external
port (the one that connects to a "real" switch) to its internal ports
(what the lpars see), so that bonding can detect the link failure. This
is an option to ehea; by default, all internal ports are always carrier
up so that they can communicate with one another regardless of the
external port link state. To my knowledge, this is used with miimon,
not the arp monitor.
I don't know how SR-IOV operates in this regard (e.g., can VFs
fail independently from the PF?). It is somewhat different from your
case in that there is no equivalent to the PF in the ehea case. If the
PFs participate in the primary setting it will likely permit initial
connectivity, but I'm not sure if a PF plus all its VFs fail as a unit
(from bonding's point of view).
>For issue 3, the logical solution would seem to be some way of assigning
>a list of "valid" MAC addresses to a given VF--like maybe all MAC
>addresses assigned to a VM or something. Anyone have any bright ideas?
There's an option to bonding, fail_over_mac, that modifies
bonding's handling of the slaves' MAC address(es). One setting,
"active" instructs bonding to make its MAC be whatever the currently
active slave's MAC is, never changing any of the slave's MAC addresses.
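For example (a sketch; this setting has the caveats described in
bonding.txt):

    # the bond takes on the active slave's MAC rather than writing
    # its own MAC to every slave
    modprobe bonding mode=active-backup miimon=100 fail_over_mac=active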
>I'm sure we're not the only ones running into this, so what are others
>doing? Is the only current option to use active/active with miimon?
I think you're at least close to the edge here; I've only done
some basic testing of bonding with SR-IOV, although I'm planning to do
some more early next week (and what you've found has been good input for
me, so thanks for that, at least).
I suspect that some bonding configurations are simply not going
to work at all; e.g., I'm not aware of any SR-IOV devices that implement
LACP on the internal switch, and in any event, it would have to create
aggregators that span across physical network devices to be really
useful.
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
* Re: discussion questions: SR-IOV, virtualization, and bonding
From: Chris Friesen @ 2012-08-02 22:26 UTC
To: Jay Vosburgh; +Cc: e1000-devel@lists.sourceforge.net, netdev
On 08/02/2012 02:30 PM, Jay Vosburgh wrote:
>
> Chris Friesen <chris.friesen@genband.com> wrote:
>> 2) If both the host and guest use active/backup but pick different
>> devices as the active, there is no traffic between host/guest over the
>> bond link. Packets are sent out the active and looped back internally
>> to arrive on the inactive, then skb_bond_should_drop() suppresses them.
>
> Just to be sure that I'm following this correctly, you're
> setting up active-backup bonds on the guest and the host. The guest
> sets its active slave to be a VF from "SR-IOV Device A," but the host
> sets its active slave to a PF from "SR-IOV Device B." Traffic from the
> guest to the host then arrives at the host's inactive slave (its PF for
> "SR-IOV Device A") and is then dropped.
>
> Correct?
Yes, that's correct. The issue is that the internal switch on device A
knows nothing about device B. Ideally what should happen is that the
internal switch routes the packets out onto the wire so that they come
back in on device B and get routed up to the host. However, at least
with the Intel devices the internal switch has no learning capabilities.
The alternative is to have the external switch(es) configured to do the
loopback, but that puts some extra requirements on the selection of the
external switch.
>> So far the solutions to 1 seem to be either to use arp validation (which
>> currently doesn't exist for the load-balancing modes) or else to have the
>> underlying ethernet driver distinguish between packets coming from the
>> wire and packets looped back internally, and have the bonding driver only
>> set last_rx for external packets.
>
> As discussed previously, e.g.,:
>
> http://marc.info/?l=linux-netdev&m=134316327912154&w=2
>
> implementing arp_validate for load balance modes is tricky at
> best, regardless of SR-IOV issues.
Yes, I should have referenced that discussion. I thought I'd include it
here with the other issues to group everything together.
> This is really a variation on the situation that led to the
> arp_validate functionality in the first place (that multiple instances
> of ARP monitor on a subnet can fool one another), except that the switch
> here is within the SR-IOV device and the various hosts are guests.
>
> The best long term solution is to have a user space API that
> provides link state input to bonding on a per-slave basis, and then some
> user space entity can perform whatever link monitoring method is
> appropriate (e.g., LLDP) and pass the results to bonding.
I think this has potential. This requires a virtual communication
channel between guest/host if we want the host to be able to influence
the guest's choice of active link, but I think that's not unreasonable.
Actually, couldn't we do this now? Turn off miimon and arpmon, then
just have the userspace thing write to
/sys/class/net/bondX/bonding/active_slave.
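Something along these lines, I mean (a rough sketch; the policy that
decides which slave is "good" is the hard part and is omitted):

    # a userspace monitor picks the active slave itself
    echo eth1 > /sys/class/net/bond0/bonding/active_slave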
>> For issue 2, it would seem beneficial for the host to be able to ensure
>> that the guest uses the same link as the active. I don't see a tidy
>> solution. One somewhat messy possibility is to have bonding send a
>> message to the standby PF, which then tells all its VFs to fake loss of
>> carrier.
>
> There is no tidy solution here that I'm aware of; this has been
> a long-standing concern in bladecenter-type network environments,
> wherein all blade "eth0" interfaces connect to one chassis switch, and
> all blade "eth1" interfaces connect to a different chassis switch. If
> those switches are not connected, then there may not be a path from
> blade A:eth0 to blade B:eth1. There is no simple mechanism to force a
> gang failover across multiple hosts.
In our blade server environment those two switches are indeed
cross-connected, so we haven't had to do gang-failover.
> Note that the ehea can propagate link failure of its external
> port (the one that connects to a "real" switch) to its internal ports
> (what the lpars see), so that bonding can detect the link failure. This
> is an option to ehea; by default, all internal ports are always carrier
> up so that they can communicate with one another regardless of the
> external port link state. To my knowledge, this is used with miimon,
> not the arp monitor.
>
> I don't know how SR-IOV operates in this regard (e.g., can VFs
> fail independently from the PF?). It is somewhat different from your
> case in that there is no equivalent to the PF in the ehea case. If the
> PFs participate in the primary setting it will likely permit initial
> connectivity, but I'm not sure if a PF plus all its VFs fail as a unit
> (from bonding's point of view).
With current Intel drivers at least, if the PF detects link failure it
fires a message to the VFs and they detect link failure within a short
time (milliseconds).
We can recommend the use of the "primary" option, but we don't always
have total control over what the guests do, and some of them don't want
to use "primary"; I'm not sure why.
>> For issue 3, the logical solution would seem to be some way of assigning
>> a list of "valid" MAC addresses to a given VF--like maybe all MAC
>> addresses assigned to a VM or something. Anyone have any bright ideas?
>
> There's an option to bonding, fail_over_mac, that modifies
> bonding's handling of the slaves' MAC address(es). One setting,
> "active" instructs bonding to make its MAC be whatever the currently
> active slave's MAC is, never changing any of the slave's MAC addresses.
Yes, I'm aware of that option. It does have drawbacks though, as
described in the bonding.txt docs.
>> I'm sure we're not the only ones running into this, so what are others
>> doing? Is the only current option to use active/active with miimon?
>
> I think you're at least close to the edge here; I've only done
> some basic testing of bonding with SR-IOV, although I'm planning to do
> some more early next week (and what you've found has been good input for
> me, so thanks for that, at least).
Glad we could help. :)
Chris
* Re: discussion questions: SR-IOV, virtualization, and bonding
From: Chris Friesen @ 2012-08-02 22:33 UTC
To: Jay Vosburgh; +Cc: e1000-devel@lists.sourceforge.net, netdev
On 08/02/2012 04:26 PM, Chris Friesen wrote:
> On 08/02/2012 02:30 PM, Jay Vosburgh wrote:
>> The best long term solution is to have a user space API that
>> provides link state input to bonding on a per-slave basis, and then some
>> user space entity can perform whatever link monitoring method is
>> appropriate (e.g., LLDP) and pass the results to bonding.
>
> I think this has potential. This requires a virtual communication
> channel between guest/host if we want the host to be able to influence
> the guest's choice of active link, but I think that's not unreasonable.
>
> Actually, couldn't we do this now? Turn off miimon and arpmon, then just
> have the userspace thing write to /sys/class/net/bondX/bonding/active_slave
Hmm...looks like the bonding code requires either miimon or arpmon. I
wonder if setting miimon to INT_MAX might work, at least for some
bonding modes.
Chris
* Re: [E1000-devel] discussion questions: SR-IOV, virtualization, and bonding
From: Jay Vosburgh @ 2012-08-02 23:01 UTC
To: Chris Friesen; +Cc: e1000-devel@lists.sourceforge.net, netdev
Chris Friesen <chris.friesen@genband.com> wrote:
>On 08/02/2012 04:26 PM, Chris Friesen wrote:
>> On 08/02/2012 02:30 PM, Jay Vosburgh wrote:
>
>>> The best long term solution is to have a user space API that
>>> provides link state input to bonding on a per-slave basis, and then some
>>> user space entity can perform whatever link monitoring method is
>>> appropriate (e.g., LLDP) and pass the results to bonding.
>>
>> I think this has potential. This requires a virtual communication
>> channel between guest/host if we want the host to be able to influence
>> the guest's choice of active link, but I think that's not unreasonable.
Not necessarily; if something like LLDP runs across the virtual
link between the guest and slave, then the guest will notice when the
link goes down (although perhaps not very quickly). I'm pretty sure the
infrastructure to make LLDP work on inactive slaves is already there; as
I recall, the "no wildcard" or "deliver exact" business in the receive
path is at least partially for LLDP.
Still, though, isn't "influence the guest's choice" pretty much
satisfied by having the VF interface go carrier down in the guest when
the host wants it to? Or are you thinking about something more
fine-grained than that?
>> Actually, couldn't we do this now? Turn off miimon and arpmon, then just
>> have the userspace thing write to /sys/class/net/bondX/bonding/active_slave
That might work for active-backup mode, yes, although it may not
handle the case when all slaves have failed if "failed" does not include
the slave being carrier down. It's not quite the same thing as input to
the link monitoring logic.
>Hmm...looks like the bonding code requires either miimon or arpmon. I
>wonder if setting miimon to INT_MAX might work, at least for some bonding
>modes.
Not true; it's legal to leave miimon and arp_interval set to 0.
Older versions of bonding will whine about it, but let you do it; in
mainline, it's a debug message you have to choose to turn on (because
current versions of initscripts, et al, create the bond first, and then
set those options, so it tended to whine all the time).
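In other words, something like this should be accepted (modulo the
debug message), leaving link monitoring entirely to user space:

    # turn both kernel monitors off
    echo 0 > /sys/class/net/bond0/bonding/miimon
    echo 0 > /sys/class/net/bond0/bonding/arp_interval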
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
* Re: [E1000-devel] discussion questions: SR-IOV, virtualization, and bonding
From: Chris Friesen @ 2012-08-02 23:15 UTC
To: Jay Vosburgh; +Cc: e1000-devel@lists.sourceforge.net, netdev
On 08/02/2012 05:01 PM, Jay Vosburgh wrote:
> Chris Friesen <chris.friesen@genband.com> wrote:
> Still, though, isn't "influence the guest's choice" pretty much
> satisfied by having the VF interface go carrier down in the guest when
> the host wants it to? Or are you thinking about something more
> fine-grained than that?
That was the first thing we started looking at.
It would actually be better technically (since it would use the
back-channel between PF and VFs rather than needing an explicit virtual
network link between host/guest) but it would require work in all the
PF/VF drivers. We'd need to get support from all the driver maintainers.
The main advantage of doing it in bonding is that we'd only need to
modify the code in one place.
Chris
* Re: discussion questions: SR-IOV, virtualization, and bonding
From: Jay Vosburgh @ 2012-08-02 23:36 UTC
To: Chris Friesen; +Cc: e1000-devel@lists.sourceforge.net, netdev
Chris Friesen <chris.friesen@genband.com> wrote:
>On 08/02/2012 05:01 PM, Jay Vosburgh wrote:
>> Chris Friesen <chris.friesen@genband.com> wrote:
>
>> Still, though, isn't "influence the guest's choice" pretty much
>> satisfied by having the VF interface go carrier down in the guest when
>> the host wants it to? Or are you thinking about something more
>> fine-grained than that?
>
>That was the first thing we started looking at.
>
>It would actually be better technically (since it would use the
>back-channel between PF and VFs rather than needing an explicit virtual
>network link between host/guest) but it would require work in all the
>PF/VF drivers. We'd need to get support from all the driver maintainers.
It might also be better (for a different definition of "better")
to use the virtual network link and put more of the functionality in a
generic user space piece that's not in the kernel and wouldn't require
special driver support. Either way, I imagine there's going to have to
be some
sort of message passing going on.
>The main advantage of doing it in bonding is that we'd only need to modify
>the code in one place.
As long as it works with VLANs bonded together; that seems to be
more common these days.
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
* Re: [E1000-devel] discussion questions: SR-IOV, virtualization, and bonding
From: John Fastabend @ 2012-08-03 4:50 UTC
To: Jay Vosburgh; +Cc: Chris Friesen, e1000-devel@lists.sourceforge.net, netdev
On 8/2/2012 4:01 PM, Jay Vosburgh wrote:
> Chris Friesen <chris.friesen@genband.com> wrote:
>
>> On 08/02/2012 04:26 PM, Chris Friesen wrote:
>>> On 08/02/2012 02:30 PM, Jay Vosburgh wrote:
>>
>>>> The best long term solution is to have a user space API that
>>>> provides link state input to bonding on a per-slave basis, and then some
>>>> user space entity can perform whatever link monitoring method is
>>>> appropriate (e.g., LLDP) and pass the results to bonding.
>>>
>>> I think this has potential. This requires a virtual communication
>>> channel between guest/host if we want the host to be able to influence
>>> the guest's choice of active link, but I think that's not unreasonable.
>
> Not necessarily; if something like LLDP runs across the virtual
> link between the guest and slave, then the guest will notice when the
> link goes down (although perhaps not very quickly). I'm pretty sure the
> infrastructure to make LLDP work on inactive slaves is already there; as
> I recall, the "no wildcard" or "deliver exact" business in the receive
> path is at least partially for LLDP.
Right, we run LLDP over the inactive bond. However, because LLDP
uses the nearest-customer-bridge, nearest-bridge, or nearest-non-TPMR
addresses, it should be dropped by switching components. The problem
with having VMs send LLDP and _not_ dropping the packets is that it
looks like multiple neighbors to the peer. The point is there is really
an edge-relay-like component in the hardware with SR-IOV, so using LLDP
to do this likely wouldn't work.
If you happen to have the 2010 802.1Q rev, section 8.6.3 "frame
filtering" has some more details. The 802.1AB spec has details on the
multiple-neighbor case.
>
> Still, though, isn't "influence the guest's choice" pretty much
> satisfied by having the VF interface go carrier down in the guest when
> the host wants it to? Or are you thinking about something more
> fine-grained than that?
>
Perhaps one argument against this is that if the hardware supports
loopback modes, or the edge relay in the hardware is acting like a VEB,
it may still be possible to support VF-to-VF traffic even if the
external link is down. Not sure how useful this is, though, or if any
existing hardware even supports it.
Just in case it's not clear (it might not be): an edge relay (ER) is
defined in the new 802.1Qbg-2012 spec. "An ER supports local relay
among virtual stations and/or between a virtual station and other
stations on a bridged LAN". Similar to a bridge, but without spanning
tree operations.
.John
* Re: [E1000-devel] discussion questions: SR-IOV, virtualization, and bonding
From: Ben Hutchings @ 2012-08-03 17:49 UTC
To: John Fastabend
Cc: Jay Vosburgh, Chris Friesen, e1000-devel@lists.sourceforge.net,
netdev
On Thu, 2012-08-02 at 21:50 -0700, John Fastabend wrote:
> On 8/2/2012 4:01 PM, Jay Vosburgh wrote:
[...]
> > Still, though, isn't "influence the guest's choice" pretty much
> > satisfied by having the VF interface go carrier down in the guest when
> > the host wants it to? Or are you thinking about something more
> > fine-grained than that?
> >
>
> Perhaps one argument against this is that if the hardware supports
> loopback modes, or the edge relay in the hardware is acting like a VEB,
> it may still be possible to support VF-to-VF traffic even if the
> external link is down. Not sure how useful this is, though, or if any
> existing hardware even supports it.
[...]
It seems to me that VF-to-VF traffic ought to still work. If it doesn't,
then that's an unfortunate regression when moving from software bridging
and virtio to hardware-supported network virtualisation. (But hybrid
network virtualisation may help to solve that.)
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* Re: [E1000-devel] discussion questions: SR-IOV, virtualization, and bonding
From: Chris Friesen @ 2012-08-10 18:41 UTC
To: Ben Hutchings
Cc: John Fastabend, Jay Vosburgh, e1000-devel@lists.sourceforge.net,
netdev
On 08/03/2012 11:49 AM, Ben Hutchings wrote:
> On Thu, 2012-08-02 at 21:50 -0700, John Fastabend wrote:
>> Perhaps one argument against this is that if the hardware supports
>> loopback modes, or the edge relay in the hardware is acting like a VEB,
>> it may still be possible to support VF-to-VF traffic even if the
>> external link is down. Not sure how useful this is, though, or if any
>> existing hardware even supports it.
> [...]
>
> It seems to me that VF-to-VF traffic ought to still work. If it doesn't,
> then that's an unfortunate regression when moving from software bridging
> and virtio to hardware-supported network virtualisation. (But hybrid
> network virtualisation may help to solve that.)
I would have thought this to be desirable as well. Apparently the Intel
engineers disagreed. The 82599 datasheet has the following:
"Loopback is disabled when the network link is disconnected. It is
expected (but not required) that system software (including VMs) does
not post packets for transmission when the link is disconnected. Note
that packets posted by system software for transmission when the link is
down are buffered."
Chris