netdev.vger.kernel.org archive mirror
* lost gARP after live migration
From: Laszlo Ersek @ 2011-06-28 13:01 UTC (permalink / raw)
  To: xen-devel@lists.xensource.com; +Cc: netdev, Paolo Bonzini

Hi,

with reference to RHBZ#713585:

It seems that when a RHEL-6.1 or F-15 Xen PV guest is live migrated, the
gratuitous ARP packet is not forwarded to the affected "networking
equipment". The netback vif is added to a routed bridge in the host(s),
and external hosts are expected to have connectivity to the guest at all
times, no matter the current Xen host.

I experimented a bit with tcpdump, and the gARP does appear on the 
netfront interface. It also appears on the host bridge if sufficient 
time passes between completing the xenbus handshake and sending the gARP.
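
For reference, the kind of frame being looked for in tcpdump can be
sketched as below. This is only an illustration, assuming the common
"ARP request with sender IP equal to target IP" form of a gratuitous
ARP; the MAC and IP address in the usage note are made-up examples, and
build_garp / is_garp are hypothetical helper names, not anything from
netfront.

```python
import socket
import struct

ETH_P_ARP = 0x0806   # EtherType for ARP
ARPOP_REQUEST = 1    # ARP opcode: request


def build_garp(mac: bytes, ip: str) -> bytes:
    """Build a broadcast gratuitous ARP request (sender IP == target IP)."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    ip_bytes = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our source MAC, ARP EtherType.
    eth = b"\xff" * 6 + mac + struct.pack("!H", ETH_P_ARP)
    # ARP header: Ethernet (1), IPv4 (0x0800), hlen 6, plen 4, opcode.
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, ARPOP_REQUEST)
    arp += mac + ip_bytes           # sender hardware / protocol address
    arp += b"\x00" * 6 + ip_bytes   # target hardware (unset) / protocol address
    return eth + arp


def is_garp(frame: bytes) -> bool:
    """Return True if the frame is an ARP packet whose sender IP equals its target IP."""
    if len(frame) < 42:
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    sender_ip, target_ip = frame[28:32], frame[38:42]
    return ethertype == ETH_P_ARP and sender_ip == target_ip
```

For example, build_garp(bytes.fromhex("00163e000001"), "192.0.2.10")
yields a 42-byte frame for which is_garp() returns True.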

When the guest queues e.g. three gARPs in rapid succession, a variable
number of them get lost. (When all such packets disappear, the
migrated guest becomes invisible to the outside world until it
initiates network traffic on its own.)

When the guest waits for about half a second before sending (queueing), 
the very first gARP packet successfully appears on the host bridge.

I suspect it's a timing race against the netback vif being added to the 
host bridge. What would be a good countermeasure?

- Adding two modparams to xen-netfront (gARP requeue count & number of 
msecs to wait between queueing the gARPs).
- (Paolo's idea:) watching the "hotplug-status" xenstore node and 
sending a single gARP when the watch fires with "connected". This node 
belongs to the backend xenstore subtree, thus watching it from the guest 
doesn't please the architecture astronaut in me.
- Something else.
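
The difference between the first two options can be illustrated with a
toy model of the suspected race. This is a sketch under stated
assumptions, not netfront code: it assumes a packet is lost exactly when
it is queued before the vif joins the bridge, all function names are
hypothetical, and the 300 ms attach time is an invented number.

```python
from typing import List


def retry_schedule(count: int, interval_ms: int) -> List[int]:
    """Option 1: queue `count` gARPs, `interval_ms` apart, starting at t=0."""
    return [i * interval_ms for i in range(count)]


def event_driven_schedule(attach_ms: int) -> List[int]:
    """Option 2: queue a single gARP when the 'connected' watch fires.

    Assumes the watch fires at the moment the vif joins the bridge.
    """
    return [attach_ms]


def delivered(send_times_ms: List[int], attach_ms: int) -> List[int]:
    """Return the send times that reach the bridge in this toy model.

    Assumption: anything queued before the vif is attached is dropped.
    """
    return [t for t in send_times_ms if t >= attach_ms]
```

With an assumed attach time of 300 ms, three back-to-back gARPs
(retry_schedule(3, 1)) are all dropped, retry_schedule(3, 250) gets only
its last packet through, and the event-driven variant delivers its
single packet. The real behaviour is less tidy (a variable number of
packets is lost), so the model only shows why the retry parameters have
to be guessed while the watch does not.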

Sorry for the naivety / verbiage.

Thanks,
lacos


* Re: lost gARP after live migration
From: Paolo Bonzini @ 2011-06-28 13:03 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: netdev, xen-devel@lists.xensource.com

On 06/28/2011 03:01 PM, Laszlo Ersek wrote:
> - (Paolo's idea:) watching the "hotplug-status" xenstore node and
> sending a single gARP when the watch fires with "connected". This node
> belongs to the backend xenstore subtree, thus watching it from the guest
> doesn't please the architecture astronaut in me.

Note that watching the backend and reading its information is quite 
common.  In fact, that's how the state of the backend is observed in the 
first place.  Of course you cannot write to the backend tree, but you do 
not have to do that.
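
To make the read-versus-write distinction concrete, here is a minimal
in-memory sketch. It is not the xenstore API: ToyStore and its methods
are hypothetical, the path is only an illustrative backend-style path,
and the permission model is reduced to "only the owning domain may
write, anyone may read or watch".

```python
class ToyStore:
    """Tiny stand-in for a key/value store with per-subtree write ownership."""

    def __init__(self, owners):
        self._owners = dict(owners)  # path prefix -> owning domain id
        self._data = {}              # path -> value
        self._watches = {}           # path -> list of callbacks

    def _owner_of(self, path):
        for prefix, domid in self._owners.items():
            if path.startswith(prefix):
                return domid
        raise KeyError(path)

    def write(self, domid, path, value):
        """Write a node; only the owner of the subtree is allowed to."""
        if self._owner_of(path) != domid:
            raise PermissionError(f"domain {domid} may not write {path}")
        self._data[path] = value
        # Fire every watch registered on this exact path.
        for callback in self._watches.get(path, []):
            callback(path, value)

    def read(self, path):
        """Read a node; returns None if it does not exist."""
        return self._data.get(path)

    def watch(self, path, callback):
        """Register a callback invoked on every write to `path`."""
        self._watches.setdefault(path, []).append(callback)
```

In this model a guest (say domain 5) can call watch() on
"/local/domain/0/backend/vif/5/0/hotplug-status" and is called back when
domain 0 writes "connected", while a write() from domain 5 to the same
path raises PermissionError.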

Paolo


* Re: lost gARP after live migration
From: Ian Campbell @ 2011-06-28 13:33 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: netdev@vger.kernel.org, xen-devel@lists.xensource.com,
	Paolo Bonzini

On Tue, 2011-06-28 at 14:01 +0100, Laszlo Ersek wrote:
> Hi,
> 
> with reference to RHBZ#713585:
> 
> It seems when a RHEL-6.1 or F-15 Xen PV guest is live migrated, the 
> gratuitous ARP packet is not forwarded to the affected "networking 
> equipment". The netback vif is added to a routed bridge in the host(s) 
> and external hosts are expected to have connection to the guest at all 
> times, no matter the current Xen host.
> 
> I experimented a bit with tcpdump, and the gARP does appear on the 
> netfront interface. It also appears on the host bridge if sufficient 
> time passes between completing the xenbus handshake and sending the gARP.
> 
> When the guest queues e.g. three gARPs in rapid succession, a variable 
> number of them gets lost. (When all such packets disappear, then the 
> migrated guest becomes invisible to the outside world, until it 
> initiates network traffic on its own.)
> 
> When the guest waits for about half a second before sending (queueing), 
> the very first gARP packet successfully appears on the host bridge.
> 
> I suspect it's a timing race against the netback vif being added to the 
> host bridge. What would be a good countermeasure?
> 
> - Adding two modparams to xen-netfront (gARP requeue count & number of 
> msecs to wait between queueing the gARPs).
> - (Paolo's idea:) watching the "hotplug-status" xenstore node and 
> sending a single gARP when the watch fires with "connected". This node 
> belongs to the backend xenstore subtree, thus watching it from the guest 
> doesn't please the architecture astronaut in me.

netback already waits (or should...) for hotplug-status to fire with
"connected" before moving to state XenbusStateConnected. See
hotplug_status_changed in drivers/net/xen-netback/xenbus.c. You need
either the netback in upstream or something newer than 43223efd9bfd (C
Feb 2010) if you are using e.g. xen.git#xen/next-2.6.32. That commit
fixes pretty much the issue you describe.

I expected that netfront waited for the backend to hit
XenbusStateConnected before sending the grat ARP but instead I find it
happens when the backend hits XenbusStateInitWait. I'm not sure if that
is a problem -- it appears to have been done this way since forever
(even back in the classic Xen kernels) and I've never noticed a gARP go
missing in the way you describe, but perhaps something isn't quite
matching up any more.
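
The ordering question can be sketched as follows. The state names and
numeric values mirror the XenbusState enum; everything else is an
assumption made for illustration: the TIMELINE ordering (backend reaches
InitWait, then the hotplug script reports "connected", then the backend
moves to Connected) and the helper notify_before_hotplug are
hypothetical.

```python
from enum import IntEnum


class XenbusState(IntEnum):
    Unknown = 0
    Initialising = 1
    InitWait = 2
    Initialised = 3
    Connected = 4
    Closing = 5
    Closed = 6
    Reconfiguring = 7
    Reconfigured = 8


HOTPLUG_DONE = "hotplug-status=connected"

# Assumed order of backend-side events after migration (illustrative only).
TIMELINE = [XenbusState.InitWait, HOTPLUG_DONE, XenbusState.Connected]


def notify_before_hotplug(timeline, trigger):
    """Return True if a gARP sent on `trigger` precedes hotplug completion."""
    for event in timeline:
        if event == HOTPLUG_DONE:
            return False  # hotplug finished first: the vif is already set up
        if event == trigger:
            return True   # notification fired while hotplug was still pending
    raise ValueError("trigger state never reached in timeline")
```

Under that assumed timeline, triggering on XenbusState.InitWait sends
the gARP before the hotplug script has finished, whereas triggering on
XenbusState.Connected does not.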

Ian.

> - Something else.
> 
> Sorry for the naivety / verbiage.
> 
> Thanks,
> lacos


* Re: lost gARP after live migration
From: Ben Hutchings @ 2011-06-28 14:14 UTC (permalink / raw)
  To: Laszlo Ersek; +Cc: netdev, xen-devel@lists.xensource.com, Paolo Bonzini

On Tue, 2011-06-28 at 15:01 +0200, Laszlo Ersek wrote:
[...]
> When the guest waits for about half a second before sending (queueing), 
> the very first gARP packet successfully appears on the host bridge.
> 
> I suspect it's a timing race against the netback vif being added to the 
> host bridge. What would be a good countermeasure?
> 
> - Adding two modparams to xen-netfront (gARP requeue count & number of 
> msecs to wait between queueing the gARPs).

Note that peer notifications are indirected through netdev notifiers and
now include IPv6 NAs as well as ARPs.  If repeated notifications are
commonly necessary then this should probably be handled in the protocol
(or in the networking core).  However this sounds like a workaround
whereas your other option would be a proper fix:

> - (Paolo's idea:) watching the "hotplug-status" xenstore node and 
> sending a single gARP when the watch fires with "connected". This node 
> belongs to the backend xenstore subtree, thus watching it from the guest 
> doesn't please the architecture astronaut in me.
[...]
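
The indirection through notifiers mentioned above can be pictured with a
toy chain. This is a Python model, not kernel code: the kernel mechanism
is written in C, and NotifierChain, arp_handler and ndisc_handler are
hypothetical names. The event name is modelled on the kernel's
NETDEV_NOTIFY_PEERS, and the addresses in the usage note are
documentation examples.

```python
NETDEV_NOTIFY_PEERS = "NETDEV_NOTIFY_PEERS"


class NotifierChain:
    """Ordered list of handlers that all see every event."""

    def __init__(self):
        self._handlers = []

    def register(self, handler):
        self._handlers.append(handler)

    def call(self, event, dev):
        """Invoke each handler in order and collect what it would send."""
        sent = []
        for handler in self._handlers:
            sent.extend(handler(event, dev))
        return sent


def arp_handler(event, dev):
    """IPv4 side: one gratuitous ARP per configured IPv4 address."""
    if event != NETDEV_NOTIFY_PEERS:
        return []
    return [f"gratuitous ARP for {ip} on {dev['name']}" for ip in dev["ipv4"]]


def ndisc_handler(event, dev):
    """IPv6 side: one unsolicited neighbour advertisement per IPv6 address."""
    if event != NETDEV_NOTIFY_PEERS:
        return []
    return [f"unsolicited NA for {ip} on {dev['name']}" for ip in dev["ipv6"]]
```

A driver in this model raises a single event and does not need to know
which protocols react to it. With both handlers registered, one call for
a device holding 192.0.2.10 and 2001:db8::10 produces one ARP and one NA
announcement, which is why a per-driver gARP retry knob would sit at the
wrong layer.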

-- 
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

