* RE: arp during live migration
@ 2007-03-02 22:22 Graham, Simon
From: Graham, Simon @ 2007-03-02 22:22 UTC (permalink / raw)
  To: Cristian Zamfir, xen-devel

> I am having some trouble with the send_fake_arp in the netfront
> driver.

Interesting - I was just composing an almost identical note; we've been
seeing some horrible network blackouts in migration that are caused by a
failure to send the gratuitous ARP (blackouts vary from 0 to 50+ seconds
when the domain is idle and just being pinged from outside).

In my case, I NEVER see the gratuitous ARP being sent (confirmed using
tcpdump on peth0 in Dom0) and the return value from dev_queue_xmit is
sometimes 0 and sometimes 2 (that's PLUS 2 -- congestion notification
[NET_XMIT_CN]).
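
For reference when reading those return values, here is a small userspace
sketch that decodes them. The numeric values are what I recall from
include/linux/netdevice.h in 2.6-era kernels, so treat them as an
assumption and check your own tree. xmit_rc_name() and xmit_rc_is_known()
are hypothetical helpers for illustration, not kernel functions.

```c
#include <string.h>

/* Assumed values, as recalled from include/linux/netdevice.h (2.6 era).
 * Only NET_XMIT_CN == 2 is confirmed by the observation above. */
enum {
    NET_XMIT_SUCCESS = 0,
    NET_XMIT_DROP    = 1,  /* skb dropped */
    NET_XMIT_CN      = 2,  /* congestion notification */
    NET_XMIT_POLICED = 3,  /* skb shot by a policer */
    NET_XMIT_BYPASS  = 4,  /* packet does not look like it was enqueued */
};

/* Hypothetical helper: map a dev_queue_xmit() return value to a name. */
static const char *xmit_rc_name(int rc)
{
    if (rc < 0)
        return "negative errno";
    switch (rc) {
    case NET_XMIT_SUCCESS: return "NET_XMIT_SUCCESS";
    case NET_XMIT_DROP:    return "NET_XMIT_DROP";
    case NET_XMIT_CN:      return "NET_XMIT_CN";
    case NET_XMIT_POLICED: return "NET_XMIT_POLICED";
    case NET_XMIT_BYPASS:  return "NET_XMIT_BYPASS";
    default:               return "unknown";
    }
}

/* Hypothetical helper: true if rc is one of the codes listed above
 * or a negative errno. */
static int xmit_rc_is_known(int rc)
{
    return strcmp(xmit_rc_name(rc), "unknown") != 0;
}
```

Logging xmit_rc_name(rc) next to the raw number makes it harder to confuse
a positive NET_XMIT_* code with a negative errno.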

My next step was going to be to add instrumentation to netback but I
thought I would ask if this is a known issue with 3.0.3 first...

Simon

* arp during live migration
@ 2007-03-02 20:54 Cristian Zamfir
From: Cristian Zamfir @ 2007-03-02 20:54 UTC (permalink / raw)
  To: xen-devel


Hi,

I am having some trouble with the send_fake_arp in the netfront driver.

Normally, on my domU, which has no queuing disciplines compiled in, the 
packets are sent via dev_queue_xmit in net/core/dev.c and enqueued using 
pfifo_fast_enqueue in net/sched/sch_generic.c.

However, during live migration, send_fake_arp() returns -2 and no longer
reaches pfifo_fast_enqueue. I have not been able to trace it further
than this code in dev_queue_xmit:

        if (q->enqueue) {
                /* Grab device queue */
                spin_lock(&dev->queue_lock);
                rc = q->enqueue(skb, q);
                qdisc_run(dev);
                spin_unlock(&dev->queue_lock);
                rc = rc == NET_XMIT_BYPASS ? NET_XMIT_SUCCESS : rc;
                goto out;
        }

I noticed that the error code returned by send_fake_arp() is not 
checked. Would it be a good option to check the error code and 
reschedule the arp broadcast at a later time?
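
A minimal userspace sketch of that check-and-retry idea follows. It is not
netfront code: send_with_retry(), send_fn, and the flaky_send() stub are
hypothetical names, and the loop stands in for what a driver would more
likely do by re-arming a timer or delayed work item. It also assumes that
a return of 0 means success, which only shows the packet was queued, not
that it reached the wire.

```c
#include <assert.h>
#include <stddef.h>

#define NET_XMIT_SUCCESS 0
#define NET_XMIT_CN      2   /* congestion notification */

/* Hypothetical send callback: returns a NET_XMIT_* style code. */
typedef int (*send_fn)(void *ctx);

/* Try to send up to max_tries times.
 * Returns the number of attempts used on success, or -1 if every
 * attempt returned a non-zero code. */
static int send_with_retry(send_fn send, void *ctx, int max_tries)
{
    int attempt;

    assert(send != NULL && max_tries > 0);
    for (attempt = 1; attempt <= max_tries; attempt++) {
        if (send(ctx) == NET_XMIT_SUCCESS)
            return attempt;
        /* A real driver would wait here (timer or delayed work)
         * instead of retrying immediately. */
    }
    return -1;
}

/* Test stub: fails failures_left times, then succeeds. */
struct flaky {
    int failures_left;
};

static int flaky_send(void *ctx)
{
    struct flaky *f = ctx;

    if (f->failures_left > 0) {
        f->failures_left--;
        return NET_XMIT_CN;
    }
    return NET_XMIT_SUCCESS;
}
```

With a stub that fails twice, send_with_retry(flaky_send, &f, 5) succeeds
on the third attempt; with max_tries smaller than the number of failures
it gives up and returns -1.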

I have made some changes to xen 3.0.3 regarding block device migration,
so I might have messed things up. That could be why only a few people
have reported this problem on xen-users. Obviously, the problem can also
go unnoticed if a downtime of 1-2 seconds is tolerated.

Does anyone have any hints on why this might happen or how to search for 
more clues?

Thank you.

Cristian
