From mboxrd@z Thu Jan 1 00:00:00 1970
From: Cristian Zamfir
Subject: arp during live migration
Date: Fri, 02 Mar 2007 20:54:02 +0000
Message-ID: <45E88EEA.4020707@dcs.gla.ac.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

Hi,

I am having some trouble with send_fake_arp() in the netfront driver.
Normally, on my domU, which has no queuing disciplines compiled in, the
packets are sent via dev_queue_xmit() in net/core/dev.c and enqueued
using pfifo_fast_enqueue() in net/sched/sch_generic.c.

However, during live migration, send_fake_arp() returns -2 and does not
go to pfifo_fast_enqueue() any more. I have been able to trace it
further than this code in dev_queue_xmit():

    if (q->enqueue) {
        /* Grab device queue */
        spin_lock(&dev->queue_lock);

        rc = q->enqueue(skb, q);

        qdisc_run(dev);

        spin_unlock(&dev->queue_lock);
        rc = rc == NET_XMIT_BYPASS ? NET_XMIT_SUCCESS : rc;
        goto out;
    }

I noticed that the error code returned by send_fake_arp() is not
checked. Would it be a good option to check the error code and
reschedule the ARP broadcast at a later time?

I have made some changes to Xen 3.0.3 regarding block device migration,
so I might have messed things up. That could be the reason only a few
people have reported this problem on xen-users. Obviously, the problem
can also go unnoticed if a downtime of 1-2 seconds is tolerated.

Does anyone have any hints on why this might happen or how to search
for more clues?

Thank you.
Cristian
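
For illustration, the "check the error code and reschedule" idea asked about above could look roughly like the following. This is only a user-space sketch under stated assumptions, not the netfront code: send_with_retry(), send_fn, the wait hook and flaky_send() are hypothetical names invented for the example, and a real driver would re-arm a timer or delayed work item instead of looping. Whether retrying is the right fix at all is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a transmit routine such as send_fake_arp():
 * returns 0 on success or a negative error code on failure. */
typedef int (*send_fn)(void *ctx);

/* Try to send up to max_attempts times. Between failed attempts, call the
 * optional caller-supplied wait hook (assumption: in a kernel driver this
 * would be a rescheduled timer or delayed work, not an in-line wait).
 * Returns 0 on success, or the last error code if every attempt failed. */
static int send_with_retry(send_fn send, void *ctx, int max_attempts,
                           void (*wait)(int attempt))
{
    int rc = -1;

    assert(send != NULL && max_attempts > 0);

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        rc = send(ctx);
        if (rc == 0)
            return 0;               /* sent, nothing more to do */
        if (wait != NULL && attempt < max_attempts)
            wait(attempt);          /* back off before the next try */
    }
    return rc;                      /* give up, report the last error */
}

/* Test double: fails with -2 until *remaining_failures reaches zero,
 * mimicking a device that is briefly unable to transmit. */
static int flaky_send(void *ctx)
{
    int *remaining_failures = ctx;

    if (*remaining_failures > 0) {
        (*remaining_failures)--;
        return -2;
    }
    return 0;
}
```

Calling send_with_retry(flaky_send, &failures, 5, NULL) with failures set to 2 succeeds on the third attempt. With more pending failures than attempts, the caller gets the last error code back and can log it instead of silently dropping it.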