From mboxrd@z Thu Jan 1 00:00:00 1970
From: Keir Fraser
Subject: Re: Domain 0 stop response on frequently reboot VMS
Date: Sat, 16 Oct 2010 08:16:51 +0100
Mime-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: MaoXiaoyun, xen devel
List-Id: xen-devel@lists.xenproject.org

Send a patch to the list, Cc Jeremy Fitzhardinge and also a blktap
maintainer, which you should be able to derive from changeset histories and
signed-off-by lines. Flag it clearly in the subject line as a proposed
bugfix for pv_ops.

 -- Keir

On 16/10/2010 06:39, "MaoXiaoyun" wrote:

> Well, thanks Keir.
> Fortunately we caught the bug; it turned out to be a tapdisk problem.
> A brief explanation for anyone else who might run into this issue:
>
> Clearing BLKTAP_DEFERRED on line 19 allows concurrent access to
> tap->deferred_queue between lines 24 and 37, which eventually corrupts
> tap->deferred_queue and makes the while loop on line 22 spin forever.
> Taking the lock around line 24 is a simple fix (a sketch of that change
> is appended after this thread).
>
> /linux-2.6-pvops.git/drivers/xen/blktap/wait_queue.c
>  9 void
> 10 blktap_run_deferred(void)
> 11 {
> 12         LIST_HEAD(queue);
> 13         struct blktap *tap;
> 14         unsigned long flags;
> 15
> 16         spin_lock_irqsave(&deferred_work_lock, flags);
> 17         list_splice_init(&deferred_work_queue, &queue);
> 18         list_for_each_entry(tap, &queue, deferred_queue)
> 19                 clear_bit(BLKTAP_DEFERRED, &tap->dev_inuse);
> 20         spin_unlock_irqrestore(&deferred_work_lock, flags);
> 21
> 22         while (!list_empty(&queue)) {
> 23                 tap = list_entry(queue.next, struct blktap, deferred_queue);
> 24                 list_del_init(&tap->deferred_queue);
> 25                 blktap_device_restart(tap);
> 26         }
> 27 }
> 28
> 29 void
> 30 blktap_defer(struct blktap *tap)
> 31 {
> 32         unsigned long flags;
> 33
> 34         spin_lock_irqsave(&deferred_work_lock, flags);
> 35         if (!test_bit(BLKTAP_DEFERRED, &tap->dev_inuse)) {
> 36                 set_bit(BLKTAP_DEFERRED, &tap->dev_inuse);
> 37                 list_add_tail(&tap->deferred_queue, &deferred_work_queue);
> 38         }
> 39         spin_unlock_irqrestore(&deferred_work_lock, flags);
> 40 }
>
>
>> Date: Fri, 15 Oct 2010 13:57:09 +0100
>> Subject: Re: [Xen-devel] Domain 0 stop response on frequently reboot VMS
>> From: keir@xen.org
>> To: tinnycloud@hotmail.com; xen-devel@lists.xensource.com
>>
>> You'll probably want to see if you can get SysRq output from dom0 via serial
>> line. It's likely you can if it is alive enough to respond to ping. This
>> might tell you things like what all processes are getting blocked on, and
>> thus indicate what is stopping dom0 from making progress.
>>
>> -- Keir
>>
>> On 15/10/2010 13:43, "MaoXiaoyun" wrote:
>>
>>>
>>> Hi Keir:
>>>
>>> First, I'd like to express my appreciation for the help you offered
>>> before.
>>> Well, recently we ran into a rather nasty domain 0 no-response
>>> problem.
>>>
>>> We still have 12 HVMs in an almost continuous and concurrent reboot
>>> test on a physical server.
>>> A few hours later, the server appears dead. We can only ping the
>>> server and get a correct response;
>>> Xen itself still works, since we can get debug info from the serial port.
>>> Attached is the full debug output.
>>> After decoding the domain 0 CPU stack, I find the CPU still does work for
>>> domain 0, since the stack info changed every time I dumped it.
>>>
>>> Could you help take a look at the attachment to see whether there are
>>> some hints for debugging this
>>> problem? Thanks in advance.
>>>
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xensource.com
>>> http://lists.xensource.com/xen-devel
>>
>>
>
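
For reference, a minimal, untested sketch of the change described above:
hold deferred_work_lock around the list_del_init() that was at line 24 of
wait_queue.c, so that unlinking a tap from the local queue is serialized
against the list_add_tail() in blktap_defer(). All identifiers are taken
from the quoted code; the exact shape of the eventual patch is an
assumption.

void
blktap_run_deferred(void)
{
        LIST_HEAD(queue);
        struct blktap *tap;
        unsigned long flags;

        /* Move all deferred taps onto a local list and clear their flag. */
        spin_lock_irqsave(&deferred_work_lock, flags);
        list_splice_init(&deferred_work_queue, &queue);
        list_for_each_entry(tap, &queue, deferred_queue)
                clear_bit(BLKTAP_DEFERRED, &tap->dev_inuse);
        spin_unlock_irqrestore(&deferred_work_lock, flags);

        while (!list_empty(&queue)) {
                tap = list_entry(queue.next, struct blktap, deferred_queue);

                /*
                 * Proposed change: unlink under the same lock that
                 * blktap_defer() holds when it re-queues a tap.
                 */
                spin_lock_irqsave(&deferred_work_lock, flags);
                list_del_init(&tap->deferred_queue);
                spin_unlock_irqrestore(&deferred_work_lock, flags);

                blktap_device_restart(tap);
        }
}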