From mboxrd@z Thu Jan 1 00:00:00 1970
From: Konrad Rzeszutek Wilk
Subject: Re: xen-4.1: PV domain hanging at startup, jiffies stopped
Date: Mon, 29 Aug 2011 16:59:38 -0400
Message-ID: <20110829205938.GB18697@dumpdata.com>
References: <4E5A3F0A.8060700@mimuw.edu.pl> <20110829200749.GA17265@dumpdata.com> <4E5BF4C3.2050108@mimuw.edu.pl>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4E5BF4C3.2050108@mimuw.edu.pl>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Marek Marczykowski
Cc: "xen-devel@lists.xensource.com", Joanna Rutkowska
List-Id: xen-devel@lists.xenproject.org

On Mon, Aug 29, 2011 at 10:21:23PM +0200, Marek Marczykowski wrote:
> On 29.08.2011 22:07, Konrad Rzeszutek Wilk wrote:
> > On Sun, Aug 28, 2011 at 03:13:46PM +0200, Marek Marczykowski wrote:
> >> Hey,
> >>
> >> I'm experiencing strange problem: non-deterministic PV domain hang, only
> >> on some machines (with fast SSD drive). I've tried xen-4.1.0 and
> >> xen-4.1.1 with many kernels different kernels:
> >> VM:
> >> - 2.6.38.3 xenlinux based on SUSE package
> >> - vanilla 3.0.3
> >> - vanilla 3.1 rc2
> >> dom0:
> >> - 2.6.38.3 xenlinux based on SUSE package
> >> - vanilla 3.1 rc2
> >>
> >> Result always the same: sometimes VM hang at startup, SysRq-T shows
> >> modprobe waiting in "wait_for_devices" (concretely schedule_timeout) and
> >> jiffies counter not increasing between task-states dumps.
> >>
> >> The only found thing (probably) connected with this problem are domU
> >> kernel messages:
> >> CE: xen increased min_delta_ns to 150000 nsec
> >> (...)
> >> CE: xen increased min_delta_ns to 4000000 nsec
> >> CE: Reprogramming failure. Giving up
> >>
> >> This messages doesn't exists in successful boot.
> >>
> >> I've also tried some options to xen and domU kernel, but without success
> >> (all combinations):
> >
> > BTW, your 'xencons=..' and 'swiotlb=force' are obsolete. Use
> > 'console=hvc0' and 'iommu=soft'. The 'swiotlb=force' kills performance.
> >
> >> xen: tsc=unstable, cpufreq=none
> >> domU: nohz=off, clocksource=tsc
> >>
> >> Some combination of above options lowered frequency of problem (ex
> >> tsc=unstable + nohz=off), but it happens quite often - like 1 of 15
> >> boots fails.
> >>
> >> Have you idea what is the cause and what can help?
> >
> > The problem looks to be xenwatch stuck. So the problem is in Dom0 right?
>
> This "R" state of xenwatch looks like result of SysRq, which dumps data...
>
> [ 118.679707] [] handle_sysrq+0x21/0x30
> [ 118.679707] [] sysrq_handler+0xb9/0xe0
> [ 118.679707] [] xenwatch_thread+0xb0/0x170
>
> And the problem is at DomU boot, Dom0 works without any problems.

Ok, but I am still unsure where it is hanging in DomU. Can you boot the
guest with

  console=hvc0 debug initcall_debug loglevel=8 earlyprintk=xen

on its kernel command line to get an idea of what is stuck in the guest?

You might also have better luck using 'xenctx' to get a stack trace of
what is hanging in the guest. (You will need the System.map file from the
guest's kernel, but that should be fairly easy to extract.)
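
In case it helps to see why System.map is needed: turning a raw address
from a guest stack dump into "symbol+offset" is just a search for the
nearest symbol at or below that address. Below is a minimal Python sketch
of that lookup, assuming the usual "address type name" layout of
System.map. The function names and the addresses in the example are made
up for illustration; this is not how xenctx itself is implemented.

```python
import bisect


def load_system_map(lines):
    """Parse System.map lines ('<hex addr> <type> <name>') into a sorted
    list of (address, name) pairs. Malformed lines are skipped."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            addr = int(parts[0], 16)
        except ValueError:
            continue
        symbols.append((addr, parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return 'name+0xoff' for the closest symbol at or below addr,
    or None if addr lies below every known symbol."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return "%s+0x%x" % (name, addr - base)


# Hypothetical excerpt; real addresses come from the guest's own System.map.
sample = [
    "ffffffff81000000 T _text",
    "ffffffff81001000 T schedule_timeout",
    "ffffffff81002000 T wait_for_devices",
]
symbols = load_system_map(sample)
print(resolve(symbols, 0xFFFFFFFF81001010))  # schedule_timeout+0x10
```

With a real file you would pass open("System.map") instead of the sample
list. The System.map has to match the exact kernel the guest is running,
otherwise the resolved names are meaningless.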