From: Andrew Cooper
Subject: Re: [PATCH] tools/xenconsoled: Initialise pointers before trying to use them
Date: Thu, 7 Mar 2013 17:46:52 +0000
Message-ID: <5138D28C.7020303@citrix.com>
References: <1362673249.29093.57.camel@zion.uk.xensource.com> <5138BEC3.40907@citrix.com> <1362676506.29093.61.camel@zion.uk.xensource.com>
In-Reply-To: <1362676506.29093.61.camel@zion.uk.xensource.com>
To: Wei Liu
Cc: Marcus Granado, Ian Jackson, Ian Campbell, "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On 07/03/13 17:15, Wei Liu wrote:
> On Thu, 2013-03-07 at 16:22 +0000, Andrew Cooper wrote:
>> On 07/03/13 16:20, Wei Liu wrote:
>>> On Thu, 2013-03-07 at 15:13 +0000, Andrew Cooper wrote:
>>>> This is a regression introduced by
>>>>
>>>> "Switch from select() to poll() in xenconsoled's IO loop."
>>>> hg c/s 26405:7359c3122c5d
>>>> git cc5434c933153c4b8812d1df901f8915c22830a8
>>>>
>>>> which results in reliable segfaults during VM power operations.
>>>>
>>>> Signed-off-by: Marcus Granado
>>>> Signed-off-by: Andrew Cooper
>>>>
>>> Good catch. Thanks.
>>>
>>> Wei.
>>>
>> Sadly, after fixing these segfaults, the code as currently is will cause
>> xenconsoled to exit gracefully as soon as you try and boot the 128th
>> domain. We are currently investigating the issue.
>>
> Odd. So you were implying if you didn't fix this bug, you succeeded in
> booting up >128 guests?
>
> All the exit paths have "dolog" which outputs to stderr, so maybe run
> xenconsoled in foreground can help.
>
> Wei.
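For readers following the patch subject, here is a minimal C sketch of the general hazard in a poll()-based loop: pointers into the struct pollfd array that are read before being set on the current pass, or kept across a realloc() that moves the array. This is not the actual xenconsoled code. Every identifier (fd_table, table_add, loop_state, build_poll_set, xs_pollfd) is hypothetical, and the sketch assumes the pollfd array is rebuilt on every loop iteration and grown with realloc().

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>

/* Hypothetical growable pollfd table (not xenconsoled's real layout). */
struct fd_table {
    struct pollfd *fds;
    size_t nr;   /* entries in use */
    size_t max;  /* entries allocated */
};

/* Append an fd, growing the array with realloc() when it is full.
 * Returns the new entry's index, or -1 on allocation failure.
 * An index is returned instead of a pointer because realloc() may
 * move the array and invalidate any struct pollfd * taken earlier. */
static long table_add(struct fd_table *t, int fd, short events)
{
    if (t->nr == t->max) {
        size_t new_max = t->max ? t->max * 2 : 8;
        struct pollfd *p = realloc(t->fds, new_max * sizeof(*p));
        if (p == NULL)
            return -1;
        t->fds = p;
        t->max = new_max;
    }
    t->fds[t->nr].fd = fd;
    t->fds[t->nr].events = events;
    t->fds[t->nr].revents = 0;
    return (long)t->nr++;
}

/* Hypothetical per-iteration state holding a pointer into the table. */
struct loop_state {
    struct pollfd *xs_pollfd; /* entry for the store handle, or NULL */
};

/* Rebuild the poll set for one loop iteration. */
static int build_poll_set(struct fd_table *t, struct loop_state *st,
                          int xs_fd, const int *dom_fds, size_t n_dom)
{
    long xs_idx;
    size_t i;

    assert(t != NULL && st != NULL);

    /* Initialise the pointer before anything can read it, so a stale
     * value from a previous iteration is never dereferenced. */
    st->xs_pollfd = NULL;
    t->nr = 0;

    xs_idx = table_add(t, xs_fd, POLLIN);
    if (xs_idx < 0)
        return -1;
    for (i = 0; i < n_dom; i++)
        if (table_add(t, dom_fds[i], POLLIN) < 0)
            return -1;

    /* Take the pointer only once the array has stopped growing. */
    st->xs_pollfd = &t->fds[xs_idx];
    return 0;
}
```

After poll(t.fds, t.nr, timeout) returns, the caller would check `st.xs_pollfd != NULL && (st.xs_pollfd->revents & POLLIN)` before servicing the store handle. Recording indices while the table grows and converting to pointers only afterwards is one way to avoid dangling pointers when realloc() moves the array.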
After fixing the segfault bugs, xenconsoled intermittently fails and exits with:

Mar 7 16:45:49 localhost /usr/sbin/xenconsoled: Failure in poll xs_handle: 3 (No such process)

The test case attempts to boot 1000 PV VMs sequentially on the same host. So far, the failures appear to occur at multiples of 128 VMs: usually at the 128th VM, but the failure has also been seen at the 384th VM.

~Andrew