From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jim Fehlig
Subject: Re: help with xenstored 'hang'
Date: Wed, 30 Jun 2010 17:31:40 -0600
Message-ID: <4C2BD3DC.1030008@novell.com>
References: <4C2BC1FD.5050404@novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Patrick Colp
Cc: xen-devel
List-Id: xen-devel@lists.xenproject.org

Patrick Colp wrote:
> I was recently struggling with what sounds like a not-too-dissimilar
> problem while working with a disaggregated version of xenstore. The
> ultimate solution for me was to disable pthreads in xenstore/libxs. I
> just commented out the following line in tools/xenstore/Makefile:
>
> xs.opic: CFLAGS += -DUSE_PTHREAD
>

Xen3.2 predates c/s 17405, which introduced optional use of pthreads.
Prior to that, pthreads was used explicitly.

> After I removed that line and rebuilt and installed xenstore, it
> worked just fine. I would be curious to know if this also solves your
> problem.
>

I can see if the user is receptive to testing backported 17405 with
pthreads disabled. Thanks for the suggestion.

Jim

>
> Patrick
>
>
> On 30 June 2010 15:15, Jim Fehlig wrote:
>
>> I'm trying to debug an 'xm list' hang on a large (~700 hosts) Xen 3.2
>> production installation. The hang occurs randomly, on a random host.
>> User has provided cores of xend and xenstored processes when hang
>> occurs. After poking at these cores I have discovered
>>
>> In xend process, a thread is blocked on a cond variable, waiting for a
>> response to XS_TRANSACTION_START from xenstored. A reader thread
>> responsible for reading from xenstored is blocked on read(2).
>>
>> In the xenstored process, the lone thread is blocked on select(2),
>> waiting for IO.
>> I examined the connections list and see that it contains
>> a connection for the XS_TRANSACTION_START request. Dumping the
>> connection object:
>>
>> (gdb) p *(struct connection *)0x526c70
>> $48 = {list = {next = 0x517c30, prev = 0x5151f0}, fd = 13, id = 0,
>>   can_write = true, in = 0x523600,
>>   out_list = {next = 0x526c98, prev = 0x526c98}, transaction = 0x0,
>>   transaction_list = {next = 0x523560, prev = 0x523560},
>>   next_transaction_id = 60231445, transaction_started = 1,
>>   domain = 0x0, watches = {next = 0x51daa0, prev = 0x5267b0},
>>   write = 0x402460 , read = 0x405180 }
>>
>> Notice transaction_started is set to 1, but out_list is empty. AFAICT,
>> that means the reply has been sent to xend. The reader thread in xend
>> should have received the response and signaled the cond variable -
>> allowing execution to progress. Ultimately, xend would send a
>> XS_TRANSACTION_END message, freeing the connection object in xenstored
>> and removing it from connections list.
>>
>> Does my understanding of this code sound correct? Anyone have
>> suggestions or further debugging tips? Examining cores is about my only
>> debug option as user does not want to deploy debug patches, enable
>> tracing, etc. across 700 hosts.
>>
>> Interestingly, when user strace's or attaches to xenstored process with
>> gdb, xenstored "awakes", the hung 'xm list' returns, and xenstored
>> continues normally. A new connection to xenstored (e.g. running xmtop)
>> seems to poke it along as well. Would a timeout on select(2) in main
>> loop of xenstored help at all?
>>
>> Thanks for any insights!
>> Jim
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xensource.com
>> http://lists.xensource.com/xen-devel