Re: Bug: xm commands hanging due to poor threading in xend

From: Matt Ayres <matta@tektonic.net>
To: Ewan Mellor <ewan@ewanmellor.org.uk>
Cc: xen-devel@lists.xensource.com
Subject: Re: Bug: xm commands hanging due to poor threading in xend
Date: Mon, 23 Jan 2006 14:46:00 -0500
Message-ID: <43D53278.5090407@tektonic.net>
In-Reply-To: <20060123035932.GA28166@localhost.localdomain>



Ewan Mellor wrote:

>>
>> Bug url: http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=465
> 
> In the /var/log/xend-debug.log for both your bugs #465 and #486 you can see
> the message "error: can't start new thread".  That's going to be fatal --
> there's no way that Xend can proceed if it cannot create new threads.
> 
> This points to a resource leak on the machine -- either you are leaking
> threads or processes locally to Xend or globally to your machine, which would
> show up on ps ax, or you are out of memory, which would show up in free or top
> (press m to sort by memory usage).  Possibly, this could be a manifestation of
> a file descriptor leak, which would show up in lsof.
> 
> Could you try and track down the leak?  This would give us a much better clue
> as where to look.
> 

I went ahead and did find some problems.  On the server with 10 days of 
uptime, some processes (mysql/httpd) in dom0 were stressed, and swap was 
50% in use.  I have put memory-minimizing config files in place for both 
of these apps.  The file descriptor count is still high on the server 
with the longer uptime, even after restarting nearly all services.  I 
can also try increasing dom0 memory to 512MB or so.

I gave dom0 128MB with 2.0 and increased this to 256MB with 3.0 
because all my hosts can now access their full 8GB.

10 day uptime host:
# lsof -n | wc -l
2775
# free
              total       used       free     shared    buffers     cached
Mem:        262544     218040      44504          0      21300      55592
-/+ buffers/cache:     141148     121396
Swap:       522104      35944     486160

2 day uptime host:
# lsof -n | wc -l
1420
# free
              total       used       free     shared    buffers     cached
Mem:        262544     252076      10468          0      28432      85264
-/+ buffers/cache:     138380     124164
Swap:       522104       3928     518176
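
As a cross-check on the two `free` listings above, the following is a minimal Python sketch that parses this older `free` layout (values in kB) and derives the memory used by applications (used minus buffers and cache) and the share of swap in use. The function names and the embedded sample are illustrative only; the sample is the 10 day uptime host's output copied from this message.

```python
def parse_free(output):
    """Parse old-style `free` output into {row label: [int, ...]} (kB values)."""
    rows = {}
    for line in output.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the column header line has no colon
        rows[label.strip()] = [int(v) for v in rest.split()]
    return rows


def summarize(output):
    """Return application memory use and swap use derived from `free` output."""
    rows = parse_free(output)
    total, used, free, shared, buffers, cached = rows["Mem"]
    swap_total, swap_used, swap_free = rows["Swap"]
    swap_pct = 100.0 * swap_used / swap_total if swap_total else 0.0
    return {
        "app_used_kb": used - buffers - cached,  # matches the "-/+" row
        "swap_used_pct": swap_pct,
    }


# Output from the 10 day uptime host, as posted above.
TEN_DAY_HOST = """\
              total       used       free     shared    buffers     cached
Mem:        262544     218040      44504          0      21300      55592
-/+ buffers/cache:     141148     121396
Swap:       522104      35944     486160
"""

if __name__ == "__main__":
    print(summarize(TEN_DAY_HOST))
```

By these figures, swap use at the time of the listings is roughly 7% on the 10 day host and under 1% on the 2 day host, and both hosts have about 120MB available once buffers and cache are discounted.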


The file limit is 14343, so fds shouldn't be a problem.
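
One caveat worth checking: `lsof -n | wc -l` counts lines across the whole system, while a single process such as Xend is also bound by its own descriptor limit. The sketch below is an assumption-laden illustration, not anything taken from Xend: it counts the entries under `/proc/<pid>/fd` (Linux only) and compares them with the soft `RLIMIT_NOFILE` value of the process running the script. The helper names `count_open_fds` and `fd_headroom` are hypothetical.

```python
import os
import resource


def count_open_fds(pid="self"):
    """Count entries in /proc/<pid>/fd; returns None if it cannot be read."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None  # not Linux, no such pid, or insufficient permission


def fd_headroom(open_fds, soft_limit):
    """Fraction of the descriptor limit still unused, clamped to 0.0..1.0."""
    if soft_limit == resource.RLIM_INFINITY or soft_limit <= 0:
        return 1.0  # treat "unlimited" (or a nonsensical limit) as full headroom
    return max(0.0, 1.0 - open_fds / soft_limit)


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    n = count_open_fds()
    if n is None:
        print("could not read /proc/self/fd")
    else:
        print(f"{n} fds open, soft limit {soft}, "
              f"headroom {fd_headroom(n, soft):.0%}")
```

To inspect another process, pass its pid to `count_open_fds` (root is normally needed for processes owned by other users); the limit printed here is that of the script itself, not of the inspected process.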

I do not have any OOM errors in my logs though.
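
The remaining candidate from Ewan's list is a thread leak. A rough way to watch for one, assuming a Linux `/proc` filesystem, is to read the `Threads:` field of `/proc/<pid>/status` for the process of interest at intervals and see whether it keeps growing. The following Python sketch shows the parsing; the function names are hypothetical and the pid of Xend has to be supplied by the reader.

```python
import os


def thread_count_from_status(status_text):
    """Extract the Threads: value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    return None  # field absent (e.g. a kernel that does not report it)


def thread_count(pid):
    """Return the thread count for pid, or None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return thread_count_from_status(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    # Demonstrated on this script's own process; substitute the pid to watch.
    print("threads in this process:", thread_count(os.getpid()))
```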

>> I've also run into this once:
>>
>> Message from syslogd@vm20 at Fri Jan 20 23:16:52 2006 ...
>> vm20 xenstored: xenstored corruption: connection id -1: err No such file 
>> or directory: No child '(null)' found
> 
> If you get this, all bets are off.  There is no way that the system as it
> stands will recover gracefully if the store is corrupted.  At best, you'll
> just lose configuration data regarding the running VMs -- at worst, the
> corruption could persist indefinitely, and you'll be unable to do anything
> through Xend.
> 
> Do you have xen-unstable changeset 8269:ac3ceb2d37d1 aka xen-3.0-testing
> changeset 8250:1e3d31952015?  This fixes the only xenstore corruption bug that
> I know of, and if you've got that fix, then it's definitely a new bug.  In
> that case, we would appreciate it if you could either find a test case that
> takes less than a few days to trigger this bug, or get your hands dirty
> yourself and put some tracing and assertions into Xenstored around the TDB
> manipulations to try and catch the corruption.
> 

I am running -unstable from the 16th.  If that change exists in there 
then yes I have the fix.

> Maybe the corrupted TDB file itself might be useful to someone.  Could you
> save that, too?

Yes, normally in a case like this I get a few tdb.xxxxxx files, where 
the x's represent a 6-character hex string.
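
For preserving those files before they are cleaned up, here is a small Python sketch. It assumes only the naming described above (`tdb.` followed by exactly six hex digits); the source and destination directories are parameters because the location of the store is not stated in this message, and the function names are hypothetical.

```python
import os
import re
import shutil

# Naming as described in this message: "tdb." plus a 6-character hex string.
STRAY_TDB = re.compile(r"tdb\.[0-9a-fA-F]{6}")


def stray_tdb_names(names):
    """Return the names that look like tdb.xxxxxx, preserving input order."""
    return [n for n in names if STRAY_TDB.fullmatch(n)]


def save_copies(src_dir, dest_dir):
    """Copy matching files from src_dir to dest_dir; return the names copied."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for name in stray_tdb_names(sorted(os.listdir(src_dir))):
        # copy2 keeps timestamps, which may help correlate with syslog entries
        shutil.copy2(os.path.join(src_dir, name), os.path.join(dest_dir, name))
        saved.append(name)
    return saved
```

Calling `save_copies(store_dir, backup_dir)` with the directory that holds the store leaves the originals untouched and returns the list of files it copied.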

> 
> As far as I'm aware, you are the only person who's ever seen this message, so
> tracking it down without your help is going to be impossible.  Is there
> anything strange about your setup?  Any network block devices or NFS involved,
> any quotas on your filesystems or SELinux?  Any patches that you've applied,
> non-standard kernel options, anything like that?
> 

My setup is fairly standard.  -unstable, PAE, LVM, routed networking. 
Just tracking Xen using Mercurial.
