From: Philipp Hahn
Subject: Re: xenstored crashes with SIGSEGV
Date: Fri, 12 Dec 2014 17:14:42 +0100
Message-ID: <548B1472.5080302@univention.de>
References: <546461A2.2070908@univention.de> <1415869951.31613.26.camel@citrix.com>
In-Reply-To: <1415869951.31613.26.camel@citrix.com>
To: Ian Campbell
Cc: Xen-devel@lists.xen.org

Hello,

On 13.11.2014 10:12, Ian Campbell wrote:
> On Thu, 2014-11-13 at 08:45 +0100, Philipp Hahn wrote:
>> To me this looks like some memory corruption by some unknown code
>> writing into some random memory space, which happens to be the tdb here.
>
> I wonder if running xenstored under valgrind would be useful. I think
> you'd want to stop xenstored from starting during normal boot and then
> launch it with:
>         valgrind /usr/local/sbin/xenstored -N
> -N is to stay in the foreground, you might want to do this in a screen
> session or something, alternatively you could investigate the --log-*
> options in the valgrind manpage, together with the various
> --trace-children* in order to follow the processes over its
> daemonization.

We enabled tracing and now have the xenstored-trace.log of one crash:
it contains 1.6 billion lines and is 83 GiB in size.
It only shows that xenstored crashes on TRANSACTION_START.

Is there a tool to feed that trace back into a newly launched xenstored?
My hope is that xenstored would crash again, because then we could use
other tools such as valgrind more easily.

>> 3. the crash happens rarely and the host run fine most of the time. The
>> crash mostly happens around midnight and seem to be guest-triggered, as
>> the logs on the host don't show any activity like starting new or
>> destroying running VMs. So far the problem only showed on host running
>> Linux VMs. Other host running Windows VMs so far never showed that crash.

We have now also observed a crash on a host running Windows VMs.

> If it is really mostly happening around midnight then it might be worth
> digging into the host and guest configs for cronjobs and the like, e.g.
> log rotation stuff like that which might be tweaking things somehow.
>
> Does this happen on multiple hosts, or just the one?

Multiple hosts, in two different data centers.

> Do you rm the xenstore db on boot? It might have a persistent
> corruption, aiui most folks using C xenstored are doing so or even
> placing it on a tmpfs for performance reasons.

We're using a tmpfs for /var/lib/xenstored/, as we had a severe
performance problem with something updating
/local/domain/0/backend/console/*/0/uuid too often, which put xenstored
into permanent D state.

> If you are running 4.1.x then I think oxenstored isn't an option, but it
> might be something to consider when you upgrade.

Thank you for the hint, I'll have another look at the OCaml version.

Thank you again.
Philipp Hahn