From: Philipp Hahn
Subject: Re: xenstored crashes with SIGSEGV
Date: Fri, 12 Dec 2014 17:45:28 +0100
Message-ID: <548B1BA8.3090504@univention.de>
References: <546461A2.2070908@univention.de>
 <1415869951.31613.26.camel@citrix.com>
 <548B1472.5080302@univention.de>
 <1418401932.16425.34.camel@citrix.com>
In-Reply-To: <1418401932.16425.34.camel@citrix.com>
To: Ian Campbell
Cc: Xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

Hello Ian,

On 12.12.2014 17:32, Ian Campbell wrote:
> On Fri, 2014-12-12 at 17:14 +0100, Philipp Hahn wrote:
>> We did enable tracing and now have the xenstored-trace.log of one crash:
>> It contains 1.6 billion lines and is 83 GiB.
>> It just shows xenstored to crash on TRANSACTION_START.
>>
>> Is there some tool to feed that trace back into a newly launched xenstored?
>
> Not that I know of I'm afraid.

Okay, then I have to continue with my own tool.

> Do you get a core dump when this happens? You might need to fiddle with
> ulimits (some distros disable by default). IIRC there is also some /proc
> knob which controls where core dumps go on the filesystem.

Not for that specific trace: We first enabled generating core files, but
only then discovered that this was not enough. Then we enabled
--trace-file, but on that host something reset the core dump settings
again. We have hopefully fixed all hosts now, so on the next crash we
should get both a core file and the trace.

>> My hope would be that xenstored crashes again, because then we could use
>> all those other tools like valgrind more easily.
>
> That would be handy.
> My fear would be that this bug is likely to be a
> race condition of some sort, and the granularity/accuracy of the
> playback would possibly need to be quite high to trigger the issue.

cxenstored looks single-threaded to me, or am I wrong?

>>> Do you rm the xenstore db on boot? It might have a persistent
>>> corruption, aiui most folks using C xenstored are doing so or even
>>> placing it on a tmpfs for performance reasons.
>>
>> We're using a tmpfs for /var/lib/xenstored/, as we had some severe
>> performance problem with something updating
>> /local/domain/0/backend/console/*/0/uuid too often, which put xenstored
>> in permanent D state.
>
> But this is just a process crashing and not the whole host so you still
> have the db file at the point of the crash?

Yes: Running xs_tdb_dump or tdb_dump on it didn't show anything
obviously wrong.

> It might be interesting to see what happens if you preserve the db and
> reboot arranging for the new xenstored to start with the old file. If
> the corruption is part of the file then maybe it can be induced to crash
> again more quickly.

Thanks for the pointer, will try.

Thank you again for your fast reply.

Philipp Hahn
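
Since the core dump settings were silently reset on one host, it may help to verify them on the running xenstored before the next crash. The following is only a minimal sketch, not something used in this thread: it assumes a Linux host, reads the soft and hard core-size limits from /proc/<pid>/limits and the dump location from /proc/sys/kernel/core_pattern, and the function names parse_core_limit and core_dump_report are made up for illustration.

```python
from pathlib import Path

PREFIX = "Max core file size"


def parse_core_limit(limits_text):
    """Return (soft, hard) core-size limits from the text of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith(PREFIX):
            # After the limit name come the columns: soft, hard, units.
            columns = line[len(PREFIX):].split()
            return columns[0], columns[1]
    raise ValueError("no 'Max core file size' line found")


def core_dump_report(pid):
    """Collect the settings that decide whether a crash of `pid` leaves a core file."""
    soft, hard = parse_core_limit(Path(f"/proc/{pid}/limits").read_text())
    pattern = Path("/proc/sys/kernel/core_pattern").read_text().strip()
    return {
        "soft": soft,
        "hard": hard,
        "core_pattern": pattern,
        # A soft limit of 0 means the kernel will not write a core file.
        "enabled": soft != "0",
    }
```

Calling core_dump_report() with the PID of the running xenstored (for example from pidof) on each host would show whether the limit is still non-zero and where the kernel would write the dump.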