From mboxrd@z Thu Jan  1 00:00:00 1970
From: Philipp Hahn <hahn@univention.de>
Subject: Re: xenstored crashes with SIGSEGV
Date: Fri, 12 Dec 2014 17:45:28 +0100
Message-ID: <548B1BA8.3090504@univention.de>
References: <546461A2.2070908@univention.de>	
	<1415869951.31613.26.camel@citrix.com>
	<548B1472.5080302@univention.de>
	<1418401932.16425.34.camel@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <1418401932.16425.34.camel@citrix.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

Hello Ian,

On 12.12.2014 17:32, Ian Campbell wrote:
> On Fri, 2014-12-12 at 17:14 +0100, Philipp Hahn wrote:
>> We did enable tracing and now have the xenstored-trace.log of one crash:
>> It contains 1.6 billion lines and is 83 GiB.
>> It just shows xenstored crashing on TRANSACTION_START.
>>
>> Is there some tool to feed that trace back into a newly launched xenstored?
> 
> Not that I know of I'm afraid.

Okay, then I have to continue with my own tool.
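
Until that exists, the end of the trace is the interesting part. A
minimal sketch for pulling the last lines out of an 83 GiB file without
reading it from the start; the function name and the block size are my
own choices and not part of any Xen tool:

```python
import os

def tail_lines(path, count=20, block_size=1 << 20):
    """Return the last `count` lines of `path` as bytes, reading backwards."""
    if count <= 0:
        return []
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Read blocks from the end until more than `count` newlines are
        # buffered, so the first line we keep is guaranteed to be complete.
        while pos > 0 and data.count(b"\n") <= count:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
        return data.splitlines()[-count:]
```

Calling tail_lines() with the path given to --trace-file returns the
last requests logged before the crash.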

> Do you get a core dump when this happens? You might need to fiddle with
> ulimits (some distros disable by default). IIRC there is also some /proc
> knob which controls where core dumps go on the filesystem.

Not for that specific trace: We first enabled generating core files, but
only then discovered that this is not enough. Then we enabled
--trace-file, but on that host something reset the core file settings.
We have hopefully fixed all hosts now, so on the next crash we should
get both a core file and the trace.
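
For checking the hosts, a minimal sketch that reports the core size
limit and the core file location, assuming Linux. I am assuming that
/proc/sys/kernel/core_pattern is the /proc knob meant above; the helper
names are hypothetical:

```python
import resource
from pathlib import Path

# Assumption: this is the /proc knob referred to above (standard on Linux).
CORE_PATTERN = Path("/proc/sys/kernel/core_pattern")

def fmt_limit(value):
    """Render an rlimit value; RLIM_INFINITY means no size cap."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)

def core_dump_settings():
    """Return (soft, hard, pattern) as seen by the calling process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    try:
        pattern = CORE_PATTERN.read_text().strip()
    except OSError:
        pattern = None  # not Linux, or /proc is not mounted
    return fmt_limit(soft), fmt_limit(hard), pattern

if __name__ == "__main__":
    soft, hard, pattern = core_dump_settings()
    print(f"core size limit: soft={soft} hard={hard}")
    print(f"core_pattern: {pattern}")
    if soft == "0":
        print("core dumps are disabled for this process")
```

This only reports the limits of the calling process. The limits of an
already running xenstored can be read from /proc/<pid>/limits.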

>> My hope would be that xenstored crashes again, because then we could use
>> all those other tools like valgrind more easily.
> 
> That would be handy. My fear would be that this bug is likely to be a
> race condition of some sort, and the granularity/accuracy of the
> playback would possibly need to be quite high to trigger the issue.

cxenstored looks single-threaded to me, or am I wrong?

>>> Do you rm the xenstore db on boot? It might have a persistent
>>> corruption, aiui most folks using C xenstored are doing so or even
>>> placing it on a tmpfs for performance reasons.
>>
>> We're using a tmpfs for /var/lib/xenstored/, as we had some severe
>> performance problem with something updating
>> /local/domain/0/backend/console/*/0/uuid too often, which put xenstored
>> in permanent D state.
> 
> But this is just a process crashing and not the whole host so you still
> have the db file at the point of the crash?

Yes: Running xs_tdb_dump or tdb_dump on it didn't show anything
obviously wrong.

> It might be interesting to see what happens if you preserve the db and
> reboot arranging for the new xenstored to start with the old file. If
> the corruption is part of the file then maybe it can be induced to crash
> again more quickly.

Thanks for the pointer, will try.

Thank you again for your fast reply.
Philipp Hahn