From: Philipp Hahn
Subject: [BUG] xenstored crash [xen-4.1.3] - likely tdb related
Date: Wed, 15 Oct 2014 10:41:07 +0200
Message-ID: <543E3323.20401@univention.de>
To: xen-devel

Hello,

we have now observed several xenstored crashes.
After enabling the writing of core files, I was able to capture the following stack trace with gdb:

> #0 talloc_chunk_from_ptr (ptr=0xff0000000000) at talloc.c:116
> 116 if ((tc->flags & ~0xF) != TALLOC_MAGIC) {
> warning: not using untrusted file "/root/xen-4.1-4.1.3/xen-4.1.3/tools/xenstore/.gdbinit"
> (gdb) bt
> #0 talloc_chunk_from_ptr (ptr=0xff0000000000) at talloc.c:116
> #1 0x0000000000407edf in talloc_free (ptr=0xff0000000000) at talloc.c:551
> #2 0x000000000040a348 in tdb_open_ex (name=0x167d620 "/var/lib/xenstored/tdb.0x16a48b0",
>     hash_size=, tdb_flags=0, open_flags=, mode=,
>     log_fn=0x4093b0 , hash_fn=) at tdb.c:1958
> #3 0x000000000040a684 in tdb_open (name=0xff0000000000, hash_size=0,
>     tdb_flags=4254928, open_flags=-1, mode=3974450184) at tdb.c:1773
> #4 0x000000000040a70b in tdb_copy (tdb=0x16c9040, outfile=0x167d620 "/var/lib/xenstored/tdb.0x16a48b0")
>     at tdb.c:2124
> #5 0x0000000000406c2d in do_transaction_start (conn=0x167e310, in=)
>     at xenstored_transaction.c:164
> #6 0x00000000004045ca in process_message (conn=0x167e310) at xenstored_core.c:1214
> #7 consider_message (conn=0x167e310) at xenstored_core.c:1261
> #8 handle_input (conn=0x167e310) at xenstored_core.c:1308
> #9 0x0000000000405170 in main (argc=, argv=) at xenstored_core.c:1964
>
> (gdb) frame 2
> #2 0x000000000040a348 in tdb_open_ex (name=0x167d620 "/var/lib/xenstored/tdb.0x16a48b0",
>     hash_size=, tdb_flags=0, open_flags=, mode=,
>     log_fn=0x4093b0 , hash_fn=) at tdb.c:1958
> 1958 SAFE_FREE(tdb->locked);
> (gdb) print tdb->locked
> $3 = (struct tdb_lock_type *) 0xff0000000000

The "tdb->locked" address looks bogus.

I had a look at xen/tools/xenstore/tdb.c myself but did not spot any obvious errors. As tdb_copy() looks like an internal function of tdb, and tdb originally comes from the Samba project, this looks more like a bug in tdb than in xenstored.

I compared tdb between RELEASE-4.1.3 and master and didn't see any interesting changes, so I'm not convinced that updating to 4.1.6 or a newer xen-4.x release would solve this specific issue.

The crash is very annoying, as the domains can no longer be managed or migrated. As xenstored (AFAIK) can't be restarted, we currently have to reboot the host to get the system back into a working state.

Has anyone seen this bug elsewhere?

Sincerely
Philipp
-- 
Philipp Hahn
Open Source Software Engineer
Univention GmbH
be open.
Mary-Somerville-Str. 1
D-28359 Bremen
Tel.: +49 421 22232-0
Fax : +49 421 22232-99
hahn@univention.de
http://www.univention.de/
Geschäftsführer: Peter H. Ganten
HRB 20755 Amtsgericht Bremen
Steuer-Nr.: 71-597-02876