Date: Tue, 5 May 2015 15:36:09 +0800
From: Fam Zheng
Subject: Re: [Qemu-devel] Fwd: qemu drive mirror assert fault
To: Paolo Bonzini
Cc: John Snow, qemu-devel, wangxiaolong@ucloud.cn
Message-ID: <20150505073609.GA9322@ad.nay.redhat.com>
References: <55424F3C.1050209@redhat.com>
In-Reply-To: <55424F3C.1050209@redhat.com>

On Thu, 04/30 17:50, Paolo Bonzini wrote:
> John, Fam,
>
> I got this report offlist. This happens if a bit in the hbitmap is
> cleared and the HBitmap has _not_ yet reached the bit. See this comment
> in include/qemu/hbitmap.h:
>
>  * Resetting bits before the current
>  * position of the iterator is also okay. However, concurrent
>  * resetting of bits can lead to unexpected behavior if the iterator
>  * has not yet reached those bits.
>
> Can you please take a look?

Since the gdb output suggests 1.5.3, it's worth trying 2.3, which has this:

commit c4237dfa635900e4d1cdc6038d5efe3507f45f0c
Author: Vladimir Sementsov-Ogievskiy
Date:   Thu Nov 27 12:40:46 2014 +0300

    block: fix spoiling all dirty bitmaps by mirror and migration

    Mirror and migration use dirty bitmaps for their purposes, and since
    commit [block: per caller dirty bitmap] they use their own bitmaps, not
    the global one. But they use old functions bdrv_set_dirty and
    bdrv_reset_dirty, which change all dirty bitmaps.

    Named dirty bitmaps series by Fam and Snow are affected: mirroring and
    migration will spoil all (not related to this mirroring or migration)
    named dirty bitmaps.

    This patch fixes this by adding bdrv_set_dirty_bitmap and
    bdrv_reset_dirty_bitmap, which change concrete bitmap. Also, to prevent
    such mistakes in future, old functions bdrv_(set,reset)_dirty are made
    static, for internal block usage.

    Signed-off-by: Vladimir Sementsov-Ogievskiy
    CC: John Snow
    CC: Fam Zheng
    CC: Denis V. Lunev
    CC: Stefan Hajnoczi
    CC: Kevin Wolf
    Reviewed-by: John Snow
    Reviewed-by: Fam Zheng
    Message-id: 1417081246-3593-1-git-send-email-vsementsov@parallels.com
    Signed-off-by: Max Reitz

Fam

>
> Thanks,
>
> Paolo
>
> -------- Forwarded Message --------
> Subject: qemu drive mirror assert fault
> Date: Wed, 29 Apr 2015 10:50:28 +0800
> From: wangxiaolong
> To: pbonzini
>
> hello,
>
> I used drive mirror to do live migration, and I run into such an assert
> fault:
>
> (gdb) bt
>
> #0 0x00007fd2c6e678a5 in raise (sig=6) at
> ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>
> #1 0x00007fd2c6e69085 in abort () at abort.c:92
>
> #2 0x00007fd2c6e60a1e in __assert_fail_base (fmt=,
> assertion=0x7fd2ca215aa0 "cur", file=0x7fd2ca215a78 "util/hbitmap.c",
> line=,
>
> function=) at assert.c:96
>
> #3 0x00007fd2c6e60ae0 in __assert_fail (assertion=0x7fd2ca215aa0 "cur",
> file=0x7fd2ca215a78 "util/hbitmap.c", line=129, function=0x7fd2ca215bf0
> "hbitmap_iter_skip_words")
>
> at assert.c:105
>
> #4 0x00007fd2ca1b3bb8 in hbitmap_iter_skip_words (hbi=<value optimized
> out>) at util/hbitmap.c:129
>
> #5 0x00007fd2c9f8f8e0 in hbitmap_iter_next (opaque=0x7fd2cc59c730) at
> /usr/src/debug/qemu-kvm-1.5.3/include/qemu/hbitmap.h:166
>
> #6 mirror_iteration (opaque=0x7fd2cc59c730) at block/mirror.c:163
>
> #7 mirror_run (opaque=0x7fd2cc59c730) at block/mirror.c:407
>
> #8 0x00007fd2c9fc45bb in coroutine_trampoline (i0=<value optimized
> out>, i1=) at coroutine-ucontext.c:118
>
> #9 0x00007fd2c6e78b70 in ?? () from /lib64/libc-2.12.so
>
> #10 0x00007fff53eede80 in ?? ()
>
> #11 0x0000000000000000 in ?? ()
>
>
> and I just can’t figure out what is the cause of this situation,
> could you help me figure it out, thanks!
>
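
For readers following the commit message quoted in this mail, here is a rough toy sketch of the behaviour it describes: a "reset dirty" helper that touches every bitmap on a device, next to one that touches only the caller's bitmap. All names here (ToyDevice, toy_reset_dirty_all, toy_reset_dirty_bitmap, toy_other_bitmap_survives) are hypothetical and made up for illustration. This is not QEMU code, and each bitmap is just a single 64-bit word rather than an HBitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical types, not QEMU's data structures. */
#define TOY_MAX_BITMAPS 4

typedef struct ToyDevice {
    uint64_t bitmaps[TOY_MAX_BITMAPS]; /* one 64-bit word per dirty bitmap */
    int nb_bitmaps;
} ToyDevice;

/* Old-style behaviour as described in the commit message:
 * clearing a bit affects every bitmap attached to the device. */
static void toy_reset_dirty_all(ToyDevice *dev, int bit)
{
    assert(bit >= 0 && bit < 64);
    for (int i = 0; i < dev->nb_bitmaps; i++) {
        dev->bitmaps[i] &= ~(UINT64_C(1) << bit);
    }
}

/* New-style behaviour: only the bitmap passed by the caller changes. */
static void toy_reset_dirty_bitmap(uint64_t *bitmap, int bit)
{
    assert(bit >= 0 && bit < 64);
    *bitmap &= ~(UINT64_C(1) << bit);
}

/* Scenario: bitmap 0 belongs to one job, bitmap 1 to an unrelated user.
 * Both start with bit 5 dirty. The first job clears bit 5 after handling
 * it. Returns true if the unrelated bitmap still has bit 5 set. */
static bool toy_other_bitmap_survives(bool per_bitmap)
{
    const int bit = 5;
    ToyDevice dev = {
        .bitmaps = { UINT64_C(1) << bit, UINT64_C(1) << bit },
        .nb_bitmaps = 2,
    };

    if (per_bitmap) {
        toy_reset_dirty_bitmap(&dev.bitmaps[0], bit);
    } else {
        toy_reset_dirty_all(&dev, bit);
    }
    return (dev.bitmaps[1] >> bit) & 1;
}
```

In this toy model, toy_other_bitmap_survives(false) shows the unrelated bitmap losing its bit, while toy_other_bitmap_survives(true) leaves it intact. Whether this is what actually triggers the assertion in the reported backtrace is not established here; the sketch only illustrates the "change all dirty bitmaps" versus "change concrete bitmap" distinction from the commit message.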