From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:39649)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1ZyeYU-0005LN-RP
	for qemu-devel@nongnu.org; Tue, 17 Nov 2015 06:36:12 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1ZyeYR-0007kU-HN
	for qemu-devel@nongnu.org; Tue, 17 Nov 2015 06:36:10 -0500
Received: from mx1.redhat.com ([209.132.183.28]:47778)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <dgilbert@redhat.com>) id 1ZyeYR-0007kN-7h
	for qemu-devel@nongnu.org; Tue, 17 Nov 2015 06:36:07 -0500
Date: Tue, 17 Nov 2015 11:36:02 +0000
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Message-ID: <20151117113601.GD2498@work-vm>
References: <6A17C71B52524C408E7AAF69103E9E490F14400C@fabamailserver.fabagl.fabasoft.com>
	<20151113190014.GB18986@redhat.com>
	<6A17C71B52524C408E7AAF69103E9E490F14E9F4@fabamailserver.fabagl.fabasoft.com>
	<20151117095920.GB2498@work-vm>
	<6A17C71B52524C408E7AAF69103E9E490F153F45@fabamailserver.fabagl.fabasoft.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
In-Reply-To: <6A17C71B52524C408E7AAF69103E9E490F153F45@fabamailserver.fabagl.fabasoft.com>
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] WG: [ovirt-users] Segmentation fault in libtcmalloc
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Grundmann, Christian" <Christian.Grundmann@fabasoft.com>
Cc: "'qemu-devel@nongnu.org'" <qemu-devel@nongnu.org>, "stefanha@redhat.com" <stefanha@redhat.com>

* Grundmann, Christian (Christian.Grundmann@fabasoft.com) wrote:
> Hi,
>=20
> @ Can you please use a 'thread apply all bt full'   the full gives a li=
ttle more info.
>=20
> gdb --batch /usr/libexec/qemu-kvm core.52281.1447709011.dump -ex "set p=
agination off" -ex "thread apply all bt full"

OK, it doesn't really give any more without the debuginfo package
mentioned below.

<snip>

> @ Also, if you've not already got it installed can you please install t=
he debuginfo package for qemu, it gives a lot more information in backtra=
ces.
> Sorry, it's an ovirt-node system where I can't use yum

Ah, although perhaps if you copied the core dump onto another machine with
matching qemu and debuginfo packages you should be able to get more detail.
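
As a rough sketch of that workflow (Python, purely illustrative): the helper
below only assembles the same gdb invocation used earlier in this thread, so
it can be run on whichever machine has the matching qemu-kvm binary and
debuginfo installed. The function name build_gdb_command is made up for this
example, and the core file name is simply the one quoted above.

```python
import shlex


def build_gdb_command(binary, core, full=True):
    """Return the argv for the batch backtrace command quoted in this thread."""
    bt = "thread apply all bt full" if full else "thread apply all bt"
    return ["gdb", "--batch", str(binary), str(core),
            "-ex", "set pagination off", "-ex", bt]


if __name__ == "__main__":
    # Example paths only: point these at the copied core dump and at the
    # qemu-kvm binary whose version matches the one that crashed.
    cmd = build_gdb_command("/usr/libexec/qemu-kvm",
                            "core.52281.1447709011.dump")
    # Print a shell-quoted command line rather than running gdb here.
    print(shlex.join(cmd))
```

The printed line can be pasted into a shell on the analysis machine; nothing
here checks that the package versions really match, that still has to be
verified by hand.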

> @ Does this part always look the same in your backtraces?
> Most are the same; I found one that is a little bit different:
> Thread 1 (Thread 0x7f378a0d7c00 (LWP 6658)):
> #0  0x00007f3785d18353 in tcmalloc::ThreadCache::ReleaseToCentralCache(=
tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /lib64/libt=
cmalloc.so.4
> No symbol table info available.
> #1  0x00007f3785d186b0 in tcmalloc::ThreadCache::Scavenge() () from /li=
b64/libtcmalloc.so.4
> No symbol table info available.
> #2  0x00007f3785d27057 in tc_free () from /lib64/libtcmalloc.so.4
> No symbol table info available.
> #3  0x00007f37885e858f in g_free () from /lib64/libglib-2.0.so.0
> No symbol table info available.
> #4  0x00007f37885fec89 in g_slice_free1 () from /lib64/libglib-2.0.so.0
> No symbol table info available.
> #5  0x00007f378a1f232e in virtio_blk_rw_complete ()
> No symbol table info available.
> #6  0x00007f378a39f1ae in bdrv_co_em_bh ()
> No symbol table info available.
> #7  0x00007f378a398394 in aio_bh_poll ()
> No symbol table info available.
> #8  0x00007f378a3a7409 in aio_dispatch_clients ()
> No symbol table info available.
> #9  0x00007f378a39820e in aio_ctx_dispatch ()
> No symbol table info available.
> #10 0x00007f37885e299a in g_main_context_dispatch () from /lib64/libgli=
b-2.0.so.0
> No symbol table info available.
> #11 0x00007f378a3a6288 in main_loop_wait ()
> No symbol table info available.
> #12 0x00007f378a1a5a4e in main ()
> No symbol table info available.
>=20

OK, that's a bit different but interesting....
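
If you end up with many core dumps, comparing them by hand gets tedious. A
minimal sketch (Python, with made-up helper names) that pulls the function
names out of gdb backtraces like the ones above and reports which functions
two traces share might look like this. It assumes frames without argument
info, i.e. the "name ()" form seen here without debuginfo; the sample traces
are abbreviated from the two quoted in this thread.

```python
import re

# Matches frames without argument info, as in the traces quoted above, e.g.
# "#2  0x00007f3785d27057 in tc_free () from /lib64/libtcmalloc.so.4"
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(.+?)\s+\(")


def frame_functions(backtrace):
    """Return the function names of one backtrace, innermost frame first."""
    names = []
    for line in backtrace.splitlines():
        # Drop any leading "> " mail quoting before matching the frame.
        match = FRAME_RE.match(line.lstrip("> "))
        if match:
            names.append(match.group(2))
    return names


def common_frames(trace_a, trace_b):
    """Functions that appear in both backtraces, in trace_a's order."""
    in_b = set(frame_functions(trace_b))
    return [name for name in frame_functions(trace_a) if name in in_b]


trace_malloc = """
#0  0x00007fa8ad2febe1 in tc_malloc () from /lib64/libtcmalloc.so.4
#1  0x00007fa8b186b489 in malloc_and_trace ()
#4  0x00007fa8b17cbffd in virtio_blk_handle_output ()
#6  0x00007fa8b197e296 in main_loop_wait ()
#7  0x00007fa8b177da4e in main ()
"""

trace_free = """
#2  0x00007f3785d27057 in tc_free () from /lib64/libtcmalloc.so.4
#5  0x00007f378a1f232e in virtio_blk_rw_complete ()
#11 0x00007f378a3a6288 in main_loop_wait ()
#12 0x00007f378a1a5a4e in main ()
"""

print(common_frames(trace_malloc, trace_free))  # ['main_loop_wait', 'main']
```

Exact name matching is deliberately crude: it would not notice that
virtio_blk_handle_output and virtio_blk_rw_complete both belong to the
virtio-blk path, so treat the output as a first filter only.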

> @  1) Was there anything nasty in the /var/log/libvirt/qemu/yourvmname.=
log ?
> No nothing abnormal
>=20
> @  2) Did you hit any IO errors and need to tell the VM to continue aft=
er a problem?
> oVirt tells me "no Storage space error", which I think means something
> like the disk is growing too fast. I use snapshots, so on heavy writes the
> disk has to grow a lot.
> Sometimes the VM is paused and resumed by oVirt. Sometimes the VM stays
> offline.

OK, that's interesting, because you may be hitting the following bug:
http://lists.nongnu.org/archive/html/qemu-block/2015-11/msg00585.html

Its fix coincidentally just got accepted today; it's related to error
cases with werror=3Dstop/rerror=3Dstop, which you are using.

Do you think you're only hitting these crashes on VMs that have been paus=
ed because of these space errors?
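
To check which VMs are configured that way, one option is to look at the
-drive arguments on each qemu command line. A small sketch (Python; the
helper names are invented for this example, and the file path is a
placeholder) that splits a -drive option string and reports whether the
drive stops the guest on errors:

```python
def parse_drive_options(optstr):
    """Split a -drive argument of the form key=value,key=value into a dict.

    Simplification: escaped commas (",,") inside values are not handled.
    """
    options = {}
    for item in optstr.split(","):
        key, _, value = item.partition("=")
        options[key] = value
    return options


def stops_on_error(optstr):
    """True if the drive pauses the guest on write or read errors."""
    opts = parse_drive_options(optstr)
    return opts.get("werror") == "stop" or opts.get("rerror") == "stop"


# Shortened from the command line in this thread; the path is a placeholder.
drive = ("file=/path/to/disk.qcow2,if=none,id=drive-virtio-disk0,"
         "format=qcow2,cache=none,werror=stop,rerror=stop,aio=native")

print(stops_on_error(drive))  # True
```

A drive without werror/rerror options returns False here, which only says
that the policy was not set explicitly on the command line.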

> @  3) If this is pretty repeatable, then it would be interesting to try
> changing to a different disk emulation and see if the problem goes away -
> e.g. virtio-scsi would be a good one to try.
>=20
> Ok will try that and report

Thanks,

Dave

>=20
> Thx Christian
>=20
>=20
> -----Original Message-----
> From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com]
> Sent: Tuesday, 17 November 2015 10:59
> To: Grundmann, Christian <Christian.Grundmann@fabasoft.com>
> Cc: 'qemu-devel@nongnu.org' <qemu-devel@nongnu.org>; stefanha@redhat.com
> Subject: Re: [Qemu-devel] WG: [ovirt-users] Segmentation fault in libtc=
malloc
>=20
> * Grundmann, Christian (Christian.Grundmann@fabasoft.com) wrote:
> > Hi,
> > Dan sent me over to you,
> > please let me know if I can provide additional information.
>=20
> Hi Christian,
>   Thanks for reporting this,
>=20
> > Softwareversions:
> > ovirt-node-iso-3.6-0.999.201510221942.el7.centos.iso
> >=20
> > qemu-img-ev-2.3.0-29.1.el7.x86_64
> > qemu-kvm-ev-2.3.0-29.1.el7.x86_64
> > qemu-kvm-common-ev-2.3.0-29.1.el7.x86_64
> > qemu-kvm-tools-ev-2.3.0-29.1.el7.x86_64
> > ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch
> > kernel-3.10.0-229.14.1.el7.x86_64
> > gperftools-libs-2.4-7.el7.x86_64
> >=20
> > Commandline:
> > /usr/libexec/qemu-kvm -name myvmname -S -machine=20
> > rhel6.5.0,accel=3Dkvm,usb=3Doff -cpu Westmere -m 7168 -realtime mlock=
=3Doff=20
> > -smp 2,maxcpus=3D16,sockets=3D16,cores=3D1,threads=3D1 -uuid=20
> > 5b6b8899-5a9d-4c07-a6aa-6171527ad319 -smbios=20
> > type=3D1,manufacturer=3DoVirt,product=3DoVirt=20
> > Node,version=3D3.6-0.999.201510221942.el7.centos,serial=3D30343536-31=
38-5A
> > 43-4A34-323630303253,uuid=3D5b6b8899-5a9d-4c07-a6aa-6171527ad319=20
> > -nographic -no-user-config -nodefaults -chardev=20
> > socket,id=3Dcharmonitor,path=3D/var/lib/libvirt/qemu/myvmname.monitor=
,serv
> > er,nowait -mon chardev=3Dcharmonitor,id=3Dmonitor,mode=3Dcontrol -rtc=
=20
> > base=3D2015-11-15T20:04:35,driftfix=3Dslew -global=20
> > kvm-pit.lost_tick_policy=3Ddiscard -no-hpet -no-shutdown -boot strict=
=3Don=20
> > -device piix3-usb-uhci,id=3Dusb,bus=3Dpci.0,addr=3D0x1.0x2 -device=20
> > virtio-scsi-pci,id=3Dscsi0,bus=3Dpci.0,addr=3D0x4 -device=20
> > virtio-serial-pci,id=3Dvirtio-serial0,max_ports=3D16,bus=3Dpci.0,addr=
=3D0x5=20
> > -drive if=3Dnone,id=3Ddrive-ide0-1-0,readonly=3Don,format=3Draw,seria=
l=3D=20
> > -device ide-cd,bus=3Dide.1,unit=3D0,drive=3Ddrive-ide0-1-0,id=3Dide0-=
1-0=20
> > -drive=20
> > file=3D/rhev/data-center/00000002-0002-0002-0002-0000000000e2/5df61b8=
4-8
> > 746-4460-b148-65cc0eb8d29c/images/8202b81d-6191-495f-8c9d-7d90baffaec=
f
> > /d7665e07-1786-4051-aa26-0a3e1c9d2574,if=3Dnone,id=3Ddrive-virtio-dis=
k0,fo
> > rmat=3Dqcow2,serial=3D8202b81d-6191-495f-8c9d-7d90baffaecf,cache=3Dno=
ne,werr
> > or=3Dstop,rerror=3Dstop,aio=3Dnative -device=20
> > virtio-blk-pci,scsi=3Doff,bus=3Dpci.0,addr=3D0x6,drive=3Ddrive-virtio=
-disk0,id
> > =3Dvirtio-disk0,bootindex=3D1 -netdev=20
> > tap,fd=3D39,id=3Dhostnet0,vhost=3Don,vhostfd=3D65 -device=20
> > virtio-net-pci,netdev=3Dhostnet0,id=3Dnet0,mac=3D52:54:00:83:a2:0e,bu=
s=3Dpci.0
> > ,addr=3D0x3 -chardev=20
> > socket,id=3Dcharchannel0,path=3D/var/lib/libvirt/qemu/channels/5b6b88=
99-5a
> > 9d-4c07-a6aa-6171527ad319.com.redhat.rhevm.vdsm,server,nowait -device=
=20
> > virtserialport,bus=3Dvirtio-serial0.0,nr=3D1,chardev=3Dcharchannel0,i=
d=3Dchann
> > el0,name=3Dcom.redhat.rhevm.vdsm -chardev=20
> > socket,id=3Dcharchannel1,path=3D/var/lib/libvirt/qemu/channels/5b6b88=
99-5a
> > 9d-4c07-a6aa-6171527ad319.org.qemu.guest_agent.0,server,nowait -devic=
e=20
> > virtserialport,bus=3Dvirtio-serial0.0,nr=3D2,chardev=3Dcharchannel1,i=
d=3Dchann
> > el1,name=3Dorg.qemu.guest_agent.0 -device=20
> > cirrus-vga,id=3Dvideo0,bus=3Dpci.0,addr=3D0x2 -device=20
> > virtio-balloon-pci,id=3Dballoon0,bus=3Dpci.0,addr=3D0x7 -msg timestam=
p=3Don
> >=20
> > Stack Trace:
> >=20
> > gdb --batch /usr/libexec/qemu-kvm core.14750.1447544080.dump -ex "set=
 pagination off" -ex "thread apply all bt"
>=20
> Can you please use a 'thread apply all bt full'   the full gives a litt=
le more info.
> Also, if you've not already got it installed can you please install the=
 debuginfo package for qemu, it gives a lot more information in backtrace=
s.
>=20
> > Thread 1 (Thread 0x7fa8b16afc00 (LWP 14750)):
> > #0  0x00007fa8ad2febe1 in tc_malloc () from /lib64/libtcmalloc.so.4
> > #1  0x00007fa8b186b489 in malloc_and_trace ()
> > #2  0x00007fa8afbc047f in g_malloc () from /lib64/libglib-2.0.so.0
> > #3  0x00007fa8afbd666e in g_slice_alloc () from=20
> > /lib64/libglib-2.0.so.0
> > #4  0x00007fa8b17cbffd in virtio_blk_handle_output ()
> > #5  0x00007fa8b197e6b6 in qemu_iohandler_poll ()
> > #6  0x00007fa8b197e296 in main_loop_wait ()
> > #7  0x00007fa8b177da4e in main ()
>=20
> Does this part always look the same in your backtraces?
> The segfault in tc_malloc is probably due to a heap corruption, or doub=
le free or similar - although it can be a bit tricky to find out what did=
 it, since the corruption might have happened a bit before the place it c=
rashed.
>=20
> Some other ideas:
>   1) Was there anything nasty in the /var/log/libvirt/qemu/yourvmname.l=
og ?
>   2) Did you hit any IO errors and need to tell the VM to continue afte=
r a problem?
>   3) If this is pretty repeatable, then it would be interesting to try =
changing to a different
>      disk emulation and see if the problem goes away - e.g. virtio-scsi=
 would be a good one to try.
>=20
> Dave
> >=20
> >=20
> > Thx Christian
> >=20
> > -----Original Message-----
> > From: Dan Kenigsberg [mailto:danken@redhat.com]
> > Sent: Friday, 13 November 2015 20:00
> > To: Grundmann, Christian <Christian.Grundmann@fabasoft.com>
> > Cc: 'users@ovirt.org' <users@ovirt.org>
> > Subject: Re: [ovirt-users] Segmentation fault in libtcmalloc
> >=20
> > On Fri, Nov 13, 2015 at 07:56:14AM +0000, Grundmann, Christian wrote:
> > > Hi,
> > > I am using "ovirt-node-iso-3.6-0.999.201510221942.el7.centos.iso"
> > > (is there something better to use?) for the nodes, and have random
> > > crashes of VMs. The dumps are always the same:
> > >=20
> > > gdb --batch /usr/libexec/qemu-kvm core.45902.1447199164.dump
> > > [Thread debugging using libthread_db enabled]
> > > Using host libthread_db library "/lib64/libthread_db.so.1".
> > > Core was generated by `/usr/libexec/qemu-kvm -name vmname -S -machi=
ne rhel6.5.0,accel=3Dkvm,usb=3Do'.
> > > Program terminated with signal 11, Segmentation fault.
> > > #0  0x00007f0c559c4353 in
> > > tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache:=
:FreeList*, unsigned long, int) () from /lib64/libtcmalloc.so.4
> > >=20
> > >=20
> > > I didn't have the problem with 3.5 el6 nodes, so I don't know if it's
> > > CentOS 7 or 3.6.
> >=20
> > Due to the low-leveled-ness of the problem, I'd guess it's a qemu//li=
b64/libtcmalloc malloc bug, and not directly related to ovirt.
> >=20
> > Please report the precise version of qemu,kernel,libvirt and gperftoo=
ls-libs to qemu-devel mailing list and the complete stack trace and qemu =
command line, if possible.
> >=20
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK