From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:39649)
	by lists.gnu.org with esmtp (Exim 4.71)
	id 1ZyeYU-0005LN-RP for qemu-devel@nongnu.org;
	Tue, 17 Nov 2015 06:36:12 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	id 1ZyeYR-0007kU-HN for qemu-devel@nongnu.org;
	Tue, 17 Nov 2015 06:36:10 -0500
Received: from mx1.redhat.com ([209.132.183.28]:47778)
	by eggs.gnu.org with esmtp (Exim 4.71)
	id 1ZyeYR-0007kN-7h for qemu-devel@nongnu.org;
	Tue, 17 Nov 2015 06:36:07 -0500
Date: Tue, 17 Nov 2015 11:36:02 +0000
From: "Dr. David Alan Gilbert"
Message-ID: <20151117113601.GD2498@work-vm>
References: <6A17C71B52524C408E7AAF69103E9E490F14400C@fabamailserver.fabagl.fabasoft.com>
	<20151113190014.GB18986@redhat.com>
	<6A17C71B52524C408E7AAF69103E9E490F14E9F4@fabamailserver.fabagl.fabasoft.com>
	<20151117095920.GB2498@work-vm>
	<6A17C71B52524C408E7AAF69103E9E490F153F45@fabamailserver.fabagl.fabasoft.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
In-Reply-To: <6A17C71B52524C408E7AAF69103E9E490F153F45@fabamailserver.fabagl.fabasoft.com>
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] WG: [ovirt-users] Segmentation fault in libtcmalloc
To: "Grundmann, Christian"
Cc: "'qemu-devel@nongnu.org'", "stefanha@redhat.com"

* Grundmann, Christian (Christian.Grundmann@fabasoft.com) wrote:
> Hi,
>
> @ Can you please use a 'thread apply all bt full'; the full gives a little more info.
>
> gdb --batch /usr/libexec/qemu-kvm core.52281.1447709011.dump -ex "set pagination off" -ex "thread apply all bt full"

OK, it doesn't really give any more without the debuginfo package mentioned below.

> @ Also, if you've not already got it installed can you please install the debuginfo package for qemu, it gives a lot more information in backtraces.
> Sorry, it's an ovirt-node system where I can't use yum

Ah, although perhaps if you took the core dump onto another machine with matching qemu and debuginfo, you should be able to get more detail.

> @ Does this part always look the same in your backtraces?
> Most are the same; I found one that is a little bit different:
> Thread 1 (Thread 0x7f378a0d7c00 (LWP 6658)):
> #0  0x00007f3785d18353 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /lib64/libtcmalloc.so.4
> No symbol table info available.
> #1  0x00007f3785d186b0 in tcmalloc::ThreadCache::Scavenge() () from /lib64/libtcmalloc.so.4
> No symbol table info available.
> #2  0x00007f3785d27057 in tc_free () from /lib64/libtcmalloc.so.4
> No symbol table info available.
> #3  0x00007f37885e858f in g_free () from /lib64/libglib-2.0.so.0
> No symbol table info available.
> #4  0x00007f37885fec89 in g_slice_free1 () from /lib64/libglib-2.0.so.0
> No symbol table info available.
> #5  0x00007f378a1f232e in virtio_blk_rw_complete ()
> No symbol table info available.
> #6  0x00007f378a39f1ae in bdrv_co_em_bh ()
> No symbol table info available.
> #7  0x00007f378a398394 in aio_bh_poll ()
> No symbol table info available.
> #8  0x00007f378a3a7409 in aio_dispatch_clients ()
> No symbol table info available.
> #9  0x00007f378a39820e in aio_ctx_dispatch ()
> No symbol table info available.
> #10 0x00007f37885e299a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> No symbol table info available.
> #11 0x00007f378a3a6288 in main_loop_wait ()
> No symbol table info available.
> #12 0x00007f378a1a5a4e in main ()
> No symbol table info available.
>

OK, that's a bit different but interesting....

> @ 1)  Was there anything nasty in the /var/log/libvirt/qemu/yourvmname.log ?
> No, nothing abnormal
>
> @ 2)  Did you hit any IO errors and need to tell the VM to continue after a problem?
> Ovirt tells me "no Storage space error", which is something like the disk is growing too fast, I think. I use snapshots, so on heavy write the disk has to grow a lot.
> Sometimes the VM is paused and resumed from ovirt. Sometimes the VM stays offline.

OK, that's interesting, because you may be hitting the following bug:
http://lists.nongnu.org/archive/html/qemu-block/2015-11/msg00585.html
whose fix coincidentally just got accepted today; it's related to error cases with error=stop, which you are using.

Do you think you're only hitting these crashes on VMs that have been paused because of these space errors?

>       disk emulation and see if the problem goes away - e.g. virtio-scsi would be a good one to try.
>
> Ok, will try that and report

Thanks,

Dave

>
> Thx Christian
>
>
> -----Original Message-----
> From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com]
> Sent: Tuesday, 17 November 2015 10:59
> To: Grundmann, Christian
> Cc: 'qemu-devel@nongnu.org'; stefanha@redhat.com
> Subject: Re: [Qemu-devel] WG: [ovirt-users] Segmentation fault in libtcmalloc
>
> * Grundmann, Christian (Christian.Grundmann@fabasoft.com) wrote:
> > Hi,
> > Dan sent me over to you,
> > please let me know if I can provide additional information
>
> Hi Christian,
>   Thanks for reporting this,
>
> > Software versions:
> > ovirt-node-iso-3.6-0.999.201510221942.el7.centos.iso
> >
> > qemu-img-ev-2.3.0-29.1.el7.x86_64
> > qemu-kvm-ev-2.3.0-29.1.el7.x86_64
> > qemu-kvm-common-ev-2.3.0-29.1.el7.x86_64
> > qemu-kvm-tools-ev-2.3.0-29.1.el7.x86_64
> > ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch
> > kernel-3.10.0-229.14.1.el7.x86_64
> > gperftools-libs-2.4-7.el7.x86_64
> >
> > Commandline:
> > /usr/libexec/qemu-kvm -name myvmname -S
> > -machine rhel6.5.0,accel=kvm,usb=off -cpu Westmere -m 7168
> > -realtime mlock=off -smp 2,maxcpus=16,sockets=16,cores=1,threads=1
> > -uuid 5b6b8899-5a9d-4c07-a6aa-6171527ad319
> > -smbios type=1,manufacturer=oVirt,product=oVirt Node,version=3.6-0.999.201510221942.el7.centos,serial=30343536-3138-5A43-4A34-323630303253,uuid=5b6b8899-5a9d-4c07-a6aa-6171527ad319
> > -nographic -no-user-config -nodefaults
> > -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/myvmname.monitor,server,nowait
> > -mon chardev=charmonitor,id=monitor,mode=control
> > -rtc base=2015-11-15T20:04:35,driftfix=slew
> > -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown
> > -boot strict=on
> > -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
> > -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4
> > -device virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5
> > -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw,serial=
> > -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0
> > -drive file=/rhev/data-center/00000002-0002-0002-0002-0000000000e2/5df61b84-8746-4460-b148-65cc0eb8d29c/images/8202b81d-6191-495f-8c9d-7d90baffaecf/d7665e07-1786-4051-aa26-0a3e1c9d2574,if=none,id=drive-virtio-disk0,format=qcow2,serial=8202b81d-6191-495f-8c9d-7d90baffaecf,cache=none,werror=stop,rerror=stop,aio=native
> > -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> > -netdev tap,fd=39,id=hostnet0,vhost=on,vhostfd=65
> > -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:83:a2:0e,bus=pci.0,addr=0x3
> > -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/5b6b8899-5a9d-4c07-a6aa-6171527ad319.com.redhat.rhevm.vdsm,server,nowait
> > -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm
> > -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/5b6b8899-5a9d-4c07-a6aa-6171527ad319.org.qemu.guest_agent.0,server,nowait
> > -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
> > -device cirrus-vga,id=video0,bus=pci.0,addr=0x2
> > -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7
> > -msg timestamp=on
> >
> > Stack Trace:
> >
> > gdb --batch /usr/libexec/qemu-kvm core.14750.1447544080.dump -ex "set pagination off" -ex "thread apply all bt"
>
> Can you please use a 'thread apply all bt full'; the full gives a little more info.
> Also, if you've not already got it installed can you please install the debuginfo package for qemu, it gives a lot more information in backtraces.
>
> > Thread 1 (Thread 0x7fa8b16afc00 (LWP 14750)):
> > #0  0x00007fa8ad2febe1 in tc_malloc () from /lib64/libtcmalloc.so.4
> > #1  0x00007fa8b186b489 in malloc_and_trace ()
> > #2  0x00007fa8afbc047f in g_malloc () from /lib64/libglib-2.0.so.0
> > #3  0x00007fa8afbd666e in g_slice_alloc () from /lib64/libglib-2.0.so.0
> > #4  0x00007fa8b17cbffd in virtio_blk_handle_output ()
> > #5  0x00007fa8b197e6b6 in qemu_iohandler_poll ()
> > #6  0x00007fa8b197e296 in main_loop_wait ()
> > #7  0x00007fa8b177da4e in main ()
>
> Does this part always look the same in your backtraces?
> The segfault in tc_malloc is probably due to a heap corruption, or double free or similar - although it can be a bit tricky to find out what did it, since the corruption might have happened a bit before the place it crashed.
>
> Some other ideas:
>   1)  Was there anything nasty in the /var/log/libvirt/qemu/yourvmname.log ?
>   2)  Did you hit any IO errors and need to tell the VM to continue after a problem?
>   3)  If this is pretty repeatable, then it would be interesting to try changing to a different
>       disk emulation and see if the problem goes away - e.g. virtio-scsi would be a good one to try.
>
> Dave
> >
> >
> > Thx Christian
> >
> > -----Original Message-----
> > From: Dan Kenigsberg [mailto:danken@redhat.com]
> > Sent: Friday, 13 November 2015 20:00
> > To: Grundmann, Christian
> > Cc: 'users@ovirt.org'
> > Subject: Re: [ovirt-users] Segmentation fault in libtcmalloc
> >
> > On Fri, Nov 13, 2015 at 07:56:14AM +0000, Grundmann, Christian wrote:
> > > Hi,
> > > I am using "ovirt-node-iso-3.6-0.999.201510221942.el7.centos.iso"
> > > (is there something better to use?) for the nodes, and have random
> > > crashes of VMs. The dumps are always the same:
> > >
> > > gdb --batch /usr/libexec/qemu-kvm core.45902.1447199164.dump
> > > [Thread debugging using libthread_db enabled]
> > > Using host libthread_db library "/lib64/libthread_db.so.1".
> > > Core was generated by `/usr/libexec/qemu-kvm -name vmname -S -machine rhel6.5.0,accel=kvm,usb=o'.
> > > Program terminated with signal 11, Segmentation fault.
> > > #0  0x00007f0c559c4353 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int) () from /lib64/libtcmalloc.so.4
> > >
> > >
> > > Didn't have the problem with 3.5 el6 nodes, so I don't know if it's
> > > centos7 or 3.6
> >
> > Due to the low-leveled-ness of the problem, I'd guess it's a qemu//lib64/libtcmalloc malloc bug, and not directly related to ovirt.
> >
> > Please report the precise version of qemu, kernel, libvirt and gperftools-libs to the qemu-devel mailing list, along with the complete stack trace and qemu command line, if possible.
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
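
As a rough illustration of the point made earlier in the thread, that the corruption might have happened a bit before the place it crashed, here is a minimal sketch in Python. It assumes a toy LIFO free list over integer block ids; `ToyFreeList`, `alloc` and `release` are hypothetical names, and this is not how tcmalloc is actually implemented. The double free itself goes unnoticed, and the damage only shows up when two later, unrelated requests are handed the same block.

```python
class ToyFreeList:
    """Toy LIFO free list of fixed-size blocks (hypothetical, not tcmalloc's code)."""

    def __init__(self, num_blocks):
        # Blocks are identified by integer index; all of them start out free.
        self.free = list(range(num_blocks))

    def alloc(self):
        # Hand out the most recently freed block.
        return self.free.pop()

    def release(self, block):
        # No double-free check: the block is pushed even if it is already listed.
        self.free.append(block)


heap = ToyFreeList(4)
a = heap.alloc()
heap.release(a)
heap.release(a)  # the actual bug: a double free, and nothing fails here

# Later, two unrelated requests are served from the damaged list.
req1 = heap.alloc()
req2 = heap.alloc()
aliased = req1 == req2
print("two live requests share one block:", aliased)
```

In a real allocator the two owners would then overwrite each other's data or the allocator's own list pointers, so the eventual segfault can land in an allocation or free path, as in the backtraces above, far from the code that did the second free.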