From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeremy Fitzhardinge
Subject: Re: [PATCH] Re: Crash on blktap shutdown
Date: Thu, 25 Feb 2010 15:18:50 -0800
Message-ID: <4B87055A.2080605@goop.org>
References: <4B85AE5C.8050603@goop.org> <1267055355.5962.503.camel@agari.van.xensource.com> <4B85BBAC.6000704@goop.org> <1267057784.5962.688.camel@agari.van.xensource.com> <4B85C656.1060109@goop.org> <1267062465.22667.357.camel@agari.van.xensource.com> <1267066999.23936.79.camel@agari.van.xensource.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable
Return-path:
In-Reply-To: <1267066999.23936.79.camel@agari.van.xensource.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Daniel Stodden
Cc: Xen-devel , Jake Wires
List-Id: xen-devel@lists.xenproject.org

On 02/24/2010 07:03 PM, Daniel Stodden wrote:
> On Wed, 2010-02-24 at 20:47 -0500, Daniel Stodden wrote:
>
>> On Wed, 2010-02-24 at 19:37 -0500, Jeremy Fitzhardinge wrote:
>>
>>> On 02/24/2010 04:29 PM, Daniel Stodden wrote:
>>>
>>>> On Wed, 2010-02-24 at 18:52 -0500, Jeremy Fitzhardinge wrote:
>>>>
>>>>> On 02/24/2010 03:49 PM, Daniel Stodden wrote:
>>>>>
>>>>>> On Wed, 2010-02-24 at 17:55 -0500, Jeremy Fitzhardinge wrote:
>>>>>>
>>>>>>> When rebooting the machine, I got this crash from blktap. The rip maps to line 262 in
>>>>>>> 0xffffffff812548a1 is in blktap_request_pool_free (/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).
>>>>>>>
>>>>>> Uhm, where did that RIP come from?
>>>>>>
>>>>>> pool_free is on the module exit path. The stack trace below looks like a
>>>>>> crash from the broadcasted SIGTERM before reboot.
>>>>>>
>>>>> Ignore it; I generated it from a different kernel from the one that
>>>>> crashed. But the other oops I posted should be all consistent and
>>>>> meaningful.
>>>>>
>>>> Ignore only the debuginfo quote, right?
>>>> Cos this looks like a different issue to me.
>>>>
>>> Perhaps. I got all the others on normal domain shutdown, but this one
>>> was on machine reboot. I'll try to repro (as I boot the test kernel
>>> with your patch in it).
>>>
>> (gdb) list *(blktap_device_restart+0x7a)
>> 0x2a73 is in blktap_device_restart
>> (/local/exp/dns/scratch/xenbits/xen-unstable.hg/linux-2.6-pvops.git/drivers/xen/blktap/device.c:920).
>> 915         /* Re-enable calldowns. */
>> 916         if (blk_queue_stopped(dev->gd->queue))
>> 917                 blk_start_queue(dev->gd->queue);
>> 918
>> 919         /* Kick things off immediately. */
>> 920         blktap_device_do_request(dev->gd->queue);
>> 921
>> 922         spin_unlock_irq(&dev->lock);
>> 923 }
>> 924
>>
>> Assuming we've been dereferencing a NULL gendisk, i.e. device_destroy
>> racing against device_restart.
>>
>> Would take
>>
>> * Tapdisk killed on the other thread, which goes through into
>>   a device_restart(). Which is what your stacktrace shows.
>>
>> * Device removal pending, blocking until
>>   device->users drops to 0, then doing the device_destroy().
>>   That might have happened during bdev .release.
>>
>> Both running at the same time sounds like what happens if you kill them
>> all at once.
>>
>> That clearly takes another patch then.
>>
> Jeremy,
>
> can you try out the attached patch for me?
>
> This should close the above shutdown race as well.
>
> Should be nowhere as frequent as the timer_sync crash fixed earlier.
>

Hm, the two patches changed things but I'm still seeing problems on
domain shutdown. Still looks like use-after-free.

blktap_device_destroy: destroy device 0 users 0 blktap_ring_vm_close: unmapping ring 0 blktap_ring_release: freeing device 0 blktap_sysfs_destroy =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D BUG kmalloc-512: Poison overwritten -------------------------------------------------------------------------= ---- INFO: 0xffff88002e9e2048-0xffff88002e9e2048. First byte 0x6a instead of 0= x6b INFO: Allocated in device_create_vargs+0x47/0xd7 age=3D7705 cpu=3D0 pid=3D= 3072 INFO: Freed in device_create_release+0x9/0xb age=3D14 cpu=3D0 pid=3D3320 INFO: Slab 0xffff880003cca5b0 objects=3D14 used=3D2 fp=3D0xffff88002e9e20= 00 flags=3D0xa3 INFO: Object 0xffff88002e9e2000 @offset=3D0 fp=3D0xffff88002e9e2248 Object 0xffff88002e9e2000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e2010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e2020: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e2030: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e2040: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e2050: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e2060: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e2070: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e2080: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e2090: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e20a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e20b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e20c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e20d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 
6b = 6b 6b kk Object 0xffff88002e9e20e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e20f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e2100: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e2110: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e2120: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e2130: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e2140: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e2150: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e2160: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e2170: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e2180: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e2190: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e21a0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e21b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e21c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e21d0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e21e0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b 6b kk Object 0xffff88002e9e21f0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b = 6b a5 k=EF=BF=BD Redzone 0xffff88002e9e2200: bb bb bb bb bb bb bb bb = =EF=BF=BD Padding 0xffff88002e9e2240: 5a 5a 5a 5a 5a 5a 5a 5a = Z Pid: 3327, comm: ifdown Not tainted 2.6.32 #358 Call Trace: [] print_trailer+0x16a/0x173 [] check_bytes_and_report+0xb5/0xe6 [] check_object+0xc5/0x237 [] __slab_alloc+0x493/0x591 [] ? load_elf_binary+0xe2/0x17d8 [] ? load_elf_binary+0xe2/0x17d8 [] __kmalloc+0xbe/0x12f [] load_elf_binary+0xe2/0x17d8 [] ? xen_force_evtchn_callback+0xd/0xf [] ? xen_force_evtchn_callback+0xd/0xf [] ? check_events+0x12/0x20 [] ? 
search_binary_handler+0x18f/0x278 [] ? flock_to_posix_lock+0x4/0xe1 [] ? search_binary_handler+0xd2/0x278 [] ? xen_restore_fl_direct_end+0x0/0x1 [] ? lock_release+0x15a/0x166 [] ? flock_to_posix_lock+0x4/0xe1 [] search_binary_handler+0xdf/0x278 [] ? load_elf_binary+0x0/0x17d8 [] do_execve+0x185/0x27a [] sys_execve+0x3e/0x5c [] stub_execve+0x6a/0xc0 FIX kmalloc-512: Restoring 0xffff88002e9e2048-0xffff88002e9e2048=3D0x6b J
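
For anyone reading the dump above: with slab debugging on, SLUB fills a freed object with 0x6b (POISON_FREE) and sets its last byte to 0xa5 (POISON_END), then re-checks that pattern when the object is handed out again. A lone 0x6a at offset 0x48 is consistent with something decrementing a byte (or clearing its low bit) after device_create_release() freed the object. Below is a minimal userspace sketch of that kind of check, not the kernel's actual code; the two poison values are taken from include/linux/poison.h, while first_bad_poison() and poison_object() are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Poison values as defined in the kernel's include/linux/poison.h. */
#define POISON_FREE 0x6b /* fills a freed object */
#define POISON_END  0xa5 /* marks the last byte of a freed object */

/*
 * Hypothetical helper (not a kernel function): poison a buffer the way
 * the freed object in the dump looks, 0x6b everywhere and 0xa5 last.
 */
static void poison_object(unsigned char *obj, size_t size)
{
    assert(size > 0);
    memset(obj, POISON_FREE, size);
    obj[size - 1] = POISON_END;
}

/*
 * Hypothetical helper (not a kernel function): return the offset of the
 * first byte that no longer holds the expected poison, or -1 if the
 * whole object is still intact.
 */
static long first_bad_poison(const unsigned char *obj, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        unsigned char want = (i == size - 1) ? POISON_END : POISON_FREE;

        if (obj[i] != want)
            return (long)i;
    }
    return -1;
}
```

Poisoning a 512-byte buffer, decrementing the byte at offset 0x48 and calling first_bad_poison() on it reports 0x48, which matches the 0xffff88002e9e2048 address in the INFO line for an object starting at 0xffff88002e9e2000.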