Re: [PATCH] Re: Crash on blktap shutdown

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

From: Daniel Stodden <daniel.stodden@citrix.com>
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Xen-devel <xen-devel@lists.xensource.com>,
	Jake Wires <Jake.Wires@citrix.com>
Subject: Re: [PATCH] Re: Crash on blktap shutdown
Date: Wed, 24 Feb 2010 19:03:19 -0800	[thread overview]
Message-ID: <1267066999.23936.79.camel@agari.van.xensource.com> (raw)
In-Reply-To: <1267062465.22667.357.camel@agari.van.xensource.com>

[-- Attachment #1: Type: text/plain, Size: 2542 bytes --]

On Wed, 2010-02-24 at 20:47 -0500, Daniel Stodden wrote:
> On Wed, 2010-02-24 at 19:37 -0500, Jeremy Fitzhardinge wrote:
> > On 02/24/2010 04:29 PM, Daniel Stodden wrote:
> > > On Wed, 2010-02-24 at 18:52 -0500, Jeremy Fitzhardinge wrote:
> > >    
> > >> On 02/24/2010 03:49 PM, Daniel Stodden wrote:
> > >>      
> > >>> On Wed, 2010-02-24 at 17:55 -0500, Jeremy Fitzhardinge wrote:
> > >>>
> > >>>        
> > >>>> When rebooting the machine,  I got this crash from blktap.  The rip maps to line 262 in
> > >>>> 0xffffffff812548a1 is in blktap_request_pool_free (/home/jeremy/git/linux/drivers/xen/blktap/request.c:262).
> > >>>>
> > >>>>          
> > >>> Uhm, where did that RIP come from?
> > >>>
> > >>> pool_free is on the module exit path. The stack trace below looks like a
> > >>> crash from the broadcasted SIGTERM before reboot.
> > >>>
> > >>>        
> > >> Ignore it; I generated it from a different kernel from the one that
> > >> crashed.  But the other oops I posted should be all consistent and
> > >> meaningful.
> > >>      
> > > Ignore only the debuginfo quote, right?
> > > Cos this looks like a different issue to me.
> > >    
> > 
> > Perhaps.  I got all the others on normal domain shutdown, but this one 
> > was on machine reboot.  I'll try to repro (as I boot the test kernel 
> > with your patch in it).
> 
> (gdb) list *(blktap_device_restart+0x7a)
> 0x2a73 is in blktap_device_restart
> (/local/exp/dns/scratch/xenbits/xen-unstable.hg/linux-2.6-pvops.git/drivers/xen/blktap/device.c:920).
> 915 /* Re-enable calldowns. */
> 916 if (blk_queue_stopped(dev->gd->queue))
> 917 blk_start_queue(dev->gd->queue);
> 918 
> 919 /* Kick things off immediately. */
> 920 blktap_device_do_request(dev->gd->queue);
> 921 
> 922 spin_unlock_irq(&dev->lock);
> 923 }
> 924 
> 
> Assuming we've been dereferencing a NULL gendisk, i.e. device_destroy
> racing against device_restart.
> 
> Would take
> 
>  * Tapdisk killed on the other thread, which goes through into
>    a device_restart(). Which is what your stacktrace shows.
> 
>  * Device removal pending, blocking until
>    device->users drops to 0, then doing the device_destroy().
>    That might have happened during bdev .release.
> 
> Both running at the same time sounds like what happens if you kill them
> all at once.
> 
> That clearly takes another patch then.

Jeremy, 

can you try out the attached patch for me?

This should close the above shutdown race as well.

Should be nowhere as frequent as the timer_sync crash fixed earlier.

Thanks,
Daniel




[-- Attachment #2: fix2.diff --]
[-- Type: text/x-patch, Size: 1912 bytes --]

blktap: Fix queue restart, racing block device removal.

Makes tapdisk context test dev->gd before attempting a queue restart,
with the device lock held. Fixes a race lost against device
destruction, which may issued anywhere on the control path.

Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>

diff -r b8f6877ed00e drivers/xen/blktap/device.c
--- a/drivers/xen/blktap/device.c	Wed Feb 24 18:31:41 2010 -0800
+++ b/drivers/xen/blktap/device.c	Wed Feb 24 18:50:18 2010 -0800
@@ -896,8 +896,6 @@
 	struct blktap_device *dev;
 
 	dev = &tap->device;
-	if (!dev->gd || !dev->gd->queue)
-		return;
 
 	if (blktap_active(tap) && RING_FULL(&tap->ring.ring)) {
 		blktap_defer(tap);
@@ -913,11 +911,15 @@
 	spin_lock_irq(&dev->lock);
 
 	/* Re-enable calldowns. */
-	if (blk_queue_stopped(dev->gd->queue))
-		blk_start_queue(dev->gd->queue);
+	if (dev->gd) {
+		struct request_queue *rq = dev->gd->queue;
 
-	/* Kick things off immediately. */
-	blktap_device_do_request(dev->gd->queue);
+		if (blk_queue_stopped(rq))
+			blk_start_queue(rq);
+
+		/* Kick things off immediately. */
+		blktap_device_do_request(rq);
+	}
 
 	spin_unlock_irq(&dev->lock);
 }
@@ -1006,6 +1008,7 @@
 blktap_device_destroy(struct blktap *tap)
 {
 	struct blktap_device *dev = &tap->device;
+	struct gendisk *gd = dev->gd;
 
 	if (!test_bit(BLKTAP_DEVICE, &tap->dev_inuse))
 		return 0;
@@ -1017,8 +1020,9 @@
 
 	spin_lock_irq(&dev->lock);
 	/* No more blktap_device_do_request(). */
-	blk_stop_queue(dev->gd->queue);
+	blk_stop_queue(gd->queue);
 	clear_bit(BLKTAP_DEVICE, &tap->dev_inuse);
+	dev->gd = NULL;
 	spin_unlock_irq(&dev->lock);
 
 #ifdef ENABLE_PASSTHROUGH
@@ -1026,11 +1030,9 @@
 		blktap_device_close_bdev(tap);
 #endif
 
-	del_gendisk(dev->gd);
-	blk_cleanup_queue(dev->gd->queue);
-	put_disk(dev->gd);
-
-	dev->gd = NULL;
+	del_gendisk(gd);
+	blk_cleanup_queue(gd->queue);
+	put_disk(gd);
 
 	wake_up(&tap->wq);
 

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

next prev parent reply	other threads:[~2010-02-25  3:03 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-02-24 22:55 Crash on blktap shutdown Jeremy Fitzhardinge
2010-02-24 23:20 ` Daniel Stodden
2010-02-24 23:26   ` Jeremy Fitzhardinge
2010-02-25  0:12     ` Daniel Stodden
2010-02-25  0:16       ` Daniel Stodden
2010-02-25  0:24         ` [PATCH] Fix wild ptr deref during device destruction Daniel Stodden
2010-02-25  8:28           ` Jan Beulich
2010-02-25  9:57             ` Daniel Stodden
2010-02-25 10:02               ` Daniel Stodden
2010-02-25 22:54                 ` Yet another [PATCH] blkfront: " Daniel Stodden
2010-02-24 23:49 ` Crash on blktap shutdown Daniel Stodden
2010-02-24 23:52   ` Jeremy Fitzhardinge
2010-02-25  0:29     ` Daniel Stodden
2010-02-25  0:37       ` Jeremy Fitzhardinge
2010-02-25  1:47         ` Daniel Stodden
2010-02-25  3:03           ` Daniel Stodden [this message]
2010-02-25 23:18             ` [PATCH] " Jeremy Fitzhardinge
2010-02-26 15:38               ` Daniel Stodden

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1267066999.23936.79.camel@agari.van.xensource.com \
    --to=daniel.stodden@citrix.com \
    --cc=Jake.Wires@citrix.com \
    --cc=jeremy@goop.org \
    --cc=xen-devel@lists.xensource.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).