xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Re: Another blktap2-ish shutdown crash
       [not found] <4BD8D4AE.6020602@goop.org>
@ 2010-06-03  1:50 ` Daniel Stodden
  2010-06-07  7:29   ` Jan Beulich
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Stodden @ 2010-06-03  1:50 UTC (permalink / raw)
  To: Jeremy Fitzhardinge; +Cc: xen-devel@lists.xensource.com

[-- Attachment #1: Type: text/plain, Size: 6693 bytes --]

On Wed, 2010-04-28 at 20:37 -0400, Jeremy Fitzhardinge wrote:
> Crash on shutdown with one domain running:

Sending this one separately, since it doesn't seem to belong onto the
blktap2 branch. 

This only broke after the forward port in

commit ab77527f9a63a5c657c6d6a50e70a66adceaa3a0
Author: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Date:   Thu Feb 18 17:22:09 2010 -0800

    xen/blktap2: make compile
    
    Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>

The blk_* queue calls won't implicitly dequeue anymore, where the prior
elv_* calls did. I didn't play much around with it, but I'm pretty sure
the patch below will help.

Cheers,
Daniel

> Sending all processes the TERM signal... [  OK  ]
> blktap_ring_vm_close: unmapping ring 0
> blktap_ring_release: freeing device 0
> blktap_device_do_request: device closed: failing secs 5481032 - 5481040
> ------------[ cut here ]------------
> kernel BUG at block/blk-core.c:2111!
> invalid opcode: 0000 [#1] PREEMPT SMP 
> last sysfs file: /sys/devices/virtual/sound/timer/uevent
> CPU 0 
> Modules linked in: bridge stp llc nf_conntrack_netbios_ns dm_multipath e1000e thermal processor acpi_processor xts gf128mul cryptd aes_x86_64 aes_generic dm_crypt dm_snapshot dm_zero dm_mirror dm_region_hash dm_log raid10 ahci [last unloaded: microcode]
> Pid: 13481, comm: blkback.3.xvda Not tainted 2.6.32.12 #9 X8SIL
> RIP: e030:[<ffffffff811c3095>]  [<ffffffff811c3095>] blk_finish_request+0x2e/0x27f
> RSP: e02b:ffff8800b332d7a0  EFLAGS: 00010087
> RAX: 0000000000000001 RBX: ffff8800bd1c25a0 RCX: 0000000000000000
> RDX: ffff88009ce4f000 RSI: 0000000000000000 RDI: ffff8800bd1c25a0
> RBP: ffff8800b332d7f0 R08: 0000000000000000 R09: 0000000000000001
> R10: ffffffff81036d5a R11: ffffffff81036d5a R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000001000
> FS:  00007fc7e7c52700(0000) GS:ffff8800032c1000(0000) knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000003629227b28 CR3: 0000000001001000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process blkback.3.xvda (pid: 13481, threadinfo ffff8800b332c000, task ffff88009ce4f000)
> Stack:
>  0000000000000000 0000000000000000 ffff8800b332d7c0 ffffffff8124871b
> <0> ffff8800b332d7f0 ffff8800bd1c25a0 0000000000000000 0000000000000000
> <0> 0000000000000000 0000000000001000 ffff8800b332d810 ffffffff811c34eb
> Call Trace:
>  [<ffffffff8124871b>] ? add_disk_randomness+0x29/0x2b
>  [<ffffffff811c34eb>] __blk_end_request+0x27/0x2e
>  [<ffffffff811c355e>] __blk_end_request_cur+0x30/0x32
>  [<ffffffff812401e3>] blktap_device_do_request+0x279/0x298
>  [<ffffffff81066ab5>] ? trace_hardirqs_off+0xd/0xf
>  [<ffffffff81489edc>] ? _spin_unlock_irqrestore+0x56/0x74
>  [<ffffffff8104d553>] ? del_timer+0x79/0x85
>  [<ffffffff811c3e01>] __generic_unplug_device+0x30/0x35
>  [<ffffffff811c1892>] elv_insert+0x190/0x199
>  [<ffffffff811c192e>] __elv_add_request+0x93/0x9a
>  [<ffffffff811c4891>] __make_request+0x361/0x3de
>  [<ffffffff811c2cc1>] generic_make_request+0x29d/0x2f9
>  [<ffffffff810a12a3>] ? mempool_alloc_slab+0x10/0x12
>  [<ffffffff811c2e42>] submit_bio+0x125/0x146
>  [<ffffffff8100fe32>] ? check_events+0x12/0x20
>  [<ffffffff8123bcda>] dispatch_rw_block_io+0x57c/0x641
>  [<ffffffff8100fe1f>] ? xen_restore_fl_direct_end+0x0/0x1
>  [<ffffffff8106a196>] ? lock_release+0x15a/0x165
>  [<ffffffff8100f539>] ? xen_force_evtchn_callback+0xd/0xf
>  [<ffffffff8100fe32>] ? check_events+0x12/0x20
>  [<ffffffff8100f539>] ? xen_force_evtchn_callback+0xd/0xf
>  [<ffffffff810581af>] ? prepare_to_wait+0x1e/0x69
>  [<ffffffff8100f539>] ? xen_force_evtchn_callback+0xd/0xf
>  [<ffffffff8100fe32>] ? check_events+0x12/0x20
>  [<ffffffff810581af>] ? prepare_to_wait+0x1e/0x69
>  [<ffffffff8103d2e6>] ? cpuacct_charge+0x20/0xc6
>  [<ffffffff8103d346>] ? cpuacct_charge+0x80/0xc6
>  [<ffffffff8105af04>] ? lock_hrtimer_base+0x25/0x4b
>  [<ffffffff8100fe32>] ? check_events+0x12/0x20
>  [<ffffffff8105af04>] ? lock_hrtimer_base+0x25/0x4b
>  [<ffffffff8105af04>] ? lock_hrtimer_base+0x25/0x4b
>  [<ffffffff8100fe32>] ? check_events+0x12/0x20
>  [<ffffffff8105af04>] ? lock_hrtimer_base+0x25/0x4b
>  [<ffffffff8123be03>] ? do_block_io_op+0x64/0x2bc
>  [<ffffffff8100f539>] ? xen_force_evtchn_callback+0xd/0xf
>  [<ffffffff8100fe32>] ? check_events+0x12/0x20
>  [<ffffffff8123be03>] ? do_block_io_op+0x64/0x2bc
>  [<ffffffff8123be03>] ? do_block_io_op+0x64/0x2bc
>  [<ffffffff8123be03>] ? do_block_io_op+0x64/0x2bc
>  [<ffffffff8100f539>] ? xen_force_evtchn_callback+0xd/0xf
>  [<ffffffff8100fe32>] ? check_events+0x12/0x20
>  [<ffffffff8123be03>] ? do_block_io_op+0x64/0x2bc
>  [<ffffffff8123be03>] ? do_block_io_op+0x64/0x2bc
>  [<ffffffff8123bfe7>] do_block_io_op+0x248/0x2bc
>  [<ffffffff8123c2ae>] blkif_schedule+0x1d2/0x290
>  [<ffffffff81057fd3>] ? autoremove_wake_function+0x0/0x34
>  [<ffffffff8123c0dc>] ? blkif_schedule+0x0/0x290
>  [<ffffffff81057d61>] kthread+0x7a/0x82
>  [<ffffffff81014cba>] child_rip+0xa/0x20
>  [<ffffffff81014614>] ? restore_args+0x0/0x30
>  [<ffffffff81014cb0>] ? child_rip+0x0/0x20
> Code: e5 41 57 41 56 41 89 f6 41 55 41 54 53 48 89 fb 48 83 ec 28 f6 47 49 10 74 0c 48 8b 7f 40 48 89 de e8 df 1a 00 00 48 39 1b 74 04 <0f> 0b eb fe 83 3d 38 7c f5 00 00 74 0b 83 7b 4c 01 75 05 e8 66 
> RIP  [<ffffffff811c3095>] blk_finish_request+0x2e/0x27f
>  RSP <ffff8800b332d7a0>
> ---[ end trace 213b85711bd407a1 ]---
> note: blkback.3.xvda[13481] exited with preempt_count 1
> Sending all processes the KILL signal... [  OK  ]
> Saving random seed:  [  OK  ]
> Syncing hardware clock to system time type=1111 audit(1272501348.517:54918): user pid=21683 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:hwclock_t:s0 msg='changing system time: exe="/sbin/hwclock" hostname=? addr=? terminal=console res=success'
> [  OK  ]
> Turning off swap:  [  OK  ]
> Turning off quotas:  [  OK  ]
> Unmounting file systems:  [  OK  ]
> Stopping disk encryption for luks-11eb0145-088a-4c13-acf4-9784e82bb1e3 [  OK  ]
> Stopping disk encryption for luks-37e6b844-a112-427c-be60-ae6f4d9c5041 [  OK  ]
> mount: /dev/shm not mounted already, or bad option
> Please stand by while rebooting the system...
> type=1128 audit(1272501349.996:54919): user pid=21439 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:initrc_t:s0 msg='init: exe="/sbin/reboot" hostname=? addr=? terminal=console res=success'
> md: stopping all md devices.
> md: md0 switched to read-only mode.
> xenbus_dev_shutdown: backend/console/3/0: Initialising != Connected, skipping
> 
> 


[-- Attachment #2: blktap-dequeue-reqs.diff --]
[-- Type: text/x-patch, Size: 951 bytes --]

blktap: Dequeue requests before completion.

The blk_end_request path won't dequeue implicitly any more.

Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>

diff -r 666cc4c28bd7 -r 79d30c3f4345 drivers/xen/blktap/device.c
--- a/drivers/xen/blktap/device.c	Mon May 03 17:05:35 2010 -0700
+++ b/drivers/xen/blktap/device.c	Wed May 05 15:03:39 2010 -0700
@@ -795,11 +795,13 @@
 
 	while ((req = blk_peek_request(rq)) != NULL) {
 		if (!blk_fs_request(req)) {
+			blk_start_request(req);
 			__blk_end_request_cur(req, 0);
 			continue;
 		}
 
 		if (blk_barrier_rq(req)) {
+			blk_start_request(req);
 			__blk_end_request_cur(req, 0);
 			continue;
 		}
@@ -882,7 +884,7 @@
 	return;
 
 fail:
-	while ((req = blk_peek_request(rq))) {
+	while ((req = blk_fetch_request(rq))) {
 		BTERR("device closed: failing secs %llu - %llu\n",
 		      (unsigned long long)blk_rq_pos(req),
 		      (unsigned long long)blk_rq_pos(req) + blk_rq_sectors(req));

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] Re: Another blktap2-ish shutdown crash
  2010-06-03  1:50 ` [PATCH] Re: Another blktap2-ish shutdown crash Daniel Stodden
@ 2010-06-07  7:29   ` Jan Beulich
  2010-06-07 11:12     ` Daniel Stodden
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2010-06-07  7:29 UTC (permalink / raw)
  To: Daniel Stodden, Jeremy Fitzhardinge; +Cc: xen-devel@lists.xensource.com

>>> On 03.06.10 at 03:50, Daniel Stodden <daniel.stodden@citrix.com> wrote:

Why would you want blk_start_request() only after the blk_fs_request()
check, but not after the blk_barrier_rq() one?

Jan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Re: Another blktap2-ish shutdown crash
  2010-06-07  7:29   ` Jan Beulich
@ 2010-06-07 11:12     ` Daniel Stodden
  2010-06-07 13:43       ` Jan Beulich
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Stodden @ 2010-06-07 11:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jeremy Fitzhardinge, xen-devel@lists.xensource.com

On Mon, 2010-06-07 at 03:29 -0400, Jan Beulich wrote:
> >>> On 03.06.10 at 03:50, Daniel Stodden <daniel.stodden@citrix.com> wrote:
> 
> Why would you want blk_start_request() only after the blk_fs_request()
> check, but not after the blk_barrier_rq() one?

Huh? But cases did get the blk_start_request call (?!)

Daniel

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Re: Another blktap2-ish shutdown crash
  2010-06-07 11:12     ` Daniel Stodden
@ 2010-06-07 13:43       ` Jan Beulich
  2010-06-07 19:27         ` Daniel Stodden
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2010-06-07 13:43 UTC (permalink / raw)
  To: Daniel Stodden; +Cc: Jeremy Fitzhardinge, xen-devel@lists.xensource.com

>>> On 07.06.10 at 13:12, Daniel Stodden <daniel.stodden@citrix.com> wrote:
> On Mon, 2010-06-07 at 03:29 -0400, Jan Beulich wrote:
>> >>> On 03.06.10 at 03:50, Daniel Stodden <daniel.stodden@citrix.com> wrote:
>> 
>> Why would you want blk_start_request() only after the blk_fs_request()
>> check, but not after the blk_barrier_rq() one?
> 
> Huh? But cases did get the blk_start_request call (?!)

I have to admit that I don't understand your response at all.

Assuming that you think my original question was rubbish, this is
the original (before your patch) code I look at

	while ((req = blk_peek_request(rq)) != NULL) {
		if (!blk_fs_request(req)) {
			blk_end_request(req, -EIO, 0);
			continue;
		}

		if (blk_barrier_rq(req)) {
			blk_end_request(req, -EIO, 0);
			continue;
		}
...
		blk_start_request(req);
...

Your patch inserts a call to blk_start_request() into the
first if clause's body, and I was asking why the second
one's wouldn't also need such a call.

Sorry if I'm being dense - I'll appreciate any enlightenment.

Jan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Re: Another blktap2-ish shutdown crash
  2010-06-07 13:43       ` Jan Beulich
@ 2010-06-07 19:27         ` Daniel Stodden
  0 siblings, 0 replies; 5+ messages in thread
From: Daniel Stodden @ 2010-06-07 19:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Jeremy Fitzhardinge, xen-devel@lists.xensource.com

On Mon, 2010-06-07 at 09:43 -0400, Jan Beulich wrote:
> >>> On 07.06.10 at 13:12, Daniel Stodden <daniel.stodden@citrix.com> wrote:
> > On Mon, 2010-06-07 at 03:29 -0400, Jan Beulich wrote:
> >> >>> On 03.06.10 at 03:50, Daniel Stodden <daniel.stodden@citrix.com> wrote:
> >> 
> >> Why would you want blk_start_request() only after the blk_fs_request()
> >> check, but not after the blk_barrier_rq() one?
> > 
> > Huh? But cases did get the blk_start_request call (?!)
> 
> I have to admit that I don't understand your response at all.

".. both ..", sorrry :)

> Assuming that you think my original question was rubbish, this is
> the original (before your patch) code I look at
> 
> 	while ((req = blk_peek_request(rq)) != NULL) {
> 		if (!blk_fs_request(req)) {
> 			blk_end_request(req, -EIO, 0);
> 			continue;
> 		}
> 
> 		if (blk_barrier_rq(req)) {
> 			blk_end_request(req, -EIO, 0);
> 			continue;
> 		}
> ...
> 		blk_start_request(req);
> ...
> 
> Your patch inserts a call to blk_start_request() into the
> first if clause's body, and I was asking why the second
> one's wouldn't also need such a call.
> 
> Sorry if I'm being dense - I'll appreciate any enlightenment.

This is the relevant hunk from 

http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=blobdiff;f=drivers/xen/blktap/device.c;h=8650f028485a6ac10bcb89ebfeaa5756ff2204e6;hp=a50b622ef018c0db88f0d491ce389768c570a4bc;hb=b34e5649ea3fc0f70f87e104c9b08a0dc013b6bb;hpb=4b647759e9f4afd46c23e26e54241a9dfa7d8f4b  

        while ((req = blk_peek_request(rq)) != NULL) {
                if (!blk_fs_request(req)) {
+                       blk_start_request(req);
                        __blk_end_request_cur(req, 0);
                        continue;
                }
 
                if (blk_barrier_rq(req)) {
+                       blk_start_request(req);
                        __blk_end_request_cur(req, 0);
                        continue;
                }

Looks correct to me. What are you looking at?

Cheers,
Daniel

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-06-07 19:27 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <4BD8D4AE.6020602@goop.org>
2010-06-03  1:50 ` [PATCH] Re: Another blktap2-ish shutdown crash Daniel Stodden
2010-06-07  7:29   ` Jan Beulich
2010-06-07 11:12     ` Daniel Stodden
2010-06-07 13:43       ` Jan Beulich
2010-06-07 19:27         ` Daniel Stodden

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).