From mboxrd@z Thu Jan 1 00:00:00 1970 From: axboe@fb.com (Jens Axboe) Date: Mon, 22 Dec 2014 09:47:12 -0700 Subject: [PATCH 0/4] nvme-blkmq fixes In-Reply-To: References: <1419036856-16275-1-git-send-email-keith.busch@intel.com> <5495B2BF.8070602@fb.com> <5495CDFE.3060404@fb.com> Message-ID: <54984B10.6060907@fb.com> On 12/22/2014 09:38 AM, Keith Busch wrote: > On Sat, 20 Dec 2014, Jens Axboe wrote: >> Here's the patch referenced. Keith, if you tested it, can I add your >> tested/reviewed-by to it? > > Oh, the perils of sending patches at the end of a Friday before holiday... > > I re-tested on my dual-ported machine but without the linux-dm 3.20 > bits, so we're not multipath capable here. DM rejects the device, clears > its request_queue and hits a bug, like the wait queue's task_list has > something invalid. > > --- > device-mapper: table: table load rejected: including > non-request-stackable devices > device-mapper: table: unable to set table type > BUG: unable to handle kernel NULL pointer dereference at (null) > IP: [] __wake_up_common+0x1e/0x78 > PGD 7bb0d067 PUD 36b24067 PMD 0 > SMP > Modules linked in: nvme bnep rfcomm bluetooth rfkill nfsd auth_rpcgss > oid_registry nfs_acl nfs lockd grace fscache sunrpc dm_round_robin loop > dm_multipath parport_pc evdev parport pcspkr psmouse serio_raw processor > thermal_sys button ext4 crc16 jbd2 mbcache sg sr_mod cdrom sd_mod > ata_generic ata_piix e1000 floppy libata scsi_mod [last unloaded: nvme] > CPU: 0 PID: 4597 Comm: multipath Tainted: G D 3.18.0+ #8 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS > rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 > task: ffff880036ac15d0 ti: ffff880076880000 task.ti: ffff880076880000 > RIP: 0010:[] [] > __wake_up_common+0x1e/0x78 > RSP: 0018:ffff880076883bd8 EFLAGS: 00010096 > RAX: 0000000000000296 RBX: 0000000000000001 RCX: 0000000000000000 > RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88007ab00878 > RBP: ffff88007ab00880 R08: 0000000000000000 R09: 000000000000c201 > R10: 000000000000c210 R11: 000000000000c1c1 R12: 0000000000000003 > R13: 0000000000000000 R14: ffff880077ca5110 R15: ffff880076883d50 > FS: 00007f53829d67a0(0000) GS:ffff88007f200000(0000) > knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > CR2: 0000000000000000 CR3: 00000000776be000 CR4: 00000000000006f0 > Stack: > ffff88007c3df800 ffffffff810e75ae ffff88007f219430 ffff88007ab00878 > 0000000000000296 ffff88007ab00e90 ffff880077ca5110 ffff880077ca5110 > ffff880076883d50 ffffffff8106579f ffff880077ca5110 0000000000000000 > Call Trace: > [] ? pcpu_free_area+0x79/0xf8 > [] ? __wake_up+0x35/0x46 > [] ? blk_set_queue_dying+0x33/0x69 > [] ? blk_cleanup_queue+0x25/0xfd > [] ? __dm_destroy+0x22c/0x254 I wonder if it cleaned up the requests lists upfront, otherwise I don't see where that would crash. I'll look into that. This particular patch isn't pushed out yet. > I also couldn't remember if I wrote this next part. It looks like I did, > and it's needed when we run out of requests. I think this still might lose > a "wake" in the case we call blk_cleanup_queue() just before bt_get() > calls prepare_to_wait(), so maybe need to check for dying before and > after io_schedule(). > > --- > diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c > index 32e8dbb..69628ef 100644 > --- a/block/blk-mq-tag.c > +++ b/block/blk-mq-tag.c > @@ -275,6 +275,9 @@ static int bt_get(struct blk_mq_alloc_data *data, > > io_schedule(); > > + if (blk_queue_dying(data->q)) > + break; > + > data->ctx = blk_mq_get_ctx(data->q); > data->hctx = data->q->mq_ops->map_queue(data->q, > data->ctx->cpu); I think a cleaner way to handle this would be to add a queue mapped/dead check to hctx_may_queue(). That should catch the case of the queue being dead on entry, and after the io_schedule(). > Finally as Willy pointed out, I messed the nvme_queue struct's natural > alignment, so it doesn't pack. No biggie, just send a one-liner cq_vector and q_depth. -- Jens Axboe