linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHSET block:for-3.2/core] rescue cfq from RCU death sprial :)
@ 2011-10-26  1:48 Tejun Heo
  2011-10-26  1:48 ` [PATCH 01/13] ida: make ida_simple_get/put() IRQ safe Tejun Heo
                   ` (13 more replies)
  0 siblings, 14 replies; 33+ messages in thread
From: Tejun Heo @ 2011-10-26  1:48 UTC (permalink / raw)
  To: axboe, vgoyal; +Cc: ctalbott, rni, linux-kernel

Hello,

Over the years, cfq has developed an interesting synchronization
scheme around cic's (cfq_io_context).  Each cic is association between
an ioc (io_context) and a q (request_queue).  As it lives across two
different locking domains, synchronization has some level of inherent
complexity - nesting one way and makes the other way difficult.

cfq used RCU in rather creative ways to resolve the locking order
problem and also to cut on some of locking overhead - cic traversals
from both sides depend on RCU and removal synchronization is done by
setting cic->key to a different value.

Even when used according to usual conventions, RCU can be a bit
challenging to get right; unfortunately, this unconventional use of
RCU in CFQ seems to have devolved into ambiguous pile of partial fixes
- lazy dropping to avoid unlinking race, elevator_ops->trim() and
percpu cic counting to deal with lingering ioc's on module unload, and
so on.  There doesn't seem to be any guiding design regarding
synchronization at this point.

This is a lost cause and the code is extremely fragile and holes
aren't too difficult to spot.  The following one is from cfqd going
away too soon after ioc and q exits raced.

 sd 0:0:1:0: [sdb] Synchronizing SCSI cache
 sd 0:0:1:0: [sdb] Stopping disk
 ata1.01: disabled
 scsi: killing requests for dead queue
 general protection fault: 0000 [#1] PREEMPT SMP
 CPU 2
 Modules linked in:
 [   88.503444]
 Pid: 599, comm: hexdump Not tainted 3.1.0-rc10-work+ #158 Bochs Bochs
 RIP: 0010:[<ffffffff81397628>]  [<ffffffff81397628>] cfq_exit_single_io_context+0x58/0xf0
 RSP: 0018:ffff88001bd79de8  EFLAGS: 00010296
 RAX: 000000000000002a RBX: ffff88001daa6188 RCX: 0000000000000001
 RDX: 0000000000000000 RSI: ffffffff81afdaf1 RDI: ffffffff810950b7
 RBP: ffff88001bd79e18 R08: 0000000000000001 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000001 R12: 6b6b6b6b6b6b6b6b
 R13: ffff88001b7490a0 R14: 0000000000000064 R15: ffff88001bd79f00
 FS:  00007f4452475720(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
 CR2: 00007f6c11760000 CR3: 000000001b713000 CR4: 00000000000006e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
 Process hexdump (pid: 599, threadinfo ffff88001bd78000, task ffff88001a60e600)
 Stack:
  ffff88001bd79e68 ffff88001b4f3bd8 ffff88001b4f3bd8 ffffffff813975d0
  ffff88001b7490f0 ffff88001e9ca040 ffff88001bd79e58 ffffffff81395a4a
  ffffffff813959f0 ffffffff820883c0 ffff88001e9ca040 ffff88001b4f3bd8
 Call Trace:
  [<ffffffff81395a4a>] call_for_each_cic+0x5a/0x90
  [<ffffffff81395ab5>] cfq_exit_io_context+0x15/0x20
  [<ffffffff81389130>] exit_io_context+0x100/0x140
  [<ffffffff81098a29>] do_exit+0x579/0x850
  [<ffffffff81098d5b>] do_group_exit+0x5b/0xd0
  [<ffffffff81098de7>] sys_exit_group+0x17/0x20
  [<ffffffff81b02f2b>] system_call_fastpath+0x16/0x1b

Also, the complex and unconventional synchronization made the code
very inaccessible and elevator/io_context APIs non-modular.

The thing is that locking scheme here, although non-trivial, can be
much simpler and conventional.  The only actual performance benefit
which can be gained from use of RCU is not acquiring ioc->lock during
cic lookup on request issue path.  Nothing else actually needs or
benefits from RCU.  cic adding and removal from queue side can be done
with straight forward nested locking.

The only complex part is cic removal from ioc side; however,
reverse-order double lock dancing is a well-known trick.  It isn't the
prettiest thing in the world but definitely is way better and more
understandable than the current scheme.  And with fast path
optimization, it doesn't have to be noticeably more expensive either.

So, that's what this patchset does.  It replaces the current RCU based
magic sync scheme with plain double locking w/ only lookup part using
RCU.  The changes don't add any locking overhead to fast path and even
slow path overhead shouldn't be noticeable at all.

This patchset contains the following 13 patches.

 0001-ida-make-ida_simple_get-put-IRQ-safe.patch
 0002-block-cfq-move-cfqd-cic_index-to-q-id.patch
 0003-block-misc-ioc-cleanups.patch
 0004-block-make-ioc-get-put-interface-more-conventional-a.patch
 0005-block-misc-updates-to-blk_get_queue.patch
 0006-block-cfq-misc-updates-to-cfq_io_context.patch
 0007-block-cfq-move-ioc-ioprio-cgroup-changed-handling-to.patch
 0008-block-cfq-fix-race-condition-in-cic-creation-path-an.patch
 0009-block-cfq-fix-cic-lookup-locking.patch
 0010-block-cfq-unlink-cfq_io_context-s-immediately.patch
 0011-block-cfq-remove-delayed-unlink.patch
 0012-block-cfq-kill-ioc_gone.patch
 0013-block-cfq-kill-cic-key.patch

0001-0003 are misc preps.

0004-0005 fix and udpate io_context get/put.

0006-0007 are cfq preps.

0008-0010 updates cic locking such that cic's are added and removed
under both locks and looked up under queue_lock.

0011-0013 drop no longer necessary stuff.

I tried to test most paths w/ injected conditions and instrumentations
but this stuff definitely needs a lot more baking, so definitely not
material for this merge window.

And here are future plans.

* With locking updated, it becomes much easier to reorganize ioc/cic
  interface such that blk-ioc can handle most of cic management so
  that cfq can simply request and use cic's, so that's the next step.

* blkcg support basically suffers from the same problem both in
  locking and API separation, so that'll be the next one once ioc side
  is done.

This patchset is on top of

  block:for-3.2/core 334c2b0b8b "blk-throttle: use queue_is_locked() instead..."
+ [1] patchset "further updates to blk_cleanup_queue(), take#2"

and available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git block-cfq-sync

diffstat follows.

 block/blk-cgroup.c        |   11 -
 block/blk-core.c          |   32 +--
 block/blk-ioc.c           |  356 +++++++++++++++++++++++++--------
 block/blk-sysfs.c         |    2 
 block/blk.h               |   12 +
 block/bsg.c               |    4 
 block/cfq-iosched.c       |  485 ++++++++++++++--------------------------------
 block/elevator.c          |   16 -
 block/genhd.c             |    2 
 drivers/scsi/scsi_scan.c  |    2 
 fs/ioprio.c               |   24 --
 include/linux/blkdev.h    |   11 -
 include/linux/elevator.h  |   18 -
 include/linux/iocontext.h |   40 +--
 kernel/fork.c             |    8 
 lib/idr.c                 |   11 -
 16 files changed, 515 insertions(+), 519 deletions(-)

Thank you.

--
tejun

[1] http://thread.gmane.org/gmane.linux.kernel/1207608

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2011-10-27 17:11 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-26  1:48 [PATCHSET block:for-3.2/core] rescue cfq from RCU death sprial :) Tejun Heo
2011-10-26  1:48 ` [PATCH 01/13] ida: make ida_simple_get/put() IRQ safe Tejun Heo
2011-10-26  4:42   ` Rusty Russell
2011-10-26 20:28     ` Tejun Heo
2011-10-26  1:48 ` [PATCH 02/13] block, cfq: move cfqd->cic_index to q->id Tejun Heo
2011-10-26  1:48 ` [PATCH 03/13] block: misc ioc cleanups Tejun Heo
2011-10-26  1:48 ` [PATCH 04/13] block: make ioc get/put interface more conventional and fix race on alloction Tejun Heo
2011-10-26 16:01   ` Vivek Goyal
2011-10-26 19:29     ` Tejun Heo
2011-10-26 21:30   ` [PATCH UPDATED " Tejun Heo
2011-10-26  1:48 ` [PATCH 05/13] block: misc updates to blk_get_queue() Tejun Heo
2011-10-26  1:48 ` [PATCH 06/13] block, cfq: misc updates to cfq_io_context Tejun Heo
2011-10-27 15:39   ` Vivek Goyal
2011-10-27 16:24     ` Tejun Heo
2011-10-26  1:48 ` [PATCH 07/13] block, cfq: move ioc ioprio/cgroup changed handling to cic Tejun Heo
2011-10-26  1:48 ` [PATCH 08/13] block, cfq: fix race condition in cic creation path and tighten locking Tejun Heo
2011-10-26  1:48 ` [PATCH 09/13] block, cfq: fix cic lookup locking Tejun Heo
2011-10-26  1:48 ` [PATCH 10/13] block, cfq: unlink cfq_io_context's immediately Tejun Heo
2011-10-26 19:48   ` Vivek Goyal
2011-10-26 19:55     ` Tejun Heo
2011-10-26 20:38       ` Vivek Goyal
2011-10-26 20:54         ` Tejun Heo
2011-10-26 21:31   ` [PATCH UPDATED " Tejun Heo
2011-10-27 14:31     ` Vivek Goyal
2011-10-27 16:17       ` Tejun Heo
2011-10-27 17:05         ` Vivek Goyal
2011-10-27 17:11           ` Tejun Heo
2011-10-26  1:48 ` [PATCH 11/13] block, cfq: remove delayed unlink Tejun Heo
2011-10-26  1:48 ` [PATCH 12/13] block, cfq: kill ioc_gone Tejun Heo
2011-10-26  1:48 ` [PATCH 13/13] block, cfq: kill cic->key Tejun Heo
2011-10-26 21:36 ` [PATCHSET block:for-3.2/core] rescue cfq from RCU death sprial :) Tejun Heo
2011-10-27 15:32   ` Vivek Goyal
2011-10-27 16:10     ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).