From: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Cc: ctalbott-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
rni-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org,
vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
Subject: [PATCHSET] block: implement per-blkg request allocation, take#2
Date: Mon, 4 Jun 2012 20:40:50 -0700
Message-ID: <1338867660-4689-1-git-send-email-tj@kernel.org>
Hello,
(The posting yesterday went out w/o lkml cc'd and old head message,
resending w/ Vivek's suggestions applied. Sorry about the noise.)
This is the second take of "implement per-blkg request allocation"
patchset. Changes from the last take[L] are
* 0001-fix-blkg_alloc-failure-path patch is separated from this series
  and merged to block/for-linus.
* Updated patches that were posted incrementally are merged into the
  series.
* Rebased on top of the current block/for-linus.
* Documentation/block/queue-sysfs.txt updated to note that nr_requests
  is per-blkcg.
* Acked/Reviewed-by's added.
The original description follows.
Currently the block layer shares a single request_list (@q->rq) for all
IOs regardless of their blkcg associations. This means that once the
shared pool is exhausted, blkcg limits don't mean much. Whoever grabs
a freed request first gets the next IO slot.
This priority inversion can be easily demonstrated by creating a blkio
cgroup w/ very low weight, putting a program which can issue a lot of
random direct IOs there and running a sequential IO from a different
cgroup. As soon as the request pool is used up, the sequential IO
bandwidth crashes.
This patchset implements per-blkg request allocation so that each
blkcg-request_queue pair has its own request pool to allocate from.
This isolates different blkcgs in terms of request allocation.
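To illustrate the difference, here is a minimal userspace sketch, not code
from the patchset. It models a request pool as a plain counter with a
limit. struct pool, pool_get(), second_group_gets() and NR_REQUESTS are all
hypothetical names made up for this example; the kernel's request_list is
considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

#define NR_REQUESTS 8	/* hypothetical pool size, stands in for nr_requests */

/* Toy model of a request pool: a usage counter with an upper limit. */
struct pool {
	int in_use;
	int limit;
};

/* Try to take one request slot; returns 1 on success, 0 if the pool is full. */
static int pool_get(struct pool *p)
{
	assert(p != NULL);
	if (p->in_use >= p->limit)
		return 0;
	p->in_use++;
	return 1;
}

/*
 * A greedy group attempts @flood allocations from @greedy_pool, then a
 * second group attempts @want allocations from @other_pool.  Returns how
 * many slots the second group obtained.  Passing the same pool for both
 * arguments models the shared request_list case; passing two distinct
 * pools models one pool per group.
 */
static int second_group_gets(struct pool *greedy_pool, struct pool *other_pool,
			     int flood, int want)
{
	int got = 0;
	int i;

	for (i = 0; i < flood; i++)
		pool_get(greedy_pool);	/* failures are simply dropped here */
	for (i = 0; i < want; i++)
		got += pool_get(other_pool);
	return got;
}
```

With a flood of 100 attempts and a pool size of 8, calling
second_group_gets() with the same pool for both groups returns 0 for a
second group wanting 4 slots, while calling it with two separate pools
returns 4. The sketch ignores request completion and waiting, so it only
shows the exhaustion effect, not the bandwidth behavior.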
Most changes are straight-forward; unfortunately, bdi isn't
blkcg-aware yet so it currently just propagates the congestion state
from root blkcg. As writeback currently is always on the root blkcg,
this kinda works for write congestion but readahead may behave
non-optimally under congestion for now. This needs to be improved but
the situation is still way better than blkcg completely collapsing.
0001-blkcg-__blkg_lookup_create-doesn-t-need-radix-preloa.patch
0002-blkcg-make-root-blkcg-allocation-use-GFP_KERNEL.patch
0003-mempool-add-gfp_mask-to-mempool_create_node.patch
0004-block-drop-custom-queue-draining-used-by-scsi_transp.patch
0005-block-refactor-get_request-_wait.patch
0006-block-allocate-io_context-upfront.patch
0007-blkcg-inline-bio_blkcg-and-friends.patch
0008-block-add-q-nr_rqs-and-move-q-rq.elvpriv-to-q-nr_rqs.patch
0009-block-prepare-for-multiple-request_lists.patch
0010-blkcg-implement-per-blkg-request-allocation.patch
0001-0002 are misc preps.
0003 adds @gfp_mask to mempool_create_node(). This is necessary
because blkg allocation is on the IO path and now blkg contains
mempool for request_list. Note that blkg allocation failure doesn't
lead to catastrophic failure. It just hinders blkcg enforcement.
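One way to picture why a failed allocation only weakens enforcement is a
fallback to a common pool. The following userspace sketch assumes such a
fallback; the cover letter does not spell out the mechanism, and
struct io_group, struct queue and pick_pool() are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy request pool, as a counter with a limit. */
struct request_pool {
	int in_use;
	int limit;
};

/* Hypothetical per-group state; @pool is NULL if its setup failed. */
struct io_group {
	struct request_pool *pool;
};

/* Hypothetical queue with a pool that always exists. */
struct queue {
	struct request_pool root_pool;
};

/*
 * Pick the pool to allocate from.  If the group or its pool is missing
 * (assumed to mean the per-group allocation failed), fall back to the
 * queue's root pool: IO still proceeds, but it is no longer isolated
 * from other users of the root pool.
 */
static struct request_pool *pick_pool(struct queue *q, struct io_group *grp)
{
	assert(q != NULL);
	if (grp && grp->pool)
		return grp->pool;
	return &q->root_pool;
}
```

In this model a group without its own pool competes with everything else
on the root pool, which matches the "hinders blkcg enforcement" outcome
described above rather than an IO failure.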
0004 drops custom queue draining which I don't think is necessary and
which hinders further updates.
0005-0009 are prep patches and 0010 implements per-blkg request
allocation.
This patchset is on top of the current block/for-linus - 9b2ea86bc9e
"blkcg: fix blkg_alloc() failure path" and is also available in the
following git branch.
git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git blkcg-rl
 Documentation/block/queue-sysfs.txt |    7 +
 block/blk-cgroup.c                  |  139 ++++++++++++++++---------
 block/blk-cgroup.h                  |  121 +++++++++++++++++++++
 block/blk-core.c                    |  200 ++++++++++++++++++------------------
 block/blk-sysfs.c                   |   34 +++---
 block/blk-throttle.c                |    3
 block/blk.h                         |    3
 block/bsg-lib.c                     |   53 ---------
 drivers/scsi/scsi_transport_fc.c    |   38 ------
 drivers/scsi/scsi_transport_iscsi.c |    2
 include/linux/blkdev.h              |   53 +++++----
 include/linux/bsg-lib.h             |    1
 include/linux/mempool.h             |    3
 mm/mempool.c                        |   12 +-
 14 files changed, 382 insertions(+), 287 deletions(-)
Thanks.
--
tejun
[L] http://thread.gmane.org/gmane.linux.kernel.containers/23159