* 3.6-rc5 cgroups blkio throttle + md regression
@ 2012-09-16 17:31 Joseph Glanville
From: Joseph Glanville @ 2012-09-16 17:31 UTC (permalink / raw)
To: cgroups, linux-bcache, linux-raid; +Cc: Tejun Heo, NeilBrown
Hi,
When using the cgroups blkio controller with bio throttling, my shell hangs
when attempting to write limits for md devices.
Standard sd devices work as expected, but md devices cause the shell to
hang in io_wait.
For example:
echo 9:0 0 > blkio.throttle.read_bps_device
Where 9:0 => /dev/md0
I haven't started trying to narrow down when this occurred yet; it
works fine on 3.2, and I will try out 3.5 etc. shortly to see if I can
bisect where it stopped working.
Joseph.
--
CTO | Orion Virtualisation Solutions | www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846
* Re: 3.6-rc5 cgroups blkio throttle + md regression
@ 2012-09-19 16:28 ` Joseph Glanville
From: Joseph Glanville @ 2012-09-19 16:28 UTC (permalink / raw)
To: cgroups; +Cc: Tejun Heo, Vivek Goyal
CC'ing Vivek.
On 17 September 2012 03:31, Joseph Glanville
<joseph.glanville-2MxvZkOi9dvvnOemgxGiVw@public.gmane.org> wrote:
> Hi,
>
> When using cgroups blkio controller with bio throttling my shell hangs
> attempting to write limits for md devices.
> Standard sd devices work as expected but md devices cause the shell to
> hang on io_wait.
>
> For example:
>
> echo 9:0 0 > blkio.throttle.read_bps_device
>
> Where 9:0 => /dev/md0
>
> I haven't started trying to narrow down when this occured yet, it
> works fine on 3.2 and I will try out 3.5 etc shortly to see if I can
> bisect where it stopped working.
>
> Joseph.
>
> --
> CTO | Orion Virtualisation Solutions | www.orionvm.com.au
> Phone: 1300 56 99 52 | Mobile: 0428 754 846
Hi,
I have found some more time to bisect this, but unfortunately my test
system fails to boot once it gets into the last ~40 revisions.
I am not sure if this is a Xen-related bug, so I might be able to
continue on bare metal; I will give that a go shortly.
These are the remaining commits in git bisect view:
ff26eaa blkcg: tg_stats_alloc_lock is an irq lock
0b7877d Merge tag 'v3.4-rc5' into for-3.5/core
bd1a68b vmsplice: relax alignement requirements for SPLICE_F_GIFT
a637120 blkcg: use radix tree to index blkgs from blkcg
496fb78 blkcg: fix blkcg->css ref leak in __blkg_lookup_create()
aaf7c68 block: fix elvpriv allocation failure handling
29e2b09 block: collapse blk_alloc_request() into get_request()
f9fcc2d blkcg: collapse blkcg_policy_ops into blkcg_policy
f95a04a blkcg: embed struct blkg_policy_data in policy specific data
3c79839 blkcg: mass rename of blkcg API
36558c8 blkcg: style cleanups for blk-cgroup.h
54e7ed1 blkcg: remove blkio_group->path[]
c94bed89 blkcg: blkg_rwstat_read() was missing inline
6d18b00 blkcg: shoot down blkgs if all policies are deactivated
3c96cb3 blkcg: drop stuff unused after per-queue policy activation update
a2b1693 blkcg: implement per-queue policy activation
03d8e11 blkcg: add request_queue->root_blkg
b82d4b1 blkcg: make request_queue bypassing on allocation
80fd997 blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing
da8b066 blkcg: make blkg_conf_prep() take @pol and return with queue lock held
8bd435b blkcg: remove static policy ID enums
ec39934 blkcg: use @pol instead of @plid in update_root_blkg_pd() and blkcg_print_blkgs()
bc0d650 blkcg: kill blkio_list and replace blkio_list_lock with a mutex
f48ec1d cfq: fix build breakage & warnings
5bc4afb blkcg: drop BLKCG_STAT_{PRIV|POL|OFF} macros
d366e7e blkcg: pass around pd->pdata instead of pd itself in prfill functions
af133ce blkcg: move blkio_group_conf->iops and ->bps to blk-throttle
3381cb8 blkcg: move blkio_group_conf->weight to cfq
8a3d261 blkcg: move blkio_group_stats_cpu and friends to blk-throttle.c
155fead blkcg: move blkio_group_stats to cfq-iosched.c
9ade5ea blkcg: add blkio_policy_ops operations for exit and stat reset
41b38b6 blkcg: cfq doesn't need per-cpu dispatch stats
629ed0b blkcg: move statistics update code to policies
2ce4d50 cfq: collapse cfq.h into cfq-iosched.c
60c2bc2 blkcg: move conf/stat file handling code to policies
44ea53d blkcg: implement blkio_policy_type->cftypes
829fdb5 blkcg: export conf/stat helpers to prepare for reorganization
726fa69 blkcg: simplify blkg_conf_prep()
3a8b31d blkcg: restructure blkio_group configruation setting
c4682ae blkcg: restructure configuration printing
627f29f blkcg: drop blkiocg_file_write_u64()
d3d32e6 blkcg: restructure statistics printing
edcb072 blkcg: introduce blkg_stat and blkg_rwstat
2aa4a15 blkcg: BLKIO_STAT_CPU_SECTORS doesn't have subcounters
aaec55a blkcg: remove unused @pol and @plid parameters
959d851 Merge branch 'for-3.5' of ../cgroup into block/for-3.5/core-merged
a556793 blkcg: change a spin_lock() to spin_lock_irq()
If anything, this narrows it down to being a blkcg problem rather than
anything to do with md.
--
CTO | Orion Virtualisation Solutions | www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846
* Re: 3.6-rc5 cgroups blkio throttle + md regression
@ 2012-09-19 18:20 ` Joseph Glanville
From: Joseph Glanville @ 2012-09-19 18:20 UTC (permalink / raw)
To: cgroups; +Cc: Tejun Heo, Vivek Goyal
Hi,
I booted the machine on bare metal to continue bisecting.
Thankfully this allowed me to locate the commit that causes the
problem.
git bisect reports this as the offending commit:
commit b82d4b197c782ced82a8b7b76664125d2d3c156c
Author: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Date: Fri Apr 13 13:11:31 2012 -0700
blkcg: make request_queue bypassing on allocation
With the previous change to guarantee bypass visiblity for RCU read
lock regions, entering bypass mode involves non-trivial overhead and
future changes are scheduled to make use of bypass mode during init
path. Combined it may end up adding noticeable delay during boot.
This patch makes request_queue start its life in bypass mode, which is
ended on queue init completion at the end of
blk_init_allocated_queue(), and updates blk_queue_bypass_start() such
that draining and RCU synchronization are performed only when the
queue actually enters bypass mode.
This avoids unnecessarily switching in and out of bypass mode during
init avoiding the overhead and any nasty surprises which may step from
leaving bypass mode on half-initialized queues.
The boot time overhead was pointed out by Vivek.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
diff --git a/block/blk-core.c b/block/blk-core.c
index f2db628..3b02ba3 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -421,14 +421,18 @@ void blk_drain_queue(struct request_queue *q,
bool drain_all)
*/
void blk_queue_bypass_start(struct request_queue *q)
{
+ bool drain;
+
spin_lock_irq(q->queue_lock);
- q->bypass_depth++;
+ drain = !q->bypass_depth++;
queue_flag_set(QUEUE_FLAG_BYPASS, q);
spin_unlock_irq(q->queue_lock);
- blk_drain_queue(q, false);
- /* ensure blk_queue_bypass() is %true inside RCU read lock */
- synchronize_rcu();
+ if (drain) {
+ blk_drain_queue(q, false);
+ /* ensure blk_queue_bypass() is %true inside RCU read lock */
+ synchronize_rcu();
+ }
}
EXPORT_SYMBOL_GPL(blk_queue_bypass_start);
@@ -577,6 +581,15 @@ struct request_queue *blk_alloc_queue_node(gfp_t
gfp_mask, int node_id)
*/
q->queue_lock = &q->__queue_lock;
+ /*
+ * A queue starts its life with bypass turned on to avoid
+ * unnecessary bypass on/off overhead and nasty surprises during
+ * init. The initial bypass will be finished at the end of
+ * blk_init_allocated_queue().
+ */
+ q->bypass_depth = 1;
+ __set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags);
+
if (blkcg_init_queue(q))
goto fail_id;
@@ -672,15 +685,15 @@ blk_init_allocated_queue(struct request_queue
*q, request_fn_proc *rfn,
q->sg_reserved_size = INT_MAX;
- /*
- * all done
- */
- if (!elevator_init(q, NULL)) {
- blk_queue_congestion_threshold(q);
- return q;
- }
+ /* init elevator */
+ if (elevator_init(q, NULL))
+ return NULL;
- return NULL;
+ blk_queue_congestion_threshold(q);
+
+ /* all done, end the initial bypass */
+ blk_queue_bypass_end(q);
+ return q;
}
EXPORT_SYMBOL(blk_init_allocated_queue);
Reverting this commit fixes the regression. :)
Joseph.
--
CTO | Orion Virtualisation Solutions | www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846
* Re: 3.6-rc5 cgroups blkio throttle + md regression
@ 2012-09-19 19:42 ` Vivek Goyal
From: Vivek Goyal @ 2012-09-19 19:42 UTC (permalink / raw)
To: Joseph Glanville; +Cc: cgroups, Tejun Heo
On Thu, Sep 20, 2012 at 04:20:42AM +1000, Joseph Glanville wrote:
> Hi,
>
> I booted the machine under bare metal to continue bisecting.
> Thankfully this allowed me to locate the commit that causes the
> problem.
>
I tested it and I am also noticing the hang. I can see this hang on
dm devices as well.
I suspect this issue is related to bio-based drivers. We exit bypass
mode in blk_init_allocated_queue(), and that will be called only for
request-based drivers. So for bio-based drivers we may never exit
bypass mode, and this issue is somehow a side effect of that.
Thanks
Vivek
> git bisect reports this as the offending commit:
> commit b82d4b197c782ced82a8b7b76664125d2d3c156c
> Author: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Date: Fri Apr 13 13:11:31 2012 -0700
>
> blkcg: make request_queue bypassing on allocation
>
> With the previous change to guarantee bypass visiblity for RCU read
> lock regions, entering bypass mode involves non-trivial overhead and
> future changes are scheduled to make use of bypass mode during init
> path. Combined it may end up adding noticeable delay during boot.
>
> This patch makes request_queue start its life in bypass mode, which is
> ended on queue init completion at the end of
> blk_init_allocated_queue(), and updates blk_queue_bypass_start() such
> that draining and RCU synchronization are performed only when the
> queue actually enters bypass mode.
>
> This avoids unnecessarily switching in and out of bypass mode during
> init avoiding the overhead and any nasty surprises which may step from
> leaving bypass mode on half-initialized queues.
>
> The boot time overhead was pointed out by Vivek.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Cc: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Signed-off-by: Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index f2db628..3b02ba3 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -421,14 +421,18 @@ void blk_drain_queue(struct request_queue *q,
> bool drain_all)
> */
> void blk_queue_bypass_start(struct request_queue *q)
> {
> + bool drain;
> +
> spin_lock_irq(q->queue_lock);
> - q->bypass_depth++;
> + drain = !q->bypass_depth++;
> queue_flag_set(QUEUE_FLAG_BYPASS, q);
> spin_unlock_irq(q->queue_lock);
>
> - blk_drain_queue(q, false);
> - /* ensure blk_queue_bypass() is %true inside RCU read lock */
> - synchronize_rcu();
> + if (drain) {
> + blk_drain_queue(q, false);
> + /* ensure blk_queue_bypass() is %true inside RCU read lock */
> + synchronize_rcu();
> + }
> }
> EXPORT_SYMBOL_GPL(blk_queue_bypass_start);
>
> @@ -577,6 +581,15 @@ struct request_queue *blk_alloc_queue_node(gfp_t
> gfp_mask, int node_id)
> */
> q->queue_lock = &q->__queue_lock;
>
> + /*
> + * A queue starts its life with bypass turned on to avoid
> + * unnecessary bypass on/off overhead and nasty surprises during
> + * init. The initial bypass will be finished at the end of
> + * blk_init_allocated_queue().
> + */
> + q->bypass_depth = 1;
> + __set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags);
> +
> if (blkcg_init_queue(q))
> goto fail_id;
>
> @@ -672,15 +685,15 @@ blk_init_allocated_queue(struct request_queue
> *q, request_fn_proc *rfn,
>
> q->sg_reserved_size = INT_MAX;
>
> - /*
> - * all done
> - */
> - if (!elevator_init(q, NULL)) {
> - blk_queue_congestion_threshold(q);
> - return q;
> - }
> + /* init elevator */
> + if (elevator_init(q, NULL))
> + return NULL;
>
> - return NULL;
> + blk_queue_congestion_threshold(q);
> +
> + /* all done, end the initial bypass */
> + blk_queue_bypass_end(q);
> + return q;
> }
> EXPORT_SYMBOL(blk_init_allocated_queue);
>
>
> Reverting this commit fixes the regression. :)
>
> Joseph.
>
> --
> CTO | Orion Virtualisation Solutions | www.orionvm.com.au
> Phone: 1300 56 99 52 | Mobile: 0428 754 846
* Re: 3.6-rc5 cgroups blkio throttle + md regression
@ 2012-09-20 18:31 ` Tejun Heo
From: Tejun Heo @ 2012-09-20 18:31 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Joseph Glanville, cgroups
Hello,
On Wed, Sep 19, 2012 at 03:42:31PM -0400, Vivek Goyal wrote:
> On Thu, Sep 20, 2012 at 04:20:42AM +1000, Joseph Glanville wrote:
> > Hi,
> >
> > I booted the machine under bare metal to continue bisecting.
> > Thankfully this allowed me to locate the commit that causes the
> > problem.
> >
>
> I tested it and I am also noticing the hang. I can see this hang on
> dm devices also.
>
> I suspect this issue is related to bio based drivers. We exit the
> bypass mode in blk_init_allocated_queue() and that will be called
> only for request based drivers. So for bio based drivers may be
> we never exit the bypass mode and this issue is somehow side
> affect of that.
Can you please trigger sysrq-t and post the result?
Thanks.
--
tejun
* Re: 3.6-rc5 cgroups blkio throttle + md regression
@ 2012-09-20 18:42 ` Vivek Goyal
From: Vivek Goyal @ 2012-09-20 18:42 UTC (permalink / raw)
To: Tejun Heo; +Cc: Joseph Glanville, cgroups
On Thu, Sep 20, 2012 at 11:31:53AM -0700, Tejun Heo wrote:
> Hello,
>
> On Wed, Sep 19, 2012 at 03:42:31PM -0400, Vivek Goyal wrote:
> > On Thu, Sep 20, 2012 at 04:20:42AM +1000, Joseph Glanville wrote:
> > > Hi,
> > >
> > > I booted the machine under bare metal to continue bisecting.
> > > Thankfully this allowed me to locate the commit that causes the
> > > problem.
> > >
> >
> > I tested it and I am also noticing the hang. I can see this hang on
> > dm devices also.
> >
> > I suspect this issue is related to bio based drivers. We exit the
> > bypass mode in blk_init_allocated_queue() and that will be called
> > only for request based drivers. So for bio based drivers may be
> > we never exit the bypass mode and this issue is somehow side
> > affect of that.
>
> Can you please trigger sysrq-t and post the result?
Sorry, I had actually taken the sysrq-t output yesterday, but got distracted
by other things and never got around to looking through the code. Here it is.
[ 418.685015] bash D ffff880037aa4e88 4720 2898 2847
0x00000080
[ 418.685015] ffff88007c777cd8 0000000000000082 ffff880037aa4b00
ffff88007c777fd8
[ 418.685015] ffff88007c777fd8 ffff88007c777fd8 ffffffff81c13440
ffff880037aa4b00
[ 418.685015] ffff88007c777ce8 ffffffff81e05e40 ffff88007c777d18
000000010001d5e4
[ 418.685015] Call Trace:
[ 418.685015] [<ffffffff817a2fa9>] schedule+0x29/0x70
[ 418.685015] [<ffffffff817a1570>] schedule_timeout+0x130/0x250
[ 418.685015] [<ffffffff8140bbcb>] ? kobj_lookup+0x10b/0x160
[ 418.685015] [<ffffffff8104a9b0>] ? usleep_range+0x50/0x50
[ 418.685015] [<ffffffff817a16ae>] schedule_timeout_uninterruptible+0x1e/0x20
[ 418.685015] [<ffffffff8104c310>] msleep+0x20/0x30
[ 418.685015] [<ffffffff812ccaa8>] blkg_conf_prep+0x118/0x140
[ 418.685015] [<ffffffff812cd400>] ? tg_set_conf_uint+0x20/0x20
[ 418.685015] [<ffffffff812cd33a>] tg_set_conf.isra.20+0x2a/0xd0
[ 418.685015] [<ffffffff810021ff>] ? do_signal+0x3f/0x610
[ 418.685015] [<ffffffff812cd417>] tg_set_conf_u64+0x17/0x20
[ 418.685015] [<ffffffff8109de6f>] cgroup_file_write+0x1bf/0x2c0
[ 418.685015] [<ffffffff812864bc>] ? security_file_permission+0x2c/0xb0
[ 418.685015] [<ffffffff811325cc>] vfs_write+0xac/0x180
[ 418.685015] [<ffffffff811328fa>] sys_write+0x4a/0x90
[ 418.685015] [<ffffffff817ab8d2>] system_call_fastpath+0x16/0x1b
Thanks
Vivek
* Re: 3.6-rc5 cgroups blkio throttle + md regression
@ 2012-09-20 19:17 ` Vivek Goyal
From: Vivek Goyal @ 2012-09-20 19:17 UTC (permalink / raw)
To: Tejun Heo; +Cc: Joseph Glanville, cgroups
On Thu, Sep 20, 2012 at 02:42:19PM -0400, Vivek Goyal wrote:
> On Thu, Sep 20, 2012 at 11:31:53AM -0700, Tejun Heo wrote:
> > Hello,
> >
> > On Wed, Sep 19, 2012 at 03:42:31PM -0400, Vivek Goyal wrote:
> > > On Thu, Sep 20, 2012 at 04:20:42AM +1000, Joseph Glanville wrote:
> > > > Hi,
> > > >
> > > > I booted the machine under bare metal to continue bisecting.
> > > > Thankfully this allowed me to locate the commit that causes the
> > > > problem.
> > > >
> > >
> > > I tested it and I am also noticing the hang. I can see this hang on
> > > dm devices also.
> > >
> > > I suspect this issue is related to bio based drivers. We exit the
> > > bypass mode in blk_init_allocated_queue() and that will be called
> > > only for request based drivers. So for bio based drivers may be
> > > we never exit the bypass mode and this issue is somehow side
> > > affect of that.
> >
> > Can you please trigger sysrq-t and post the result?
>
> Sorry, I had taken the sysrq-t output yesterday itself. Got distracted
> in other things and could never look through the code. Here it is.
>
> [ 418.685015] bash D ffff880037aa4e88 4720 2898 2847
> 0x00000080
> [ 418.685015] ffff88007c777cd8 0000000000000082 ffff880037aa4b00
> ffff88007c777fd8
> [ 418.685015] ffff88007c777fd8 ffff88007c777fd8 ffffffff81c13440
> ffff880037aa4b00
> [ 418.685015] ffff88007c777ce8 ffffffff81e05e40 ffff88007c777d18
> 000000010001d5e4
> [ 418.685015] Call Trace:
> [ 418.685015] [<ffffffff817a2fa9>] schedule+0x29/0x70
> [ 418.685015] [<ffffffff817a1570>] schedule_timeout+0x130/0x250
> [ 418.685015] [<ffffffff8140bbcb>] ? kobj_lookup+0x10b/0x160
> [ 418.685015] [<ffffffff8104a9b0>] ? usleep_range+0x50/0x50
> [ 418.685015] [<ffffffff817a16ae>] schedule_timeout_uninterruptible+0x1e/0x20
> [ 418.685015] [<ffffffff8104c310>] msleep+0x20/0x30
> [ 418.685015] [<ffffffff812ccaa8>] blkg_conf_prep+0x118/0x140
> [ 418.685015] [<ffffffff812cd400>] ? tg_set_conf_uint+0x20/0x20
> [ 418.685015] [<ffffffff812cd33a>] tg_set_conf.isra.20+0x2a/0xd0
> [ 418.685015] [<ffffffff810021ff>] ? do_signal+0x3f/0x610
> [ 418.685015] [<ffffffff812cd417>] tg_set_conf_u64+0x17/0x20
> [ 418.685015] [<ffffffff8109de6f>] cgroup_file_write+0x1bf/0x2c0
> [ 418.685015] [<ffffffff812864bc>] ? security_file_permission+0x2c/0xb0
> [ 418.685015] [<ffffffff811325cc>] vfs_write+0xac/0x180
> [ 418.685015] [<ffffffff811328fa>] sys_write+0x4a/0x90
> [ 418.685015] [<ffffffff817ab8d2>] system_call_fastpath+0x16/0x1b
I suspect we are looping in the retry code because bio-based queues never come
out of bypass mode.
/*
* If queue was bypassing, we should retry. Do so after a
* short msleep(). It isn't strictly necessary but queue
* can be bypassing for some time and it's always nice to
* avoid busy looping.
*/
if (ret == -EBUSY) {
msleep(10);
ret = restart_syscall();
}
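For reference, the -EBUSY that triggers this retry comes from the blkg
lookup path, which refuses to instantiate a blkg while the queue is
bypassing. Roughly like this (a simplified sketch from memory, not
necessarily the exact 3.6 source):

```c
/*
 * Sketch of the lookup path behind blkg_conf_prep(): a queue that never
 * leaves bypass mode keeps failing with -EBUSY, so the config write
 * above keeps being restarted forever.
 */
struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg,
				    struct request_queue *q)
{
	/* nothing is allowed through for a bypassing queue */
	if (unlikely(blk_queue_bypass(q)))
		return ERR_PTR(blk_queue_dead(q) ? -EINVAL : -EBUSY);

	return __blkg_lookup_create(blkcg, q);
}
```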
Thanks
Vivek
* Re: 3.6-rc5 cgroups blkio throttle + md regression
@ 2012-09-20 19:20 ` Tejun Heo
From: Tejun Heo @ 2012-09-20 19:20 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Joseph Glanville, cgroups
On Thu, Sep 20, 2012 at 03:17:16PM -0400, Vivek Goyal wrote:
> I suspect we are looping in retry code because bio based queues never come
> out of bypass mode.
>
> /*
> * If queue was bypassing, we should retry. Do so after a
> * short msleep(). It isn't strictly necessary but queue
> * can be bypassing for some time and it's always nice to
> * avoid busy looping.
> */
> if (ret == -EBUSY) {
> msleep(10);
> ret = restart_syscall();
> }
Yeah, I incorrectly assumed that bio-based drivers would call
blk_init_allocated_queue(). I think we need to move the initial
bypass_end call to blk_register_queue(), like the following. Not
completely sure yet, though.
diff --git a/block/blk-core.c b/block/blk-core.c
index 4b4dbdf..cbb019a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -714,9 +714,6 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
return NULL;
blk_queue_congestion_threshold(q);
-
- /* all done, end the initial bypass */
- blk_queue_bypass_end(q);
return q;
}
EXPORT_SYMBOL(blk_init_allocated_queue);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9628b29..f53802e 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -527,6 +527,8 @@ int blk_register_queue(struct gendisk *disk)
if (WARN_ON(!q))
return -ENXIO;
+ blk_queue_bypass_end(q);
+
ret = blk_trace_init_sysfs(dev);
if (ret)
return ret;
* Re: 3.6-rc5 cgroups blkio throttle + md regression
@ 2012-09-20 19:32 ` Vivek Goyal
From: Vivek Goyal @ 2012-09-20 19:32 UTC (permalink / raw)
To: Tejun Heo; +Cc: Joseph Glanville, cgroups
On Thu, Sep 20, 2012 at 12:20:38PM -0700, Tejun Heo wrote:
> On Thu, Sep 20, 2012 at 03:17:16PM -0400, Vivek Goyal wrote:
> > I suspect we are looping in retry code because bio based queues never come
> > out of bypass mode.
> >
> > /*
> > * If queue was bypassing, we should retry. Do so after a
> > * short msleep(). It isn't strictly necessary but queue
> > * can be bypassing for some time and it's always nice to
> > * avoid busy looping.
> > */
> > if (ret == -EBUSY) {
> > msleep(10);
> > ret = restart_syscall();
> > }
>
> Yeah, I incorrectly assumed that bio based drivers would call
> blk_init_allocated_queue(). I think we need to move the initial
> bypass_end call to blk_register_queue(). Like the following. Not
> completely sure yet tho.
Hmm, I am just trying to remember the details, hence thinking out loud.
The issue was that we did not want to call synchronize_rcu() in the queue
exit path (e.g. cfq_exit_queue()), because some drivers create and destroy
queues very frequently during device/disk discovery.
So we started the queue in bypass mode, so that if it is destroyed early
it is destroyed in bypass mode and we never have to call
synchronize_rcu().
The assumption here is that blk_register_queue() will be called from
add_disk(), and hopefully add_disk() is only called by a driver if it is
not planning to tear down the queue soon. If this assumption is
right, then this patch should work, I think.
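To make that concrete, a typical bio-based driver sets up its queue
roughly like this (the example_* names are hypothetical, the usual gendisk
field setup is omitted, and only the block layer calls are real):

```c
#include <linux/blkdev.h>
#include <linux/genhd.h>

/* Hypothetical bio-based driver init, illustrating the call sequence only. */
static void example_make_request(struct request_queue *q, struct bio *bio);

static int __init example_init(void)
{
	struct request_queue *q;
	struct gendisk *disk;

	/* bio-based drivers never call blk_init_allocated_queue(), so with
	 * b82d4b197c the queue keeps its allocation-time bypass state here. */
	q = blk_alloc_queue(GFP_KERNEL);
	if (!q)
		return -ENOMEM;
	blk_queue_make_request(q, example_make_request);

	disk = alloc_disk(1);
	if (!disk) {
		blk_cleanup_queue(q);
		return -ENOMEM;
	}
	disk->queue = q;

	/* add_disk() ends up calling blk_register_queue(), which is where the
	 * proposed patch finally does blk_queue_bypass_end(). */
	add_disk(disk);
	return 0;
}
```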
Thanks
Vivek
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 4b4dbdf..cbb019a 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -714,9 +714,6 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
> return NULL;
>
> blk_queue_congestion_threshold(q);
> -
> - /* all done, end the initial bypass */
> - blk_queue_bypass_end(q);
> return q;
> }
> EXPORT_SYMBOL(blk_init_allocated_queue);
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 9628b29..f53802e 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -527,6 +527,8 @@ int blk_register_queue(struct gendisk *disk)
> if (WARN_ON(!q))
> return -ENXIO;
>
> + blk_queue_bypass_end(q);
> +
> ret = blk_trace_init_sysfs(dev);
> if (ret)
> return ret;
* Re: 3.6-rc5 cgroups blkio throttle + md regression
@ 2012-09-20 19:57 ` Vivek Goyal
From: Vivek Goyal @ 2012-09-20 19:57 UTC (permalink / raw)
To: Tejun Heo; +Cc: Joseph Glanville, cgroups
On Thu, Sep 20, 2012 at 12:20:38PM -0700, Tejun Heo wrote:
> On Thu, Sep 20, 2012 at 03:17:16PM -0400, Vivek Goyal wrote:
> > I suspect we are looping in retry code because bio based queues never come
> > out of bypass mode.
> >
> > /*
> > * If queue was bypassing, we should retry. Do so after a
> > * short msleep(). It isn't strictly necessary but queue
> > * can be bypassing for some time and it's always nice to
> > * avoid busy looping.
> > */
> > if (ret == -EBUSY) {
> > msleep(10);
> > ret = restart_syscall();
> > }
>
> Yeah, I incorrectly assumed that bio based drivers would call
> blk_init_allocated_queue(). I think we need to move the initial
> bypass_end call to blk_register_queue(). Like the following. Not
> completely sure yet tho.
I can confirm that with this patch the hang is gone. I tried it on
a dm device.
Thanks
Vivek
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 4b4dbdf..cbb019a 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -714,9 +714,6 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
> return NULL;
>
> blk_queue_congestion_threshold(q);
> -
> - /* all done, end the initial bypass */
> - blk_queue_bypass_end(q);
> return q;
> }
> EXPORT_SYMBOL(blk_init_allocated_queue);
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 9628b29..f53802e 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -527,6 +527,8 @@ int blk_register_queue(struct gendisk *disk)
> if (WARN_ON(!q))
> return -ENXIO;
>
> + blk_queue_bypass_end(q);
> +
> ret = blk_trace_init_sysfs(dev);
> if (ret)
> return ret;
* Re: 3.6-rc5 cgroups blkio throttle + md regression
@ 2012-09-20 20:17 ` Tejun Heo
From: Tejun Heo @ 2012-09-20 20:17 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Joseph Glanville, cgroups
Hello, Vivek.
On Thu, Sep 20, 2012 at 03:32:27PM -0400, Vivek Goyal wrote:
> > Yeah, I incorrectly assumed that bio based drivers would call
> > blk_init_allocated_queue(). I think we need to move the initial
> > bypass_end call to blk_register_queue(). Like the following. Not
> > completely sure yet tho.
>
> Hmm.., I am just trying to remember the details hence thinking loud.
>
> So issue was that we did not want to call synchronize_rcu() in queue exit
> path (ex. cfq_exit_queue()) because some driver could call create/destroy
> queues very frequently during device/disk discovery.
>
> So we started queue in bypass mode so that if it is destroyed early, it
> will be destroyed in bypass mode and we shall never have to call
> synchronize_rcu().
Yeah, that, and it's also desirable to ensure that all the fancy stuff is
skipped until initialization is complete.
> So assumption here is that blk_register_queue() will be called by
> add_disk() and hopefully add_disk() will be called by driver only if
> it is not planning to tear down the queue soon. If this assumption is
> right, then this patch should work, I think.
I'm wondering whether there's any use case where a queue is used
without being registered and still requires an elevator and all the
other stuff. I can't think of any valid use case, but I am not completely
sure either.
Thanks.
--
tejun
* Re: 3.6-rc5 cgroups blkio throttle + md regression
@ 2012-09-20 20:18 ` Tejun Heo
From: Tejun Heo @ 2012-09-20 20:18 UTC (permalink / raw)
To: Vivek Goyal; +Cc: Joseph Glanville, cgroups
On Thu, Sep 20, 2012 at 03:57:59PM -0400, Vivek Goyal wrote:
> > Yeah, I incorrectly assumed that bio based drivers would call
> > blk_init_allocated_queue(). I think we need to move the initial
> > bypass_end call to blk_register_queue(). Like the following. Not
> > completely sure yet tho.
>
> I can confirm that with this patch, hang issue is gone. I tried it on
> a dm device.
Thanks, will prep a patch.
--
tejun
* [PATCH 1/2] block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()
@ 2012-09-20 21:08 ` Tejun Heo
From: Tejun Heo @ 2012-09-20 21:08 UTC (permalink / raw)
To: Jens Axboe
Cc: Joseph Glanville, cgroups, Vivek Goyal,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
b82d4b197c ("blkcg: make request_queue bypassing on allocation") made
request_queues bypassed on allocation to avoid switching bypass mode on
and off on a queue that is being initialized. Some drivers allocate and
then destroy a lot of queues without fully initializing them, and
incurring the bypass latency on each of them could add up to
significant overhead.
Unfortunately, blk_init_allocated_queue() is never used by queues of
bio-based drivers, which means that all bio-based driver queues are in
bypass mode even after initialization and registration complete
successfully.
Due to the limited way request_queues are used by bio drivers, this
problem is hidden pretty well but it shows up when blk-throttle is
used in combination with a bio-based driver. Trying to configure
(echoing to cgroupfs file) blk-throttle for a bio-based driver hangs
indefinitely in blkg_conf_prep() waiting for bypass mode to end.
This patch moves the initial blk_queue_bypass_end() call from
blk_init_allocated_queue() to blk_register_queue() which is called for
any userland-visible queues regardless of its type.
I believe this is correct because I don't think there is any block
driver which needs or wants working elevator and blk-cgroup on a queue
which isn't visible to userland. If there are such users, we need a
different solution.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Reported-by: Joseph Glanville <joseph.glanville-2MxvZkOi9dvvnOemgxGiVw@public.gmane.org>
Cc: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
---
Jens, while these are fixes, I don't think they are extremely urgent;
routing them through 3.7-rc1 should be enough.
Thanks.
block/blk-core.c | 7 ++-----
block/blk-sysfs.c | 6 ++++++
2 files changed, 8 insertions(+), 5 deletions(-)
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -608,8 +608,8 @@ struct request_queue *blk_alloc_queue_no
/*
* A queue starts its life with bypass turned on to avoid
* unnecessary bypass on/off overhead and nasty surprises during
- * init. The initial bypass will be finished at the end of
- * blk_init_allocated_queue().
+ * init. The initial bypass will be finished when the queue is
+ * registered by blk_register_queue().
*/
q->bypass_depth = 1;
__set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags);
@@ -714,9 +714,6 @@ blk_init_allocated_queue(struct request_
return NULL;
blk_queue_congestion_threshold(q);
-
- /* all done, end the initial bypass */
- blk_queue_bypass_end(q);
return q;
}
EXPORT_SYMBOL(blk_init_allocated_queue);
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -527,6 +527,12 @@ int blk_register_queue(struct gendisk *d
if (WARN_ON(!q))
return -ENXIO;
+ /*
+ * Initialization must be complete by now. Finish the initial
+ * bypass from queue allocation.
+ */
+ blk_queue_bypass_end(q);
+
ret = blk_trace_init_sysfs(dev);
if (ret)
return ret;
* [PATCH 2/2] block: fix request_queue->flags initialization
@ 2012-09-20 21:09 ` Tejun Heo
From: Tejun Heo @ 2012-09-20 21:09 UTC (permalink / raw)
To: Jens Axboe
Cc: Joseph Glanville, cgroups, Vivek Goyal,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
A queue newly allocated with blk_alloc_queue_node() has only
QUEUE_FLAG_BYPASS set. For request-based drivers,
blk_init_allocated_queue() is called and q->queue_flags is overwritten
with QUEUE_FLAG_DEFAULT, which doesn't include BYPASS, even though the
initial bypass is still in effect.
In blk_init_allocated_queue(), OR QUEUE_FLAG_DEFAULT into q->queue_flags
instead of overwriting it.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
---
block/blk-core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -696,7 +696,7 @@ blk_init_allocated_queue(struct request_
q->request_fn = rfn;
q->prep_rq_fn = NULL;
q->unprep_rq_fn = NULL;
- q->queue_flags = QUEUE_FLAG_DEFAULT;
+ q->queue_flags |= QUEUE_FLAG_DEFAULT;
/* Override internal queue lock with supplied lock pointer */
if (lock)
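As a toy illustration of what the one-liner above changes (the flag
values here are made up; only the assignment-vs-OR distinction matters):

```c
#include <stdio.h>

#define FLAG_BYPASS	(1UL << 0)			/* stand-in for QUEUE_FLAG_BYPASS */
#define FLAG_DEFAULT	((1UL << 1) | (1UL << 2))	/* stand-in for QUEUE_FLAG_DEFAULT */

int main(void)
{
	unsigned long assigned = FLAG_BYPASS;	/* state after blk_alloc_queue_node() */
	unsigned long ored = FLAG_BYPASS;

	assigned = FLAG_DEFAULT;	/* old code: the bypass bit is lost */
	ored |= FLAG_DEFAULT;		/* patched code: the bypass bit survives */

	printf("assignment keeps bypass? %s\n", (assigned & FLAG_BYPASS) ? "yes" : "no");
	printf("OR keeps bypass?         %s\n", (ored & FLAG_BYPASS) ? "yes" : "no");
	return 0;
}
```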
* Re: [PATCH 1/2] block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()
@ 2012-09-21 13:25 ` Vivek Goyal
From: Vivek Goyal @ 2012-09-21 13:25 UTC (permalink / raw)
To: Tejun Heo
Cc: Jens Axboe, Joseph Glanville, cgroups,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
On Thu, Sep 20, 2012 at 02:08:52PM -0700, Tejun Heo wrote:
> b82d4b197c ("blkcg: make request_queue bypassing on allocation") made
> request_queues bypassed on allocation to avoid switching on and off
> bypass mode on a queue being initialized. Some drivers allocate and
> then destroy a lot of queues without fully initializing them and
> incurring bypass latency overhead on each of them could add upto
> significant overhead.
>
> Unfortunately, blk_init_allocated_queue() is never used by queues of
> bio-based drivers, which means that all bio-based driver queues are in
> bypass mode even after initialization and registration complete
> successfully.
>
> Due to the limited way request_queues are used by bio drivers, this
> problem is hidden pretty well but it shows up when blk-throttle is
> used in combination with a bio-based driver. Trying to configure
> (echoing to cgroupfs file) blk-throttle for a bio-based driver hangs
> indefinitely in blkg_conf_prep() waiting for bypass mode to end.
>
> This patch moves the initial blk_queue_bypass_end() call from
> blk_init_allocated_queue() to blk_register_queue() which is called for
> any userland-visible queues regardless of its type.
>
> I believe this is correct because I don't think there is any block
> driver which needs or wants working elevator and blk-cgroup on a queue
> which isn't visible to userland. If there are such users, we need a
> different solution.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Reported-by: Joseph Glanville <joseph.glanville-2MxvZkOi9dvvnOemgxGiVw@public.gmane.org>
> Cc: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> ---
> Jens, while these are fixes, I think it isn't extremely urgent and
> routing these through 3.7-rc1 should be enough.
Looks good to me.
Acked-by: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Given that blkcg throttling is broken on all bio-based devices
(dm, md), I would think we need to send these fixes out in 3.6
instead of pushing them out to 3.7.
Thanks
Vivek
>
> Thanks.
>
> block/blk-core.c | 7 ++-----
> block/blk-sysfs.c | 6 ++++++
> 2 files changed, 8 insertions(+), 5 deletions(-)
>
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -608,8 +608,8 @@ struct request_queue *blk_alloc_queue_no
> /*
> * A queue starts its life with bypass turned on to avoid
> * unnecessary bypass on/off overhead and nasty surprises during
> - * init. The initial bypass will be finished at the end of
> - * blk_init_allocated_queue().
> + * init. The initial bypass will be finished when the queue is
> + * registered by blk_register_queue().
> */
> q->bypass_depth = 1;
> __set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags);
> @@ -714,9 +714,6 @@ blk_init_allocated_queue(struct request_
> return NULL;
>
> blk_queue_congestion_threshold(q);
> -
> - /* all done, end the initial bypass */
> - blk_queue_bypass_end(q);
> return q;
> }
> EXPORT_SYMBOL(blk_init_allocated_queue);
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -527,6 +527,12 @@ int blk_register_queue(struct gendisk *d
> if (WARN_ON(!q))
> return -ENXIO;
>
> + /*
> + * Initialization must be complete by now. Finish the initial
> + * bypass from queue allocation.
> + */
> + blk_queue_bypass_end(q);
> +
> ret = blk_trace_init_sysfs(dev);
> if (ret)
> return ret;
* Re: [PATCH 2/2] block: fix request_queue->flags initialization
@ 2012-09-21 13:25 ` Vivek Goyal
From: Vivek Goyal @ 2012-09-21 13:25 UTC (permalink / raw)
To: Tejun Heo
Cc: Jens Axboe, Joseph Glanville, cgroups,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
On Thu, Sep 20, 2012 at 02:09:30PM -0700, Tejun Heo wrote:
> A queue newly allocated with blk_alloc_queue_node() has only
> QUEUE_FLAG_BYPASS set. For request-based drivers,
> blk_init_allocated_queue() is called and q->queue_flags is overwritten
> with QUEUE_FLAG_DEFAULT which doesn't include BYPASS even though the
> initial bypass is still in effect.
>
> In blk_init_allocated_queue(), or QUEUE_FLAG_DEFAULT to q->queue_flags
> instead of overwriting.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Cc: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Acked-by: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Vivek
> ---
> block/blk-core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -696,7 +696,7 @@ blk_init_allocated_queue(struct request_
> q->request_fn = rfn;
> q->prep_rq_fn = NULL;
> q->unprep_rq_fn = NULL;
> - q->queue_flags = QUEUE_FLAG_DEFAULT;
> + q->queue_flags |= QUEUE_FLAG_DEFAULT;
>
> /* Override internal queue lock with supplied lock pointer */
> if (lock)
* Re: [PATCH 1/2] block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()
@ 2012-09-21 13:25 ` Jens Axboe
From: Jens Axboe @ 2012-09-21 13:25 UTC (permalink / raw)
To: Tejun Heo
Cc: Joseph Glanville, cgroups, Vivek Goyal,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
On 09/20/2012 11:08 PM, Tejun Heo wrote:
> b82d4b197c ("blkcg: make request_queue bypassing on allocation") made
> request_queues bypassed on allocation to avoid switching on and off
> bypass mode on a queue being initialized. Some drivers allocate and
> then destroy a lot of queues without fully initializing them and
> incurring bypass latency overhead on each of them could add upto
> significant overhead.
>
> Unfortunately, blk_init_allocated_queue() is never used by queues of
> bio-based drivers, which means that all bio-based driver queues are in
> bypass mode even after initialization and registration complete
> successfully.
>
> Due to the limited way request_queues are used by bio drivers, this
> problem is hidden pretty well but it shows up when blk-throttle is
> used in combination with a bio-based driver. Trying to configure
> (echoing to cgroupfs file) blk-throttle for a bio-based driver hangs
> indefinitely in blkg_conf_prep() waiting for bypass mode to end.
>
> This patch moves the initial blk_queue_bypass_end() call from
> blk_init_allocated_queue() to blk_register_queue() which is called for
> any userland-visible queues regardless of its type.
>
> I believe this is correct because I don't think there is any block
> driver which needs or wants working elevator and blk-cgroup on a queue
> which isn't visible to userland. If there are such users, we need a
> different solution.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Reported-by: Joseph Glanville <joseph.glanville-2MxvZkOi9dvvnOemgxGiVw@public.gmane.org>
> Cc: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> ---
> Jens, while these are fixes, I think it isn't extremely urgent and
> routing these through 3.7-rc1 should be enough.
Agree, I'll shove them into for-3.7/core
--
Jens Axboe
* Re: [PATCH 1/2] block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()
@ 2012-10-16 10:00 ` Joseph Glanville
From: Joseph Glanville @ 2012-10-16 10:00 UTC (permalink / raw)
To: Jens Axboe
Cc: Tejun Heo, cgroups, Vivek Goyal,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
On 21 September 2012 23:25, Jens Axboe <axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org> wrote:
> On 09/20/2012 11:08 PM, Tejun Heo wrote:
>> b82d4b197c ("blkcg: make request_queue bypassing on allocation") made
>> request_queues bypassed on allocation to avoid switching on and off
>> bypass mode on a queue being initialized. Some drivers allocate and
>> then destroy a lot of queues without fully initializing them and
>> incurring bypass latency overhead on each of them could add upto
>> significant overhead.
>>
>> Unfortunately, blk_init_allocated_queue() is never used by queues of
>> bio-based drivers, which means that all bio-based driver queues are in
>> bypass mode even after initialization and registration complete
>> successfully.
>>
>> Due to the limited way request_queues are used by bio drivers, this
>> problem is hidden pretty well but it shows up when blk-throttle is
>> used in combination with a bio-based driver. Trying to configure
>> (echoing to cgroupfs file) blk-throttle for a bio-based driver hangs
>> indefinitely in blkg_conf_prep() waiting for bypass mode to end.
>>
>> This patch moves the initial blk_queue_bypass_end() call from
>> blk_init_allocated_queue() to blk_register_queue() which is called for
>> any userland-visible queues regardless of its type.
>>
>> I believe this is correct because I don't think there is any block
>> driver which needs or wants working elevator and blk-cgroup on a queue
>> which isn't visible to userland. If there are such users, we need a
>> different solution.
>>
>> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>> Reported-by: Joseph Glanville <joseph.glanville-2MxvZkOi9dvvnOemgxGiVw@public.gmane.org>
>> Cc: Vivek Goyal <vgoyal-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> ---
>> Jens, while these are fixes, I think it isn't extremely urgent and
>> routing these through 3.7-rc1 should be enough.
>
> Agree, I'll shove them into for-3.7/core
>
> --
> Jens Axboe
>
Hi,
Has this patch been marked for stable?
This is still broken on 3.6.2 (and, I would assume, on other
stable kernels since 3.5).
Joseph.
--
CTO | Orion Virtualisation Solutions | www.orionvm.com.au
Phone: 1300 56 99 52 | Mobile: 0428 754 846
* Re: [PATCH 1/2] block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()
@ 2012-10-16 19:11 ` Tejun Heo
From: Tejun Heo @ 2012-10-16 19:11 UTC (permalink / raw)
To: Joseph Glanville
Cc: Jens Axboe, cgroups, Vivek Goyal,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
On Tue, Oct 16, 2012 at 09:00:30PM +1100, Joseph Glanville wrote:
> Has this patch been marked for stable?
Yes, both have been marked.
Thanks.
--
tejun
Thread overview: 19+ messages
2012-09-16 17:31 3.6-rc5 cgroups blkio throttle + md regression Joseph Glanville
2012-09-19 16:28 ` Joseph Glanville
2012-09-19 18:20 ` Joseph Glanville
2012-09-19 19:42 ` Vivek Goyal
2012-09-20 18:31 ` Tejun Heo
2012-09-20 18:42 ` Vivek Goyal
2012-09-20 19:17 ` Vivek Goyal
2012-09-20 19:20 ` Tejun Heo
2012-09-20 19:32 ` Vivek Goyal
2012-09-20 20:17 ` Tejun Heo
2012-09-20 19:57 ` Vivek Goyal
2012-09-20 20:18 ` Tejun Heo
2012-09-20 21:08 ` [PATCH 1/2] block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue() Tejun Heo
2012-09-20 21:09 ` [PATCH 2/2] block: fix request_queue->flags initialization Tejun Heo
2012-09-21 13:25 ` Vivek Goyal
2012-09-21 13:25 ` [PATCH 1/2] block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue() Vivek Goyal
2012-09-21 13:25 ` Jens Axboe
2012-10-16 10:00 ` Joseph Glanville
2012-10-16 19:11 ` Tejun Heo