* Re: blk-throttle.c : When limit is changed, must start a new slice
[not found] <tencent_323C67E7636EA7895ACB367F@qq.com>
@ 2011-03-04 19:30 ` Vivek Goyal
0 siblings, 0 replies; 4+ messages in thread
From: Vivek Goyal @ 2011-03-04 19:30 UTC (permalink / raw)
To: lina; +Cc: linux kernel mailing list
On Sat, Mar 05, 2011 at 12:30:55AM +0800, lina wrote:
> Hi Vivek,
> Please take a little time to read this letter; I think there seems to
> be a small bug in blk-throttle.c.
> It can happen that initially the cgroup limit was greater than the
> device's physical capability, but later the limit was set below it. In that
> case, all of the bios on the device are throttled for several minutes.
>
> I did some analysis, as follows:
> First, setting a very large bps on the device lets all of the bios go
> through. The slice begins when the test begins, and is only ever
> extended; a new one is never started.
> Second, changing the limit to below the physical capability makes a
> bio get queued. Once one bio is queued, blk_throtl_bio() calls
> tg_update_disptime() to estimate the delay for the throtl work.
> As the slice is very old, tg->bytes_disp[rw] holds a very large value,
> and tg->disptime ends up a long time after jiffies.
> During this time, all bios are queued, and the work queue cannot
> run, so tg->slice_start[rw] still cannot be reset.
>
> Although everything will be fine after several minutes, it still
> seems bad for users.
>
> I think a new slice should be started when the limit is changed.
> Here is my patch; please correct it if anything below is wrong.
CCing lkml. Let's keep all testing and bug reports regarding
blkio throttling on the mailing list.
Thanks for the bug report, lina. I think this is a bug. I am not too keen
on restarting the slice every time the limit changes, as somebody could exploit
that to get higher bandwidth by changing limits frequently. Can you try the
attached patch and see if it solves your problem?
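For what it's worth, the length of the stall lina describes follows from simple arithmetic. Below is a simplified, stand-alone model of the bps accounting (all names and numbers are made up for illustration; this is not the kernel code): once the slice has been left running under a huge limit, the bytes already dispatched have to be paid back at the new low rate before the next queued bio may go out.

#include <stdio.h>

int main(void)
{
	/* Illustrative numbers only. */
	unsigned long long new_bps    = 1ULL * 1024 * 1024;	/* new limit: 1 MB/s */
	unsigned long long bytes_disp = 600ULL * 1024 * 1024;	/* dispatched while the old, huge limit applied */
	unsigned long long slice_age  = 10;			/* seconds since slice_start; slice never restarted */

	/* The group may dispatch as long as bytes_disp stays under new_bps * slice_age. */
	unsigned long long allowed = new_bps * slice_age;

	if (bytes_disp > allowed) {
		unsigned long long extra = bytes_disp - allowed;
		/* Time until the already-dispatched excess is amortized at the new rate. */
		printf("next bio is delayed roughly %llu seconds\n", extra / new_bps);
	}
	return 0;
}

With 600 MB of history and a 1 MB/s limit this prints roughly 590 seconds, which matches the "several minutes" observed above.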
>
> --- block/blk-throttle.c
> +++ block/blk-throttle.c
> @@ -1027,6 +1027,9 @@
>  	*biop = NULL;
>  	if (update_disptime) {
> +		if (tg->limits_changed)
> +		{
> +			throtl_start_new_slice(td, tg, rw);
> +		}
>  		tg_update_disptime(td, tg);
>  		throtl_schedule_next_dispatch(td);
>  	}
>
>
> There are two other problems; can you explain them to me?
>
> First:
> The blkio directory tree always looks like:
>
> blkio                                   // parent dir
> ---blkio.weight
> ---blkio.throttle.write_bps_device
> ...
> ---tst1                                 // child dir
>    ---blkio.weight
>    ---blkio.throttle.write_bps_device
>    ...
>
> The child dir's throttle files, such as tst1/blkio.throttle.write_bps_device,
> never seem to be used. It might be better not to create these files at all.
> If they are useful, please tell me. Thank you!
Throttling creates a flat hierarchy of groups internally. So even if you
create a child of a child group, the child will be treated as if it were a
child of the root and will be throttled as per its own throttling rules.
>
> Second:
> To apply the weight policy to some pid, I must mkdir tst1 as
> above. When all of the pids in tst1 have exited, the pids disappear from
> the tasks file, but the tst1 dir is still there. I think it would be better
> if tst1 could be removed automatically.
> The throttle policy has no auto-clean capability either. If I create many
> linear dm devices, apply the throttle policy to them, and then run
> 'dmsetup remove_all', I must clean up the entries in
> blkio.throttle.write_bps_device one by one manually. If the throttle policy
> could clean up an entry automatically when the device is no longer in the
> system, that would be perfect!
I think it is hard to remove the rules when the device goes away, too. One
easy way is to remove the whole cgroup. Another would be to provide a
simple command to delete all the rules from a file.
Say, echo "0" > blkio.throttle.read_bps_device would remove all the rules
from the file. If that helps you, you could write a patch for that.
Thanks
Vivek
---
block/blk-throttle.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
Index: linux-2.6/block/blk-throttle.c
===================================================================
--- linux-2.6.orig/block/blk-throttle.c 2011-03-04 13:59:45.000000000 -0500
+++ linux-2.6/block/blk-throttle.c 2011-03-04 14:05:31.542332242 -0500
@@ -1023,6 +1023,19 @@ int blk_throtl_bio(struct request_queue
 	/* Bio is with-in rate limit of group */
 	if (tg_may_dispatch(td, tg, bio, NULL)) {
 		throtl_charge_bio(tg, bio);
+
+		/*
+		 * We need to trim slice even when bios are not being queued
+		 * otherwise it might happen that a bio is not queued for
+		 * a long time and slice keeps on extending and trim is not
+		 * called for a long time. Now if limits are reduced suddenly
+		 * we take into account all the IO dispatched so far at new
+		 * low rate and newly queued IO gets a really long dispatch
+		 * time.
+		 *
+		 * So keep on trimming slice even if bio is not queued.
+		 */
+		throtl_trim_slice(td, tg, rw);
 		goto out;
 	}
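As a rough illustration of what the trim added by the hunk above achieves, here is a stand-alone sketch under heavy simplifications (time in seconds instead of jiffies, bps only, invented struct and function names; the real throtl_trim_slice() also handles iops and slice_end rounding):

struct tg_model {
	unsigned long long bytes_disp;	/* bytes charged in the current slice */
	unsigned long long slice_start;	/* seconds, for simplicity */
	unsigned long long bps;		/* current limit */
};

/*
 * Drop the part of the dispatch history that the elapsed slice time has
 * already paid for and move slice_start forward. Called on every dispatched
 * bio, this keeps bytes_disp bounded, so a later limit reduction no longer
 * sees minutes worth of old IO.
 */
static void trim_slice_model(struct tg_model *tg, unsigned long long now)
{
	unsigned long long elapsed = now - tg->slice_start;
	unsigned long long paid = elapsed * tg->bps;

	if (paid >= tg->bytes_disp)
		tg->bytes_disp = 0;		/* history fully covered */
	else
		tg->bytes_disp -= paid;		/* keep only the unpaid excess */

	tg->slice_start = now;			/* slice no longer grows without bound */
}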
* Re: blk-throttle.c : When limit is changed, must start a new slice
[not found] <tencent_1FCD09C260CD8FFB3D3886AD@qq.com>
@ 2011-03-07 16:07 ` Vivek Goyal
0 siblings, 0 replies; 4+ messages in thread
From: Vivek Goyal @ 2011-03-07 16:07 UTC (permalink / raw)
To: lina; +Cc: linux kernel mailing list
On Mon, Mar 07, 2011 at 11:05:49PM +0800, lina wrote:
> >On Sat, Mar 05, 2011 at 12:30:55AM +0800, lina wrote:
> >> Hi Vivek,
> >> Please take a little time to read this letter; I think there seems to
> >> be a small bug in blk-throttle.c.
> >> It can happen that initially the cgroup limit was greater than the
> >> device's physical capability, but later the limit was set below it. In that
> >> case, all of the bios on the device are throttled for several minutes.
> >>
> >> I did some analysis, as follows:
> >> First, setting a very large bps on the device lets all of the bios go
> >> through. The slice begins when the test begins, and is only ever
> >> extended; a new one is never started.
> >> Second, changing the limit to below the physical capability makes a
> >> bio get queued. Once one bio is queued, blk_throtl_bio() calls
> >> tg_update_disptime() to estimate the delay for the throtl work.
> >> As the slice is very old, tg->bytes_disp[rw] holds a very large value,
> >> and tg->disptime ends up a long time after jiffies.
> >> During this time, all bios are queued, and the work queue cannot
> >> run, so tg->slice_start[rw] still cannot be reset.
> >>
> >> Although everything will be fine after several minutes, it still
> >> seems bad for users.
> >>
> >> I think a new slice should be started when the limit is changed.
> >> Here is my patch; please correct it if anything below is wrong.
> >
> >
> >CCing lkml. Let's keep all testing and bug reports regarding
> >blkio throttling on the mailing list.
> >
> >Thanks for the bug report, lina. I think this is a bug. I am not too keen
> >on restarting the slice every time the limit changes, as somebody could exploit
> >that to get higher bandwidth by changing limits frequently. Can you try the
> >attached patch and see if it solves your problem?
>
> Thank you for this patch; it does solve the problem. But there are still 5~10
> seconds of 0 bps in the test. I think this is because we first trim the end of
> the slice and only then start a new one, so there is some latency. Do you have
> any idea how to make the limit change take effect immediately, or at least with
> less latency (maybe 1 or 2 seconds)?
>
Ok, if you are concerned about those few seconds, can you please try the
following patch. I think starting a new slice is better done while processing
the limit change rather than in blk_throtl_bio().
Thanks
Vivek
---
block/blk-throttle.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
Index: linux-2.6/block/blk-throttle.c
===================================================================
--- linux-2.6.orig/block/blk-throttle.c 2011-03-04 13:59:45.000000000 -0500
+++ linux-2.6/block/blk-throttle.c 2011-03-07 10:54:30.639467538 -0500
@@ -757,6 +757,14 @@ static void throtl_process_limit_change(
 				" riops=%u wiops=%u", tg->bps[READ],
 				tg->bps[WRITE], tg->iops[READ],
 				tg->iops[WRITE]);
+			/*
+			 * Restart the slices for both READ and WRITES. It
+			 * might happen that a group's limit are dropped
+			 * suddenly and we don't want to account recently
+			 * dispatched IO with new low rate
+			 */
+			throtl_start_new_slice(td, tg, 0);
+			throtl_start_new_slice(td, tg, 1);
 			tg_update_disptime(td, tg);
 			tg->limits_changed = false;
 		}
@@ -1023,6 +1031,19 @@ int blk_throtl_bio(struct request_queue
 	/* Bio is with-in rate limit of group */
 	if (tg_may_dispatch(td, tg, bio, NULL)) {
 		throtl_charge_bio(tg, bio);
+
+		/*
+		 * We need to trim slice even when bios are not being queued
+		 * otherwise it might happen that a bio is not queued for
+		 * a long time and slice keeps on extending and trim is not
+		 * called for a long time. Now if limits are reduced suddenly
+		 * we take into account all the IO dispatched so far at new
+		 * low rate and newly queued IO gets a really long dispatch
+		 * time.
+		 *
+		 * So keep on trimming slice even if bio is not queued.
+		 */
+		throtl_trim_slice(td, tg, rw);
 		goto out;
 	}
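For comparison, here is a minimal sketch of what the two throtl_start_new_slice() calls in the first hunk amount to, again with invented names and seconds instead of jiffies: the per-direction accounting is simply reset, so the next bio is judged only against IO issued after the limit change took effect.

enum { MODEL_READ = 0, MODEL_WRITE = 1 };

struct slice_model {
	unsigned long long bytes_disp[2];	/* bytes charged per direction */
	unsigned long long slice_start[2];	/* seconds, for simplicity */
};

/* Forget everything accounted under the old limit for one direction. */
static void start_new_slice_model(struct slice_model *s, int rw,
				  unsigned long long now)
{
	s->bytes_disp[rw] = 0;
	s->slice_start[rw] = now;
}

/* On a limit change, both directions are reset, as in the hunk above. */
static void limit_changed_model(struct slice_model *s, unsigned long long now)
{
	start_new_slice_model(s, MODEL_READ, now);
	start_new_slice_model(s, MODEL_WRITE, now);
}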
* Re: blk-throttle.c : When limit is changed, must start a new slice
[not found] <tencent_5B2F59AF17F4410D6A9CFA34@qq.com>
@ 2011-03-08 14:51 ` Vivek Goyal
0 siblings, 0 replies; 4+ messages in thread
From: Vivek Goyal @ 2011-03-08 14:51 UTC (permalink / raw)
To: lina; +Cc: linux kernel mailing list
On Tue, Mar 08, 2011 at 10:33:47PM +0800, lina wrote:
> >On Mon, Mar 07, 2011 at 11:05:49PM +0800, lina wrote:
> >> >On Sat, Mar 05, 2011 at 12:30:55AM +0800, lina wrote:
> >> >> Hi Vivek,
> >> >> Please take a little time to read this letter; I think there seems to
> >> >> be a small bug in blk-throttle.c.
> >> >> It can happen that initially the cgroup limit was greater than the
> >> >> device's physical capability, but later the limit was set below it. In that
> >> >> case, all of the bios on the device are throttled for several minutes.
> >> >>
> >> >> I did some analysis, as follows:
> >> >> First, setting a very large bps on the device lets all of the bios go
> >> >> through. The slice begins when the test begins, and is only ever
> >> >> extended; a new one is never started.
> >> >> Second, changing the limit to below the physical capability makes a
> >> >> bio get queued. Once one bio is queued, blk_throtl_bio() calls
> >> >> tg_update_disptime() to estimate the delay for the throtl work.
> >> >> As the slice is very old, tg->bytes_disp[rw] holds a very large value,
> >> >> and tg->disptime ends up a long time after jiffies.
> >> >> During this time, all bios are queued, and the work queue cannot
> >> >> run, so tg->slice_start[rw] still cannot be reset.
> >> >>
> >> >> Although everything will be fine after several minutes, it still
> >> >> seems bad for users.
> >> >>
> >> >> I think a new slice should be started when the limit is changed.
> >> >> Here is my patch; please correct it if anything below is wrong.
> >> >
> >> >
> >> >CCing lkml. Let's keep all testing and bug reports regarding
> >> >blkio throttling on the mailing list.
> >> >
> >> >Thanks for the bug report, lina. I think this is a bug. I am not too keen
> >> >on restarting the slice every time the limit changes, as somebody could
> >> >exploit that to get higher bandwidth by changing limits frequently. Can you
> >> >try the attached patch and see if it solves your problem?
> >>
> >> Thank you for this patch; it does solve the problem. But there are still 5~10
> >> seconds of 0 bps in the test. I think this is because we first trim the end of
> >> the slice and only then start a new one, so there is some latency. Do you have
> >> any idea how to make the limit change take effect immediately, or at least
> >> with less latency (maybe 1 or 2 seconds)?
> >>
> >
> >Ok, if you are concerned about those few seconds, can you please try the
> >following patch. I think starting a new slice is better done while processing
> >the limit change rather than in blk_throtl_bio().
> >
>
> Yes. It's better to start a new slice in the limit-change processing function.
> I have tested starting a new slice in blk_throtl_bio(); if I change the limit
> very quickly, the bps goes out of my control.
I thought about changing limits too quickly. Is this a problem? In many
setups a non-privileged user will not be able to write to the limit files, so
he cannot exploit it. root should not do that anyway. To get higher
bandwidth, root can simply write a higher limit to the files; there is no
reason to reset the limit frequently.
So resetting the slice upon limit change is probably not a huge problem.
>
> Unfortunately, the following patch still has 5~10 seconds of latency. I have no
> idea how to resolve this problem; it seems hard to find a more suitable function
> from which to call throtl_start_new_slice().
So are you saying that the following patch did not solve the latency issue?
Resetting the slice upon limit change did not work for you?
Thanks
Vivek
> >
> >---
> > block/blk-throttle.c | 21 +++++++++++++++++++++
> > 1 file changed, 21 insertions(+)
> >
> >Index: linux-2.6/block/blk-throttle.c
> >===================================================================
> >--- linux-2.6.orig/block/blk-throttle.c 2011-03-04 13:59:45.000000000 -0500
> >+++ linux-2.6/block/blk-throttle.c 2011-03-07 10:54:30.639467538 -0500
> >@@ -757,6 +757,14 @@ static void throtl_process_limit_change(
> > 				" riops=%u wiops=%u", tg->bps[READ],
> > 				tg->bps[WRITE], tg->iops[READ],
> > 				tg->iops[WRITE]);
> >+			/*
> >+			 * Restart the slices for both READ and WRITES. It
> >+			 * might happen that a group's limit are dropped
> >+			 * suddenly and we don't want to account recently
> >+			 * dispatched IO with new low rate
> >+			 */
> >+			throtl_start_new_slice(td, tg, 0);
> >+			throtl_start_new_slice(td, tg, 1);
> > 			tg_update_disptime(td, tg);
> > 			tg->limits_changed = false;
> > 		}
> >@@ -1023,6 +1031,19 @@ int blk_throtl_bio(struct request_queue
> > 	/* Bio is with-in rate limit of group */
> > 	if (tg_may_dispatch(td, tg, bio, NULL)) {
> > 		throtl_charge_bio(tg, bio);
> >+
> >+		/*
> >+		 * We need to trim slice even when bios are not being queued
> >+		 * otherwise it might happen that a bio is not queued for
> >+		 * a long time and slice keeps on extending and trim is not
> >+		 * called for a long time. Now if limits are reduced suddenly
> >+		 * we take into account all the IO dispatched so far at new
> >+		 * low rate and newly queued IO gets a really long dispatch
> >+		 * time.
> >+		 *
> >+		 * So keep on trimming slice even if bio is not queued.
> >+		 */
> >+		throtl_trim_slice(td, tg, rw);
> > 		goto out;
> > 	}
>
> Thanks
> Lina
* Re: blk-throttle.c : When limit is changed, must start a new slice
[not found] <tencent_6A5F95FF2112DFE963C44E4E@qq.com>
@ 2011-03-08 20:54 ` Vivek Goyal
0 siblings, 0 replies; 4+ messages in thread
From: Vivek Goyal @ 2011-03-08 20:54 UTC (permalink / raw)
To: lina; +Cc: linux kernel mailing list
On Tue, Mar 08, 2011 at 11:03:59PM +0800, lina wrote:
[..]
> >> Unfortunately, the following patch still has 5~10 seconds of latency. I have
> >> no idea how to resolve this problem; it seems hard to find a more suitable
> >> function from which to call throtl_start_new_slice().
> >
> >So are you saying that the following patch did not solve the latency issue?
> >Resetting the slice upon limit change did not work for you?
> >
>
> Yes, the following patch did not solve the latency issue. There is still 5~10
> seconds of latency when I change the limit from a very high value to a low one.
> From blktrace, I see that throtl_process_limit_change() is only called after
> the work queue delay.
>
> Thanks
> Lina
Ok,
Can you try the attached patch. I think what was happening is that after
changing limits, the work was not being scheduled because there were no queued
bios, hence no slice reset was taking place immediately.
Also, I am not sure where these "</:includetail>" strings are coming from.
It looks like your mailer is inserting them. Try sending mails in plain text
format.
Thanks
Vivek
---
block/blk-throttle.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
Index: linux-2.6/block/blk-throttle.c
===================================================================
--- linux-2.6.orig/block/blk-throttle.c 2011-03-04 13:59:45.000000000 -0500
+++ linux-2.6/block/blk-throttle.c 2011-03-08 15:41:19.384654732 -0500
@@ -757,6 +757,14 @@ static void throtl_process_limit_change(
 				" riops=%u wiops=%u", tg->bps[READ],
 				tg->bps[WRITE], tg->iops[READ],
 				tg->iops[WRITE]);
+			/*
+			 * Restart the slices for both READ and WRITES. It
+			 * might happen that a group's limit are dropped
+			 * suddenly and we don't want to account recently
+			 * dispatched IO with new low rate
+			 */
+			throtl_start_new_slice(td, tg, 0);
+			throtl_start_new_slice(td, tg, 1);
 			tg_update_disptime(td, tg);
 			tg->limits_changed = false;
 		}
@@ -825,7 +833,8 @@ throtl_schedule_delayed_work(struct thro
 	struct delayed_work *dwork = &td->throtl_work;
-	if (total_nr_queued(td) > 0) {
+	/* schedule work if limits changed even if no bio is queued */
+	if (total_nr_queued(td) > 0 || atomic_read(&td->limits_changed)) {
 		/*
 		 * We might have a work scheduled to be executed in future.
 		 * Cancel that and schedule a new one.
@@ -1023,6 +1032,19 @@ int blk_throtl_bio(struct request_queue
 	/* Bio is with-in rate limit of group */
 	if (tg_may_dispatch(td, tg, bio, NULL)) {
 		throtl_charge_bio(tg, bio);
+
+		/*
+		 * We need to trim slice even when bios are not being queued
+		 * otherwise it might happen that a bio is not queued for
+		 * a long time and slice keeps on extending and trim is not
+		 * called for a long time. Now if limits are reduced suddenly
+		 * we take into account all the IO dispatched so far at new
+		 * low rate and newly queued IO gets a really long dispatch
+		 * time.
+		 *
+		 * So keep on trimming slice even if bio is not queued.
+		 */
+		throtl_trim_slice(td, tg, rw);
 		goto out;
 	}
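The decisive change in this version is the throtl_schedule_delayed_work() hunk. A condensed, illustrative sketch of that decision (simplified, with invented names; the real code also cancels and re-queues a pending delayed work item):

/*
 * Old behaviour: the throttle worker ran only when bios were queued, so a
 * pure limit change on an idle queue never reached
 * throtl_process_limit_change() and the slice reset had to wait until the
 * next bio happened to be queued.
 */
static int should_schedule_work_old(unsigned int nr_queued)
{
	return nr_queued > 0;
}

/* Patched behaviour: a pending limit change also wakes the worker. */
static int should_schedule_work_new(unsigned int nr_queued, int limits_changed)
{
	return nr_queued > 0 || limits_changed;
}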