* Re: blk-throttle.c : When limit is changed, must start a new slice
From: Vivek Goyal @ 2011-03-08 20:54 UTC
To: lina; +Cc: linux kernel mailing list

On Tue, Mar 08, 2011 at 11:03:59PM +0800, lina wrote:

[..]
> >> Unfortunately, the following patch still has 5~10 seconds latency. I have no
> >> idea how to resolve this problem; it seems hard to find a more suitable func to
> >> call throtl_start_new_slice().
> >
> >So are you saying that the following patch did not solve the latency issue?
> >Resetting the slice upon limit change did not work for you?
> ></:includetail>
> </:includetail>
> Yes, the following patch did not solve the latency issue. There is still 5~10</:includetail>
> seconds latency when I change the limit from a very high value to low. From</:includetail>
> blktrace, I find that throtl_process_limit_change() is called after the work</:includetail>
> queue</:includetail> delay.</:includetail>
> </:includetail>
> Thanks</:includetail>
> Lina</:includetail>
> </:includetail></:includetail>
> </:includetail>>Thanks

Ok,

Can you try the attached patch. I think what was happening is that after
changing limits, work was not being scheduled as there were no queued
bios, hence no slice reset was taking place immediately.

Also I am not sure where these "</:includetail>" strings are coming from.
It looks like your mailer is inserting those. Try sending mails in plain
text format.

Thanks
Vivek

---
 block/blk-throttle.c |   24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

Index: linux-2.6/block/blk-throttle.c
===================================================================
--- linux-2.6.orig/block/blk-throttle.c	2011-03-04 13:59:45.000000000 -0500
+++ linux-2.6/block/blk-throttle.c	2011-03-08 15:41:19.384654732 -0500
@@ -757,6 +757,14 @@ static void throtl_process_limit_change(
 				" riops=%u wiops=%u", tg->bps[READ],
 				tg->bps[WRITE], tg->iops[READ],
 				tg->iops[WRITE]);
+			/*
+			 * Restart the slices for both READ and WRITES. It
+			 * might happen that a group's limits are dropped
+			 * suddenly and we don't want to account recently
+			 * dispatched IO at the new low rate
+			 */
+			throtl_start_new_slice(td, tg, 0);
+			throtl_start_new_slice(td, tg, 1);
 			tg_update_disptime(td, tg);
 			tg->limits_changed = false;
 		}
@@ -825,7 +833,8 @@ throtl_schedule_delayed_work(struct thro
 
 	struct delayed_work *dwork = &td->throtl_work;
 
-	if (total_nr_queued(td) > 0) {
+	/* schedule work if limits changed even if no bio is queued */
+	if (total_nr_queued(td) > 0 || atomic_read(&td->limits_changed)) {
 		/*
 		 * We might have a work scheduled to be executed in future.
 		 * Cancel that and schedule a new one.
@@ -1023,6 +1032,19 @@ int blk_throtl_bio(struct request_queue
 	/* Bio is with-in rate limit of group */
 	if (tg_may_dispatch(td, tg, bio, NULL)) {
 		throtl_charge_bio(tg, bio);
+
+		/*
+		 * We need to trim the slice even when bios are not being
+		 * queued, otherwise it might happen that a bio is not queued
+		 * for a long time and the slice keeps on extending and trim
+		 * is not called for a long time. Now if limits are reduced
+		 * suddenly we take into account all the IO dispatched so far
+		 * at the new low rate and newly queued IO gets a really long
+		 * dispatch time.
+		 *
+		 * So keep on trimming the slice even if the bio is not queued.
+		 */
+		throtl_trim_slice(td, tg, rw);
 		goto out;
 	}
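
The "slice reset" this patch leans on is deliberately blunt: throtl_start_new_slice() forgets everything the group has dispatched in the current window. A rough sketch of what that helper does, paraphrased from memory of the blk-throttle.c of this era (the thread never quotes its body, so treat the details as an approximation):

	/* Sketch: reset per-rw accounting so old dispatch isn't re-billed */
	static inline void throtl_start_new_slice(struct throtl_data *td,
			struct throtl_grp *tg, bool rw)
	{
		tg->bytes_disp[rw] = 0;		/* forget IO already accounted */
		tg->io_disp[rw] = 0;
		tg->slice_start[rw] = jiffies;	/* new window starts now */
		tg->slice_end[rw] = jiffies + throtl_slice;
	}

Zeroing bytes_disp[rw] is the point: after a limit drop the group is charged only for IO issued from now on, instead of having megabytes of old dispatch re-billed at the new low rate.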
* Re: Re: blk-throttle.c : When limit is changed, must start a new slice
From: lulina_nuaa @ 2011-03-09 15:40 UTC
To: Vivek Goyal; +Cc: linux kernel mailing list

>On 2011-03-09 04:54:43, Vivek Goyal wrote:
>
>On Tue, Mar 08, 2011 at 11:03:59PM +0800, lina wrote:
>[..]
>
>Ok,
>
>Can you try the attached patch. I think what was happening is that after
>changing limits, work was not being scheduled as there were no queued
>bios, hence no slice reset was taking place immediately.
>
>[..]
>
>Thanks
>Vivek
>

I have removed the HTML code; I'm sorry for the mail format!

Thank you very much for the patch! I think it can solve the problem.
I'll test it as soon as possible, and will inform you once I get the
result!

Thanks
Lina

>[..]
* Re: Re: blk-throttle.c : When limit is changed, must start a new slice
From: Lina Lu @ 2011-03-10 16:38 UTC
To: Vivek Goyal; +Cc: linux kernel mailing list

On 2011-03-09 04:54:43, Vivek Goyal wrote:
>
>On Tue, Mar 08, 2011 at 11:03:59PM +0800, lina wrote:
>[..]
>
>Ok,
>
>Can you try the attached patch. I think what was happening is that after
>changing limits, work was not being scheduled as there were no queued
>bios, hence no slice reset was taking place immediately.
>
>[..]
>
>Thanks
>Vivek
>

Hi Vivek,
I have tested the patch, but the latency is still there.

Today I tried to find out why there is a 5~10 second latency. After
collecting the blktrace, I think the reason is that throtl_trim_slice()
doesn't always update tg->slice_start[rw], although we call it each
time a bio is dispatched.

Suppose the limits change right now from 102400000000 to 1024000, and
tg->slice_start[rw] and tg->slice_end[rw] are as in the following
chart. There are two throtl_slice periods in the chart. Here my HZ is
250, so throtl_slice is 25.

                jiffies
                   |
|------------------|------------------|
|                                     |
start                               end

As jiffies - start < 25 (throtl_slice), throtl_trim_slice() will not
update tg->slice_start[rw] and tg->bytes_disp[rw]. If tg->bytes_disp[rw]
is now 8M, then there will be about 7 seconds of 0 bps from the current
jiffies, as I have set the limit to 1M/s; during these seconds no bio
can be dispatched.

As tg->slice_start[rw] must be less than or equal to jiffies, and we
cannot tell why tg->bytes_disp[rw] is greater than the theoretical
value for a 1M/s limit, we cannot simply set tg->slice_start[rw] to
jiffies here. If we set the start to jiffies, throttling will not work.

I think if we can start a new slice in the next throtl_slice period
when the limits change from high to low and tg->bytes_disp[rw] is
significantly greater than the theoretical value for the current
limits, this problem can be solved.

Thanks
Lina

>[..]
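
For reference, the early return Lina describes sits at the top of the trim logic. An abbreviated sketch of throtl_trim_slice(), reconstructed from memory of the 2.6.38-era blk-throttle.c (it is not quoted verbatim in this thread, so the surrounding details are approximate):

	static inline void throtl_trim_slice(struct throtl_data *td,
			struct throtl_grp *tg, bool rw)
	{
		unsigned long nr_slices, time_elapsed;
		u64 bytes_trim;

		time_elapsed = jiffies - tg->slice_start[rw];
		nr_slices = time_elapsed / throtl_slice;

		/*
		 * Less than one full throtl_slice has elapsed: nothing is
		 * trimmed, so slice_start[rw] and bytes_disp[rw] keep their
		 * old values. This is exactly the case in Lina's chart.
		 */
		if (!nr_slices)
			return;

		/* Otherwise forgive nr_slices worth of dispatched bytes ... */
		bytes_trim = tg->bps[rw] * throtl_slice * nr_slices;
		do_div(bytes_trim, HZ);

		if (tg->bytes_disp[rw] >= bytes_trim)
			tg->bytes_disp[rw] -= bytes_trim;
		else
			tg->bytes_disp[rw] = 0;

		/* ... and slide the window start forward accordingly. */
		tg->slice_start[rw] += nr_slices * throtl_slice;
	}

With bytes_disp[rw] already at 8M and a new 1M/s limit, tg_may_dispatch() will not pass another bio until roughly 8M worth of budget has accrued at 1M/s, which matches the ~7 second stall Lina estimates (8 seconds minus the budget already earned in the current slice).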
* Re: Re: blk-throttle.c : When limit is changed, must start a new slice
From: Vivek Goyal @ 2011-03-10 19:55 UTC
To: Lina Lu; +Cc: linux kernel mailing list

On Fri, Mar 11, 2011 at 12:38:18AM +0800, Lina Lu wrote:
> On 2011-03-09 04:54:43, Vivek Goyal wrote:
> [..]
>
> Hi Vivek,
> I have tested the patch, but the latency is still there.
>
> Today I tried to find out why there is a 5~10 second latency. After
> collecting the blktrace, I think the reason is that throtl_trim_slice()
> doesn't always update tg->slice_start[rw], although we call it each
> time a bio is dispatched.

Lina,

Trim slice should not even matter now. Upon limit change, this patch
should reset the slice and start a new one, irrespective of where we
are.

In your traces, do you see the limit change message, and do you see a
new slice starting?

I did a similar test yesterday on my box and this patch worked. Can you
capture some block traces so I can have a look at them? The key thing
to look for is the limit change message and whether it started a new
slice or not.

Thanks
Vivek

> Suppose the limits change right now from 102400000000 to 1024000, and
> tg->slice_start[rw] and tg->slice_end[rw] are as in the following
> chart. There are two throtl_slice periods in the chart. Here my HZ is
> 250, so throtl_slice is 25.
>
>                 jiffies
>                    |
> |------------------|------------------|
> |                                     |
> start                               end
>
> As jiffies - start < 25 (throtl_slice), throtl_trim_slice() will not
> update tg->slice_start[rw] and tg->bytes_disp[rw]. If tg->bytes_disp[rw]
> is now 8M, then there will be about 7 seconds of 0 bps from the current
> jiffies, as I have set the limit to 1M/s; during these seconds no bio
> can be dispatched.
>
> As tg->slice_start[rw] must be less than or equal to jiffies, and we
> cannot tell why tg->bytes_disp[rw] is greater than the theoretical
> value for a 1M/s limit, we cannot simply set tg->slice_start[rw] to
> jiffies here. If we set the start to jiffies, throttling will not work.
>
> I think if we can start a new slice in the next throtl_slice period
> when the limits change from high to low and tg->bytes_disp[rw] is
> significantly greater than the theoretical value for the current
> limits, this problem can be solved.
>
> Thanks
> Lina
>
> [..]
* Re: Re: Re: blk-throttle.c : When limit is changed, must start a new slice
From: Lina Lu @ 2011-03-12 11:33 UTC
To: Vivek Goyal; +Cc: linux kernel mailing list

On 2011-03-11 03:55:55, Vivek Goyal wrote:
>On Fri, Mar 11, 2011 at 12:38:18AM +0800, Lina Lu wrote:
>[..]
>
>Lina,
>
>Trim slice should not even matter now. Upon limit change, this patch
>should reset the slice and start a new one, irrespective of where we
>are.
>
>In your traces, do you see the limit change message, and do you see a
>new slice starting?
>
>I did a similar test yesterday on my box and this patch worked. Can you
>capture some block traces so I can have a look at them? The key thing
>to look for is the limit change message and whether it started a new
>slice or not.
>
>Thanks
>Vivek
>

Hi Vivek,

Here are the blktrace and iostat results when I change the limit from
1024000000000000 to 1024000. When the limit changed, there was about 3
seconds of latency.

blktrace:
253,1 0 0 4.177733270 0 m N throtl / [R] trim slice nr=1 bytes=102400000000000 io=429496729 start=4297788991 end=4297789100 jiffies=4297788992
253,1 0 0 4.187393582 0 m N throtl / [R] extend slice start=4297788991 end=4297789200 jiffies=4297789002
253,1 0 0 4.276120505 0 m N throtl / [R] trim slice nr=1 bytes=102400000000000 io=429496729 start=4297789091 end=4297789200 jiffies=4297789091
253,1 0 0 4.285934091 0 m N throtl / [R] extend slice start=4297789091 end=4297789300 jiffies=4297789101
253,1 1 0 4.348552814 0 m N throtl schedule work. delay=0 jiffies=4297789163
253,1 1 0 4.348571560 0 m N throtl limit changed =1
253,1 0 0 4.349839104 0 m N throtl / [R] extend slice start=4297789091 end=4297793000 jiffies=4297789164
253,1 0 0 4.349844118 0 m N throtl / [R] bio. bdisp=3928064 sz=4096 bps=1024000 iodisp=959 iops=4294967295 queued=0/0
253,1 0 0 4.349850121 0 m N throtl schedule work. delay=3767 jiffies=4297789164
253,1 0 0 4.349912607 0 m N throtl / [R] bio. bdisp=3928064 sz=4096 bps=1024000 iodisp=959 iops=4294967295 queued=1/0
253,1 0 0 4.349915880 0 m N throtl schedule work. delay=3766 jiffies=4297789165
253,1 0 0 4.349921567 0 m N throtl / [R] bio. bdisp=3928064 sz=4096 bps=1024000 iodisp=959 iops=4294967295 queued=2/0
...  #queued 63 read bios with no new slice.
253,1 0 0 4.353728869 0 m N throtl / [R] bio. bdisp=3928064 sz=4096 bps=1024000 iodisp=959 iops=4294967295 queued=61/0
253,1 0 0 4.353731799 0 m N throtl / [R] bio. bdisp=3928064 sz=4096 bps=1024000 iodisp=959 iops=4294967295 queued=62/0
253,1 0 0 4.353735427 0 m N throtl / [R] bio. bdisp=3928064 sz=4096 bps=1024000 iodisp=959 iops=4294967295 queued=63/0
253,1 0 0 8.129092326 0 m N throtl dispatch nr_queued=64 read=64 write=0
253,1 0 0 8.129096924 0 m N throtl / [R] extend slice start=4297789091 end=4297793100 jiffies=4297792944
253,1 0 0 8.129100584 0 m N throtl / [R] trim slice nr=38 bytes=3891200 io=16320875721 start=4297792891 end=4297793100 jiffies=4297792944
253,1 0 0 8.129108331 0 m N throtl bios disp=16
253,1 0 0 8.129111864 0 m N throtl schedule work. delay=51 jiffies=4297792944
253,1 0 0 8.180899035 0 m N throtl dispatch nr_queued=48 read=48 write=0
253,1 0 0 8.180905222 0 m N throtl / [R] trim slice nr=1 bytes=102400 io=429496729 start=4297792991 end=4297793100 jiffies=4297792996
253,1 0 0 8.180915206 0 m N throtl bios disp=25
253,1 0 0 8.180919011 0 m N throtl schedule work. delay=99 jiffies=4297792996
253,1 0 0 8.182058927 0 m N throtl / [R] bio. bdisp=102400 sz=4096 bps=1024000 iodisp=24 iops=4294967295 queued=23/0

iostat:
Device:  rrqm/s  wrqm/s      r/s   w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
dm-1       0.00    0.00 12992.00  0.00  50.75   0.00     8.00    59.47   4.57   0.08  99.70
dm-1       0.00    0.00 12598.00  0.00  49.21   0.00     8.00    59.03   4.70   0.08  99.70
dm-1       0.00    0.00 12923.00  0.00  50.48   0.00     8.00    54.43   4.22   0.08  98.50
dm-1       0.00    0.00 13103.00  0.00  51.18   0.00     8.00    57.54   4.38   0.08  99.70
dm-1       0.00    0.00 13024.00  0.00  50.88   0.00     8.00    58.67   4.51   0.08  99.70
dm-1       0.00    0.00 12928.00  0.00  50.50   0.00     8.00    58.50   4.53   0.08  99.60
dm-1       0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
dm-1       0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
dm-1       0.00    0.00     0.00  0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
dm-1       0.00    0.00    66.00  0.00   0.26   0.00     8.00     0.05   0.76   0.03   0.20
dm-1       0.00    0.00   250.00  0.00   0.98   0.00     8.00     0.24   0.98   0.04   1.00

From the trace we can see a delay of 3766 (3766/HZ ~ 3.7 seconds); the
greater the delay value, the longer the latency. Sometimes the delay is
low, so the latency does not appear every time I change the limit from
high to low. The latency also seems related to the device's physical
capacity. Here my device has only 50MB/s of physical capacity, so there
is about 3~5 seconds of latency. If the device had 100MB/s of physical
capacity, the latency would be 5~10 seconds.

There is no new-slice trace because throtl_process_limit_change() is
not being called. throtl_process_limit_change() is called only from
throtl_dispatch(), and throtl_dispatch() is called only from
blk_throtl_work(). When the limit changes from high to low, there is no
queued work, so blk_throtl_work() is never called.

When the limit changes from low to high, I do find the new-slice trace,
like the following. So a new slice is only started when there is queued
work.

253,1 0 0 60.250888001 0 m N throtl / [R] bio. bdisp=102400 sz=4096 bps=1024000 iodisp=24 iops=4294967295 queued=49/0
253,1 0 0 60.250890858 0 m N throtl / [R] bio. bdisp=102400 sz=4096 bps=1024000 iodisp=24 iops=4294967295 queued=50/0
253,1 0 0 60.349455559 0 m N throtl dispatch nr_queued=51 read=51 write=0
253,1 0 0 60.349460882 0 m N throtl / [R] extend slice start=4297998658 end=4297998900 jiffies=4297998762
253,1 0 0 60.349464810 0 m N throtl / [R] trim slice nr=1 bytes=102400 io=429496729 start=4297998758 end=4297998900 jiffies=4297998762
253,1 0 0 60.349473330 0 m N throtl bios disp=25
253,1 0 0 60.349476631 0 m N throtl schedule work. delay=100 jiffies=4297998762
253,1 1 0 60.375043834 0 m N throtl schedule work. delay=0 jiffies=4297998787
253,1 1 0 60.375062998 0 m N throtl limit changed =1
253,1 1 0 60.375066704 0 m N throtl / limit change rbps=1024000000000000 wbps=18446744073709551615 riops=4294967295 wiops=4294967295
253,1 1 0 60.375069747 0 m N throtl / [R] new slice start=4297998787 end=4297998887 jiffies=4297998787
253,1 1 0 60.375070919 0 m N throtl / [W] new slice start=4297998787 end=4297998887 jiffies=4297998787
253,1 1 0 60.375073946 0 m N throtl dispatch nr_queued=26 read=26 write=0
253,1 1 0 60.375083440 0 m N throtl bios disp=26
253,1 1 0 60.430614460 0 m N throtl / [R] extend slice start=4297998787 end=4297999000 jiffies=4297998843
253,1 1 0 60.476022578 0 m N throtl / [R] trim slice nr=1 bytes=102400000000000 io=429496729 start=4297998887 end=4297999000 jiffies=4297998888

Thanks
Lina

>[..]
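
As a cross-check, the delay=3766 in the trace above is exactly what the bps wait-time arithmetic predicts from the numbers on the bio line (bdisp=3928064, sz=4096, bps=1024000, slice start 4297789091, jiffies 4297789165 — the timestamps imply HZ=1000 on this box). A standalone back-of-envelope version of the tg_with_in_bps_limit() calculation (variable names are ours, not the kernel's):

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		const uint64_t HZ = 1000;             /* inferred from the trace timestamps */
		uint64_t slice_start = 4297789091ULL; /* [R] slice start in the trace */
		uint64_t now = 4297789165ULL;         /* jiffies when the bio arrived */
		uint64_t bps = 1024000;               /* new, lower limit */
		uint64_t bytes_disp = 3928064;        /* bdisp= in the trace */
		uint64_t bio_size = 4096;             /* sz= in the trace */

		/* budget earned so far in this slice at the new rate */
		uint64_t bytes_allowed = bps * (now - slice_start) / HZ;
		/* bytes already dispatched beyond that budget */
		uint64_t extra = bytes_disp + bio_size - bytes_allowed;
		/* jiffies until the excess is paid off at bps */
		uint64_t wait = extra * HZ / bps;

		printf("allowed=%llu extra=%llu wait=%llu jiffies (~%.1f s)\n",
		       (unsigned long long)bytes_allowed,
		       (unsigned long long)extra,
		       (unsigned long long)wait, (double)wait / HZ);
		return 0;
	}

This prints wait=3766: the stall iostat shows is just the ~3.9MB dispatched under the old, effectively unlimited slice being re-billed at the new 1MB/s rate, which is why resetting (rather than merely trimming) the slice on limit change removes it.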
* Re: Re: Re: blk-throttle.c : When limit is changed, must start a new slice
From: Vivek Goyal @ 2011-03-14 15:17 UTC
To: Lina Lu; +Cc: linux kernel mailing list

On Sat, Mar 12, 2011 at 07:33:07PM +0800, Lina Lu wrote:
> On 2011-03-11 03:55:55, Vivek Goyal wrote:
> [..]
>
> Hi Vivek,
>
> Here are the blktrace and iostat results when I change the limit from
> 1024000000000000 to 1024000. When the limit changed, there was about 3
> seconds of latency.
>
> blktrace:
> 253,1 0 0 4.177733270 0 m N throtl / [R] trim slice nr=1 bytes=102400000000000 io=429496729 start=4297788991 end=4297789100 jiffies=4297788992
> 253,1 0 0 4.187393582 0 m N throtl / [R] extend slice start=4297788991 end=4297789200 jiffies=4297789002
> 253,1 0 0 4.276120505 0 m N throtl / [R] trim slice nr=1 bytes=102400000000000 io=429496729 start=4297789091 end=4297789200 jiffies=4297789091
> 253,1 0 0 4.285934091 0 m N throtl / [R] extend slice start=4297789091 end=4297789300 jiffies=4297789101
> 253,1 1 0 4.348552814 0 m N throtl schedule work. delay=0 jiffies=4297789163
> 253,1 1 0 4.348571560 0 m N throtl limit changed =1
> 253,1 0 0 4.349839104 0 m N throtl / [R] extend slice start=4297789091 end=4297793000 jiffies=4297789164
> 253,1 0 0 4.349844118 0 m N throtl / [R] bio. bdisp=3928064 sz=4096 bps=1024000 iodisp=959 iops=4294967295 queued=0/0
> [..]

Lina,

Thanks for the traces.

I think we did call process_limit_change() but we did not start the new
slice. I guess this happened because we start a new slice only if the
group is on the run tree. Before the limit update, the group is most
likely not on the run tree, as the limits are very high, hence we
missed resetting the slice.

	hlist_for_each_entry_safe(tg, pos, n, &td->tg_list, tg_node) {
		if (throtl_tg_on_rr(tg) && tg->limits_changed) {
			throtl_log_tg(td, tg, "limit change rbps=%llu wbps=%llu"
				" riops=%u wiops=%u", tg->bps[READ],
				tg->bps[WRITE], tg->iops[READ],
				tg->iops[WRITE]);

Actually, many races have been fixed in Jens's block tree. Is it
possible to test the origin/for-2.6.39/core branch of Jens's tree with
the following patch applied and see if it fixes the issue for you?

Thanks
Vivek

---
 block/blk-throttle.c |   25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

Index: linux-2.6-block/block/blk-throttle.c
===================================================================
--- linux-2.6-block.orig/block/blk-throttle.c	2011-03-14 10:27:57.000000000 -0400
+++ linux-2.6-block/block/blk-throttle.c	2011-03-14 10:30:47.267170956 -0400
@@ -756,6 +756,15 @@ static void throtl_process_limit_change(
 			" riops=%u wiops=%u", tg->bps[READ], tg->bps[WRITE],
 			tg->iops[READ], tg->iops[WRITE]);
 
+		/*
+		 * Restart the slices for both READ and WRITES. It
+		 * might happen that a group's limits are dropped
+		 * suddenly and we don't want to account recently
+		 * dispatched IO at the new low rate
+		 */
+		throtl_start_new_slice(td, tg, 0);
+		throtl_start_new_slice(td, tg, 1);
+
 		if (throtl_tg_on_rr(tg))
 			tg_update_disptime(td, tg);
 	}
@@ -821,7 +830,8 @@ throtl_schedule_delayed_work(struct thro
 
 	struct delayed_work *dwork = &td->throtl_work;
 
-	if (total_nr_queued(td) > 0) {
+	/* schedule work if limits changed even if no bio is queued */
+	if (total_nr_queued(td) > 0 || td->limits_changed) {
 		/*
 		 * We might have a work scheduled to be executed in future.
 		 * Cancel that and schedule a new one.
@@ -1002,6 +1012,19 @@ int blk_throtl_bio(struct request_queue
 	/* Bio is with-in rate limit of group */
 	if (tg_may_dispatch(td, tg, bio, NULL)) {
 		throtl_charge_bio(tg, bio);
+
+		/*
+		 * We need to trim the slice even when bios are not being
+		 * queued, otherwise it might happen that a bio is not queued
+		 * for a long time and the slice keeps on extending and trim
+		 * is not called for a long time. Now if limits are reduced
+		 * suddenly we take into account all the IO dispatched so far
+		 * at the new low rate and newly queued IO gets a really long
+		 * dispatch time.
+		 *
+		 * So keep on trimming the slice even if the bio is not queued.
+		 */
+		throtl_trim_slice(td, tg, rw);
 		goto out;
 	}
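
The operative difference from the first patch is where the slice restart sits relative to the run-tree check. In the first version the restart lived inside the if (throtl_tg_on_rr(tg) && tg->limits_changed) block, so a group with nothing queued was never reset; here it runs for every group whose limits changed, and only the dispatch-time update stays behind the guard. Condensed from the two patches in this thread:

	/*
	 * First patch: restart guarded by the run-tree check. An idle
	 * (effectively unthrottled) group is never on the tree, so its
	 * slice was never reset.
	 */
	if (throtl_tg_on_rr(tg) && tg->limits_changed) {
		throtl_start_new_slice(td, tg, 0);
		throtl_start_new_slice(td, tg, 1);
		tg_update_disptime(td, tg);
		tg->limits_changed = false;
	}

	/*
	 * Second patch: restart unconditionally; only the dispatch-time
	 * update still requires the group to be on the service tree.
	 */
	throtl_start_new_slice(td, tg, 0);
	throtl_start_new_slice(td, tg, 1);
	if (throtl_tg_on_rr(tg))
		tg_update_disptime(td, tg);

(The other visible change, atomic_read(&td->limits_changed) becoming a plain td->limits_changed read, reflects the limits_changed rework already in Jens's tree.)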
* Re: Re: Re: blk-throttle.c : When limit is changed, must start a new slice
From: Lina Lu @ 2011-03-14 15:52 UTC
To: Vivek Goyal; +Cc: linux kernel mailing list

On 2011-03-14 23:18:31, Vivek Goyal wrote:
>On Sat, Mar 12, 2011 at 07:33:07PM +0800, Lina Lu wrote:
>[..]
>
>Lina,
>
>Thanks for the traces.
>
>I think we did call process_limit_change() but we did not start the new
>slice. I guess this happened because we start a new slice only if the
>group is on the run tree. Before the limit update, the group is most
>likely not on the run tree, as the limits are very high, hence we
>missed resetting the slice.
>
>	hlist_for_each_entry_safe(tg, pos, n, &td->tg_list, tg_node) {
>		if (throtl_tg_on_rr(tg) && tg->limits_changed) {
>			throtl_log_tg(td, tg, "limit change rbps=%llu wbps=%llu"
>				" riops=%u wiops=%u", tg->bps[READ],
>				tg->bps[WRITE], tg->iops[READ],
>				tg->iops[WRITE]);
>

Do you mean that the throtl_tg_on_rr() function returns 0 when the
limits are very high?

>Actually, many races have been fixed in Jens's block tree. Is it
>possible to test the origin/for-2.6.39/core branch of Jens's tree with
>the following patch applied and see if it fixes the issue for you?

I only find a 2.6.38 core branch in gitweb. Do you mean the
origin/for-2.6.38/core branch? I'll test it as soon as possible and
let you know the result.

>Thanks
>Vivek
>
>[..]
* Re: Re: Re: blk-throttle.c : When limit is changed, must start a new slice
From: Vivek Goyal @ 2011-03-14 15:51 UTC
To: Lina Lu; +Cc: linux kernel mailing list

On Mon, Mar 14, 2011 at 11:52:36PM +0800, Lina Lu wrote:
> On 2011-03-14 23:18:31, Vivek Goyal wrote:
> [..]
>
> Do you mean that the throtl_tg_on_rr() function returns 0 when the
> limits are very high?

Yes. When limits are very high, you will never enqueue a bio, hence a
group will never be enqueued, hence throtl_tg_on_rr() = 0.

> >Actually, many races have been fixed in Jens's block tree. Is it
> >possible to test the origin/for-2.6.39/core branch of Jens's tree with
> >the following patch applied and see if it fixes the issue for you?
>
> I only find a 2.6.38 core branch in gitweb. Do you mean the
> origin/for-2.6.38/core branch? I'll test it as soon as possible and
> let you know the result.

Here is Jens's block tree. It is separate from Linus's tree.

http://git.kernel.org/?p=linux/kernel/git/axboe/linux-2.6-block.git;a=summary

Thanks
Vivek

> [..]
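
In other words, "on the run tree" means the group currently has queued bios and therefore a node on the throttle service tree. The thread never quotes the helper's body, but it amounts to roughly this (a sketch from memory of the 2.6.38-era code; treat it as illustrative):

	/*
	 * Sketch: a group sits on the round-robin service tree only while
	 * it has queued bios. With very high limits every bio passes
	 * tg_may_dispatch(), nothing is ever queued, and this stays 0 --
	 * which is why the old loop skipped the slice reset.
	 */
	static inline int throtl_tg_on_rr(struct throtl_grp *tg)
	{
		return !RB_EMPTY_NODE(&tg->rb_node);
	}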
* Re: Re: Re: blk-throttle.c : When limit is changed, must start a new slice
From: Lina Lu @ 2011-03-15 15:00 UTC
To: Vivek Goyal; +Cc: linux kernel mailing list

On 2011-03-14 23:52:31, Vivek Goyal wrote:
>On Mon, Mar 14, 2011 at 11:52:36PM +0800, Lina Lu wrote:
>[..]
>> Do you mean that the throtl_tg_on_rr() function returns 0 when the
>> limits are very high?
>
>Yes. When limits are very high, you will never enqueue a bio, hence a
>group will never be enqueued, hence throtl_tg_on_rr() = 0.
>
>[..]
>
>Here is Jens's block tree. It is separate from Linus's tree.
>
>http://git.kernel.org/?p=linux/kernel/git/axboe/linux-2.6-block.git;a=summary
>
>Thanks
>Vivek
>

Hi Vivek,
I have tested the patch on the for-2.6.39/core branch of Jens's tree,
and the bug has been fixed.

Can you tell me which patch makes throtl_tg_on_rr() return 1 even when
the limits are very high?

Thanks
Lina

>[..]
* Re: Re: Re: blk-throttle.c : When limit is changed, must start a new slice
From: Vivek Goyal @ 2011-03-15 15:04 UTC
To: Lina Lu; +Cc: linux kernel mailing list

On Tue, Mar 15, 2011 at 11:00:25PM +0800, Lina Lu wrote:

[..]

> Hi Vivek,
> I have tested the patch on the for-2.6.39/core branch of Jens's tree,
> and the bug has been fixed.

Thanks Lina. I will clean up this patch and post it for inclusion.

> Can you tell me which patch makes throtl_tg_on_rr() return 1 even when
> the limits are very high?

The following patch fixes the issue in Jens's tree.

commit de701c74a34005e637e1ca2634fbf28fd1debba2
Author: Vivek Goyal <vgoyal@redhat.com>
Date:   Mon Mar 7 21:09:32 2011 +0100

    blk-throttle: Some cleanups and race fixes in limit update code

Thanks
Vivek