* [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband
@ 2009-04-24 21:47 Alan D. Brunelle
  2009-04-27  9:44 ` Ryo Tsuruta
  0 siblings, 1 reply; 11+ messages in thread

From: Alan D. Brunelle @ 2009-04-24 21:47 UTC (permalink / raw)
  To: ryov; +Cc: dm-devel, linux-kernel@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 5872 bytes --]

Hi Ryo -

I don't know if you are taking in patches, but whilst trying to uncover
some odd behavior I added some blktrace messages to dm-ioband-ctl.c. If
you're keeping one code base for old stuff (2.6.18-ish RHEL stuff) and
upstream, you'll have to #if around these (the blktrace message support
came in around 2.6.26 or 27, I think).

My test case was to take a single 400GB storage device, put two 200GB
partitions on it, and then see what the "penalty" or overhead is for
adding dm-ioband on top. To do this I simply created an ext2 FS on each
partition in parallel (two processes, each doing a mkfs to one of the
partitions). Then I put two dm-ioband devices on top of the two
partitions (setting the weight to 100 in both cases - thus they should
have equal access).

Using default values I was seeing /very/ large differences - on the
order of 3X. When I bumped the number of tokens to a large number
(10,240) the timings got much closer (<2%). I have found that using
weight-iosize performs worse than weight (closer to a 5% penalty).

I'll try to formalize these results as I go forward and report out on
them. In any event, I thought I'd share this patch with you if you are
interested...

Here's a sampling from some blktrace output (sorry for the wrapping) - I
should note that I'm a bit scared to see such large numbers of holds
going on when the token count should be >5,000 for each device...
Holding these back in an equal-access situation inhibits the block I/O
layer from merging (most) of them (as mkfs performs lots & lots of
small but sequential I/Os).

...
8,80 16 0 0.090651446 0 m N ioband 1 hold_nrm 1654 8,80 16 0 0.090653575 0 m N ioband 1 hold_nrm 1655 8,80 16 0 0.090655694 0 m N ioband 1 hold_nrm 1656 8,80 16 0 0.090657609 0 m N ioband 1 hold_nrm 1657 8,80 16 0 0.090659554 0 m N ioband 1 hold_nrm 1658 8,80 16 0 0.090661327 0 m N ioband 1 hold_nrm 1659 8,80 16 0 0.090666237 0 m N ioband 1 hold_nrm 1660 8,80 16 53036 0.090675081 4713 C W 391420657 + 1024 [0] 8,80 16 53037 0.090913365 4713 D W 392995569 + 1024 [mkfs.ext2] 8,80 16 0 0.090950380 0 m N ioband 1 add_iss 1659 1659 8,80 16 0 0.090951296 0 m N ioband 1 add_iss 1658 1658 8,80 16 0 0.090951870 0 m N ioband 1 add_iss 1657 1657 8,80 16 0 0.090952416 0 m N ioband 1 add_iss 1656 1656 8,80 16 0 0.090952965 0 m N ioband 1 add_iss 1655 1655 8,80 16 0 0.090953517 0 m N ioband 1 add_iss 1654 1654 8,80 16 0 0.090954064 0 m N ioband 1 add_iss 1653 1653 8,80 16 0 0.090954610 0 m N ioband 1 add_iss 1652 1652 8,80 16 0 0.090955280 0 m N ioband 1 add_iss 1651 1651 8,80 16 0 0.090956495 0 m N ioband 1 pop_iss 8,80 16 53038 0.090957387 4659 A WS 396655745 + 8 <- (8,82) 6030744 8,80 16 53039 0.090957561 4659 Q WS 396655745 + 8 [kioband/16] 8,80 16 53040 0.090958328 4659 M WS 396655745 + 8 [kioband/16] 8,80 16 0 0.090959595 0 m N ioband 1 pop_iss 8,80 16 53041 0.090959754 4659 A WS 396655753 + 8 <- (8,82) 6030752 8,80 16 53042 0.090960007 4659 Q WS 396655753 + 8 [kioband/16] 8,80 16 53043 0.090960402 4659 M WS 396655753 + 8 [kioband/16] 8,80 16 0 0.090960962 0 m N ioband 1 pop_iss 8,80 16 53044 0.090961104 4659 A WS 396655761 + 8 <- (8,82) 6030760 8,80 16 53045 0.090961231 4659 Q WS 396655761 + 8 [kioband/16] 8,80 16 53046 0.090961496 4659 M WS 396655761 + 8 [kioband/16] 8,80 16 0 0.090961995 0 m N ioband 1 pop_iss 8,80 16 53047 0.090962117 4659 A WS 396655769 + 8 <- (8,82) 6030768 8,80 16 53048 0.090962222 4659 Q WS 396655769 + 8 [kioband/16] 8,80 16 53049 0.090962530 4659 M WS 396655769 + 8 [kioband/16] 8,80 16 0 0.090962974 0 m N ioband 1 pop_iss 8,80 16 53050 0.090963095 4659 A WS 396655777 + 8 <- (8,82) 6030776 8,80 16 53051 0.090963334 4659 Q WS 396655777 + 8 [kioband/16] 8,80 16 53052 0.090963518 4659 M WS 396655777 + 8 [kioband/16] 8,80 16 0 0.090963985 0 m N ioband 1 pop_iss 8,80 16 53053 0.090964220 4659 A WS 396655785 + 8 <- (8,82) 6030784 8,80 16 53054 0.090964327 4659 Q WS 396655785 + 8 [kioband/16] 8,80 16 53055 0.090964632 4659 M WS 396655785 + 8 [kioband/16] 8,80 16 0 0.090965094 0 m N ioband 1 pop_iss 8,80 16 53056 0.090965218 4659 A WS 396655793 + 8 <- (8,82) 6030792 8,80 16 53057 0.090965324 4659 Q WS 396655793 + 8 [kioband/16] 8,80 16 53058 0.090965548 4659 M WS 396655793 + 8 [kioband/16] 8,80 16 0 0.090965991 0 m N ioband 1 pop_iss 8,80 16 53059 0.090966112 4659 A WS 396655801 + 8 <- (8,82) 6030800 8,80 16 53060 0.090966221 4659 Q WS 396655801 + 8 [kioband/16] 8,80 16 53061 0.090966526 4659 M WS 396655801 + 8 [kioband/16] 8,80 16 0 0.090966944 0 m N ioband 1 pop_iss 8,80 16 53062 0.090967065 4659 A WS 396655809 + 8 <- (8,82) 6030808 8,80 16 53063 0.090967173 4659 Q WS 396655809 + 8 [kioband/16] 8,80 16 53064 0.090967383 4659 M WS 396655809 + 8 [kioband/16] 8,80 16 0 0.090968394 0 m N ioband 1 add_iss 1650 1650 8,80 16 0 0.090969068 0 m N ioband 1 add_iss 1649 1649 8,80 16 0 0.090969684 0 m N ioband 1 add_iss 1648 1648 ... Regards, Alan D. Brunelle Hewlett-Packard [-- Attachment #2: 0001-Added-in-blktrace-msgs-for-dm-ioband.patch --] [-- Type: text/x-diff, Size: 3655 bytes --] >From bd918c40d92e4f074763a88bc8f13593c4f2dc52 Mon Sep 17 00:00:00 2001 From: Alan D. 
Brunelle <alan.brunelle@hp.com> Date: Fri, 24 Apr 2009 17:30:32 -0400 Subject: [PATCH] Added in blktrace msgs for dm-ioband Added the following messages: In hold_bio - added messages as they are being added to either the urgent or normal hold lists. ioband <g_name> hold_urg <g_blocked> ioband <g_name> hold_nrm <g_blocked> In make_issue_list - added messages when placing previously held bios onto either the pushback or issue lists: ioband <g_name> add_pback <g_blocked> <c_blocked> ioband <g_name> add_iss <g_blocked> <c_blocked> In release_urgent bios - added a message indicating that an urgent bio was added to the issue list: ioband <g_name> urg_add_iss> <g_blocked> In ioband_conduct - added messages as bios are being popped and executed (sent to generic_make_request) or pushed back (bio_endio): ioband <g_name> pop_iss ioband <g_name> pop_pback Signed-off-by: Alan D. Brunelle <alan.brunelle@hp.com> --- drivers/md/dm-ioband-ctl.c | 30 +++++++++++++++++++++++++++--- 1 files changed, 27 insertions(+), 3 deletions(-) diff --git a/drivers/md/dm-ioband-ctl.c b/drivers/md/dm-ioband-ctl.c index 29bef11..26ad7a5 100644 --- a/drivers/md/dm-ioband-ctl.c +++ b/drivers/md/dm-ioband-ctl.c @@ -13,6 +13,7 @@ #include <linux/slab.h> #include <linux/workqueue.h> #include <linux/rbtree.h> +#include <linux/blktrace_api.h> #include "dm.h" #include "md.h" #include "dm-bio-list.h" @@ -633,9 +634,15 @@ static void hold_bio(struct ioband_group *gp, struct bio *bio) */ dp->g_prepare_bio(gp, bio, IOBAND_URGENT); bio_list_add(&dp->g_urgent_bios, bio); + blk_add_trace_msg(bdev_get_queue(bio->bi_bdev), + "ioband %s hold_urg %d", dp->g_name, + dp->g_blocked); } else { gp->c_blocked++; dp->g_hold_bio(gp, bio); + blk_add_trace_msg(bdev_get_queue(bio->bi_bdev), + "ioband %s hold_nrm %d", dp->g_name, + gp->c_blocked); } } @@ -676,14 +683,21 @@ static int make_issue_list(struct ioband_group *gp, struct bio *bio, clear_group_blocked(gp); wake_up_all(&gp->c_waitq); } - if (should_pushback_bio(gp)) + if (should_pushback_bio(gp)) { bio_list_add(pushback_list, bio); + blk_add_trace_msg(bdev_get_queue(bio->bi_bdev), + "ioband %s add_pback %d %d", dp->g_name, + dp->g_blocked, gp->c_blocked); + } else { int rw = bio_data_dir(bio); gp->c_stat[rw].deferred++; gp->c_stat[rw].sectors += bio_sectors(bio); bio_list_add(issue_list, bio); + blk_add_trace_msg(bdev_get_queue(bio->bi_bdev), + "ioband %s add_iss %d %d", dp->g_name, + dp->g_blocked, gp->c_blocked); } return prepare_to_issue(gp, bio); } @@ -703,6 +717,9 @@ static void release_urgent_bios(struct ioband_device *dp, dp->g_blocked--; dp->g_issued[bio_data_dir(bio)]++; bio_list_add(issue_list, bio); + blk_add_trace_msg(bdev_get_queue(bio->bi_bdev), + "ioband %s urg_add_iss %d", dp->g_name, + dp->g_blocked); } } @@ -916,10 +933,17 @@ static void ioband_conduct(struct work_struct *work) spin_unlock_irqrestore(&dp->g_lock, flags); - while ((bio = bio_list_pop(&issue_list))) + while ((bio = bio_list_pop(&issue_list))) { + blk_add_trace_msg(bdev_get_queue(bio->bi_bdev), + "ioband %s pop_iss", dp->g_name); generic_make_request(bio); - while ((bio = bio_list_pop(&pushback_list))) + } + + while ((bio = bio_list_pop(&pushback_list))) { + blk_add_trace_msg(bdev_get_queue(bio->bi_bdev), + "ioband %s pop_pback", dp->g_name); bio_endio(bio, -EIO); + } } static int ioband_end_io(struct dm_target *ti, struct bio *bio, -- 1.5.6.3 ^ permalink raw reply related [flat|nested] 11+ messages in thread
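For the #if concern mentioned in the mail above, one way to keep a single
code base is a small compatibility wrapper around blk_add_trace_msg(),
which only exists on newer kernels. This is only a sketch - the version
cutoff and the ioband_trace_msg() name are illustrative, not part of the
posted patch:

#include <linux/version.h>
#include <linux/blkdev.h>

#if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 26)
#include <linux/blktrace_api.h>
/* Newer kernels: emit the message into the blktrace stream. */
#define ioband_trace_msg(bio, fmt, args...) \
	blk_add_trace_msg(bdev_get_queue((bio)->bi_bdev), fmt, ##args)
#else
/* Older (2.6.18-ish) kernels: compile the tracing calls away. */
#define ioband_trace_msg(bio, fmt, args...) do { } while (0)
#endif

And a rough sketch of the test described above. The device names and the
ioband table arguments are assumptions (the table line follows the
dm-ioband documentation and may differ between dm-ioband releases), not
the exact commands used; /dev/sdf is chosen to match the 8,80 device seen
in the trace sample:

#!/bin/sh
# Two dm-ioband devices over the two partitions, same device group "1",
# policy "weight", default token base, weight 100 each (":100").
echo "0 $(blockdev --getsize /dev/sdf1) ioband /dev/sdf1 1 0 0 none weight 0 :100" \
	| dmsetup create ioband1
echo "0 $(blockdev --getsize /dev/sdf2) ioband /dev/sdf2 1 0 0 none weight 0 :100" \
	| dmsetup create ioband2

# Trace the underlying disk while the two mkfs runs go in parallel.
blktrace -d /dev/sdf -o ioband-test &
BTPID=$!
mkfs.ext2 /dev/mapper/ioband1 & M1=$!
mkfs.ext2 /dev/mapper/ioband2 & M2=$!
wait $M1 $M2
kill $BTPID
blkparse -i ioband-test | less   # the 'm N ioband ...' messages show up here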
* Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband 2009-04-24 21:47 [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband Alan D. Brunelle @ 2009-04-27 9:44 ` Ryo Tsuruta 2009-05-04 3:24 ` Li Zefan 0 siblings, 1 reply; 11+ messages in thread From: Ryo Tsuruta @ 2009-04-27 9:44 UTC (permalink / raw) To: Alan.Brunelle; +Cc: dm-devel, linux-kernel Hi Alan, From: "Alan D. Brunelle" <Alan.Brunelle@hp.com> Subject: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband Date: Fri, 24 Apr 2009 17:47:37 -0400 > Hi Ryo - > > I don't know if you are taking in patches, but whilst trying to uncover > some odd behavior I added some blktrace messages to dm-ioband-ctl.c. If > you're keeping one code base for old stuff (2.6.18-ish RHEL stuff) and > upstream you'll have to #if around these (the blktrace message stuff > came in around 2.6.26 or 27 I think). > > My test case was to take a single 400GB storage device, put two 200GB > partitions on it and then see what the "penalty" or overhead for adding > dm-ioband on top. To do this I simply created an ext2 FS on each > partition in parallel (two processes each doing a mkfs to one of the > partitions). Then I put two dm-ioband devices on top of the two > partitions (setting the weight to 100 in both cases - thus they should > have equal access). > > Using default values I was seeing /very/ large differences - on the > order of 3X. When I bumped the number of tokens to a large number > (10,240) the timings got much closer (<2%). I have found that using > weight-iosize performs worse than weight (closer to 5% penalty). I could reproduce similar results. One dm-ioband device seems to stop issuing I/Os for a few seconds at times. I'll investigate more on that. > I'll try to formalize these results as I go forward and report out on > them. In any event, I thought I'd share this patch with you if you are > interested... Thanks. I'll include your patche into the next release. > Here's a sampling from some blktrace output (sorry for the wrapping) - I > should note that I'm a bit scared to see such large numbers of holds > going on when the token count should be >5,000 for each device... > Holding these back in an equal access situation is inhibiting the block > I/O layer to merge (most) of these (as mkfs performs lots & lots of > small but sequential I/Os). > > ... 
> 8,80 16 0 0.090651446 0 m N ioband 1 hold_nrm 1654 > 8,80 16 0 0.090653575 0 m N ioband 1 hold_nrm 1655 > 8,80 16 0 0.090655694 0 m N ioband 1 hold_nrm 1656 > 8,80 16 0 0.090657609 0 m N ioband 1 hold_nrm 1657 > 8,80 16 0 0.090659554 0 m N ioband 1 hold_nrm 1658 > 8,80 16 0 0.090661327 0 m N ioband 1 hold_nrm 1659 > 8,80 16 0 0.090666237 0 m N ioband 1 hold_nrm 1660 > 8,80 16 53036 0.090675081 4713 C W 391420657 + 1024 [0] > 8,80 16 53037 0.090913365 4713 D W 392995569 + 1024 > [mkfs.ext2] > 8,80 16 0 0.090950380 0 m N ioband 1 add_iss 1659 1659 > 8,80 16 0 0.090951296 0 m N ioband 1 add_iss 1658 1658 > 8,80 16 0 0.090951870 0 m N ioband 1 add_iss 1657 1657 > 8,80 16 0 0.090952416 0 m N ioband 1 add_iss 1656 1656 > 8,80 16 0 0.090952965 0 m N ioband 1 add_iss 1655 1655 > 8,80 16 0 0.090953517 0 m N ioband 1 add_iss 1654 1654 > 8,80 16 0 0.090954064 0 m N ioband 1 add_iss 1653 1653 > 8,80 16 0 0.090954610 0 m N ioband 1 add_iss 1652 1652 > 8,80 16 0 0.090955280 0 m N ioband 1 add_iss 1651 1651 > 8,80 16 0 0.090956495 0 m N ioband 1 pop_iss > 8,80 16 53038 0.090957387 4659 A WS 396655745 + 8 <- (8,82) > 6030744 > 8,80 16 53039 0.090957561 4659 Q WS 396655745 + 8 [kioband/16] > 8,80 16 53040 0.090958328 4659 M WS 396655745 + 8 [kioband/16] > 8,80 16 0 0.090959595 0 m N ioband 1 pop_iss > 8,80 16 53041 0.090959754 4659 A WS 396655753 + 8 <- (8,82) > 6030752 > 8,80 16 53042 0.090960007 4659 Q WS 396655753 + 8 [kioband/16] > 8,80 16 53043 0.090960402 4659 M WS 396655753 + 8 [kioband/16] > 8,80 16 0 0.090960962 0 m N ioband 1 pop_iss > 8,80 16 53044 0.090961104 4659 A WS 396655761 + 8 <- (8,82) > 6030760 > 8,80 16 53045 0.090961231 4659 Q WS 396655761 + 8 [kioband/16] > 8,80 16 53046 0.090961496 4659 M WS 396655761 + 8 [kioband/16] > 8,80 16 0 0.090961995 0 m N ioband 1 pop_iss > 8,80 16 53047 0.090962117 4659 A WS 396655769 + 8 <- (8,82) > 6030768 > 8,80 16 53048 0.090962222 4659 Q WS 396655769 + 8 [kioband/16] > 8,80 16 53049 0.090962530 4659 M WS 396655769 + 8 [kioband/16] > 8,80 16 0 0.090962974 0 m N ioband 1 pop_iss > 8,80 16 53050 0.090963095 4659 A WS 396655777 + 8 <- (8,82) > 6030776 > 8,80 16 53051 0.090963334 4659 Q WS 396655777 + 8 [kioband/16] > 8,80 16 53052 0.090963518 4659 M WS 396655777 + 8 [kioband/16] > 8,80 16 0 0.090963985 0 m N ioband 1 pop_iss > 8,80 16 53053 0.090964220 4659 A WS 396655785 + 8 <- (8,82) > 6030784 > 8,80 16 53054 0.090964327 4659 Q WS 396655785 + 8 [kioband/16] > 8,80 16 53055 0.090964632 4659 M WS 396655785 + 8 [kioband/16] > 8,80 16 0 0.090965094 0 m N ioband 1 pop_iss > 8,80 16 53056 0.090965218 4659 A WS 396655793 + 8 <- (8,82) > 6030792 > 8,80 16 53057 0.090965324 4659 Q WS 396655793 + 8 [kioband/16] > 8,80 16 53058 0.090965548 4659 M WS 396655793 + 8 [kioband/16] > 8,80 16 0 0.090965991 0 m N ioband 1 pop_iss > 8,80 16 53059 0.090966112 4659 A WS 396655801 + 8 <- (8,82) > 6030800 > 8,80 16 53060 0.090966221 4659 Q WS 396655801 + 8 [kioband/16] > 8,80 16 53061 0.090966526 4659 M WS 396655801 + 8 [kioband/16] > 8,80 16 0 0.090966944 0 m N ioband 1 pop_iss > 8,80 16 53062 0.090967065 4659 A WS 396655809 + 8 <- (8,82) > 6030808 > 8,80 16 53063 0.090967173 4659 Q WS 396655809 + 8 [kioband/16] > 8,80 16 53064 0.090967383 4659 M WS 396655809 + 8 [kioband/16] > 8,80 16 0 0.090968394 0 m N ioband 1 add_iss 1650 1650 > 8,80 16 0 0.090969068 0 m N ioband 1 add_iss 1649 1649 > 8,80 16 0 0.090969684 0 m N ioband 1 add_iss 1648 1648 > ... > > Regards, > Alan D. Brunelle > Hewlett-Packard ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband 2009-04-27 9:44 ` Ryo Tsuruta @ 2009-05-04 3:24 ` Li Zefan 2009-05-07 0:23 ` Ryo Tsuruta 0 siblings, 1 reply; 11+ messages in thread From: Li Zefan @ 2009-05-04 3:24 UTC (permalink / raw) To: Ryo Tsuruta; +Cc: Alan.Brunelle, dm-devel, linux-kernel Ryo Tsuruta wrote: > Hi Alan, > >> Hi Ryo - >> >> I don't know if you are taking in patches, but whilst trying to uncover >> some odd behavior I added some blktrace messages to dm-ioband-ctl.c. If >> you're keeping one code base for old stuff (2.6.18-ish RHEL stuff) and >> upstream you'll have to #if around these (the blktrace message stuff >> came in around 2.6.26 or 27 I think). >> >> My test case was to take a single 400GB storage device, put two 200GB >> partitions on it and then see what the "penalty" or overhead for adding >> dm-ioband on top. To do this I simply created an ext2 FS on each >> partition in parallel (two processes each doing a mkfs to one of the >> partitions). Then I put two dm-ioband devices on top of the two >> partitions (setting the weight to 100 in both cases - thus they should >> have equal access). >> >> Using default values I was seeing /very/ large differences - on the >> order of 3X. When I bumped the number of tokens to a large number >> (10,240) the timings got much closer (<2%). I have found that using >> weight-iosize performs worse than weight (closer to 5% penalty). > > I could reproduce similar results. One dm-ioband device seems to stop > issuing I/Os for a few seconds at times. I'll investigate more on that. > >> I'll try to formalize these results as I go forward and report out on >> them. In any event, I thought I'd share this patch with you if you are >> interested... > > Thanks. I'll include your patche into the next release. > IMO we should use TRACE_EVENT instead of adding new blk_add_trace_msg(). >> Here's a sampling from some blktrace output (sorry for the wrapping) - I >> should note that I'm a bit scared to see such large numbers of holds >> going on when the token count should be >5,000 for each device... >> Holding these back in an equal access situation is inhibiting the block >> I/O layer to merge (most) of these (as mkfs performs lots & lots of >> small but sequential I/Os). ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband 2009-05-04 3:24 ` Li Zefan @ 2009-05-07 0:23 ` Ryo Tsuruta 2009-05-11 10:31 ` Ryo Tsuruta 0 siblings, 1 reply; 11+ messages in thread From: Ryo Tsuruta @ 2009-05-07 0:23 UTC (permalink / raw) To: lizf; +Cc: Alan.Brunelle, dm-devel, linux-kernel Hi Li, From: Li Zefan <lizf@cn.fujitsu.com> Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband Date: Mon, 04 May 2009 11:24:27 +0800 > Ryo Tsuruta wrote: > > Hi Alan, > > > >> Hi Ryo - > >> > >> I don't know if you are taking in patches, but whilst trying to uncover > >> some odd behavior I added some blktrace messages to dm-ioband-ctl.c. If > >> you're keeping one code base for old stuff (2.6.18-ish RHEL stuff) and > >> upstream you'll have to #if around these (the blktrace message stuff > >> came in around 2.6.26 or 27 I think). > >> > >> My test case was to take a single 400GB storage device, put two 200GB > >> partitions on it and then see what the "penalty" or overhead for adding > >> dm-ioband on top. To do this I simply created an ext2 FS on each > >> partition in parallel (two processes each doing a mkfs to one of the > >> partitions). Then I put two dm-ioband devices on top of the two > >> partitions (setting the weight to 100 in both cases - thus they should > >> have equal access). > >> > >> Using default values I was seeing /very/ large differences - on the > >> order of 3X. When I bumped the number of tokens to a large number > >> (10,240) the timings got much closer (<2%). I have found that using > >> weight-iosize performs worse than weight (closer to 5% penalty). > > > > I could reproduce similar results. One dm-ioband device seems to stop > > issuing I/Os for a few seconds at times. I'll investigate more on that. > > > >> I'll try to formalize these results as I go forward and report out on > >> them. In any event, I thought I'd share this patch with you if you are > >> interested... > > > > Thanks. I'll include your patche into the next release. > > > > IMO we should use TRACE_EVENT instead of adding new blk_add_trace_msg(). Thanks for your suggestion. I'll use TRACE_EVENT instead. > > >> Here's a sampling from some blktrace output (sorry for the wrapping) - I > >> should note that I'm a bit scared to see such large numbers of holds > >> going on when the token count should be >5,000 for each device... > >> Holding these back in an equal access situation is inhibiting the block > >> I/O layer to merge (most) of these (as mkfs performs lots & lots of > >> small but sequential I/Os). > Thanks, Ryo Tsuruta ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband 2009-05-07 0:23 ` Ryo Tsuruta @ 2009-05-11 10:31 ` Ryo Tsuruta 2009-05-12 3:49 ` Li Zefan 0 siblings, 1 reply; 11+ messages in thread From: Ryo Tsuruta @ 2009-05-11 10:31 UTC (permalink / raw) To: lizf; +Cc: Alan.Brunelle, dm-devel, linux-kernel Hi Li, From: Ryo Tsuruta <ryov@valinux.co.jp> Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband Date: Thu, 07 May 2009 09:23:22 +0900 (JST) > Hi Li, > > From: Li Zefan <lizf@cn.fujitsu.com> > Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband > Date: Mon, 04 May 2009 11:24:27 +0800 > > > Ryo Tsuruta wrote: > > > Hi Alan, > > > > > >> Hi Ryo - > > >> > > >> I don't know if you are taking in patches, but whilst trying to uncover > > >> some odd behavior I added some blktrace messages to dm-ioband-ctl.c. If > > >> you're keeping one code base for old stuff (2.6.18-ish RHEL stuff) and > > >> upstream you'll have to #if around these (the blktrace message stuff > > >> came in around 2.6.26 or 27 I think). > > >> > > >> My test case was to take a single 400GB storage device, put two 200GB > > >> partitions on it and then see what the "penalty" or overhead for adding > > >> dm-ioband on top. To do this I simply created an ext2 FS on each > > >> partition in parallel (two processes each doing a mkfs to one of the > > >> partitions). Then I put two dm-ioband devices on top of the two > > >> partitions (setting the weight to 100 in both cases - thus they should > > >> have equal access). > > >> > > >> Using default values I was seeing /very/ large differences - on the > > >> order of 3X. When I bumped the number of tokens to a large number > > >> (10,240) the timings got much closer (<2%). I have found that using > > >> weight-iosize performs worse than weight (closer to 5% penalty). > > > > > > I could reproduce similar results. One dm-ioband device seems to stop > > > issuing I/Os for a few seconds at times. I'll investigate more on that. > > > > > >> I'll try to formalize these results as I go forward and report out on > > >> them. In any event, I thought I'd share this patch with you if you are > > >> interested... > > > > > > Thanks. I'll include your patche into the next release. > > > > > > > IMO we should use TRACE_EVENT instead of adding new blk_add_trace_msg(). > > Thanks for your suggestion. I'll use TRACE_EVENT instead. blk_add_trace_msg() supports both blktrace and tracepoints. I can get messages from dm-ioband through debugfs. Could you expain why should we use TRACE_EVENT instead? > > > > >> Here's a sampling from some blktrace output (sorry for the wrapping) - I > > >> should note that I'm a bit scared to see such large numbers of holds > > >> going on when the token count should be >5,000 for each device... > > >> Holding these back in an equal access situation is inhibiting the block > > >> I/O layer to merge (most) of these (as mkfs performs lots & lots of > > >> small but sequential I/Os). > > > > Thanks, > Ryo Tsuruta Thanks, Ryo Tsuruta ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband 2009-05-11 10:31 ` Ryo Tsuruta @ 2009-05-12 3:49 ` Li Zefan 2009-05-12 6:11 ` Ryo Tsuruta 0 siblings, 1 reply; 11+ messages in thread From: Li Zefan @ 2009-05-12 3:49 UTC (permalink / raw) To: Ryo Tsuruta; +Cc: Alan.Brunelle, dm-devel, linux-kernel Ryo Tsuruta wrote: > Hi Li, > > From: Ryo Tsuruta <ryov@valinux.co.jp> > Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband > Date: Thu, 07 May 2009 09:23:22 +0900 (JST) > >> Hi Li, >> >> From: Li Zefan <lizf@cn.fujitsu.com> >> Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband >> Date: Mon, 04 May 2009 11:24:27 +0800 >> >>> Ryo Tsuruta wrote: >>>> Hi Alan, >>>> >>>>> Hi Ryo - >>>>> >>>>> I don't know if you are taking in patches, but whilst trying to uncover >>>>> some odd behavior I added some blktrace messages to dm-ioband-ctl.c. If >>>>> you're keeping one code base for old stuff (2.6.18-ish RHEL stuff) and >>>>> upstream you'll have to #if around these (the blktrace message stuff >>>>> came in around 2.6.26 or 27 I think). >>>>> >>>>> My test case was to take a single 400GB storage device, put two 200GB >>>>> partitions on it and then see what the "penalty" or overhead for adding >>>>> dm-ioband on top. To do this I simply created an ext2 FS on each >>>>> partition in parallel (two processes each doing a mkfs to one of the >>>>> partitions). Then I put two dm-ioband devices on top of the two >>>>> partitions (setting the weight to 100 in both cases - thus they should >>>>> have equal access). >>>>> >>>>> Using default values I was seeing /very/ large differences - on the >>>>> order of 3X. When I bumped the number of tokens to a large number >>>>> (10,240) the timings got much closer (<2%). I have found that using >>>>> weight-iosize performs worse than weight (closer to 5% penalty). >>>> I could reproduce similar results. One dm-ioband device seems to stop >>>> issuing I/Os for a few seconds at times. I'll investigate more on that. >>>> >>>>> I'll try to formalize these results as I go forward and report out on >>>>> them. In any event, I thought I'd share this patch with you if you are >>>>> interested... >>>> Thanks. I'll include your patche into the next release. >>>> >>> IMO we should use TRACE_EVENT instead of adding new blk_add_trace_msg(). >> Thanks for your suggestion. I'll use TRACE_EVENT instead. > > blk_add_trace_msg() supports both blktrace and tracepoints. I can > get messages from dm-ioband through debugfs. Could you expain why > should we use TRACE_EVENT instead? > Actually blk_add_trace_msg() has nothing to do with tracepoints.. If we use blk_add_trace_msg() is dm, we can use it in md, various block drivers and even ext4. So the right thing is, if a subsystem wants to add trace facility, it should use tracepoints/TRACE_EVENT. With TRACE_EVENT, you can get output through debugfs too, and it can be used together with blktrace: # echo 1 > /sys/block/dm/trace/enable # echo blk > /debugfs/tracing/current_tracer # echo dm-ioband-foo > /debugfs/tracing/tracing/set_event # cat /deubgfs/tracing/trace_pipe And you can enable dm-ioband-foo while disabling dm-ioband-bar, and you can use filter feature too. >>>>> Here's a sampling from some blktrace output (sorry for the wrapping) - I >>>>> should note that I'm a bit scared to see such large numbers of holds >>>>> going on when the token count should be >5,000 for each device... 
>>>>> Holding these back in an equal access situation is inhibiting the block >>>>> I/O layer to merge (most) of these (as mkfs performs lots & lots of >>>>> small but sequential I/Os). >> Thanks, >> Ryo Tsuruta > > Thanks, > Ryo Tsuruta > ^ permalink raw reply [flat|nested] 11+ messages in thread
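Putting the sequence Li describes together in one place. The debugfs
mount point is an assumption (the thread writes /debugfs and /debug;
/sys/kernel/debug is used here), and the dm-ioband events only exist once
the TRACE_EVENT conversion posted later in this thread is applied:

# Mount debugfs if it is not mounted yet (mount point varies by setup).
mount -t debugfs none /sys/kernel/debug 2>/dev/null

# Per-device blktrace switch for the device-mapper device of interest.
echo 1 > /sys/block/dm-0/trace/enable

# Select the blk tracer so blktrace-style events appear in the ftrace output.
echo blk > /sys/kernel/debug/tracing/current_tracer

# Enable the dm-ioband trace events (requires the TRACE_EVENT patch).
echo 1 > /sys/kernel/debug/tracing/events/dm-ioband/enable

# Read the combined stream.
cat /sys/kernel/debug/tracing/trace_pipe

This combined stream is what Li's sample output later in the thread
shows, with the dm-ioband events interleaved with the ordinary blk
events.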
* Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband 2009-05-12 3:49 ` Li Zefan @ 2009-05-12 6:11 ` Ryo Tsuruta 2009-05-12 8:10 ` Li Zefan 0 siblings, 1 reply; 11+ messages in thread From: Ryo Tsuruta @ 2009-05-12 6:11 UTC (permalink / raw) To: lizf; +Cc: Alan.Brunelle, dm-devel, linux-kernel Hi Li, From: Li Zefan <lizf@cn.fujitsu.com> Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband Date: Tue, 12 May 2009 11:49:07 +0800 > Ryo Tsuruta wrote: > > Hi Li, > > > > From: Ryo Tsuruta <ryov@valinux.co.jp> > > Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband > > Date: Thu, 07 May 2009 09:23:22 +0900 (JST) > > > >> Hi Li, > >> > >> From: Li Zefan <lizf@cn.fujitsu.com> > >> Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband > >> Date: Mon, 04 May 2009 11:24:27 +0800 > >> > >>> Ryo Tsuruta wrote: > >>>> Hi Alan, > >>>> > >>>>> Hi Ryo - > >>>>> > >>>>> I don't know if you are taking in patches, but whilst trying to uncover > >>>>> some odd behavior I added some blktrace messages to dm-ioband-ctl.c. If > >>>>> you're keeping one code base for old stuff (2.6.18-ish RHEL stuff) and > >>>>> upstream you'll have to #if around these (the blktrace message stuff > >>>>> came in around 2.6.26 or 27 I think). > >>>>> > >>>>> My test case was to take a single 400GB storage device, put two 200GB > >>>>> partitions on it and then see what the "penalty" or overhead for adding > >>>>> dm-ioband on top. To do this I simply created an ext2 FS on each > >>>>> partition in parallel (two processes each doing a mkfs to one of the > >>>>> partitions). Then I put two dm-ioband devices on top of the two > >>>>> partitions (setting the weight to 100 in both cases - thus they should > >>>>> have equal access). > >>>>> > >>>>> Using default values I was seeing /very/ large differences - on the > >>>>> order of 3X. When I bumped the number of tokens to a large number > >>>>> (10,240) the timings got much closer (<2%). I have found that using > >>>>> weight-iosize performs worse than weight (closer to 5% penalty). > >>>> I could reproduce similar results. One dm-ioband device seems to stop > >>>> issuing I/Os for a few seconds at times. I'll investigate more on that. > >>>> > >>>>> I'll try to formalize these results as I go forward and report out on > >>>>> them. In any event, I thought I'd share this patch with you if you are > >>>>> interested... > >>>> Thanks. I'll include your patche into the next release. > >>>> > >>> IMO we should use TRACE_EVENT instead of adding new blk_add_trace_msg(). > >> Thanks for your suggestion. I'll use TRACE_EVENT instead. > > > > blk_add_trace_msg() supports both blktrace and tracepoints. I can > > get messages from dm-ioband through debugfs. Could you expain why > > should we use TRACE_EVENT instead? > > > > Actually blk_add_trace_msg() has nothing to do with tracepoints.. > > If we use blk_add_trace_msg() is dm, we can use it in md, various block > drivers and even ext4. So the right thing is, if a subsystem wants to add > trace facility, it should use tracepoints/TRACE_EVENT. > > With TRACE_EVENT, you can get output through debugfs too, and it can be used > together with blktrace: > > # echo 1 > /sys/block/dm/trace/enable > # echo blk > /debugfs/tracing/current_tracer > # echo dm-ioband-foo > /debugfs/tracing/tracing/set_event > # cat /deubgfs/tracing/trace_pipe > > And you can enable dm-ioband-foo while disabling dm-ioband-bar, and you can > use filter feature too. Thanks for explaining. 
The base kernel of current dm tree (2.6.30-rc4) has not supported dm-device tracing yet. I'll consider using TRACE_EVENT when the base kernel supports dm-device tracing. > > >>>>> Here's a sampling from some blktrace output (sorry for the wrapping) - I > >>>>> should note that I'm a bit scared to see such large numbers of holds > >>>>> going on when the token count should be >5,000 for each device... > >>>>> Holding these back in an equal access situation is inhibiting the block > >>>>> I/O layer to merge (most) of these (as mkfs performs lots & lots of > >>>>> small but sequential I/Os). > >> Thanks, > >> Ryo Tsuruta > > > > Thanks, > > Ryo Tsuruta > > Thanks, Ryo Tsuruta ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband 2009-05-12 6:11 ` Ryo Tsuruta @ 2009-05-12 8:10 ` Li Zefan 2009-05-12 10:12 ` Ryo Tsuruta 0 siblings, 1 reply; 11+ messages in thread From: Li Zefan @ 2009-05-12 8:10 UTC (permalink / raw) To: Ryo Tsuruta Cc: Alan.Brunelle, dm-devel, linux-kernel, Jens Axboe, Ingo Molnar, Steven Rostedt, Frederic Weisbecker CC more people on tracing issue Ryo Tsuruta wrote: > Hi Li, > > From: Li Zefan <lizf@cn.fujitsu.com> > Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband > Date: Tue, 12 May 2009 11:49:07 +0800 > >> Ryo Tsuruta wrote: >>> Hi Li, >>> >>> From: Ryo Tsuruta <ryov@valinux.co.jp> >>> Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband >>> Date: Thu, 07 May 2009 09:23:22 +0900 (JST) >>> >>>> Hi Li, >>>> >>>> From: Li Zefan <lizf@cn.fujitsu.com> >>>> Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband >>>> Date: Mon, 04 May 2009 11:24:27 +0800 >>>> >>>>> Ryo Tsuruta wrote: >>>>>> Hi Alan, >>>>>> >>>>>>> Hi Ryo - >>>>>>> >>>>>>> I don't know if you are taking in patches, but whilst trying to uncover >>>>>>> some odd behavior I added some blktrace messages to dm-ioband-ctl.c. If >>>>>>> you're keeping one code base for old stuff (2.6.18-ish RHEL stuff) and >>>>>>> upstream you'll have to #if around these (the blktrace message stuff >>>>>>> came in around 2.6.26 or 27 I think). >>>>>>> >>>>>>> My test case was to take a single 400GB storage device, put two 200GB >>>>>>> partitions on it and then see what the "penalty" or overhead for adding >>>>>>> dm-ioband on top. To do this I simply created an ext2 FS on each >>>>>>> partition in parallel (two processes each doing a mkfs to one of the >>>>>>> partitions). Then I put two dm-ioband devices on top of the two >>>>>>> partitions (setting the weight to 100 in both cases - thus they should >>>>>>> have equal access). >>>>>>> >>>>>>> Using default values I was seeing /very/ large differences - on the >>>>>>> order of 3X. When I bumped the number of tokens to a large number >>>>>>> (10,240) the timings got much closer (<2%). I have found that using >>>>>>> weight-iosize performs worse than weight (closer to 5% penalty). >>>>>> I could reproduce similar results. One dm-ioband device seems to stop >>>>>> issuing I/Os for a few seconds at times. I'll investigate more on that. >>>>>> >>>>>>> I'll try to formalize these results as I go forward and report out on >>>>>>> them. In any event, I thought I'd share this patch with you if you are >>>>>>> interested... >>>>>> Thanks. I'll include your patche into the next release. >>>>>> >>>>> IMO we should use TRACE_EVENT instead of adding new blk_add_trace_msg(). >>>> Thanks for your suggestion. I'll use TRACE_EVENT instead. >>> blk_add_trace_msg() supports both blktrace and tracepoints. I can >>> get messages from dm-ioband through debugfs. Could you expain why >>> should we use TRACE_EVENT instead? >>> >> Actually blk_add_trace_msg() has nothing to do with tracepoints.. >> >> If we use blk_add_trace_msg() is dm, we can use it in md, various block >> drivers and even ext4. So the right thing is, if a subsystem wants to add >> trace facility, it should use tracepoints/TRACE_EVENT. 
>> >> With TRACE_EVENT, you can get output through debugfs too, and it can be used >> together with blktrace: >> >> # echo 1 > /sys/block/dm/trace/enable >> # echo blk > /debugfs/tracing/current_tracer >> # echo dm-ioband-foo > /debugfs/tracing/tracing/set_event >> # cat /deubgfs/tracing/trace_pipe >> >> And you can enable dm-ioband-foo while disabling dm-ioband-bar, and you can >> use filter feature too. > > Thanks for explaining. > The base kernel of current dm tree (2.6.30-rc4) has not supported > dm-device tracing yet. I'll consider using TRACE_EVENT when the base > kernel supports dm-device tracing. > I think we don't have any plan on dm-device tracing support, do we? And you don't need that. I downloaded dm-ioband 1.10.4, and applied it to latest tip-tree, and made a patch to convert those blktrace msgs to trace events. # ls /debug/tracing/events/dm-ioband enable ioband_endio ioband_issue_list ioband_urgent_bio filter ioband_hold_bio ioband_make_request # echo 1 /debug/tracing/events/dm-ioband/enable # echo blk /debug/tracing/current_tracer ... Here is a sample output: pdflush-237 [001] 1706.377788: ioband_hold_bio: 8,9 ioband 1 hold_nrm 1 pdflush-237 [001] 1706.377795: 254,0 Q W 188832 + 8 [pdflush] pdflush-237 [001] 1706.377800: ioband_hold_bio: 8,9 ioband 1 hold_nrm 2 pdflush-237 [001] 1706.377806: 254,0 Q W 197032 + 8 [pdflush] pdflush-237 [001] 1706.377835: 254,0 C W 82232 + 8 [0] kioband/1-2569 [001] 1706.377922: ioband_issue_list: 8,9 ioband 1 add_iss 1 1 kioband/1-2569 [001] 1706.377923: ioband_issue_list: 8,9 ioband 1 add_iss 0 0 kioband/1-2569 [001] 1706.377925: ioband_make_request: 8,9 ioband 1 pop_iss kioband/1-2569 [001] 1706.377940: ioband_make_request: 8,9 ioband 1 pop_iss konsole-2140 [001] 1706.377969: 254,0 C W 90432 + 8 [0] konsole-2140 [001] 1706.378087: 254,0 C W 98632 + 8 [0] konsole-2140 [001] 1706.378232: 254,0 C W 106832 + 8 [0] konsole-2140 [001] 1706.378350: 254,0 C W 115032 + 8 [0] konsole-2140 [001] 1706.378466: 254,0 C W 123232 + 8 [0] konsole-2140 [001] 1706.378581: 254,0 C W 131432 + 8 [0] --- drivers/md/dm-ioband-ctl.c | 19 +++++- include/trace/events/dm-ioband.h | 134 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 150 insertions(+), 3 deletions(-) create mode 100644 include/trace/events/dm-ioband.h diff --git a/drivers/md/dm-ioband-ctl.c b/drivers/md/dm-ioband-ctl.c index 151b178..4033724 100644 --- a/drivers/md/dm-ioband-ctl.c +++ b/drivers/md/dm-ioband-ctl.c @@ -17,6 +17,9 @@ #include "md.h" #include "dm-ioband.h" +#define CREATE_TRACE_POINTS +#include <trace/events/dm-ioband.h> + #define POLICY_PARAM_START 6 #define POLICY_PARAM_DELIM "=:," @@ -632,9 +635,11 @@ static void hold_bio(struct ioband_group *gp, struct bio *bio) */ dp->g_prepare_bio(gp, bio, IOBAND_URGENT); bio_list_add(&dp->g_urgent_bios, bio); + trace_ioband_hold_bio(bio, dp, true); } else { gp->c_blocked++; dp->g_hold_bio(gp, bio); + trace_ioband_hold_bio(bio, dp, false); } } @@ -675,14 +680,17 @@ static int make_issue_list(struct ioband_group *gp, struct bio *bio, clear_group_blocked(gp); wake_up_all(&gp->c_waitq); } - if (should_pushback_bio(gp)) + if (should_pushback_bio(gp)) { bio_list_add(pushback_list, bio); + trace_ioband_issue_list(bio, dp, gp, true); + } else { int rw = bio_data_dir(bio); gp->c_stat[rw].deferred++; gp->c_stat[rw].sectors += bio_sectors(bio); bio_list_add(issue_list, bio); + trace_ioband_issue_list(bio, dp, gp, false); } return prepare_to_issue(gp, bio); } @@ -702,6 +710,7 @@ static void release_urgent_bios(struct ioband_device *dp, dp->g_blocked--; 
dp->g_issued[bio_data_dir(bio)]++; bio_list_add(issue_list, bio); + trace_ioband_urgent_bio(bio, dp); } } @@ -915,10 +924,14 @@ static void ioband_conduct(struct work_struct *work) spin_unlock_irqrestore(&dp->g_lock, flags); - while ((bio = bio_list_pop(&issue_list))) + while ((bio = bio_list_pop(&issue_list))) { + trace_ioband_make_request(bio, dp); generic_make_request(bio); - while ((bio = bio_list_pop(&pushback_list))) + } + while ((bio = bio_list_pop(&pushback_list))) { + trace_ioband_endio(bio, dp); bio_endio(bio, -EIO); + } } static int ioband_end_io(struct dm_target *ti, struct bio *bio, diff --git a/include/trace/events/dm-ioband.h b/include/trace/events/dm-ioband.h new file mode 100644 index 0000000..17dd61c --- /dev/null +++ b/include/trace/events/dm-ioband.h @@ -0,0 +1,134 @@ +#if !defined(_TRACE_DM_IOBAND_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_DM_IOBAND_H + +#include <linux/tracepoint.h> +#include <linux/interrupt.h> + +#undef TRACE_SYSTEM +#define TRACE_SYSTEM dm-ioband + +TRACE_EVENT(ioband_hold_bio, + + TP_PROTO(struct bio *bio, struct ioband_device *dp, bool urg), + + TP_ARGS(bio, dp, urg), + + TP_STRUCT__entry( + __field( dev_t, dev ) + __field( bool, urg ) + __field( int, g_blocked ) + __string( g_name, dp->g_name ) + ), + + TP_fast_assign( + __entry->dev = bio->bi_bdev->bd_dev; + __entry->urg = urg; + __entry->g_blocked = dp->g_blocked; + __assign_str(g_name, dp->g_name); + ), + + TP_printk("%d,%d ioband %s hold_%s %d", + MAJOR(__entry->dev), MINOR(__entry->dev), + __get_str(g_name), + __entry->urg ? "urg" : "nrm", __entry->g_blocked) +); + +TRACE_EVENT(ioband_issue_list, + + TP_PROTO(struct bio *bio, struct ioband_device *dp, + struct ioband_group *gp, bool pushback), + + TP_ARGS(bio, dp, gp, pushback), + + TP_STRUCT__entry( + __field( dev_t, dev ) + __field( bool, pushback ) + __field( int, g_blocked ) + __field( int, c_blocked ) + __string( g_name, dp->g_name ) + ), + + TP_fast_assign( + __entry->dev = bio->bi_bdev->bd_dev; + __entry->pushback = pushback; + __entry->g_blocked = dp->g_blocked; + __entry->c_blocked = gp->c_blocked; + __assign_str(g_name, dp->g_name); + ), + + TP_printk("%d,%d ioband %s add_%s %d %d", + MAJOR(__entry->dev), MINOR(__entry->dev), + __get_str(g_name), + __entry->pushback ? 
"pback" : "iss", + __entry->g_blocked, __entry->c_blocked) +); + +TRACE_EVENT(ioband_urgent_bio, + + TP_PROTO(struct bio *bio, struct ioband_device *dp), + + TP_ARGS(bio, dp), + + TP_STRUCT__entry( + __field( dev_t, dev ) + __field( int, g_blocked ) + __string( g_name, dp->g_name ) + ), + + TP_fast_assign( + __entry->dev = bio->bi_bdev->bd_dev; + __entry->g_blocked = dp->g_blocked; + __assign_str(g_name, dp->g_name); + ), + + TP_printk("%d,%d ioband %s urg_add_iss %d", + MAJOR(__entry->dev), MINOR(__entry->dev), + __get_str(g_name), __entry->g_blocked) +); + +TRACE_EVENT(ioband_make_request, + + TP_PROTO(struct bio *bio, struct ioband_device *dp), + + TP_ARGS(bio, dp), + + TP_STRUCT__entry( + __field( dev_t, dev ) + __string( g_name, dp->g_name ) + ), + + TP_fast_assign( + __entry->dev = bio->bi_bdev->bd_dev; + __assign_str(g_name, dp->g_name); + ), + + TP_printk("%d,%d ioband %s pop_iss", + MAJOR(__entry->dev), MINOR(__entry->dev), __get_str(g_name)) +); + +TRACE_EVENT(ioband_endio, + + TP_PROTO(struct bio *bio, struct ioband_device *dp), + + TP_ARGS(bio, dp), + + TP_STRUCT__entry( + __field( dev_t, dev ) + __string( g_name, dp->g_name ) + ), + + TP_fast_assign( + __entry->dev = bio->bi_bdev->bd_dev; + __assign_str(g_name, dp->g_name); + ), + + TP_printk("%d,%d ioband %s pop_pback", + MAJOR(__entry->dev), MINOR(__entry->dev), __get_str(g_name)) +); + +#endif /* _TRACE_DM_IOBAND_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> + + -- 1.5.4.rc3 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband 2009-05-12 8:10 ` Li Zefan @ 2009-05-12 10:12 ` Ryo Tsuruta 2009-05-13 0:56 ` Li Zefan 0 siblings, 1 reply; 11+ messages in thread From: Ryo Tsuruta @ 2009-05-12 10:12 UTC (permalink / raw) To: lizf Cc: Alan.Brunelle, dm-devel, linux-kernel, jens.axboe, mingo, rostedt, fweisbec Hi Li, From: Li Zefan <lizf@cn.fujitsu.com> Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband Date: Tue, 12 May 2009 16:10:15 +0800 > CC more people on tracing issue > > Ryo Tsuruta wrote: > > Hi Li, > > > > From: Li Zefan <lizf@cn.fujitsu.com> > > Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband > > Date: Tue, 12 May 2009 11:49:07 +0800 > > > >> Ryo Tsuruta wrote: > >>> Hi Li, > >>> > >>> From: Ryo Tsuruta <ryov@valinux.co.jp> > >>> Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband > >>> Date: Thu, 07 May 2009 09:23:22 +0900 (JST) > >>> > >>>> Hi Li, > >>>> > >>>> From: Li Zefan <lizf@cn.fujitsu.com> > >>>> Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband > >>>> Date: Mon, 04 May 2009 11:24:27 +0800 > >>>> > >>>>> Ryo Tsuruta wrote: > >>>>>> Hi Alan, > >>>>>> > >>>>>>> Hi Ryo - > >>>>>>> > >>>>>>> I don't know if you are taking in patches, but whilst trying to uncover > >>>>>>> some odd behavior I added some blktrace messages to dm-ioband-ctl.c. If > >>>>>>> you're keeping one code base for old stuff (2.6.18-ish RHEL stuff) and > >>>>>>> upstream you'll have to #if around these (the blktrace message stuff > >>>>>>> came in around 2.6.26 or 27 I think). > >>>>>>> > >>>>>>> My test case was to take a single 400GB storage device, put two 200GB > >>>>>>> partitions on it and then see what the "penalty" or overhead for adding > >>>>>>> dm-ioband on top. To do this I simply created an ext2 FS on each > >>>>>>> partition in parallel (two processes each doing a mkfs to one of the > >>>>>>> partitions). Then I put two dm-ioband devices on top of the two > >>>>>>> partitions (setting the weight to 100 in both cases - thus they should > >>>>>>> have equal access). > >>>>>>> > >>>>>>> Using default values I was seeing /very/ large differences - on the > >>>>>>> order of 3X. When I bumped the number of tokens to a large number > >>>>>>> (10,240) the timings got much closer (<2%). I have found that using > >>>>>>> weight-iosize performs worse than weight (closer to 5% penalty). > >>>>>> I could reproduce similar results. One dm-ioband device seems to stop > >>>>>> issuing I/Os for a few seconds at times. I'll investigate more on that. > >>>>>> > >>>>>>> I'll try to formalize these results as I go forward and report out on > >>>>>>> them. In any event, I thought I'd share this patch with you if you are > >>>>>>> interested... > >>>>>> Thanks. I'll include your patche into the next release. > >>>>>> > >>>>> IMO we should use TRACE_EVENT instead of adding new blk_add_trace_msg(). > >>>> Thanks for your suggestion. I'll use TRACE_EVENT instead. > >>> blk_add_trace_msg() supports both blktrace and tracepoints. I can > >>> get messages from dm-ioband through debugfs. Could you expain why > >>> should we use TRACE_EVENT instead? > >>> > >> Actually blk_add_trace_msg() has nothing to do with tracepoints.. > >> > >> If we use blk_add_trace_msg() is dm, we can use it in md, various block > >> drivers and even ext4. So the right thing is, if a subsystem wants to add > >> trace facility, it should use tracepoints/TRACE_EVENT. 
> >> > >> With TRACE_EVENT, you can get output through debugfs too, and it can be used > >> together with blktrace: > >> > >> # echo 1 > /sys/block/dm/trace/enable > >> # echo blk > /debugfs/tracing/current_tracer > >> # echo dm-ioband-foo > /debugfs/tracing/tracing/set_event > >> # cat /deubgfs/tracing/trace_pipe > >> > >> And you can enable dm-ioband-foo while disabling dm-ioband-bar, and you can > >> use filter feature too. > > > > Thanks for explaining. > > The base kernel of current dm tree (2.6.30-rc4) has not supported > > dm-device tracing yet. I'll consider using TRACE_EVENT when the base > > kernel supports dm-device tracing. > > > > I think we don't have any plan on dm-device tracing support, do we? And you > don't need that. > > I downloaded dm-ioband 1.10.4, and applied it to latest tip-tree, and made > a patch to convert those blktrace msgs to trace events. > > # ls /debug/tracing/events/dm-ioband > enable ioband_endio ioband_issue_list ioband_urgent_bio > filter ioband_hold_bio ioband_make_request > # echo 1 /debug/tracing/events/dm-ioband/enable > # echo blk /debug/tracing/current_tracer > ... Is the following line unnecessary in the latest tip-tree? # echo 1 > /sys/block/dm/trace/enable How to enable/disable tracing on each device? O.K. I'll try the latest tip-tree. Thanks for your help! > > Here is a sample output: > > pdflush-237 [001] 1706.377788: ioband_hold_bio: 8,9 ioband 1 hold_nrm 1 > pdflush-237 [001] 1706.377795: 254,0 Q W 188832 + 8 [pdflush] > pdflush-237 [001] 1706.377800: ioband_hold_bio: 8,9 ioband 1 hold_nrm 2 > pdflush-237 [001] 1706.377806: 254,0 Q W 197032 + 8 [pdflush] > pdflush-237 [001] 1706.377835: 254,0 C W 82232 + 8 [0] > kioband/1-2569 [001] 1706.377922: ioband_issue_list: 8,9 ioband 1 add_iss 1 1 > kioband/1-2569 [001] 1706.377923: ioband_issue_list: 8,9 ioband 1 add_iss 0 0 > kioband/1-2569 [001] 1706.377925: ioband_make_request: 8,9 ioband 1 pop_iss > kioband/1-2569 [001] 1706.377940: ioband_make_request: 8,9 ioband 1 pop_iss > konsole-2140 [001] 1706.377969: 254,0 C W 90432 + 8 [0] > konsole-2140 [001] 1706.378087: 254,0 C W 98632 + 8 [0] > konsole-2140 [001] 1706.378232: 254,0 C W 106832 + 8 [0] > konsole-2140 [001] 1706.378350: 254,0 C W 115032 + 8 [0] > konsole-2140 [001] 1706.378466: 254,0 C W 123232 + 8 [0] > konsole-2140 [001] 1706.378581: 254,0 C W 131432 + 8 [0] > > --- > drivers/md/dm-ioband-ctl.c | 19 +++++- > include/trace/events/dm-ioband.h | 134 ++++++++++++++++++++++++++++++++++++++ > 2 files changed, 150 insertions(+), 3 deletions(-) > create mode 100644 include/trace/events/dm-ioband.h > > diff --git a/drivers/md/dm-ioband-ctl.c b/drivers/md/dm-ioband-ctl.c > index 151b178..4033724 100644 > --- a/drivers/md/dm-ioband-ctl.c > +++ b/drivers/md/dm-ioband-ctl.c > @@ -17,6 +17,9 @@ > #include "md.h" > #include "dm-ioband.h" > > +#define CREATE_TRACE_POINTS > +#include <trace/events/dm-ioband.h> > + > #define POLICY_PARAM_START 6 > #define POLICY_PARAM_DELIM "=:," > > @@ -632,9 +635,11 @@ static void hold_bio(struct ioband_group *gp, struct bio *bio) > */ > dp->g_prepare_bio(gp, bio, IOBAND_URGENT); > bio_list_add(&dp->g_urgent_bios, bio); > + trace_ioband_hold_bio(bio, dp, true); > } else { > gp->c_blocked++; > dp->g_hold_bio(gp, bio); > + trace_ioband_hold_bio(bio, dp, false); > } > } > > @@ -675,14 +680,17 @@ static int make_issue_list(struct ioband_group *gp, struct bio *bio, > clear_group_blocked(gp); > wake_up_all(&gp->c_waitq); > } > - if (should_pushback_bio(gp)) > + if (should_pushback_bio(gp)) { > 
bio_list_add(pushback_list, bio); > + trace_ioband_issue_list(bio, dp, gp, true); > + } > else { > int rw = bio_data_dir(bio); > > gp->c_stat[rw].deferred++; > gp->c_stat[rw].sectors += bio_sectors(bio); > bio_list_add(issue_list, bio); > + trace_ioband_issue_list(bio, dp, gp, false); > } > return prepare_to_issue(gp, bio); > } > @@ -702,6 +710,7 @@ static void release_urgent_bios(struct ioband_device *dp, > dp->g_blocked--; > dp->g_issued[bio_data_dir(bio)]++; > bio_list_add(issue_list, bio); > + trace_ioband_urgent_bio(bio, dp); > } > } > > @@ -915,10 +924,14 @@ static void ioband_conduct(struct work_struct *work) > > spin_unlock_irqrestore(&dp->g_lock, flags); > > - while ((bio = bio_list_pop(&issue_list))) > + while ((bio = bio_list_pop(&issue_list))) { > + trace_ioband_make_request(bio, dp); > generic_make_request(bio); > - while ((bio = bio_list_pop(&pushback_list))) > + } > + while ((bio = bio_list_pop(&pushback_list))) { > + trace_ioband_endio(bio, dp); > bio_endio(bio, -EIO); > + } > } > > static int ioband_end_io(struct dm_target *ti, struct bio *bio, > diff --git a/include/trace/events/dm-ioband.h b/include/trace/events/dm-ioband.h > new file mode 100644 > index 0000000..17dd61c > --- /dev/null > +++ b/include/trace/events/dm-ioband.h > @@ -0,0 +1,134 @@ > +#if !defined(_TRACE_DM_IOBAND_H) || defined(TRACE_HEADER_MULTI_READ) > +#define _TRACE_DM_IOBAND_H > + > +#include <linux/tracepoint.h> > +#include <linux/interrupt.h> > + > +#undef TRACE_SYSTEM > +#define TRACE_SYSTEM dm-ioband > + > +TRACE_EVENT(ioband_hold_bio, > + > + TP_PROTO(struct bio *bio, struct ioband_device *dp, bool urg), > + > + TP_ARGS(bio, dp, urg), > + > + TP_STRUCT__entry( > + __field( dev_t, dev ) > + __field( bool, urg ) > + __field( int, g_blocked ) > + __string( g_name, dp->g_name ) > + ), > + > + TP_fast_assign( > + __entry->dev = bio->bi_bdev->bd_dev; > + __entry->urg = urg; > + __entry->g_blocked = dp->g_blocked; > + __assign_str(g_name, dp->g_name); > + ), > + > + TP_printk("%d,%d ioband %s hold_%s %d", > + MAJOR(__entry->dev), MINOR(__entry->dev), > + __get_str(g_name), > + __entry->urg ? "urg" : "nrm", __entry->g_blocked) > +); > + > +TRACE_EVENT(ioband_issue_list, > + > + TP_PROTO(struct bio *bio, struct ioband_device *dp, > + struct ioband_group *gp, bool pushback), > + > + TP_ARGS(bio, dp, gp, pushback), > + > + TP_STRUCT__entry( > + __field( dev_t, dev ) > + __field( bool, pushback ) > + __field( int, g_blocked ) > + __field( int, c_blocked ) > + __string( g_name, dp->g_name ) > + ), > + > + TP_fast_assign( > + __entry->dev = bio->bi_bdev->bd_dev; > + __entry->pushback = pushback; > + __entry->g_blocked = dp->g_blocked; > + __entry->c_blocked = gp->c_blocked; > + __assign_str(g_name, dp->g_name); > + ), > + > + TP_printk("%d,%d ioband %s add_%s %d %d", > + MAJOR(__entry->dev), MINOR(__entry->dev), > + __get_str(g_name), > + __entry->pushback ? 
"pback" : "iss", > + __entry->g_blocked, __entry->c_blocked) > +); > + > +TRACE_EVENT(ioband_urgent_bio, > + > + TP_PROTO(struct bio *bio, struct ioband_device *dp), > + > + TP_ARGS(bio, dp), > + > + TP_STRUCT__entry( > + __field( dev_t, dev ) > + __field( int, g_blocked ) > + __string( g_name, dp->g_name ) > + ), > + > + TP_fast_assign( > + __entry->dev = bio->bi_bdev->bd_dev; > + __entry->g_blocked = dp->g_blocked; > + __assign_str(g_name, dp->g_name); > + ), > + > + TP_printk("%d,%d ioband %s urg_add_iss %d", > + MAJOR(__entry->dev), MINOR(__entry->dev), > + __get_str(g_name), __entry->g_blocked) > +); > + > +TRACE_EVENT(ioband_make_request, > + > + TP_PROTO(struct bio *bio, struct ioband_device *dp), > + > + TP_ARGS(bio, dp), > + > + TP_STRUCT__entry( > + __field( dev_t, dev ) > + __string( g_name, dp->g_name ) > + ), > + > + TP_fast_assign( > + __entry->dev = bio->bi_bdev->bd_dev; > + __assign_str(g_name, dp->g_name); > + ), > + > + TP_printk("%d,%d ioband %s pop_iss", > + MAJOR(__entry->dev), MINOR(__entry->dev), __get_str(g_name)) > +); > + > +TRACE_EVENT(ioband_endio, > + > + TP_PROTO(struct bio *bio, struct ioband_device *dp), > + > + TP_ARGS(bio, dp), > + > + TP_STRUCT__entry( > + __field( dev_t, dev ) > + __string( g_name, dp->g_name ) > + ), > + > + TP_fast_assign( > + __entry->dev = bio->bi_bdev->bd_dev; > + __assign_str(g_name, dp->g_name); > + ), > + > + TP_printk("%d,%d ioband %s pop_pback", > + MAJOR(__entry->dev), MINOR(__entry->dev), __get_str(g_name)) > +); > + > +#endif /* _TRACE_DM_IOBAND_H */ > + > +/* This part must be outside protection */ > +#include <trace/define_trace.h> > + > + > -- > 1.5.4.rc3 > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband 2009-05-12 10:12 ` Ryo Tsuruta @ 2009-05-13 0:56 ` Li Zefan 2009-05-13 11:30 ` Ryo Tsuruta 0 siblings, 1 reply; 11+ messages in thread From: Li Zefan @ 2009-05-13 0:56 UTC (permalink / raw) To: Ryo Tsuruta Cc: Alan.Brunelle, dm-devel, linux-kernel, jens.axboe, mingo, rostedt, fweisbec >>>>>> Thanks for your suggestion. I'll use TRACE_EVENT instead. >>>>> blk_add_trace_msg() supports both blktrace and tracepoints. I can >>>>> get messages from dm-ioband through debugfs. Could you expain why >>>>> should we use TRACE_EVENT instead? >>>>> >>>> Actually blk_add_trace_msg() has nothing to do with tracepoints.. >>>> >>>> If we use blk_add_trace_msg() is dm, we can use it in md, various block >>>> drivers and even ext4. So the right thing is, if a subsystem wants to add >>>> trace facility, it should use tracepoints/TRACE_EVENT. >>>> >>>> With TRACE_EVENT, you can get output through debugfs too, and it can be used >>>> together with blktrace: >>>> >>>> # echo 1 > /sys/block/dm/trace/enable >>>> # echo blk > /debugfs/tracing/current_tracer >>>> # echo dm-ioband-foo > /debugfs/tracing/tracing/set_event >>>> # cat /deubgfs/tracing/trace_pipe >>>> >>>> And you can enable dm-ioband-foo while disabling dm-ioband-bar, and you can >>>> use filter feature too. >>> Thanks for explaining. >>> The base kernel of current dm tree (2.6.30-rc4) has not supported >>> dm-device tracing yet. I'll consider using TRACE_EVENT when the base >>> kernel supports dm-device tracing. >>> >> I think we don't have any plan on dm-device tracing support, do we? And you >> don't need that. >> >> I downloaded dm-ioband 1.10.4, and applied it to latest tip-tree, and made >> a patch to convert those blktrace msgs to trace events. >> >> # ls /debug/tracing/events/dm-ioband >> enable ioband_endio ioband_issue_list ioband_urgent_bio >> filter ioband_hold_bio ioband_make_request >> # echo 1 /debug/tracing/events/dm-ioband/enable >> # echo blk /debug/tracing/current_tracer >> ... > > Is the following line unnecessary in the latest tip-tree? > > # echo 1 > /sys/block/dm/trace/enable > Yes, still needed. I was not giving a complete demonstration. ;) > How to enable/disable tracing on each device? echo 1 > /sys/block/dm-0/trace/enable echo 1 > /sys/block/dm-1/trace/enable Or use filter facility: echo "dev == $(( (ma0 << 20) | mi0 ))" > /debug/tracing/events/dm-ioband/filter This will filter out all events generated from dm-1, leaving those from dm-0. ^ permalink raw reply [flat|nested] 11+ messages in thread
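To make the filter expression concrete: the dev field these events record
is the kernel dev_t of the bio's block device, encoded with 20 minor
bits, i.e. (major << 20) | minor. For the 8,9 device seen in Li's sample
output that is (8 << 20) | 9 = 8388617. The device numbers and the
debugfs path below are just taken from the examples in this thread:

# Keep only events from block device 8,9 (major 8, minor 9).
echo "dev == $(( (8 << 20) | 9 ))" > /sys/kernel/debug/tracing/events/dm-ioband/filter
# ...which is the same as:
echo "dev == 8388617" > /sys/kernel/debug/tracing/events/dm-ioband/filter
# Clear the filter again with:
echo 0 > /sys/kernel/debug/tracing/events/dm-ioband/filter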
* Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband 2009-05-13 0:56 ` Li Zefan @ 2009-05-13 11:30 ` Ryo Tsuruta 0 siblings, 0 replies; 11+ messages in thread From: Ryo Tsuruta @ 2009-05-13 11:30 UTC (permalink / raw) To: lizf Cc: Alan.Brunelle, dm-devel, linux-kernel, jens.axboe, mingo, rostedt, fweisbec Hi Li, From: Li Zefan <lizf@cn.fujitsu.com> Subject: Re: [RFC PATCH dm-ioband] Added in blktrace msgs for dm-ioband Date: Wed, 13 May 2009 08:56:13 +0800 > >>>>>> Thanks for your suggestion. I'll use TRACE_EVENT instead. > >>>>> blk_add_trace_msg() supports both blktrace and tracepoints. I can > >>>>> get messages from dm-ioband through debugfs. Could you expain why > >>>>> should we use TRACE_EVENT instead? > >>>>> > >>>> Actually blk_add_trace_msg() has nothing to do with tracepoints.. > >>>> > >>>> If we use blk_add_trace_msg() is dm, we can use it in md, various block > >>>> drivers and even ext4. So the right thing is, if a subsystem wants to add > >>>> trace facility, it should use tracepoints/TRACE_EVENT. > >>>> > >>>> With TRACE_EVENT, you can get output through debugfs too, and it can be used > >>>> together with blktrace: > >>>> > >>>> # echo 1 > /sys/block/dm/trace/enable > >>>> # echo blk > /debugfs/tracing/current_tracer > >>>> # echo dm-ioband-foo > /debugfs/tracing/tracing/set_event > >>>> # cat /deubgfs/tracing/trace_pipe > >>>> > >>>> And you can enable dm-ioband-foo while disabling dm-ioband-bar, and you can > >>>> use filter feature too. > >>> Thanks for explaining. > >>> The base kernel of current dm tree (2.6.30-rc4) has not supported > >>> dm-device tracing yet. I'll consider using TRACE_EVENT when the base > >>> kernel supports dm-device tracing. > >>> > >> I think we don't have any plan on dm-device tracing support, do we? And you > >> don't need that. > >> > >> I downloaded dm-ioband 1.10.4, and applied it to latest tip-tree, and made > >> a patch to convert those blktrace msgs to trace events. > >> > >> # ls /debug/tracing/events/dm-ioband > >> enable ioband_endio ioband_issue_list ioband_urgent_bio > >> filter ioband_hold_bio ioband_make_request > >> # echo 1 /debug/tracing/events/dm-ioband/enable > >> # echo blk /debug/tracing/current_tracer > >> ... > > > > Is the following line unnecessary in the latest tip-tree? > > > > # echo 1 > /sys/block/dm/trace/enable > > > > Yes, still needed. I was not giving a complete demonstration. ;) > > > How to enable/disable tracing on each device? > > echo 1 > /sys/block/dm-0/trace/enable > echo 1 > /sys/block/dm-1/trace/enable > > Or use filter facility: > > echo "dev == $(( (ma0 << 20) | mi0 ))" > /debug/tracing/events/dm-ioband/filter > > This will filter out all events generated from dm-1, leaving those from dm-0. > Thanks for further explaining. I'll include your patche into the next release. Thanks, Ryo Tsuruta ^ permalink raw reply [flat|nested] 11+ messages in thread