From: tang.junhui@zte.com.cn
To: mlyle@lyle.org
Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org,
tang.junhui@zte.com.cn
Subject: Re: [PATCH v3] bcache: fix writeback target calc on large devices
Date: Sat, 6 Jan 2018 15:29:57 +0800 [thread overview]
Message-ID: <1515223797-25169-1-git-send-email-tang.junhui@zte.com.cn> (raw)
From: Tang Junhui <tang.junhui@zte.com.cn>
Hello Mike,
I thought about it again, and I feel this patch is a little complex and still not very
accurate for small backing devices. I think we can resolve it like this:
uint64_t cache_dirty_target =
div_u64(cache_sectors * dc->writeback_percent, 100);
- int64_t target = div64_u64(cache_dirty_target * bdev_sectors(dc->bdev),
- c->cached_dev_sectors);
+ int64_t target = div64_u64(cache_dirty_target,
+ div64_u64(c->cached_dev_sectors, bdev_sectors(dc->bdev)));
c->cached_dev_sectors is always at least bdev_sectors(dc->bdev), so
div64_u64(c->cached_dev_sectors, bdev_sectors(dc->bdev)) is never smaller than 1,
and this expression has no multiplication operation, so we do not need to worry
about overflow.
We can also multiply by a scale factor (2^10) to avoid loss of accuracy in the
division, like below (it would support all devices smaller than 2^54 bytes):
+ int64_t target = div64_u64(cache_dirty_target<<10,
+ div64_u64(c->cached_dev_sectors<<10, bdev_sectors(dc->bdev)));
What do you think?
> Bcache needs to scale the dirty data in the cache over the multiple
> backing disks in order to calculate writeback rates for each.
> The previous code did this by multiplying the target number of dirty
> sectors by the backing device size, and expected it to fit into a
> uint64_t; this blows up on relatively small backing devices.
>
> The new approach figures out the bdev's share in 16384ths of the overall
> cached data. This is chosen to cope well when bdevs drastically vary in
> size and to ensure that bcache can cross the petabyte boundary for each
> backing device.
>
> This has been improved based on Tang Junhui's feedback to ensure that
> every device gets a share of dirty data, no matter how small it is
> compared to the total backing pool.
>
> Reported-by: Jack Douglas <jack@douglastechnology.co.uk>
> Signed-off-by: Michael Lyle <mlyle@lyle.org>
> ---
> drivers/md/bcache/writeback.c | 35 +++++++++++++++++++++++++++++++----
> 1 file changed, 31 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index 61f24c04cebd..c7e35180091e 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -18,17 +18,43 @@
> #include <trace/events/bcache.h>
>
> /* Rate limiting */
> -
> -static void __update_writeback_rate(struct cached_dev *dc)
> +static uint64_t __calc_target_rate(struct cached_dev *dc)
> {
> struct cache_set *c = dc->disk.c;
> +
> + /*
> + * This is the size of the cache, minus the amount used for
> + * flash-only devices
> + */
> uint64_t cache_sectors = c->nbuckets * c->sb.bucket_size -
> bcache_flash_devs_sectors_dirty(c);
> +
> + /*
> + * Unfortunately there is no control of global dirty data. If the
> + * user states that they want 10% dirty data in the cache, and has,
> + * e.g., 5 backing volumes of equal size, we try and ensure each
> + * backing volume uses about 2% of the cache.
> + *
> + * 16384 is chosen here as something that each backing device should
> + * be a reasonable fraction of the share, and not to blow up until
> + * individual backing devices are a petabyte.
> + */
> + uint32_t bdev_share_per16k =
> + div64_u64(16384 * bdev_sectors(dc->bdev),
> + c->cached_dev_sectors);
> +
> uint64_t cache_dirty_target =
> div_u64(cache_sectors * dc->writeback_percent, 100);
> - int64_t target = div64_u64(cache_dirty_target * bdev_sectors(dc->bdev),
> - c->cached_dev_sectors);
>
> + /* Ensure each backing dev gets at least 1/16384th dirty share */
> + if (bdev_share_per16k < 1)
> + bdev_share_per16k = 1;
> +
> + return div_u64(cache_dirty_target * bdev_share_per16k, 16384);
> +}
> +
> +static void __update_writeback_rate(struct cached_dev *dc)
> +{
> /*
> * PI controller:
> * Figures out the amount that should be written per second.
> @@ -49,6 +75,7 @@ static void __update_writeback_rate(struct cached_dev *dc)
> * This acts as a slow, long-term average that is not subject to
> * variations in usage like the p term.
> */
> + int64_t target = __calc_target_rate(dc);
> int64_t dirty = bcache_dev_sectors_dirty(&dc->disk);
> int64_t error = dirty - target;
> int64_t proportional_scaled =
> --
> 2.14.1
Thanks,
Tang Junhui