linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/2] lib/sbitmap: convert shallow_depth from one word to the whole sbitmap
@ 2025-07-29  3:19 Yu Kuai
  2025-07-29  3:19 ` [PATCH v2 1/2] " Yu Kuai
  2025-07-29  3:19 ` [PATCH v2 2/2] lib/sbitmap: make sbitmap_get_shallow() static Yu Kuai
  0 siblings, 2 replies; 12+ messages in thread
From: Yu Kuai @ 2025-07-29  3:19 UTC (permalink / raw)
  To: jack, axboe, akpm, yang.yang, dlemoal, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

Changes from v1:
 - fix some wording in patch 2
 - add Damien's Reviewed-by tag to patch 2

Yu Kuai (2):
  lib/sbitmap: convert shallow_depth from one word to the whole sbitmap
  lib/sbitmap: make sbitmap_get_shallow() static

 block/bfq-iosched.c     | 35 +++++++++-----------
 block/bfq-iosched.h     |  3 +-
 block/kyber-iosched.c   |  9 ++---
 block/mq-deadline.c     | 16 +--------
 include/linux/sbitmap.h | 19 +----------
 lib/sbitmap.c           | 73 +++++++++++++++++++++++++----------------
 6 files changed, 65 insertions(+), 90 deletions(-)

-- 
2.39.2



* [PATCH v2 1/2] lib/sbitmap: convert shallow_depth from one word to the whole sbitmap
  2025-07-29  3:19 [PATCH v2 0/2] lib/sbitmap: convert shallow_depth from one word to the whole sbitmap Yu Kuai
@ 2025-07-29  3:19 ` Yu Kuai
  2025-07-29 10:16   ` Jan Kara
  2025-07-29  3:19 ` [PATCH v2 2/2] lib/sbitmap: make sbitmap_get_shallow() static Yu Kuai
  1 sibling, 1 reply; 12+ messages in thread
From: Yu Kuai @ 2025-07-29  3:19 UTC (permalink / raw)
  To: jack, axboe, akpm, yang.yang, dlemoal, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

Currently, elevators record an internal 'async_depth' to throttle
asynchronous requests, and they all calculate the shallow_depth based on
sb->shift, on the assumption that sb->shift is the number of available
tags in one word.

However, sb->shift is not the number of available tags in the last word,
see __map_depth():

if (index == sb->map_nr - 1)
  return sb->depth - (index << sb->shift);

As a consequence, if the last word is used, more tags can be allocated
than expected. For example, assume nr_requests=256 with four words; in
the worst case, if the user sets nr_requests=32, the first word is also
the last word, and it is wrong to keep using the bits per word, which is
64, to calculate async_depth.

On the other hand, due to cgroup QoS, bfq may allow only one request to
be allocated, but setting shallow_depth=1 will still allow one request
per word to be allocated.

Fix these problems by applying shallow_depth to the whole sbitmap
instead of to a single word, and change kyber, mq-deadline and bfq to
follow this.
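
For illustration, assuming the usual sb->shift = 6 (64 bits per word):
after resizing down to nr_requests=32, the only remaining word is also
the last word and __map_depth() returns 32, yet the old calculations
still start from 1 << sb->shift = 64. The old bfq limit of "no more
than 75% of tags for sync writes", for instance, becomes
max((64 * 3) >> 2, 1) = 48, which is larger than the 32 tags that
actually exist, so it never throttles anything.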

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/bfq-iosched.c     | 35 ++++++++++++--------------
 block/bfq-iosched.h     |  3 +--
 block/kyber-iosched.c   |  9 ++-----
 block/mq-deadline.c     | 16 +-----------
 include/linux/sbitmap.h |  6 ++---
 lib/sbitmap.c           | 55 +++++++++++++++++++++--------------------
 6 files changed, 51 insertions(+), 73 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index f71ec0887733..a6a574a8eac9 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -694,17 +694,13 @@ static void bfq_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
 {
 	struct bfq_data *bfqd = data->q->elevator->elevator_data;
 	struct bfq_io_cq *bic = bfq_bic_lookup(data->q);
-	int depth;
-	unsigned limit = data->q->nr_requests;
-	unsigned int act_idx;
+	unsigned int limit, act_idx;
 
 	/* Sync reads have full depth available */
-	if (op_is_sync(opf) && !op_is_write(opf)) {
-		depth = 0;
-	} else {
-		depth = bfqd->word_depths[!!bfqd->wr_busy_queues][op_is_sync(opf)];
-		limit = (limit * depth) >> bfqd->full_depth_shift;
-	}
+	if (op_is_sync(opf) && !op_is_write(opf))
+		limit = data->q->nr_requests;
+	else
+		limit = bfqd->async_depths[!!bfqd->wr_busy_queues][op_is_sync(opf)];
 
 	for (act_idx = 0; bic && act_idx < bfqd->num_actuators; act_idx++) {
 		/* Fast path to check if bfqq is already allocated. */
@@ -718,14 +714,16 @@ static void bfq_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
 		 * available requests and thus starve other entities.
 		 */
 		if (bfqq_request_over_limit(bfqd, bic, opf, act_idx, limit)) {
-			depth = 1;
+			limit = 1;
 			break;
 		}
 	}
+
 	bfq_log(bfqd, "[%s] wr_busy %d sync %d depth %u",
-		__func__, bfqd->wr_busy_queues, op_is_sync(opf), depth);
-	if (depth)
-		data->shallow_depth = depth;
+		__func__, bfqd->wr_busy_queues, op_is_sync(opf), limit);
+
+	if (limit < data->q->nr_requests)
+		data->shallow_depth = limit;
 }
 
 static struct bfq_queue *
@@ -7114,9 +7112,8 @@ void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
  */
 static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
 {
-	unsigned int depth = 1U << bt->sb.shift;
+	unsigned int nr_requests = bfqd->queue->nr_requests;
 
-	bfqd->full_depth_shift = bt->sb.shift;
 	/*
 	 * In-word depths if no bfq_queue is being weight-raised:
 	 * leaving 25% of tags only for sync reads.
@@ -7128,13 +7125,13 @@ static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
 	 * limit 'something'.
 	 */
 	/* no more than 50% of tags for async I/O */
-	bfqd->word_depths[0][0] = max(depth >> 1, 1U);
+	bfqd->async_depths[0][0] = max(nr_requests >> 1, 1U);
 	/*
 	 * no more than 75% of tags for sync writes (25% extra tags
 	 * w.r.t. async I/O, to prevent async I/O from starving sync
 	 * writes)
 	 */
-	bfqd->word_depths[0][1] = max((depth * 3) >> 2, 1U);
+	bfqd->async_depths[0][1] = max((nr_requests * 3) >> 2, 1U);
 
 	/*
 	 * In-word depths in case some bfq_queue is being weight-
@@ -7144,9 +7141,9 @@ static void bfq_update_depths(struct bfq_data *bfqd, struct sbitmap_queue *bt)
 	 * shortage.
 	 */
 	/* no more than ~18% of tags for async I/O */
-	bfqd->word_depths[1][0] = max((depth * 3) >> 4, 1U);
+	bfqd->async_depths[1][0] = max((nr_requests * 3) >> 4, 1U);
 	/* no more than ~37% of tags for sync writes (~20% extra tags) */
-	bfqd->word_depths[1][1] = max((depth * 6) >> 4, 1U);
+	bfqd->async_depths[1][1] = max((nr_requests * 6) >> 4, 1U);
 }
 
 static void bfq_depth_updated(struct blk_mq_hw_ctx *hctx)
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 687a3a7ba784..31217f196f4f 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -813,8 +813,7 @@ struct bfq_data {
 	 * Depth limits used in bfq_limit_depth (see comments on the
 	 * function)
 	 */
-	unsigned int word_depths[2][2];
-	unsigned int full_depth_shift;
+	unsigned int async_depths[2][2];
 
 	/*
 	 * Number of independent actuators. This is equal to 1 in
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 4dba8405bd01..bfd9a40bb33d 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -157,10 +157,7 @@ struct kyber_queue_data {
 	 */
 	struct sbitmap_queue domain_tokens[KYBER_NUM_DOMAINS];
 
-	/*
-	 * Async request percentage, converted to per-word depth for
-	 * sbitmap_get_shallow().
-	 */
+	/* Number of allowed async requests. */
 	unsigned int async_depth;
 
 	struct kyber_cpu_latency __percpu *cpu_latency;
@@ -454,10 +451,8 @@ static void kyber_depth_updated(struct blk_mq_hw_ctx *hctx)
 {
 	struct kyber_queue_data *kqd = hctx->queue->elevator->elevator_data;
 	struct blk_mq_tags *tags = hctx->sched_tags;
-	unsigned int shift = tags->bitmap_tags.sb.shift;
-
-	kqd->async_depth = (1U << shift) * KYBER_ASYNC_PERCENT / 100U;
 
+	kqd->async_depth = hctx->queue->nr_requests * KYBER_ASYNC_PERCENT / 100U;
 	sbitmap_queue_min_shallow_depth(&tags->bitmap_tags, kqd->async_depth);
 }
 
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 2edf1cac06d5..9ab6c6256695 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -487,20 +487,6 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	return rq;
 }
 
-/*
- * 'depth' is a number in the range 1..INT_MAX representing a number of
- * requests. Scale it with a factor (1 << bt->sb.shift) / q->nr_requests since
- * 1..(1 << bt->sb.shift) is the range expected by sbitmap_get_shallow().
- * Values larger than q->nr_requests have the same effect as q->nr_requests.
- */
-static int dd_to_word_depth(struct blk_mq_hw_ctx *hctx, unsigned int qdepth)
-{
-	struct sbitmap_queue *bt = &hctx->sched_tags->bitmap_tags;
-	const unsigned int nrr = hctx->queue->nr_requests;
-
-	return ((qdepth << bt->sb.shift) + nrr - 1) / nrr;
-}
-
 /*
  * Called by __blk_mq_alloc_request(). The shallow_depth value set by this
  * function is used by __blk_mq_get_tag().
@@ -517,7 +503,7 @@ static void dd_limit_depth(blk_opf_t opf, struct blk_mq_alloc_data *data)
 	 * Throttle asynchronous requests and writes such that these requests
 	 * do not block the allocation of synchronous requests.
 	 */
-	data->shallow_depth = dd_to_word_depth(data->hctx, dd->async_depth);
+	data->shallow_depth = dd->async_depth;
 }
 
 /* Called by blk_mq_update_nr_requests(). */
diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index 189140bf11fc..4adf4b364fcd 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -213,12 +213,12 @@ int sbitmap_get(struct sbitmap *sb);
  * sbitmap_get_shallow() - Try to allocate a free bit from a &struct sbitmap,
  * limiting the depth used from each word.
  * @sb: Bitmap to allocate from.
- * @shallow_depth: The maximum number of bits to allocate from a single word.
+ * @shallow_depth: The maximum number of bits to allocate from the bitmap.
  *
  * This rather specific operation allows for having multiple users with
  * different allocation limits. E.g., there can be a high-priority class that
  * uses sbitmap_get() and a low-priority class that uses sbitmap_get_shallow()
- * with a @shallow_depth of (1 << (@sb->shift - 1)). Then, the low-priority
+ * with a @shallow_depth of (sb->depth >> 1). Then, the low-priority
  * class can only allocate half of the total bits in the bitmap, preventing it
  * from starving out the high-priority class.
  *
@@ -478,7 +478,7 @@ unsigned long __sbitmap_queue_get_batch(struct sbitmap_queue *sbq, int nr_tags,
  * sbitmap_queue, limiting the depth used from each word, with preemption
  * already disabled.
  * @sbq: Bitmap queue to allocate from.
- * @shallow_depth: The maximum number of bits to allocate from a single word.
+ * @shallow_depth: The maximum number of bits to allocate from the queue.
  * See sbitmap_get_shallow().
  *
  * If you call this, make sure to call sbitmap_queue_min_shallow_depth() after
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index d3412984170c..f2e90ac6b56e 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -208,8 +208,27 @@ static int sbitmap_find_bit_in_word(struct sbitmap_word *map,
 	return nr;
 }
 
+static unsigned int __map_depth_with_shallow(const struct sbitmap *sb,
+					     int index,
+					     unsigned int shallow_depth)
+{
+	unsigned int lower_bound = 0;
+
+	if (shallow_depth >= sb->depth)
+		return __map_depth(sb, index);
+
+	if (index > 0)
+		lower_bound += (index - 1) << sb->shift;
+
+	if (shallow_depth <= lower_bound)
+		return 0;
+
+	return min_t(unsigned int, __map_depth(sb, index),
+				   shallow_depth - lower_bound);
+}
+
 static int sbitmap_find_bit(struct sbitmap *sb,
-			    unsigned int depth,
+			    unsigned int shallow_depth,
 			    unsigned int index,
 			    unsigned int alloc_hint,
 			    bool wrap)
@@ -218,12 +237,12 @@ static int sbitmap_find_bit(struct sbitmap *sb,
 	int nr = -1;
 
 	for (i = 0; i < sb->map_nr; i++) {
-		nr = sbitmap_find_bit_in_word(&sb->map[index],
-					      min_t(unsigned int,
-						    __map_depth(sb, index),
-						    depth),
-					      alloc_hint, wrap);
+		unsigned int depth = __map_depth_with_shallow(sb, index,
+							      shallow_depth);
 
+		if (depth)
+			nr = sbitmap_find_bit_in_word(&sb->map[index], depth,
+						      alloc_hint, wrap);
 		if (nr != -1) {
 			nr += index << sb->shift;
 			break;
@@ -406,27 +425,9 @@ EXPORT_SYMBOL_GPL(sbitmap_bitmap_show);
 static unsigned int sbq_calc_wake_batch(struct sbitmap_queue *sbq,
 					unsigned int depth)
 {
-	unsigned int wake_batch;
-	unsigned int shallow_depth;
-
-	/*
-	 * Each full word of the bitmap has bits_per_word bits, and there might
-	 * be a partial word. There are depth / bits_per_word full words and
-	 * depth % bits_per_word bits left over. In bitwise arithmetic:
-	 *
-	 * bits_per_word = 1 << shift
-	 * depth / bits_per_word = depth >> shift
-	 * depth % bits_per_word = depth & ((1 << shift) - 1)
-	 *
-	 * Each word can be limited to sbq->min_shallow_depth bits.
-	 */
-	shallow_depth = min(1U << sbq->sb.shift, sbq->min_shallow_depth);
-	depth = ((depth >> sbq->sb.shift) * shallow_depth +
-		 min(depth & ((1U << sbq->sb.shift) - 1), shallow_depth));
-	wake_batch = clamp_t(unsigned int, depth / SBQ_WAIT_QUEUES, 1,
-			     SBQ_WAKE_BATCH);
-
-	return wake_batch;
+	return clamp_t(unsigned int,
+		       min(depth, sbq->min_shallow_depth) / SBQ_WAIT_QUEUES,
+		       1, SBQ_WAKE_BATCH);
 }
 
 int sbitmap_queue_init_node(struct sbitmap_queue *sbq, unsigned int depth,
-- 
2.39.2



* [PATCH v2 2/2] lib/sbitmap: make sbitmap_get_shallow() static
  2025-07-29  3:19 [PATCH v2 0/2] lib/sbitmap: convert shallow_depth from one word to the whole sbitmap Yu Kuai
  2025-07-29  3:19 ` [PATCH v2 1/2] " Yu Kuai
@ 2025-07-29  3:19 ` Yu Kuai
  2025-07-29  9:36   ` Jan Kara
  1 sibling, 1 reply; 12+ messages in thread
From: Yu Kuai @ 2025-07-29  3:19 UTC (permalink / raw)
  To: jack, axboe, akpm, yang.yang, dlemoal, ming.lei
  Cc: linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

Because it is only used in sbitmap.c

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
---
 include/linux/sbitmap.h | 17 -----------------
 lib/sbitmap.c           | 18 ++++++++++++++++--
 2 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h
index 4adf4b364fcd..ffb9907c7070 100644
--- a/include/linux/sbitmap.h
+++ b/include/linux/sbitmap.h
@@ -209,23 +209,6 @@ void sbitmap_resize(struct sbitmap *sb, unsigned int depth);
  */
 int sbitmap_get(struct sbitmap *sb);
 
-/**
- * sbitmap_get_shallow() - Try to allocate a free bit from a &struct sbitmap,
- * limiting the depth used from each word.
- * @sb: Bitmap to allocate from.
- * @shallow_depth: The maximum number of bits to allocate from the bitmap.
- *
- * This rather specific operation allows for having multiple users with
- * different allocation limits. E.g., there can be a high-priority class that
- * uses sbitmap_get() and a low-priority class that uses sbitmap_get_shallow()
- * with a @shallow_depth of (sb->depth >> 1). Then, the low-priority
- * class can only allocate half of the total bits in the bitmap, preventing it
- * from starving out the high-priority class.
- *
- * Return: Non-negative allocated bit number if successful, -1 otherwise.
- */
-int sbitmap_get_shallow(struct sbitmap *sb, unsigned long shallow_depth);
-
 /**
  * sbitmap_any_bit_set() - Check for a set bit in a &struct sbitmap.
  * @sb: Bitmap to check.
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index f2e90ac6b56e..5e3c35086253 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -306,7 +306,22 @@ static int __sbitmap_get_shallow(struct sbitmap *sb,
 	return sbitmap_find_bit(sb, shallow_depth, index, alloc_hint, true);
 }
 
-int sbitmap_get_shallow(struct sbitmap *sb, unsigned long shallow_depth)
+/**
+ * sbitmap_get_shallow() - Try to allocate a free bit from a &struct sbitmap,
+ * limiting the depth used from each word.
+ * @sb: Bitmap to allocate from.
+ * @shallow_depth: The maximum number of bits to allocate from the bitmap.
+ *
+ * This rather specific operation allows for having multiple users with
+ * different allocation limits. E.g., there can be a high-priority class that
+ * uses sbitmap_get() and a low-priority class that uses sbitmap_get_shallow()
+ * with a @shallow_depth of (sb->depth >> 1). Then, the low-priority
+ * class can only allocate half of the total bits in the bitmap, preventing it
+ * from starving out the high-priority class.
+ *
+ * Return: Non-negative allocated bit number if successful, -1 otherwise.
+ */
+static int sbitmap_get_shallow(struct sbitmap *sb, unsigned long shallow_depth)
 {
 	int nr;
 	unsigned int hint, depth;
@@ -321,7 +336,6 @@ int sbitmap_get_shallow(struct sbitmap *sb, unsigned long shallow_depth)
 
 	return nr;
 }
-EXPORT_SYMBOL_GPL(sbitmap_get_shallow);
 
 bool sbitmap_any_bit_set(const struct sbitmap *sb)
 {
-- 
2.39.2



* Re: [PATCH v2 2/2] lib/sbitmap: make sbitmap_get_shallow() static
  2025-07-29  3:19 ` [PATCH v2 2/2] lib/sbitmap: make sbitmap_get_shallow() static Yu Kuai
@ 2025-07-29  9:36   ` Jan Kara
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2025-07-29  9:36 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, axboe, akpm, yang.yang, dlemoal, ming.lei, linux-block,
	linux-kernel, yukuai3, yi.zhang, yangerkun, johnny.chenyi

On Tue 29-07-25 11:19:06, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Because it is only used in sbitmap.c
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

Indeed. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 1/2] lib/sbitmap: convert shallow_depth from one word to the whole sbitmap
  2025-07-29  3:19 ` [PATCH v2 1/2] " Yu Kuai
@ 2025-07-29 10:16   ` Jan Kara
  2025-07-30  2:03     ` Yu Kuai
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2025-07-29 10:16 UTC (permalink / raw)
  To: Yu Kuai
  Cc: jack, axboe, akpm, yang.yang, dlemoal, ming.lei, linux-block,
	linux-kernel, yukuai3, yi.zhang, yangerkun, johnny.chenyi,
	Omar Sandoval

On Tue 29-07-25 11:19:05, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> Currently, elevators record an internal 'async_depth' to throttle
> asynchronous requests, and they all calculate the shallow_depth based on
> sb->shift, on the assumption that sb->shift is the number of available
> tags in one word.
> 
> However, sb->shift is not the number of available tags in the last word,
> see __map_depth():
> 
> if (index == sb->map_nr - 1)
>   return sb->depth - (index << sb->shift);
> 
> As a consequence, if the last word is used, more tags can be allocated
> than expected. For example, assume nr_requests=256 with four words; in
> the worst case, if the user sets nr_requests=32, the first word is also
> the last word, and it is wrong to keep using the bits per word, which is
> 64, to calculate async_depth.
> 
> On the other hand, due to cgroup QoS, bfq may allow only one request to
> be allocated, but setting shallow_depth=1 will still allow one request
> per word to be allocated.
> 
> Fix these problems by applying shallow_depth to the whole sbitmap
> instead of to a single word, and change kyber, mq-deadline and bfq to
> follow this.
> 
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

I agree with these problems but AFAIU this implementation of shallow depth
has been done for a reason. Omar can chime in here as the original author
or perhaps Jens but the idea of current shallow depth implementation is
that each sbitmap user regardless of used shallow depth has a chance to
allocate from each sbitmap word which evenly distributes pressure among
available sbitmap words. With the implementation you've chosen there will
be higher pressure (and thus contention) on words with low indices.
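
To make that concrete (numbers just for illustration, assuming four
64-bit words, nr_requests = 256 and a 64-request async limit): the
current per-word conversion, e.g. mq-deadline's dd_to_word_depth(),
turns those 64 requests into 16 bits in each of the four words, so
async allocations keep spreading over all words via the allocation
hint, whereas with a whole-bitmap limit all 64 requests are satisfied
from the lowest-index word(s) and the async allocators contend there.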

So I think we would be good to fix issues with shallow depth for small
number of sbitmap words (because that's where these buggy cornercases may
matter in practice) but I believe the logic which constrains number of used
bits from each *word* when shallow_depth is specified should be kept.  It
might make sense to change the API so that shallow_depth is indeed
specified compared to the total size of the bitmap, not to the size of the
word (because that's confusing practically everybody I've met and is a
constant source of bugs) if it can be made to perform well.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 1/2] lib/sbitmap: convert shallow_depth from one word to the whole sbitmap
  2025-07-29 10:16   ` Jan Kara
@ 2025-07-30  2:03     ` Yu Kuai
  2025-07-30 13:03       ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Yu Kuai @ 2025-07-30  2:03 UTC (permalink / raw)
  To: Jan Kara, Yu Kuai
  Cc: axboe, akpm, yang.yang, dlemoal, ming.lei, linux-block,
	linux-kernel, yi.zhang, yangerkun, johnny.chenyi, Omar Sandoval,
	yukuai (C)

Hi,

On 2025/07/29 18:16, Jan Kara wrote:
> On Tue 29-07-25 11:19:05, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Currently, elevators record an internal 'async_depth' to throttle
>> asynchronous requests, and they all calculate the shallow_depth based on
>> sb->shift, on the assumption that sb->shift is the number of available
>> tags in one word.
>>
>> However, sb->shift is not the number of available tags in the last word,
>> see __map_depth():
>>
>> if (index == sb->map_nr - 1)
>>    return sb->depth - (index << sb->shift);
>>
>> As a consequence, if the last word is used, more tags can be allocated
>> than expected. For example, assume nr_requests=256 with four words; in
>> the worst case, if the user sets nr_requests=32, the first word is also
>> the last word, and it is wrong to keep using the bits per word, which is
>> 64, to calculate async_depth.
>>
>> On the other hand, due to cgroup QoS, bfq may allow only one request to
>> be allocated, but setting shallow_depth=1 will still allow one request
>> per word to be allocated.
>>
>> Fix these problems by applying shallow_depth to the whole sbitmap
>> instead of to a single word, and change kyber, mq-deadline and bfq to
>> follow this.
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> 
> I agree with these problems but AFAIU this implementation of shallow depth
> has been done for a reason. Omar can chime in here as the original author
> or perhaps Jens but the idea of current shallow depth implementation is
> that each sbitmap user regardless of used shallow depth has a chance to
> allocate from each sbitmap word which evenly distributes pressure among
> available sbitmap words. With the implementation you've chosen there will
> be higher pressure (and thus contention) on words with low indices.

Yes, this makes sense. However, considering that shallow depth is only
used by elevators, this higher pressure should be negligible for
deadline and bfq. As for kyber, this might be a problem.
> 
> So I think we would be good to fix issues with shallow depth for small
> number of sbitmap words (because that's where these buggy cornercases may
> matter in practice) but I believe the logic which constrains number of used
> bits from each *word* when shallow_depth is specified should be kept.  It
> might make sense to change the API so that shallow_depth is indeed
> specified compared to the total size of the bitmap, not to the size of the
> word (because that's confusing practically everybody I've met and is a
> constant source of bugs) if it can be made to perform well.

Do you think it would be OK to add a new shallow depth API that uses
the total size, and convert bfq and deadline to use it?

Thanks,
Kuai

> 
> 								Honza



* Re: [PATCH v2 1/2] lib/sbitmap: convert shallow_depth from one word to the whole sbitmap
  2025-07-30  2:03     ` Yu Kuai
@ 2025-07-30 13:03       ` Jan Kara
  2025-07-30 18:24         ` Yu Kuai
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2025-07-30 13:03 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Jan Kara, axboe, akpm, yang.yang, dlemoal, ming.lei, linux-block,
	linux-kernel, yi.zhang, yangerkun, johnny.chenyi, Omar Sandoval,
	yukuai (C)

On Wed 30-07-25 10:03:50, Yu Kuai wrote:
> On 2025/07/29 18:16, Jan Kara wrote:
> > On Tue 29-07-25 11:19:05, Yu Kuai wrote:
> > > From: Yu Kuai <yukuai3@huawei.com>
> > > 
> > > Currently, elevators record an internal 'async_depth' to throttle
> > > asynchronous requests, and they all calculate the shallow_depth based on
> > > sb->shift, on the assumption that sb->shift is the number of available
> > > tags in one word.
> > > 
> > > However, sb->shift is not the number of available tags in the last word,
> > > see __map_depth():
> > > 
> > > if (index == sb->map_nr - 1)
> > >    return sb->depth - (index << sb->shift);
> > > 
> > > As a consequence, if the last word is used, more tags can be allocated
> > > than expected. For example, assume nr_requests=256 with four words; in
> > > the worst case, if the user sets nr_requests=32, the first word is also
> > > the last word, and it is wrong to keep using the bits per word, which is
> > > 64, to calculate async_depth.
> > > 
> > > On the other hand, due to cgroup QoS, bfq may allow only one request to
> > > be allocated, but setting shallow_depth=1 will still allow one request
> > > per word to be allocated.
> > > 
> > > Fix these problems by applying shallow_depth to the whole sbitmap
> > > instead of to a single word, and change kyber, mq-deadline and bfq to
> > > follow this.
> > > 
> > > Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> > 
> > I agree with these problems but AFAIU this implementation of shallow depth
> > has been done for a reason. Omar can chime in here as the original author
> > or perhaps Jens but the idea of current shallow depth implementation is
> > that each sbitmap user regardless of used shallow depth has a chance to
> > allocate from each sbitmap word which evenly distributes pressure among
> > available sbitmap words. With the implementation you've chosen there will
> > be higher pressure (and thus contention) on words with low indices.
> 
> Yes, this makes sense. However, considering that shallow depth is only
> used by elevators, this higher pressure should be negligible for
> deadline and bfq. As for kyber, this might be a problem.

I agree that for bfq the overhead should be in the noise. For mq-deadline
it might be measurable but I'm not overly concerned.

> > So I think we would be good to fix issues with shallow depth for small
> > number of sbitmap words (because that's where these buggy cornercases may
> > matter in practice) but I believe the logic which constrains number of used
> > bits from each *word* when shallow_depth is specified should be kept.  It
> > might make sense to change the API so that shallow_depth is indeed
> > specified compared to the total size of the bitmap, not to the size of the
> > word (because that's confusing practically everybody I've met and is a
> > constant source of bugs) if it can be made to perform well.
> 
> Do you think it would be OK to add a new shallow depth API that uses
> the total size, and convert bfq and deadline to use it?

I think having two APIs will be even more confusing than the current state.
But as I wrote I think you can have API to specify shallow depth in total
size and in sbitmap_queue_get_shallow() do:

shallow_per_word = (shallow_depth << sb->shift) / sb->depth;
rounding_index = shallow_depth - shallow_per_word * sb->depth;

and allow depth shallow_per_word + 1 if current index < rounding_index and
exactly shallow_per_word if current index >= rounding_index. This will
still evenly distribute shallow depth among words and I don't think the
additional overhead of the several arithmetic operations will be visible.
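
Roughly something like this (an untested sketch of that idea; the helper
name is made up here, and it assumes full words, so the caller would
still clamp the result with __map_depth() for a partial last word):

/* illustrative only: spread a whole-bitmap shallow_depth evenly over
 * the words, giving the first 'rounding' words one extra bit */
static unsigned int shallow_depth_for_word(const struct sbitmap *sb,
					   unsigned int index,
					   unsigned int shallow_depth)
{
	unsigned int per_word = shallow_depth / sb->map_nr;
	unsigned int rounding = shallow_depth % sb->map_nr;

	return per_word + (index < rounding ? 1 : 0);
}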

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 1/2] lib/sbitmap: convert shallow_depth from one word to the whole sbitmap
  2025-07-30 13:03       ` Jan Kara
@ 2025-07-30 18:24         ` Yu Kuai
  2025-07-31  2:38           ` Yu Kuai
  0 siblings, 1 reply; 12+ messages in thread
From: Yu Kuai @ 2025-07-30 18:24 UTC (permalink / raw)
  To: Jan Kara, Yu Kuai
  Cc: axboe, akpm, yang.yang, dlemoal, ming.lei, linux-block,
	linux-kernel, yi.zhang, yangerkun, johnny.chenyi, Omar Sandoval,
	yukuai (C)

hi, Jan!

On 2025/7/30 21:03, Jan Kara wrote:
> I think having two APIs will be even more confusing than the current state.
> But as I wrote I think you can have API to specify shallow depth in total
> size and in sbitmap_queue_get_shallow() do:
>
> shallow_per_word = (shallow_depth << sb->shift) / sb->depth;
> rounding_index = shallow_depth - shallow_per_word * sb->depth;
>
> and allow depth shallow_per_word + 1 if current index < rounding_index and
> exactly shallow_per_word if current index >= rounding_index. This will
> still evenly distribute shallow depth among words and I don't think the
> additional overhead of the several arithmetic operations will be visible.
Yes, you're right, I did not get your idea before. Thanks for the
explanation and the suggestion :) Will follow this idea in the next
version.

Thanks



* Re: [PATCH v2 1/2] lib/sbitmap: convert shallow_depth from one word to the whole sbitmap
  2025-07-30 18:24         ` Yu Kuai
@ 2025-07-31  2:38           ` Yu Kuai
  2025-07-31 16:27             ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Yu Kuai @ 2025-07-31  2:38 UTC (permalink / raw)
  To: yukuai, Jan Kara, Yu Kuai
  Cc: axboe, akpm, yang.yang, dlemoal, ming.lei, linux-block,
	linux-kernel, yi.zhang, yangerkun, johnny.chenyi, Omar Sandoval,
	yukuai (C)

Hi,

On 2025/07/31 2:24, Yu Kuai wrote:
> hi, Jan!
> 
> On 2025/7/30 21:03, Jan Kara wrote:
>> I think having two APIs will be even more confusing than the current 
>> state.
>> But as I wrote I think you can have API to specify shallow depth in total
>> size and in sbitmap_queue_get_shallow() do:
>>
>> shallow_per_word = (shallow_depth << sb->shift) / sb->depth;
In order to consider the last word, I think we should use __map_depth()
here.

>> rounding_index = shallow_depth - shallow_per_word * sb->depth;
And then it's not possible to calculate this rounding index easily. How
about the following, although the remainder handling is not perfect.

  static unsigned int __map_depth_with_shallow(const struct sbitmap *sb,
                                              int index,
                                              unsigned int shallow_depth)
  {
         unsigned int word_depth = __map_depth(sb, index);
         unsigned int shallow_word_depth = word_depth * shallow_depth;
         unsigned reminder = do_div(shallow_word_depth, sb->depth);

         if (reminder && !(index & 0x1))
                 shallow_word_depth++;

         return shallow_word_depth;
  }
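
For example, with four full 64-bit words and shallow_depth=131 every
word sees reminder=192, so indexes 0 and 2 are bumped to 33 while 1 and
3 stay at 32, i.e. 130 bits in total instead of 131.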


>>
>> and allow depth shallow_per_word + 1 if current index < rounding_index 
>> and
>> exactly shallow_per_word if current index >= rounding_index. This will
>> still evenly distribute shallow depth among words and I don't think the
>> additional overhead of the several arithmetic operations will be visible.
> Yes, you're right, I did not get your idea before. Thanks for the 
> explanation
> and the suggestion :) Will follow this idea in the next version.

> 
> Thanks
> 
> .
> 



* Re: [PATCH v2 1/2] lib/sbitmap: convert shallow_depth from one word to the whole sbitmap
  2025-07-31  2:38           ` Yu Kuai
@ 2025-07-31 16:27             ` Jan Kara
  2025-08-01  0:24               ` Yu Kuai
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2025-07-31 16:27 UTC (permalink / raw)
  To: Yu Kuai
  Cc: yukuai, Jan Kara, axboe, akpm, yang.yang, dlemoal, ming.lei,
	linux-block, linux-kernel, yi.zhang, yangerkun, johnny.chenyi,
	Omar Sandoval, yukuai (C)

On Thu 31-07-25 10:38:58, Yu Kuai wrote:
> Hi,
> 
> On 2025/07/31 2:24, Yu Kuai wrote:
> > hi, Jan!
> > 
> > On 2025/7/30 21:03, Jan Kara wrote:
> > > I think having two APIs will be even more confusing than the current
> > > state.
> > > But as I wrote I think you can have API to specify shallow depth in total
> > > size and in sbitmap_queue_get_shallow() do:
> > > 
> > > shallow_per_word = (shallow_depth << sb->shift) / sb->depth;
> In order to consider the last word, I think we should use __map_depth()
> here.

Right.

> > > rounding_index = shallow_depth - shallow_per_word * sb->depth;
> And then it's not possible to calculate this rounding index easily. How
> about the following, although the remainder handling is not perfect.
> 
>  static unsigned int __map_depth_with_shallow(const struct sbitmap *sb,
>                                              int index,
>                                              unsigned int shallow_depth)
>  {
>         unsigned int word_depth = __map_depth(sb, index);
>         unsigned int shallow_word_depth = word_depth * shallow_depth;
>         unsigned reminder = do_div(shallow_word_depth, sb->depth);
> 
>         if (reminder && !(index & 0x1))

Well, why not:
	if (remainder > index)
?

That should accurately distribute the remainder across the remaining words,
shouldn't it?

>                 shallow_word_depth++;
> 
>         return shallow_word_depth;
>  }

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v2 1/2] lib/sbitmap: convert shallow_depth from one word to the whole sbitmap
  2025-07-31 16:27             ` Jan Kara
@ 2025-08-01  0:24               ` Yu Kuai
  2025-08-01  7:44                 ` Yu Kuai
  0 siblings, 1 reply; 12+ messages in thread
From: Yu Kuai @ 2025-08-01  0:24 UTC (permalink / raw)
  To: Jan Kara, Yu Kuai
  Cc: yukuai, axboe, akpm, yang.yang, dlemoal, ming.lei, linux-block,
	linux-kernel, yi.zhang, yangerkun, johnny.chenyi, Omar Sandoval,
	yukuai (C)

Hi,

On 2025/08/01 0:27, Jan Kara wrote:
> On Thu 31-07-25 10:38:58, Yu Kuai wrote:
>> Hi,
>>
>> On 2025/07/31 2:24, Yu Kuai wrote:
>>> hi, Jan!
>>>
>>> On 2025/7/30 21:03, Jan Kara wrote:
>>>> I think having two APIs will be even more confusing than the current
>>>> state.
>>>> But as I wrote I think you can have API to specify shallow depth in total
>>>> size and in sbitmap_queue_get_shallow() do:
>>>>
>>>> shallow_per_word = (shallow_depth << sb->shift) / sb->depth;
>> In order to consider the last word, I think we should use __map_depth()
>> here.
> 
> Right.
> 
>>>> rounding_index = shallow_depth - shallow_per_word * sb->depth;
>> And then it's not possible to calculate this rounding index easily. How
>> about the following, although the remainder handling is not perfect.
>>
>>   static unsigned int __map_depth_with_shallow(const struct sbitmap *sb,
>>                                               int index,
>>                                               unsigned int shallow_depth)
>>   {
>>          unsigned int word_depth = __map_depth(sb, index);
>>          unsigned int shallow_word_depth = word_depth * shallow_depth;
>>          unsigned reminder = do_div(shallow_word_depth, sb->depth);
>>
>>          if (reminder && !(index & 0x1))
> 
> Well, why not:
> 	if (remainder > index)
Do you mean reminder > index * shallow_depth? This looks correct, and
with consideration for the last word:

if (index == sb->map_nr - 1)
	shallow_word_depth = max(shallow_word_depth, 1);
else if (reminder > index * shallow_depth)
	shallow_word_depth++;

Thanks,
Kuai

> ?
> 
> That should accurately distribute the remainder across the remaining words,
> shouldn't it?
> 
>>                  shallow_word_depth++;
>>
>>          return shallow_word_depth;
>>   }
> 
> 								Honza
> 



* Re: [PATCH v2 1/2] lib/sbitmap: convert shallow_depth from one word to the whole sbitmap
  2025-08-01  0:24               ` Yu Kuai
@ 2025-08-01  7:44                 ` Yu Kuai
  0 siblings, 0 replies; 12+ messages in thread
From: Yu Kuai @ 2025-08-01  7:44 UTC (permalink / raw)
  To: Yu Kuai, Jan Kara
  Cc: yukuai, axboe, akpm, yang.yang, dlemoal, ming.lei, linux-block,
	linux-kernel, yi.zhang, yangerkun, johnny.chenyi, Omar Sandoval,
	yukuai (C)

Hi,

On 2025/08/01 8:24, Yu Kuai wrote:
> Hi,
> 
> On 2025/08/01 0:27, Jan Kara wrote:
>> On Thu 31-07-25 10:38:58, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2025/07/31 2:24, Yu Kuai wrote:
>>>> hi, Jan!
>>>>
>>>> On 2025/7/30 21:03, Jan Kara wrote:
>>>>> I think having two APIs will be even more confusing than the current
>>>>> state.
>>>>> But as I wrote I think you can have API to specify shallow depth in 
>>>>> total
>>>>> size and in sbitmap_queue_get_shallow() do:
>>>>>
>>>>> shallow_per_word = (shallow_depth << sb->shift) / sb->depth;
>>> In order to consider the last word, I think we should use __map_depth()
>>> here.
>>
>> Right.
>>
>>>>> rounding_index = shallow_depth - shallow_per_word * sb->depth;
>>> And then it's not possible to calculate this rounding index easily. How
>>> about the following, although the remainder handling is not perfect.
>>>
>>>   static unsigned int __map_depth_with_shallow(const struct sbitmap *sb,
>>>                                               int index,
>>>                                               unsigned int 
>>> shallow_depth)
>>>   {
>>>          unsigned int word_depth = __map_depth(sb, index);
>>>          unsigned int shallow_word_depth = word_depth * shallow_depth;
>>>          unsigned reminder = do_div(shallow_word_depth, sb->depth);
>>>
>>>          if (reminder && !(index & 0x1))
>>
>> Well, why not:
>>     if (remainder > index)
> Do you mean reminder > index * shallow_depth? This looks correct, and
> with consideration for the last word:
> 
> if (index == sb->map_nr - 1)
>      shallow_word_depth = max(shallow_word_depth, 1);
> else if (reminder > index * shallow_depth)

Sorry, there is a mistake: word_depth should be used here. Following is
an example for a 4-word sbitmap (64+64+64+32); I did the math manually,
and the results look perfect :)

| shallow_depth | word0  | word1  | word2  | word3 | total |
| ------------- | ------ | ------ | ------ | ----- | ----- |
| 224           | 64     | 64     | 64     | 32    | 224   |
| 112           | 32     | 32     | 32     | 16    | 112   |
| 113           | 32 + 1 | 32     | 32     | 16    | 113   |
| 114           | 32 + 1 | 32 + 1 | 32     | 16    | 114   |
| 115           | 32 + 1 | 32 + 1 | 32 + 1 | 16    | 115   |
| 116           | 33     | 33     | 33     | 16+1  | 116   |
>      shallow_word_depth++;
> 
> Thanks,
> Kuai
> 
>> ?
>>
>> That should accurately distribute the remainder across the remaining 
>> words,
>> shouldn't it?
>>
>>>                  shallow_word_depth++;
>>>
>>>          return shallow_word_depth;
>>>   }
>>
>>                                 Honza
>>
> 
> .
> 

