* [PATCH RFC md-6.16 v3 01/19] md/md-bitmap: add {start, end}_discard in bitmap_operations
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
@ 2025-05-12 1:19 ` Yu Kuai
2025-05-12 4:39 ` Christoph Hellwig
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 02/19] md: support discard for bitmap ops Yu Kuai
` (18 subsequent siblings)
19 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 1:19 UTC (permalink / raw)
To: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
Prepare to support discard in new md bitmap.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md-bitmap.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
index d3d50629af91..c3fc051c88e9 100644
--- a/drivers/md/md-bitmap.h
+++ b/drivers/md/md-bitmap.h
@@ -94,6 +94,11 @@ struct bitmap_operations {
unsigned long sectors);
void (*endwrite)(struct mddev *mddev, sector_t offset,
unsigned long sectors);
+ int (*start_discard)(struct mddev *mddev, sector_t offset,
+ unsigned long sectors);
+ void (*end_discard)(struct mddev *mddev, sector_t offset,
+ unsigned long sectors);
+
bool (*start_sync)(struct mddev *mddev, sector_t offset,
sector_t *blocks, bool degraded);
void (*end_sync)(struct mddev *mddev, sector_t offset, sector_t *blocks);
--
2.39.2
^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 01/19] md/md-bitmap: add {start, end}_discard in bitmap_operations
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 01/19] md/md-bitmap: add {start, end}_discard in bitmap_operations Yu Kuai
@ 2025-05-12 4:39 ` Christoph Hellwig
0 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 4:39 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi
On Mon, May 12, 2025 at 09:19:09AM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> Prepare to support discard in new md bitmap.
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH RFC md-6.16 v3 02/19] md: support discard for bitmap ops
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 01/19] md/md-bitmap: add {start, end}_discard in bitmap_operations Yu Kuai
@ 2025-05-12 1:19 ` Yu Kuai
2025-05-12 4:41 ` Christoph Hellwig
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 03/19] md/md-bitmap: remove parameter slot from bitmap_create() Yu Kuai
` (17 subsequent siblings)
19 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 1:19 UTC (permalink / raw)
To: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
If {start, end}_discard is implemented from bitmap_operations, then they
will be used to handle discard IO. Currently md-bitmap handle discard
the same as normal write, prepare to support discard for new md bitmap.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md.c | 19 +++++++++++++++----
drivers/md/md.h | 1 +
2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 32b997dfe6f4..c72c13ed4253 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8849,14 +8849,24 @@ static void md_bitmap_start(struct mddev *mddev,
mddev->pers->bitmap_sector(mddev, &md_io_clone->offset,
&md_io_clone->sectors);
- mddev->bitmap_ops->startwrite(mddev, md_io_clone->offset,
- md_io_clone->sectors);
+ if (unlikely(md_io_clone->rw == STAT_DISCARD) &&
+ mddev->bitmap_ops->start_discard)
+ mddev->bitmap_ops->start_discard(mddev, md_io_clone->offset,
+ md_io_clone->sectors);
+ else
+ mddev->bitmap_ops->startwrite(mddev, md_io_clone->offset,
+ md_io_clone->sectors);
}
static void md_bitmap_end(struct mddev *mddev, struct md_io_clone *md_io_clone)
{
- mddev->bitmap_ops->endwrite(mddev, md_io_clone->offset,
- md_io_clone->sectors);
+ if (unlikely(md_io_clone->rw == STAT_DISCARD) &&
+ mddev->bitmap_ops->end_discard)
+ mddev->bitmap_ops->end_discard(mddev, md_io_clone->offset,
+ md_io_clone->sectors);
+ else
+ mddev->bitmap_ops->endwrite(mddev, md_io_clone->offset,
+ md_io_clone->sectors);
}
static void md_end_clone_io(struct bio *bio)
@@ -8895,6 +8905,7 @@ static void md_clone_bio(struct mddev *mddev, struct bio **bio)
if (bio_data_dir(*bio) == WRITE && md_bitmap_enabled(mddev)) {
md_io_clone->offset = (*bio)->bi_iter.bi_sector;
md_io_clone->sectors = bio_sectors(*bio);
+ md_io_clone->rw = op_stat_group(bio_op(*bio));
md_bitmap_start(mddev, md_io_clone);
}
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 6eb5dfdf2f55..c474bf74c345 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -850,6 +850,7 @@ struct md_io_clone {
unsigned long start_time;
sector_t offset;
unsigned long sectors;
+ enum stat_group rw;
struct bio bio_clone;
};
--
2.39.2
^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 02/19] md: support discard for bitmap ops
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 02/19] md: support discard for bitmap ops Yu Kuai
@ 2025-05-12 4:41 ` Christoph Hellwig
2025-05-12 6:05 ` Yu Kuai
0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 4:41 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi
On Mon, May 12, 2025 at 09:19:10AM +0800, Yu Kuai wrote:
> +++ b/drivers/md/md.c
> @@ -8849,14 +8849,24 @@ static void md_bitmap_start(struct mddev *mddev,
> mddev->pers->bitmap_sector(mddev, &md_io_clone->offset,
> &md_io_clone->sectors);
>
> - mddev->bitmap_ops->startwrite(mddev, md_io_clone->offset,
> - md_io_clone->sectors);
> + if (unlikely(md_io_clone->rw == STAT_DISCARD) &&
> + mddev->bitmap_ops->start_discard)
> + mddev->bitmap_ops->start_discard(mddev, md_io_clone->offset,
> + md_io_clone->sectors);
> + else
> + mddev->bitmap_ops->startwrite(mddev, md_io_clone->offset,
> + md_io_clone->sectors);
> }
This interface feels weird, as it would still call into the write
interfaces when the discard ones are not defined instead of doing
nothing. Also shouldn't discard also use a different interface
than md_bitmap_start in the caller?
I'd also expect the final version of this to be merged with the
previous patch, as adding an interface without the only user is a
bit odd.
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 02/19] md: support discard for bitmap ops
2025-05-12 4:41 ` Christoph Hellwig
@ 2025-05-12 6:05 ` Yu Kuai
2025-05-12 6:12 ` Christoph Hellwig
0 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 6:05 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: xni, colyli, agk, snitzer, mpatocka, song, linux-kernel, dm-devel,
linux-raid, yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
Hi,
在 2025/05/12 12:41, Christoph Hellwig 写道:
> On Mon, May 12, 2025 at 09:19:10AM +0800, Yu Kuai wrote:
>> +++ b/drivers/md/md.c
>> @@ -8849,14 +8849,24 @@ static void md_bitmap_start(struct mddev *mddev,
>> mddev->pers->bitmap_sector(mddev, &md_io_clone->offset,
>> &md_io_clone->sectors);
>>
>> - mddev->bitmap_ops->startwrite(mddev, md_io_clone->offset,
>> - md_io_clone->sectors);
>> + if (unlikely(md_io_clone->rw == STAT_DISCARD) &&
>> + mddev->bitmap_ops->start_discard)
>> + mddev->bitmap_ops->start_discard(mddev, md_io_clone->offset,
>> + md_io_clone->sectors);
>> + else
>> + mddev->bitmap_ops->startwrite(mddev, md_io_clone->offset,
>> + md_io_clone->sectors);
>> }
>
> This interface feels weird, as it would still call into the write
> interfaces when the discard ones are not defined instead of doing
> nothing. Also shouldn't discard also use a different interface
> than md_bitmap_start in the caller?
This is because the old bitmap handle discard the same as write, I
can't do nothing in this case. Do you prefer also reuse the write
api to new discard api for old bitmap?
For the caller, I think it's fine, since bitmap framwork already
calculate sectors and len for discard as well.
>
> I'd also expect the final version of this to be merged with the
> previous patch, as adding an interface without the only user is a
> bit odd.
Sure.
Thanks,
Kuai
> .
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 02/19] md: support discard for bitmap ops
2025-05-12 6:05 ` Yu Kuai
@ 2025-05-12 6:12 ` Christoph Hellwig
2025-05-12 6:22 ` Yu Kuai
0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 6:12 UTC (permalink / raw)
To: Yu Kuai
Cc: Christoph Hellwig, xni, colyli, agk, snitzer, mpatocka, song,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi, yukuai (C)
On Mon, May 12, 2025 at 02:05:56PM +0800, Yu Kuai wrote:
>>> - mddev->bitmap_ops->startwrite(mddev, md_io_clone->offset,
>>> - md_io_clone->sectors);
>>> + if (unlikely(md_io_clone->rw == STAT_DISCARD) &&
>>> + mddev->bitmap_ops->start_discard)
>>> + mddev->bitmap_ops->start_discard(mddev, md_io_clone->offset,
>>> + md_io_clone->sectors);
>>> + else
>>> + mddev->bitmap_ops->startwrite(mddev, md_io_clone->offset,
>>> + md_io_clone->sectors);
>>> }
>>
>> This interface feels weird, as it would still call into the write
>> interfaces when the discard ones are not defined instead of doing
>> nothing. Also shouldn't discard also use a different interface
>> than md_bitmap_start in the caller?
>
> This is because the old bitmap handle discard the same as write, I
> can't do nothing in this case. Do you prefer also reuse the write
> api to new discard api for old bitmap?
It can just point the discard method to the same function as the
existing write one.
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 02/19] md: support discard for bitmap ops
2025-05-12 6:12 ` Christoph Hellwig
@ 2025-05-12 6:22 ` Yu Kuai
0 siblings, 0 replies; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 6:22 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: xni, colyli, agk, snitzer, mpatocka, song, linux-kernel, dm-devel,
linux-raid, yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
Hi,
在 2025/05/12 14:12, Christoph Hellwig 写道:
> On Mon, May 12, 2025 at 02:05:56PM +0800, Yu Kuai wrote:
>>>> - mddev->bitmap_ops->startwrite(mddev, md_io_clone->offset,
>>>> - md_io_clone->sectors);
>>>> + if (unlikely(md_io_clone->rw == STAT_DISCARD) &&
>>>> + mddev->bitmap_ops->start_discard)
>>>> + mddev->bitmap_ops->start_discard(mddev, md_io_clone->offset,
>>>> + md_io_clone->sectors);
>>>> + else
>>>> + mddev->bitmap_ops->startwrite(mddev, md_io_clone->offset,
>>>> + md_io_clone->sectors);
>>>> }
>>>
>>> This interface feels weird, as it would still call into the write
>>> interfaces when the discard ones are not defined instead of doing
>>> nothing. Also shouldn't discard also use a different interface
>>> than md_bitmap_start in the caller?
>>
>> This is because the old bitmap handle discard the same as write, I
>> can't do nothing in this case. Do you prefer also reuse the write
>> api to new discard api for old bitmap?
>
> It can just point the discard method to the same function as the
> existing write one.
Yes, this is exactly want I mean, I'll update this in the next version.
Thanks,
Kuai
>
> .
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH RFC md-6.16 v3 03/19] md/md-bitmap: remove parameter slot from bitmap_create()
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 01/19] md/md-bitmap: add {start, end}_discard in bitmap_operations Yu Kuai
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 02/19] md: support discard for bitmap ops Yu Kuai
@ 2025-05-12 1:19 ` Yu Kuai
2025-05-12 4:41 ` Christoph Hellwig
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 04/19] md: add a new sysfs api bitmap_version Yu Kuai
` (16 subsequent siblings)
19 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 1:19 UTC (permalink / raw)
To: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
All callers pass in '-1' for 'slot', hence it can be removed.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md-bitmap.c | 6 +++---
drivers/md/md-bitmap.h | 2 +-
drivers/md/md.c | 6 +++---
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 56a1430a398e..4c5067783b7a 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -2183,9 +2183,9 @@ static struct bitmap *__bitmap_create(struct mddev *mddev, int slot)
return ERR_PTR(err);
}
-static int bitmap_create(struct mddev *mddev, int slot)
+static int bitmap_create(struct mddev *mddev)
{
- struct bitmap *bitmap = __bitmap_create(mddev, slot);
+ struct bitmap *bitmap = __bitmap_create(mddev, -1);
if (IS_ERR(bitmap))
return PTR_ERR(bitmap);
@@ -2647,7 +2647,7 @@ location_store(struct mddev *mddev, const char *buf, size_t len)
}
mddev->bitmap_info.offset = offset;
- rv = bitmap_create(mddev, -1);
+ rv = bitmap_create(mddev);
if (rv)
goto out;
diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
index c3fc051c88e9..41d09c6d0c14 100644
--- a/drivers/md/md-bitmap.h
+++ b/drivers/md/md-bitmap.h
@@ -74,7 +74,7 @@ struct bitmap_operations {
struct md_submodule_head head;
bool (*enabled)(void *data);
- int (*create)(struct mddev *mddev, int slot);
+ int (*create)(struct mddev *mddev);
int (*resize)(struct mddev *mddev, sector_t blocks, int chunksize);
int (*load)(struct mddev *mddev);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index c72c13ed4253..dc2b2b274677 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6255,7 +6255,7 @@ int md_run(struct mddev *mddev)
}
if (err == 0 && pers->sync_request && md_bitmap_registered(mddev) &&
(mddev->bitmap_info.file || mddev->bitmap_info.offset)) {
- err = mddev->bitmap_ops->create(mddev, -1);
+ err = mddev->bitmap_ops->create(mddev);
if (err)
pr_warn("%s: failed to create bitmap (%d)\n",
mdname(mddev), err);
@@ -7324,7 +7324,7 @@ static int set_bitmap_file(struct mddev *mddev, int fd)
err = 0;
if (mddev->pers) {
if (fd >= 0) {
- err = mddev->bitmap_ops->create(mddev, -1);
+ err = mddev->bitmap_ops->create(mddev);
if (!err)
err = mddev->bitmap_ops->load(mddev);
@@ -7648,7 +7648,7 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
mddev->bitmap_info.default_offset;
mddev->bitmap_info.space =
mddev->bitmap_info.default_space;
- rv = mddev->bitmap_ops->create(mddev, -1);
+ rv = mddev->bitmap_ops->create(mddev);
if (!rv)
rv = mddev->bitmap_ops->load(mddev);
--
2.39.2
^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 03/19] md/md-bitmap: remove parameter slot from bitmap_create()
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 03/19] md/md-bitmap: remove parameter slot from bitmap_create() Yu Kuai
@ 2025-05-12 4:41 ` Christoph Hellwig
0 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 4:41 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH RFC md-6.16 v3 04/19] md: add a new sysfs api bitmap_version
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
` (2 preceding siblings ...)
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 03/19] md/md-bitmap: remove parameter slot from bitmap_create() Yu Kuai
@ 2025-05-12 1:19 ` Yu Kuai
2025-05-12 4:51 ` Christoph Hellwig
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 05/19] md: delay registration of bitmap_ops until creating bitmap Yu Kuai
` (15 subsequent siblings)
19 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 1:19 UTC (permalink / raw)
To: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
The api will be used by mdadm to set bitmap version while creating new
array or assemble array, prepare to add a new bitmap.
Currently available options are:
cat /sys/block/md0/md/bitmap_version
none [bitmap]
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md.c | 87 ++++++++++++++++++++++++++++++++++++++++++++++---
drivers/md/md.h | 2 ++
2 files changed, 84 insertions(+), 5 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index dc2b2b274677..e16d3b4033d5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -672,13 +672,13 @@ static void active_io_release(struct percpu_ref *ref)
static void no_op(struct percpu_ref *r) {}
-static void mddev_set_bitmap_ops(struct mddev *mddev, enum md_submodule_id id)
+static void mddev_set_bitmap_ops(struct mddev *mddev)
{
xa_lock(&md_submodule);
- mddev->bitmap_ops = xa_load(&md_submodule, id);
+ mddev->bitmap_ops = xa_load(&md_submodule, mddev->bitmap_id);
xa_unlock(&md_submodule);
if (!mddev->bitmap_ops)
- pr_warn_once("md: can't find bitmap id %d\n", id);
+ pr_warn_once("md: can't find bitmap id %d\n", mddev->bitmap_id);
}
static void mddev_clear_bitmap_ops(struct mddev *mddev)
@@ -688,8 +688,8 @@ static void mddev_clear_bitmap_ops(struct mddev *mddev)
int mddev_init(struct mddev *mddev)
{
- /* TODO: support more versions */
- mddev_set_bitmap_ops(mddev, ID_BITMAP);
+ mddev->bitmap_id = ID_BITMAP;
+ mddev_set_bitmap_ops(mddev);
if (percpu_ref_init(&mddev->active_io, active_io_release,
PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) {
@@ -4155,6 +4155,82 @@ new_level_store(struct mddev *mddev, const char *buf, size_t len)
static struct md_sysfs_entry md_new_level =
__ATTR(new_level, 0664, new_level_show, new_level_store);
+static ssize_t
+bitmap_version_show(struct mddev *mddev, char *page)
+{
+ struct md_submodule_head *head;
+ unsigned long i;
+ ssize_t len = 0;
+
+ if (mddev->bitmap_id == ID_BITMAP_NONE)
+ len += sprintf(page + len, "[none] ");
+ else
+ len += sprintf(page + len, "none ");
+
+ xa_lock(&md_submodule);
+ xa_for_each(&md_submodule, i, head) {
+ if (head->type != MD_BITMAP)
+ continue;
+
+ if (mddev->bitmap_id == head->id)
+ len += sprintf(page + len, "[%s] ", head->name);
+ else
+ len += sprintf(page + len, "%s ", head->name);
+ }
+ xa_unlock(&md_submodule);
+
+ len += sprintf(page + len, "\n");
+ return len;
+}
+
+static ssize_t
+bitmap_version_store(struct mddev *mddev, const char *buf, size_t len)
+{
+ struct md_submodule_head *head;
+ enum md_submodule_id id;
+ unsigned long i;
+ int err;
+
+ if (mddev->bitmap_ops)
+ return -EBUSY;
+
+ err = kstrtoint(buf, 10, &id);
+ if (!err) {
+ if (id == ID_BITMAP_NONE) {
+ mddev->bitmap_id = id;
+ return len;
+ }
+
+ xa_lock(&md_submodule);
+ head = xa_load(&md_submodule, id);
+ xa_unlock(&md_submodule);
+
+ if (head && head->type == MD_BITMAP) {
+ mddev->bitmap_id = id;
+ return len;
+ }
+ }
+
+ if (cmd_match(buf, "none")) {
+ mddev->bitmap_id = ID_BITMAP_NONE;
+ return len;
+ }
+
+ xa_lock(&md_submodule);
+ xa_for_each(&md_submodule, i, head) {
+ if (head->type == MD_BITMAP && cmd_match(buf, head->name)) {
+ mddev->bitmap_id = head->id;
+ xa_unlock(&md_submodule);
+ return len;
+ }
+ }
+ xa_unlock(&md_submodule);
+ return -ENOENT;
+}
+
+static struct md_sysfs_entry md_bitmap_version =
+__ATTR(bitmap_version, 0664, bitmap_version_show, bitmap_version_store);
+
static ssize_t
layout_show(struct mddev *mddev, char *page)
{
@@ -5719,6 +5795,7 @@ __ATTR(serialize_policy, S_IRUGO | S_IWUSR, serialize_policy_show,
static struct attribute *md_default_attrs[] = {
&md_level.attr,
&md_new_level.attr,
+ &md_bitmap_version.attr,
&md_layout.attr,
&md_raid_disks.attr,
&md_uuid.attr,
diff --git a/drivers/md/md.h b/drivers/md/md.h
index c474bf74c345..135d95ba1ebb 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -40,6 +40,7 @@ enum md_submodule_id {
ID_CLUSTER,
ID_BITMAP,
ID_LLBITMAP, /* TODO */
+ ID_BITMAP_NONE,
};
struct md_submodule_head {
@@ -565,6 +566,7 @@ struct mddev {
struct percpu_ref writes_pending;
int sync_checkers; /* # of threads checking writes_pending */
+ enum md_submodule_id bitmap_id;
void *bitmap; /* the bitmap for the device */
struct bitmap_operations *bitmap_ops;
struct {
--
2.39.2
^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 04/19] md: add a new sysfs api bitmap_version
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 04/19] md: add a new sysfs api bitmap_version Yu Kuai
@ 2025-05-12 4:51 ` Christoph Hellwig
2025-05-12 6:12 ` Yu Kuai
0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 4:51 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi
On Mon, May 12, 2025 at 09:19:12AM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> The api will be used by mdadm to set bitmap version while creating new
> array or assemble array, prepare to add a new bitmap.
Maybe call it type or variant instead of version?
Also this needs a Documentation/ file like for other new sysfs APIs.
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 04/19] md: add a new sysfs api bitmap_version
2025-05-12 4:51 ` Christoph Hellwig
@ 2025-05-12 6:12 ` Yu Kuai
0 siblings, 0 replies; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 6:12 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: xni, colyli, agk, snitzer, mpatocka, song, linux-kernel, dm-devel,
linux-raid, yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
Hi,
在 2025/05/12 12:51, Christoph Hellwig 写道:
> On Mon, May 12, 2025 at 09:19:12AM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> The api will be used by mdadm to set bitmap version while creating new
>> array or assemble array, prepare to add a new bitmap.
>
> Maybe call it type or variant instead of version?
>
> Also this needs a Documentation/ file like for other new sysfs APIs.
Sure, will update in the next version.
Thanks,
Kuai
>
>
> .
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH RFC md-6.16 v3 05/19] md: delay registration of bitmap_ops until creating bitmap
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
` (3 preceding siblings ...)
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 04/19] md: add a new sysfs api bitmap_version Yu Kuai
@ 2025-05-12 1:19 ` Yu Kuai
2025-05-12 4:53 ` Christoph Hellwig
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 06/19] md: add a new parameter 'offset' to md_super_write() Yu Kuai
` (14 subsequent siblings)
19 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 1:19 UTC (permalink / raw)
To: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
Currently bitmap_ops is registered while allocating mddev, this is fine
when there is only one bitmap_ops, however, after introduing a new
bitmap_ops, user space need a time window to choose which bitmap_ops to
use while creating new array.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md.c | 85 ++++++++++++++++++++++++++++++++-----------------
1 file changed, 56 insertions(+), 29 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index e16d3b4033d5..ba2b981b017c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -674,32 +674,47 @@ static void no_op(struct percpu_ref *r) {}
static void mddev_set_bitmap_ops(struct mddev *mddev)
{
+ struct bitmap_operations *old = mddev->bitmap_ops;
+ struct md_submodule_head *head;
+
+ if (mddev->bitmap_id == ID_BITMAP_NONE ||
+ (old && old->head.id == mddev->bitmap_id))
+ return;
+
xa_lock(&md_submodule);
- mddev->bitmap_ops = xa_load(&md_submodule, mddev->bitmap_id);
+ head = xa_load(&md_submodule, mddev->bitmap_id);
xa_unlock(&md_submodule);
- if (!mddev->bitmap_ops)
- pr_warn_once("md: can't find bitmap id %d\n", mddev->bitmap_id);
+
+ if (WARN_ON_ONCE(!head || head->type != MD_BITMAP)) {
+ pr_err("md: can't find bitmap id %d\n", mddev->bitmap_id);
+ return;
+ }
+
+ if (old && old->group)
+ sysfs_remove_group(&mddev->kobj, old->group);
+
+ mddev->bitmap_ops = (void *)head;
+ if (mddev->bitmap_ops && mddev->bitmap_ops->group &&
+ sysfs_create_group(&mddev->kobj, mddev->bitmap_ops->group))
+ pr_warn("md: cannot register extra bitmap attributes for %s\n",
+ mdname(mddev));
}
static void mddev_clear_bitmap_ops(struct mddev *mddev)
{
+ if (mddev->bitmap_ops && mddev->bitmap_ops->group)
+ sysfs_remove_group(&mddev->kobj, mddev->bitmap_ops->group);
mddev->bitmap_ops = NULL;
}
int mddev_init(struct mddev *mddev)
{
- mddev->bitmap_id = ID_BITMAP;
- mddev_set_bitmap_ops(mddev);
-
if (percpu_ref_init(&mddev->active_io, active_io_release,
- PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) {
- mddev_clear_bitmap_ops(mddev);
+ PERCPU_REF_ALLOW_REINIT, GFP_KERNEL))
return -ENOMEM;
- }
if (percpu_ref_init(&mddev->writes_pending, no_op,
PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) {
- mddev_clear_bitmap_ops(mddev);
percpu_ref_exit(&mddev->active_io);
return -ENOMEM;
}
@@ -727,6 +742,7 @@ int mddev_init(struct mddev *mddev)
mddev->resync_min = 0;
mddev->resync_max = MaxSector;
mddev->level = LEVEL_NONE;
+ mddev->bitmap_id = ID_BITMAP;
INIT_WORK(&mddev->sync_work, md_start_sync);
INIT_WORK(&mddev->del_work, mddev_delayed_delete);
@@ -737,7 +753,6 @@ EXPORT_SYMBOL_GPL(mddev_init);
void mddev_destroy(struct mddev *mddev)
{
- mddev_clear_bitmap_ops(mddev);
percpu_ref_exit(&mddev->active_io);
percpu_ref_exit(&mddev->writes_pending);
}
@@ -6086,11 +6101,6 @@ struct mddev *md_alloc(dev_t dev, char *name)
return ERR_PTR(error);
}
- if (md_bitmap_registered(mddev) && mddev->bitmap_ops->group)
- if (sysfs_create_group(&mddev->kobj, mddev->bitmap_ops->group))
- pr_warn("md: cannot register extra bitmap attributes for %s\n",
- mdname(mddev));
-
kobject_uevent(&mddev->kobj, KOBJ_ADD);
mddev->sysfs_state = sysfs_get_dirent_safe(mddev->kobj.sd, "array_state");
mddev->sysfs_level = sysfs_get_dirent_safe(mddev->kobj.sd, "level");
@@ -6166,6 +6176,25 @@ static void md_safemode_timeout(struct timer_list *t)
static int start_dirty_degraded;
+static int md_bitmap_create(struct mddev *mddev)
+{
+ if (!md_bitmap_registered(mddev))
+ mddev_set_bitmap_ops(mddev);
+ if (!mddev->bitmap_ops)
+ return -ENOENT;
+
+ return mddev->bitmap_ops->create(mddev);
+}
+
+static void md_bitmap_destroy(struct mddev *mddev)
+{
+ if (!md_bitmap_registered(mddev))
+ return;
+
+ mddev->bitmap_ops->destroy(mddev);
+ mddev_clear_bitmap_ops(mddev);
+}
+
int md_run(struct mddev *mddev)
{
int err;
@@ -6330,9 +6359,9 @@ int md_run(struct mddev *mddev)
(unsigned long long)pers->size(mddev, 0, 0) / 2);
err = -EINVAL;
}
- if (err == 0 && pers->sync_request && md_bitmap_registered(mddev) &&
+ if (err == 0 && pers->sync_request &&
(mddev->bitmap_info.file || mddev->bitmap_info.offset)) {
- err = mddev->bitmap_ops->create(mddev);
+ err = md_bitmap_create(mddev);
if (err)
pr_warn("%s: failed to create bitmap (%d)\n",
mdname(mddev), err);
@@ -6405,8 +6434,7 @@ int md_run(struct mddev *mddev)
pers->free(mddev, mddev->private);
mddev->private = NULL;
put_pers(pers);
- if (md_bitmap_registered(mddev))
- mddev->bitmap_ops->destroy(mddev);
+ md_bitmap_destroy(mddev);
abort:
bioset_exit(&mddev->io_clone_set);
exit_sync_set:
@@ -6429,7 +6457,7 @@ int do_md_run(struct mddev *mddev)
if (md_bitmap_registered(mddev)) {
err = mddev->bitmap_ops->load(mddev);
if (err) {
- mddev->bitmap_ops->destroy(mddev);
+ md_bitmap_destroy(mddev);
goto out;
}
}
@@ -6620,8 +6648,7 @@ static void __md_stop(struct mddev *mddev)
{
struct md_personality *pers = mddev->pers;
- if (md_bitmap_registered(mddev))
- mddev->bitmap_ops->destroy(mddev);
+ md_bitmap_destroy(mddev);
mddev_detach(mddev);
spin_lock(&mddev->lock);
mddev->pers = NULL;
@@ -7401,16 +7428,16 @@ static int set_bitmap_file(struct mddev *mddev, int fd)
err = 0;
if (mddev->pers) {
if (fd >= 0) {
- err = mddev->bitmap_ops->create(mddev);
+ err = md_bitmap_create(mddev);
if (!err)
err = mddev->bitmap_ops->load(mddev);
if (err) {
- mddev->bitmap_ops->destroy(mddev);
+ md_bitmap_destroy(mddev);
fd = -1;
}
} else if (fd < 0) {
- mddev->bitmap_ops->destroy(mddev);
+ md_bitmap_destroy(mddev);
}
}
@@ -7725,12 +7752,12 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
mddev->bitmap_info.default_offset;
mddev->bitmap_info.space =
mddev->bitmap_info.default_space;
- rv = mddev->bitmap_ops->create(mddev);
+ rv = md_bitmap_create(mddev);
if (!rv)
rv = mddev->bitmap_ops->load(mddev);
if (rv)
- mddev->bitmap_ops->destroy(mddev);
+ md_bitmap_destroy(mddev);
} else {
struct md_bitmap_stats stats;
@@ -7756,7 +7783,7 @@ static int update_array_info(struct mddev *mddev, mdu_array_info_t *info)
put_cluster_ops(mddev);
mddev->safemode_delay = DEFAULT_SAFEMODE_DELAY;
}
- mddev->bitmap_ops->destroy(mddev);
+ md_bitmap_destroy(mddev);
mddev->bitmap_info.offset = 0;
}
}
--
2.39.2
^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 05/19] md: delay registration of bitmap_ops until creating bitmap
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 05/19] md: delay registration of bitmap_ops until creating bitmap Yu Kuai
@ 2025-05-12 4:53 ` Christoph Hellwig
2025-05-12 6:25 ` Yu Kuai
0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 4:53 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi
> + head = xa_load(&md_submodule, mddev->bitmap_id);
> xa_unlock(&md_submodule);
> - if (!mddev->bitmap_ops)
> - pr_warn_once("md: can't find bitmap id %d\n", mddev->bitmap_id);
> +
> + if (WARN_ON_ONCE(!head || head->type != MD_BITMAP)) {
> + pr_err("md: can't find bitmap id %d\n", mddev->bitmap_id);
> + return;
> + }
This needs a real error return, doesn't it?
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 05/19] md: delay registration of bitmap_ops until creating bitmap
2025-05-12 4:53 ` Christoph Hellwig
@ 2025-05-12 6:25 ` Yu Kuai
0 siblings, 0 replies; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 6:25 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: xni, colyli, agk, snitzer, mpatocka, song, linux-kernel, dm-devel,
linux-raid, yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
Hi,
在 2025/05/12 12:53, Christoph Hellwig 写道:
>> + head = xa_load(&md_submodule, mddev->bitmap_id);
>> xa_unlock(&md_submodule);
>> - if (!mddev->bitmap_ops)
>> - pr_warn_once("md: can't find bitmap id %d\n", mddev->bitmap_id);
>> +
>> + if (WARN_ON_ONCE(!head || head->type != MD_BITMAP)) {
>> + pr_err("md: can't find bitmap id %d\n", mddev->bitmap_id);
>> + return;
>> + }
>
> This needs a real error return, doesn't it?
The caller check and return -ENOENT if mddev->bitmap_ops is NULL, I
can change the code here by checking return value instead.
Thanks,
Kuai
>
>
>
> .
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH RFC md-6.16 v3 06/19] md: add a new parameter 'offset' to md_super_write()
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
` (4 preceding siblings ...)
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 05/19] md: delay registration of bitmap_ops until creating bitmap Yu Kuai
@ 2025-05-12 1:19 ` Yu Kuai
2025-05-12 4:55 ` Christoph Hellwig
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 07/19] md/md-bitmap: add a new helper skip_sync_blocks() in bitmap_operations Yu Kuai
` (13 subsequent siblings)
19 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 1:19 UTC (permalink / raw)
To: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
The parameter is always set to 0 for now, following patches will use
this helper to write llbitmap to underlying disks, allow writing
dirty sectors instead of the whole page.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md-bitmap.c | 3 ++-
drivers/md/md.c | 13 +++++++------
drivers/md/md.h | 3 ++-
3 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 4c5067783b7a..a5602cb5d756 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -468,7 +468,8 @@ static int __write_sb_page(struct md_rdev *rdev, struct bitmap *bitmap,
return -EINVAL;
}
- md_super_write(mddev, rdev, sboff + ps, (int)min(size, bitmap_limit), page);
+ md_super_write(mddev, rdev, sboff + ps, (int)min(size, bitmap_limit),
+ page, 0);
return 0;
}
diff --git a/drivers/md/md.c b/drivers/md/md.c
index ba2b981b017c..4329ecfbe8ff 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1037,7 +1037,8 @@ static void super_written(struct bio *bio)
}
void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
- sector_t sector, int size, struct page *page)
+ sector_t sector, int size, struct page *page,
+ unsigned int offset)
{
/* write first size bytes of page to sector of rdev
* Increment mddev->pending_writes before returning
@@ -1062,7 +1063,7 @@ void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
atomic_inc(&rdev->nr_pending);
bio->bi_iter.bi_sector = sector;
- __bio_add_page(bio, page, size, 0);
+ __bio_add_page(bio, page, size, offset);
bio->bi_private = rdev;
bio->bi_end_io = super_written;
@@ -1673,7 +1674,7 @@ super_90_rdev_size_change(struct md_rdev *rdev, sector_t num_sectors)
num_sectors = (sector_t)(2ULL << 32) - 2;
do {
md_super_write(rdev->mddev, rdev, rdev->sb_start, rdev->sb_size,
- rdev->sb_page);
+ rdev->sb_page, 0);
} while (md_super_wait(rdev->mddev) < 0);
return num_sectors;
}
@@ -2322,7 +2323,7 @@ super_1_rdev_size_change(struct md_rdev *rdev, sector_t num_sectors)
sb->sb_csum = calc_sb_1_csum(sb);
do {
md_super_write(rdev->mddev, rdev, rdev->sb_start, rdev->sb_size,
- rdev->sb_page);
+ rdev->sb_page, 0);
} while (md_super_wait(rdev->mddev) < 0);
return num_sectors;
@@ -2833,7 +2834,7 @@ void md_update_sb(struct mddev *mddev, int force_change)
if (!test_bit(Faulty, &rdev->flags)) {
md_super_write(mddev,rdev,
rdev->sb_start, rdev->sb_size,
- rdev->sb_page);
+ rdev->sb_page, 0);
pr_debug("md: (write) %pg's sb offset: %llu\n",
rdev->bdev,
(unsigned long long)rdev->sb_start);
@@ -2842,7 +2843,7 @@ void md_update_sb(struct mddev *mddev, int force_change)
md_super_write(mddev, rdev,
rdev->badblocks.sector,
rdev->badblocks.size << 9,
- rdev->bb_page);
+ rdev->bb_page, 0);
rdev->badblocks.size = 0;
}
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 135d95ba1ebb..99f6c7a92b48 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -890,7 +890,8 @@ void md_free_cloned_bio(struct bio *bio);
extern bool __must_check md_flush_request(struct mddev *mddev, struct bio *bio);
extern void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
- sector_t sector, int size, struct page *page);
+ sector_t sector, int size, struct page *page,
+ unsigned int offset);
extern int md_super_wait(struct mddev *mddev);
extern int sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
struct page *page, blk_opf_t opf, bool metadata_op);
--
2.39.2
^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 06/19] md: add a new parameter 'offset' to md_super_write()
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 06/19] md: add a new parameter 'offset' to md_super_write() Yu Kuai
@ 2025-05-12 4:55 ` Christoph Hellwig
2025-05-12 6:32 ` Yu Kuai
0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 4:55 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi
On Mon, May 12, 2025 at 09:19:14AM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> The parameter is always set to 0 for now, following patches will use
> this helper to write llbitmap to underlying disks, allow writing
> dirty sectors instead of the whole page.
Givne that there is nothing super-block specific in md_super_write,
maybe use the chance to rename it?
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 06/19] md: add a new parameter 'offset' to md_super_write()
2025-05-12 4:55 ` Christoph Hellwig
@ 2025-05-12 6:32 ` Yu Kuai
2025-05-12 6:39 ` Christoph Hellwig
0 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 6:32 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: xni, colyli, agk, snitzer, mpatocka, song, linux-kernel, dm-devel,
linux-raid, yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
Hi,
在 2025/05/12 12:55, Christoph Hellwig 写道:
> On Mon, May 12, 2025 at 09:19:14AM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> The parameter is always set to 0 for now, following patches will use
>> this helper to write llbitmap to underlying disks, allow writing
>> dirty sectors instead of the whole page.
>
> Givne that there is nothing super-block specific in md_super_write,
> maybe use the chance to rename it?
Ok, how about md_write_metadata()?
Thanks,
Kuai
>
> .
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 06/19] md: add a new parameter 'offset' to md_super_write()
2025-05-12 6:32 ` Yu Kuai
@ 2025-05-12 6:39 ` Christoph Hellwig
0 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 6:39 UTC (permalink / raw)
To: Yu Kuai
Cc: Christoph Hellwig, xni, colyli, agk, snitzer, mpatocka, song,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi, yukuai (C)
On Mon, May 12, 2025 at 02:32:28PM +0800, Yu Kuai wrote:
> Ok, how about md_write_metadata()?
Sounds good.
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH RFC md-6.16 v3 07/19] md/md-bitmap: add a new helper skip_sync_blocks() in bitmap_operations
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
` (5 preceding siblings ...)
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 06/19] md: add a new parameter 'offset' to md_super_write() Yu Kuai
@ 2025-05-12 1:19 ` Yu Kuai
2025-05-12 4:56 ` Christoph Hellwig
2025-05-22 2:44 ` Xiao Ni
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 08/19] md/md-bitmap: add a new helper blocks_synced() " Yu Kuai
` (12 subsequent siblings)
19 siblings, 2 replies; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 1:19 UTC (permalink / raw)
To: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
This helper is used to check if blocks can be skipped before calling
into pers->sync_request(), llbiltmap will use this helper to skip
resync for unwritten/clean data blocks, and recovery/check/repair for
unwritten data blocks;
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md-bitmap.h | 1 +
drivers/md/md.c | 7 +++++++
2 files changed, 8 insertions(+)
diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
index 41d09c6d0c14..13be2a10801a 100644
--- a/drivers/md/md-bitmap.h
+++ b/drivers/md/md-bitmap.h
@@ -99,6 +99,7 @@ struct bitmap_operations {
void (*end_discard)(struct mddev *mddev, sector_t offset,
unsigned long sectors);
+ sector_t (*skip_sync_blocks)(struct mddev *mddev, sector_t offset);
bool (*start_sync)(struct mddev *mddev, sector_t offset,
sector_t *blocks, bool degraded);
void (*end_sync)(struct mddev *mddev, sector_t offset, sector_t *blocks);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4329ecfbe8ff..c23ee9c19cf9 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9370,6 +9370,12 @@ void md_do_sync(struct md_thread *thread)
if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
break;
+ if (mddev->bitmap_ops && mddev->bitmap_ops->skip_sync_blocks) {
+ sectors = mddev->bitmap_ops->skip_sync_blocks(mddev, j);
+ if (sectors)
+ goto update;
+ }
+
sectors = mddev->pers->sync_request(mddev, j, max_sectors,
&skipped);
if (sectors == 0) {
@@ -9385,6 +9391,7 @@ void md_do_sync(struct md_thread *thread)
if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
break;
+update:
j += sectors;
if (j > max_sectors)
/* when skipping, extra large numbers can be returned. */
--
2.39.2
^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 07/19] md/md-bitmap: add a new helper skip_sync_blocks() in bitmap_operations
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 07/19] md/md-bitmap: add a new helper skip_sync_blocks() in bitmap_operations Yu Kuai
@ 2025-05-12 4:56 ` Christoph Hellwig
2025-05-22 2:44 ` Xiao Ni
1 sibling, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 4:56 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 07/19] md/md-bitmap: add a new helper skip_sync_blocks() in bitmap_operations
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 07/19] md/md-bitmap: add a new helper skip_sync_blocks() in bitmap_operations Yu Kuai
2025-05-12 4:56 ` Christoph Hellwig
@ 2025-05-22 2:44 ` Xiao Ni
1 sibling, 0 replies; 63+ messages in thread
From: Xiao Ni @ 2025-05-22 2:44 UTC (permalink / raw)
To: Yu Kuai, hch, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi
在 2025/5/12 上午9:19, Yu Kuai 写道:
> From: Yu Kuai <yukuai3@huawei.com>
>
> This helper is used to check if blocks can be skipped before calling
> into pers->sync_request(), llbiltmap will use this helper to skip
typo error s/llbiltmap/llbitmap/g
> resync for unwritten/clean data blocks, and recovery/check/repair for
> unwritten data blocks;
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
> drivers/md/md-bitmap.h | 1 +
> drivers/md/md.c | 7 +++++++
> 2 files changed, 8 insertions(+)
>
> diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
> index 41d09c6d0c14..13be2a10801a 100644
> --- a/drivers/md/md-bitmap.h
> +++ b/drivers/md/md-bitmap.h
> @@ -99,6 +99,7 @@ struct bitmap_operations {
> void (*end_discard)(struct mddev *mddev, sector_t offset,
> unsigned long sectors);
>
> + sector_t (*skip_sync_blocks)(struct mddev *mddev, sector_t offset);
> bool (*start_sync)(struct mddev *mddev, sector_t offset,
> sector_t *blocks, bool degraded);
> void (*end_sync)(struct mddev *mddev, sector_t offset, sector_t *blocks);
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 4329ecfbe8ff..c23ee9c19cf9 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -9370,6 +9370,12 @@ void md_do_sync(struct md_thread *thread)
> if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
> break;
>
> + if (mddev->bitmap_ops && mddev->bitmap_ops->skip_sync_blocks) {
> + sectors = mddev->bitmap_ops->skip_sync_blocks(mddev, j);
> + if (sectors)
> + goto update;
> + }
> +
> sectors = mddev->pers->sync_request(mddev, j, max_sectors,
> &skipped);
> if (sectors == 0) {
> @@ -9385,6 +9391,7 @@ void md_do_sync(struct md_thread *thread)
> if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
> break;
>
> +update:
> j += sectors;
> if (j > max_sectors)
> /* when skipping, extra large numbers can be returned. */
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH RFC md-6.16 v3 08/19] md/md-bitmap: add a new helper blocks_synced() in bitmap_operations
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
` (6 preceding siblings ...)
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 07/19] md/md-bitmap: add a new helper skip_sync_blocks() in bitmap_operations Yu Kuai
@ 2025-05-12 1:19 ` Yu Kuai
2025-05-12 4:57 ` Christoph Hellwig
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 09/19] md: add a new recovery_flag MD_RECOVERY_LAZY_RECOVER Yu Kuai
` (11 subsequent siblings)
19 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 1:19 UTC (permalink / raw)
To: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
Currently, raid456 must perform a whole array initial recovery to build
initail xor data, then IO to the array won't have to read all the blocks
in underlying disks.
This behavior will affect IO performance a lot, and nowadays there are
huge disks and the initial recovery can take a long time. Hence llbitmap
will support lazy initial recovery in following patches. This helper is
used to check if data blocks is synced or not, if not then IO will still
have to read all blocks.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md-bitmap.h | 1 +
drivers/md/raid5.c | 6 ++++++
2 files changed, 7 insertions(+)
diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
index 13be2a10801a..4e27f5f793b7 100644
--- a/drivers/md/md-bitmap.h
+++ b/drivers/md/md-bitmap.h
@@ -99,6 +99,7 @@ struct bitmap_operations {
void (*end_discard)(struct mddev *mddev, sector_t offset,
unsigned long sectors);
+ bool (*blocks_synced)(struct mddev *mddev, sector_t offset);
sector_t (*skip_sync_blocks)(struct mddev *mddev, sector_t offset);
bool (*start_sync)(struct mddev *mddev, sector_t offset,
sector_t *blocks, bool degraded);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 7e66a99f29af..e5d3d8facb4b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3748,6 +3748,7 @@ static int want_replace(struct stripe_head *sh, int disk_idx)
static int need_this_block(struct stripe_head *sh, struct stripe_head_state *s,
int disk_idx, int disks)
{
+ struct mddev *mddev = sh->raid_conf->mddev;
struct r5dev *dev = &sh->dev[disk_idx];
struct r5dev *fdev[2] = { &sh->dev[s->failed_num[0]],
&sh->dev[s->failed_num[1]] };
@@ -3762,6 +3763,11 @@ static int need_this_block(struct stripe_head *sh, struct stripe_head_state *s,
*/
return 0;
+ /* The initial recover is not done, must read everything */
+ if (mddev->bitmap_ops && mddev->bitmap_ops->blocks_synced &&
+ !mddev->bitmap_ops->blocks_synced(mddev, sh->sector))
+ return 1;
+
if (dev->toread ||
(dev->towrite && !test_bit(R5_OVERWRITE, &dev->flags)))
/* We need this block to directly satisfy a request */
--
2.39.2
^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 08/19] md/md-bitmap: add a new helper blocks_synced() in bitmap_operations
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 08/19] md/md-bitmap: add a new helper blocks_synced() " Yu Kuai
@ 2025-05-12 4:57 ` Christoph Hellwig
0 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 4:57 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi
Btw, both for the previous and this patch: function pointers in
structures are methods, not helpers.
Otherwise looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH RFC md-6.16 v3 09/19] md: add a new recovery_flag MD_RECOVERY_LAZY_RECOVER
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
` (7 preceding siblings ...)
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 08/19] md/md-bitmap: add a new helper blocks_synced() " Yu Kuai
@ 2025-05-12 1:19 ` Yu Kuai
2025-05-12 4:57 ` Christoph Hellwig
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 10/19] md/md-llbitmap: add data structure definition and comments Yu Kuai
` (10 subsequent siblings)
19 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 1:19 UTC (permalink / raw)
To: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
This flag is used by llbitmap in later patches to skip raid456 initial
recover and delay building initial xor data to first write.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md.c | 12 +++++++++++-
drivers/md/md.h | 2 ++
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index c23ee9c19cf9..a5dd7a403ea5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9133,6 +9133,14 @@ static sector_t md_sync_position(struct mddev *mddev, enum sync_action action)
start = rdev->recovery_offset;
rcu_read_unlock();
+ /*
+ * If there are no spares, and raid456 lazy initial recover is
+ * requested.
+ */
+ if (test_bit(MD_RECOVERY_LAZY_RECOVER, &mddev->recovery) &&
+ start == MaxSector)
+ start = 0;
+
/* If there is a bitmap, we need to make sure all
* writes that started before we added a spare
* complete before we start doing a recovery.
@@ -9697,6 +9705,7 @@ static bool md_choose_sync_action(struct mddev *mddev, int *spares)
if (mddev->recovery_cp < MaxSector) {
set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
+ clear_bit(MD_RECOVERY_LAZY_RECOVER, &mddev->recovery);
return true;
}
@@ -9706,7 +9715,7 @@ static bool md_choose_sync_action(struct mddev *mddev, int *spares)
* re-add.
*/
*spares = remove_and_add_spares(mddev, NULL);
- if (*spares) {
+ if (*spares || test_bit(MD_RECOVERY_LAZY_RECOVER, &mddev->recovery)) {
clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
@@ -10029,6 +10038,7 @@ void md_reap_sync_thread(struct mddev *mddev)
clear_bit(MD_RECOVERY_RESHAPE, &mddev->recovery);
clear_bit(MD_RECOVERY_REQUESTED, &mddev->recovery);
clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
+ clear_bit(MD_RECOVERY_LAZY_RECOVER, &mddev->recovery);
/*
* We call mddev->cluster_ops->update_size here because sync_size could
* be changed by md_update_sb, and MD_RECOVERY_RESHAPE is cleared,
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 99f6c7a92b48..0c89bf0e8e4f 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -667,6 +667,8 @@ enum recovery_flags {
MD_RECOVERY_RESHAPE,
/* remote node is running resync thread */
MD_RESYNCING_REMOTE,
+ /* raid456 lazy initial recover */
+ MD_RECOVERY_LAZY_RECOVER,
};
enum md_ro_state {
--
2.39.2
^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 09/19] md: add a new recovery_flag MD_RECOVERY_LAZY_RECOVER
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 09/19] md: add a new recovery_flag MD_RECOVERY_LAZY_RECOVER Yu Kuai
@ 2025-05-12 4:57 ` Christoph Hellwig
0 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 4:57 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH RFC md-6.16 v3 10/19] md/md-llbitmap: add data structure definition and comments
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
` (8 preceding siblings ...)
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 09/19] md: add a new recovery_flag MD_RECOVERY_LAZY_RECOVER Yu Kuai
@ 2025-05-12 1:19 ` Yu Kuai
2025-05-12 5:09 ` Christoph Hellwig
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 11/19] md/md-llbitmap: implement bitmap IO Yu Kuai
` (9 subsequent siblings)
19 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 1:19 UTC (permalink / raw)
To: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md-llbitmap.c | 281 +++++++++++++++++++++++++++++++++++++++
1 file changed, 281 insertions(+)
create mode 100644 drivers/md/md-llbitmap.c
diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
new file mode 100644
index 000000000000..8ab4c77abd32
--- /dev/null
+++ b/drivers/md/md-llbitmap.c
@@ -0,0 +1,281 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/blkdev.h>
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/slab.h>
+#include <linux/init.h>
+#include <linux/timer.h>
+#include <linux/sched.h>
+#include <linux/list.h>
+#include <linux/file.h>
+#include <linux/mount.h>
+#include <linux/buffer_head.h>
+#include <linux/seq_file.h>
+#include <trace/events/block.h>
+
+#include "md.h"
+#include "md-bitmap.h"
+
+/*
+ * #### Background
+ *
+ * Redundant data is used to enhance data fault tolerance, and the storage
+ * method for redundant data vary depending on the RAID levels. And it's
+ * important to maintain the consistency of redundant data.
+ *
+ * Bitmap is used to record which data blocks have been synchronized and which
+ * ones need to be resynchronized or recovered. Each bit in the bitmap
+ * represents a segment of data in the array. When a bit is set, it indicates
+ * that the multiple redundant copies of that data segment may not be
+ * consistent. Data synchronization can be performed based on the bitmap after
+ * power failure or readding a disk. If there is no bitmap, a full disk
+ * synchronization is required.
+ *
+ * #### Key Features
+ *
+ * - IO fastpath is lockless, if user issues lots of write IO to the same
+ * bitmap bit in a short time, only the first write have additional overhead
+ * to update bitmap bit, no additional overhead for the following writes;
+ * - support only resync or recover written data, means in the case creating
+ * new array or replacing with a new disk, there is no need to do a full disk
+ * resync/recovery;
+ *
+ * #### Key Concept
+ *
+ * ##### State Machine
+ *
+ * Each bit is one byte, contain 6 difference state, see llbitmap_state. And
+ * there are total 8 differenct actions, see llbitmap_action, can change state:
+ *
+ * llbitmap state machine: transitions between states
+ *
+ * | | Startwrite | Startsync | Endsync | Abortsync| Reload | Daemon | Discard | Stale |
+ * | --------- | ---------- | --------- | ------- | ------- | -------- | ------ | --------- | --------- |
+ * | Unwritten | Dirty | x | x | x | x | x | x | x |
+ * | Clean | Dirty | x | x | x | x | x | Unwritten | NeedSync |
+ * | Dirty | x | x | x | x | NeedSync | Clean | Unwritten | NeedSync |
+ * | NeedSync | x | Syncing | x | x | x | x | Unwritten | x |
+ * | Syncing | x | Syncing | Dirty | NeedSync | NeedSync | x | Unwritten | NeedSync |
+ *
+ * Typical scenarios:
+ *
+ * 1) Create new array
+ * All bits will be set to Unwritten by default, if --assume-clean is set,
+ * All bits will be set to Clean instead.
+ *
+ * 2) write data, raid1/raid10 have full copy of data, while raid456 doesn't and
+ * rely on xor data
+ *
+ * 2.1) write new data to raid1/raid10:
+ * Unwritten --StartWrite--> Dirty
+ *
+ * 2.2) write new data to raid456:
+ * Unwritten --StartWrite--> NeedSync
+ *
+ * Because the initial recover for raid456 is skipped, the xor data is not build
+ * yet, the bit must set to NeedSync first and after lazy initial recover is
+ * finished, the bit will finially set to Dirty(see 4.1 and 4.4);
+ *
+ * 2.3) cover write
+ * Clean --StartWrite--> Dirty
+ *
+ * 3) daemon, if the array is not degraded:
+ * Dirty --Daemon--> Clean
+ *
+ * For degraded array, the Dirty bit will never be cleared, prevent full disk
+ * recovery while readding a removed disk.
+ *
+ * 4) discard
+ * {Clean, Dirty, NeedSync, Syncing} --Discard--> Unwritten
+ *
+ * 5) resync and recover
+ *
+ * 5.1) common process
+ * NeedSync --Startsync--> Syncing --Endsync--> Dirty --Daemon--> Clean
+ *
+ * 5.2) resync after power failure
+ * Dirty --Reload--> NeedSync
+ *
+ * 5.3) recover while replacing with a new disk
+ * By default, the old bitmap framework will recover all data, and llbitmap
+ * implement this by a new helper llbitmap_skip_sync_blocks:
+ *
+ * skip recover for bits other than dirty or clean;
+ *
+ * 5.4) lazy initial recover for raid5:
+ * By default, the old bitmap framework will only allow new recover when there
+ * are spares(new disk), a new recovery flag MD_RECOVERY_LAZY_RECOVER is add
+ * to perform raid456 lazy recover for set bits(from 2.2).
+ *
+ * ##### Bitmap IO
+ *
+ * ##### Chunksize
+ *
+ * The default bitmap size is 128k, incluing 1k bitmap super block, and
+ * the default size of segment of data in the array each bit(chunksize) is 64k,
+ * and chunksize will adjust to twice the old size each time if the total number
+ * bits is not less than 127k.(see llbitmap_init)
+ *
+ * ##### READ
+ *
+ * While creating bitmap, all pages will be allocated and read for llbitmap,
+ * there won't be read afterwards
+ *
+ * ##### WRITE
+ *
+ * WRITE IO is divided into logical_block_size of the array, the dirty state
+ * of each block is tracked independently, for example:
+ *
+ * each page is 4k, contain 8 blocks; each block is 512 bytes contain 512 bit;
+ *
+ * | page0 | page1 | ... | page 31 |
+ * | |
+ * | \-----------------------\
+ * | |
+ * | block0 | block1 | ... | block 8|
+ * | |
+ * | \-----------------\
+ * | |
+ * | bit0 | bit1 | ... | bit511 |
+ *
+ * From IO path, if one bit is changed to Dirty or NeedSync, the corresponding
+ * block will be marked dirty, such block must write first before the IO is
+ * issued. This behaviour will affect IO performance, to reduce the impact, if
+ * multiple bits are changed in the same block in a short time, all bits in this
+ * block will be changed to Dirty/NeedSync, so that there won't be any overhead
+ * until daemon clears dirty bits.
+ *
+ * ##### Dirty Bits syncronization
+ *
+ * IO fast path will set bits to dirty, and those dirty bits will be cleared
+ * by daemon after IO is done. llbitmap_barrier is used to synchronize between
+ * IO path and daemon;
+ *
+ * IO path:
+ * 1) try to grab a reference, if succeed, set expire time after 5s and return;
+ * 2) if failed to grab a reference, wait for daemon to finish clearing dirty
+ * bits;
+ *
+ * Daemon(Daemon will be waken up every daemon_sleep seconds):
+ * For each page:
+ * 1) check if page expired, if not skip this page; for expired page:
+ * 2) suspend the page and wait for inflight write IO to be done;
+ * 3) change dirty page to clean;
+ * 4) resume the page;
+ */
+
+#define LLBITMAP_MAJOR_HI 6
+
+#define BITMAP_MAX_SECTOR (128 * 2)
+#define BITMAP_MAX_PAGES 32
+#define BITMAP_SB_SIZE 1024
+/* 64k is the max IO size of sync IO for raid1/raid10 */
+#define MIN_CHUNK_SIZE (64 * 2)
+
+#define DEFAULT_DAEMON_SLEEP 30
+
+#define BARRIER_IDLE 5
+
+enum llbitmap_state {
+ /* No valid data, init state after assemble the array */
+ BitUnwritten = 0,
+ /* data is consistent */
+ BitClean,
+ /* data will be consistent after IO is done, set directly for writes */
+ BitDirty,
+ /*
+ * data need to be resynchronized:
+ * 1) set directly for writes if array is degraded, prevent full disk
+ * synchronization after readding a disk;
+ * 2) reassemble the array after power failure, and dirty bits are
+ * found after reloading the bitmap;
+ */
+ BitNeedSync,
+ /* data is synchronizing */
+ BitSyncing,
+ nr_llbitmap_state,
+ BitNone = 0xff,
+};
+
+enum llbitmap_action {
+ /* User write new data, this is the only acton from IO fast path */
+ BitmapActionStartwrite = 0,
+ /* Start recovery */
+ BitmapActionStartsync,
+ /* Finish recovery */
+ BitmapActionEndsync,
+ /* Failed recovery */
+ BitmapActionAbortsync,
+ /* Reassemble the array */
+ BitmapActionReload,
+ /* Daemon thread is trying to clear dirty bits */
+ BitmapActionDaemon,
+ /* Data is deleted */
+ BitmapActionDiscard,
+ /*
+ * Bitmap is stale, mark all bits in addition to BitUnwritten to
+ * BitNeedSync.
+ */
+ BitmapActionStale,
+ nr_llbitmap_action,
+ /* Init state is BitUnwritten */
+ BitmapActionInit,
+};
+
+enum barrier_state {
+ LLPageFlush = 0,
+ LLPageDirty,
+};
+/*
+ * page level barrier to synchronize between dirty bit by write IO and clean bit
+ * by daemon.
+ */
+struct llbitmap_barrier {
+ char *data;
+ struct percpu_ref active;
+ unsigned long expire;
+ unsigned long flags;
+ /* Per block size dirty state, maximum 64k page / 512 sector = 128 */
+ DECLARE_BITMAP(dirty, 128);
+ wait_queue_head_t wait;
+} ____cacheline_aligned_in_smp;
+
+struct llbitmap {
+ struct mddev *mddev;
+ int nr_pages;
+ struct page *pages[BITMAP_MAX_PAGES];
+ struct llbitmap_barrier barrier[BITMAP_MAX_PAGES];
+
+ /* shift of one chunk */
+ unsigned long chunkshift;
+ /* size of one chunk in sector */
+ unsigned long chunksize;
+ /* total number of chunks */
+ unsigned long chunks;
+ int io_size;
+ int bits_per_page;
+ /* fires on first BitDirty state */
+ struct timer_list pending_timer;
+ struct work_struct daemon_work;
+
+ unsigned long flags;
+ __u64 events_cleared;
+};
+
+struct llbitmap_unplug_work {
+ struct work_struct work;
+ struct llbitmap *llbitmap;
+ struct completion *done;
+};
+
+static struct workqueue_struct *md_llbitmap_io_wq;
+static struct workqueue_struct *md_llbitmap_unplug_wq;
+
+static char state_machine[nr_llbitmap_state][nr_llbitmap_action] = {
+ [BitUnwritten] = {BitDirty, BitNone, BitNone, BitNone, BitNone, BitNone, BitNone, BitNone},
+ [BitClean] = {BitDirty, BitNone, BitNone, BitNone, BitNone, BitNone, BitUnwritten, BitNeedSync},
+ [BitDirty] = {BitNone, BitNone, BitNone, BitNone, BitNeedSync, BitClean, BitUnwritten, BitNeedSync},
+ [BitNeedSync] = {BitNone, BitSyncing, BitNone, BitNone, BitNone, BitNone, BitUnwritten, BitNone},
+ [BitSyncing] = {BitNone, BitSyncing, BitDirty, BitNeedSync, BitNeedSync, BitNone, BitUnwritten, BitNeedSync},
+};
--
2.39.2
^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 10/19] md/md-llbitmap: add data structure definition and comments
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 10/19] md/md-llbitmap: add data structure definition and comments Yu Kuai
@ 2025-05-12 5:09 ` Christoph Hellwig
2025-05-12 8:05 ` Yu Kuai
0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 5:09 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi
On Mon, May 12, 2025 at 09:19:18AM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
You'll need a commit message here. Also given how tiny this is
vs the rest of the file I'm not entirely sure there is much of a
point in splittng it out.
> +#include <linux/buffer_head.h>
This shouldn't be needed here I think.
> + * llbitmap state machine: transitions between states
Can you split the table below into two columns so that it's easily
readabe?
> +#define LLBITMAP_MAJOR_HI 6
I think you'll want to consolidate all the different version in
md-bitmap.h and document them.
> +#define BITMAP_MAX_SECTOR (128 * 2)
This appears unused even with the later series.
> +#define BITMAP_MAX_PAGES 32
Can you comment on how we ended up with this number?
> +#define BITMAP_SB_SIZE 1024
I'd add this to md-bitmap.h as it's part of the on-disk format, and
also make md-bitmap.c use it.
> +#define DEFAULT_DAEMON_SLEEP 30
> +
> +#define BARRIER_IDLE 5
Can you document these?
> +enum llbitmap_action {
> + /* User write new data, this is the only acton from IO fast path */
s/acton/action/
> +/*
> + * page level barrier to synchronize between dirty bit by write IO and clean bit
> + * by daemon.
Overly lon line. Also please stat full sentences with a capitalized
word.
> + */
> +struct llbitmap_barrier {
> + char *data;
This is really a states array as far as I can tell. Maybe name it
more descriptively and throw in a comment.
> + struct percpu_ref active;
> + unsigned long expire;
> + unsigned long flags;
> + /* Per block size dirty state, maximum 64k page / 512 sector = 128 */
> + DECLARE_BITMAP(dirty, 128);
> + wait_queue_head_t wait;
> +} ____cacheline_aligned_in_smp;
> +
> +struct llbitmap {
> + struct mddev *mddev;
> + int nr_pages;
> + struct page *pages[BITMAP_MAX_PAGES];
> + struct llbitmap_barrier barrier[BITMAP_MAX_PAGES];
Should the bitmap and the page be in the same array to have less
cache line sharing between the users/ The above
____cacheline_aligned_in_smp suggests you are at least somewhat
woerried about cache line sharing.
> +static char state_machine[nr_llbitmap_state][nr_llbitmap_action] = {
> + [BitUnwritten] = {BitDirty, BitNone, BitNone, BitNone, BitNone, BitNone, BitNone, BitNone},
Maybe used named initializers to make this more readable. In fact that
might remove the need for the big table in the comment at the top of the
file because it would be just as readable.
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 10/19] md/md-llbitmap: add data structure definition and comments
2025-05-12 5:09 ` Christoph Hellwig
@ 2025-05-12 8:05 ` Yu Kuai
2025-05-12 13:24 ` Christoph Hellwig
0 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 8:05 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: xni, colyli, agk, snitzer, mpatocka, song, linux-kernel, dm-devel,
linux-raid, yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
Hi,
在 2025/05/12 13:09, Christoph Hellwig 写道:
> On Mon, May 12, 2025 at 09:19:18AM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>
> You'll need a commit message here. Also given how tiny this is
> vs the rest of the file I'm not entirely sure there is much of a
> point in splittng it out.
I'm not sure, this way is simpiler, however do you prefer add structure
and comments by following patches gradually?
>
>> +#include <linux/buffer_head.h>
>
> This shouldn't be needed here I think.
Yes, I just copy the headers from old bitmap file.
>
>> + * llbitmap state machine: transitions between states
>
> Can you split the table below into two columns so that it's easily
> readabe?
Sure
>
>> +#define LLBITMAP_MAJOR_HI 6
>
> I think you'll want to consolidate all the different version in
> md-bitmap.h and document them.
Ok, then I'll also move other values from md-bitmap.c.
>
>> +#define BITMAP_MAX_SECTOR (128 * 2)
>
> This appears unused even with the later series.
Yes, this is used in the old version, I'll remove it.
>
>> +#define BITMAP_MAX_PAGES 32
>
> Can you comment on how we ended up with this number?
Ok, the bitmap superblock is created by mdadm, and mdadm limit
bitmap size max to 128k, then
>
>> +#define BITMAP_SB_SIZE 1024
>
> I'd add this to md-bitmap.h as it's part of the on-disk format, and
> also make md-bitmap.c use it.
Ok
>
>> +#define DEFAULT_DAEMON_SLEEP 30
>> +
>> +#define BARRIER_IDLE 5
>
> Can you document these?
Ok, this is used for daemon to clean dirty bits when user doesn't issue
IO to the bit for more than 5 seconds.
>
>> +enum llbitmap_action {
>> + /* User write new data, this is the only acton from IO fast path */
>
> s/acton/action/
>
>> +/*
>> + * page level barrier to synchronize between dirty bit by write IO and clean bit
>> + * by daemon.
>
> Overly lon line. Also please stat full sentences with a capitalized
> word.
>
>> + */
>> +struct llbitmap_barrier {
>> + char *data;
>
> This is really a states array as far as I can tell. Maybe name it
> more descriptively and throw in a comment.
How about llbitmap_page_ctl ?
>
>> + struct percpu_ref active;
>> + unsigned long expire;
>> + unsigned long flags;
>> + /* Per block size dirty state, maximum 64k page / 512 sector = 128 */
>> + DECLARE_BITMAP(dirty, 128);
>> + wait_queue_head_t wait;
>> +} ____cacheline_aligned_in_smp;
>> +
>> +struct llbitmap {
>> + struct mddev *mddev;
>> + int nr_pages;
>> + struct page *pages[BITMAP_MAX_PAGES];
>> + struct llbitmap_barrier barrier[BITMAP_MAX_PAGES];
>
> Should the bitmap and the page be in the same array to have less
> cache line sharing between the users/ The above
> ____cacheline_aligned_in_smp suggests you are at least somewhat
> woerried about cache line sharing.
Yes, and BTW, I probably should allocate necessary memory based on real
number of pages, instead of embedded max pages. I do this first to
eazy coding.
>
>> +static char state_machine[nr_llbitmap_state][nr_llbitmap_action] = {
>> + [BitUnwritten] = {BitDirty, BitNone, BitNone, BitNone, BitNone, BitNone, BitNone, BitNone},
>
> Maybe used named initializers to make this more readable. In fact that
> might remove the need for the big table in the comment at the top of the
> file because it would be just as readable.
Sure, this is a good suggestion.
Thanks for the review!
Kuai
>
> .
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 10/19] md/md-llbitmap: add data structure definition and comments
2025-05-12 8:05 ` Yu Kuai
@ 2025-05-12 13:24 ` Christoph Hellwig
0 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 13:24 UTC (permalink / raw)
To: Yu Kuai
Cc: Christoph Hellwig, xni, colyli, agk, snitzer, mpatocka, song,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi, yukuai (C)
On Mon, May 12, 2025 at 04:05:58PM +0800, Yu Kuai wrote:
> I'm not sure, this way is simpiler, however do you prefer add structure
> and comments by following patches gradually?
In general it's easier if code that belongs together is added in
one go. Separate modules in the sense of they don't really
depend on the surroundings separately
>>> + */
>>> +struct llbitmap_barrier {
>>> + char *data;
>>
>> This is really a states array as far as I can tell. Maybe name it
>> more descriptively and throw in a comment.
>
> How about llbitmap_page_ctl ?
I mean the data pointer which really is states. No real comment
on the structure name.
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH RFC md-6.16 v3 11/19] md/md-llbitmap: implement bitmap IO
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
` (9 preceding siblings ...)
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 10/19] md/md-llbitmap: add data structure definition and comments Yu Kuai
@ 2025-05-12 1:19 ` Yu Kuai
2025-05-12 5:15 ` Christoph Hellwig
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 12/19] md/md-llbitmap: implement bit state machine Yu Kuai
` (8 subsequent siblings)
19 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 1:19 UTC (permalink / raw)
To: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
READ
While creating bitmap, all pages will be allocated and read for llbitmap,
there won't be read afterwards
WRITE
WRITE IO is ievided into logical_block_size of the array, the dirty state
of each block is tracked independently, for example:
each page is 4k, contain 8 blocks; each block is 512 bytes contain 512 bit;
| page0 | page1 | ... | page 31 |
| |
| \-----------------------\
| |
| block0 | block1 | ... | block 8|
| |
| \-----------------\
| |
| bit0 | bit1 | ... | bit511 |
From IO path, if one bit is changed to Dirty or NeedSync, the corresponding
block will be marked dirty, such block must write first before the IO is
issued. This behaviour will affect IO performance, to reduce the impact, if
multiple bits are changed in the same block in a short time, all bits in
this block will be changed to Dirty/NeedSync, so that there won't be any
overhead until daemon clears dirty bits.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md-llbitmap.c | 183 +++++++++++++++++++++++++++++++++++++++
1 file changed, 183 insertions(+)
diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 8ab4c77abd32..b27d10661387 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -279,3 +279,186 @@ static char state_machine[nr_llbitmap_state][nr_llbitmap_action] = {
[BitNeedSync] = {BitNone, BitSyncing, BitNone, BitNone, BitNone, BitNone, BitUnwritten, BitNone},
[BitSyncing] = {BitNone, BitSyncing, BitDirty, BitNeedSync, BitNeedSync, BitNone, BitUnwritten, BitNeedSync},
};
+
+static bool is_raid456(struct mddev *mddev)
+{
+ return (mddev->level == 4 || mddev->level == 5 || mddev->level == 6);
+}
+
+static int llbitmap_read(struct llbitmap *llbitmap, enum llbitmap_state *state,
+ loff_t pos)
+{
+ pos += BITMAP_SB_SIZE;
+ *state = llbitmap->barrier[pos >> PAGE_SHIFT].data[offset_in_page(pos)];
+ return 0;
+}
+
+static void llbitmap_set_page_dirty(struct llbitmap *llbitmap, int idx, int offset)
+{
+ struct llbitmap_barrier *barrier = &llbitmap->barrier[idx];
+ bool level_456 = is_raid456(llbitmap->mddev);
+ int io_size = llbitmap->io_size;
+ int bit = offset / io_size;
+ bool infectious = false;
+ int pos;
+
+ if (!test_bit(LLPageDirty, &barrier->flags))
+ set_bit(LLPageDirty, &barrier->flags);
+
+ /*
+ * if the bit is already dirty, or other page bytes is the same bit is
+ * already BitDirty, then mark the whole bytes in the bit as dirty
+ */
+ if (test_and_set_bit(bit, barrier->dirty)) {
+ infectious = true;
+ } else {
+ for (pos = bit * io_size; pos < (bit + 1) * io_size - 1;
+ pos++) {
+ if (pos == offset)
+ continue;
+ if (barrier->data[pos] == BitDirty ||
+ barrier->data[pos] == BitNeedSync) {
+ infectious = true;
+ break;
+ }
+ }
+
+ }
+
+ if (!infectious)
+ return;
+
+ for (pos = bit * io_size; pos < (bit + 1) * io_size - 1; pos++) {
+ if (pos == offset)
+ continue;
+
+ switch (barrier->data[pos]) {
+ case BitUnwritten:
+ barrier->data[pos] = level_456 ? BitNeedSync : BitDirty;
+ break;
+ case BitClean:
+ barrier->data[pos] = BitDirty;
+ break;
+ };
+ }
+}
+
+static int llbitmap_write(struct llbitmap *llbitmap, enum llbitmap_state state,
+ loff_t pos)
+{
+ int idx;
+ int offset;
+
+ pos += BITMAP_SB_SIZE;
+ idx = pos >> PAGE_SHIFT;
+ offset = offset_in_page(pos);
+
+ llbitmap->barrier[idx].data[offset] = state;
+ if (state == BitDirty || state == BitNeedSync)
+ llbitmap_set_page_dirty(llbitmap, idx, offset);
+ return 0;
+}
+
+static void llbitmap_free_pages(struct llbitmap *llbitmap)
+{
+ int i;
+
+ for (i = 0; i < BITMAP_MAX_PAGES; i++) {
+ struct page *page = llbitmap->pages[i];
+
+ if (!page)
+ return;
+
+ llbitmap->pages[i] = NULL;
+ __free_page(page);
+ percpu_ref_exit(&llbitmap->barrier[i].active);
+ }
+}
+
+static struct page *llbitmap_read_page(struct llbitmap *llbitmap, int idx)
+{
+ struct page *page = llbitmap->pages[idx];
+ struct mddev *mddev = llbitmap->mddev;
+ struct md_rdev *rdev;
+
+ if (page)
+ return page;
+
+ page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ if (!page)
+ return ERR_PTR(-ENOMEM);
+
+ rdev_for_each(rdev, mddev) {
+ sector_t sector;
+
+ if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags))
+ continue;
+
+ sector = mddev->bitmap_info.offset + (idx << PAGE_SECTORS_SHIFT);
+
+ if (sync_page_io(rdev, sector, PAGE_SIZE, page, REQ_OP_READ, true))
+ return page;
+
+ md_error(mddev, rdev);
+ }
+
+ __free_page(page);
+ return ERR_PTR(-EIO);
+}
+
+static void llbitmap_write_page(struct llbitmap *llbitmap, int idx)
+{
+ struct page *page = llbitmap->pages[idx];
+ struct mddev *mddev = llbitmap->mddev;
+ struct md_rdev *rdev;
+ int bit;
+
+ for (bit = 0; bit < llbitmap->bits_per_page; bit++) {
+ struct llbitmap_barrier *barrier = &llbitmap->barrier[idx];
+
+ if (!test_and_clear_bit(bit, barrier->dirty))
+ continue;
+
+ rdev_for_each(rdev, mddev) {
+ sector_t sector;
+ sector_t bit_sector = llbitmap->io_size >> SECTOR_SHIFT;
+
+ if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags))
+ continue;
+
+ sector = mddev->bitmap_info.offset + rdev->sb_start +
+ (idx << PAGE_SECTORS_SHIFT) +
+ bit * bit_sector;
+ md_super_write(mddev, rdev, sector, llbitmap->io_size,
+ page, bit * llbitmap->io_size);
+ }
+ }
+}
+
+static int llbitmap_cache_pages(struct llbitmap *llbitmap)
+{
+ int nr_pages = (llbitmap->chunks + BITMAP_SB_SIZE + PAGE_SIZE - 1) / PAGE_SIZE;
+ struct page *page;
+ int i = 0;
+
+ llbitmap->nr_pages = nr_pages;
+ while (i < nr_pages) {
+ page = llbitmap_read_page(llbitmap, i);
+ if (IS_ERR(page)) {
+ llbitmap_free_pages(llbitmap);
+ return PTR_ERR(page);
+ }
+
+ if (percpu_ref_init(&llbitmap->barrier[i].active, active_release,
+ PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) {
+ __free_page(page);
+ return -ENOMEM;
+ }
+
+ init_waitqueue_head(&llbitmap->barrier[i].wait);
+ llbitmap->barrier[i].data = page_address(page);
+ llbitmap->pages[i++] = page;
+ }
+
+ return 0;
+}
--
2.39.2
^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 11/19] md/md-llbitmap: implement bitmap IO
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 11/19] md/md-llbitmap: implement bitmap IO Yu Kuai
@ 2025-05-12 5:15 ` Christoph Hellwig
2025-05-12 8:15 ` Yu Kuai
0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 5:15 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi
On Mon, May 12, 2025 at 09:19:19AM +0800, Yu Kuai wrote:
> +static bool is_raid456(struct mddev *mddev)
> +{
> + return (mddev->level == 4 || mddev->level == 5 || mddev->level == 6);
> +}
This really should be in a common helper somewhere..
> +static int llbitmap_read(struct llbitmap *llbitmap, enum llbitmap_state *state,
> + loff_t pos)
> +{
> + pos += BITMAP_SB_SIZE;
> + *state = llbitmap->barrier[pos >> PAGE_SHIFT].data[offset_in_page(pos)];
> + return 0;
> +}
This always return 0, and could just return void.
> +static void llbitmap_set_page_dirty(struct llbitmap *llbitmap, int idx, int offset)
Overly long line.
Also should the second and third argument be unsigned?
> + /*
> + * if the bit is already dirty, or other page bytes is the same bit is
> + * already BitDirty, then mark the whole bytes in the bit as dirty
> + */
> + if (test_and_set_bit(bit, barrier->dirty)) {
> + infectious = true;
> + } else {
> + for (pos = bit * io_size; pos < (bit + 1) * io_size - 1;
> + pos++) {
> + if (pos == offset)
> + continue;
> + if (barrier->data[pos] == BitDirty ||
> + barrier->data[pos] == BitNeedSync) {
> + infectious = true;
> + break;
> + }
> + }
> +
> + }
> + if (!infectious)
> + return;
Mabe use a goto and/or a helper function containing the for loop to
clean up the control flow here a bit?
> +static int llbitmap_write(struct llbitmap *llbitmap, enum llbitmap_state state,
> + loff_t pos)
> +{
> + int idx;
> + int offset;
Unsigned?
> +
> + pos += BITMAP_SB_SIZE;
> + idx = pos >> PAGE_SHIFT;
> + offset = offset_in_page(pos);
> +
> + llbitmap->barrier[idx].data[offset] = state;
> + if (state == BitDirty || state == BitNeedSync)
> + llbitmap_set_page_dirty(llbitmap, idx, offset);
> + return 0;
and this could also be a void return.
> + sector = mddev->bitmap_info.offset + (idx << PAGE_SECTORS_SHIFT);
Overly long line.
> + if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags))
Same here.
> + int nr_pages = (llbitmap->chunks + BITMAP_SB_SIZE + PAGE_SIZE - 1) / PAGE_SIZE;
Unsigned for the type, and DIV_ROUND_UP for the calculation.
> + struct page *page;
> + int i = 0;
> +
> + llbitmap->nr_pages = nr_pages;
> + while (i < nr_pages) {
> + page = llbitmap_read_page(llbitmap, i);
> + if (IS_ERR(page)) {
> + llbitmap_free_pages(llbitmap);
> + return PTR_ERR(page);
> + }
> +
> + if (percpu_ref_init(&llbitmap->barrier[i].active, active_release,
> + PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) {
> + __free_page(page);
> + return -ENOMEM;
Doesn't this also need a llbitmap_free_pages for the error case?
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 11/19] md/md-llbitmap: implement bitmap IO
2025-05-12 5:15 ` Christoph Hellwig
@ 2025-05-12 8:15 ` Yu Kuai
2025-05-12 13:25 ` Christoph Hellwig
0 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 8:15 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: xni, colyli, agk, snitzer, mpatocka, song, linux-kernel, dm-devel,
linux-raid, yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
Hi,
在 2025/05/12 13:15, Christoph Hellwig 写道:
> On Mon, May 12, 2025 at 09:19:19AM +0800, Yu Kuai wrote:
>> +static bool is_raid456(struct mddev *mddev)
>> +{
>> + return (mddev->level == 4 || mddev->level == 5 || mddev->level == 6);
>> +}
>
> This really should be in a common helper somewhere..
Perhaps md.h?
>
>> +static int llbitmap_read(struct llbitmap *llbitmap, enum llbitmap_state *state,
>> + loff_t pos)
>> +{
>> + pos += BITMAP_SB_SIZE;
>> + *state = llbitmap->barrier[pos >> PAGE_SHIFT].data[offset_in_page(pos)];
>> + return 0;
>> +}
>
> This always return 0, and could just return void.
Ok
>
>> +static void llbitmap_set_page_dirty(struct llbitmap *llbitmap, int idx, int offset)
>
> Overly long line.
>
> Also should the second and third argument be unsigned?
Ok
>
>> + /*
>> + * if the bit is already dirty, or other page bytes is the same bit is
>> + * already BitDirty, then mark the whole bytes in the bit as dirty
>> + */
>> + if (test_and_set_bit(bit, barrier->dirty)) {
>> + infectious = true;
>> + } else {
>> + for (pos = bit * io_size; pos < (bit + 1) * io_size - 1;
>> + pos++) {
>> + if (pos == offset)
>> + continue;
>> + if (barrier->data[pos] == BitDirty ||
>> + barrier->data[pos] == BitNeedSync) {
>> + infectious = true;
>> + break;
>> + }
>> + }
>> +
>> + }
>> + if (!infectious)
>> + return;
>
> Mabe use a goto and/or a helper function containing the for loop to
> clean up the control flow here a bit?
Ok, I;ll add a helper function.
>
>> +static int llbitmap_write(struct llbitmap *llbitmap, enum llbitmap_state state,
>> + loff_t pos)
>> +{
>> + int idx;
>> + int offset;
>
> Unsigned?
>
>> +
>> + pos += BITMAP_SB_SIZE;
>> + idx = pos >> PAGE_SHIFT;
>> + offset = offset_in_page(pos);
>> +
>> + llbitmap->barrier[idx].data[offset] = state;
>> + if (state == BitDirty || state == BitNeedSync)
>> + llbitmap_set_page_dirty(llbitmap, idx, offset);
>> + return 0;
>
> and this could also be a void return.
Ok.
>
>> + sector = mddev->bitmap_info.offset + (idx << PAGE_SECTORS_SHIFT);
>
> Overly long line.
>
>> + if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags))
>
> Same here.
>
>> + int nr_pages = (llbitmap->chunks + BITMAP_SB_SIZE + PAGE_SIZE - 1) / PAGE_SIZE;
>
> Unsigned for the type, and DIV_ROUND_UP for the calculation.
Ok.
>
>> + struct page *page;
>> + int i = 0;
>> +
>> + llbitmap->nr_pages = nr_pages;
>> + while (i < nr_pages) {
>> + page = llbitmap_read_page(llbitmap, i);
>> + if (IS_ERR(page)) {
>> + llbitmap_free_pages(llbitmap);
>> + return PTR_ERR(page);
>> + }
>> +
>> + if (percpu_ref_init(&llbitmap->barrier[i].active, active_release,
>> + PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) {
>> + __free_page(page);
>> + return -ENOMEM;
>
> Doesn't this also need a llbitmap_free_pages for the error case?
Of course, my bad that I forgot this.
Thanks again for the review!
Kuai
>
> .
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 11/19] md/md-llbitmap: implement bitmap IO
2025-05-12 8:15 ` Yu Kuai
@ 2025-05-12 13:25 ` Christoph Hellwig
0 siblings, 0 replies; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 13:25 UTC (permalink / raw)
To: Yu Kuai
Cc: Christoph Hellwig, xni, colyli, agk, snitzer, mpatocka, song,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi, yukuai (C)
On Mon, May 12, 2025 at 04:15:34PM +0800, Yu Kuai wrote:
>>> +static bool is_raid456(struct mddev *mddev)
>>> +{
>>> + return (mddev->level == 4 || mddev->level == 5 || mddev->level == 6);
>>> +}
>>
>> This really should be in a common helper somewhere..
>
> Perhaps md.h?
Sounds good.
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH RFC md-6.16 v3 12/19] md/md-llbitmap: implement bit state machine
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
` (10 preceding siblings ...)
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 11/19] md/md-llbitmap: implement bitmap IO Yu Kuai
@ 2025-05-12 1:19 ` Yu Kuai
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 13/19] md/md-llbitmap: implement APIs for page level dirty bits synchronization Yu Kuai
` (7 subsequent siblings)
19 siblings, 0 replies; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 1:19 UTC (permalink / raw)
To: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
Each bit is one byte, contain 6 difference state, see llbitmap_state. And
there are total 8 differenct actions, see llbitmap_action, can change
state:
llbitmap state machine: transitions between states
| | Startwrite | Startsync | Endsync | Abortsync| Reload | Daemon | Discard | Stale |
| --------- | ---------- | --------- | ------- | ------- | -------- | ------ | --------- | --------- |
| Unwritten | Dirty | x | x | x | x | x | x | x |
| Clean | Dirty | x | x | x | x | x | Unwritten | NeedSync |
| Dirty | x | x | x | x | NeedSync | Clean | Unwritten | NeedSync |
| NeedSync | x | Syncing | x | x | x | x | Unwritten | x |
| Syncing | x | Syncing | Dirty | NeedSync | NeedSync | x | Unwritten | NeedSync |
Typical scenarios:
1) Create new array
All bits will be set to Unwritten by default, if --assume-clean is set,
All bits will be set to Clean instead.
2) write data, raid1/raid10 have full copy of data, while raid456 donen't and
rely on xor data
2.1) write new data to raid1/raid10:
Unwritten --StartWrite--> Dirty
2.2) write new data to raid456:
Unwritten --StartWrite--> NeedSync
Because the initial recover for raid456 is skipped, the xor data is not build
yet, the bit must set to NeedSync first and after lazy initial recover is
finished, the bit will finially set to Dirty(see 4.1 and 4.4);
2.3) cover write
Clean --StartWrite--> Dirty
3) daemon, if the array is not degraded:
Dirty --Daemon--> Clean
For degraded array, the Dirty bit will never be cleared, prevent full disk
recovery while readding a removed disk.
4) discard
{Clean, Dirty, NeedSync, Syncing} --Discard--> Unwritten
4) resync and recover
4.1) common process
NeedSync --Startsync--> Syncing --Endsync--> Dirty --Daemon--> Clean
4.2) resync after power failure
Dirty --Reload--> NeedSync
4.3) recover while replacing with a new disk
By default, the old bitmap framework will recover all data, and llbitmap
implement this by a new helper llbitmap_skip_sync_blocks:
skip recover for bits other than dirty or clean;
4.4) lazy initial recover for raid5:
By default, the old bitmap framework will only allow new recover when there
are spares(new disk), a new recovery flag MD_RECOVERY_LAZY_RECOVER is add
to perform raid456 lazy recover for set bits(from 2.2).
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md-llbitmap.c | 100 +++++++++++++++++++++++++++++++++++++++
1 file changed, 100 insertions(+)
diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index b27d10661387..315a4eb7627c 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -462,3 +462,103 @@ static int llbitmap_cache_pages(struct llbitmap *llbitmap)
return 0;
}
+
+static void llbitmap_init_state(struct llbitmap *llbitmap)
+{
+ enum llbitmap_state state = BitUnwritten;
+ unsigned long i;
+
+ if (test_and_clear_bit(BITMAP_CLEAN, &llbitmap->flags))
+ state = BitClean;
+
+ for (i = 0; i < llbitmap->chunks; i++) {
+ int ret = llbitmap_write(llbitmap, state, i);
+
+ if (ret < 0) {
+ set_bit(BITMAP_WRITE_ERROR, &llbitmap->flags);
+ break;
+ }
+ }
+}
+
+/* The return value is only used from resync, where @start == @end. */
+static enum llbitmap_state llbitmap_state_machine(struct llbitmap *llbitmap,
+ unsigned long start,
+ unsigned long end,
+ enum llbitmap_action action)
+{
+ struct mddev *mddev = llbitmap->mddev;
+ enum llbitmap_state state = BitNone;
+ bool need_resync = false;
+ bool need_recovery = false;
+
+ if (test_bit(BITMAP_WRITE_ERROR, &llbitmap->flags))
+ return BitNone;
+
+ if (action == BitmapActionInit) {
+ llbitmap_init_state(llbitmap);
+ return BitNone;
+ }
+
+ while (start <= end) {
+ ssize_t ret;
+ enum llbitmap_state c;
+
+ ret = llbitmap_read(llbitmap, &c, start);
+ if (ret < 0) {
+ set_bit(BITMAP_WRITE_ERROR, &llbitmap->flags);
+ return BitNone;
+ }
+
+ if (c < 0 || c >= nr_llbitmap_state) {
+ pr_err("%s: invalid bit %lu state %d action %d, forcing resync\n",
+ __func__, start, c, action);
+ state = BitNeedSync;
+ goto write_bitmap;
+ }
+
+ if (c == BitNeedSync)
+ need_resync = true;
+
+ state = state_machine[c][action];
+ if (state == BitNone) {
+ start++;
+ continue;
+ }
+
+write_bitmap:
+ /* Delay raid456 initial recovery to first write. */
+ if (c == BitUnwritten && state == BitDirty &&
+ action == BitmapActionStartwrite && is_raid456(mddev)) {
+ state = BitNeedSync;
+ need_recovery = true;
+ }
+
+ ret = llbitmap_write(llbitmap, state, start);
+ if (ret < 0) {
+ set_bit(BITMAP_WRITE_ERROR, &llbitmap->flags);
+ return BitNone;
+ }
+
+ if (state == BitNeedSync)
+ need_resync = true;
+ else if (state == BitDirty &&
+ !timer_pending(&llbitmap->pending_timer))
+ mod_timer(&llbitmap->pending_timer,
+ jiffies + mddev->bitmap_info.daemon_sleep * HZ);
+
+ start++;
+ }
+
+ if (need_recovery) {
+ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+ set_bit(MD_RECOVERY_LAZY_RECOVER, &mddev->recovery);
+ md_wakeup_thread(mddev->thread);
+ } else if (need_resync) {
+ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+ set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
+ md_wakeup_thread(mddev->thread);
+ }
+
+ return state;
+}
--
2.39.2
^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH RFC md-6.16 v3 13/19] md/md-llbitmap: implement APIs for page level dirty bits synchronization
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
` (11 preceding siblings ...)
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 12/19] md/md-llbitmap: implement bit state machine Yu Kuai
@ 2025-05-12 1:19 ` Yu Kuai
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 14/19] md/md-llbitmap: implement APIs to mange bitmap lifetime Yu Kuai
` (6 subsequent siblings)
19 siblings, 0 replies; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 1:19 UTC (permalink / raw)
To: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
IO fast path will set bits to dirty, and those dirty bits will be cleared
by daemon after IO is done. llbitmap_barrier is used to synchronize between
IO path and daemon;
IO path:
1) try to grab a reference, if succeed, set expire time after 5s and
return;
2) if failed to grab a reference, wait for daemon to finish clearing dirty
bits;
Daemon(Daemon will be waken up every daemon_sleep seconds):
For each page:
1) check if page expired, if not skip this page; for expired page:
2) suspend the page and wait for inflight write IO to be done;
3) change dirty page to clean;
4) resume the page;
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md-llbitmap.c | 46 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 315a4eb7627c..994ca0be3d17 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -435,6 +435,14 @@ static void llbitmap_write_page(struct llbitmap *llbitmap, int idx)
}
}
+static void active_release(struct percpu_ref *ref)
+{
+ struct llbitmap_barrier *barrier =
+ container_of(ref, struct llbitmap_barrier, active);
+
+ wake_up(&barrier->wait);
+}
+
static int llbitmap_cache_pages(struct llbitmap *llbitmap)
{
int nr_pages = (llbitmap->chunks + BITMAP_SB_SIZE + PAGE_SIZE - 1) / PAGE_SIZE;
@@ -562,3 +570,41 @@ static enum llbitmap_state llbitmap_state_machine(struct llbitmap *llbitmap,
return state;
}
+
+static void llbitmap_raise_barrier(struct llbitmap *llbitmap, int page_idx)
+{
+ struct llbitmap_barrier *barrier = &llbitmap->barrier[page_idx];
+
+retry:
+ if (likely(percpu_ref_tryget_live(&barrier->active))) {
+ WRITE_ONCE(barrier->expire, jiffies + BARRIER_IDLE * HZ);
+ return;
+ }
+
+ wait_event(barrier->wait, !percpu_ref_is_dying(&barrier->active));
+ goto retry;
+}
+
+static void llbitmap_release_barrier(struct llbitmap *llbitmap, int page_idx)
+{
+ struct llbitmap_barrier *barrier = &llbitmap->barrier[page_idx];
+
+ percpu_ref_put(&barrier->active);
+}
+
+static void llbitmap_suspend(struct llbitmap *llbitmap, int page_idx)
+{
+ struct llbitmap_barrier *barrier = &llbitmap->barrier[page_idx];
+
+ percpu_ref_kill(&barrier->active);
+ wait_event(barrier->wait, percpu_ref_is_zero(&barrier->active));
+}
+
+static void llbitmap_resume(struct llbitmap *llbitmap, int page_idx)
+{
+ struct llbitmap_barrier *barrier = &llbitmap->barrier[page_idx];
+
+ barrier->expire = LONG_MAX;
+ percpu_ref_resurrect(&barrier->active);
+ wake_up(&barrier->wait);
+}
--
2.39.2
^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH RFC md-6.16 v3 14/19] md/md-llbitmap: implement APIs to mange bitmap lifetime
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
` (12 preceding siblings ...)
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 13/19] md/md-llbitmap: implement APIs for page level dirty bits synchronization Yu Kuai
@ 2025-05-12 1:19 ` Yu Kuai
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 15/19] md/md-llbitmap: implement APIs to dirty bits and clear bits Yu Kuai
` (5 subsequent siblings)
19 siblings, 0 replies; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 1:19 UTC (permalink / raw)
To: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
Include following APIs:
- llbitmap_create
- llbitmap_resize
- llbitmap_load
- llbitmap_destroy
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md-llbitmap.c | 262 +++++++++++++++++++++++++++++++++++++++
1 file changed, 262 insertions(+)
diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 994ca0be3d17..4b54aa6fbe40 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -608,3 +608,265 @@ static void llbitmap_resume(struct llbitmap *llbitmap, int page_idx)
percpu_ref_resurrect(&barrier->active);
wake_up(&barrier->wait);
}
+
+static int llbitmap_check_support(struct mddev *mddev)
+{
+ if (test_bit(MD_HAS_JOURNAL, &mddev->flags)) {
+ pr_notice("md/llbitmap: %s: array with journal cannot have bitmap\n",
+ mdname(mddev));
+ return -EBUSY;
+ }
+
+ if (mddev->bitmap_info.space == 0) {
+ if (mddev->bitmap_info.default_space == 0) {
+ pr_notice("md/llbitmap: %s: no space for bitmap\n",
+ mdname(mddev));
+ return -ENOSPC;
+ }
+ }
+
+ if (!mddev->persistent) {
+ pr_notice("md/llbitmap: %s: array must be persistent\n",
+ mdname(mddev));
+ return -EOPNOTSUPP;
+ }
+
+ if (mddev->bitmap_info.file) {
+ pr_notice("md/llbitmap: %s: doesn't support bitmap file\n",
+ mdname(mddev));
+ return -EOPNOTSUPP;
+ }
+
+ if (mddev->bitmap_info.external) {
+ pr_notice("md/llbitmap: %s: doesn't support external metadata\n",
+ mdname(mddev));
+ return -EOPNOTSUPP;
+ }
+
+ if (mddev_is_dm(mddev)) {
+ pr_notice("md/llbitmap: %s: doesn't support dm-raid\n",
+ mdname(mddev));
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static int llbitmap_init(struct llbitmap *llbitmap)
+{
+ struct mddev *mddev = llbitmap->mddev;
+ sector_t blocks = mddev->resync_max_sectors;
+ unsigned long chunksize = MIN_CHUNK_SIZE;
+ unsigned long chunks = DIV_ROUND_UP(blocks, chunksize);
+ unsigned long space = mddev->bitmap_info.space << SECTOR_SHIFT;
+ int ret;
+
+ while (chunks > space) {
+ chunksize = chunksize << 1;
+ chunks = DIV_ROUND_UP(blocks, chunksize);
+ }
+
+ llbitmap->chunkshift = ffz(~chunksize);
+ llbitmap->chunksize = chunksize;
+ llbitmap->chunks = chunks;
+ mddev->bitmap_info.daemon_sleep = DEFAULT_DAEMON_SLEEP;
+
+ ret = llbitmap_cache_pages(llbitmap);
+ if (ret)
+ return ret;
+
+ llbitmap_state_machine(llbitmap, 0, llbitmap->chunks - 1, BitmapActionInit);
+ return 0;
+}
+
+static int llbitmap_read_sb(struct llbitmap *llbitmap)
+{
+ struct mddev *mddev = llbitmap->mddev;
+ unsigned long daemon_sleep;
+ unsigned long chunksize;
+ unsigned long events;
+ struct page *sb_page;
+ bitmap_super_t *sb;
+ int ret = -EINVAL;
+
+ if (!mddev->bitmap_info.offset) {
+ pr_err("md/llbitmap: %s: no super block found", mdname(mddev));
+ return -EINVAL;
+ }
+
+ sb_page = llbitmap_read_page(llbitmap, 0);
+ if (IS_ERR(sb_page)) {
+ pr_err("md/llbitmap: %s: read super block failed",
+ mdname(mddev));
+ ret = -EIO;
+ goto out;
+ }
+
+ sb = kmap_local_page(sb_page);
+ if (sb->magic != cpu_to_le32(BITMAP_MAGIC)) {
+ pr_err("md/llbitmap: %s: invalid super block magic number",
+ mdname(mddev));
+ goto out_put_page;
+ }
+
+ if (sb->version != cpu_to_le32(LLBITMAP_MAJOR_HI)) {
+ pr_err("md/llbitmap: %s: invalid super block version",
+ mdname(mddev));
+ goto out_put_page;
+ }
+
+ if (memcmp(sb->uuid, mddev->uuid, 16)) {
+ pr_err("md/llbitmap: %s: bitmap superblock UUID mismatch\n",
+ mdname(mddev));
+ goto out_put_page;
+ }
+
+ if (mddev->bitmap_info.space == 0) {
+ int room = le32_to_cpu(sb->sectors_reserved);
+
+ if (room)
+ mddev->bitmap_info.space = room;
+ else
+ mddev->bitmap_info.space = mddev->bitmap_info.default_space;
+ }
+ llbitmap->flags = le32_to_cpu(sb->state);
+ if (test_and_clear_bit(BITMAP_FIRST_USE, &llbitmap->flags)) {
+ ret = llbitmap_init(llbitmap);
+ goto out_put_page;
+ }
+
+ chunksize = le32_to_cpu(sb->chunksize);
+ if (!is_power_of_2(chunksize)) {
+ pr_err("md/llbitmap: %s: chunksize not a power of 2",
+ mdname(mddev));
+ goto out_put_page;
+ }
+
+ if (chunksize < DIV_ROUND_UP(mddev->resync_max_sectors,
+ mddev->bitmap_info.space << SECTOR_SHIFT)) {
+ pr_err("md/llbitmap: %s: chunksize too small %lu < %llu / %lu",
+ mdname(mddev), chunksize, mddev->resync_max_sectors,
+ mddev->bitmap_info.space);
+ goto out_put_page;
+ }
+
+ daemon_sleep = le32_to_cpu(sb->daemon_sleep);
+ if (daemon_sleep < 1 || daemon_sleep > MAX_SCHEDULE_TIMEOUT / HZ) {
+ pr_err("md/llbitmap: %s: daemon sleep %lu period out of range",
+ mdname(mddev), daemon_sleep);
+ goto out_put_page;
+ }
+
+ if (le32_to_cpu(sb->write_behind))
+ pr_warn("md/llbitmap: %s: slow disk is not supported",
+ mdname(mddev));
+
+ events = le64_to_cpu(sb->events);
+ if (events < mddev->events) {
+ pr_warn("md/llbitmap :%s: bitmap file is out of date (%lu < %llu) -- forcing full recovery",
+ mdname(mddev), events, mddev->events);
+ set_bit(BITMAP_STALE, &llbitmap->flags);
+ }
+
+ sb->sync_size = cpu_to_le64(mddev->resync_max_sectors);
+ mddev->bitmap_info.chunksize = chunksize;
+ mddev->bitmap_info.daemon_sleep = daemon_sleep;
+
+ llbitmap->chunksize = chunksize;
+ llbitmap->chunks = DIV_ROUND_UP(mddev->resync_max_sectors, chunksize);
+ llbitmap->chunkshift = ffz(~chunksize);
+ ret = llbitmap_cache_pages(llbitmap);
+
+out_put_page:
+ __free_page(sb_page);
+out:
+ kunmap_local(sb);
+ return ret;
+}
+
+static int llbitmap_create(struct mddev *mddev)
+{
+ struct llbitmap *llbitmap;
+ int ret;
+
+ ret = llbitmap_check_support(mddev);
+ if (ret)
+ return ret;
+
+ llbitmap = kzalloc(sizeof(*llbitmap), GFP_KERNEL);
+ if (!llbitmap)
+ return -ENOMEM;
+
+ llbitmap->mddev = mddev;
+ llbitmap->io_size = bdev_logical_block_size(mddev->gendisk->part0);
+ llbitmap->bits_per_page = PAGE_SIZE / llbitmap->io_size;
+
+ timer_setup(&llbitmap->pending_timer, llbitmap_pending_timer_fn, 0);
+ INIT_WORK(&llbitmap->daemon_work, md_llbitmap_daemon_fn);
+
+ mutex_lock(&mddev->bitmap_info.mutex);
+ mddev->bitmap = llbitmap;
+ ret = llbitmap_read_sb(llbitmap);
+ mutex_unlock(&mddev->bitmap_info.mutex);
+ if (ret)
+ goto err_out;
+
+ return 0;
+
+err_out:
+ kfree(llbitmap);
+ return ret;
+}
+
+static int llbitmap_resize(struct mddev *mddev, sector_t blocks, int chunksize)
+{
+ struct llbitmap *llbitmap = mddev->bitmap;
+ unsigned long chunks;
+
+ if (chunksize == 0)
+ chunksize = llbitmap->chunksize;
+
+ /* If there is enough space, leave the chunksize unchanged. */
+ chunks = DIV_ROUND_UP(blocks, chunksize);
+ while (chunks > mddev->bitmap_info.space << SECTOR_SHIFT) {
+ chunksize = chunksize << 1;
+ chunks = DIV_ROUND_UP(blocks, chunksize);
+ }
+
+ llbitmap->chunkshift = ffz(~chunksize);
+ llbitmap->chunksize = chunksize;
+ llbitmap->chunks = chunks;
+
+ return 0;
+}
+
+static int llbitmap_load(struct mddev *mddev)
+{
+ enum llbitmap_action action = BitmapActionReload;
+ struct llbitmap *llbitmap = mddev->bitmap;
+
+ if (test_and_clear_bit(BITMAP_STALE, &llbitmap->flags))
+ action = BitmapActionStale;
+
+ llbitmap_state_machine(llbitmap, 0, llbitmap->chunks - 1, action);
+ return 0;
+}
+
+static void llbitmap_destroy(struct mddev *mddev)
+{
+ struct llbitmap *llbitmap = mddev->bitmap;
+
+ if (!llbitmap)
+ return;
+
+ mutex_lock(&mddev->bitmap_info.mutex);
+
+ timer_delete_sync(&llbitmap->pending_timer);
+ flush_workqueue(md_llbitmap_io_wq);
+ flush_workqueue(md_llbitmap_unplug_wq);
+
+ mddev->bitmap = NULL;
+ llbitmap_free_pages(llbitmap);
+ kfree(llbitmap);
+ mutex_unlock(&mddev->bitmap_info.mutex);
+}
--
2.39.2
^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH RFC md-6.16 v3 15/19] md/md-llbitmap: implement APIs to dirty bits and clear bits
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
` (13 preceding siblings ...)
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 14/19] md/md-llbitmap: implement APIs to mange bitmap lifetime Yu Kuai
@ 2025-05-12 1:19 ` Yu Kuai
2025-05-12 5:17 ` Christoph Hellwig
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 16/19] md/md-llbitmap: implement APIs for sync_thread Yu Kuai
` (4 subsequent siblings)
19 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 1:19 UTC (permalink / raw)
To: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
Include following APIs:
- llbitmap_startwrite
- llbitmap_endwrite
- llbitmap_start_discard
- llbitmap_end_discard
- llbitmap_unplug
- llbitmap_flush
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md-llbitmap.c | 206 +++++++++++++++++++++++++++++++++++++++
1 file changed, 206 insertions(+)
diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 4b54aa6fbe40..71234c0ae160 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -784,6 +784,68 @@ static int llbitmap_read_sb(struct llbitmap *llbitmap)
return ret;
}
+static void llbitmap_pending_timer_fn(struct timer_list *t)
+{
+ struct llbitmap *llbitmap = from_timer(llbitmap, t, pending_timer);
+
+ if (work_busy(&llbitmap->daemon_work)) {
+ pr_warn("daemon_work not finished\n");
+ set_bit(BITMAP_DAEMON_BUSY, &llbitmap->flags);
+ return;
+ }
+
+ queue_work(md_llbitmap_io_wq, &llbitmap->daemon_work);
+}
+
+static void md_llbitmap_daemon_fn(struct work_struct *work)
+{
+ struct llbitmap *llbitmap =
+ container_of(work, struct llbitmap, daemon_work);
+ unsigned long start;
+ unsigned long end;
+ bool restart;
+ int idx;
+
+ if (llbitmap->mddev->degraded)
+ return;
+
+retry:
+ start = 0;
+ end = min(llbitmap->chunks, PAGE_SIZE - BITMAP_SB_SIZE) - 1;
+ restart = false;
+
+ for (idx = 0; idx < llbitmap->nr_pages; idx++) {
+ struct llbitmap_barrier *barrier = &llbitmap->barrier[idx];
+
+ if (idx > 0) {
+ start = end + 1;
+ end = min(end + PAGE_SIZE, llbitmap->chunks - 1);
+ }
+
+ if (!test_bit(LLPageFlush, &barrier->flags) &&
+ time_before(jiffies, barrier->expire)) {
+ restart = true;
+ continue;
+ }
+
+ llbitmap_suspend(llbitmap, idx);
+ llbitmap_state_machine(llbitmap, start, end, BitmapActionDaemon);
+ llbitmap_resume(llbitmap, idx);
+ }
+
+ /*
+ * If the daemon took a long time to finish, retry to prevent missing
+ * clearing dirty bits.
+ */
+ if (test_and_clear_bit(BITMAP_DAEMON_BUSY, &llbitmap->flags))
+ goto retry;
+
+ /* If some page is dirty but not expired, setup timer again */
+ if (restart)
+ mod_timer(&llbitmap->pending_timer,
+ jiffies + llbitmap->mddev->bitmap_info.daemon_sleep * HZ);
+}
+
static int llbitmap_create(struct mddev *mddev)
{
struct llbitmap *llbitmap;
@@ -870,3 +932,147 @@ static void llbitmap_destroy(struct mddev *mddev)
kfree(llbitmap);
mutex_unlock(&mddev->bitmap_info.mutex);
}
+
+static int llbitmap_startwrite(struct mddev *mddev, sector_t offset,
+ unsigned long sectors)
+{
+ struct llbitmap *llbitmap = mddev->bitmap;
+ unsigned long start = offset >> llbitmap->chunkshift;
+ unsigned long end = (offset + sectors - 1) >> llbitmap->chunkshift;
+ int page_start = (start + BITMAP_SB_SIZE) >> PAGE_SHIFT;
+ int page_end = (end + BITMAP_SB_SIZE) >> PAGE_SHIFT;
+
+ llbitmap_state_machine(llbitmap, start, end, BitmapActionStartwrite);
+
+
+ while (page_start <= page_end) {
+ llbitmap_raise_barrier(llbitmap, page_start);
+ page_start++;
+ }
+
+ return 0;
+}
+
+static void llbitmap_endwrite(struct mddev *mddev, sector_t offset,
+ unsigned long sectors)
+{
+ struct llbitmap *llbitmap = mddev->bitmap;
+ unsigned long start = offset >> llbitmap->chunkshift;
+ unsigned long end = (offset + sectors - 1) >> llbitmap->chunkshift;
+ int page_start = (start + BITMAP_SB_SIZE) >> PAGE_SHIFT;
+ int page_end = (end + BITMAP_SB_SIZE) >> PAGE_SHIFT;
+
+ while (page_start <= page_end) {
+ llbitmap_release_barrier(llbitmap, page_start);
+ page_start++;
+ }
+}
+
+static int llbitmap_start_discard(struct mddev *mddev, sector_t offset,
+ unsigned long sectors)
+{
+ struct llbitmap *llbitmap = mddev->bitmap;
+ unsigned long start = DIV_ROUND_UP(offset, llbitmap->chunksize);
+ unsigned long end = (offset + sectors - 1) >> llbitmap->chunkshift;
+ int page_start = (start + BITMAP_SB_SIZE) >> PAGE_SHIFT;
+ int page_end = (end + BITMAP_SB_SIZE) >> PAGE_SHIFT;
+
+ llbitmap_state_machine(llbitmap, start, end, BitmapActionDiscard);
+
+ while (page_start <= page_end) {
+ llbitmap_raise_barrier(llbitmap, page_start);
+ page_start++;
+ }
+
+ return 0;
+}
+
+static void llbitmap_end_discard(struct mddev *mddev, sector_t offset,
+ unsigned long sectors)
+{
+ struct llbitmap *llbitmap = mddev->bitmap;
+ unsigned long start = DIV_ROUND_UP(offset, llbitmap->chunksize);
+ unsigned long end = (offset + sectors - 1) >> llbitmap->chunkshift;
+ int page_start = (start + BITMAP_SB_SIZE) >> PAGE_SHIFT;
+ int page_end = (end + BITMAP_SB_SIZE) >> PAGE_SHIFT;
+
+ while (page_start <= page_end) {
+ llbitmap_release_barrier(llbitmap, page_start);
+ page_start++;
+ }
+}
+
+static void llbitmap_unplug_fn(struct work_struct *work)
+{
+ struct llbitmap_unplug_work *unplug_work =
+ container_of(work, struct llbitmap_unplug_work, work);
+ struct llbitmap *llbitmap = unplug_work->llbitmap;
+ struct blk_plug plug;
+ int i;
+
+ blk_start_plug(&plug);
+
+ for (i = 0; i < llbitmap->nr_pages; i++) {
+ if (!test_bit(LLPageDirty, &llbitmap->barrier[i].flags) ||
+ !test_and_clear_bit(LLPageDirty, &llbitmap->barrier[i].flags))
+ continue;
+
+ llbitmap_write_page(llbitmap, i);
+ }
+
+ blk_finish_plug(&plug);
+ md_super_wait(llbitmap->mddev);
+ complete(unplug_work->done);
+}
+
+static bool llbitmap_dirty(struct llbitmap *llbitmap)
+{
+ int i;
+
+ for (i = 0; i < llbitmap->nr_pages; i++)
+ if (test_bit(LLPageDirty, &llbitmap->barrier[i].flags))
+ return true;
+
+ return false;
+}
+
+static void llbitmap_unplug(struct mddev *mddev, bool sync)
+{
+ DECLARE_COMPLETION_ONSTACK(done);
+ struct llbitmap *llbitmap = mddev->bitmap;
+ struct llbitmap_unplug_work unplug_work = {
+ .llbitmap = llbitmap,
+ .done = &done,
+ };
+
+ if (!llbitmap_dirty(llbitmap))
+ return;
+
+ INIT_WORK_ONSTACK(&unplug_work.work, llbitmap_unplug_fn);
+ queue_work(md_llbitmap_unplug_wq, &unplug_work.work);
+ wait_for_completion(&done);
+ destroy_work_on_stack(&unplug_work.work);
+}
+
+static void llbitmap_flush(struct mddev *mddev)
+{
+ struct llbitmap *llbitmap = mddev->bitmap;
+ struct blk_plug plug;
+ int i;
+
+ for (i = 0; i < llbitmap->nr_pages; i++)
+ set_bit(LLPageFlush, &llbitmap->barrier[i].flags);
+
+ timer_delete_sync(&llbitmap->pending_timer);
+ queue_work(md_llbitmap_io_wq, &llbitmap->daemon_work);
+ flush_work(&llbitmap->daemon_work);
+
+ blk_start_plug(&plug);
+ for (i = 0; i < llbitmap->nr_pages; i++) {
+ /* mark all bits as dirty */
+ bitmap_fill(llbitmap->barrier[i].dirty, llbitmap->bits_per_page);
+ llbitmap_write_page(llbitmap, i);
+ }
+ blk_finish_plug(&plug);
+ md_super_wait(llbitmap->mddev);
+}
--
2.39.2
^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 15/19] md/md-llbitmap: implement APIs to dirty bits and clear bits
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 15/19] md/md-llbitmap: implement APIs to dirty bits and clear bits Yu Kuai
@ 2025-05-12 5:17 ` Christoph Hellwig
2025-05-12 8:23 ` Yu Kuai
0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 5:17 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi
On Mon, May 12, 2025 at 09:19:23AM +0800, Yu Kuai wrote:
> +static void llbitmap_unplug(struct mddev *mddev, bool sync)
> +{
> + DECLARE_COMPLETION_ONSTACK(done);
> + struct llbitmap *llbitmap = mddev->bitmap;
> + struct llbitmap_unplug_work unplug_work = {
> + .llbitmap = llbitmap,
> + .done = &done,
> + };
> +
> + if (!llbitmap_dirty(llbitmap))
> + return;
> +
> + INIT_WORK_ONSTACK(&unplug_work.work, llbitmap_unplug_fn);
> + queue_work(md_llbitmap_unplug_wq, &unplug_work.work);
> + wait_for_completion(&done);
> + destroy_work_on_stack(&unplug_work.work);
Why is this deferring the work to a workqueue, but then synchronously
waits on it?
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 15/19] md/md-llbitmap: implement APIs to dirty bits and clear bits
2025-05-12 5:17 ` Christoph Hellwig
@ 2025-05-12 8:23 ` Yu Kuai
2025-05-12 13:26 ` Christoph Hellwig
0 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 8:23 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: xni, colyli, agk, snitzer, mpatocka, song, linux-kernel, dm-devel,
linux-raid, yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
Hi,
在 2025/05/12 13:17, Christoph Hellwig 写道:
> On Mon, May 12, 2025 at 09:19:23AM +0800, Yu Kuai wrote:
>> +static void llbitmap_unplug(struct mddev *mddev, bool sync)
>> +{
>> + DECLARE_COMPLETION_ONSTACK(done);
>> + struct llbitmap *llbitmap = mddev->bitmap;
>> + struct llbitmap_unplug_work unplug_work = {
>> + .llbitmap = llbitmap,
>> + .done = &done,
>> + };
>> +
>> + if (!llbitmap_dirty(llbitmap))
>> + return;
>> +
>> + INIT_WORK_ONSTACK(&unplug_work.work, llbitmap_unplug_fn);
>> + queue_work(md_llbitmap_unplug_wq, &unplug_work.work);
>> + wait_for_completion(&done);
>> + destroy_work_on_stack(&unplug_work.work);
>
> Why is this deferring the work to a workqueue, but then synchronously
> waits on it?
This is the same as old bitmap, by the fact that issue new IO and wait
for such IO to be done from submit_bio() context will deadlock.
1) bitmap bio must be done before this bio can be issued;
2) bitmap bio will be added to current->bio_list, and wait for this bio
to be issued;
Do you have a better sulution to this problem?
Thanks,
Kuai
>
> .
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 15/19] md/md-llbitmap: implement APIs to dirty bits and clear bits
2025-05-12 8:23 ` Yu Kuai
@ 2025-05-12 13:26 ` Christoph Hellwig
2025-05-12 13:30 ` Christoph Hellwig
0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 13:26 UTC (permalink / raw)
To: Yu Kuai
Cc: Christoph Hellwig, xni, colyli, agk, snitzer, mpatocka, song,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi, yukuai (C)
On Mon, May 12, 2025 at 04:23:55PM +0800, Yu Kuai wrote:
>>> + INIT_WORK_ONSTACK(&unplug_work.work, llbitmap_unplug_fn);
>>> + queue_work(md_llbitmap_unplug_wq, &unplug_work.work);
>>> + wait_for_completion(&done);
>>> + destroy_work_on_stack(&unplug_work.work);
>>
>> Why is this deferring the work to a workqueue, but then synchronously
>> waits on it?
>
> This is the same as old bitmap, by the fact that issue new IO and wait
> for such IO to be done from submit_bio() context will deadlock.
>
> 1) bitmap bio must be done before this bio can be issued;
> 2) bitmap bio will be added to current->bio_list, and wait for this bio
> to be issued;
>
> Do you have a better sulution to this problem?
A bew block layer API that bypasses bio_list maybe? I.e. export
__submit_bio with a better name and a kerneldoc detailing the narrow
use case.
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 15/19] md/md-llbitmap: implement APIs to dirty bits and clear bits
2025-05-12 13:26 ` Christoph Hellwig
@ 2025-05-12 13:30 ` Christoph Hellwig
2025-05-12 13:36 ` Yu Kuai
0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 13:30 UTC (permalink / raw)
To: Yu Kuai
Cc: Christoph Hellwig, xni, colyli, agk, snitzer, mpatocka, song,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi, yukuai (C)
On Mon, May 12, 2025 at 03:26:41PM +0200, Christoph Hellwig wrote:
> > 1) bitmap bio must be done before this bio can be issued;
> > 2) bitmap bio will be added to current->bio_list, and wait for this bio
> > to be issued;
> >
> > Do you have a better sulution to this problem?
>
> A bew block layer API that bypasses bio_list maybe? I.e. export
> __submit_bio with a better name and a kerneldoc detailing the narrow
> use case.
That won't work as we'd miss a lot of checks, cgroup handling, etc.
But maybe a flag to skip the recursion avoidance?
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 15/19] md/md-llbitmap: implement APIs to dirty bits and clear bits
2025-05-12 13:30 ` Christoph Hellwig
@ 2025-05-12 13:36 ` Yu Kuai
2025-05-13 6:32 ` Yu Kuai
0 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 13:36 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: xni, colyli, agk, snitzer, mpatocka, song, linux-kernel, dm-devel,
linux-raid, yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
Hi,
在 2025/05/12 21:30, Christoph Hellwig 写道:
> On Mon, May 12, 2025 at 03:26:41PM +0200, Christoph Hellwig wrote:
>>> 1) bitmap bio must be done before this bio can be issued;
>>> 2) bitmap bio will be added to current->bio_list, and wait for this bio
>>> to be issued;
>>>
>>> Do you have a better sulution to this problem?
>>
>> A bew block layer API that bypasses bio_list maybe? I.e. export
>> __submit_bio with a better name and a kerneldoc detailing the narrow
>> use case.
>
> That won't work as we'd miss a lot of checks, cgroup handling, etc.
>
> But maybe a flag to skip the recursion avoidance?
I think this can work, and this can also improve performance. I'll look
into this.
Thanks,
Kuai
>
> .
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 15/19] md/md-llbitmap: implement APIs to dirty bits and clear bits
2025-05-12 13:36 ` Yu Kuai
@ 2025-05-13 6:32 ` Yu Kuai
2025-05-13 6:48 ` Christoph Hellwig
0 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-13 6:32 UTC (permalink / raw)
To: Yu Kuai, Christoph Hellwig
Cc: xni, colyli, agk, snitzer, mpatocka, song, linux-kernel, dm-devel,
linux-raid, yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
Hi,
在 2025/05/12 21:36, Yu Kuai 写道:
> Hi,
>
> 在 2025/05/12 21:30, Christoph Hellwig 写道:
>> On Mon, May 12, 2025 at 03:26:41PM +0200, Christoph Hellwig wrote:
>>>> 1) bitmap bio must be done before this bio can be issued;
>>>> 2) bitmap bio will be added to current->bio_list, and wait for this bio
>>>> to be issued;
>>>>
>>>> Do you have a better sulution to this problem?
>>>
>>> A bew block layer API that bypasses bio_list maybe? I.e. export
>>> __submit_bio with a better name and a kerneldoc detailing the narrow
>>> use case.
>>
>> That won't work as we'd miss a lot of checks, cgroup handling, etc.
>>
>> But maybe a flag to skip the recursion avoidance?
>
> I think this can work, and this can also improve performance. I'll look
> into this.
So, I did a quick test with old internal bitmap and make sure following
patch can work.
However, for bitmap file case, bio is issued from submit_bh(), I'll have
to change buffer_head code and I'm not sure if we want to do that.
Thanks,
Kuai
diff --git a/block/blk-core.c b/block/blk-core.c
index b862c66018f2..66ced5769694 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -745,7 +745,7 @@ void submit_bio_noacct_nocheck(struct bio *bio)
* to collect a list of requests submited by a ->submit_bio
method while
* it is active, and then process them after it returned.
*/
- if (current->bio_list)
+ if (current->bio_list && !bio_flagged(bio, BIO_STACKED_META))
bio_list_add(¤t->bio_list[0], bio);
else if (!bdev_test_flag(bio->bi_bdev, BD_HAS_SUBMIT_BIO))
__submit_bio_noacct_mq(bio);
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index 431a3ab2e449..e0cb210a4ea4 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -1257,35 +1257,6 @@ static void __bitmap_unplug(struct bitmap *bitmap)
md_bitmap_file_kick(bitmap);
}
-struct bitmap_unplug_work {
- struct work_struct work;
- struct bitmap *bitmap;
- struct completion *done;
-};
-
-static void md_bitmap_unplug_fn(struct work_struct *work)
-{
- struct bitmap_unplug_work *unplug_work =
- container_of(work, struct bitmap_unplug_work, work);
-
- __bitmap_unplug(unplug_work->bitmap);
- complete(unplug_work->done);
-}
-
-static void bitmap_unplug_async(struct bitmap *bitmap)
-{
- DECLARE_COMPLETION_ONSTACK(done);
- struct bitmap_unplug_work unplug_work;
-
- INIT_WORK_ONSTACK(&unplug_work.work, md_bitmap_unplug_fn);
- unplug_work.bitmap = bitmap;
- unplug_work.done = &done;
-
- queue_work(md_bitmap_wq, &unplug_work.work);
- wait_for_completion(&done);
- destroy_work_on_stack(&unplug_work.work);
-}
-
static void bitmap_unplug(struct mddev *mddev, bool sync)
{
struct bitmap *bitmap = mddev->bitmap;
@@ -1293,10 +1264,7 @@ static void bitmap_unplug(struct mddev *mddev,
bool sync)
if (!bitmap)
return;
- if (sync)
- __bitmap_unplug(bitmap);
- else
- bitmap_unplug_async(bitmap);
+ __bitmap_unplug(bitmap);
}
static void md_bitmap_set_memory_bits(struct bitmap *bitmap, sector_t
offset, int needed);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 32b997dfe6f4..179eabd6e038 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1050,6 +1050,7 @@ void md_super_write(struct mddev *mddev, struct
md_rdev *rdev,
__bio_add_page(bio, page, size, 0);
bio->bi_private = rdev;
bio->bi_end_io = super_written;
+ bio_set_flag(bio, BIO_STACKED_META);
if (test_bit(MD_FAILFAST_SUPPORTED, &mddev->flags) &&
test_bit(FailFast, &rdev->flags) &&
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index f38425338c3f..88164cdae6aa 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -300,6 +300,7 @@ enum {
BIO_REMAPPED,
BIO_ZONE_WRITE_PLUGGING, /* bio handled through zone write
plugging */
BIO_EMULATES_ZONE_APPEND, /* bio emulates a zone append
operation */
+ BIO_STACKED_META, /* bio is metadata from stacked device */
BIO_FLAG_LAST
};
>
> Thanks,
> Kuai
>
>>
>> .
>>
>
> .
>
^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 15/19] md/md-llbitmap: implement APIs to dirty bits and clear bits
2025-05-13 6:32 ` Yu Kuai
@ 2025-05-13 6:48 ` Christoph Hellwig
2025-05-13 7:14 ` Yu Kuai
0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-13 6:48 UTC (permalink / raw)
To: Yu Kuai
Cc: Christoph Hellwig, xni, colyli, agk, snitzer, mpatocka, song,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi, yukuai (C)
On Tue, May 13, 2025 at 02:32:04PM +0800, Yu Kuai wrote:
> However, for bitmap file case, bio is issued from submit_bh(), I'll have
> to change buffer_head code and I'm not sure if we want to do that.
Don't bother with the old code, I'm still hoping we can replace it with
your new code ASAP.
> + BIO_STACKED_META, /* bio is metadata from stacked device */
I don't think that is the right name and description. The whole point
of the recursion avoidance is to to supported stacked devices by
reducing the stack depth. So we clearly need to document why that
is not desirable for some very specific cases like reading in metadata
needed to process a write I/O. We should also ensure it doesn't
propagate to lover stacked devices.
In fact I wonder if a better interfaces would be one to stash away
current->bio_list and then restore it after the call, as that would
enforce all that.
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 15/19] md/md-llbitmap: implement APIs to dirty bits and clear bits
2025-05-13 6:48 ` Christoph Hellwig
@ 2025-05-13 7:14 ` Yu Kuai
2025-05-13 7:43 ` Christoph Hellwig
0 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-13 7:14 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: xni, colyli, agk, snitzer, mpatocka, song, linux-kernel, dm-devel,
linux-raid, yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
Hi,
在 2025/05/13 14:48, Christoph Hellwig 写道:
> On Tue, May 13, 2025 at 02:32:04PM +0800, Yu Kuai wrote:
>> However, for bitmap file case, bio is issued from submit_bh(), I'll have
>> to change buffer_head code and I'm not sure if we want to do that.
>
> Don't bother with the old code, I'm still hoping we can replace it with
> your new code ASAP.
Agreed :)
>
>> + BIO_STACKED_META, /* bio is metadata from stacked device */
>
> I don't think that is the right name and description. The whole point
> of the recursion avoidance is to to supported stacked devices by
> reducing the stack depth. So we clearly need to document why that
> is not desirable for some very specific cases like reading in metadata
> needed to process a write I/O. We should also ensure it doesn't
> propagate to lover stacked devices.
>
> In fact I wonder if a better interfaces would be one to stash away
> current->bio_list and then restore it after the call, as that would
> enforce all that.
Yes, following change can work as well.
Just wonder, if the array is created by another array, and which is
created by another array ... In this case, the stack depth can be
huge. :( This is super weird case, however, should we keep the old code
in this case?
Thanks,
Kuai
static void bitmap_unplug(struct mddev *mddev, bool sync)
{
struct bitmap *bitmap = mddev->bitmap;
+ struct bio_list *bio_list = NULL;
if (!bitmap)
return;
- if (sync)
- __bitmap_unplug(bitmap);
- else
- bitmap_unplug_async(bitmap);
+ if (current->bio_list) {
+ bio_list = current->bio_list;
+ current->bio_list = NULL;
+ }
+
+ __bitmap_unplug(bitmap);
+
+ current->bio_list = bio_list;
}
> .
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 15/19] md/md-llbitmap: implement APIs to dirty bits and clear bits
2025-05-13 7:14 ` Yu Kuai
@ 2025-05-13 7:43 ` Christoph Hellwig
2025-05-13 9:32 ` Yu Kuai
0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-13 7:43 UTC (permalink / raw)
To: Yu Kuai
Cc: Christoph Hellwig, xni, colyli, agk, snitzer, mpatocka, song,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi, yukuai (C)
On Tue, May 13, 2025 at 03:14:03PM +0800, Yu Kuai wrote:
> Yes, following change can work as well.
>
> Just wonder, if the array is created by another array, and which is
> created by another array ... In this case, the stack depth can be
> huge. :( This is super weird case, however, should we keep the old code
> in this case?
Yeah, that's a good question. Stacking multiple arrays using bitmaps
on top of each other is weird. But what if other block remappers
starting to use this for other remapping and they are stacked? That
seems much more likely unfotunately, so maybe we can't go down this
route after all, sorry for leading you to it.
So instead just write a comment documenting why you switch to a
different stack using the workqueue.
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 15/19] md/md-llbitmap: implement APIs to dirty bits and clear bits
2025-05-13 7:43 ` Christoph Hellwig
@ 2025-05-13 9:32 ` Yu Kuai
2025-05-14 5:17 ` Christoph Hellwig
0 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-13 9:32 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: xni, colyli, agk, snitzer, mpatocka, song, linux-kernel, dm-devel,
linux-raid, yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
Hi,
在 2025/05/13 15:43, Christoph Hellwig 写道:
> On Tue, May 13, 2025 at 03:14:03PM +0800, Yu Kuai wrote:
>> Yes, following change can work as well.
>>
>> Just wonder, if the array is created by another array, and which is
>> created by another array ... In this case, the stack depth can be
>> huge. :( This is super weird case, however, should we keep the old code
>> in this case?
>
> Yeah, that's a good question. Stacking multiple arrays using bitmaps
> on top of each other is weird. But what if other block remappers
> starting to use this for other remapping and they are stacked? That
> seems much more likely unfotunately, so maybe we can't go down this
> route after all, sorry for leading you to it.
I was thinking about record a stack dev depth in mddev to handle the
weird case inside raid. Is there other stack device have the same
problem? AFAIK, some dm targets like dm-crypt are using workqueue
to handle all IO.
I'm still interested because this can improve first write latency.
>
> So instead just write a comment documenting why you switch to a
> different stack using the workqueue.
Ok, I'll add comment if we keep using the workqueue.
Thanks,
Kuai
> .
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 15/19] md/md-llbitmap: implement APIs to dirty bits and clear bits
2025-05-13 9:32 ` Yu Kuai
@ 2025-05-14 5:17 ` Christoph Hellwig
2025-05-15 13:38 ` Yu Kuai
0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-14 5:17 UTC (permalink / raw)
To: Yu Kuai
Cc: Christoph Hellwig, xni, colyli, agk, snitzer, mpatocka, song,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi, yukuai (C)
On Tue, May 13, 2025 at 05:32:13PM +0800, Yu Kuai wrote:
> I was thinking about record a stack dev depth in mddev to handle the
> weird case inside raid. Is there other stack device have the same
> problem? AFAIK, some dm targets like dm-crypt are using workqueue
> to handle all IO.
I guess anything that might have to read in metadata to serve a
data I/O. bcache, dm-snapshot and dm-thinkp would be candidates for
that, but I haven't checked the implementation.
> I'm still interested because this can improve first write latency.
>
>>
>> So instead just write a comment documenting why you switch to a
>> different stack using the workqueue.
>
> Ok, I'll add comment if we keep using the workqueue.
Maybe do that for getting the new bitmap code in ASAP and then
revisit the above separately?
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 15/19] md/md-llbitmap: implement APIs to dirty bits and clear bits
2025-05-14 5:17 ` Christoph Hellwig
@ 2025-05-15 13:38 ` Yu Kuai
0 siblings, 0 replies; 63+ messages in thread
From: Yu Kuai @ 2025-05-15 13:38 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: xni, colyli, agk, snitzer, mpatocka, song, linux-kernel, dm-devel,
linux-raid, yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
Hi
于 2025年5月14日 GMT+08:00 13:17:47,Christoph Hellwig <hch@lst.de> 写道:
>On Tue, May 13, 2025 at 05:32:13PM +0800, Yu Kuai wrote:
>> I was thinking about record a stack dev depth in mddev to handle the
>> weird case inside raid. Is there other stack device have the same
>> problem? AFAIK, some dm targets like dm-crypt are using workqueue
>> to handle all IO.
>
>I guess anything that might have to read in metadata to serve a
>data I/O. bcache, dm-snapshot and dm-thinkp would be candidates for
>that, but I haven't checked the implementation.
>
>> I'm still interested because this can improve first write latency.
>>
>>>
>>> So instead just write a comment documenting why you switch to a
>>> different stack using the workqueue.
>>
>> Ok, I'll add comment if we keep using the workqueue.
>
>Maybe do that for getting the new bitmap code in ASAP and then
>revisit the above separately?
>
>
Sure, sorry that I am in a business travel suddenly. I will get back to
this ASAP I return.
Thanks
Kuai
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH RFC md-6.16 v3 16/19] md/md-llbitmap: implement APIs for sync_thread
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
` (14 preceding siblings ...)
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 15/19] md/md-llbitmap: implement APIs to dirty bits and clear bits Yu Kuai
@ 2025-05-12 1:19 ` Yu Kuai
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 17/19] md/md-llbitmap: implement all bitmap operations Yu Kuai
` (3 subsequent siblings)
19 siblings, 0 replies; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 1:19 UTC (permalink / raw)
To: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
Include following APIs:
- llbitmap_blocks_synced
- llbitmap_skip_sync_blocks
- llbitmap_start_sync
- llbitmap_end_sync
- llbitmap_close_sync
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md-llbitmap.c | 83 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 83 insertions(+)
diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 71234c0ae160..3169ae8b72be 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -1076,3 +1076,86 @@ static void llbitmap_flush(struct mddev *mddev)
blk_finish_plug(&plug);
md_super_wait(llbitmap->mddev);
}
+
+/* This is used for raid5 lazy initial recovery */
+static bool llbitmap_blocks_synced(struct mddev *mddev, sector_t offset)
+{
+ struct llbitmap *llbitmap = mddev->bitmap;
+ unsigned long p = offset >> llbitmap->chunkshift;
+ enum llbitmap_state c;
+ int ret;
+
+ ret = llbitmap_read(llbitmap, &c, p);
+ if (ret < 0) {
+ set_bit(BITMAP_WRITE_ERROR, &llbitmap->flags);
+ return false;
+ }
+
+ return c == BitClean || c == BitDirty;
+}
+
+static sector_t llbitmap_skip_sync_blocks(struct mddev *mddev, sector_t offset)
+{
+ struct llbitmap *llbitmap = mddev->bitmap;
+ unsigned long p = offset >> llbitmap->chunkshift;
+ int blocks = llbitmap->chunksize - (offset & (llbitmap->chunksize - 1));
+ enum llbitmap_state c;
+ int ret;
+
+ ret = llbitmap_read(llbitmap, &c, p);
+ if (ret < 0) {
+ set_bit(BITMAP_WRITE_ERROR, &llbitmap->flags);
+ return 0;
+ }
+
+ /* always skip unwritten blocks */
+ if (c == BitUnwritten)
+ return blocks;
+
+ /* For resync also skip clean/dirty blocks */
+ if ((c == BitClean || c == BitDirty) &&
+ test_bit(MD_RECOVERY_SYNC, &mddev->recovery) &&
+ !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
+ return blocks;
+
+ return 0;
+}
+
+static bool llbitmap_start_sync(struct mddev *mddev, sector_t offset,
+ sector_t *blocks, bool degraded)
+{
+ struct llbitmap *llbitmap = mddev->bitmap;
+ unsigned long p = offset >> llbitmap->chunkshift;
+
+ /*
+ * Handle one bit at a time, this is much simpler. And it doesn't matter
+ * if md_do_sync() loop more times.
+ */
+ *blocks = llbitmap->chunksize - (offset & (llbitmap->chunksize - 1));
+ return llbitmap_state_machine(llbitmap, p, p, BitmapActionStartsync) == BitSyncing;
+}
+
+static void llbitmap_end_sync(struct mddev *mddev, sector_t offset,
+ sector_t *blocks)
+{
+ struct llbitmap *llbitmap = mddev->bitmap;
+ unsigned long p = offset >> llbitmap->chunkshift;
+
+ *blocks = llbitmap->chunksize - (offset & (llbitmap->chunksize - 1));
+ llbitmap_state_machine(llbitmap, p, llbitmap->chunks - 1, BitmapActionAbortsync);
+}
+
+static void llbitmap_close_sync(struct mddev *mddev)
+{
+ struct llbitmap *llbitmap = mddev->bitmap;
+ int i;
+
+ for (i = 0; i < llbitmap->nr_pages; i++) {
+ struct llbitmap_barrier *barrier = &llbitmap->barrier[i];
+
+ /* let daemon_fn clear dirty bits immediately */
+ WRITE_ONCE(barrier->expire, jiffies);
+ }
+
+ llbitmap_state_machine(llbitmap, 0, llbitmap->chunks - 1, BitmapActionEndsync);
+}
--
2.39.2
^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH RFC md-6.16 v3 17/19] md/md-llbitmap: implement all bitmap operations
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
` (15 preceding siblings ...)
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 16/19] md/md-llbitmap: implement APIs for sync_thread Yu Kuai
@ 2025-05-12 1:19 ` Yu Kuai
2025-05-12 5:18 ` Christoph Hellwig
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 18/19] md/md-llbitmap: implement sysfs APIs Yu Kuai
` (2 subsequent siblings)
19 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 1:19 UTC (permalink / raw)
To: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
Include following left APIs
- llbitmap_enabled
- llbitmap_dirty_bits
- llbitmap_update_sb
And following APIs that are not needed:
- llbitmap_write_all, used in old bitmap to mark all pages need
writeback;
- llbitmap_daemon_work, used in old bitmap, llbitmap use timer to
trigger daemon;
- llbitmap_cond_end_sync, use to end sync for completed sectors(TODO,
don't affect functionality)
And following APIs that are not supported:
- llbitmap_start_behind_write
- llbitmap_end_behind_write
- llbitmap_wait_behind_writes
- llbitmap_sync_with_cluster
- llbitmap_get_from_slot
- llbitmap_copy_from_slot
- llbitmap_set_pages
- llbitmap_free
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md-llbitmap.c | 125 +++++++++++++++++++++++++++++++++++++++
1 file changed, 125 insertions(+)
diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 3169ae8b72be..e381859efcd7 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -1159,3 +1159,128 @@ static void llbitmap_close_sync(struct mddev *mddev)
llbitmap_state_machine(llbitmap, 0, llbitmap->chunks - 1, BitmapActionEndsync);
}
+
+static bool llbitmap_enabled(void *data)
+{
+ struct llbitmap *llbitmap = data;
+
+ return llbitmap && !test_bit(BITMAP_WRITE_ERROR, &llbitmap->flags);
+}
+
+static void llbitmap_dirty_bits(struct mddev *mddev, unsigned long s,
+ unsigned long e)
+{
+ llbitmap_state_machine(mddev->bitmap, s, e, BitmapActionStartwrite);
+}
+
+static void llbitmap_write_sb(struct llbitmap *llbitmap)
+{
+ int nr_bits = round_up(BITMAP_SB_SIZE, llbitmap->io_size) / llbitmap->io_size;
+
+ bitmap_fill(llbitmap->barrier[0].dirty, nr_bits);
+ llbitmap_write_page(llbitmap, 0);
+ md_super_wait(llbitmap->mddev);
+}
+
+static void llbitmap_update_sb(void *data)
+{
+ struct llbitmap *llbitmap = data;
+ struct mddev *mddev = llbitmap->mddev;
+ struct page *sb_page;
+ bitmap_super_t *sb;
+
+ if (test_bit(BITMAP_WRITE_ERROR, &llbitmap->flags))
+ return;
+
+ sb_page = llbitmap_read_page(llbitmap, 0);
+ if (IS_ERR(sb_page)) {
+ pr_err("%s: %s: read super block failed", __func__,
+ mdname(mddev));
+ set_bit(BITMAP_WRITE_ERROR, &llbitmap->flags);
+ return;
+ }
+
+ if (mddev->events < llbitmap->events_cleared)
+ llbitmap->events_cleared = mddev->events;
+
+ sb = kmap_local_page(sb_page);
+ sb->events = cpu_to_le64(mddev->events);
+ sb->state = cpu_to_le32(llbitmap->flags);
+ sb->chunksize = cpu_to_le32(llbitmap->chunksize);
+ sb->sync_size = cpu_to_le64(mddev->resync_max_sectors);
+ sb->events_cleared = cpu_to_le64(llbitmap->events_cleared);
+ sb->sectors_reserved = cpu_to_le32(mddev->bitmap_info.space);
+ sb->daemon_sleep = cpu_to_le32(mddev->bitmap_info.daemon_sleep);
+
+ kunmap_local(sb);
+ llbitmap_write_sb(llbitmap);
+}
+
+static int llbitmap_get_stats(void *data, struct md_bitmap_stats *stats)
+{
+ struct llbitmap *llbitmap = data;
+
+ memset(stats, 0, sizeof(*stats));
+
+ stats->missing_pages = 0;
+ stats->pages = llbitmap->nr_pages;
+ stats->file_pages = llbitmap->nr_pages;
+
+ return 0;
+}
+
+static void llbitmap_write_all(struct mddev *mddev)
+{
+
+}
+
+static void llbitmap_daemon_work(struct mddev *mddev)
+{
+
+}
+
+static void llbitmap_start_behind_write(struct mddev *mddev)
+{
+
+}
+
+static void llbitmap_end_behind_write(struct mddev *mddev)
+{
+
+}
+
+static void llbitmap_wait_behind_writes(struct mddev *mddev)
+{
+
+}
+
+static void llbitmap_cond_end_sync(struct mddev *mddev, sector_t sector,
+ bool force)
+{
+}
+
+static void llbitmap_sync_with_cluster(struct mddev *mddev,
+ sector_t old_lo, sector_t old_hi,
+ sector_t new_lo, sector_t new_hi)
+{
+
+}
+
+static void *llbitmap_get_from_slot(struct mddev *mddev, int slot)
+{
+ return ERR_PTR(-EOPNOTSUPP);
+}
+
+static int llbitmap_copy_from_slot(struct mddev *mddev, int slot, sector_t *low,
+ sector_t *high, bool clear_bits)
+{
+ return -EOPNOTSUPP;
+}
+
+static void llbitmap_set_pages(void *data, unsigned long pages)
+{
+}
+
+static void llbitmap_free(void *data)
+{
+}
--
2.39.2
^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 17/19] md/md-llbitmap: implement all bitmap operations
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 17/19] md/md-llbitmap: implement all bitmap operations Yu Kuai
@ 2025-05-12 5:18 ` Christoph Hellwig
2025-05-12 8:28 ` Yu Kuai
0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 5:18 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi
On Mon, May 12, 2025 at 09:19:25AM +0800, Yu Kuai wrote:
> And following APIs that are not needed:
> - llbitmap_write_all, used in old bitmap to mark all pages need
> writeback;
> - llbitmap_daemon_work, used in old bitmap, llbitmap use timer to
> trigger daemon;
> - llbitmap_cond_end_sync, use to end sync for completed sectors(TODO,
> don't affect functionality)
> And following APIs that are not supported:
> - llbitmap_start_behind_write
> - llbitmap_end_behind_write
> - llbitmap_wait_behind_writes
> - llbitmap_sync_with_cluster
> - llbitmap_get_from_slot
> - llbitmap_copy_from_slot
> - llbitmap_set_pages
> - llbitmap_free
Please just make these optional instead of implementing stubs.
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 17/19] md/md-llbitmap: implement all bitmap operations
2025-05-12 5:18 ` Christoph Hellwig
@ 2025-05-12 8:28 ` Yu Kuai
0 siblings, 0 replies; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 8:28 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: xni, colyli, agk, snitzer, mpatocka, song, linux-kernel, dm-devel,
linux-raid, yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
Hi,
在 2025/05/12 13:18, Christoph Hellwig 写道:
> On Mon, May 12, 2025 at 09:19:25AM +0800, Yu Kuai wrote:
>> And following APIs that are not needed:
>> - llbitmap_write_all, used in old bitmap to mark all pages need
>> writeback;
>> - llbitmap_daemon_work, used in old bitmap, llbitmap use timer to
>> trigger daemon;
>> - llbitmap_cond_end_sync, use to end sync for completed sectors(TODO,
>> don't affect functionality)
>> And following APIs that are not supported:
>> - llbitmap_start_behind_write
>> - llbitmap_end_behind_write
>> - llbitmap_wait_behind_writes
>> - llbitmap_sync_with_cluster
>> - llbitmap_get_from_slot
>> - llbitmap_copy_from_slot
>> - llbitmap_set_pages
>> - llbitmap_free
>
> Please just make these optional instead of implementing stubs.
Ok, I'll add a patch to check if those methods are NULL before calling
them.
Thanks,
Kuai
> .
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* [PATCH RFC md-6.16 v3 18/19] md/md-llbitmap: implement sysfs APIs
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
` (16 preceding siblings ...)
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 17/19] md/md-llbitmap: implement all bitmap operations Yu Kuai
@ 2025-05-12 1:19 ` Yu Kuai
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 19/19] md/md-llbitmap: add Kconfig Yu Kuai
2025-05-12 5:21 ` [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Christoph Hellwig
19 siblings, 0 replies; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 1:19 UTC (permalink / raw)
To: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
There are 3 APIs for now:
- bits: readonly, show status of bitmap bits, the number of each value;
- metadata: readonly show bitmap metadata, include chunksize, chunkshift,
chunks, offset and daemon_sleep;
- daemon_sleep: read-write, default value is 30;
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/md-llbitmap.c | 99 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 99 insertions(+)
diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index e381859efcd7..6993be132127 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -1284,3 +1284,102 @@ static void llbitmap_set_pages(void *data, unsigned long pages)
static void llbitmap_free(void *data)
{
}
+
+static ssize_t bits_show(struct mddev *mddev, char *page)
+{
+ struct llbitmap *llbitmap;
+ int bits[nr_llbitmap_state] = {0};
+ loff_t start = 0;
+
+ mutex_lock(&mddev->bitmap_info.mutex);
+ llbitmap = mddev->bitmap;
+ if (!llbitmap) {
+ mutex_unlock(&mddev->bitmap_info.mutex);
+ return sprintf(page, "no bitmap\n");
+ }
+
+ if (test_bit(BITMAP_WRITE_ERROR, &llbitmap->flags)) {
+ mutex_unlock(&mddev->bitmap_info.mutex);
+ return sprintf(page, "bitmap io error\n");
+ }
+
+ while (start < llbitmap->chunks) {
+ ssize_t ret;
+ enum llbitmap_state c;
+
+ ret = llbitmap_read(llbitmap, &c, start);
+ if (ret < 0) {
+ set_bit(BITMAP_WRITE_ERROR, &llbitmap->flags);
+ mutex_unlock(&mddev->bitmap_info.mutex);
+ return sprintf(page, "bitmap io error\n");
+ }
+
+ if (c < 0 || c >= nr_llbitmap_state)
+ pr_err("%s: invalid bit %llu state %d\n",
+ __func__, start, c);
+ else
+ bits[c]++;
+ start++;
+ }
+
+ mutex_unlock(&mddev->bitmap_info.mutex);
+ return sprintf(page, "unwritten %d\nclean %d\ndirty %d\nneed sync %d\nsyncing %d\n",
+ bits[BitUnwritten], bits[BitClean], bits[BitDirty],
+ bits[BitNeedSync], bits[BitSyncing]);
+}
+
+static struct md_sysfs_entry llbitmap_bits =
+__ATTR_RO(bits);
+
+static ssize_t metadata_show(struct mddev *mddev, char *page)
+{
+ struct llbitmap *llbitmap;
+ ssize_t ret;
+
+ mutex_lock(&mddev->bitmap_info.mutex);
+ llbitmap = mddev->bitmap;
+ if (!llbitmap) {
+ mutex_unlock(&mddev->bitmap_info.mutex);
+ return sprintf(page, "no bitmap\n");
+ }
+
+ ret = sprintf(page, "chunksize %lu\nchunkshift %lu\nchunks %lu\noffset %llu\ndaemon_sleep %lu\n",
+ llbitmap->chunksize, llbitmap->chunkshift,
+ llbitmap->chunks, mddev->bitmap_info.offset,
+ llbitmap->mddev->bitmap_info.daemon_sleep);
+ mutex_unlock(&mddev->bitmap_info.mutex);
+
+ return ret;
+}
+
+static struct md_sysfs_entry llbitmap_metadata =
+__ATTR_RO(metadata);
+
+static ssize_t
+daemon_sleep_show(struct mddev *mddev, char *page)
+{
+ return sprintf(page, "%lu\n", mddev->bitmap_info.daemon_sleep);
+}
+
+static ssize_t
+daemon_sleep_store(struct mddev *mddev, const char *buf, size_t len)
+{
+ unsigned long timeout;
+ int rv = kstrtoul(buf, 10, &timeout);
+
+ if (rv)
+ return rv;
+
+ mddev->bitmap_info.daemon_sleep = timeout;
+ return len;
+}
+
+static struct md_sysfs_entry llbitmap_daemon_sleep =
+__ATTR_RW(daemon_sleep);
+
+static struct attribute *md_llbitmap_attrs[] = {
+ &llbitmap_bits.attr,
+ &llbitmap_metadata.attr,
+ &llbitmap_daemon_sleep.attr,
+ NULL
+};
--
2.39.2
^ permalink raw reply related [flat|nested] 63+ messages in thread
* [PATCH RFC md-6.16 v3 19/19] md/md-llbitmap: add Kconfig
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
` (17 preceding siblings ...)
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 18/19] md/md-llbitmap: implement sysfs APIs Yu Kuai
@ 2025-05-12 1:19 ` Yu Kuai
2025-05-12 5:19 ` Christoph Hellwig
2025-05-12 5:21 ` [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Christoph Hellwig
19 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 1:19 UTC (permalink / raw)
To: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3
Cc: linux-kernel, dm-devel, linux-raid, yukuai1, yi.zhang, yangerkun,
johnny.chenyi
From: Yu Kuai <yukuai3@huawei.com>
A new config MD_LLBITMAP is added, user can now using llbitmap to
replace the old bitmap.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/md/Kconfig | 12 ++++++
drivers/md/Makefile | 1 +
drivers/md/md-bitmap.h | 16 ++++++++
drivers/md/md-llbitmap.c | 80 ++++++++++++++++++++++++++++++++++++++++
drivers/md/md.c | 6 +++
5 files changed, 115 insertions(+)
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index f913579e731c..655c4e381f7d 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -52,6 +52,18 @@ config MD_BITMAP
If unsure, say Y.
+config MD_LLBITMAP
+ bool "MD RAID lockless bitmap support"
+ default n
+ depends on BLK_DEV_MD
+ help
+ If you say Y here, support for the lockless write intent bitmap will
+ be enabled.
+
+ Note, this is an experimental feature.
+
+ If unsure, say N.
+
config MD_AUTODETECT
bool "Autodetect RAID arrays during kernel boot"
depends on BLK_DEV_MD=y
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 811731840a5c..e70e4d3cbe29 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -39,6 +39,7 @@ linear-y += md-linear.o
obj-$(CONFIG_MD_LINEAR) += linear.o
obj-$(CONFIG_MD_RAID0) += raid0.o
obj-$(CONFIG_MD_BITMAP) += md-bitmap.o
+obj-$(CONFIG_MD_LLBITMAP) += md-llbitmap.o
obj-$(CONFIG_MD_RAID1) += raid1.o
obj-$(CONFIG_MD_RAID10) += raid10.o
obj-$(CONFIG_MD_RAID456) += raid456.o
diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
index 4e27f5f793b7..dd23b6fedb70 100644
--- a/drivers/md/md-bitmap.h
+++ b/drivers/md/md-bitmap.h
@@ -22,6 +22,9 @@ typedef __u16 bitmap_counter_t;
enum bitmap_state {
BITMAP_STALE = 1, /* the bitmap file is out of date or had -EIO */
BITMAP_WRITE_ERROR = 2, /* A write error has occurred */
+ BITMAP_FIRST_USE = 3, /* llbitmap is just created */
+ BITMAP_CLEAN = 4, /* llbitmap is created with assume_clean */
+ BITMAP_DAEMON_BUSY = 5, /* llbitmap daemon is not finished after daemon_sleep */
BITMAP_HOSTENDIAN =15,
};
@@ -176,4 +179,17 @@ static inline void md_bitmap_exit(void)
}
#endif
+#ifdef CONFIG_MD_LLBITMAP
+int md_llbitmap_init(void);
+void md_llbitmap_exit(void);
+#else
+static inline int md_llbitmap_init(void)
+{
+ return 0;
+}
+static inline void md_llbitmap_exit(void)
+{
+}
+#endif
+
#endif
diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
index 6993be132127..5bb60340c7e2 100644
--- a/drivers/md/md-llbitmap.c
+++ b/drivers/md/md-llbitmap.c
@@ -1383,3 +1383,83 @@ static struct attribute *md_llbitmap_attrs[] = {
&llbitmap_daemon_sleep.attr,
NULL
};
+
+static struct attribute_group md_llbitmap_group = {
+ .name = "llbitmap",
+ .attrs = md_llbitmap_attrs,
+};
+
+static struct bitmap_operations llbitmap_ops = {
+ .head = {
+ .type = MD_BITMAP,
+ .id = ID_LLBITMAP,
+ .name = "llbitmap",
+ },
+
+ .enabled = llbitmap_enabled,
+ .create = llbitmap_create,
+ .resize = llbitmap_resize,
+ .load = llbitmap_load,
+ .destroy = llbitmap_destroy,
+
+ .startwrite = llbitmap_startwrite,
+ .endwrite = llbitmap_endwrite,
+ .start_discard = llbitmap_start_discard,
+ .end_discard = llbitmap_end_discard,
+ .unplug = llbitmap_unplug,
+ .flush = llbitmap_flush,
+
+ .blocks_synced = llbitmap_blocks_synced,
+ .skip_sync_blocks = llbitmap_skip_sync_blocks,
+ .start_sync = llbitmap_start_sync,
+ .end_sync = llbitmap_end_sync,
+ .close_sync = llbitmap_close_sync,
+
+ .update_sb = llbitmap_update_sb,
+ .get_stats = llbitmap_get_stats,
+ .dirty_bits = llbitmap_dirty_bits,
+
+ /* not needed */
+ .write_all = llbitmap_write_all,
+ .daemon_work = llbitmap_daemon_work,
+ .cond_end_sync = llbitmap_cond_end_sync,
+
+ /* not supported */
+ .start_behind_write = llbitmap_start_behind_write,
+ .end_behind_write = llbitmap_end_behind_write,
+ .wait_behind_writes = llbitmap_wait_behind_writes,
+ .sync_with_cluster = llbitmap_sync_with_cluster,
+ .get_from_slot = llbitmap_get_from_slot,
+ .copy_from_slot = llbitmap_copy_from_slot,
+ .set_pages = llbitmap_set_pages,
+ .free = llbitmap_free,
+
+ .group = &md_llbitmap_group,
+};
+
+int md_llbitmap_init(void)
+{
+ md_llbitmap_io_wq = alloc_workqueue("md_llbitmap_io",
+ WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
+ if (!md_llbitmap_io_wq)
+ return -ENOMEM;
+
+ md_llbitmap_unplug_wq = alloc_workqueue("md_llbitmap_unplug",
+ WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
+ if (!md_llbitmap_unplug_wq) {
+ destroy_workqueue(md_llbitmap_io_wq);
+ md_llbitmap_io_wq = NULL;
+ return -ENOMEM;
+ }
+
+ return register_md_submodule(&llbitmap_ops.head);
+}
+
+void md_llbitmap_exit(void)
+{
+ destroy_workqueue(md_llbitmap_io_wq);
+ md_llbitmap_io_wq = NULL;
+ destroy_workqueue(md_llbitmap_unplug_wq);
+ md_llbitmap_unplug_wq = NULL;
+ unregister_md_submodule(&llbitmap_ops.head);
+}
diff --git a/drivers/md/md.c b/drivers/md/md.c
index a5dd7a403ea5..6ac5747738dd 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -10191,6 +10191,10 @@ static int __init md_init(void)
if (ret)
return ret;
+ ret = md_llbitmap_init();
+ if (ret)
+ goto err_bitmap;
+
ret = -ENOMEM;
md_wq = alloc_workqueue("md", WQ_MEM_RECLAIM, 0);
if (!md_wq)
@@ -10222,6 +10226,8 @@ static int __init md_init(void)
err_misc_wq:
destroy_workqueue(md_wq);
err_wq:
+ md_llbitmap_exit();
+err_bitmap:
md_bitmap_exit();
return ret;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 19/19] md/md-llbitmap: add Kconfig
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 19/19] md/md-llbitmap: add Kconfig Yu Kuai
@ 2025-05-12 5:19 ` Christoph Hellwig
2025-05-12 8:30 ` Yu Kuai
0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 5:19 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi
On Mon, May 12, 2025 at 09:19:27AM +0800, Yu Kuai wrote:
> +config MD_LLBITMAP
> + bool "MD RAID lockless bitmap support"
> + default n
n is the default default.
> diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
> index 4e27f5f793b7..dd23b6fedb70 100644
> --- a/drivers/md/md-bitmap.h
> +++ b/drivers/md/md-bitmap.h
> @@ -22,6 +22,9 @@ typedef __u16 bitmap_counter_t;
> enum bitmap_state {
> BITMAP_STALE = 1, /* the bitmap file is out of date or had -EIO */
> BITMAP_WRITE_ERROR = 2, /* A write error has occurred */
> + BITMAP_FIRST_USE = 3, /* llbitmap is just created */
> + BITMAP_CLEAN = 4, /* llbitmap is created with assume_clean */
> + BITMAP_DAEMON_BUSY = 5, /* llbitmap daemon is not finished after daemon_sleep */
This should go into patches very early in the series, before the code
referencing them.
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 19/19] md/md-llbitmap: add Kconfig
2025-05-12 5:19 ` Christoph Hellwig
@ 2025-05-12 8:30 ` Yu Kuai
0 siblings, 0 replies; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 8:30 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: xni, colyli, agk, snitzer, mpatocka, song, linux-kernel, dm-devel,
linux-raid, yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
Hi,
在 2025/05/12 13:19, Christoph Hellwig 写道:
> On Mon, May 12, 2025 at 09:19:27AM +0800, Yu Kuai wrote:
>> +config MD_LLBITMAP
>> + bool "MD RAID lockless bitmap support"
>> + default n
>
> n is the default default.
>
>> diff --git a/drivers/md/md-bitmap.h b/drivers/md/md-bitmap.h
>> index 4e27f5f793b7..dd23b6fedb70 100644
>> --- a/drivers/md/md-bitmap.h
>> +++ b/drivers/md/md-bitmap.h
>> @@ -22,6 +22,9 @@ typedef __u16 bitmap_counter_t;
>> enum bitmap_state {
>> BITMAP_STALE = 1, /* the bitmap file is out of date or had -EIO */
>> BITMAP_WRITE_ERROR = 2, /* A write error has occurred */
>> + BITMAP_FIRST_USE = 3, /* llbitmap is just created */
>> + BITMAP_CLEAN = 4, /* llbitmap is created with assume_clean */
>> + BITMAP_DAEMON_BUSY = 5, /* llbitmap daemon is not finished after daemon_sleep */
>
> This should go into patches very early in the series, before the code
> referencing them.
Ok, I'll fold them with the new patch to move some values to md-bitmap.h
in the next version.
Thanks,
Kuai
>
> .
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap
2025-05-12 1:19 [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Yu Kuai
` (18 preceding siblings ...)
2025-05-12 1:19 ` [PATCH RFC md-6.16 v3 19/19] md/md-llbitmap: add Kconfig Yu Kuai
@ 2025-05-12 5:21 ` Christoph Hellwig
2025-05-12 8:40 ` Yu Kuai
19 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 5:21 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, xni, colyli, agk, snitzer, mpatocka, song, yukuai3,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi
From looking over the patches this entirely dropped support for bitmap
files for now. Do you plan to get back to those?
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap
2025-05-12 5:21 ` [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap Christoph Hellwig
@ 2025-05-12 8:40 ` Yu Kuai
2025-05-12 13:27 ` Christoph Hellwig
0 siblings, 1 reply; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 8:40 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: xni, colyli, agk, snitzer, mpatocka, song, linux-kernel, dm-devel,
linux-raid, yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
Hi,
在 2025/05/12 13:21, Christoph Hellwig 写道:
>>From looking over the patches this entirely dropped support for bitmap
> files for now. Do you plan to get back to those?
I don't have such plan for now, actually I tend to remove bitmap file,
once llbitmap is ready with lightweight overhead, I expect perforamce
can be better than old bitmap with bitmap file.
If there are cases that llbitmap performace is still much worse than
none bitmap, and bitmap file can reduce the gap, we'll probable think
about bitmap file again.
Thanks,
Kuai
>
> .
>
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap
2025-05-12 8:40 ` Yu Kuai
@ 2025-05-12 13:27 ` Christoph Hellwig
2025-05-12 13:41 ` Yu Kuai
0 siblings, 1 reply; 63+ messages in thread
From: Christoph Hellwig @ 2025-05-12 13:27 UTC (permalink / raw)
To: Yu Kuai
Cc: Christoph Hellwig, xni, colyli, agk, snitzer, mpatocka, song,
linux-kernel, dm-devel, linux-raid, yi.zhang, yangerkun,
johnny.chenyi, yukuai (C)
On Mon, May 12, 2025 at 04:40:02PM +0800, Yu Kuai wrote:
> I don't have such plan for now, actually I tend to remove bitmap file,
> once llbitmap is ready with lightweight overhead, I expect perforamce
> can be better than old bitmap with bitmap file.
>
> If there are cases that llbitmap performace is still much worse than
> none bitmap, and bitmap file can reduce the gap, we'll probable think
> about bitmap file again.
I'd really love to see this replace the old code as soon as possible.
But can we simply drop support for users with the bitmap in a file
in the file system (no matter how much I dislike that use case..)?
^ permalink raw reply [flat|nested] 63+ messages in thread
* Re: [PATCH RFC md-6.16 v3 00/19] md: introduce a new lockless bitmap
2025-05-12 13:27 ` Christoph Hellwig
@ 2025-05-12 13:41 ` Yu Kuai
0 siblings, 0 replies; 63+ messages in thread
From: Yu Kuai @ 2025-05-12 13:41 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: xni, colyli, agk, snitzer, mpatocka, song, linux-kernel, dm-devel,
linux-raid, yi.zhang, yangerkun, johnny.chenyi, yukuai (C)
Hi,
在 2025/05/12 21:27, Christoph Hellwig 写道:
> On Mon, May 12, 2025 at 04:40:02PM +0800, Yu Kuai wrote:
>> I don't have such plan for now, actually I tend to remove bitmap file,
>> once llbitmap is ready with lightweight overhead, I expect perforamce
>> can be better than old bitmap with bitmap file.
>>
>> If there are cases that llbitmap performace is still much worse than
>> none bitmap, and bitmap file can reduce the gap, we'll probable think
>> about bitmap file again.
>
> I'd really love to see this replace the old code as soon as possible.
> But can we simply drop support for users with the bitmap in a file
> in the file system (no matter how much I dislike that use case..)?
>
Do you mean drop it now?
AFAIK, bitmap file in a file system is problem the only case for now, I
don't see any users pass in a raw disk in this case, because bitmap file
is at most 128k by default.
Thanks,
Kuai
> .
>
^ permalink raw reply [flat|nested] 63+ messages in thread