linux-raid.vger.kernel.org archive mirror
* [PATCH 00/23] md/llbitmap: md/md-llbitmap: introduce a new lockless bitmap
@ 2025-05-24  6:12 Yu Kuai
From: Yu Kuai @ 2025-05-24  6:12 UTC
  To: hch, xni, colyli, song, yukuai3
  Cc: linux-doc, linux-kernel, linux-raid, yukuai1, yi.zhang, yangerkun,
	johnny.chenyi

From: Yu Kuai <yukuai3@huawei.com>

This is the formal version following the previous RFC version:

https://lore.kernel.org/all/20250512011927.2809400-1-yukuai1@huaweicloud.com/

#### Background

Redundant data is used to enhance data fault tolerance, and the way redundant
data is stored varies by RAID level. It is important to keep the redundant
data consistent.

A bitmap is used to record which data blocks have been synchronized and which
ones need to be resynchronized or recovered. Each bit in the bitmap
represents a segment of data in the array. When a bit is set, it indicates
that the redundant copies of that data segment may be inconsistent. Data
synchronization can be performed based on the bitmap after a power failure
or when re-adding a disk. Without a bitmap, a full-disk synchronization is
required.
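To make the bit-to-segment mapping concrete, here is a minimal sketch that computes which bits a single write touches. It assumes bit i covers bytes [i * chunksize, (i + 1) * chunksize) of the array; the helper name `bits_for_range` is hypothetical and not part of the md code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the first and last bitmap bit touched by a
 * write of 'len' bytes (len > 0) at byte 'offset' into the array, assuming
 * bit i covers bytes [i * chunksize, (i + 1) * chunksize).
 */
static void bits_for_range(uint64_t offset, uint64_t len, uint64_t chunksize,
                           uint64_t *first, uint64_t *last)
{
    *first = offset / chunksize;
    *last = (offset + len - 1) / chunksize;
}
```

For example, an 8k write at offset 60k with a 64k chunksize straddles a segment boundary and touches bits 0 and 1, so both bits have to be set before the write is issued.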

#### Key Features

 - The IO fast path is lockless: if a user issues many write IOs to the same
 bitmap bit in a short time, only the first write has the additional overhead
 of updating the bitmap bit, and the following writes have none;
 - Only written data needs to be resynced or recovered, which means that when
 creating a new array or replacing a disk with a new one, there is no need to
 do a full-disk resync/recovery;

#### Key Concept

##### State Machine

Each bit is one byte and can hold 6 different states, see llbitmap_state.
There are 8 different actions in total, see llbitmap_action, that can change
the state:

llbitmap state machine: transitions between states

|           | Startwrite | Startsync | Endsync | Abortsync|
| --------- | ---------- | --------- | ------- | -------  |
| Unwritten | Dirty      | x         | x       | x        |
| Clean     | Dirty      | x         | x       | x        |
| Dirty     | x          | x         | x       | x        |
| NeedSync  | x          | Syncing   | x       | x        |
| Syncing   | x          | Syncing   | Dirty   | NeedSync |

|           | Reload   | Daemon | Discard   | Stale     |
| --------- | -------- | ------ | --------- | --------- |
| Unwritten | x        | x      | x         | x         |
| Clean     | x        | x      | Unwritten | NeedSync  |
| Dirty     | NeedSync | Clean  | Unwritten | NeedSync  |
| NeedSync  | x        | x      | Unwritten | x         |
| Syncing   | NeedSync | x      | Unwritten | NeedSync  |
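The two tables can be encoded as a single lookup table. This is a sketch, not the kernel's llbitmap_state/llbitmap_action code: the enum values are assumed, "x" is read as "the state does not change", and the raid456 Startwrite behaviour described in 2.2 (Unwritten to NeedSync) is not modelled.

```c
/* Names follow the tables above; the numeric values are assumptions. */
enum bit_state { Unwritten, Clean, Dirty, NeedSync, Syncing, NR_STATES };
enum bit_action { Startwrite, Startsync, Endsync, Abortsync,
                  Reload, Daemon, Discard, Stale, NR_ACTIONS };

#define NOCHG (-1) /* "x" in the tables: the state is left unchanged */

static const int transition[NR_STATES][NR_ACTIONS] = {
    /*             Startwrite Startsync Endsync Abortsync Reload    Daemon Discard    Stale    */
    [Unwritten] = { Dirty,    NOCHG,    NOCHG,  NOCHG,    NOCHG,    NOCHG, NOCHG,     NOCHG    },
    [Clean]     = { Dirty,    NOCHG,    NOCHG,  NOCHG,    NOCHG,    NOCHG, Unwritten, NeedSync },
    [Dirty]     = { NOCHG,    NOCHG,    NOCHG,  NOCHG,    NeedSync, Clean, Unwritten, NeedSync },
    [NeedSync]  = { NOCHG,    Syncing,  NOCHG,  NOCHG,    NOCHG,    NOCHG, Unwritten, NOCHG    },
    [Syncing]   = { NOCHG,    Syncing,  Dirty,  NeedSync, NeedSync, NOCHG, Unwritten, NeedSync },
};

/* Apply one action to one bit and return its new state. */
static enum bit_state next_state(enum bit_state s, enum bit_action a)
{
    int n = transition[s][a];

    return n == NOCHG ? s : (enum bit_state)n;
}
```

Walking the common resync path with this table, NeedSync goes to Syncing on Startsync, to Dirty on Endsync, and to Clean on Daemon.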

Typical scenarios:

1) Create a new array
By default, all bits are set to Unwritten; if --assume-clean is set, all bits
are set to Clean instead.

2) Write data: raid1/raid10 keep a full copy of the data, while raid456 does
not and relies on xor data

2.1) write new data to raid1/raid10:
Unwritten --StartWrite--> Dirty

2.2) write new data to raid456:
Unwritten --StartWrite--> NeedSync

Because the initial recovery for raid456 is skipped, the xor data is not built
yet, so the bit must be set to NeedSync first; after the lazy initial recovery
is finished, the bit is finally set to Dirty (see 5.1 and 5.4);

2.3) overwrite existing data
Clean --StartWrite--> Dirty

3) daemon, if the array is not degraded:
Dirty --Daemon--> Clean

For a degraded array, Dirty bits are never cleared, which prevents a full-disk
recovery when a removed disk is re-added.

4) discard
{Clean, Dirty, NeedSync, Syncing} --Discard--> Unwritten

5) resync and recover

5.1) common process
NeedSync --Startsync--> Syncing --Endsync--> Dirty --Daemon--> Clean

5.2) resync after power failure
Dirty --Reload--> NeedSync

5.3) recover while replacing with a new disk
With the old bitmap framework, all data is recovered by default; llbitmap
implements this with a new helper, see llbitmap_skip_sync_blocks:

skip recovery for bits other than Dirty or Clean;
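A minimal sketch of this skip rule, assuming one state byte per bit and that the caller wants to know how many consecutive bits, starting at a given bit, recovery may skip. `count_skippable` is hypothetical; the real llbitmap_skip_sync_blocks has its own signature and units, which are not shown here.

```c
#include <stddef.h>

/* Same state names as the tables above; the numeric values are assumed. */
enum bit_state { Unwritten, Clean, Dirty, NeedSync, Syncing };

/*
 * Hypothetical helper: starting at 'bit', count how many consecutive bits
 * recovery can skip, i.e. bits whose state is neither Dirty nor Clean.
 * Returns 0 when the first bit must be recovered.
 */
static size_t count_skippable(const unsigned char *states, size_t nbits,
                              size_t bit)
{
    size_t n = 0;

    while (bit + n < nbits &&
           states[bit + n] != Dirty && states[bit + n] != Clean)
        n++;
    return n;
}
```

Multiplying the returned count by chunksize would give the length of array data the recovery can jump over.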

5.4) lazy initial recovery for raid5:
By default, the old bitmap framework only allows a new recovery when there are
spares (new disks); a new recovery flag MD_RECOVERY_LAZY_RECOVER is added to
perform lazy recovery of set bits for raid456 (from 2.2).

##### Bitmap IO

##### Chunksize

The default bitmap size is 128k, including a 1k bitmap superblock, and the
default size of the data segment represented by each bit (chunksize) is 64k.
chunksize is repeatedly doubled as long as the total number of bits is not
less than 127k (see llbitmap_init).
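The chunksize rule can be sketched as a simple doubling loop. This assumes "k" means 1024, that the usable bit count is the 128k bitmap minus the 1k superblock at one byte per bit, and that the number of bits is the array size divided by chunksize rounded up; `pick_chunksize` is a hypothetical name, and llbitmap_init may differ in detail.

```c
#include <stdint.h>

/* Values from the text above; "k" is assumed to mean 1024. */
#define BITMAP_BYTES  (128ULL * 1024)            /* default bitmap size     */
#define SB_BYTES      (1ULL * 1024)              /* bitmap superblock       */
#define MAX_BITS      (BITMAP_BYTES - SB_BYTES)  /* one byte per bit: 127k  */
#define DEFAULT_CHUNK (64ULL * 1024)             /* default chunksize       */

/*
 * Start at 64k and double chunksize while the number of bits needed to
 * cover the array is not less than 127k.
 */
static uint64_t pick_chunksize(uint64_t array_bytes)
{
    uint64_t chunksize = DEFAULT_CHUNK;

    while ((array_bytes + chunksize - 1) / chunksize >= MAX_BITS)
        chunksize *= 2;
    return chunksize;
}
```

Under these assumptions a 1 GiB array keeps the 64k default, while a 20 GiB array ends up with a 256k chunksize (81920 bits).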

##### READ

When the bitmap is created, all pages are allocated and read for llbitmap;
there are no reads afterwards.

##### WRITE

Bitmap WRITE IO is divided into units of the array's logical_block_size, and
the dirty state of each block is tracked independently. For example:

each page is 4k and contains 8 blocks; each block is 512 bytes and contains
512 bits;

| page0 | page1 | ... | page 31 |
|       |
|        \-----------------------\
|                                |
| block0 | block1 | ... | block7 |
|        |
|         \-----------------\
|                            |
| bit0 | bit1 | ... | bit511 |
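The layout in the diagram can be expressed as index arithmetic. This sketch assumes one byte per bit, a page size that is a multiple of the block size, and that bits start `data_offset` bytes into the bitmap area (for example after the superblock); `struct bit_pos` and `locate_bit` are hypothetical names.

```c
#include <stdint.h>

struct bit_pos {
    uint64_t page;   /* page holding the bit                      */
    unsigned block;  /* block within that page (0..7 for 4k/512)  */
    unsigned offset; /* byte within that block (0..511)           */
};

/* Map a bit index to its page, block and byte within the block. */
static struct bit_pos locate_bit(uint64_t bit, uint64_t data_offset,
                                 unsigned page_size, unsigned block_size)
{
    uint64_t byte = data_offset + bit; /* one byte per bit */
    struct bit_pos pos;

    pos.page = byte / page_size;
    pos.block = (unsigned)((byte % page_size) / block_size);
    pos.offset = (unsigned)(byte % block_size);
    return pos;
}
```

With a 1k data_offset, bit 3072 lands at the start of block0 in page1, so dirtying it requires writing only that one 512-byte block.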

In the IO path, if a bit is changed to Dirty or NeedSync, the corresponding
subpage (block) is marked dirty, and that block must be written before the
IO is issued. This affects IO performance; to reduce the impact, if multiple
bits in the same block are changed within a short time, all bits in that
block are changed to Dirty/NeedSync, so that there is no further overhead
until the daemon clears the dirty bits.
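The block-level heuristic can be modelled for a single 512-byte block as follows. This is a hypothetical, single-threaded sketch: the struct, the time window and its units are assumptions, only the raid1/raid10 Startwrite transitions (Unwritten/Clean to Dirty) are modelled, and raid456 would use NeedSync instead.

```c
#include <stdbool.h>
#include <stdint.h>

#define BITS_PER_BLOCK 512

enum bit_state { Unwritten, Clean, Dirty, NeedSync, Syncing };

/* Hypothetical bookkeeping for one 512-byte bitmap block. */
struct block_model {
    unsigned char bits[BITS_PER_BLOCK]; /* one state byte per bit        */
    bool needs_write;                   /* write block before the IO     */
    bool changed;                       /* a bit changed at least once   */
    uint64_t last_change;               /* time of the last change       */
};

/*
 * Startwrite for bit 'idx' at time 'now' (assumed monotonic). Returns true
 * when the block changed and must be written before the data IO. If another
 * bit changed within 'window', the whole block is dirtied so later writes to
 * it cost nothing extra.
 */
static bool block_start_write(struct block_model *b, unsigned idx,
                              uint64_t now, uint64_t window)
{
    unsigned i;

    /* Per the state table, Startwrite only changes Unwritten/Clean. */
    if (b->bits[idx] != Unwritten && b->bits[idx] != Clean)
        return false; /* fast path: no bitmap update needed */

    if (b->changed && now - b->last_change < window) {
        for (i = 0; i < BITS_PER_BLOCK; i++)
            if (b->bits[i] == Unwritten || b->bits[i] == Clean)
                b->bits[i] = Dirty;
    } else {
        b->bits[idx] = Dirty;
    }
    b->changed = true;
    b->last_change = now;
    b->needs_write = true;
    return true;
}
```

The design trade-off is that dirtying the whole block may cause more resync work after a crash, in exchange for avoiding a bitmap write on every nearby IO.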

##### Dirty Bits Synchronization

The IO fast path sets bits to Dirty, and those dirty bits are cleared by the
daemon after the IO is done. llbitmap_page_ctl is used to synchronize between
the IO path and the daemon;

IO path:
 1) try to grab a reference; on success, set the expire time to 5s later and
 return;
 2) if grabbing a reference fails, wait for the daemon to finish clearing the
 dirty bits;

Daemon (woken up every daemon_sleep seconds):
For each page:
 1) check whether the page has expired; if not, skip this page. For an
 expired page:
 2) suspend the page and wait for inflight write IO to be done;
 3) change the dirty bits in the page to clean;
 4) resume the page;
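The IO path / daemon handshake can be sketched with C11 atomics. The struct below is a hypothetical stand-in for llbitmap_page_ctl, not its real layout. The default sequentially consistent atomics ensure that either the IO path sees the suspended flag and backs off, or the daemon sees the reference and waits. A real implementation would sleep instead of spinning, and a caller whose `page_tryget()` fails would wait for the daemon (IO path step 2) and retry.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for llbitmap_page_ctl; the fields are assumptions. */
struct page_ctl {
    atomic_int refs;       /* inflight writes holding the page          */
    atomic_bool suspended; /* set while the daemon clears dirty bits    */
    atomic_llong expire;   /* time (seconds) after which daemon may run */
};

/* IO path step 1: grab a reference unless the daemon suspended the page. */
static bool page_tryget(struct page_ctl *pc, long long now)
{
    if (atomic_load(&pc->suspended))
        return false;
    atomic_fetch_add(&pc->refs, 1);
    if (atomic_load(&pc->suspended)) { /* raced with the daemon: back off */
        atomic_fetch_sub(&pc->refs, 1);
        return false;
    }
    atomic_store(&pc->expire, now + 5); /* expire 5s after this write */
    return true;
}

/* Drop the reference once the write IO has completed. */
static void page_put(struct page_ctl *pc)
{
    atomic_fetch_sub(&pc->refs, 1);
}

/* Daemon steps 1-4 for one page; returns true if the page was cleaned. */
static bool daemon_clean_page(struct page_ctl *pc, long long now,
                              void (*clear_dirty)(void *), void *arg)
{
    if (now < atomic_load(&pc->expire))
        return false;                    /* 1) not expired: skip       */
    atomic_store(&pc->suspended, true);  /* 2) suspend the page ...    */
    while (atomic_load(&pc->refs) > 0)
        ;                                /*    ... wait for inflight IO */
    clear_dirty(arg);                    /* 3) Dirty -> Clean          */
    atomic_store(&pc->suspended, false); /* 4) resume the page         */
    return true;
}
```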

Performance Test:
A simple fio randwrite test on arrays built from 20GB ramdisks in my VM:

|                      | none      | bitmap    | llbitmap  |
| -------------------- | --------- | --------- | --------- |
| raid1                | 13.7MiB/s | 9696KiB/s | 19.5MiB/s |
| raid1(assume clean)  | 19.5MiB/s | 11.9MiB/s | 19.5MiB/s |
| raid10               | 21.9MiB/s | 11.6MiB/s | 27.8MiB/s |
| raid10(assume clean) | 27.8MiB/s | 15.4MiB/s | 27.8MiB/s |
| raid5                | 14.0MiB/s | 11.6MiB/s | 12.9MiB/s |
| raid5(assume clean)  | 17.8MiB/s | 13.4MiB/s | 13.9MiB/s |

For raid1/raid10, llbitmap can outperform an array without a bitmap when the
latter is running a background initial resync, and it performs the same as no
bitmap when there is no initial resync.

Note that the llbitmap performance improvement for raid5 is not obvious
because raid5 has many other performance bottlenecks; perf results still
show that the bitmap overhead is much lower.

The following branch is available for review or testing:
https://git.kernel.org/pub/scm/linux/kernel/git/yukuai/linux.git/log/?h=yukuai/md-llbitmap

Yu Kuai (23):
  md: add a new parameter 'offset' to md_super_write()
  md: factor out a helper raid_is_456()
  md/md-bitmap: cleanup bitmap_ops->startwrite()
  md/md-bitmap: support discard for bitmap ops
  md/md-bitmap: remove parameter slot from bitmap_create()
  md/md-bitmap: add a new sysfs api bitmap_type
  md/md-bitmap: delay registration of bitmap_ops until creating bitmap
  md/md-bitmap: add a new method skip_sync_blocks() in bitmap_operations
  md/md-bitmap: add a new method blocks_synced() in bitmap_operations
  md: add a new recovery_flag MD_RECOVERY_LAZY_RECOVER
  md/md-bitmap: make method bitmap_ops->daemon_work optional
  md/md-bitmap: add macros for lockless bitmap
  md/md-bitmap: fix dm-raid max_write_behind setting
  md/dm-raid: remove max_write_behind setting limit
  md/md-llbitmap: implement llbitmap IO
  md/md-llbitmap: implement bit state machine
  md/md-llbitmap: implement APIs for page level dirty bits
    synchronization
  md/md-llbitmap: implement APIs to mange bitmap lifetime
  md/md-llbitmap: implement APIs to dirty bits and clear bits
  md/md-llbitmap: implement APIs for sync_thread
  md/md-llbitmap: implement all bitmap operations
  md/md-llbitmap: implement sysfs APIs
  md/md-llbitmap: add Kconfig

 Documentation/admin-guide/md.rst |   80 +-
 drivers/md/Kconfig               |   11 +
 drivers/md/Makefile              |    2 +-
 drivers/md/dm-raid.c             |    6 +-
 drivers/md/md-bitmap.c           |   50 +-
 drivers/md/md-bitmap.h           |   55 +-
 drivers/md/md-llbitmap.c         | 1556 ++++++++++++++++++++++++++++++
 drivers/md/md.c                  |  247 +++--
 drivers/md/md.h                  |   20 +-
 drivers/md/raid5.c               |    6 +
 10 files changed, 1901 insertions(+), 132 deletions(-)
 create mode 100644 drivers/md/md-llbitmap.c




Thread overview: 108+ messages
2025-05-24  6:12 [PATCH 00/23] md/llbitmap: md/md-llbitmap: introduce a new lockless bitmap Yu Kuai
2025-05-24  6:12 ` [PATCH 01/23] md: add a new parameter 'offset' to md_super_write() Yu Kuai
2025-05-25 15:50   ` Xiao Ni
2025-05-26  6:28   ` Christoph Hellwig
2025-05-26  7:28     ` Yu Kuai
2025-05-27  5:54   ` Hannes Reinecke
2025-05-24  6:12 ` [PATCH 02/23] md: factor out a helper raid_is_456() Yu Kuai
2025-05-25 15:50   ` Xiao Ni
2025-05-26  6:28   ` Christoph Hellwig
2025-05-27  5:55   ` Hannes Reinecke
2025-05-24  6:13 ` [PATCH 03/23] md/md-bitmap: cleanup bitmap_ops->startwrite() Yu Kuai
2025-05-25 15:51   ` Xiao Ni
2025-05-26  6:29   ` Christoph Hellwig
2025-05-27  5:56   ` Hannes Reinecke
2025-05-24  6:13 ` [PATCH 04/23] md/md-bitmap: support discard for bitmap ops Yu Kuai
2025-05-25 15:53   ` Xiao Ni
2025-05-26  6:29   ` Christoph Hellwig
2025-05-27  6:01   ` Hannes Reinecke
2025-05-28  7:04   ` Glass Su
2025-05-24  6:13 ` [PATCH 05/23] md/md-bitmap: remove parameter slot from bitmap_create() Yu Kuai
2025-05-25 16:09   ` Xiao Ni
2025-05-26  6:30   ` Christoph Hellwig
2025-05-27  6:01   ` Hannes Reinecke
2025-05-24  6:13 ` [PATCH 06/23] md/md-bitmap: add a new sysfs api bitmap_type Yu Kuai
2025-05-25 16:32   ` Xiao Ni
2025-05-26  1:13     ` Yu Kuai
2025-05-26  5:11       ` Xiao Ni
2025-05-26  8:02         ` Yu Kuai
2025-05-26  6:32   ` Christoph Hellwig
2025-05-26  7:45     ` Yu Kuai
2025-05-27  8:21       ` Christoph Hellwig
2025-05-27  6:10   ` Hannes Reinecke
2025-05-27  7:43     ` Yu Kuai
2025-05-24  6:13 ` [PATCH 07/23] md/md-bitmap: delay registration of bitmap_ops until creating bitmap Yu Kuai
2025-05-26  6:32   ` Christoph Hellwig
2025-05-26  6:52   ` Xiao Ni
2025-05-26  7:57     ` Yu Kuai
2025-05-27  2:15       ` Xiao Ni
2025-05-27  2:49         ` Yu Kuai
2025-05-27  6:13   ` Hannes Reinecke
2025-05-27  7:53     ` Yu Kuai
2025-05-27  8:54       ` Hannes Reinecke
2025-05-24  6:13 ` [PATCH 08/23] md/md-bitmap: add a new method skip_sync_blocks() in bitmap_operations Yu Kuai
2025-05-26  7:03   ` Xiao Ni
2025-05-27  6:14   ` Hannes Reinecke
2025-05-24  6:13 ` [PATCH 09/23] md/md-bitmap: add a new method blocks_synced() " Yu Kuai
2025-05-27  2:35   ` Xiao Ni
2025-05-27  2:48     ` Yu Kuai
2025-05-27  6:16   ` Hannes Reinecke
2025-05-24  6:13 ` [PATCH 10/23] md: add a new recovery_flag MD_RECOVERY_LAZY_RECOVER Yu Kuai
2025-05-27  6:17   ` Hannes Reinecke
2025-05-27  8:00     ` Yu Kuai
2025-05-24  6:13 ` [PATCH 11/23] md/md-bitmap: make method bitmap_ops->daemon_work optional Yu Kuai
2025-05-26  6:34   ` Christoph Hellwig
2025-05-27  6:19   ` Hannes Reinecke
2025-05-27  8:03     ` Yu Kuai
2025-05-27  8:55       ` Hannes Reinecke
2025-05-24  6:13 ` [PATCH 12/23] md/md-bitmap: add macros for lockless bitmap Yu Kuai
2025-05-26  6:40   ` Christoph Hellwig
2025-05-26  8:12     ` Yu Kuai
2025-05-27  8:22       ` Christoph Hellwig
2025-05-27  6:21   ` Hannes Reinecke
2025-05-28  4:53   ` Xiao Ni
2025-05-24  6:13 ` [PATCH 13/23] md/md-bitmap: fix dm-raid max_write_behind setting Yu Kuai
2025-05-26  6:40   ` Christoph Hellwig
2025-05-27  6:21   ` Hannes Reinecke
2025-05-24  6:13 ` [PATCH 14/23] md/dm-raid: remove max_write_behind setting limit Yu Kuai
2025-05-26  6:41   ` Christoph Hellwig
2025-05-27  6:26   ` Hannes Reinecke
2025-05-28  4:58   ` Xiao Ni
2025-05-24  6:13 ` [PATCH 15/23] md/md-llbitmap: implement llbitmap IO Yu Kuai
2025-05-27  8:27   ` Christoph Hellwig
2025-05-27  8:55     ` Yu Kuai
2025-05-27  8:58       ` Yu Kuai
2025-06-06  3:21   ` Xiao Ni
2025-06-06  3:48     ` Yu Kuai
2025-06-06  6:24       ` Xiao Ni
2025-06-06  8:56         ` Yu Kuai
2025-06-30  2:07   ` Xiao Ni
2025-06-30  2:17     ` Yu Kuai
2025-05-24  6:13 ` [PATCH 16/23] md/md-llbitmap: implement bit state machine Yu Kuai
2025-06-30  2:14   ` Xiao Ni
2025-06-30  2:25     ` Yu Kuai
2025-06-30  8:25       ` Xiao Ni
2025-06-30 11:05         ` Yu Kuai
2025-06-30 11:30           ` Yu Kuai
2025-07-01  1:55           ` Xiao Ni
2025-07-01  2:02             ` Yu Kuai
2025-07-01  2:31               ` Xiao Ni
2025-05-24  6:13 ` [PATCH 17/23] md/md-llbitmap: implement APIs for page level dirty bits synchronization Yu Kuai
2025-05-24  6:13 ` [PATCH 18/23] md/md-llbitmap: implement APIs to mange bitmap lifetime Yu Kuai
2025-05-29  7:03   ` Xiao Ni
2025-05-29  9:03     ` Yu Kuai
2025-05-24  6:13 ` [PATCH 19/23] md/md-llbitmap: implement APIs to dirty bits and clear bits Yu Kuai
2025-05-24  6:13 ` [PATCH 20/23] md/md-llbitmap: implement APIs for sync_thread Yu Kuai
2025-05-24  6:13 ` [PATCH 21/23] md/md-llbitmap: implement all bitmap operations Yu Kuai
2025-05-24  6:13 ` [PATCH 22/23] md/md-llbitmap: implement sysfs APIs Yu Kuai
2025-05-24  6:13 ` [PATCH 23/23] md/md-llbitmap: add Kconfig Yu Kuai
2025-05-27  8:29   ` Christoph Hellwig
2025-05-27  9:00     ` Yu Kuai
2025-05-24  7:07 ` [PATCH 00/23] md/llbitmap: md/md-llbitmap: introduce a new lockless bitmap Yu Kuai
2025-05-30  6:45 ` Yu Kuai
2025-06-30  1:59 ` Xiao Ni
2025-06-30  2:34   ` Yu Kuai
2025-06-30  3:25     ` Xiao Ni
2025-06-30  3:46       ` Yu Kuai
2025-06-30  5:38         ` Xiao Ni
2025-06-30  6:09           ` Yu Kuai
