linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Xiao Ni <xni@redhat.com>
To: Yu Kuai <yukuai1@huaweicloud.com>,
	hch@lst.de, colyli@kernel.org, song@kernel.org
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-raid@vger.kernel.org, yi.zhang@huawei.com,
	yangerkun@huawei.com, johnny.chenyi@huawei.com,
	"yukuai (C)" <yukuai3@huawei.com>
Subject: Re: [PATCH 15/23] md/md-llbitmap: implement llbitmap IO
Date: Fri, 6 Jun 2025 14:24:21 +0800	[thread overview]
Message-ID: <89b1283c-c256-4830-96dd-ef9e5a7ce355@redhat.com> (raw)
In-Reply-To: <7ef969ed-8468-5d63-a08d-886ca853f772@huaweicloud.com>


在 2025/6/6 上午11:48, Yu Kuai 写道:
> Hi,
>
> 在 2025/06/06 11:21, Xiao Ni 写道:
>> Hi Kuai
>>
>> I've read some codes of llbitmap, but I don't figure out the 
>> relationship of in memory bits and in storage bits. Does llbitmap 
>> have the two types as old bitmap? For example, in llbitmap_create, 
>> there is a argument ->bits_per_page which is calculated by 
>> PAGE_SIZE/logical_block_size. As the graph bellow, bits_per_page is 8 
>> (4K/512byte). What does the bit mean? And in the graph below, it 
>> talks 512 bits in one block, what does this bit mean?  I haven't 
>> walked through all codes, maybe I can get the answer myself. If you 
>> can give a summary of how many types of bit and what's the usage of 
>> the bit, it can help to understand it easier.
>
> llbitmap bit is always 1 byte, it's the same in memory and on disk.


I c, thanks for the explanation.

>
> bits_per_page bit is used to track dirty sectors in the memory page.
>
> For example, usually 4k page will contain 8 sectors, each sector is 512
> bytes, if one llbitmap bit is dirty, then the related bits_per_page bit
> will be set as well, and later will write the sector to disk.


Maybe consider another name of bits_per_page? bits_per_page can easily 
let people to think the bitmat bits in one page. Through the graph 
below, maybe blocks_per_page?

Regards

Xiao

>
> Thanks,
> kuai
>
>>
>> Best Regards
>>
>> Xiao
>>
>> 在 2025/5/24 下午2:13, Yu Kuai 写道:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> READ
>>>
>>> While creating bitmap, all pages will be allocated and read for 
>>> llbitmap,
>>> there won't be read afterwards
>>>
>>> WRITE
>>>
>>> WRITE IO is divided into logical_block_size of the page, the dirty 
>>> state
>>> of each block is tracked independently, for example:
>>>
>>> each page is 4k, contain 8 blocks; each block is 512 bytes contain 
>>> 512 bit;
>>>
>>> | page0 | page1 | ... | page 31 |
>>> |       |
>>> |        \-----------------------\
>>> |                                |
>>> | block0 | block1 | ... | block 8|
>>> |        |
>>> |         \-----------------\
>>> |                            |
>>> | bit0 | bit1 | ... | bit511 |
>>>
>>>  From IO path, if one bit is changed to Dirty or NeedSync, the 
>>> corresponding
>>> subpage will be marked dirty, such block must write first before the 
>>> IO is
>>> issued. This behaviour will affect IO performance, to reduce the 
>>> impact, if
>>> multiple bits are changed in the same block in a short time, all 
>>> bits in
>>> this block will be changed to Dirty/NeedSync, so that there won't be 
>>> any
>>> overhead until daemon clears dirty bits.
>>>
>>> Also add data structure definition and comments.
>>>
>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>> ---
>>>   drivers/md/md-llbitmap.c | 571 
>>> +++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 571 insertions(+)
>>>   create mode 100644 drivers/md/md-llbitmap.c
>>>
>>> diff --git a/drivers/md/md-llbitmap.c b/drivers/md/md-llbitmap.c
>>> new file mode 100644
>>> index 000000000000..1a01b6777527
>>> --- /dev/null
>>> +++ b/drivers/md/md-llbitmap.c
>>> @@ -0,0 +1,571 @@
>>> +// SPDX-License-Identifier: GPL-2.0-or-later
>>> +
>>> +#ifdef CONFIG_MD_LLBITMAP
>>> +
>>> +#include <linux/blkdev.h>
>>> +#include <linux/module.h>
>>> +#include <linux/errno.h>
>>> +#include <linux/slab.h>
>>> +#include <linux/init.h>
>>> +#include <linux/timer.h>
>>> +#include <linux/sched.h>
>>> +#include <linux/list.h>
>>> +#include <linux/file.h>
>>> +#include <linux/seq_file.h>
>>> +#include <trace/events/block.h>
>>> +
>>> +#include "md.h"
>>> +#include "md-bitmap.h"
>>> +
>>> +/*
>>> + * #### Background
>>> + *
>>> + * Redundant data is used to enhance data fault tolerance, and the 
>>> storage
>>> + * method for redundant data vary depending on the RAID levels. And 
>>> it's
>>> + * important to maintain the consistency of redundant data.
>>> + *
>>> + * Bitmap is used to record which data blocks have been 
>>> synchronized and which
>>> + * ones need to be resynchronized or recovered. Each bit in the bitmap
>>> + * represents a segment of data in the array. When a bit is set, it 
>>> indicates
>>> + * that the multiple redundant copies of that data segment may not be
>>> + * consistent. Data synchronization can be performed based on the 
>>> bitmap after
>>> + * power failure or readding a disk. If there is no bitmap, a full 
>>> disk
>>> + * synchronization is required.
>>> + *
>>> + * #### Key Features
>>> + *
>>> + *  - IO fastpath is lockless, if user issues lots of write IO to 
>>> the same
>>> + *  bitmap bit in a short time, only the first write have 
>>> additional overhead
>>> + *  to update bitmap bit, no additional overhead for the following 
>>> writes;
>>> + *  - support only resync or recover written data, means in the 
>>> case creating
>>> + *  new array or replacing with a new disk, there is no need to do 
>>> a full disk
>>> + *  resync/recovery;
>>> + *
>>> + * #### Key Concept
>>> + *
>>> + * ##### State Machine
>>> + *
>>> + * Each bit is one byte, contain 6 difference state, see 
>>> llbitmap_state. And
>>> + * there are total 8 differenct actions, see llbitmap_action, can 
>>> change state:
>>> + *
>>> + * llbitmap state machine: transitions between states
>>> + *
>>> + * |           | Startwrite | Startsync | Endsync | Abortsync|
>>> + * | --------- | ---------- | --------- | ------- | ------- |
>>> + * | Unwritten | Dirty      | x         | x       | x |
>>> + * | Clean     | Dirty      | x         | x       | x |
>>> + * | Dirty     | x          | x         | x       | x |
>>> + * | NeedSync  | x          | Syncing   | x       | x |
>>> + * | Syncing   | x          | Syncing   | Dirty   | NeedSync |
>>> + *
>>> + * |           | Reload   | Daemon | Discard   | Stale     |
>>> + * | --------- | -------- | ------ | --------- | --------- |
>>> + * | Unwritten | x        | x      | x         | x         |
>>> + * | Clean     | x        | x      | Unwritten | NeedSync  |
>>> + * | Dirty     | NeedSync | Clean  | Unwritten | NeedSync  |
>>> + * | NeedSync  | x        | x      | Unwritten | x         |
>>> + * | Syncing   | NeedSync | x      | Unwritten | NeedSync  |
>>> + *
>>> + * Typical scenarios:
>>> + *
>>> + * 1) Create new array
>>> + * All bits will be set to Unwritten by default, if --assume-clean 
>>> is set,
>>> + * all bits will be set to Clean instead.
>>> + *
>>> + * 2) write data, raid1/raid10 have full copy of data, while 
>>> raid456 doesn't and
>>> + * rely on xor data
>>> + *
>>> + * 2.1) write new data to raid1/raid10:
>>> + * Unwritten --StartWrite--> Dirty
>>> + *
>>> + * 2.2) write new data to raid456:
>>> + * Unwritten --StartWrite--> NeedSync
>>> + *
>>> + * Because the initial recover for raid456 is skipped, the xor data 
>>> is not build
>>> + * yet, the bit must set to NeedSync first and after lazy initial 
>>> recover is
>>> + * finished, the bit will finially set to Dirty(see 5.1 and 5.4);
>>> + *
>>> + * 2.3) cover write
>>> + * Clean --StartWrite--> Dirty
>>> + *
>>> + * 3) daemon, if the array is not degraded:
>>> + * Dirty --Daemon--> Clean
>>> + *
>>> + * For degraded array, the Dirty bit will never be cleared, prevent 
>>> full disk
>>> + * recovery while readding a removed disk.
>>> + *
>>> + * 4) discard
>>> + * {Clean, Dirty, NeedSync, Syncing} --Discard--> Unwritten
>>> + *
>>> + * 5) resync and recover
>>> + *
>>> + * 5.1) common process
>>> + * NeedSync --Startsync--> Syncing --Endsync--> Dirty --Daemon--> 
>>> Clean
>>> + *
>>> + * 5.2) resync after power failure
>>> + * Dirty --Reload--> NeedSync
>>> + *
>>> + * 5.3) recover while replacing with a new disk
>>> + * By default, the old bitmap framework will recover all data, and 
>>> llbitmap
>>> + * implement this by a new helper, see llbitmap_skip_sync_blocks:
>>> + *
>>> + * skip recover for bits other than dirty or clean;
>>> + *
>>> + * 5.4) lazy initial recover for raid5:
>>> + * By default, the old bitmap framework will only allow new recover 
>>> when there
>>> + * are spares(new disk), a new recovery flag 
>>> MD_RECOVERY_LAZY_RECOVER is add
>>> + * to perform raid456 lazy recover for set bits(from 2.2).
>>> + *
>>> + * ##### Bitmap IO
>>> + *
>>> + * ##### Chunksize
>>> + *
>>> + * The default bitmap size is 128k, incluing 1k bitmap super block, 
>>> and
>>> + * the default size of segment of data in the array each 
>>> bit(chunksize) is 64k,
>>> + * and chunksize will adjust to twice the old size each time if the 
>>> total number
>>> + * bits is not less than 127k.(see llbitmap_init)
>>> + *
>>> + * ##### READ
>>> + *
>>> + * While creating bitmap, all pages will be allocated and read for 
>>> llbitmap,
>>> + * there won't be read afterwards
>>> + *
>>> + * ##### WRITE
>>> + *
>>> + * WRITE IO is divided into logical_block_size of the array, the 
>>> dirty state
>>> + * of each block is tracked independently, for example:
>>> + *
>>> + * each page is 4k, contain 8 blocks; each block is 512 bytes 
>>> contain 512 bit;
>>> + *
>>> + * | page0 | page1 | ... | page 31 |
>>> + * |       |
>>> + * |        \-----------------------\
>>> + * |                                |
>>> + * | block0 | block1 | ... | block 8|
>>> + * |        |
>>> + * |         \-----------------\
>>> + * |                            |
>>> + * | bit0 | bit1 | ... | bit511 |
>>> + *
>>> + * From IO path, if one bit is changed to Dirty or NeedSync, the 
>>> corresponding
>>> + * subpage will be marked dirty, such block must write first before 
>>> the IO is
>>> + * issued. This behaviour will affect IO performance, to reduce the 
>>> impact, if
>>> + * multiple bits are changed in the same block in a short time, all 
>>> bits in this
>>> + * block will be changed to Dirty/NeedSync, so that there won't be 
>>> any overhead
>>> + * until daemon clears dirty bits.
>>> + *
>>> + * ##### Dirty Bits syncronization
>>> + *
>>> + * IO fast path will set bits to dirty, and those dirty bits will 
>>> be cleared
>>> + * by daemon after IO is done. llbitmap_page_ctl is used to 
>>> synchronize between
>>> + * IO path and daemon;
>>> + *
>>> + * IO path:
>>> + *  1) try to grab a reference, if succeed, set expire time after 
>>> 5s and return;
>>> + *  2) if failed to grab a reference, wait for daemon to finish 
>>> clearing dirty
>>> + *  bits;
>>> + *
>>> + * Daemon(Daemon will be waken up every daemon_sleep seconds):
>>> + * For each page:
>>> + *  1) check if page expired, if not skip this page; for expired page:
>>> + *  2) suspend the page and wait for inflight write IO to be done;
>>> + *  3) change dirty page to clean;
>>> + *  4) resume the page;
>>> + */
>>> +
>>> +#define BITMAP_SB_SIZE 1024
>>> +
>>> +/* 64k is the max IO size of sync IO for raid1/raid10 */
>>> +#define MIN_CHUNK_SIZE (64 * 2)
>>> +
>>> +/* By default, daemon will be waken up every 30s */
>>> +#define DEFAULT_DAEMON_SLEEP 30
>>> +
>>> +/*
>>> + * Dirtied bits that have not been accessed for more than 5s will 
>>> be cleared
>>> + * by daemon.
>>> + */
>>> +#define BARRIER_IDLE 5
>>> +
>>> +enum llbitmap_state {
>>> +    /* No valid data, init state after assemble the array */
>>> +    BitUnwritten = 0,
>>> +    /* data is consistent */
>>> +    BitClean,
>>> +    /* data will be consistent after IO is done, set directly for 
>>> writes */
>>> +    BitDirty,
>>> +    /*
>>> +     * data need to be resynchronized:
>>> +     * 1) set directly for writes if array is degraded, prevent 
>>> full disk
>>> +     * synchronization after readding a disk;
>>> +     * 2) reassemble the array after power failure, and dirty bits are
>>> +     * found after reloading the bitmap;
>>> +     * 3) set for first write for raid5, to build initial xor data 
>>> lazily
>>> +     */
>>> +    BitNeedSync,
>>> +    /* data is synchronizing */
>>> +    BitSyncing,
>>> +    nr_llbitmap_state,
>>> +    BitNone = 0xff,
>>> +};
>>> +
>>> +enum llbitmap_action {
>>> +    /* User write new data, this is the only action from IO fast 
>>> path */
>>> +    BitmapActionStartwrite = 0,
>>> +    /* Start recovery */
>>> +    BitmapActionStartsync,
>>> +    /* Finish recovery */
>>> +    BitmapActionEndsync,
>>> +    /* Failed recovery */
>>> +    BitmapActionAbortsync,
>>> +    /* Reassemble the array */
>>> +    BitmapActionReload,
>>> +    /* Daemon thread is trying to clear dirty bits */
>>> +    BitmapActionDaemon,
>>> +    /* Data is deleted */
>>> +    BitmapActionDiscard,
>>> +    /*
>>> +     * Bitmap is stale, mark all bits in addition to BitUnwritten to
>>> +     * BitNeedSync.
>>> +     */
>>> +    BitmapActionStale,
>>> +    nr_llbitmap_action,
>>> +    /* Init state is BitUnwritten */
>>> +    BitmapActionInit,
>>> +};
>>> +
>>> +enum llbitmap_page_state {
>>> +    LLPageFlush = 0,
>>> +    LLPageDirty,
>>> +};
>>> +
>>> +struct llbitmap_page_ctl {
>>> +    char *state;
>>> +    struct page *page;
>>> +    unsigned long expire;
>>> +    unsigned long flags;
>>> +    wait_queue_head_t wait;
>>> +    struct percpu_ref active;
>>> +    /* Per block size dirty state, maximum 64k page / 1 sector = 
>>> 128 */
>>> +    unsigned long dirty[];
>>> +};
>>> +
>>> +struct llbitmap {
>>> +    struct mddev *mddev;
>>> +    struct llbitmap_page_ctl **pctl;
>>> +
>>> +    unsigned int nr_pages;
>>> +    unsigned int io_size;
>>> +    unsigned int bits_per_page;
>>> +
>>> +    /* shift of one chunk */
>>> +    unsigned long chunkshift;
>>> +    /* size of one chunk in sector */
>>> +    unsigned long chunksize;
>>> +    /* total number of chunks */
>>> +    unsigned long chunks;
>>> +    unsigned long last_end_sync;
>>> +    /* fires on first BitDirty state */
>>> +    struct timer_list pending_timer;
>>> +    struct work_struct daemon_work;
>>> +
>>> +    unsigned long flags;
>>> +    __u64    events_cleared;
>>> +
>>> +    /* for slow disks */
>>> +    atomic_t behind_writes;
>>> +    wait_queue_head_t behind_wait;
>>> +};
>>> +
>>> +struct llbitmap_unplug_work {
>>> +    struct work_struct work;
>>> +    struct llbitmap *llbitmap;
>>> +    struct completion *done;
>>> +};
>>> +
>>> +static struct workqueue_struct *md_llbitmap_io_wq;
>>> +static struct workqueue_struct *md_llbitmap_unplug_wq;
>>> +
>>> +static char state_machine[nr_llbitmap_state][nr_llbitmap_action] = {
>>> +    [BitUnwritten] = {
>>> +        [BitmapActionStartwrite]    = BitDirty,
>>> +        [BitmapActionStartsync]        = BitNone,
>>> +        [BitmapActionEndsync]        = BitNone,
>>> +        [BitmapActionAbortsync]        = BitNone,
>>> +        [BitmapActionReload]        = BitNone,
>>> +        [BitmapActionDaemon]        = BitNone,
>>> +        [BitmapActionDiscard]        = BitNone,
>>> +        [BitmapActionStale]        = BitNone,
>>> +    },
>>> +    [BitClean] = {
>>> +        [BitmapActionStartwrite]    = BitDirty,
>>> +        [BitmapActionStartsync]        = BitNone,
>>> +        [BitmapActionEndsync]        = BitNone,
>>> +        [BitmapActionAbortsync]        = BitNone,
>>> +        [BitmapActionReload]        = BitNone,
>>> +        [BitmapActionDaemon]        = BitNone,
>>> +        [BitmapActionDiscard]        = BitUnwritten,
>>> +        [BitmapActionStale]        = BitNeedSync,
>>> +    },
>>> +    [BitDirty] = {
>>> +        [BitmapActionStartwrite]    = BitNone,
>>> +        [BitmapActionStartsync]        = BitNone,
>>> +        [BitmapActionEndsync]        = BitNone,
>>> +        [BitmapActionAbortsync]        = BitNone,
>>> +        [BitmapActionReload]        = BitNeedSync,
>>> +        [BitmapActionDaemon]        = BitClean,
>>> +        [BitmapActionDiscard]        = BitUnwritten,
>>> +        [BitmapActionStale]        = BitNeedSync,
>>> +    },
>>> +    [BitNeedSync] = {
>>> +        [BitmapActionStartwrite]    = BitNone,
>>> +        [BitmapActionStartsync]        = BitSyncing,
>>> +        [BitmapActionEndsync]        = BitNone,
>>> +        [BitmapActionAbortsync]        = BitNone,
>>> +        [BitmapActionReload]        = BitNone,
>>> +        [BitmapActionDaemon]        = BitNone,
>>> +        [BitmapActionDiscard]        = BitUnwritten,
>>> +        [BitmapActionStale]        = BitNone,
>>> +    },
>>> +    [BitSyncing] = {
>>> +        [BitmapActionStartwrite]    = BitNone,
>>> +        [BitmapActionStartsync]        = BitSyncing,
>>> +        [BitmapActionEndsync]        = BitDirty,
>>> +        [BitmapActionAbortsync]        = BitNeedSync,
>>> +        [BitmapActionReload]        = BitNeedSync,
>>> +        [BitmapActionDaemon]        = BitNone,
>>> +        [BitmapActionDiscard]        = BitUnwritten,
>>> +        [BitmapActionStale]        = BitNeedSync,
>>> +    },
>>> +};
>>> +
>>> +static enum llbitmap_state llbitmap_read(struct llbitmap *llbitmap, 
>>> loff_t pos)
>>> +{
>>> +    unsigned int idx;
>>> +    unsigned int offset;
>>> +
>>> +    pos += BITMAP_SB_SIZE;
>>> +    idx = pos >> PAGE_SHIFT;
>>> +    offset = offset_in_page(pos);
>>> +
>>> +    return llbitmap->pctl[idx]->state[offset];
>>> +}
>>> +
>>> +/* set all the bits in the subpage as dirty */
>>> +static void llbitmap_infect_dirty_bits(struct llbitmap *llbitmap,
>>> +                       struct llbitmap_page_ctl *pctl,
>>> +                       unsigned int bit, unsigned int offset)
>>> +{
>>> +    bool level_456 = raid_is_456(llbitmap->mddev);
>>> +    unsigned int io_size = llbitmap->io_size;
>>> +    int pos;
>>> +
>>> +    for (pos = bit * io_size; pos < (bit + 1) * io_size; pos++) {
>>> +        if (pos == offset)
>>> +            continue;
>>> +
>>> +        switch (pctl->state[pos]) {
>>> +        case BitUnwritten:
>>> +            pctl->state[pos] = level_456 ? BitNeedSync : BitDirty;
>>> +            break;
>>> +        case BitClean:
>>> +            pctl->state[pos] = BitDirty;
>>> +            break;
>>> +        };
>>> +    }
>>> +
>>> +}
>>> +
>>> +static void llbitmap_set_page_dirty(struct llbitmap *llbitmap, int 
>>> idx,
>>> +                    int offset)
>>> +{
>>> +    struct llbitmap_page_ctl *pctl = llbitmap->pctl[idx];
>>> +    unsigned int io_size = llbitmap->io_size;
>>> +    int bit = offset / io_size;
>>> +    int pos;
>>> +
>>> +    if (!test_bit(LLPageDirty, &pctl->flags))
>>> +        set_bit(LLPageDirty, &pctl->flags);
>>> +
>>> +    /*
>>> +     * The subpage usually contains a total of 512 bits. If any 
>>> single bit
>>> +     * within the subpage is marked as dirty, the entire sector 
>>> will be
>>> +     * written. To avoid impacting write performance, when multiple 
>>> bits
>>> +     * within the same sector are modified within a short time 
>>> frame, all
>>> +     * bits in the sector will be collectively marked as dirty at 
>>> once.
>>> +     */
>>> +    if (test_and_set_bit(bit, pctl->dirty)) {
>>> +        llbitmap_infect_dirty_bits(llbitmap, pctl, bit, offset);
>>> +        return;
>>> +    }
>>> +
>>> +    for (pos = bit * io_size; pos < (bit + 1) * io_size; pos++) {
>>> +        if (pos == offset)
>>> +            continue;
>>> +        if (pctl->state[pos] == BitDirty ||
>>> +            pctl->state[pos] == BitNeedSync) {
>>> +            llbitmap_infect_dirty_bits(llbitmap, pctl, bit, offset);
>>> +            return;
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +static void llbitmap_write(struct llbitmap *llbitmap, enum 
>>> llbitmap_state state,
>>> +               loff_t pos)
>>> +{
>>> +    unsigned int idx;
>>> +    unsigned int offset;
>>> +
>>> +    pos += BITMAP_SB_SIZE;
>>> +    idx = pos >> PAGE_SHIFT;
>>> +    offset = offset_in_page(pos);
>>> +
>>> +    llbitmap->pctl[idx]->state[offset] = state;
>>> +    if (state == BitDirty || state == BitNeedSync)
>>> +        llbitmap_set_page_dirty(llbitmap, idx, offset);
>>> +}
>>> +
>>> +static struct page *llbitmap_read_page(struct llbitmap *llbitmap, 
>>> int idx)
>>> +{
>>> +    struct mddev *mddev = llbitmap->mddev;
>>> +    struct page *page = NULL;
>>> +    struct md_rdev *rdev;
>>> +
>>> +    if (llbitmap->pctl && llbitmap->pctl[idx])
>>> +        page = llbitmap->pctl[idx]->page;
>>> +    if (page)
>>> +        return page;
>>> +
>>> +    page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>> +    if (!page)
>>> +        return ERR_PTR(-ENOMEM);
>>> +
>>> +    rdev_for_each(rdev, mddev) {
>>> +        sector_t sector;
>>> +
>>> +        if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags))
>>> +            continue;
>>> +
>>> +        sector = mddev->bitmap_info.offset +
>>> +             (idx << PAGE_SECTORS_SHIFT);
>>> +
>>> +        if (sync_page_io(rdev, sector, PAGE_SIZE, page, REQ_OP_READ,
>>> +                 true))
>>> +            return page;
>>> +
>>> +        md_error(mddev, rdev);
>>> +    }
>>> +
>>> +    __free_page(page);
>>> +    return ERR_PTR(-EIO);
>>> +}
>>> +
>>> +static void llbitmap_write_page(struct llbitmap *llbitmap, int idx)
>>> +{
>>> +    struct page *page = llbitmap->pctl[idx]->page;
>>> +    struct mddev *mddev = llbitmap->mddev;
>>> +    struct md_rdev *rdev;
>>> +    int bit;
>>> +
>>> +    for (bit = 0; bit < llbitmap->bits_per_page; bit++) {
>>> +        struct llbitmap_page_ctl *pctl = llbitmap->pctl[idx];
>>> +
>>> +        if (!test_and_clear_bit(bit, pctl->dirty))
>>> +            continue;
>>> +
>>> +        rdev_for_each(rdev, mddev) {
>>> +            sector_t sector;
>>> +            sector_t bit_sector = llbitmap->io_size >> SECTOR_SHIFT;
>>> +
>>> +            if (rdev->raid_disk < 0 || test_bit(Faulty, &rdev->flags))
>>> +                continue;
>>> +
>>> +            sector = mddev->bitmap_info.offset + rdev->sb_start +
>>> +                 (idx << PAGE_SECTORS_SHIFT) +
>>> +                 bit * bit_sector;
>>> +            md_write_metadata(mddev, rdev, sector,
>>> +                      llbitmap->io_size, page,
>>> +                      bit * llbitmap->io_size);
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +static void active_release(struct percpu_ref *ref)
>>> +{
>>> +    struct llbitmap_page_ctl *pctl =
>>> +        container_of(ref, struct llbitmap_page_ctl, active);
>>> +
>>> +    wake_up(&pctl->wait);
>>> +}
>>> +
>>> +static void llbitmap_free_pages(struct llbitmap *llbitmap)
>>> +{
>>> +    int i;
>>> +
>>> +    if (!llbitmap->pctl)
>>> +        return;
>>> +
>>> +    for (i = 0; i < llbitmap->nr_pages; i++) {
>>> +        struct llbitmap_page_ctl *pctl = llbitmap->pctl[i];
>>> +
>>> +        if (!pctl || !pctl->page)
>>> +            break;
>>> +
>>> +        __free_page(pctl->page);
>>> +        percpu_ref_exit(&pctl->active);
>>> +    }
>>> +
>>> +    kfree(llbitmap->pctl[0]);
>>> +    kfree(llbitmap->pctl);
>>> +    llbitmap->pctl = NULL;
>>> +}
>>> +
>>> +static int llbitmap_cache_pages(struct llbitmap *llbitmap)
>>> +{
>>> +    struct llbitmap_page_ctl *pctl;
>>> +    unsigned int nr_pages = DIV_ROUND_UP(llbitmap->chunks + 
>>> BITMAP_SB_SIZE,
>>> +                         PAGE_SIZE);
>>> +    unsigned int size = struct_size(pctl, dirty,
>>> + BITS_TO_LONGS(llbitmap->bits_per_page));
>>> +    int i;
>>> +
>>> +    llbitmap->pctl = kmalloc_array(nr_pages, sizeof(void *),
>>> +                       GFP_KERNEL | __GFP_ZERO);
>>> +    if (!llbitmap->pctl)
>>> +        return -ENOMEM;
>>> +
>>> +    size = round_up(size, cache_line_size());
>>> +    pctl = kmalloc_array(nr_pages, size, GFP_KERNEL | __GFP_ZERO);
>>> +    if (!pctl) {
>>> +        kfree(llbitmap->pctl);
>>> +        return -ENOMEM;
>>> +    }
>>> +
>>> +    llbitmap->nr_pages = nr_pages;
>>> +
>>> +    for (i = 0; i < nr_pages; i++, pctl = (void *)pctl + size) {
>>> +        struct page *page = llbitmap_read_page(llbitmap, i);
>>> +
>>> +        llbitmap->pctl[i] = pctl;
>>> +
>>> +        if (IS_ERR(page)) {
>>> +            llbitmap_free_pages(llbitmap);
>>> +            return PTR_ERR(page);
>>> +        }
>>> +
>>> +        if (percpu_ref_init(&pctl->active, active_release,
>>> +                    PERCPU_REF_ALLOW_REINIT, GFP_KERNEL)) {
>>> +            __free_page(page);
>>> +            llbitmap_free_pages(llbitmap);
>>> +            return -ENOMEM;
>>> +        }
>>> +
>>> +        pctl->page = page;
>>> +        pctl->state = page_address(page);
>>> +        init_waitqueue_head(&pctl->wait);
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +#endif /* CONFIG_MD_LLBITMAP */
>>
>>
>> .
>>
>


  reply	other threads:[~2025-06-06  6:24 UTC|newest]

Thread overview: 108+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-24  6:12 [PATCH 00/23] md/llbitmap: md/md-llbitmap: introduce a new lockless bitmap Yu Kuai
2025-05-24  6:12 ` [PATCH 01/23] md: add a new parameter 'offset' to md_super_write() Yu Kuai
2025-05-25 15:50   ` Xiao Ni
2025-05-26  6:28   ` Christoph Hellwig
2025-05-26  7:28     ` Yu Kuai
2025-05-27  5:54   ` Hannes Reinecke
2025-05-24  6:12 ` [PATCH 02/23] md: factor out a helper raid_is_456() Yu Kuai
2025-05-25 15:50   ` Xiao Ni
2025-05-26  6:28   ` Christoph Hellwig
2025-05-27  5:55   ` Hannes Reinecke
2025-05-24  6:13 ` [PATCH 03/23] md/md-bitmap: cleanup bitmap_ops->startwrite() Yu Kuai
2025-05-25 15:51   ` Xiao Ni
2025-05-26  6:29   ` Christoph Hellwig
2025-05-27  5:56   ` Hannes Reinecke
2025-05-24  6:13 ` [PATCH 04/23] md/md-bitmap: support discard for bitmap ops Yu Kuai
2025-05-25 15:53   ` Xiao Ni
2025-05-26  6:29   ` Christoph Hellwig
2025-05-27  6:01   ` Hannes Reinecke
2025-05-28  7:04   ` Glass Su
2025-05-24  6:13 ` [PATCH 05/23] md/md-bitmap: remove parameter slot from bitmap_create() Yu Kuai
2025-05-25 16:09   ` Xiao Ni
2025-05-26  6:30   ` Christoph Hellwig
2025-05-27  6:01   ` Hannes Reinecke
2025-05-24  6:13 ` [PATCH 06/23] md/md-bitmap: add a new sysfs api bitmap_type Yu Kuai
2025-05-25 16:32   ` Xiao Ni
2025-05-26  1:13     ` Yu Kuai
2025-05-26  5:11       ` Xiao Ni
2025-05-26  8:02         ` Yu Kuai
2025-05-26  6:32   ` Christoph Hellwig
2025-05-26  7:45     ` Yu Kuai
2025-05-27  8:21       ` Christoph Hellwig
2025-05-27  6:10   ` Hannes Reinecke
2025-05-27  7:43     ` Yu Kuai
2025-05-24  6:13 ` [PATCH 07/23] md/md-bitmap: delay registration of bitmap_ops until creating bitmap Yu Kuai
2025-05-26  6:32   ` Christoph Hellwig
2025-05-26  6:52   ` Xiao Ni
2025-05-26  7:57     ` Yu Kuai
2025-05-27  2:15       ` Xiao Ni
2025-05-27  2:49         ` Yu Kuai
2025-05-27  6:13   ` Hannes Reinecke
2025-05-27  7:53     ` Yu Kuai
2025-05-27  8:54       ` Hannes Reinecke
2025-05-24  6:13 ` [PATCH 08/23] md/md-bitmap: add a new method skip_sync_blocks() in bitmap_operations Yu Kuai
2025-05-26  7:03   ` Xiao Ni
2025-05-27  6:14   ` Hannes Reinecke
2025-05-24  6:13 ` [PATCH 09/23] md/md-bitmap: add a new method blocks_synced() " Yu Kuai
2025-05-27  2:35   ` Xiao Ni
2025-05-27  2:48     ` Yu Kuai
2025-05-27  6:16   ` Hannes Reinecke
2025-05-24  6:13 ` [PATCH 10/23] md: add a new recovery_flag MD_RECOVERY_LAZY_RECOVER Yu Kuai
2025-05-27  6:17   ` Hannes Reinecke
2025-05-27  8:00     ` Yu Kuai
2025-05-24  6:13 ` [PATCH 11/23] md/md-bitmap: make method bitmap_ops->daemon_work optional Yu Kuai
2025-05-26  6:34   ` Christoph Hellwig
2025-05-27  6:19   ` Hannes Reinecke
2025-05-27  8:03     ` Yu Kuai
2025-05-27  8:55       ` Hannes Reinecke
2025-05-24  6:13 ` [PATCH 12/23] md/md-bitmap: add macros for lockless bitmap Yu Kuai
2025-05-26  6:40   ` Christoph Hellwig
2025-05-26  8:12     ` Yu Kuai
2025-05-27  8:22       ` Christoph Hellwig
2025-05-27  6:21   ` Hannes Reinecke
2025-05-28  4:53   ` Xiao Ni
2025-05-24  6:13 ` [PATCH 13/23] md/md-bitmap: fix dm-raid max_write_behind setting Yu Kuai
2025-05-26  6:40   ` Christoph Hellwig
2025-05-27  6:21   ` Hannes Reinecke
2025-05-24  6:13 ` [PATCH 14/23] md/dm-raid: remove max_write_behind setting limit Yu Kuai
2025-05-26  6:41   ` Christoph Hellwig
2025-05-27  6:26   ` Hannes Reinecke
2025-05-28  4:58   ` Xiao Ni
2025-05-24  6:13 ` [PATCH 15/23] md/md-llbitmap: implement llbitmap IO Yu Kuai
2025-05-27  8:27   ` Christoph Hellwig
2025-05-27  8:55     ` Yu Kuai
2025-05-27  8:58       ` Yu Kuai
2025-06-06  3:21   ` Xiao Ni
2025-06-06  3:48     ` Yu Kuai
2025-06-06  6:24       ` Xiao Ni [this message]
2025-06-06  8:56         ` Yu Kuai
2025-06-30  2:07   ` Xiao Ni
2025-06-30  2:17     ` Yu Kuai
2025-05-24  6:13 ` [PATCH 16/23] md/md-llbitmap: implement bit state machine Yu Kuai
2025-06-30  2:14   ` Xiao Ni
2025-06-30  2:25     ` Yu Kuai
2025-06-30  8:25       ` Xiao Ni
2025-06-30 11:05         ` Yu Kuai
2025-06-30 11:30           ` Yu Kuai
2025-07-01  1:55           ` Xiao Ni
2025-07-01  2:02             ` Yu Kuai
2025-07-01  2:31               ` Xiao Ni
2025-05-24  6:13 ` [PATCH 17/23] md/md-llbitmap: implement APIs for page level dirty bits synchronization Yu Kuai
2025-05-24  6:13 ` [PATCH 18/23] md/md-llbitmap: implement APIs to mange bitmap lifetime Yu Kuai
2025-05-29  7:03   ` Xiao Ni
2025-05-29  9:03     ` Yu Kuai
2025-05-24  6:13 ` [PATCH 19/23] md/md-llbitmap: implement APIs to dirty bits and clear bits Yu Kuai
2025-05-24  6:13 ` [PATCH 20/23] md/md-llbitmap: implement APIs for sync_thread Yu Kuai
2025-05-24  6:13 ` [PATCH 21/23] md/md-llbitmap: implement all bitmap operations Yu Kuai
2025-05-24  6:13 ` [PATCH 22/23] md/md-llbitmap: implement sysfs APIs Yu Kuai
2025-05-24  6:13 ` [PATCH 23/23] md/md-llbitmap: add Kconfig Yu Kuai
2025-05-27  8:29   ` Christoph Hellwig
2025-05-27  9:00     ` Yu Kuai
2025-05-24  7:07 ` [PATCH 00/23] md/llbitmap: md/md-llbitmap: introduce a new lockless bitmap Yu Kuai
2025-05-30  6:45 ` Yu Kuai
2025-06-30  1:59 ` Xiao Ni
2025-06-30  2:34   ` Yu Kuai
2025-06-30  3:25     ` Xiao Ni
2025-06-30  3:46       ` Yu Kuai
2025-06-30  5:38         ` Xiao Ni
2025-06-30  6:09           ` Yu Kuai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=89b1283c-c256-4830-96dd-ef9e5a7ce355@redhat.com \
    --to=xni@redhat.com \
    --cc=colyli@kernel.org \
    --cc=hch@lst.de \
    --cc=johnny.chenyi@huawei.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=song@kernel.org \
    --cc=yangerkun@huawei.com \
    --cc=yi.zhang@huawei.com \
    --cc=yukuai1@huaweicloud.com \
    --cc=yukuai3@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).