From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
To: Coly Li <colyli@suse.de>
Cc: Eric Wheeler <bcache@lists.ewheeler.net>,
Kent Overstreet <kent.overstreet@gmail.com>,
linux-bcache@vger.kernel.org
Subject: Re: [RFC] Live resize of backing device
Date: Thu, 8 Sep 2022 10:32:30 +0200 [thread overview]
Message-ID: <4ddb082f-cefc-644e-2ccf-56d41207ecd3@devo.com> (raw)
In-Reply-To: <CAHykVA4zGN=WA4A3njQ3VdX4age2-AXq3EcW1qRTFbf=o1=yDw@mail.gmail.com>
From 59787372cf21af0b79e895578ae05b6586dfeb09 Mon Sep 17 00:00:00 2001
From: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
Date: Thu, 8 Sep 2022 09:47:55 +0200
Subject: [PATCH] bcache: Add support for live resize of backing devices
Signed-off-by: Andrea Tomassetti <andrea.tomassetti-opensource@devo.com>
---
Hi Coly,
Here is the first version of the patch. There are some points I noted down
that I would like to discuss with you:
- I found it pretty convenient to hook the call of the new added function
inside the `register_bcache`. In fact, every time (at least from my
understandings) a disk changes size, it will trigger a new probe and,
thus, `register_bcache` will be triggered. The only inconvenient
is that, in case of success, the function will output
`error: capacity changed` even if it's not really an error.
- I'm using `kvrealloc`, introduced in kernel version 5.15, to resize
`stripe_sectors_dirty` and `full_dirty_stripes`. It shouldn't be a
problem, right?
- There is some reused code between this new function and
`bcache_device_init`. Maybe I can move `const size_t max_stripes` to
a broader scope or make it a macro, what do you think?
Thank you very much,
Andrea
drivers/md/bcache/super.c | 75 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 74 insertions(+), 1 deletion(-)
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index ba3909bb6bea..9a77caf2a18f 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2443,6 +2443,76 @@ static bool bch_is_open(dev_t dev)
return bch_is_open_cache(dev) || bch_is_open_backing(dev);
}
+static bool bch_update_capacity(dev_t dev)
+{
+ const size_t max_stripes = min_t(size_t, INT_MAX,
+ SIZE_MAX / sizeof(atomic_t));
+
+ uint64_t n, n_old;
+ int nr_stripes_old;
+ bool res = false;
+
+ struct bcache_device *d;
+ struct cache_set *c, *tc;
+ struct cached_dev *dcp, *t, *dc = NULL;
+
+ uint64_t parent_nr_sectors;
+
+ list_for_each_entry_safe(c, tc, &bch_cache_sets, list)
+ list_for_each_entry_safe(dcp, t, &c->cached_devs, list)
+ if (dcp->bdev->bd_dev == dev) {
+ dc = dcp;
+ goto dc_found;
+ }
+
+dc_found:
+ if (!dc)
+ return false;
+
+ parent_nr_sectors = bdev_nr_sectors(dc->bdev) - dc->sb.data_offset;
+
+ if (parent_nr_sectors == bdev_nr_sectors(dc->disk.disk->part0))
+ return false;
+
+ if (!set_capacity_and_notify(dc->disk.disk, parent_nr_sectors))
+ return false;
+
+ d = &dc->disk;
+
+ /* Force cached device sectors re-calc */
+ calc_cached_dev_sectors(d->c);
+
+ /* Block writeback thread */
+ down_write(&dc->writeback_lock);
+ nr_stripes_old = d->nr_stripes;
+ n = DIV_ROUND_UP_ULL(parent_nr_sectors, d->stripe_size);
+ if (!n || n > max_stripes) {
+ pr_err("nr_stripes too large or invalid: %llu (start sector beyond
end of disk?)\n",
+ n);
+ goto unblock_and_exit;
+ }
+ d->nr_stripes = n;
+
+ n = d->nr_stripes * sizeof(atomic_t);
+ n_old = nr_stripes_old * sizeof(atomic_t);
+ d->stripe_sectors_dirty = kvrealloc(d->stripe_sectors_dirty, n_old,
+ n, GFP_KERNEL);
+ if (!d->stripe_sectors_dirty)
+ goto unblock_and_exit;
+
+ n = BITS_TO_LONGS(d->nr_stripes) * sizeof(unsigned long);
+ n_old = BITS_TO_LONGS(nr_stripes_old) * sizeof(unsigned long);
+ d->full_dirty_stripes = kvrealloc(d->full_dirty_stripes, n_old, n,
GFP_KERNEL);
+ if (!d->full_dirty_stripes)
+ goto unblock_and_exit;
+
+ res = true;
+
+unblock_and_exit:
+ up_write(&dc->writeback_lock);
+ return res;
+}
+
struct async_reg_args {
struct delayed_work reg_work;
char *path;
@@ -2569,7 +2639,10 @@ static ssize_t register_bcache(struct kobject *k,
struct kobj_attribute *attr,
mutex_lock(&bch_register_lock);
if (lookup_bdev(strim(path), &dev) == 0 &&
bch_is_open(dev))
- err = "device already registered";
+ if (bch_update_capacity(dev))
+ err = "capacity changed";
+ else
+ err = "device already registered";
else
err = "device busy";
mutex_unlock(&bch_register_lock);
--
2.37.3
On 6/9/22 15:22, Andrea Tomassetti wrote:
> Hi Coly,
> I have finally some time to work on the patch. I already have a first
> prototype that seems to work but, before sending it, I would like to
> ask you two questions:
> 1. Inspecting the code, I found that the only lock mechanism is the
> writeback_lock semaphore. Am I correct?
> 2. How can I effectively test my patch? So far I'm doing something like this:
> a. make-bcache --writeback -B /dev/vdb -C /dev/vdc
> b. mkfs.xfs /dev/bcache0
> c. dd if=/dev/random of=/mnt/random bs=1M count=1000
> d. md5sum /mnt/random | tee /mnt/random.md5
> e. live resize the disk and repeat c. and d.
> f. umount/reboot/remount and check that the md5 hashes are correct
>
> Any suggestions?
>
> Thank you very much,
> Andrea
>
> On Fri, Aug 5, 2022 at 9:38 PM Eric Wheeler <bcache@lists.ewheeler.net> wrote:
>>
>> On Wed, 3 Aug 2022, Andrea Tomassetti wrote:
>>> Hi Coly,
>>> In one of our previous emails you said that
>>>> Currently bcache doesn’t support cache or backing device resize
>>>
>>> I was investigating this point and I actually found a solution. I
>>> briefly tested it and it seems to work fine.
>>> Basically what I'm doing is:
>>> 1. Check if there's any discrepancy between the nr of sectors
>>> reported by the bcache backing device (holder) and the nr of sectors
>>> reported by its parent (slave).
>>> 2. If the number of sectors of the two devices are not the same,
>>> then call set_capacity_and_notify on the bcache device.
>>> 3. From user space, depending on the fs used, grow the fs with some
>>> utility (e.g. xfs_growfs)
>>>
>>> This works without any need of unmounting the mounted fs nor stopping
>>> the bcache backing device.
>>
>> Well done! +1, would love to see a patch for this!
>>
>>
>>> So my question is: am I missing something? Can this live resize cause
>>> some problems (e.g. data loss)? Would it be useful if I send a patch on
>>> this?
>>
>> A while a go we looked into doing this. Here is the summary of our
>> findings, not sure if there are any other considerations:
>>
>> 1. Create a sysfs file like /sys/block/bcache0/bcache/resize to trigger
>> resize on echo 1 >.
>> 2. Refactor the set_capacity() bits from bcache_device_init() into its
>> own function.
>> 3. Put locks around bcache_device.full_dirty_stripes and
>> bcache_device.stripe_sectors_dirty. Re-alloc+copy+free and zero the
>> new bytes at the end. Grep where bcache_device.full_dirty_stripes is
>> used and make sure it is locked appropriately, probably in the
>> writeback thread, maybe other places too.
>>
>> The cachedev's don't know anything about the bdev size, so (according to
>> Kent) they will "just work" by referencing new offsets that were
>> previously beyond the disk. (This is basically the same as resizing the
>> bdev and then unregister/re-register which is how we resize bdevs now.)
>>
>> As for resizing a cachedev, I've not looked at all---not sure about that
>> one. We always detach, resize, make-bcache and re-attach the new cache.
>> Maybe it is similarly simple, but haven't looked.
>>
>>
>> --
>> Eric Wheeler
>>
>>
>>
>>>
>>> Kind regards,
>>> Andrea
>>>
next prev parent reply other threads:[~2022-09-08 8:32 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-03 10:05 [RFC] Live resize of backing device Andrea Tomassetti
2022-08-04 14:32 ` Coly Li
2022-08-05 19:38 ` Eric Wheeler
2022-09-06 13:22 ` Andrea Tomassetti
2022-09-08 8:32 ` Andrea Tomassetti [this message]
2022-09-19 11:42 ` Andrea Tomassetti
2022-09-19 12:16 ` Coly Li
2022-12-09 8:57 ` Andrea Tomassetti
2022-12-09 9:36 ` Coly Li
2022-12-30 10:40 ` Coly Li
2023-01-11 16:01 ` Andrea Tomassetti
2023-01-17 13:08 ` Error messages with kernel 6.1.[56] Pierre Juhen
2023-01-17 16:08 ` Coly Li
2023-01-17 16:18 ` [RFC] Live resize of backing device Coly Li
2023-01-25 10:07 ` Andrea Tomassetti
2023-01-25 17:59 ` Coly Li
2023-01-27 12:44 ` Andrea Tomassetti
2023-01-27 22:40 ` Eric Wheeler
2023-01-31 10:20 ` Andrea Tomassetti
2023-02-02 17:18 ` Coly Li
2023-02-02 20:48 ` Eric Wheeler
2023-02-03 2:41 ` Coly Li
2023-02-19 9:39 ` Coly Li
2023-02-20 8:27 ` mingzhe
2023-02-20 12:29 ` Coly Li
2023-02-22 8:42 ` Andrea Tomassetti
2023-02-27 22:08 ` Eric Wheeler
2023-02-28 2:46 ` mingzhe
2023-01-27 2:53 ` [RFC] Live resize of bcache " Eric Wheeler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4ddb082f-cefc-644e-2ccf-56d41207ecd3@devo.com \
--to=andrea.tomassetti-opensource@devo.com \
--cc=bcache@lists.ewheeler.net \
--cc=colyli@suse.de \
--cc=kent.overstreet@gmail.com \
--cc=linux-bcache@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.