* [PATCH 00/14] mikulas' shared snapshot patches
@ 2010-03-02 0:23 Mike Snitzer
From: Mike Snitzer @ 2010-03-02 0:23 UTC
To: dm-devel
Mikulas,
This is just the full submit of your shared snapshot patches from:
http://people.redhat.com/mpatocka/patches/kernel/new-snapshots/r15/
I think the next phase of review should possibly be driven through the
dm-devel mailing list. I'd at least like the option of exchanging
mail on aspects of some of these patches.
The first patch has one small cleanup in do_origin_write(): I
eliminated the 'midcycle' goto.
But the primary difference with this submission (when compared to your
r15 patches) is that I edited the patches for whitespace and typos. I'm
_really_ not trying to step on your hard work by doing this
superficial stuff. But while reviewing the code the insanely long
lines really were distracting. I tried very hard to preserve the
intent of the DM_MULTISNAP_SET_ERROR/DM_ERR messages by still having
grep'able content (on a single line).
I also didn't go crazy like a checkpatch.pl zealot.. I didn't even run
these patches through checkpatch!
I know how sensitive you are about letting the editor do the wrapping,
but I truly think the length of some lines would never get past
Alasdair (or Linus) -- even though they have relaxed the rules for
line length.
I'll respond to this cover-letter with a single incremental patch that
shows my edits.
All my edits aside, I must say I'm impressed at the amount/complexity
of code you've cranked out for this shared snapshot support. It is
going to take me many more review iterations of these patches before
I'll be able to say I understand all that these patches achieve.
I think drivers/md/dm-bufio.c will be controversial (to the greater
upstream community) but I understand that it enabled you to focus on
the problem of shared snapshots without having to concern yourself
with core VM and block changes to accomplish the same.
Mikulas Patocka (14):
dm-multisnap-common
dm-bufio
dm-multisnap-mikulas-headers
dm-multisnap-mikulas-alloc
dm-multisnap-mikulas-blocks
dm-multisnap-mikulas-btree
dm-multisnap-mikulas-commit
dm-multisnap-mikulas-delete
dm-multisnap-mikulas-freelist
dm-multisnap-mikulas-io
dm-multisnap-mikulas-snaps
dm-multisnap-mikulas-common
dm-multisnap-mikulas-config
dm-multisnap-daniel
Documentation/device-mapper/dm-multisnapshot.txt | 77 +
drivers/md/Kconfig | 33 +
drivers/md/Makefile | 10 +
drivers/md/dm-bufio.c | 987 +++++++++++
drivers/md/dm-bufio.h | 35 +
drivers/md/dm-multisnap-alloc.c | 590 +++++++
drivers/md/dm-multisnap-blocks.c | 333 ++++
drivers/md/dm-multisnap-btree.c | 838 +++++++++
drivers/md/dm-multisnap-commit.c | 245 +++
drivers/md/dm-multisnap-daniel.c | 1711 ++++++++++++++++++
drivers/md/dm-multisnap-delete.c | 137 ++
drivers/md/dm-multisnap-freelist.c | 296 ++++
drivers/md/dm-multisnap-io.c | 209 +++
drivers/md/dm-multisnap-mikulas-struct.h | 380 ++++
drivers/md/dm-multisnap-mikulas.c | 760 ++++++++
drivers/md/dm-multisnap-mikulas.h | 247 +++
drivers/md/dm-multisnap-private.h | 161 ++
drivers/md/dm-multisnap-snaps.c | 636 +++++++
drivers/md/dm-multisnap.c | 2007 ++++++++++++++++++++++
drivers/md/dm-multisnap.h | 183 ++
20 files changed, 9875 insertions(+), 0 deletions(-)
create mode 100644 Documentation/device-mapper/dm-multisnapshot.txt
create mode 100644 drivers/md/dm-bufio.c
create mode 100644 drivers/md/dm-bufio.h
create mode 100644 drivers/md/dm-multisnap-alloc.c
create mode 100644 drivers/md/dm-multisnap-blocks.c
create mode 100644 drivers/md/dm-multisnap-btree.c
create mode 100644 drivers/md/dm-multisnap-commit.c
create mode 100644 drivers/md/dm-multisnap-daniel.c
create mode 100644 drivers/md/dm-multisnap-delete.c
create mode 100644 drivers/md/dm-multisnap-freelist.c
create mode 100644 drivers/md/dm-multisnap-io.c
create mode 100644 drivers/md/dm-multisnap-mikulas-struct.h
create mode 100644 drivers/md/dm-multisnap-mikulas.c
create mode 100644 drivers/md/dm-multisnap-mikulas.h
create mode 100644 drivers/md/dm-multisnap-private.h
create mode 100644 drivers/md/dm-multisnap-snaps.c
create mode 100644 drivers/md/dm-multisnap.c
create mode 100644 drivers/md/dm-multisnap.h
^ permalink raw reply [flat|nested] 22+ messages in thread* [PATCH 01/14] dm-multisnap-common 2010-03-02 0:23 [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer @ 2010-03-02 0:23 ` Mike Snitzer 2010-03-02 0:23 ` [PATCH 02/14] dm-bufio Mike Snitzer ` (14 subsequent siblings) 15 siblings, 0 replies; 22+ messages in thread From: Mike Snitzer @ 2010-03-02 0:23 UTC (permalink / raw) To: dm-devel; +Cc: Mikulas Patocka From: Mikulas Patocka <mpatocka@redhat.com> Common code for multisnapshot target. This is the common code, shared by all exception stores. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- Documentation/device-mapper/dm-multisnapshot.txt | 77 + drivers/md/Kconfig | 10 + drivers/md/Makefile | 2 + drivers/md/dm-multisnap-private.h | 161 ++ drivers/md/dm-multisnap.c | 2007 ++++++++++++++++++++++ drivers/md/dm-multisnap.h | 183 ++ 6 files changed, 2440 insertions(+), 0 deletions(-) create mode 100644 Documentation/device-mapper/dm-multisnapshot.txt create mode 100644 drivers/md/dm-multisnap-private.h create mode 100644 drivers/md/dm-multisnap.c create mode 100644 drivers/md/dm-multisnap.h diff --git a/Documentation/device-mapper/dm-multisnapshot.txt b/Documentation/device-mapper/dm-multisnapshot.txt new file mode 100644 index 0000000..0dff16e --- /dev/null +++ b/Documentation/device-mapper/dm-multisnapshot.txt @@ -0,0 +1,77 @@ +This snapshot implementation has shared storage and high number of snapshots. + +The work is split to two modules: +dm-multisnapshot.ko - the general module +dm-store-mikulas.ko - the snapshot store + +The modularity allows to load other snapshot stores. + +Usage: +Create two logical volumes, one for origin and one for snapshots. 
+(assume /dev/mapper/vg1-lv1 for origin and /dev/mapper/vg1-lv2 for snapshot in +these examples) + +Clear the first sector of the snapshot volume: +dd if=/dev/zero of=/dev/mapper/vg1-lv2 bs=4096 count=1 + +Table line arguments: +- origin device +- shared store device +- chunk size +- number of generic arguments +- generic arguments + sync-snapshots --- synchronize snapshots according to the list + preserve-on-error --- halt the origin on error in the snapshot store +- shared store type +- number of arguments for shared store type +- shared store arguments + cache-threshold size --- a background write is started + cache-limit size --- a limit for metadata cache size +if sync-snapshots was specified + - number of snapshot ids + - snapshot ids + +Load the shared snapshot driver: +echo 0 `blockdev --getsize /dev/mapper/vg1-lv1` multisnapshot /dev/mapper/vg1-lv1 /dev/mapper/vg1-lv2 16 0 mikulas 0|dmsetup create ms +(16 is the chunk size in 512-byte sectors. You can place different number there) +This creates the origin store on /dev/mapper/ms. If the store was zeroed, it +creates new structure, otherwise it loads existing structure. + +Once this is done, you should no longer access /dev/mapper/vg1-lv1 and +/dev/mapper/vg1-lv2 and only use /dev/mapper/ms. + +Create new snapshot: +dmsetup message /dev/mapper/ms 0 create + If you want to create snapshot-of-snapshot, use + dmsetup message /dev/mapper/ms 0 create_subsnap <snapID> +dmsetup status /dev/mapper/ms + (this will find out the newly created snapshot ID) +dmsetup suspend /dev/mapper/ms +dmsetup resume /dev/mapper/ms + +Attach the snapshot: +echo 0 `blockdev --getsize /dev/mapper/vg1-lv1` multisnap-snap /dev/mapper/vg1-lv1 0|dmsetup create ms0 +(that '0' is the snapshot id ... 
you can use different number) +This attaches the snapshot '0' on /dev/mapper/ms0 + +Delete the snapshot: +dmsetup message /dev/mapper/ms 0 delete 0 +(the parameter after "delete" is the snapshot id) + +See status: +dmsetup status prints these information about the multisnapshot device: +- number of arguments befor the snapshot id list (5) +- 0 on active storage, -error number on error (-ENOSPC, -EIO, etc.) +- the new snapshot number that will be created, "-" if there is none +- total number of chunks on the device +- total number of allocated chunks +- a number of chunks allocated for metadata +- a number of snapshots +- existing snapshot IDs + +Unload it: +dmsetup remove ms +dmsetup remove ms0 +... etc. (note, once you unload the origin, the snapshots become inaccessible +- the devices exist but they return -EIO on everything) + diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig index acb3a4e..c3b55a8 100644 --- a/drivers/md/Kconfig +++ b/drivers/md/Kconfig @@ -249,6 +249,16 @@ config DM_SNAPSHOT ---help--- Allow volume managers to take writable snapshots of a device. +config DM_MULTISNAPSHOT + tristate "Multisnapshot target" + depends on BLK_DEV_DM + ---help--- + A new implementation of snapshots allowing sharing storage + between several snapshots. + + A submenu allows to select a specific shared snapshot store + driver. 
+ config DM_MIRROR tristate "Mirror target" depends on BLK_DEV_DM diff --git a/drivers/md/Makefile b/drivers/md/Makefile index e355e7f..674649c 100644 --- a/drivers/md/Makefile +++ b/drivers/md/Makefile @@ -7,6 +7,7 @@ dm-mod-y += dm.o dm-table.o dm-target.o dm-linear.o dm-stripe.o \ dm-multipath-y += dm-path-selector.o dm-mpath.o dm-snapshot-y += dm-snap.o dm-exception-store.o dm-snap-transient.o \ dm-snap-persistent.o +dm-multisnapshot-y += dm-multisnap.o dm-mirror-y += dm-raid1.o dm-log-userspace-y \ += dm-log-userspace-base.o dm-log-userspace-transfer.o @@ -41,6 +42,7 @@ obj-$(CONFIG_DM_MULTIPATH) += dm-multipath.o dm-round-robin.o obj-$(CONFIG_DM_MULTIPATH_QL) += dm-queue-length.o obj-$(CONFIG_DM_MULTIPATH_ST) += dm-service-time.o obj-$(CONFIG_DM_SNAPSHOT) += dm-snapshot.o +obj-$(CONFIG_DM_MULTISNAPSHOT) += dm-multisnapshot.o obj-$(CONFIG_DM_MIRROR) += dm-mirror.o dm-log.o dm-region-hash.o obj-$(CONFIG_DM_LOG_USERSPACE) += dm-log-userspace.o obj-$(CONFIG_DM_ZERO) += dm-zero.o diff --git a/drivers/md/dm-multisnap-private.h b/drivers/md/dm-multisnap-private.h new file mode 100644 index 0000000..b623027 --- /dev/null +++ b/drivers/md/dm-multisnap-private.h @@ -0,0 +1,161 @@ +/* + * Copyright (C) 2009 Red Hat Czech, s.r.o. + * + * Mikulas Patocka <mpatocka@redhat.com> + * + * This file is released under the GPL. + */ + +#ifndef DM_MULTISNAP_PRIVATE_H +#define DM_MULTISNAP_PRIVATE_H + +#include "dm-multisnap.h" + +/* + * Private structures for dm-multisnap.c. + * This file should not be included by exception store drivers. + * Changes to this file do not change ABI. + */ + +#include <linux/dm-kcopyd.h> + +#define DM_MULTISNAP_MAX_REMAPS 256 + +#define DM_MULTISNAP_KCOPYD_PAGES (((1UL << 20) >> PAGE_SHIFT) ? 
: 1) + +#define DM_MULTISNAP_MAX_CHUNKS_TO_REMAP DM_KCOPYD_MAX_REGIONS + +#define DM_PENDING_HASH_SIZE 256 +#define DM_PENDING_HASH(c) ((c) & (DM_PENDING_HASH_SIZE - 1)) +#define DM_PENDING_MEMPOOL_SIZE 256 + +#define DM_TRACKED_CHUNK_HASH_SIZE 16 +#define DM_TRACKED_CHUNK_HASH(x) ((unsigned long)(x) & (DM_TRACKED_CHUNK_HASH_SIZE - 1)) +#define DM_TRACKED_CHUNK_POOL_SIZE 256 + +struct dm_multisnap_bio_queue { + struct bio_list bios; +}; + +#define DM_MULTISNAP_N_QUEUES 2 + +struct dm_multisnap { + struct dm_exception_store *p; + struct dm_multisnap_exception_store *store; + + struct dm_dev *origin; + struct dm_dev *snapshot; + + int error; + + unsigned chunk_size; + unsigned char chunk_shift; + + unsigned char flags; +#define DM_MULTISNAP_SYNC_SNAPSHOTS 1 +#define DM_MULTISNAP_PRESERVE_ON_ERROR 2 + + sector_t origin_sectors; + + struct mutex master_lock; + struct mutex status_lock; + struct workqueue_struct *wq; + struct work_struct work; + + /* Queues are protected with dm_multisnap_bio_list_lock */ + struct dm_multisnap_bio_queue queue[DM_MULTISNAP_N_QUEUES]; + unsigned current_queue; + + struct list_head background_works; + + /* All snapshot IOs */ + mempool_t *tracked_chunk_pool; + + /* these two are protected with dm_multisnap_bio_list_lock */ + long n_tracked_ios; + struct hlist_head tracked_chunk_hash[DM_TRACKED_CHUNK_HASH_SIZE]; + + mempool_t *pending_pool; + + struct dm_kcopyd_client *kcopyd; + + /* + * The following two variables do a trick to avoid the need for + * atomic operations. + * + * kcopyd_jobs_submitted_count is incremented each time a job is + * submitted to kcopyd. master_lock protects it. + * + * kcopyd_jobs_finished_count is incremented each time a kcopyd + * callback is called. The callback is single-threaded, so it needs + * no protection. + * + * Both kcopyd_jobs_submitted_count and kcopyd_jobs_finished_count + * can be updated simultaneously. But none of these variables is + * updated multiple times concurrently. 
+ * + * When these two are equal, there are no jobs in flight. When they + * are equal and master_lock is held, we know that there are no jobs + * in flight and no new can be submitted --- i.e. we can commit. + */ + unsigned long kcopyd_jobs_submitted_count; + unsigned long kcopyd_jobs_finished_count; + + /* The value of the counter on last commit */ + unsigned long kcopyd_jobs_last_commit_count; + + /* This may only be accessed from kcopyd callback, it has no locking */ + struct list_head pes_waiting_for_commit; + + /* Increased each time a commit happens */ + unsigned commit_sequence; + + /* List head for struct dm_multisnap_pending_exception->hash_list */ + struct hlist_head pending_hash[DM_PENDING_HASH_SIZE]; + + char pending_mempool_allocation_failed; + + /* The new snapshot id to be created */ + char new_snapid_valid; + snapid_t new_snapid; + + /* List head for struct dm_multisnap_snap->list_snaps */ + struct list_head all_snaps; + + /* List entry for all_multisnapshots */ + struct list_head list_all; +}; + +struct dm_multisnap_snap { + struct dm_multisnap *s; + snapid_t snapid; + /* List entry for struct dm_multisnap->list_all */ + struct list_head list_snaps; + char origin_name[16]; + char snapid_string[1]; +}; + +struct dm_multisnap_tracked_chunk { + struct hlist_node node; + chunk_t chunk; + unsigned long bio_rw; + struct dm_multisnap *s; +}; + +struct dm_multisnap_pending_exception { + /* List entry for struct dm_multisnap->pending_hash */ + struct hlist_node hash_list; + + struct dm_multisnap *s; + struct bio_list bios; + + chunk_t chunk; + + int n_descs; + union chunk_descriptor desc[DM_MULTISNAP_MAX_CHUNKS_TO_REMAP]; + + /* List entry for struct dm_multisnap->pes_waiting_for_commit */ + struct list_head list; +}; + +#endif diff --git a/drivers/md/dm-multisnap.c b/drivers/md/dm-multisnap.c new file mode 100644 index 0000000..758c013 --- /dev/null +++ b/drivers/md/dm-multisnap.c @@ -0,0 +1,2007 @@ +/* + * Copyright (C) 2009 Red Hat Czech, s.r.o. 
+ * + * Mikulas Patocka <mpatocka@redhat.com> + * + * This file is released under the GPL. + */ + +#include "dm-multisnap-private.h" + +#include <linux/delay.h> +#include <linux/vmalloc.h> +#include <linux/sort.h> + +static void dm_multisnap_process_bios(struct dm_multisnap *s); + +/* --- locking --- */ + +static void dm_multisnap_lock(struct dm_multisnap *s) +{ + mutex_lock(&s->master_lock); + if (s->p && s->store->store_lock_acquired) + s->store->store_lock_acquired(s->p, 0); +} + +static void dm_multisnap_unlock(struct dm_multisnap *s) +{ + mutex_unlock(&s->master_lock); +} + +static int dm_multisnap_lock_contended(struct dm_multisnap *s) +{ + return !list_empty(&s->master_lock.wait_list); +} + +static void dm_multisnap_assert_locked(struct dm_multisnap *s) +{ + BUG_ON(!mutex_is_locked(&s->master_lock)); +} + +void dm_multisnap_status_lock(struct dm_multisnap *s) +{ + mutex_lock(&s->status_lock); +} +EXPORT_SYMBOL(dm_multisnap_status_lock); + +void dm_multisnap_status_unlock(struct dm_multisnap *s) +{ + mutex_unlock(&s->status_lock); +} +EXPORT_SYMBOL(dm_multisnap_status_unlock); + +void dm_multisnap_status_assert_locked(struct dm_multisnap *s) +{ + BUG_ON(!mutex_is_locked(&s->status_lock)); +} +EXPORT_SYMBOL(dm_multisnap_status_assert_locked); + +/* --- helper functions to access internal state --- */ + +/* + * These tiny functions are used to access internal state of dm_multisnap. + * + * We access these fields with functions and don't export struct dm_multisnap + * to exception store drivers, so that changes to "struct dm_multisnap" don't + * change the ABI. 
+ */ + +struct block_device *dm_multisnap_snapshot_bdev(struct dm_multisnap *s) +{ + return s->snapshot->bdev; +} +EXPORT_SYMBOL(dm_multisnap_snapshot_bdev); + +unsigned dm_multisnap_chunk_size(struct dm_multisnap *s) +{ + return s->chunk_size; +} +EXPORT_SYMBOL(dm_multisnap_chunk_size); + +void dm_multisnap_set_error(struct dm_multisnap *s, int error) +{ + if (!s->error) + s->error = error; + + /* + * Dump the stack on all errors, except space overflow. + * + * Space overflow can happen normally, other errors may mean that + * there is a bug in the code and getting a stack dump is viable. + */ + if (error != -ENOSPC) + dump_stack(); +} +EXPORT_SYMBOL(dm_multisnap_set_error); + +int dm_multisnap_has_error(struct dm_multisnap *s) +{ + return s->error; +} +EXPORT_SYMBOL(dm_multisnap_has_error); + +int dm_multisnap_drop_on_error(struct dm_multisnap *s) +{ + return !(s->flags & DM_MULTISNAP_PRESERVE_ON_ERROR); +} +EXPORT_SYMBOL(dm_multisnap_drop_on_error); + +static DEFINE_MUTEX(all_multisnapshots_lock); +static LIST_HEAD(all_multisnapshots); + +static chunk_t sector_to_chunk(struct dm_multisnap *s, sector_t sector) +{ + return sector >> (s->chunk_shift - SECTOR_SHIFT); +} + +static sector_t chunk_to_sector(struct dm_multisnap *s, chunk_t chunk) +{ + return chunk << (s->chunk_shift - SECTOR_SHIFT); +} + +int dm_multisnap_snapshot_exists(struct dm_multisnap *s, snapid_t snapid) +{ + return snapid == s->store->get_next_snapid(s->p, snapid); +} +EXPORT_SYMBOL(dm_multisnap_snapshot_exists); + +static long dm_multisnap_jobs_in_flight(struct dm_multisnap *s) +{ + return s->kcopyd_jobs_submitted_count - s->kcopyd_jobs_last_commit_count; +} + +/* --- snapids --- */ + +/* + * Any reading/writing of snapids in table/status/message must go + * through these functions, so that snapid format for userspace can + * be overridden. 
+ */ + +static void print_snapid(struct dm_multisnap *s, char *string, + unsigned maxlen, snapid_t snapid) +{ + if (s->store->print_snapid) + s->store->print_snapid(s->p, string, maxlen, snapid); + else + snprintf(string, maxlen, "%llu", (unsigned long long)snapid); +} + +static int read_snapid(struct dm_multisnap *s, char *string, + snapid_t *snapid, char **error) +{ + if (s->store->read_snapid) + return s->store->read_snapid(s->p, string, snapid, error); + else { + int r; + + char *argv_array[1] = { string }; + char **argv = argv_array; + unsigned argc = 1; + __u64 unsigned_int64; + + r = dm_multisnap_get_uint64(&argv, &argc, &unsigned_int64, error); + if (r) + return r; + + *snapid = unsigned_int64; + return 0; + } +} + +/* --- bio list --- */ + +static DEFINE_SPINLOCK(dm_multisnap_bio_list_lock); + +static void wakeup_kmultisnapd(struct dm_multisnap *s) +{ + queue_work(s->wq, &s->work); +} + +static void dm_multisnap_enqueue_bio_unlocked(struct dm_multisnap *s, struct bio *bio) +{ + struct dm_multisnap_bio_queue *q; + if (bio_rw(bio) != WRITE) + q = &s->queue[0]; + else + q = &s->queue[1]; + bio_list_add(&q->bios, bio); +} + +static void dm_multisnap_enqueue_bio(struct dm_multisnap *s, struct bio *bio) +{ + spin_lock_irq(&dm_multisnap_bio_list_lock); + dm_multisnap_enqueue_bio_unlocked(s, bio); + spin_unlock_irq(&dm_multisnap_bio_list_lock); +} + +static void dm_multisnap_enqueue_bio_list(struct dm_multisnap *s, struct bio_list *bl) +{ + struct bio *bio; + while ((bio = bio_list_pop(bl))) { + dm_multisnap_enqueue_bio(s, bio); + cond_resched(); + } +} + +static struct bio *dm_multisnap_dequeue_bio(struct dm_multisnap *s) +{ + struct bio *bio; + + spin_lock_irq(&dm_multisnap_bio_list_lock); + +#ifdef DM_MULTISNAP_MAX_REMAPS + if (dm_multisnap_jobs_in_flight(s) >= DM_MULTISNAP_MAX_REMAPS) { + s->current_queue = 0; + goto test_current_queue; + } +#endif + + s->current_queue ^= 1; + + bio = bio_list_pop(&s->queue[s->current_queue ^ 1].bios); + if (bio) + goto ret; + 
+#ifdef DM_MULTISNAP_MAX_REMAPS +test_current_queue: +#endif + bio = bio_list_pop(&s->queue[s->current_queue].bios); + +ret: + spin_unlock_irq(&dm_multisnap_bio_list_lock); + + return bio; +} + +static int dm_multisnap_bio_queue_empty(struct dm_multisnap *s) +{ + unsigned i; + + spin_lock_irq(&dm_multisnap_bio_list_lock); + + for (i = 0; i < DM_MULTISNAP_N_QUEUES; i++) + if (!bio_list_empty(&s->queue[i].bios)) + break; + + spin_unlock_irq(&dm_multisnap_bio_list_lock); + + return i != DM_MULTISNAP_N_QUEUES; +} + +static void dm_multisnap_bio_dequeue_all(struct dm_multisnap *s, struct bio_list *bl) +{ + unsigned i; + + bio_list_init(bl); + + spin_lock_irq(&dm_multisnap_bio_list_lock); + + for (i = 0; i < DM_MULTISNAP_N_QUEUES; i++) { + bio_list_merge(bl, &s->queue[i].bios); + bio_list_init(&s->queue[i].bios); + } + + spin_unlock_irq(&dm_multisnap_bio_list_lock); +} + +static void dm_multisnap_init_bio_queues(struct dm_multisnap *s) +{ + unsigned i; + for (i = 0; i < DM_MULTISNAP_N_QUEUES; i++) + bio_list_init(&s->queue[i].bios); + s->current_queue = 0; +} + +/* Reduce the size of the bio */ + +static void bio_trim(struct bio *bio, unsigned size) +{ + unsigned i; + bio->bi_size = size; + for (i = 0; i < bio->bi_vcnt; i++) { + if (size <= bio->bi_io_vec[i].bv_len) { + bio->bi_io_vec[i].bv_len = size; + bio->bi_vcnt = i + 1; + bio->bi_flags &= ~(1 << BIO_SEG_VALID); + return; + } + size -= bio->bi_io_vec[i].bv_len; + } + BUG(); +} + +/* --- encode 64-bit snapids in bio */ + +static snapid_t bio_get_snapid(struct bio *bio) +{ + return ((__u64)bio->bi_seg_front_size << 32) | bio->bi_seg_back_size; +} + +static void bio_put_snapid(struct bio *bio, snapid_t snapid) +{ + bio->bi_seg_front_size = (__u64)snapid >> 32; + bio->bi_seg_back_size = snapid; +} + +/* --- tracked chunks --- */ + +static struct kmem_cache *tracked_chunk_cache; + +static int chunk_is_tracked(struct dm_multisnap *s, chunk_t chunk) +{ + struct dm_multisnap_tracked_chunk *c; + struct hlist_node *hn; + + 
spin_lock_irq(&dm_multisnap_bio_list_lock); + + hlist_for_each_entry(c, hn, + &s->tracked_chunk_hash[DM_TRACKED_CHUNK_HASH(chunk)], node) { + if (likely(c->chunk == chunk)) { + spin_unlock_irq(&dm_multisnap_bio_list_lock); + return 1; + } + } + + spin_unlock_irq(&dm_multisnap_bio_list_lock); + + return 0; +} + +/* --- pending exception cache --- */ + +static struct kmem_cache *pending_exception_cache; + +#define GFP_PENDING_EXCEPTION GFP_NOIO + +static void pending_exception_ctor(void *pe_) +{ + struct dm_multisnap_pending_exception *pe = pe_; + bio_list_init(&pe->bios); +} + +static struct dm_multisnap_pending_exception * +dm_multisnap_alloc_pending_exception(struct dm_multisnap *s, chunk_t chunk) +{ + struct dm_multisnap_pending_exception *pe; + /* + * Warning, we don't want to wait. Because we are holding master_lock + * and taking this lock is needed to complete the exception. + * + * If an allocation failure happens, we must go up, drop the lock, + * try dummy mempool allocation and go here again. + */ + pe = mempool_alloc(s->pending_pool, GFP_PENDING_EXCEPTION & ~__GFP_WAIT); + if (unlikely(!pe)) + return NULL; + + pe->s = s; + pe->chunk = chunk; + hlist_add_head(&pe->hash_list, &s->pending_hash[DM_PENDING_HASH(chunk)]); + return pe; +} + +static void dm_multisnap_free_pending_exception(struct dm_multisnap_pending_exception *pe) +{ + hlist_del(&pe->hash_list); + mempool_free(pe, pe->s->pending_pool); +} + +static void dm_multisnap_wait_for_pending_exception(struct dm_multisnap *s) +{ + /* + * Wait until there is something in the mempool. Free it immediately. + */ + struct dm_multisnap_pending_exception *pe; + + pe = mempool_alloc(s->pending_pool, GFP_PENDING_EXCEPTION | __GFP_WAIT); + mempool_free(pe, s->pending_pool); +} + +/* + * Check if the chunk+snapid conflicts with any pending exception. + * + * If it does, queue the bio on the pending exception. 
+ */ +static int check_pending_io(struct dm_multisnap *s, struct bio *bio, + chunk_t chunk, snapid_t snapid) +{ + struct dm_multisnap_pending_exception *pe; + struct hlist_node *hn; + hlist_for_each_entry(pe, hn, &s->pending_hash[DM_PENDING_HASH(chunk)], hash_list) { + if (pe->chunk == chunk) { + int i; + if (snapid == DM_SNAPID_T_ORIGIN) + goto conflict; + for (i = 0; i < pe->n_descs; i++) { + if (s->store->check_conflict(s->p, &pe->desc[i], snapid)) + goto conflict; + } + } + cond_resched(); + } + return 0; + +conflict: + bio_list_add(&pe->bios, bio); + return 1; +} + +/* --- commit --- */ + +/* + * Test if commit can be performed. If these two variables are not equal, + * there are some pending kcopyd jobs and we must not commit. + */ +int dm_multisnap_can_commit(struct dm_multisnap *s) +{ + return s->kcopyd_jobs_submitted_count == s->kcopyd_jobs_finished_count; +} +EXPORT_SYMBOL(dm_multisnap_can_commit); + +/* + * Call exception store commit method. + * This can be called only if dm_multisnap_can_commit returned true; + * master_lock must be locked. + */ +void dm_multisnap_call_commit(struct dm_multisnap *s) +{ + s->kcopyd_jobs_last_commit_count = s->kcopyd_jobs_finished_count; + s->store->commit(s->p); + s->commit_sequence++; +} +EXPORT_SYMBOL(dm_multisnap_call_commit); + +/* + * Force commit at this point. It is guaranteed that commit happened when + * this function exits. + * master_lock must be unlocked. + * + * If the commit cannot be performed immediately (because there are pending + * chunks being copied), the function drops the lock and polls. It won't + * livelock --- either it will be possible to do the commit or someone + * has done the commit already (commit_sequence changed). + * + * The polling is justified because this function is only called when deleting + * a snapshot or when suspending the origin with postsuspend. These functions + * are not performance-critical, thus 1ms delay won't cause a performance + * problem. 
+ */ +static int dm_multisnap_force_commit(struct dm_multisnap *s) +{ + int err; + unsigned commit_sequence; + + dm_multisnap_lock(s); + + commit_sequence = s->commit_sequence; + + while (!dm_multisnap_can_commit(s)) { + dm_multisnap_unlock(s); + msleep(1); + dm_multisnap_lock(s); + if (s->commit_sequence != commit_sequence) + goto unlock_ret; + } + + dm_multisnap_call_commit(s); + +unlock_ret: + err = dm_multisnap_has_error(s); + dm_multisnap_unlock(s); + + return err; +} + +/* --- kcopyd callback --- */ + +static void remap_callback(int read_err, unsigned long write_err, void *pe_) +{ + struct dm_multisnap_pending_exception *pe = pe_; + struct dm_multisnap *s = pe->s; + + if (unlikely((read_err | write_err) != 0)) + DM_MULTISNAP_SET_ERROR(s, -EIO, ("remap_callback: kcopyd I/O error: " + "%d, %lx", read_err, write_err)); + + list_add_tail(&pe->list, &s->pes_waiting_for_commit); + + s->kcopyd_jobs_finished_count++; + + /* If there are more jobs pending, don't commit */ + if (!dm_multisnap_can_commit(s)) + return; + + if (s->store->prepare_for_commit) + s->store->prepare_for_commit(s->p); + + dm_multisnap_lock(s); + + /* Recheck after the lock was taken */ + if (unlikely(!dm_multisnap_can_commit(s))) { + /* Not yet ... kmultisnapd has just added something */ + dm_multisnap_unlock(s); + return; + } + + /* We need to commit stuff */ + + dm_multisnap_call_commit(s); + + do { + pe = container_of(s->pes_waiting_for_commit.next, + struct dm_multisnap_pending_exception, list); + + /* + * When we are about to free the pending exception, we must + * wait for all reads to the appropriate chunk to finish. 
+ * + * This prevents the following race condition: + * - someone reads the chunk in the snapshot with no exception + * - that read is remapped directly to the origin, the read + * is delayed for some reason + * - someone else writes to the origin, this triggers realloc + * - the realloc finishes + * - the write is dispatched to the origin + * - the read submitted first is dispatched and reads modified + * data + * + * This race is very improbable (non-shared snapshots have this + * race too and it hasn't ever been reported seen, except in + * artifically simulated cases). So we use active waiting with + * msleep(1). + */ + while (chunk_is_tracked(s, pe->chunk)) + msleep(1); + + list_del(&pe->list); + dm_multisnap_enqueue_bio_list(s, &pe->bios); + dm_multisnap_free_pending_exception(pe); + } while (!list_empty(&s->pes_waiting_for_commit)); + + /* + * Process the bios that we have just added to the queue. + * It's faster to process them now than to hand them over to + * kmultisnapd. + */ + dm_multisnap_process_bios(s); + + dm_multisnap_unlock(s); + + blk_unplug(bdev_get_queue(s->origin->bdev)); + blk_unplug(bdev_get_queue(s->snapshot->bdev)); +} + +static void dispatch_kcopyd(struct dm_multisnap *s, + struct dm_multisnap_pending_exception *pe, + int from_snapshot, chunk_t chunk, struct bio *bio, + struct dm_io_region *dests, unsigned n_dests) +{ + unsigned i; + struct dm_io_region src; + + pe->n_descs = n_dests; + + bio_list_add(&pe->bios, bio); + + src.bdev = likely(!from_snapshot) ? 
s->origin->bdev : s->snapshot->bdev; + src.sector = chunk_to_sector(s, chunk); + src.count = s->chunk_size >> SECTOR_SHIFT; + + if (likely(!from_snapshot) && + unlikely(src.sector + src.count > s->origin_sectors)) { + if (src.sector >= s->origin_sectors) + src.count = 0; + else + src.count = s->origin_sectors - src.sector; + + for (i = 0; i < pe->n_descs; i++) + dests[i].count = src.count; + } + + s->kcopyd_jobs_submitted_count++; + + dm_kcopyd_copy(s->kcopyd, &src, n_dests, dests, 0, remap_callback, pe); +} + +/* --- bio processing --- */ + +/* + * Process bio on the origin. + * Reads and barriers never go here, they are dispatched directly. + */ +static void do_origin_write(struct dm_multisnap *s, struct bio *bio) +{ + int r; + unsigned i; + chunk_t chunk, new_chunk; + struct dm_multisnap_pending_exception *pe; + struct dm_io_region dests[DM_MULTISNAP_MAX_CHUNKS_TO_REMAP]; + + /* reads are processed directly in multisnap_origin_map */ + BUG_ON(bio_rw(bio) != WRITE); + + if (bio->bi_sector + (bio->bi_size >> SECTOR_SHIFT) > s->origin_sectors) { + DMERR("do_origin_write: access beyond end of device, flags %lx, " + "sector %llx, size %x, origin sectors %llx", + bio->bi_flags, + (unsigned long long)bio->bi_sector, + bio->bi_size, + (unsigned long long)s->origin_sectors); + bio_endio(bio, -EIO); + return; + } + + if (unlikely(dm_multisnap_has_error(s))) + goto err_endio; + + s->store->reset_query(s->p); + + chunk = sector_to_chunk(s, bio->bi_sector); + + r = s->store->query_next_remap(s->p, chunk); + if (unlikely(r < 0)) + goto err_endio; + + if (likely(!r)) { + /* There is nothing to remap */ + if (unlikely(check_pending_io(s, bio, chunk, DM_SNAPID_T_ORIGIN))) + return; +dispatch_write: + bio->bi_bdev = s->origin->bdev; + generic_make_request(bio); + return; + } + + pe = dm_multisnap_alloc_pending_exception(s, chunk); + if (unlikely(!pe)) { + s->pending_mempool_allocation_failed = 1; + dm_multisnap_enqueue_bio(s, bio); + return; + } + + i = 0; + for (; i < 
DM_MULTISNAP_MAX_CHUNKS_TO_REMAP; i++) { + s->store->add_next_remap(s->p, &pe->desc[i], &new_chunk); + if (unlikely(dm_multisnap_has_error(s))) + goto free_err_endio; + + dests[i].bdev = s->snapshot->bdev; + dests[i].sector = chunk_to_sector(s, new_chunk); + dests[i].count = s->chunk_size >> SECTOR_SHIFT; + + r = s->store->query_next_remap(s->p, chunk); + if (unlikely(r < 0)) + goto free_err_endio; + if (likely(!r)) { + i++; + break; + } + } + + dispatch_kcopyd(s, pe, 0, chunk, bio, dests, i); + return; + +free_err_endio: + dm_multisnap_free_pending_exception(pe); +err_endio: + r = -EIO; + if (!(s->flags & DM_MULTISNAP_PRESERVE_ON_ERROR)) + goto dispatch_write; + + bio_endio(bio, r); + return; +} + +/* + * Process bio on the snapshot. + * Barriers never go here, they are dispatched directly. + */ +static void do_snapshot_io(struct dm_multisnap *s, struct bio *bio, snapid_t id) +{ + chunk_t chunk, result, copy_from; + int r; + struct dm_multisnap_pending_exception *pe; + struct dm_io_region dest; + + if (unlikely(!s->store->make_chunk_writeable) && + unlikely(bio_rw(bio) == WRITE)) + goto err_endio; + + if (unlikely(dm_multisnap_has_error(s))) + goto err_endio; + + chunk = sector_to_chunk(s, bio->bi_sector); + r = s->store->find_snapshot_chunk(s->p, id, chunk, + bio_rw(bio) == WRITE, &result); + if (unlikely(r < 0)) + goto err_endio; + + if (!r) { + /* Not found in the snapshot */ + if (likely(bio_rw(bio) != WRITE)) { + union map_info *map_context; + struct dm_multisnap_tracked_chunk *c; + + if (unlikely(bio->bi_sector + (bio->bi_size >> SECTOR_SHIFT) > s->origin_sectors)) { + zero_fill_bio(bio); + if (bio->bi_sector >= s->origin_sectors) { + bio_endio(bio, 0); + return; + } + bio_trim(bio, (s->origin_sectors - bio->bi_sector) << SECTOR_SHIFT); + } + + /* + * Redirect reads to the origin. + * Record the bio in the hash of tracked bios. + * This prevents read-vs-realloc race. 
+			 *
+			 * An important requirement is that when any bio is
+			 * added to tracked_chunk_hash, the bio must be finished
+			 * and removed from the hash without taking master_lock.
+			 *
+			 * So we add it immediately before submitting the bio
+			 * with generic_make_request.
+			 */
+			bio->bi_bdev = s->origin->bdev;
+
+			map_context = dm_get_mapinfo(bio);
+			BUG_ON(!map_context);
+			c = map_context->ptr;
+
+			spin_lock_irq(&dm_multisnap_bio_list_lock);
+			BUG_ON(!hlist_unhashed(&c->node));
+			hlist_add_head(&c->node, &s->tracked_chunk_hash[DM_TRACKED_CHUNK_HASH(c->chunk)]);
+			spin_unlock_irq(&dm_multisnap_bio_list_lock);
+		} else {
+			pe = dm_multisnap_alloc_pending_exception(s, chunk);
+			if (unlikely(!pe))
+				goto failed_pe_allocation;
+
+			s->store->add_next_remap(s->p, &pe->desc[0], &result);
+			if (unlikely(dm_multisnap_has_error(s)))
+				goto free_err_endio;
+
+			dest.bdev = s->snapshot->bdev;
+			dest.sector = chunk_to_sector(s, result);
+			dest.count = s->chunk_size >> SECTOR_SHIFT;
+
+			dispatch_kcopyd(s, pe, 0, chunk, bio, &dest, 1);
+			return;
+		}
+	} else {
+		/* Found in the snapshot */
+		if (unlikely(check_pending_io(s, bio, chunk, id)))
+			return;
+
+		if (unlikely(bio_rw(bio) == WRITE) && r == 1) {
+			copy_from = result;
+
+			pe = dm_multisnap_alloc_pending_exception(s, chunk);
+			if (unlikely(!pe))
+				goto failed_pe_allocation;
+
+			s->store->make_chunk_writeable(s->p, &pe->desc[0], &result);
+			if (unlikely(dm_multisnap_has_error(s)))
+				goto free_err_endio;
+
+			dest.bdev = s->snapshot->bdev;
+			dest.sector = chunk_to_sector(s, result);
+			dest.count = s->chunk_size >> SECTOR_SHIFT;
+
+			dispatch_kcopyd(s, pe, 1, copy_from, bio, &dest, 1);
+			return;
+		}
+
+		bio->bi_bdev = s->snapshot->bdev;
+		bio->bi_sector &= (s->chunk_size >> SECTOR_SHIFT) - 1;
+		bio->bi_sector |= chunk_to_sector(s, result);
+	}
+	generic_make_request(bio);
+	return;
+
+free_err_endio:
+	dm_multisnap_free_pending_exception(pe);
+err_endio:
+	r = -EIO;
+	bio_endio(bio, r);
+	return;
+
+failed_pe_allocation:
+	s->pending_mempool_allocation_failed = 1;
+	dm_multisnap_enqueue_bio(s, bio);
+	return;
+}
+
+/*
+ * The main routine used to process everything in the thread.
+ * It must be called with master_lock held.
+ * It is usually called from the worker thread, but can also be called
+ * from other places (for example kcopyd callback), assuming that the caller
+ * holds master_lock.
+ */
+static void dm_multisnap_process_bios(struct dm_multisnap *s)
+{
+	struct bio *bio;
+	snapid_t snapid;
+
+again:
+	cond_resched();
+
+	if (!list_empty(&s->background_works)) {
+		struct dm_multisnap_background_work *bw =
+			list_entry(s->background_works.next,
+				   struct dm_multisnap_background_work, list);
+		list_del(&bw->list);
+		bw->queued = 0;
+		bw->work(s->p, bw);
+
+		cond_resched();
+	}
+
+	bio = dm_multisnap_dequeue_bio(s);
+	if (unlikely(!bio))
+		return;
+
+	snapid = bio_get_snapid(bio);
+	if (snapid == DM_SNAPID_T_ORIGIN)
+		do_origin_write(s, bio);
+	else
+		do_snapshot_io(s, bio, snapid);
+
+	if (likely(!s->pending_mempool_allocation_failed) &&
+	    likely(!dm_multisnap_lock_contended(s)))
+		goto again;
+
+	if (!dm_multisnap_bio_queue_empty(s))
+		wakeup_kmultisnapd(s);
+}
+
+/*
+ * Background-job routines exported for exception store drivers.
+ *
+ * Jobs queued with these routines will be executed in the background, with
+ * the master lock held.
+ */
+
+void dm_multisnap_queue_work(struct dm_multisnap *s,
+			     struct dm_multisnap_background_work *bw)
+{
+	dm_multisnap_assert_locked(s);
+
+	if (bw->queued) {
+		BUG_ON(bw->queued != 1);
+		return;
+	}
+
+	bw->queued = 1;
+	list_add(&bw->list, &s->background_works);
+	wakeup_kmultisnapd(s);
+}
+EXPORT_SYMBOL(dm_multisnap_queue_work);
+
+void dm_multisnap_cancel_work(struct dm_multisnap *s,
+			      struct dm_multisnap_background_work *bw)
+{
+	dm_multisnap_assert_locked(s);
+
+	if (!bw->queued)
+		return;
+
+	bw->queued = 0;
+	list_del(&bw->list);
+}
+EXPORT_SYMBOL(dm_multisnap_cancel_work);
+
+/*
+ * The main work thread.
+ */
+static void dm_multisnap_work(struct work_struct *work)
+{
+	struct dm_multisnap *s = container_of(work, struct dm_multisnap, work);
+
+	dm_multisnap_lock(s);
+	dm_multisnap_process_bios(s);
+	dm_multisnap_unlock(s);
+
+	/*
+	 * If there was some mempool allocation failure we must wait, outside
+	 * the lock, until there is some free memory.
+	 * If this branch is taken, the work is already queued again, so it
+	 * reexecutes after finding some memory.
+	 */
+	if (unlikely(s->pending_mempool_allocation_failed)) {
+		s->pending_mempool_allocation_failed = 0;
+		dm_multisnap_wait_for_pending_exception(s);
+	}
+
+	blk_unplug(bdev_get_queue(s->origin->bdev));
+	blk_unplug(bdev_get_queue(s->snapshot->bdev));
+}
+
+static struct dm_multisnap *find_multisnapshot(struct block_device *origin)
+{
+	struct dm_multisnap *s;
+	list_for_each_entry(s, &all_multisnapshots, list_all)
+		if (s->origin->bdev == origin)
+			return s;
+	return NULL;
+}
+
+/* --- exception stores --- */
+
+static DEFINE_MUTEX(exception_stores_lock);
+static LIST_HEAD(all_exception_stores);
+
+static struct dm_multisnap_exception_store *
+dm_multisnap_find_exception_store(const char *name)
+{
+	struct dm_multisnap_exception_store *store;
+
+	list_for_each_entry(store, &all_exception_stores, list)
+		if (!strcmp(store->name, name))
+			return store;
+
+	return NULL;
+}
+
+static int dm_multisnap_exception_store_active(struct dm_multisnap_exception_store *find)
+{
+	struct dm_multisnap_exception_store *store;
+
+	list_for_each_entry(store, &all_exception_stores, list)
+		if (store == find)
+			return 1;
+
+	return 0;
+}
+
+int dm_multisnap_register_exception_store(struct dm_multisnap_exception_store *store)
+{
+	mutex_lock(&exception_stores_lock);
+
+	BUG_ON(dm_multisnap_exception_store_active(store));
+
+	if (dm_multisnap_find_exception_store(store->name)) {
+		mutex_unlock(&exception_stores_lock);
+		return -EEXIST;
+	}
+	list_add(&store->list, &all_exception_stores);
+
+	mutex_unlock(&exception_stores_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL(dm_multisnap_register_exception_store);
+
+void dm_multisnap_unregister_exception_store(struct dm_multisnap_exception_store *store)
+{
+	mutex_lock(&exception_stores_lock);
+
+	BUG_ON(!dm_multisnap_exception_store_active(store));
+	list_del(&store->list);
+
+	mutex_unlock(&exception_stores_lock);
+}
+EXPORT_SYMBOL(dm_multisnap_unregister_exception_store);
+
+static struct dm_multisnap_exception_store *
+dm_multisnap_get_exception_store(const char *name)
+{
+	struct dm_multisnap_exception_store *store;
+
+	mutex_lock(&exception_stores_lock);
+
+	store = dm_multisnap_find_exception_store(name);
+	if (store) {
+		if (!try_module_get(store->module))
+			store = NULL;
+	}
+
+	mutex_unlock(&exception_stores_lock);
+
+	return store;
+}
+
+static void dm_multisnap_put_exception_store(struct dm_multisnap_exception_store *store)
+{
+	mutex_lock(&exception_stores_lock);
+
+	BUG_ON(!dm_multisnap_exception_store_active(store));
+	module_put(store->module);
+
+	mutex_unlock(&exception_stores_lock);
+}
+
+/* --- argument parser --- */
+
+int dm_multisnap_get_string(char ***argv, unsigned *argc,
+			    char **string, char **error)
+{
+	if (!*argc) {
+		*error = "Not enough arguments";
+		return -EINVAL;
+	}
+	*string = *(*argv)++;
+	(*argc)--;
+	return 0;
+}
+EXPORT_SYMBOL(dm_multisnap_get_string);
+
+int dm_multisnap_get_uint64(char ***argv, unsigned *argc,
+			    __u64 *unsigned_int64, char **error)
+{
+	char *string;
+	int r = dm_multisnap_get_string(argv, argc, &string, error);
+	if (r)
+		return r;
+	if (!*string) {
+invalid_number:
+		*error = "Invalid number";
+		return -EINVAL;
+	}
+	*unsigned_int64 = simple_strtoull(string, &string, 10);
+	if (*string)
+		goto invalid_number;
+	return 0;
+}
+EXPORT_SYMBOL(dm_multisnap_get_uint64);
+
+int dm_multisnap_get_uint(char ***argv, unsigned *argc,
+			  unsigned *unsigned_int, char **error)
+{
+	__u64 unsigned_int64;
+	int r = dm_multisnap_get_uint64(argv, argc, &unsigned_int64, error);
+	if (r)
+		return r;
+	*unsigned_int = unsigned_int64;
+	if (unsigned_int64 != *unsigned_int) {
+		*error = "Number out of range";
+		return -ERANGE;
+	}
+	return 0;
+}
+EXPORT_SYMBOL(dm_multisnap_get_uint);
+
+int dm_multisnap_get_argcount(char ***argv, unsigned *argc,
+			      unsigned *unsigned_int, char **error)
+{
+	int r = dm_multisnap_get_uint(argv, argc, unsigned_int, error);
+	if (r)
+		return r;
+	if (*unsigned_int > *argc) {
+		*error = "Not enough arguments";
+		return -EINVAL;
+	}
+	return 0;
+}
+EXPORT_SYMBOL(dm_multisnap_get_argcount);
+
+void dm_multisnap_adjust_string(char **result, unsigned *maxlen)
+{
+	unsigned len = strlen(*result);
+	*result += len;
+	*maxlen -= len;
+}
+EXPORT_SYMBOL(dm_multisnap_adjust_string);
+
+/* --- target methods --- */
+
+static int compare_snapids(const void *p1, const void *p2)
+{
+	snapid_t s1 = *(const snapid_t *)p1;
+	snapid_t s2 = *(const snapid_t *)p2;
+	if (s1 < s2)
+		return -1;
+	if (s1 > s2)
+		return 1;
+	return 0;
+}
+
+/* --- constructor & destructor --- */
+
+static int multisnap_origin_ctr(struct dm_target *ti, unsigned argc, char **argv)
+{
+	int r;
+	int i;
+	char *origin_path;
+	char *snapshot_path;
+	unsigned chunk_size;
+	unsigned generic_args;
+	char *store_name;
+	unsigned store_args;
+	unsigned num_snapshots;
+
+	struct dm_multisnap *s, *ss;
+
+	mutex_lock(&all_multisnapshots_lock);
+
+	r = dm_multisnap_get_string(&argv, &argc, &origin_path, &ti->error);
+	if (r)
+		goto bad_arguments;
+	r = dm_multisnap_get_string(&argv, &argc, &snapshot_path, &ti->error);
+	if (r)
+		goto bad_arguments;
+	r = dm_multisnap_get_uint(&argv, &argc, &chunk_size, &ti->error);
+	if (r)
+		goto bad_arguments;
+
+	s = kmalloc(sizeof(struct dm_multisnap), GFP_KERNEL);
+	if (!s) {
+		ti->error = "Can't allocate multisnapshot structure";
+		r = -ENOMEM;
+		goto bad_s;
+	}
+
+	ti->private = s;
+
+	s->p = NULL;
+	s->error = 0;
+	s->flags = 0;
+	mutex_init(&s->master_lock);
+	mutex_init(&s->status_lock);
+	INIT_WORK(&s->work, dm_multisnap_work);
+	dm_multisnap_init_bio_queues(s);
+	INIT_LIST_HEAD(&s->background_works);
+	s->kcopyd_jobs_submitted_count = 0;
+	s->kcopyd_jobs_finished_count = 0;
+	s->kcopyd_jobs_last_commit_count = 0;
+	INIT_LIST_HEAD(&s->pes_waiting_for_commit);
+	s->commit_sequence = 0;
+	for (i = 0; i < DM_PENDING_HASH_SIZE; i++)
+		INIT_HLIST_HEAD(&s->pending_hash[i]);
+	s->pending_mempool_allocation_failed = 0;
+	s->new_snapid_valid = 0;
+	INIT_LIST_HEAD(&s->all_snaps);
+
+	r = dm_multisnap_get_argcount(&argv, &argc, &generic_args, &ti->error);
+	if (r)
+		goto bad_arguments;
+	while (generic_args--) {
+		char *arg;
+		r = dm_multisnap_get_string(&argv, &argc, &arg, &ti->error);
+		if (r)
+			goto bad_generic_arguments;
+
+		/* Synchronize snapshot list against the list given in the target table */
+		if (!strcasecmp(arg, "sync-snapshots"))
+			s->flags |= DM_MULTISNAP_SYNC_SNAPSHOTS;
+		/* Don't drop the snapshot store on error, rather stop the origin */
+		else if (!strcasecmp(arg, "preserve-on-error"))
+			s->flags |= DM_MULTISNAP_PRESERVE_ON_ERROR;
+		else {
+			r = -EINVAL;
+			ti->error = "Invalid argument";
+			goto bad_generic_arguments;
+		}
+	}
+
+	r = dm_get_device(ti, origin_path, 0, 0,
+			  FMODE_READ | FMODE_WRITE, &s->origin);
+	if (r) {
+		ti->error = "Could not get origin device";
+		goto bad_origin;
+	}
+	s->origin_sectors = i_size_read(s->origin->bdev->bd_inode) >> SECTOR_SHIFT;
+
+	r = dm_get_device(ti, snapshot_path, 0, 0,
+			  FMODE_READ | FMODE_WRITE, &s->snapshot);
+	if (r) {
+		ti->error = "Could not get snapshot device";
+		goto bad_snapshot;
+	}
+
+	/*
+	 * Prevent multiple load over the same devices.
+	 *
+	 * Currently, multisnapshot target is loaded just once, there is no
+	 * place where it would be reloaded (even lvchange --refresh doesn't
+	 * do it). So there is no need to handle loading the target multiple
+	 * times for the same devices and "handover" of the exception store.
+	 *
+	 * As a safeguard to protect against possible data corruption from
+	 * userspace misbehavior, we check that there is no other target loaded
+	 * that has the origin or the snapshot store on the same devices.
+	 */
+	list_for_each_entry(ss, &all_multisnapshots, list_all)
+		if (ss->origin->bdev == s->origin->bdev ||
+		    ss->snapshot->bdev == s->snapshot->bdev) {
+			ti->error = "Another multisnapshot with the same devices";
+			r = -EINVAL;
+			goto bad_conflicting_snapshot;
+		}
+
+	/* Validate the chunk size */
+	if (chunk_size > INT_MAX / 512) {
+		ti->error = "Chunk size is too high";
+		r = -EINVAL;
+		goto bad_chunk_size;
+	}
+	if (!is_power_of_2(chunk_size)) {
+		ti->error = "Chunk size is not power of two";
+		r = -EINVAL;
+		goto bad_chunk_size;
+	}
+	chunk_size *= 512;
+	if (chunk_size < bdev_logical_block_size(s->origin->bdev) ||
+	    chunk_size < bdev_logical_block_size(s->snapshot->bdev)) {
+		ti->error = "Chunk size is smaller than device block size";
+		r = -EINVAL;
+		goto bad_chunk_size;
+	}
+	s->chunk_size = chunk_size;
+	s->chunk_shift = ffs(chunk_size) - 1;
+
+	s->pending_pool = mempool_create_slab_pool(DM_PENDING_MEMPOOL_SIZE,
+						   pending_exception_cache);
+	if (!s->pending_pool) {
+		ti->error = "Could not allocate mempool for pending exceptions";
+		r = -ENOMEM;
+		goto bad_pending_pool;
+	}
+
+	s->tracked_chunk_pool = mempool_create_slab_pool(DM_TRACKED_CHUNK_POOL_SIZE,
+							 tracked_chunk_cache);
+	if (!s->tracked_chunk_pool) {
+		ti->error = "Could not allocate tracked_chunk mempool for tracking reads";
+		goto bad_tracked_chunk_pool;
+	}
+	s->n_tracked_ios = 0;
+	for (i = 0; i < DM_TRACKED_CHUNK_HASH_SIZE; i++)
+		INIT_HLIST_HEAD(&s->tracked_chunk_hash[i]);
+
+	r = dm_kcopyd_client_create(DM_MULTISNAP_KCOPYD_PAGES, &s->kcopyd);
+	if (r) {
+		ti->error = "Could not create kcopyd client";
+		goto bad_kcopyd;
+	}
+
+	r = dm_multisnap_get_string(&argv, &argc, &store_name, &ti->error);
+	if (r)
+		goto bad_store;
+
+	r = dm_multisnap_get_argcount(&argv, &argc, &store_args, &ti->error);
+	if (r)
+		goto bad_store;
+
+	s->store = dm_multisnap_get_exception_store(store_name);
+	if (!s->store) {
+		request_module("dm-store-%s", store_name);
+		s->store = dm_multisnap_get_exception_store(store_name);
+		if (!s->store) {
+			ti->error = "Can't get exception store type";
+			r = -ENOENT;
+			goto bad_store;
+		}
+	}
+
+	s->wq = create_singlethread_workqueue("kmultisnapd");
+	if (!s->wq) {
+		ti->error = "Could not create kernel thread";
+		r = -ENOMEM;
+		goto bad_thread;
+	}
+
+	dm_multisnap_lock(s);
+	r = s->store->init_exception_store(s, &s->p, store_args, argv, &ti->error);
+	if (r) {
+		s->p = NULL;
+		goto exception_store_error;
+	}
+
+	ti->split_io = s->chunk_size >> SECTOR_SHIFT;
+	ti->num_flush_requests = 1;
+
+	argv += store_args;
+	argc -= store_args;
+
+	/*
+	 * Synchronize snapshot IDs according to the table line:
+	 *	allocate IDs that are specified on the table line
+	 *	free IDs that are not specified on the table line
+	 */
+	if (s->flags & DM_MULTISNAP_SYNC_SNAPSHOTS) {
+		snapid_t sn, n, *snapids;
+		r = dm_multisnap_get_argcount(&argv, &argc, &num_snapshots, &ti->error);
+		if (r)
+			goto error_syncing_snapshots;
+		snapids = vmalloc(sizeof(snapid_t) * (num_snapshots + 1));
+		if (!snapids && num_snapshots) {
+			ti->error = "Could not allocate snapids array";
+			goto bad_kcopyd;
+		}
+		for (n = 0; n < num_snapshots; n++) {
+			char *string;
+			r = dm_multisnap_get_string(&argv, &argc, &string, &ti->error);
+			if (r) {
+				vfree(snapids);
+				goto error_syncing_snapshots;
+			}
+			r = read_snapid(s, string, &snapids[n], &ti->error);
+			if (r) {
+				vfree(snapids);
+				goto error_syncing_snapshots;
+			}
+		}
+		snapids[num_snapshots] = DM_SNAPID_T_ORIGIN;
+
+		/* Delete the snapshots that shouldn't be there */
+		sort(snapids, num_snapshots, sizeof(snapid_t), compare_snapids, NULL);
+		sn = s->store->get_next_snapid(s->p, 0);
+		for (n = 0; n <= num_snapshots; n++) {
+			while (sn < snapids[n]) {
+				if (!dm_multisnap_has_error(s)) {
+					r = s->store->delete_snapshot(s->p, sn);
+					if (r && s->flags & DM_MULTISNAP_PRESERVE_ON_ERROR) {
+						ti->error = "Could not delete snapshot";
+						vfree(snapids);
+						goto error_syncing_snapshots;
+					}
+				}
+				sn = s->store->get_next_snapid(s->p, sn + 1);
+				if (sn == DM_SNAPID_T_ORIGIN)
+					goto delete_done;
+			}
+			if (sn == snapids[n]) {
+				sn = s->store->get_next_snapid(s->p, sn + 1);
+				if (sn == DM_SNAPID_T_ORIGIN)
+					goto delete_done;
+			}
+		}
+delete_done:
+		/* Create the snapshots that should be there */
+		if (s->store->compare_snapids_for_create)
+			sort(snapids, num_snapshots, sizeof(snapid_t),
+			     s->store->compare_snapids_for_create, NULL);
+		for (n = 0; n <= num_snapshots; n++) {
+			if (!dm_multisnap_snapshot_exists(s, snapids[n])) {
+				if (!dm_multisnap_has_error(s)) {
+					r = s->store->create_snapshot(s->p, snapids[n]);
+					if (r && s->flags & DM_MULTISNAP_PRESERVE_ON_ERROR) {
+						ti->error = "Could not create snapshot";
+						vfree(snapids);
+						goto error_syncing_snapshots;
+					}
+				}
+			}
+		}
+		vfree(snapids);
+	}
+
+	dm_multisnap_unlock(s);
+
+	list_add(&s->list_all, &all_multisnapshots);
+
+	mutex_unlock(&all_multisnapshots_lock);
+	return 0;
+
+error_syncing_snapshots:
+	s->store->exit_exception_store(s->p);
+	s->p = NULL;
+exception_store_error:
+	dm_multisnap_unlock(s);
+	destroy_workqueue(s->wq);
+bad_thread:
+	dm_multisnap_put_exception_store(s->store);
+bad_store:
+	dm_kcopyd_client_destroy(s->kcopyd);
+bad_kcopyd:
+	mempool_destroy(s->tracked_chunk_pool);
+bad_tracked_chunk_pool:
+	mempool_destroy(s->pending_pool);
+bad_pending_pool:
+bad_conflicting_snapshot:
+bad_chunk_size:
+	dm_put_device(ti, s->snapshot);
+bad_snapshot:
+	dm_put_device(ti, s->origin);
+bad_origin:
+bad_generic_arguments:
+	kfree(s);
+bad_s:
+bad_arguments:
+	mutex_unlock(&all_multisnapshots_lock);
+	return r;
+}
+
+static void multisnap_origin_dtr(struct dm_target *ti)
+{
+	struct dm_multisnap *s = ti->private;
+	struct dm_multisnap_snap *sn;
+	unsigned i;
+
+	mutex_lock(&all_multisnapshots_lock);
+
+	/* Make sure that no more IOs will be submitted by snapshot targets */
+	list_for_each_entry(sn, &s->all_snaps, list_snaps) {
+		spin_lock_irq(&dm_multisnap_bio_list_lock);
+		sn->s = NULL;
+		spin_unlock_irq(&dm_multisnap_bio_list_lock);
+	}
+	list_del(&s->all_snaps);
+
+	/*
+	 * This code is called in the destructor, it is not performance
+	 * sensitive and thus we use polling with active waiting (msleep(1)).
+	 *
+	 * A possible 1ms delay on device destruction won't cause any trouble
+	 * and this polling is simpler and less bug-prone than using wait
+	 * queues.
+	 */
+poll_for_ios:
+	/* Wait for IOs on the snapshot */
+	spin_lock_irq(&dm_multisnap_bio_list_lock);
+	if (s->n_tracked_ios) {
+		spin_unlock_irq(&dm_multisnap_bio_list_lock);
+		msleep(1);
+		goto poll_for_ios;
+	}
+	spin_unlock_irq(&dm_multisnap_bio_list_lock);
+
+	/* Make sure that there really are no outstanding IOs */
+	for (i = 0; i < DM_MULTISNAP_N_QUEUES; i++)
+		BUG_ON(!bio_list_empty(&s->queue[i].bios));
+	for (i = 0; i < DM_TRACKED_CHUNK_HASH_SIZE; i++)
+		BUG_ON(!hlist_empty(&s->tracked_chunk_hash[i]));
+
+	/* Wait for pending reallocations */
+	dm_multisnap_lock(s);
+	for (i = 0; i < DM_PENDING_HASH_SIZE; i++)
+		if (!hlist_empty(&s->pending_hash[i])) {
+			dm_multisnap_unlock(s);
+			msleep(1);
+			goto poll_for_ios;
+		}
+	dm_multisnap_unlock(s);
+
+	flush_workqueue(s->wq);
+
+	dm_multisnap_lock(s);
+	dm_multisnap_call_commit(s);
+	s->store->exit_exception_store(s->p);
+	s->p = NULL;
+	list_del(&s->list_all);
+	dm_multisnap_unlock(s);
+
+	destroy_workqueue(s->wq);
+	kfree(s->p);
+	dm_kcopyd_client_destroy(s->kcopyd);
+	mempool_destroy(s->tracked_chunk_pool);
+	mempool_destroy(s->pending_pool);
+	dm_put_device(ti, s->snapshot);
+	dm_put_device(ti, s->origin);
+	dm_multisnap_put_exception_store(s->store);
+
+	kfree(s);
+
+	mutex_unlock(&all_multisnapshots_lock);
+}
+
+static int multisnap_origin_map(struct dm_target *ti, struct bio *bio,
+				union map_info *map_context)
+{
+	struct dm_multisnap *s = ti->private;
+
+	/*
+	 * Do the most common case quickly: reads and write barriers are
+	 * dispatched to the origin device directly.
+	 */
+	if (likely(bio_rw(bio) != WRITE) || unlikely(bio_empty_barrier(bio))) {
+		bio->bi_bdev = s->origin->bdev;
+		return DM_MAPIO_REMAPPED;
+	}
+
+	bio_put_snapid(bio, DM_SNAPID_T_ORIGIN);
+
+	dm_multisnap_enqueue_bio(s, bio);
+	wakeup_kmultisnapd(s);
+
+	return DM_MAPIO_SUBMITTED;
+}
+
+static int multisnap_origin_message(struct dm_target *ti,
+				    unsigned argc, char **argv)
+{
+	struct dm_multisnap *s = ti->private;
+	char *error;
+	int r;
+	int subsnap = 0;
+	snapid_t subsnap_id = 0;
+
+	mutex_lock(&all_multisnapshots_lock);
+	dm_multisnap_lock(s);
+
+	if (argc == 2 && !strcasecmp(argv[0], "create_subsnap")) {
+		/*
+		 * Create snapshot of snapshot.
+		 */
+		r = read_snapid(s, argv[1], &subsnap_id, &error);
+		if (r) {
+			DMWARN("invalid snapshot id: %s", error);
+			goto unlock_ret;
+		}
+		subsnap = 1;
+		goto create_snapshot;
+	}
+
+	if (argc == 1 && !strcasecmp(argv[0], "create")) {
+create_snapshot:
+		/*
+		 * Prepare snapshot creation.
+		 *
+		 * We allocate a snapid, and return it in the status.
+		 *
+		 * The snapshot is really created in postsuspend method (to
+		 * make sure that possibly mounted filesystem is quiescent and
+		 * the snapshot will be consistent).
+		 */
+		r = dm_multisnap_has_error(s);
+		if (r)
+			goto unlock_ret;
+
+		dm_multisnap_status_lock(s);
+		s->new_snapid_valid = 0;
+		dm_multisnap_status_unlock(s);
+
+		r = s->store->allocate_snapid(s->p, &s->new_snapid,
+					      subsnap, subsnap_id);
+		if (r)
+			goto unlock_ret;
+
+		r = dm_multisnap_has_error(s);
+		if (r)
+			goto unlock_ret;
+
+		dm_multisnap_status_lock(s);
+		s->new_snapid_valid = 1;
+		dm_multisnap_status_unlock(s);
+
+		r = 0;
+		goto unlock_ret;
+	}
+
+	if (argc == 2 && !strcasecmp(argv[0], "delete")) {
+		/*
+		 * Delete a snapshot.
+		 */
+		snapid_t snapid;
+		struct dm_multisnap_snap *sn;
+		struct bio *bio;
+		struct bio_list all_bios;
+
+		r = read_snapid(s, argv[1], &snapid, &error);
+		if (r) {
+			DMWARN("invalid snapshot id: %s", error);
+			goto unlock_ret;
+		}
+
+		if (!s->store->delete_snapshot) {
+			DMERR("snapshot store doesn't support delete");
+			r = -EOPNOTSUPP;
+			goto unlock_ret;
+		}
+
+		r = dm_multisnap_has_error(s);
+		if (r)
+			goto unlock_ret;
+
+		/* Kick off possibly attached snapshot */
+		list_for_each_entry(sn, &s->all_snaps, list_snaps) {
+			if (sn->snapid == snapid) {
+				spin_lock_irq(&dm_multisnap_bio_list_lock);
+				sn->s = NULL;
+				spin_unlock_irq(&dm_multisnap_bio_list_lock);
+			}
+		}
+
+		/* Terminate bios queued for this snapshot so far */
+		dm_multisnap_bio_dequeue_all(s, &all_bios);
+		while ((bio = bio_list_pop(&all_bios))) {
+			if (bio_get_snapid(bio) == snapid)
+				bio_endio(bio, -EIO);
+			else
+				dm_multisnap_enqueue_bio(s, bio);
+		}
+
+		if (!dm_multisnap_snapshot_exists(s, snapid)) {
+			DMWARN("snapshot with this id doesn't exist.");
+			r = -EINVAL;
+			goto unlock_ret;
+		}
+
+		r = s->store->delete_snapshot(s->p, snapid);
+		if (r)
+			goto unlock_ret;
+
+		dm_multisnap_unlock(s);
+
+		r = dm_multisnap_force_commit(s);
+
+		goto unlock2_ret;
+	}
+
+	DMWARN("unrecognised message received.");
+	r = -EINVAL;
+
+unlock_ret:
+	dm_multisnap_unlock(s);
+unlock2_ret:
+	mutex_unlock(&all_multisnapshots_lock);
+
+	return r;
+}
+
+/* Print used snapshot IDs into a supplied string */
+static void print_snapshot_ids(struct dm_multisnap *s, char *result, unsigned maxlen)
+{
+	snapid_t nsnap = 0;
+	snapid_t sn = 0;
+	while ((sn = s->store->get_next_snapid(s->p, sn)) != DM_SNAPID_T_ORIGIN)
+		sn++, nsnap++;
+	snprintf(result, maxlen, " %llu", (unsigned long long)nsnap);
+	dm_multisnap_adjust_string(&result, &maxlen);
+	sn = 0;
+	while ((sn = s->store->get_next_snapid(s->p, sn)) != DM_SNAPID_T_ORIGIN) {
+		snprintf(result, maxlen, " ");
+		dm_multisnap_adjust_string(&result, &maxlen);
+		print_snapid(s, result, maxlen, sn);
+		dm_multisnap_adjust_string(&result, &maxlen);
+		sn++;
+	}
+}
+
+static int multisnap_origin_status(struct dm_target *ti, status_type_t type,
+				   char *result, unsigned maxlen)
+{
+	struct dm_multisnap *s = ti->private;
+
+	/*
+	 * Use a special status lock, so that this code can execute even
+	 * when the underlying device is suspended and there is no possibility
+	 * to obtain the master lock.
+	 */
+	dm_multisnap_status_lock(s);
+
+	switch (type) {
+	case STATUSTYPE_INFO: {
+		unsigned long long total, alloc, meta;
+		snprintf(result, maxlen, "5 %d ", dm_multisnap_has_error(s));
+		dm_multisnap_adjust_string(&result, &maxlen);
+		if (s->new_snapid_valid)
+			print_snapid(s, result, maxlen, s->new_snapid);
+		else
+			snprintf(result, maxlen, "-");
+		dm_multisnap_adjust_string(&result, &maxlen);
+		if (s->store->get_space)
+			s->store->get_space(s->p, &total, &alloc, &meta);
+		else
+			total = alloc = meta = 0;
+		total <<= s->chunk_shift - SECTOR_SHIFT;
+		alloc <<= s->chunk_shift - SECTOR_SHIFT;
+		meta <<= s->chunk_shift - SECTOR_SHIFT;
+		snprintf(result, maxlen, " %llu %llu %llu", total, alloc, meta);
+		dm_multisnap_adjust_string(&result, &maxlen);
+		print_snapshot_ids(s, result, maxlen);
+		dm_multisnap_adjust_string(&result, &maxlen);
+		break;
+	}
+	case STATUSTYPE_TABLE: {
+		unsigned ngen = 0;
+		if (s->flags & DM_MULTISNAP_SYNC_SNAPSHOTS)
+			ngen++;
+		if (s->flags & DM_MULTISNAP_PRESERVE_ON_ERROR)
+			ngen++;
+		snprintf(result, maxlen, "%s %s %u %u%s%s %s",
+			 s->origin->name,
+			 s->snapshot->name,
+			 s->chunk_size / 512,
+			 ngen,
+			 s->flags & DM_MULTISNAP_SYNC_SNAPSHOTS ?
+				" sync-snapshots" : "",
+			 s->flags & DM_MULTISNAP_PRESERVE_ON_ERROR ?
+				" preserve-on-error" : "",
+			 s->store->name);
+		dm_multisnap_adjust_string(&result, &maxlen);
+		if (s->store->status_table)
+			s->store->status_table(s->p, result, maxlen);
+		else
+			snprintf(result, maxlen, " 0");
+		dm_multisnap_adjust_string(&result, &maxlen);
+		if (s->flags & DM_MULTISNAP_SYNC_SNAPSHOTS) {
+			print_snapshot_ids(s, result, maxlen);
+			dm_multisnap_adjust_string(&result, &maxlen);
+		}
+		break;
+	}
+	}
+
+	dm_multisnap_status_unlock(s);
+
+	/* If there's no space left in the buffer, ask for larger size */
+	return maxlen <= 1;
+}
+
+/*
+ * In postsuspend, we optionally create a snapshot that we prepared with
+ * a message.
+ */
+static void multisnap_origin_postsuspend(struct dm_target *ti)
+{
+	struct dm_multisnap *s = ti->private;
+
+	dm_multisnap_lock(s);
+	if (s->new_snapid_valid && !dm_multisnap_has_error(s)) {
+		/*
+		 * No way to return the error code, but it is recorded
+		 * in s->error anyway.
+		 */
+		s->store->create_snapshot(s->p, s->new_snapid);
+		s->new_snapid_valid = 0;
+	}
+	dm_multisnap_unlock(s);
+
+	dm_multisnap_force_commit(s);
+}
+
+static int multisnap_snap_ctr(struct dm_target *ti, unsigned argc, char **argv)
+{
+	int r;
+	char *origin_path;
+	char *snapid_str;
+	snapid_t snapid;
+	int doesnt_exist;
+
+	struct dm_dev *origin;
+
+	struct dm_multisnap *s;
+	struct dm_multisnap_snap *sn;
+
+	r = dm_multisnap_get_string(&argv, &argc, &origin_path, &ti->error);
+	if (r)
+		goto bad_arguments;
+	r = dm_multisnap_get_string(&argv, &argc, &snapid_str, &ti->error);
+	if (r)
+		goto bad_arguments;
+	r = dm_get_device(ti, origin_path, 0, 0, FMODE_READ | FMODE_WRITE, &origin);
+	if (r) {
+		ti->error = "Could not get origin device";
+		goto bad_origin;
+	}
+	mutex_lock(&all_multisnapshots_lock);
+	s = find_multisnapshot(origin->bdev);
+	if (!s) {
+		r = -ENXIO;
+		ti->error = "Origin target not loaded";
+		goto origin_not_loaded;
+	}
+
+	dm_multisnap_lock(s);
+
+	r = read_snapid(s, snapid_str, &snapid, &ti->error);
+	if (r) {
+		dm_multisnap_unlock(s);
+		goto snapid_doesnt_exist;
+	}
+
+	doesnt_exist = 0;
+	if (!dm_multisnap_snapshot_exists(s, snapid)) {
+		if (dm_multisnap_has_error(s) && dm_multisnap_drop_on_error(s)) {
+			/*
+			 * If there was an error, we don't know which snapshot
+			 * IDs are available. So we must accept it. But we
+			 * abort all accesses to this snapshot with an error.
+			 */
+			doesnt_exist = 1;
+		} else {
+			dm_multisnap_unlock(s);
+			r = -ENOENT;
+			ti->error = "Snapshot with this id doesn't exist";
+			goto snapid_doesnt_exist;
+		}
+	}
+	dm_multisnap_unlock(s);
+
+	sn = kmalloc(sizeof(*sn) + strlen(snapid_str), GFP_KERNEL);
+	if (!sn) {
+		ti->error = "Could not allocate multisnapshot_snap structure";
+		r = -ENOMEM;
+		goto cant_allocate;
+	}
+	sn->s = doesnt_exist ? NULL : s;
+	sn->snapid = snapid;
+	list_add(&sn->list_snaps, &s->all_snaps);
+	strlcpy(sn->origin_name, origin->name, sizeof sn->origin_name);
+	strcpy(sn->snapid_string, snapid_str);
+
+	mutex_unlock(&all_multisnapshots_lock);
+
+	dm_put_device(ti, origin);
+
+	ti->private = sn;
+	ti->split_io = s->chunk_size >> SECTOR_SHIFT;
+	ti->num_flush_requests = 1;
+
+	return 0;
+
+cant_allocate:
+snapid_doesnt_exist:
+origin_not_loaded:
+	dm_put_device(ti, origin);
+	mutex_unlock(&all_multisnapshots_lock);
+bad_origin:
+bad_arguments:
+	return r;
+}
+
+static void multisnap_snap_dtr(struct dm_target *ti)
+{
+	struct dm_multisnap_snap *sn = ti->private;
+
+	mutex_lock(&all_multisnapshots_lock);
+
+	list_del(&sn->list_snaps);
+	kfree(sn);
+
+	mutex_unlock(&all_multisnapshots_lock);
+}
+
+/*
+ * Each snapshot I/O is counted in n_tracked_ios in the origin and
+ * has 'struct dm_multisnap_tracked_chunk' allocated.
+ * dm_multisnap_tracked_chunk->node can be optionally linked into
+ * origin's hash of tracked I/Os.
+ */
+static int multisnap_snap_map(struct dm_target *ti, struct bio *bio,
+			      union map_info *map_context)
+{
+	struct dm_multisnap_snap *sn = ti->private;
+	struct dm_multisnap *s;
+	struct dm_multisnap_tracked_chunk *c;
+
+	bio_put_snapid(bio, sn->snapid);
+
+	spin_lock_irq(&dm_multisnap_bio_list_lock);
+	s = sn->s;
+	if (unlikely(!s)) {
+		spin_unlock_irq(&dm_multisnap_bio_list_lock);
+		return -EIO;
+	}
+	/*
+	 * make sure that the origin is not unloaded under us while
+	 * we drop the lock
+	 */
+	s->n_tracked_ios++;
+
+	c = mempool_alloc(s->tracked_chunk_pool, GFP_ATOMIC);
+	if (unlikely(!c)) {
+		spin_unlock_irq(&dm_multisnap_bio_list_lock);
+		c = mempool_alloc(s->tracked_chunk_pool, GFP_NOIO);
+		spin_lock_irq(&dm_multisnap_bio_list_lock);
+	}
+	c->s = s;
+	c->chunk = sector_to_chunk(s, bio->bi_sector);
+	c->bio_rw = bio_rw(bio);
+	INIT_HLIST_NODE(&c->node);
+	map_context->ptr = c;
+
+	if (unlikely(bio_empty_barrier(bio))) {
+		bio->bi_bdev = s->snapshot->bdev;
+		spin_unlock_irq(&dm_multisnap_bio_list_lock);
+		return DM_MAPIO_REMAPPED;
+	}
+
+	dm_multisnap_enqueue_bio_unlocked(s, bio);
+	spin_unlock_irq(&dm_multisnap_bio_list_lock);
+
+	wakeup_kmultisnapd(s);
+
+	return DM_MAPIO_SUBMITTED;
+}
+
+static int multisnap_snap_end_io(struct dm_target *ti, struct bio *bio,
+				 int error, union map_info *map_context)
+{
+	struct dm_multisnap_tracked_chunk *c = map_context->ptr;
+	struct dm_multisnap *s = c->s;
+	unsigned long flags;
+
+	spin_lock_irqsave(&dm_multisnap_bio_list_lock, flags);
+
+	s->n_tracked_ios--;
+	if (!hlist_unhashed(&c->node))
+		hlist_del(&c->node);
+	mempool_free(c, s->tracked_chunk_pool);
+
+	spin_unlock_irqrestore(&dm_multisnap_bio_list_lock, flags);
+
+	return 0;
+}
+
+static int multisnap_snap_status(struct dm_target *ti, status_type_t type,
+				 char *result, unsigned maxlen)
+{
+	struct dm_multisnap_snap *sn = ti->private;
+
+	switch (type) {
+
+	case STATUSTYPE_INFO:
+		/* there is no status */
+		result[0] = 0;
+		dm_multisnap_adjust_string(&result, &maxlen);
+		break;
+	case STATUSTYPE_TABLE:
+		snprintf(result, maxlen, "%s %s",
+			 sn->origin_name, sn->snapid_string);
+		dm_multisnap_adjust_string(&result, &maxlen);
+		break;
+	}
+
+	/* If there's no space left in the buffer, ask for larger size */
+	return maxlen <= 1;
+}
+
+static struct target_type multisnap_origin_target = {
+	.name = "multisnapshot",
+	.version = {1, 0, 0},
+	.module = THIS_MODULE,
+	.ctr = multisnap_origin_ctr,
+	.dtr = multisnap_origin_dtr,
+	.map = multisnap_origin_map,
+	.message = multisnap_origin_message,
+	.status = multisnap_origin_status,
+	.postsuspend = multisnap_origin_postsuspend,
+};
+
+static struct target_type multisnap_snap_target = {
+	.name = "multisnap-snap",
+	.version = {1, 0, 0},
+	.module = THIS_MODULE,
+	.ctr = multisnap_snap_ctr,
+	.dtr = multisnap_snap_dtr,
+	.map = multisnap_snap_map,
+	.end_io = multisnap_snap_end_io,
+	.status = multisnap_snap_status,
+};
+
+static int __init dm_multisnapshot_init(void)
+{
+	int r;
+
+	pending_exception_cache = kmem_cache_create(
+			"dm_multisnap_pending_exception",
+			sizeof(struct dm_multisnap_pending_exception),
+			__alignof__(struct dm_multisnap_pending_exception),
+			0,
+			pending_exception_ctor);
+	if (!pending_exception_cache) {
+		DMERR("Couldn't create exception cache.");
+		r = -ENOMEM;
+		goto bad_exception_cache;
+	}
+	tracked_chunk_cache = KMEM_CACHE(dm_multisnap_tracked_chunk, 0);
+	if (!tracked_chunk_cache) {
+		DMERR("Couldn't create cache to track chunks in use.");
+		r = -ENOMEM;
+		goto bad_tracked_chunk_cache;
+	}
+
+	r = dm_register_target(&multisnap_origin_target);
+	if (r < 0) {
+		DMERR("multisnap_origin_target target register failed %d", r);
+		goto bad_multisnap_origin_target;
+	}
+
+	r = dm_register_target(&multisnap_snap_target);
+	if (r < 0) {
+		DMERR("multisnap_snap_target target register failed %d", r);
+		goto bad_multisnap_snap_target;
+	}
+
+	return 0;
+
+bad_multisnap_snap_target:
+	dm_unregister_target(&multisnap_origin_target);
+bad_multisnap_origin_target:
+ kmem_cache_destroy(tracked_chunk_cache); +bad_tracked_chunk_cache: + kmem_cache_destroy(pending_exception_cache); +bad_exception_cache: + return r; +} + +static void __exit dm_multisnapshot_exit(void) +{ + dm_unregister_target(&multisnap_origin_target); + dm_unregister_target(&multisnap_snap_target); + kmem_cache_destroy(tracked_chunk_cache); + kmem_cache_destroy(pending_exception_cache); +} + +/* Module hooks */ +module_init(dm_multisnapshot_init); +module_exit(dm_multisnapshot_exit); + +MODULE_DESCRIPTION(DM_NAME " multisnapshot target"); +MODULE_AUTHOR("Mikulas Patocka"); +MODULE_LICENSE("GPL"); diff --git a/drivers/md/dm-multisnap.h b/drivers/md/dm-multisnap.h new file mode 100644 index 0000000..0af87dd --- /dev/null +++ b/drivers/md/dm-multisnap.h @@ -0,0 +1,183 @@ +/* + * Copyright (C) 2009 Red Hat Czech, s.r.o. + * + * Mikulas Patocka <mpatocka@redhat.com> + * + * This file is released under the GPL. + */ + +#ifndef DM_MULTISNAP_H +#define DM_MULTISNAP_H + +/* + * This file defines the interface between generic driver (dm-multisnap.c) + * and exception store drivers. 
+ */ + +#include <linux/device-mapper.h> +#include <linux/list.h> + +#define EFSERROR EPERM + +#define DM_MSG_PREFIX "multisnapshot" + +#define DM_SNAPID_T_ORIGIN 0xffffffffffffffffULL + +typedef sector_t chunk_t; +typedef __u64 snapid_t; + +struct dm_multisnap; /* private to dm-multisnap.c */ +struct dm_exception_store; /* private to the exception store driver */ + +struct dm_multisnap_background_work { + struct list_head list; + void (*work)(struct dm_exception_store *, struct dm_multisnap_background_work *); + int queued; +}; + +union chunk_descriptor { + __u64 bitmask; + struct { + snapid_t from; + snapid_t to; + } range; +}; + +struct dm_multisnap_exception_store { + struct list_head list; + struct module *module; + const char *name; + + /* < 0 - error */ + int (*init_exception_store)(struct dm_multisnap *dm, struct dm_exception_store **s, + unsigned argc, char **argv, char **error); + + void (*exit_exception_store)(struct dm_exception_store *s); + + void (*store_lock_acquired)(struct dm_exception_store *s, int flags); + + /* These two can override format of snapids in the table. 
Can be NULL */ + void (*print_snapid)(struct dm_exception_store *s, char *string, + unsigned maxlen, snapid_t snapid); + int (*read_snapid)(struct dm_exception_store *s, char *string, + snapid_t *snapid, char **error); + + /* return the exception-store specific table arguments */ + void (*status_table)(struct dm_exception_store *s, char *result, unsigned maxlen); + + /* return the space */ + void (*get_space)(struct dm_exception_store *s, unsigned long long *chunks_total, + unsigned long long *chunks_allocated, + unsigned long long *chunks_metadata_allocated); + + /* < 0 - error */ + int (*allocate_snapid)(struct dm_exception_store *s, snapid_t *snapid, + int snap_of_snap, snapid_t master); + + /* < 0 - error */ + int (*create_snapshot)(struct dm_exception_store *s, snapid_t snapid); + + /* < 0 - error (may be NULL if not supported) */ + int (*delete_snapshot)(struct dm_exception_store *s, snapid_t snapid); + + /* + * Get the first snapid at or after snapid in its argument. + * If there are no more snapids, return DM_SNAPID_T_ORIGIN. + */ + snapid_t (*get_next_snapid)(struct dm_exception_store *s, snapid_t snapid); + + /* + * qsort()-compatible function to order snapshots for creation. + * may be NULL if standard ordering should be used. + */ + int (*compare_snapids_for_create)(const void *p1, const void *p2); + + /* 0 - not found, 1 - found (read-only), 2 - found (writeable), < 0 - error */ + int (*find_snapshot_chunk)(struct dm_exception_store *s, snapid_t snapid, + chunk_t chunk, int write, chunk_t *result); + + /* + * Chunk interface between exception store and generic code. + * Allowed sequences: + * + * - first call reset_query + * then repeatedly query next exception to make with query_next_remap + * and add it to btree with add_next_remap. This can be repeated until + * query_next_remap indicates that it has nothing more or until all 8 + * kcopyd slots are filled. 
+ * + * - call find_snapshot_chunk, if it returns 0, you can call + * add_next_remap to add the chunk to the btree. + * + * - call find_snapshot_chunk, if it returns 1 (shared chunk), call + * make_chunk_writeable to relocate that chunk. + */ + + void (*reset_query)(struct dm_exception_store *s); + int (*query_next_remap)(struct dm_exception_store *s, chunk_t chunk); + void (*add_next_remap)(struct dm_exception_store *s, + union chunk_descriptor *cd, chunk_t *new_chunk); + + /* may be NULL if writeable snapshots are not supported */ + void (*make_chunk_writeable)(struct dm_exception_store *s, + union chunk_descriptor *cd, chunk_t *new_chunk); + int (*check_conflict)(struct dm_exception_store *s, + union chunk_descriptor *cd, snapid_t snapid); + + /* This is called without the lock, prior to commit */ + void (*prepare_for_commit)(struct dm_exception_store *s); + + /* Commit the transactions */ + void (*commit)(struct dm_exception_store *s); +}; + +#define DM_MULTISNAP_SET_ERROR(dm, err, msg) \ +do { \ + DMERR msg; \ + dm_multisnap_set_error(dm, err); \ +} while (0) + +/* dm-multisnap.c */ + +/* Access generic information about the snapshot */ +struct block_device *dm_multisnap_snapshot_bdev(struct dm_multisnap *s); +unsigned dm_multisnap_chunk_size(struct dm_multisnap *s); +void dm_multisnap_set_error(struct dm_multisnap *s, int error); +int dm_multisnap_has_error(struct dm_multisnap *s); +int dm_multisnap_drop_on_error(struct dm_multisnap *s); +int dm_multisnap_snapshot_exists(struct dm_multisnap *s, snapid_t snapid); + +/* Lock status/table queries */ +void dm_multisnap_status_lock(struct dm_multisnap *s); +void dm_multisnap_status_unlock(struct dm_multisnap *s); +void dm_multisnap_status_assert_locked(struct dm_multisnap *s); + +/* + * Commit. 
dm_multisnap_call_commit can only be called + * if dm_multisnap_can_commit returns true + */ +int dm_multisnap_can_commit(struct dm_multisnap *s); +void dm_multisnap_call_commit(struct dm_multisnap *s); + +/* Delayed work for delete/merge */ +void dm_multisnap_queue_work(struct dm_multisnap *s, + struct dm_multisnap_background_work *bw); +void dm_multisnap_cancel_work(struct dm_multisnap *s, + struct dm_multisnap_background_work *bw); + +/* Parsing command line */ +int dm_multisnap_get_string(char ***argv, unsigned *argc, + char **string, char **error); +int dm_multisnap_get_uint64(char ***argv, unsigned *argc, + __u64 *unsigned_int64, char **error); +int dm_multisnap_get_uint(char ***argv, unsigned *argc, + unsigned *unsigned_int, char **error); +int dm_multisnap_get_argcount(char ***argv, unsigned *argc, + unsigned *unsigned_int, char **error); +void dm_multisnap_adjust_string(char **result, unsigned *maxlen); + +/* Register/unregister the exception store driver */ +int dm_multisnap_register_exception_store(struct dm_multisnap_exception_store *store); +void dm_multisnap_unregister_exception_store(struct dm_multisnap_exception_store *store); + +#endif -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 22+ messages in thread
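A note on the status-buffer convention used in patch 01: the STATUSTYPE_TABLE branch writes with snprintf(), calls dm_multisnap_adjust_string(), and then returns "maxlen <= 1" to request a larger buffer. The user-space sketch below illustrates that pattern. It is only a sketch: adjust_string() and format_table() are hypothetical stand-ins, and the body of adjust_string() is an assumption about what dm_multisnap_adjust_string() does (advance past the text just written), since its definition is not shown in this part of the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for dm_multisnap_adjust_string():
 * skip past the text just written and shrink the remaining space.
 * (Assumed behaviour; the real helper lives in dm-multisnap.c.)
 */
static void adjust_string(char **result, unsigned *maxlen)
{
	size_t len = strlen(*result);

	assert(len < *maxlen);	/* snprintf() always leaves room for the NUL */
	*result += len;
	*maxlen -= (unsigned)len;
}

/*
 * Format "origin snapid" the way the STATUSTYPE_TABLE branch does.
 * Returns 1 if the buffer was probably too small, 0 otherwise.
 */
static int format_table(char *result, unsigned maxlen,
			const char *origin_name, const char *snapid_string)
{
	snprintf(result, maxlen, "%s %s", origin_name, snapid_string);
	adjust_string(&result, &maxlen);

	/* Only the terminating NUL still fits: ask for a larger buffer. */
	return maxlen <= 1;
}
```

With a 64-byte buffer, format_table(buf, sizeof(buf), "/dev/sda", "3") returns 0; with an 8-byte buffer the output is truncated and it returns 1. An output that fills the buffer exactly also returns 1, so the check is conservative by design.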
* [PATCH 02/14] dm-bufio 2010-03-02 0:23 [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer 2010-03-02 0:23 ` [PATCH 01/14] dm-multisnap-common Mike Snitzer @ 2010-03-02 0:23 ` Mike Snitzer 2010-03-02 0:23 ` [PATCH 03/14] dm-multisnap-mikulas-headers Mike Snitzer ` (13 subsequent siblings) 15 siblings, 0 replies; 22+ messages in thread From: Mike Snitzer @ 2010-03-02 0:23 UTC (permalink / raw) To: dm-devel; +Cc: Mikulas Patocka From: Mikulas Patocka <mpatocka@redhat.com> Buffered I/O interface. This interface allows you to do I/O on device and act as a cache, holding recently read blocks in memory and performing delayed writes. We don't use buffer cache or page cache already present in the kernel, because: * both buffer cache and page cache can't handle block size larger than one page * both buffer cache and page cache allocate memory to perform reads, this could lead to deadlocks Currently, we use fixed size cache (8MB), this could be changed to scale with available memory. If mm people provide some callbacks, we could install a callback to free the cache in case of memory pressure. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- drivers/md/dm-bufio.c | 987 +++++++++++++++++++++++++++++++++++++++++++++++++ drivers/md/dm-bufio.h | 35 ++ 2 files changed, 1022 insertions(+), 0 deletions(-) create mode 100644 drivers/md/dm-bufio.c create mode 100644 drivers/md/dm-bufio.h diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c new file mode 100644 index 0000000..44dbb0e --- /dev/null +++ b/drivers/md/dm-bufio.c @@ -0,0 +1,987 @@ +/* + * Copyright (C) 2009 Red Hat Czech, s.r.o. + * + * Mikulas Patocka <mpatocka@redhat.com> + * + * This file is released under the GPL. 
+ */ + +#include <linux/device-mapper.h> +#include <linux/dm-io.h> +#include <linux/slab.h> +#include <linux/vmalloc.h> + +#include "dm-bufio.h" + +/* + * dm_bufio_client_create --- create a buffered IO cache on a given device + * dm_bufio_client_destroy --- release a buffered IO cache + * + * dm_bufio_read --- read a given block from disk. Returns pointer to data. + * Returns a pointer to dm_buffer that can be used to release the buffer + * or to make it dirty. + * dm_bufio_new --- like dm_bufio_read, but don't read anything from the disk. + * It is expected that the caller initializes the buffer and marks it + * dirty. + * dm_bufio_release --- release a reference obtained with dm_bufio_read or + * dm_bufio_new. The data pointer and dm_buffer pointer are no longer valid + * after this call. + * + * WARNING: to avoid deadlocks, a thread can hold at most one buffer. Multiple + * threads can each hold one buffer simultaneously. + * + * dm_bufio_mark_buffer_dirty --- mark a buffer dirty. It should be called after + * the buffer is modified. + * dm_bufio_write_dirty_buffers --- write all dirty buffers. Guarantees that all + * dirty buffers created prior to this call are on disk when this call + * exits. + * dm_bufio_issue_flush --- send an empty write barrier to the device to flush + * hardware disk cache. + * + * In case of memory pressure, the buffer may be written after + * dm_bufio_mark_buffer_dirty, but before dm_bufio_write_dirty_buffers. + * So dm_bufio_write_dirty_buffers guarantees that the buffer is on-disk + * but the actual writing may occur earlier. + * + * dm_bufio_release_move --- like dm_bufio_release but also move the buffer to + * the new block. dm_bufio_write_dirty_buffers is needed to commit the new + * block. + * dm_bufio_drop_buffers --- clear all buffers. + */ + +/* + * Memory management policy: + * When we're above threshold, start asynchronous writing of dirty buffers + * and memory reclaiming --- but still allow new allocations.
+ * When we're above limit, don't allocate any more space and synchronously + * wait until existing buffers are freed. + * + * These default parameters can be overridden with parameters to + * dm_bufio_client_create. + */ +#define DM_BUFIO_THRESHOLD_MEMORY (8 * 1048576) +#define DM_BUFIO_LIMIT_MEMORY (9 * 1048576) + +/* + * The number of bvec entries that are embedded directly in the buffer. + * If the chunk size is larger, dm-io is used to do the io. + */ +#define DM_BUFIO_INLINE_VECS 16 + +/* + * Buffer hash + */ +#define DM_BUFIO_HASH_SIZE (PAGE_SIZE / sizeof(struct hlist_head) / 2) +#define DM_BUFIO_HASH(block) ((block) & (DM_BUFIO_HASH_SIZE - 1)) + +/* + * Don't try to kmalloc blocks larger than this. + * For explanation, see dm_bufio_alloc_buffer_data below. + */ +#define DM_BUFIO_BLOCK_SIZE_KMALLOC_LIMIT PAGE_SIZE + +/* + * Buffer state bits. + */ +#define B_READING 0 +#define B_WRITING 1 +#define B_DIRTY 2 + +struct dm_bufio_client { + /* + * Linking of buffers: + * all buffers are linked to cache_hash with their hash_list field. + * clean buffers that are not being written (B_WRITING not set) + * are linked to lru with their lru_list field. + * dirty and clean buffers that are being written are linked + * to dirty_lru with their lru_list field. When the write + * finishes, the buffer cannot be immediately relinked + * (because we are in an interrupt context and relinking + * requires process context), so some clean-not-writing + * buffers can be held on dirty_lru too. They are later + * added to lru in the process context.
+ */ + struct list_head lru; + struct list_head dirty_lru; + struct mutex lock; + struct block_device *bdev; + unsigned block_size; + unsigned char sectors_per_block_bits; + unsigned char pages_per_block_bits; + + unsigned long n_buffers; + unsigned threshold_buffers; + unsigned limit_buffers; + + struct dm_io_client *dm_io; + + struct dm_buffer *reserved_buffer; + struct hlist_head cache_hash[DM_BUFIO_HASH_SIZE]; + wait_queue_head_t free_buffer_wait; + + int async_write_error; +}; + +/* + * A method, with which the data is allocated: + * kmalloc(), __get_free_pages() or vmalloc(). + * See the comment at dm_bufio_alloc_buffer_data. + */ +#define DATA_MODE_KMALLOC 1 +#define DATA_MODE_GET_FREE_PAGES 2 +#define DATA_MODE_VMALLOC 3 + +struct dm_buffer { + struct hlist_node hash_list; + struct list_head lru_list; + sector_t block; + void *data; + char data_mode; /* DATA_MODE_* */ + unsigned hold_count; + int read_error; + int write_error; + unsigned long state; + struct dm_bufio_client *c; + struct bio bio; + struct bio_vec bio_vec[DM_BUFIO_INLINE_VECS]; +}; + +/* + * Allocating buffer data. + * + * Small buffers are allocated with kmalloc, to use space optimally. + * + * Large buffers: + * We use get_free_pages or vmalloc, both have their advantages and + * disadvantages. + * __get_free_pages can randomly fail, if the memory is fragmented. + * __vmalloc won't randomly fail, but vmalloc space is limited (it may be + * as low as 128M) --- so using it for caching is not appropriate. + * If the allocation may fail we use __get_free_pages. Memory fragmentation + * won't have fatal effect here, it just causes flushes of some other + * buffers and more I/O will be performed. + * If the allocation shouldn't fail we use __vmalloc. This is only for + * the initial reserve allocation, so there's no risk of wasting + * all vmalloc space. 
+ */ +static void *dm_bufio_alloc_buffer_data(struct dm_bufio_client *c, + gfp_t gfp_mask, char *data_mode) +{ + if (c->block_size <= DM_BUFIO_BLOCK_SIZE_KMALLOC_LIMIT) { + *data_mode = DATA_MODE_KMALLOC; + return kmalloc(c->block_size, gfp_mask); + } else if (gfp_mask & __GFP_NORETRY) { + *data_mode = DATA_MODE_GET_FREE_PAGES; + return (void *)__get_free_pages(gfp_mask, + c->pages_per_block_bits); + } else { + *data_mode = DATA_MODE_VMALLOC; + return __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL); + } +} + +/* + * Free buffer's data. + */ +static void dm_bufio_free_buffer_data(struct dm_bufio_client *c, + void *data, char data_mode) +{ + switch (data_mode) { + + case DATA_MODE_KMALLOC: + kfree(data); + break; + case DATA_MODE_GET_FREE_PAGES: + free_pages((unsigned long)data, c->pages_per_block_bits); + break; + case DATA_MODE_VMALLOC: + vfree(data); + break; + default: + printk(KERN_CRIT "dm_bufio_free_buffer_data: bad data mode: %d", + data_mode); + BUG(); + + } +} + +/* + * Allocate buffer and its data. + */ +static struct dm_buffer *alloc_buffer(struct dm_bufio_client *c, gfp_t gfp_mask) +{ + struct dm_buffer *b; + b = kmalloc(sizeof(struct dm_buffer), gfp_mask); + if (!b) + return NULL; + b->c = c; + b->data = dm_bufio_alloc_buffer_data(c, gfp_mask, &b->data_mode); + if (!b->data) { + kfree(b); + return NULL; + } + return b; +} + +/* + * Free buffer and its data. + */ +static void free_buffer(struct dm_buffer *b) +{ + dm_bufio_free_buffer_data(b->c, b->data, b->data_mode); + kfree(b); +} + + +/* + * Link buffer to the hash list and clean or dirty queue. + */ +static void link_buffer(struct dm_buffer *b, sector_t block, int dirty) +{ + struct dm_bufio_client *c = b->c; + c->n_buffers++; + b->block = block; + list_add(&b->lru_list, dirty ? &c->dirty_lru : &c->lru); + hlist_add_head(&b->hash_list, &c->cache_hash[DM_BUFIO_HASH(block)]); +} + +/* + * Unlink buffer from the hash list and dirty or clean queue. 
+ */ +static void unlink_buffer(struct dm_buffer *b) +{ + BUG_ON(!b->c->n_buffers); + b->c->n_buffers--; + hlist_del(&b->hash_list); + list_del(&b->lru_list); +} + +/* + * Place the buffer to the head of dirty or clean LRU queue. + */ +static void relink_lru(struct dm_buffer *b, int dirty) +{ + struct dm_bufio_client *c = b->c; + list_del(&b->lru_list); + list_add(&b->lru_list, dirty ? &c->dirty_lru : &c->lru); +} + +/* + * This function is called when wait_on_bit is actually waiting. + * It unplugs the underlying block device, so that coalesced I/Os in + * the request queue are dispatched to the device. + */ +static int do_io_schedule(void *word) +{ + struct dm_buffer *b = container_of(word, struct dm_buffer, state); + struct dm_bufio_client *c = b->c; + + blk_run_address_space(c->bdev->bd_inode->i_mapping); + + io_schedule(); + + return 0; +} + +static void write_dirty_buffer(struct dm_buffer *b); + +/* + * Wait until any activity on the buffer finishes. + * Possibly write the buffer if it is dirty. + * When this function finishes, there is no I/O running on the buffer + * and the buffer is not dirty. + */ +static void make_buffer_clean(struct dm_buffer *b) +{ + BUG_ON(b->hold_count); + if (likely(!b->state)) /* fast case */ + return; + wait_on_bit(&b->state, B_READING, do_io_schedule, TASK_UNINTERRUPTIBLE); + write_dirty_buffer(b); + wait_on_bit(&b->state, B_WRITING, do_io_schedule, TASK_UNINTERRUPTIBLE); +} + +/* + * Find some buffer that is not held by anybody, clean it, unlink it and + * return it. + * If "wait" is zero, try less hard and don't block. 
+ */ +static struct dm_buffer *get_unclaimed_buffer(struct dm_bufio_client *c, int wait) +{ + struct dm_buffer *b; + list_for_each_entry_reverse(b, &c->lru, lru_list) { + cond_resched(); + BUG_ON(test_bit(B_WRITING, &b->state)); + BUG_ON(test_bit(B_DIRTY, &b->state)); + if (!b->hold_count) { + if (!wait && unlikely(test_bit(B_READING, &b->state))) + continue; + make_buffer_clean(b); + unlink_buffer(b); + return b; + } + } + list_for_each_entry_reverse(b, &c->dirty_lru, lru_list) { + cond_resched(); + BUG_ON(test_bit(B_READING, &b->state)); + if (!b->hold_count) { + if (!wait && (unlikely(test_bit(B_DIRTY, &b->state)) || + unlikely(test_bit(B_WRITING, &b->state)))) { + if (!test_bit(B_WRITING, &b->state)) + write_dirty_buffer(b); + continue; + } + make_buffer_clean(b); + unlink_buffer(b); + return b; + } + } + return NULL; +} + +/* + * Wait until some other threads free some buffer or release hold count + * on some buffer. + * + * This function is entered with c->lock held, drops it and regains it before + * exiting. + */ +static void wait_for_free_buffer(struct dm_bufio_client *c) +{ + DECLARE_WAITQUEUE(wait, current); + + add_wait_queue(&c->free_buffer_wait, &wait); + set_task_state(current, TASK_UNINTERRUPTIBLE); + mutex_unlock(&c->lock); + + io_schedule(); + + set_task_state(current, TASK_RUNNING); + remove_wait_queue(&c->free_buffer_wait, &wait); + + mutex_lock(&c->lock); +} + +/* + * Allocate a new buffer. If the allocation is not possible, wait until some + * other thread frees a buffer. + * + * May drop the lock and regain it. + */ +static struct dm_buffer *alloc_buffer_wait(struct dm_bufio_client *c) +{ + struct dm_buffer *b; + +retry: + /* + * dm-bufio is resistant to allocation failures (it just keeps + * one buffer reserved in case all the allocations fail).
+ * So set flags to not try too hard: + * GFP_NOIO: don't recurse into the I/O layer + * __GFP_NOMEMALLOC: don't use emergency reserves + * __GFP_NORETRY: don't retry and rather return failure + * __GFP_NOWARN: don't print a warning in case of failure + */ + b = alloc_buffer(c, GFP_NOIO | __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN); + if (b) + return b; + + if (c->reserved_buffer) { + b = c->reserved_buffer; + c->reserved_buffer = NULL; + return b; + } + + b = get_unclaimed_buffer(c, 1); + if (b) + return b; + + wait_for_free_buffer(c); + goto retry; +} + +/* + * Free a buffer and wake other threads waiting for free buffers. + */ +static void free_buffer_wake(struct dm_buffer *b) +{ + struct dm_bufio_client *c = b->c; + + if (unlikely(!c->reserved_buffer)) + c->reserved_buffer = b; + else + free_buffer(b); + + wake_up(&c->free_buffer_wait); + + cond_resched(); +} + +/* + * Check if we're over watermark. + * If we are over threshold_buffers, start freeing buffers. + * If we're over "limit_buffers", block until we get under the limit. + */ +static void check_watermark(struct dm_bufio_client *c) +{ + while (c->n_buffers > c->threshold_buffers) { + struct dm_buffer *b; + b = get_unclaimed_buffer(c, c->n_buffers > c->limit_buffers); + if (!b) + return; + free_buffer_wake(b); + } +} + +static void dm_bufio_dmio_complete(unsigned long error, void *context); + +/* + * Submit I/O on the buffer. + * + * Bio interface is faster but it has some problems: + * - the vector list is limited (increasing this limit increases + * memory-consumption per buffer, so it is not viable) + * - the memory must be direct-mapped, not vmallocated + * - the I/O driver can spuriously reject requests if it thinks that + * the requests are too big for the device or if they cross a + * controller-defined memory boundary + * + * If the buffer is small enough (up to DM_BUFIO_INLINE_VECS pages) and + * it is not vmalloc()ated, try using the bio interface.
+ * + * If the buffer is big, if it is vmalloc()ated or if the underlying device + * rejects the bio because it is too large, use dm-io layer to do the I/O. + * dmio layer splits the I/O to multiple requests, solving the above + * shortcomings. + */ +static void dm_bufio_submit_io(struct dm_buffer *b, int rw, sector_t block, + bio_end_io_t *end_io) +{ + if (b->c->block_size <= DM_BUFIO_INLINE_VECS * PAGE_SIZE && + b->data_mode != DATA_MODE_VMALLOC) { + char *ptr; + int len; + bio_init(&b->bio); + b->bio.bi_io_vec = b->bio_vec; + b->bio.bi_max_vecs = DM_BUFIO_INLINE_VECS; + b->bio.bi_sector = block << b->c->sectors_per_block_bits; + b->bio.bi_bdev = b->c->bdev; + b->bio.bi_end_io = end_io; + + /* + * we assume that if len >= PAGE_SIZE, ptr is page-aligned, + * if len < PAGE_SIZE, the buffer doesn't cross page boundary. + */ + ptr = b->data; + len = b->c->block_size; + do { + if (!bio_add_page(&b->bio, virt_to_page(ptr), + len < PAGE_SIZE ? len : PAGE_SIZE, + virt_to_phys(ptr) & (PAGE_SIZE - 1))) { + BUG_ON(b->c->block_size <= PAGE_SIZE); + goto use_dmio; + } + len -= PAGE_SIZE; + ptr += PAGE_SIZE; + } while (len > 0); + submit_bio(rw, &b->bio); + } else +use_dmio : { + int r; + struct dm_io_request io_req = { + .bi_rw = rw, + .notify.fn = dm_bufio_dmio_complete, + .notify.context = b, + .client = b->c->dm_io, + }; + struct dm_io_region region = { + .bdev = b->c->bdev, + .sector = block << b->c->sectors_per_block_bits, + .count = b->c->block_size >> SECTOR_SHIFT, + }; + if (b->data_mode != DATA_MODE_VMALLOC) { + io_req.mem.type = DM_IO_KMEM; + io_req.mem.ptr.addr = b->data; + } else { + io_req.mem.type = DM_IO_VMA; + io_req.mem.ptr.vma = b->data; + } + b->bio.bi_end_io = end_io; + r = dm_io(&io_req, 1, &region, NULL); + if (unlikely(r)) + end_io(&b->bio, r); + } +} + +/* + * dm-io completion routine. It just calls b->bio.bi_end_io, pretending + * that the request was handled directly with bio interface.
+ */ +static void dm_bufio_dmio_complete(unsigned long error, void *context) +{ + struct dm_buffer *b = context; + int err = 0; + if (unlikely(error != 0)) + err = -EIO; + b->bio.bi_end_io(&b->bio, err); +} + +/* Find a buffer in the hash. */ +static struct dm_buffer *dm_bufio_find(struct dm_bufio_client *c, sector_t block) +{ + struct dm_buffer *b; + struct hlist_node *hn; + hlist_for_each_entry(b, hn, &c->cache_hash[DM_BUFIO_HASH(block)], hash_list) { + cond_resched(); + if (b->block == block) + return b; + } + + return NULL; +} + +static void read_endio(struct bio *bio, int error); + +/* + * A common routine for dm_bufio_new and dm_bufio_read. + * The operation of these functions is very similar, except that dm_bufio_new + * doesn't read the buffer from the disk (assuming that the caller overwrites + * all the data and uses dm_bufio_mark_buffer_dirty to write new data back). + */ +static void *dm_bufio_new_read(struct dm_bufio_client *c, sector_t block, + struct dm_buffer **bp, int read) +{ + struct dm_buffer *b, *new_b = NULL; + + cond_resched(); + mutex_lock(&c->lock); +retry_search: + b = dm_bufio_find(c, block); + if (b) { + if (new_b) + free_buffer_wake(new_b); + b->hold_count++; + relink_lru(b, test_bit(B_DIRTY, &b->state) || + test_bit(B_WRITING, &b->state)); +unlock_wait_ret: + mutex_unlock(&c->lock); +wait_ret: + wait_on_bit(&b->state, B_READING, + do_io_schedule, TASK_UNINTERRUPTIBLE); + if (b->read_error) { + int error = b->read_error; + dm_bufio_release(b); + return ERR_PTR(error); + } + *bp = b; + return b->data; + } + if (!new_b) { + new_b = alloc_buffer_wait(c); + goto retry_search; + } + + check_watermark(c); + + b = new_b; + b->hold_count = 1; + b->read_error = 0; + b->write_error = 0; + link_buffer(b, block, 0); + + if (!read) { + b->state = 0; + goto unlock_wait_ret; + } + + b->state = 1 << B_READING; + + mutex_unlock(&c->lock); + + dm_bufio_submit_io(b, READ, b->block, read_endio); + + goto wait_ret; +} + +/* Read the buffer and hold reference on
it */ +void *dm_bufio_read(struct dm_bufio_client *c, sector_t block, + struct dm_buffer **bp) +{ + return dm_bufio_new_read(c, block, bp, 1); +} +EXPORT_SYMBOL(dm_bufio_read); + +/* Get the buffer with possibly invalid data and hold reference on it */ +void *dm_bufio_new(struct dm_bufio_client *c, sector_t block, + struct dm_buffer **bp) +{ + return dm_bufio_new_read(c, block, bp, 0); +} +EXPORT_SYMBOL(dm_bufio_new); + +/* + * The endio routine for reading: set the error, clear the bit and wake up + * anyone waiting on the buffer. + */ +static void read_endio(struct bio *bio, int error) +{ + struct dm_buffer *b = container_of(bio, struct dm_buffer, bio); + b->read_error = error; + BUG_ON(!test_bit(B_READING, &b->state)); + smp_mb__before_clear_bit(); + clear_bit(B_READING, &b->state); + smp_mb__after_clear_bit(); + wake_up_bit(&b->state, B_READING); +} + +/* + * Release the reference held on the buffer. + */ +void dm_bufio_release(struct dm_buffer *b) +{ + struct dm_bufio_client *c = b->c; + mutex_lock(&c->lock); + BUG_ON(test_bit(B_READING, &b->state)); + BUG_ON(!b->hold_count); + b->hold_count--; + if (likely(!b->hold_count)) { + wake_up(&c->free_buffer_wait); + /* + * If there were errors on the buffer, and the buffer is not + * to be written, free the buffer. There is no point in caching + * invalid buffer. + */ + if ((b->read_error || b->write_error) && + !test_bit(B_WRITING, &b->state) && + !test_bit(B_DIRTY, &b->state)) { + unlink_buffer(b); + free_buffer_wake(b); + } + } + mutex_unlock(&c->lock); +} +EXPORT_SYMBOL(dm_bufio_release); + +/* + * Mark that the data in the buffer were modified and the buffer needs to + * be written back. 
+ */ +void dm_bufio_mark_buffer_dirty(struct dm_buffer *b) +{ + struct dm_bufio_client *c = b->c; + + mutex_lock(&c->lock); + + if (!test_and_set_bit(B_DIRTY, &b->state)) + relink_lru(b, 1); + + mutex_unlock(&c->lock); +} +EXPORT_SYMBOL(dm_bufio_mark_buffer_dirty); + +static void write_endio(struct bio *bio, int error); + +/* + * Initiate a write on a dirty buffer, but don't wait for it. + * If the buffer is not dirty, exit. + * If there is some previous write going on, wait for it to finish (we can't + * have two writes on the same buffer simultaneously). + * Finally, submit our write and don't wait on it. We set B_WRITING indicating + * that there is a write in progress. + */ +static void write_dirty_buffer(struct dm_buffer *b) +{ + if (!test_bit(B_DIRTY, &b->state)) + return; + clear_bit(B_DIRTY, &b->state); + wait_on_bit_lock(&b->state, B_WRITING, + do_io_schedule, TASK_UNINTERRUPTIBLE); + dm_bufio_submit_io(b, WRITE, b->block, write_endio); +} + +/* + * The endio routine for write. + * Set the error, clear B_WRITING bit and wake anyone who was waiting on it. + */ +static void write_endio(struct bio *bio, int error) +{ + struct dm_buffer *b = container_of(bio, struct dm_buffer, bio); + b->write_error = error; + if (unlikely(error)) { + struct dm_bufio_client *c = b->c; + cmpxchg(&c->async_write_error, 0, error); + } + BUG_ON(!test_bit(B_WRITING, &b->state)); + smp_mb__before_clear_bit(); + clear_bit(B_WRITING, &b->state); + smp_mb__after_clear_bit(); + wake_up_bit(&b->state, B_WRITING); +} + +/* + * Start writing all the dirty buffers. Don't wait for results. + */ +void dm_bufio_write_dirty_buffers_async(struct dm_bufio_client *c) +{ + struct dm_buffer *b; + mutex_lock(&c->lock); + list_for_each_entry_reverse(b, &c->dirty_lru, lru_list) { + cond_resched(); + BUG_ON(test_bit(B_READING, &b->state)); + write_dirty_buffer(b); + } + mutex_unlock(&c->lock); +} +EXPORT_SYMBOL(dm_bufio_write_dirty_buffers_async); + +/* + * Write all the dirty buffers synchronously.
+ * For performance, it is essential that the buffers are written asynchronously + * and simultaneously (so that the block layer can merge the writes) and then + * waited upon. + * + * Finally, we flush hardware disk cache. + */ +int dm_bufio_write_dirty_buffers(struct dm_bufio_client *c) +{ + int a, f; + unsigned long buffers_processed = 0; + + struct dm_buffer *b, *tmp; + dm_bufio_write_dirty_buffers_async(c); + mutex_lock(&c->lock); +again: + list_for_each_entry_safe_reverse(b, tmp, &c->dirty_lru, lru_list) { + int dropped_lock = 0; + if (buffers_processed < c->n_buffers) + buffers_processed++; + cond_resched(); + BUG_ON(test_bit(B_READING, &b->state)); + if (test_bit(B_WRITING, &b->state)) { + if (buffers_processed < c->n_buffers) { + dropped_lock = 1; + b->hold_count++; + mutex_unlock(&c->lock); + wait_on_bit(&b->state, B_WRITING, + do_io_schedule, TASK_UNINTERRUPTIBLE); + mutex_lock(&c->lock); + b->hold_count--; + } else + wait_on_bit(&b->state, B_WRITING, + do_io_schedule, TASK_UNINTERRUPTIBLE); + } + if (!test_bit(B_DIRTY, &b->state) && !test_bit(B_WRITING, &b->state)) + relink_lru(b, 0); + + /* + * If we dropped the lock, the list is no longer consistent, + * so we must restart the search. + * + * In the most common case, the buffer just processed is + * relinked to the clean list, so we won't loop scanning the + * same buffer again and again. + * + * This may livelock if there is another thread simultaneously + * dirtying buffers, so we count the number of buffers walked + * and if it exceeds the total number of buffers, it means that + * someone is doing some writes simultaneously with us --- in + * this case, stop dropping the lock. 
+ */ + if (dropped_lock) + goto again; + } + wake_up(&c->free_buffer_wait); + mutex_unlock(&c->lock); + + a = xchg(&c->async_write_error, 0); + f = dm_bufio_issue_flush(c); + if (unlikely(a)) + return a; + return f; +} +EXPORT_SYMBOL(dm_bufio_write_dirty_buffers); + +/* + * Use dm-io to send an empty barrier to flush the device. + */ +int dm_bufio_issue_flush(struct dm_bufio_client *c) +{ + struct dm_io_request io_req = { + .bi_rw = WRITE_BARRIER, + .mem.type = DM_IO_KMEM, + .mem.ptr.bvec = NULL, + .client = c->dm_io, + }; + struct dm_io_region io_reg = { + .bdev = c->bdev, + .sector = 0, + .count = 0, + }; + return dm_io(&io_req, 1, &io_reg, NULL); +} +EXPORT_SYMBOL(dm_bufio_issue_flush); + +/* + * Release the buffer and copy it to the new location. + * + * We first delete any other buffer that may be at that new location. + * + * Then, we write the buffer to the original location if it was dirty. + * + * Then, if we are the only one who is holding the buffer, relink the buffer + * in the hash queue for the new location. + * + * If there was someone else holding the buffer, we write it to the new + * location but not relink it, because that other user needs to have the buffer + * at the same place.
+ */ +void dm_bufio_release_move(struct dm_buffer *b, sector_t new_block) +{ + struct dm_bufio_client *c = b->c; + struct dm_buffer *underlying; + + mutex_lock(&c->lock); + +retry: + underlying = dm_bufio_find(c, new_block); + if (unlikely(underlying != NULL)) { + if (underlying->hold_count) { + wait_for_free_buffer(c); + goto retry; + } + make_buffer_clean(underlying); + unlink_buffer(underlying); + free_buffer_wake(underlying); + } + + BUG_ON(!b->hold_count); + BUG_ON(test_bit(B_READING, &b->state)); + write_dirty_buffer(b); + if (b->hold_count == 1) { + wait_on_bit(&b->state, B_WRITING, + do_io_schedule, TASK_UNINTERRUPTIBLE); + set_bit(B_DIRTY, &b->state); + unlink_buffer(b); + link_buffer(b, new_block, 1); + } else { + wait_on_bit_lock(&b->state, B_WRITING, + do_io_schedule, TASK_UNINTERRUPTIBLE); + dm_bufio_submit_io(b, WRITE, new_block, write_endio); + wait_on_bit(&b->state, B_WRITING, + do_io_schedule, TASK_UNINTERRUPTIBLE); + } + mutex_unlock(&c->lock); + dm_bufio_release(b); +} +EXPORT_SYMBOL(dm_bufio_release_move); + +/* + * Free all the buffers (and possibly write them if they were dirty) + * It is required that the calling thread doesn't have any reference on + * any buffer. + */ +void dm_bufio_drop_buffers(struct dm_bufio_client *c) +{ + struct dm_buffer *b; + + /* an optimization ... 
so that the buffers are not written one-by-one */ + dm_bufio_write_dirty_buffers_async(c); + + mutex_lock(&c->lock); + while ((b = get_unclaimed_buffer(c, 1))) + free_buffer_wake(b); + BUG_ON(!list_empty(&c->lru)); + BUG_ON(!list_empty(&c->dirty_lru)); + mutex_unlock(&c->lock); +} +EXPORT_SYMBOL(dm_bufio_drop_buffers); + +/* Create the buffering interface */ +struct dm_bufio_client * +dm_bufio_client_create(struct block_device *bdev, unsigned blocksize, + unsigned flags, __u64 cache_threshold, __u64 cache_limit) +{ + int r; + struct dm_bufio_client *c; + unsigned i; + + BUG_ON(blocksize < 1 << SECTOR_SHIFT || (blocksize & (blocksize - 1))); + + c = kmalloc(sizeof(*c), GFP_KERNEL); + if (!c) { + r = -ENOMEM; + goto bad_client; + } + + c->bdev = bdev; + c->block_size = blocksize; + c->sectors_per_block_bits = ffs(blocksize) - 1 - SECTOR_SHIFT; + c->pages_per_block_bits = (ffs(blocksize) - 1 >= PAGE_SHIFT) ? + (ffs(blocksize) - 1 - PAGE_SHIFT) : 0; + INIT_LIST_HEAD(&c->lru); + INIT_LIST_HEAD(&c->dirty_lru); + for (i = 0; i < DM_BUFIO_HASH_SIZE; i++) + INIT_HLIST_HEAD(&c->cache_hash[i]); + mutex_init(&c->lock); + c->n_buffers = 0; + + if (!cache_limit) + cache_limit = DM_BUFIO_LIMIT_MEMORY; + c->limit_buffers = cache_limit >> + (c->sectors_per_block_bits + SECTOR_SHIFT); + if (!c->limit_buffers) + c->limit_buffers = 1; + + if (!cache_threshold) + cache_threshold = DM_BUFIO_THRESHOLD_MEMORY; + if (cache_threshold > cache_limit) + cache_threshold = cache_limit; + c->threshold_buffers = cache_threshold >> + (c->sectors_per_block_bits + SECTOR_SHIFT); + if (!c->threshold_buffers) + c->threshold_buffers = 1; + + init_waitqueue_head(&c->free_buffer_wait); + c->async_write_error = 0; + + /* Number of pages is not really hard limit, just a mempool size */ + c->dm_io = dm_io_client_create((blocksize + PAGE_SIZE - 1) / PAGE_SIZE); + if (IS_ERR(c->dm_io)) { + r = PTR_ERR(c->dm_io); + goto bad_dm_io; + } + + c->reserved_buffer = alloc_buffer(c, GFP_KERNEL); + if 
(!c->reserved_buffer) { + r = -ENOMEM; + goto bad_buffer; + } + + return c; + +bad_buffer: + dm_io_client_destroy(c->dm_io); +bad_dm_io: + kfree(c); +bad_client: + return ERR_PTR(r); +} +EXPORT_SYMBOL(dm_bufio_client_create); + +/* + * Free the buffering interface. + * It is required that there are no references on any buffers. + */ +void dm_bufio_client_destroy(struct dm_bufio_client *c) +{ + unsigned i; + dm_bufio_drop_buffers(c); + for (i = 0; i < DM_BUFIO_HASH_SIZE; i++) + BUG_ON(!hlist_empty(&c->cache_hash[i])); + BUG_ON(!c->reserved_buffer); + free_buffer(c->reserved_buffer); + BUG_ON(c->n_buffers != 0); + dm_io_client_destroy(c->dm_io); + kfree(c); +} +EXPORT_SYMBOL(dm_bufio_client_destroy); diff --git a/drivers/md/dm-bufio.h b/drivers/md/dm-bufio.h new file mode 100644 index 0000000..3261ea2 --- /dev/null +++ b/drivers/md/dm-bufio.h @@ -0,0 +1,35 @@ +/* + * Copyright (C) 2009 Red Hat Czech, s.r.o. + * + * Mikulas Patocka <mpatocka@redhat.com> + * + * This file is released under the GPL. 
+ */ + +#ifndef DM_BUFIO_H +#define DM_BUFIO_H + +struct dm_bufio_client; +struct dm_buffer; + +void *dm_bufio_read(struct dm_bufio_client *c, sector_t block, + struct dm_buffer **bp); +void *dm_bufio_new(struct dm_bufio_client *c, sector_t block, + struct dm_buffer **bp); +void dm_bufio_release(struct dm_buffer *b); + +void dm_bufio_mark_buffer_dirty(struct dm_buffer *b); +void dm_bufio_write_dirty_buffers_async(struct dm_bufio_client *c); +int dm_bufio_write_dirty_buffers(struct dm_bufio_client *c); +int dm_bufio_issue_flush(struct dm_bufio_client *c); + +void dm_bufio_release_move(struct dm_buffer *b, sector_t new_block); + +struct dm_bufio_client * +dm_bufio_client_create(struct block_device *bdev, unsigned blocksize, + unsigned flags, __u64 cache_threshold, + __u64 cache_limit); +void dm_bufio_client_destroy(struct dm_bufio_client *c); +void dm_bufio_drop_buffers(struct dm_bufio_client *c); + +#endif -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 22+ messages in thread
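As a rough illustration of the sizing arithmetic in dm_bufio_client_create() in the patch above, the following userspace sketch derives the per-block shift counts and the buffer limits from a block size and the two cache sizes. It is a sketch under stated assumptions, not the kernel code: the names bufio_geometry, bufio_log2 and bufio_compute_geometry are hypothetical, page_shift is passed in rather than taken from PAGE_SHIFT, and the zero-means-default handling is left out because the values of DM_BUFIO_LIMIT_MEMORY and DM_BUFIO_THRESHOLD_MEMORY do not appear in this part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9

/* Hypothetical container for the values dm_bufio_client_create() derives. */
struct bufio_geometry {
	unsigned sectors_per_block_bits;
	unsigned pages_per_block_bits;
	uint64_t limit_buffers;
	uint64_t threshold_buffers;
};

/* log2 of a power of two; plays the role of ffs(blocksize) - 1 in the patch. */
unsigned bufio_log2(unsigned v)
{
	unsigned bits = 0;

	while (v > 1) {
		v >>= 1;
		bits++;
	}
	return bits;
}

struct bufio_geometry bufio_compute_geometry(unsigned blocksize,
					     unsigned page_shift,
					     uint64_t cache_threshold,
					     uint64_t cache_limit)
{
	struct bufio_geometry g;
	unsigned block_bits;

	/* Same precondition the patch enforces with BUG_ON(). */
	assert(blocksize >= (1u << SECTOR_SHIFT));
	assert((blocksize & (blocksize - 1)) == 0);

	block_bits = bufio_log2(blocksize);
	g.sectors_per_block_bits = block_bits - SECTOR_SHIFT;
	g.pages_per_block_bits =
		block_bits >= page_shift ? block_bits - page_shift : 0;

	/* The threshold may never exceed the hard limit. */
	if (cache_threshold > cache_limit)
		cache_threshold = cache_limit;

	/* Bytes to buffers: shift by log2(block size), keep at least one. */
	g.limit_buffers = cache_limit >> block_bits;
	if (!g.limit_buffers)
		g.limit_buffers = 1;
	g.threshold_buffers = cache_threshold >> block_bits;
	if (!g.threshold_buffers)
		g.threshold_buffers = 1;

	return g;
}
```

With a 4096-byte block, a 1 MiB limit and a 512 KiB threshold, this yields 256 and 128 buffers respectively, and a sectors_per_block_bits of 3.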
* [PATCH 03/14] dm-multisnap-mikulas-headers 2010-03-02 0:23 [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer 2010-03-02 0:23 ` [PATCH 01/14] dm-multisnap-common Mike Snitzer 2010-03-02 0:23 ` [PATCH 02/14] dm-bufio Mike Snitzer @ 2010-03-02 0:23 ` Mike Snitzer 2010-03-05 22:46 ` Mike Snitzer 2010-03-02 0:23 ` [PATCH 04/14] dm-multisnap-mikulas-alloc Mike Snitzer ` (12 subsequent siblings) 15 siblings, 1 reply; 22+ messages in thread From: Mike Snitzer @ 2010-03-02 0:23 UTC (permalink / raw) To: dm-devel; +Cc: Mikulas Patocka From: Mikulas Patocka <mpatocka@redhat.com> Common header files for the exception store. dm-multisnap-mikulas-struct.h contains on-disk structure definitions. dm-multisnap-mikulas.h contains in-memory structures and kernel function prototypes. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- drivers/md/dm-multisnap-mikulas-struct.h | 380 ++++++++++++++++++++++++++++++ drivers/md/dm-multisnap-mikulas.h | 247 +++++++++++++++++++ 2 files changed, 627 insertions(+), 0 deletions(-) create mode 100644 drivers/md/dm-multisnap-mikulas-struct.h create mode 100644 drivers/md/dm-multisnap-mikulas.h diff --git a/drivers/md/dm-multisnap-mikulas-struct.h b/drivers/md/dm-multisnap-mikulas-struct.h new file mode 100644 index 0000000..39eaa16 --- /dev/null +++ b/drivers/md/dm-multisnap-mikulas-struct.h @@ -0,0 +1,380 @@ +/* + * Copyright (C) 2009 Red Hat Czech, s.r.o. + * + * Mikulas Patocka <mpatocka@redhat.com> + * + * This file is released under the GPL. + */ + +#ifndef DM_MULTISNAP_MIKULAS_STRUCT_H +#define DM_MULTISNAP_MIKULAS_STRUCT_H + +/* on-disk structures */ + +#include <linux/types.h> +#include <asm/byteorder.h> + +#include "dm-multisnap.h" + +/* + * Encoding of snapshot numbers: + * + * If CONFIG_DM_MULTISNAPSHOT_MIKULAS_SNAP_OF_SNAP is not selected (normally it + * is), then mikulas_snapid_t is 32-bit sequential number. It continually grows. 
+ *
+ * If CONFIG_DM_MULTISNAPSHOT_MIKULAS_SNAP_OF_SNAP is selected (the default),
+ * then mikulas_snapid_t is a 64-bit number. The high 32 bits are the
+ * sequential snapshot number, incremented with each new snapshot. The low
+ * 32 bits are the subsnapshot number. Single snapshots (snapshots of the
+ * origin) have the low 32 bits equal to all ones. Snapshots-of-snapshots
+ * have the same high 32 bits as their master snapshot; their low 32 bits
+ * start at zero and are incremented with each new snapshot-of-snapshot.
+ *
+ * More levels (snapshots-of-snapshots-of-snapshots) are not allowed.
+ */
+
+/*
+ * Description of on-disk format:
+ *
+ * The device is composed of blocks (also called chunks). The block size (also
+ * called chunk size) is specified in the superblock.
+ *
+ * The chunk and block mean the same. "chunk" comes from old snapshots.
+ * "block" comes from filesystems. We tend to use "chunk" in
+ * exception-store-independent code to make it consistent with snapshot
+ * terminology and "block" in exception-store code to make it consistent with
+ * filesystem terminology.
+ *
+ * The minimum block size is 512, the maximum block size is not specified (it is
+ * limited by 32-bit integer size and available memory). All on-disk pointers
+ * are in the units of blocks. The pointers are 48-bit, making this format
+ * capable of handling 2^48 blocks.
+ *
+ * Log-structured update is used, new data are only written to unallocated parts
+ * of the device. By writing a new commit block, these unallocated parts become
+ * allocated and the store makes a transition to the new state. This maintains
+ * data consistency across crashes.
+ *
+ * Super block
+ *
+ * Chunk 0 is the superblock. It is defined in 'struct multisnap_superblock'.
+ * The superblock contains chunk size, commit block stride, error (if non-zero,
+ * then the exception store is invalid) and pointer to the current commit block.
+ * + * Commit blocks + * + * Chunks 1, 1+cb_stride, 1+2*cb_stride, 1+3*cb_stride, etc. are commit blocks. + * Chunks at these locations ((location % cb_stride) == 1) are only used for + * commit blocks, they can't be used for anything else. A commit block is + * written each time a new state is committed. The snapshot store transitions + * from one consistent state to another consistent state by writing a commit + * block. + * + * All commit blocks must be present and initialized (i.e. have valid signature + * and sequence number). They are created when the device is initialized or + * extended. It is not allowed to have random uninitialized data in any commit + * block. + * + * For correctness, one commit block would be sufficient --- but to improve + * performance and minimize seek times, there are multiple commit blocks and + * we use the commit block that is near currently written data. + * + * The current commit block is stored in the super block. However, updates to + * the super block would make excessive disk seeks too, so the updates to super + * block are skipped if the commit block is written at the currently valid + * commit block or at the next location following the currently valid commit + * block. The following algorithm is used to find the commit block at mount: + * 1. read the commit block multisnap_superblock->commit_block + * 2. get its sequence number + * 3. read the next commit block + * 4. if the sequence number of the next commit block is higher than + * the sequence number of the previous block, go to step 3. (i.e. read + * another commit block) + * 5. if the sequence number of the next commit block is lower than + * the sequence number of the previous block, use the previous block + * as the most recent valid commit block + * + * Note: because the disks only support atomic writes of 512 bytes, the commit + * block has only 512 bytes of valid data. The remaining data in the commit + * block up to the chunk size is unused. 
+ *
+ * B+tree
+ *
+ * To hold the information about reallocated chunks, we use a b+tree. The
+ * b+tree leaf entry contains: old chunk (in the origin), new chunk (in the
+ * snapshot store), the range of snapshot IDs for which this mapping applies.
+ * The b+tree is keyed by (old chunk, snapshot ID range). The b+tree node is
+ * specified in 'struct dm_multisnap_bt_node', the b+tree entry is in 'struct
+ * dm_multisnap_bt_entry'. The maximum number of entries in one node is specified
+ * so that the node fits into one chunk.
+ *
+ * The internal nodes have the same structure as the leaf nodes, except that:
+ * Both snapshot ID range entries (snap_from and snap_to) must be equal.
+ * New_chunk is really a pointer to the subordinate b+tree node.
+ *
+ * The pointer to the root node and the depth of the b+tree are stored in the
+ * commit block.
+ *
+ * Snapshot IDs
+ *
+ * We use 64-bit snapshot IDs. The high 32 bits are the number of a snapshot.
+ * This number always increases by one when creating a new snapshot. The
+ * snapshot IDs are never reused. It is expected that the admin won't create
+ * 2^32 snapshots.
+ *
+ * The low 32 bits are the subsnapshot ID, which allows creating snapshots of
+ * snapshots. The store can hold snapshots of 2 levels --- i.e. master
+ * snapshots (all of their low 32 bits are set to 1) and snapshots-of-snapshots
+ * (their low 32 bits increment from 0 up to 2^32-2).
+ *
+ * The valid snapshot IDs are stored in the b+tree. Special entries with chunk
+ * number DM_CHUNK_T_SNAP_PRESENT denote the present snapshot IDs. These entries
+ * point to no chunk, instead their presence shows the presence of the specified
+ * snapshot ID.
+ *
+ * When the snapshot is deleted, its entry is removed from the b+tree and the
+ * whole b+tree is scanned in the background --- entries whose range doesn't
+ * cover any present snapshot are deleted.
+ *
+ * Bitmaps
+ *
+ * Free space is managed by bitmaps. Bitmaps are pointed to by a radix-tree.
+ * Each internal node contains 64-bit pointers to subordinate nodes, each leaf + * node contains individual bits, '1' meaning allocated and '0' meaning free. + * There are no structs defined for the radix tree because the internal node is + * just an array of "u64" and the leaf node is just a bit mask. + * + * The depth of the radix tree is dependent on the device size and chunk size. + * The smallest depth that covers the whole device is used. The depth is not + * stored on the device, it is calculated with + * dm_multisnap_bitmap_depth function. + * + * The bitmap root is stored in the commit block. + * If the depth is 0, this root bitmap contains just individual bits (the device + * is so small that its bitmap fits within one chunk), if the depth is 1, the + * bitmap root points to a block with 64-bit pointers to individual bitmaps. + * If the depth is 2, there are two levels of pointers ... etc. + * + * Remaps + * + * If we wanted to follow the log-structure principle (writing only to + * unallocated parts), we would have to always write a new pathway up to the + * b+tree root or bitmap root. + * + * To avoid these multiple writes, remaps are used. There are limited number + * of remap entries in the commit block: 27 entries of commit_block_tmp_remap. + * Each entry contains (old, new) pair of chunk numbers. + * + * When we need to update a b+tree block or a bitmap block, we write the new + * block to a new location and store the old block and the new block in the + * commit block remap. When reading a block, we check if the number is present + * in the remap array --- if it is, we read the new location from the remap + * instead. + * + * This way, if we need to update one bitmap or one b+tree block, we don't have + * to write the whole path down from the root. Eventually, the remap entries in + * the commit block will be exhausted and if this happens, we must free the + * remap entries by writing the path from the root. 
+ * + * The bitmap_idx field in the remap is the index of the bitmap that the + * remapped chunk represents or CB_BITMAP_IDX_NONE if it represents a b+tree + * node. It is used to construct the path to the root. Bitmaps don't contain + * any other data except the bits, so the path must be constructed using this + * index. b+tree nodes contain the entries, so the path can be constructed by + * looking at the b+tree entries. + * + * Example: let's have a b+tree with depth 4 and pointers 10 -> 20 -> 30 -> 40. + * Now, if we want to change node 40: so write a new version to a chunk 41 and + * store the pair (40, 41) into the commit block. + * Now, we want to change this node again: so write a new version to a chunk 42 + * and store the pair (40, 42) into the commit block. + * Now, let's do the same operation for other node --- the remap array in the + * commit block eventually fills up. When this happens, we expunge (40, 42) map + * by writing the path from the root: + * copy node 30 to 43, change the pointer from 40 to 42 + * copy node 20 to 44, change the pointer from 30 to 43 + * copy node 10 to 45, change the pointer from 20 to 44 + * change the root pointer from 10 to 45. + * Now, the remap entry (40, 42) can be removed from the remap array. + * + * Freelist + * + * Freeing blocks is a bit tricky. If we freed blocks using the log-structured + * method, freeing would allocate and free more bitmap blocks, and the whole + * thing would get into an infinite loop. So, to free blocks, a different method + * is used: freelists. + * + * We have a 'struct dm_multisnap_freelist' that contains an array of runs of + * blocks to free. Each run is the pair (start, length). When we need to free + * a block, we add the block to the freelist. We optionally allocate a free + * list, if there is no freelist, or if the current freelist is full. If one + * freelist is not sufficient, a linked list of freelists is being created. 
+ * In the commit we write the freelist location to the commit block and after + * the commit, we free individual bits in the bitmaps. If the computer crashes + * during freeing the bits we just free the bits again on next mount. + */ + +#ifndef CONFIG_DM_MULTISNAPSHOT_MIKULAS_SNAP_OF_SNAP +typedef __u32 mikulas_snapid_t; +#define DM_MIKULAS_SNAPID_STEP_BITS 0 +#define mikulas_snapid_to_cpu le32_to_cpu +#define cpu_to_mikulas_snapid cpu_to_le32 +#else +typedef __u64 mikulas_snapid_t; +#define DM_MIKULAS_SNAPID_STEP_BITS 32 +#define mikulas_snapid_to_cpu le64_to_cpu +#define cpu_to_mikulas_snapid cpu_to_le64 +#endif + +#define DM_MIKULAS_SUBSNAPID_MASK (((mikulas_snapid_t)1 << DM_MIKULAS_SNAPID_STEP_BITS) - 1) +#define DM_SNAPID_T_LAST ((mikulas_snapid_t)0xffffffffffffffffULL) +#define DM_SNAPID_T_MAX ((mikulas_snapid_t)0xfffffffffffffffeULL) + +#define DM_CHUNK_BITS 48 +#define DM_CHUNK_T_LAST ((chunk_t)(1LL << DM_CHUNK_BITS) - 1) +#define DM_CHUNK_T_SNAP_PRESENT ((chunk_t)(1LL << DM_CHUNK_BITS) - 1) +#define DM_CHUNK_T_MAX ((chunk_t)(1LL << DM_CHUNK_BITS) - 2) + +#define CB_STRIDE_DEFAULT 1024 + +#define SB_BLOCK 0 + +#define SB_SIGNATURE cpu_to_be32(0xF6015342) + +struct multisnap_superblock { + __u32 signature; + __u32 chunk_size; + __u32 cb_stride; + __s32 error; + __u64 commit_block; +}; + + +#define FIRST_CB_BLOCK 1 + +#define CB_SIGNATURE cpu_to_be32(0xF6014342) + +struct commit_block_tmp_remap { + __u32 old1; + __u16 old2; + __u16 new2; + __u32 new1; + __u32 bitmap_idx; /* CB_BITMAP_IDX_* */ +}; + +#define CB_BITMAP_IDX_MAX 0xfffffffd +#define CB_BITMAP_IDX_NONE 0xfffffffe + +#define N_REMAPS 27 + +struct multisnap_commit_block { + __u32 signature; + __u32 snapshot_num; /* new snapshot number to allocate */ + __u64 sequence; /* a sequence, increased with each commit */ + + __u32 dev_size1; /* total size of the device in chunks */ + __u16 dev_size2; + __u16 total_allocated2; /* total allocated chunks */ + __u32 total_allocated1; + __u32 data_allocated1; /* 
chunks allocated for data */ + + __u16 data_allocated2; + __u16 bitmap_root2; /* bitmap root */ + __u32 bitmap_root1; + __u32 alloc_rover1; /* the next place where to try allocation */ + __u16 alloc_rover2; + __u16 freelist2; /* pointer to dm_multisnap_freelist */ + + __u32 freelist1; + __u32 delete_rover1; /* an index in the btree where to continue */ + __u16 delete_rover2; /* searching for data to delete */ + __u16 bt_root2; /* btree root chunk */ + __u32 bt_root1; + + __u8 bt_depth; /* depth of the btree */ + __u8 flags; /* DM_MULTISNAP_FLAG_* */ + __u8 pad[14]; + + struct commit_block_tmp_remap tmp_remap[N_REMAPS]; +}; + +#define DM_MULTISNAP_FLAG_DELETING 0x01 +#define DM_MULTISNAP_FLAG_PENDING_DELETE 0x02 + +#define MAX_BITMAP_DEPTH 6 + +static inline int dm_multisnap_bitmap_depth(unsigned chunk_shift, __u64 device_size) +{ + unsigned depth = 0; + __u64 entries = 8 << chunk_shift; + while (entries < device_size) { + depth++; + entries <<= chunk_shift - 3; + if (!entries) + return -ERANGE; + } + + if (depth > MAX_BITMAP_DEPTH) + return -ERANGE; + + return depth; +} + + +/* B+-tree entry. 
Sorted by orig_chunk and snap_from/to */
+
+#define MAX_BT_DEPTH		12
+
+struct dm_multisnap_bt_entry {
+	__u32 orig_chunk1;
+	__u16 orig_chunk2;
+	__u16 new_chunk2;
+	__u32 new_chunk1;
+	__u32 flags;
+	mikulas_snapid_t snap_from;
+	mikulas_snapid_t snap_to;
+};
+
+#define BT_SIGNATURE		cpu_to_be32(0xF6014254)
+
+struct dm_multisnap_bt_node {
+	__u32 signature;
+	__u32 n_entries;
+	struct dm_multisnap_bt_entry entries[0];
+};
+
+static inline unsigned dm_multisnap_btree_entries(unsigned chunk_size)
+{
+	return (chunk_size - sizeof(struct dm_multisnap_bt_node)) /
+		sizeof(struct dm_multisnap_bt_entry);
+}
+
+
+/* Freelist */
+
+struct dm_multisnap_freelist_entry {
+	__u32 block1;
+	__u16 block2;
+	__u16 run_length;	/* FREELIST_* */
+};
+
+#define FREELIST_RL_MASK	0x7fff	/* Run length */
+#define FREELIST_DATA_FLAG	0x8000	/* Represents a data block */
+
+#define FL_SIGNATURE		cpu_to_be32(0xF601464C)
+
+struct dm_multisnap_freelist {
+	__u32 signature;
+	__u32 backlink1;
+	__u16 backlink2;
+	__u32 n_entries;
+	struct dm_multisnap_freelist_entry entries[0];
+};
+
+static inline unsigned dm_multisnap_freelist_entries(unsigned chunk_size)
+{
+	return (chunk_size - sizeof(struct dm_multisnap_freelist)) /
+		sizeof(struct dm_multisnap_freelist_entry);
+}
+
+#endif
diff --git a/drivers/md/dm-multisnap-mikulas.h b/drivers/md/dm-multisnap-mikulas.h
new file mode 100644
index 0000000..52c87e0
--- /dev/null
+++ b/drivers/md/dm-multisnap-mikulas.h
@@ -0,0 +1,247 @@
+/*
+ * Copyright (C) 2009 Red Hat Czech, s.r.o.
+ *
+ * Mikulas Patocka <mpatocka@redhat.com>
+ *
+ * This file is released under the GPL.
+ */
+
+#ifndef DM_MULTISNAP_MIKULAS_H
+#define DM_MULTISNAP_MIKULAS_H
+
+/*
+ * This can be optionally undefined to get 32-bit snapshot numbers.
+ * Breaks on-disk format compatibility.
+ */ +#define CONFIG_DM_MULTISNAPSHOT_MIKULAS_SNAP_OF_SNAP + +#include "dm-multisnap.h" +#include "dm-multisnap-mikulas-struct.h" + +#include "dm-bufio.h" + +#include <linux/vmalloc.h> + +typedef __u32 bitmap_t; + +#define read_48(struc, entry) (le32_to_cpu((struc)->entry##1) |\ + ((chunk_t)le16_to_cpu((struc)->entry##2) << 31 << 1)) + +#define write_48(struc, entry, val) do { (struc)->entry##1 = cpu_to_le32(val); \ + (struc)->entry##2 = cpu_to_le16((chunk_t)(val) >> 31 >> 1); } while (0) + +#define TMP_REMAP_HASH_SIZE 256 +#define TMP_REMAP_HASH(c) ((c) & (TMP_REMAP_HASH_SIZE - 1)) + +#define UNCOMMITTED_BLOCK_HASH_SIZE 256 +#define UNCOMMITTED_BLOCK_HASH(c) ((c) & (UNCOMMITTED_BLOCK_HASH_SIZE - 1)) + +struct tmp_remap { + /* List entry for tmp_remap */ + struct hlist_node hash_list; + /* List entry for used_tmp_remaps/free_tmp_remaps */ + struct list_head list; + chunk_t old; + chunk_t new; + bitmap_t bitmap_idx; + int uncommitted; +}; + +struct bt_key { + chunk_t chunk; + mikulas_snapid_t snap_from; + mikulas_snapid_t snap_to; +}; + +struct path_element { + chunk_t block; + unsigned idx; + unsigned n_entries; +}; + +struct dm_exception_store { + struct dm_multisnap *dm; + struct dm_bufio_client *bufio; + + chunk_t dev_size; + unsigned chunk_size; + unsigned char chunk_shift; + unsigned char bitmap_depth; + unsigned btree_entries; + __u8 bt_depth; + __u8 flags; + __u32 snapshot_num; + unsigned cb_stride; + + chunk_t bitmap_root; + chunk_t alloc_rover; + chunk_t bt_root; + chunk_t sb_commit_block; + chunk_t valid_commit_block; + chunk_t delete_rover_chunk; + mikulas_snapid_t delete_rover_snapid; + + chunk_t total_allocated; + chunk_t data_allocated; + + __u64 commit_sequence; + + void *tmp_chunk; + + struct rb_root active_snapshots; + + /* Used during query/add remap */ + chunk_t query_snapid; + struct bt_key query_new_key; + unsigned char query_active; + chunk_t query_block_from; + chunk_t query_block_to; + + /* List heads for struct tmp_remap->list */ + unsigned 
n_used_tmp_remaps; + struct list_head used_bitmap_tmp_remaps; + struct list_head used_bt_tmp_remaps; + struct list_head free_tmp_remaps; + /* List head for struct tmp_remap->hash_list */ + struct hlist_head tmp_remap[TMP_REMAP_HASH_SIZE]; + struct tmp_remap tmp_remap_store[N_REMAPS]; + + unsigned n_preallocated_blocks; + chunk_t preallocated_blocks[MAX_BITMAP_DEPTH * 2]; + + struct dm_multisnap_freelist *freelist; + chunk_t freelist_ptr; + + struct dm_multisnap_background_work delete_work; + unsigned delete_commit_count; + + __u64 cache_threshold; + __u64 cache_limit; + + struct hlist_head uncommitted_blocks[UNCOMMITTED_BLOCK_HASH_SIZE]; +}; + +/* dm-multisnap-alloc.c */ + +void dm_multisnap_create_bitmaps(struct dm_exception_store *s, chunk_t *writing_block); +void dm_multisnap_extend_bitmaps(struct dm_exception_store *s, chunk_t new_size); +void *dm_multisnap_map_bitmap(struct dm_exception_store *s, bitmap_t bitmap, + struct dm_buffer **bp, chunk_t *block, + struct path_element *path); +int dm_multisnap_alloc_blocks(struct dm_exception_store *s, chunk_t *results, + unsigned n_blocks, int flags); +#define ALLOC_DRY 1 +void *dm_multisnap_alloc_duplicate_block(struct dm_exception_store *s, chunk_t block, + struct dm_buffer **bp, void *ptr); +void *dm_multisnap_alloc_make_block(struct dm_exception_store *s, chunk_t *result, + struct dm_buffer **bp); +void dm_multisnap_free_blocks_immediate(struct dm_exception_store *s, chunk_t block, + unsigned n_blocks); +void dm_multisnap_bitmap_finalize_tmp_remap(struct dm_exception_store *s, + struct tmp_remap *tmp_remap); + +/* dm-multisnap-blocks.c */ + +chunk_t dm_multisnap_remap_block(struct dm_exception_store *s, chunk_t block); +void *dm_multisnap_read_block(struct dm_exception_store *s, chunk_t block, + struct dm_buffer **bp); +int dm_multisnap_block_is_uncommitted(struct dm_exception_store *s, chunk_t block); +void dm_multisnap_block_set_uncommitted(struct dm_exception_store *s, chunk_t block); +void 
dm_multisnap_clear_uncommitted(struct dm_exception_store *s); +void *dm_multisnap_duplicate_block(struct dm_exception_store *s, chunk_t old_chunk, + chunk_t new_chunk, bitmap_t bitmap_idx, + struct dm_buffer **bp, chunk_t *to_free); +void dm_multisnap_free_tmp_remap(struct dm_exception_store *s, struct tmp_remap *t); +void *dm_multisnap_make_block(struct dm_exception_store *s, chunk_t new_chunk, + struct dm_buffer **bp); +void dm_multisnap_free_block_and_duplicates(struct dm_exception_store *s, + chunk_t block); + +int dm_multisnap_is_commit_block(struct dm_exception_store *s, chunk_t block); + +struct stop_cycles { + chunk_t key; + __u64 count; +}; + +void dm_multisnap_init_stop_cycles(struct stop_cycles *cy); +int dm_multisnap_stop_cycles(struct dm_exception_store *s, + struct stop_cycles *cy, chunk_t key); + +/* dm-multisnap-btree.c */ + +void dm_multisnap_create_btree(struct dm_exception_store *s, chunk_t *writing_block); +int dm_multisnap_find_in_btree(struct dm_exception_store *s, struct bt_key *key, + chunk_t *result); +void dm_multisnap_add_to_btree(struct dm_exception_store *s, struct bt_key *key, + chunk_t new_chunk); +void dm_multisnap_restrict_btree_entry(struct dm_exception_store *s, struct bt_key *key); +void dm_multisnap_extend_btree_entry(struct dm_exception_store *s, struct bt_key *key); +void dm_multisnap_delete_from_btree(struct dm_exception_store *s, struct bt_key *key); +void dm_multisnap_bt_finalize_tmp_remap(struct dm_exception_store *s, + struct tmp_remap *tmp_remap); +int dm_multisnap_list_btree(struct dm_exception_store *s, struct bt_key *key, + int (*call)(struct dm_exception_store *, + struct dm_multisnap_bt_node *, + struct dm_multisnap_bt_entry *, void *), + void *cookie); + +/* dm-multisnap-commit.c */ + +void dm_multisnap_transition_mark(struct dm_exception_store *s); +void dm_multisnap_prepare_for_commit(struct dm_exception_store *s); +void dm_multisnap_commit(struct dm_exception_store *s); + +/* dm-multisnap-delete.c */ + +void 
dm_multisnap_background_delete(struct dm_exception_store *s, + struct dm_multisnap_background_work *bw); + +/* dm-multisnap-freelist.c */ + +void dm_multisnap_init_freelist(struct dm_multisnap_freelist *fl, unsigned chunk_size); +void dm_multisnap_free_block(struct dm_exception_store *s, chunk_t block, unsigned flags); +int dm_multisnap_check_allocated_block(struct dm_exception_store *s, chunk_t block); +void dm_multisnap_flush_freelist_before_commit(struct dm_exception_store *s); +void dm_multisnap_load_freelist(struct dm_exception_store *s); + +/* dm-multisnap-io.c */ + +int dm_multisnap_find_snapshot_chunk(struct dm_exception_store *s, snapid_t snapid, + chunk_t chunk, int write, chunk_t *result); +void dm_multisnap_reset_query(struct dm_exception_store *s); +int dm_multisnap_query_next_remap(struct dm_exception_store *s, chunk_t chunk); +void dm_multisnap_add_next_remap(struct dm_exception_store *s, + union chunk_descriptor *cd, chunk_t *new_chunk); +void dm_multisnap_make_chunk_writeable(struct dm_exception_store *s, + union chunk_descriptor *cd, chunk_t *new_chunk); +int dm_multisnap_check_conflict(struct dm_exception_store *s, union chunk_descriptor *cd, + snapid_t snapid); + +/* dm-multisnap-snaps.c */ + +snapid_t dm_multisnap_get_next_snapid(struct dm_exception_store *s, snapid_t snapid); +int dm_multisnap_compare_snapids_for_create(const void *p1, const void *p2); +int dm_multisnap_find_next_snapid_range(struct dm_exception_store *s, snapid_t snapid, + snapid_t *from, snapid_t *to); +snapid_t dm_multisnap_find_next_subsnapshot(struct dm_exception_store *s, snapid_t snapid); + +void dm_multisnap_destroy_snapshot_tree(struct dm_exception_store *s); +void dm_multisnap_read_snapshots(struct dm_exception_store *s); +int dm_multisnap_allocate_snapid(struct dm_exception_store *s, snapid_t *snapid, + int snap_of_snap, snapid_t master); +int dm_multisnap_create_snapshot(struct dm_exception_store *s, snapid_t snapid); +int dm_multisnap_delete_snapshot(struct 
dm_exception_store *s, snapid_t snapid); + +void dm_multisnap_get_space(struct dm_exception_store *s, unsigned long long *chunks_total, + unsigned long long *chunks_allocated, + unsigned long long *chunks_metadata_allocated); + +#ifdef CONFIG_DM_MULTISNAPSHOT_MIKULAS_SNAP_OF_SNAP +void dm_multisnap_print_snapid(struct dm_exception_store *s, char *string, + unsigned maxlen, snapid_t snapid); +int dm_multisnap_read_snapid(struct dm_exception_store *s, char *string, + snapid_t *snapid, char **error); +#endif + +#endif -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 22+ messages in thread
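The mount-time search for the current commit block that the on-disk format comment describes (steps 1 to 5) can be sketched in userspace as follows. This is an illustration only: find_valid_commit_block, read_seq_fn and the cb_model stand-in for the device are hypothetical names, and two details the comment leaves open are filled in with assumptions that are marked in the code (the scan stops at the end of the device, and an equal sequence number is treated like a lower one).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIRST_CB_BLOCK 1

/* Hypothetical reader: returns the sequence number of the commit block
 * stored at 'chunk'. */
typedef uint64_t (*read_seq_fn)(uint64_t chunk, void *ctx);

/* In-memory stand-in for the device: seqs[i] is the sequence number of
 * the commit block at chunk FIRST_CB_BLOCK + i * cb_stride. */
struct cb_model {
	const uint64_t *seqs;
	size_t n;
	uint64_t cb_stride;
};

uint64_t model_read_seq(uint64_t chunk, void *ctx)
{
	const struct cb_model *m = ctx;
	size_t idx = (size_t)((chunk - FIRST_CB_BLOCK) / m->cb_stride);

	/* Only commit block locations may be read through this helper. */
	assert((chunk - FIRST_CB_BLOCK) % m->cb_stride == 0 && idx < m->n);
	return m->seqs[idx];
}

uint64_t find_valid_commit_block(uint64_t sb_commit_block, uint64_t cb_stride,
				 uint64_t dev_size, read_seq_fn read_seq,
				 void *ctx)
{
	uint64_t cb = sb_commit_block;		/* step 1 */
	uint64_t seq = read_seq(cb, ctx);	/* step 2 */

	assert(cb_stride != 0);

	for (;;) {
		uint64_t next = cb + cb_stride;	/* step 3 */
		uint64_t next_seq;

		/* Assumption: no commit block exists past the device end. */
		if (next >= dev_size)
			break;

		next_seq = read_seq(next, ctx);

		/* Step 5. Assumption: an equal sequence also ends the scan. */
		if (next_seq <= seq)
			break;

		/* Step 4: the next block is newer, keep walking. */
		cb = next;
		seq = next_seq;
	}
	return cb;
}
```

For example, with a stride of 1024 and commit blocks carrying sequences 7, 8, 9, 3, 4, a superblock that still points at chunk 1 leads the scan to chunk 2049 (sequence 9), because the block after it holds an older sequence.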
* Re: [PATCH 03/14] dm-multisnap-mikulas-headers
2010-03-02 0:23 ` [PATCH 03/14] dm-multisnap-mikulas-headers Mike Snitzer
@ 2010-03-05 22:46 ` Mike Snitzer
2010-03-06 1:54 ` Mike Snitzer
2010-03-09 3:08 ` Mikulas Patocka
0 siblings, 2 replies; 22+ messages in thread
From: Mike Snitzer @ 2010-03-05 22:46 UTC (permalink / raw)
To: dm-devel; +Cc: Mikulas Patocka

On Mon, Mar 01 2010 at 7:23pm -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> From: Mikulas Patocka <mpatocka@redhat.com>
>
> Common header files for the exception store.
>
> dm-multisnap-mikulas-struct.h contains on-disk structure definitions.
>
> dm-multisnap-mikulas.h contains in-memory structures and kernel function
> prototypes.
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> ---
> drivers/md/dm-multisnap-mikulas-struct.h | 380 ++++++++++++++++++++++++++++++
> drivers/md/dm-multisnap-mikulas.h | 247 +++++++++++++++++++
> 2 files changed, 627 insertions(+), 0 deletions(-)
> create mode 100644 drivers/md/dm-multisnap-mikulas-struct.h
> create mode 100644 drivers/md/dm-multisnap-mikulas.h
>
> diff --git a/drivers/md/dm-multisnap-mikulas-struct.h b/drivers/md/dm-multisnap-mikulas-struct.h
> new file mode 100644
> index 0000000..39eaa16
> --- /dev/null
> +++ b/drivers/md/dm-multisnap-mikulas-struct.h

<snip>

> +/*
> + * Description of on-disk format:
> + *
> + * The device is composed of blocks (also called chunks). The block size (also
> + * called chunk size) is specified in the superblock.
> + *
> + * The chunk and block mean the same. "chunk" comes from old snapshots.
> + * "block" comes from filesystems. We tend to use "chunk" in
> + * exception-store-independent code to make it consistent with snapshot
> + * terminology and "block" in exception-store code to make it consistent with
> + * filesystem terminology.
> + *
> + * The minimum block size is 512, the maximum block size is not specified (it is
> + * limited by 32-bit integer size and available memory). All on-disk pointers
> + * are in the units of blocks. The pointers are 48-bit, making this format
> + * capable of handling 2^48 blocks.

Shouldn't we require the chunk size be at least as big as (and a
multiple of) physical_block_size? E.g. 4096 on a 4K sector device.

This question applies to non-shared snapshots too.

> + * Commit blocks
> + *
> + * Chunks 1, 1+cb_stride, 1+2*cb_stride, 1+3*cb_stride, etc. are commit blocks.
> + * Chunks at these locations ((location % cb_stride) == 1) are only used for
> + * commit blocks, they can't be used for anything else. A commit block is
> + * written each time a new state is committed. The snapshot store transitions
> + * from one consistent state to another consistent state by writing a commit
> + * block.
> + *
> + * All commit blocks must be present and initialized (i.e. have valid signature
> + * and sequence number). They are created when the device is initialized or
> + * extended. It is not allowed to have random uninitialized data in any commit
> + * block.
> + *
> + * For correctness, one commit block would be sufficient --- but to improve
> + * performance and minimize seek times, there are multiple commit blocks and
> + * we use the commit block that is near currently written data.
> + *
> + * The current commit block is stored in the super block. However, updates to
> + * the super block would make excessive disk seeks too, so the updates to super
> + * block are skipped if the commit block is written at the currently valid
> + * commit block or at the next location following the currently valid commit
> + * block. The following algorithm is used to find the commit block at mount:
> + * 1. read the commit block multisnap_superblock->commit_block
> + * 2. get its sequence number
> + * 3. read the next commit block
> + * 4. if the sequence number of the next commit block is higher than
> + * the sequence number of the previous block, go to step 3. (i.e. read
> + * another commit block)
> + * 5. if the sequence number of the next commit block is lower than
> + * the sequence number of the previous block, use the previous block
> + * as the most recent valid commit block
> + *
> + * Note: because the disks only support atomic writes of 512 bytes, the commit
> + * block has only 512 bytes of valid data. The remaining data in the commit
> + * block up to the chunk size is unused.

Are there other places where you assume 512b is beneficial? My concern
is: what will happen on 4K devices?

Would making the commit block's size match the physical_block_size give
us any multisnapshot benefit? At a minimum I see a larger commit block
would allow us to have more remap entries (larger remap array)..
"Remaps" detailed below. But what does that buy us?

However, and before I get ahead of myself, with blk_stack_limits() we
could have a (DM) device that is composed of 4K and 512b devices; with a
resulting physical_block_size of 4K. But 4K wouldn't be atomic to the
512b disk.

But what if we were to add a checksum to the commit block? This could
give us the ability to have a larger commit block regardless of the
physical_block_size. [NOTE: I saw dm_multisnap_commit() is just writing
a fixed CB_SIGNATURE]

And in speaking with Ric Wheeler, using a checksum in the commit block
opens up the possibility for optimizing (reducing) the barrier ops
associated with:
1) before the commit block is written (flushes journal transaction)
2) and after the commit block is written.

Leaving us with only needing to barrier after the commit block is
written. But this optimization apparently also requires having a
checksummed journal. Ext4 offers this (somewhat controversial yet fast)
capability with the 'journal_async_commit' mount option.
[NOTE: I'm largely parroting what I heard from Ric] [NOTE: I couldn't immediately tell if dm_multisnap_commit() is doing multiple barriers when writing out the transaction and commit block] Taking a step back, any reason you elected to not reuse existing kernel infrastructure (e.g. jbd2) for journaling? Custom solution needed for the log-nature of the multisnapshot? [Excuse my naive question(s), I see nilfs2 also has its own journaling... I'm just playing devil's advocate given how important it is that the multisnapshot journal code be correct] > + * The pointer to the root node and the depth of the b+tree is stored in the > + * commit block. OK. > + * Bitmaps > + * > + * Free space is managed by bitmaps. Bitmaps are pointed to by a radix-tree. > + * Each internal node contains 64-bit pointers to subordinate nodes, each leaf > + * node contains individual bits, '1' meaning allocated and '0' meaning free. > + * There are no structs defined for the radix tree because the internal node is > + * just an array of "u64" and the leaf node is just a bit mask. > + * > + * The depth of the radix tree is dependent on the device size and chunk size. > + * The smallest depth that covers the whole device is used. The depth is not > + * stored on the device, it is calculated with > + * dm_multisnap_bitmap_depth function. > + * > + * The bitmap root is stored in the commit block. > + * If the depth is 0, this root bitmap contains just individual bits (the device > + * is so small that its bitmap fits within one chunk), if the depth is 1, the > + * bitmap root points to a block with 64-bit pointers to individual bitmaps. > + * If the depth is 2, there are two levels of pointers ... etc. OK. > + * > + * Remaps > + * > + * If we wanted to follow the log-structure principle (writing only to > + * unallocated parts), we would have to always write a new pathway up to the > + * b+tree root or bitmap root. > + * > + * To avoid these multiple writes, remaps are used. 
There are limited number > + * of remap entries in the commit block: 27 entries of commit_block_tmp_remap. > + * Each entry contains (old, new) pair of chunk numbers. > + * > + * When we need to update a b+tree block or a bitmap block, we write the new > + * block to a new location and store the old block and the new block in the > + * commit block remap. When reading a block, we check if the number is present > + * in the remap array --- if it is, we read the new location from the remap > + * instead. > + * > + * This way, if we need to update one bitmap or one b+tree block, we don't have > + * to write the whole path down from the root. Eventually, the remap entries in > + * the commit block will be exhausted and if this happens, we must free the > + * remap entries by writing the path from the root. > + * > + * The bitmap_idx field in the remap is the index of the bitmap that the > + * remapped chunk represents or CB_BITMAP_IDX_NONE if it represents a b+tree > + * node. It is used to construct the path to the root. Bitmaps don't contain > + * any other data except the bits, so the path must be constructed using this > + * index. b+tree nodes contain the entries, so the path can be constructed by > + * looking at the b+tree entries. > + * > + * Example: let's have a b+tree with depth 4 and pointers 10 -> 20 -> 30 -> 40. > + * Now, if we want to change node 40: so write a new version to a chunk 41 and > + * store the pair (40, 41) into the commit block. > + * Now, we want to change this node again: so write a new version to a chunk 42 > + * and store the pair (40, 42) into the commit block. > + * Now, let's do the same operation for other node --- the remap array in the > + * commit block eventually fills up. 
When this happens, we expunge (40, 42) map > + * by writing the path from the root: > + * copy node 30 to 43, change the pointer from 40 to 42 > + * copy node 20 to 44, change the pointer from 30 to 43 > + * copy node 10 to 45, change the pointer from 20 to 44 > + * change the root pointer from 10 to 45. > + * Now, the remap entry (40, 42) can be removed from the remap array. Above provided, for the benefit of others, to give more context on the role of remap entries (and the commit block's remap array). ^ permalink raw reply [flat|nested] 22+ messages in thread
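For readers following along, the mount-time search described in the quoted comment (steps 1 to 5) can be sketched roughly as below. This is only an illustration, not the code from the patch: the function name find_current_commit_block, the in-memory seq[] array standing in for reading commit blocks from disk, and the choices to stop at the last commit block and to treat an equal sequence number as "not newer" are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: seq[i] is the sequence number found in the i-th
 * commit block (on disk that would be chunk 1 + i * cb_stride).
 * "hint" plays the role of multisnap_superblock->commit_block.
 */
size_t find_current_commit_block(const uint64_t *seq, size_t n_commit_blocks,
				 size_t hint)
{
	size_t cur = hint;

	assert(hint < n_commit_blocks);

	/*
	 * Steps 3-5: keep advancing while the next commit block carries a
	 * higher sequence number; stop at the first one that does not
	 * (assumption: also stop at the end of the device).
	 */
	while (cur + 1 < n_commit_blocks && seq[cur + 1] > seq[cur])
		cur++;

	return cur;
}
```

With sequence numbers {10, 11, 12, 7, 8} and a superblock hint of 0, the walk stops at index 2, because the commit block after it carries an older sequence number.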
* Re: [PATCH 03/14] dm-multisnap-mikulas-headers 2010-03-05 22:46 ` Mike Snitzer @ 2010-03-06 1:54 ` Mike Snitzer 2010-03-09 3:08 ` Mikulas Patocka 1 sibling, 0 replies; 22+ messages in thread From: Mike Snitzer @ 2010-03-06 1:54 UTC (permalink / raw) To: dm-devel; +Cc: Ric Wheeler, Mikulas Patocka On Fri, Mar 05 2010 at 5:46pm -0500, Mike Snitzer <snitzer@redhat.com> wrote: > On Mon, Mar 01 2010 at 7:23pm -0500, > Mike Snitzer <snitzer@redhat.com> wrote: > > > From: Mikulas Patocka <mpatocka@redhat.com> > > > > Common header files for the exception store. > > > > dm-multisnap-mikulas-struct.h contains on-disk structure definitions. > > > > dm-multisnap-mikulas.h contains in-memory structures and kernel function > > prototypes. > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > --- > > drivers/md/dm-multisnap-mikulas-struct.h | 380 ++++++++++++++++++++++++++++++ > > drivers/md/dm-multisnap-mikulas.h | 247 +++++++++++++++++++ > > 2 files changed, 627 insertions(+), 0 deletions(-) > > create mode 100644 drivers/md/dm-multisnap-mikulas-struct.h > > create mode 100644 drivers/md/dm-multisnap-mikulas.h > > > > diff --git a/drivers/md/dm-multisnap-mikulas-struct.h b/drivers/md/dm-multisnap-mikulas-struct.h > > new file mode 100644 > > index 0000000..39eaa16 > > --- /dev/null > > +++ b/drivers/md/dm-multisnap-mikulas-struct.h > > <snip> > > > +/* > > + * Description of on-disk format: > > + * > > + * The device is composed of blocks (also called chunks). The block size (also > > + * called chunk size) is specified in the superblock. > > + * > > + * The chunk and block mean the same. "chunk" comes from old snapshots. > > + * "block" comes from filesystems. We tend to use "chunk" in > > + * exception-store-independent code to make it consistent with snapshot > > + * terminology and "block" in exception-store code to make it consistent with > > + * filesystem terminology. 
> > + * > > + * The minimum block size is 512, the maximum block size is not specified (it is > > + * limited by 32-bit integer size and available memory). All on-disk pointers > > + * are in the units of blocks. The pointers are 48-bit, making this format > > + * capable of handling 2^48 blocks. > > Shouldn't we require the chunk size be at least as big as > (and a multiple of) physical_block_size? E.g. 4096 on a 4K sector > device. > > This question applies to non-shared snapshots too. > > > + * Commit blocks > > + * > > + * Chunks 1, 1+cb_stride, 1+2*cb_stride, 1+3*cb_stride, etc. are commit blocks. > > + * Chunks at these locations ((location % cb_stride) == 1) are only used for > > + * commit blocks, they can't be used for anything else. A commit block is > > + * written each time a new state is committed. The snapshot store transitions > > + * from one consistent state to another consistent state by writing a commit > > + * block. > > + * > > + * All commit blocks must be present and initialized (i.e. have valid signature > > + * and sequence number). They are created when the device is initialized or > > + * extended. It is not allowed to have random uninitialized data in any commit > > + * block. > > + * > > + * For correctness, one commit block would be sufficient --- but to improve > > + * performance and minimize seek times, there are multiple commit blocks and > > + * we use the commit block that is near currently written data. > > + * > > + * The current commit block is stored in the super block. However, updates to > > + * the super block would make excessive disk seeks too, so the updates to super > > + * block are skipped if the commit block is written at the currently valid > > + * commit block or at the next location following the currently valid commit > > + * block. The following algorithm is used to find the commit block at mount: > > + * 1. read the commit block multisnap_superblock->commit_block > > + * 2. get its sequence number > > + * 3. 
read the next commit block > > + * 4. if the sequence number of the next commit block is higher than > > + * the sequence number of the previous block, go to step 3. (i.e. read > > + * another commit block) > > + * 5. if the sequence number of the next commit block is lower than > > + * the sequence number of the previous block, use the previous block > > + * as the most recent valid commit block > > + * > > + * Note: because the disks only support atomic writes of 512 bytes, the commit > > + * block has only 512 bytes of valid data. The remaining data in the commit > > + * block up to the chunk size is unused. > > Are there other places where you assume 512b is beneficial? My concern > is: what will happen on 4K devices? > > Would making the commit block's size match the physical_block_size give > us any multisnapshot benefit? At a minimum I see a larger commit block > would allow us to have more remap entries (larger remap > array).. "Remaps" detailed below. But what does that buy us? > > However, and before I get ahead of myself, with blk_stack_limits() we > could have a (DM) device that is composed of 4K and 512b devices; with a > resulting physical_block_size of 4K. But 4K wouldn't be atomic to the > 512b disk. > > But what if we were to add a checksum to the commit block? This could > give us the ability to have a larger commit block regardless of the > physical_block_size. [NOTE: I saw dm_multisnap_commit() is just writing > a fixed CB_SIGNATURE] > > And in speaking with Ric Wheeler, using a checksum in the commit block > opens up the possibility for optimizing (reducing) the barrier ops > associated with: > 1) before the commit block is written (flushes journal transaction) > 2) and after the commit block is written. > > Leaving us with only needing to barrier after the commit block is > written. But this optimization apparently also requires having a > checksummed journal. 
Ext4 offers this (somewhat controversial yet fast) > capability with the 'journal_async_commit' mount option. [NOTE: I'm > largely parroting what I heard from Ric] > > [NOTE: I couldn't immediately tell if dm_multisnap_commit() is doing > multiple barriers when writing out the transaction and commit block] > > Taking a step back, any reason you elected to not reuse existing kernel > infrastructure (e.g. jbd2) for journaling? Custom solution needed for > the log-nature of the multisnapshot? [Excuse my naive question(s), I > see nilfs2 also has its own journaling... I'm just playing devil's > advocate given how important it is that the multisnapshot journal code > be correct] Here is some additional detail on ext4's 'journal_async_commit': http://marc.info/?l=linux-ext4&m=125263711211379&w=2 http://marc.info/?l=linux-ext4&m=125267485222449&w=2 Ted Tso acknowledged that the name 'journal_async_commit' is really a misnomer here (I reference this post last because it contains an early misunderstanding from Ted, that he corrects in the 1st url I referenced above): http://marc.info/?l=linux-ext4&m=125238515130681&w=2 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 03/14] dm-multisnap-mikulas-headers 2010-03-05 22:46 ` Mike Snitzer 2010-03-06 1:54 ` Mike Snitzer @ 2010-03-09 3:08 ` Mikulas Patocka 2010-03-09 3:30 ` Mike Snitzer 1 sibling, 1 reply; 22+ messages in thread From: Mikulas Patocka @ 2010-03-09 3:08 UTC (permalink / raw) To: Mike Snitzer; +Cc: dm-devel > > + * The minimum block size is 512, the maximum block size is not specified (it is > > + * limited by 32-bit integer size and available memory). All on-disk pointers > > + * are in the units of blocks. The pointers are 48-bit, making this format > > + * capable of handling 2^48 blocks. > > Shouldn't we require the chunk size be at least as big as > (and a multiple of) physical_block_size? E.g. 4096 on a 4K sector > device. > > This question applies to non-shared snapshots too. If the device has a larger physical block size, it will reject smaller chunks. The same for non-shared snapshots. > > + * Note: because the disks only support atomic writes of 512 bytes, the commit > > + * block has only 512 bytes of valid data. The remaining data in the commit > > + * block up to the chunk size is unused. > > Are there other places where you assume 512b is beneficial? My concern > is: what will happen on 4K devices? With 4K chunk size, it writes 4K, but assumes that only 512b write is atomic. So, if the disk supports atomic write of 4K, it doesn't hurt. > Would making the commit block's size match the physical_block_size give > us any multisnapshot benefit? At a minimum I see a larger commit block > would allow us to have more remap entries (larger remap > array).. > > "Remaps" detailed below. But what does that buy us? They reduce the number of blocks written. Without remaps, you'd have to write the path from the root every time. > However, and before I get ahead of myself, with blk_stack_limits() we > could have a (DM) device that is composed of 4K and 512b devices; with a > resulting physical_block_size of 4K. But 4K wouldn't be atomic to the > 512b disk. 
Yes, that's why I must assume only 512b atomic write. > But what if we were to add a checksum to the commit block? This could > give us the ability to have a larger commit block regardless of the > physical_block_size. [NOTE: I saw dm_multisnap_commit() is just writing > a fixed CB_SIGNATURE] That would have to be cryptographic hash --- simple checksum can be fooled. Even that wouldn't be correct, because if the hash fails, the commit block is lost. If you wanted to use full commit blocks, you'd have to: - divide the commit block to two. - write these two alternatively (so that at least one is valid) - hash them or (which is simpler) copy sequence number to each 512b sector (so that if some sectors get written and some not, you find it out by having different sequence number). That is possible to do. > And in speaking with Ric Wheeler, using a checksum in the commit block > opens up the possibility for optimizing (reducing) the barrier ops > associated with: > 1) before the commit block is written (flushes journal transaction) > 2) and after the commit block is written. No, you have to use barriers. If the data before the commit blocks is not written and the commit block is written (with matching checksum), then the data is corrupted. Obviously, you can checksum all the data, but SHA1 is slow and it is being phased out already and even slower SHA256 is being recommended... > Leaving us with only needing to barrier after the commit block is > written. But this optimization apparently also requires having a > checksummed journal. Ext4 offers this (somewhat controversial yet fast) > capability with the 'journal_async_commit' mount option. [NOTE: I'm > largely parroting what I heard from Ric] > > [NOTE: I couldn't immediately tell if dm_multisnap_commit() is doing > multiple barriers when writing out the transaction and commit block] It calls dm_bufio_write_dirty_buffers twice and dm_bufio_write_dirty_buffers submits a zero barrier. 
(there's no point in submitting data-barrier, because that gets split into two zero barriers and non-barrier write anyway) > Taking a step back, any reason you elected to not reuse existing kernel > infrastructure (e.g. jbd2) for journaling? Custom solution needed for > the log-nature of the multisnapshot? [Excuse my naive question(s), I > see nilfs2 also has its own journaling... I'm just playing devil's > advocate given how important it is that the multisnapshot journal code > be correct] All the filesystems have their own journaling. jbd is used only by ext3, jbd2 only by ext4. Reiserfs has its own, JFS has its own, XFS has its own... etc. I consider the idea of sharing journaling code as inefficient: arguing about the interface would take more time than writing it from scratch. > Above provided, for the benefit of others, to give more context on the > role of remap entries (and the commit block's remap array). If there were no remaps, change in any B-tree node would require to overwrite all the nodes from the root. Similarly, changing any bitmap would require to overwrite the bitmap directory from the root. With remaps, changes to B-tree nodes or bitmaps write just that one block (and commit block, to store the remap). The full write from the root is done later, when the remap table fills up. Mikulas
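The "copy sequence number to each 512b sector" idea mentioned in this reply can be sketched as follows. This is a minimal illustration under stated assumptions, not anything taken from the patches: the helper names stamp_sequence and block_is_consistent, the placement of the sequence number in the first 8 bytes of every sector, and the use of host byte order (an on-disk format would fix the endianness) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Hypothetical layout: the first 8 bytes of every sector hold the sequence number. */
void stamp_sequence(unsigned char *block, size_t block_size, uint64_t seq)
{
	size_t off;

	assert(block_size % SECTOR_SIZE == 0);
	for (off = 0; off < block_size; off += SECTOR_SIZE)
		memcpy(block + off, &seq, sizeof(seq));
}

/*
 * Return 1 and store the sequence number in *seq_out if every sector carries
 * the same sequence number; return 0 if the sectors disagree, which means
 * only part of the block reached the disk (a torn write).
 */
int block_is_consistent(const unsigned char *block, size_t block_size,
			uint64_t *seq_out)
{
	uint64_t first, cur;
	size_t off;

	assert(block_size >= SECTOR_SIZE && block_size % SECTOR_SIZE == 0);
	memcpy(&first, block, sizeof(first));
	for (off = SECTOR_SIZE; off < block_size; off += SECTOR_SIZE) {
		memcpy(&cur, block + off, sizeof(cur));
		if (cur != first)
			return 0;
	}
	*seq_out = first;
	return 1;
}
```

A torn block found this way would still have to be recovered from somewhere, which is why the reply pairs the idea with writing two copies alternately so that at least one stays valid.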
* Re: [PATCH 03/14] dm-multisnap-mikulas-headers 2010-03-09 3:08 ` Mikulas Patocka @ 2010-03-09 3:30 ` Mike Snitzer 0 siblings, 0 replies; 22+ messages in thread From: Mike Snitzer @ 2010-03-09 3:30 UTC (permalink / raw) To: Mikulas Patocka; +Cc: dm-devel On Mon, Mar 08 2010 at 10:08pm -0500, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > + * The minimum block size is 512, the maximum block size is not specified (it is > > > + * limited by 32-bit integer size and available memory). All on-disk pointers > > > + * are in the units of blocks. The pointers are 48-bit, making this format > > > + * capable of handling 2^48 blocks. > > > > Shouldn't we require the chunk size be at least as big as > > (and a multiple of) physical_block_size? E.g. 4096 on a 4K sector > > device. > > > > This question applies to non-shared snapshots too. > > If the device has a larger physical block size, it will reject smaller > chunks. The same for non-shared snapshots. Correct, but we don't prevent the user from trying to use less than physical_block_size. So I think we agree we should. Hasn't been a concern but native 4K devices change that. > > > + * Note: because the disks only support atomic writes of 512 bytes, the commit > > > + * block has only 512 bytes of valid data. The remaining data in the commit > > > + * block up to the chunk size is unused. > > > > Are there other places where you assume 512b is beneficial? My concern > > is: what will happen on 4K devices? > > With 4K chunk size, it writes 4K, but assumes that only 512b write is > atomic. So, if the disk supports atomic write of 4K, it doesn't hurt. Right, so long as we impose 4K on a native 4K device, etc. But I was more wondering if there were other places that assume 512b granularity. Didn't see any but figured I'd ask. > > Would making the commit block's size match the physical_block_size give > > us any multisnapshot benefit? 
At a minimum I see a larger commit block > > would allow us to have more remap entries (larger remap > > array).. > > > > "Remaps" detailed below. But what does that buy us? > > They reduce the number of blocks written. Without remaps, you'd have to > write the path from the root every time. Sure, I wasn't saying we'd eliminate/reduce remaps. I was saying we could increase the number of remaps (think the current limit is 27 per 512b commit block). Using a 4K commit block would give us 100+? So I was asking if more remaps help at all. > > However, and before I get ahead of myself, with blk_stack_limits() we > > could have a (DM) device that is composed of 4K and 512b devices; with a > > resulting physical_block_size of 4K. But 4K wouldn't be atomic to the > > 512b disk. > > Yes, that's why I must assume only 512b atomic write. OK, fine by me so long as we impose chunk_size >= physical_block_size. > > But what if we were to add a checksum to the commit block? This could > > give us the ability to have a larger commit block regardless of the > > physical_block_size. [NOTE: I saw dm_multisnap_commit() is just writing > > a fixed CB_SIGNATURE] > > That would have to be cryptographic hash --- simple checksum can be > fooled. > > Even that wouldn't be correct, because if the hash fails, the commit block > is lost. If you wanted to use full commit blocks, you'd have to: > > - divide the commit block to two. > - write these two alternatively (so that at least one is valid) > - hash them or (which is simpler) copy sequence number to each 512b sector > (so that if some sectors get written and some not, you find it out by > having different sequence number). > > That is possible to do. Yes, Ric mentioned a comparable strategy was used for database transactions. 
> > And in speaking with Ric Wheeler, using a checksum in the commit block > > opens up the possibility for optimizing (reducing) the barrier ops > > associated with: > > 1) before the commit block is written (flushes journal transaction) > > 2) and after the commit block is written. > > No, you have to use barriers. If the data before the commit blocks is not > written and the commit block is written (with matching checksum), then the > data is corrupted. Fair enough, though it is used by ext4. But with ext4 it is used in conjunction with a checksummed journal. > Obviously, you can checksum all the data, but SHA1 is slow and it is being > phased out already and even slower SHA256 is being recommended... > > > Leaving us with only needing to barrier after the commit block is > > written. But this optimization apparently also requires having a > > checksummed journal. Ext4 offers this (somewhat controversial yet fast) > > capability with the 'journal_async_commit' mount option. [NOTE: I'm > > largely parroting what I heard from Ric] > > > > [NOTE: I couldn't immediately tell if dm_multisnap_commit() is doing > > multiple barriers when writing out the transaction and commit block] > > It calls dm_bufio_write_dirty_buffers twice and > dm_bufio_write_dirty_buffers submits a zero barrier. (there's no point in > submitting data-barrier, because that gets split into two zero barriers > and non-barrier write anyway) OK. > > Taking a step back, any reason you elected to not reuse existing kernel > > infrastructure (e.g. jbd2) for journaling? Custom solution needed for > > the log-nature of the multisnapshot? [Excuse my naive question(s), I > > see nilfs2 also has its own journaling... I'm just playing devil's > > advocate given how important it is that the multisnapshot journal code > > be correct] > > All the filesystems have their own journaling. jbd is used only by ext3, > jbd2 only by ext4. Reiserfs has its own, JFS has its own, XFS has its > own... etc. 
> > I consider the idea of sharing journaling code as inefficient: arguing > about the interface would take more time than writing it from scratch. Well, ocfs2 uses jbd2 too but I understand your point. > > Above provided, for the benefit of others, to give more context on the > > role of remap entries (and the commit block's remap array). > > If there were no remaps, change in any B-tree node would require to > overwrite all the nodes from the root. Similarly, changing any bitmap > would require to overwrite the bitmap directory from the root. > > With remaps, changes to B-tree nodes or bitmaps write just that one block > (and commit block, to store the remap). The full write from the root is > done later, when the remap table fills up. Again, I was asking about adding more remap entries in the remaps array if the commit block was increased from 512b to 4K. Mike
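To make the remap discussion concrete, here is a rough sketch of a commit-block remap table with the lookup-on-read behaviour described in the quoted header comment. It is an assumption-laden illustration, not the patch's code: the names remap_table, remap_lookup and remap_set are hypothetical, the bitmap_idx field of the real entries is omitted, and only the limit of 27 entries comes from the thread.

```c
#include <assert.h>
#include <stdint.h>

#define N_REMAPS 27	/* entries per 512b commit block, as quoted in the thread */

struct remap_entry {
	uint64_t old_chunk;
	uint64_t new_chunk;
};

struct remap_table {
	struct remap_entry e[N_REMAPS];
	unsigned n_used;
};

/* On read: return the remapped location if present, else the chunk itself. */
uint64_t remap_lookup(const struct remap_table *t, uint64_t chunk)
{
	unsigned i;

	assert(t->n_used <= N_REMAPS);
	for (i = 0; i < t->n_used; i++)
		if (t->e[i].old_chunk == chunk)
			return t->e[i].new_chunk;
	return chunk;
}

/*
 * Record that old_chunk now lives at new_chunk. An existing entry for the
 * same old chunk is overwritten, as in the (40, 41) then (40, 42) example.
 * Returns -1 when the table is full; the caller would then have to free
 * entries by writing the path from the root.
 */
int remap_set(struct remap_table *t, uint64_t old_chunk, uint64_t new_chunk)
{
	unsigned i;

	assert(t->n_used <= N_REMAPS);
	for (i = 0; i < t->n_used; i++) {
		if (t->e[i].old_chunk == old_chunk) {
			t->e[i].new_chunk = new_chunk;
			return 0;
		}
	}
	if (t->n_used == N_REMAPS)
		return -1;
	t->e[t->n_used].old_chunk = old_chunk;
	t->e[t->n_used].new_chunk = new_chunk;
	t->n_used++;
	return 0;
}
```

In this model a larger table only postpones the point at which remap_set fails and a full path write is needed; whether that is a net win is the open question in this exchange.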
* [PATCH 04/14] dm-multisnap-mikulas-alloc 2010-03-02 0:23 [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer ` (2 preceding siblings ...) 2010-03-02 0:23 ` [PATCH 03/14] dm-multisnap-mikulas-headers Mike Snitzer @ 2010-03-02 0:23 ` Mike Snitzer 2010-03-02 0:23 ` [PATCH 05/14] dm-multisnap-mikulas-blocks Mike Snitzer ` (11 subsequent siblings) 15 siblings, 0 replies; 22+ messages in thread From: Mike Snitzer @ 2010-03-02 0:23 UTC (permalink / raw) To: dm-devel; +Cc: Mikulas Patocka From: Mikulas Patocka <mpatocka@redhat.com> Management of allocation bitmaps. Creating the bitmaps. Allocating and freeing blocks. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- drivers/md/dm-multisnap-alloc.c | 590 +++++++++++++++++++++++++++++++++++++++ 1 files changed, 590 insertions(+), 0 deletions(-) create mode 100644 drivers/md/dm-multisnap-alloc.c diff --git a/drivers/md/dm-multisnap-alloc.c b/drivers/md/dm-multisnap-alloc.c new file mode 100644 index 0000000..02f89be --- /dev/null +++ b/drivers/md/dm-multisnap-alloc.c @@ -0,0 +1,590 @@ +/* + * Copyright (C) 2009 Red Hat Czech, s.r.o. + * + * Mikulas Patocka <mpatocka@redhat.com> + * + * This file is released under the GPL. + */ + +#include "dm-multisnap-mikulas.h" + +#define rshift_roundup(val, bits) (((val) + ((chunk_t)1 << (bits)) - 1) >> (bits)) + +#define BITS_PER_BYTE_SHIFT 3 +#define BYTES_PER_POINTER_SHIFT 3 + +/* + * Initialize the root bitmap, write it at the position "writing block". 
+ */ +void dm_multisnap_create_bitmaps(struct dm_exception_store *s, chunk_t *writing_block) +{ + struct dm_buffer *bp; + unsigned i; + void *bmp; + + while (dm_multisnap_is_commit_block(s, *writing_block)) + (*writing_block)++; + + if (*writing_block >= s->dev_size) { + DM_MULTISNAP_SET_ERROR(s->dm, -ENOSPC, + ("dm_multisnap_create_bitmaps: device is too small")); + return; + } + + if (*writing_block >= s->chunk_size << BITS_PER_BYTE_SHIFT) { + DM_MULTISNAP_SET_ERROR(s->dm, -ENOSPC, + ("dm_multisnap_create_bitmaps: invalid block to write: %llx", + (unsigned long long)*writing_block)); + return; + } + + bmp = dm_bufio_new(s->bufio, *writing_block, &bp); + if (IS_ERR(bmp)) { + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(bmp), + ("dm_multisnap_create_bitmaps: can't create direct bitmap block at %llx", + (unsigned long long)*writing_block)); + return; + } + cond_resched(); + memset(bmp, 0, s->chunk_size); + for (i = 0; i <= *writing_block; i++) { + generic___set_le_bit(i, bmp); + dm_multisnap_status_lock(s->dm); + s->total_allocated++; + dm_multisnap_status_unlock(s->dm); + cond_resched(); + } + s->bitmap_root = *writing_block; + s->bitmap_depth = 0; + + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + + (*writing_block)++; +} + +static void dm_multisnap_add_bitmap(struct dm_exception_store *s); + +/* + * Extend bitmaps to cover "new_size" area. + * + * While we extend bitmaps we increase s->dev_size so that the newly mapped + * space can be used to hold further bitmaps. 
+ */ +void dm_multisnap_extend_bitmaps(struct dm_exception_store *s, chunk_t new_size) +{ + while (s->dev_size < new_size) { + struct dm_buffer *bp; + void *bmp; + bitmap_t bitmap_no = s->dev_size >> (s->chunk_shift + BITS_PER_BYTE_SHIFT); + unsigned i = s->dev_size & ((1 << (s->chunk_shift + BITS_PER_BYTE_SHIFT)) - 1); + chunk_t c = s->dev_size; + if (!i) { + dm_multisnap_add_bitmap(s); + if (unlikely(dm_multisnap_has_error(s->dm))) + return; + } + bmp = dm_multisnap_map_bitmap(s, bitmap_no, &bp, NULL, NULL); + if (unlikely(!bmp)) + return; + for (; i < s->chunk_size << BITS_PER_BYTE_SHIFT; i++, c++) { + if (unlikely(dm_multisnap_is_commit_block(s, c))) + generic___set_le_bit(i, bmp); + else + generic___clear_le_bit(i, bmp); + } + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + + s->dev_size = ((chunk_t)bitmap_no + 1) << (s->chunk_shift + BITS_PER_BYTE_SHIFT); + if (s->dev_size > new_size) + s->dev_size = new_size; + } +} + +/* + * Add one bitmap after the last bitmap. A helper function for + * dm_multisnap_extend_bitmaps + */ +static void dm_multisnap_add_bitmap(struct dm_exception_store *s) +{ + struct path_element path[MAX_BITMAP_DEPTH]; + struct dm_buffer *bp; + int d; + __u64 *bmpp; + unsigned i; + chunk_t c, bitmap_blk, new_blk; + bitmap_t bitmap_no = s->dev_size >> (s->chunk_shift + BITS_PER_BYTE_SHIFT); + void *bmp = dm_multisnap_alloc_make_block(s, &bitmap_blk, &bp); + if (!bmp) + return; + c = (chunk_t)bitmap_no << (s->chunk_shift + BITS_PER_BYTE_SHIFT); + for (i = 0; i < s->chunk_size << BITS_PER_BYTE_SHIFT; i++, c++) { + if (unlikely(dm_multisnap_is_commit_block(s, c))) + generic___set_le_bit(i, bmp); + else + generic___clear_le_bit(i, bmp); + } + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + + /* just get the path to the last block */ + bmp = dm_multisnap_map_bitmap(s, bitmap_no - 1, &bp, NULL, path); + if (unlikely(!bmp)) + return; + dm_bufio_release(bp); + + for (d = s->bitmap_depth - 1; d >= 0; d--) { + if (path[d].idx + 1 < 
path[d].n_entries) { + bmpp = dm_multisnap_read_block(s, path[d].block, &bp); + if (!bmpp) + return; + bmpp[path[d].idx + 1] = cpu_to_le64(bitmap_blk); + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + return; + } else { + bmpp = dm_multisnap_alloc_make_block(s, &new_blk, &bp); + if (!bmpp) + return; + memset(bmpp, 0, s->chunk_size); + bmpp[0] = cpu_to_le64(bitmap_blk); + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + bitmap_blk = new_blk; + } + } + + /* make new root */ + bmpp = dm_multisnap_alloc_make_block(s, &new_blk, &bp); + if (!bmpp) + return; + memset(bmpp, 0, s->chunk_size); + bmpp[0] = cpu_to_le64(s->bitmap_root); + bmpp[1] = cpu_to_le64(bitmap_blk); + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + s->bitmap_root = new_blk; + s->bitmap_depth++; +} + +/* + * Read a leaf bitmap node with index "bitmap". + * Return the pointer to the data, store the held buffer to bl. + * Return the block in block and path in path. + */ +void *dm_multisnap_map_bitmap(struct dm_exception_store *s, bitmap_t bitmap, + struct dm_buffer **bp, chunk_t *block, struct path_element *path) +{ + __u64 *bmp; + unsigned idx; + unsigned d = s->bitmap_depth; + chunk_t blk = s->bitmap_root; + chunk_t parent = 0; + + while (1) { + bmp = dm_multisnap_read_block(s, blk, bp); + if (unlikely(!bmp)) { + /* error is already set in dm_multisnap_read_block */ + DMERR("dm_multisnap_map_bitmap: can't read bitmap at " + "%llx (%llx), pointed to by %llx (%llx), depth %d/%d, index %llx", + (unsigned long long)blk, + (unsigned long long)dm_multisnap_remap_block(s, blk), + (unsigned long long)parent, + (unsigned long long)dm_multisnap_remap_block(s, parent), + s->bitmap_depth - d, + s->bitmap_depth, + (unsigned long long)bitmap); + return NULL; + } + if (!d) { + if (block) + *block = blk; + return bmp; + } + + idx = (bitmap >> ((d - 1) * (s->chunk_shift - BYTES_PER_POINTER_SHIFT))) & + ((s->chunk_size - 1) >> BYTES_PER_POINTER_SHIFT); + + if (unlikely(path != NULL)) { + 
path[s->bitmap_depth - d].block = blk; + path[s->bitmap_depth - d].idx = idx; + path[s->bitmap_depth - d].n_entries = s->chunk_size >> BYTES_PER_POINTER_SHIFT; + } + + parent = blk; + blk = le64_to_cpu(bmp[idx]); + + dm_bufio_release(*bp); + + d--; + } +} + +/* + * Find a free bit from "start" to "end" (in bits). + * If wide_search is nonzero, search for the whole free byte first. + */ +static int find_bit(const void *bmp, unsigned start, unsigned end, int wide_search) +{ + const void *p; + unsigned bit; + if (unlikely(start >= end)) + return -ENOSPC; + if (likely(!generic_test_le_bit(start, bmp))) + return start; + cond_resched(); + if (likely(wide_search)) { + p = memchr((const char *)bmp + (start >> 3), 0, (end >> 3) - (start >> 3)); + cond_resched(); + if (p) { + bit = (((const __u8 *)p - (const __u8 *)bmp) << 3) | (start & 7); + while (bit > start && !generic_test_le_bit(bit - 1, bmp)) + bit--; + goto ret_bit; + } + } + bit = generic_find_next_zero_le_bit(bmp, end, start); + cond_resched(); + +ret_bit: + if (unlikely(bit >= end)) + return -ENOSPC; + return bit; +} + +/* + * Find the bitmap limit in bits. + * + * All the bitmaps hold s->chunk_size << BITS_PER_BYTE_SHIFT bits, except + * the last one where we must use s->dev_size modulo number of bits in bitmap + * to find the valid number of bits. Note that bits past s->dev_size are + * undefined, there can be anything, so we must not scan past this limit. + */ +static unsigned bitmap_limit(struct dm_exception_store *s, bitmap_t bmp) +{ + if (bmp == (bitmap_t)(s->dev_size >> (s->chunk_shift + BITS_PER_BYTE_SHIFT))) + return (unsigned)s->dev_size & ((s->chunk_size << BITS_PER_BYTE_SHIFT) - 1); + return s->chunk_size << BITS_PER_BYTE_SHIFT; +} + +/* + * The central allocation routine. + * + * Allocation strategy: + * + * We maintain s->alloc_rover to point past the last allocated blocks (and wraps + * around back to 0 if it would point past the device end). 
+ * + * We attempt to allocate at the rover and increment the rover, this will + * minimize seek times for writes. I assume that this snapshot driver will be + * mostly loaded with write requests (it happens when writing to the origin), + * so I attempt to optimize for writes. + * + * If there is no space at the rover, find the whole free byte (8 bits) in the + * current chunk. If there is not any free byte, we search the individual bits. + * If we don't find any bit, continue with the next bitmap. If we scan the whole + * device linearly and still don't find anything, abort with a failure. + * + * This is similar to what ext[23] does, so I suppose it is tuned well enough + * that it won't fragment too much. + */ +int dm_multisnap_alloc_blocks(struct dm_exception_store *s, chunk_t *results, + unsigned n_blocks, int flags) +{ + void *bmp; + struct dm_buffer *bp; + chunk_t block; + int wrap_around = 0; + int start_bit; + int wide_search; + int i; + bitmap_t bitmap_no; + int c; + int bit; + chunk_t to_free = 0; + + bitmap_no = s->alloc_rover >> (s->chunk_shift + BITS_PER_BYTE_SHIFT); +next_bitmap: + bmp = dm_multisnap_map_bitmap(s, bitmap_no, &bp, &block, NULL); + if (unlikely(!bmp)) + return -1; + + wide_search = 1; +find_again: + start_bit = s->alloc_rover & ((s->chunk_size << BITS_PER_BYTE_SHIFT) - 1); + + for (i = 0; i < n_blocks; i++) { +find_another_bit: + bit = find_bit(bmp, start_bit, bitmap_limit(s, bitmap_no), wide_search); + if (unlikely(bit < 0)) { +bit_find_failed: + if (wide_search) { + wide_search = 0; + goto find_again; + } + dm_bufio_release(bp); + s->alloc_rover = (chunk_t) ++bitmap_no << (s->chunk_shift + BITS_PER_BYTE_SHIFT); + if (unlikely(s->alloc_rover >= s->dev_size)) { + s->alloc_rover = 0; + bitmap_no = 0; + wrap_around++; + if (wrap_around >= 2) { + DM_MULTISNAP_SET_ERROR(s->dm, -ENOSPC, ("snapshot overflow")); + return -1; + } + } + goto next_bitmap; + } + results[i] = ((chunk_t)bitmap_no << (s->chunk_shift + BITS_PER_BYTE_SHIFT)) | bit; + 
start_bit = bit + 1; + dm_bufio_release(bp); + + c = dm_multisnap_check_allocated_block(s, results[i]); + if (dm_multisnap_has_error(s->dm)) + return -1; + + bmp = dm_multisnap_read_block(s, block, &bp); + if (unlikely(!bmp)) + return -1; + + if (unlikely(c)) + goto find_another_bit; + } + + if (flags & ALLOC_DRY) + goto bp_release_return; + + if (!dm_multisnap_block_is_uncommitted(s, block)) { + chunk_t new_block; +find_another_bit_for_bitmap: + bit = find_bit(bmp, start_bit, bitmap_limit(s, bitmap_no), wide_search); + if (unlikely(bit < 0)) + goto bit_find_failed; + + new_block = ((chunk_t)bitmap_no << (s->chunk_shift + BITS_PER_BYTE_SHIFT)) | bit; + start_bit = bit + 1; + + dm_bufio_release(bp); + c = dm_multisnap_check_allocated_block(s, new_block); + if (dm_multisnap_has_error(s->dm)) + return -1; + + bmp = dm_multisnap_read_block(s, block, &bp); + if (unlikely(!bmp)) + return -1; + + if (unlikely(c)) + goto find_another_bit_for_bitmap; + + /* + * Warning: record the address of a block to free in a special + * variable. + * + * If we freed it here, that could recurse back to + * dm_multisnap_alloc_blocks and corrupt allocations. Free it + * later when we are done with the allocation and all the + * allocated blocks are marked in the bitmap. 
+ */ + bmp = dm_multisnap_duplicate_block(s, block, new_block, bitmap_no, &bp, &to_free); + if (unlikely(!bmp)) + return -1; + + generic___set_le_bit(bit, bmp); + dm_multisnap_status_lock(s->dm); + s->total_allocated++; + dm_multisnap_status_unlock(s->dm); + } + + for (i = 0; i < n_blocks; i++) + generic___set_le_bit(results[i] & ((s->chunk_size << BITS_PER_BYTE_SHIFT) - 1), bmp); + dm_multisnap_status_lock(s->dm); + s->total_allocated += n_blocks; + dm_multisnap_status_unlock(s->dm); + + dm_bufio_mark_buffer_dirty(bp); + +bp_release_return: + dm_bufio_release(bp); + + s->alloc_rover = (s->alloc_rover & ~(chunk_t)((s->chunk_size << BITS_PER_BYTE_SHIFT) - 1)) + start_bit; + if (unlikely(s->alloc_rover >= s->dev_size)) + s->alloc_rover = 0; + + if (unlikely(to_free != 0)) + dm_multisnap_free_block(s, to_free, 0); + + return 0; +} + +/* + * This function gets a valid block number (block), buffer for this block (bp) + * and data in this buffer (ptr) and returns new writeable block. It possibly + * moves the data to another buffer and updates *bp. + * + * Note that to maintain log-structured storage, we must not write to the block, + * but we must allocate new block and copy the data there. + * + * The only case where we can write to the provided block directly is if the + * block was created since last commit. + */ + +void *dm_multisnap_alloc_duplicate_block(struct dm_exception_store *s, chunk_t block, + struct dm_buffer **bp, void *ptr) +{ + int r; + chunk_t new_chunk; + void *data; + + if (dm_multisnap_block_is_uncommitted(s, block)) + return ptr; + + dm_bufio_release(*bp); + + r = dm_multisnap_alloc_blocks(s, &new_chunk, 1, 0); + if (r) + return NULL; + + data = dm_multisnap_read_block(s, block, bp); + if (!data) + return NULL; + + return dm_multisnap_duplicate_block(s, block, new_chunk, + CB_BITMAP_IDX_NONE, bp, NULL); +} + +/* + * Allocate a new block and return its data. Return the block number in *result + * and buffer pointer in *bp. 
+ */ +void *dm_multisnap_alloc_make_block(struct dm_exception_store *s, chunk_t *result, + struct dm_buffer **bp) +{ + int r = dm_multisnap_alloc_blocks(s, result, 1, 0); + if (unlikely(r < 0)) + return NULL; + + return dm_multisnap_make_block(s, *result, bp); +} + +/* + * Free the block immediately. You must be careful with this function because + * it doesn't follow log-structured protocol. + * + * It may be used only if + * - the blocks to free were allocated since last transactions. + * - or from freelist management, which means the blocks were already recorded in + * a freelist (thus it would be freed again in case of machine crash). + */ +void dm_multisnap_free_blocks_immediate(struct dm_exception_store *s, chunk_t block, + unsigned n_blocks) +{ + void *bmp; + struct dm_buffer *bp; + + if (!n_blocks) + return; + + if (unlikely(block + n_blocks > s->dev_size)) { + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_free_block_immediate: freeing invalid blocks %llx, %x", + (unsigned long long)block, n_blocks)); + return; + } + + if (block + n_blocks == s->alloc_rover) + s->alloc_rover = block; + + do { + bitmap_t bitmap_no = block >> (s->chunk_shift + BITS_PER_BYTE_SHIFT); + + bmp = dm_multisnap_map_bitmap(s, bitmap_no, &bp, NULL, NULL); + if (!bmp) + return; + + do { + generic___clear_le_bit(block & ((s->chunk_size << BITS_PER_BYTE_SHIFT) - 1), bmp); + dm_multisnap_status_lock(s->dm); + s->total_allocated--; + dm_multisnap_status_unlock(s->dm); + n_blocks--; + block++; + cond_resched(); + } while (n_blocks && (block & ((s->chunk_size << BITS_PER_BYTE_SHIFT) - 1))); + + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + } while (unlikely(n_blocks != 0)); +} + +/* + * Flush tmp_remaps for bitmaps. Write the path from modified bitmaps to the + * root. 
+ */ +void dm_multisnap_bitmap_finalize_tmp_remap(struct dm_exception_store *s, struct tmp_remap *tmp_remap) +{ + chunk_t block; + struct dm_buffer *bp; + __u64 *new_block; + struct path_element path[MAX_BITMAP_DEPTH]; + int results_ptr; + + chunk_t new_blockn; + int i; + + /* + * Preallocate twice the required amount of blocks, so that resolving + * the next tmp_remap (created here, in dm_multisnap_alloc_blocks) + * doesn't have to allocate anything. + */ + if (s->n_preallocated_blocks < s->bitmap_depth) { + if (unlikely(dm_multisnap_alloc_blocks(s, s->preallocated_blocks + s->n_preallocated_blocks, + s->bitmap_depth * 2 - s->n_preallocated_blocks, 0) < 0)) + return; + s->n_preallocated_blocks = s->bitmap_depth * 2; + } + results_ptr = 0; + + new_block = dm_multisnap_map_bitmap(s, tmp_remap->bitmap_idx, &bp, &block, path); + if (unlikely(!new_block)) + return; + + dm_bufio_release(bp); + + new_blockn = tmp_remap->new; + for (i = s->bitmap_depth - 1; i >= 0; i--) { + chunk_t block_to_free; + int remapped = 0; + __u64 *bmp = dm_multisnap_read_block(s, path[i].block, &bp); + if (unlikely(!bmp)) + return; + + if (!dm_multisnap_block_is_uncommitted(s, path[i].block)) { + remapped = 1; + dm_bufio_release_move(bp, s->preallocated_blocks[results_ptr]); + bmp = dm_multisnap_read_block(s, s->preallocated_blocks[results_ptr], &bp); + if (!bmp) + return; + dm_multisnap_block_set_uncommitted(s, s->preallocated_blocks[results_ptr]); + } + + block_to_free = le64_to_cpu(bmp[path[i].idx]); + bmp[path[i].idx] = cpu_to_le64(new_blockn); + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + + dm_multisnap_free_block(s, block_to_free, 0); + + if (!remapped) + goto skip_it; + new_blockn = s->preallocated_blocks[results_ptr]; + results_ptr++; + } + + dm_multisnap_free_block(s, s->bitmap_root, 0); + s->bitmap_root = new_blockn; + +skip_it: + memmove(s->preallocated_blocks, s->preallocated_blocks + results_ptr, + (s->n_preallocated_blocks -= results_ptr) * sizeof(chunk_t)); +} -- 
1.6.5.2
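For readers following the allocator in patch 04, the two-phase search done by find_bit() (try for a wholly free byte first, then fall back to single bits) can be sketched in userspace C as below. This is only a simplified illustration under stated assumptions: test_le_bit() and find_free_bit() are hypothetical names, not functions from the patch, the bitmap is a plain byte array with little-endian bit order, and the sketch returns -1 where the patch returns -ENOSPC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's little-endian bit test. */
static int test_le_bit(unsigned bit, const unsigned char *bmp)
{
	return (bmp[bit >> 3] >> (bit & 7)) & 1;
}

/*
 * Return the first free (zero) bit found in [start, end), or -1 if none.
 * With wide_search set, a completely free byte is preferred over an
 * isolated free bit, which tends to keep allocations contiguous.
 */
static int find_free_bit(const unsigned char *bmp, unsigned start,
			 unsigned end, int wide_search)
{
	unsigned bit;

	assert(bmp != NULL);
	if (start >= end)
		return -1;
	/* Fast path: the bit at the rover position is already free. */
	if (!test_le_bit(start, bmp))
		return (int)start;

	if (wide_search) {
		/* Look for a byte with all eight bits clear. */
		const unsigned char *p = memchr(bmp + (start >> 3), 0,
						(end >> 3) - (start >> 3));
		if (p) {
			bit = (unsigned)(p - bmp) << 3;
			/* Step back over free bits just before that byte. */
			while (bit > start && !test_le_bit(bit - 1, bmp))
				bit--;
			return (int)bit;
		}
	}

	/* Fallback: plain linear scan for any free bit. */
	for (bit = start; bit < end; bit++)
		if (!test_le_bit(bit, bmp))
			return (int)bit;
	return -1;
}
```

A caller in the style of dm_multisnap_alloc_blocks would first call this with wide_search set and retry with wide_search cleared before moving on to the next bitmap.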
* [PATCH 05/14] dm-multisnap-mikulas-blocks 2010-03-02 0:23 [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer ` (3 preceding siblings ...) 2010-03-02 0:23 ` [PATCH 04/14] dm-multisnap-mikulas-alloc Mike Snitzer @ 2010-03-02 0:23 ` Mike Snitzer 2010-03-02 0:23 ` [PATCH 06/14] dm-multisnap-mikulas-btree Mike Snitzer ` (10 subsequent siblings) 15 siblings, 0 replies; 22+ messages in thread From: Mike Snitzer @ 2010-03-02 0:23 UTC (permalink / raw) To: dm-devel; +Cc: Mikulas Patocka From: Mikulas Patocka <mpatocka@redhat.com> Common operations with blocks. Management of tmp_remap array and some helper functions. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- drivers/md/dm-multisnap-blocks.c | 333 ++++++++++++++++++++++++++++++++++++++ 1 files changed, 333 insertions(+), 0 deletions(-) create mode 100644 drivers/md/dm-multisnap-blocks.c diff --git a/drivers/md/dm-multisnap-blocks.c b/drivers/md/dm-multisnap-blocks.c new file mode 100644 index 0000000..8715ed9 --- /dev/null +++ b/drivers/md/dm-multisnap-blocks.c @@ -0,0 +1,333 @@ +/* + * Copyright (C) 2009 Red Hat Czech, s.r.o. + * + * Mikulas Patocka <mpatocka@redhat.com> + * + * This file is released under the GPL. + */ + +#include "dm-multisnap-mikulas.h" + +/* + * Check that the block is valid. 
+ */ +static int check_invalid(struct dm_exception_store *s, chunk_t block) +{ + if (unlikely(block >= s->dev_size) || + unlikely(block == SB_BLOCK) || + unlikely(dm_multisnap_is_commit_block(s, block))) { + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("check_invalid: access to invalid part of the device: %llx, size %llx", + (unsigned long long)block, (unsigned long long)s->dev_size)); + return 1; + } + return 0; +} + +static struct tmp_remap *find_tmp_remap(struct dm_exception_store *s, chunk_t block) +{ + struct tmp_remap *t; + struct hlist_node *hn; + unsigned hash = TMP_REMAP_HASH(block); + hlist_for_each_entry(t, hn, &s->tmp_remap[hash], hash_list) { + if (t->old == block) + return t; + cond_resched(); + } + return NULL; +} + +/* + * Remap a block number according to tmp_remap table. + */ +chunk_t dm_multisnap_remap_block(struct dm_exception_store *s, chunk_t block) +{ + struct tmp_remap *t; + t = find_tmp_remap(s, block); + if (t) + return t->new; + return block; +} + +/* + * Read a metadata block, return pointer to the data and hold a buffer for that + * block. + * + * Do a possible block remapping according to tmp_remap table. + */ +void *dm_multisnap_read_block(struct dm_exception_store *s, chunk_t block, + struct dm_buffer **bp) +{ + void *buf; + cond_resched(); + + if (unlikely(check_invalid(s, block))) + return NULL; + + block = dm_multisnap_remap_block(s, block); + + if (unlikely(check_invalid(s, block))) + return NULL; + + buf = dm_bufio_read(s->bufio, block, bp); + if (unlikely(IS_ERR(buf))) { + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(buf), + ("dm_multisnap_read_block: error read chunk %llx", + (unsigned long long)block)); + return NULL; + } + return buf; +} + +struct uncommitted_record { + struct hlist_node hash; + chunk_t block; +}; + +/* + * Check if the block is not yet committed. + * + * If this function returns 1, the block is surely uncommitted. + * If it returns 0, the block may be committed or may be uncommitted.
+ * This function is used for optimizations; if it returns 0 + * it doesn't break correctness, it only degrades performance. + */ +int dm_multisnap_block_is_uncommitted(struct dm_exception_store *s, chunk_t block) +{ + struct tmp_remap *t; + + struct uncommitted_record *ur; + struct hlist_node *hn; + + check_invalid(s, block); + t = find_tmp_remap(s, block); + if (t) { + if (t->uncommitted) + return 1; + block = t->new; + } + hlist_for_each_entry(ur, hn, &s->uncommitted_blocks[UNCOMMITTED_BLOCK_HASH(block)], hash) + if (ur->block == block) + return 1; + return 0; +} + +/* + * Set the given block as uncommitted. + * + * The allocation may fail, in this case we see only a performance degradation + * (the block will be copied again), there is no functionality loss. + * + * We can't use non-failing allocation because it could deadlock (wait for some + * pages being written and that write could be directed through this driver). + */ +void dm_multisnap_block_set_uncommitted(struct dm_exception_store *s, chunk_t block) +{ + struct uncommitted_record *ur; + /* + * GFP_ATOMIC allows exhausting reserves. We don't want it (we can + * afford failure), so we use GFP_NOWAIT. + * __GFP_NOWARN suppresses the log message on failure. + * __GFP_NOMEMALLOC makes it less aggressive if the allocator recurses + * into itself. + */ + ur = kmalloc(sizeof(struct uncommitted_record), + GFP_NOWAIT | __GFP_NOWARN | __GFP_NOMEMALLOC); + if (!ur) + return; + ur->block = block; + hlist_add_head(&ur->hash, &s->uncommitted_blocks[UNCOMMITTED_BLOCK_HASH(block)]); +} + +/* + * Clear the register of uncommitted blocks. This is called on commit and + * on unload.
+ */ +void dm_multisnap_clear_uncommitted(struct dm_exception_store *s) +{ + int i; + for (i = 0; i < UNCOMMITTED_BLOCK_HASH_SIZE; i++) { + struct hlist_head *h = &s->uncommitted_blocks[i]; + while (!hlist_empty(h)) { + struct uncommitted_record *ur = + hlist_entry(h->first, struct uncommitted_record, hash); + hlist_del(&ur->hash); + kfree(ur); + } + } +} + +/* + * This function is called by the allocation code when it needs to modify a + * committed block. + * + * It will create a new remap for old_chunk->new_chunk. + * bitmap_idx is the index of bitmap if we are remapping bitmap, otherwise + * CB_BITMAP_IDX_NONE. + * + * *bp must be an open buffer for old_chunk. New buffer for new_chunk is returned + * there. + * + * A block that needs to be freed is returned in *to_free_ptr. If to_free_ptr is + * NULL, that block is freed immediately. + */ +void *dm_multisnap_duplicate_block(struct dm_exception_store *s, chunk_t old_chunk, + chunk_t new_chunk, bitmap_t bitmap_idx, + struct dm_buffer **bp, chunk_t *to_free_ptr) +{ + chunk_t to_free_val; + void *buf; + struct tmp_remap *t; + + if (unlikely(check_invalid(s, old_chunk)) || + unlikely(check_invalid(s, new_chunk))) + return NULL; + + if (!to_free_ptr) + to_free_ptr = &to_free_val; + *to_free_ptr = 0; + + t = find_tmp_remap(s, old_chunk); + if (t) { + if (unlikely(t->bitmap_idx != bitmap_idx)) { + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_duplicate_block: bitmap_idx doesn't match, %X != %X", + t->bitmap_idx, bitmap_idx)); + return NULL; + } + *to_free_ptr = t->new; + t->new = new_chunk; + } else { + if (unlikely(list_empty(&s->free_tmp_remaps))) { + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_duplicate_block: all remap blocks used")); + return NULL; + } + t = list_first_entry(&s->free_tmp_remaps, struct tmp_remap, list); + t->new = new_chunk; + t->old = old_chunk; + t->bitmap_idx = bitmap_idx; + hlist_add_head(&t->hash_list, &s->tmp_remap[TMP_REMAP_HASH(old_chunk)]); + s->n_used_tmp_remaps++; + } +
list_del(&t->list); + if (bitmap_idx == CB_BITMAP_IDX_NONE) + list_add_tail(&t->list, &s->used_bt_tmp_remaps); + else + list_add_tail(&t->list, &s->used_bitmap_tmp_remaps); + t->uncommitted = 1; + dm_bufio_release_move(*bp, new_chunk); + + if (to_free_ptr == &to_free_val && to_free_val) + dm_multisnap_free_block(s, to_free_val, 0); + + buf = dm_bufio_read(s->bufio, new_chunk, bp); + if (IS_ERR(buf)) { + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(buf), + ("dm_multisnap_duplicate_block: error reading chunk %llx", + (unsigned long long)new_chunk)); + return NULL; + } + return buf; +} + +/* + * Remove an entry from tmp_remap table. + */ +void dm_multisnap_free_tmp_remap(struct dm_exception_store *s, struct tmp_remap *t) +{ + list_del(&t->list); + hlist_del(&t->hash_list); + s->n_used_tmp_remaps--; + list_add(&t->list, &s->free_tmp_remaps); +} + +/* + * Get a new block. Just a wrapper around dm_bufio_new. + * It is expected that the caller fills all the data in the block, calls + * dm_bufio_mark_buffer_dirty and releases the buffer. + */ +void *dm_multisnap_make_block(struct dm_exception_store *s, chunk_t new_chunk, + struct dm_buffer **bp) +{ + void *buf; + + if (unlikely(check_invalid(s, new_chunk))) + return NULL; + + dm_multisnap_block_set_uncommitted(s, new_chunk); + + buf = dm_bufio_new(s->bufio, new_chunk, bp); + if (unlikely(IS_ERR(buf))) { + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(buf), + ("dm_multisnap_make_block: error creating new block at chunk %llx", + (unsigned long long)new_chunk)); + return NULL; + } + return buf; +} + +/* + * Free the given block and a possible tmp_remap shadow of it. 
+ */ +void dm_multisnap_free_block_and_duplicates(struct dm_exception_store *s, chunk_t block) +{ + struct tmp_remap *t; + + if (unlikely(check_invalid(s, block))) + return; + + t = find_tmp_remap(s, block); + if (t) { + dm_multisnap_free_block(s, t->new, 0); + dm_multisnap_free_tmp_remap(s, t); + } + dm_multisnap_free_block(s, block, 0); +} + +/* + * Return true if the block is a commit block. + */ +int dm_multisnap_is_commit_block(struct dm_exception_store *s, chunk_t block) +{ + if (unlikely(block < FIRST_CB_BLOCK)) + return 0; + /* + * Division is very slow, thus we optimize the most common case + * where cb_stride is a power of 2. + */ + if (likely(!(s->cb_stride & (s->cb_stride - 1)))) + return (block & (s->cb_stride - 1)) == (FIRST_CB_BLOCK & (s->cb_stride - 1)); + else + return sector_div(block, s->cb_stride) == FIRST_CB_BLOCK % s->cb_stride; +} + +/* + * These two functions are used to avoid cycling on a corrupted device. + * + * If the data on the device is corrupted, we mark the device as erroneous, + * but we don't want to lock up the whole system. These functions help to achieve + * this goal. + * + * cy->count is the number of processed blocks. + * cy->key is the recorded block at last power-of-two count. + */ +void dm_multisnap_init_stop_cycles(struct stop_cycles *cy) +{ + cy->key = 0; + cy->count = 0; +} + +int dm_multisnap_stop_cycles(struct dm_exception_store *s, struct stop_cycles *cy, chunk_t key) +{ + if (unlikely(cy->key == key) && unlikely(cy->count != 0)) { + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_stop_cycles: cycle detected at chunk %llx", + (unsigned long long)key)); + return -1; + } + cy->count++; + if (!((cy->count - 1) & cy->count)) + cy->key = key; + return 0; +} -- 1.6.5.2
* [PATCH 06/14] dm-multisnap-mikulas-btree 2010-03-02 0:23 [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer ` (4 preceding siblings ...) 2010-03-02 0:23 ` [PATCH 05/14] dm-multisnap-mikulas-blocks Mike Snitzer @ 2010-03-02 0:23 ` Mike Snitzer 2010-03-02 0:23 ` [PATCH 07/14] dm-multisnap-mikulas-commit Mike Snitzer ` (9 subsequent siblings) 15 siblings, 0 replies; 22+ messages in thread From: Mike Snitzer @ 2010-03-02 0:23 UTC (permalink / raw) To: dm-devel; +Cc: Mikulas Patocka From: Mikulas Patocka <mpatocka@redhat.com> B+tree operations. Adding and deleting entries and walking the whole b+tree. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- drivers/md/dm-multisnap-btree.c | 838 +++++++++++++++++++++++++++++++++++++++ 1 files changed, 838 insertions(+), 0 deletions(-) create mode 100644 drivers/md/dm-multisnap-btree.c diff --git a/drivers/md/dm-multisnap-btree.c b/drivers/md/dm-multisnap-btree.c new file mode 100644 index 0000000..a7e3b60 --- /dev/null +++ b/drivers/md/dm-multisnap-btree.c @@ -0,0 +1,838 @@ +/* + * Copyright (C) 2009 Red Hat Czech, s.r.o. + * + * Mikulas Patocka <mpatocka@redhat.com> + * + * This file is released under the GPL. + */ + +#include "dm-multisnap-mikulas.h" + +/* + * Read one btree node and do basic consistency checks. + * Any btree access should be done with this function. 
+ */ +static struct dm_multisnap_bt_node * +dm_multisnap_read_btnode(struct dm_exception_store *s, int depth, + chunk_t block, unsigned want_entries, struct dm_buffer **bp) +{ + struct dm_multisnap_bt_node *node; + + BUG_ON((unsigned)depth >= s->bt_depth); + + node = dm_multisnap_read_block(s, block, bp); + if (unlikely(!node)) + return NULL; + + if (unlikely(node->signature != BT_SIGNATURE)) { + dm_bufio_release(*bp); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_read_btnode: bad signature on btree node %llx", + (unsigned long long)block)); + return NULL; + } + + if (unlikely((unsigned)(le32_to_cpu(node->n_entries) - 1) >= s->btree_entries) || + (want_entries && unlikely(le32_to_cpu(node->n_entries) != want_entries))) { + dm_bufio_release(*bp); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_read_btnode: bad number of entries in btree node " + "%llx: %x, wanted %x", + (unsigned long long)block, + le32_to_cpu(node->n_entries), + want_entries)); + return NULL; + } + + return node; +} + +/* + * Write orig_chunk entry. + * + * If we are compiled with 32-bit chunk_t, we must write the special fields + * with bits 32-47 set, so that the store could be read on a system with + * 64-bit chunk_t. + */ +static void write_orig_chunk(struct dm_multisnap_bt_entry *be, chunk_t n) +{ + write_48(be, orig_chunk, n); + if (sizeof(chunk_t) == 4 && unlikely(n > DM_CHUNK_T_MAX)) + be->orig_chunk2 = cpu_to_le16(0xffff); +} + +/* + * Add an entry (key, new_chunk) at an appropriate index to the btree node. 
+ * Move the existing entries + */ +static void add_at_idx(struct dm_multisnap_bt_node *node, unsigned index, + struct bt_key *key, chunk_t new_chunk) +{ + memmove(&node->entries[index + 1], &node->entries[index], + (le32_to_cpu(node->n_entries) - index) * sizeof(struct dm_multisnap_bt_entry)); + write_orig_chunk(&node->entries[index], key->chunk); + write_48(&node->entries[index], new_chunk, new_chunk); + node->entries[index].snap_from = cpu_to_mikulas_snapid(key->snap_from); + node->entries[index].snap_to = cpu_to_mikulas_snapid(key->snap_to); + node->entries[index].flags = cpu_to_le32(0); + node->n_entries = cpu_to_le32(le32_to_cpu(node->n_entries) + 1); +} + +/* + * Create an initial btree. + * (*writing_block) is updated to point after the btree. + */ +void dm_multisnap_create_btree(struct dm_exception_store *s, chunk_t *writing_block) +{ + struct dm_buffer *bp; + struct dm_multisnap_bt_node *node; + struct bt_key new_key; + + while (dm_multisnap_is_commit_block(s, *writing_block)) + (*writing_block)++; + + if (*writing_block >= s->dev_size) { + DM_MULTISNAP_SET_ERROR(s->dm, -ENOSPC, + ("dm_multisnap_create_btree: device is too small")); + return; + } + + node = dm_bufio_new(s->bufio, *writing_block, &bp); + if (IS_ERR(node)) { + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(node), + ("dm_multisnap_create_btree: 't create direct bitmap block at %llx", + (unsigned long long)*writing_block)); + return; + } + memset(node, 0, s->chunk_size); + node->signature = BT_SIGNATURE; + node->n_entries = cpu_to_le32(0); + + /* + * A btree node must have at least one entry --- so create this empty + * one + */ + new_key.snap_from = new_key.snap_to = DM_SNAPID_T_LAST; + new_key.chunk = DM_CHUNK_T_LAST; + add_at_idx(node, 0, &new_key, 0); + + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + s->bt_root = *writing_block; + s->bt_depth = 1; + (*writing_block)++; +} + +/* + * Compare btree entry and a search key. 
Returns: + * -1: the entry is lower than the key + * 1: the entry is higher than the key + * 0: the entry matches the key (both entry and key have ranges, a match + * is returned when the ranges overlap) + */ +static int compare_key(struct dm_multisnap_bt_entry *e, struct bt_key *key) +{ + chunk_t orig_chunk = read_48(e, orig_chunk); + if (orig_chunk < key->chunk) + return -1; + if (orig_chunk > key->chunk) + return 1; + + if (mikulas_snapid_to_cpu(e->snap_to) < key->snap_from) + return -1; + if (mikulas_snapid_to_cpu(e->snap_from) > key->snap_to) + return 1; + + return 0; +} + +/* + * Perform binary search on the btree node. + * Returns: 1 - found, 0 - not found + * *result - if found, then the first entry in the requested range + * - if not found, then the first entry after the requested range + */ +static int binary_search(struct dm_multisnap_bt_node *node, struct bt_key *key, + unsigned *result) +{ + int c; + int first = 0; + int last = le32_to_cpu(node->n_entries) - 1; + + while (1) { + int middle = (first + last) >> 1; + struct dm_multisnap_bt_entry *e = &node->entries[middle]; + + c = compare_key(e, key); + + if (first == last) + break; + + if (c < 0) + first = middle + 1; + else + last = middle; + + cond_resched(); + } + + *result = first; + return !c; +} + +/* + * Find a given key in the btree. + * + * Returns: 1 - found, 0 - not found, -1 - error + * In case of not error (0 or 1 is returned), the node and held buffer for + * this node is returned (the buffer must be released with + * dm_bufio_release). Also, path with s->bt_depth entries is returned. + */ +static int walk_btree(struct dm_exception_store *s, struct bt_key *key, + struct dm_multisnap_bt_node **nodep, struct dm_buffer **bp, + struct path_element path[MAX_BT_DEPTH]) +{ +#define node (*nodep) + int r; + chunk_t block = s->bt_root; + unsigned d = 0; + + /* + * These four are purely to check tree consistency. + * They could be commented out. But it's safer to leave them there. 
+ */ + chunk_t want_last_chunk = DM_CHUNK_T_LAST; + mikulas_snapid_t want_last_snapid = DM_SNAPID_T_LAST; + chunk_t last_chunk; + mikulas_snapid_t last_snapid; + + while (1) { + path[d].block = block; + node = dm_multisnap_read_btnode(s, d, block, 0, bp); + if (!node) + return -1; + path[d].n_entries = le32_to_cpu(node->n_entries); + + /* Check consistency (can be commented out) */ + last_chunk = read_48(&node->entries[path[d].n_entries - 1], orig_chunk); + last_snapid = mikulas_snapid_to_cpu(node->entries[path[d].n_entries - 1].snap_to); + if (unlikely(last_chunk != want_last_chunk) || + unlikely(last_snapid != want_last_snapid)) { + dm_bufio_release(*bp); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("walk_btree: invalid last entry in node %llx/%llx: " + "last_chunk %llx, want_last_chunk %llx, last_snapid: %llx, " + "want_last_snapid: %llx, searching for %llx, %llx-%llx", + (unsigned long long)block, + (unsigned long long)dm_multisnap_remap_block(s, block), + (unsigned long long)last_chunk, + (unsigned long long)want_last_chunk, + (unsigned long long)last_snapid, + (unsigned long long)want_last_snapid, + (unsigned long long)key->chunk, + (unsigned long long)key->snap_from, + (unsigned long long)key->snap_to)); + return -1; + } + + r = binary_search(node, key, &path[d].idx); + + want_last_chunk = read_48(&node->entries[path[d].idx], orig_chunk); + want_last_snapid = mikulas_snapid_to_cpu(node->entries[path[d].idx].snap_to); + + block = read_48(&node->entries[path[d].idx], new_chunk); + if (++d == s->bt_depth) + break; + dm_bufio_release(*bp); + } + if (unlikely(compare_key(&node->entries[path[s->bt_depth - 1].idx], key) < 0)) + path[s->bt_depth - 1].idx++; + return r; +#undef node +} + +/* + * Find a given key in the btree. + * + * Returns: 1 - found, 0 - not found, -1 - error + * In case the node is found, key contains updated key and result contains + * the resulting chunk. 
+ */ +int dm_multisnap_find_in_btree(struct dm_exception_store *s, struct bt_key *key, + chunk_t *result) +{ + struct dm_multisnap_bt_node *node; + struct path_element path[MAX_BT_DEPTH]; + struct dm_buffer *bp; + + int r = walk_btree(s, key, &node, &bp, path); + if (unlikely(r < 0)) + return r; + + if (r) { + struct dm_multisnap_bt_entry *entry = &node->entries[path[s->bt_depth - 1].idx]; + *result = read_48(entry, new_chunk); + key->chunk = read_48(entry, orig_chunk); + key->snap_from = mikulas_snapid_to_cpu(entry->snap_from); + key->snap_to = mikulas_snapid_to_cpu(entry->snap_to); + } + dm_bufio_release(bp); + + return r; +} + +/* + * Scan the btree sequentially. + * Start with the given key. Perform "call" on each leaf node. When call returns + * nonzero, terminate the scan and return the value returned from call. + * When the whole tree is scanned, return 0. + * On error, return -1. + */ +int dm_multisnap_list_btree(struct dm_exception_store *s, struct bt_key *key, + int (*call)(struct dm_exception_store *, struct dm_multisnap_bt_node *, + struct dm_multisnap_bt_entry *, void *), + void *cookie) +{ + struct dm_multisnap_bt_node *node; + struct path_element path[MAX_BT_DEPTH]; + struct dm_buffer *bp; + int depth; + int i; + int r; + + r = walk_btree(s, key, &node, &bp, path); + if (unlikely(r < 0)) + return r; + +list_next_node: + for (i = path[s->bt_depth - 1].idx; i < le32_to_cpu(node->n_entries); i++) { + cond_resched(); + r = call(s, node, &node->entries[i], cookie); + if (unlikely(r)) { + dm_bufio_release(bp); + return r; + } + } + dm_bufio_release(bp); + + for (depth = s->bt_depth - 2; depth >= 0; depth--) { + int idx; + node = dm_multisnap_read_btnode(s, depth, path[depth].block, + path[depth].n_entries, &bp); + if (!node) + return -1; + idx = path[depth].idx + 1; + if (idx < path[depth].n_entries) { + r = compare_key(&node->entries[idx], key); + if (unlikely(r <= 0)) { + dm_bufio_release(bp); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + 
("dm_multisnap_list_btree: non-monotonic btree: node " + "%llx, index %x", + (unsigned long long)path[depth].block, idx)); + return 0; + } + path[depth].idx = idx; + do { + depth++; + path[depth].block = read_48(&node->entries[path[depth - 1].idx], new_chunk); + path[depth].idx = 0; + dm_bufio_release(bp); + node = dm_multisnap_read_btnode(s, depth, path[depth].block, 0, &bp); + if (!node) + return -1; + path[depth].n_entries = le32_to_cpu(node->n_entries); + } while (depth < s->bt_depth - 1); + goto list_next_node; + } + dm_bufio_release(bp); + } + + return 0; +} + +/* + * Add a key and chunk to the btree. + * The key must not overlap with any existing btree entry. + */ + +void dm_multisnap_add_to_btree(struct dm_exception_store *s, struct bt_key *key, chunk_t new_chunk) +{ + struct dm_multisnap_bt_node *node; + struct dm_buffer *bp; + struct path_element path[MAX_BT_DEPTH]; + int depth; + + unsigned split_entries, split_index, split_offset, split_size; + struct bt_key new_key; + struct dm_multisnap_bt_entry *last_one; + chunk_t new_root; + + int r = walk_btree(s, key, &node, &bp, path); + + if (unlikely(r)) { + if (r > 0) { + dm_bufio_release(bp); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_add_to_btree: adding key that already exists: " + "%llx, %llx-%llx", + (unsigned long long)key->chunk, + (unsigned long long)key->snap_from, + (unsigned long long)key->snap_to)); + } + return; + } + + depth = s->bt_depth - 1; + +go_up: + node = dm_multisnap_alloc_duplicate_block(s, path[depth].block, &bp, node); + if (unlikely(!node)) + return; + + if (likely(le32_to_cpu(node->n_entries) < s->btree_entries)) { + add_at_idx(node, path[depth].idx, key, new_chunk); + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + return; + } + cond_resched(); + memcpy(s->tmp_chunk, node, s->chunk_size); + cond_resched(); + add_at_idx(s->tmp_chunk, path[depth].idx, key, new_chunk); + + split_entries = le32_to_cpu(((struct dm_multisnap_bt_node *)s->tmp_chunk)->n_entries); 
+ split_index = split_entries / 2; + split_offset = sizeof(struct dm_multisnap_bt_node) + split_index * sizeof(struct dm_multisnap_bt_entry); + split_size = sizeof(struct dm_multisnap_bt_node) + split_entries * sizeof(struct dm_multisnap_bt_entry); + cond_resched(); + memcpy(node, s->tmp_chunk, sizeof(struct dm_multisnap_bt_node)); + cond_resched(); + memcpy((char *)node + sizeof(struct dm_multisnap_bt_node), + (char *)s->tmp_chunk + split_offset, split_size - split_offset); + cond_resched(); + memset((char *)node + sizeof(struct dm_multisnap_bt_node) + split_size - split_offset, 0, + s->chunk_size - (sizeof(struct dm_multisnap_bt_node) + split_size - split_offset)); + cond_resched(); + node->n_entries = cpu_to_le32(split_entries - split_index); + + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + + node = dm_multisnap_alloc_make_block(s, &new_chunk, &bp); + if (unlikely(!node)) + return; + + cond_resched(); + memcpy(node, s->tmp_chunk, split_offset); + cond_resched(); + memset((char *)node + split_offset, 0, s->chunk_size - split_offset); + cond_resched(); + node->n_entries = cpu_to_le32(split_index); + + last_one = &node->entries[split_index - 1]; + new_key.chunk = read_48(last_one, orig_chunk); + new_key.snap_from = mikulas_snapid_to_cpu(last_one->snap_to); + new_key.snap_to = mikulas_snapid_to_cpu(last_one->snap_to); + + key = &new_key; + + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + + if (depth--) { + node = dm_multisnap_read_btnode(s, depth, path[depth].block, + path[depth].n_entries, &bp); + if (unlikely(!node)) + return; + goto go_up; + } + + if (s->bt_depth >= MAX_BT_DEPTH) { + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_add_to_btree: max b+-tree depth reached")); + return; + } + + node = dm_multisnap_alloc_make_block(s, &new_root, &bp); + if (unlikely(!node)) + return; + + cond_resched(); + memset(node, 0, s->chunk_size); + cond_resched(); + node->signature = BT_SIGNATURE; + node->n_entries = cpu_to_le32(0); + 
add_at_idx(node, 0, &new_key, new_chunk); + new_key.snap_from = new_key.snap_to = DM_SNAPID_T_LAST; + new_key.chunk = DM_CHUNK_T_LAST; + add_at_idx(node, 1, &new_key, path[0].block); + + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + + s->bt_root = new_root; + s->bt_depth++; +} + +/* + * Change the last entry from old_chunk/old_snapid to new_chunk/new_snapid. + * Start at a given depth and go upward to the root. + */ +static void dm_multisnap_fixup_backlimits(struct dm_exception_store *s, + struct path_element path[MAX_BT_DEPTH], int depth, + chunk_t old_chunk, mikulas_snapid_t old_snapid, + chunk_t new_chunk, mikulas_snapid_t new_snapid) +{ + int idx; + struct dm_multisnap_bt_node *node; + struct dm_buffer *bp; + + if (old_chunk == new_chunk && old_snapid == new_snapid) + return; + + for (depth--; depth >= 0; depth--) { + node = dm_multisnap_read_btnode(s, depth, path[depth].block, + path[depth].n_entries, &bp); + if (unlikely(!node)) + return; + + node = dm_multisnap_alloc_duplicate_block(s, path[depth].block, &bp, node); + if (unlikely(!node)) + return; + + idx = path[depth].idx; + + if (unlikely(read_48(&node->entries[idx], orig_chunk) != old_chunk) || + unlikely(mikulas_snapid_to_cpu(node->entries[idx].snap_from) != old_snapid) || + unlikely(mikulas_snapid_to_cpu(node->entries[idx].snap_to) != old_snapid)) { + dm_bufio_release(bp); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_fixup_backlimits: btree limit does not match, block " + "%llx, idx %x, orig_chunk %llx, snap_from %llx, snap_to " + "%llx, want %llx, %llx", + (unsigned long long)path[depth].block, + idx, + (unsigned long long)read_48(&node->entries[idx], orig_chunk), + (unsigned long long)mikulas_snapid_to_cpu(node->entries[idx].snap_from), + (unsigned long long)mikulas_snapid_to_cpu(node->entries[idx].snap_to), + (unsigned long long)old_chunk, + (unsigned long long)old_snapid)); + return; + } + write_48(&node->entries[idx], orig_chunk, new_chunk); + 
node->entries[idx].snap_from = node->entries[idx].snap_to = cpu_to_mikulas_snapid(new_snapid); + + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + + if (path[depth].idx != path[depth].n_entries - 1) + return; + } + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_fixup_backlimits: the last entry modified, %llx/%llx -> %llx/%llx", + (unsigned long long)old_chunk, + (unsigned long long)old_snapid, + (unsigned long long)new_chunk, + (unsigned long long)new_snapid)); +} + +/* + * Restrict the range of an existing btree entry. + * The key must have the same beginning or end as some existing entry (not both) + * The range of the key is excluded from the entry. + */ +void dm_multisnap_restrict_btree_entry(struct dm_exception_store *s, struct bt_key *key) +{ + struct dm_multisnap_bt_node *node; + struct path_element path[MAX_BT_DEPTH]; + struct dm_buffer *bp; + int idx; + struct dm_multisnap_bt_entry *entry; + mikulas_snapid_t from, to, new_to; + + int r = walk_btree(s, key, &node, &bp, path); + if (unlikely(r < 0)) + return; + + if (!r) { + dm_bufio_release(bp); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_restrict_btree_entry: unknown key: %llx, %llx-%llx", + (unsigned long long)key->chunk, + (unsigned long long)key->snap_from, + (unsigned long long)key->snap_to)); + return; + } + + node = dm_multisnap_alloc_duplicate_block(s, path[s->bt_depth - 1].block, &bp, node); + if (unlikely(!node)) + return; + + idx = path[s->bt_depth - 1].idx; + entry = &node->entries[idx]; + from = mikulas_snapid_to_cpu(entry->snap_from); + to = new_to = mikulas_snapid_to_cpu(entry->snap_to); + if (key->snap_from == from && key->snap_to < to) { + entry->snap_from = cpu_to_mikulas_snapid(key->snap_to + 1); + } else if (key->snap_from > from && key->snap_to == to) { + new_to = key->snap_from - 1; + entry->snap_to = cpu_to_mikulas_snapid(new_to); + } else { + dm_bufio_release(bp); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_restrict_btree_entry: 
invalid range to restrict: " + "%llx, %llx-%llx %llx-%llx", + (unsigned long long)key->chunk, + (unsigned long long)from, + (unsigned long long)to, + (unsigned long long)key->snap_from, + (unsigned long long)key->snap_to)); + return; + } + + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + + if (unlikely(idx == path[s->bt_depth - 1].n_entries - 1)) + dm_multisnap_fixup_backlimits(s, path, s->bt_depth - 1, + key->chunk, to, key->chunk, new_to); +} + +/* + * Expand range of an existing btree entry. + * The key represents the whole new range (including the old and new part). + */ +void dm_multisnap_extend_btree_entry(struct dm_exception_store *s, struct bt_key *key) +{ + struct dm_multisnap_bt_node *node; + struct path_element path[MAX_BT_DEPTH]; + struct dm_buffer *bp; + int idx; + struct dm_multisnap_bt_entry *entry; + mikulas_snapid_t from, to, new_to; + + int r = walk_btree(s, key, &node, &bp, path); + if (unlikely(r < 0)) + return; + + if (!r) { + dm_bufio_release(bp); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_extend_btree_entry: unknown key: " + "%llx, %llx-%llx", + (unsigned long long)key->chunk, + (unsigned long long)key->snap_from, + (unsigned long long)key->snap_to)); + return; + } + + node = dm_multisnap_alloc_duplicate_block(s, path[s->bt_depth - 1].block, + &bp, node); + if (unlikely(!node)) + return; + + idx = path[s->bt_depth - 1].idx; + entry = &node->entries[idx]; + from = mikulas_snapid_to_cpu(entry->snap_from); + to = new_to = mikulas_snapid_to_cpu(entry->snap_to); + if (key->snap_from < from) + entry->snap_from = cpu_to_mikulas_snapid(key->snap_from); + if (key->snap_to > to) { + new_to = key->snap_to; + entry->snap_to = cpu_to_mikulas_snapid(new_to); + } + + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + + if (unlikely(idx == path[s->bt_depth - 1].n_entries - 1)) + dm_multisnap_fixup_backlimits(s, path, s->bt_depth - 1, + key->chunk, to, key->chunk, new_to); +} + +/* + * Delete an entry from the btree.
+ */ +void dm_multisnap_delete_from_btree(struct dm_exception_store *s, struct bt_key *key) +{ + struct dm_multisnap_bt_node *node; + struct path_element path[MAX_BT_DEPTH]; + struct dm_buffer *bp; + int idx; + struct dm_multisnap_bt_entry *entry; + mikulas_snapid_t from, to; + int depth, n_entries; + + struct dm_multisnap_bt_entry *last_one; + chunk_t last_one_chunk; + mikulas_snapid_t last_one_snap_to; + + int r = walk_btree(s, key, &node, &bp, path); + if (unlikely(r < 0)) + return; + + if (unlikely(!r)) { + dm_bufio_release(bp); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_delete_from_btree: unknown key: %llx, %llx-%llx", + (unsigned long long)key->chunk, + (unsigned long long)key->snap_from, + (unsigned long long)key->snap_to)); + return; + } + + depth = s->bt_depth - 1; + + idx = path[depth].idx; + entry = &node->entries[idx]; + from = mikulas_snapid_to_cpu(entry->snap_from); + to = mikulas_snapid_to_cpu(entry->snap_to); + if (unlikely(from != key->snap_from) || unlikely(to != key->snap_to)) { + dm_bufio_release(bp); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_delete_from_btree: invalid range to restrict: " + "%llx, %llx-%llx %llx-%llx", + (unsigned long long)key->chunk, + (unsigned long long)from, + (unsigned long long)to, + (unsigned long long)key->snap_from, + (unsigned long long)key->snap_to)); + return; + } + + while (unlikely((n_entries = le32_to_cpu(node->n_entries)) == 1)) { + dm_bufio_release(bp); + if (unlikely(!depth)) { + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_delete_from_btree: b-tree is empty")); + return; + } + dm_multisnap_free_block_and_duplicates(s, path[depth].block); + depth--; + node = dm_multisnap_read_btnode(s, depth, path[depth].block, + path[depth].n_entries, &bp); + if (!node) + return; + } + + node = dm_multisnap_alloc_duplicate_block(s, path[depth].block, &bp, node); + if (unlikely(!node)) + return; + + idx = path[depth].idx; + + cond_resched(); + memmove(node->entries + idx, 
node->entries + idx + 1, + (n_entries - idx - 1) * sizeof(struct dm_multisnap_bt_entry)); + cond_resched(); + n_entries--; + memset(node->entries + n_entries, 0, sizeof(struct dm_multisnap_bt_entry)); + + node->n_entries = cpu_to_le32(n_entries); + + last_one = &node->entries[n_entries - 1]; + last_one_chunk = read_48(last_one, orig_chunk); + last_one_snap_to = mikulas_snapid_to_cpu(last_one->snap_to); + + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + + if (unlikely(idx == n_entries)) + dm_multisnap_fixup_backlimits(s, path, depth, key->chunk, + key->snap_to, last_one_chunk, + last_one_snap_to); +} + +/* + * Process btree tmp remaps. + * Find the whole path for tmp_remap and write the path as new entries, from + * the root. + */ +void dm_multisnap_bt_finalize_tmp_remap(struct dm_exception_store *s, + struct tmp_remap *tmp_remap) +{ + struct dm_buffer *bp; + struct dm_multisnap_bt_node *node; + struct bt_key key; + struct path_element path[MAX_BT_DEPTH]; + int results_ptr; + + chunk_t new_blockn; + int r; + int i; + + if (s->n_preallocated_blocks < s->bt_depth) { + if (dm_multisnap_alloc_blocks(s, s->preallocated_blocks + s->n_preallocated_blocks, + s->bt_depth - s->n_preallocated_blocks, 0) < 0) + return; + s->n_preallocated_blocks = s->bt_depth; + } + results_ptr = 0; + + /* + * Read the key from this node --- we'll walk the btree according + * to this key to find a path from the root. 
+ */ + node = dm_multisnap_read_btnode(s, s->bt_depth - 1, tmp_remap->new, 0, &bp); + if (!node) + return; + key.chunk = read_48(&node->entries[0], orig_chunk); + key.snap_from = key.snap_to = mikulas_snapid_to_cpu(node->entries[0].snap_from); + dm_bufio_release(bp); + + r = walk_btree(s, &key, &node, &bp, path); + if (r < 0) + return; + + dm_bufio_release(bp); + + for (i = s->bt_depth - 1; i >= 0; i--) + if (path[i].block == tmp_remap->old) + goto found; + + DMERR("block %llx/%llx was not found in btree when searching for %llx/%llx", + (unsigned long long)tmp_remap->old, + (unsigned long long)tmp_remap->new, + (unsigned long long)key.chunk, + (unsigned long long)key.snap_from); + for (i = 0; i < s->bt_depth; i++) + DMERR("path[%d]: %llx/%x", i, (unsigned long long)path[i].block, path[i].idx); + dm_multisnap_set_error(s->dm, -EFSERROR); + return; + +found: + dm_multisnap_free_block(s, tmp_remap->old, 0); + + new_blockn = tmp_remap->new; + for (i--; i >= 0; i--) { + int remapped = 0; + node = dm_multisnap_read_btnode(s, i, path[i].block, path[i].n_entries, &bp); + if (!node) + return; + if (!dm_multisnap_block_is_uncommitted(s, path[i].block)) { + remapped = 1; + dm_bufio_release_move(bp, s->preallocated_blocks[results_ptr]); + dm_multisnap_free_block_and_duplicates(s, path[i].block); + node = dm_multisnap_read_btnode(s, i, s->preallocated_blocks[results_ptr], + path[i].n_entries, &bp); + if (!node) + return; + dm_multisnap_block_set_uncommitted(s, s->preallocated_blocks[results_ptr]); + } + write_48(&node->entries[path[i].idx], new_chunk, new_blockn); + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + + if (!remapped) + goto skip_it; + new_blockn = s->preallocated_blocks[results_ptr]; + results_ptr++; + } + + s->bt_root = new_blockn; + +skip_it: + memmove(s->preallocated_blocks, s->preallocated_blocks + results_ptr, + (s->n_preallocated_blocks -= results_ptr) * sizeof(chunk_t)); +} -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 22+ messages in 
thread
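Editorial sketch, not part of the posted patch: the range arithmetic used by dm_multisnap_restrict_btree_entry() above can be tried in isolation. The struct snap_range type, the restrict_range() name and the 64-bit snapshot ID width are assumptions made for illustration only. The real code operates on struct dm_multisnap_bt_entry with mikulas_snapid_t fields and reports failures through DM_MULTISNAP_SET_ERROR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the snap_from/snap_to pair of a btree entry. */
struct snap_range {
	uint64_t from;
	uint64_t to;
};

/*
 * Exclude the key's range from the entry's range.
 * The key must share exactly one endpoint with the entry (not both).
 * Returns 0 on success, -1 if the key does not satisfy that rule.
 */
static int restrict_range(struct snap_range *entry, const struct snap_range *key)
{
	if (key->from == entry->from && key->to < entry->to) {
		/* Key covers the beginning: the entry now starts after it. */
		entry->from = key->to + 1;
		return 0;
	}
	if (key->from > entry->from && key->to == entry->to) {
		/* Key covers the end: the entry now stops before it. */
		entry->to = key->from - 1;
		return 0;
	}
	return -1;
}
```

A key equal to the whole entry is rejected here as well, which mirrors the "not both" rule in the function comment; removing a whole entry is the job of dm_multisnap_delete_from_btree().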
* [PATCH 07/14] dm-multisnap-mikulas-commit
2010-03-02 0:23 [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer
` (5 preceding siblings ...)
2010-03-02 0:23 ` [PATCH 06/14] dm-multisnap-mikulas-btree Mike Snitzer
@ 2010-03-02 0:23 ` Mike Snitzer
2010-03-02 0:23 ` [PATCH 08/14] dm-multisnap-mikulas-delete Mike Snitzer
` (8 subsequent siblings)
15 siblings, 0 replies; 22+ messages in thread
From: Mike Snitzer @ 2010-03-02 0:23 UTC (permalink / raw)
To: dm-devel; +Cc: Mikulas Patocka
From: Mikulas Patocka <mpatocka@redhat.com>
Writing the commit block.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
drivers/md/dm-multisnap-commit.c | 245 ++++++++++++++++++++++++++++++++++++++
1 files changed, 245 insertions(+), 0 deletions(-)
create mode 100644 drivers/md/dm-multisnap-commit.c
diff --git a/drivers/md/dm-multisnap-commit.c b/drivers/md/dm-multisnap-commit.c
new file mode 100644
index 0000000..78b2583
--- /dev/null
+++ b/drivers/md/dm-multisnap-commit.c
@@ -0,0 +1,245 @@
+/*
+ * Copyright (C) 2009 Red Hat Czech, s.r.o.
+ *
+ * Mikulas Patocka <mpatocka@redhat.com>
+ *
+ * This file is released under the GPL.
+ */
+
+#include "dm-multisnap-mikulas.h"
+
+/*
+ * Flush existing tmp_remaps.
+ */
+static void dm_multisnap_finalize_tmp_remaps(struct dm_exception_store *s)
+{
+	struct tmp_remap *t;
+	int i;
+
+	while (s->n_used_tmp_remaps) {
+		if (dm_multisnap_has_error(s->dm))
+			return;
+		if (s->n_used_tmp_remaps < N_REMAPS - 1) {
+			/*
+			 * prefer btree remaps ...
+			 * if there are none, do bitmap remaps
+			 */
+			if (!list_empty(&s->used_bt_tmp_remaps)) {
+				t = container_of(s->used_bt_tmp_remaps.next,
+						 struct tmp_remap, list);
+				dm_multisnap_bt_finalize_tmp_remap(s, t);
+				dm_multisnap_free_tmp_remap(s, t);
+				continue;
+			}
+		}
+
+		/* else: 0 or 1 free remaps : finalize bitmaps */
+		if (!list_empty(&s->used_bitmap_tmp_remaps)) {
+			t = container_of(s->used_bitmap_tmp_remaps.next,
+					 struct tmp_remap, list);
+			dm_multisnap_bitmap_finalize_tmp_remap(s, t);
+			dm_multisnap_free_tmp_remap(s, t);
+			continue;
+		} else {
+			DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+				("dm_multisnap_finalize_tmp_remaps: no bitmap tmp remaps, n_used_tmp_remaps %u",
+				 s->n_used_tmp_remaps));
+			return;
+		}
+	}
+
+	if (dm_multisnap_has_error(s->dm))
+		return;
+
+	for (i = s->n_preallocated_blocks - 1; i >= 0; i--)
+		dm_multisnap_free_blocks_immediate(s, s->preallocated_blocks[i], 1);
+	s->n_preallocated_blocks = 0;
+}
+
+/*
+ * This function must be called before any two b+tree modifications at a point
+ * when the b+tree is consistent. It flushes tmp_remaps, so that the tmp_remap
+ * array doesn't overflow. This function doesn't commit anything.
+ */
+void dm_multisnap_transition_mark(struct dm_exception_store *s)
+{
+	/*
+	 * Accounting:
+	 * max number of modified/allocated blocks during btree add:
+	 *	s->bt_depth * 2 + 1
+	 * one additional entry for newly allocated data chunk
+	 * one additional entry for bitmap finalization
+	 */
+	if (unlikely(N_REMAPS - s->n_used_tmp_remaps < s->bt_depth * 2 + 3))
+		dm_multisnap_finalize_tmp_remaps(s);
+}
+
+/*
+ * Flush buffers. This is called without the lock to reduce lock contention.
+ * The buffers will be flushed again, with the lock.
+ */
+void dm_multisnap_prepare_for_commit(struct dm_exception_store *s)
+{
+	int r;
+
+	r = dm_bufio_write_dirty_buffers(s->bufio);
+	if (unlikely(r < 0)) {
+		DM_MULTISNAP_SET_ERROR(s->dm, r,
+			("dm_multisnap_prepare_for_commit: error writing data"));
+		return;
+	}
+}
+
+/*
+ * This function makes any modifications made so far permanent.
+ *
+ * It is valid to make multiple modifications to the exception store and
+ * then commit them atomically at once with this function.
+ */
+void dm_multisnap_commit(struct dm_exception_store *s)
+{
+	struct tmp_remap *t;
+	chunk_t cb_addr;
+	chunk_t cb_div, cb_offset;
+	struct multisnap_commit_block *cb;
+	struct multisnap_superblock *sb;
+	unsigned idx;
+	struct dm_buffer *bp;
+	int r;
+
+	dm_multisnap_transition_mark(s);
+
+	/* Forget all uncommitted blocks --- they are going to be committed */
+	dm_multisnap_clear_uncommitted(s);
+
+	dm_multisnap_flush_freelist_before_commit(s);
+
+	if (dm_multisnap_has_error(s->dm)) {
+		if (!dm_multisnap_drop_on_error(s->dm))
+			return;
+
+		sb = dm_bufio_read(s->bufio, SB_BLOCK, &bp);
+		if (IS_ERR(sb))
+			return;
+
+		if (!le32_to_cpu(sb->error)) {
+			sb->error = cpu_to_le32(dm_multisnap_has_error(s->dm));
+			dm_bufio_mark_buffer_dirty(bp);
+		}
+
+		dm_bufio_release(bp);
+		return;
+	}
+
+	list_for_each_entry(t, &s->used_bitmap_tmp_remaps, list)
+		t->uncommitted = 0;
+
+	list_for_each_entry(t, &s->used_bt_tmp_remaps, list)
+		t->uncommitted = 0;
+
+	r = dm_bufio_write_dirty_buffers(s->bufio);
+	if (unlikely(r < 0)) {
+		DM_MULTISNAP_SET_ERROR(s->dm, r,
+			("dm_multisnap_commit: error writing data"));
+		return;
+	}
+
+	cb_addr = s->alloc_rover;
+
+	if (cb_addr < FIRST_CB_BLOCK)
+		cb_addr = FIRST_CB_BLOCK;
+	cb_div = cb_addr - FIRST_CB_BLOCK;
+	cb_offset = sector_div(cb_div, s->cb_stride);
+	cb_addr += s->cb_stride - cb_offset;
+	if (cb_offset < s->cb_stride / 2 || cb_addr >= s->dev_size)
+		cb_addr -= s->cb_stride;
+
+	cb = dm_bufio_new(s->bufio, cb_addr, &bp);
+	if (IS_ERR(cb)) {
+		DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(cb),
+			("dm_multisnap_commit: can't allocate new commit block at %llx",
+			 (unsigned long long)cb_addr));
+		return;
+	}
+
+	s->commit_sequence++;
+
+	cb->signature = CB_SIGNATURE;
+	cb->snapshot_num = cpu_to_le32(s->snapshot_num);
+	cb->sequence = cpu_to_le64(s->commit_sequence);
+	write_48(cb, dev_size, s->dev_size);
+	write_48(cb, total_allocated, s->total_allocated);
+	write_48(cb, data_allocated, s->data_allocated);
+	write_48(cb, bitmap_root, s->bitmap_root);
+	write_48(cb, alloc_rover, s->alloc_rover);
+	write_48(cb, freelist, s->freelist_ptr);
+	write_48(cb, delete_rover, s->delete_rover_chunk);
+	write_48(cb, bt_root, s->bt_root);
+	cb->bt_depth = s->bt_depth;
+	cb->flags = s->flags;
+	memset(cb->pad, 0, sizeof cb->pad);
+	idx = 0;
+	list_for_each_entry(t, &s->used_bitmap_tmp_remaps, list) {
+		BUG_ON(idx >= N_REMAPS);
+		write_48(&cb->tmp_remap[idx], old, t->old);
+		write_48(&cb->tmp_remap[idx], new, t->new);
+		cb->tmp_remap[idx].bitmap_idx = cpu_to_le32(t->bitmap_idx);
+		idx++;
+	}
+	list_for_each_entry(t, &s->used_bt_tmp_remaps, list) {
+		BUG_ON(idx >= N_REMAPS);
+		write_48(&cb->tmp_remap[idx], old, t->old);
+		write_48(&cb->tmp_remap[idx], new, t->new);
+		cb->tmp_remap[idx].bitmap_idx = cpu_to_le32(t->bitmap_idx);
+		idx++;
+	}
+	for (; idx < N_REMAPS; idx++) {
+		write_48(&cb->tmp_remap[idx], old, 0);
+		write_48(&cb->tmp_remap[idx], new, 0);
+		cb->tmp_remap[idx].bitmap_idx = cpu_to_le32(0);
+	}
+	dm_bufio_mark_buffer_dirty(bp);
+	dm_bufio_release(bp);
+	r = dm_bufio_write_dirty_buffers(s->bufio);
+	if (unlikely(r < 0)) {
+		DM_MULTISNAP_SET_ERROR(s->dm, r,
+			("dm_multisnap_commit: can't write commit block at %llx",
+			 (unsigned long long)cb_addr));
+		return;
+	}
+
+	if (likely(cb_addr == s->valid_commit_block) ||
+	    likely(cb_addr == s->valid_commit_block + s->cb_stride))
+		goto return_success;
+
+	sb = dm_bufio_read(s->bufio, SB_BLOCK, &bp);
+	if (IS_ERR(sb)) {
+		DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(sb),
+			("dm_multisnap_commit: can't read super block"));
+		return;
+	}
+
+	if (unlikely(sb->signature != SB_SIGNATURE)) {
+		dm_bufio_release(bp);
+		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+			("dm_multisnap_commit: invalid super block signature when committing"));
+		return;
+	}
+
+	sb->commit_block = cpu_to_le64(cb_addr);
+
+	dm_bufio_mark_buffer_dirty(bp);
+	dm_bufio_release(bp);
+	r = dm_bufio_write_dirty_buffers(s->bufio);
+	if (unlikely(r < 0)) {
+		DM_MULTISNAP_SET_ERROR(s->dm, r, ("dm_multisnap_commit: can't write super block"));
+		return;
+	}
+
+return_success:
+	s->valid_commit_block = cb_addr;
+
+	dm_multisnap_load_freelist(s);
+
+	return;
+}
--
1.6.5.2
^ permalink raw reply related [flat|nested] 22+ messages in thread
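Editorial sketch, not part of the posted patch: dm_multisnap_commit() above picks the commit block slot closest to the allocation rover. Slots sit every cb_stride chunks starting at FIRST_CB_BLOCK, and the code rounds up to the next slot unless the rover is in the lower half of the stride or the next slot would fall beyond the device. The standalone version below assumes a value of 1 for the first commit block (the real FIRST_CB_BLOCK is defined in dm-multisnap-mikulas.h, which is not shown here), uses a plain % in place of the kernel's sector_div(), and the name pick_commit_block() is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t chunk_t;

/* Assumed value for illustration; the real constant is not in this excerpt. */
#define EXAMPLE_FIRST_CB_BLOCK 1

/* Return the commit block slot nearest to alloc_rover that fits the device. */
static chunk_t pick_commit_block(chunk_t alloc_rover, chunk_t cb_stride,
				 chunk_t dev_size)
{
	chunk_t cb_addr = alloc_rover;
	chunk_t cb_offset;

	if (cb_addr < EXAMPLE_FIRST_CB_BLOCK)
		cb_addr = EXAMPLE_FIRST_CB_BLOCK;

	/* Distance from the previous slot; sector_div() yields this remainder. */
	cb_offset = (cb_addr - EXAMPLE_FIRST_CB_BLOCK) % cb_stride;

	/* Tentatively round up to the next slot ... */
	cb_addr += cb_stride - cb_offset;

	/* ... and step back if the previous slot is nearer or the next is off-device. */
	if (cb_offset < cb_stride / 2 || cb_addr >= dev_size)
		cb_addr -= cb_stride;

	return cb_addr;
}
```

With a stride of 8 and a 100-chunk device, a rover at chunk 4 maps back to slot 1, while a rover at chunk 6 maps forward to slot 9.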
* [PATCH 08/14] dm-multisnap-mikulas-delete
2010-03-02 0:23 [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer
` (6 preceding siblings ...)
2010-03-02 0:23 ` [PATCH 07/14] dm-multisnap-mikulas-commit Mike Snitzer
@ 2010-03-02 0:23 ` Mike Snitzer
2010-03-02 0:23 ` [PATCH 09/14] dm-multisnap-mikulas-freelist Mike Snitzer
` (7 subsequent siblings)
15 siblings, 0 replies; 22+ messages in thread
From: Mike Snitzer @ 2010-03-02 0:23 UTC (permalink / raw)
To: dm-devel; +Cc: Mikulas Patocka
From: Mikulas Patocka <mpatocka@redhat.com>
Background delete operation.
Scan the b+tree in the background and find entries that have unused
snapshot IDs. Delete these entries and the associated chunks.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
---
drivers/md/dm-multisnap-delete.c | 137 ++++++++++++++++++++++++++++++++++++++
1 files changed, 137 insertions(+), 0 deletions(-)
create mode 100644 drivers/md/dm-multisnap-delete.c
diff --git a/drivers/md/dm-multisnap-delete.c b/drivers/md/dm-multisnap-delete.c
new file mode 100644
index 0000000..22705a3
--- /dev/null
+++ b/drivers/md/dm-multisnap-delete.c
@@ -0,0 +1,137 @@
+/*
+ * Copyright (C) 2009 Red Hat Czech, s.r.o.
+ *
+ * Mikulas Patocka <mpatocka@redhat.com>
+ *
+ * This file is released under the GPL.
+ */
+
+#include "dm-multisnap-mikulas.h"
+
+/*
+ * Commit after this number of deleted entries.
+ * Too big a number causes spurious overflows on a nearly-full device.
+ * Too small a number degrades delete performance.
+ */
+#define COMMIT_AFTER 128
+
+struct list_cookie {
+	struct bt_key key;
+	chunk_t new_chunk;
+};
+
+#define RET_END 1
+#define RET_DO_FREE 2
+#define RET_RESCHEDULE 3
+
+static int list_callback(struct dm_exception_store *s,
+			 struct dm_multisnap_bt_node *node,
+			 struct dm_multisnap_bt_entry *bt, void *cookie)
+{
+	struct list_cookie *lc = cookie;
+	mikulas_snapid_t found_from, found_to;
+
+	lc->key.chunk = read_48(bt, orig_chunk);
+	lc->key.snap_from = mikulas_snapid_to_cpu(bt->snap_from);
+	lc->key.snap_to = mikulas_snapid_to_cpu(bt->snap_to);
+
+	if (unlikely(lc->key.chunk > DM_CHUNK_T_MAX))
+		return RET_END;
+
+	s->delete_rover_chunk = lc->key.chunk;
+	s->delete_rover_snapid = lc->key.snap_to + 1;
+	if (unlikely(!s->delete_rover_snapid))
+		s->delete_rover_chunk++;
+
+	if (!dm_multisnap_find_next_snapid_range(s, lc->key.snap_from,
+						 &found_from, &found_to) ||
+	    found_from > lc->key.snap_to) {
+		/*
+		 * This range maps unused snapshots, delete it.
+		 * But we can't do it now, so submit it to the caller;
+		 */
+		lc->new_chunk = read_48(bt, new_chunk);
+		return RET_DO_FREE;
+	}
+
+	/*
+	 * If we are at a last entry in the btree node, drop the lock and
+	 * allow other requests to be processed.
+	 *
+	 * This avoids a starvation when there are no nodes to delete.
+	 */
+	if (bt == &node->entries[le32_to_cpu(node->n_entries) - 1])
+		return RET_RESCHEDULE;
+
+	return 0;
+}
+
+static void delete_step(struct dm_exception_store *s)
+{
+	struct bt_key key;
+	int r;
+	struct list_cookie lc;
+
+	key.chunk = s->delete_rover_chunk;
+	key.snap_from = s->delete_rover_snapid;
+	key.snap_to = s->delete_rover_snapid;
+
+	r = dm_multisnap_list_btree(s, &key, list_callback, &lc);
+
+	if (unlikely(r < 0))
+		return;
+
+	switch (r) {
+
+	case RET_END:
+		s->flags &= ~DM_MULTISNAP_FLAG_DELETING;
+
+		/* If we finished the job and there is no pending I/O, commit */
+		if (dm_multisnap_can_commit(s->dm))
+			dm_multisnap_call_commit(s->dm);
+
+		return;
+	case RET_DO_FREE:
+		if (unlikely(dm_multisnap_has_error(s->dm)))
+			return;
+
+		dm_multisnap_delete_from_btree(s, &lc.key);
+
+		dm_multisnap_transition_mark(s);
+
+		dm_multisnap_free_block(s, lc.new_chunk, FREELIST_DATA_FLAG);
+
+		/* fall through */
+	case RET_RESCHEDULE:
+		if (dm_multisnap_can_commit(s->dm)) {
+			if (++s->delete_commit_count >= COMMIT_AFTER) {
+				s->delete_commit_count = 0;
+				dm_multisnap_call_commit(s->dm);
+			}
+		}
+		return;
+	default:
+		printk(KERN_CRIT "delete_step: invalid return value %d", r);
+		BUG();
+
+	}
+}
+
+void dm_multisnap_background_delete(struct dm_exception_store *s,
+				    struct dm_multisnap_background_work *bw)
+{
+	if (unlikely(dm_multisnap_has_error(s->dm)))
+		return;
+
+	if (s->flags & DM_MULTISNAP_FLAG_DELETING) {
+		delete_step(s);
+	} else if (s->flags & DM_MULTISNAP_FLAG_PENDING_DELETE) {
+		s->flags &= ~DM_MULTISNAP_FLAG_PENDING_DELETE;
+		s->flags |= DM_MULTISNAP_FLAG_DELETING;
+		s->delete_rover_chunk = 0;
+		s->delete_rover_snapid = 0;
+	} else
+		return;
+
+	dm_multisnap_queue_work(s->dm, &s->delete_work);
+}
--
1.6.5.2
^ permalink raw reply related [flat|nested] 22+ messages in thread
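Editorial sketch, not part of the posted patch: list_callback() above records where the next delete_step() should resume by storing the entry's chunk and snap_to + 1 in the delete rover, and it moves on to the next chunk when the snapshot ID wraps to zero. The struct delete_rover type, the advance_rover() name and the 32-bit snapshot ID width are assumptions made for illustration; the real width of mikulas_snapid_t is defined in a header that is not part of this excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for s->delete_rover_chunk / s->delete_rover_snapid. */
struct delete_rover {
	uint64_t chunk;
	uint32_t snapid;	/* assumed width, see the note above */
};

/* Position the rover just past the entry (chunk, ..., snap_to). */
static void advance_rover(struct delete_rover *r, uint64_t chunk,
			  uint32_t snap_to)
{
	r->chunk = chunk;
	r->snapid = snap_to + 1;	/* wraps to 0 at the maximum ID */
	if (!r->snapid)
		r->chunk++;		/* last ID of this chunk: go to the next chunk */
}
```

Because the rover always points just past the last visited entry, a scan that is interrupted by RET_RESCHEDULE or RET_DO_FREE can restart from the rover without revisiting that entry.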
* [PATCH 09/14] dm-multisnap-mikulas-freelist 2010-03-02 0:23 [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer ` (7 preceding siblings ...) 2010-03-02 0:23 ` [PATCH 08/14] dm-multisnap-mikulas-delete Mike Snitzer @ 2010-03-02 0:23 ` Mike Snitzer 2010-03-02 0:23 ` [PATCH 10/14] dm-multisnap-mikulas-io Mike Snitzer ` (6 subsequent siblings) 15 siblings, 0 replies; 22+ messages in thread From: Mike Snitzer @ 2010-03-02 0:23 UTC (permalink / raw) To: dm-devel; +Cc: Mikulas Patocka From: Mikulas Patocka <mpatocka@redhat.com> Freelist management. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- drivers/md/dm-multisnap-freelist.c | 296 ++++++++++++++++++++++++++++++++++++ 1 files changed, 296 insertions(+), 0 deletions(-) create mode 100644 drivers/md/dm-multisnap-freelist.c diff --git a/drivers/md/dm-multisnap-freelist.c b/drivers/md/dm-multisnap-freelist.c new file mode 100644 index 0000000..6ec1476 --- /dev/null +++ b/drivers/md/dm-multisnap-freelist.c @@ -0,0 +1,296 @@ +/* + * Copyright (C) 2009 Red Hat Czech, s.r.o. + * + * Mikulas Patocka <mpatocka@redhat.com> + * + * This file is released under the GPL. + */ + +#include "dm-multisnap-mikulas.h" + +/* + * Initialize in-memory freelist structure. + */ +void dm_multisnap_init_freelist(struct dm_multisnap_freelist *fl, unsigned chunk_size) +{ + cond_resched(); + memset(fl, 0, chunk_size); + cond_resched(); + fl->signature = FL_SIGNATURE; + write_48(fl, backlink, 0); + fl->n_entries = cpu_to_le32(0); +} + +/* + * Add a given block to in-memory freelist. 
+ * Returns: + * -1 --- error + * 1 --- block was added + * 0 --- block could not be added because the freelist is full + */ +static int add_to_freelist(struct dm_exception_store *s, chunk_t block, unsigned flags) +{ + int i; + struct dm_multisnap_freelist *fl = s->freelist; + for (i = le32_to_cpu(fl->n_entries) - 1; i >= 0; i--) { + chunk_t x = read_48(&fl->entries[i], block); + unsigned r = le16_to_cpu(fl->entries[i].run_length) & FREELIST_RL_MASK; + unsigned f = le16_to_cpu(fl->entries[i].run_length) & FREELIST_DATA_FLAG; + if (block >= x && block < x + r) { + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("add_to_freelist: freeing already free block %llx (%llx - %x)", + (unsigned long long)block, + (unsigned long long)x, + r)); + return -1; + } + if (likely(r < FREELIST_RL_MASK) && likely(f == flags)) { + if (block == x - 1) { + write_48(&fl->entries[i], block, x - 1); + goto inc_length; + } + if (block == x + r) { +inc_length: + fl->entries[i].run_length = cpu_to_le16((r + 1) | f); + return 1; + } + } + cond_resched(); + } + i = le32_to_cpu(fl->n_entries); + if (i < dm_multisnap_freelist_entries(s->chunk_size)) { + fl->n_entries = cpu_to_le32(i + 1); + write_48(&fl->entries[i], block, block); + fl->entries[i].run_length = cpu_to_le16(1 | flags); + return 1; + } + return 0; +} + +/* + * Read a freelist block from the disk. 
+ */ +static struct dm_multisnap_freelist * +read_freelist(struct dm_exception_store *s, chunk_t block, struct dm_buffer **bp) +{ + struct dm_multisnap_freelist *fl; + fl = dm_bufio_read(s->bufio, block, bp); + if (IS_ERR(fl)) { + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(fl), + ("read_freelist: can't read freelist block %llx", + (unsigned long long)block)); + return NULL; + } + if (fl->signature != FL_SIGNATURE) { + dm_bufio_release(*bp); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("read_freelist: bad signature freelist block %llx", + (unsigned long long)block)); + return NULL; + } + if (le32_to_cpu(fl->n_entries) > dm_multisnap_freelist_entries(s->chunk_size)) { + dm_bufio_release(*bp); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("read_freelist: bad number of entries in freelist block %llx", + (unsigned long long)block)); + return NULL; + } + return fl; +} + +/* + * Allocate a block and write the current in-memory freelist to it. + * Then, clear the in-memory freelist. + */ +static void alloc_write_freelist(struct dm_exception_store *s) +{ + chunk_t new_block; + struct dm_multisnap_freelist *fl; + struct dm_buffer *bp; + + if (dm_multisnap_alloc_blocks(s, &new_block, 1, ALLOC_DRY)) + return; + + fl = dm_bufio_new(s->bufio, new_block, &bp); + if (IS_ERR(fl)) { + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(fl), + ("alloc_write_freelist: can't make new freelist block %llx", + (unsigned long long)new_block)); + return; + } + + memcpy(fl, s->freelist, s->chunk_size); + + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + + dm_multisnap_init_freelist(s->freelist, s->chunk_size); + write_48(s->freelist, backlink, new_block); +} + +/* + * This function is called by other subsystems when they want to free a block. + * It adds the block to the current freelist, if the freelist is full, it + * flushes the freelist and makes a new one. 
+ */ +void dm_multisnap_free_block(struct dm_exception_store *s, chunk_t block, unsigned flags) +{ + if (likely(add_to_freelist(s, block, flags))) + return; + + alloc_write_freelist(s); + if (unlikely(dm_multisnap_has_error(s->dm))) + return; + + if (likely(add_to_freelist(s, block, flags))) + return; + + BUG(); +} + +/* + * Check if a given block is in a given freelist. + */ +static int check_against_freelist(struct dm_multisnap_freelist *fl, chunk_t block) +{ + int i; + for (i = le32_to_cpu(fl->n_entries) - 1; i >= 0; i--) { + chunk_t x = read_48(&fl->entries[i], block); + unsigned r = le16_to_cpu(fl->entries[i].run_length) & FREELIST_RL_MASK; + if (unlikely(block - x < r)) + return 1; + cond_resched(); + } + return 0; +} + +/* + * Check if a given block is in any freelist in a freelist chain. + */ +static int check_against_freelist_chain(struct dm_exception_store *s, + chunk_t fl_block, chunk_t block) +{ + struct stop_cycles cy; + dm_multisnap_init_stop_cycles(&cy); + + while (unlikely(fl_block != 0)) { + int c; + struct dm_buffer *bp; + struct dm_multisnap_freelist *fl; + + if (dm_multisnap_stop_cycles(s, &cy, fl_block)) + return -1; + + if (unlikely(block == fl_block)) + return 1; + + fl = read_freelist(s, fl_block, &bp); + if (unlikely(!fl)) + return -1; + c = check_against_freelist(fl, block); + fl_block = read_48(fl, backlink); + dm_bufio_release(bp); + if (unlikely(c)) + return c; + } + return 0; +} + +/* + * Check if a given block can be allocated. 
This checks against: + * - in-memory freelist + * - the current freelist chain + * - the freelist chain that was active on last commit + */ +int dm_multisnap_check_allocated_block(struct dm_exception_store *s, chunk_t block) +{ + int c; + + c = check_against_freelist(s->freelist, block); + if (unlikely(c)) + return c; + + c = check_against_freelist_chain(s, read_48(s->freelist, backlink), block); + if (unlikely(c)) + return c; + + c = check_against_freelist_chain(s, s->freelist_ptr, block); + if (unlikely(c)) + return c; + + return 0; +} + +/* + * This is called prior to commit, it writes the current freelist to the disk. + */ +void dm_multisnap_flush_freelist_before_commit(struct dm_exception_store *s) +{ + alloc_write_freelist(s); + + if (dm_multisnap_has_error(s->dm)) + return; + + s->freelist_ptr = read_48(s->freelist, backlink); +} + +/* + * Free the blocks in the freelist. + */ +static void free_blocks_in_freelist(struct dm_exception_store *s, + struct dm_multisnap_freelist *fl) +{ + int i; + for (i = le32_to_cpu(fl->n_entries) - 1; i >= 0; i--) { + chunk_t x = read_48(&fl->entries[i], block); + unsigned r = le16_to_cpu(fl->entries[i].run_length) & FREELIST_RL_MASK; + unsigned f = le16_to_cpu(fl->entries[i].run_length) & FREELIST_DATA_FLAG; + dm_multisnap_free_blocks_immediate(s, x, r); + if (likely(f & FREELIST_DATA_FLAG)) { + dm_multisnap_status_lock(s->dm); + s->data_allocated -= r; + dm_multisnap_status_unlock(s->dm); + } + cond_resched(); + } +} + +/* + * This is called after a commit or after a mount. It walks the current freelist + * chain and frees the individual blocks. + * + * If the computer crashes while this operation is in progress, it is done again + * after a mount --- thus, it maintains data consistency. 
+ */ +void dm_multisnap_load_freelist(struct dm_exception_store *s) +{ + chunk_t fl_block = s->freelist_ptr; + + struct stop_cycles cy; + dm_multisnap_init_stop_cycles(&cy); + + while (fl_block) { + struct dm_buffer *bp; + struct dm_multisnap_freelist *fl; + + if (dm_multisnap_stop_cycles(s, &cy, fl_block)) + break; + + if (dm_multisnap_has_error(s->dm)) + break; + + fl = read_freelist(s, fl_block, &bp); + if (!fl) + break; + memcpy(s->freelist, fl, s->chunk_size); + dm_bufio_release(bp); + + free_blocks_in_freelist(s, s->freelist); + fl_block = read_48(s->freelist, backlink); + } + + /* Write the buffers eagerly to prevent further delays */ + dm_bufio_write_dirty_buffers_async(s->bufio); + + dm_multisnap_init_freelist(s->freelist, s->chunk_size); +} -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 22+ messages in thread
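For readers following the freelist patch above, the run-coalescing idea in add_to_freelist() can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the patch's code: the struct layout, the names prefixed with sketch_ and the tiny capacity are all hypothetical, and the 48-bit packing, endian conversion and data/metadata flag of the real on-disk format are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limits; the real values depend on the chunk size and on-disk format. */
#define SKETCH_MAX_ENTRIES 4
#define SKETCH_RL_MAX      0x7fffu

/* One run of consecutive free blocks: [start, start + len). */
struct sketch_run {
	uint64_t start;
	unsigned len;
};

struct sketch_freelist {
	unsigned n_entries;
	struct sketch_run entries[SKETCH_MAX_ENTRIES];
};

/*
 * Same return convention as the patch's add_to_freelist():
 *  -1 --- block is already in the freelist
 *   1 --- block was added
 *   0 --- freelist is full
 */
static int sketch_add_to_freelist(struct sketch_freelist *fl, uint64_t block)
{
	int i;

	assert(fl->n_entries <= SKETCH_MAX_ENTRIES);

	for (i = (int)fl->n_entries - 1; i >= 0; i--) {
		struct sketch_run *e = &fl->entries[i];

		/* Freeing a block twice is treated as an error. */
		if (block >= e->start && block < e->start + e->len)
			return -1;

		if (e->len < SKETCH_RL_MAX) {
			/* Block sits just before the run: grow the run downwards. */
			if (e->start > 0 && block == e->start - 1) {
				e->start = block;
				e->len++;
				return 1;
			}
			/* Block sits just after the run: grow the run upwards. */
			if (block == e->start + e->len) {
				e->len++;
				return 1;
			}
		}
	}

	/* No adjacent run found: start a new run if there is room. */
	if (fl->n_entries < SKETCH_MAX_ENTRIES) {
		fl->entries[fl->n_entries].start = block;
		fl->entries[fl->n_entries].len = 1;
		fl->n_entries++;
		return 1;
	}
	return 0;
}
```

Freeing blocks 10, 11 and 9 in that order leaves a single run starting at 9 with length 3. A return value of 0 corresponds to the point where dm_multisnap_free_block() in the patch writes the in-memory freelist out and starts a new one.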
* [PATCH 10/14] dm-multisnap-mikulas-io 2010-03-02 0:23 [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer ` (8 preceding siblings ...) 2010-03-02 0:23 ` [PATCH 09/14] dm-multisnap-mikulas-freelist Mike Snitzer @ 2010-03-02 0:23 ` Mike Snitzer 2010-03-02 0:23 ` [PATCH 11/14] dm-multisnap-mikulas-snaps Mike Snitzer ` (5 subsequent siblings) 15 siblings, 0 replies; 22+ messages in thread From: Mike Snitzer @ 2010-03-02 0:23 UTC (permalink / raw) To: dm-devel; +Cc: Mikulas Patocka From: Mikulas Patocka <mpatocka@redhat.com> Callbacks from dm-multisnap.c These functions are called directly from exception-store-neutral code. They find the chunk or perform chunk reallocations. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- drivers/md/dm-multisnap-io.c | 209 ++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 209 insertions(+), 0 deletions(-) create mode 100644 drivers/md/dm-multisnap-io.c diff --git a/drivers/md/dm-multisnap-io.c b/drivers/md/dm-multisnap-io.c new file mode 100644 index 0000000..7620ebe --- /dev/null +++ b/drivers/md/dm-multisnap-io.c @@ -0,0 +1,209 @@ +/* + * Copyright (C) 2009 Red Hat Czech, s.r.o. + * + * Mikulas Patocka <mpatocka@redhat.com> + * + * This file is released under the GPL. + */ + +#include "dm-multisnap-mikulas.h" + +/* + * This function will check if there is remapping for a given snapid/chunk. + * It returns 1 if remapping exists and is read-only (shared by other snapshots) + * and 2 if it exists and is read-write (not shared by anyone). 
+ */ +int dm_multisnap_find_snapshot_chunk(struct dm_exception_store *s, + snapid_t snapid, chunk_t chunk, + int write, chunk_t *result) +{ + int r; + struct bt_key key; + mikulas_snapid_t from, to; + mikulas_snapid_t find_from, find_to; + + from = dm_multisnap_find_next_subsnapshot(s, snapid); + to = snapid; + + key.chunk = chunk; + key.snap_from = snapid; + key.snap_to = snapid; + r = dm_multisnap_find_in_btree(s, &key, result); + if (unlikely(r < 0)) + return r; + + if (!r) { + s->query_new_key.chunk = chunk; + s->query_new_key.snap_from = from; + s->query_new_key.snap_to = to; + s->query_active = 1; + return 0; + } + + if (!write) + return 1; + + /* + * We are writing to a snapshot --- check if anything outside <from-to> + * range exists, if it does, it needs to be copied. + */ + if (key.snap_from < from) { + if (likely(dm_multisnap_find_next_snapid_range(s, key.snap_from, + &find_from, &find_to))) { + if (find_from < from) { + s->query_new_key.chunk = chunk; + s->query_new_key.snap_from = from; + s->query_new_key.snap_to = key.snap_to; + s->query_block_from = key.snap_from; + s->query_block_to = key.snap_to; + s->query_active = 2; + return 1; + } + if (unlikely(find_from > from)) + BUG(); /* SNAPID not in our tree */ + } else + BUG(); /* we're asking for a SNAPID not in our tree */ + } + if (key.snap_to > to) { + if (likely(dm_multisnap_find_next_snapid_range(s, to + 1, + &find_from, &find_to))) { + if (find_from <= key.snap_to) { + s->query_new_key.chunk = chunk; + s->query_new_key.snap_from = key.snap_from; + s->query_new_key.snap_to = to; + s->query_block_from = key.snap_from; + s->query_block_to = key.snap_to; + s->query_active = 2; + return 1; + } + } + } + return 2; +} + +/* + * Reset the query/remap state machine. + */ +void dm_multisnap_reset_query(struct dm_exception_store *s) +{ + s->query_active = 0; + s->query_snapid = 0; +} + +/* + * Find the next snapid range to remap. 
+ */ +int dm_multisnap_query_next_remap(struct dm_exception_store *s, chunk_t chunk) +{ + int r; + chunk_t sink; + mikulas_snapid_t from, to; + + s->query_active = 0; + + while (dm_multisnap_find_next_snapid_range(s, s->query_snapid, &from, &to)) { + struct bt_key key; +next_btree_search: + if (dm_multisnap_has_error(s->dm)) + return -1; + key.chunk = chunk; + key.snap_from = from; + key.snap_to = to; + r = dm_multisnap_find_in_btree(s, &key, &sink); + if (unlikely(r < 0)) + return -1; + + if (!r) { + s->query_new_key.chunk = chunk; + s->query_new_key.snap_from = from; + s->query_new_key.snap_to = to; + s->query_active = 1; + return 1; + } + + if (key.snap_from > from) { + s->query_new_key.chunk = chunk; + s->query_new_key.snap_from = from; + s->query_new_key.snap_to = key.snap_from - 1; + s->query_active = 1; + return 1; + } + + if (key.snap_to < to) { + from = key.snap_to + 1; + goto next_btree_search; + } + + s->query_snapid = to + 1; + } + + return 0; +} + +/* + * Perform the remap on the range returned by dm_multisnap_query_next_remap. + */ +void dm_multisnap_add_next_remap(struct dm_exception_store *s, + union chunk_descriptor *cd, chunk_t *new_chunk) +{ + int r; + + BUG_ON(s->query_active != 1); + s->query_active = 0; + + cd->range.from = s->query_new_key.snap_from; + cd->range.to = s->query_new_key.snap_to; + + r = dm_multisnap_alloc_blocks(s, new_chunk, 1, 0); + if (unlikely(r < 0)) + return; + + dm_multisnap_status_lock(s->dm); + s->data_allocated++; + dm_multisnap_status_unlock(s->dm); + + dm_multisnap_add_to_btree(s, &s->query_new_key, *new_chunk); + dm_multisnap_transition_mark(s); +} + +/* + * Make the chunk writeable (i.e. unshare multiple snapshots). 
+ */ +void dm_multisnap_make_chunk_writeable(struct dm_exception_store *s, + union chunk_descriptor *cd, chunk_t *new_chunk) +{ + int r; + + BUG_ON(s->query_active != 2); + s->query_active = 0; + + cd->range.from = s->query_block_from; + cd->range.to = s->query_block_to; + + r = dm_multisnap_alloc_blocks(s, new_chunk, 1, 0); + if (unlikely(r < 0)) + return; + + dm_multisnap_status_lock(s->dm); + s->data_allocated++; + dm_multisnap_status_unlock(s->dm); + + dm_multisnap_restrict_btree_entry(s, &s->query_new_key); + dm_multisnap_transition_mark(s); + + if (unlikely(dm_multisnap_has_error(s->dm))) + return; + + dm_multisnap_add_to_btree(s, &s->query_new_key, *new_chunk); + dm_multisnap_transition_mark(s); +} + +/* + * Check if the snapshot belongs to the remap range specified by "cd". + */ +int dm_multisnap_check_conflict(struct dm_exception_store *s, + union chunk_descriptor *cd, snapid_t snapid) +{ + return snapid >= cd->range.from && snapid <= cd->range.to; +} + -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 22+ messages in thread
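The gap search in dm_multisnap_query_next_remap() above may be easier to follow as a standalone sketch. It assumes, as the patch's use of key.snap_from and key.snap_to after the lookup suggests, that the b+tree lookup returns the leftmost stored range overlapping the query. Here a sorted array stands in for the b+tree, and every sketch_ name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive snapshot ID range [from, to]. */
struct sketch_range {
	uint64_t from;
	uint64_t to;
};

/*
 * Stand-in for the b+tree lookup: return the index of the leftmost mapped
 * range that overlaps [from, to], or -1 if none does. The array is assumed
 * to be sorted by "from" and free of overlaps.
 */
static int sketch_find_mapped(const struct sketch_range *mapped, int n,
			      uint64_t from, uint64_t to)
{
	int i;

	assert(from <= to);
	for (i = 0; i < n; i++) {
		if (mapped[i].to >= from && mapped[i].from <= to)
			return i;
	}
	return -1;
}

/*
 * Find the first part of the snapshot range [from, to] that has no mapping
 * for the chunk yet. Returns 1 and fills *gap if such a part exists,
 * 0 if the whole range is already covered.
 */
static int sketch_next_unmapped(const struct sketch_range *mapped, int n,
				uint64_t from, uint64_t to,
				struct sketch_range *gap)
{
	for (;;) {
		int i = sketch_find_mapped(mapped, n, from, to);

		if (i < 0) {
			/* Nothing mapped at all: the whole range needs a remap. */
			gap->from = from;
			gap->to = to;
			return 1;
		}
		if (mapped[i].from > from) {
			/* A hole in front of the first mapped range. */
			gap->from = from;
			gap->to = mapped[i].from - 1;
			return 1;
		}
		if (mapped[i].to < to) {
			/* Covered up to mapped[i].to; continue behind it. */
			from = mapped[i].to + 1;
			continue;
		}
		return 0;
	}
}
```

With mappings for snapshot IDs 2-3 and 6-7, a query for the range 2-9 reports the gap 4-5 first, which mirrors how the patch sets query_new_key for the range that dm_multisnap_add_next_remap() would then fill.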
* [PATCH 11/14] dm-multisnap-mikulas-snaps 2010-03-02 0:23 [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer ` (9 preceding siblings ...) 2010-03-02 0:23 ` [PATCH 10/14] dm-multisnap-mikulas-io Mike Snitzer @ 2010-03-02 0:23 ` Mike Snitzer 2010-03-02 0:23 ` [PATCH 12/14] dm-multisnap-mikulas-common Mike Snitzer ` (4 subsequent siblings) 15 siblings, 0 replies; 22+ messages in thread From: Mike Snitzer @ 2010-03-02 0:23 UTC (permalink / raw) To: dm-devel; +Cc: Mikulas Patocka From: Mikulas Patocka <mpatocka@redhat.com> The management of snapshot ID map. Red-black tree is used to hold the snapshot ID ranges in memory. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- drivers/md/dm-multisnap-snaps.c | 636 +++++++++++++++++++++++++++++++++++++++ 1 files changed, 636 insertions(+), 0 deletions(-) create mode 100644 drivers/md/dm-multisnap-snaps.c diff --git a/drivers/md/dm-multisnap-snaps.c b/drivers/md/dm-multisnap-snaps.c new file mode 100644 index 0000000..9947673 --- /dev/null +++ b/drivers/md/dm-multisnap-snaps.c @@ -0,0 +1,636 @@ +/* + * Copyright (C) 2009 Red Hat Czech, s.r.o. + * + * Mikulas Patocka <mpatocka@redhat.com> + * + * This file is released under the GPL. + */ + +#include "dm-multisnap-mikulas.h" + +/* + * In-memory red-black tree denoting the used snapshot IDs. + */ +struct snapshot_range { + struct rb_node node; + mikulas_snapid_t from; + mikulas_snapid_t to; +}; + +/* + * Find a leftmost key in rbtree in the specified range (if add == 0) + * or create a new key (if add != 0). 
+ */ +static struct snapshot_range * +rb_find_insert_snapshot(struct dm_exception_store *s, + mikulas_snapid_t from, mikulas_snapid_t to, int add) +{ + struct snapshot_range *new; + struct snapshot_range *found = NULL; + struct rb_node **p = &s->active_snapshots.rb_node; + struct rb_node *parent = NULL; + while (*p) { + parent = *p; +#define rn rb_entry(parent, struct snapshot_range, node) + if (to < rn->from) { +go_left: + p = &rn->node.rb_left; + } else if (from > rn->to) { + p = &rn->node.rb_right; + } else { + if (!add) { + found = rn; + /* If there is range query, we need to find the leftmost node */ + if (from < rn->from) + goto go_left; + break; + } else { + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("rb_insert_snapshot: inserting overlapping entry: " + "(%llx,%llx) overlaps (%llx,%llx)", + (unsigned long long)from, + (unsigned long long)to, + (unsigned long long)rn->from, + (unsigned long long)rn->to)); + return NULL; + } + } +#undef rn + } + if (!add) + return found; + + dm_multisnap_status_assert_locked(s->dm); + + new = kmalloc(sizeof(struct snapshot_range), GFP_KERNEL); + if (!new) { + DM_MULTISNAP_SET_ERROR(s->dm, -ENOMEM, + ("rb_insert_snapshot: can't allocate memory for snapshot descriptor")); + return NULL; + } + + new->from = from; + new->to = to; + + rb_link_node(&new->node, parent, p); + rb_insert_color(&new->node, &s->active_snapshots); + + return new; +} + +/* + * Find a leftmost key in rbtree in the specified range. + */ +static struct snapshot_range * +rb_find_snapshot(struct dm_exception_store *s, + mikulas_snapid_t from, mikulas_snapid_t to) +{ + return rb_find_insert_snapshot(s, from, to, 0); +} + +/* + * Insert a range to rbtree. It must not overlap with existing entries. 
+ */ +static int rb_insert_snapshot_unlocked(struct dm_exception_store *s, + mikulas_snapid_t from, + mikulas_snapid_t to) +{ + struct snapshot_range *rn; + rn = rb_find_insert_snapshot(s, from, to, 1); + if (!rn) + return -1; + return 0; +} + +/* + * Hold the lock and insert a range to rbtree. It must not overlap with + * existing entries. + */ +static int rb_insert_snapshot(struct dm_exception_store *s, + mikulas_snapid_t from, mikulas_snapid_t to) +{ + int r; + dm_multisnap_status_lock(s->dm); + r = rb_insert_snapshot_unlocked(s, from, to); + dm_multisnap_status_unlock(s->dm); + return r; +} + +/* + * "from" must be last entry in the existing range. This function extends the + * range. The extended area must not overlap with another entry. + */ +static int rb_extend_range(struct dm_exception_store *s, + mikulas_snapid_t from, mikulas_snapid_t to) +{ + struct snapshot_range *rn; + rn = rb_find_insert_snapshot(s, from, from, 0); + if (!rn) { + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("rb_extend_range: snapshot %llx not found", + (unsigned long long)from)); + return -1; + } + if (rn->to != from) { + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("rb_extend_range: bad attempt to extend range: " + "%llx >= %llx", + (unsigned long long)rn->to, + (unsigned long long)from)); + return -1; + } + dm_multisnap_status_lock(s->dm); + rn->to = to; + dm_multisnap_status_unlock(s->dm); + return 0; +} + +/* + * Delete range from the rbtree. The range must be already allocated. + * + * It is valid to specify a subset of existing range, in this case, the range + * is trimmed and possibly split into two ranges. 
+ */ +static int rb_delete_range(struct dm_exception_store *s, + mikulas_snapid_t from, mikulas_snapid_t to) +{ + struct snapshot_range *sr = rb_find_snapshot(s, from, from); + + if (!sr || sr->to < to) { + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("rb_delete_range: deleting non-existing snapid " + "%llx-%llx", + (unsigned long long)from, + (unsigned long long)to)); + return -1; + } + + dm_multisnap_status_lock(s->dm); + if (sr->from < from) { + mikulas_snapid_t orig_to = sr->to; + sr->to = from - 1; + if (orig_to > to) { + if (rb_insert_snapshot_unlocked(s, to + 1, orig_to)) { + sr->to = orig_to; + dm_multisnap_status_unlock(s->dm); + return -1; + } + } + } else { + if (sr->to > to) { + sr->from = to + 1; + } else { + rb_erase(&sr->node, &s->active_snapshots); + kfree(sr); + } + } + dm_multisnap_status_unlock(s->dm); + return 0; +} + +/* + * If "snapid" is valid snapshot ID, return snapid. + * Otherwise, return the next valid snapshot ID. + * If there is no next valid snapshot ID, return DM_SNAPID_T_ORIGIN. + */ +snapid_t dm_multisnap_get_next_snapid(struct dm_exception_store *s, + snapid_t snapid) +{ + struct snapshot_range *rn; + + rn = rb_find_snapshot(s, snapid, DM_SNAPID_T_MAX); + if (!rn) + return DM_SNAPID_T_ORIGIN; + if (rn->from > snapid) + snapid = rn->from; + if (rn->to >= (snapid | DM_MIKULAS_SUBSNAPID_MASK)) + return snapid | DM_MIKULAS_SUBSNAPID_MASK; + return snapid; +} + +/* + * Find next range. + * A wrapper around rb_find_snapshot that is useable in other object files + * that don't know about struct snapshot_range. + */ +int dm_multisnap_find_next_snapid_range(struct dm_exception_store *s, + snapid_t snapid, snapid_t *from, + snapid_t *to) +{ + struct snapshot_range *rn; + rn = rb_find_snapshot(s, snapid, DM_SNAPID_T_MAX); + if (!rn) + return 0; + *from = rn->from; + *to = rn->to; + return 1; +} + +/* + * Return true, if the snapid is master (not subsnapshot). 
+ */ +static int dm_multisnap_snapid_is_master(snapid_t snapid) +{ + return (snapid & DM_MIKULAS_SUBSNAPID_MASK) == DM_MIKULAS_SUBSNAPID_MASK; +} + +/* + * Find the next subsnapshot that is to be created for a snapshot with snapid. + * + * If it returns snapid, then no subsnapshot can be created. + */ +snapid_t dm_multisnap_find_next_subsnapshot(struct dm_exception_store *s, + snapid_t snapid) +{ +#ifdef CONFIG_DM_MULTISNAPSHOT_MIKULAS_SNAP_OF_SNAP + mikulas_snapid_t find_from, find_to; + if (unlikely(!dm_multisnap_snapid_is_master(snapid))) + return snapid; + if (!dm_multisnap_find_next_snapid_range(s, snapid, + &find_from, &find_to)) + BUG(); + snapid &= ~DM_MIKULAS_SUBSNAPID_MASK; + if (snapid < find_from) + snapid = find_from; +#endif + return snapid; +} + +/* + * Deallocate the whole rbtree. + */ +void dm_multisnap_destroy_snapshot_tree(struct dm_exception_store *s) +{ + struct rb_node *root; + while ((root = s->active_snapshots.rb_node)) { +#define rn rb_entry(root, struct snapshot_range, node) + rb_erase(root, &s->active_snapshots); + kfree(rn); +#undef rn + } +} + +/* + * Populate in-memory rbtree from on-disk b+tree. + */ +void dm_multisnap_read_snapshots(struct dm_exception_store *s) +{ + struct bt_key snap_key; + chunk_t ignore; + int r; + + dm_multisnap_destroy_snapshot_tree(s); + + snap_key.snap_from = 0; +find_next: + snap_key.snap_to = DM_SNAPID_T_MAX; + snap_key.chunk = DM_CHUNK_T_SNAP_PRESENT; + + r = dm_multisnap_find_in_btree(s, &snap_key, &ignore); + + if (unlikely(r < 0)) + return; + + if (r) { + if (unlikely(snap_key.snap_to > DM_SNAPID_T_MAX)) { + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_read_snapshots: invalid snapshot id")); + return; + } + r = rb_insert_snapshot(s, snap_key.snap_from, snap_key.snap_to); + if (unlikely(r < 0)) + return; + snap_key.snap_from = snap_key.snap_to + 1; + goto find_next; + } +} + +/* + * Allocate next snapshot ID. + * If snap_of_snap != 0, allocate a subsnapshot ID for snapshot "master". 
+ * Otherwise, allocate a new master snapshot ID. + */ +int dm_multisnap_allocate_snapid(struct dm_exception_store *s, + snapid_t *snapid, int snap_of_snap, snapid_t master) +{ + if (snap_of_snap) { +#ifdef CONFIG_DM_MULTISNAPSHOT_MIKULAS_SNAP_OF_SNAP + if (!dm_multisnap_snapid_is_master(master)) { + DMERR("dm_multisnap_allocate_snapid: only two levels of snapshots are supported"); + return -EOPNOTSUPP; + } + *snapid = dm_multisnap_find_next_subsnapshot(s, master); + if (*snapid == master) { + DMERR("dm_multisnap_allocate_snapid: 2^32 snapshots-of-snapshot limit reached"); + return -ENOSPC; + } + return 0; +#else + DMERR("dm_multisnap_allocate_snapid: snapshots of snapshots not supported with 32-bit snapshot IDs"); + return -EOPNOTSUPP; +#endif + } + *snapid = ((mikulas_snapid_t)s->snapshot_num << DM_MIKULAS_SNAPID_STEP_BITS) | DM_MIKULAS_SUBSNAPID_MASK; + if (s->snapshot_num == 0xffffffff || *snapid > DM_SNAPID_T_MAX) { + DMERR("dm_multisnap_allocate_snapid: 2^32 snapshot limit reached"); + return -ENOSPC; + } + return 0; +} + +/* + * Add a snapid range to in-memory rbtree and on-disk b+tree. + * Optionally, merge with the previous range. Don't merge with the next. 
+ */ +static int dm_multisnap_create_snapid_range(struct dm_exception_store *s, + snapid_t from, snapid_t to) +{ + int r; + struct bt_key snap_key; + + if (from && dm_multisnap_snapshot_exists(s->dm, from - 1)) { + /* Extend existing key range */ + + r = rb_extend_range(s, from - 1, to); + + if (r < 0) + return dm_multisnap_has_error(s->dm); + + snap_key.chunk = DM_CHUNK_T_SNAP_PRESENT; + snap_key.snap_from = from - 1; + snap_key.snap_to = to; + dm_multisnap_extend_btree_entry(s, &snap_key); + } else { + /* Add new entry */ + + r = rb_insert_snapshot(s, from, to); + if (r < 0) + return dm_multisnap_has_error(s->dm); + + snap_key.chunk = DM_CHUNK_T_SNAP_PRESENT; + snap_key.snap_from = from; + snap_key.snap_to = to; + dm_multisnap_add_to_btree(s, &snap_key, 0); + } + if (dm_multisnap_has_error(s->dm)) + return dm_multisnap_has_error(s->dm); + + dm_multisnap_transition_mark(s); + + return 0; +} + +/* + * Delete a snapid range from in-memory rbtree and on-disk b+tree. + */ +static int dm_multisnap_delete_snapid_range(struct dm_exception_store *s, + snapid_t from, snapid_t to) +{ + int r; + struct bt_key snap_key; + chunk_t ignore; + + r = rb_delete_range(s, from, to); + if (r < 0) + return dm_multisnap_has_error(s->dm); + + snap_key.chunk = DM_CHUNK_T_SNAP_PRESENT; + snap_key.snap_from = from; + snap_key.snap_to = from; + + r = dm_multisnap_find_in_btree(s, &snap_key, &ignore); + if (r <= 0) { + if (!r) + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_delete_snapshot: snapshot id %llx not found in b-tree", + (unsigned long long)from)); + return dm_multisnap_has_error(s->dm); + } + if (snap_key.snap_to < to) { + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_delete_snapshot: snapshot id %llx-%llx not found in b-tree", + (unsigned long long)from, (unsigned long long)to)); + return -EFSERROR; + } + + if (snap_key.snap_from < from) { + snap_key.snap_from = from; + dm_multisnap_restrict_btree_entry(s, &snap_key); + + dm_multisnap_transition_mark(s); + 
+ if (dm_multisnap_has_error(s->dm)) + return dm_multisnap_has_error(s->dm); + + if (snap_key.snap_to > to) { + snap_key.snap_from = to + 1; + dm_multisnap_add_to_btree(s, &snap_key, 0); + } + } else { + if (snap_key.snap_to > to) { + snap_key.snap_to = to; + dm_multisnap_restrict_btree_entry(s, &snap_key); + } else { + dm_multisnap_delete_from_btree(s, &snap_key); + } + } + + dm_multisnap_transition_mark(s); + + return 0; +} + +/* + * Create a subsnapshot. + */ +static int dm_multisnap_create_subsnapshot(struct dm_exception_store *s, snapid_t snapid) +{ + int r; + snapid_t master, next_sub; + + master = snapid | DM_MIKULAS_SUBSNAPID_MASK; + if (!dm_multisnap_snapshot_exists(s->dm, master)) { + DMERR("dm_multisnap_create_subsnapshot: master snapshot with id %llx doesn't exist", + (unsigned long long)snapid); + return -EINVAL; + } + + next_sub = dm_multisnap_find_next_subsnapshot(s, master); + if (snapid < next_sub) { + DMERR("dm_multisnap_create_subsnapshot: invalid subsnapshot id %llx " + "(allowed range %llx - %llx)", + (unsigned long long)snapid, + (unsigned long long)next_sub, + (unsigned long long)master - 1); + return -EINVAL; + } + + r = dm_multisnap_delete_snapid_range(s, next_sub, snapid); + if (r) + return r; + + r = dm_multisnap_create_snapid_range(s, snapid, snapid); + if (r) + return r; + + dm_multisnap_commit(s); + + return 0; +} + +/* + * Create a snapshot or subsnapshot with a given snapid. 
+ */ +int dm_multisnap_create_snapshot(struct dm_exception_store *s, snapid_t snapid) +{ + int r; + + if (!dm_multisnap_snapid_is_master(snapid)) + return dm_multisnap_create_subsnapshot(s, snapid); + + if ((snapid >> DM_MIKULAS_SNAPID_STEP_BITS) < s->snapshot_num || snapid > DM_SNAPID_T_MAX) { + DMERR("dm_multisnap_create_snapshot: invalid snapshot id %llx (allowed range %llx - %llx)", + (unsigned long long)snapid, + (unsigned long long)s->snapshot_num, + (unsigned long long)DM_SNAPID_T_MAX); + return -EINVAL; + } + if (dm_multisnap_snapshot_exists(s->dm, snapid)) { + DMERR("dm_multisnap_create_snapshot: snapshot with id %llx already exists", + (unsigned long long)snapid); + return -EINVAL; + } + + r = dm_multisnap_create_snapid_range(s, snapid - DM_MIKULAS_SUBSNAPID_MASK, snapid); + if (r) + return r; + + s->snapshot_num = (snapid >> DM_MIKULAS_SNAPID_STEP_BITS) + 1; + dm_multisnap_commit(s); + + return 0; +} + +/* + * Delete a snapshot or subsnapshot with a given snapid. + * Spawn background scanning for entries to delete. + */ +int dm_multisnap_delete_snapshot(struct dm_exception_store *s, snapid_t snapid) +{ + int r; + + if (!dm_multisnap_snapshot_exists(s->dm, snapid)) { + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_delete_snapshot: snapshot id %llx not found in rb-tree", + (unsigned long long)snapid)); + return -EFSERROR; + } + + r = dm_multisnap_delete_snapid_range(s, dm_multisnap_find_next_subsnapshot(s, snapid), snapid); + if (r) + return r; + + s->flags |= DM_MULTISNAP_FLAG_PENDING_DELETE; + dm_multisnap_queue_work(s->dm, &s->delete_work); + + dm_multisnap_commit(s); + + return 0; +} + +/* + * Sort the snapids for creating. Sort them linearly except that the master + * goes before all subsnapshots. 
+ */ +int dm_multisnap_compare_snapids_for_create(const void *p1, const void *p2) +{ + mikulas_snapid_t s1 = *(const snapid_t *)p1; + mikulas_snapid_t s2 = *(const snapid_t *)p2; + mikulas_snapid_t ms1 = s1 >> DM_MIKULAS_SNAPID_STEP_BITS; + mikulas_snapid_t ms2 = s2 >> DM_MIKULAS_SNAPID_STEP_BITS; + int m1 = dm_multisnap_snapid_is_master(s1); + int m2 = dm_multisnap_snapid_is_master(s2); + if (ms1 < ms2) + return -1; + if (ms1 > ms2) + return 1; + if (m1 != m2) + return m2 - m1; + if (s1 < s2) + return -1; + if (s1 > s2) + return 1; + return 0; +} + +/* + * Return the number of total, allocated and metadata chunks. + */ +void dm_multisnap_get_space(struct dm_exception_store *s, + unsigned long long *chunks_total, + unsigned long long *chunks_allocated, + unsigned long long *chunks_metadata_allocated) +{ + dm_multisnap_status_assert_locked(s->dm); + *chunks_total = s->dev_size; + *chunks_allocated = s->total_allocated; + *chunks_metadata_allocated = s->total_allocated - s->data_allocated; +} + +#ifdef CONFIG_DM_MULTISNAPSHOT_MIKULAS_SNAP_OF_SNAP + +/* + * Convert snapid to user-friendly format (so that he won't see things like + * 4294967296). + */ +void dm_multisnap_print_snapid(struct dm_exception_store *s, char *string, + unsigned maxlen, snapid_t snapid) +{ + unsigned master = snapid >> DM_MIKULAS_SNAPID_STEP_BITS; + unsigned subsnap = snapid & DM_MIKULAS_SUBSNAPID_MASK; + if (dm_multisnap_snapid_is_master(snapid)) + snprintf(string, maxlen, "%u", master); + else + snprintf(string, maxlen, "%u.%u", master, subsnap); +} + +/* + * Convert snapid from user-friendly format to the internal 64-bit number. 
+ */ +int dm_multisnap_read_snapid(struct dm_exception_store *s, char *string, + snapid_t *snapid, char **error) +{ + unsigned long master; + unsigned long subsnap; + if (!string[0]) { +err: + *error = "Invalid snapshot id"; + return -EINVAL; + } + + master = simple_strtoul(string, &string, 10); + + if (!string[0]) + subsnap = DM_MIKULAS_SUBSNAPID_MASK; + else { + if (string[0] != '.' || !string[1]) + goto err; + string++; + subsnap = simple_strtoul(string, &string, 10); + if (string[0]) + goto err; + if (subsnap >= DM_MIKULAS_SUBSNAPID_MASK) { +bad_number: + *error = "Number out of range"; + return -EINVAL; + } + } + + if (master >= DM_SNAPID_T_MAX >> DM_MIKULAS_SNAPID_STEP_BITS) + goto bad_number; + + *snapid = (mikulas_snapid_t)master << DM_MIKULAS_SNAPID_STEP_BITS | subsnap; + return 0; +} + +#endif -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 22+ messages in thread
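The snapshot ID encoding used by dm_multisnap_print_snapid() and dm_multisnap_read_snapid() above can be illustrated with a small userspace sketch. The values of DM_MIKULAS_SNAPID_STEP_BITS and DM_MIKULAS_SUBSNAPID_MASK are not part of this patch, so the 32-bit split below is an assumption (suggested by the "2^32" messages and the 4294967296 example in the comment), and the sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed layout: master number in the high 32 bits, subsnapshot in the low 32. */
#define SKETCH_STEP_BITS 32
#define SKETCH_SUB_MASK  0xffffffffULL

/* A master snapshot is marked by an all-ones subsnapshot field. */
static int sketch_is_master(uint64_t snapid)
{
	return (snapid & SKETCH_SUB_MASK) == SKETCH_SUB_MASK;
}

/* Combine a master number and a subsnapshot number into one 64-bit ID. */
static uint64_t sketch_make_snapid(uint32_t master, uint32_t sub)
{
	return ((uint64_t)master << SKETCH_STEP_BITS) | sub;
}

/* Format as "master" for master snapshots and "master.sub" otherwise. */
static void sketch_print_snapid(char *buf, size_t len, uint64_t snapid)
{
	unsigned master = (unsigned)(snapid >> SKETCH_STEP_BITS);
	unsigned sub = (unsigned)(snapid & SKETCH_SUB_MASK);

	assert(len > 0);
	if (sketch_is_master(snapid))
		snprintf(buf, len, "%u", master);
	else
		snprintf(buf, len, "%u.%u", master, sub);
}
```

Under this assumed layout, master snapshot 3 is stored with all low bits set and is shown to the user as "3", while its subsnapshot 7 is shown as "3.7". This also matches the ordering in the patch where all subsnapshot IDs of a master sort numerically below the master's own ID.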
* [PATCH 12/14] dm-multisnap-mikulas-common 2010-03-02 0:23 [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer ` (10 preceding siblings ...) 2010-03-02 0:23 ` [PATCH 11/14] dm-multisnap-mikulas-snaps Mike Snitzer @ 2010-03-02 0:23 ` Mike Snitzer 2010-03-02 0:23 ` [PATCH 13/14] dm-multisnap-mikulas-config Mike Snitzer ` (3 subsequent siblings) 15 siblings, 0 replies; 22+ messages in thread From: Mike Snitzer @ 2010-03-02 0:23 UTC (permalink / raw) To: dm-devel; +Cc: Mikulas Patocka From: Mikulas Patocka <mpatocka@redhat.com> The topmost file used to initialize and unload the exception store. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- drivers/md/dm-multisnap-mikulas.c | 760 +++++++++++++++++++++++++++++++++++++ 1 files changed, 760 insertions(+), 0 deletions(-) create mode 100644 drivers/md/dm-multisnap-mikulas.c diff --git a/drivers/md/dm-multisnap-mikulas.c b/drivers/md/dm-multisnap-mikulas.c new file mode 100644 index 0000000..ec6e30f --- /dev/null +++ b/drivers/md/dm-multisnap-mikulas.c @@ -0,0 +1,760 @@ +/* + * Copyright (C) 2009 Red Hat Czech, s.r.o. + * + * Mikulas Patocka <mpatocka@redhat.com> + * + * This file is released under the GPL. + */ + +#include "dm-multisnap-mikulas.h" + +/* + * Initialize in-memory structures, belonging to the commit block. 
+ */ +static void init_commit_block(struct dm_exception_store *s) +{ + int i; + + dm_multisnap_init_freelist(s->freelist, s->chunk_size); + + s->snapshot_num = 0; + s->total_allocated = 0; + s->data_allocated = 0; + s->bitmap_root = 0; + s->alloc_rover = 0; + s->freelist_ptr = 0; + s->delete_rover_chunk = 0; + s->delete_rover_snapid = 0; + s->bt_root = 0; + s->bt_depth = 0; + s->flags = 0; + + for (i = 0; i < TMP_REMAP_HASH_SIZE; i++) + INIT_HLIST_HEAD(&s->tmp_remap[i]); + s->n_used_tmp_remaps = 0; + INIT_LIST_HEAD(&s->used_bitmap_tmp_remaps); + INIT_LIST_HEAD(&s->used_bt_tmp_remaps); + INIT_LIST_HEAD(&s->free_tmp_remaps); + + for (i = 0; i < N_REMAPS; i++) { + struct tmp_remap *t = &s->tmp_remap_store[i]; + list_add(&t->list, &s->free_tmp_remaps); + } + + s->dev_size = 0; + s->bitmap_depth = 0; + s->btree_entries = dm_multisnap_btree_entries(s->chunk_size); +} + +/* + * Load the commit block specified in s->valid_commit_block to memory + * and populate in-memory structures. + */ +static void load_commit_block(struct dm_exception_store *s) +{ + struct dm_buffer *bp; + struct multisnap_commit_block *cb; + __u64 dev_size; + int bitmap_depth; + unsigned i; + + dm_multisnap_clear_uncommitted(s); + + cb = dm_bufio_read(s->bufio, s->valid_commit_block, &bp); + if (IS_ERR(cb)) { + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(cb), + ("load_commit_block: can't re-read commit block %llx", + (unsigned long long)s->valid_commit_block)); + return; + } + if (cb->signature != CB_SIGNATURE) { + dm_bufio_release(bp); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("load_commit_block: bad signature when re-reading commit block %llx", + (unsigned long long)s->valid_commit_block)); + return; + } + + init_commit_block(s); + + dev_size = read_48(cb, dev_size); + s->snapshot_num = le32_to_cpu(cb->snapshot_num); + s->total_allocated = read_48(cb, total_allocated); + s->data_allocated = read_48(cb, data_allocated); + s->bitmap_root = read_48(cb, bitmap_root); + s->alloc_rover = read_48(cb, 
alloc_rover); + s->freelist_ptr = read_48(cb, freelist); + s->delete_rover_chunk = read_48(cb, delete_rover); + s->delete_rover_snapid = 0; + s->bt_root = read_48(cb, bt_root); + s->bt_depth = cb->bt_depth; + s->flags = cb->flags; + + if (s->bt_depth > MAX_BT_DEPTH || !s->bt_depth) { + dm_bufio_release(bp); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("load_commit_block: invalid b+-tree depth in commit block %llx", + (unsigned long long)s->valid_commit_block)); + return; + } + + INIT_LIST_HEAD(&s->free_tmp_remaps); + for (i = 0; i < N_REMAPS; i++) { + struct tmp_remap *t = &s->tmp_remap_store[i]; + if (read_48(&cb->tmp_remap[i], old)) { + t->old = read_48(&cb->tmp_remap[i], old); + t->new = read_48(&cb->tmp_remap[i], new); + t->uncommitted = 0; + t->bitmap_idx = le32_to_cpu(cb->tmp_remap[i].bitmap_idx); + hlist_add_head(&t->hash_list, &s->tmp_remap[TMP_REMAP_HASH(t->old)]); + if (t->bitmap_idx == CB_BITMAP_IDX_NONE) + list_add(&t->list, &s->used_bt_tmp_remaps); + else + list_add(&t->list, &s->used_bitmap_tmp_remaps); + s->n_used_tmp_remaps++; + } else { + list_add(&t->list, &s->free_tmp_remaps); + } + } + + dm_bufio_release(bp); + + if ((chunk_t)(dev_size + s->cb_stride) < (chunk_t)dev_size) { + DM_MULTISNAP_SET_ERROR(s->dm, -ERANGE, + ("load_commit_block: device is too large. Compile kernel with 64-bit sector numbers")); + return; + } + bitmap_depth = dm_multisnap_bitmap_depth(s->chunk_shift, dev_size); + if (bitmap_depth < 0) { + DM_MULTISNAP_SET_ERROR(s->dm, bitmap_depth, + ("load_commit_block: device is too large")); + return; + } + s->dev_size = dev_size; + s->bitmap_depth = bitmap_depth; + + dm_multisnap_load_freelist(s); +} + +/* + * Find the valid commit block. + * + * Read the initial commit block number from the superblock and then scan the + * commit blocks linearly as long as the sequence number in the commit block + * increases. 
+ */ +static void find_commit_block(struct dm_exception_store *s) +{ + struct dm_buffer *bp; + struct multisnap_commit_block *cb; + chunk_t cb_addr = s->sb_commit_block; + __u64 sequence; + __u64 dev_size; + s->valid_commit_block = 0; + s->commit_sequence = 0; + +try_next: + cb = dm_bufio_read(s->bufio, cb_addr, &bp); + if (IS_ERR(cb)) { + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(cb), + ("find_commit_block: can't read commit block %llx", + (unsigned long long)cb_addr)); + return; + } + if (cb->signature != CB_SIGNATURE) { + dm_bufio_release(bp); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("find_commit_block: bad signature on commit block %llx", + (unsigned long long)cb_addr)); + return; + } + + sequence = le64_to_cpu(cb->sequence); + dev_size = read_48(cb, dev_size); + + dm_bufio_release(bp); + + if (sequence > s->commit_sequence) { + s->commit_sequence = sequence; + s->valid_commit_block = cb_addr; + if ((__u64)cb_addr + s->cb_stride < dev_size) { + cb_addr += s->cb_stride; + goto try_next; + } + } + if (!s->valid_commit_block) { + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("find_commit_block: no valid commit block")); + return; + } +} + +/* + * Return device size in chunks. + */ +static int get_size(struct dm_exception_store *s, chunk_t *size) +{ + __u64 dev_size; + dev_size = i_size_read(dm_multisnap_snapshot_bdev(s->dm)->bd_inode) >> s->chunk_shift; + *size = dev_size; + if ((chunk_t)(dev_size + s->cb_stride) < dev_size) + return -EFBIG; + + return 0; +} + +/* + * Initialize the whole snapshot store. + */ +static void initialize_device(struct dm_exception_store *s) +{ + int r; + struct dm_buffer *bp; + struct multisnap_superblock *sb; + struct multisnap_commit_block *cb; + chunk_t cb_block; + chunk_t block_to_write; + + s->cb_stride = CB_STRIDE_DEFAULT; + + r = get_size(s, &s->dev_size); + if (r) { + DM_MULTISNAP_SET_ERROR(s->dm, r, + ("initialize_device: device is too large. 
Compile kernel with 64-bit sector numbers")); + return; + } + + s->total_allocated = 0; + s->data_allocated = 0; + + block_to_write = SB_BLOCK + 1; + + /* Write btree */ + dm_multisnap_create_btree(s, &block_to_write); + if (dm_multisnap_has_error(s->dm)) + return; + + /* Write bitmaps */ + dm_multisnap_create_bitmaps(s, &block_to_write); + if (dm_multisnap_has_error(s->dm)) + return; + + s->dev_size = block_to_write; + + /* Write commit blocks */ + if (FIRST_CB_BLOCK >= s->dev_size) { + DM_MULTISNAP_SET_ERROR(s->dm, -ENOSPC, + ("initialize_device: device is too small")); + return; + } + for (cb_block = FIRST_CB_BLOCK; cb_block < s->dev_size; cb_block += s->cb_stride) { + cb = dm_bufio_new(s->bufio, cb_block, &bp); + if (IS_ERR(cb)) { + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(cb), + ("initialize_device: can't allocate commit block at %llx", + (unsigned long long)cb_block)); + return; + } + memset(cb, 0, s->chunk_size); + cb->signature = CB_SIGNATURE; + cb->sequence = cpu_to_le64(cb_block == FIRST_CB_BLOCK); + if (cb_block == FIRST_CB_BLOCK) { + cb->snapshot_num = cpu_to_le32(0); + write_48(cb, dev_size, s->dev_size); + write_48(cb, total_allocated, s->total_allocated); + write_48(cb, data_allocated, s->data_allocated); + write_48(cb, bitmap_root, s->bitmap_root); + write_48(cb, freelist, 0); + write_48(cb, delete_rover, 0); + write_48(cb, bt_root, s->bt_root); + cb->bt_depth = s->bt_depth; + cb->flags = 0; + } + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + } + r = dm_bufio_write_dirty_buffers(s->bufio); + if (r) { + DM_MULTISNAP_SET_ERROR(s->dm, r, + ("initialize_device: write error when initializing device")); + return; + } + + /* Write super block */ + sb = dm_bufio_new(s->bufio, SB_BLOCK, &bp); + if (IS_ERR(sb)) { + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(sb), + ("initialize_device: can't allocate super block")); + return; + } + memset(sb, 0, s->chunk_size); + sb->signature = SB_SIGNATURE; + sb->chunk_size = cpu_to_le32(s->chunk_size); + sb->cb_stride = 
cpu_to_le32(s->cb_stride); + sb->error = cpu_to_le32(0); + sb->commit_block = cpu_to_le64(FIRST_CB_BLOCK); + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + r = dm_bufio_write_dirty_buffers(s->bufio); + if (r) { + DM_MULTISNAP_SET_ERROR(s->dm, r, + ("initialize_device: can't write super block")); + return; + } +} + +/* + * Extend the snapshot store if its size increases. + * + * Note: the size can never decrease. + */ +static void extend_exception_store(struct dm_exception_store *s, chunk_t new_size) +{ + struct dm_buffer *bp; + chunk_t cb_block; + struct multisnap_commit_block *cb; + + /* Write commit blocks */ + for (cb_block = FIRST_CB_BLOCK; cb_block < new_size; cb_block += s->cb_stride) { + cond_resched(); + if (cb_block < s->dev_size) + continue; + cb = dm_bufio_new(s->bufio, cb_block, &bp); + if (IS_ERR(cb)) { + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(cb), + ("extend_exception_store: can't allocate commit block at %llx", + (unsigned long long)cb_block)); + return; + } + memset(cb, 0, s->chunk_size); + cb->signature = CB_SIGNATURE; + cb->sequence = cpu_to_le64(0); + dm_bufio_mark_buffer_dirty(bp); + dm_bufio_release(bp); + } + + dm_multisnap_extend_bitmaps(s, new_size); + + s->valid_commit_block = (chunk_t)-1; + + dm_multisnap_commit(s); +} + +/* + * Read the super block and possibly initialize the device. + * + * If the super block contains a valid signature, we assume that the device + * is initialized and read all the data. + * If the super block is zeroed, we do initialization. + * Otherwise we report an error.
+ */ +static int read_super(struct dm_exception_store *s, char **error) +{ + struct dm_buffer *bp; + struct multisnap_superblock *sb; + int initialized; + __s32 e; + + init_commit_block(s); + + initialized = 0; +re_read: + sb = dm_bufio_read(s->bufio, SB_BLOCK, &bp); + if (IS_ERR(sb)) { + *error = "Could not read superblock"; + return PTR_ERR(sb); + } + + if (sb->signature != SB_SIGNATURE) { + int i; + if (initialized) { + *error = "Invalid signature after initialization"; + return -EIO; + } + for (i = 0; i < 1 << SECTOR_SHIFT; i++) { + if (((char *)sb)[i]) { + dm_bufio_release(bp); + *error = "Uninitialized device"; + return -ENXIO; + } + } + dm_bufio_release(bp); + initialize_device(s); + if (dm_multisnap_has_error(s->dm)) { + *error = "Can't initialize device"; + return dm_multisnap_has_error(s->dm); + } + initialized = 1; + goto re_read; + } + if (le32_to_cpu(sb->chunk_size) != s->chunk_size) { + dm_bufio_release(bp); + *error = "Bad chunk size"; + return -EINVAL; + } + s->cb_stride = le32_to_cpu(sb->cb_stride); + if (s->cb_stride <= 1) { + dm_bufio_release(bp); + *error = "Bad commit block stride in superblock"; + return -EFSERROR; + } + s->sb_commit_block = le64_to_cpu(sb->commit_block); + e = le32_to_cpu(sb->error); + dm_bufio_release(bp); + + find_commit_block(s); + + if (dm_multisnap_has_error(s->dm)) { + if (dm_multisnap_drop_on_error(s->dm)) + return 0; + *error = "Unable to find commit block"; + return dm_multisnap_has_error(s->dm); + } + + load_commit_block(s); + + if (dm_multisnap_has_error(s->dm)) { + if (dm_multisnap_drop_on_error(s->dm)) + return 0; + *error = "Unable to load commit block"; + return dm_multisnap_has_error(s->dm); + } + + if (e < 0) { + /* Don't read the B+-tree if there was an error */ + DM_MULTISNAP_SET_ERROR(s->dm, e, + ("read_super: activating invalidated snapshot store, error %d", e)); + return 0; + } + + dm_multisnap_read_snapshots(s); + if (dm_multisnap_has_error(s->dm)) { + if (dm_multisnap_drop_on_error(s->dm)) + return 0; 
+ *error = "Could not read snapshot list"; + return dm_multisnap_has_error(s->dm); + } + + return 0; +} + +/* + * This callback is called each time the generic code acquires + * the master lock. Thus, it is guaranteed that other operations won't race with + * this callback. + * + * Currently, we test if the device size has grown, and if so, we extend the + * exception store. + * + * If the device size has shrunk, we report an error and stop further + * operations. + */ +static void dm_multisnap_mikulas_lock_acquired(struct dm_exception_store *s, int flags) +{ + int r; + chunk_t new_size; + + if (!dm_multisnap_can_commit(s->dm)) + return; + + r = get_size(s, &new_size); + if (unlikely(r)) + return; + + if (unlikely(new_size != s->dev_size)) { + if (unlikely(new_size < s->dev_size)) { + DM_MULTISNAP_SET_ERROR(s->dm, -EINVAL, + ("dm_multisnap_mikulas_lock_acquired: device shrank")); + return; + } + extend_exception_store(s, new_size); + } +} + +/* + * Debug code. + */ + +/*#define PRINT_BTREE*/ + +#ifdef PRINT_BTREE +static int print_btree_callback(struct dm_exception_store *s, + struct dm_multisnap_bt_node *node, + struct dm_multisnap_bt_entry *bt, void *cookie) +{ + printk(KERN_DEBUG "entry: %llx, %llx-%llx -> %llx\n", + (unsigned long long)read_48(bt, orig_chunk), + (unsigned long long)cpu_to_mikulas_snapid(bt->snap_from), + (unsigned long long)cpu_to_mikulas_snapid(bt->snap_to), + (unsigned long long)read_48(bt, new_chunk)); + return 0; +} + +static void print_btree(struct dm_exception_store *s) +{ + struct bt_key key = { 0, 0, 0 }; + int r = dm_multisnap_list_btree(s, &key, print_btree_callback, NULL); + printk(KERN_DEBUG "list ended: %d\n", r); +} +#endif + +/*#define PRINT_BITMAPS*/ + +#ifdef PRINT_BITMAPS +static void print_bitmaps(struct dm_exception_store *s) +{ + chunk_t c; + printk(KERN_DEBUG "allocated:"); + for (c = 0; c < s->dev_size; c += s->chunk_size * 8) { + struct dm_buffer *bp; + unsigned i; + void *bmp =
dm_multisnap_map_bitmap(s, c >> (s->chunk_shift + 3), + &bp, NULL, NULL); + if (!bmp) + continue; + for (i = 0; i < s->chunk_size * 8; i++) { + if (generic_test_le_bit(i, bmp)) { + chunk_t block = c + i; + if (!dm_multisnap_is_commit_block(s, block)) + printk(" %llx", (unsigned long long)block); + cond_resched(); + } + } + + dm_bufio_release(bp); + } + printk("\n"); +} +#endif + +/* + * The initialization callback. + * Parse arguments, allocate structures and call read_super to read the data + * from the disk. + */ +static int dm_multisnap_mikulas_init(struct dm_multisnap *dm, + struct dm_exception_store **sp, + unsigned argc, char **argv, char **error) +{ + int r, i; + struct dm_exception_store *s; + + s = kzalloc(sizeof(struct dm_exception_store), GFP_KERNEL); + if (!s) { + *error = "Could not allocate private area"; + r = -ENOMEM; + goto bad_private; + } + *sp = s; + + s->dm = dm; + s->chunk_size = dm_multisnap_chunk_size(dm); + s->chunk_shift = ffs(s->chunk_size) - 1; + + s->active_snapshots = RB_ROOT; + s->n_preallocated_blocks = 0; + s->query_active = 0; + + s->delete_work.work = dm_multisnap_background_delete; + s->delete_work.queued = 0; + s->delete_commit_count = 0; + + s->cache_threshold = 0; + s->cache_limit = 0; + + for (i = 0; i < UNCOMMITTED_BLOCK_HASH_SIZE; i++) + INIT_HLIST_HEAD(&s->uncommitted_blocks[i]); + + while (argc) { + char *string; + r = dm_multisnap_get_string(&argv, &argc, &string, error); + if (r) + goto bad_arguments; + if (!strcasecmp(string, "cache-threshold")) { + r = dm_multisnap_get_uint64(&argv, &argc, + &s->cache_threshold, error); + if (r) + goto bad_arguments; + } else if (!strcasecmp(string, "cache-limit")) { + r = dm_multisnap_get_uint64(&argv, &argc, + &s->cache_limit, error); + if (r) + goto bad_arguments; + } else { + *error = "Unknown parameter"; + r = -EINVAL; + goto bad_arguments; + } + } + + + s->tmp_chunk = vmalloc(s->chunk_size + sizeof(struct dm_multisnap_bt_entry)); + if (!s->tmp_chunk) { + *error = "Can't allocate
temporary chunk"; + r = -ENOMEM; + goto bad_tmp_chunk; + } + + s->freelist = vmalloc(s->chunk_size); + if (!s->freelist) { + *error = "Can't allocate freelist"; + r = -ENOMEM; + goto bad_freelist; + } + + s->bufio = dm_bufio_client_create(dm_multisnap_snapshot_bdev(s->dm), + s->chunk_size, 0, s->cache_threshold, + s->cache_limit); + if (IS_ERR(s->bufio)) { + *error = "Can't create bufio client"; + r = PTR_ERR(s->bufio); + goto bad_bufio; + } + + r = read_super(s, error); + if (r) + goto bad_super; + + if (s->flags & (DM_MULTISNAP_FLAG_DELETING | + DM_MULTISNAP_FLAG_PENDING_DELETE)) + dm_multisnap_queue_work(s->dm, &s->delete_work); + +#ifdef PRINT_BTREE + print_btree(s); +#endif +#ifdef PRINT_BITMAPS + print_bitmaps(s); +#endif + + return 0; + +bad_super: + dm_bufio_client_destroy(s->bufio); +bad_bufio: + vfree(s->freelist); +bad_freelist: + vfree(s->tmp_chunk); +bad_tmp_chunk: +bad_arguments: + kfree(s); +bad_private: + return r; +} + +/* + * Exit the exception store. + */ +static void dm_multisnap_mikulas_exit(struct dm_exception_store *s) +{ + int i; + + dm_multisnap_cancel_work(s->dm, &s->delete_work); + + i = 0; + while (!list_empty(&s->used_bitmap_tmp_remaps)) { + struct tmp_remap *t = list_first_entry(&s->used_bitmap_tmp_remaps, + struct tmp_remap, list); + list_del(&t->list); + hlist_del(&t->hash_list); + i++; + } + + while (!list_empty(&s->used_bt_tmp_remaps)) { + struct tmp_remap *t = list_first_entry(&s->used_bt_tmp_remaps, + struct tmp_remap, list); + list_del(&t->list); + hlist_del(&t->hash_list); + i++; + } + + BUG_ON(i != s->n_used_tmp_remaps); + while (!list_empty(&s->free_tmp_remaps)) { + struct tmp_remap *t = list_first_entry(&s->free_tmp_remaps, struct tmp_remap, list); + list_del(&t->list); + i++; + } + BUG_ON(i != N_REMAPS); + + for (i = 0; i < TMP_REMAP_HASH_SIZE; i++) + BUG_ON(!hlist_empty(&s->tmp_remap[i])); + + dm_multisnap_clear_uncommitted(s); + + dm_bufio_client_destroy(s->bufio); + vfree(s->freelist); + vfree(s->tmp_chunk); + kfree(s); 
+} + +/* + * Return exception-store specific arguments. This is used in the process of + * constructing the table returned by device mapper. + */ +static void dm_multisnap_status_table(struct dm_exception_store *s, + char *result, unsigned maxlen) +{ + int npar = 0; + if (s->cache_threshold) + npar += 2; + if (s->cache_limit) + npar += 2; + + snprintf(result, maxlen, " %d", npar); + dm_multisnap_adjust_string(&result, &maxlen); + + if (s->cache_threshold) { + snprintf(result, maxlen, " cache-threshold %llu", + (unsigned long long)s->cache_threshold); + dm_multisnap_adjust_string(&result, &maxlen); + } + if (s->cache_limit) { + snprintf(result, maxlen, " cache-limit %llu", + (unsigned long long)s->cache_limit); + dm_multisnap_adjust_string(&result, &maxlen); + } +} + +struct dm_multisnap_exception_store dm_multisnap_mikulas_store = { + .name = "mikulas", + .module = THIS_MODULE, + .init_exception_store = dm_multisnap_mikulas_init, + .exit_exception_store = dm_multisnap_mikulas_exit, + .store_lock_acquired = dm_multisnap_mikulas_lock_acquired, +#ifdef CONFIG_DM_MULTISNAPSHOT_MIKULAS_SNAP_OF_SNAP + .print_snapid = dm_multisnap_print_snapid, + .read_snapid = dm_multisnap_read_snapid, +#endif + .status_table = dm_multisnap_status_table, + .get_space = dm_multisnap_get_space, + .allocate_snapid = dm_multisnap_allocate_snapid, + .create_snapshot = dm_multisnap_create_snapshot, + .delete_snapshot = dm_multisnap_delete_snapshot, + .get_next_snapid = dm_multisnap_get_next_snapid, + .compare_snapids_for_create = dm_multisnap_compare_snapids_for_create, + .find_snapshot_chunk = dm_multisnap_find_snapshot_chunk, + .reset_query = dm_multisnap_reset_query, + .query_next_remap = dm_multisnap_query_next_remap, + .add_next_remap = dm_multisnap_add_next_remap, + .make_chunk_writeable = dm_multisnap_make_chunk_writeable, + .check_conflict = dm_multisnap_check_conflict, + .prepare_for_commit = dm_multisnap_prepare_for_commit, + .commit = dm_multisnap_commit, +}; + +static int __init
dm_multisnapshot_mikulas_module_init(void) +{ + BUG_ON(sizeof(struct multisnap_commit_block) != 512); + return dm_multisnap_register_exception_store(&dm_multisnap_mikulas_store); +} + +static void __exit dm_multisnapshot_mikulas_module_exit(void) +{ + dm_multisnap_unregister_exception_store(&dm_multisnap_mikulas_store); +} + +module_init(dm_multisnapshot_mikulas_module_init); +module_exit(dm_multisnapshot_mikulas_module_exit); + +MODULE_DESCRIPTION(DM_NAME " multisnapshot Mikulas' exceptions store"); +MODULE_AUTHOR("Mikulas Patocka"); +MODULE_LICENSE("GPL"); -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 22+ messages in thread
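A note on find_commit_block() in the patch above: the scan starts at the commit block named in the superblock and steps forward by cb_stride for as long as the sequence number keeps increasing, remembering the last block that did increase. The following is a minimal userspace sketch of that loop, not the driver code. struct toy_commit_block, toy_find_commit_block() and the in-memory array standing in for the device are names assumed here for illustration; signature checks and bufio error handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for one on-disk commit block. */
struct toy_commit_block {
	uint64_t sequence;	/* 0 means "never committed" */
	uint64_t dev_size;	/* store size in chunks, recorded at commit time */
};

/*
 * Walk the commit blocks from 'start' in steps of 'stride' while the
 * sequence number keeps increasing. Returns the index of the newest
 * commit block, or 0 if no block had a nonzero sequence (index 0 is
 * the superblock in this toy layout, so it can never be a valid hit).
 */
static size_t toy_find_commit_block(const struct toy_commit_block *dev,
				    size_t start, size_t stride)
{
	size_t addr = start;
	size_t valid = 0;
	uint64_t best = 0;

	assert(stride > 0);

	for (;;) {
		const struct toy_commit_block *cb = &dev[addr];

		/* Sequence stopped increasing: the previous hit is newest. */
		if (cb->sequence <= best)
			break;

		best = cb->sequence;
		valid = addr;

		/* Do not step past the end of the store. */
		if (addr + stride >= cb->dev_size)
			break;
		addr += stride;
	}
	return valid;
}
```

With commit blocks at chunks 1, 5 and 9 carrying sequence numbers 5, 6 and 3, the scan stops when it reaches chunk 9 and reports chunk 5 as the newest commit.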
* [PATCH 13/14] dm-multisnap-mikulas-config 2010-03-02 0:23 [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer ` (11 preceding siblings ...) 2010-03-02 0:23 ` [PATCH 12/14] dm-multisnap-mikulas-common Mike Snitzer @ 2010-03-02 0:23 ` Mike Snitzer 2010-03-02 0:23 ` [PATCH 14/14] dm-multisnap-daniel Mike Snitzer ` (2 subsequent siblings) 15 siblings, 0 replies; 22+ messages in thread From: Mike Snitzer @ 2010-03-02 0:23 UTC (permalink / raw) To: dm-devel; +Cc: Mikulas Patocka From: Mikulas Patocka <mpatocka@redhat.com> Add the exception store to Kconfig and Makefile. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- drivers/md/Kconfig | 9 +++++++++ drivers/md/Makefile | 6 ++++++ 2 files changed, 15 insertions(+), 0 deletions(-) diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig index c3b55a8..0c3ce3b 100644 --- a/drivers/md/Kconfig +++ b/drivers/md/Kconfig @@ -259,6 +259,15 @@ config DM_MULTISNAPSHOT A submenu allows to select a specific shared snapshot store driver. +config DM_MULTISNAPSHOT_MIKULAS + tristate "Mikulas' snapshot store" + depends on DM_MULTISNAPSHOT + ---help--- + Mikulas Patocka's snapshot store. + + A B+-tree-based log-structured storage allowing unlimited + number of snapshots. 
+ config DM_MIRROR tristate "Mirror target" depends on BLK_DEV_DM diff --git a/drivers/md/Makefile b/drivers/md/Makefile index 674649c..084d632 100644 --- a/drivers/md/Makefile +++ b/drivers/md/Makefile @@ -8,6 +8,11 @@ dm-multipath-y += dm-path-selector.o dm-mpath.o dm-snapshot-y += dm-snap.o dm-exception-store.o dm-snap-transient.o \ dm-snap-persistent.o dm-multisnapshot-y += dm-multisnap.o +dm-store-mikulas-y += dm-multisnap-mikulas.o dm-multisnap-alloc.o \ + dm-multisnap-blocks.o dm-multisnap-btree.o \ + dm-multisnap-commit.o dm-multisnap-delete.o \ + dm-multisnap-freelist.o dm-multisnap-io.o \ + dm-multisnap-snaps.o dm-bufio.o dm-mirror-y += dm-raid1.o dm-log-userspace-y \ += dm-log-userspace-base.o dm-log-userspace-transfer.o @@ -43,6 +48,7 @@ obj-$(CONFIG_DM_MULTIPATH_QL) += dm-queue-length.o obj-$(CONFIG_DM_MULTIPATH_ST) += dm-service-time.o obj-$(CONFIG_DM_SNAPSHOT) += dm-snapshot.o obj-$(CONFIG_DM_MULTISNAPSHOT) += dm-multisnapshot.o +obj-$(CONFIG_DM_MULTISNAPSHOT_MIKULAS) += dm-store-mikulas.o obj-$(CONFIG_DM_MIRROR) += dm-mirror.o dm-log.o dm-region-hash.o obj-$(CONFIG_DM_LOG_USERSPACE) += dm-log-userspace.o obj-$(CONFIG_DM_ZERO) += dm-zero.o -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 14/14] dm-multisnap-daniel 2010-03-02 0:23 [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer ` (12 preceding siblings ...) 2010-03-02 0:23 ` [PATCH 13/14] dm-multisnap-mikulas-config Mike Snitzer @ 2010-03-02 0:23 ` Mike Snitzer 2010-03-02 0:57 ` FUJITA Tomonori 2010-03-02 0:32 ` [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer 2010-03-02 14:56 ` Mike Snitzer 15 siblings, 1 reply; 22+ messages in thread From: Mike Snitzer @ 2010-03-02 0:23 UTC (permalink / raw) To: dm-devel; +Cc: Mikulas Patocka From: Mikulas Patocka <mpatocka@redhat.com> An implementation of a snapshot store. Designed by Daniel Phillips, ported to kernel space by Fujita Tomonori. This implementation plugs into the dm-multisnapshot interface as a module. The interface is almost the same as in Mikulas' exception store. You have to specify "daniel" instead of "mikulas" as the type of exception store. The only supported chunk size is 16384 bytes (but the code is general enough that this restriction may be removed in the future). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- drivers/md/Kconfig | 14 + drivers/md/Makefile | 2 + drivers/md/dm-multisnap-daniel.c | 1711 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 1727 insertions(+), 0 deletions(-) create mode 100644 drivers/md/dm-multisnap-daniel.c diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig index 0c3ce3b..c30bf70 100644 --- a/drivers/md/Kconfig +++ b/drivers/md/Kconfig @@ -268,6 +268,20 @@ config DM_MULTISNAPSHOT_MIKULAS A B+-tree-based log-structured storage allowing unlimited number of snapshots. +config DM_MULTISNAPSHOT_DANIEL + tristate "Daniel's snapshot store" + depends on DM_MULTISNAPSHOT + ---help--- + Daniel Phillips' exception store. The data structures were + designed by Daniel Phillips for the Zumastor project; the port + to kernel space was done by Fujita Tomonori. + + This store is limited to 64 snapshots and supports + snapshot deletion.
+ + So far it doesn't support maintaining consistency across + crashes; journaling is under development. + config DM_MIRROR tristate "Mirror target" depends on BLK_DEV_DM diff --git a/drivers/md/Makefile b/drivers/md/Makefile index 084d632..836cd95 100644 --- a/drivers/md/Makefile +++ b/drivers/md/Makefile @@ -13,6 +13,7 @@ dm-store-mikulas-y += dm-multisnap-mikulas.o dm-multisnap-alloc.o \ dm-multisnap-commit.o dm-multisnap-delete.o \ dm-multisnap-freelist.o dm-multisnap-io.o \ dm-multisnap-snaps.o dm-bufio.o +dm-store-daniel-y += dm-multisnap-daniel.o dm-mirror-y += dm-raid1.o dm-log-userspace-y \ += dm-log-userspace-base.o dm-log-userspace-transfer.o @@ -49,6 +50,7 @@ obj-$(CONFIG_DM_MULTIPATH_ST) += dm-service-time.o obj-$(CONFIG_DM_SNAPSHOT) += dm-snapshot.o obj-$(CONFIG_DM_MULTISNAPSHOT) += dm-multisnapshot.o obj-$(CONFIG_DM_MULTISNAPSHOT_MIKULAS) += dm-store-mikulas.o +obj-$(CONFIG_DM_MULTISNAPSHOT_DANIEL) += dm-store-daniel.o obj-$(CONFIG_DM_MIRROR) += dm-mirror.o dm-log.o dm-region-hash.o obj-$(CONFIG_DM_LOG_USERSPACE) += dm-log-userspace.o obj-$(CONFIG_DM_ZERO) += dm-zero.o diff --git a/drivers/md/dm-multisnap-daniel.c b/drivers/md/dm-multisnap-daniel.c new file mode 100644 index 0000000..00fd3c0 --- /dev/null +++ b/drivers/md/dm-multisnap-daniel.c @@ -0,0 +1,1711 @@ +/* + * dm-exception-store.c + * + * Copyright (C) 2001-2002 Sistina Software (UK) Limited. + * Copyright (C) 2006 Red Hat GmbH + * + * The shared exception code is based on Zumastor (http://zumastor.org/) + * + * By: Daniel Phillips, Nov 2003 to Mar 2007 + * (c) 2003, Sistina Software Inc. + * (c) 2004, Red Hat Software Inc. + * (c) 2005 Daniel Phillips + * (c) 2006 - 2007, Google Inc + * + * This file is released under the GPL. 
+ */ + +#include "dm-multisnap.h" + +#include <linux/mm.h> +#include <linux/pagemap.h> +#include <linux/vmalloc.h> +#include <linux/slab.h> +#include <linux/dm-io.h> +#include <linux/dm-kcopyd.h> + +#define DM_CHUNK_SIZE_DEFAULT_SECTORS 32 /* 16KB */ + +#define MAX_SNAPSHOTS 64 + +#define MAX_CHUNK_BUFFERS 128 + +/*----------------------------------------------------------------- + * Persistent snapshots, by persistent we mean that the snapshot + * will survive a reboot. + *--------------------------------------------------------------- + */ + +/* + * We need to store a record of which parts of the origin have + * been copied to the snapshot device. The snapshot code + * requires that we copy exception chunks to chunk aligned areas + * of the COW store. It makes sense therefore, to store the + * metadata in chunk size blocks. + * + * There is no backward or forward compatibility implemented, + * snapshots with different disk versions than the kernel will + * not be usable. It is expected that "lvcreate" will blank out + * the start of a fresh COW device before calling the snapshot + * constructor. + * + * The first chunk of the COW device just contains the header. + * After this there is a chunk filled with exception metadata, + * followed by as many exception chunks as can fit in the + * metadata areas. + * + * All on disk structures are in little-endian format. The end + * of the exceptions info is indicated by an exception with a + * new_chunk of 0, which is invalid since it would point to the + * header chunk. + */ + +/* + * Magic for persistent snapshots: "SnAp" - Feeble isn't it. + */ +#define SNAP_MAGIC 0x70416e53 + +struct disk_header { + uint32_t magic; + + /* + * Is this snapshot valid. There is no way of recovering + * an invalid snapshot. + */ + uint32_t valid; + + /* + * Simple, incrementing version. no backward + * compatibility. 
+ */ + uint32_t version; + + /* In sectors */ + uint32_t chunk_size; + + /* + * for shared exception + */ + __le64 root_tree_chunk; + __le64 snapmask; /* Mikulas: don't mix bit_test/set w.r.t. direct reading, it's unportable */ + __le32 tree_level; + __le32 h_nr_journal_chunks; +}; + +struct disk_exception { + uint64_t old_chunk; + uint64_t new_chunk; +}; + +struct commit_callback { + void (*callback)(void *, int success); + void *context; +}; + +/* + * The top level structure for a persistent exception store. + */ +struct dm_exception_store { + struct dm_multisnap *dm; + + void *area; + + struct dm_io_client *io_client; + + unsigned chunk_size; + unsigned char chunk_shift; + + int version; + + /* + * for shared exception + */ + u64 root_tree_chunk; + u64 snapmask; + u32 tree_level; + + u32 nr_snapshots; + + chunk_t nr_chunks; + chunk_t nr_bitmap_chunks; + chunk_t nr_journal_chunks; + unsigned long *bitmap; + chunk_t cur_bitmap_chunk; + chunk_t cur_bitmap_index; + chunk_t cur_journal_chunk; + + struct list_head chunk_buffer_list; + struct list_head chunk_buffer_dirty_list; + + int header_dirty; + int nr_chunk_buffers; + + chunk_t chunk_to_add; +}; + +static unsigned sectors_to_pages(unsigned sectors) +{ + return DIV_ROUND_UP(sectors, PAGE_SIZE >> 9); +} + +static int alloc_area(struct dm_exception_store *ps) +{ + int r = -ENOMEM; + size_t len; + + len = ps->chunk_size; + + /* + * Allocate the chunk_size block of memory that will hold + * a single metadata area. + */ + ps->area = vmalloc(len); + if (!ps->area) + return r; + + return 0; +} + +static void free_area(struct dm_exception_store *ps) +{ + vfree(ps->area); + ps->area = NULL; +} + +/* + * Read or write a chunk aligned and sized block of data from a device. 
+ */ +static int chunk_io(struct dm_exception_store *ps, chunk_t chunk, int rw, void *data) +{ + struct dm_io_region where = { + .bdev = dm_multisnap_snapshot_bdev(ps->dm), + .sector = (ps->chunk_size >> SECTOR_SHIFT) * chunk, + .count = (ps->chunk_size >> SECTOR_SHIFT), + }; + struct dm_io_request io_req = { + .bi_rw = rw, + .mem.type = DM_IO_VMA, + .mem.ptr.vma = data, + .client = ps->io_client, + .notify.fn = NULL, + }; + + int r = dm_io(&io_req, 1, &where, NULL); + if (r) { + DM_MULTISNAP_SET_ERROR(ps->dm, r, ("io error when %s %llx: %d", + rw == WRITE ? "writing" : "reading", + (unsigned long long)chunk, r)); + } + return r; +} + +static int write_header(struct dm_exception_store *ps) +{ + struct disk_header *dh; + + memset(ps->area, 0, ps->chunk_size); + + dh = (struct disk_header *) ps->area; + dh->magic = cpu_to_le32(SNAP_MAGIC); + dh->valid = cpu_to_le32(dm_multisnap_drop_on_error(ps->dm) && dm_multisnap_has_error(ps->dm)); + dh->version = cpu_to_le32(ps->version); + dh->chunk_size = cpu_to_le32(ps->chunk_size >> SECTOR_SHIFT); + + dh->root_tree_chunk = cpu_to_le64(ps->root_tree_chunk); + dh->snapmask = cpu_to_le64(ps->snapmask); + dh->tree_level = cpu_to_le32(ps->tree_level); + dh->h_nr_journal_chunks = cpu_to_le32(ps->nr_journal_chunks); + + ps->header_dirty = 0; + + return chunk_io(ps, 0, WRITE, ps->area); +} + +/* + * shared exception code + */ + +#define SNAP_MAGIC 0x70416e53 +#define FIRST_BITMAP_CHUNK 1 + +struct chunk_buffer { + struct list_head list; + struct list_head dirty_list; + u64 chunk; + void *data; +}; + +struct node { + __le32 count; + __le32 unused; + struct index_entry { + __le64 key; /* note: entries[0].key never accessed */ + __le64 chunk; /* node sector address goes here */ + } entries[]; +}; + +struct leaf { + __le16 magic; + __le16 version; + __le32 count; + /* !!! 
FIXME the code doesn't use the base_chunk properly */ + __le64 base_chunk; + __le64 using_mask; + + struct tree_map { + __le32 offset; + __le32 rchunk; + } map[]; +}; + +struct exception { + __le64 share; + __le64 chunk; +}; + +static inline struct node *buffer2node(struct chunk_buffer *buffer) +{ + return (struct node *)buffer->data; +} + +static inline struct leaf *buffer2leaf(struct chunk_buffer *buffer) +{ + return (struct leaf *)buffer->data; +} + +static struct chunk_buffer *alloc_chunk_buffer(struct dm_exception_store *ps) +{ + struct chunk_buffer *b; + + /* Mikulas: changed to GFP_NOIO */ + b = kzalloc(sizeof(*b), GFP_NOIO); + if (!b) { + DM_MULTISNAP_SET_ERROR(ps->dm, -ENOMEM, + ("%s %d: out of memory", __func__, __LINE__)); + return NULL; + } + + /* Mikulas: must use GFP_NOIO; vmalloc without it may deadlock */ + b->data = __vmalloc(ps->chunk_size, GFP_NOIO | __GFP_HIGHMEM, PAGE_KERNEL); + if (!b->data) { + kfree(b); + DM_MULTISNAP_SET_ERROR(ps->dm, -ENOMEM, + ("%s %d: out of memory", __func__, __LINE__)); + return NULL; + } + + memset(b->data, 0, ps->chunk_size); + + list_add(&b->list, &ps->chunk_buffer_list); + INIT_LIST_HEAD(&b->dirty_list); + + ps->nr_chunk_buffers++; + + return b; +} + +static void free_chunk_buffer(struct dm_exception_store *ps, struct chunk_buffer *b) +{ + list_del(&b->list); + vfree(b->data); + kfree(b); + + ps->nr_chunk_buffers--; +} + +static int read_new_bitmap_chunk(struct dm_exception_store *ps) +{ + chunk_io(ps, ps->cur_bitmap_chunk, WRITE, ps->bitmap); + + ps->cur_bitmap_chunk++; + if (ps->cur_bitmap_chunk == ps->nr_bitmap_chunks + FIRST_BITMAP_CHUNK /* --- I couldn't get it working without this. I wonder how it ever worked? --- Mikulas */) + ps->cur_bitmap_chunk = FIRST_BITMAP_CHUNK; + + chunk_io(ps, ps->cur_bitmap_chunk, READ, ps->bitmap); + + return 0; +} + +static chunk_t shared_allocate_chunk(struct dm_exception_store *ps) +{ + /* !!! 
FIXME: replace multiply/divide/modulo with bit shifts */
+	unsigned idx;
+	unsigned limit;
+	chunk_t start_chunk;
+	unsigned nr_bits = ps->chunk_shift + 3;
+
+	start_chunk = ps->cur_bitmap_chunk;
+again:
+	if (ps->cur_bitmap_chunk == ps->nr_bitmap_chunks && ps->nr_chunks & ((1 << nr_bits) - 1))
+		limit = ps->nr_chunks & ((1 << nr_bits) - 1);
+	else
+		limit = 1 << nr_bits;
+
+	idx = ext2_find_next_zero_bit(ps->bitmap, limit, ps->cur_bitmap_index);
+	if (idx < limit) {
+		ext2_set_bit(idx, ps->bitmap);
+
+		if (idx == limit - 1) {
+			ps->cur_bitmap_index = 0;
+
+			read_new_bitmap_chunk(ps);
+		} else
+			ps->cur_bitmap_index++;
+	} else {
+		/* chunk_io(ps, ps->cur_bitmap_chunk, WRITE, ps->bitmap); don't write it twice -- Mikulas */
+
+		read_new_bitmap_chunk(ps);
+
+		/* todo: check # free chunks */
+		if (start_chunk == ps->cur_bitmap_chunk) {
+			DM_MULTISNAP_SET_ERROR(ps->dm, -ENOSPC, ("%s %d: fail to find a new chunk", __func__, __LINE__));
+			return 0;
+		}
+
+		goto again;
+	}
+
+	return idx + ((ps->cur_bitmap_chunk - FIRST_BITMAP_CHUNK) << nr_bits);
+}
+
+static int shared_free_chunk(struct dm_exception_store *ps, chunk_t chunk)
+{
+	unsigned bits_per_chunk_shift = ps->chunk_shift + 3;
+	unsigned idx = chunk & ((1 << bits_per_chunk_shift) - 1);
+
+	/* we don't always need to do this... */
+	chunk_io(ps, ps->cur_bitmap_chunk, WRITE, ps->bitmap);
+
+	ps->cur_bitmap_chunk = (chunk >> bits_per_chunk_shift) + FIRST_BITMAP_CHUNK;
+
+	chunk_io(ps, ps->cur_bitmap_chunk, READ, ps->bitmap);
+
+	if (!ext2_test_bit(idx, ps->bitmap)) {
+		DM_MULTISNAP_SET_ERROR(ps->dm, -EFSERROR,
+			("%s: trying to free free block %lld %lld %u", __func__,
+			 (unsigned long long)chunk,
+			 (unsigned long long)ps->cur_bitmap_chunk, idx));
+	}
+
+	ext2_clear_bit(idx, ps->bitmap);
+
+	chunk_io(ps, ps->cur_bitmap_chunk, WRITE, ps->bitmap);
+
+	DMINFO("%s %d: free a chunk, %llu", __func__, __LINE__,
+	       (unsigned long long)chunk);
+
+	return 0;
+}
+
+static void init_leaf(struct dm_exception_store *ps, struct leaf *leaf)
+{
+	leaf->magic = cpu_to_le16(0x1eaf);
+	leaf->version = 0;
+	leaf->base_chunk = 0;
+	leaf->count = 0;
+	leaf->map[0].offset = cpu_to_le32(ps->chunk_size);
+}
+
+static struct chunk_buffer *new_btree_obj(struct dm_exception_store *ps)
+{
+	u64 chunk;
+	struct chunk_buffer *b;
+
+	b = alloc_chunk_buffer(ps);
+	if (!b)
+		return NULL;
+
+	chunk = shared_allocate_chunk(ps);
+	if (!chunk) {
+		free_chunk_buffer(ps, b);
+		return NULL;
+	}
+
+	b->chunk = chunk;
+
+	return b;
+}
+
+static int shared_create_bitmap(struct dm_exception_store *ps)
+{
+	int i, r, rest, this;
+	chunk_t chunk;
+
+	/* bitmap + superblock */
+	rest = 1 + ps->nr_bitmap_chunks + ps->nr_journal_chunks;
+
+	for (chunk = 0; chunk < ps->nr_bitmap_chunks; chunk++) {
+		memset(ps->area, 0, ps->chunk_size);
+
+		this = min_t(int, rest, ps->chunk_size * 8);
+		if (this) {
+			rest -= this;
+
+			memset(ps->area, 0xff, this / 8);
+
+			for (i = 0; i < this % 8; i++) {
+				/* Mikulas: this produces unaligned accesses
+				char *p = ps->area + (this / 8);
+				ext2_set_bit(i, (unsigned long *)p);*/
+				ext2_set_bit(this - this % 8 + i, (unsigned long *)ps->area);
+			}
+		}
+
+		r = chunk_io(ps, chunk + FIRST_BITMAP_CHUNK, WRITE,
+			     ps->area);
+		if (r)
+			return r;
+	}
+
+	return 0;
+}
+
+static struct chunk_buffer *new_leaf(struct dm_exception_store *ps)
+{
+	struct chunk_buffer *cb;
+
+	cb = new_btree_obj(ps);
+	if (cb)
+		init_leaf(ps, cb->data);
+
+	return cb;
+}
+
+static struct chunk_buffer *new_node(struct dm_exception_store *ps)
+{
+	return new_btree_obj(ps);
+}
+
+static int shared_create_btree(struct dm_exception_store *ps)
+{
+	struct chunk_buffer *l, *n;
+	int r;
+
+	r = chunk_io(ps, ps->cur_bitmap_chunk, READ, ps->bitmap);
+	if (r)
+		return r;
+
+	l = new_btree_obj(ps);
+	if (!l)
+		return -ENOMEM;
+	init_leaf(ps, l->data);
+
+	n = new_btree_obj(ps);
+	if (!n)
+		return -ENOMEM;
+
+	buffer2node(n)->count = cpu_to_le32(1);
+	buffer2node(n)->entries[0].chunk = cpu_to_le64(l->chunk);
+
+	chunk_io(ps, l->chunk, WRITE, l->data);
+	chunk_io(ps, n->chunk, WRITE, n->data);
+
+	ps->root_tree_chunk = n->chunk;
+	ps->snapmask = 0;
+	ps->tree_level = 1;
+
+	return 0;
+}
+
+static int shared_create_header(struct dm_exception_store *ps)
+{
+	int r;
+
+	/* 128MB by default, should be configurable. */
+	ps->nr_journal_chunks = (128 * 1024 * 1024) / ps->chunk_size;
+
+	/* Mikulas: if the device is smaller than 128MB, we need something smaller */
+	if (ps->nr_journal_chunks > ps->nr_chunks / 8)
+		ps->nr_journal_chunks = ps->nr_chunks / 8;
+
+	r = shared_create_bitmap(ps);
+	if (r)
+		return r;
+
+	r = shared_create_btree(ps);
+	if (r)
+		return r;
+
+	r = write_header(ps);
+	if (r)
+		return r;
+
+	return r;
+}
+
+static int shared_read_header(struct dm_exception_store *ps, int *new_snapshot)
+{
+	struct disk_header *dh;
+	int r;
+
+	ps->io_client = dm_io_client_create(sectors_to_pages(ps->chunk_size >> SECTOR_SHIFT));
+	if (IS_ERR(ps->io_client)) {
+		r = PTR_ERR(ps->io_client);
+		goto fail_io_client;
+	}
+
+	ps->bitmap = vmalloc(ps->chunk_size);
+	if (!ps->bitmap) {
+		r = -ENOMEM;
+		goto fail_bitmap;
+	}
+
+	r = alloc_area(ps);
+	if (r)
+		goto fail_alloc_area;
+
+
+	r = chunk_io(ps, 0, READ, ps->area);
+	if (r)
+		goto fail_to_read_header;
+
+	dh = (struct disk_header *) ps->area;
+
+	if (le32_to_cpu(dh->magic) == 0) {
+		*new_snapshot = 1;
+		return 0;
+	}
+
+	if (le32_to_cpu(dh->magic) != SNAP_MAGIC) {
+		DMWARN("Invalid or corrupt snapshot");
+		r = -ENXIO;
+		goto fail_to_read_header;
+	}
+
+	*new_snapshot = 0;
+	if (le32_to_cpu(dh->valid))
+		dm_multisnap_set_error(ps->dm, -EINVAL);
+	ps->version = le32_to_cpu(dh->version);
+
+	ps->root_tree_chunk = le64_to_cpu(dh->root_tree_chunk);
+	ps->snapmask = le64_to_cpu(dh->snapmask);
+	ps->tree_level = le32_to_cpu(dh->tree_level);
+	ps->nr_journal_chunks = le32_to_cpu(dh->h_nr_journal_chunks);
+
+	if (ps->chunk_size >> SECTOR_SHIFT != le32_to_cpu(dh->chunk_size)) {
+		DMWARN("Invalid chunk size");
+		r = -ENXIO;
+		goto fail_to_read_header;
+	}
+
+	return 0;
+
+fail_to_read_header:
+	free_area(ps);
+fail_alloc_area:
+	vfree(ps->bitmap);
+	ps->bitmap = NULL;
+fail_bitmap:
+	dm_io_client_destroy(ps->io_client);
+fail_io_client:
+	return r;
+}
+
+static void shared_drop_header(struct dm_exception_store *ps)
+{
+	free_area(ps);
+	vfree(ps->bitmap);
+	ps->bitmap = NULL;
+	dm_io_client_destroy(ps->io_client);
+}
+
+static int shared_read_metadata(struct dm_exception_store *ps, char **error)
+{
+	int i, r, uninitialized_var(new_snapshot);
+	size_t size = i_size_read(dm_multisnap_snapshot_bdev(ps->dm)->bd_inode);
+	chunk_t bitmap_chunk_bytes;
+	unsigned chunk_bytes = ps->chunk_size;
+
+	ps->cur_bitmap_chunk = FIRST_BITMAP_CHUNK;
+	ps->cur_bitmap_index = 0;
+	ps->nr_chunks = size >> ps->chunk_shift;
+
+	INIT_LIST_HEAD(&ps->chunk_buffer_list);
+	INIT_LIST_HEAD(&ps->chunk_buffer_dirty_list);
+	ps->nr_chunk_buffers = 0;
+
+	bitmap_chunk_bytes = DIV_ROUND_UP(ps->nr_chunks, 8);
+	ps->nr_bitmap_chunks = (bitmap_chunk_bytes + chunk_bytes - 1) >> ps->chunk_shift;
+
+	r = shared_read_header(ps, &new_snapshot);
+	if (r) {
+		*error = "Failed to read header";
+		return r;
+	}
+
+	if (new_snapshot)
+		DMINFO("%s %d: creates a new cow device", __func__, __LINE__);
+	else
+		DMINFO("%s %d: loads the cow device", __func__, __LINE__);
+
+	if (new_snapshot) {
+		r = shared_create_header(ps);
+		if (r) {
+			*error = "Failed to create header";
+			goto bad_header;
+		}
+	} else {
+		r = chunk_io(ps, ps->cur_bitmap_chunk, READ, ps->bitmap);
+		if (r) {
+			*error = "Failed to read bitmap";
+			goto bad_header;
+		}
+	}
+
+	for (i = 0; i < MAX_SNAPSHOTS; i++)
+		if (ps->snapmask & 1LL << i)
+			ps->nr_snapshots++;
+
+	return 0;
+
+bad_header:
+	shared_drop_header(ps);
+	return r;
+}
+
+struct etree_path {
+	struct chunk_buffer *buffer;
+	struct index_entry *pnext;
+};
+
+static struct chunk_buffer *btbread(struct dm_exception_store *ps, u64 chunk)
+{
+	struct chunk_buffer *b;
+
+	list_for_each_entry(b, &ps->chunk_buffer_list, list) {
+		if (b->chunk == chunk)
+			return b;
+	}
+
+	b = alloc_chunk_buffer(ps);
+	if (!b)
+		return NULL;
+
+	b->chunk = chunk;
+
+	chunk_io(ps, chunk, READ, b->data);
+
+	return b;
+}
+
+static void brelse(struct chunk_buffer *buffer)
+{
+}
+
+static void brelse_dirty(struct dm_exception_store *ps, struct chunk_buffer *b)
+{
+	if (list_empty(&b->dirty_list))
+		list_add(&b->dirty_list, &ps->chunk_buffer_dirty_list);
+}
+
+static void set_buffer_dirty(struct dm_exception_store *ps, struct chunk_buffer *b)
+{
+	if (list_empty(&b->dirty_list))
+		list_add(&b->dirty_list, &ps->chunk_buffer_dirty_list);
+}
+
+static inline struct exception *emap(struct leaf *leaf, unsigned i)
+{
+	return (struct exception *)
+		((char *)leaf + le32_to_cpu(leaf->map[i].offset));
+}
+
+static int add_exception_to_leaf(struct leaf *leaf, u64 chunk, u64 exception,
+				 int snapshot, u64 active)
+{
+	unsigned target = chunk - le64_to_cpu(leaf->base_chunk);
+	u64 mask = 1ULL << snapshot, sharemap;
+	struct exception *ins, *exceptions = emap(leaf, 0);
+	char *maptop = (char *)(&leaf->map[le32_to_cpu(leaf->count) + 1]);
+	unsigned i, j, free = (char *)exceptions - maptop;
+
+	/*
+	 * Find the chunk for which we're adding an exception entry.
+	 */
+	for (i = 0; i < le32_to_cpu(leaf->count); i++) /* !!! binsearch goes here */
+		if (le32_to_cpu(leaf->map[i].rchunk) >= target)
+			break;
+
+	/*
+	 * If we didn't find the chunk, insert a new one at map[i].
+	 */
+	if (i == le32_to_cpu(leaf->count) ||
+	    le32_to_cpu(leaf->map[i].rchunk) > target) {
+		if (free < sizeof(struct exception) + sizeof(struct tree_map))
+			return -1;
+		ins = emap(leaf, i);
+		memmove(&leaf->map[i+1], &leaf->map[i], maptop - (char *)&leaf->map[i]);
+		leaf->map[i].offset = cpu_to_le32((char *)ins - (char *)leaf);
+		leaf->map[i].rchunk = cpu_to_le32(target);
+		leaf->count = cpu_to_le32(le32_to_cpu(leaf->count) + 1);
+		sharemap = snapshot == -1 ? active : mask;
+		goto insert;
+	}
+
+	if (free < sizeof(struct exception))
+		return -1;
+	/*
+	 * Compute the share map from that of each existing exception entry
+	 * for this chunk. If we're doing this for a chunk on the origin,
+	 * the new exception is shared between those snapshots that weren't
+	 * already sharing exceptions for this chunk. (We combine the sharing
+	 * that already exists, invert it, then mask off everything but the
+	 * active snapshots.)
+	 *
+	 * If this is a chunk on a snapshot we go through the existing
+	 * exception list to turn off sharing with this snapshot (with the
+	 * side effect that if the chunk was only shared by this snapshot it
+	 * becomes unshared). We then set sharing for this snapshot in the
+	 * new exception entry.
+	 */
+	if (snapshot == -1) {
+		for (sharemap = 0, ins = emap(leaf, i); ins < emap(leaf, i+1); ins++)
+			sharemap |= le64_to_cpu(ins->share);
+		sharemap = (~sharemap) & active;
+	} else {
+		for (ins = emap(leaf, i); ins < emap(leaf, i+1); ins++) {
+			u64 val = le64_to_cpu(ins->share);
+			if (val & mask) {
+				ins->share = cpu_to_le64(val & ~mask);
+				break;
+			}
+		}
+		sharemap = mask;
+	}
+	ins = emap(leaf, i);
+insert:
+	/*
+	 * Insert the new exception entry. These grow from the end of the
+	 * block toward the beginning. First move any earlier exceptions up
+	 * to make room for the new one, then insert the new entry in the
+	 * space freed. Adjust the offsets for all earlier chunks.
+	 */
+	memmove(exceptions - 1, exceptions, (char *)ins - (char *)exceptions);
+	ins--;
+	ins->share = cpu_to_le64(sharemap);
+	ins->chunk = cpu_to_le64(exception);
+
+	for (j = 0; j <= i; j++) {
+		u32 val = le32_to_cpu(leaf->map[j].offset);
+		leaf->map[j].offset = cpu_to_le32(val - sizeof(struct exception));
+	}
+
+	return 0;
+}
+
+static void insert_child(struct node *node, struct index_entry *p, u64 child,
+			 u64 childkey)
+{
+	size_t count = (char *)(&node->entries[0] + le32_to_cpu(node->count)) -
+		(char *)p;
+	memmove(p + 1, p, count);
+	p->chunk = cpu_to_le64(child);
+	p->key = cpu_to_le64(childkey);
+	node->count = cpu_to_le32(le32_to_cpu(node->count) + 1);
+}
+
+static u64 split_leaf(struct leaf *leaf, struct leaf *leaf2)
+{
+	unsigned i, nhead, ntail, tailsize;
+	u64 splitpoint;
+	char *phead, *ptail;
+
+	nhead = (le32_to_cpu(leaf->count) + 1) / 2;
+	ntail = le32_to_cpu(leaf->count) - nhead;
+
+	/* Should split at middle of data instead of median exception */
+	splitpoint = le32_to_cpu(leaf->map[nhead].rchunk) +
+		le64_to_cpu(leaf->base_chunk);
+
+	phead = (char *)emap(leaf, 0);
+	ptail = (char *)emap(leaf, nhead);
+	tailsize = (char *)emap(leaf, le32_to_cpu(leaf->count)) - ptail;
+
+	/* Copy upper half to new leaf */
+	memcpy(leaf2, leaf, offsetof(struct leaf, map));
+	memcpy(&leaf2->map[0], &leaf->map[nhead], (ntail + 1) * sizeof(struct tree_map));
+	memcpy(ptail - (char *)leaf + (char *)leaf2, ptail, tailsize);
+	leaf2->count = cpu_to_le32(ntail);
+
+	/* Move lower half to top of block */
+	memmove(phead + tailsize, phead, ptail - phead);
+	leaf->count = cpu_to_le32(nhead);
+	for (i = 0; i <= nhead; i++)
+		leaf->map[i].offset =
+			cpu_to_le32(le32_to_cpu(leaf->map[i].offset) + tailsize);
+	leaf->map[nhead].rchunk = 0;
+
+	return splitpoint;
+}
+
+static int add_exception_to_tree(struct dm_exception_store *ps,
+				 struct chunk_buffer *leafbuf,
+				 u64 target, u64 exception, int snapbit,
+				 struct etree_path path[], unsigned levels)
+{
+	struct node *newroot;
+	struct chunk_buffer *newrootbuf, *childbuf;
+	struct leaf *leaf;
+	u64 childkey, childsector;
+	int ret;
+
+	ret = add_exception_to_leaf(buffer2leaf(leafbuf), target,
+				    exception, snapbit, ps->snapmask);
+	if (!ret) {
+		brelse_dirty(ps, leafbuf);
+		return 0;
+	}
+
+	/*
+	 * There wasn't room to add a new exception to the leaf. Split it.
+	 */
+
+	childbuf = new_leaf(ps);
+	if (!childbuf)
+		return -ENOMEM; /* this is the right thing to do? */
+
+	set_buffer_dirty(ps, childbuf);
+
+	childkey = split_leaf(buffer2leaf(leafbuf), buffer2leaf(childbuf));
+	childsector = childbuf->chunk;
+
+	/*
+	 * Now add the exception to the appropriate leaf. Childkey has the
+	 * first chunk in the new leaf we just created.
+	 */
+	if (target < childkey)
+		leaf = buffer2leaf(leafbuf);
+	else
+		leaf = buffer2leaf(childbuf);
+
+	ret = add_exception_to_leaf(leaf, target, exception, snapbit,
+				    ps->snapmask);
+	if (ret)
+		return -ENOMEM;
+
+	brelse_dirty(ps, leafbuf);
+	brelse_dirty(ps, childbuf);
+
+	while (levels--) {
+		unsigned half;
+		u64 newkey;
+		struct index_entry *pnext = path[levels].pnext;
+		struct chunk_buffer *parentbuf = path[levels].buffer;
+		struct node *parent = buffer2node(parentbuf);
+		struct chunk_buffer *newbuf;
+		struct node *newnode;
+		int csize = ps->chunk_size;
+		int alloc_per_node = (csize - offsetof(struct node, entries))
+			/ sizeof(struct index_entry);
+
+		if (le32_to_cpu(parent->count) < alloc_per_node) {
+			insert_child(parent, pnext, childsector, childkey);
+			set_buffer_dirty(ps, parentbuf);
+			return 0;
+		}
+		/*
+		 * Split the node.
+		 */
+		half = le32_to_cpu(parent->count) / 2;
+		newkey = le64_to_cpu(parent->entries[half].key);
+		newbuf = new_node(ps);
+		if (!newbuf)
+			return -ENOMEM;
+		set_buffer_dirty(ps, newbuf);
+		newnode = buffer2node(newbuf);
+
+		newnode->count = cpu_to_le32(le32_to_cpu(parent->count) - half);
+		memcpy(&newnode->entries[0], &parent->entries[half],
+		       le32_to_cpu(newnode->count) * sizeof(struct index_entry));
+		parent->count = cpu_to_le32(half);
+		/*
+		 * If the path entry is in the new node, use that as the
+		 * parent.
+		 */
+		if (pnext > &parent->entries[half]) {
+			pnext = pnext - &parent->entries[half] + newnode->entries;
+			set_buffer_dirty(ps, parentbuf);
+			parentbuf = newbuf;
+			parent = newnode;
+		} else
+			set_buffer_dirty(ps, newbuf);
+		/*
+		 * Insert the child now that we have room in the parent, then
+		 * climb the path and insert the new child there.
+		 */
+		insert_child(parent, pnext, childsector, childkey);
+		set_buffer_dirty(ps, parentbuf);
+		childkey = newkey;
+		childsector = newbuf->chunk;
+		brelse(newbuf);
+	}
+
+	newrootbuf = new_node(ps);
+	if (!newrootbuf)
+		return -ENOMEM;
+
+	newroot = buffer2node(newrootbuf);
+
+	newroot->count = cpu_to_le32(2);
+	newroot->entries[0].chunk = cpu_to_le64(ps->root_tree_chunk);
+	newroot->entries[1].key = cpu_to_le64(childkey);
+	newroot->entries[1].chunk = cpu_to_le64(childsector);
+	ps->root_tree_chunk = newrootbuf->chunk;
+	ps->tree_level++;
+	ps->header_dirty = 1;
+	brelse_dirty(ps, newrootbuf);
+	return 0;
+}
+
+static struct chunk_buffer *probe(struct dm_exception_store *ps, u64 chunk,
+				  struct etree_path *path)
+{
+	unsigned i, levels = ps->tree_level;
+	struct node *node;
+	struct chunk_buffer *nodebuf = btbread(ps, ps->root_tree_chunk);
+
+	if (!nodebuf)
+		return NULL;
+	node = buffer2node(nodebuf);
+
+	for (i = 0; i < levels; i++) {
+		struct index_entry *pnext = node->entries,
+			*top = pnext + le32_to_cpu(node->count);
+
+		while (++pnext < top)
+			if (le64_to_cpu(pnext->key) > chunk)
+				break;
+
+		path[i].buffer = nodebuf;
+		path[i].pnext = pnext;
+		nodebuf = btbread(ps, le64_to_cpu((pnext - 1)->chunk));
+		if (!nodebuf)
+			return NULL;
+
+		node = (struct node *)nodebuf->data;
+	}
+	BUG_ON(le16_to_cpu(((struct leaf *)nodebuf->data)->magic) != 0x1eaf);
+	return nodebuf;
+}
+
+static inline struct node *path_node(struct etree_path path[], int level)
+{
+	return buffer2node(path[level].buffer);
+}
+
+/*
+ * Release each buffer in the given path array.
+ */
+static void brelse_path(struct etree_path *path, unsigned levels)
+{
+	unsigned i;
+	for (i = 0; i < levels; i++)
+		brelse(path[i].buffer);
+}
+
+/*
+ * Merge the contents of 'leaf2' into 'leaf.' The leaves are contiguous and
+ * 'leaf2' follows 'leaf.' Move the exception lists in 'leaf' up to make room
+ * for those of 'leaf2,' adjusting the offsets in the map entries, then copy
+ * the map entries and exception lists straight from 'leaf2.'
+ */
+static void merge_leaves(struct leaf *leaf, struct leaf *leaf2)
+{
+	unsigned nhead, ntail, i;
+	unsigned tailsize;
+	char *phead, *ptail;
+
+	nhead = le32_to_cpu(leaf->count);
+	ntail = le32_to_cpu(leaf2->count);
+	tailsize = (char *)emap(leaf2, ntail) - (char *)emap(leaf2, 0);
+	phead = (char *)emap(leaf, 0);
+	ptail = (char *)emap(leaf, nhead);
+
+	/* move data down */
+	memmove(phead - tailsize, phead, ptail - phead);
+
+	/* adjust pointers */
+	for (i = 0; i <= nhead; i++) {
+		u32 val = le32_to_cpu(leaf->map[i].offset);
+		/* also adjust sentinel */
+		leaf->map[i].offset = cpu_to_le32(val - tailsize);
+	}
+
+	/* move data from leaf2 to top */
+	/* data */
+	memcpy(ptail - tailsize, (char *)emap(leaf2, 0), tailsize);
+	/* map */
+	memcpy(&leaf->map[nhead], &leaf2->map[0],
+	       (ntail + 1) * sizeof(struct tree_map));
+	leaf->count = cpu_to_le32(le32_to_cpu(leaf->count) + ntail);
+}
+
+/*
+ * Remove the index entry at path[level].pnext-1 by moving entries below it up
+ * into its place. If it wasn't the last entry in the node but it _was_ the
+ * first entry (and we're not at the root), preserve the key by inserting it
+ * into the index entry of the parent node that refers to this node.
+ */
+static void remove_index(struct dm_exception_store *ps, struct etree_path path[], int level)
+{
+	struct node *node = path_node(path, level);
+	/* !!! out of bounds for delete of last from full index */
+	chunk_t pivot = le64_to_cpu((path[level].pnext)->key);
+	int i, count = le32_to_cpu(node->count);
+
+	/* stomps the node count (if 0th key holds count) */
+	memmove(path[level].pnext - 1, path[level].pnext,
+		(char *)&node->entries[count] - (char *)path[level].pnext);
+	node->count = cpu_to_le32(count - 1);
+	--(path[level].pnext);
+	set_buffer_dirty(ps, path[level].buffer);
+
+	/* no pivot for last entry */
+	if (path[level].pnext == node->entries + le32_to_cpu(node->count))
+		return;
+
+	/*
+	 * climb up to common parent and set pivot to deleted key
+	 * what if index is now empty? (no deleted key)
+	 * then some key above is going to be deleted and used to set pivot
+	 */
+	if (path[level].pnext == node->entries && level) {
+		/* Keep going up the path if we're at the first index entry. */
+		for (i = level - 1; path[i].pnext - 1 == path_node(path, i)->entries; i--)
+			if (!i)	/* If we hit the root, we're done. */
+				return;
+		/*
+		 * Found a node where we're not at the first entry.
+		 * Set the key here to that of the deleted index
+		 * entry.
+		 */
+		(path[i].pnext - 1)->key = cpu_to_le64(pivot);
+		set_buffer_dirty(ps, path[i].buffer);
+	}
+}
+
+/*
+ * Returns the number of bytes of free space in the given leaf by computing
+ * the difference between the end of the map entry list and the beginning
+ * of the first set of exceptions.
+ */
+static unsigned leaf_freespace(struct leaf *leaf)
+{
+	/* include sentinel */
+	char *maptop = (char *)(&leaf->map[le32_to_cpu(leaf->count) + 1]);
+	return (char *)emap(leaf, 0) - maptop;
+}
+
+/*
+ * Returns the number of bytes used in the given leaf by computing the number
+ * of bytes used by the map entry list and all sets of exceptions.
+ */
+static unsigned leaf_payload(struct leaf *leaf)
+{
+	int count = le32_to_cpu(leaf->count);
+	int lower = (char *)(&leaf->map[count]) - (char *)leaf->map;
+	int upper = (char *)emap(leaf, count) - (char *)emap(leaf, 0);
+	return lower + upper;
+}
+
+static void check_leaf(struct dm_exception_store *ps, struct leaf *leaf, u64 snapmask)
+{
+	struct exception *p;
+	int i;
+
+	for (i = 0; i < le32_to_cpu(leaf->count); i++) {
+		for (p = emap(leaf, i); p < emap(leaf, i+1); p++) {
+			/* !!! should also check for any zero sharemaps here */
+			if (le64_to_cpu(p->share) & snapmask) {
+				DM_MULTISNAP_SET_ERROR(ps->dm, -EFSERROR,
+					("nonzero bits %016llx outside snapmask %016llx",
+					 (unsigned long long)p->share,
+					 (unsigned long long)snapmask));
+			}
+		}
+	}
+}
+
+/*
+ * Remove all exceptions belonging to a given snapshot from the passed leaf.
+ *
+ * This clears the "share" bits on each chunk for the snapshot mask passed
+ * in the delete_info structure. In the process, it compresses out any
+ * exception entries that are entirely unshared and/or unused. In a second
+ * pass, it compresses out any map entries for which there are no exception
+ * entries remaining.
+ */
+static int delete_snapshots_from_leaf(struct dm_exception_store *ps,
+				      struct leaf *leaf, u64 snapmask)
+{
+	/* p points just past the last map[] entry in the leaf. */
+	struct exception *p = emap(leaf, le32_to_cpu(leaf->count)), *dest = p;
+	struct tree_map *pmap, *dmap;
+	unsigned i;
+	int ret = 0;
+
+	/* Scan top to bottom clearing snapshot bit and moving
+	 * non-zero entries to top of block */
+	/*
+	 * p points at each original exception; dest points at the location
+	 * to receive an exception that is being moved down in the leaf.
+	 * Exceptions that are unshared after clearing the share bit for
+	 * the passed snapshot mask are skipped and the associated "exception"
+	 * chunk is freed. This operates on the exceptions for one map entry
+	 * at a time; when the beginning of a list of exceptions is reached,
+	 * the associated map entry offset is adjusted.
+	 */
+	for (i = le32_to_cpu(leaf->count); i--;) {
+		/*
+		 * False the first time through, since i is leaf->count and p
+		 * was set to emap(leaf, leaf->count) above.
+		 */
+		while (p != emap(leaf, i)) {
+			u64 share = le64_to_cpu((--p)->share);
+			ret |= share & snapmask;
+			/* Unshare with given snapshot(s). */
+			p->share = cpu_to_le64(le64_to_cpu(p->share) & ~snapmask);
+			if (le64_to_cpu(p->share)) /* If still used, keep chunk. */
+				*--dest = *p;
+			else
+				shared_free_chunk(ps, le64_to_cpu(p->chunk));
+			/* dirty_buffer_count_check(sb); */
+		}
+		leaf->map[i].offset = cpu_to_le32((char *)dest - (char *)leaf);
+	}
+	/* Remove empties from map */
+	/*
+	 * This runs through the map entries themselves, skipping entries
+	 * with matching offsets. If all the exceptions for a given map
+	 * entry are skipped, its offset will be set to that of the following
+	 * map entry (since the dest pointer will not have moved).
+	 */
+	dmap = pmap = &leaf->map[0];
+	for (i = 0; i < le32_to_cpu(leaf->count); i++, pmap++)
+		if (le32_to_cpu(pmap->offset) != le32_to_cpu((pmap + 1)->offset))
+			*dmap++ = *pmap;
+	/*
+	 * There is always a phantom map entry after the last, that has the
+	 * offset of the end of the leaf and, of course, no chunk number.
+	 */
+	dmap->offset = pmap->offset;
+	dmap->rchunk = 0; /* tidy up */
+	leaf->count = cpu_to_le32(dmap - &leaf->map[0]);
+	check_leaf(ps, leaf, snapmask);
+
+	return ret;
+}
+
+/*
+ * Return true if path[level].pnext points at the end of the list of index
+ * entries.
+ */
+static inline int finished_level(struct etree_path path[], int level)
+{
+	struct node *node = path_node(path, level);
+	return path[level].pnext == node->entries + le32_to_cpu(node->count);
+}
+
+/*
+ * Copy the index entries in 'node2' into 'node.'
+ */
+static void merge_nodes(struct node *node, struct node *node2)
+{
+	memcpy(&node->entries[le32_to_cpu(node->count)],
+	       &node2->entries[0],
+	       le32_to_cpu(node2->count) * sizeof(struct index_entry));
+	node->count = cpu_to_le32(le32_to_cpu(node->count) + le32_to_cpu(node2->count));
+}
+
+static void brelse_free(struct dm_exception_store *ps, struct chunk_buffer *buffer)
+{
+	shared_free_chunk(ps, buffer->chunk);
+	free_chunk_buffer(ps, buffer);
+}
+
+/*
+ * Delete all chunks in the B-tree for the snapshot(s) indicated by the
+ * passed snapshot mask, beginning at the passed chunk.
+ *
+ * Walk the tree (a stack-based inorder traversal) starting with the passed
+ * chunk, calling delete_snapshots_from_leaf() on each leaf to remove chunks
+ * associated with the snapshot(s) we're removing. As leaves and nodes become
+ * sparsely filled, merge them with their neighbors. When we reach the root
+ * we've finished the traversal; if there are empty levels (that is, level(s)
+ * directly below the root that only contain a single node), remove those
+ * empty levels until either the second level is no longer empty or we only
+ * have one level remaining.
+ */
+static int delete_tree_range(struct dm_exception_store *ps, u64 snapmask, chunk_t resume)
+{
+	int levels = ps->tree_level, level = levels - 1;
+	struct etree_path path[levels], hold[levels];
+	struct chunk_buffer *leafbuf, *prevleaf = NULL;
+	unsigned i;
+
+	/* can be initializer if not dynamic array (change it?) */
+	for (i = 0; i < levels; i++)
+		hold[i] = (struct etree_path){ };
+	/*
+	 * Find the B-tree leaf with the chunk we were passed. Often
+	 * this will be chunk 0.
+	 */
+	leafbuf = probe(ps, resume, path);
+	if (!leafbuf)
+		return -ENOMEM;
+
+	while (1) { /* in-order leaf walk */
+		if (delete_snapshots_from_leaf(ps, buffer2leaf(leafbuf), snapmask))
+			set_buffer_dirty(ps, leafbuf);
+		/*
+		 * If we have a previous leaf (i.e. we're past the first),
+		 * try to merge the current leaf with it.
+		 */
+		if (prevleaf) { /* try to merge this leaf with prev */
+			struct leaf *this = buffer2leaf(leafbuf);
+			struct leaf *prev = buffer2leaf(prevleaf);
+
+			/*
+			 * If there's room in the previous leaf for this leaf,
+			 * merge this leaf into the previous leaf and remove
+			 * the index entry that points to this leaf.
+			 */
+			if (leaf_payload(this) <= leaf_freespace(prev)) {
+				merge_leaves(prev, this);
+				remove_index(ps, path, level);
+				set_buffer_dirty(ps, prevleaf);
+				brelse_free(ps, leafbuf);
+				/* dirty_buffer_count_check(sb); */
+				goto keep_prev_leaf;
+			}
+			brelse(prevleaf);
+		}
+		prevleaf = leafbuf; /* Save leaf for next time through. */
+keep_prev_leaf:
+		/*
+		 * If we've reached the end of the index entries in the B-tree
+		 * node at the current level, try to merge the node referred
+		 * to at this level of the path with a prior node. Repeat
+		 * this process at successively higher levels up the path; if
+		 * we reach the root, clean up and exit. If we don't reach
+		 * the root, we've reached a node with multiple entries;
+		 * rebuild the path from the next index entry to the next
+		 * leaf.
+		 */
+		if (finished_level(path, level)) {
+			do { /* pop and try to merge finished nodes */
+				/*
+				 * If we have a previous node at this level
+				 * (again, we're past the first node), try to
+				 * merge the current node with it.
+				 */
+				if (hold[level].buffer) {
+					struct node *this = path_node(path, level);
+					struct node *prev = path_node(hold, level);
+					int csize = ps->chunk_size;
+					int alloc_per_node =
+						(csize - offsetof(struct node, entries))
+						/ sizeof(struct index_entry);
+
+					/*
+					 * If there's room in the previous node
+					 * for the entries in this node, merge
+					 * this node into the previous node and
+					 * remove the index entry that points
+					 * to this node.
+					 */
+					if (le32_to_cpu(this->count) <=
+					    alloc_per_node - le32_to_cpu(prev->count)) {
+						merge_nodes(prev, this);
+						remove_index(ps, path, level - 1);
+						set_buffer_dirty(ps, hold[level].buffer);
+						brelse_free(ps, path[level].buffer);
+/* dirty_buffer_count_check(sb); */
+						goto keep_prev_node;
+					}
+					brelse(hold[level].buffer);
+				}
+				/* Save the node for the next time through. */
+				hold[level].buffer = path[level].buffer;
+keep_prev_node:
+				/*
+				 * If we're at the root, we need to clean up
+				 * and return. First, though, try to reduce
+				 * the number of levels. If the tree at the
+				 * root has been reduced to only the nodes in
+				 * our path, eliminate nodes with only one
+				 * entry until we either have a new root node
+				 * with multiple entries or we have only one
+				 * level remaining in the B-tree.
+				 */
+				if (!level) { /* remove levels if possible */
+					/*
+					 * While above the first level and the
+					 * root only has one entry, point the
+					 * root at the (only) first-level node,
+					 * reduce the number of levels and
+					 * shift the path up one level.
+					 */
+					while (levels > 1 &&
+					       le32_to_cpu(path_node(hold, 0)->count) == 1) {
+						/* Point root at the first level. */
+						ps->root_tree_chunk = hold[1].buffer->chunk;
+						brelse_free(ps, hold[0].buffer);
+/* dirty_buffer_count_check(sb); */
+						levels = --ps->tree_level;
+						memcpy(hold, hold + 1, levels * sizeof(hold[0]));
+						ps->header_dirty = 1;
+					}
+					brelse(prevleaf);
+					brelse_path(hold, levels);
+
+/* if (dirty_buffer_count) */
+/* commit_transaction(sb, 0); */
+					ps->snapmask &= ~snapmask;
+					ps->header_dirty = 1;
+					return 0;
+				}
+
+				level--;
+			} while (finished_level(path, level));
+			/*
+			 * Now rebuild the path from where we are (one entry
+			 * past the last leaf we processed, which may have
+			 * been adjusted in operations above) down to the node
+			 * above the next leaf.
+			 */
+			do { /* push back down to leaf level */
+				struct chunk_buffer *nodebuf;
+
+				nodebuf = btbread(ps,
+					le64_to_cpu(path[level++].pnext++->chunk));
+				if (!nodebuf) {
+					brelse_path(path, level - 1); /* anything else needs to be freed? */
+					return -ENOMEM;
+				}
+				path[level].buffer = nodebuf;
+				path[level].pnext = buffer2node(nodebuf)->entries;
+			} while (level < levels - 1);
+		}
+
+		/* dirty_buffer_count_check(sb); */
+		/*
+		 * Get the leaf indicated in the next index entry in the node
+		 * at this level.
+		 */
+		leafbuf = btbread(ps, le64_to_cpu(path[level].pnext++->chunk));
+		if (!leafbuf) {
+			brelse_path(path, level);
+			return -ENOMEM;
+		}
+	}
+}
+
+static int origin_chunk_unique(struct leaf *leaf, u64 chunk, u64 snapmask)
+{
+	u64 using = 0;
+	u64 i, target = chunk - le64_to_cpu(leaf->base_chunk);
+	struct exception const *p;
+
+	for (i = 0; i < le32_to_cpu(leaf->count); i++)
+		if (le32_to_cpu(leaf->map[i].rchunk) == target)
+			goto found;
+	return !snapmask;
+found:
+	for (p = emap(leaf, i); p < emap(leaf, i+1); p++)
+		using |= le64_to_cpu(p->share);
+
+	return !(~using & snapmask);
+}
+
+static int snapshot_chunk_unique(struct leaf *leaf, u64 chunk, int snapbit,
+				 chunk_t *exception)
+{
+	u64 mask = 1LL << snapbit;
+	unsigned i, target = chunk - le64_to_cpu(leaf->base_chunk);
+	struct exception const *p;
+
+	for (i = 0; i < le32_to_cpu(leaf->count); i++)
+		if (le32_to_cpu(leaf->map[i].rchunk) == target)
+			goto found;
+	return 0;
+found:
+	for (p = emap(leaf, i); p < emap(leaf, i+1); p++)
+		/* shared if more than one bit set including this one */
+		if ((le64_to_cpu(p->share) & mask)) {
+			*exception = le64_to_cpu(p->chunk);
+			return !(le64_to_cpu(p->share) & ~mask);
+		}
+	return 0;
+}
+
+static int shared_init(struct dm_multisnap *dm, struct dm_exception_store **sp,
+		       unsigned argc, char **argv, char **error)
+{
+	int r;
+	struct dm_exception_store *ps;
+
+	ps = kzalloc(sizeof(struct dm_exception_store), GFP_KERNEL);
+	if (!ps) {
+		*error = "Could not allocate private area";
+		r = -ENOMEM;
+		goto bad_private;
+	}
+	*sp = ps;
+
+	ps->dm = dm;
+	ps->chunk_size = dm_multisnap_chunk_size(dm);
+	ps->chunk_shift = ffs(ps->chunk_size) - 1;
+
+	if (argc != 0) {
+		*error = "Bad number of arguments";
+		r = -EINVAL;
+		goto bad_arguments;
+	}
+
+	if (ps->chunk_size != DM_CHUNK_SIZE_DEFAULT_SECTORS << SECTOR_SHIFT) {
+		*error = "Unsupported chunk size";
+		r = -EINVAL;
+		goto bad_arguments;
+	}
+
+	ps->area = NULL;
+
+	r = shared_read_metadata(ps, error);
+	if (r)
+		goto bad_metadata;
+
+	return 0;
+
+bad_metadata:
+bad_arguments:
+	kfree(ps);
+bad_private:
+	return r;
+}
+
+static void shared_destroy(struct dm_exception_store *ps)
+{
+	struct chunk_buffer *bb, *n;
+
+	list_for_each_entry_safe(bb, n, &ps->chunk_buffer_list, list)
+		free_chunk_buffer(ps, bb);
+
+	shared_drop_header(ps);
+
+	kfree(ps);
+}
+
+static int shared_allocate_snapid(struct dm_exception_store *ps,
+				  snapid_t *snapid, int snap_of_snap, snapid_t master)
+{
+	int i;
+
+	if (snap_of_snap) {
+		DMERR("shared_allocate_snapid: this exception store doesn't support snapshots of snapshots");
+		return -EOPNOTSUPP;
+	}
+
+	for (i = 0; i < MAX_SNAPSHOTS; i++)
+		if (!(ps->snapmask & 1LL << i)) {
+			*snapid = i;
+			return 0;
+		}
+
+	DMERR("shared_allocate_snapid: limit of 64 snapshots reached");
+	return -ENOSPC;
+}
+
+static int shared_create_snapshot(struct dm_exception_store *ps, snapid_t snapid)
+{
+	if (snapid >= MAX_SNAPSHOTS) {
+		DMERR("shared_create_snapshot: invalid snapshot id %llx",
+		      (unsigned long long)snapid);
+		return -EINVAL;
+	}
+	if (ps->snapmask & 1LL << snapid) {
+		DMERR("shared_create_snapshot: snapshot with id %llx already exists",
+		      (unsigned long long)snapid);
+		return -EINVAL;
+	}
+	ps->snapmask |= 1LL << snapid;
+	ps->nr_snapshots++;
+	write_header(ps);
+	return 0;
+}
+
+static void shared_commit(struct dm_exception_store *ps);
+
+static int shared_delete_snapshot(struct dm_exception_store *ps, snapid_t idx)
+{
+	int r;
+
+	if (ps->snapmask & 1LL << idx) {
+		u64 mask;
+		DMINFO("%s %d: delete %uth snapshot.",
+		       __func__, __LINE__, (int)idx);
+
+		mask = 1ULL << idx;
+
+		r = delete_tree_range(ps, mask, 0);
+		if (!r) {
+			ps->snapmask &= ~(1LL << idx);
+			ps->nr_snapshots--;
+		}
+
+		write_header(ps);
+
+		shared_commit(ps);
+	} else {
+		BUG(); /* checked in the caller */
+	}
+
+	return r;
+}
+
+static snapid_t shared_get_next_snapid(struct dm_exception_store *ps, snapid_t snapid)
+{
+	for (; snapid < MAX_SNAPSHOTS; snapid++)
+		if ((ps->snapmask >> snapid) & 1)
+			return snapid;
+	return DM_SNAPID_T_ORIGIN;
+}
+
+static int shared_find_snapshot_chunk(struct dm_exception_store *ps, snapid_t snapid,
+				      chunk_t chunk, int write, chunk_t *result)
+{
+	unsigned levels = ps->tree_level;
+	struct etree_path path[levels + 1];
+	struct chunk_buffer *leafbuf;
+
+	leafbuf = probe(ps, (u64)chunk, path);
+	if (!leafbuf) /* should make the snapshot invalid */
+		return -1;
+
+	return snapshot_chunk_unique(buffer2leaf(leafbuf), chunk, snapid, result);
+}
+
+static void shared_reset_query(struct dm_exception_store *ps)
+{
+}
+
+static int shared_query_next_remap(struct dm_exception_store *ps, chunk_t chunk)
+{
+	unsigned levels = ps->tree_level;
+	struct etree_path path[levels + 1];
+	struct chunk_buffer *leafbuf;
+
+	leafbuf = probe(ps, (u64)chunk, path);
+	if (!leafbuf) /* should make the snapshot invalid */
+		return -1;
+
+	ps->chunk_to_add = chunk;
+
+	return !origin_chunk_unique(buffer2leaf(leafbuf), chunk, ps->snapmask);
+}
+
+static void shared_add_next_remap(struct dm_exception_store *ps,
+				  union chunk_descriptor *cd, chunk_t *new_chunk)
+{
+	struct chunk_buffer *cb;
+	struct etree_path path[ps->tree_level + 1];
+	chunk_t chunk = ps->chunk_to_add;
+	int ret;
+
+	/*
+	 * Mikulas: TODO: we could set just bits that we are really relocating.
+	 * But setting more bits won't cause incorrect behavior, it'll just
+	 * temporarily block access to already remapped snapshots.
+ */ + cd->bitmask = ps->snapmask; + + cb = probe(ps, chunk, path); + if (!cb) + return; + + ret = origin_chunk_unique(buffer2leaf(cb), chunk, ps->snapmask); + if (ret) { + DM_MULTISNAP_SET_ERROR(ps->dm, -EFSERROR, + ("%s %d: bug %llu %d", __func__, __LINE__, + (unsigned long long)chunk, ret)); + return; + } + + *new_chunk = shared_allocate_chunk(ps); + if (!*new_chunk) + return; + + add_exception_to_tree(ps, cb, chunk, *new_chunk, -1, path, + ps->tree_level); + + /*DMINFO("%s %d: allocated new chunk, %llu", __func__, __LINE__, + (unsigned long long)*new_chunk);*/ +} + +static int shared_check_conflict(struct dm_exception_store *ps, + union chunk_descriptor *cd, snapid_t snapid) +{ + return !!(cd->bitmask & (1LL << snapid)); +} + +static void shared_commit(struct dm_exception_store *ps) +{ + struct chunk_buffer *b, *n; + + /* Write bitmap */ + chunk_io(ps, ps->cur_bitmap_chunk, WRITE, ps->bitmap); + + list_for_each_entry_safe(b, n, &ps->chunk_buffer_dirty_list, + dirty_list) { + + list_del_init(&b->dirty_list); + list_move_tail(&b->list, &ps->chunk_buffer_list); + + /* todo: can be async */ + chunk_io(ps, b->chunk, WRITE, b->data); + } + + if (ps->header_dirty) + write_header(ps); + + list_for_each_entry_safe(b, n, &ps->chunk_buffer_list, list) { + if (ps->nr_chunk_buffers < MAX_CHUNK_BUFFERS) + break; + + free_chunk_buffer(ps, b); + } +} + +struct dm_multisnap_exception_store dm_multisnap_daniel_store = { + .name = "daniel", + .module = THIS_MODULE, + .init_exception_store = shared_init, + .exit_exception_store = shared_destroy, + .allocate_snapid = shared_allocate_snapid, + .create_snapshot = shared_create_snapshot, + .delete_snapshot = shared_delete_snapshot, + .get_next_snapid = shared_get_next_snapid, + .find_snapshot_chunk = shared_find_snapshot_chunk, + .reset_query = shared_reset_query, + .query_next_remap = shared_query_next_remap, + .add_next_remap = shared_add_next_remap, + .check_conflict = shared_check_conflict, + .commit = shared_commit, +}; + +static 
int __init dm_multisnapshot_daniel_module_init(void) +{ + return dm_multisnap_register_exception_store(&dm_multisnap_daniel_store); +} + +static void __exit dm_multisnapshot_daniel_module_exit(void) +{ + dm_multisnap_unregister_exception_store(&dm_multisnap_daniel_store); +} + +module_init(dm_multisnapshot_daniel_module_init); +module_exit(dm_multisnapshot_daniel_module_exit); + +MODULE_DESCRIPTION(DM_NAME " multisnapshot Fujita/Daniel's exceptions store"); +MODULE_AUTHOR("Fujita Tomonorig, Daniel Phillips"); +MODULE_LICENSE("GPL"); -- 1.6.5.2 ^ permalink raw reply related [flat|nested] 22+ messages in thread
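For readers following the snapshot-ID bookkeeping in the patch above (shared_allocate_snapid, shared_create_snapshot, shared_delete_snapshot and shared_get_next_snapid), here is a minimal userspace sketch of the same 64-bit bitmask idea. It is an illustration only, not the code from the patch: struct snap_store, SNAPID_NONE and the function names are hypothetical stand-ins, there is no on-disk header or B-tree work, and it shifts an unsigned 64-bit constant where the patch uses 1LL, an assumption made here so that bit 63 is well defined.

```c
#include <stdint.h>

#define MAX_SNAPSHOTS 64   /* same limit the patch reports in its DMERR message */
#define SNAPID_NONE   (-1) /* hypothetical sentinel; the patch returns DM_SNAPID_T_ORIGIN */

/* Hypothetical stand-in for the two fields the patch keeps in its store. */
struct snap_store {
	uint64_t snapmask;     /* bit i set means snapshot id i exists */
	unsigned nr_snapshots; /* number of bits set in snapmask */
};

/* Return the lowest unused snapshot id, or SNAPID_NONE if all 64 are taken. */
static int allocate_snapid(const struct snap_store *s)
{
	for (int i = 0; i < MAX_SNAPSHOTS; i++)
		if (!(s->snapmask & (UINT64_C(1) << i)))
			return i;
	return SNAPID_NONE;
}

/* Mark an id as in use; fail on an out-of-range or already existing id. */
static int create_snapshot(struct snap_store *s, int id)
{
	if (id < 0 || id >= MAX_SNAPSHOTS)
		return -1;
	if (s->snapmask & (UINT64_C(1) << id))
		return -1;
	s->snapmask |= UINT64_C(1) << id;
	s->nr_snapshots++;
	return 0;
}

/* Clear an id; fail if it is out of range or was never created. */
static int delete_snapshot(struct snap_store *s, int id)
{
	if (id < 0 || id >= MAX_SNAPSHOTS)
		return -1;
	if (!(s->snapmask & (UINT64_C(1) << id)))
		return -1;
	s->snapmask &= ~(UINT64_C(1) << id);
	s->nr_snapshots--;
	return 0;
}

/* Return the first existing id >= id, or SNAPID_NONE when none is left. */
static int next_snapid(const struct snap_store *s, int id)
{
	if (id < 0)
		id = 0;
	for (; id < MAX_SNAPSHOTS; id++)
		if ((s->snapmask >> id) & 1)
			return id;
	return SNAPID_NONE;
}
```

Starting from an empty store, allocate_snapid() hands out id 0, and after create_snapshot() on that id the next allocation hands out id 1. Iterating with next_snapid() visits only the ids whose bits are set, which is how the sketch mirrors the walk in shared_get_next_snapid.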
* Re: [PATCH 14/14] dm-multisnap-daniel
2010-03-02 0:23 ` [PATCH 14/14] dm-multisnap-daniel Mike Snitzer
@ 2010-03-02 0:57 ` FUJITA Tomonori
0 siblings, 0 replies; 22+ messages in thread
From: FUJITA Tomonori @ 2010-03-02 0:57 UTC (permalink / raw)
To: dm-devel; +Cc: mpatocka
On Mon, 1 Mar 2010 19:23:58 -0500
Mike Snitzer <snitzer@redhat.com> wrote:
> From: Mikulas Patocka <mpatocka@redhat.com>
>
> An implementation of snapshot store.
> Designed by Daniel Phillips, ported to kernelspace by Fujita Tomonorig.
Thanks for putting my name here, however, I prefer "Fujita Tomonori" :)
> This implementation plugs as a module into dm-multisnapshot interface.
>
> The interface is almost the same as in Mikulas' exception store. You have to
> specify "daniel" instead of "mikulas" as a type of exception store.
> The only supported chunk size is 16384 bytes (but the code is general enough and
> this restriction may be removed in the future).
>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> ---
> drivers/md/Kconfig | 14 +
> drivers/md/Makefile | 2 +
> drivers/md/dm-multisnap-daniel.c | 1711 ++++++++++++++++++++++++++++++++++++++
> 3 files changed, 1727 insertions(+), 0 deletions(-)
> create mode 100644 drivers/md/dm-multisnap-daniel.c
>
> diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
> index 0c3ce3b..c30bf70 100644
> --- a/drivers/md/Kconfig
> +++ b/drivers/md/Kconfig
> @@ -268,6 +268,20 @@ config DM_MULTISNAPSHOT_MIKULAS
> A B+-tree-based log-structured storage allowing unlimited
> number of snapshots.
>
> +config DM_MULTISNAPSHOT_DANIEL
> + tristate "Daniel's snapshot store"
> + depends on DM_MULTISNAPSHOT
> + ---help---
> + Daniel Philips' exception store. The data structures were
> + designed by Daniel Phillips for Zumastore project, a porting
> + to kernel space was done by Fujita Tomonorig.
s/Tomonorig/Tomonori/;
(snip)
> +static int __init dm_multisnapshot_daniel_module_init(void)
> +{
> + return dm_multisnap_register_exception_store(&dm_multisnap_daniel_store);
> +}
> +
> +static void __exit dm_multisnapshot_daniel_module_exit(void)
> +{
> + dm_multisnap_unregister_exception_store(&dm_multisnap_daniel_store);
> +}
> +
> +module_init(dm_multisnapshot_daniel_module_init);
> +module_exit(dm_multisnapshot_daniel_module_exit);
> +
> +MODULE_DESCRIPTION(DM_NAME " multisnapshot Fujita/Daniel's exceptions store");
> +MODULE_AUTHOR("Fujita Tomonorig, Daniel Phillips");
s/Tomonorig/Tomonori/;
> +MODULE_LICENSE("GPL");
Thanks,
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 00/14] mikulas' shared snapshot patches
2010-03-02 0:23 [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer
` (13 preceding siblings ...)
2010-03-02 0:23 ` [PATCH 14/14] dm-multisnap-daniel Mike Snitzer
@ 2010-03-02 0:32 ` Mike Snitzer
2010-03-02 14:56 ` Mike Snitzer
15 siblings, 0 replies; 22+ messages in thread
From: Mike Snitzer @ 2010-03-02 0:32 UTC (permalink / raw)
To: dm-devel
On Mon, Mar 01 2010 at 7:23pm -0500,
Mike Snitzer <snitzer@redhat.com> wrote:
> But the primary difference with this submission (when compared to your
> r15 patches) is I editted the patches for whitespace and typos. I'm
> _really_ not trying to step on your hard work by doing this
> superficial stuff. But while reviewing the code the insanely long
> lines really were distracting. I tried very hard to preserve the
> intent of the DM_MULTISNAP_SET_ERROR/DM_ERR messages by still having
> grep'able content (on a single line).
>
> I also didn't go crazy like a checkpatch.pl zealot.. I didn't even run
> these patches through checkpatch!
>
> I know how sensitive you are about allowing the editor do the wrapping
> but I trully think the length of some lines would never get past
> Alasdair (or Linus) -- even though they have relaxed the rules for
> line length.
>
> I'll respond to this cover-letter with a single incremental patch that
> shows my edits.
As promised: drivers/md/dm-bufio.c | 132 +++++++--------- drivers/md/dm-bufio.h | 11 +- drivers/md/dm-multisnap-alloc.c | 75 +++++---- drivers/md/dm-multisnap-blocks.c | 57 ++++--- drivers/md/dm-multisnap-btree.c | 253 +++++++++++++++++------------- drivers/md/dm-multisnap-commit.c | 36 +++-- drivers/md/dm-multisnap-daniel.c | 52 ++++--- drivers/md/dm-multisnap-delete.c | 11 +- drivers/md/dm-multisnap-freelist.c | 45 +++--- drivers/md/dm-multisnap-io.c | 26 ++-- drivers/md/dm-multisnap-mikulas-struct.h | 24 ++-- drivers/md/dm-multisnap-mikulas.c | 119 +++++++++------ drivers/md/dm-multisnap-mikulas.h | 88 +++++++--- drivers/md/dm-multisnap-snaps.c | 135 ++++++++++------- drivers/md/dm-multisnap.c | 203 +++++++++++++----------- drivers/md/dm-multisnap.h | 51 ++++-- 16 files changed, 755 insertions(+), 563 deletions(-) diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c index c158622..44dbb0e 100644 --- a/drivers/md/dm-bufio.c +++ b/drivers/md/dm-bufio.c @@ -40,10 +40,10 @@ * * In case of memory pressure, the buffer may be written after * dm_bufio_mark_buffer_dirty, but before dm_bufio_write_dirty_buffers. - * So, dm_bufio_write_dirty_buffers guarantees that the buffer is on-disk, + * So dm_bufio_write_dirty_buffers guarantees that the buffer is on-disk * but the actual writing may occur earlier. * - * dm_bufio_release_move --- like dm_bufio_release, and also move the buffer to + * dm_bufio_release_move --- like dm_bufio_release but also move the buffer to * the new block. dm_bufio_write_dirty_buffers is needed to commit the new * block. * dm_bufio_drop_buffers --- clear all buffers. @@ -76,7 +76,7 @@ /* * Don't try to kmalloc blocks larger than this. - * For exaplanation, see dm_bufio_alloc_buffer_data below. + * For explanation, see dm_bufio_alloc_buffer_data below. */ #define DM_BUFIO_BLOCK_SIZE_KMALLOC_LIMIT PAGE_SIZE @@ -95,12 +95,11 @@ struct dm_bufio_client { * are linked to lru with their lru_list field. 
* dirty and clean buffers that are being written are linked * to dirty_lru with their lru_list field. When the write - * finishes, the buffer cannot be immediatelly relinked + * finishes, the buffer cannot be immediately relinked * (because we are in an interrupt context and relinking * requires process context), so some clean-not-writing * buffers can be held on dirty_lru too. They are later - * added to - * lru in the process context. + * added to lru in the process context. */ struct list_head lru; struct list_head dirty_lru; @@ -124,7 +123,7 @@ struct dm_bufio_client { }; /* - * A method, with wich the data is allocated: + * A method, with which the data is allocated: * kmalloc(), __get_free_pages() or vmalloc(). * See the comment at dm_bufio_alloc_buffer_data. */ @@ -158,22 +157,23 @@ struct dm_buffer { * __get_free_pages can randomly fail, if the memory is fragmented. * __vmalloc won't randomly fail, but vmalloc space is limited (it may be * as low as 128M) --- so using it for caching is not appropriate. - * If the allocation may fail, we use __get_free_pages, memory fragmentation + * If the allocation may fail we use __get_free_pages. Memory fragmentation * won't have fatal effect here, it just causes flushes of some other * buffers and more I/O will be performed. - * If the allocation shouldn't fail, we use __vmalloc. This is only for + * If the allocation shouldn't fail we use __vmalloc. This is only for * the initial reserve allocation, so there's no risk of wasting * all vmalloc space. 
*/ - -static void *dm_bufio_alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mask, char *data_mode) +static void *dm_bufio_alloc_buffer_data(struct dm_bufio_client *c, + gfp_t gfp_mask, char *data_mode) { if (c->block_size <= DM_BUFIO_BLOCK_SIZE_KMALLOC_LIMIT) { *data_mode = DATA_MODE_KMALLOC; return kmalloc(c->block_size, gfp_mask); } else if (gfp_mask & __GFP_NORETRY) { *data_mode = DATA_MODE_GET_FREE_PAGES; - return (void *)__get_free_pages(gfp_mask, c->pages_per_block_bits); + return (void *)__get_free_pages(gfp_mask, + c->pages_per_block_bits); } else { *data_mode = DATA_MODE_VMALLOC; return __vmalloc(c->block_size, gfp_mask, PAGE_KERNEL); @@ -183,8 +183,8 @@ static void *dm_bufio_alloc_buffer_data(struct dm_bufio_client *c, gfp_t gfp_mas /* * Free buffer's data. */ - -static void dm_bufio_free_buffer_data(struct dm_bufio_client *c, void *data, char data_mode) +static void dm_bufio_free_buffer_data(struct dm_bufio_client *c, + void *data, char data_mode) { switch (data_mode) { @@ -198,17 +198,16 @@ static void dm_bufio_free_buffer_data(struct dm_bufio_client *c, void *data, cha vfree(data); break; default: - printk(KERN_CRIT "dm_bufio_free_buffer_data: bad data mode: %d", data_mode); + printk(KERN_CRIT "dm_bufio_free_buffer_data: bad data mode: %d", + data_mode); BUG(); } } - /* * Allocate buffer and its data. */ - static struct dm_buffer *alloc_buffer(struct dm_bufio_client *c, gfp_t gfp_mask) { struct dm_buffer *b; @@ -227,7 +226,6 @@ static struct dm_buffer *alloc_buffer(struct dm_bufio_client *c, gfp_t gfp_mask) /* * Free buffer and its data. */ - static void free_buffer(struct dm_buffer *b) { dm_bufio_free_buffer_data(b->c, b->data, b->data_mode); @@ -238,7 +236,6 @@ static void free_buffer(struct dm_buffer *b) /* * Link buffer to the hash list and clean or dirty queue. 
*/ - static void link_buffer(struct dm_buffer *b, sector_t block, int dirty) { struct dm_bufio_client *c = b->c; @@ -251,7 +248,6 @@ static void link_buffer(struct dm_buffer *b, sector_t block, int dirty) /* * Unlink buffer from the hash list and dirty or clean queue. */ - static void unlink_buffer(struct dm_buffer *b) { BUG_ON(!b->c->n_buffers); @@ -263,7 +259,6 @@ static void unlink_buffer(struct dm_buffer *b) /* * Place the buffer to the head of dirty or clean LRU queue. */ - static void relink_lru(struct dm_buffer *b, int dirty) { struct dm_bufio_client *c = b->c; @@ -276,7 +271,6 @@ static void relink_lru(struct dm_buffer *b, int dirty) * It unplugs the underlying block device, so that coalesced I/Os in * the request queue are dispatched to the device. */ - static int do_io_schedule(void *word) { struct dm_buffer *b = container_of(word, struct dm_buffer, state); @@ -297,7 +291,6 @@ static void write_dirty_buffer(struct dm_buffer *b); * When this function finishes, there is no I/O running on the buffer * and the buffer is not dirty. */ - static void make_buffer_clean(struct dm_buffer *b) { BUG_ON(b->hold_count); @@ -311,9 +304,8 @@ static void make_buffer_clean(struct dm_buffer *b) /* * Find some buffer that is not held by anybody, clean it, unlink it and * return it. - * If "wait" is zero, try less harder and don't block. + * If "wait" is zero, try less hard and don't block. */ - static struct dm_buffer *get_unclaimed_buffer(struct dm_bufio_client *c, int wait) { struct dm_buffer *b; @@ -354,7 +346,6 @@ static struct dm_buffer *get_unclaimed_buffer(struct dm_bufio_client *c, int wai * This function is entered with c->lock held, drops it and regains it before * exiting. */ - static void wait_for_free_buffer(struct dm_bufio_client *c) { DECLARE_WAITQUEUE(wait, current); @@ -377,7 +368,6 @@ static void wait_for_free_buffer(struct dm_bufio_client *c) * * May drop the lock and regain it. 
*/ - static struct dm_buffer *alloc_buffer_wait(struct dm_bufio_client *c) { struct dm_buffer *b; @@ -413,7 +403,6 @@ retry: /* * Free a buffer and wake other threads waiting for free buffers. */ - static void free_buffer_wake(struct dm_buffer *b) { struct dm_bufio_client *c = b->c; @@ -433,7 +422,6 @@ static void free_buffer_wake(struct dm_buffer *b) * If we are over threshold_buffers, start freeing buffers. * If we're over "limit_buffers", blocks until we get under the limit. */ - static void check_watermark(struct dm_bufio_client *c) { while (c->n_buffers > c->threshold_buffers) { @@ -462,14 +450,15 @@ static void dm_bufio_dmio_complete(unsigned long error, void *context); * it is not vmalloc()ated, try using the bio interface. * * If the buffer is big, if it is vmalloc()ated or if the underlying device - * rejects the bio because it is too large, use dmio layer to do the I/O. + * rejects the bio because it is too large, use dm-io layer to do the I/O. * dmio layer splits the I/O to multiple requests, solving the above - * shorcomings. + * shortcomings. */ - -static void dm_bufio_submit_io(struct dm_buffer *b, int rw, sector_t block, bio_end_io_t *end_io) +static void dm_bufio_submit_io(struct dm_buffer *b, int rw, sector_t block, + bio_end_io_t *end_io) { - if (b->c->block_size <= DM_BUFIO_INLINE_VECS * PAGE_SIZE && b->data_mode != DATA_MODE_VMALLOC) { + if (b->c->block_size <= DM_BUFIO_INLINE_VECS * PAGE_SIZE && + b->data_mode != DATA_MODE_VMALLOC) { char *ptr; int len; bio_init(&b->bio); @@ -486,7 +475,9 @@ static void dm_bufio_submit_io(struct dm_buffer *b, int rw, sector_t block, bio_ ptr = b->data; len = b->c->block_size; do { - if (!bio_add_page(&b->bio, virt_to_page(ptr), len < PAGE_SIZE ? len : PAGE_SIZE, virt_to_phys(ptr) & (PAGE_SIZE - 1))) { + if (!bio_add_page(&b->bio, virt_to_page(ptr), + len < PAGE_SIZE ? 
len : PAGE_SIZE, + virt_to_phys(ptr) & (PAGE_SIZE - 1))) { BUG_ON(b->c->block_size <= PAGE_SIZE); goto use_dmio; } @@ -526,7 +517,6 @@ use_dmio : { * dm-io completion routine. It just calls b->bio.bi_end_io, pretending * that the request was handled directly with bio interface. */ - static void dm_bufio_dmio_complete(unsigned long error, void *context) { struct dm_buffer *b = context; @@ -537,7 +527,6 @@ static void dm_bufio_dmio_complete(unsigned long error, void *context) } /* Find a buffer in the hash. */ - static struct dm_buffer *dm_bufio_find(struct dm_bufio_client *c, sector_t block) { struct dm_buffer *b; @@ -559,8 +548,8 @@ static void read_endio(struct bio *bio, int error); * doesn't read the buffer from the disk (assuming that the caller overwrites * all the data and uses dm_bufio_mark_buffer_dirty to write new data back). */ - -static void *dm_bufio_new_read(struct dm_bufio_client *c, sector_t block, struct dm_buffer **bp, int read) +static void *dm_bufio_new_read(struct dm_bufio_client *c, sector_t block, + struct dm_buffer **bp, int read) { struct dm_buffer *b, *new_b = NULL; @@ -572,11 +561,13 @@ retry_search: if (new_b) free_buffer_wake(new_b); b->hold_count++; - relink_lru(b, test_bit(B_DIRTY, &b->state) || test_bit(B_WRITING, &b->state)); + relink_lru(b, test_bit(B_DIRTY, &b->state) || + test_bit(B_WRITING, &b->state)); unlock_wait_ret: mutex_unlock(&c->lock); wait_ret: - wait_on_bit(&b->state, B_READING, do_io_schedule, TASK_UNINTERRUPTIBLE); + wait_on_bit(&b->state, B_READING, + do_io_schedule, TASK_UNINTERRUPTIBLE); if (b->read_error) { int error = b->read_error; dm_bufio_release(b); @@ -613,16 +604,16 @@ wait_ret: } /* Read the buffer and hold reference on it */ - -void *dm_bufio_read(struct dm_bufio_client *c, sector_t block, struct dm_buffer **bp) +void *dm_bufio_read(struct dm_bufio_client *c, sector_t block, + struct dm_buffer **bp) { return dm_bufio_new_read(c, block, bp, 1); } EXPORT_SYMBOL(dm_bufio_read); /* Get the buffer with possibly 
invalid data and hold reference on it */ - -void *dm_bufio_new(struct dm_bufio_client *c, sector_t block, struct dm_buffer **bp) +void *dm_bufio_new(struct dm_bufio_client *c, sector_t block, + struct dm_buffer **bp) { return dm_bufio_new_read(c, block, bp, 0); } @@ -632,7 +623,6 @@ EXPORT_SYMBOL(dm_bufio_new); * The endio routine for reading: set the error, clear the bit and wake up * anyone waiting on the buffer. */ - static void read_endio(struct bio *bio, int error) { struct dm_buffer *b = container_of(bio, struct dm_buffer, bio); @@ -647,7 +637,6 @@ static void read_endio(struct bio *bio, int error) /* * Release the reference held on the buffer. */ - void dm_bufio_release(struct dm_buffer *b) { struct dm_bufio_client *c = b->c; @@ -677,7 +666,6 @@ EXPORT_SYMBOL(dm_bufio_release); * Mark that the data in the buffer were modified and the buffer needs to * be written back. */ - void dm_bufio_mark_buffer_dirty(struct dm_buffer *b) { struct dm_bufio_client *c = b->c; @@ -701,13 +689,13 @@ static void write_endio(struct bio *bio, int error); * Finally, submit our write and don't wait on it. We set B_WRITING indicating * that there is a write in progress. */ - static void write_dirty_buffer(struct dm_buffer *b) { if (!test_bit(B_DIRTY, &b->state)) return; clear_bit(B_DIRTY, &b->state); - wait_on_bit_lock(&b->state, B_WRITING, do_io_schedule, TASK_UNINTERRUPTIBLE); + wait_on_bit_lock(&b->state, B_WRITING, + do_io_schedule, TASK_UNINTERRUPTIBLE); dm_bufio_submit_io(b, WRITE, b->block, write_endio); } @@ -715,7 +703,6 @@ static void write_dirty_buffer(struct dm_buffer *b) * The endio routine for write. * Set the error, clear B_WRITING bit and wake anyone who was waiting on it. */ - static void write_endio(struct bio *bio, int error) { struct dm_buffer *b = container_of(bio, struct dm_buffer, bio); @@ -734,7 +721,6 @@ static void write_endio(struct bio *bio, int error) /* * Start writing all the dirty buffers. Don't wait for results. 
*/ - void dm_bufio_write_dirty_buffers_async(struct dm_bufio_client *c) { struct dm_buffer *b; @@ -756,7 +742,6 @@ EXPORT_SYMBOL(dm_bufio_write_dirty_buffers_async); * * Finally, we flush hardware disk cache. */ - int dm_bufio_write_dirty_buffers(struct dm_bufio_client *c) { int a, f; @@ -777,11 +762,13 @@ again: dropped_lock = 1; b->hold_count++; mutex_unlock(&c->lock); - wait_on_bit(&b->state, B_WRITING, do_io_schedule, TASK_UNINTERRUPTIBLE); + wait_on_bit(&b->state, B_WRITING, + do_io_schedule, TASK_UNINTERRUPTIBLE); mutex_lock(&c->lock); b->hold_count--; } else - wait_on_bit(&b->state, B_WRITING, do_io_schedule, TASK_UNINTERRUPTIBLE); + wait_on_bit(&b->state, B_WRITING, + do_io_schedule, TASK_UNINTERRUPTIBLE); } if (!test_bit(B_DIRTY, &b->state) && !test_bit(B_WRITING, &b->state)) relink_lru(b, 0); @@ -794,7 +781,7 @@ again: * relinked to the clean list, so we won't loop scanning the * same buffer again and again. * - * This may livelock if there is other thread simultaneously + * This may livelock if there is another thread simultaneously * dirtying buffers, so we count the number of buffers walked * and if it exceeds the total number of buffers, it means that * someone is doing some writes simultaneously with us --- in @@ -817,7 +804,6 @@ EXPORT_SYMBOL(dm_bufio_write_dirty_buffers); /* * Use dm-io to send and empty barrier flush the device. */ - int dm_bufio_issue_flush(struct dm_bufio_client *c) { struct dm_io_request io_req = { @@ -849,7 +835,6 @@ EXPORT_SYMBOL(dm_bufio_issue_flush); * location but not relink it, because that other user needs to have the buffer * at the same place. 
*/ - void dm_bufio_release_move(struct dm_buffer *b, sector_t new_block) { struct dm_bufio_client *c = b->c; @@ -873,14 +858,17 @@ retry: BUG_ON(test_bit(B_READING, &b->state)); write_dirty_buffer(b); if (b->hold_count == 1) { - wait_on_bit(&b->state, B_WRITING, do_io_schedule, TASK_UNINTERRUPTIBLE); + wait_on_bit(&b->state, B_WRITING, + do_io_schedule, TASK_UNINTERRUPTIBLE); set_bit(B_DIRTY, &b->state); unlink_buffer(b); link_buffer(b, new_block, 1); } else { - wait_on_bit_lock(&b->state, B_WRITING, do_io_schedule, TASK_UNINTERRUPTIBLE); + wait_on_bit_lock(&b->state, B_WRITING, + do_io_schedule, TASK_UNINTERRUPTIBLE); dm_bufio_submit_io(b, WRITE, new_block, write_endio); - wait_on_bit(&b->state, B_WRITING, do_io_schedule, TASK_UNINTERRUPTIBLE); + wait_on_bit(&b->state, B_WRITING, + do_io_schedule, TASK_UNINTERRUPTIBLE); } mutex_unlock(&c->lock); dm_bufio_release(b); @@ -889,15 +877,14 @@ EXPORT_SYMBOL(dm_bufio_release_move); /* * Free all the buffers (and possibly write them if they were dirty) - * It is required that the calling theread doesn't have any reference on + * It is required that the calling thread doesn't have any reference on * any buffer. */ - void dm_bufio_drop_buffers(struct dm_bufio_client *c) { struct dm_buffer *b; - /* an optimization ... so that the buffers are not writte one-by-one */ + /* an optimization ... 
so that the buffers are not written one-by-one */ dm_bufio_write_dirty_buffers_async(c); mutex_lock(&c->lock); @@ -910,8 +897,9 @@ void dm_bufio_drop_buffers(struct dm_bufio_client *c) EXPORT_SYMBOL(dm_bufio_drop_buffers); /* Create the buffering interface */ - -struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsigned blocksize, unsigned flags, __u64 cache_threshold, __u64 cache_limit) +struct dm_bufio_client * +dm_bufio_client_create(struct block_device *bdev, unsigned blocksize, + unsigned flags, __u64 cache_threshold, __u64 cache_limit) { int r; struct dm_bufio_client *c; @@ -928,7 +916,8 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign c->bdev = bdev; c->block_size = blocksize; c->sectors_per_block_bits = ffs(blocksize) - 1 - SECTOR_SHIFT; - c->pages_per_block_bits = ffs(blocksize) - 1 >= PAGE_SHIFT ? ffs(blocksize) - 1 - PAGE_SHIFT : 0; + c->pages_per_block_bits = (ffs(blocksize) - 1 >= PAGE_SHIFT) ? + (ffs(blocksize) - 1 - PAGE_SHIFT) : 0; INIT_LIST_HEAD(&c->lru); INIT_LIST_HEAD(&c->dirty_lru); for (i = 0; i < DM_BUFIO_HASH_SIZE; i++) @@ -938,7 +927,8 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign if (!cache_limit) cache_limit = DM_BUFIO_LIMIT_MEMORY; - c->limit_buffers = cache_limit >> (c->sectors_per_block_bits + SECTOR_SHIFT); + c->limit_buffers = cache_limit >> + (c->sectors_per_block_bits + SECTOR_SHIFT); if (!c->limit_buffers) c->limit_buffers = 1; @@ -946,12 +936,11 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign cache_threshold = DM_BUFIO_THRESHOLD_MEMORY; if (cache_threshold > cache_limit) cache_threshold = cache_limit; - c->threshold_buffers = cache_threshold >> (c->sectors_per_block_bits + SECTOR_SHIFT); + c->threshold_buffers = cache_threshold >> + (c->sectors_per_block_bits + SECTOR_SHIFT); if (!c->threshold_buffers) c->threshold_buffers = 1; - /*printk("%d %d\n", c->limit_buffers, c->threshold_buffers);*/ - 
init_waitqueue_head(&c->free_buffer_wait); c->async_write_error = 0; @@ -983,7 +972,6 @@ EXPORT_SYMBOL(dm_bufio_client_create); * Free the buffering interface. * It is required that there are no references on any buffers. */ - void dm_bufio_client_destroy(struct dm_bufio_client *c) { unsigned i; diff --git a/drivers/md/dm-bufio.h b/drivers/md/dm-bufio.h index 7abc035..3261ea2 100644 --- a/drivers/md/dm-bufio.h +++ b/drivers/md/dm-bufio.h @@ -12,8 +12,10 @@ struct dm_bufio_client; struct dm_buffer; -void *dm_bufio_read(struct dm_bufio_client *c, sector_t block, struct dm_buffer **bp); -void *dm_bufio_new(struct dm_bufio_client *c, sector_t block, struct dm_buffer **bp); +void *dm_bufio_read(struct dm_bufio_client *c, sector_t block, + struct dm_buffer **bp); +void *dm_bufio_new(struct dm_bufio_client *c, sector_t block, + struct dm_buffer **bp); void dm_bufio_release(struct dm_buffer *b); void dm_bufio_mark_buffer_dirty(struct dm_buffer *b); @@ -23,7 +25,10 @@ int dm_bufio_issue_flush(struct dm_bufio_client *c); void dm_bufio_release_move(struct dm_buffer *b, sector_t new_block); -struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsigned blocksize, unsigned flags, __u64 cache_threshold, __u64 cache_limit); +struct dm_bufio_client * +dm_bufio_client_create(struct block_device *bdev, unsigned blocksize, + unsigned flags, __u64 cache_threshold, + __u64 cache_limit); void dm_bufio_client_destroy(struct dm_bufio_client *c); void dm_bufio_drop_buffers(struct dm_bufio_client *c); diff --git a/drivers/md/dm-multisnap-alloc.c b/drivers/md/dm-multisnap-alloc.c index 482ed54..02f89be 100644 --- a/drivers/md/dm-multisnap-alloc.c +++ b/drivers/md/dm-multisnap-alloc.c @@ -16,7 +16,6 @@ /* * Initialize the root bitmap, write it at the position "writing block". 
*/ - void dm_multisnap_create_bitmaps(struct dm_exception_store *s, chunk_t *writing_block) { struct dm_buffer *bp; @@ -27,18 +26,23 @@ void dm_multisnap_create_bitmaps(struct dm_exception_store *s, chunk_t *writing_ (*writing_block)++; if (*writing_block >= s->dev_size) { - DM_MULTISNAP_SET_ERROR(s->dm, -ENOSPC, ("dm_multisnap_create_bitmaps: device is too small")); + DM_MULTISNAP_SET_ERROR(s->dm, -ENOSPC, + ("dm_multisnap_create_bitmaps: device is too small")); return; } if (*writing_block >= s->chunk_size << BITS_PER_BYTE_SHIFT) { - DM_MULTISNAP_SET_ERROR(s->dm, -ENOSPC, ("dm_multisnap_create_bitmaps: invalid block to write: %llx", (unsigned long long)*writing_block)); + DM_MULTISNAP_SET_ERROR(s->dm, -ENOSPC, + ("dm_multisnap_create_bitmaps: invalid block to write: %llx", + (unsigned long long)*writing_block)); return; } bmp = dm_bufio_new(s->bufio, *writing_block, &bp); if (IS_ERR(bmp)) { - DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(bmp), ("dm_multisnap_create_bitmaps: can't create direct bitmap block at %llx", (unsigned long long)*writing_block)); + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(bmp), + ("dm_multisnap_create_bitmaps: can't create direct bitmap block at %llx", + (unsigned long long)*writing_block)); return; } cond_resched(); @@ -64,10 +68,9 @@ static void dm_multisnap_add_bitmap(struct dm_exception_store *s); /* * Extend bitmaps to cover "new_size" area. * - * While we extend bitmaps, we increase s->dev_size, so that the newly mapped + * While we extend bitmaps we increase s->dev_size so that the newly mapped * space can be used to hold further bitmaps. */ - void dm_multisnap_extend_bitmaps(struct dm_exception_store *s, chunk_t new_size) { while (s->dev_size < new_size) { @@ -103,7 +106,6 @@ void dm_multisnap_extend_bitmaps(struct dm_exception_store *s, chunk_t new_size) * Add one bitmap after the last bitmap. 
A helper function for
  * dm_multisnap_extend_bitmaps
  */
-
 static void dm_multisnap_add_bitmap(struct dm_exception_store *s)
 {
 	struct path_element path[MAX_BITMAP_DEPTH];
@@ -171,8 +173,8 @@ static void dm_multisnap_add_bitmap(struct dm_exception_store *s)
  * Return the pointer to the data, store the held buffer to bl.
  * Return the block in block and path in path.
  */
-
-void *dm_multisnap_map_bitmap(struct dm_exception_store *s, bitmap_t bitmap, struct dm_buffer **bp, chunk_t *block, struct path_element *path)
+void *dm_multisnap_map_bitmap(struct dm_exception_store *s, bitmap_t bitmap,
+			      struct dm_buffer **bp, chunk_t *block, struct path_element *path)
 {
 	__u64 *bmp;
 	unsigned idx;
@@ -184,14 +186,15 @@ void *dm_multisnap_map_bitmap(struct dm_exception_store *s, bitmap_t bitmap, str
 		bmp = dm_multisnap_read_block(s, blk, bp);
 		if (unlikely(!bmp)) {
 			/* error is already set in dm_multisnap_read_block */
-			DMERR("dm_multisnap_map_bitmap: can't read bitmap at %llx (%llx), pointed to by %llx (%llx), depth %d/%d, index %llx",
-				(unsigned long long)blk,
-				(unsigned long long)dm_multisnap_remap_block(s, blk),
-				(unsigned long long)parent,
-				(unsigned long long)dm_multisnap_remap_block(s, parent),
-				s->bitmap_depth - d,
-				s->bitmap_depth,
-				(unsigned long long)bitmap);
+			DMERR("dm_multisnap_map_bitmap: can't read bitmap at "
+			      "%llx (%llx), pointed to by %llx (%llx), depth %d/%d, index %llx",
+			      (unsigned long long)blk,
+			      (unsigned long long)dm_multisnap_remap_block(s, blk),
+			      (unsigned long long)parent,
+			      (unsigned long long)dm_multisnap_remap_block(s, parent),
+			      s->bitmap_depth - d,
+			      s->bitmap_depth,
+			      (unsigned long long)bitmap);
 			return NULL;
 		}
 		if (!d) {
@@ -200,7 +203,8 @@ void *dm_multisnap_map_bitmap(struct dm_exception_store *s, bitmap_t bitmap, str
 			return bmp;
 		}

-		idx = (bitmap >> ((d - 1) * (s->chunk_shift - BYTES_PER_POINTER_SHIFT))) & ((s->chunk_size - 1) >> BYTES_PER_POINTER_SHIFT);
+		idx = (bitmap >> ((d - 1) * (s->chunk_shift - BYTES_PER_POINTER_SHIFT))) &
+			((s->chunk_size - 1) >> BYTES_PER_POINTER_SHIFT);

 		if (unlikely(path != NULL)) {
 			path[s->bitmap_depth - d].block = blk;
@@ -221,7 +225,6 @@ void *dm_multisnap_map_bitmap(struct dm_exception_store *s, bitmap_t bitmap, str
  * Find a free bit from "start" to "end" (in bits).
  * If wide_search is nonzero, search for the whole free byte first.
  */
-
 static int find_bit(const void *bmp, unsigned start, unsigned end, int wide_search)
 {
 	const void *p;
@@ -258,7 +261,6 @@ ret_bit:
  * to find the valid number of bits. Note that bits past s->dev_size are
  * undefined, there can be anything, so we must not scan past this limit.
  */
-
 static unsigned bitmap_limit(struct dm_exception_store *s, bitmap_t bmp)
 {
 	if (bmp == (bitmap_t)(s->dev_size >> (s->chunk_shift + BITS_PER_BYTE_SHIFT)))
@@ -287,8 +289,8 @@ static unsigned bitmap_limit(struct dm_exception_store *s, bitmap_t bmp)
  * This is similar to what ext[23] does, so I suppose it is tuned well enough
  * that it won't fragment too much.
  */
-
-int dm_multisnap_alloc_blocks(struct dm_exception_store *s, chunk_t *results, unsigned n_blocks, int flags)
+int dm_multisnap_alloc_blocks(struct dm_exception_store *s, chunk_t *results,
+			      unsigned n_blocks, int flags)
 {
 	void *bmp;
 	struct dm_buffer *bp;
@@ -427,7 +429,8 @@ bp_release_return:
  * block was created since last commit.
  */

-void *dm_multisnap_alloc_duplicate_block(struct dm_exception_store *s, chunk_t block, struct dm_buffer **bp, void *ptr)
+void *dm_multisnap_alloc_duplicate_block(struct dm_exception_store *s, chunk_t block,
+					 struct dm_buffer **bp, void *ptr)
 {
 	int r;
 	chunk_t new_chunk;
@@ -446,15 +449,16 @@ void *dm_multisnap_alloc_duplicate_block(struct dm_exception_store *s, chunk_t b
 	if (!data)
 		return NULL;

-	return dm_multisnap_duplicate_block(s, block, new_chunk, CB_BITMAP_IDX_NONE, bp, NULL);
+	return dm_multisnap_duplicate_block(s, block, new_chunk,
+					    CB_BITMAP_IDX_NONE, bp, NULL);
 }

 /*
  * Allocate a new block and return its data. Return the block number in *result
  * and buffer pointer in *bp.
  */
-
-void *dm_multisnap_alloc_make_block(struct dm_exception_store *s, chunk_t *result, struct dm_buffer **bp)
+void *dm_multisnap_alloc_make_block(struct dm_exception_store *s, chunk_t *result,
+				    struct dm_buffer **bp)
 {
 	int r = dm_multisnap_alloc_blocks(s, result, 1, 0);
 	if (unlikely(r < 0))
@@ -464,16 +468,16 @@ void *dm_multisnap_alloc_make_block(struct dm_exception_store *s, chunk_t *resul
 }

 /*
- * Free the block immediatelly. You must be careful with this function because
+ * Free the block immediately. You must be careful with this function because
  * it doesn't follow log-structured protocol.
  *
  * It may be used only if
  * - the blocks to free were allocated since last transactions.
- * - or from freelist management, that makes the blocks is already recorded in
+ * - or from freelist management, which means the blocks were already recorded in
  *   a freelist (thus it would be freed again in case of machine crash).
  */
-
-void dm_multisnap_free_blocks_immediate(struct dm_exception_store *s, chunk_t block, unsigned n_blocks)
+void dm_multisnap_free_blocks_immediate(struct dm_exception_store *s, chunk_t block,
+					unsigned n_blocks)
 {
 	void *bmp;
 	struct dm_buffer *bp;
@@ -482,7 +486,9 @@ void dm_multisnap_free_blocks_immediate(struct dm_exception_store *s, chunk_t bl
 		return;

 	if (unlikely(block + n_blocks > s->dev_size)) {
-		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_free_block_immediate: freeing invalid blocks %llx, %x", (unsigned long long)block, n_blocks));
+		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+				       ("dm_multisnap_free_block_immediate: freeing invalid blocks %llx, %x",
+					(unsigned long long)block, n_blocks));
 		return;
 	}

@@ -515,7 +521,6 @@ void dm_multisnap_free_blocks_immediate(struct dm_exception_store *s, chunk_t bl
  * Flush tmp_remaps for bitmaps. Write the path from modified bitmaps to the
  * root.
  */
-
 void dm_multisnap_bitmap_finalize_tmp_remap(struct dm_exception_store *s, struct tmp_remap *tmp_remap)
 {
 	chunk_t block;
@@ -533,7 +538,8 @@ void dm_multisnap_bitmap_finalize_tmp_remap(struct dm_exception_store *s, struct
 	 * doesn't have to allocate anything.
 	 */
 	if (s->n_preallocated_blocks < s->bitmap_depth) {
-		if (unlikely(dm_multisnap_alloc_blocks(s, s->preallocated_blocks + s->n_preallocated_blocks, s->bitmap_depth * 2 - s->n_preallocated_blocks, 0) < 0))
+		if (unlikely(dm_multisnap_alloc_blocks(s, s->preallocated_blocks + s->n_preallocated_blocks,
+						       s->bitmap_depth * 2 - s->n_preallocated_blocks, 0) < 0))
 			return;
 		s->n_preallocated_blocks = s->bitmap_depth * 2;
 	}
@@ -579,5 +585,6 @@ void dm_multisnap_bitmap_finalize_tmp_remap(struct dm_exception_store *s, struct
 	s->bitmap_root = new_blockn;

 skip_it:
-	memmove(s->preallocated_blocks, s->preallocated_blocks + results_ptr, (s->n_preallocated_blocks -= results_ptr) * sizeof(chunk_t));
+	memmove(s->preallocated_blocks, s->preallocated_blocks + results_ptr,
+		(s->n_preallocated_blocks -= results_ptr) * sizeof(chunk_t));
 }
diff --git a/drivers/md/dm-multisnap-blocks.c b/drivers/md/dm-multisnap-blocks.c
index 2b53cd7..8715ed9 100644
--- a/drivers/md/dm-multisnap-blocks.c
+++ b/drivers/md/dm-multisnap-blocks.c
@@ -11,13 +11,14 @@
 /*
  * Check that the block is valid.
  */
-
 static int check_invalid(struct dm_exception_store *s, chunk_t block)
 {
 	if (unlikely(block >= s->dev_size) ||
 	    unlikely(block == SB_BLOCK) ||
 	    unlikely(dm_multisnap_is_commit_block(s, block))) {
-		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("check_invalid: access to invalid part of the device: %llx, size %llx", (unsigned long long)block, (unsigned long long)s->dev_size));
+		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+				       ("check_invalid: access to invalid part of the device: %llx, size %llx",
+					(unsigned long long)block, (unsigned long long)s->dev_size));
 		return 1;
 	}
 	return 0;
@@ -39,7 +40,6 @@ static struct tmp_remap *find_tmp_remap(struct dm_exception_store *s, chunk_t bl
 /*
  * Remap a block number according to tmp_remap table.
  */
-
 chunk_t dm_multisnap_remap_block(struct dm_exception_store *s, chunk_t block)
 {
 	struct tmp_remap *t;
@@ -55,8 +55,8 @@ chunk_t dm_multisnap_remap_block(struct dm_exception_store *s, chunk_t block)
  *
  * Do a possible block remapping according to tmp_remap table.
  */
-
-void *dm_multisnap_read_block(struct dm_exception_store *s, chunk_t block, struct dm_buffer **bp)
+void *dm_multisnap_read_block(struct dm_exception_store *s, chunk_t block,
+			      struct dm_buffer **bp)
 {
 	void *buf;
 	cond_resched();
@@ -71,7 +71,9 @@ void *dm_multisnap_read_block(struct dm_exception_store *s, chunk_t block, struc

 	buf = dm_bufio_read(s->bufio, block, bp);
 	if (unlikely(IS_ERR(buf))) {
-		DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(buf), ("dm_multisnap_read_block: error read chunk %llx", (unsigned long long)block));
+		DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(buf),
+				       ("dm_multisnap_read_block: error read chunk %llx",
+					(unsigned long long)block));
 		return NULL;
 	}
 	return buf;
@@ -90,7 +92,6 @@ struct uncommitted_record {
  * This function is used for optimizations, if it returns 0
  * it doesn't break correctness, it only degrades performance.
  */
-
 int dm_multisnap_block_is_uncommitted(struct dm_exception_store *s, chunk_t block)
 {
 	struct tmp_remap *t;
@@ -120,7 +121,6 @@ int dm_multisnap_block_is_uncommitted(struct dm_exception_store *s, chunk_t bloc
  * We can't use non-failing allocation because it could deadlock (wait for some
  * pages being written and that write could be directed through this driver).
  */
-
 void dm_multisnap_block_set_uncommitted(struct dm_exception_store *s, chunk_t block)
 {
 	struct uncommitted_record *ur;
@@ -131,7 +131,8 @@ void dm_multisnap_block_set_uncommitted(struct dm_exception_store *s, chunk_t bl
 	 * __GFP_NOMEMALLOC makes it less aggressive if the allocator recurses
 	 * into itself.
 	 */
-	ur = kmalloc(sizeof(struct uncommitted_record), GFP_NOWAIT | __GFP_NOWARN | __GFP_NOMEMALLOC);
+	ur = kmalloc(sizeof(struct uncommitted_record),
+		     GFP_NOWAIT | __GFP_NOWARN | __GFP_NOMEMALLOC);
 	if (!ur)
 		return;
 	ur->block = block;
@@ -142,14 +143,14 @@ void dm_multisnap_block_set_uncommitted(struct dm_exception_store *s, chunk_t bl
  * Clear the register of uncommitted blocks. This is called on commit and
  * on unload.
  */
-
 void dm_multisnap_clear_uncommitted(struct dm_exception_store *s)
 {
 	int i;
 	for (i = 0; i < UNCOMMITTED_BLOCK_HASH_SIZE; i++) {
 		struct hlist_head *h = &s->uncommitted_blocks[i];
 		while (!hlist_empty(h)) {
-			struct uncommitted_record *ur = hlist_entry(h->first, struct uncommitted_record, hash);
+			struct uncommitted_record *ur =
+				hlist_entry(h->first, struct uncommitted_record, hash);
 			hlist_del(&ur->hash);
 			kfree(ur);
 		}
@@ -170,8 +171,9 @@ void dm_multisnap_clear_uncommitted(struct dm_exception_store *s)
  * A block that needs to be freed is returned in to_free. If to_free is NULL,
  * that block is freed immediatelly.
  */
-
-void *dm_multisnap_duplicate_block(struct dm_exception_store *s, chunk_t old_chunk, chunk_t new_chunk, bitmap_t bitmap_idx, struct dm_buffer **bp, chunk_t *to_free_ptr)
+void *dm_multisnap_duplicate_block(struct dm_exception_store *s, chunk_t old_chunk,
+				   chunk_t new_chunk, bitmap_t bitmap_idx,
+				   struct dm_buffer **bp, chunk_t *to_free_ptr)
 {
 	chunk_t to_free_val;
 	void *buf;
@@ -188,14 +190,17 @@ void *dm_multisnap_duplicate_block(struct dm_exception_store *s, chunk_t old_chu
 	t = find_tmp_remap(s, old_chunk);
 	if (t) {
 		if (unlikely(t->bitmap_idx != bitmap_idx)) {
-			DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_duplicate_block: bitmap_idx doesn't match, %X != %X", t->bitmap_idx, bitmap_idx));
+			DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+					       ("dm_multisnap_duplicate_block: bitmap_idx doesn't match, %X != %X",
+						t->bitmap_idx, bitmap_idx));
 			return NULL;
 		}
 		*to_free_ptr = t->new;
 		t->new = new_chunk;
 	} else {
 		if (unlikely(list_empty(&s->free_tmp_remaps))) {
-			DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_duplicate_block: all remap blocks used"));
+			DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+					       ("dm_multisnap_duplicate_block: all remap blocks used"));
 			return NULL;
 		}
 		t = list_first_entry(&s->free_tmp_remaps, struct tmp_remap, list);
@@ -218,7 +223,9 @@ void *dm_multisnap_duplicate_block(struct dm_exception_store *s, chunk_t old_chu

 	buf = dm_bufio_read(s->bufio, new_chunk, bp);
 	if (IS_ERR(buf)) {
-		DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(buf), ("dm_multisnap_duplicate_block: error reading chunk %llx", (unsigned long long)new_chunk));
+		DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(buf),
+				       ("dm_multisnap_duplicate_block: error reading chunk %llx",
+					(unsigned long long)new_chunk));
 		return NULL;
 	}
 	return buf;
@@ -227,7 +234,6 @@ void *dm_multisnap_duplicate_block(struct dm_exception_store *s, chunk_t old_chu
 /*
  * Remove an entry from tmp_remap table.
  */
-
 void dm_multisnap_free_tmp_remap(struct dm_exception_store *s, struct tmp_remap *t)
 {
 	list_del(&t->list);
@@ -241,8 +247,8 @@ void dm_multisnap_free_tmp_remap(struct dm_exception_store *s, struct tmp_remap
  * It is expected that the caller fills all the data in the block, calls
  * dm_bufio_mark_buffer_dirty and releases the buffer.
  */
-
-void *dm_multisnap_make_block(struct dm_exception_store *s, chunk_t new_chunk, struct dm_buffer **bp)
+void *dm_multisnap_make_block(struct dm_exception_store *s, chunk_t new_chunk,
+			      struct dm_buffer **bp)
 {
 	void *buf;

@@ -253,7 +259,9 @@ void *dm_multisnap_make_block(struct dm_exception_store *s, chunk_t new_chunk, s

 	buf = dm_bufio_new(s->bufio, new_chunk, bp);
 	if (unlikely(IS_ERR(buf))) {
-		DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(buf), ("dm_multisnap_make_block: error creating new block at chunk %llx", (unsigned long long)new_chunk));
+		DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(buf),
+				       ("dm_multisnap_make_block: error creating new block at chunk %llx",
+					(unsigned long long)new_chunk));
 		return NULL;
 	}
 	return buf;
@@ -262,7 +270,6 @@ void *dm_multisnap_make_block(struct dm_exception_store *s, chunk_t new_chunk, s
 /*
  * Free the given block and a possible tmp_remap shadow of it.
  */
-
 void dm_multisnap_free_block_and_duplicates(struct dm_exception_store *s, chunk_t block)
 {
 	struct tmp_remap *t;
@@ -281,7 +288,6 @@ void dm_multisnap_free_block_and_duplicates(struct dm_exception_store *s, chunk_
 /*
  * Return true if the block is a commit block.
  */
-
 int dm_multisnap_is_commit_block(struct dm_exception_store *s, chunk_t block)
 {
 	if (unlikely(block < FIRST_CB_BLOCK))
@@ -299,14 +305,13 @@ int dm_multisnap_is_commit_block(struct dm_exception_store *s, chunk_t block)
 /*
  * These two functions are used to avoid cycling on a corrupted device.
  *
- * If the data on the device are corrupted, we mark the device as errorneous,
+ * If the data on the device is corrupted, we mark the device as errorneous,
  * but we don't want to lockup the whole system. These functions help to achieve
  * this goal.
  *
  * cy->count is the number of processed blocks.
  * cy->key is the recorded block at last power-of-two count.
  */
-
 void dm_multisnap_init_stop_cycles(struct stop_cycles *cy)
 {
 	cy->key = 0;
@@ -316,7 +321,9 @@ void dm_multisnap_init_stop_cycles(struct stop_cycles *cy)
 int dm_multisnap_stop_cycles(struct dm_exception_store *s, struct stop_cycles *cy, chunk_t key)
 {
 	if (unlikely(cy->key == key) && unlikely(cy->count != 0)) {
-		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_stop_cycles: cycle detected at chunk %llx", (unsigned long long)key));
+		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+				       ("dm_multisnap_stop_cycles: cycle detected at chunk %llx",
+					(unsigned long long)key));
 		return -1;
 	}
 	cy->count++;
diff --git a/drivers/md/dm-multisnap-btree.c b/drivers/md/dm-multisnap-btree.c
index 722d842..a7e3b60 100644
--- a/drivers/md/dm-multisnap-btree.c
+++ b/drivers/md/dm-multisnap-btree.c
@@ -12,8 +12,9 @@
  * Read one btree node and do basic consistency checks.
  * Any btree access should be done with this function.
  */
-
-static struct dm_multisnap_bt_node *dm_multisnap_read_btnode(struct dm_exception_store *s, int depth, chunk_t block, unsigned want_entries, struct dm_buffer **bp)
+static struct dm_multisnap_bt_node *
+dm_multisnap_read_btnode(struct dm_exception_store *s, int depth,
+			 chunk_t block, unsigned want_entries, struct dm_buffer **bp)
 {
 	struct dm_multisnap_bt_node *node;

@@ -25,17 +26,21 @@ static struct dm_multisnap_bt_node *dm_multisnap_read_btnode(struct dm_exception

 	if (unlikely(node->signature != BT_SIGNATURE)) {
 		dm_bufio_release(*bp);
-		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_read_btnode: bad signature on btree node %llx", (unsigned long long)block));
+		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+				       ("dm_multisnap_read_btnode: bad signature on btree node %llx",
+					(unsigned long long)block));
 		return NULL;
 	}

 	if (unlikely((unsigned)(le32_to_cpu(node->n_entries) - 1) >= s->btree_entries) ||
 	    (want_entries && unlikely(le32_to_cpu(node->n_entries) != want_entries))) {
 		dm_bufio_release(*bp);
-		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_read_btnode: bad number of entries in btree node %llx: %x, wanted %x",
-			(unsigned long long)block,
-			le32_to_cpu(node->n_entries),
-			want_entries));
+		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+				       ("dm_multisnap_read_btnode: bad number of entries in btree node "
+					"%llx: %x, wanted %x",
+					(unsigned long long)block,
+					le32_to_cpu(node->n_entries),
+					want_entries));
 		return NULL;
 	}

@@ -49,7 +54,6 @@ static struct dm_multisnap_bt_node *dm_multisnap_read_btnode(struct dm_exception
  * with bits 32-47 set, so that the store could be read on a system with
  * 64-bit chunk_t.
  */
-
 static void write_orig_chunk(struct dm_multisnap_bt_entry *be, chunk_t n)
 {
 	write_48(be, orig_chunk, n);
@@ -61,10 +65,11 @@ static void write_orig_chunk(struct dm_multisnap_bt_entry *be, chunk_t n)
  * Add an entry (key, new_chunk) at an appropriate index to the btree node.
  * Move the existing entries
  */
-
-static void add_at_idx(struct dm_multisnap_bt_node *node, unsigned index, struct bt_key *key, chunk_t new_chunk)
+static void add_at_idx(struct dm_multisnap_bt_node *node, unsigned index,
+		       struct bt_key *key, chunk_t new_chunk)
 {
-	memmove(&node->entries[index + 1], &node->entries[index], (le32_to_cpu(node->n_entries) - index) * sizeof(struct dm_multisnap_bt_entry));
+	memmove(&node->entries[index + 1], &node->entries[index],
+		(le32_to_cpu(node->n_entries) - index) * sizeof(struct dm_multisnap_bt_entry));
 	write_orig_chunk(&node->entries[index], key->chunk);
 	write_48(&node->entries[index], new_chunk, new_chunk);
 	node->entries[index].snap_from = cpu_to_mikulas_snapid(key->snap_from);
@@ -77,7 +82,6 @@ static void add_at_idx(struct dm_multisnap_bt_node *node, unsigned index, struct
  * Create an initial btree.
  * (*writing_block) is updated to point after the btree.
  */
-
 void dm_multisnap_create_btree(struct dm_exception_store *s, chunk_t *writing_block)
 {
 	struct dm_buffer *bp;
@@ -88,13 +92,16 @@ void dm_multisnap_create_btree(struct dm_exception_store *s, chunk_t *writing_bl
 	(*writing_block)++;

 	if (*writing_block >= s->dev_size) {
-		DM_MULTISNAP_SET_ERROR(s->dm, -ENOSPC, ("dm_multisnap_create_btree: device is too small"));
+		DM_MULTISNAP_SET_ERROR(s->dm, -ENOSPC,
+				       ("dm_multisnap_create_btree: device is too small"));
 		return;
 	}

 	node = dm_bufio_new(s->bufio, *writing_block, &bp);
 	if (IS_ERR(node)) {
-		DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(node), ("dm_multisnap_create_btree: 't create direct bitmap block at %llx", (unsigned long long)*writing_block));
+		DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(node),
+				       ("dm_multisnap_create_btree: 't create direct bitmap block at %llx",
+					(unsigned long long)*writing_block));
 		return;
 	}
 	memset(node, 0, s->chunk_size);
@@ -123,7 +130,6 @@ void dm_multisnap_create_btree(struct dm_exception_store *s, chunk_t *writing_bl
  * 0: the entry matches the key (both entry and key have ranges, a match
  *    is returned when the ranges overlap)
  */
-
 static int compare_key(struct dm_multisnap_bt_entry *e, struct bt_key *key)
 {
 	chunk_t orig_chunk = read_48(e, orig_chunk);
@@ -146,8 +152,8 @@ static int compare_key(struct dm_multisnap_bt_entry *e, struct bt_key *key)
  * *result - if found, then the first entry in the requested range
  *         - if not found, then the first entry after the requested range
  */
-
-static int binary_search(struct dm_multisnap_bt_node *node, struct bt_key *key, unsigned *result)
+static int binary_search(struct dm_multisnap_bt_node *node, struct bt_key *key,
+			 unsigned *result)
 {
 	int c;
 	int first = 0;
@@ -182,8 +188,9 @@ static int binary_search(struct dm_multisnap_bt_node *node, struct bt_key *key,
  * this node is returned (the buffer must be released with
  * dm_bufio_release). Also, path with s->bt_depth entries is returned.
  */
-
-static int walk_btree(struct dm_exception_store *s, struct bt_key *key, struct dm_multisnap_bt_node **nodep, struct dm_buffer **bp, struct path_element path[MAX_BT_DEPTH])
+static int walk_btree(struct dm_exception_store *s, struct bt_key *key,
+		      struct dm_multisnap_bt_node **nodep, struct dm_buffer **bp,
+		      struct path_element path[MAX_BT_DEPTH])
 {
 #define node (*nodep)
 	int r;
@@ -212,16 +219,19 @@ static int walk_btree(struct dm_exception_store *s, struct bt_key *key, struct d
 		if (unlikely(last_chunk != want_last_chunk) ||
 		    unlikely(last_snapid != want_last_snapid)) {
 			dm_bufio_release(*bp);
-			DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("walk_btree: invalid last entry in node %llx/%llx: last_chunk %llx, want_last_chunk %llx, last_snapid: %llx, want_last_snapid: %llx, searching for %llx, %llx-%llx",
-				(unsigned long long)block,
-				(unsigned long long)dm_multisnap_remap_block(s, block),
-				(unsigned long long)last_chunk,
-				(unsigned long long)want_last_chunk,
-				(unsigned long long)last_snapid,
-				(unsigned long long)want_last_snapid,
-				(unsigned long long)key->chunk,
-				(unsigned long long)key->snap_from,
-				(unsigned long long)key->snap_to));
+			DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+					       ("walk_btree: invalid last entry in node %llx/%llx: "
+						"last_chunk %llx, want_last_chunk %llx, last_snapid: %llx, "
+						"want_last_snapid: %llx, searching for %llx, %llx-%llx",
+						(unsigned long long)block,
+						(unsigned long long)dm_multisnap_remap_block(s, block),
+						(unsigned long long)last_chunk,
+						(unsigned long long)want_last_chunk,
+						(unsigned long long)last_snapid,
+						(unsigned long long)want_last_snapid,
+						(unsigned long long)key->chunk,
+						(unsigned long long)key->snap_from,
+						(unsigned long long)key->snap_to));
 			return -1;
 		}

@@ -248,8 +258,8 @@ static int walk_btree(struct dm_exception_store *s, struct bt_key *key, struct d
  * In case the node is found, key contains updated key and result contains
  * the resulting chunk.
  */
-
-int dm_multisnap_find_in_btree(struct dm_exception_store *s, struct bt_key *key, chunk_t *result)
+int dm_multisnap_find_in_btree(struct dm_exception_store *s, struct bt_key *key,
+			       chunk_t *result)
 {
 	struct dm_multisnap_bt_node *node;
 	struct path_element path[MAX_BT_DEPTH];
@@ -278,8 +288,10 @@ int dm_multisnap_find_in_btree(struct dm_exception_store *s, struct bt_key *key,
  * When the whole tree is scanned, return 0.
  * On error, return -1.
  */
-
-int dm_multisnap_list_btree(struct dm_exception_store *s, struct bt_key *key, int (*call)(struct dm_exception_store *, struct dm_multisnap_bt_node *, struct dm_multisnap_bt_entry *, void *), void *cookie)
+int dm_multisnap_list_btree(struct dm_exception_store *s, struct bt_key *key,
+			    int (*call)(struct dm_exception_store *, struct dm_multisnap_bt_node *,
+					struct dm_multisnap_bt_entry *, void *),
+			    void *cookie)
 {
 	struct dm_multisnap_bt_node *node;
 	struct path_element path[MAX_BT_DEPTH];
@@ -305,7 +317,8 @@ list_next_node:

 	for (depth = s->bt_depth - 2; depth >= 0; depth--) {
 		int idx;
-		node = dm_multisnap_read_btnode(s, depth, path[depth].block, path[depth].n_entries, &bp);
+		node = dm_multisnap_read_btnode(s, depth, path[depth].block,
+						path[depth].n_entries, &bp);
 		if (!node)
 			return -1;
 		idx = path[depth].idx + 1;
@@ -313,9 +326,10 @@ list_next_node:
 			r = compare_key(&node->entries[idx], key);
 			if (unlikely(r <= 0)) {
 				dm_bufio_release(bp);
-				DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_list_btree: non-monotonic btree: node %llx, index %x",
-					(unsigned long long)path[depth].block,
-					idx));
+				DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+						       ("dm_multisnap_list_btree: non-monotonic btree: node "
+							"%llx, index %x",
+							(unsigned long long)path[depth].block, idx));
 				return 0;
 			}
 			path[depth].idx = idx;
@@ -359,10 +373,12 @@ void dm_multisnap_add_to_btree(struct dm_exception_store *s, struct bt_key *key,
 	if (unlikely(r)) {
 		if (r > 0) {
 			dm_bufio_release(bp);
-			DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_add_to_btree: adding key that already exists: %llx, %llx-%llx",
-				(unsigned long long)key->chunk,
-				(unsigned long long)key->snap_from,
-				(unsigned long long)key->snap_to));
+			DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+					       ("dm_multisnap_add_to_btree: adding key that already exists: "
+						"%llx, %llx-%llx",
+						(unsigned long long)key->chunk,
+						(unsigned long long)key->snap_from,
+						(unsigned long long)key->snap_to));
 		}
 		return;
 	}
@@ -392,9 +408,11 @@ go_up:
 	cond_resched();
 	memcpy(node, s->tmp_chunk, sizeof(struct dm_multisnap_bt_node));
 	cond_resched();
-	memcpy((char *)node + sizeof(struct dm_multisnap_bt_node), (char *)s->tmp_chunk + split_offset, split_size - split_offset);
+	memcpy((char *)node + sizeof(struct dm_multisnap_bt_node),
+	       (char *)s->tmp_chunk + split_offset, split_size - split_offset);
 	cond_resched();
-	memset((char *)node + sizeof(struct dm_multisnap_bt_node) + split_size - split_offset, 0, s->chunk_size - (sizeof(struct dm_multisnap_bt_node) + split_size - split_offset));
+	memset((char *)node + sizeof(struct dm_multisnap_bt_node) + split_size - split_offset, 0,
+	       s->chunk_size - (sizeof(struct dm_multisnap_bt_node) + split_size - split_offset));
 	cond_resched();

 	node->n_entries = cpu_to_le32(split_entries - split_index);
@@ -423,14 +441,16 @@ go_up:
 	dm_bufio_release(bp);

 	if (depth--) {
-		node = dm_multisnap_read_btnode(s, depth, path[depth].block, path[depth].n_entries, &bp);
+		node = dm_multisnap_read_btnode(s, depth, path[depth].block,
+						path[depth].n_entries, &bp);
 		if (unlikely(!node))
 			return;
 		goto go_up;
 	}

 	if (s->bt_depth >= MAX_BT_DEPTH) {
-		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_add_to_btree: max b+-tree depth reached"));
+		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+				       ("dm_multisnap_add_to_btree: max b+-tree depth reached"));
 		return;
 	}

@@ -459,8 +479,10 @@ go_up:
  * Change the last entry from old_chunk/old_snapid to new_chunk/new_snapid.
  * Start at a given depth and go upward to the root.
  */
-
-static void dm_multisnap_fixup_backlimits(struct dm_exception_store *s, struct path_element path[MAX_BT_DEPTH], int depth, chunk_t old_chunk, mikulas_snapid_t old_snapid, chunk_t new_chunk, mikulas_snapid_t new_snapid)
+static void dm_multisnap_fixup_backlimits(struct dm_exception_store *s,
+					  struct path_element path[MAX_BT_DEPTH], int depth,
+					  chunk_t old_chunk, mikulas_snapid_t old_snapid,
+					  chunk_t new_chunk, mikulas_snapid_t new_snapid)
 {
 	int idx;
 	struct dm_multisnap_bt_node *node;
@@ -470,7 +492,8 @@ static void dm_multisnap_fixup_backlimits(struct dm_exception_store *s, struct p
 		return;

 	for (depth--; depth >= 0; depth--) {
-		node = dm_multisnap_read_btnode(s, depth, path[depth].block, path[depth].n_entries, &bp);
+		node = dm_multisnap_read_btnode(s, depth, path[depth].block,
+						path[depth].n_entries, &bp);
 		if (unlikely(!node))
 			return;

@@ -484,14 +507,17 @@ static void dm_multisnap_fixup_backlimits(struct dm_exception_store *s, struct p
 		    unlikely(mikulas_snapid_to_cpu(node->entries[idx].snap_from) != old_snapid) ||
 		    unlikely(mikulas_snapid_to_cpu(node->entries[idx].snap_to) != old_snapid)) {
 			dm_bufio_release(bp);
-			DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_fixup_backlimits: btree limit does not match, block %llx, idx %x, orig_chunk %llx, snap_from %llx, snap_to %llx, want %llx, %llx",
-				(unsigned long long)path[depth].block,
-				idx,
-				(unsigned long long)read_48(&node->entries[idx], orig_chunk),
-				(unsigned long long)mikulas_snapid_to_cpu(node->entries[idx].snap_from),
-				(unsigned long long)mikulas_snapid_to_cpu(node->entries[idx].snap_to),
-				(unsigned long long)old_chunk,
-				(unsigned long long)old_snapid));
+			DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+					       ("dm_multisnap_fixup_backlimits: btree limit does not match, block "
+						"%llx, idx %x, orig_chunk %llx, snap_from %llx, snap_to "
+						"%llx, want %llx, %llx",
+						(unsigned long long)path[depth].block,
+						idx,
+						(unsigned long long)read_48(&node->entries[idx], orig_chunk),
+						(unsigned long long)mikulas_snapid_to_cpu(node->entries[idx].snap_from),
+						(unsigned long long)mikulas_snapid_to_cpu(node->entries[idx].snap_to),
+						(unsigned long long)old_chunk,
+						(unsigned long long)old_snapid));
 			return;
 		}
 		write_48(&node->entries[idx], orig_chunk, new_chunk);
@@ -503,11 +529,12 @@ static void dm_multisnap_fixup_backlimits(struct dm_exception_store *s, struct p
 		if (path[depth].idx != path[depth].n_entries - 1)
 			return;
 	}
-	DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_fixup_backlimits: the last entry modified, %llx/%llx -> %llx/%llx",
-		(unsigned long long)old_chunk,
-		(unsigned long long)old_snapid,
-		(unsigned long long)new_chunk,
-		(unsigned long long)new_snapid));
+	DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+			       ("dm_multisnap_fixup_backlimits: the last entry modified, %llx/%llx -> %llx/%llx",
+				(unsigned long long)old_chunk,
+				(unsigned long long)old_snapid,
+				(unsigned long long)new_chunk,
+				(unsigned long long)new_snapid));
 }

 /*
@@ -515,7 +542,6 @@ static void dm_multisnap_fixup_backlimits(struct dm_exception_store *s, struct p
  * The key must have the same beginning or end as some existing entry (not both)
  * The range of the key is excluded from the entry.
  */
-
 void dm_multisnap_restrict_btree_entry(struct dm_exception_store *s, struct bt_key *key)
 {
 	struct dm_multisnap_bt_node *node;
@@ -531,10 +557,11 @@ void dm_multisnap_restrict_btree_entry(struct dm_exception_store *s, struct bt_k

 	if (!r) {
 		dm_bufio_release(bp);
-		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_restrict_btree_entry: unknown key: %llx, %llx-%llx",
-			(unsigned long long)key->chunk,
-			(unsigned long long)key->snap_from,
-			(unsigned long long)key->snap_to));
+		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+				       ("dm_multisnap_restrict_btree_entry: unknown key: %llx, %llx-%llx",
+					(unsigned long long)key->chunk,
+					(unsigned long long)key->snap_from,
+					(unsigned long long)key->snap_to));
 		return;
 	}

@@ -553,12 +580,14 @@ void dm_multisnap_restrict_btree_entry(struct dm_exception_store *s, struct bt_k
 		entry->snap_to = cpu_to_mikulas_snapid(new_to);
 	} else {
 		dm_bufio_release(bp);
-		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_restrict_btree_entry: invali range to restruct: %llx, %llx-%llx %llx-%llx",
-			(unsigned long long)key->chunk,
-			(unsigned long long)from,
-			(unsigned long long)to,
-			(unsigned long long)key->snap_from,
-			(unsigned long long)key->snap_to));
+		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+				       ("dm_multisnap_restrict_btree_entry: invali range to restruct: "
+					"%llx, %llx-%llx %llx-%llx",
+					(unsigned long long)key->chunk,
+					(unsigned long long)from,
+					(unsigned long long)to,
+					(unsigned long long)key->snap_from,
+					(unsigned long long)key->snap_to));
 		return;
 	}

@@ -566,14 +595,14 @@ void dm_multisnap_restrict_btree_entry(struct dm_exception_store *s, struct bt_k
 	dm_bufio_release(bp);

 	if (unlikely(idx == path[s->bt_depth - 1].n_entries - 1))
-		dm_multisnap_fixup_backlimits(s, path, s->bt_depth - 1, key->chunk, to, key->chunk, new_to);
+		dm_multisnap_fixup_backlimits(s, path, s->bt_depth - 1,
+					      key->chunk, to, key->chunk, new_to);
 }

 /*
  * Expand range of an existing btree entry.
  * The key represents the whole new range (including the old and new part).
  */
-
 void dm_multisnap_extend_btree_entry(struct dm_exception_store *s, struct bt_key *key)
 {
 	struct dm_multisnap_bt_node *node;
@@ -589,14 +618,17 @@ void dm_multisnap_extend_btree_entry(struct dm_exception_store *s, struct bt_key

 	if (!r) {
 		dm_bufio_release(bp);
-		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_extend_btree_entry: unknown key: %llx, %llx-%llx",
-			(unsigned long long)key->chunk,
-			(unsigned long long)key->snap_from,
-			(unsigned long long)key->snap_to));
+		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+				       ("dm_multisnap_extend_btree_entry: unknown key: "
+					"%llx, %llx-%llx",
+					(unsigned long long)key->chunk,
+					(unsigned long long)key->snap_from,
+					(unsigned long long)key->snap_to));
 		return;
 	}

-	node = dm_multisnap_alloc_duplicate_block(s, path[s->bt_depth - 1].block, &bp, node);
+	node = dm_multisnap_alloc_duplicate_block(s, path[s->bt_depth - 1].block,
+						  &bp, node);
 	if (unlikely(!node))
 		return;

@@ -615,13 +647,13 @@ void dm_multisnap_extend_btree_entry(struct dm_exception_store *s, struct bt_key
 	dm_bufio_release(bp);

 	if (unlikely(idx == path[s->bt_depth - 1].n_entries - 1))
-		dm_multisnap_fixup_backlimits(s, path, s->bt_depth - 1, key->chunk, to, key->chunk, new_to);
+		dm_multisnap_fixup_backlimits(s, path, s->bt_depth - 1,
+					      key->chunk, to, key->chunk, new_to);
 }

 /*
  * Delete an entry from the btree.
  */
-
 void dm_multisnap_delete_from_btree(struct dm_exception_store *s, struct bt_key *key)
 {
 	struct dm_multisnap_bt_node *node;
@@ -642,10 +674,11 @@ void dm_multisnap_delete_from_btree(struct dm_exception_store *s, struct bt_key

 	if (unlikely(!r)) {
 		dm_bufio_release(bp);
-		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_delete_from_btree: unknown key: %llx, %llx-%llx",
-			(unsigned long long)key->chunk,
-			(unsigned long long)key->snap_from,
-			(unsigned long long)key->snap_to));
+		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+				       ("dm_multisnap_delete_from_btree: unknown key: %llx, %llx-%llx",
+					(unsigned long long)key->chunk,
+					(unsigned long long)key->snap_from,
+					(unsigned long long)key->snap_to));
 		return;
 	}

@@ -657,24 +690,28 @@ void dm_multisnap_delete_from_btree(struct dm_exception_store *s, struct bt_key
 	to = mikulas_snapid_to_cpu(entry->snap_to);
 	if (unlikely(from != key->snap_from) || unlikely(to != key->snap_to)) {
 		dm_bufio_release(bp);
-		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_restrict_btree: invali range to restruct: %llx, %llx-%llx %llx-%llx",
-			(unsigned long long)key->chunk,
-			(unsigned long long)from,
-			(unsigned long long)to,
-			(unsigned long long)key->snap_from,
-			(unsigned long long)key->snap_to));
+		DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+				       ("dm_multisnap_delete_from_btree: invalid range to restrict: "
+					"%llx, %llx-%llx %llx-%llx",
+					(unsigned long long)key->chunk,
+					(unsigned long long)from,
+					(unsigned long long)to,
+					(unsigned long long)key->snap_from,
+					(unsigned long long)key->snap_to));
 		return;
 	}

 	while (unlikely((n_entries = le32_to_cpu(node->n_entries)) == 1)) {
 		dm_bufio_release(bp);
 		if (unlikely(!depth)) {
-			DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_restrict_btree: b-tree is empty"));
+			DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+					       ("dm_multisnap_delete_from_btree: b-tree is empty"));
 			return;
 		}
 		dm_multisnap_free_block_and_duplicates(s, path[depth].block);
 		depth--;
-		node = dm_multisnap_read_btnode(s, depth, path[depth].block, path[depth].n_entries, &bp);
+		node = dm_multisnap_read_btnode(s, depth, path[depth].block,
+						path[depth].n_entries, &bp);
 		if (!node)
 			return;
 	}
@@ -686,7 +723,8 @@ void dm_multisnap_delete_from_btree(struct dm_exception_store *s, struct bt_key
 	idx = path[depth].idx;

 	cond_resched();
-	memmove(node->entries + idx, node->entries + idx + 1, (n_entries - idx - 1) * sizeof(struct dm_multisnap_bt_entry));
+	memmove(node->entries + idx, node->entries + idx + 1,
+		(n_entries - idx - 1) * sizeof(struct dm_multisnap_bt_entry));
 	cond_resched();
 	n_entries--;
 	memset(node->entries + n_entries, 0, sizeof(struct dm_multisnap_bt_entry));
@@ -701,7 +739,9 @@ void dm_multisnap_delete_from_btree(struct dm_exception_store *s, struct bt_key
 	dm_bufio_release(bp);

 	if (unlikely(idx == n_entries))
-		dm_multisnap_fixup_backlimits(s, path, depth, key->chunk, key->snap_to, last_one_chunk, last_one_snap_to);
+		dm_multisnap_fixup_backlimits(s, path, depth, key->chunk,
+					      key->snap_to, last_one_chunk,
+					      last_one_snap_to);
 }

 /*
@@ -709,8 +749,8 @@ void dm_multisnap_delete_from_btree(struct dm_exception_store *s, struct bt_key
  * Find the whole path for tmp_remap and write the path as new entries, from
  * the root.
*/ - -void dm_multisnap_bt_finalize_tmp_remap(struct dm_exception_store *s, struct tmp_remap *tmp_remap) +void dm_multisnap_bt_finalize_tmp_remap(struct dm_exception_store *s, + struct tmp_remap *tmp_remap) { struct dm_buffer *bp; struct dm_multisnap_bt_node *node; @@ -723,7 +763,8 @@ void dm_multisnap_bt_finalize_tmp_remap(struct dm_exception_store *s, struct tmp int i; if (s->n_preallocated_blocks < s->bt_depth) { - if (dm_multisnap_alloc_blocks(s, s->preallocated_blocks + s->n_preallocated_blocks, s->bt_depth - s->n_preallocated_blocks, 0) < 0) + if (dm_multisnap_alloc_blocks(s, s->preallocated_blocks + s->n_preallocated_blocks, + s->bt_depth - s->n_preallocated_blocks, 0) < 0) return; s->n_preallocated_blocks = s->bt_depth; } @@ -751,17 +792,16 @@ void dm_multisnap_bt_finalize_tmp_remap(struct dm_exception_store *s, struct tmp goto found; DMERR("block %llx/%llx was not found in btree when searching for %llx/%llx", - (unsigned long long)tmp_remap->old, - (unsigned long long)tmp_remap->new, - (unsigned long long)key.chunk, - (unsigned long long)key.snap_from); + (unsigned long long)tmp_remap->old, + (unsigned long long)tmp_remap->new, + (unsigned long long)key.chunk, + (unsigned long long)key.snap_from); for (i = 0; i < s->bt_depth; i++) DMERR("path[%d]: %llx/%x", i, (unsigned long long)path[i].block, path[i].idx); dm_multisnap_set_error(s->dm, -EFSERROR); return; found: - dm_multisnap_free_block(s, tmp_remap->old, 0); new_blockn = tmp_remap->new; @@ -774,7 +814,8 @@ found: remapped = 1; dm_bufio_release_move(bp, s->preallocated_blocks[results_ptr]); dm_multisnap_free_block_and_duplicates(s, path[i].block); - node = dm_multisnap_read_btnode(s, i, s->preallocated_blocks[results_ptr], path[i].n_entries, &bp); + node = dm_multisnap_read_btnode(s, i, s->preallocated_blocks[results_ptr], + path[i].n_entries, &bp); if (!node) return; dm_multisnap_block_set_uncommitted(s, s->preallocated_blocks[results_ptr]); @@ -792,6 +833,6 @@ found: s->bt_root = new_blockn; skip_it: 
- memmove(s->preallocated_blocks, s->preallocated_blocks + results_ptr, (s->n_preallocated_blocks -= results_ptr) * sizeof(chunk_t)); + memmove(s->preallocated_blocks, s->preallocated_blocks + results_ptr, + (s->n_preallocated_blocks -= results_ptr) * sizeof(chunk_t)); } - diff --git a/drivers/md/dm-multisnap-commit.c b/drivers/md/dm-multisnap-commit.c index f44f2e7..78b2583 100644 --- a/drivers/md/dm-multisnap-commit.c +++ b/drivers/md/dm-multisnap-commit.c @@ -11,7 +11,6 @@ /* * Flush existing tmp_remaps. */ - static void dm_multisnap_finalize_tmp_remaps(struct dm_exception_store *s) { struct tmp_remap *t; @@ -26,21 +25,25 @@ static void dm_multisnap_finalize_tmp_remaps(struct dm_exception_store *s) * if there are none, do bitmap remaps */ if (!list_empty(&s->used_bt_tmp_remaps)) { - t = container_of(s->used_bt_tmp_remaps.next, struct tmp_remap, list); + t = container_of(s->used_bt_tmp_remaps.next, + struct tmp_remap, list); dm_multisnap_bt_finalize_tmp_remap(s, t); dm_multisnap_free_tmp_remap(s, t); continue; } } -/* else: 0 or 1 free remaps : finalize bitmaps */ + /* else: 0 or 1 free remaps : finalize bitmaps */ if (!list_empty(&s->used_bitmap_tmp_remaps)) { - t = container_of(s->used_bitmap_tmp_remaps.next, struct tmp_remap, list); + t = container_of(s->used_bitmap_tmp_remaps.next, + struct tmp_remap, list); dm_multisnap_bitmap_finalize_tmp_remap(s, t); dm_multisnap_free_tmp_remap(s, t); continue; } else { - DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_finalize_tmp_remaps: no bitmap tmp remaps, n_used_tmp_remaps %u", s->n_used_tmp_remaps)); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_finalize_tmp_remaps: no bitmap tmp remaps, n_used_tmp_remaps %u", + s->n_used_tmp_remaps)); return; } } @@ -58,7 +61,6 @@ static void dm_multisnap_finalize_tmp_remaps(struct dm_exception_store *s) * when b+tree is consistent. It flushes tmp_remaps, so that tmp_remap array * doesn't overflow. This function doesn't commit anything. 
*/ - void dm_multisnap_transition_mark(struct dm_exception_store *s) { /* @@ -76,14 +78,14 @@ void dm_multisnap_transition_mark(struct dm_exception_store *s) * Flush buffers. This is called without the lock to reduce lock contention. * The buffers will be flushed again, with the lock. */ - void dm_multisnap_prepare_for_commit(struct dm_exception_store *s) { int r; r = dm_bufio_write_dirty_buffers(s->bufio); if (unlikely(r < 0)) { - DM_MULTISNAP_SET_ERROR(s->dm, r, ("dm_multisnap_prepare_for_commit: error writing data")); + DM_MULTISNAP_SET_ERROR(s->dm, r, + ("dm_multisnap_prepare_for_commit: error writing data")); return; } } @@ -94,7 +96,6 @@ void dm_multisnap_prepare_for_commit(struct dm_exception_store *s) * It is valid to make multiple modifications to the exception store and * then commit them atomically at once with this function. */ - void dm_multisnap_commit(struct dm_exception_store *s) { struct tmp_remap *t; @@ -138,7 +139,8 @@ void dm_multisnap_commit(struct dm_exception_store *s) r = dm_bufio_write_dirty_buffers(s->bufio); if (unlikely(r < 0)) { - DM_MULTISNAP_SET_ERROR(s->dm, r, ("dm_multisnap_commit: error writing data")); + DM_MULTISNAP_SET_ERROR(s->dm, r, + ("dm_multisnap_commit: error writing data")); return; } @@ -154,7 +156,9 @@ void dm_multisnap_commit(struct dm_exception_store *s) cb = dm_bufio_new(s->bufio, cb_addr, &bp); if (IS_ERR(cb)) { - DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(cb), ("dm_multisnap_commit: can't allocate new commit block at %llx", (unsigned long long)cb_addr)); + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(cb), + ("dm_multisnap_commit: can't allocate new commit block at %llx", + (unsigned long long)cb_addr)); return; } @@ -198,7 +202,9 @@ void dm_multisnap_commit(struct dm_exception_store *s) dm_bufio_release(bp); r = dm_bufio_write_dirty_buffers(s->bufio); if (unlikely(r < 0)) { - DM_MULTISNAP_SET_ERROR(s->dm, r, ("dm_multisnap_commit: can't write commit block at %llx", (unsigned long long)cb_addr)); + DM_MULTISNAP_SET_ERROR(s->dm, 
r, + ("dm_multisnap_commit: can't write commit block at %llx", + (unsigned long long)cb_addr)); return; } @@ -208,13 +214,15 @@ void dm_multisnap_commit(struct dm_exception_store *s) sb = dm_bufio_read(s->bufio, SB_BLOCK, &bp); if (IS_ERR(sb)) { - DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(sb), ("dm_multisnap_commit: can't read super block")); + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(sb), + ("dm_multisnap_commit: can't read super block")); return; } if (unlikely(sb->signature != SB_SIGNATURE)) { dm_bufio_release(bp); - DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_commit: invalid super block signature when committing")); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_commit: invalid super block signature when committing")); return; } diff --git a/drivers/md/dm-multisnap-daniel.c b/drivers/md/dm-multisnap-daniel.c index df3fafb..00fd3c0 100644 --- a/drivers/md/dm-multisnap-daniel.c +++ b/drivers/md/dm-multisnap-daniel.c @@ -33,7 +33,8 @@ /*----------------------------------------------------------------- * Persistent snapshots, by persistent we mean that the snapshot * will survive a reboot. 
- *---------------------------------------------------------------*/ + *--------------------------------------------------------------- + */ /* * We need to store a record of which parts of the origin have @@ -279,7 +280,8 @@ static struct chunk_buffer *alloc_chunk_buffer(struct dm_exception_store *ps) /* Mikulas: changed to GFP_NOIO */ b = kzalloc(sizeof(*b), GFP_NOIO); if (!b) { - DM_MULTISNAP_SET_ERROR(ps->dm, -ENOMEM, ("%s %d: out of memory", __func__, __LINE__)); + DM_MULTISNAP_SET_ERROR(ps->dm, -ENOMEM, + ("%s %d: out of memory", __func__, __LINE__)); return NULL; } @@ -287,7 +289,8 @@ static struct chunk_buffer *alloc_chunk_buffer(struct dm_exception_store *ps) b->data = __vmalloc(ps->chunk_size, GFP_NOIO | __GFP_HIGHMEM, PAGE_KERNEL); if (!b->data) { kfree(b); - DM_MULTISNAP_SET_ERROR(ps->dm, -ENOMEM, ("%s %d: out of memory", __func__, __LINE__)); + DM_MULTISNAP_SET_ERROR(ps->dm, -ENOMEM, + ("%s %d: out of memory", __func__, __LINE__)); return NULL; } @@ -378,8 +381,10 @@ static int shared_free_chunk(struct dm_exception_store *ps, chunk_t chunk) chunk_io(ps, ps->cur_bitmap_chunk, READ, ps->bitmap); if (!ext2_test_bit(idx, ps->bitmap)) { - DM_MULTISNAP_SET_ERROR(ps->dm, -EFSERROR, ("%s: trying to free free block %lld %lld %u", __func__, - (unsigned long long)chunk, (unsigned long long)ps->cur_bitmap_chunk, idx)); + DM_MULTISNAP_SET_ERROR(ps->dm, -EFSERROR, + ("%s: trying to free free block %lld %lld %u", __func__, + (unsigned long long)chunk, + (unsigned long long)ps->cur_bitmap_chunk, idx)); } ext2_clear_bit(idx, ps->bitmap); @@ -1112,9 +1117,10 @@ static void check_leaf(struct dm_exception_store *ps, struct leaf *leaf, u64 sna for (p = emap(leaf, i); p < emap(leaf, i+1); p++) { /* !!! 
should also check for any zero sharemaps here */ if (le64_to_cpu(p->share) & snapmask) { - DM_MULTISNAP_SET_ERROR(ps->dm, -EFSERROR, ("nonzero bits %016llx outside snapmask %016llx", - (unsigned long long)p->share, - (unsigned long long)snapmask)); + DM_MULTISNAP_SET_ERROR(ps->dm, -EFSERROR, + ("nonzero bits %016llx outside snapmask %016llx", + (unsigned long long)p->share, + (unsigned long long)snapmask)); } } } @@ -1382,7 +1388,7 @@ keep_prev_node: } while (level < levels - 1); } -/* dirty_buffer_count_check(sb); */ + /* dirty_buffer_count_check(sb); */ /* * Get the leaf indicated in the next index entry in the node * at this level. @@ -1433,7 +1439,8 @@ found: return 0; } -static int shared_init(struct dm_multisnap *dm, struct dm_exception_store **sp, unsigned argc, char **argv, char **error) +static int shared_init(struct dm_multisnap *dm, struct dm_exception_store **sp, + unsigned argc, char **argv, char **error) { int r; struct dm_exception_store *ps; @@ -1489,7 +1496,8 @@ static void shared_destroy(struct dm_exception_store *ps) kfree(ps); } -static int shared_allocate_snapid(struct dm_exception_store *ps, snapid_t *snapid, int snap_of_snap, snapid_t master) +static int shared_allocate_snapid(struct dm_exception_store *ps, + snapid_t *snapid, int snap_of_snap, snapid_t master) { int i; @@ -1511,11 +1519,13 @@ static int shared_allocate_snapid(struct dm_exception_store *ps, snapid_t *snapi static int shared_create_snapshot(struct dm_exception_store *ps, snapid_t snapid) { if (snapid >= MAX_SNAPSHOTS) { - DMERR("shared_create_snapshot: invalid snapshot id %llx", (unsigned long long)snapid); + DMERR("shared_create_snapshot: invalid snapshot id %llx", + (unsigned long long)snapid); return -EINVAL; } if (ps->snapmask & 1LL << snapid) { - DMERR("shared_create_snapshot: snapshot with id %llx already exists", (unsigned long long)snapid); + DMERR("shared_create_snapshot: snapshot with id %llx already exists", + (unsigned long long)snapid); return -EINVAL; } 
ps->snapmask |= 1LL << snapid; @@ -1561,7 +1571,8 @@ static snapid_t shared_get_next_snapid(struct dm_exception_store *ps, snapid_t s return DM_SNAPID_T_ORIGIN; } -static int shared_find_snapshot_chunk(struct dm_exception_store *ps, snapid_t snapid, chunk_t chunk, int write, chunk_t *result) +static int shared_find_snapshot_chunk(struct dm_exception_store *ps, snapid_t snapid, + chunk_t chunk, int write, chunk_t *result) { unsigned levels = ps->tree_level; struct etree_path path[levels + 1]; @@ -1593,7 +1604,8 @@ static int shared_query_next_remap(struct dm_exception_store *ps, chunk_t chunk) return !origin_chunk_unique(buffer2leaf(leafbuf), chunk, ps->snapmask); } -static void shared_add_next_remap(struct dm_exception_store *ps, union chunk_descriptor *cd, chunk_t *new_chunk) +static void shared_add_next_remap(struct dm_exception_store *ps, + union chunk_descriptor *cd, chunk_t *new_chunk) { struct chunk_buffer *cb; struct etree_path path[ps->tree_level + 1]; @@ -1613,8 +1625,9 @@ static void shared_add_next_remap(struct dm_exception_store *ps, union chunk_des ret = origin_chunk_unique(buffer2leaf(cb), chunk, ps->snapmask); if (ret) { - DM_MULTISNAP_SET_ERROR(ps->dm, -EFSERROR, ("%s %d: bug %llu %d", __func__, __LINE__, - (unsigned long long)chunk, ret)); + DM_MULTISNAP_SET_ERROR(ps->dm, -EFSERROR, + ("%s %d: bug %llu %d", __func__, __LINE__, + (unsigned long long)chunk, ret)); return; } @@ -1629,7 +1642,8 @@ static void shared_add_next_remap(struct dm_exception_store *ps, union chunk_des (unsigned long long)*new_chunk);*/ } -static int shared_check_conflict(struct dm_exception_store *ps, union chunk_descriptor *cd, snapid_t snapid) +static int shared_check_conflict(struct dm_exception_store *ps, + union chunk_descriptor *cd, snapid_t snapid) { return !!(cd->bitmask & (1LL << snapid)); } @@ -1695,5 +1709,3 @@ module_exit(dm_multisnapshot_daniel_module_exit); MODULE_DESCRIPTION(DM_NAME " multisnapshot Fujita/Daniel's exceptions store"); MODULE_AUTHOR("Fujita 
Tomonorig, Daniel Phillips"); MODULE_LICENSE("GPL"); - - diff --git a/drivers/md/dm-multisnap-delete.c b/drivers/md/dm-multisnap-delete.c index 2dcc251..22705a3 100644 --- a/drivers/md/dm-multisnap-delete.c +++ b/drivers/md/dm-multisnap-delete.c @@ -24,7 +24,9 @@ struct list_cookie { #define RET_DO_FREE 2 #define RET_RESCHEDULE 3 -static int list_callback(struct dm_exception_store *s, struct dm_multisnap_bt_node *node, struct dm_multisnap_bt_entry *bt, void *cookie) +static int list_callback(struct dm_exception_store *s, + struct dm_multisnap_bt_node *node, + struct dm_multisnap_bt_entry *bt, void *cookie) { struct list_cookie *lc = cookie; mikulas_snapid_t found_from, found_to; @@ -41,7 +43,9 @@ static int list_callback(struct dm_exception_store *s, struct dm_multisnap_bt_no if (unlikely(!s->delete_rover_snapid)) s->delete_rover_chunk++; - if (!dm_multisnap_find_next_snapid_range(s, lc->key.snap_from, &found_from, &found_to) || found_from > lc->key.snap_to) { + if (!dm_multisnap_find_next_snapid_range(s, lc->key.snap_from, + &found_from, &found_to) || + found_from > lc->key.snap_to) { /* * This range maps unused snapshots, delete it. * But we can't do it now, so submit it to the caller; @@ -113,7 +117,8 @@ static void delete_step(struct dm_exception_store *s) } } -void dm_multisnap_background_delete(struct dm_exception_store *s, struct dm_multisnap_background_work *bw) +void dm_multisnap_background_delete(struct dm_exception_store *s, + struct dm_multisnap_background_work *bw) { if (unlikely(dm_multisnap_has_error(s->dm))) return; diff --git a/drivers/md/dm-multisnap-freelist.c b/drivers/md/dm-multisnap-freelist.c index 791d291..6ec1476 100644 --- a/drivers/md/dm-multisnap-freelist.c +++ b/drivers/md/dm-multisnap-freelist.c @@ -11,7 +11,6 @@ /* * Initialize in-memory freelist structure. 
*/ - void dm_multisnap_init_freelist(struct dm_multisnap_freelist *fl, unsigned chunk_size) { cond_resched(); @@ -29,7 +28,6 @@ void dm_multisnap_init_freelist(struct dm_multisnap_freelist *fl, unsigned chunk * 1 --- block was added * 0 --- block could not be added because the freelist is full */ - static int add_to_freelist(struct dm_exception_store *s, chunk_t block, unsigned flags) { int i; @@ -39,10 +37,11 @@ static int add_to_freelist(struct dm_exception_store *s, chunk_t block, unsigned unsigned r = le16_to_cpu(fl->entries[i].run_length) & FREELIST_RL_MASK; unsigned f = le16_to_cpu(fl->entries[i].run_length) & FREELIST_DATA_FLAG; if (block >= x && block < x + r) { - DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("add_to_freelist: freeing already free block %llx (%llx - %x)", - (unsigned long long)block, - (unsigned long long)x, - r)); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("add_to_freelist: freeing already free block %llx (%llx - %x)", + (unsigned long long)block, + (unsigned long long)x, + r)); return -1; } if (likely(r < FREELIST_RL_MASK) && likely(f == flags)) { @@ -71,23 +70,29 @@ inc_length: /* * Read a freelist block from the disk. 
*/ - -static struct dm_multisnap_freelist *read_freelist(struct dm_exception_store *s, chunk_t block, struct dm_buffer **bp) +static struct dm_multisnap_freelist * +read_freelist(struct dm_exception_store *s, chunk_t block, struct dm_buffer **bp) { struct dm_multisnap_freelist *fl; fl = dm_bufio_read(s->bufio, block, bp); if (IS_ERR(fl)) { - DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(fl), ("read_freelist: can't read freelist block %llx", (unsigned long long)block)); + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(fl), + ("read_freelist: can't read freelist block %llx", + (unsigned long long)block)); return NULL; } if (fl->signature != FL_SIGNATURE) { dm_bufio_release(*bp); - DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("read_freelist: bad signature freelist block %llx", (unsigned long long)block)); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("read_freelist: bad signature freelist block %llx", + (unsigned long long)block)); return NULL; } if (le32_to_cpu(fl->n_entries) > dm_multisnap_freelist_entries(s->chunk_size)) { dm_bufio_release(*bp); - DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("read_freelist: bad number of entries in freelist block %llx", (unsigned long long)block)); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("read_freelist: bad number of entries in freelist block %llx", + (unsigned long long)block)); return NULL; } return fl; @@ -97,7 +102,6 @@ static struct dm_multisnap_freelist *read_freelist(struct dm_exception_store *s, * Allocate a block and write the current in-memory freelist to it. * Then, clear the in-memory freelist. 
*/ - static void alloc_write_freelist(struct dm_exception_store *s) { chunk_t new_block; @@ -109,7 +113,9 @@ static void alloc_write_freelist(struct dm_exception_store *s) fl = dm_bufio_new(s->bufio, new_block, &bp); if (IS_ERR(fl)) { - DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(fl), ("alloc_write_freelist: can't make new freelist block %llx", (unsigned long long)new_block)); + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(fl), + ("alloc_write_freelist: can't make new freelist block %llx", + (unsigned long long)new_block)); return; } @@ -127,7 +133,6 @@ static void alloc_write_freelist(struct dm_exception_store *s) * It adds the block to the current freelist, if the freelist is full, it * flushes the freelist and makes a new one. */ - void dm_multisnap_free_block(struct dm_exception_store *s, chunk_t block, unsigned flags) { if (likely(add_to_freelist(s, block, flags))) @@ -146,7 +151,6 @@ void dm_multisnap_free_block(struct dm_exception_store *s, chunk_t block, unsign /* * Check if a given block is in a given freelist. */ - static int check_against_freelist(struct dm_multisnap_freelist *fl, chunk_t block) { int i; @@ -163,8 +167,8 @@ static int check_against_freelist(struct dm_multisnap_freelist *fl, chunk_t bloc /* * Check if a given block is in any freelist in a freelist chain. 
*/ - -static int check_against_freelist_chain(struct dm_exception_store *s, chunk_t fl_block, chunk_t block) +static int check_against_freelist_chain(struct dm_exception_store *s, + chunk_t fl_block, chunk_t block) { struct stop_cycles cy; dm_multisnap_init_stop_cycles(&cy); @@ -198,7 +202,6 @@ static int check_against_freelist_chain(struct dm_exception_store *s, chunk_t fl * - the current freelist chain * - the freelist chain that was active on last commit */ - int dm_multisnap_check_allocated_block(struct dm_exception_store *s, chunk_t block) { int c; @@ -221,7 +224,6 @@ int dm_multisnap_check_allocated_block(struct dm_exception_store *s, chunk_t blo /* * This is called prior to commit, it writes the current freelist to the disk. */ - void dm_multisnap_flush_freelist_before_commit(struct dm_exception_store *s) { alloc_write_freelist(s); @@ -235,8 +237,8 @@ void dm_multisnap_flush_freelist_before_commit(struct dm_exception_store *s) /* * Free the blocks in the freelist. */ - -static void free_blocks_in_freelist(struct dm_exception_store *s, struct dm_multisnap_freelist *fl) +static void free_blocks_in_freelist(struct dm_exception_store *s, + struct dm_multisnap_freelist *fl) { int i; for (i = le32_to_cpu(fl->n_entries) - 1; i >= 0; i--) { @@ -260,7 +262,6 @@ static void free_blocks_in_freelist(struct dm_exception_store *s, struct dm_mult * If the computer crashes while this operation is in progress, it is done again * after a mount --- thus, it maintains data consistency. */ - void dm_multisnap_load_freelist(struct dm_exception_store *s) { chunk_t fl_block = s->freelist_ptr; diff --git a/drivers/md/dm-multisnap-io.c b/drivers/md/dm-multisnap-io.c index 9f5b1ad..7620ebe 100644 --- a/drivers/md/dm-multisnap-io.c +++ b/drivers/md/dm-multisnap-io.c @@ -13,8 +13,9 @@ * It returns 1 if remapping exists and is read-only (shared by other snapshots) * and 2 if it exists and is read-write (not shared by anyone). 
*/ - -int dm_multisnap_find_snapshot_chunk(struct dm_exception_store *s, snapid_t snapid, chunk_t chunk, int write, chunk_t *result) +int dm_multisnap_find_snapshot_chunk(struct dm_exception_store *s, + snapid_t snapid, chunk_t chunk, + int write, chunk_t *result) { int r; struct bt_key key; @@ -46,9 +47,9 @@ int dm_multisnap_find_snapshot_chunk(struct dm_exception_store *s, snapid_t snap * We are writing to a snapshot --- check if anything outside <from-to> * range exists, if it does, it needs to be copied. */ - if (key.snap_from < from) { - if (likely(dm_multisnap_find_next_snapid_range(s, key.snap_from, &find_from, &find_to))) { + if (likely(dm_multisnap_find_next_snapid_range(s, key.snap_from, + &find_from, &find_to))) { if (find_from < from) { s->query_new_key.chunk = chunk; s->query_new_key.snap_from = from; @@ -64,7 +65,8 @@ int dm_multisnap_find_snapshot_chunk(struct dm_exception_store *s, snapid_t snap BUG(); /* we're asking for a SNAPID not in our tree */ } if (key.snap_to > to) { - if (likely(dm_multisnap_find_next_snapid_range(s, to + 1, &find_from, &find_to))) { + if (likely(dm_multisnap_find_next_snapid_range(s, to + 1, + &find_from, &find_to))) { if (find_from <= key.snap_to) { s->query_new_key.chunk = chunk; s->query_new_key.snap_from = key.snap_from; @@ -82,7 +84,6 @@ int dm_multisnap_find_snapshot_chunk(struct dm_exception_store *s, snapid_t snap /* * Reset the query/remap state machine. */ - void dm_multisnap_reset_query(struct dm_exception_store *s) { s->query_active = 0; @@ -92,7 +93,6 @@ void dm_multisnap_reset_query(struct dm_exception_store *s) /* * Find the next snapid range to remap. */ - int dm_multisnap_query_next_remap(struct dm_exception_store *s, chunk_t chunk) { int r; @@ -143,8 +143,8 @@ next_btree_search: /* * Perform the remap on the range returned by dm_multisnap_query_next_remap. 
*/ - -void dm_multisnap_add_next_remap(struct dm_exception_store *s, union chunk_descriptor *cd, chunk_t *new_chunk) +void dm_multisnap_add_next_remap(struct dm_exception_store *s, + union chunk_descriptor *cd, chunk_t *new_chunk) { int r; @@ -169,8 +169,8 @@ void dm_multisnap_add_next_remap(struct dm_exception_store *s, union chunk_descr /* * Make the chunk writeable (i.e. unshare multiple snapshots). */ - -void dm_multisnap_make_chunk_writeable(struct dm_exception_store *s, union chunk_descriptor *cd, chunk_t *new_chunk) +void dm_multisnap_make_chunk_writeable(struct dm_exception_store *s, + union chunk_descriptor *cd, chunk_t *new_chunk) { int r; @@ -201,8 +201,8 @@ void dm_multisnap_make_chunk_writeable(struct dm_exception_store *s, union chunk /* * Check if the snapshot belongs to the remap range specified by "cd". */ - -int dm_multisnap_check_conflict(struct dm_exception_store *s, union chunk_descriptor *cd, snapid_t snapid) +int dm_multisnap_check_conflict(struct dm_exception_store *s, + union chunk_descriptor *cd, snapid_t snapid) { return snapid >= cd->range.from && snapid <= cd->range.to; } diff --git a/drivers/md/dm-multisnap-mikulas-struct.h b/drivers/md/dm-multisnap-mikulas-struct.h index 3ea1624..39eaa16 100644 --- a/drivers/md/dm-multisnap-mikulas-struct.h +++ b/drivers/md/dm-multisnap-mikulas-struct.h @@ -57,14 +57,14 @@ * * Super block * - * Chunk 0 is the superblock. It is defined in struct multisnap_superblock. + * Chunk 0 is the superblock. It is defined in 'struct multisnap_superblock'. * The superblock contains chunk size, commit block stride, error (if non-zero, * then the exception store is invalid) and pointer to the current commit block. * * Commit blocks * * Chunks 1, 1+cb_stride, 1+2*cb_stride, 1+3*cb_stride, etc. are commit blocks. 
- * Chunks at these location ((location % cb_stride) == 1) are only used for
+ * Chunks at these locations ((location % cb_stride) == 1) are only used for
  * commit blocks, they can't be used for anything else. A commit block is
  * written each time a new state is committed. The snapshot store transitions
  * from one consistent state to another consistent state by writing a commit
@@ -104,8 +104,8 @@
  * leaf entry contains: old chunk (in the origin), new chunk (in the snapshot
  * store), the range of snapshot IDs for which this mapping applies. The b+tree
  * is keyed by (old chunk, snapshot ID range). The b+tree node is specified
- * in struct dm_multisnap_bt_node, the b+tree entry is in struct
- * dm_multisnap_bt_entry. The maximum number of entries in one node is specified
+ * in 'struct dm_multisnap_bt_node', the b+tree entry is in 'struct
+ * dm_multisnap_bt_entry'. The maximum number of entries in one node is specified
  * so that the node fits into one chunk.
  *
  * The internal nodes have the same structure as the leaf nodes, except that:
@@ -117,7 +117,7 @@
  *
  * Snapshot IDs
  *
- * We use 64-bit snapshot IDs. The high 32 bits is the number of a snapshot
+ * We use 64-bit snapshot IDs. The high 32 bits is the number of a snapshot.
  * This number always increases by one when creating a new snapshot. The
  * snapshot IDs are never reused. It is expected that the admin won't create
  * 2^32 snapshots.
@@ -188,7 +188,7 @@
  * store the pair (40, 41) into the commit block.
  * Now, we want to change this node again: so write a new version to a chunk 42
  * and store the pair (40, 42) into the commit block.
- * Now, let's do the same operation for other noder --- the remap array in the
+ * Now, let's do the same operation for other node --- the remap array in the
  * commit block eventually fills up.
When this happens, we expunge (40, 42) map
  * by writing the path from the root:
  * copy node 30 to 43, change the pointer from 40 to 42
@@ -204,14 +204,14 @@
  * thing would get into an infinite loop. So, to free blocks, a different method
  * is used: freelists.
  *
- * We have a structure dm_multisnap_freelist that contains an array of runs of
+ * We have a 'struct dm_multisnap_freelist' that contains an array of runs of
  * blocks to free. Each run is the pair (start, length). When we need to free
  * a block, we add the block to the freelist. We optionally allocate a free
- * list, if there is none freelist, or if the current freelist is full. If one
+ * list, if there is no freelist, or if the current freelist is full. If one
  * freelist is not sufficient, a linked list of freelists is being created.
  * In the commit we write the freelist location to the commit block and after
  * the commit, we free individual bits in the bitmaps. If the computer crashes
- * during freeing the bits, we just free the bits again on next mount.
+ * during freeing the bits we just free the bits again on next mount.
*/ #ifndef CONFIG_DM_MULTISNAPSHOT_MIKULAS_SNAP_OF_SNAP @@ -345,7 +345,8 @@ struct dm_multisnap_bt_node { static inline unsigned dm_multisnap_btree_entries(unsigned chunk_size) { - return (chunk_size - sizeof(struct dm_multisnap_bt_node)) / sizeof(struct dm_multisnap_bt_entry); + return (chunk_size - sizeof(struct dm_multisnap_bt_node)) / + sizeof(struct dm_multisnap_bt_entry); } @@ -372,7 +373,8 @@ struct dm_multisnap_freelist { static inline unsigned dm_multisnap_freelist_entries(unsigned chunk_size) { - return (chunk_size - sizeof(struct dm_multisnap_freelist)) / sizeof(struct dm_multisnap_freelist); + return (chunk_size - sizeof(struct dm_multisnap_freelist)) / + sizeof(struct dm_multisnap_freelist); } #endif diff --git a/drivers/md/dm-multisnap-mikulas.c b/drivers/md/dm-multisnap-mikulas.c index 0fc4195..ec6e30f 100644 --- a/drivers/md/dm-multisnap-mikulas.c +++ b/drivers/md/dm-multisnap-mikulas.c @@ -11,7 +11,6 @@ /* * Initialize in-memory structures, belonging to the commit block. */ - static void init_commit_block(struct dm_exception_store *s) { int i; @@ -51,7 +50,6 @@ static void init_commit_block(struct dm_exception_store *s) * Load the commit block specified in s->valid_commit_block to memory * and populate in-memory structures. 
*/ - static void load_commit_block(struct dm_exception_store *s) { struct dm_buffer *bp; @@ -64,12 +62,16 @@ static void load_commit_block(struct dm_exception_store *s) cb = dm_bufio_read(s->bufio, s->valid_commit_block, &bp); if (IS_ERR(cb)) { - DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(cb), ("load_commit_block: can't re-read commit block %llx", (unsigned long long)s->valid_commit_block)); + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(cb), + ("load_commit_block: can't re-read commit block %llx", + (unsigned long long)s->valid_commit_block)); return; } if (cb->signature != CB_SIGNATURE) { dm_bufio_release(bp); - DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("load_commit_block: bad signature when re-reading commit block %llx", (unsigned long long)s->valid_commit_block)); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("load_commit_block: bad signature when re-reading commit block %llx", + (unsigned long long)s->valid_commit_block)); return; } @@ -90,7 +92,9 @@ static void load_commit_block(struct dm_exception_store *s) if (s->bt_depth > MAX_BT_DEPTH || !s->bt_depth) { dm_bufio_release(bp); - DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("load_commit_block: invalid b+-tree depth in commit block %llx", (unsigned long long)s->valid_commit_block)); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("load_commit_block: invalid b+-tree depth in commit block %llx", + (unsigned long long)s->valid_commit_block)); return; } @@ -116,12 +120,14 @@ static void load_commit_block(struct dm_exception_store *s) dm_bufio_release(bp); if ((chunk_t)(dev_size + s->cb_stride) < (chunk_t)dev_size) { - DM_MULTISNAP_SET_ERROR(s->dm, -ERANGE, ("load_commit_block: device is too large. Compile kernel with 64-bit sector numbers")); + DM_MULTISNAP_SET_ERROR(s->dm, -ERANGE, + ("load_commit_block: device is too large. 
Compile kernel with 64-bit sector numbers")); return; } bitmap_depth = dm_multisnap_bitmap_depth(s->chunk_shift, dev_size); if (bitmap_depth < 0) { - DM_MULTISNAP_SET_ERROR(s->dm, bitmap_depth, ("load_commit_block: device is too large")); + DM_MULTISNAP_SET_ERROR(s->dm, bitmap_depth, + ("load_commit_block: device is too large")); return; } s->dev_size = dev_size; @@ -137,7 +143,6 @@ static void load_commit_block(struct dm_exception_store *s) * commit blocks linearly as long as the sequence number in the commit block * increases. */ - static void find_commit_block(struct dm_exception_store *s) { struct dm_buffer *bp; @@ -151,12 +156,16 @@ static void find_commit_block(struct dm_exception_store *s) try_next: cb = dm_bufio_read(s->bufio, cb_addr, &bp); if (IS_ERR(cb)) { - DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(cb), ("find_commit_block: can't read commit block %llx", (unsigned long long)cb_addr)); + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(cb), + ("find_commit_block: can't read commit block %llx", + (unsigned long long)cb_addr)); return; } if (cb->signature != CB_SIGNATURE) { dm_bufio_release(bp); - DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("find_commit_block: bad signature on commit block %llx", (unsigned long long)cb_addr)); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("find_commit_block: bad signature on commit block %llx", + (unsigned long long)cb_addr)); return; } @@ -174,7 +183,8 @@ try_next: } } if (!s->valid_commit_block) { - DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("find_commit_block: no valid commit block")); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("find_commit_block: no valid commit block")); return; } } @@ -182,7 +192,6 @@ try_next: /* * Return device size in chunks. */ - static int get_size(struct dm_exception_store *s, chunk_t *size) { __u64 dev_size; @@ -197,7 +206,6 @@ static int get_size(struct dm_exception_store *s, chunk_t *size) /* * Initialize the whole snapshot store. 
*/ - static void initialize_device(struct dm_exception_store *s) { int r; @@ -211,7 +219,8 @@ static void initialize_device(struct dm_exception_store *s) r = get_size(s, &s->dev_size); if (r) { - DM_MULTISNAP_SET_ERROR(s->dm, r, ("initialize_device: device is too large. Compile kernel with 64-bit sector numbers")); + DM_MULTISNAP_SET_ERROR(s->dm, r, + ("initialize_device: device is too large. Compile kernel with 64-bit sector numbers")); return; } @@ -220,27 +229,30 @@ static void initialize_device(struct dm_exception_store *s) block_to_write = SB_BLOCK + 1; -/* Write btree */ + /* Write btree */ dm_multisnap_create_btree(s, &block_to_write); if (dm_multisnap_has_error(s->dm)) return; -/* Write bitmaps */ + /* Write bitmaps */ dm_multisnap_create_bitmaps(s, &block_to_write); if (dm_multisnap_has_error(s->dm)) return; s->dev_size = block_to_write; -/* Write commit blocks */ + /* Write commit blocks */ if (FIRST_CB_BLOCK >= s->dev_size) { - DM_MULTISNAP_SET_ERROR(s->dm, -ENOSPC, ("initialize_device: device is too small")); + DM_MULTISNAP_SET_ERROR(s->dm, -ENOSPC, + ("initialize_device: device is too small")); return; } for (cb_block = FIRST_CB_BLOCK; cb_block < s->dev_size; cb_block += s->cb_stride) { cb = dm_bufio_new(s->bufio, cb_block, &bp); if (IS_ERR(cb)) { - DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(cb), ("initialize_device: can't allocate commit block at %llx", (unsigned long long)cb_block)); + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(cb), + ("initialize_device: can't allocate commit block at %llx", + (unsigned long long)cb_block)); return; } memset(cb, 0, s->chunk_size); @@ -263,14 +275,16 @@ static void initialize_device(struct dm_exception_store *s) } r = dm_bufio_write_dirty_buffers(s->bufio); if (r) { - DM_MULTISNAP_SET_ERROR(s->dm, r, ("initialize_device: write error when initializing device")); + DM_MULTISNAP_SET_ERROR(s->dm, r, + ("initialize_device: write error when initializing device")); return; } -/* Write super block */ + /* Write super block */ sb = 
dm_bufio_new(s->bufio, SB_BLOCK, &bp); if (IS_ERR(sb)) { - DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(sb), ("initialize_device: can't allocate super block")); + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(sb), + ("initialize_device: can't allocate super block")); return; } memset(sb, 0, s->chunk_size); @@ -283,7 +297,8 @@ static void initialize_device(struct dm_exception_store *s) dm_bufio_release(bp); r = dm_bufio_write_dirty_buffers(s->bufio); if (r) { - DM_MULTISNAP_SET_ERROR(s->dm, r, ("initialize_device: can't write super block")); + DM_MULTISNAP_SET_ERROR(s->dm, r, + ("initialize_device: can't write super block")); return; } } @@ -293,21 +308,22 @@ static void initialize_device(struct dm_exception_store *s) * * Note: the size can never decrease. */ - static void extend_exception_store(struct dm_exception_store *s, chunk_t new_size) { struct dm_buffer *bp; chunk_t cb_block; struct multisnap_commit_block *cb; -/* Write commit blocks */ + /* Write commit blocks */ for (cb_block = FIRST_CB_BLOCK; cb_block < new_size; cb_block += s->cb_stride) { cond_resched(); if (cb_block < s->dev_size) continue; cb = dm_bufio_new(s->bufio, cb_block, &bp); if (IS_ERR(cb)) { - DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(cb), ("initialize_device: can't allocate commit block at %llx", (unsigned long long)cb_block)); + DM_MULTISNAP_SET_ERROR(s->dm, PTR_ERR(cb), + ("initialize_device: can't allocate commit block at %llx", + (unsigned long long)cb_block)); return; } memset(cb, 0, s->chunk_size); @@ -332,7 +348,6 @@ static void extend_exception_store(struct dm_exception_store *s, chunk_t new_siz * If the super block is zeroed, we do initialization. * Otherwise we report error. 
*/ - static int read_super(struct dm_exception_store *s, char **error) { struct dm_buffer *bp; @@ -407,7 +422,8 @@ re_read: if (e < 0) { /* Don't read the B+-tree if there was an error */ - DM_MULTISNAP_SET_ERROR(s->dm, e, ("read_super: activating invalidated snapshot store, error %d", e)); + DM_MULTISNAP_SET_ERROR(s->dm, e, + ("read_super: activating invalidated snapshot store, error %d", e)); return 0; } @@ -433,7 +449,6 @@ re_read: * If the device size has shrunk, we report an error and stop further * operations. */ - static void dm_multisnap_mikulas_lock_acquired(struct dm_exception_store *s, int flags) { int r; @@ -448,7 +463,8 @@ static void dm_multisnap_mikulas_lock_acquired(struct dm_exception_store *s, int if (unlikely(new_size != s->dev_size)) { if (unlikely(new_size < s->dev_size)) { - DM_MULTISNAP_SET_ERROR(s->dm, -EINVAL, ("dm_multisnap_mikulas_lock_acquired: device shrinked")); + DM_MULTISNAP_SET_ERROR(s->dm, -EINVAL, + ("dm_multisnap_mikulas_lock_acquired: device shrinked")); return; } extend_exception_store(s, new_size); @@ -462,7 +478,9 @@ static void dm_multisnap_mikulas_lock_acquired(struct dm_exception_store *s, int /*#define PRINT_BTREE*/ #ifdef PRINT_BTREE -static int print_btree_callback(struct dm_exception_store *s, struct dm_multisnap_bt_node *node, struct dm_multisnap_bt_entry *bt, void *cookie) +static int print_btree_callback(struct dm_exception_store *s, + struct dm_multisnap_bt_node *node, + struct dm_multisnap_bt_entry *bt, void *cookie) { printk(KERN_DEBUG "entry: %llx, %llx-%llx -> %llx\n", (unsigned long long)read_48(bt, orig_chunk), @@ -490,7 +508,8 @@ static void print_bitmaps(struct dm_exception_store *s) for (c = 0; c < s->dev_size; c += s->chunk_size * 8) { struct dm_buffer *bp; unsigned i; - void *bmp = dm_multisnap_map_bitmap(s, c >> (s->chunk_shift + 3), &bp, NULL, NULL); + void *bmp = dm_multisnap_map_bitmap(s, c >> (s->chunk_shift + 3), + &bp, NULL, NULL); if (!bmp) continue; for (i = 0; i < s->chunk_size * 8; i++) @@ 
-513,8 +532,9 @@ static void print_bitmaps(struct dm_exception_store *s) * Parse arguments, allocate structures and call read_super to read the data * from the disk. */ - -static int dm_multisnap_mikulas_init(struct dm_multisnap *dm, struct dm_exception_store **sp, unsigned argc, char **argv, char **error) +static int dm_multisnap_mikulas_init(struct dm_multisnap *dm, + struct dm_exception_store **sp, + unsigned argc, char **argv, char **error) { int r, i; struct dm_exception_store *s; @@ -551,11 +571,13 @@ static int dm_multisnap_mikulas_init(struct dm_multisnap *dm, struct dm_exceptio if (r) goto bad_arguments; if (!strcasecmp(string, "cache-threshold")) { - r = dm_multisnap_get_uint64(&argv, &argc, &s->cache_threshold, error); + r = dm_multisnap_get_uint64(&argv, &argc, + &s->cache_threshold, error); if (r) goto bad_arguments; } else if (!strcasecmp(string, "cache-limit")) { - r = dm_multisnap_get_uint64(&argv, &argc, &s->cache_limit, error); + r = dm_multisnap_get_uint64(&argv, &argc, + &s->cache_limit, error); if (r) goto bad_arguments; } else { @@ -580,7 +602,9 @@ static int dm_multisnap_mikulas_init(struct dm_multisnap *dm, struct dm_exceptio goto bad_freelist; } - s->bufio = dm_bufio_client_create(dm_multisnap_snapshot_bdev(s->dm), s->chunk_size, 0, s->cache_threshold, s->cache_limit); + s->bufio = dm_bufio_client_create(dm_multisnap_snapshot_bdev(s->dm), + s->chunk_size, 0, s->cache_threshold, + s->cache_limit); if (IS_ERR(s->bufio)) { *error = "Can't create bufio client"; r = PTR_ERR(s->bufio); @@ -591,7 +615,8 @@ static int dm_multisnap_mikulas_init(struct dm_multisnap *dm, struct dm_exceptio if (r) goto bad_super; - if (s->flags & (DM_MULTISNAP_FLAG_DELETING | DM_MULTISNAP_FLAG_PENDING_DELETE)) + if (s->flags & (DM_MULTISNAP_FLAG_DELETING | + DM_MULTISNAP_FLAG_PENDING_DELETE)) dm_multisnap_queue_work(s->dm, &s->delete_work); #ifdef PRINT_BTREE @@ -619,7 +644,6 @@ bad_private: /* * Exit the exception store. 
  */
-
 static void dm_multisnap_mikulas_exit(struct dm_exception_store *s)
 {
 	int i;
@@ -628,14 +652,16 @@ static void dm_multisnap_mikulas_exit(struct dm_exception_store *s)

 	i = 0;
 	while (!list_empty(&s->used_bitmap_tmp_remaps)) {
-		struct tmp_remap *t = list_first_entry(&s->used_bitmap_tmp_remaps, struct tmp_remap, list);
+		struct tmp_remap *t = list_first_entry(&s->used_bitmap_tmp_remaps,
+						       struct tmp_remap, list);
 		list_del(&t->list);
 		hlist_del(&t->hash_list);
 		i++;
 	}

 	while (!list_empty(&s->used_bt_tmp_remaps)) {
-		struct tmp_remap *t = list_first_entry(&s->used_bt_tmp_remaps, struct tmp_remap, list);
+		struct tmp_remap *t = list_first_entry(&s->used_bt_tmp_remaps,
+						       struct tmp_remap, list);
 		list_del(&t->list);
 		hlist_del(&t->hash_list);
 		i++;
@@ -664,8 +690,8 @@ static void dm_multisnap_mikulas_exit(struct dm_exception_store *s)
  * Return exception-store specific arguments. This is used in the proces of
  * constructing the table returned by device mapper.
  */
-
-static void dm_multisnap_status_table(struct dm_exception_store *s, char *result, unsigned maxlen)
+static void dm_multisnap_status_table(struct dm_exception_store *s,
+				      char *result, unsigned maxlen)
 {
 	int npar = 0;
 	if (s->cache_threshold)
@@ -677,11 +703,13 @@ static void dm_multisnap_status_table(struct dm_exception_store *s, char *result
 	dm_multisnap_adjust_string(&result, &maxlen);

 	if (s->cache_threshold) {
-		snprintf(result, maxlen, " cache-threshold %llu", (unsigned long long)s->cache_threshold);
+		snprintf(result, maxlen, " cache-threshold %llu",
+			 (unsigned long long)s->cache_threshold);
 		dm_multisnap_adjust_string(&result, &maxlen);
 	}
 	if (s->cache_limit) {
-		snprintf(result, maxlen, " cache-limit %llu", (unsigned long long)s->cache_limit);
+		snprintf(result, maxlen, " cache-limit %llu",
+			 (unsigned long long)s->cache_limit);
 		dm_multisnap_adjust_string(&result, &maxlen);
 	}
 }
@@ -730,4 +758,3 @@ module_exit(dm_multisnapshot_mikulas_module_exit);
 MODULE_DESCRIPTION(DM_NAME " multisnapshot Mikulas'
exceptions store"); MODULE_AUTHOR("Mikulas Patocka"); MODULE_LICENSE("GPL"); - diff --git a/drivers/md/dm-multisnap-mikulas.h b/drivers/md/dm-multisnap-mikulas.h index 36cf8c3..52c87e0 100644 --- a/drivers/md/dm-multisnap-mikulas.h +++ b/drivers/md/dm-multisnap-mikulas.h @@ -24,8 +24,11 @@ typedef __u32 bitmap_t; -#define read_48(struc, entry) (le32_to_cpu((struc)->entry##1) | ((chunk_t)le16_to_cpu((struc)->entry##2) << 31 << 1)) -#define write_48(struc, entry, val) do { (struc)->entry##1 = cpu_to_le32(val); (struc)->entry##2 = cpu_to_le16((chunk_t)(val) >> 31 >> 1); } while (0) +#define read_48(struc, entry) (le32_to_cpu((struc)->entry##1) |\ + ((chunk_t)le16_to_cpu((struc)->entry##2) << 31 << 1)) + +#define write_48(struc, entry, val) do { (struc)->entry##1 = cpu_to_le32(val); \ + (struc)->entry##2 = cpu_to_le16((chunk_t)(val) >> 31 >> 1); } while (0) #define TMP_REMAP_HASH_SIZE 256 #define TMP_REMAP_HASH(c) ((c) & (TMP_REMAP_HASH_SIZE - 1)) @@ -122,25 +125,37 @@ struct dm_exception_store { void dm_multisnap_create_bitmaps(struct dm_exception_store *s, chunk_t *writing_block); void dm_multisnap_extend_bitmaps(struct dm_exception_store *s, chunk_t new_size); -void *dm_multisnap_map_bitmap(struct dm_exception_store *s, bitmap_t bitmap, struct dm_buffer **bp, chunk_t *block, struct path_element *path); -int dm_multisnap_alloc_blocks(struct dm_exception_store *s, chunk_t *results, unsigned n_blocks, int flags); +void *dm_multisnap_map_bitmap(struct dm_exception_store *s, bitmap_t bitmap, + struct dm_buffer **bp, chunk_t *block, + struct path_element *path); +int dm_multisnap_alloc_blocks(struct dm_exception_store *s, chunk_t *results, + unsigned n_blocks, int flags); #define ALLOC_DRY 1 -void *dm_multisnap_alloc_duplicate_block(struct dm_exception_store *s, chunk_t block, struct dm_buffer **bp, void *ptr); -void *dm_multisnap_alloc_make_block(struct dm_exception_store *s, chunk_t *result, struct dm_buffer **bp); -void dm_multisnap_free_blocks_immediate(struct 
dm_exception_store *s, chunk_t block, unsigned n_blocks); -void dm_multisnap_bitmap_finalize_tmp_remap(struct dm_exception_store *s, struct tmp_remap *tmp_remap); +void *dm_multisnap_alloc_duplicate_block(struct dm_exception_store *s, chunk_t block, + struct dm_buffer **bp, void *ptr); +void *dm_multisnap_alloc_make_block(struct dm_exception_store *s, chunk_t *result, + struct dm_buffer **bp); +void dm_multisnap_free_blocks_immediate(struct dm_exception_store *s, chunk_t block, + unsigned n_blocks); +void dm_multisnap_bitmap_finalize_tmp_remap(struct dm_exception_store *s, + struct tmp_remap *tmp_remap); /* dm-multisnap-blocks.c */ chunk_t dm_multisnap_remap_block(struct dm_exception_store *s, chunk_t block); -void *dm_multisnap_read_block(struct dm_exception_store *s, chunk_t block, struct dm_buffer **bp); +void *dm_multisnap_read_block(struct dm_exception_store *s, chunk_t block, + struct dm_buffer **bp); int dm_multisnap_block_is_uncommitted(struct dm_exception_store *s, chunk_t block); void dm_multisnap_block_set_uncommitted(struct dm_exception_store *s, chunk_t block); void dm_multisnap_clear_uncommitted(struct dm_exception_store *s); -void *dm_multisnap_duplicate_block(struct dm_exception_store *s, chunk_t old_chunk, chunk_t new_chunk, bitmap_t bitmap_idx, struct dm_buffer **bp, chunk_t *to_free); +void *dm_multisnap_duplicate_block(struct dm_exception_store *s, chunk_t old_chunk, + chunk_t new_chunk, bitmap_t bitmap_idx, + struct dm_buffer **bp, chunk_t *to_free); void dm_multisnap_free_tmp_remap(struct dm_exception_store *s, struct tmp_remap *t); -void *dm_multisnap_make_block(struct dm_exception_store *s, chunk_t new_chunk, struct dm_buffer **bp); -void dm_multisnap_free_block_and_duplicates(struct dm_exception_store *s, chunk_t block); +void *dm_multisnap_make_block(struct dm_exception_store *s, chunk_t new_chunk, + struct dm_buffer **bp); +void dm_multisnap_free_block_and_duplicates(struct dm_exception_store *s, + chunk_t block); int 
dm_multisnap_is_commit_block(struct dm_exception_store *s, chunk_t block); @@ -150,18 +165,26 @@ struct stop_cycles { }; void dm_multisnap_init_stop_cycles(struct stop_cycles *cy); -int dm_multisnap_stop_cycles(struct dm_exception_store *s, struct stop_cycles *cy, chunk_t key); +int dm_multisnap_stop_cycles(struct dm_exception_store *s, + struct stop_cycles *cy, chunk_t key); /* dm-multisnap-btree.c */ void dm_multisnap_create_btree(struct dm_exception_store *s, chunk_t *writing_block); -int dm_multisnap_find_in_btree(struct dm_exception_store *s, struct bt_key *key, chunk_t *result); -void dm_multisnap_add_to_btree(struct dm_exception_store *s, struct bt_key *key, chunk_t new_chunk); +int dm_multisnap_find_in_btree(struct dm_exception_store *s, struct bt_key *key, + chunk_t *result); +void dm_multisnap_add_to_btree(struct dm_exception_store *s, struct bt_key *key, + chunk_t new_chunk); void dm_multisnap_restrict_btree_entry(struct dm_exception_store *s, struct bt_key *key); void dm_multisnap_extend_btree_entry(struct dm_exception_store *s, struct bt_key *key); void dm_multisnap_delete_from_btree(struct dm_exception_store *s, struct bt_key *key); -void dm_multisnap_bt_finalize_tmp_remap(struct dm_exception_store *s, struct tmp_remap *tmp_remap); -int dm_multisnap_list_btree(struct dm_exception_store *s, struct bt_key *key, int (*call)(struct dm_exception_store *, struct dm_multisnap_bt_node *, struct dm_multisnap_bt_entry *, void *), void *cookie); +void dm_multisnap_bt_finalize_tmp_remap(struct dm_exception_store *s, + struct tmp_remap *tmp_remap); +int dm_multisnap_list_btree(struct dm_exception_store *s, struct bt_key *key, + int (*call)(struct dm_exception_store *, + struct dm_multisnap_bt_node *, + struct dm_multisnap_bt_entry *, void *), + void *cookie); /* dm-multisnap-commit.c */ @@ -171,7 +194,8 @@ void dm_multisnap_commit(struct dm_exception_store *s); /* dm-multisnap-delete.c */ -void dm_multisnap_background_delete(struct dm_exception_store *s, struct 
dm_multisnap_background_work *bw); +void dm_multisnap_background_delete(struct dm_exception_store *s, + struct dm_multisnap_background_work *bw); /* dm-multisnap-freelist.c */ @@ -183,31 +207,41 @@ void dm_multisnap_load_freelist(struct dm_exception_store *s); /* dm-multisnap-io.c */ -int dm_multisnap_find_snapshot_chunk(struct dm_exception_store *s, snapid_t snapid, chunk_t chunk, int write, chunk_t *result); +int dm_multisnap_find_snapshot_chunk(struct dm_exception_store *s, snapid_t snapid, + chunk_t chunk, int write, chunk_t *result); void dm_multisnap_reset_query(struct dm_exception_store *s); int dm_multisnap_query_next_remap(struct dm_exception_store *s, chunk_t chunk); -void dm_multisnap_add_next_remap(struct dm_exception_store *s, union chunk_descriptor *cd, chunk_t *new_chunk); -void dm_multisnap_make_chunk_writeable(struct dm_exception_store *s, union chunk_descriptor *cd, chunk_t *new_chunk); -int dm_multisnap_check_conflict(struct dm_exception_store *s, union chunk_descriptor *cd, snapid_t snapid); +void dm_multisnap_add_next_remap(struct dm_exception_store *s, + union chunk_descriptor *cd, chunk_t *new_chunk); +void dm_multisnap_make_chunk_writeable(struct dm_exception_store *s, + union chunk_descriptor *cd, chunk_t *new_chunk); +int dm_multisnap_check_conflict(struct dm_exception_store *s, union chunk_descriptor *cd, + snapid_t snapid); /* dm-multisnap-snaps.c */ snapid_t dm_multisnap_get_next_snapid(struct dm_exception_store *s, snapid_t snapid); int dm_multisnap_compare_snapids_for_create(const void *p1, const void *p2); -int dm_multisnap_find_next_snapid_range(struct dm_exception_store *s, snapid_t snapid, snapid_t *from, snapid_t *to); +int dm_multisnap_find_next_snapid_range(struct dm_exception_store *s, snapid_t snapid, + snapid_t *from, snapid_t *to); snapid_t dm_multisnap_find_next_subsnapshot(struct dm_exception_store *s, snapid_t snapid); void dm_multisnap_destroy_snapshot_tree(struct dm_exception_store *s); void 
dm_multisnap_read_snapshots(struct dm_exception_store *s); -int dm_multisnap_allocate_snapid(struct dm_exception_store *s, snapid_t *snapid, int snap_of_snap, snapid_t master); +int dm_multisnap_allocate_snapid(struct dm_exception_store *s, snapid_t *snapid, + int snap_of_snap, snapid_t master); int dm_multisnap_create_snapshot(struct dm_exception_store *s, snapid_t snapid); int dm_multisnap_delete_snapshot(struct dm_exception_store *s, snapid_t snapid); -void dm_multisnap_get_space(struct dm_exception_store *s, unsigned long long *chunks_total, unsigned long long *chunks_allocated, unsigned long long *chunks_metadata_allocated); +void dm_multisnap_get_space(struct dm_exception_store *s, unsigned long long *chunks_total, + unsigned long long *chunks_allocated, + unsigned long long *chunks_metadata_allocated); #ifdef CONFIG_DM_MULTISNAPSHOT_MIKULAS_SNAP_OF_SNAP -void dm_multisnap_print_snapid(struct dm_exception_store *s, char *string, unsigned maxlen, snapid_t snapid); -int dm_multisnap_read_snapid(struct dm_exception_store *s, char *string, snapid_t *snapid, char **error); +void dm_multisnap_print_snapid(struct dm_exception_store *s, char *string, + unsigned maxlen, snapid_t snapid); +int dm_multisnap_read_snapid(struct dm_exception_store *s, char *string, + snapid_t *snapid, char **error); #endif #endif diff --git a/drivers/md/dm-multisnap-snaps.c b/drivers/md/dm-multisnap-snaps.c index c26125d..9947673 100644 --- a/drivers/md/dm-multisnap-snaps.c +++ b/drivers/md/dm-multisnap-snaps.c @@ -11,7 +11,6 @@ /* * In-memory red-black tree denoting the used snapshot IDs. */ - struct snapshot_range { struct rb_node node; mikulas_snapid_t from; @@ -22,8 +21,9 @@ struct snapshot_range { * Find a leftmost key in rbtree in the specified range (if add == 0) * or create a new key (if add != 0). 
*/ - -static struct snapshot_range *rb_find_insert_snapshot(struct dm_exception_store *s, mikulas_snapid_t from, mikulas_snapid_t to, int add) +static struct snapshot_range * +rb_find_insert_snapshot(struct dm_exception_store *s, + mikulas_snapid_t from, mikulas_snapid_t to, int add) { struct snapshot_range *new; struct snapshot_range *found = NULL; @@ -45,11 +45,13 @@ go_left: goto go_left; break; } else { - DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("rb_insert_snapshot: inserting overlapping entry: (%llx,%llx) overlaps (%llx,%llx)", - (unsigned long long)from, - (unsigned long long)to, - (unsigned long long)rn->from, - (unsigned long long)rn->to)); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("rb_insert_snapshot: inserting overlapping entry: " + "(%llx,%llx) overlaps (%llx,%llx)", + (unsigned long long)from, + (unsigned long long)to, + (unsigned long long)rn->from, + (unsigned long long)rn->to)); return NULL; } } @@ -62,7 +64,8 @@ go_left: new = kmalloc(sizeof(struct snapshot_range), GFP_KERNEL); if (!new) { - DM_MULTISNAP_SET_ERROR(s->dm, -ENOMEM, ("rb_insert_snapshot: can't allocate memory for snapshot descriptor")); + DM_MULTISNAP_SET_ERROR(s->dm, -ENOMEM, + ("rb_insert_snapshot: can't allocate memory for snapshot descriptor")); return NULL; } @@ -78,8 +81,9 @@ go_left: /* * Find a leftmost key in rbtree in the specified range. */ - -static struct snapshot_range *rb_find_snapshot(struct dm_exception_store *s, mikulas_snapid_t from, mikulas_snapid_t to) +static struct snapshot_range * +rb_find_snapshot(struct dm_exception_store *s, + mikulas_snapid_t from, mikulas_snapid_t to) { return rb_find_insert_snapshot(s, from, to, 0); } @@ -87,8 +91,9 @@ static struct snapshot_range *rb_find_snapshot(struct dm_exception_store *s, mik /* * Insert a range to rbtree. It must not overlap with existing entries. 
  */
-
-static int rb_insert_snapshot_unlocked(struct dm_exception_store *s, mikulas_snapid_t from, mikulas_snapid_t to)
+static int rb_insert_snapshot_unlocked(struct dm_exception_store *s,
+				       mikulas_snapid_t from,
+				       mikulas_snapid_t to)
 {
 	struct snapshot_range *rn;
 	rn = rb_find_insert_snapshot(s, from, to, 1);
@@ -101,8 +106,8 @@ static int rb_insert_snapshot_unlocked(struct dm_exception_store *s, mikulas_sna
  * Hold the lock and insert a range to rbtree. It must not overlap with
  * existing entries.
  */
-
-static int rb_insert_snapshot(struct dm_exception_store *s, mikulas_snapid_t from, mikulas_snapid_t to)
+static int rb_insert_snapshot(struct dm_exception_store *s,
+			      mikulas_snapid_t from, mikulas_snapid_t to)
 {
 	int r;
 	dm_multisnap_status_lock(s->dm);
@@ -115,17 +120,23 @@ static int rb_insert_snapshot(struct dm_exception_store *s, mikulas_snapid_t fro
  * "from" must be last entry in the existing range. This function extends the
  * range. The extended area must not overlap with another entry.
*/ - -static int rb_extend_range(struct dm_exception_store *s, mikulas_snapid_t from, mikulas_snapid_t to) +static int rb_extend_range(struct dm_exception_store *s, + mikulas_snapid_t from, mikulas_snapid_t to) { struct snapshot_range *rn; rn = rb_find_insert_snapshot(s, from, from, 0); if (!rn) { - DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("rb_extend_range: snapshot %llx not found", (unsigned long long)from)); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("rb_extend_range: snapshot %llx not found", + (unsigned long long)from)); return -1; } if (rn->to != from) { - DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("rb_extend_range: bad attempt to extend range: %llx >= %llx", (unsigned long long)rn->to, (unsigned long long)from)); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("rb_extend_range: bad attempt to extend range: " + "%llx >= %llx", + (unsigned long long)rn->to, + (unsigned long long)from)); return -1; } dm_multisnap_status_lock(s->dm); @@ -140,13 +151,17 @@ static int rb_extend_range(struct dm_exception_store *s, mikulas_snapid_t from, * It is valid to specify a subset of existing range, in this case, the range * is trimmed and possible split to two ranges. */ - -static int rb_delete_range(struct dm_exception_store *s, mikulas_snapid_t from, mikulas_snapid_t to) +static int rb_delete_range(struct dm_exception_store *s, + mikulas_snapid_t from, mikulas_snapid_t to) { struct snapshot_range *sr = rb_find_snapshot(s, from, from); if (!sr || sr->to < to) { - DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("rb_delete_range: deleting non-existing snapid %llx-%llx", (unsigned long long)from, (unsigned long long)to)); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("rb_delete_range: deleting non-existing snapid " + "%llx-%llx", + (unsigned long long)from, + (unsigned long long)to)); return -1; } @@ -178,8 +193,8 @@ static int rb_delete_range(struct dm_exception_store *s, mikulas_snapid_t from, * Otherwise, return the next valid snapshot ID. 
* If there is no next valid snapshot ID, return DM_SNAPID_T_ORIGIN. */ - -snapid_t dm_multisnap_get_next_snapid(struct dm_exception_store *s, snapid_t snapid) +snapid_t dm_multisnap_get_next_snapid(struct dm_exception_store *s, + snapid_t snapid) { struct snapshot_range *rn; @@ -198,8 +213,9 @@ snapid_t dm_multisnap_get_next_snapid(struct dm_exception_store *s, snapid_t sna * A wrapper around rb_find_snapshot that is useable in other object files * that don't know about struct snapshot_range. */ - -int dm_multisnap_find_next_snapid_range(struct dm_exception_store *s, snapid_t snapid, snapid_t *from, snapid_t *to) +int dm_multisnap_find_next_snapid_range(struct dm_exception_store *s, + snapid_t snapid, snapid_t *from, + snapid_t *to) { struct snapshot_range *rn; rn = rb_find_snapshot(s, snapid, DM_SNAPID_T_MAX); @@ -213,7 +229,6 @@ int dm_multisnap_find_next_snapid_range(struct dm_exception_store *s, snapid_t s /* * Return true, if the snapid is master (not subsnapshot). */ - static int dm_multisnap_snapid_is_master(snapid_t snapid) { return (snapid & DM_MIKULAS_SUBSNAPID_MASK) == DM_MIKULAS_SUBSNAPID_MASK; @@ -224,14 +239,15 @@ static int dm_multisnap_snapid_is_master(snapid_t snapid) * * If it returns snapid, then no subsnapshot can be created. */ - -snapid_t dm_multisnap_find_next_subsnapshot(struct dm_exception_store *s, snapid_t snapid) +snapid_t dm_multisnap_find_next_subsnapshot(struct dm_exception_store *s, + snapid_t snapid) { #ifdef CONFIG_DM_MULTISNAPSHOT_MIKULAS_SNAP_OF_SNAP mikulas_snapid_t find_from, find_to; if (unlikely(!dm_multisnap_snapid_is_master(snapid))) return snapid; - if (!dm_multisnap_find_next_snapid_range(s, snapid, &find_from, &find_to)) + if (!dm_multisnap_find_next_snapid_range(s, snapid, + &find_from, &find_to)) BUG(); snapid &= ~DM_MIKULAS_SUBSNAPID_MASK; if (snapid < find_from) @@ -243,7 +259,6 @@ snapid_t dm_multisnap_find_next_subsnapshot(struct dm_exception_store *s, snapid /* * Deallocate the whole rbtree. 
  */
-
 void dm_multisnap_destroy_snapshot_tree(struct dm_exception_store *s)
 {
 	struct rb_node *root;
@@ -258,7 +273,6 @@ void dm_multisnap_destroy_snapshot_tree(struct dm_exception_store *s)
 /*
  * Populate in-memory rbtree from on-disk b+tree.
  */
-
 void dm_multisnap_read_snapshots(struct dm_exception_store *s)
 {
 	struct bt_key snap_key;
@@ -279,7 +293,8 @@ find_next:

 	if (r) {
 		if (unlikely(snap_key.snap_to > DM_SNAPID_T_MAX)) {
-			DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_read_snapshots: invalid snapshot id"));
+			DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR,
+					       ("dm_multisnap_read_snapshots: invalid snapshot id"));
 			return;
 		}
 		r = rb_insert_snapshot(s, snap_key.snap_from, snap_key.snap_to);
@@ -295,8 +310,8 @@ find_next:
  * If snap_of_snap != 0, allocate a subsnapshot ID for snapshot "master".
  * Otherwise, allocate a new master snapshot ID.
  */
-
-int dm_multisnap_allocate_snapid(struct dm_exception_store *s, snapid_t *snapid, int snap_of_snap, snapid_t master)
+int dm_multisnap_allocate_snapid(struct dm_exception_store *s,
+				 snapid_t *snapid, int snap_of_snap, snapid_t master)
 {
 	if (snap_of_snap) {
 #ifdef CONFIG_DM_MULTISNAPSHOT_MIKULAS_SNAP_OF_SNAP
@@ -327,8 +342,8 @@ int dm_multisnap_allocate_snapid(struct dm_exception_store *s, snapid_t *snapid,
  * Add a snapid range to in-memory rbtree and on-disk b+tree.
  * Optionally, merge with the previous range. Don't merge with the next.
  */
-
-static int dm_multisnap_create_snapid_range(struct dm_exception_store *s, snapid_t from, snapid_t to)
+static int dm_multisnap_create_snapid_range(struct dm_exception_store *s,
+					    snapid_t from, snapid_t to)
 {
 	int r;
 	struct bt_key snap_key;
@@ -368,8 +383,8 @@ static int dm_multisnap_create_snapid_range(struct dm_exception_store *s, snapid
 /*
  * Delete a snapid range from in-memory rbtree and on-disk b+tree.
*/ - -static int dm_multisnap_delete_snapid_range(struct dm_exception_store *s, snapid_t from, snapid_t to) +static int dm_multisnap_delete_snapid_range(struct dm_exception_store *s, + snapid_t from, snapid_t to) { int r; struct bt_key snap_key; @@ -386,11 +401,15 @@ static int dm_multisnap_delete_snapid_range(struct dm_exception_store *s, snapid r = dm_multisnap_find_in_btree(s, &snap_key, &ignore); if (r <= 0) { if (!r) - DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_delete_snapshot: snapshot id %llx not found in b-tree", (unsigned long long)from)); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_delete_snapshot: snapshot id %llx not found in b-tree", + (unsigned long long)from)); return dm_multisnap_has_error(s->dm); } if (snap_key.snap_to < to) { - DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_delete_snapshot: snapshot id %llx-%llx not found in b-tree", (unsigned long long)from, (unsigned long long)to)); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_delete_snapshot: snapshot id %llx-%llx not found in b-tree", + (unsigned long long)from, (unsigned long long)to)); return -EFSERROR; } @@ -424,7 +443,6 @@ static int dm_multisnap_delete_snapid_range(struct dm_exception_store *s, snapid /* * Create a subsnapshot. 
*/ - static int dm_multisnap_create_subsnapshot(struct dm_exception_store *s, snapid_t snapid) { int r; @@ -432,13 +450,18 @@ static int dm_multisnap_create_subsnapshot(struct dm_exception_store *s, snapid_ master = snapid | DM_MIKULAS_SUBSNAPID_MASK; if (!dm_multisnap_snapshot_exists(s->dm, master)) { - DMERR("dm_multisnap_create_subsnapshot: master snapshot with id %llx doesn't exist", (unsigned long long)snapid); + DMERR("dm_multisnap_create_subsnapshot: master snapshot with id %llx doesn't exist", + (unsigned long long)snapid); return -EINVAL; } next_sub = dm_multisnap_find_next_subsnapshot(s, master); if (snapid < next_sub) { - DMERR("dm_multisnap_create_subsnapshot: invalid subsnapshot id %llx (allowed range %llx - %llx)", (unsigned long long)snapid, (unsigned long long)next_sub, (unsigned long long)master - 1); + DMERR("dm_multisnap_create_subsnapshot: invalid subsnapshot id %llx " + "(allowed range %llx - %llx)", + (unsigned long long)snapid, + (unsigned long long)next_sub, + (unsigned long long)master - 1); return -EINVAL; } @@ -458,7 +481,6 @@ static int dm_multisnap_create_subsnapshot(struct dm_exception_store *s, snapid_ /* * Create a snapshot or subsnapshot with a given snapid. */ - int dm_multisnap_create_snapshot(struct dm_exception_store *s, snapid_t snapid) { int r; @@ -474,7 +496,8 @@ int dm_multisnap_create_snapshot(struct dm_exception_store *s, snapid_t snapid) return -EINVAL; } if (dm_multisnap_snapshot_exists(s->dm, snapid)) { - DMERR("dm_multisnap_create_snapshot: snapshot with id %llx already exists", (unsigned long long)snapid); + DMERR("dm_multisnap_create_snapshot: snapshot with id %llx already exists", + (unsigned long long)snapid); return -EINVAL; } @@ -492,13 +515,14 @@ int dm_multisnap_create_snapshot(struct dm_exception_store *s, snapid_t snapid) * Delete a snapshot or subsnapshot with a given snapid. * Spawn background scanning for entries to delete. 
*/ - int dm_multisnap_delete_snapshot(struct dm_exception_store *s, snapid_t snapid) { int r; if (!dm_multisnap_snapshot_exists(s->dm, snapid)) { - DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, ("dm_multisnap_delete_snapshot: snapshot id %llx not found in rb-tree", (unsigned long long)snapid)); + DM_MULTISNAP_SET_ERROR(s->dm, -EFSERROR, + ("dm_multisnap_delete_snapshot: snapshot id %llx not found in rb-tree", + (unsigned long long)snapid)); return -EFSERROR; } @@ -518,7 +542,6 @@ int dm_multisnap_delete_snapshot(struct dm_exception_store *s, snapid_t snapid) * Sort the snapids for creating. Sort them linearly except that the master * goes before all subsnapshots. */ - int dm_multisnap_compare_snapids_for_create(const void *p1, const void *p2) { mikulas_snapid_t s1 = *(const snapid_t *)p1; @@ -543,8 +566,10 @@ int dm_multisnap_compare_snapids_for_create(const void *p1, const void *p2) /* * Return the number of total, allocated and metadata chunks. */ - -void dm_multisnap_get_space(struct dm_exception_store *s, unsigned long long *chunks_total, unsigned long long *chunks_allocated, unsigned long long *chunks_metadata_allocated) +void dm_multisnap_get_space(struct dm_exception_store *s, + unsigned long long *chunks_total, + unsigned long long *chunks_allocated, + unsigned long long *chunks_metadata_allocated) { dm_multisnap_status_assert_locked(s->dm); *chunks_total = s->dev_size; @@ -558,8 +583,8 @@ void dm_multisnap_get_space(struct dm_exception_store *s, unsigned long long *ch * Convert snapid to user-friendly format (so that he won't see things like * 4294967296). 
*/ - -void dm_multisnap_print_snapid(struct dm_exception_store *s, char *string, unsigned maxlen, snapid_t snapid) +void dm_multisnap_print_snapid(struct dm_exception_store *s, char *string, + unsigned maxlen, snapid_t snapid) { unsigned master = snapid >> DM_MIKULAS_SNAPID_STEP_BITS; unsigned subsnap = snapid & DM_MIKULAS_SUBSNAPID_MASK; @@ -572,8 +597,8 @@ void dm_multisnap_print_snapid(struct dm_exception_store *s, char *string, unsig /* * Convert snapid from user-friendly format to the internal 64-bit number. */ - -int dm_multisnap_read_snapid(struct dm_exception_store *s, char *string, snapid_t *snapid, char **error) +int dm_multisnap_read_snapid(struct dm_exception_store *s, char *string, + snapid_t *snapid, char **error) { unsigned long master; unsigned long subsnap; diff --git a/drivers/md/dm-multisnap.c b/drivers/md/dm-multisnap.c index fe4fee6..758c013 100644 --- a/drivers/md/dm-multisnap.c +++ b/drivers/md/dm-multisnap.c @@ -134,11 +134,12 @@ static long dm_multisnap_jobs_in_flight(struct dm_multisnap *s) /* * Any reading/writing of snapids in table/status/message must go - * through this functions, so that snapid format for userspace can - * be overriden. + * through these functions, so that snapid format for userspace can + * be overridden. 
*/ -static void print_snapid(struct dm_multisnap *s, char *string, unsigned maxlen, snapid_t snapid) +static void print_snapid(struct dm_multisnap *s, char *string, + unsigned maxlen, snapid_t snapid) { if (s->store->print_snapid) s->store->print_snapid(s->p, string, maxlen, snapid); @@ -146,7 +147,8 @@ static void print_snapid(struct dm_multisnap *s, char *string, unsigned maxlen, snprintf(string, maxlen, "%llu", (unsigned long long)snapid); } -static int read_snapid(struct dm_multisnap *s, char *string, snapid_t *snapid, char **error) +static int read_snapid(struct dm_multisnap *s, char *string, + snapid_t *snapid, char **error) { if (s->store->read_snapid) return s->store->read_snapid(s->p, string, snapid, error); @@ -302,7 +304,7 @@ static void bio_put_snapid(struct bio *bio, snapid_t snapid) bio->bi_seg_back_size = snapid; } -/* --- tracked chnuks --- */ +/* --- tracked chunks --- */ static struct kmem_cache *tracked_chunk_cache; @@ -338,7 +340,8 @@ static void pending_exception_ctor(void *pe_) bio_list_init(&pe->bios); } -static struct dm_multisnap_pending_exception *dm_multisnap_alloc_pending_exception(struct dm_multisnap *s, chunk_t chunk) +static struct dm_multisnap_pending_exception * +dm_multisnap_alloc_pending_exception(struct dm_multisnap *s, chunk_t chunk) { struct dm_multisnap_pending_exception *pe; /* @@ -367,7 +370,7 @@ static void dm_multisnap_free_pending_exception(struct dm_multisnap_pending_exce static void dm_multisnap_wait_for_pending_exception(struct dm_multisnap *s) { /* - * Wait until there is something in the mempool. Free it immediatelly. + * Wait until there is something in the mempool. Free it immediately. */ struct dm_multisnap_pending_exception *pe; @@ -380,8 +383,8 @@ static void dm_multisnap_wait_for_pending_exception(struct dm_multisnap *s) * * If it does, queue the bio on the pending exception. 
*/ - -static int check_pending_io(struct dm_multisnap *s, struct bio *bio, chunk_t chunk, snapid_t snapid) +static int check_pending_io(struct dm_multisnap *s, struct bio *bio, + chunk_t chunk, snapid_t snapid) { struct dm_multisnap_pending_exception *pe; struct hlist_node *hn; @@ -410,7 +413,6 @@ conflict: * Test if commit can be performed. If these two variables are not equal, * there are some pending kcopyd jobs and we must not commit. */ - int dm_multisnap_can_commit(struct dm_multisnap *s) { return s->kcopyd_jobs_submitted_count == s->kcopyd_jobs_finished_count; @@ -422,7 +424,6 @@ EXPORT_SYMBOL(dm_multisnap_can_commit); * This can be called only if dm_multisnap_can_commit returned true; * master_lock must be locked. */ - void dm_multisnap_call_commit(struct dm_multisnap *s) { s->kcopyd_jobs_last_commit_count = s->kcopyd_jobs_finished_count; @@ -436,17 +437,16 @@ EXPORT_SYMBOL(dm_multisnap_call_commit); * this function exits. * master_lock must be unlocked. * - * If the commit cannot be performed immediatelly (because there are pending + * If the commit cannot be performed immediately (because there are pending * chunks being copied), the function drops the lock and polls. It won't * livelock --- either it will be possible to do the commit or someone - * have done the commit already (commit_sequence changed). + * has done the commit already (commit_sequence changed). * * The polling is justified because this function is only called when deleting * a snapshot or when suspending the origin with postsuspend. These functions * are not performance-critical, thus 1ms delay won't cause a performance * problem. 
*/ - static int dm_multisnap_force_commit(struct dm_multisnap *s) { int err; @@ -481,7 +481,8 @@ static void remap_callback(int read_err, unsigned long write_err, void *pe_) struct dm_multisnap *s = pe->s; if (unlikely((read_err | write_err) != 0)) - DM_MULTISNAP_SET_ERROR(s, -EIO, ("remap_callback: kcopyd I/O error: %d, %lx", read_err, write_err)); + DM_MULTISNAP_SET_ERROR(s, -EIO, ("remap_callback: kcopyd I/O error: " + "%d, %lx", read_err, write_err)); list_add_tail(&pe->list, &s->pes_waiting_for_commit); @@ -508,29 +509,28 @@ static void remap_callback(int read_err, unsigned long write_err, void *pe_) dm_multisnap_call_commit(s); do { - pe = container_of(s->pes_waiting_for_commit.next, struct dm_multisnap_pending_exception, list); + pe = container_of(s->pes_waiting_for_commit.next, + struct dm_multisnap_pending_exception, list); /* * When we are about to free the pending exception, we must - * wait for all reads to the apropriate chunk to - * finish. + * wait for all reads to the appropriate chunk to finish. * * This prevents the following race condition: * - someone reads the chunk in the snapshot with no exception * - that read is remapped directly to the origin, the read * is delayed for some reason - * - someone other writes to the origin, this triggers realloc + * - someone else writes to the origin, this triggers realloc * - the realloc finishes * - the write is dispatched to the origin * - the read submitted first is dispatched and reads modified * data * - * This race is very improbable (non-shared snapshots had this + * This race is very improbable (non-shared snapshots have this * race too and it hasn't ever been reported seen, except in * artifically simulated cases). So we use active waiting with * msleep(1). 
*/ - while (chunk_is_tracked(s, pe->chunk)) msleep(1); @@ -552,7 +552,10 @@ static void remap_callback(int read_err, unsigned long write_err, void *pe_) blk_unplug(bdev_get_queue(s->snapshot->bdev)); } -static void dispatch_kcopyd(struct dm_multisnap *s, struct dm_multisnap_pending_exception *pe, int from_snapshot, chunk_t chunk, struct bio *bio, struct dm_io_region *dests, unsigned n_dests) +static void dispatch_kcopyd(struct dm_multisnap *s, + struct dm_multisnap_pending_exception *pe, + int from_snapshot, chunk_t chunk, struct bio *bio, + struct dm_io_region *dests, unsigned n_dests) { unsigned i; struct dm_io_region src; @@ -565,7 +568,8 @@ static void dispatch_kcopyd(struct dm_multisnap *s, struct dm_multisnap_pending_ src.sector = chunk_to_sector(s, chunk); src.count = s->chunk_size >> SECTOR_SHIFT; - if (likely(!from_snapshot) && unlikely(src.sector + src.count > s->origin_sectors)) { + if (likely(!from_snapshot) && + unlikely(src.sector + src.count > s->origin_sectors)) { if (src.sector >= s->origin_sectors) src.count = 0; else @@ -586,7 +590,6 @@ static void dispatch_kcopyd(struct dm_multisnap *s, struct dm_multisnap_pending_ * Process bio on the origin. * Reads and barriers never go here, they are dispatched directly. 
*/ - static void do_origin_write(struct dm_multisnap *s, struct bio *bio) { int r; @@ -599,11 +602,12 @@ static void do_origin_write(struct dm_multisnap *s, struct bio *bio) BUG_ON(bio_rw(bio) != WRITE); if (bio->bi_sector + (bio->bi_size >> SECTOR_SHIFT) > s->origin_sectors) { - DMERR("do_origin_write: access out of device, flags %lx, sector %llx, size %x, origin sectors %llx", - bio->bi_flags, - (unsigned long long)bio->bi_sector, - bio->bi_size, - (unsigned long long)s->origin_sectors); + DMERR("do_origin_write: access beyond end of device, flags %lx, " + "sector %llx, size %x, origin sectors %llx", + bio->bi_flags, + (unsigned long long)bio->bi_sector, + bio->bi_size, + (unsigned long long)s->origin_sectors); bio_endio(bio, -EIO); return; } @@ -621,7 +625,6 @@ static void do_origin_write(struct dm_multisnap *s, struct bio *bio) if (likely(!r)) { /* There is nothing to remap */ - if (unlikely(check_pending_io(s, bio, chunk, DM_SNAPID_T_ORIGIN))) return; dispatch_write: @@ -638,15 +641,7 @@ dispatch_write: } i = 0; - goto midcycle; for (; i < DM_MULTISNAP_MAX_CHUNKS_TO_REMAP; i++) { - r = s->store->query_next_remap(s->p, chunk); - if (unlikely(r < 0)) - goto free_err_endio; - if (likely(!r)) - break; - -midcycle: s->store->add_next_remap(s->p, &pe->desc[i], &new_chunk); if (unlikely(dm_multisnap_has_error(s))) goto free_err_endio; @@ -654,6 +649,14 @@ midcycle: dests[i].bdev = s->snapshot->bdev; dests[i].sector = chunk_to_sector(s, new_chunk); dests[i].count = s->chunk_size >> SECTOR_SHIFT; + + r = s->store->query_next_remap(s->p, chunk); + if (unlikely(r < 0)) + goto free_err_endio; + if (likely(!r)) { + i++; + break; + } } dispatch_kcopyd(s, pe, 0, chunk, bio, dests, i); @@ -674,7 +677,6 @@ err_endio: * Process bio on the snapshot. * Barriers never go here, they are dispatched directly. 
*/ - static void do_snapshot_io(struct dm_multisnap *s, struct bio *bio, snapid_t id) { chunk_t chunk, result, copy_from; @@ -682,21 +684,21 @@ static void do_snapshot_io(struct dm_multisnap *s, struct bio *bio, snapid_t id) struct dm_multisnap_pending_exception *pe; struct dm_io_region dest; - if (unlikely(!s->store->make_chunk_writeable) && unlikely(bio_rw(bio) == WRITE)) + if (unlikely(!s->store->make_chunk_writeable) && + unlikely(bio_rw(bio) == WRITE)) goto err_endio; if (unlikely(dm_multisnap_has_error(s))) goto err_endio; chunk = sector_to_chunk(s, bio->bi_sector); - r = s->store->find_snapshot_chunk(s->p, id, chunk, bio_rw(bio) == WRITE, &result); + r = s->store->find_snapshot_chunk(s->p, id, chunk, + bio_rw(bio) == WRITE, &result); if (unlikely(r < 0)) goto err_endio; if (!r) { - /* Not found in the snapshot */ - if (likely(bio_rw(bio) != WRITE)) { union map_info *map_context; struct dm_multisnap_tracked_chunk *c; @@ -719,10 +721,9 @@ static void do_snapshot_io(struct dm_multisnap *s, struct bio *bio, snapid_t id) * added to tracked_chunk_hash, the bio must be finished * and removed from the hash without taking master_lock. * - * So we add it immediatelly before submitting the bio + * So we add it immediately before submitting the bio * with generic_make_request. */ - bio->bi_bdev = s->origin->bdev; map_context = dm_get_mapinfo(bio); @@ -750,9 +751,7 @@ static void do_snapshot_io(struct dm_multisnap *s, struct bio *bio, snapid_t id) return; } } else { - /* Found in the snapshot */ - if (unlikely(check_pending_io(s, bio, chunk, id))) return; @@ -802,7 +801,6 @@ failed_pe_allocation: * from other places (for example kcopyd callback), assuming that the caller * holds master_lock. 
*/ - static void dm_multisnap_process_bios(struct dm_multisnap *s) { struct bio *bio; @@ -812,7 +810,9 @@ again: cond_resched(); if (!list_empty(&s->background_works)) { - struct dm_multisnap_background_work *bw = list_entry(s->background_works.next, struct dm_multisnap_background_work, list); + struct dm_multisnap_background_work *bw = + list_entry(s->background_works.next, + struct dm_multisnap_background_work, list); list_del(&bw->list); bw->queued = 0; bw->work(s->p, bw); @@ -821,7 +821,6 @@ again: } bio = dm_multisnap_dequeue_bio(s); - if (unlikely(!bio)) return; @@ -846,7 +845,8 @@ again: * master lock held. */ -void dm_multisnap_queue_work(struct dm_multisnap *s, struct dm_multisnap_background_work *bw) +void dm_multisnap_queue_work(struct dm_multisnap *s, + struct dm_multisnap_background_work *bw) { dm_multisnap_assert_locked(s); @@ -861,7 +861,8 @@ void dm_multisnap_queue_work(struct dm_multisnap *s, struct dm_multisnap_backgro } EXPORT_SYMBOL(dm_multisnap_queue_work); -void dm_multisnap_cancel_work(struct dm_multisnap *s, struct dm_multisnap_background_work *bw) +void dm_multisnap_cancel_work(struct dm_multisnap *s, + struct dm_multisnap_background_work *bw) { dm_multisnap_assert_locked(s); @@ -876,7 +877,6 @@ EXPORT_SYMBOL(dm_multisnap_cancel_work); /* * The main work thread. */ - static void dm_multisnap_work(struct work_struct *work) { struct dm_multisnap *s = container_of(work, struct dm_multisnap, work); @@ -886,7 +886,7 @@ static void dm_multisnap_work(struct work_struct *work) dm_multisnap_unlock(s); /* - * If there was some mempool allocation failure, we must fail, outside + * If there was some mempool allocation failure we must wait, outside * the lock, until there is some free memory. * If this branch is taken, the work is already queued again, so it * reexecutes after finding some memory. 
@@ -914,7 +914,8 @@ static struct dm_multisnap *find_multisnapshot(struct block_device *origin) static DEFINE_MUTEX(exception_stores_lock); static LIST_HEAD(all_exception_stores); -static struct dm_multisnap_exception_store *dm_multisnap_find_exception_store(const char *name) +static struct dm_multisnap_exception_store * +dm_multisnap_find_exception_store(const char *name) { struct dm_multisnap_exception_store *store; @@ -965,7 +966,8 @@ void dm_multisnap_unregister_exception_store(struct dm_multisnap_exception_store } EXPORT_SYMBOL(dm_multisnap_unregister_exception_store); -static struct dm_multisnap_exception_store *dm_multisnap_get_exception_store(const char *name) +static struct dm_multisnap_exception_store * +dm_multisnap_get_exception_store(const char *name) { struct dm_multisnap_exception_store *store; @@ -994,7 +996,8 @@ static void dm_multisnap_put_exception_store(struct dm_multisnap_exception_store /* --- argument parser --- */ -int dm_multisnap_get_string(char ***argv, unsigned *argc, char **string, char **error) +int dm_multisnap_get_string(char ***argv, unsigned *argc, + char **string, char **error) { if (!*argc) { *error = "Not enough arguments"; @@ -1006,7 +1009,8 @@ int dm_multisnap_get_string(char ***argv, unsigned *argc, char **string, char ** } EXPORT_SYMBOL(dm_multisnap_get_string); -int dm_multisnap_get_uint64(char ***argv, unsigned *argc, __u64 *unsigned_int64, char **error) +int dm_multisnap_get_uint64(char ***argv, unsigned *argc, + __u64 *unsigned_int64, char **error) { char *string; int r = dm_multisnap_get_string(argv, argc, &string, error); @@ -1024,7 +1028,8 @@ invalid_number: } EXPORT_SYMBOL(dm_multisnap_get_uint64); -int dm_multisnap_get_uint(char ***argv, unsigned *argc, unsigned *unsigned_int, char **error) +int dm_multisnap_get_uint(char ***argv, unsigned *argc, + unsigned *unsigned_int, char **error) { __u64 unsigned_int64; int r = dm_multisnap_get_uint64(argv, argc, &unsigned_int64, error); @@ -1039,7 +1044,8 @@ int 
dm_multisnap_get_uint(char ***argv, unsigned *argc, unsigned *unsigned_int, } EXPORT_SYMBOL(dm_multisnap_get_uint); -int dm_multisnap_get_argcount(char ***argv, unsigned *argc, unsigned *unsigned_int, char **error) +int dm_multisnap_get_argcount(char ***argv, unsigned *argc, + unsigned *unsigned_int, char **error) { int r = dm_multisnap_get_uint(argv, argc, unsigned_int, error); if (r) @@ -1138,10 +1144,10 @@ static int multisnap_origin_ctr(struct dm_target *ti, unsigned argc, char **argv if (r) goto bad_generic_arguments; - /* Synchronize snapshot list against a list given in the target table */ + /* Synchronize snapshot list against the list given in the target table */ if (!strcasecmp(arg, "sync-snapshots")) s->flags |= DM_MULTISNAP_SYNC_SNAPSHOTS; - /* Don't drop the snapshot store on error, rather stop the origin */ + /* Don't drop the snapshot store on error, rather stop the origin */ else if (!strcasecmp(arg, "preserve-on-error")) s->flags |= DM_MULTISNAP_PRESERVE_ON_ERROR; else { @@ -1151,14 +1157,16 @@ static int multisnap_origin_ctr(struct dm_target *ti, unsigned argc, char **argv } } - r = dm_get_device(ti, origin_path, 0, 0, FMODE_READ | FMODE_WRITE, &s->origin); + r = dm_get_device(ti, origin_path, 0, 0, + FMODE_READ | FMODE_WRITE, &s->origin); if (r) { ti->error = "Could not get origin device"; goto bad_origin; } s->origin_sectors = i_size_read(s->origin->bdev->bd_inode) >> SECTOR_SHIFT; - r = dm_get_device(ti, snapshot_path, 0, 0, FMODE_READ | FMODE_WRITE, &s->snapshot); + r = dm_get_device(ti, snapshot_path, 0, 0, + FMODE_READ | FMODE_WRITE, &s->snapshot); if (r) { ti->error = "Could not get snapshot device"; goto bad_snapshot; @@ -1169,14 +1177,13 @@ static int multisnap_origin_ctr(struct dm_target *ti, unsigned argc, char **argv * * Currently, multisnapshot target is loaded just once, there is no * place where it would be reloaded (even lvchange --refresh doesn't - * do it), so there is no need to handle loading the target multiple + * do it). 
So there is no need to handle loading the target multiple * times for the same devices and "handover" of the exception store. * * As a safeguard to protect against possible data corruption from * userspace misbehavior, we check that there is no other target loaded * that has the origin or the snapshot store on the same devices. */ - list_for_each_entry(ss, &all_multisnapshots, list_all) if (ss->origin->bdev == s->origin->bdev || ss->snapshot->bdev == s->snapshot->bdev) { @@ -1186,7 +1193,6 @@ static int multisnap_origin_ctr(struct dm_target *ti, unsigned argc, char **argv } /* Validate the chunk size */ - if (chunk_size > INT_MAX / 512) { ti->error = "Chunk size is too high"; r = -EINVAL; @@ -1207,14 +1213,16 @@ static int multisnap_origin_ctr(struct dm_target *ti, unsigned argc, char **argv s->chunk_size = chunk_size; s->chunk_shift = ffs(chunk_size) - 1; - s->pending_pool = mempool_create_slab_pool(DM_PENDING_MEMPOOL_SIZE, pending_exception_cache); + s->pending_pool = mempool_create_slab_pool(DM_PENDING_MEMPOOL_SIZE, + pending_exception_cache); if (!s->pending_pool) { ti->error = "Could not allocate mempool for pending exceptions"; r = -ENOMEM; goto bad_pending_pool; } - s->tracked_chunk_pool = mempool_create_slab_pool(DM_TRACKED_CHUNK_POOL_SIZE, tracked_chunk_cache); + s->tracked_chunk_pool = mempool_create_slab_pool(DM_TRACKED_CHUNK_POOL_SIZE, + tracked_chunk_cache); if (!s->tracked_chunk_pool) { ti->error = "Could not allocate tracked_chunk mempool for tracking reads"; goto bad_tracked_chunk_pool; @@ -1306,7 +1314,7 @@ static int multisnap_origin_ctr(struct dm_target *ti, unsigned argc, char **argv if (!dm_multisnap_has_error(s)) { r = s->store->delete_snapshot(s->p, sn); if (r && s->flags & DM_MULTISNAP_PRESERVE_ON_ERROR) { - ti->error = "Can't delete snapshot"; + ti->error = "Could not delete snapshot"; vfree(snapids); goto error_syncing_snapshots; } @@ -1322,16 +1330,16 @@ static int multisnap_origin_ctr(struct dm_target *ti, unsigned argc, char **argv } } 
delete_done: - /* Create the snapshots that should be there */ if (s->store->compare_snapids_for_create) - sort(snapids, num_snapshots, sizeof(snapid_t), s->store->compare_snapids_for_create, NULL); + sort(snapids, num_snapshots, sizeof(snapid_t), + s->store->compare_snapids_for_create, NULL); for (n = 0; n < num_snapshots; n++) { if (!dm_multisnap_snapshot_exists(s, snapids[n])) { if (!dm_multisnap_has_error(s)) { r = s->store->create_snapshot(s->p, snapids[n]); if (r && s->flags & DM_MULTISNAP_PRESERVE_ON_ERROR) { - ti->error = "Can't create snapshot"; + ti->error = "Could not create snapshot"; vfree(snapids); goto error_syncing_snapshots; } @@ -1385,7 +1393,7 @@ static void multisnap_origin_dtr(struct dm_target *ti) mutex_lock(&all_multisnapshots_lock); - /* Make sure that any more IOs won't be submitted by snapshot targets */ + /* Make sure that no more IOs will be submitted by snapshot targets */ list_for_each_entry(sn, &s->all_snaps, list_snaps) { spin_lock_irq(&dm_multisnap_bio_list_lock); sn->s = NULL; @@ -1411,7 +1419,7 @@ poll_for_ios: } spin_unlock_irq(&dm_multisnap_bio_list_lock); - /* Bug-check that there are really no IOs */ + /* Make sure that there really are no outstanding IOs */ for (i = 0; i < DM_MULTISNAP_N_QUEUES; i++) BUG_ON(!bio_list_empty(&s->queue[i].bios)); for (i = 0; i < DM_TRACKED_CHUNK_HASH_SIZE; i++) @@ -1450,7 +1458,8 @@ poll_for_ios: mutex_unlock(&all_multisnapshots_lock); } -static int multisnap_origin_map(struct dm_target *ti, struct bio *bio, union map_info *map_context) +static int multisnap_origin_map(struct dm_target *ti, struct bio *bio, + union map_info *map_context) { struct dm_multisnap *s = ti->private; @@ -1471,7 +1480,8 @@ static int multisnap_origin_map(struct dm_target *ti, struct bio *bio, union map return DM_MAPIO_SUBMITTED; } -static int multisnap_origin_message(struct dm_target *ti, unsigned argc, char **argv) +static int multisnap_origin_message(struct dm_target *ti, + unsigned argc, char **argv) { struct
dm_multisnap *s = ti->private; char *error; @@ -1514,7 +1524,8 @@ create_snapshot: s->new_snapid_valid = 0; dm_multisnap_status_unlock(s); - r = s->store->allocate_snapid(s->p, &s->new_snapid, subsnap, subsnap_id); + r = s->store->allocate_snapid(s->p, &s->new_snapid, + subsnap, subsnap_id); if (r) goto unlock_ret; @@ -1602,7 +1613,6 @@ unlock2_ret: } /* Print used snapshot IDs into a supplied string */ - static void print_snapshot_ids(struct dm_multisnap *s, char *result, unsigned maxlen) { snapid_t nsnap = 0; @@ -1621,14 +1631,15 @@ static void print_snapshot_ids(struct dm_multisnap *s, char *result, unsigned ma } } -static int multisnap_origin_status(struct dm_target *ti, status_type_t type, char *result, unsigned maxlen) +static int multisnap_origin_status(struct dm_target *ti, status_type_t type, + char *result, unsigned maxlen) { struct dm_multisnap *s = ti->private; /* * Use a special status lock, so that this code can execute even * when the underlying device is suspended and there is no possibility - * to optain the master lock. + * to obtain the master lock. */ dm_multisnap_status_lock(s); @@ -1695,13 +1706,11 @@ static int multisnap_origin_status(struct dm_target *ti, status_type_t type, cha * In postsuspend, we optionally create a snapshot that we prepared with * a message. */ - static void multisnap_origin_postsuspend(struct dm_target *ti) { struct dm_multisnap *s = ti->private; dm_multisnap_lock(s); - if (s->new_snapid_valid && !dm_multisnap_has_error(s)) { /* * No way to return the error code, but it is recorded @@ -1710,7 +1719,6 @@ static void multisnap_origin_postsuspend(struct dm_target *ti) s->store->create_snapshot(s->p, s->new_snapid); s->new_snapid_valid = 0; } - dm_multisnap_unlock(s); dm_multisnap_force_commit(s); @@ -1820,12 +1828,12 @@ static void multisnap_snap_dtr(struct dm_target *ti) /* * Each snapshot I/O is counted in n_tracked_ios in the origin and - * has struct dm_multisnap_tracked_chunk allocated. 
- * dm_multisnap_tracked_chunk->node can be optionally linked into origin's hash - * of tracked I/Os. + * has 'struct dm_multisnap_tracked_chunk' allocated. + * dm_multisnap_tracked_chunk->node can be optionally linked into + * origin's hash of tracked I/Os. */ - -static int multisnap_snap_map(struct dm_target *ti, struct bio *bio, union map_info *map_context) +static int multisnap_snap_map(struct dm_target *ti, struct bio *bio, + union map_info *map_context) { struct dm_multisnap_snap *sn = ti->private; struct dm_multisnap *s; @@ -1839,10 +1847,10 @@ static int multisnap_snap_map(struct dm_target *ti, struct bio *bio, union map_i spin_unlock_irq(&dm_multisnap_bio_list_lock); return -EIO; } - /* - * make sure that the origin is not unloaded under us while - * we drop the lock - */ + /* + * make sure that the origin is not unloaded under us while + * we drop the lock + */ s->n_tracked_ios++; c = mempool_alloc(s->tracked_chunk_pool, GFP_ATOMIC); @@ -1871,7 +1879,8 @@ static int multisnap_snap_map(struct dm_target *ti, struct bio *bio, union map_i return DM_MAPIO_SUBMITTED; } -static int multisnap_snap_end_io(struct dm_target *ti, struct bio *bio, int error, union map_info *map_context) +static int multisnap_snap_end_io(struct dm_target *ti, struct bio *bio, + int error, union map_info *map_context) { struct dm_multisnap_tracked_chunk *c = map_context->ptr; struct dm_multisnap *s = c->s; @@ -1889,7 +1898,8 @@ static int multisnap_snap_end_io(struct dm_target *ti, struct bio *bio, int erro return 0; } -static int multisnap_snap_status(struct dm_target *ti, status_type_t type, char *result, unsigned maxlen) +static int multisnap_snap_status(struct dm_target *ti, status_type_t type, + char *result, unsigned maxlen) { struct dm_multisnap_snap *sn = ti->private; @@ -1901,7 +1911,8 @@ static int multisnap_snap_status(struct dm_target *ti, status_type_t type, char dm_multisnap_adjust_string(&result, &maxlen); break; case STATUSTYPE_TABLE: - snprintf(result, maxlen, "%s %s", 
sn->origin_name, sn->snapid_string); + snprintf(result, maxlen, "%s %s", + sn->origin_name, sn->snapid_string); dm_multisnap_adjust_string(&result, &maxlen); break; } diff --git a/drivers/md/dm-multisnap.h b/drivers/md/dm-multisnap.h index ff7f844..0af87dd 100644 --- a/drivers/md/dm-multisnap.h +++ b/drivers/md/dm-multisnap.h @@ -49,24 +49,30 @@ struct dm_multisnap_exception_store { const char *name; /* < 0 - error */ - int (*init_exception_store)(struct dm_multisnap *dm, struct dm_exception_store **s, unsigned argc, char **argv, char **error); + int (*init_exception_store)(struct dm_multisnap *dm, struct dm_exception_store **s, + unsigned argc, char **argv, char **error); void (*exit_exception_store)(struct dm_exception_store *s); void (*store_lock_acquired)(struct dm_exception_store *s, int flags); /* These two can override format of snapids in the table. Can be NULL */ - void (*print_snapid)(struct dm_exception_store *s, char *string, unsigned maxlen, snapid_t snapid); - int (*read_snapid)(struct dm_exception_store *s, char *string, snapid_t *snapid, char **error); + void (*print_snapid)(struct dm_exception_store *s, char *string, + unsigned maxlen, snapid_t snapid); + int (*read_snapid)(struct dm_exception_store *s, char *string, + snapid_t *snapid, char **error); /* return the exception-store specific table arguments */ void (*status_table)(struct dm_exception_store *s, char *result, unsigned maxlen); /* return the space */ - void (*get_space)(struct dm_exception_store *s, unsigned long long *chunks_total, unsigned long long *chunks_allocated, unsigned long long *chunks_metadata_allocated); + void (*get_space)(struct dm_exception_store *s, unsigned long long *chunks_total, + unsigned long long *chunks_allocated, + unsigned long long *chunks_metadata_allocated); /* < 0 - error */ - int (*allocate_snapid)(struct dm_exception_store *s, snapid_t *snapid, int snap_of_snap, snapid_t master); + int (*allocate_snapid)(struct dm_exception_store *s, snapid_t *snapid, + 
int snap_of_snap, snapid_t master); /* < 0 - error */ int (*create_snapshot)(struct dm_exception_store *s, snapid_t snapid); @@ -87,7 +93,8 @@ struct dm_multisnap_exception_store { int (*compare_snapids_for_create)(const void *p1, const void *p2); /* 0 - not found, 1 - found (read-only), 2 - found (writeable), < 0 - error */ - int (*find_snapshot_chunk)(struct dm_exception_store *s, snapid_t id, chunk_t chunk, int write, chunk_t *result); + int (*find_snapshot_chunk)(struct dm_exception_store *s, snapid_t snapid, + chunk_t chunk, int write, chunk_t *result); /* * Chunk interface between exception store and generic code. @@ -108,11 +115,14 @@ struct dm_multisnap_exception_store { void (*reset_query)(struct dm_exception_store *s); int (*query_next_remap)(struct dm_exception_store *s, chunk_t chunk); - void (*add_next_remap)(struct dm_exception_store *s, union chunk_descriptor *cd, chunk_t *new_chunk); + void (*add_next_remap)(struct dm_exception_store *s, + union chunk_descriptor *cd, chunk_t *new_chunk); /* may be NULL if writeable snapshots are not supported */ - void (*make_chunk_writeable)(struct dm_exception_store *s, union chunk_descriptor *cd, chunk_t *new_chunk); - int (*check_conflict)(struct dm_exception_store *s, union chunk_descriptor *cd, snapid_t snapid); + void (*make_chunk_writeable)(struct dm_exception_store *s, + union chunk_descriptor *cd, chunk_t *new_chunk); + int (*check_conflict)(struct dm_exception_store *s, + union chunk_descriptor *cd, snapid_t snapid); /* This is called without the lock, prior to commit */ void (*prepare_for_commit)(struct dm_exception_store *s); @@ -142,19 +152,28 @@ void dm_multisnap_status_lock(struct dm_multisnap *s); void dm_multisnap_status_unlock(struct dm_multisnap *s); void dm_multisnap_status_assert_locked(struct dm_multisnap *s); -/* Commit. dm_multisnap_call_commit can be called only if dm_multisnap_can_commit returns true */ +/* + * Commit. 
dm_multisnap_call_commit can only be called + * if dm_multisnap_can_commit returns true + */ int dm_multisnap_can_commit(struct dm_multisnap *s); void dm_multisnap_call_commit(struct dm_multisnap *s); /* Delayed work for delete/merge */ -void dm_multisnap_queue_work(struct dm_multisnap *s, struct dm_multisnap_background_work *bw); -void dm_multisnap_cancel_work(struct dm_multisnap *s, struct dm_multisnap_background_work *bw); +void dm_multisnap_queue_work(struct dm_multisnap *s, + struct dm_multisnap_background_work *bw); +void dm_multisnap_cancel_work(struct dm_multisnap *s, + struct dm_multisnap_background_work *bw); /* Parsing command line */ -int dm_multisnap_get_string(char ***argv, unsigned *argc, char **string, char **error); -int dm_multisnap_get_uint64(char ***argv, unsigned *argc, __u64 *unsigned_int64, char **error); -int dm_multisnap_get_uint(char ***argv, unsigned *argc, unsigned *unsigned_int, char **error); -int dm_multisnap_get_argcount(char ***argv, unsigned *argc, unsigned *unsigned_int, char **error); +int dm_multisnap_get_string(char ***argv, unsigned *argc, + char **string, char **error); +int dm_multisnap_get_uint64(char ***argv, unsigned *argc, + __u64 *unsigned_int64, char **error); +int dm_multisnap_get_uint(char ***argv, unsigned *argc, + unsigned *unsigned_int, char **error); +int dm_multisnap_get_argcount(char ***argv, unsigned *argc, + unsigned *unsigned_int, char **error); void dm_multisnap_adjust_string(char **result, unsigned *maxlen); /* Register/unregister the exception store driver */ ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 00/14] mikulas' shared snapshot patches
2010-03-02 0:23 [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer
` (14 preceding siblings ...)
2010-03-02 0:32 ` [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer
@ 2010-03-02 14:56 ` Mike Snitzer
15 siblings, 0 replies; 22+ messages in thread
From: Mike Snitzer @ 2010-03-02 14:56 UTC (permalink / raw)
To: Mikulas Patocka; +Cc: dm-devel

On Mon, Mar 01 2010 at 7:23pm -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> Mikulas,
>
> This is just the full submit of your shared snapshot patches from:
> http://people.redhat.com/mpatocka/patches/kernel/new-snapshots/r15/
>
> I think the next phase of review should possibly be driven through the
> dm-devel mailing list. I'd at least like the option of exchanging
> mail on aspects of some of these patches.
>
> The first patch has one small cleanup in do_origin_write(): I
> eliminated the 'midcycle' goto.
>
> But the primary difference with this submission (when compared to your
> r15 patches) is I edited the patches for whitespace and typos.

Mikulas,

I've made these patches available for download here:
http://people.redhat.com/msnitzer/patches/multisnap/kernel/2.6.33/

I'd like to hand these DM patches, and the lvm2 patches, over to you so
we don't get out of sync.

The lvm2 patches were rebased and fixed to work with lvm2 2.02.62:
http://people.redhat.com/msnitzer/patches/multisnap/lvm2/LVM2-2.02.62/

I've not heard from you on either line of work that I did here. I
welcome your feedback.

Mike
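On the do_origin_write() cleanup mentioned in the quoted text: the r15 code entered its loop with 'goto midcycle', so that add_next_remap() ran before the first query_next_remap(); the edited version reorders the loop body and increments i before the break. The user-space sketch below models both loop shapes so the returned destination count can be compared. Everything in it (struct fake_store, its counters, the cap of 4) is a hypothetical stand-in rather than the kernel code, and the error paths are left out.

```c
#include <assert.h>

/* Stand-in for DM_MULTISNAP_MAX_CHUNKS_TO_REMAP; the value is assumed. */
#define MAX_CHUNKS_TO_REMAP 4

/* Hypothetical model of an exception store, just enough to count calls. */
struct fake_store {
	int more;	/* further remaps that query_next_remap() will report */
	int adds;	/* number of add_next_remap() calls */
	int queries;	/* number of query_next_remap() calls */
};

static int query_next_remap(struct fake_store *st)
{
	st->queries++;
	if (st->more > 0) {
		st->more--;
		return 1;
	}
	return 0;
}

static void add_next_remap(struct fake_store *st)
{
	st->adds++;
}

/* Shape of the r15 loop: jump into the middle of the loop body. */
static int remap_with_goto(struct fake_store *st)
{
	int i = 0;

	goto midcycle;
	for (; i < MAX_CHUNKS_TO_REMAP; i++) {
		if (!query_next_remap(st))
			break;
midcycle:
		add_next_remap(st);
	}
	return i;
}

/* Shape of the edited loop: add first, query afterwards, no goto. */
static int remap_without_goto(struct fake_store *st)
{
	int i;

	for (i = 0; i < MAX_CHUNKS_TO_REMAP; i++) {
		add_next_remap(st);
		if (!query_next_remap(st)) {
			i++;	/* count the entry that was just added */
			break;
		}
	}
	return i;
}

/* Both shapes should hand the same number of destinations to the caller. */
static void check_same_count(int more)
{
	struct fake_store a = { more, 0, 0 }, b = { more, 0, 0 };

	assert(remap_with_goto(&a) == remap_without_goto(&b));
	assert(a.adds == b.adds);
}
```

In this model both shapes return the same count and call add_next_remap() equally often. The one visible difference is that, when the cap is reached, the goto-free shape issues one more query_next_remap() than the original; whether that matters for a real exception store is outside what this sketch can show.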
end of thread, other threads:[~2010-03-09 3:30 UTC | newest]

Thread overview: 22+ messages
2010-03-02 0:23 [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer
2010-03-02 0:23 ` [PATCH 01/14] dm-multisnap-common Mike Snitzer
2010-03-02 0:23 ` [PATCH 02/14] dm-bufio Mike Snitzer
2010-03-02 0:23 ` [PATCH 03/14] dm-multisnap-mikulas-headers Mike Snitzer
2010-03-05 22:46 ` Mike Snitzer
2010-03-06 1:54 ` Mike Snitzer
2010-03-09 3:08 ` Mikulas Patocka
2010-03-09 3:30 ` Mike Snitzer
2010-03-02 0:23 ` [PATCH 04/14] dm-multisnap-mikulas-alloc Mike Snitzer
2010-03-02 0:23 ` [PATCH 05/14] dm-multisnap-mikulas-blocks Mike Snitzer
2010-03-02 0:23 ` [PATCH 06/14] dm-multisnap-mikulas-btree Mike Snitzer
2010-03-02 0:23 ` [PATCH 07/14] dm-multisnap-mikulas-commit Mike Snitzer
2010-03-02 0:23 ` [PATCH 08/14] dm-multisnap-mikulas-delete Mike Snitzer
2010-03-02 0:23 ` [PATCH 09/14] dm-multisnap-mikulas-freelist Mike Snitzer
2010-03-02 0:23 ` [PATCH 10/14] dm-multisnap-mikulas-io Mike Snitzer
2010-03-02 0:23 ` [PATCH 11/14] dm-multisnap-mikulas-snaps Mike Snitzer
2010-03-02 0:23 ` [PATCH 12/14] dm-multisnap-mikulas-common Mike Snitzer
2010-03-02 0:23 ` [PATCH 13/14] dm-multisnap-mikulas-config Mike Snitzer
2010-03-02 0:23 ` [PATCH 14/14] dm-multisnap-daniel Mike Snitzer
2010-03-02 0:57 ` FUJITA Tomonori
2010-03-02 0:32 ` [PATCH 00/14] mikulas' shared snapshot patches Mike Snitzer
2010-03-02 14:56 ` Mike Snitzer