* [V2 PATCH 00/13] The latest patches for md-cluster
@ 2016-05-02 15:33 Guoqing Jiang
2016-05-02 15:33 ` [V2 PATCH 01/13] md-cluster: change resync lock from asynchronous to synchronous Guoqing Jiang
` (8 more replies)
0 siblings, 9 replies; 15+ messages in thread
From: Guoqing Jiang @ 2016-05-02 15:33 UTC (permalink / raw)
To: shli; +Cc: neilb, linux-raid, Guoqing Jiang
Changes:
1. delete no_hijack parameter from bitmap_get_counter
2. add one patch from kbuild test robot to remove checkpatch warning
3. md-set-MD_CHANGE_PENDING-in-a-spinlocked-region.patch is removed
from the patchset and will be posted later.
For cluster raid1, we found some issues and some code that needed
improvement over the past months. All the patches are based on the
for-next branch of the md tree.
The patchset is also available on GitHub:
https://github.com/GuoqingJiang/linux/tree/md-for-next
Thanks,
Guoqing
Guoqing Jiang (12):
md-cluster: change resync lock from asynchronous to synchronous
md-cluser: make resync_finish only called after pers->sync_request
md-cluster: wake up thread to continue recovery
md-cluster: unregister thread if err happened
md-cluster: fix locking when node joins cluster during message
broadcast
md-cluster: change array_sectors and update size are not supported
md-cluster: wakeup thread if activated a spare disk
md-cluster: always setup in-memory bitmap
md-cluster: sync bitmap when node received RESYNCING msg
md-cluster/bitmap: fix wrong calcuation of offset
md-cluster/bitmap: fix wrong page num in bitmap_file_clear_bit and
bitmap_file_set_bit
md-cluster/bitmap: unplug bitmap to sync dirty pages to disk
kbuild test robot (1):
md-cluster: fix ifnullfree.cocci warnings
Documentation/md-cluster.txt | 6 ++++
drivers/md/bitmap.c | 85 ++++++++++++++++++++++++++++++++++++++------
drivers/md/bitmap.h | 3 ++
drivers/md/md-cluster.c | 53 ++++++++++++++++++++++-----
drivers/md/md.c | 61 ++++++++++++++++++++-----------
5 files changed, 168 insertions(+), 40 deletions(-)
--
2.6.6
* [V2 PATCH 01/13] md-cluster: change resync lock from asynchronous to synchronous
2016-05-02 15:33 [V2 PATCH 00/13] The latest patches for md-cluster Guoqing Jiang
@ 2016-05-02 15:33 ` Guoqing Jiang
2016-05-02 15:33 ` [V2 PATCH 02/13] md-cluser: make resync_finish only called after pers->sync_request Guoqing Jiang
` (7 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Guoqing Jiang @ 2016-05-02 15:33 UTC (permalink / raw)
To: shli; +Cc: neilb, linux-raid, Guoqing Jiang
If multiple nodes choose to attempt resync at the same time
they need to be serialized so they don't duplicate effort. This
serialization is done by locking the 'resync' DLM lock.
Currently if a node cannot get the lock immediately it doesn't
request notification when the lock becomes available (i.e.
DLM_LKF_NOQUEUE is set), so it may not reliably find out when it
is safe to try again.
Rather than trying to arrange an async wake-up when the lock
becomes available, switch to using synchronous locking - this is
a lot easier to think about. As it is not permitted to block in
the 'raid1d' thread, move the locking to the resync thread. So
the resync thread is forked immediately, but it blocks until the
resync lock is available. Once the lock is acquired it checks again
if any resync action is needed.
A particular symptom of the current problem is that a node can
get stuck with "resync=pending" indefinitely.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
drivers/md/md-cluster.c | 2 --
drivers/md/md.c | 23 ++++++++++++++---------
2 files changed, 14 insertions(+), 11 deletions(-)
diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index dd97d42..12fbfec 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -937,7 +937,6 @@ static void metadata_update_cancel(struct mddev *mddev)
static int resync_start(struct mddev *mddev)
{
struct md_cluster_info *cinfo = mddev->cluster_info;
- cinfo->resync_lockres->flags |= DLM_LKF_NOQUEUE;
return dlm_lock_sync(cinfo->resync_lockres, DLM_LOCK_EX);
}
@@ -967,7 +966,6 @@ static int resync_info_update(struct mddev *mddev, sector_t lo, sector_t hi)
static int resync_finish(struct mddev *mddev)
{
struct md_cluster_info *cinfo = mddev->cluster_info;
- cinfo->resync_lockres->flags &= ~DLM_LKF_NOQUEUE;
dlm_unlock_sync(cinfo->resync_lockres);
return resync_info_update(mddev, 0, 0);
}
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 14d3b37..4fd7d77 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7786,6 +7786,7 @@ void md_do_sync(struct md_thread *thread)
char *desc, *action = NULL;
struct blk_plug plug;
bool cluster_resync_finished = false;
+ int ret;
/* just incase thread restarts... */
if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
@@ -7795,6 +7796,19 @@ void md_do_sync(struct md_thread *thread)
return;
}
+ if (mddev_is_clustered(mddev)) {
+ ret = md_cluster_ops->resync_start(mddev);
+ if (ret)
+ goto skip;
+
+ if (!(test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ||
+ test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) ||
+ test_bit(MD_RECOVERY_RECOVER, &mddev->recovery))
+ && ((unsigned long long)mddev->curr_resync_completed
+ < (unsigned long long)mddev->resync_max_sectors))
+ goto skip;
+ }
+
if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
if (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
desc = "data-check";
@@ -8226,18 +8240,9 @@ static void md_start_sync(struct work_struct *ws)
struct mddev *mddev = container_of(ws, struct mddev, del_work);
int ret = 0;
- if (mddev_is_clustered(mddev)) {
- ret = md_cluster_ops->resync_start(mddev);
- if (ret) {
- mddev->sync_thread = NULL;
- goto out;
- }
- }
-
mddev->sync_thread = md_register_thread(md_do_sync,
mddev,
"resync");
-out:
if (!mddev->sync_thread) {
if (!(mddev_is_clustered(mddev) && ret == -EAGAIN))
printk(KERN_ERR "%s: could not start resync"
--
2.6.6
* [V2 PATCH 02/13] md-cluser: make resync_finish only called after pers->sync_request
2016-05-02 15:33 [V2 PATCH 00/13] The latest patches for md-cluster Guoqing Jiang
2016-05-02 15:33 ` [V2 PATCH 01/13] md-cluster: change resync lock from asynchronous to synchronous Guoqing Jiang
@ 2016-05-02 15:33 ` Guoqing Jiang
2016-05-02 15:33 ` [V2 PATCH 03/13] md-cluster: wake up thread to continue recovery Guoqing Jiang
` (6 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Guoqing Jiang @ 2016-05-02 15:33 UTC (permalink / raw)
To: shli; +Cc: neilb, linux-raid, Guoqing Jiang
It is not reasonable for cluster raid to release the resync
lock before the last pers->sync_request has finished.
As the metadata will be changed when a node performs resync,
we need to inform other nodes to update their metadata, so the
MD_CHANGE_PENDING flag is set before finishing resync.
metadata_update_finish is therefore moved earlier to ensure that
the METADATA_UPDATED msg is sent before resync finishes, and
metadata_update_start needs to run after the "repeat:" label
accordingly.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
drivers/md/md.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4fd7d77..dd83a50 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2291,6 +2291,7 @@ void md_update_sb(struct mddev *mddev, int force_change)
return;
}
+repeat:
if (mddev_is_clustered(mddev)) {
if (test_and_clear_bit(MD_CHANGE_DEVS, &mddev->flags))
force_change = 1;
@@ -2303,7 +2304,7 @@ void md_update_sb(struct mddev *mddev, int force_change)
return;
}
}
-repeat:
+
/* First make sure individual recovery_offsets are correct */
rdev_for_each(rdev, mddev) {
if (rdev->raid_disk >= 0 &&
@@ -2430,6 +2431,9 @@ repeat:
md_super_wait(mddev);
/* if there was a failure, MD_CHANGE_DEVS was set, and we re-write super */
+ if (mddev_is_clustered(mddev) && ret == 0)
+ md_cluster_ops->metadata_update_finish(mddev);
+
spin_lock(&mddev->lock);
if (mddev->in_sync != sync_req ||
test_bit(MD_CHANGE_DEVS, &mddev->flags)) {
@@ -2452,9 +2456,6 @@ repeat:
clear_bit(BlockedBadBlocks, &rdev->flags);
wake_up(&rdev->blocked_wait);
}
-
- if (mddev_is_clustered(mddev) && ret == 0)
- md_cluster_ops->metadata_update_finish(mddev);
}
EXPORT_SYMBOL(md_update_sb);
@@ -7785,7 +7786,6 @@ void md_do_sync(struct md_thread *thread)
struct md_rdev *rdev;
char *desc, *action = NULL;
struct blk_plug plug;
- bool cluster_resync_finished = false;
int ret;
/* just incase thread restarts... */
@@ -8103,11 +8103,6 @@ void md_do_sync(struct md_thread *thread)
mddev->curr_resync_completed = mddev->curr_resync;
sysfs_notify(&mddev->kobj, NULL, "sync_completed");
}
- /* tell personality and other nodes that we are finished */
- if (mddev_is_clustered(mddev)) {
- md_cluster_ops->resync_finish(mddev);
- cluster_resync_finished = true;
- }
mddev->pers->sync_request(mddev, max_sectors, &skipped);
if (!test_bit(MD_RECOVERY_CHECK, &mddev->recovery) &&
@@ -8147,9 +8142,15 @@ void md_do_sync(struct md_thread *thread)
set_bit(MD_CHANGE_DEVS, &mddev->flags);
if (mddev_is_clustered(mddev) &&
- test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
- !cluster_resync_finished)
+ ret == 0) {
+ /* set CHANGE_PENDING here since maybe another
+ * update is needed, so other nodes are informed */
+ set_bit(MD_CHANGE_PENDING, &mddev->flags);
+ md_wakeup_thread(mddev->thread);
+ wait_event(mddev->sb_wait,
+ !test_bit(MD_CHANGE_PENDING, &mddev->flags));
md_cluster_ops->resync_finish(mddev);
+ }
spin_lock(&mddev->lock);
if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
--
2.6.6
* [V2 PATCH 03/13] md-cluster: wake up thread to continue recovery
2016-05-02 15:33 [V2 PATCH 00/13] The latest patches for md-cluster Guoqing Jiang
2016-05-02 15:33 ` [V2 PATCH 01/13] md-cluster: change resync lock from asynchronous to synchronous Guoqing Jiang
2016-05-02 15:33 ` [V2 PATCH 02/13] md-cluser: make resync_finish only called after pers->sync_request Guoqing Jiang
@ 2016-05-02 15:33 ` Guoqing Jiang
2016-05-02 15:33 ` [V2 PATCH 04/13] md-cluster: unregister thread if err happened Guoqing Jiang
` (5 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Guoqing Jiang @ 2016-05-02 15:33 UTC (permalink / raw)
To: shli; +Cc: neilb, linux-raid, Guoqing Jiang
In the recovery case, we need to set MD_RECOVERY_NEEDED
and wake up the thread only if recovery is not finished.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
drivers/md/md-cluster.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 12fbfec..0d4ddf8 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -284,11 +284,14 @@ static void recover_bitmaps(struct md_thread *thread)
goto dlm_unlock;
}
if (hi > 0) {
- /* TODO:Wait for current resync to get over */
- set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
if (lo < mddev->recovery_cp)
mddev->recovery_cp = lo;
- md_check_recovery(mddev);
+ /* wake up thread to continue resync in case resync
+ * is not finished */
+ if (mddev->recovery_cp != MaxSector) {
+ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+ md_wakeup_thread(mddev->thread);
+ }
}
dlm_unlock:
dlm_unlock_sync(bm_lockres);
--
2.6.6
* [V2 PATCH 04/13] md-cluster: unregister thread if err happened
2016-05-02 15:33 [V2 PATCH 00/13] The latest patches for md-cluster Guoqing Jiang
` (2 preceding siblings ...)
2016-05-02 15:33 ` [V2 PATCH 03/13] md-cluster: wake up thread to continue recovery Guoqing Jiang
@ 2016-05-02 15:33 ` Guoqing Jiang
2016-05-02 15:33 ` [V2 PATCH 05/13] md-cluster: fix locking when node joins cluster during message broadcast Guoqing Jiang
` (4 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Guoqing Jiang @ 2016-05-02 15:33 UTC (permalink / raw)
To: shli; +Cc: neilb, linux-raid, Guoqing Jiang
The two threads need to be unregistered if a node
can't join the cluster successfully.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
drivers/md/md-cluster.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 0d4ddf8..76f88f7 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -818,6 +818,8 @@ static int join(struct mddev *mddev, int nodes)
return 0;
err:
+ md_unregister_thread(&cinfo->recovery_thread);
+ md_unregister_thread(&cinfo->recv_thread);
lockres_free(cinfo->message_lockres);
lockres_free(cinfo->token_lockres);
lockres_free(cinfo->ack_lockres);
--
2.6.6
* [V2 PATCH 05/13] md-cluster: fix locking when node joins cluster during message broadcast
2016-05-02 15:33 [V2 PATCH 00/13] The latest patches for md-cluster Guoqing Jiang
` (3 preceding siblings ...)
2016-05-02 15:33 ` [V2 PATCH 04/13] md-cluster: unregister thread if err happened Guoqing Jiang
@ 2016-05-02 15:33 ` Guoqing Jiang
2016-05-02 15:33 ` [V2 PATCH 06/13] md-cluster: change array_sectors and update size are not supported Guoqing Jiang
` (3 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Guoqing Jiang @ 2016-05-02 15:33 UTC (permalink / raw)
To: shli; +Cc: neilb, linux-raid, Guoqing Jiang
If a node joins the cluster while a message broadcast
is under way, a lock issue could happen as follows.
Consider a cluster of two nodes: node A is calling
__sendmsg and has not yet up-converted CR to EX on ack,
and node B has released CR on ack. If a new node C
joins the cluster at this point, it has not received the
message which A sent, yet it could hold CR on ack before
A up-converts CR to EX on ack.
So a node joining the cluster should get an EX lock on
the "token" first to ensure no broadcast is ongoing,
then release it once it holds CR on ack.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
drivers/md/md-cluster.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 76f88f7..30f1160 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -781,17 +781,24 @@ static int join(struct mddev *mddev, int nodes)
cinfo->token_lockres = lockres_init(mddev, "token", NULL, 0);
if (!cinfo->token_lockres)
goto err;
- cinfo->ack_lockres = lockres_init(mddev, "ack", ack_bast, 0);
- if (!cinfo->ack_lockres)
- goto err;
cinfo->no_new_dev_lockres = lockres_init(mddev, "no-new-dev", NULL, 0);
if (!cinfo->no_new_dev_lockres)
goto err;
+ ret = dlm_lock_sync(cinfo->token_lockres, DLM_LOCK_EX);
+ if (ret) {
+ ret = -EAGAIN;
+ pr_err("md-cluster: can't join cluster to avoid lock issue\n");
+ goto err;
+ }
+ cinfo->ack_lockres = lockres_init(mddev, "ack", ack_bast, 0);
+ if (!cinfo->ack_lockres)
+ goto err;
/* get sync CR lock on ACK. */
if (dlm_lock_sync(cinfo->ack_lockres, DLM_LOCK_CR))
pr_err("md-cluster: failed to get a sync CR lock on ACK!(%d)\n",
ret);
+ dlm_unlock_sync(cinfo->token_lockres);
/* get sync CR lock on no-new-dev. */
if (dlm_lock_sync(cinfo->no_new_dev_lockres, DLM_LOCK_CR))
pr_err("md-cluster: failed to get a sync CR lock on no-new-dev!(%d)\n", ret);
--
2.6.6
* [V2 PATCH 06/13] md-cluster: change array_sectors and update size are not supported
2016-05-02 15:33 [V2 PATCH 00/13] The latest patches for md-cluster Guoqing Jiang
` (4 preceding siblings ...)
2016-05-02 15:33 ` [V2 PATCH 05/13] md-cluster: fix locking when node joins cluster during message broadcast Guoqing Jiang
@ 2016-05-02 15:33 ` Guoqing Jiang
2016-05-02 15:33 ` [V2 PATCH 07/13] md-cluster: wakeup thread if activated a spare disk Guoqing Jiang
` (2 subsequent siblings)
8 siblings, 0 replies; 15+ messages in thread
From: Guoqing Jiang @ 2016-05-02 15:33 UTC (permalink / raw)
To: shli; +Cc: neilb, linux-raid, Guoqing Jiang
Some features, such as changing array_sectors and
updating the size, are not supported yet, so return
-EINVAL for them and list them in the documentation.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
Documentation/md-cluster.txt | 6 ++++++
drivers/md/md.c | 8 ++++++++
2 files changed, 14 insertions(+)
diff --git a/Documentation/md-cluster.txt b/Documentation/md-cluster.txt
index c100c71..3888327 100644
--- a/Documentation/md-cluster.txt
+++ b/Documentation/md-cluster.txt
@@ -316,3 +316,9 @@ The algorithm is:
nodes are using the raid which is achieved by lock all bitmap
locks within the cluster, and also those locks are unlocked
accordingly.
+
+7. Unsupported features
+
+There are somethings which are not supported by cluster MD yet.
+
+- update size and change array_sectors.
diff --git a/drivers/md/md.c b/drivers/md/md.c
index dd83a50..8cc4bbc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4817,6 +4817,10 @@ array_size_store(struct mddev *mddev, const char *buf, size_t len)
if (err)
return err;
+ /* cluster raid doesn't support change array_sectors */
+ if (mddev_is_clustered(mddev))
+ return -EINVAL;
+
if (strncmp(buf, "default", 7) == 0) {
if (mddev->pers)
sectors = mddev->pers->size(mddev, 0, 0);
@@ -6438,6 +6442,10 @@ static int update_size(struct mddev *mddev, sector_t num_sectors)
int rv;
int fit = (num_sectors == 0);
+ /* cluster raid doesn't support update size */
+ if (mddev_is_clustered(mddev))
+ return -EINVAL;
+
if (mddev->pers->resize == NULL)
return -EINVAL;
/* The "num_sectors" is the number of sectors of each device that
--
2.6.6
* [V2 PATCH 07/13] md-cluster: wakeup thread if activated a spare disk
2016-05-02 15:33 [V2 PATCH 00/13] The latest patches for md-cluster Guoqing Jiang
` (5 preceding siblings ...)
2016-05-02 15:33 ` [V2 PATCH 06/13] md-cluster: change array_sectors and update size are not supported Guoqing Jiang
@ 2016-05-02 15:33 ` Guoqing Jiang
2016-05-02 15:50 ` [V2 PATCH 08/13] md-cluster: always setup in-memory bitmap Guoqing Jiang
2016-05-02 17:49 ` [V2 PATCH 00/13] The latest patches for md-cluster Shaohua Li
8 siblings, 0 replies; 15+ messages in thread
From: Guoqing Jiang @ 2016-05-02 15:33 UTC (permalink / raw)
To: shli; +Cc: neilb, linux-raid, Guoqing Jiang
When a device is re-added, it will ultimately need
to be activated and that happens in md_check_recovery,
so we need to set MD_RECOVERY_NEEDED right after
remove_and_add_spares.
A specific issue without this change: when one node
performs fail/remove/re-add on a disk, the slave nodes
cannot add the disk back to the array as expected (it is
added as missing instead of in sync). So give slave
nodes a chance to do resync.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
drivers/md/md.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8cc4bbc..06f6e81 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8694,6 +8694,11 @@ static void check_sb_changes(struct mddev *mddev, struct md_rdev *rdev)
ret = remove_and_add_spares(mddev, rdev2);
pr_info("Activated spare: %s\n",
bdevname(rdev2->bdev,b));
+ /* wakeup mddev->thread here, so array could
+ * perform resync with the new activated disk */
+ set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
+ md_wakeup_thread(mddev->thread);
+
}
/* device faulty
* We just want to do the minimum to mark the disk
--
2.6.6
* [V2 PATCH 08/13] md-cluster: always setup in-memory bitmap
2016-05-02 15:33 [V2 PATCH 00/13] The latest patches for md-cluster Guoqing Jiang
` (6 preceding siblings ...)
2016-05-02 15:33 ` [V2 PATCH 07/13] md-cluster: wakeup thread if activated a spare disk Guoqing Jiang
@ 2016-05-02 15:50 ` Guoqing Jiang
2016-05-02 15:50 ` [V2 PATCH 09/13] md-cluster: sync bitmap when node received RESYNCING msg Guoqing Jiang
` (4 more replies)
2016-05-02 17:49 ` [V2 PATCH 00/13] The latest patches for md-cluster Shaohua Li
8 siblings, 5 replies; 15+ messages in thread
From: Guoqing Jiang @ 2016-05-02 15:50 UTC (permalink / raw)
To: shli; +Cc: neilb, linux-raid, Guoqing Jiang
The in-memory bitmap for raid is allocated on demand,
so in the cluster scenario it is possible that a slave
node which received a RESYNCING message doesn't have the
in-memory bitmap while the master node is performing
resync, and then the bitmaps cannot be kept consistent
across nodes.
So for the cluster scenario, we always need to preserve
the bitmap and ensure the page will not be freed. A
no_hijack flag is introduced to bitmap_checkpage, which
makes cluster raid return failure once allocation fails.
The next patch relies on this change since it keeps the
bitmap in sync among nodes during the resyncing stage.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
drivers/md/bitmap.c | 37 +++++++++++++++++++++++++++++++++++--
1 file changed, 35 insertions(+), 2 deletions(-)
diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 3fe86b5..431da21 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -46,7 +46,7 @@ static inline char *bmname(struct bitmap *bitmap)
* allocated while we're using it
*/
static int bitmap_checkpage(struct bitmap_counts *bitmap,
- unsigned long page, int create)
+ unsigned long page, int create, int no_hijack)
__releases(bitmap->lock)
__acquires(bitmap->lock)
{
@@ -90,6 +90,9 @@ __acquires(bitmap->lock)
if (mappage == NULL) {
pr_debug("md/bitmap: map page allocation failed, hijacking\n");
+ /* We don't support hijack for cluster raid */
+ if (no_hijack)
+ return -ENOMEM;
/* failed - set the hijacked flag so that we can use the
* pointer as a counter */
if (!bitmap->bp[page].map)
@@ -1321,7 +1324,7 @@ __acquires(bitmap->lock)
sector_t csize;
int err;
- err = bitmap_checkpage(bitmap, page, create);
+ err = bitmap_checkpage(bitmap, page, create, 0);
if (bitmap->bp[page].hijacked ||
bitmap->bp[page].map == NULL)
@@ -2032,6 +2035,36 @@ int bitmap_resize(struct bitmap *bitmap, sector_t blocks,
chunks << chunkshift);
spin_lock_irq(&bitmap->counts.lock);
+ /* For cluster raid, need to pre-allocate bitmap */
+ if (mddev_is_clustered(bitmap->mddev)) {
+ unsigned long page;
+ for (page = 0; page < pages; page++) {
+ ret = bitmap_checkpage(&bitmap->counts, page, 1, 1);
+ if (ret) {
+ unsigned long k;
+
+ /* deallocate the page memory */
+ for (k = 0; k < page; k++) {
+ if (new_bp[k].map)
+ kfree(new_bp[k].map);
+ }
+
+ /* restore some fields from old_counts */
+ bitmap->counts.bp = old_counts.bp;
+ bitmap->counts.pages = old_counts.pages;
+ bitmap->counts.missing_pages = old_counts.pages;
+ bitmap->counts.chunkshift = old_counts.chunkshift;
+ bitmap->counts.chunks = old_counts.chunks;
+ bitmap->mddev->bitmap_info.chunksize = 1 << (old_counts.chunkshift +
+ BITMAP_BLOCK_SHIFT);
+ blocks = old_counts.chunks << old_counts.chunkshift;
+ pr_err("Could not pre-allocate in-memory bitmap for cluster raid\n");
+ break;
+ } else
+ bitmap->counts.bp[page].count += 1;
+ }
+ }
+
for (block = 0; block < blocks; ) {
bitmap_counter_t *bmc_old, *bmc_new;
int set;
--
2.6.6
* [V2 PATCH 09/13] md-cluster: sync bitmap when node received RESYNCING msg
2016-05-02 15:50 ` [V2 PATCH 08/13] md-cluster: always setup in-memory bitmap Guoqing Jiang
@ 2016-05-02 15:50 ` Guoqing Jiang
2016-05-02 15:50 ` [V2 PATCH 10/13] md-cluster/bitmap: fix wrong calcuation of offset Guoqing Jiang
` (3 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: Guoqing Jiang @ 2016-05-02 15:50 UTC (permalink / raw)
To: shli; +Cc: neilb, linux-raid, Guoqing Jiang
If a node receives a RESYNCING message, it means
another node will perform resync on that area, so
we don't want to do it again on this node.
Let's set RESYNC_MASK and clear NEEDED_MASK for the
region from old-low to new-low, which has finished
syncing, and for the region from old-hi to new-hi, which
is about to be synced. bitmap_sync_with_cluster is
introduced for this purpose.
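The two range walks described above can be modelled in user space
as a rough sketch. This is not the kernel implementation: the toy
bitmap, the three chunk states, the chunk size and all toy_* names
are assumptions made for illustration, and callers are assumed to
keep sectors within the toy bitmap.

```c
#include <stdint.h>

#define SKETCH_CHUNK_SECTORS 8u	/* assumed chunk size, in sectors */
#define SKETCH_NCHUNKS 16u	/* assumed number of chunks */

/* Simplified per-chunk state; the kernel uses counter bits instead. */
enum chunk_state { CHUNK_NEEDED, CHUNK_RESYNCING, CHUNK_CLEAN };

struct toy_bitmap {
	enum chunk_state chunk[SKETCH_NCHUNKS];
};

/* Mark the chunk holding 'sector' with 'st'; return the number of
 * sectors from 'sector' to the end of that chunk. */
static uint64_t toy_mark(struct toy_bitmap *b, uint64_t sector,
			 enum chunk_state st)
{
	b->chunk[sector / SKETCH_CHUNK_SECTORS] = st;
	return SKETCH_CHUNK_SECTORS - (sector % SKETCH_CHUNK_SECTORS);
}

/* [old_lo, new_lo) has finished syncing on the other node;
 * [old_hi, new_hi) is about to be synced by the other node. */
static void toy_sync_with_cluster(struct toy_bitmap *b,
				  uint64_t old_lo, uint64_t old_hi,
				  uint64_t new_lo, uint64_t new_hi)
{
	uint64_t sector;

	for (sector = old_lo; sector < new_lo; )
		sector += toy_mark(b, sector, CHUNK_CLEAN);

	for (sector = old_hi; sector < new_hi; )
		sector += toy_mark(b, sector, CHUNK_RESYNCING);
}
```

Calling toy_sync_with_cluster() once per RESYNCING message, with
the previous (lo, hi) pair as old_lo/old_hi, moves the resyncing
window forward in the same way the patch does with
cinfo->sync_low and cinfo->sync_hi.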
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
drivers/md/bitmap.c | 21 +++++++++++++++++++++
drivers/md/bitmap.h | 3 +++
drivers/md/md-cluster.c | 27 +++++++++++++++++++++++++++
3 files changed, 51 insertions(+)
diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 431da21..ac93d87 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -1597,6 +1597,27 @@ void bitmap_cond_end_sync(struct bitmap *bitmap, sector_t sector, bool force)
}
EXPORT_SYMBOL(bitmap_cond_end_sync);
+void bitmap_sync_with_cluster(struct mddev *mddev,
+ sector_t old_lo, sector_t old_hi,
+ sector_t new_lo, sector_t new_hi)
+{
+ struct bitmap *bitmap = mddev->bitmap;
+ sector_t sector, blocks = 0;
+
+ for (sector = old_lo; sector < new_lo; ) {
+ bitmap_end_sync(bitmap, sector, &blocks, 0);
+ sector += blocks;
+ }
+ WARN((blocks > new_lo) && old_lo, "alignment is not correct for lo\n");
+
+ for (sector = old_hi; sector < new_hi; ) {
+ bitmap_start_sync(bitmap, sector, &blocks, 0);
+ sector += blocks;
+ }
+ WARN((blocks > new_hi) && old_hi, "alignment is not correct for hi\n");
+}
+EXPORT_SYMBOL(bitmap_sync_with_cluster);
+
static void bitmap_set_memory_bits(struct bitmap *bitmap, sector_t offset, int needed)
{
/* For each chunk covered by any of these sectors, set the
diff --git a/drivers/md/bitmap.h b/drivers/md/bitmap.h
index 5e3fcd6..5b6dd63 100644
--- a/drivers/md/bitmap.h
+++ b/drivers/md/bitmap.h
@@ -258,6 +258,9 @@ int bitmap_start_sync(struct bitmap *bitmap, sector_t offset, sector_t *blocks,
void bitmap_end_sync(struct bitmap *bitmap, sector_t offset, sector_t *blocks, int aborted);
void bitmap_close_sync(struct bitmap *bitmap);
void bitmap_cond_end_sync(struct bitmap *bitmap, sector_t sector, bool force);
+void bitmap_sync_with_cluster(struct mddev *mddev,
+ sector_t old_lo, sector_t old_hi,
+ sector_t new_lo, sector_t new_hi);
void bitmap_unplug(struct bitmap *bitmap);
void bitmap_daemon_work(struct mddev *mddev);
diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 30f1160..a55b5f4 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -85,6 +85,9 @@ struct md_cluster_info {
struct completion newdisk_completion;
wait_queue_head_t wait;
unsigned long state;
+ /* record the region in RESYNCING message */
+ sector_t sync_low;
+ sector_t sync_hi;
};
enum msg_type {
@@ -411,6 +414,30 @@ static void process_suspend_info(struct mddev *mddev,
md_wakeup_thread(mddev->thread);
return;
}
+
+ /*
+ * The bitmaps are not same for different nodes
+ * if RESYNCING is happening in one node, then
+ * the node which received the RESYNCING message
+ * probably will perform resync with the region
+ * [lo, hi] again, so we could reduce resync time
+ * a lot if we can ensure that the bitmaps among
+ * different nodes are match up well.
+ *
+ * sync_low/hi is used to record the region which
+ * arrived in the previous RESYNCING message,
+ *
+ * Call bitmap_sync_with_cluster to clear
+ * NEEDED_MASK and set RESYNC_MASK since
+ * resync thread is running in another node,
+ * so we don't need to do the resync again
+ * with the same section */
+ bitmap_sync_with_cluster(mddev, cinfo->sync_low,
+ cinfo->sync_hi,
+ lo, hi);
+ cinfo->sync_low = lo;
+ cinfo->sync_hi = hi;
+
s = kzalloc(sizeof(struct suspend_info), GFP_KERNEL);
if (!s)
return;
--
2.6.6
* [V2 PATCH 10/13] md-cluster/bitmap: fix wrong calcuation of offset
2016-05-02 15:50 ` [V2 PATCH 08/13] md-cluster: always setup in-memory bitmap Guoqing Jiang
2016-05-02 15:50 ` [V2 PATCH 09/13] md-cluster: sync bitmap when node received RESYNCING msg Guoqing Jiang
@ 2016-05-02 15:50 ` Guoqing Jiang
2016-05-02 15:50 ` [V2 PATCH 11/13] md-cluster/bitmap: fix wrong page num in bitmap_file_clear_bit and bitmap_file_set_bit Guoqing Jiang
` (2 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: Guoqing Jiang @ 2016-05-02 15:50 UTC (permalink / raw)
To: shli; +Cc: neilb, linux-raid, Guoqing Jiang
The offset is wrong in bitmap_storage_alloc; it should be
set the same way as is done in bitmap_init_from_disk():
node_offset = bitmap->cluster_slot * (DIV_ROUND_UP(store->bytes, PAGE_SIZE));
'offset' is only assigned to 'page->index', which is usually
overwritten by read_sb_page, so this does not cause a problem
in general, but it still needs to be fixed.
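As a worked example of the arithmetic, here is a small user-space
sketch comparing the old and new formulas. It is not kernel code:
the 4096-byte page size and the SKETCH_*/offset_* names are
assumptions for illustration, and 'bytes' is taken to be the total
size after the superblock has been added.

```c
/* Assumed page size for illustration; the real value is arch-dependent. */
#define SKETCH_PAGE_SIZE 4096UL
/* Same rounding as the kernel's DIV_ROUND_UP macro. */
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Old calculation: one page short for every slot before this one. */
static unsigned long offset_old(unsigned long slot, unsigned long bytes)
{
	unsigned long num_pages = SKETCH_DIV_ROUND_UP(bytes, SKETCH_PAGE_SIZE);

	return slot * (num_pages - 1);
}

/* Fixed calculation: each slot owns num_pages whole pages. */
static unsigned long offset_new(unsigned long slot, unsigned long bytes)
{
	unsigned long num_pages = SKETCH_DIV_ROUND_UP(bytes, SKETCH_PAGE_SIZE);

	return slot * num_pages;
}
```

With three pages per slot, slot 2 starts at page 6 under the fixed
formula, while the old formula gives page 4, which lies inside the
range that belongs to slot 1 (pages 3 to 5).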
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
drivers/md/bitmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index ac93d87..cf93bb8 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -759,7 +759,7 @@ static int bitmap_storage_alloc(struct bitmap_storage *store,
bytes += sizeof(bitmap_super_t);
num_pages = DIV_ROUND_UP(bytes, PAGE_SIZE);
- offset = slot_number * (num_pages - 1);
+ offset = slot_number * num_pages;
store->filemap = kmalloc(sizeof(struct page *)
* num_pages, GFP_KERNEL);
--
2.6.6
* [V2 PATCH 11/13] md-cluster/bitmap: fix wrong page num in bitmap_file_clear_bit and bitmap_file_set_bit
2016-05-02 15:50 ` [V2 PATCH 08/13] md-cluster: always setup in-memory bitmap Guoqing Jiang
2016-05-02 15:50 ` [V2 PATCH 09/13] md-cluster: sync bitmap when node received RESYNCING msg Guoqing Jiang
2016-05-02 15:50 ` [V2 PATCH 10/13] md-cluster/bitmap: fix wrong calcuation of offset Guoqing Jiang
@ 2016-05-02 15:50 ` Guoqing Jiang
2016-05-02 15:50 ` [V2 PATCH 12/13] md-cluster/bitmap: unplug bitmap to sync dirty pages to disk Guoqing Jiang
2016-05-02 15:50 ` [V2 PATCH 13/13] md-cluster: fix ifnullfree.cocci warnings Guoqing Jiang
4 siblings, 0 replies; 15+ messages in thread
From: Guoqing Jiang @ 2016-05-02 15:50 UTC (permalink / raw)
To: shli; +Cc: neilb, linux-raid, Guoqing Jiang
The pnum passed to set_page_attr and test_page_attr should range
from 0 to storage.file_pages - 1, but bitmap_file_set_bit and
bitmap_file_clear_bit call set_page_attr and test_page_attr with
page->index as the parameter, and page->index already has
node_offset added to it.
So we need to subtract node_offset in both bitmap_file_clear_bit
and bitmap_file_set_bit.
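The index mapping can be sketched in user space as below. This is
only an illustration: local_pnum and its plain integer parameters
are hypothetical stand-ins for page->index, bitmap->cluster_slot
and store->file_pages.

```c
/*
 * Map a page index that already includes node_offset back to the
 * per-node page number (0 .. file_pages - 1) expected by
 * set_page_attr() and test_page_attr(). clustered == 0 models a
 * non-cluster array, where no offset is applied.
 */
static unsigned long local_pnum(unsigned long page_index, int clustered,
				unsigned long cluster_slot,
				unsigned long file_pages)
{
	unsigned long node_offset = 0;

	if (clustered)
		node_offset = cluster_slot * file_pages;

	return page_index - node_offset;
}
```

For example, with 3 pages per node, page index 7 on slot 2 maps
to local page 1, which stays within 0 to file_pages - 1.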
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
drivers/md/bitmap.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index cf93bb8..de28c80 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -903,6 +903,11 @@ static void bitmap_file_set_bit(struct bitmap *bitmap, sector_t block)
struct page *page;
void *kaddr;
unsigned long chunk = block >> bitmap->counts.chunkshift;
+ struct bitmap_storage *store = &bitmap->storage;
+ unsigned long node_offset = 0;
+
+ if (mddev_is_clustered(bitmap->mddev))
+ node_offset = bitmap->cluster_slot * store->file_pages;
page = filemap_get_page(&bitmap->storage, chunk);
if (!page)
@@ -918,7 +923,7 @@ static void bitmap_file_set_bit(struct bitmap *bitmap, sector_t block)
kunmap_atomic(kaddr);
pr_debug("set file bit %lu page %lu\n", bit, page->index);
/* record page number so it gets flushed to disk when unplug occurs */
- set_page_attr(bitmap, page->index, BITMAP_PAGE_DIRTY);
+ set_page_attr(bitmap, page->index - node_offset, BITMAP_PAGE_DIRTY);
}
static void bitmap_file_clear_bit(struct bitmap *bitmap, sector_t block)
@@ -927,6 +932,11 @@ static void bitmap_file_clear_bit(struct bitmap *bitmap, sector_t block)
struct page *page;
void *paddr;
unsigned long chunk = block >> bitmap->counts.chunkshift;
+ struct bitmap_storage *store = &bitmap->storage;
+ unsigned long node_offset = 0;
+
+ if (mddev_is_clustered(bitmap->mddev))
+ node_offset = bitmap->cluster_slot * store->file_pages;
page = filemap_get_page(&bitmap->storage, chunk);
if (!page)
@@ -938,8 +948,8 @@ static void bitmap_file_clear_bit(struct bitmap *bitmap, sector_t block)
else
clear_bit_le(bit, paddr);
kunmap_atomic(paddr);
- if (!test_page_attr(bitmap, page->index, BITMAP_PAGE_NEEDWRITE)) {
- set_page_attr(bitmap, page->index, BITMAP_PAGE_PENDING);
+ if (!test_page_attr(bitmap, page->index - node_offset, BITMAP_PAGE_NEEDWRITE)) {
+ set_page_attr(bitmap, page->index - node_offset, BITMAP_PAGE_PENDING);
bitmap->allclean = 0;
}
}
--
2.6.6
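
The index arithmetic behind patch 11 can be sketched in userspace C as
follows. This is only an illustration under stated assumptions: struct
toy_bitmap and the toy_* helpers are hypothetical stand-ins, not kernel
code, and the only things taken from the patch are the formula
node_offset = cluster_slot * file_pages and the subtraction of that
offset from the page index.

```c
#include <assert.h>

/* Hypothetical stand-in for the fields the patch reads; this is not
 * the kernel's struct bitmap. */
struct toy_bitmap {
	int clustered;              /* models mddev_is_clustered() */
	unsigned long cluster_slot; /* slot number of this node */
	unsigned long file_pages;   /* bitmap pages per node */
};

/* Offset that is already folded into the file-wide page index. */
static unsigned long toy_node_offset(const struct toy_bitmap *b)
{
	return b->clustered ? b->cluster_slot * b->file_pages : 0;
}

/* Map a file-wide page index back to the 0..file_pages-1 range that
 * the per-page attribute helpers expect. */
static unsigned long toy_local_pnum(const struct toy_bitmap *b,
				    unsigned long page_index)
{
	unsigned long off = toy_node_offset(b);

	/* The index must fall inside this node's window. */
	assert(page_index >= off && page_index - off < b->file_pages);
	return page_index - off;
}
```

For a node in slot 2 with 4 pages per node, the offset is 8, so
file-wide page 9 maps to local page 1. On a non-clustered array the
offset is 0 and the index is unchanged.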
* [V2 PATCH 12/13] md-cluster/bitmap: unplug bitmap to sync dirty pages to disk
2016-05-02 15:50 ` [V2 PATCH 08/13] md-cluster: always setup in-memory bitmap Guoqing Jiang
` (2 preceding siblings ...)
2016-05-02 15:50 ` [V2 PATCH 11/13] md-cluster/bitmap: fix wrong page num in bitmap_file_clear_bit and bitmap_file_set_bit Guoqing Jiang
@ 2016-05-02 15:50 ` Guoqing Jiang
2016-05-02 15:50 ` [V2 PATCH 13/13] md-cluster: fix ifnullfree.cocci warnings Guoqing Jiang
4 siblings, 0 replies; 15+ messages in thread
From: Guoqing Jiang @ 2016-05-02 15:50 UTC (permalink / raw)
To: shli; +Cc: neilb, linux-raid, Guoqing Jiang
This patch does two distinct but related things.
1. It adds bitmap_unplug() for the main bitmap (mddev->bitmap). As bits
have been set, BITMAP_PAGE_DIRTY is set, so bitmap_daemon_work() will
not write those pages out in its regular scans; only bitmap_unplug()
will. If there are no writes to the array, bitmap_unplug() won't be
called, so we need to call it explicitly here.
2. bitmap_write_all() is a somewhat confusing interface, as it doesn't
actually write anything. The current code for writing "bitmap" works,
but this change makes it a bit clearer.
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
drivers/md/bitmap.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index de28c80..4a05bac 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -1924,14 +1924,14 @@ int bitmap_copy_from_slot(struct mddev *mddev, int slot,
if (clear_bits) {
bitmap_update_sb(bitmap);
- /* Setting this for the ev_page should be enough.
- * And we do not require both write_all and PAGE_DIRT either
- */
+ /* BITMAP_PAGE_PENDING is set, but bitmap_unplug needs
+ * BITMAP_PAGE_DIRTY or _NEEDWRITE to write ... */
for (i = 0; i < bitmap->storage.file_pages; i++)
- set_page_attr(bitmap, i, BITMAP_PAGE_DIRTY);
- bitmap_write_all(bitmap);
+ if (test_page_attr(bitmap, i, BITMAP_PAGE_PENDING))
+ set_page_attr(bitmap, i, BITMAP_PAGE_NEEDWRITE);
bitmap_unplug(bitmap);
}
+ bitmap_unplug(mddev->bitmap);
*low = lo;
*high = hi;
err:
--
2.6.6
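
The flag handling that patch 12 relies on can be modelled with a small
userspace sketch. Everything here is a toy under assumptions: the TOY_*
flags and toy_* functions are hypothetical names, "writing" a page is
reduced to counting it, and clearing all three flags on write is a
simplification of this model. The only behaviour taken from the patch is
that an unplug writes pages flagged DIRTY or NEEDWRITE, so PENDING pages
must be promoted to NEEDWRITE first.

```c
/* Toy per-page attribute bits, loosely modelled on the BITMAP_PAGE_*
 * names used in the patch. */
enum toy_attr {
	TOY_DIRTY     = 1 << 0,
	TOY_PENDING   = 1 << 1,
	TOY_NEEDWRITE = 1 << 2
};

/* Promote every PENDING page to NEEDWRITE, as the loop added by the
 * patch does before calling bitmap_unplug(). */
static void toy_promote_pending(unsigned int *attrs, unsigned long n)
{
	unsigned long i;

	for (i = 0; i < n; i++)
		if (attrs[i] & TOY_PENDING)
			attrs[i] |= TOY_NEEDWRITE;
}

/* Toy unplug: count ("write") pages flagged DIRTY or NEEDWRITE and
 * clear their flags. Pages that are only PENDING are skipped. */
static unsigned long toy_unplug(unsigned int *attrs, unsigned long n)
{
	const unsigned int all = TOY_DIRTY | TOY_PENDING | TOY_NEEDWRITE;
	unsigned long i, written = 0;

	for (i = 0; i < n; i++) {
		if (attrs[i] & (TOY_DIRTY | TOY_NEEDWRITE)) {
			attrs[i] &= ~all;
			written++;
		}
	}
	return written;
}
```

In this model, calling toy_unplug() on pages that are only PENDING
writes nothing, which is the situation the new code comment in the patch
describes. After toy_promote_pending(), the same pages are picked up.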
* [V2 PATCH 13/13] md-cluster: fix ifnullfree.cocci warnings
2016-05-02 15:50 ` [V2 PATCH 08/13] md-cluster: always setup in-memory bitmap Guoqing Jiang
` (3 preceding siblings ...)
2016-05-02 15:50 ` [V2 PATCH 12/13] md-cluster/bitmap: unplug bitmap to sync dirty pages to disk Guoqing Jiang
@ 2016-05-02 15:50 ` Guoqing Jiang
4 siblings, 0 replies; 15+ messages in thread
From: Guoqing Jiang @ 2016-05-02 15:50 UTC (permalink / raw)
To: shli; +Cc: neilb, linux-raid, kbuild test robot, Fengguang Wu
From: kbuild test robot <lkp@intel.com>
drivers/md/bitmap.c:2049:6-11: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values.
NULL check before some freeing functions is not needed.
Based on checkpatch warning
"kfree(NULL) is safe this check is probably not required"
and kfreeaddr.cocci by Julia Lawall.
Generated by: scripts/coccinelle/free/ifnullfree.cocci
Acked-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
---
drivers/md/bitmap.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 4a05bac..ad5a858 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -2076,8 +2076,7 @@ int bitmap_resize(struct bitmap *bitmap, sector_t blocks,
/* deallocate the page memory */
for (k = 0; k < page; k++) {
- if (new_bp[k].map)
- kfree(new_bp[k].map);
+ kfree(new_bp[k].map);
}
/* restore some fields from old_counts */
--
2.6.6
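
A userspace analogue of the cleanup in patch 13, for readers less
familiar with the idiom. This sketch uses the standard C free() rather
than the kernel's kfree(); struct toy_page and toy_free_maps() are
hypothetical names. The C standard defines free(NULL) as a no-op, which
is the same property the checkpatch message quotes for kfree(NULL).

```c
#include <stdlib.h>

/* Hypothetical stand-in for one entry of the new_bp array. */
struct toy_page {
	void *map;
};

/* Release every map without a NULL check: free(NULL) does nothing,
 * so entries that were never allocated are handled for free. */
static void toy_free_maps(struct toy_page *bp, unsigned long n)
{
	unsigned long k;

	for (k = 0; k < n; k++) {
		free(bp[k].map);
		bp[k].map = NULL; /* avoid leaving a dangling pointer */
	}
}
```

Resetting the pointer to NULL after freeing is an extra safety step in
this sketch and is not part of the patch.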
* Re: [V2 PATCH 00/13] The latest patches for md-cluster
2016-05-02 15:33 [V2 PATCH 00/13] The latest patches for md-cluster Guoqing Jiang
` (7 preceding siblings ...)
2016-05-02 15:50 ` [V2 PATCH 08/13] md-cluster: always setup in-memory bitmap Guoqing Jiang
@ 2016-05-02 17:49 ` Shaohua Li
8 siblings, 0 replies; 15+ messages in thread
From: Shaohua Li @ 2016-05-02 17:49 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: neilb, linux-raid
On Mon, May 02, 2016 at 11:33:07AM -0400, Guoqing Jiang wrote:
> Changes:
> 1. delete no_hijack parameter from bitmap_get_counter
> 2. add one patch from kbuild test robot to remove checkpatch warning
> 3. md-set-MD_CHANGE_PENDING-in-a-spinlocked-region.patch is removed
> from the patchset and it will be post later.
>
> For cluster raid1, we found some issues and some codes need to be
> improved during the past months, and all the patches are based on
> for-next branch of md tree.
>
> The patchset also available in github as follows:
>
> https://github.com/GuoqingJiang/linux/tree/md-for-next
Applied, thanks!