* [md PATCH 0/4] A few more MD patches for 2.6.30
From: NeilBrown @ 2009-04-17 2:14 UTC (permalink / raw)
To: linux-raid
Following are 4 more md patches that I'll be asking Linus to pull
shortly (they are in my for-linus branch already). The three from me
finish off the raid5 reshape enhancements by making sure that I have
accurate sysfs access to some important internals.
Any review is, of course, most welcome.
NeilBrown
---
Christoph Hellwig (1):
md: tiny md.h cleanups
NeilBrown (3):
md: update sync_completed and reshape_position even more often.
md: improve usefulness and accuracy of sysfs file md/sync_completed.
md: allow setting newly added device to 'in_sync' via sysfs.
drivers/md/bitmap.c | 1 +
drivers/md/md.c | 41 ++++++++++++++++++++++++++++-------------
drivers/md/md.h | 21 +++++++++++++--------
drivers/md/raid5.c | 7 ++++++-
4 files changed, 48 insertions(+), 22 deletions(-)
--
Signature
* [md PATCH 1/4] md: tiny md.h cleanups
From: NeilBrown @ 2009-04-17 2:14 UTC (permalink / raw)
To: linux-raid; +Cc: Christoph Hellwig, NeilBrown
From: Christoph Hellwig <hch@lst.de>
- update inclusion guard and make sure it covers the whole file
- remove superfluous #ifdef CONFIG_BLOCK
- make sure all required headers are included so that new users aren't
required to include others first
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
---
drivers/md/md.h | 21 +++++++++++++--------
1 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/drivers/md/md.h b/drivers/md/md.h
index e9b7f54..8227ab9 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -12,10 +12,17 @@
Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/
-#ifndef _MD_K_H
-#define _MD_K_H
-
-#ifdef CONFIG_BLOCK
+#ifndef _MD_MD_H
+#define _MD_MD_H
+
+#include <linux/blkdev.h>
+#include <linux/kobject.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/timer.h>
+#include <linux/wait.h>
+#include <linux/workqueue.h>
#define MaxSector (~(sector_t)0)
@@ -408,10 +415,6 @@ static inline void safe_put_page(struct page *p)
if (p) put_page(p);
}
-#endif /* CONFIG_BLOCK */
-#endif
-
-
extern int register_md_personality(struct mdk_personality *p);
extern int unregister_md_personality(struct mdk_personality *p);
extern mdk_thread_t * md_register_thread(void (*run) (mddev_t *mddev),
@@ -434,3 +437,5 @@ extern void md_new_event(mddev_t *mddev);
extern int md_allow_write(mddev_t *mddev);
extern void md_wait_for_blocked_rdev(mdk_rdev_t *rdev, mddev_t *mddev);
extern void md_set_array_sectors(mddev_t *mddev, sector_t array_sectors);
+
+#endif /* _MD_MD_H */
* [md PATCH 2/4] md: allow setting newly added device to 'in_sync' via sysfs.
From: NeilBrown @ 2009-04-17 2:14 UTC (permalink / raw)
To: linux-raid; +Cc: NeilBrown
When adding devices to an active array via sysfs, there is currently
no way to mark a device as 'in-sync', which is useful when
incrementally assembling an array.
So add an 'insync' keyword to the device's state attribute, accepted
only while the device has no slot in the array.
Signed-off-by: NeilBrown <neilb@suse.de>
---
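For illustration only, a minimal user-space sketch of how a tool might
use the new keyword. It assumes the per-device state attribute is found
at a path of the form /sys/block/md0/md/dev-sdb1/state (array and device
names are placeholders); write_sysfs_attr() and mark_device_insync() are
hypothetical helpers, not part of any library. Per the diff below, the
write fails with EINVAL unless the device's raid_disk is -1.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a keyword to a sysfs attribute file.
 * Returns 0 on success or a negative errno value on failure.
 */
int write_sysfs_attr(const char *path, const char *value)
{
    size_t len;
    ssize_t n;
    int fd, err = 0;

    assert(path && value);
    len = strlen(value);

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    n = write(fd, value, len);
    if (n < 0)
        err = -errno;           /* e.g. -EINVAL if the kernel rejects it */
    else if ((size_t)n != len)
        err = -EIO;             /* short write: treat as failure */

    if (close(fd) < 0 && !err)
        err = -errno;
    return err;
}

/*
 * Mark a newly added, not yet active device as in-sync.
 * state_path is assumed to look like /sys/block/md0/md/dev-sdb1/state.
 */
int mark_device_insync(const char *state_path)
{
    return write_sysfs_attr(state_path, "insync");
}
```

A caller would normally do this before assigning the device a slot,
since the keyword is rejected once the device is active.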
drivers/md/md.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index ed5727c..298731b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2086,6 +2086,7 @@ state_store(mdk_rdev_t *rdev, const char *buf, size_t len)
* -writemostly - clears write_mostly
* blocked - sets the Blocked flag
* -blocked - clears the Blocked flag
+ * insync - sets Insync providing device isn't active
*/
int err = -EINVAL;
if (cmd_match(buf, "faulty") && rdev->mddev->pers) {
@@ -2118,6 +2119,9 @@ state_store(mdk_rdev_t *rdev, const char *buf, size_t len)
md_wakeup_thread(rdev->mddev->thread);
err = 0;
+ } else if (cmd_match(buf, "insync") && rdev->raid_disk == -1) {
+ set_bit(In_sync, &rdev->flags);
+ err = 0;
}
if (!err && rdev->sysfs_state)
sysfs_notify_dirent(rdev->sysfs_state);
@@ -2190,7 +2194,7 @@ slot_store(mdk_rdev_t *rdev, const char *buf, size_t len)
} else if (rdev->mddev->pers) {
mdk_rdev_t *rdev2;
/* Activating a spare .. or possibly reactivating
- * if we every get bitmaps working here.
+ * if we ever get bitmaps working here.
*/
if (rdev->raid_disk != -1)
* [md PATCH 3/4] md: improve usefulness and accuracy of sysfs file md/sync_completed.
From: NeilBrown @ 2009-04-17 2:14 UTC (permalink / raw)
To: linux-raid; +Cc: NeilBrown
The sync_completed file reports how much of a resync (or recovery or
reshape) has been completed.
However, due to the possibility of out-of-order completion of writes,
it is not certain to be accurate.
We have an internal value - mddev->curr_resync_completed - which is
accurate (though it might not always be quite so up to date).
So:
- make curr_resync_completed be up to date a little more often,
particularly when raid5 reshape updates status in the metadata
- report curr_resync_completed in the sysfs file
- allow poll/select to report all updates to md/sync_completed.
This makes sync_completed usable by any external metadata
handler that wants to record this status information in its metadata.
Signed-off-by: NeilBrown <neilb@suse.de>
---
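As a rough sketch of how an external metadata handler might consume
the file after this patch: the code below parses the two output forms
produced by sync_completed_show() ("none" or "<done> / <total>", in
sectors) and waits for a sysfs_notify() event with poll(). Both
function names are hypothetical, and the usual sysfs convention of
seeking back to offset 0 before re-reading is assumed.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Parse the text of md/sync_completed.
 * Returns 1 and fills *done and *total for "<done> / <total>",
 * 0 for "none" (no resync, recovery or reshape is running),
 * and -1 if the text is not recognised.
 */
int parse_sync_completed(const char *text,
                         unsigned long long *done,
                         unsigned long long *total)
{
    assert(text && done && total);
    if (strncmp(text, "none", 4) == 0)
        return 0;
    if (sscanf(text, "%llu / %llu", done, total) == 2)
        return 1;
    return -1;
}

/*
 * Block until the kernel signals a change, then re-read the attribute
 * into buf. fd must be open on sync_completed and must already have
 * been read once, otherwise poll() returns immediately.
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t wait_for_sync_update(int fd, char *buf, size_t len)
{
    struct pollfd pfd = { .fd = fd, .events = POLLPRI };
    ssize_t n;

    assert(buf && len > 0);
    if (poll(&pfd, 1, -1) < 0)
        return -1;
    /* sysfs attributes are re-read from the start of the file */
    if (lseek(fd, 0, SEEK_SET) < 0)
        return -1;
    n = read(fd, buf, len - 1);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```

A handler would loop on wait_for_sync_update(), pass the buffer to
parse_sync_completed(), and record the reported position whenever the
return value is 1.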
drivers/md/bitmap.c | 1 +
drivers/md/md.c | 34 ++++++++++++++++++++++------------
drivers/md/raid5.c | 4 ++++
3 files changed, 27 insertions(+), 12 deletions(-)
diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index f8a9f7a..e4510c9 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -1479,6 +1479,7 @@ void bitmap_cond_end_sync(struct bitmap *bitmap, sector_t sector)
s += blocks;
}
bitmap->last_end_sync = jiffies;
+ sysfs_notify(&bitmap->mddev->kobj, NULL, "sync_completed");
}
static void bitmap_set_memory_bits(struct bitmap *bitmap, sector_t offset, int needed)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 298731b..7af64f3 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2017,6 +2017,8 @@ repeat:
clear_bit(MD_CHANGE_PENDING, &mddev->flags);
spin_unlock_irq(&mddev->write_lock);
wake_up(&mddev->sb_wait);
+ if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
+ sysfs_notify(&mddev->kobj, NULL, "sync_completed");
}
@@ -3486,12 +3488,15 @@ sync_completed_show(mddev_t *mddev, char *page)
{
unsigned long max_sectors, resync;
+ if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery))
+ return sprintf(page, "none\n");
+
if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery))
max_sectors = mddev->resync_max_sectors;
else
max_sectors = mddev->dev_sectors;
- resync = (mddev->curr_resync - atomic_read(&mddev->recovery_active));
+ resync = mddev->curr_resync_completed;
return sprintf(page, "%lu / %lu\n", resync, max_sectors);
}
@@ -6338,18 +6343,12 @@ void md_do_sync(mddev_t *mddev)
sector_t sectors;
skipped = 0;
- if (j >= mddev->resync_max) {
- sysfs_notify(&mddev->kobj, NULL, "sync_completed");
- wait_event(mddev->recovery_wait,
- mddev->resync_max > j
- || kthread_should_stop());
- }
- if (kthread_should_stop())
- goto interrupted;
- if (mddev->curr_resync > mddev->curr_resync_completed &&
- (mddev->curr_resync - mddev->curr_resync_completed)
- > (max_sectors >> 4)) {
+ if ((mddev->curr_resync > mddev->curr_resync_completed &&
+ (mddev->curr_resync - mddev->curr_resync_completed)
+ > (max_sectors >> 4)) ||
+ j >= mddev->resync_max
+ ) {
/* time to update curr_resync_completed */
blk_unplug(mddev->queue);
wait_event(mddev->recovery_wait,
@@ -6357,7 +6356,17 @@ void md_do_sync(mddev_t *mddev)
mddev->curr_resync_completed =
mddev->curr_resync;
set_bit(MD_CHANGE_CLEAN, &mddev->flags);
+ sysfs_notify(&mddev->kobj, NULL, "sync_completed");
}
+
+ if (j >= mddev->resync_max)
+ wait_event(mddev->recovery_wait,
+ mddev->resync_max > j
+ || kthread_should_stop());
+
+ if (kthread_should_stop())
+ goto interrupted;
+
sectors = mddev->pers->sync_request(mddev, j, &skipped,
currspeed < speed_min(mddev));
if (sectors == 0) {
@@ -6465,6 +6474,7 @@ void md_do_sync(mddev_t *mddev)
skip:
mddev->curr_resync = 0;
+ mddev->curr_resync_completed = 0;
mddev->resync_min = 0;
mddev->resync_max = MaxSector;
sysfs_notify(&mddev->kobj, NULL, "sync_completed");
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 3bbc6d6..76892ac 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3845,6 +3845,7 @@ static sector_t reshape_request(mddev_t *mddev, sector_t sector_nr, int *skipped
wait_event(conf->wait_for_overlap,
atomic_read(&conf->reshape_stripes)==0);
mddev->reshape_position = conf->reshape_progress;
+ mddev->curr_resync_completed = mddev->curr_resync;
conf->reshape_checkpoint = jiffies;
set_bit(MD_CHANGE_DEVS, &mddev->flags);
md_wakeup_thread(mddev->thread);
@@ -3854,6 +3855,7 @@ static sector_t reshape_request(mddev_t *mddev, sector_t sector_nr, int *skipped
conf->reshape_safe = mddev->reshape_position;
spin_unlock_irq(&conf->device_lock);
wake_up(&conf->wait_for_overlap);
+ sysfs_notify(&mddev->kobj, NULL, "sync_completed");
}
if (mddev->delta_disks < 0) {
@@ -3943,6 +3945,7 @@ static sector_t reshape_request(mddev_t *mddev, sector_t sector_nr, int *skipped
wait_event(conf->wait_for_overlap,
atomic_read(&conf->reshape_stripes) == 0);
mddev->reshape_position = conf->reshape_progress;
+ mddev->curr_resync_completed = mddev->curr_resync;
conf->reshape_checkpoint = jiffies;
set_bit(MD_CHANGE_DEVS, &mddev->flags);
md_wakeup_thread(mddev->thread);
@@ -3953,6 +3956,7 @@ static sector_t reshape_request(mddev_t *mddev, sector_t sector_nr, int *skipped
conf->reshape_safe = mddev->reshape_position;
spin_unlock_irq(&conf->device_lock);
wake_up(&conf->wait_for_overlap);
+ sysfs_notify(&mddev->kobj, NULL, "sync_completed");
}
return reshape_sectors;
}
* [md PATCH 4/4] md: update sync_completed and reshape_position even more often.
From: NeilBrown @ 2009-04-17 2:14 UTC (permalink / raw)
To: linux-raid; +Cc: NeilBrown
There are circumstances when a user-space process might need to
"oversee" a resync/reshape process. For example, when doing an
in-place reshape of a raid5, it is prudent to take a backup of each
section before reshaping it, as this is the only way to provide
safety against an unplanned shutdown (i.e. crash/power failure).
The sync_max sysfs value can be used to stop the resync from
advancing beyond a particular point.
So user-space can:
- suspend IO to the first section and back it up
- set 'sync_max' to the end of the section
- wait for 'sync_completed' to reach that point
- resume IO on the first section and move on to the next section.
However this process requires the kernel and user-space to run in
lock-step which could introduce unnecessary delays.
It would be better if a 'double buffered' approach could be used with
userspace and kernel space working on different sections with the
'next' section always ready when the 'current' section is finished.
One problem with implementing this is that sync_completed is only
guaranteed to be updated when the sync process reaches sync_max.
(At other times it is updated on a timed basis, but that is hard to
rely on.) This defeats some of the double buffering.
With this patch, sync_completed (and reshape_position) get updated as
the current position approaches sync_max, so there is room for
userspace to advance sync_max early without losing updates.
To be precise, sync_completed is updated when the current sync
position reaches half way between the current value of sync_completed
and the value of sync_max. This will usually be a good time for user
space to update sync_max.
If sync_max does not get updated, the updates to sync_completed
(together with associated metadata updates) will occur at an
exponentially increasing frequency, which will get unreasonably fast
(one update every page) immediately before the process hits sync_max
and stops. So the update rate will be unreasonably fast only for an
insignificant period of time.
Signed-off-by: NeilBrown <neilb@suse.de>
---
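To make the "half way" rule concrete, here is a small user-space model
of the new condition. It is only a sketch: sector_count stands in for
the kernel's sector_t, past_halfway() and simulate_updates() are
hypothetical names, and the model ignores both the existing
max_sectors >> 4 trigger and the wait for in-flight I/O that precedes
each real update.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long sector_count; /* stand-in for sector_t */

/*
 * Mirror of the new trigger: true once position j is at least half way
 * from the last reported position to sync_max.
 * Assumes completed <= j and completed <= sync_max, because the
 * arithmetic is unsigned.
 */
int past_halfway(sector_count j, sector_count completed,
                 sector_count sync_max)
{
    assert(completed <= j && completed <= sync_max);
    return (j - completed) * 2 >= sync_max - completed;
}

/*
 * Advance j from 0 to sync_max in fixed steps and record each position
 * at which sync_completed would be updated when sync_max is never
 * moved. Stores up to cap positions in out and returns the total
 * number of updates.
 */
size_t simulate_updates(sector_count sync_max, sector_count step,
                        sector_count *out, size_t cap)
{
    sector_count completed = 0, j;
    size_t n = 0;

    assert(step > 0);
    for (j = 0; j <= sync_max; j += step) {
        if (past_halfway(j, completed, sync_max)) {
            completed = j;      /* sync_completed catches up to j */
            if (n < cap)
                out[n] = j;
            n++;
        }
    }
    return n;
}
```

With sync_max = 1024 sectors and a step of 8 sectors (one page,
assuming 512-byte sectors and 4 KiB pages), the model reports updates
at 512, 768, 896, 960, 992, 1008, 1016 and 1024: the gap halves each
time until it reaches the step size, which is the exponentially
increasing frequency described above.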
drivers/md/md.c | 3 ++-
drivers/md/raid5.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 7af64f3..612343f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6347,7 +6347,8 @@ void md_do_sync(mddev_t *mddev)
if ((mddev->curr_resync > mddev->curr_resync_completed &&
(mddev->curr_resync - mddev->curr_resync_completed)
> (max_sectors >> 4)) ||
- j >= mddev->resync_max
+ (j - mddev->curr_resync_completed)*2
+ >= mddev->resync_max - mddev->curr_resync_completed
) {
/* time to update curr_resync_completed */
blk_unplug(mddev->queue);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 76892ac..4616bc3 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3940,7 +3940,8 @@ static sector_t reshape_request(mddev_t *mddev, sector_t sector_nr, int *skipped
* then we need to write out the superblock.
*/
sector_nr += reshape_sectors;
- if (sector_nr >= mddev->resync_max) {
+ if ((sector_nr - mddev->curr_resync_completed) * 2
+ >= mddev->resync_max - mddev->curr_resync_completed) {
/* Cannot proceed until we've updated the superblock... */
wait_event(conf->wait_for_overlap,
atomic_read(&conf->reshape_stripes) == 0);