* [PATCH 000 of 10] md: Introduction - assorted patches for -mm
@ 2006-06-01 5:13 NeilBrown
2006-06-01 5:13 ` [PATCH 001 of 10] md: md Kconfig speeling feex NeilBrown
` (9 more replies)
0 siblings, 10 replies; 17+ messages in thread
From: NeilBrown @ 2006-06-01 5:13 UTC
To: Andrew Morton; +Cc: linux-raid, linux-kernel
Following are 10 patches for md in -mm. None are appropriate for 2.6.17.
First 3 are fixes.
Next 2 are functionality improvements (people keep wanting their spare
drives to go to sleep, so now they can).
Remainder are more sysfs access.
Thanks,
NeilBrown
[PATCH 001 of 10] md: md Kconfig speeling feex
[PATCH 002 of 10] md: Fix Kconfig error
[PATCH 003 of 10] md: Fix bug that stops raid5 resync from happening
[PATCH 004 of 10] md: Allow re-add to work on array without bitmaps.
[PATCH 005 of 10] md: Don't write dirty/clean update to spares - leave them alone
[PATCH 006 of 10] md: Set/get state of array via sysfs
[PATCH 007 of 10] md: Allow rdev state to be set via sysfs.
[PATCH 008 of 10] md: Allow raid 'layout' to be read and set via sysfs.
[PATCH 009 of 10] md: Allow resync_start to be set and queried via sysfs.
[PATCH 010 of 10] md: Allow the write_mostly flag to be set via sysfs.
* [PATCH 001 of 10] md: md Kconfig speeling feex
From: NeilBrown @ 2006-06-01 5:13 UTC
To: Andrew Morton; +Cc: linux-raid, linux-kernel, Justin Piszcz
From: Justin Piszcz <jpiszcz@lucidpixels.com>
I was experimenting with Linux SW raid today and found a spelling error
when reading the help menus...
(and fly spell found more).
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/Kconfig | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff ./drivers/md/Kconfig~current~ ./drivers/md/Kconfig
--- ./drivers/md/Kconfig~current~ 2006-06-01 15:03:29.000000000 +1000
+++ ./drivers/md/Kconfig 2006-06-01 15:04:25.000000000 +1000
@@ -90,7 +90,7 @@ config MD_RAID10
depends on BLK_DEV_MD && EXPERIMENTAL
---help---
RAID-10 provides a combination of striping (RAID-0) and
- mirroring (RAID-1) with easier configuration and more flexable
+ mirroring (RAID-1) with easier configuration and more flexible
layout.
Unlike RAID-0, but like RAID-1, RAID-10 requires all devices to
be the same size (or at least, only as much as the smallest device
@@ -147,7 +147,7 @@ config MD_RAID5_RESHAPE
is online. However it is still EXPERIMENTAL code. It should
work, but please be sure that you have backups.
- You will need mdadm verion 2.4.1 or later to use this
+ You will need mdadm version 2.4.1 or later to use this
feature safely. During the early stage of reshape there is
a critical section where live data is being over-written. A
crash during this time needs extra care for recovery. The
@@ -221,7 +221,7 @@ config DM_SNAPSHOT
tristate "Snapshot target (EXPERIMENTAL)"
depends on BLK_DEV_DM && EXPERIMENTAL
---help---
- Allow volume managers to take writeable snapshots of a device.
+ Allow volume managers to take writable snapshots of a device.
config DM_MIRROR
tristate "Mirror target (EXPERIMENTAL)"
* [PATCH 002 of 10] md: Fix Kconfig error
From: NeilBrown @ 2006-06-01 5:13 UTC
To: Andrew Morton; +Cc: linux-raid, linux-kernel
The MD_RAID5 config symbol was recently renamed to MD_RAID456, so
MD_RAID5_RESHAPE must depend on the new name.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff ./drivers/md/Kconfig~current~ ./drivers/md/Kconfig
--- ./drivers/md/Kconfig~current~ 2006-06-01 15:04:25.000000000 +1000
+++ ./drivers/md/Kconfig 2006-06-01 15:05:17.000000000 +1000
@@ -137,7 +137,7 @@ config MD_RAID456
config MD_RAID5_RESHAPE
bool "Support adding drives to a raid-5 array (experimental)"
- depends on MD_RAID5 && EXPERIMENTAL
+ depends on MD_RAID456 && EXPERIMENTAL
---help---
A RAID-5 set can be expanded by adding extra drives. This
requires "restriping" the array which means (almost) every
* [PATCH 003 of 10] md: Fix bug that stops raid5 resync from happening
From: NeilBrown @ 2006-06-01 5:13 UTC
To: Andrew Morton; +Cc: linux-raid, linux-kernel
As data_disks is *less* than raid_disks, the current test here is
obviously wrong. And as the difference is already available in
conf->max_degraded, it makes much more sense to use that.
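To see why the old test misfires, here is a minimal userspace sketch.
It assumes that data_disks is raid_disks minus conf->max_degraded (as
the message implies) and that all values are plain signed ints. The
helper names old_test and new_test are invented for illustration and
do not exist in raid5.c.

```c
#include <stdbool.h>

/* Old condition: data_disks - raid_disks equals -max_degraded, which is
 * negative, so every degraded count >= 0 satisfies the comparison. */
static bool old_test(int degraded, int raid_disks, int max_degraded)
{
    int data_disks = raid_disks - max_degraded;
    return degraded >= (data_disks - raid_disks);
}

/* New condition: true only once the array has lost as many devices as
 * the level can tolerate. */
static bool new_test(int degraded, int max_degraded)
{
    return degraded >= max_degraded;
}
```

With a healthy 4-disk raid5 (degraded == 0, max_degraded == 1) the old
form already reports "nothing we can do", which would explain the
skipped resync; the new form does so only when degraded reaches 1.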
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/raid5.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~ 2006-06-01 15:03:29.000000000 +1000
+++ ./drivers/md/raid5.c 2006-06-01 15:05:28.000000000 +1000
@@ -2858,7 +2858,7 @@ static inline sector_t sync_request(mdde
* to resync, then assert that we are finished, because there is
* nothing we can do.
*/
- if (mddev->degraded >= (data_disks - raid_disks) &&
+ if (mddev->degraded >= conf->max_degraded &&
test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
sector_t rv = (mddev->size << 1) - sector_nr;
*skipped = 1;
* [PATCH 004 of 10] md: Allow re-add to work on array without bitmaps.
From: NeilBrown @ 2006-06-01 5:13 UTC
To: Andrew Morton; +Cc: linux-raid, linux-kernel
When an array has a bitmap, a device can be removed and re-added
and only blocks changed since the removal (as recorded in the bitmap)
will be resynced.
It should be possible to do a similar thing to arrays without bitmaps.
i.e. if a device is removed and re-added and *no* changes have been
made in the interim, then the add should not require a resync.
This patch allows that option.
This means that when assembling an array one device at a time (e.g.
during device discovery) the array can be enabled read-only as soon
as enough devices are available, but extra devices can still be added
without causing a resync.
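The rule for adding a device to a running array can be sketched roughly
as below. This is a userspace illustration only: can_readd and its
parameters are hypothetical names, and the real checks live in
super_90_validate() and super_1_validate() in the diff that follows.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when a device added to a running array may take back its
 * old slot rather than being treated as a brand-new hot-added device. */
static bool can_readd(uint64_t dev_events, uint64_t array_events,
                      bool has_bitmap, uint64_t events_cleared)
{
    if (has_bitmap)
        /* The bitmap records later changes, so an older device is
         * acceptable as long as the bitmap still covers the gap. */
        return dev_events >= events_cleared;

    /* No bitmap: only a device that missed no events qualifies. */
    return dev_events >= array_events;
}
```

In the no-bitmap case a device whose event count matches the array is
re-added in place, and the raid1 change in this patch then lets the
sync pass be skipped when there is nothing to recover.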
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/md.c | 25 ++++++++++++++-----------
./drivers/md/raid1.c | 6 ++++++
2 files changed, 20 insertions(+), 11 deletions(-)
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2006-06-01 15:03:28.000000000 +1000
+++ ./drivers/md/md.c 2006-06-01 15:05:29.000000000 +1000
@@ -737,6 +737,7 @@ static int super_90_validate(mddev_t *md
{
mdp_disk_t *desc;
mdp_super_t *sb = (mdp_super_t *)page_address(rdev->sb_page);
+ __u64 ev1 = md_event(sb);
rdev->raid_disk = -1;
rdev->flags = 0;
@@ -753,7 +754,7 @@ static int super_90_validate(mddev_t *md
mddev->layout = sb->layout;
mddev->raid_disks = sb->raid_disks;
mddev->size = sb->size;
- mddev->events = md_event(sb);
+ mddev->events = ev1;
mddev->bitmap_offset = 0;
mddev->default_bitmap_offset = MD_SB_BYTES >> 9;
@@ -802,7 +803,6 @@ static int super_90_validate(mddev_t *md
} else if (mddev->pers == NULL) {
/* Insist on good event counter while assembling */
- __u64 ev1 = md_event(sb);
++ev1;
if (ev1 < mddev->events)
return -EINVAL;
@@ -810,11 +810,13 @@ static int super_90_validate(mddev_t *md
/* if adding to array with a bitmap, then we can accept an
* older device ... but not too old.
*/
- __u64 ev1 = md_event(sb);
if (ev1 < mddev->bitmap->events_cleared)
return 0;
- } else /* just a hot-add of a new device, leave raid_disk at -1 */
- return 0;
+ } else {
+ if (ev1 < mddev->events)
+ /* just a hot-add of a new device, leave raid_disk at -1 */
+ return 0;
+ }
if (mddev->level != LEVEL_MULTIPATH) {
desc = sb->disks + rdev->desc_nr;
@@ -1105,6 +1107,7 @@ static int super_1_load(mdk_rdev_t *rdev
static int super_1_validate(mddev_t *mddev, mdk_rdev_t *rdev)
{
struct mdp_superblock_1 *sb = (struct mdp_superblock_1*)page_address(rdev->sb_page);
+ __u64 ev1 = le64_to_cpu(sb->events);
rdev->raid_disk = -1;
rdev->flags = 0;
@@ -1120,7 +1123,7 @@ static int super_1_validate(mddev_t *mdd
mddev->layout = le32_to_cpu(sb->layout);
mddev->raid_disks = le32_to_cpu(sb->raid_disks);
mddev->size = le64_to_cpu(sb->size)/2;
- mddev->events = le64_to_cpu(sb->events);
+ mddev->events = ev1;
mddev->bitmap_offset = 0;
mddev->default_bitmap_offset = 1024 >> 9;
@@ -1154,7 +1157,6 @@ static int super_1_validate(mddev_t *mdd
} else if (mddev->pers == NULL) {
/* Insist of good event counter while assembling */
- __u64 ev1 = le64_to_cpu(sb->events);
++ev1;
if (ev1 < mddev->events)
return -EINVAL;
@@ -1162,12 +1164,13 @@ static int super_1_validate(mddev_t *mdd
/* If adding to array with a bitmap, then we can accept an
* older device, but not too old.
*/
- __u64 ev1 = le64_to_cpu(sb->events);
if (ev1 < mddev->bitmap->events_cleared)
return 0;
- } else /* just a hot-add of a new device, leave raid_disk at -1 */
- return 0;
-
+ } else {
+ if (ev1 < mddev->events)
+ /* just a hot-add of a new device, leave raid_disk at -1 */
+ return 0;
+ }
if (mddev->level != LEVEL_MULTIPATH) {
int role;
rdev->desc_nr = le32_to_cpu(sb->dev_number);
diff ./drivers/md/raid1.c~current~ ./drivers/md/raid1.c
--- ./drivers/md/raid1.c~current~ 2006-06-01 15:03:28.000000000 +1000
+++ ./drivers/md/raid1.c 2006-06-01 15:05:29.000000000 +1000
@@ -1625,6 +1625,12 @@ static sector_t sync_request(mddev_t *md
/* before building a request, check if we can skip these blocks..
* This call the bitmap_start_sync doesn't actually record anything
*/
+ if (mddev->bitmap == NULL &&
+ mddev->recovery_cp == MaxSector &&
+ conf->fullsync == 0) {
+ *skipped = 1;
+ return max_sector - sector_nr;
+ }
if (!bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, 1) &&
!conf->fullsync && !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
/* We can skip this block, and probably several more */
* [PATCH 005 of 10] md: Don't write dirty/clean update to spares - leave them alone
From: NeilBrown @ 2006-06-01 5:13 UTC
To: Andrew Morton; +Cc: linux-raid, linux-kernel
 - record the 'event' count on each individual device (they
   might sometimes be slightly different now)
 - add a new value for 'sb_dirty': '3' means that the superblock
   only needs to be updated to record a clean<->dirty
   transition.
 - Prefer odd event numbers for dirty states and even numbers
   for clean states
 - Using all the above, don't update the superblock on
   a spare device if the update is just doing a clean<->dirty
   transition.  To accommodate this, a transition from
   dirty back to clean might now decrement the events counter
   if nothing else has changed.
The net effect of this is that spare drives will not see any IO
requests during normal running of the array, so they can go to sleep
if that is what they want to do.
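The even/odd rule described above can be modelled in userspace roughly
as follows. This is a sketch, not the kernel code: next_events and its
parameters are made-up names, only_clean_dirty stands in for
sb_dirty == 3, and the counter wrap-around check in md_update_sb() is
left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute the event count for the next superblock update.
 *   only_clean_dirty: the update records nothing but a clean<->dirty flip
 *   clean:            state of the array once the update is written
 *   *skip_spares:     set when spare devices may be left untouched */
static uint64_t next_events(uint64_t events, bool only_clean_dirty,
                            bool clean, bool *skip_spares)
{
    *skip_spares = only_clean_dirty;

    /* Dirty -> clean with nothing else changed: roll back to the
     * previous (even) clean count. */
    if (only_clean_dirty && clean && (events & 1))
        return events - 1;

    events++;
    /* Dirty states want an odd count, clean states an even one. If the
     * parity is wrong, step again; spares must then be written too. */
    if (((events & 1) == 0) != clean) {
        events++;
        *skip_spares = false;
    }
    return events;
}
```

Starting from a clean array at count 4, the first write moves it to 5
and going idle moves it back to 4, so a spare whose superblock still
says 4 never needs to be written during that cycle.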
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./drivers/md/md.c | 65 ++++++++++++++++++++++++++++++++++++++------
./include/linux/raid/md_k.h | 1
2 files changed, 58 insertions(+), 8 deletions(-)
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2006-06-01 15:05:29.000000000 +1000
+++ ./drivers/md/md.c 2006-06-01 15:05:29.000000000 +1000
@@ -1558,15 +1558,30 @@ static void md_print_devices(void)
}
-static void sync_sbs(mddev_t * mddev)
+static void sync_sbs(mddev_t * mddev, int nospares)
{
+ /* Update each superblock (in-memory image), but
+ * if we are allowed to, skip spares which already
+ * have the right event counter, or have one earlier
+ * (which would mean they aren't being marked as dirty
+ * with the rest of the array)
+ */
mdk_rdev_t *rdev;
struct list_head *tmp;
ITERATE_RDEV(mddev,rdev,tmp) {
- super_types[mddev->major_version].
- sync_super(mddev, rdev);
- rdev->sb_loaded = 1;
+ if (rdev->sb_events == mddev->events ||
+ (nospares &&
+ rdev->raid_disk < 0 &&
+ (rdev->sb_events&1)==0 &&
+ rdev->sb_events+1 == mddev->events)) {
+ /* Don't update this superblock */
+ rdev->sb_loaded = 2;
+ } else {
+ super_types[mddev->major_version].
+ sync_super(mddev, rdev);
+ rdev->sb_loaded = 1;
+ }
}
}
@@ -1576,12 +1591,42 @@ void md_update_sb(mddev_t * mddev)
struct list_head *tmp;
mdk_rdev_t *rdev;
int sync_req;
+ int nospares = 0;
repeat:
spin_lock_irq(&mddev->write_lock);
sync_req = mddev->in_sync;
mddev->utime = get_seconds();
- mddev->events ++;
+ if (mddev->sb_dirty == 3)
+ /* just a clean<-> dirty transition, possibly leave spares alone,
+ * though if events isn't the right even/odd, we will have to do
+ * spares after all
+ */
+ nospares = 1;
+
+ /* If this is just a dirty<->clean transition, and the array is clean
+ * and 'events' is odd, we can roll back to the previous clean state */
+ if (mddev->sb_dirty == 3
+ && (mddev->in_sync && mddev->recovery_cp == MaxSector)
+ && (mddev->events & 1))
+ mddev->events--;
+ else {
+ /* otherwise we have to go forward and ... */
+ mddev->events ++;
+ if (!mddev->in_sync || mddev->recovery_cp != MaxSector) { /* not clean */
+ /* .. if the array isn't clean, insist on an odd 'events' */
+ if ((mddev->events&1)==0) {
+ mddev->events++;
+ nospares = 0;
+ }
+ } else {
+ /* otherwise insist on an even 'events' (for clean states) */
+ if ((mddev->events&1)) {
+ mddev->events++;
+ nospares = 0;
+ }
+ }
+ }
if (!mddev->events) {
/*
@@ -1593,7 +1638,7 @@ repeat:
mddev->events --;
}
mddev->sb_dirty = 2;
- sync_sbs(mddev);
+ sync_sbs(mddev, nospares);
/*
* do not write anything to disk if using
@@ -1615,6 +1660,8 @@ repeat:
ITERATE_RDEV(mddev,rdev,tmp) {
char b[BDEVNAME_SIZE];
dprintk(KERN_INFO "md: ");
+ if (rdev->sb_loaded != 1)
+ continue; /* no noise on spare devices */
if (test_bit(Faulty, &rdev->flags))
dprintk("(skipping faulty ");
@@ -1626,6 +1673,7 @@ repeat:
dprintk(KERN_INFO "(write) %s's sb offset: %llu\n",
bdevname(rdev->bdev,b),
(unsigned long long)rdev->sb_offset);
+ rdev->sb_events = mddev->events;
} else
dprintk(")\n");
@@ -1895,6 +1943,7 @@ static mdk_rdev_t *md_import_device(dev_
rdev->desc_nr = -1;
rdev->flags = 0;
rdev->data_offset = 0;
+ rdev->sb_events = 0;
atomic_set(&rdev->nr_pending, 0);
atomic_set(&rdev->read_errors, 0);
atomic_set(&rdev->corrected_errors, 0);
@@ -4708,7 +4757,7 @@ void md_write_start(mddev_t *mddev, stru
spin_lock_irq(&mddev->write_lock);
if (mddev->in_sync) {
mddev->in_sync = 0;
- mddev->sb_dirty = 1;
+ mddev->sb_dirty = 3;
md_wakeup_thread(mddev->thread);
}
spin_unlock_irq(&mddev->write_lock);
@@ -5055,7 +5104,7 @@ void md_check_recovery(mddev_t *mddev)
if (mddev->safemode && !atomic_read(&mddev->writes_pending) &&
!mddev->in_sync && mddev->recovery_cp == MaxSector) {
mddev->in_sync = 1;
- mddev->sb_dirty = 1;
+ mddev->sb_dirty = 3;
}
if (mddev->safemode == 1)
mddev->safemode = 0;
diff ./include/linux/raid/md_k.h~current~ ./include/linux/raid/md_k.h
--- ./include/linux/raid/md_k.h~current~ 2006-06-01 15:03:28.000000000 +1000
+++ ./include/linux/raid/md_k.h 2006-06-01 15:05:29.000000000 +1000
@@ -58,6 +58,7 @@ struct mdk_rdev_s
struct page *sb_page;
int sb_loaded;
+ __u64 sb_events;
sector_t data_offset; /* start of data in array */
sector_t sb_offset;
int sb_size; /* bytes in the superblock */
* [PATCH 006 of 10] md: Set/get state of array via sysfs
From: NeilBrown @ 2006-06-01 5:13 UTC
To: Andrew Morton; +Cc: linux-raid, linux-kernel
This allows the state of an md/array to be directly controlled
via sysfs and adds the ability to stop an array without
tearing it down.
Array states/settings:

   clear
     No devices, no size, no level
     Equivalent to STOP_ARRAY ioctl
   inactive
     May have some settings, but array is not active
     all IO results in error
     When written, doesn't tear down array, but just stops it
   suspended (not supported yet)
     All IO requests will block. The array can be reconfigured.
     Writing this, if accepted, will block until array is quiescent
   readonly
     no resync can happen.  no superblocks get written.
     write requests fail
   read-auto
     like readonly, but behaves like 'clean' on a write request.

   clean - no pending writes, but otherwise active.
     When written to inactive array, starts without resync
     If a write request arrives then
       if metadata is known, mark 'dirty' and switch to 'active'.
       if not known, block and switch to write-pending
     If written to an active array that has pending writes, then fails.
   active
     fully active: IO and resync can be happening.
     When written to inactive array, starts with resync

   write-pending (not supported yet)
     clean, but writes are blocked waiting for 'active' to be written.

   active-idle
     like active, but no writes have been seen for a while (100msec).
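The mapping from a written word to one of the states listed above can
be tried out in userspace with a sketch like this one. The enum and
string table mirror the ones added to md.c in the diff below;
word_match is a stand-in written for this example (the kernel uses its
own cmd_match helper), and parse_state is a hypothetical name.

```c
#include <string.h>

enum array_state { clear, inactive, suspended, readonly, read_auto, clean,
                   active, write_pending, active_idle, bad_word };

static const char *const array_states[] = {
    "clear", "inactive", "suspended", "readonly", "read-auto", "clean",
    "active", "write-pending", "active-idle", NULL };

/* Stand-in matcher: accept the word with or without one trailing
 * newline, which is what "echo word > file" would deliver. */
static int word_match(const char *buf, const char *word)
{
    size_t n = strlen(word);

    if (strncmp(buf, word, n) != 0)
        return 0;
    return buf[n] == '\0' || (buf[n] == '\n' && buf[n + 1] == '\0');
}

/* Index of the first matching state name; when nothing matches the
 * loop runs to the NULL terminator, whose index is bad_word. */
static enum array_state parse_state(const char *buf)
{
    int n;

    for (n = 0; array_states[n]; n++)
        if (word_match(buf, array_states[n]))
            break;
    return (enum array_state)n;
}
```

Keeping bad_word as the last enumerator, aligned with the NULL
terminator of the table, is what lets the "no match" case fall out of
the loop without a separate check.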
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./Documentation/md.txt | 39 +++++++++
./drivers/md/md.c | 197 ++++++++++++++++++++++++++++++++++++++++++++++---
2 files changed, 227 insertions(+), 9 deletions(-)
diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~ 2006-06-01 15:03:28.000000000 +1000
+++ ./Documentation/md.txt 2006-06-01 15:05:29.000000000 +1000
@@ -216,6 +216,45 @@ All md devices contain:
period as a number of seconds. The default is 200msec (0.200).
Writing a value of 0 disables safemode.
+ array_state
+ This file contains a single word which describes the current
+ state of the array. In many cases, the state can be set by
+ writing the word for the desired state, however some states
+ cannot be explicitly set, and some transitions are not allowed.
+
+ clear
+ No devices, no size, no level
+ Writing is equivalent to STOP_ARRAY ioctl
+ inactive
+ May have some settings, but array is not active
+ all IO results in error
+ When written, doesn't tear down array, but just stops it
+ suspended (not supported yet)
+ All IO requests will block. The array can be reconfigured.
+ Writing this, if accepted, will block until array is quiescent
+ readonly
+ no resync can happen. no superblocks get written.
+ write requests fail
+ read-auto
+ like readonly, but behaves like 'clean' on a write request.
+
+ clean - no pending writes, but otherwise active.
+ When written to inactive array, starts without resync
+ If a write request arrives then
+ if metadata is known, mark 'dirty' and switch to 'active'.
+ if not known, block and switch to write-pending
+ If written to an active array that has pending writes, then fails.
+ active
+ fully active: IO and resync can be happening.
+ When written to inactive array, starts with resync
+
+ write-pending
+ clean, but writes are blocked waiting for 'active' to be written.
+
+ active-idle
+ like active, but no writes have been seen for a while (safe_mode_delay).
+
+
sync_speed_min
sync_speed_max
This are similar to /proc/sys/dev/raid/speed_limit_{min,max}
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2006-06-01 15:05:29.000000000 +1000
+++ ./drivers/md/md.c 2006-06-01 15:05:29.000000000 +1000
@@ -2185,6 +2185,176 @@ chunk_size_store(mddev_t *mddev, const c
static struct md_sysfs_entry md_chunk_size =
__ATTR(chunk_size, 0644, chunk_size_show, chunk_size_store);
+/*
+ * The array state can be:
+ *
+ * clear
+ * No devices, no size, no level
+ * Equivalent to STOP_ARRAY ioctl
+ * inactive
+ * May have some settings, but array is not active
+ * all IO results in error
+ * When written, doesn't tear down array, but just stops it
+ * suspended (not supported yet)
+ * All IO requests will block. The array can be reconfigured.
+ * Writing this, if accepted, will block until array is quiescent
+ * readonly
+ * no resync can happen. no superblocks get written.
+ * write requests fail
+ * read-auto
+ * like readonly, but behaves like 'clean' on a write request.
+ *
+ * clean - no pending writes, but otherwise active.
+ * When written to inactive array, starts without resync
+ * If a write request arrives then
+ * if metadata is known, mark 'dirty' and switch to 'active'.
+ * if not known, block and switch to write-pending
+ * If written to an active array that has pending writes, then fails.
+ * active
+ * fully active: IO and resync can be happening.
+ * When written to inactive array, starts with resync
+ *
+ * write-pending
+ * clean, but writes are blocked waiting for 'active' to be written.
+ *
+ * active-idle
+ * like active, but no writes have been seen for a while (100msec).
+ *
+ */
+enum array_state { clear, inactive, suspended, readonly, read_auto, clean, active,
+ write_pending, active_idle, bad_word};
+char *array_states[] = {
+ "clear", "inactive", "suspended", "readonly", "read-auto", "clean", "active",
+ "write-pending", "active-idle", NULL };
+
+static int match_word(const char *word, char **list)
+{
+ int n;
+ for (n=0; list[n]; n++)
+ if (cmd_match(word, list[n]))
+ break;
+ return n;
+}
+
+static ssize_t
+array_state_show(mddev_t *mddev, char *page)
+{
+ enum array_state st = inactive;
+
+ if (mddev->pers)
+ switch(mddev->ro) {
+ case 1:
+ st = readonly;
+ break;
+ case 2:
+ st = read_auto;
+ break;
+ case 0:
+ if (mddev->in_sync)
+ st = clean;
+ else if (mddev->safemode)
+ st = active_idle;
+ else
+ st = active;
+ }
+ else {
+ if (list_empty(&mddev->disks) &&
+ mddev->raid_disks == 0 &&
+ mddev->size == 0)
+ st = clear;
+ else
+ st = inactive;
+ }
+ return sprintf(page, "%s\n", array_states[st]);
+}
+
+static int do_md_stop(mddev_t * mddev, int ro);
+static int do_md_run(mddev_t * mddev);
+static int restart_array(mddev_t *mddev);
+
+static ssize_t
+array_state_store(mddev_t *mddev, const char *buf, size_t len)
+{
+ int err = -EINVAL;
+ enum array_state st = match_word(buf, array_states);
+ switch(st) {
+ case bad_word:
+ break;
+ case clear:
+ /* stopping an active array */
+ if (mddev->pers) {
+ if (atomic_read(&mddev->active) > 1)
+ return -EBUSY;
+ err = do_md_stop(mddev, 0);
+ }
+ break;
+ case inactive:
+ /* stopping an active array */
+ if (mddev->pers) {
+ if (atomic_read(&mddev->active) > 1)
+ return -EBUSY;
+ err = do_md_stop(mddev, 2);
+ }
+ break;
+ case suspended:
+ break; /* not supported yet */
+ case readonly:
+ if (mddev->pers)
+ err = do_md_stop(mddev, 1);
+ else {
+ mddev->ro = 1;
+ err = do_md_run(mddev);
+ }
+ break;
+ case read_auto:
+ /* stopping an active array */
+ if (mddev->pers) {
+ err = do_md_stop(mddev, 1);
+ if (err == 0)
+ mddev->ro = 2; /* FIXME mark devices writable */
+ } else {
+ mddev->ro = 2;
+ err = do_md_run(mddev);
+ }
+ break;
+ case clean:
+ if (mddev->pers) {
+ restart_array(mddev);
+ spin_lock_irq(&mddev->write_lock);
+ if (atomic_read(&mddev->writes_pending) == 0) {
+ mddev->in_sync = 1;
+ mddev->sb_dirty = 1;
+ }
+ spin_unlock_irq(&mddev->write_lock);
+ } else {
+ mddev->ro = 0;
+ mddev->recovery_cp = MaxSector;
+ err = do_md_run(mddev);
+ }
+ break;
+ case active:
+ if (mddev->pers) {
+ restart_array(mddev);
+ mddev->sb_dirty = 0;
+ wake_up(&mddev->sb_wait);
+ err = 0;
+ } else {
+ mddev->ro = 0;
+ err = do_md_run(mddev);
+ }
+ break;
+ case write_pending:
+ case active_idle:
+ /* these cannot be set */
+ break;
+ }
+ if (err)
+ return err;
+ else
+ return len;
+}
+static struct md_sysfs_entry md_array_state = __ATTR(array_state, 0644, array_state_show, array_state_store);
+
static ssize_t
null_show(mddev_t *mddev, char *page)
{
@@ -2553,6 +2723,7 @@ static struct attribute *md_default_attr
&md_metadata.attr,
&md_new_device.attr,
&md_safe_delay.attr,
+ &md_array_state.attr,
NULL,
};
@@ -2919,11 +3090,8 @@ static int restart_array(mddev_t *mddev)
md_wakeup_thread(mddev->thread);
md_wakeup_thread(mddev->sync_thread);
err = 0;
- } else {
- printk(KERN_ERR "md: %s has no personality assigned.\n",
- mdname(mddev));
+ } else
err = -EINVAL;
- }
out:
return err;
@@ -2955,7 +3123,12 @@ static void restore_bitmap_write_access(
spin_unlock(&inode->i_lock);
}
-static int do_md_stop(mddev_t * mddev, int ro)
+/* mode:
+ * 0 - completely stop and dis-assemble array
+ * 1 - switch to readonly
+ * 2 - stop but do not disassemble array
+ */
+static int do_md_stop(mddev_t * mddev, int mode)
{
int err = 0;
struct gendisk *disk = mddev->gendisk;
@@ -2977,12 +3150,15 @@ static int do_md_stop(mddev_t * mddev, i
invalidate_partition(disk, 0);
- if (ro) {
+ switch(mode) {
+ case 1: /* readonly */
err = -ENXIO;
if (mddev->ro==1)
goto out;
mddev->ro = 1;
- } else {
+ break;
+ case 0: /* disassemble */
+ case 2: /* stop */
bitmap_flush(mddev);
md_super_wait(mddev);
if (mddev->ro)
@@ -3002,7 +3178,7 @@ static int do_md_stop(mddev_t * mddev, i
mddev->in_sync = 1;
md_update_sb(mddev);
}
- if (ro)
+ if (mode == 1)
set_disk_ro(disk, 1);
clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
}
@@ -3010,7 +3186,7 @@ static int do_md_stop(mddev_t * mddev, i
/*
* Free resources if final stop
*/
- if (!ro) {
+ if (mode == 0) {
mdk_rdev_t *rdev;
struct list_head *tmp;
struct gendisk *disk;
@@ -3034,6 +3210,9 @@ static int do_md_stop(mddev_t * mddev, i
export_array(mddev);
mddev->array_size = 0;
+ mddev->size = 0;
+ mddev->raid_disks = 0;
+
disk = mddev->gendisk;
if (disk)
set_capacity(disk, 0);
* [PATCH 007 of 10] md: Allow rdev state to be set via sysfs.
From: NeilBrown @ 2006-06-01 5:14 UTC
To: Andrew Morton; +Cc: linux-raid, linux-kernel
The md/dev-XXX/state file can now be written:
"faulty" simulates an error on the device
"remove" removes the device from the array (if it is not busy)
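From userspace, driving this file comes down to writing a single word
to it. A minimal sketch follows, assuming the attribute behaves like
any other sysfs file; write_sysfs_word is a hypothetical helper, and
the device path in the usage note is only an example.

```c
#include <stdio.h>

/* Write one word plus a newline to a sysfs-style attribute file.
 * Returns 0 on success and -1 on failure, leaving errno as set by the
 * C library (for instance EBUSY if the kernel refuses a "remove"). */
static int write_sysfs_word(const char *path, const char *word)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fprintf(f, "%s\n", word) > 0;
    /* The store error, if any, may only surface when the buffered
     * data is flushed, so the fclose() result matters too. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A call such as write_sysfs_word("/sys/block/md0/md/dev-sdb1/state",
"faulty") would then simulate a failure on that member, with md0 and
sdb1 standing in for whatever array and device are actually present.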
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./Documentation/md.txt | 3 +++
./drivers/md/md.c | 26 +++++++++++++++++++++++++-
2 files changed, 28 insertions(+), 1 deletion(-)
diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~ 2006-06-01 15:05:29.000000000 +1000
+++ ./Documentation/md.txt 2006-06-01 15:05:29.000000000 +1000
@@ -302,6 +302,9 @@ Each directory contains:
This includes spares that are in the process
of being recoverred to
This list make grow in future.
+ This can be written to.
+ Writing "faulty" simulates a failure on the device.
+ Writing "remove" removes the device from the array.
errors
An approximate count of read errors that have been detected on
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2006-06-01 15:05:29.000000000 +1000
+++ ./drivers/md/md.c 2006-06-01 15:05:30.000000000 +1000
@@ -1745,8 +1745,32 @@ state_show(mdk_rdev_t *rdev, char *page)
return len+sprintf(page+len, "\n");
}
+static ssize_t
+state_store(mdk_rdev_t *rdev, const char *buf, size_t len)
+{
+ /* can write
+ * faulty - simulates an error
+ * remove - disconnects the device
+ */
+ int err = -EINVAL;
+ if (cmd_match(buf, "faulty") && rdev->mddev->pers) {
+ md_error(rdev->mddev, rdev);
+ err = 0;
+ } else if (cmd_match(buf, "remove")) {
+ if (rdev->raid_disk >= 0)
+ err = -EBUSY;
+ else {
+ mddev_t *mddev = rdev->mddev;
+ kick_rdev_from_array(rdev);
+ md_update_sb(mddev);
+ md_new_event(mddev);
+ err = 0;
+ }
+ }
+ return err ? err : len;
+}
static struct rdev_sysfs_entry
-rdev_state = __ATTR_RO(state);
+rdev_state = __ATTR(state, 0644, state_show, state_store);
static ssize_t
super_show(mdk_rdev_t *rdev, char *page)
* [PATCH 008 of 10] md: Allow raid 'layout' to be read and set via sysfs.
From: NeilBrown @ 2006-06-01 5:14 UTC
To: Andrew Morton; +Cc: linux-raid, linux-kernel
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./Documentation/md.txt | 5 +++++
./drivers/md/md.c | 27 +++++++++++++++++++++++++++
2 files changed, 32 insertions(+)
diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~ 2006-06-01 15:05:29.000000000 +1000
+++ ./Documentation/md.txt 2006-06-01 15:05:30.000000000 +1000
@@ -200,6 +200,11 @@ All md devices contain:
This can be written only while the array is being assembled, not
after it is started.
+ layout
+ The "layout" for the array for the particular level. This is
+ simply a number that is interpretted differently by different
+ levels. It can be written while assembling an array.
+
new_dev
This file can be written but not read. The value written should
be a block device number as major:minor. e.g. 8:0
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2006-06-01 15:05:30.000000000 +1000
+++ ./drivers/md/md.c 2006-06-01 15:05:30.000000000 +1000
@@ -2155,6 +2155,32 @@ level_store(mddev_t *mddev, const char *
static struct md_sysfs_entry md_level =
__ATTR(level, 0644, level_show, level_store);
+
+static ssize_t
+layout_show(mddev_t *mddev, char *page)
+{
+ /* just a number, not meaningful for all levels */
+ return sprintf(page, "%d\n", mddev->layout);
+}
+
+static ssize_t
+layout_store(mddev_t *mddev, const char *buf, size_t len)
+{
+ char *e;
+ unsigned long n = simple_strtoul(buf, &e, 10);
+ if (mddev->pers)
+ return -EBUSY;
+
+ if (!*buf || (*e && *e != '\n'))
+ return -EINVAL;
+
+ mddev->layout = n;
+ return len;
+}
+static struct md_sysfs_entry md_layout =
+__ATTR(layout, 0644, layout_show, layout_store);
+
+
static ssize_t
raid_disks_show(mddev_t *mddev, char *page)
{
@@ -2741,6 +2767,7 @@ __ATTR(suspend_hi, S_IRUGO|S_IWUSR, susp
static struct attribute *md_default_attrs[] = {
&md_level.attr,
+ &md_layout.attr,
&md_raid_disks.attr,
&md_chunk_size.attr,
&md_size.attr,
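
For readers following along, the input check in layout_store() above can be tried out in user space. This is only a sketch: strtoul() stands in for the kernel's simple_strtoul(), and the helper name parse_sysfs_number() is hypothetical, not something in md.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * User-space sketch of the validation in layout_store().
 * strtoul() is used in place of the kernel's simple_strtoul().
 * Returns 0 and stores the parsed value on success, -EINVAL otherwise.
 */
static int parse_sysfs_number(const char *buf, unsigned long *out)
{
	char *e;
	unsigned long n;

	assert(buf != NULL && out != NULL);
	n = strtoul(buf, &e, 10);

	/* Reject an empty buffer, or anything after the digits
	 * other than a single newline. */
	if (!*buf || (*e && *e != '\n'))
		return -EINVAL;

	*out = n;
	return 0;
}
```

The same check is repeated in resync_start_store() in the next patch, so a value written with "echo" (which appends a newline) is accepted while trailing junk is rejected.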
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 009 of 10] md: Allow resync_start to be set and queried via sysfs.
2006-06-01 5:13 [PATCH 000 of 10] md: Introduction - assorted patches for -mm NeilBrown
` (7 preceding siblings ...)
2006-06-01 5:14 ` [PATCH 008 of 10] md: Allow raid 'layout' to be read and " NeilBrown
@ 2006-06-01 5:14 ` NeilBrown
2006-06-01 5:14 ` [PATCH 010 of 10] md: Allow the write_mostly flag to be set " NeilBrown
9 siblings, 0 replies; 17+ messages in thread
From: NeilBrown @ 2006-06-01 5:14 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid, linux-kernel
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./Documentation/md.txt | 6 ++++++
./drivers/md/md.c | 26 ++++++++++++++++++++++++++
2 files changed, 32 insertions(+)
diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~ 2006-06-01 15:05:30.000000000 +1000
+++ ./Documentation/md.txt 2006-06-01 15:05:30.000000000 +1000
@@ -205,6 +205,12 @@ All md devices contain:
simply a number that is interpretted differently by different
levels. It can be written while assembling an array.
+ resync_start
+ The point at which resync should start. If no resync is needed,
+ this will be a very large number. At array creation it will
+ default to 0, though starting the array as 'clean' will
+ set it much larger.
+
new_dev
This file can be written but not read. The value written should
be a block device number as major:minor. e.g. 8:0
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2006-06-01 15:05:30.000000000 +1000
+++ ./drivers/md/md.c 2006-06-01 15:05:30.000000000 +1000
@@ -2235,6 +2235,30 @@ chunk_size_store(mddev_t *mddev, const c
static struct md_sysfs_entry md_chunk_size =
__ATTR(chunk_size, 0644, chunk_size_show, chunk_size_store);
+static ssize_t
+resync_start_show(mddev_t *mddev, char *page)
+{
+ return sprintf(page, "%llu\n", (unsigned long long)mddev->recovery_cp);
+}
+
+static ssize_t
+resync_start_store(mddev_t *mddev, const char *buf, size_t len)
+{
+ /* can only set resync_start if array is not yet active */
+ char *e;
+ unsigned long long n = simple_strtoull(buf, &e, 10);
+
+ if (mddev->pers)
+ return -EBUSY;
+ if (!*buf || (*e && *e != '\n'))
+ return -EINVAL;
+
+ mddev->recovery_cp = n;
+ return len;
+}
+static struct md_sysfs_entry md_resync_start =
+__ATTR(resync_start, 0644, resync_start_show, resync_start_store);
+
/*
* The array state can be:
*
@@ -2771,6 +2795,7 @@ static struct attribute *md_default_attr
&md_raid_disks.attr,
&md_chunk_size.attr,
&md_size.attr,
+ &md_resync_start.attr,
&md_metadata.attr,
&md_new_device.attr,
&md_safe_delay.attr,
@@ -3263,6 +3288,7 @@ static int do_md_stop(mddev_t * mddev, i
mddev->array_size = 0;
mddev->size = 0;
mddev->raid_disks = 0;
+ mddev->recovery_cp = 0;
disk = mddev->gendisk;
if (disk)
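
A rough user-space sketch of how a tool might set and read back an attribute such as resync_start follows. The helper names are hypothetical, and the attribute path (something like /sys/block/md0/md/resync_start) is an assumption rather than something stated in this patch. Per the store function above, a write is expected to fail with EBUSY once the array is active.

```c
#include <assert.h>
#include <stdio.h>

/* Write "<value>\n" to an attribute file.
 * Returns 0 on success, -1 on failure. */
static int write_attr_ull(const char *path, unsigned long long value)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%llu\n", value) > 0;
	/* fclose() flushes the buffer, so an error returned by the
	 * attribute's store function is reported here at the latest. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read an attribute file back as an unsigned long long.
 * Returns 0 on success, -1 on failure. */
static int read_attr_ull(const char *path, unsigned long long *out)
{
	FILE *f;
	int ok;

	assert(path != NULL && out != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%llu", out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

Checking the result of fclose() matters here because stdio buffers the write, so the failure may only become visible when the buffer is flushed.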
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 010 of 10] md: Allow the write_mostly flag to be set via sysfs.
2006-06-01 5:13 [PATCH 000 of 10] md: Introduction - assorted patches for -mm NeilBrown
` (8 preceding siblings ...)
2006-06-01 5:14 ` [PATCH 009 of 10] md: Allow resync_start to be set and queried " NeilBrown
@ 2006-06-01 5:14 ` NeilBrown
2006-08-05 5:59 ` Mike Snitzer
9 siblings, 1 reply; 17+ messages in thread
From: NeilBrown @ 2006-06-01 5:14 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-raid, linux-kernel
It appears in /sys/block/mdX/md/dev-YYY/state
and can be set or cleared by writing 'writemostly' or '-writemostly'
respectively.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./Documentation/md.txt | 5 +++++
./drivers/md/md.c | 12 ++++++++++++
2 files changed, 17 insertions(+)
diff ./Documentation/md.txt~current~ ./Documentation/md.txt
--- ./Documentation/md.txt~current~ 2006-06-01 15:05:30.000000000 +1000
+++ ./Documentation/md.txt 2006-06-01 15:05:30.000000000 +1000
@@ -309,6 +309,9 @@ Each directory contains:
faulty - device has been kicked from active use due to
a detected fault
in_sync - device is a fully in-sync member of the array
+ writemostly - device will only be subject to read
+ requests if there are no other options.
+ This applies only to raid1 arrays.
spare - device is working, but not a full member.
This includes spares that are in the process
of being recoverred to
@@ -316,6 +319,8 @@ Each directory contains:
This can be written to.
Writing "faulty" simulates a failure on the device.
Writing "remove" removes the device from the array.
+ Writing "writemostly" sets the writemostly flag.
+ Writing "-writemostly" clears the writemostly flag.
errors
An approximate count of read errors that have been detected on
diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~ 2006-06-01 15:05:30.000000000 +1000
+++ ./drivers/md/md.c 2006-06-01 15:05:30.000000000 +1000
@@ -1737,6 +1737,10 @@ state_show(mdk_rdev_t *rdev, char *page)
len += sprintf(page+len, "%sin_sync",sep);
sep = ",";
}
+ if (test_bit(WriteMostly, &rdev->flags)) {
+ len += sprintf(page+len, "%swrite_mostly",sep);
+ sep = ",";
+ }
if (!test_bit(Faulty, &rdev->flags) &&
!test_bit(In_sync, &rdev->flags)) {
len += sprintf(page+len, "%sspare", sep);
@@ -1751,6 +1755,8 @@ state_store(mdk_rdev_t *rdev, const char
/* can write
* faulty - simulates and error
* remove - disconnects the device
+ * writemostly - sets write_mostly
+ * -writemostly - clears write_mostly
*/
int err = -EINVAL;
if (cmd_match(buf, "faulty") && rdev->mddev->pers) {
@@ -1766,6 +1772,12 @@ state_store(mdk_rdev_t *rdev, const char
md_new_event(mddev);
err = 0;
}
+ } else if (cmd_match(buf, "writemostly")) {
+ set_bit(WriteMostly, &rdev->flags);
+ err = 0;
+ } else if (cmd_match(buf, "-writemostly")) {
+ clear_bit(WriteMostly, &rdev->flags);
+ err = 0;
}
return err ? err : len;
}
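
The set/clear/show behaviour added above can be modelled in user space roughly as follows. This is a sketch only: the FLAG_* values, cmd_is() and the *_model() functions are hypothetical stand-ins for rdev->flags, cmd_match(), state_store() and state_show(). It reproduces one detail of the patch as posted: the store side accepts "writemostly" while the show side prints "write_mostly".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FLAG_FAULTY       (1u << 0)	/* hypothetical bit values */
#define FLAG_IN_SYNC      (1u << 1)
#define FLAG_WRITE_MOSTLY (1u << 2)

/* Match a command word, allowing one trailing newline. */
static int cmd_is(const char *buf, const char *cmd)
{
	size_t n = strlen(cmd);

	if (strncmp(buf, cmd, n) != 0)
		return 0;
	return buf[n] == '\0' || (buf[n] == '\n' && buf[n + 1] == '\0');
}

/* Model of the two new state_store() branches. 0 on success, -1 otherwise. */
static int state_store_model(unsigned int *flags, const char *buf)
{
	assert(flags != NULL && buf != NULL);
	if (cmd_is(buf, "writemostly")) {
		*flags |= FLAG_WRITE_MOSTLY;
		return 0;
	}
	if (cmd_is(buf, "-writemostly")) {
		*flags &= ~FLAG_WRITE_MOSTLY;
		return 0;
	}
	return -1;
}

/* Model of state_show(): comma-separated flag names, then a newline. */
static int state_show_model(unsigned int flags, char *page, size_t size)
{
	const char *sep = "";
	int len = 0;

	if (flags & FLAG_FAULTY) {
		len += snprintf(page + len, size - (size_t)len, "%sfaulty", sep);
		sep = ",";
	}
	if (flags & FLAG_IN_SYNC) {
		len += snprintf(page + len, size - (size_t)len, "%sin_sync", sep);
		sep = ",";
	}
	if (flags & FLAG_WRITE_MOSTLY) {
		len += snprintf(page + len, size - (size_t)len, "%swrite_mostly", sep);
		sep = ",";
	}
	if (!(flags & FLAG_FAULTY) && !(flags & FLAG_IN_SYNC))
		len += snprintf(page + len, size - (size_t)len, "%sspare", sep);
	len += snprintf(page + len, size - (size_t)len, "\n");
	return len;
}
```

The caller is assumed to pass a buffer large enough for all flag names (64 bytes is plenty for this model).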
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 006 of 10] md: Set/get state of array via sysfs
2006-06-01 5:13 ` [PATCH 006 of 10] md: Set/get state of array via sysfs NeilBrown
@ 2006-06-01 5:33 ` Chris Wright
2006-06-01 5:43 ` Neil Brown
0 siblings, 1 reply; 17+ messages in thread
From: Chris Wright @ 2006-06-01 5:33 UTC (permalink / raw)
To: NeilBrown; +Cc: Andrew Morton, linux-raid, linux-kernel
* NeilBrown (neilb@suse.de) wrote:
>
> This allows the state of an md/array to be directly controlled
> via sysfs and adds the ability to stop and array without
> tearing it down.
>
> Array states/settings:
>
> clear
> No devices, no size, no level
> Equivalent to STOP_ARRAY ioctl
It looks like this demoted CAP_SYS_ADMIN to CAP_DAC_OVERRIDE for the
equiv ioctl. Intentional?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 008 of 10] md: Allow raid 'layout' to be read and set via sysfs.
2006-06-01 5:14 ` [PATCH 008 of 10] md: Allow raid 'layout' to be read and " NeilBrown
@ 2006-06-01 5:34 ` Chris Wright
2006-06-01 5:44 ` Neil Brown
0 siblings, 1 reply; 17+ messages in thread
From: Chris Wright @ 2006-06-01 5:34 UTC (permalink / raw)
To: NeilBrown; +Cc: Andrew Morton, linux-raid, linux-kernel
* NeilBrown (neilb@suse.de) wrote:
> +static struct md_sysfs_entry md_layout =
> +__ATTR(layout, 0655, layout_show, layout_store);
0644?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 006 of 10] md: Set/get state of array via sysfs
2006-06-01 5:33 ` Chris Wright
@ 2006-06-01 5:43 ` Neil Brown
0 siblings, 0 replies; 17+ messages in thread
From: Neil Brown @ 2006-06-01 5:43 UTC (permalink / raw)
To: Chris Wright; +Cc: Andrew Morton, linux-raid, linux-kernel
On Wednesday May 31, chrisw@sous-sol.org wrote:
> * NeilBrown (neilb@suse.de) wrote:
> >
> > This allows the state of an md/array to be directly controlled
> > via sysfs and adds the ability to stop and array without
> > tearing it down.
> >
> > Array states/settings:
> >
> > clear
> > No devices, no size, no level
> > Equivalent to STOP_ARRAY ioctl
>
> It looks like this demoted CAP_SYS_ADMIN to CAP_DAC_OVERRIDE for the
> equiv ioctl. Intentional?
Uhm.. no. Thanks. I'll fix that, see if I've done similar things
elsewhere, and keep it in mind for the future.
NeilBrown
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 008 of 10] md: Allow raid 'layout' to be read and set via sysfs.
2006-06-01 5:34 ` Chris Wright
@ 2006-06-01 5:44 ` Neil Brown
0 siblings, 0 replies; 17+ messages in thread
From: Neil Brown @ 2006-06-01 5:44 UTC (permalink / raw)
To: Chris Wright; +Cc: Andrew Morton, linux-raid, linux-kernel
On Wednesday May 31, chrisw@sous-sol.org wrote:
> * NeilBrown (neilb@suse.de) wrote:
> > +static struct md_sysfs_entry md_layout =
> > +__ATTR(layout, 0655, layout_show, layout_store);
>
> 0644?
I think the correct response is "Doh!" :-)
Yes, thanks,
NeilBrown
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 010 of 10] md: Allow the write_mostly flag to be set via sysfs.
2006-06-01 5:14 ` [PATCH 010 of 10] md: Allow the write_mostly flag to be set " NeilBrown
@ 2006-08-05 5:59 ` Mike Snitzer
2006-08-05 23:43 ` Mike Snitzer
0 siblings, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2006-08-05 5:59 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
Aside from this write-mostly sysfs support, is there a way to toggle
the write-mostly bit of an md member with mdadm? I couldn't identify
a clear way to do so.
It'd be nice if mdadm --assemble would honor --write-mostly...
On 6/1/06, NeilBrown <neilb@suse.de> wrote:
>
> It appears in /sys/mdX/md/dev-YYY/state
> and can be set or cleared by writing 'writemostly' or '-writemostly'
> respectively.
>
> Signed-off-by: Neil Brown <neilb@suse.de>
>
> ### Diffstat output
> ./Documentation/md.txt | 5 +++++
> ./drivers/md/md.c | 12 ++++++++++++
> 2 files changed, 17 insertions(+)
>
> diff ./Documentation/md.txt~current~ ./Documentation/md.txt
> --- ./Documentation/md.txt~current~ 2006-06-01 15:05:30.000000000 +1000
> +++ ./Documentation/md.txt 2006-06-01 15:05:30.000000000 +1000
> @@ -309,6 +309,9 @@ Each directory contains:
> faulty - device has been kicked from active use due to
> a detected fault
> in_sync - device is a fully in-sync member of the array
> + writemostly - device will only be subject to read
> + requests if there are no other options.
> + This applies only to raid1 arrays.
> spare - device is working, but not a full member.
> This includes spares that are in the process
> of being recoverred to
> @@ -316,6 +319,8 @@ Each directory contains:
> This can be written to.
> Writing "faulty" simulates a failure on the device.
> Writing "remove" removes the device from the array.
> + Writing "writemostly" sets the writemostly flag.
> + Writing "-writemostly" clears the writemostly flag.
>
> errors
> An approximate count of read errors that have been detected on
>
> diff ./drivers/md/md.c~current~ ./drivers/md/md.c
> --- ./drivers/md/md.c~current~ 2006-06-01 15:05:30.000000000 +1000
> +++ ./drivers/md/md.c 2006-06-01 15:05:30.000000000 +1000
> @@ -1737,6 +1737,10 @@ state_show(mdk_rdev_t *rdev, char *page)
> len += sprintf(page+len, "%sin_sync",sep);
> sep = ",";
> }
> + if (test_bit(WriteMostly, &rdev->flags)) {
> + len += sprintf(page+len, "%swrite_mostly",sep);
> + sep = ",";
> + }
> if (!test_bit(Faulty, &rdev->flags) &&
> !test_bit(In_sync, &rdev->flags)) {
> len += sprintf(page+len, "%sspare", sep);
> @@ -1751,6 +1755,8 @@ state_store(mdk_rdev_t *rdev, const char
> /* can write
> * faulty - simulates and error
> * remove - disconnects the device
> + * writemostly - sets write_mostly
> + * -writemostly - clears write_mostly
> */
> int err = -EINVAL;
> if (cmd_match(buf, "faulty") && rdev->mddev->pers) {
> @@ -1766,6 +1772,12 @@ state_store(mdk_rdev_t *rdev, const char
> md_new_event(mddev);
> err = 0;
> }
> + } else if (cmd_match(buf, "writemostly")) {
> + set_bit(WriteMostly, &rdev->flags);
> + err = 0;
> + } else if (cmd_match(buf, "-writemostly")) {
> + clear_bit(WriteMostly, &rdev->flags);
> + err = 0;
> }
> return err ? err : len;
> }
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 010 of 10] md: Allow the write_mostly flag to be set via sysfs.
2006-08-05 5:59 ` Mike Snitzer
@ 2006-08-05 23:43 ` Mike Snitzer
0 siblings, 0 replies; 17+ messages in thread
From: Mike Snitzer @ 2006-08-05 23:43 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On 8/5/06, Mike Snitzer <snitzer@gmail.com> wrote:
> Aside from this write-mostly sysfs support, is there a way to toggle
> the write-mostly bit of an md member with mdadm? I couldn't identify
> a clear way to do so.
>
> It'd be nice if mdadm --assemble would honor --write-mostly...
I went ahead and implemented the ability to toggle the write-mostly
bit for all disks in an array. I did so by adding another type of
--update to --assemble. This is very useful for a 2 disk raid1 (one
disk local, one remote): when you switch the raidhost you also need
to toggle the write-mostly bit.
I've tested the attached patch with both version 0.90 and version 1
superblocks on mdadm 2.4.1 and 2.5.2. The patch is against mdadm
2.4.1 but applies cleanly (with fuzz) against mdadm 2.5.2.
# cat /proc/mdstat
...
md2 : active raid1 nbd2[0] sdd[1](W)
390613952 blocks [2/2] [UU]
bitmap: 0/187 pages [0KB], 1024KB chunk
# mdadm -S /dev/md2
# mdadm --assemble /dev/md2 --run --update=toggle-write-mostly /dev/sdd /dev/nbd2
mdadm: /dev/md2 has been started with 2 drives.
# cat /proc/mdstat
...
md2 : active raid1 nbd2[0](W) sdd[1]
390613952 blocks [2/2] [UU]
bitmap: 0/187 pages [0KB], 1024KB chunk
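
The swap of the (W) marker between the two members comes from a plain XOR on each device's state word. A minimal sketch, assuming the write-mostly flag sits at bit 9 (SKETCH_DISK_WRITEMOSTLY and toggle_write_mostly() are illustrative names, not identifiers taken from mdadm):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit position of the write-mostly flag in a per-disk state word. */
#define SKETCH_DISK_WRITEMOSTLY 9

/* Flip the write-mostly bit and leave every other bit untouched. */
static uint32_t toggle_write_mostly(uint32_t state)
{
	assert(SKETCH_DISK_WRITEMOSTLY < 32);
	return state ^ (UINT32_C(1) << SKETCH_DISK_WRITEMOSTLY);
}
```

Because XOR is its own inverse, applying the update twice restores the original flags, and applying it to every member of a two-disk raid1 with one write-mostly disk swaps which disk carries the flag.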
[-- Attachment #2: mdadm-2.4.1_toggle_write_mostly.patch --]
[-- Type: text/x-patch; name="mdadm-2.4.1_toggle_write_mostly.patch", Size: 1836 bytes --]
diff -Naur mdadm-2.4.1/mdadm.c mdadm-2.4.1_toggle_write_mostly/mdadm.c
--- mdadm-2.4.1/mdadm.c 2006-03-28 21:55:39.000000000 -0500
+++ mdadm-2.4.1_toggle_write_mostly/mdadm.c 2006-08-05 17:01:48.000000000 -0400
@@ -587,6 +587,8 @@
continue;
if (strcmp(update, "uuid")==0)
continue;
+ if (strcmp(update, "toggle-write-mostly")==0)
+ continue;
if (strcmp(update, "byteorder")==0) {
if (ss) {
fprintf(stderr, Name ": must not set metadata type with --update=byteorder.\n");
@@ -601,7 +603,7 @@
continue;
}
- fprintf(stderr, Name ": '--update %s' invalid. Only 'sparc2.2', 'super-minor', 'uuid', 'resync' or 'summaries' supported\n",update);
+ fprintf(stderr, Name ": '--update %s' invalid. Only 'sparc2.2', 'super-minor', 'uuid', 'resync', 'summaries' or 'toggle-write-mostly' supported\n",update);
exit(2);
case O(ASSEMBLE,'c'): /* config file */
diff -Naur mdadm-2.4.1/super0.c mdadm-2.4.1_toggle_write_mostly/super0.c
--- mdadm-2.4.1/super0.c 2006-03-28 01:10:51.000000000 -0500
+++ mdadm-2.4.1_toggle_write_mostly/super0.c 2006-08-05 18:04:45.000000000 -0400
@@ -382,6 +382,10 @@
rv = 1;
}
}
+ if (strcmp(update, "toggle-write-mostly")==0) {
+ int d = info->disk.number;
+ sb->disks[d].state ^= (1<<MD_DISK_WRITEMOSTLY);
+ }
if (strcmp(update, "newdev") == 0) {
int d = info->disk.number;
memset(&sb->disks[d], 0, sizeof(sb->disks[d]));
diff -Naur mdadm-2.4.1/super1.c mdadm-2.4.1_toggle_write_mostly/super1.c
--- mdadm-2.4.1/super1.c 2006-04-07 00:32:06.000000000 -0400
+++ mdadm-2.4.1_toggle_write_mostly/super1.c 2006-08-05 18:33:21.000000000 -0400
@@ -446,6 +446,9 @@
rv = 1;
}
}
+ if (strcmp(update, "toggle-write-mostly")==0) {
+ sb->devflags ^= WriteMostly1;
+ }
#if 0
if (strcmp(update, "newdev") == 0) {
int d = info->disk.number;
^ permalink raw reply [flat|nested] 17+ messages in thread
Thread overview: 17+ messages
2006-06-01 5:13 [PATCH 000 of 10] md: Introduction - assorted patches for -mm NeilBrown
2006-06-01 5:13 ` [PATCH 001 of 10] md: md Kconfig speeling feex NeilBrown
2006-06-01 5:13 ` [PATCH 002 of 10] md: Fix Kconfig error NeilBrown
2006-06-01 5:13 ` [PATCH 003 of 10] md: Fix bug that stops raid5 resync from happening NeilBrown
2006-06-01 5:13 ` [PATCH 004 of 10] md: Allow re-add to work on array without bitmaps NeilBrown
2006-06-01 5:13 ` [PATCH 005 of 10] md: Don't write dirty/clean update to spares - leave them alone NeilBrown
2006-06-01 5:13 ` [PATCH 006 of 10] md: Set/get state of array via sysfs NeilBrown
2006-06-01 5:33 ` Chris Wright
2006-06-01 5:43 ` Neil Brown
2006-06-01 5:14 ` [PATCH 007 of 10] md: Allow rdev state to be set " NeilBrown
2006-06-01 5:14 ` [PATCH 008 of 10] md: Allow raid 'layout' to be read and " NeilBrown
2006-06-01 5:34 ` Chris Wright
2006-06-01 5:44 ` Neil Brown
2006-06-01 5:14 ` [PATCH 009 of 10] md: Allow resync_start to be set and queried " NeilBrown
2006-06-01 5:14 ` [PATCH 010 of 10] md: Allow the write_mostly flag to be set " NeilBrown
2006-08-05 5:59 ` Mike Snitzer
2006-08-05 23:43 ` Mike Snitzer