[PATCH 000 of 12] md: Introduction

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 000 of 12] md: Introduction
@ 2006-06-27  7:05 NeilBrown
  2006-06-27  7:05 ` [PATCH 001 of 12] md: Possible fix for unplug problem NeilBrown
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: NeilBrown @ 2006-06-27  7:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel

Following are 12 assorted small patches for md.
They are against 2.6.17-mm2 and are suitable for inclusion in 2.6.18.

They are primarily small bug fixes, many fixing possible races, some
of which have been seen in the wild, some not.

Thanks,
NeilBrown

 [PATCH 001 of 12] md: Possible fix for unplug problem
 [PATCH 002 of 12] md: Set desc_nr correctly for version-1 superblocks.
 [PATCH 003 of 12] md: Delay starting md threads until array is completely setup.
 [PATCH 004 of 12] md: Fix resync speed calculation for restarted resyncs.
 [PATCH 005 of 12] md: Fix a plug/unplug race in raid5
 [PATCH 006 of 12] md: Fix some small races in bitmap plugging in raid5.
 [PATCH 007 of 12] md: Fix usage of wrong variable in raid1
 [PATCH 008 of 12] md: Unify usage of symbolic names for perms.
 [PATCH 009 of 12] md: Require CAP_SYS_ADMIN for (re-)configuring md devices via sysfs.
 [PATCH 010 of 12] md: Remove a variable that is now unused.
 [PATCH 011 of 12] md: Fix "Will Configure" message when interpreting md= kernel parameter.
 [PATCH 012 of 12] md: Include sector number in messages about corrected read errors.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 001 of 12] md: Possible fix for unplug problem
  2006-06-27  7:05 [PATCH 000 of 12] md: Introduction NeilBrown
@ 2006-06-27  7:05 ` NeilBrown
  2006-06-27  7:05 ` [PATCH 002 of 12] md: Set desc_nr correctly for version-1 superblocks NeilBrown
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2006-06-27  7:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel


I have reports of a problem with raid5 which turns out to be because
the raid5 device gets stuck in a 'plugged' state.  This shouldn't be
able to happen as 3msec after it gets plugged it should get unplugged.
However it happens none-the-less.  This patch fixes the problem and is
a reasonable thing to do, though it might hurt performance slightly in
some cases.

Until I can find the real problem, we should probably have this
workaround in place.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/raid5.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2006-06-27 12:15:17.000000000 +1000
+++ ./drivers/md/raid5.c	2006-06-27 12:16:41.000000000 +1000
@@ -271,7 +271,7 @@ static struct stripe_head *get_active_st
 						     < (conf->max_nr_stripes *3/4)
 						     || !conf->inactive_blocked),
 						    conf->device_lock,
-						    unplug_slaves(conf->mddev)
+						    raid5_unplug_device(conf->mddev->queue)
 					);
 				conf->inactive_blocked = 0;
 			} else

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 002 of 12] md: Set desc_nr correctly for version-1 superblocks.
  2006-06-27  7:05 [PATCH 000 of 12] md: Introduction NeilBrown
  2006-06-27  7:05 ` [PATCH 001 of 12] md: Possible fix for unplug problem NeilBrown
@ 2006-06-27  7:05 ` NeilBrown
  2006-06-27  7:05 ` [PATCH 003 of 12] md: Delay starting md threads until array is completely setup NeilBrown
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2006-06-27  7:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel


This has to be done in ->load_super, not ->validate_super

Without this, hot-adding devices to an array doesn't always
work right - though there is a work around in mdadm-2.5.2 to
make this less of an issue.

### Diffstat output
 ./drivers/md/md.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2006-06-27 12:15:17.000000000 +1000
+++ ./drivers/md/md.c	2006-06-27 12:17:32.000000000 +1000
@@ -1064,6 +1064,11 @@ static int super_1_load(mdk_rdev_t *rdev
 	if (rdev->sb_size & bmask)
 		rdev-> sb_size = (rdev->sb_size | bmask)+1;
 
+	if (sb->level == cpu_to_le32(LEVEL_MULTIPATH))
+		rdev->desc_nr = -1;
+	else
+		rdev->desc_nr = le32_to_cpu(sb->dev_number);
+
 	if (refdev == 0)
 		ret = 1;
 	else {
@@ -1173,7 +1178,6 @@ static int super_1_validate(mddev_t *mdd
 	}
 	if (mddev->level != LEVEL_MULTIPATH) {
 		int role;
-		rdev->desc_nr = le32_to_cpu(sb->dev_number);
 		role = le16_to_cpu(sb->dev_roles[rdev->desc_nr]);
 		switch(role) {
 		case 0xffff: /* spare */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 003 of 12] md: Delay starting md threads until array is completely setup.
  2006-06-27  7:05 [PATCH 000 of 12] md: Introduction NeilBrown
  2006-06-27  7:05 ` [PATCH 001 of 12] md: Possible fix for unplug problem NeilBrown
  2006-06-27  7:05 ` [PATCH 002 of 12] md: Set desc_nr correctly for version-1 superblocks NeilBrown
@ 2006-06-27  7:05 ` NeilBrown
  2006-06-27  7:05 ` [PATCH 004 of 12] md: Fix resync speed calculation for restarted resyncs NeilBrown
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2006-06-27  7:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel


When an array is started we start one or two threads (two if
there is a reshape or recovery that needs to be completed).

We currently start these *before* the array is completely set up and
in particular before queue->queuedata is set.  If the thread
actually starts very quickly on another CPU, we can end up
dereferencing queue->queuedata and oops.

This patch also makes sure we don't try to start a recovery if
a reshape is being restarted.

### Diffstat output
 ./drivers/md/md.c    |   10 +++++-----
 ./drivers/md/raid5.c |    3 ---
 2 files changed, 5 insertions(+), 8 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2006-06-27 12:17:32.000000000 +1000
+++ ./drivers/md/md.c	2006-06-27 12:17:32.000000000 +1000
@@ -3100,8 +3100,7 @@ static int do_md_run(mddev_t * mddev)
 		}
 	
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-	md_wakeup_thread(mddev->thread);
	
+
 	if (mddev->sb_dirty)
 		md_update_sb(mddev);
 
@@ -3121,7 +3120,7 @@ static int do_md_run(mddev_t * mddev)
 	 * start recovery here.  If we leave it to md_check_recovery,
 	 * it will remove the drives and not do the right thing
 	 */
-	if (mddev->degraded) {
+	if (mddev->degraded && !mddev->sync_thread) {
 		struct list_head *rtmp;
 		int spares = 0;
 		ITERATE_RDEV(mddev,rdev,rtmp)
@@ -3142,10 +3141,11 @@ static int do_md_run(mddev_t * mddev)
 				       mdname(mddev));
 				/* leave the spares where they are, it shouldn't hurt */
 				mddev->recovery = 0;
-			} else
-				md_wakeup_thread(mddev->sync_thread);
+			}
 		}
 	}
+	md_wakeup_thread(mddev->thread);
+	md_wakeup_thread(mddev->sync_thread); /* possibly kick off a reshape */
 
 	mddev->changed = 1;
 	md_new_event(mddev);

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2006-06-27 12:16:41.000000000 +1000
+++ ./drivers/md/raid5.c	2006-06-27 12:17:32.000000000 +1000
@@ -3248,9 +3248,6 @@ static int run(mddev_t *mddev)
 		set_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
 		mddev->sync_thread = md_register_thread(md_do_sync, mddev,
 							"%s_reshape");
-		/* FIXME if md_register_thread fails?? */
-		md_wakeup_thread(mddev->sync_thread);
-
 	}
 
 	/* read-ahead size must cover two whole stripes, which is

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 004 of 12] md: Fix resync speed calculation for restarted resyncs.
  2006-06-27  7:05 [PATCH 000 of 12] md: Introduction NeilBrown
                   ` (2 preceding siblings ...)
  2006-06-27  7:05 ` [PATCH 003 of 12] md: Delay starting md threads until array is completely setup NeilBrown
@ 2006-06-27  7:05 ` NeilBrown
  2006-06-27  7:05 ` [PATCH 005 of 12] md: Fix a plug/unplug race in raid5 NeilBrown
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2006-06-27  7:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel


We introduced 'io_sectors' recently so we could count
the sectors that causes io during resync separate from sectors
which didn't cause IO - there can be a difference if a bitmap
is being used to accelerate resync.

However when a speed is reported, we find the number of sectors
processed recently by subtracting an oldish io_sectors count
from a current 'curr_resync' count.  This is wrong because
curr_resync counts all sectors, not just io sectors.

So, add a field to mddev to store the curren io_sectors separately from
curr_resync, and use that in the calculations.

### Diffstat output
 ./drivers/md/md.c           |   10 ++++++----
 ./drivers/md/raid5.c        |    3 ++-
 ./include/linux/raid/md_k.h |    3 ++-
 3 files changed, 10 insertions(+), 6 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2006-06-27 12:17:32.000000000 +1000
+++ ./drivers/md/md.c	2006-06-27 12:17:32.000000000 +1000
@@ -2721,7 +2721,7 @@ static ssize_t
 sync_speed_show(mddev_t *mddev, char *page)
 {
 	unsigned long resync, dt, db;
-	resync = (mddev->curr_resync - atomic_read(&mddev->recovery_active));
+	resync = (mddev->curr_mark_cnt - atomic_read(&mddev->recovery_active));
 	dt = ((jiffies - mddev->resync_mark) / HZ);
 	if (!dt) dt++;
 	db = resync - (mddev->resync_mark_cnt);
@@ -4692,12 +4692,13 @@ static void status_resync(struct seq_fil
 	 */
 	dt = ((jiffies - mddev->resync_mark) / HZ);
 	if (!dt) dt++;
-	db = resync - (mddev->resync_mark_cnt/2);
-	rt = (dt * ((unsigned long)(max_blocks-resync) / (db/100+1)))/100;
+	db = (mddev->curr_mark_cnt - atomic_read(&mddev->recovery_active))
+		- mddev->resync_mark_cnt;
+	rt = (dt * ((unsigned long)(max_blocks-resync) / (db/2/100+1)))/100;
 
 	seq_printf(seq, " finish=%lu.%lumin", rt / 60, (rt % 60)/6);
 
-	seq_printf(seq, " speed=%ldK/sec", db/dt);
+	seq_printf(seq, " speed=%ldK/sec", db/2/dt);
 }
 
 static void *md_seq_start(struct seq_file *seq, loff_t *pos)
@@ -5208,6 +5209,7 @@ void md_do_sync(mddev_t *mddev)
 
 		j += sectors;
 		if (j>1) mddev->curr_resync = j;
+		mddev->curr_mark_cnt = io_sectors;
 		if (last_check == 0)
 			/* this is the earliers that rebuilt will be
 			 * visible in /proc/mdstat

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2006-06-27 12:17:32.000000000 +1000
+++ ./drivers/md/raid5.c	2006-06-27 12:17:32.000000000 +1000
@@ -282,7 +282,8 @@ static struct stripe_head *get_active_st
 			} else {
 				if (!test_bit(STRIPE_HANDLE, &sh->state))
 					atomic_inc(&conf->active_stripes);
-				if (list_empty(&sh->lru))
+				if (list_empty(&sh->lru) &&
+				    !test_bit(STRIPE_EXPANDING, &sh->state))
 					BUG();
 				list_del_init(&sh->lru);
 			}

diff .prev/include/linux/raid/md_k.h ./include/linux/raid/md_k.h
--- .prev/include/linux/raid/md_k.h	2006-06-27 12:15:17.000000000 +1000
+++ ./include/linux/raid/md_k.h	2006-06-27 12:17:32.000000000 +1000
@@ -148,9 +148,10 @@ struct mddev_s
 
 	struct mdk_thread_s		*thread;	/* management thread */
 	struct mdk_thread_s		*sync_thread;	/* doing resync or reconstruct */
-	sector_t			curr_resync;	/* blocks scheduled */
+	sector_t			curr_resync;	/* last block scheduled */
 	unsigned long			resync_mark;	/* a recent timestamp */
 	sector_t			resync_mark_cnt;/* blocks written at resync_mark */
+	sector_t			curr_mark_cnt; /* blocks scheduled now */
 
 	sector_t			resync_max_sectors; /* may be set by personality */
 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 005 of 12] md: Fix a plug/unplug race in raid5
  2006-06-27  7:05 [PATCH 000 of 12] md: Introduction NeilBrown
                   ` (3 preceding siblings ...)
  2006-06-27  7:05 ` [PATCH 004 of 12] md: Fix resync speed calculation for restarted resyncs NeilBrown
@ 2006-06-27  7:05 ` NeilBrown
  2006-06-27  7:05 ` [PATCH 006 of 12] md: Fix some small races in bitmap plugging " NeilBrown
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2006-06-27  7:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel


When a device is unplugged, requests are moved from one or two
(depending on whether a bitmap is in use) queues to the main
request queue.

So whenever requests are put on either of those queues, we should make
sure the raid5 array is 'plugged'.
However we don't.  We currently plug the raid5 queue just before
putting requests on queues, so there is room for a race.  If something
unplugs the queue at just the wrong time, requests will be left on
the queue and nothing will want to unplug them.
Normally something else will plug and unplug the queue fairly
soon, but there is a risk that nothing will.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/raid5.c |   18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2006-06-27 12:17:32.000000000 +1000
+++ ./drivers/md/raid5.c	2006-06-27 12:17:33.000000000 +1000
@@ -89,12 +89,14 @@ static void __release_stripe(raid5_conf_
 		BUG_ON(!list_empty(&sh->lru));
 		BUG_ON(atomic_read(&conf->active_stripes)==0);
 		if (test_bit(STRIPE_HANDLE, &sh->state)) {
-			if (test_bit(STRIPE_DELAYED, &sh->state))
+			if (test_bit(STRIPE_DELAYED, &sh->state)) {
 				list_add_tail(&sh->lru, &conf->delayed_list);
-			else if (test_bit(STRIPE_BIT_DELAY, &sh->state) &&
-				 conf->seq_write == sh->bm_seq)
+				blk_plug_device(conf->mddev->queue);
+			} else if (test_bit(STRIPE_BIT_DELAY, &sh->state) &&
+				   conf->seq_write == sh->bm_seq) {
 				list_add_tail(&sh->lru, &conf->bitmap_list);
-			else {
+				blk_plug_device(conf->mddev->queue);
+			} else {
 				clear_bit(STRIPE_BIT_DELAY, &sh->state);
 				list_add_tail(&sh->lru, &conf->handle_list);
 			}
@@ -2556,13 +2558,6 @@ static int raid5_issue_flush(request_que
 	return ret;
 }
 
-static inline void raid5_plug_device(raid5_conf_t *conf)
-{
-	spin_lock_irq(&conf->device_lock);
-	blk_plug_device(conf->mddev->queue);
-	spin_unlock_irq(&conf->device_lock);
-}
-
 static int make_request(request_queue_t *q, struct bio * bi)
 {
 	mddev_t *mddev = q->queuedata;
@@ -2672,7 +2667,6 @@ static int make_request(request_queue_t 
 				goto retry;
 			}
 			finish_wait(&conf->wait_for_overlap, &w);
-			raid5_plug_device(conf);
 			handle_stripe(sh, NULL);
 			release_stripe(sh);
 		} else {

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 006 of 12] md: Fix some small races in bitmap plugging in raid5.
  2006-06-27  7:05 [PATCH 000 of 12] md: Introduction NeilBrown
                   ` (4 preceding siblings ...)
  2006-06-27  7:05 ` [PATCH 005 of 12] md: Fix a plug/unplug race in raid5 NeilBrown
@ 2006-06-27  7:05 ` NeilBrown
  2006-06-27  7:05 ` [PATCH 007 of 12] md: Fix usage of wrong variable in raid1 NeilBrown
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2006-06-27  7:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel


The comment gives more details, but I didn't quite have the
sequencing write, so there was room for races to leave bits
unset in the on-disk bitmap for short periods of time.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/raid5.c |   30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2006-06-27 12:17:33.000000000 +1000
+++ ./drivers/md/raid5.c	2006-06-27 12:17:33.000000000 +1000
@@ -18,6 +18,30 @@
  * Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
  */
 
+/*
+ * BITMAP UNPLUGGING:
+ *
+ * The sequencing for updating the bitmap reliably is a little
+ * subtle (and I got it wrong the first time) so it deserves some
+ * explanation.
+ *
+ * We group bitmap updates into batches.  Each batch has a number.
+ * We may write out several batches at once, but that isn't very important.
+ * conf->bm_write is the number of the last batch successfully written.
+ * conf->bm_flush is the number of the last batch that was closed to
+ *    new additions.
+ * When we discover that we will need to write to any block in a stripe
+ * (in add_stripe_bio) we update the in-memory bitmap and record in sh->bm_seq
+ * the number of the batch it will be in. This is bm_flush+1.
+ * When we are ready to do a write, if that batch hasn't been written yet,
+ *   we plug the array and queue the stripe for later.
+ * When an unplug happens, we increment bm_flush, thus closing the current
+ *   batch.
+ * When we notice that bm_flush > bm_write, we write out all pending updates
+ * to the bitmap, and advance bm_write to where bm_flush was.
+ * This may occasionally write a bit out twice, but is sure never to
+ * miss any bits.
+ */
 
 #include <linux/config.h>
 #include <linux/module.h>
@@ -93,7 +117,7 @@ static void __release_stripe(raid5_conf_
 				list_add_tail(&sh->lru, &conf->delayed_list);
 				blk_plug_device(conf->mddev->queue);
 			} else if (test_bit(STRIPE_BIT_DELAY, &sh->state) &&
-				   conf->seq_write == sh->bm_seq) {
+				   sh->bm_seq - conf->seq_write > 0) {
 				list_add_tail(&sh->lru, &conf->bitmap_list);
 				blk_plug_device(conf->mddev->queue);
 			} else {
@@ -1274,9 +1298,9 @@ static int add_stripe_bio(struct stripe_
 		(unsigned long long)sh->sector, dd_idx);
 
 	if (conf->mddev->bitmap && firstwrite) {
-		sh->bm_seq = conf->seq_write;
 		bitmap_startwrite(conf->mddev->bitmap, sh->sector,
 				  STRIPE_SECTORS, 0);
+		sh->bm_seq = conf->seq_flush+1;
 		set_bit(STRIPE_BIT_DELAY, &sh->state);
 	}
 
@@ -2920,7 +2944,7 @@ static void raid5d (mddev_t *mddev)
 	while (1) {
 		struct list_head *first;
 
-		if (conf->seq_flush - conf->seq_write > 0) {
+		if (conf->seq_flush != conf->seq_write) {
 			int seq = conf->seq_flush;
 			spin_unlock_irq(&conf->device_lock);
 			bitmap_unplug(mddev->bitmap);

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 007 of 12] md: Fix usage of wrong variable in raid1
  2006-06-27  7:05 [PATCH 000 of 12] md: Introduction NeilBrown
                   ` (5 preceding siblings ...)
  2006-06-27  7:05 ` [PATCH 006 of 12] md: Fix some small races in bitmap plugging " NeilBrown
@ 2006-06-27  7:05 ` NeilBrown
  2006-06-27  7:05 ` [PATCH 008 of 12] md: Unify usage of symbolic names for perms NeilBrown
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2006-06-27  7:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel


Though it rarely matters, we should be using 's' rather than
r1_bio->sector here.

### Diffstat output
 ./drivers/md/raid1.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
--- .prev/drivers/md/raid1.c	2006-06-27 12:15:16.000000000 +1000
+++ ./drivers/md/raid1.c	2006-06-27 12:17:33.000000000 +1000
@@ -1145,7 +1145,7 @@ static int end_sync_write(struct bio *bi
 		long sectors_to_go = r1_bio->sectors;
 		/* make sure these bits doesn't get cleared. */
 		do {
-			bitmap_end_sync(mddev->bitmap, r1_bio->sector,
+			bitmap_end_sync(mddev->bitmap, s,
 					&sync_blocks, 1);
 			s += sync_blocks;
 			sectors_to_go -= sync_blocks;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 008 of 12] md: Unify usage of symbolic names for perms.
  2006-06-27  7:05 [PATCH 000 of 12] md: Introduction NeilBrown
                   ` (6 preceding siblings ...)
  2006-06-27  7:05 ` [PATCH 007 of 12] md: Fix usage of wrong variable in raid1 NeilBrown
@ 2006-06-27  7:05 ` NeilBrown
  2006-06-27  7:05 ` [PATCH 009 of 12] md: Require CAP_SYS_ADMIN for (re-)configuring md devices via sysfs NeilBrown
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2006-06-27  7:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel


Some places we use number (0660) someplaces names (S_IRUGO).
Change all numbers to be names, and change 0655 to be
what it should be.

Also make some formatting more consistent.


Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c |   56 ++++++++++++++++++++++++++----------------------------
 1 file changed, 27 insertions(+), 29 deletions(-)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2006-06-27 12:17:32.000000000 +1000
+++ ./drivers/md/md.c	2006-06-27 12:17:33.000000000 +1000
@@ -112,7 +112,7 @@ static ctl_table raid_table[] = {
 		.procname	= "speed_limit_min",
 		.data		= &sysctl_speed_limit_min,
 		.maxlen		= sizeof(int),
-		.mode		= 0644,
+		.mode		= S_IRUGO|S_IWUSR,
 		.proc_handler	= &proc_dointvec,
 	},
 	{
@@ -120,7 +120,7 @@ static ctl_table raid_table[] = {
 		.procname	= "speed_limit_max",
 		.data		= &sysctl_speed_limit_max,
 		.maxlen		= sizeof(int),
-		.mode		= 0644,
+		.mode		= S_IRUGO|S_IWUSR,
 		.proc_handler	= &proc_dointvec,
 	},
 	{ .ctl_name = 0 }
@@ -131,7 +131,7 @@ static ctl_table raid_dir_table[] = {
 		.ctl_name	= DEV_RAID,
 		.procname	= "raid",
 		.maxlen		= 0,
-		.mode		= 0555,
+		.mode		= S_IRUGO|S_IXUGO,
 		.child		= raid_table,
 	},
 	{ .ctl_name = 0 }
@@ -1785,8 +1785,8 @@ state_store(mdk_rdev_t *rdev, const char
 	}
 	return err ? err : len;
 }
-static struct rdev_sysfs_entry
-rdev_state = __ATTR(state, 0644, state_show, state_store);
+static struct rdev_sysfs_entry rdev_state =
+__ATTR(state, S_IRUGO|S_IWUSR, state_show, state_store);
 
 static ssize_t
 super_show(mdk_rdev_t *rdev, char *page)
@@ -1817,7 +1817,7 @@ errors_store(mdk_rdev_t *rdev, const cha
 	return -EINVAL;
 }
 static struct rdev_sysfs_entry rdev_errors =
-__ATTR(errors, 0644, errors_show, errors_store);
+__ATTR(errors, S_IRUGO|S_IWUSR, errors_show, errors_store);
 
 static ssize_t
 slot_show(mdk_rdev_t *rdev, char *page)
@@ -1851,7 +1851,7 @@ slot_store(mdk_rdev_t *rdev, const char 
 
 
 static struct rdev_sysfs_entry rdev_slot =
-__ATTR(slot, 0644, slot_show, slot_store);
+__ATTR(slot, S_IRUGO|S_IWUSR, slot_show, slot_store);
 
 static ssize_t
 offset_show(mdk_rdev_t *rdev, char *page)
@@ -1873,7 +1873,7 @@ offset_store(mdk_rdev_t *rdev, const cha
 }
 
 static struct rdev_sysfs_entry rdev_offset =
-__ATTR(offset, 0644, offset_show, offset_store);
+__ATTR(offset, S_IRUGO|S_IWUSR, offset_show, offset_store);
 
 static ssize_t
 rdev_size_show(mdk_rdev_t *rdev, char *page)
@@ -1897,7 +1897,7 @@ rdev_size_store(mdk_rdev_t *rdev, const 
 }
 
 static struct rdev_sysfs_entry rdev_size =
-__ATTR(size, 0644, rdev_size_show, rdev_size_store);
+__ATTR(size, S_IRUGO|S_IWUSR, rdev_size_show, rdev_size_store);
 
 static struct attribute *rdev_default_attrs[] = {
 	&rdev_state.attr,
@@ -2134,7 +2134,7 @@ safe_delay_store(mddev_t *mddev, const c
 	return len;
 }
 static struct md_sysfs_entry md_safe_delay =
-__ATTR(safe_mode_delay, 0644,safe_delay_show, safe_delay_store);
+__ATTR(safe_mode_delay, S_IRUGO|S_IWUSR,safe_delay_show, safe_delay_store);
 
 static ssize_t
 level_show(mddev_t *mddev, char *page)
@@ -2169,7 +2169,7 @@ level_store(mddev_t *mddev, const char *
 }
 
 static struct md_sysfs_entry md_level =
-__ATTR(level, 0644, level_show, level_store);
+__ATTR(level, S_IRUGO|S_IWUSR, level_show, level_store);
 
 
 static ssize_t
@@ -2194,7 +2194,7 @@ layout_store(mddev_t *mddev, const char 
 	return len;
 }
 static struct md_sysfs_entry md_layout =
-__ATTR(layout, 0655, layout_show, layout_store);
+__ATTR(layout, S_IRUGO|S_IWUSR, layout_show, layout_store);
 
 
 static ssize_t
@@ -2225,7 +2225,7 @@ raid_disks_store(mddev_t *mddev, const c
 	return rv ? rv : len;
 }
 static struct md_sysfs_entry md_raid_disks =
-__ATTR(raid_disks, 0644, raid_disks_show, raid_disks_store);
+__ATTR(raid_disks, S_IRUGO|S_IWUSR, raid_disks_show, raid_disks_store);
 
 static ssize_t
 chunk_size_show(mddev_t *mddev, char *page)
@@ -2249,7 +2249,7 @@ chunk_size_store(mddev_t *mddev, const c
 	return len;
 }
 static struct md_sysfs_entry md_chunk_size =
-__ATTR(chunk_size, 0644, chunk_size_show, chunk_size_store);
+__ATTR(chunk_size, S_IRUGO|S_IWUSR, chunk_size_show, chunk_size_store);
 
 static ssize_t
 resync_start_show(mddev_t *mddev, char *page)
@@ -2273,7 +2273,7 @@ resync_start_store(mddev_t *mddev, const
 	return len;
 }
 static struct md_sysfs_entry md_resync_start =
-__ATTR(resync_start, 0644, resync_start_show, resync_start_store);
+__ATTR(resync_start, S_IRUGO|S_IWUSR, resync_start_show, resync_start_store);
 
 /*
  * The array state can be:
@@ -2443,7 +2443,8 @@ array_state_store(mddev_t *mddev, const 
 	else
 		return len;
 }
-static struct md_sysfs_entry md_array_state = __ATTR(array_state, 0644, array_state_show, array_state_store);
+static struct md_sysfs_entry md_array_state =
+__ATTR(array_state, S_IRUGO|S_IWUSR, array_state_show, array_state_store);
 
 static ssize_t
 null_show(mddev_t *mddev, char *page)
@@ -2503,7 +2504,7 @@ new_dev_store(mddev_t *mddev, const char
 }
 
 static struct md_sysfs_entry md_new_device =
-__ATTR(new_dev, 0200, null_show, new_dev_store);
+__ATTR(new_dev, S_IWUSR, null_show, new_dev_store);
 
 static ssize_t
 size_show(mddev_t *mddev, char *page)
@@ -2541,7 +2542,7 @@ size_store(mddev_t *mddev, const char *b
 }
 
 static struct md_sysfs_entry md_size =
-__ATTR(component_size, 0644, size_show, size_store);
+__ATTR(component_size, S_IRUGO|S_IWUSR, size_show, size_store);
 
 
 /* Metdata version.
@@ -2589,7 +2590,7 @@ metadata_store(mddev_t *mddev, const cha
 }
 
 static struct md_sysfs_entry md_metadata =
-__ATTR(metadata_version, 0644, metadata_show, metadata_store);
+__ATTR(metadata_version, S_IRUGO|S_IWUSR, metadata_show, metadata_store);
 
 static ssize_t
 action_show(mddev_t *mddev, char *page)
@@ -2657,12 +2658,11 @@ mismatch_cnt_show(mddev_t *mddev, char *
 		       (unsigned long long) mddev->resync_mismatches);
 }
 
-static struct md_sysfs_entry
-md_scan_mode = __ATTR(sync_action, S_IRUGO|S_IWUSR, action_show, action_store);
+static struct md_sysfs_entry md_scan_mode =
+__ATTR(sync_action, S_IRUGO|S_IWUSR, action_show, action_store);
 
 
-static struct md_sysfs_entry
-md_mismatches = __ATTR_RO(mismatch_cnt);
+static struct md_sysfs_entry md_mismatches = __ATTR_RO(mismatch_cnt);
 
 static ssize_t
 sync_min_show(mddev_t *mddev, char *page)
@@ -2728,8 +2728,7 @@ sync_speed_show(mddev_t *mddev, char *pa
 	return sprintf(page, "%ld\n", db/dt/2); /* K/sec */
 }
 
-static struct md_sysfs_entry
-md_sync_speed = __ATTR_RO(sync_speed);
+static struct md_sysfs_entry md_sync_speed = __ATTR_RO(sync_speed);
 
 static ssize_t
 sync_completed_show(mddev_t *mddev, char *page)
@@ -2745,8 +2744,7 @@ sync_completed_show(mddev_t *mddev, char
 	return sprintf(page, "%lu / %lu\n", resync, max_blocks);
 }
 
-static struct md_sysfs_entry
-md_sync_completed = __ATTR_RO(sync_completed);
+static struct md_sysfs_entry md_sync_completed = __ATTR_RO(sync_completed);
 
 static ssize_t
 suspend_lo_show(mddev_t *mddev, char *page)
@@ -5676,8 +5674,8 @@ static int set_ro(const char *val, struc
 	return -EINVAL;
 }
 
-module_param_call(start_ro, set_ro, get_ro, NULL, 0600);
-module_param(start_dirty_degraded, int, 0644);
+module_param_call(start_ro, set_ro, get_ro, NULL, S_IRUSR|S_IWUSR);
+module_param(start_dirty_degraded, int, S_IRUGO|S_IWUSR);
 
 
 EXPORT_SYMBOL(register_md_personality);

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 009 of 12] md: Require CAP_SYS_ADMIN for (re-)configuring md devices via sysfs.
  2006-06-27  7:05 [PATCH 000 of 12] md: Introduction NeilBrown
                   ` (7 preceding siblings ...)
  2006-06-27  7:05 ` [PATCH 008 of 12] md: Unify usage of symbolic names for perms NeilBrown
@ 2006-06-27  7:05 ` NeilBrown
  2006-06-27  7:05 ` [PATCH 010 of 12] md: Remove a variable that is now unused NeilBrown
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2006-06-27  7:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Chris Wright, linux-raid, linux-kernel



The ioctl requires CAP_SYS_ADMIN, so sysfs should too.
Note that we don't require CAP_SYS_ADMIN for reading
attributes even though the ioctl does.  There is no reason
to limit the read access, and much of the information is
already available via /proc/mdstat

cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c |    4 ++++
 1 file changed, 4 insertions(+)

diff .prev/drivers/md/md.c ./drivers/md/md.c
--- .prev/drivers/md/md.c	2006-06-27 12:17:33.000000000 +1000
+++ ./drivers/md/md.c	2006-06-27 12:17:33.000000000 +1000
@@ -1928,6 +1928,8 @@ rdev_attr_store(struct kobject *kobj, st
 
 	if (!entry->store)
 		return -EIO;
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
 	return entry->store(rdev, page, length);
 }
 
@@ -2861,6 +2863,8 @@ md_attr_store(struct kobject *kobj, stru
 
 	if (!entry->store)
 		return -EIO;
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
 	rv = mddev_lock(mddev);
 	if (!rv) {
 		rv = entry->store(mddev, page, length);

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 010 of 12] md: Remove a variable that is now unused.
  2006-06-27  7:05 [PATCH 000 of 12] md: Introduction NeilBrown
                   ` (8 preceding siblings ...)
  2006-06-27  7:05 ` [PATCH 009 of 12] md: Require CAP_SYS_ADMIN for (re-)configuring md devices via sysfs NeilBrown
@ 2006-06-27  7:05 ` NeilBrown
  2006-06-27  7:06 ` [PATCH 011 of 12] md: Fix "Will Configure" message when interpreting md= kernel parameter NeilBrown
  2006-06-27  7:06 ` [PATCH 012 of 12] md: Include sector number in messages about corrected read errors NeilBrown
  11 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2006-06-27  7:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel



Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/raid5.c |    1 -
 1 file changed, 1 deletion(-)

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2006-06-27 12:17:33.000000000 +1000
+++ ./drivers/md/raid5.c	2006-06-27 12:17:34.000000000 +1000
@@ -2846,7 +2846,6 @@ static inline sector_t sync_request(mdde
 	struct stripe_head *sh;
 	int pd_idx;
 	int raid_disks = conf->raid_disks;
-	int data_disks = raid_disks - conf->max_degraded;
 	sector_t max_sector = mddev->size << 1;
 	int sync_blocks;
 	int still_degraded = 0;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 011 of 12] md: Fix "Will Configure" message when interpreting md= kernel parameter.
  2006-06-27  7:05 [PATCH 000 of 12] md: Introduction NeilBrown
                   ` (9 preceding siblings ...)
  2006-06-27  7:05 ` [PATCH 010 of 12] md: Remove a variable that is now unused NeilBrown
@ 2006-06-27  7:06 ` NeilBrown
  2006-06-27  7:06 ` [PATCH 012 of 12] md: Include sector number in messages about corrected read errors NeilBrown
  11 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2006-06-27  7:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: H. Peter Anvin, linux-raid, linux-kernel


If a partitionable array is used, we should say e.g.
  Will configure md_d0 (super-block) from ....
rather than
  Will configure md0 (super-block) from ....
which implies non-partitionable.


Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./usr/kinit/do_mounts_md.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff .prev/usr/kinit/do_mounts_md.c ./usr/kinit/do_mounts_md.c
--- .prev/usr/kinit/do_mounts_md.c	2006-06-27 12:15:15.000000000 +1000
+++ ./usr/kinit/do_mounts_md.c	2006-06-27 12:17:34.000000000 +1000
@@ -205,8 +205,8 @@ static int md_setup(char *str)
 		pername = "super-block";
 	}
 
-	fprintf(stderr, "md: Will configure md%d (%s) from %s, below.\n",
-		minor, pername, str);
+	fprintf(stderr, "md: Will configure md%s%d (%s) from %s, below.\n",
+		partitioned?"_d":"", minor, pername, str);
 	md_setup_args[ent].device_names = str;
 	md_setup_args[ent].partitioned = partitioned;
 	md_setup_args[ent].minor = minor;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 012 of 12] md: Include sector number in messages about corrected read errors.
  2006-06-27  7:05 [PATCH 000 of 12] md: Introduction NeilBrown
                   ` (10 preceding siblings ...)
  2006-06-27  7:06 ` [PATCH 011 of 12] md: Fix "Will Configure" message when interpreting md= kernel parameter NeilBrown
@ 2006-06-27  7:06 ` NeilBrown
  11 siblings, 0 replies; 13+ messages in thread
From: NeilBrown @ 2006-06-27  7:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel


This is generally useful, but particularly helps see if it is
the same sector that always needs correcting, or different ones.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/raid1.c  |    3 +++
 ./drivers/md/raid10.c |    4 ++++
 ./drivers/md/raid5.c  |   30 +++++++++++++++++++++++-------
 3 files changed, 30 insertions(+), 7 deletions(-)

diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
--- .prev/drivers/md/raid1.c	2006-06-27 12:17:33.000000000 +1000
+++ ./drivers/md/raid1.c	2006-06-27 12:17:34.000000000 +1000
@@ -1509,6 +1509,9 @@ static void raid1d(mddev_t *mddev)
 									 s<<9, conf->tmppage, READ) == 0)
 								/* Well, this device is dead */
 								md_error(mddev, rdev);
+							else
+								printk(KERN_INFO "raid1:%s: read error corrected (%d sectors at %llu on %s)\n",
+								       mdname(mddev), s, sect + rdev->data_offset, bdevname(rdev->bdev, b));
 						}
 					}
 				} else {

diff .prev/drivers/md/raid10.c ./drivers/md/raid10.c
--- .prev/drivers/md/raid10.c	2006-06-27 12:15:15.000000000 +1000
+++ ./drivers/md/raid10.c	2006-06-27 12:17:34.000000000 +1000
@@ -1492,6 +1492,10 @@ static void raid10d(mddev_t *mddev)
 									 s<<9, conf->tmppage, READ) == 0)
 								/* Well, this device is dead */
 								md_error(mddev, rdev);
+							else
+								printk(KERN_INFO "raid10:%s: read error corrected (%d sectors at %llu on %s)\n",
+								       mdname(mddev), s, sect+rdev->data_offset, bdevname(rdev->bdev, b));
+
 							rdev_dec_pending(rdev, mddev);
 							rcu_read_lock();
 						}

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2006-06-27 12:17:34.000000000 +1000
+++ ./drivers/md/raid5.c	2006-06-27 12:17:34.000000000 +1000
@@ -524,6 +524,8 @@ static int raid5_end_read_request(struct
 	raid5_conf_t *conf = sh->raid_conf;
 	int disks = sh->disks, i;
 	int uptodate = test_bit(BIO_UPTODATE, &bi->bi_flags);
+	char b[BDEVNAME_SIZE];
+	mdk_rdev_t *rdev;
 
 	if (bi->bi_size)
 		return 1;
@@ -571,25 +573,39 @@ static int raid5_end_read_request(struct
 		set_bit(R5_UPTODATE, &sh->dev[i].flags);
 #endif
 		if (test_bit(R5_ReadError, &sh->dev[i].flags)) {
-			printk(KERN_INFO "raid5: read error corrected!!\n");
+			rdev = conf->disks[i].rdev;
+			printk(KERN_INFO "raid5:%s: read error corrected (%lu sectors at %llu on %s)\n",
+			       mdname(conf->mddev), STRIPE_SECTORS,
+			       (unsigned long long)sh->sector + rdev->data_offset,
+			       bdevname(rdev->bdev, b));
 			clear_bit(R5_ReadError, &sh->dev[i].flags);
 			clear_bit(R5_ReWrite, &sh->dev[i].flags);
 		}
 		if (atomic_read(&conf->disks[i].rdev->read_errors))
 			atomic_set(&conf->disks[i].rdev->read_errors, 0);
 	} else {
+		const char *bdn = bdevname(conf->disks[i].rdev->bdev, b);
 		int retry = 0;
+		rdev = conf->disks[i].rdev;
+
 		clear_bit(R5_UPTODATE, &sh->dev[i].flags);
-		atomic_inc(&conf->disks[i].rdev->read_errors);
+		atomic_inc(&rdev->read_errors);
 		if (conf->mddev->degraded)
-			printk(KERN_WARNING "raid5: read error not correctable.\n");
+			printk(KERN_WARNING "raid5:%s: read error not correctable (sector %llu on %s).\n",
+			       mdname(conf->mddev),
+			       (unsigned long long)sh->sector + rdev->data_offset,
+			       bdn);
 		else if (test_bit(R5_ReWrite, &sh->dev[i].flags))
 			/* Oh, no!!! */
-			printk(KERN_WARNING "raid5: read error NOT corrected!!\n");
-		else if (atomic_read(&conf->disks[i].rdev->read_errors)
+			printk(KERN_WARNING "raid5:%s: read error NOT corrected!! (sector %llu on %s).\n",
+			       mdname(conf->mddev),
+			       (unsigned long long)sh->sector + rdev->data_offset,
+			       bdn);
+		else if (atomic_read(&rdev->read_errors)
 			 > conf->max_nr_stripes)
 			printk(KERN_WARNING
-			       "raid5: Too many read errors, failing device.\n");
+			       "raid5:%s: Too many read errors, failing device %s.\n",
+			       mdname(conf->mddev), bdn);
 		else
 			retry = 1;
 		if (retry)
@@ -597,7 +613,7 @@ static int raid5_end_read_request(struct
 		else {
 			clear_bit(R5_ReadError, &sh->dev[i].flags);
 			clear_bit(R5_ReWrite, &sh->dev[i].flags);
-			md_error(conf->mddev, conf->disks[i].rdev);
+			md_error(conf->mddev, rdev);
 		}
 	}
 	rdev_dec_pending(conf->disks[i].rdev, conf->mddev);

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2006-06-27  7:06 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-06-27  7:05 [PATCH 000 of 12] md: Introduction NeilBrown
2006-06-27  7:05 ` [PATCH 001 of 12] md: Possible fix for unplug problem NeilBrown
2006-06-27  7:05 ` [PATCH 002 of 12] md: Set desc_nr correctly for version-1 superblocks NeilBrown
2006-06-27  7:05 ` [PATCH 003 of 12] md: Delay starting md threads until array is completely setup NeilBrown
2006-06-27  7:05 ` [PATCH 004 of 12] md: Fix resync speed calculation for restarted resyncs NeilBrown
2006-06-27  7:05 ` [PATCH 005 of 12] md: Fix a plug/unplug race in raid5 NeilBrown
2006-06-27  7:05 ` [PATCH 006 of 12] md: Fix some small races in bitmap plugging " NeilBrown
2006-06-27  7:05 ` [PATCH 007 of 12] md: Fix usage of wrong variable in raid1 NeilBrown
2006-06-27  7:05 ` [PATCH 008 of 12] md: Unify usage of symbolic names for perms NeilBrown
2006-06-27  7:05 ` [PATCH 009 of 12] md: Require CAP_SYS_ADMIN for (re-)configuring md devices via sysfs NeilBrown
2006-06-27  7:05 ` [PATCH 010 of 12] md: Remove a variable that is now unused NeilBrown
2006-06-27  7:06 ` [PATCH 011 of 12] md: Fix "Will Configure" message when interpreting md= kernel parameter NeilBrown
2006-06-27  7:06 ` [PATCH 012 of 12] md: Include sector number in messages about corrected read errors NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).