linux-raid.vger.kernel.org archive mirror
* [PATCH] imsm: FIX: Spare disk has wrong serial after takeover
@ 2011-09-15 16:38 Adam Kwolek
  2011-09-19  3:22 ` NeilBrown
  0 siblings, 1 reply; 3+ messages in thread
From: Adam Kwolek @ 2011-09-15 16:38 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, ed.ciechanowski, marcin.labun

Takeover marks the disk as failed, appends the string ':0' to its serial
number, and then turns it into a spare. As a result, when the new spare is
about to be used, it cannot be found because its serial number no longer
matches.

Restore the disk serial number to avoid this problem.

Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
---

 super-intel.c |   46 ++++++++++++++++++++++++++++++++++++----------
 1 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index a78d723..f1c924f 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -4407,6 +4407,37 @@ static int add_to_super_imsm_volume(struct supertype *st, mdu_disk_info_t *dk,
 	return 0;
 }
 
+/* mark_spare()
+ *   Function marks disk as spare and restores disk serial
+ *   in case it was previously marked as failed by takeover operation
+ * returns:
+ *   -1 : critical error
+ *    0 : disk is marked as spare but serial is not set
+ *    1 : success
+ */
+int mark_spare(struct dl *disk)
+{
+	__u8 serial[MAX_RAID_SERIAL_LEN];
+	int ret_val = -1;
+
+	if (!disk)
+		return ret_val;
+
+	ret_val = 0;
+	if (!imsm_read_serial(disk->fd, NULL, serial)) {
+		/* Restore disk serial number, because takeover marks disk
+		 * as failed and adds to serial ':0' before it becomes
+		 * a spare disk.
+		 */
+		serialcpy(disk->serial, serial);
+		serialcpy(disk->disk.serial, serial);
+		ret_val = 1;
+	}
+	disk->disk.status = SPARE_DISK;
+	disk->index = -1;
+
+	return ret_val;
+}
 
 static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk,
 			     int fd, char *devname)
@@ -4444,7 +4475,6 @@ static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk,
 	memset(dd, 0, sizeof(*dd));
 	dd->major = major(stb.st_rdev);
 	dd->minor = minor(stb.st_rdev);
-	dd->index = -1;
 	dd->devname = devname ? strdup(devname) : NULL;
 	dd->fd = fd;
 	dd->e = NULL;
@@ -4461,7 +4491,7 @@ static int add_to_super_imsm(struct supertype *st, mdu_disk_info_t *dk,
 	size /= 512;
 	serialcpy(dd->disk.serial, dd->serial);
 	dd->disk.total_blocks = __cpu_to_le32(size);
-	dd->disk.status = SPARE_DISK;
+	mark_spare(dd);
 	if (sysfs_disk_to_scsi_id(fd, &id) == 0)
 		dd->disk.scsi_id = __cpu_to_le32(id);
 	else
@@ -4504,9 +4534,8 @@ static int remove_from_super_imsm(struct supertype *st, mdu_disk_info_t *dk)
 	memset(dd, 0, sizeof(*dd));
 	dd->major = dk->major;
 	dd->minor = dk->minor;
-	dd->index = -1;
 	dd->fd = -1;
-	dd->disk.status = SPARE_DISK;
+	mark_spare(dd);
 	dd->action = DISK_REMOVE;
 
 	dd->next = super->disk_mgmt_list;
@@ -5424,10 +5453,8 @@ static int kill_subarray_imsm(struct supertype *st)
 		struct dl *d;
 
 		for (d = super->disks; d; d = d->next)
-			if (d->index > -2) {
-				d->index = -1;
-				d->disk.status = SPARE_DISK;
-			}
+			if (d->index > -2)
+				mark_spare(d);
 	}
 
 	super->updates_pending++;
@@ -7011,8 +7038,7 @@ static int apply_takeover_update(struct imsm_update_takeover *u,
 					if (du->index > idx)
 						du->index--;
 				/* mark as spare disk */
-				dm->disk.status = SPARE_DISK;
-				dm->index = -1;
+				mark_spare(dm);
 			}
 		}
 		/* update map */
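
For readers unfamiliar with the failure mode, here is a minimal,
self-contained sketch of it. This is not mdadm code: struct toy_disk, the
toy_* helpers, TOY_SERIAL_LEN and the exact way ':0' is appended are all
assumptions made for illustration, modelled only on the description in the
commit message. The return values of toy_mark_spare() mirror those
documented for mark_spare() in the patch.

```c
#include <string.h>

#define TOY_SERIAL_LEN 16 /* assumed length; stands in for MAX_RAID_SERIAL_LEN */

/* Hypothetical, heavily simplified stand-in for mdadm's struct dl. */
struct toy_disk {
	char serial[TOY_SERIAL_LEN + 1];    /* serial as recorded in metadata */
	char hw_serial[TOY_SERIAL_LEN + 1]; /* serial the device itself reports */
	int index;                          /* slot in the array, -1 for a spare */
	int is_spare;
};

/* Model of the takeover step: mark the disk failed and append ":0". */
static void toy_mark_failed(struct toy_disk *d)
{
	size_t len = strlen(d->serial);

	if (len + 2 <= TOY_SERIAL_LEN)
		memcpy(d->serial + len, ":0", 3); /* copies the NUL as well */
	d->is_spare = 0;
}

/*
 * Model of mark_spare(): restore the recorded serial from the device when
 * it can be read, then mark the disk as a spare.
 * Returns -1 on a NULL disk, 0 if the serial was not restored, 1 on success.
 */
static int toy_mark_spare(struct toy_disk *d, int serial_readable)
{
	int ret = 0;

	if (!d)
		return -1;
	if (serial_readable) {
		memcpy(d->serial, d->hw_serial, sizeof(d->serial));
		ret = 1;
	}
	d->is_spare = 1;
	d->index = -1;
	return ret;
}

/* Look a disk up by serial; returns its position or -1 if not found. */
static int toy_find_by_serial(const struct toy_disk *disks, size_t n,
			      const char *serial)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(disks[i].serial, serial) == 0)
			return (int)i;
	return -1;
}
```

In this model, a disk with serial "WD-1234" can no longer be found under
that name once toy_mark_failed() has run, and becomes findable again only
after toy_mark_spare() restores the serial.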



* Re: [PATCH] imsm: FIX: Spare disk has wrong serial after takeover
  2011-09-15 16:38 [PATCH] imsm: FIX: Spare disk has wrong serial after takeover Adam Kwolek
@ 2011-09-19  3:22 ` NeilBrown
  2011-09-21  6:05   ` Williams, Dan J
  0 siblings, 1 reply; 3+ messages in thread
From: NeilBrown @ 2011-09-19  3:22 UTC (permalink / raw)
  To: Adam Kwolek; +Cc: linux-raid, ed.ciechanowski, marcin.labun, Dan Williams


On Thu, 15 Sep 2011 18:38:39 +0200 Adam Kwolek <adam.kwolek@intel.com> wrote:

> Takeover marks the disk as failed, appends the string ':0' to its serial
> number, and then turns it into a spare. As a result, when the new spare is
> about to be used, it cannot be found because its serial number no longer
> matches.
> 
> Restore the disk serial number to avoid this problem.
> 
> Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>

This looks believable, and I really like the fact that you have factored out
'mark_spare' as a separate function rather than having something open-coded at
various points.

I can't really say if it is 'right' as I don't understand all the details of
exactly how serial numbers are supposed to work.
So I'll apply it and hope that Dan will speak up if he sees any problems.

Thanks,
NeilBrown






* Re: [PATCH] imsm: FIX: Spare disk has wrong serial after takeover
  2011-09-19  3:22 ` NeilBrown
@ 2011-09-21  6:05   ` Williams, Dan J
  0 siblings, 0 replies; 3+ messages in thread
From: Williams, Dan J @ 2011-09-21  6:05 UTC (permalink / raw)
  To: NeilBrown; +Cc: Adam Kwolek, linux-raid, ed.ciechanowski, marcin.labun

On Sun, Sep 18, 2011 at 8:22 PM, NeilBrown <neilb@suse.de> wrote:
> On Thu, 15 Sep 2011 18:38:39 +0200 Adam Kwolek <adam.kwolek@intel.com> wrote:
>
>> Takeover marks the disk as failed, appends the string ':0' to its serial
>> number, and then turns it into a spare. As a result, when the new spare is
>> about to be used, it cannot be found because its serial number no longer
>> matches.
>>
>> Restore the disk serial number to avoid this problem.
>>
>> Signed-off-by: Adam Kwolek <adam.kwolek@intel.com>
>
> This looks believable, and I really like the fact that you have factored out
> 'mark_spare' as a separate function rather than having something open-coded at
> various points.
>
> I can't really say if it is 'right' as I don't understand all the details of
> exactly how serial numbers are supposed to work.
> So I'll apply it and hope that Dan will speak up if he sees any problems.
>

The assembly process checks "in the 'best' super, can I look myself up
by serial number?"  So if the best super does not have a record of
your serial number then you are an "offline array member".  However, I
don't understand the failed-to-spare transition and why the takeover
process can't hide the intermediate failed state from being written to
the metadata.  Maybe that's hard to avoid, but temporarily recording a
fib in the metadata seems to leave a window for confusion.

--
Dan
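
The assembly check described in the message above can be sketched roughly
as follows. This is an illustration only, not the mdadm implementation:
struct toy_super, the toy_* functions and the constants are hypothetical,
and choosing the 'best' super by highest generation number is an
assumption made for this sketch.

```c
#include <stddef.h>
#include <string.h>

#define TOY_SERIAL_LEN 16 /* assumed serial length */
#define TOY_MAX_DISKS 4   /* assumed upper bound, for the sketch only */

/* Hypothetical view of one disk's copy of the metadata. */
struct toy_super {
	unsigned generation; /* assumed: newer metadata has a higher value */
	size_t num_disks;
	char serials[TOY_MAX_DISKS][TOY_SERIAL_LEN + 1];
};

/* Pick the 'best' super; here simply the one with the highest generation. */
static const struct toy_super *toy_pick_best(const struct toy_super *s,
					     size_t n)
{
	const struct toy_super *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (!best || s[i].generation > best->generation)
			best = &s[i];
	return best;
}

/*
 * A disk that cannot find its own serial in the best super is treated as
 * an offline array member. Returns 1 if offline, 0 if the serial is known.
 */
static int toy_is_offline_member(const struct toy_super *best,
				 const char *my_serial)
{
	if (!best)
		return 1;
	for (size_t i = 0; i < best->num_disks; i++)
		if (strncmp(best->serials[i], my_serial, TOY_SERIAL_LEN) == 0)
			return 0;
	return 1;
}
```

Under these assumptions, if the newest metadata records "BBB:0" while the
device still reports "BBB", the lookup fails and the disk is classed as
offline, which is the window for confusion mentioned above.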

