* Which physical device failed?
From: Michael Munger @ 2015-05-27 12:04 UTC
To: linux-raid
I have a degraded array (4 disks: sdb, sdc, sdd, sde in RAID 5). One of my
disks failed (sdc).
Is there an easy way to figure out which PHYSICAL disk that is? I have
always just unplugged the SATA cables one at a time to map it, but it
occurs to me that the drives may be enumerated in order (SATA port 0 is sda,
SATA port 1 is sdb, etc.).
Or does the OS have access to serial numbers, etc.?
I have to guide someone through a drive replacement over the phone, and it
would be great if I could tell them exactly which drive to swap out...
Thanks in advance,
Michael
* Re: Which physical device failed?
From: Carsten Aulbert @ 2015-05-27 12:10 UTC
To: Michael Munger, linux-raid
Hi
On 05/27/2015 02:04 PM, Michael Munger wrote:
> Or, does the OS have access to serial numbers, etc...?
>
> I have to guide someone through a drive replacement on the phone, and it
> would be great if I could tell them exactly which drive to swap out...
If you know which serial number sits where, you could run
hdparm -I /dev/sdX or smartctl -a /dev/sdX against the still-reachable
drives.
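Here is a hedged Python sketch of scripting that check. It runs `smartctl -i` (the identity subset of `smartctl -a`) and extracts the serial. It assumes the report contains either a `Serial Number:` line (smartctl, `hdparm -I`) or a `SerialNo=` field (`hdparm -i`). The helper names are made up for illustration, and querying /dev/sdX usually needs root.

```python
import re
import subprocess
from typing import Optional

# hdparm -i prints "... SerialNo=W1F4JWRP"; hdparm -I and smartctl print
# "Serial Number:    W1F4JWRP". Capture the token after either label.
_SERIAL_RE = re.compile(r"(?:SerialNo=|Serial Number:)\s*([^\s,]+)")


def parse_serial(report: str) -> Optional[str]:
    """Return the first serial number found in a text report, or None."""
    match = _SERIAL_RE.search(report)
    return match.group(1) if match else None


def query_serial(device: str) -> Optional[str]:
    """Run `smartctl -i <device>` (usually needs root) and parse its serial."""
    result = subprocess.run(
        ["smartctl", "-i", device], capture_output=True, text=True, check=False
    )
    return parse_serial(result.stdout)
```

For example, `query_serial("/dev/sdb")` returns the serial as a string, or None if none was reported. If smartctl is not installed, `subprocess.run` raises FileNotFoundError.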
Possibly the easiest approach, if each drive has its own activity LED, is
to *read* data from the still-reachable disks (dd if=/dev/sdX of=/dev/null)
and check which LED no longer blinks.
Otherwise, someone needs to open the case and look at the drives to find
the serial numbers.
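A rough Python equivalent of the dd trick follows, as a sketch. Unlike dd, it reads sequentially only for a fixed time instead of over the whole disk, so the LED stays busy just while someone is watching. The function name and the 1 MiB chunk size are arbitrary choices, not something from this thread.

```python
import time


def blink_by_reading(path: str, seconds: float = 10.0, chunk: int = 1 << 20) -> int:
    """Read `path` sequentially for roughly `seconds` seconds; return bytes read.

    Like `dd if=/dev/sdX of=/dev/null`, this keeps the drive busy (and its
    activity LED blinking), but it stops after a time limit or at end of data.
    """
    total = 0
    deadline = time.monotonic() + seconds
    # buffering=0 gives a raw file object, so each read() is one system call.
    with open(path, "rb", buffering=0) as dev:
        while time.monotonic() < deadline:
            data = dev.read(chunk)
            if not data:  # reached the end of the device or file
                break
            total += len(data)
    return total
```

Usage, as root: `blink_by_reading("/dev/sdb", 30)`, then repeat for each surviving member. The bay whose LED stays dark throughout should hold the failed drive.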
Does this help?
cheers
Carsten
* Re: Which physical device failed?
From: Roman Mamedov @ 2015-05-27 12:27 UTC
To: Carsten Aulbert; +Cc: Michael Munger, linux-raid
On Wed, 27 May 2015 14:10:03 +0200
Carsten Aulbert <Carsten.Aulbert@aei.mpg.de> wrote:
> On 05/27/2015 02:04 PM, Michael Munger wrote:
> > Or, does the OS have access to serial numbers, etc...?
> >
> > I have to guide someone through a drive replacement on the phone, and it
> > would be great if I could tell them exactly which drive to swap out...
>
> If you have direct knowledge, which serial number is where, you could
> use hdparm -I /dev/sdX or smartctl -a /dev/sdX against the still
> reachable drives.
If /dev/sdc is still present in the system (even if not responding correctly to
hdparm or smartctl anymore), you should be able to find its serial number from
the udev symlink that was registered earlier, by running e.g.:
ls -la /dev/disk/by-id/ | grep sdc$
The serial number is typically the last part of the ID, after the
manufacturer name and model number.
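The same lookup as the ls | grep pipeline can be sketched in Python. This assumes udev still has the old symlinks in place; if they were removed along with the device, nothing is found. Like `grep sdc$`, it matches only the whole-disk links, not the -partN ones. The function name is hypothetical.

```python
import os
from typing import List


def by_id_names(kernel_name: str, by_id_dir: str = "/dev/disk/by-id") -> List[str]:
    """Return by-id link names whose target is exactly <kernel_name> (e.g. "sdc").

    Partition links such as ...-part1 point at sdc1, so, as with `grep sdc$`,
    only the whole-disk links are returned.
    """
    matches = []
    for entry in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, entry)
        if not os.path.islink(link):
            continue
        # udev creates relative links like "../../sdc"; compare the last part.
        if os.path.basename(os.readlink(link)) == kernel_name:
            matches.append(entry)
    return matches
```

For example, `by_id_names("sdc")` lists the ata-... and wwn-... names for that disk, if they still exist.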
--
With respect,
Roman
* Re: Which physical device failed?
From: Phil Turmel @ 2015-05-27 13:16 UTC
To: Roman Mamedov, Carsten Aulbert; +Cc: Michael Munger, linux-raid
On 05/27/2015 08:27 AM, Roman Mamedov wrote:
> On Wed, 27 May 2015 14:10:03 +0200
> Carsten Aulbert <Carsten.Aulbert@aei.mpg.de> wrote:
>
>> On 05/27/2015 02:04 PM, Michael Munger wrote:
>>> Or, does the OS have access to serial numbers, etc...?
>>>
>>> I have to guide someone through a drive replacement on the phone, and it
>>> would be great if I could tell them exactly which drive to swap out...
>>
>> If you have direct knowledge, which serial number is where, you could
>> use hdparm -I /dev/sdX or smartctl -a /dev/sdX against the still
>> reachable drives.
>
> If /dev/sdc is still present in the system (even if not responding correctly to
> hdparm or smartctl anymore), you should be able to find its serial number from
> the udev symlink that was registered earlier, by running e.g.:
>
> ls -la /dev/disk/by-id/ | grep sdc$
>
> Serial number is typically the last piece of the ID, after the manufacturer
> name and model number.
>
This is one of the reasons I wrote lsdrv [1], especially after I noticed
that the port sequence it reports is stable for the various ports on
every mobo and SATA expansion card I've handled. Per controller, at least.
I save a copy of an lsdrv report for each system I commission so that
there's no ambiguity later.
Phil
[1] https://github.com/pturmel/lsdrv
* Re: Which physical device failed?
From: Michael Munger @ 2015-05-27 14:24 UTC
To: Phil Turmel, Roman Mamedov, Carsten Aulbert; +Cc: linux-raid
Phil!
lsdrv did the trick. Roman and Carsten were correct, and I was in the
middle of executing their suggestions when I got your email.
Running lsdrv showed me that the drive was, in fact, still present, just
inactive. I removed it from the array and re-added it, and it is now
rebuilding.
I have sent the output of lsdrv to the client with the note: "Keep this
for your records. At some point, a drive will fail, and we'll use this
to figure out which drive you need to replace."
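One possible way to use such a record later is sketched below, under stated assumptions. Keep a mapping from serial number to a physical location label (the labels in the example are hypothetical) and compare it against the serials currently visible. As this thread shows, a drive that is still present but has dropped out of the array will not show up as missing, so the array status (e.g. /proc/mdstat) still has to be checked.

```python
from typing import Dict, Iterable, List


def missing_drives(record: Dict[str, str], present_serials: Iterable[str]) -> List[str]:
    """Return location labels of recorded drives whose serial is no longer seen.

    `record` maps serial -> physical label, written down at commissioning time;
    `present_serials` is gathered from the live system (by-id names, smartctl).
    """
    present = set(present_serials)
    return [record[serial] for serial in sorted(record) if serial not in present]
```

For example, with the hypothetical record `{"W1F4JWRP": "bay 2", "W1F50K0F": "bay 1"}` and only `W1F50K0F` visible, the function returns `["bay 2"]`.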
Thank you all.
On 05/27/2015 09:16 AM, Phil Turmel wrote:
> On 05/27/2015 08:27 AM, Roman Mamedov wrote:
>> On Wed, 27 May 2015 14:10:03 +0200
>> Carsten Aulbert <Carsten.Aulbert@aei.mpg.de> wrote:
>>
>>> On 05/27/2015 02:04 PM, Michael Munger wrote:
>>>> Or, does the OS have access to serial numbers, etc...?
>>>>
>>>> I have to guide someone through a drive replacement on the phone, and it
>>>> would be great if I could tell them exactly which drive to swap out...
>>> If you have direct knowledge, which serial number is where, you could
>>> use hdparm -I /dev/sdX or smartctl -a /dev/sdX against the still
>>> reachable drives.
>> If /dev/sdc is still present in the system (even if not responding correctly to
>> hdparm or smartctl anymore), you should be able to find its serial number from
>> the udev symlink that was registered earlier, by running e.g.:
>>
>> ls -la /dev/disk/by-id/ | grep sdc$
>>
>> Serial number is typically the last piece of the ID, after the manufacturer
>> name and model number.
>>
> This is one of the reasons I wrote lsdrv [1], especially after I noticed
> that the port sequence it reports is stable for the various ports on
> every mobo and sata expansion card I've handled. Per controller, at least.
>
> I save of copy of an lsdrv report for each system I commission so that
> there's no ambiguity later.
>
> Phil
>
> [1] https://github.com/pturmel/lsdrv
>
* Re: Which physical device failed?
From: Wols Lists @ 2015-05-27 17:48 UTC
To: Roman Mamedov, Carsten Aulbert; +Cc: Michael Munger, linux-raid
On 27/05/15 13:27, Roman Mamedov wrote:
> On Wed, 27 May 2015 14:10:03 +0200 Carsten Aulbert
> <Carsten.Aulbert@aei.mpg.de> wrote:
>
>> On 05/27/2015 02:04 PM, Michael Munger wrote:
>>> Or, does the OS have access to serial numbers, etc...?
>>>
>>> I have to guide someone through a drive replacement on the
>>> phone, and it would be great if I could tell them exactly which
>>> drive to swap out...
>>
>> If you have direct knowledge, which serial number is where, you
>> could use hdparm -I /dev/sdX or smartctl -a /dev/sdX against the
>> still reachable drives.
>
> If /dev/sdc is still present in the system (even if not responding
> correctly to hdparm or smartctl anymore), you should be able to
> find its serial number from the udev symlink that was registered
> earlier, by running e.g.:
>
> ls -la /dev/disk/by-id/ | grep sdc$
>
> Serial number is typically the last piece of the ID, after the
> manufacturer name and model number.
>
Just for info, I've done an ls -al on my by-id directory, and I have
no clue whatsoever as to what the serial number is. All my drives
(Seagate Barracudas) appear twice, there is no manufacturer name that I
can see, and while the model number appears in one of the records,
nothing obviously indicates whether what follows it is the serial
number or whether the serial number is part of the other directory
entry.
Not helped, of course, by the fact I have no clue what the serial
number looks like ... :-)
ashdown by-id # ls -la
total 0
drwxr-xr-x 2 root root 620 May 27 08:10 .
drwxr-xr-x 7 root root 140 May 27 08:10 ..
lrwxrwxrwx 1 root root 9 May 27 08:10 ata-Optiarc_DVD_RW_AD-7241S -> ../../sr0
lrwxrwxrwx 1 root root 9 May 27 08:10 ata-ST3000DM001-1CH166_W1F4JWRP -> ../../sdb
lrwxrwxrwx 1 root root 10 May 27 08:10 ata-ST3000DM001-1CH166_W1F4JWRP-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 May 27 08:10 ata-ST3000DM001-1CH166_W1F4JWRP-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 May 27 08:10 ata-ST3000DM001-1CH166_W1F4JWRP-part3 -> ../../sdb3
lrwxrwxrwx 1 root root 10 May 27 08:10 ata-ST3000DM001-1CH166_W1F4JWRP-part4 -> ../../sdb4
lrwxrwxrwx 1 root root 10 May 27 08:10 ata-ST3000DM001-1CH166_W1F4JWRP-part5 -> ../../sdb5
lrwxrwxrwx 1 root root 9 May 27 08:10 ata-ST3000DM001-1CH166_W1F50K0F -> ../../sda
lrwxrwxrwx 1 root root 10 May 27 08:10 ata-ST3000DM001-1CH166_W1F50K0F-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 May 27 08:10 ata-ST3000DM001-1CH166_W1F50K0F-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 May 27 08:10 ata-ST3000DM001-1CH166_W1F50K0F-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 May 27 08:10 ata-ST3000DM001-1CH166_W1F50K0F-part4 -> ../../sda4
lrwxrwxrwx 1 root root 10 May 27 08:10 ata-ST3000DM001-1CH166_W1F50K0F-part5 -> ../../sda5
lrwxrwxrwx 1 root root 11 May 27 08:10 md-name-ashdown:0 -> ../../md126
lrwxrwxrwx 1 root root 11 May 27 08:10 md-uuid-39b62a86:885bf50d:33f360cf:a409bd11 -> ../../md126
lrwxrwxrwx 1 root root 11 May 27 08:10 md-uuid-42514e8a:2d127c98:7c2f52fe:60835b32 -> ../../md127
lrwxrwxrwx 1 root root 11 May 27 08:10 md-uuid-69270eac:a840f6e7:0199064b:d5863c5d -> ../../md125
lrwxrwxrwx 1 root root 9 May 27 08:10 wwn-0x5000c50072af4400 -> ../../sdb
lrwxrwxrwx 1 root root 10 May 27 08:10 wwn-0x5000c50072af4400-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 May 27 08:10 wwn-0x5000c50072af4400-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 May 27 08:10 wwn-0x5000c50072af4400-part3 -> ../../sdb3
lrwxrwxrwx 1 root root 10 May 27 08:10 wwn-0x5000c50072af4400-part4 -> ../../sdb4
lrwxrwxrwx 1 root root 10 May 27 08:10 wwn-0x5000c50072af4400-part5 -> ../../sdb5
lrwxrwxrwx 1 root root 9 May 27 08:10 wwn-0x5000c500737a98a4 -> ../../sda
lrwxrwxrwx 1 root root 10 May 27 08:10 wwn-0x5000c500737a98a4-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 May 27 08:10 wwn-0x5000c500737a98a4-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 May 27 08:10 wwn-0x5000c500737a98a4-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 May 27 08:10 wwn-0x5000c500737a98a4-part4 -> ../../sda4
lrwxrwxrwx 1 root root 10 May 27 08:10 wwn-0x5000c500737a98a4-part5 -> ../../sda5
Cheers,
Wol
* Re: Which physical device failed?
From: Roman Mamedov @ 2015-05-27 18:02 UTC
To: Wols Lists; +Cc: Carsten Aulbert, Michael Munger, linux-raid
On Wed, 27 May 2015 18:48:35 +0100
Wols Lists <antlists@youngman.org.uk> wrote:
> Just for info, I've done an ls -al on my by-id directory, and I have
> no clue whatsoever as to what the serial number is. All my drives
> appear twice (Seagate Barracudas), there is no manufacturer name that
> I can see, and while the model number appears in one of the records,
> there is nothing obvious to indicate whether what follows is the
> serial number or whether the serial number is part of the other
> directory entry.
>
> Not helped, of course, by the fact I have no clue what the serial
> number looks like ... :-)
>
> ata-ST3000DM001-1CH166_W1F4JWRP -> ../../sdb
> ata-ST3000DM001-1CH166_W1F50K0F -> ../../sda
A picture of this model's top side
http://www.nix.ru/autocatalog/hdd_seagate/126689_2245_draft_large.jpg
shows that the serial number on this model is 8 alphanumeric characters,
so in your case W1F4JWRP and W1F50K0F are the serial numbers.
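The small sketch below applies this rule to a by-id name. It strips the ata- prefix and any -partN suffix, then treats everything after the last underscore as the serial. This is a heuristic assumption, not a documented guarantee of the udev naming format. For a link that has no serial, such as the Optiarc DVD drive in the listing above, it would wrongly return the last word of the model.

```python
import re
from typing import Optional, Tuple

_PART_SUFFIX = re.compile(r"-part\d+$")


def split_ata_id(link_name: str) -> Optional[Tuple[str, str]]:
    """Split 'ata-<model>_<serial>[-partN]' into (model, serial), heuristically."""
    if not link_name.startswith("ata-"):
        return None  # wwn-, md-uuid- and similar links carry no model/serial pair
    name = _PART_SUFFIX.sub("", link_name[len("ata-"):])
    # Spaces in the model become underscores, so split at the *last* underscore.
    model, sep, serial = name.rpartition("_")
    if not sep:
        return None  # no underscore at all: nothing to split
    return model, serial
```

For example, `split_ata_id("ata-ST3000DM001-1CH166_W1F4JWRP")` gives `("ST3000DM001-1CH166", "W1F4JWRP")`.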
--
With respect,
Roman
* Re: Which physical device failed?
From: Wilson, Jonathan @ 2015-05-27 18:16 UTC
To: Phil Turmel; +Cc: Roman Mamedov, Carsten Aulbert, Michael Munger, linux-raid
On Wed, 2015-05-27 at 09:16 -0400, Phil Turmel wrote:
> On 05/27/2015 08:27 AM, Roman Mamedov wrote:
> > On Wed, 27 May 2015 14:10:03 +0200
> > Carsten Aulbert <Carsten.Aulbert@aei.mpg.de> wrote:
> >
> >> On 05/27/2015 02:04 PM, Michael Munger wrote:
> >>> Or, does the OS have access to serial numbers, etc...?
> >>>
> >>> I have to guide someone through a drive replacement on the phone, and it
> >>> would be great if I could tell them exactly which drive to swap out...
> >>
> >> If you have direct knowledge, which serial number is where, you could
> >> use hdparm -I /dev/sdX or smartctl -a /dev/sdX against the still
> >> reachable drives.
> >
> > If /dev/sdc is still present in the system (even if not responding correctly to
> > hdparm or smartctl anymore), you should be able to find its serial number from
> > the udev symlink that was registered earlier, by running e.g.:
> >
> > ls -la /dev/disk/by-id/ | grep sdc$
> >
> > Serial number is typically the last piece of the ID, after the manufacturer
> > name and model number.
> >
>
> This is one of the reasons I wrote lsdrv [1], especially after I noticed
> that the port sequence it reports is stable for the various ports on
> every mobo and sata expansion card I've handled. Per controller, at least.
Interesting that you should say that, as on my Z97 board, if I do a power
off and power on, the drives do indeed stay numbered to the SATA ports.
However, if I do a "restart", sometimes (very rarely) the drives are
listed with different sdX designations. It may be a quirk of the EFI
firmware, of Linux, or of the fact that the drives are not, I believe,
powered off during a restart. I didn't investigate why; I just noticed
that two drives had swapped between two arrays (sdb moved from a raid10
into the raid6, and sdc moved from the raid6 into the raid10). That
scared the heck out of me until I realised it was just the sdX names that
had changed, not the drives, so for one minute I was expecting massive
problems to ensue.
>
> I save of copy of an lsdrv report for each system I commission so that
> there's no ambiguity later.
>
> Phil
>
> [1] https://github.com/pturmel/lsdrv
>
* Re: Which physical device failed?
From: Can Jeuleers @ 2015-05-27 18:19 UTC
To: Roman Mamedov, Wols Lists; +Cc: Carsten Aulbert, Michael Munger, linux-raid
On 27/05/15 20:02, Roman Mamedov wrote:
>> ata-ST3000DM001-1CH166_W1F4JWRP -> ../../sdb
>> ata-ST3000DM001-1CH166_W1F50K0F -> ../../sda
>
> From a picture of this model's top side
> http://www.nix.ru/autocatalog/hdd_seagate/126689_2245_draft_large.jpg
> the serial number on this model is 8 alphanumeric characters, so in your case
> W1F4JWRP and W1F50K0F are the serial numbers.
Indeed, and hdparm -i will also tell you.
* Re: Which physical device failed?
From: Phil Turmel @ 2015-05-27 18:38 UTC
To: Wilson, Jonathan
Cc: Roman Mamedov, Carsten Aulbert, Michael Munger, linux-raid
Hi Jonathan,
On 05/27/2015 02:16 PM, Wilson, Jonathan wrote:
> On Wed, 2015-05-27 at 09:16 -0400, Phil Turmel wrote:
>> This is one of the reasons I wrote lsdrv [1], especially after I noticed
>> that the port sequence it reports is stable for the various ports on
>> every mobo and sata expansion card I've handled. Per controller, at least.
>
> Interesting that you should say that as on my z97 board if I do a power
> off, power on, the drives do indeed stay numbered to the sata ports...
> however if I do a "restart" sometimes, very rarely, the drives are
> listed with different sdX designations. It may be a quirk of either the
> efi, linux, or the fact the drives are not, I believe, turned off during
> a restart which may impact on designation. I didn't investigate the whys
> as I just noticed that two drives had swapped in two arrays (sdb moved
> from a raid10 into the raid6 and that sdc moved from the raid6 into the
> raid10) which scared the heck out of me until I realised that it was
> just the sdX that had changed not the drives so for one minute I was
> expecting massive problems to ensue.
I didn't say the names are consistent. In fact, your experience is
entirely normal with modern kernels' device discovery. The names come
out the same for many people by chance (timing, interrupts, whatever),
but a new kernel might behave slightly differently, and then the names change.
My comment was referring to the SCSI LUNs "N:P:Q:R" that appear under
each controller in lsdrv. These correspond to the hostN/targetP:Q:R
folders in sysfs. P:Q:R appears to reliably correspond to physical
ports. Sometimes with phantom ports, but reliably so. Which is why
lsdrv shows them in order, even if empty. For the controllers I've
played with so far, that is. Consider labeling your cables with the
mobo or adapter's silkscreened port ID and the corresponding P:Q:R string.
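To show where such an N:P:Q:R string comes from, this sketch resolves /sys/block/<name> and keeps the last path component made of four colon-separated numbers (host:channel:target:lun). The example path in the usage note assumes a typical libata layout and is illustrative only. It is not taken from any system in this thread.

```python
import os
import re
from typing import Optional

# host:channel:target:lun, e.g. "2:0:0:0"
_SCSI_ADDR = re.compile(r"^\d+:\d+:\d+:\d+$")


def scsi_address(sysfs_path: str) -> Optional[str]:
    """Return the last N:P:Q:R component of a resolved sysfs device path."""
    for part in reversed(sysfs_path.split("/")):
        if _SCSI_ADDR.match(part):
            return part
    return None


def scsi_address_of(kernel_name: str) -> Optional[str]:
    """Resolve the /sys/block/<kernel_name> symlink and extract its address."""
    return scsi_address(os.path.realpath(os.path.join("/sys/block", kernel_name)))
```

For example, `scsi_address("/sys/devices/pci0000:00/0000:00:1f.2/ata3/host2/target2:0:0/2:0:0:0/block/sdc")` returns `"2:0:0:0"`. Virtual devices such as md arrays have no such component, so the function returns None for them.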
Anyway, MD uses the superblock metadata to make sure array members are
properly assembled regardless of what name they have at any moment. LVM
does so as well. The kernel names shown in an mdadm --detail report
cannot be trusted across boots or between kernel versions.
If you are using /dev/sdX names in fstab or mdadm.conf, you may be
surprised by a boot failure at some point.
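The sketch below is one quick way to act on that warning. It scans fstab or mdadm.conf text for kernel-name references such as /dev/sda3. The regex is an assumption that covers only sd, hd and vd names. It ignores anything after #, and it deliberately does not flag /dev/disk/by-id paths, UUID= entries or LABEL= entries.

```python
import re
from typing import List, Tuple

# Kernel names like /dev/sda, /dev/sdb3 or /dev/vdc1 can change between boots.
_KERNEL_NAME = re.compile(r"/dev/(?:sd|hd|vd)[a-z]+\d*\b")


def unstable_refs(config_text: str) -> List[Tuple[int, str]]:
    """Return (line number, device) for each kernel-name device reference."""
    hits = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # ignore comments
        for match in _KERNEL_NAME.finditer(code):
            hits.append((lineno, match.group(0)))
    return hits
```

For example, run it on the contents of /etc/fstab. Each hit is a line worth switching to a UUID= or /dev/disk/by-id reference.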
Phil
* Re: Which physical device failed?
From: Wols Lists @ 2015-05-27 18:38 UTC
To: Can Jeuleers, Roman Mamedov; +Cc: Carsten Aulbert, Michael Munger, linux-raid
On 27/05/15 19:19, Can Jeuleers wrote:
> On 27/05/15 20:02, Roman Mamedov wrote:
>>> ata-ST3000DM001-1CH166_W1F4JWRP -> ../../sdb
>>> ata-ST3000DM001-1CH166_W1F50K0F -> ../../sda
>>
>> From a picture of this model's top side
>> http://www.nix.ru/autocatalog/hdd_seagate/126689_2245_draft_large.jpg
>> the serial number on this model is 8 alphanumeric characters, so in your case
>> W1F4JWRP and W1F50K0F are the serial numbers.
>
> Indeed, and hdparm -i will also tell you.
>
ashdown by-id # hdparm -i /dev/sda
bash: hdparm: command not found
:-)
Just emerged it and yes, lots of useful info thanks.
Cheers,
Wol
* Re: Which physical device failed?
From: Benjamin ESTRABAUD @ 2015-05-27 18:41 UTC
To: Can Jeuleers, Roman Mamedov, Wols Lists
Cc: Carsten Aulbert, Michael Munger, linux-raid
On 27/05/15 19:19, Can Jeuleers wrote:
> On 27/05/15 20:02, Roman Mamedov wrote:
>>> ata-ST3000DM001-1CH166_W1F4JWRP -> ../../sdb
>>> ata-ST3000DM001-1CH166_W1F50K0F -> ../../sda
>>
>> From a picture of this model's top side
>> http://www.nix.ru/autocatalog/hdd_seagate/126689_2245_draft_large.jpg
>> the serial number on this model is 8 alphanumeric characters, so in your case
>> W1F4JWRP and W1F50K0F are the serial numbers.
>
> Indeed, and hdparm -i will also tell you.
Or you could use the excellent "sg_inq" tool from the sg3_utils package
by Douglas Gilbert to get the serial number, model and more.
* Re: Which physical device failed?
From: Wilson, Jonathan @ 2015-05-28 9:03 UTC
To: Phil Turmel; +Cc: Roman Mamedov, Carsten Aulbert, Michael Munger, linux-raid
On Wed, 2015-05-27 at 14:38 -0400, Phil Turmel wrote:
> Hi Jonathan,
>
> On 05/27/2015 02:16 PM, Wilson, Jonathan wrote:
> > On Wed, 2015-05-27 at 09:16 -0400, Phil Turmel wrote:
> >> This is one of the reasons I wrote lsdrv [1], especially after I noticed
> >> that the port sequence it reports is stable for the various ports on
> >> every mobo and sata expansion card I've handled. Per controller, at least.
> >
> > Interesting that you should say that as on my z97 board if I do a power
> > off, power on, the drives do indeed stay numbered to the sata ports...
> > however if I do a "restart" sometimes, very rarely, the drives are
> > listed with different sdX designations. It may be a quirk of either the
> > efi, linux, or the fact the drives are not, I believe, turned off during
> > a restart which may impact on designation. I didn't investigate the whys
> > as I just noticed that two drives had swapped in two arrays (sdb moved
> > from a raid10 into the raid6 and that sdc moved from the raid6 into the
> > raid10) which scared the heck out of me until I realised that it was
> > just the sdX that had changed not the drives so for one minute I was
> > expecting massive problems to ensue.
>
> I didn't say the names are consistent--in fact, your experience is
> entirely normal with modern kernel's device discovery. The names come
> out the same for many people by chance (timing, interrupts, whatever).
> But a new kernel might have slight differences, and then the names change.
My mistake; I misinterpreted what you said.
>
> My comment was referring to the SCSI LUNs "N:P:Q:R" that appear under
> each controller in lsdrv. These correspond to the hostN/targetP:Q:R
> folders in sysfs. P:Q:R appears to reliably correspond to physical
> ports. Sometimes with phantom ports, but reliably so. Which is why
> lsdrv shows them in order, even if empty. For the controllers I've
> played with so far, that is. Consider labeling your cables with the
> mobo or adapter's silkscreened port ID and the corresponding P:Q:R string.
I did something similar, labeling them "card/port": A-[1-6] (main CPU
SATA, port), B-[1-2] (Marvell on-board, port), C-[1-4] (JBOD Marvell
card, port).
The main reason was that, unlike on older boards, for some strange reason
(circuit paths, I guess) the ordering of the physical SATA port sockets
bears no relation to the sequence.
>
> Anyways, MD uses the superblock metadata to make sure array members are
> properly assembled regardless what name they have at any moment. LVM
> does so as well. The mdadm --detail report that shows kernel names
> cannot be trusted across boots or between kernel versions.
>
> If you are using /dev/sdX names in fstab or mdadm.conf, you may be
> surprised by a boot failure at some point.
Luckily I have always used GUIDs, but after 4 OS upgrades and many years
of use I had never once seen the sdX names stop following chip/port
order, so it came as quite a shock, even though I knew sdX names cannot
be trusted to remain consistent. The only time I had seen them shift or
swap was when, say, a USB device grabbed sda and the rest shifted one
letter up.
>
> Phil
Thread overview: 13+ messages
2015-05-27 12:04 Which physical device failed? Michael Munger
2015-05-27 12:10 ` Carsten Aulbert
2015-05-27 12:27 ` Roman Mamedov
2015-05-27 13:16 ` Phil Turmel
2015-05-27 14:24 ` Michael Munger
2015-05-27 18:16 ` Wilson, Jonathan
2015-05-27 18:38 ` Phil Turmel
2015-05-28 9:03 ` Wilson, Jonathan
2015-05-27 17:48 ` Wols Lists
2015-05-27 18:02 ` Roman Mamedov
2015-05-27 18:19 ` Can Jeuleers
2015-05-27 18:38 ` Wols Lists
2015-05-27 18:41 ` Benjamin ESTRABAUD