* upgrading LSI SAS9211-8i fw IR->IT
@ 2017-10-24 11:47 Eyal Lebedinsky
2017-10-24 12:14 ` Roman Mamedov
2017-10-26 3:31 ` [sucess?] " Eyal Lebedinsky
0 siblings, 2 replies; 16+ messages in thread
From: Eyal Lebedinsky @ 2017-10-24 11:47 UTC (permalink / raw)
To: list linux-raid
[-- Attachment #1: Type: text/plain, Size: 1991 bytes --]
[This is a resend, as plain text and with reduced size attachment...]
Following some excitement with my 4yo+ controller I acquired a new one.
I now want to upgrade the fw from IR to IT.
I did some reading, which suggests the process is simple and straightforward - if I am lucky.
The issue seems to be that the flashing program does not run on all mobos.
I think that my server (Intel BOXDH77KC) is not booting UEFI.
Anyway, I plan to do the upgrade elsewhere, probably on my workstation (Gigabyte GA-G33M-DS2R).
Both are rather old and will be upgraded within a year. I decided to give it a test.
I disconnected the (only) disk it has and installed the LSI controller.
[No problem here except that using the on-board VGA video (until now I used an add-on video card)
I see that the letters q-z do not show properly in FreeDOS, but are OK during POST]
I started reading here:
https://forums.servethehome.com/index.php?threads/tutorial-updating-ibm-m1015-lsi-9211-8i-firmware-on-uefi-systems.11462/
where one needs to switch between UEFI and DOS mode. Probably unsuitable for me.
I then proceeded here
http://brycv.com/blog/2012/flashing-it-firmware-to-lsi-sas9211-8i/
where the blog mentions some hurdles along the way.
I used files from two packages:
LSI-9211-8i.zip
9211-8i_Package_P20_IR_IT_FW_BIOS_for_MSDOS_Windows.zip
Booted OK from a FreeDOS 1.2 USB disk. Ran 'sas2flsh.exe -list' and got the attached screen.
Seems to me that this worked. At least I did not get the
ERROR: Failed to initialize PAL. Exiting program.
message, in which case I would need to use the efi mode flasher.
Q1) Does this mean that I am clear to proceed?
Now I am ready to do a flash erase followed by a flash program.
However, LSI warns that a failure after the erase and before the program will leave a dead (unrecoverable) card.
Q2) How do I check that I can safely do BOTH steps?
TIA
--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
[-- Attachment #2: 20171024_145524-small.jpg --]
[-- Type: image/jpeg, Size: 39991 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: upgrading LSI SAS9211-8i fw IR->IT
2017-10-24 11:47 upgrading LSI SAS9211-8i fw IR->IT Eyal Lebedinsky
@ 2017-10-24 12:14 ` Roman Mamedov
2017-10-24 13:04 ` Eyal Lebedinsky
2017-10-26 3:31 ` [sucess?] " Eyal Lebedinsky
1 sibling, 1 reply; 16+ messages in thread
From: Roman Mamedov @ 2017-10-24 12:14 UTC (permalink / raw)
To: Eyal Lebedinsky; +Cc: list linux-raid
On Tue, 24 Oct 2017 22:47:48 +1100
Eyal Lebedinsky <eyal@eyal.emu.id.au> wrote:

> I now want to upgrade the fw from IR to IT.

Is there any practical reason for doing that?

I have a SAS9212-4i with a very ancient IR firmware -- and the thing is, it just works perfectly as a dumb SATA controller, including full SMART access to connected disks -- what else is there to ask from it?

(no TRIM pass-through, but that's kind of "by design" with these cards, and I doubt it's fixed in the new firmware)

If I'm not mistaken it's IT that is the simpler firmware for when you don't need the hardware RAID, but considering the above, and how cumbersome and risky the flashing process is, what am I losing by staying with IR?

Anyways here's one more HOWTO on flashing that may or may not be more helpful than the ones you listed:
https://wiki.hackspherelabs.com/index.php?title=LSI_Raid_Firmware_Bios_Flashing

--
With respect,
Roman
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: upgrading LSI SAS9211-8i fw IR->IT
2017-10-24 12:14 ` Roman Mamedov
@ 2017-10-24 13:04 ` Eyal Lebedinsky
2017-10-24 17:59 ` Roman Mamedov
0 siblings, 1 reply; 16+ messages in thread
From: Eyal Lebedinsky @ 2017-10-24 13:04 UTC (permalink / raw)
To: Roman Mamedov; +Cc: list linux-raid
On 24/10/17 23:14, Roman Mamedov wrote:
> On Tue, 24 Oct 2017 22:47:48 +1100
> Eyal Lebedinsky <eyal@eyal.emu.id.au> wrote:
>
>> I now want to upgrade the fw from IR to IT.
>
> Is there any practical reason for doing that?

I want to avoid the risk of the IR fw writing any metadata to the disks which already hold a software RAID.

Is it not the case that the IR fw may mess with the disks (I read a comment suggesting it might)?

Regardless, I want to update to fw 20 - the card is on 18 now.

> I have a SAS9212-4i with a very ancient IR firmware -- and the thing is, it
> just works perfectly as a dumb SATA controller, including full SMART access to
> connected disks -- what else is there to ask from it?

This is a fair comment which I will consider. ATM I avoid booting this card with the disks connected.

> (no TRIM pass-through, but that's kind of "by design" with these cards, and I
> doubt it's fixed in the new firmware)

Not an issue for me.

> If I'm not mistaken it's IT that is the simpler firmware for when you don't
> need the hardware RAID, but considering the above, and how cumbersome and risky
> the flashing process is, what am I losing by staying with IR?

Yes, I expect the IT fw is simpler, and I hope more stable too as it mostly stays out of the way.

> Anyways here's one more HOWTO on flashing that may or may not be more helpful
> than the ones you listed:
> https://wiki.hackspherelabs.com/index.php?title=LSI_Raid_Firmware_Bios_Flashing

Thanks for the link.

--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: upgrading LSI SAS9211-8i fw IR->IT
2017-10-24 13:04 ` Eyal Lebedinsky
@ 2017-10-24 17:59 ` Roman Mamedov
0 siblings, 0 replies; 16+ messages in thread
From: Roman Mamedov @ 2017-10-24 17:59 UTC (permalink / raw)
To: Eyal Lebedinsky; +Cc: list linux-raid
On Wed, 25 Oct 2017 00:04:37 +1100
Eyal Lebedinsky <eyal@eyal.emu.id.au> wrote:

> I want to avoid the risk of the IR fw writing any metadata to the disks which already
> hold a software RAID.
>
> Is it not the case that the IR fw may mess with the disks (I read a comment suggesting it might)?

I believe when HW RAID controllers do that, they set up HPA to reserve a small area at the end, and write the metadata there. But just checked, there is no HPA on any of the 3 disks that I have connected.

$ sudo hdparm -N /dev/sdi
/dev/sdi:
 max sectors = 3907029168/3907029168, HPA is disabled
$ sudo hdparm -N /dev/sdh
/dev/sdh:
 max sectors = 3907029168/3907029168, HPA is disabled
$ sudo hdparm -N /dev/sdg
/dev/sdg:
 max sectors = 3907029168/3907029168, HPA is disabled

You can test by connecting some disk with data you don't care about, and checking if its contents get modified (especially at the end), or if it gets HPA enabled.

--
With respect,
Roman
^ permalink raw reply [flat|nested] 16+ messages in thread
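The HPA check in the message above can be scripted when there are many disks to look at. What follows is only a minimal sketch: it assumes the `hdparm -N` output has the "max sectors = visible/native, HPA is enabled|disabled" shape quoted in the message, and the function names (`parse_hpa`, `hidden_sectors`) are made up for this illustration. Other wordings that hdparm may print are not handled.

```python
import re

# Matches the status line quoted above, e.g.
#   " max sectors = 3907029168/3907029168, HPA is disabled"
_HPA_LINE = re.compile(
    r"max sectors\s*=\s*(\d+)/(\d+),\s*HPA is (enabled|disabled)"
)


def parse_hpa(output):
    """Return (visible, native, hpa_enabled) or None if no status line is found."""
    match = _HPA_LINE.search(output)
    if match is None:
        return None
    visible = int(match.group(1))
    native = int(match.group(2))
    return visible, native, match.group(3) == "enabled"


def hidden_sectors(output):
    """Number of sectors hidden behind an HPA (0 when the two counts agree)."""
    parsed = parse_hpa(output)
    if parsed is None:
        raise ValueError("no 'max sectors' line found")
    visible, native, _ = parsed
    return native - visible


if __name__ == "__main__":
    # Sample text copied from the message above.
    sample = "/dev/sdi:\n max sectors = 3907029168/3907029168, HPA is disabled\n"
    print(parse_hpa(sample), hidden_sectors(sample))
```

In practice the text would come from running `hdparm -N` on each device (as root) and feeding the captured output to `parse_hpa`; a non-zero `hidden_sectors` result would be the thing to investigate.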
* Re: [sucess?] upgrading LSI SAS9211-8i fw IR->IT
2017-10-24 11:47 upgrading LSI SAS9211-8i fw IR->IT Eyal Lebedinsky
2017-10-24 12:14 ` Roman Mamedov
@ 2017-10-26 3:31 ` Eyal Lebedinsky
2017-10-27 23:09 ` Eyal Lebedinsky
2017-11-02 10:51 ` [sucess] " Eyal Lebedinsky
1 sibling, 2 replies; 16+ messages in thread
From: Eyal Lebedinsky @ 2017-10-26 3:31 UTC (permalink / raw)
To: list linux-raid
On 24/10/17 22:47, Eyal Lebedinsky wrote:
> [This is a resend, as plain text and with reduced size attachment...]
>
> Following some excitement with my 4yo+ controller I acquired a new one.
> I now want to upgrade the fw from IR to IT.
>
> I did some reading, which suggests the process is simple and straightforward - if I am lucky.
> The issue seems to be that the flashing program does not run on all mobos.
>
> I think that my server (Intel BOXDH77KC) is not booting UEFI.
> Anyway, I plan to do the upgrade elsewhere, probably on my workstation (Gigabyte GA-G33M-DS2R).
> Both are rather old and will be upgraded within a year. I decided to give it a test.
>
> I disconnected the (only) disk it has and installed the LSI controller.
> [No problem here except that using the on-board VGA video (until now I used an add-on video card)
> I see that the letters q-z do not show properly in FreeDOS, but are OK during POST]
>
> I started reading here:
> https://forums.servethehome.com/index.php?threads/tutorial-updating-ibm-m1015-lsi-9211-8i-firmware-on-uefi-systems.11462/
> where one needs to switch between UEFI and DOS mode. Probably unsuitable for me.
>
> I then proceeded here
> http://brycv.com/blog/2012/flashing-it-firmware-to-lsi-sas9211-8i/
> where the blog mentions some hurdles along the way.
>
> I used files from two packages:
> LSI-9211-8i.zip
> 9211-8i_Package_P20_IR_IT_FW_BIOS_for_MSDOS_Windows.zip
>
> Booted OK from a FreeDOS 1.2 USB disk. Ran 'sas2flsh.exe -list' and got the attached screen.
> Seems to me that this worked. At least I did not get the
> ERROR: Failed to initialize PAL. Exiting program.
> message, in which case I would need to use the efi mode flasher.
>
> Q1) Does this mean that I am clear to proceed?
>
> Now I am ready to do a flash erase followed by a flash program.
> However, LSI warns that a failure after the erase and before the program will leave a dead (unrecoverable) card.
>
> Q2) How do I check that I can safely do BOTH steps?
>
> TIA

I progressed slowly and I think that I finally succeeded.

1) Prepared a bootable USB disk (full FreeDOS 1.2)
2) Copied the required files to the USB disk.
3) Installed the LSI card in a PC after disconnecting all the disks
4) Booted from the USB and tried
    sas2flsh -list
   It worked.
5) Upgraded from fw IR 18 to IR 20
    sas2flsh -o -f 2118ir.bin -b mptsas2.rom
   It worked again, so now I was ready to attempt a flash clear and upgrade to IT
6) Cleared the flash
    sas2flsh -o -e 6
   After it said that it is clearing I got no more messages for over 10 minutes.
   This was my worst worry as the card would be bricked.
   ^C had no effect. Ctrl-Alt-Del had no effect. The machine was dead.
   Is this proof that there IS a God?
   [BTW, this machine was known to lock up at times, I thought it was the system (linux) but it now seems to be a more fundamental issue]
7) Re-booted from the USB and ran
    sas2flsh -list
   A message came up saying the card is not operational, but surprisingly it proceeded to say a firmware is required and asked for a file name.
   I entered '2118it.bin' and it succeeded in flashing it, as a '-list' proved.

At the end I rebooted once more and all looked good.

Naturally there is no BIOS programmed (I could flash it but decided that the IT fw probably does not require it)

Q1) is this correct?

Comparing the '-list' details before and after the process, on top of the new fw (and no BIOS) I noticed that the "SAS Address" changed.

Q2) Should I reconfigure the card with the original address with
    sas2flsh -o -sasadd 500605B-#-####-####
I assume it is only used as a global unique ID.

As a final test I plan to boot the actual server with this card and 3 sacrificial disks (now all zeroed) attached, to confirm that nothing is written to the disks.

As an aside, I now do not see the text corruption I mentioned earlier, so it was probably the BIOS causing it and not FreeDOS.

cheers

--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
^ permalink raw reply [flat|nested] 16+ messages in thread
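The "sacrificial zeroed disks" test mentioned at the end of the message above comes down to reading each disk back and confirming that every byte is still zero. The following is a minimal sketch of such a check, not the method actually used in the thread; the function name `first_nonzero_offset` is invented here, and any device path such as /dev/sdX is a placeholder that would need root access to read.

```python
import io


def first_nonzero_offset(stream, chunk_size=1024 * 1024):
    """Scan a binary stream and return the offset of the first non-zero byte.

    Returns None when the whole stream is zero-filled.
    """
    offset = 0
    while True:
        block = stream.read(chunk_size)
        if not block:
            return None  # reached the end without finding any data
        stripped = block.lstrip(b"\x00")
        if stripped:
            # Leading zero bytes removed == position of first non-zero byte.
            return offset + len(block) - len(stripped)
        offset += len(block)


if __name__ == "__main__":
    # In-memory stand-ins for a disk; a real run would use something like
    #   with open("/dev/sdX", "rb") as disk: ...   (placeholder path)
    untouched = io.BytesIO(bytes(4096))
    modified = io.BytesIO(bytes(4000) + b"\x01" + bytes(95))
    print(first_nonzero_offset(untouched))
    print(first_nonzero_offset(modified))
```

Reporting the offset rather than a plain yes/no is a deliberate choice: metadata written by a RAID firmware would be expected near the start or the end of the disk, so the position is itself a useful hint.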
* Re: [sucess?] upgrading LSI SAS9211-8i fw IR->IT
2017-10-26 3:31 ` [sucess?] " Eyal Lebedinsky
@ 2017-10-27 23:09 ` Eyal Lebedinsky
2017-10-29 15:02 ` Brad Campbell
2017-11-02 10:51 ` [sucess] " Eyal Lebedinsky
1 sibling, 1 reply; 16+ messages in thread
From: Eyal Lebedinsky @ 2017-10-27 23:09 UTC (permalink / raw)
To: list linux-raid
On 26/10/17 14:31, Eyal Lebedinsky wrote:

[trimmed fw upgrade notes]

> As a final test I plan to boot the actual server with this card and 3 sacrificial
> disks (now all zeroed) attached, to confirm that nothing is written to the disks.

I installed the card with three disks that were zeroed and booted a fedora 26. I saw the disks OK.
I rebooted the machine (removing the disks) and then checked all disks and they were still zeroed.

I am ready to commission this controller after next week's backup, but would still like answers to some questions.

Q1) I did not flash a rom file, do I need to do so (with the IT fw)?

Q2) The upgrade changed the SAS Address, should I reprogram the original address?

Q3) Below are the relevant messages from the test, do they look good?
    Is the "overriding NVDATA EEDPTagMode setting" OK?

cheers
Eyal

kernel: mpt3sas version 15.100.00.00 loaded
kernel: mpt3sas 0000:01:00.0: can't disable ASPM; OS doesn't have ASPM control
kernel: mpt2sas_cm0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (8162760 kB)
kernel: mpt2sas_cm0: MSI-X vectors supported: 1, no of cores: 4, max_msix_vectors: -1
kernel: mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 30
kernel: mpt2sas_cm0: iomem(0x00000000f1040000), mapped(0xffffb20541444000), size(16384)
kernel: mpt2sas_cm0: ioport(0x000000000000b000), size(256)
kernel: mpt2sas_cm0: Allocated physical memory: size(7579 kB)
kernel: mpt2sas_cm0: Current Controller Queue Depth(3364),Max Controller Queue Depth(3432)
kernel: mpt2sas_cm0: Scatter Gather Elements per IO(128)
kernel: mpt2sas_cm0: overriding NVDATA EEDPTagMode setting
kernel: mpt2sas_cm0: LSISAS2008: FWVersion(20.00.07.00), ChipRevision(0x03), BiosVersion(00.00.00.00)
kernel: mpt2sas_cm0: Protocol=(
kernel: Initiator
kernel: ,Target
kernel: ),
kernel: Capabilities=(
kernel: TLR
kernel: ,EEDP
kernel: ,Snapshot Buffer
kernel: ,Diag Trace Buffer
kernel: ,Task Set Full
kernel: ,NCQ
kernel: )
kernel: scsi host8: Fusion MPT SAS Host
kernel: mpt2sas_cm0: sending port enable !!
kernel: mpt2sas_cm0: host_add: handle(0x0001), sas_addr(0x500605b0013ca580), phys(8)
kernel: scsi 8:0:0:0: Direct-Access ATA SAMSUNG HD400LJ 0-15 PQ: 0 ANSI: 6
kernel: scsi 8:0:0:0: SATA: handle(0x0009), sas_addr(0x4433221101000000), phy(1), device_name(0x0000000000000000)
kernel: scsi 8:0:0:0: SATA: enclosure_logical_id(0x500605b0013ca580), slot(2)
kernel: scsi 8:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
kernel: scsi 8:0:1:0: Direct-Access ATA SAMSUNG HD401LJ 0-15 PQ: 0 ANSI: 6
kernel: scsi 8:0:1:0: SATA: handle(0x000a), sas_addr(0x4433221102000000), phy(2), device_name(0x0000000000000000)
kernel: scsi 8:0:1:0: SATA: enclosure_logical_id(0x500605b0013ca580), slot(1)
kernel: scsi 8:0:1:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
kernel: scsi 8:0:2:0: Direct-Access ATA WDC WD3200JD-00K 5J08 PQ: 0 ANSI: 6
kernel: scsi 8:0:2:0: SATA: handle(0x000b), sas_addr(0x4433221103000000), phy(3), device_name(0x0000000000000000)
kernel: scsi 8:0:2:0: SATA: enclosure_logical_id(0x500605b0013ca580), slot(0)
kernel: scsi 8:0:2:0: atapi(n), ncq(n), asyn_notify(n), smart(y), fua(n), sw_preserve(n)
kernel: mpt2sas_cm0: port enable: SUCCESS
kernel: sd 8:0:0:0: Attached scsi generic sg10 type 0
kernel: sd 8:0:1:0: Attached scsi generic sg11 type 0
kernel: sd 8:0:2:0: Attached scsi generic sg12 type 0
kernel: sd 8:0:2:0: [sdl] 625140335 512-byte logical blocks: (320 GB/298 GiB)
kernel: sd 8:0:0:0: [sdj] 781422768 512-byte logical blocks: (400 GB/373 GiB)
kernel: sd 8:0:1:0: [sdk] 781422768 512-byte logical blocks: (400 GB/373 GiB)
kernel: sd 8:0:2:0: [sdl] Write Protect is off
kernel: sd 8:0:2:0: [sdl] Write cache: enabled, read cache: enabled, supports DPO and FUA
kernel: sd 8:0:0:0: [sdj] Write Protect is off
kernel: sd 8:0:2:0: [sdl] Attached SCSI disk
kernel: sd 8:0:0:0: [sdj] Write cache: enabled, read cache: enabled, supports DPO and FUA
kernel: sd 8:0:1:0: [sdk] Write Protect is off
kernel: sd 8:0:1:0: [sdk] Write cache: enabled, read cache: enabled, supports DPO and FUA
kernel: sd 8:0:0:0: [sdj] Attached SCSI disk
kernel: sd 8:0:1:0: [sdk] Attached SCSI disk

--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
^ permalink raw reply [flat|nested] 16+ messages in thread
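For anyone wanting to confirm the flashed versions from a boot log rather than from the DOS tool, the controller line in the log above carries the firmware and BIOS versions. Below is a small sketch that pulls them out; it assumes the "Name(value)" layout of that one log line, the helper names are invented for this example, and treating BiosVersion(00.00.00.00) as "no boot BIOS flashed" is an inference from this thread rather than documented driver behaviour.

```python
import re

# Fields as they appear in the mpt2sas line quoted in the log above.
_FIELD = re.compile(r"(FWVersion|ChipRevision|BiosVersion)\(([^)]*)\)")


def controller_versions(log_line):
    """Return a dict of the version fields found in one kernel log line."""
    return dict(_FIELD.findall(log_line))


def has_boot_bios(versions):
    """Assumption: an all-zero BiosVersion means no option ROM is flashed."""
    return versions.get("BiosVersion", "00.00.00.00") != "00.00.00.00"


if __name__ == "__main__":
    line = (
        "kernel: mpt2sas_cm0: LSISAS2008: FWVersion(20.00.07.00), "
        "ChipRevision(0x03), BiosVersion(00.00.00.00)"
    )
    versions = controller_versions(line)
    print(versions["FWVersion"], has_boot_bios(versions))
```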
* Re: [sucess?] upgrading LSI SAS9211-8i fw IR->IT
2017-10-27 23:09 ` Eyal Lebedinsky
@ 2017-10-29 15:02 ` Brad Campbell
0 siblings, 0 replies; 16+ messages in thread
From: Brad Campbell @ 2017-10-29 15:02 UTC (permalink / raw)
To: Eyal Lebedinsky, list linux-raid
On 28/10/17 07:09, Eyal Lebedinsky wrote:

> I am ready to commission this controller after next week's backup, but
> would still like answers to
> some questions.
>
> Q1) I did not flash a rom file, do I need to do so (with the IT fw)?

Not unless you plan on trying to boot from one. I have the BIOS zeroed on all my cards as it speeds up POST.

> Q2) The upgrade changed the SAS Address, should I reprogram the original
> address?

I always reset the address, but mainly because I don't want 2 cards in the machine with the same address.

> Q3) Below are the relevant messages from the test, do they look good?
> Is the "overriding NVDATA EEDPTagMode setting" OK?

Can't answer that one. A quick squiz at the source doesn't seem like it's particularly evil though.

Brad
--
Dolphins are so intelligent that within a few weeks they can train Americans to stand at the edge of the pool and throw them fish.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [sucess] upgrading LSI SAS9211-8i fw IR->IT
2017-10-26 3:31 ` [sucess?] " Eyal Lebedinsky
2017-10-27 23:09 ` Eyal Lebedinsky
@ 2017-11-02 10:51 ` Eyal Lebedinsky
2017-11-03 0:54 ` Brad Campbell
1 sibling, 1 reply; 16+ messages in thread
From: Eyal Lebedinsky @ 2017-11-02 10:51 UTC (permalink / raw)
To: list linux-raid
On 26/10/17 14:31, Eyal Lebedinsky wrote:
> On 24/10/17 22:47, Eyal Lebedinsky wrote:

[trimmed the story of reflashing the LSI to the latest IT fw]

> As a final test I plan to boot the actual server with this card and 3 sacrificial
> disks (now all zeroed) attached, to confirm that nothing is written to the disks.

For the record, in closing this thread:
- I did the above test and confirmed that the disks are not written to by the driver.
- I replaced the HighPoint HBA with the LSI and simply moved the SFF-8087 across.
  The array came up without issue.
- I ran a full raid 'check' - zero mismatches. This was a good surprise because I had three cases of full array failures with the HighPoint and expected some corruption.

What I noted so far:
- the heat sink feels very hot even after short usage.
- the disks were now in a different order, pretty much reverse order.
  I am not sure the order will remain fixed (by port number?) or variable (as the disks spin up).

I thank all who responded, I found it very helpful.

cheers
Eyal

> As an aside, I now do not see the text corruption I mentioned earlier, so it was
> probably the BIOS causing it and not FreeDOS.
>
> cheers
> Eyal

--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [sucess] upgrading LSI SAS9211-8i fw IR->IT
2017-11-02 10:51 ` [sucess] " Eyal Lebedinsky
@ 2017-11-03 0:54 ` Brad Campbell
2017-11-03 2:31 ` Eyal Lebedinsky
0 siblings, 1 reply; 16+ messages in thread
From: Brad Campbell @ 2017-11-03 0:54 UTC (permalink / raw)
To: Eyal Lebedinsky, list linux-raid
On 02/11/17 18:51, Eyal Lebedinsky wrote:

> What I noted so far:
> - the heat sink feels very hot even after short usage.

Yeah, they do get warm. Best to make sure you have a bit of airflow over them. It doesn't take much air movement to keep temps in check.

> - the disks were now in a different order, pretty much reverse order.
> I am not sure the order will remain fixed (by port number?) or
> variable (as the disks spin up).

The driver scans them in port/slot order, and apparently in order of increasing pci address in the case of multiple cards. In my case where I have staggered spinup enabled it spins them up in groups and then waits for them in order, so things don't tend to move around unless you shift hardware about or a drive fails.

Make sure you do a periodic lsdrv just for records sake, but as yet I've not needed it. I keep a spreadsheet which lists which drive S/N is in which physical slot so when something happens I can just look up which drive needs to be popped without risk of pulling the wrong disk.

I swapped out a set of highpoint controllers for these LSI units back in 2011 and it was the best thing I ever did for storage speed and reliability.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [sucess] upgrading LSI SAS9211-8i fw IR->IT
2017-11-03 0:54 ` Brad Campbell
@ 2017-11-03 2:31 ` Eyal Lebedinsky
2017-11-03 3:03 ` Brad Campbell
2017-11-06 20:15 ` Wolfgang Denk
0 siblings, 2 replies; 16+ messages in thread
From: Eyal Lebedinsky @ 2017-11-03 2:31 UTC (permalink / raw)
To: Brad Campbell, list linux-raid
On 03/11/17 11:54, Brad Campbell wrote:
> On 02/11/17 18:51, Eyal Lebedinsky wrote:
>
>> What I noted so far:
>> - the heat sink feels very hot even after short usage.
>
> Yeah, they do get warm. Best to make sure you have a bit of airflow over them. It doesn't take much air movement to keep temps in check.
>
>> - the disks were now in a different order, pretty much reverse order.
>> I am not sure the order will remain fixed (by port number?) or variable (as the disks spin up).
>
> The driver scans them in port/slot order, and apparently in order of increasing pci address in the case of multiple cards. In my case where I have staggered spinup enabled it spins them up in groups and then waits for them in order, so things don't tend to move around unless you shift hardware about or a drive fails.

I have only one card. However, I did not reconnect the drives to the SFF-8087 but merely moved the two harnesses to the new card.

I recorded the S/N of the disks that were detected as c,d,e,f,g,h,i and recorded their physical location (I had many disk failures/replacements in the 4 years life of this array).

Now, with the LSI, looking at the S/N I see the disks detected as h,g,f,e,d,c,i. I would understand if two 4-way connectors were swapped, but this (mostly) reverse order? I expect that the two sockets on the card are in different order, and the four lanes on each SFF are also in reverse order.

BTW, sdi was always very slow to spin up so maybe this is why it is last rather than before sdd if it followed the same reverse ordering.

> Make sure you do a periodic lsdrv just for records sake, but as yet I've not needed it. I keep a spreadsheet which lists which drive S/N is in which physical slot so when something happens I can just look up which drive needs to be popped without risk of pulling the wrong disk.

The HighPoint had a habit of resetting a nearby disk when hot-removing another so I always did the swapping offline. Maybe the LSI is better? My 7 disk array (4TB WD blacks) had 11 replacements so far (4 years) ...

> I swapped out a set of highpoint controllers for these LSI units back in 2011 and it was the best thing I ever did for storage speed and reliability.

--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [sucess] upgrading LSI SAS9211-8i fw IR->IT
2017-11-03 2:31 ` Eyal Lebedinsky
@ 2017-11-03 3:03 ` Brad Campbell
2017-11-03 3:39 ` Eyal Lebedinsky
2017-11-06 20:15 ` Wolfgang Denk
1 sibling, 1 reply; 16+ messages in thread
From: Brad Campbell @ 2017-11-03 3:03 UTC (permalink / raw)
To: Eyal Lebedinsky, list linux-raid
On 03/11/17 10:31, Eyal Lebedinsky wrote:

> The HighPoint had a habit of resetting a nearby disk when hot-removing
> another so I always did the swapping
> offline. Maybe the LSI is better? My 7 disk array (4TB WD blacks) had 11
> replacements so far (4 years) ...
>

I've swapped out everything from individual disks to entire arrays with the machine running. If possible and I care about the disk I'll take care to spin it down with hdparm first, but regardless the LSI controllers have behaved flawlessly.

The only issue I see periodically is when a smart poll coincides with some seriously heavy activity I might get one or more messages like the following in dmesg :

[1936062.640198] mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)

Generally I only see those during my monthly array scrub where every disk in every array is going hammer and tongs simultaneously and I've never seen an issue related to those messages.

Unrelated, and with regard to SSDs: for TRIM to work you need "returns deterministic" *and* "returns zero" set for the card to enable trim. I have some intel 330's that do, and some Samsung 830s that don't. Apparently the 840Pro was the only Samsung drive that did the business.

When I do my next SSD upgrade (these are only 60% gone after 6 years) I'll seek out drives that have the right features.

I recently upgraded one of my base servers and had to replace an 8 port LSI with a 16 port (2x2008 with 1x2016). I did the same firmware upgrade and there have been no performance or reliability issues. I really like the LSI cards.

Odd to hear of your disk issues. I just swapped out 7 WD Green drives with 6 years on them. I started with 10 and lost 1 to early life, and 2 to grown defects in the last year or so (classic bathtub curve). They were early units though that still had TLER enabled on them. The only drives I've had mass attrition on were Seagate/Maxtor 1TB 7200.11, and they were known as not great units.
^ permalink raw reply [flat|nested] 16+ messages in thread
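The log_info value in the dmesg line quoted above is a packed 32-bit word, and the text after it is simply the driver's decoding of it. The sketch below reproduces that decoding under an assumed bit layout (bus type in the top 4 bits, originator in the next 4, then an 8-bit code and a 16-bit sub-code) and an assumed originator table (0 = IOP, 1 = PL, 2 = IR). Both are inferred from the quoted message and a reading of the mpt driver, not taken from LSI documentation, and `decode_log_info` is a name made up for this example.

```python
# Assumed originator numbering; only "PL" is confirmed by the quoted message.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


def decode_log_info(value):
    """Split a 32-bit log_info word into its (assumed) bit fields."""
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        "code": (value >> 16) & 0xFF,
        "sub_code": value & 0xFFFF,
    }


if __name__ == "__main__":
    fields = decode_log_info(0x31120303)
    print(
        "originator(%s), code(0x%02x), sub_code(0x%04x)"
        % (fields["originator"], fields["code"], fields["sub_code"])
    )
```

With the value from the message, the fields come out as originator PL, code 0x12 and sub_code 0x0303, which agrees with the text the kernel printed next to it.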
* Re: [sucess] upgrading LSI SAS9211-8i fw IR->IT
2017-11-03 3:03 ` Brad Campbell
@ 2017-11-03 3:39 ` Eyal Lebedinsky
0 siblings, 0 replies; 16+ messages in thread
From: Eyal Lebedinsky @ 2017-11-03 3:39 UTC (permalink / raw)
To: Brad Campbell, list linux-raid
On 03/11/17 14:03, Brad Campbell wrote:
> On 03/11/17 10:31, Eyal Lebedinsky wrote:
>
>> The HighPoint had a habit of resetting a nearby disk when hot-removing another so I always did the swapping
>> offline. Maybe the LSI is better? My 7 disk array (4TB WD blacks) had 11 replacements so far (4 years) ...
>>
>
> I've swapped out everything from individual disks to entire arrays with the machine running. If possible and I care about the disk I'll take care to spin it down with hdparm first, but regardless the LSI controllers have behaved flawlessly.
>
> The only issue I see periodically is when a smart poll coincides with some seriously heavy activity I might get one or more messages like the following in dmesg :
>
> [1936062.640198] mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
>
> Generally I only see those during my monthly array scrub where every disk in every array is going hammer and tongs simultaneously and I've never seen an issue related to those messages.

I will keep an eye on such messages. A message I saw once is this one:
    kernel: mpt3sas 0000:01:00.0: invalid short VPD tag 00 at offset 1
which a web search suggests is effectively a warning. It coincides with an 'lspci' that runs nightly.

On another machine I do at times see the same from a NIC:
    kernel: r8169 0000:04:00.0: invalid short VPD tag 00 at offset 1
Again, probably from 'lspci'.

> Unrelated, and with regard to SSDs: for TRIM to work you need "returns deterministic" *and* "returns zero" set for the card to enable trim. I have some intel 330's that do, and some Samsung 830s that don't. Apparently the 840Pro was the only Samsung drive that did the business.
>
> When I do my next SSD upgrade (these are only 60% gone after 6 years) I'll seek out drives that have the right features.
>
> I recently upgraded one of my base servers and had to replace an 8 port LSI with a 16 port (2x2008 with 1x2016). I did the same firmware upgrade and there have been no performance or reliability issues. I really like the LSI cards.
>
> Odd to hear of your disk issues. I just swapped out 7 WD Green drives with 6 years on them. I started with 10 and lost 1 to early life, and 2 to grown defects in the last year or so (classic bathtub curve). They were early units though that still had TLER enabled on them. The only drives I've had mass attrition on were Seagate/Maxtor 1TB 7200.11, and they were known as not great units.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [sucess] upgrading LSI SAS9211-8i fw IR->IT
2017-11-03 2:31 ` Eyal Lebedinsky
2017-11-03 3:03 ` Brad Campbell
@ 2017-11-06 20:15 ` Wolfgang Denk
2017-11-06 21:38 ` Eyal Lebedinsky
1 sibling, 1 reply; 16+ messages in thread
From: Wolfgang Denk @ 2017-11-06 20:15 UTC (permalink / raw)
To: Eyal Lebedinsky; +Cc: Brad Campbell, list linux-raid
Dear Eyal,

In message <c12a2a32-2321-1ed7-e1de-ce0e408552e1@eyal.emu.id.au> you wrote:
>
> (I had many disk failures/replacements in the 4 years life of this array).

Reading this makes me wonder if you checked your environment for other influences. There must be some reason for an exceptionally high number of failures.

I remember we also had a nightmare of disk errors in the rack in the 2nd floor of our building - which disappeared after moving the rack into the basement. I can't prove it, but I blame it on vibrations. We have a heavy traffic train line less than 50 meters away, and disks (classic, magnetic ones) definitely do not like vibrations - see [1]. Maybe you have other influences you did not check for yet?

[1] https://www.youtube.com/watch?v=tDacjrSCeq4

Best regards,

Wolfgang Denk

--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Speculation is always more interesting than facts.
- Terry Pratchett, _Making_Money_
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [sucess] upgrading LSI SAS9211-8i fw IR->IT
2017-11-06 20:15 ` Wolfgang Denk
@ 2017-11-06 21:38 ` Eyal Lebedinsky
2017-11-06 21:48 ` Phil Turmel
0 siblings, 1 reply; 16+ messages in thread
From: Eyal Lebedinsky @ 2017-11-06 21:38 UTC (permalink / raw)
To: list linux-raid; +Cc: Wolfgang Denk, Brad Campbell
On 07/11/17 07:15, Wolfgang Denk wrote:
> Dear Eyal,
>
> In message <c12a2a32-2321-1ed7-e1de-ce0e408552e1@eyal.emu.id.au> you wrote:
>>
>> (I had many disk failures/replacements in the 4 years life of this array).
>
> Reading this makes me wonder if you checked your environment for
> other influences. There must be some reason for an exceptionally high
> number of failures.
>
> I remember we also had a nightmare of disk errors in the rack in the
> 2nd floor of our building - which disappeared after moving the rack
> into the basement. I can't prove it, but I blame it on vibrations.
> We have a heavy traffic train line less than 50 meters away, and
> disks (classic, magnetic ones) definitely do not like vibrations -
> see [1]. Maybe you have other influences you did not check for yet?

Interesting Wolfgang,

- This array is at home, a relatively quiet place.
- I monitor the disks temperatures and it is OK.
- The machine runs off a UPS which can be a source of bad power (if the PS does not filter it out).
- The HBA may be somehow bothering the disks?

The disks are under warranty until late next year so there is time to see if the disks do better with the LSI.

BTW, two of the RMAs were for disks that arrived DOA (as RMAs). I do not have high regard for the WD blacks.
The failures were spread across the last 4 years (so not infant mortality).

If nothing else, this experience made me comfortable with software raid, and encouraged me to stick to my backup schedule.

cheers

> [1] https://www.youtube.com/watch?v=tDacjrSCeq4
>
> Best regards,
>
> Wolfgang Denk

--
Eyal Lebedinsky (eyal@eyal.emu.id.au)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [sucess] upgrading LSI SAS9211-8i fw IR->IT
2017-11-06 21:38 ` Eyal Lebedinsky
@ 2017-11-06 21:48 ` Phil Turmel
2017-11-06 21:59 ` Eyal Lebedinsky
0 siblings, 1 reply; 16+ messages in thread
From: Phil Turmel @ 2017-11-06 21:48 UTC (permalink / raw)
To: Eyal Lebedinsky, list linux-raid; +Cc: Wolfgang Denk, Brad Campbell
On 11/06/2017 04:38 PM, Eyal Lebedinsky wrote:
> On 07/11/17 07:15, Wolfgang Denk wrote:
>> Dear Eyal,
>>
>> In message <c12a2a32-2321-1ed7-e1de-ce0e408552e1@eyal.emu.id.au>
>> you wrote:
>>>
>>> (I had many disk failures/replacements in the 4 years life of
>>> this array).
>>
>> Reading this makes me wonder if you checked your environment for
>> other influences. There must be some reason for an exceptionally
>> high number of failures.
>>
>> I remember we also had a nightmare of disk errors in the rack in the
>> 2nd floor of our building - which disappeared after moving the
>> rack into the basement. I can't prove it, but I blame it on
>> vibrations. We have a heavy traffic train line less than 50 meters
>> away, and disks (classic, magnetic ones) definitely do not like
>> vibrations - see [1]. Maybe you have other influences you did not
>> check for yet?
>
> Interesting Wolfgang,
>
> - This array is at home, a relatively quiet place. - I monitor the
> disks temperatures and it is OK. - The machine runs off a UPS which
> can be a source of bad power (if the PS does not filter it out). -
> The HBA may be somehow bothering the disks?
>
> The disks are under warranty until late next year so there is time to
> see if the disks do better with the LSI.
>
> BTW, two of the RMAs were for disks that arrived DOA (as RMAs). I do
> not have high regard for the WD blacks. The failures were spread
> across the last 4 years (so not infant mortality).
>
> If nothing else, this experience made me comfortable with software
> raid, and encouraged me to stick to my backup schedule.

That's a really bad failure rate. But they're WD Blacks, which if I recall correctly, do not support scterc. Did you deal with your driver timeouts? If not, those drives probably weren't really dead. Just not raid-compatible out-of-the-box.

Phil
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [sucess] upgrading LSI SAS9211-8i fw IR->IT
2017-11-06 21:48 ` Phil Turmel
@ 2017-11-06 21:59 ` Eyal Lebedinsky
0 siblings, 0 replies; 16+ messages in thread
From: Eyal Lebedinsky @ 2017-11-06 21:59 UTC (permalink / raw)
To: Phil Turmel, list linux-raid; +Cc: Wolfgang Denk, Brad Campbell

On 07/11/17 08:48, Phil Turmel wrote:
> On 11/06/2017 04:38 PM, Eyal Lebedinsky wrote:
>> On 07/11/17 07:15, Wolfgang Denk wrote:
>>> Dear Eyal,
>>>
>>> In message <c12a2a32-2321-1ed7-e1de-ce0e408552e1@eyal.emu.id.au>
>>> you wrote:
>>>>
>>>> (I had many disk failures/replacements in the 4-year life of
>>>> this array).
>>>
>>> Reading this makes me wonder if you checked your environment for
>>> other influences. There must be some reason for an exceptionally
>>> high number of failures.
>>>
>>> I remember we also had a nightmare of disk errors in the rack in the
>>> 2nd floor of our building - which disappeared after moving the
>>> rack into the basement. I can't prove it, but I blame it on
>>> vibrations. We have a heavy traffic train line less than 50 meters
>>> away, and disks (classic, magnetic ones) definitely do not like
>>> vibrations - see [1]. Maybe you have other influences you did not
>>> check for yet?
>>
>> Interesting Wolfgang,
>>
>> - This array is at home, a relatively quiet place.
>> - I monitor the disk temperatures and they are OK.
>> - The machine runs off a UPS which can be a source of bad power
>>   (if the PS does not filter it out).
>> - The HBA may be somehow bothering the disks?
>>
>> The disks are under warranty until late next year so there is time to
>> see if the disks do better with the LSI.
>>
>> BTW, two of the RMAs were for disks that arrived DOA (as RMAs). I do
>> not have high regard for the WD Blacks. The failures were spread
>> across the last 4 years (so not infant mortality).
>>
>> If nothing else, this experience made me comfortable with software
>> raid, and encouraged me to stick to my backup schedule.
>
> That's a really bad failure rate. But they're WD Blacks, which, if I
> recall correctly, do not support scterc. Did you deal with your driver
> timeouts?

Yes, my rc.local does
    # echo 180 >"/sys/block/$disk/device/timeout"

The disks were let go after developing bad sectors (more than once, at
an increasing rate). I did not have a case of a disk kicked out due to
lack of scterc.

I *will* get more suitable disks next time.

> If not, those drives probably weren't really dead. Just not
> raid-compatible out-of-the-box.
>
> Phil

--
Eyal Lebedinsky (eyal@eyal.emu.id.au)

^ permalink raw reply [flat|nested] 16+ messages in thread
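The timeout handling discussed in this exchange can be sketched as a small rc.local helper. This is only a sketch under stated assumptions, not the script Eyal actually runs: the function name `configure_disk` and the `SYSFS_ROOT` variable are hypothetical, and it assumes `smartctl` (from smartmontools) exits non-zero when a drive rejects the SCT ERC setting. It first tries to cap in-drive error recovery at 7 seconds with `smartctl -l scterc,70,70`; if that is not possible, it falls back to the 180-second kernel command timeout quoted in the message.

```shell
#!/bin/sh
# Hypothetical helper: make one disk behave sanely as an md RAID member.
# SYSFS_ROOT is an illustrative override so the function can be pointed
# at a scratch directory instead of the real /sys/block.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/block}"

configure_disk() {
    disk="$1"
    timeout_file="$SYSFS_ROOT/$disk/device/timeout"

    # Skip names that have no writable timeout attribute.
    [ -w "$timeout_file" ] || return 1

    # Preferred: limit in-drive error recovery to 7.0 s for reads and
    # writes (scterc units are 100 ms). Assumes smartctl returns a
    # non-zero status when the drive does not accept the setting.
    if command -v smartctl >/dev/null 2>&1 &&
       smartctl -l scterc,70,70 "/dev/$disk" >/dev/null 2>&1; then
        return 0
    fi

    # Fallback for drives without scterc: give the kernel driver long
    # enough (180 s, as in the rc.local line above) to outlast the
    # drive's own internal retries.
    echo 180 >"$timeout_file"
}
```

From rc.local it could be called once per member disk, for example `for d in /sys/block/sd*; do configure_disk "${d##*/}"; done`. Drives commonly forget the scterc setting on power cycle, so running it at every boot is the safer choice.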
end of thread, other threads:[~2017-11-06 21:59 UTC | newest]

Thread overview: 16+ messages
2017-10-24 11:47 upgrading LSI SAS9211-8i fw IR->IT Eyal Lebedinsky
2017-10-24 12:14 ` Roman Mamedov
2017-10-24 13:04 ` Eyal Lebedinsky
2017-10-24 17:59 ` Roman Mamedov
2017-10-26 3:31 ` [sucess?] " Eyal Lebedinsky
2017-10-27 23:09 ` Eyal Lebedinsky
2017-10-29 15:02 ` Brad Campbell
2017-11-02 10:51 ` [sucess] " Eyal Lebedinsky
2017-11-03 0:54 ` Brad Campbell
2017-11-03 2:31 ` Eyal Lebedinsky
2017-11-03 3:03 ` Brad Campbell
2017-11-03 3:39 ` Eyal Lebedinsky
2017-11-06 20:15 ` Wolfgang Denk
2017-11-06 21:38 ` Eyal Lebedinsky
2017-11-06 21:48 ` Phil Turmel
2017-11-06 21:59 ` Eyal Lebedinsky