Re: Raid working but stuck at 99.9%

Linux RAID subsystem development

From: antlists <antlists@youngman.org.uk>
To: Teejay <teejay@gizzy.co.uk>, antlists <antlists@youngman.org.uk>,
	linux-raid@vger.kernel.org
Subject: Re: Raid working but stuck at 99.9%
Date: Sun, 3 Jan 2021 20:26:55 +0000
Message-ID: <788a1d77-1712-be55-8a66-dc900fd6176e@youngman.org.uk>
In-Reply-To: <33f592e7-3408-4f9b-7146-11af526b1af8@gizzy.co.uk>

On 02/01/2021 23:10, Teejay wrote:
> On 02/01/2021 19:32, antlists wrote:
>> On 02/01/2021 12:37, Teejay wrote:
>>> Hi,
> 
> Point taken, but it is what it is and I need to fix it. The internet is 
> full of advice about RAID - most of it contradicts itself as there is 
> little real consensus (welcome to the 21st Century :-) . As a newbie, I 
> went with what seemed right at the time. I concede I made a bad call. 
> Let's move on.
> 
That's why I took on updating the wiki. To try and provide an up-to-date 
reference site. Problem is, like you did, people usually find the old 
duff stuff first.
>>>
>>> To upgrade the array I used the following command:
>>>
>>> sudo mdadm --grow /dev/md0 --level=5 --raid-devices=5 --add /dev/sde 
>>> /dev/sdf --backup-file=/tmp/grow_md0.bak
>>>
>>> To my surprise it returned almost instantly with no errors. So I had 
>>> a look at the status:

Did your example tell you to use --backup-file? If it did, I hope it 
wasn't the wiki!

If the --grow told you it needed a backup, I'd be surprised, but if it 
asks for one, it needs it.

Once you were trying to fix things, it would be normal for it to ask for 
the backup you originally gave it ...
>>>
>>> less /proc/mdstat
>>>
>>> and it came back as being a raid 5 array and stated that it was 
>>> reshape = 0.01% and would take several million minutes to complete! 
>>> Somewhat concerned, I left it for half an hour and tried again only 
>>> to find that the number of complete blocks was the same and the time 
>>> had grown to an even more crazy number. It was clear the process had 
>>> stalled.
>>
>> uname -a ?
> 
> Linux lounge 5.4.0-58-generic #64-Ubuntu SMP Wed Dec 9 08:16:25 UTC 2020 
> x86_64 x86_64 x86_64 GNU/Linux
> 
>> mdadm --version ?
> 
> mdadm - v4.1 - 2018-10-01
> 
>>
>> This sounds similar to a problem we've regularly seen with raid-5. And 
>> it's noticeable with older Ubuntus.
> 
> My install of Ubuntu is not that old is it? - and it has all the 
> official updates.
> 
> lsb_release -a
> 
> Distributor ID:    Ubuntu
> Description:    Ubuntu 20.04.1 LTS
> Release:    20.04
> Codename:    focal
> 
That's new enough. So it's not (I hope) that problem.
>>>
>>
>> Ahhhhh ... you MAY be able to RMA them. What are they? If they're WD 
>> Reds I'd RMA them as a matter of course as "unfit for purpose". If 
>> they're BarraCudas, well, tough luck but you might get away with it. 
> RMA? - Newbie here!

Well, I don't know what the letters stand for, but the expression is a 
pretty standard term for "return to supplier". If you return anything 
that is defective, the supplier will usually ask you to "fill in an RMA".
> 
>>
>> Okay. What are those drives? I'm *guessing* your original three drives 
>> were WD Reds. What is the type number? If they're Reds this ends in 
>> EFAX or EFRX, if I remember correctly. I think EFAX are good and EFRX 
>> are bad. It could be the other way round ...
> 
> That means little to me. This is what I know: The four drives that form 
> the working array are the three original ones and one of the new ones. 
> None of them are Reds. They are all the same Seagate drives, though it is 
> possible they are different internally as they were purchased at 
> different times, I have not opened them up. The Array is working and 
> AFAIK it is all there, I can find no evidence to the contrary. It 
> mounts (I have only tried Read Only), and I can access the data, with 
> no apparent issues.
> 
Dare I suggest you need to read this page ...

http://www.catb.org/~esr/faqs/smart-questions.html

If you haven't come across ESR he may be a bit of a nutter, but he is an 
extremely good psychologist - it is well worth reading!

I gave you a link. If you went there, the very first link in it gave you 
some advice - https://raid.wiki.kernel.org/index.php/Asking_for_help

YOU DIDN'T FOLLOW IT.

One of the things it asks for is the smartctl info for your drives - ALL 
OF THEM. It'll tell you the model number of your drives. How many 
different models do you have? Are they SMR? Google the model numbers and 
see if you can find out! If you come back and say you can't make head or 
tail of what you've found, that's fine. What's not fine is if you don't try.
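
As a rough sketch of how to collect the model numbers, the helper below 
pulls the model line out of `smartctl -i` output. It assumes smartmontools 
is installed and that the drives are SATA, where the field is labelled 
"Device Model:". The name `drive_model` is made up for illustration.

```shell
#!/bin/sh
# drive_model: read `smartctl -i` output on stdin and print the model string.
# Assumption: an ATA/SATA drive, where the field is labelled "Device Model:".
drive_model() {
    sed -n 's/^Device Model:[[:space:]]*//p'
}

# Typical use (needs root and real hardware, so it is left commented out):
#   for d in /dev/sd?; do
#       echo "$d: $(smartctl -i "$d" | drive_model)"
#   done
```

The model string it prints is what you would then search for to find out 
whether a drive is SMR.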
> 
>>
>>> I need to somehow get back to a useful state. If I could get back to 
>>> level 0 with three drives that would be great. I could then delete 
>>> some junk and then backup the data the other two drives using rsync 
>>> or something.
>>>
>>> So I guess my questions are
>>>
>>> 1 - Can I safely get back to a three drive level 0 RAID thereby 
>>> freeing the two drives I added to allow me to make a backup of the data?
>>
>> I'll let others comment.
>>
>>> 2 - Even if I can revert, should I move my data and no longer even 
>>> use RAID 0 until I can get some decent hard drives?
>>
>> Don't mess with it yet.
>>
>>> 3 - Any other cunning ideas, at the moment I think my only option, if 
>>> I can't revert, which is to buy many TB's of storage to back up the 
>>> read only file system, which I can ill afford to do!
>>
Okay, to throw another option into the mix, get that 12TB BarraCuda, and 
copy your data across as your main drive. You're probably better off 
with an IronWolf, but I don't know if they come as a 12TB and they'd 
cost rather more ...

Then you can combine the other drives with btrfs as a backup volume. 
That way, if the 12TB breaks you've got a backup, and if one of the 4TBs 
breaks, btrfs means you only lose what's on that disk (which is "backed 
up" on your live disk ...)
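
One possible reading of "combine the other drives with btrfs" is a 
multi-device filesystem with unmirrored data and mirrored metadata. The 
sketch below only prints such a command; it does not run it, because 
mkfs.btrfs destroys whatever is on the devices. The label `backup`, the 
function name and the choice of `-d single -m raid1` are all assumptions, 
not something stated in this thread.

```shell
#!/bin/sh
# backup_volume_cmd DEV...: print (do NOT run) a mkfs.btrfs command that pools
# the given devices with unmirrored data (-d single) and mirrored metadata
# (-m raid1). The label "backup" is a placeholder.
backup_volume_cmd() {
    echo "mkfs.btrfs -L backup -d single -m raid1 $*"
}

# Typical use, with placeholder device names:
#   backup_volume_cmd /dev/sdX /dev/sdY /dev/sdZ
```

Check the printed command carefully against the real device names, and 
only after the data is safely copied elsewhere, before running it by hand.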
> 
> Sounds like you misunderstood what I wrote; sarcasm is not a great way 
> of helping someone, especially when you only half read the email!
> 
Sorry, I don't mean to be sarcastic, and offering to spend more money 
usually gets vendors on side ...

> again let's move on!
> 
and imho, you NEED to back up your data, which I think means spending 
money, whether you can afford it or not :-(

I personally don't have experience playing with broken arrays (unlike 
others on this list), so I don't want to advise you to do something that 
trashes your array and loses everything ...
>>>
> 
> I did read the wiki, promise! More than once! - it does not seem to 
> cover my situation, or if it does, it makes a very good job of hiding 
> it. Like many such sites it was clearly written by someone that knows 
> exactly how everything works and forgets that the reader will not 
> necessarily have a similar level of knowledge, in other words, it is not 
> written for novices. It uses many abbreviations and has an 
> assumption of knowledge that is way beyond mine. Asking for stuff is 
> fine, but not much help if you don't know what it is asking for. While I 
> really appreciate the help, it is not useful if you go on the 
> offensive. I said up front that I am a newbie. The best piece of advice 
> I could find on the wiki was not to do anything unless I was sure and to 
> ask for help first, thus the email. Unfortunately, I did not find the 
> wiki in time to avoid getting into the mess, but when I did I followed 
> the advice and asked for help -  so all I ask is for some help getting 
> out of this mess. If you need more info, I will do my best to provide it 
> but please remember I am at the bottom looking up, and the view from 
> down here is not as clear as the view from where you stand; I understand 
> it can be difficult to remember that, I am an engineer too, just not one 
> that knows anything about RAID :)
> 
You describe exactly how I often feel, so we do understand. And yes, 
when your raid isn't working properly I can understand the panic. Let's 
work out WHAT is wrong. I'm hoping, if your drives were purchased at 
different times, the problem might be that only the new drives are SMR. If it 
looks like something I'm not happy dealing with, I can kick it up to 
people who have more experience.
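
To tell a stalled reshape from a merely slow one, a minimal sketch is to 
compare the completed-block count from /proc/mdstat at two points in time. 
It assumes the usual progress line of the form 
"reshape = 0.0% (done/total) finish=... speed=...". The function names 
are made up for illustration.

```shell
#!/bin/sh
# reshape_blocks: read /proc/mdstat-style text on stdin and print the number
# of blocks the reshape has completed (the first number inside the brackets).
reshape_blocks() {
    sed -n 's/.*reshape *= *[0-9.]*% *(\([0-9]*\)\/[0-9]*).*/\1/p'
}

# reshape_stalled BEFORE AFTER: succeed if both counts are non-empty and equal.
reshape_stalled() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Typical use (read-only, it does not touch the array):
#   a=$(reshape_blocks < /proc/mdstat); sleep 60
#   b=$(reshape_blocks < /proc/mdstat)
#   reshape_stalled "$a" "$b" && echo "no reshape progress in 60s"
```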

Cheers,
Wol
