From: Wols Lists <antlists@youngman.org.uk>
To: David Madore <david+ml@madore.org>
Cc: Linux RAID mailing-list <linux-raid@vger.kernel.org>
Subject: Re: RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start)
Date: Thu, 1 Oct 2020 15:10:21 +0100
Message-ID: <5F75E34D.7030207@youngman.org.uk>
In-Reply-To: <20200930222637.mmlphc4patipalng@achernar.gro-tsen.net>
On 30/09/20 23:26, David Madore wrote:
> On Wed, Sep 30, 2020 at 09:16:10PM +0100, antlists wrote:
>> The problem is that if you use mdadm 3.4 with kernel 4.9.237, the 237 means
>> that your kernel has been heavily updated and is far too new. But if you use
>> mdadm 4.1 with kernel 4.9.237, the 4.9 means that the kernel is basically a
>> very old one - too old for mdadm 4.1
>
> But the point of the longterm kernel lines like 4.9.237 is to keep
> strict compatibility with the original branch point (that's the point
> of a "stable" line) and perform only bugfixes, isn't it? Do you mean
> to say that there is NO stable kernel line with full mdadm support?
> Or just the ones provided by distributions? (But don't distributions
> like Debian do exactly the same thing as GKH and others with these
> longterm lines? I.e., fix bugs while keeping strict compatibility.
> If there are no longterm stable kernels with full RAID support, I find
> this rather worrying.)
Depends what you mean by full RAID support. Any kernel (within limits)
should run any RAID array. What we've found, by experience, is that
trying to upgrade an array is where the problems show up ... :-)
>
> But in my specific case, the issue didn't come from a mdadm/kernel
> mismatch after all: I performed further investigation after I wrote my
> previous message, and my problem did indeed come from the
> /lib/systemd/system/mdadm-grow-continue@.service which, as far as I
> can tell, is broken insofar as --backup-file=... goes (the option is
> needed for --continue to work and it isn't passed). Furthermore, this
> file appears to be distributed by mdadm itself (it's not
> Debian-specific), and the systemd service is called by mdadm (from
> continue_via_systemd() in Grow.c).
But is this really the problem? If the reshape fails to start at all, I
don't quite see how the service file that restarts it can be to blame.
>
> So it seems to me that RAID reshaping with backup files is currently
> broken on all systems which use systemd. But then I'm confused as to
> why this didn't get more attention. Anyway, if you have any
> suggestion as to where I should bugreport this, it's the least I can
> do.
It works fine with a "latest and greatest" kernel and mdadm ... That
said, we know there has been a fair bit of general housekeeping and
tidying up going on.
>
> In my particular setup, after giving this more thought, I thought the
> wisest thing would be to get tons of external storage, copy everything
> away, recreate a fresh RAID6 array, and copy everything back into it.
Well, I'm thinking of getting a huge shingled disk for backups :-) but
if that's worked for you, great.
>
> Whatever the case, thanks for your help.
>
And thank you for documenting what's going wrong. I doubt much work will
go into fixing it for Debian 9, but if it really is a problem and rears
its head again, at least we'll have more info to start digging. I'll
make a note of this ...
But this is exactly the problem with the concept of LTS. Yes, I
understand why people want LTS, but if the kernel accumulates bug-fixes
and patches, it will drift out of sync with user-space. And yes, the
intention is to minimise this as much as possible, but mdadm 3.4 is a
lot older (and known to be buggy) than your updated kernel, while your
updated kernel is still anchored firmly in the past relative to mdadm
4.1. LTS is a work-around to cope with the fact that time flows ...
Oh, and as for backup files: newer arrays don't need or use them by
default. So that could also be part of the problem ...
Cheers,
Wol