From: Wols Lists <antlists@youngman.org.uk>
To: David Madore <david+ml@madore.org>
Cc: Linux RAID mailing-list <linux-raid@vger.kernel.org>
Subject: Re: RAID5->RAID6 reshape remains stuck at 0% (does nothing, not even start)
Date: Thu, 1 Oct 2020 15:10:21 +0100
Message-ID: <5F75E34D.7030207@youngman.org.uk>
In-Reply-To: <20200930222637.mmlphc4patipalng@achernar.gro-tsen.net>
On 30/09/20 23:26, David Madore wrote:
> On Wed, Sep 30, 2020 at 09:16:10PM +0100, antlists wrote:
>> The problem is that if you use mdadm 3.4 with kernel 4.9.237, the 237 means
>> that your kernel has been heavily updated and is far too new. But if you use
>> mdadm 4.1 with kernel 4.9.237, the 4.9 means that the kernel is basically a
>> very old one - too old for mdadm 4.1
>
> But the point of the longterm kernel lines like 4.9.237 is to keep
> strict compatibility with the original branch point (that's the point
> of a "stable" line) and perform only bugfixes, isn't it? Do you mean
> to say that there is NO stable kernel line with full mdadm support?
> Or just the ones provided by distributions? (But don't distributions
> like Debian do exactly the same thing as GKH and others with these
> longterm lines? I.e., fix bugs while keeping strict compatibility.
> If there are no longterm stable kernels with full RAID support, I find
> this rather worrying.)
Depends what you mean by full RAID support. Any kernel (within limits)
should work with any raid. We've found, by experience, that trying to
upgrade a raid can have problems ... :-)
>
> But in my specific case, the issue didn't come from a mdadm/kernel
> mismatch after all: I performed further investigation after I wrote my
> previous message, and my problem did indeed come from the
> /lib/systemd/system/mdadm-grow-continue@.service which, as far as I
> can tell, is broken insofar as --backup-file=... goes (the option is
> needed for --continue to work and it isn't passed). Furthermore, this
> file appears to be distributed by mdadm itself (it's not
> Debian-specific), and the systemd service is called by mdadm (from
> continue_via_systemd() in Grow.c).
Except is this the problem? If the reshape fails to start, I don't quite
see how the restart service-file can be to blame?
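
For anyone following along, the difference David describes can be
sketched roughly as below. This is only an illustration under stated
assumptions: the helper name grow_continue_argv, the device /dev/md0 and
the backup path are hypothetical placeholders, not values from this
thread. Only the --grow, --continue and --backup-file options themselves
are real mdadm options. The sketch builds the command lines and prints
them; it does not run mdadm.

```python
import shlex


def grow_continue_argv(device, backup_file=None):
    """Build the argv for resuming a reshape by hand (hypothetical helper)."""
    argv = ["mdadm", "--grow", "--continue", device]
    if backup_file is not None:
        # Only appended when the reshape was started with a backup file.
        argv.append(f"--backup-file={backup_file}")
    return argv


# The form David reports the service using: no backup file is passed.
print(shlex.join(grow_continue_argv("/dev/md0")))

# The form a reshape started with a backup file would need instead.
# "/root/md0-reshape.backup" is a placeholder path (assumption).
print(shlex.join(grow_continue_argv("/dev/md0", "/root/md0-reshape.backup")))
```

If the missing option really is the cause, comparing those two command
lines against what the service file actually executes would be the
first thing to check.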
>
> So it seems to me that RAID reshaping with backup files is currently
> broken on all systems which use systemd. But then I'm confused as to
> why this didn't get more attention. Anyway, if you have any
> suggestion as to where I should bugreport this, it's the least I can
> do.
It works fine with a "latest and greatest" kernel and mdadm ... that
said, we know that there's been a fair bit of general house-keeping and
tidying up going on.
>
> In my particular setup, after giving this more thought, I thought the
> wisest thing would be to get tons of external storage, copy everything
> away, recreate a fresh RAID6 array, and copy everything back into it.
Well, I'm thinking of getting a huge shingled disk for backups :-) but
if that's worked for you, great.
>
> Whatever the case, thanks for your help.
>
And thank you for documenting what's going wrong. I doubt much work will
go into fixing it for Debian 9, but if it really is a problem and rears
its head again, at least we'll have more info to start digging. I'll
make a note of this ...
But this is exactly the problem with the concept of LTS. Yes, I
understand why people want LTS, but if the kernel accumulates bug-fixes
and patches it will get out of sync with user-space. And yes, the
intention is to minimise this as much as possible. But mdadm 3.4 is a
lot older than your updated kernel (and known to be buggy), while your
updated kernel is still anchored firmly in the past relative to
mdadm 4.1. LTS is a work-around to cope with the fact that time flows ...
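
To put the two halves of that version number side by side, here is a
minimal sketch. It assumes a plain three-part release string such as
"4.9.237"; the helper name split_kernel_version is made up for
illustration and is not part of any tool mentioned here.

```python
def split_kernel_version(release):
    """Split 'X.Y.Z' into the base series (X, Y) and the stable patch level Z.

    Assumes exactly three dot-separated integers; distro suffixes such as
    '-amd64' are not handled in this sketch.
    """
    major, minor, patch = (int(part) for part in release.split("."))
    return (major, minor), patch


base, patch_level = split_kernel_version("4.9.237")
# The base series says how old the branch point is; the patch level says
# how many stable updates have been layered on top of it since.
print(f"base series {base[0]}.{base[1]}, stable update {patch_level}")
```

The "4.9" is what makes the kernel old relative to mdadm 4.1, and the
"237" is what makes it new relative to mdadm 3.4.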
Oh - and as for backup files - newer arrays by default don't need or use
them. So that again could be part of the problem ...
Cheers,
Wol