From: Andrei Borzenkov <arvidjaar@gmail.com>
To: Luke Pyzowski <Luke@sunrisefutures.com>,
	"'systemd-devel@lists.freedesktop.org'"
	<systemd-devel@lists.freedesktop.org>,
	"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Re: [systemd-devel] Erroneous detection of degraded array
Date: Sat, 28 Jan 2017 20:34:26 +0300
Message-ID: <4504399b-4d6f-a18a-d64a-e46ecd8efa46@gmail.com>
In-Reply-To: <96A26C8C6786C341B83BC4F2BC5419E4795DF1D8@SRF-EXCH1.corp.sunrisefutures.com>

On 27.01.2017 22:44, Luke Pyzowski wrote:
...
> Jan 27 11:33:14 lnxnfs01 kernel: md/raid:md0: raid level 6 active with 24 out of 24 devices, algorithm 2
...
> Jan 27 11:33:14 lnxnfs01 kernel: md0: detected capacity change from 0 to 45062020923392
> Jan 27 11:33:14 lnxnfs01 systemd[1]: Found device /dev/disk/by-uuid/2b9114be-3d5a-41d7-8d4b-e5047d223129.
> Jan 27 11:33:14 lnxnfs01 systemd[1]: Started udev Wait for Complete Device Initialization.
> Jan 27 11:33:14 lnxnfs01 systemd[1]: Started Timer to wait for more drives before activating degraded array..
> Jan 27 11:33:14 lnxnfs01 systemd[1]: Starting Timer to wait for more drives before activating degraded array..
...
> 
> ... +31 seconds after disk initialization: expiration of the 30-second timer from mdadm-last-resort@.timer
> 
> Jan 27 11:33:45 lnxnfs01 systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
> Jan 27 11:33:45 lnxnfs01 systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
> Jan 27 11:33:45 lnxnfs01 systemd[1]: Stopped target Local File Systems.
> Jan 27 11:33:45 lnxnfs01 systemd[1]: Stopping Local File Systems.
> Jan 27 11:33:45 lnxnfs01 systemd[1]: Unmounting Mount /share RAID partition explicitly...
> Jan 27 11:33:45 lnxnfs01 systemd[1]: Starting Activate md array even though degraded...
> Jan 27 11:33:45 lnxnfs01 systemd[1]: Stopped (with error) /dev/md0.
> Jan 27 11:33:45 lnxnfs01 systemd[1]: Started Activate md array even though degraded.
> Jan 27 11:33:45 lnxnfs01 systemd[1]: Unmounted Mount /share RAID partition explicitly.
> 

Here is my educated guess.

Both mdadm-last-resort@.timer and mdadm-last-resort@.service conflict
with the MD device unit:

bor@bor-Latitude-E5450:~/src/systemd$ ls ../mdadm/systemd/
mdadm-grow-continue@.service  mdadm.shutdown
mdadm-last-resort@.service    mdmonitor.service
mdadm-last-resort@.timer      mdmon@.service
SUSE-mdadm_env.sh
bor@bor-Latitude-E5450:~/src/systemd$ cat ../mdadm/systemd/mdadm-last-resort@.timer
[Unit]
Description=Timer to wait for more drives before activating degraded array.
DefaultDependencies=no
Conflicts=sys-devices-virtual-block-%i.device

[Timer]
OnActiveSec=30
bor@bor-Latitude-E5450:~/src/systemd$ cat ../mdadm/systemd/mdadm-last-resort@.service
[Unit]
Description=Activate md array even though degraded
DefaultDependencies=no
Conflicts=sys-devices-virtual-block-%i.device

[Service]
Type=oneshot
ExecStart=BINDIR/mdadm --run /dev/%i
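
The resulting Conflicts= relationship can also be checked on a running
system. A minimal sketch, assuming the array instance is md0 (adjust the
instance name to your array):

# query the Conflicts property of the instantiated timer unit
systemctl show -p Conflicts mdadm-last-resort@md0.timer
# expected output:
# Conflicts=sys-devices-virtual-block-md0.device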

I presume the intention is to stop these units once the MD device is
finally assembled as complete. This is indeed what happens on my (test)
system:

Jan 28 14:18:04 linux-ffk5 kernel: md: bind<vda1>
Jan 28 14:18:04 linux-ffk5 kernel: md: bind<vdb1>
Jan 28 14:18:05 linux-ffk5 kernel: md/raid1:md0: active with 2 out of 2 mirrors
Jan 28 14:18:05 linux-ffk5 kernel: md0: detected capacity change from 0 to 5363466240
Jan 28 14:18:06 linux-ffk5 systemd[1]: mdadm-last-resort@md0.timer: Installed new job mdadm-last-resort@md0.timer/start as 287
Jan 28 14:18:06 linux-ffk5 systemd[1]: mdadm-last-resort@md0.timer: Enqueued job mdadm-last-resort@md0.timer/start as 287
Jan 28 14:18:06 linux-ffk5 systemd[1]: dev-ttyS9.device: Changed dead -> plugged
Jan 28 14:18:07 linux-ffk5 systemd[1]: mdadm-last-resort@md0.timer: Changed dead -> waiting
Jan 28 14:18:12 linux-ffk5 systemd[1]: sys-devices-virtual-block-md0.device: Changed dead -> plugged
Jan 28 14:18:12 linux-ffk5 systemd[1]: mdadm-last-resort@md0.timer: Trying to enqueue job mdadm-last-resort@md0.timer/stop/replace
Jan 28 14:18:12 linux-ffk5 systemd[1]: mdadm-last-resort@md0.timer: Installed new job mdadm-last-resort@md0.timer/stop as 292
Jan 28 14:18:12 linux-ffk5 systemd[1]: mdadm-last-resort@md0.timer: Enqueued job mdadm-last-resort@md0.timer/stop as 292
Jan 28 14:18:12 linux-ffk5 systemd[1]: mdadm-last-resort@md0.timer: Changed waiting -> dead
Jan 28 14:18:12 linux-ffk5 systemd[1]: mdadm-last-resort@md0.timer: Job mdadm-last-resort@md0.timer/stop finished, result=done
Jan 28 14:18:12 linux-ffk5 systemd[1]: Stopped Timer to wait for more drives before activating degraded array..
Jan 28 14:19:34 10 systemd[1692]: dev-vda1.device: Changed dead -> plugged
Jan 28 14:19:34 10 systemd[1692]: dev-vdb1.device: Changed dead -> plugged
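
For comparison, the timer's lifecycle can be watched on a live system
while the array assembles; a small sketch (instance name md0 assumed):

# should be inactive (dead) once the array is complete,
# waiting only while members are still missing
systemctl status mdadm-last-resort@md0.timer
# show the timer's start/stop messages for the current boot
journalctl -b -u mdadm-last-resort@md0.timer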


On your system the timer is apparently not stopped when the md device
appears, so when the last-resort service runs later, its Conflicts=
dependency triggers an attempt to stop the md device and, transitively,
the mount on top of it.
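
If so, the propagation path should be visible in the unit dependencies.
A sketch of what could be checked on the affected machine (the device
and mount unit names here are assumptions based on the log above):

# list the units that get stopped together with the md device;
# the /share mount unit should appear here via BindsTo=/Requires=,
# which is why a stop job for the device pulls in the unmount
systemctl show -p BoundBy,RequiredBy dev-md0.device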

Could you try a run with systemd.log_level=debug on the kernel command
line and upload the journal again? We can only hope that it does not
skew the timings too much; it may prove my hypothesis.
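
In case it helps, one way to do that on a GRUB2-based system (a sketch
only; file locations and the grub command differ between distributions):

# append the option to the kernel command line in /etc/default/grub
sed -i 's/^GRUB_CMDLINE_LINUX="/&systemd.log_level=debug /' /etc/default/grub
grub2-mkconfig -o /boot/grub2/grub.cfg   # or update-grub on Debian/Ubuntu
reboot
# after reboot, capture the journal for the current boot
journalctl -b -o short-monotonic > journal-debug.txt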

Thread overview: 12+ messages
     [not found] <96A26C8C6786C341B83BC4F2BC5419E4795DE9A6@SRF-EXCH1.corp.sunrisefutures.com>
2017-01-27  7:12 ` [systemd-devel] Erroneous detection of degraded array Andrei Borzenkov
2017-01-27  8:25   ` Martin Wilck
2017-01-27 19:44   ` Luke Pyzowski
2017-01-28 17:34     ` Andrei Borzenkov [this message]
2017-01-30 22:41       ` [systemd-devel] " Luke Pyzowski
2017-01-30  1:53   ` NeilBrown
2017-01-30  3:40     ` Andrei Borzenkov
2017-01-30  6:36       ` NeilBrown
2017-01-30  7:29         ` Andrei Borzenkov
2017-01-30 22:19           ` [systemd-devel] " NeilBrown
2017-01-31 20:17             ` Andrei Borzenkov
2017-02-08  4:10               ` NeilBrown
