linux-raid.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.com>
To: Andrei Borzenkov <arvidjaar@gmail.com>
Cc: Luke Pyzowski <Luke@sunrisefutures.com>,
	"systemd-devel@lists.freedesktop.org"
	<systemd-devel@lists.freedesktop.org>,
	linux-raid@vger.kernel.org
Subject: Re: [systemd-devel] Errorneous detection of degraded array
Date: Tue, 31 Jan 2017 09:19:48 +1100	[thread overview]
Message-ID: <8760kwry0r.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <CAA91j0V92RXDky-AnD6w+Dy=M7KJVCWyssA7yHRfRqBxLTWvog@mail.gmail.com>

On Mon, Jan 30 2017, Andrei Borzenkov wrote:

> On Mon, Jan 30, 2017 at 9:36 AM, NeilBrown <neilb@suse.com> wrote:
> ...
>>>>>>
>>>>>> systemd[1]: Created slice system-mdadm\x2dlast\x2dresort.slice.
>>>>>> systemd[1]: Starting system-mdadm\x2dlast\x2dresort.slice.
>>>>>> systemd[1]: Starting Activate md array even though degraded...
>>>>>> systemd[1]: Stopped target Local File Systems.
>>>>>> systemd[1]: Stopping Local File Systems.
>>>>>> systemd[1]: Unmounting /share...
>>>>>> systemd[1]: Stopped (with error) /dev/md0.
>>>>
> ...
>>
>> The race is, I think, the one I mentioned.  If the md device is started
>> before udev tells systemd to start the timer, the Conflicts dependency
>> goes the "wrong" way and stops the wrong thing.
>>
>
> From the logs provided it is unclear whether it is the *timer* or the
> *service*. If it is the timer, I do not understand why it is started
> exactly 30 seconds after the device apparently appears. That delay
> would match the service being started.

My guess is that the timer is triggered immediately after the device is
started, but before it is mounted.
The Conflicts directive tries to stop the device, but it cannot stop the
device and there are no dependencies yet, so nothing happens.
After the timer fires (30 seconds later) the .service starts.  It also
has a Conflicts directive, so systemd tries to stop the device again.
Now that the device has been mounted, there is a dependency that can be
stopped, and the device gets unmounted.
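
For reference, the two units involved look approximately like this.
This is a sketch based on upstream mdadm's systemd units of roughly
that period; exact contents vary by version and distribution, so
compare with "systemctl cat mdadm-last-resort@md0.timer" and
"systemctl cat mdadm-last-resort@md0.service" on the affected machine.
The comments follow the sequence guessed at above; they describe that
hypothesis, not confirmed behaviour.

```ini
# mdadm-last-resort@.timer (approximate; %i is the array name, e.g. md0)
[Unit]
Description=Timer to wait for more drives before activating degraded array.
DefaultDependencies=no
# Starting the timer asks systemd to stop the device unit. systemd cannot
# stop the device itself, only units that depend on it, so if nothing is
# mounted yet this does nothing visible.
Conflicts=sys-devices-virtual-block-%i.device

[Timer]
# Fire 30 seconds after the timer itself is started; by default this
# starts the service with the same name, mdadm-last-resort@%i.service.
OnActiveSec=30

# mdadm-last-resort@.service (approximate)
[Unit]
Description=Activate md array even though degraded
DefaultDependencies=no
# Same Conflicts= line. If the array is mounted by the time this starts,
# the stop request now reaches the mount that depends on the device.
Conflicts=sys-devices-virtual-block-%i.device

[Service]
Type=oneshot
# Installed path of mdadm varies between distributions.
ExecStart=/sbin/mdadm --run /dev/%i
```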

>
> Yet another case where system logging is hopelessly unfriendly for
> troubleshooting :(
>
>> It would be nice to be able to reliably stop the timer when the device
>> starts, without risking having the device get stopped when the timer
>> starts, but I don't think we can reliably do that.
>>
>
> Well, let's wait until we can get some more information about what happens.
>
>> Changing the
>>   Conflicts=sys-devices-virtual-block-%i.device
>> lines to
>>   ConditionPathExists=/sys/devices/virtual/block/%i
>> might make the problem go away, without any negative consequences.
>>
>
> Ugly, but yes, maybe this is the only way using current systemd.
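
A minimal sketch of the proposed ConditionPathExists= change applied
to the service unit, under stated assumptions (the timer would get the
same edit).  It departs from the quoted line in two ways.  First, a
"!" prefix negates the test so that the unit is skipped when the path
exists; as the quoted line reads, the unit would run only when the path
exists, the opposite of the stated intent.  Second, the array's sysfs
directory can already exist while a partially assembled array is still
inactive, so this sketch tests md/sync_action instead, which is assumed
to appear only once a redundant array is running (worth checking on
the target kernel).  Dependencies such as Conflicts= cannot be removed
from a drop-in, so the sketch is a full copy of the unit placed in
/etc/systemd/system/, which overrides the packaged unit after
"systemctl daemon-reload".

```ini
# /etc/systemd/system/mdadm-last-resort@.service
# Full copy of the packaged unit (approximate) with Conflicts= replaced,
# since a drop-in can add dependencies but not remove them.
[Unit]
Description=Activate md array even though degraded
DefaultDependencies=no
# Skip the unit if the array is already running; "!" negates the test.
# Assumption: md/sync_action exists only while a redundant array is
# running, whereas /sys/devices/virtual/block/%i itself can already
# exist for an inactive, partially assembled array.
ConditionPathExists=!/sys/devices/virtual/block/%i/md/sync_action

[Service]
Type=oneshot
# Installed path of mdadm varies between distributions.
ExecStart=/sbin/mdadm --run /dev/%i
```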
>
>> The primary purpose of having the 'Conflicts' directives was so that
>> systemd wouldn't log
>>   Starting Activate md array even though degraded
>> after the array was successfully started.
>
> This looks like a cosmetic problem. What will happen if the last-resort
> service is started when the array is fully assembled? Will it do any harm?

Yes, it could be seen as cosmetic, but cosmetic issues can be important
too.  Confusing messages in logs can be harmful.

In all likely cases, running the last-resort service won't cause any
harm.
If, during the 30 seconds, the array is started, then deliberately
stopped, then partially assembled again, then when the last-resort
service finally starts it might do the wrong thing.
So it would be cleanest if the timer was killed as soon as the device
is started.  But I don't think there is a practical concern.

I guess I could make a udev rule that fires when the array is started,
and that runs "systemctl stop mdadm-last-resort@md0.timer"
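
The suggestion above is a udev rule.  As a swapped-in alternative that
stays in unit-file syntax, the same cancellation could be sketched as a
small oneshot service attached to the array's device unit.  The name
mdadm-last-resort-cancel@.service is hypothetical and not part of mdadm
or systemd.  The sketch assumes that the device unit only becomes
active once the array is actually running (as mdadm's udev rules are
meant to arrange), and, like the udev idea, it can only cancel a timer
that is already running at that moment; a timer started afterwards, as
in the race described earlier, is not affected.

```ini
# /etc/systemd/system/mdadm-last-resort-cancel@.service
# Hypothetical unit, not shipped by mdadm or systemd.
[Unit]
Description=Stop degraded-array timer for %i once the array is active
DefaultDependencies=no

[Service]
Type=oneshot
# Queue a stop job for the timer instead of waiting for it to finish.
# The systemctl path may differ (e.g. /bin/systemctl on some systems).
ExecStart=/usr/bin/systemctl --no-block stop mdadm-last-resort@%i.timer

[Install]
# Pulled in by the array's device unit once enabled for an instance.
WantedBy=sys-devices-virtual-block-%i.device
```

It would be enabled per array, for example with "systemctl enable
mdadm-last-resort-cancel@md0.service", which links the instance into
sys-devices-virtual-block-md0.device.wants/ so that systemd starts it
when that device unit becomes active.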

NeilBrown


>
>> Hopefully it won't do that when the Condition fails.
>>

Thread overview: 12+ messages
     [not found] <96A26C8C6786C341B83BC4F2BC5419E4795DE9A6@SRF-EXCH1.corp.sunrisefutures.com>
2017-01-27  7:12 ` [systemd-devel] Errorneous detection of degraded array Andrei Borzenkov
2017-01-27  8:25   ` Martin Wilck
2017-01-27 19:44   ` Luke Pyzowski
2017-01-28 17:34     ` [systemd-devel] " Andrei Borzenkov
2017-01-30 22:41       ` Luke Pyzowski
2017-01-30  1:53   ` NeilBrown
2017-01-30  3:40     ` Andrei Borzenkov
2017-01-30  6:36       ` NeilBrown
2017-01-30  7:29         ` Andrei Borzenkov
2017-01-30 22:19           ` NeilBrown [this message]
2017-01-31 20:17             ` [systemd-devel] " Andrei Borzenkov
2017-02-08  4:10               ` NeilBrown
