From: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
To: "NeilBrown" <neilb@suse.de>
Cc: "Coly Li" <colyli@suse.de>,
linux-raid@vger.kernel.org,
"Benjamin Brunner" <bbrunner@suse.com>,
"Franck Bui" <fbui@suse.de>,
"Jes Sorensen" <jes@trained-monkey.org>,
"Xiao Ni" <xni@redhat.com>
Subject: Re: [PATCH] mdadm/systemd: remove KillMode=none from service file
Date: Tue, 2 Aug 2022 17:43:05 +0200
Message-ID: <20220802174305.00000336@linux.intel.com>
In-Reply-To: <165905971898.4359.3905352912598347760@noble.neil.brown.name>
On Fri, 29 Jul 2022 11:55:18 +1000
"NeilBrown" <neilb@suse.de> wrote:
> On Thu, 28 Jul 2022, Mariusz Tkaczyk wrote:
> > On Tue, 15 Feb 2022 21:34:15 +0800
> > Coly Li <colyli@suse.de> wrote:
> >
> > > For mdadm's systemd configuration, current systemd KillMode is "none" in
> > > following service files,
> > > - mdadm-grow-continue@.service
> > > - mdmon@.service
> > >
> > > This "none" mode is strongly discouraged by systemd developers (see man 5
> > > systemd.kill, "KillMode=" section), and is being considered for removal
> > > in a future systemd version.
> > >
> > > As a systemd developer explained in discussion, the systemd kill process
> > > is,
> > > 1. send the signal specified by KillSignal= to the list of processes (if
> > > any), TERM is the default
> > > 2. wait until either the target process(es) exit or a timeout expires
> > > 3. if the timeout expires send the signal specified by FinalKillSignal=,
> > > KILL is the default
> > >
> > > For "control-group", all remaining processes will receive the SIGTERM
> > > signal (by default) and if there are still processes after a period of
> > > time, they will get the SIGKILL signal.
> > >
> > > For "mixed", only the main process will receive the SIGTERM signal, and
> > > if there are still processes after a period of time, all remaining
> > > processes (including the main one) will receive the SIGKILL signal.
> > >
> > > From the above comment, currently KillMode=control-group is a proper
> > > kill mode. Since control-group is the default kill mode, the fix can be
> > > simply removing KillMode=none line from the service file, then the
> > > default mode will take effect.
> >
> > Hi All,
> > We are experiencing issues with IMSM metadata on RHEL8.7 and 9.1 (the patch
> > was picked up by Red Hat). Several issues result in hung tasks,
> > characteristic of a missing mdmon:
> >
> > [ 619.521440] task:umount state:D stack: 0 pid: 6285 ppid: flags:0x00004084
> > [ 619.534033] Call Trace:
> > [ 619.539980] __schedule+0x2d1/0x830
> > [ 619.547056] ? finish_wait+0x80/0x80
> > [ 619.554261] schedule+0x35/0xa0
> > [ 619.560999] md_write_start+0x14b/0x220
> > [ 619.568492] ? finish_wait+0x80/0x80
> > [ 619.575649] raid1_make_request+0x3c/0x90 [raid1]
> > [ 619.584111] md_handle_request+0x128/0x1b0
> > [ 619.591891] md_make_request+0x5b/0xb0
> > [ 619.599235] generic_make_request_no_check+0x202/0x330
> > [ 619.608185] submit_bio+0x3c/0x160
> > [ 619.615161] ? bio_add_page+0x42/0x50
> > [ 619.622413] submit_bh_wbc+0x16a/0x190
> > [ 619.629713] jbd2_write_superblock+0xf4/0x210 [jbd2]
> > [ 619.638340] jbd2_journal_update_sb_log_tail+0x65/0xc0 [jbd2]
> > [ 619.647773] __jbd2_update_log_tail+0x3f/0x100 [jbd2]
> > [ 619.656374] jbd2_cleanup_journal_tail+0x50/0x90 [jbd2]
> > [ 619.665107] jbd2_log_do_checkpoint+0xfa/0x400 [jbd2]
> > [ 619.673572] ? prepare_to_wait_event+0xa0/0x180
> > [ 619.681344] jbd2_journal_destroy+0x120/0x2a0 [jbd2]
> > [ 619.689551] ? finish_wait+0x80/0x80
> > [ 619.696096] ext4_put_super+0x76/0x390 [ext4]
> > [ 619.703584] generic_shutdown_super+0x6c/0x100
> > [ 619.711065] kill_block_super+0x21/0x50
> > [ 619.717809] deactivate_locked_super+0x34/0x70
> > [ 619.725146] cleanup_mnt+0x3b/0x70
> > [ 619.731279] task_work_run+0x8a/0xb0
> > [ 619.737576] exit_to_usermode_loop+0xeb/0xf0
> > [ 619.744657] do_syscall_64+0x198/0x1a0
> > [ 619.751155] entry_SYSCALL_64_after_hwframe+0x65/0xca
> >
> > It can be reproduced by mounting an LVM volume created on an IMSM RAID1
> > array and then rebooting. I verified that reverting the patch fixes the
> > issue.
> >
> > I understand that from systemd perspective the behavior is not wanted, but
> > this is exactly what we need, to have working mdmon process even if systemd
> > was stopped. KillMode=none does the job.
> > I searched for an alternative way to prevent systemd from stopping the mdmon
> > unit but I failed. I tried to change signals, so I configured the unit to
> > send SIGPIPE (because it is ignored by mdmon). It worked, but later the
> > system hung because the mdmon unit could not be stopped.
> >
> > I also tried to configure mdmon unit to be stopped after umount.target and I
> > failed too. It cannot be achieved by setting After= or Before=. The one
> > objection I have here is that systemd-shutdown tries to stop raid arrays
> > later, so it could be better to have running mdmon there.
> >
> > IMO KillMode=none is desired in this case. Later, mdmon is restarted in
> > dracut by mdraid module.
> >
> > If there is no other solution for the problem, I will need to ask Jes to
> > revert this patch. For now, I asked Redhat to do it.
> > Do you have any suggestions?
>
> We should be able to make this work.
> We don't need mdmon after the last array stops, and we should have
> dependencies to tell systemd that the various arrays require mdmon.
> Ideally systemd wouldn't even try to stop mdmon until the relevant array
> was stopped.
>
> Can we change the udev rule to tell systemd that the device WANTS
> mdmon@foo.service??
Hi Neil,
This is done already:
https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/tree/udev-md-raid-arrays.rules#n41
but I can't find the Wants dependency in:
# systemctl show dev-md126.service
# systemctl show dev-md127.service
According to the man page:
https://www.freedesktop.org/software/systemd/man/systemd.device.html
there is nothing else I can do.
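For reference, systemd.device(5) describes SYSTEMD_WANTS as adding Wants=
dependencies to the device unit, so the dependency, if present, would be
expected on the .device unit. A minimal sketch of that check follows. It
assumes the device units are named dev-md126.device and dev-md127.device
(not verified on the affected systems); md_device_unit is a hypothetical
helper, not part of mdadm or systemd.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm or systemd): map an md device
# name such as "md126" to the device unit systemd creates for /dev/md126.
# Assumes plain names that need no systemd escaping.
md_device_unit() {
    printf 'dev-%s.device\n' "$1"
}

# Array names are taken from the commands above.
for md in md126 md127; do
    unit=$(md_device_unit "$md")
    echo "== $unit"
    if command -v systemctl >/dev/null 2>&1; then
        # Print only the Wants= property of the device unit.
        systemctl show -p Wants "$unit" || true
    fi
done
```

If mdmon@md127.service shows up in Wants= for the device unit, the udev rule
is taking effect; an empty value would suggest it is not.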
> Or add "Before=sys-devices-md-%I.device" or something like that to
> mdmon@.service ??
>
I got:
systemd[1]: /usr/lib/systemd/system/mdmon@.service:11: Failed to resolve unit
specifiers in 'dev-%I.device', ignoring: Invalid slot
> Do you know what exactly is causing systemd to hang because mdmon cannot
> be stopped? What other unit is waiting for it?
There is a special umount.target:
https://www.freedesktop.org/software/systemd/man/systemd.special.html
Probably it tries to umount every existing .mount unit; I didn't check deeply.
https://www.freedesktop.org/software/systemd/man/systemd.mount.html
I can see that we can define something for .mount units so I tried both:
# mount -o x-systemd.after=mdmon@md127.service /dev/mapper/vg0-lvm_raid /mnt
# mount -o x-systemd.requires=mdmon@md127.service /dev/mapper/vg0-lvm_raid /mnt
but it doesn't help either. It seems that it is ignored because I cannot find
the mdmon dependency in the systemctl show output for the mnt.mount unit.
Do you have any other ideas?
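For reference, systemd.mount(5) documents x-systemd.requires= and
x-systemd.after= in its fstab section, so they may only take effect when the
mount unit is generated from /etc/fstab. That this explains the result above
is an assumption, not something verified. A minimal sketch of the fstab
variant, reusing the device, mount point and unit from the commands above;
ext4 is assumed from the call trace:

```shell
#!/bin/sh
# Values taken from the mount commands above; ext4 is an assumption
# based on the call trace.
dev=/dev/mapper/vg0-lvm_raid
mnt=/mnt
unit=mdmon@md127.service

# Candidate /etc/fstab entry carrying both dependency options.
# It is only printed here; /etc/fstab is not modified.
entry="$dev $mnt ext4 defaults,x-systemd.requires=$unit,x-systemd.after=$unit 0 0"
echo "$entry"

# After adding the entry to /etc/fstab and running "systemctl daemon-reload",
# the generated mount unit can be inspected:
if command -v systemctl >/dev/null 2>&1; then
    systemctl show -p Requires -p After mnt.mount || true
fi
```

If the options are honored, mdmon@md127.service should be listed in both
Requires= and After= of mnt.mount.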
>
> Even if the root filesystem is on LVM on IMSM, doesn't systemd chroot
> back to the initramfs and then tear down the LVM and MD arrays???
Yes, this is how it works; mdmon is restarted in the initrd later. The system
will reboot successfully after the timeout.
Thanks,
Mariusz