linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] Assorted patches relating to mdmon
@ 2023-02-27  0:13 NeilBrown
  2023-02-27  0:13 ` [PATCH 2/6] Improvements for IMSM_NO_PLATFORM testing NeilBrown
                   ` (6 more replies)
  0 siblings, 7 replies; 12+ messages in thread
From: NeilBrown @ 2023-02-27  0:13 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: linux-raid, Martin Wilck, Mariusz Tkaczyk

mdmon is a root-storage daemon is the sense defined by systemd
documentation, but it does not follow the practice that systemd
recommends.  Specifically it is run from the root filesystem when
possible.  The instance started in the initrd hands-over to a root-fs
based instance, which then hands-over to an initrd-based instance
started by dracut at shutdown.

Part of the reason that we ignore systemd advise is that mdmon needs
access to the filesystem - specifically /dev and /sys - which is not
available in the initrd context after switchroot.  We could possibly
change mdmon to work in the systemd-preferred way by splitting mdmon
into two processes instead of just having 2 threads.  The "monitor"
process could running entirely in the initrd context, the "manager"
process could safely run in the root-fs context, passing newly opened
file descriptors to the monitor over a unix-domain socket.

But we aren't there yet and may never be.

For now, mdmon doesn't work correctly.  There is no mechanism to ensure
a new instance starts after switchroot.  Until recently the initrd
instance of the systemd mdmon unit would be stopped at switchroot time
because udev would temporarily forget about md devices.  This would
allow the "udevadm trigger" process to start a new instance.  udev was
recently fixed:

Commit: 7ec624147a41 ("udevadm: cleanup-db: don't delete information for kept db entries")

so now the attempt to start mdmon via "udevadm trigger" does nothing as
mdmon already has an active unit.

The net result is that mdmon continues running in the initrd mount
namespace and so cannot access new devices.  Adding a device to a root
md array that depends on mdmon will no longer work.

We want the initrd instance of mdmon to continue running until the
root-fs based instance starts, and that really requires we have two
different systemd units.  This series achieves this in the final patch by
using a different instance name inside or initrd and outside.
"initrd-mdfoo" and "mdfoo".

Other patches in the series are mostly clean-ups and minor improvements
in related code.

NeilBrown



---

NeilBrown (6):
      Use existence of /etc/initrd-release to detect initrd.
      Improvements for IMSM_NO_PLATFORM testing.
      mdmon: don't test both 'all' and 'container_name'.
      mdmon: change systemd unit file to use --foreground
      mdmon: Remove need for KillMode=none
      mdmon improvements for switchroot


 Grow.c                    |  4 ++--
 mdadm.h                   |  4 +++-
 mdmon.c                   | 21 ++++++++++++-------
 super-intel.c             | 43 ++++++++++++++++++++++++++++++++++++---
 systemd/mdmon@.service    | 15 +++++++-------
 udev-md-raid-arrays.rules |  3 ++-
 util.c                    | 17 +++++-----------
 7 files changed, 74 insertions(+), 33 deletions(-)

--
Signature


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/6] Use existence of /etc/initrd-release to detect initrd.
  2023-02-27  0:13 [PATCH 0/6] Assorted patches relating to mdmon NeilBrown
                   ` (4 preceding siblings ...)
  2023-02-27  0:13 ` [PATCH 4/6] mdmon: change systemd unit file to use --foreground NeilBrown
@ 2023-02-27  0:13 ` NeilBrown
  2023-03-01  8:55 ` [PATCH 0/6] Assorted patches relating to mdmon Mariusz Tkaczyk
  6 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2023-02-27  0:13 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: linux-raid, Martin Wilck, Mariusz Tkaczyk

Since v183, systemd has used the existence of /etc/initrd-release to
detect if it is running in an initrd, rather than looking at the magic
number of the root filesystem's device.  It is time for mdadm to do the
same.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 util.c |   10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/util.c b/util.c
index 9cd89fa416bd..6b44662db7cd 100644
--- a/util.c
+++ b/util.c
@@ -2217,15 +2217,7 @@ int continue_via_systemd(char *devnm, char *service_name)
 
 int in_initrd(void)
 {
-	/* This is based on similar function in systemd. */
-	struct statfs s;
-	/* statfs.f_type is signed long on s390x and MIPS, causing all
-	   sorts of sign extension problems with RAMFS_MAGIC being
-	   defined as 0x858458f6 */
-	return  statfs("/", &s) >= 0 &&
-		((unsigned long)s.f_type == TMPFS_MAGIC ||
-		 ((unsigned long)s.f_type & 0xFFFFFFFFUL) ==
-		 ((unsigned long)RAMFS_MAGIC & 0xFFFFFFFFUL));
+	return access("/etc/initrd-release", F_OK) >= 0;
 }
 
 void reopen_mddev(int mdfd)



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/6] Improvements for IMSM_NO_PLATFORM testing.
  2023-02-27  0:13 [PATCH 0/6] Assorted patches relating to mdmon NeilBrown
@ 2023-02-27  0:13 ` NeilBrown
  2023-02-27  0:13 ` [PATCH 3/6] mdmon: don't test both 'all' and 'container_name' NeilBrown
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2023-02-27  0:13 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: linux-raid, Martin Wilck, Mariusz Tkaczyk

Factor out IMSM_NO_PLATFORM testing into a single function that caches
the result.

Allow mdmon to explicitly set the result to "1" so that we don't need
the ENV var in the unit file

Check if the kernel command line contains "mdadm.imsm.test=1" and in
that case assert NO_PLATFORM.  This simplifies testing in a virtual
machine.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 mdadm.h                |    2 ++
 mdmon.c                |    6 ++++++
 super-intel.c          |   43 ++++++++++++++++++++++++++++++++++++++++---
 systemd/mdmon@.service |    3 ---
 4 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/mdadm.h b/mdadm.h
index 13f8b4cb5a6b..0939fd3cb1ee 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -1258,6 +1258,8 @@ extern struct superswitch super0, super1;
 extern struct superswitch super_imsm, super_ddf;
 extern struct superswitch mbr, gpt;
 
+void imsm_set_no_platform(int v);
+
 struct metadata_update {
 	int	len;
 	char	*buf;
diff --git a/mdmon.c b/mdmon.c
index 60ba318253b9..f557e12c6533 100644
--- a/mdmon.c
+++ b/mdmon.c
@@ -318,6 +318,12 @@ int main(int argc, char *argv[])
 		{NULL, 0, NULL, 0}
 	};
 
+	/*
+	 * mdmon should never complain due to lack of a platform,
+	 * that is mdadm's job if at all.
+	 */
+	imsm_set_no_platform(1);
+
 	while ((opt = getopt_long(argc, argv, "thaF", options, NULL)) != -1) {
 		switch (opt) {
 		case 'a':
diff --git a/super-intel.c b/super-intel.c
index 89fac6263172..9180d050083a 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -629,6 +629,43 @@ static const char *_sys_dev_type[] = {
 	[SYS_DEV_VMD] = "VMD"
 };
 
+static int no_platform = -1;
+
+static int check_no_platform(void)
+{
+	static const char search[] = "mdadm.imsm.test=1";
+	int fd;
+	char buf[1024];
+	int n = 0;
+
+	if (no_platform >= 0)
+		return no_platform;
+
+	if (check_env("IMSM_NO_PLATFORM")) {
+		no_platform = 1;
+		return 1;
+	}
+	fd = open("/proc/cmdline", O_RDONLY);
+	if (fd >= 0) {
+		n = read(fd, buf, sizeof(buf)-1);
+		close(fd);
+	}
+	if (n >= (int)sizeof(search)) {
+		buf[n] = 0;
+		if (strstr(buf, search) != NULL) {
+			no_platform = 1;
+			return 1;
+		}
+	}
+	no_platform = 0;
+	return 0;
+}
+
+void imsm_set_no_platform(int v)
+{
+	no_platform = v;
+}
+
 const char *get_sys_dev_type(enum sys_dev_type type)
 {
 	if (type >= SYS_DEV_MAX)
@@ -2699,7 +2736,7 @@ static int detail_platform_imsm(int verbose, int enumerate_only, char *controlle
 	int result=1;
 
 	if (enumerate_only) {
-		if (check_env("IMSM_NO_PLATFORM"))
+		if (check_no_platform())
 			return 0;
 		list = find_intel_devices();
 		if (!list)
@@ -4721,7 +4758,7 @@ static int find_intel_hba_capability(int fd, struct intel_super *super, char *de
 		       devname);
 		return 1;
 	}
-	if (!is_fd_valid(fd) || check_env("IMSM_NO_PLATFORM")) {
+	if (!is_fd_valid(fd) || check_no_platform()) {
 		super->orom = NULL;
 		super->hba = NULL;
 		return 0;
@@ -10696,7 +10733,7 @@ static int imsm_get_allowed_degradation(int level, int raid_disks,
  ******************************************************************************/
 int validate_container_imsm(struct mdinfo *info)
 {
-	if (check_env("IMSM_NO_PLATFORM"))
+	if (check_no_platform())
 		return 0;
 
 	struct sys_dev *idev;
diff --git a/systemd/mdmon@.service b/systemd/mdmon@.service
index cb6482d9ff29..e7ee17a8bf91 100644
--- a/systemd/mdmon@.service
+++ b/systemd/mdmon@.service
@@ -12,9 +12,6 @@ Before=initrd-switch-root.target
 Documentation=man:mdmon(8)
 
 [Service]
-# mdmon should never complain due to lack of a platform,
-# that is mdadm's job if at all.
-Environment=IMSM_NO_PLATFORM=1
 # The mdmon starting in the initramfs (with dracut at least)
 # cannot see sysfs after root is mounted, so we will have to
 # 'takeover'.  As the '--offroot --takeover' don't hurt when



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/6] mdmon: don't test both 'all' and 'container_name'.
  2023-02-27  0:13 [PATCH 0/6] Assorted patches relating to mdmon NeilBrown
  2023-02-27  0:13 ` [PATCH 2/6] Improvements for IMSM_NO_PLATFORM testing NeilBrown
@ 2023-02-27  0:13 ` NeilBrown
  2023-02-27  0:13 ` [PATCH 5/6] mdmon: Remove need for KillMode=none NeilBrown
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2023-02-27  0:13 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: linux-raid, Martin Wilck, Mariusz Tkaczyk

If 'all' is not set, then container_name must be NULL, as nothing else
can set it.  So simplify the test it ignore container_name.
This makes the purpose of the code more obvious.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 mdmon.c |   11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/mdmon.c b/mdmon.c
index f557e12c6533..6d37b17c3f53 100644
--- a/mdmon.c
+++ b/mdmon.c
@@ -358,7 +358,6 @@ int main(int argc, char *argv[])
 		}
 	}
 
-
 	if (in_initrd()) {
 		/*
 		 * set first char of argv[0] to @. This is used by
@@ -368,12 +367,10 @@ int main(int argc, char *argv[])
 		argv[0][0] = '@';
 	}
 
-	if (all == 0 && container_name == NULL) {
-		if (argv[optind]) {
-			container_name = get_md_name(argv[optind]);
-			if (!container_name)
-				return 1;
-		}
+	if (!all && argv[optind]) {
+		container_name = get_md_name(argv[optind]);
+		if (!container_name)
+			return 1;
 	}
 
 	if (container_name == NULL || argc - optind > 1)



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/6] mdmon: change systemd unit file to use --foreground
  2023-02-27  0:13 [PATCH 0/6] Assorted patches relating to mdmon NeilBrown
                   ` (3 preceding siblings ...)
  2023-02-27  0:13 ` [PATCH 6/6] mdmon improvements for switchroot NeilBrown
@ 2023-02-27  0:13 ` NeilBrown
  2023-02-27  0:13 ` [PATCH 1/6] Use existence of /etc/initrd-release to detect initrd NeilBrown
  2023-03-01  8:55 ` [PATCH 0/6] Assorted patches relating to mdmon Mariusz Tkaczyk
  6 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2023-02-27  0:13 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: linux-raid, Martin Wilck, Mariusz Tkaczyk

There is no value in mdmon forking when it is running under systemd -
systemd can still track it anyway.

So add --foreground option, and remove "Type=forking".

Signed-off-by: NeilBrown <neilb@suse.de>
---
 systemd/mdmon@.service |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/systemd/mdmon@.service b/systemd/mdmon@.service
index e7ee17a8bf91..3502fadc3fc4 100644
--- a/systemd/mdmon@.service
+++ b/systemd/mdmon@.service
@@ -17,8 +17,7 @@ Documentation=man:mdmon(8)
 # 'takeover'.  As the '--offroot --takeover' don't hurt when
 # not necessary, are are useful with root-on-md in dracut,
 # have them always present.
-ExecStart=BINDIR/mdmon --offroot --takeover %I
-Type=forking
+ExecStart=BINDIR/mdmon --foreground --offroot --takeover %I
 # Don't set the PIDFile.  It isn't necessary (systemd can work
 # it out) and systemd will remove it when transitioning from
 # initramfs to rootfs.



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 5/6] mdmon: Remove need for KillMode=none
  2023-02-27  0:13 [PATCH 0/6] Assorted patches relating to mdmon NeilBrown
  2023-02-27  0:13 ` [PATCH 2/6] Improvements for IMSM_NO_PLATFORM testing NeilBrown
  2023-02-27  0:13 ` [PATCH 3/6] mdmon: don't test both 'all' and 'container_name' NeilBrown
@ 2023-02-27  0:13 ` NeilBrown
  2023-03-06  6:45   ` Paul Menzel
  2023-02-27  0:13 ` [PATCH 6/6] mdmon improvements for switchroot NeilBrown
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 12+ messages in thread
From: NeilBrown @ 2023-02-27  0:13 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: linux-raid, Martin Wilck, Mariusz Tkaczyk

mdmon needs to keep running during the switchroot out of (at boot) and
then back into (at shutdown) the initrd.  It runs until a new mdmon
takes over.

Killmode=none is used to achieve this, with the help of --offroot which
sets argv[0][0] to '@' which systemd understands.

This is needed because mdmon is currently run in system-mdmon.slice
which conflicts with shutdown.target so without Killmode=none mdmon
would get killed early in shutdown when system.mdmon.slice is removed.

As described in systemd.service(5), this conflict which shutdown can be
resolved by explicitly requesting system.slice, which is a natural
counterpart to DefaultDependencies=no.

So add that, and also add IgnoreOnIsolate=true to avoid another possible
source of an early death.  With these we no longer need KillMode=none
which the systemd developers have marked as "deprecated".

Signed-off-by: NeilBrown <neilb@suse.de>
---
 systemd/mdmon@.service |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/systemd/mdmon@.service b/systemd/mdmon@.service
index 3502fadc3fc4..3ab908c45a3b 100644
--- a/systemd/mdmon@.service
+++ b/systemd/mdmon@.service
@@ -10,6 +10,9 @@ Description=MD Metadata Monitor on /dev/%I
 DefaultDependencies=no
 Before=initrd-switch-root.target
 Documentation=man:mdmon(8)
+# allow this to keep running after switchroot, until a new
+# instance is started
+IgnoreOnIsolate=true
 
 [Service]
 # The mdmon starting in the initramfs (with dracut at least)
@@ -22,4 +25,6 @@ ExecStart=BINDIR/mdmon --foreground --offroot --takeover %I
 # it out) and systemd will remove it when transitioning from
 # initramfs to rootfs.
 #PIDFile=/run/mdadm/%I.pid
-KillMode=none
+# The default slice is system-mdmon.slice which Conflicts
+# with shutdown, causing mdmon to exit early.  So use system.slice.
+Slice=system.slice



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 6/6] mdmon improvements for switchroot
  2023-02-27  0:13 [PATCH 0/6] Assorted patches relating to mdmon NeilBrown
                   ` (2 preceding siblings ...)
  2023-02-27  0:13 ` [PATCH 5/6] mdmon: Remove need for KillMode=none NeilBrown
@ 2023-02-27  0:13 ` NeilBrown
  2023-03-01 13:50   ` Mariusz Tkaczyk
  2023-03-06  8:31   ` Paul Menzel
  2023-02-27  0:13 ` [PATCH 4/6] mdmon: change systemd unit file to use --foreground NeilBrown
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 12+ messages in thread
From: NeilBrown @ 2023-02-27  0:13 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: linux-raid, Martin Wilck, Mariusz Tkaczyk

We need a new mdmon@mdfoo instance to run in the root filesystem after
switch root, as /sys and /dev are removed from the initrd.

systemd will not start a new unit with the same name running while the
old unit is still active, and we want the two mdmon processes to overlap
in time to avoid any risk of deadlock which a write is attempted with no
mdmon running.

So we need a different unit name in the initrd than in the root.  Apart
from the name, everything else should be the same.

This is easily achieved using a different instance name as the
mdmon@.service unit file already supports multiple instances (for
different arrays).

So start "mdmon@mdfoo.service" from root, but
"mdmon@initrd-mdfoo.service" from the initrd.  udev can tell which
circumstance is the case by looking for /etc/initrd-release.
continue_from_systemd() is enhanced so that the "initrd-" prefix can be
requested.

Teach mdmon that a container name like "initrd/foo" should be treated
just like "foo".  Note that systemd passes the instance name
"initrd-foo" as "initrd/foo".

We don't need a similar machanism at shutdown because dracut runs
"mdmon --takeover --all" when appropriate.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 Grow.c                    |    4 ++--
 mdadm.h                   |    2 +-
 mdmon.c                   |    6 +++++-
 systemd/mdmon@.service    |    2 +-
 udev-md-raid-arrays.rules |    3 ++-
 util.c                    |    7 ++++---
 6 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/Grow.c b/Grow.c
index bb5fe45c851c..06001f2d334a 100644
--- a/Grow.c
+++ b/Grow.c
@@ -3516,7 +3516,7 @@ started:
 
 	if (!forked)
 		if (continue_via_systemd(container ?: sra->sys_name,
-					 GROW_SERVICE)) {
+					 GROW_SERVICE, NULL)) {
 			free(fdlist);
 			free(offsets);
 			sysfs_free(sra);
@@ -3714,7 +3714,7 @@ int reshape_container(char *container, char *devname,
 	ping_monitor(container);
 
 	if (!forked && !freeze_reshape)
-		if (continue_via_systemd(container, GROW_SERVICE))
+		if (continue_via_systemd(container, GROW_SERVICE, NULL))
 			return 0;
 
 	switch (forked ? 0 : fork()) {
diff --git a/mdadm.h b/mdadm.h
index 0939fd3cb1ee..f208ed5ec22f 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -1605,7 +1605,7 @@ extern int same_dev(char *one, char *two);
 extern int compare_paths (char* path1,char* path2);
 extern void enable_fds(int devices);
 extern void manage_fork_fds(int close_all);
-extern int continue_via_systemd(char *devnm, char *service_name);
+extern int continue_via_systemd(char *devnm, char *service_name, char *prefix);
 
 extern void ident_init(struct mddev_ident *ident);
 
diff --git a/mdmon.c b/mdmon.c
index 6d37b17c3f53..25abdd71fb1e 100644
--- a/mdmon.c
+++ b/mdmon.c
@@ -368,7 +368,11 @@ int main(int argc, char *argv[])
 	}
 
 	if (!all && argv[optind]) {
-		container_name = get_md_name(argv[optind]);
+		static const char prefix[] = "initrd/";
+		container_name = argv[optind];
+		if (strncmp(container_name, prefix, sizeof(prefix)-1) == 0)
+			container_name += sizeof(prefix)-1;
+		container_name = get_md_name(container_name);
 		if (!container_name)
 			return 1;
 	}
diff --git a/systemd/mdmon@.service b/systemd/mdmon@.service
index 3ab908c45a3b..8d55fa63ce28 100644
--- a/systemd/mdmon@.service
+++ b/systemd/mdmon@.service
@@ -6,7 +6,7 @@
 #  (at your option) any later version.
 
 [Unit]
-Description=MD Metadata Monitor on /dev/%I
+Description=MD Metadata Monitor on %I
 DefaultDependencies=no
 Before=initrd-switch-root.target
 Documentation=man:mdmon(8)
diff --git a/udev-md-raid-arrays.rules b/udev-md-raid-arrays.rules
index 2967ace1f040..4e64b249b2db 100644
--- a/udev-md-raid-arrays.rules
+++ b/udev-md-raid-arrays.rules
@@ -38,7 +38,8 @@ ENV{MD_LEVEL}=="raid[1-9]*", ENV{SYSTEMD_WANTS}+="mdmonitor.service"
 
 # Tell systemd to run mdmon for our container, if we need it.
 ENV{MD_LEVEL}=="raid[1-9]*", ENV{MD_CONTAINER}=="?*", PROGRAM="/usr/bin/readlink $env{MD_CONTAINER}", ENV{MD_MON_THIS}="%c"
-ENV{MD_MON_THIS}=="?*", PROGRAM="/usr/bin/basename $env{MD_MON_THIS}", ENV{SYSTEMD_WANTS}+="mdmon@%c.service"
+ENV{MD_MON_THIS}=="?*", TEST=="/etc/initrd-release", PROGRAM="/usr/bin/basename $env{MD_MON_THIS}", ENV{SYSTEMD_WANTS}+="mdmon@initrd-%c.service"
+ENV{MD_MON_THIS}=="?*", TEST!="/etc/initrd-release", PROGRAM="/usr/bin/basename $env{MD_MON_THIS}", ENV{SYSTEMD_WANTS}+="mdmon@%c.service"
 ENV{RESHAPE_ACTIVE}=="yes", PROGRAM="/usr/bin/basename $env{MD_MON_THIS}", ENV{SYSTEMD_WANTS}+="mdadm-grow-continue@%c.service"
 
 LABEL="md_end"
diff --git a/util.c b/util.c
index 6b44662db7cd..1d433d1826b5 100644
--- a/util.c
+++ b/util.c
@@ -1906,6 +1906,7 @@ int start_mdmon(char *devnm)
 	int len;
 	pid_t pid;
 	int status;
+	char *prefix = in_initrd() ? "initrd-", "";
 	char pathbuf[1024];
 	char *paths[4] = {
 		pathbuf,
@@ -1916,7 +1917,7 @@ int start_mdmon(char *devnm)
 
 	if (check_env("MDADM_NO_MDMON"))
 		return 0;
-	if (continue_via_systemd(devnm, MDMON_SERVICE))
+	if (continue_via_systemd(devnm, MDMON_SERVICE, prefix))
 		return 0;
 
 	/* That failed, try running mdmon directly */
@@ -2187,7 +2188,7 @@ void manage_fork_fds(int close_all)
  *	1- if systemd service has been started
  *	0- otherwise
  */
-int continue_via_systemd(char *devnm, char *service_name)
+int continue_via_systemd(char *devnm, char *service_name, char *prefix)
 {
 	int pid, status;
 	char pathbuf[1024];
@@ -2199,7 +2200,7 @@ int continue_via_systemd(char *devnm, char *service_name)
 	case  0:
 		manage_fork_fds(1);
 		snprintf(pathbuf, sizeof(pathbuf),
-			 "%s@%s.service", service_name, devnm);
+			 "%s@%s%s.service", service_name, prefix ?: "", devnm);
 		status = execl("/usr/bin/systemctl", "systemctl", "restart",
 			       pathbuf, NULL);
 		status = execl("/bin/systemctl", "systemctl", "restart",



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/6] Assorted patches relating to mdmon
  2023-02-27  0:13 [PATCH 0/6] Assorted patches relating to mdmon NeilBrown
                   ` (5 preceding siblings ...)
  2023-02-27  0:13 ` [PATCH 1/6] Use existence of /etc/initrd-release to detect initrd NeilBrown
@ 2023-03-01  8:55 ` Mariusz Tkaczyk
  6 siblings, 0 replies; 12+ messages in thread
From: Mariusz Tkaczyk @ 2023-03-01  8:55 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: NeilBrown, linux-raid, Martin Wilck, Mariusz Tkaczyk

On Mon, 27 Feb 2023 11:13:07 +1100
NeilBrown <neilb@suse.de> wrote:

> mdmon is a root-storage daemon is the sense defined by systemd
> documentation, but it does not follow the practice that systemd
> recommends.  Specifically it is run from the root filesystem when
> possible.  The instance started in the initrd hands-over to a root-fs
> based instance, which then hands-over to an initrd-based instance
> started by dracut at shutdown.
> 
> Part of the reason that we ignore systemd advise is that mdmon needs
> access to the filesystem - specifically /dev and /sys - which is not
> available in the initrd context after switchroot.  We could possibly
> change mdmon to work in the systemd-preferred way by splitting mdmon
> into two processes instead of just having 2 threads.  The "monitor"
> process could running entirely in the initrd context, the "manager"
> process could safely run in the root-fs context, passing newly opened
> file descriptors to the monitor over a unix-domain socket.
> 
> But we aren't there yet and may never be.
> 
> For now, mdmon doesn't work correctly.  There is no mechanism to ensure
> a new instance starts after switchroot.  Until recently the initrd
> instance of the systemd mdmon unit would be stopped at switchroot time
> because udev would temporarily forget about md devices.  This would
> allow the "udevadm trigger" process to start a new instance.  udev was
> recently fixed:
> 
> Commit: 7ec624147a41 ("udevadm: cleanup-db: don't delete information for kept
> db entries")
> 
> so now the attempt to start mdmon via "udevadm trigger" does nothing as
> mdmon already has an active unit.
> 
> The net result is that mdmon continues running in the initrd mount
> namespace and so cannot access new devices.  Adding a device to a root
> md array that depends on mdmon will no longer work.
> 
> We want the initrd instance of mdmon to continue running until the
> root-fs based instance starts, and that really requires we have two
> different systemd units.  This series achieves this in the final patch by
> using a different instance name inside or initrd and outside.
> "initrd-mdfoo" and "mdfoo".
> 
> Other patches in the series are mostly clean-ups and minor improvements
> in related code.
> 
> NeilBrown
> 

Hi Jes,
The problem descried by Neil is critical for IMSM. I will test the patchset
ASAP.
Additionally, it resolves "KillMode=none" problem so we will be able to finally
drop it.

I will be back with results soon.

Thanks,
Mariusz

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/6] mdmon improvements for switchroot
  2023-02-27  0:13 ` [PATCH 6/6] mdmon improvements for switchroot NeilBrown
@ 2023-03-01 13:50   ` Mariusz Tkaczyk
  2023-03-05 22:10     ` NeilBrown
  2023-03-06  8:31   ` Paul Menzel
  1 sibling, 1 reply; 12+ messages in thread
From: Mariusz Tkaczyk @ 2023-03-01 13:50 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jes Sorensen, linux-raid, Martin Wilck, Mariusz Tkaczyk

Hi Neil,
We found typo. We fixed that to test the change.
Other comments are less important.

On Mon, 27 Feb 2023 11:13:07 +1100
NeilBrown <neilb@suse.de> wrote:

> We need a new mdmon@mdfoo instance to run in the root filesystem after
> switch root, as /sys and /dev are removed from the initrd.
> 
> systemd will not start a new unit with the same name running while the
> old unit is still active, and we want the two mdmon processes to overlap
> in time to avoid any risk of deadlock which a write is attempted with no
> mdmon running.
> 
> So we need a different unit name in the initrd than in the root.  Apart
> from the name, everything else should be the same.
> 
> This is easily achieved using a different instance name as the
> mdmon@.service unit file already supports multiple instances (for
> different arrays).
> 
> So start "mdmon@mdfoo.service" from root, but
> "mdmon@initrd-mdfoo.service" from the initrd.  udev can tell which
> circumstance is the case by looking for /etc/initrd-release.
> continue_from_systemd() is enhanced so that the "initrd-" prefix can be
> requested.
> 
> Teach mdmon that a container name like "initrd/foo" should be treated
> just like "foo".  Note that systemd passes the instance name
> "initrd-foo" as "initrd/foo".
> 
> We don't need a similar machanism at shutdown because dracut runs
> "mdmon --takeover --all" when appropriate.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>

> diff --git a/mdmon.c b/mdmon.c
> index 6d37b17c3f53..25abdd71fb1e 100644
> --- a/mdmon.c
> +++ b/mdmon.c
> @@ -368,7 +368,11 @@ int main(int argc, char *argv[])
>  	}
>  
>  	if (!all && argv[optind]) {
> -		container_name = get_md_name(argv[optind]);
> +		static const char prefix[] = "initrd/";
> +		container_name = argv[optind];
> +		if (strncmp(container_name, prefix, sizeof(prefix)-1) == 0)
> +			container_name += sizeof(prefix)-1;
> +		container_name = get_md_name(container_name);

"sizeof(prefix)-1" there should be spaces before and after operator.

You are defining similar literals in 2 places:
prefix[] = "initrd/"
*prefix = in_initrd() ? "initrd-", "";

When I see something like this, I need to ask why it is not globally defined
because in the future we would need to define it for the firth and fourth time.
I see the difference in last sign ('/' and '-'). We can omit that.
I would like propose something like:

in mdadm.h:
#DEFINE MDMON_PREFIX "initrd"

in mdmon, do not check last sign. whatever it is, we don't really care, just
skip it. All we need to know is that it not belongs to container name.
Hope it works correctly:
	if (strncmp(container_name, MDMON_PREFIX, sizeof(prefix) - 1) == 0)
		container_name += sizeof(MDMON_PREFIX);
	
And later in start_mdmon include '-' in snprintf:
		 "%s@%s%s.service", service_name, MDMON_PREFIX"-" ?: "",

I think that we don't need to pass whole char* value, we can use bool, the one
possibility is "initrd" now. If that would be changed, we can use enum and maps
interface:
https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/tree/maps.c

This is lesson learned by code study, we needed to put big effort to correct
similar case with reshapes because pointers become overkill through
years:
https://lore.kernel.org/linux-raid/20230102083524.28893-1-mateusz.kusiak@intel.com/

It my my personal view so you are free to make decision. I will accept it but
please note that mdadm is full of same literals (just find /dev/md or /dev/md/)
so that is why I'm especially sensitive in that cases.

> --git a/util.c b/util.c index 6b44662db7cd..1d433d1826b5 100644 --- a/util.c
> +++ b/util.c @@ -1906,6 +1906,7 @@ int start_mdmon(char *devnm)
>  	int len;
>  	pid_t pid;
>  	int status;
> +	char *prefix = in_initrd() ? "initrd-", "";

The most important thing:
typo, should be in_initrd() ? "initrd-": "";

>  	char pathbuf[1024];
>  	char *paths[4] = {
>  		pathbuf,
> @@ -1916,7 +1917,7 @@ int start_mdmon(char *devnm)
>  
>  	if (check_env("MDADM_NO_MDMON"))
>  		return 0;
> -	if (continue_via_systemd(devnm, MDMON_SERVICE))
> +	if (continue_via_systemd(devnm, MDMON_SERVICE, prefix))
>  		return 0;
>  
>  	/* That failed, try running mdmon directly */
> @@ -2187,7 +2188,7 @@ void manage_fork_fds(int close_all)
>   *	1- if systemd service has been started
>   *	0- otherwise
>   */
> -int continue_via_systemd(char *devnm, char *service_name)
> +int continue_via_systemd(char *devnm, char *service_name, char *prefix)
>  {
>  	int pid, status;
>  	char pathbuf[1024];
> @@ -2199,7 +2200,7 @@ int continue_via_systemd(char *devnm, char
> *service_name) case  0:
>  		manage_fork_fds(1);
>  		snprintf(pathbuf, sizeof(pathbuf),
> -			 "%s@%s.service", service_name, devnm);
> +			 "%s@%s%s.service", service_name, prefix ?: "",
> devnm); status = execl("/usr/bin/systemctl", "systemctl", "restart",
>  			       pathbuf, NULL);
>  		status = execl("/bin/systemctl", "systemctl", "restart",
> 
> 

Thanks,
Mariusz

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/6] mdmon improvements for switchroot
  2023-03-01 13:50   ` Mariusz Tkaczyk
@ 2023-03-05 22:10     ` NeilBrown
  0 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2023-03-05 22:10 UTC (permalink / raw)
  To: Mariusz Tkaczyk; +Cc: Jes Sorensen, linux-raid, Martin Wilck, Mariusz Tkaczyk

On Thu, 02 Mar 2023, Mariusz Tkaczyk wrote:
> Hi Neil,
> We found typo. We fixed that to test the change.
> Other comments are less important.
> 
> On Mon, 27 Feb 2023 11:13:07 +1100
> NeilBrown <neilb@suse.de> wrote:
> 
> > We need a new mdmon@mdfoo instance to run in the root filesystem after
> > switch root, as /sys and /dev are removed from the initrd.
> > 
> > systemd will not start a new unit with the same name running while the
> > old unit is still active, and we want the two mdmon processes to overlap
> > in time to avoid any risk of deadlock which a write is attempted with no
> > mdmon running.
> > 
> > So we need a different unit name in the initrd than in the root.  Apart
> > from the name, everything else should be the same.
> > 
> > This is easily achieved using a different instance name as the
> > mdmon@.service unit file already supports multiple instances (for
> > different arrays).
> > 
> > So start "mdmon@mdfoo.service" from root, but
> > "mdmon@initrd-mdfoo.service" from the initrd.  udev can tell which
> > circumstance is the case by looking for /etc/initrd-release.
> > continue_from_systemd() is enhanced so that the "initrd-" prefix can be
> > requested.
> > 
> > Teach mdmon that a container name like "initrd/foo" should be treated
> > just like "foo".  Note that systemd passes the instance name
> > "initrd-foo" as "initrd/foo".
> > 
> > We don't need a similar machanism at shutdown because dracut runs
> > "mdmon --takeover --all" when appropriate.
> > 
> > Signed-off-by: NeilBrown <neilb@suse.de>
> 
> > diff --git a/mdmon.c b/mdmon.c
> > index 6d37b17c3f53..25abdd71fb1e 100644
> > --- a/mdmon.c
> > +++ b/mdmon.c
> > @@ -368,7 +368,11 @@ int main(int argc, char *argv[])
> >  	}
> >  
> >  	if (!all && argv[optind]) {
> > -		container_name = get_md_name(argv[optind]);
> > +		static const char prefix[] = "initrd/";
> > +		container_name = argv[optind];
> > +		if (strncmp(container_name, prefix, sizeof(prefix)-1) == 0)
> > +			container_name += sizeof(prefix)-1;
> > +		container_name = get_md_name(container_name);
> 
> "sizeof(prefix)-1" there should be spaces before and after operator.
> 
> You are defining similar literals in 2 places:
> prefix[] = "initrd/"
> *prefix = in_initrd() ? "initrd-", "";
> 
> When I see something like this, I need to ask why it is not globally defined
> because in the future we would need to define it for the firth and fourth time.
> I see the difference in last sign ('/' and '-'). We can omit that.
> I would like propose something like:
> 
> in mdadm.h:
> #DEFINE MDMON_PREFIX "initrd"

To my mind this is "premature optimisation".  I think it makes the core
harder to read.  If it makes the code easier to maintain then it might
be worth the cost.  I don't think it does.
> 
> in mdmon, do not check last sign. whatever it is, we don't really care, just
> skip it. All we need to know is that it not belongs to container name.
> Hope it works correctly:
> 	if (strncmp(container_name, MDMON_PREFIX, sizeof(prefix) - 1) == 0)
> 		container_name += sizeof(MDMON_PREFIX);
> 	
> And later in start_mdmon include '-' in snprintf:
> 		 "%s@%s%s.service", service_name, MDMON_PREFIX"-" ?: "",
> 
> I think that we don't need to pass whole char* value, we can use bool, the one
> possibility is "initrd" now. If that would be changed, we can use enum and maps
> interface:
> https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/tree/maps.c

Passing bools makes the calling code hard to read.  "What does that
"true" or "false" mean??".  Sometimes it really is best, but I think
passing the string "initrd" makes more sense to the reader than passing "true".

Passing enums is better than simple bools - and has the benefit of
spelling out the meaning of NULL.  These still feel a bit clumsy to me
so I would only use them when there is a clear benefit.

> 
> This is lesson learned by code study, we needed to put big effort to correct
> similar case with reshapes because pointers become overkill through
> years:
> https://lore.kernel.org/linux-raid/20230102083524.28893-1-mateusz.kusiak@intel.com/
> 
> It my my personal view so you are free to make decision. I will accept it but
> please note that mdadm is full of same literals (just find /dev/md or /dev/md/)
> so that is why I'm especially sensitive in that cases.
> 
> > --git a/util.c b/util.c index 6b44662db7cd..1d433d1826b5 100644 --- a/util.c
> > +++ b/util.c @@ -1906,6 +1906,7 @@ int start_mdmon(char *devnm)
> >  	int len;
> >  	pid_t pid;
> >  	int status;
> > +	char *prefix = in_initrd() ? "initrd-", "";
> 
> The most important thing:
> typo, should be in_initrd() ? "initrd-": "";

Certainly - thanks for catching that!

Jes - should I resent the whole series, or just this patch, or will you
edit before applying?

Thanks,
NeilBrown

> 
> >  	char pathbuf[1024];
> >  	char *paths[4] = {
> >  		pathbuf,
> > @@ -1916,7 +1917,7 @@ int start_mdmon(char *devnm)
> >  
> >  	if (check_env("MDADM_NO_MDMON"))
> >  		return 0;
> > -	if (continue_via_systemd(devnm, MDMON_SERVICE))
> > +	if (continue_via_systemd(devnm, MDMON_SERVICE, prefix))
> >  		return 0;
> >  
> >  	/* That failed, try running mdmon directly */
> > @@ -2187,7 +2188,7 @@ void manage_fork_fds(int close_all)
> >   *	1- if systemd service has been started
> >   *	0- otherwise
> >   */
> > -int continue_via_systemd(char *devnm, char *service_name)
> > +int continue_via_systemd(char *devnm, char *service_name, char *prefix)
> >  {
> >  	int pid, status;
> >  	char pathbuf[1024];
> > @@ -2199,7 +2200,7 @@ int continue_via_systemd(char *devnm, char
> > *service_name) case  0:
> >  		manage_fork_fds(1);
> >  		snprintf(pathbuf, sizeof(pathbuf),
> > -			 "%s@%s.service", service_name, devnm);
> > +			 "%s@%s%s.service", service_name, prefix ?: "",
> > devnm); status = execl("/usr/bin/systemctl", "systemctl", "restart",
> >  			       pathbuf, NULL);
> >  		status = execl("/bin/systemctl", "systemctl", "restart",
> > 
> > 
> 
> Thanks,
> Mariusz
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 5/6] mdmon: Remove need for KillMode=none
  2023-02-27  0:13 ` [PATCH 5/6] mdmon: Remove need for KillMode=none NeilBrown
@ 2023-03-06  6:45   ` Paul Menzel
  0 siblings, 0 replies; 12+ messages in thread
From: Paul Menzel @ 2023-03-06  6:45 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid, Jes Sorensen, Martin Wilck, Mariusz Tkaczyk

Dear Neil,


Thank you for your patch. Two minor comments below.

Am 27.02.23 um 01:13 schrieb NeilBrown:
> mdmon needs to keep running during the switchroot out of (at boot) and
> then back into (at shutdown) the initrd.  It runs until a new mdmon
> takes over.
> 
> Killmode=none is used to achieve this, with the help of --offroot which
> sets argv[0][0] to '@' which systemd understands.
> 
> This is needed because mdmon is currently run in system-mdmon.slice
> which conflicts with shutdown.target so without Killmode=none mdmon
> would get killed early in shutdown when system.mdmon.slice is removed.
> 
> As described in systemd.service(5), this conflict which shutdown can be

Small typo: … conflict with shutdown …?

> resolved by explicitly requesting system.slice, which is a natural
> counterpart to DefaultDependencies=no.
> 
> So add that, and also add IgnoreOnIsolate=true to avoid another possible
> source of an early death.  With these we no longer need KillMode=none
> which the systemd developers have marked as "deprecated".
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>   systemd/mdmon@.service |    7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/systemd/mdmon@.service b/systemd/mdmon@.service
> index 3502fadc3fc4..3ab908c45a3b 100644
> --- a/systemd/mdmon@.service
> +++ b/systemd/mdmon@.service
> @@ -10,6 +10,9 @@ Description=MD Metadata Monitor on /dev/%I
>   DefaultDependencies=no
>   Before=initrd-switch-root.target
>   Documentation=man:mdmon(8)
> +# allow this to keep running after switchroot, until a new
> +# instance is started

Below you “format” the comment as a sentence – starting with capital 
letter and using a dot/period at the end.

> +IgnoreOnIsolate=true
>   
>   [Service]
>   # The mdmon starting in the initramfs (with dracut at least)
> @@ -22,4 +25,6 @@ ExecStart=BINDIR/mdmon --foreground --offroot --takeover %I
>   # it out) and systemd will remove it when transitioning from
>   # initramfs to rootfs.
>   #PIDFile=/run/mdadm/%I.pid
> -KillMode=none
> +# The default slice is system-mdmon.slice which Conflicts
> +# with shutdown, causing mdmon to exit early.  So use system.slice.
> +Slice=system.slice


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 6/6] mdmon improvements for switchroot
  2023-02-27  0:13 ` [PATCH 6/6] mdmon improvements for switchroot NeilBrown
  2023-03-01 13:50   ` Mariusz Tkaczyk
@ 2023-03-06  8:31   ` Paul Menzel
  1 sibling, 0 replies; 12+ messages in thread
From: Paul Menzel @ 2023-03-06  8:31 UTC (permalink / raw)
  To: NeilBrown; +Cc: Jes Sorensen, linux-raid, Martin Wilck, Mariusz Tkaczyk

Dear Neil,


Thank you for your patch.

Am 27.02.23 um 01:13 schrieb NeilBrown:

For commit message summary, I’d use a statement – maybe:

mdmon: Improve switchroot

> We need a new mdmon@mdfoo instance to run in the root filesystem after
> switch root, as /sys and /dev are removed from the initrd.
> 
> systemd will not start a new unit with the same name running while the
> old unit is still active, and we want the two mdmon processes to overlap
> in time to avoid any risk of deadlock which a write is attempted with no
> mdmon running.

I do not understand the part after *deadlock*.

> So we need a different unit name in the initrd than in the root.  Apart
> from the name, everything else should be the same.
> 
> This is easily achieved using a different instance name as the
> mdmon@.service unit file already supports multiple instances (for
> different arrays).
> 
> So start "mdmon@mdfoo.service" from root, but
> "mdmon@initrd-mdfoo.service" from the initrd.  udev can tell which
> circumstance is the case by looking for /etc/initrd-release.
> continue_from_systemd() is enhanced so that the "initrd-" prefix can be
> requested.
> 
> Teach mdmon that a container name like "initrd/foo" should be treated
> just like "foo".  Note that systemd passes the instance name
> "initrd-foo" as "initrd/foo".
> 
> We don't need a similar machanism at shutdown because dracut runs

mechanism

> "mdmon --takeover --all" when appropriate.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>   Grow.c                    |    4 ++--
>   mdadm.h                   |    2 +-
>   mdmon.c                   |    6 +++++-
>   systemd/mdmon@.service    |    2 +-
>   udev-md-raid-arrays.rules |    3 ++-
>   util.c                    |    7 ++++---
>   6 files changed, 15 insertions(+), 9 deletions(-)

[…]


Kind regards,

Paul

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-03-06  8:31 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-02-27  0:13 [PATCH 0/6] Assorted patches relating to mdmon NeilBrown
2023-02-27  0:13 ` [PATCH 2/6] Improvements for IMSM_NO_PLATFORM testing NeilBrown
2023-02-27  0:13 ` [PATCH 3/6] mdmon: don't test both 'all' and 'container_name' NeilBrown
2023-02-27  0:13 ` [PATCH 5/6] mdmon: Remove need for KillMode=none NeilBrown
2023-03-06  6:45   ` Paul Menzel
2023-02-27  0:13 ` [PATCH 6/6] mdmon improvements for switchroot NeilBrown
2023-03-01 13:50   ` Mariusz Tkaczyk
2023-03-05 22:10     ` NeilBrown
2023-03-06  8:31   ` Paul Menzel
2023-02-27  0:13 ` [PATCH 4/6] mdmon: change systemd unit file to use --foreground NeilBrown
2023-02-27  0:13 ` [PATCH 1/6] Use existence of /etc/initrd-release to detect initrd NeilBrown
2023-03-01  8:55 ` [PATCH 0/6] Assorted patches relating to mdmon Mariusz Tkaczyk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).