util-linux.vger.kernel.org archive mirror
* [PATCH v2 0/2] unshare: manage binfmt_misc mounts
@ 2024-06-11  8:43 Laurent Vivier
  2024-06-11  8:43 ` [PATCH v2 1/2] unshare: mount binfmt_misc Laurent Vivier
  2024-06-11  8:43 ` [PATCH v2 2/2] unshare: load binfmt_misc interpreter Laurent Vivier
  0 siblings, 2 replies; 6+ messages in thread
From: Laurent Vivier @ 2024-06-11  8:43 UTC (permalink / raw)
  To: util-linux; +Cc: Laurent Vivier

Since Linux v6.7 and
commit 21ca59b365c0 ("binfmt_misc: enable sandboxed mounts"),
binfmt_misc can be mounted in a non-initial user namespace by
an unprivileged user.

Extend unshare to manage it:

- add --mount-binfmt[=<dir>] to mount the binfmt_misc filesystem; this
  clears the interpreters inherited from the previous namespace

- add -l, --load-interp <file> to load a binfmt_misc interpreter at startup.

  The interpreter is loaded from the initial filesystem if the 'F' flag is
  provided, otherwise from inside the new namespace.
  This makes it possible to start a chroot of another architecture without
  being root.

For instance:

  With 'F' flag, load the interpreter from the initial namespace:

    $ /bin/qemu-m68k-static --version
    qemu-m68k version 8.2.2 (qemu-8.2.2-1.fc40)
    Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers
    $ unshare --map-root-user --fork --pid --load-interp=":qemu-m68k:M::\\x7fELF\\x01\\x02\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x04:\\xff\\xff\\xff\\xff\\xff\\xff\\xfe\\x00\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xfe\\xff\\xff:/bin/qemu-m68k-static:OCF" --root=chroot/m68k/sid
    # QEMU_VERSION= ls
    qemu-m68k version 8.2.2 (qemu-8.2.2-1.fc40)
    Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers
    # /qemu-m68k  --version
    qemu-m68k version 8.0.50 (v8.0.0-340-gb1cff5e2da95)
    Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers

  Without 'F' flag, from inside the namespace:

    $ unshare --map-root-user --fork --pid --load-interp=":qemu-m68k:M::\\x7fELF\\x01\\x02\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x04:\\xff\\xff\\xff\\xff\\xff\\xff\\xfe\\x00\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xfe\\xff\\xff:/qemu-m68k:OC" --root=chroot/m68k/sid
    # QEMU_VERSION= ls
    qemu-m68k version 8.0.50 (v8.0.0-340-gb1cff5e2da95)
    Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
    # /qemu-m68k  --version
    qemu-m68k version 8.0.50 (v8.0.0-340-gb1cff5e2da95)
    Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
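
The choice between the two cases above depends only on whether the flags
field of the definition contains 'F'. A minimal sketch of that check,
assuming ':' is the delimiter as in the examples. The name
has_fix_binary_flag() is hypothetical and is not the helper used in the
patch; unlike a bare strrchr()/strchr() pair, it returns 0 for a string
without any delimiter instead of dereferencing a null pointer:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 when the flags field (the text after the
 * last ':') of a binfmt_misc definition contains 'F', 0 otherwise.
 * A NULL string or a string without ':' is treated as "no F flag".
 */
int has_fix_binary_flag(const char *def)
{
	const char *flags;

	if (def == NULL)
		return 0;

	flags = strrchr(def, ':');
	if (flags == NULL)
		return 0;

	/* Skip the delimiter itself and scan only the flags field. */
	return strchr(flags + 1, 'F') != NULL;
}
```

The sketch assumes the full form with all seven delimiters. If the flags
field were dropped together with its delimiter, an 'F' in the interpreter
path would be misread as a flag.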

v2:
  - use <binfmt_mnt>/register rather than _PATH_PROC_BINFMT_MISC_REGISTER to load the interpreter

Laurent Vivier (2):
  unshare: mount binfmt_misc
  unshare: load binfmt_misc interpreter

 include/pathnames.h      |  1 +
 sys-utils/unshare.1.adoc | 13 ++++++++
 sys-utils/unshare.c      | 71 +++++++++++++++++++++++++++++++++++++++-
 3 files changed, 84 insertions(+), 1 deletion(-)

-- 
2.45.2



* [PATCH v2 1/2] unshare: mount binfmt_misc
  2024-06-11  8:43 [PATCH v2 0/2] unshare: manage binfmt_misc mounts Laurent Vivier
@ 2024-06-11  8:43 ` Laurent Vivier
  2024-06-11  8:43 ` [PATCH v2 2/2] unshare: load binfmt_misc interpreter Laurent Vivier
  1 sibling, 0 replies; 6+ messages in thread
From: Laurent Vivier @ 2024-06-11  8:43 UTC (permalink / raw)
  To: util-linux; +Cc: Laurent Vivier

Add --mount-binfmt[=<dir>] to mount the binfmt_misc filesystem;
this clears the interpreters inherited from the previous namespace.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
---

Notes:
    v2:
      - remove definition of _PATH_PROC_BINFMT_MISC_REGISTER

 include/pathnames.h      |  1 +
 sys-utils/unshare.1.adoc |  3 +++
 sys-utils/unshare.c      | 19 +++++++++++++++++++
 3 files changed, 23 insertions(+)

diff --git a/include/pathnames.h b/include/pathnames.h
index 81fa405f63c7..569bef17f982 100644
--- a/include/pathnames.h
+++ b/include/pathnames.h
@@ -204,6 +204,7 @@
 /* sysctl fs paths */
 #define _PATH_PROC_SYS_FS	"/proc/sys/fs"
 #define _PATH_PROC_PIPE_MAX_SIZE	_PATH_PROC_SYS_FS "/pipe-max-size"
+#define _PATH_PROC_BINFMT_MISC	_PATH_PROC_SYS_FS "/binfmt_misc"
 
 /* irqtop paths */
 #define _PATH_PROC_INTERRUPTS	"/proc/interrupts"
diff --git a/sys-utils/unshare.1.adoc b/sys-utils/unshare.1.adoc
index e6201e28fffd..48d1a5579282 100644
--- a/sys-utils/unshare.1.adoc
+++ b/sys-utils/unshare.1.adoc
@@ -90,6 +90,9 @@ When *unshare* terminates, have _signame_ be sent to the forked child process. C
 *--mount-proc*[**=**__mountpoint__]::
 Just before running the program, mount the proc filesystem at _mountpoint_ (default is _/proc_). This is useful when creating a new PID namespace. It also implies creating a new mount namespace since the _/proc_ mount would otherwise mess up existing programs on the system. The new proc filesystem is explicitly mounted as private (with *MS_PRIVATE*|*MS_REC*).
 
+*--mount-binfmt*[**=**__mountpoint__]::
+Just before running the program, mount the binfmt_misc filesystem at _mountpoint_ (default is _/proc/sys/fs/binfmt_misc_). It also implies creating a new user namespace and a new mount namespace, since the binfmt_misc mount would otherwise mess up existing programs on the system. The new binfmt_misc filesystem is explicitly mounted as private (with *MS_PRIVATE*|*MS_REC*).
+
 **--map-user=**__uid|name__::
 Run the program only after the current effective user ID has been mapped to _uid_. If this option is specified multiple times, the last occurrence takes precedence. This option implies *--user*.
 
diff --git a/sys-utils/unshare.c b/sys-utils/unshare.c
index 57f3b8744fb5..d79aa1125955 100644
--- a/sys-utils/unshare.c
+++ b/sys-utils/unshare.c
@@ -760,6 +760,7 @@ static void __attribute__((__noreturn__)) usage(void)
 	fputs(_(" --kill-child[=<signame>]  when dying, kill the forked child (implies --fork)\n"
 		"                             defaults to SIGKILL\n"), out);
 	fputs(_(" --mount-proc[=<dir>]      mount proc filesystem first (implies --mount)\n"), out);
+	fputs(_(" --mount-binfmt[=<dir>]    mount binfmt filesystem first (implies --user and --mount)\n"), out);
 	fputs(_(" --propagation slave|shared|private|unchanged\n"
 	        "                           modify mount propagation in mount namespace\n"), out);
 	fputs(_(" --setgroups allow|deny    control the setgroups syscall in user namespaces\n"), out);
@@ -783,6 +784,7 @@ int main(int argc, char *argv[])
 {
 	enum {
 		OPT_MOUNTPROC = CHAR_MAX + 1,
+		OPT_MOUNTBINFMT,
 		OPT_PROPAGATION,
 		OPT_SETGROUPS,
 		OPT_KILLCHILD,
@@ -811,6 +813,7 @@ int main(int argc, char *argv[])
 		{ "fork",          no_argument,       NULL, 'f'             },
 		{ "kill-child",    optional_argument, NULL, OPT_KILLCHILD   },
 		{ "mount-proc",    optional_argument, NULL, OPT_MOUNTPROC   },
+		{ "mount-binfmt",  optional_argument, NULL, OPT_MOUNTBINFMT },
 		{ "map-user",      required_argument, NULL, OPT_MAPUSER     },
 		{ "map-users",     required_argument, NULL, OPT_MAPUSERS    },
 		{ "map-group",     required_argument, NULL, OPT_MAPGROUP    },
@@ -839,6 +842,7 @@ int main(int argc, char *argv[])
 	struct map_range *groupmap = NULL;
 	int kill_child_signo = 0; /* 0 means --kill-child was not used */
 	const char *procmnt = NULL;
+	const char *binfmt_mnt = NULL;
 	const char *newroot = NULL;
 	const char *newdir = NULL;
 	pid_t pid_bind = 0, pid_idmap = 0;
@@ -913,6 +917,15 @@ int main(int argc, char *argv[])
 			unshare_flags |= CLONE_NEWNS;
 			procmnt = optarg ? optarg : "/proc";
 			break;
+		case OPT_MOUNTBINFMT:
+			unshare_flags |= CLONE_NEWNS | CLONE_NEWUSER;
+			binfmt_mnt = optarg;
+			if (!binfmt_mnt) {
+				if (!procmnt)
+					procmnt = "/proc";
+				binfmt_mnt = _PATH_PROC_BINFMT_MISC;
+			}
+			break;
 		case OPT_MAPUSER:
 			unshare_flags |= CLONE_NEWUSER;
 			mapuser = get_user(optarg, _("failed to parse uid"));
@@ -1178,6 +1191,12 @@ int main(int argc, char *argv[])
 			err(EXIT_FAILURE, _("mount %s failed"), procmnt);
 	}
 
+	if (binfmt_mnt) {
+		if (mount("binfmt_misc", binfmt_mnt, "binfmt_misc",
+			  MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL) != 0)
+			err(EXIT_FAILURE, _("mount %s failed"), binfmt_mnt);
+	}
+
 	if (force_gid) {
 		if (setgroups(0, NULL) != 0)	/* drop supplementary groups */
 			err(EXIT_FAILURE, _("setgroups failed"));
-- 
2.45.2



* [PATCH v2 2/2] unshare: load binfmt_misc interpreter
  2024-06-11  8:43 [PATCH v2 0/2] unshare: manage binfmt_misc mounts Laurent Vivier
  2024-06-11  8:43 ` [PATCH v2 1/2] unshare: mount binfmt_misc Laurent Vivier
@ 2024-06-11  8:43 ` Laurent Vivier
  2024-06-18  9:51   ` Karel Zak
  1 sibling, 1 reply; 6+ messages in thread
From: Laurent Vivier @ 2024-06-11  8:43 UTC (permalink / raw)
  To: util-linux; +Cc: Laurent Vivier

Add -l, --load-interp <file> to load a binfmt_misc interpreter at startup.

The interpreter is loaded from the initial filesystem if the 'F' flag is
provided, otherwise from inside the new namespace.
This makes it possible to start a chroot of another architecture without
being root.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
---

Notes:
    v2:
      - use <binfmt_mnt>/register rather than _PATH_PROC_BINFMT_MISC_REGISTER
        to load the interpreter

 sys-utils/unshare.1.adoc | 10 ++++++++
 sys-utils/unshare.c      | 52 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/sys-utils/unshare.1.adoc b/sys-utils/unshare.1.adoc
index 48d1a5579282..24ac6fb01867 100644
--- a/sys-utils/unshare.1.adoc
+++ b/sys-utils/unshare.1.adoc
@@ -138,6 +138,9 @@ Set the user ID which will be used in the entered namespace.
 *-G*, *--setgid* _gid_::
 Set the group ID which will be used in the entered namespace and drop supplementary groups.
 
+*-l*, **--load-interp=**__file__::
+Load binfmt_misc definition in the namespace (implies *--mount-binfmt*).
+
 *--monotonic* _offset_::
 Set the offset of *CLOCK_MONOTONIC* which will be used in the entered time namespace. This option requires unsharing a time namespace with *--time*.
 
@@ -256,6 +259,13 @@ up 21 hours, 30 minutes
 up 9 years, 28 weeks, 1 day, 2 hours, 50 minutes
 ....
 
+The following example executes a chroot into the directory /chroot/powerpc/jessie and installs the interpreter /bin/qemu-ppc-static to execute the powerpc binaries.
+If the interpreter is defined with the flag F, the interpreter is loaded before the chroot; otherwise, it is loaded from inside the chroot.
+
+....
+$  unshare --map-root-user --fork --pid --load-interp=":qemu-ppc:M::\\x7fELF\x01\\x02\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x14:\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\x00\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xfe\\xff\\xff:/bin/qemu-ppc-static:OCF" --root=/chroot/powerpc/jessie /bin/bash -l
+....
+
 == AUTHORS
 
 mailto:dottedmag@dottedmag.net[Mikhail Gusarov],
diff --git a/sys-utils/unshare.c b/sys-utils/unshare.c
index d79aa1125955..f8e1141840ca 100644
--- a/sys-utils/unshare.c
+++ b/sys-utils/unshare.c
@@ -725,6 +725,35 @@ static pid_t map_ids_from_child(int *fd, uid_t mapuser,
 	exit(EXIT_SUCCESS);
 }
 
+static int is_fixed(const char *interp)
+{
+	const char *flags;
+
+	flags = strrchr(interp, ':');
+
+	return strchr(flags, 'F') != NULL;
+}
+
+static void load_interp(const char *binfmt_mnt, const char *interp)
+{
+	int dirfd, fd;
+
+	dirfd = open(binfmt_mnt, O_PATH | O_DIRECTORY);
+	if (dirfd < 0)
+		err(EXIT_FAILURE, _("cannot open %s"), binfmt_mnt);
+
+	fd = openat(dirfd, "register", O_WRONLY);
+	if (fd < 0)
+		err(EXIT_FAILURE, _("cannot open %s/register"), binfmt_mnt);
+
+	if (write_all(fd, interp, strlen(interp)))
+		err(EXIT_FAILURE, _("write failed %s/register"), binfmt_mnt);
+
+	close(fd);
+
+	close(dirfd);
+}
+
 static void __attribute__((__noreturn__)) usage(void)
 {
 	FILE *out = stdout;
@@ -772,6 +801,7 @@ static void __attribute__((__noreturn__)) usage(void)
 	fputs(_(" -G, --setgid <gid>        set gid in entered namespace\n"), out);
 	fputs(_(" --monotonic <offset>      set clock monotonic offset (seconds) in time namespaces\n"), out);
 	fputs(_(" --boottime <offset>       set clock boottime offset (seconds) in time namespaces\n"), out);
+	fputs(_(" -l, --load-interp <file>  load binfmt definition in the namespace (implies --mount-binfmt)\n"), out);
 
 	fputs(USAGE_SEPARATOR, out);
 	fprintf(out, USAGE_HELP_OPTIONS(27));
@@ -830,6 +860,7 @@ int main(int argc, char *argv[])
 		{ "wd",		   required_argument, NULL, 'w'		    },
 		{ "monotonic",     required_argument, NULL, OPT_MONOTONIC   },
 		{ "boottime",      required_argument, NULL, OPT_BOOTTIME    },
+		{ "load-interp",   required_argument, NULL, 'l'		    },
 		{ NULL, 0, NULL, 0 }
 	};
 
@@ -846,6 +877,7 @@ int main(int argc, char *argv[])
 	const char *newroot = NULL;
 	const char *newdir = NULL;
 	pid_t pid_bind = 0, pid_idmap = 0;
+	const char *newinterp = NULL;
 	pid_t pid = 0;
 #ifdef UL_HAVE_PIDFD
 	int fd_parent_pid = -1;
@@ -868,7 +900,7 @@ int main(int argc, char *argv[])
 	textdomain(PACKAGE);
 	close_stdout_atexit();
 
-	while ((c = getopt_long(argc, argv, "+fhVmuinpCTUrR:w:S:G:c", longopts, NULL)) != -1) {
+	while ((c = getopt_long(argc, argv, "+fhVmuinpCTUrR:w:S:G:cl:", longopts, NULL)) != -1) {
 		switch (c) {
 		case 'f':
 			forkit = 1;
@@ -1011,6 +1043,15 @@ int main(int argc, char *argv[])
 			boottime = strtos64_or_err(optarg, _("failed to parse boottime offset"));
 			force_boottime = 1;
 			break;
+		case 'l':
+			unshare_flags |= CLONE_NEWNS | CLONE_NEWUSER;
+			if (!binfmt_mnt) {
+				if (!procmnt)
+					procmnt = "/proc";
+				binfmt_mnt = _PATH_PROC_BINFMT_MISC;
+			}
+			newinterp = optarg;
+			break;
 
 		case 'h':
 			usage();
@@ -1165,6 +1206,13 @@ int main(int argc, char *argv[])
 	if ((unshare_flags & CLONE_NEWNS) && propagation)
 		set_propagation(propagation);
 
+	if (newinterp && is_fixed(newinterp)) {
+		if (mount("binfmt_misc", _PATH_PROC_BINFMT_MISC, "binfmt_misc",
+			  MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL) != 0)
+			err(EXIT_FAILURE, _("mount %s failed"), _PATH_PROC_BINFMT_MISC);
+		load_interp(_PATH_PROC_BINFMT_MISC, newinterp);
+	}
+
 	if (newroot) {
 		if (chroot(newroot) != 0)
 			err(EXIT_FAILURE,
@@ -1196,6 +1244,8 @@ int main(int argc, char *argv[])
 			  MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL) != 0)
 			err(EXIT_FAILURE, _("mount %s failed"), binfmt_mnt);
 	}
+	if (newinterp && !is_fixed(newinterp))
+		load_interp(binfmt_mnt, newinterp);
 
 	if (force_gid) {
 		if (setgroups(0, NULL) != 0)	/* drop supplementary groups */
-- 
2.45.2



* Re: [PATCH v2 2/2] unshare: load binfmt_misc interpreter
  2024-06-11  8:43 ` [PATCH v2 2/2] unshare: load binfmt_misc interpreter Laurent Vivier
@ 2024-06-18  9:51   ` Karel Zak
  2024-06-18 10:13     ` Laurent Vivier
  0 siblings, 1 reply; 6+ messages in thread
From: Karel Zak @ 2024-06-18  9:51 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: util-linux


 Hi Laurent,

On Tue, Jun 11, 2024 at 10:43:14AM +0200, Laurent Vivier wrote:
> +*-l*, **--load-interp=**__file__::
> +Load binfmt_misc definition in the namespace (implies *--mount-binfmt*).

Is it actually a file, or does the argument have a more complex
format? If there is something more, then it should be described here.
It is fine to describe more about the interpreters in the man page.

> +
>  *--monotonic* _offset_::
>  Set the offset of *CLOCK_MONOTONIC* which will be used in the entered time namespace. This option requires unsharing a time namespace with *--time*.
>  
> @@ -256,6 +259,13 @@ up 21 hours, 30 minutes
>  up 9 years, 28 weeks, 1 day, 2 hours, 50 minutes
>  ....
>  
> +The following example execute a chroot into the directory /chroot/powerpc/jessie and install the interpreter /bin/qemu-ppc-static to execute the powerpc binaries.
> +If the interpreter is defined with the flag F, the interpreter is loaded before the chroot otherwise the interpreter is loaded from inside the chroot.
> +
> +....
> +$  unshare --map-root-user --fork --pid --load-interp=":qemu-ppc:M::\\x7fELF\x01\\x02\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x14:\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\x00\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xfe\\xff\\xff:/bin/qemu-ppc-static:OCF" --root=/chroot/powerpc/jessie /bin/bash -l
> +....

As an uneducated reader, I am confused by the flags. Where is the 'F'
flag? Perhaps you could provide more explanation to make it easier for
readers to understand.

>  == AUTHORS
>  
>  mailto:dottedmag@dottedmag.net[Mikhail Gusarov],
> diff --git a/sys-utils/unshare.c b/sys-utils/unshare.c
> index d79aa1125955..f8e1141840ca 100644
> --- a/sys-utils/unshare.c
> +++ b/sys-utils/unshare.c
> @@ -725,6 +725,35 @@ static pid_t map_ids_from_child(int *fd, uid_t mapuser,
>  	exit(EXIT_SUCCESS);
>  }
>  
> +static int is_fixed(const char *interp)
> +{
> +	const char *flags;
> +
> +	flags = strrchr(interp, ':');
> +
> +	return strchr(flags, 'F') != NULL;
> +}
> +
> +static void load_interp(const char *binfmt_mnt, const char *interp)
> +{
> +	int dirfd, fd;
> +
> +	dirfd = open(binfmt_mnt, O_PATH | O_DIRECTORY);
> +	if (dirfd < 0)
> +		err(EXIT_FAILURE, _("cannot open %s"), binfmt_mnt);
> +
> +	fd = openat(dirfd, "register", O_WRONLY);
> +	if (fd < 0)
> +		err(EXIT_FAILURE, _("cannot open %s/register"), binfmt_mnt);
> +
> +	if (write_all(fd, interp, strlen(interp)))
> +		err(EXIT_FAILURE, _("write failed %s/register"), binfmt_mnt);
> +
> +	close(fd);
> +
> +	close(dirfd);
> +}
> +
>  static void __attribute__((__noreturn__)) usage(void)
>  {
>  	FILE *out = stdout;
> @@ -772,6 +801,7 @@ static void __attribute__((__noreturn__)) usage(void)
>  	fputs(_(" -G, --setgid <gid>        set gid in entered namespace\n"), out);
>  	fputs(_(" --monotonic <offset>      set clock monotonic offset (seconds) in time namespaces\n"), out);
>  	fputs(_(" --boottime <offset>       set clock boottime offset (seconds) in time namespaces\n"), out);
> +	fputs(_(" -l, --load-interp <file>  load binfmt definition in the namespace (implies --mount-binfmt)\n"), out);
>  
>  	fputs(USAGE_SEPARATOR, out);
>  	fprintf(out, USAGE_HELP_OPTIONS(27));
> @@ -830,6 +860,7 @@ int main(int argc, char *argv[])
>  		{ "wd",		   required_argument, NULL, 'w'		    },
>  		{ "monotonic",     required_argument, NULL, OPT_MONOTONIC   },
>  		{ "boottime",      required_argument, NULL, OPT_BOOTTIME    },
> +		{ "load-interp",   required_argument, NULL, 'l'		    },
>  		{ NULL, 0, NULL, 0 }
>  	};
>  
> @@ -846,6 +877,7 @@ int main(int argc, char *argv[])
>  	const char *newroot = NULL;
>  	const char *newdir = NULL;
>  	pid_t pid_bind = 0, pid_idmap = 0;
> +	const char *newinterp = NULL;
>  	pid_t pid = 0;
>  #ifdef UL_HAVE_PIDFD
>  	int fd_parent_pid = -1;
> @@ -868,7 +900,7 @@ int main(int argc, char *argv[])
>  	textdomain(PACKAGE);
>  	close_stdout_atexit();
>  
> -	while ((c = getopt_long(argc, argv, "+fhVmuinpCTUrR:w:S:G:c", longopts, NULL)) != -1) {
> +	while ((c = getopt_long(argc, argv, "+fhVmuinpCTUrR:w:S:G:cl:", longopts, NULL)) != -1) {
>  		switch (c) {
>  		case 'f':
>  			forkit = 1;
> @@ -1011,6 +1043,15 @@ int main(int argc, char *argv[])
>  			boottime = strtos64_or_err(optarg, _("failed to parse boottime offset"));
>  			force_boottime = 1;
>  			break;
> +		case 'l':
> +			unshare_flags |= CLONE_NEWNS | CLONE_NEWUSER;
> +			if (!binfmt_mnt) {
> +				if (!procmnt)
> +					procmnt = "/proc";
> +				binfmt_mnt = _PATH_PROC_BINFMT_MISC;
> +			}
> +			newinterp = optarg;
> +			break;
>  
>  		case 'h':
>  			usage();
> @@ -1165,6 +1206,13 @@ int main(int argc, char *argv[])
>  	if ((unshare_flags & CLONE_NEWNS) && propagation)
>  		set_propagation(propagation);
>  
> +	if (newinterp && is_fixed(newinterp)) {
> +		if (mount("binfmt_misc", _PATH_PROC_BINFMT_MISC, "binfmt_misc",
> +			  MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL) != 0)
> +			err(EXIT_FAILURE, _("mount %s failed"), _PATH_PROC_BINFMT_MISC);
> +		load_interp(_PATH_PROC_BINFMT_MISC, newinterp);
> +	}

If I understand correctly, using --load-interp with 'F' calls
mount(binfmt_misc) twice:

1) before chroot
2) after chroot() and after mount(/proc) (implies --mount-binfmt and
   --mount-proc too)

I believe it would be helpful to include this information in the man
page.
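
The ordering described above can be sketched as a small decision table.
This is only a model of the v2 patch as read from the diff, not code taken
from it: struct binfmt_plan and plan_binfmt() are hypothetical names, and
the three inputs stand for "--mount-binfmt was given", "--load-interp was
given" and "the definition carries the F flag".

```c
#include <stdbool.h>

/* Hypothetical summary of which binfmt_misc steps run, and when. */
struct binfmt_plan {
	bool mount_before_chroot;	/* mount on the caller's /proc/sys/fs/binfmt_misc */
	bool load_before_chroot;	/* write the definition before chroot() */
	bool mount_after_chroot;	/* mount at binfmt_mnt inside the new root */
	bool load_after_chroot;		/* write the definition after chroot() */
};

struct binfmt_plan plan_binfmt(bool mount_requested, bool have_interp, bool fixed)
{
	struct binfmt_plan p = { false, false, false, false };

	/* With 'F', the interpreter must be opened from the caller's filesystem. */
	if (have_interp && fixed) {
		p.mount_before_chroot = true;
		p.load_before_chroot = true;
	}

	/* --load-interp implies --mount-binfmt, so the second mount always happens. */
	if (mount_requested || have_interp)
		p.mount_after_chroot = true;

	/* Without 'F', the interpreter is resolved inside the new root. */
	if (have_interp && !fixed)
		p.load_after_chroot = true;

	return p;
}
```

For a definition with 'F', both mount fields come out true, which is the
double mount noted above.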

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com



* Re: [PATCH v2 2/2] unshare: load binfmt_misc interpreter
  2024-06-18  9:51   ` Karel Zak
@ 2024-06-18 10:13     ` Laurent Vivier
  2024-06-18 11:58       ` Karel Zak
  0 siblings, 1 reply; 6+ messages in thread
From: Laurent Vivier @ 2024-06-18 10:13 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux

On 18/06/2024 at 11:51, Karel Zak wrote:
> 
>   Hi Laurent,
> 

Hi Karel,

> On Tue, Jun 11, 2024 at 10:43:14AM +0200, Laurent Vivier wrote:
>> +*-l*, **--load-interp=**__file__::
>> +Load binfmt_misc definition in the namespace (implies *--mount-binfmt*).
> 
> Is it actually a file, or does the argument have a more complex
> format? If there is something more, then it should be described here.
> It is fine to describe more about the interpreters in the man page.

You're right, the argument here is not actually a file; it defines how to use the file provided in
the parameter as an interpreter.

What we provide here is the string that will be written to /proc/sys/fs/binfmt_misc/register; the format is described
in https://www.kernel.org/doc/Documentation/admin-guide/binfmt-misc.rst:

"To actually register a new binary type, you have to set up a string looking like 
``:name:type:offset:magic:mask:interpreter:flags``
[...]
- ``name``
    is an identifier string. A new /proc file will be created with this
    name below ``/proc/sys/fs/binfmt_misc``
- ``type``
    is the type of recognition. Give ``M`` for magic and ``E`` for extension.
- ``offset``
    is the offset of the magic/mask in the file
- ``magic``
    is the byte sequence binfmt_misc is matching for.
- ``mask``
    is an (optional, defaults to all 0xff) mask.
- ``interpreter``
    is the program that should be invoked with the binary as first
    argument
- ``flags``
    is an optional field that controls several aspects of the invocation
    of the interpreter.
    ``P`` - preserve-argv[0]
            Legacy behavior of binfmt_misc is to overwrite
            the original argv[0] with the full path to the binary. When this
            flag is included, binfmt_misc will add an argument to the argument
            vector for this purpose, thus preserving the original ``argv[0]``.
    ``O`` - open-binary
            Legacy behavior of binfmt_misc is to pass the full path
            of the binary to the interpreter as an argument. When this flag is
            included, binfmt_misc will open the file for reading and pass its
            descriptor as an argument
    ``C`` - credentials
            Currently, the behavior of binfmt_misc is to calculate
            the credentials and security token of the new process according to
            the interpreter. When this flag is included, these attributes are
            calculated according to the binary
    ``F`` - fix binary
            The usual behaviour of binfmt_misc is to spawn the
            binary lazily when the misc format file is invoked.  However,
            this doesn't work very well in the face of mount namespaces and
            changeroots, so the ``F`` mode opens the binary as soon as the
            emulation is installed and uses the opened image to spawn the
            emulator"
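
To make the field layout quoted above concrete, here is a minimal sketch
that assembles a registration string from its seven fields. The function
name format_binfmt_entry() and its parameters are hypothetical and chosen
for illustration only. The sketch assumes ':' as the delimiter and passes
the fields through unmodified, so any escaping of the magic and mask bytes
is left to the caller.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build ":name:type:offset:magic:mask:interpreter:flags"
 * into buf. Optional fields (offset, mask, flags) are passed as "".
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
int format_binfmt_entry(char *buf, size_t len,
			const char *name, char type, const char *offset,
			const char *magic, const char *mask,
			const char *interpreter, const char *flags)
{
	int n = snprintf(buf, len, ":%s:%c:%s:%s:%s:%s:%s",
			 name, type, offset, magic, mask, interpreter, flags);

	/* snprintf() reports the length it needed; reject truncated output. */
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

Called with "qemu-m68k", 'M', an empty offset, the magic and mask strings,
"/bin/qemu-m68k-static" and "OCF", this produces a string with the same
shape as the --load-interp argument in the cover letter.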
> 
>> +
>>   *--monotonic* _offset_::
>>   Set the offset of *CLOCK_MONOTONIC* which will be used in the entered time namespace. This option requires unsharing a time namespace with *--time*.
>>   
>> @@ -256,6 +259,13 @@ up 21 hours, 30 minutes
>>   up 9 years, 28 weeks, 1 day, 2 hours, 50 minutes
>>   ....
>>   
>> +The following example execute a chroot into the directory /chroot/powerpc/jessie and install the interpreter /bin/qemu-ppc-static to execute the powerpc binaries.
>> +If the interpreter is defined with the flag F, the interpreter is loaded before the chroot otherwise the interpreter is loaded from inside the chroot.
>> +
>> +....
>> +$  unshare --map-root-user --fork --pid --load-interp=":qemu-ppc:M::\\x7fELF\x01\\x02\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x14:\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\x00\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xfe\\xff\\xff:/bin/qemu-ppc-static:OCF" --root=/chroot/powerpc/jessie /bin/bash -l
>> +....
> 
> As an uneducated reader, I am confused by the flags. Where is the 'F'
> flag? Perhaps you could provide more explanation to make it easier for
> readers to understand.

I think this option should be used by an educated user who is aware of the binfmt_misc format.

Do you want me to copy part of the binfmt_misc documentation into the unshare documentation?

> 
>>   == AUTHORS
>>   
>>   mailto:dottedmag@dottedmag.net[Mikhail Gusarov],
>> diff --git a/sys-utils/unshare.c b/sys-utils/unshare.c
>> index d79aa1125955..f8e1141840ca 100644
>> --- a/sys-utils/unshare.c
>> +++ b/sys-utils/unshare.c
>> @@ -725,6 +725,35 @@ static pid_t map_ids_from_child(int *fd, uid_t mapuser,
>>   	exit(EXIT_SUCCESS);
>>   }
>>   
>> +static int is_fixed(const char *interp)
>> +{
>> +	const char *flags;
>> +
>> +	flags = strrchr(interp, ':');
>> +
>> +	return strchr(flags, 'F') != NULL;
>> +}
>> +
>> +static void load_interp(const char *binfmt_mnt, const char *interp)
>> +{
>> +	int dirfd, fd;
>> +
>> +	dirfd = open(binfmt_mnt, O_PATH | O_DIRECTORY);
>> +	if (dirfd < 0)
>> +		err(EXIT_FAILURE, _("cannot open %s"), binfmt_mnt);
>> +
>> +	fd = openat(dirfd, "register", O_WRONLY);
>> +	if (fd < 0)
>> +		err(EXIT_FAILURE, _("cannot open %s/register"), binfmt_mnt);
>> +
>> +	if (write_all(fd, interp, strlen(interp)))
>> +		err(EXIT_FAILURE, _("write failed %s/register"), binfmt_mnt);
>> +
>> +	close(fd);
>> +
>> +	close(dirfd);
>> +}
>> +
>>   static void __attribute__((__noreturn__)) usage(void)
>>   {
>>   	FILE *out = stdout;
>> @@ -772,6 +801,7 @@ static void __attribute__((__noreturn__)) usage(void)
>>   	fputs(_(" -G, --setgid <gid>        set gid in entered namespace\n"), out);
>>   	fputs(_(" --monotonic <offset>      set clock monotonic offset (seconds) in time namespaces\n"), out);
>>   	fputs(_(" --boottime <offset>       set clock boottime offset (seconds) in time namespaces\n"), out);
>> +	fputs(_(" -l, --load-interp <file>  load binfmt definition in the namespace (implies --mount-binfmt)\n"), out);
>>   
>>   	fputs(USAGE_SEPARATOR, out);
>>   	fprintf(out, USAGE_HELP_OPTIONS(27));
>> @@ -830,6 +860,7 @@ int main(int argc, char *argv[])
>>   		{ "wd",		   required_argument, NULL, 'w'		    },
>>   		{ "monotonic",     required_argument, NULL, OPT_MONOTONIC   },
>>   		{ "boottime",      required_argument, NULL, OPT_BOOTTIME    },
>> +		{ "load-interp",   required_argument, NULL, 'l'		    },
>>   		{ NULL, 0, NULL, 0 }
>>   	};
>>   
>> @@ -846,6 +877,7 @@ int main(int argc, char *argv[])
>>   	const char *newroot = NULL;
>>   	const char *newdir = NULL;
>>   	pid_t pid_bind = 0, pid_idmap = 0;
>> +	const char *newinterp = NULL;
>>   	pid_t pid = 0;
>>   #ifdef UL_HAVE_PIDFD
>>   	int fd_parent_pid = -1;
>> @@ -868,7 +900,7 @@ int main(int argc, char *argv[])
>>   	textdomain(PACKAGE);
>>   	close_stdout_atexit();
>>   
>> -	while ((c = getopt_long(argc, argv, "+fhVmuinpCTUrR:w:S:G:c", longopts, NULL)) != -1) {
>> +	while ((c = getopt_long(argc, argv, "+fhVmuinpCTUrR:w:S:G:cl:", longopts, NULL)) != -1) {
>>   		switch (c) {
>>   		case 'f':
>>   			forkit = 1;
>> @@ -1011,6 +1043,15 @@ int main(int argc, char *argv[])
>>   			boottime = strtos64_or_err(optarg, _("failed to parse boottime offset"));
>>   			force_boottime = 1;
>>   			break;
>> +		case 'l':
>> +			unshare_flags |= CLONE_NEWNS | CLONE_NEWUSER;
>> +			if (!binfmt_mnt) {
>> +				if (!procmnt)
>> +					procmnt = "/proc";
>> +				binfmt_mnt = _PATH_PROC_BINFMT_MISC;
>> +			}
>> +			newinterp = optarg;
>> +			break;
>>   
>>   		case 'h':
>>   			usage();
>> @@ -1165,6 +1206,13 @@ int main(int argc, char *argv[])
>>   	if ((unshare_flags & CLONE_NEWNS) && propagation)
>>   		set_propagation(propagation);
>>   
>> +	if (newinterp && is_fixed(newinterp)) {
>> +		if (mount("binfmt_misc", _PATH_PROC_BINFMT_MISC, "binfmt_misc",
>> +			  MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL) != 0)
>> +			err(EXIT_FAILURE, _("mount %s failed"), _PATH_PROC_BINFMT_MISC);
>> +		load_interp(_PATH_PROC_BINFMT_MISC, newinterp);
>> +	}
> 
> If I understand correctly, using --load-interp with 'F' calls
> mount(binfmt_misc) twice:
> 
> 1) before chroot
> 2) after chroot() and after mount(/proc) (implies --mount-binfmt and
>     --mount-proc too)

Yes, it's needed before the chroot to load the interpreter from the caller's filesystem.
It's not needed after the chroot in this case; it's only there for consistency, so that it is present in the
chroot as requested on the command line. I think it can be removed if you prefer.
> 
> I believe it would be helpful to include this information in the man
> page.

I'll update the man page accordingly.

Thanks,
Laurent
>      Karel
> 



* Re: [PATCH v2 2/2] unshare: load binfmt_misc interpreter
  2024-06-18 10:13     ` Laurent Vivier
@ 2024-06-18 11:58       ` Karel Zak
  0 siblings, 0 replies; 6+ messages in thread
From: Karel Zak @ 2024-06-18 11:58 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: util-linux

On Tue, Jun 18, 2024 at 12:13:50PM +0200, Laurent Vivier wrote:
> On 18/06/2024 at 11:51, Karel Zak wrote:
> > 
> >   Hi Laurent,
> > 
> 
> Hi Karel,
> 
> > On Tue, Jun 11, 2024 at 10:43:14AM +0200, Laurent Vivier wrote:
> > > +*-l*, **--load-interp=**__file__::
> > > +Load binfmt_misc definition in the namespace (implies *--mount-binfmt*).
> > 
> > Is it actually a file, or does the argument have a more complex
> > format? If there is something more, then it should be described here.
> > It is fine to describe more about the interpreters in the man page.
> 
> You're right, the argument here is not actually a file; it defines how to use
> the file provided in the parameter as an interpreter.
> 
> We provide here what we will write in /proc/sys/fs/binfmt_misc/register and
> the format is described in
> https://www.kernel.org/doc/Documentation/admin-guide/binfmt-misc.rst:
> 
> "To actually register a new binary type, you have to set up a string looking
> like ``:name:type:offset:magic:mask:interpreter:flags``

 I guess we can use something like:

 *-l*, **--load-interp=**__string__

 Load binfmt_misc definition in the namespace. The __string__ argument
 is ``:name:type:offset:magic:mask:interpreter:flags``. For more
 details about new binary type registration see
 https://www.kernel.org/doc/Documentation/admin-guide/binfmt-misc.rst.
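
 Since the argument is a structured string rather than a file name, a
 cheap shape check before writing it to the register file may help users
 who pass a path by mistake. A minimal sketch, assuming ':' is the
 delimiter and that all seven delimiters of the template above are
 present. The name looks_like_binfmt_entry() is hypothetical, and the
 check does not validate the individual fields:

```c
#include <stddef.h>

/*
 * Hypothetical helper: return 1 if def starts with ':' and contains
 * exactly seven ':' delimiters, as in
 * ":name:type:offset:magic:mask:interpreter:flags"; 0 otherwise.
 */
int looks_like_binfmt_entry(const char *def)
{
	size_t delims = 0;
	const char *p;

	if (def == NULL || def[0] != ':')
		return 0;

	/* Count every delimiter, including the leading one. */
	for (p = def; *p != '\0'; p++) {
		if (*p == ':')
			delims++;
	}

	return delims == 7;
}
```

 A plain path such as /bin/qemu-ppc-static fails the check because it does
 not start with the delimiter.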


> > As an uneducated reader, I am confused by the flags. Where is the 'F'
> > flag? Perhaps you could provide more explanation to make it easier for
> > readers to understand.
> 
> I think this option should be used by an educated user who is aware of the binfmt_misc format.
> 
> Do you want me to copy part of the binfmt_misc documentation into the unshare documentation?

It's probably overkill to copy all the text.

> I'll update the man page accordingly.

Thanks!

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


