* [PATCH v2 0/2] unshare: manage binfmt_misc mounts
@ 2024-06-11 8:43 Laurent Vivier
2024-06-11 8:43 ` [PATCH v2 1/2] unshare: mount binfmt_misc Laurent Vivier
2024-06-11 8:43 ` [PATCH v2 2/2] unshare: load binfmt_misc interpreter Laurent Vivier
0 siblings, 2 replies; 6+ messages in thread
From: Laurent Vivier @ 2024-06-11 8:43 UTC (permalink / raw)
To: util-linux; +Cc: Laurent Vivier
Since Linux v6.7 and
commit 21ca59b365c0 ("binfmt_misc: enable sandboxed mounts"),
binfmt_misc can be mounted in a non-initial user namespace by
an unprivileged user.
Extend unshare to manage it:
- add --mount-binfmt[=<dir>] to mount the binfmt_misc filesystem; this
results in clearing the interpreters inherited from the previous namespace
- add -l, --load-interp <file> to load a binfmt_misc interpreter at startup.
The interpreter is loaded from the initial filesystem if the 'F' flag is
provided, otherwise from inside the new namespace
This makes it possible to start a chroot of another architecture without
being root.
For instance:
With 'F' flag, load the interpreter from the initial namespace:
$ /bin/qemu-m68k-static --version
qemu-m68k version 8.2.2 (qemu-8.2.2-1.fc40)
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers
$ unshare --map-root-user --fork --pid --load-interp=":qemu-m68k:M::\\x7fELF\\x01\\x02\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x04:\\xff\\xff\\xff\\xff\\xff\\xff\\xfe\\x00\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xfe\\xff\\xff:/bin/qemu-m68k-static:OCF" --root=chroot/m68k/sid
# QEMU_VERSION= ls
qemu-m68k version 8.2.2 (qemu-8.2.2-1.fc40)
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers
# /qemu-m68k --version
qemu-m68k version 8.0.50 (v8.0.0-340-gb1cff5e2da95)
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
Without 'F' flag, from inside the namespace:
$ unshare --map-root-user --fork --pid --load-interp=":qemu-m68k:M::\\x7fELF\\x01\\x02\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x04:\\xff\\xff\\xff\\xff\\xff\\xff\\xfe\\x00\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xfe\\xff\\xff:/qemu-m68k:OC" --root=chroot/m68k/sid
# QEMU_VERSION= ls
qemu-m68k version 8.0.50 (v8.0.0-340-gb1cff5e2da95)
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
# /qemu-m68k --version
qemu-m68k version 8.0.50 (v8.0.0-340-gb1cff5e2da95)
Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
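The magic and mask fields in the registration strings above decide which binaries are handed to the interpreter. As a rough illustration (not code from this series), the sketch below applies the m68k magic and mask from the examples to the first 20 bytes of an ELF header, assuming the kernel's rule that a byte matches when it equals the magic byte in every bit position selected by the mask. The names m68k_magic, m68k_mask and magic_matches are made up for this sketch.

```c
#include <stddef.h>

/* Magic bytes copied from the qemu-m68k registration string above:
 * ELF ident (32-bit, big-endian), e_type = 2, e_machine = 4 (m68k). */
static const unsigned char m68k_magic[20] = {
	0x7f, 'E', 'L', 'F', 0x01, 0x02, 0x01, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x02, 0x00, 0x04
};

/* Mask bytes copied from the same string: 0x00 ignores the OS ABI byte,
 * 0xfe ignores the lowest bit of the version byte and of e_type. */
static const unsigned char m68k_mask[20] = {
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xfe, 0x00,
	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
	0xff, 0xfe, 0xff, 0xff
};

/* Return 1 when every masked byte of buf equals the corresponding
 * magic byte, 0 otherwise (including when buf is too short). */
static int magic_matches(const unsigned char *buf, size_t buflen,
			 const unsigned char *magic,
			 const unsigned char *mask, size_t len)
{
	size_t i;

	if (buflen < len)
		return 0;
	for (i = 0; i < len; i++)
		if ((buf[i] ^ magic[i]) & mask[i])
			return 0;
	return 1;
}
```

With this mask, a header whose e_type is 2 (executable) or 3 (shared object) matches, while a 64-bit little-endian header does not.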
v2:
- use <binfmt_mnt>/register rather than _PATH_PROC_BINFMT_MISC_REGISTER to load the interpreter
Laurent Vivier (2):
unshare: mount binfmt_misc
unshare: load binfmt_misc interpreter
include/pathnames.h | 1 +
sys-utils/unshare.1.adoc | 13 ++++++++
sys-utils/unshare.c | 71 +++++++++++++++++++++++++++++++++++++++-
3 files changed, 84 insertions(+), 1 deletion(-)
--
2.45.2
* [PATCH v2 1/2] unshare: mount binfmt_misc
2024-06-11 8:43 [PATCH v2 0/2] unshare: manage binfmt_misc mounts Laurent Vivier
@ 2024-06-11 8:43 ` Laurent Vivier
2024-06-11 8:43 ` [PATCH v2 2/2] unshare: load binfmt_misc interpreter Laurent Vivier
1 sibling, 0 replies; 6+ messages in thread
From: Laurent Vivier @ 2024-06-11 8:43 UTC (permalink / raw)
To: util-linux; +Cc: Laurent Vivier
add --mount-binfmt[=<dir>] to mount binfmt_misc filesystem,
this results in clearing inherited interpreters from the previous namespace
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
---
Notes:
v2:
- remove definition of _PATH_PROC_BINFMT_MISC_REGISTER
include/pathnames.h | 1 +
sys-utils/unshare.1.adoc | 3 +++
sys-utils/unshare.c | 19 +++++++++++++++++++
3 files changed, 23 insertions(+)
diff --git a/include/pathnames.h b/include/pathnames.h
index 81fa405f63c7..569bef17f982 100644
--- a/include/pathnames.h
+++ b/include/pathnames.h
@@ -204,6 +204,7 @@
/* sysctl fs paths */
#define _PATH_PROC_SYS_FS "/proc/sys/fs"
#define _PATH_PROC_PIPE_MAX_SIZE _PATH_PROC_SYS_FS "/pipe-max-size"
+#define _PATH_PROC_BINFMT_MISC _PATH_PROC_SYS_FS "/binfmt_misc"
/* irqtop paths */
#define _PATH_PROC_INTERRUPTS "/proc/interrupts"
diff --git a/sys-utils/unshare.1.adoc b/sys-utils/unshare.1.adoc
index e6201e28fffd..48d1a5579282 100644
--- a/sys-utils/unshare.1.adoc
+++ b/sys-utils/unshare.1.adoc
@@ -90,6 +90,9 @@ When *unshare* terminates, have _signame_ be sent to the forked child process. C
*--mount-proc*[**=**__mountpoint__]::
Just before running the program, mount the proc filesystem at _mountpoint_ (default is _/proc_). This is useful when creating a new PID namespace. It also implies creating a new mount namespace since the _/proc_ mount would otherwise mess up existing programs on the system. The new proc filesystem is explicitly mounted as private (with *MS_PRIVATE*|*MS_REC*).
+*--mount-binfmt*[**=**__mountpoint__]::
+Just before running the program, mount the binfmt_misc filesystem at _mountpoint_ (default is /proc/sys/fs/binfmt_misc). It also implies creating a new mount namespace since the binfmt_misc mount would otherwise mess up existing programs on the system. The new binfmt_misc filesystem is explicitly mounted as private (with *MS_PRIVATE*|*MS_REC*).
+
**--map-user=**__uid|name__::
Run the program only after the current effective user ID has been mapped to _uid_. If this option is specified multiple times, the last occurrence takes precedence. This option implies *--user*.
diff --git a/sys-utils/unshare.c b/sys-utils/unshare.c
index 57f3b8744fb5..d79aa1125955 100644
--- a/sys-utils/unshare.c
+++ b/sys-utils/unshare.c
@@ -760,6 +760,7 @@ static void __attribute__((__noreturn__)) usage(void)
fputs(_(" --kill-child[=<signame>] when dying, kill the forked child (implies --fork)\n"
" defaults to SIGKILL\n"), out);
fputs(_(" --mount-proc[=<dir>] mount proc filesystem first (implies --mount)\n"), out);
+ fputs(_(" --mount-binfmt[=<dir>] mount binfmt filesystem first (implies --user and --mount)\n"), out);
fputs(_(" --propagation slave|shared|private|unchanged\n"
" modify mount propagation in mount namespace\n"), out);
fputs(_(" --setgroups allow|deny control the setgroups syscall in user namespaces\n"), out);
@@ -783,6 +784,7 @@ int main(int argc, char *argv[])
{
enum {
OPT_MOUNTPROC = CHAR_MAX + 1,
+ OPT_MOUNTBINFMT,
OPT_PROPAGATION,
OPT_SETGROUPS,
OPT_KILLCHILD,
@@ -811,6 +813,7 @@ int main(int argc, char *argv[])
{ "fork", no_argument, NULL, 'f' },
{ "kill-child", optional_argument, NULL, OPT_KILLCHILD },
{ "mount-proc", optional_argument, NULL, OPT_MOUNTPROC },
+ { "mount-binfmt", optional_argument, NULL, OPT_MOUNTBINFMT },
{ "map-user", required_argument, NULL, OPT_MAPUSER },
{ "map-users", required_argument, NULL, OPT_MAPUSERS },
{ "map-group", required_argument, NULL, OPT_MAPGROUP },
@@ -839,6 +842,7 @@ int main(int argc, char *argv[])
struct map_range *groupmap = NULL;
int kill_child_signo = 0; /* 0 means --kill-child was not used */
const char *procmnt = NULL;
+ const char *binfmt_mnt = NULL;
const char *newroot = NULL;
const char *newdir = NULL;
pid_t pid_bind = 0, pid_idmap = 0;
@@ -913,6 +917,15 @@ int main(int argc, char *argv[])
unshare_flags |= CLONE_NEWNS;
procmnt = optarg ? optarg : "/proc";
break;
+ case OPT_MOUNTBINFMT:
+ unshare_flags |= CLONE_NEWNS | CLONE_NEWUSER;
+ binfmt_mnt = optarg;
+ if (!binfmt_mnt) {
+ if (!procmnt)
+ procmnt = "/proc";
+ binfmt_mnt = _PATH_PROC_BINFMT_MISC;
+ }
+ break;
case OPT_MAPUSER:
unshare_flags |= CLONE_NEWUSER;
mapuser = get_user(optarg, _("failed to parse uid"));
@@ -1178,6 +1191,12 @@ int main(int argc, char *argv[])
err(EXIT_FAILURE, _("mount %s failed"), procmnt);
}
+ if (binfmt_mnt) {
+ if (mount("binfmt_misc", binfmt_mnt, "binfmt_misc",
+ MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL) != 0)
+ err(EXIT_FAILURE, _("mount %s failed"), binfmt_mnt);
+ }
+
if (force_gid) {
if (setgroups(0, NULL) != 0) /* drop supplementary groups */
err(EXIT_FAILURE, _("setgroups failed"));
--
2.45.2
* [PATCH v2 2/2] unshare: load binfmt_misc interpreter
2024-06-11 8:43 [PATCH v2 0/2] unshare: manage binfmt_misc mounts Laurent Vivier
2024-06-11 8:43 ` [PATCH v2 1/2] unshare: mount binfmt_misc Laurent Vivier
@ 2024-06-11 8:43 ` Laurent Vivier
2024-06-18 9:51 ` Karel Zak
1 sibling, 1 reply; 6+ messages in thread
From: Laurent Vivier @ 2024-06-11 8:43 UTC (permalink / raw)
To: util-linux; +Cc: Laurent Vivier
add -l, --load-interp <file> to load a binfmt_misc interpreter at startup.
The interpreter is loaded from the initial filesystem if the 'F' flag is
provided, otherwise from inside the new namespace
This makes it possible to start a chroot of another architecture without
being root.
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
---
Notes:
v2:
- use <binfmt_mnt>/register rather than _PATH_PROC_BINFMT_MISC_REGISTER
to load the interpreter
sys-utils/unshare.1.adoc | 10 ++++++++
sys-utils/unshare.c | 52 +++++++++++++++++++++++++++++++++++++++-
2 files changed, 61 insertions(+), 1 deletion(-)
diff --git a/sys-utils/unshare.1.adoc b/sys-utils/unshare.1.adoc
index 48d1a5579282..24ac6fb01867 100644
--- a/sys-utils/unshare.1.adoc
+++ b/sys-utils/unshare.1.adoc
@@ -138,6 +138,9 @@ Set the user ID which will be used in the entered namespace.
*-G*, *--setgid* _gid_::
Set the group ID which will be used in the entered namespace and drop supplementary groups.
+*-l*, **--load-interp=**__file__::
+Load binfmt_misc definition in the namespace (implies *--mount-binfmt*).
+
*--monotonic* _offset_::
Set the offset of *CLOCK_MONOTONIC* which will be used in the entered time namespace. This option requires unsharing a time namespace with *--time*.
@@ -256,6 +259,13 @@ up 21 hours, 30 minutes
up 9 years, 28 weeks, 1 day, 2 hours, 50 minutes
....
+The following example execute a chroot into the directory /chroot/powerpc/jessie and install the interpreter /bin/qemu-ppc-static to execute the powerpc binaries.
+If the interpreter is defined with the flag F, the interpreter is loaded before the chroot otherwise the interpreter is loaded from inside the chroot.
+
+....
+$ unshare --map-root-user --fork --pid --load-interp=":qemu-ppc:M::\\x7fELF\x01\\x02\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x14:\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\x00\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xfe\\xff\\xff:/bin/qemu-ppc-static:OCF" --root=/chroot/powerpc/jessie /bin/bash -l
+....
+
== AUTHORS
mailto:dottedmag@dottedmag.net[Mikhail Gusarov],
diff --git a/sys-utils/unshare.c b/sys-utils/unshare.c
index d79aa1125955..f8e1141840ca 100644
--- a/sys-utils/unshare.c
+++ b/sys-utils/unshare.c
@@ -725,6 +725,35 @@ static pid_t map_ids_from_child(int *fd, uid_t mapuser,
exit(EXIT_SUCCESS);
}
+static int is_fixed(const char *interp)
+{
+ const char *flags;
+
+ flags = strrchr(interp, ':');
+
+ return strchr(flags, 'F') != NULL;
+}
+
+static void load_interp(const char *binfmt_mnt, const char *interp)
+{
+ int dirfd, fd;
+
+ dirfd = open(binfmt_mnt, O_PATH | O_DIRECTORY);
+ if (dirfd < 0)
+ err(EXIT_FAILURE, _("cannot open %s"), binfmt_mnt);
+
+ fd = openat(dirfd, "register", O_WRONLY);
+ if (fd < 0)
+ err(EXIT_FAILURE, _("cannot open %s/register"), binfmt_mnt);
+
+ if (write_all(fd, interp, strlen(interp)))
+ err(EXIT_FAILURE, _("write failed %s/register"), binfmt_mnt);
+
+ close(fd);
+
+ close(dirfd);
+}
+
static void __attribute__((__noreturn__)) usage(void)
{
FILE *out = stdout;
@@ -772,6 +801,7 @@ static void __attribute__((__noreturn__)) usage(void)
fputs(_(" -G, --setgid <gid> set gid in entered namespace\n"), out);
fputs(_(" --monotonic <offset> set clock monotonic offset (seconds) in time namespaces\n"), out);
fputs(_(" --boottime <offset> set clock boottime offset (seconds) in time namespaces\n"), out);
+ fputs(_(" -l, --load-interp <file> load binfmt definition in the namespace (implies --mount-binfmt)\n"), out);
fputs(USAGE_SEPARATOR, out);
fprintf(out, USAGE_HELP_OPTIONS(27));
@@ -830,6 +860,7 @@ int main(int argc, char *argv[])
{ "wd", required_argument, NULL, 'w' },
{ "monotonic", required_argument, NULL, OPT_MONOTONIC },
{ "boottime", required_argument, NULL, OPT_BOOTTIME },
+ { "load-interp", required_argument, NULL, 'l' },
{ NULL, 0, NULL, 0 }
};
@@ -846,6 +877,7 @@ int main(int argc, char *argv[])
const char *newroot = NULL;
const char *newdir = NULL;
pid_t pid_bind = 0, pid_idmap = 0;
+ const char *newinterp = NULL;
pid_t pid = 0;
#ifdef UL_HAVE_PIDFD
int fd_parent_pid = -1;
@@ -868,7 +900,7 @@ int main(int argc, char *argv[])
textdomain(PACKAGE);
close_stdout_atexit();
- while ((c = getopt_long(argc, argv, "+fhVmuinpCTUrR:w:S:G:c", longopts, NULL)) != -1) {
+ while ((c = getopt_long(argc, argv, "+fhVmuinpCTUrR:w:S:G:cl:", longopts, NULL)) != -1) {
switch (c) {
case 'f':
forkit = 1;
@@ -1011,6 +1043,15 @@ int main(int argc, char *argv[])
boottime = strtos64_or_err(optarg, _("failed to parse boottime offset"));
force_boottime = 1;
break;
+ case 'l':
+ unshare_flags |= CLONE_NEWNS | CLONE_NEWUSER;
+ if (!binfmt_mnt) {
+ if (!procmnt)
+ procmnt = "/proc";
+ binfmt_mnt = _PATH_PROC_BINFMT_MISC;
+ }
+ newinterp = optarg;
+ break;
case 'h':
usage();
@@ -1165,6 +1206,13 @@ int main(int argc, char *argv[])
if ((unshare_flags & CLONE_NEWNS) && propagation)
set_propagation(propagation);
+ if (newinterp && is_fixed(newinterp)) {
+ if (mount("binfmt_misc", _PATH_PROC_BINFMT_MISC, "binfmt_misc",
+ MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL) != 0)
+ err(EXIT_FAILURE, _("mount %s failed"), _PATH_PROC_BINFMT_MISC);
+ load_interp(_PATH_PROC_BINFMT_MISC, newinterp);
+ }
+
if (newroot) {
if (chroot(newroot) != 0)
err(EXIT_FAILURE,
@@ -1196,6 +1244,8 @@ int main(int argc, char *argv[])
MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL) != 0)
err(EXIT_FAILURE, _("mount %s failed"), binfmt_mnt);
}
+ if (newinterp && !is_fixed(newinterp))
+ load_interp(binfmt_mnt, newinterp);
if (force_gid) {
if (setgroups(0, NULL) != 0) /* drop supplementary groups */
--
2.45.2
* Re: [PATCH v2 2/2] unshare: load binfmt_misc interpreter
2024-06-11 8:43 ` [PATCH v2 2/2] unshare: load binfmt_misc interpreter Laurent Vivier
@ 2024-06-18 9:51 ` Karel Zak
2024-06-18 10:13 ` Laurent Vivier
0 siblings, 1 reply; 6+ messages in thread
From: Karel Zak @ 2024-06-18 9:51 UTC (permalink / raw)
To: Laurent Vivier; +Cc: util-linux
Hi Laurent,
On Tue, Jun 11, 2024 at 10:43:14AM +0200, Laurent Vivier wrote:
> +*-l*, **--load-interp=**__file__::
> +Load binfmt_misc definition in the namespace (implies *--mount-binfmt*).
Is it actually a file, or does the argument have a more complex
format? If there is something more, then it should be described here.
It is fine to describe more about the interpreters in the man page.
> +
> *--monotonic* _offset_::
> Set the offset of *CLOCK_MONOTONIC* which will be used in the entered time namespace. This option requires unsharing a time namespace with *--time*.
>
> @@ -256,6 +259,13 @@ up 21 hours, 30 minutes
> up 9 years, 28 weeks, 1 day, 2 hours, 50 minutes
> ....
>
> +The following example execute a chroot into the directory /chroot/powerpc/jessie and install the interpreter /bin/qemu-ppc-static to execute the powerpc binaries.
> +If the interpreter is defined with the flag F, the interpreter is loaded before the chroot otherwise the interpreter is loaded from inside the chroot.
> +
> +....
> +$ unshare --map-root-user --fork --pid --load-interp=":qemu-ppc:M::\\x7fELF\x01\\x02\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x14:\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\x00\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xfe\\xff\\xff:/bin/qemu-ppc-static:OCF" --root=/chroot/powerpc/jessie /bin/bash -l
> +....
As an uneducated reader, I am confused by the flags. Where is the 'F'
flag? Perhaps you could provide more explanation to make it easier for
readers to understand.
> == AUTHORS
>
> mailto:dottedmag@dottedmag.net[Mikhail Gusarov],
> diff --git a/sys-utils/unshare.c b/sys-utils/unshare.c
> index d79aa1125955..f8e1141840ca 100644
> --- a/sys-utils/unshare.c
> +++ b/sys-utils/unshare.c
> @@ -725,6 +725,35 @@ static pid_t map_ids_from_child(int *fd, uid_t mapuser,
> exit(EXIT_SUCCESS);
> }
>
> +static int is_fixed(const char *interp)
> +{
> + const char *flags;
> +
> + flags = strrchr(interp, ':');
> +
> + return strchr(flags, 'F') != NULL;
> +}
> +
> +static void load_interp(const char *binfmt_mnt, const char *interp)
> +{
> + int dirfd, fd;
> +
> + dirfd = open(binfmt_mnt, O_PATH | O_DIRECTORY);
> + if (dirfd < 0)
> + err(EXIT_FAILURE, _("cannot open %s"), binfmt_mnt);
> +
> + fd = openat(dirfd, "register", O_WRONLY);
> + if (fd < 0)
> + err(EXIT_FAILURE, _("cannot open %s/register"), binfmt_mnt);
> +
> + if (write_all(fd, interp, strlen(interp)))
> + err(EXIT_FAILURE, _("write failed %s/register"), binfmt_mnt);
> +
> + close(fd);
> +
> + close(dirfd);
> +}
> +
> static void __attribute__((__noreturn__)) usage(void)
> {
> FILE *out = stdout;
> @@ -772,6 +801,7 @@ static void __attribute__((__noreturn__)) usage(void)
> fputs(_(" -G, --setgid <gid> set gid in entered namespace\n"), out);
> fputs(_(" --monotonic <offset> set clock monotonic offset (seconds) in time namespaces\n"), out);
> fputs(_(" --boottime <offset> set clock boottime offset (seconds) in time namespaces\n"), out);
> + fputs(_(" -l, --load-interp <file> load binfmt definition in the namespace (implies --mount-binfmt)\n"), out);
>
> fputs(USAGE_SEPARATOR, out);
> fprintf(out, USAGE_HELP_OPTIONS(27));
> @@ -830,6 +860,7 @@ int main(int argc, char *argv[])
> { "wd", required_argument, NULL, 'w' },
> { "monotonic", required_argument, NULL, OPT_MONOTONIC },
> { "boottime", required_argument, NULL, OPT_BOOTTIME },
> + { "load-interp", required_argument, NULL, 'l' },
> { NULL, 0, NULL, 0 }
> };
>
> @@ -846,6 +877,7 @@ int main(int argc, char *argv[])
> const char *newroot = NULL;
> const char *newdir = NULL;
> pid_t pid_bind = 0, pid_idmap = 0;
> + const char *newinterp = NULL;
> pid_t pid = 0;
> #ifdef UL_HAVE_PIDFD
> int fd_parent_pid = -1;
> @@ -868,7 +900,7 @@ int main(int argc, char *argv[])
> textdomain(PACKAGE);
> close_stdout_atexit();
>
> - while ((c = getopt_long(argc, argv, "+fhVmuinpCTUrR:w:S:G:c", longopts, NULL)) != -1) {
> + while ((c = getopt_long(argc, argv, "+fhVmuinpCTUrR:w:S:G:cl:", longopts, NULL)) != -1) {
> switch (c) {
> case 'f':
> forkit = 1;
> @@ -1011,6 +1043,15 @@ int main(int argc, char *argv[])
> boottime = strtos64_or_err(optarg, _("failed to parse boottime offset"));
> force_boottime = 1;
> break;
> + case 'l':
> + unshare_flags |= CLONE_NEWNS | CLONE_NEWUSER;
> + if (!binfmt_mnt) {
> + if (!procmnt)
> + procmnt = "/proc";
> + binfmt_mnt = _PATH_PROC_BINFMT_MISC;
> + }
> + newinterp = optarg;
> + break;
>
> case 'h':
> usage();
> @@ -1165,6 +1206,13 @@ int main(int argc, char *argv[])
> if ((unshare_flags & CLONE_NEWNS) && propagation)
> set_propagation(propagation);
>
> + if (newinterp && is_fixed(newinterp)) {
> + if (mount("binfmt_misc", _PATH_PROC_BINFMT_MISC, "binfmt_misc",
> + MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL) != 0)
> + err(EXIT_FAILURE, _("mount %s failed"), _PATH_PROC_BINFMT_MISC);
> + load_interp(_PATH_PROC_BINFMT_MISC, newinterp);
> + }
If I understand correctly, using --load-interp with 'F' calls
mount(binfmt_misc) twice:
1) before chroot
2) after chroot() and after mount(/proc) (implies --mount-binfmt and
--mount-proc too)
I believe it would be helpful to include this information in the man
page.
Karel
--
Karel Zak <kzak@redhat.com>
http://karelzak.blogspot.com
* Re: [PATCH v2 2/2] unshare: load binfmt_misc interpreter
2024-06-18 9:51 ` Karel Zak
@ 2024-06-18 10:13 ` Laurent Vivier
2024-06-18 11:58 ` Karel Zak
0 siblings, 1 reply; 6+ messages in thread
From: Laurent Vivier @ 2024-06-18 10:13 UTC (permalink / raw)
To: Karel Zak; +Cc: util-linux
On 18/06/2024 at 11:51, Karel Zak wrote:
>
> Hi Laurent,
>
Hi Karel,
> On Tue, Jun 11, 2024 at 10:43:14AM +0200, Laurent Vivier wrote:
>> +*-l*, **--load-interp=**__file__::
>> +Load binfmt_misc definition in the namespace (implies *--mount-binfmt*).
>
> Is it actually a file, or does the argument have a more complex
> format? If there is something more that it should be described here.
> It fine describe in the man page more about the interpreters.
You're right, the format here is not actually a file; rather, it defines how to use the file provided in
the parameter as an interpreter.
We provide here what we will write to /proc/sys/fs/binfmt_misc/register, and the format is described
in https://www.kernel.org/doc/Documentation/admin-guide/binfmt-misc.rst:
"To actually register a new binary type, you have to set up a string looking like
``:name:type:offset:magic:mask:interpreter:flags``
[...]
- ``name``
is an identifier string. A new /proc file will be created with this
name below ``/proc/sys/fs/binfmt_misc``
- ``type``
is the type of recognition. Give ``M`` for magic and ``E`` for extension.
- ``offset``
is the offset of the magic/mask in the file
- ``magic``
is the byte sequence binfmt_misc is matching for.
- ``mask``
is an (optional, defaults to all 0xff) mask.
- ``interpreter``
is the program that should be invoked with the binary as first
argument
- ``flags``
is an optional field that controls several aspects of the invocation
of the interpreter.
``P`` - preserve-argv[0]
Legacy behavior of binfmt_misc is to overwrite
the original argv[0] with the full path to the binary. When this
flag is included, binfmt_misc will add an argument to the argument
vector for this purpose, thus preserving the original ``argv[0]``.
``O`` - open-binary
Legacy behavior of binfmt_misc is to pass the full path
of the binary to the interpreter as an argument. When this flag is
included, binfmt_misc will open the file for reading and pass its
descriptor as an argument
``C`` - credentials
Currently, the behavior of binfmt_misc is to calculate
the credentials and security token of the new process according to
the interpreter. When this flag is included, these attributes are
calculated according to the binary
``F`` - fix binary
The usual behaviour of binfmt_misc is to spawn the
binary lazily when the misc format file is invoked. However,
this doesn't work very well in the face of mount namespaces and
changeroots, so the ``F`` mode opens the binary as soon as the
emulation is installed and uses the opened image to spawn the
emulator"
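To make the field layout quoted above concrete, here is a small sketch that splits a registration string into its seven fields. It is only an illustration of the format, not something unshare does (unshare passes the string to the kernel unchanged). The sketch treats the first character of the string as the separator and keeps empty fields, such as an omitted offset, as empty strings. split_binfmt_rule and BINFMT_NFIELDS are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define BINFMT_NFIELDS 7	/* name, type, offset, magic, mask, interpreter, flags */

/* Split ":name:type:offset:magic:mask:interpreter:flags" in place.
 * The first character of rule is used as the field separator.
 * Returns 0 on success, -1 if fewer than seven fields are present. */
static int split_binfmt_rule(char *rule, char *fields[BINFMT_NFIELDS])
{
	char sep, *p;
	int n = 0;

	if (rule == NULL || rule[0] == '\0')
		return -1;
	sep = rule[0];
	p = rule + 1;

	while (n < BINFMT_NFIELDS) {
		char *end;

		fields[n++] = p;
		if (n == BINFMT_NFIELDS)
			break;		/* flags: keep the rest of the string */
		end = strchr(p, sep);
		if (end == NULL)
			break;		/* ran out of separators early */
		*end = '\0';
		p = end + 1;
	}
	return n == BINFMT_NFIELDS ? 0 : -1;
}
```

For the string used in the cover letter, fields[5] would be the interpreter path and fields[6] the flags ("OCF" or "OC").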
>
>> +
>> *--monotonic* _offset_::
>> Set the offset of *CLOCK_MONOTONIC* which will be used in the entered time namespace. This option requires unsharing a time namespace with *--time*.
>>
>> @@ -256,6 +259,13 @@ up 21 hours, 30 minutes
>> up 9 years, 28 weeks, 1 day, 2 hours, 50 minutes
>> ....
>>
>> +The following example execute a chroot into the directory /chroot/powerpc/jessie and install the interpreter /bin/qemu-ppc-static to execute the powerpc binaries.
>> +If the interpreter is defined with the flag F, the interpreter is loaded before the chroot otherwise the interpreter is loaded from inside the chroot.
>> +
>> +....
>> +$ unshare --map-root-user --fork --pid --load-interp=":qemu-ppc:M::\\x7fELF\x01\\x02\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x14:\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\x00\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xff\\xfe\\xff\\xff:/bin/qemu-ppc-static:OCF" --root=/chroot/powerpc/jessie /bin/bash -l
>> +....
>
> As an uneducated reader, I am confused by the flags. Where is the 'F'
> flag? Perhaps you could provide more explanation to make it easier for
> readers to understand.
I think this option should be used by an educated user who is aware of the binfmt_misc format.
Do you want me to copy part of the binfmt_misc documentation into the unshare documentation?
>
>> == AUTHORS
>>
>> mailto:dottedmag@dottedmag.net[Mikhail Gusarov],
>> diff --git a/sys-utils/unshare.c b/sys-utils/unshare.c
>> index d79aa1125955..f8e1141840ca 100644
>> --- a/sys-utils/unshare.c
>> +++ b/sys-utils/unshare.c
>> @@ -725,6 +725,35 @@ static pid_t map_ids_from_child(int *fd, uid_t mapuser,
>> exit(EXIT_SUCCESS);
>> }
>>
>> +static int is_fixed(const char *interp)
>> +{
>> + const char *flags;
>> +
>> + flags = strrchr(interp, ':');
>> +
>> + return strchr(flags, 'F') != NULL;
>> +}
>> +
>> +static void load_interp(const char *binfmt_mnt, const char *interp)
>> +{
>> + int dirfd, fd;
>> +
>> + dirfd = open(binfmt_mnt, O_PATH | O_DIRECTORY);
>> + if (dirfd < 0)
>> + err(EXIT_FAILURE, _("cannot open %s"), binfmt_mnt);
>> +
>> + fd = openat(dirfd, "register", O_WRONLY);
>> + if (fd < 0)
>> + err(EXIT_FAILURE, _("cannot open %s/register"), binfmt_mnt);
>> +
>> + if (write_all(fd, interp, strlen(interp)))
>> + err(EXIT_FAILURE, _("write failed %s/register"), binfmt_mnt);
>> +
>> + close(fd);
>> +
>> + close(dirfd);
>> +}
>> +
>> static void __attribute__((__noreturn__)) usage(void)
>> {
>> FILE *out = stdout;
>> @@ -772,6 +801,7 @@ static void __attribute__((__noreturn__)) usage(void)
>> fputs(_(" -G, --setgid <gid> set gid in entered namespace\n"), out);
>> fputs(_(" --monotonic <offset> set clock monotonic offset (seconds) in time namespaces\n"), out);
>> fputs(_(" --boottime <offset> set clock boottime offset (seconds) in time namespaces\n"), out);
>> + fputs(_(" -l, --load-interp <file> load binfmt definition in the namespace (implies --mount-binfmt)\n"), out);
>>
>> fputs(USAGE_SEPARATOR, out);
>> fprintf(out, USAGE_HELP_OPTIONS(27));
>> @@ -830,6 +860,7 @@ int main(int argc, char *argv[])
>> { "wd", required_argument, NULL, 'w' },
>> { "monotonic", required_argument, NULL, OPT_MONOTONIC },
>> { "boottime", required_argument, NULL, OPT_BOOTTIME },
>> + { "load-interp", required_argument, NULL, 'l' },
>> { NULL, 0, NULL, 0 }
>> };
>>
>> @@ -846,6 +877,7 @@ int main(int argc, char *argv[])
>> const char *newroot = NULL;
>> const char *newdir = NULL;
>> pid_t pid_bind = 0, pid_idmap = 0;
>> + const char *newinterp = NULL;
>> pid_t pid = 0;
>> #ifdef UL_HAVE_PIDFD
>> int fd_parent_pid = -1;
>> @@ -868,7 +900,7 @@ int main(int argc, char *argv[])
>> textdomain(PACKAGE);
>> close_stdout_atexit();
>>
>> - while ((c = getopt_long(argc, argv, "+fhVmuinpCTUrR:w:S:G:c", longopts, NULL)) != -1) {
>> + while ((c = getopt_long(argc, argv, "+fhVmuinpCTUrR:w:S:G:cl:", longopts, NULL)) != -1) {
>> switch (c) {
>> case 'f':
>> forkit = 1;
>> @@ -1011,6 +1043,15 @@ int main(int argc, char *argv[])
>> boottime = strtos64_or_err(optarg, _("failed to parse boottime offset"));
>> force_boottime = 1;
>> break;
>> + case 'l':
>> + unshare_flags |= CLONE_NEWNS | CLONE_NEWUSER;
>> + if (!binfmt_mnt) {
>> + if (!procmnt)
>> + procmnt = "/proc";
>> + binfmt_mnt = _PATH_PROC_BINFMT_MISC;
>> + }
>> + newinterp = optarg;
>> + break;
>>
>> case 'h':
>> usage();
>> @@ -1165,6 +1206,13 @@ int main(int argc, char *argv[])
>> if ((unshare_flags & CLONE_NEWNS) && propagation)
>> set_propagation(propagation);
>>
>> + if (newinterp && is_fixed(newinterp)) {
>> + if (mount("binfmt_misc", _PATH_PROC_BINFMT_MISC, "binfmt_misc",
>> + MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL) != 0)
>> + err(EXIT_FAILURE, _("mount %s failed"), _PATH_PROC_BINFMT_MISC);
>> + load_interp(_PATH_PROC_BINFMT_MISC, newinterp);
>> + }
>
> If I understand correctly, using --load-interp with 'F' calls
> mount(binfmt_misc) twice:
>
> 1) before chroot
> 2) after chroot() and after mount(/proc) (implies --mount-binfmt and
> --mount-proc too)
Yes, it's needed before the chroot to load the interpreter from the caller's filesystem.
It's not needed after the chroot in this case; it's only there for consistency, so that it is present in the
chroot as we asked for it on the command line. I think it can be removed if you prefer.
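The decision described here, register before the chroot when 'F' is set and after it otherwise, can be sketched as a small helper. This is a variant of is_fixed() from the patch, not the patch code itself: it additionally tolerates a string without any ':' and, like the patch, assumes ':' is the separator. The names reg_phase, rule_has_fix_binary and registration_phase are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

enum reg_phase {
	REG_BEFORE_CHROOT,	/* interpreter opened from the caller's filesystem */
	REG_AFTER_CHROOT	/* interpreter resolved inside the new root */
};

/* Return 1 if the flags field (the text after the last ':') contains 'F'.
 * A NULL string or a string without ':' is treated as having no flags. */
static int rule_has_fix_binary(const char *rule)
{
	const char *flags = rule ? strrchr(rule, ':') : NULL;

	return flags != NULL && strchr(flags + 1, 'F') != NULL;
}

/* Pick the point at which the rule would be written to "register". */
static enum reg_phase registration_phase(const char *rule)
{
	return rule_has_fix_binary(rule) ? REG_BEFORE_CHROOT : REG_AFTER_CHROOT;
}
```

Because only the text after the last ':' is inspected, an 'F' that appears in the interpreter path does not count as the flag.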
>
> I believe it would be helpful to include this information in the man
> page.
I'll update the man page accordingly.
Thanks,
Laurent
> Karel
>
* Re: [PATCH v2 2/2] unshare: load binfmt_misc interpreter
2024-06-18 10:13 ` Laurent Vivier
@ 2024-06-18 11:58 ` Karel Zak
0 siblings, 0 replies; 6+ messages in thread
From: Karel Zak @ 2024-06-18 11:58 UTC (permalink / raw)
To: Laurent Vivier; +Cc: util-linux
On Tue, Jun 18, 2024 at 12:13:50PM +0200, Laurent Vivier wrote:
> On 18/06/2024 at 11:51, Karel Zak wrote:
> >
> > Hi Laurent,
> >
>
> Hi Karel,
>
> > On Tue, Jun 11, 2024 at 10:43:14AM +0200, Laurent Vivier wrote:
> > > +*-l*, **--load-interp=**__file__::
> > > +Load binfmt_misc definition in the namespace (implies *--mount-binfmt*).
> >
> > Is it actually a file, or does the argument have a more complex
> > format? If there is something more that it should be described here.
> > It fine describe in the man page more about the interpreters.
>
> Your right the format here is not actually a file, but it defines how to use
> the file provided in the parameter as an interpreter.
>
> We provide here what we will write in /proc/sys/fs/binfmt_misc/register and
> the format is described in
> https://www.kernel.org/doc/Documentation/admin-guide/binfmt-misc.rst:
>
> "To actually register a new binary type, you have to set up a string looking
> like ``:name:type:offset:magic:mask:interpreter:flags``
I guess we can use something like:
*-l*, **--load-interp=**__string__::
Load binfmt_misc definition in the namespace. The __string__ argument
is ``:name:type:offset:magic:mask:interpreter:flags``. For more
details about new binary type registration see
https://www.kernel.org/doc/Documentation/admin-guide/binfmt-misc.rst.
> > As an uneducated reader, I am confused by the flags. Where is the 'F'
> > flag? Perhaps you could provide more explanation to make it easier for
> > readers to understand.
>
> I think this option should be used by educated user that is aware of binfmt_misc format.
>
> Do you want I copy a part of the binfmt_misc documentation in the unshare documentation?
It's probably overkill to copy all the text.
> I'll update the man page accordingly.
Thanks!
Karel
--
Karel Zak <kzak@redhat.com>
http://karelzak.blogspot.com