public inbox for linux-unionfs@vger.kernel.org
* ovl: Allow layers from anonymous mount namespaces?
@ 2025-01-23  4:18 Mike Baynton
From: Mike Baynton @ 2025-01-23  4:18 UTC
  To: overlayfs, brauner

Hi,
I've been eagerly awaiting the arrival of lowerdir+ by file handle, as
it looks likely to be well-suited to simplifying the task a container
runtime must take on in order to provide a set of properly idmapped
lower layers for a user namespaced container. Currently in containerd,
this is done by creating bind mounts for each required lower layer in
order to apply idmapping to them. Each of these bind mounts must be
briefly attached to some path-resolvable mountpoint before the overlay
is created, which seems less than ideal and contributes to some cleanup
headaches, e.g. when other software that may be present jumps on the
new mount and starts security scanning it or whatnot.

To better isolate the idmapped bind mounts, I was hoping to do
something like:

ovl_ctx = fsopen("overlay", FSOPEN_CLOEXEC);

// detached (never path-attached) copy of the layer
opfd = open_tree(-1, "/path/to/unmapped/layer",
                 OPEN_TREE_CLONE|OPEN_TREE_CLOEXEC);
mount_setattr(opfd, "", AT_EMPTY_PATH,
              &attr /* attrs to set a userns_fd */, sizeof(attr));
dfd = openat(opfd, ".", O_DIRECTORY);

// FSCONFIG_SET_FD passes the fd in the last (aux) argument
fsconfig(ovl_ctx, FSCONFIG_SET_FD, "lowerdir+", NULL, dfd);
// ...other ovl_ctx fsconfigs...
fsconfig(ovl_ctx, FSCONFIG_CMD_CREATE, NULL, NULL, 0);

...and this *almost* works in 6.13. The result of something like this is
that the FSCONFIG_CMD_CREATE fails, with "overlayfs: failed to clone
lowerpath" in dmesg. Investigating a bit, the cause is that the mount
represented by opfd is placed in a newly allocated mount namespace
containing only itself. When overlayfs then tries to make its own
private copy of that mount, it uses clone_private_mount() which subjects
any source mount to a test that its mount namespace is the task's mount
namespace. If I just remove this one check, then userspace code like the
above seems to happily work.
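
For reference, the mount_setattr() step in the sequence above could be
filled in roughly as follows. This is only a sketch: the helper names
idmap_attr() and idmap_detached_mount() are made up for illustration,
and it assumes kernel headers new enough to provide struct mount_attr
and a libc that defines SYS_mount_setattr. It goes through syscall()
directly rather than relying on a libc wrapper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>         /* AT_EMPTY_PATH */
#include <linux/mount.h>   /* struct mount_attr, MOUNT_ATTR_IDMAP */
#include <string.h>
#include <sys/syscall.h>   /* SYS_mount_setattr */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: build the attributes that request an idmapping
 * taken from the user namespace referred to by userns_fd. */
static struct mount_attr idmap_attr(int userns_fd)
{
	struct mount_attr attr;

	assert(userns_fd >= 0);
	memset(&attr, 0, sizeof(attr));
	attr.attr_set = MOUNT_ATTR_IDMAP;
	attr.userns_fd = (__u64)userns_fd;
	return attr;
}

/* Hypothetical helper: apply the idmapping to the detached mount that
 * opfd (from open_tree() with OPEN_TREE_CLONE) refers to.
 * Returns 0 on success, -1 with errno set on failure. */
static int idmap_detached_mount(int opfd, int userns_fd)
{
	struct mount_attr attr = idmap_attr(userns_fd);

	return (int)syscall(SYS_mount_setattr, opfd, "", AT_EMPTY_PATH,
			    &attr, sizeof(attr));
}
```

The userns_fd would typically be an fd for /proc/<pid>/ns/user of a
process living in the target user namespace; how that fd is obtained is
outside this sketch.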

I've tried various things in userspace to move opfd to the task's mount
namespace _without_ also attaching it to a directory tree somewhere as
we do today, but have come up short on a way to do that.

Assuming what I'm trying to do is in line with the intended use case for
these new(er) APIs, I'm wondering if some relatively small kernel change
might be the best way to enable this? Perhaps clone_private_mount(),
which seems to be used in-tree only by overlayfs, could also tolerate
mounts in "anonymous" mount namespaces (those created by
alloc_mnt_ns())?
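
To make the idea concrete, here is a userspace toy model of the
namespace test described earlier and of the relaxation being asked
about. It is not kernel code: the toy_* types and both function names
are invented for illustration, and a real change would likely need
further conditions that this sketch does not attempt to capture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; field names are illustrative only. */
struct toy_mnt_ns {
	bool anonymous;               /* true for a namespace that only
					 holds a detached mount */
};

struct toy_mount {
	const struct toy_mnt_ns *ns;  /* namespace the mount belongs to */
};

/* The test as described: the source mount must live in the calling
 * task's own mount namespace. */
static bool clone_allowed_strict(const struct toy_mount *m,
				 const struct toy_mnt_ns *task_ns)
{
	assert(m != NULL && task_ns != NULL);
	return m->ns == task_ns;
}

/* The suggested relaxation: additionally accept a source mount that
 * sits in an anonymous mount namespace. */
static bool clone_allowed_relaxed(const struct toy_mount *m,
				  const struct toy_mnt_ns *task_ns)
{
	if (clone_allowed_strict(m, task_ns))
		return true;
	return m->ns != NULL && m->ns->anonymous;
}
```

In this model a mount obtained via open_tree(OPEN_TREE_CLONE) maps to a
toy_mount whose ns has anonymous set: the strict test rejects it, the
relaxed one accepts it, and a mount from some other regular namespace
is still rejected by both.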

Thanks
Mike


end of thread, newest message 2025-01-24 11:06 UTC

Thread overview: 7+ messages
2025-01-23  4:18 ovl: Allow layers from anonymous mount namespaces? Mike Baynton
2025-01-23 19:19 ` [RFC PATCH 1/2] fs: allow detached mounts in clone_private_mount() Christian Brauner
2025-01-24  5:42   ` Mike Baynton
2025-01-23 19:19 ` [RFC PATCH 2/2] selftests: add tests for using detached mount with overlayfs Christian Brauner
2025-01-23 19:21 ` ovl: Allow layers from anonymous mount namespaces? Christian Brauner
2025-01-24  5:40   ` Mike Baynton
2025-01-24 11:06     ` Christian Brauner
