From: Vivek Goyal <vgoyal@redhat.com>
To: Hanna Reitz <hreitz@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>,
qemu-devel@nongnu.org,
"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
virtio-fs@redhat.com, Ioannis Angelakopoulos <jaggel@bu.edu>,
Max Reitz <mreitz@redhat.com>
Subject: Re: [PATCH v3 09/10] virtiofsd: Optionally fill lo_inode.fhandle
Date: Tue, 17 Aug 2021 15:45:19 -0400
Message-ID: <YRwRz8aZGq6QLpx/@redhat.com>
In-Reply-To: <89b416e7-c0ca-7831-da13-683e8a74b7ae@redhat.com>
On Tue, Aug 17, 2021 at 10:27:16AM +0200, Hanna Reitz wrote:
> On 16.08.21 21:44, Vivek Goyal wrote:
> > On Wed, Aug 11, 2021 at 08:41:18AM +0200, Hanna Reitz wrote:
> >
> > [..]
> > > > > But given the inotify complications, there’s really a good reason we should
> > > > > use mountinfo.
> > > > >
> > > > > > > It’s a bit tricky because our sandboxing prevents easy access to mountinfo,
> > > > > > > but if that’s the only way...
> > > > > > yes. We already have lo->proc_self_fd. Maybe we need to keep
> > > > > > /proc/self/mountinfo open in lo->proc_self_mountinfo. I am assuming
> > > > > > that any mount table changes will still be visible despite the fact
> > > > > > I have fd open (and don't have to open new fd to notice new mount/unmount
> > > > > > changes).
> > > > > Well, yes, that was my idea. Unfortunately, I wasn’t quite successful yet;
> > > > > when I tried keeping the fd open, reading from it would just return 0
> > > > > bytes. Perhaps that’s because we bind-mount /proc/self/fd to /proc so that
> > > > > nothing else in /proc is visible. Perhaps we need to bind-mount
> > > > > /proc/self/mountinfo into /proc/self/fd before that...
> > > > Or perhaps open /proc/self/mountinfo and save fd in lo->proc_mountinfo
> > > > before /proc/self/fd is bind mounted on /proc?
> > > Yes, I tried that, and then reading would just return 0 bytes.
> > Hi Hanna,
> >
> > I tried this simple patch and I can read /proc/self/mountinfo before
> > bind mounting /proc/self/fd and after bind mounting /proc/self/fd. Am
> > I missing something.
>
> Yes, but I tried reading it in the main loop (where we’d actually need it).
> It looks like the umount2(".", MNT_DETACH) in setup_mounts() breaks it.
Good point. I modified my code and I also notice that after umount2() it
always reads 0 bytes. I can understand that all the other mount points
could go away, but the new rootfs mount point of virtiofsd should still be
visible, IIUC. I don't understand why it is not.

Anyway, I tried re-opening the /proc/self/mountinfo file after umount2(".",
MNT_DETACH), and that seems to work: it shows the root mount point. I
created a bind mount and it shows that too.

So it looks like a quick fix could be to re-open /proc/self/mountinfo. But
that means we can't bind-mount /proc/self/fd on /proc/. We could bind-mount
/proc/self on /proc instead. I am not sure whether that is safe enough.
Here is the debug patch I tried.
---
tools/virtiofsd/passthrough_ll.c | 101 +++++++++++++++++++++++++++++++++++++--
1 file changed, 96 insertions(+), 5 deletions(-)
Index: rhvgoyal-qemu/tools/virtiofsd/passthrough_ll.c
===================================================================
--- rhvgoyal-qemu.orig/tools/virtiofsd/passthrough_ll.c 2021-08-16 15:29:27.712223551 -0400
+++ rhvgoyal-qemu/tools/virtiofsd/passthrough_ll.c 2021-08-17 15:40:20.456811218 -0400
@@ -172,6 +172,8 @@ struct lo_data {
/* An O_PATH file descriptor to /proc/self/fd/ */
int proc_self_fd;
+ int proc_mountinfo;
+ int proc_self;
int user_killpriv_v2, killpriv_v2;
/* If set, virtiofsd is responsible for setting umask during creation */
bool change_umask;
@@ -3403,12 +3405,56 @@ static void setup_wait_parent_capabiliti
capng_apply(CAPNG_SELECT_BOTH);
}
+static void read_mountinfo(struct lo_data *lo)
+{
+    char buf[4096];
+    ssize_t count, total_read = 0;
+    off_t ret;
+
+    ret = lseek(lo->proc_mountinfo, 0, SEEK_SET);
+    if (ret == -1) {
+        fuse_log(FUSE_LOG_ERR, "lseek(): %m\n");
+        exit(1);
+    }
+
+    do {
+        count = read(lo->proc_mountinfo, buf, sizeof(buf) - 1);
+        if (count == -1) {
+            fuse_log(FUSE_LOG_ERR, "read(/proc/self/mountinfo): %m\n");
+            exit(1);
+        }
+
+        /* fuse_log(FUSE_LOG_INFO, "read(%zd) bytes\n", count); */
+        buf[count] = '\0';
+        fuse_log(FUSE_LOG_INFO, "%s", buf);
+        total_read += count;
+    } while (count);
+
+    fuse_log(FUSE_LOG_INFO, "read(%zd) bytes\n", total_read);
+}
+
+static void reopen_mountinfo(struct lo_data *lo)
+{
+    int fd;
+
+    close(lo->proc_mountinfo);
+
+    fd = openat(lo->proc_self, "mountinfo", O_RDONLY);
+    if (fd == -1) {
+        fuse_log(FUSE_LOG_ERR, "open(/proc/self/mountinfo, O_RDONLY): %m\n");
+        exit(1);
+    }
+
+    lo->proc_mountinfo = fd;
+}
+
/*
* Move to a new mount, net, and pid namespaces to isolate this process.
*/
static void setup_namespaces(struct lo_data *lo, struct fuse_session *se)
{
pid_t child;
+ int fd;
/*
* Create a new pid namespace for *child* processes. We'll have to
@@ -3472,21 +3518,35 @@ static void setup_namespaces(struct lo_d
exit(1);
}
+ fd = open("/proc/self/mountinfo", O_RDONLY);
+ if (fd == -1) {
+ fuse_log(FUSE_LOG_ERR, "open(/proc/self/mountinfo, O_RDONLY): %m\n");
+ exit(1);
+ }
+
+ lo->proc_mountinfo = fd;
+
/*
* We only need /proc/self/fd. Prevent ".." from accessing parent
* directories of /proc/self/fd by bind-mounting it over /proc. Since / was
* previously remounted with MS_REC | MS_SLAVE this mount change only
* affects our process.
*/
- if (mount("/proc/self/fd", "/proc", NULL, MS_BIND, NULL) < 0) {
+ if (mount("/proc/self/", "/proc", NULL, MS_BIND, NULL) < 0) {
fuse_log(FUSE_LOG_ERR, "mount(/proc/self/fd, MS_BIND): %m\n");
exit(1);
}
/* Get the /proc (actually /proc/self/fd, see above) file descriptor */
- lo->proc_self_fd = open("/proc", O_PATH);
+ lo->proc_self_fd = open("/proc/fd", O_PATH);
if (lo->proc_self_fd == -1) {
- fuse_log(FUSE_LOG_ERR, "open(/proc, O_PATH): %m\n");
+ fuse_log(FUSE_LOG_ERR, "open(/proc/fd, O_PATH): %m\n");
+ exit(1);
+ }
+
+ lo->proc_self = open("/proc/", O_PATH);
+ if (lo->proc_self == -1) {
+ fuse_log(FUSE_LOG_ERR, "open(/proc/self, O_PATH): %m\n");
exit(1);
}
}
@@ -3524,7 +3584,7 @@ static void cleanup_capng(void)
* Make the source directory our root so symlinks cannot escape and no other
* files are accessible. Assumes unshare(CLONE_NEWNS) was already called.
*/
-static void setup_mounts(const char *source)
+static void setup_mounts(const char *source, struct lo_data *lo)
{
int oldroot;
int newroot;
@@ -3552,26 +3612,43 @@ static void setup_mounts(const char *sou
exit(1);
}
+ fuse_log(FUSE_LOG_INFO, "mountinfo before pivot_root()\n");
+ read_mountinfo(lo);
+
if (syscall(__NR_pivot_root, ".", ".") < 0) {
fuse_log(FUSE_LOG_ERR, "pivot_root(., .): %m\n");
exit(1);
}
+ fuse_log(FUSE_LOG_INFO, "mountinfo after pivot_root()\n");
+ read_mountinfo(lo);
+
if (fchdir(oldroot) < 0) {
fuse_log(FUSE_LOG_ERR, "fchdir(oldroot): %m\n");
exit(1);
}
+ fuse_log(FUSE_LOG_INFO, "mountinfo after fchdir()\n");
+ read_mountinfo(lo);
+
if (mount("", ".", "", MS_SLAVE | MS_REC, NULL) < 0) {
fuse_log(FUSE_LOG_ERR, "mount(., MS_SLAVE | MS_REC): %m\n");
exit(1);
}
+ fuse_log(FUSE_LOG_INFO, "mountinfo before umount2(., MNT_DETACH)\n");
+ reopen_mountinfo(lo);
+ read_mountinfo(lo);
+
if (umount2(".", MNT_DETACH) < 0) {
fuse_log(FUSE_LOG_ERR, "umount2(., MNT_DETACH): %m\n");
exit(1);
}
+ fuse_log(FUSE_LOG_INFO, "mountinfo after umount2(., MNT_DETACH)\n");
+ reopen_mountinfo(lo);
+ read_mountinfo(lo);
+
if (fchdir(newroot) < 0) {
fuse_log(FUSE_LOG_ERR, "fchdir(newroot): %m\n");
exit(1);
@@ -3711,6 +3788,19 @@ static void setup_chroot(struct lo_data
}
}
+static void create_mount(struct lo_data *lo)
+{
+    const char *source = "foo", *dest = "bar";
+
+    if (mount(source, dest, NULL, MS_BIND | MS_REC, NULL) < 0) {
+        fuse_log(FUSE_LOG_ERR, "mount(%s, %s, MS_BIND): %m\n", source, dest);
+        exit(1);
+    }
+
+    fuse_log(FUSE_LOG_INFO, "mountinfo after mounting foo\n");
+    read_mountinfo(lo);
+}
+
/*
* Lock down this process to prevent access to other processes or files outside
* source directory. This reduces the impact of arbitrary code execution bugs.
@@ -3720,7 +3810,8 @@ static void setup_sandbox(struct lo_data
{
if (lo->sandbox == SANDBOX_NAMESPACE) {
setup_namespaces(lo, se);
- setup_mounts(lo->source);
+ setup_mounts(lo->source, lo);
+ create_mount(lo);
} else {
setup_chroot(lo);
}