* [Virtio-fs] virtiofsd permission problem to work with KATA on older host kernel
From: Eric Ren @ 2019-07-16 17:49 UTC
To: virtio-fs

Hi,

I'm trying virtio-fs with a KATA container on an older kernel (3.10) host.
I failed to run a container as follows:

```
$sudo docker run -it busybox sh
standard_init_linux.go:190: exec user process caused "read-only file system"
ERRO[0001] init failed error="standard_init_linux.go:190: exec user process caused \"read-only file system\"" name=kata-agent pid=1 source=agent
panic: --this line should have never been executed, congratulations--

goroutine 1 [running, locked to thread]:
main.init.0()
/home/admin/rpmbuild/BUILD/go/src/github.com/kata-containers/agent/agent.go:1228 +0x10f
```

After some troubleshooting, I found that reading and writing within the
virtio-fs dir works, and that no problem happens if there is no overlayfs
in between.

However, executing a binary on overlayfs fails with an EROFS error,
which can be reproduced as below:

```
$mv hello lowdir/
$mkdir upperdir workdir merged
$sudo mount -t overlay overlay -olowerdir=lowdir,upperdir=upperdir,workdir=workdir merged
$sudo docker run --name=virtio-fs-1 -v /home/eric/lab/merged:/mnt/ --runtime=kata-runtime -it busybox sh

[hack kata-agent to sleep in deadloop, and login the VM]

/run/kata-containers/shared/containers/833c4dad342ecd55a25d6470faf99b57c1057fe854b2309bd8efc41b26d10627-840ec5db9825b5ac-mnt # ./hello
/bin/sh: ./hello: Read-only file system
```

The problem seems to be that the `lowdir` of the overlay is read-only, but
we relax `open` to use O_RDWR, so the two conflict when the `execve`
syscall opens the executable binary.

This test patch fixes this problem for me:
```
git diff
diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
index 78716c8aca..eaba3db22c 100644
--- a/contrib/virtiofsd/passthrough_ll.c
+++ b/contrib/virtiofsd/passthrough_ll.c
@@ -1898,7 +1898,17 @@ static void lo_setupmapping(fuse_req_t req, fuse_ino_t ino, uint64_t foffset,
         * TODO: O_RDWR might not be allowed if file is read only or
         * write only. Fix it.
         */
-       fd = openat(lo->proc_self_fd, buf, O_RDWR);
+       #define RW_MASK 0x3
+       fd = openat(lo->proc_self_fd, buf, flags & RW_MASK);
```

But it's interesting: why is a newer host kernel like 4.19 free of this problem?

Regards,
Eric
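As a side note on the `RW_MASK 0x3` constant in the test patch: POSIX already names the access-mode mask `O_ACCMODE`. The sketch below only illustrates that masking idea. It assumes the `flags` value carries open(2)-style access-mode bits, which the thread does not confirm, and both function names are hypothetical, not taken from virtiofsd.

```c
#include <fcntl.h>

/*
 * Keep only the access-mode bits (O_RDONLY, O_WRONLY or O_RDWR) of an
 * open(2)-style flags word and drop everything else (O_CREAT, O_APPEND, ...).
 * Hypothetical helper, not part of virtiofsd.
 */
int access_mode_from_flags(int flags)
{
    return flags & O_ACCMODE;
}

/*
 * Hypothetical wrapper: open `name` relative to `dirfd` using only the
 * access mode the caller asked for, instead of a hard-coded O_RDWR.
 * Returns the new descriptor, or -1 with errno set, exactly like openat(2).
 */
int open_with_access_mode(int dirfd, const char *name, int flags)
{
    return openat(dirfd, name, access_mode_from_flags(flags));
}
```

Using the named mask rather than a literal avoids depending on the numeric encoding of the access modes on a given platform.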
* Re: [Virtio-fs] virtiofsd permission problem to work with KATA on older host kernel
From: Vivek Goyal @ 2019-07-23 18:50 UTC
To: Eric Ren; +Cc: virtio-fs

On Wed, Jul 17, 2019 at 01:49:10AM +0800, Eric Ren wrote:
> The problem seems to be that the `lowdir` of the overlay is read-only, but
> we relax `open` to use O_RDWR, so the two conflict when the `execve`
> syscall opens the executable binary.
>
> This test patch fixes this problem for me:
> ```
> git diff
> diff --git a/contrib/virtiofsd/passthrough_ll.c b/contrib/virtiofsd/passthrough_ll.c
> index 78716c8aca..eaba3db22c 100644
> --- a/contrib/virtiofsd/passthrough_ll.c
> +++ b/contrib/virtiofsd/passthrough_ll.c
> @@ -1898,7 +1898,17 @@ static void lo_setupmapping(fuse_req_t req, fuse_ino_t ino, uint64_t foffset,
>          * TODO: O_RDWR might not be allowed if file is read only or
>          * write only. Fix it.
>          */
> -       fd = openat(lo->proc_self_fd, buf, O_RDWR);
> +       #define RW_MASK 0x3
> +       fd = openat(lo->proc_self_fd, buf, flags & RW_MASK);
> ```

Hi Eric,

The problem with doing a read-only open is the following use case.

- Process A opens a file read-only and maps a page read-only.
- Process B opens the same file read-write and maps a page read-write.

This means that we previously set up a read-only mapping, and now it
needs to be upgraded to read-write so that process B does not fail.

And currently we don't have logic to upgrade an existing mapping.

I agree that this is a hack and needs to be changed. It breaks overlayfs
horribly, as all the files will be copied up and there will not be any
page cache sharing between guests for files which are not being modified.

It's just that the fix is a little involved and requires modification in
all the components (kernel, qemu and virtiofsd).

I think somebody had posted patches to upgrade a mapping from read-only
to read-write on the virtio-fs list. I did not get time to dive into the
details at that time. If you would like to look into it, that will help.

Vivek
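The constraint behind the process A / process B case can be seen with plain host-side system calls: a shared writable mapping cannot be created on a descriptor that was opened read-only, so a mapping set up that way has no write path to upgrade to. The following is a minimal sketch, not virtiofsd code. The helper name `try_shared_mapping` and the temporary file path are assumptions made for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Create a one-page temporary file, re-open it with `open_mode`, and try
 * to map it MAP_SHARED with protection `prot`.
 * Returns 0 if the mapping succeeds, otherwise the errno of the failing call.
 * Hypothetical helper for illustration only.
 */
int try_shared_mapping(int open_mode, int prot)
{
    char path[] = "/tmp/mapdemoXXXXXX";
    int err = 0;

    int fd = mkstemp(path);              /* created with mode 0600 */
    if (fd < 0)
        return errno;
    if (ftruncate(fd, 4096) < 0)         /* give the file one page of data */
        err = errno;
    close(fd);

    if (err == 0) {
        fd = open(path, open_mode);      /* the access mode under test */
        if (fd < 0) {
            err = errno;
        } else {
            void *p = mmap(NULL, 4096, prot, MAP_SHARED, fd, 0);
            if (p == MAP_FAILED)
                err = errno;             /* EACCES for O_RDONLY + PROT_WRITE */
            else
                munmap(p, 4096);
            close(fd);
        }
    }

    unlink(path);
    return err;
}
```

Calling `try_shared_mapping(O_RDONLY, PROT_READ | PROT_WRITE)` returns `EACCES`, while the same request on an `O_RDWR` descriptor succeeds. That is the host-side reason a read-only open cannot later serve a writer without some upgrade step.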
* Re: [Virtio-fs] virtiofsd permission problem to work with KATA on older host kernel
From: Liu Bo @ 2019-07-23 23:27 UTC
To: Vivek Goyal; +Cc: virtio-fs

On Tue, Jul 23, 2019 at 02:50:08PM -0400, Vivek Goyal wrote:
> I agree that this is a hack and needs to be changed. It breaks overlayfs
> horribly, as all the files will be copied up and there will not be any
> page cache sharing between guests for files which are not being modified.
>

Just FYI, on an older CentOS 3.10 kernel, it's worse than an unnecessary copy-up.

If we'd like to execute a binary hosted on an overlayfs, three steps are done
in order: 1) lookup, 2) open, 3) setupmapping.

On the daemon side, lookup uses open(O_PATH|O_RDONLY), while open and
setupmapping use openat(proc_self_fd...). openat() finds the file by following
the symlinks in /proc/self/fd/, and on 3.10 these symlinks always point to a
file path in the lower layer, so any openat(O_RDWR) fails with an annoying
EROFS.

thanks,
-liubo
* Re: [Virtio-fs] virtiofsd permission problem to work with KATA on older host kernel
From: Vivek Goyal @ 2019-07-24 14:16 UTC
To: Liu Bo; +Cc: virtio-fs

On Tue, Jul 23, 2019 at 04:27:21PM -0700, Liu Bo wrote:
> Just FYI, on an older CentOS 3.10 kernel, it's worse than an unnecessary copy-up.
>
> If we'd like to execute a binary hosted on an overlayfs, three steps are done
> in order: 1) lookup, 2) open, 3) setupmapping.
>
> On the daemon side, lookup uses open(O_PATH|O_RDONLY), while open and
> setupmapping use openat(proc_self_fd...). openat() finds the file by following
> the symlinks in /proc/self/fd/, and on 3.10 these symlinks always point to a
> file path in the lower layer, so any openat(O_RDWR) fails with an annoying
> EROFS.

Ok, that explains it. Stacking file operations is a relatively new change
in overlayfs. So newer kernels continue to work and don't get EROFS, but
files always get copied up and we lose the advantage of overlayfs.

So this is definitely something which requires fixing. We need to revisit
the logic for upgrading the mapping from read-only to read-write.

I don't think we should be opening lower/upper files directly and
operating on them directly (because we are essentially bypassing
overlayfs in that case). So we will have to have a requirement that the
host kernel be of a certain minimum version (if users plan to use
overlayfs on the host and virtio-fs on top).

Thanks
Vivek