From: Miklos Szeredi <miklos@szeredi.hu>
To: obuil.liubo@gmail.com
Cc: fuse-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org,
Vivek Goyal <vgoyal@redhat.com>,
virtio-fs-list <virtio-fs@redhat.com>
Subject: Re: [PATCH] fuse: fix stale data issue of virtiofs with dax
Date: Mon, 13 Nov 2023 15:45:11 +0100 [thread overview]
Message-ID: <CAJfpegsqwWKcpcJCpOQ_EmfuX5k9RXgPSz0q6d6+9DVfEAsLVw@mail.gmail.com> (raw)
In-Reply-To: <20231105063608.68114-1-obuil.liubo@gmail.com>
[adding Vivek Goyal and virtio-fs list]
On Sun, 5 Nov 2023 at 07:36, <obuil.liubo@gmail.com> wrote:
>
> From: Liu Bo <bo.liu@linux.alibaba.com>
>
> In open(O_TRUNC), it doesn't cleanup the corresponding dax mapping of
> an inode, which could expose stale data to users, which can be easily
> reproduced by accessing overlayfs through virtiofs with dax.
>
> The reproducer is,
> 0. mkdir /home/test/lower /home/test/upper /home/test/work
> 1. echo "aaaaaa" > /home/test/lower/foobar
> 2. sudo mount -t overlay overlay -olowerdir=/home/test/lower,upperdir=/home/test/upper,workdir=/home/test/work /mnt/ovl
> 3. start a guest VM configured with virtiofs passthrough(_dax_ enabled and cache policy is always)
> and use /mnt/ovl as the source directory.
> 4. ssh into the guest VM
> 5. mount virtiofs onto /mnt/test
> 5. cat /mnt/test/foobar
> aaaaaa
> 6. echo "xxxx" > /mnt/test/foobar
> 7. cat /mnt/test/foobar
>
> w/o this patch, step 7 shows "aaaa" rather than "xxxx".
This is generally unsupported. See Documentation/filesystems/overlayfs.rst:
"Changes to underlying filesystems
---------------------------------
Changes to the underlying filesystems while part of a mounted overlay
filesystem are not allowed. If the underlying filesystem is changed,
the behavior of the overlay is undefined, though it will not result in
a crash or deadlock.
..."
Thanks,
Miklos
>
> Step 5 reads the content of file 'foobar' by setting up a dax mapping, and the
> mapping eventually maps /home/test/lower/foobar because there is no copy up
> at this moment.
>
> In step 6, 'foobar' is opened with O_TRUNC and the viriofs daemon on host side
> triggers a copy-up, "xxxx" is eventually written to /home/test/upper/foobar.
> Since 'foobar' is truncated to size 0 when writting, writes go with non-dax io
> path and update the file size in guest VM accordingly.
>
> However, dax mapping still refers to the /home/test/lower/foobar, so what step
> 7 reads is /home/test/lower/foobar but with the new size, which shows "aaaa"
> rather "xxxx".
>
> Reported-by: Cameron <cameron@northflank.com>
> Tested-by: Cameron <cameron@northflank.com>
> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> ---
> fs/fuse/dax.c | 9 +++++++++
> fs/fuse/dir.c | 5 +++++
> fs/fuse/fuse_i.h | 1 +
> 3 files changed, 15 insertions(+)
>
> diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
> index 8e74f278a3f6..d7f9ec7f4597 100644
> --- a/fs/fuse/dax.c
> +++ b/fs/fuse/dax.c
> @@ -385,6 +385,15 @@ void fuse_dax_inode_cleanup(struct inode *inode)
> WARN_ON(fi->dax->nr);
> }
>
> +/* Callers need to make sure fi->i_mmap_sem is held. */
> +void fuse_dax_inode_cleanup_range(struct inode *inode, loff_t start)
> +{
> + struct fuse_conn *fc = get_fuse_conn(inode);
> + struct fuse_inode *fi = get_fuse_inode(inode);
> +
> + inode_reclaim_dmap_range(fc->dax, inode, start, -1);
> +}
> +
> static void fuse_fill_iomap_hole(struct iomap *iomap, loff_t length)
> {
> iomap->addr = IOMAP_NULL_ADDR;
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index f67bef9d83c4..be7892e052b5 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1788,6 +1788,9 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
> */
> i_size_write(inode, 0);
> truncate_pagecache(inode, 0);
> + if (fault_blocked && FUSE_IS_DAX(inode))
> + fuse_dax_inode_cleanup(inode);
> +
> goto out;
> }
> file = NULL;
> @@ -1883,6 +1886,8 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
> S_ISREG(inode->i_mode) && oldsize != outarg.attr.size) {
> truncate_pagecache(inode, outarg.attr.size);
> invalidate_inode_pages2(mapping);
> + if (fault_blocked && FUSE_IS_DAX(inode))
> + fuse_dax_inode_cleanup_range(inode, outarg.attr.size);
> }
>
> clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 9b7fc7d3c7f1..02fa7aa2bd56 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1305,6 +1305,7 @@ void fuse_dax_conn_free(struct fuse_conn *fc);
> bool fuse_dax_inode_alloc(struct super_block *sb, struct fuse_inode *fi);
> void fuse_dax_inode_init(struct inode *inode, unsigned int flags);
> void fuse_dax_inode_cleanup(struct inode *inode);
> +void fuse_dax_inode_cleanup_range(struct inode *inode, loff_t start);
> void fuse_dax_dontcache(struct inode *inode, unsigned int flags);
> bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
> void fuse_dax_cancel_work(struct fuse_conn *fc);
> --
> 2.32.1 (Apple Git-133)
>
prev parent reply other threads:[~2023-11-13 14:45 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-05 6:36 [PATCH] fuse: fix stale data issue of virtiofs with dax obuil.liubo
2023-11-05 12:33 ` kernel test robot
2023-11-13 14:45 ` Miklos Szeredi [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAJfpegsqwWKcpcJCpOQ_EmfuX5k9RXgPSz0q6d6+9DVfEAsLVw@mail.gmail.com \
--to=miklos@szeredi.hu \
--cc=fuse-devel@lists.sourceforge.net \
--cc=linux-fsdevel@vger.kernel.org \
--cc=obuil.liubo@gmail.com \
--cc=vgoyal@redhat.com \
--cc=virtio-fs@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).