From: "Michael S. Tsirkin" <mst@redhat.com>
To: Hou Tao <houtao@huaweicloud.com>
Cc: linux-fsdevel@vger.kernel.org, Miklos Szeredi <miklos@szeredi.hu>,
Vivek Goyal <vgoyal@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
linux-kernel@vger.kernel.org, virtualization@lists.linux.dev,
houtao1@huawei.com
Subject: Re: [PATCH] virtiofs: limit the length of ITER_KVEC dio by max_nopage_rw
Date: Tue, 9 Jan 2024 07:01:05 -0500 [thread overview]
Message-ID: <20240109065948-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20240103105929.1902658-1-houtao@huaweicloud.com>
On Wed, Jan 03, 2024 at 06:59:29PM +0800, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> When trying to insert a 10MB kernel module kept in a virtiofs with cache
> disabled, the following warning was reported:
>
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 439 at mm/page_alloc.c:4544 ......
> Modules linked in:
> CPU: 2 PID: 439 Comm: insmod Not tainted 6.7.0-rc7+ #33
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ......
> RIP: 0010:__alloc_pages+0x2c4/0x360
> ......
> Call Trace:
> <TASK>
> ? __warn+0x8f/0x150
> ? __alloc_pages+0x2c4/0x360
> __kmalloc_large_node+0x86/0x160
> __kmalloc+0xcd/0x140
> virtio_fs_enqueue_req+0x240/0x6d0
> virtio_fs_wake_pending_and_unlock+0x7f/0x190
> queue_request_and_unlock+0x58/0x70
> fuse_simple_request+0x18b/0x2e0
> fuse_direct_io+0x58a/0x850
> fuse_file_read_iter+0xdb/0x130
> __kernel_read+0xf3/0x260
> kernel_read+0x45/0x60
> kernel_read_file+0x1ad/0x2b0
> init_module_from_file+0x6a/0xe0
> idempotent_init_module+0x179/0x230
> __x64_sys_finit_module+0x5d/0xb0
> do_syscall_64+0x36/0xb0
> entry_SYSCALL_64_after_hwframe+0x6e/0x76
> ......
> </TASK>
> ---[ end trace 0000000000000000 ]---
>
> The warning happened as follow. In copy_args_to_argbuf(), virtiofs uses
> kmalloc-ed memory as bound buffer for fuse args, but
> fuse_get_user_pages() only limits the length of fuse arg by max_read or
> max_write for IOV_KVEC io (e.g., kernel_read_file from finit_module()).
> For virtiofs, max_read is UINT_MAX, so a big read request which is about
> 10MB is passed to copy_args_to_argbuf(), kmalloc() is called in turn
> with len=10MB, and triggers the warning in __alloc_pages():
> WARN_ON_ONCE_GFP(order > MAX_ORDER, gfp)).
>
> A feasible solution is to limit the value of max_read for virtiofs, so
> the length passed to kmalloc() will be limited. However it will affects
> the max read size for ITER_IOVEC io and the value of max_write also needs
> limitation. So instead of limiting the values of max_read and max_write,
> introducing max_nopage_rw to cap both the values of max_read and
> max_write when the fuse dio read/write request is initiated from kernel.
>
> Considering that fuse read/write request from kernel is uncommon and to
> decrease the demand for large contiguous pages, set max_nopage_rw as
> 256KB instead of KMALLOC_MAX_SIZE - 4096 or similar.
>
> Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem")
> Signed-off-by: Hou Tao <houtao1@huawei.com>
Could this get some acks from virtio fs maintainers pls?
> ---
> fs/fuse/file.c | 12 +++++++++++-
> fs/fuse/fuse_i.h | 3 +++
> fs/fuse/inode.c | 1 +
> fs/fuse/virtio_fs.c | 6 ++++++
> 4 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index a660f1f21540..f1beb7c0b782 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1422,6 +1422,16 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
> return ret < 0 ? ret : 0;
> }
>
> +static size_t fuse_max_dio_rw_size(const struct fuse_conn *fc,
> + const struct iov_iter *iter, int write)
> +{
> + unsigned int nmax = write ? fc->max_write : fc->max_read;
> +
> + if (iov_iter_is_kvec(iter))
> + nmax = min(nmax, fc->max_nopage_rw);
> + return nmax;
> +}
> +
> ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
> loff_t *ppos, int flags)
> {
> @@ -1432,7 +1442,7 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
> struct inode *inode = mapping->host;
> struct fuse_file *ff = file->private_data;
> struct fuse_conn *fc = ff->fm->fc;
> - size_t nmax = write ? fc->max_write : fc->max_read;
> + size_t nmax = fuse_max_dio_rw_size(fc, iter, write);
> loff_t pos = *ppos;
> size_t count = iov_iter_count(iter);
> pgoff_t idx_from = pos >> PAGE_SHIFT;
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 1df83eebda92..fc753cd34211 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -594,6 +594,9 @@ struct fuse_conn {
> /** Constrain ->max_pages to this value during feature negotiation */
> unsigned int max_pages_limit;
>
> + /** Maximum read/write size when there is no page in request */
> + unsigned int max_nopage_rw;
> +
> /** Input queue */
> struct fuse_iqueue iq;
>
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 2a6d44f91729..4cbbcb4a4b71 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -923,6 +923,7 @@ void fuse_conn_init(struct fuse_conn *fc, struct fuse_mount *fm,
> fc->user_ns = get_user_ns(user_ns);
> fc->max_pages = FUSE_DEFAULT_MAX_PAGES_PER_REQ;
> fc->max_pages_limit = FUSE_MAX_MAX_PAGES;
> + fc->max_nopage_rw = UINT_MAX;
>
> INIT_LIST_HEAD(&fc->mounts);
> list_add(&fm->fc_entry, &fc->mounts);
> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> index 5f1be1da92ce..3aac31d45198 100644
> --- a/fs/fuse/virtio_fs.c
> +++ b/fs/fuse/virtio_fs.c
> @@ -1452,6 +1452,12 @@ static int virtio_fs_get_tree(struct fs_context *fsc)
> /* Tell FUSE to split requests that exceed the virtqueue's size */
> fc->max_pages_limit = min_t(unsigned int, fc->max_pages_limit,
> virtqueue_size - FUSE_HEADER_OVERHEAD);
> + /* copy_args_to_argbuf() uses kmalloc-ed memory as bounce buffer
> + * for fuse args, so limit the total size of these args to prevent
> + * the warning in __alloc_pages() and decrease the demand for large
> + * contiguous pages.
> + */
> + fc->max_nopage_rw = min(fc->max_nopage_rw, 256U << 10);
>
> fsc->s_fs_info = fm;
> sb = sget_fc(fsc, virtio_fs_test_super, set_anon_super_fc);
> --
> 2.29.2
next prev parent reply other threads:[~2024-01-09 12:01 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-03 10:59 [PATCH] virtiofs: limit the length of ITER_KVEC dio by max_nopage_rw Hou Tao
2024-01-09 12:01 ` Michael S. Tsirkin [this message]
2024-01-09 13:11 ` Bernd Schubert
2024-01-10 1:16 ` Hou Tao
2024-01-10 22:34 ` Bernd Schubert
2024-01-17 10:24 ` Hou Tao
2024-02-22 19:49 ` Michael S. Tsirkin
2024-02-23 0:58 ` Hou Tao
2024-02-23 9:42 ` Miklos Szeredi
2024-02-24 11:41 ` Hou Tao
2024-02-25 8:46 ` Michael S. Tsirkin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240109065948-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=houtao1@huawei.com \
--cc=houtao@huaweicloud.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=stefanha@redhat.com \
--cc=vgoyal@redhat.com \
--cc=virtualization@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.