From: Josef Bacik <josef@toxicpanda.com>
To: Bernd Schubert <bschubert@ddn.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
Amir Goldstein <amir73il@gmail.com>,
linux-fsdevel@vger.kernel.org, bernd.schubert@fastmail.fm
Subject: Re: [PATCH RFC v2 07/19] fuse uring: Add an mmap method
Date: Thu, 30 May 2024 11:37:13 -0400 [thread overview]
Message-ID: <20240530153713.GD2205585@perftesting> (raw)
In-Reply-To: <20240529-fuse-uring-for-6-9-rfc2-out-v1-7-d149476b1d65@ddn.com>
On Wed, May 29, 2024 at 08:00:42PM +0200, Bernd Schubert wrote:
> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
> ---
> fs/fuse/dev.c | 3 ++
> fs/fuse/dev_uring.c | 114 ++++++++++++++++++++++++++++++++++++++++++++++
> fs/fuse/dev_uring_i.h | 22 +++++++++
> include/uapi/linux/fuse.h | 3 ++
> 4 files changed, 142 insertions(+)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index bc77413932cf..349c1d16b0df 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -2470,6 +2470,9 @@ const struct file_operations fuse_dev_operations = {
> .fasync = fuse_dev_fasync,
> .unlocked_ioctl = fuse_dev_ioctl,
> .compat_ioctl = compat_ptr_ioctl,
> +#if IS_ENABLED(CONFIG_FUSE_IO_URING)
I'm loathe to use
#if IS_ENABLED()
when we can use
#ifdef CONFIG_FUSE_IO_URING
which is more standard across the kernel.
> + .mmap = fuse_uring_mmap,
> +#endif
> };
> EXPORT_SYMBOL_GPL(fuse_dev_operations);
>
> diff --git a/fs/fuse/dev_uring.c b/fs/fuse/dev_uring.c
> index 702a994cf192..9491bdaa5716 100644
> --- a/fs/fuse/dev_uring.c
> +++ b/fs/fuse/dev_uring.c
> @@ -120,3 +120,117 @@ void fuse_uring_ring_destruct(struct fuse_ring *ring)
>
> mutex_destroy(&ring->start_stop_lock);
> }
> +
> +static inline int fuse_uring_current_nodeid(void)
> +{
> + int cpu;
> + const struct cpumask *proc_mask = current->cpus_ptr;
> +
> + cpu = cpumask_first(proc_mask);
> +
> + return cpu_to_node(cpu);
You don't need this, just use numa_node_id();
> +}
> +
> +static char *fuse_uring_alloc_queue_buf(int size, int node)
> +{
> + char *buf;
> +
> + if (size <= 0) {
> + pr_info("Invalid queue buf size: %d.\n", size);
> + return ERR_PTR(-EINVAL);
> + }
> +
> + buf = vmalloc_node_user(size, node);
> + return buf ? buf : ERR_PTR(-ENOMEM);
> +}
This is excessive, we base size off of ring->queue_buf_size, or the
fuse_uring_mmap() size we get from the vma, which I don't think can ever be 0 or
negative. I think we just validate that ->queue_buf_size is always correct, and
if we're really worried about it in fuse_uring_mmap we validate that sz is
correct there, and then we just use vmalloc_node_user() directly instead of
having this helper.
> +
> +/**
> + * fuse uring mmap, per ring qeuue.
> + * Userpsace maps a kernel allocated ring/queue buffer. For numa awareness,
> + * userspace needs to run the do the mapping from a core bound thread.
> + */
> +int
> +fuse_uring_mmap(struct file *filp, struct vm_area_struct *vma)
I'm not seeing anywhere else in the fuse code that has this style, I'd prefer we
keep it consistent with the rest of the kernel and have
int fuse_uring_mmap(struct file *filp, struct vm_area_struct *vma)
additionally you're using the docstyle strings without the actual docstyle
formatting, which is pissing off my git hooks that run checkpatch. Not a big
deal, but if you're going to provide docstyle comments then please do it
formatted properly, or just do a normal
/*
* fuse uring mmap....
*/
> +{
> + struct fuse_dev *fud = fuse_get_dev(filp);
> + struct fuse_conn *fc;
> + struct fuse_ring *ring;
> + size_t sz = vma->vm_end - vma->vm_start;
> + int ret;
> + struct fuse_uring_mbuf *new_node = NULL;
> + void *buf = NULL;
> + int nodeid;
> +
> + if (vma->vm_pgoff << PAGE_SHIFT != FUSE_URING_MMAP_OFF) {
> + pr_debug("Invalid offset, expected %llu got %lu\n",
> + FUSE_URING_MMAP_OFF, vma->vm_pgoff << PAGE_SHIFT);
> + return -EINVAL;
> + }
> +
> + if (!fud)
> + return -ENODEV;
> + fc = fud->fc;
> + ring = fc->ring;
> + if (!ring)
> + return -ENODEV;
> +
> + nodeid = ring->numa_aware ? fuse_uring_current_nodeid() : NUMA_NO_NODE;
nodeid = ring->numa_awayre ? numa_node_id() : NUMA_NO_NODE;
> +
> + /* check if uring is configured and if the requested size matches */
> + if (ring->nr_queues == 0 || ring->queue_depth == 0) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + if (sz != ring->queue_buf_size) {
> + ret = -EINVAL;
> + pr_devel("mmap size mismatch, expected %zu got %zu\n",
> + ring->queue_buf_size, sz);
> + goto out;
> + }
> +
> + if (current->nr_cpus_allowed != 1 && ring->numa_aware) {
> + ret = -EINVAL;
> + pr_debug(
> + "Numa awareness, but thread has more than allowed cpu.\n");
> + goto out;
> + }
> +
> + buf = fuse_uring_alloc_queue_buf(ring->queue_buf_size, nodeid);
> + if (IS_ERR(buf)) {
> + ret = PTR_ERR(buf);
> + goto out;
> + }
All of the above you can just return ret, you don't have to jump to out.
> +
> + new_node = kmalloc(sizeof(*new_node), GFP_USER);
> + if (unlikely(new_node == NULL)) {
> + ret = -ENOMEM;
> + goto out;
Here I would just
if (unlikely(new_node == NULL)) {
vfree(buf);
return -ENOMEM;
}
> + }
> +
> + ret = remap_vmalloc_range(vma, buf, 0);
> + if (ret)
> + goto out;
And since this is the only place we can fail with both things allocated I'd just
if (ret) {
vfree(buf);
kfree(new_node);
return ret;
}
and then drop the bit below where you free the buffers if there's an error.
> +
> + mutex_lock(&ring->start_stop_lock);
> + /*
> + * In this function we do not know the queue the buffer belongs to.
> + * Later server side will pass the mmaped address, the kernel address
> + * will be found through the map.
> + */
> + new_node->kbuf = buf;
> + new_node->ubuf = (void *)vma->vm_start;
> + rb_add(&new_node->rb_node, &ring->mem_buf_map,
> + fuse_uring_rb_tree_buf_less);
> + mutex_unlock(&ring->start_stop_lock);
> +out:
> + if (ret) {
> + kfree(new_node);
> + vfree(buf);
> + }
> +
> + pr_devel("%s: pid %d addr: %p sz: %zu ret: %d\n", __func__,
> + current->pid, (char *)vma->vm_start, sz, ret);
> +
> + return ret;
> +}
> diff --git a/fs/fuse/dev_uring_i.h b/fs/fuse/dev_uring_i.h
> index 58ab4671deff..c455ae0e729a 100644
> --- a/fs/fuse/dev_uring_i.h
> +++ b/fs/fuse/dev_uring_i.h
> @@ -181,6 +181,7 @@ struct fuse_ring {
>
> void fuse_uring_abort_end_requests(struct fuse_ring *ring);
> int fuse_uring_conn_cfg(struct fuse_ring *ring, struct fuse_ring_config *rcfg);
> +int fuse_uring_mmap(struct file *filp, struct vm_area_struct *vma);
> int fuse_uring_queue_cfg(struct fuse_ring *ring,
> struct fuse_ring_queue_config *qcfg);
> void fuse_uring_ring_destruct(struct fuse_ring *ring);
> @@ -208,6 +209,27 @@ static inline void fuse_uring_conn_destruct(struct fuse_conn *fc)
> kfree(ring);
> }
>
> +static inline int fuse_uring_rb_tree_buf_cmp(const void *key,
> + const struct rb_node *node)
> +{
> + const struct fuse_uring_mbuf *entry =
> + rb_entry(node, struct fuse_uring_mbuf, rb_node);
> +
> + if (key == entry->ubuf)
> + return 0;
> +
> + return (unsigned long)key < (unsigned long)entry->ubuf ? -1 : 1;
> +}
> +
> +static inline bool fuse_uring_rb_tree_buf_less(struct rb_node *node1,
> + const struct rb_node *node2)
> +{
> + const struct fuse_uring_mbuf *entry1 =
> + rb_entry(node1, struct fuse_uring_mbuf, rb_node);
> +
> + return fuse_uring_rb_tree_buf_cmp(entry1->ubuf, node2) < 0;
> +}
> +
These are only used in dev_uring.c, just put them in there instead of the header
file. Thanks,
Josef
next prev parent reply other threads:[~2024-05-30 15:37 UTC|newest]
Thread overview: 113+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-29 18:00 [PATCH RFC v2 00/19] fuse: fuse-over-io-uring Bernd Schubert
2024-05-29 18:00 ` [PATCH RFC v2 01/19] fuse: rename to fuse_dev_end_requests and make non-static Bernd Schubert
2024-05-29 21:09 ` Josef Bacik
2024-05-29 18:00 ` [PATCH RFC v2 02/19] fuse: Move fuse_get_dev to header file Bernd Schubert
2024-05-29 21:09 ` Josef Bacik
2024-05-29 18:00 ` [PATCH RFC v2 03/19] fuse: Move request bits Bernd Schubert
2024-05-29 21:10 ` Josef Bacik
2024-05-29 18:00 ` [PATCH RFC v2 04/19] fuse: Add fuse-io-uring design documentation Bernd Schubert
2024-05-29 21:17 ` Josef Bacik
2024-05-30 12:50 ` Bernd Schubert
2024-05-30 14:59 ` Josef Bacik
2024-05-29 18:00 ` [PATCH RFC v2 05/19] fuse: Add a uring config ioctl Bernd Schubert
2024-05-29 21:24 ` Josef Bacik
2024-05-30 12:51 ` Bernd Schubert
2024-06-03 13:03 ` Miklos Szeredi
2024-06-03 13:48 ` Bernd Schubert
2024-05-29 18:00 ` [PATCH RFC v2 06/19] Add a vmalloc_node_user function Bernd Schubert
2024-05-30 15:10 ` Josef Bacik
2024-05-30 16:13 ` Bernd Schubert
2024-05-31 13:56 ` Christoph Hellwig
2024-06-03 15:59 ` Kent Overstreet
2024-06-03 19:24 ` Bernd Schubert
2024-06-04 4:20 ` Christoph Hellwig
2024-06-07 2:30 ` Dave Chinner
2024-06-07 4:49 ` Christoph Hellwig
2024-06-04 4:08 ` Christoph Hellwig
2024-05-29 18:00 ` [PATCH RFC v2 07/19] fuse uring: Add an mmap method Bernd Schubert
2024-05-30 15:37 ` Josef Bacik [this message]
2024-05-29 18:00 ` [PATCH RFC v2 08/19] fuse: Add the queue configuration ioctl Bernd Schubert
2024-05-30 15:54 ` Josef Bacik
2024-05-30 17:49 ` Bernd Schubert
2024-05-29 18:00 ` [PATCH RFC v2 09/19] fuse: {uring} Add a dev_release exception for fuse-over-io-uring Bernd Schubert
2024-05-30 19:00 ` Josef Bacik
2024-05-29 18:00 ` [PATCH RFC v2 10/19] fuse: {uring} Handle SQEs - register commands Bernd Schubert
2024-05-30 19:55 ` Josef Bacik
2024-05-29 18:00 ` [PATCH RFC v2 11/19] fuse: Add support to copy from/to the ring buffer Bernd Schubert
2024-05-30 19:59 ` Josef Bacik
2024-09-01 11:56 ` Bernd Schubert
2024-09-01 11:56 ` Bernd Schubert
2024-05-29 18:00 ` [PATCH RFC v2 12/19] fuse: {uring} Add uring sqe commit and fetch support Bernd Schubert
2024-05-30 20:08 ` Josef Bacik
2024-05-29 18:00 ` [PATCH RFC v2 13/19] fuse: {uring} Handle uring shutdown Bernd Schubert
2024-05-30 20:21 ` Josef Bacik
2024-05-29 18:00 ` [PATCH RFC v2 14/19] fuse: {uring} Allow to queue to the ring Bernd Schubert
2024-05-30 20:32 ` Josef Bacik
2024-05-30 21:26 ` Bernd Schubert
2024-05-29 18:00 ` [PATCH RFC v2 15/19] export __wake_on_current_cpu Bernd Schubert
2024-05-30 20:37 ` Josef Bacik
2024-06-04 9:26 ` Peter Zijlstra
2024-06-04 9:36 ` Bernd Schubert
2024-06-04 19:27 ` Peter Zijlstra
2024-09-01 12:07 ` Bernd Schubert
2024-05-31 13:51 ` Christoph Hellwig
2024-05-29 18:00 ` [PATCH RFC v2 16/19] fuse: {uring} Wake requests on the the current cpu Bernd Schubert
2024-05-30 16:44 ` Shachar Sharon
2024-05-30 16:59 ` Bernd Schubert
2024-05-29 18:00 ` [PATCH RFC v2 17/19] fuse: {uring} Send async requests to qid of core + 1 Bernd Schubert
2024-05-29 18:00 ` [PATCH RFC v2 18/19] fuse: {uring} Set a min cpu offset io-size for reads/writes Bernd Schubert
2024-05-29 18:00 ` [PATCH RFC v2 19/19] fuse: {uring} Optimize async sends Bernd Schubert
2024-05-31 16:24 ` Jens Axboe
2024-05-31 17:36 ` Bernd Schubert
2024-05-31 19:10 ` Jens Axboe
2024-06-01 16:37 ` Bernd Schubert
2024-05-30 7:07 ` [PATCH RFC v2 00/19] fuse: fuse-over-io-uring Amir Goldstein
2024-05-30 12:09 ` Bernd Schubert
2024-05-30 15:36 ` Kent Overstreet
2024-05-30 16:02 ` Bernd Schubert
2024-05-30 16:10 ` Kent Overstreet
2024-05-30 16:17 ` Bernd Schubert
2024-05-30 17:30 ` Kent Overstreet
2024-05-30 19:09 ` Josef Bacik
2024-05-30 20:05 ` Kent Overstreet
2024-05-31 3:53 ` [PATCH] fs: sys_ringbuffer() (WIP) Kent Overstreet
2024-05-31 13:11 ` kernel test robot
2024-05-31 15:49 ` kernel test robot
2024-05-30 16:21 ` [PATCH RFC v2 00/19] fuse: fuse-over-io-uring Jens Axboe
2024-05-30 16:32 ` Bernd Schubert
2024-05-30 17:26 ` Jens Axboe
2024-05-30 17:16 ` Kent Overstreet
2024-05-30 17:28 ` Jens Axboe
2024-05-30 17:58 ` Kent Overstreet
2024-05-30 18:48 ` Jens Axboe
2024-05-30 19:35 ` Kent Overstreet
2024-05-31 0:11 ` Jens Axboe
2024-06-04 23:45 ` Ming Lei
2024-05-30 20:47 ` Josef Bacik
2024-06-11 8:20 ` Miklos Szeredi
2024-06-11 10:26 ` Bernd Schubert
2024-06-11 15:35 ` Miklos Szeredi
2024-06-11 17:37 ` Bernd Schubert
2024-06-11 23:35 ` Kent Overstreet
2024-06-12 13:53 ` Bernd Schubert
2024-06-12 14:19 ` Kent Overstreet
2024-06-12 15:40 ` Bernd Schubert
2024-06-12 15:55 ` Kent Overstreet
2024-06-12 16:15 ` Bernd Schubert
2024-06-12 16:24 ` Kent Overstreet
2024-06-12 16:44 ` Bernd Schubert
2024-06-12 7:39 ` Miklos Szeredi
2024-06-12 13:32 ` Bernd Schubert
2024-06-12 13:46 ` Bernd Schubert
2024-06-12 14:07 ` Miklos Szeredi
2024-06-12 14:56 ` Bernd Schubert
2024-08-02 23:03 ` Bernd Schubert
2024-08-29 22:32 ` Bernd Schubert
2024-08-30 13:12 ` Jens Axboe
2024-08-30 13:28 ` Bernd Schubert
2024-08-30 13:33 ` Jens Axboe
2024-08-30 14:55 ` Pavel Begunkov
2024-08-30 15:10 ` Bernd Schubert
2024-08-30 20:08 ` Jens Axboe
2024-08-31 0:02 ` Bernd Schubert
2024-08-31 0:49 ` Bernd Schubert
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240530153713.GD2205585@perftesting \
--to=josef@toxicpanda.com \
--cc=amir73il@gmail.com \
--cc=bernd.schubert@fastmail.fm \
--cc=bschubert@ddn.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=miklos@szeredi.hu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.