From: Jens Axboe <axboe@kernel.dk>
To: Gabriel Krisman Bertazi <krisman@suse.de>
Cc: io-uring@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	linux-mm@kvack.org
Subject: Re: [PATCH 2/2] io_uring: introduce IORING_OP_MMAP
Date: Fri, 30 Jan 2026 08:55:50 -0700
Message-ID: <efa7714d-565d-41c4-af85-d7a89e7fa399@kernel.dk>
In-Reply-To: <20260129221138.897715-3-krisman@suse.de>

On 1/29/26 3:11 PM, Gabriel Krisman Bertazi wrote:
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index b5b23c0d5283..e24fe3b00059 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -74,6 +74,7 @@ struct io_uring_sqe {
>  		__u32		install_fd_flags;
>  		__u32		nop_flags;
>  		__u32		pipe_flags;
> +		__u32		mmap_flags;
>  	};
>  	__u64	user_data;	/* data to be passed back at completion time */
>  	/* pack this to avoid bogus arm OABI complaints */
> @@ -303,6 +304,7 @@ enum io_uring_op {
>  	IORING_OP_PIPE,
>  	IORING_OP_NOP128,
>  	IORING_OP_URING_CMD128,
> +	IORING_OP_MMAP,
>  
>  	/* this goes last, obviously */
>  	IORING_OP_LAST,
> @@ -1113,6 +1115,14 @@ struct zcrx_ctrl {
>  	};
>  };
>  
> +struct io_uring_mmap_desc {
> +	void __user *addr;
> +	unsigned long len;
> +	unsigned long pgoff;
> +	unsigned int prot;
> +	unsigned int flags;
> +};

You can't use pointers or unsigned long in a uapi struct, as they're
different sizes on 32-bit and 64-bit, and then you need compat handling.
unsigned int doesn't change size, but uapi structs should spell it as
__u32 anyway. It's much better to make this:

struct io_uring_mmap_desc {
	__u64 addr;
	__u64 len;
	__u64 pgoff;
	__u32 prot;
	__u32 flags;
};

and then generally also a good idea to have a bit of expansion space
there, so you don't need a new desc down the line.
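
To make the layout point concrete, here is a minimal userspace sketch, not
the proposed uapi. It assumes uint64_t/uint32_t as stand-ins for
__u64/__u32, and the struct name, the resv[] field and the helper are
hypothetical, added only to show what "expansion space" could look like.
The asserts check that the size and offsets come out the same on 32-bit and
64-bit builds, which is why no compat handling would be needed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-in for the suggested layout. uint64_t/uint32_t play the
 * role of the kernel's __u64/__u32. The resv[] field (name and size) is
 * hypothetical and only illustrates reserved expansion space.
 */
struct mmap_desc_sketch {
	uint64_t addr;    /* user pointer carried as a 64-bit integer */
	uint64_t len;
	uint64_t pgoff;
	uint32_t prot;
	uint32_t flags;
	uint64_t resv[2]; /* hypothetical: must be zero until given a meaning */
};

/* Same size and offsets whether built as 32-bit or 64-bit. */
static_assert(sizeof(struct mmap_desc_sketch) == 48, "unexpected size");
static_assert(offsetof(struct mmap_desc_sketch, prot) == 24, "unexpected offset");
static_assert(offsetof(struct mmap_desc_sketch, resv) == 32, "unexpected offset");

/* Returns 1 if every reserved word is zero, 0 otherwise. */
static int mmap_desc_resv_ok(const struct mmap_desc_sketch *d)
{
	return d->resv[0] == 0 && d->resv[1] == 0;
}
```

Rejecting non-zero reserved words up front is what lets those fields be
given a meaning later without breaking existing users.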

> +int io_mmap_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> +{
> +	struct io_mmap_data *mmap = io_kiocb_to_cmd(req, struct io_mmap_data);
> +	struct io_mmap_async *maps;
> +	int nr_maps;
> +
> +	mmap->uaddr = u64_to_user_ptr(READ_ONCE(sqe->addr));
> +	mmap->flags = READ_ONCE(sqe->mmap_flags);
> +	nr_maps = READ_ONCE(sqe->len);
> +
> +	if (mmap->flags & MAP_ANONYMOUS && req->cqe.fd != -1)
> +		return -EINVAL;
> +	if (nr_maps < 0 || nr_maps > MMAP_MAX_BATCH)
> +		return -EINVAL;
> +	if (!access_ok(mmap->uaddr, nr_maps*sizeof(struct io_uring_mmap_desc)))
> +		return -EFAULT;

Does this access_ok actually provide anything? We're copying it in later
anyway, no?

> +static int io_prep_mmap_hugetlb(struct file **filp, unsigned long *len,
> +				int flags)
> +{
> +	if (*filp) {
> +		*len = ALIGN(*len, huge_page_size(hstate_file(*filp)));
> +	} else {
> +		struct hstate *hs;
> +		unsigned long nlen = *len;
> +
> +		hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
> +		if (!hs)
> +			return -EINVAL;
> +		nlen = ALIGN(nlen, huge_page_size(hs));
> +		*filp = hugetlb_file_setup(HUGETLB_ANON_FILE, nlen,
> +					   VM_NORESERVE,
> +					   HUGETLB_ANONHUGE_INODE,
> +				   (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);

This looks like it dips into vm_mmap_pgoff(). More on that below.

> +		desc->addr = (void *) vm_mmap_pgoff(file,
> +					   (unsigned long) desc->addr,
> +					   len, desc->prot, flags, desc->pgoff);

One concern here is that vm_mmap_pgoff() ends up doing:

mmap_write_lock_killable(mm)
	grabs mm lock, can block, for a long time?

which could potentially stall the io_uring pipeline for a long time.
Ideally you'd be able to do something where you try to grab the mm lock
from io_mmap(), and if it fails, then either fail the request (if it's a
killable thing) or punt it with -EAGAIN to let an io-wq thread handle
it.

I'm not so sure simply wrapping vm_mmap_pgoff() either directly or
indirectly via the hugetlb stuff is going to be super useful, if we can
end up blocking for a long time on these operations.
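
A rough userspace analogy of the try-then-punt idea, under loud
assumptions: a pthread mutex stands in for the mm lock, and map_issue() and
do_map_locked() are hypothetical names, not anything from the patch. In the
kernel this would presumably be built around mmap_write_trylock() and the
nonblocking issue flag, and since vm_mmap_pgoff() takes the lock itself,
some refactoring would be needed first. The sketch only shows the control
flow:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace analogy only: do_map_locked() is a hypothetical placeholder
 * for the mapping work done while the lock is held.
 */
static int do_map_locked(void)
{
	return 0;
}

/*
 * Try the lock first. In nonblocking mode a contended lock yields -EAGAIN
 * so the caller can retry from a context that is allowed to sleep.
 */
static int map_issue(pthread_mutex_t *lock, bool nonblock)
{
	int ret;

	assert(lock != NULL);

	if (pthread_mutex_trylock(lock) != 0) {
		if (nonblock)
			return -EAGAIN;
		/* Blocking context: sleeping on the lock is acceptable. */
		if (pthread_mutex_lock(lock) != 0)
			return -EINVAL;
	}

	ret = do_map_locked();
	pthread_mutex_unlock(lock);
	return ret;
}
```

With the lock free, map_issue(&lock, true) completes inline. With the lock
held elsewhere it returns -EAGAIN right away instead of stalling the
submitter.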

-- 
Jens Axboe

