From: Hao Xu <hao.xu@linux.dev>
To: Christian Brauner <brauner@kernel.org>,
djwong@kernel.org, Dave Chinner <david@fromorbit.com>,
Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org,
Dominique Martinet <asmadeus@codewreck.org>,
Pavel Begunkov <asml.silence@gmail.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Stefan Roesch <shr@fb.com>, Clay Harris <bugs@claycon.org>,
linux-fsdevel@vger.kernel.org, Wanpeng Li <wanpengli@tencent.com>
Subject: Re: [PATCH 3/5] io_uring: add support for getdents
Date: Mon, 31 Jul 2023 02:02:25 +0800 [thread overview]
Message-ID: <7adaea37-f84f-9415-41fa-53d36833f8f2@linux.dev> (raw)
In-Reply-To: <20230727-salbe-kurvigen-31b410c07bb9@brauner>
Hi Christian,
On 7/27/23 22:27, Christian Brauner wrote:
> On Thu, Jul 27, 2023 at 07:51:19PM +0800, Hao Xu wrote:
>> On 7/26/23 23:00, Christian Brauner wrote:
>>> On Tue, Jul 18, 2023 at 09:21:10PM +0800, Hao Xu wrote:
>>>> From: Hao Xu <howeyxu@tencent.com>
>>>>
>>>> This add support for getdents64 to io_uring, acting exactly like the
>>>> syscall: the directory is iterated from it's current's position as
>>>> stored in the file struct, and the file's position is updated exactly as
>>>> if getdents64 had been called.
>>>>
>>>> For filesystems that support NOWAIT in iterate_shared(), try to use it
>>>> first; if a user already knows the filesystem they use do not support
>>>> nowait they can force async through IOSQE_ASYNC in the sqe flags,
>>>> avoiding the need to bounce back through a useless EAGAIN return.
>>>>
>>>> Co-developed-by: Dominique Martinet <asmadeus@codewreck.org>
>>>> Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
>>>> Signed-off-by: Hao Xu <howeyxu@tencent.com>
>>>> ---
>>>> include/uapi/linux/io_uring.h | 7 +++++
>>>> io_uring/fs.c | 55 +++++++++++++++++++++++++++++++++++
>>>> io_uring/fs.h | 3 ++
>>>> io_uring/opdef.c | 8 +++++
>>>> 4 files changed, 73 insertions(+)
>>>>
>>>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>>> index 36f9c73082de..b200b2600622 100644
>>>> --- a/include/uapi/linux/io_uring.h
>>>> +++ b/include/uapi/linux/io_uring.h
>>>> @@ -65,6 +65,7 @@ struct io_uring_sqe {
>>>> __u32 xattr_flags;
>>>> __u32 msg_ring_flags;
>>>> __u32 uring_cmd_flags;
>>>> + __u32 getdents_flags;
>>>> };
>>>> __u64 user_data; /* data to be passed back at completion time */
>>>> /* pack this to avoid bogus arm OABI complaints */
>>>> @@ -235,6 +236,7 @@ enum io_uring_op {
>>>> IORING_OP_URING_CMD,
>>>> IORING_OP_SEND_ZC,
>>>> IORING_OP_SENDMSG_ZC,
>>>> + IORING_OP_GETDENTS,
>>>> /* this goes last, obviously */
>>>> IORING_OP_LAST,
>>>> @@ -273,6 +275,11 @@ enum io_uring_op {
>>>> */
>>>> #define SPLICE_F_FD_IN_FIXED (1U << 31) /* the last bit of __u32 */
>>>> +/*
>>>> + * sqe->getdents_flags
>>>> + */
>>>> +#define IORING_GETDENTS_REWIND (1U << 0)
>>>> +
>>>> /*
>>>> * POLL_ADD flags. Note that since sqe->poll_events is the flag space, the
>>>> * command flags for POLL_ADD are stored in sqe->len.
>>>> diff --git a/io_uring/fs.c b/io_uring/fs.c
>>>> index f6a69a549fd4..480f25677fed 100644
>>>> --- a/io_uring/fs.c
>>>> +++ b/io_uring/fs.c
>>>> @@ -47,6 +47,13 @@ struct io_link {
>>>> int flags;
>>>> };
>>>> +struct io_getdents {
>>>> + struct file *file;
>>>> + struct linux_dirent64 __user *dirent;
>>>> + unsigned int count;
>>>> + int flags;
>>>> +};
>>>> +
>>>> int io_renameat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>> {
>>>> struct io_rename *ren = io_kiocb_to_cmd(req, struct io_rename);
>>>> @@ -291,3 +298,51 @@ void io_link_cleanup(struct io_kiocb *req)
>>>> putname(sl->oldpath);
>>>> putname(sl->newpath);
>>>> }
>>>> +
>>>> +int io_getdents_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>> +{
>>>> + struct io_getdents *gd = io_kiocb_to_cmd(req, struct io_getdents);
>>>> +
>>>> + if (READ_ONCE(sqe->off) != 0)
>>>> + return -EINVAL;
>>>> +
>>>> + gd->dirent = u64_to_user_ptr(READ_ONCE(sqe->addr));
>>>> + gd->count = READ_ONCE(sqe->len);
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +int io_getdents(struct io_kiocb *req, unsigned int issue_flags)
>>>> +{
>>>> + struct io_getdents *gd = io_kiocb_to_cmd(req, struct io_getdents);
>>>> + struct file *file = req->file;
>>>> + unsigned long getdents_flags = 0;
>>>> + bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
>>>
>>> Hm, I'm not sure what exactly the rules are for IO_URING_F_NONBLOCK.
>>> But to point this out:
>>>
>>> vfs_getdents()
>>> -> iterate_dir()
>>> {
>>> if (shared)
>>> res = down_read_killable(&inode->i_rwsem);
>>> else
>>> res = down_write_killable(&inode->i_rwsem);
>>> }
>>>
>>> which means you can still end up sleeping here before you go into a
>>> filesystem that does actually support non-waiting getdents. So if you
>>> have concurrent operations that grab inode lock (touch, mkdir etc) you
>>> can end up sleeping here.
>>>
>>> Is that intentional or an oversight? If the former can someone please
>>> explain the rules and why it's fine in this case?
>>
>> I actually saw this semaphore, and there is another xfs lock in
>> file_accessed
>> --> touch_atime
>> --> inode_update_time
>> --> inode->i_op->update_time == xfs_vn_update_time
>>
>> Forgot to point them out in the cover-letter..., I didn't modify them
>> since I'm not very sure about if we should do so, and I saw Stefan's
>> patchset didn't modify them too.
>>
>> My personnal thinking is we should apply trylock logic for this
>> inode->i_rwsem. For xfs lock in touch_atime, we should do that since it
>> doesn't make sense to rollback all the stuff while we are almost at the
>> end of getdents because of a lock.
>
> That manoeuvres around the problem. Which I'm slightly more sensitive
> too as this review is a rather expensive one.
>
> Plus, it seems fixable in at least two ways:
>
> For both we need to be able to tell the filesystem that a nowait atime
> update is requested. Simple thing seems to me to add a S_NOWAIT flag to
> file_time_flags and passing that via i_op->update_time() which already
> has a flag argument. That would likely also help kiocb_modified().
>
> file_accessed()
> -> touch_atime()
> -> inode_update_time()
> -> i_op->update_time == xfs_vn_update_time()
>
> Then we have two options afaict:
>
> (1) best-effort atime update
>
> file_accessed() already has the builtin assumption that updating atime
> might fail for other reasons - see the comment in there. So it is
> somewhat best-effort already.
>
> (2) move atime update before calling into filesystem
>
> If we want to be sure that access time is updated when a readdir request
> is issued through io_uring then we need to have file_accessed() give a
> return value and expose a new helper for io_uring or modify
> vfs_getdents() to do something like:
>
> vfs_getdents()
> {
> if (nowait)
> down_read_trylock()
>
> if (!IS_DEADDIR(inode)) {
> ret = file_accessed(file);
> if (ret == -EAGAIN)
> goto out_unlock;
>
> f_op->iterate_shared()
> }
> }
>
> It's not unprecedented to do update atime before the actual operation
> has been done afaict. That's already the case in xfs_file_write_checks()
> which is called before anything is written. So that seems ok.
I'm not familiar with this part(the time update), I guess we should
revert the updated time if we succeed to do file_accessed(file) but
fail somewhere later in f_op->iterate_shared()? Or is it definitely
counted as an "access" as long as we start to call getdents to a file?
Thanks,
Hao
>
> Does any of these two options work for the xfs maintainers and Jens?
next prev parent reply other threads:[~2023-07-30 18:02 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-18 13:21 [PATCH v4 0/5] io_uring getdents Hao Xu
2023-07-18 13:21 ` [PATCH 1/5] fs: split off vfs_getdents function of getdents64 syscall Hao Xu
2023-07-18 13:21 ` [PATCH 2/5] vfs_getdents/struct dir_context: add flags field Hao Xu
2023-07-18 13:21 ` [PATCH 3/5] io_uring: add support for getdents Hao Xu
2023-07-19 8:56 ` Hao Xu
2023-07-26 15:00 ` Christian Brauner
2023-07-27 11:51 ` Hao Xu
2023-07-27 14:27 ` Christian Brauner
2023-07-27 15:12 ` Pavel Begunkov
2023-07-27 15:52 ` Christian Brauner
2023-07-27 16:17 ` Pavel Begunkov
2023-07-27 16:28 ` Christian Brauner
2023-07-31 1:58 ` Dave Chinner
2023-07-31 7:34 ` Hao Xu
2023-07-31 7:50 ` Christian Brauner
2023-07-31 7:40 ` Christian Brauner
2023-07-30 18:02 ` Hao Xu [this message]
2023-07-31 8:18 ` Christian Brauner
2023-07-31 9:31 ` Hao Xu
2023-07-31 1:33 ` Dave Chinner
2023-07-31 8:13 ` Christian Brauner
2023-07-31 15:26 ` Darrick J. Wong
2023-07-31 22:18 ` Dave Chinner
2023-08-01 0:28 ` Jens Axboe
2023-08-01 0:47 ` Matthew Wilcox
2023-08-01 0:49 ` Jens Axboe
2023-08-01 1:01 ` Matthew Wilcox
2023-08-01 7:00 ` Christian Brauner
2023-08-01 6:59 ` Christian Brauner
2023-08-01 7:17 ` Christian Brauner
2023-08-08 4:34 ` Hao Xu
2023-08-08 5:18 ` Hao Xu
2023-08-08 9:33 ` Hao Xu
2023-08-08 22:55 ` Jens Axboe
2023-08-01 18:39 ` Hao Xu
2023-07-18 13:21 ` [PATCH 4/5] xfs: add NOWAIT semantics for readdir Hao Xu
2023-07-19 2:35 ` kernel test robot
2023-07-19 6:22 ` Fwd: " Hao Xu
2023-07-18 13:21 ` [PATCH RFC 5/5] disable fixed file for io_uring getdents for now Hao Xu
2023-07-26 14:23 ` Christian Brauner
2023-07-27 12:09 ` Hao Xu
2023-07-19 6:04 ` [PATCH v4 0/5] io_uring getdents Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7adaea37-f84f-9415-41fa-53d36833f8f2@linux.dev \
--to=hao.xu@linux.dev \
--cc=asmadeus@codewreck.org \
--cc=asml.silence@gmail.com \
--cc=axboe@kernel.dk \
--cc=brauner@kernel.org \
--cc=bugs@claycon.org \
--cc=david@fromorbit.com \
--cc=djwong@kernel.org \
--cc=io-uring@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=shr@fb.com \
--cc=viro@zeniv.linux.org.uk \
--cc=wanpengli@tencent.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.