From: Amir Goldstein <amir73il@gmail.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: Christian Brauner <brauner@kernel.org>,
Josef Bacik <josef@toxicpanda.com>,
David Howells <dhowells@redhat.com>,
Al Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
Miklos Szeredi <miklos@szeredi.hu>,
linux-fsdevel@vger.kernel.org
Subject: [PATCH] fs: allow calling kiocb_end_write() unmatched with kiocb_start_write()
Date: Tue, 21 Nov 2023 15:25:51 +0200 [thread overview]
Message-ID: <20231121132551.2337431-1-amir73il@gmail.com> (raw)
We want to move kiocb_start_write() into vfs_iocb_iter_write(), after
the permission hook, but leave kiocb_end_write() in the write completion
handler of the callers of vfs_iocb_iter_write().
After this change, there will be no way of knowing in completion handler,
if write has failed before or after calling kiocb_start_write().
Add a flag IOCB_WRITE_STARTED, which is set and cleared internally by
kiocb_{start,end}_write(), so that kiocb_end_write() could be called for
cleanup of async write, whether it was successful or whether it failed
before or after calling kiocb_start_write().
This flag must not be copied by stacked filesystems (e.g. overlayfs)
that clone the iocb to another iocb for io request on a backing file.
Link: https://lore.kernel.org/r/CAOQ4uxihfJJRxxUhAmOwtD97Lg8PL8RgXw88rH1UfEeP8AtP+w@mail.gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
Jens,
The kiocb_{start,end}_write() helpers were originally taken out of
an early version of the permission hooks cleanup patches [1].
This early version of the helpers had the IOCB_WRITE_STARTED flag, but
when I posted the helpers independently from the cleanup series, you had
correctly pointed out [2] that IOCB_WRITE_STARTED is not needed for any
of the existing callers of kiocb_{start,end}_write(), so I removed it for
the final version of the helpers that got merged.
When coming back to the permission hook cleanup, I see that the
IOCB_WRITE_STARTED flag is needed to allow the new semantics of calling
kiocb_start_write() inside vfs_iocb_iter_write() and kiocb_end_write()
in completion callback.
I realize these semantics are not great, but the alternative of moving
the permission hook from vfs_iocb_iter_write() into all the callers is
worse IMO.
Can you accept the solution with IOCB_WRITE_STARTED state flag?
Have a better idea for cleaner iocb issue semantics that will allow
calling the permission hook with freeze protection held?
Thanks,
Amir.
[1] https://lore.kernel.org/linux-fsdevel/20231114153321.1716028-1-amir73il@gmail.com/
[2] https://lore.kernel.org/linux-fsdevel/e6948836-1d9d-4219-9a21-a2ab442a9a34@kernel.dk/
fs/overlayfs/file.c | 2 +-
include/linux/fs.h | 18 ++++++++++++++++--
2 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 131621daeb13..e4baa4ea5c95 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -455,7 +455,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
aio_req->orig_iocb = iocb;
kiocb_clone(&aio_req->iocb, iocb, get_file(real.file));
- aio_req->iocb.ki_flags = ifl;
+ aio_req->iocb.ki_flags &= ifl;
aio_req->iocb.ki_complete = ovl_aio_queue_completion;
refcount_set(&aio_req->ref, 2);
kiocb_start_write(&aio_req->iocb);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 98b7a7a8c42e..64414e146e1e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -352,6 +352,8 @@ enum rw_hint {
* unrelated IO (like cache flushing, new IO generation, etc).
*/
#define IOCB_DIO_CALLER_COMP (1 << 22)
+/* file_start_write() was called */
+#define IOCB_WRITE_STARTED (1 << 23)
/* for use in trace events */
#define TRACE_IOCB_STRINGS \
@@ -366,7 +368,8 @@ enum rw_hint {
{ IOCB_WAITQ, "WAITQ" }, \
{ IOCB_NOIO, "NOIO" }, \
{ IOCB_ALLOC_CACHE, "ALLOC_CACHE" }, \
- { IOCB_DIO_CALLER_COMP, "CALLER_COMP" }
+ { IOCB_DIO_CALLER_COMP, "CALLER_COMP" }, \
+ { IOCB_WRITE_STARTED, "WRITE_STARTED" }
struct kiocb {
struct file *ki_filp;
@@ -2183,12 +2186,15 @@ static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
};
}
+/* IOCB flags associated with ki_filp state must not be cloned */
+#define IOCB_CLONE_FLAGS_MASK (~IOCB_WRITE_STARTED)
+
static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
struct file *filp)
{
*kiocb = (struct kiocb) {
.ki_filp = filp,
- .ki_flags = kiocb_src->ki_flags,
+ .ki_flags = kiocb_src->ki_flags & IOCB_CLONE_FLAGS_MASK,
.ki_ioprio = kiocb_src->ki_ioprio,
.ki_pos = kiocb_src->ki_pos,
};
@@ -2744,12 +2750,16 @@ static inline void kiocb_start_write(struct kiocb *iocb)
{
struct inode *inode = file_inode(iocb->ki_filp);
+ if (WARN_ON_ONCE(iocb->ki_flags & IOCB_WRITE_STARTED))
+ return;
+
sb_start_write(inode->i_sb);
/*
* Fool lockdep by telling it the lock got released so that it
* doesn't complain about the held lock when we return to userspace.
*/
__sb_writers_release(inode->i_sb, SB_FREEZE_WRITE);
+ iocb->ki_flags |= IOCB_WRITE_STARTED;
}
/**
@@ -2762,11 +2772,15 @@ static inline void kiocb_end_write(struct kiocb *iocb)
{
struct inode *inode = file_inode(iocb->ki_filp);
+ if (!(iocb->ki_flags & IOCB_WRITE_STARTED))
+ return;
+
/*
* Tell lockdep we inherited freeze protection from submission thread.
*/
__sb_writers_acquired(inode->i_sb, SB_FREEZE_WRITE);
sb_end_write(inode->i_sb);
+ iocb->ki_flags &= ~IOCB_WRITE_STARTED;
}
/*
--
2.34.1
next reply other threads:[~2023-11-21 13:25 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-21 13:25 Amir Goldstein [this message]
2023-11-21 21:00 ` [PATCH] fs: allow calling kiocb_end_write() unmatched with kiocb_start_write() Josef Bacik
2023-11-21 21:30 ` Dave Chinner
2023-11-22 5:38 ` Amir Goldstein
2023-11-22 6:50 ` Christoph Hellwig
2023-11-22 10:20 ` Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231121132551.2337431-1-amir73il@gmail.com \
--to=amir73il@gmail.com \
--cc=axboe@kernel.dk \
--cc=brauner@kernel.org \
--cc=dhowells@redhat.com \
--cc=jack@suse.cz \
--cc=josef@toxicpanda.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=miklos@szeredi.hu \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).