* [PATCH 0/4] Overlayfs aio cleanups
@ 2023-09-12 17:36 Amir Goldstein
2023-09-12 17:36 ` [PATCH 1/4] ovl: protect copying of realinode attributes to ovl inode Amir Goldstein
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Amir Goldstein @ 2023-09-12 17:36 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-unionfs
Miklos,
Following are the cleanups and minor fixes I did in preparation for
factoring out of read/write passthrough code.
There are very minor fixes, which do not desreve backporting IMO.
The only meaningful fix (fd leak) was already merged to master.
I will send out the re-factoring patch separately to fsdevel, but
I've split those prep patches for lack of wider audience interest.
Thanks,
Amir.
Amir Goldstein (4):
ovl: protect copying of realinode attributes to ovl inode
ovl: use simpler function to convert iocb to rw flags
ovl: propagate IOCB_APPEND flag on writes to realfile
ovl: move ovl_file_accessed() to aio completion
fs/overlayfs/file.c | 75 +++++++++++++++++++++++++--------------------
fs/overlayfs/util.c | 2 ++
2 files changed, 44 insertions(+), 33 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/4] ovl: protect copying of realinode attributes to ovl inode
2023-09-12 17:36 [PATCH 0/4] Overlayfs aio cleanups Amir Goldstein
@ 2023-09-12 17:36 ` Amir Goldstein
2023-09-24 9:28 ` Amir Goldstein
2023-09-12 17:36 ` [PATCH 2/4] ovl: use simpler function to convert iocb to rw flags Amir Goldstein
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Amir Goldstein @ 2023-09-12 17:36 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-unionfs
ovl_copyattr() may be called concurrently from aio completion context
without any lock and that could lead to overlay inode attributes getting
permanently out of sync with real inode attributes.
Similarly, ovl_file_accessed() is always called without any lock to do
"compare & copy" of mtime/ctime from realinode to inode.
Use ovl inode spinlock to protect those two helpers.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/overlayfs/file.c | 2 ++
fs/overlayfs/util.c | 2 ++
2 files changed, 4 insertions(+)
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 4193633c4c7a..c6ad84cf9246 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -249,6 +249,7 @@ static void ovl_file_accessed(struct file *file)
if (!upperinode)
return;
+ spin_lock(&inode->i_lock);
ctime = inode_get_ctime(inode);
uctime = inode_get_ctime(upperinode);
if ((!timespec64_equal(&inode->i_mtime, &upperinode->i_mtime) ||
@@ -256,6 +257,7 @@ static void ovl_file_accessed(struct file *file)
inode->i_mtime = upperinode->i_mtime;
inode_set_ctime_to_ts(inode, uctime);
}
+ spin_unlock(&inode->i_lock);
touch_atime(&file->f_path);
}
diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
index 89e0d60d35b6..b7922862ece3 100644
--- a/fs/overlayfs/util.c
+++ b/fs/overlayfs/util.c
@@ -1403,6 +1403,7 @@ void ovl_copyattr(struct inode *inode)
realinode = ovl_i_path_real(inode, &realpath);
real_idmap = mnt_idmap(realpath.mnt);
+ spin_lock(&inode->i_lock);
vfsuid = i_uid_into_vfsuid(real_idmap, realinode);
vfsgid = i_gid_into_vfsgid(real_idmap, realinode);
@@ -1413,4 +1414,5 @@ void ovl_copyattr(struct inode *inode)
inode->i_mtime = realinode->i_mtime;
inode_set_ctime_to_ts(inode, inode_get_ctime(realinode));
i_size_write(inode, i_size_read(realinode));
+ spin_unlock(&inode->i_lock);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/4] ovl: use simpler function to convert iocb to rw flags
2023-09-12 17:36 [PATCH 0/4] Overlayfs aio cleanups Amir Goldstein
2023-09-12 17:36 ` [PATCH 1/4] ovl: protect copying of realinode attributes to ovl inode Amir Goldstein
@ 2023-09-12 17:36 ` Amir Goldstein
2023-09-12 17:36 ` [PATCH 3/4] ovl: propagate IOCB_APPEND flag on writes to realfile Amir Goldstein
2023-09-12 17:36 ` [PATCH 4/4] ovl: move ovl_file_accessed() to aio completion Amir Goldstein
3 siblings, 0 replies; 9+ messages in thread
From: Amir Goldstein @ 2023-09-12 17:36 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-unionfs, Alessio Balsini
Overlayfs implements its own function to translate iocb flags into rw
flags, so that they can be passed into another vfs call.
With commit ce71bfea207b4 ("fs: align IOCB_* flags with RWF_* flags")
Jens created a 1:1 matching between the iocb flags and rw flags,
simplifying the conversion.
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/overlayfs/file.c | 34 ++++++++++++++--------------------
1 file changed, 14 insertions(+), 20 deletions(-)
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index c6ad84cf9246..98f0dedffc0f 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -262,20 +262,12 @@ static void ovl_file_accessed(struct file *file)
touch_atime(&file->f_path);
}
-static rwf_t ovl_iocb_to_rwf(int ifl)
+#define OVL_IOCB_MASK \
+ (IOCB_NOWAIT | IOCB_HIPRI | IOCB_DSYNC | IOCB_SYNC)
+
+static rwf_t iocb_to_rw_flags(int flags)
{
- rwf_t flags = 0;
-
- if (ifl & IOCB_NOWAIT)
- flags |= RWF_NOWAIT;
- if (ifl & IOCB_HIPRI)
- flags |= RWF_HIPRI;
- if (ifl & IOCB_DSYNC)
- flags |= RWF_DSYNC;
- if (ifl & IOCB_SYNC)
- flags |= RWF_SYNC;
-
- return flags;
+ return (__force rwf_t)(flags & OVL_IOCB_MASK);
}
static inline void ovl_aio_put(struct ovl_aio_req *aio_req)
@@ -333,8 +325,9 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
old_cred = ovl_override_creds(file_inode(file)->i_sb);
if (is_sync_kiocb(iocb)) {
- ret = vfs_iter_read(real.file, iter, &iocb->ki_pos,
- ovl_iocb_to_rwf(iocb->ki_flags));
+ rwf_t rwf = iocb_to_rw_flags(iocb->ki_flags);
+
+ ret = vfs_iter_read(real.file, iter, &iocb->ki_pos, rwf);
} else {
struct ovl_aio_req *aio_req;
@@ -369,7 +362,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
struct fd real;
const struct cred *old_cred;
ssize_t ret;
- int ifl = iocb->ki_flags;
+ int flags = iocb->ki_flags;
if (!iov_iter_count(iter))
return 0;
@@ -391,13 +384,14 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
goto out_fdput;
if (!ovl_should_sync(OVL_FS(inode->i_sb)))
- ifl &= ~(IOCB_DSYNC | IOCB_SYNC);
+ flags &= ~(IOCB_DSYNC | IOCB_SYNC);
old_cred = ovl_override_creds(file_inode(file)->i_sb);
if (is_sync_kiocb(iocb)) {
+ rwf_t rwf = iocb_to_rw_flags(flags);
+
file_start_write(real.file);
- ret = vfs_iter_write(real.file, iter, &iocb->ki_pos,
- ovl_iocb_to_rwf(ifl));
+ ret = vfs_iter_write(real.file, iter, &iocb->ki_pos, rwf);
file_end_write(real.file);
/* Update size */
ovl_copyattr(inode);
@@ -412,7 +406,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
real.flags = 0;
aio_req->orig_iocb = iocb;
kiocb_clone(&aio_req->iocb, iocb, get_file(real.file));
- aio_req->iocb.ki_flags = ifl;
+ aio_req->iocb.ki_flags = flags;
aio_req->iocb.ki_complete = ovl_aio_rw_complete;
refcount_set(&aio_req->ref, 2);
kiocb_start_write(&aio_req->iocb);
--
2.34.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 3/4] ovl: propagate IOCB_APPEND flag on writes to realfile
2023-09-12 17:36 [PATCH 0/4] Overlayfs aio cleanups Amir Goldstein
2023-09-12 17:36 ` [PATCH 1/4] ovl: protect copying of realinode attributes to ovl inode Amir Goldstein
2023-09-12 17:36 ` [PATCH 2/4] ovl: use simpler function to convert iocb to rw flags Amir Goldstein
@ 2023-09-12 17:36 ` Amir Goldstein
2023-09-12 17:36 ` [PATCH 4/4] ovl: move ovl_file_accessed() to aio completion Amir Goldstein
3 siblings, 0 replies; 9+ messages in thread
From: Amir Goldstein @ 2023-09-12 17:36 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-unionfs
If ovl file is opened O_APPEND, the underlying realfile is also
opened O_APPEND, so it makes sense to propagate the IOCB_APPEND flags
on sync writes to realfile, just as we do with aio writes.
Effectively, because sync ovl writes are protected by inode lock,
this change only makes a difference if the realfile is written to (size
extending writes) from underneath overlayfs. The behavior in this case
is undefined, so it is ok if we change the behavior (to fail the ovl
IOCB_APPEND write).
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/overlayfs/file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 98f0dedffc0f..ffebffa710b5 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -263,7 +263,7 @@ static void ovl_file_accessed(struct file *file)
}
#define OVL_IOCB_MASK \
- (IOCB_NOWAIT | IOCB_HIPRI | IOCB_DSYNC | IOCB_SYNC)
+ (IOCB_NOWAIT | IOCB_HIPRI | IOCB_DSYNC | IOCB_SYNC | IOCB_APPEND)
static rwf_t iocb_to_rw_flags(int flags)
{
--
2.34.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 4/4] ovl: move ovl_file_accessed() to aio completion
2023-09-12 17:36 [PATCH 0/4] Overlayfs aio cleanups Amir Goldstein
` (2 preceding siblings ...)
2023-09-12 17:36 ` [PATCH 3/4] ovl: propagate IOCB_APPEND flag on writes to realfile Amir Goldstein
@ 2023-09-12 17:36 ` Amir Goldstein
2023-09-28 5:11 ` Amir Goldstein
3 siblings, 1 reply; 9+ messages in thread
From: Amir Goldstein @ 2023-09-12 17:36 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-unionfs
Refactor the io completion helpers to handlers with clear hierarchy:
- ovl_aio_rw_complete() is called on completion of submitted aio
`- ovl_aio_cleanup() is called after any aio submission attempt
`- ovl_rw_complete() is called after any io attempt
Move ovl_copyattr() and ovl_file_accessed() to the common helper
ovl_rw_complete(), so that they are called after aio completion.
Note that moving ovl_file_accessed() changes touch_atime() therein to
be called with mounter credentials in the sync read case.
It does not seem to matter with which credentials touch_atime() is called.
If it did matter, we would have needed to override to mounter credentials
in ovl_update_time() which calls touch_atime() on upper dentry.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/overlayfs/file.c | 39 ++++++++++++++++++++++++++-------------
1 file changed, 26 insertions(+), 13 deletions(-)
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index ffebffa710b5..05ec614f7054 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -278,29 +278,43 @@ static inline void ovl_aio_put(struct ovl_aio_req *aio_req)
}
}
-static void ovl_aio_cleanup_handler(struct ovl_aio_req *aio_req)
+/* Completion for submitted/failed sync/async rw io */
+static void ovl_rw_complete(struct kiocb *orig_iocb)
+{
+ struct file *file = orig_iocb->ki_filp;
+
+ if (orig_iocb->ki_flags & IOCB_WRITE) {
+ /* Update size/mtime */
+ ovl_copyattr(file_inode(file));
+ } else {
+ /* Update atime */
+ ovl_file_accessed(file);
+ }
+}
+
+/* Completion for submitted/failed async rw io */
+static void ovl_aio_cleanup(struct ovl_aio_req *aio_req)
{
struct kiocb *iocb = &aio_req->iocb;
struct kiocb *orig_iocb = aio_req->orig_iocb;
- if (iocb->ki_flags & IOCB_WRITE) {
- struct inode *inode = file_inode(orig_iocb->ki_filp);
-
+ if (iocb->ki_flags & IOCB_WRITE)
kiocb_end_write(iocb);
- ovl_copyattr(inode);
- }
orig_iocb->ki_pos = iocb->ki_pos;
+ ovl_rw_complete(orig_iocb);
+
ovl_aio_put(aio_req);
}
+/* Completion for submitted async rw io */
static void ovl_aio_rw_complete(struct kiocb *iocb, long res)
{
struct ovl_aio_req *aio_req = container_of(iocb,
struct ovl_aio_req, iocb);
struct kiocb *orig_iocb = aio_req->orig_iocb;
- ovl_aio_cleanup_handler(aio_req);
+ ovl_aio_cleanup(aio_req);
orig_iocb->ki_complete(orig_iocb, res);
}
@@ -328,6 +342,7 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
rwf_t rwf = iocb_to_rw_flags(iocb->ki_flags);
ret = vfs_iter_read(real.file, iter, &iocb->ki_pos, rwf);
+ ovl_rw_complete(iocb);
} else {
struct ovl_aio_req *aio_req;
@@ -344,11 +359,10 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
ret = vfs_iocb_iter_read(real.file, &aio_req->iocb, iter);
ovl_aio_put(aio_req);
if (ret != -EIOCBQUEUED)
- ovl_aio_cleanup_handler(aio_req);
+ ovl_aio_cleanup(aio_req);
}
out:
revert_creds(old_cred);
- ovl_file_accessed(file);
out_fdput:
fdput(real);
@@ -386,15 +400,14 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
if (!ovl_should_sync(OVL_FS(inode->i_sb)))
flags &= ~(IOCB_DSYNC | IOCB_SYNC);
- old_cred = ovl_override_creds(file_inode(file)->i_sb);
+ old_cred = ovl_override_creds(inode->i_sb);
if (is_sync_kiocb(iocb)) {
rwf_t rwf = iocb_to_rw_flags(flags);
file_start_write(real.file);
ret = vfs_iter_write(real.file, iter, &iocb->ki_pos, rwf);
file_end_write(real.file);
- /* Update size */
- ovl_copyattr(inode);
+ ovl_rw_complete(iocb);
} else {
struct ovl_aio_req *aio_req;
@@ -413,7 +426,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
ret = vfs_iocb_iter_write(real.file, &aio_req->iocb, iter);
ovl_aio_put(aio_req);
if (ret != -EIOCBQUEUED)
- ovl_aio_cleanup_handler(aio_req);
+ ovl_aio_cleanup(aio_req);
}
out:
revert_creds(old_cred);
--
2.34.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/4] ovl: protect copying of realinode attributes to ovl inode
2023-09-12 17:36 ` [PATCH 1/4] ovl: protect copying of realinode attributes to ovl inode Amir Goldstein
@ 2023-09-24 9:28 ` Amir Goldstein
2023-09-26 21:17 ` Amir Goldstein
0 siblings, 1 reply; 9+ messages in thread
From: Amir Goldstein @ 2023-09-24 9:28 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-unionfs, Jan Kara, Christian Brauner, Jeff Layton
[adding some folks from the multigrain ctime discussion]
On Tue, Sep 12, 2023 at 8:36 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> ovl_copyattr() may be called concurrently from aio completion context
> without any lock and that could lead to overlay inode attributes getting
> permanently out of sync with real inode attributes.
>
> Similarly, ovl_file_accessed() is always called without any lock to do
> "compare & copy" of mtime/ctime from realinode to inode.
>
> Use ovl inode spinlock to protect those two helpers.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> fs/overlayfs/file.c | 2 ++
> fs/overlayfs/util.c | 2 ++
> 2 files changed, 4 insertions(+)
>
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index 4193633c4c7a..c6ad84cf9246 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -249,6 +249,7 @@ static void ovl_file_accessed(struct file *file)
> if (!upperinode)
> return;
>
> + spin_lock(&inode->i_lock);
> ctime = inode_get_ctime(inode);
> uctime = inode_get_ctime(upperinode);
> if ((!timespec64_equal(&inode->i_mtime, &upperinode->i_mtime) ||
> !timespec64_equal(&ctime, &uctime))) {
> inode->i_mtime = upperinode->i_mtime;
> inode_set_ctime_to_ts(inode, uctime);
> }
> + spin_unlock(&inode->i_lock);
>
[patch manually edited to add missing line to context]
Miklos,
I am having latent concerns about this patch, which is currently
in overlayfs-next.
What was your reason for the "compare & copy" optimization?
I assume it was to avoid cache line bouncing. yes?
Would the added spinlock make this optimization moot?
I am asking since I calculated that on X86-64 and with
CONFIG_FS_POSIX_ACL && CONFIG_SECURITY
i_ctime.tv_nsec and i_lock are on the same cache line.
I should note for the non-overlayfs developers, that we do
not care to worry about changes to the upperinode that are
done behind the back of overlayfs.
Ovrerlayfs assumes that it is the only one to make changes
to the upperinode, otherwise, behavior is undefined.
This is the justification for only taking the overlayfs inode
i_lock for synchronization of this "compare & copy" routine.
Also, I think we only need to care that the overlayfs inode
timestamps are eventually consistent, after the last
ovl_file_accessed() call.
The decraled reason (in commit message) for adding the lock
here is to protect from races in the ovl aio code path, which was
not around when the ovl_file_accessed() helper was written.
But now I am wondering:
- Is the lock needed in all the sync calls?
- Is it needed even if there was no aio at all?
I think the answer is yes to both questions, so the patch
can remain in its current form, but I'm not 100% sure.
> touch_atime(&file->f_path);
> }
> diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> index 89e0d60d35b6..b7922862ece3 100644
> --- a/fs/overlayfs/util.c
> +++ b/fs/overlayfs/util.c
> @@ -1403,6 +1403,7 @@ void ovl_copyattr(struct inode *inode)
> realinode = ovl_i_path_real(inode, &realpath);
> real_idmap = mnt_idmap(realpath.mnt);
>
> + spin_lock(&inode->i_lock);
> vfsuid = i_uid_into_vfsuid(real_idmap, realinode);
> vfsgid = i_gid_into_vfsgid(real_idmap, realinode);
>
> inode->i_uid = vfsuid_into_kuid(vfsuid);
> inode->i_gid = vfsgid_into_kgid(vfsgid);
> inode->i_mode = realinode->i_mode;
> inode->i_mtime = realinode->i_mtime;
> inode_set_ctime_to_ts(inode, inode_get_ctime(realinode));
> i_size_write(inode, i_size_read(realinode));
> + spin_unlock(&inode->i_lock);
My concerns about this part of the patch is that AFAIK,
all the calls of ovl_copyattr(), except for the one in aio completion,
are called with overlayfs inode mutex lock held.
So this lock is strictly only needed because of the aio write case,
but I think we need to have the lock in place for all the other
cases to protect them against racing with aio completion?
I guess the overhead of the spinlock is not a worry if the
mutex is already held, even though we do call ovl_copyattr()
twice (before and after) in some operations.
Anyway, I just want to make sure that I did not make any
mistakes in my analysis of the problem and the fix.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/4] ovl: protect copying of realinode attributes to ovl inode
2023-09-24 9:28 ` Amir Goldstein
@ 2023-09-26 21:17 ` Amir Goldstein
2023-10-01 14:11 ` Amir Goldstein
0 siblings, 1 reply; 9+ messages in thread
From: Amir Goldstein @ 2023-09-26 21:17 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-unionfs, Jan Kara, Christian Brauner, Jeff Layton
On Sun, Sep 24, 2023 at 12:28 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> [adding some folks from the multigrain ctime discussion]
>
> On Tue, Sep 12, 2023 at 8:36 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > ovl_copyattr() may be called concurrently from aio completion context
> > without any lock and that could lead to overlay inode attributes getting
> > permanently out of sync with real inode attributes.
> >
> > Similarly, ovl_file_accessed() is always called without any lock to do
> > "compare & copy" of mtime/ctime from realinode to inode.
> >
> > Use ovl inode spinlock to protect those two helpers.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> > fs/overlayfs/file.c | 2 ++
> > fs/overlayfs/util.c | 2 ++
> > 2 files changed, 4 insertions(+)
> >
> > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > index 4193633c4c7a..c6ad84cf9246 100644
> > --- a/fs/overlayfs/file.c
> > +++ b/fs/overlayfs/file.c
> > @@ -249,6 +249,7 @@ static void ovl_file_accessed(struct file *file)
> > if (!upperinode)
> > return;
> >
> > + spin_lock(&inode->i_lock);
> > ctime = inode_get_ctime(inode);
> > uctime = inode_get_ctime(upperinode);
> > if ((!timespec64_equal(&inode->i_mtime, &upperinode->i_mtime) ||
> > !timespec64_equal(&ctime, &uctime))) {
> > inode->i_mtime = upperinode->i_mtime;
> > inode_set_ctime_to_ts(inode, uctime);
> > }
> > + spin_unlock(&inode->i_lock);
> >
>
> [patch manually edited to add missing line to context]
>
> Miklos,
>
> I am having latent concerns about this patch, which is currently
> in overlayfs-next.
>
> What was your reason for the "compare & copy" optimization?
> I assume it was to avoid cache line bouncing. yes?
> Would the added spinlock make this optimization moot?
> I am asking since I calculated that on X86-64 and with
> CONFIG_FS_POSIX_ACL && CONFIG_SECURITY
> i_ctime.tv_nsec and i_lock are on the same cache line.
>
> I should note for the non-overlayfs developers, that we do
> not care to worry about changes to the upperinode that are
> done behind the back of overlayfs.
>
> Ovrerlayfs assumes that it is the only one to make changes
> to the upperinode, otherwise, behavior is undefined.
>
> This is the justification for only taking the overlayfs inode
> i_lock for synchronization of this "compare & copy" routine.
>
> Also, I think we only need to care that the overlayfs inode
> timestamps are eventually consistent, after the last
> ovl_file_accessed() call.
>
> The decraled reason (in commit message) for adding the lock
> here is to protect from races in the ovl aio code path, which was
> not around when the ovl_file_accessed() helper was written.
>
> But now I am wondering:
> - Is the lock needed in all the sync calls?
> - Is it needed even if there was no aio at all?
>
> I think the answer is yes to both questions, so the patch
> can remain in its current form, but I'm not 100% sure.
>
> > touch_atime(&file->f_path);
> > }
> > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> > index 89e0d60d35b6..b7922862ece3 100644
> > --- a/fs/overlayfs/util.c
> > +++ b/fs/overlayfs/util.c
> > @@ -1403,6 +1403,7 @@ void ovl_copyattr(struct inode *inode)
> > realinode = ovl_i_path_real(inode, &realpath);
> > real_idmap = mnt_idmap(realpath.mnt);
> >
> > + spin_lock(&inode->i_lock);
> > vfsuid = i_uid_into_vfsuid(real_idmap, realinode);
> > vfsgid = i_gid_into_vfsgid(real_idmap, realinode);
> >
> > inode->i_uid = vfsuid_into_kuid(vfsuid);
> > inode->i_gid = vfsgid_into_kgid(vfsgid);
> > inode->i_mode = realinode->i_mode;
> > inode->i_mtime = realinode->i_mtime;
> > inode_set_ctime_to_ts(inode, inode_get_ctime(realinode));
> > i_size_write(inode, i_size_read(realinode));
> > + spin_unlock(&inode->i_lock);
>
> My concerns about this part of the patch is that AFAIK,
> all the calls of ovl_copyattr(), except for the one in aio completion,
> are called with overlayfs inode mutex lock held.
>
> So this lock is strictly only needed because of the aio write case,
> but I think we need to have the lock in place for all the other
> cases to protect them against racing with aio completion?
>
> I guess the overhead of the spinlock is not a worry if the
> mutex is already held, even though we do call ovl_copyattr()
> twice (before and after) in some operations.
>
> Anyway, I just want to make sure that I did not make any
> mistakes in my analysis of the problem and the fix.
>
FYI, for now I dropped this patch from overlayfs-next, because
I was warned that ovl_copyattr() could be called from interrupt
context in aio completion.
I will bring it back after I sort out the ovl aio completion context.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 4/4] ovl: move ovl_file_accessed() to aio completion
2023-09-12 17:36 ` [PATCH 4/4] ovl: move ovl_file_accessed() to aio completion Amir Goldstein
@ 2023-09-28 5:11 ` Amir Goldstein
0 siblings, 0 replies; 9+ messages in thread
From: Amir Goldstein @ 2023-09-28 5:11 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-unionfs
On Tue, Sep 12, 2023 at 8:37 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> Refactor the io completion helpers to handlers with clear hierarchy:
> - ovl_aio_rw_complete() is called on completion of submitted aio
> `- ovl_aio_cleanup() is called after any aio submission attempt
> `- ovl_rw_complete() is called after any io attempt
>
> Move ovl_copyattr() and ovl_file_accessed() to the common helper
> ovl_rw_complete(), so that they are called after aio completion.
>
> Note that moving ovl_file_accessed() changes touch_atime() therein to
> be called with mounter credentials in the sync read case.
>
> It does not seem to matter with which credentials touch_atime() is called.
> If it did matter, we would have needed to override to mounter credentials
> in ovl_update_time() which calls touch_atime() on upper dentry.
>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
This commit is nonsense. It does not fix any bugs.
Direct aio also calls file_accessed() before submission,
regardless of overlayfs.
And on top of that, this commit has a bug that regresses sync io.
So I am dropping it from overlayfs-next.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 1/4] ovl: protect copying of realinode attributes to ovl inode
2023-09-26 21:17 ` Amir Goldstein
@ 2023-10-01 14:11 ` Amir Goldstein
0 siblings, 0 replies; 9+ messages in thread
From: Amir Goldstein @ 2023-10-01 14:11 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: linux-unionfs, Jan Kara, Christian Brauner, Jeff Layton
On Wed, Sep 27, 2023 at 12:17 AM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Sun, Sep 24, 2023 at 12:28 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > [adding some folks from the multigrain ctime discussion]
> >
> > On Tue, Sep 12, 2023 at 8:36 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > ovl_copyattr() may be called concurrently from aio completion context
> > > without any lock and that could lead to overlay inode attributes getting
> > > permanently out of sync with real inode attributes.
> > >
> > > Similarly, ovl_file_accessed() is always called without any lock to do
> > > "compare & copy" of mtime/ctime from realinode to inode.
> > >
> > > Use ovl inode spinlock to protect those two helpers.
> > >
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > > ---
> > > fs/overlayfs/file.c | 2 ++
> > > fs/overlayfs/util.c | 2 ++
> > > 2 files changed, 4 insertions(+)
> > >
> > > diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> > > index 4193633c4c7a..c6ad84cf9246 100644
> > > --- a/fs/overlayfs/file.c
> > > +++ b/fs/overlayfs/file.c
> > > @@ -249,6 +249,7 @@ static void ovl_file_accessed(struct file *file)
> > > if (!upperinode)
> > > return;
> > >
> > > + spin_lock(&inode->i_lock);
> > > ctime = inode_get_ctime(inode);
> > > uctime = inode_get_ctime(upperinode);
> > > if ((!timespec64_equal(&inode->i_mtime, &upperinode->i_mtime) ||
> > > !timespec64_equal(&ctime, &uctime))) {
> > > inode->i_mtime = upperinode->i_mtime;
> > > inode_set_ctime_to_ts(inode, uctime);
> > > }
> > > + spin_unlock(&inode->i_lock);
> > >
> >
> > [patch manually edited to add missing line to context]
> >
> > Miklos,
> >
> > I am having latent concerns about this patch, which is currently
> > in overlayfs-next.
> >
> > What was your reason for the "compare & copy" optimization?
> > I assume it was to avoid cache line bouncing. yes?
> > Would the added spinlock make this optimization moot?
> > I am asking since I calculated that on X86-64 and with
> > CONFIG_FS_POSIX_ACL && CONFIG_SECURITY
> > i_ctime.tv_nsec and i_lock are on the same cache line.
> >
> > I should note for the non-overlayfs developers, that we do
> > not care to worry about changes to the upperinode that are
> > done behind the back of overlayfs.
> >
> > Ovrerlayfs assumes that it is the only one to make changes
> > to the upperinode, otherwise, behavior is undefined.
> >
> > This is the justification for only taking the overlayfs inode
> > i_lock for synchronization of this "compare & copy" routine.
> >
> > Also, I think we only need to care that the overlayfs inode
> > timestamps are eventually consistent, after the last
> > ovl_file_accessed() call.
> >
> > The decraled reason (in commit message) for adding the lock
> > here is to protect from races in the ovl aio code path, which was
> > not around when the ovl_file_accessed() helper was written.
> >
> > But now I am wondering:
> > - Is the lock needed in all the sync calls?
> > - Is it needed even if there was no aio at all?
> >
> > I think the answer is yes to both questions, so the patch
> > can remain in its current form, but I'm not 100% sure.
> >
> > > touch_atime(&file->f_path);
> > > }
> > > diff --git a/fs/overlayfs/util.c b/fs/overlayfs/util.c
> > > index 89e0d60d35b6..b7922862ece3 100644
> > > --- a/fs/overlayfs/util.c
> > > +++ b/fs/overlayfs/util.c
> > > @@ -1403,6 +1403,7 @@ void ovl_copyattr(struct inode *inode)
> > > realinode = ovl_i_path_real(inode, &realpath);
> > > real_idmap = mnt_idmap(realpath.mnt);
> > >
> > > + spin_lock(&inode->i_lock);
> > > vfsuid = i_uid_into_vfsuid(real_idmap, realinode);
> > > vfsgid = i_gid_into_vfsgid(real_idmap, realinode);
> > >
> > > inode->i_uid = vfsuid_into_kuid(vfsuid);
> > > inode->i_gid = vfsgid_into_kgid(vfsgid);
> > > inode->i_mode = realinode->i_mode;
> > > inode->i_mtime = realinode->i_mtime;
> > > inode_set_ctime_to_ts(inode, inode_get_ctime(realinode));
> > > i_size_write(inode, i_size_read(realinode));
> > > + spin_unlock(&inode->i_lock);
> >
> > My concerns about this part of the patch is that AFAIK,
> > all the calls of ovl_copyattr(), except for the one in aio completion,
> > are called with overlayfs inode mutex lock held.
> >
> > So this lock is strictly only needed because of the aio write case,
> > but I think we need to have the lock in place for all the other
> > cases to protect them against racing with aio completion?
> >
> > I guess the overhead of the spinlock is not a worry if the
> > mutex is already held, even though we do call ovl_copyattr()
> > twice (before and after) in some operations.
> >
> > Anyway, I just want to make sure that I did not make any
> > mistakes in my analysis of the problem and the fix.
> >
>
> FYI, for now I dropped this patch from overlayfs-next, because
> I was warned that ovl_copyattr() could be called from interrupt
> context in aio completion.
>
> I will bring it back after I sort out the ovl aio completion context.
>
FYI, after applying "ovl: punt write aio completion to workqueue" [1]
to overlayfs-next, I restored the half of this patch that adds
spinlock protection to ovl_copyattr().
But I dropped the half that adds spinlock protection to
ovl_file_accessed(), because I do not want to add a lock to the read
path and because ovl_file_accessed() is not anymore unsafe than
relatime_need_update() already is w.r.t atomic access to struct timespec64
and w.r.t atomic access to mtime/ctime/atime fields, so I guess nobody
really cares so much about being very strict with relatime rules...
Thanks,
Amir.
[1] https://lore.kernel.org/linux-unionfs/20230928064636.487317-1-amir73il@gmail.com/
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2023-10-01 14:11 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-12 17:36 [PATCH 0/4] Overlayfs aio cleanups Amir Goldstein
2023-09-12 17:36 ` [PATCH 1/4] ovl: protect copying of realinode attributes to ovl inode Amir Goldstein
2023-09-24 9:28 ` Amir Goldstein
2023-09-26 21:17 ` Amir Goldstein
2023-10-01 14:11 ` Amir Goldstein
2023-09-12 17:36 ` [PATCH 2/4] ovl: use simpler function to convert iocb to rw flags Amir Goldstein
2023-09-12 17:36 ` [PATCH 3/4] ovl: propagate IOCB_APPEND flag on writes to realfile Amir Goldstein
2023-09-12 17:36 ` [PATCH 4/4] ovl: move ovl_file_accessed() to aio completion Amir Goldstein
2023-09-28 5:11 ` Amir Goldstein
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).