[PATCH 1/2] include/linux/fs.h: add inode_lock

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/2] include/linux/fs.h: add inode_lock_killable()
@ 2025-04-29  9:46 Max Kellermann
  2025-04-29  9:46 ` [PATCH 2/2] fs: make several inode lock operations killable Max Kellermann
  0 siblings, 1 reply; 8+ messages in thread
From: Max Kellermann @ 2025-04-29  9:46 UTC (permalink / raw)
  To: viro, brauner, jack, linux-fsdevel, linux-kernel; +Cc: Max Kellermann

Prepare for making inode operations killable while they're waiting for
the lock.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
 include/linux/fs.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 016b0fe1536e..5e4ac873228d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -867,6 +867,11 @@ static inline void inode_lock(struct inode *inode)
 	down_write(&inode->i_rwsem);
 }
 
+static inline __must_check int inode_lock_killable(struct inode *inode)
+{
+	return down_write_killable(&inode->i_rwsem);
+}
+
 static inline void inode_unlock(struct inode *inode)
 {
 	up_write(&inode->i_rwsem);
@@ -877,6 +882,11 @@ static inline void inode_lock_shared(struct inode *inode)
 	down_read(&inode->i_rwsem);
 }
 
+static inline __must_check int inode_lock_shared_killable(struct inode *inode)
+{
+	return down_read_killable(&inode->i_rwsem);
+}
+
 static inline void inode_unlock_shared(struct inode *inode)
 {
 	up_read(&inode->i_rwsem);
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] fs: make several inode lock operations killable
  2025-04-29  9:46 [PATCH 1/2] include/linux/fs.h: add inode_lock_killable() Max Kellermann
@ 2025-04-29  9:46 ` Max Kellermann
  2025-04-29 11:12   ` Christian Brauner
  0 siblings, 1 reply; 8+ messages in thread
From: Max Kellermann @ 2025-04-29  9:46 UTC (permalink / raw)
  To: viro, brauner, jack, linux-fsdevel, linux-kernel; +Cc: Max Kellermann

Allows killing processes that are waiting for the inode lock.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
---
 fs/open.c       | 14 +++++++++++---
 fs/read_write.c |  4 +++-
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index a9063cca9911..7828234a7caa 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -60,7 +60,10 @@ int do_truncate(struct mnt_idmap *idmap, struct dentry *dentry,
 	if (ret)
 		newattrs.ia_valid |= ret | ATTR_FORCE;
 
-	inode_lock(dentry->d_inode);
+	ret = inode_lock_killable(dentry->d_inode);
+	if (ret)
+		return ret;
+
 	/* Note any delegations or leases have already been broken: */
 	ret = notify_change(idmap, dentry, &newattrs, NULL);
 	inode_unlock(dentry->d_inode);
@@ -635,7 +638,9 @@ int chmod_common(const struct path *path, umode_t mode)
 	if (error)
 		return error;
 retry_deleg:
-	inode_lock(inode);
+	error = inode_lock_killable(inode);
+	if (error)
+		goto out_mnt_unlock;
 	error = security_path_chmod(path, mode);
 	if (error)
 		goto out_unlock;
@@ -650,6 +655,7 @@ int chmod_common(const struct path *path, umode_t mode)
 		if (!error)
 			goto retry_deleg;
 	}
+out_mnt_unlock:
 	mnt_drop_write(path->mnt);
 	return error;
 }
@@ -769,7 +775,9 @@ int chown_common(const struct path *path, uid_t user, gid_t group)
 		return -EINVAL;
 	if ((group != (gid_t)-1) && !setattr_vfsgid(&newattrs, gid))
 		return -EINVAL;
-	inode_lock(inode);
+	error = inode_lock_killable(inode);
+	if (error)
+		return error;
 	if (!S_ISDIR(inode->i_mode))
 		newattrs.ia_valid |= ATTR_KILL_SUID | ATTR_KILL_PRIV |
 				     setattr_should_drop_sgid(idmap, inode);
diff --git a/fs/read_write.c b/fs/read_write.c
index bb0ed26a0b3a..0ef70e128c4a 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -332,7 +332,9 @@ loff_t default_llseek(struct file *file, loff_t offset, int whence)
 	struct inode *inode = file_inode(file);
 	loff_t retval;
 
-	inode_lock(inode);
+	retval = inode_lock_killable(inode);
+	if (retval)
+		return retval;
 	switch (whence) {
 		case SEEK_END:
 			offset += i_size_read(inode);
-- 
2.47.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] fs: make several inode lock operations killable
  2025-04-29  9:46 ` [PATCH 2/2] fs: make several inode lock operations killable Max Kellermann
@ 2025-04-29 11:12   ` Christian Brauner
  2025-04-29 11:28     ` Max Kellermann
  0 siblings, 1 reply; 8+ messages in thread
From: Christian Brauner @ 2025-04-29 11:12 UTC (permalink / raw)
  To: Max Kellermann; +Cc: viro, jack, linux-fsdevel, linux-kernel

On Tue, Apr 29, 2025 at 11:46:44AM +0200, Max Kellermann wrote:
> Allows killing processes that are waiting for the inode lock.
> 
> Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
> ---

Oha, that's interesting.

>  fs/open.c       | 14 +++++++++++---
>  fs/read_write.c |  4 +++-
>  2 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index a9063cca9911..7828234a7caa 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -60,7 +60,10 @@ int do_truncate(struct mnt_idmap *idmap, struct dentry *dentry,
>  	if (ret)
>  		newattrs.ia_valid |= ret | ATTR_FORCE;
>  
> -	inode_lock(dentry->d_inode);
> +	ret = inode_lock_killable(dentry->d_inode);
> +	if (ret)
> +		return ret;
> +
>  	/* Note any delegations or leases have already been broken: */
>  	ret = notify_change(idmap, dentry, &newattrs, NULL);
>  	inode_unlock(dentry->d_inode);
> @@ -635,7 +638,9 @@ int chmod_common(const struct path *path, umode_t mode)
>  	if (error)
>  		return error;
>  retry_deleg:
> -	inode_lock(inode);
> +	error = inode_lock_killable(inode);

That's probably fine.

> +	if (error)
> +		goto out_mnt_unlock;
>  	error = security_path_chmod(path, mode);
>  	if (error)
>  		goto out_unlock;
> @@ -650,6 +655,7 @@ int chmod_common(const struct path *path, umode_t mode)
>  		if (!error)
>  			goto retry_deleg;
>  	}
> +out_mnt_unlock:
>  	mnt_drop_write(path->mnt);
>  	return error;
>  }
> @@ -769,7 +775,9 @@ int chown_common(const struct path *path, uid_t user, gid_t group)
>  		return -EINVAL;
>  	if ((group != (gid_t)-1) && !setattr_vfsgid(&newattrs, gid))
>  		return -EINVAL;
> -	inode_lock(inode);
> +	error = inode_lock_killable(inode);
> +	if (error)
> +		return error;

That too.

>  	if (!S_ISDIR(inode->i_mode))
>  		newattrs.ia_valid |= ATTR_KILL_SUID | ATTR_KILL_PRIV |
>  				     setattr_should_drop_sgid(idmap, inode);
> diff --git a/fs/read_write.c b/fs/read_write.c
> index bb0ed26a0b3a..0ef70e128c4a 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -332,7 +332,9 @@ loff_t default_llseek(struct file *file, loff_t offset, int whence)
>  	struct inode *inode = file_inode(file);
>  	loff_t retval;
>  
> -	inode_lock(inode);
> +	retval = inode_lock_killable(inode);

That change doesn't seem so obviously fine to me.
Either way I'd like to see this split in three patches and some
reasoning why it's safe and some justification why it's wanted...

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] fs: make several inode lock operations killable
  2025-04-29 11:12   ` Christian Brauner
@ 2025-04-29 11:28     ` Max Kellermann
  2025-05-12  9:52       ` Christian Brauner
  0 siblings, 1 reply; 8+ messages in thread
From: Max Kellermann @ 2025-04-29 11:28 UTC (permalink / raw)
  To: Christian Brauner; +Cc: viro, jack, linux-fsdevel, linux-kernel

On Tue, Apr 29, 2025 at 1:12 PM Christian Brauner <brauner@kernel.org> wrote:
> > --- a/fs/read_write.c
> > +++ b/fs/read_write.c
> > @@ -332,7 +332,9 @@ loff_t default_llseek(struct file *file, loff_t offset, int whence)
> >       struct inode *inode = file_inode(file);
> >       loff_t retval;
> >
> > -     inode_lock(inode);
> > +     retval = inode_lock_killable(inode);
>
> That change doesn't seem so obviously fine to me.

Why do you think so? And how is this different than the other two.

> Either way I'd like to see this split in three patches and some
> reasoning why it's safe and some justification why it's wanted...

Sure I can split this patch, but before I spend the time, I'd like us
first to agree that the patch is useful.

I wrote this while debugging lots of netfs/nfs/ceph bugs; even without
these bugs, I/O operations on netfs can take a looong time (if the
server is slow) and the inode is locked during the whole operation.
That can cause lots of other processes to go stuck, and my patch
allows these operations to be canceled. Without this, the processes
not only remain stuck until the inode is unlocked, but all stuck
processes have to finish all their I/O before anything can continue.
I'd like to be able to "kill -9" stuck processes.

A similar NFS-specific patch I wrote was merged last year:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=38a125b31504f91bf6fdd3cfc3a3e9a721e6c97a
The same patch for Ceph was never merged (but not explicitly
rejected): https://lore.kernel.org/lkml/20241206165014.165614-1-max.kellermann@ionos.com/
Prior to my work, several NFS operations were already killable.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] fs: make several inode lock operations killable
  2025-04-29 11:28     ` Max Kellermann
@ 2025-05-12  9:52       ` Christian Brauner
  2025-05-12 17:33         ` Jan Kara
  0 siblings, 1 reply; 8+ messages in thread
From: Christian Brauner @ 2025-05-12  9:52 UTC (permalink / raw)
  To: Max Kellermann, jack; +Cc: viro, linux-fsdevel, linux-kernel

Sorry, coming back to this now. I lost sight of this patch.

On Tue, Apr 29, 2025 at 01:28:49PM +0200, Max Kellermann wrote:
> On Tue, Apr 29, 2025 at 1:12 PM Christian Brauner <brauner@kernel.org> wrote:
> > > --- a/fs/read_write.c
> > > +++ b/fs/read_write.c
> > > @@ -332,7 +332,9 @@ loff_t default_llseek(struct file *file, loff_t offset, int whence)
> > >       struct inode *inode = file_inode(file);
> > >       loff_t retval;
> > >
> > > -     inode_lock(inode);
> > > +     retval = inode_lock_killable(inode);
> >
> > That change doesn't seem so obviously fine to me.
> 
> Why do you think so? And how is this different than the other two.

chown_common() and chmod_common() are very close to the syscall boundary
so it's very unlikely that we run into weird issues apart from userspace
regression when they suddenly fail a change for new unexpected reasons.

But just look at default_llseek():

    > git grep default_llseek | wc -l
    461

That is a lot of stuff and it's not immediately clear how deeply or
nested they are called. For example from overlayfs in stacked
callchains. Who knows what strange assumptions some of the callers have
including the possible return values from that helper.

> 
> > Either way I'd like to see this split in three patches and some
> > reasoning why it's safe and some justification why it's wanted...
> 
> Sure I can split this patch, but before I spend the time, I'd like us
> first to agree that the patch is useful.

This is difficult to answer. Yes, on the face of it it seems useful to
be able to kill various operations that sleep on inode lock but who
knows what implicit guarantees/expectations we're going to break if we
do it. Maybe @Jan has some thoughts here as well.

> I wrote this while debugging lots of netfs/nfs/ceph bugs; even without
> these bugs, I/O operations on netfs can take a looong time (if the
> server is slow) and the inode is locked during the whole operation.
> That can cause lots of other processes to go stuck, and my patch
> allows these operations to be canceled. Without this, the processes
> not only remain stuck until the inode is unlocked, but all stuck
> processes have to finish all their I/O before anything can continue.
> I'd like to be able to "kill -9" stuck processes.
> 
> A similar NFS-specific patch I wrote was merged last year:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=38a125b31504f91bf6fdd3cfc3a3e9a721e6c97a
> The same patch for Ceph was never merged (but not explicitly
> rejected): https://lore.kernel.org/lkml/20241206165014.165614-1-max.kellermann@ionos.com/
> Prior to my work, several NFS operations were already killable.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] fs: make several inode lock operations killable
  2025-05-12  9:52       ` Christian Brauner
@ 2025-05-12 17:33         ` Jan Kara
  2025-05-13  7:30           ` Christian Brauner
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kara @ 2025-05-12 17:33 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Max Kellermann, jack, viro, linux-fsdevel, linux-kernel

On Mon 12-05-25 11:52:14, Christian Brauner wrote:
> On Tue, Apr 29, 2025 at 01:28:49PM +0200, Max Kellermann wrote:
> > On Tue, Apr 29, 2025 at 1:12 PM Christian Brauner <brauner@kernel.org> wrote:
> > > > --- a/fs/read_write.c
> > > > +++ b/fs/read_write.c
> > > > @@ -332,7 +332,9 @@ loff_t default_llseek(struct file *file, loff_t offset, int whence)
> > > >       struct inode *inode = file_inode(file);
> > > >       loff_t retval;
> > > >
> > > > -     inode_lock(inode);
> > > > +     retval = inode_lock_killable(inode);
> > >
> > > That change doesn't seem so obviously fine to me.
> > 
> > Why do you think so? And how is this different than the other two.
> 
> chown_common() and chmod_common() are very close to the syscall boundary
> so it's very unlikely that we run into weird issues apart from userspace
> regression when they suddenly fail a change for new unexpected reasons.
> 
> But just look at default_llseek():
> 
>     > git grep default_llseek | wc -l
>     461
> 
> That is a lot of stuff and it's not immediately clear how deeply or
> nested they are called. For example from overlayfs in stacked
> callchains. Who knows what strange assumptions some of the callers have
> including the possible return values from that helper.
> 
> > 
> > > Either way I'd like to see this split in three patches and some
> > > reasoning why it's safe and some justification why it's wanted...
> > 
> > Sure I can split this patch, but before I spend the time, I'd like us
> > first to agree that the patch is useful.
> 
> This is difficult to answer. Yes, on the face of it it seems useful to
> be able to kill various operations that sleep on inode lock but who
> knows what implicit guarantees/expectations we're going to break if we
> do it. Maybe @Jan has some thoughts here as well.

So I think having lock killable is useful and generally this should be safe
wrt userspace (the process will get killed before it can notice the
difference anyway). The concern about guarantees / expectations is still
valid for the *kernel* code (which is I think what you meant above). So I
guess some audit of kernel paths that can end up calling ->llseek handler
and whether they are OK with the handler returning EINTR is needed. I
expect this will be fine but I would not bet too much on it :)

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] fs: make several inode lock operations killable
  2025-05-12 17:33         ` Jan Kara
@ 2025-05-13  7:30           ` Christian Brauner
  2025-05-13 10:41             ` Max Kellermann
  0 siblings, 1 reply; 8+ messages in thread
From: Christian Brauner @ 2025-05-13  7:30 UTC (permalink / raw)
  To: Jan Kara; +Cc: Max Kellermann, viro, linux-fsdevel, linux-kernel

On Mon, May 12, 2025 at 07:33:13PM +0200, Jan Kara wrote:
> On Mon 12-05-25 11:52:14, Christian Brauner wrote:
> > On Tue, Apr 29, 2025 at 01:28:49PM +0200, Max Kellermann wrote:
> > > On Tue, Apr 29, 2025 at 1:12 PM Christian Brauner <brauner@kernel.org> wrote:
> > > > > --- a/fs/read_write.c
> > > > > +++ b/fs/read_write.c
> > > > > @@ -332,7 +332,9 @@ loff_t default_llseek(struct file *file, loff_t offset, int whence)
> > > > >       struct inode *inode = file_inode(file);
> > > > >       loff_t retval;
> > > > >
> > > > > -     inode_lock(inode);
> > > > > +     retval = inode_lock_killable(inode);
> > > >
> > > > That change doesn't seem so obviously fine to me.
> > > 
> > > Why do you think so? And how is this different than the other two.
> > 
> > chown_common() and chmod_common() are very close to the syscall boundary
> > so it's very unlikely that we run into weird issues apart from userspace
> > regression when they suddenly fail a change for new unexpected reasons.
> > 
> > But just look at default_llseek():
> > 
> >     > git grep default_llseek | wc -l
> >     461
> > 
> > That is a lot of stuff and it's not immediately clear how deeply or
> > nested they are called. For example from overlayfs in stacked
> > callchains. Who knows what strange assumptions some of the callers have
> > including the possible return values from that helper.
> > 
> > > 
> > > > Either way I'd like to see this split in three patches and some
> > > > reasoning why it's safe and some justification why it's wanted...
> > > 
> > > Sure I can split this patch, but before I spend the time, I'd like us
> > > first to agree that the patch is useful.
> > 
> > This is difficult to answer. Yes, on the face of it it seems useful to
> > be able to kill various operations that sleep on inode lock but who
> > knows what implicit guarantees/expectations we're going to break if we
> > do it. Maybe @Jan has some thoughts here as well.
> 
> So I think having lock killable is useful and generally this should be safe
> wrt userspace (the process will get killed before it can notice the
> difference anyway). The concern about guarantees / expectations is still
> valid for the *kernel* code (which is I think what you meant above). So I
> guess some audit of kernel paths that can end up calling ->llseek handler
> and whether they are OK with the handler returning EINTR is needed. I
> expect this will be fine but I would not bet too much on it :)

Great. @Max you want to send an updated version where split into
separate patches?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/2] fs: make several inode lock operations killable
  2025-05-13  7:30           ` Christian Brauner
@ 2025-05-13 10:41             ` Max Kellermann
  0 siblings, 0 replies; 8+ messages in thread
From: Max Kellermann @ 2025-05-13 10:41 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, viro, linux-fsdevel, linux-kernel

On Tue, May 13, 2025 at 9:30 AM Christian Brauner <brauner@kernel.org> wrote:
> Great. @Max you want to send an updated version where split into
> separate patches?

So this is about fear of not-so-perfect existing code calling this.
Yes, will post splitted v2 today.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-05-13 10:42 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-29  9:46 [PATCH 1/2] include/linux/fs.h: add inode_lock_killable() Max Kellermann
2025-04-29  9:46 ` [PATCH 2/2] fs: make several inode lock operations killable Max Kellermann
2025-04-29 11:12   ` Christian Brauner
2025-04-29 11:28     ` Max Kellermann
2025-05-12  9:52       ` Christian Brauner
2025-05-12 17:33         ` Jan Kara
2025-05-13  7:30           ` Christian Brauner
2025-05-13 10:41             ` Max Kellermann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).