linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v2] fuse: fix race in fuse_notify_store()
@ 2025-01-30 10:16 Luis Henriques
  2025-02-21 17:40 ` Luis Henriques
  2025-02-24 13:36 ` Miklos Szeredi
  0 siblings, 2 replies; 6+ messages in thread
From: Luis Henriques @ 2025-01-30 10:16 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, linux-kernel, Bernd Schubert, Luis Henriques,
	Teng Qin

Userspace filesystems can push data for a specific inode without it being
explicitly requested.  This can be accomplished by using NOTIFY_STORE.
However, this may race against another process performing different
operations on the same inode.

If, for example, there is a process reading from it, it may happen that it
will block waiting for data to be available (locking the folio), while the
FUSE server will also block trying to lock the same folio to update it with
the inode data.

The easiest solution, as suggested by Miklos, is to allow the userspace
filesystem to skip locked folios.

Link: https://lore.kernel.org/CH2PR14MB41040692ABC50334F500789ED6C89@CH2PR14MB4104.namprd14.prod.outlook.com
Reported-by: Teng Qin <tqin@jumptrading.com>
Originally-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Luis Henriques <luis@igalia.com>
---
Hi!

Here's v2.  Other than fixing the bug pointed out by Bernd (thanks!), I've
also added an explanation to the 'XXX' comment.  As a matter of fact, I've
took another look at that code, and I felt compelled to remove that comment,
as using PAGE_SIZE seems to be the right thing.

Anyway, I'm still thinking that probably NOTIFY_STORE should *always* have
this behaviour, without the need for userspace to explicitly setting a flag.

Changes since v1:
- Only skip if __filemap_get_folio() returns -EAGAIN (Bernd)

 fs/fuse/dev.c             | 30 +++++++++++++++++++++++-------
 include/uapi/linux/fuse.h |  8 +++++++-
 2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 27ccae63495d..309651f82ca4 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1630,6 +1630,7 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
 	unsigned int num;
 	loff_t file_size;
 	loff_t end;
+	int fgp_flags = FGP_LOCK | FGP_ACCESSED | FGP_CREAT;
 
 	err = -EINVAL;
 	if (size < sizeof(outarg))
@@ -1645,6 +1646,9 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
 
 	nodeid = outarg.nodeid;
 
+	if (outarg.flags & FUSE_NOTIFY_STORE_NOWAIT)
+		fgp_flags |= FGP_NOWAIT;
+
 	down_read(&fc->killsb);
 
 	err = -ENOENT;
@@ -1668,14 +1672,26 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
 		struct page *page;
 		unsigned int this_num;
 
-		folio = filemap_grab_folio(mapping, index);
-		err = PTR_ERR(folio);
-		if (IS_ERR(folio))
-			goto out_iput;
+		folio = __filemap_get_folio(mapping, index, fgp_flags,
+					    mapping_gfp_mask(mapping));
+		err = PTR_ERR_OR_ZERO(folio);
+		if (err) {
+			if (!(outarg.flags & FUSE_NOTIFY_STORE_NOWAIT) ||
+			    (err != -EAGAIN))
+				goto out_iput;
+			page = NULL;
+			/* XXX is it OK to use PAGE_SIZE here? */
+			this_num = min_t(unsigned int, num, PAGE_SIZE - offset);
+		} else {
+			page = &folio->page;
+			this_num = min_t(unsigned int, num,
+					 folio_size(folio) - offset);
+		}
 
-		page = &folio->page;
-		this_num = min_t(unsigned, num, folio_size(folio) - offset);
 		err = fuse_copy_page(cs, &page, offset, this_num, 0);
+		if (!page)
+			goto skip;
+
 		if (!folio_test_uptodate(folio) && !err && offset == 0 &&
 		    (this_num == folio_size(folio) || file_size == end)) {
 			folio_zero_segment(folio, this_num, folio_size(folio));
@@ -1683,7 +1699,7 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
 		}
 		folio_unlock(folio);
 		folio_put(folio);
-
+skip:
 		if (err)
 			goto out_iput;
 
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index e9e78292d107..59725f89340e 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -576,6 +576,12 @@ struct fuse_file_lock {
  */
 #define FUSE_EXPIRE_ONLY		(1 << 0)
 
+/**
+ * notify_store flags
+ * FUSE_NOTIFY_STORE_NOWAIT: skip locked pages
+ */
+#define FUSE_NOTIFY_STORE_NOWAIT	(1 << 0)
+
 /**
  * extension type
  * FUSE_MAX_NR_SECCTX: maximum value of &fuse_secctx_header.nr_secctx
@@ -1075,7 +1081,7 @@ struct fuse_notify_store_out {
 	uint64_t	nodeid;
 	uint64_t	offset;
 	uint32_t	size;
-	uint32_t	padding;
+	uint32_t	flags;
 };
 
 struct fuse_notify_retrieve_out {

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v2] fuse: fix race in fuse_notify_store()
  2025-01-30 10:16 [RFC PATCH v2] fuse: fix race in fuse_notify_store() Luis Henriques
@ 2025-02-21 17:40 ` Luis Henriques
  2025-02-24 13:36 ` Miklos Szeredi
  1 sibling, 0 replies; 6+ messages in thread
From: Luis Henriques @ 2025-02-21 17:40 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, linux-kernel, Bernd Schubert, Teng Qin,
	Matt Harvey

On Thu, Jan 30 2025, Luis Henriques wrote:

> Userspace filesystems can push data for a specific inode without it being
> explicitly requested.  This can be accomplished by using NOTIFY_STORE.
> However, this may race against another process performing different
> operations on the same inode.
>
> If, for example, there is a process reading from it, it may happen that it
> will block waiting for data to be available (locking the folio), while the
> FUSE server will also block trying to lock the same folio to update it with
> the inode data.
>
> The easiest solution, as suggested by Miklos, is to allow the userspace
> filesystem to skip locked folios.
>
> Link: https://lore.kernel.org/CH2PR14MB41040692ABC50334F500789ED6C89@CH2PR14MB4104.namprd14.prod.outlook.com
> Reported-by: Teng Qin <tqin@jumptrading.com>
> Originally-by: Miklos Szeredi <miklos@szeredi.hu>
> Signed-off-by: Luis Henriques <luis@igalia.com>
> ---
> Hi!
>
> Here's v2.  Other than fixing the bug pointed out by Bernd (thanks!), I've
> also added an explanation to the 'XXX' comment.  As a matter of fact, I've
> took another look at that code, and I felt compelled to remove that comment,
> as using PAGE_SIZE seems to be the right thing.
>
> Anyway, I'm still thinking that probably NOTIFY_STORE should *always* have
> this behaviour, without the need for userspace to explicitly setting a flag.

Gentle ping.  I was wondering if you have any thoughts on this patch.
Specially regarding the behaviour change I'm suggesting above.

(Also, as I've mentioned before, I'm using the 'Originally-by' tag; not
sure this is the right thing to do.  Obviously, I'm fine dropping my
s-o-b, as I'm not the original author.)

Cheers,
-- 
Luís


> Changes since v1:
> - Only skip if __filemap_get_folio() returns -EAGAIN (Bernd)
>
>  fs/fuse/dev.c             | 30 +++++++++++++++++++++++-------
>  include/uapi/linux/fuse.h |  8 +++++++-
>  2 files changed, 30 insertions(+), 8 deletions(-)
>
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 27ccae63495d..309651f82ca4 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -1630,6 +1630,7 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
>  	unsigned int num;
>  	loff_t file_size;
>  	loff_t end;
> +	int fgp_flags = FGP_LOCK | FGP_ACCESSED | FGP_CREAT;
>  
>  	err = -EINVAL;
>  	if (size < sizeof(outarg))
> @@ -1645,6 +1646,9 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
>  
>  	nodeid = outarg.nodeid;
>  
> +	if (outarg.flags & FUSE_NOTIFY_STORE_NOWAIT)
> +		fgp_flags |= FGP_NOWAIT;
> +
>  	down_read(&fc->killsb);
>  
>  	err = -ENOENT;
> @@ -1668,14 +1672,26 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
>  		struct page *page;
>  		unsigned int this_num;
>  
> -		folio = filemap_grab_folio(mapping, index);
> -		err = PTR_ERR(folio);
> -		if (IS_ERR(folio))
> -			goto out_iput;
> +		folio = __filemap_get_folio(mapping, index, fgp_flags,
> +					    mapping_gfp_mask(mapping));
> +		err = PTR_ERR_OR_ZERO(folio);
> +		if (err) {
> +			if (!(outarg.flags & FUSE_NOTIFY_STORE_NOWAIT) ||
> +			    (err != -EAGAIN))
> +				goto out_iput;
> +			page = NULL;
> +			/* XXX is it OK to use PAGE_SIZE here? */
> +			this_num = min_t(unsigned int, num, PAGE_SIZE - offset);
> +		} else {
> +			page = &folio->page;
> +			this_num = min_t(unsigned int, num,
> +					 folio_size(folio) - offset);
> +		}
>  
> -		page = &folio->page;
> -		this_num = min_t(unsigned, num, folio_size(folio) - offset);
>  		err = fuse_copy_page(cs, &page, offset, this_num, 0);
> +		if (!page)
> +			goto skip;
> +
>  		if (!folio_test_uptodate(folio) && !err && offset == 0 &&
>  		    (this_num == folio_size(folio) || file_size == end)) {
>  			folio_zero_segment(folio, this_num, folio_size(folio));
> @@ -1683,7 +1699,7 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
>  		}
>  		folio_unlock(folio);
>  		folio_put(folio);
> -
> +skip:
>  		if (err)
>  			goto out_iput;
>  
> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index e9e78292d107..59725f89340e 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -576,6 +576,12 @@ struct fuse_file_lock {
>   */
>  #define FUSE_EXPIRE_ONLY		(1 << 0)
>  
> +/**
> + * notify_store flags
> + * FUSE_NOTIFY_STORE_NOWAIT: skip locked pages
> + */
> +#define FUSE_NOTIFY_STORE_NOWAIT	(1 << 0)
> +
>  /**
>   * extension type
>   * FUSE_MAX_NR_SECCTX: maximum value of &fuse_secctx_header.nr_secctx
> @@ -1075,7 +1081,7 @@ struct fuse_notify_store_out {
>  	uint64_t	nodeid;
>  	uint64_t	offset;
>  	uint32_t	size;
> -	uint32_t	padding;
> +	uint32_t	flags;
>  };
>  
>  struct fuse_notify_retrieve_out {


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v2] fuse: fix race in fuse_notify_store()
  2025-01-30 10:16 [RFC PATCH v2] fuse: fix race in fuse_notify_store() Luis Henriques
  2025-02-21 17:40 ` Luis Henriques
@ 2025-02-24 13:36 ` Miklos Szeredi
  2025-02-24 14:30   ` Luis Henriques
  1 sibling, 1 reply; 6+ messages in thread
From: Miklos Szeredi @ 2025-02-24 13:36 UTC (permalink / raw)
  To: Luis Henriques; +Cc: linux-fsdevel, linux-kernel, Bernd Schubert, Teng Qin

On Thu, 30 Jan 2025 at 11:16, Luis Henriques <luis@igalia.com> wrote:
>
> Userspace filesystems can push data for a specific inode without it being
> explicitly requested.  This can be accomplished by using NOTIFY_STORE.
> However, this may race against another process performing different
> operations on the same inode.
>
> If, for example, there is a process reading from it, it may happen that it
> will block waiting for data to be available (locking the folio), while the
> FUSE server will also block trying to lock the same folio to update it with
> the inode data.
>
> The easiest solution, as suggested by Miklos, is to allow the userspace
> filesystem to skip locked folios.

Not sure.

The easiest solution is to make the server perform the two operations
independently.  I.e. never trigger a notification from a request.

This is true of other notifications, e.g. doing FUSE_NOTIFY_DELETE
during e.g. FUSE_RMDIR will deadlock on i_mutex.

Or am I misunderstanding the problem?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v2] fuse: fix race in fuse_notify_store()
  2025-02-24 13:36 ` Miklos Szeredi
@ 2025-02-24 14:30   ` Luis Henriques
  2025-02-24 14:39     ` Miklos Szeredi
  0 siblings, 1 reply; 6+ messages in thread
From: Luis Henriques @ 2025-02-24 14:30 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, linux-kernel, Bernd Schubert, Teng Qin,
	Matt Harvey

On Mon, Feb 24 2025, Miklos Szeredi wrote:

> On Thu, 30 Jan 2025 at 11:16, Luis Henriques <luis@igalia.com> wrote:
>>
>> Userspace filesystems can push data for a specific inode without it being
>> explicitly requested.  This can be accomplished by using NOTIFY_STORE.
>> However, this may race against another process performing different
>> operations on the same inode.
>>
>> If, for example, there is a process reading from it, it may happen that it
>> will block waiting for data to be available (locking the folio), while the
>> FUSE server will also block trying to lock the same folio to update it with
>> the inode data.
>>
>> The easiest solution, as suggested by Miklos, is to allow the userspace
>> filesystem to skip locked folios.
>
> Not sure.
>
> The easiest solution is to make the server perform the two operations
> independently.  I.e. never trigger a notification from a request.
>
> This is true of other notifications, e.g. doing FUSE_NOTIFY_DELETE
> during e.g. FUSE_RMDIR will deadlock on i_mutex.

Hmmm... OK, the NOTIFY_DELETE and NOTIFY_INVAL_ENTRY deadlocks are
documented (in libfuse, at least).  So, maybe this one could be added to
the list of notifications that could deadlock.  However, IMHO, it would be
great if this could be fixed instead.

> Or am I misunderstanding the problem?

I believe the initial report[1] actually adds a specific use-case where
the deadlock can happen when the server performs the two operations
independently.  For example:

  - An application reads 4K of data at offset 0
  - The server gets a read request.  It performs the read, and gets more
    data than the data requested (say 4M)
  - It caches this data in userspace and replies to VFS with 4K of data
  - The server does a notify_store with the reminder data
  - In the meantime the userspace application reads more 4K at offset 4K

The last 2 operations can race and the server may deadlock if the
application already has locked the page where data will be read into.

Does it make sense?

[1] https://lore.kernel.org/CH2PR14MB41040692ABC50334F500789ED6C89@CH2PR14MB4104.namprd14.prod.outlook.com

Cheers,
-- 
Luís

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v2] fuse: fix race in fuse_notify_store()
  2025-02-24 14:30   ` Luis Henriques
@ 2025-02-24 14:39     ` Miklos Szeredi
  2025-02-25 10:37       ` Luis Henriques
  0 siblings, 1 reply; 6+ messages in thread
From: Miklos Szeredi @ 2025-02-24 14:39 UTC (permalink / raw)
  To: Luis Henriques
  Cc: linux-fsdevel, linux-kernel, Bernd Schubert, Teng Qin,
	Matt Harvey

On Mon, 24 Feb 2025 at 15:30, Luis Henriques <luis@igalia.com> wrote:
>
> On Mon, Feb 24 2025, Miklos Szeredi wrote:
>
> > On Thu, 30 Jan 2025 at 11:16, Luis Henriques <luis@igalia.com> wrote:
> >>
> >> Userspace filesystems can push data for a specific inode without it being
> >> explicitly requested.  This can be accomplished by using NOTIFY_STORE.
> >> However, this may race against another process performing different
> >> operations on the same inode.
> >>
> >> If, for example, there is a process reading from it, it may happen that it
> >> will block waiting for data to be available (locking the folio), while the
> >> FUSE server will also block trying to lock the same folio to update it with
> >> the inode data.
> >>
> >> The easiest solution, as suggested by Miklos, is to allow the userspace
> >> filesystem to skip locked folios.
> >
> > Not sure.
> >
> > The easiest solution is to make the server perform the two operations
> > independently.  I.e. never trigger a notification from a request.
> >
> > This is true of other notifications, e.g. doing FUSE_NOTIFY_DELETE
> > during e.g. FUSE_RMDIR will deadlock on i_mutex.
>
> Hmmm... OK, the NOTIFY_DELETE and NOTIFY_INVAL_ENTRY deadlocks are
> documented (in libfuse, at least).  So, maybe this one could be added to
> the list of notifications that could deadlock.  However, IMHO, it would be
> great if this could be fixed instead.
>
> > Or am I misunderstanding the problem?
>
> I believe the initial report[1] actually adds a specific use-case where
> the deadlock can happen when the server performs the two operations
> independently.  For example:
>
>   - An application reads 4K of data at offset 0
>   - The server gets a read request.  It performs the read, and gets more
>     data than the data requested (say 4M)
>   - It caches this data in userspace and replies to VFS with 4K of data
>   - The server does a notify_store with the reminder data
>   - In the meantime the userspace application reads more 4K at offset 4K
>
> The last 2 operations can race and the server may deadlock if the
> application already has locked the page where data will be read into.

I don't see the deadlock.  If the race was won by the read, then it
will proceed with FUSE_READ and fetch the data from the server.  When
this is finished,  NOTIFY_STORE will overwrite the page with the same
data.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v2] fuse: fix race in fuse_notify_store()
  2025-02-24 14:39     ` Miklos Szeredi
@ 2025-02-25 10:37       ` Luis Henriques
  0 siblings, 0 replies; 6+ messages in thread
From: Luis Henriques @ 2025-02-25 10:37 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-fsdevel, linux-kernel, Bernd Schubert, Teng Qin,
	Matt Harvey

On Mon, Feb 24 2025, Miklos Szeredi wrote:

> On Mon, 24 Feb 2025 at 15:30, Luis Henriques <luis@igalia.com> wrote:
>>
>> On Mon, Feb 24 2025, Miklos Szeredi wrote:
>>
>> > On Thu, 30 Jan 2025 at 11:16, Luis Henriques <luis@igalia.com> wrote:
>> >>
>> >> Userspace filesystems can push data for a specific inode without it being
>> >> explicitly requested.  This can be accomplished by using NOTIFY_STORE.
>> >> However, this may race against another process performing different
>> >> operations on the same inode.
>> >>
>> >> If, for example, there is a process reading from it, it may happen that it
>> >> will block waiting for data to be available (locking the folio), while the
>> >> FUSE server will also block trying to lock the same folio to update it with
>> >> the inode data.
>> >>
>> >> The easiest solution, as suggested by Miklos, is to allow the userspace
>> >> filesystem to skip locked folios.
>> >
>> > Not sure.
>> >
>> > The easiest solution is to make the server perform the two operations
>> > independently.  I.e. never trigger a notification from a request.
>> >
>> > This is true of other notifications, e.g. doing FUSE_NOTIFY_DELETE
>> > during e.g. FUSE_RMDIR will deadlock on i_mutex.
>>
>> Hmmm... OK, the NOTIFY_DELETE and NOTIFY_INVAL_ENTRY deadlocks are
>> documented (in libfuse, at least).  So, maybe this one could be added to
>> the list of notifications that could deadlock.  However, IMHO, it would be
>> great if this could be fixed instead.
>>
>> > Or am I misunderstanding the problem?
>>
>> I believe the initial report[1] actually adds a specific use-case where
>> the deadlock can happen when the server performs the two operations
>> independently.  For example:
>>
>>   - An application reads 4K of data at offset 0
>>   - The server gets a read request.  It performs the read, and gets more
>>     data than the data requested (say 4M)
>>   - It caches this data in userspace and replies to VFS with 4K of data
>>   - The server does a notify_store with the reminder data
>>   - In the meantime the userspace application reads more 4K at offset 4K
>>
>> The last 2 operations can race and the server may deadlock if the
>> application already has locked the page where data will be read into.
>
> I don't see the deadlock.  If the race was won by the read, then it
> will proceed with FUSE_READ and fetch the data from the server.  When
> this is finished,  NOTIFY_STORE will overwrite the page with the same
> data.

OK, that makes sense.  Took a bit to go through all this again, but I
agree that the only thing to do in then is probably to add a warning to
the libfuse API documentation, in fuse_lowlevel_notify_store(), as shown
below.  (I'll prepare an MR for that.)

Thank you, Miklos.

Cheers,
-- 
Luís

diff --git a/include/fuse_lowlevel.h b/include/fuse_lowlevel.h
index 93bcba296c2d..d1f9717347da 100644
--- a/include/fuse_lowlevel.h
+++ b/include/fuse_lowlevel.h
@@ -1845,6 +1845,10 @@ int fuse_lowlevel_notify_delete(struct fuse_session *se,
  * If the stored data overflows the current file size, then the size
  * is extended, similarly to a write(2) on the filesystem.
  *
+ * To avoid a deadlock this function must not be called while executing
+ * a related filesystem operation (e.g. while replying to a FUSE_READ
+ * request).
+ *
  * If this function returns an error, then the store wasn't fully
  * completed, but it may have been partially completed.
  *

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-02-25 10:37 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-30 10:16 [RFC PATCH v2] fuse: fix race in fuse_notify_store() Luis Henriques
2025-02-21 17:40 ` Luis Henriques
2025-02-24 13:36 ` Miklos Szeredi
2025-02-24 14:30   ` Luis Henriques
2025-02-24 14:39     ` Miklos Szeredi
2025-02-25 10:37       ` Luis Henriques

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).