public inbox for linux-nfs@vger.kernel.org
* [PATCH v2] fscache: Fix oops due to race with cookie_lru and use_cookie
From: Dave Wysochanski @ 2022-11-17 14:29 UTC
  To: David Howells, Daire Byrne, Benjamin Maynard; +Cc: linux-cachefs, linux-nfs

If a cookie expires from the LRU and the LRU_DISCARD flag is set,
but the state machine has not run yet, it's possible another thread
can call fscache_use_cookie and begin to use it.  When the
cookie_worker finally runs, it will see the LRU_DISCARD flag set,
transition the cookie->state to LRU_DISCARDING, which will then
withdraw the cookie.  Once the cookie is withdrawn, the object is
removed and the below oops will occur because the object associated
with the cookie is now NULL.

Fix the oops by clearing the LRU_DISCARD bit if another thread
uses the cookie before the cookie_worker runs.

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  ...
  CPU: 31 PID: 44773 Comm: kworker/u130:1 Tainted: G     E    6.0.0-5.dneg.x86_64 #1
  Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
  Workqueue: events_unbound netfs_rreq_write_to_cache_work [netfs]
  RIP: 0010:cachefiles_prepare_write+0x28/0x90 [cachefiles]
  ...
  Call Trace:
   netfs_rreq_write_to_cache_work+0x11c/0x320 [netfs]
   process_one_work+0x217/0x3e0
   worker_thread+0x4a/0x3b0
   ? process_one_work+0x3e0/0x3e0
   kthread+0xd6/0x100
   ? kthread_complete_and_exit+0x20/0x20
   ret_from_fork+0x1f/0x30

Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning")
Reported-by: Daire Byrne <daire.byrne@gmail.com>
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/fscache/cookie.c            | 8 ++++++++
 include/trace/events/fscache.h | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 451d8a077e12..bce2492186d0 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -605,6 +605,14 @@ void __fscache_use_cookie(struct fscache_cookie *cookie, bool will_modify)
 			set_bit(FSCACHE_COOKIE_DO_PREP_TO_WRITE, &cookie->flags);
 			queue = true;
 		}
+		/*
+		 * We could race with cookie_lru which may set LRU_DISCARD bit
+		 * but has yet to run the cookie state machine.  If this happens
+		 * and another thread tries to use the cookie, clear LRU_DISCARD
+		 * so we don't end up withdrawing the cookie while in use.
+		 */
+		if (test_and_clear_bit(FSCACHE_COOKIE_DO_LRU_DISCARD, &cookie->flags))
+			fscache_see_cookie(cookie, fscache_cookie_see_lru_discard_clear);
 		break;
 
 	case FSCACHE_COOKIE_STATE_FAILED:
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index c078c48a8e6d..a6190aa1b406 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -66,6 +66,7 @@ enum fscache_cookie_trace {
 	fscache_cookie_put_work,
 	fscache_cookie_see_active,
 	fscache_cookie_see_lru_discard,
+	fscache_cookie_see_lru_discard_clear,
 	fscache_cookie_see_lru_do_one,
 	fscache_cookie_see_relinquish,
 	fscache_cookie_see_withdraw,
@@ -149,6 +150,7 @@ enum fscache_access_trace {
 	EM(fscache_cookie_put_work,		"PQ  work ")		\
 	EM(fscache_cookie_see_active,		"-   activ")		\
 	EM(fscache_cookie_see_lru_discard,	"-   x-lru")		\
+	EM(fscache_cookie_see_lru_discard_clear,"-   lrudc")            \
 	EM(fscache_cookie_see_lru_do_one,	"-   lrudo")		\
 	EM(fscache_cookie_see_relinquish,	"-   x-rlq")		\
 	EM(fscache_cookie_see_withdraw,		"-   x-wth")		\
-- 
2.31.1



* Re: [PATCH v2] fscache: Fix oops due to race with cookie_lru and use_cookie
From: Daire Byrne @ 2022-12-07  9:58 UTC
  To: Dave Wysochanski
  Cc: David Howells, Daire Byrne, Benjamin Maynard, linux-cachefs,
	linux-nfs

I have now also tested this v2 patch and can confirm that it fixes
the race in fscache that we were able to reproduce reliably with
our (re-export) workloads.

Tested-by: Daire Byrne <daire@dneg.com>

Daire



* [PATCH v2] fscache: Fix oops due to race with cookie_lru and use_cookie
From: David Howells @ 2022-12-07 13:49 UTC
  To: torvalds
  Cc: dhowells, Dave Wysochanski, Daire Byrne, Benjamin Maynard,
	linux-cachefs, linux-nfs, linux-fsdevel, linux-kernel

Hi Linus,

Could you apply this, please?

Thanks,
David
---
From: Dave Wysochanski <dwysocha@redhat.com>

If a cookie expires from the LRU and the LRU_DISCARD flag is set,
but the state machine has not run yet, it's possible another thread
can call fscache_use_cookie and begin to use it.  When the
cookie_worker finally runs, it will see the LRU_DISCARD flag set,
transition the cookie->state to LRU_DISCARDING, which will then
withdraw the cookie.  Once the cookie is withdrawn, the object is
removed and the below oops will occur because the object associated
with the cookie is now NULL.

Fix the oops by clearing the LRU_DISCARD bit if another thread
uses the cookie before the cookie_worker runs.

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  ...
  CPU: 31 PID: 44773 Comm: kworker/u130:1 Tainted: G     E    6.0.0-5.dneg.x86_64 #1
  Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
  Workqueue: events_unbound netfs_rreq_write_to_cache_work [netfs]
  RIP: 0010:cachefiles_prepare_write+0x28/0x90 [cachefiles]
  ...
  Call Trace:
   netfs_rreq_write_to_cache_work+0x11c/0x320 [netfs]
   process_one_work+0x217/0x3e0
   worker_thread+0x4a/0x3b0
   ? process_one_work+0x3e0/0x3e0
   kthread+0xd6/0x100
   ? kthread_complete_and_exit+0x20/0x20
   ret_from_fork+0x1f/0x30

Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning")
Reported-by: Daire Byrne <daire.byrne@gmail.com>
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Daire Byrne <daire@dneg.com>
Link: https://lore.kernel.org/r/20221117115023.1350181-1-dwysocha@redhat.com/ # v1
Link: https://lore.kernel.org/r/20221117142915.1366990-1-dwysocha@redhat.com/ # v2
---
 fs/fscache/cookie.c            | 8 ++++++++
 include/trace/events/fscache.h | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c
index 451d8a077e12..bce2492186d0 100644
--- a/fs/fscache/cookie.c
+++ b/fs/fscache/cookie.c
@@ -605,6 +605,14 @@ void __fscache_use_cookie(struct fscache_cookie *cookie, bool will_modify)
 			set_bit(FSCACHE_COOKIE_DO_PREP_TO_WRITE, &cookie->flags);
 			queue = true;
 		}
+		/*
+		 * We could race with cookie_lru which may set LRU_DISCARD bit
+		 * but has yet to run the cookie state machine.  If this happens
+		 * and another thread tries to use the cookie, clear LRU_DISCARD
+		 * so we don't end up withdrawing the cookie while in use.
+		 */
+		if (test_and_clear_bit(FSCACHE_COOKIE_DO_LRU_DISCARD, &cookie->flags))
+			fscache_see_cookie(cookie, fscache_cookie_see_lru_discard_clear);
 		break;
 
 	case FSCACHE_COOKIE_STATE_FAILED:
diff --git a/include/trace/events/fscache.h b/include/trace/events/fscache.h
index c078c48a8e6d..a6190aa1b406 100644
--- a/include/trace/events/fscache.h
+++ b/include/trace/events/fscache.h
@@ -66,6 +66,7 @@ enum fscache_cookie_trace {
 	fscache_cookie_put_work,
 	fscache_cookie_see_active,
 	fscache_cookie_see_lru_discard,
+	fscache_cookie_see_lru_discard_clear,
 	fscache_cookie_see_lru_do_one,
 	fscache_cookie_see_relinquish,
 	fscache_cookie_see_withdraw,
@@ -149,6 +150,7 @@ enum fscache_access_trace {
 	EM(fscache_cookie_put_work,		"PQ  work ")		\
 	EM(fscache_cookie_see_active,		"-   activ")		\
 	EM(fscache_cookie_see_lru_discard,	"-   x-lru")		\
+	EM(fscache_cookie_see_lru_discard_clear,"-   lrudc")            \
 	EM(fscache_cookie_see_lru_do_one,	"-   lrudo")		\
 	EM(fscache_cookie_see_relinquish,	"-   x-rlq")		\
 	EM(fscache_cookie_see_withdraw,		"-   x-wth")		\

