[PATCH] NFSv4: fix a mount deadlock in NFS v4.1 client

public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH] NFSv4: fix a mount deadlock in NFS v4.1 client
@ 2024-09-06  0:57 Oleksandr Tymoshenko
  2024-09-06  0:58 ` kernel test robot
  2024-09-08 16:48 ` Trond Myklebust
  0 siblings, 2 replies; 17+ messages in thread
From: Oleksandr Tymoshenko @ 2024-09-06  0:57 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker
  Cc: linux-nfs, jbongio, stable, Oleksandr Tymoshenko

nfs41_init_clientid does not signal a failure condition from
nfs4_proc_exchange_id and nfs4_proc_create_session to a client which may
lead to mount syscall indefinitely blocked in the following stack trace:
  nfs_wait_client_init_complete
  nfs41_discover_server_trunking
  nfs4_discover_server_trunking
  nfs4_init_client
  nfs4_set_client
  nfs4_create_server
  nfs4_try_get_tree
  vfs_get_tree
  do_new_mount
  __se_sys_mount

and the client stuck in uninitialized state.

In addition to this all subsequent mount calls would also get blocked in
nfs_match_client waiting for the uninitialized client to finish
initialization:
  nfs_wait_client_init_complete
  nfs_match_client
  nfs_get_client
  nfs4_set_client
  nfs4_create_server
  nfs4_try_get_tree
  vfs_get_tree
  do_new_mount
  __se_sys_mount

To avoid this situation propagate error condition to the mount thread
and let mount syscall fail properly.

Signed-off-by: Oleksandr Tymoshenko <ovt@google.com>
---
 fs/nfs/nfs4state.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 877f682b45f2..54ad3440ad2b 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -335,8 +335,8 @@ int nfs41_init_clientid(struct nfs_client *clp, const struct cred *cred)
 	if (!(clp->cl_exchange_flags & EXCHGID4_FLAG_CONFIRMED_R))
 		nfs4_state_start_reclaim_reboot(clp);
 	nfs41_finish_session_reset(clp);
-	nfs_mark_client_ready(clp, NFS_CS_READY);
 out:
+	nfs_mark_client_ready(clp, status == 0 ? NFS_CS_READY : status);
 	return status;
 }
 

---
base-commit: ad618736883b8970f66af799e34007475fe33a68
change-id: 20240906-nfs-mount-deadlock-fix-55c14b38e088

Best regards,
-- 
Oleksandr Tymoshenko <ovt@google.com>


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] NFSv4: fix a mount deadlock in NFS v4.1 client
  2024-09-06  0:57 [PATCH] NFSv4: fix a mount deadlock in NFS v4.1 client Oleksandr Tymoshenko
@ 2024-09-06  0:58 ` kernel test robot
  2024-09-08 16:48 ` Trond Myklebust
  1 sibling, 0 replies; 17+ messages in thread
From: kernel test robot @ 2024-09-06  0:58 UTC (permalink / raw)
  To: Oleksandr Tymoshenko; +Cc: stable, oe-kbuild-all

Hi,

Thanks for your patch.

FYI: kernel test robot notices the stable kernel rule is not satisfied.

The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#option-1

Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree.
Subject: [PATCH] NFSv4: fix a mount deadlock in NFS v4.1 client
Link: https://lore.kernel.org/stable/20240906-nfs-mount-deadlock-fix-v1-1-ea1aef533f9c%40google.com

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] NFSv4: fix a mount deadlock in NFS v4.1 client
  2024-09-06  0:57 [PATCH] NFSv4: fix a mount deadlock in NFS v4.1 client Oleksandr Tymoshenko
  2024-09-06  0:58 ` kernel test robot
@ 2024-09-08 16:48 ` Trond Myklebust
  2024-09-09 16:36   ` [PATCH 6.1.y] net: tls: handle backlogging of crypto requests Oleksandr Tymoshenko
  2024-09-09 17:46   ` Oleksandr Tymoshenko
  1 sibling, 2 replies; 17+ messages in thread
From: Trond Myklebust @ 2024-09-08 16:48 UTC (permalink / raw)
  To: Oleksandr Tymoshenko, Anna Schumaker; +Cc: linux-nfs, jbongio, stable

On Fri, 2024-09-06 at 00:57 +0000, Oleksandr Tymoshenko wrote:
> nfs41_init_clientid does not signal a failure condition from
> nfs4_proc_exchange_id and nfs4_proc_create_session to a client which
> may
> lead to mount syscall indefinitely blocked in the following stack
> trace:
>   nfs_wait_client_init_complete
>   nfs41_discover_server_trunking
>   nfs4_discover_server_trunking
>   nfs4_init_client
>   nfs4_set_client
>   nfs4_create_server
>   nfs4_try_get_tree
>   vfs_get_tree
>   do_new_mount
>   __se_sys_mount
> 
> and the client stuck in uninitialized state.
> 
> In addition to this all subsequent mount calls would also get blocked
> in
> nfs_match_client waiting for the uninitialized client to finish
> initialization:
>   nfs_wait_client_init_complete
>   nfs_match_client
>   nfs_get_client
>   nfs4_set_client
>   nfs4_create_server
>   nfs4_try_get_tree
>   vfs_get_tree
>   do_new_mount
>   __se_sys_mount
> 
> To avoid this situation propagate error condition to the mount thread
> and let mount syscall fail properly.
> 
> Signed-off-by: Oleksandr Tymoshenko <ovt@google.com>
> ---
>  fs/nfs/nfs4state.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 877f682b45f2..54ad3440ad2b 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -335,8 +335,8 @@ int nfs41_init_clientid(struct nfs_client *clp,
> const struct cred *cred)
>  	if (!(clp->cl_exchange_flags & EXCHGID4_FLAG_CONFIRMED_R))
>  		nfs4_state_start_reclaim_reboot(clp);
>  	nfs41_finish_session_reset(clp);
> -	nfs_mark_client_ready(clp, NFS_CS_READY);
>  out:
> +	nfs_mark_client_ready(clp, status == 0 ? NFS_CS_READY :
> status);
>  	return status;
>  }

NACK. This will break all sorts of recovery scenarios, because it
doesn't distinguish between an initial 'mount' and a server reboot
recovery situation.
Even in the case where we are in the initial mount, it also doesn't
distinguish between transient errors such as NFS4ERR_DELAY or reboot
errors such as NFS4ERR_STALE_CLIENTID, etc.

Exactly what is the scenario that is causing your hang? Let's try to
address that with a more targeted fix.


-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 6.1.y] net: tls: handle backlogging of crypto requests
  2024-09-08 16:48 ` Trond Myklebust
@ 2024-09-09 16:36   ` Oleksandr Tymoshenko
  2024-09-09 17:56     ` Trond Myklebust
  2024-09-09 17:46   ` Oleksandr Tymoshenko
  1 sibling, 1 reply; 17+ messages in thread
From: Oleksandr Tymoshenko @ 2024-09-09 16:36 UTC (permalink / raw)
  To: trondmy; +Cc: anna, jbongio, linux-nfs, ovt, stable

>> nfs41_init_clientid does not signal a failure condition from
>> nfs4_proc_exchange_id and nfs4_proc_create_session to a client which
>> may
>> lead to mount syscall indefinitely blocked in the following stack

> NACK. This will break all sorts of recovery scenarios, because it
> doesn't distinguish between an initial 'mount' and a server reboot
> recovery situation.
> Even in the case where we are in the initial mount, it also doesn't
> distinguish between transient errors such as NFS4ERR_DELAY or reboot
> errors such as NFS4ERR_STALE_CLIENTID, etc.

> Exactly what is the scenario that is causing your hang? Let's try to
> address that with a more targeted fix.

The scenario is as follows: there are several NFS servers and several
production machines with multiple NFS mounts. This is a containerized
multi-tennant workflow so every tennant gets its own NFS mount to access their
data. At some point nfs41_init_clientid fails in the initial mount.nfs call
and all subsequent mount.nfs calls just hang in nfs_wait_client_init_complete
until the original one, where nfs4_proc_exchange_id has failed, is killed.

The cause of the nfs41_init_clientid failure in the production case is a timeout.
The following error message is observed in logs:
  NFS: state manager: lease expired failed on NFSv4 server <ip> with error 110

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6.1.y] net: tls: handle backlogging of crypto requests
  2024-09-09 16:36   ` [PATCH 6.1.y] net: tls: handle backlogging of crypto requests Oleksandr Tymoshenko
@ 2024-09-09 17:56     ` Trond Myklebust
  2024-09-09 23:06       ` [PATCH] NFSv4: fix a mount deadlock in NFS v4.1 client Oleksandr Tymoshenko
  0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2024-09-09 17:56 UTC (permalink / raw)
  To: ovt@google.com
  Cc: anna@kernel.org, jbongio@google.com, linux-nfs@vger.kernel.org,
	stable@vger.kernel.org

On Mon, 2024-09-09 at 16:36 +0000, Oleksandr Tymoshenko wrote:
> > > nfs41_init_clientid does not signal a failure condition from
> > > nfs4_proc_exchange_id and nfs4_proc_create_session to a client
> > > which
> > > may
> > > lead to mount syscall indefinitely blocked in the following stack
> 
> > NACK. This will break all sorts of recovery scenarios, because it
> > doesn't distinguish between an initial 'mount' and a server reboot
> > recovery situation.
> > Even in the case where we are in the initial mount, it also doesn't
> > distinguish between transient errors such as NFS4ERR_DELAY or
> > reboot
> > errors such as NFS4ERR_STALE_CLIENTID, etc.
> 
> > Exactly what is the scenario that is causing your hang? Let's try
> > to
> > address that with a more targeted fix.
> 
> The scenario is as follows: there are several NFS servers and several
> production machines with multiple NFS mounts. This is a containerized
> multi-tennant workflow so every tennant gets its own NFS mount to
> access their
> data. At some point nfs41_init_clientid fails in the initial
> mount.nfs call
> and all subsequent mount.nfs calls just hang in
> nfs_wait_client_init_complete
> until the original one, where nfs4_proc_exchange_id has failed, is
> killed.
> 
> The cause of the nfs41_init_clientid failure in the production case
> is a timeout.
> The following error message is observed in logs:
>   NFS: state manager: lease expired failed on NFSv4 server <ip> with
> error 110
> 

How about something like the following fix then?
8<-----------------------------------------------
From eb402b489bb0d0ada1a3dd9101d4d7e193402e46 Mon Sep 17 00:00:00 2001
Message-ID: <eb402b489bb0d0ada1a3dd9101d4d7e193402e46.1725904471.git.trond.myklebust@hammerspace.com>
From: Trond Myklebust <trond.myklebust@hammerspace.com>
Date: Mon, 9 Sep 2024 13:47:07 -0400
Subject: [PATCH] NFSv4: Fail mounts if the lease setup times out

If the server is down when the client is trying to mount, so that the
calls to exchange_id or create_session fail, then we should allow the
mount system call to fail rather than hang and block other mount/umount
calls.

Reported-by: Oleksandr Tymoshenko <ovt@google.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4state.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 30aba1dedaba..59dcdf9bc7b4 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2024,6 +2024,12 @@ static int nfs4_handle_reclaim_lease_error(struct nfs_client *clp, int status)
 		nfs_mark_client_ready(clp, -EPERM);
 		clear_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state);
 		return -EPERM;
+	case -ETIMEDOUT:
+		if (clp->cl_cons_state == NFS_CS_SESSION_INITING) {
+			nfs_mark_client_ready(clp, -EIO);
+			return -EIO;
+		}
+		fallthrough;
 	case -EACCES:
 	case -NFS4ERR_DELAY:
 	case -EAGAIN:
-- 
2.46.0


-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH] NFSv4: fix a mount deadlock in NFS v4.1 client
  2024-09-09 17:56     ` Trond Myklebust
@ 2024-09-09 23:06       ` Oleksandr Tymoshenko
  2024-09-10  0:22         ` Trond Myklebust
  0 siblings, 1 reply; 17+ messages in thread
From: Oleksandr Tymoshenko @ 2024-09-09 23:06 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: anna@kernel.org, jbongio@google.com, linux-nfs@vger.kernel.org,
	stable@vger.kernel.org

On Mon, Sep 9, 2024 at 10:56 AM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Mon, 2024-09-09 at 16:36 +0000, Oleksandr Tymoshenko wrote:
> > > > nfs41_init_clientid does not signal a failure condition from
> > > > nfs4_proc_exchange_id and nfs4_proc_create_session to a client
> > > > which
> > > > may
> > > > lead to mount syscall indefinitely blocked in the following stack
> >
> > > NACK. This will break all sorts of recovery scenarios, because it
> > > doesn't distinguish between an initial 'mount' and a server reboot
> > > recovery situation.
> > > Even in the case where we are in the initial mount, it also doesn't
> > > distinguish between transient errors such as NFS4ERR_DELAY or
> > > reboot
> > > errors such as NFS4ERR_STALE_CLIENTID, etc.
> >
> > > Exactly what is the scenario that is causing your hang? Let's try
> > > to
> > > address that with a more targeted fix.
> >
> > The scenario is as follows: there are several NFS servers and several
> > production machines with multiple NFS mounts. This is a containerized
> > multi-tennant workflow so every tennant gets its own NFS mount to
> > access their
> > data. At some point nfs41_init_clientid fails in the initial
> > mount.nfs call
> > and all subsequent mount.nfs calls just hang in
> > nfs_wait_client_init_complete
> > until the original one, where nfs4_proc_exchange_id has failed, is
> > killed.
> >
> > The cause of the nfs41_init_clientid failure in the production case
> > is a timeout.
> > The following error message is observed in logs:
> >   NFS: state manager: lease expired failed on NFSv4 server <ip> with
> > error 110
> >
>
> How about something like the following fix then?
> 8<-----------------------------------------------
> From eb402b489bb0d0ada1a3dd9101d4d7e193402e46 Mon Sep 17 00:00:00 2001
> Message-ID: <eb402b489bb0d0ada1a3dd9101d4d7e193402e46.1725904471.git.trond.myklebust@hammerspace.com>
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> Date: Mon, 9 Sep 2024 13:47:07 -0400
> Subject: [PATCH] NFSv4: Fail mounts if the lease setup times out
>
> If the server is down when the client is trying to mount, so that the
> calls to exchange_id or create_session fail, then we should allow the
> mount system call to fail rather than hang and block other mount/umount
> calls.
>
> Reported-by: Oleksandr Tymoshenko <ovt@google.com>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  fs/nfs/nfs4state.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 30aba1dedaba..59dcdf9bc7b4 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -2024,6 +2024,12 @@ static int nfs4_handle_reclaim_lease_error(struct nfs_client *clp, int status)
>                 nfs_mark_client_ready(clp, -EPERM);
>                 clear_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state);
>                 return -EPERM;
> +       case -ETIMEDOUT:
> +               if (clp->cl_cons_state == NFS_CS_SESSION_INITING) {
> +                       nfs_mark_client_ready(clp, -EIO);
> +                       return -EIO;
> +               }
> +               fallthrough;
>         case -EACCES:
>         case -NFS4ERR_DELAY:
>         case -EAGAIN:
> --

This patch fixes the issue in my simulated environment. ETIMEDOUT is
the error code that
was observed in the production env but I guess it's not the only
possible one. Would it make
sense to handle all error conditions in the NFS_CS_SESSION_INITING
state or are there
some others that are recoverable?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] NFSv4: fix a mount deadlock in NFS v4.1 client
  2024-09-09 23:06       ` [PATCH] NFSv4: fix a mount deadlock in NFS v4.1 client Oleksandr Tymoshenko
@ 2024-09-10  0:22         ` Trond Myklebust
  2024-09-10 21:08           ` Oleksandr Tymoshenko
  0 siblings, 1 reply; 17+ messages in thread
From: Trond Myklebust @ 2024-09-10  0:22 UTC (permalink / raw)
  To: ovt@google.com
  Cc: anna@kernel.org, jbongio@google.com, linux-nfs@vger.kernel.org,
	stable@vger.kernel.org

On Mon, 2024-09-09 at 16:06 -0700, Oleksandr Tymoshenko wrote:
> On Mon, Sep 9, 2024 at 10:56 AM Trond Myklebust
> <trondmy@hammerspace.com> wrote:
> > 
> > On Mon, 2024-09-09 at 16:36 +0000, Oleksandr Tymoshenko wrote:
> > > > > nfs41_init_clientid does not signal a failure condition from
> > > > > nfs4_proc_exchange_id and nfs4_proc_create_session to a
> > > > > client
> > > > > which
> > > > > may
> > > > > lead to mount syscall indefinitely blocked in the following
> > > > > stack
> > > 
> > > > NACK. This will break all sorts of recovery scenarios, because
> > > > it
> > > > doesn't distinguish between an initial 'mount' and a server
> > > > reboot
> > > > recovery situation.
> > > > Even in the case where we are in the initial mount, it also
> > > > doesn't
> > > > distinguish between transient errors such as NFS4ERR_DELAY or
> > > > reboot
> > > > errors such as NFS4ERR_STALE_CLIENTID, etc.
> > > 
> > > > Exactly what is the scenario that is causing your hang? Let's
> > > > try
> > > > to
> > > > address that with a more targeted fix.
> > > 
> > > The scenario is as follows: there are several NFS servers and
> > > several
> > > production machines with multiple NFS mounts. This is a
> > > containerized
> > > multi-tennant workflow so every tennant gets its own NFS mount to
> > > access their
> > > data. At some point nfs41_init_clientid fails in the initial
> > > mount.nfs call
> > > and all subsequent mount.nfs calls just hang in
> > > nfs_wait_client_init_complete
> > > until the original one, where nfs4_proc_exchange_id has failed,
> > > is
> > > killed.
> > > 
> > > The cause of the nfs41_init_clientid failure in the production
> > > case
> > > is a timeout.
> > > The following error message is observed in logs:
> > >   NFS: state manager: lease expired failed on NFSv4 server <ip>
> > > with
> > > error 110
> > > 
> > 
> > How about something like the following fix then?
> > 8<-----------------------------------------------
> > From eb402b489bb0d0ada1a3dd9101d4d7e193402e46 Mon Sep 17 00:00:00
> > 2001
> > Message-ID:
> > <eb402b489bb0d0ada1a3dd9101d4d7e193402e46.1725904471.git.trond.mykl
> > ebust@hammerspace.com>
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > Date: Mon, 9 Sep 2024 13:47:07 -0400
> > Subject: [PATCH] NFSv4: Fail mounts if the lease setup times out
> > 
> > If the server is down when the client is trying to mount, so that
> > the
> > calls to exchange_id or create_session fail, then we should allow
> > the
> > mount system call to fail rather than hang and block other
> > mount/umount
> > calls.
> > 
> > Reported-by: Oleksandr Tymoshenko <ovt@google.com>
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> >  fs/nfs/nfs4state.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > index 30aba1dedaba..59dcdf9bc7b4 100644
> > --- a/fs/nfs/nfs4state.c
> > +++ b/fs/nfs/nfs4state.c
> > @@ -2024,6 +2024,12 @@ static int
> > nfs4_handle_reclaim_lease_error(struct nfs_client *clp, int status)
> >                 nfs_mark_client_ready(clp, -EPERM);
> >                 clear_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state);
> >                 return -EPERM;
> > +       case -ETIMEDOUT:
> > +               if (clp->cl_cons_state == NFS_CS_SESSION_INITING) {
> > +                       nfs_mark_client_ready(clp, -EIO);
> > +                       return -EIO;
> > +               }
> > +               fallthrough;
> >         case -EACCES:
> >         case -NFS4ERR_DELAY:
> >         case -EAGAIN:
> > --
> 
> This patch fixes the issue in my simulated environment. ETIMEDOUT is
> the error code that
> was observed in the production env but I guess it's not the only
> possible one. Would it make
> sense to handle all error conditions in the NFS_CS_SESSION_INITING
> state or are there
> some others that are recoverable?
> 

The only other one that I'm thinking might want to be treated similarly
is the above EACCES error. That's because that is only returned if
there is a problem with your RPCSEC_GSS/krb5 credential. I was thinking
of changing that one too in the same patch, but came to the conclusion
it would be better to treat the two issues with separate fixes.

The other error conditions are all supposed to be transient NFS level
errors. They should not be treated as fatal.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] NFSv4: fix a mount deadlock in NFS v4.1 client
  2024-09-10  0:22         ` Trond Myklebust
@ 2024-09-10 21:08           ` Oleksandr Tymoshenko
  2024-09-23 20:15             ` Oleksandr Tymoshenko
  0 siblings, 1 reply; 17+ messages in thread
From: Oleksandr Tymoshenko @ 2024-09-10 21:08 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: anna@kernel.org, jbongio@google.com, linux-nfs@vger.kernel.org,
	stable@vger.kernel.org

On Mon, Sep 9, 2024 at 5:22 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Mon, 2024-09-09 at 16:06 -0700, Oleksandr Tymoshenko wrote:
> > On Mon, Sep 9, 2024 at 10:56 AM Trond Myklebust
> > <trondmy@hammerspace.com> wrote:
> > >
> > > On Mon, 2024-09-09 at 16:36 +0000, Oleksandr Tymoshenko wrote:
> > > > > > nfs41_init_clientid does not signal a failure condition from
> > > > > > nfs4_proc_exchange_id and nfs4_proc_create_session to a
> > > > > > client
> > > > > > which
> > > > > > may
> > > > > > lead to mount syscall indefinitely blocked in the following
> > > > > > stack
> > > >
> > > > > NACK. This will break all sorts of recovery scenarios, because
> > > > > it
> > > > > doesn't distinguish between an initial 'mount' and a server
> > > > > reboot
> > > > > recovery situation.
> > > > > Even in the case where we are in the initial mount, it also
> > > > > doesn't
> > > > > distinguish between transient errors such as NFS4ERR_DELAY or
> > > > > reboot
> > > > > errors such as NFS4ERR_STALE_CLIENTID, etc.
> > > >
> > > > > Exactly what is the scenario that is causing your hang? Let's
> > > > > try
> > > > > to
> > > > > address that with a more targeted fix.
> > > >
> > > > The scenario is as follows: there are several NFS servers and
> > > > several
> > > > production machines with multiple NFS mounts. This is a
> > > > containerized
> > > > multi-tennant workflow so every tennant gets its own NFS mount to
> > > > access their
> > > > data. At some point nfs41_init_clientid fails in the initial
> > > > mount.nfs call
> > > > and all subsequent mount.nfs calls just hang in
> > > > nfs_wait_client_init_complete
> > > > until the original one, where nfs4_proc_exchange_id has failed,
> > > > is
> > > > killed.
> > > >
> > > > The cause of the nfs41_init_clientid failure in the production
> > > > case
> > > > is a timeout.
> > > > The following error message is observed in logs:
> > > >   NFS: state manager: lease expired failed on NFSv4 server <ip>
> > > > with
> > > > error 110
> > > >
> > >
> > > How about something like the following fix then?
> > > 8<-----------------------------------------------
> > > From eb402b489bb0d0ada1a3dd9101d4d7e193402e46 Mon Sep 17 00:00:00
> > > 2001
> > > Message-ID:
> > > <eb402b489bb0d0ada1a3dd9101d4d7e193402e46.1725904471.git.trond.mykl
> > > ebust@hammerspace.com>
> > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > Date: Mon, 9 Sep 2024 13:47:07 -0400
> > > Subject: [PATCH] NFSv4: Fail mounts if the lease setup times out
> > >
> > > If the server is down when the client is trying to mount, so that
> > > the
> > > calls to exchange_id or create_session fail, then we should allow
> > > the
> > > mount system call to fail rather than hang and block other
> > > mount/umount
> > > calls.
> > >
> > > Reported-by: Oleksandr Tymoshenko <ovt@google.com>
> > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > ---
> > >  fs/nfs/nfs4state.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > > index 30aba1dedaba..59dcdf9bc7b4 100644
> > > --- a/fs/nfs/nfs4state.c
> > > +++ b/fs/nfs/nfs4state.c
> > > @@ -2024,6 +2024,12 @@ static int
> > > nfs4_handle_reclaim_lease_error(struct nfs_client *clp, int status)
> > >                 nfs_mark_client_ready(clp, -EPERM);
> > >                 clear_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state);
> > >                 return -EPERM;
> > > +       case -ETIMEDOUT:
> > > +               if (clp->cl_cons_state == NFS_CS_SESSION_INITING) {
> > > +                       nfs_mark_client_ready(clp, -EIO);
> > > +                       return -EIO;
> > > +               }
> > > +               fallthrough;
> > >         case -EACCES:
> > >         case -NFS4ERR_DELAY:
> > >         case -EAGAIN:
> > > --
> >
> > This patch fixes the issue in my simulated environment. ETIMEDOUT is
> > the error code that
> > was observed in the production env but I guess it's not the only
> > possible one. Would it make
> > sense to handle all error conditions in the NFS_CS_SESSION_INITING
> > state or are there
> > some others that are recoverable?
> >
>
> The only other one that I'm thinking might want to be treated similarly
> is the above EACCES error. That's because that is only returned if
> there is a problem with your RPCSEC_GSS/krb5 credential. I was thinking
> of changing that one too in the same patch, but came to the conclusion
> it would be better to treat the two issues with separate fixes.
>
> The other error conditions are all supposed to be transient NFS level
> errors. They should not be treated as fatal.

Sounds good. Will you submit this patch to the mainline kernel? If so
please add me to Cc. Thanks for looking into this.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] NFSv4: fix a mount deadlock in NFS v4.1 client
  2024-09-10 21:08           ` Oleksandr Tymoshenko
@ 2024-09-23 20:15             ` Oleksandr Tymoshenko
  2024-09-26 20:02               ` Trond Myklebust
  0 siblings, 1 reply; 17+ messages in thread
From: Oleksandr Tymoshenko @ 2024-09-23 20:15 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: anna@kernel.org, jbongio@google.com, linux-nfs@vger.kernel.org,
	stable@vger.kernel.org

Hi Trond,

Following up on this: do you have plans to submit the patch you proposed
or do you want me to rework my submission along the lines you proposed?

On Tue, Sep 10, 2024 at 2:08 PM Oleksandr Tymoshenko <ovt@google.com> wrote:
>
> On Mon, Sep 9, 2024 at 5:22 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
> >
> > On Mon, 2024-09-09 at 16:06 -0700, Oleksandr Tymoshenko wrote:
> > > On Mon, Sep 9, 2024 at 10:56 AM Trond Myklebust
> > > <trondmy@hammerspace.com> wrote:
> > > >
> > > > On Mon, 2024-09-09 at 16:36 +0000, Oleksandr Tymoshenko wrote:
> > > > > > > nfs41_init_clientid does not signal a failure condition from
> > > > > > > nfs4_proc_exchange_id and nfs4_proc_create_session to a
> > > > > > > client
> > > > > > > which
> > > > > > > may
> > > > > > > lead to mount syscall indefinitely blocked in the following
> > > > > > > stack
> > > > >
> > > > > > NACK. This will break all sorts of recovery scenarios, because
> > > > > > it
> > > > > > doesn't distinguish between an initial 'mount' and a server
> > > > > > reboot
> > > > > > recovery situation.
> > > > > > Even in the case where we are in the initial mount, it also
> > > > > > doesn't
> > > > > > distinguish between transient errors such as NFS4ERR_DELAY or
> > > > > > reboot
> > > > > > errors such as NFS4ERR_STALE_CLIENTID, etc.
> > > > >
> > > > > > Exactly what is the scenario that is causing your hang? Let's
> > > > > > try
> > > > > > to
> > > > > > address that with a more targeted fix.
> > > > >
> > > > > The scenario is as follows: there are several NFS servers and
> > > > > several
> > > > > production machines with multiple NFS mounts. This is a
> > > > > containerized
> > > > > multi-tennant workflow so every tennant gets its own NFS mount to
> > > > > access their
> > > > > data. At some point nfs41_init_clientid fails in the initial
> > > > > mount.nfs call
> > > > > and all subsequent mount.nfs calls just hang in
> > > > > nfs_wait_client_init_complete
> > > > > until the original one, where nfs4_proc_exchange_id has failed,
> > > > > is
> > > > > killed.
> > > > >
> > > > > The cause of the nfs41_init_clientid failure in the production
> > > > > case
> > > > > is a timeout.
> > > > > The following error message is observed in logs:
> > > > >   NFS: state manager: lease expired failed on NFSv4 server <ip>
> > > > > with
> > > > > error 110
> > > > >
> > > >
> > > > How about something like the following fix then?
> > > > 8<-----------------------------------------------
> > > > From eb402b489bb0d0ada1a3dd9101d4d7e193402e46 Mon Sep 17 00:00:00
> > > > 2001
> > > > Message-ID:
> > > > <eb402b489bb0d0ada1a3dd9101d4d7e193402e46.1725904471.git.trond.mykl
> > > > ebust@hammerspace.com>
> > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > Date: Mon, 9 Sep 2024 13:47:07 -0400
> > > > Subject: [PATCH] NFSv4: Fail mounts if the lease setup times out
> > > >
> > > > If the server is down when the client is trying to mount, so that
> > > > the
> > > > calls to exchange_id or create_session fail, then we should allow
> > > > the
> > > > mount system call to fail rather than hang and block other
> > > > mount/umount
> > > > calls.
> > > >
> > > > Reported-by: Oleksandr Tymoshenko <ovt@google.com>
> > > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > ---
> > > >  fs/nfs/nfs4state.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > > > index 30aba1dedaba..59dcdf9bc7b4 100644
> > > > --- a/fs/nfs/nfs4state.c
> > > > +++ b/fs/nfs/nfs4state.c
> > > > @@ -2024,6 +2024,12 @@ static int
> > > > nfs4_handle_reclaim_lease_error(struct nfs_client *clp, int status)
> > > >                 nfs_mark_client_ready(clp, -EPERM);
> > > >                 clear_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state);
> > > >                 return -EPERM;
> > > > +       case -ETIMEDOUT:
> > > > +               if (clp->cl_cons_state == NFS_CS_SESSION_INITING) {
> > > > +                       nfs_mark_client_ready(clp, -EIO);
> > > > +                       return -EIO;
> > > > +               }
> > > > +               fallthrough;
> > > >         case -EACCES:
> > > >         case -NFS4ERR_DELAY:
> > > >         case -EAGAIN:
> > > > --
> > >
> > > This patch fixes the issue in my simulated environment. ETIMEDOUT is
> > > the error code that
> > > was observed in the production env but I guess it's not the only
> > > possible one. Would it make
> > > sense to handle all error conditions in the NFS_CS_SESSION_INITING
> > > state or are there
> > > some others that are recoverable?
> > >
> >
> > The only other one that I'm thinking might want to be treated similarly
> > is the above EACCES error. That's because that is only returned if
> > there is a problem with your RPCSEC_GSS/krb5 credential. I was thinking
> > of changing that one too in the same patch, but came to the conclusion
> > it would be better to treat the two issues with separate fixes.
> >
> > The other error conditions are all supposed to be transient NFS level
> > errors. They should not be treated as fatal.
>
> Sounds good. Will you submit this patch to the mainline kernel? If so
> please add me to Cc. Thanks for looking into this.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] NFSv4: fix a mount deadlock in NFS v4.1 client
  2024-09-23 20:15             ` Oleksandr Tymoshenko
@ 2024-09-26 20:02               ` Trond Myklebust
  0 siblings, 0 replies; 17+ messages in thread
From: Trond Myklebust @ 2024-09-26 20:02 UTC (permalink / raw)
  To: ovt@google.com
  Cc: anna@kernel.org, jbongio@google.com, linux-nfs@vger.kernel.org,
	stable@vger.kernel.org

Hi Oleksandr

On Mon, 2024-09-23 at 13:15 -0700, Oleksandr Tymoshenko wrote:
> Hi Trond,
> 
> Following up on this: do you have plans to submit the patch you
> proposed
> or do you want me to rework my submission along the lines you
> proposed?

Let's just merge the patch I proposed. I've sent a "clean" copy to Anna
to send upstream in the next bugfix pull.

Thanks!
  Trond



-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] NFSv4: fix a mount deadlock in NFS v4.1 client
  2024-09-08 16:48 ` Trond Myklebust
  2024-09-09 16:36   ` [PATCH 6.1.y] net: tls: handle backlogging of crypto requests Oleksandr Tymoshenko
@ 2024-09-09 17:46   ` Oleksandr Tymoshenko
  1 sibling, 0 replies; 17+ messages in thread
From: Oleksandr Tymoshenko @ 2024-09-09 17:46 UTC (permalink / raw)
  To: trondmy; +Cc: anna, jbongio, linux-nfs, ovt, stable, gonzo

>> nfs41_init_clientid does not signal a failure condition from
>> nfs4_proc_exchange_id and nfs4_proc_create_session to a client which
>> may
>> lead to mount syscall indefinitely blocked in the following stack

> NACK. This will break all sorts of recovery scenarios, because it
> doesn't distinguish between an initial 'mount' and a server reboot
> recovery situation.
> Even in the case where we are in the initial mount, it also doesn't
> distinguish between transient errors such as NFS4ERR_DELAY or reboot
> errors such as NFS4ERR_STALE_CLIENTID, etc.

> Exactly what is the scenario that is causing your hang? Let's try to
> address that with a more targeted fix.

(re-sending with the correct subject, previous mistake was due to my tools failure)

The scenario is as follows: there are several NFS servers and several
production machines with multiple NFS mounts. This is a containerized
multi-tennant workflow so every tennant gets its own NFS mount to access their
data. At some point nfs41_init_clientid fails in the initial mount.nfs call
and all subsequent mount.nfs calls just hang in nfs_wait_client_init_complete
until the original one, where nfs4_proc_exchange_id has failed, is killed.

The cause of the nfs41_init_clientid failure in the production case is a timeout.
The following error message is observed in logs:
  NFS: state manager: lease expired failed on NFSv4 server <ip> with error 110

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 6.1.y] net: tls: handle backlogging of crypto requests
@ 2024-03-28 12:38 Srish Srinivasan
  2024-03-29  9:23 ` Greg KH
  2024-05-21 10:58 ` Oleksandr Tymoshenko
  0 siblings, 2 replies; 17+ messages in thread
From: Srish Srinivasan @ 2024-03-28 12:38 UTC (permalink / raw)
  To: stable, gregkh
  Cc: borisp, john.fastabend, kuba, davem, edumazet, pabeni, vakul.garg,
	davejwatson, netdev, ajay.kaher, alexey.makhalov,
	vasavi.sirnapalli, Sabrina Dubroca, Simon Horman, Sasha Levin,
	Srish Srinivasan

From: Jakub Kicinski <kuba@kernel.org>

commit 8590541473188741055d27b955db0777569438e3 upstream

Since we're setting the CRYPTO_TFM_REQ_MAY_BACKLOG flag on our
requests to the crypto API, crypto_aead_{encrypt,decrypt} can return
 -EBUSY instead of -EINPROGRESS in valid situations. For example, when
the cryptd queue for AESNI is full (easy to trigger with an
artificially low cryptd.cryptd_max_cpu_qlen), requests will be enqueued
to the backlog but still processed. In that case, the async callback
will also be called twice: first with err == -EINPROGRESS, which it
seems we can just ignore, then with err == 0.

Compared to Sabrina's original patch this version uses the new
tls_*crypt_async_wait() helpers and converts the EBUSY to
EINPROGRESS to avoid having to modify all the error handling
paths. The handling is identical.

Fixes: a54667f6728c ("tls: Add support for encryption using async offload accelerator")
Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls records")
Co-developed-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://lore.kernel.org/netdev/9681d1febfec295449a62300938ed2ae66983f28.1694018970.git.sd@queasysnail.net/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[Srish: fixed merge-conflict in stable branch linux-6.1.y,
needs to go on top of https://lore.kernel.org/stable/20240307155930.913525-1-lee@kernel.org/]
Signed-off-by: Srish Srinivasan <srish.srinivasan@broadcom.com>
---
 net/tls/tls_sw.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 2bd27b777..61b01dfc6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -195,6 +195,17 @@ static void tls_decrypt_done(crypto_completion_data_t *data, int err)
 	struct sock *sk;
 	int aead_size;
 
+	/* If requests get too backlogged crypto API returns -EBUSY and calls
+	 * ->complete(-EINPROGRESS) immediately followed by ->complete(0)
+	 * to make waiting for backlog to flush with crypto_wait_req() easier.
+	 * First wait converts -EBUSY -> -EINPROGRESS, and the second one
+	 * -EINPROGRESS -> 0.
+	 * We have a single struct crypto_async_request per direction, this
+	 * scheme doesn't help us, so just ignore the first ->complete().
+	 */
+	if (err == -EINPROGRESS)
+		return;
+
 	aead_size = sizeof(*aead_req) + crypto_aead_reqsize(aead);
 	aead_size = ALIGN(aead_size, __alignof__(*dctx));
 	dctx = (void *)((u8 *)aead_req + aead_size);
@@ -268,6 +279,10 @@ static int tls_do_decryption(struct sock *sk,
 	}
 
 	ret = crypto_aead_decrypt(aead_req);
+	if (ret == -EBUSY) {
+		ret = tls_decrypt_async_wait(ctx);
+		ret = ret ?: -EINPROGRESS;
+	}
 	if (ret == -EINPROGRESS) {
 		if (darg->async)
 			return 0;
@@ -452,6 +467,9 @@ static void tls_encrypt_done(crypto_completion_data_t *data, int err)
 	bool ready = false;
 	struct sock *sk;
 
+	if (err == -EINPROGRESS) /* see the comment in tls_decrypt_done() */
+		return;
+
 	rec = container_of(aead_req, struct tls_rec, aead_req);
 	msg_en = &rec->msg_encrypted;
 
@@ -560,6 +578,10 @@ static int tls_do_encryption(struct sock *sk,
 	atomic_inc(&ctx->encrypt_pending);
 
 	rc = crypto_aead_encrypt(aead_req);
+	if (rc == -EBUSY) {
+		rc = tls_encrypt_async_wait(ctx);
+		rc = rc ?: -EINPROGRESS;
+	}
 	if (!rc || rc != -EINPROGRESS) {
 		atomic_dec(&ctx->encrypt_pending);
 		sge->offset -= prot->prepend_size;
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 6.1.y] net: tls: handle backlogging of crypto requests
  2024-03-28 12:38 [PATCH 6.1.y] net: tls: handle backlogging of crypto requests Srish Srinivasan
@ 2024-03-29  9:23 ` Greg KH
  2024-03-29 10:32   ` Srish Srinivasan
  2024-05-21 10:58 ` Oleksandr Tymoshenko
  1 sibling, 1 reply; 17+ messages in thread
From: Greg KH @ 2024-03-29  9:23 UTC (permalink / raw)
  To: Srish Srinivasan
  Cc: stable, borisp, john.fastabend, kuba, davem, edumazet, pabeni,
	vakul.garg, davejwatson, netdev, ajay.kaher, alexey.makhalov,
	vasavi.sirnapalli, Sabrina Dubroca, Simon Horman, Sasha Levin

On Thu, Mar 28, 2024 at 06:08:05PM +0530, Srish Srinivasan wrote:
> From: Jakub Kicinski <kuba@kernel.org>
> 
> commit 8590541473188741055d27b955db0777569438e3 upstream
> 
> Since we're setting the CRYPTO_TFM_REQ_MAY_BACKLOG flag on our
> requests to the crypto API, crypto_aead_{encrypt,decrypt} can return
>  -EBUSY instead of -EINPROGRESS in valid situations. For example, when
> the cryptd queue for AESNI is full (easy to trigger with an
> artificially low cryptd.cryptd_max_cpu_qlen), requests will be enqueued
> to the backlog but still processed. In that case, the async callback
> will also be called twice: first with err == -EINPROGRESS, which it
> seems we can just ignore, then with err == 0.
> 
> Compared to Sabrina's original patch this version uses the new
> tls_*crypt_async_wait() helpers and converts the EBUSY to
> EINPROGRESS to avoid having to modify all the error handling
> paths. The handling is identical.
> 
> Fixes: a54667f6728c ("tls: Add support for encryption using async offload accelerator")
> Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls records")
> Co-developed-by: Sabrina Dubroca <sd@queasysnail.net>
> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
> Link: https://lore.kernel.org/netdev/9681d1febfec295449a62300938ed2ae66983f28.1694018970.git.sd@queasysnail.net/
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> Reviewed-by: Simon Horman <horms@kernel.org>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> [Srish: fixed merge-conflict in stable branch linux-6.1.y,
> needs to go on top of https://lore.kernel.org/stable/20240307155930.913525-1-lee@kernel.org/]
> Signed-off-by: Srish Srinivasan <srish.srinivasan@broadcom.com>
> ---
>  net/tls/tls_sw.c | 22 ++++++++++++++++++++++
>  1 file changed, 22 insertions(+)

Now queued up, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6.1.y] net: tls: handle backlogging of crypto requests
  2024-03-29  9:23 ` Greg KH
@ 2024-03-29 10:32   ` Srish Srinivasan
  2024-03-29 11:48     ` Greg KH
  0 siblings, 1 reply; 17+ messages in thread
From: Srish Srinivasan @ 2024-03-29 10:32 UTC (permalink / raw)
  To: Greg KH
  Cc: stable, borisp, john.fastabend, kuba, davem, edumazet, pabeni,
	vakul.garg, davejwatson, netdev, Ajay Kaher, Alexey Makhalov,
	Vasavi Sirnapalli, Sabrina Dubroca, Simon Horman, Sasha Levin

On Fri, Mar 29, 2024 at 2:53 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Thu, Mar 28, 2024 at 06:08:05PM +0530, Srish Srinivasan wrote:
> > From: Jakub Kicinski <kuba@kernel.org>
> >
> > commit 8590541473188741055d27b955db0777569438e3 upstream
> >
> > Since we're setting the CRYPTO_TFM_REQ_MAY_BACKLOG flag on our
> > requests to the crypto API, crypto_aead_{encrypt,decrypt} can return
> >  -EBUSY instead of -EINPROGRESS in valid situations. For example, when
> > the cryptd queue for AESNI is full (easy to trigger with an
> > artificially low cryptd.cryptd_max_cpu_qlen), requests will be enqueued
> > to the backlog but still processed. In that case, the async callback
> > will also be called twice: first with err == -EINPROGRESS, which it
> > seems we can just ignore, then with err == 0.
> >
> > Compared to Sabrina's original patch this version uses the new
> > tls_*crypt_async_wait() helpers and converts the EBUSY to
> > EINPROGRESS to avoid having to modify all the error handling
> > paths. The handling is identical.
> >
> > Fixes: a54667f6728c ("tls: Add support for encryption using async offload accelerator")
> > Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls records")
> > Co-developed-by: Sabrina Dubroca <sd@queasysnail.net>
> > Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
> > Link: https://lore.kernel.org/netdev/9681d1febfec295449a62300938ed2ae66983f28.1694018970.git.sd@queasysnail.net/
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> > Reviewed-by: Simon Horman <horms@kernel.org>
> > Signed-off-by: David S. Miller <davem@davemloft.net>
> > Signed-off-by: Sasha Levin <sashal@kernel.org>
> > [Srish: fixed merge-conflict in stable branch linux-6.1.y,
> > needs to go on top of https://lore.kernel.org/stable/20240307155930.913525-1-lee@kernel.org/]
> > Signed-off-by: Srish Srinivasan <srish.srinivasan@broadcom.com>
> > ---
> >  net/tls/tls_sw.c | 22 ++++++++++++++++++++++
> >  1 file changed, 22 insertions(+)
>
> Now queued up, thanks.
>

Greg, this patch (i.e. v1) has hunk failures.

Just now I have sent v2 for this patch (after resolving hunks).
Requesting you to queue up v2:
https://lore.kernel.org/stable/20240329102540.3888561-1-srish.srinivasan@broadcom.com/T/#m164567a5bd32085931a1b1367ae12e4102870111

Sorry for the inconvenience.

> greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6.1.y] net: tls: handle backlogging of crypto requests
  2024-03-29 10:32   ` Srish Srinivasan
@ 2024-03-29 11:48     ` Greg KH
  0 siblings, 0 replies; 17+ messages in thread
From: Greg KH @ 2024-03-29 11:48 UTC (permalink / raw)
  To: Srish Srinivasan
  Cc: stable, borisp, john.fastabend, kuba, davem, edumazet, pabeni,
	vakul.garg, davejwatson, netdev, Ajay Kaher, Alexey Makhalov,
	Vasavi Sirnapalli, Sabrina Dubroca, Simon Horman, Sasha Levin

On Fri, Mar 29, 2024 at 04:02:57PM +0530, Srish Srinivasan wrote:
> On Fri, Mar 29, 2024 at 2:53 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Thu, Mar 28, 2024 at 06:08:05PM +0530, Srish Srinivasan wrote:
> > > From: Jakub Kicinski <kuba@kernel.org>
> > >
> > > commit 8590541473188741055d27b955db0777569438e3 upstream
> > >
> > > Since we're setting the CRYPTO_TFM_REQ_MAY_BACKLOG flag on our
> > > requests to the crypto API, crypto_aead_{encrypt,decrypt} can return
> > >  -EBUSY instead of -EINPROGRESS in valid situations. For example, when
> > > the cryptd queue for AESNI is full (easy to trigger with an
> > > artificially low cryptd.cryptd_max_cpu_qlen), requests will be enqueued
> > > to the backlog but still processed. In that case, the async callback
> > > will also be called twice: first with err == -EINPROGRESS, which it
> > > seems we can just ignore, then with err == 0.
> > >
> > > Compared to Sabrina's original patch this version uses the new
> > > tls_*crypt_async_wait() helpers and converts the EBUSY to
> > > EINPROGRESS to avoid having to modify all the error handling
> > > paths. The handling is identical.
> > >
> > > Fixes: a54667f6728c ("tls: Add support for encryption using async offload accelerator")
> > > Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls records")
> > > Co-developed-by: Sabrina Dubroca <sd@queasysnail.net>
> > > Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
> > > Link: https://lore.kernel.org/netdev/9681d1febfec295449a62300938ed2ae66983f28.1694018970.git.sd@queasysnail.net/
> > > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> > > Reviewed-by: Simon Horman <horms@kernel.org>
> > > Signed-off-by: David S. Miller <davem@davemloft.net>
> > > Signed-off-by: Sasha Levin <sashal@kernel.org>
> > > [Srish: fixed merge-conflict in stable branch linux-6.1.y,
> > > needs to go on top of https://lore.kernel.org/stable/20240307155930.913525-1-lee@kernel.org/]
> > > Signed-off-by: Srish Srinivasan <srish.srinivasan@broadcom.com>
> > > ---
> > >  net/tls/tls_sw.c | 22 ++++++++++++++++++++++
> > >  1 file changed, 22 insertions(+)
> >
> > Now queued up, thanks.
> >
> 
> Greg, this patch (i.e. v1) has hunk failures.

What do you mean?  it worked here just fine.

> Just now I have sent v2 for this patch (after resolving hunks).
> Requesting you to queue up v2:
> https://lore.kernel.org/stable/20240329102540.3888561-1-srish.srinivasan@broadcom.com/T/#m164567a5bd32085931a1b1367ae12e4102870111

Let me see what the actual difference is...


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 6.1.y] net: tls: handle backlogging of crypto requests
  2024-03-28 12:38 [PATCH 6.1.y] net: tls: handle backlogging of crypto requests Srish Srinivasan
  2024-03-29  9:23 ` Greg KH
@ 2024-05-21 10:58 ` Oleksandr Tymoshenko
  2024-05-21 15:26   ` Greg KH
  1 sibling, 1 reply; 17+ messages in thread
From: Oleksandr Tymoshenko @ 2024-05-21 10:58 UTC (permalink / raw)
  To: srish.srinivasan
  Cc: ajay.kaher, alexey.makhalov, borisp, davejwatson, davem, edumazet,
	gregkh, horms, john.fastabend, kuba, netdev, pabeni, sashal, sd,
	stable, vakul.garg, vasavi.sirnapalli

Hello,

As far as I understand this issue also affects kernel 5.15. Are there any plans
to backport it to 5.15?

Thank you

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 6.1.y] net: tls: handle backlogging of crypto requests
  2024-05-21 10:58 ` Oleksandr Tymoshenko
@ 2024-05-21 15:26   ` Greg KH
  0 siblings, 0 replies; 17+ messages in thread
From: Greg KH @ 2024-05-21 15:26 UTC (permalink / raw)
  To: Oleksandr Tymoshenko
  Cc: srish.srinivasan, ajay.kaher, alexey.makhalov, borisp,
	davejwatson, davem, edumazet, horms, john.fastabend, kuba, netdev,
	pabeni, sashal, sd, stable, vakul.garg, vasavi.sirnapalli

On Tue, May 21, 2024 at 10:58:38AM +0000, Oleksandr Tymoshenko wrote:
> Hello,
> 
> As far as I understand this issue also affects kernel 5.15. Are there any plans
> to backport it to 5.15?

Why not provide a working backport if you are interested in the 5.15.y
kernel tree?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-09-26 20:02 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-06  0:57 [PATCH] NFSv4: fix a mount deadlock in NFS v4.1 client Oleksandr Tymoshenko
2024-09-06  0:58 ` kernel test robot
2024-09-08 16:48 ` Trond Myklebust
2024-09-09 16:36   ` [PATCH 6.1.y] net: tls: handle backlogging of crypto requests Oleksandr Tymoshenko
2024-09-09 17:56     ` Trond Myklebust
2024-09-09 23:06       ` [PATCH] NFSv4: fix a mount deadlock in NFS v4.1 client Oleksandr Tymoshenko
2024-09-10  0:22         ` Trond Myklebust
2024-09-10 21:08           ` Oleksandr Tymoshenko
2024-09-23 20:15             ` Oleksandr Tymoshenko
2024-09-26 20:02               ` Trond Myklebust
2024-09-09 17:46   ` Oleksandr Tymoshenko
  -- strict thread matches above, loose matches on Subject: below --
2024-03-28 12:38 [PATCH 6.1.y] net: tls: handle backlogging of crypto requests Srish Srinivasan
2024-03-29  9:23 ` Greg KH
2024-03-29 10:32   ` Srish Srinivasan
2024-03-29 11:48     ` Greg KH
2024-05-21 10:58 ` Oleksandr Tymoshenko
2024-05-21 15:26   ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox