PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC]

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC]
@ 2024-02-21  8:20 Zhitao Li
  2024-02-21 13:48 ` Trond Myklebust
  0 siblings, 1 reply; 11+ messages in thread
From: Zhitao Li @ 2024-02-21  8:20 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Jeff Layton,
	Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey
  Cc: linux-nfs, linux-kernel, Ping Huang

Hi, everyone,

- Facts:
I have a remote NFS export and I mount the same export on two
different directories in my OS with the same options. There is an
inflight IO under one mounted directory. And then I unmount another
mounted directory with force. The inflight IO ends up with "Unknown
error 512", which is ERESTARTSYS.

OS: Linux kernel v6.7.0
NFS mount options: vers=4.1


- My speculation:
When the same export is mounted on different directories with the same
options, superblock and sunrpc_client will be shared. Unmount with
force will kill all rpc_tasks with ERESTARTSYS in rpc_killall_tasks().
However, no signal gets involved in this case. So ERESTARTSYS is not
handled before entering user mode.

I think there are two unexpected points here:
1. The inflight IO should not fail when I unmount another directory,
though the two directories share the same export.
2. "ERESTARTSYS" should not be seen in user space. EIO may be better.


- Reproduction:
1. Prepare some NFS export, nfsd or nfs-ganesha. For example, the
export is "ip:/export_path".
2. On the latest stable mainstream Linux kernel v6.7.0, mount the
export into two different directories with the same options:
      mount -t nfs -o vers=4.1 ip:/export_path  /mnt/test1
      mount -t nfs -o vers=4.1 ip:/export_path  /mnt/test2
3. Start an inflight IO in "/mnt/test1":
      dd if=/dev/urandom of=/mnt/test1/1G bs=1M count=1024 oflag=direct
4. Umount "/mnt/test2" with force when IO in step 3 is going:
      umount -f /mnt/test2
5. The "dd" is expected to fail with following information:
       # dd if=/dev/urandom of=/mnt/test1/1G bs=1M count=1024 oflag=direct
       dd: error writing '/mnt/test1/1G': Unknown error 512
       214+0 records in
       213+0 records out
       223346688 bytes (223 MB, 213 MiB) copied, 7.87017 s, 28.4 MB/s.


- Helpful links
1. v6.7.0 rpc_killall_tasks():
https://elixir.bootlin.com/linux/v6.7/source/net/sunrpc/clnt.c#L869
2. COMMIT "SUNRPC: Fix up task signalling v5.2-rc1" changes the error
code of rpc_tasks in rpc_killall_tasks() from EIO to ERESTARTSYS. The
link is https://github.com/torvalds/linux/commit/ae67bd3821bb0a54d97e7883d211196637d487a9?diff=split&w=0


Looking forward to your early reply :)

Best regards,
Zhitao Li, in SmartX.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC]
  2024-02-21  8:20 PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC] Zhitao Li
@ 2024-02-21 13:48 ` Trond Myklebust
  2024-02-22  3:05   ` Zhitao Li
  2024-02-22 11:05   ` Jeff Layton
  0 siblings, 2 replies; 11+ messages in thread
From: Trond Myklebust @ 2024-02-21 13:48 UTC (permalink / raw)
  To: chuck.lever@oracle.com, zhitao.li@smartx.com, tom@talpey.com,
	anna@kernel.org, Dai.Ngo@oracle.com, jlayton@kernel.org,
	neilb@suse.de, kolga@netapp.com
  Cc: huangping@smartx.com, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote:
> [You don't often get email from zhitao.li@smartx.com. Learn why this
> is important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> Hi, everyone,
> 
> - Facts:
> I have a remote NFS export and I mount the same export on two
> different directories in my OS with the same options. There is an
> inflight IO under one mounted directory. And then I unmount another
> mounted directory with force. The inflight IO ends up with "Unknown
> error 512", which is ERESTARTSYS.
> 

All of the above is well known. That's because forced umount affects
the entire filesystem. Why are you using it here in the first place? It
is not intended for casual use.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC]
  2024-02-21 13:48 ` Trond Myklebust
@ 2024-02-22  3:05   ` Zhitao Li
  2024-02-22 11:05   ` Jeff Layton
  1 sibling, 0 replies; 11+ messages in thread
From: Zhitao Li @ 2024-02-22  3:05 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: chuck.lever@oracle.com, tom@talpey.com, anna@kernel.org,
	Dai.Ngo@oracle.com, jlayton@kernel.org, neilb@suse.de,
	kolga@netapp.com, huangping@smartx.com, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

Forced unmount is used in a casual way to some degree by our team. We
will adjust the usage of forced unmount and multiple mount points for
the same export.

As you said, all of the above is well known. Actually, this phenomenon
is unexpected for us. Could you give some more information?  Are they
issues? And if they are issues, is there any plan to fix them?

Best regards,
Zhitao Li, in SmartX.

On Wed, Feb 21, 2024 at 9:48 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote:
> > [You don't often get email from zhitao.li@smartx.com. Learn why this
> > is important at https://aka.ms/LearnAboutSenderIdentification ]
> >
> > Hi, everyone,
> >
> > - Facts:
> > I have a remote NFS export and I mount the same export on two
> > different directories in my OS with the same options. There is an
> > inflight IO under one mounted directory. And then I unmount another
> > mounted directory with force. The inflight IO ends up with "Unknown
> > error 512", which is ERESTARTSYS.
> >
>
> All of the above is well known. That's because forced umount affects
> the entire filesystem. Why are you using it here in the first place? It
> is not intended for casual use.
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC]
  2024-02-21 13:48 ` Trond Myklebust
  2024-02-22  3:05   ` Zhitao Li
@ 2024-02-22 11:05   ` Jeff Layton
  2024-02-22 15:20     ` Trond Myklebust
  2024-02-23  3:20     ` Zhitao Li
  1 sibling, 2 replies; 11+ messages in thread
From: Jeff Layton @ 2024-02-22 11:05 UTC (permalink / raw)
  To: Trond Myklebust, chuck.lever@oracle.com, zhitao.li@smartx.com,
	tom@talpey.com, anna@kernel.org, Dai.Ngo@oracle.com,
	neilb@suse.de, kolga@netapp.com
  Cc: huangping@smartx.com, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Wed, 2024-02-21 at 13:48 +0000, Trond Myklebust wrote:
> On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote:
> > [You don't often get email from zhitao.li@smartx.com. Learn why this
> > is important at https://aka.ms/LearnAboutSenderIdentification ]
> > 
> > Hi, everyone,
> > 
> > - Facts:
> > I have a remote NFS export and I mount the same export on two
> > different directories in my OS with the same options. There is an
> > inflight IO under one mounted directory. And then I unmount another
> > mounted directory with force. The inflight IO ends up with "Unknown
> > error 512", which is ERESTARTSYS.
> > 
> 
> All of the above is well known. That's because forced umount affects
> the entire filesystem. Why are you using it here in the first place? It
> is not intended for casual use.
> 

While I agree Trond's above statement, the kernel is not supposed to
leak error codes that high into userland. Are you seeing ERESTARTSYS
being returned to system calls? If so, which ones?
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC]
  2024-02-22 11:05   ` Jeff Layton
@ 2024-02-22 15:20     ` Trond Myklebust
  2024-02-23  3:44       ` Zhitao Li
  2024-02-23 10:31       ` Jeff Layton
  2024-02-23  3:20     ` Zhitao Li
  1 sibling, 2 replies; 11+ messages in thread
From: Trond Myklebust @ 2024-02-22 15:20 UTC (permalink / raw)
  To: chuck.lever@oracle.com, zhitao.li@smartx.com, kolga@netapp.com,
	anna@kernel.org, tom@talpey.com, jlayton@kernel.org,
	neilb@suse.de, Dai.Ngo@oracle.com
  Cc: huangping@smartx.com, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Thu, 2024-02-22 at 06:05 -0500, Jeff Layton wrote:
> On Wed, 2024-02-21 at 13:48 +0000, Trond Myklebust wrote:
> > On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote:
> > > [You don't often get email from zhitao.li@smartx.com. Learn why
> > > this
> > > is important at https://aka.ms/LearnAboutSenderIdentification ]
> > > 
> > > Hi, everyone,
> > > 
> > > - Facts:
> > > I have a remote NFS export and I mount the same export on two
> > > different directories in my OS with the same options. There is an
> > > inflight IO under one mounted directory. And then I unmount
> > > another
> > > mounted directory with force. The inflight IO ends up with
> > > "Unknown
> > > error 512", which is ERESTARTSYS.
> > > 
> > 
> > All of the above is well known. That's because forced umount
> > affects
> > the entire filesystem. Why are you using it here in the first
> > place? It
> > is not intended for casual use.
> > 
> 
> While I agree Trond's above statement, the kernel is not supposed to
> leak error codes that high into userland. Are you seeing ERESTARTSYS
> being returned to system calls? If so, which ones?

The point of forced umount is to kill all RPC calls associated with the
filesystem in order to unblock the umount. Basically, it triggers this
code before the unmount starts:

void nfs_umount_begin(struct super_block *sb)
{
        struct nfs_server *server;
        struct rpc_clnt *rpc;

        server = NFS_SB(sb);
        /* -EIO all pending I/O */
        rpc = server->client_acl;
        if (!IS_ERR(rpc))
                rpc_killall_tasks(rpc);
        rpc = server->client;
        if (!IS_ERR(rpc))
                rpc_killall_tasks(rpc);
}

So yes, that does signal all the way up to the application level, and
it is very much intended to do so.
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC]
  2024-02-22 15:20     ` Trond Myklebust
@ 2024-02-23  3:44       ` Zhitao Li
  2024-02-23 10:31       ` Jeff Layton
  1 sibling, 0 replies; 11+ messages in thread
From: Zhitao Li @ 2024-02-23  3:44 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: chuck.lever@oracle.com, kolga@netapp.com, anna@kernel.org,
	tom@talpey.com, jlayton@kernel.org, neilb@suse.de,
	Dai.Ngo@oracle.com, huangping@smartx.com,
	linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org

Thanks for Trond's reply.

ERESTARTSYS is an unknown error in application level.  Maybe EIO or
EINTR can be more suitable.

Best regards,
Zhitao Li, at SmartX

On Thu, Feb 22, 2024 at 11:20 PM Trond Myklebust
<trondmy@hammerspace.com> wrote:
>
> On Thu, 2024-02-22 at 06:05 -0500, Jeff Layton wrote:
> > On Wed, 2024-02-21 at 13:48 +0000, Trond Myklebust wrote:
> > > On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote:
> > > > [You don't often get email from zhitao.li@smartx.com. Learn why
> > > > this
> > > > is important at https://aka.ms/LearnAboutSenderIdentification ]
> > > >
> > > > Hi, everyone,
> > > >
> > > > - Facts:
> > > > I have a remote NFS export and I mount the same export on two
> > > > different directories in my OS with the same options. There is an
> > > > inflight IO under one mounted directory. And then I unmount
> > > > another
> > > > mounted directory with force. The inflight IO ends up with
> > > > "Unknown
> > > > error 512", which is ERESTARTSYS.
> > > >
> > >
> > > All of the above is well known. That's because forced umount
> > > affects
> > > the entire filesystem. Why are you using it here in the first
> > > place? It
> > > is not intended for casual use.
> > >
> >
> > While I agree Trond's above statement, the kernel is not supposed to
> > leak error codes that high into userland. Are you seeing ERESTARTSYS
> > being returned to system calls? If so, which ones?
>
> The point of forced umount is to kill all RPC calls associated with the
> filesystem in order to unblock the umount. Basically, it triggers this
> code before the unmount starts:
>
> void nfs_umount_begin(struct super_block *sb)
> {
>         struct nfs_server *server;
>         struct rpc_clnt *rpc;
>
>         server = NFS_SB(sb);
>         /* -EIO all pending I/O */
>         rpc = server->client_acl;
>         if (!IS_ERR(rpc))
>                 rpc_killall_tasks(rpc);
>         rpc = server->client;
>         if (!IS_ERR(rpc))
>                 rpc_killall_tasks(rpc);
> }
>
> So yes, that does signal all the way up to the application level, and
> it is very much intended to do so.
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC]
  2024-02-22 15:20     ` Trond Myklebust
  2024-02-23  3:44       ` Zhitao Li
@ 2024-02-23 10:31       ` Jeff Layton
  2024-02-27  2:35         ` Zhitao Li
  1 sibling, 1 reply; 11+ messages in thread
From: Jeff Layton @ 2024-02-23 10:31 UTC (permalink / raw)
  To: Trond Myklebust, chuck.lever@oracle.com, zhitao.li@smartx.com,
	kolga@netapp.com, anna@kernel.org, tom@talpey.com, neilb@suse.de,
	Dai.Ngo@oracle.com
  Cc: huangping@smartx.com, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Thu, 2024-02-22 at 15:20 +0000, Trond Myklebust wrote:
> On Thu, 2024-02-22 at 06:05 -0500, Jeff Layton wrote:
> > On Wed, 2024-02-21 at 13:48 +0000, Trond Myklebust wrote:
> > > On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote:
> > > > [You don't often get email from zhitao.li@smartx.com. Learn why
> > > > this
> > > > is important at https://aka.ms/LearnAboutSenderIdentification ]
> > > > 
> > > > Hi, everyone,
> > > > 
> > > > - Facts:
> > > > I have a remote NFS export and I mount the same export on two
> > > > different directories in my OS with the same options. There is an
> > > > inflight IO under one mounted directory. And then I unmount
> > > > another
> > > > mounted directory with force. The inflight IO ends up with
> > > > "Unknown
> > > > error 512", which is ERESTARTSYS.
> > > > 
> > > 
> > > All of the above is well known. That's because forced umount
> > > affects
> > > the entire filesystem. Why are you using it here in the first
> > > place? It
> > > is not intended for casual use.
> > > 
> > 
> > While I agree Trond's above statement, the kernel is not supposed to
> > leak error codes that high into userland. Are you seeing ERESTARTSYS
> > being returned to system calls? If so, which ones?
> 
> The point of forced umount is to kill all RPC calls associated with the
> filesystem in order to unblock the umount. Basically, it triggers this
> code before the unmount starts:
> 
> void nfs_umount_begin(struct super_block *sb)
> {
>         struct nfs_server *server;
>         struct rpc_clnt *rpc;
> 
>         server = NFS_SB(sb);
>         /* -EIO all pending I/O */
>         rpc = server->client_acl;
>         if (!IS_ERR(rpc))
>                 rpc_killall_tasks(rpc);
>         rpc = server->client;
>         if (!IS_ERR(rpc))
>                 rpc_killall_tasks(rpc);
> }
> 
> So yes, that does signal all the way up to the application level, and
> it is very much intended to do so.

Returning an error to userland in this situation is fine, but userland
programs aren't really equipped to deal with error numbers in this
range.

Emphasis on the first sentence in the comment in include/linux/errno.h:

-------------------8<-----------------------
/*
 * These should never be seen by user programs.  To return one of ERESTART*
 * codes, signal_pending() MUST be set.  Note that ptrace can observe these
 * at syscall exit tracing, but they will never be left for the debugged user
 * process to see.
 */
#define ERESTARTSYS     512
#define ERESTARTNOINTR  513
#define ERESTARTNOHAND  514     /* restart if no handler.. */
#define ENOIOCTLCMD     515     /* No ioctl command */
#define ERESTART_RESTARTBLOCK 516 /* restart by calling sys_restart_syscall */
#define EPROBE_DEFER    517     /* Driver requests probe retry */
#define EOPENSTALE      518     /* open found a stale dentry */
#define ENOPARAM        519     /* Parameter not supported */
-------------------8<-----------------------

If these values are leaking into userland, then that seems like a bug.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC]
  2024-02-23 10:31       ` Jeff Layton
@ 2024-02-27  2:35         ` Zhitao Li
  0 siblings, 0 replies; 11+ messages in thread
From: Zhitao Li @ 2024-02-27  2:35 UTC (permalink / raw)
  To: Trond Myklebust, Jeff Layton
  Cc: chuck.lever@oracle.com, kolga@netapp.com, anna@kernel.org,
	tom@talpey.com, neilb@suse.de, Dai.Ngo@oracle.com,
	huangping@smartx.com, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

Is there any plan for this ERESTARTSYS leak issue?

--
Zhitao Li, at SmartX

On Fri, Feb 23, 2024 at 6:31 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Thu, 2024-02-22 at 15:20 +0000, Trond Myklebust wrote:
> > On Thu, 2024-02-22 at 06:05 -0500, Jeff Layton wrote:
> > > On Wed, 2024-02-21 at 13:48 +0000, Trond Myklebust wrote:
> > > > On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote:
> > > > > [You don't often get email from zhitao.li@smartx.com. Learn why
> > > > > this
> > > > > is important at https://aka.ms/LearnAboutSenderIdentification ]
> > > > >
> > > > > Hi, everyone,
> > > > >
> > > > > - Facts:
> > > > > I have a remote NFS export and I mount the same export on two
> > > > > different directories in my OS with the same options. There is an
> > > > > inflight IO under one mounted directory. And then I unmount
> > > > > another
> > > > > mounted directory with force. The inflight IO ends up with
> > > > > "Unknown
> > > > > error 512", which is ERESTARTSYS.
> > > > >
> > > >
> > > > All of the above is well known. That's because forced umount
> > > > affects
> > > > the entire filesystem. Why are you using it here in the first
> > > > place? It
> > > > is not intended for casual use.
> > > >
> > >
> > > While I agree Trond's above statement, the kernel is not supposed to
> > > leak error codes that high into userland. Are you seeing ERESTARTSYS
> > > being returned to system calls? If so, which ones?
> >
> > The point of forced umount is to kill all RPC calls associated with the
> > filesystem in order to unblock the umount. Basically, it triggers this
> > code before the unmount starts:
> >
> > void nfs_umount_begin(struct super_block *sb)
> > {
> >         struct nfs_server *server;
> >         struct rpc_clnt *rpc;
> >
> >         server = NFS_SB(sb);
> >         /* -EIO all pending I/O */
> >         rpc = server->client_acl;
> >         if (!IS_ERR(rpc))
> >                 rpc_killall_tasks(rpc);
> >         rpc = server->client;
> >         if (!IS_ERR(rpc))
> >                 rpc_killall_tasks(rpc);
> > }
> >
> > So yes, that does signal all the way up to the application level, and
> > it is very much intended to do so.
>
> Returning an error to userland in this situation is fine, but userland
> programs aren't really equipped to deal with error numbers in this
> range.
>
> Emphasis on the first sentence in the comment in include/linux/errno.h:
>
> -------------------8<-----------------------
> /*
>  * These should never be seen by user programs.  To return one of ERESTART*
>  * codes, signal_pending() MUST be set.  Note that ptrace can observe these
>  * at syscall exit tracing, but they will never be left for the debugged user
>  * process to see.
>  */
> #define ERESTARTSYS     512
> #define ERESTARTNOINTR  513
> #define ERESTARTNOHAND  514     /* restart if no handler.. */
> #define ENOIOCTLCMD     515     /* No ioctl command */
> #define ERESTART_RESTARTBLOCK 516 /* restart by calling sys_restart_syscall */
> #define EPROBE_DEFER    517     /* Driver requests probe retry */
> #define EOPENSTALE      518     /* open found a stale dentry */
> #define ENOPARAM        519     /* Parameter not supported */
> -------------------8<-----------------------
>
> If these values are leaking into userland, then that seems like a bug.
> --
> Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC]
  2024-02-22 11:05   ` Jeff Layton
  2024-02-22 15:20     ` Trond Myklebust
@ 2024-02-23  3:20     ` Zhitao Li
  2024-02-27 23:55       ` NeilBrown
  1 sibling, 1 reply; 11+ messages in thread
From: Zhitao Li @ 2024-02-23  3:20 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Trond Myklebust, chuck.lever@oracle.com, tom@talpey.com,
	anna@kernel.org, Dai.Ngo@oracle.com, neilb@suse.de,
	kolga@netapp.com, huangping@smartx.com, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

Thanks for Jeff's reply.

I did see  ERESTARTSYS in userland. As described in the above
"Reproduction" chapter, "dd" fails with "dd: error writing
'/mnt/test1/1G': Unknown error 512".

After strace "dd", it turns out that syscall WRITE fails with:
write(1, "4\303\31\211\316\237\333\r-\275g\370\233\374X\277\374Tb\202\24\365\220\320\16\27o3\331q\344\364"...,
1048576) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)

In fact, other syscalls related to file systems can also fail with
ERESTARTSYS in our cases, for example: mount, open, read, write and so
on.

Maybe the reason is that on forced unmount, rpc_killall_tasks() in
net/sunrpc/clnt.c will set all inflight IO with ERESTARTSYS, while no
signal gets involved. So ERESTARTSYS is not handled before entering
userspace.

Best regards,
Zhitao Li at SmartX.

On Thu, Feb 22, 2024 at 7:05 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Wed, 2024-02-21 at 13:48 +0000, Trond Myklebust wrote:
> > On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote:
> > > [You don't often get email from zhitao.li@smartx.com. Learn why this
> > > is important at https://aka.ms/LearnAboutSenderIdentification ]
> > >
> > > Hi, everyone,
> > >
> > > - Facts:
> > > I have a remote NFS export and I mount the same export on two
> > > different directories in my OS with the same options. There is an
> > > inflight IO under one mounted directory. And then I unmount another
> > > mounted directory with force. The inflight IO ends up with "Unknown
> > > error 512", which is ERESTARTSYS.
> > >
> >
> > All of the above is well known. That's because forced umount affects
> > the entire filesystem. Why are you using it here in the first place? It
> > is not intended for casual use.
> >
>
> While I agree Trond's above statement, the kernel is not supposed to
> leak error codes that high into userland. Are you seeing ERESTARTSYS
> being returned to system calls? If so, which ones?
> --
> Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC]
  2024-02-23  3:20     ` Zhitao Li
@ 2024-02-27 23:55       ` NeilBrown
  2024-02-28  3:37         ` Zhitao Li
  0 siblings, 1 reply; 11+ messages in thread
From: NeilBrown @ 2024-02-27 23:55 UTC (permalink / raw)
  To: Zhitao Li
  Cc: Jeff Layton, Trond Myklebust, chuck.lever@oracle.com,
	tom@talpey.com, anna@kernel.org, Dai.Ngo@oracle.com,
	kolga@netapp.com, huangping@smartx.com, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Fri, 23 Feb 2024, Zhitao Li wrote:
> Thanks for Jeff's reply.
> 
> I did see  ERESTARTSYS in userland. As described in the above
> "Reproduction" chapter, "dd" fails with "dd: error writing
> '/mnt/test1/1G': Unknown error 512".
> 
> After strace "dd", it turns out that syscall WRITE fails with:
> write(1, "4\303\31\211\316\237\333\r-\275g\370\233\374X\277\374Tb\202\24\365\220\320\16\27o3\331q\344\364"...,
> 1048576) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> 
> In fact, other syscalls related to file systems can also fail with
> ERESTARTSYS in our cases, for example: mount, open, read, write and so
> on.
> 
> Maybe the reason is that on forced unmount, rpc_killall_tasks() in
> net/sunrpc/clnt.c will set all inflight IO with ERESTARTSYS, while no
> signal gets involved. So ERESTARTSYS is not handled before entering
> userspace.
> 
> Best regards,
> Zhitao Li at SmartX.
> 
> On Thu, Feb 22, 2024 at 7:05 PM Jeff Layton <jlayton@kernel.org> wrote:
> >
> > On Wed, 2024-02-21 at 13:48 +0000, Trond Myklebust wrote:
> > > On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote:
> > > > [You don't often get email from zhitao.li@smartx.com. Learn why this
> > > > is important at https://aka.ms/LearnAboutSenderIdentification ]
> > > >
> > > > Hi, everyone,
> > > >
> > > > - Facts:
> > > > I have a remote NFS export and I mount the same export on two
> > > > different directories in my OS with the same options. There is an
> > > > inflight IO under one mounted directory. And then I unmount another
> > > > mounted directory with force. The inflight IO ends up with "Unknown
> > > > error 512", which is ERESTARTSYS.
> > > >
> > >
> > > All of the above is well known. That's because forced umount affects
> > > the entire filesystem. Why are you using it here in the first place? It
> > > is not intended for casual use.
> > >
> >
> > While I agree Trond's above statement, the kernel is not supposed to
> > leak error codes that high into userland. Are you seeing ERESTARTSYS
> > being returned to system calls? If so, which ones?
> > --
> > Jeff Layton <jlayton@kernel.org>
> 

I think this bug was introduced by 
Commit ae67bd3821bb ("SUNRPC: Fix up task signalling")
in Linux v5.2.

Prior to that commit, rpc_killall_tasks set the error to -EIO.
After that commit it calls rpc_signal_task which always uses
-ERESTARTSYS.

This might be an appropriate fix.  Can you please test and confirm?

diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index 2d61987b3545..ed3a116efd5d 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -222,7 +222,7 @@ void		rpc_put_task(struct rpc_task *);
 void		rpc_put_task_async(struct rpc_task *);
 bool		rpc_task_set_rpc_status(struct rpc_task *task, int rpc_status);
 void		rpc_task_try_cancel(struct rpc_task *task, int error);
-void		rpc_signal_task(struct rpc_task *);
+void		rpc_signal_task(struct rpc_task *, int);
 void		rpc_exit_task(struct rpc_task *);
 void		rpc_exit(struct rpc_task *, int);
 void		rpc_release_calldata(const struct rpc_call_ops *, void *);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index cda0935a68c9..cdbdfae13030 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -895,7 +895,7 @@ void rpc_killall_tasks(struct rpc_clnt *clnt)
 	trace_rpc_clnt_killall(clnt);
 	spin_lock(&clnt->cl_lock);
 	list_for_each_entry(rovr, &clnt->cl_tasks, tk_task)
-		rpc_signal_task(rovr);
+		rpc_signal_task(rovr, -EIO);
 	spin_unlock(&clnt->cl_lock);
 }
 EXPORT_SYMBOL_GPL(rpc_killall_tasks);
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 6debf4fd42d4..e88621881036 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -852,14 +852,14 @@ void rpc_exit_task(struct rpc_task *task)
 	}
 }
 
-void rpc_signal_task(struct rpc_task *task)
+void rpc_signal_task(struct rpc_task *task, int sig)
 {
 	struct rpc_wait_queue *queue;
 
 	if (!RPC_IS_ACTIVATED(task))
 		return;
 
-	if (!rpc_task_set_rpc_status(task, -ERESTARTSYS))
+	if (!rpc_task_set_rpc_status(task, sig))
 		return;
 	trace_rpc_task_signalled(task, task->tk_action);
 	set_bit(RPC_TASK_SIGNALLED, &task->tk_runstate);
@@ -992,7 +992,7 @@ static void __rpc_execute(struct rpc_task *task)
 			 * clean up after sleeping on some queue, we don't
 			 * break the loop here, but go around once more.
 			 */
-			rpc_signal_task(task);
+			rpc_signal_task(task, -ERESTARTSYS);
 		}
 		trace_rpc_task_sync_wake(task, task->tk_action);
 	}

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC]
  2024-02-27 23:55       ` NeilBrown
@ 2024-02-28  3:37         ` Zhitao Li
  0 siblings, 0 replies; 11+ messages in thread
From: Zhitao Li @ 2024-02-28  3:37 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jeff Layton, Trond Myklebust, chuck.lever@oracle.com,
	tom@talpey.com, anna@kernel.org, Dai.Ngo@oracle.com,
	kolga@netapp.com, huangping@smartx.com, linux-nfs@vger.kernel.org,
	linux-kernel@vger.kernel.org

Thanks for your help.

The fix does work in my cases. "dd" fails with EIO when its mount
point or another mount point with the same export is unmounted with
force.

1. Another mount point is unmounted with force:
[root@v6 zhitaoli]#   dd if=/dev/urandom of=/mnt/test/1G bs=1M
count=1024 oflag=direct
dd: error writing '/mnt/test/1G': Input/output error
100+0 records in
99+0 records out
103809024 bytes (104 MB, 99 MiB) copied, 5.05276 s, 20.5 MB/s

2. The mount point is unmounted with force:
[root@v6 zhitaoli]#   dd if=/dev/urandom of=/mnt/test/1G bs=1M
count=1024 oflag=direct
dd: error writing '/mnt/test/1G': Input/output error
60+0 records in
59+0 records out
61865984 bytes (62 MB, 59 MiB) copied, 2.13265 s, 29.0 MB/s

Best regards
Zhitao Li, at SmartX

On Wed, Feb 28, 2024 at 7:55 AM NeilBrown <neilb@suse.de> wrote:
>
> On Fri, 23 Feb 2024, Zhitao Li wrote:
> > Thanks for Jeff's reply.
> >
> > I did see  ERESTARTSYS in userland. As described in the above
> > "Reproduction" chapter, "dd" fails with "dd: error writing
> > '/mnt/test1/1G': Unknown error 512".
> >
> > After strace "dd", it turns out that syscall WRITE fails with:
> > write(1, "4\303\31\211\316\237\333\r-\275g\370\233\374X\277\374Tb\202\24\365\220\320\16\27o3\331q\344\364"...,
> > 1048576) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
> >
> > In fact, other syscalls related to file systems can also fail with
> > ERESTARTSYS in our cases, for example: mount, open, read, write and so
> > on.
> >
> > Maybe the reason is that on forced unmount, rpc_killall_tasks() in
> > net/sunrpc/clnt.c will set all inflight IO with ERESTARTSYS, while no
> > signal gets involved. So ERESTARTSYS is not handled before entering
> > userspace.
> >
> > Best regards,
> > Zhitao Li at SmartX.
> >
> > On Thu, Feb 22, 2024 at 7:05 PM Jeff Layton <jlayton@kernel.org> wrote:
> > >
> > > On Wed, 2024-02-21 at 13:48 +0000, Trond Myklebust wrote:
> > > > On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote:
> > > > > [You don't often get email from zhitao.li@smartx.com. Learn why this
> > > > > is important at https://aka.ms/LearnAboutSenderIdentification ]
> > > > >
> > > > > Hi, everyone,
> > > > >
> > > > > - Facts:
> > > > > I have a remote NFS export and I mount the same export on two
> > > > > different directories in my OS with the same options. There is an
> > > > > inflight IO under one mounted directory. And then I unmount another
> > > > > mounted directory with force. The inflight IO ends up with "Unknown
> > > > > error 512", which is ERESTARTSYS.
> > > > >
> > > >
> > > > All of the above is well known. That's because forced umount affects
> > > > the entire filesystem. Why are you using it here in the first place? It
> > > > is not intended for casual use.
> > > >
> > >
> > > While I agree Trond's above statement, the kernel is not supposed to
> > > leak error codes that high into userland. Are you seeing ERESTARTSYS
> > > being returned to system calls? If so, which ones?
> > > --
> > > Jeff Layton <jlayton@kernel.org>
> >
>
> I think this bug was introduced by
> Commit ae67bd3821bb ("SUNRPC: Fix up task signalling")
> in Linux v5.2.
>
> Prior to that commit, rpc_killall_tasks set the error to -EIO.
> After that commit it calls rpc_signal_task which always uses
> -ERESTARTSYS.
>
> This might be an appropriate fix.  Can you please test and confirm?
>
> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
> index 2d61987b3545..ed3a116efd5d 100644
> --- a/include/linux/sunrpc/sched.h
> +++ b/include/linux/sunrpc/sched.h
> @@ -222,7 +222,7 @@ void                rpc_put_task(struct rpc_task *);
>  void           rpc_put_task_async(struct rpc_task *);
>  bool           rpc_task_set_rpc_status(struct rpc_task *task, int rpc_status);
>  void           rpc_task_try_cancel(struct rpc_task *task, int error);
> -void           rpc_signal_task(struct rpc_task *);
> +void           rpc_signal_task(struct rpc_task *, int);
>  void           rpc_exit_task(struct rpc_task *);
>  void           rpc_exit(struct rpc_task *, int);
>  void           rpc_release_calldata(const struct rpc_call_ops *, void *);
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index cda0935a68c9..cdbdfae13030 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -895,7 +895,7 @@ void rpc_killall_tasks(struct rpc_clnt *clnt)
>         trace_rpc_clnt_killall(clnt);
>         spin_lock(&clnt->cl_lock);
>         list_for_each_entry(rovr, &clnt->cl_tasks, tk_task)
> -               rpc_signal_task(rovr);
> +               rpc_signal_task(rovr, -EIO);
>         spin_unlock(&clnt->cl_lock);
>  }
>  EXPORT_SYMBOL_GPL(rpc_killall_tasks);
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index 6debf4fd42d4..e88621881036 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -852,14 +852,14 @@ void rpc_exit_task(struct rpc_task *task)
>         }
>  }
>
> -void rpc_signal_task(struct rpc_task *task)
> +void rpc_signal_task(struct rpc_task *task, int sig)
>  {
>         struct rpc_wait_queue *queue;
>
>         if (!RPC_IS_ACTIVATED(task))
>                 return;
>
> -       if (!rpc_task_set_rpc_status(task, -ERESTARTSYS))
> +       if (!rpc_task_set_rpc_status(task, sig))
>                 return;
>         trace_rpc_task_signalled(task, task->tk_action);
>         set_bit(RPC_TASK_SIGNALLED, &task->tk_runstate);
> @@ -992,7 +992,7 @@ static void __rpc_execute(struct rpc_task *task)
>                          * clean up after sleeping on some queue, we don't
>                          * break the loop here, but go around once more.
>                          */
> -                       rpc_signal_task(task);
> +                       rpc_signal_task(task, -ERESTARTSYS);
>                 }
>                 trace_rpc_task_sync_wake(task, task->tk_action);
>         }

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-02-28  3:38 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-21  8:20 PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC] Zhitao Li
2024-02-21 13:48 ` Trond Myklebust
2024-02-22  3:05   ` Zhitao Li
2024-02-22 11:05   ` Jeff Layton
2024-02-22 15:20     ` Trond Myklebust
2024-02-23  3:44       ` Zhitao Li
2024-02-23 10:31       ` Jeff Layton
2024-02-27  2:35         ` Zhitao Li
2024-02-23  3:20     ` Zhitao Li
2024-02-27 23:55       ` NeilBrown
2024-02-28  3:37         ` Zhitao Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox