* PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC]
@ 2024-02-21 8:20 Zhitao Li
2024-02-21 13:48 ` Trond Myklebust
0 siblings, 1 reply; 11+ messages in thread
From: Zhitao Li @ 2024-02-21 8:20 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker, Chuck Lever, Jeff Layton,
Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, linux-kernel, Ping Huang
Hi, everyone,
- Facts:
I have a remote NFS export and I mount the same export on two
different directories in my OS with the same options. There is an
inflight IO under one mounted directory. And then I unmount another
mounted directory with force. The inflight IO ends up with "Unknown
error 512", which is ERESTARTSYS.
OS: Linux kernel v6.7.0
NFS mount options: vers=4.1
- My speculation:
When the same export is mounted on different directories with the same
options, superblock and sunrpc_client will be shared. Unmount with
force will kill all rpc_tasks with ERESTARTSYS in rpc_killall_tasks().
However, no signal gets involved in this case. So ERESTARTSYS is not
handled before entering user mode.
I think there are two unexpected points here:
1. The inflight IO should not fail when I unmount another directory,
though the two directories share the same export.
2. "ERESTARTSYS" should not be seen in user space. EIO may be better.
- Reproduction:
1. Prepare some NFS export, nfsd or nfs-ganesha. For example, the
export is "ip:/export_path".
2. On the latest stable mainstream Linux kernel v6.7.0, mount the
export into two different directories with the same options:
mount -t nfs -o vers=4.1 ip:/export_path /mnt/test1
mount -t nfs -o vers=4.1 ip:/export_path /mnt/test2
3. Start an inflight IO in "/mnt/test1":
dd if=/dev/urandom of=/mnt/test1/1G bs=1M count=1024 oflag=direct
4. Umount "/mnt/test2" with force when IO in step 3 is going:
umount -f /mnt/test2
5. The "dd" is expected to fail with following information:
# dd if=/dev/urandom of=/mnt/test1/1G bs=1M count=1024 oflag=direct
dd: error writing '/mnt/test1/1G': Unknown error 512
214+0 records in
213+0 records out
223346688 bytes (223 MB, 213 MiB) copied, 7.87017 s, 28.4 MB/s.
- Helpful links
1. v6.7.0 rpc_killall_tasks():
https://elixir.bootlin.com/linux/v6.7/source/net/sunrpc/clnt.c#L869
2. COMMIT "SUNRPC: Fix up task signalling v5.2-rc1" changes the error
code of rpc_tasks in rpc_killall_tasks() from EIO to ERESTARTSYS. The
link is https://github.com/torvalds/linux/commit/ae67bd3821bb0a54d97e7883d211196637d487a9?diff=split&w=0
Looking forward to your early reply :)
Best regards,
Zhitao Li, in SmartX.
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC] 2024-02-21 8:20 PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC] Zhitao Li @ 2024-02-21 13:48 ` Trond Myklebust 2024-02-22 3:05 ` Zhitao Li 2024-02-22 11:05 ` Jeff Layton 0 siblings, 2 replies; 11+ messages in thread From: Trond Myklebust @ 2024-02-21 13:48 UTC (permalink / raw) To: chuck.lever@oracle.com, zhitao.li@smartx.com, tom@talpey.com, anna@kernel.org, Dai.Ngo@oracle.com, jlayton@kernel.org, neilb@suse.de, kolga@netapp.com Cc: huangping@smartx.com, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote: > [You don't often get email from zhitao.li@smartx.com. Learn why this > is important at https://aka.ms/LearnAboutSenderIdentification ] > > Hi, everyone, > > - Facts: > I have a remote NFS export and I mount the same export on two > different directories in my OS with the same options. There is an > inflight IO under one mounted directory. And then I unmount another > mounted directory with force. The inflight IO ends up with "Unknown > error 512", which is ERESTARTSYS. > All of the above is well known. That's because forced umount affects the entire filesystem. Why are you using it here in the first place? It is not intended for casual use. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC] 2024-02-21 13:48 ` Trond Myklebust @ 2024-02-22 3:05 ` Zhitao Li 2024-02-22 11:05 ` Jeff Layton 1 sibling, 0 replies; 11+ messages in thread From: Zhitao Li @ 2024-02-22 3:05 UTC (permalink / raw) To: Trond Myklebust Cc: chuck.lever@oracle.com, tom@talpey.com, anna@kernel.org, Dai.Ngo@oracle.com, jlayton@kernel.org, neilb@suse.de, kolga@netapp.com, huangping@smartx.com, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org Forced unmount is used in a casual way to some degree by our team. We will adjust the usage of forced unmount and multiple mount points for the same export. As you said, all of the above is well known. Actually, this phenomenon is unexpected for us. Could you give some more information? Are they issues? And if they are issues, is there any plan to fix them? Best regards, Zhitao Li, in SmartX. On Wed, Feb 21, 2024 at 9:48 PM Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote: > > [You don't often get email from zhitao.li@smartx.com. Learn why this > > is important at https://aka.ms/LearnAboutSenderIdentification ] > > > > Hi, everyone, > > > > - Facts: > > I have a remote NFS export and I mount the same export on two > > different directories in my OS with the same options. There is an > > inflight IO under one mounted directory. And then I unmount another > > mounted directory with force. The inflight IO ends up with "Unknown > > error 512", which is ERESTARTSYS. > > > > All of the above is well known. That's because forced umount affects > the entire filesystem. Why are you using it here in the first place? It > is not intended for casual use. > > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@hammerspace.com > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC] 2024-02-21 13:48 ` Trond Myklebust 2024-02-22 3:05 ` Zhitao Li @ 2024-02-22 11:05 ` Jeff Layton 2024-02-22 15:20 ` Trond Myklebust 2024-02-23 3:20 ` Zhitao Li 1 sibling, 2 replies; 11+ messages in thread From: Jeff Layton @ 2024-02-22 11:05 UTC (permalink / raw) To: Trond Myklebust, chuck.lever@oracle.com, zhitao.li@smartx.com, tom@talpey.com, anna@kernel.org, Dai.Ngo@oracle.com, neilb@suse.de, kolga@netapp.com Cc: huangping@smartx.com, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org On Wed, 2024-02-21 at 13:48 +0000, Trond Myklebust wrote: > On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote: > > [You don't often get email from zhitao.li@smartx.com. Learn why this > > is important at https://aka.ms/LearnAboutSenderIdentification ] > > > > Hi, everyone, > > > > - Facts: > > I have a remote NFS export and I mount the same export on two > > different directories in my OS with the same options. There is an > > inflight IO under one mounted directory. And then I unmount another > > mounted directory with force. The inflight IO ends up with "Unknown > > error 512", which is ERESTARTSYS. > > > > All of the above is well known. That's because forced umount affects > the entire filesystem. Why are you using it here in the first place? It > is not intended for casual use. > While I agree Trond's above statement, the kernel is not supposed to leak error codes that high into userland. Are you seeing ERESTARTSYS being returned to system calls? If so, which ones? -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC] 2024-02-22 11:05 ` Jeff Layton @ 2024-02-22 15:20 ` Trond Myklebust 2024-02-23 3:44 ` Zhitao Li 2024-02-23 10:31 ` Jeff Layton 2024-02-23 3:20 ` Zhitao Li 1 sibling, 2 replies; 11+ messages in thread From: Trond Myklebust @ 2024-02-22 15:20 UTC (permalink / raw) To: chuck.lever@oracle.com, zhitao.li@smartx.com, kolga@netapp.com, anna@kernel.org, tom@talpey.com, jlayton@kernel.org, neilb@suse.de, Dai.Ngo@oracle.com Cc: huangping@smartx.com, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org On Thu, 2024-02-22 at 06:05 -0500, Jeff Layton wrote: > On Wed, 2024-02-21 at 13:48 +0000, Trond Myklebust wrote: > > On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote: > > > [You don't often get email from zhitao.li@smartx.com. Learn why > > > this > > > is important at https://aka.ms/LearnAboutSenderIdentification ] > > > > > > Hi, everyone, > > > > > > - Facts: > > > I have a remote NFS export and I mount the same export on two > > > different directories in my OS with the same options. There is an > > > inflight IO under one mounted directory. And then I unmount > > > another > > > mounted directory with force. The inflight IO ends up with > > > "Unknown > > > error 512", which is ERESTARTSYS. > > > > > > > All of the above is well known. That's because forced umount > > affects > > the entire filesystem. Why are you using it here in the first > > place? It > > is not intended for casual use. > > > > While I agree Trond's above statement, the kernel is not supposed to > leak error codes that high into userland. Are you seeing ERESTARTSYS > being returned to system calls? If so, which ones? The point of forced umount is to kill all RPC calls associated with the filesystem in order to unblock the umount. Basically, it triggers this code before the unmount starts: void nfs_umount_begin(struct super_block *sb) { struct nfs_server *server; struct rpc_clnt *rpc; server = NFS_SB(sb); /* -EIO all pending I/O */ rpc = server->client_acl; if (!IS_ERR(rpc)) rpc_killall_tasks(rpc); rpc = server->client; if (!IS_ERR(rpc)) rpc_killall_tasks(rpc); } So yes, that does signal all the way up to the application level, and it is very much intended to do so. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC] 2024-02-22 15:20 ` Trond Myklebust @ 2024-02-23 3:44 ` Zhitao Li 2024-02-23 10:31 ` Jeff Layton 1 sibling, 0 replies; 11+ messages in thread From: Zhitao Li @ 2024-02-23 3:44 UTC (permalink / raw) To: Trond Myklebust Cc: chuck.lever@oracle.com, kolga@netapp.com, anna@kernel.org, tom@talpey.com, jlayton@kernel.org, neilb@suse.de, Dai.Ngo@oracle.com, huangping@smartx.com, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org Thanks for Trond's reply. ERESTARTSYS is an unknown error in application level. Maybe EIO or EINTR can be more suitable. Best regards, Zhitao Li, at SmartX On Thu, Feb 22, 2024 at 11:20 PM Trond Myklebust <trondmy@hammerspace.com> wrote: > > On Thu, 2024-02-22 at 06:05 -0500, Jeff Layton wrote: > > On Wed, 2024-02-21 at 13:48 +0000, Trond Myklebust wrote: > > > On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote: > > > > [You don't often get email from zhitao.li@smartx.com. Learn why > > > > this > > > > is important at https://aka.ms/LearnAboutSenderIdentification ] > > > > > > > > Hi, everyone, > > > > > > > > - Facts: > > > > I have a remote NFS export and I mount the same export on two > > > > different directories in my OS with the same options. There is an > > > > inflight IO under one mounted directory. And then I unmount > > > > another > > > > mounted directory with force. The inflight IO ends up with > > > > "Unknown > > > > error 512", which is ERESTARTSYS. > > > > > > > > > > All of the above is well known. That's because forced umount > > > affects > > > the entire filesystem. Why are you using it here in the first > > > place? It > > > is not intended for casual use. > > > > > > > While I agree Trond's above statement, the kernel is not supposed to > > leak error codes that high into userland. Are you seeing ERESTARTSYS > > being returned to system calls? If so, which ones? > > The point of forced umount is to kill all RPC calls associated with the > filesystem in order to unblock the umount. Basically, it triggers this > code before the unmount starts: > > void nfs_umount_begin(struct super_block *sb) > { > struct nfs_server *server; > struct rpc_clnt *rpc; > > server = NFS_SB(sb); > /* -EIO all pending I/O */ > rpc = server->client_acl; > if (!IS_ERR(rpc)) > rpc_killall_tasks(rpc); > rpc = server->client; > if (!IS_ERR(rpc)) > rpc_killall_tasks(rpc); > } > > So yes, that does signal all the way up to the application level, and > it is very much intended to do so. > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@hammerspace.com > > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC] 2024-02-22 15:20 ` Trond Myklebust 2024-02-23 3:44 ` Zhitao Li @ 2024-02-23 10:31 ` Jeff Layton 2024-02-27 2:35 ` Zhitao Li 1 sibling, 1 reply; 11+ messages in thread From: Jeff Layton @ 2024-02-23 10:31 UTC (permalink / raw) To: Trond Myklebust, chuck.lever@oracle.com, zhitao.li@smartx.com, kolga@netapp.com, anna@kernel.org, tom@talpey.com, neilb@suse.de, Dai.Ngo@oracle.com Cc: huangping@smartx.com, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org On Thu, 2024-02-22 at 15:20 +0000, Trond Myklebust wrote: > On Thu, 2024-02-22 at 06:05 -0500, Jeff Layton wrote: > > On Wed, 2024-02-21 at 13:48 +0000, Trond Myklebust wrote: > > > On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote: > > > > [You don't often get email from zhitao.li@smartx.com. Learn why > > > > this > > > > is important at https://aka.ms/LearnAboutSenderIdentification ] > > > > > > > > Hi, everyone, > > > > > > > > - Facts: > > > > I have a remote NFS export and I mount the same export on two > > > > different directories in my OS with the same options. There is an > > > > inflight IO under one mounted directory. And then I unmount > > > > another > > > > mounted directory with force. The inflight IO ends up with > > > > "Unknown > > > > error 512", which is ERESTARTSYS. > > > > > > > > > > All of the above is well known. That's because forced umount > > > affects > > > the entire filesystem. Why are you using it here in the first > > > place? It > > > is not intended for casual use. > > > > > > > While I agree Trond's above statement, the kernel is not supposed to > > leak error codes that high into userland. Are you seeing ERESTARTSYS > > being returned to system calls? If so, which ones? > > The point of forced umount is to kill all RPC calls associated with the > filesystem in order to unblock the umount. Basically, it triggers this > code before the unmount starts: > > void nfs_umount_begin(struct super_block *sb) > { > struct nfs_server *server; > struct rpc_clnt *rpc; > > server = NFS_SB(sb); > /* -EIO all pending I/O */ > rpc = server->client_acl; > if (!IS_ERR(rpc)) > rpc_killall_tasks(rpc); > rpc = server->client; > if (!IS_ERR(rpc)) > rpc_killall_tasks(rpc); > } > > So yes, that does signal all the way up to the application level, and > it is very much intended to do so. Returning an error to userland in this situation is fine, but userland programs aren't really equipped to deal with error numbers in this range. Emphasis on the first sentence in the comment in include/linux/errno.h: -------------------8<----------------------- /* * These should never be seen by user programs. To return one of ERESTART* * codes, signal_pending() MUST be set. Note that ptrace can observe these * at syscall exit tracing, but they will never be left for the debugged user * process to see. */ #define ERESTARTSYS 512 #define ERESTARTNOINTR 513 #define ERESTARTNOHAND 514 /* restart if no handler.. */ #define ENOIOCTLCMD 515 /* No ioctl command */ #define ERESTART_RESTARTBLOCK 516 /* restart by calling sys_restart_syscall */ #define EPROBE_DEFER 517 /* Driver requests probe retry */ #define EOPENSTALE 518 /* open found a stale dentry */ #define ENOPARAM 519 /* Parameter not supported */ -------------------8<----------------------- If these values are leaking into userland, then that seems like a bug. -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC] 2024-02-23 10:31 ` Jeff Layton @ 2024-02-27 2:35 ` Zhitao Li 0 siblings, 0 replies; 11+ messages in thread From: Zhitao Li @ 2024-02-27 2:35 UTC (permalink / raw) To: Trond Myklebust, Jeff Layton Cc: chuck.lever@oracle.com, kolga@netapp.com, anna@kernel.org, tom@talpey.com, neilb@suse.de, Dai.Ngo@oracle.com, huangping@smartx.com, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org Is there any plan for this ERESTARTSYS leak issue? -- Zhitao Li, at SmartX On Fri, Feb 23, 2024 at 6:31 PM Jeff Layton <jlayton@kernel.org> wrote: > > On Thu, 2024-02-22 at 15:20 +0000, Trond Myklebust wrote: > > On Thu, 2024-02-22 at 06:05 -0500, Jeff Layton wrote: > > > On Wed, 2024-02-21 at 13:48 +0000, Trond Myklebust wrote: > > > > On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote: > > > > > [You don't often get email from zhitao.li@smartx.com. Learn why > > > > > this > > > > > is important at https://aka.ms/LearnAboutSenderIdentification ] > > > > > > > > > > Hi, everyone, > > > > > > > > > > - Facts: > > > > > I have a remote NFS export and I mount the same export on two > > > > > different directories in my OS with the same options. There is an > > > > > inflight IO under one mounted directory. And then I unmount > > > > > another > > > > > mounted directory with force. The inflight IO ends up with > > > > > "Unknown > > > > > error 512", which is ERESTARTSYS. > > > > > > > > > > > > > All of the above is well known. That's because forced umount > > > > affects > > > > the entire filesystem. Why are you using it here in the first > > > > place? It > > > > is not intended for casual use. > > > > > > > > > > While I agree Trond's above statement, the kernel is not supposed to > > > leak error codes that high into userland. Are you seeing ERESTARTSYS > > > being returned to system calls? If so, which ones? > > > > The point of forced umount is to kill all RPC calls associated with the > > filesystem in order to unblock the umount. Basically, it triggers this > > code before the unmount starts: > > > > void nfs_umount_begin(struct super_block *sb) > > { > > struct nfs_server *server; > > struct rpc_clnt *rpc; > > > > server = NFS_SB(sb); > > /* -EIO all pending I/O */ > > rpc = server->client_acl; > > if (!IS_ERR(rpc)) > > rpc_killall_tasks(rpc); > > rpc = server->client; > > if (!IS_ERR(rpc)) > > rpc_killall_tasks(rpc); > > } > > > > So yes, that does signal all the way up to the application level, and > > it is very much intended to do so. > > Returning an error to userland in this situation is fine, but userland > programs aren't really equipped to deal with error numbers in this > range. > > Emphasis on the first sentence in the comment in include/linux/errno.h: > > -------------------8<----------------------- > /* > * These should never be seen by user programs. To return one of ERESTART* > * codes, signal_pending() MUST be set. Note that ptrace can observe these > * at syscall exit tracing, but they will never be left for the debugged user > * process to see. > */ > #define ERESTARTSYS 512 > #define ERESTARTNOINTR 513 > #define ERESTARTNOHAND 514 /* restart if no handler.. */ > #define ENOIOCTLCMD 515 /* No ioctl command */ > #define ERESTART_RESTARTBLOCK 516 /* restart by calling sys_restart_syscall */ > #define EPROBE_DEFER 517 /* Driver requests probe retry */ > #define EOPENSTALE 518 /* open found a stale dentry */ > #define ENOPARAM 519 /* Parameter not supported */ > -------------------8<----------------------- > > If these values are leaking into userland, then that seems like a bug. > -- > Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC] 2024-02-22 11:05 ` Jeff Layton 2024-02-22 15:20 ` Trond Myklebust @ 2024-02-23 3:20 ` Zhitao Li 2024-02-27 23:55 ` NeilBrown 1 sibling, 1 reply; 11+ messages in thread From: Zhitao Li @ 2024-02-23 3:20 UTC (permalink / raw) To: Jeff Layton Cc: Trond Myklebust, chuck.lever@oracle.com, tom@talpey.com, anna@kernel.org, Dai.Ngo@oracle.com, neilb@suse.de, kolga@netapp.com, huangping@smartx.com, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org Thanks for Jeff's reply. I did see ERESTARTSYS in userland. As described in the above "Reproduction" chapter, "dd" fails with "dd: error writing '/mnt/test1/1G': Unknown error 512". After strace "dd", it turns out that syscall WRITE fails with: write(1, "4\303\31\211\316\237\333\r-\275g\370\233\374X\277\374Tb\202\24\365\220\320\16\27o3\331q\344\364"..., 1048576) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) In fact, other syscalls related to file systems can also fail with ERESTARTSYS in our cases, for example: mount, open, read, write and so on. Maybe the reason is that on forced unmount, rpc_killall_tasks() in net/sunrpc/clnt.c will set all inflight IO with ERESTARTSYS, while no signal gets involved. So ERESTARTSYS is not handled before entering userspace. Best regards, Zhitao Li at SmartX. On Thu, Feb 22, 2024 at 7:05 PM Jeff Layton <jlayton@kernel.org> wrote: > > On Wed, 2024-02-21 at 13:48 +0000, Trond Myklebust wrote: > > On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote: > > > [You don't often get email from zhitao.li@smartx.com. Learn why this > > > is important at https://aka.ms/LearnAboutSenderIdentification ] > > > > > > Hi, everyone, > > > > > > - Facts: > > > I have a remote NFS export and I mount the same export on two > > > different directories in my OS with the same options. There is an > > > inflight IO under one mounted directory. And then I unmount another > > > mounted directory with force. The inflight IO ends up with "Unknown > > > error 512", which is ERESTARTSYS. > > > > > > > All of the above is well known. That's because forced umount affects > > the entire filesystem. Why are you using it here in the first place? It > > is not intended for casual use. > > > > While I agree Trond's above statement, the kernel is not supposed to > leak error codes that high into userland. Are you seeing ERESTARTSYS > being returned to system calls? If so, which ones? > -- > Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC] 2024-02-23 3:20 ` Zhitao Li @ 2024-02-27 23:55 ` NeilBrown 2024-02-28 3:37 ` Zhitao Li 0 siblings, 1 reply; 11+ messages in thread From: NeilBrown @ 2024-02-27 23:55 UTC (permalink / raw) To: Zhitao Li Cc: Jeff Layton, Trond Myklebust, chuck.lever@oracle.com, tom@talpey.com, anna@kernel.org, Dai.Ngo@oracle.com, kolga@netapp.com, huangping@smartx.com, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org On Fri, 23 Feb 2024, Zhitao Li wrote: > Thanks for Jeff's reply. > > I did see ERESTARTSYS in userland. As described in the above > "Reproduction" chapter, "dd" fails with "dd: error writing > '/mnt/test1/1G': Unknown error 512". > > After strace "dd", it turns out that syscall WRITE fails with: > write(1, "4\303\31\211\316\237\333\r-\275g\370\233\374X\277\374Tb\202\24\365\220\320\16\27o3\331q\344\364"..., > 1048576) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) > > In fact, other syscalls related to file systems can also fail with > ERESTARTSYS in our cases, for example: mount, open, read, write and so > on. > > Maybe the reason is that on forced unmount, rpc_killall_tasks() in > net/sunrpc/clnt.c will set all inflight IO with ERESTARTSYS, while no > signal gets involved. So ERESTARTSYS is not handled before entering > userspace. > > Best regards, > Zhitao Li at SmartX. > > On Thu, Feb 22, 2024 at 7:05 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > On Wed, 2024-02-21 at 13:48 +0000, Trond Myklebust wrote: > > > On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote: > > > > [You don't often get email from zhitao.li@smartx.com. Learn why this > > > > is important at https://aka.ms/LearnAboutSenderIdentification ] > > > > > > > > Hi, everyone, > > > > > > > > - Facts: > > > > I have a remote NFS export and I mount the same export on two > > > > different directories in my OS with the same options. There is an > > > > inflight IO under one mounted directory. And then I unmount another > > > > mounted directory with force. The inflight IO ends up with "Unknown > > > > error 512", which is ERESTARTSYS. > > > > > > > > > > All of the above is well known. That's because forced umount affects > > > the entire filesystem. Why are you using it here in the first place? It > > > is not intended for casual use. > > > > > > > While I agree Trond's above statement, the kernel is not supposed to > > leak error codes that high into userland. Are you seeing ERESTARTSYS > > being returned to system calls? If so, which ones? > > -- > > Jeff Layton <jlayton@kernel.org> > I think this bug was introduced by Commit ae67bd3821bb ("SUNRPC: Fix up task signalling") in Linux v5.2. Prior to that commit, rpc_killall_tasks set the error to -EIO. After that commit it calls rpc_signal_task which always uses -ERESTARTSYS. This might be an appropriate fix. Can you please test and confirm? diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h index 2d61987b3545..ed3a116efd5d 100644 --- a/include/linux/sunrpc/sched.h +++ b/include/linux/sunrpc/sched.h @@ -222,7 +222,7 @@ void rpc_put_task(struct rpc_task *); void rpc_put_task_async(struct rpc_task *); bool rpc_task_set_rpc_status(struct rpc_task *task, int rpc_status); void rpc_task_try_cancel(struct rpc_task *task, int error); -void rpc_signal_task(struct rpc_task *); +void rpc_signal_task(struct rpc_task *, int); void rpc_exit_task(struct rpc_task *); void rpc_exit(struct rpc_task *, int); void rpc_release_calldata(const struct rpc_call_ops *, void *); diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index cda0935a68c9..cdbdfae13030 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -895,7 +895,7 @@ void rpc_killall_tasks(struct rpc_clnt *clnt) trace_rpc_clnt_killall(clnt); spin_lock(&clnt->cl_lock); list_for_each_entry(rovr, &clnt->cl_tasks, tk_task) - rpc_signal_task(rovr); + rpc_signal_task(rovr, -EIO); spin_unlock(&clnt->cl_lock); } EXPORT_SYMBOL_GPL(rpc_killall_tasks); diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index 6debf4fd42d4..e88621881036 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -852,14 +852,14 @@ void rpc_exit_task(struct rpc_task *task) } } -void rpc_signal_task(struct rpc_task *task) +void rpc_signal_task(struct rpc_task *task, int sig) { struct rpc_wait_queue *queue; if (!RPC_IS_ACTIVATED(task)) return; - if (!rpc_task_set_rpc_status(task, -ERESTARTSYS)) + if (!rpc_task_set_rpc_status(task, sig)) return; trace_rpc_task_signalled(task, task->tk_action); set_bit(RPC_TASK_SIGNALLED, &task->tk_runstate); @@ -992,7 +992,7 @@ static void __rpc_execute(struct rpc_task *task) * clean up after sleeping on some queue, we don't * break the loop here, but go around once more. */ - rpc_signal_task(task); + rpc_signal_task(task, -ERESTARTSYS); } trace_rpc_task_sync_wake(task, task->tk_action); } ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC] 2024-02-27 23:55 ` NeilBrown @ 2024-02-28 3:37 ` Zhitao Li 0 siblings, 0 replies; 11+ messages in thread From: Zhitao Li @ 2024-02-28 3:37 UTC (permalink / raw) To: NeilBrown Cc: Jeff Layton, Trond Myklebust, chuck.lever@oracle.com, tom@talpey.com, anna@kernel.org, Dai.Ngo@oracle.com, kolga@netapp.com, huangping@smartx.com, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org Thanks for your help. The fix does work in my cases. "dd" fails with EIO when its mount point or another mount point with the same export is unmounted with force. 1. Another mount point is unmounted with force: [root@v6 zhitaoli]# dd if=/dev/urandom of=/mnt/test/1G bs=1M count=1024 oflag=direct dd: error writing '/mnt/test/1G': Input/output error 100+0 records in 99+0 records out 103809024 bytes (104 MB, 99 MiB) copied, 5.05276 s, 20.5 MB/s 2. The mount point is unmounted with force: [root@v6 zhitaoli]# dd if=/dev/urandom of=/mnt/test/1G bs=1M count=1024 oflag=direct dd: error writing '/mnt/test/1G': Input/output error 60+0 records in 59+0 records out 61865984 bytes (62 MB, 59 MiB) copied, 2.13265 s, 29.0 MB/s Best regards Zhitao Li, at SmartX On Wed, Feb 28, 2024 at 7:55 AM NeilBrown <neilb@suse.de> wrote: > > On Fri, 23 Feb 2024, Zhitao Li wrote: > > Thanks for Jeff's reply. > > > > I did see ERESTARTSYS in userland. As described in the above > > "Reproduction" chapter, "dd" fails with "dd: error writing > > '/mnt/test1/1G': Unknown error 512". > > > > After strace "dd", it turns out that syscall WRITE fails with: > > write(1, "4\303\31\211\316\237\333\r-\275g\370\233\374X\277\374Tb\202\24\365\220\320\16\27o3\331q\344\364"..., > > 1048576) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) > > > > In fact, other syscalls related to file systems can also fail with > > ERESTARTSYS in our cases, for example: mount, open, read, write and so > > on. > > > > Maybe the reason is that on forced unmount, rpc_killall_tasks() in > > net/sunrpc/clnt.c will set all inflight IO with ERESTARTSYS, while no > > signal gets involved. So ERESTARTSYS is not handled before entering > > userspace. > > > > Best regards, > > Zhitao Li at SmartX. > > > > On Thu, Feb 22, 2024 at 7:05 PM Jeff Layton <jlayton@kernel.org> wrote: > > > > > > On Wed, 2024-02-21 at 13:48 +0000, Trond Myklebust wrote: > > > > On Wed, 2024-02-21 at 16:20 +0800, Zhitao Li wrote: > > > > > [You don't often get email from zhitao.li@smartx.com. Learn why this > > > > > is important at https://aka.ms/LearnAboutSenderIdentification ] > > > > > > > > > > Hi, everyone, > > > > > > > > > > - Facts: > > > > > I have a remote NFS export and I mount the same export on two > > > > > different directories in my OS with the same options. There is an > > > > > inflight IO under one mounted directory. And then I unmount another > > > > > mounted directory with force. The inflight IO ends up with "Unknown > > > > > error 512", which is ERESTARTSYS. > > > > > > > > > > > > > All of the above is well known. That's because forced umount affects > > > > the entire filesystem. Why are you using it here in the first place? It > > > > is not intended for casual use. > > > > > > > > > > While I agree Trond's above statement, the kernel is not supposed to > > > leak error codes that high into userland. Are you seeing ERESTARTSYS > > > being returned to system calls? If so, which ones? > > > -- > > > Jeff Layton <jlayton@kernel.org> > > > > I think this bug was introduced by > Commit ae67bd3821bb ("SUNRPC: Fix up task signalling") > in Linux v5.2. > > Prior to that commit, rpc_killall_tasks set the error to -EIO. > After that commit it calls rpc_signal_task which always uses > -ERESTARTSYS. > > This might be an appropriate fix. Can you please test and confirm? > > diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h > index 2d61987b3545..ed3a116efd5d 100644 > --- a/include/linux/sunrpc/sched.h > +++ b/include/linux/sunrpc/sched.h > @@ -222,7 +222,7 @@ void rpc_put_task(struct rpc_task *); > void rpc_put_task_async(struct rpc_task *); > bool rpc_task_set_rpc_status(struct rpc_task *task, int rpc_status); > void rpc_task_try_cancel(struct rpc_task *task, int error); > -void rpc_signal_task(struct rpc_task *); > +void rpc_signal_task(struct rpc_task *, int); > void rpc_exit_task(struct rpc_task *); > void rpc_exit(struct rpc_task *, int); > void rpc_release_calldata(const struct rpc_call_ops *, void *); > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > index cda0935a68c9..cdbdfae13030 100644 > --- a/net/sunrpc/clnt.c > +++ b/net/sunrpc/clnt.c > @@ -895,7 +895,7 @@ void rpc_killall_tasks(struct rpc_clnt *clnt) > trace_rpc_clnt_killall(clnt); > spin_lock(&clnt->cl_lock); > list_for_each_entry(rovr, &clnt->cl_tasks, tk_task) > - rpc_signal_task(rovr); > + rpc_signal_task(rovr, -EIO); > spin_unlock(&clnt->cl_lock); > } > EXPORT_SYMBOL_GPL(rpc_killall_tasks); > diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c > index 6debf4fd42d4..e88621881036 100644 > --- a/net/sunrpc/sched.c > +++ b/net/sunrpc/sched.c > @@ -852,14 +852,14 @@ void rpc_exit_task(struct rpc_task *task) > } > } > > -void rpc_signal_task(struct rpc_task *task) > +void rpc_signal_task(struct rpc_task *task, int sig) > { > struct rpc_wait_queue *queue; > > if (!RPC_IS_ACTIVATED(task)) > return; > > - if (!rpc_task_set_rpc_status(task, -ERESTARTSYS)) > + if (!rpc_task_set_rpc_status(task, sig)) > return; > trace_rpc_task_signalled(task, task->tk_action); > set_bit(RPC_TASK_SIGNALLED, &task->tk_runstate); > @@ -992,7 +992,7 @@ static void __rpc_execute(struct rpc_task *task) > * clean up after sleeping on some queue, we don't > * break the loop here, but go around once more. > */ > - rpc_signal_task(task); > + rpc_signal_task(task, -ERESTARTSYS); > } > trace_rpc_task_sync_wake(task, task->tk_action); > } ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-02-28 3:38 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-02-21 8:20 PROBLEM: NFS client IO fails with ERESTARTSYS when another mount point with the same export is unmounted with force [NFS] [SUNRPC] Zhitao Li 2024-02-21 13:48 ` Trond Myklebust 2024-02-22 3:05 ` Zhitao Li 2024-02-22 11:05 ` Jeff Layton 2024-02-22 15:20 ` Trond Myklebust 2024-02-23 3:44 ` Zhitao Li 2024-02-23 10:31 ` Jeff Layton 2024-02-27 2:35 ` Zhitao Li 2024-02-23 3:20 ` Zhitao Li 2024-02-27 23:55 ` NeilBrown 2024-02-28 3:37 ` Zhitao Li
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox