* Interrupted IO causing async errors @ 2016-06-23 15:42 Steve Wise 2016-06-24 0:15 ` Chuck Lever 0 siblings, 1 reply; 5+ messages in thread From: Steve Wise @ 2016-06-23 15:42 UTC (permalink / raw) To: Chuck Lever Cc: Raju Rangoju, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA Hey chuck, we observe with 4.7-rc4 (and older kernels too) that interrupting a dbench test on a nfsrdma/cxgb4 mount while it is doing heavy I/O can result in cxgb4 logging an "invalid stag" error on an ingress RDMA WRITE message. Is this expected? I'm wondering if this is a normal side effect of interrupting the IO on the mount. Maybe due to the mount options or NFS version? This error could happen if the NFSRDMA client invalidated MRs that were advertised to the server for IO, while IO was still in flight. Is this expected or should we dive in further? Thoughts? thanks... Here are the details of the test. Steps: -> Load iw_cxgb4,rdma_ucm on both nodes. -> Assign ip to chelsio interfaces on both nodes. Server Side [gayabari]: -> mknod /dev/ram0 b 1 0 -> modprobe brd rd_nr=1 rd_size=1048576 -> mkdir /nfsrdma -> mkfs.ext3 /dev/ram0 -> mount /dev/ram0 /nfsrdma -> vim /etc/exports /nfsrdma *(sync,insecure,rw,no_root_squash,no_subtree_check) -> modprobe xprtrdma -> modprobe svcrdma -> service nfsserver restart -> echo rdma 20049 > /proc/fs/nfsd/portlist -> exportfs -rav Client Side [sonada]: -> modprobe xprtrdma -> modprobe svcrdma -> mount 102.1.1.186:/nfsrdma/ -o rdma,port=20049,vers=3,wsize=65536,rsize=65536 /mnt/ -> Then run below command on client [sonada] : sonada:~ # dbench -t100 -D /root/share1/ 10 -> Issue is seen only on killing dbench test in between otherwise it ran fine. Error seen on the nfsdma client: [ 1593.398351] cxgb4 0000:01:00.4: AE qpid 1028 opcode 0 status 0x1 type 0 len 0x18e6009c wrid.hi 0x2cce2dc wrid.lo 0x2 [ 1593.398374] RPC: rpcrdma_qp_async_error_upcall: QP request error on device cxgb4_0 ep ffff88022f3567e8 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Interrupted IO causing async errors 2016-06-23 15:42 Interrupted IO causing async errors Steve Wise @ 2016-06-24 0:15 ` Chuck Lever [not found] ` <BDD2D64C-7A05-42EB-83C2-F95825C7579D-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 5+ messages in thread From: Chuck Lever @ 2016-06-24 0:15 UTC (permalink / raw) To: Steve Wise Cc: Raju Rangoju, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA Hi Steve- > On Jun 23, 2016, at 11:42 AM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote: > > Hey chuck, we observe with 4.7-rc4 (and older kernels too) that interrupting a > dbench test on a nfsrdma/cxgb4 mount while it is doing heavy I/O can result in > cxgb4 logging an "invalid stag" error on an ingress RDMA WRITE message. Is > this expected? I'm wondering if this is a normal side effect of interrupting > the IO on the mount. Maybe due to the mount options or NFS version? This > error could happen if the NFSRDMA client invalidated MRs that were advertised to > the server for IO, while IO was still in flight. Is this expected or should we > dive in further? Thoughts? thanks... When an application is signaled, outstanding RPCs are terminated. When an RPC completes, whether because a reply was received, or because the local application has died, any memory that was registered on behalf of that RPC is invalidated before it can be used for something else. The data in that memory remains at rest until invalidation and DMA unmapping is complete. It appears that your server is attempting to read an argument or write a result for an RPC that is no longer pending. I think both sides should report a transport error, and the connection should terminate. No other problems, though: other operation should continue normally after the client re-establishes a fresh connection. If this doesn't match your observations, let me know. > Here are the details of the test. > > Steps: > > -> Load iw_cxgb4,rdma_ucm on both nodes. > -> Assign ip to chelsio interfaces on both nodes. > > Server Side [gayabari]: > > -> mknod /dev/ram0 b 1 0 > -> modprobe brd rd_nr=1 rd_size=1048576 > -> mkdir /nfsrdma > -> mkfs.ext3 /dev/ram0 > -> mount /dev/ram0 /nfsrdma > -> vim /etc/exports > /nfsrdma *(sync,insecure,rw,no_root_squash,no_subtree_check) > > -> modprobe xprtrdma > -> modprobe svcrdma > -> service nfsserver restart > -> echo rdma 20049 > /proc/fs/nfsd/portlist > -> exportfs -rav > > Client Side [sonada]: > > -> modprobe xprtrdma > -> modprobe svcrdma > > -> mount 102.1.1.186:/nfsrdma/ -o > rdma,port=20049,vers=3,wsize=65536,rsize=65536 > /mnt/ > > -> Then run below command on client [sonada] : > sonada:~ # dbench -t100 -D /root/share1/ 10 > > > -> Issue is seen only on killing dbench test in between otherwise it ran fine. > > Error seen on the nfsdma client: > > [ 1593.398351] cxgb4 0000:01:00.4: AE qpid 1028 opcode 0 status 0x1 type 0 len > 0x18e6009c wrid.hi 0x2cce2dc wrid.lo 0x2 > [ 1593.398374] RPC: rpcrdma_qp_async_error_upcall: QP request error on > device cxgb4_0 ep ffff88022f3567e8 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <BDD2D64C-7A05-42EB-83C2-F95825C7579D-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>]
* RE: Interrupted IO causing async errors [not found] ` <BDD2D64C-7A05-42EB-83C2-F95825C7579D-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> @ 2016-06-24 13:36 ` Steve Wise 2016-06-24 19:26 ` Jason Gunthorpe 1 sibling, 0 replies; 5+ messages in thread From: Steve Wise @ 2016-06-24 13:36 UTC (permalink / raw) To: 'Chuck Lever' Cc: 'Raju Rangoju', linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA > -----Original Message----- > From: Chuck Lever [mailto:chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org] > Sent: Thursday, June 23, 2016 7:15 PM > To: Steve Wise > Cc: Raju Rangoju; linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > Subject: Re: Interrupted IO causing async errors > > Hi Steve- > > > On Jun 23, 2016, at 11:42 AM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> > wrote: > > > > Hey chuck, we observe with 4.7-rc4 (and older kernels too) that interrupting a > > dbench test on a nfsrdma/cxgb4 mount while it is doing heavy I/O can result in > > cxgb4 logging an "invalid stag" error on an ingress RDMA WRITE message. Is > > this expected? I'm wondering if this is a normal side effect of interrupting > > the IO on the mount. Maybe due to the mount options or NFS version? This > > error could happen if the NFSRDMA client invalidated MRs that were advertised > to > > the server for IO, while IO was still in flight. Is this expected or should we > > dive in further? Thoughts? thanks... > > When an application is signaled, outstanding RPCs are terminated. > When an RPC completes, whether because a reply was received, > or because the local application has died, any memory that was > registered on behalf of that RPC is invalidated before it can be > used for something else. The data in that memory remains at rest > until invalidation and DMA unmapping is complete. > > It appears that your server is attempting to read an argument or > write a result for an RPC that is no longer pending. I think both > sides should report a transport error, and the connection should > terminate. No other problems, though: other operation should > continue normally after the client re-establishes a fresh connection. > > If this doesn't match your observations, let me know. > This is exactly what we see. Thanks! Steve. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Interrupted IO causing async errors [not found] ` <BDD2D64C-7A05-42EB-83C2-F95825C7579D-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> 2016-06-24 13:36 ` Steve Wise @ 2016-06-24 19:26 ` Jason Gunthorpe [not found] ` <20160624192637.GE14506-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> 1 sibling, 1 reply; 5+ messages in thread From: Jason Gunthorpe @ 2016-06-24 19:26 UTC (permalink / raw) To: Chuck Lever Cc: Steve Wise, Raju Rangoju, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA On Thu, Jun 23, 2016 at 08:15:29PM -0400, Chuck Lever wrote: > When an application is signaled, outstanding RPCs are terminated. > When an RPC completes, whether because a reply was received, > or because the local application has died, any memory that was > registered on behalf of that RPC is invalidated before it can be > used for something else. The data in that memory remains at rest > until invalidation and DMA unmapping is complete. > It appears that your server is attempting to read an argument or > write a result for an RPC that is no longer pending. I think both > sides should report a transport error, and the connection should > terminate. No other problems, though: other operation should > continue normally after the client re-establishes a fresh connection. Yuk! A transport tare down and restart on user space CTRL-C/etc ? Isn't that a little too common and a little too expensive to be a permanent solution? Jason -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <20160624192637.GE14506-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>]
* Re: Interrupted IO causing async errors [not found] ` <20160624192637.GE14506-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> @ 2016-06-24 22:16 ` Chuck Lever 0 siblings, 0 replies; 5+ messages in thread From: Chuck Lever @ 2016-06-24 22:16 UTC (permalink / raw) To: Jason Gunthorpe Cc: Steve Wise, Raju Rangoju, linux-nfs-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA > On Jun 24, 2016, at 3:26 PM, Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org> wrote: > >> On Thu, Jun 23, 2016 at 08:15:29PM -0400, Chuck Lever wrote: >> When an application is signaled, outstanding RPCs are terminated. >> When an RPC completes, whether because a reply was received, >> or because the local application has died, any memory that was >> registered on behalf of that RPC is invalidated before it can be >> used for something else. The data in that memory remains at rest >> until invalidation and DMA unmapping is complete. > >> It appears that your server is attempting to read an argument or >> write a result for an RPC that is no longer pending. I think both >> sides should report a transport error, and the connection should >> terminate. No other problems, though: other operation should >> continue normally after the client re-establishes a fresh connection. > > Yuk! A transport tare down and restart on user space CTRL-C/etc ? > Isn't that a little too common and a little too expensive to be a > permanent solution? I agree, it's not an optimal solution, but: - The server must not be allowed to update memory that may have already been re-used after an RPC has terminated. - Typically reconnecting is quite fast. - Not every ^C or signal will trigger connection loss: only if there happens to be an outstanding RPC involving the signaled application. - Often a ^C or signal is the result of impatience while waiting for storage target restart. In that case, a reconnect is required anyway and the client has to redrive RPCs it is still interested in. - The problem seems most acute for Long Replies, where a large buffer has been registered for the server to write the reply into (as opposed to using a simple RDMA Send). As we work to increase the inline thresholds, there will be many fewer Long Replies and that narrows the exposure significantly. - Signals during workloads where NFS/RDMA is beneficial seem to be fairly infrequent. That of course is likely to change over time. So this is a compromise solution that is safe and should not be too onerous to performance. However, I'm open to better (less noisy) approaches. A protocol mechanism could be added so that a client can notify a server of a canceled RPC. This helps some, but is racy (impossible to be 100% effective), and protocol changes take time. Wait a moment after a signal occurs during a synchronous RPC, to provide an opportunity for server replies to drain and handle invalidation for us. Again, helps some, but not 100% guaranteed to prevent a Remote Protection Error. Care must be taken to wait without deadlocking the RPC client. Memory resources for a terminated RPC could be parked until the server replies and the regions can be invalidated normally in the reply handler. If the advertised buffer is part of a direct I/O, and the application has terminated, that memory would have to be put aside somehow until it was safe to re-use. A page in the page cache would have to be removed or locked during the waiting period, and that has some undesirable consequences. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-06-24 22:16 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-23 15:42 Interrupted IO causing async errors Steve Wise
2016-06-24 0:15 ` Chuck Lever
[not found] ` <BDD2D64C-7A05-42EB-83C2-F95825C7579D-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2016-06-24 13:36 ` Steve Wise
2016-06-24 19:26 ` Jason Gunthorpe
[not found] ` <20160624192637.GE14506-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2016-06-24 22:16 ` Chuck Lever
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox