From: Mahesh Siddheshwar <siddheshwar.mahesh-xsfywfwIY+M@public.gmane.org>
To: Tom Tucker
<tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>,
Vu Pham <vuhuong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Roland Dreier <rdreier-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>,
ewg-G2znmakfqn7U1rindQTSdQ@public.gmane.org
Subject: Re: nfsrdma fails to write big file,
Date: Wed, 03 Mar 2010 12:26:40 -0800
Message-ID: <4B8EC600.9050101@sun.com>
In-Reply-To: <4B89EF88.1030903-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
Hi Tom, Vu,
Tom Tucker wrote:
> Roland Dreier wrote:
>> > + /*
>> > + * Add room for frmr register and invalidate WRs
>> > + * Requests sometimes have two chunks, each chunk
>> > + * requires to have different frmr. The safest
>> > + * WRs required are max_send_wr * 6; however, we
>> > + * get send completions and poll fast enough, it
>> > + * is pretty safe to have max_send_wr * 4.
>> > + */
>> > + ep->rep_attr.cap.max_send_wr *= 4;
>>
>> Seems like a bad design if there is a possibility of work queue
>> overflow; if you're counting on events occurring in a particular order
>> or completions being handled "fast enough", then your design is going to
>> fail in some high load situations, which I don't think you want.
>
> Vu,
>
> Would you please try the following:
>
> - Set the multiplier to 5
While trying to test this between a Linux client and a Solaris server,
I made the following changes in:
/usr/src/ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c
diff verbs.c.org verbs.c
653c653
< ep->rep_attr.cap.max_send_wr *= 3;
---
> ep->rep_attr.cap.max_send_wr *= 8;
685c685
< ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /* - 1*/;
---
> ep->rep_cqinit = ep->rep_attr.cap.max
(I bumped the multiplier to 8.)
I then ran make install.
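For readers following the multiplier discussion, the arithmetic behind the two changed lines can be sketched as below. This is only an illustration, not code from verbs.c: the helper names are hypothetical, the credit count of 32 is an assumed example value, and the per-request accounting (one register WR and one invalidate WR per chunk, plus the send itself) is one plausible reading of the quoted patch comment, which itself calls a factor of 6 the safest.

```c
#include <stdio.h>

/* Hypothetical accounting: each chunk needs an FRMR register WR and an
 * invalidate WR, and the RPC itself needs one send WR. */
static unsigned int wrs_per_request(unsigned int chunks)
{
    return chunks * 2 + 1;
}

/* Send queue depth after applying the multiplier, mirroring
 * "ep->rep_attr.cap.max_send_wr *= N" from the diff. */
static unsigned int send_queue_depth(unsigned int credits,
                                     unsigned int multiplier)
{
    return credits * multiplier;
}

/* Mirrors the original "rep_cqinit = max_send_wr/2" line. */
static unsigned int cqinit_from_depth(unsigned int depth)
{
    return depth / 2;
}

/* Returns 1 if the queue can hold the assumed worst case for every credit. */
static int depth_covers_worst_case(unsigned int credits,
                                   unsigned int multiplier,
                                   unsigned int chunks)
{
    return send_queue_depth(credits, multiplier) >=
           credits * wrs_per_request(chunks);
}

/* Print the numbers for one configuration. */
static void print_sizing(unsigned int credits, unsigned int multiplier)
{
    unsigned int depth = send_queue_depth(credits, multiplier);

    printf("credits=%u multiplier=%u max_send_wr=%u cqinit=%u\n",
           credits, multiplier, depth, cqinit_from_depth(depth));
}
```

Under these assumptions a two-chunk request posts 5 WRs, so a multiplier of 4 falls short of the worst case while 5 or 8 covers it. That does not by itself rule out overflow if the real WR count per request is higher.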
On reboot I see the errors on NFS READs as opposed to WRITEs
as seen before, when I try to read a 10G file from the server.
The client is running: RHEL 5.3 (2.6.18-128.el5PAE) with
OFED-1.5.1-20100223-0740 bits. The client has a Sun IB
HCA: SUN0070130001, MT25418, 2.7.0 firmware, hw_rev = a0.
The server is running Solaris based on snv_128.
rpcdebug output from the client:
==
RPC: 85 call_bind (status 0)
RPC: 85 call_connect xprt ec78d800 is connected
RPC: 85 call_transmit (status 0)
RPC: 85 xprt_prepare_transmit
RPC: 85 xprt_cwnd_limited cong = 0 cwnd = 8192
RPC: 85 rpc_xdr_encode (status 0)
RPC: 85 marshaling UNIX cred eddb4dc0
RPC: 85 using AUTH_UNIX cred eddb4dc0 to wrap rpc data
RPC: 85 xprt_transmit(164)
RPC: rpcrdma_inline_pullup: pad 0 destp 0xf1dd1410 len 164 hdrlen 164
RPC: rpcrdma_register_frmr_external: Using frmr ec7da920 to map 4 segments
RPC: rpcrdma_create_chunks: write chunk elem 16384@0x38536d000:0xa601 (more)
RPC: rpcrdma_register_frmr_external: Using frmr ec7da960 to map 1 segments
RPC: rpcrdma_create_chunks: write chunk elem 108@0x31dd153c:0xaa01 (last)
RPC: rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 164 padlen 0 headerp 0xf1dd124c base 0xf1dd136c lkey 0x500
RPC: 85 xmit complete
RPC: 85 sleep_on(queue "xprt_pending" time 4683109)
RPC: 85 added to queue ec78d994 "xprt_pending"
RPC: 85 setting alarm for 60000 ms
RPC: wake_up_next(ec78d944 "xprt_resend")
RPC: wake_up_next(ec78d8f4 "xprt_sending")
RPC: rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0 ep ec78db40
RPC: 85 __rpc_wake_up_task (now 4683110)
RPC: 85 disabling timer
RPC: 85 removed from queue ec78d994 "xprt_pending"
RPC: __rpc_wake_up_task done
RPC: 85 __rpc_execute flags=0x1
RPC: 85 call_status (status -107)
RPC: 85 call_bind (status 0)
RPC: 85 call_connect xprt ec78d800 is not connected
RPC: 85 xprt_connect xprt ec78d800 is not connected
RPC: 85 sleep_on(queue "xprt_pending" time 4683110)
RPC: 85 added to queue ec78d994 "xprt_pending"
RPC: 85 setting alarm for 60000 ms
RPC: rpcrdma_event_process: event rep ec116800 status 5 opcode 80 length 2493606
RPC: rpcrdma_event_process: recv WC status 5, connection lost
RPC: rpcrdma_conn_upcall: disconnected: ec78dbccI4:20049 (ep 0xec78db40 event 0xa)
RPC: rpcrdma_conn_upcall: disconnected
rpcrdma: connection to ec78dbccI4:20049 closed (-103)
RPC: xprt_rdma_connect_worker: reconnect
==
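As a reading aid for the negative status values in the trace above (call_status -107 and the -103 on connection close), here is a small lookup sketch. The numeric values are the standard Linux errno numbers; the table and the function name are illustrative only and are not part of the sunrpc code.

```c
#include <stddef.h>

/* Linux errno values that appear (negated) in the rpcdebug trace. */
struct errno_name {
    int value;
    const char *name;
    const char *meaning;
};

static const struct errno_name trace_errnos[] = {
    { 103, "ECONNABORTED", "Software caused connection abort" },
    { 107, "ENOTCONN",     "Transport endpoint is not connected" },
};

/* Map an RPC status such as -107 to its errno name, or "unknown". */
static const char *rpc_status_name(int status)
{
    int code = status < 0 ? -status : status;
    size_t i;

    for (i = 0; i < sizeof(trace_errnos) / sizeof(trace_errnos[0]); i++) {
        if (trace_errnos[i].value == code)
            return trace_errnos[i].name;
    }
    return "unknown";
}
```

Read this way, the task first sees ENOTCONN after the QP error, and the transport then reports the connection as aborted before the reconnect worker runs.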
On the server I see:
Mar 3 17:45:16 elena-ar hermon: [ID 271130 kern.notice] NOTICE: hermon0: Device Error: CQE remote access error
Mar 3 17:45:16 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: bad sendreply
Mar 3 17:45:21 elena-ar hermon: [ID 271130 kern.notice] NOTICE: hermon0: Device Error: CQE remote access error
Mar 3 17:45:21 elena-ar nfssrv: [ID 819430 kern.notice] NOTICE: NFS: bad sendreply
The remote access error is actually seen on RDMA_WRITE.
Doing some more debugging on the server with DTrace, I see that
the destination address and length match the write chunk
element in the Linux debug output above.
0 9385 rib_write:entry daddr 38536d000, len 4000, hdl a601
0 9358 rib_init_sendwait:return ffffff44a715d308
1 9296 rib_svc_scq_handler:return 1f7
1 9356 rib_sendwait:return 14
1 9386 rib_write:return 14
^^^ that is RDMA_FAILED in
1 63295 xdrrdma_send_read_data:return 0
1 5969 xdr_READ3res:return
1 5969 xdr_READ3res:return 0
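The match between the client's write chunk element and the server's rib_write arguments can be checked mechanically. The sketch below assumes the client prints the length in decimal and the address and handle in hex, and that the DTrace line prints all three fields in hex (so len 4000 is 0x4000 = 16384). The struct and function names are hypothetical.

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* One RDMA segment: length, remote address, and memory handle (rkey). */
struct chunk_elem {
    uint32_t length;
    uint64_t addr;
    uint32_t handle;
};

/* Parse the client form "<len>@0x<addr>:0x<handle>" (length in decimal). */
static int parse_client_elem(const char *s, struct chunk_elem *out)
{
    int n = sscanf(s, "%" SCNu32 "@0x%" SCNx64 ":0x%" SCNx32,
                   &out->length, &out->addr, &out->handle);
    return n == 3 ? 0 : -1;
}

/* Parse the DTrace form "daddr <hex>, len <hex>, hdl <hex>"
 * (assumption: every field is printed in hex without a 0x prefix). */
static int parse_server_write(const char *s, struct chunk_elem *out)
{
    int n = sscanf(s, "daddr %" SCNx64 ", len %" SCNx32 ", hdl %" SCNx32,
                   &out->addr, &out->length, &out->handle);
    return n == 3 ? 0 : -1;
}

/* Returns 1 when both sides describe the same segment. */
static int chunk_elem_equal(const struct chunk_elem *a,
                            const struct chunk_elem *b)
{
    return a->length == b->length &&
           a->addr == b->addr &&
           a->handle == b->handle;
}
```

Feeding it "16384@0x38536d000:0xa601" from the client and "daddr 38536d000, len 4000, hdl a601" from the server yields equal segments, so the server is writing exactly where the client advertised. The remote access error therefore does not appear to come from a mismatched address, length, or handle.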
Is this a variation of the previously discussed issue or something new?
Thanks,
Mahesh
> - Set the number of buffer credits small as follows "echo 4 >
> /proc/sys/sunrpc/rdma_slot_table_entries"
> - Rerun your test and see if you can reproduce the problem?
>
> I did the above and was unable to reproduce, but I would like to see
> if you can to convince ourselves that 5 is the right number.
>
> Thanks,
> Tom
>
>> - R.
>>
>