From: Tom Tucker <tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
To: Vu Pham <vuhuong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Mahesh Siddheshwar
	<siddheshwar.mahesh-xsfywfwIY+M@public.gmane.org>,
	ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org,
	Roland Dreier <rdreier-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
Subject: Re: [ewg] nfsrdma fails to write big file,
Date: Wed, 24 Feb 2010 18:51:31 -0600	[thread overview]
Message-ID: <4B85C993.6030803@opengridcomputing.com> (raw)
In-Reply-To: <4B85BDF9.8020009-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>

Vu,

I ran the number of slots down to 8 (echo 8 > rdma_slot_table_entries) 
and I can reproduce the issue now. I'm going to try setting the 
allocation multiple to 5 and see if I can't prove to myself and Roland 
that we've accurately computed the correct factor.

Overall, I think a better solution might be a different credit system. 
However, that is a much more substantial change than we can 
tackle at this point.

Tom


Tom Tucker wrote:
> Vu,
>
> Based on the mapping code, it looks to me like the worst case is 
> RPCRDMA_MAX_SEGS * 2 + 1 as the multiplier. 
> However, I think in practice, due to the way that iov are built, the 
> actual max is 5 (frmr for head + pagelist plus invalidates for same plus 
> one for the send itself). Why did you think the max was 6?
>
> Thanks,
> Tom
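The WR counts discussed above can be checked with a few lines of arithmetic. What follows is a minimal sketch, not code from xprtrdma: the function names are hypothetical, it assumes each chunk costs one FRMR register WR plus one invalidate WR and each RPC adds one send WR, and it assumes the slot count is what ends up in max_requests. RPCRDMA_MAX_SEGS is passed in as a parameter because its value is not given in this thread.

```c
#include <assert.h>

/* Send WRs one RPC may consume with FRMR: a register WR and an
 * invalidate WR per chunk, plus one WR for the send itself. */
unsigned int wrs_per_rpc(unsigned int chunks)
{
    return 2 * chunks + 1;
}

/* Worst case from the mapping code: RPCRDMA_MAX_SEGS * 2 + 1.
 * max_segs stands in for RPCRDMA_MAX_SEGS (value assumed by caller). */
unsigned int worst_case_wrs(unsigned int max_segs)
{
    return max_segs * 2 + 1;
}

/* Send queue depth as sized in verbs.c: max_requests * multiplier. */
unsigned int send_queue_depth(unsigned int slots, unsigned int multiplier)
{
    assert(slots > 0 && multiplier > 0);
    return slots * multiplier;
}

/* Nonzero when the queue can hold every slot's WRs at the same time. */
int depth_covers_demand(unsigned int slots, unsigned int multiplier,
                        unsigned int chunks)
{
    return send_queue_depth(slots, multiplier) >= slots * wrs_per_rpc(chunks);
}
```

With two chunks per request (head + pagelist) this gives 5 WRs per RPC, so 8 slots need 40 send WRs; multipliers of 3 and 4 provide only 24 and 32.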
>
> Tom Tucker wrote:
>> Vu,
>>
>> Are you changing any of the default settings? For example rsize/wsize, 
>> etc... I'd like to reproduce this problem if I can.
>>
>> Thanks,
>>
>> Tom
>>
>> Vu Pham wrote:
>>> Tom,
>>>
>>> Did you make any change to have bonnie++, dd of a 10G file and vdbench
>>> concurrently run & finish?
>>>
>>> I keep hitting the WQE overflow error below.
>>> I saw that most of the requests have two chunks (a 32K chunk and a
>>> some-bytes chunk), and each chunk requires frmr + invalidate WRs.
>>> However, you set ep->rep_attr.cap.max_send_wr = cdata->max_requests and
>>> then for the frmr case you do
>>> ep->rep_attr.cap.max_send_wr *= 3; which is not enough. Moreover, you
>>> also set ep->rep_cqinit = max_send_wr/2 for the send completion signal,
>>> which causes the WQE overflow to happen faster.
>>>
>>> After applying the following patch, I have had vdbench, dd, and copy
>>> 10g_file running overnight.
>>>
>>> -vu
>>>
>>>
>>> --- ofa_kernel-1.5.1.orig/net/sunrpc/xprtrdma/verbs.c   2010-02-24 10:41:22.000000000 -0800
>>> +++ ofa_kernel-1.5.1/net/sunrpc/xprtrdma/verbs.c        2010-02-24 10:03:18.000000000 -0800
>>> @@ -649,8 +654,15 @@
>>>         ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>>>         switch (ia->ri_memreg_strategy) {
>>>         case RPCRDMA_FRMR:
>>> -               /* Add room for frmr register and invalidate WRs */
>>> -               ep->rep_attr.cap.max_send_wr *= 3;
>>> +               /* 
>>> +                * Add room for frmr register and invalidate WRs
>>> +                * Requests sometimes have two chunks, each chunk
>>> +                * requires to have different frmr. The safest
>>> +                * WRs required are max_send_wr * 6; however, we
>>> +                * get send completions and poll fast enough, it
>>> +                * is pretty safe to have max_send_wr * 4. 
>>> +                */
>>> +               ep->rep_attr.cap.max_send_wr *= 4;
>>>                 if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr)
>>>                         return -EINVAL;
>>>                 break;
>>> @@ -682,7 +694,8 @@
>>>                 ep->rep_attr.cap.max_recv_sge);
>>>
>>>         /* set trigger for requesting send completion */
>>> -       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/2 /*  - 1*/;
>>> +       ep->rep_cqinit = ep->rep_attr.cap.max_send_wr/4;
>>> +       
>>>         switch (ia->ri_memreg_strategy) {
>>>         case RPCRDMA_MEMWINDOWS_ASYNC:
>>>         case RPCRDMA_MEMWINDOWS:
>>>
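To make the effect of the rep_cqinit change in the patch above concrete, here is a simplified model of a countdown-based completion trigger. It is a sketch under assumptions, not the xprtrdma implementation: struct sig_state and all function names are hypothetical, and the countdown-and-reset behaviour is assumed from the cq_init/cq_count values printed in the log later in this thread. It also assumes the usual verbs behaviour that unsignaled send WQEs are only reclaimed once a later signaled completion has been processed.

```c
#include <assert.h>

/* Hypothetical stand-in for the endpoint's signaling state. */
struct sig_state {
    unsigned int cqinit;   /* posts between requested completions */
    unsigned int cqcount;  /* countdown to the next signaled post */
};

/* Trigger before the patch: (max_requests * 3) / 2. */
unsigned int cqinit_old(unsigned int max_requests)
{
    return max_requests * 3 / 2;
}

/* Trigger after the patch: (max_requests * 4) / 4. */
unsigned int cqinit_new(unsigned int max_requests)
{
    return max_requests * 4 / 4;
}

void sig_init(struct sig_state *s, unsigned int cqinit)
{
    assert(cqinit > 0);
    s->cqinit = cqinit;
    s->cqcount = cqinit;
}

/* Returns 1 when this post should request a completion, else 0. */
int sig_post(struct sig_state *s)
{
    if (--s->cqcount > 0)
        return 0;              /* unsignaled: WQE stays outstanding */
    s->cqcount = s->cqinit;    /* reset the countdown */
    return 1;                  /* signaled: lets earlier WQEs be reclaimed */
}
```

For 32 requests the old trigger is 48, which is consistent with the "cq_init 48" in the log, although the thread does not state the slot count in use. Under this model the old setting lets up to half of the send queue go unreclaimed between signaled posts, while the patched setting signals after every quarter of the (larger) queue.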
>>>> -----Original Message-----
>>>> From: ewg-bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org [mailto:ewg-bounces-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org] On Behalf Of Vu Pham
>>>> Sent: Monday, February 22, 2010 12:23 PM
>>>> To: Tom Tucker
>>>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>>>
>>>> Tom,
>>>>
>>>> Some more info on the problem:
>>>> 1. Running with memreg=4 (FMR), I cannot reproduce the problem.
>>>> 2. I also see a different error on the client:
>>>>
>>>> Feb 22 12:16:55 mellanox-2 rpc.idmapd[5786]: nss_getpwnam: name 'nobody' does not map into domain 'localdomain'
>>>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x70004b: WQE overflow
>>>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>>>> Feb 22 12:16:55 mellanox-2 kernel: QP 0x6c004a: WQE overflow
>>>> Feb 22 12:16:55 mellanox-2 kernel: RPC: rpcrdma_ep_post: ib_post_send returned -12 cq_init 48 cq_count 32
>>>> Feb 22 12:17:00 mellanox-2 kernel: RPC:       rpcrdma_event_process: send WC status 5, vend_err F5
>>>> Feb 22 12:17:00 mellanox-2 kernel: rpcrdma: connection to 13.20.1.9:20049 closed (-103)
>>>>
>>>> -vu
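The numeric codes in the log above are easier to follow with names attached. The sketch below is a user-space illustration only: the name table is a hand-copied mirror of the first entries of the kernel's enum ib_wc_status (an assumption to verify against include/rdma/ib_verbs.h for your kernel), and wc_status_name and errno_text are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hand-copied mirror of the first entries of enum ib_wc_status,
 * indexed by numeric value. Verify against your kernel headers. */
static const char *const wc_status_names[] = {
    "IB_WC_SUCCESS",          /* 0 */
    "IB_WC_LOC_LEN_ERR",      /* 1 */
    "IB_WC_LOC_QP_OP_ERR",    /* 2 */
    "IB_WC_LOC_EEC_OP_ERR",   /* 3 */
    "IB_WC_LOC_PROT_ERR",     /* 4 */
    "IB_WC_WR_FLUSH_ERR",     /* 5 */
    "IB_WC_MW_BIND_ERR",      /* 6 */
    "IB_WC_BAD_RESP_ERR",     /* 7 */
    "IB_WC_LOC_ACCESS_ERR",   /* 8 */
    "IB_WC_REM_INV_REQ_ERR",  /* 9 */
    "IB_WC_REM_ACCESS_ERR",   /* 10 */
    "IB_WC_REM_OP_ERR",       /* 11 */
    "IB_WC_RETRY_EXC_ERR",    /* 12 */
};

/* Map a work-completion status number to its name, or "unknown". */
const char *wc_status_name(int status)
{
    size_t n = sizeof(wc_status_names) / sizeof(wc_status_names[0]);

    if (status < 0 || (size_t)status >= n)
        return "unknown";
    return wc_status_names[status];
}

/* Describe a kernel-style negative errno such as -12 or -103. */
const char *errno_text(int neg_errno)
{
    assert(neg_errno <= 0);
    return strerror(-neg_errno);
}
```

Here wc_status_name(5) yields IB_WC_WR_FLUSH_ERR for the "send WC status 5" line. On Linux, errno 12 is ENOMEM (the ib_post_send failure) and 103 is ECONNABORTED (the connection close).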
>>>>
>>>>> -----Original Message-----
>>>>> From: Tom Tucker [mailto:tom-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
>>>>> Sent: Monday, February 22, 2010 10:49 AM
>>>>> To: Vu Pham
>>>>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Mahesh Siddheshwar;
>>>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>>>> Subject: Re: [ewg] nfsrdma fails to write big file,
>>>>>
>>>>> Vu Pham wrote:
>>>>>> Setup:
>>>>>> 1. linux nfsrdma client/server with OFED-1.5.1-20100217-0600, ConnectX2
>>>>>> QDR HCAs fw 2.7.8-6, RHEL 5.2.
>>>>>> 2. Solaris nfsrdma server snv 130, ConnectX QDR HCA.
>>>>>>
>>>>>>
>>>>>> Running vdbench on a 10g file or *dd if=/dev/zero of=10g_file bs=1M
>>>>>> count=10000*, the operation fails, the connection gets dropped, and the
>>>>>> client cannot re-establish the connection to the server.
>>>>>> After rebooting only the client, I can mount again.
>>>>>>
>>>>>> It happens with both solaris and linux nfsrdma servers.
>>>>>>
>>>>>> For linux client/server, I run memreg=5 (FRMR); I don't see the problem
>>>>>> with memreg=6 (global dma key).
>>>>>>
>>>>> Awesome. This is the key I think.
>>>>>
>>>>> Thanks for the info Vu,
>>>>> Tom
>>>>>
>>>>>
>>>>>> On Solaris server snv 130, we see a problem decoding a write request of 32K.
>>>>>> The client sends two read chunks (32K & 16-byte), and the server fails to do
>>>>>> an rdma read on the 16-byte chunk (cqe.status = 10, i.e.
>>>>>> IB_WC_REM_ACCESS_ERR); therefore, the server terminates the connection. We
>>>>>> don't see this problem on nfs version 3 on Solaris. The Solaris server runs
>>>>>> in normal memory registration mode.
>>>>>>
>>>>>> On the linux client, I see cqe.status = 12, i.e. IB_WC_RETRY_EXC_ERR.
>>>>>>
>>>>>> I added these notes to bug #1919 (bugs.openfabrics.org) to track the
>>>>>> issue.
>>>>>>
>>>>>> thanks,
>>>>>> -vu
>>>>>> _______________________________________________
>>>>>> ewg mailing list
>>>>>> ewg-ZwoEplunGu1OwGhvXhtEPSCwEArCW2h5@public.gmane.org
>>>>>> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
