public inbox for linux-rdma@vger.kernel.org
From: Kumar Sanghvi <kumaras-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
To: "Davis, Arlin R" <arlin.r.davis-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: "linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org"
	<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>,
	"divy-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org"
	<divy-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Subject: Re: [PATCH] dapltest-server segfault seen on recent OFED-1.5.4 daily build
Date: Mon, 21 Nov 2011 16:50:34 +0530
Message-ID: <4ECA3402.8030203@chelsio.com>
In-Reply-To: <54347E5A035A054EAE9D05927FB467F916EA49A5-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>

Hi,

On 11/19/2011 05:18 AM, Davis, Arlin R wrote:
>
>> #0  dapl_llist_remove_entry (head=0x636960, entry=0x7ffff0004bf8) at
[...]
>>
>
> You should have seen a message like "WARNING: overflow event on EVD".
>
> It appears that the default dapltest server allocates too small of a CR EVD for many client test configurations. When it hits the overflow queue case, the CR callback incorrectly frees the CR before it is removed from SP list. In your case, I am guessing that another CR came in on another thread and this memory was reallocated with flink ptr reinitialized.
>
> Please try the following patches.
>
> ---------
> Common: CR EVD overflow causes segfault.
>
> The CR is freed up incorrectly before unlinking with SP.
>
> Signed-off-by: Arlin Davis<arlin.r.davis-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>
>
> diff --git a/dapl/common/dapl_cr_callback.c b/dapl/common/dapl_cr_callback.c
> index 3997b38..c58444b 100644
> --- a/dapl/common/dapl_cr_callback.c
> +++ b/dapl/common/dapl_cr_callback.c
> @@ -414,7 +414,6 @@ dapli_connection_request(IN dp_ib_cm_handle_t ib_cm_handle,
>                                                       (DAT_CR_HANDLE) cr_ptr);
>
>          if (dat_status != DAT_SUCCESS) {
> -               dapls_cr_free(cr_ptr);
>                  (void)dapls_ib_reject_connection(ib_cm_handle,
>                                                   DAT_CONNECTION_EVENT_BROKEN,
>                                                   0, NULL);
> @@ -423,6 +422,7 @@ dapli_connection_request(IN dp_ib_cm_handle_t ib_cm_handle,
>                  dapl_os_lock(&sp_ptr->header.lock);
>                  dapl_sp_remove_cr(sp_ptr, cr_ptr);
>                  dapl_os_unlock(&sp_ptr->header.lock);
> +               dapls_cr_free(cr_ptr);
>                  return DAT_INSUFFICIENT_RESOURCES;
>          }
>
>
> ----------
> dapltest: server CR EVD is too small for multi-client configurations.
>
> Increase default size from 8 to 32.
>
> Signed-off-by: Arlin Davis<arlin.r.davis-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>
> diff --git a/test/dapltest/test/dapl_server.c b/test/dapltest/test/dapl_server.c
> index 443425c..92e0d21 100644
> --- a/test/dapltest/test/dapl_server.c
> +++ b/test/dapltest/test/dapl_server.c
> @@ -34,7 +34,7 @@
>   #undef DFLT_QLEN
>   #endif
>
> -#define DFLT_QLEN 8            /* default event queue length */
> +#define DFLT_QLEN 32           /* default event queue length */
>
>   int send_control_data(DT_Tdep_Print_Head * phead,
>                        unsigned char *buffp,
>
>
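For anyone following the thread, here is how I understand the ordering issue that the first patch addresses, as a minimal standalone sketch. All names here (cr_node, sp_list, sp_add_cr, sp_remove_cr, sp_drop_cr) are hypothetical stand-ins and not the actual uDAPL structures or functions, and the locking around the list that the real code does is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a connection request (CR) that is linked
 * on a service point (SP) list. Not the real uDAPL types. */
struct cr_node {
    struct cr_node *flink;  /* forward link */
    struct cr_node *blink;  /* backward link */
    int id;
};

struct sp_list {
    struct cr_node *head;
    size_t count;
};

/* Allocate a CR and push it at the head of the SP list. */
static struct cr_node *sp_add_cr(struct sp_list *sp, int id)
{
    struct cr_node *cr = calloc(1, sizeof(*cr));
    if (cr == NULL)
        return NULL;
    cr->id = id;
    cr->flink = sp->head;
    if (sp->head != NULL)
        sp->head->blink = cr;
    sp->head = cr;
    sp->count++;
    return cr;
}

/* Unlink a CR from the SP list. This reads cr->flink and cr->blink,
 * so the CR memory must still be valid when it is called. */
static void sp_remove_cr(struct sp_list *sp, struct cr_node *cr)
{
    if (cr->blink != NULL)
        cr->blink->flink = cr->flink;
    else
        sp->head = cr->flink;
    if (cr->flink != NULL)
        cr->flink->blink = cr->blink;
    cr->flink = cr->blink = NULL;
    sp->count--;
}

/* Error path in the corrected order: unlink first, free last.
 * Freeing before the unlink would leave the list pointing at memory
 * that another thread could reallocate and reinitialize. */
static void sp_drop_cr(struct sp_list *sp, struct cr_node *cr)
{
    sp_remove_cr(sp, cr);
    free(cr);
}
```

With free() called before sp_remove_cr(), the unlink step would dereference freed memory, which seems consistent with the crash in dapl_llist_remove_entry from my backtrace.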
Thank you for the two patches. I applied both, and so far I have not 
seen a segfault on dapl-server.
However, after about 2 hours of testing, some of the dapl-client 
instances print the following error on the console:
----
Server Name: 3.4.5.1
Server Net Address: 3.4.5.1
DT_cs_Client: Starting Test ...
FAIL: 16 Server test connections did not report ready.
FAIL: 16 Server test connections did not report ready.
----

The dapl-client stalls at this stage and has to be killed manually with 
Ctrl+C.
The following errors are seen on the dapl-server console:
----
Test Error: Client_Mem_Info_Send-reaping DTO problem, status = FAILURE
Test Error: Client_Mem_Info_Send-reaping DTO problem, status = FAILURE
Test[b368]: Warning: dat_ep_disconnect (abrupt) #2 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED
Test[b368]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE
Test[b368]: Warning: dat_ep_disconnect (abrupt) #3 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED
Test[b368]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE
...
----

No messages are seen in dmesg on either the dapl-server or the dapl-client machine.

If I manually kill the dapl-client and restart it, the test starts fine 
again and runs for about 2 hours or so.


Thanks,
Kumar.