From: Kumar Sanghvi <kumaras-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
To: "Davis, Arlin R" <arlin.r.davis-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: "linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org"
	<swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>,
	"divy-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org"
	<divy-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Subject: Re: [PATCH] dapltest-server segfault seen on recent OFED-1.5.4 daily build
Date: Mon, 21 Nov 2011 16:50:34 +0530
Message-ID: <4ECA3402.8030203@chelsio.com>
In-Reply-To: <54347E5A035A054EAE9D05927FB467F916EA49A5-P5GAC/sN6hmkrb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>

Hi,

On 11/19/2011 05:18 AM, Davis, Arlin R wrote:
>
>> #0  dapl_llist_remove_entry (head=0x636960, entry=0x7ffff0004bf8) at
[...]
>>
>
> You should have seen a message like "WARNING: overflow event on EVD".
>
> It appears that the default dapltest server allocates too small of a CR EVD for many client test configurations. When it hits the overflow queue case, the CR callback incorrectly frees the CR before it is removed from SP list. In your case, I am guessing that another CR came in on another thread and this memory was reallocated with flink ptr reinitialized.
>
> Please try the following patches.
>
> ---------
> Common: CR EVD overflow causes segfault.
>
> The CR is freed up incorrectly before unlinking with SP.
>
> Signed-off-by: Arlin Davis<arlin.r.davis-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>
>
> diff --git a/dapl/common/dapl_cr_callback.c b/dapl/common/dapl_cr_callback.c
> index 3997b38..c58444b 100644
> --- a/dapl/common/dapl_cr_callback.c
> +++ b/dapl/common/dapl_cr_callback.c
> @@ -414,7 +414,6 @@ dapli_connection_request(IN dp_ib_cm_handle_t ib_cm_handle,
>                                                       (DAT_CR_HANDLE) cr_ptr);
>
>          if (dat_status != DAT_SUCCESS) {
> -               dapls_cr_free(cr_ptr);
>                  (void)dapls_ib_reject_connection(ib_cm_handle,
>                                                   DAT_CONNECTION_EVENT_BROKEN,
>                                                   0, NULL);
> @@ -423,6 +422,7 @@ dapli_connection_request(IN dp_ib_cm_handle_t ib_cm_handle,
>                  dapl_os_lock(&sp_ptr->header.lock);
>                  dapl_sp_remove_cr(sp_ptr, cr_ptr);
>                  dapl_os_unlock(&sp_ptr->header.lock);
> +               dapls_cr_free(cr_ptr);
>                  return DAT_INSUFFICIENT_RESOURCES;
>          }
>
>
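The first patch only reorders two calls, but the reason is easy to miss: unlinking a node from a doubly linked list reads that node's own flink and blink pointers, so the node must still be live memory when it is unlinked. A minimal sketch of the corrected order, assuming simplified types: struct sp, struct cr and the sp_* helpers are hypothetical stand-ins, not the DAPL types or the real dapl_sp_remove_cr()/dapls_cr_free() functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for a service point (SP) and its
 * pending connection requests (CRs). NOT the real DAPL types. */
struct cr {
    struct cr *flink;  /* forward link */
    struct cr *blink;  /* backward link */
    int id;
};

struct sp {
    struct cr head;    /* sentinel of a circular doubly linked list */
    int count;         /* number of CRs still linked on the SP */
};

static void sp_init(struct sp *sp)
{
    sp->head.flink = sp->head.blink = &sp->head;
    sp->count = 0;
}

/* Allocate a CR and link it at the tail of the SP's list. */
static struct cr *sp_add_cr(struct sp *sp, int id)
{
    struct cr *cr = malloc(sizeof *cr);
    if (cr == NULL)
        return NULL;
    cr->id = id;
    cr->flink = &sp->head;
    cr->blink = sp->head.blink;
    sp->head.blink->flink = cr;
    sp->head.blink = cr;
    sp->count++;
    return cr;
}

/* Unlink a CR. This dereferences cr->flink and cr->blink, so cr must
 * still point to live memory when it is called. */
static void sp_remove_cr(struct sp *sp, struct cr *cr)
{
    assert(cr->flink != NULL && cr->blink != NULL); /* not yet unlinked */
    cr->blink->flink = cr->flink;
    cr->flink->blink = cr->blink;
    cr->flink = cr->blink = NULL;
    sp->count--;
}

/* Error path after the CR event could not be posted (e.g. CR EVD
 * overflow): unlink from the SP first, then free. In the patched
 * dapli_connection_request() the SP lock is held around the unlink. */
static void sp_reject_cr(struct sp *sp, struct cr *cr)
{
    sp_remove_cr(sp, cr);
    free(cr);
}

/* Unlink and free every CR still on the SP. */
static void sp_destroy(struct sp *sp)
{
    while (sp->head.flink != &sp->head)
        sp_reject_cr(sp, sp->head.flink);
}
```

Swapping the two calls in sp_reject_cr() would reproduce the pre-patch ordering: the unlink would then read the freed node's flink, which another thread may already have reallocated and reinitialized.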
> ----------
> dapltest: server CR EVD is too small for multi-client configurations.
>
> Increase default size from 8 to 32.
>
> Signed-off-by: Arlin Davis<arlin.r.davis-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
>
> diff --git a/test/dapltest/test/dapl_server.c b/test/dapltest/test/dapl_server.c
> index 443425c..92e0d21 100644
> --- a/test/dapltest/test/dapl_server.c
> +++ b/test/dapltest/test/dapl_server.c
> @@ -34,7 +34,7 @@
>   #undef DFLT_QLEN
>   #endif
>
> -#define DFLT_QLEN 8            /* default event queue length */
> +#define DFLT_QLEN 32           /* default event queue length */
>
>   int send_control_data(DT_Tdep_Print_Head * phead,
>                        unsigned char *buffp,
>
>
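The second patch addresses the cause of the overflow rather than the crash. As an illustration only, the sketch models the server's CR EVD as a hypothetical fixed-capacity ring buffer that drops a posted event when it is full, and assumes a burst of connection requests arrives before the server dequeues any of them. struct evq, the evq_* functions and burst_overflows() are not part of the DAT dat_evd_* API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_QLEN 64

/* Hypothetical fixed-capacity event queue; a simplified stand-in for a
 * CR EVD, not the DAT API. */
struct evq {
    int events[MAX_QLEN];
    size_t head;      /* index of the oldest queued event */
    size_t count;     /* events currently queued */
    size_t qlen;      /* configured capacity, e.g. DFLT_QLEN */
    size_t overflows; /* posts dropped because the queue was full */
};

static void evq_init(struct evq *q, size_t qlen)
{
    assert(qlen > 0 && qlen <= MAX_QLEN);
    q->head = 0;
    q->count = 0;
    q->qlen = qlen;
    q->overflows = 0;
}

/* Post an event; returns false (overflow) when the queue is full. */
static bool evq_post(struct evq *q, int ev)
{
    if (q->count == q->qlen) {
        q->overflows++;
        return false;
    }
    q->events[(q->head + q->count) % q->qlen] = ev;
    q->count++;
    return true;
}

/* Remove the oldest event; returns false if the queue is empty. */
static bool evq_dequeue(struct evq *q, int *ev)
{
    if (q->count == 0)
        return false;
    *ev = q->events[q->head];
    q->head = (q->head + 1) % q->qlen;
    q->count--;
    return true;
}

/* Post n back-to-back requests with no dequeue in between and report
 * how many were dropped. */
static size_t burst_overflows(size_t qlen, size_t n)
{
    struct evq q;
    evq_init(&q, qlen);
    for (size_t i = 0; i < n; i++)
        (void)evq_post(&q, (int)i);
    return q.overflows;
}
```

Under that assumption, burst_overflows(8, 16) reports 8 dropped requests while burst_overflows(32, 16) reports none. A real EVD is drained concurrently, so the actual threshold depends on timing and on how many clients connect at once.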
Thank you for the two patches. With both applied, I have not seen a
segfault on the dapl-server so far.
However, after about 2 hours of testing, some of the dapl-client
instances print the following error on the console:
----
Server Name: 3.4.5.1
Server Net Address: 3.4.5.1
DT_cs_Client: Starting Test ...
FAIL: 16 Server test connections did not report ready.
FAIL: 16 Server test connections did not report ready.
----

At this point the dapl-client stalls and has to be killed manually with
Ctrl+C.
The following errors are seen on the dapl-server console:
----
Test Error: Client_Mem_Info_Send-reaping DTO problem, status = FAILURE
Test Error: Client_Mem_Info_Send-reaping DTO problem, status = FAILURE
Test[b368]: Warning: dat_ep_disconnect (abrupt) #2 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED
Test[b368]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE
Test[b368]: Warning: dat_ep_disconnect (abrupt) #3 error DAT_INVALID_STATE DAT_INVALID_STATE_EP_UNCONNECTED
Test[b368]: dat_evd_free (creq) error: DAT_INVALID_STATE DAT_INVALID_STATE_EVD_IN_USE
...
----

No messages appear in dmesg on either the dapl-server or the dapl-client machine.

If I kill the dapl-client manually and restart it, the test starts fine
again and runs for about 2 hours or so.


Thanks,
Kumar.

Thread overview: 6 messages
2011-11-18  9:01 dapltest-server segfault seen on recent OFED-1.5.4 daily build Kumar Sanghvi
2011-11-18 23:48   ` [PATCH] " Davis, Arlin R
2011-11-21 11:20       ` Kumar Sanghvi [this message]
2011-11-21 20:21           ` Davis, Arlin R
2011-11-22  7:11               ` Kumar Sanghvi
2011-11-29 10:37                   ` Kumar Sanghvi
