From: Hal Rosenstock <hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
To: Florent Parent <florent.parent-hUdP7zcY/u2zFPF/XzyFIQ@public.gmane.org>
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: bug report: opensm 3.3.15 crash (with traces)
Date: Fri, 04 Apr 2014 12:56:23 -0400
Message-ID: <533EE437.4070003@dev.mellanox.co.il>
In-Reply-To: <CAF3spKES=F1Tr6Y-iwMV_Dues80HnHvxOeaSN9KjTJbQvU0qyg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
Hi Florent,
On 4/2/2014 5:43 PM, Florent Parent wrote:
> Hi,
>
> We experienced constant crashing from opensm 3.3.15 (3.3.15-1.el6.cq5)
> after a recent upgrade. We compiled and installed 3.3.17 and the problem
> went away.
>
> OpenSM server: CentOS 6.5 w/ stock RDMA. OpenSM 3.3.15 was from the
> CentOS repository.
>
> A behaviour that may help diagnose this: an unusually large number of
> messages were filling up the opensm.log file:
>
> Mar 13 09:50:04 909147 [4FAFC700] 0x01 -> log_rcv_cb_error: ERR 3111:
> Received MAD with error status = 0x1C
> SubnGetResp(SwitchInfo), attr_mod 0x0, TID 0x73c86e46
> Initial path: 0,1,33,30,28 Return path: 0,10,32,13,28
>
> 80 of these messages occur periodically. smpquery on the paths shows
> that these all point to the Sun QNEM switches (80 I4 chips).
> "use_mfttop FALSE" eliminated these messages.
Yes, this is caused by bad firmware. The best fix is to upgrade the
firmware on the devices indicated by the DR paths. There's also the
workaround on the OpenSM side that you are using.
This is orthogonal to the crashes below.
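To find which devices need the firmware upgrade, the ERR 3111 entries can be tallied by DR path. The following is a minimal sketch, not part of OpenSM: the function name is made up for illustration, and the regular expression is modeled only on the excerpt quoted above, so it assumes the real opensm.log uses the same wording (it tolerates the line wrapping seen in this email).

```python
import re
from collections import Counter

# Pattern modeled on the ERR 3111 excerpt quoted in this thread.
# \s+ and DOTALL tolerate entries that are wrapped across lines.
# Assumption: the real opensm.log uses the same wording.
ERR_3111 = re.compile(
    r"ERR 3111:\s+Received MAD with error status = (0x[0-9A-Fa-f]+)\s+"
    r"SubnGetResp\((\w+)\),.*?"
    r"Initial path:\s*([\d,]+)\s+Return path:\s*([\d,]+)",
    re.DOTALL,
)


def tally_initial_paths(log_text):
    """Count ERR 3111 entries per (attribute, status, initial DR path)."""
    counts = Counter()
    for status, attr, initial, _return in ERR_3111.findall(log_text):
        counts[(attr, status, initial)] += 1
    return counts


# The single entry quoted in the report, used here as sample input.
SAMPLE = """\
Mar 13 09:50:04 909147 [4FAFC700] 0x01 -> log_rcv_cb_error: ERR 3111:
Received MAD with error status = 0x1C
SubnGetResp(SwitchInfo), attr_mod 0x0, TID 0x73c86e46
Initial path: 0,1,33,30,28 Return path: 0,10,32,13,28
"""

if __name__ == "__main__":
    for (attr, status, path), n in tally_initial_paths(SAMPLE).most_common():
        print(f"{n:4d}  {attr}  status={status}  initial path={path}")
```

Each distinct initial path in the output is a candidate for an smpquery lookup like the one Florent ran.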
> Florent
>
>
> *** glibc detected *** /usr/sbin/opensm: malloc(): smallbin double
> linked list corrupted: 0x00007f9b3c4352a0 ***
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x76166)[0x7f9b56279166]
> /lib64/libc.so.6(+0x79f1f)[0x7f9b5627cf1f]
> /lib64/libc.so.6(__libc_malloc+0x71)[0x7f9b5627d991]
> /usr/sbin/opensm[0x4216f3]
> /usr/sbin/opensm(osm_pkey_mgr_process+0x467)[0x422187]
> /usr/sbin/opensm[0x446efb]
> /usr/sbin/opensm(osm_state_mgr_process+0x1f8)[0x448538]
> /usr/sbin/opensm[0x4422bb]
> /usr/lib64/libosmcomp.so.3(+0x85fe)[0x7f9b56ddb5fe]
> /lib64/libpthread.so.0(+0x79d1)[0x7f9b5659e9d1]
> /lib64/libc.so.6(clone+0x6d)[0x7f9b562ebb6d]
>
> *** glibc detected *** /usr/sbin/opensm: double free or corruption
> (out): 0x00007fe2f42e1830 ***
Are you using partitions? Any idea what scenario triggers this?
I can isolate the patch (after 3.3.15) that fixes this if needed.
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x76166)[0x7fe30ec9d166]
> /lib64/libc.so.6(+0x78c93)[0x7fe30ec9fc93]
> /usr/sbin/opensm[0x449cf6]
> /usr/sbin/opensm(osm_subn_rescan_conf_files+0x194)[0x44af14]
> /usr/sbin/opensm[0x447260]
> /usr/sbin/opensm(osm_state_mgr_process+0x1f8)[0x448538]
> /usr/sbin/opensm[0x4422bb]
> /usr/lib64/libosmcomp.so.3(+0x85fe)[0x7fe30f7ff5fe]
> /lib64/libpthread.so.0(+0x79d1)[0x7fe30efc29d1]
> /lib64/libc.so.6(clone+0x6d)[0x7fe30ed0fb6d]
>
> *** glibc detected *** /usr/sbin/opensm: malloc(): smallbin double
> linked list corrupted: 0x00007f200838ede0 ***
This is one I'm unfamiliar with and will need to investigate further.
Did this one also go away with 3.3.17?
Thanks.
-- Hal
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x76166)[0x7f2025131166]
> /lib64/libc.so.6(+0x79f1f)[0x7f2025134f1f]
> /lib64/libc.so.6(__libc_malloc+0x71)[0x7f2025135991]
> /usr/sbin/opensm[0x4216f3]
> /usr/sbin/opensm(osm_pkey_mgr_process+0x467)[0x422187]
> /usr/sbin/opensm[0x446efb]
> /usr/sbin/opensm(osm_state_mgr_process+0x1f8)[0x448538]
> /usr/sbin/opensm[0x4422bb]
> /usr/lib64/libosmcomp.so.3(+0x85fe)[0x7f2025c935fe]
> /lib64/libpthread.so.0(+0x79d1)[0x7f20254569d1]
> /lib64/libc.so.6(clone+0x6d)[0x7f20251a3b6d]
>
>
> *** glibc detected *** /usr/sbin/opensm: malloc(): smallbin double
> linked list corrupted: 0x00007f8464013df0 ***
> ======= Backtrace: =========
> /lib64/libc.so.6(+0x76166)[0x7f847ec95166]
> /lib64/libc.so.6(+0x79f1f)[0x7f847ec98f1f]
> /lib64/libc.so.6(__libc_malloc+0x71)[0x7f847ec99991]
> /usr/sbin/opensm[0x4216f3]
> /usr/sbin/opensm(osm_pkey_mgr_process+0x467)[0x422187]
> /usr/sbin/opensm[0x446efb]
> /usr/sbin/opensm(osm_state_mgr_process+0x1f8)[0x448538]
> /usr/sbin/opensm[0x4422bb]
> /usr/lib64/libosmcomp.so.3(+0x85fe)[0x7f847f7f75fe]
> /lib64/libpthread.so.0(+0x79d1)[0x7f847efba9d1]
> /lib64/libc.so.6(clone+0x6d)[0x7f847ed07b6d]
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>