public inbox for linux-rdma@vger.kernel.org
From: Roland Dreier <rdreier-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
To: Josh England <jjengla-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Linux RDMA list <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: NULL pointer dereference in rdma_ucm
Date: Tue, 20 Jul 2010 13:52:28 -0700
Message-ID: <adazkxlq3jn.fsf@roland-alpha.cisco.com>
In-Reply-To: <AANLkTikwDEJY_F1ziGNPhYMBJEBC5UqD2XOLM7wSByj1-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> (Josh England's message of "Mon, 19 Jul 2010 16:20:58 -0700")

 > I'm experimenting with an rdma_cm application to push data around
 > between nodes on an ~1000 node cluster (CentOS-5.3 with 2.6.18-128.el5
 > and OFED-1.4.2).  Under heavy load, I'm seeing several nodes per day
 > kernel panic due to a NULL pointer dereference.  It may be that the
 > in-kernel field cm_id_priv has a NULL ->alt_av.port, causing the
 > Oops, but I don't know for sure.  Any ideas on how to debug this?

You have a pretty unsupportable combination of ancient kernel and old
OFED stack.  Is there any way you can test this with a recent mainline
kernel?

If I were debugging this, I would first pin down exactly where the NULL
dereference occurs.  ib_cm_init_qp_attr() probably has all of its leaf
functions (cm_init_qp_init_attr etc) inlined, so the first step would be
to figure out which of those functions you're crashing in (and to
confirm that it's always the same one).  You could do that by marking
them noinline, or by putting a WARN_ON(!<ptr>) before every pointer
dereference (does 2.6.18 even have WARN_ON?).
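
A rough userspace sketch of both suggestions, to show the shape of the
change rather than the actual ib_cm code.  Everything named demo_* is
hypothetical, the structs are simplified stand-ins for cm_id_priv and
its av/alt_av members, and WARN_ON() here is a small imitation of the
kernel macro so the example builds outside the kernel (GCC or Clang
assumed for the noinline attribute):

```c
#include <stdio.h>

/* Imitation of the kernel's WARN_ON(): log the condition, return it. */
#define WARN_ON(cond) demo_warn(!!(cond), #cond, __FILE__, __LINE__)

/* In the kernel, noinline is already provided; defined here for userspace. */
#define noinline __attribute__((noinline))

static int warn_count;	/* number of warnings raised so far */

static int demo_warn(int cond, const char *expr, const char *file, int line)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}

/* Simplified, hypothetical stand-ins for the CM data structures. */
struct demo_port { int port_num; };
struct demo_av { struct demo_port *port; };
struct demo_cm_id { struct demo_av av; struct demo_av alt_av; };

/*
 * noinline keeps each leaf function as a separate symbol, so a crash
 * backtrace names the leaf instead of only the caller it was folded into.
 */
static noinline int demo_init_rtr_attr(const struct demo_cm_id *id,
				       int *port_num)
{
	if (WARN_ON(!id->av.port))	/* check before the dereference */
		return -1;
	*port_num = id->av.port->port_num;
	return 0;
}

static noinline int demo_init_rts_attr(const struct demo_cm_id *id,
				       int *alt_port_num)
{
	if (WARN_ON(!id->alt_av.port))	/* the pointer suspected above */
		return -1;
	*alt_port_num = id->alt_av.port->port_num;
	return 0;
}
```

With a check like this in front of each dereference, a NULL alt_av.port
would show up as a warning that names the expression and location,
rather than as an Oops somewhere inside the inlined caller.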

 - R.
-- 
Roland Dreier <rolandd-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html