From: Fredrik Unger <funger-e+cCxrzAqRFWk0Htik3J/w@public.gmane.org>
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: kitten - mlx4: Unhandled interrupt - owner bit
Date: Wed, 10 Mar 2010 16:03:26 +0100
Message-ID: <4B97B4BE.1050809@hpce.nec.com>

Hi,

I am new to this list, and if my question is misplaced
please suggest a better forum on or off-list.

We are using InfiniBand (core & mlx4 from OFED 1.4.1 + OFED kernel patches)
in a lightweight kernel named kitten, partially derived from Linux.
http://code.google.com/p/kitten/

We see one or two unhandled interrupts when doing RDMA_READ
data transfers with mlx4 cards. (SEND and RDMA_WRITE work well.)
The problem appears only with larger messages, 1-4 MB.
Write-combining is turned off.

Below is a pingpong test with 1000 iterations per message size, e.g.:
<8>(init_task)       Size     Average      Stddev         Min      Median         Max
...
<8>(init_task)     524288      271.79        7.09      138.96      271.51      429.24
<4>irq_dispatch: Unhandled interrupt 74 (4a) [Owner]
<8>(init_task)    1048576      569.99      981.73      272.01      537.56    31581.67
<8>(init_task)    2097152     1070.57       28.95      537.88     1069.66     1779.97
<8>(init_task)    4194304     2135.99       52.86     1070.10     2134.70     3124.28

This error is random and appears in about one in three runs. Note the high max
value for one 1 MB message; my guess is that this is the connection recovering.

When investigating the error, it seems to stem from next_eqe_sw in drivers/net/mlx4/eq.c,
which is called by the interrupt handler.
What happens is that (eqe->owner & 0x80) is true, causing the routine to return
NULL, which results in an unhandled interrupt (i.e. the interrupt routine returns 0).

My understanding is that when the interrupt gets flagged the card would
have given the eqe (event queue entry?) to the software, but it could very well be more complex.
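
To make the check concrete, here is a minimal standalone sketch of the
ownership test as I understand it. This is not the driver source: the demo_*
names, the two-field entry layout and the drain loop are made up for
illustration, and comparing the bit against the wrap parity of the consumer
index is my assumption about how the bit is meant to be interpreted. Only the
0x80 mask and the "return NULL, handler returns 0" behaviour come from what
I saw.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_OWNER_BIT 0x80u /* same mask as the (eqe->owner & 0x80) test */

/* Hypothetical, simplified event queue entry. */
struct demo_eqe {
    uint8_t type;
    uint8_t owner; /* bit 7 is the ownership bit */
};

/* Hypothetical, simplified event queue. */
struct demo_eq {
    struct demo_eqe *entries;
    uint32_t nent;       /* number of entries, must be a power of two */
    uint32_t cons_index; /* free-running software consumer index */
};

/*
 * Return the next entry if software owns it, NULL otherwise.
 * Assumption: the meaning of the owner bit flips on every pass over the
 * ring, so it is compared with the pass parity of cons_index.
 */
static struct demo_eqe *demo_next_eqe_sw(struct demo_eq *eq)
{
    assert(eq->nent != 0 && (eq->nent & (eq->nent - 1)) == 0);

    struct demo_eqe *eqe = &eq->entries[eq->cons_index & (eq->nent - 1)];
    int owner_bit = !!(eqe->owner & DEMO_OWNER_BIT);
    int pass_parity = !!(eq->cons_index & eq->nent);

    return (owner_bit ^ pass_parity) ? NULL : eqe;
}

/*
 * Drain all software-owned entries and return how many were consumed.
 * A return value of 0 corresponds to the "unhandled interrupt" case.
 */
static int demo_irq_handler(struct demo_eq *eq)
{
    int handled = 0;

    while (demo_next_eqe_sw(eq) != NULL) {
        /* a real handler would dispatch on the entry type here */
        eq->cons_index++;
        handled++;
    }
    return handled;
}
```

In this model, if the interrupt arrives while the entry at cons_index still
looks hardware-owned, demo_irq_handler() returns 0, which is the situation our
irq_dispatch reports as unhandled.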

The same message can be seen when starting the driver, but there it does not cause any problems:
<6>mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (April 4, 2008)
<4>irq_dispatch: Unhandled interrupt 74 (4a) [Owner]
 .... x 16

So far we have not been able to reproduce this problem under Linux.
The kitten interrupt handler is simple and just forwards the interrupt to the driver.

What does owner in the eqe struct mean? Does it say whether hardware or software owns the entry?
Has this bug been seen in Linux, even though we were not able to reproduce it?
Can I get more debug information from the card?
Any tips on what could go wrong in this context? Are we missing some setup?


Sincerely,

Fredrik Unger