public inbox for linux-rdma@vger.kernel.org
* Infiniband poll_cq : nothing on queue
@ 2011-03-17 22:09 Greg Kerr
From: Greg Kerr @ 2011-03-17 22:09 UTC
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi,

Sorry if this is meant to be a kernel-dev-only mailing list, but I'm working on
an Infiniband program and am having a problem that seems to have no obvious
answer. I was hoping someone here could nudge me in the right direction (since
right now I'm searching around blindly).

When I run ibv_post_send and ibv_post_recv on both nodes, no error is
returned, but ibv_poll_cq never finds any completions on the queue. What
could be causing this? I've spent a few days looking into it now.

The dest_qp_num seems fine on both nodes, as do the rq_psn and sq_psn.
I'm not sure where else there could be a problem.

To provide more background information I ran ibdump on my program (on both
nodes) and then analyzed the output in Wireshark. Basically node 1 shows
nothing but RC Acknowledge packets and node 2 shows nothing but RC Send
First packets. Does that reveal anything about where the problem likely
lies?

By contrast, if I look at the output of, say, ibv_rc_pingpong in Wireshark,
both nodes show RC Send First, RC Send Middle, and RC Send Last packets,
among others.

I know this is too vague to really pinpoint my problem but I am hoping
someone can nudge me in the right direction of where I might try looking
(since nothing I've looked at so far has identified any clear problems).

Thanks,

Greg Kerr
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Infiniband poll_cq : nothing on queue
@ 2011-03-17 22:30 Jason Gunthorpe
From: Jason Gunthorpe @ 2011-03-17 22:30 UTC
  To: Greg Kerr; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Thu, Mar 17, 2011 at 06:09:18PM -0400, Greg Kerr wrote:

> To provide more background information I ran ibdump on my program (on both
> nodes) and then analyzed the output in Wireshark. Basically node 1 shows
> nothing but RC Acknowledge packets and node 2 shows nothing but RC Send
> First packets. Does that reveal anything about where the problem likely
> lies?

That means node 2 is either not receiving the ACK packets from node 1 or
is dropping them.

The most likely causes: a wrong PSN, wrong DLID, or wrong QPN on node 1,
or the QP is not in RTS on node 2.
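A minimal sketch of how the first three causes could be checked, assuming each node keeps a record of what it advertised to its peer and of what it actually passed to ibv_modify_qp. The structs conn_info and qp_config and the functions report() and check_mirrored() are hypothetical helpers, not part of libibverbs; the field names only echo the corresponding struct ibv_qp_attr members.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: what a node sends its peer during the out-of-band
 * exchange (ibv_rc_pingpong does this over TCP). Not a libibverbs type. */
struct conn_info {
    uint16_t lid;  /* LID of the local port */
    uint32_t qpn;  /* local QP number (qp->qp_num) */
    uint32_t psn;  /* initial PSN this node will send with */
};

/* Hypothetical: the values this node actually put into its
 * struct ibv_qp_attr when moving its own QP to RTR and RTS. */
struct qp_config {
    uint16_t dlid;         /* ah_attr.dlid: must be the peer's LID */
    uint32_t dest_qp_num;  /* must be the peer's QPN */
    uint32_t rq_psn;       /* must be the peer's initial send PSN */
    uint32_t sq_psn;       /* must be this node's advertised PSN */
};

/* Print one mismatch and count it. */
static int report(const char *node, const char *field,
                  unsigned long have, unsigned long want)
{
    printf("%s: %s is 0x%lx, expected 0x%lx\n", node, field, have, want);
    return 1;
}

/* Count (and print) every value that is not mirrored from the peer. */
static int check_mirrored(const char *node, const struct conn_info *self,
                          const struct conn_info *peer,
                          const struct qp_config *cfg)
{
    int errors = 0;

    assert(self != NULL && peer != NULL && cfg != NULL);
    if (cfg->dlid != peer->lid)
        errors += report(node, "dlid", cfg->dlid, peer->lid);
    if (cfg->dest_qp_num != peer->qpn)
        errors += report(node, "dest_qp_num", cfg->dest_qp_num, peer->qpn);
    if (cfg->rq_psn != peer->psn)
        errors += report(node, "rq_psn", cfg->rq_psn, peer->psn);
    if (cfg->sq_psn != self->psn)
        errors += report(node, "sq_psn", cfg->sq_psn, self->psn);
    return errors;
}
```

Running check_mirrored() on node 1's configuration against node 2's advertised values (and vice versa) would flag, for example, an ah_attr.dlid or dest_qp_num that still holds node 1's own LID or QPN, which would send node 1's ACKs somewhere other than node 2's QP.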
 
Remember that all three pairs of values are supposed to be swapped between
node 1 and node 2 (rq_psn/sq_psn, DLID/SLID, qpn/dest_qp_num), and you have
to call modify_qp three times, all the way to RTS (RESET to INIT, INIT to
RTR, RTR to RTS), on both sides.
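The transition sequence can be modelled as a small state machine. This is a sketch only: enum qp_state is a local stand-in for libibverbs' IBV_QPS_RESET, IBV_QPS_INIT, IBV_QPS_RTR and IBV_QPS_RTS, and modify_qp_step(), bring_up() and can_post_send() are hypothetical helpers that simulate ibv_modify_qp() calls rather than performing them. The attribute lists follow what ibv_rc_pingpong sets at each step for an RC QP: dest_qp_num, rq_psn and the DLID are supplied on the way to RTR, while sq_psn is only supplied on the way to RTS.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for IBV_QPS_RESET .. IBV_QPS_RTS (same order). */
enum qp_state { QPS_RESET, QPS_INIT, QPS_RTR, QPS_RTS };

/* Attributes an RC QP typically sets when entering each state,
 * following ibv_rc_pingpong; NULL for RESET, which is not a target here. */
static const char *rc_attrs_for(enum qp_state target)
{
    switch (target) {
    case QPS_INIT: return "pkey_index, port_num, qp_access_flags";
    case QPS_RTR:  return "ah_attr (dlid), path_mtu, dest_qp_num, rq_psn, "
                          "max_dest_rd_atomic, min_rnr_timer";
    case QPS_RTS:  return "timeout, retry_cnt, rnr_retry, sq_psn, max_rd_atomic";
    default:       return NULL;
    }
}

/* Simulate one ibv_modify_qp() call: only the next state is legal.
 * Returns 0 on success, -1 (state unchanged) otherwise. */
static int modify_qp_step(enum qp_state *cur, enum qp_state next)
{
    assert(cur != NULL);
    if (*cur == QPS_RTS || (int)next != (int)*cur + 1)
        return -1;
    *cur = next;
    return 0;
}

/* Drive a QP up to RTS; returns the number of modify calls made. */
static int bring_up(enum qp_state *cur)
{
    int calls = 0;

    while (*cur != QPS_RTS) {
        enum qp_state next = (enum qp_state)(*cur + 1);
        assert(rc_attrs_for(next) != NULL);
        if (modify_qp_step(cur, next) != 0)
            return -1;
        calls++;
    }
    return calls;
}

/* Only a QP in RTS may post sends; one in RTR can process incoming
 * packets but cannot send. */
static int can_post_send(enum qp_state s)
{
    return s == QPS_RTS;
}
```

Starting from RESET, bring_up() makes exactly three calls, matching the three modify_qp calls mentioned above; skipping a step (for example INIT straight to RTS) is rejected and leaves the state unchanged.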

Jason

