IPoIB issues

public inbox for linux-rdma@vger.kernel.org

* IPoIB issues
From: Josh England @ 2010-03-02 21:54 UTC
  To: linux-rdma@vger.kernel.org

Hello,

I've been running into several issues using IPoIB.  The 2 primary uses
are for read-only NFS to the clients (over TCP) and access to an
ethernet-connected parallel filesystem (Panasas) through router nodes
passing IPoIB<-->10GbE.

All nodes are running CentOS 5.3 and OFED 1.4.2, although I have played
with OFED 1.5 and seen similar results.  Client nodes mount their NFS root
from boot servers via IPoIB with a ratio of 80:1.  The boot servers are the
ones that seem to have issues.  The fabric itself consists of ~1000 nodes
interconnected such that there is 2:1 oversubscription within any single rack,
and 20:1 oversubscription between racks (through the core switch).  I
don't know how much the oversubscription comes into play here as I can
reproduce the error within a single rack.

In datagram mode, I see errors on the boot servers of the form:

ib0: post_send failed
ib0: post_send failed
ib0: post_send failed

When using connected mode, I hit a different error:

NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 1999 msecs
ib0: queue stopped 1, tx_head 2154042680, tx_tail 2154039464
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 2999 msecs
ib0: queue stopped 1, tx_head 2154042680, tx_tail 2154039464
...
...
NETDEV WATCHDOG: ib0: transmit timed out
ib0: transmit timeout: latency 61824999 msecs
ib0: queue stopped 1, tx_head 2154042680, tx_tail 2154039464
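
As a rough aid for reading the lines above, the sketch below pulls the
counters out of each "queue stopped" line and reports how many sends
were posted but not yet completed. It assumes tx_head and tx_tail are
free-running 32-bit counters that wrap, which is an assumption about
the driver and not something the log itself states. The helper name
outstanding_sends is made up for this illustration.

```python
import re

# Matches the "queue stopped" line format quoted in the report above.
QUEUE_RE = re.compile(r"queue stopped (\d+), tx_head (\d+), tx_tail (\d+)")


def outstanding_sends(line):
    """Return (stopped, outstanding) for one log line, or None if it does not match."""
    m = QUEUE_RE.search(line)
    if m is None:
        return None
    stopped, head, tail = (int(g) for g in m.groups())
    # Assumption: both counters are 32-bit and wrap, so subtract modulo 2**32.
    return bool(stopped), (head - tail) % 2**32


log = """\
ib0: transmit timeout: latency 1999 msecs
ib0: queue stopped 1, tx_head 2154042680, tx_tail 2154039464
ib0: transmit timeout: latency 2999 msecs
ib0: queue stopped 1, tx_head 2154042680, tx_tail 2154039464
"""

samples = [r for r in map(outstanding_sends, log.splitlines()) if r is not None]
print(samples)  # [(True, 3216), (True, 3216)]
```

Both samples show the same gap because neither counter moves between
watchdog reports, i.e. no sends are being posted or completed while the
queue is stopped.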

The errors seem to hit only after NFS comes into play.  Once it
starts, the NETDEV WATCHDOG messages continue until I run
'ifconfig ib0 down up'.  I've tried tuning send_queue_size and
recv_queue_size on both sides, the txqueuelen of the ib0 interface, and
the NFS rsize/wsize.  None of it seems to help much.  Does anyone have
any ideas about what I can do to fix these problems?
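
When comparing nodes, it can help to record the settings mentioned
above in one place. The following is a minimal sketch that reads them
from sysfs. The paths are assumptions based on how the in-kernel
ib_ipoib driver usually exposes its mode and module parameters; an OFED
build may differ, so missing files are reported as None instead of
raising. The function name read_ipoib_settings is hypothetical.

```python
from pathlib import Path


def read_ipoib_settings(ifname="ib0", root="/"):
    """Collect IPoIB-related settings for one interface from sysfs.

    root exists only so the function can be pointed at a test directory.
    """
    root = Path(root)
    paths = {
        # Assumed locations; verify them on the system under test.
        "mode": root / "sys/class/net" / ifname / "mode",
        "tx_queue_len": root / "sys/class/net" / ifname / "tx_queue_len",
        "send_queue_size": root / "sys/module/ib_ipoib/parameters/send_queue_size",
        "recv_queue_size": root / "sys/module/ib_ipoib/parameters/recv_queue_size",
    }
    settings = {}
    for name, path in paths.items():
        try:
            settings[name] = path.read_text().strip()
        except OSError:
            settings[name] = None  # file missing or unreadable on this system
    return settings


if __name__ == "__main__":
    for name, value in read_ipoib_settings().items():
        print(f"{name}: {value}")
```

Running it on a boot server and on a client before and after each
tuning change gives a simple record of which combination was in effect
when an error appeared.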

-JE
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: IPoIB issues
From: Eli Cohen @ 2010-03-03 12:29 UTC
  To: Josh England; +Cc: linux-rdma@vger.kernel.org

I just posted a patch which might fix your problem. Please try it and
let us know if it fixed anything.

* Re: IPoIB issues
From: Josh England @ 2010-03-04  0:38 UTC
  To: Eli Cohen; +Cc: linux-rdma@vger.kernel.org

I've applied the patch and initial testing has not produced any
transmit timeout errors.  I'll be doing some heavier testing in the
next couple days, but it looks good so far.  Thanks for the quick
turn-around!

-JE

* Re: IPoIB issues
From: Moni Shoua @ 2010-03-10 15:30 UTC
  To: Eli Cohen; +Cc: Josh England, linux-rdma@vger.kernel.org

Eli Cohen wrote:
> I just posted a patch which might fix your problem. Please try it and
> let us know if it fixed anything.
> 
Hi Eli,
Although Josh already reported that the patch seems to fix the issue, I still have a question.

The "post_send failed" prints appeared while working in datagram mode. I don't know if Josh
verified this, but I don't expect these prints to go away, even with the patch. Am I right?

BTW, what could be the reason for UD QP post_send() failures?

* Re: IPoIB issues
From: Eli Cohen @ 2010-03-11  6:56 UTC
  To: Moni Shoua; +Cc: Josh England, linux-rdma@vger.kernel.org

On Wed, Mar 10, 2010 at 05:30:38PM +0200, Moni Shoua wrote:
> Hi Eli,
> Although Josh already reported that the patch seems to fix the issue, I still have a question.
>
> The "post_send failed" prints appeared while working in datagram mode. I don't know if Josh
> verified this, but I don't expect these prints to go away, even with the patch. Am I right?
The patch does not address these failures directly but maybe as a side
effect they would go away too. Maybe Josh can share with us his
experience.

> 
> BTW, what could be the reason for UD QP post_send() failures?
> 

Usually they should not fail unless the WR is malformed or the QP has
all available WRs outstanding, which should not happen in IPoIB. I
think printing the return value is in order, so that in the future we
will have more information in such cases.
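
The two failure causes named above can be pictured with a toy model of
a send queue. This is only a sketch: ToySendQueue, its depth parameter
and the specific error codes are invented for illustration and are not
taken from the IPoIB driver or any verbs provider.

```python
import errno


class ToySendQueue:
    """Toy model of a send queue with a fixed number of work-request slots."""

    def __init__(self, depth):
        self.depth = depth
        self.head = 0  # work requests posted so far
        self.tail = 0  # work requests completed so far

    def post_send(self, wr):
        # Cause 1: a malformed request is rejected outright (here: no payload).
        if not wr.get("payload"):
            return -errno.EINVAL
        # Cause 2: every slot is outstanding, so nothing more can be posted
        # until at least one completion frees a slot.
        if self.head - self.tail >= self.depth:
            return -errno.ENOMEM
        self.head += 1
        return 0

    def complete_one(self):
        """Retire the oldest outstanding request, if there is one."""
        if self.tail < self.head:
            self.tail += 1


sq = ToySendQueue(depth=2)
results = [sq.post_send({"payload": b"x"}) for _ in range(3)]
print(results)  # the third post fails because both slots are in use
```

Logging the returned value, as suggested above, is what would
distinguish the two causes in a real failure report.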

* Re: IPoIB issues
From: Or Gerlitz @ 2010-03-11  7:47 UTC
  To: Eli Cohen; +Cc: Moni Shoua, Josh England, linux-rdma@vger.kernel.org

Eli Cohen wrote:
> The patch does not address these failures directly but maybe as a side effect they would go away too. 
The patch seems to solve a possible "live lock" in a node which has
both CM and datagram neighbors, e.g. where ipoib has called netif_stop
etc. but there is now room in the QP for more postings, which would let
the network layer continue to post if the CQ had been polled. It's hard
to see how this relates to the post send error print.
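
The stuck state described here can be sketched with a toy model. It is
an illustration only, with made-up names (ToyTxPath, hardware_finishes_all),
and it makes no claim about what the patch actually changes: it just
shows a queue that stays stopped although the hardware has finished,
until something polls the completion queue.

```python
class ToyTxPath:
    """Toy transmit path: the queue restarts only when completions are polled."""

    def __init__(self, depth):
        self.depth = depth
        self.posted = 0       # sends handed to the hardware
        self.hw_done = 0      # sends the hardware has finished
        self.reaped = 0       # completions the driver has polled from the CQ
        self.stopped = False  # stands in for the netif queue state

    def xmit(self):
        """Network layer hands over a packet; refused while the queue is stopped."""
        if self.stopped:
            return False
        self.posted += 1
        if self.posted - self.reaped >= self.depth:
            self.stopped = True  # ring looks full from the driver's view
        return True

    def hardware_finishes_all(self):
        """The hardware completes every posted send; the driver does not know yet."""
        self.hw_done = self.posted

    def poll_cq(self):
        """Reap completions and restart the queue if slots are free again."""
        self.reaped = self.hw_done
        if self.posted - self.reaped < self.depth:
            self.stopped = False


tx = ToyTxPath(depth=4)
while tx.xmit():
    pass                     # fill the ring until the queue stops
tx.hardware_finishes_all()
stuck = not tx.xmit()        # room exists, but nobody has polled the CQ
tx.poll_cq()
recovered = tx.xmit()        # polling reaps completions and restarts the queue
print(stuck, recovered)      # True True
```

In the model, nothing on the transmit side can trigger poll_cq() once
the queue is stopped, so recovery depends on some other event doing it.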

> I think printing the return value is in place so in the future we will have more information in such cases.
I posted a patch that does this, but I think it missed the 2.6.34 merge 
cycle.

Or.

* Re: IPoIB issues
From: Eli Cohen @ 2010-03-11  7:59 UTC
  To: Or Gerlitz; +Cc: Moni Shoua, Josh England, linux-rdma@vger.kernel.org

On Thu, Mar 11, 2010 at 09:47:31AM +0200, Or Gerlitz wrote:
> >The patch does not address these failures directly but maybe as a
> >side effect they would go away too.
> The patch seems to solve a possible "live lock" in a node which has
> both CM and datagram neighbors, e.g. where ipoib has called
> netif_stop etc. but there is now room in the QP for more postings,
> which would let the network layer continue to post if the CQ had
> been polled. It's hard to see how this relates to the post send
> error print.
Right, I meant that they could disappear because the system no longer
gets into the state in which they show up, but the patch __does not__
address that problem.

> 
> >I think printing the return value is in place so in the future we will have more information in such cases.
> I posted a patch that does this, but I think it missed the 2.6.34
> merge cycle.
> 
Can you push it to OFED-1.5.1? We'll remove the patch later when
it's in the kernel, but at least we'll have the information handy
if/when we need it.