* vl15 drops
@ 2010-05-14 19:55 Bob Ciotti
From: Bob Ciotti @ 2010-05-14 19:55 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
We are chasing down some issues related to fabric discovery and SM failover,
and occasionally we see a significant number of VL15 drops. We are not sure
whether this is a problem. Does anyone have a reference on what controls are
available to reduce the number of drops? We would like to keep maxsmps set to at
least 16, or even raise it. The spec says at least one VL15 buffer, but are there
controls for buffer allocation on the switches with respect to VL15, or other
VL15-specific controls (in the SM, the switches, or elsewhere) that might reduce drops?

The fabric is a collection of InfiniScale III and IV switches.

Here is a small subset of the VL15 counters on a few of the switches.
Each line corresponds to a 24-port InfiniScale III switch. All the d1
through d11 (hypercube dimension) connections are to other switches (not HCAs).
r53i3/sw0/port 24 is where the SM is connected showing 9678 drops...
--    cb1 ib0 sw0 .     .    .      .              d1   d2   d3   d4   d5   d6   d7   d8   d9  d10  d11  d12   io
==    cb1 ib0 sw0 .     .    .      SWPORT          9   13   14   15   16   17   18   19   20   21   22   23   24
r49i0 cb1 ib0 sw0 SwLid 7842 port-1 VL15Dropped:    .    .    1    3    8   13   11    .    .    .    .    .    .
r49i1 cb1 ib0 sw0 SwLid 8349 port-1 VL15Dropped:    .    .   17    8   20   16    2    2    .    .    .    .    .
r49i2 cb1 ib0 sw0 SwLid 8554 port-1 VL15Dropped:    .    4    .   12   23    2    2    .    2    .    .    .    .
r49i3 cb1 ib0 sw0 SwLid 7330 port-1 VL15Dropped:    3    1    1   25   71   57    7    8    1    .    .    .    .
r50i0 cb1 ib0 sw0 SwLid 8682 port-1 VL15Dropped:    .    2   61    3    6    7    .    5    .    .    .    .    .
r50i1 cb1 ib0 sw0 SwLid 8132 port-1 VL15Dropped:    .    3   58   33   20   44    1    9    4    .    .    .    .
r50i2 cb1 ib0 sw0 SwLid 6435 port-1 VL15Dropped:    4  122   14    8   13   36    2    .    .    .    .    .    .
r50i3 cb1 ib0 sw0 SwLid 8027 port-1 VL15Dropped:   22  171  167   49  113   57    .    2    .    .    .    .    .
r51i0 cb1 ib0 sw0 SwLid 7756 port-1 VL15Dropped:    .    .    3   24   16   11    .    2    .    .    .    .    .
r51i1 cb1 ib0 sw0 SwLid 6678 port-1 VL15Dropped:    .    5    8   28    7   27    2    .    4    .    .    .    .
r51i2 cb1 ib0 sw0 SwLid 7933 port-1 VL15Dropped:    1   34   29    3    7    5    2    .    .    .    .    .    .
r51i3 cb1 ib0 sw0 SwLid 7426 port-1 VL15Dropped:   14   31  145  683   18    5    2    .    1    .    .    .    .
r52i0 cb1 ib0 sw0 SwLid 6990 port-1 VL15Dropped:    1   26   65   23  132   49    1    1    1    .    .    .    .
r52i1 cb1 ib0 sw0 SwLid 7465 port-1 VL15Dropped:   33   66  388   48   18    4    4    .    2    .    .    .    .
r52i2 cb1 ib0 sw0 SwLid 6914 port-1 VL15Dropped:   17  320   39   14   15   12    4    5    4    .    .    .    .
r52i3 cb1 ib0 sw0 SwLid 7895 port-1 VL15Dropped: 1770  614 1449  189  299  197   35  124   56    .   55    . 9678
r53i0 cb1 ib0 sw0 SwLid 8171 port-1 VL15Dropped:    .    .    .    .    1    5    .    .    .    .    .    .    .
r53i1 cb1 ib0 sw0 SwLid 8471 port-1 VL15Dropped:    .    .    1    .    .   15    3    1    5    .    .    .    .
r53i2 cb1 ib0 sw0 SwLid 6485 port-1 VL15Dropped:    .    .    .    .    8   17    2    .    2    .    .    .    .
r53i3 cb1 ib0 sw0 SwLid 5949 port-1 VL15Dropped:    .    .    1    3    3    1    8    .    .    .    .    .    .
r54i0 cb1 ib0 sw0 SwLid 7809 port-1 VL15Dropped:    .    .    .    .    2    .    2    .    .    .    .    .    .
r54i1 cb1 ib0 sw0 SwLid 7890 port-1 VL15Dropped:    .    .    .    4   16    1    .    .    2    .    .    .    .
r54i2 cb1 ib0 sw0 SwLid 6913 port-1 VL15Dropped:    .    1    .    1   21    4    4    .    3    .    .    .    .
r54i3 cb1 ib0 sw0 SwLid 7283 port-1 VL15Dropped:    1   11    1    7   34    4   18    .    2    .    .    .    .
r55i0 cb1 ib0 sw0 SwLid 8350 port-1 VL15Dropped:    .    .    .    2    .    8    7    .    .    .    .    .    .
r55i1 cb1 ib0 sw0 SwLid 7516 port-1 VL15Dropped:    .    .    3   24   13    3    4    .    1    .    .    .    .
r55i2 cb1 ib0 sw0 SwLid 8518 port-1 VL15Dropped:    .    6    .   16    .    1    4    1    .    .    .    .    .
r55i3 cb1 ib0 sw0 SwLid 7666 port-1 VL15Dropped:    .    .    9   12    4    4   20    1    .    .    .    .    .
r56i0 cb1 ib0 sw0 SwLid 8219 port-1 VL15Dropped:    .    3   29   12    .    6    .    .    1    .    .    .    .
r56i1 cb1 ib0 sw0 SwLid 5912 port-1 VL15Dropped:    .    3   11   13    4   13    .    .    2    .    .    .    .
r56i2 cb1 ib0 sw0 SwLid 7329 port-1 VL15Dropped:    1  105   39   27    1    1    1    .    .    .    .    .    .
r56i3 cb1 ib0 sw0 SwLid 7424 port-1 VL15Dropped:   14    6   11   77   10   46    4    2    .    .    .    .    .
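
A dump like this is easier to triage once the counters are ranked. The following is only a minimal sketch: it assumes every row ends with 13 whitespace-separated fields that map to switch ports 9 and 13 through 24, that "." means no drops, and the function names are illustrative rather than part of any IB tool.

```python
# Illustrative only: parse "VL15Dropped:" rows and rank the worst ports.
# Assumption: 13 trailing fields map to switch ports 9, 13..24; "." means zero.
SWPORTS = [9] + list(range(13, 25))

def parse_row(line):
    """Return (switch_name, {port: drops}) for one 'VL15Dropped:' row."""
    head, _, tail = line.partition("VL15Dropped:")
    fields = tail.split()
    if len(fields) != len(SWPORTS):
        raise ValueError(f"expected {len(SWPORTS)} counters, got {len(fields)}")
    name = head.split()[0]
    drops = {p: (0 if f == "." else int(f)) for p, f in zip(SWPORTS, fields)}
    return name, drops

def worst_ports(lines, top=3):
    """Return the `top` (drops, switch, port) triples with the most drops."""
    triples = []
    for line in lines:
        name, drops = parse_row(line)
        triples.extend((n, name, port) for port, n in drops.items() if n)
    return sorted(triples, reverse=True)[:top]

rows = [
    "r52i2 cb1 ib0 sw0 SwLid 6914 port-1 VL15Dropped: 17 320 39 14 15 12 4 5 4 . . . .",
    "r52i3 cb1 ib0 sw0 SwLid 7895 port-1 VL15Dropped: 1770 614 1449 189 299 197 35 124 56 . 55 . 9678",
]
for n, name, port in worst_ports(rows):
    print(f"{name} port {port}: {n} drops")
```

Feeding it the full listing would show at a glance whether the drops cluster on one switch or on one hypercube dimension.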
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: vl15 drops
@ 2010-05-18 13:10 ` Hal Rosenstock
From: Hal Rosenstock @ 2010-05-18 13:10 UTC (permalink / raw)
To: Bob Ciotti; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Bob,
On Fri, May 14, 2010 at 3:55 PM, Bob Ciotti <Bob.Ciotti@nasa.gov> wrote:
> We are chasing down some issues related to fabric discovery and SM failover
> and occasionally we see a significant number of vl15 drops and are not sure
> if this is a problem or not. Anyone have a reference on what controls are
> available to reduce the number of drops? We would like to keep maxsmps set to at
> least 16 or even raise it. Spec says at least one vl15 buffer, but are there
> controls for buffer allocation on the switches wrt vl15, or other vl15
> specific controls (in SM or switches or?) that might reduce drops?
If maxsmps exceeds the number of VL15 buffers available in the devices,
that can cause such drops. Switches may have controls for configuring such
buffering, but those would be vendor proprietary.
Does reducing maxsmps reduce the VL15 drops?
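
To make the mechanism concrete, here is a toy model, not a description of any real switch. It assumes the SM keeps up to maxsmps SMPs in flight, that in the worst case they all arrive at one port together, and that the port can hold only a fixed number of VL15 packets. VL15 is not subject to link-level flow control, so whatever does not fit is dropped. The function name and the buffer counts are hypothetical.

```python
def vl15_drops(in_flight_smps, vl15_buffers):
    """Worst-case drops when all in-flight SMPs hit one port at once (toy model)."""
    if in_flight_smps < 0 or vl15_buffers < 1:
        raise ValueError("need in_flight_smps >= 0 and at least one VL15 buffer")
    # Anything beyond the available buffers has nowhere to go and is discarded.
    return max(0, in_flight_smps - vl15_buffers)

# Hypothetical buffer counts; real values are device specific.
for maxsmps in (4, 16, 32):
    for buffers in (1, 8):
        print(f"maxsmps={maxsmps:2d} buffers={buffers}: "
              f"up to {vl15_drops(maxsmps, buffers)} dropped per burst")
```

Real traffic is spread over many ports and over time, so actual drop counts should be lower than this worst case. The point is only that the excess grows as maxsmps does.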
> Its a collection of InfiniscaleIII and IV.
>
> Here is a small subset of the VL15 counters on a few on the switches.
> Each line corresponds to a 24 port infiniscaleIII switch. All the d1
> thru d11 (hypercube dimensions) connections are to other switches (not HCA).
>
> r53i3/sw0/port 24 is where the SM is connected showing 9678 drops...
>
>
> -- cb1 ib0 sw0 . . . . d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 d12 io
> == cb1 ib0 sw0 . . . SWPORT 9 13 14 15 16 17 18 19 20 21 22 23 24
> r49i0 cb1 ib0 sw0 SwLid 7842 port-1 VL15Dropped: . . 1 3 8 13 11 . . . . . .
> r49i1 cb1 ib0 sw0 SwLid 8349 port-1 VL15Dropped: . . 17 8 20 16 2 2 . . . . .
> r49i2 cb1 ib0 sw0 SwLid 8554 port-1 VL15Dropped: . 4 . 12 23 2 2 . 2 . . . .
> r49i3 cb1 ib0 sw0 SwLid 7330 port-1 VL15Dropped: 3 1 1 25 71 57 7 8 1 . . . .
> r50i0 cb1 ib0 sw0 SwLid 8682 port-1 VL15Dropped: . 2 61 3 6 7 . 5 . . . . .
> r50i1 cb1 ib0 sw0 SwLid 8132 port-1 VL15Dropped: . 3 58 33 20 44 1 9 4 . . . .
> r50i2 cb1 ib0 sw0 SwLid 6435 port-1 VL15Dropped: 4 122 14 8 13 36 2 . . . . . .
> r50i3 cb1 ib0 sw0 SwLid 8027 port-1 VL15Dropped: 22 171 167 49 113 57 . 2 . . . . .
> r51i0 cb1 ib0 sw0 SwLid 7756 port-1 VL15Dropped: . . 3 24 16 11 . 2 . . . . .
> r51i1 cb1 ib0 sw0 SwLid 6678 port-1 VL15Dropped: . 5 8 28 7 27 2 . 4 . . . .
> r51i2 cb1 ib0 sw0 SwLid 7933 port-1 VL15Dropped: 1 34 29 3 7 5 2 . . . . . .
> r51i3 cb1 ib0 sw0 SwLid 7426 port-1 VL15Dropped: 14 31 145 683 18 5 2 . 1 . . . .
> r52i0 cb1 ib0 sw0 SwLid 6990 port-1 VL15Dropped: 1 26 65 23 132 49 1 1 1 . . . .
> r52i1 cb1 ib0 sw0 SwLid 7465 port-1 VL15Dropped: 33 66 388 48 18 4 4 . 2 . . . .
> r52i2 cb1 ib0 sw0 SwLid 6914 port-1 VL15Dropped: 17 320 39 14 15 12 4 5 4 . . . .
> r52i3 cb1 ib0 sw0 SwLid 7895 port-1 VL15Dropped: 1770 614 1449 189 299 197 35 124 56 . 55 . 9678
> r53i0 cb1 ib0 sw0 SwLid 8171 port-1 VL15Dropped: . . . . 1 5 . . . . . . .
> r53i1 cb1 ib0 sw0 SwLid 8471 port-1 VL15Dropped: . . 1 . . 15 3 1 5 . . . .
> r53i2 cb1 ib0 sw0 SwLid 6485 port-1 VL15Dropped: . . . . 8 17 2 . 2 . . . .
> r53i3 cb1 ib0 sw0 SwLid 5949 port-1 VL15Dropped: . . 1 3 3 1 8 . . . . . .
> r54i0 cb1 ib0 sw0 SwLid 7809 port-1 VL15Dropped: . . . . 2 . 2 . . . . . .
> r54i1 cb1 ib0 sw0 SwLid 7890 port-1 VL15Dropped: . . . 4 16 1 . . 2 . . . .
> r54i2 cb1 ib0 sw0 SwLid 6913 port-1 VL15Dropped: . 1 . 1 21 4 4 . 3 . . . .
> r54i3 cb1 ib0 sw0 SwLid 7283 port-1 VL15Dropped: 1 11 1 7 34 4 18 . 2 . . . .
> r55i0 cb1 ib0 sw0 SwLid 8350 port-1 VL15Dropped: . . . 2 . 8 7 . . . . . .
> r55i1 cb1 ib0 sw0 SwLid 7516 port-1 VL15Dropped: . . 3 24 13 3 4 . 1 . . . .
> r55i2 cb1 ib0 sw0 SwLid 8518 port-1 VL15Dropped: . 6 . 16 . 1 4 1 . . . . .
> r55i3 cb1 ib0 sw0 SwLid 7666 port-1 VL15Dropped: . . 9 12 4 4 20 1 . . . . .
> r56i0 cb1 ib0 sw0 SwLid 8219 port-1 VL15Dropped: . 3 29 12 . 6 . . 1 . . . .
> r56i1 cb1 ib0 sw0 SwLid 5912 port-1 VL15Dropped: . 3 11 13 4 13 . . 2 . . . .
> r56i2 cb1 ib0 sw0 SwLid 7329 port-1 VL15Dropped: 1 105 39 27 1 1 1 . . . . . .
> r56i3 cb1 ib0 sw0 SwLid 7424 port-1 VL15Dropped: 14 6 11 77 10 46 4 2 . . . . .
Are these on both IS-3s and IS-4s? Any idea "when" the drops occur?
Are there any messages in the OpenSM log which might relate to this
(e.g. duplicated traps received frequently)?
-- Hal
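
For the "when" question, one approach is to snapshot the counters periodically and compare consecutive snapshots. A minimal sketch, assuming each snapshot has already been collected into a dict keyed by (switch, port); how the counters are read (for example with perfquery) is left out, the function name is illustrative, and the second snapshot below is made up.

```python
def drop_deltas(before, after):
    """Return {(switch, port): new_drops} for counters that grew between snapshots.

    A counter that went down is assumed to have been reset in between,
    so its current value is reported as the delta.
    """
    deltas = {}
    for key, now in after.items():
        prev = before.get(key, 0)
        grew = now - prev if now >= prev else now
        if grew > 0:
            deltas[key] = grew
    return deltas

# t0 uses values from the listing; t1 is a hypothetical later reading.
t0 = {("r52i3", 24): 9678, ("r52i3", 9): 1770, ("r53i0", 17): 5}
t1 = {("r52i3", 24): 9910, ("r52i3", 9): 1770, ("r53i0", 17): 2}
print(drop_deltas(t0, t1))
```

Taking snapshots around a discovery sweep or an SM failover would show whether the drops coincide with those events or accumulate in the background.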