* QoS settings not mapped correctly per pkey?
@ 2009-11-25 10:57 Vincent Ficet
From: Vincent Ficet @ 2009-11-25 10:57 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb
Cc: BOURDE CELINE, Vincent Ficet
Hello,
Following the QoS experiments I carried out yesterday, I wanted to set
up 3 IP networks, each one bound to a particular pkey, in order to
achieve QoS for each network.
Unfortunately, it seems that something is not mapped properly in the ULP
layers (vlarb tables are fine).
The settings are as follows:
opensm.conf:
------------
qos_max_vls 8
qos_high_limit 1
qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0
qos_vlarb_low 0:8,1:1,2:1,3:4,4:0,5:0
qos_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
The corresponding VLArb tables are fine on both the server (pichu16) and
the client (pichu22):
[root@pichu22 network-scripts]# smpquery vlarb -D 0
# VLArbitration tables: DR path slid 65535; dlid 65535; 0 port 0 LowCap
8 HighCap 8
# Low priority VL Arbitration Table:
VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
WEIGHT: |0x8 |0x1 |0x1 |0x4 |0x0 |0x0 |0x0 |0x0 |
# High priority VL Arbitration Table:
VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
[root@pichu16 ~]# smpquery vlarb -D 0
# VLArbitration tables: DR path slid 65535; dlid 65535; 0 port 0 LowCap
8 HighCap 8
# Low priority VL Arbitration Table:
VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
WEIGHT: |0x8 |0x1 |0x1 |0x4 |0x0 |0x0 |0x0 |0x0 |
# High priority VL Arbitration Table:
VL : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
partitions.conf:
---------------
default=0x7fff,ipoib : ALL=full;
ip_backbone=0x0001,ipoib : ALL=full;
ip_admin=0x0002,ipoib : ALL=full;
qos-policy.conf:
---------------
qos-ulps
default : 0 # default SL
ipoib, pkey 0x7FFF : 1 # IP with default pkey 0x7FFF
ipoib, pkey 0x1 : 2 # backbone IP with pkey 0x1
ipoib, pkey 0x2 : 3 # admin IP with pkey 0x2
end-qos-ulps
Assigned IP addresses (in /etc/hosts):
-------------------------------------
10.12.1.4 pichu16-ic0 # default IPoIB network, pkey 0x7FFF
10.13.1.4 pichu16-backbone # IPoIB backbone network, pkey 0x1
10.14.1.4 pichu16-admin # IPoIB admin network, pkey 0x2
10.12.1.10 pichu22-ic0 # default IPoIB network, pkey 0x7FFF
10.13.1.10 pichu22-backbone # IPoIB backbone network, pkey 0x1
10.14.1.10 pichu22-admin # IPoIB admin network, pkey 0x2
Note that the netmask is /16, so the -ic0, -backbone and -admin networks
cannot see each other.
IPoIB settings on server side:
------------------------------
[root@pichu16 ~]# tail -n 5 /etc/sysconfig/network-scripts/ifcfg-ib0*
==> /etc/sysconfig/network-scripts/ifcfg-ib0 <==
BOOTPROTO=static
IPADDR=10.12.1.4
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2044
==> /etc/sysconfig/network-scripts/ifcfg-ib0.8001 <==
BOOTPROTO=static
IPADDR=10.13.1.4
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2044
==> /etc/sysconfig/network-scripts/ifcfg-ib0.8002 <==
BOOTPROTO=static
IPADDR=10.14.1.4
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2044
[root@pichu16 ~]# ip addr show ib0
4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
state UP qlen 256
link/infiniband
80:00:00:48:fe:80:00:00:00:00:00:00:2c:90:00:10:0d:00:05:6d brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet 10.12.1.4/16 brd 10.12.255.255 scope global ib0
inet 10.13.1.4/16 brd 10.13.255.255 scope global ib0
inet 10.14.1.4/16 brd 10.14.255.255 scope global ib0
inet6 fe80::2e90:10:d00:56d/64 scope link
valid_lft forever preferred_lft forever
IPoIB settings on client side:
------------------------------
[root@pichu22 ~]# tail -n 5 /etc/sysconfig/network-scripts/ifcfg-ib0*
==> /etc/sysconfig/network-scripts/ifcfg-ib0 <==
BOOTPROTO=static
IPADDR=10.12.1.10
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2044
==> /etc/sysconfig/network-scripts/ifcfg-ib0.8001 <==
BOOTPROTO=static
IPADDR=10.13.1.10
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2044
==> /etc/sysconfig/network-scripts/ifcfg-ib0.8002 <==
BOOTPROTO=static
IPADDR=10.14.1.10
NETMASK=255.255.0.0
ONBOOT=yes
MTU=2044
[root@pichu22 ~]# ip addr show ib0
48: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc pfifo_fast
state UP qlen 256
link/infiniband
80:00:00:48:fe:80:00:00:00:00:00:00:2c:90:00:10:0d:00:06:79 brd
00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet 10.12.1.10/16 brd 10.12.255.255 scope global ib0
inet 10.13.1.10/16 brd 10.13.255.255 scope global ib0
inet 10.14.1.10/16 brd 10.14.255.255 scope global ib0
inet6 fe80::2e90:10:d00:679/64 scope link
valid_lft forever preferred_lft forever
Iperf servers on server side:
-----------------------------
Quoting from iperf help:
-B, --bind <host> bind to <host>, an interface or multicast address
-s, --server run in server mode
Each iperf server is bound to a dedicated interface as follows:
[root@pichu16 ~]# iperf -s -B pichu16-backbone
[root@pichu16 ~]# iperf -s -B pichu16-admin
[root@pichu16 ~]# iperf -s -B pichu16-ic0
Iperf clients on client side:
-----------------------------
Quoting from iperf help:
-c, --client <host> run in client mode, connecting to <host>
-t, --time # time in seconds to transmit for (default 10 secs)
And each iperf client talks to the corresponding iperf server:
[root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t
100 2>&1; done | grep Gbits/sec
[ 3] 0.0-100.0 sec 64.6 GBytes 5.55 Gbits/sec
[ 3] 0.0-100.0 sec 64.5 GBytes 5.54 Gbits/sec
[ 3] 0.0-100.0 sec 60.5 GBytes 5.20 Gbits/sec
[root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone
-t 100 2>&1; done | grep Gbits/sec
[ 3] 0.0-100.0 sec 64.8 GBytes 5.57 Gbits/sec
[ 3] 0.0-100.0 sec 56.7 GBytes 4.87 Gbits/sec
[ 3] 0.0-100.0 sec 59.7 GBytes 5.13 Gbits/sec
[root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-admin -t
100 2>&1; done | grep Gbits/sec
[ 3] 0.0-100.0 sec 57.3 GBytes 4.92 Gbits/sec
[ 3] 0.0-100.0 sec 61.6 GBytes 5.29 Gbits/sec
[ 3] 0.0-100.0 sec 62.7 GBytes 5.38 Gbits/sec
Given the VLarb weights assigned (1 for *-ic0 on VL1, 1 for *-backbone
on VL2 and 4 for *-admin on VL3), we would expect different b/w figures
for the *-admin network.
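The expectation above can be made concrete with a quick back-of-the-envelope sketch (this is an editorial illustration, not a measurement from the thread): under weighted round-robin arbitration, and only when all VLs are continuously backlogged, each VL's share of the link is its low-priority weight over the weight sum.

```python
# Low-priority VLArb weights from opensm.conf: qos_vlarb_low 0:8,1:1,2:1,3:4
# With every VL continuously backlogged, weighted round-robin gives each
# VL a bandwidth share of weight / sum(weights).
weights = {0: 8, 1: 1, 2: 1, 3: 4}

total = sum(weights.values())
shares = {vl: w / total for vl, w in weights.items()}

for vl, share in shares.items():
    print(f"VL{vl}: {share:.0%}")

# VL3 (the *-admin network, weight 4) should see 4x the bandwidth of
# VL1/VL2 (weight 1 each) -- but only while the VLs actually contend
# for the link.
```

Note the caveat in the comment: if the streams never contend for the same output port at the same time, arbitration has nothing to arbitrate and all streams run at line rate.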
As we can see, all iperf values are roughly the same, showing that QoS is
not enforced on a per-pkey basis.
It seems to me that something is not mapped properly in the ULP layers.
Could anyone tell me if I'm wrong here? If not, is that a known issue?
Thanks for your help,
Vincent
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: QoS settings not mapped correctly per pkey?
@ 2009-11-25 12:12 Yevgeny Kliteynik
From: Yevgeny Kliteynik @ 2009-11-25 12:12 UTC (permalink / raw)
To: Vincent Ficet
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, BOURDE CELINE
Hi Vincent,
Vincent Ficet wrote:
> [snip]
> qos_max_vls 8
> qos_high_limit 1
> qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0
> qos_vlarb_low 0:8,1:1,2:1,3:4,4:0,5:0
> qos_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
Please check section 7 of the QoS_management_in_OpenSM.txt doc. It
explains exactly what the values in the VLArb table mean, and it also
explains the problem you're seeing. Quoting from there:
"Keep in mind that ports usually transmit packets of size equal to MTU.
For instance, for 4KB MTU a single packet will require 64 credits, so in
order to achieve effective VL arbitration for packets of 4KB MTU, the
weighting values for each VL should be multiples of 64."
--
Yevgeny
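The credit rule quoted above can be sketched numerically (a minimal editorial illustration; the 64-byte credit granularity is the figure given in the quoted documentation):

```python
import math

# In IB VL arbitration, one credit covers 64 bytes of data, so a packet
# of size MTU consumes ceil(MTU / 64) credits. VLArb weights smaller
# than that are effectively rounded up to one whole packet per turn,
# which is why small weights like 1 and 4 behave identically.
CREDIT_BYTES = 64

def credits_per_packet(mtu_bytes):
    return math.ceil(mtu_bytes / CREDIT_BYTES)

print(credits_per_packet(4096))  # 4KB MTU -> 64 credits, as quoted above
print(credits_per_packet(2048))  # 2KB IB MTU -> 32 credits
```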
* Re: QoS settings not mapped correctly per pkey?
@ 2009-11-25 14:01 Vincent Ficet
From: Vincent Ficet @ 2009-11-25 14:01 UTC (permalink / raw)
To: kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, BOURDE CELINE
Yevgeny,
> Please check section 7 of the QoS_management_in_OpenSM.txt doc.
> [snip]
> "... in order to achieve effective VL arbitration for packets of 4KB
> MTU, the weighting values for each VL should be multiples of 64."
OK, I see the point.
To check that it works as you said, we changed the IPoIB MTU from 2044
to 2000 in order to make sure that it fits into the IB MTU, which is set
to 2K on our cluster. In theory, such a 2K packet would require 32
credits of 64 bytes each.
We changed the vlarb tables to use increments of 32 (for VLs 1, 2, 3):
qos_max_vls 8
qos_high_limit 1
qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0
qos_vlarb_low 0:8,1:32,2:64,3:96,4:0,5:0
qos_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
and we also tried increments of 64:
qos_max_vls 8
qos_high_limit 1
qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0
qos_vlarb_low 0:8,1:64,2:128,3:192,4:0,5:0
qos_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
But still, it does not make any difference:
[root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t
20 2>&1; done | grep Gbits/sec
[ 3] 0.0-20.0 sec 13.0 GBytes 5.57 Gbits/sec
[ 3] 0.0-20.0 sec 12.9 GBytes 5.53 Gbits/sec
[ 3] 0.0-20.0 sec 12.0 GBytes 5.17 Gbits/sec
[root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone
-t 20 2>&1; done | grep Gbits/sec
[ 3] 0.0-20.0 sec 13.1 GBytes 5.61 Gbits/sec
[ 3] 0.0-20.0 sec 11.9 GBytes 5.09 Gbits/sec
[ 3] 0.0-20.0 sec 9.43 GBytes 4.05 Gbits/sec
[root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-admin -t
20 2>&1; done | grep Gbits/sec
[ 3] 0.0-20.0 sec 10.5 GBytes 4.50 Gbits/sec
[ 3] 0.0-20.0 sec 12.3 GBytes 5.28 Gbits/sec
[ 3] 0.0-20.0 sec 12.0 GBytes 5.15 Gbits/sec
Any other idea?
Thanks for your help.
Vincent
* Re: QoS settings not mapped correctly per pkey?
@ 2009-11-25 14:37 Yevgeny Kliteynik
From: Yevgeny Kliteynik @ 2009-11-25 14:37 UTC (permalink / raw)
To: Vincent Ficet
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, BOURDE CELINE
Vincent Ficet wrote:
> [snip]
> But still, it does not make any difference:
> [snip]
> Any other idea?
OK, so there are three possible reasons that I can think of:
1. Something is wrong in the configuration.
2. The application does not saturate the link, so QoS and the whole VL
arbitration mechanism never kicks in.
3. There's some bug, somewhere.
Let's start with reason no. 1. Please shut off each of the SLs one by
one, and make sure that the application gets zero BW on these SLs. You
can do it by mapping the SL to VL15:
qos_sl2vl 0,15,2,3,4,5,6,7,8,9,10,11,12,13,14,15
and then
qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15
and then
qos_sl2vl 0,1,2,15,4,5,6,7,8,9,10,11,12,13,14,15
If this part works well, then we will continue to reason no. 2.
--
Yevgeny
for *-backbone >>> on VL2 and 4 for *-admin on VL3), we would expect different b/w figures >>> for the *-admin network. >>> As we can see, all iperf values are the same, showing that QoS is not >>> enforced on a per pkey basis. >>> It seems to me that something is not mapped properly in the ULP layers. >>> Could anyone tell me if I'm wrong here ? If not, is that a known issue ? >>> >>> Thanks for your help, >>> >>> Vincent >>> >>> >>> >>> >>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> > > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
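As a sanity check of the expectation stated in the message above, the bandwidth split those VLArb weights imply can be sketched as follows. This is a simplified model, not the actual hardware arbiter: it assumes all VLs are saturated, the high-priority table is empty (as in the config above, where all high weights are 0), and each VL gets a share proportional to its low-priority weight.

```python
# Simplified model of weighted VL arbitration: with every VL saturated and
# no high-priority entries, each VL's share of the link is proportional to
# its low-priority VLArb weight.
def expected_shares(weights):
    """Map VL number -> fraction of link bandwidth implied by the weights."""
    total = sum(weights.values())
    return {vl: w / total for vl, w in weights.items()}

# Weights quoted above: VL1 (-ic0) = 1, VL2 (-backbone) = 1, VL3 (-admin) = 4.
shares = expected_shares({1: 1, 2: 1, 3: 4})
print(shares)  # VL3 should get ~67% of the link, VL1 and VL2 ~17% each
```

Under this model the -admin network should see roughly four times the bandwidth of the other two, which is why identical iperf figures across all three pkeys point at a mapping problem rather than an arbitration one.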
[parent not found: <4B0D410E.2010903-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>]
* Re: QoS settings not mapped correctly per pkey ? [not found] ` <4B0D410E.2010903-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> @ 2009-11-25 15:14 ` Vincent Ficet [not found] ` <4B0D49F0.6060400-6ktuUTfB/bM@public.gmane.org> 0 siblings, 1 reply; 16+ messages in thread From: Vincent Ficet @ 2009-11-25 15:14 UTC (permalink / raw) To: kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, BOURDE CELINE Yevgeny, > > OK, so there are three possible reasons that I can think of: > 1. Something is wrong in the configuration. > 2. The application does not saturate the link, thus QoS > and the whole VL arbitration thing doesn't kick in. > 3. There's some bug, somewhere. > > Let's start with reason no. 1. > Please shut off each of the SLs one by one, and > make sure that the application gets zero BW on > these SLs. You can do it by mapping SL to VL15: > > qos_sl2vl 0,15,2,3,4,5,6,7,8,9,10,11,12,13,14,15 If I shut down this SL by moving it to VL15, the interfaces stop pinging. This is probably because some IPoIB multicast traffic gets cut off for pkey 0x7fff .. ? So no results for this one. 
> > and then > qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 > With this setup, and the following QoS settings: qos_max_vls 8 qos_high_limit 1 qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0 qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0 qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 I get roughly the same values for SL 1 to SL3: [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t 10 -P 8 2>&1; done | grep SUM [SUM] 0.0-10.0 sec 6.15 GBytes 5.28 Gbits/sec [SUM] 0.0-10.0 sec 6.00 GBytes 5.16 Gbits/sec [SUM] 0.0-10.1 sec 5.38 GBytes 4.59 Gbits/sec [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone -t 10 -P 8 2>&1; done | grep SUM [SUM] 0.0-10.0 sec 6.09 GBytes 5.23 Gbits/sec [SUM] 0.0-10.0 sec 6.41 GBytes 5.51 Gbits/sec [SUM] 0.0-10.0 sec 4.72 GBytes 4.05 Gbits/sec [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-admin -t 10 -P 8 2>&1; done | grep SUM [SUM] 0.0-10.1 sec 6.96 GBytes 5.92 Gbits/sec [SUM] 0.0-10.1 sec 5.89 GBytes 5.00 Gbits/sec [SUM] 0.0-10.0 sec 5.35 GBytes 4.58 Gbits/sec > and then > qos_sl2vl 0,1,2,15,4,5,6,7,8,9,10,11,12,13,14,15 Same results as the previous 0,1,15,3,... SL2vl mapping. > > If this part works well, then we will continue to > reason no. 2. In the above tests, I used -P8 to force 8 threads on the client side for each test. I have one quad core CPU(Intel E55400). This makes 24 iperf threads on 4 cores, which __should__ be fine (well I suppose ...) And regarding reason #3. I still get the error I got yesterday, which you told me was not important because the SL's set in partitions.conf would override what was read from qos-policy.conf in the first place. 
Nov 25 13:13:05 664690 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS Level SL (3) Nov 25 13:13:05 664681 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS Level SL (2) Nov 25 13:13:05 664670 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS Level SL (1) Thanks for your help. Vincent -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
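For reference, the pkey-to-SL assignments that OpenSM reports in the AC15 lines quoted above can be extracted with a short sketch. The helper below is hypothetical (it is not part of OpenSM); it only assumes the log format shown in the messages.

```python
import re

# Matches OpenSM's AC15 message: "pkey 0x#### in match rule - overriding
# partition SL (x) with QoS Level SL (y)". Group 1 is the pkey, group 3
# is the QoS Level SL that ends up being used.
AC15_RE = re.compile(
    r"pkey (0x[0-9A-Fa-f]+) in match rule - overriding "
    r"partition SL \((\d+)\) with QoS Level SL \((\d+)\)"
)

def pkey_to_sl(log_text):
    """Return {pkey: qos_sl} parsed from AC15 log lines."""
    return {int(m.group(1), 16): int(m.group(3))
            for m in AC15_RE.finditer(log_text)}

sample = ("AC15: pkey 0x7FFF in match rule - overriding "
          "partition SL (0) with QoS Level SL (1)")
print(pkey_to_sl(sample))  # -> {32767: 1}
```

Cross-checking this mapping against the SLs the traffic actually uses is one way to tell a parsing problem apart from an arbitration problem.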
[parent not found: <4B0D49F0.6060400-6ktuUTfB/bM@public.gmane.org>]
* Re: QoS settings not mapped correctly per pkey ? [not found] ` <4B0D49F0.6060400-6ktuUTfB/bM@public.gmane.org> @ 2009-11-25 15:45 ` Yevgeny Kliteynik [not found] ` <4B0D5110.70606-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> 0 siblings, 1 reply; 16+ messages in thread From: Yevgeny Kliteynik @ 2009-11-25 15:45 UTC (permalink / raw) To: Vincent Ficet; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, BOURDE CELINE Vincent Ficet wrote: > Yevgeny, >> OK, so there are three possible reasons that I can think of: >> 1. Something is wrong in the configuration. >> 2. The application does not saturate the link, thus QoS >> and the whole VL arbitration thing doesn't kick in. >> 3. There's some bug, somewhere. >> >> Let's start with reason no. 1. >> Please shut off each of the SLs one by one, and >> make sure that the application gets zero BW on >> these SLs. You can do it by mapping SL to VL15: >> >> qos_sl2vl 0,15,2,3,4,5,6,7,8,9,10,11,12,13,14,15 > If I shut down this SL by moving it to VL15, the interfaces stop pinging. > This is probably because some IPoIB multicast traffic gets cut off for > pkey 0x7fff .. ? Could be, or because ALL interfaces are mapped to SL1, which is what the results below suggest. > So no results for this one. >> and then >> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 >> > With this setup, and the following QoS settings: > > qos_max_vls 8 > qos_high_limit 1 > qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0 > qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0 > qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 > > I get roughly the same values for SL 1 to SL3: That doesn't look right. You have shut off SL2, so you can't see same BW for this SL. Looks like there is a problem in configuration (or bug in SM). Have you validated somehow that the interfaces have been mapped to the right SLs? 
> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t > 10 -P 8 2>&1; done | grep SUM > [SUM] 0.0-10.0 sec 6.15 GBytes 5.28 Gbits/sec > [SUM] 0.0-10.0 sec 6.00 GBytes 5.16 Gbits/sec > [SUM] 0.0-10.1 sec 5.38 GBytes 4.59 Gbits/sec > > [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone > -t 10 -P 8 2>&1; done | grep SUM > [SUM] 0.0-10.0 sec 6.09 GBytes 5.23 Gbits/sec > [SUM] 0.0-10.0 sec 6.41 GBytes 5.51 Gbits/sec > [SUM] 0.0-10.0 sec 4.72 GBytes 4.05 Gbits/sec > > [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-admin -t > 10 -P 8 2>&1; done | grep SUM > [SUM] 0.0-10.1 sec 6.96 GBytes 5.92 Gbits/sec > [SUM] 0.0-10.1 sec 5.89 GBytes 5.00 Gbits/sec > [SUM] 0.0-10.0 sec 5.35 GBytes 4.58 Gbits/sec > >> and then >> qos_sl2vl 0,1,2,15,4,5,6,7,8,9,10,11,12,13,14,15 > Same results as the previous 0,1,15,3,... SL2vl mapping. >> If this part works well, then we will continue to >> reason no. 2. > In the above tests, I used -P8 to force 8 threads on the client side for > each test. > I have one quad core CPU(Intel E55400). > This makes 24 iperf threads on 4 cores, which __should__ be fine (well I > suppose ...) Best would be having one qperf per CPU core, which is 4 qperf's in your case. What is your subnet setup? -- Yevgeny > And regarding reason #3. I still get the error I got yesterday, which > you told me was not important because the SL's set in partitions.conf > would override what was read from qos-policy.conf in the first place. 
> > Nov 25 13:13:05 664690 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR > AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS > Level SL (3) > Nov 25 13:13:05 664681 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR > AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS > Level SL (2) > Nov 25 13:13:05 664670 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR > AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS > Level SL (1) > > Thanks for your help. > > Vincent > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <4B0D5110.70606-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>]
* Re: QoS settings not mapped correctly per pkey ? [not found] ` <4B0D5110.70606-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> @ 2009-11-26 7:57 ` Vincent Ficet [not found] ` <4B0E34EB.6020403-6ktuUTfB/bM@public.gmane.org> 0 siblings, 1 reply; 16+ messages in thread From: Vincent Ficet @ 2009-11-26 7:57 UTC (permalink / raw) To: kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, BOURDE CELINE Hello Yevgeny, >>> OK, so there are three possible reasons that I can think of: >>> 1. Something is wrong in the configuration. >>> 2. The application does not saturate the link, thus QoS >>> and the whole VL arbitration thing doesn't kick in. >>> 3. There's some bug, somewhere. >>> >>> Let's start with reason no. 1. >>> Please shut off each of the SLs one by one, and >>> make sure that the application gets zero BW on >>> these SLs. You can do it by mapping SL to VL15: >>> >>> qos_sl2vl 0,15,2,3,4,5,6,7,8,9,10,11,12,13,14,15 >> If I shut down this SL by moving it to VL15, the interfaces stop >> pinging. >> This is probably because some IPoIB multicast traffic gets cut off for >> pkey 0x7fff .. ? > > Could be, or because ALL interfaces are mapped to > SL1, which is what the results below suggest. Yes, you are right (see below). > >> So no results for this one. >>> and then >>> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 >>> >> With this setup, and the following QoS settings: >> >> qos_max_vls 8 >> qos_high_limit 1 >> qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0 >> qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0 >> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 >> >> I get roughly the same values for SL 1 to SL3: > > That doesn't look right. > You have shut off SL2, so you can't see same > BW for this SL. Looks like there is a problem > in configuration (or bug in SM). 
Yes, that's correct: There could be a configuration issue or a bug in SM: Current setup and results: qos_max_vls 8 qos_high_limit 1 qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0 qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0 qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t 10 -P 8 2>&1; done | grep SUM [SUM] 0.0-10.1 sec 9.78 GBytes 8.28 Gbits/sec [SUM] 0.0-10.0 sec 5.69 GBytes 4.89 Gbits/sec [SUM] 0.0-10.0 sec 4.30 GBytes 3.69 Gbits/sec [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone -t 10 -P 8 2>&1; done | grep SUM [SUM] 0.0-10.2 sec 6.44 GBytes 5.45 Gbits/sec [SUM] 0.0-10.1 sec 6.64 GBytes 5.66 Gbits/sec [SUM] 0.0-10.0 sec 6.03 GBytes 5.15 Gbits/sec [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-admin -t 10 -P 8 2>&1; done | grep SUM [SUM] 0.0-10.0 sec 5.80 GBytes 4.98 Gbits/sec [SUM] 0.0-10.0 sec 7.04 GBytes 6.02 Gbits/sec [SUM] 0.0-10.0 sec 6.60 GBytes 5.67 Gbits/sec The -backbone bandwidth should be 0 here. > > Have you validated somehow that the interfaces > have been mapped to the right SLs? 
Two things: 1/ Either the interfaces have not been mapped properly to the right SL's, but given the config files below, I doubt it: [root@pichu22 ~]# tail -n 5 /etc/sysconfig/network-scripts/ifcfg-ib0* ==> /etc/sysconfig/network-scripts/ifcfg-ib0 <== BOOTPROTO=static IPADDR=10.12.1.10 NETMASK=255.255.0.0 ONBOOT=yes MTU=2000 ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8001 <== BOOTPROTO=static IPADDR=10.13.1.10 NETMASK=255.255.0.0 ONBOOT=yes MTU=2000 ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8002 <== BOOTPROTO=static IPADDR=10.14.1.10 NETMASK=255.255.0.0 ONBOOT=yes MTU=2000 partitions.conf: ----------------- default=0x7fff,ipoib : ALL=full; ip_backbone=0x0001,ipoib : ALL=full; ip_admin=0x0002,ipoib : ALL=full; qos-policy.conf: ---------------- qos-ulps default : 0 # default SL ipoib, pkey 0x7FFF : 1 # IP with default pkey 0x7FFF ipoib, pkey 0x1 : 2 # backbone IP with pkey 0x1 ipoib, pkey 0x2 : 3 # admin IP with pkey 0x2 end-qos-ulps ib0.8001 maps to pkey 1 (with MSB set to 1 due to full membership => 0x8001 = (1<<15 | 1)) ib0.8002 maps to pkey 2 (with MSB set to 1 due to full membership => 0x8002 = (1<<15 | 2)) 2/ Somehow, the qos policy parsing does not map pkeys as we would expect, which is what the opensm messages would suggest: Nov 25 13:13:05 664690 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS Level SL (3) Nov 25 13:13:05 664681 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS Level SL (2) Nov 25 13:13:05 664670 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS Level SL (1) If the messages are correct and do reflect what opensm is actually doing, this would explain why shutting down SL1 (by moving it to VL15) prevented all interfaces from running. 
> >> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t >> 10 -P 8 2>&1; done | grep SUM >> [SUM] 0.0-10.0 sec 6.15 GBytes 5.28 Gbits/sec >> [SUM] 0.0-10.0 sec 6.00 GBytes 5.16 Gbits/sec >> [SUM] 0.0-10.1 sec 5.38 GBytes 4.59 Gbits/sec >> >> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone >> -t 10 -P 8 2>&1; done | grep SUM >> [SUM] 0.0-10.0 sec 6.09 GBytes 5.23 Gbits/sec >> [SUM] 0.0-10.0 sec 6.41 GBytes 5.51 Gbits/sec >> [SUM] 0.0-10.0 sec 4.72 GBytes 4.05 Gbits/sec >> >> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-admin -t >> 10 -P 8 2>&1; done | grep SUM >> [SUM] 0.0-10.1 sec 6.96 GBytes 5.92 Gbits/sec >> [SUM] 0.0-10.1 sec 5.89 GBytes 5.00 Gbits/sec >> [SUM] 0.0-10.0 sec 5.35 GBytes 4.58 Gbits/sec >> >>> and then >>> qos_sl2vl 0,1,2,15,4,5,6,7,8,9,10,11,12,13,14,15 >> Same results as the previous 0,1,15,3,... SL2vl mapping. >>> If this part works well, then we will continue to >>> reason no. 2. >> In the above tests, I used -P8 to force 8 threads on the client side for >> each test. >> I have one quad core CPU(Intel E55400). >> This makes 24 iperf threads on 4 cores, which __should__ be fine (well I >> suppose ...) > > Best would be having one qperf per CPU core, > which is 4 qperf's in your case. > > What is your subnet setup? Nothing fancy for this test: I just bounce the traffic through a switch; [root@pichu16 ~]# ibtracert 49 53 From ca {0x2c9000100d00056c} portnum 1 lid 49-49 "pichu16 HCA-1" [1] -> switch port {0x0002c9000100d0d4}[22] lid 58-58 "bullX chassis 36 port QDR switch" [28] -> ca port {0x2c9000100d000679}[1] lid 53-53 "pichu22 HCA-1" To ca {0x2c9000100d000678} portnum 1 lid 53-53 "pichu22 HCA-1" Vincent > > -- Yevgeny > > >> And regarding reason #3. I still get the error I got yesterday, which >> you told me was not important because the SL's set in partitions.conf >> would override what was read from qos-policy.conf in the first place. 
>> >> Nov 25 13:13:05 664690 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR >> AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS >> Level SL (3) >> Nov 25 13:13:05 664681 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR >> AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS >> Level SL (2) >> Nov 25 13:13:05 664670 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR >> AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS >> Level SL (1) >> >> Thanks for your help. >> >> Vincent >> > > > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
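The full-membership arithmetic used in the message above (interface ib0.8001 carrying pkey 0x1, ib0.8002 carrying pkey 0x2) can be checked with a short sketch. The bit in question is bit 15 of the 16-bit P_Key, i.e. 0x8000; the helper names are illustrative only.

```python
# The top bit of the 16-bit P_Key marks full partition membership.
FULL_MEMBER_BIT = 1 << 15  # 0x8000

def full_pkey(base):
    """P_Key with the full-membership bit set (the value in ib0.<pkey> names)."""
    return FULL_MEMBER_BIT | base

def base_pkey(pkey):
    """Strip the membership bit to recover the 15-bit partition number."""
    return pkey & 0x7FFF

print(hex(full_pkey(0x1)))  # -> 0x8001, matching interface ib0.8001
print(hex(full_pkey(0x2)))  # -> 0x8002, matching interface ib0.8002
```

Note that 0x7FFF (the default partition) already has all 15 low bits set, so stripping the membership bit from 0xFFFF also yields 0x7FFF.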
[parent not found: <4B0E34EB.6020403-6ktuUTfB/bM@public.gmane.org>]
* Re: QoS settings not mapped correctly per pkey ? [not found] ` <4B0E34EB.6020403-6ktuUTfB/bM@public.gmane.org> @ 2009-11-26 8:25 ` Yevgeny Kliteynik [not found] ` <4B0E3B63.40705-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> 0 siblings, 1 reply; 16+ messages in thread From: Yevgeny Kliteynik @ 2009-11-26 8:25 UTC (permalink / raw) To: Vincent Ficet; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, BOURDE CELINE Vincent Ficet wrote: > Hello Yevgeny, > >>>> OK, so there are three possible reasons that I can think of: >>>> 1. Something is wrong in the configuration. >>>> 2. The application does not saturate the link, thus QoS >>>> and the whole VL arbitration thing doesn't kick in. >>>> 3. There's some bug, somewhere. >>>> >>>> Let's start with reason no. 1. >>>> Please shut off each of the SLs one by one, and >>>> make sure that the application gets zero BW on >>>> these SLs. You can do it by mapping SL to VL15: >>>> >>>> qos_sl2vl 0,15,2,3,4,5,6,7,8,9,10,11,12,13,14,15 >>> If I shut down this SL by moving it to VL15, the interfaces stop >>> pinging. >>> This is probably because some IPoIB multicast traffic gets cut off for >>> pkey 0x7fff .. ? >> Could be, or because ALL interfaces are mapped to >> SL1, which is what the results below suggest. > Yes, you are right (see below). >>> So no results for this one. >>>> and then >>>> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 >>>> >>> With this setup, and the following QoS settings: >>> >>> qos_max_vls 8 >>> qos_high_limit 1 >>> qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0 >>> qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0 >>> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 >>> >>> I get roughly the same values for SL 1 to SL3: >> That doesn't look right. >> You have shut off SL2, so you can't see same >> BW for this SL. Looks like there is a problem >> in configuration (or bug in SM). 
> Yes, that's correct: There could be a configuration issue or a bug in SM: > > Current setup and results: > > qos_max_vls 8 > qos_high_limit 1 > qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0 > qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0 > qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 > > [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t > 10 -P 8 2>&1; done | grep SUM > [SUM] 0.0-10.1 sec 9.78 GBytes 8.28 Gbits/sec > [SUM] 0.0-10.0 sec 5.69 GBytes 4.89 Gbits/sec > [SUM] 0.0-10.0 sec 4.30 GBytes 3.69 Gbits/sec > [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone > -t 10 -P 8 2>&1; done | grep SUM > [SUM] 0.0-10.2 sec 6.44 GBytes 5.45 Gbits/sec > [SUM] 0.0-10.1 sec 6.64 GBytes 5.66 Gbits/sec > [SUM] 0.0-10.0 sec 6.03 GBytes 5.15 Gbits/sec > [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-admin -t > 10 -P 8 2>&1; done | grep SUM > [SUM] 0.0-10.0 sec 5.80 GBytes 4.98 Gbits/sec > [SUM] 0.0-10.0 sec 7.04 GBytes 6.02 Gbits/sec > [SUM] 0.0-10.0 sec 6.60 GBytes 5.67 Gbits/sec > > The -backbone bandwidth should be 0 here. > >> Have you validated somehow that the interfaces >> have been mapped to the right SLs? 
> Two things: > 1/ Either the interface have not been mapped properly to the right SL's, > but given the config files below, I doubt it: > > [root@pichu22 ~]# tail -n 5 /etc/sysconfig/network-scripts/ifcfg-ib0* > ==> /etc/sysconfig/network-scripts/ifcfg-ib0 <== > BOOTPROTO=static > IPADDR=10.12.1.10 > NETMASK=255.255.0.0 > ONBOOT=yes > MTU=2000 > > ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8001 <== > BOOTPROTO=static > IPADDR=10.13.1.10 > NETMASK=255.255.0.0 > ONBOOT=yes > MTU=2000 > > ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8002 <== > BOOTPROTO=static > IPADDR=10.14.1.10 > NETMASK=255.255.0.0 > ONBOOT=yes > MTU=2000 > > partitions.conf: > ----------------- > > default=0x7fff,ipoib : ALL=full; > ip_backbone=0x0001,ipoib : ALL=full; > ip_admin=0x0002,ipoib : ALL=full; > > qos-policy.conf: > ---------------- > qos-ulps > default : 0 # default SL > ipoib, pkey 0x7FFF : 1 # IP with default pkey 0x7FFF > ipoib, pkey 0x1 : 2 # backbone IP with pkey 0x1 > ipoib, pkey 0x2 : 3 # admin IP with pkey 0x2 > end-qos-ulps > > ib0.8001 maps to pkey 1 (with MSB set to 1 due to full membership => > 0x8001 = (1<<16 | 1) > ib0.8002 maps to pkey 2 (with MSB set to 1 due to full membership => > 0x8002 = (1<<16 | 2) > > 2/ Somehow, the qos policy parsing does not map pkeys as we would > expect, which is what the opensm messages would suggest: > > Nov 25 13:13:05 664690 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR > AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS > Level SL (3) > Nov 25 13:13:05 664681 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR > AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS > Level SL (2) > Nov 25 13:13:05 664670 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR > AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS > Level SL (1) > > If the messages are correct and do reflect what opensm is actually > doing, this would explain why shutting down SL1 (by moving it to VL15) > prevented 
all interfaces from running. What SM are you using? Does it have the following bug fix: http://www.openfabrics.org/git/?p=~sashak/management.git;a=commit;h=ef4c8ac3fdd50bb0b7af06887abdb5b73b7ed8c3 -- Yevgeny >>> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t >>> 10 -P 8 2>&1; done | grep SUM >>> [SUM] 0.0-10.0 sec 6.15 GBytes 5.28 Gbits/sec >>> [SUM] 0.0-10.0 sec 6.00 GBytes 5.16 Gbits/sec >>> [SUM] 0.0-10.1 sec 5.38 GBytes 4.59 Gbits/sec >>> >>> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone >>> -t 10 -P 8 2>&1; done | grep SUM >>> [SUM] 0.0-10.0 sec 6.09 GBytes 5.23 Gbits/sec >>> [SUM] 0.0-10.0 sec 6.41 GBytes 5.51 Gbits/sec >>> [SUM] 0.0-10.0 sec 4.72 GBytes 4.05 Gbits/sec >>> >>> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-admin -t >>> 10 -P 8 2>&1; done | grep SUM >>> [SUM] 0.0-10.1 sec 6.96 GBytes 5.92 Gbits/sec >>> [SUM] 0.0-10.1 sec 5.89 GBytes 5.00 Gbits/sec >>> [SUM] 0.0-10.0 sec 5.35 GBytes 4.58 Gbits/sec >>> >>>> and then >>>> qos_sl2vl 0,1,2,15,4,5,6,7,8,9,10,11,12,13,14,15 >>> Same results as the previous 0,1,15,3,... SL2vl mapping. >>>> If this part works well, then we will continue to >>>> reason no. 2. >>> In the above tests, I used -P8 to force 8 threads on the client side for >>> each test. >>> I have one quad core CPU(Intel E55400). >>> This makes 24 iperf threads on 4 cores, which __should__ be fine (well I >>> suppose ...) >> Best would be having one qperf per CPU core, >> which is 4 qperf's in your case. >> >> What is your subnet setup? 
> Nothing fancy for this test: I just bounce the taffic through a switch; > > [root@pichu16 ~]# ibtracert 49 53 >>From ca {0x2c9000100d00056c} portnum 1 lid 49-49 "pichu16 HCA-1" > [1] -> switch port {0x0002c9000100d0d4}[22] lid 58-58 "bullX chassis 36 > port QDR switch" > [28] -> ca port {0x2c9000100d000679}[1] lid 53-53 "pichu22 HCA-1" > To ca {0x2c9000100d000678} portnum 1 lid 53-53 "pichu22 HCA-1" > > Vincent > >> -- Yevgeny >> >> >>> And regarding reason #3. I still get the error I got yesterday, which >>> you told me was not important because the SL's set in partitions.conf >>> would override what was read from qos-policy.conf in the first place. >>> >>> Nov 25 13:13:05 664690 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR >>> AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS >>> Level SL (3) >>> Nov 25 13:13:05 664681 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR >>> AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS >>> Level SL (2) >>> Nov 25 13:13:05 664670 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR >>> AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS >>> Level SL (1) >>> >>> Thanks for your help. >>> >>> Vincent >>> >> >> > > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <4B0E3B63.40705-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>]
* Re: QoS settings not mapped correctly per pkey ? [not found] ` <4B0E3B63.40705-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> @ 2009-11-26 8:49 ` Vincent Ficet [not found] ` <4B0E4105.5080107-6ktuUTfB/bM@public.gmane.org> 0 siblings, 1 reply; 16+ messages in thread From: Vincent Ficet @ 2009-11-26 8:49 UTC (permalink / raw) To: kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, BOURDE CELINE Yevgeny Kliteynik wrote: > Vincent Ficet wrote: >> Hello Yevgeny, >> >>>>> OK, so there are three possible reasons that I can think of: >>>>> 1. Something is wrong in the configuration. >>>>> 2. The application does not saturate the link, thus QoS >>>>> and the whole VL arbitration thing doesn't kick in. >>>>> 3. There's some bug, somewhere. >>>>> >>>>> Let's start with reason no. 1. >>>>> Please shut off each of the SLs one by one, and >>>>> make sure that the application gets zero BW on >>>>> these SLs. You can do it by mapping SL to VL15: >>>>> >>>>> qos_sl2vl 0,15,2,3,4,5,6,7,8,9,10,11,12,13,14,15 >>>> If I shut down this SL by moving it to VL15, the interfaces stop >>>> pinging. >>>> This is probably because some IPoIB multicast traffic gets cut off for >>>> pkey 0x7fff .. ? >>> Could be, or because ALL interfaces are mapped to >>> SL1, which is what the results below suggest. >> Yes, you are right (see below). >>>> So no results for this one. >>>>> and then >>>>> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 >>>>> >>>> With this setup, and the following QoS settings: >>>> >>>> qos_max_vls 8 >>>> qos_high_limit 1 >>>> qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0 >>>> qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0 >>>> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 >>>> >>>> I get roughly the same values for SL 1 to SL3: >>> That doesn't look right. >>> You have shut off SL2, so you can't see same >>> BW for this SL. Looks like there is a problem >>> in configuration (or bug in SM). 
>> Yes, that's correct: There could be a configuration issue or a bug in >> SM: >> >> Current setup and results: >> >> qos_max_vls 8 >> qos_high_limit 1 >> qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0 >> qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0 >> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 >> >> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t >> 10 -P 8 2>&1; done | grep SUM >> [SUM] 0.0-10.1 sec 9.78 GBytes 8.28 Gbits/sec >> [SUM] 0.0-10.0 sec 5.69 GBytes 4.89 Gbits/sec >> [SUM] 0.0-10.0 sec 4.30 GBytes 3.69 Gbits/sec >> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone >> -t 10 -P 8 2>&1; done | grep SUM >> [SUM] 0.0-10.2 sec 6.44 GBytes 5.45 Gbits/sec >> [SUM] 0.0-10.1 sec 6.64 GBytes 5.66 Gbits/sec >> [SUM] 0.0-10.0 sec 6.03 GBytes 5.15 Gbits/sec >> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-admin -t >> 10 -P 8 2>&1; done | grep SUM >> [SUM] 0.0-10.0 sec 5.80 GBytes 4.98 Gbits/sec >> [SUM] 0.0-10.0 sec 7.04 GBytes 6.02 Gbits/sec >> [SUM] 0.0-10.0 sec 6.60 GBytes 5.67 Gbits/sec >> >> The -backbone bandwidth should be 0 here. >> >>> Have you validated somehow that the interfaces >>> have been mapped to the right SLs? 
>> Two things: >> 1/ Either the interface have not been mapped properly to the right SL's, >> but given the config files below, I doubt it: >> >> [root@pichu22 ~]# tail -n 5 /etc/sysconfig/network-scripts/ifcfg-ib0* >> ==> /etc/sysconfig/network-scripts/ifcfg-ib0 <== >> BOOTPROTO=static >> IPADDR=10.12.1.10 >> NETMASK=255.255.0.0 >> ONBOOT=yes >> MTU=2000 >> >> ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8001 <== >> BOOTPROTO=static >> IPADDR=10.13.1.10 >> NETMASK=255.255.0.0 >> ONBOOT=yes >> MTU=2000 >> >> ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8002 <== >> BOOTPROTO=static >> IPADDR=10.14.1.10 >> NETMASK=255.255.0.0 >> ONBOOT=yes >> MTU=2000 >> >> partitions.conf: >> ----------------- >> >> default=0x7fff,ipoib : ALL=full; >> ip_backbone=0x0001,ipoib : ALL=full; >> ip_admin=0x0002,ipoib : ALL=full; >> >> qos-policy.conf: >> ---------------- >> qos-ulps >> default : 0 # default SL >> ipoib, pkey 0x7FFF : 1 # IP with default pkey 0x7FFF >> ipoib, pkey 0x1 : 2 # backbone IP with pkey 0x1 >> ipoib, pkey 0x2 : 3 # admin IP with pkey 0x2 >> end-qos-ulps >> >> ib0.8001 maps to pkey 1 (with MSB set to 1 due to full membership => >> 0x8001 = (1<<16 | 1) >> ib0.8002 maps to pkey 2 (with MSB set to 1 due to full membership => >> 0x8002 = (1<<16 | 2) >> >> 2/ Somehow, the qos policy parsing does not map pkeys as we would >> expect, which is what the opensm messages would suggest: >> >> Nov 25 13:13:05 664690 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR >> AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS >> Level SL (3) >> Nov 25 13:13:05 664681 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR >> AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS >> Level SL (2) >> Nov 25 13:13:05 664670 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR >> AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS >> Level SL (1) >> >> If the messages are correct and do reflect what opensm is actually >> doing, this would 
explain why shutting down SL1 (by moving it to VL15) >> prevented all interfaces from running. > > What SM are you using? OpenSM 3.3.2 > Does it have the following bug fix: > > http://www.openfabrics.org/git/?p=~sashak/management.git;a=commit;h=ef4c8ac3fdd50bb0b7af06887abdb5b73b7ed8c3 > Yes it does. The most recent git commit (sorted by date) for this rpm is: * Sun Aug 23 2009 Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org> commit 3f4954c73add5e7b598883242782607f87c482b4 Apart from the following commit (with a bogus date): * Tue Jul 24 2035 Keshetti Mahesh <keshetti.mahesh-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> commit a0c23ed2194e96816744a075d405ff34c8373fa3 Thanks, Vincent > > -- Yevgeny > >>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t >>>> 10 -P 8 2>&1; done | grep SUM >>>> [SUM] 0.0-10.0 sec 6.15 GBytes 5.28 Gbits/sec >>>> [SUM] 0.0-10.0 sec 6.00 GBytes 5.16 Gbits/sec >>>> [SUM] 0.0-10.1 sec 5.38 GBytes 4.59 Gbits/sec >>>> >>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c >>>> pichu16-backbone >>>> -t 10 -P 8 2>&1; done | grep SUM >>>> [SUM] 0.0-10.0 sec 6.09 GBytes 5.23 Gbits/sec >>>> [SUM] 0.0-10.0 sec 6.41 GBytes 5.51 Gbits/sec >>>> [SUM] 0.0-10.0 sec 4.72 GBytes 4.05 Gbits/sec >>>> >>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c >>>> pichu16-admin -t >>>> 10 -P 8 2>&1; done | grep SUM >>>> [SUM] 0.0-10.1 sec 6.96 GBytes 5.92 Gbits/sec >>>> [SUM] 0.0-10.1 sec 5.89 GBytes 5.00 Gbits/sec >>>> [SUM] 0.0-10.0 sec 5.35 GBytes 4.58 Gbits/sec >>>> >>>>> and then >>>>> qos_sl2vl 0,1,2,15,4,5,6,7,8,9,10,11,12,13,14,15 >>>> Same results as the previous 0,1,15,3,... SL2vl mapping. >>>>> If this part works well, then we will continue to >>>>> reason no. 2. >>>> In the above tests, I used -P8 to force 8 threads on the client >>>> side for >>>> each test. >>>> I have one quad core CPU(Intel E55400). >>>> This makes 24 iperf threads on 4 cores, which __should__ be fine >>>> (well I >>>> suppose ...) 
>>> Best would be having one qperf per CPU core, >>> which is 4 qperf's in your case. >>> >>> What is your subnet setup? >> Nothing fancy for this test: I just bounce the taffic through a switch; >> >> [root@pichu16 ~]# ibtracert 49 53 >>> From ca {0x2c9000100d00056c} portnum 1 lid 49-49 "pichu16 HCA-1" >> [1] -> switch port {0x0002c9000100d0d4}[22] lid 58-58 "bullX chassis 36 >> port QDR switch" >> [28] -> ca port {0x2c9000100d000679}[1] lid 53-53 "pichu22 HCA-1" >> To ca {0x2c9000100d000678} portnum 1 lid 53-53 "pichu22 HCA-1" >> >> Vincent >> >>> -- Yevgeny >>> >>> >>>> And regarding reason #3. I still get the error I got yesterday, which >>>> you told me was not important because the SL's set in partitions.conf >>>> would override what was read from qos-policy.conf in the first place. >>>> >>>> Nov 25 13:13:05 664690 [373E910] 0x01 -> >>>> __qos_policy_validate_pkey: ERR >>>> AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS >>>> Level SL (3) >>>> Nov 25 13:13:05 664681 [373E910] 0x01 -> >>>> __qos_policy_validate_pkey: ERR >>>> AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS >>>> Level SL (2) >>>> Nov 25 13:13:05 664670 [373E910] 0x01 -> >>>> __qos_policy_validate_pkey: ERR >>>> AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS >>>> Level SL (1) >>>> >>>> Thanks for your help. >>>> >>>> Vincent >>>> >>> >>> >> >> > > > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
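The shut-off test used above (mapping an SL to VL15 so its traffic is discarded) can be sanity-checked offline. A minimal sketch, assuming opensm's comma-separated qos_sl2vl syntax with one entry per SL; the helper names are illustrative, not part of any tool:

```python
def parse_sl2vl(s):
    """Parse an opensm qos_sl2vl string into a list indexed by SL."""
    vls = [int(v) for v in s.split(",")]
    if len(vls) != 16:
        raise ValueError("qos_sl2vl must list all 16 SLs")
    return vls

def shut_off_sls(s):
    """Return the SLs whose data traffic is dropped (mapped to VL15,
    the management VL).  SL15 conventionally maps to VL15 anyway."""
    return [sl for sl, vl in enumerate(parse_sl2vl(s)) if vl == 15]

print(shut_off_sls("0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15"))  # [2, 15]
```

With the mapping under discussion, SL2 is the only data SL shut off, so only the interface bound to SL2 should lose bandwidth.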

* Re: QoS settings not mapped correctly per pkey ? [not found] ` <4B0E4105.5080107-6ktuUTfB/bM@public.gmane.org> @ 2009-11-26 9:56 ` Yevgeny Kliteynik [not found] ` <4B0E50D6.8020401-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> 0 siblings, 1 reply; 16+ messages in thread From: Yevgeny Kliteynik @ 2009-11-26 9:56 UTC (permalink / raw) To: Vincent Ficet; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, BOURDE CELINE Vincent Ficet wrote: > Yevgeny Kliteynik wrote: >> Vincent Ficet wrote: >>> Hello Yevgeny, >>> >>>>>> OK, so there are three possible reasons that I can think of: >>>>>> 1. Something is wrong in the configuration. >>>>>> 2. The application does not saturate the link, thus QoS >>>>>> and the whole VL arbitration thing doesn't kick in. >>>>>> 3. There's some bug, somewhere. >>>>>> >>>>>> Let's start with reason no. 1. >>>>>> Please shut off each of the SLs one by one, and >>>>>> make sure that the application gets zero BW on >>>>>> these SLs. You can do it by mapping SL to VL15: >>>>>> >>>>>> qos_sl2vl 0,15,2,3,4,5,6,7,8,9,10,11,12,13,14,15 >>>>> If I shut down this SL by moving it to VL15, the interfaces stop >>>>> pinging. >>>>> This is probably because some IPoIB multicast traffic gets cut off for >>>>> pkey 0x7fff .. ? >>>> Could be, or because ALL interfaces are mapped to >>>> SL1, which is what the results below suggest. >>> Yes, you are right (see below). >>>>> So no results for this one. >>>>>> and then >>>>>> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 >>>>>> >>>>> With this setup, and the following QoS settings: >>>>> >>>>> qos_max_vls 8 >>>>> qos_high_limit 1 >>>>> qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0 >>>>> qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0 >>>>> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 >>>>> >>>>> I get roughly the same values for SL 1 to SL3: >>>> That doesn't look right. >>>> You have shut off SL2, so you can't see same >>>> BW for this SL. Looks like there is a problem >>>> in configuration (or bug in SM). 
>>> Yes, that's correct: There could be a configuration issue or a bug in >>> SM: >>> >>> Current setup and results: >>> >>> qos_max_vls 8 >>> qos_high_limit 1 >>> qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0 >>> qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0 >>> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 >>> >>> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t >>> 10 -P 8 2>&1; done | grep SUM >>> [SUM] 0.0-10.1 sec 9.78 GBytes 8.28 Gbits/sec >>> [SUM] 0.0-10.0 sec 5.69 GBytes 4.89 Gbits/sec >>> [SUM] 0.0-10.0 sec 4.30 GBytes 3.69 Gbits/sec >>> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone >>> -t 10 -P 8 2>&1; done | grep SUM >>> [SUM] 0.0-10.2 sec 6.44 GBytes 5.45 Gbits/sec >>> [SUM] 0.0-10.1 sec 6.64 GBytes 5.66 Gbits/sec >>> [SUM] 0.0-10.0 sec 6.03 GBytes 5.15 Gbits/sec >>> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-admin -t >>> 10 -P 8 2>&1; done | grep SUM >>> [SUM] 0.0-10.0 sec 5.80 GBytes 4.98 Gbits/sec >>> [SUM] 0.0-10.0 sec 7.04 GBytes 6.02 Gbits/sec >>> [SUM] 0.0-10.0 sec 6.60 GBytes 5.67 Gbits/sec >>> >>> The -backbone bandwidth should be 0 here. >>> >>>> Have you validated somehow that the interfaces >>>> have been mapped to the right SLs? 
>>> Two things: >>> 1/ Either the interface have not been mapped properly to the right SL's, >>> but given the config files below, I doubt it: >>> >>> [root@pichu22 ~]# tail -n 5 /etc/sysconfig/network-scripts/ifcfg-ib0* >>> ==> /etc/sysconfig/network-scripts/ifcfg-ib0 <== >>> BOOTPROTO=static >>> IPADDR=10.12.1.10 >>> NETMASK=255.255.0.0 >>> ONBOOT=yes >>> MTU=2000 >>> >>> ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8001 <== >>> BOOTPROTO=static >>> IPADDR=10.13.1.10 >>> NETMASK=255.255.0.0 >>> ONBOOT=yes >>> MTU=2000 >>> >>> ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8002 <== >>> BOOTPROTO=static >>> IPADDR=10.14.1.10 >>> NETMASK=255.255.0.0 >>> ONBOOT=yes >>> MTU=2000 >>> >>> partitions.conf: >>> ----------------- >>> >>> default=0x7fff,ipoib : ALL=full; >>> ip_backbone=0x0001,ipoib : ALL=full; >>> ip_admin=0x0002,ipoib : ALL=full; >>> >>> qos-policy.conf: >>> ---------------- >>> qos-ulps >>> default : 0 # default SL >>> ipoib, pkey 0x7FFF : 1 # IP with default pkey 0x7FFF >>> ipoib, pkey 0x1 : 2 # backbone IP with pkey 0x1 >>> ipoib, pkey 0x2 : 3 # admin IP with pkey 0x2 >>> end-qos-ulps >>> >>> ib0.8001 maps to pkey 1 (with MSB set to 1 due to full membership => >>> 0x8001 = (1<<16 | 1) >>> ib0.8002 maps to pkey 2 (with MSB set to 1 due to full membership => >>> 0x8002 = (1<<16 | 2) >>> >>> 2/ Somehow, the qos policy parsing does not map pkeys as we would >>> expect, which is what the opensm messages would suggest: >>> >>> Nov 25 13:13:05 664690 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR >>> AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS >>> Level SL (3) >>> Nov 25 13:13:05 664681 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR >>> AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS >>> Level SL (2) >>> Nov 25 13:13:05 664670 [373E910] 0x01 -> __qos_policy_validate_pkey: ERR >>> AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS >>> Level SL (1) >>> >>> If the messages are correct and 
do reflect what opensm is actually >>> doing, this would explain why shutting down SL1 (by moving it to VL15) >>> prevented all interfaces from running. >> What SM are you using? > OpenSM 3.3.2 >> Does it have the following bug fix: >> >> http://www.openfabrics.org/git/?p=~sashak/management.git;a=commit;h=ef4c8ac3fdd50bb0b7af06887abdb5b73b7ed8c3 >> > Yes it does. > > The most recent git commit (sorted by date) is for this rpm is: > * Sun Aug 23 2009 Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org> > commit 3f4954c73add5e7b598883242782607f87c482b4 OK, in that case I ran out of ideas. Need to debug. We can do it here, but best would be if you open a bug at bugzilla. Please run opensm as follows: opensm -Q -Y <qos_policy_file> -P <partition_config_file> -e -V -s 0 -d1 & Wait a minute or so, try your test, and attach OSM log to the issue. -- Yevgeny > Apart from the following commit (with a bogus date): > * Tue Jul 24 2035 Keshetti Mahesh <keshetti.mahesh-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> > commit a0c23ed2194e96816744a075d405ff34c8373fa3 > > Thanks, > > Vincent >> -- Yevgeny >> >>>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t >>>>> 10 -P 8 2>&1; done | grep SUM >>>>> [SUM] 0.0-10.0 sec 6.15 GBytes 5.28 Gbits/sec >>>>> [SUM] 0.0-10.0 sec 6.00 GBytes 5.16 Gbits/sec >>>>> [SUM] 0.0-10.1 sec 5.38 GBytes 4.59 Gbits/sec >>>>> >>>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c >>>>> pichu16-backbone >>>>> -t 10 -P 8 2>&1; done | grep SUM >>>>> [SUM] 0.0-10.0 sec 6.09 GBytes 5.23 Gbits/sec >>>>> [SUM] 0.0-10.0 sec 6.41 GBytes 5.51 Gbits/sec >>>>> [SUM] 0.0-10.0 sec 4.72 GBytes 4.05 Gbits/sec >>>>> >>>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c >>>>> pichu16-admin -t >>>>> 10 -P 8 2>&1; done | grep SUM >>>>> [SUM] 0.0-10.1 sec 6.96 GBytes 5.92 Gbits/sec >>>>> [SUM] 0.0-10.1 sec 5.89 GBytes 5.00 Gbits/sec >>>>> [SUM] 0.0-10.0 sec 5.35 GBytes 4.58 Gbits/sec >>>>> >>>>>> and then >>>>>> qos_sl2vl 
0,1,2,15,4,5,6,7,8,9,10,11,12,13,14,15 >>>>> Same results as the previous 0,1,15,3,... SL2vl mapping. >>>>>> If this part works well, then we will continue to >>>>>> reason no. 2. >>>>> In the above tests, I used -P8 to force 8 threads on the client >>>>> side for >>>>> each test. >>>>> I have one quad core CPU(Intel E55400). >>>>> This makes 24 iperf threads on 4 cores, which __should__ be fine >>>>> (well I >>>>> suppose ...) >>>> Best would be having one qperf per CPU core, >>>> which is 4 qperf's in your case. >>>> >>>> What is your subnet setup? >>> Nothing fancy for this test: I just bounce the taffic through a switch; >>> >>> [root@pichu16 ~]# ibtracert 49 53 >>>> From ca {0x2c9000100d00056c} portnum 1 lid 49-49 "pichu16 HCA-1" >>> [1] -> switch port {0x0002c9000100d0d4}[22] lid 58-58 "bullX chassis 36 >>> port QDR switch" >>> [28] -> ca port {0x2c9000100d000679}[1] lid 53-53 "pichu22 HCA-1" >>> To ca {0x2c9000100d000678} portnum 1 lid 53-53 "pichu22 HCA-1" >>> >>> Vincent >>> >>>> -- Yevgeny >>>> >>>> >>>>> And regarding reason #3. I still get the error I got yesterday, which >>>>> you told me was not important because the SL's set in partitions.conf >>>>> would override what was read from qos-policy.conf in the first place. >>>>> >>>>> Nov 25 13:13:05 664690 [373E910] 0x01 -> >>>>> __qos_policy_validate_pkey: ERR >>>>> AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS >>>>> Level SL (3) >>>>> Nov 25 13:13:05 664681 [373E910] 0x01 -> >>>>> __qos_policy_validate_pkey: ERR >>>>> AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS >>>>> Level SL (2) >>>>> Nov 25 13:13:05 664670 [373E910] 0x01 -> >>>>> __qos_policy_validate_pkey: ERR >>>>> AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS >>>>> Level SL (1) >>>>> >>>>> Thanks for your help. 
>>>>> >>>>> Vincent >>>>> >>>> >>> >> >> > > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
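One detail in the exchange above is worth pinning down: the IBA pkey is a 16-bit field whose most significant bit, bit 15, is the membership bit, so full membership in pkey base 0x0001 gives 0x8001 = (1 << 15) | 0x0001 (a 1 << 16 shift would fall outside the 16-bit field). A small sketch; the function name is illustrative:

```python
FULL_MEMBERSHIP_BIT = 1 << 15  # 0x8000, bit 15 of the 16-bit pkey field

def full_member_pkey(pkey_base):
    """Pkey as seen by a full member, which is also the suffix of the
    IPoIB child interface name (e.g. ib0.8001)."""
    if pkey_base & ~0x7FFF:
        raise ValueError("pkey base must fit in 15 bits")
    return FULL_MEMBERSHIP_BIT | pkey_base

assert full_member_pkey(0x0001) == 0x8001  # -> ib0.8001
assert full_member_pkey(0x0002) == 0x8002  # -> ib0.8002
```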
* Re: QoS settings not mapped correctly per pkey ? [not found] ` <4B0E50D6.8020401-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> @ 2009-12-03 8:01 ` Yevgeny Kliteynik [not found] ` <4B177058.9070909-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> 0 siblings, 1 reply; 16+ messages in thread From: Yevgeny Kliteynik @ 2009-12-03 8:01 UTC (permalink / raw) To: sebastien dugue Cc: Vincent Ficet, linux-rdma-u79uwXL29TY76Z2rM5mHXA, BOURDE CELINE Sebastien, I noticed that you found the problem in IPoIB child interfaces configuration. Glad that this worked out well. My question is about the note that you left in the issue: " It looks like in 'datagram' mode, the SL weights do not seem to be applied, or maybe this is an artifact of IPoIB in 'datagram mode' " Have you checked that in this mode you do get the right SL for each child interface by shutting off the relevant SL (mapping it to VL15)? If yes, then what you're saying is that you see that interfaces use the right SL and VL, but you don't see any arbitration between VLs? -- Yevgeny Yevgeny Kliteynik wrote: > Vincent Ficet wrote: >> Yevgeny Kliteynik wrote: >>> Vincent Ficet wrote: >>>> Hello Yevgeny, >>>> >>>>>>> OK, so there are three possible reasons that I can think of: >>>>>>> 1. Something is wrong in the configuration. >>>>>>> 2. The application does not saturate the link, thus QoS >>>>>>> and the whole VL arbitration thing doesn't kick in. >>>>>>> 3. There's some bug, somewhere. >>>>>>> >>>>>>> Let's start with reason no. 1. >>>>>>> Please shut off each of the SLs one by one, and >>>>>>> make sure that the application gets zero BW on >>>>>>> these SLs. You can do it by mapping SL to VL15: >>>>>>> >>>>>>> qos_sl2vl 0,15,2,3,4,5,6,7,8,9,10,11,12,13,14,15 >>>>>> If I shut down this SL by moving it to VL15, the interfaces stop >>>>>> pinging. >>>>>> This is probably because some IPoIB multicast traffic gets cut off >>>>>> for >>>>>> pkey 0x7fff .. ? 
>>>>> Could be, or because ALL interfaces are mapped to >>>>> SL1, which is what the results below suggest. >>>> Yes, you are right (see below). >>>>>> So no results for this one. >>>>>>> and then >>>>>>> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 >>>>>>> >>>>>> With this setup, and the following QoS settings: >>>>>> >>>>>> qos_max_vls 8 >>>>>> qos_high_limit 1 >>>>>> qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0 >>>>>> qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0 >>>>>> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 >>>>>> >>>>>> I get roughly the same values for SL 1 to SL3: >>>>> That doesn't look right. >>>>> You have shut off SL2, so you can't see same >>>>> BW for this SL. Looks like there is a problem >>>>> in configuration (or bug in SM). >>>> Yes, that's correct: There could be a configuration issue or a bug in >>>> SM: >>>> >>>> Current setup and results: >>>> >>>> qos_max_vls 8 >>>> qos_high_limit 1 >>>> qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0 >>>> qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0 >>>> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 >>>> >>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t >>>> 10 -P 8 2>&1; done | grep SUM >>>> [SUM] 0.0-10.1 sec 9.78 GBytes 8.28 Gbits/sec >>>> [SUM] 0.0-10.0 sec 5.69 GBytes 4.89 Gbits/sec >>>> [SUM] 0.0-10.0 sec 4.30 GBytes 3.69 Gbits/sec >>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c >>>> pichu16-backbone >>>> -t 10 -P 8 2>&1; done | grep SUM >>>> [SUM] 0.0-10.2 sec 6.44 GBytes 5.45 Gbits/sec >>>> [SUM] 0.0-10.1 sec 6.64 GBytes 5.66 Gbits/sec >>>> [SUM] 0.0-10.0 sec 6.03 GBytes 5.15 Gbits/sec >>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c >>>> pichu16-admin -t >>>> 10 -P 8 2>&1; done | grep SUM >>>> [SUM] 0.0-10.0 sec 5.80 GBytes 4.98 Gbits/sec >>>> [SUM] 0.0-10.0 sec 7.04 GBytes 6.02 Gbits/sec >>>> [SUM] 0.0-10.0 sec 6.60 GBytes 5.67 Gbits/sec >>>> >>>> The -backbone bandwidth should be 0 here. 
>>>> >>>>> Have you validated somehow that the interfaces >>>>> have been mapped to the right SLs? >>>> Two things: >>>> 1/ Either the interface have not been mapped properly to the right >>>> SL's, >>>> but given the config files below, I doubt it: >>>> >>>> [root@pichu22 ~]# tail -n 5 /etc/sysconfig/network-scripts/ifcfg-ib0* >>>> ==> /etc/sysconfig/network-scripts/ifcfg-ib0 <== >>>> BOOTPROTO=static >>>> IPADDR=10.12.1.10 >>>> NETMASK=255.255.0.0 >>>> ONBOOT=yes >>>> MTU=2000 >>>> >>>> ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8001 <== >>>> BOOTPROTO=static >>>> IPADDR=10.13.1.10 >>>> NETMASK=255.255.0.0 >>>> ONBOOT=yes >>>> MTU=2000 >>>> >>>> ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8002 <== >>>> BOOTPROTO=static >>>> IPADDR=10.14.1.10 >>>> NETMASK=255.255.0.0 >>>> ONBOOT=yes >>>> MTU=2000 >>>> >>>> partitions.conf: >>>> ----------------- >>>> >>>> default=0x7fff,ipoib : ALL=full; >>>> ip_backbone=0x0001,ipoib : ALL=full; >>>> ip_admin=0x0002,ipoib : ALL=full; >>>> >>>> qos-policy.conf: >>>> ---------------- >>>> qos-ulps >>>> default : 0 # default SL >>>> ipoib, pkey 0x7FFF : 1 # IP with default pkey 0x7FFF >>>> ipoib, pkey 0x1 : 2 # backbone IP with pkey 0x1 >>>> ipoib, pkey 0x2 : 3 # admin IP with pkey 0x2 >>>> end-qos-ulps >>>> >>>> ib0.8001 maps to pkey 1 (with MSB set to 1 due to full membership => >>>> 0x8001 = (1<<16 | 1) >>>> ib0.8002 maps to pkey 2 (with MSB set to 1 due to full membership => >>>> 0x8002 = (1<<16 | 2) >>>> >>>> 2/ Somehow, the qos policy parsing does not map pkeys as we would >>>> expect, which is what the opensm messages would suggest: >>>> >>>> Nov 25 13:13:05 664690 [373E910] 0x01 -> __qos_policy_validate_pkey: >>>> ERR >>>> AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS >>>> Level SL (3) >>>> Nov 25 13:13:05 664681 [373E910] 0x01 -> __qos_policy_validate_pkey: >>>> ERR >>>> AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS >>>> Level SL (2) >>>> Nov 25 13:13:05 664670 [373E910] 
0x01 -> __qos_policy_validate_pkey:
>>>> ERR
>>>> AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS
>>>> Level SL (1)
>>>>
>>>> If the messages are correct and do reflect what opensm is actually
>>>> doing, this would explain why shutting down SL1 (by moving it to VL15)
>>>> prevented all interfaces from running.
>>> What SM are you using?
>> OpenSM 3.3.2
>>> Does it have the following bug fix:
>>>
>>> http://www.openfabrics.org/git/?p=~sashak/management.git;a=commit;h=ef4c8ac3fdd50bb0b7af06887abdb5b73b7ed8c3
>>>
>> Yes it does.
>>
>> The most recent git commit (sorted by date) for this rpm is:
>> * Sun Aug 23 2009 Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org>
>> commit 3f4954c73add5e7b598883242782607f87c482b4
>
> OK, in that case I ran out of ideas. Need to debug.
> We can do it here, but best would be if you open a
> bug at bugzilla.
> Please run opensm as follows:
>
> opensm -Q -Y <qos_policy_file> -P <partition_config_file> -e -V -s 0 -d1 &
>
> Wait a minute or so, try your test, and attach OSM
> log to the issue.
> > -- Yevgeny > > > >> Apart from the following commit (with a bogus date): >> * Tue Jul 24 2035 Keshetti Mahesh <keshetti.mahesh-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> >> commit a0c23ed2194e96816744a075d405ff34c8373fa3 >> >> Thanks, >> >> Vincent >>> -- Yevgeny >>> >>>>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c >>>>>> pichu16-ic0 -t >>>>>> 10 -P 8 2>&1; done | grep SUM >>>>>> [SUM] 0.0-10.0 sec 6.15 GBytes 5.28 Gbits/sec >>>>>> [SUM] 0.0-10.0 sec 6.00 GBytes 5.16 Gbits/sec >>>>>> [SUM] 0.0-10.1 sec 5.38 GBytes 4.59 Gbits/sec >>>>>> >>>>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c >>>>>> pichu16-backbone >>>>>> -t 10 -P 8 2>&1; done | grep SUM >>>>>> [SUM] 0.0-10.0 sec 6.09 GBytes 5.23 Gbits/sec >>>>>> [SUM] 0.0-10.0 sec 6.41 GBytes 5.51 Gbits/sec >>>>>> [SUM] 0.0-10.0 sec 4.72 GBytes 4.05 Gbits/sec >>>>>> >>>>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c >>>>>> pichu16-admin -t >>>>>> 10 -P 8 2>&1; done | grep SUM >>>>>> [SUM] 0.0-10.1 sec 6.96 GBytes 5.92 Gbits/sec >>>>>> [SUM] 0.0-10.1 sec 5.89 GBytes 5.00 Gbits/sec >>>>>> [SUM] 0.0-10.0 sec 5.35 GBytes 4.58 Gbits/sec >>>>>> >>>>>>> and then >>>>>>> qos_sl2vl 0,1,2,15,4,5,6,7,8,9,10,11,12,13,14,15 >>>>>> Same results as the previous 0,1,15,3,... SL2vl mapping. >>>>>>> If this part works well, then we will continue to >>>>>>> reason no. 2. >>>>>> In the above tests, I used -P8 to force 8 threads on the client >>>>>> side for >>>>>> each test. >>>>>> I have one quad core CPU(Intel E55400). >>>>>> This makes 24 iperf threads on 4 cores, which __should__ be fine >>>>>> (well I >>>>>> suppose ...) >>>>> Best would be having one qperf per CPU core, >>>>> which is 4 qperf's in your case. >>>>> >>>>> What is your subnet setup? 
>>>> Nothing fancy for this test: I just bounce the taffic through a switch; >>>> >>>> [root@pichu16 ~]# ibtracert 49 53 >>>>> From ca {0x2c9000100d00056c} portnum 1 lid 49-49 "pichu16 HCA-1" >>>> [1] -> switch port {0x0002c9000100d0d4}[22] lid 58-58 "bullX chassis 36 >>>> port QDR switch" >>>> [28] -> ca port {0x2c9000100d000679}[1] lid 53-53 "pichu22 HCA-1" >>>> To ca {0x2c9000100d000678} portnum 1 lid 53-53 "pichu22 HCA-1" >>>> >>>> Vincent >>>> >>>>> -- Yevgeny >>>>> >>>>> >>>>>> And regarding reason #3. I still get the error I got yesterday, which >>>>>> you told me was not important because the SL's set in partitions.conf >>>>>> would override what was read from qos-policy.conf in the first place. >>>>>> >>>>>> Nov 25 13:13:05 664690 [373E910] 0x01 -> >>>>>> __qos_policy_validate_pkey: ERR >>>>>> AC15: pkey 0x0002 in match rule - overriding partition SL (0) with >>>>>> QoS >>>>>> Level SL (3) >>>>>> Nov 25 13:13:05 664681 [373E910] 0x01 -> >>>>>> __qos_policy_validate_pkey: ERR >>>>>> AC15: pkey 0x0001 in match rule - overriding partition SL (0) with >>>>>> QoS >>>>>> Level SL (2) >>>>>> Nov 25 13:13:05 664670 [373E910] 0x01 -> >>>>>> __qos_policy_validate_pkey: ERR >>>>>> AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with >>>>>> QoS >>>>>> Level SL (1) >>>>>> >>>>>> Thanks for your help. >>>>>> >>>>>> Vincent >>>>>> >>>>> >>>> >>> >>> >> >> > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
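For reference, the low-priority weights discussed in this thread (qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0) predict roughly a 1:64:128:192 bandwidth split once the link is saturated. A rough sketch, assuming arbitration proportional to the configured weights (a simplification of the IBA dual-priority weighted round-robin; a zero weight means the VL gets no low-priority slots):

```python
def vlarb_shares(vlarb_low):
    """Approximate long-run bandwidth share per VL when every VL has
    traffic queued, assuming arbitration proportional to the weights."""
    weights = {}
    for token in vlarb_low.split(","):
        vl, weight = token.split(":")
        weights[int(vl)] = int(weight)
    total = sum(weights.values())
    return {vl: w / total for vl, w in weights.items() if w > 0}

shares = vlarb_shares("0:1,1:64,2:128,3:192,4:0,5:0")
# VL3 should get about half the link: 192 / (1 + 64 + 128 + 192)
```

The roughly equal iperf results reported above are therefore inconsistent with this table being in effect, which is what the thread is chasing.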
* Re: QoS settings not mapped correctly per pkey ?
  [not found] ` <4B177058.9070909-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2009-12-03  8:17 ` sebastien dugue
  2009-12-03  9:04   ` Yevgeny Kliteynik
  2009-12-03  8:21 ` Or Gerlitz
  1 sibling, 1 reply; 16+ messages in thread
From: sebastien dugue @ 2009-12-03  8:17 UTC (permalink / raw)
  To: kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb
  Cc: Vincent Ficet, linux-rdma-u79uwXL29TY76Z2rM5mHXA, BOURDE CELINE

  Hi Yevgeny,

On Thu, 03 Dec 2009 10:01:28 +0200
Yevgeny Kliteynik <kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:

> Sebastien,
>
> I noticed that you found the problem in IPoIB child
> interfaces configuration. Glad that this worked out well.
>
> My question is about the note that you left in the issue:
>
> " It looks like in 'datagram' mode, the SL weights
>   do not seem to be applied, or maybe this is an
>   artifact of IPoIB in 'datagram mode' "
>
> Have you checked that in this mode you do get the right
> SL for each child interface by shutting off the relevant
> SL (mapping it to VL15)?

  Yes, SL to VL mapping is OK.

>
> If yes, then what you're saying is that you see that
> interfaces use the right SL and VL, but you don't see
> any arbitration between VLs?

  Right, whatever weights I put in the vlarb tables have absolutely
no effect when IPoIB is in datagram mode. I don't know if it's
an arbitration problem (I don't think so) or an IPoIB problem.

  It could be that, due to the 2044-byte MTU in datagram mode, iperf
spends much of its time not doing transfers and fails to provide enough
data to the interfaces. I don't know.

  Once I switched to connected mode, with a 65520-byte MTU, things
started to work OK with a much better overall combined bandwidth.

  Thanks,

  Sébastien.

>
> -- Yevgeny
>
> Yevgeny Kliteynik wrote:
> > Vincent Ficet wrote:
> >> Yevgeny Kliteynik wrote:
> >>> Vincent Ficet wrote:
> >>>> Hello Yevgeny,
> >>>>
> >>>>>>> OK, so there are three possible reasons that I can think of:
> >>>>>>> 1.
Something is wrong in the configuration. > >>>>>>> 2. The application does not saturate the link, thus QoS > >>>>>>> and the whole VL arbitration thing doesn't kick in. > >>>>>>> 3. There's some bug, somewhere. > >>>>>>> > >>>>>>> Let's start with reason no. 1. > >>>>>>> Please shut off each of the SLs one by one, and > >>>>>>> make sure that the application gets zero BW on > >>>>>>> these SLs. You can do it by mapping SL to VL15: > >>>>>>> > >>>>>>> qos_sl2vl 0,15,2,3,4,5,6,7,8,9,10,11,12,13,14,15 > >>>>>> If I shut down this SL by moving it to VL15, the interfaces stop > >>>>>> pinging. > >>>>>> This is probably because some IPoIB multicast traffic gets cut off > >>>>>> for > >>>>>> pkey 0x7fff .. ? > >>>>> Could be, or because ALL interfaces are mapped to > >>>>> SL1, which is what the results below suggest. > >>>> Yes, you are right (see below). > >>>>>> So no results for this one. > >>>>>>> and then > >>>>>>> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 > >>>>>>> > >>>>>> With this setup, and the following QoS settings: > >>>>>> > >>>>>> qos_max_vls 8 > >>>>>> qos_high_limit 1 > >>>>>> qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0 > >>>>>> qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0 > >>>>>> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 > >>>>>> > >>>>>> I get roughly the same values for SL 1 to SL3: > >>>>> That doesn't look right. > >>>>> You have shut off SL2, so you can't see same > >>>>> BW for this SL. Looks like there is a problem > >>>>> in configuration (or bug in SM). 
> >>>> Yes, that's correct: There could be a configuration issue or a bug in > >>>> SM: > >>>> > >>>> Current setup and results: > >>>> > >>>> qos_max_vls 8 > >>>> qos_high_limit 1 > >>>> qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0 > >>>> qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0 > >>>> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 > >>>> > >>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t > >>>> 10 -P 8 2>&1; done | grep SUM > >>>> [SUM] 0.0-10.1 sec 9.78 GBytes 8.28 Gbits/sec > >>>> [SUM] 0.0-10.0 sec 5.69 GBytes 4.89 Gbits/sec > >>>> [SUM] 0.0-10.0 sec 4.30 GBytes 3.69 Gbits/sec > >>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c > >>>> pichu16-backbone > >>>> -t 10 -P 8 2>&1; done | grep SUM > >>>> [SUM] 0.0-10.2 sec 6.44 GBytes 5.45 Gbits/sec > >>>> [SUM] 0.0-10.1 sec 6.64 GBytes 5.66 Gbits/sec > >>>> [SUM] 0.0-10.0 sec 6.03 GBytes 5.15 Gbits/sec > >>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c > >>>> pichu16-admin -t > >>>> 10 -P 8 2>&1; done | grep SUM > >>>> [SUM] 0.0-10.0 sec 5.80 GBytes 4.98 Gbits/sec > >>>> [SUM] 0.0-10.0 sec 7.04 GBytes 6.02 Gbits/sec > >>>> [SUM] 0.0-10.0 sec 6.60 GBytes 5.67 Gbits/sec > >>>> > >>>> The -backbone bandwidth should be 0 here. > >>>> > >>>>> Have you validated somehow that the interfaces > >>>>> have been mapped to the right SLs? 
> >>>> Two things: > >>>> 1/ Either the interface have not been mapped properly to the right > >>>> SL's, > >>>> but given the config files below, I doubt it: > >>>> > >>>> [root@pichu22 ~]# tail -n 5 /etc/sysconfig/network-scripts/ifcfg-ib0* > >>>> ==> /etc/sysconfig/network-scripts/ifcfg-ib0 <== > >>>> BOOTPROTO=static > >>>> IPADDR=10.12.1.10 > >>>> NETMASK=255.255.0.0 > >>>> ONBOOT=yes > >>>> MTU=2000 > >>>> > >>>> ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8001 <== > >>>> BOOTPROTO=static > >>>> IPADDR=10.13.1.10 > >>>> NETMASK=255.255.0.0 > >>>> ONBOOT=yes > >>>> MTU=2000 > >>>> > >>>> ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8002 <== > >>>> BOOTPROTO=static > >>>> IPADDR=10.14.1.10 > >>>> NETMASK=255.255.0.0 > >>>> ONBOOT=yes > >>>> MTU=2000 > >>>> > >>>> partitions.conf: > >>>> ----------------- > >>>> > >>>> default=0x7fff,ipoib : ALL=full; > >>>> ip_backbone=0x0001,ipoib : ALL=full; > >>>> ip_admin=0x0002,ipoib : ALL=full; > >>>> > >>>> qos-policy.conf: > >>>> ---------------- > >>>> qos-ulps > >>>> default : 0 # default SL > >>>> ipoib, pkey 0x7FFF : 1 # IP with default pkey 0x7FFF > >>>> ipoib, pkey 0x1 : 2 # backbone IP with pkey 0x1 > >>>> ipoib, pkey 0x2 : 3 # admin IP with pkey 0x2 > >>>> end-qos-ulps > >>>> > >>>> ib0.8001 maps to pkey 1 (with MSB set to 1 due to full membership => > >>>> 0x8001 = (1<<16 | 1) > >>>> ib0.8002 maps to pkey 2 (with MSB set to 1 due to full membership => > >>>> 0x8002 = (1<<16 | 2) > >>>> > >>>> 2/ Somehow, the qos policy parsing does not map pkeys as we would > >>>> expect, which is what the opensm messages would suggest: > >>>> > >>>> Nov 25 13:13:05 664690 [373E910] 0x01 -> __qos_policy_validate_pkey: > >>>> ERR > >>>> AC15: pkey 0x0002 in match rule - overriding partition SL (0) with QoS > >>>> Level SL (3) > >>>> Nov 25 13:13:05 664681 [373E910] 0x01 -> __qos_policy_validate_pkey: > >>>> ERR > >>>> AC15: pkey 0x0001 in match rule - overriding partition SL (0) with QoS > >>>> Level SL (2) > >>>> Nov 25 
13:13:05 664670 [373E910] 0x01 -> __qos_policy_validate_pkey: > >>>> ERR > >>>> AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS > >>>> Level SL (1) > >>>> > >>>> If the messages are correct and do reflect what opensm is actually > >>>> doing, this would explain why shutting down SL1 (by moving it to VL15) > >>>> prevented all interfaces from running. > >>> What SM are you using? > >> OpenSM 3.3.2 > >>> Does it have the following bug fix: > >>> > >>> http://www.openfabrics.org/git/?p=~sashak/management.git;a=commit;h=ef4c8ac3fdd50bb0b7af06887abdb5b73b7ed8c3 > >>> > >>> > >> Yes it does. > >> > >> The most recent git commit (sorted by date) is for this rpm is: > >> * Sun Aug 23 2009 Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org> > >> commit 3f4954c73add5e7b598883242782607f87c482b4 > > > > OK, in that case I ran out of ideas. Need to debug. > > We can do it here, but best would be if you open a > > bug at bugzilla. > > Please run opensm as follows: > > > > opensm -Q -Y <qos_policy_file> -P <partition_config_file> -e -V -s 0 -d1 & > > > > Wait a minute or so, try your test, and attach OSM > > log to the issue. 
> > > > -- Yevgeny > > > > > > > >> Apart from the following commit (with a bogus date): > >> * Tue Jul 24 2035 Keshetti Mahesh <keshetti.mahesh-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> > >> commit a0c23ed2194e96816744a075d405ff34c8373fa3 > >> > >> Thanks, > >> > >> Vincent > >>> -- Yevgeny > >>> > >>>>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c > >>>>>> pichu16-ic0 -t > >>>>>> 10 -P 8 2>&1; done | grep SUM > >>>>>> [SUM] 0.0-10.0 sec 6.15 GBytes 5.28 Gbits/sec > >>>>>> [SUM] 0.0-10.0 sec 6.00 GBytes 5.16 Gbits/sec > >>>>>> [SUM] 0.0-10.1 sec 5.38 GBytes 4.59 Gbits/sec > >>>>>> > >>>>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c > >>>>>> pichu16-backbone > >>>>>> -t 10 -P 8 2>&1; done | grep SUM > >>>>>> [SUM] 0.0-10.0 sec 6.09 GBytes 5.23 Gbits/sec > >>>>>> [SUM] 0.0-10.0 sec 6.41 GBytes 5.51 Gbits/sec > >>>>>> [SUM] 0.0-10.0 sec 4.72 GBytes 4.05 Gbits/sec > >>>>>> > >>>>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c > >>>>>> pichu16-admin -t > >>>>>> 10 -P 8 2>&1; done | grep SUM > >>>>>> [SUM] 0.0-10.1 sec 6.96 GBytes 5.92 Gbits/sec > >>>>>> [SUM] 0.0-10.1 sec 5.89 GBytes 5.00 Gbits/sec > >>>>>> [SUM] 0.0-10.0 sec 5.35 GBytes 4.58 Gbits/sec > >>>>>> > >>>>>>> and then > >>>>>>> qos_sl2vl 0,1,2,15,4,5,6,7,8,9,10,11,12,13,14,15 > >>>>>> Same results as the previous 0,1,15,3,... SL2vl mapping. > >>>>>>> If this part works well, then we will continue to > >>>>>>> reason no. 2. > >>>>>> In the above tests, I used -P8 to force 8 threads on the client > >>>>>> side for > >>>>>> each test. > >>>>>> I have one quad core CPU(Intel E55400). > >>>>>> This makes 24 iperf threads on 4 cores, which __should__ be fine > >>>>>> (well I > >>>>>> suppose ...) > >>>>> Best would be having one qperf per CPU core, > >>>>> which is 4 qperf's in your case. > >>>>> > >>>>> What is your subnet setup? 
> >>>> Nothing fancy for this test: I just bounce the taffic through a switch; > >>>> > >>>> [root@pichu16 ~]# ibtracert 49 53 > >>>>> From ca {0x2c9000100d00056c} portnum 1 lid 49-49 "pichu16 HCA-1" > >>>> [1] -> switch port {0x0002c9000100d0d4}[22] lid 58-58 "bullX chassis 36 > >>>> port QDR switch" > >>>> [28] -> ca port {0x2c9000100d000679}[1] lid 53-53 "pichu22 HCA-1" > >>>> To ca {0x2c9000100d000678} portnum 1 lid 53-53 "pichu22 HCA-1" > >>>> > >>>> Vincent > >>>> > >>>>> -- Yevgeny > >>>>> > >>>>> > >>>>>> And regarding reason #3. I still get the error I got yesterday, which > >>>>>> you told me was not important because the SL's set in partitions.conf > >>>>>> would override what was read from qos-policy.conf in the first place. > >>>>>> > >>>>>> Nov 25 13:13:05 664690 [373E910] 0x01 -> > >>>>>> __qos_policy_validate_pkey: ERR > >>>>>> AC15: pkey 0x0002 in match rule - overriding partition SL (0) with > >>>>>> QoS > >>>>>> Level SL (3) > >>>>>> Nov 25 13:13:05 664681 [373E910] 0x01 -> > >>>>>> __qos_policy_validate_pkey: ERR > >>>>>> AC15: pkey 0x0001 in match rule - overriding partition SL (0) with > >>>>>> QoS > >>>>>> Level SL (2) > >>>>>> Nov 25 13:13:05 664670 [373E910] 0x01 -> > >>>>>> __qos_policy_validate_pkey: ERR > >>>>>> AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with > >>>>>> QoS > >>>>>> Level SL (1) > >>>>>> > >>>>>> Thanks for your help. 
> >>>>>> > >>>>>> Vincent > >>>>>> > >>>>> > >>>> > >>> > >>> > >> > >> > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: QoS settings not mapped correctly per pkey ? 2009-12-03 8:17 ` sebastien dugue @ 2009-12-03 9:04 ` Yevgeny Kliteynik [not found] ` <4B177F10.1040908-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> 0 siblings, 1 reply; 16+ messages in thread From: Yevgeny Kliteynik @ 2009-12-03 9:04 UTC (permalink / raw) To: sebastien dugue Cc: Vincent Ficet, linux-rdma-u79uwXL29TY76Z2rM5mHXA, BOURDE CELINE sebastien dugue wrote: > Hi Yevgeny, > > On Thu, 03 Dec 2009 10:01:28 +0200 > Yevgeny Kliteynik <kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote: > >> Sebastien, >> >> I noticed that you found the problem in IPoIB child >> interfaces configuration. Glad that this worked out well. >> >> My question is about the note that you left in the issue: >> >> " It looks like in 'datagram' mode, the SL weights >> do not seem to be applied, or maybe this is an >> artifact of IPoIB in 'datagram mode' " >> >> Have you checked that in this mode you do get the right >> SL for each child interface by shutting off the relevant >> SL (mapping it to VL15)? > > Yes, SL to VL mapping is OK. > >> If yes, then what you're saying is that you see that >> interfaces use the right SL and VL, but you don't see >> any arbitration between VLs? > > Right, whatever the weights I put in the vlarbs tables have absolutely > no effect when IPoIB is in datagram mode. I don't know if it's > an arbitration problem (don't think so) or an IPoIB problem. > > Could be that due to the 2044 bytes MTU in datagram mode, iperf > spends much time not doing transfers and fails to provide enough > data to the interfaces. Don't know. > > Once I switched to connected mode, with a 65520 bytes MTU, things > started to work OK with a much better overall combined bandwidth. OK, then "a much better overall combined bandwidth" is an answer here. VL arbitration kicks in only when you saturate the link. 
If you don't, there's no point doing arbitration, because HW is able so serve any packet that comes w/o the need to prioritize. -- Yevgeny > Thanks, > > Sébastien. > >> -- Yevgeny >> >> Yevgeny Kliteynik wrote: >>> Vincent Ficet wrote: >>>> Yevgeny Kliteynik wrote: >>>>> Vincent Ficet wrote: >>>>>> Hello Yevgeny, >>>>>> >>>>>>>>> OK, so there are three possible reasons that I can think of: >>>>>>>>> 1. Something is wrong in the configuration. >>>>>>>>> 2. The application does not saturate the link, thus QoS >>>>>>>>> and the whole VL arbitration thing doesn't kick in. >>>>>>>>> 3. There's some bug, somewhere. >>>>>>>>> >>>>>>>>> Let's start with reason no. 1. >>>>>>>>> Please shut off each of the SLs one by one, and >>>>>>>>> make sure that the application gets zero BW on >>>>>>>>> these SLs. You can do it by mapping SL to VL15: >>>>>>>>> >>>>>>>>> qos_sl2vl 0,15,2,3,4,5,6,7,8,9,10,11,12,13,14,15 >>>>>>>> If I shut down this SL by moving it to VL15, the interfaces stop >>>>>>>> pinging. >>>>>>>> This is probably because some IPoIB multicast traffic gets cut off >>>>>>>> for >>>>>>>> pkey 0x7fff .. ? >>>>>>> Could be, or because ALL interfaces are mapped to >>>>>>> SL1, which is what the results below suggest. >>>>>> Yes, you are right (see below). >>>>>>>> So no results for this one. >>>>>>>>> and then >>>>>>>>> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 >>>>>>>>> >>>>>>>> With this setup, and the following QoS settings: >>>>>>>> >>>>>>>> qos_max_vls 8 >>>>>>>> qos_high_limit 1 >>>>>>>> qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0 >>>>>>>> qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0 >>>>>>>> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 >>>>>>>> >>>>>>>> I get roughly the same values for SL 1 to SL3: >>>>>>> That doesn't look right. >>>>>>> You have shut off SL2, so you can't see same >>>>>>> BW for this SL. Looks like there is a problem >>>>>>> in configuration (or bug in SM). 
>>>>>> Yes, that's correct: There could be a configuration issue or a bug in >>>>>> SM: >>>>>> >>>>>> Current setup and results: >>>>>> >>>>>> qos_max_vls 8 >>>>>> qos_high_limit 1 >>>>>> qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0 >>>>>> qos_vlarb_low 0:1,1:64,2:128,3:192,4:0,5:0 >>>>>> qos_sl2vl 0,1,15,3,4,5,6,7,8,9,10,11,12,13,14,15 >>>>>> >>>>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t >>>>>> 10 -P 8 2>&1; done | grep SUM >>>>>> [SUM] 0.0-10.1 sec 9.78 GBytes 8.28 Gbits/sec >>>>>> [SUM] 0.0-10.0 sec 5.69 GBytes 4.89 Gbits/sec >>>>>> [SUM] 0.0-10.0 sec 4.30 GBytes 3.69 Gbits/sec >>>>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c >>>>>> pichu16-backbone >>>>>> -t 10 -P 8 2>&1; done | grep SUM >>>>>> [SUM] 0.0-10.2 sec 6.44 GBytes 5.45 Gbits/sec >>>>>> [SUM] 0.0-10.1 sec 6.64 GBytes 5.66 Gbits/sec >>>>>> [SUM] 0.0-10.0 sec 6.03 GBytes 5.15 Gbits/sec >>>>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c >>>>>> pichu16-admin -t >>>>>> 10 -P 8 2>&1; done | grep SUM >>>>>> [SUM] 0.0-10.0 sec 5.80 GBytes 4.98 Gbits/sec >>>>>> [SUM] 0.0-10.0 sec 7.04 GBytes 6.02 Gbits/sec >>>>>> [SUM] 0.0-10.0 sec 6.60 GBytes 5.67 Gbits/sec >>>>>> >>>>>> The -backbone bandwidth should be 0 here. >>>>>> >>>>>>> Have you validated somehow that the interfaces >>>>>>> have been mapped to the right SLs? 
>>>>>> Two things:
>>>>>> 1/ Either the interfaces have not been mapped properly to the right
>>>>>> SLs, but given the config files below, I doubt it:
>>>>>>
>>>>>> [root@pichu22 ~]# tail -n 5 /etc/sysconfig/network-scripts/ifcfg-ib0*
>>>>>> ==> /etc/sysconfig/network-scripts/ifcfg-ib0 <==
>>>>>> BOOTPROTO=static
>>>>>> IPADDR=10.12.1.10
>>>>>> NETMASK=255.255.0.0
>>>>>> ONBOOT=yes
>>>>>> MTU=2000
>>>>>>
>>>>>> ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8001 <==
>>>>>> BOOTPROTO=static
>>>>>> IPADDR=10.13.1.10
>>>>>> NETMASK=255.255.0.0
>>>>>> ONBOOT=yes
>>>>>> MTU=2000
>>>>>>
>>>>>> ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8002 <==
>>>>>> BOOTPROTO=static
>>>>>> IPADDR=10.14.1.10
>>>>>> NETMASK=255.255.0.0
>>>>>> ONBOOT=yes
>>>>>> MTU=2000
>>>>>>
>>>>>> partitions.conf:
>>>>>> ----------------
>>>>>>
>>>>>> default=0x7fff,ipoib : ALL=full;
>>>>>> ip_backbone=0x0001,ipoib : ALL=full;
>>>>>> ip_admin=0x0002,ipoib : ALL=full;
>>>>>>
>>>>>> qos-policy.conf:
>>>>>> ----------------
>>>>>> qos-ulps
>>>>>> default : 0 # default SL
>>>>>> ipoib, pkey 0x7FFF : 1 # IP with default pkey 0x7FFF
>>>>>> ipoib, pkey 0x1 : 2 # backbone IP with pkey 0x1
>>>>>> ipoib, pkey 0x2 : 3 # admin IP with pkey 0x2
>>>>>> end-qos-ulps
>>>>>>
>>>>>> ib0.8001 maps to pkey 1 (with the MSB set due to full membership:
>>>>>> 0x8001 = (1 << 15) | 1)
>>>>>> ib0.8002 maps to pkey 2 (with the MSB set due to full membership:
>>>>>> 0x8002 = (1 << 15) | 2)
>>>>>>
>>>>>> 2/ Somehow, the qos policy parsing does not map pkeys as we would
>>>>>> expect, which is what the opensm messages would suggest:
>>>>>>
>>>>>> Nov 25 13:13:05 664690 [373E910] 0x01 -> __qos_policy_validate_pkey:
>>>>>> ERR AC15: pkey 0x0002 in match rule - overriding partition SL (0)
>>>>>> with QoS Level SL (3)
>>>>>> Nov 25 13:13:05 664681 [373E910] 0x01 -> __qos_policy_validate_pkey:
>>>>>> ERR AC15: pkey 0x0001 in match rule - overriding partition SL (0)
>>>>>> with QoS Level SL (2)
>>>>>> Nov 25
13:13:05 664670 [373E910] 0x01 -> __qos_policy_validate_pkey: >>>>>> ERR >>>>>> AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with QoS >>>>>> Level SL (1) >>>>>> >>>>>> If the messages are correct and do reflect what opensm is actually >>>>>> doing, this would explain why shutting down SL1 (by moving it to VL15) >>>>>> prevented all interfaces from running. >>>>> What SM are you using? >>>> OpenSM 3.3.2 >>>>> Does it have the following bug fix: >>>>> >>>>> http://www.openfabrics.org/git/?p=~sashak/management.git;a=commit;h=ef4c8ac3fdd50bb0b7af06887abdb5b73b7ed8c3 >>>>> >>>>> >>>> Yes it does. >>>> >>>> The most recent git commit (sorted by date) is for this rpm is: >>>> * Sun Aug 23 2009 Sasha Khapyorsky <sashak-smomgflXvOZWk0Htik3J/w@public.gmane.org> >>>> commit 3f4954c73add5e7b598883242782607f87c482b4 >>> OK, in that case I ran out of ideas. Need to debug. >>> We can do it here, but best would be if you open a >>> bug at bugzilla. >>> Please run opensm as follows: >>> >>> opensm -Q -Y <qos_policy_file> -P <partition_config_file> -e -V -s 0 -d1 & >>> >>> Wait a minute or so, try your test, and attach OSM >>> log to the issue. 
>>> >>> -- Yevgeny >>> >>> >>> >>>> Apart from the following commit (with a bogus date): >>>> * Tue Jul 24 2035 Keshetti Mahesh <keshetti.mahesh-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> >>>> commit a0c23ed2194e96816744a075d405ff34c8373fa3 >>>> >>>> Thanks, >>>> >>>> Vincent >>>>> -- Yevgeny >>>>> >>>>>>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c >>>>>>>> pichu16-ic0 -t >>>>>>>> 10 -P 8 2>&1; done | grep SUM >>>>>>>> [SUM] 0.0-10.0 sec 6.15 GBytes 5.28 Gbits/sec >>>>>>>> [SUM] 0.0-10.0 sec 6.00 GBytes 5.16 Gbits/sec >>>>>>>> [SUM] 0.0-10.1 sec 5.38 GBytes 4.59 Gbits/sec >>>>>>>> >>>>>>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c >>>>>>>> pichu16-backbone >>>>>>>> -t 10 -P 8 2>&1; done | grep SUM >>>>>>>> [SUM] 0.0-10.0 sec 6.09 GBytes 5.23 Gbits/sec >>>>>>>> [SUM] 0.0-10.0 sec 6.41 GBytes 5.51 Gbits/sec >>>>>>>> [SUM] 0.0-10.0 sec 4.72 GBytes 4.05 Gbits/sec >>>>>>>> >>>>>>>> [root@pichu22 ~]# while test -e keep_going; do iperf -c >>>>>>>> pichu16-admin -t >>>>>>>> 10 -P 8 2>&1; done | grep SUM >>>>>>>> [SUM] 0.0-10.1 sec 6.96 GBytes 5.92 Gbits/sec >>>>>>>> [SUM] 0.0-10.1 sec 5.89 GBytes 5.00 Gbits/sec >>>>>>>> [SUM] 0.0-10.0 sec 5.35 GBytes 4.58 Gbits/sec >>>>>>>> >>>>>>>>> and then >>>>>>>>> qos_sl2vl 0,1,2,15,4,5,6,7,8,9,10,11,12,13,14,15 >>>>>>>> Same results as the previous 0,1,15,3,... SL2vl mapping. >>>>>>>>> If this part works well, then we will continue to >>>>>>>>> reason no. 2. >>>>>>>> In the above tests, I used -P8 to force 8 threads on the client >>>>>>>> side for >>>>>>>> each test. >>>>>>>> I have one quad core CPU(Intel E55400). >>>>>>>> This makes 24 iperf threads on 4 cores, which __should__ be fine >>>>>>>> (well I >>>>>>>> suppose ...) >>>>>>> Best would be having one qperf per CPU core, >>>>>>> which is 4 qperf's in your case. >>>>>>> >>>>>>> What is your subnet setup? 
>>>>>> Nothing fancy for this test: I just bounce the taffic through a switch; >>>>>> >>>>>> [root@pichu16 ~]# ibtracert 49 53 >>>>>>> From ca {0x2c9000100d00056c} portnum 1 lid 49-49 "pichu16 HCA-1" >>>>>> [1] -> switch port {0x0002c9000100d0d4}[22] lid 58-58 "bullX chassis 36 >>>>>> port QDR switch" >>>>>> [28] -> ca port {0x2c9000100d000679}[1] lid 53-53 "pichu22 HCA-1" >>>>>> To ca {0x2c9000100d000678} portnum 1 lid 53-53 "pichu22 HCA-1" >>>>>> >>>>>> Vincent >>>>>> >>>>>>> -- Yevgeny >>>>>>> >>>>>>> >>>>>>>> And regarding reason #3. I still get the error I got yesterday, which >>>>>>>> you told me was not important because the SL's set in partitions.conf >>>>>>>> would override what was read from qos-policy.conf in the first place. >>>>>>>> >>>>>>>> Nov 25 13:13:05 664690 [373E910] 0x01 -> >>>>>>>> __qos_policy_validate_pkey: ERR >>>>>>>> AC15: pkey 0x0002 in match rule - overriding partition SL (0) with >>>>>>>> QoS >>>>>>>> Level SL (3) >>>>>>>> Nov 25 13:13:05 664681 [373E910] 0x01 -> >>>>>>>> __qos_policy_validate_pkey: ERR >>>>>>>> AC15: pkey 0x0001 in match rule - overriding partition SL (0) with >>>>>>>> QoS >>>>>>>> Level SL (2) >>>>>>>> Nov 25 13:13:05 664670 [373E910] 0x01 -> >>>>>>>> __qos_policy_validate_pkey: ERR >>>>>>>> AC15: pkey 0x7FFF in match rule - overriding partition SL (0) with >>>>>>>> QoS >>>>>>>> Level SL (1) >>>>>>>> >>>>>>>> Thanks for your help. 
>>>>>>>>
>>>>>>>> Vincent
>>>>>>>>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread
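The membership-bit arithmetic discussed in the message above is easy to get wrong: full membership sets the membership bit, which is bit 15 of the 16-bit pkey (i.e. `0x8000`), so `ib0.8001` corresponds to pkey `0x0001` with full membership. A small sketch to double-check a mapping, with the interface name `ib0` taken from the configs above (`suffix_for_pkey` is a hypothetical helper name, not a real tool):

```shell
# Sketch: derive an IPoIB child interface suffix from a partition pkey.
# Full membership sets the membership bit -- bit 15 of the 16-bit pkey --
# so the child interface suffix is (pkey | 0x8000).
suffix_for_pkey() {
    printf 'ib0.%04x\n' $(( $1 | 0x8000 ))
}
suffix_for_pkey 0x0001   # ib0.8001 (the ip_backbone partition above)
suffix_for_pkey 0x0002   # ib0.8002 (the ip_admin partition above)
```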
[parent not found: <4B177F10.1040908-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>]
* Re: QoS settings not mapped correctly per pkey ?
  [not found]         ` <4B177F10.1040908-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2009-12-03  9:08           ` sebastien dugue
  0 siblings, 0 replies; 16+ messages in thread
From: sebastien dugue @ 2009-12-03  9:08 UTC (permalink / raw)
  To: kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb
  Cc: Vincent Ficet, linux-rdma-u79uwXL29TY76Z2rM5mHXA, BOURDE CELINE

On Thu, 03 Dec 2009 11:04:16 +0200
Yevgeny Kliteynik <kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:

> sebastien dugue wrote:
> > Hi Yevgeny,
> >
> > On Thu, 03 Dec 2009 10:01:28 +0200
> > Yevgeny Kliteynik <kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
> >
> >> Sebastien,
> >>
> >> I noticed that you found the problem in the IPoIB child
> >> interfaces configuration. Glad that this worked out well.
> >>
> >> My question is about the note that you left in the issue:
> >>
> >>   "It looks like in 'datagram' mode, the SL weights
> >>    do not seem to be applied, or maybe this is an
> >>    artifact of IPoIB in 'datagram mode'"
> >>
> >> Have you checked that in this mode you do get the right
> >> SL for each child interface by shutting off the relevant
> >> SL (mapping it to VL15)?
> >
> > Yes, SL to VL mapping is OK.
> >
> >> If yes, then what you're saying is that you see that
> >> the interfaces use the right SL and VL, but you don't see
> >> any arbitration between VLs?
> >
> > Right, whatever weights I put in the vlarb tables have absolutely
> > no effect when IPoIB is in datagram mode. I don't know if it's
> > an arbitration problem (I don't think so) or an IPoIB problem.
> >
> > It could be that, due to the 2044-byte MTU in datagram mode, iperf
> > spends much time not doing transfers and fails to provide enough
> > data to the interfaces. I don't know.
> >
> > Once I switched to connected mode, with a 65520-byte MTU, things
> > started to work OK, with a much better overall combined bandwidth.
>
> OK, then "a much better overall combined bandwidth" is the
> answer here. VL arbitration kicks in only when you saturate
> the link. If you don't, there's no point doing arbitration,
> because the HW is able to serve any packet that comes in
> without the need to prioritize.

Yep, that's the conclusion I came to.

It might be interesting to find where the bottleneck is, though,
that prevents saturating the link.

  Sébastien.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread
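Yevgeny's point also predicts what the earlier iperf runs should have shown under load: the `qos_vlarb_low` weights tried earlier in the thread (`0:1,1:64,2:128,3:192`) only turn into a bandwidth split once the link is saturated. As a rough sketch of that split under full saturation (ignoring the high-priority table, `qos_high_limit`, and per-packet quantization, so these are idealized shares, not measured ones):

```shell
# Sketch: expected saturated-link share per VL from the qos_vlarb_low
# weights tried earlier in the thread (VL0:1, VL1:64, VL2:128, VL3:192).
awk 'BEGIN {
    split("1 64 128 192", w, " ")
    total = 0
    for (i = 1; i <= 4; i++) total += w[i]            # 385
    for (i = 1; i <= 4; i++)
        printf "VL%d: weight %3d -> %.1f%% of link\n", i-1, w[i], 100*w[i]/total
}'
```

Under saturation VL3 should get roughly half the link (192/385 ~ 49.9%), VL2 a third, and VL0 almost nothing; roughly equal per-network bandwidth, as measured above, is the expected signature of an unsaturated link.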
* Re: QoS settings not mapped correctly per pkey ?
  [not found]     ` <4B177058.9070909-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2009-12-03  8:17       ` sebastien dugue
@ 2009-12-03  8:21       ` Or Gerlitz
  [not found]         ` <4B1774F0.9060002-hKgKHo2Ms0FWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 16+ messages in thread
From: Or Gerlitz @ 2009-12-03  8:21 UTC (permalink / raw)
  To: kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, sebastien dugue
  Cc: Vincent Ficet, linux-rdma-u79uwXL29TY76Z2rM5mHXA, BOURDE CELINE

Yevgeny Kliteynik wrote:
>   "It looks like in 'datagram' mode, the SL weights
>    do not seem to be applied, or maybe this is an
>    artifact of IPoIB in 'datagram mode'"

Yes, there's no reason for connected mode to behave differently from
datagram mode with regard to QoS/SL assignment from the SM, as both
modes get their SL from the path record provided by the SM, and both
modes use the same code for the path query...

> Have you checked that in this mode you do get the right
> SL for each child interface by shutting off the relevant
> SL (mapping it to VL15)?

Seeing what SL is provided by the SM in return to the path query is
trivial, either through the opensm logs or the ipoib ones. E.g. here
you see that ib1 got SL 0 on its path to GID
fe80:0000:0000:0000:0008:f104:0399:3c92, LID 0x0006, which is
10.10.0.91:

> ifdown ib1
> echo 1 > /sys/module/ib_ipoib/parameters/debug_level
> ifup ib1
> ping 10.10.0.91
> dmesg | grep ib1

> ib1: Start path record lookup for fe80:0000:0000:0000:0008:f104:0399:3c92 MTU > 0
> ib1: PathRec LID 0x0006 for GID fe80:0000:0000:0000:0008:f104:0399:3c92
> ib1: Created ah ffff81021ddda180
> ib1: created address handle ffff81021ddda500 for LID 0x0006, SL 0

> # ip neigh show dev ib1
> 10.10.0.91 lladdr 80:00:00:49:fe:80:00:00:00:00:00:00:00:08:f1:04:03:99:3c:92 REACHABLE
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread
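The `dmesg` check Or describes can be condensed into a filter. A sketch, assuming the exact ipoib debug message format quoted above (it may differ across kernel versions); `extract_sls` is a hypothetical helper name:

```shell
# Sketch: pull the per-destination SL out of ipoib debug messages of the
# form "ib1: created address handle ... for LID 0x0006, SL 0".
extract_sls() {
    sed -n 's/.*\(ib[0-9.]*\): created address handle .* for LID \(0x[0-9a-fA-F]*\), SL \([0-9]*\).*/\1 LID \2 SL \3/p'
}

# Example against the line quoted above:
echo "ib1: created address handle ffff81021ddda500 for LID 0x0006, SL 0" | extract_sls
# -> ib1 LID 0x0006 SL 0
```

Running `dmesg | extract_sls` after enabling `debug_level` then gives one line per destination, which makes it quick to confirm each child interface resolved to the SL its pkey was assigned in qos-policy.conf.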
[parent not found: <4B1774F0.9060002-hKgKHo2Ms0FWk0Htik3J/w@public.gmane.org>]
* Re: QoS settings not mapped correctly per pkey ?
  [not found]         ` <4B1774F0.9060002-hKgKHo2Ms0FWk0Htik3J/w@public.gmane.org>
@ 2009-12-03  9:05           ` sebastien dugue
  0 siblings, 0 replies; 16+ messages in thread
From: sebastien dugue @ 2009-12-03  9:05 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb, Vincent Ficet,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, BOURDE CELINE

Hi Or,

On Thu, 03 Dec 2009 10:21:04 +0200
Or Gerlitz <ogerlitz-hKgKHo2Ms0FWk0Htik3J/w@public.gmane.org> wrote:

> Yevgeny Kliteynik wrote:
> >   "It looks like in 'datagram' mode, the SL weights
> >    do not seem to be applied, or maybe this is an
> >    artifact of IPoIB in 'datagram mode'"
>
> Yes, there's no reason for connected mode to behave differently with
> regard to QoS/SL assignment from the SM, as both modes get their SL
> from the path record provided by the SM and both modes use the same
> code for the path query...

Right, that was my gut feeling.

> > Have you checked that in this mode you do get the right
> > SL for each child interface by shutting off the relevant
> > SL (mapping it to VL15)?
>
> Seeing what SL is provided by the SM in return to the path query is
> trivial, either through the opensm logs or the ipoib ones. E.g. here
> you see that ib1 got SL 0 on its path to GID
> fe80:0000:0000:0000:0008:f104:0399:3c92, LID 0x0006, which is
> 10.10.0.91

Just checked: the SLs provided by the path queries are OK in datagram
mode. There might be some overhead somewhere (in iperf or in the IPoIB
layer) that prevents using the link's full bandwidth in datagram mode.

Thanks,

  Sébastien.

> > ifdown ib1
> > echo 1 > /sys/module/ib_ipoib/parameters/debug_level
> > ifup ib1
> > ping 10.10.0.91
> > dmesg | grep ib1
>
> > ib1: Start path record lookup for fe80:0000:0000:0000:0008:f104:0399:3c92 MTU > 0
> > ib1: PathRec LID 0x0006 for GID fe80:0000:0000:0000:0008:f104:0399:3c92
> > ib1: Created ah ffff81021ddda180
> > ib1: created address handle ffff81021ddda500 for LID 0x0006, SL 0
>
> > # ip neigh show dev ib1
> > 10.10.0.91 lladdr 80:00:00:49:fe:80:00:00:00:00:00:00:00:08:f1:04:03:99:3c:92 REACHABLE
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread
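The datagram-mode bottleneck Sébastien suspects has a plausible arithmetic side: at the 2044-byte datagram MTU, saturating the link takes roughly 32x the packet rate that the 65520-byte connected-mode MTU needs, so per-packet overhead (in iperf or the IPoIB layer) can cap throughput well below line rate. A rough sketch, assuming ~32 Gbit/s of usable QDR bandwidth; that figure is an illustrative assumption, not a number from the thread:

```shell
# Sketch: packet rate needed to fill an assumed ~32 Gbit/s of usable QDR
# bandwidth at the two IPoIB MTUs discussed above.
awk 'BEGIN {
    link_bps = 32e9                      # assumption, not a measured figure
    split("2044 65520", mtu, " ")        # datagram vs connected mode MTU
    for (i = 1; i <= 2; i++) {
        pps = link_bps / (mtu[i] * 8)
        printf "MTU %5d: %.2f Mpkt/s to saturate the link\n", mtu[i], pps / 1e6
    }
}'
```

Nearly two million packets per second is a heavy per-packet load for a 2009-era stack, which would fit the observation that the link only saturated (and VL arbitration only became visible) once connected mode raised the MTU.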
end of thread, other threads:[~2009-12-03 9:08 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-25 10:57 QoS settings not mapped correctly per pkey ? Vincent Ficet
[not found] ` <4B0D0DB2.6080802-6ktuUTfB/bM@public.gmane.org>
2009-11-25 12:12 ` Yevgeny Kliteynik
[not found] ` <4B0D1F36.1090007-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2009-11-25 14:01 ` Vincent Ficet
[not found] ` <4B0D38C7.3080505-6ktuUTfB/bM@public.gmane.org>
2009-11-25 14:37 ` Yevgeny Kliteynik
[not found] ` <4B0D410E.2010903-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2009-11-25 15:14 ` Vincent Ficet
[not found] ` <4B0D49F0.6060400-6ktuUTfB/bM@public.gmane.org>
2009-11-25 15:45 ` Yevgeny Kliteynik
[not found] ` <4B0D5110.70606-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2009-11-26 7:57 ` Vincent Ficet
[not found] ` <4B0E34EB.6020403-6ktuUTfB/bM@public.gmane.org>
2009-11-26 8:25 ` Yevgeny Kliteynik
[not found] ` <4B0E3B63.40705-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2009-11-26 8:49 ` Vincent Ficet
[not found] ` <4B0E4105.5080107-6ktuUTfB/bM@public.gmane.org>
2009-11-26 9:56 ` Yevgeny Kliteynik
[not found] ` <4B0E50D6.8020401-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2009-12-03 8:01 ` Yevgeny Kliteynik
[not found] ` <4B177058.9070909-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2009-12-03 8:17 ` sebastien dugue
2009-12-03 9:04 ` Yevgeny Kliteynik
[not found] ` <4B177F10.1040908-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2009-12-03 9:08 ` sebastien dugue
2009-12-03 8:21 ` Or Gerlitz
[not found] ` <4B1774F0.9060002-hKgKHo2Ms0FWk0Htik3J/w@public.gmane.org>
2009-12-03 9:05 ` sebastien dugue
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox