From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yevgeny Kliteynik
Subject: Re: QoS settings not mapped correctly per pkey ?
Date: Wed, 25 Nov 2009 14:12:38 +0200
Message-ID: <4B0D1F36.1090007@dev.mellanox.co.il>
References: <4B0D0DB2.6080802@bull.net>
Reply-To: kliteyn-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <4B0D0DB2.6080802-6ktuUTfB/bM@public.gmane.org>
Sender: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Vincent Ficet
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, BOURDE CELINE
List-Id: linux-rdma@vger.kernel.org

Hi Vincent,

Vincent Ficet wrote:
> Hello,
>
> Following the QoS experiments I carried out yesterday, I wanted to set
> up 3 IP networks, each one bound to a particular pkey, in order to
> achieve QoS for each network.
> Unfortunately, it seems that something is not mapped properly in the ULP
> layers (vlarb tables are fine).
>
> The settings are as follows:
>
> opensm.conf:
> ------------
>
> qos_max_vls 8
> qos_high_limit 1
> qos_vlarb_high 0:0,1:0,2:0,3:0,4:0,5:0
> qos_vlarb_low 0:8,1:1,2:1,3:4,4:0,5:0
> qos_sl2vl 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15

Please check section 7 of the QoS_management_in_OpenSM.txt doc.
It explains exactly what the values in the VLArb table mean.
It also explains the problem that you're seeing.

Quoting from there:

"Keep in mind that ports usually transmit packets of size equal to
MTU. For instance, for 4KB MTU a single packet will require 64
credits, so in order to achieve effective VL arbitration for
packets of 4KB MTU, the weighting values for each VL should be
multiples of 64."
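To make the arithmetic in that quote concrete, here is a rough Python
sketch (not taken from OpenSM). It assumes that one VLArb weight unit
covers 64 bytes, which is what the "4KB MTU = 64 credits" figure implies,
and that a weight is an 8-bit value (0-255). The function names are made
up for illustration.

```python
CREDIT_BYTES = 64  # assumption: one weight unit (credit) covers 64 bytes
MAX_WEIGHT = 255   # assumption: a VLArb weight is an 8-bit field


def credits_per_packet(mtu_bytes):
    """Weight units consumed by one MTU-sized packet, rounded up."""
    return -(-mtu_bytes // CREDIT_BYTES)


def scale_weights(ratios, mtu_bytes):
    """Turn relative per-VL ratios into weights that are whole packets.

    ratios maps a VL number to its relative share, e.g. {1: 1, 2: 1, 3: 4}.
    """
    unit = credits_per_packet(mtu_bytes)
    weights = {vl: ratio * unit for vl, ratio in ratios.items()}
    too_big = {vl: w for vl, w in weights.items() if w > MAX_WEIGHT}
    if too_big:
        # A larger share would have to be spread over several table entries.
        raise ValueError(f"weights exceed {MAX_WEIGHT}: {too_big}")
    return weights


def format_vlarb(weights):
    """Render weights in the VL:weight list form used in opensm.conf."""
    return ",".join(f"{vl}:{w}" for vl, w in sorted(weights.items()))


if __name__ == "__main__":
    # 2048 is the assumed IB MTU behind an IPoIB MTU of 2044.
    print(credits_per_packet(4096))
    print(format_vlarb(scale_weights({1: 1, 2: 1, 3: 4}, 2048)))
```

With a 2KB IB MTU a packet costs 32 units under these assumptions, so
weights of 1, 1 and 4 are all smaller than a single packet.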
-- Yevgeny

> The corresponding VLArb tables are fine on both the server (pichu16) and
> the client (pichu22):
>
> [root@pichu22 network-scripts]# smpquery vlarb -D 0
> # VLArbitration tables: DR path slid 65535; dlid 65535; 0 port 0 LowCap
> 8 HighCap 8
> # Low priority VL Arbitration Table:
> VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
> WEIGHT: |0x8 |0x1 |0x1 |0x4 |0x0 |0x0 |0x0 |0x0 |
> # High priority VL Arbitration Table:
> VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
> WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
>
> [root@pichu16 ~]# smpquery vlarb -D 0
> # VLArbitration tables: DR path slid 65535; dlid 65535; 0 port 0 LowCap
> 8 HighCap 8
> # Low priority VL Arbitration Table:
> VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
> WEIGHT: |0x8 |0x1 |0x1 |0x4 |0x0 |0x0 |0x0 |0x0 |
> # High priority VL Arbitration Table:
> VL    : |0x0 |0x1 |0x2 |0x3 |0x4 |0x5 |0x0 |0x0 |
> WEIGHT: |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |0x0 |
>
> partitions.conf:
> ---------------
>
> default=0x7fff,ipoib : ALL=full;
> ip_backbone=0x0001,ipoib : ALL=full;
> ip_admin=0x0002,ipoib : ALL=full;
>
> qos-policy.conf:
> ---------------
>
> qos-ulps
> default : 0 # default SL
> ipoib, pkey 0x7FFF : 1 # IP with default pkey 0x7FFF
> ipoib, pkey 0x1 : 2 # backbone IP with pkey 0x1
> ipoib, pkey 0x2 : 3 # admin IP with pkey 0x2
> end-qos-ulps
>
> Assigned IP addresses (in /etc/hosts):
> -------------------------------------
>
> 10.12.1.4 pichu16-ic0 # default IPoIB network, pkey 0x7FFF
> 10.13.1.4 pichu16-backbone # IPoIB backbone network, pkey 0x1
> 10.14.1.4 pichu16-admin # IPoIB admin network, pkey 0x2
> 10.12.1.10 pichu22-ic0 # default IPoIB network, pkey 0x7FFF
> 10.13.1.10 pichu22-backbone # IPoIB backbone network, pkey 0x1
> 10.14.1.10 pichu22-admin # IPoIB admin network, pkey 0x2
>
> Note that the netmask is /16, so the -ic0, -backbone and -admin networks
> cannot see each other.
>
> IPoIB settings on server side:
> ------------------------------
>
> [root@pichu16 ~]# tail -n 5 /etc/sysconfig/network-scripts/ifcfg-ib0*
> ==> /etc/sysconfig/network-scripts/ifcfg-ib0 <==
> BOOTPROTO=static
> IPADDR=10.12.1.4
> NETMASK=255.255.0.0
> ONBOOT=yes
> MTU=2044
>
> ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8001 <==
> BOOTPROTO=static
> IPADDR=10.13.1.4
> NETMASK=255.255.0.0
> ONBOOT=yes
> MTU=2044
>
> ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8002 <==
> BOOTPROTO=static
> IPADDR=10.14.1.4
> NETMASK=255.255.0.0
> ONBOOT=yes
> MTU=2044
>
> [root@pichu16 ~]# ip addr show ib0
> 4: ib0: mtu 2044 qdisc pfifo_fast
> state UP qlen 256
> link/infiniband
> 80:00:00:48:fe:80:00:00:00:00:00:00:2c:90:00:10:0d:00:05:6d brd
> 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
> inet 10.12.1.4/16 brd 10.12.255.255 scope global ib0
> inet 10.13.1.4/16 brd 10.13.255.255 scope global ib0
> inet 10.14.1.4/16 brd 10.14.255.255 scope global ib0
> inet6 fe80::2e90:10:d00:56d/64 scope link
> valid_lft forever preferred_lft forever
>
> IPoIB settings on client side:
> ------------------------------
>
> [root@pichu22 ~]# tail -n 5 /etc/sysconfig/network-scripts/ifcfg-ib0*
> ==> /etc/sysconfig/network-scripts/ifcfg-ib0 <==
> BOOTPROTO=static
> IPADDR=10.12.1.10
> NETMASK=255.255.0.0
> ONBOOT=yes
> MTU=2044
>
> ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8001 <==
> BOOTPROTO=static
> IPADDR=10.13.1.10
> NETMASK=255.255.0.0
> ONBOOT=yes
> MTU=2044
>
> ==> /etc/sysconfig/network-scripts/ifcfg-ib0.8002 <==
> BOOTPROTO=static
> IPADDR=10.14.1.10
> NETMASK=255.255.0.0
> ONBOOT=yes
> MTU=2044
>
> [root@pichu22 ~]# ip addr show ib0
> 48: ib0: mtu 2044 qdisc pfifo_fast
> state UP qlen 256
> link/infiniband
> 80:00:00:48:fe:80:00:00:00:00:00:00:2c:90:00:10:0d:00:06:79 brd
> 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
> inet 10.12.1.10/16 brd 10.12.255.255 scope global ib0
> inet 10.13.1.10/16 brd 10.13.255.255 scope global ib0
> inet 10.14.1.10/16 brd 10.14.255.255 scope global ib0
> inet6 fe80::2e90:10:d00:679/64 scope link
> valid_lft forever preferred_lft forever
>
> Iperf servers on server side:
> -----------------------------
>
> Quoting from iperf help:
> -B, --bind bind to , an interface or multicast address
> -s, --server run in server mode
>
> Each iperf server is bound to a dedicated interface as follows:
>
> [root@pichu16 ~]# iperf -s -B pichu16-backbone
> [root@pichu16 ~]# iperf -s -B pichu16-admin
> [root@pichu16 ~]# iperf -s -B pichu16-ic0
>
> Iperf clients on client side:
> -----------------------------
>
> Quoting from iperf help:
> -c, --client run in client mode, connecting to
> -t, --time # time in seconds to transmit for (default 10 secs)
>
> And each iperf client talks to the corresponding iperf server:
>
> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-ic0 -t
> 100 2>&1; done | grep Gbits/sec
> [ 3] 0.0-100.0 sec 64.6 GBytes 5.55 Gbits/sec
> [ 3] 0.0-100.0 sec 64.5 GBytes 5.54 Gbits/sec
> [ 3] 0.0-100.0 sec 60.5 GBytes 5.20 Gbits/sec
> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-backbone
> -t 100 2>&1; done | grep Gbits/sec
> [ 3] 0.0-100.0 sec 64.8 GBytes 5.57 Gbits/sec
> [ 3] 0.0-100.0 sec 56.7 GBytes 4.87 Gbits/sec
> [ 3] 0.0-100.0 sec 59.7 GBytes 5.13 Gbits/sec
> [root@pichu22 ~]# while test -e keep_going; do iperf -c pichu16-admin -t
> 100 2>&1; done | grep Gbits/sec
> [ 3] 0.0-100.0 sec 57.3 GBytes 4.92 Gbits/sec
> [ 3] 0.0-100.0 sec 61.6 GBytes 5.29 Gbits/sec
> [ 3] 0.0-100.0 sec 62.7 GBytes 5.38 Gbits/sec
>
> Given the VLarb weights assigned (1 for *-ic0 on VL1, 1 for *-backbone
> on VL2 and 4 for *-admin on VL3), we would expect different b/w figures
> for the *-admin network.
> As we can see, all iperf values are the same, showing that QoS is not
> enforced on a per pkey basis.
> It seems to me that something is not mapped properly in the ULP layers.
> Could anyone tell me if I'm wrong here ? If not, is that a known issue ?
>
> Thanks for your help,
>
> Vincent
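
As a rough illustration of why low-priority weights of 1, 1 and 4 may give
every VL the same share, the toy model below may help. It is a deliberate
simplification and is neither OpenSM code nor a description of any
particular HCA. It assumes that every VL always has an MTU-sized packet
queued, that table entries are visited in order, that one weight unit
covers 64 bytes, and that a packet is always sent whole even when it
overshoots the weight left on the entry. simulate_shares is a hypothetical
helper name.

```python
from collections import Counter

CREDIT_BYTES = 64  # assumption: one weight unit (credit) covers 64 bytes


def simulate_shares(table, packet_bytes, rounds=1000):
    """Estimate the packet share per VL for a list of (vl, weight) entries.

    Simplified model: each entry sends whole packets while it has weight
    left, and an entry with weight 0 is skipped.
    """
    cost = -(-packet_bytes // CREDIT_BYTES)  # credits per packet, rounded up
    sent = Counter()
    for _ in range(rounds):
        for vl, weight in table:
            remaining = weight
            while remaining > 0:
                sent[vl] += 1       # the packet goes out whole
                remaining -= cost   # may drop below zero on the last packet
    total = sum(sent.values())
    if total == 0:
        return {}
    return {vl: sent[vl] / total for vl in sorted(sent)}


if __name__ == "__main__":
    # Weights from the quoted opensm.conf for VL1..VL3, assumed 2KB IB MTU.
    print(simulate_shares([(1, 1), (2, 1), (3, 4)], 2048))
    # The same ratios expressed as multiples of one packet (32 credits).
    print(simulate_shares([(1, 32), (2, 32), (3, 128)], 2048))
```

In this model the first table lets each VL send exactly one packet per
pass, so the shares come out equal, while the second table yields the
intended 1:1:4 split.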