From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Robert Hoo <robert.hu@intel.com>
Cc: brouer@redhat.com, robert.hu@linux.intel.com,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: [PATCH] pktgen: add a new sample script for 40G and above link testing
Date: Fri, 25 Aug 2017 11:19:21 +0200
Message-ID: <20170825111921.061713c8@redhat.com>
In-Reply-To: <1503127531-134546-1-git-send-email-robert.hu@intel.com>


(please don't use BCC on the netdev list, replies might miss the list in cc)

Comments inlined below:

On Fri, 25 Aug 2017 10:24:30 +0800 Robert Hoo <robert.hu@intel.com> wrote:

> From: Robert Ho <robert.hu@intel.com>
> 
> It's hard to benchmark 40G+ network bandwidth using ordinary
> tools like iperf, netperf. I then tried with pktgen multiqueue sample
> scripts, but still cannot reach line rate.

The pktgen_sample02_multiqueue.sh does not use burst or skb_cloning.
Thus, the performance will suffer.

See the samples that use the burst feature:
  pktgen_sample03_burst_single_flow.sh
  pktgen_sample05_flow_per_thread.sh

With the pktgen "burst" feature, I can easily generate 40G.  Generating
100G is also possible, but often you will hit some HW limits before the
pktgen limit.  I experienced hitting both (1) PCIe Gen3 x8 limit, and (2)
memory bandwidth limit.
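
As a rough sketch of why a PCIe Gen3 x8 slot tops out below 100G: the
arithmetic below assumes the standard Gen3 figures of 8 GT/s per lane and
128b/130b line encoding, and ignores TLP/DLLP protocol overhead, so real
throughput is lower still.  The helper name pcie_gen3_gbps is made up for
illustration.

```shell
#!/bin/bash
# Approximate PCIe Gen3 link bandwidth per direction, in Gbit/s.
# Assumes 8 GT/s per lane and 128b/130b encoding; protocol overhead
# (TLP headers, DLLPs, flow control) is NOT subtracted.
pcie_gen3_gbps()
{
    local lanes=$1
    # bash arithmetic is integer-only, so let awk do the division
    awk -v lanes="$lanes" 'BEGIN { printf "%.0f\n", 8 * lanes * 128 / 130 }'
}

echo "PCIe Gen3 x8:  ~$(pcie_gen3_gbps 8) Gbit/s"
echo "PCIe Gen3 x16: ~$(pcie_gen3_gbps 16) Gbit/s"
```

With these assumptions an x8 link lands around 63 Gbit/s, enough for 40G
but not for 100G, while x16 lands around 126 Gbit/s.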


> I then derived this NUMA awared irq affinity sample script from
> multi-queue sample one, successfully benchmarked 40G link. I think this can
> also be useful for 100G reference, though I haven't got device to test.

Okay, so your issue was really related to NUMA irq affinity.  I do feel
that IRQ tuning lives outside the realm of the pktgen scripts, but
looking closer at your script, it doesn't look like you change the
IRQ setting, which is good.

You introduce some helper functions that make it possible to extract
NUMA information in the shell script code, really cool.  I would like
to see these functions integrated into the functions.sh file.

 
> This script simply does:
> Detect $DEV's NUMA node belonging.
> Bind each thread (processor from that NUMA node) with each $DEV queue's
> irq affinity, 1:1 mapping.
> How many '-t' threads input determines how many queues will be
> utilized.
> 
> Tested with Intel XL710 NIC with Cisco 3172 switch.
> 
> It would be even slightly better if the irqbalance service is turned
> off outside.

Yes, if you don't turn off (kill) irqbalance, it will move the
IRQs around behind your back...

 
> Referrences:
> https://people.netfilter.org/hawk/presentations/LCA2015/net_stack_challenges_100G_LCA2015.pdf
> http://www.intel.cn/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf
> 
> Signed-off-by: Robert Hoo <robert.hu@intel.com>
> ---
>  ...tgen_sample06_numa_awared_queue_irq_affinity.sh | 132 +++++++++++++++++++++
>  1 file changed, 132 insertions(+)
>  create mode 100755 samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
> 
> diff --git a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
> new file mode 100755
> index 0000000..f0ee25c
> --- /dev/null
> +++ b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
> @@ -0,0 +1,132 @@
> +#!/bin/bash
> +#
> +# Multiqueue: Using pktgen threads for sending on multiple CPUs
> +#  * adding devices to kernel threads which are in the same NUMA node
> +#  * bound devices queue's irq affinity to the threads, 1:1 mapping
> +#  * notice the naming scheme for keeping device names unique
> +#  * nameing scheme: dev@thread_number
> +#  * flow variation via random UDP source port
> +#
> +basedir=`dirname $0`
> +source ${basedir}/functions.sh
> +root_check_run_with_sudo "$@"
> +#
> +# Required param: -i dev in $DEV
> +source ${basedir}/parameters.sh
> +
> +get_iface_node()
> +{
> +	echo `cat /sys/class/net/$1/device/numa_node`

Here you could use the following shell trick to avoid using "cat":

 echo $(</sys/class/net/$1/device/numa_node)

It looks like you don't handle the case of -1, which indicates a non-NUMA
system.  You need to use something like::

get_iface_node()
{
    local node=$(</sys/class/net/$1/device/numa_node)
    if [[ $node == -1 ]]; then
	echo 0
    else
	echo $node
    fi
}


> +}
> +
> +get_iface_irqs()
> +{
> +	local IFACE=$1
> +	local queues="${IFACE}-.*TxRx"
> +
> +	irqs=$(grep "$queues" /proc/interrupts | cut -f1 -d:)
> +	[ -z "$irqs" ] && irqs=$(grep $IFACE /proc/interrupts | cut -f1 -d:)
> +	[ -z "$irqs" ] && irqs=$(for i in `ls -Ux /sys/class/net/$IFACE/device/msi_irqs` ;\
> +		do grep "$i:.*TxRx" /proc/interrupts | grep -v fdir | cut -f 1 -d : ;\
> +	    done)

Nice that you handle all these different methods.  I personally look
in /proc/irq/*/$IFACE*/../smp_affinity_list , like (copy-paste):

echo " --- Align IRQs ---"
# I've named my NICs ixgbe1 + ixgbe2
for F in /proc/irq/*/ixgbe*-TxRx-*/../smp_affinity_list; do
   # Extract irqname e.g. "ixgbe2-TxRx-2"
   irqname=$(basename $(dirname $(dirname $F))) ;
   # Substring pattern removal
   hwq_nr=${irqname#*-*-}
   echo $hwq_nr > $F
   #grep . -H $F;
done
grep -H . /proc/irq/*/ixgbe*/../smp_affinity_list
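
The "substring pattern removal" step in the loop above can be tried in
isolation.  This is a minimal sketch; the function name irq_to_queue and
the sample IRQ names are made up for illustration.

```shell
#!/bin/bash
# Extract the hardware queue number from an IRQ name like "ixgbe2-TxRx-2".
irq_to_queue()
{
    local irqname=$1
    # "#" removes the shortest prefix matching the glob "*-*-",
    # i.e. everything up to and including the second dash
    echo "${irqname#*-*-}"
}

for name in ixgbe1-TxRx-0 ixgbe2-TxRx-2 ixgbe2-TxRx-11; do
    echo "$name -> queue $(irq_to_queue "$name")"
done
```

Note this relies on the interface name itself containing no dash; a name
with an extra dash would leave part of the IRQ name in the result.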

Maybe I should switch to use:
   /sys/class/net/$IFACE/device/msi_irqs/*
 

> +	[ -z "$irqs" ] && echo "Error: Could not find interrupts for $IFACE"

In the error case you should let the script die.  There is a helper
function for this called "err" (where first arg is the exitcode, which
is useful to detect the reason your script failed).
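
A minimal sketch of that die-on-error pattern follows.  The err function
here is a local stand-in whose behaviour (message to stderr, then exit
with the given code) is an assumption about the helper in functions.sh.
Only the first lookup method from the patch is kept, exit code 3 is an
arbitrary choice, and the optional second argument is added purely so the
function can be tried without a real NIC.

```shell
#!/bin/bash
# Stand-in for the "err" helper (assumed: print to stderr, exit with code)
err()
{
    local exitcode=$1
    shift
    echo "ERROR: $*" >&2
    exit "$exitcode"
}

# Simplified get_iface_irqs that dies when no IRQs are found.
# $2 (file to parse) is an illustration-only addition.
get_iface_irqs()
{
    local IFACE=$1
    local file=${2:-/proc/interrupts}
    local irqs

    irqs=$(grep "${IFACE}-.*TxRx" "$file" | cut -f1 -d:)
    [ -z "$irqs" ] && err 3 "Could not find interrupts for $IFACE"
    # Unquoted on purpose: collapses newlines and padding to single spaces
    echo $irqs
}
```

Since the patch calls the function inside a command substitution, the
exit only terminates that subshell.  The caller would still need to check
the status, for example: irqs=$(get_iface_irqs $DEV) || exit $?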


> +	echo $irqs
> +}

> +get_node_cpus()
> +{
> +	local node=$1
> +	local node_cpu_list
> +	local node_cpu_range_list=`cut -f1- -d, --output-delimiter=" " \
> +			/sys/devices/system/node/node$node/cpulist`
> +
> +	for cpu_range in $node_cpu_range_list
> +	do
> +		node_cpu_list="$node_cpu_list "`seq -s " " ${cpu_range//-/ }`
> +	done
> +
> +	echo $node_cpu_list
> +}
> +
> +
> +# Base Config
> +DELAY="0"        # Zero means max speed
> +COUNT="20000000"   # Zero means indefinitely
> +[ -z "$CLONE_SKB" ] && CLONE_SKB="0"
> +
> +# Flow variation random source port between min and max
> +UDP_MIN=9
> +UDP_MAX=109
> +
> +node=`get_iface_node $DEV`
> +irq_array=(`get_iface_irqs $DEV`)
> +cpu_array=(`get_node_cpus $node`)

Nice trick to generate an array.
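
The array trick combined with the cpulist expansion can be sketched
standalone.  The function name expand_cpulist is made up, and it takes
the cpulist string as an argument instead of reading
/sys/devices/system/node/nodeN/cpulist, so it runs on any machine.

```shell
#!/bin/bash
# Expand a cpulist string such as "0-3,8-11" into individual CPU numbers.
expand_cpulist()
{
    local cpu_range out=""
    # ${1//,/ } turns commas into spaces: one range per loop word
    for cpu_range in ${1//,/ }; do
        # "0-3" runs "seq 0 3"; for a lone "5" both expansions leave the
        # value untouched, giving "seq 5 5" rather than "seq 5"
        out="$out $(seq -s ' ' "${cpu_range%-*}" "${cpu_range#*-}")"
    done
    # Unquoted on purpose: normalises whitespace between the numbers
    echo $out
}

# The outer parentheses split the command output on whitespace into an array
cpu_array=($(expand_cpulist "0-3,8-11"))
echo "count=${#cpu_array[@]} first=${cpu_array[0]} last=${cpu_array[7]}"
```

Passing both bounds to seq matters for a single-CPU entry: "seq 5" on its
own counts from 1 to 5, which would add CPUs that are not on the node.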

> +
> +[ $THREADS -gt ${#irq_array[*]} -o $THREADS -gt ${#cpu_array[*]}  ] && \
> +	err 1 "Thread number $THREADS exceeds: min (${#irq_array[*]},${#cpu_array[*]})"
> +
> +# (example of setting default params in your script)
> +if [ -z "$DEST_IP" ]; then
> +    [ -z "$IP6" ] && DEST_IP="198.18.0.42" || DEST_IP="FD00::1"
> +fi
> +[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
> +
> +# General cleanup everything since last run
> +pg_ctrl "reset"
> +
> +# Threads are specified with parameter -t value in $THREADS
> +for ((i = 0; i < $THREADS; i++)); do
> +    # The device name is extended with @name, using thread number to
> +    # make then unique, but any name will do.
> +    # Set the queue's irq affinity to this $thread (processor)
> +    thread=${cpu_array[$i]}
> +    dev=${DEV}@${thread}
> +    echo $thread > /proc/irq/${irq_array[$i]}/smp_affinity_list
> +    echo "irq ${irq_array[$i]} is set affinity to `cat /proc/irq/${irq_array[$i]}/smp_affinity_list`"
> +
> +    # Add remove all other devices and add_device $dev to thread
> +    pg_thread $thread "rem_device_all"
> +    pg_thread $thread "add_device" $dev
> +
> +    # select queue and bind the queue and $dev in 1:1 relationship
> +    queue_num=$i
> +    echo "queue number is $queue_num"
> +    pg_set $dev "queue_map_min $queue_num"
> +    pg_set $dev "queue_map_max $queue_num"
> +
> +    # Notice config queue to map to cpu (mirrors smp_processor_id())
> +    # It is beneficial to map IRQ /proc/irq/*/smp_affinity 1:1 to CPU number
> +    pg_set $dev "flag QUEUE_MAP_CPU"
> +
> +    # Base config of dev
> +    pg_set $dev "count $COUNT"
> +    pg_set $dev "clone_skb $CLONE_SKB"
> +    pg_set $dev "pkt_size $PKT_SIZE"
> +    pg_set $dev "delay $DELAY"
> +
> +    # Flag example disabling timestamping
> +    pg_set $dev "flag NO_TIMESTAMP"
> +
> +    # Destination
> +    pg_set $dev "dst_mac $DST_MAC"
> +    pg_set $dev "dst$IP6 $DEST_IP"
> +
> +    # Setup random UDP port src range
> +    pg_set $dev "flag UDPSRC_RND"
> +    pg_set $dev "udp_src_min $UDP_MIN"
> +    pg_set $dev "udp_src_max $UDP_MAX"
> +done
> +
> +# start_run
> +echo "Running... ctrl^C to stop" >&2
> +pg_ctrl "start"
> +echo "Done" >&2
> +
> +# Print results
> +for ((i = 0; i < $THREADS; i++)); do
> +    thread=${cpu_array[$i]}
> +    dev=${DEV}@${thread}
> +    echo "Device: $dev"
> +    cat /proc/net/pktgen/$dev | grep -A2 "Result:"
> +done



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
