* Re: [PATCH net-next] net: phylink: clarify where phylink should be used
From: Andrew Lunn @ 2019-09-14 15:08 UTC (permalink / raw)
To: Russell King
Cc: David S. Miller, Florian Fainelli, Heiner Kallweit,
Jonathan Corbet, netdev, linux-doc
In-Reply-To: <E1i94b6-0008TL-IR@rmk-PC.armlinux.org.uk>
On Sat, Sep 14, 2019 at 10:44:04AM +0100, Russell King wrote:
> Update the phylink documentation to make it clear that phylink is
> designed to be used on the MAC facing side of the link, rather than
> between a SFP and PHY.
>
> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Andrew
^ permalink raw reply
* Re: [PATCH net] ip6_gre: fix a dst leak in ip6erspan_tunnel_xmit
From: William Tu @ 2019-09-14 15:12 UTC (permalink / raw)
To: Xin Long; +Cc: network dev, David Miller
In-Reply-To: <1bfbf329c5b3649a6c6362350a0d609ff184deba.1568367947.git.lucien.xin@gmail.com>
On Fri, Sep 13, 2019 at 2:45 AM Xin Long <lucien.xin@gmail.com> wrote:
>
> In ip6erspan_tunnel_xmit(), if the skb will not be sent out, it has to
> be freed on the tx_err path. Otherwise when deleting a netns, it would
> cause dst/dev to leak, and dmesg shows:
>
> unregister_netdevice: waiting for lo to become free. Usage count = 1
>
> Fixes: ef7baf5e083c ("ip6_gre: add ip6 erspan collect_md mode")
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
LGTM, thanks for the fix!
Acked-by: William Tu <u9012063@gmail.com>
> net/ipv6/ip6_gre.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> index dd2d0b96..d5779d6 100644
> --- a/net/ipv6/ip6_gre.c
> +++ b/net/ipv6/ip6_gre.c
> @@ -968,7 +968,7 @@ static netdev_tx_t ip6erspan_tunnel_xmit(struct sk_buff *skb,
> if (unlikely(!tun_info ||
> !(tun_info->mode & IP_TUNNEL_INFO_TX) ||
> ip_tunnel_info_af(tun_info) != AF_INET6))
> - return -EINVAL;
> + goto tx_err;
>
> key = &tun_info->key;
> memset(&fl6, 0, sizeof(fl6));
> --
> 2.1.0
>
^ permalink raw reply
* [v3 1/3] samples: pktgen: make variable consistent with option
From: Daniel T. Lee @ 2019-09-14 15:13 UTC (permalink / raw)
To: Jesper Dangaard Brouer, David S . Miller; +Cc: netdev
This commit changes variable names that can cause confusion.
For example, variable DST_MIN is quite confusing since the
keyword 'udp_dst_min' and keyword 'dst_min' is used with pg_ctrl.
On the following commit, 'dst_min' will be used to set destination IP,
and the existing variable name DST_MIN should be changed.
Variable names are matched to the exact keyword used with pg_ctrl.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
---
.../pktgen_bench_xmit_mode_netif_receive.sh | 8 ++++----
.../pktgen/pktgen_bench_xmit_mode_queue_xmit.sh | 8 ++++----
samples/pktgen/pktgen_sample01_simple.sh | 16 ++++++++--------
samples/pktgen/pktgen_sample02_multiqueue.sh | 16 ++++++++--------
.../pktgen/pktgen_sample03_burst_single_flow.sh | 8 ++++----
samples/pktgen/pktgen_sample04_many_flows.sh | 8 ++++----
.../pktgen/pktgen_sample05_flow_per_thread.sh | 8 ++++----
...en_sample06_numa_awared_queue_irq_affinity.sh | 16 ++++++++--------
8 files changed, 44 insertions(+), 44 deletions(-)
diff --git a/samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh b/samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh
index e14b1a9144d9..9b74502c58f7 100755
--- a/samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh
+++ b/samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh
@@ -42,8 +42,8 @@ fi
[ -z "$BURST" ] && BURST=1024
[ -z "$COUNT" ] && COUNT="10000000" # Zero means indefinitely
if [ -n "$DST_PORT" ]; then
- read -r DST_MIN DST_MAX <<< $(parse_ports $DST_PORT)
- validate_ports $DST_MIN $DST_MAX
+ read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
+ validate_ports $UDP_DST_MIN $UDP_DST_MAX
fi
# Base Config
@@ -76,8 +76,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
pg_set $dev "flag UDPDST_RND"
- pg_set $dev "udp_dst_min $DST_MIN"
- pg_set $dev "udp_dst_max $DST_MAX"
+ pg_set $dev "udp_dst_min $UDP_DST_MIN"
+ pg_set $dev "udp_dst_max $UDP_DST_MAX"
fi
# Inject packet into RX path of stack
diff --git a/samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh b/samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh
index 82c3e504e056..0f332555b40d 100755
--- a/samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh
+++ b/samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh
@@ -25,8 +25,8 @@ if [[ -n "$BURST" ]]; then
fi
[ -z "$COUNT" ] && COUNT="10000000" # Zero means indefinitely
if [ -n "$DST_PORT" ]; then
- read -r DST_MIN DST_MAX <<< $(parse_ports $DST_PORT)
- validate_ports $DST_MIN $DST_MAX
+ read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
+ validate_ports $UDP_DST_MIN $UDP_DST_MAX
fi
# Base Config
@@ -59,8 +59,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
pg_set $dev "flag UDPDST_RND"
- pg_set $dev "udp_dst_min $DST_MIN"
- pg_set $dev "udp_dst_max $DST_MAX"
+ pg_set $dev "udp_dst_min $UDP_DST_MIN"
+ pg_set $dev "udp_dst_max $UDP_DST_MAX"
fi
# Inject packet into TX qdisc egress path of stack
diff --git a/samples/pktgen/pktgen_sample01_simple.sh b/samples/pktgen/pktgen_sample01_simple.sh
index d1702fdde8f3..063ec0998906 100755
--- a/samples/pktgen/pktgen_sample01_simple.sh
+++ b/samples/pktgen/pktgen_sample01_simple.sh
@@ -23,16 +23,16 @@ fi
[ -z "$DST_MAC" ] && usage && err 2 "Must specify -m dst_mac"
[ -z "$COUNT" ] && COUNT="100000" # Zero means indefinitely
if [ -n "$DST_PORT" ]; then
- read -r DST_MIN DST_MAX <<< $(parse_ports $DST_PORT)
- validate_ports $DST_MIN $DST_MAX
+ read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
+ validate_ports $UDP_DST_MIN $UDP_DST_MAX
fi
# Base Config
DELAY="0" # Zero means max speed
# Flow variation random source port between min and max
-UDP_MIN=9
-UDP_MAX=109
+UDP_SRC_MIN=9
+UDP_SRC_MAX=109
# General cleanup everything since last run
# (especially important if other threads were configured by other scripts)
@@ -66,14 +66,14 @@ pg_set $DEV "dst$IP6 $DEST_IP"
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
pg_set $DEV "flag UDPDST_RND"
- pg_set $DEV "udp_dst_min $DST_MIN"
- pg_set $DEV "udp_dst_max $DST_MAX"
+ pg_set $DEV "udp_dst_min $UDP_DST_MIN"
+ pg_set $DEV "udp_dst_max $UDP_DST_MAX"
fi
# Setup random UDP port src range
pg_set $DEV "flag UDPSRC_RND"
-pg_set $DEV "udp_src_min $UDP_MIN"
-pg_set $DEV "udp_src_max $UDP_MAX"
+pg_set $DEV "udp_src_min $UDP_SRC_MIN"
+pg_set $DEV "udp_src_max $UDP_SRC_MAX"
# start_run
echo "Running... ctrl^C to stop" >&2
diff --git a/samples/pktgen/pktgen_sample02_multiqueue.sh b/samples/pktgen/pktgen_sample02_multiqueue.sh
index 7f7a9a27548f..a4726fb50197 100755
--- a/samples/pktgen/pktgen_sample02_multiqueue.sh
+++ b/samples/pktgen/pktgen_sample02_multiqueue.sh
@@ -21,8 +21,8 @@ DELAY="0" # Zero means max speed
[ -z "$CLONE_SKB" ] && CLONE_SKB="0"
# Flow variation random source port between min and max
-UDP_MIN=9
-UDP_MAX=109
+UDP_SRC_MIN=9
+UDP_SRC_MAX=109
# (example of setting default params in your script)
if [ -z "$DEST_IP" ]; then
@@ -30,8 +30,8 @@ if [ -z "$DEST_IP" ]; then
fi
[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
if [ -n "$DST_PORT" ]; then
- read -r DST_MIN DST_MAX <<< $(parse_ports $DST_PORT)
- validate_ports $DST_MIN $DST_MAX
+ read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
+ validate_ports $UDP_DST_MIN $UDP_DST_MAX
fi
# General cleanup everything since last run
@@ -67,14 +67,14 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
pg_set $dev "flag UDPDST_RND"
- pg_set $dev "udp_dst_min $DST_MIN"
- pg_set $dev "udp_dst_max $DST_MAX"
+ pg_set $dev "udp_dst_min $UDP_DST_MIN"
+ pg_set $dev "udp_dst_max $UDP_DST_MAX"
fi
# Setup random UDP port src range
pg_set $dev "flag UDPSRC_RND"
- pg_set $dev "udp_src_min $UDP_MIN"
- pg_set $dev "udp_src_max $UDP_MAX"
+ pg_set $dev "udp_src_min $UDP_SRC_MIN"
+ pg_set $dev "udp_src_max $UDP_SRC_MAX"
done
# start_run
diff --git a/samples/pktgen/pktgen_sample03_burst_single_flow.sh b/samples/pktgen/pktgen_sample03_burst_single_flow.sh
index b520637817ce..dfea91a09ccc 100755
--- a/samples/pktgen/pktgen_sample03_burst_single_flow.sh
+++ b/samples/pktgen/pktgen_sample03_burst_single_flow.sh
@@ -34,8 +34,8 @@ fi
[ -z "$CLONE_SKB" ] && CLONE_SKB="0" # No need for clones when bursting
[ -z "$COUNT" ] && COUNT="0" # Zero means indefinitely
if [ -n "$DST_PORT" ]; then
- read -r DST_MIN DST_MAX <<< $(parse_ports $DST_PORT)
- validate_ports $DST_MIN $DST_MAX
+ read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
+ validate_ports $UDP_DST_MIN $UDP_DST_MAX
fi
# Base Config
@@ -67,8 +67,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
pg_set $dev "flag UDPDST_RND"
- pg_set $dev "udp_dst_min $DST_MIN"
- pg_set $dev "udp_dst_max $DST_MAX"
+ pg_set $dev "udp_dst_min $UDP_DST_MIN"
+ pg_set $dev "udp_dst_max $UDP_DST_MAX"
fi
# Setup burst, for easy testing -b 0 disable bursting
diff --git a/samples/pktgen/pktgen_sample04_many_flows.sh b/samples/pktgen/pktgen_sample04_many_flows.sh
index 5b6e9d9cb5b5..7ea9b4a3acf6 100755
--- a/samples/pktgen/pktgen_sample04_many_flows.sh
+++ b/samples/pktgen/pktgen_sample04_many_flows.sh
@@ -18,8 +18,8 @@ source ${basedir}/parameters.sh
[ -z "$CLONE_SKB" ] && CLONE_SKB="0"
[ -z "$COUNT" ] && COUNT="0" # Zero means indefinitely
if [ -n "$DST_PORT" ]; then
- read -r DST_MIN DST_MAX <<< $(parse_ports $DST_PORT)
- validate_ports $DST_MIN $DST_MAX
+ read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
+ validate_ports $UDP_DST_MIN $UDP_DST_MAX
fi
# NOTICE: Script specific settings
@@ -63,8 +63,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
pg_set $dev "flag UDPDST_RND"
- pg_set $dev "udp_dst_min $DST_MIN"
- pg_set $dev "udp_dst_max $DST_MAX"
+ pg_set $dev "udp_dst_min $UDP_DST_MIN"
+ pg_set $dev "udp_dst_max $UDP_DST_MAX"
fi
# Randomize source IP-addresses
diff --git a/samples/pktgen/pktgen_sample05_flow_per_thread.sh b/samples/pktgen/pktgen_sample05_flow_per_thread.sh
index 0c06e63fbe97..fbfafe029e11 100755
--- a/samples/pktgen/pktgen_sample05_flow_per_thread.sh
+++ b/samples/pktgen/pktgen_sample05_flow_per_thread.sh
@@ -23,8 +23,8 @@ source ${basedir}/parameters.sh
[ -z "$BURST" ] && BURST=32
[ -z "$COUNT" ] && COUNT="0" # Zero means indefinitely
if [ -n "$DST_PORT" ]; then
- read -r DST_MIN DST_MAX <<< $(parse_ports $DST_PORT)
- validate_ports $DST_MIN $DST_MAX
+ read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
+ validate_ports $UDP_DST_MIN $UDP_DST_MAX
fi
# Base Config
@@ -56,8 +56,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
pg_set $dev "flag UDPDST_RND"
- pg_set $dev "udp_dst_min $DST_MIN"
- pg_set $dev "udp_dst_max $DST_MAX"
+ pg_set $dev "udp_dst_min $UDP_DST_MIN"
+ pg_set $dev "udp_dst_max $UDP_DST_MAX"
fi
# Setup source IP-addresses based on thread number
diff --git a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
index 97f0266c0356..755e662183f1 100755
--- a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
+++ b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
@@ -20,8 +20,8 @@ DELAY="0" # Zero means max speed
[ -z "$CLONE_SKB" ] && CLONE_SKB="0"
# Flow variation random source port between min and max
-UDP_MIN=9
-UDP_MAX=109
+UDP_SRC_MIN=9
+UDP_SRC_MAX=109
node=`get_iface_node $DEV`
irq_array=(`get_iface_irqs $DEV`)
@@ -36,8 +36,8 @@ if [ -z "$DEST_IP" ]; then
fi
[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
if [ -n "$DST_PORT" ]; then
- read -r DST_MIN DST_MAX <<< $(parse_ports $DST_PORT)
- validate_ports $DST_MIN $DST_MAX
+ read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
+ validate_ports $UDP_DST_MIN $UDP_DST_MAX
fi
# General cleanup everything since last run
@@ -84,14 +84,14 @@ for ((i = 0; i < $THREADS; i++)); do
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
pg_set $dev "flag UDPDST_RND"
- pg_set $dev "udp_dst_min $DST_MIN"
- pg_set $dev "udp_dst_max $DST_MAX"
+ pg_set $dev "udp_dst_min $UDP_DST_MIN"
+ pg_set $dev "udp_dst_max $UDP_DST_MAX"
fi
# Setup random UDP port src range
pg_set $dev "flag UDPSRC_RND"
- pg_set $dev "udp_src_min $UDP_MIN"
- pg_set $dev "udp_src_max $UDP_MAX"
+ pg_set $dev "udp_src_min $UDP_SRC_MIN"
+ pg_set $dev "udp_src_max $UDP_SRC_MAX"
done
# start_run
--
2.20.1
^ permalink raw reply related
* [v3 2/3] samples: pktgen: add helper functions for IP(v4/v6) CIDR parsing
From: Daniel T. Lee @ 2019-09-14 15:13 UTC (permalink / raw)
To: Jesper Dangaard Brouer, David S . Miller; +Cc: netdev
In-Reply-To: <20190914151353.18054-1-danieltimlee@gmail.com>
This commit adds CIDR parsing and IP validate helper function to parse
single IP or range of IP with CIDR. (e.g. 198.18.0.0/15)
Helpers will be used in prior to set target address in samples/pktgen.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
---
Changes since v3:
* Set errexit option to stop script execution on error
samples/pktgen/functions.sh | 124 ++++++++++++++++++++++++++++++++++++
1 file changed, 124 insertions(+)
diff --git a/samples/pktgen/functions.sh b/samples/pktgen/functions.sh
index 4af4046d71be..87ae61701904 100644
--- a/samples/pktgen/functions.sh
+++ b/samples/pktgen/functions.sh
@@ -5,6 +5,8 @@
# Author: Jesper Dangaaard Brouer
# License: GPL
+set -o errexit
+
## -- General shell logging cmds --
function err() {
local exitcode=$1
@@ -163,6 +165,128 @@ function get_node_cpus()
echo $node_cpu_list
}
+# Extend shrunken IPv6 address.
+# fe80::42:bcff:fe84:e10a => fe80:0:0:0:42:bcff:fe84:e10a
+function extend_addr6()
+{
+ local addr=$1
+ local sep=: sep2=::
+ local sep_cnt=$(tr -cd $sep <<< $1 | wc -c)
+ local shrink
+
+ # separator count : should be between 2, 7.
+ if [[ $sep_cnt -lt 2 || $sep_cnt -gt 7 ]]; then
+ err 5 "Invalid IP6 address sep: $1"
+ fi
+
+ # if shrink '::' occurs multiple, it's malformed.
+ shrink=( $(egrep -o "$sep{2,}" <<< $addr) )
+ if [[ ${#shrink[@]} -ne 0 ]]; then
+ if [[ ${#shrink[@]} -gt 1 || ( ${shrink[0]} != $sep2 ) ]]; then
+ err 5 "Invalid IP$IP6 address shr: $1"
+ fi
+ fi
+
+ # add 0 at begin & end, and extend addr by adding :0
+ [[ ${addr:0:1} == $sep ]] && addr=0${addr}
+ [[ ${addr: -1} == $sep ]] && addr=${addr}0
+ echo "${addr/$sep2/$(printf ':0%.s' $(seq $[8-sep_cnt])):}"
+}
+
+
+# Given a single IP(v4/v6) address, whether it is valid.
+function validate_addr()
+{
+ # check function is called with (funcname)6
+ [[ ${FUNCNAME[1]: -1} == 6 ]] && local IP6=6
+ local len=$[ IP6 ? 8 : 4 ]
+ local max=$[ 2**(len*2)-1 ]
+ local addr sep
+
+ # set separator for each IP(v4/v6)
+ [[ $IP6 ]] && sep=: || sep=.
+ IFS=$sep read -a addr <<< $1
+
+ # array length
+ if [[ ${#addr[@]} != $len ]]; then
+ err 5 "Invalid IP$IP6 address: $1"
+ fi
+
+ # check each digit between 0, $max
+ for digit in "${addr[@]}"; do
+ [[ $IP6 ]] && digit=$[ 16#$digit ]
+ if [[ $digit -lt 0 || $digit -gt $max ]]; then
+ err 5 "Invalid IP$IP6 address: $1"
+ fi
+ done
+
+ return 0
+}
+
+function validate_addr6() { validate_addr $@ ; }
+
+# Given a single IP(v4/v6) or CIDR, return minimum and maximum IP addr.
+function parse_addr()
+{
+ # check function is called with (funcname)6
+ [[ ${FUNCNAME[1]: -1} == 6 ]] && local IP6=6
+ local bitlen=$[ IP6 ? 128 : 32 ]
+ local octet=$[ IP6 ? 16 : 8 ]
+
+ local addr=$1
+ local net prefix
+ local min_ip max_ip
+
+ IFS='/' read net prefix <<< $addr
+ [[ $IP6 ]] && net=$(extend_addr6 $net)
+ validate_addr$IP6 $net
+
+ if [[ $prefix -gt $bitlen ]]; then
+ err 5 "Invalid prefix: $prefix"
+ elif [[ -z $prefix ]]; then
+ min_ip=$net
+ max_ip=$net
+ else
+ # defining array for converting Decimal 2 Binary
+ # 00000000 00000001 00000010 00000011 00000100 ...
+ local d2b='{0..1}{0..1}{0..1}{0..1}{0..1}{0..1}{0..1}{0..1}'
+ [[ $IP6 ]] && d2b+=$d2b
+ eval local D2B=($d2b)
+
+ local remain=$[ bitlen-prefix ]
+ local min_mask max_mask
+ local min max
+ local ip_bit
+ local ip sep
+
+ # set separator for each IP(v4/v6)
+ [[ $IP6 ]] && sep=: || sep=.
+ IFS=$sep read -ra ip <<< $net
+
+ min_mask="$(printf '1%.s' $(seq $prefix))$(printf '0%.s' $(seq $remain))"
+ max_mask="$(printf '0%.s' $(seq $prefix))$(printf '1%.s' $(seq $remain))"
+
+ # calculate min/max ip with &,| operator
+ for i in "${!ip[@]}"; do
+ digit=$[ IP6 ? 16#${ip[$i]} : ${ip[$i]} ]
+ ip_bit=${D2B[$digit]}
+
+ idx=$[ octet*i ]
+ min[$i]=$[ 2#$ip_bit & 2#${min_mask:$idx:$octet} ]
+ max[$i]=$[ 2#$ip_bit | 2#${max_mask:$idx:$octet} ]
+ [[ $IP6 ]] && { min[$i]=$(printf '%X' ${min[$i]});
+ max[$i]=$(printf '%X' ${max[$i]}); }
+ done
+
+ min_ip=$(IFS=$sep; echo "${min[*]}")
+ max_ip=$(IFS=$sep; echo "${max[*]}")
+ fi
+
+ echo $min_ip $max_ip
+}
+
+function parse_addr6() { parse_addr $@ ; }
+
# Given a single or range of port(s), return minimum and maximum port number.
function parse_ports()
{
--
2.20.1
^ permalink raw reply related
* [v3 3/3] samples: pktgen: allow to specify destination IP range (CIDR)
From: Daniel T. Lee @ 2019-09-14 15:13 UTC (permalink / raw)
To: Jesper Dangaard Brouer, David S . Miller; +Cc: netdev
In-Reply-To: <20190914151353.18054-1-danieltimlee@gmail.com>
Currently, kernel pktgen has the feature to specify destination
address range for sending packet. (e.g. pgset "dst_min/dst_max")
But on samples, each of the scripts doesn't have any option to achieve this.
This commit adds the feature to specify the destination address range with CIDR.
-d : ($DEST_IP) destination IP. CIDR (e.g. 198.18.0.0/15) is also allowed
# ./pktgen_sample01_simple.sh -6 -d fe80::20/126 -p 3000 -n 4
# tcpdump ip6 and udp
05:14:18.082285 IP6 fe80::99.71 > fe80::23.3000: UDP, length 16
05:14:18.082564 IP6 fe80::99.43 > fe80::23.3000: UDP, length 16
05:14:18.083366 IP6 fe80::99.107 > fe80::22.3000: UDP, length 16
05:14:18.083585 IP6 fe80::99.97 > fe80::21.3000: UDP, length 16
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
---
samples/pktgen/README.rst | 2 +-
samples/pktgen/parameters.sh | 2 +-
.../pktgen/pktgen_bench_xmit_mode_netif_receive.sh | 4 +++-
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh | 4 +++-
samples/pktgen/pktgen_sample01_simple.sh | 4 +++-
samples/pktgen/pktgen_sample02_multiqueue.sh | 4 +++-
samples/pktgen/pktgen_sample03_burst_single_flow.sh | 4 +++-
samples/pktgen/pktgen_sample04_many_flows.sh | 11 ++++++++---
samples/pktgen/pktgen_sample05_flow_per_thread.sh | 4 +++-
.../pktgen_sample06_numa_awared_queue_irq_affinity.sh | 4 +++-
10 files changed, 31 insertions(+), 12 deletions(-)
diff --git a/samples/pktgen/README.rst b/samples/pktgen/README.rst
index fd39215db508..3f6483e8b2df 100644
--- a/samples/pktgen/README.rst
+++ b/samples/pktgen/README.rst
@@ -18,7 +18,7 @@ across the sample scripts. Usage example is printed on errors::
Usage: ./pktgen_sample01_simple.sh [-vx] -i ethX
-i : ($DEV) output interface/device (required)
-s : ($PKT_SIZE) packet size
- -d : ($DEST_IP) destination IP
+ -d : ($DEST_IP) destination IP. CIDR (e.g. 198.18.0.0/15) is also allowed
-m : ($DST_MAC) destination MAC-addr
-p : ($DST_PORT) destination PORT range (e.g. 433-444) is also allowed
-t : ($THREADS) threads to start
diff --git a/samples/pktgen/parameters.sh b/samples/pktgen/parameters.sh
index a06b00a0c7b6..ff0ed474fee9 100644
--- a/samples/pktgen/parameters.sh
+++ b/samples/pktgen/parameters.sh
@@ -8,7 +8,7 @@ function usage() {
echo "Usage: $0 [-vx] -i ethX"
echo " -i : (\$DEV) output interface/device (required)"
echo " -s : (\$PKT_SIZE) packet size"
- echo " -d : (\$DEST_IP) destination IP"
+ echo " -d : (\$DEST_IP) destination IP. CIDR (e.g. 198.18.0.0/15) is also allowed"
echo " -m : (\$DST_MAC) destination MAC-addr"
echo " -p : (\$DST_PORT) destination PORT range (e.g. 433-444) is also allowed"
echo " -t : (\$THREADS) threads to start"
diff --git a/samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh b/samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh
index 9b74502c58f7..da6cb711b7f4 100755
--- a/samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh
+++ b/samples/pktgen/pktgen_bench_xmit_mode_netif_receive.sh
@@ -41,6 +41,7 @@ fi
[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
[ -z "$BURST" ] && BURST=1024
[ -z "$COUNT" ] && COUNT="10000000" # Zero means indefinitely
+[ -n "$DEST_IP" ] && read -r DST_MIN DST_MAX <<< $(parse_addr${IP6} $DEST_IP)
if [ -n "$DST_PORT" ]; then
read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
validate_ports $UDP_DST_MIN $UDP_DST_MAX
@@ -71,7 +72,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
# Destination
pg_set $dev "dst_mac $DST_MAC"
- pg_set $dev "dst$IP6 $DEST_IP"
+ pg_set $dev "dst${IP6}_min $DST_MIN"
+ pg_set $dev "dst${IP6}_max $DST_MAX"
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
diff --git a/samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh b/samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh
index 0f332555b40d..355937787364 100755
--- a/samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh
+++ b/samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh
@@ -24,6 +24,7 @@ if [[ -n "$BURST" ]]; then
err 1 "Bursting not supported for this mode"
fi
[ -z "$COUNT" ] && COUNT="10000000" # Zero means indefinitely
+[ -n "$DEST_IP" ] && read -r DST_MIN DST_MAX <<< $(parse_addr${IP6} $DEST_IP)
if [ -n "$DST_PORT" ]; then
read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
validate_ports $UDP_DST_MIN $UDP_DST_MAX
@@ -54,7 +55,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
# Destination
pg_set $dev "dst_mac $DST_MAC"
- pg_set $dev "dst$IP6 $DEST_IP"
+ pg_set $dev "dst${IP6}_min $DST_MIN"
+ pg_set $dev "dst${IP6}_max $DST_MAX"
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
diff --git a/samples/pktgen/pktgen_sample01_simple.sh b/samples/pktgen/pktgen_sample01_simple.sh
index 063ec0998906..08995fa70025 100755
--- a/samples/pktgen/pktgen_sample01_simple.sh
+++ b/samples/pktgen/pktgen_sample01_simple.sh
@@ -22,6 +22,7 @@ fi
# Example enforce param "-m" for dst_mac
[ -z "$DST_MAC" ] && usage && err 2 "Must specify -m dst_mac"
[ -z "$COUNT" ] && COUNT="100000" # Zero means indefinitely
+[ -n "$DEST_IP" ] && read -r DST_MIN DST_MAX <<< $(parse_addr${IP6} $DEST_IP)
if [ -n "$DST_PORT" ]; then
read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
validate_ports $UDP_DST_MIN $UDP_DST_MAX
@@ -61,7 +62,8 @@ pg_set $DEV "flag NO_TIMESTAMP"
# Destination
pg_set $DEV "dst_mac $DST_MAC"
-pg_set $DEV "dst$IP6 $DEST_IP"
+pg_set $DEV "dst${IP6}_min $DST_MIN"
+pg_set $DEV "dst${IP6}_max $DST_MAX"
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
diff --git a/samples/pktgen/pktgen_sample02_multiqueue.sh b/samples/pktgen/pktgen_sample02_multiqueue.sh
index a4726fb50197..9b806e41c23a 100755
--- a/samples/pktgen/pktgen_sample02_multiqueue.sh
+++ b/samples/pktgen/pktgen_sample02_multiqueue.sh
@@ -29,6 +29,7 @@ if [ -z "$DEST_IP" ]; then
[ -z "$IP6" ] && DEST_IP="198.18.0.42" || DEST_IP="FD00::1"
fi
[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
+[ -n "$DEST_IP" ] && read -r DST_MIN DST_MAX <<< $(parse_addr${IP6} $DEST_IP)
if [ -n "$DST_PORT" ]; then
read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
validate_ports $UDP_DST_MIN $UDP_DST_MAX
@@ -62,7 +63,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
# Destination
pg_set $dev "dst_mac $DST_MAC"
- pg_set $dev "dst$IP6 $DEST_IP"
+ pg_set $dev "dst${IP6}_min $DST_MIN"
+ pg_set $dev "dst${IP6}_max $DST_MAX"
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
diff --git a/samples/pktgen/pktgen_sample03_burst_single_flow.sh b/samples/pktgen/pktgen_sample03_burst_single_flow.sh
index dfea91a09ccc..cb067788ceb3 100755
--- a/samples/pktgen/pktgen_sample03_burst_single_flow.sh
+++ b/samples/pktgen/pktgen_sample03_burst_single_flow.sh
@@ -33,6 +33,7 @@ fi
[ -z "$BURST" ] && BURST=32
[ -z "$CLONE_SKB" ] && CLONE_SKB="0" # No need for clones when bursting
[ -z "$COUNT" ] && COUNT="0" # Zero means indefinitely
+[ -n "$DEST_IP" ] && read -r DST_MIN DST_MAX <<< $(parse_addr${IP6} $DEST_IP)
if [ -n "$DST_PORT" ]; then
read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
validate_ports $UDP_DST_MIN $UDP_DST_MAX
@@ -62,7 +63,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
# Destination
pg_set $dev "dst_mac $DST_MAC"
- pg_set $dev "dst$IP6 $DEST_IP"
+ pg_set $dev "dst${IP6}_min $DST_MIN"
+ pg_set $dev "dst${IP6}_max $DST_MAX"
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
diff --git a/samples/pktgen/pktgen_sample04_many_flows.sh b/samples/pktgen/pktgen_sample04_many_flows.sh
index 7ea9b4a3acf6..626e33016869 100755
--- a/samples/pktgen/pktgen_sample04_many_flows.sh
+++ b/samples/pktgen/pktgen_sample04_many_flows.sh
@@ -17,6 +17,7 @@ source ${basedir}/parameters.sh
[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
[ -z "$CLONE_SKB" ] && CLONE_SKB="0"
[ -z "$COUNT" ] && COUNT="0" # Zero means indefinitely
+[ -n "$DEST_IP" ] && read -r DST_MIN DST_MAX <<< $(parse_addr $DEST_IP)
if [ -n "$DST_PORT" ]; then
read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
validate_ports $UDP_DST_MIN $UDP_DST_MAX
@@ -37,6 +38,9 @@ if [[ -n "$BURST" ]]; then
err 1 "Bursting not supported for this mode"
fi
+# 198.18.0.0 / 198.19.255.255
+read -r SRC_MIN SRC_MAX <<< $(parse_addr 198.18.0.0/15)
+
# General cleanup everything since last run
pg_ctrl "reset"
@@ -58,7 +62,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
# Single destination
pg_set $dev "dst_mac $DST_MAC"
- pg_set $dev "dst $DEST_IP"
+ pg_set $dev "dst_min $DST_MIN"
+ pg_set $dev "dst_max $DST_MAX"
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
@@ -69,8 +74,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
# Randomize source IP-addresses
pg_set $dev "flag IPSRC_RND"
- pg_set $dev "src_min 198.18.0.0"
- pg_set $dev "src_max 198.19.255.255"
+ pg_set $dev "src_min $SRC_MIN"
+ pg_set $dev "src_max $SRC_MAX"
# Limit number of flows (max 65535)
pg_set $dev "flows $FLOWS"
diff --git a/samples/pktgen/pktgen_sample05_flow_per_thread.sh b/samples/pktgen/pktgen_sample05_flow_per_thread.sh
index fbfafe029e11..cb79de073e9d 100755
--- a/samples/pktgen/pktgen_sample05_flow_per_thread.sh
+++ b/samples/pktgen/pktgen_sample05_flow_per_thread.sh
@@ -22,6 +22,7 @@ source ${basedir}/parameters.sh
[ -z "$CLONE_SKB" ] && CLONE_SKB="0"
[ -z "$BURST" ] && BURST=32
[ -z "$COUNT" ] && COUNT="0" # Zero means indefinitely
+[ -n "$DEST_IP" ] && read -r DST_MIN DST_MAX <<< $(parse_addr $DEST_IP)
if [ -n "$DST_PORT" ]; then
read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
validate_ports $UDP_DST_MIN $UDP_DST_MAX
@@ -51,7 +52,8 @@ for ((thread = $F_THREAD; thread <= $L_THREAD; thread++)); do
# Single destination
pg_set $dev "dst_mac $DST_MAC"
- pg_set $dev "dst $DEST_IP"
+ pg_set $dev "dst_min $DST_MIN"
+ pg_set $dev "dst_max $DST_MAX"
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
diff --git a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
index 755e662183f1..739adcda5b5f 100755
--- a/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
+++ b/samples/pktgen/pktgen_sample06_numa_awared_queue_irq_affinity.sh
@@ -35,6 +35,7 @@ if [ -z "$DEST_IP" ]; then
[ -z "$IP6" ] && DEST_IP="198.18.0.42" || DEST_IP="FD00::1"
fi
[ -z "$DST_MAC" ] && DST_MAC="90:e2:ba:ff:ff:ff"
+[ -n "$DEST_IP" ] && read -r DST_MIN DST_MAX <<< $(parse_addr${IP6} $DEST_IP)
if [ -n "$DST_PORT" ]; then
read -r UDP_DST_MIN UDP_DST_MAX <<< $(parse_ports $DST_PORT)
validate_ports $UDP_DST_MIN $UDP_DST_MAX
@@ -79,7 +80,8 @@ for ((i = 0; i < $THREADS; i++)); do
# Destination
pg_set $dev "dst_mac $DST_MAC"
- pg_set $dev "dst$IP6 $DEST_IP"
+ pg_set $dev "dst${IP6}_min $DST_MIN"
+ pg_set $dev "dst${IP6}_max $DST_MAX"
if [ -n "$DST_PORT" ]; then
# Single destination port or random port range
--
2.20.1
^ permalink raw reply related
* e1000e: workqueue problem
From: Randy Dunlap @ 2019-09-14 15:20 UTC (permalink / raw)
To: netdev@vger.kernel.org, Jeff Kirsher, intel-wired-lan
This is 5.30-rc8 on x86_64.
I would be happy to hear if this has already been addressed in some of
your recent patches.
Or: this could be a PCI-related or workqueue bug, but it only shows up
when loading the e1000e driver. And not every time.
[ 623.410732] calling e1000_init_module+0x0/0x1000 [e1000e] @ 11258
[ 623.410768] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[ 623.410792] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[ 623.420492] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[ 623.529922] e1000e 0000:00:19.0 0000:00:19.0 (uninitialized): registered PHC clock
[ 623.631604] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) e8:9d:87:4a:c2:2d
[ 623.631670] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
[ 623.631774] e1000e 0000:00:19.0 eth0: MAC: 10, PHY: 11, PBA No: FFFFFF-0FF
[ 623.632021] probe of 0000:00:19.0 returned 1 after 221138 usecs
[ 623.632520] initcall e1000_init_module+0x0/0x1000 [e1000e] returned 0 after 216532 usecs
[ 623.752736] e1000e 0000:00:19.0 eth0: removed PHC
[ 623.920346] ==================================================================
[ 623.920389] BUG: KASAN: use-after-free in __queue_work+0x98d/0xcb0
[ 623.920414] Read of size 4 at addr ffff8881127b12a8 by task swapper/3/0
[ 623.920452] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.3.0-rc8 #2
[ 623.920476] Hardware name: TOSHIBA PORTEGE R835/Portable PC, BIOS Version 4.10 01/08/2013
[ 623.920504] Call Trace:
[ 623.920520] <IRQ>
[ 623.920538] dump_stack+0x7b/0xb5
[ 623.920559] print_address_description+0x6e/0x470
[ 623.920586] __kasan_report+0x11a/0x198
[ 623.920606] ? sched_clock_cpu+0x1b/0x1e0
[ 623.920625] ? __queue_work+0x98d/0xcb0
[ 623.920647] ? __queue_work+0x98d/0xcb0
[ 623.920668] kasan_report+0x12/0x20
[ 623.920687] __asan_report_load4_noabort+0x14/0x20
[ 623.920708] __queue_work+0x98d/0xcb0
[ 623.920736] delayed_work_timer_fn+0x58/0x90
[ 623.920759] call_timer_fn.isra.21+0x19f/0x2e0
[ 623.920778] ? call_timer_fn.isra.21+0x16c/0x2e0
[ 623.920799] ? queue_work_node+0x370/0x370
[ 623.920819] ? del_timer+0xe0/0xe0
[ 623.920844] ? __kasan_check_read+0x11/0x20
[ 623.920865] ? do_raw_spin_unlock+0x54/0x220
[ 623.920889] run_timer_softirq+0x4d9/0xe70
[ 623.920910] ? lock_acquire+0xd2/0x180
[ 623.920931] ? queue_work_node+0x370/0x370
[ 623.920952] ? trigger_dyntick_cpu+0x290/0x290
[ 623.920974] ? sched_clock+0x9/0x10
[ 623.920992] ? sched_clock+0x9/0x10
[ 623.921011] ? sched_clock_cpu+0x1b/0x1e0
[ 623.921031] ? lapic_next_deadline+0x21/0x30
[ 623.921052] ? clockevents_program_event+0x264/0x350
[ 623.921082] __do_softirq+0x1c8/0x623
[ 623.921109] irq_exit+0x1c8/0x210
[ 623.921128] smp_apic_timer_interrupt+0xd1/0x130
[ 623.921151] apic_timer_interrupt+0xf/0x20
[ 623.921170] </IRQ>
[ 623.921185] RIP: 0010:cpuidle_enter_state+0x11a/0xbb0
[ 623.921207] Code: 80 7d d0 00 8b 4d c8 74 1d 9c 58 66 66 90 66 90 f6 c4 02 0f 85 6d 05 00 00 31 ff 89 4d d0 e8 1d 59 80 fe 8b 4d d0 fb 66 66 90 <66> 66 90 85 c9 79 47 49 8d 7d 10 48 b8 00 00 00 00 00 fc ff df 48
[ 623.921263] RSP: 0018:ffff888107f67cd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 623.921291] RAX: dffffc0000000000 RBX: ffffffffb4770958 RCX: 0000000000000005
[ 623.921317] RDX: 1ffff110238fef99 RSI: 000000003574540e RDI: ffff88811c7f7cc8
[ 623.921343] RBP: ffff888107f67d38 R08: 0000000000000002 R09: ffffed1020feb691
[ 623.921369] R10: ffff888107f67c70 R11: ffffed1020feb690 R12: ffffffffb4770760
[ 623.921394] R13: ffffe8fffea03158 R14: 0000009144892b23 R15: 00000091447a7ab5
[ 623.921433] ? cpuidle_enter_state+0xf0/0xbb0
[ 623.921456] ? menu_enable_device+0x170/0x170
[ 623.921480] cpuidle_enter+0x4a/0xa0
[ 623.921499] ? __kasan_check_write+0x14/0x20
[ 623.921521] call_cpuidle+0x68/0xc0
[ 623.921542] do_idle+0x31d/0x3e0
[ 623.921559] ? apic_timer_interrupt+0xa/0x20
[ 623.921582] ? arch_cpu_idle_exit+0x40/0x40
[ 623.921611] cpu_startup_entry+0x18/0x20
[ 623.921631] start_secondary+0x338/0x430
[ 623.921652] ? set_cpu_sibling_map+0x37e0/0x37e0
[ 623.921686] secondary_startup_64+0xa4/0xb0
[ 623.921730] Allocated by task 11258:
[ 623.921748] save_stack+0x21/0x90
[ 623.921765] __kasan_kmalloc.constprop.8+0xa7/0xd0
[ 623.921786] kasan_kmalloc+0x9/0x10
[ 623.921804] alloc_workqueue+0x122/0xe30
[ 623.921831] e1000_probe+0x1c4c/0x4550 [e1000e]
[ 623.921853] local_pci_probe+0xd9/0x180
[ 623.921872] pci_device_probe+0x340/0x740
[ 623.921891] really_probe+0x516/0xb00
[ 623.921909] driver_probe_device+0xf0/0x3a0
[ 623.921928] device_driver_attach+0xec/0x120
[ 623.921947] __driver_attach+0x108/0x270
[ 623.921965] bus_for_each_dev+0x116/0x1b0
[ 623.921983] driver_attach+0x38/0x50
[ 623.922001] bus_add_driver+0x44e/0x6a0
[ 623.922019] driver_register+0x18e/0x410
[ 623.922037] __pci_register_driver+0x187/0x240
[ 623.922058] 0xffffffffc029803d
[ 623.922075] do_one_initcall+0xab/0x2d5
[ 623.922106] do_init_module+0x1c7/0x582
[ 623.922138] load_module+0x4efd/0x5f30
[ 623.922170] __do_sys_finit_module+0x12a/0x1b0
[ 623.922193] __x64_sys_finit_module+0x6e/0xb0
[ 623.922212] do_syscall_64+0xaa/0x380
[ 623.922231] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 623.922263] Freed by task 0:
[ 623.922279] save_stack+0x21/0x90
[ 623.922296] __kasan_slab_free+0x137/0x190
[ 623.922316] kasan_slab_free+0xe/0x10
[ 623.922333] kfree+0xb8/0x210
[ 623.922349] rcu_free_wq+0xd5/0x130
[ 623.922367] rcu_core+0x478/0xfa0
[ 623.922383] rcu_core_si+0x9/0x10
[ 623.922400] __do_softirq+0x1c8/0x623
[ 623.922429] The buggy address belongs to the object at ffff8881127b10e8
which belongs to the cache kmalloc-512 of size 512
[ 623.922472] The buggy address is located 448 bytes inside of
512-byte region [ffff8881127b10e8, ffff8881127b12e8)
[ 623.922511] The buggy address belongs to the page:
[ 623.922532] page:ffffea000449ec00 refcount:1 mapcount:0 mapping:ffff888107c11300 index:0x0 compound_mapcount: 0
[ 623.922567] flags: 0x17ffffc0010200(slab|head)
[ 623.922589] raw: 0017ffffc0010200 ffffea000442c008 ffffea000442da08 ffff888107c11300
[ 623.922617] raw: 0000000000000000 0000000000250025 00000001ffffffff 0000000000000000
[ 623.922644] page dumped because: kasan: bad access detected
[ 623.922676] Memory state around the buggy address:
[ 623.922697] ffff8881127b1180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 623.922723] ffff8881127b1200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 623.922749] >ffff8881127b1280: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
[ 623.922775] ^
[ 623.922794] ffff8881127b1300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 623.922820] ffff8881127b1380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 623.922845] ==================================================================
[ 623.922870] Disabling lock debugging due to kernel taint
[ 624.029500] e1000e: eth0 NIC Link is Down
--
~Randy
^ permalink raw reply
* Re: [PATCH v2] net: mdio: switch to using gpiod_get_optional()
From: Andrew Lunn @ 2019-09-14 15:21 UTC (permalink / raw)
To: Dmitry Torokhov
Cc: Florian Fainelli, Heiner Kallweit, David S. Miller, Linus Walleij,
Andy Shevchenko, netdev, linux-kernel
In-Reply-To: <20190913225547.GA106494@dtor-ws>
On Fri, Sep 13, 2019 at 03:55:47PM -0700, Dmitry Torokhov wrote:
> The MDIO device reset line is optional and now that gpiod_get_optional()
> returns proper value when GPIO support is compiled out, there is no
> reason to use fwnode_get_named_gpiod() that I plan to hide away.
>
> Let's switch to using more standard gpiod_get_optional() and
> gpiod_set_consumer_name() to keep the nice "PHY reset" label.
>
> Also there is no reason to only try to fetch the reset GPIO when we have
> OF node, gpiolib can fetch GPIO data from firmwares as well.
>
> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Andrew
^ permalink raw reply
* Re: [PATCH net-next] net: dsa: b53: Add support for port_egress_floods callback
From: Andrew Lunn @ 2019-09-14 15:23 UTC (permalink / raw)
To: Florian Fainelli
Cc: netdev, b.spranger, Vivien Didelot, David S. Miller, open list
In-Reply-To: <20190913032841.4302-1-f.fainelli@gmail.com>
On Thu, Sep 12, 2019 at 08:28:39PM -0700, Florian Fainelli wrote:
> Add support for configuring the per-port egress flooding control for
> both Unicast and Multicast traffic.
>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
> Beneditk,
>
> Do you mind re-testing, or confirming that this patch that I sent much
> earlier does work correctly for you? Thanks!
>
> drivers/net/dsa/b53/b53_common.c | 33 ++++++++++++++++++++++++++++++++
> drivers/net/dsa/b53/b53_priv.h | 2 ++
> 2 files changed, 35 insertions(+)
>
> diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
> index 7d328a5f0161..ac2ec08a652b 100644
> --- a/drivers/net/dsa/b53/b53_common.c
> +++ b/drivers/net/dsa/b53/b53_common.c
> @@ -342,6 +342,13 @@ static void b53_set_forwarding(struct b53_device *dev, int enable)
> b53_read8(dev, B53_CTRL_PAGE, B53_SWITCH_CTRL, &mgmt);
> mgmt |= B53_MII_DUMB_FWDG_EN;
> b53_write8(dev, B53_CTRL_PAGE, B53_SWITCH_CTRL, mgmt);
> +
> + /* Look at B53_UC_FWD_EN and B53_MC_FWD_EN to decide whether
> + * frames should be flooed or not.
Hi Florian
s/flooed/flooded
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Andrew
^ permalink raw reply
* Re: [PATCH v4 1/2] ethtool: implement Energy Detect Powerdown support via phy-tunable
From: Andrew Lunn @ 2019-09-14 15:26 UTC (permalink / raw)
To: Alexandru Ardelean
Cc: netdev, devicetree, linux-kernel, davem, robh+dt, mark.rutland,
f.fainelli, hkallweit1, mkubecek
In-Reply-To: <20190912162812.402-2-alexandru.ardelean@analog.com>
On Thu, Sep 12, 2019 at 07:28:11PM +0300, Alexandru Ardelean wrote:
> The `phy_tunable_id` has been named `ETHTOOL_PHY_EDPD` since it looks like
> this feature is common across other PHYs (like EEE), and defining
> `ETHTOOL_PHY_ENERGY_DETECT_POWER_DOWN` seems too long.
>
> The way EDPD works, is that the RX block is put to a lower power mode,
> except for link-pulse detection circuits. The TX block is also put to low
> power mode, but the PHY wakes-up periodically to send link pulses, to avoid
> lock-ups in case the other side is also in EDPD mode.
>
> Currently, there are 2 PHY drivers that look like they could use this new
> PHY tunable feature: the `adin` && `micrel` PHYs.
>
> The ADIN's datasheet mentions that TX pulses are at intervals of 1 second
> default each, and they can be disabled. For the Micrel KSZ9031 PHY, the
> datasheet does not mention whether they can be disabled, but mentions that
> they can modified.
>
> The way this change is structured, is similar to the PHY tunable downshift
> control:
> * a `ETHTOOL_PHY_EDPD_DFLT_TX_MSECS` value is exposed to cover a default
> TX interval; some PHYs could specify a certain value that makes sense
> * `ETHTOOL_PHY_EDPD_NO_TX` would disable TX when EDPD is enabled
> * `ETHTOOL_PHY_EDPD_DISABLE` will disable EDPD
>
> As noted by the `ETHTOOL_PHY_EDPD_DFLT_TX_MSECS` the interval unit is 1
> millisecond, which should cover a reasonable range of intervals:
> - from 1 millisecond, which does not sound like much of a power-saver
> - to ~65 seconds which is quite a lot to wait for a link to come up when
> plugging a cable
>
> Signed-off-by: Alexandru Ardelean <alexandru.ardelean@analog.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Andrew
^ permalink raw reply
* Re: [PATCH v4 2/2] net: phy: adin: implement Energy Detect Powerdown mode via phy-tunable
From: Andrew Lunn @ 2019-09-14 15:29 UTC (permalink / raw)
To: Alexandru Ardelean
Cc: netdev, devicetree, linux-kernel, davem, robh+dt, mark.rutland,
f.fainelli, hkallweit1, mkubecek
In-Reply-To: <20190912162812.402-3-alexandru.ardelean@analog.com>
On Thu, Sep 12, 2019 at 07:28:12PM +0300, Alexandru Ardelean wrote:
> +static int adin_set_edpd(struct phy_device *phydev, u16 tx_interval)
> +{
> + u16 val;
> +
> + if (tx_interval == ETHTOOL_PHY_EDPD_DISABLE)
> + return phy_clear_bits(phydev, ADIN1300_PHY_CTRL_STATUS2,
> + (ADIN1300_NRG_PD_EN | ADIN1300_NRG_PD_TX_EN));
> +
> + val = ADIN1300_NRG_PD_EN;
> +
> + switch (tx_interval) {
> + case 1000: /* 1 second */
> + /* fallthrough */
> + case ETHTOOL_PHY_EDPD_DFLT_TX_MSECS:
> + val |= ADIN1300_NRG_PD_TX_EN;
> + /* fallthrough */
> + case ETHTOOL_PHY_EDPD_NO_TX:
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + return phy_modify(phydev, ADIN1300_PHY_CTRL_STATUS2,
> + (ADIN1300_NRG_PD_EN | ADIN1300_NRG_PD_TX_EN),
> + val);
> +}
> +
>
> + rc = adin_set_edpd(phydev, 1);
> + if (rc < 0)
> + return rc;
Hi Alexandru
Shouldn't this be adin_set_edpd(phydev, 1000);
Andrew
^ permalink raw reply
* Re: [PATCH v5 1/2] tcp: Add TCP_INFO counter for packets received out-of-order
From: Neal Cardwell @ 2019-09-14 15:43 UTC (permalink / raw)
To: Thomas Higdon
Cc: netdev@vger.kernel.org, Jonathan Lemon, Dave Jones, Eric Dumazet,
Dave Taht, Yuchung Cheng, Soheil Hassas Yeganeh
In-Reply-To: <20190913232332.44036-1-tph@fb.com>
On Fri, Sep 13, 2019 at 7:23 PM Thomas Higdon <tph@fb.com> wrote:
>
> For receive-heavy cases on the server-side, we want to track the
> connection quality for individual client IPs. This counter, similar to
> the existing system-wide TCPOFOQueue counter in /proc/net/netstat,
> tracks out-of-order packet reception. By providing this counter in
> TCP_INFO, it will allow understanding to what degree receive-heavy
> sockets are experiencing out-of-order delivery and packet drops
> indicating congestion.
>
> Please note that this is similar to the counter in NetBSD TCP_INFO, and
> has the same name.
>
> Also note that we avoid increasing the size of the tcp_sock struct by
> taking advantage of a hole.
>
> Signed-off-by: Thomas Higdon <tph@fb.com>
> ---
> changes since v4:
> - optimize placement of rcv_ooopack to avoid increasing tcp_sock struct
> size
Acked-by: Neal Cardwell <ncardwell@google.com>
Thanks, Thomas, for adding this!
After this is merged, would you mind sending a patch to add support to
the "ss" command line tool to print these 2 new fields?
My favorite recent example of such a patch to ss is Eric's change:
https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/commit/misc/ss.c?id=5eead6270a19f00464052d4084f32182cfe027ff
thanks,
neal
^ permalink raw reply
* Re: [PATCH v5 2/2] tcp: Add snd_wnd to TCP_INFO
From: Neal Cardwell @ 2019-09-14 15:45 UTC (permalink / raw)
To: Thomas Higdon
Cc: netdev@vger.kernel.org, Jonathan Lemon, Dave Jones, Eric Dumazet,
Dave Taht, Yuchung Cheng, Soheil Hassas Yeganeh
In-Reply-To: <20190913232332.44036-2-tph@fb.com>
On Fri, Sep 13, 2019 at 7:23 PM Thomas Higdon <tph@fb.com> wrote:
>
> Neal Cardwell mentioned that snd_wnd would be useful for diagnosing TCP
> performance problems --
> > (1) Usually when we're diagnosing TCP performance problems, we do so
> > from the sender, since the sender makes most of the
> > performance-critical decisions (cwnd, pacing, TSO size, TSQ, etc).
> > From the sender-side the thing that would be most useful is to see
> > tp->snd_wnd, the receive window that the receiver has advertised to
> > the sender.
>
> This serves the purpose of adding an additional __u32 to avoid the
> would-be hole caused by the addition of the tcpi_rcvi_ooopack field.
>
> Signed-off-by: Thomas Higdon <tph@fb.com>
> ---
> changes since v4:
> - clarify comment
Acked-by: Neal Cardwell <ncardwell@google.com>
Thanks!
neal
^ permalink raw reply
* Re: [PATCH v2] net: mdio: switch to using gpiod_get_optional()
From: Andy Shevchenko @ 2019-09-14 17:09 UTC (permalink / raw)
To: Dmitry Torokhov
Cc: Andrew Lunn, Florian Fainelli, Heiner Kallweit, David S. Miller,
Linus Walleij, netdev, linux-kernel
In-Reply-To: <20190913225547.GA106494@dtor-ws>
On Fri, Sep 13, 2019 at 03:55:47PM -0700, Dmitry Torokhov wrote:
> The MDIO device reset line is optional and now that gpiod_get_optional()
> returns proper value when GPIO support is compiled out, there is no
> reason to use fwnode_get_named_gpiod() that I plan to hide away.
>
> Let's switch to using more standard gpiod_get_optional() and
> gpiod_set_consumer_name() to keep the nice "PHY reset" label.
>
> Also there is no reason to only try to fetch the reset GPIO when we have
> OF node, gpiolib can fetch GPIO data from firmwares as well.
>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
But see comment below.
> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
> ---
>
> Note this is an update to a patch titled "[PATCH 05/11] net: mdio:
> switch to using fwnode_gpiod_get_index()" that no longer uses the new
> proposed API and instead works with already existing ones.
>
> drivers/net/phy/mdio_bus.c | 22 +++++++++-------------
> 1 file changed, 9 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
> index ce940871331e..2e29ab841b4d 100644
> --- a/drivers/net/phy/mdio_bus.c
> +++ b/drivers/net/phy/mdio_bus.c
> @@ -42,21 +42,17 @@
>
> static int mdiobus_register_gpiod(struct mdio_device *mdiodev)
> {
> - struct gpio_desc *gpiod = NULL;
> + int error;
>
> /* Deassert the optional reset signal */
> - if (mdiodev->dev.of_node)
> - gpiod = fwnode_get_named_gpiod(&mdiodev->dev.of_node->fwnode,
> - "reset-gpios", 0, GPIOD_OUT_LOW,
> - "PHY reset");
> - if (IS_ERR(gpiod)) {
> - if (PTR_ERR(gpiod) == -ENOENT || PTR_ERR(gpiod) == -ENOSYS)
> - gpiod = NULL;
> - else
> - return PTR_ERR(gpiod);
> - }
> -
> - mdiodev->reset_gpio = gpiod;
> + mdiodev->reset_gpio = gpiod_get_optional(&mdiodev->dev,
> + "reset", GPIOD_OUT_LOW);
> + error = PTR_ERR_OR_ZERO(mdiodev->reset_gpio);
> + if (error)
> + return error;
> +
> + if (mdiodev->reset_gpio)
This is redundant check.
> + gpiod_set_consumer_name(mdiodev->reset_gpio, "PHY reset");
> return 0;
> }
> --
> 2.23.0.237.gc6a4ce50a0-goog
>
>
> --
> Dmitry
--
With Best Regards,
Andy Shevchenko
^ permalink raw reply
* Re: SFP support with RGMII MAC via RGMII to SERDES/SGMII PHY?
From: Florian Fainelli @ 2019-09-14 17:15 UTC (permalink / raw)
To: Russell King - ARM Linux admin
Cc: George McCollister, netdev, Andrew Lunn, Heiner Kallweit
In-Reply-To: <20190914084856.GD13294@shell.armlinux.org.uk>
On 9/14/2019 1:48 AM, Russell King - ARM Linux admin wrote:
> On Fri, Sep 13, 2019 at 08:31:18PM -0700, Florian Fainelli wrote:
>> +Russell, Andrew, Heiner,
>>
>> On 9/13/2019 9:44 AM, George McCollister wrote:
>>> Every example of phylink SFP support I've seen is using an Ethernet
>>> MAC with native SGMII.
>>> Can phylink facilitate support of Fiber and Copper SFP modules
>>> connected to an RGMII MAC if all of the following are true?
>>
>> I don't think that use case has been presented before, but phylink
>> sounds like the tool that should help solve it. From your description
>> below, it sounds like all the pieces are there to support it. Is the
>> Ethernet MAC driver upstream?
>
> It has been presented, and it's something I've been trying to support
> for the last couple of years - in fact, I have patches in my tree that
> support a very similar scenario on the Macchiatobin with the 88x3310
> PHYs.
>
>>> 1) The MAC is connected via RGMII to a transceiver/PHY (such as
>>> Marvell 88E1512) which then connects to the SFP via SERDER/SGMII. If
>>> you want to see a block diagram it's the first one here:
>>> https://www.marvell.com/transceivers/assets/Alaska_88E1512-001_product_brief.pdf
>
> As mentioned above, this is no different from the Macchiatobin,
> where we have:
>
> .-------- RJ45
> MAC ---- 88x3310 PHY
> `-------- SFP+
>
> except instead of the MAC to PHY link being 10GBASE-R, it's RGMII,
> and the PHY to SFP+ link is 10GBASE-R instead of 1000BASE-X.
>
> Note that you're abusing the term "SGMII". SGMII is a Cisco
> modification of the IEEE 802.3 1000BASE-X protocol. Fiber SFPs
> exclusively use 1000BASE-X protocol. However, some copper SFPs
> (with a RJ45) do use SGMII.
>
>>> 2) The 1G Ethernet driver has been converted to use phylink.
>
> This is not necessary for this scenario. The PHY driver needs to
> be updated to know about SFP though.
>
> See:
>
> http://git.armlinux.org.uk/cgit/linux-arm.git/commit/?h=phy&id=ece56785ee0e9df40dc823fdc39ee74b4a7cd1c4
Regarding that patch, the SFP attach/detach callbacks do not seem very
specific to the PHY driver, only the sfp_insert callback which needs to
check the interface selected by the SFP.
Do you think it would make sense to move some of that logic into the
core PHY library and only have PHY drivers can be used to connect a SFP
cage specify a "sfp_select_interface" callback that is responsible for
rejecting the mode the SFP has been configured in, if unsupported?
Likewise for parsing the "sfp" property, if we parse that property in
the core and do not have a sfp_select_interface callback defined, then
it is not going to work.
So far we know that both marvel10g.c and marvell.c, two drivers that
would already justify factoring things in the core, I recall some people
using Broadcom PHYs having the same use cases, so 3 possible drivers
needing that functionality.
>
> as an example of the 88x3310 supporting a SFP+ cage. This patch is
> also necessary:
>
> http://git.armlinux.org.uk/cgit/linux-arm.git/commit/?h=phy&id=ef2d699397ca28c7f89e01cc9e5037989096a990
>
> and if anything is going to stand in the way of progress on this, it
> is likely to be that patch. I'll be attempting to post these after
> the next merge window (i.o.w. probably posting them in three weeks
> time.)
--
Florian
^ permalink raw reply
* Re: SFP support with RGMII MAC via RGMII to SERDES/SGMII PHY?
From: Russell King - ARM Linux admin @ 2019-09-14 17:44 UTC (permalink / raw)
To: Florian Fainelli; +Cc: George McCollister, netdev, Andrew Lunn, Heiner Kallweit
In-Reply-To: <84d75b1c-8489-4242-fe6d-e7d3b389f1a2@gmail.com>
On Sat, Sep 14, 2019 at 10:15:26AM -0700, Florian Fainelli wrote:
>
>
> On 9/14/2019 1:48 AM, Russell King - ARM Linux admin wrote:
> > On Fri, Sep 13, 2019 at 08:31:18PM -0700, Florian Fainelli wrote:
> >> +Russell, Andrew, Heiner,
> >>
> >> On 9/13/2019 9:44 AM, George McCollister wrote:
> >>> Every example of phylink SFP support I've seen is using an Ethernet
> >>> MAC with native SGMII.
> >>> Can phylink facilitate support of Fiber and Copper SFP modules
> >>> connected to an RGMII MAC if all of the following are true?
> >>
> >> I don't think that use case has been presented before, but phylink
> >> sounds like the tool that should help solve it. From your description
> >> below, it sounds like all the pieces are there to support it. Is the
> >> Ethernet MAC driver upstream?
> >
> > It has been presented, and it's something I've been trying to support
> > for the last couple of years - in fact, I have patches in my tree that
> > support a very similar scenario on the Macchiatobin with the 88x3310
> > PHYs.
> >
> >>> 1) The MAC is connected via RGMII to a transceiver/PHY (such as
> >>> Marvell 88E1512) which then connects to the SFP via SERDER/SGMII. If
> >>> you want to see a block diagram it's the first one here:
> >>> https://www.marvell.com/transceivers/assets/Alaska_88E1512-001_product_brief.pdf
> >
> > As mentioned above, this is no different from the Macchiatobin,
> > where we have:
> >
> > .-------- RJ45
> > MAC ---- 88x3310 PHY
> > `-------- SFP+
> >
> > except instead of the MAC to PHY link being 10GBASE-R, it's RGMII,
> > and the PHY to SFP+ link is 10GBASE-R instead of 1000BASE-X.
> >
> > Note that you're abusing the term "SGMII". SGMII is a Cisco
> > modification of the IEEE 802.3 1000BASE-X protocol. Fiber SFPs
> > exclusively use 1000BASE-X protocol. However, some copper SFPs
> > (with a RJ45) do use SGMII.
> >
> >>> 2) The 1G Ethernet driver has been converted to use phylink.
> >
> > This is not necessary for this scenario. The PHY driver needs to
> > be updated to know about SFP though.
> >
> > See:
> >
> > http://git.armlinux.org.uk/cgit/linux-arm.git/commit/?h=phy&id=ece56785ee0e9df40dc823fdc39ee74b4a7cd1c4
>
> Regarding that patch, the SFP attach/detach callbacks do not seem very
> specific to the PHY driver, only the sfp_insert callback which needs to
> check the interface selected by the SFP.
>
> Do you think it would make sense to move some of that logic into the
> core PHY library and only have PHY drivers can be used to connect a SFP
> cage specify a "sfp_select_interface" callback that is responsible for
> rejecting the mode the SFP has been configured in, if unsupported?
It's not that simple. The Marvell 1G PHYs which have a fiber interface
re-use the fiber interface for SGMII when configured for such a mode.
It's not as simple as "did the driver specify a callback for this
feature" but "does the PHY support a fiber interface _and_ does the PHY
configuration allow the fiber interface to be used?" So, I think the
PHY driver needs to have a say (in terms of code) whether there is
support for fiber.
However, it'd be silly to specify a sfp property in a situation where
the fiber interface on a PHY can't be used.
In any case, the callback into the PHY driver needs to be as per the
"sfp_insert" method - some PHYs will only be able to support a limited
number of SFPs. It seems, for example, the 88x3310 can support more
than just 10G modules - it allegedly can support 2500base-X, 1000base-X
and SGMII modules too if we hit it hard enough.
> Likewise for parsing the "sfp" property, if we parse that property in
> the core and do not have a sfp_select_interface callback defined, then
> it is not going to work.
Today, I've moved parsing the "sfp" property into sfp-bus, so that's no
longer a concern.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
^ permalink raw reply
* Re: [PATCH v5 2/2] tcp: Add snd_wnd to TCP_INFO
From: Soheil Hassas Yeganeh @ 2019-09-14 17:57 UTC (permalink / raw)
To: Neal Cardwell
Cc: Thomas Higdon, netdev@vger.kernel.org, Jonathan Lemon, Dave Jones,
Eric Dumazet, Dave Taht, Yuchung Cheng
In-Reply-To: <CADVnQykBFBU5bFLXRr_aRzxNVpNGQRtELG5kd6viGWqO0uyyng@mail.gmail.com>
On Sat, Sep 14, 2019 at 11:45 AM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Fri, Sep 13, 2019 at 7:23 PM Thomas Higdon <tph@fb.com> wrote:
> >
> > Neal Cardwell mentioned that snd_wnd would be useful for diagnosing TCP
> > performance problems --
> > > (1) Usually when we're diagnosing TCP performance problems, we do so
> > > from the sender, since the sender makes most of the
> > > performance-critical decisions (cwnd, pacing, TSO size, TSQ, etc).
> > > From the sender-side the thing that would be most useful is to see
> > > tp->snd_wnd, the receive window that the receiver has advertised to
> > > the sender.
> >
> > This serves the purpose of adding an additional __u32 to avoid the
> > would-be hole caused by the addition of the tcpi_rcvi_ooopack field.
> >
> > Signed-off-by: Thomas Higdon <tph@fb.com>
> > ---
> > changes since v4:
> > - clarify comment
>
> Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Thank you for adding the new field!
^ permalink raw reply
* Re: [PATCH] vhost: Fix compile time error
From: Michael S. Tsirkin @ 2019-09-14 19:38 UTC (permalink / raw)
To: Guenter Roeck
Cc: Jason Wang, kvm, virtualization, netdev, linux-kernel,
Greg Kroah-Hartman, Linus Torvalds
In-Reply-To: <1568450697-16775-1-git-send-email-linux@roeck-us.net>
On Sat, Sep 14, 2019 at 01:44:57AM -0700, Guenter Roeck wrote:
> Building vhost on 32-bit targets results in the following error.
>
> drivers/vhost/vhost.c: In function 'translate_desc':
> include/linux/compiler.h:549:38: error:
> call to '__compiletime_assert_1879' declared with attribute error:
> BUILD_BUG_ON failed: sizeof(_s) > sizeof(long)
>
> Fixes: a89db445fbd7 ("vhost: block speculation of translated descriptors")
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> ---
> drivers/vhost/vhost.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index acabf20b069e..102a0c877007 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2074,7 +2074,7 @@ static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len,
> _iov->iov_base = (void __user *)
> ((unsigned long)node->userspace_addr +
> array_index_nospec((unsigned long)(addr - node->start),
> - node->size));
> + (unsigned long)node->size));
Unfortunately this does not fix the case where size is actually 64 bit,
e.g. a single node with VA 0, size 2^32 is how
you cover the whole virtual address space.
this is not how qemu uses it, but it's valid.
I think it's best to just revert the patch for now.
> s += size;
> addr += size;
> ++ret;
> --
> 2.7.4
^ permalink raw reply
* [PULL] vhost: a last minute revert
From: Michael S. Tsirkin @ 2019-09-14 19:38 UTC (permalink / raw)
To: Linus Torvalds; +Cc: kvm, virtualization, netdev, linux-kernel, stable, mst
So I made a mess of it. Sent a pull before making sure it works on 32
bit too. Hope it's not too late to revert. Will teach me to be way more
careful in the near future.
The following changes since commit 060423bfdee3f8bc6e2c1bac97de24d5415e2bc4:
vhost: make sure log_num < in_num (2019-09-11 15:15:26 -0400)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
for you to fetch changes up to 0d4a3f2abbef73b9e5bb5f12213c275565473588:
Revert "vhost: block speculation of translated descriptors" (2019-09-14 15:21:51 -0400)
----------------------------------------------------------------
virtio: a last minute revert
32 bit build got broken by the latest defence in depth patch.
Revert and we'll try again in the next cycle.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
----------------------------------------------------------------
Michael S. Tsirkin (1):
Revert "vhost: block speculation of translated descriptors"
drivers/vhost/vhost.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
^ permalink raw reply
* Re: [PULL] vhost: a last minute revert
From: pr-tracker-bot @ 2019-09-14 23:25 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Linus Torvalds, kvm, virtualization, netdev, linux-kernel, stable,
mst
In-Reply-To: <20190914153859-mutt-send-email-mst@kernel.org>
The pull request you sent on Sat, 14 Sep 2019 15:38:59 -0400:
> https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/1f9c632cde0c3d781463a88ce430a8dd4a7c1a0e
Thank you!
--
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker
^ permalink raw reply
* Re: [GIT] Networking
From: pr-tracker-bot @ 2019-09-14 23:25 UTC (permalink / raw)
To: David Miller; +Cc: torvalds, akpm, netdev, linux-kernel
In-Reply-To: <20190913.215540.530478705339568215.davem@davemloft.net>
The pull request you sent on Fri, 13 Sep 2019 21:55:40 +0100 (WEST):
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git refs/heads/master
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/36024fcf8d28999f270908e75675d43b099ff7b3
Thank you!
--
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker
^ permalink raw reply
* [PATCH v3 net-next 0/7] tc-taprio offload for SJA1105 DSA
From: Vladimir Oltean @ 2019-09-15 1:53 UTC (permalink / raw)
To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
vedang.patel, richardcochran
Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
jhs, xiyou.wangcong, kurt.kanzenbach, joergen.andreasen, netdev,
Vladimir Oltean
This is the second attempt to submit the tc-taprio offload model for
inclusion in the net tree. The sja1105 switch driver will provide the
first implementation of the offload. Only the bare minimum is added:
- The offload model and a DSA pass-through
- The hardware implementation
- The interaction with the netdev queues in the tagger code
- Documentation
What has been removed from the first attempt is support for
PTP-as-clocksource in sja1105. This will be added as soon as the
offload model is settled.
Vinicius Costa Gomes (1):
taprio: Add support for hardware offloading
Vladimir Oltean (6):
net: dsa: Pass ndo_setup_tc slave callback to drivers
net: dsa: sja1105: Add static config tables for scheduling
net: dsa: sja1105: Advertise the 8 TX queues
net: dsa: sja1105: Make HOSTPRIO a kernel config
net: dsa: sja1105: Configure the Time-Aware Scheduler via tc-taprio
offload
docs: net: dsa: sja1105: Add info about the time-aware scheduler
Documentation/networking/dsa/sja1105.rst | 90 ++++
drivers/net/dsa/sja1105/Kconfig | 17 +
drivers/net/dsa/sja1105/Makefile | 4 +
drivers/net/dsa/sja1105/sja1105.h | 6 +
.../net/dsa/sja1105/sja1105_dynamic_config.c | 8 +
drivers/net/dsa/sja1105/sja1105_main.c | 28 +-
.../net/dsa/sja1105/sja1105_static_config.c | 167 +++++++
.../net/dsa/sja1105/sja1105_static_config.h | 48 +-
drivers/net/dsa/sja1105/sja1105_tas.c | 427 ++++++++++++++++++
drivers/net/dsa/sja1105/sja1105_tas.h | 42 ++
include/linux/netdevice.h | 1 +
include/net/dsa.h | 2 +
include/net/pkt_sched.h | 23 +
include/uapi/linux/pkt_sched.h | 3 +-
net/dsa/slave.c | 12 +-
net/dsa/tag_sja1105.c | 3 +-
net/sched/sch_taprio.c | 409 +++++++++++++++--
17 files changed, 1237 insertions(+), 53 deletions(-)
create mode 100644 drivers/net/dsa/sja1105/sja1105_tas.c
create mode 100644 drivers/net/dsa/sja1105/sja1105_tas.h
--
For those who want to follow along with the hardware implementation, the
manual is here:
https://www.nxp.com/docs/en/user-guide/UM10944.pdf
Notable changes in v2:
- Made the series independent from PTP (which is temporarily removed)
- Changed the meaning of the gate_mask - it is now acting on traffic
classes even in the view exposed by taprio to drivers.
- Removed the next_sched hrtimer.
- Summarized one of the responses given to Vinicius into a new
documentation section.
The first version of the net-next patch series can be found here:
https://www.spinics.net/lists/netdev/msg597214.html
Changes in the first version of the net-next series compared to RFC v2:
- Made "flags 1" and "flags 2" mutually exclusive in the taprio qdisc
- Moved taprio_enable_offload and taprio_disable_offload out of atomic
context - spin_lock_bh(qdisc_lock(sch)). This allows drivers that
implement the ndo_setup_tc to sleep and for taprio memory to be
allocated with GFP_KERNEL. The only thing that was kept under the
spinlock is the assignment of the q->dequeue and q->peek pointers.
- Finally making proper use of own API - added a taprio_alloc helper to
avoid passing stack memory to drivers.
The second version of the RFC is at:
https://www.spinics.net/lists/netdev/msg596663.html
Changes in v2 of the RFC since v1:
- Adapted the taprio offload patch to work by specifying "flags 2" to
the iproute2-next tc. At the moment I don't clearly understand whether
the full offload and the txtime assist ("flags 1") are mutually
exclusive or not (i.e. whether a "flags 3" mode should be rejected,
which it currently isn't).
- Added reference counting to the taprio offload structure. Maybe the
function names and placement could have been better though. As for the
other complaint (cycle time calculation) it got fixed in the taprio
parser in the meantime.
- Converted sja1105 to use the hardware PTP registers, and save/restore
the PTP time across resets.
- Made the DSA callback for ndo_setup_tc a bit more generic, but I don't
know whether it fulfills expectations. Drivers still can't do blocking
operations in its execution context.
- Added a state machine for starting/stopping the scheduler based on the
last command run on the PTP clock.
The first RFC from July can be seen at:
https://lists.openwall.net/netdev/2019/07/07/81
Original cover letter:
Using Vinicius Costa Gomes' configuration interface for 802.1Qbv (later
resent by Voon Weifeng for the stmmac driver), I am submitting for
review a draft implementation of this offload for a DSA switch.
I don't want to insist too much on the hardware specifics of SJA1105
which isn't otherwise very compliant to the IEEE spec.
In order to be able to test with Vedang Patel's iproute2 patch for
taprio offload (https://www.spinics.net/lists/netdev/msg573072.html)
I had to actually revert the txtime-assist branch as it had changed the
iproute2 interface.
In terms of impact for DSA drivers, I would like to point out that:
- Maybe somebody should pre-populate qopt->cycle_time in case the user
does not provide one. Otherwise each driver needs to iterate over the
GCL once, just to set the cycle time (right now stmmac does as well).
- Configuring the switch over SPI cannot apparently be done from this
ndo_setup_tc callback because it runs in atomic context. I also have
some downstream patches to offload tc clsact matchall with mirred
action, but in that case it looks like the atomic context restriction
does not apply.
- I had to copy the struct tc_taprio_qopt_offload to driver private
memory because a static config needs to be constructed every time a
change takes place, and there are up to 4 switch ports that may take a
TAS configuration. I have created a private
tc_taprio_qopt_offload_copy() helper for this - I don't know whether
it's of any help in the general case.
There is more to be done however. The TAS needs to be integrated with
the PTP driver. This is because with a PTP clock source, the base time
is written dynamically to the PTPSCHTM (PTP schedule time) register and
must be a time in the future. Then the "real" base time of each port's
TAS config can be offset by at most ~50 ms (the DELTA field from the
Schedule Entry Points Table) relative to PTPSCHTM.
Because base times in the past are completely ignored by this hardware,
we need to decide if it's ok behaviorally for a driver to "roll" a past
base time into the immediate future by incrementally adding the cycle
time (so the phase doesn't change). If it is, then decide by how long in
the future it is ok to do so. Or alternatively, is it preferable if the
driver errors out if the user-supplied base time is in the past and the
hardware doesn't like it? But even then, there might be fringe cases
when the base time becomes a past PTP time right as the driver tries to
apply the config.
Also applying a tc-taprio offload to a second SJA1105 switch port will
inevitably need to roll the first port's (now past) base time into an
equivalent future time.
All of this is going to be complicated even further by the fact that
resetting the switch (to apply the tc-taprio offload) makes it reset its
PTP time.
2.17.1
^ permalink raw reply
* [PATCH v3 net-next 2/7] net: dsa: Pass ndo_setup_tc slave callback to drivers
From: Vladimir Oltean @ 2019-09-15 1:53 UTC (permalink / raw)
To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
vedang.patel, richardcochran
Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
jhs, xiyou.wangcong, kurt.kanzenbach, joergen.andreasen, netdev,
Vladimir Oltean
In-Reply-To: <20190915015314.26605-1-olteanv@gmail.com>
DSA currently handles shared block filters (for the classifier-action
qdisc) in the core due to what I believe are simply pragmatic reasons -
hiding the complexity from drivers and offerring a simple API for port
mirroring.
Extend the dsa_slave_setup_tc function by passing all other qdisc
offloads to the driver layer, where the driver may choose what it
implements and how. DSA is simply a pass-through in this case.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
---
include/net/dsa.h | 2 ++
net/dsa/slave.c | 12 ++++++++----
2 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 96acb14ec1a8..541fb514e31d 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -515,6 +515,8 @@ struct dsa_switch_ops {
bool ingress);
void (*port_mirror_del)(struct dsa_switch *ds, int port,
struct dsa_mall_mirror_tc_entry *mirror);
+ int (*port_setup_tc)(struct dsa_switch *ds, int port,
+ enum tc_setup_type type, void *type_data);
/*
* Cross-chip operations
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 9a88035517a6..75d58229a4bd 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1035,12 +1035,16 @@ static int dsa_slave_setup_tc_block(struct net_device *dev,
static int dsa_slave_setup_tc(struct net_device *dev, enum tc_setup_type type,
void *type_data)
{
- switch (type) {
- case TC_SETUP_BLOCK:
+ struct dsa_port *dp = dsa_slave_to_port(dev);
+ struct dsa_switch *ds = dp->ds;
+
+ if (type == TC_SETUP_BLOCK)
return dsa_slave_setup_tc_block(dev, type_data);
- default:
+
+ if (!ds->ops->port_setup_tc)
return -EOPNOTSUPP;
- }
+
+ return ds->ops->port_setup_tc(ds, dp->index, type, type_data);
}
static void dsa_slave_get_stats64(struct net_device *dev,
--
2.17.1
^ permalink raw reply related
* [PATCH v3 net-next 1/7] taprio: Add support for hardware offloading
From: Vladimir Oltean @ 2019-09-15 1:53 UTC (permalink / raw)
To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
vedang.patel, richardcochran
Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
jhs, xiyou.wangcong, kurt.kanzenbach, joergen.andreasen, netdev,
Vladimir Oltean
In-Reply-To: <20190915015314.26605-1-olteanv@gmail.com>
From: Vinicius Costa Gomes <vinicius.gomes@intel.com>
This allows taprio to offload the schedule enforcement to capable
network cards, resulting in more precise windows and less CPU usage.
The gate mask acts on traffic classes (groups of queues of same
priority), as specified in IEEE 802.1Q-2018, and following the existing
taprio and mqprio semantics.
It is up to the driver to perform conversion between tc and individual
netdev queues if for some reason it needs to make that distinction.
Full offload is requested from the network interface by specifying
"flags 2" in the tc qdisc creation command, which in turn corresponds to
the TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD bit.
The important detail here is the clockid which is implicitly /dev/ptpN
for full offload, and hence not configurable.
A reference counting API is added to support the use case where Ethernet
drivers need to keep the taprio offload structure locally (i.e. they are
a multi-port switch driver, and configuring a port depends on the
settings of other ports as well). The refcount_t variable is kept in a
private structure (__tc_taprio_qopt_offload) and not exposed to drivers.
In the future, the private structure might also be expanded with a
backpointer to taprio_sched *q, to implement the notification system
described in the patch (of when admin became oper, or an error occurred,
etc, so the offload can be monitored with 'tc qdisc show').
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
include/linux/netdevice.h | 1 +
include/net/pkt_sched.h | 23 ++
include/uapi/linux/pkt_sched.h | 3 +-
net/sched/sch_taprio.c | 409 +++++++++++++++++++++++++++++----
4 files changed, 392 insertions(+), 44 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d7d5626002e9..9eda1c31d1f7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -847,6 +847,7 @@ enum tc_setup_type {
TC_SETUP_QDISC_ETF,
TC_SETUP_ROOT_QDISC,
TC_SETUP_QDISC_GRED,
+ TC_SETUP_QDISC_TAPRIO,
};
/* These structures hold the attributes of bpf state that are being passed
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index a16fbe9a2a67..d1632979622e 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -161,4 +161,27 @@ struct tc_etf_qopt_offload {
s32 queue;
};
+struct tc_taprio_sched_entry {
+ u8 command; /* TC_TAPRIO_CMD_* */
+
+ /* The gate_mask in the offloading side refers to traffic classes */
+ u32 gate_mask;
+ u32 interval;
+};
+
+struct tc_taprio_qopt_offload {
+ u8 enable;
+ ktime_t base_time;
+ u64 cycle_time;
+ u64 cycle_time_extension;
+
+ size_t num_entries;
+ struct tc_taprio_sched_entry entries[0];
+};
+
+/* Reference counting */
+struct tc_taprio_qopt_offload *taprio_offload_get(struct tc_taprio_qopt_offload
+ *offload);
+void taprio_offload_free(struct tc_taprio_qopt_offload *offload);
+
#endif
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 18f185299f47..5011259b8f67 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -1160,7 +1160,8 @@ enum {
* [TCA_TAPRIO_ATTR_SCHED_ENTRY_INTERVAL]
*/
-#define TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST 0x1
+#define TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST BIT(0)
+#define TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD BIT(1)
enum {
TCA_TAPRIO_ATTR_UNSPEC,
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 84b863e2bdbd..2f7b34205c82 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -29,8 +29,8 @@ static DEFINE_SPINLOCK(taprio_list_lock);
#define TAPRIO_ALL_GATES_OPEN -1
-#define FLAGS_VALID(flags) (!((flags) & ~TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST))
#define TXTIME_ASSIST_IS_ENABLED(flags) ((flags) & TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST)
+#define FULL_OFFLOAD_IS_ENABLED(flags) ((flags) & TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD)
struct sched_entry {
struct list_head list;
@@ -75,9 +75,16 @@ struct taprio_sched {
struct sched_gate_list __rcu *admin_sched;
struct hrtimer advance_timer;
struct list_head taprio_list;
+ struct sk_buff *(*dequeue)(struct Qdisc *sch);
+ struct sk_buff *(*peek)(struct Qdisc *sch);
u32 txtime_delay;
};
+struct __tc_taprio_qopt_offload {
+ refcount_t users;
+ struct tc_taprio_qopt_offload offload;
+};
+
static ktime_t sched_base_time(const struct sched_gate_list *sched)
{
if (!sched)
@@ -268,6 +275,19 @@ static bool is_valid_interval(struct sk_buff *skb, struct Qdisc *sch)
return entry;
}
+static bool taprio_flags_valid(u32 flags)
+{
+ /* Make sure no other flag bits are set. */
+ if (flags & ~(TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST |
+ TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD))
+ return false;
+ /* txtime-assist and full offload are mutually exclusive */
+ if ((flags & TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST) &&
+ (flags & TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD))
+ return false;
+ return true;
+}
+
/* This returns the tstamp value set by TCP in terms of the set clock. */
static ktime_t get_tcp_tstamp(struct taprio_sched *q, struct sk_buff *skb)
{
@@ -417,7 +437,7 @@ static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
return qdisc_enqueue(skb, child, to_free);
}
-static struct sk_buff *taprio_peek(struct Qdisc *sch)
+static struct sk_buff *taprio_peek_soft(struct Qdisc *sch)
{
struct taprio_sched *q = qdisc_priv(sch);
struct net_device *dev = qdisc_dev(sch);
@@ -461,6 +481,36 @@ static struct sk_buff *taprio_peek(struct Qdisc *sch)
return NULL;
}
+static struct sk_buff *taprio_peek_offload(struct Qdisc *sch)
+{
+ struct taprio_sched *q = qdisc_priv(sch);
+ struct net_device *dev = qdisc_dev(sch);
+ struct sk_buff *skb;
+ int i;
+
+ for (i = 0; i < dev->num_tx_queues; i++) {
+ struct Qdisc *child = q->qdiscs[i];
+
+ if (unlikely(!child))
+ continue;
+
+ skb = child->ops->peek(child);
+ if (!skb)
+ continue;
+
+ return skb;
+ }
+
+ return NULL;
+}
+
+static struct sk_buff *taprio_peek(struct Qdisc *sch)
+{
+ struct taprio_sched *q = qdisc_priv(sch);
+
+ return q->peek(sch);
+}
+
static void taprio_set_budget(struct taprio_sched *q, struct sched_entry *entry)
{
atomic_set(&entry->budget,
@@ -468,7 +518,7 @@ static void taprio_set_budget(struct taprio_sched *q, struct sched_entry *entry)
atomic64_read(&q->picos_per_byte)));
}
-static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
+static struct sk_buff *taprio_dequeue_soft(struct Qdisc *sch)
{
struct taprio_sched *q = qdisc_priv(sch);
struct net_device *dev = qdisc_dev(sch);
@@ -550,6 +600,40 @@ static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
return skb;
}
+static struct sk_buff *taprio_dequeue_offload(struct Qdisc *sch)
+{
+ struct taprio_sched *q = qdisc_priv(sch);
+ struct net_device *dev = qdisc_dev(sch);
+ struct sk_buff *skb;
+ int i;
+
+ for (i = 0; i < dev->num_tx_queues; i++) {
+ struct Qdisc *child = q->qdiscs[i];
+
+ if (unlikely(!child))
+ continue;
+
+ skb = child->ops->dequeue(child);
+ if (unlikely(!skb))
+ continue;
+
+ qdisc_bstats_update(sch, skb);
+ qdisc_qstats_backlog_dec(sch, skb);
+ sch->q.qlen--;
+
+ return skb;
+ }
+
+ return NULL;
+}
+
+static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
+{
+ struct taprio_sched *q = qdisc_priv(sch);
+
+ return q->dequeue(sch);
+}
+
static bool should_restart_cycle(const struct sched_gate_list *oper,
const struct sched_entry *entry)
{
@@ -932,6 +1016,9 @@ static void taprio_start_sched(struct Qdisc *sch,
struct taprio_sched *q = qdisc_priv(sch);
ktime_t expires;
+ if (FULL_OFFLOAD_IS_ENABLED(q->flags))
+ return;
+
expires = hrtimer_get_expires(&q->advance_timer);
if (expires == 0)
expires = KTIME_MAX;
@@ -1011,6 +1098,254 @@ static void setup_txtime(struct taprio_sched *q,
}
}
+static struct tc_taprio_qopt_offload *taprio_offload_alloc(int num_entries)
+{
+ size_t size = sizeof(struct tc_taprio_sched_entry) * num_entries +
+ sizeof(struct __tc_taprio_qopt_offload);
+ struct __tc_taprio_qopt_offload *__offload;
+
+ __offload = kzalloc(size, GFP_KERNEL);
+ if (!__offload)
+ return NULL;
+
+ refcount_set(&__offload->users, 1);
+
+ return &__offload->offload;
+}
+
+struct tc_taprio_qopt_offload *taprio_offload_get(struct tc_taprio_qopt_offload
+ *offload)
+{
+ struct __tc_taprio_qopt_offload *__offload;
+
+ __offload = container_of(offload, struct __tc_taprio_qopt_offload,
+ offload);
+
+ refcount_inc(&__offload->users);
+
+ return offload;
+}
+EXPORT_SYMBOL_GPL(taprio_offload_get);
+
+void taprio_offload_free(struct tc_taprio_qopt_offload *offload)
+{
+ struct __tc_taprio_qopt_offload *__offload;
+
+ __offload = container_of(offload, struct __tc_taprio_qopt_offload,
+ offload);
+
+ if (!refcount_dec_and_test(&__offload->users))
+ return;
+
+ kfree(__offload);
+}
+EXPORT_SYMBOL_GPL(taprio_offload_free);
+
+/* The function will only serve to keep the pointers to the "oper" and "admin"
+ * schedules valid in relation to their base times, so when calling dump() the
+ * users looks at the right schedules.
+ * When using full offload, the admin configuration is promoted to oper at the
+ * base_time in the PHC time domain. But because the system time is not
+ * necessarily in sync with that, we can't just trigger a hrtimer to call
+ * switch_schedules at the right hardware time.
+ * At the moment we call this by hand right away from taprio, but in the future
+ * it will be useful to create a mechanism for drivers to notify taprio of the
+ * offload state (PENDING, ACTIVE, INACTIVE) so it can be visible in dump().
+ * This is left as TODO.
+ */
+void taprio_offload_config_changed(struct taprio_sched *q)
+{
+ struct sched_gate_list *oper, *admin;
+
+ spin_lock(&q->current_entry_lock);
+
+ oper = rcu_dereference_protected(q->oper_sched,
+ lockdep_is_held(&q->current_entry_lock));
+ admin = rcu_dereference_protected(q->admin_sched,
+ lockdep_is_held(&q->current_entry_lock));
+
+ switch_schedules(q, &admin, &oper);
+
+ spin_unlock(&q->current_entry_lock);
+}
+
+static void taprio_sched_to_offload(struct taprio_sched *q,
+ struct sched_gate_list *sched,
+ const struct tc_mqprio_qopt *mqprio,
+ struct tc_taprio_qopt_offload *offload)
+{
+ struct sched_entry *entry;
+ int i = 0;
+
+ offload->base_time = sched->base_time;
+ offload->cycle_time = sched->cycle_time;
+ offload->cycle_time_extension = sched->cycle_time_extension;
+
+ list_for_each_entry(entry, &sched->entries, list) {
+ struct tc_taprio_sched_entry *e = &offload->entries[i];
+
+ e->command = entry->command;
+ e->interval = entry->interval;
+ e->gate_mask = entry->gate_mask;
+ i++;
+ }
+
+ offload->num_entries = i;
+}
+
+static int taprio_enable_offload(struct net_device *dev,
+ struct tc_mqprio_qopt *mqprio,
+ struct taprio_sched *q,
+ struct sched_gate_list *sched,
+ struct netlink_ext_ack *extack)
+{
+ const struct net_device_ops *ops = dev->netdev_ops;
+ struct tc_taprio_qopt_offload *offload;
+ int err = 0;
+
+ if (!ops->ndo_setup_tc) {
+ NL_SET_ERR_MSG(extack,
+ "Device does not support taprio offload");
+ return -EOPNOTSUPP;
+ }
+
+ offload = taprio_offload_alloc(sched->num_entries);
+ if (!offload) {
+ NL_SET_ERR_MSG(extack,
+ "Not enough memory for enabling offload mode");
+ return -ENOMEM;
+ }
+ offload->enable = 1;
+ taprio_sched_to_offload(q, sched, mqprio, offload);
+
+ err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_TAPRIO, offload);
+ if (err < 0) {
+ NL_SET_ERR_MSG(extack,
+ "Device failed to setup taprio offload");
+ goto done;
+ }
+
+ taprio_offload_config_changed(q);
+
+done:
+ taprio_offload_free(offload);
+
+ return err;
+}
+
+static int taprio_disable_offload(struct net_device *dev,
+ struct taprio_sched *q,
+ struct netlink_ext_ack *extack)
+{
+ const struct net_device_ops *ops = dev->netdev_ops;
+ struct tc_taprio_qopt_offload *offload;
+ int err;
+
+ if (!FULL_OFFLOAD_IS_ENABLED(q->flags))
+ return 0;
+
+ if (!ops->ndo_setup_tc)
+ return -EOPNOTSUPP;
+
+ offload = taprio_offload_alloc(0);
+ if (!offload) {
+ NL_SET_ERR_MSG(extack,
+ "Not enough memory to disable offload mode");
+ return -ENOMEM;
+ }
+ offload->enable = 0;
+
+ err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_TAPRIO, offload);
+ if (err < 0) {
+ NL_SET_ERR_MSG(extack,
+ "Device failed to disable offload");
+ goto out;
+ }
+
+out:
+ taprio_offload_free(offload);
+
+ return err;
+}
+
+/* If full offload is enabled, the only possible clockid is the net device's
+ * PHC. For that reason, specifying a clockid through netlink is incorrect.
+ * For txtime-assist, it is implicitly assumed that the device's PHC is kept
+ * in sync with the specified clockid via a user space daemon such as phc2sys.
+ * For both software taprio and txtime-assist, the clockid is used for the
+ * hrtimer that advances the schedule and hence mandatory.
+ */
+static int taprio_parse_clockid(struct Qdisc *sch, struct nlattr **tb,
+ struct netlink_ext_ack *extack)
+{
+ struct taprio_sched *q = qdisc_priv(sch);
+ struct net_device *dev = qdisc_dev(sch);
+ int err = -EINVAL;
+
+ if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
+ const struct ethtool_ops *ops = dev->ethtool_ops;
+ struct ethtool_ts_info info = {
+ .cmd = ETHTOOL_GET_TS_INFO,
+ .phc_index = -1,
+ };
+
+ if (tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]) {
+ NL_SET_ERR_MSG(extack,
+ "The 'clockid' cannot be specified for full offload");
+ goto out;
+ }
+
+ if (ops && ops->get_ts_info)
+ err = ops->get_ts_info(dev, &info);
+
+ if (err || info.phc_index < 0) {
+ NL_SET_ERR_MSG(extack,
+ "Device does not have a PTP clock");
+ err = -ENOTSUPP;
+ goto out;
+ }
+ } else if (tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]) {
+ int clockid = nla_get_s32(tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]);
+
+ /* We only support static clockids and we don't allow
+ * for it to be modified after the first init.
+ */
+ if (clockid < 0 ||
+ (q->clockid != -1 && q->clockid != clockid)) {
+ NL_SET_ERR_MSG(extack,
+ "Changing the 'clockid' of a running schedule is not supported");
+ err = -ENOTSUPP;
+ goto out;
+ }
+
+ switch (clockid) {
+ case CLOCK_REALTIME:
+ q->tk_offset = TK_OFFS_REAL;
+ break;
+ case CLOCK_MONOTONIC:
+ q->tk_offset = TK_OFFS_MAX;
+ break;
+ case CLOCK_BOOTTIME:
+ q->tk_offset = TK_OFFS_BOOT;
+ break;
+ case CLOCK_TAI:
+ q->tk_offset = TK_OFFS_TAI;
+ break;
+ default:
+ NL_SET_ERR_MSG(extack, "Invalid 'clockid'");
+ err = -EINVAL;
+ goto out;
+ }
+
+ q->clockid = clockid;
+ } else {
+ NL_SET_ERR_MSG(extack, "Specifying a 'clockid' is mandatory");
+ goto out;
+ }
+out:
+ return err;
+}
+
static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
struct netlink_ext_ack *extack)
{
@@ -1020,9 +1355,9 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
struct net_device *dev = qdisc_dev(sch);
struct tc_mqprio_qopt *mqprio = NULL;
u32 taprio_flags = 0;
- int i, err, clockid;
unsigned long flags;
ktime_t start;
+ int i, err;
err = nla_parse_nested_deprecated(tb, TCA_TAPRIO_ATTR_MAX, opt,
taprio_policy, extack);
@@ -1038,7 +1373,7 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
if (q->flags != 0 && q->flags != taprio_flags) {
NL_SET_ERR_MSG_MOD(extack, "Changing 'flags' of a running schedule is not supported");
return -EOPNOTSUPP;
- } else if (!FLAGS_VALID(taprio_flags)) {
+ } else if (!taprio_flags_valid(taprio_flags)) {
NL_SET_ERR_MSG_MOD(extack, "Specified 'flags' are not valid");
return -EINVAL;
}
@@ -1078,30 +1413,19 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
goto free_sched;
}
- if (tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]) {
- clockid = nla_get_s32(tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]);
-
- /* We only support static clockids and we don't allow
- * for it to be modified after the first init.
- */
- if (clockid < 0 ||
- (q->clockid != -1 && q->clockid != clockid)) {
- NL_SET_ERR_MSG(extack, "Changing the 'clockid' of a running schedule is not supported");
- err = -ENOTSUPP;
- goto free_sched;
- }
-
- q->clockid = clockid;
- }
-
- if (q->clockid == -1 && !tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]) {
- NL_SET_ERR_MSG(extack, "Specifying a 'clockid' is mandatory");
- err = -EINVAL;
+ err = taprio_parse_clockid(sch, tb, extack);
+ if (err < 0)
goto free_sched;
- }
taprio_set_picos_per_byte(dev, q);
+ if (FULL_OFFLOAD_IS_ENABLED(taprio_flags))
+ err = taprio_enable_offload(dev, mqprio, q, new_admin, extack);
+ else
+ err = taprio_disable_offload(dev, q, extack);
+ if (err)
+ goto free_sched;
+
/* Protects against enqueue()/dequeue() */
spin_lock_bh(qdisc_lock(sch));
@@ -1116,6 +1440,7 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
}
if (!TXTIME_ASSIST_IS_ENABLED(taprio_flags) &&
+ !FULL_OFFLOAD_IS_ENABLED(taprio_flags) &&
!hrtimer_active(&q->advance_timer)) {
hrtimer_init(&q->advance_timer, q->clockid, HRTIMER_MODE_ABS);
q->advance_timer.function = advance_sched;
@@ -1134,23 +1459,15 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
mqprio->prio_tc_map[i]);
}
- switch (q->clockid) {
- case CLOCK_REALTIME:
- q->tk_offset = TK_OFFS_REAL;
- break;
- case CLOCK_MONOTONIC:
- q->tk_offset = TK_OFFS_MAX;
- break;
- case CLOCK_BOOTTIME:
- q->tk_offset = TK_OFFS_BOOT;
- break;
- case CLOCK_TAI:
- q->tk_offset = TK_OFFS_TAI;
- break;
- default:
- NL_SET_ERR_MSG(extack, "Invalid 'clockid'");
- err = -EINVAL;
- goto unlock;
+ if (FULL_OFFLOAD_IS_ENABLED(taprio_flags)) {
+ q->dequeue = taprio_dequeue_offload;
+ q->peek = taprio_peek_offload;
+ } else {
+ /* Be sure to always keep the function pointers
+ * in a consistent state.
+ */
+ q->dequeue = taprio_dequeue_soft;
+ q->peek = taprio_peek_soft;
}
err = taprio_get_start_time(sch, new_admin, &start);
@@ -1212,6 +1529,8 @@ static void taprio_destroy(struct Qdisc *sch)
hrtimer_cancel(&q->advance_timer);
+ taprio_disable_offload(dev, q, NULL);
+
if (q->qdiscs) {
for (i = 0; i < dev->num_tx_queues && q->qdiscs[i]; i++)
qdisc_put(q->qdiscs[i]);
@@ -1241,6 +1560,9 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt,
hrtimer_init(&q->advance_timer, CLOCK_TAI, HRTIMER_MODE_ABS);
q->advance_timer.function = advance_sched;
+ q->dequeue = taprio_dequeue_soft;
+ q->peek = taprio_peek_soft;
+
q->root = sch;
/* We only support static clockids. Use an invalid value as default
@@ -1423,7 +1745,8 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
if (nla_put(skb, TCA_TAPRIO_ATTR_PRIOMAP, sizeof(opt), &opt))
goto options_error;
- if (nla_put_s32(skb, TCA_TAPRIO_ATTR_SCHED_CLOCKID, q->clockid))
+ if (!FULL_OFFLOAD_IS_ENABLED(q->flags) &&
+ nla_put_s32(skb, TCA_TAPRIO_ATTR_SCHED_CLOCKID, q->clockid))
goto options_error;
if (q->flags && nla_put_u32(skb, TCA_TAPRIO_ATTR_FLAGS, q->flags))
--
2.17.1
^ permalink raw reply related
* [PATCH v3 net-next 4/7] net: dsa: sja1105: Advertise the 8 TX queues
From: Vladimir Oltean @ 2019-09-15 1:53 UTC (permalink / raw)
To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
vedang.patel, richardcochran
Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
jhs, xiyou.wangcong, kurt.kanzenbach, joergen.andreasen, netdev,
Vladimir Oltean
In-Reply-To: <20190915015314.26605-1-olteanv@gmail.com>
This is a preparation patch for the tc-taprio offload (and potentially
for other future offloads such as tc-mqprio).
Instead of looking directly at skb->priority during xmit, let's get the
netdev queue and the queue-to-traffic-class mapping, and put the
resulting traffic class into the dsa_8021q PCP field. The switch is
configured with a 1-to-1 PCP-to-ingress-queue-to-egress-queue mapping
(see vlan_pmap in sja1105_main.c), so the effect is that we can inject
into a front-panel's egress traffic class through VLAN tagging from
Linux, completely transparently.
Unfortunately the switch doesn't look at the VLAN PCP in the case of
management traffic to/from the CPU (link-local frames at
01-80-C2-xx-xx-xx or 01-1B-19-xx-xx-xx) so we can't alter the
transmission queue of this type of traffic on a frame-by-frame basis. It
is only selected through the "hostprio" setting which ATM is harcoded in
the driver to 7.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
drivers/net/dsa/sja1105/sja1105_main.c | 7 ++++++-
net/dsa/tag_sja1105.c | 3 ++-
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index d8cff0107ec4..108f62c27c28 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -384,7 +384,9 @@ static int sja1105_init_general_params(struct sja1105_private *priv)
/* Disallow dynamic changing of the mirror port */
.mirr_ptacu = 0,
.switchid = priv->ds->index,
- /* Priority queue for link-local frames trapped to CPU */
+ /* Priority queue for link-local management frames
+ * (both ingress to and egress from CPU - PTP, STP etc)
+ */
.hostprio = 7,
.mac_fltres1 = SJA1105_LINKLOCAL_FILTER_A,
.mac_flt1 = SJA1105_LINKLOCAL_FILTER_A_MASK,
@@ -1711,6 +1713,9 @@ static int sja1105_setup(struct dsa_switch *ds)
*/
ds->vlan_filtering_is_global = true;
+ /* Advertise the 8 egress queues */
+ ds->num_tx_queues = SJA1105_NUM_TC;
+
/* The DSA/switchdev model brings up switch ports in standalone mode by
* default, and that means vlan_filtering is 0 since they're not under
* a bridge, so it's safe to set up switch tagging at this time.
diff --git a/net/dsa/tag_sja1105.c b/net/dsa/tag_sja1105.c
index 47ee88163a9d..9c9aff3e52cf 100644
--- a/net/dsa/tag_sja1105.c
+++ b/net/dsa/tag_sja1105.c
@@ -89,7 +89,8 @@ static struct sk_buff *sja1105_xmit(struct sk_buff *skb,
struct dsa_port *dp = dsa_slave_to_port(netdev);
struct dsa_switch *ds = dp->ds;
u16 tx_vid = dsa_8021q_tx_vid(ds, dp->index);
- u8 pcp = skb->priority;
+ u16 queue_mapping = skb_get_queue_mapping(skb);
+ u8 pcp = netdev_txq_to_tc(netdev, queue_mapping);
/* Transmitting management traffic does not rely upon switch tagging,
* but instead SPI-installed management routes. Part 2 of this
--
2.17.1
^ permalink raw reply related
* [PATCH v3 net-next 5/7] net: dsa: sja1105: Make HOSTPRIO a kernel config
From: Vladimir Oltean @ 2019-09-15 1:53 UTC (permalink / raw)
To: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
vedang.patel, richardcochran
Cc: weifeng.voon, jiri, m-karicheri2, Jose.Abreu, ilias.apalodimas,
jhs, xiyou.wangcong, kurt.kanzenbach, joergen.andreasen, netdev,
Vladimir Oltean
In-Reply-To: <20190915015314.26605-1-olteanv@gmail.com>
Unfortunately with this hardware, there is no way to transmit in-band
QoS hints with management frames (i.e. VLAN PCP is ignored). The traffic
class for these is fixed in the static config (which in turn requires a
reset to change).
With the new ability to add time gates for individual traffic classes,
there is a real danger that the user might unknowingly turn off the
traffic class for PTP, BPDUs, LLDP etc.
So we need to manage this situation the best we can. There isn't any
knob in Linux for this, and changing it at runtime probably isn't worth
it either. So just make the setting loud enough by promoting it to a
Kconfig, which the user can customize to their particular setup.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
drivers/net/dsa/sja1105/Kconfig | 9 +++++++++
drivers/net/dsa/sja1105/sja1105_main.c | 2 +-
2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/sja1105/Kconfig b/drivers/net/dsa/sja1105/Kconfig
index 770134a66e48..e26666b1ecb9 100644
--- a/drivers/net/dsa/sja1105/Kconfig
+++ b/drivers/net/dsa/sja1105/Kconfig
@@ -17,6 +17,15 @@ tristate "NXP SJA1105 Ethernet switch family support"
- SJA1105R (Gen. 2, SGMII, No TT-Ethernet)
- SJA1105S (Gen. 2, SGMII, TT-Ethernet)
+config NET_DSA_SJA1105_HOSTPRIO
+ int "Traffic class for management traffic"
+ range 0 7
+ default 7
+ depends on NET_DSA_SJA1105
+ help
+ Configure the traffic class which will be used for management
+ (link-local) traffic sent and received over switch ports.
+
config NET_DSA_SJA1105_PTP
bool "Support for the PTP clock on the NXP SJA1105 Ethernet switch"
depends on NET_DSA_SJA1105
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index 108f62c27c28..0b0205abc3d2 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -387,7 +387,7 @@ static int sja1105_init_general_params(struct sja1105_private *priv)
/* Priority queue for link-local management frames
* (both ingress to and egress from CPU - PTP, STP etc)
*/
- .hostprio = 7,
+ .hostprio = CONFIG_NET_DSA_SJA1105_HOSTPRIO,
.mac_fltres1 = SJA1105_LINKLOCAL_FILTER_A,
.mac_flt1 = SJA1105_LINKLOCAL_FILTER_A_MASK,
.incl_srcpt1 = false,
--
2.17.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox