* [PATCH v2] ipv4: make tcp_notsent_lowat sysctl knob behave as true unsigned int
From: Pavel Tikhomirov @ 2017-01-09 7:45 UTC (permalink / raw)
To: David S . Miller
Cc: Eric Dumazet, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
Patrick McHardy, netdev, linux-kernel, Konstantin Khorenko,
Pavel Tikhomirov
In-Reply-To: <20161230.152325.1460360247883491150.davem@davemloft.net>
> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-1
> echo 4294967295 > /proc/sys/net/ipv4/tcp_notsent_lowat
-bash: echo: write error: Invalid argument
> echo -2147483648 > /proc/sys/net/ipv4/tcp_notsent_lowat
> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-2147483648
but in documentation we have "tcp_notsent_lowat - UNSIGNED INTEGER"
v2: simplify to just proc_douintvec
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
net/ipv4/sysctl_net_ipv4.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 80bc36b..566cfc5 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -958,7 +958,7 @@ static struct ctl_table ipv4_net_table[] = {
.data = &init_net.ipv4.sysctl_tcp_notsent_lowat,
.maxlen = sizeof(unsigned int),
.mode = 0644,
- .proc_handler = proc_dointvec,
+ .proc_handler = proc_douintvec,
},
#ifdef CONFIG_IP_ROUTE_MULTIPATH
{
--
2.9.3
^ permalink raw reply related
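For readers following the transcript in the commit message, here is a minimal userspace sketch of why a signed-int handler rejects 4294967295 but accepts -2147483648, while an unsigned handler does the opposite. The function names parse_as_int() and parse_as_uint() are hypothetical and only model the range checks; they are not the kernel's proc_dointvec/proc_douintvec code, and a 32-bit int is assumed.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical model of a signed-int handler: accepts [INT_MIN, INT_MAX]. */
static int parse_as_int(const char *s, int *out)
{
	char *end;
	long long v;

	errno = 0;
	v = strtoll(s, &end, 10);
	if (errno || end == s || *end != '\0')
		return -EINVAL;
	if (v < INT_MIN || v > INT_MAX)
		return -EINVAL;	/* e.g. 4294967295 does not fit in an int */
	*out = (int)v;
	return 0;
}

/* Hypothetical model of an unsigned-int handler: accepts [0, UINT_MAX]. */
static int parse_as_uint(const char *s, unsigned int *out)
{
	char *end;
	unsigned long long v;

	/* Require a leading digit so that signs and whitespace are rejected. */
	if (!isdigit((unsigned char)*s))
		return -EINVAL;
	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno || *end != '\0')
		return -EINVAL;
	if (v > UINT_MAX)
		return -EINVAL;
	*out = (unsigned int)v;
	return 0;
}
```

With these helpers, parse_as_int("4294967295", ...) fails with -EINVAL and parse_as_int("-2147483648", ...) succeeds, matching the behaviour shown before the patch; parse_as_uint() accepts the former and rejects the latter.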
* Re: [PATCH net-next v2] net: dsa: make "label" property optional for dsa2
From: Jiri Pirko @ 2017-01-09 7:32 UTC (permalink / raw)
To: Vivien Didelot
Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli,
Andrew Lunn, Uwe Kleine-König, Andrey Smirnov
In-Reply-To: <20170108231552.26995-1-vivien.didelot@savoirfairelinux.com>
Mon, Jan 09, 2017 at 12:15:52AM CET, vivien.didelot@savoirfairelinux.com wrote:
>In the new DTS bindings for DSA (dsa2), the "ethernet" and "link"
>phandles are respectively mandatory and exclusive to CPU port and DSA
>link device tree nodes.
>
>Simplify dsa2.c a bit by checking the presence of such phandle instead
>of checking the redundant "label" property.
>
>Then the Linux philosophy for Ethernet switch ports is to expose them to
>userspace as standard NICs by default. Thus use the standard enumerated
>"eth%d" device name if no "label" property is provided for a user port.
>This keeps subjective net device names out of DTS files.
>
>Here's an example on a ZII Dev Rev B board without "label" properties:
>
> # ip link | grep ': ' | cut -d: -f2
> lo
> eth0
> eth1
> eth2@eth1
> eth3@eth1
> eth4@eth1
> eth5@eth1
> eth6@eth1
> eth7@eth1
> eth8@eth1
> eth9@eth1
> eth10@eth1
> eth11@eth1
> eth12@eth1
>
>If one wants to rename an interface, udev rules can be used as usual, as
>suggested in the switchdev documentation:
>
> # cat /etc/udev/rules.d/90-net-dsa.rules
> SUBSYSTEM=="net", ACTION=="add", ENV{DEVTYPE}=="dsa", NAME="sw$attr{phys_switch_id}p$attr{phys_port_id}"
>
> # ip link | awk '/@eth/ { split($2,a,"@"); print a[1]; }'
> sw00000000p00
> sw00000000p01
> sw00000000p02
> sw01000000p00
> sw01000000p01
> sw01000000p02
> sw02000000p00
> sw02000000p01
> sw02000000p02
> sw02000000p03
> sw02000000p04
>
>Until the printing of netdev_phys_item_id structures is fixed in
>net/core/net-sysfs.c, an external helper can be used like this:
>
> # cat /etc/udev/rules.d/90-net-dsa.rules
> SUBSYSTEM=="net", ACTION=="add", ENV{DEVTYPE}=="dsa", PROGRAM="/lib/udev/dsanitizer $attr{phys_switch_id} $attr{phys_port_id}", NAME="$result"
I know this is kind of confusing, but phys_port_id is meant to
indicate the same physical port shared by multiple netdevices, for
example in the SR-IOV use case. For the switchdev use case, you should
use phys_port_name.
I will add some documentation to the kernel regarding this. But I see that
net/dsa/slave.c already implements .ndo_get_phys_port_id :(
I recently made changes in udev so it names the switch ports according
to phys_port_name, out of the box, without the need for any rules:
https://github.com/systemd/systemd/pull/4506/commits/c960caa0c2a620fc506c6f0f7b6c40eeace48e4d
I guess that it should be enough for you to implement
ndo_get_phys_port_name.
>
> # cat /lib/udev/dsanitizer
> #!/bin/sh
> echo $1 | sed -e 's,^0*,,' -e 's,0*$,,' | xargs printf sw%d
> echo $2 | sed -e 's,^0*,,' | xargs printf p%d
>
> # ip link | awk '/@eth/ { split($2,a,"@"); print a[1]; }'
> sw0p0
> sw0p1
> sw0p2
> sw1p0
> sw1p1
> sw1p2
> sw2p0
> sw2p1
> sw2p2
> sw2p3
> sw2p4
>
>Of course the current behavior is unchanged, and the optional "label"
>property for user ports has precedence over the enumerated name.
>
>Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
>Acked-by: Uwe Kleine-König <uwe@kleine-koenig.org>
>---
> Documentation/devicetree/bindings/net/dsa/dsa.txt | 20 ++++++++-----------
> net/dsa/dsa2.c | 24 ++++-------------------
> 2 files changed, 12 insertions(+), 32 deletions(-)
>
>diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
>index a4a570fb2494..cfe8f64eca4f 100644
>--- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
>+++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
>@@ -34,13 +34,9 @@ Required properties:
>
> Each port children node must have the following mandatory properties:
> - reg : Describes the port address in the switch
>-- label : Describes the label associated with this port, which
>- will become the netdev name. Special labels are
>- "cpu" to indicate a CPU port and "dsa" to
>- indicate an uplink/downlink port between switches in
>- the cluster.
>
>-A port labelled "dsa" has the following mandatory property:
>+An uplink/downlink port between switches in the cluster has the following
>+mandatory property:
>
> - link : Should be a list of phandles to other switch's DSA
> port. This port is used as the outgoing port
>@@ -48,12 +44,17 @@ A port labelled "dsa" has the following mandatory property:
> information must be given, not just the one hop
> routes to neighbouring switches.
>
>-A port labelled "cpu" has the following mandatory property:
>+A CPU port has the following mandatory property:
>
> - ethernet : Should be a phandle to a valid Ethernet device node.
> This host device is what the switch port is
> connected to.
>
>+A user port has the following optional property:
>+
>+- label : Describes the label associated with this port, which
>+ will become the netdev name.
>+
> Port child nodes may also contain the following optional standardised
> properties, described in binding documents:
>
>@@ -107,7 +108,6 @@ linked into one DSA cluster.
>
> switch0port5: port@5 {
> reg = <5>;
>- label = "dsa";
> phy-mode = "rgmii-txid";
> link = <&switch1port6
> &switch2port9>;
>@@ -119,7 +119,6 @@ linked into one DSA cluster.
>
> port@6 {
> reg = <6>;
>- label = "cpu";
> ethernet = <&fec1>;
> fixed-link {
> speed = <100>;
>@@ -165,7 +164,6 @@ linked into one DSA cluster.
>
> switch1port5: port@5 {
> reg = <5>;
>- label = "dsa";
> link = <&switch2port9>;
> phy-mode = "rgmii-txid";
> fixed-link {
>@@ -176,7 +174,6 @@ linked into one DSA cluster.
>
> switch1port6: port@6 {
> reg = <6>;
>- label = "dsa";
> phy-mode = "rgmii-txid";
> link = <&switch0port5>;
> fixed-link {
>@@ -255,7 +252,6 @@ linked into one DSA cluster.
>
> switch2port9: port@9 {
> reg = <9>;
>- label = "dsa";
> phy-mode = "rgmii-txid";
> link = <&switch1port5
> &switch0port5>;
>diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
>index bad119cee2a3..9526bdf2a34a 100644
>--- a/net/dsa/dsa2.c
>+++ b/net/dsa/dsa2.c
>@@ -81,30 +81,12 @@ static void dsa_dst_del_ds(struct dsa_switch_tree *dst,
>
> static bool dsa_port_is_dsa(struct device_node *port)
> {
>- const char *name;
>-
>- name = of_get_property(port, "label", NULL);
>- if (!name)
>- return false;
>-
>- if (!strcmp(name, "dsa"))
>- return true;
>-
>- return false;
>+ return !!of_parse_phandle(port, "link", 0);
> }
>
> static bool dsa_port_is_cpu(struct device_node *port)
> {
>- const char *name;
>-
>- name = of_get_property(port, "label", NULL);
>- if (!name)
>- return false;
>-
>- if (!strcmp(name, "cpu"))
>- return true;
>-
>- return false;
>+ return !!of_parse_phandle(port, "ethernet", 0);
> }
>
> static bool dsa_ds_find_port(struct dsa_switch *ds,
>@@ -268,6 +250,8 @@ static int dsa_user_port_apply(struct device_node *port, u32 index,
> int err;
>
> name = of_get_property(port, "label", NULL);
>+ if (!name)
>+ name = "eth%d";
>
> err = dsa_slave_create(ds, ds->dev, index, name);
> if (err) {
>--
>2.11.0
>
^ permalink raw reply
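The port classification in the quoted dsa2.c hunks can be sketched in userspace as follows. This is only a model under stated assumptions: struct port_node, classify_port() and user_port_name() are hypothetical names, the booleans stand in for "is the phandle present", and the order of the checks is a guess rather than something taken from the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device tree port node. */
struct port_node {
	const char *label;	/* NULL when the "label" property is absent */
	bool has_link;		/* a "link" phandle is present */
	bool has_ethernet;	/* an "ethernet" phandle is present */
};

enum port_kind { PORT_USER, PORT_CPU, PORT_DSA };

/* Classify a port by which phandle it carries, not by its label. */
static enum port_kind classify_port(const struct port_node *p)
{
	if (p->has_link)
		return PORT_DSA;
	if (p->has_ethernet)
		return PORT_CPU;
	return PORT_USER;
}

/* Name template for a user port: the label if given, else "eth%d". */
static const char *user_port_name(const struct port_node *p)
{
	return p->label ? p->label : "eth%d";
}
```

A node with neither phandle and no label is a user port named from the "eth%d" template, while a node with a label such as "lan1" keeps that name.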
* Re: [PATCH 2/3] xen: modify xenstore watch event interface
From: Juergen Gross @ 2017-01-09 7:12 UTC (permalink / raw)
To: Boris Ostrovsky, linux-kernel, xen-devel
Cc: konrad.wilk, roger.pau, wei.liu2, paul.durrant, netdev
In-Reply-To: <c4a181ac-ca47-16bd-5b3d-ea25e413355f@oracle.com>
On 06/01/17 22:57, Boris Ostrovsky wrote:
> On 01/06/2017 10:05 AM, Juergen Gross wrote:
>> Today a Xenstore watch event is delivered via a callback function
>> declared as:
>>
>> void (*callback)(struct xenbus_watch *,
>> const char **vec, unsigned int len);
>>
>> As all watch events only ever come with two parameters (path and token)
>> changing the prototype to:
>>
>> void (*callback)(struct xenbus_watch *,
>> const char *path, const char *token);
>>
>> is the natural thing to do.
>>
>> Apply this change and adapt all users.
>>
>> Cc: konrad.wilk@oracle.com
>> Cc: roger.pau@citrix.com
>> Cc: wei.liu2@citrix.com
>> Cc: paul.durrant@citrix.com
>> Cc: netdev@vger.kernel.org
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>
>
>>
>> @@ -903,24 +902,24 @@ static int process_msg(void)
>> body[msg->hdr.len] = '\0';
>>
>> if (msg->hdr.type == XS_WATCH_EVENT) {
>> - msg->u.watch.vec = split(body, msg->hdr.len,
>> - &msg->u.watch.vec_size);
>> - if (IS_ERR(msg->u.watch.vec)) {
>> - err = PTR_ERR(msg->u.watch.vec);
>> + if (count_strings(body, msg->hdr.len) != 2) {
>> + err = -EINVAL;
>
> xenbus_write_watch() returns -EILSEQ when this type of error is
> encountered so perhaps we should return the same error here.
Not since 9a6161fe73bdd3ae4a1e18421b0b20cb7141f680. :-)
>
> Either way
>
> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Thanks,
Juergen
^ permalink raw reply
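The "exactly two strings" check in the quoted process_msg() hunk can be illustrated with a small userspace sketch. count_strings() is modelled on the helper name used in the diff, but the body here is an assumption, and split_watch_event() is a hypothetical wrapper added only for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count NUL-terminated strings packed back to back in buf[0..len).
 * Assumes the last string is NUL-terminated, as in the quoted hunk
 * where body[msg->hdr.len] is set to '\0'.
 */
static unsigned int count_strings(const char *buf, size_t len)
{
	unsigned int num = 0;
	const char *p;

	for (p = buf; p < buf + len; p += strlen(p) + 1)
		num++;
	return num;
}

/*
 * Split a watch event body into path and token.
 * Returns 0 on success, -1 if the body does not hold exactly two strings.
 */
static int split_watch_event(const char *body, size_t len,
			     const char **path, const char **token)
{
	if (count_strings(body, len) != 2)
		return -1;
	*path = body;
	*token = body + strlen(body) + 1;
	return 0;
}
```

Once the count is known to be two, the callback can receive the path and token directly instead of a vector and a length.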
* UNSUBSCRIBE
From: Vink, Ronald @ 2017-01-09 6:56 UTC (permalink / raw)
To: netfilter-devel@vger.kernel.org
Cc: netdev@vger.kernel.org, netfilter@vger.kernel.org,
netfilter-announce@lists.netfilter.org
-----Original Message-----
From: netfilter-announce [mailto:netfilter-announce-bounces@lists.netfilter.org] On Behalf Of Pablo Neira Ayuso
Sent: Tuesday, 20 December 2016 21:47
To: netfilter-devel@vger.kernel.org
Cc: lwn@lwn.net; netdev@vger.kernel.org; netfilter@vger.kernel.org; netfilter-announce@lists.netfilter.org
Subject: [ANNOUNCE] nftables 0.7 release
Hi!
The Netfilter project proudly presents:
nftables 0.7
This release contains many accumulated bug fixes and new features available up to the (upcoming) Linux 4.10-rc1 kernel release.
* Facilitate migration from iptables to nftables:
At compilation time, you have to pass this option.
# ./configure --with-xtables
And libxtables needs to be installed in your system. This allows you
to list a ruleset containing xt extensions loaded through
iptables-compat-restore tool. The nft tool provides a native
translation for iptables extensions (if available).
* Add new fib expression, which can be used to obtain the output
interface from the route table based on either source or destination
address of a packet. This can be used, e.g., to add reverse path
filtering, i.e. to drop a packet if it did not arrive on the interface
that the route back to its source address uses:
# nft add rule x prerouting fib saddr . iif oif eq 0 drop
Accept only if from eth0:
# nft add rule x prerouting fib saddr . iif oif eq "eth0" accept
Accept if from any valid interface:
# nft add rule x prerouting fib saddr oif accept
Querying of address type is also supported, this can be used
to only accept packets to addresses configured in the same
interface, eg.
# nft add rule x prerouting fib daddr . iif type local accept
It's also possible to use a mark and a verdict map, eg.
# nft add rule x prerouting \
meta mark set 0xdead fib daddr . mark type vmap {
blackhole : drop,
prohibit : drop,
unicast : accept
}
* Support hashing of any arbitrary key combination, eg.
# nft add rule x y \
dnat to jhash ip saddr . tcp dport mod 2 map { \
0 : 192.168.20.100, \
1 : 192.168.30.100 \
}
Another usecase: Set packet marks based on any arbitrary hashing.
* Add number generation support. Useful for round-robin packet mark
setting, eg.
# nft add rule filter prerouting meta mark set numgen inc mod 2
You can also specify an offset to indicate what value you want
to start from.
The modulus provides the scale of the counting sequence. You can
also use this from maps, eg.
# nft add rule nat prerouting \
dnat to numgen inc mod 2 map { 0 : 192.168.10.100, 1 : 192.168.20.200 }
So this is distributing new connections in a round-robin fashion
between 192.168.10.100 and 192.168.20.200. Don't forget the special NAT
chain semantics: Only the first packet evaluates the rule, follow up
packets rely on conntrack to apply the NAT information.
You can also emulate flow distribution with different backend weights
using intervals, eg.
# nft add rule nat prerouting \
dnat to numgen inc mod 10 map { 0-5 : 192.168.10.100, 6-9 : 192.168.20.200 }
* Add quota support, eg.
# nft add rule filter input \
flow table http { ip saddr timeout 60s quota over 50 mbytes } drop
This creates a flow table, where every flow gets a quota of 50
mbytes. You can also use simple rules to enforce quotas, of
course.
* Introduce routing expression, for routing related data with support
for nexthop (i.e. the directly connected IP address that an outgoing
packet is sent to), which can be used either for matching or accounting, eg.
# nft add rule filter postrouting \
ip daddr 192.168.1.0/24 rt nexthop != 192.168.0.1 drop
This will drop any traffic to 192.168.1.0/24 that is not routed via
192.168.0.1.
# nft add rule filter postrouting \
flow table acct { rt nexthop timeout 600s counter }
# nft add rule ip6 filter postrouting \
flow table acct { rt nexthop timeout 600s counter }
These rules count outgoing traffic per nexthop. Note that the timeout
releases an entry if no traffic is seen for this nexthop within 10
minutes.
* Notrack support, to explicitly skip connection tracking for matching
packets, eg.
# nft add rule ip raw prerouting tcp dport { 80, 443 } notrack
So you can skip tracking for http and https traffic.
* Support to set non-byte bound packet header fields, including
checksum adjustment, eg. ip6 ecn set 1.
* Add 'create set' and 'create element' commands, eg.
# nft add set x y { type ipv4_addr\; }
# nft create set x y { type ipv4_addr\; }
<cmdline>:1:1-35: Error: Could not process rule: File exists
create set x y { type ipv4_addr; }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# nft add set x y { type ipv4_addr\; }
#
So 'create' bails out if the set already exists, while 'add'
doesn't, for more ergonomic usage as several users requested on
the mailing list.
* Allow using variable references for set element definitions, eg.
# cat ruleset.nft
define s-ext-2-int = { 10.10.10.10 . 25, 10.10.10.10 . 143 }
table inet forward {
set s-ext-2-int {
type ipv4_addr . inet_service
elements = $s-ext-2-int
}
}
# nft -f ruleset.nft
Useful to improve ruleset maintainability, as you can split out
variable and set definitions from the filtering policy itself.
* Allow using variable definitions in element commands, eg.
define whitelist_v4 = { 1.1.1.1 }
table inet filter {
set whitelist_v4 { type ipv4_addr; }
}
add element inet filter whitelist_v4 $whitelist_v4
* Add support to flush set. You can use this new command to remove all
existing elements in a set, eg.
# nft flush set filter xyz
Note that this requires (upcoming) Linux kernel 4.10-rc versions.
* Inverted set lookups, eg. tcp dport != { 80, 443 }.
* Honor absolute and relative paths via include file, where:
include "./ruleset.nft"
refers to a file in the working directory.
include "ruleset.nft"
refers to a file in the nftables root path (via sysconfdir), and:
include "/etc/nftables/ruleset.nft"
provides an absolute reference to the file that needs to be included.
This also solves an ambiguity if the same file name is used both under
sysconfdir and the current working directory.
* Support log flags, to enable logging TCP sequence and options:
# nft add rule x y log flags tcp sequence,options
... IP options, eg:
# nft add rule x y log flags ip options
... socket UID, eg.
# nft add rule x y log flags skuid
... decode ethernet link layer address, eg.
# nft add rule x y log flags ether
... or simply set on all flags:
# nft add rule x y log flags all
* tc classid parser support, eg.
nft add rule filter forward meta priority abcd:1234
* Allow numeric connlabels, so connlabel still works with undefined
labels, eg. ct label set 2.
* Document log, reject, counter, meta, limit, nat, ct, payload and
queue statements from nft(8) manpage.
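As an illustration of the weighted "numgen inc mod 10" map in the feature list above, here is a minimal userspace model. It is a sketch under assumptions: struct numgen, numgen_inc() and pick_backend() are hypothetical names, and the counter is assumed to start at 0; it does not reflect how nftables implements the expression.

```c
#include <stdint.h>

/* Hypothetical model of "numgen inc mod N": yields 0..N-1, then wraps. */
struct numgen {
	uint32_t next;
	uint32_t mod;
};

static uint32_t numgen_inc(struct numgen *g)
{
	uint32_t v = g->next;

	g->next = (g->next + 1) % g->mod;
	return v;
}

/* Models: map { 0-5 : 192.168.10.100, 6-9 : 192.168.20.200 } */
static const char *pick_backend(uint32_t v)
{
	return v <= 5 ? "192.168.10.100" : "192.168.20.200";
}
```

Over any ten consecutive values, six fall in the 0-5 interval and four in 6-9, so the two backends receive new connections in a 60/40 ratio.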
Bugfixes
========
Not strictly limited to this list below, but some highlights:
* Allow split table definitions, eg.
# cat ruleset.nft
table inet filter {
chain ssh {
type filter hook input priority 0; policy accept;
tcp dport ssh accept;
}
}
table inet filter {
chain input {
type filter hook input priority 1; policy drop;
}
}
# nft -f ruleset.nft
* Use new range expression to represent inverted intervals, eg.
ip saddr != 1.1.1.1-2.2.2.2, since previously generated bytecode was
not correct.
* Solve endianness problems with link layer address.
* Fix parser to keep map flag around on definition.
* Skip timeout attribute in dynamic set updates, otherwise the kernel bails
out with EINVAL.
* Restore parsing of dynamic set element updates.
* The time datatype now uses milliseconds, as the kernel expects.
* Allow numeric interface index numbers, eg. in meta iif, oif.
* Fix monitor trace crash with netdev family.
* Flow table with concatenation fixes.
* Keep element comments around when using set intervals.
* Fixed memory corruption in userspace when deleting lots of elements
in one go via nft -f.
* Several nft internal cache fixes, including cache reset on 'flush
ruleset'.
* Restore parens on right-hand side of relational expression.
* Replace getnameinfo() by internal lookup table, so we don't rely on
/etc/services anymore for service names, so we restrict them to
a well-known set that is supported by our scanner. You can list
service names via 'nft describe tcp dport'.
* Display symbol table values in the right hostbyte order and
decimal/hexadecimal representation.
* Fix a nasty bug in the set interval code triggering huge memory
consumption in userspace for set and map intervals with runtime
updates.
We also added many more tests to our infrastructure to catch regressions.
Syntax updates
==============
Several minor syntax updates. The previous syntax has been preserved for now to ease the transition, but the new one is preferred:
* Consistency grammar fixes: 'snat' and 'dnat' now require 'to', eg.
snat to 1.2.3.4. For consistency with existing statements such as
redirect, masquerade, dup and fwd. Moreover, add colon after 'to' in
'redirect' for consistency with nat and masq statements.
* Allow ct l3proto/protocol without direction since they are unrelated
to the direction.
* Explicit ruleset exportation, eg. nft export ruleset json, for
consistency with other existing ruleset commands.
* Always quote user-defined strings from rules when listing them.
* Support for RFC2732 IPv6 address format with brackets, eg.
dnat to [2001:838:35f:1::]:80
* Allow strings starting with underscores and dots in user-defined
strings, conforming with POSIX.1-2008 (which is simultaneously IEEE
Std 1003.1-2008).
Resources
=========
The nftables code can be obtained from:
* http://netfilter.org/projects/nftables/downloads.html
* ftp://ftp.netfilter.org/pub/nftables
* git://git.netfilter.org/nftables
To build the code, libnftnl 1.0.7 and libmnl >= 1.0.2 are required:
* http://netfilter.org/projects/libnftnl/index.html
* http://netfilter.org/projects/libmnl/index.html
Visit our wikipage for user documentation at:
* http://wiki.nftables.org
For the manpage reference, check nft(8).
In case of bugs and feature requests, file them via:
* https://bugzilla.netfilter.org
Make sure you do not create duplicates, thanks!
Happy holidays!
^ permalink raw reply
* Re: [PATCH net-next] net:add one common config ARCH_WANT_RELAX_ORDER to support relax ordering.
From: maowenan @ 2017-01-09 5:33 UTC (permalink / raw)
To: Alexander Duyck; +Cc: Netdev, Jeff Kirsher
In-Reply-To: <CAKgT0Uencn48k69UYt24B75O--5GL9pC-haYh4do4rxeN4pYgA@mail.gmail.com>
On 2017/1/6 23:41, Alexander Duyck wrote:
> On Fri, Jan 6, 2017 at 1:52 AM, Mao Wenan <maowenan@huawei.com> wrote:
>> Relaxed ordering (RO) is a feature of the 82599 NIC; enabling it can
>> improve performance on some CPU architectures, such as SPARC.
>> Currently the 82599 driver enables RO for only one CPU architecture
>> (SPARC), which does not help other CPU architectures that also
>> need RO.
>> This patch adds a common config option, CONFIG_ARCH_WANT_RELAX_ORDER, to
>> enable RO, and selects it in the sparc Kconfig first.
>>
>> Signed-off-by: Mao Wenan <maowenan@huawei.com>
>> ---
>> arch/sparc/Kconfig | 1 +
>> drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 2 +-
>> 2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
>> index cf4034c..68ac5c7 100644
>> --- a/arch/sparc/Kconfig
>> +++ b/arch/sparc/Kconfig
>> @@ -44,6 +44,7 @@ config SPARC
>> select CPU_NO_EFFICIENT_FFS
>> select HAVE_ARCH_HARDENED_USERCOPY
>> select PROVE_LOCKING_SMALL if PROVE_LOCKING
>> + select ARCH_WANT_RELAX_ORDER
>>
>> config SPARC32
>> def_bool !64BIT
>
>
> I'm pretty sure this is incomplete. I think you need to add a couple
> lines to arch/Kconfig so that the config option itself is listed
> somewhere. You might look at using something like HAVE_CMPXCHG_DOUBLE
> as an example.
>
> - Alex
>
>
Thank you for the comments, I will send a v2 patch soon.
^ permalink raw reply
* [PATCH v2 net-next] net:add one common config ARCH_WANT_RELAX_ORDER to support relax ordering.
From: Mao Wenan @ 2017-01-09 5:32 UTC (permalink / raw)
To: netdev, jeffrey.t.kirsher, alexander.duyck
Relaxed ordering (RO) is a feature of the 82599 NIC; enabling it can
improve performance on some CPU architectures, such as SPARC.
Currently the 82599 driver enables RO for only one CPU architecture
(SPARC), which does not help other CPU architectures that also
need RO.
This patch adds a common config option, CONFIG_ARCH_WANT_RELAX_ORDER, to
enable RO, and selects it in the sparc Kconfig first.
Signed-off-by: Mao Wenan <maowenan@huawei.com>
---
arch/Kconfig | 3 +++
arch/sparc/Kconfig | 1 +
drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 2 +-
3 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index 99839c2..bd04eac 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -781,4 +781,7 @@ config VMAP_STACK
the stack to map directly to the KASAN shadow map using a formula
that is incorrect if the stack is in vmalloc space.
+config ARCH_WANT_RELAX_ORDER
+ bool
+
source "kernel/gcov/Kconfig"
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index cf4034c..68ac5c7 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -44,6 +44,7 @@ config SPARC
select CPU_NO_EFFICIENT_FFS
select HAVE_ARCH_HARDENED_USERCOPY
select PROVE_LOCKING_SMALL if PROVE_LOCKING
+ select ARCH_WANT_RELAX_ORDER
config SPARC32
def_bool !64BIT
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index 094e1d6..c38d50c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -350,7 +350,7 @@ s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
}
IXGBE_WRITE_FLUSH(hw);
-#ifndef CONFIG_SPARC
+#ifndef CONFIG_ARCH_WANT_RELAX_ORDER
/* Disable relaxed ordering */
for (i = 0; i < hw->mac.max_tx_queues; i++) {
u32 regval;
--
2.7.0
^ permalink raw reply related
* [GIT] Networking
From: David Miller @ 2017-01-09 3:38 UTC (permalink / raw)
To: torvalds; +Cc: akpm, netdev, linux-kernel
1) Fix dumping of nft_quota entries, from Pablo Neira Ayuso.
2) Fix out of bounds access in nf_tables discovered by KASAN,
from Florian Westphal.
3) Fix IRQ enabling in dp83867 driver, from Grygorii Strashko.
4) Fix unicast filtering in be2net driver, from Ivan Vecera.
5) tg3_get_stats64() can race with driver close and ethtool
reconfigurations, fix from Michael Chan.
6) Fix error handling when pass limit is reached in bpf code
gen on x86. From Daniel Borkmann.
7) Don't clobber switch ops and use proper MDIO nested reads
and writes in bcm_sf2 driver, from Florian Fainelli.
Please pull, thanks a lot!
The following changes since commit e02003b515e8d95f40f20f213622bb82510873d2:
Merge tag 'xfs-for-linus-4.10-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux (2017-01-04 18:33:35 -0800)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
for you to fetch changes up to 03430fa10b99e95e3a15eb7c00978fb1652f3b24:
Merge branch 'bcm_sf2-fixes' (2017-01-08 22:01:22 -0500)
----------------------------------------------------------------
Artur Molchanov (1):
bridge: netfilter: Fix dropping packets that moving through bridge interface
Daniel Borkmann (1):
bpf: change back to orig prog on too many passes
David Forster (1):
vti6: fix device register to report IFLA_INFO_KIND
David S. Miller (3):
Merge git://git.kernel.org/.../pablo/nf
Merge tag 'mac80211-for-davem-2017-01-06' of git://git.kernel.org/.../jberg/mac80211
Merge branch 'bcm_sf2-fixes'
Florian Fainelli (2):
net: dsa: bcm_sf2: Do not clobber b53_switch_ops
net: dsa: bcm_sf2: Utilize nested MDIO read/write
Florian Westphal (1):
netfilter: nf_tables: fix oob access
Grygorii Strashko (1):
net: phy: dp83867: fix irq generation
Ivan Vecera (2):
be2net: fix accesses to unicast list
be2net: fix unicast list filling
Johannes Berg (1):
nl80211: fix sched scan netlink socket owner destruction
Kweh, Hock Leong (1):
net: stmmac: fix maxmtu assignment to be within valid range
Lendacky, Thomas (1):
amd-xgbe: Fix IRQ processing when running in single IRQ mode
Michael Chan (1):
tg3: Fix race condition in tg3_get_stats64().
Pablo Neira Ayuso (3):
netfilter: nft_quota: reset quota after dump
netfilter: nft_queue: use raw_smp_processor_id()
netfilter: nft_payload: mangle ckecksum if NFT_PAYLOAD_L4CSUM_PSEUDOHDR is set
Paul Moore (1):
netlabel: add CALIPSO to the list of built-in protocols
Sergei Shtylyov (2):
sh_eth: fix EESIPR values for SH77{34|63}
sh_eth: R8A7740 supports packet shecksumming
Xin Long (1):
netfilter: ipt_CLUSTERIP: check duplicate config when initializing
Zhu Yanjun (1):
r8169: fix the typo in the comment
arch/x86/net/bpf_jit_comp.c | 2 ++
drivers/net/dsa/bcm_sf2.c | 11 +++++++++--
drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 2 +-
drivers/net/ethernet/broadcom/tg3.c | 3 +++
drivers/net/ethernet/emulex/benet/be_main.c | 12 ++++--------
drivers/net/ethernet/realtek/r8169.c | 2 +-
drivers/net/ethernet/renesas/sh_eth.c | 5 +++--
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 10 +++++++++-
drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 6 ++++++
drivers/net/phy/dp83867.c | 10 ++++++++++
net/bridge/br_netfilter_hooks.c | 2 +-
net/ipv4/netfilter/ipt_CLUSTERIP.c | 34 +++++++++++++++++++++++-----------
net/ipv6/ip6_vti.c | 2 +-
net/netfilter/nf_tables_api.c | 2 +-
net/netfilter/nft_payload.c | 27 +++++++++++++++++++--------
net/netfilter/nft_queue.c | 2 +-
net/netfilter/nft_quota.c | 26 ++++++++++++++------------
net/netlabel/netlabel_kapi.c | 5 +----
net/wireless/nl80211.c | 16 +++++++---------
19 files changed, 116 insertions(+), 63 deletions(-)
^ permalink raw reply
* Re: [PATCH net 0/2] net: dsa: bcm_sf2: Couple fixes
From: David Miller @ 2017-01-09 3:02 UTC (permalink / raw)
To: f.fainelli; +Cc: netdev, andrew, vivien.didelot
In-Reply-To: <20170108050157.16302-1-f.fainelli@gmail.com>
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Sat, 7 Jan 2017 21:01:55 -0800
> Here are a couple of fixes for bcm_sf2, please queue these up for
> -stable as well, thank you very much!
Series applied and queued up for -stable, thanks.
^ permalink raw reply
* Re: [PATCH V4 net-next 1/3] vhost: better detection of available buffers
From: Jason Wang @ 2017-01-09 2:59 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: kvm, netdev, virtualization, wexu, stefanha
In-Reply-To: <20170106214903-mutt-send-email-mst@kernel.org>
On 2017-01-07 03:55, Michael S. Tsirkin wrote:
> On Fri, Jan 06, 2017 at 10:13:15AM +0800, Jason Wang wrote:
>> This patch tries to do several tweaks on vhost_vq_avail_empty() for a
>> better performance:
>>
>> - check cached avail index first which could avoid userspace memory access.
>> - using unlikely() for the failure of userspace access
>> - check vq->last_avail_idx instead of cached avail index as the last
>> step.
>>
>> This patch is needed for batching support, which needs to peek whether
>> or not there are still available buffers in the ring.
>>
>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>> drivers/vhost/vhost.c | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index d643260..9f11838 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -2241,11 +2241,15 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>> __virtio16 avail_idx;
>> int r;
>>
>> + if (vq->avail_idx != vq->last_avail_idx)
>> + return false;
>> +
>> r = vhost_get_user(vq, avail_idx, &vq->avail->idx);
>> - if (r)
>> + if (unlikely(r))
>> return false;
>> + vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
>>
>> - return vhost16_to_cpu(vq, avail_idx) == vq->avail_idx;
>> + return vq->avail_idx == vq->last_avail_idx;
>> }
>> EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
> So again, this did not address the issue I pointed out in v1:
> if we have 1 buffer in RX queue and
> that is not enough to store the whole packet,
> vhost_vq_avail_empty returns false, then we re-read
> the descriptors again and again.
>
> You have saved a single index access but not the more expensive
> descriptor access.
It seems not. If I understand the code correctly, in this case
get_rx_bufs() will return zero, and we will try to enable the rx kick and
exit the loop.
Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
^ permalink raw reply
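The control flow of the patched vhost_vq_avail_empty() quoted above can be modelled in userspace as follows. This is a sketch, not the vhost code: struct vq_model and its fields are hypothetical stand-ins, a NULL guest_idx models a failed userspace access, and guest_reads is added only to make the saved memory access visible.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the virtqueue fields the patch touches. */
struct vq_model {
	uint16_t avail_idx;		/* cached copy of the guest's avail index */
	uint16_t last_avail_idx;	/* next entry the host will consume */
	const uint16_t *guest_idx;	/* models vq->avail->idx; NULL = access fails */
	unsigned int guest_reads;	/* number of guest memory accesses made */
};

static bool vq_avail_empty(struct vq_model *vq)
{
	/* Fast path: the cached index already shows pending buffers. */
	if (vq->avail_idx != vq->last_avail_idx)
		return false;

	/* Slow path: refresh the cached index from guest memory. */
	vq->guest_reads++;
	if (vq->guest_idx == NULL)
		return false;	/* failed access reports "not empty", as in the patch */
	vq->avail_idx = *vq->guest_idx;

	return vq->avail_idx == vq->last_avail_idx;
}
```

After the cache has been refreshed once, further calls return false without touching guest memory until last_avail_idx catches up with the cached index.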
* Re: [PATCH net-next v2] net: dsa: make "label" property optional for dsa2
From: Vivien Didelot @ 2017-01-09 2:56 UTC (permalink / raw)
To: Andrew Lunn
Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli,
Uwe Kleine-König, Andrey Smirnov, Jiri Pirko
In-Reply-To: <20170108233019.GA25588@lunn.ch>
Hi Andrew,
Andrew Lunn <andrew@lunn.ch> writes:
>> Until the printing of netdev_phys_item_id structures is fixed in
>> net/core/net-sysfs.c, an external helper can be used like this:
>
> As Florian pointed out, this cannot be changed. It is now part of the
> ABI. We have to live with it printing little endian numbers as big
> endian.
I totally understand the fact that ABI must not be changed. However we
should be aware that the current phys_switch_id of DSA is broken.
In addition to the minor issue of being hardly useable, it does not meet
the requirement described in the switchdev documentation of being unique
on a system. A switch ID in DSA is currently unique only to a DSA tree.
A system with two disjoint switch trees will have two switches with a
phys_switch_id of "00000000".
> Rather than recommending something, it might be better to point to the
> Free Desktop "Predictable Network Interface Names" which is what most
> people will end up with, if they rename:
>
> https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/
>
> It would also be good to test on a recent systemd system and see what
> happens. What names does it pick?
Note that the udev rules I gave in this commit message were only there
as examples of renaming DSA slave interfaces from userspace. This is
orthogonal to the purpose of this patch.
Thanks,
Vivien
^ permalink raw reply
* Re: [PATCH V4 net-next 3/3] tun: rx batching
From: Jason Wang @ 2017-01-09 2:39 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: kvm, netdev, virtualization, wexu, stefanha
In-Reply-To: <20170106214323-mutt-send-email-mst@kernel.org>
On 2017-01-07 03:47, Michael S. Tsirkin wrote:
>> +static int tun_get_coalesce(struct net_device *dev,
>> + struct ethtool_coalesce *ec)
>> +{
>> + struct tun_struct *tun = netdev_priv(dev);
>> +
>> + ec->rx_max_coalesced_frames = tun->rx_batched;
>> +
>> + return 0;
>> +}
>> +
>> +static int tun_set_coalesce(struct net_device *dev,
>> + struct ethtool_coalesce *ec)
>> +{
>> + struct tun_struct *tun = netdev_priv(dev);
>> +
>> + if (ec->rx_max_coalesced_frames > NAPI_POLL_WEIGHT)
>> + return -EINVAL;
> So what should userspace do? Keep trying until it succeeds?
> I think it's better to just use NAPI_POLL_WEIGHT instead and DTRT here.
>
Well, looking at how set_coalesce is implemented in other drivers,
-EINVAL is usually returned when the user gives a value that exceeds the
limit. For tuntap, what is missing here is probably just some
documentation for coalescing in tuntap.txt (or an ethtool extension that
returns the max value). This seems much better than silently reducing
the value to the limit.
Thanks
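
The two policies discussed above can be contrasted in a small userspace sketch. POLL_WEIGHT stands in for NAPI_POLL_WEIGHT (its value of 64 is an assumption here), and both function names are hypothetical; neither is the tun driver's code.

```c
#include <errno.h>

#define POLL_WEIGHT 64	/* stand-in for NAPI_POLL_WEIGHT; value assumed */

/* Policy 1: reject out-of-range requests and leave the setting untouched. */
int set_batch_reject(unsigned int *rx_batched, unsigned int requested)
{
	if (requested > POLL_WEIGHT)
		return -EINVAL;

	*rx_batched = requested;
	return 0;
}

/* Policy 2: silently clamp the request to the maximum. */
int set_batch_clamp(unsigned int *rx_batched, unsigned int requested)
{
	*rx_batched = requested > POLL_WEIGHT ? POLL_WEIGHT : requested;
	return 0;
}
```

With the first policy the caller learns that the request was refused but not what the limit is, which is why documenting the maximum matters. With the second, the call always succeeds and the caller has to read the value back to see what was actually applied.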
^ permalink raw reply
* Re: [PATCH net-next 0/6] convert tc_verd to integer bitfields
From: David Miller @ 2017-01-09 2:10 UTC (permalink / raw)
To: willemdebruijn.kernel
Cc: netdev, fw, dborkman, jhs, alexei.starovoitov, eric.dumazet,
willemb
In-Reply-To: <20170107220638.61314-1-willemdebruijn.kernel@gmail.com>
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Sat, 7 Jan 2017 17:06:32 -0500
> The skb tc_verd field takes up two bytes but uses far fewer bits.
> Convert the remaining use cases to bitfields that fit in existing
> holes (depending on config options) and potentially save the two
> bytes in struct sk_buff.
...
Series applied, thanks!
^ permalink raw reply
* Re: [PATCH net-next] net: dsa: select NET_SWITCHDEV
From: Randy Dunlap @ 2017-01-09 1:32 UTC (permalink / raw)
To: Florian Fainelli, Vivien Didelot, netdev
Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn, Jiri Pirko
In-Reply-To: <ae5c2999-98a3-1b88-dd1d-970d958f6d7a@gmail.com>
On 01/08/17 17:18, Florian Fainelli wrote:
> On 01/08/2017 03:17 PM, Vivien Didelot wrote:
>> DSA wraps SWITCHDEV, thus select it instead of depending on it.
>>
>> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
>
> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
>
But when CONFIG_INET is not enabled, the patch causes this warning:
warning: (NET_DSA) selects NET_SWITCHDEV which has unmet direct dependencies (NET && INET)
--
~Randy
^ permalink raw reply
* Re: [PATCH v2] phy state machine: failsafe leave invalid RUNNING state
From: Florian Fainelli @ 2017-01-09 1:24 UTC (permalink / raw)
To: Zefir Kurtisi, netdev; +Cc: andrew
In-Reply-To: <1483701288-14019-1-git-send-email-zefir.kurtisi@neratec.com>
On 01/06/2017 03:14 AM, Zefir Kurtisi wrote:
> While in RUNNING state, phy_state_machine() checks for link changes by
> comparing phydev->link before and after calling phy_read_status().
> This works as long as it is guaranteed that phydev->link is never
> changed outside the phy_state_machine().
>
> If in some setups this happens, it causes the state machine to miss
> a link loss and remain RUNNING despite phydev->link being 0.
>
> This has been observed running a dsa setup with a process continuously
> polling the link states over ethtool each second (SNMPD RFC-1213
> agent). Disconnecting the link on a phy followed by a ETHTOOL_GSET
> causes dsa_slave_get_settings() / dsa_slave_get_link_ksettings() to
> call phy_read_status() and with that modify the link status - and
> with that bricking the phy state machine.
>
> This patch adds a fail-safe check while in RUNNING, which causes to
> move to CHANGELINK when the link is gone and we are still RUNNING.
>
> Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
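
The failure mode described in the quoted commit message can be modelled in a few lines. This is a simplified sketch, not the kernel's state machine: struct phy_model and both functions are hypothetical, and the "failsafe" flag only switches the extra check on or off for comparison.

```c
enum phy_state { PHY_RUNNING, PHY_CHANGELINK };

/* Hypothetical, stripped-down model; not the kernel's struct phy_device. */
struct phy_model {
	enum phy_state state;
	int link;	/* last link status anyone read from the hardware */
};

/* Stands in for phy_read_status(): refresh the cached link status. */
void model_read_status(struct phy_model *phy, int hw_link)
{
	phy->link = hw_link;
}

/* One pass of the state machine while RUNNING. */
void model_state_machine(struct phy_model *phy, int hw_link, int failsafe)
{
	int old_link = phy->link;

	if (phy->state != PHY_RUNNING)
		return;

	model_read_status(phy, hw_link);

	if (old_link != phy->link)
		phy->state = PHY_CHANGELINK;	/* normal change detection */
	else if (failsafe && !phy->link)
		phy->state = PHY_CHANGELINK;	/* RUNNING without link is invalid */
}
```

If another caller runs model_read_status() after the cable is pulled, the next pass sees the same value before and after its own read, so without the fail-safe branch the model stays in PHY_RUNNING with link == 0.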
--
Florian
^ permalink raw reply
* Re: [PATCH net-next] net: dsa: select NET_SWITCHDEV
From: Florian Fainelli @ 2017-01-09 1:18 UTC (permalink / raw)
To: Vivien Didelot, netdev
Cc: linux-kernel, kernel, David S. Miller, Andrew Lunn, Jiri Pirko
In-Reply-To: <20170108231724.27398-1-vivien.didelot@savoirfairelinux.com>
On 01/08/2017 03:17 PM, Vivien Didelot wrote:
> DSA wraps SWITCHDEV, thus select it instead of depending on it.
>
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
--
Florian
^ permalink raw reply
* Re: [PATCH v2] PCI: lock each enable/disable num_vfs operation in sysfs
From: Gavin Shan @ 2017-01-08 23:45 UTC (permalink / raw)
To: Emil Tantilov
Cc: linux-pci, intel-wired-lan, alexander.h.duyck, netdev,
linux-kernel
In-Reply-To: <20170106215908.20736.34632.stgit@localhost6.localdomain6>
On Fri, Jan 06, 2017 at 01:59:08PM -0800, Emil Tantilov wrote:
>Enabling/disabling SRIOV via sysfs by echo-ing multiple values
>simultaneously:
>
>echo 63 > /sys/class/net/ethX/device/sriov_numvfs&
>echo 63 > /sys/class/net/ethX/device/sriov_numvfs
>
>sleep 5
>
>echo 0 > /sys/class/net/ethX/device/sriov_numvfs&
>echo 0 > /sys/class/net/ethX/device/sriov_numvfs
>
>Results in the following bug:
>
>kernel BUG at drivers/pci/iov.c:495!
>invalid opcode: 0000 [#1] SMP
>CPU: 1 PID: 8050 Comm: bash Tainted: G W 4.9.0-rc7-net-next #2092
>RIP: 0010:[<ffffffff813b1647>]
> [<ffffffff813b1647>] pci_iov_release+0x57/0x60
>
>Call Trace:
> [<ffffffff81391726>] pci_release_dev+0x26/0x70
> [<ffffffff8155be6e>] device_release+0x3e/0xb0
> [<ffffffff81365ee7>] kobject_cleanup+0x67/0x180
> [<ffffffff81365d9d>] kobject_put+0x2d/0x60
> [<ffffffff8155bc27>] put_device+0x17/0x20
> [<ffffffff8139c08a>] pci_dev_put+0x1a/0x20
> [<ffffffff8139cb6b>] pci_get_dev_by_id+0x5b/0x90
> [<ffffffff8139cca5>] pci_get_subsys+0x35/0x40
> [<ffffffff8139ccc8>] pci_get_device+0x18/0x20
> [<ffffffff8139ccfb>] pci_get_domain_bus_and_slot+0x2b/0x60
> [<ffffffff813b09e7>] pci_iov_remove_virtfn+0x57/0x180
> [<ffffffff813b0b95>] pci_disable_sriov+0x65/0x140
> [<ffffffffa00a1af7>] ixgbe_disable_sriov+0xc7/0x1d0 [ixgbe]
> [<ffffffffa00a1e9d>] ixgbe_pci_sriov_configure+0x3d/0x170 [ixgbe]
> [<ffffffff8139d28c>] sriov_numvfs_store+0xdc/0x130
>...
>RIP [<ffffffff813b1647>] pci_iov_release+0x57/0x60
>
>Use the existing mutex lock to protect each enable/disable operation.
>
>-v2: move the existing lock from protecting the config of the IOV bus
>to protecting the writes to sriov_numvfs in sysfs without maintaining
>a "locked" version of pci_iov_add/remove_virtfn().
>As suggested by Gavin Shan <gwshan@linux.vnet.ibm.com>
>
>CC: Alexander Duyck <alexander.h.duyck@intel.com>
>Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
>---
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
^ permalink raw reply
* Re: [PATCH net-next] net: dsa: select NET_SWITCHDEV
From: Andrew Lunn @ 2017-01-08 23:31 UTC (permalink / raw)
To: Vivien Didelot
Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli,
Jiri Pirko
In-Reply-To: <20170108231724.27398-1-vivien.didelot@savoirfairelinux.com>
On Sun, Jan 08, 2017 at 06:17:24PM -0500, Vivien Didelot wrote:
> DSA wraps SWITCHDEV, thus select it instead of depending on it.
>
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Andrew
^ permalink raw reply
* Re: [PATCH net-next v2] net: dsa: make "label" property optional for dsa2
From: Andrew Lunn @ 2017-01-08 23:30 UTC (permalink / raw)
To: Vivien Didelot
Cc: netdev, linux-kernel, kernel, David S. Miller, Florian Fainelli,
Uwe Kleine-König, Andrey Smirnov, Jiri Pirko
In-Reply-To: <20170108231552.26995-1-vivien.didelot@savoirfairelinux.com>
> Until the printing of netdev_phys_item_id structures is fixed in
> net/core/net-sysfs.c, an external helper can be used like this:
Hi Vivien
As Florian pointed out, this cannot be changed. It is now part of the
ABI. We have to live with it printing little endian numbers as big
endian.
> # cat /etc/udev/rules.d/90-net-dsa.rules
> SUBSYSTEM=="net", ACTION=="add", ENV{DEVTYPE}=="dsa", PROGRAM="/lib/udev/dsanitizer $attr{phys_switch_id} $attr{phys_port_id}", NAME="$result"
>
> # cat /lib/udev/dsanitizer
> #!/bin/sh
> echo $1 | sed -e 's,^0*,,' -e 's,0*$,,' | xargs printf sw%d
> echo $2 | sed -e 's,^0*,,' | xargs printf p%d
>
> # ip link | awk '/@eth/ { split($2,a,"@"); print a[1]; }'
> sw0p0
> sw0p1
> sw0p2
> sw1p0
> sw1p1
> sw1p2
> sw2p0
> sw2p1
> sw2p2
> sw2p3
> sw2p4
Rather than recommending something, it might be better to point to the
Free Desktop "Predictable Network Interface Names" which is what most
people will end up with, if they rename:
https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/
It would also be good to test on a recent systemd system and see what
happens. What names does it pick?
Andrew
^ permalink raw reply
* Re: [PATCH v5] net: stmmac: fix maxmtu assignment to be within valid range
From: David Miller @ 2017-01-08 23:20 UTC (permalink / raw)
To: hock.leong.kweh
Cc: Joao.Pinto, peppe.cavallaro, seraphin.bonnaffe, jarod,
andy.shevchenko, alexandre.torgue, manabian, niklas.cassel, johan,
pavel, lars.persson, netdev, linux-kernel
In-Reply-To: <1483781523-14334-1-git-send-email-hock.leong.kweh@intel.com>
From: "Kweh, Hock Leong" <hock.leong.kweh@intel.com>
Date: Sat, 7 Jan 2017 17:32:03 +0800
> From: "Kweh, Hock Leong" <hock.leong.kweh@intel.com>
>
> There is no checking valid value of maxmtu when getting it from
> device tree. This resolution added the checking condition to
> ensure the assignment is made within a valid range.
>
> Signed-off-by: Kweh, Hock Leong <hock.leong.kweh@intel.com>
Applied, thank you.
^ permalink raw reply
* [PATCH net-next] net: dsa: select NET_SWITCHDEV
From: Vivien Didelot @ 2017-01-08 23:17 UTC (permalink / raw)
To: netdev
Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
Andrew Lunn, Jiri Pirko, Vivien Didelot
DSA wraps SWITCHDEV, thus select it instead of depending on it.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
---
net/dsa/Kconfig | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index 2ae9bb357523..675acbf1502d 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -6,7 +6,8 @@ config HAVE_NET_DSA
config NET_DSA
tristate "Distributed Switch Architecture"
- depends on HAVE_NET_DSA && NET_SWITCHDEV
+ depends on HAVE_NET_DSA
+ select NET_SWITCHDEV
select PHYLIB
---help---
Say Y if you want to enable support for the hardware switches supported
--
2.11.0
^ permalink raw reply related
* Re: [PATCH v2 03/12] net: ethernet: aquantia: Add ring support code
From: Rami Rosen @ 2017-01-08 23:18 UTC (permalink / raw)
To: Alexander Loktionov
Cc: Netdev, David VomLehn, Simon Edelhaus, Dmitrii Tarakanov,
Pavel Belous
In-Reply-To: <7d0b8bdb9c3ec1d2cbfb136796dbfc66e0ab535d.1483689029.git.vomlehn@texas.net>
Hi, Alexander,
After a brief review, I have the following minor comments:
...
...
> diff --git a/drivers/net/ethernet/aquantia/aq_ring.c b/drivers/net/ethernet/aquantia/aq_ring.c
> new file mode 100644
> index 0000000..a7ef6aa
> --- /dev/null
> +++ b/drivers/net/ethernet/aquantia/aq_ring.c
> @@ -0,0 +1,380 @@
Should be aq_ring.c and not aq_pci_ring.c
> +
> +/* File aq_pci_ring.c: Definition of functions for Rx/Tx rings. */
> +
The aq_nic_cfg parameter is not used, it should be removed:
> +struct aq_ring_s *aq_ring_tx_alloc(struct aq_ring_s *self,
> + struct aq_nic_s *aq_nic,
> + unsigned int idx,
> + struct aq_nic_cfg_s *aq_nic_cfg)
> +{
> + int err = 0;
> +
> + if (!self) {
> + err = -ENOMEM;
> + goto err_exit;
> + }
> + self->aq_nic = aq_nic;
> + self->idx = idx;
> + self->size = aq_nic_cfg->txds;
> + self->dx_size = aq_nic_cfg->aq_hw_caps->txd_size;
> +
> + self = aq_ring_alloc(self, aq_nic, aq_nic_cfg);
> + if (!self) {
> + err = -ENOMEM;
> + goto err_exit;
> + }
> +
> +err_exit:
> + if (err < 0) {
> + aq_ring_free(self);
> + self = NULL;
> + }
> + return self;
> +}
> +
Shouldn't the return type be void for the next two methods?
> +int aq_ring_init(struct aq_ring_s *self)
> +{
> + self->hw_head = 0;
> + self->sw_head = 0;
> + self->sw_tail = 0;
> + return 0;
> +}
> +
> +int aq_ring_deinit(struct aq_ring_s *self)
> +{
> + return 0;
> +}
> +
> +void aq_ring_free(struct aq_ring_s *self)
> +{
> + if (!self)
I would prefer a plain "return" here, removing the err_exit label
altogether, but it is up to you:
> + goto err_exit;
> +
> + kfree(self->buff_ring);
> +
> + if (self->dx_ring)
> + dma_free_coherent(aq_nic_get_dev(self->aq_nic),
> + self->size * self->dx_size, self->dx_ring,
> + self->dx_ring_pa);
> +
> +err_exit:;
> +}
> +
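For reference, the early-return form suggested for aq_ring_free() could look roughly like the userspace sketch below. struct ring_sketch is a hypothetical, trimmed-down stand-in for struct aq_ring_s, free() stands in for both kfree() and dma_free_coherent(), and resetting the pointers is an addition made only so the effect is observable.

```c
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for struct aq_ring_s. */
struct ring_sketch {
	void *buff_ring;
	void *dx_ring;
};

void ring_sketch_free(struct ring_sketch *self)
{
	if (!self)
		return;		/* plain early return, no err_exit label */

	free(self->buff_ring);	/* stands in for kfree() */
	self->buff_ring = NULL;	/* added for the sketch only */

	if (self->dx_ring) {
		free(self->dx_ring);	/* stands in for dma_free_coherent() */
		self->dx_ring = NULL;
	}
}
```
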
Shouldn't the return type of the following method be void?
> +
> +int aq_ring_tx_clean(struct aq_ring_s *self)
> +{
> + struct device *dev = aq_nic_get_dev(self->aq_nic);
> + struct net_device *ndev = aq_nic_get_ndev(self->aq_nic);
> +
> + for (; self->sw_head != self->hw_head;
> + self->sw_head = aq_ring_next_dx(self, self->sw_head)) {
> + struct aq_ring_buff_s *buff = &self->buff_ring[self->sw_head];
> +
> + ++self->stats.tx_packets;
> + ++ndev->stats.tx_packets;
> + ndev->stats.tx_bytes += buff->len;
> +
> + if (likely(buff->is_mapped)) {
> + if (unlikely(buff->is_sop))
> + dma_unmap_single(dev, buff->pa, buff->len,
> + DMA_TO_DEVICE);
> + else
> + dma_unmap_page(dev, buff->pa, buff->len,
> + DMA_TO_DEVICE);
> + }
> +
> + if (unlikely(buff->is_eop))
> + dev_kfree_skb_any(buff->skb);
> + }
> +
> + if (aq_ring_avail_dx(self) > AQ_CFG_SKB_FRAGS_MAX)
> + aq_nic_ndev_queue_start(self->aq_nic, self->idx);
> +
> + return 0;
> +}
> +
The "err" variable in aq_ring_rx_clean() is meaningless and, given the
current implementation of this method, it should be removed. You set it
to 0 at the beginning, then later on you assign 0 to it again under
certain conditions, and that's it; there is no other assignment. Maybe
the second assignment should have been some value other than 0, but as
it stands, the "err" variable has no meaning.
> +int aq_ring_rx_clean(struct aq_ring_s *self, int *work_done, int budget)
> +{
> + struct net_device *ndev = aq_nic_get_ndev(self->aq_nic);
> + int err = 0;
> + bool is_rsc_completed = true;
> +
> + for (; (self->sw_head != self->hw_head) && budget;
> + self->sw_head = aq_ring_next_dx(self, self->sw_head),
> + --budget, ++(*work_done)) {
> + struct aq_ring_buff_s *buff = &self->buff_ring[self->sw_head];
> + struct sk_buff *skb = NULL;
> + unsigned int next_ = 0U;
> + unsigned int i = 0U;
> + struct aq_ring_buff_s *buff_ = NULL;
> +
> + if (buff->is_error) {
> + __free_pages(buff->page, 0);
> + continue;
> + }
> +
> + if (buff->is_cleaned)
> + continue;
> +
> + ++self->stats.rx_packets;
> + ++ndev->stats.rx_packets;
> + ndev->stats.rx_bytes += buff->len;
> +
> + if (!buff->is_eop) {
> + for (next_ = buff->next,
> + buff_ = &self->buff_ring[next_]; true;
> + next_ = buff_->next,
> + buff_ = &self->buff_ring[next_]) {
> + is_rsc_completed =
> + aq_ring_dx_in_range(self->sw_head,
> + next_,
> + self->hw_head);
> +
> + if (unlikely(!is_rsc_completed)) {
> + is_rsc_completed = false;
> + break;
> + }
> +
> + if (buff_->is_eop)
> + break;
> + }
> +
> + if (!is_rsc_completed) {
> + err = 0;
> + goto err_exit;
> + }
> + }
> +
> + skb = netdev_alloc_skb(ndev, ETH_HLEN + AQ_CFG_IP_ALIGN);
> +
> + skb_reserve(skb, AQ_CFG_IP_ALIGN);
> + skb_put(skb, ETH_HLEN);
> + memcpy(skb->data, page_address(buff->page), ETH_HLEN);
> +
> + skb_add_rx_frag(skb, 0, buff->page, ETH_HLEN,
> + buff->len - ETH_HLEN,
> + SKB_TRUESIZE(buff->len - ETH_HLEN));
> + if (!buff->is_eop) {
> + for (i = 1U, next_ = buff->next,
> + buff_ = &self->buff_ring[next_]; true;
> + next_ = buff_->next,
> + buff_ = &self->buff_ring[next_], ++i) {
> + skb_add_rx_frag(skb, i, buff_->page, 0,
> + buff_->len,
> + SKB_TRUESIZE(buff->len -
> + ETH_HLEN));
> + buff_->is_cleaned = 1;
> +
> + if (buff_->is_eop)
> + break;
> + }
> + }
> +
> + skb->dev = ndev;
> +
> + skb->protocol = eth_type_trans(skb, ndev);
> + if (unlikely(buff->is_cso_err)) {
> + ++self->stats.rx_errors;
> + __skb_mark_checksum_bad(skb);
> + } else {
> + if (buff->is_ip_cso) {
> + __skb_incr_checksum_unnecessary(skb);
> + if (buff->is_udp_cso || buff->is_tcp_cso)
> + __skb_incr_checksum_unnecessary(skb);
> + } else {
> + skb->ip_summed = CHECKSUM_NONE;
> + }
> + }
> +
> + skb_set_hash(skb, buff->rss_hash,
> + buff->is_hash_l4 ? PKT_HASH_TYPE_L4 :
> + PKT_HASH_TYPE_NONE);
> +
> + skb_record_rx_queue(skb, self->idx);
> +
> + netif_receive_skb(skb);
> + }
> +
> +err_exit:
> + return err;
> +}
Should the following two methods return void?
> +
> +int aq_ring_tx_drop(struct aq_ring_s *self)
> +{
> + for (; self->sw_head != self->sw_tail;
> + self->sw_head = aq_ring_next_dx(self, self->sw_head)) {
> + struct aq_ring_buff_s *buff = &self->buff_ring[self->sw_head];
> + struct device *ndev = aq_nic_get_dev(self->aq_nic);
> +
> + if (likely(buff->is_mapped)) {
> + if (unlikely(buff->is_sop))
> + dma_unmap_single(ndev, buff->pa, buff->len,
> + DMA_TO_DEVICE);
> + else
> + dma_unmap_page(ndev, buff->pa, buff->len,
> + DMA_TO_DEVICE);
> + }
> +
> + if (unlikely(buff->is_eop))
> + dev_kfree_skb_any(buff->skb);
> + }
> +
> + return 0;
> +}
> +
> +int aq_ring_rx_drop(struct aq_ring_s *self)
> +{
> + for (; self->sw_head != self->sw_tail;
> + self->sw_head = aq_ring_next_dx(self, self->sw_head)) {
> + struct aq_ring_buff_s *buff = &self->buff_ring[self->sw_head];
> +
> + dma_unmap_page(aq_nic_get_dev(self->aq_nic), buff->pa,
> + AQ_CFG_RX_FRAME_MAX, DMA_FROM_DEVICE);
> +
> + __free_pages(buff->page, 0);
> + }
> +
> + return 0;
> +}
> +
> diff --git a/drivers/net/ethernet/aquantia/aq_ring.h b/drivers/net/ethernet/aquantia/aq_ring.h
> new file mode 100644
> index 0000000..8f7e16e
> --- /dev/null
> +++ b/drivers/net/ethernet/aquantia/aq_ring.h
> @@ -0,0 +1,147 @@
> +/*
> + * aQuantia Corporation Network Driver
> + * Copyright (C) 2014-2016 aQuantia Corporation. All rights reserved
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + */
> +
File name should be aq_ring.h:
> +/* File aq_pci_ring.h: Declaration of functions for Rx/Tx rings. */
> +
> +#ifndef AQ_RING_H
> +#define AQ_RING_H
> +
> +#include "aq_common.h"
> +
> +struct page;
> +
> +/* TxC SOP DX EOP
> + * +----------+----------+----------+-----------
> + * 8bytes|len l3,l4 | pa | pa | pa
> + * +----------+----------+----------+-----------
> + * 4/8bytes|len pkt |len pkt | | skb
> + * +----------+----------+----------+-----------
> + * 4/8bytes|is_txc |len,flags |len |len,is_eop
> + * +----------+----------+----------+-----------
> + *
> + * This aq_ring_buff_s doesn't have endianness dependency.
Typo: chache->cache
> + * It is __packed for chache line optimisations.
Regards,
Rami Rosen
^ permalink raw reply
* [PATCH net-next v2] net: dsa: make "label" property optional for dsa2
From: Vivien Didelot @ 2017-01-08 23:15 UTC (permalink / raw)
To: netdev
Cc: linux-kernel, kernel, David S. Miller, Florian Fainelli,
Andrew Lunn, Uwe Kleine-König, Andrey Smirnov, Jiri Pirko,
Vivien Didelot
In the new DTS bindings for DSA (dsa2), the "ethernet" and "link"
phandles are respectively mandatory and exclusive to CPU port and DSA
link device tree nodes.
Simplify dsa2.c a bit by checking the presence of such phandle instead
of checking the redundant "label" property.
Then the Linux philosophy for Ethernet switch ports is to expose them to
userspace as standard NICs by default. Thus use the standard enumerated
"eth%d" device name if no "label" property is provided for a user port.
This spares DTS files from carrying subjective net device names.
Here's an example on a ZII Dev Rev B board without "label" properties:
# ip link | grep ': ' | cut -d: -f2
lo
eth0
eth1
eth2@eth1
eth3@eth1
eth4@eth1
eth5@eth1
eth6@eth1
eth7@eth1
eth8@eth1
eth9@eth1
eth10@eth1
eth11@eth1
eth12@eth1
If one wants to rename an interface, udev rules can be used as usual, as
suggested in the switchdev documentation:
# cat /etc/udev/rules.d/90-net-dsa.rules
SUBSYSTEM=="net", ACTION=="add", ENV{DEVTYPE}=="dsa", NAME="sw$attr{phys_switch_id}p$attr{phys_port_id}"
# ip link | awk '/@eth/ { split($2,a,"@"); print a[1]; }'
sw00000000p00
sw00000000p01
sw00000000p02
sw01000000p00
sw01000000p01
sw01000000p02
sw02000000p00
sw02000000p01
sw02000000p02
sw02000000p03
sw02000000p04
Until the printing of netdev_phys_item_id structures is fixed in
net/core/net-sysfs.c, an external helper can be used like this:
# cat /etc/udev/rules.d/90-net-dsa.rules
SUBSYSTEM=="net", ACTION=="add", ENV{DEVTYPE}=="dsa", PROGRAM="/lib/udev/dsanitizer $attr{phys_switch_id} $attr{phys_port_id}", NAME="$result"
# cat /lib/udev/dsanitizer
#!/bin/sh
echo $1 | sed -e 's,^0*,,' -e 's,0*$,,' | xargs printf sw%d
echo $2 | sed -e 's,^0*,,' | xargs printf p%d
# ip link | awk '/@eth/ { split($2,a,"@"); print a[1]; }'
sw0p0
sw0p1
sw0p2
sw1p0
sw1p1
sw1p2
sw2p0
sw2p1
sw2p2
sw2p3
sw2p4
Of course the current behavior is unchanged, and the optional "label"
property for user ports has precedence over the enumerated name.
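
The resulting rules can be summarised in a small userspace sketch. struct port_node is a hypothetical flattened view of a port's device tree node, not a kernel structure, and the order of the two phandle checks is arbitrary since the phandles are exclusive.

```c
#include <stddef.h>

enum port_kind { PORT_USER, PORT_CPU, PORT_DSA };

/* Hypothetical, flattened view of one port node; not a kernel structure. */
struct port_node {
	int has_ethernet;	/* "ethernet" phandle present */
	int has_link;		/* "link" phandle present */
	const char *label;	/* optional "label" property, or NULL */
};

/* A port is classified by the phandle it carries, not by its label. */
enum port_kind port_classify(const struct port_node *port)
{
	if (port->has_link)
		return PORT_DSA;
	if (port->has_ethernet)
		return PORT_CPU;
	return PORT_USER;
}

/* Name template for a user port: the label wins, else the enumerated name. */
const char *port_name(const struct port_node *port)
{
	return port->label ? port->label : "eth%d";
}
```
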
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Uwe Kleine-König <uwe@kleine-koenig.org>
---
Documentation/devicetree/bindings/net/dsa/dsa.txt | 20 ++++++++-----------
net/dsa/dsa2.c | 24 ++++-------------------
2 files changed, 12 insertions(+), 32 deletions(-)
diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
index a4a570fb2494..cfe8f64eca4f 100644
--- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
+++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
@@ -34,13 +34,9 @@ Required properties:
Each port children node must have the following mandatory properties:
- reg : Describes the port address in the switch
-- label : Describes the label associated with this port, which
- will become the netdev name. Special labels are
- "cpu" to indicate a CPU port and "dsa" to
- indicate an uplink/downlink port between switches in
- the cluster.
-A port labelled "dsa" has the following mandatory property:
+An uplink/downlink port between switches in the cluster has the following
+mandatory property:
- link : Should be a list of phandles to other switch's DSA
port. This port is used as the outgoing port
@@ -48,12 +44,17 @@ A port labelled "dsa" has the following mandatory property:
information must be given, not just the one hop
routes to neighbouring switches.
-A port labelled "cpu" has the following mandatory property:
+A CPU port has the following mandatory property:
- ethernet : Should be a phandle to a valid Ethernet device node.
This host device is what the switch port is
connected to.
+A user port has the following optional property:
+
+- label : Describes the label associated with this port, which
+ will become the netdev name.
+
Port child nodes may also contain the following optional standardised
properties, described in binding documents:
@@ -107,7 +108,6 @@ linked into one DSA cluster.
switch0port5: port@5 {
reg = <5>;
- label = "dsa";
phy-mode = "rgmii-txid";
link = <&switch1port6
&switch2port9>;
@@ -119,7 +119,6 @@ linked into one DSA cluster.
port@6 {
reg = <6>;
- label = "cpu";
ethernet = <&fec1>;
fixed-link {
speed = <100>;
@@ -165,7 +164,6 @@ linked into one DSA cluster.
switch1port5: port@5 {
reg = <5>;
- label = "dsa";
link = <&switch2port9>;
phy-mode = "rgmii-txid";
fixed-link {
@@ -176,7 +174,6 @@ linked into one DSA cluster.
switch1port6: port@6 {
reg = <6>;
- label = "dsa";
phy-mode = "rgmii-txid";
link = <&switch0port5>;
fixed-link {
@@ -255,7 +252,6 @@ linked into one DSA cluster.
switch2port9: port@9 {
reg = <9>;
- label = "dsa";
phy-mode = "rgmii-txid";
link = <&switch1port5
&switch0port5>;
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index bad119cee2a3..9526bdf2a34a 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -81,30 +81,12 @@ static void dsa_dst_del_ds(struct dsa_switch_tree *dst,
static bool dsa_port_is_dsa(struct device_node *port)
{
- const char *name;
-
- name = of_get_property(port, "label", NULL);
- if (!name)
- return false;
-
- if (!strcmp(name, "dsa"))
- return true;
-
- return false;
+ return !!of_parse_phandle(port, "link", 0);
}
static bool dsa_port_is_cpu(struct device_node *port)
{
- const char *name;
-
- name = of_get_property(port, "label", NULL);
- if (!name)
- return false;
-
- if (!strcmp(name, "cpu"))
- return true;
-
- return false;
+ return !!of_parse_phandle(port, "ethernet", 0);
}
static bool dsa_ds_find_port(struct dsa_switch *ds,
@@ -268,6 +250,8 @@ static int dsa_user_port_apply(struct device_node *port, u32 index,
int err;
name = of_get_property(port, "label", NULL);
+ if (!name)
+ name = "eth%d";
err = dsa_slave_create(ds, ds->dev, index, name);
if (err) {
--
2.11.0
^ permalink raw reply related
* Re: [PATCH v2] phy state machine: failsafe leave invalid RUNNING state
From: David Miller @ 2017-01-08 23:16 UTC (permalink / raw)
To: zefir.kurtisi; +Cc: netdev, f.fainelli, andrew
In-Reply-To: <1483701288-14019-1-git-send-email-zefir.kurtisi@neratec.com>
From: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Date: Fri, 6 Jan 2017 12:14:48 +0100
> While in RUNNING state, phy_state_machine() checks for link changes by
> comparing phydev->link before and after calling phy_read_status().
> This works as long as it is guaranteed that phydev->link is never
> changed outside the phy_state_machine().
>
> If in some setups this happens, it causes the state machine to miss
> a link loss and remain RUNNING despite phydev->link being 0.
>
> This has been observed running a dsa setup with a process continuously
> polling the link states over ethtool each second (SNMPD RFC-1213
> agent). Disconnecting the link on a phy followed by a ETHTOOL_GSET
> causes dsa_slave_get_settings() / dsa_slave_get_link_ksettings() to
> call phy_read_status() and with that modify the link status - and
> with that bricking the phy state machine.
>
> This patch adds a fail-safe check while in RUNNING, which causes to
> move to CHANGELINK when the link is gone and we are still RUNNING.
>
> Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>
> ---
> Changes to v1:
> * fix kbuild test robot error: use phydev_err instead of dev_warn
> (adapt to changed struct phy_device after 4.4.21)
Florian and Andrew, please provide some feedback on this.
Thank you.
^ permalink raw reply
* Re: [PATCH net-next] mdio: Demote print from info to debug in mdio_device_register
From: David Miller @ 2017-01-08 23:15 UTC (permalink / raw)
To: f.fainelli; +Cc: netdev, andrew
In-Reply-To: <20170107062759.32015-1-f.fainelli@gmail.com>
From: Florian Fainelli <f.fainelli@gmail.com>
Date: Fri, 6 Jan 2017 22:27:59 -0800
> While it is useful to know which MDIO device is being registered, demote
> the dev_info() to a dev_dbg().
>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Applied.
^ permalink raw reply
* Re: [PATCH net-next 2/2] net: remove useless memset's in drivers get_stats64
From: David Miller @ 2017-01-08 23:14 UTC (permalink / raw)
To: stephen; +Cc: netdev, sthemmin
In-Reply-To: <20170107031253.2739-2-sthemmin@microsoft.com>
From: Stephen Hemminger <stephen@networkplumber.org>
Date: Fri, 6 Jan 2017 19:12:53 -0800
> In dev_get_stats() the statistic structure storage has already been
> zeroed. Therefore network drivers do not need to call memset() again.
>
> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Applied.
^ permalink raw reply