Netdev List

* Re: [PATCH 0/2] sh_eth: RPADIR related clean-ups
From: David Miller @ 2018-06-26 14:16 UTC (permalink / raw)
  To: sergei.shtylyov; +Cc: netdev, linux-renesas-soc
In-Reply-To: <2809eba8-4c9a-1d5f-a47d-8125777e365b@cogentembedded.com>

From: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Date: Mon, 25 Jun 2018 23:34:52 +0300

> Here's a set of 2 patches against DaveM's 'net-next.git' repo. They are
> clean-ups related to RPADIR (DMA padding to NET_IP_ALIGN)...

Series applied.

^ permalink raw reply

* [PATCH net-next 1/1] tc-tests: add an extreme-case csum action test
From: Keara Leibovitz @ 2018-06-26 14:16 UTC (permalink / raw)
  To: davem; +Cc: netdev, jhs, xiyou.wangcong, jiri, lucasb, Keara Leibovitz

Added an extreme-case test for all 7 csum action headers.

Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
---
 .../tc-testing/tc-tests/actions/csum.json          | 24 ++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json b/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json
index 3a2f51fc7fd4..a022792d392a 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/csum.json
@@ -336,6 +336,30 @@
         ]
     },
     {
+        "id": "b10b",
+        "name": "Add all 7 csum actions",
+        "category": [
+            "actions",
+            "csum"
+        ],
+        "setup": [
+            [
+                "$TC actions flush action csum",
+                0,
+                1,
+                255
+            ]
+        ],
+        "cmdUnderTest": "$TC actions add action csum icmp ip4h sctp igmp udplite udp tcp index 7",
+        "expExitCode": "0",
+        "verifyCmd": "$TC actions get action csum index 7",
+        "matchPattern": "action order [0-9]*: csum \\(iph, icmp, igmp, tcp, udp, udplite, sctp\\).*index 7 ref",
+        "matchCount": "1",
+        "teardown": [
+            "$TC actions flush action csum"
+        ]
+    },
+    {
         "id": "ce92",
         "name": "Add csum udp action with cookie",
         "category": [
-- 
2.7.4
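
The `matchPattern` in the new test is written with JSON escaping, so `\\(` reaches the regex engine as `\(`. A minimal Python sketch of that decode-and-match step follows. The sample `tc` output is hand-written for illustration (not captured output), the helper name `count_matches` is hypothetical, and the use of `re.DOTALL` assumes the matcher lets `.*` cross line breaks.

```python
import json
import re

# The pattern exactly as it appears in csum.json, kept as a JSON string literal.
RAW_JSON = r'"action order [0-9]*: csum \\(iph, icmp, igmp, tcp, udp, udplite, sctp\\).*index 7 ref"'


def count_matches(pattern: str, output: str) -> int:
    """Count non-overlapping matches; DOTALL lets '.*' span line breaks."""
    return len(re.findall(pattern, output, re.DOTALL | re.MULTILINE))


# JSON decoding turns each '\\' into a single backslash, giving '\(' and '\)'.
pattern = json.loads(RAW_JSON)

# Hypothetical verifyCmd output, made up for this example.
SAMPLE = (
    "action order 0: csum (iph, icmp, igmp, tcp, udp, udplite, sctp) action ok\n"
    "\tindex 7 ref 1 bind 0\n"
)

if __name__ == "__main__":
    print(count_matches(pattern, SAMPLE))
```

With `"matchCount": "1"`, the test would pass only when exactly one such match is found in the output of `verifyCmd`.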

^ permalink raw reply related

* Re: [PATCH] NFC: llcp: fix nfc_llcp_send_ui_frame() lockup
From: Steven Rostedt @ 2018-06-26 14:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Sergey Senozhatsky, Dmitry Vyukov, Samuel Ortiz, David S. Miller,
	Petr Mladek, syzkaller-bugs, linux-wireless, netdev, LKML, syzbot,
	Sergey Senozhatsky
In-Reply-To: <8c410102-43ab-dfdb-0d71-2ee5951e1af8@gmail.com>

On Mon, 25 Jun 2018 23:44:22 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> > The loop is still infinite, correct, but we have a preemption point now.
> > Sure, net people can come with a much better solution, I'll be happy to
> > scratch my patch.
> >   
> 
> This can not be the right solution, think about current thread being real time,
> cond_resched() might be a nop.

Good point! Bah, as one of the RT maintainers, I should have noticed
that too :-p

I'm losing my touch.

-- Steve


> 
> We should probably not loop at all, or not use MSG_DONTWAIT.
> 
> (And remove this useless "Could not allocate PDU" message)
> 
> NFC maintainers should really take a look at this.

^ permalink raw reply

* Bug report: epoll can fail to report EPOLLOUT when unix datagram socket peer is closed
From: Ian Lance Taylor @ 2018-06-26 14:18 UTC (permalink / raw)
  To: netdev

I'm reporting what appears to be a bug in the Linux kernel's epoll
support.  It seems that epoll sometimes fails to report an
EPOLLOUT event when the other side of an AF_UNIX/SOCK_DGRAM socket is
closed.  This bug report started as a Go program reported at
https://golang.org/issue/23604.  I've written a C program that
demonstrates the same symptoms, at
https://github.com/golang/go/issues/23604#issuecomment-398945027 .

The C program sets up an AF_UNIX/SOCK_DGRAM server and several
identical clients, all running in non-blocking mode.  All the
non-blocking sockets are added to epoll, using EPOLLET.  The server
periodically closes and reopens its socket.  The clients look for
ECONNREFUSED errors on their write calls, and close and reopen their
sockets when they see one.

The clients will sometimes fill up their buffer and block with EAGAIN.
At that point they expect the poller to return an EPOLLOUT event to
tell them when they are ready to write again.  The expectation is that
either the server will read data, freeing up buffer space, or will
close the socket, which should cause the pending packets to be
discarded, freeing up buffer space.  Generally the EPOLLOUT event
happens.  But sometimes, the poller never returns such an event, and
the client stalls.  In the test program this is reported as a client
that waits more than 20 seconds to be told to continue.
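
The expected behaviour described above can be sketched in Python for the simpler of the two cases, where the server reads the queued data. This is not the C reproducer linked in the report and it does not attempt to trigger the bug; the function name, socket path and payload size are assumptions made for the example, and it needs Linux for `select.epoll`.

```python
import os
import select
import socket
import tempfile


def fill_then_drain(payload=b"x" * 64, timeout=2.0):
    """Fill a non-blocking datagram client until EAGAIN, let the server read
    everything, then report whether epoll signals EPOLLOUT for the client."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "server.sock")  # hypothetical socket name
    server = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    ep = select.epoll()
    try:
        server.bind(path)
        server.setblocking(False)
        client.connect(path)
        client.setblocking(False)
        ep.register(client.fileno(), select.EPOLLOUT | select.EPOLLET)
        ep.poll(0)  # consume the initial "already writable" edge

        sent, hit_eagain = 0, False
        while sent < 1_000_000:  # safety bound for the sketch
            try:
                client.send(payload)
                sent += 1
            except BlockingIOError:  # EAGAIN: buffer or peer queue is full
                hit_eagain = True
                break

        drained = 0
        while True:  # the server reads, which should free buffer space
            try:
                server.recv(len(payload))
                drained += 1
            except BlockingIOError:
                break

        events = ep.poll(timeout)
        got_epollout = any(
            fd == client.fileno() and (mask & select.EPOLLOUT)
            for fd, mask in events
        )
        return {"sent": sent, "drained": drained,
                "hit_eagain": hit_eagain, "got_epollout": got_epollout}
    finally:
        ep.close()
        client.close()
        server.close()
        if os.path.exists(path):
            os.unlink(path)
        os.rmdir(tmpdir)


if __name__ == "__main__":
    print(fill_then_drain())
```

The reported stall concerns the other case, where the server closes its socket instead of reading; that path is deliberately left out here.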

A similar bug report was made, with few details, at
https://stackoverflow.com/questions/38441059/edge-triggered-epoll-for-unix-domain-socket
.

I've tested the program and seen the failure on kernel 4.9.0-6-amd64.
A colleague has tested the program and seen the failure on
4.18.0-smp-DEV #3 SMP @1529531011 x86_64 GNU/Linux.

If there is a better way for me to report this, please let me know.

Thanks for your attention.

Ian

^ permalink raw reply

* Re: [PATCH net-next v2 0/7] net: sched: support replay of filter offload when binding to block
From: David Miller @ 2018-06-26 14:21 UTC (permalink / raw)
  To: jakub.kicinski; +Cc: jiri, xiyou.wangcong, jhs, gerlitz.or, netdev, oss-drivers
In-Reply-To: <20180625213010.13266-1-jakub.kicinski@netronome.com>

From: Jakub Kicinski <jakub.kicinski@netronome.com>
Date: Mon, 25 Jun 2018 14:30:03 -0700

> This series from John adds the ability to replay filter offload requests
> when new offload callback is being registered on a TC block.  This is most
> likely to take place for shared blocks today, when a block which already
> has rules is bound to another interface.  Prior to this patch set if any
> of the rules were offloaded the block bind would fail.
> 
> A new tcf_proto_op is added to generate a filter-specific offload request.
> The new 'offload' op is supporting extack from day 0, hence we need to
> propagate extack to .ndo_setup_tc TC_BLOCK_BIND/TC_BLOCK_UNBIND and
> through tcf_block_cb_register() to tcf_block_playback_offloads().
> 
> The immediate use of this patch set is to simplify life of drivers which
> require duplicating rules when sharing blocks.  Switch drivers (mlxsw)
> can bind ports to rule lists dynamically, NIC drivers generally don't
> have that ability and need the rules to be duplicated for each ingress
> they match on.  In code terms this means that switch drivers don't
> register multiple callbacks for each port.  NIC drivers do, and get a
> separate request and hence rule per-port, as if the block was not shared.
> The registration fails today, however, if some rules were already present.
> 
> As John notes in description of patch 7, drivers which register multiple
> callbacks to shared blocks will likely need to flush the rules on block
> unbind.  This set makes the core not only replay the offload add
> requests but also offload remove requests when callback is unregistered.
> 
> v2:
>  - name parameters in patch 2;
>  - use unsigned int instead of u32 for in_hw_count;
>  - improve extack message in patch 7.
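
The replay idea in the cover letter can be modelled in a few lines of Python. This is only a toy model: the `Block` class, its method names and the string rules are hypothetical stand-ins, not the kernel's `tcf_block` API, and error handling (such as undoing a partial replay) is left out.

```python
from typing import Callable, List

# A callback receives a command ("add" or "remove") and a rule description.
Callback = Callable[[str, str], None]


class Block:
    """Hypothetical stand-in for a shared TC block that holds rules."""

    def __init__(self) -> None:
        self.rules: List[str] = []
        self.callbacks: List[Callback] = []

    def add_rule(self, rule: str) -> None:
        self.rules.append(rule)
        for cb in self.callbacks:
            cb("add", rule)

    def register(self, cb: Callback) -> None:
        # Replay every rule already in the block to the new callback,
        # so binding a block that has rules no longer has to fail.
        for rule in self.rules:
            cb("add", rule)
        self.callbacks.append(cb)

    def unregister(self, cb: Callback) -> None:
        # Replay removals so the departing driver can drop its copies.
        self.callbacks.remove(cb)
        for rule in self.rules:
            cb("remove", rule)


def make_port(log: List[str], name: str) -> Callback:
    """Build a callback that records what a (made-up) port was asked to do."""
    def cb(command: str, rule: str) -> None:
        log.append(f"{name}:{command}:{rule}")
    return cb
```

A NIC-style driver in this model would register one callback per port and so receive its own copy of each rule, as if the block were not shared.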

Series applied, thank you.

^ permalink raw reply

* Re: [patch net-next RFC 03/12] mlxsw: core: Add core environment module for port temperature reading
From: Andrew Lunn @ 2018-06-26 14:22 UTC (permalink / raw)
  To: Vadim Pasternak, linux-pm; +Cc: netdev, linux, rui.zhang, edubezval, jiri
In-Reply-To: <1530015037-67361-4-git-send-email-vadimp@mellanox.com>

On Tue, Jun 26, 2018 at 12:10:28PM +0000, Vadim Pasternak wrote:

Adding the linux-pm@vger.kernel.org list.

> Add new core_env module to allow port temperature reading. This
> information has most critical impact on system's thermal monitoring and
> is to be used by core_hwmon and core_thermal modules.
> 
> New internal API reads the temperature from all the modules, which are
> equipped with the thermal sensor and exposes temperature according to
> the worst measure. All individual temperature values are normalized to
> pre-defined range.

This patchset has been sent to the netdev list before. I raised a few
questions about this, which is why it is now being posted to a bigger
group for review.

The hardware has up to 64 temperature sensors. These sensors are
hot-pluggable, since they are inside SFP modules, which are
hot-pluggable. Different SFP modules can have different operating
temperature ranges. They contain an EEPROM which lists upper and lower
warning and fail temperatures, and report alarms when these thresholds
are reached.

This code takes the 64 sensors readings and calculates a single value
it passes to one thermal zone. That thermal zone then controls one fan
to keep this single value in range.

I queried whether this is the correct way to do this. Would it not be
better to have up to 64 thermal zones? Leave the thermal core to
iterate over all the zones in order to determine how the fan should be
driven?
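
To make the single-value approach concrete, here is a small Python sketch of a "worst measure" aggregation. The normalization rule (scaling each reading so the module's own critical threshold maps to a common reference) is an assumption chosen for illustration; the patch only says that values are normalized to a pre-defined range. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

REFERENCE_CRIT_C = 100.0  # assumed common scale, not taken from the patch


@dataclass
class ModuleSensor:
    temp_c: float  # current reading
    crit_c: float  # the module's own critical threshold (from its EEPROM)


def normalized(sensor: ModuleSensor) -> float:
    """Scale a reading so that the module's critical threshold maps to
    REFERENCE_CRIT_C, making modules with different ranges comparable."""
    return sensor.temp_c * REFERENCE_CRIT_C / sensor.crit_c


def worst_case(sensors: Iterable[ModuleSensor]) -> float:
    """Collapse all module readings into the single worst normalized value."""
    return max(normalized(s) for s in sensors)
```

For example, a module at 60 C with an 80 C limit would dominate one at 65 C with a 100 C limit, which is the kind of information a single thermal zone hides and per-module zones would expose.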

This is possibly the first board with so many sensors. However, I
doubt it is totally unique. Other big Ethernet switches with lots of
SFP modules may be added later. Also, 10G copper PHYs often have
temperature sensors, so this is not limited to just boards with
optical ports. So having a generic solution would be good.

What do the Linux PM experts say about this?

Thanks
	Andrew

^ permalink raw reply

* Re: [PATCH v4 net-next] mdio-mux-gpio: Remove VLA usage
From: David Miller @ 2018-06-26 14:25 UTC (permalink / raw)
  To: keescook; +Cc: linux-kernel, joe, andrew, netdev
In-Reply-To: <20180625224949.GA48766@beast>

From: Kees Cook <keescook@chromium.org>
Date: Mon, 25 Jun 2018 15:49:49 -0700

> In the quest to remove all stack VLA usage from the kernel[1], this
> allocates the values buffer during the callback instead of putting it
> on the stack.
> 
> [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> Reviewed-by: Andrew Lunn <andrew@lunn.ch>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] selftests: forwarding: mirror_gre_vlan_bridge_1q: Unset rp_filter
From: David Miller @ 2018-06-26 14:25 UTC (permalink / raw)
  To: petrm; +Cc: netdev, linux-kselftest, shuah
In-Reply-To: <66d218fb95026460e240752eec3af58920a14d86.1529968802.git.petrm@mellanox.com>

From: Petr Machata <petrm@mellanox.com>
Date: Tue, 26 Jun 2018 01:20:32 +0200

> The IP addresses of tunnel endpoint at H3 are set at the VLAN device
> $h3.555. Therefore when test_gretap_untagged_egress() sets vlan 555 to
> egress untagged at $swp3, $h3's rp_filter rejects these packets. The
> test then spuriously fails.
> 
> Therefore turn off net.ipv4.conf.{all, $h3}.rp_filter.
> 
> Fixes: 9c7c8a82442c ("selftests: forwarding: mirror_gre_vlan_bridge_1q: Add more tests")
> Signed-off-by: Petr Machata <petrm@mellanox.com>
> Reviewed-by: Ido Schimmel <idosch@mellanox.com>

Applied.

^ permalink raw reply

* Re: [patch net-next RFC 11/12] mlxsw: core: Extend hwmon interface with FAN fault attribute
From: Andrew Lunn @ 2018-06-26 14:28 UTC (permalink / raw)
  To: Vadim Pasternak
  Cc: davem, netdev, linux, rui.zhang, edubezval, jiri, mlxsw,
	michaelsh
In-Reply-To: <1530015037-67361-12-git-send-email-vadimp@mellanox.com>

> +static ssize_t mlxsw_hwmon_fan_fault_show(struct device *dev,
> +					  struct device_attribute *attr,
> +					  char *buf)
> +{
> +	struct mlxsw_hwmon_attr *mlwsw_hwmon_attr =
> +			container_of(attr, struct mlxsw_hwmon_attr, dev_attr);
> +	struct mlxsw_hwmon *mlxsw_hwmon = mlwsw_hwmon_attr->hwmon;
> +	char mfsm_pl[MLXSW_REG_MFSM_LEN];
> +	u16 tach;
> +	int err;
> +
> +	mlxsw_reg_mfsm_pack(mfsm_pl, mlwsw_hwmon_attr->type_index);
> +	err = mlxsw_reg_query(mlxsw_hwmon->core, MLXSW_REG(mfsm), mfsm_pl);
> +	if (err) {
> +		dev_err(mlxsw_hwmon->bus_info->dev, "Failed to query fan\n");
> +		return err;
> +	}
> +	tach = mlxsw_reg_mfsm_rpm_get(mfsm_pl);
> +
> +	return sprintf(buf, "%u\n", (tach < mlxsw_hwmon->tach_min) ? 1 : 0);
> +}

Documentation/hwmon/sysfs-interface says:

Alarms are direct indications read from the chips. The drivers do NOT
make comparisons of readings to thresholds. This allows violations
between readings to be caught and alarmed. The exact definition of an
alarm (for example, whether a threshold must be met or must be exceeded
to cause an alarm) is chip-dependent.

Now, this is a fault, not an alarm. But does the same apply?

     Andrew

^ permalink raw reply

* Re: [PATCH net-next] rds: clean up loopback rds_connections on netns deletion
From: David Miller @ 2018-06-26 14:29 UTC (permalink / raw)
  To: sowmini.varadhan; +Cc: netdev, rds-devel, santosh.shilimkar
In-Reply-To: <20180626134043.GE20575@oracle.com>

From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Date: Tue, 26 Jun 2018 09:40:43 -0400

> On (06/26/18 22:23), David Miller wrote:
>> 
>> Since this probably fixes syzbot reports, this can be targeted
>> at 'net' instead?
> 
> that thought occurred to me but I wanted to be conservative and have
> it in net-next first, have the syzkaller-bugs team confirm the
> fixes and then backport to earlier kernels (if needed)..

I think there is a way to ask syzbot to test a patch in an
email.

^ permalink raw reply

* Re: [PATCH net-next] rds: clean up loopback rds_connections on netns deletion
From: Sowmini Varadhan @ 2018-06-26 14:44 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, rds-devel, santosh.shilimkar, dvyukov, syzkaller-bugs
In-Reply-To: <20180626.232956.1181108479532700313.davem@davemloft.net>

On (06/26/18 23:29), David Miller wrote:
> 
> I think there is a way to ask syzbot to test a patch in an
> email.

Dmitry/syzkaller-bugs, can you clarify? 

This is for the cluster of dup reports like
 https://groups.google.com/forum/#!topic/syzkaller-bugs/zBph8Vu-q2U
and (most recently)
 https://www.spinics.net/lists/linux-rdma/msg66020.html

as I understand it, if there is no reproducer, you cannot really
have a pass/fail test to confirm the fix.

--Sowmini

^ permalink raw reply

* RE: [patch net-next RFC 11/12] mlxsw: core: Extend hwmon interface with FAN fault attribute
From: Vadim Pasternak @ 2018-06-26 14:47 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: davem@davemloft.net, netdev@vger.kernel.org, linux@roeck-us.net,
	rui.zhang@intel.com, edubezval@gmail.com, jiri@resnulli.us, mlxsw,
	Michael Shych
In-Reply-To: <20180626142838.GC5064@lunn.ch>



> -----Original Message-----
> From: Andrew Lunn [mailto:andrew@lunn.ch]
> Sent: Tuesday, June 26, 2018 5:29 PM
> To: Vadim Pasternak <vadimp@mellanox.com>
> Cc: davem@davemloft.net; netdev@vger.kernel.org; linux@roeck-us.net;
> rui.zhang@intel.com; edubezval@gmail.com; jiri@resnulli.us; mlxsw
> <mlxsw@mellanox.com>; Michael Shych <michaelsh@mellanox.com>
> Subject: Re: [patch net-next RFC 11/12] mlxsw: core: Extend hwmon interface
> with FAN fault attribute
> 
> > +static ssize_t mlxsw_hwmon_fan_fault_show(struct device *dev,
> > +					  struct device_attribute *attr,
> > +					  char *buf)
> > +{
> > +	struct mlxsw_hwmon_attr *mlwsw_hwmon_attr =
> > +			container_of(attr, struct mlxsw_hwmon_attr, dev_attr);
> > +	struct mlxsw_hwmon *mlxsw_hwmon = mlwsw_hwmon_attr->hwmon;
> > +	char mfsm_pl[MLXSW_REG_MFSM_LEN];
> > +	u16 tach;
> > +	int err;
> > +
> > +	mlxsw_reg_mfsm_pack(mfsm_pl, mlwsw_hwmon_attr->type_index);
> > +	err = mlxsw_reg_query(mlxsw_hwmon->core, MLXSW_REG(mfsm), mfsm_pl);
> > +	if (err) {
> > +		dev_err(mlxsw_hwmon->bus_info->dev, "Failed to query fan\n");
> > +		return err;
> > +	}
> > +	tach = mlxsw_reg_mfsm_rpm_get(mfsm_pl);
> > +
> > +	return sprintf(buf, "%u\n", (tach < mlxsw_hwmon->tach_min) ? 1 : 0);
> > +}
> 
> Documentation/hwmon/sysfs-interface says:
> 
> Alarms are direct indications read from the chips. The drivers do NOT make
> comparisons of readings to thresholds. This allows violations between readings
> to be caught and alarmed. The exact definition of an alarm (for example,
> whether a threshold must be met or must be exceeded to cause an alarm) is
> chip-dependent.
> 
> Now, this is a fault, not an alarm. But does the same apply?

Hi Andrew,

Hardware provides a minimum value for the tachometer.
A tachometer is considered faulty if its reading is below this
value.
If any tachometer is faulty, the system requirements say PWM
should be set to 100% until the fault is cleared (e.g. by
physically replacing the bad unit).
This is the motivation for exposing fan{x}_fault in the way
it's exposed.

Thanks,
Vadim.

> 
>      Andrew

^ permalink raw reply

* Re: [PATCH net-next] rds: clean up loopback rds_connections on netns deletion
From: Dmitry Vyukov @ 2018-06-26 14:48 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: David Miller, netdev, rds-devel, Santosh Shilimkar,
	syzkaller-bugs
In-Reply-To: <20180626144454.GF20575@oracle.com>

On Tue, Jun 26, 2018 at 4:44 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (06/26/18 23:29), David Miller wrote:
>>
>> I think there is a way to ask syzbot to test a patch in an
>> email.
>
> Dmitry/syzkaller-bugs, can you clarify?
>
> This is for the cluster of dup reports like
>  https://groups.google.com/forum/#!topic/syzkaller-bugs/zBph8Vu-q2U
> and (most recently)
>  https://www.spinics.net/lists/linux-rdma/msg66020.html
>
> as I understand it, if there is no reproducer, you cannot really
> have a pass/fail test to confirm the fix.

This bug has a reproducer as far as I see:

https://syzkaller.appspot.com/bug?id=f4ef381349e100280193c25f24e01d9d364132d9

It seems to be a subtle race since syzbot did not progress with
minimization too much:

https://syzkaller.appspot.com/text?tag=ReproSyz&x=16cbfeaf800000

it probably hit the race by pure luck of the large program, but then
never had the same luck when it tried to remove any syscalls.
So it can make sense to submit several test requests to get more testing.

^ permalink raw reply

* Re: [PATCH v2] fib_rules: match rules based on suppress_* properties too
From: Roopa Prabhu @ 2018-06-26 14:51 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: netdev
In-Reply-To: <20180625233932.11531-1-Jason@zx2c4.com>

On Mon, Jun 25, 2018 at 4:39 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Two rules with different values of suppress_prefix or suppress_ifgroup
> are not the same. This fixes an -EEXIST when running:
>
>    $ ip -4 rule add table main suppress_prefixlength 0
>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Fixes: f9d4b0c1e969 ("fib_rules: move common handling of newrule delrule msgs into fib_nl2rule")
> ---
> This adds the new condition you mentioned. I'm not sure what you make of
> DaveM's remark about this not being in the original code, but here is
> nonetheless the requested change.

I just saw DaveM's comment and agree that the new rule_find is different,
but that was intentional: it merged the finding of the rule in the
newlink and dellink paths. I did port each of the conditions from the
previous rule_exists to the new rule_find, but forgot to add the new keys,
which have now become necessary. I replied with details on your other bug
report thread. Also pasting that response here:

So the previous rule_exists code did not check for attribute matches
correctly. It would ignore a rule at the first non-existent attribute
mismatch. And rule_find will always be called with a valid key. E.g. in
your case, it would return at the pref mismatch and never match an
existing rule.

$ ip -4 rule add table main suppress_prefixlength 0
$ ip -4 rule add table main suppress_prefixlength 0
$ ip -4 rule add table main suppress_prefixlength 0

$ ip rule show
0:      from all lookup local
32763:  from all lookup main suppress_prefixlength 0
32764:  from all lookup main suppress_prefixlength 0
32765:  from all lookup main suppress_prefixlength 0
32766:  from all lookup main
32767:  from all lookup default

With your patch, you should get the proper EEXIST check:
$ ip -4 rule add table main suppress_prefixlength 0
$ ip -4 rule add table main suppress_prefixlength 0

RTNETLINK answers: File exists

Dave, pls let me know if this is acceptable. If not
I can easily restore the previous rule_exists func. Will also submit a
patch to cover this in self-tests.

thanks.

>
>  net/core/fib_rules.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
> index 126ffc5bc630..bc8425d81022 100644
> --- a/net/core/fib_rules.c
> +++ b/net/core/fib_rules.c
> @@ -416,6 +416,14 @@ static struct fib_rule *rule_find(struct fib_rules_ops *ops,
>                 if (rule->mark && r->mark != rule->mark)
>                         continue;
>
> +               if (rule->suppress_ifgroup != -1 &&
> +                   r->suppress_ifgroup != rule->suppress_ifgroup)
> +                       continue;
> +
> +               if (rule->suppress_prefixlen != -1 &&
> +                   r->suppress_prefixlen != rule->suppress_prefixlen)
> +                       continue;
> +
>                 if (rule->mark_mask && r->mark_mask != rule->mark_mask)
>                         continue;
>
> --
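
The matching rule added by the hunk above can be sketched in Python: a suppress_* field in the lookup key only constrains the search when it is set (not -1). The `Rule` dataclass and the table comparison are simplified, hypothetical stand-ins for `struct fib_rule` and the other checks in the kernel's `rule_find()`.

```python
from dataclasses import dataclass
from typing import List, Optional

UNSET = -1  # value meaning "no suppress_* constraint given"


@dataclass
class Rule:
    table: str
    suppress_prefixlen: int = UNSET
    suppress_ifgroup: int = UNSET


def rule_find(rules: List[Rule], key: Rule) -> Optional[Rule]:
    """Return the first rule matching the key, or None.

    A suppress_* field of the key is compared only when it is set,
    mirroring the two conditions added by the patch.
    """
    for r in rules:
        if r.table != key.table:
            continue
        if key.suppress_ifgroup != UNSET and \
                r.suppress_ifgroup != key.suppress_ifgroup:
            continue
        if key.suppress_prefixlen != UNSET and \
                r.suppress_prefixlen != key.suppress_prefixlen:
            continue
        return r
    return None
```

In this model, a key for `table main suppress_prefixlength 0` no longer matches the plain `lookup main` rule, so the first add succeeds, while a second identical add finds the rule just created and would be answered with "File exists".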

^ permalink raw reply

* Re: [PATCH net-next 1/1] tc-testing: initial version of tunnel_key unit tests
From: Davide Caratti @ 2018-06-26 14:51 UTC (permalink / raw)
  To: Keara Leibovitz, davem; +Cc: netdev, jhs, xiyou.wangcong, jiri, lucasb
In-Reply-To: <1530019039-20519-1-git-send-email-kleib@mojatatu.com>

On Tue, 2018-06-26 at 09:17 -0400, Keara Leibovitz wrote:
> Create unittests for the tc tunnel_key action.
> 
> 
> Signed-off-by: Keara Leibovitz <kleib@mojatatu.com>
> ---
>  .../tc-testing/tc-tests/actions/tunnel_key.json    | 676 +++++++++++++++++++++
>  1 file changed, 676 insertions(+)
>  create mode 100644 tools/testing/selftests/tc-testing/tc-tests/actions/tunnel_key.json
> 
> diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/tunnel_key.json b/tools/testing/selftests/tc-testing/tc-tests/actions/tunnel_key.json
> new file mode 100644
> index 000000000000..bfe522ac8177

hello Keara!

I think the 'teardown' stage in some of these tests should be reviewed.
Those that are meant to test invalid configurations (like dc6b) should
allow non-zero exit codes in the teardown stage, if the wrong
configuration is caught by the userspace TC tool, before talking to the
kernel. 

Otherwise, those tests will fail when they are invoked one by one with the
act_tunnel_key module unloaded.

> --- /dev/null
> +++ b/tools/testing/selftests/tc-testing/tc-tests/actions/tunnel_key.json
> @@ -0,0 +1,676 @@
> 
...

> +    {
> +        "id": "dc6b",
> +        "name": "Add tunnel_key set action with missing mandatory src_ip parameter",
> +        "category": [
> +            "actions",
> +            "tunnel_key"
> +        ],
> +        "setup": [
> +            [
> +                "$TC actions flush action tunnel_key",
> +                0,
> +                1,
> +                255
> +            ]
> +        ],
> +        "cmdUnderTest": "$TC actions add action tunnel_key set dst_ip 20.20.20.2 id 100",
> +        "expExitCode": "255",
> +        "verifyCmd": "$TC actions list action tunnel_key",
> +        "matchPattern": "action order [0-9]+: tunnel_key set.*dst_ip 20.20.20.2.*key_id 100",
> +        "matchCount": "0",
> +        "teardown": [
> +            "$TC actions flush action tunnel_key"
> +        ]
> +    },

example: try the test above as follows:

[root@rhel tc-testing]# modprobe  act_tunnel_key
[root@rhel tc-testing]# ./tdc.py -e dc6b
Test dc6b: Add tunnel_key set action with missing mandatory src_ip parameter
All test results: 

1..1
ok 1 - dc6b # Add tunnel_key set action with missing mandatory src_ip parameter
about to flush the tap output if tests need to be skipped
done flushing skipped test tap output

[root@rhel tc-testing]# modprobe -r act_tunnel_key ; ./tdc.py -p /usr/local/src/iproute2/tc/tc -e dc6b
Test dc6b: Add tunnel_key set action with missing mandatory src_ip parameter

-----> teardown stage *** Could not execute: "$TC actions flush action tunnel_key"

-----> teardown stage *** Error message: "Error: Cannot flush unknown TC action.
We have an error flushing
"
[...]
---------------
accumulated output for this test:
---------------
All test results: 

1..1
about to flush the tap output if tests need to be skipped
ok 1 - dc6b # skipped - previous teardown failed 1 dc6b
done flushing skipped test tap output
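
The difference between the strict teardown and the tolerant setup entry in the quoted test can be sketched as follows. This assumes that a bare command string accepts only exit code 0 (which is how the failing teardown above behaves) and that a list entry carries the command followed by its accepted exit codes, as in the quoted "setup" stage; the helper names are hypothetical and this is not code from tdc.py.

```python
from typing import List, Tuple, Union

CommandEntry = Union[str, list]


def split_entry(entry: CommandEntry) -> Tuple[str, List[int]]:
    """Return (command, accepted exit codes) for a setup/teardown entry."""
    if isinstance(entry, list):
        # Form used in the quoted "setup": [command, code, code, ...]
        return entry[0], [int(code) for code in entry[1:]]
    # Bare string: assume only a clean exit is accepted.
    return entry, [0]


def stage_ok(entry: CommandEntry, exit_code: int) -> bool:
    """Decide whether a stage command's exit code counts as success."""
    _, accepted = split_entry(entry)
    return exit_code in accepted


STRICT = "$TC actions flush action tunnel_key"
TOLERANT = ["$TC actions flush action tunnel_key", 0, 1, 255]
```

Writing the teardown of tests such as dc6b in the tolerant form would let the flush fail without the test being marked as skipped.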

(BTW: I'm fixing the bpf test suite for a similar problem, I forgot to fix
it when I posted commit f7017cafcdd ("tc-testing: fix tdc tests for 'bpf'
action") . Sorry for that.)


WDYT?

regards,
-- 
davide

^ permalink raw reply

* Re: [PATCH net-next] rds: clean up loopback rds_connections on netns deletion
From: Sowmini Varadhan @ 2018-06-26 14:53 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, rds-devel, santosh.shilimkar
In-Reply-To: <20180626.232956.1181108479532700313.davem@davemloft.net>

On (06/26/18 23:29), David Miller wrote:
> >> 
> >> Since this probably fixes syzbot reports, this can be targeted
> >> at 'net' instead?
> > 
> > that thought occurred to me but I wanted to be conservative and have
> > it in net-next first, have the syzkaller-bugs team confirm the
> > fixes and then backport to earlier kernels (if needed)..
> 
> I think there is a way to ask syzbot to test a patch in an
> email.

and just to add, the fix itself is logically correct, so belongs in
net-next. What I don't have (and therefore did not target net) is
official confirmation that the syzbot failures are root-caused to the
absence of this patch (since there is no reproducer for many of these,
and no crash dumps available from syzbot).  

--Sowmini

^ permalink raw reply

* Re: Fwd: [PATCH 0/6] offload Linux LAG devices to the TC datapath
From: Or Gerlitz @ 2018-06-26 14:57 UTC (permalink / raw)
  To: John Hurley, Jakub Kicinski, Jiri Pirko
  Cc: netdev, ASAP_Direct_Dev, simon.horman, Andy Gospodarek
In-Reply-To: <8f406548-8f90-b658-fcd1-342d702b3445@mellanox.com>

> -------- Forwarded Message --------
> Subject: [PATCH 0/6] offload Linux LAG devices to the TC datapath
> Date: Thu, 21 Jun 2018 14:35:55 +0100
> From: John Hurley <john.hurley@netronome.com>
> To: dev@openvswitch.org, roid@mellanox.com, gavi@mellanox.com, paulb@mellanox.com, fbl@sysclose.org, simon.horman@netronome.com
> CC: John Hurley <john.hurley@netronome.com>
> 
> This patchset extends OvS TC and the linux-netdev implementation to
> support the offloading of Linux Link Aggregation devices (LAG) and their
> slaves. TC blocks are used to provide this offload. Blocks, in TC, group
> together a series of qdiscs. If a filter is added to one of these qdiscs
> then it is applied to all. Similarly, if a packet is matched on one of the
> grouped qdiscs then the stats for the entire block are increased. The
> basis of the LAG offload is that the LAG master (attached to the OvS
> bridge) and slaves that may exist outside of OvS are all added to the same
> TC block. OvS can then control the filters and collect the stats on the
> slaves via its interaction with the LAG master.
> 
> The TC API is extended within OvS to allow the addition of a block id to
> ingress qdisc adds. Block ids are then assigned to each LAG master that is
> attached to the OvS bridge. The linux netdev netlink socket is used to
> monitor slave devices. If a LAG slave is found whose master is on the bridge
> then it is added to the same block as its master. If the underlying slaves
> belong to an offloadable device then the Linux LAG device can be offloaded
> to hardware.

Guys (J/J/J), 

Doing this here b/c

a. this has impact on the kernel side of things

b. I am more of a netdev and not openvswitch citizen..

some comments, 

1. this + Jakub's patch for the replay are really a great design

2. re the egress side of things. Some NIC HWs can't just use LAG
as the egress port destination of an ACL (tc rule) and the HW rule
needs to be duplicated to both HW ports. So... in that case, do you
see the HW driver doing the duplication (:() or can we somehow
make it happen from user-space?

3. for the case of overlay networks, e.g. OVS based vxlan tunnel, the
ingress (decap) rule is set on the vxlan device. Jakub, you mentioned
a possible kernel patch to the HW (nfp, mlx5) drivers to have them bind
to the tunnel device for ingress rules. If we have an agreed way to identify
uplink representors, can we do that from ovs too? Does it matter if we are
bonding + encapsulating or just encapsulating? Note that under the encap scheme
the bond is typically not part of the OVS bridge.

Or.

^ permalink raw reply

* Re: [PATCH net-next] rds: clean up loopback rds_connections on netns deletion
From: Sowmini Varadhan @ 2018-06-26 15:04 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: David Miller, netdev, rds-devel, Santosh Shilimkar,
	syzkaller-bugs
In-Reply-To: <CACT4Y+YFX3jiGtugGmO+_taxJgfn9eo1YOz_jufeBmkv_V72uA@mail.gmail.com>

On (06/26/18 16:48), Dmitry Vyukov wrote:
> it probably hit the race by pure luck of the large program, but then
> never had the same luck when it tried to remove any syscalls.
> So it can make sense to submit several test requests to get more testing.

How does one submit test requests by email? 

the last time I asked this question, the answer was a pointer to
https://groups.google.com/forum/#!msg/syzkaller-bugs/7ucgCkAJKSk/skZjgavRAQAJ

Thanks
--Sowmini

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Cornelia Huck @ 2018-06-26 15:08 UTC (permalink / raw)
  To: Siwei Liu
  Cc: Alexander Duyck, virtio-dev, Jiri Pirko, Michael S. Tsirkin,
	Jakub Kicinski, Samudrala, Sridhar, konrad.wilk, qemu-devel,
	virtualization, Venu Busireddy, Netdev, boris.ostrovsky,
	aaron.f.brown, Joao Martins
In-Reply-To: <CADGSJ21HNd4VYNcCt4H0gJ_CCx1GUFpHrDof2N=4WqhD24Zc2A@mail.gmail.com>

On Fri, 22 Jun 2018 17:05:04 -0700
Siwei Liu <loseweigh@gmail.com> wrote:

> On Fri, Jun 22, 2018 at 3:33 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > I suspect the divergence will be lost on most users though
> > simply because they don't even care about vfio. They just
> > want things to go fast.  
> 
> Like Jason said, VF isn't faster than virtio-net in all cases. It
> depends on the workload and performance metrics: throughput, latency,
> or packets per second.

So, will it be guest/admin-controllable then where the traffic flows
through? Just because we do have a vf available after negotiation of
the feature bit, it does not necessarily mean we want to use it? Do we
(the guest) even want to make it visible in that case?

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Cornelia Huck @ 2018-06-26 15:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Samudrala, Sridhar, Siwei Liu, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <20180626044650-mutt-send-email-mst@kernel.org>

On Tue, 26 Jun 2018 04:50:25 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote:
> > > > > > Might not necessarily be something wrong, but it's very limited to
> > > > > > prohibit the MAC of VF from changing when enslaved by failover.  
> > > > > You mean guest changing MAC? I'm not sure why we prohibit that.  
> > > > I think Sridhar and Jiri might be better person to answer it. My
> > > > impression was that sync'ing the MAC address change between all 3
> > > > devices is challenging, as the failover driver uses MAC address to
> > > > match net_device internally.  
> > 
> > Yes. The MAC address is assigned by the hypervisor and it needs to manage the movement
> > of the MAC between the PF and VF.  Allowing the guest to change the MAC will require
> > synchronization between the hypervisor and the PF/VF drivers. Most of the VF drivers
> > don't allow changing guest MAC unless it is a trusted VF.  
> 
> OK but it's a policy thing. Maybe it's a trusted VF. Who knows?
> For example I can see host just
> failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it.
> I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest.
> 

So, what I get from this is that QEMU needs to be able to control all
of standby, uuid, and mac to accommodate the different setups
(or have libvirt/management software set it up). Is the host
able to find out, or rather define, whether a VF is trusted?

^ permalink raw reply

* Re: [PATCH net-next] rds: clean up loopback rds_connections on netns deletion
From: Dmitry Vyukov @ 2018-06-26 15:21 UTC (permalink / raw)
  To: Sowmini Varadhan
  Cc: David Miller, netdev, rds-devel, Santosh Shilimkar,
	syzkaller-bugs
In-Reply-To: <20180626150442.GI20575@oracle.com>

On Tue, Jun 26, 2018 at 5:04 PM, Sowmini Varadhan
<sowmini.varadhan@oracle.com> wrote:
> On (06/26/18 16:48), Dmitry Vyukov wrote:
>> it probably hit the race by pure luck with the large program, but then
>> never had the same luck when it tried to remove any syscalls.
>> So it can make sense to submit several test requests to get more testing.
>
> How does one submit test requests by email?

https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches

> the last time I asked this question, the answer was a pointer to
> https://groups.google.com/forum/#!msg/syzkaller-bugs/7ucgCkAJKSk/skZjgavRAQAJ

You probably asked to apply an unsubmitted patch to syzbot git tree.
That's the question that I gave that link to. But now it's also
detailed here:

https://github.com/google/syzkaller/blob/master/docs/syzbot.md#no-custom-patches

^ permalink raw reply

* Re: [PATCH v3,net-next] vlan: implement vlan id and protocol changes
From: Ido Schimmel @ 2018-06-26 15:29 UTC (permalink / raw)
  To: Chas Williams; +Cc: dsa, David S. Miller, netdev, Roopa Prabhu, idosch
In-Reply-To: <CAG2-GkmUJCc2bvOpaXsnUsEeJCLjWeYrs4Xe2kF_9M48FMRTzA@mail.gmail.com>

On Tue, Jun 26, 2018 at 09:33:40AM -0400, Chas Williams wrote:
> On Tue, Jun 26, 2018 at 6:32 AM Ido Schimmel <idosch@idosch.org> wrote:
> 
> > On Mon, Jun 25, 2018 at 02:45:24PM -0600, David Ahern wrote:
> > > On 6/25/18 4:30 AM, Chas Williams wrote:
> > > > vlan_changelink silently ignores attempts to change the vlan id
> > > > or protocol id of an existing vlan interface.  Implement by adding
> > > > the new vlan id and protocol to the interface's vlan group and then
> > > > removing the old vlan id and protocol from the vlan group.
> > > >
> > > > Signed-off-by: Chas Williams <3chas3@gmail.com>
> > > > ---
> > > >  include/linux/netdevice.h |  1 +
> > > >  net/8021q/vlan.c          |  4 ++--
> > > >  net/8021q/vlan.h          |  2 ++
> > > >  net/8021q/vlan_netlink.c  | 38 ++++++++++++++++++++++++++++++++++++++
> > > >  net/core/dev.c            |  1 +
> > > >  5 files changed, 44 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > > index 3ec9850c7936..a95ae238addf 100644
> > > > --- a/include/linux/netdevice.h
> > > > +++ b/include/linux/netdevice.h
> > > > @@ -2409,6 +2409,7 @@ enum netdev_cmd {
> > > >     NETDEV_CVLAN_FILTER_DROP_INFO,
> > > >     NETDEV_SVLAN_FILTER_PUSH_INFO,
> > > >     NETDEV_SVLAN_FILTER_DROP_INFO,
> > > > +   NETDEV_CHANGEVLAN,
> > > >  };
> > > >  const char *netdev_cmd_to_name(enum netdev_cmd cmd);
> > > >
> > >
> > > > you add the new notifier, but do not add any hooks to catch and
> > > > process it.
> > >
> > > Personally, I think it is a bit sketchy to change the vlan id on an
> > > existing device and I suspect it will cause latent errors.
> >
> > +1
> >
> > >
> > > What's your use case for trying to implement the change versus causing
> > > it to generate an unsupported error?
> > >
> > > If this patch does get accepted, I believe the mlxsw switchdev driver
> > > will be impacted.
> >
> > Yes, at minimum we need to return an error for NETDEV_CHANGEVLAN, but
> > looking at the code it seems that there's no proper rollback.
> >
> 
> I would prefer not to bother with error handling on the notification.  If
> something misses the notification, something misses the notification.
> It happens.

The notification is used so that relevant users in the kernel can
potentially veto the operation and refuse it. See other notifications
such as NETDEV_PRECHANGEUPPER.

The driver David mentioned is one existing user that needs to refuse the
VLAN change as it can't support it.
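
The veto pattern described above can be sketched in plain user-space C.
This is only an illustration under stated assumptions, not the kernel's
notifier API: every `sketch_*` name is hypothetical, and the chain is a
simple array of callbacks rather than a real notifier chain. It shows the
point being argued: listeners are consulted before the change is
committed, and a single refusal leaves the old VLAN id in place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for notifier verdicts (not the kernel's values). */
enum { SKETCH_NOTIFY_OK = 0, SKETCH_NOTIFY_BAD = 1 };

/* A listener inspects the proposed VLAN id and may refuse it. */
typedef int (*sketch_notifier_fn)(int new_vlan_id);

struct sketch_vlan_dev {
	int vlan_id;
};

/* Listener that accepts any change. */
static int sketch_permissive_listener(int new_vlan_id)
{
	(void)new_vlan_id;
	return SKETCH_NOTIFY_OK;
}

/* Listener modelling a driver that cannot support a VLAN id change. */
static int sketch_refusing_listener(int new_vlan_id)
{
	(void)new_vlan_id;
	return SKETCH_NOTIFY_BAD;
}

/* Walk the chain in order and stop at the first veto. */
static int sketch_call_chain(const sketch_notifier_fn *chain, size_t n,
			     int new_vlan_id)
{
	for (size_t i = 0; i < n; i++) {
		if (chain[i](new_vlan_id) == SKETCH_NOTIFY_BAD)
			return SKETCH_NOTIFY_BAD;
	}
	return SKETCH_NOTIFY_OK;
}

/* Commit the new id only if no listener vetoed; returns 0 on success. */
static int sketch_change_vlan(struct sketch_vlan_dev *dev, int new_vlan_id,
			      const sketch_notifier_fn *chain, size_t n)
{
	assert(dev != NULL);

	if (sketch_call_chain(chain, n, new_vlan_id) == SKETCH_NOTIFY_BAD)
		return -1; /* refused: device state is left untouched */

	dev->vlan_id = new_vlan_id;
	return 0;
}
```

With a chain holding only the permissive listener, `sketch_change_vlan()`
updates the id; adding the refusing listener makes the same call fail and
keeps the previous id, which is the behaviour a driver that cannot support
the change would rely on.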

^ permalink raw reply

* Re: [virtio-dev] Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Michael S. Tsirkin @ 2018-06-26 15:38 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Samudrala, Sridhar, Siwei Liu, Alexander Duyck, virtio-dev,
	aaron.f.brown, Jiri Pirko, Jakub Kicinski, Netdev, qemu-devel,
	virtualization, konrad.wilk, boris.ostrovsky, Joao Martins,
	Venu Busireddy, vijay.balakrishna
In-Reply-To: <20180626171732.5038f53f.cohuck@redhat.com>

On Tue, Jun 26, 2018 at 05:17:32PM +0200, Cornelia Huck wrote:
> On Tue, 26 Jun 2018 04:50:25 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Jun 25, 2018 at 10:54:09AM -0700, Samudrala, Sridhar wrote:
> > > > > > > Might not necessarily be something wrong, but it's very limited to
> > > > > > > prohibit the MAC of VF from changing when enslaved by failover.  
> > > > > > You mean guest changing MAC? I'm not sure why we prohibit that.  
> > > > > I think Sridhar and Jiri might be better people to answer it. My
> > > > > impression was that sync'ing the MAC address change between all 3
> > > > > devices is challenging, as the failover driver uses MAC address to
> > > > > match net_device internally.  
> > > 
> > > Yes. The MAC address is assigned by the hypervisor and it needs to manage the movement
> > > of the MAC between the PF and VF.  Allowing the guest to change the MAC will require
> > > synchronization between the hypervisor and the PF/VF drivers. Most of the VF drivers
> > > don't allow changing guest MAC unless it is a trusted VF.  
> > 
> > OK but it's a policy thing. Maybe it's a trusted VF. Who knows?
> > For example I can see host just
> > failing VIRTIO_NET_CTRL_MAC_ADDR_SET if it wants to block it.
> > I'm not sure why VIRTIO_NET_F_STANDBY has to block it in the guest.
> > 
> 
> So, what I get from this is that QEMU needs to be able to control all
> of standby, uuid, and mac to accommodate the different setups
> (or have libvirt/management software set it up). Is the host
> able to find out, or rather define, whether a VF is trusted?

You do it with ip link I think but QEMU doesn't normally do this,
it relies on libvirt to poke at host kernel and supply the info.

-- 
MST

^ permalink raw reply

* [PATCH 0/3] xdp: don't mix XDP_TX and XDP_REDIRECT flush ops
From: Jesper Dangaard Brouer @ 2018-06-26 15:39 UTC (permalink / raw)
  To: netdev, Jesper Dangaard Brouer
  Cc: John Fastabend, Jason Wang, Daniel Borkmann, Björn Töpel,
	Alexei Starovoitov

Fix driver logic that combines XDP_TX flush and XDP_REDIRECT map
flushing.  These are two different XDP xmit modes, and it is clearly
wrong to invoke both types of flush operations when only one of the
XDP xmit modes is used.
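
The idea behind the series can be sketched in self-contained user-space C.
This is a simplified model under stated assumptions, not driver code: the
`SKETCH_*` flags mirror the bit-flag style the patches introduce, and the
two counters are hypothetical stand-ins for the real operations (the tail
write or kick for XDP_TX, and xdp_do_flush_map() for XDP_REDIRECT).

```c
#include <assert.h>
#include <stddef.h>

/* Per-packet verdicts as bit flags, so they can be OR-ed across a poll. */
#define SKETCH_XDP_PASS		0u
#define SKETCH_XDP_CONSUMED	(1u << 0)
#define SKETCH_XDP_TX		(1u << 1)
#define SKETCH_XDP_REDIR	(1u << 2)

/* Hypothetical counters standing in for the two flush operations. */
struct sketch_flush_counts {
	unsigned int tx_tail_bumps;		/* models the XDP_TX tail write */
	unsigned int redirect_map_flushes;	/* models xdp_do_flush_map() */
};

/* Record which xmit modes were seen during one poll loop. */
static unsigned int sketch_accumulate(const unsigned int *verdicts, size_t n)
{
	unsigned int xdp_xmit = 0;

	for (size_t i = 0; i < n; i++) {
		if (verdicts[i] & (SKETCH_XDP_TX | SKETCH_XDP_REDIR))
			xdp_xmit |= verdicts[i];
	}
	return xdp_xmit;
}

/* At the end of the poll, run only the flush each seen mode requires. */
static void sketch_end_of_poll(unsigned int xdp_xmit,
			       struct sketch_flush_counts *counts)
{
	assert(counts != NULL);

	if (xdp_xmit & SKETCH_XDP_REDIR)
		counts->redirect_map_flushes++;

	if (xdp_xmit & SKETCH_XDP_TX)
		counts->tx_tail_bumps++;
}
```

A poll that only saw XDP_TX verdicts triggers the tail bump alone, and a
poll that only saw redirects triggers the map flush alone; with a single
boolean flag both would have run in either case.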

---
Unsure what git tree to send this against. Thus, I'll leave it up to
the patchwork assigner ;-)


Jesper Dangaard Brouer (3):
      ixgbe: split XDP_TX tail and XDP_REDIRECT map flushing
      i40e: split XDP_TX tail and XDP_REDIRECT map flushing
      virtio_net: split XDP_TX kick and XDP_REDIRECT map flushing


 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   24 +++++++++++++-------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   24 ++++++++++++--------
 drivers/net/virtio_net.c                      |   30 ++++++++++++++++---------
 3 files changed, 48 insertions(+), 30 deletions(-)

^ permalink raw reply

* [PATCH 1/3] ixgbe: split XDP_TX tail and XDP_REDIRECT map flushing
From: Jesper Dangaard Brouer @ 2018-06-26 15:39 UTC (permalink / raw)
  To: netdev, Jesper Dangaard Brouer
  Cc: John Fastabend, Jason Wang, Daniel Borkmann, Björn Töpel,
	Alexei Starovoitov
In-Reply-To: <153002741940.15389.10466368482771753300.stgit@firesoul>

The driver was combining the XDP_TX tail flush and XDP_REDIRECT
map flushing (xdp_do_flush_map).  This is suboptimal; these two
flush operations should be kept separate.

Fixes: 11393cc9b9be ("xdp: Add batching support to redirect map")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 4929f7265598..5f8a969638b2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2186,9 +2186,10 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
 	return skb;
 }
 
-#define IXGBE_XDP_PASS 0
-#define IXGBE_XDP_CONSUMED 1
-#define IXGBE_XDP_TX 2
+#define IXGBE_XDP_PASS		0
+#define IXGBE_XDP_CONSUMED	BIT(0)
+#define IXGBE_XDP_TX		BIT(1)
+#define IXGBE_XDP_REDIR		BIT(2)
 
 static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
 			       struct xdp_frame *xdpf);
@@ -2225,7 +2226,7 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
 	case XDP_REDIRECT:
 		err = xdp_do_redirect(adapter->netdev, xdp, xdp_prog);
 		if (!err)
-			result = IXGBE_XDP_TX;
+			result = IXGBE_XDP_REDIR;
 		else
 			result = IXGBE_XDP_CONSUMED;
 		break;
@@ -2285,7 +2286,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	unsigned int mss = 0;
 #endif /* IXGBE_FCOE */
 	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
-	bool xdp_xmit = false;
+	unsigned int xdp_xmit = 0;
 	struct xdp_buff xdp;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
@@ -2328,8 +2329,10 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		}
 
 		if (IS_ERR(skb)) {
-			if (PTR_ERR(skb) == -IXGBE_XDP_TX) {
-				xdp_xmit = true;
+			unsigned int xdp_res = -PTR_ERR(skb);
+
+			if (xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR)) {
+				xdp_xmit |= xdp_res;
 				ixgbe_rx_buffer_flip(rx_ring, rx_buffer, size);
 			} else {
 				rx_buffer->pagecnt_bias++;
@@ -2401,7 +2404,10 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		total_rx_packets++;
 	}
 
-	if (xdp_xmit) {
+	if (xdp_xmit & IXGBE_XDP_REDIR)
+		xdp_do_flush_map();
+
+	if (xdp_xmit & IXGBE_XDP_TX) {
 		struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
 
 		/* Force memory writes to complete before letting h/w
@@ -2409,8 +2415,6 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		 */
 		wmb();
 		writel(ring->next_to_use, ring->tail);
-
-		xdp_do_flush_map();
 	}
 
 	u64_stats_update_begin(&rx_ring->syncp);

^ permalink raw reply related
