Netdev List
* Re: [PATCH net] macsec: add genl family module alias
From: David Miller @ 2017-08-22 21:26 UTC
  To: sd; +Cc: netdev
In-Reply-To: <dde846e6ac5c2bc4189c240be6ccaa54d3b60d5e.1503406499.git.sd@queasysnail.net>

From: Sabrina Dubroca <sd@queasysnail.net>
Date: Tue, 22 Aug 2017 15:36:08 +0200

> This helps tools such as wpa_supplicant start even if the macsec
> module isn't loaded yet.
> 
> Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>

Applied and queued up for -stable, thanks.

* Re: [PATCH net v1 0/2] tipc: topology server fixes
From: David Miller @ 2017-08-22 21:25 UTC
  To: parthasarathy.bhuvaragan
  Cc: netdev, tipc-discussion, jon.maloy, maloy, ying.xue
In-Reply-To: <1503397721-19682-1-git-send-email-parthasarathy.bhuvaragan@ericsson.com>

From: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Date: Tue, 22 Aug 2017 12:28:39 +0200

> The following commits fix two race conditions that cause general
> protection faults.

Series applied, thank you.

* Re: [PATCH v2] i40e/i40evf: fix out-of-bounds read of cpumask
From: Stefano Brivio @ 2017-08-22 21:23 UTC
  To: Jacob Keller; +Cc: Intel Wired LAN, netdev, stable, Juergen Gross
In-Reply-To: <20170822210442.18006-1-jacob.e.keller@intel.com>

[Fixed Cc: address for stable, Cc'ed Juergen]

On Tue, 22 Aug 2017 14:04:42 -0700
Jacob Keller <jacob.e.keller@intel.com> wrote:

> When responding to an affinity hint we directly copied a cpumask value
> instead of using cpumask_copy. According to cpumask.h this is not
> correct because cpumask_t is only guaranteed to have enough space for
> the number of CPUs in the system, and may not be as big as we expect.
> Thus a direct copy results in an out-of-bounds read and potentially
> a crash if the pages are aligned just right. This is easily
> detected on a kernel with KASAN enabled:

I still think the commit message of my patch
(ae9c9586f61e914dc1c6fe2e6ac1fb2bf07283bc.1502792828.git.sbrivio@redhat.com)
was perhaps a bit clearer, but this one is clear too, fair enough.

> KASAN reports:
> [   25.242312] BUG: KASAN: slab-out-of-bounds in i40e_irq_affinity_notify+0x30/0x50 [i40e] at addr ffff880462eea960
[...]
> [   25.242597] ==================================================================

This is also taken from my message. I'm not terribly happy about that
(though still happier with it than without it). Fair enough, whatever
it takes to get this applied as soon as possible...

> Fixes: 96db776a3682 ("i40e/i40evf: fix interrupt affinity bug", 2016-09-14)
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Cc: stable@vger.kernel.org # 4.10+

FWIW,

Acked-by: Stefano Brivio <sbrivio@redhat.com>

* Re: [patch net] mlxsw: spectrum_switchdev: Fix mrouter flag update
From: David Miller @ 2017-08-22 21:23 UTC
  To: jiri; +Cc: netdev, nogahf, idosch, mlxsw
In-Reply-To: <20170822082811.1356-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Tue, 22 Aug 2017 10:28:11 +0200

> From: Nogah Frankel <nogahf@mellanox.com>
> 
> Update the value of the mrouter flag in struct mlxsw_sp_bridge_port when
> it is being changed.
> 
> Fixes: c57529e1d5d8 ("mlxsw: spectrum: Replace vPorts with Port-VLAN")
> Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
> Reviewed-by: Ido Schimmel <idosch@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

Applied.

* Re: [PATCH] net: ethernet: freescale: fs_enet: make mdiobb_ops const
From: David Miller @ 2017-08-22 21:23 UTC
  To: bhumirks
  Cc: julia.lawall, pantelis.antoniou, vbordug, linuxppc-dev, netdev,
	linux-kernel
In-Reply-To: <1503389759-17545-1-git-send-email-bhumirks@gmail.com>

From: Bhumika Goyal <bhumirks@gmail.com>
Date: Tue, 22 Aug 2017 13:45:59 +0530

> Make this const as it is only stored in a const field of a
> mdiobb_ctrl structure.
> 
> Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>

Applied.
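The const conversion is safe because the ops table is only ever reached through a pointer-to-const field. A minimal userspace sketch of that pattern follows; the toy_* names and the simplified callback signatures are hypothetical and do not match the real struct mdiobb_ops.

```c
/* Hypothetical miniature of the mdio-bitbang ops pattern. */
struct toy_bb_ops {
	void (*set_mdc)(int level);
	int (*get_mdio_data)(void);
};

struct toy_bb_ctrl {
	/* Pointer-to-const: the table is only read through this field. */
	const struct toy_bb_ops *ops;
};

static int mdc_level;

static void toy_set_mdc(int level)
{
	mdc_level = level;
}

static int toy_get_mdio_data(void)
{
	return 1; /* pretend the MDIO line reads high */
}

/* Because the field is const-qualified, the table itself can be const and
 * placed in read-only memory; writing through ctrl.ops would not compile. */
static const struct toy_bb_ops toy_ops = {
	.set_mdc = toy_set_mdc,
	.get_mdio_data = toy_get_mdio_data,
};

static struct toy_bb_ctrl toy_ctrl = { .ops = &toy_ops };
```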

* Re: [PATCH] net: mdio-gpio: make mdiobb_ops const
From: David Miller @ 2017-08-22 21:23 UTC
  To: bhumirks; +Cc: julia.lawall, andrew, f.fainelli, netdev, linux-kernel
In-Reply-To: <1503389609-17490-1-git-send-email-bhumirks@gmail.com>

From: Bhumika Goyal <bhumirks@gmail.com>
Date: Tue, 22 Aug 2017 13:43:29 +0530

> Make this const as it is only stored in a const field of a
> mdiobb_ctrl structure.
> 
> Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>

Applied.

* Re: [PATCH net-next 4/5] xdp: remove net_device names from xdp_redirect tracepoint
From: Daniel Borkmann @ 2017-08-22 21:23 UTC
  To: Jesper Dangaard Brouer, netdev; +Cc: John Fastabend
In-Reply-To: <150343485489.31091.9423090032200726744.stgit@firesoul>

On 08/22/2017 10:47 PM, Jesper Dangaard Brouer wrote:
> There is too much overhead in the current trace_xdp_redirect
> tracepoint as it does strcpy and strlen on the net_device names.
>
> Besides, the ifindex/index is the information actually needed in
> the tracepoint to diagnose issues.  When a lookup fails (either
> ifindex or devmap index), the tracepoint needs to report which
> to_index has the issue.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

* Re: [PATCH net-next 3/5] ixgbe: use return codes from ndo_xdp_xmit that are distinguishable
From: Daniel Borkmann @ 2017-08-22 21:21 UTC
  To: Jesper Dangaard Brouer, netdev; +Cc: John Fastabend
In-Reply-To: <150343484979.31091.3667334412617955097.stgit@firesoul>

On 08/22/2017 10:47 PM, Jesper Dangaard Brouer wrote:
> For XDP_REDIRECT the use of return code -EINVAL is confusing, as it is
> used in three different cases: (1) when the index or ifindex lookup
> fails, and, in the ixgbe driver, (2) when the link is down and (3) when
> XDP has not been enabled.
>
> The return code can be picked up by the tracepoint xdp:xdp_redirect
> for diagnosing why XDP_REDIRECT isn't working.  Thus, different return
> codes are needed to tell the issues apart.
>
> I'm considering using a specific err-code scheme for XDP_REDIRECT
> instead of using these errno codes.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
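To illustrate distinguishable return codes, here is a minimal sketch of a hypothetical xmit hook that gives each of the three failure cases its own errno. The mapping (-EINVAL for the lookup failure, -ENETDOWN for link down, -ENXIO for XDP not enabled) is an assumption chosen for illustration, not necessarily what the ixgbe patch uses.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical device state; not an ixgbe structure. */
struct fake_xdp_dev {
	bool link_up;
	bool xdp_enabled;
};

/* Each failure cause gets its own errno so a tracepoint consumer
 * can tell them apart. */
static int fake_xdp_xmit(const struct fake_xdp_dev *dev)
{
	if (!dev)
		return -EINVAL;   /* (1) index/ifindex lookup failed */
	if (!dev->link_up)
		return -ENETDOWN; /* (2) link is down */
	if (!dev->xdp_enabled)
		return -ENXIO;    /* (3) XDP not enabled on the device */
	return 0;
}

/* What a trace consumer might print for a given return code. */
static const char *redirect_reason(int err)
{
	return err ? strerror(-err) : "ok";
}
```

A dedicated XDP_REDIRECT error scheme, as Jesper suggests, would replace the errno values in such a mapping without changing its shape.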

* Re: [PATCH net-next] net: sched: use kvmalloc() for class hash tables
From: Alexei Starovoitov @ 2017-08-22 21:21 UTC
  To: Eric Dumazet
  Cc: David Miller, netdev, Jamal Hadi Salim, Cong Wang, Jiri Pirko
In-Reply-To: <1503430006.2499.55.camel@edumazet-glaptop3.roam.corp.google.com>

On Tue, Aug 22, 2017 at 12:26:46PM -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> High order GFP_KERNEL allocations can stress the host badly.
> 
> Use modern kvmalloc_array()/kvfree() instead of custom
> allocations.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Thanks for fixing these issues.
Acked-by: Alexei Starovoitov <ast@kernel.org>
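As a rough userspace analogy of what kvmalloc_array() provides here: it rejects n * size overflows up front, tries a physically contiguous allocation that does not retry hard, and only then falls back to a virtually contiguous one. In the sketch below both paths are plain malloc(); CONTIG_LIMIT and the toy_* names are hypothetical, and the limit merely simulates a high-order allocation failing.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical threshold above which the "contiguous" attempt is made to
 * fail, simulating a high-order allocation that cannot be satisfied. */
#define CONTIG_LIMIT (16 * 1024)

enum alloc_path { PATH_NONE, PATH_CONTIG, PATH_FALLBACK };

/* Stand-in for a kmalloc() attempt that gives up quickly. */
static void *try_contiguous(size_t bytes)
{
	return bytes <= CONTIG_LIMIT ? malloc(bytes) : NULL;
}

/* Toy kvmalloc_array(): overflow check, fast attempt, then fallback. */
static void *toy_kvmalloc_array(size_t n, size_t size, enum alloc_path *path)
{
	void *p;

	*path = PATH_NONE;
	if (size != 0 && n > SIZE_MAX / size)
		return NULL; /* n * size would overflow */

	p = try_contiguous(n * size);
	if (p) {
		*path = PATH_CONTIG;
		return p;
	}
	*path = PATH_FALLBACK; /* stand-in for vmalloc() */
	return malloc(n * size);
}
```

In the kernel the matching release is kvfree(), which handles either kind of allocation; in this toy both paths are released with free().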

* Re: [PATCH net-next 2/5] xdp: make generic xdp redirect use tracepoint trace_xdp_redirect
From: Daniel Borkmann @ 2017-08-22 21:21 UTC
  To: Jesper Dangaard Brouer, netdev; +Cc: John Fastabend
In-Reply-To: <150343484470.31091.11187498529801746935.stgit@firesoul>

On 08/22/2017 10:47 PM, Jesper Dangaard Brouer wrote:
> If the xdp_do_generic_redirect() call fails, it triggers the
> trace_xdp_exception tracepoint.  It seems better to use the same
> tracepoint trace_xdp_redirect, as the native xdp_do_redirect{,_map} does.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Makes sense to make this consistent.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

* Re: [PATCH] net: ethernet: ax88796: make mdiobb_ops const
From: David Miller @ 2017-08-22 21:21 UTC
  To: bhumirks; +Cc: julia.lawall, netdev, linux-kernel
In-Reply-To: <1503389479-17429-1-git-send-email-bhumirks@gmail.com>

From: Bhumika Goyal <bhumirks@gmail.com>
Date: Tue, 22 Aug 2017 13:41:19 +0530

> Make this const as it is only stored in a const field of a
> mdiobb_ctrl structure.
> 
> Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>

Applied.

* Re: [PATCH] net: ftgmac100: Fix oops in probe on failure to find associated PHY
From: David Miller @ 2017-08-22 21:20 UTC
  To: andrew; +Cc: netdev, benh, joel, linux-kernel, openbmc, linux-aspeed,
	ryan_chen
In-Reply-To: <20170822063622.21550-1-andrew@aj.id.au>

From: Andrew Jeffery <andrew@aj.id.au>
Date: Tue, 22 Aug 2017 16:06:22 +0930

> netif_napi_del() should be paired with netif_napi_add(); however, no
> such call takes place in ftgmac100_probe(). This triggers a NULL
> pointer dereference if e.g. no PHY is found by the MDIO probe:
 ...
> Signed-off-by: Andrew Jeffery <andrew@aj.id.au>

Applied, thanks.
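The underlying rule is that an error path must only undo the steps that actually completed, usually with goto labels that unwind in reverse order. A minimal sketch using a hypothetical counter model (none of these names come from ftgmac100):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical probe() with three setup steps. Each counter is +1 per
 * setup and -1 per teardown, so -1 would reveal an unmatched teardown
 * such as calling netif_napi_del() without netif_napi_add(). */
struct probe_state {
	int netdev; /* e.g. allocating the net_device */
	int mdio;   /* e.g. registering the MDIO bus */
	int napi;   /* e.g. netif_napi_add() */
};

static int toy_probe(struct probe_state *s, bool phy_found)
{
	s->netdev++;
	s->mdio++;
	if (!phy_found)
		goto err_mdio; /* NAPI was never added: do not undo it */
	s->napi++;
	return 0;

err_mdio:
	/* Unwind in reverse order, only past the steps that completed. */
	s->mdio--;
	s->netdev--;
	return -ENODEV;
}
```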

* Re: [PATCH net-next 1/5] xdp: remove bpf_warn_invalid_xdp_redirect
From: Daniel Borkmann @ 2017-08-22 21:19 UTC
  To: Jesper Dangaard Brouer, netdev; +Cc: John Fastabend
In-Reply-To: <150343483960.31091.3438305938845615426.stgit@firesoul>

On 08/22/2017 10:47 PM, Jesper Dangaard Brouer wrote:
> Given there is a tracepoint that can track the error code
> of xdp_do_redirect calls, the WARN_ONCE in bpf_warn_invalid_xdp_redirect
> doesn't seem relevant any longer.  Simply remove the function.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

* Re: [PATCH net-next v3 0/2] Simplify the tcp_conn_request.
From: David Miller @ 2017-08-22 21:16 UTC
  To: xiangxia.m.yue; +Cc: netdev, eric.dumazet
In-Reply-To: <1503383629-12392-1-git-send-email-xiangxia.m.yue@gmail.com>

From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Date: Mon, 21 Aug 2017 23:33:47 -0700

> Just simplify the tcp_conn_request function.

Series applied, thanks.

The explicit 'dst = NULL' in the variable declaration is probably
superfluous now.

And in fact it was hiding the bug that we didn't have a proper
'dst' early enough.

* Re: [PATCH 0/2] net: Fix crashes due to activity during suspend
From: Geert Uytterhoeven @ 2017-08-22 21:16 UTC
  To: Florian Fainelli
  Cc: Geert Uytterhoeven, David S . Miller, Steve Glendinning,
	Andrew Lunn, Lukas Wunner, Rafael J . Wysocki,
	netdev@vger.kernel.org, Linux PM list, Linux-Renesas,
	linux-kernel@vger.kernel.org
In-Reply-To: <757f672b-d3cf-9189-0533-cb45e325c6b8@gmail.com>

Hi Florian,

On Tue, Aug 22, 2017 at 8:49 PM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 08/22/2017 11:37 AM, Geert Uytterhoeven wrote:
>> If an Ethernet device is used while the device is suspended, the system may
>> crash.
>>
>> E.g. on sh73a0/kzm9g and r8a73a4/ape6evm, the external Ethernet chip is
>> driven by a PM controlled clock.  If the Ethernet registers are accessed
>> while the clock is not running, the system will crash with an imprecise
>> external abort.
>>
>> This patch series fixes two such crashes:
>>   1. The first patch prevents the PHY polling state machine from accessing
>>      PHY registers while a device is suspended,
>>   2. The second patch prevents the net core from trying to transmit packets
>>      when an smsc911x device is suspended.
>>
>> Both crashes can be reproduced on sh73a0/kzm9g and r8a73a4/ape6evm during
>> s2ram (rarely), or by using pm_test (more likely to trigger):
>>
>>     # echo 0 > /sys/module/printk/parameters/console_suspend
>>     # echo platform > /sys/power/pm_test
>>     # echo mem > /sys/power/state
>>
>> With this series applied, my test systems survive a loop of 100 test
>> suspends.
>
> It seems to me that part, if not all, of the problem is that smsc911x's
> suspend and resume functions are way too simplistic and do not manage
> the PHY at all during suspend/resume; the PHY state machine is not even
> stopped, so of course this will cause bus errors if you access those
> registers.
>
> You are addressing this as part of patch 2, but this still seems a bit
> incomplete to me: you'd need at least phy_stop() and/or phy_suspend()
> (which powers down the PHY) during suspend, and phy_start() and/or
> phy_resume() during resume, to properly shut down and restart the PHY
> state machine.
>
> Have you tried that?

Thank you, I will give that a try!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
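Florian's point is about ordering: everything that can touch the registers (the PHY state machine and the transmit path) must be stopped before the clock goes away, and restarted only after it is back. A userspace model of that ordering, with hypothetical toy_* names and boolean flags standing in for phy_stop()/phy_start() and for stopping the transmit path:

```c
#include <stdbool.h>

/* Hypothetical model; register access is only legal with the clock on. */
struct toy_eth {
	bool clock_on;       /* PM-controlled clock feeding the chip */
	bool phy_sm_running; /* PHY polling state machine */
	bool tx_enabled;     /* net core allowed to call xmit */
	int bad_accesses;    /* accesses made with the clock off */
};

static void toy_reg_access(struct toy_eth *d)
{
	if (!d->clock_on)
		d->bad_accesses++; /* on real hardware: imprecise external abort */
}

/* Periodic work, like the PHY state machine polling the PHY. */
static void toy_phy_poll_tick(struct toy_eth *d)
{
	if (d->phy_sm_running)
		toy_reg_access(d);
}

static void toy_xmit(struct toy_eth *d)
{
	if (d->tx_enabled)
		toy_reg_access(d);
}

static void toy_suspend(struct toy_eth *d)
{
	d->tx_enabled = false;     /* stop the transmit path first */
	d->phy_sm_running = false; /* cf. phy_stop() */
	d->clock_on = false;       /* only now gate the clock */
}

static void toy_resume(struct toy_eth *d)
{
	d->clock_on = true;        /* clock back first */
	d->phy_sm_running = true;  /* cf. phy_start() */
	d->tx_enabled = true;
}
```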

* RE: [PATCH v2] i40e/i40evf: fix out-of-bounds read of cpumask
From: Keller, Jacob E @ 2017-08-22 21:15 UTC
  To: Keller, Jacob E, Intel Wired LAN
  Cc: netdev@vger.kernel.org, stable@vger.kernel.org#4.10+
In-Reply-To: <20170822210442.18006-1-jacob.e.keller@intel.com>

> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On
> Behalf Of Jacob Keller
> Sent: Tuesday, August 22, 2017 2:05 PM
> To: Intel Wired LAN <intel-wired-lan@lists.osuosl.org>
> Cc: netdev@vger.kernel.org; Keller, Jacob E <jacob.e.keller@intel.com>;
> stable@vger.kernel.org#4.10+
> Subject: [PATCH v2] i40e/i40evf: fix out-of-bounds read of cpumask
> 
> When responding to an affinity hint we directly copied a cpumask value
> instead of using cpumask_copy. According to cpumask.h this is not
> correct because cpumask_t is only guaranteed to have enough space for
> the number of CPUs in the system, and may not be as big as we expect.
> Thus a direct copy results in an out-of-bounds read and potentially
> a crash if the pages are aligned just right. This is easily
> detected on a kernel with KASAN enabled:
> 
> KASAN reports:
> [   25.242312] BUG: KASAN: slab-out-of-bounds in i40e_irq_affinity_notify+0x30/0x50 [i40e] at addr ffff880462eea960
> [   25.242315] Read of size 1024 by task kworker/2:1/170
> [   25.242322] CPU: 2 PID: 170 Comm: kworker/2:1 Not tainted 4.11.0-22.el7a.x86_64 #1
> [   25.242325] Hardware name: HP ProLiant DL380 Gen9, BIOS P89 05/06/2015
> [   25.242336] Workqueue: events irq_affinity_notify
> [   25.242340] Call Trace:
> [   25.242350]  dump_stack+0x63/0x8d
> [   25.242358]  kasan_object_err+0x21/0x70
> [   25.242364]  kasan_report+0x288/0x540
> [   25.242397]  ? i40e_irq_affinity_notify+0x30/0x50 [i40e]
> [   25.242403]  check_memory_region+0x13c/0x1a0
> [   25.242408]  __asan_loadN+0xf/0x20
> [   25.242440]  i40e_irq_affinity_notify+0x30/0x50 [i40e]
> [   25.242446]  irq_affinity_notify+0x1b4/0x230
> [   25.242452]  ? irq_set_affinity_notifier+0x130/0x130
> [   25.242457]  ? kasan_slab_free+0x89/0xc0
> [   25.242466]  process_one_work+0x32f/0x6f0
> [   25.242472]  worker_thread+0x89/0x770
> [   25.242481]  ? pci_mmcfg_check_reserved+0xc0/0xc0
> [   25.242488]  kthread+0x18c/0x1e0
> [   25.242493]  ? process_one_work+0x6f0/0x6f0
> [   25.242499]  ? kthread_create_on_node+0xc0/0xc0
> [   25.242506]  ret_from_fork+0x2c/0x40
> [   25.242511] Object at ffff880462eea960, in cache kmalloc-8 size: 8
> [   25.242513] Allocated:
> [   25.242514] PID = 170
> [   25.242522]  save_stack_trace+0x1b/0x20
> [   25.242529]  save_stack+0x46/0xd0
> [   25.242533]  kasan_kmalloc+0xad/0xe0
> [   25.242537]  __kmalloc_node+0x12c/0x2b0
> [   25.242542]  alloc_cpumask_var_node+0x3c/0x60
> [   25.242546]  alloc_cpumask_var+0xe/0x10
> [   25.242550]  irq_affinity_notify+0x94/0x230
> [   25.242555]  process_one_work+0x32f/0x6f0
> [   25.242559]  worker_thread+0x89/0x770
> [   25.242564]  kthread+0x18c/0x1e0
> [   25.242568]  ret_from_fork+0x2c/0x40
> [   25.242569] Freed:
> [   25.242570] PID = 0
> [   25.242572] (stack is not available)
> [   25.242573] Memory state around the buggy address:
> [   25.242578]  ffff880462eea800: fc fc 00 fc fc 00 fc fc 00 fc fc 00 fc fc fb fc
> [   25.242582]  ffff880462eea880: fc fb fc fc fb fc fc 00 fc fc 00 fc fc 00 fc fc
> [   25.242586] >ffff880462eea900: 00 fc fc 00 fc fc 00 fc fc fb fc fc 00 fc fc fc
> [   25.242588]                                                           ^
> [   25.242592]  ffff880462eea980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   25.242596]  ffff880462eeaa00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   25.242597] ==================================================================
> 
> Fixes: 96db776a3682 ("i40e/i40evf: fix interrupt affinity bug", 2016-09-14)
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Cc: stable@vger.kernel.org # 4.10+
> ---
> This updates the commit message of the original fix to indicate that
> it fixes a potential crash, tags the commit for stable, and adds a
> Fixes: tag indicating which commit this fixes.
> 

I should have noted that I also changed the title to be more accurate; this is a v2 of https://patchwork.ozlabs.org/patch/787388/
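The bug class is easy to see in a userspace model: a full-size mask type (here 8192 bits, i.e. 1024 bytes) versus a mask allocated only for the CPUs actually present (8 bytes for 64 CPUs). A struct assignment always copies the full type size, while a bounded copy in the spirit of cpumask_copy() touches only the allocated bits. MAX_CPUS, struct full_mask and the helpers below are hypothetical stand-ins for NR_CPUS, cpumask_t and the kernel's cpumask API.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical compile-time maximum, playing the role of NR_CPUS. */
#define MAX_CPUS 8192
#define TOY_BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define TOY_BITS_TO_LONGS(n) (((n) + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

/* Full-size mask type, playing the role of cpumask_t. A struct
 * assignment of this type always copies sizeof(struct full_mask) bytes,
 * which overreads a mask allocated for fewer CPUs. */
struct full_mask {
	unsigned long bits[TOY_BITS_TO_LONGS(MAX_CPUS)];
};

/* Bytes actually needed for nr_cpus CPUs. */
static size_t mask_bytes(unsigned int nr_cpus)
{
	return TOY_BITS_TO_LONGS(nr_cpus) * sizeof(unsigned long);
}

/* Allocate a mask sized for the running system only. */
static unsigned long *alloc_mask(unsigned int nr_cpus)
{
	return calloc(1, mask_bytes(nr_cpus));
}

/* Bounded copy: reads and writes only the allocated bytes. */
static void copy_mask(unsigned long *dst, const unsigned long *src,
		      unsigned int nr_cpus)
{
	memcpy(dst, src, mask_bytes(nr_cpus));
}

static void set_cpu(unsigned long *mask, unsigned int cpu)
{
	mask[cpu / TOY_BITS_PER_LONG] |= 1UL << (cpu % TOY_BITS_PER_LONG);
}

static int test_cpu(const unsigned long *mask, unsigned int cpu)
{
	return (mask[cpu / TOY_BITS_PER_LONG] >> (cpu % TOY_BITS_PER_LONG)) & 1UL;
}
```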

* [PATCH v2 net-next 2/2] selftests/net: Add a test to validate behavior of rx timestamps
From: Mike Maloney @ 2017-08-22 21:08 UTC
  To: netdev, davem; +Cc: willemdebruijn.kernel, soheil, Mike Maloney
In-Reply-To: <20170822210849.23162-1-maloneykernel@gmail.com>

From: Mike Maloney <maloney@google.com>

Validate the behavior of the combination of various timestamp socket
options, and ensure consistency across ip, udp, and tcp.

Signed-off-by: Mike Maloney <maloney@google.com>
---
 .../selftests/networking/timestamping/.gitignore   |   1 +
 .../selftests/networking/timestamping/Makefile     |   4 +-
 .../networking/timestamping/rxtimestamp.c          | 389 +++++++++++++++++++++
 3 files changed, 393 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/networking/timestamping/rxtimestamp.c

diff --git a/tools/testing/selftests/networking/timestamping/.gitignore b/tools/testing/selftests/networking/timestamping/.gitignore
index 9e69e982fb38..d9355035e746 100644
--- a/tools/testing/selftests/networking/timestamping/.gitignore
+++ b/tools/testing/selftests/networking/timestamping/.gitignore
@@ -1,3 +1,4 @@
 timestamping
+rxtimestamp
 txtimestamp
 hwtstamp_config
diff --git a/tools/testing/selftests/networking/timestamping/Makefile b/tools/testing/selftests/networking/timestamping/Makefile
index ccbb9edbbbb9..92fb8ee917c5 100644
--- a/tools/testing/selftests/networking/timestamping/Makefile
+++ b/tools/testing/selftests/networking/timestamping/Makefile
@@ -1,4 +1,6 @@
-TEST_PROGS := hwtstamp_config timestamping txtimestamp
+CFLAGS += -I../../../../../usr/include
+
+TEST_PROGS := hwtstamp_config rxtimestamp timestamping txtimestamp
 
 all: $(TEST_PROGS)
 
diff --git a/tools/testing/selftests/networking/timestamping/rxtimestamp.c b/tools/testing/selftests/networking/timestamping/rxtimestamp.c
new file mode 100644
index 000000000000..00f286661dcd
--- /dev/null
+++ b/tools/testing/selftests/networking/timestamping/rxtimestamp.c
@@ -0,0 +1,389 @@
+#include <errno.h>
+#include <error.h>
+#include <getopt.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/time.h>
+#include <sys/socket.h>
+#include <sys/select.h>
+#include <sys/ioctl.h>
+#include <arpa/inet.h>
+#include <net/if.h>
+
+#include <asm/types.h>
+#include <linux/net_tstamp.h>
+#include <linux/errqueue.h>
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+
+struct options {
+	int so_timestamp;
+	int so_timestampns;
+	int so_timestamping;
+};
+
+struct tstamps {
+	bool tstamp;
+	bool tstampns;
+	bool swtstamp;
+	bool hwtstamp;
+};
+
+struct socket_type {
+	char *friendly_name;
+	int type;
+	int protocol;
+	bool enabled;
+};
+
+struct test_case {
+	struct options sockopt;
+	struct tstamps expected;
+	bool enabled;
+};
+
+struct sof_flag {
+	int mask;
+	char *name;
+};
+
+static struct sof_flag sof_flags[] = {
+#define SOF_FLAG(f) { f, #f }
+	SOF_FLAG(SOF_TIMESTAMPING_SOFTWARE),
+	SOF_FLAG(SOF_TIMESTAMPING_RX_SOFTWARE),
+	SOF_FLAG(SOF_TIMESTAMPING_RX_HARDWARE),
+};
+
+static struct socket_type socket_types[] = {
+	{ "ip",		SOCK_RAW,	IPPROTO_EGP },
+	{ "udp",	SOCK_DGRAM,	IPPROTO_UDP },
+	{ "tcp",	SOCK_STREAM,	IPPROTO_TCP },
+};
+
+static struct test_case test_cases[] = {
+	{ {}, {} },
+	{
+		{ so_timestamp: 1 },
+		{ tstamp: true }
+	},
+	{
+		{ so_timestampns: 1 },
+		{ tstampns: true }
+	},
+	{
+		{ so_timestamp: 1, so_timestampns: 1 },
+		{ tstampns: true }
+	},
+	{
+		{ so_timestamping: SOF_TIMESTAMPING_RX_SOFTWARE },
+		{}
+	},
+	{
+		/* Loopback device does not support hw timestamps. */
+		{ so_timestamping: SOF_TIMESTAMPING_RX_HARDWARE },
+		{}
+	},
+	{
+		{ so_timestamping: SOF_TIMESTAMPING_SOFTWARE },
+		{}
+	},
+	{
+		{ so_timestamping: SOF_TIMESTAMPING_RX_SOFTWARE
+			| SOF_TIMESTAMPING_RX_HARDWARE },
+		{}
+	},
+	{
+		{ so_timestamping: SOF_TIMESTAMPING_SOFTWARE
+			| SOF_TIMESTAMPING_RX_SOFTWARE },
+		{ swtstamp: true }
+	},
+	{
+		{ so_timestamp: 1, so_timestamping: SOF_TIMESTAMPING_SOFTWARE
+			| SOF_TIMESTAMPING_RX_SOFTWARE },
+		{ tstamp: true, swtstamp: true }
+	},
+};
+
+static struct option long_options[] = {
+	{ "list_tests", no_argument, 0, 'l' },
+	{ "test_num", required_argument, 0, 'n' },
+	{ "op_size", required_argument, 0, 's' },
+	{ "tcp", no_argument, 0, 't' },
+	{ "udp", no_argument, 0, 'u' },
+	{ "ip", no_argument, 0, 'i' },
+	{ NULL, 0, NULL, 0 },
+};
+static int next_port = 19999;
+static int op_size = 10 * 1024;
+
+void print_test_case(struct test_case *t)
+{
+	int f = 0;
+
+	printf("sockopts {");
+	if (t->sockopt.so_timestamp)
+		printf(" SO_TIMESTAMP ");
+	if (t->sockopt.so_timestampns)
+		printf(" SO_TIMESTAMPNS ");
+	if (t->sockopt.so_timestamping) {
+		printf(" SO_TIMESTAMPING: {");
+		for (f = 0; f < ARRAY_SIZE(sof_flags); f++)
+			if (t->sockopt.so_timestamping & sof_flags[f].mask)
+				printf(" %s |", sof_flags[f].name);
+		printf("}");
+	}
+	printf("} expected cmsgs: {");
+	if (t->expected.tstamp)
+		printf(" SCM_TIMESTAMP ");
+	if (t->expected.tstampns)
+		printf(" SCM_TIMESTAMPNS ");
+	if (t->expected.swtstamp || t->expected.hwtstamp) {
+		printf(" SCM_TIMESTAMPING {");
+		if (t->expected.swtstamp)
+			printf("0");
+		if (t->expected.swtstamp && t->expected.hwtstamp)
+			printf(",");
+		if (t->expected.hwtstamp)
+			printf("2");
+		printf("}");
+	}
+	printf("}\n");
+}
+
+void do_send(int src)
+{
+	int r;
+	char *buf = malloc(op_size);
+
+	memset(buf, 'z', op_size);
+	r = write(src, buf, op_size);
+	if (r < 0)
+		error(1, errno, "Failed to sendmsg");
+
+	free(buf);
+}
+
+bool do_recv(int rcv, int read_size, struct tstamps expected)
+{
+	const int CMSG_SIZE = 1024;
+
+	struct scm_timestamping *ts;
+	struct tstamps actual = {};
+	char cmsg_buf[CMSG_SIZE];
+	struct iovec recv_iov;
+	struct cmsghdr *cmsg;
+	bool failed = false;
+	struct msghdr hdr;
+	int flags = 0;
+	int r;
+
+	memset(&hdr, 0, sizeof(hdr));
+	hdr.msg_iov = &recv_iov;
+	hdr.msg_iovlen = 1;
+	recv_iov.iov_base = malloc(read_size);
+	recv_iov.iov_len = read_size;
+
+	hdr.msg_control = cmsg_buf;
+	hdr.msg_controllen = sizeof(cmsg_buf);
+
+	r = recvmsg(rcv, &hdr, flags);
+	if (r < 0)
+		error(1, errno, "Failed to recvmsg");
+	if (r != read_size)
+		error(1, 0, "Only received %d bytes of payload.", r);
+
+	if (hdr.msg_flags & (MSG_TRUNC | MSG_CTRUNC))
+		error(1, 0, "Message was truncated.");
+
+	for (cmsg = CMSG_FIRSTHDR(&hdr); cmsg != NULL;
+	     cmsg = CMSG_NXTHDR(&hdr, cmsg)) {
+		if (cmsg->cmsg_level != SOL_SOCKET)
+			error(1, 0, "Unexpected cmsg_level %d",
+			      cmsg->cmsg_level);
+		switch (cmsg->cmsg_type) {
+		case SCM_TIMESTAMP:
+			actual.tstamp = true;
+			break;
+		case SCM_TIMESTAMPNS:
+			actual.tstampns = true;
+			break;
+		case SCM_TIMESTAMPING:
+			ts = (struct scm_timestamping *)CMSG_DATA(cmsg);
+			actual.swtstamp = !!ts->ts[0].tv_sec;
+			if (ts->ts[1].tv_sec != 0)
+				error(0, 0, "ts[1] should not be set.");
+			actual.hwtstamp = !!ts->ts[2].tv_sec;
+			break;
+		default:
+			error(1, 0, "Unexpected cmsg_type %d", cmsg->cmsg_type);
+		}
+	}
+
+#define VALIDATE(field) \
+	do { \
+		if (expected.field != actual.field) { \
+			if (expected.field) \
+				error(0, 0, "Expected " #field " to be set."); \
+			else \
+				error(0, 0, \
+				      "Expected " #field " to not be set."); \
+			failed = true; \
+		} \
+	} while (0)
+
+	VALIDATE(tstamp);
+	VALIDATE(tstampns);
+	VALIDATE(swtstamp);
+	VALIDATE(hwtstamp);
+#undef VALIDATE
+
+	free(recv_iov.iov_base);
+
+	return failed;
+}
+
+void config_so_flags(int rcv, struct options o)
+{
+	int on = 1;
+
+	if (setsockopt(rcv, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
+		error(1, errno, "Failed to enable SO_REUSEADDR");
+
+	if (o.so_timestamp &&
+	    setsockopt(rcv, SOL_SOCKET, SO_TIMESTAMP,
+		       &o.so_timestamp, sizeof(o.so_timestamp)) < 0)
+		error(1, errno, "Failed to enable SO_TIMESTAMP");
+
+	if (o.so_timestampns &&
+	    setsockopt(rcv, SOL_SOCKET, SO_TIMESTAMPNS,
+		       &o.so_timestampns, sizeof(o.so_timestampns)) < 0)
+		error(1, errno, "Failed to enable SO_TIMESTAMPNS");
+
+	if (o.so_timestamping &&
+	    setsockopt(rcv, SOL_SOCKET, SO_TIMESTAMPING,
+		       &o.so_timestamping, sizeof(o.so_timestamping)) < 0)
+		error(1, errno, "Failed to set SO_TIMESTAMPING");
+}
+
+bool run_test_case(struct socket_type s, struct test_case t)
+{
+	int port = (s.type == SOCK_RAW) ? 0 : next_port++;
+	int read_size = op_size;
+	struct sockaddr_in addr;
+	bool failed = false;
+	int src, dst, rcv;
+
+	src = socket(AF_INET, s.type, s.protocol);
+	if (src < 0)
+		error(1, errno, "Failed to open src socket");
+
+	dst = socket(AF_INET, s.type, s.protocol);
+	if (dst < 0)
+		error(1, errno, "Failed to open dst socket");
+
+	memset(&addr, 0, sizeof(addr));
+	addr.sin_family = AF_INET;
+	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
+	addr.sin_port = htons(port);
+
+	if (bind(dst, (struct sockaddr *)&addr, sizeof(addr)) < 0)
+		error(1, errno, "Failed to bind to port %d", port);
+
+	if (s.type == SOCK_STREAM && (listen(dst, 1) < 0))
+		error(1, errno, "Failed to listen");
+
+	if (connect(src, (struct sockaddr *)&addr, sizeof(addr)) < 0)
+		error(1, errno, "Failed to connect");
+
+	if (s.type == SOCK_STREAM) {
+		rcv = accept(dst, NULL, NULL);
+		if (rcv < 0)
+			error(1, errno, "Failed to accept");
+		close(dst);
+	} else {
+		rcv = dst;
+	}
+
+	config_so_flags(rcv, t.sockopt);
+	usleep(20000); /* setsockopt for SO_TIMESTAMPING is asynchronous */
+	do_send(src);
+
+	if (s.type == SOCK_RAW)
+		read_size += 20;  /* for IP header */
+	failed = do_recv(rcv, read_size, t.expected);
+
+	close(rcv);
+	close(src);
+
+	return failed;
+}
+
+int main(int argc, char **argv)
+{
+	bool all_protocols = true;
+	bool all_tests = true;
+	int arg_index = 0;
+	int failures = 0;
+	int s, t;
+	int opt;
+
+	while ((opt = getopt_long(argc, argv, "", long_options,
+				  &arg_index)) != -1) {
+		switch (opt) {
+		case 'l':
+			for (t = 0; t < ARRAY_SIZE(test_cases); t++) {
+				printf("%d\t", t);
+				print_test_case(&test_cases[t]);
+			}
+			return 0;
+		case 'n':
+			t = atoi(optarg);
+			if (t < 0 || t >= ARRAY_SIZE(test_cases))
+				error(1, 0, "Invalid test case: %d", t);
+			all_tests = false;
+			test_cases[t].enabled = true;
+			break;
+		case 's':
+			op_size = atoi(optarg);
+			break;
+		case 't':
+			all_protocols = false;
+			socket_types[2].enabled = true;
+			break;
+		case 'u':
+			all_protocols = false;
+			socket_types[1].enabled = true;
+			break;
+		case 'i':
+			all_protocols = false;
+			socket_types[0].enabled = true;
+			break;
+		default:
+			error(1, 0, "Failed to parse parameters.");
+		}
+	}
+
+	for (s = 0; s < ARRAY_SIZE(socket_types); s++) {
+		if (!all_protocols && !socket_types[s].enabled)
+			continue;
+
+		printf("Testing %s...\n", socket_types[s].friendly_name);
+		for (t = 0; t < ARRAY_SIZE(test_cases); t++) {
+			if (!all_tests && !test_cases[t].enabled)
+				continue;
+
+			printf("Starting testcase %d...\n", t);
+			if (run_test_case(socket_types[s], test_cases[t])) {
+				failures++;
+				printf("FAILURE in test case ");
+				print_test_case(&test_cases[t]);
+			}
+		}
+	}
+	if (!failures)
+		printf("PASSED.\n");
+	return failures;
+}
-- 
2.14.1.480.gb18f417b89-goog
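The expectations in the test_cases table follow a small rule set: SO_TIMESTAMP/SO_TIMESTAMPNS produce their own cmsg, and SCM_TIMESTAMPING carries a software stamp only when a stamp is generated and SOF_TIMESTAMPING_SOFTWARE asks for it to be reported. A sketch that reproduces the table's rows, assuming a loopback device and the option order used by config_so_flags() (SO_TIMESTAMP set before SO_TIMESTAMPNS); toy_expected() and the TS_* copies of the SOF_TIMESTAMPING_* bits are hypothetical helpers, not part of the selftest.

```c
#include <stdbool.h>

/* Local copies of the SOF_TIMESTAMPING_* bits used by the test table;
 * values as defined in <linux/net_tstamp.h>. */
#define TS_RX_HARDWARE (1 << 2)
#define TS_RX_SOFTWARE (1 << 3)
#define TS_SOFTWARE    (1 << 4)

struct toy_opts {
	bool so_timestamp;
	bool so_timestampns;
	int so_timestamping;
};

struct toy_expect {
	bool tstamp;
	bool tstampns;
	bool swtstamp;
	bool hwtstamp;
};

static struct toy_expect toy_expected(struct toy_opts o)
{
	struct toy_expect e = { false, false, false, false };
	/* Any of these options makes the stack record a software RX stamp. */
	bool generated = o.so_timestamp || o.so_timestampns ||
			 (o.so_timestamping & TS_RX_SOFTWARE);

	/* SO_TIMESTAMPNS is set after SO_TIMESTAMP, so it wins. */
	e.tstampns = o.so_timestampns;
	e.tstamp = o.so_timestamp && !o.so_timestampns;
	/* ts[0] of SCM_TIMESTAMPING also needs the SOFTWARE reporting bit. */
	e.swtstamp = generated && (o.so_timestamping & TS_SOFTWARE);
	/* Loopback has no hardware clock, so ts[2] is never set. */
	e.hwtstamp = false;
	return e;
}
```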

* [PATCH v2 net-next 1/2] tcp: Extend SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg
From: Mike Maloney @ 2017-08-22 21:08 UTC
  To: netdev, davem; +Cc: willemdebruijn.kernel, soheil, Mike Maloney
In-Reply-To: <20170822210849.23162-1-maloneykernel@gmail.com>

From: Mike Maloney <maloney@google.com>

When SOF_TIMESTAMPING_RX_SOFTWARE is enabled for tcp sockets, return the
timestamp corresponding to the highest sequence number data returned.

Previously, skb->tstamp was overwritten when a TCP packet was placed
in the out-of-order queue.  While the packet is in the ooo queue, save the
timestamp in the TCP_SKB_CB.  This space is shared with the gso_*
options, which are only used on the tx path, and a previously unused 4
byte hole.

When skbs are coalesced in either the sk_receive_queue or the
out_of_order_queue, always choose the timestamp of the appended skb to
maintain the invariant of returning the timestamp of the last byte in
the recvmsg buffer.
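A simplified model of the two rules described above: coalescing keeps the appended skb's timestamp, and a read reports the timestamp of the last skb it consumed completely (mirroring the `used + offset < skb->len` check in the tcp_recvmsg() hunk of this patch). struct toy_skb and both helpers are hypothetical, and timestamps are plain integers.

```c
#include <stddef.h>

/* Hypothetical chunk covering [seq, seq + len) with its arrival time. */
struct toy_skb {
	unsigned int seq;
	unsigned int len;
	long tstamp;
};

/* Append 'from' to 'to', keeping the appended chunk's timestamp so the
 * merged buffer still reports the arrival time of its last byte. */
static void toy_coalesce(struct toy_skb *to, const struct toy_skb *from)
{
	to->len += from->len;
	to->tstamp = from->tstamp;
}

/* Read 'want' bytes from an in-order queue and return the timestamp of
 * the last chunk consumed completely, or 0 if none was. */
static long toy_recv_tstamp(const struct toy_skb *q, size_t n,
			    unsigned int want)
{
	long ts = 0;
	size_t i;

	for (i = 0; i < n && want > 0; i++) {
		if (want < q[i].len)
			break; /* partially read: its timestamp is not reported */
		want -= q[i].len;
		ts = q[i].tstamp;
	}
	return ts;
}
```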

Signed-off-by: Mike Maloney <maloney@google.com>
---
 include/net/tcp.h    |  9 +++++++-
 net/ipv4/tcp.c       | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp_input.c | 35 ++++++++++++++++++++++++----
 net/ipv4/tcp_ipv4.c  |  2 ++
 net/ipv6/tcp_ipv6.c  |  2 ++
 5 files changed, 108 insertions(+), 5 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index afdab3781425..f26d20e9760d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -774,6 +774,12 @@ struct tcp_skb_cb {
 			u16	tcp_gso_segs;
 			u16	tcp_gso_size;
 		};
+
+		/* Used to stash the receive timestamp while this skb is in the
+		 * out of order queue, as skb->tstamp is overwritten by the
+		 * rbnode.
+		 */
+		ktime_t		swtstamp;
 	};
 	__u8		tcp_flags;	/* TCP header flags. (tcp[13])	*/
 
@@ -790,7 +796,8 @@ struct tcp_skb_cb {
 	__u8		ip_dsfield;	/* IPv4 tos or IPv6 dsfield	*/
 	__u8		txstamp_ack:1,	/* Record TX timestamp for ack? */
 			eor:1,		/* Is skb MSG_EOR marked? */
-			unused:6;
+			has_rxtstamp:1,	/* SKB has a RX timestamp	*/
+			unused:5;
 	__u32		ack_seq;	/* Sequence number ACK'd	*/
 	union {
 		struct {
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index d25e3bcca66b..0cce4472b4a1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -269,6 +269,7 @@
 #include <linux/err.h>
 #include <linux/time.h>
 #include <linux/slab.h>
+#include <linux/errqueue.h>
 
 #include <net/icmp.h>
 #include <net/inet_common.h>
@@ -1695,6 +1696,61 @@ int tcp_peek_len(struct socket *sock)
 }
 EXPORT_SYMBOL(tcp_peek_len);
 
+static void tcp_update_recv_tstamps(struct sk_buff *skb,
+				    struct scm_timestamping *tss)
+{
+	if (skb->tstamp)
+		tss->ts[0] = ktime_to_timespec(skb->tstamp);
+	else
+		tss->ts[0] = (struct timespec) {0};
+
+	if (skb_hwtstamps(skb)->hwtstamp)
+		tss->ts[2] = ktime_to_timespec(skb_hwtstamps(skb)->hwtstamp);
+	else
+		tss->ts[2] = (struct timespec) {0};
+}
+
+/* Similar to __sock_recv_timestamp, but does not require an skb */
+void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk,
+			struct scm_timestamping *tss)
+{
+	struct timeval tv;
+	bool has_timestamping = false;
+
+	if (tss->ts[0].tv_sec || tss->ts[0].tv_nsec) {
+		if (sock_flag(sk, SOCK_RCVTSTAMP)) {
+			if (sock_flag(sk, SOCK_RCVTSTAMPNS)) {
+				put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPNS,
+					 sizeof(tss->ts[0]), &tss->ts[0]);
+			} else {
+				tv.tv_sec = tss->ts[0].tv_sec;
+				tv.tv_usec = tss->ts[0].tv_nsec / 1000;
+
+				put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP,
+					 sizeof(tv), &tv);
+			}
+		}
+
+		if (sk->sk_tsflags & SOF_TIMESTAMPING_SOFTWARE)
+			has_timestamping = true;
+		else
+			tss->ts[0] = (struct timespec) {0};
+	}
+
+	if (tss->ts[2].tv_sec || tss->ts[2].tv_nsec) {
+		if (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)
+			has_timestamping = true;
+		else
+			tss->ts[2] = (struct timespec) {0};
+	}
+
+	if (has_timestamping) {
+		tss->ts[1] = (struct timespec) {0};
+		put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING,
+			 sizeof(*tss), tss);
+	}
+}
+
 /*
  *	This routine copies from a sock struct into the user buffer.
  *
@@ -1716,6 +1772,8 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 	long timeo;
 	struct sk_buff *skb, *last;
 	u32 urg_hole = 0;
+	struct scm_timestamping tss;
+	bool has_tss = false;
 
 	if (unlikely(flags & MSG_ERRQUEUE))
 		return inet_recv_error(sk, msg, len, addr_len);
@@ -1911,6 +1969,10 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 		if (used + offset < skb->len)
 			continue;
 
+		if (TCP_SKB_CB(skb)->has_rxtstamp) {
+			tcp_update_recv_tstamps(skb, &tss);
+			has_tss = true;
+		}
 		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN)
 			goto found_fin_ok;
 		if (!(flags & MSG_PEEK))
@@ -1929,6 +1991,9 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 	 * on connected socket. I was just happy when found this 8) --ANK
 	 */
 
+	if (has_tss)
+		tcp_recv_timestamp(msg, sk, &tss);
+
 	/* Clean up data we have read: This will do ACK frames. */
 	tcp_cleanup_rbuf(sk, copied);
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ddc854728a60..66abcbf6f381 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4246,9 +4246,15 @@ static void tcp_sack_remove(struct tcp_sock *tp)
 	tp->rx_opt.num_sacks = num_sacks;
 }
 
+enum tcp_queue {
+	OOO_QUEUE,
+	RCV_QUEUE,
+};
+
 /**
  * tcp_try_coalesce - try to merge skb to prior one
  * @sk: socket
+ * @dest: destination queue
  * @to: prior buffer
  * @from: buffer to add in queue
  * @fragstolen: pointer to boolean
@@ -4260,6 +4266,7 @@ static void tcp_sack_remove(struct tcp_sock *tp)
  * Returns true if caller should free @from instead of queueing it
  */
 static bool tcp_try_coalesce(struct sock *sk,
+			     enum tcp_queue dest,
 			     struct sk_buff *to,
 			     struct sk_buff *from,
 			     bool *fragstolen)
@@ -4281,6 +4288,15 @@ static bool tcp_try_coalesce(struct sock *sk,
 	TCP_SKB_CB(to)->end_seq = TCP_SKB_CB(from)->end_seq;
 	TCP_SKB_CB(to)->ack_seq = TCP_SKB_CB(from)->ack_seq;
 	TCP_SKB_CB(to)->tcp_flags |= TCP_SKB_CB(from)->tcp_flags;
+
+	if (TCP_SKB_CB(from)->has_rxtstamp) {
+		TCP_SKB_CB(to)->has_rxtstamp = true;
+		if (dest == OOO_QUEUE)
+			TCP_SKB_CB(to)->swtstamp = TCP_SKB_CB(from)->swtstamp;
+		else
+			to->tstamp = from->tstamp;
+	}
+
 	return true;
 }
 
@@ -4315,6 +4331,9 @@ static void tcp_ofo_queue(struct sock *sk)
 		}
 		p = rb_next(p);
 		rb_erase(&skb->rbnode, &tp->out_of_order_queue);
+		/* Replace tstamp which was stomped by rbnode */
+		if (TCP_SKB_CB(skb)->has_rxtstamp)
+			skb->tstamp = TCP_SKB_CB(skb)->swtstamp;
 
 		if (unlikely(!after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt))) {
 			SOCK_DEBUG(sk, "ofo packet was already received\n");
@@ -4326,7 +4345,8 @@ static void tcp_ofo_queue(struct sock *sk)
 			   TCP_SKB_CB(skb)->end_seq);
 
 		tail = skb_peek_tail(&sk->sk_receive_queue);
-		eaten = tail && tcp_try_coalesce(sk, tail, skb, &fragstolen);
+		eaten = tail && tcp_try_coalesce(sk, RCV_QUEUE,
+						 tail, skb, &fragstolen);
 		tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq);
 		fin = TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN;
 		if (!eaten)
@@ -4380,6 +4400,10 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 		return;
 	}
 
+	/* Stash tstamp to avoid being stomped on by rbnode */
+	if (TCP_SKB_CB(skb)->has_rxtstamp)
+		TCP_SKB_CB(skb)->swtstamp = skb->tstamp;
+
 	inet_csk_schedule_ack(sk);
 
 	NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPOFOQUEUE);
@@ -4405,7 +4429,8 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 	/* In the typical case, we are adding an skb to the end of the list.
 	 * Use of ooo_last_skb avoids the O(Log(N)) rbtree lookup.
 	 */
-	if (tcp_try_coalesce(sk, tp->ooo_last_skb, skb, &fragstolen)) {
+	if (tcp_try_coalesce(sk, OOO_QUEUE, tp->ooo_last_skb,
+			     skb, &fragstolen)) {
 coalesce_done:
 		tcp_grow_window(sk, skb);
 		kfree_skb_partial(skb, fragstolen);
@@ -4455,7 +4480,8 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 				__kfree_skb(skb1);
 				goto merge_right;
 			}
-		} else if (tcp_try_coalesce(sk, skb1, skb, &fragstolen)) {
+		} else if (tcp_try_coalesce(sk, OOO_QUEUE, skb1,
+					    skb, &fragstolen)) {
 			goto coalesce_done;
 		}
 		p = &parent->rb_right;
@@ -4506,7 +4532,8 @@ static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb, int
 
 	__skb_pull(skb, hdrlen);
 	eaten = (tail &&
-		 tcp_try_coalesce(sk, tail, skb, fragstolen)) ? 1 : 0;
+		 tcp_try_coalesce(sk, RCV_QUEUE, tail,
+				  skb, fragstolen)) ? 1 : 0;
 	tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq);
 	if (!eaten) {
 		__skb_queue_tail(&sk->sk_receive_queue, skb);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 5af8b809dfbc..a63486afa7a7 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1637,6 +1637,8 @@ int tcp_v4_rcv(struct sk_buff *skb)
 	TCP_SKB_CB(skb)->tcp_tw_isn = 0;
 	TCP_SKB_CB(skb)->ip_dsfield = ipv4_get_dsfield(iph);
 	TCP_SKB_CB(skb)->sacked	 = 0;
+	TCP_SKB_CB(skb)->has_rxtstamp =
+			skb->tstamp || skb_hwtstamps(skb)->hwtstamp;
 
 lookup:
 	sk = __inet_lookup_skb(&tcp_hashinfo, skb, __tcp_hdrlen(th), th->source,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index d79a1af3252e..abba3bc2a3d9 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1394,6 +1394,8 @@ static void tcp_v6_fill_cb(struct sk_buff *skb, const struct ipv6hdr *hdr,
 	TCP_SKB_CB(skb)->tcp_tw_isn = 0;
 	TCP_SKB_CB(skb)->ip_dsfield = ipv6_get_dsfield(hdr);
 	TCP_SKB_CB(skb)->sacked = 0;
+	TCP_SKB_CB(skb)->has_rxtstamp =
+			skb->tstamp || skb_hwtstamps(skb)->hwtstamp;
 }
 
 static int tcp_v6_rcv(struct sk_buff *skb)
-- 
2.14.1.480.gb18f417b89-goog

^ permalink raw reply related

* [PATCH v2 net-next 0/2] Add software rx timestamp for TCP.
From: Mike Maloney @ 2017-08-22 21:08 UTC (permalink / raw)
  To: netdev, davem; +Cc: willemdebruijn.kernel, soheil, Mike Maloney

From: Mike Maloney <maloney@google.com>

Add software rx timestamps for TCP, and a test to ensure consistency of
behavior between the IP, UDP, and TCP implementations.

Changes since v1:
  -Initialize tss->ts[1] to 0 if caller requested any timestamps.
  -Fix test case to validate that tss->ts[1] is zero.
  -Fix tests to actually use a raw socket.
  -Fix --tcp flag to work on the test.

Mike Maloney (2):
  tcp: Extend SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg
  selftests/net: Add a test to validate behavior of rx timestamps

 include/net/tcp.h                                  |   9 +-
 net/ipv4/tcp.c                                     |  65 ++++
 net/ipv4/tcp_input.c                               |  35 +-
 net/ipv4/tcp_ipv4.c                                |   2 +
 net/ipv6/tcp_ipv6.c                                |   2 +
 .../selftests/networking/timestamping/.gitignore   |   1 +
 .../selftests/networking/timestamping/Makefile     |   4 +-
 .../networking/timestamping/rxtimestamp.c          | 389 +++++++++++++++++++++
 8 files changed, 501 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/networking/timestamping/rxtimestamp.c

-- 
2.14.1.480.gb18f417b89-goog

^ permalink raw reply

* Re: [PATCH net-next,0/4] hv_netvsc: Ethtool handler to change UDP hash levels
From: David Miller @ 2017-08-22 21:08 UTC (permalink / raw)
  To: haiyangz, haiyangz; +Cc: netdev, kys, olaf, vkuznets, linux-kernel
In-Reply-To: <1503368560-14331-1-git-send-email-haiyangz@exchange.microsoft.com>

From: Haiyang Zhang <haiyangz@exchange.microsoft.com>
Date: Mon, 21 Aug 2017 19:22:36 -0700

> From: Haiyang Zhang <haiyangz@microsoft.com>
> 
> This patch set adds functions to switch the UDP hash level between
> L3 and L4 via an ethtool command. UDP over IPv4 and IPv6 can be set
> independently. The default hash level is L4. We currently only
> allow switching the TX hash level from within the guests.
> 
> The ethtool callback function is triggered from the command line and
> updates the per-device hash level variables.
> 
> On Azure, fragmented UDP packets are not yet supported with L4
> hashing and may see a high packet loss rate. Using L3 hashing is
> recommended in this case. This ethtool option allows a user to
> make this selection.
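The difference between the two hash levels can be illustrated with a toy model. This is not the netvsc or Azure hash: the flow_key type and the FNV-1a mixing are hypothetical stand-ins. L3 hashing keys only on the IP addresses, while L4 also mixes in the UDP ports. Since only the first fragment of a fragmented datagram carries the UDP header, the two levels treat fragments differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flow key; not a hv_netvsc structure. */
struct flow_key {
	uint32_t saddr, daddr;   /* IPv4 addresses */
	uint16_t sport, dport;   /* UDP ports, valid only if has_l4 */
	bool has_l4;             /* false for non-first fragments */
};

/* Toy byte-wise FNV-1a step over one 32-bit word. RSS implementations
 * commonly use a Toeplitz hash instead; only the inputs matter here. */
static uint32_t mix32(uint32_t h, uint32_t v)
{
	for (int i = 0; i < 4; i++) {
		h ^= (v >> (8 * i)) & 0xffu;
		h *= 16777619u;
	}
	return h;
}

/* L3: addresses only. L4: addresses plus ports, when a UDP header exists. */
static uint32_t flow_hash(const struct flow_key *k, bool l4)
{
	uint32_t h = 2166136261u;

	h = mix32(h, k->saddr);
	h = mix32(h, k->daddr);
	if (l4 && k->has_l4)
		h = mix32(h, ((uint32_t)k->sport << 16) | k->dport);
	return h;
}
```

With L3 hashing, the first fragment (which carries the ports) and later fragments (which do not) always get the same hash; with L4 hashing they can diverge.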

Series applied, thanks.

^ permalink raw reply

* [PATCH v2] i40e/i40evf: fix out-of-bounds read of cpumask
From: Jacob Keller @ 2017-08-22 21:04 UTC (permalink / raw)
  To: Intel Wired LAN; +Cc: netdev, Jacob Keller, stable

When responding to an affinity hint we directly copied a cpumask value,
instead of using cpumask_copy. According to cpumask.h this is not
correct because cpumask_t is only guaranteed to have enough space for
the number of CPUs in the system, and may not be as big as we expect.
Thus a direct copy results in an out-of-bounds read and potentially
a crash if the pages are aligned just right. This is easily
detected on a kernel with KASAN enabled:

KASAN reports:
[   25.242312] BUG: KASAN: slab-out-of-bounds in i40e_irq_affinity_notify+0x30/0x50 [i40e] at addr ffff880462eea960
[   25.242315] Read of size 1024 by task kworker/2:1/170
[   25.242322] CPU: 2 PID: 170 Comm: kworker/2:1 Not tainted 4.11.0-22.el7a.x86_64 #1
[   25.242325] Hardware name: HP ProLiant DL380 Gen9, BIOS P89 05/06/2015
[   25.242336] Workqueue: events irq_affinity_notify
[   25.242340] Call Trace:
[   25.242350]  dump_stack+0x63/0x8d
[   25.242358]  kasan_object_err+0x21/0x70
[   25.242364]  kasan_report+0x288/0x540
[   25.242397]  ? i40e_irq_affinity_notify+0x30/0x50 [i40e]
[   25.242403]  check_memory_region+0x13c/0x1a0
[   25.242408]  __asan_loadN+0xf/0x20
[   25.242440]  i40e_irq_affinity_notify+0x30/0x50 [i40e]
[   25.242446]  irq_affinity_notify+0x1b4/0x230
[   25.242452]  ? irq_set_affinity_notifier+0x130/0x130
[   25.242457]  ? kasan_slab_free+0x89/0xc0
[   25.242466]  process_one_work+0x32f/0x6f0
[   25.242472]  worker_thread+0x89/0x770
[   25.242481]  ? pci_mmcfg_check_reserved+0xc0/0xc0
[   25.242488]  kthread+0x18c/0x1e0
[   25.242493]  ? process_one_work+0x6f0/0x6f0
[   25.242499]  ? kthread_create_on_node+0xc0/0xc0
[   25.242506]  ret_from_fork+0x2c/0x40
[   25.242511] Object at ffff880462eea960, in cache kmalloc-8 size: 8
[   25.242513] Allocated:
[   25.242514] PID = 170
[   25.242522]  save_stack_trace+0x1b/0x20
[   25.242529]  save_stack+0x46/0xd0
[   25.242533]  kasan_kmalloc+0xad/0xe0
[   25.242537]  __kmalloc_node+0x12c/0x2b0
[   25.242542]  alloc_cpumask_var_node+0x3c/0x60
[   25.242546]  alloc_cpumask_var+0xe/0x10
[   25.242550]  irq_affinity_notify+0x94/0x230
[   25.242555]  process_one_work+0x32f/0x6f0
[   25.242559]  worker_thread+0x89/0x770
[   25.242564]  kthread+0x18c/0x1e0
[   25.242568]  ret_from_fork+0x2c/0x40
[   25.242569] Freed:
[   25.242570] PID = 0
[   25.242572] (stack is not available)
[   25.242573] Memory state around the buggy address:
[   25.242578]  ffff880462eea800: fc fc 00 fc fc 00 fc fc 00 fc fc 00 fc fc fb fc
[   25.242582]  ffff880462eea880: fc fb fc fc fb fc fc 00 fc fc 00 fc fc 00 fc fc
[   25.242586] >ffff880462eea900: 00 fc fc 00 fc fc 00 fc fc fb fc fc 00 fc fc fc
[   25.242588]                                                           ^
[   25.242592]  ffff880462eea980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   25.242596]  ffff880462eeaa00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   25.242597] ==================================================================
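The bug class can be reproduced in miniature in userspace. In this sketch the toy_* names and NR_CPUS_MAX are hypothetical stand-ins for cpumask_t, alloc_cpumask_var() and CONFIG_NR_CPUS: the full mask type is 1024 bytes, matching the "Read of size 1024" above, while the allocation holds only enough longs for nr_cpu_ids bits, like the kmalloc-8 object.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BITS_PER_LONG    (CHAR_BIT * sizeof(unsigned long))
#define TOY_BITS_TO_LONGS(n) (((n) + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)
#define NR_CPUS_MAX          8192  /* hypothetical stand-in for CONFIG_NR_CPUS */

/* Full-size mask type, like cpumask_t: 8192 bits = 1024 bytes. */
struct toy_cpumask {
	unsigned long bits[TOY_BITS_TO_LONGS(NR_CPUS_MAX)];
};

/* Like alloc_cpumask_var() with off-stack masks: only enough longs for
 * the CPUs that actually exist are allocated. */
static struct toy_cpumask *toy_alloc_mask(unsigned int nr_cpu_ids)
{
	return calloc(TOY_BITS_TO_LONGS(nr_cpu_ids), sizeof(unsigned long));
}

/* Like cpumask_copy(): copy only the longs covering nr_cpu_ids bits.
 * A struct assignment (*dst = *src) would instead read the full 1024
 * bytes from the short allocation, i.e. out of bounds. */
static void toy_mask_copy(struct toy_cpumask *dst,
			  const struct toy_cpumask *src,
			  unsigned int nr_cpu_ids)
{
	memcpy(dst->bits, src->bits,
	       TOY_BITS_TO_LONGS(nr_cpu_ids) * sizeof(unsigned long));
}
```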

Fixes: 96db776a3682 ("i40e/i40evf: fix interrupt affinity bug", 2016-09-14)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: stable@vger.kernel.org # 4.10+
---
This updates the commit message of the original fix to indicate that it
fixes a potential crash, tags the commit for stable, and adds a Fixes
tag to indicate which commit this fixes.

 drivers/net/ethernet/intel/i40e/i40e_main.c     | 2 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 397f1bcaed3e..50a7260b32c2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3450,7 +3450,7 @@ static void i40e_irq_affinity_notify(struct irq_affinity_notify *notify,
 	struct i40e_q_vector *q_vector =
 		container_of(notify, struct i40e_q_vector, affinity_notify);
 
-	q_vector->affinity_mask = *mask;
+	cpumask_copy(&q_vector->affinity_mask, mask);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 1ffd55e06a49..87175a14740e 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -520,7 +520,7 @@ static void i40evf_irq_affinity_notify(struct irq_affinity_notify *notify,
 	struct i40e_q_vector *q_vector =
 		container_of(notify, struct i40e_q_vector, affinity_notify);
 
-	q_vector->affinity_mask = *mask;
+	cpumask_copy(&q_vector->affinity_mask, mask);
 }
 
 /**
-- 
2.14.1.323.g792488f9a5e1

^ permalink raw reply related

* [PATCH] e1000: changed some expensive calls of udelay to usleep_range
From: nxf23276 @ 2017-08-22 21:02 UTC (permalink / raw)
  To: jeffrey.t.kirsher
  Cc: michael.kardonik, shannon.nelson, carolyn.wyborny,
	donald.c.skidmore, bruce.w.allan, john.ronciak, mitch.a.williams,
	intel-wired-lan, netdev, linux-kernel, nxf23276

    Calls to udelay are not preemptible by userspace, so userspace
    applications experience a large (~200us) latency when running on core
    0. Instead, usleep_range can be used to be more friendly to userspace
    since it is preemptible. This is because udelay uses busy-wait loops
    while usleep_range uses hrtimers. It is recommended to use udelay
    when the delay is <10us, since at that precision the overhead of the
    usleep_range hrtimer setup causes issues. However, the replaced calls
    are for 50us and 100us, so this should not be an issue.
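The difference between the two delay styles can be approximated in userspace. This is an analogue, not kernel code: busy_delay_us() spins on CLOCK_MONOTONIC the way udelay() busy-waits, while sleep_range_us() blocks on a timer. nanosleep() has no slack argument, so max_us is only documented here; in the kernel, usleep_range() may wake anywhere in [min, max] so that wakeups can be coalesced. Because the sleeping variant blocks, it is only usable where the caller is allowed to sleep (not in atomic context).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* udelay()-like: spin until 'us' microseconds have elapsed, keeping the
 * CPU busy the whole time. */
static void busy_delay_us(unsigned int us)
{
	long long end = now_ns() + us * 1000LL;

	while (now_ns() < end)
		;  /* burn cycles */
}

/* Rough usleep_range(min_us, max_us) analogue: block for at least
 * min_us, letting other tasks run. max_us is unused (no slack here). */
static void sleep_range_us(unsigned int min_us, unsigned int max_us)
{
	struct timespec req = {
		.tv_sec = min_us / 1000000,
		.tv_nsec = (long)(min_us % 1000000) * 1000,
	};

	(void)max_us;
	while (nanosleep(&req, &req) == -1 && errno == EINTR)
		;  /* resume with the remaining time after a signal */
}
```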

Signed-off-by: nxf23276 <matthew.tan_1@nxp.com>
---
 drivers/net/ethernet/intel/e1000e/phy.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/phy.c b/drivers/net/ethernet/intel/e1000e/phy.c
index de13aea..ee6ab53 100644
--- a/drivers/net/ethernet/intel/e1000e/phy.c
+++ b/drivers/net/ethernet/intel/e1000e/phy.c
@@ -158,7 +158,7 @@ s32 e1000e_read_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 *data)
 	 * the lower time out
 	 */
 	for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
-		udelay(50);
+		usleep_range(40, 60);
 		mdic = er32(MDIC);
 		if (mdic & E1000_MDIC_READY)
 			break;
@@ -183,7 +183,7 @@ s32 e1000e_read_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 *data)
 	 * reading duplicate data in the next MDIC transaction.
 	 */
 	if (hw->mac.type == e1000_pch2lan)
-		udelay(100);
+		usleep_range(90, 110);
 
 	return 0;
 }
@@ -222,7 +222,7 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 data)
 	 * the lower time out
 	 */
 	for (i = 0; i < (E1000_GEN_POLL_TIMEOUT * 3); i++) {
-		udelay(50);
+		usleep_range(40, 60);
 		mdic = er32(MDIC);
 		if (mdic & E1000_MDIC_READY)
 			break;
@@ -246,7 +246,7 @@ s32 e1000e_write_phy_reg_mdic(struct e1000_hw *hw, u32 offset, u16 data)
 	 * reading duplicate data in the next MDIC transaction.
 	 */
 	if (hw->mac.type == e1000_pch2lan)
-		udelay(100);
+		usleep_range(90, 110);
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net] net/hsr: Check skb_put_padto() return value
From: David Miller @ 2017-08-22 20:49 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev, mail, peter.heise, arvid.brodin, linux-kernel
In-Reply-To: <20170821195910.28752-1-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Mon, 21 Aug 2017 12:59:10 -0700

> skb_put_padto() will free the sk_buff passed as reference in case of
> errors, but we still need to check its return value and decide what to
> do.
> 
> Detected by CoverityScan, CID#1416688 ("CHECKED_RETURN")
> 
> Fixes: ee1c27977284 ("net/hsr: Added support for HSR v1")
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
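The contract described above can be sketched with a hypothetical userspace buffer type: a padding helper that, like skb_put_padto(), frees the buffer on failure, and a caller that checks the return value and then never touches the buffer again. The struct buf type and both functions are illustrative assumptions, not kernel APIs.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer type standing in for struct sk_buff. */
struct buf {
	size_t len;
	unsigned char *data;
};

/* Like skb_put_padto(): zero-pad b up to 'min' bytes. On failure the
 * buffer is freed and -ENOMEM returned, so the caller must stop using
 * it and must not free it again. */
static int buf_put_padto(struct buf *b, size_t min)
{
	unsigned char *p;

	if (b->len >= min)
		return 0;
	p = realloc(b->data, min);
	if (!p) {
		free(b->data);
		free(b);
		return -ENOMEM;
	}
	memset(p + b->len, 0, min - b->len);
	b->data = p;
	b->len = min;
	return 0;
}

/* Caller pattern: check the return value and bail out without touching
 * 'b' again, since it may already be freed. */
static int send_padded(struct buf *b, size_t min, size_t *sent_len)
{
	if (buf_put_padto(b, min))
		return -1;
	*sent_len = b->len;
	free(b->data);     /* stand-in for handing the buffer off */
	free(b);
	return 0;
}
```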

Applied, thanks.

^ permalink raw reply

* [PATCH net-next 5/5] xdp: get tracepoints xdp_exception and xdp_redirect in sync
From: Jesper Dangaard Brouer @ 2017-08-22 20:47 UTC (permalink / raw)
  To: netdev; +Cc: John Fastabend, Jesper Dangaard Brouer
In-Reply-To: <150343479290.31091.8019008896152616977.stgit@firesoul>

Remove the net_device name string from the xdp_exception tracepoint
and record the ifindex instead, as the xdp_redirect tracepoint does.

Align the TP_STRUCT entries so that the two tracepoints share common
leading fields.
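What the common entries buy can be shown with plain structs mirroring the new TP_STRUCT__entry() field order. The struct names are hypothetical, and the trace_entry header that the tracing core places in front of every record is left out, since it is identical for both events. prog_tag and act now sit at the same offsets in both records, so a consumer can decode them with one routine; before the patch, xdp_exception began with a __string() slot and xdp_redirect with two ints, so these offsets differed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field order from the patched TP_STRUCT__entry() blocks. */
struct xdp_exception_rec {
	uint8_t  prog_tag[8];
	uint32_t act;
	int      ifindex;
};

struct xdp_redirect_rec {
	uint8_t  prog_tag[8];
	uint32_t act;
	int      from_index;
	int      to_index;
	int      err;
};

/* Shared prefix that a consumer can decode for either event. */
struct xdp_common_rec {
	uint8_t  prog_tag[8];
	uint32_t act;
};

_Static_assert(offsetof(struct xdp_exception_rec, act) ==
	       offsetof(struct xdp_common_rec, act), "act offset differs");
_Static_assert(offsetof(struct xdp_redirect_rec, act) ==
	       offsetof(struct xdp_common_rec, act), "act offset differs");

/* Read the XDP action from either record type via the common prefix. */
static uint32_t rec_action(const void *rec)
{
	struct xdp_common_rec c;

	memcpy(&c, rec, sizeof(c));
	return c.act;
}
```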

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/trace/events/xdp.h |   24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/trace/events/xdp.h b/include/trace/events/xdp.h
index 7511bed80558..6495b0d9d5c7 100644
--- a/include/trace/events/xdp.h
+++ b/include/trace/events/xdp.h
@@ -31,22 +31,22 @@ TRACE_EVENT(xdp_exception,
 	TP_ARGS(dev, xdp, act),
 
 	TP_STRUCT__entry(
-		__string(name, dev->name)
 		__array(u8, prog_tag, 8)
 		__field(u32, act)
+		__field(int, ifindex)
 	),
 
 	TP_fast_assign(
 		BUILD_BUG_ON(sizeof(__entry->prog_tag) != sizeof(xdp->tag));
 		memcpy(__entry->prog_tag, xdp->tag, sizeof(xdp->tag));
-		__assign_str(name, dev->name);
-		__entry->act = act;
+		__entry->act		= act;
+		__entry->ifindex	= dev->ifindex;
 	),
 
-	TP_printk("prog=%s device=%s action=%s",
+	TP_printk("prog=%s action=%s ifindex=%d",
 		  __print_hex_str(__entry->prog_tag, 8),
-		  __get_str(name),
-		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB))
+		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
+		  __entry->ifindex)
 );
 
 TRACE_EVENT(xdp_redirect,
@@ -57,26 +57,26 @@ TRACE_EVENT(xdp_redirect,
 	TP_ARGS(from_index, to_index, xdp, act, err),
 
 	TP_STRUCT__entry(
-		__field(int, from_index)
-		__field(int, to_index)
 		__array(u8, prog_tag, 8)
 		__field(u32, act)
+		__field(int, from_index)
+		__field(int, to_index)
 		__field(int, err)
 	),
 
 	TP_fast_assign(
 		BUILD_BUG_ON(sizeof(__entry->prog_tag) != sizeof(xdp->tag));
 		memcpy(__entry->prog_tag, xdp->tag, sizeof(xdp->tag));
+		__entry->act		= act;
 		__entry->from_index	= from_index;
 		__entry->to_index	= to_index;
-		__entry->act = act;
-		__entry->err = err;
+		__entry->err		= err;
 	),
 
-	TP_printk("prog=%s from=%d to=%d action=%s err=%d",
+	TP_printk("prog=%s action=%s from=%d to=%d err=%d",
 		  __print_hex_str(__entry->prog_tag, 8),
-		  __entry->from_index, __entry->to_index,
 		  __print_symbolic(__entry->act, __XDP_ACT_SYM_TAB),
+		  __entry->from_index, __entry->to_index,
 		  __entry->err)
 );
 #endif /* _TRACE_XDP_H */

^ permalink raw reply related

* RE: [patch net-next] net/sched: Fix the logic error to decide the ingress qdisc
From: Chopra, Manish @ 2017-08-22 20:47 UTC (permalink / raw)
  To: Chris Mi, netdev@vger.kernel.org; +Cc: davem@davemloft.net, jiri@resnulli.us
In-Reply-To: <1503055460-36795-1-git-send-email-chrism@mellanox.com>

-----Original Message-----
From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org] On Behalf Of Chris Mi
Sent: Friday, August 18, 2017 4:54 PM
To: netdev@vger.kernel.org
Cc: davem@davemloft.net; jiri@resnulli.us
Subject: [patch net-next] net/sched: Fix the logic error to decide the ingress qdisc

The offending commit used a newly added helper function, but inverted the logic.
Without this fix, the affected NICs can't do HW offload; -EOPNOTSUPP is returned directly.
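The intended check can be sketched as a standalone function. The classid constants are copied from <linux/pkt_sched.h> under TOY_ names so the sketch compiles on its own, and toy_is_clsact_ingress() is an assumed re-implementation of is_classid_clsact_ingress(), not the kernel source.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as in <linux/pkt_sched.h>, renamed to keep the sketch standalone. */
#define TOY_TC_H_MAJ(h)      ((h) & 0xFFFF0000U)
#define TOY_TC_H_MIN(h)      ((h) & 0x0000FFFFU)
#define TOY_TC_H_CLSACT      0xFFFFFFF1U
#define TOY_TC_H_MIN_EGRESS  0xFFF3U

/* Assumed equivalent of is_classid_clsact_ingress(): major ffff: and not
 * the clsact egress minor, so the plain ingress qdisc matches too. */
static bool toy_is_clsact_ingress(uint32_t classid)
{
	return TOY_TC_H_MAJ(classid) == TOY_TC_H_MAJ(TOY_TC_H_CLSACT) &&
	       TOY_TC_H_MIN(classid) != TOY_TC_H_MIN(TOY_TC_H_MIN_EGRESS);
}

/* Corrected driver-side check: refuse anything that is NOT ingress or
 * that uses a non-zero chain. The broken version lacked the '!'. */
static int toy_setup_tc_cls_u32(uint32_t classid, uint32_t chain_index)
{
	if (!toy_is_clsact_ingress(classid) || chain_index)
		return -EOPNOTSUPP;
	return 0;   /* a real driver would program the HW filter here */
}
```

With this check, parent ffff: (the ingress qdisc) and ffff:fff2 (clsact ingress) are accepted for offload, while clsact egress (ffff:fff3), other parents, and non-zero chains get -EOPNOTSUPP.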

Fixes: a2e8da9378cc ("net/sched: use newly added classid identity helpers")
Signed-off-by: Chris Mi <chrism@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c     | 2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c       | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c   | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c    | 2 +-
 drivers/net/ethernet/netronome/nfp/bpf/main.c       | 2 +-
 drivers/net/ethernet/netronome/nfp/flower/offload.c | 2 +-
 6 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 77538cd..e55a929 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -2892,7 +2892,7 @@ static int cxgb_set_tx_maxrate(struct net_device *dev, int index, u32 rate)
 static int cxgb_setup_tc_cls_u32(struct net_device *dev,
 				 struct tc_cls_u32_offload *cls_u32)
 {
-	if (is_classid_clsact_ingress(cls_u32->common.classid) ||
+	if (!is_classid_clsact_ingress(cls_u32->common.classid) ||
 	    cls_u32->common.chain_index)
 		return -EOPNOTSUPP;
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index f9fd8d8..56d7ef0 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9230,7 +9230,7 @@ static int ixgbe_setup_tc_cls_u32(struct net_device *dev,
 {
 	struct ixgbe_adapter *adapter = netdev_priv(dev);
 
-	if (is_classid_clsact_ingress(cls_u32->common.classid) ||
+	if (!is_classid_clsact_ingress(cls_u32->common.classid) ||
 	    cls_u32->common.chain_index)
 		return -EOPNOTSUPP;

Hi Jiri/Chris,

I was looking at ixgbe and observed that it uses the "is_tcf_mirred_egress_redirect()" API, which works for the filter command below:

     # add u32 filter with action to redirect to macvlan netdev
     tc filter add dev p4p1 parent ffff: protocol ip prio 99 \
        handle 800:0:2 u32 ht 800: \
        match ip src 192.168.2.3/32 \
        action mirred egress redirect dev mvlan_1

Assuming that the hardware actually redirects the packets to the "ingress" direction of the macvlan interface [that is, it delivers the packets to the driver in the receive flow],
isn't the use of the above API incorrect or misleading? Shouldn't it be something like "is_tcf_mirred_ingress_redirect()", with part of the command replaced by the following?

action mirred ingress redirect dev mvlan_1

Thanks,
Manish

^ permalink raw reply related

