Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH] net: dsa: sja1105: sja1105_main: Add of_node_put()
From: Nishka Dasgupta @ 2019-07-23 10:44 UTC (permalink / raw)
  To: olteanv, andrew, vivien.didelot, f.fainelli, davem, netdev,
	linux-kernel
  Cc: Nishka Dasgupta

Each iteration of for_each_child_of_node puts the previous node, but in
the case of a return from the middle of the loop, there is no put, thus
causing a memory leak. Hence add an of_node_put before the return.
Issue found with Coccinelle.

Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
---
 drivers/net/dsa/sja1105/sja1105_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index 32bf3a7cc3b6..6ed5f1e35789 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -625,6 +625,7 @@ static int sja1105_parse_ports_node(struct sja1105_private *priv,
 		if (of_property_read_u32(child, "reg", &index) < 0) {
 			dev_err(dev, "Port number not defined in device tree "
 				"(property \"reg\")\n");
+			of_node_put(child);
 			return -ENODEV;
 		}
 
@@ -634,6 +635,7 @@ static int sja1105_parse_ports_node(struct sja1105_private *priv,
 			dev_err(dev, "Failed to read phy-mode or "
 				"phy-interface-type property for port %d\n",
 				index);
+			of_node_put(child);
 			return -ENODEV;
 		}
 		ports[index].phy_mode = phy_mode;
@@ -643,6 +645,7 @@ static int sja1105_parse_ports_node(struct sja1105_private *priv,
 			if (!of_phy_is_fixed_link(child)) {
 				dev_err(dev, "phy-handle or fixed-link "
 					"properties missing!\n");
+				of_node_put(child);
 				return -ENODEV;
 			}
 			/* phy-handle is missing, but fixed-link isn't.
-- 
2.19.1


^ permalink raw reply related

* [PATCH] net: dsa: mv88e6xxx: chip: Add of_node_put() before return
From: Nishka Dasgupta @ 2019-07-23 10:43 UTC (permalink / raw)
  To: andrew, vivien.didelot, f.fainelli, davem, netdev; +Cc: Nishka Dasgupta

Each iteration of for_each_available_child_of_node puts the previous
node, but in the case of a return from the middle of the loop, there is
no put, thus causing a memory leak. Hence add an of_node_put before the
return.
Issue found with Coccinelle.

Signed-off-by: Nishka Dasgupta <nishkadg.linux@gmail.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 6b17cd961d06..c97dea4599a8 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2721,6 +2721,7 @@ static int mv88e6xxx_mdios_register(struct mv88e6xxx_chip *chip,
 			err = mv88e6xxx_mdio_register(chip, child, true);
 			if (err) {
 				mv88e6xxx_mdios_unregister(chip);
+				of_node_put(child);
 				return err;
 			}
 		}
-- 
2.19.1


^ permalink raw reply related

* Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jon Hunter @ 2019-07-23 10:38 UTC (permalink / raw)
  To: Jose Abreu, Lars Persson, Ilias Apalodimas
  Cc: Joao Pinto, Alexandre Torgue, Maxime Ripard,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com, Chen-Yu Tsai,
	Maxime Coquelin, linux-tegra, Giuseppe Cavallaro,
	David S . Miller, linux-arm-kernel@lists.infradead.org
In-Reply-To: <BYAPR12MB32692AF2BA127C5DA5B74804D3C70@BYAPR12MB3269.namprd12.prod.outlook.com>


On 23/07/2019 11:07, Jose Abreu wrote:
> From: Jon Hunter <jonathanh@nvidia.com>
> Date: Jul/23/2019, 11:01:24 (UTC+00:00)
> 
>> This appears to be a winner and by disabling the SMMU for the ethernet
>> controller and reverting commit 954a03be033c7cef80ddc232e7cbdb17df735663
>> this worked! So yes appears to be related to the SMMU being enabled. We
>> had to enable the SMMU for ethernet recently due to commit
>> 954a03be033c7cef80ddc232e7cbdb17df735663.
> 
> Finally :)
> 
> However, from "git show 954a03be033c7cef80ddc232e7cbdb17df735663":
> 
> +         There are few reasons to allow unmatched stream bypass, and
> +         even fewer good ones.  If saying YES here breaks your board
> +         you should work on fixing your board.
> 
> So, how can we fix this ? Is your ethernet DT node marked as 
> "dma-coherent;" ?

TBH I have no idea. I can't say I fully understand your change or how it
is breaking things for us.

Currently, the Tegra DT binding does not have 'dma-coherent' set. I see
this is optional, but I am not sure how you determine whether or not
this should be set.

Jon

-- 
nvpublic

^ permalink raw reply

* Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Robin Murphy @ 2019-07-23 10:29 UTC (permalink / raw)
  To: Jose Abreu, Jon Hunter, Lars Persson, Ilias Apalodimas
  Cc: Joao Pinto, Alexandre Torgue, Maxime Ripard,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com, Chen-Yu Tsai,
	Maxime Coquelin, linux-tegra, Giuseppe Cavallaro,
	David S . Miller, linux-arm-kernel@lists.infradead.org
In-Reply-To: <BYAPR12MB32692AF2BA127C5DA5B74804D3C70@BYAPR12MB3269.namprd12.prod.outlook.com>

On 23/07/2019 11:07, Jose Abreu wrote:
> From: Jon Hunter <jonathanh@nvidia.com>
> Date: Jul/23/2019, 11:01:24 (UTC+00:00)
> 
>> This appears to be a winner and by disabling the SMMU for the ethernet
>> controller and reverting commit 954a03be033c7cef80ddc232e7cbdb17df735663
>> this worked! So yes appears to be related to the SMMU being enabled. We
>> had to enable the SMMU for ethernet recently due to commit
>> 954a03be033c7cef80ddc232e7cbdb17df735663.
> 
> Finally :)
> 
> However, from "git show 954a03be033c7cef80ddc232e7cbdb17df735663":
> 
> +         There are few reasons to allow unmatched stream bypass, and
> +         even fewer good ones.  If saying YES here breaks your board
> +         you should work on fixing your board.
> 
> So, how can we fix this ? Is your ethernet DT node marked as
> "dma-coherent;" ?

The first thing to try would be booting the failing setup with 
"iommu.passthrough=1" (or using CONFIG_IOMMU_DEFAULT_PASSTHROUGH) - if 
that makes things seem OK, then the problem is likely related to address 
translation; if not, then it's probably time to start looking at nasties 
like coherency and ordering, although in principle I wouldn't expect the 
SMMU to have too much impact there.

Do you know if the SMMU interrupts are working correctly? If not, it's 
possible that an incorrect address or mapping direction could lead to 
the DMA transaction just being silently terminated without any fault 
indication, which generally presents as inexplicable weirdness (I've 
certainly seen that on another platform with the mix of an unsupported 
interrupt controller and an 'imperfect' ethernet driver).

Just to confirm, has the original patch been tested with 
CONFIG_DMA_API_DEBUG to rule out any high-level mishaps?

Robin.

^ permalink raw reply

* Re: [bpf-next 0/6] Introduce a BPF helper to generate SYN cookies
From: Lorenz Bauer @ 2019-07-23 10:27 UTC (permalink / raw)
  To: Petar Penkov
  Cc: Networking, bpf, davem, Alexei Starovoitov, Daniel Borkmann,
	Eric Dumazet, Stanislav Fomichev, Petar Penkov
In-Reply-To: <20190723002042.105927-1-ppenkov.kernel@gmail.com>

On Tue, 23 Jul 2019 at 01:20, Petar Penkov <ppenkov.kernel@gmail.com> wrote:
>
> From: Petar Penkov <ppenkov@google.com>
>
> This patch series introduces a BPF helper function that allows generating SYN
> cookies from BPF. Currently, this helper is enabled at both the TC hook and the
> XDP hook.
>
> The first two patches in the series add/modify several TCP helper functions to
> allow for SKB-less operation, as is the case at the XDP hook.
>
> The third patch introduces the bpf_tcp_gen_syncookie helper function which
> generates a SYN cookie for either XDP or TC programs. The return value of
> this function contains both the MSS value, encoded in the cookie, and the
> cookie itself.
>
> The last three patches sync tools/ and add a test.

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

^ permalink raw reply

* [PATCH bpf 2/2] selftests/bpf: add another gso_segs access
From: Eric Dumazet @ 2019-07-23 10:15 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: David S . Miller, netdev, Eric Dumazet, Eric Dumazet,
	Stanislav Fomichev, bpf
In-Reply-To: <20190723101538.136328-1-edumazet@google.com>

Use BPF_REG_1 for source and destination of gso_segs read,
to exercise "bpf: fix access to skb_shared_info->gso_segs" fix.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Stanislav Fomichev <sdf@google.com>
---
 tools/testing/selftests/bpf/verifier/ctx_skb.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/testing/selftests/bpf/verifier/ctx_skb.c b/tools/testing/selftests/bpf/verifier/ctx_skb.c
index b0fda2877119c4af08277bd0f329f238c193313c..d438193804b212ffa80c94be47e8c1aca392181e 100644
--- a/tools/testing/selftests/bpf/verifier/ctx_skb.c
+++ b/tools/testing/selftests/bpf/verifier/ctx_skb.c
@@ -974,6 +974,17 @@
 	.result = ACCEPT,
 	.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
 },
+{
+	"read gso_segs from CGROUP_SKB",
+	.insns = {
+	BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_1,
+		    offsetof(struct __sk_buff, gso_segs)),
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_CGROUP_SKB,
+},
 {
 	"write gso_segs from CGROUP_SKB",
 	.insns = {
-- 
2.22.0.657.g960e92d24f-goog


^ permalink raw reply related

* [PATCH bpf 1/2] bpf: fix access to skb_shared_info->gso_segs
From: Eric Dumazet @ 2019-07-23 10:15 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: David S . Miller, netdev, Eric Dumazet, Eric Dumazet,
	Stanislav Fomichev, bpf, syzbot
In-Reply-To: <20190723101538.136328-1-edumazet@google.com>

It is possible we reach bpf_convert_ctx_access() with
si->dst_reg == si->src_reg

Therefore, we need to load BPF_REG_AX before eventually
mangling si->src_reg.

syzbot generated this x86 code :
   3:   55                      push   %rbp
   4:   48 89 e5                mov    %rsp,%rbp
   7:   48 81 ec 00 00 00 00    sub    $0x0,%rsp // Might be avoided ?
   e:   53                      push   %rbx
   f:   41 55                   push   %r13
  11:   41 56                   push   %r14
  13:   41 57                   push   %r15
  15:   6a 00                   pushq  $0x0
  17:   31 c0                   xor    %eax,%eax
  19:   48 8b bf c0 00 00 00    mov    0xc0(%rdi),%rdi
  20:   44 8b 97 bc 00 00 00    mov    0xbc(%rdi),%r10d
  27:   4c 01 d7                add    %r10,%rdi
  2a:   48 0f b7 7f 06          movzwq 0x6(%rdi),%rdi // Crash
  2f:   5b                      pop    %rbx
  30:   41 5f                   pop    %r15
  32:   41 5e                   pop    %r14
  34:   41 5d                   pop    %r13
  36:   5b                      pop    %rbx
  37:   c9                      leaveq
  38:   c3                      retq

Fixes: d9ff286a0f59 ("bpf: allow BPF programs access skb_shared_info->gso_segs field")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
 net/core/filter.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 4e2a79b2fd77f36ba2a31e9e43af1abc1207766e..7878f918b8c057b7b90ca0afcf2d5773cfb55e15 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -7455,12 +7455,12 @@ static u32 bpf_convert_ctx_access(enum bpf_access_type type,
 	case offsetof(struct __sk_buff, gso_segs):
 		/* si->dst_reg = skb_shinfo(SKB); */
 #ifdef NET_SKBUFF_DATA_USES_OFFSET
-		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, head),
-				      si->dst_reg, si->src_reg,
-				      offsetof(struct sk_buff, head));
 		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, end),
 				      BPF_REG_AX, si->src_reg,
 				      offsetof(struct sk_buff, end));
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, head),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct sk_buff, head));
 		*insn++ = BPF_ALU64_REG(BPF_ADD, si->dst_reg, BPF_REG_AX);
 #else
 		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct sk_buff, end),
-- 
2.22.0.657.g960e92d24f-goog


^ permalink raw reply related

* [PATCH bpf 0/2] bpf: gso_segs fixes
From: Eric Dumazet @ 2019-07-23 10:15 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: David S . Miller, netdev, Eric Dumazet, Eric Dumazet,
	Stanislav Fomichev, bpf

First patch changes the kernel, second patch
adds a new test.

Note that other patches might be needed to take
care of similar issues in sock_ops_convert_ctx_access()
and SOCK_OPS_GET_FIELD()

Eric Dumazet (2):
  bpf: fix access to skb_shared_info->gso_segs
  selftests/bpf: add another gso_segs access

 net/core/filter.c                              |  6 +++---
 tools/testing/selftests/bpf/verifier/ctx_skb.c | 11 +++++++++++
 2 files changed, 14 insertions(+), 3 deletions(-)

-- 
2.22.0.657.g960e92d24f-goog


^ permalink raw reply

* RE: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jose Abreu @ 2019-07-23 10:07 UTC (permalink / raw)
  To: Jon Hunter, Jose Abreu, Lars Persson, Ilias Apalodimas
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org, Joao Pinto,
	David S . Miller, Giuseppe Cavallaro, Alexandre Torgue,
	Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai, linux-tegra
In-Reply-To: <ab14f31f-2045-b1be-d31f-2a81b8527dac@nvidia.com>

From: Jon Hunter <jonathanh@nvidia.com>
Date: Jul/23/2019, 11:01:24 (UTC+00:00)

> This appears to be a winner and by disabling the SMMU for the ethernet
> controller and reverting commit 954a03be033c7cef80ddc232e7cbdb17df735663
> this worked! So yes appears to be related to the SMMU being enabled. We
> had to enable the SMMU for ethernet recently due to commit
> 954a03be033c7cef80ddc232e7cbdb17df735663.

Finally :)

However, from "git show 954a03be033c7cef80ddc232e7cbdb17df735663":

+         There are few reasons to allow unmatched stream bypass, and
+         even fewer good ones.  If saying YES here breaks your board
+         you should work on fixing your board.

So, how can we fix this ? Is your ethernet DT node marked as 
"dma-coherent;" ?

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool
From: Jon Hunter @ 2019-07-23 10:01 UTC (permalink / raw)
  To: Jose Abreu, Lars Persson, Ilias Apalodimas
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-stm32@st-md-mailman.stormreply.com,
	linux-arm-kernel@lists.infradead.org, Joao Pinto,
	David S . Miller, Giuseppe Cavallaro, Alexandre Torgue,
	Maxime Coquelin, Maxime Ripard, Chen-Yu Tsai, linux-tegra
In-Reply-To: <BYAPR12MB3269A725AFDDA21E92946558D3C70@BYAPR12MB3269.namprd12.prod.outlook.com>


On 23/07/2019 09:14, Jose Abreu wrote:
> From: Jose Abreu <joabreu@synopsys.com>
> Date: Jul/22/2019, 15:04:49 (UTC+00:00)
> 
>> From: Jon Hunter <jonathanh@nvidia.com>
>> Date: Jul/22/2019, 13:05:38 (UTC+00:00)
>>
>>>
>>> On 22/07/2019 12:39, Jose Abreu wrote:
>>>> From: Lars Persson <lists@bofh.nu>
>>>> Date: Jul/22/2019, 12:11:50 (UTC+00:00)
>>>>
>>>>> On Mon, Jul 22, 2019 at 12:18 PM Ilias Apalodimas
>>>>> <ilias.apalodimas@linaro.org> wrote:
>>>>>>
>>>>>> On Thu, Jul 18, 2019 at 07:48:04AM +0000, Jose Abreu wrote:
>>>>>>> From: Jon Hunter <jonathanh@nvidia.com>
>>>>>>> Date: Jul/17/2019, 19:58:53 (UTC+00:00)
>>>>>>>
>>>>>>>> Let me know if you have any thoughts.
>>>>>>>
>>>>>>> Can you try attached patch ?
>>>>>>>
>>>>>>
>>>>>> The log says  someone calls panic() right?
>>>>>> Can we trye and figure were that happens during the stmmac init phase?
>>>>>>
>>>>>
>>>>> The reason for the panic is hidden in this one line of the kernel logs:
>>>>> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>>>>>
>>>>> The init process is killed by SIGSEGV (signal 11 = 0xb).
>>>>>
>>>>> I would suggest you look for data corruption bugs in the RX path. If
>>>>> the code is fetched from the NFS mount then a corrupt RX buffer can
>>>>> trigger a crash in userspace.
>>>>>
>>>>> /Lars
>>>>
>>>>
>>>> Jon, I'm not familiar with ARM. Are the buffer addresses being allocated 
>>>> in a coherent region ? Can you try attached patch which adds full memory 
>>>> barrier before the sync ?
>>>
>>> TBH I am not sure about the buffer addresses either. The attached patch
>>> did not help. Same problem persists.
>>
>> OK. I'm just guessing now at this stage but can you disable SMP ?

I tried limiting the number of CPUs to one by setting 'maxcpus=0' on the
kernel command line. However, this did not help.

>> We have to narrow down if this is coherency issue but you said that 
>> booting without NFS and then mounting manually the share works ... So, 
>> can you share logs with same debug prints in this condition in order to 
>> compare ?
> 
> Jon, I have one ARM based board and I can't face your issue but I 
> noticed that my buffer addresses are being mapped using SWIOTLB. Can you 
> disable IOMMU support on your setup and let me know if the problem 
> persists ?

This appears to be a winner and by disabling the SMMU for the ethernet
controller and reverting commit 954a03be033c7cef80ddc232e7cbdb17df735663
this worked! So yes appears to be related to the SMMU being enabled. We
had to enable the SMMU for ethernet recently due to commit
954a03be033c7cef80ddc232e7cbdb17df735663.

Cheers
Jon

-- 
nvpublic

^ permalink raw reply

* Re: [PATCH] net-ipv6-ndisc: add support for RFC7710 RA Captive Portal Identifier
From: Maciej Żenczykowski @ 2019-07-23  9:52 UTC (permalink / raw)
  To: David Miller; +Cc: Linux NetDev, Lorenzo Colitti, Remi NGUYEN VAN, raorn
In-Reply-To: <20190722.121107.493176692915633338.davem@davemloft.net>

> Applied to net-next

Any chance we could get this into LTS releases?

I can trivially backport this into Android common kernels - which
would get this into the kernel in time for devices that launch with
Android R, but by getting it into LTS we'd get this support even on
devices that upgrade to Android R (ie. it speeds things up by about 2
years).

Thanks,
Maciej

^ permalink raw reply

* Re: [PATCH bpf-next 5/5] selftests/bpf: remove perf buffer helpers
From: Song Liu @ 2019-07-23  9:39 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev@vger.kernel.org, Alexei Starovoitov,
	daniel@iogearbox.net, andrii.nakryiko@gmail.com, Kernel Team
In-Reply-To: <20190723043112.3145810-6-andriin@fb.com>



> On Jul 22, 2019, at 9:31 PM, Andrii Nakryiko <andriin@fb.com> wrote:
> 
> libbpf's perf_buffer API supersedes trace_helper.h's helpers.
> Remove those helpers after all existing users were already moved to
> perf_buffer API.
> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>


^ permalink raw reply

* Re: [PATCH bpf-next 3/5] samples/bpf: convert xdp_sample_pkts_user to perf_buffer API
From: Song Liu @ 2019-07-23  9:39 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, Alexei Starovoitov, daniel@iogearbox.net,
	andrii.nakryiko@gmail.com, Kernel Team
In-Reply-To: <20190723043112.3145810-4-andriin@fb.com>



> On Jul 22, 2019, at 9:31 PM, Andrii Nakryiko <andriin@fb.com> wrote:
> 
> Convert xdp_sample_pkts_user to libbpf's perf_buffer API.
> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply

* Re: [bpf-next 6/6] selftests/bpf: add test for bpf_tcp_gen_syncookie
From: Lorenz Bauer @ 2019-07-23  9:37 UTC (permalink / raw)
  To: Petar Penkov
  Cc: Networking, bpf, davem, Alexei Starovoitov, Daniel Borkmann,
	Eric Dumazet, Stanislav Fomichev, Petar Penkov
In-Reply-To: <20190723002042.105927-7-ppenkov.kernel@gmail.com>

On Tue, 23 Jul 2019 at 01:20, Petar Penkov <ppenkov.kernel@gmail.com> wrote:
> +static __always_inline __s64 gen_syncookie(void *data_end, struct bpf_sock *sk,
> +                                          void *iph, __u32 ip_size,
> +                                          struct tcphdr *tcph)
> +{
> +       __u32 thlen = tcph->doff * 4;
> +
> +       if (tcph->syn && !tcph->ack) {
> +               // packet should only have an MSS option
> +               if (thlen != 24)
> +                       return 0;

Just for my own understanding: without this the verifier complains since
thlen is not a known value, even though it is in bounds due to the check below?

> +
> +               if ((void *)tcph + thlen > data_end)
> +                       return 0;
> +
> +               return bpf_tcp_gen_syncookie(sk, iph, ip_size, tcph, thlen);
> +       }
> +       return 0;
> +}
> +

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

^ permalink raw reply

* RE: [PATCH net-next 0/3] net: stmmac: Convert to phylink
From: Jose Abreu @ 2019-07-23  9:36 UTC (permalink / raw)
  To: Ondřej Jirman, Jose Abreu
  Cc: Andrew Lunn, linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	Joao Pinto, David S . Miller, Giuseppe Cavallaro,
	Alexandre Torgue, Russell King, Florian Fainelli, Heiner Kallweit
In-Reply-To: <20190722143955.uwzvcmhc4bdr2zr5@core.my.home>

From: Ondřej Jirman <megi@xff.cz>
Date: Jul/22/2019, 15:39:55 (UTC+00:00)

> On Mon, Jul 22, 2019 at 02:26:45PM +0000, Jose Abreu wrote:
> > From: Andrew Lunn <andrew@lunn.ch>
> > Date: Jul/22/2019, 15:19:43 (UTC+00:00)
> > 
> > > On Mon, Jul 22, 2019 at 01:58:20PM +0000, Jose Abreu wrote:
> > > > From: Andrew Lunn <andrew@lunn.ch>
> > > > Date: Jul/22/2019, 14:40:23 (UTC+00:00)
> > > > 
> > > > > Does this mean that all stmmac variants support 1G? There are none
> > > > > which just support Fast Ethernet?
> > > > 
> > > > This glue logic drivers sometimes reflect a custom IP that's Synopsys 
> > > > based but modified by customer, so I can't know before-hand what's the 
> > > > supported max speed. There are some old versions that don't support 1G 
> > > > but I expect that PHY driver limits this ...
> > > 
> > > If a Fast PHY is used, then yes, it would be limited. But sometimes a
> > > 1G PHY is used because they are cheaper than a Fast PHY.
> > >  
> > > > > I'm also not sure the change fits the problem. Why did it not
> > > > > negotiate 100FULL rather than 10Half? You are only moving the 1G
> > > > > speeds around, so 100 speeds should of been advertised and selected.
> > > > 
> > > > Hmm, now that I'm looking at it closer I agree with you. Maybe link 
> > > > partner or PHY doesn't support 100M ?
> > > 
> > > In the working case, ethtool shows the link partner supports 10, 100,
> > > and 1G. So something odd is going on here.
> > > 
> > > You fix does seems reasonable, and it has been reported to fix the
> > > issue, but it would be good to understand what is going on here.
> > 
> > Agreed!
> > 
> > Ondrej, can you please share dmesg log and ethtool output with the fixed 
> > patch ?
> 
> See the attachment, or this link:

So, I've removed all 1G link modes from stmmac and run it on an ARM 
based board. My link status resolves to 100M/Full using Generic PHY so 
maybe something is wrong with the PHY driver that Ondrej is using 
("RTL8211E Gigabit Ethernet") ?

---
Thanks,
Jose Miguel Abreu

^ permalink raw reply

* Re: [PATCH bpf-next 2/5] selftests/bpf: switch test_tcpnotify to perf_buffer API
From: Song Liu @ 2019-07-23  9:35 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev@vger.kernel.org, Alexei Starovoitov,
	daniel@iogearbox.net, andrii.nakryiko@gmail.com, Kernel Team
In-Reply-To: <20190723043112.3145810-3-andriin@fb.com>



> On Jul 22, 2019, at 9:31 PM, Andrii Nakryiko <andriin@fb.com> wrote:
> 
> Switch test_tcpnotify test to use libbpf's perf_buffer API instead of
> re-implementing portion of it.
> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>


^ permalink raw reply

* Re: [PATCH bpf-next 4/5] samples/bpf: switch trace_output sample to perf_buffer API
From: Song Liu @ 2019-07-23  9:29 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev@vger.kernel.org, Alexei Starovoitov,
	daniel@iogearbox.net, andrii.nakryiko@gmail.com, Kernel Team
In-Reply-To: <20190723043112.3145810-5-andriin@fb.com>



> On Jul 22, 2019, at 9:31 PM, Andrii Nakryiko <andriin@fb.com> wrote:
> 
> Convert trace_output sample to libbpf's perf_buffer API.
> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
> samples/bpf/trace_output_user.c | 43 +++++++++++----------------------
> 1 file changed, 14 insertions(+), 29 deletions(-)
> 
> diff --git a/samples/bpf/trace_output_user.c b/samples/bpf/trace_output_user.c
> index 2dd1d39b152a..8ee47699a870 100644
> --- a/samples/bpf/trace_output_user.c
> +++ b/samples/bpf/trace_output_user.c
> @@ -18,9 +18,6 @@
> #include <libbpf.h>
> #include "bpf_load.h"
> #include "perf-sys.h"
> -#include "trace_helpers.h"
> -
> -static int pmu_fd;
> 
> static __u64 time_get_ns(void)
> {
> @@ -31,12 +28,12 @@ static __u64 time_get_ns(void)
> }
> 
> static __u64 start_time;
> +static __u64 cnt;
> 
> #define MAX_CNT 100000ll
> 
> -static int print_bpf_output(void *data, int size)
> +static void print_bpf_output(void *ctx, int cpu, void *data, __u32 size)
> {
> -	static __u64 cnt;
> 	struct {
> 		__u64 pid;
> 		__u64 cookie;
> @@ -45,7 +42,7 @@ static int print_bpf_output(void *data, int size)
> 	if (e->cookie != 0x12345678) {
> 		printf("BUG pid %llx cookie %llx sized %d\n",
> 		       e->pid, e->cookie, size);
> -		return LIBBPF_PERF_EVENT_ERROR;
> +		return;
> 	}
> 
> 	cnt++;
> @@ -53,30 +50,14 @@ static int print_bpf_output(void *data, int size)
> 	if (cnt == MAX_CNT) {
> 		printf("recv %lld events per sec\n",
> 		       MAX_CNT * 1000000000ll / (time_get_ns() - start_time));
> -		return LIBBPF_PERF_EVENT_DONE;
> +		return;
> 	}
> -
> -	return LIBBPF_PERF_EVENT_CONT;
> -}
> -
> -static void test_bpf_perf_event(void)
> -{
> -	struct perf_event_attr attr = {
> -		.sample_type = PERF_SAMPLE_RAW,
> -		.type = PERF_TYPE_SOFTWARE,
> -		.config = PERF_COUNT_SW_BPF_OUTPUT,
> -	};
> -	int key = 0;
> -
> -	pmu_fd = sys_perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
> -
> -	assert(pmu_fd >= 0);
> -	assert(bpf_map_update_elem(map_fd[0], &key, &pmu_fd, BPF_ANY) == 0);
> -	ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
> }
> 
> int main(int argc, char **argv)
> {
> +	struct perf_buffer_opts pb_opts = {};
> +	struct perf_buffer *pb;
> 	char filename[256];
> 	FILE *f;
> 	int ret;
> @@ -88,16 +69,20 @@ int main(int argc, char **argv)
> 		return 1;
> 	}
> 
> -	test_bpf_perf_event();
> -
> -	if (perf_event_mmap(pmu_fd) < 0)
> +	pb_opts.sample_cb = print_bpf_output;
> +	pb = perf_buffer__new(map_fd[0], 8, &pb_opts);
> +	ret = libbpf_get_error(pb);
> +	if (ret) {
> +		printf("failed to setup perf_buffer: %d\n", ret);
> 		return 1;
> +	}
> 
> 	f = popen("taskset 1 dd if=/dev/zero of=/dev/null", "r");
> 	(void) f;
> 
> 	start_time = time_get_ns();
> -	ret = perf_event_poller(pmu_fd, print_bpf_output);
> +	while ((ret = perf_buffer__poll(pb, 1000)) >= 0 && cnt < MAX_CNT) {
> +	}
> 	kill(0, SIGINT);
> 	return ret;
> }
> -- 
> 2.17.1
> 


^ permalink raw reply

* Re: [PATCH net-next v2 3/3] netlink: add validation of NLA_F_NESTED flag
From: Thomas Haller @ 2019-07-23  9:28 UTC (permalink / raw)
  To: Michal Kubecek, netdev
  Cc: David S. Miller, Johannes Berg, David Ahern, linux-kernel
In-Reply-To: <20190723090908.GA2204@unicorn.suse.cz>

[-- Attachment #1: Type: text/plain, Size: 1851 bytes --]

On Tue, 2019-07-23 at 11:09 +0200, Michal Kubecek wrote:
> On Tue, Jul 23, 2019 at 10:57:54AM +0200, Thomas Haller wrote:
> > Does this flag and strict validation really provide any value?
> > Commonly a netlink message is a plain TLV blob, and the meaning
> > depends entirely on the policy.
> > 
> > What I mean is that for example
> > 
> >   NLA_PUT_U32 (msg, ATTR_IFINDEX, (uint32_t) ifindex)
> >   NLA_PUT_STRING (msg, ATTR_IFNAME, "net")
> > 
> > results in a 4 bytes payload that does not encode whether the data
> > is
> > a number or a string.
> > 
> > Why is it valuable in this case to encode additional type
> > information
> > inside the message, when it's commonly not done and also not
> > necessary?
> 
> One big advantage of having nested attributes explicitly marked is
> that
> it allows parsers not aware of the semantics to recognize nested
> attributes and parse their inner structure.
> 
> This is very important e.g. for debugging purposes as without the
> flag,
> wireshark can only recurse into nested attributes if it understands
> the
> protocol and knows they are nested, otherwise it displays them only
> as
> an opaque blob (which is what happens for most netlink based
> protocols).
> Another example is mnl_nlmsg_fprintf() function from libmnl which is
> also a valuable debugging aid but without NLA_F_NESTED flags it
> cannot
> show message structure properly.

Hi,

I don't question the use of the flag. I question whether it's necessary
for kernel to strictly require the sending side to aid debuggability.

"e.g. for debugging purposes" makes it sound like it would be important
for something else. I wonder what else.


Anyway. What you elaborate makes sense!! Thanks


My main point was to raise awareness that this is a problem for libnl3.


best,
Thomas

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH bpf-next 1/5] selftests/bpf: convert test_get_stack_raw_tp to perf_buffer API
From: Song Liu @ 2019-07-23  9:25 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev@vger.kernel.org, Alexei Starovoitov,
	daniel@iogearbox.net, andrii.nakryiko@gmail.com, Kernel Team
In-Reply-To: <20190723043112.3145810-2-andriin@fb.com>



> On Jul 22, 2019, at 9:31 PM, Andrii Nakryiko <andriin@fb.com> wrote:
> 
> Convert test_get_stack_raw_tp test to new perf_buffer API.
> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
> .../bpf/prog_tests/get_stack_raw_tp.c         | 78 ++++++++++---------
> .../bpf/progs/test_get_stack_rawtp.c          |  2 +-
> 2 files changed, 44 insertions(+), 36 deletions(-)
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/get_stack_raw_tp.c b/tools/testing/selftests/bpf/prog_tests/get_stack_raw_tp.c
> index c2a0a9d5591b..473889e1b219 100644
> --- a/tools/testing/selftests/bpf/prog_tests/get_stack_raw_tp.c
> +++ b/tools/testing/selftests/bpf/prog_tests/get_stack_raw_tp.c
> @@ -1,8 +1,15 @@
> // SPDX-License-Identifier: GPL-2.0
> +#define _GNU_SOURCE
> +#include <pthread.h>
> +#include <sched.h>
> +#include <sys/socket.h>
> #include <test_progs.h>
> 
> #define MAX_CNT_RAWTP	10ull
> #define MAX_STACK_RAWTP	100
> +
> +static int duration = 0;
> +

Are we using "duration" at all?

> struct get_stack_trace_t {
> 	int pid;
> 	int kern_stack_size;
> @@ -13,7 +20,7 @@ struct get_stack_trace_t {
> 	struct bpf_stack_build_id user_stack_buildid[MAX_STACK_RAWTP];
> };
> 
> -static int get_stack_print_output(void *data, int size)
> +static void get_stack_print_output(void *ctx, int cpu, void *data, __u32 size)
> {
> 	bool good_kern_stack = false, good_user_stack = false;
> 	const char *nonjit_func = "___bpf_prog_run";
> @@ -65,75 +72,76 @@ static int get_stack_print_output(void *data, int size)
> 		if (e->user_stack_size > 0 && e->user_stack_buildid_size > 0)
> 			good_user_stack = true;
> 	}
> -	if (!good_kern_stack || !good_user_stack)
> -		return LIBBPF_PERF_EVENT_ERROR;
> 
> -	if (cnt == MAX_CNT_RAWTP)
> -		return LIBBPF_PERF_EVENT_DONE;
> -
> -	return LIBBPF_PERF_EVENT_CONT;
> +	if (!good_kern_stack)
> +	    CHECK(!good_kern_stack, "bad_kern_stack", "bad\n");

Two "bad" is a little weird. How about "kern stack", "bad"?

> +	if (!good_user_stack)
> +	    CHECK(!good_user_stack, "bad_user_stack", "bad\n");
> }
> 
> void test_get_stack_raw_tp(void)
> {
> 	const char *file = "./test_get_stack_rawtp.o";
> -	int i, efd, err, prog_fd, pmu_fd, perfmap_fd;
> -	struct perf_event_attr attr = {};
> +	const char *prog_name = "raw_tracepoint/sys_enter";
> +	int i, err, prog_fd, exp_cnt = MAX_CNT_RAWTP;
> +	struct perf_buffer_opts pb_opts = {};
> +	struct perf_buffer *pb = NULL;
> +	struct bpf_link *link = NULL;
> 	struct timespec tv = {0, 10};
> -	__u32 key = 0, duration = 0;
> +	struct bpf_program *prog;
> 	struct bpf_object *obj;
> +	struct bpf_map *map;
> +	cpu_set_t cpu_set;
> 
> 	err = bpf_prog_load(file, BPF_PROG_TYPE_RAW_TRACEPOINT, &obj, &prog_fd);
> 	if (CHECK(err, "prog_load raw tp", "err %d errno %d\n", err, errno))
> 		return;
> 
> -	efd = bpf_raw_tracepoint_open("sys_enter", prog_fd);
> -	if (CHECK(efd < 0, "raw_tp_open", "err %d errno %d\n", efd, errno))
> +	prog = bpf_object__find_program_by_title(obj, prog_name);
> +	if (CHECK(!prog, "find_probe", "prog '%s' not found\n", prog_name))
> 		goto close_prog;
> 
> -	perfmap_fd = bpf_find_map(__func__, obj, "perfmap");
> -	if (CHECK(perfmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
> -		  perfmap_fd, errno))
> +	map = bpf_object__find_map_by_name(obj, "perfmap");
> +	if (CHECK(!map, "bpf_find_map", "not found\n"))
> 		goto close_prog;
> 
> 	err = load_kallsyms();
> 	if (CHECK(err < 0, "load_kallsyms", "err %d errno %d\n", err, errno))
> 		goto close_prog;
> 
> -	attr.sample_type = PERF_SAMPLE_RAW;
> -	attr.type = PERF_TYPE_SOFTWARE;
> -	attr.config = PERF_COUNT_SW_BPF_OUTPUT;
> -	pmu_fd = syscall(__NR_perf_event_open, &attr, getpid()/*pid*/, -1/*cpu*/,
> -			 -1/*group_fd*/, 0);
> -	if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd,
> -		  errno))
> +	CPU_ZERO(&cpu_set);
> +	CPU_SET(0, &cpu_set);
> +	err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set);
> +	if (CHECK(err, "set_affinity", "err %d, errno %d\n", err, errno))
> 		goto close_prog;
> 
> -	err = bpf_map_update_elem(perfmap_fd, &key, &pmu_fd, BPF_ANY);
> -	if (CHECK(err < 0, "bpf_map_update_elem", "err %d errno %d\n", err,
> -		  errno))
> +	link = bpf_program__attach_raw_tracepoint(prog, "sys_enter");
> +	if (CHECK(IS_ERR(link), "attach_raw_tp", "err %ld\n", PTR_ERR(link)))
> 		goto close_prog;
> 
> -	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
> -	if (CHECK(err < 0, "ioctl PERF_EVENT_IOC_ENABLE", "err %d errno %d\n",
> -		  err, errno))
> -		goto close_prog;
> -
> -	err = perf_event_mmap(pmu_fd);
> -	if (CHECK(err < 0, "perf_event_mmap", "err %d errno %d\n", err, errno))
> +	pb_opts.sample_cb = get_stack_print_output;
> +	pb = perf_buffer__new(bpf_map__fd(map), 8, &pb_opts);
> +	if (CHECK(IS_ERR(pb), "perf_buf__new", "err %ld\n", PTR_ERR(pb)))
> 		goto close_prog;
> 
> 	/* trigger some syscall action */
> 	for (i = 0; i < MAX_CNT_RAWTP; i++)
> 		nanosleep(&tv, NULL);
> 
> -	err = perf_event_poller(pmu_fd, get_stack_print_output);
> -	if (CHECK(err < 0, "perf_event_poller", "err %d errno %d\n", err, errno))
> -		goto close_prog;
> +	while (exp_cnt > 0) {
> +		err = perf_buffer__poll(pb, 100);
> +		if (err < 0 && CHECK(err < 0, "pb__poll", "err %d\n", err))
> +			goto close_prog;
> +		exp_cnt -= err;
> +	}
> 
> 	goto close_prog_noerr;
> close_prog:
> 	error_cnt++;
> close_prog_noerr:
> +	if (!IS_ERR_OR_NULL(link))
> +		bpf_link__destroy(link);
> +	if (!IS_ERR_OR_NULL(pb))
> +		perf_buffer__free(pb);
> 	bpf_object__close(obj);
> }
> diff --git a/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c b/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c
> index 33254b771384..f8ffa3f3d44b 100644
> --- a/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c
> +++ b/tools/testing/selftests/bpf/progs/test_get_stack_rawtp.c
> @@ -55,7 +55,7 @@ struct {
> 	__type(value, raw_stack_trace_t);
> } rawdata_map SEC(".maps");
> 
> -SEC("tracepoint/raw_syscalls/sys_enter")
> +SEC("raw_tracepoint/sys_enter")
> int bpf_prog1(void *ctx)
> {
> 	int max_len, max_buildid_len, usize, ksize, total_size;
> -- 
> 2.17.1
> 


^ permalink raw reply

* Re: [PATCH 2/6] vhost: validate MMU notifier registration
From: Michael S. Tsirkin @ 2019-07-23  9:17 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel
In-Reply-To: <20190723075718.6275-3-jasowang@redhat.com>

On Tue, Jul 23, 2019 at 03:57:14AM -0400, Jason Wang wrote:
> The return value of mmu_notifier_register() is not checked in
> vhost_vring_set_num_addr(). This will cause an out of sync between mm
> and MMU notifier thus a double free. To solve this, introduce a
> boolean flag to track whether MMU notifier is registered and only do
> unregistering when it was true.
> 
> Reported-and-tested-by:
> syzbot+e58112d71f77113ddb7b@syzkaller.appspotmail.com
> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Right. This fixes the bug.
But it's not great that simple things like
setting vq address put pressure on memory allocator.
Also, if we get a single during processing
notifier register will fail, disabling optimization permanently.

In fact, see below:


> ---
>  drivers/vhost/vhost.c | 19 +++++++++++++++----
>  drivers/vhost/vhost.h |  1 +
>  2 files changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 34c0d970bcbc..058191d5efad 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -630,6 +630,7 @@ void vhost_dev_init(struct vhost_dev *dev,
>  	dev->iov_limit = iov_limit;
>  	dev->weight = weight;
>  	dev->byte_weight = byte_weight;
> +	dev->has_notifier = false;
>  	init_llist_head(&dev->work_list);
>  	init_waitqueue_head(&dev->wait);
>  	INIT_LIST_HEAD(&dev->read_list);
> @@ -731,6 +732,7 @@ long vhost_dev_set_owner(struct vhost_dev *dev)
>  	if (err)
>  		goto err_mmu_notifier;
>  #endif
> +	dev->has_notifier = true;
>  
>  	return 0;
>  

I just noticed that set owner now fails if we get a signal.
Userspace could retry in theory but it does not:
this is userspace abi breakage since it used to only
fail on invalid input.

> @@ -960,7 +962,11 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
>  	}
>  	if (dev->mm) {
>  #if VHOST_ARCH_CAN_ACCEL_UACCESS
> -		mmu_notifier_unregister(&dev->mmu_notifier, dev->mm);
> +		if (dev->has_notifier) {
> +			mmu_notifier_unregister(&dev->mmu_notifier,
> +						dev->mm);
> +			dev->has_notifier = false;
> +		}
>  #endif
>  		mmput(dev->mm);
>  	}
> @@ -2065,8 +2071,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
>  	/* Unregister MMU notifer to allow invalidation callback
>  	 * can access vq->uaddrs[] without holding a lock.
>  	 */
> -	if (d->mm)
> +	if (d->has_notifier) {
>  		mmu_notifier_unregister(&d->mmu_notifier, d->mm);
> +		d->has_notifier = false;
> +	}
>  
>  	vhost_uninit_vq_maps(vq);
>  #endif
> @@ -2086,8 +2094,11 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
>  	if (r == 0)
>  		vhost_setup_vq_uaddr(vq);
>  
> -	if (d->mm)
> -		mmu_notifier_register(&d->mmu_notifier, d->mm);
> +	if (d->mm) {
> +		r = mmu_notifier_register(&d->mmu_notifier, d->mm);
> +		if (!r)
> +			d->has_notifier = true;
> +	}
>  #endif
>  
>  	mutex_unlock(&vq->mutex);
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 819296332913..a62f56a4cf72 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -214,6 +214,7 @@ struct vhost_dev {
>  	int iov_limit;
>  	int weight;
>  	int byte_weight;
> +	bool has_notifier;
>  };
>  
>  bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len);
> -- 
> 2.18.1

^ permalink raw reply

* Re: [PATCH 4/6] vhost: reset invalidate_count in vhost_set_vring_num_addr()
From: Michael S. Tsirkin @ 2019-07-23  9:17 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel
In-Reply-To: <20190723075718.6275-5-jasowang@redhat.com>

On Tue, Jul 23, 2019 at 03:57:16AM -0400, Jason Wang wrote:
> The vhost_set_vring_num_addr() could be called in the middle of
> invalidate_range_start() and invalidate_range_end(). If we don't reset
> invalidate_count after the un-registering of MMU notifier, the
> invalidate_cont will run out of sync (e.g never reach zero). This will
> in fact disable the fast accessor path. Fixing by reset the count to
> zero.
> 
> Reported-by: Michael S. Tsirkin <mst@redhat.com>
> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/vhost.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 03666b702498..89c9f08b5146 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -2074,6 +2074,10 @@ static long vhost_vring_set_num_addr(struct vhost_dev *d,
>  		d->has_notifier = false;
>  	}
>  
> +	/* reset invalidate_count in case we are in the middle of
> +	 * invalidate_start() and invalidate_end().
> +	 */
> +	vq->invalidate_count = 0;

I think that the code is ok but the comments are not very clear:
- we are never in the middle since we just removed the notifier
- the result is not just disabling optimization:
  if notifier becomes negative, then later we
  can think it's ok to map when it isn't since
  notifier is active.

>  	vhost_uninit_vq_maps(vq);
>  #endif
>  
> -- 
> 2.18.1

^ permalink raw reply

* Re: [PATCH 5/6] vhost: mark dirty pages during map uninit
From: Michael S. Tsirkin @ 2019-07-23  9:17 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel
In-Reply-To: <20190723075718.6275-6-jasowang@redhat.com>

On Tue, Jul 23, 2019 at 03:57:17AM -0400, Jason Wang wrote:
> We don't mark dirty pages if the map was teared down outside MMU
> notifier. This will lead untracked dirty pages. Fixing by marking
> dirty pages during map uninit.
> 
> Reported-by: Michael S. Tsirkin <mst@redhat.com>
> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/vhost.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 89c9f08b5146..5b8821d00fe4 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -306,6 +306,18 @@ static void vhost_map_unprefetch(struct vhost_map *map)
>  	kfree(map);
>  }
>  
> +static void vhost_set_map_dirty(struct vhost_virtqueue *vq,
> +				struct vhost_map *map, int index)
> +{
> +	struct vhost_uaddr *uaddr = &vq->uaddrs[index];
> +	int i;
> +
> +	if (uaddr->write) {
> +		for (i = 0; i < map->npages; i++)
> +			set_page_dirty(map->pages[i]);
> +	}
> +}
> +
>  static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
>  {
>  	struct vhost_map *map[VHOST_NUM_ADDRS];
> @@ -315,8 +327,10 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
>  	for (i = 0; i < VHOST_NUM_ADDRS; i++) {
>  		map[i] = rcu_dereference_protected(vq->maps[i],
>  				  lockdep_is_held(&vq->mmu_lock));
> -		if (map[i])
> +		if (map[i]) {
> +			vhost_set_map_dirty(vq, map[i], i);
>  			rcu_assign_pointer(vq->maps[i], NULL);
> +		}
>  	}
>  	spin_unlock(&vq->mmu_lock);
>  
> @@ -354,7 +368,6 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
>  {
>  	struct vhost_uaddr *uaddr = &vq->uaddrs[index];
>  	struct vhost_map *map;
> -	int i;
>  
>  	if (!vhost_map_range_overlap(uaddr, start, end))
>  		return;
> @@ -365,10 +378,7 @@ static void vhost_invalidate_vq_start(struct vhost_virtqueue *vq,
>  	map = rcu_dereference_protected(vq->maps[index],
>  					lockdep_is_held(&vq->mmu_lock));
>  	if (map) {
> -		if (uaddr->write) {
> -			for (i = 0; i < map->npages; i++)
> -				set_page_dirty(map->pages[i]);
> -		}
> +		vhost_set_map_dirty(vq, map, index);
>  		rcu_assign_pointer(vq->maps[index], NULL);
>  	}
>  	spin_unlock(&vq->mmu_lock);

OK and the reason it's safe is because the invalidate counter
got incremented so we know page will not get mapped again.

But we *do* need to wait for page not to be mapped.
And if that means waiting for VQ processing to finish,
then I worry that is a very log time.


> -- 
> 2.18.1

^ permalink raw reply

* Re: [PATCH 6/6] vhost: don't do synchronize_rcu() in vhost_uninit_vq_maps()
From: Michael S. Tsirkin @ 2019-07-23  9:16 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, virtualization, netdev, linux-kernel
In-Reply-To: <20190723075718.6275-7-jasowang@redhat.com>

On Tue, Jul 23, 2019 at 03:57:18AM -0400, Jason Wang wrote:
> There's no need for RCU synchronization in vhost_uninit_vq_maps()
> since we've already serialized with readers (memory accessors). This
> also avoid the possible userspace DOS through ioctl() because of the
> possible high latency caused by synchronize_rcu().
> 
> Reported-by: Michael S. Tsirkin <mst@redhat.com>
> Fixes: 7f466032dc9e ("vhost: access vq metadata through kernel virtual address")
> Signed-off-by: Jason Wang <jasowang@redhat.com>

I agree synchronize_rcu in both mmu notifiers and ioctl
is a problem we must fix.

> ---
>  drivers/vhost/vhost.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index 5b8821d00fe4..a17df1f4069a 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -334,7 +334,9 @@ static void vhost_uninit_vq_maps(struct vhost_virtqueue *vq)
>  	}
>  	spin_unlock(&vq->mmu_lock);
>  
> -	synchronize_rcu();
> +	/* No need for synchronize_rcu() or kfree_rcu() since we are
> +	 * serialized with memory accessors (e.g vq mutex held).
> +	 */
>  
>  	for (i = 0; i < VHOST_NUM_ADDRS; i++)
>  		if (map[i])
> -- 
> 2.18.1

.. however we can not RCU with no synchronization in sight.
Sometimes there are hacks like using a lock/unlock
pair instead of sync, but here no one bothers.

specifically notifiers call reset vq maps which calls
uninit vq maps which is not under any lock.

You will get use after free when map is then accessed.

If you always have a lock then just take that lock
and no need for RCU.

-- 
MST

^ permalink raw reply

* Re: [PATCH net-next v2 3/3] netlink: add validation of NLA_F_NESTED flag
From: Michal Kubecek @ 2019-07-23  9:09 UTC (permalink / raw)
  To: netdev
  Cc: Thomas Haller, David S. Miller, Johannes Berg, David Ahern,
	linux-kernel
In-Reply-To: <0fc58a4883f6656208b9250876e53d723919e342.camel@redhat.com>

On Tue, Jul 23, 2019 at 10:57:54AM +0200, Thomas Haller wrote:
> Does this flag and strict validation really provide any value?
> Commonly a netlink message is a plain TLV blob, and the meaning
> depends entirely on the policy.
> 
> What I mean is that for example
> 
>   NLA_PUT_U32 (msg, ATTR_IFINDEX, (uint32_t) ifindex)
>   NLA_PUT_STRING (msg, ATTR_IFNAME, "net")
> 
> results in a 4 bytes payload that does not encode whether the data is
> a number or a string.
> 
> Why is it valuable in this case to encode additional type information
> inside the message, when it's commonly not done and also not
> necessary?

One big advantage of having nested attributes explicitly marked is that
it allows parsers not aware of the semantics to recognize nested
attributes and parse their inner structure.

This is very important e.g. for debugging purposes as without the flag,
wireshark can only recurse into nested attributes if it understands the
protocol and knows they are nested, otherwise it displays them only as
an opaque blob (which is what happens for most netlink based protocols).
Another example is mnl_nlmsg_fprintf() function from libmnl which is
also a valuable debugging aid but without NLA_F_NESTED flags it cannot
show message structure properly.

Michal Kubecek

^ permalink raw reply

* [PATCH V3 1/1] can: sja1000: f81601: add Fintek F81601 support
From: Ji-Ze Hong (Peter Hong) @ 2019-07-23  9:03 UTC (permalink / raw)
  To: wg, mkl, peter_hong
  Cc: davem, f.suligoi, linux-kernel, linux-can, netdev,
	Ji-Ze Hong (Peter Hong)

This patch add support for Fintek PCIE to 2 CAN controller support

Signed-off-by: Ji-Ze Hong (Peter Hong) <hpeter+linux_kernel@gmail.com>
---
v3:
	1: Fix module parameter "internal_clk" default from 1 to true.
	2: Remove non-usable pcim_iounmap().

v2:
	1: Fix comment on the spinlock with write access.
	2: Use ARRAY_SIZE instead of F81601_PCI_MAX_CHAN.
	3: Check the strap pin outside the loop.
	4: Fix the cleanup issue in f81601_pci_add_card().
	5: Remove unused "channels" in struct f81601_pci_card.

 drivers/net/can/sja1000/Kconfig  |   8 ++
 drivers/net/can/sja1000/Makefile |   1 +
 drivers/net/can/sja1000/f81601.c | 213 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 222 insertions(+)
 create mode 100644 drivers/net/can/sja1000/f81601.c

diff --git a/drivers/net/can/sja1000/Kconfig b/drivers/net/can/sja1000/Kconfig
index f6dc89927ece..8588323c5138 100644
--- a/drivers/net/can/sja1000/Kconfig
+++ b/drivers/net/can/sja1000/Kconfig
@@ -101,4 +101,12 @@ config CAN_TSCAN1
 	  IRQ numbers are read from jumpers JP4 and JP5,
 	  SJA1000 IO base addresses are chosen heuristically (first that works).
 
+config CAN_F81601
+	tristate "Fintek F81601 PCIE to 2 CAN Controller"
+	depends on PCI
+	help
+	  This driver adds support for Fintek F81601 PCIE to 2 CAN Controller.
+	  It had internal 24MHz clock source, but it can be changed by
+	  manufacturer. We can use modinfo to get usage for parameters.
+	  Visit http://www.fintek.com.tw to get more information.
 endif
diff --git a/drivers/net/can/sja1000/Makefile b/drivers/net/can/sja1000/Makefile
index 9253aaf9e739..6f6268543bd9 100644
--- a/drivers/net/can/sja1000/Makefile
+++ b/drivers/net/can/sja1000/Makefile
@@ -13,3 +13,4 @@ obj-$(CONFIG_CAN_PEAK_PCMCIA) += peak_pcmcia.o
 obj-$(CONFIG_CAN_PEAK_PCI) += peak_pci.o
 obj-$(CONFIG_CAN_PLX_PCI) += plx_pci.o
 obj-$(CONFIG_CAN_TSCAN1) += tscan1.o
+obj-$(CONFIG_CAN_F81601) += f81601.o
diff --git a/drivers/net/can/sja1000/f81601.c b/drivers/net/can/sja1000/f81601.c
new file mode 100644
index 000000000000..3d0436efead9
--- /dev/null
+++ b/drivers/net/can/sja1000/f81601.c
@@ -0,0 +1,213 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Fintek F81601 PCIE to 2 CAN controller driver
+ *
+ * Copyright (C) 2019 Peter Hong <peter_hong@fintek.com.tw>
+ * Copyright (C) 2019 Linux Foundation
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+#include <linux/netdevice.h>
+#include <linux/delay.h>
+#include <linux/slab.h>
+#include <linux/pci.h>
+#include <linux/can/dev.h>
+#include <linux/io.h>
+#include <linux/version.h>
+
+#include "sja1000.h"
+
+#define F81601_PCI_MAX_CHAN		2
+
+#define F81601_DECODE_REG		0x209
+#define F81601_IO_MODE			BIT(7)
+#define F81601_MEM_MODE			BIT(6)
+#define F81601_CFG_MODE			BIT(5)
+#define F81601_CAN2_INTERNAL_CLK	BIT(3)
+#define F81601_CAN1_INTERNAL_CLK	BIT(2)
+#define F81601_CAN2_EN			BIT(1)
+#define F81601_CAN1_EN			BIT(0)
+
+#define F81601_TRAP_REG			0x20a
+#define F81601_CAN2_HAS_EN		BIT(4)
+
+struct f81601_pci_card {
+	void __iomem *addr;
+	spinlock_t lock;	/* use this spin lock only for write access */
+	struct pci_dev *dev;
+	struct net_device *net_dev[F81601_PCI_MAX_CHAN];
+};
+
+static const struct pci_device_id f81601_pci_tbl[] = {
+	{ PCI_DEVICE(0x1c29, 0x1703) },
+	{},
+};
+
+MODULE_DEVICE_TABLE(pci, f81601_pci_tbl);
+
+static bool internal_clk = true;
+module_param(internal_clk, bool, 0444);
+MODULE_PARM_DESC(internal_clk, "Use internal clock, default true (24MHz)");
+
+static unsigned int external_clk;
+module_param(external_clk, uint, 0444);
+MODULE_PARM_DESC(external_clk, "External clock when internal_clk disabled");
+
+static u8 f81601_pci_read_reg(const struct sja1000_priv *priv, int port)
+{
+	return readb(priv->reg_base + port);
+}
+
+static void f81601_pci_write_reg(const struct sja1000_priv *priv, int port,
+				 u8 val)
+{
+	struct f81601_pci_card *card = priv->priv;
+	unsigned long flags;
+
+	spin_lock_irqsave(&card->lock, flags);
+	writeb(val, priv->reg_base + port);
+	readb(priv->reg_base);
+	spin_unlock_irqrestore(&card->lock, flags);
+}
+
+static void f81601_pci_del_card(struct pci_dev *pdev)
+{
+	struct f81601_pci_card *card = pci_get_drvdata(pdev);
+	struct net_device *dev;
+	int i = 0;
+
+	for (i = 0; i < ARRAY_SIZE(card->net_dev); i++) {
+		dev = card->net_dev[i];
+		if (!dev)
+			continue;
+
+		dev_info(&pdev->dev, "%s: Removing %s\n", __func__, dev->name);
+
+		unregister_sja1000dev(dev);
+		free_sja1000dev(dev);
+	}
+}
+
+/* Probe F81601 based device for the SJA1000 chips and register each
+ * available CAN channel to SJA1000 Socket-CAN subsystem.
+ */
+static int f81601_pci_add_card(struct pci_dev *pdev,
+			       const struct pci_device_id *ent)
+{
+	struct sja1000_priv *priv;
+	struct net_device *dev;
+	struct f81601_pci_card *card;
+	int err, i, count;
+	u8 tmp;
+
+	if (pcim_enable_device(pdev) < 0) {
+		dev_err(&pdev->dev, "Failed to enable PCI device\n");
+		return -ENODEV;
+	}
+
+	dev_info(&pdev->dev, "Detected card at slot #%i\n",
+		 PCI_SLOT(pdev->devfn));
+
+	card = devm_kzalloc(&pdev->dev, sizeof(*card), GFP_KERNEL);
+	if (!card)
+		return -ENOMEM;
+
+	card->dev = pdev;
+	spin_lock_init(&card->lock);
+
+	pci_set_drvdata(pdev, card);
+
+	tmp = F81601_IO_MODE | F81601_MEM_MODE | F81601_CFG_MODE |
+		F81601_CAN2_EN | F81601_CAN1_EN;
+
+	if (internal_clk) {
+		tmp |= F81601_CAN2_INTERNAL_CLK | F81601_CAN1_INTERNAL_CLK;
+
+		dev_info(&pdev->dev,
+			 "F81601 running with internal clock: 24Mhz\n");
+	} else {
+		dev_info(&pdev->dev,
+			 "F81601 running with external clock: %dMhz\n",
+			 external_clk / 1000000);
+	}
+
+	pci_write_config_byte(pdev, F81601_DECODE_REG, tmp);
+
+	card->addr = pcim_iomap(pdev, 0, pci_resource_len(pdev, 0));
+
+	if (!card->addr) {
+		err = -ENOMEM;
+		dev_err(&pdev->dev, "%s: Failed to remap BAR\n", __func__);
+		goto failure_cleanup;
+	}
+
+	/* read CAN2_HW_EN strap pin to detect how many CANBUS do we have */
+	count = ARRAY_SIZE(card->net_dev);
+	pci_read_config_byte(pdev, F81601_TRAP_REG, &tmp);
+	if (!(tmp & F81601_CAN2_HAS_EN))
+		count = 1;
+
+	/* Detect available channels */
+	for (i = 0; i < count; i++) {
+		dev = alloc_sja1000dev(0);
+		if (!dev) {
+			err = -ENOMEM;
+			goto failure_cleanup;
+		}
+
+		priv = netdev_priv(dev);
+		priv->priv = card;
+		priv->irq_flags = IRQF_SHARED;
+		priv->reg_base = card->addr + 0x80 * i;
+		priv->read_reg = f81601_pci_read_reg;
+		priv->write_reg = f81601_pci_write_reg;
+
+		if (internal_clk)
+			priv->can.clock.freq = 24000000 / 2;
+		else
+			priv->can.clock.freq = external_clk / 2;
+
+		priv->ocr = OCR_TX0_PUSHPULL | OCR_TX1_PUSHPULL;
+		priv->cdr = CDR_CBP;
+
+		SET_NETDEV_DEV(dev, &pdev->dev);
+		dev->dev_id = i;
+		dev->irq = pdev->irq;
+
+		/* Register SJA1000 device */
+		err = register_sja1000dev(dev);
+		if (err) {
+			dev_err(&pdev->dev,
+				"%s: Registering device failed: %x\n", __func__,
+				err);
+			free_sja1000dev(dev);
+			goto failure_cleanup;
+		}
+
+		card->net_dev[i] = dev;
+		dev_info(&pdev->dev, "Channel #%d, %s at 0x%p, irq %d\n", i,
+			 dev->name, priv->reg_base, dev->irq);
+	}
+
+	return 0;
+
+failure_cleanup:
+	dev_err(&pdev->dev, "%s: failed: %d. Cleaning Up.\n", __func__, err);
+	f81601_pci_del_card(pdev);
+
+	return err;
+}
+
+static struct pci_driver f81601_pci_driver = {
+	.name =		"f81601",
+	.id_table =	f81601_pci_tbl,
+	.probe =	f81601_pci_add_card,
+	.remove =	f81601_pci_del_card,
+};
+
+MODULE_DESCRIPTION("Fintek F81601 PCIE to 2 CANBUS adaptor driver");
+MODULE_AUTHOR("Peter Hong <peter_hong@fintek.com.tw>");
+MODULE_LICENSE("GPL v2");
+
+module_pci_driver(f81601_pci_driver);
-- 
2.7.4


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox