Netdev List
 help / color / mirror / Atom feed
* [PATCH -next 0/2] Support riscv jit to provide bpf_line_info
From: Pu Lehui @ 2022-04-26 14:09 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev, linux-kernel
  Cc: bjorn, luke.r.nels, xi.wang, ast, daniel, andrii, kafai,
	songliubraving, yhs, john.fastabend, kpsingh, paul.walmsley,
	palmer, aou, pulehui

patch 1 fix an issue that could not print bpf line info due
to data inconsistency in 32-bit environment.

patch 2 add support for riscv jit to provide bpf_line_info.
Both RV32 and RV64 tests have been passed as like follow:

./test_progs -a btf
#19 btf:OK
Summary: 1/215 PASSED, 0 SKIPPED, 0 FAILED

Pu Lehui (2):
  bpf: Unify data extension operation of jited_ksyms and jited_linfo
  riscv, bpf: Support riscv jit to provide bpf_line_info

 arch/riscv/net/bpf_jit.h                     |  1 +
 arch/riscv/net/bpf_jit_core.c                |  7 ++++++-
 kernel/bpf/syscall.c                         |  5 ++++-
 tools/lib/bpf/bpf_prog_linfo.c               |  8 ++++----
 tools/testing/selftests/bpf/prog_tests/btf.c | 18 +++++++++---------
 5 files changed, 24 insertions(+), 15 deletions(-)

-- 
2.25.1


^ permalink raw reply

* [PATCH -next 2/2] riscv, bpf: Support riscv jit to provide bpf_line_info
From: Pu Lehui @ 2022-04-26 14:09 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev, linux-kernel
  Cc: bjorn, luke.r.nels, xi.wang, ast, daniel, andrii, kafai,
	songliubraving, yhs, john.fastabend, kpsingh, paul.walmsley,
	palmer, aou, pulehui
In-Reply-To: <20220426140924.3308472-1-pulehui@huawei.com>

Add support for riscv jit to provide bpf_line_info.
We need to consider the prologue offset in ctx->offset,
but unlike x86 and arm64, ctx->offset of riscv does not
provide an extra slot for the prologue, so here we just
calculate the len of prologue and add it to ctx->offset
at the end. Both RV64 and RV32 have been tested.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 arch/riscv/net/bpf_jit.h      | 1 +
 arch/riscv/net/bpf_jit_core.c | 7 ++++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/net/bpf_jit.h b/arch/riscv/net/bpf_jit.h
index 2a3715bf29fe..7dbbad7595f0 100644
--- a/arch/riscv/net/bpf_jit.h
+++ b/arch/riscv/net/bpf_jit.h
@@ -69,6 +69,7 @@ struct rv_jit_context {
 	struct bpf_prog *prog;
 	u16 *insns;		/* RV insns */
 	int ninsns;
+	int prologue_offset;
 	int epilogue_offset;
 	int *offset;		/* BPF to RV */
 	int nexentries;
diff --git a/arch/riscv/net/bpf_jit_core.c b/arch/riscv/net/bpf_jit_core.c
index be743d700aa7..6383eb591b0d 100644
--- a/arch/riscv/net/bpf_jit_core.c
+++ b/arch/riscv/net/bpf_jit_core.c
@@ -44,7 +44,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	unsigned int prog_size = 0, extable_size = 0;
 	bool tmp_blinded = false, extra_pass = false;
 	struct bpf_prog *tmp, *orig_prog = prog;
-	int pass = 0, prev_ninsns = 0, i;
+	int pass = 0, prev_ninsns = 0, prologue_len, i;
 	struct rv_jit_data *jit_data;
 	struct rv_jit_context *ctx;
 
@@ -95,6 +95,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 			prog = orig_prog;
 			goto out_offset;
 		}
+		ctx->prologue_offset = ctx->ninsns;
 		bpf_jit_build_prologue(ctx);
 		ctx->epilogue_offset = ctx->ninsns;
 		bpf_jit_build_epilogue(ctx);
@@ -161,6 +162,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 
 	if (!prog->is_func || extra_pass) {
 		bpf_jit_binary_lock_ro(jit_data->header);
+		prologue_len = ctx->epilogue_offset - ctx->prologue_offset;
+		for (i = 0; i < prog->len; i++)
+			ctx->offset[i] = ninsns_rvoff(prologue_len + ctx->offset[i]);
+		bpf_prog_fill_jited_linfo(prog, ctx->offset);
 out_offset:
 		kfree(ctx->offset);
 		kfree(jit_data);
-- 
2.25.1


^ permalink raw reply related

* [PATCH -next 1/2] bpf: Unify data extension operation of jited_ksyms and jited_linfo
From: Pu Lehui @ 2022-04-26 14:09 UTC (permalink / raw)
  To: bpf, linux-riscv, netdev, linux-kernel
  Cc: bjorn, luke.r.nels, xi.wang, ast, daniel, andrii, kafai,
	songliubraving, yhs, john.fastabend, kpsingh, paul.walmsley,
	palmer, aou, pulehui
In-Reply-To: <20220426140924.3308472-1-pulehui@huawei.com>

We found that 32-bit environment can not print bpf line info due
to data inconsistency between jited_ksyms[0] and jited_linfo[0].

For example:
jited_kyms[0] = 0xb800067c, jited_linfo[0] = 0xffffffffb800067c

We know that both of them store bpf func address, but due to the
different data extension operations when extended to u64, they may
not be the same. We need to unify the data extension operations of
them.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 kernel/bpf/syscall.c                         |  5 ++++-
 tools/lib/bpf/bpf_prog_linfo.c               |  8 ++++----
 tools/testing/selftests/bpf/prog_tests/btf.c | 18 +++++++++---------
 3 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e9621cfa09f2..4c417c806d92 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3868,13 +3868,16 @@ static int bpf_prog_get_info_by_fd(struct file *file,
 		info.nr_jited_line_info = 0;
 	if (info.nr_jited_line_info && ulen) {
 		if (bpf_dump_raw_ok(file->f_cred)) {
+			unsigned long jited_linfo_addr;
 			__u64 __user *user_linfo;
 			u32 i;
 
 			user_linfo = u64_to_user_ptr(info.jited_line_info);
 			ulen = min_t(u32, info.nr_jited_line_info, ulen);
 			for (i = 0; i < ulen; i++) {
-				if (put_user((__u64)(long)prog->aux->jited_linfo[i],
+				jited_linfo_addr = (unsigned long)
+					prog->aux->jited_linfo[i];
+				if (put_user((__u64) jited_linfo_addr,
 					     &user_linfo[i]))
 					return -EFAULT;
 			}
diff --git a/tools/lib/bpf/bpf_prog_linfo.c b/tools/lib/bpf/bpf_prog_linfo.c
index 5c503096ef43..5cf41a563ef5 100644
--- a/tools/lib/bpf/bpf_prog_linfo.c
+++ b/tools/lib/bpf/bpf_prog_linfo.c
@@ -127,7 +127,7 @@ struct bpf_prog_linfo *bpf_prog_linfo__new(const struct bpf_prog_info *info)
 	prog_linfo->raw_linfo = malloc(data_sz);
 	if (!prog_linfo->raw_linfo)
 		goto err_free;
-	memcpy(prog_linfo->raw_linfo, (void *)(long)info->line_info, data_sz);
+	memcpy(prog_linfo->raw_linfo, (void *)(unsigned long)info->line_info, data_sz);
 
 	nr_jited_func = info->nr_jited_ksyms;
 	if (!nr_jited_func ||
@@ -148,7 +148,7 @@ struct bpf_prog_linfo *bpf_prog_linfo__new(const struct bpf_prog_info *info)
 	if (!prog_linfo->raw_jited_linfo)
 		goto err_free;
 	memcpy(prog_linfo->raw_jited_linfo,
-	       (void *)(long)info->jited_line_info, data_sz);
+	       (void *)(unsigned long)info->jited_line_info, data_sz);
 
 	/* Number of jited_line_info per jited func */
 	prog_linfo->nr_jited_linfo_per_func = malloc(nr_jited_func *
@@ -166,8 +166,8 @@ struct bpf_prog_linfo *bpf_prog_linfo__new(const struct bpf_prog_info *info)
 		goto err_free;
 
 	if (dissect_jited_func(prog_linfo,
-			       (__u64 *)(long)info->jited_ksyms,
-			       (__u32 *)(long)info->jited_func_lens))
+			       (__u64 *)(unsigned long)info->jited_ksyms,
+			       (__u32 *)(unsigned long)info->jited_func_lens))
 		goto err_free;
 
 	return prog_linfo;
diff --git a/tools/testing/selftests/bpf/prog_tests/btf.c b/tools/testing/selftests/bpf/prog_tests/btf.c
index 84aae639ddb5..d9ba1ec1d5b3 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf.c
@@ -6451,8 +6451,8 @@ static int test_get_linfo(const struct prog_info_raw_test *test,
 		  info.nr_jited_line_info, jited_cnt,
 		  info.line_info_rec_size, rec_size,
 		  info.jited_line_info_rec_size, jited_rec_size,
-		  (void *)(long)info.line_info,
-		  (void *)(long)info.jited_line_info)) {
+		  (void *)(unsigned long)info.line_info,
+		  (void *)(unsigned long)info.jited_line_info)) {
 		err = -1;
 		goto done;
 	}
@@ -6500,8 +6500,8 @@ static int test_get_linfo(const struct prog_info_raw_test *test,
 	}
 
 	if (CHECK(jited_linfo[0] != jited_ksyms[0],
-		  "jited_linfo[0]:%lx != jited_ksyms[0]:%lx",
-		  (long)(jited_linfo[0]), (long)(jited_ksyms[0]))) {
+		  "jited_linfo[0]:%llx != jited_ksyms[0]:%llx",
+		  jited_linfo[0], jited_ksyms[0])) {
 		err = -1;
 		goto done;
 	}
@@ -6519,16 +6519,16 @@ static int test_get_linfo(const struct prog_info_raw_test *test,
 		}
 
 		if (CHECK(jited_linfo[i] <= jited_linfo[i - 1],
-			  "jited_linfo[%u]:%lx <= jited_linfo[%u]:%lx",
-			  i, (long)jited_linfo[i],
-			  i - 1, (long)(jited_linfo[i - 1]))) {
+			  "jited_linfo[%u]:%llx <= jited_linfo[%u]:%llx",
+			  i, jited_linfo[i],
+			  i - 1, (jited_linfo[i - 1]))) {
 			err = -1;
 			goto done;
 		}
 
 		if (CHECK(jited_linfo[i] - cur_func_ksyms > cur_func_len,
-			  "jited_linfo[%u]:%lx - %lx > %u",
-			  i, (long)jited_linfo[i], (long)cur_func_ksyms,
+			  "jited_linfo[%u]:%llx - %llx > %u",
+			  i, jited_linfo[i], cur_func_ksyms,
 			  cur_func_len)) {
 			err = -1;
 			goto done;
-- 
2.25.1


^ permalink raw reply related

* Re: net: stmmac: dwmac-imx: half duplex crash
From: Andrew Lunn @ 2022-04-26 13:36 UTC (permalink / raw)
  To: Marcel Ziswiler
  Cc: linux-imx@nxp.com, peppe.cavallaro@st.com,
	linux-stm32@st-md-mailman.stormreply.com, davem@davemloft.net,
	linux-kernel@vger.kernel.org, pabeni@redhat.com,
	shawnguo@kernel.org, joabreu@synopsys.com, kernel@pengutronix.de,
	s.hauer@pengutronix.de, kuba@kernel.org,
	alexandre.torgue@foss.st.com, mcoquelin.stm32@gmail.com,
	netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	festevam@gmail.com
In-Reply-To: <8f8cdcf584c13faf8bcdc2abfdb62b09950ea652.camel@toradex.com>

> > Anyway, this is roughly there the check should go.
> 
> You mean it would need an additional check against advertising nothing?

I would check for a mode being requested which is not
supported. phydev->supported tells you what the MAC/PHY can actually
do. If there is a bit set which is not a member of that, return
EINVAL. I don't think the plumbing is there, but netlink ethtool
allows you to also return a text message via extack, so you could give
the user a bit more information, the link mode which is invalid.

> Well, we are gearing up on our automated testing infrastructure and asking my humble opinion on what exactly to
> test concerning the Ethernet subsystem I gave the brilliant suggestion to try each and every supported link
> mode (;-p). Which actually works just fine on every other hardware of ours just not the i.MX 8M Plus with the
> DWMAC IP (remember, even FEC MAC works). So for now this is not something a customer of ours has real trouble
> with but it raised some questions concerning whether or not and what exactly we do support...

So in practice, this should not happen. You don't advertise the half
modes, so you should never end up in a half mode. So it is not a
problem :-)

	Andrew

^ permalink raw reply

* Re: [PATCH net-next 1/2] net: add skb drop reasons to inet connect request
From: Eric Dumazet @ 2022-04-26 13:32 UTC (permalink / raw)
  To: Menglong Dong
  Cc: Jakub Kicinski, Steven Rostedt, Ingo Molnar, David Miller,
	Hideaki YOSHIFUJI, David Ahern, Paolo Abeni, benbjiang,
	flyingpeng, Menglong Dong, Martin KaFai Lau, Talal Ahmad,
	Kees Cook, mengensun, Dongli Zhang, LKML, netdev
In-Reply-To: <20220426080709.6504-2-imagedong@tencent.com>

On Tue, Apr 26, 2022 at 1:07 AM <menglong8.dong@gmail.com> wrote:
>
> From: Menglong Dong <imagedong@tencent.com>
>
> The 'conn_request()' in struct inet_connection_sock_af_ops is used to
> process connection requesting for TCP/DCCP. Take TCP for example, it
> is just 'tcp_v4_conn_request()'.
>
> When non-zero value is returned by 'tcp_v4_conn_request()', the skb
> will be freed by kfree_skb() and a 'reset' packet will be send.
> Otherwise, it will be freed normally.
>
> In this code path, 'consume_skb()' is used in many abnormal cases, such
> as the accept queue of the listen socket full, which should be
> 'kfree_skb()'.
>
> Therefore, we make a little change to the 'conn_request()' interface.
> When 0 is returned, we call 'consume_skb()' as usual; when negative is
> returned, we call 'kfree_skb()' and send a 'reset' as usual; when
> positive is returned, which has not happened yet, we do nothing, and
> skb will be freed in 'conn_request()'. Then, we can use drop reasons
> in 'conn_request()'.
>
> Following new drop reasons are added:
>
>   SKB_DROP_REASON_LISTENOVERFLOWS
>   SKB_DROP_REASON_TCP_REQQFULLDROP
>
> Reviewed-by: Jiang Biao <benbjiang@tencent.com>
> Reviewed-by: Hao Peng <flyingpeng@tencent.com>
> Signed-off-by: Menglong Dong <imagedong@tencent.com>
> ---
>  include/linux/skbuff.h     |  4 ++++
>  include/trace/events/skb.h |  2 ++
>  net/dccp/input.c           | 12 +++++-------
>  net/ipv4/tcp_input.c       | 21 +++++++++++++--------
>  net/ipv4/tcp_ipv4.c        |  3 ++-
>  5 files changed, 26 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 84d78df60453..f33b3636bbce 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -469,6 +469,10 @@ enum skb_drop_reason {
>         SKB_DROP_REASON_PKT_TOO_BIG,    /* packet size is too big (maybe exceed
>                                          * the MTU)
>                                          */
> +       SKB_DROP_REASON_LISTENOVERFLOWS, /* accept queue of the listen socket is full */
> +       SKB_DROP_REASON_TCP_REQQFULLDROP, /* request queue of the listen
> +                                          * socket is full
> +                                          */
>         SKB_DROP_REASON_MAX,
>  };
>
> diff --git a/include/trace/events/skb.h b/include/trace/events/skb.h
> index a477bf907498..de6c93670437 100644
> --- a/include/trace/events/skb.h
> +++ b/include/trace/events/skb.h
> @@ -80,6 +80,8 @@
>         EM(SKB_DROP_REASON_IP_INADDRERRORS, IP_INADDRERRORS)    \
>         EM(SKB_DROP_REASON_IP_INNOROUTES, IP_INNOROUTES)        \
>         EM(SKB_DROP_REASON_PKT_TOO_BIG, PKT_TOO_BIG)            \
> +       EM(SKB_DROP_REASON_LISTENOVERFLOWS, LISTENOVERFLOWS)    \
> +       EM(SKB_DROP_REASON_TCP_REQQFULLDROP, TCP_REQQFULLDROP)  \
>         EMe(SKB_DROP_REASON_MAX, MAX)
>
>  #undef EM
> diff --git a/net/dccp/input.c b/net/dccp/input.c
> index 2cbb757a894f..ed20dfe83f66 100644
> --- a/net/dccp/input.c
> +++ b/net/dccp/input.c
> @@ -574,8 +574,7 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
>         struct dccp_sock *dp = dccp_sk(sk);
>         struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb);
>         const int old_state = sk->sk_state;
> -       bool acceptable;
> -       int queued = 0;
> +       int err, queued = 0;
>
>         /*
>          *  Step 3: Process LISTEN state
> @@ -606,13 +605,12 @@ int dccp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
>                          */
>                         rcu_read_lock();
>                         local_bh_disable();
> -                       acceptable = inet_csk(sk)->icsk_af_ops->conn_request(sk, skb) >= 0;
> +                       err = inet_csk(sk)->icsk_af_ops->conn_request(sk, skb);
>                         local_bh_enable();
>                         rcu_read_unlock();
> -                       if (!acceptable)
> -                               return 1;
> -                       consume_skb(skb);
> -                       return 0;
> +                       if (!err)
> +                               consume_skb(skb);
> +                       return err < 0;
>                 }
>                 if (dh->dccph_type == DCCP_PKT_RESET)
>                         goto discard;
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index daff631b9486..e0bbbd624246 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -6411,7 +6411,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
>         struct inet_connection_sock *icsk = inet_csk(sk);
>         const struct tcphdr *th = tcp_hdr(skb);
>         struct request_sock *req;
> -       int queued = 0;
> +       int err, queued = 0;
>         bool acceptable;
>         SKB_DR(reason);
>
> @@ -6438,14 +6438,13 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
>                          */
>                         rcu_read_lock();
>                         local_bh_disable();
> -                       acceptable = icsk->icsk_af_ops->conn_request(sk, skb) >= 0;
> +                       err = icsk->icsk_af_ops->conn_request(sk, skb);
>                         local_bh_enable();
>                         rcu_read_unlock();
>
> -                       if (!acceptable)
> -                               return 1;
> -                       consume_skb(skb);
> -                       return 0;
> +                       if (!err)
> +                               consume_skb(skb);

Please, do not add more mess like that, where skb is either freed by
the callee or the caller.


> +                       return err < 0;

Where err is set to a negative value ?


>                 }
>                 SKB_DR_SET(reason, TCP_FLAGS);
>                 goto discard;
> @@ -6878,6 +6877,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
>         bool want_cookie = false;
>         struct dst_entry *dst;
>         struct flowi fl;
> +       SKB_DR(reason);
>
>         /* TW buckets are converted to open requests without
>          * limitations, they conserve resources and peer is
> @@ -6886,12 +6886,15 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
>         if ((net->ipv4.sysctl_tcp_syncookies == 2 ||
>              inet_csk_reqsk_queue_is_full(sk)) && !isn) {
>                 want_cookie = tcp_syn_flood_action(sk, rsk_ops->slab_name);
> -               if (!want_cookie)
> +               if (!want_cookie) {
> +                       SKB_DR_SET(reason, TCP_REQQFULLDROP);
>                         goto drop;
> +               }
>         }
>
>         if (sk_acceptq_is_full(sk)) {
>                 NET_INC_STATS(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
> +               SKB_DR_SET(reason, LISTENOVERFLOWS);
>                 goto drop;
>         }
>
> @@ -6947,6 +6950,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
>                          */
>                         pr_drop_req(req, ntohs(tcp_hdr(skb)->source),
>                                     rsk_ops->family);
> +                       SKB_DR_SET(reason, TCP_REQQFULLDROP);
>                         goto drop_and_release;
>                 }
>
> @@ -7006,7 +7010,8 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
>  drop_and_free:
>         __reqsk_free(req);
>  drop:
> +       kfree_skb_reason(skb, reason);

Ugh no, prefer "return reason" and leave to the caller the freeing part.

Your changes are too invasive and will hurt future backports.


>         tcp_listendrop(sk);
> -       return 0;
> +       return 1;
>  }
>  EXPORT_SYMBOL(tcp_conn_request);
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 157265aecbed..b8daf49f54a5 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1470,7 +1470,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>
>  drop:
>         tcp_listendrop(sk);
> -       return 0;

This return 0 meant : do not send reset.


> +       kfree_skb_reason(skb, SKB_DROP_REASON_IP_INADDRERRORS);

double kfree_skb() ?

> +       return 1;

-> send RESET

>  }
>  EXPORT_SYMBOL(tcp_v4_conn_request);
>
> --
> 2.36.0
>

I have a hard time understanding this patch.

Where is the related IPv6 change ?

I really wonder if you actually have tested this.

^ permalink raw reply

* Re: [PATCH 2/3] perf: arm-spe: Fix SPE events with phys addresses
From: Leo Yan @ 2022-04-26 13:19 UTC (permalink / raw)
  To: James Clark
  Cc: Timothy Hayes, linux-kernel, linux-perf-users, acme, John Garry,
	Will Deacon, Mathieu Poirier, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Martin KaFai Lau, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, linux-arm-kernel, netdev,
	bpf
In-Reply-To: <322009d2-330c-22d4-4075-eca2042f64e1@arm.com>

On Mon, Apr 25, 2022 at 10:12:36AM +0100, James Clark wrote:

[...]

> >> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> >> index 151cc38a171c..1a80151baed9 100644
> >> --- a/tools/perf/util/arm-spe.c
> >> +++ b/tools/perf/util/arm-spe.c
> >> @@ -1033,7 +1033,8 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
> >>  	memset(&attr, 0, sizeof(struct perf_event_attr));
> >>  	attr.size = sizeof(struct perf_event_attr);
> >>  	attr.type = PERF_TYPE_HARDWARE;
> >> -	attr.sample_type = evsel->core.attr.sample_type & PERF_SAMPLE_MASK;
> >> +	attr.sample_type = evsel->core.attr.sample_type &
> >> +				(PERF_SAMPLE_MASK | PERF_SAMPLE_PHYS_ADDR);
> > 
> > I verified this patch and I can confirm the physical address can be
> > dumped successfully.
> > 
> > I have a more general question, seems to me, we need to change the
> > macro PERF_SAMPLE_MASK in the file util/event.h as below, so
> > here doesn't need to 'or' the flag PERF_SAMPLE_PHYS_ADDR anymore.
> > 
> > @Arnaldo, @Jiri, could you confirm if this is the right way to move
> > forward?  I am not sure why PERF_SAMPLE_MASK doesn't contain the bit
> > PERF_SAMPLE_PHYS_ADDR in current code.
> 
> I think there is a reason that PERF_SAMPLE_MASK is a subset of all the
> bits. This comment below suggests it. Is it so the mask only includes fields
> that are 64bits? That makes the __evsel__sample_size() function a simple
> multiplication of a count of all the fields that are 64bits.

After reading code, seems the conclusion "a count of all the fields
that are 64bits" is not valid.  PERF_SAMPLE_MASK contains bits
PERF_SAMPLE_IP and PERF_SAMPLE_TID, the corresponding fields 'pid'
and 'tid' in the structure perf_sample are u32 type.

>   static int
>   perf_event__check_size(union perf_event *event, unsigned int sample_size)
>   {
> 	/*
> 	 * The evsel's sample_size is based on PERF_SAMPLE_MASK which includes
> 	 * up to PERF_SAMPLE_PERIOD.  After that overflow() must be used to
> 	 * check the format does not go past the end of the event.
> 	 */
> 	if (sample_size + sizeof(event->header) > event->header.size)
> 		return -EFAULT;

Okay, thanks for sharing the info, it does show that it's deliberately to
not contain all fields in PERF_SAMPLE_MASK.  If so, this patch is fine
for me:

Reviewed-by: Leo Yan <leo.yan@linaro.org>


> 	return 0;
>   }
> 
> Having said that, the mask was updated once to add PERF_SAMPLE_IDENTIFIER to
> it, so that comment is slightly out of date now.
> 
> 
> > 
> > diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
> > index cdd72e05fd28..c905ac32ebad 100644
> > --- a/tools/perf/util/event.h
> > +++ b/tools/perf/util/event.h
> > @@ -39,7 +39,7 @@ struct perf_event_attr;
> >          PERF_SAMPLE_TIME | PERF_SAMPLE_ADDR |          \
> >         PERF_SAMPLE_ID | PERF_SAMPLE_STREAM_ID |        \
> >          PERF_SAMPLE_CPU | PERF_SAMPLE_PERIOD |         \
> > -        PERF_SAMPLE_IDENTIFIER)
> > +        PERF_SAMPLE_IDENTIFIER | PERF_SAMPLE_PHYS_ADDR)
> > 
> > Thanks,
> > Leo
> > 
> >>  	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
> >>  			    PERF_SAMPLE_PERIOD | PERF_SAMPLE_DATA_SRC |
> >>  			    PERF_SAMPLE_WEIGHT | PERF_SAMPLE_ADDR;
> >> -- 
> >> 2.25.1
> >>

^ permalink raw reply

* [PATCH v2 net-next] ixgbe: add xdp frags support to ndo_xdp_xmit
From: Lorenzo Bianconi @ 2022-04-26 13:14 UTC (permalink / raw)
  To: netdev
  Cc: intel-wired-lan, davem, kuba, pabeni, jesse.brandeburg,
	anthony.l.nguyen, alice.michael, bpf, ast, daniel,
	lorenzo.bianconi, andrii, magnus.karlsson, jbrouer, toke

Add the capability to map non-linear xdp frames in XDP_TX and ndo_xdp_xmit
callback.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
Changes since v1:
- rebase on top of net-next
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 99 ++++++++++++-------
 1 file changed, 63 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index c4a4954aa317..8b84c9b2eecc 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2344,6 +2344,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			hard_start = page_address(rx_buffer->page) +
 				     rx_buffer->page_offset - offset;
 			xdp_prepare_buff(&xdp, hard_start, offset, size, true);
+			xdp_buff_clear_frags_flag(&xdp);
 #if (PAGE_SIZE > 4096)
 			/* At larger PAGE_SIZE, frame_sz depend on len size */
 			xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, size);
@@ -8571,57 +8572,83 @@ static u16 ixgbe_select_queue(struct net_device *dev, struct sk_buff *skb,
 int ixgbe_xmit_xdp_ring(struct ixgbe_ring *ring,
 			struct xdp_frame *xdpf)
 {
-	struct ixgbe_tx_buffer *tx_buffer;
-	union ixgbe_adv_tx_desc *tx_desc;
-	u32 len, cmd_type;
-	dma_addr_t dma;
-	u16 i;
-
-	len = xdpf->len;
+	struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);
+	u8 nr_frags = unlikely(xdp_frame_has_frags(xdpf)) ? sinfo->nr_frags : 0;
+	u16 i = 0, index = ring->next_to_use;
+	struct ixgbe_tx_buffer *tx_head = &ring->tx_buffer_info[index];
+	struct ixgbe_tx_buffer *tx_buff = tx_head;
+	union ixgbe_adv_tx_desc *tx_desc = IXGBE_TX_DESC(ring, index);
+	u32 cmd_type, len = xdpf->len;
+	void *data = xdpf->data;
 
-	if (unlikely(!ixgbe_desc_unused(ring)))
+	if (unlikely(ixgbe_desc_unused(ring) < 1 + nr_frags))
 		return IXGBE_XDP_CONSUMED;
 
-	dma = dma_map_single(ring->dev, xdpf->data, len, DMA_TO_DEVICE);
-	if (dma_mapping_error(ring->dev, dma))
-		return IXGBE_XDP_CONSUMED;
+	tx_head->bytecount = xdp_get_frame_len(xdpf);
+	tx_head->gso_segs = 1;
+	tx_head->xdpf = xdpf;
 
-	/* record the location of the first descriptor for this packet */
-	tx_buffer = &ring->tx_buffer_info[ring->next_to_use];
-	tx_buffer->bytecount = len;
-	tx_buffer->gso_segs = 1;
-	tx_buffer->protocol = 0;
+	tx_desc->read.olinfo_status =
+		cpu_to_le32(tx_head->bytecount << IXGBE_ADVTXD_PAYLEN_SHIFT);
 
-	i = ring->next_to_use;
-	tx_desc = IXGBE_TX_DESC(ring, i);
+	for (;;) {
+		dma_addr_t dma;
 
-	dma_unmap_len_set(tx_buffer, len, len);
-	dma_unmap_addr_set(tx_buffer, dma, dma);
-	tx_buffer->xdpf = xdpf;
+		dma = dma_map_single(ring->dev, data, len, DMA_TO_DEVICE);
+		if (dma_mapping_error(ring->dev, dma))
+			goto unmap;
 
-	tx_desc->read.buffer_addr = cpu_to_le64(dma);
+		dma_unmap_len_set(tx_buff, len, len);
+		dma_unmap_addr_set(tx_buff, dma, dma);
+
+		cmd_type = IXGBE_ADVTXD_DTYP_DATA | IXGBE_ADVTXD_DCMD_DEXT |
+			   IXGBE_ADVTXD_DCMD_IFCS | len;
+		tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
+		tx_desc->read.buffer_addr = cpu_to_le64(dma);
+		tx_buff->protocol = 0;
+
+		if (++index == ring->count)
+			index = 0;
+
+		if (i == nr_frags)
+			break;
+
+		tx_buff = &ring->tx_buffer_info[index];
+		tx_desc = IXGBE_TX_DESC(ring, index);
+		tx_desc->read.olinfo_status = 0;
 
+		data = skb_frag_address(&sinfo->frags[i]);
+		len = skb_frag_size(&sinfo->frags[i]);
+		i++;
+	}
 	/* put descriptor type bits */
-	cmd_type = IXGBE_ADVTXD_DTYP_DATA |
-		   IXGBE_ADVTXD_DCMD_DEXT |
-		   IXGBE_ADVTXD_DCMD_IFCS;
-	cmd_type |= len | IXGBE_TXD_CMD;
-	tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
-	tx_desc->read.olinfo_status =
-		cpu_to_le32(len << IXGBE_ADVTXD_PAYLEN_SHIFT);
+	tx_desc->read.cmd_type_len |= cpu_to_le32(IXGBE_TXD_CMD);
 
 	/* Avoid any potential race with xdp_xmit and cleanup */
 	smp_wmb();
 
-	/* set next_to_watch value indicating a packet is present */
-	i++;
-	if (i == ring->count)
-		i = 0;
-
-	tx_buffer->next_to_watch = tx_desc;
-	ring->next_to_use = i;
+	tx_head->next_to_watch = tx_desc;
+	ring->next_to_use = index;
 
 	return IXGBE_XDP_TX;
+
+unmap:
+	for (;;) {
+		tx_buff = &ring->tx_buffer_info[index];
+		if (dma_unmap_len(tx_buff, len))
+			dma_unmap_page(ring->dev, dma_unmap_addr(tx_buff, dma),
+				       dma_unmap_len(tx_buff, len),
+				       DMA_TO_DEVICE);
+		dma_unmap_len_set(tx_buff, len, 0);
+		if (tx_buff == tx_head)
+			break;
+
+		if (!index)
+			index += ring->count;
+		index--;
+	}
+
+	return IXGBE_XDP_CONSUMED;
 }
 
 netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
-- 
2.35.1


^ permalink raw reply related

* Re: [PATCH v2 net-next] net: generalize skb freeing deferral to per-cpu lists
From: Eric Dumazet @ 2022-04-26 13:11 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev
In-Reply-To: <2c092f98a8fe1702173fe2b4999811dd2263faf3.camel@redhat.com>

On Tue, Apr 26, 2022 at 12:38 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
>
> Hello,
>
> I'm sorry for the late feedback. I have only a possibly relevant point
> below.
>
> On Fri, 2022-04-22 at 13:12 -0700, Eric Dumazet wrote:
> [...]
> > @@ -6571,6 +6577,28 @@ static int napi_threaded_poll(void *data)
> >       return 0;
> >  }
> >
> > +static void skb_defer_free_flush(struct softnet_data *sd)
> > +{
> > +     struct sk_buff *skb, *next;
> > +     unsigned long flags;
> > +
> > +     /* Paired with WRITE_ONCE() in skb_attempt_defer_free() */
> > +     if (!READ_ONCE(sd->defer_list))
> > +             return;
> > +
> > +     spin_lock_irqsave(&sd->defer_lock, flags);
> > +     skb = sd->defer_list;
>
> I *think* that this read can possibly be fused with the previous one,
> and another READ_ONCE() should avoid that.

Only the lockless read needs READ_ONCE()

For the one after spin_lock_irqsave(&sd->defer_lock, flags),
there is no need for any additional barrier.

If the compiler really wants to use multiple one-byte-at-a-time loads,
we are not going to fight, there might be good reasons for that.

(We do not want to spread READ_ONCE / WRITE_ONCE for all
loads/stores, as this has performance implications)

>
> BTW it looks like this version gives slightly better results than the
> previous one, perhpas due to the single-liked list usage?

Yes, this could be the case, or maybe it is because 10 runs are not enough
in a host with 32 RX queues, with a 50/50 split between the two NUMA nodes.

When reaching high throughput, every detail matters, like background usage on
the network, from monitoring and machine health daemons.

Thanks.

^ permalink raw reply

* Re: [PATCH net v3] virtio_net: fix wrong buf address calculation when using xdp
From: patchwork-bot+netdevbpf @ 2022-04-26 13:00 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, kuba, davem, stable, jasowang, xuanzhuo, daniel, mst,
	virtualization
In-Reply-To: <20220425103703.3067292-1-razor@blackwall.org>

Hello:

This patch was applied to netdev/net.git (master)
by Paolo Abeni <pabeni@redhat.com>:

On Mon, 25 Apr 2022 13:37:03 +0300 you wrote:
> We received a report[1] of kernel crashes when Cilium is used in XDP
> mode with virtio_net after updating to newer kernels. After
> investigating the reason it turned out that when using mergeable bufs
> with an XDP program which adjusts xdp.data or xdp.data_meta page_to_buf()
> calculates the build_skb address wrong because the offset can become less
> than the headroom so it gets the address of the previous page (-X bytes
> depending on how lower offset is):
>  page_to_skb: page addr ffff9eb2923e2000 buf ffff9eb2923e1ffc offset 252 headroom 256
> 
> [...]

Here is the summary with links:
  - [net,v3] virtio_net: fix wrong buf address calculation when using xdp
    https://git.kernel.org/netdev/net/c/acb16b395c3f

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* [PATCH net] SUNRPC: Fix local socket leak in xs_local_setup_socket()
From: Wang Hai @ 2022-04-26 13:20 UTC (permalink / raw)
  To: trond.myklebust, anna, chuck.lever, davem, kuba, pabeni
  Cc: linux-nfs, netdev, linux-kernel, wanghai38

If the connection to a local endpoint in xs_local_setup_socket() fails,
fput() is missing in the error path, which will result in a socket leak.
It can be reproduced in simple script below.

while true
do
        systemctl stop rpcbind.service
        systemctl stop rpc-statd.service
        systemctl stop nfs-server.service

        systemctl restart rpcbind.service
        systemctl restart rpc-statd.service
        systemctl restart nfs-server.service
done

When executing the script, you can observe that the
"cat /proc/net/unix | wc -l" count keeps growing.

Add the missing fput(), and restore transport to old socket.

Signed-off-by: Wang Hai <wanghai38@huawei.com>
---
 net/sunrpc/xprtsock.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 0f39e08ee580..7219c545385e 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1819,6 +1819,9 @@ static int xs_local_finish_connecting(struct rpc_xprt *xprt,
 {
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt,
 									xprt);
+	struct socket *trans_sock = NULL;
+	struct sock *trans_inet = NULL;
+	int ret;
 
 	if (!transport->inet) {
 		struct sock *sk = sock->sk;
@@ -1835,6 +1838,9 @@ static int xs_local_finish_connecting(struct rpc_xprt *xprt,
 
 		xprt_clear_connected(xprt);
 
+		trans_sock = transport->sock;
+		trans_inet = transport->inet;
+
 		/* Reset to new socket */
 		transport->sock = sock;
 		transport->inet = sk;
@@ -1844,7 +1850,14 @@ static int xs_local_finish_connecting(struct rpc_xprt *xprt,
 
 	xs_stream_start_connect(transport);
 
-	return kernel_connect(sock, xs_addr(xprt), xprt->addrlen, 0);
+	ret = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, 0);
+	/* Restore to old socket */
+	if (ret && trans_inet) {
+		transport->sock = trans_sock;
+		transport->inet = trans_inet;
+	}
+
+	return ret;
 }
 
 /**
@@ -1887,7 +1900,7 @@ static int xs_local_setup_socket(struct sock_xprt *transport)
 		xprt->stat.connect_time += (long)jiffies -
 					   xprt->stat.connect_start;
 		xprt_set_connected(xprt);
-		break;
+		goto out;
 	case -ENOBUFS:
 		break;
 	case -ENOENT:
@@ -1904,6 +1917,9 @@ static int xs_local_setup_socket(struct sock_xprt *transport)
 				xprt->address_strings[RPC_DISPLAY_ADDR]);
 	}
 
+	transport->file = NULL;
+	fput(filp);
+
 out:
 	xprt_clear_connecting(xprt);
 	xprt_wake_pending_tasks(xprt, status);
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH net-next] net: mark tulip obsolete
From: Jakub Kicinski @ 2022-04-26 12:53 UTC (permalink / raw)
  To: Helge Deller; +Cc: davem, netdev, linux-parisc
In-Reply-To: <a66551f3-192a-70dc-4eb9-62090dbfe5fb@gmx.de>

On Tue, 15 Mar 2022 23:18:38 +0100 Helge Deller wrote:
> On 3/15/22 20:04, Jakub Kicinski wrote:
> > On Tue, 15 Mar 2022 19:44:24 +0100 Helge Deller wrote:  
> >> On 3/15/22 19:43, Jakub Kicinski wrote:  
> >>> It's ancient, an likely completely unused at this point.
> >>> Let's mark it obsolete to prevent refactoring.  
> >>
> >> NAK.
> >>
> >> This driver is needed by nearly all PA-RISC machines.  
> >
> > I was just trying to steer newcomers to code that's more relevant today.  
> 
> That intention is ok, but "obsolete" means it's not used any more,
> and that's not true.

Hi Helge! Which incarnation of tulip do you need for PA-RISC, exactly?
I'd like to try to remove DE4X5, if that's not the one you need
(getting rid of virt_to_bus()-using drivers).

^ permalink raw reply

* Re: [linux-next:master] BUILD REGRESSION e7d6987e09a328d4a949701db40ef63fbb970670
From: Jiri Pirko @ 2022-04-26 12:41 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: kernel test robot, Andrew Morton, netdev, Ido Schimmel
In-Reply-To: <20220426051716.7fc4b9c1@kernel.org>

Tue, Apr 26, 2022 at 02:17:16PM CEST, kuba@kernel.org wrote:
>On Tue, 26 Apr 2022 13:42:04 +0800 kernel test robot wrote:
>> drivers/net/ethernet/mellanox/mlxsw/core_linecards.c:851:8: warning: Use of memory after it is freed [clang-analyzer-unix.Malloc]
>
>Hi Ido, Jiri,
>
>is this one on your radar?

Will send a fix for this, thanks.


^ permalink raw reply

* Re: [PATCH net-next 00/11] mlxsw: extend line card model by devices and info
From: Jakub Kicinski @ 2022-04-26 12:41 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Ido Schimmel, Ido Schimmel, netdev, davem, pabeni, jiri, petrm,
	dsahern, andrew, mlxsw
In-Reply-To: <YmeXyzumj1oTSX+x@nanopsycho>

On Tue, 26 Apr 2022 08:57:15 +0200 Jiri Pirko wrote:
> >> In this particular case, these devices are gearboxes. They are running
> >> their own firmware and we want user space to be able to query and update
> >> the running firmware version.  
> >
> >Nothing too special, then, we don't create "devices" for every
> >component of the system which can have a separate FW. That's where
> >"components" are intended to be used..  
> 
> *
> Sure, that is why I re-used components :)

Well, right, I guess you did reuse them a little :)

> But you have to somehow link the component to the particular gearbox on
> particular line card. Say, you need to flash GB on line card 8. This is
> basically providing a way to expose this relationship to user.
> 
> Also, the "lc info" shows the FW version for gearboxes. As Ido
> mentioned, the GB versions could be listed in "devlink dev info" in
> theory. But then, you need to somehow expose the relationship with
> line card as well.

Why would the automation which comes to update the firmware care 
at all where the component is? Humans can see what the component 
is by looking at the name.

If we do need to know (*if*!) you can list FW components as a lc
attribute, no need for new commands and objects.

IMHO we should either keep lc objects simple and self contained or 
give them a devlink instance. Creating sub-objects from them is very
worrying. If there is _any_ chance we'll need per-lc health reporters 
or sbs or params(🤢) etc. etc. - let's bite the bullet _now_ and create
full devlink sub-instances!

> I don't see any simpler iface than this.

Based on the assumptions you've made, maybe, but the uAPI should
abstract away the irrelevant details. I'm questioning the assumptions.

> >> The idea (implemented in the next patchset) is to let these devices
> >> expose their own "component name", which can then be plugged into
> >> the existing flash command:
> >> 
> >>     $ devlink lc show pci/0000:01:00.0 lc 8
> >>     pci/0000:01:00.0:
> >>       lc 8 state active type 16x100G
> >>         supported_types:
> >>            16x100G
> >>         devices:
> >>           device 0 flashable true component lc8_dev0
> >>           device 1 flashable false
> >>           device 2 flashable false
> >>           device 3 flashable false
> >>     $ devlink dev flash pci/0000:01:00.0 file some_file.mfa2
> >> component lc8_dev0  
> >
> >IDK if it's just me or this assumes deep knowledge of the system.
> >I don't understand why we need to list devices 1-3 at all. And they
> >don't even have names. No information is exposed.   
> 
> There are 4 gearboxes on the line card. They share the same flash. So
> if you flash gearbox 0, the rest will use the same FW.

o_0 so the FW component is called lcX_dev0 and yet it applies to _all_
devices, not just dev0?! Looking at the output above I thought other
devices simply don't have FW ("flashable false").

> I'm exposing them for the sake of completeness. Also, the interface
> needs to be designed as a list anyway, as different line cards may
> have separate flash per gearbox.
> 
> What's is the harm in exposing devices 1-3? If you insist, we can hide
> them.

Well, they are unnecessary (this is uAPI), and coming from the outside
I misinterpreted what the information reported means, so yeah, I'd
classify it as harmful :(

^ permalink raw reply

* Re: [PATCH net-next 00/11] mlxsw: extend line card model by devices and info
From: Jiri Pirko @ 2022-04-26 12:41 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Ido Schimmel, Jakub Kicinski, Ido Schimmel, netdev, davem, pabeni,
	jiri, petrm, dsahern, mlxsw
In-Reply-To: <YmflStBQCrzP8E6t@lunn.ch>

Tue, Apr 26, 2022 at 02:27:54PM CEST, andrew@lunn.ch wrote:
>> It is not a separate devlink device, not removetely. A devlink device is
>> attached to some bus on the host, it contains entities like netdevices,
>> etc.
>> 
>> Line card devices, on contrary, are accessible over ASIC FW interface,
>> they reside on line cards. ASIC FW is using build-in SDK to communicate
>> with them. There is really nothing to expose, except for the face they
>> are there, with some FW version and later on (follow-up patchset) to be
>> able to flash FW on them.
>
>But isn't this just an implementation detail?
>
>Say the flash was directly accessible to the host? It is just another
>mtd devices? The gearbox is just another bunch of MMIO registers. You

Not really, the ASIC FW is also not mtd device. ASIC FW exposes set of
registers to work with the ASIC FW flash. The linecard gearbox
implements the same set of registers, tunneled over MDDT register.

Also, the gearbox is not accessable from the host. The ASIC FW is the
only one able to access it.


>can access the SFP socket via a host i2c bus, etc. More of a SoC like
>implementation, which the enterprise routers are like.
>
>This is a completely different set of implementation details, but i
>still have the same basic building blocks. Should it look the same,
>and the implementation details are hidden, or do you want to expose
>your implementation details?

Well, I got your point. If the HW would be designed in the way the
building blocks are exposed to the host, that would work. However, that
is not the case here, unfortunatelly.


>
>	Andrew

^ permalink raw reply

* [PATCH] net: fec: add missing of_node_put() in fec_enet_init_stop_mode()
From: Yang Yingliang @ 2022-04-26 12:52 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: qiangqing.zhang, fugang.duan, davem, kuba

Put device node in error path in fec_enet_init_stop_mode().

Fixes: 8a448bf832af ("net: ethernet: fec: move GPR register offset and bit into DT")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 drivers/net/ethernet/freescale/fec_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 11227f51404c..9f33ec838b52 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3731,7 +3731,7 @@ static int fec_enet_init_stop_mode(struct fec_enet_private *fep,
 					 ARRAY_SIZE(out_val));
 	if (ret) {
 		dev_dbg(&fep->pdev->dev, "no stop mode property\n");
-		return ret;
+		goto out;
 	}
 
 	fep->stop_gpr.gpr = syscon_node_to_regmap(gpr_np);
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH net-next 00/11] mlxsw: extend line card model by devices and info
From: Jiri Pirko @ 2022-04-26 12:36 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, Ido Schimmel, Ido Schimmel, netdev, davem, pabeni,
	jiri, petrm, dsahern, mlxsw
In-Reply-To: <Ymfhf9aS0Rc/rhN2@lunn.ch>

Tue, Apr 26, 2022 at 02:11:43PM CEST, andrew@lunn.ch wrote:
>> >> The idea (implemented in the next patchset) is to let these devices
>> >> expose their own "component name", which can then be plugged into the
>> >> existing flash command:
>> >> 
>> >>     $ devlink lc show pci/0000:01:00.0 lc 8
>> >>     pci/0000:01:00.0:
>> >>       lc 8 state active type 16x100G
>> >>         supported_types:
>> >>            16x100G
>> >>         devices:
>> >>           device 0 flashable true component lc8_dev0
>> >>           device 1 flashable false
>> >>           device 2 flashable false
>> >>           device 3 flashable false
>> >>     $ devlink dev flash pci/0000:01:00.0 file some_file.mfa2 component lc8_dev0
>> >
>> >IDK if it's just me or this assumes deep knowledge of the system.
>> >I don't understand why we need to list devices 1-3 at all. And they
>> >don't even have names. No information is exposed. 
>> 
>> There are 4 gearboxes on the line card. They share the same flash. So if
>> you flash gearbox 0, the rest will use the same FW.
>
>Is the flash big enough to hold four copies of the firmware? Listing

One copy, all 4 are using it.


>all four devices gives you a path forward to allowing each device to
>have its own firmware.

Yes.

>
>On the flip side, flashable false is not really true. I don't know
>this API at all, but are there any flags? Can you at least indicate it
>is shared? You could then have an output something like:

Yes, that I can do.

>
>           device 0 flashable true shared component lc8_dev0
>           device 1 flashable true shared component lc8_dev0
>           device 2 flashable true shared component lc8_dev0
>           device 3 flashable true shared component lc8_dev0
>
>so you get to see the sharing relationship.

Agreed.

>
>   Andrew

^ permalink raw reply

* [PATCH -next] net: cpsw: add missing of_node_put() in cpsw_probe_dt()
From: Yang Yingliang @ 2022-04-26 12:47 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: grygorii.strashko, davem, kuba

If devm_kcalloc() fails, 'tmp_node' should be put in cpsw_probe_dt().

Fixes: ed3525eda4c4 ("net: ethernet: ti: introduce cpsw switchdev based driver part 1 - dual-emac")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
---
 drivers/net/ethernet/ti/cpsw_new.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c
index bd4b1528cf99..b81179f7d738 100644
--- a/drivers/net/ethernet/ti/cpsw_new.c
+++ b/drivers/net/ethernet/ti/cpsw_new.c
@@ -1246,8 +1246,10 @@ static int cpsw_probe_dt(struct cpsw_common *cpsw)
 	data->slave_data = devm_kcalloc(dev, CPSW_SLAVE_PORTS_NUM,
 					sizeof(struct cpsw_slave_data),
 					GFP_KERNEL);
-	if (!data->slave_data)
+	if (!data->slave_data) {
+		of_node_put(tmp_node);
 		return -ENOMEM;
+	}
 
 	/* Populate all the child nodes here...
 	 */
-- 
2.25.1


^ permalink raw reply related

* Re: net: stmmac: dwmac-imx: half duplex crash
From: Marcel Ziswiler @ 2022-04-26 12:32 UTC (permalink / raw)
  To: andrew@lunn.ch
  Cc: linux-imx@nxp.com, peppe.cavallaro@st.com,
	linux-stm32@st-md-mailman.stormreply.com, davem@davemloft.net,
	linux-kernel@vger.kernel.org, pabeni@redhat.com,
	shawnguo@kernel.org, joabreu@synopsys.com, kernel@pengutronix.de,
	s.hauer@pengutronix.de, kuba@kernel.org,
	alexandre.torgue@foss.st.com, mcoquelin.stm32@gmail.com,
	netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	festevam@gmail.com
In-Reply-To: <YmbFblFCrGFND+h/@lunn.ch>

On Mon, 2022-04-25 at 17:59 +0200, Andrew Lunn wrote:
> > Good point. I was blinded by NXP downstream which, while listing all incl. 10baseT/Half and 100baseT/Half
> > as
> > supported link modes, also does not work. However, upstream indeed shows only full-duplex modes as
> > supported:
> > 
> > root@verdin-imx8mp-07106916:~# ethtool eth1
> > Settings for eth1:
> >         Supported ports: [ TP MII ]
> >         Supported link modes:   10baseT/Full 
> >                                 100baseT/Full 
> >                                 1000baseT/Full 
> 
> So maybe we actually want ethtool to report -EINVAL when asked to do
> something which is not supported! Humm:
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/phy.c#L783
> 
> 
>         /* We make sure that we don't pass unsupported values in to the PHY */
>         linkmode_and(advertising, advertising, phydev->supported);
> 
> So maybe the unsupported mode got removed, and the PHY was asked to
> advertise nothing!

Yeah, that's also what I was suspecting.

And running ethtool again after the crash kinda supports this theory.

root@verdin-imx8mp-07106916:~# ethtool -s eth1 advertise 0x01

=> crash

root@verdin-imx8mp-07106916:~# ethtool eth1
Settings for eth1:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Full 
                                100baseT/Full 
                                1000baseT/Full 
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  Not reported
                                ^^^^^^^^^^^^
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Speed: Unknown!
        Duplex: Unknown! (255)
        Port: Twisted Pair
        PHYAD: 7
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: Unknown
        Supports Wake-on: ug
        Wake-on: d
        Current message level: 0x0000003f (63)
                               drv probe link timer ifdown ifup
        Link detected: no

> Anyway, this is roughly there the check should go.

You mean it would need an additional check against advertising nothing?

> > ...
> > 
> > Once I remove them queues being setup via device tree it shows all modes as supported again:
> > 
> > root@verdin-imx8mp-07106916:~# ethtool eth1
> > Settings for eth1:
> >         Supported ports: [ TP MII ]
> >         Supported link modes:   10baseT/Half 10baseT/Full 
> >                                 100baseT/Half 100baseT/Full 
> >                                 1000baseT/Full 
> > ...
> > 
> > However, 10baseT/Half, while no longer just crashing, still does not seem to work right. Looking at
> > wireshark
> > traces it does send packets but seems not to ever get neither ARP nor DHCP answers (as well as any other
> > packet
> > for that matter).
> 
> So the answers are on the wire, just not received?

Yes, judging from the wireshark trace that is exactly how it looks.

> > Looks like the same actually applies to 10baseT/Full as well. While 100baseT/Half and
> > 100baseT/Full work fine now.
> > 
> > Any idea what else could still be going wrong with them 10baseT modes?
> 
> I would use mii-tool to check the status of the PHY. Make sure it
> really has negotiated 10/Half mode.

Yes, it looks like it.

root@verdin-imx8mp-07106916:~# mii-tool
eth0: negotiated 10baseT-HD, link ok
eth1: negotiated 10baseT-HD, link ok

As a matter of fact, the exact same KSZ9131RNXI PHY albeit on FEC MAC eth0 works just fine with 10Mbps half-
duplex.

> After that, it is very likely to
> be a MAC problem, and i don't think i can help you.

Sure, anyway, thanks again for all your suggestions so far. I hope somebody more familiar with the DWMAC side
of things might chime in now...

> > On a side note, besides modifying the device tree for such single-queue setup being half-duplex capable, is
> > there any easier way? Much nicer would, of course, be if it justworkedTM (e.g. advertise all modes but once
> > a
> > half-duplex mode is chosen revert to such single-queue operation). Then, on the other hand, who still uses
> > half-duplex communication in this day and age (;-p).
> 
> You seem to need it for some reason!

Well, we are gearing up on our automated testing infrastructure and asking my humble opinion on what exactly to
test concerning the Ethernet subsystem I gave the brilliant suggestion to try each and every supported link
mode (;-p). Which actually works just fine on every other hardware of ours just not the i.MX 8M Plus with the
DWMAC IP (remember, even FEC MAC works). So for now this is not something a customer of ours has real trouble
with but it raised some questions concerning whether or not and what exactly we do support...

> Anyway, it is just code. You have all the needed information in the
> adjust_link callback, so you could implement it.

Yeah, I guess that might be a neat little side project trying to get more into the topic. So far much of the
interaction within the networking subsystem in Linux is still rather a miracle to me...

>             Andrew

Cheers

Marcel

^ permalink raw reply

* Re: [PATCH net-next 00/11] mlxsw: extend line card model by devices and info
From: Andrew Lunn @ 2022-04-26 12:27 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Ido Schimmel, Jakub Kicinski, Ido Schimmel, netdev, davem, pabeni,
	jiri, petrm, dsahern, mlxsw
In-Reply-To: <YmeViVZ1XhCBCFLN@nanopsycho>

> It is not a separate devlink device, not removetely. A devlink device is
> attached to some bus on the host, it contains entities like netdevices,
> etc.
> 
> Line card devices, on contrary, are accessible over ASIC FW interface,
> they reside on line cards. ASIC FW is using build-in SDK to communicate
> with them. There is really nothing to expose, except for the face they
> are there, with some FW version and later on (follow-up patchset) to be
> able to flash FW on them.

But isn't this just an implementation detail?

Say the flash was directly accessible to the host? It is just another
mtd devices? The gearbox is just another bunch of MMIO registers. You
can access the SFP socket via a host i2c bus, etc. More of a SoC like
implementation, which the enterprise routers are like.

This is a completely different set of implementation details, but i
still have the same basic building blocks. Should it look the same,
and the implementation details are hidden, or do you want to expose
your implementation details?

	Andrew

^ permalink raw reply

* Re: [PATCHv2 bpf-next 1/4] kallsyms: Add kallsyms_lookup_names function
From: Jiri Olsa @ 2022-04-26 12:27 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: Jiri Olsa, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	netdev, bpf, lkml, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh
In-Reply-To: <20220426190108.d9c76f5ccff52e27dbef21af@kernel.org>

On Tue, Apr 26, 2022 at 07:01:08PM +0900, Masami Hiramatsu wrote:
> Hi Jiri,
> 
> Sorry for replying late.
> 
> On Fri, 22 Apr 2022 08:47:13 +0200
> Jiri Olsa <olsajiri@gmail.com> wrote:
> 
> > On Tue, Apr 19, 2022 at 10:26:05AM +0200, Jiri Olsa wrote:
> > 
> > SNIP
> > 
> > > > > +static int kallsyms_callback(void *data, const char *name,
> > > > > +			     struct module *mod, unsigned long addr)
> > > > > +{
> > > > > +	struct kallsyms_data *args = data;
> > > > > +
> > > > > +	if (!bsearch(&name, args->syms, args->cnt, sizeof(*args->syms), symbols_cmp))
> > > > > +		return 0;
> > > > > +
> > > > > +	addr = ftrace_location(addr);
> > > > > +	if (!addr)
> > > > > +		return 0;
> > > > 
> > > > Ooops, wait. Did you do this last version? I missed this point.
> > > > This changes the meanings of the kernel function.
> > > 
> > > yes, it was there before ;-) and you're right.. so some archs can
> > > return different address, I did not realize that
> > > 
> > > > 
> > > > > +
> > > > > +	args->addrs[args->found++] = addr;
> > > > > +	return args->found == args->cnt ? 1 : 0;
> > > > > +}
> > > > > +
> > > > > +/**
> > > > > + * kallsyms_lookup_names - Lookup addresses for array of symbols
> > > > 
> > > > More correctly "Lookup 'ftraced' addresses for array of sorted symbols", right?
> > > > 
> > > > I'm not sure, we can call it as a 'kallsyms' API, since this is using
> > > > kallsyms but doesn't return symbol address, but ftrace address.
> > > > I think this name misleads user to expect returning symbol address.
> > > > 
> > > > > + *
> > > > > + * @syms: array of symbols pointers symbols to resolve, must be
> > > > > + * alphabetically sorted
> > > > > + * @cnt: number of symbols/addresses in @syms/@addrs arrays
> > > > > + * @addrs: array for storing resulting addresses
> > > > > + *
> > > > > + * This function looks up addresses for array of symbols provided in
> > > > > + * @syms array (must be alphabetically sorted) and stores them in
> > > > > + * @addrs array, which needs to be big enough to store at least @cnt
> > > > > + * addresses.
> > > > 
> > > > Hmm, sorry I changed my mind. I rather like to expose kallsyms_on_each_symbol()
> > > > and provide this API from fprobe or ftrace, because this returns ftrace address
> > > > and thus this is only used from fprobe.
> > > 
> > > ok, so how about:
> > > 
> > >   int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt, unsigned long *addrs);
> > 
> > quick question.. is it ok if it stays in kalsyms.c object?
> 
> I think if this is for the ftrace API, I think it should be in the ftrace.c, and
> it can remove unneeded #ifdefs in C code.
> 
> > 
> > so we don't need to expose kallsyms_on_each_symbol,
> > and it stays in 'kalsyms' place
> 
> We don't need to expose it to modules, but just make it into a global scope.
> I don't think that doesn't cause a problem.

np, will move it to ftrace

thanks,
jirka

^ permalink raw reply

* Re: [linux-next:master] BUILD REGRESSION e7d6987e09a328d4a949701db40ef63fbb970670
From: Jakub Kicinski @ 2022-04-26 12:17 UTC (permalink / raw)
  To: kernel test robot; +Cc: Andrew Morton, netdev, Ido Schimmel, Jiri Pirko
In-Reply-To: <6267862c.xuehJN2IUHn8WMof%lkp@intel.com>

On Tue, 26 Apr 2022 13:42:04 +0800 kernel test robot wrote:
> drivers/net/ethernet/mellanox/mlxsw/core_linecards.c:851:8: warning: Use of memory after it is freed [clang-analyzer-unix.Malloc]

Hi Ido, Jiri,

is this one on your radar?

^ permalink raw reply

* Re: [PATCH net-next 0/5] net: ipqess: introduce Qualcomm IPQESS driver
From: Robert Marko @ 2022-04-26 12:12 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Maxime Chevallier, David Miller, Rob Herring, netdev,
	Linux Kernel Mailing List, devicetree, Thomas Petazzoni,
	Florian Fainelli, Heiner Kallweit, Russell King, Linux ARM,
	Vladimir Oltean, Luka Perkov
In-Reply-To: <YmMQMoMcO8uU2dKN@lunn.ch>

On Fri, Apr 22, 2022 at 10:29 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Fri, Apr 22, 2022 at 08:03:00PM +0200, Maxime Chevallier wrote:
> > Hello everyone,
> >
> > This series introduces a new driver, for the Qualcomm IPQESS Ethernet
> > Controller, found on the IPQ4019.
> >
> > The driver itself is pretty straightforward, but has lived out-of-tree
> > for a while. I've done my best to clean-up some outdated API calls, but
> > some might remain.
> >
> > This controller is somewhat special, since it's part of the IPQ4019 SoC
> > which also includes an QCA8K switch, and uses the IPQESS controller for
> > the CPU port.
>
> Does it exist in a form where it is not connected to a switch?

Hi Andrew, both the ethernet controller and the QCA8337N based switch are
part of the SoC silicon and are always present.
The ethernet controller is always connected to the switch port 0.

Regards,
Robert
>
> As Florian has suggested, if we assume frames are always going
> to/coming from a switch, we can play around with the frame format a
> little. A dummy tag could be added to the head or tail of the frame,
> which the MAC driver then uses. That gives us a more normal structure.
>
>       Andrew



-- 
Robert Marko
Staff Embedded Linux Engineer
Sartura Ltd.
Lendavska ulica 16a
10000 Zagreb, Croatia
Email: robert.marko@sartura.hr
Web: www.sartura.hr

^ permalink raw reply

* Re: [PATCH net-next 00/11] mlxsw: extend line card model by devices and info
From: Andrew Lunn @ 2022-04-26 12:11 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jakub Kicinski, Ido Schimmel, Ido Schimmel, netdev, davem, pabeni,
	jiri, petrm, dsahern, mlxsw
In-Reply-To: <YmeXyzumj1oTSX+x@nanopsycho>

> >> The idea (implemented in the next patchset) is to let these devices
> >> expose their own "component name", which can then be plugged into the
> >> existing flash command:
> >> 
> >>     $ devlink lc show pci/0000:01:00.0 lc 8
> >>     pci/0000:01:00.0:
> >>       lc 8 state active type 16x100G
> >>         supported_types:
> >>            16x100G
> >>         devices:
> >>           device 0 flashable true component lc8_dev0
> >>           device 1 flashable false
> >>           device 2 flashable false
> >>           device 3 flashable false
> >>     $ devlink dev flash pci/0000:01:00.0 file some_file.mfa2 component lc8_dev0
> >
> >IDK if it's just me or this assumes deep knowledge of the system.
> >I don't understand why we need to list devices 1-3 at all. And they
> >don't even have names. No information is exposed. 
> 
> There are 4 gearboxes on the line card. They share the same flash. So if
> you flash gearbox 0, the rest will use the same FW.

Is the flash big enough to hold four copies of the firmware? Listing
all four devices gives you a path forward to allowing each device to
have its own firmware.

On the flip side, flashable false is not really true. I don't know
this API at all, but are there any flags? Can you at least indicate it
is shared? You could then have an output something like:

           device 0 flashable true shared component lc8_dev0
           device 1 flashable true shared component lc8_dev0
           device 2 flashable true shared component lc8_dev0
           device 3 flashable true shared component lc8_dev0

so you get to see the sharing relationship.

   Andrew

^ permalink raw reply

* Re: [PATCH net V2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
From: Paolo Abeni @ 2022-04-26 11:17 UTC (permalink / raw)
  To: Duoming Zhou, krzysztof.kozlowski, linux-kernel
  Cc: davem, gregkh, alexander.deucher, broonie, akpm, netdev, linma
In-Reply-To: <20220425095838.100882-1-duoming@zju.edu.cn>

On Mon, 2022-04-25 at 17:58 +0800, Duoming Zhou wrote:
> There are destructive operations such as nfcmrvl_fw_dnld_abort and
> gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
> gpio and so on could be destructed while the upper layer functions such as
> nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
> to double-free, use-after-free and null-ptr-deref bugs.
> 
> There are three situations that could lead to double-free bugs.
> 
> The first situation is shown below:
> 
>    (Thread 1)                 |      (Thread 2)
> nfcmrvl_fw_dnld_start         |
>  ...                          |  nfcmrvl_nci_unregister_dev
>  release_firmware()           |   nfcmrvl_fw_dnld_abort
>   kfree(fw) //(1)             |    fw_dnld_over
>                               |     release_firmware
>   ...                         |      kfree(fw) //(2)
>                               |     ...
> 
> The second situation is shown below:
> 
>    (Thread 1)                 |      (Thread 2)
> nfcmrvl_fw_dnld_start         |
>  ...                          |
>  mod_timer                    |
>  (wait a time)                |
>  fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
>    fw_dnld_over               |   nfcmrvl_fw_dnld_abort
>     release_firmware          |    fw_dnld_over
>      kfree(fw) //(1)          |     release_firmware
>      ...                      |      kfree(fw) //(2)
> 
> The third situation is shown below:
> 
>        (Thread 1)               |       (Thread 2)
> nfcmrvl_nci_recv_frame          |
>  if(..->fw_download_in_progress)|
>   nfcmrvl_fw_dnld_recv_frame    |
>    queue_work                   |
>                                 |
> fw_dnld_rx_work                 | nfcmrvl_nci_unregister_dev
>  fw_dnld_over                   |  nfcmrvl_fw_dnld_abort
>   release_firmware              |   fw_dnld_over
>    kfree(fw) //(1)              |    release_firmware
>                                 |     kfree(fw) //(2)
> 
> The firmware struct is deallocated in position (1) and deallocated
> in position (2) again.
> 
> The crash trace triggered by POC is like below:
> 
> [  122.640457] BUG: KASAN: double-free or invalid-free in fw_dnld_over+0x28/0xf0
> [  122.640457] Call Trace:
> [  122.640457]  <TASK>
> [  122.640457]  kfree+0xb0/0x330
> [  122.640457]  fw_dnld_over+0x28/0xf0
> [  122.640457]  nfcmrvl_nci_unregister_dev+0x61/0x70
> [  122.640457]  nci_uart_tty_close+0x87/0xd0
> [  122.640457]  tty_ldisc_kill+0x3e/0x80
> [  122.640457]  tty_ldisc_hangup+0x1b2/0x2c0
> [  122.640457]  __tty_hangup.part.0+0x316/0x520
> [  122.640457]  tty_release+0x200/0x670
> [  122.640457]  __fput+0x110/0x410
> [  122.640457]  task_work_run+0x86/0xd0
> [  122.640457]  exit_to_user_mode_prepare+0x1aa/0x1b0
> [  122.640457]  syscall_exit_to_user_mode+0x19/0x50
> [  122.640457]  do_syscall_64+0x48/0x90
> [  122.640457]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  122.640457] RIP: 0033:0x7f68433f6beb
> 
> What's more, there are also use-after-free and null-ptr-deref bugs
> in nfcmrvl_fw_dnld_start. If we deallocate firmware struct, gpio or
> set null to the members of priv->fw_dnld in nfcmrvl_nci_unregister_dev,
> then, we dereference firmware, gpio or the members of priv->fw_dnld in
> nfcmrvl_fw_dnld_start, the UAF or NPD bugs will happen.
> 
> This patch reorders destructive operations after nci_unregister_device
> to avoid the double-free, UAF and NPD bugs, as nci_unregister_device
> is well synchronized and won't return if there is a running routine.
> This was mentioned in commit 3e3b5dfcd16a ("NFC: reorder the logic in
> nfc_{un,}register_device").

It looks like the above is not enough to close all the possible races,
specifically it looks like fw_dnld_timeout() and fw_dnld_rx_work() may
still race one vs another. 

I *think* that the approach you already suggested here:

https://lore.kernel.org/netdev/1d34425a0ea8a553a66dcf4f22ca55cc920dbb42.1649913521.git.duoming@zju.edu.cn/

should be safer - but you have to protect with the same spinlock even
every fw_dnld->fw modification.

@Lin Ma: I see you don't like the spinlock solution, but this other
option looks racing. Do you have other suggestions? (and/or would you
reconsider the spinlock?)

Thanks!

Paolo


^ permalink raw reply

* Re: [PATCH] net: dsa: mv88e6xxx: Skip cmode writable for mv88e6*41 if unchanged
From: Nathan Rossi @ 2022-04-26 10:45 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Marek Behún, netdev, linux-kernel, Andrew Lunn,
	Vivien Didelot, Florian Fainelli, Vladimir Oltean,
	David S. Miller, Jakub Kicinski
In-Reply-To: <43773a65a27cb714e3319b06b0215fcb0471aae6.camel@redhat.com>

On Tue, 26 Apr 2022 at 17:50, Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hello,
>
> On Sat, 2022-04-23 at 15:25 +0200, Marek Behún wrote:
> > On Sat, 23 Apr 2022 13:20:35 +0000
> > Nathan Rossi <nathan@nathanrossi.com> wrote:
> >
> > > The mv88e6341_port_set_cmode function always calls the set writable
> > > regardless of whether the current cmode is different from the desired
> > > cmode. This is problematic for specific configurations of the mv88e6341
> > > and mv88e6141 (in single chip adddressing mode?) where the hidden
> > > registers are not accessible.
> >
> > I don't have a problem with skipping setting cmode writable if cmode is
> > not being changed. But hidden registers should be accessible regardless
> > of whether you are using single chip addressing mode or not. You need
> > to find why it isn't working for you, this is a bug.
>
> For the records, I read the above as requiring a fix the root cause, so
> I'm not applying this patch. Please correct me if I'm wrong.

You are correct. Sorry I forgot to reply to this thread after posting
the root cause fix.

For reference the root cause fix to the issue mentioned by this patch
is https://lore.kernel.org/netdev/20220425070454.348584-1-nathan@nathanrossi.com/.

Thanks,
Nathan

>
> Thanks,
>
> Paolo
>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox