Netdev List
 help / color / mirror / Atom feed
* Re: [Patch bpf-next v1 1/4] tcp: introduce tcp_read_skb()
From: Cong Wang @ 2022-04-21 19:00 UTC (permalink / raw)
  To: John Fastabend
  Cc: Linux Kernel Network Developers, Cong Wang, Eric Dumazet,
	Daniel Borkmann, Jakub Sitnicki
In-Reply-To: <6255da425c4ad_57e1208f9@john.notmuch>

On Tue, Apr 12, 2022 at 1:00 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Cong Wang wrote:
> > From: Cong Wang <cong.wang@bytedance.com>
> >
> > This patch inroduces tcp_read_skb() based on tcp_read_sock(),
> > a preparation for the next patch which actually introduces
> > a new sock ops.
> >
> > TCP is special here, because it has tcp_read_sock() which is
> > mainly used by splice(). tcp_read_sock() supports partial read
> > and arbitrary offset, neither of them is needed for sockmap.
> >
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: John Fastabend <john.fastabend@gmail.com>
> > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > ---
>
> Thanks for doing this Cong comment/question inline.
>
> [...]
>
> > +int tcp_read_skb(struct sock *sk, read_descriptor_t *desc,
> > +              sk_read_actor_t recv_actor)
> > +{
> > +     struct sk_buff *skb;
> > +     struct tcp_sock *tp = tcp_sk(sk);
> > +     u32 seq = tp->copied_seq;
> > +     u32 offset;
> > +     int copied = 0;
> > +
> > +     if (sk->sk_state == TCP_LISTEN)
> > +             return -ENOTCONN;
> > +     while ((skb = tcp_recv_skb(sk, seq, &offset, true)) != NULL) {
>
> I'm trying to see why we might have an offset here if we always
> consume the entire skb. There is a comment in tcp_recv_skb around
> GRO packets, but not clear how this applies here if it does at all
> to me yet. Will read a bit more I guess.
>
> If the offset can be >0 than we also need to fix the recv_actor to
> account for the extra offset in the skb. As is the bpf prog might
> see duplicate data. This is a problem on the stream parser now.
>
> Then another fallout is if offset is zero than we could just do
> a skb_dequeue here and skip the tcp_recv_skb bool flag addition
> and upate.

I think it is mainly for splice(), and of course strparser, but none of
them is touched by my patchset.

>
> I'll continue reading after a few other things I need to get
> sorted this afternoon, but maybe you have the answer on hand.
>

Please let me know if you have any other comments.

Thanks.

^ permalink raw reply

* [PATCH net-next] net: ipa: compute proper aggregation limit
From: Alex Elder @ 2022-04-21 18:53 UTC (permalink / raw)
  To: davem, kuba
  Cc: mka, evgreen, bjorn.andersson, cpratapa, avuyyuru, jponduru,
	subashab, elder, netdev, linux-arm-msm, linux-kernel

The aggregation byte limit for an endpoint is currently computed
based on the endpoint's receive buffer size.

However, some bytes at the front of each receive buffer are reserved
on the assumption that--as with SKBs--it might be useful to insert
data (such as headers) before what lands in the buffer.

The aggregation byte limit currently doesn't take into account that
reserved space, and as a result, aggregation could require space
past that which is available in the buffer.

Fix this by reducing the size used to compute the aggregation byte
limit by the NET_SKB_PAD offset reserved for each receive buffer.

Signed-off-by: Alex Elder <elder@linaro.org>
---
Note:  This is a bug, but it won't apply cleanly to older kernels,
       so I will be posting back-ports separately.

 drivers/net/ipa/ipa_endpoint.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ipa/ipa_endpoint.c b/drivers/net/ipa/ipa_endpoint.c
index 888e94278a84f..e133eb2bebcfb 100644
--- a/drivers/net/ipa/ipa_endpoint.c
+++ b/drivers/net/ipa/ipa_endpoint.c
@@ -130,9 +130,10 @@ static bool ipa_endpoint_data_valid_one(struct ipa *ipa, u32 count,
 		 */
 		if (data->endpoint.config.aggregation) {
 			limit += SZ_1K * aggr_byte_limit_max(ipa->version);
-			if (buffer_size > limit) {
+			if (buffer_size - NET_SKB_PAD > limit) {
 				dev_err(dev, "RX buffer size too large for aggregated RX endpoint %u (%u > %u)\n",
-					data->endpoint_id, buffer_size, limit);
+					data->endpoint_id,
+					buffer_size - NET_SKB_PAD, limit);
 
 				return false;
 			}
@@ -739,6 +740,7 @@ static void ipa_endpoint_init_aggr(struct ipa_endpoint *endpoint)
 	if (endpoint->data->aggregation) {
 		if (!endpoint->toward_ipa) {
 			const struct ipa_endpoint_rx_data *rx_data;
+			u32 buffer_size;
 			bool close_eof;
 			u32 limit;
 
@@ -746,7 +748,8 @@ static void ipa_endpoint_init_aggr(struct ipa_endpoint *endpoint)
 			val |= u32_encode_bits(IPA_ENABLE_AGGR, AGGR_EN_FMASK);
 			val |= u32_encode_bits(IPA_GENERIC, AGGR_TYPE_FMASK);
 
-			limit = ipa_aggr_size_kb(rx_data->buffer_size);
+			buffer_size = rx_data->buffer_size;
+			limit = ipa_aggr_size_kb(buffer_size - NET_SKB_PAD);
 			val |= aggr_byte_limit_encoded(version, limit);
 
 			limit = IPA_AGGR_TIME_LIMIT;
-- 
2.32.0


^ permalink raw reply related

* Re: [PATCH v5] igb: Convert kmap() to kmap_local_page()
From: Ira Weiny @ 2022-04-21 17:49 UTC (permalink / raw)
  To: Alaa Mohamed
  Cc: outreachy, jesse.brandeburg, anthony.l.nguyen, davem, kuba,
	pabeni, intel-wired-lan, netdev, linux-kernel
In-Reply-To: <20220419234313.10324-1-eng.alaamohamedsoliman.am@gmail.com>

On Wed, Apr 20, 2022 at 01:43:13AM +0200, Alaa Mohamed wrote:
> kmap() is being deprecated and these usages are all local to the thread
> so there is no reason kmap_local_page() can't be used.
> 
> Replace kmap() calls with kmap_local_page().
> 
> Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

> ---
> changes in V2:
> 	fix kunmap_local path value to take address of the mapped page.
> ---
> changes in V3:
> 	edit commit message to be clearer
> ---
> changes in V4:
> 	edit the commit message
> ---
> changes in V5:
> 	-edit commit subject
> 	-edit commit message
> ---
>  drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
> index 2a5782063f4c..c14fc871dd41 100644
> --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
> +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
> @@ -1798,14 +1798,14 @@ static int igb_check_lbtest_frame(struct igb_rx_buffer *rx_buffer,
>  
>  	frame_size >>= 1;
>  
> -	data = kmap(rx_buffer->page);
> +	data = kmap_local_page(rx_buffer->page);
>  
>  	if (data[3] != 0xFF ||
>  	    data[frame_size + 10] != 0xBE ||
>  	    data[frame_size + 12] != 0xAF)
>  		match = false;
>  
> -	kunmap(rx_buffer->page);
> +	kunmap_local(data);
>  
>  	return match;
>  }
> -- 
> 2.35.2
> 

^ permalink raw reply

* Re: Support for IEEE1588 timestamping in the BCM54210PE PHY using the kernel mii_timestamper interface
From: Florian Fainelli @ 2022-04-21 17:32 UTC (permalink / raw)
  To: Andrew Lunn, Lasse Johnsen
  Cc: netdev, richardcochran, Gordon Hollingworth, Ahmad Byagowi
In-Reply-To: <YmBc2E2eCPHMA7lR@lunn.ch>

On 4/20/22 12:19, Andrew Lunn wrote:
> On Wed, Apr 20, 2022 at 03:03:26PM +0100, Lasse Johnsen wrote:
>> Hello,
>>
>>
>> The attached set of patches adds support for the IEEE1588 functionality on the BCM54210PE PHY using the Linux Kernel mii_timestamper interface. The BCM54210PE PHY can be found in the Raspberry PI Compute Module 4 and the work has been undertaken by Timebeat.app on behalf of Raspberry PI with help and support from the nice engineers at Broadcom.

Lasse, can you copy the maintainers listed for 
drivers/net/phy/broadcom.c in the future? Thank you.
-- 
Florian

^ permalink raw reply

* Re: [PATCH bpf-next] selftests/bpf: fix prog_tests/uprobe_autoattach compilation error
From: Artem Savkov @ 2022-04-21 17:29 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alan Maguire, Alexei Starovoitov, Andrii Nakryiko, bpf, netdev,
	linux-kernel, linux-kselftest
In-Reply-To: <e5919342-0697-65f0-063f-4941e74fe1ca@iogearbox.net>

On Thu, Apr 21, 2022 at 06:53:22PM +0200, Daniel Borkmann wrote:
> On 4/21/22 3:23 PM, Artem Savkov wrote:
> > I am getting the following compilation error for prog_tests/uprobe_autoattach.c
> > 
> > tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c: In function ‘test_uprobe_autoattach’:
> > ./test_progs.h:209:26: error: pointer ‘mem’ may be used after ‘free’ [-Werror=use-after-free]
> > 
> > mem variable is now used in one of the asserts so it shouldn't be freed right
> > away. Move free(mem) after the assert block.
> 
> Looks good, but I rephrased this a bit to avoid confusion. It's false positive given we
> only compare the addresses but don't deref mem, which the compiler might not be able to
> follow in this case.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=6a12b8e20d7e72386594a9dbe7bf2d7fae3b3aa6

Right. Thank you for fixing up the commit message.

-- 
 Artem


^ permalink raw reply

* Re: Accessing XDP packet memory from the end
From: Toke Høiland-Jørgensen @ 2022-04-21 17:17 UTC (permalink / raw)
  To: Larysa Zaremba, bpf
  Cc: Larysa Zaremba, netdev, Andrii Nakryiko, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Magnus Karlsson,
	Maciej Fijalkowski, Alexander Lobakin
In-Reply-To: <20220421155620.81048-1-larysa.zaremba@intel.com>

Larysa Zaremba <larysa.zaremba@intel.com> writes:

> Dear all,
> Our team has encountered a need of accessing data_meta in a following way:
>
> int xdp_meta_prog(struct xdp_md *ctx)
> {
> 	void *data_meta_ptr = (void *)(long)ctx->data_meta;
> 	void *data_end = (void *)(long)ctx->data_end;
> 	void *data = (void *)(long)ctx->data;
> 	u64 data_size = sizeof(u32);
> 	u32 magic_meta;
> 	u8 offset;
>
> 	offset = (u8)((s64)data - (s64)data_meta_ptr);
> 	if (offset < data_size) {
> 		bpf_printk("invalid offset: %ld\n", offset);
> 		return XDP_DROP;
> 	}
>
> 	data_meta_ptr += offset;
> 	data_meta_ptr -= data_size;
>
> 	if (data_meta_ptr + data_size > data) {
> 		return XDP_DROP;
> 	}
> 		
> 	magic_meta = *((u32 *)data);
> 	bpf_printk("Magic: %d\n", magic_meta);
> 	return XDP_PASS;
> }
>
> Unfortunately, verifier claims this code attempts to access packet with
> an offset of -2 (a constant part) and negative offset is generally forbidden.
>
> For now we have 2 solutions, one is using bpf_xdp_adjust_meta(),
> which is pretty good, but not ideal for the hot path.
> The second one is the patch at the end.
>
> Do you see any other way of accessing memory from the end of data_meta/data?
> What do you think about both suggested solutions?

The problem is that the compiler is generating code that the verifier
doesn't understand. It's notoriously hard to get LLVM to produce code
that preserves the right bounds checks which is why projects like Cilium
use helpers with inline ASM to produce the right loads, like in [0].

Adapting that cilium helper to load from the metadata area, your example
can be rewritten as follows (which works just fine with no verifier
changes):

static __always_inline int
xdp_load_meta_bytes(const struct xdp_md *ctx, __u64 off, void *to, const __u64 len)
{
	void *from;
	int ret;
	/* LLVM tends to generate code that verifier doesn't understand,
	 * so force it the way we want it in order to open up a range
	 * on the reg.
	 */
	asm volatile("r1 = *(u32 *)(%[ctx] +8)\n\t"
		     "r2 = *(u32 *)(%[ctx] +0)\n\t"
		     "%[off] &= %[offmax]\n\t"
		     "r1 += %[off]\n\t"
		     "%[from] = r1\n\t"
		     "r1 += %[len]\n\t"
		     "if r1 > r2 goto +2\n\t"
		     "%[ret] = 0\n\t"
		     "goto +1\n\t"
		     "%[ret] = %[errno]\n\t"
		     : [ret]"=r"(ret), [from]"=r"(from)
		     : [ctx]"r"(ctx), [off]"r"(off), [len]"ri"(len),
		       [offmax]"i"(__CTX_OFF_MAX), [errno]"i"(-EINVAL)
		     : "r1", "r2");
	if (!ret)
		__builtin_memcpy(to, from, len);
	return ret;
}


SEC("xdp")
int xdp_meta_prog(struct xdp_md *ctx)
{
        void *data_meta_ptr = (void *)(long)ctx->data_meta;
        void *data = (void *)(long)ctx->data;
        __u32 magic_meta;
        __u8 offset;
	int ret;

        offset = (__u8)((__s64)data - (__s64)data_meta_ptr);
	ret = xdp_load_meta_bytes(ctx, offset - 4, &magic_meta, sizeof(magic_meta));
	if (ret) {
		bpf_printk("load bytes failed: %d\n", ret);
                return XDP_DROP;
	}

        bpf_printk("Magic: %d\n", magic_meta);
        return XDP_PASS;
}

-Toke


[0] https://github.com/cilium/cilium/blob/master/bpf/include/bpf/ctx/xdp.h#L35


^ permalink raw reply

* Re: [PATCH net] net: Use this_cpu_inc() to increment net->core_stats
From: Eric Dumazet @ 2022-04-21 17:11 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Thomas Gleixner, Peter Zijlstra
In-Reply-To: <YmGLkz+dIBb5JjFF@linutronix.de>

On Thu, Apr 21, 2022 at 9:51 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2022-04-21 09:06:05 [-0700], Eric Dumazet wrote:
> > On Thu, Apr 21, 2022 at 7:00 AM Sebastian Andrzej Siewior
> > <bigeasy@linutronix.de> wrote:
> > >
> >
> > >                 for_each_possible_cpu(i) {
> > >                         core_stats = per_cpu_ptr(p, i);
> > > -                       storage->rx_dropped += local_read(&core_stats->rx_dropped);
> > > -                       storage->tx_dropped += local_read(&core_stats->tx_dropped);
> > > -                       storage->rx_nohandler += local_read(&core_stats->rx_nohandler);
> > > +                       storage->rx_dropped += core_stats->rx_dropped;
> > > +                       storage->tx_dropped += core_stats->tx_dropped;
> > > +                       storage->rx_nohandler += core_stats->rx_nohandler;
> >
> > I think that one of the reasons for me to use  local_read() was that
> > it provided what was needed to avoid future syzbot reports.
>
> syzbot report due a plain read of a per-CPU variable which might be
> modified?

Yes, this is KCSAN  (
https://www.kernel.org/doc/html/latest/dev-tools/kcsan.html )

>
> > Perhaps use READ_ONCE() here ?
> >
> > Yes, we have many similar folding loops that are  simply assuming
> > compiler won't do stupid things.
>
> I wasn't sure about that and added PeterZ to do some yelling here just
> in case. And yes, we have other sites doing exactly that. In
>    Documentation/core-api/this_cpu_ops.rst
> there is nothing about remote-READ-access (only that there should be no
> writes (due to parallel this_cpu_inc() on the local CPU)). I know that a
> 32bit write can be optimized in two 16bit writes in certain cases but a
> read is a read.
> PeterZ? :)
>
> Sebastian

^ permalink raw reply

* Re: [PATCH -next] libbpf: Remove redundant non-null checks on obj_elf
From: patchwork-bot+netdevbpf @ 2022-04-21 17:00 UTC (permalink / raw)
  To: Gaosheng Cui
  Cc: ast, daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, netdev, bpf, linux-kernel, gongruiqi1, wangweiyang2
In-Reply-To: <20220421031803.2283974-1-cuigaosheng1@huawei.com>

Hello:

This patch was applied to bpf/bpf-next.git (master)
by Andrii Nakryiko <andrii@kernel.org>:

On Thu, 21 Apr 2022 11:18:03 +0800 you wrote:
> Obj_elf is already non-null checked at the function entry, so remove
> redundant non-null checks on obj_elf.
> 
> Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
> ---
>  tools/lib/bpf/libbpf.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)

Here is the summary with links:
  - [-next] libbpf: Remove redundant non-null checks on obj_elf
    https://git.kernel.org/bpf/bpf-next/c/b71a2ebf74ef

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH bpf-next] selftests/bpf: fix map tests errno checks
From: patchwork-bot+netdevbpf @ 2022-04-21 17:00 UTC (permalink / raw)
  To: Artem Savkov
  Cc: laoar.shao, ast, daniel, andrii, bpf, netdev, linux-kernel,
	linux-kselftest
In-Reply-To: <20220421094320.1563570-1-asavkov@redhat.com>

Hello:

This patch was applied to bpf/bpf-next.git (master)
by Andrii Nakryiko <andrii@kernel.org>:

On Thu, 21 Apr 2022 11:43:20 +0200 you wrote:
> Switching to libbpf 1.0 API broke test_lpm_map and test_lru_map as error
> reporting changed. Instead of setting errno and returning -1 bpf calls
> now return -Exxx directly.
> Drop errno checks and look at return code directly.
> 
> Fixes: b858ba8c52b6 ("selftests/bpf: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK")
> Signed-off-by: Artem Savkov <asavkov@redhat.com>
> 
> [...]

Here is the summary with links:
  - [bpf-next] selftests/bpf: fix map tests errno checks
    https://git.kernel.org/bpf/bpf-next/c/c14766a8a8f3

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH bpf-next] selftests/bpf: fix prog_tests/uprobe_autoattach compilation error
From: patchwork-bot+netdevbpf @ 2022-04-21 17:00 UTC (permalink / raw)
  To: Artem Savkov
  Cc: alan.maguire, ast, daniel, andrii, bpf, netdev, linux-kernel,
	linux-kselftest
In-Reply-To: <20220421132317.1583867-1-asavkov@redhat.com>

Hello:

This patch was applied to bpf/bpf-next.git (master)
by Daniel Borkmann <daniel@iogearbox.net>:

On Thu, 21 Apr 2022 15:23:17 +0200 you wrote:
> I am getting the following compilation error for prog_tests/uprobe_autoattach.c
> 
> tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c: In function ‘test_uprobe_autoattach’:
> ./test_progs.h:209:26: error: pointer ‘mem’ may be used after ‘free’ [-Werror=use-after-free]
> 
> mem variable is now used in one of the asserts so it shouldn't be freed right
> away. Move free(mem) after the assert block.
> 
> [...]

Here is the summary with links:
  - [bpf-next] selftests/bpf: fix prog_tests/uprobe_autoattach compilation error
    https://git.kernel.org/bpf/bpf-next/c/6a12b8e20d7e

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH bpf-next] selftests/bpf: fix prog_tests/uprobe_autoattach compilation error
From: Andrii Nakryiko @ 2022-04-21 16:57 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Artem Savkov, Alan Maguire, Alexei Starovoitov, Andrii Nakryiko,
	bpf, Networking, open list, open list:KERNEL SELFTEST FRAMEWORK
In-Reply-To: <e5919342-0697-65f0-063f-4941e74fe1ca@iogearbox.net>

On Thu, Apr 21, 2022 at 9:53 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 4/21/22 3:23 PM, Artem Savkov wrote:
> > I am getting the following compilation error for prog_tests/uprobe_autoattach.c
> >
> > tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c: In function ‘test_uprobe_autoattach’:
> > ./test_progs.h:209:26: error: pointer ‘mem’ may be used after ‘free’ [-Werror=use-after-free]
> >
> > mem variable is now used in one of the asserts so it shouldn't be freed right
> > away. Move free(mem) after the assert block.
>
> Looks good, but I rephrased this a bit to avoid confusion. It's false positive given we
> only compare the addresses but don't deref mem, which the compiler might not be able to
> follow in this case.
>

Great, thanks Daniel!

> https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=6a12b8e20d7e72386594a9dbe7bf2d7fae3b3aa6
>
> Thanks,
> Daniel
>
> > Fixes: 1717e248014c ("selftests/bpf: Uprobe tests should verify param/return values")
> > Signed-off-by: Artem Savkov <asavkov@redhat.com>
> > ---
> >   tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> > index d6003dc8cc99..35b87c7ba5be 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> > @@ -34,7 +34,6 @@ void test_uprobe_autoattach(void)
> >
> >       /* trigger & validate shared library u[ret]probes attached by name */
> >       mem = malloc(malloc_sz);
> > -     free(mem);
> >
> >       ASSERT_EQ(skel->bss->uprobe_byname_parm1, trigger_val, "check_uprobe_byname_parm1");
> >       ASSERT_EQ(skel->bss->uprobe_byname_ran, 1, "check_uprobe_byname_ran");
> > @@ -44,6 +43,8 @@ void test_uprobe_autoattach(void)
> >       ASSERT_EQ(skel->bss->uprobe_byname2_ran, 3, "check_uprobe_byname2_ran");
> >       ASSERT_EQ(skel->bss->uretprobe_byname2_rc, mem, "check_uretprobe_byname2_rc");
> >       ASSERT_EQ(skel->bss->uretprobe_byname2_ran, 4, "check_uretprobe_byname2_ran");
> > +
> > +     free(mem);
> >   cleanup:
> >       test_uprobe_autoattach__destroy(skel);
> >   }
> >
>

^ permalink raw reply

* Re: [PATCH -next] libbpf: Add additional null-pointer checking in make_parent_dir
From: Andrii Nakryiko @ 2022-04-21 16:55 UTC (permalink / raw)
  To: Gaosheng Cui
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin Lau,
	Song Liu, Yonghong Song, john fastabend, KP Singh, Networking,
	bpf, open list, gongruiqi1, wangweiyang2
In-Reply-To: <20220421130056.2510372-1-cuigaosheng1@huawei.com>

On Thu, Apr 21, 2022 at 6:01 AM Gaosheng Cui <cuigaosheng1@huawei.com> wrote:
>
> The make_parent_dir is called without null-pointer checking for path,
> such as bpf_link__pin. To ensure there is no null-pointer dereference
> in make_parent_dir, so make_parent_dir requires additional null-pointer
> checking for path.
>
> Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
> ---
>  tools/lib/bpf/libbpf.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index b53e51884f9e..5786e6184bf5 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -7634,6 +7634,9 @@ static int make_parent_dir(const char *path)
>         char *dname, *dir;
>         int err = 0;
>
> +       if (path == NULL)
> +               return -EINVAL;
> +

API contract is that path shouldn't be NULL. Just like we don't check
link, obj, prog for NULL in every single API, I don't think we need to
do it here, unless I'm missing something?

>         dname = strdup(path);
>         if (dname == NULL)
>                 return -ENOMEM;
> --
> 2.25.1
>

^ permalink raw reply

* Re: [PATCH] wwan_hwsim: Avoid flush_scheduled_work() usage
From: Sergey Ryazanov @ 2022-04-21 16:54 UTC (permalink / raw)
  To: Loic Poulain
  Cc: Tetsuo Handa, Johannes Berg, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Network Development
In-Reply-To: <CAMZdPi_uieGNWyGAAywBz2Utg0iW1jGUTWzUbj3SmsZ+-iDTfQ@mail.gmail.com>

Hello Loic,

On Wed, Apr 20, 2022 at 12:53 PM Loic Poulain <loic.poulain@linaro.org> wrote:
> On Wed, 20 Apr 2022 at 04:22, Tetsuo Handa
> <penguin-kernel@i-love.sakura.ne.jp> wrote:
>>
>> Flushing system-wide workqueues is dangerous and will be forbidden.
>> Replace system_wq with local wwan_wq.
>>
>> Link: https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp
> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>
> Could you add a 'Fixes' tag?

From what I understand, an inaccurate flushing of the system work
queue can potentially cause a system freeze. That is why
flush_scheduled_work() is planned to be removed. The hwsim module is
just a random function user without any known issues. So, a 'fixes'
tag is not required here, and there is no need to bother the stable
team with a change backport.

Anyway, Tetsuo, you missed a target tree in the subject. If this is
not a fix, then you probably should target your changes to the
'net-next' tree.

-- 
Sergey

^ permalink raw reply

* Re: [PATCH bpf-next] selftests/bpf: fix prog_tests/uprobe_autoattach compilation error
From: Daniel Borkmann @ 2022-04-21 16:53 UTC (permalink / raw)
  To: Artem Savkov, Alan Maguire
  Cc: Alexei Starovoitov, Andrii Nakryiko, bpf, netdev, linux-kernel,
	linux-kselftest
In-Reply-To: <20220421132317.1583867-1-asavkov@redhat.com>

On 4/21/22 3:23 PM, Artem Savkov wrote:
> I am getting the following compilation error for prog_tests/uprobe_autoattach.c
> 
> tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c: In function ‘test_uprobe_autoattach’:
> ./test_progs.h:209:26: error: pointer ‘mem’ may be used after ‘free’ [-Werror=use-after-free]
> 
> mem variable is now used in one of the asserts so it shouldn't be freed right
> away. Move free(mem) after the assert block.

Looks good, but I rephrased this a bit to avoid confusion. It's false positive given we
only compare the addresses but don't deref mem, which the compiler might not be able to
follow in this case.

https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=6a12b8e20d7e72386594a9dbe7bf2d7fae3b3aa6

Thanks,
Daniel

> Fixes: 1717e248014c ("selftests/bpf: Uprobe tests should verify param/return values")
> Signed-off-by: Artem Savkov <asavkov@redhat.com>
> ---
>   tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> index d6003dc8cc99..35b87c7ba5be 100644
> --- a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> @@ -34,7 +34,6 @@ void test_uprobe_autoattach(void)
>   
>   	/* trigger & validate shared library u[ret]probes attached by name */
>   	mem = malloc(malloc_sz);
> -	free(mem);
>   
>   	ASSERT_EQ(skel->bss->uprobe_byname_parm1, trigger_val, "check_uprobe_byname_parm1");
>   	ASSERT_EQ(skel->bss->uprobe_byname_ran, 1, "check_uprobe_byname_ran");
> @@ -44,6 +43,8 @@ void test_uprobe_autoattach(void)
>   	ASSERT_EQ(skel->bss->uprobe_byname2_ran, 3, "check_uprobe_byname2_ran");
>   	ASSERT_EQ(skel->bss->uretprobe_byname2_rc, mem, "check_uretprobe_byname2_rc");
>   	ASSERT_EQ(skel->bss->uretprobe_byname2_ran, 4, "check_uretprobe_byname2_ran");
> +
> +	free(mem);
>   cleanup:
>   	test_uprobe_autoattach__destroy(skel);
>   }
> 


^ permalink raw reply

* [PATCH 3/3] perf test: Add perf_event_attr test for Arm SPE
From: Timothy Hayes @ 2022-04-21 16:52 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, acme
  Cc: Timothy Hayes, John Garry, Will Deacon, Mathieu Poirier, Leo Yan,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, linux-arm-kernel, netdev, bpf
In-Reply-To: <20220421165205.117662-1-timothy.hayes@arm.com>

Adds a perf_event_attr test for Arm SPE in which the presence of
physical addresses are checked when SPE unit is run with pa_enable=1.

Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
---
 tools/perf/tests/attr/README                         |  1 +
 .../perf/tests/attr/test-record-spe-physical-address | 12 ++++++++++++
 2 files changed, 13 insertions(+)
 create mode 100644 tools/perf/tests/attr/test-record-spe-physical-address

diff --git a/tools/perf/tests/attr/README b/tools/perf/tests/attr/README
index 454505d343fa..eb3f7d4bb324 100644
--- a/tools/perf/tests/attr/README
+++ b/tools/perf/tests/attr/README
@@ -60,6 +60,7 @@ Following tests are defined (with perf commands):
   perf record -R kill                           (test-record-raw)
   perf record -c 2 -e arm_spe_0// -- kill       (test-record-spe-period)
   perf record -e arm_spe_0/period=3/ -- kill    (test-record-spe-period-term)
+  perf record -e arm_spe_0/pa_enable=1/ -- kill (test-record-spe-physical-address)
   perf stat -e cycles kill                      (test-stat-basic)
   perf stat kill                                (test-stat-default)
   perf stat -d kill                             (test-stat-detailed-1)
diff --git a/tools/perf/tests/attr/test-record-spe-physical-address b/tools/perf/tests/attr/test-record-spe-physical-address
new file mode 100644
index 000000000000..7ebcf5012ce3
--- /dev/null
+++ b/tools/perf/tests/attr/test-record-spe-physical-address
@@ -0,0 +1,12 @@
+[config]
+command = record
+args    = --no-bpf-event -e arm_spe_0/pa_enable=1/ -- kill >/dev/null 2>&1
+ret     = 1
+arch    = aarch64
+
+[event-10:base-record-spe]
+# 622727 is the decimal of IP|TID|TIME|CPU|IDENTIFIER|DATA_SRC|PHYS_ADDR
+sample_type=622727
+
+# dummy event
+[event-1:base-record-spe]
\ No newline at end of file
-- 
2.25.1


^ permalink raw reply related

* [PATCH 2/3] perf: arm-spe: Fix SPE events with phys addresses
From: Timothy Hayes @ 2022-04-21 16:52 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, acme
  Cc: Timothy Hayes, John Garry, Will Deacon, Mathieu Poirier, Leo Yan,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, linux-arm-kernel, netdev, bpf
In-Reply-To: <20220421165205.117662-1-timothy.hayes@arm.com>

This patch corrects a bug whereby SPE collection is invoked with
pa_enable=1 but synthesized events fail to show physical addresses.

Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
---
 tools/perf/arch/arm64/util/arm-spe.c | 10 ++++++++++
 tools/perf/util/arm-spe.c            |  3 ++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/tools/perf/arch/arm64/util/arm-spe.c b/tools/perf/arch/arm64/util/arm-spe.c
index af4d63af8072..e8b577d33e53 100644
--- a/tools/perf/arch/arm64/util/arm-spe.c
+++ b/tools/perf/arch/arm64/util/arm-spe.c
@@ -148,6 +148,7 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
 	bool privileged = perf_event_paranoid_check(-1);
 	struct evsel *tracking_evsel;
 	int err;
+	u64 bit;
 
 	sper->evlist = evlist;
 
@@ -245,6 +246,15 @@ static int arm_spe_recording_options(struct auxtrace_record *itr,
 	 */
 	evsel__set_sample_bit(arm_spe_evsel, DATA_SRC);
 
+	/*
+	 * The PHYS_ADDR flag does not affect the driver behaviour, it is used to
+	 * inform that the resulting output's SPE samples contain physical addresses
+	 * where applicable.
+	 */
+	bit = perf_pmu__format_bits(&arm_spe_pmu->format, "pa_enable");
+	if (arm_spe_evsel->core.attr.config & bit)
+		evsel__set_sample_bit(arm_spe_evsel, PHYS_ADDR);
+
 	/* Add dummy event to keep tracking */
 	err = parse_events(evlist, "dummy:u", NULL);
 	if (err)
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 151cc38a171c..1a80151baed9 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -1033,7 +1033,8 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
 	memset(&attr, 0, sizeof(struct perf_event_attr));
 	attr.size = sizeof(struct perf_event_attr);
 	attr.type = PERF_TYPE_HARDWARE;
-	attr.sample_type = evsel->core.attr.sample_type & PERF_SAMPLE_MASK;
+	attr.sample_type = evsel->core.attr.sample_type &
+				(PERF_SAMPLE_MASK | PERF_SAMPLE_PHYS_ADDR);
 	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
 			    PERF_SAMPLE_PERIOD | PERF_SAMPLE_DATA_SRC |
 			    PERF_SAMPLE_WEIGHT | PERF_SAMPLE_ADDR;
-- 
2.25.1


^ permalink raw reply related

* [PATCH 1/3] perf: arm-spe: Fix addresses of synthesized SPE events
From: Timothy Hayes @ 2022-04-21 16:52 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, acme
  Cc: Timothy Hayes, John Garry, Will Deacon, Mathieu Poirier, Leo Yan,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, linux-arm-kernel, netdev, bpf
In-Reply-To: <20220421165205.117662-1-timothy.hayes@arm.com>

This patch corrects a bug whereby synthesized events from SPE
samples are missing virtual addresses.

Signed-off-by: Timothy Hayes <timothy.hayes@arm.com>
---
 tools/perf/util/arm-spe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index d2b64e3f588b..151cc38a171c 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -1036,7 +1036,7 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
 	attr.sample_type = evsel->core.attr.sample_type & PERF_SAMPLE_MASK;
 	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
 			    PERF_SAMPLE_PERIOD | PERF_SAMPLE_DATA_SRC |
-			    PERF_SAMPLE_WEIGHT;
+			    PERF_SAMPLE_WEIGHT | PERF_SAMPLE_ADDR;
 	if (spe->timeless_decoding)
 		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
 	else
-- 
2.25.1


^ permalink raw reply related

* [PATCH 0/3] perf: arm-spe: Fix addresses of synthesized Arm SPE events
From: Timothy Hayes @ 2022-04-21 16:52 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, acme
  Cc: John Garry, Will Deacon, Mathieu Poirier, Leo Yan, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Martin KaFai Lau,
	Song Liu, Yonghong Song, John Fastabend, KP Singh,
	linux-arm-kernel, netdev, bpf

This patch set fixes problems related to address in synthesized events from SPE.

Committer testing:

perf record --no-bpf-event -e arm_spe_0/pa_enable=1/ -- sleep 1
perf inject -i perf.data -o perf-inj.data --itrace --strip
perf script -i perf-inj.data --fields hw:+addr,+phys_addr

Before:
	sleep 49337 [004] 20133.731889:          1             l1d-access:  ffffcbebcde1d5b8 [unknown] ([unknown])
	sleep 49337 [004] 20133.731889:          1             tlb-access:  ffffcbebcde1d5b8 [unknown] ([unknown])
	sleep 49337 [004] 20133.731889:          1                 memory:  ffffcbebcde1d5b8 [unknown] ([unknown])

After:
	sleep 49337 [004] 20133.731889:          1             l1d-access: ffff800034123970 ffffcbebcde1d5b8 [unknown] ([unknown])       153d19970
	sleep 49337 [004] 20133.731889:          1             tlb-access: ffff800034123970 ffffcbebcde1d5b8 [unknown] ([unknown])       153d19970
	sleep 49337 [004] 20133.731889:          1                 memory: ffff800034123970 ffffcbebcde1d5b8 [unknown] ([unknown])       153d19970

tools/perf/arch/arm64/util/arm-spe.c                   | 10 ++++++++++
tools/perf/tests/attr/README                           |  1 +
tools/perf/tests/attr/test-record-spe-physical-address | 12 ++++++++++++
tools/perf/util/arm-spe.c                              |  5 +++--
4 files changed, 26 insertions(+), 2 deletions(-)


^ permalink raw reply

* Re: [PATCH bpf-next] selftests/bpf: fix prog_tests/uprobe_autoattach compilation error
From: Andrii Nakryiko @ 2022-04-21 16:51 UTC (permalink / raw)
  To: Artem Savkov
  Cc: Alan Maguire, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, Networking, open list,
	open list:KERNEL SELFTEST FRAMEWORK
In-Reply-To: <20220421132317.1583867-1-asavkov@redhat.com>

On Thu, Apr 21, 2022 at 6:23 AM Artem Savkov <asavkov@redhat.com> wrote:
>
> I am getting the following compilation error for prog_tests/uprobe_autoattach.c
>
> tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c: In function ‘test_uprobe_autoattach’:
> ./test_progs.h:209:26: error: pointer ‘mem’ may be used after ‘free’ [-Werror=use-after-free]
>
> mem variable is now used in one of the asserts so it shouldn't be freed right
> away. Move free(mem) after the assert block.

The memory is not used, we only compare the value of the pointer
itself, we don't dereference. So the compiler is being paranoid here.
But while initial version relied on free() to happen before all the
ASSERTs, now we don't, so moving free after asserts is fine.

>
> Fixes: 1717e248014c ("selftests/bpf: Uprobe tests should verify param/return values")
> Signed-off-by: Artem Savkov <asavkov@redhat.com>
> ---
>  tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> index d6003dc8cc99..35b87c7ba5be 100644
> --- a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> +++ b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
> @@ -34,7 +34,6 @@ void test_uprobe_autoattach(void)
>
>         /* trigger & validate shared library u[ret]probes attached by name */
>         mem = malloc(malloc_sz);
> -       free(mem);
>
>         ASSERT_EQ(skel->bss->uprobe_byname_parm1, trigger_val, "check_uprobe_byname_parm1");
>         ASSERT_EQ(skel->bss->uprobe_byname_ran, 1, "check_uprobe_byname_ran");
> @@ -44,6 +43,8 @@ void test_uprobe_autoattach(void)
>         ASSERT_EQ(skel->bss->uprobe_byname2_ran, 3, "check_uprobe_byname2_ran");
>         ASSERT_EQ(skel->bss->uretprobe_byname2_rc, mem, "check_uretprobe_byname2_rc");
>         ASSERT_EQ(skel->bss->uretprobe_byname2_ran, 4, "check_uretprobe_byname2_ran");
> +
> +       free(mem);
>  cleanup:
>         test_uprobe_autoattach__destroy(skel);
>  }
> --
> 2.35.1
>

^ permalink raw reply

* Re: [PATCH net] net: Use this_cpu_inc() to increment net->core_stats
From: Sebastian Andrzej Siewior @ 2022-04-21 16:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Thomas Gleixner, Peter Zijlstra
In-Reply-To: <CANn89iKjSmnTSzzHdnP-HEYMajrz+MOrjFooaMFop4Vo43kLdg@mail.gmail.com>

On 2022-04-21 09:06:05 [-0700], Eric Dumazet wrote:
> On Thu, Apr 21, 2022 at 7:00 AM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> >
> 
> >                 for_each_possible_cpu(i) {
> >                         core_stats = per_cpu_ptr(p, i);
> > -                       storage->rx_dropped += local_read(&core_stats->rx_dropped);
> > -                       storage->tx_dropped += local_read(&core_stats->tx_dropped);
> > -                       storage->rx_nohandler += local_read(&core_stats->rx_nohandler);
> > +                       storage->rx_dropped += core_stats->rx_dropped;
> > +                       storage->tx_dropped += core_stats->tx_dropped;
> > +                       storage->rx_nohandler += core_stats->rx_nohandler;
> 
> I think that one of the reasons for me to use  local_read() was that
> it provided what was needed to avoid future syzbot reports.

syzbot report due a plain read of a per-CPU variable which might be
modified?

> Perhaps use READ_ONCE() here ?
> 
> Yes, we have many similar folding loops that are  simply assuming
> compiler won't do stupid things.

I wasn't sure about that and added PeterZ to do some yelling here just
in case. And yes, we have other sites doing exactly that. In
   Documentation/core-api/this_cpu_ops.rst
there is nothing about remote-READ-access (only that there should be no
writes (due to parallel this_cpu_inc() on the local CPU)). I know that a
32bit write can be optimized in two 16bit writes in certain cases but a
read is a read.
PeterZ? :)

Sebastian

^ permalink raw reply

* Re: [PATCH net-next] selftests: net: vrf_strict_mode_test: add support to select a test to run
From: Jaehee Park @ 2022-04-21 16:47 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: outreachy, Julia Denham, Roopa Prabhu, Stefano Brivio, netdev
In-Reply-To: <41a48002-817b-4366-d316-9d94e8d81a79@nvidia.com>

On Thu, Apr 21, 2022 at 07:52:45AM -0700, Roopa Prabhu wrote:
> 
> On 4/19/22 21:55, Jaehee Park wrote:
> > Add a boilerplate test loop to run all tests in
> > vrf_strict_mode_test.sh.
> > 
> > Signed-off-by: Jaehee Park <jhpark1013@gmail.com>
> > ---
> 
> Jaehee, this needs more work. The idea is to be able to run individual tests
> with -t option.
> 
> An example is drop_monitor_tests.sh, see the usage and getopts arg parsing
> at the beginning of the test
> 
> eg ./drop_monitor_tests.sh -t <testname>
> 
> 

Thanks Roopa, I've sent version 2 for your review. 


> >   .../testing/selftests/net/vrf_strict_mode_test.sh  | 14 +++++++++++++-
> >   1 file changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/testing/selftests/net/vrf_strict_mode_test.sh b/tools/testing/selftests/net/vrf_strict_mode_test.sh
> > index 865d53c1781c..116ca43381b5 100755
> > --- a/tools/testing/selftests/net/vrf_strict_mode_test.sh
> > +++ b/tools/testing/selftests/net/vrf_strict_mode_test.sh
> > @@ -14,6 +14,8 @@ INIT_NETNS_NAME="init"
> >   PAUSE_ON_FAIL=${PAUSE_ON_FAIL:=no}
> > +TESTS="vrf_strict_mode_tests_init vrf_strict_mode_tests_testns vrf_strict_mode_tests_mix"
> > +
> >   log_test()
> >   {
> >   	local rc=$1
> > @@ -391,7 +393,17 @@ fi
> >   cleanup &> /dev/null
> >   setup
> > -vrf_strict_mode_tests
> > +for t in $TESTS
> > +do
> > +	case $t in
> > +	vrf_strict_mode_tests_init|vrf_strict_mode_init) vrf_strict_mode_tests_init;;
> > +	vrf_strict_mode_tests_testns|vrf_strict_mode_testns) vrf_strict_mode_tests_testns;;
> > +	vrf_strict_mode_tests_mix|vrf_strict_mode_mix) vrf_strict_mode_tests_mix;;
> > +
> > +	help) echo "Test names: $TESTS"; exit 0;;
> > +
> > +	esac
> > +done
> >   cleanup
> >   print_log_test_results

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH net v2] ice: Protect vf_state check by cfg_lock in ice_vc_process_vf_msg()
From: Keller, Jacob E @ 2022-04-21 16:40 UTC (permalink / raw)
  To: ivecera, netdev@vger.kernel.org
  Cc: Fei Liu, moderated list:INTEL ETHERNET DRIVERS, mschmidt,
	Brett Creeley, open list, Jakub Kicinski, Paolo Abeni,
	David S. Miller
In-Reply-To: <20220419142221.2349382-1-ivecera@redhat.com>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Ivan
> Vecera
> Sent: Tuesday, April 19, 2022 7:22 AM
> To: netdev@vger.kernel.org
> Cc: Fei Liu <feliu@redhat.com>; moderated list:INTEL ETHERNET DRIVERS <intel-
> wired-lan@lists.osuosl.org>; mschmidt <mschmidt@redhat.com>; Brett Creeley
> <brett.creeley@intel.com>; open list <linux-kernel@vger.kernel.org>; Jakub
> Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; David S. Miller
> <davem@davemloft.net>
> Subject: [Intel-wired-lan] [PATCH net v2] ice: Protect vf_state check by cfg_lock in
> ice_vc_process_vf_msg()
> 
> Previous patch labelled "ice: Fix incorrect locking in
> ice_vc_process_vf_msg()"  fixed an issue with ignored messages
> sent by VF driver but a small race window still left.
> 
> Recently caught trace during 'ip link set ... vf 0 vlan ...' operation:
> 
> [ 7332.995625] ice 0000:3b:00.0: Clearing port VLAN on VF 0
> [ 7333.001023] iavf 0000:3b:01.0: Reset indication received from the PF
> [ 7333.007391] iavf 0000:3b:01.0: Scheduling reset task
> [ 7333.059575] iavf 0000:3b:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our
> request 3
> [ 7333.059626] ice 0000:3b:00.0: Invalid message from VF 0, opcode 3, len 4,
> error -1
> 
> Setting of VLAN for VF causes a reset of the affected VF using
> ice_reset_vf() function that runs with cfg_lock taken:
> 
> 1. ice_notify_vf_reset() informs IAVF driver that reset is needed and
>    IAVF schedules its own reset procedure
> 2. Bit ICE_VF_STATE_DIS is set in vf->vf_state
> 3. Misc initialization steps
> 4. ice_sriov_post_vsi_rebuild() -> ice_vf_set_initialized() and that
>    clears ICE_VF_STATE_DIS in vf->vf_state
> 
> Step 3 is mentioned race window because IAVF reset procedure runs in
> parallel and one of its step is sending of VIRTCHNL_OP_GET_VF_RESOURCES
> message (opcode==3). This message is handled in ice_vc_process_vf_msg()
> and if it is received during the mentioned race window then it's
> marked as invalid and error is returned to VF driver.
> 
> Protect vf_state check in ice_vc_process_vf_msg() by cfg_lock to avoid
> this race condition.
> 
> Fixes: e6ba5273d4ed ("ice: Fix race conditions between virtchnl handling and VF
> ndo ops")
> Tested-by: Fei Liu <feliu@redhat.com>
> Signed-off-by: Ivan Vecera <ivecera@redhat.com>

Thanks, this looks good to me.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>

> ---
>  drivers/net/ethernet/intel/ice/ice_virtchnl.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> index 5612c032f15a..b72606c9e6d0 100644
> --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c
> @@ -3625,6 +3625,8 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct
> ice_rq_event_info *event)
>  		return;
>  	}
> 
> +	mutex_lock(&vf->cfg_lock);
> +
>  	/* Check if VF is disabled. */
>  	if (test_bit(ICE_VF_STATE_DIS, vf->vf_states)) {
>  		err = -EPERM;
> @@ -3648,19 +3650,14 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct
> ice_rq_event_info *event)
>  				      NULL, 0);
>  		dev_err(dev, "Invalid message from VF %d, opcode %d, len %d,
> error %d\n",
>  			vf_id, v_opcode, msglen, err);
> -		ice_put_vf(vf);
> -		return;
> +		goto finish;
>  	}
> 
> -	mutex_lock(&vf->cfg_lock);
> -
>  	if (!ice_vc_is_opcode_allowed(vf, v_opcode)) {
>  		ice_vc_send_msg_to_vf(vf, v_opcode,
>  				      VIRTCHNL_STATUS_ERR_NOT_SUPPORTED,
> NULL,
>  				      0);
> -		mutex_unlock(&vf->cfg_lock);
> -		ice_put_vf(vf);
> -		return;
> +		goto finish;
>  	}
> 
>  	switch (v_opcode) {
> @@ -3773,6 +3770,7 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct
> ice_rq_event_info *event)
>  			 vf_id, v_opcode, err);
>  	}
> 
> +finish:
>  	mutex_unlock(&vf->cfg_lock);
>  	ice_put_vf(vf);
>  }
> --
> 2.35.1
> 
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply

* [PATCH net-next v2] selftests: net: vrf_strict_mode_test: add support to select a test to run
From: Jaehee Park @ 2022-04-21 16:40 UTC (permalink / raw)
  To: outreachy, Julia Denham, Roopa Prabhu, Stefano Brivio, netdev,
	jhpark1013, Roopa Prabhu

Add a boilerplate test loop to run all tests in
vrf_strict_mode_test.sh. Add a -t flag that allows a selected test to
run.

Signed-off-by: Jaehee Park <jhpark1013@gmail.com>
---
 .../selftests/net/vrf_strict_mode_test.sh     | 31 ++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/vrf_strict_mode_test.sh b/tools/testing/selftests/net/vrf_strict_mode_test.sh
index 865d53c1781c..ca4379265706 100755
--- a/tools/testing/selftests/net/vrf_strict_mode_test.sh
+++ b/tools/testing/selftests/net/vrf_strict_mode_test.sh
@@ -14,6 +14,8 @@ INIT_NETNS_NAME="init"
 
 PAUSE_ON_FAIL=${PAUSE_ON_FAIL:=no}
 
+TESTS="init testns mix"
+
 log_test()
 {
 	local rc=$1
@@ -353,6 +355,23 @@ vrf_strict_mode_tests()
 	vrf_strict_mode_tests_mix
 }
 
+usage()
+{
+	cat <<EOF
+usage: ${0##*/} OPTS
+
+	-t <test> Test(s) to run (default: all)
+		  (options: $TESTS)
+EOF
+}
+while getopts ":t:h" opt; do
+	case $opt in
+		t) TESTS=$OPTARG;;
+		h) usage; exit 0;;
+		*) usage; exit 1;;
+	esac
+done
+
 vrf_strict_mode_check_support()
 {
 	local nsname=$1
@@ -391,7 +410,17 @@ fi
 cleanup &> /dev/null
 
 setup
-vrf_strict_mode_tests
+for t in $TESTS
+do
+	case $t in
+	vrf_strict_mode_tests_init|init) vrf_strict_mode_tests_init;;
+	vrf_strict_mode_tests_testns|testns) vrf_strict_mode_tests_testns;;
+	vrf_strict_mode_tests_mix|mix) vrf_strict_mode_tests_mix;;
+
+	help) echo "Test names: $TESTS"; exit 0;;
+
+	esac
+done
 cleanup
 
 print_log_test_results
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH] wwan_hwsim: Avoid flush_scheduled_work() usage
From: Sergey Ryazanov @ 2022-04-21 16:35 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Loic Poulain, Johannes Berg, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Network Development
In-Reply-To: <0bc6443a-dbac-70ab-bf99-9a439e35f3ef@I-love.SAKURA.ne.jp>

On Wed, Apr 20, 2022 at 5:22 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
> Flushing system-wide workqueues is dangerous and will be forbidden.
> Replace system_wq with local wwan_wq.
>
> Link: https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

Looks good! Just a couple minor questions below.

Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>

> ---
> Note: This patch is only compile tested. By the way, don't you want to call
> debugfs_remove(wwan_hwsim_debugfs_devcreate) at err_clean_devs label in
> wwan_hwsim_init() like wwan_hwsim_exit() does, for debugfs_create_file("devcreate")
> is called before "goto err_clean_devs" happens?

As I replied in another mail. This is not strictly required, but will
not hurt anyone.

>  drivers/net/wwan/wwan_hwsim.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/wwan/wwan_hwsim.c b/drivers/net/wwan/wwan_hwsim.c
> index 5b62cf3b3c42..2136319f588f 100644
> --- a/drivers/net/wwan/wwan_hwsim.c
> +++ b/drivers/net/wwan/wwan_hwsim.c
> @@ -33,6 +33,7 @@ static struct dentry *wwan_hwsim_debugfs_devcreate;
>  static DEFINE_SPINLOCK(wwan_hwsim_devs_lock);
>  static LIST_HEAD(wwan_hwsim_devs);
>  static unsigned int wwan_hwsim_dev_idx;
> +static struct workqueue_struct *wwan_wq;
>
>  struct wwan_hwsim_dev {
>         struct list_head list;
> @@ -371,7 +372,7 @@ static ssize_t wwan_hwsim_debugfs_portdestroy_write(struct file *file,
>          * waiting this callback to finish in the debugfs_remove() call. So,
>          * use workqueue.
>          */
> -       schedule_work(&port->del_work);
> +       queue_work(wwan_wq, &port->del_work);
>
>         return count;
>  }
> @@ -416,7 +417,7 @@ static ssize_t wwan_hwsim_debugfs_devdestroy_write(struct file *file,
>          * waiting this callback to finish in the debugfs_remove() call. So,
>          * use workqueue.
>          */
> -       schedule_work(&dev->del_work);
> +       queue_work(wwan_wq, &dev->del_work);
>
>         return count;
>  }
> @@ -506,9 +507,15 @@ static int __init wwan_hwsim_init(void)
>         if (wwan_hwsim_devsnum < 0 || wwan_hwsim_devsnum > 128)
>                 return -EINVAL;
>
> +       wwan_wq = alloc_workqueue("wwan_wq", 0, 0);
> +       if (!wwan_wq)
> +               return -ENOMEM;
> +
>         wwan_hwsim_class = class_create(THIS_MODULE, "wwan_hwsim");
> -       if (IS_ERR(wwan_hwsim_class))
> +       if (IS_ERR(wwan_hwsim_class)) {
> +               destroy_workqueue(wwan_wq);

How about jumping to some label from here and do the workqueue
destroying there? E.g.

err = PTR_ERR(wwan_hwsim_class);
goto err_wq_destroy;

This will keep code symmetric.

>                 return PTR_ERR(wwan_hwsim_class);
> +       }
>
>         wwan_hwsim_debugfs_topdir = debugfs_create_dir("wwan_hwsim", NULL);
>         wwan_hwsim_debugfs_devcreate =
> @@ -524,6 +531,7 @@ static int __init wwan_hwsim_init(void)
>
>  err_clean_devs:
>         wwan_hwsim_free_devs();
> +       destroy_workqueue(wwan_wq);
>         debugfs_remove(wwan_hwsim_debugfs_topdir);
>         class_destroy(wwan_hwsim_class);

As you can see there are no need to wait the workqueue flushing, since
it was not used. So the queue destroying call can be moved below the
class destroying to keep cleanup symmetrical to the init sequence.
E.g.

 err_clean_devs:
        wwan_hwsim_free_devs();
        debugfs_remove(wwan_hwsim_debugfs_topdir);
        class_destroy(wwan_hwsim_class);

+err_wq_destroy:
+       destroy_workqueue(wwan_wq);
+
       return err;
}

> @@ -534,7 +542,7 @@ static void __exit wwan_hwsim_exit(void)
>  {
>         debugfs_remove(wwan_hwsim_debugfs_devcreate);   /* Avoid new devs */
>         wwan_hwsim_free_devs();
> -       flush_scheduled_work();         /* Wait deletion works completion */
> +       destroy_workqueue(wwan_wq);             /* Wait deletion works completion */
>         debugfs_remove(wwan_hwsim_debugfs_topdir);
>         class_destroy(wwan_hwsim_class);
>  }

I do not care too much, but can we explicitly call the queue flushing
to make  the exit handler as clear as possible?

 {
        debugfs_remove(wwan_hwsim_debugfs_devcreate);   /* Avoid new devs */
        wwan_hwsim_free_devs();
-       flush_scheduled_work();         /* Wait deletion works completion */
+       flush_workqueue(wwan_wq);             /* Wait deletion works
completion */
        debugfs_remove(wwan_hwsim_debugfs_topdir);
        class_destroy(wwan_hwsim_class);
+       destroy_workqueue(wwan_wq);
 }

-- 
Sergey

^ permalink raw reply

* Re: Accessing XDP packet memory from the end
From: Jesper Dangaard Brouer @ 2022-04-21 16:27 UTC (permalink / raw)
  To: Larysa Zaremba, bpf
  Cc: brouer, netdev, Andrii Nakryiko, Alexei Starovoitov,
	Daniel Borkmann, Toke Hoiland-Jorgensen, Magnus Karlsson,
	Maciej Fijalkowski, Alexander Lobakin, xdp-hints@xdp-project.net
In-Reply-To: <20220421155620.81048-1-larysa.zaremba@intel.com>



On 21/04/2022 17.56, Larysa Zaremba wrote:
> Dear all,
> Our team has encountered a need of accessing data_meta in a following way:
> 
> int xdp_meta_prog(struct xdp_md *ctx)
> {
> 	void *data_meta_ptr = (void *)(long)ctx->data_meta;
> 	void *data_end = (void *)(long)ctx->data_end;
> 	void *data = (void *)(long)ctx->data;
> 	u64 data_size = sizeof(u32);
> 	u32 magic_meta;
> 	u8 offset;
> 
> 	offset = (u8)((s64)data - (s64)data_meta_ptr);

I'm not sure the verifier can handle this 'offset' calc. As it cannot
statically know the sized based on this statement. Maybe this is not the
issue.

> 	if (offset < data_size) {
> 		bpf_printk("invalid offset: %ld\n", offset);
> 		return XDP_DROP;
> 	}
> 
> 	data_meta_ptr += offset;
> 	data_meta_ptr -= data_size;
> 
> 	if (data_meta_ptr + data_size > data) {
> 		return XDP_DROP;
> 	}
> 		
> 	magic_meta = *((u32 *)data);
> 	bpf_printk("Magic: %d\n", magic_meta);
> 	return XDP_PASS;
> }
> 
> Unfortunately, verifier claims this code attempts to access packet with
> an offset of -2 (a constant part) and negative offset is generally forbidden.

Are you forgetting to mention:
  - Have you modified the NIC driver to adjust data_meta pointer and 
provide info in this area?

p.s. this is exactly what I'm also working towards[1], so I'll be happy
to collaborate.  I'm missing the driver code, as link[1] is focused on
decoding BTF data_meta area in userspace for AF_XDP.

[1] 
https://github.com/xdp-project/bpf-examples/tree/master/AF_XDP-interaction

> For now we have 2 solutions, one is using bpf_xdp_adjust_meta(),
> which is pretty good, but not ideal for the hot path.
> The second one is the patch at the end.
> 

Are you saying, verifier cannot handle that driver changed data_meta 
pointer and provided info there (without calling bpf_xdp_adjust_meta)?


> Do you see any other way of accessing memory from the end of data_meta/data?
> What do you think about both suggested solutions?
> 
> Best regards,
> Larysa Zaremba
> 
> ---
> 
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -3576,8 +3576,11 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
>   	}
>   
>   	err = reg->range < 0 ? -EINVAL :
> -	      __check_mem_access(env, regno, off, size, reg->range,
> -				 zero_size_allowed);
> +	      __check_mem_access(env, regno, off + reg->smin_value, size,
> +				 reg->range + reg->smin_value, zero_size_allowed);
> +	err = err ? :
> +	      __check_mem_access(env, regno, off + reg->umax_value, size,
> +				 reg->range + reg->umax_value, zero_size_allowed);
>   	if (err) {
>   		verbose(env, "R%d offset is outside of the packet\n", regno);
>   		return err;
> 


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox