Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net v3] virtio_net: fix wrong buf address calculation when using xdp
From: Michael S. Tsirkin @ 2022-04-25 13:52 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, kuba, davem, stable, Jason Wang, Xuan Zhuo,
	Daniel Borkmann, virtualization
In-Reply-To: <20220425103703.3067292-1-razor@blackwall.org>

On Mon, Apr 25, 2022 at 01:37:03PM +0300, Nikolay Aleksandrov wrote:
> We received a report[1] of kernel crashes when Cilium is used in XDP
> mode with virtio_net after updating to newer kernels. After
> investigating the reason it turned out that when using mergeable bufs
> with an XDP program which adjusts xdp.data or xdp.data_meta page_to_buf()
> calculates the build_skb address wrong because the offset can become less
> than the headroom so it gets the address of the previous page (-X bytes
> depending on how lower offset is):
>  page_to_skb: page addr ffff9eb2923e2000 buf ffff9eb2923e1ffc offset 252 headroom 256
> 
> This is a pr_err() I added in the beginning of page_to_skb which clearly
> shows offset that is less than headroom by adding 4 bytes of metadata
> via an xdp prog. The calculations done are:
>  receive_mergeable():
>  headroom = VIRTIO_XDP_HEADROOM; // VIRTIO_XDP_HEADROOM == 256 bytes
>  offset = xdp.data - page_address(xdp_page) -
>           vi->hdr_len - metasize;
> 
>  page_to_skb():
>  p = page_address(page) + offset;
>  ...
>  buf = p - headroom;
> 
> Now buf goes -4 bytes from the page's starting address as can be seen
> above which is set as skb->head and skb->data by build_skb later. Depending
> on what's done with the skb (when it's freed most often) we get all kinds
> of corruptions and BUG_ON() triggers in mm[2]. We have to recalculate
> the new headroom after the xdp program has run, similar to how offset
> and len are recalculated. Headroom is directly related to
> data_hard_start, data and data_meta, so we use them to get the new size.
> The result is correct (similar pr_err() in page_to_skb, one case of
> xdp_page and one case of virtnet buf):
>  a) Case with 4 bytes of metadata
>  [  115.949641] page_to_skb: page addr ffff8b4dcfad2000 offset 252 headroom 252
>  [  121.084105] page_to_skb: page addr ffff8b4dcf018000 offset 20732 headroom 252
>  b) Case of pushing data +32 bytes
>  [  153.181401] page_to_skb: page addr ffff8b4dd0c4d000 offset 288 headroom 288
>  [  158.480421] page_to_skb: page addr ffff8b4dd00b0000 offset 24864 headroom 288
>  c) Case of pushing data -33 bytes
>  [  835.906830] page_to_skb: page addr ffff8b4dd3270000 offset 223 headroom 223
>  [  840.839910] page_to_skb: page addr ffff8b4dcdd68000 offset 12511 headroom 223
> 
> Offset and headroom are equal because offset points to the start of
> reserved bytes for the virtio_net header which are at buf start +
> headroom, while data points at buf start + vnet hdr size + headroom so
> when data or data_meta are adjusted by the xdp prog both the headroom size
> and the offset change equally. We can use data_hard_start to compute the
> new headroom after the xdp prog (linearized / page start case, the
> virtnet buf case is similar just with bigger base offset):
>  xdp.data_hard_start = page_address + vnet_hdr
>  xdp.data = page_address + vnet_hdr + headroom
>  new headroom after xdp prog = xdp.data - xdp.data_hard_start - metasize
> 
> An example reproducer xdp prog[3] is below.
> 
> [1] https://github.com/cilium/cilium/issues/19453
> 
> [2] Two of the many traces:
>  [   40.437400] BUG: Bad page state in process swapper/0  pfn:14940
>  [   40.916726] BUG: Bad page state in process systemd-resolve  pfn:053b7
>  [   41.300891] kernel BUG at include/linux/mm.h:720!
>  [   41.301801] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>  [   41.302784] CPU: 1 PID: 1181 Comm: kubelet Kdump: loaded Tainted: G    B   W         5.18.0-rc1+ #37
>  [   41.304458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
>  [   41.306018] RIP: 0010:page_frag_free+0x79/0xe0
>  [   41.306836] Code: 00 00 75 ea 48 8b 07 a9 00 00 01 00 74 e0 48 8b 47 48 48 8d 50 ff a8 01 48 0f 45 fa eb d0 48 c7 c6 18 b8 30 a6 e8 d7 f8 fc ff <0f> 0b 48 8d 78 ff eb bc 48 8b 07 a9 00 00 01 00 74 3a 66 90 0f b6
>  [   41.310235] RSP: 0018:ffffac05c2a6bc78 EFLAGS: 00010292
>  [   41.311201] RAX: 000000000000003e RBX: 0000000000000000 RCX: 0000000000000000
>  [   41.312502] RDX: 0000000000000001 RSI: ffffffffa6423004 RDI: 00000000ffffffff
>  [   41.313794] RBP: ffff993c98823600 R08: 0000000000000000 R09: 00000000ffffdfff
>  [   41.315089] R10: ffffac05c2a6ba68 R11: ffffffffa698ca28 R12: ffff993c98823600
>  [   41.316398] R13: ffff993c86311ebc R14: 0000000000000000 R15: 000000000000005c
>  [   41.317700] FS:  00007fe13fc56740(0000) GS:ffff993cdd900000(0000) knlGS:0000000000000000
>  [   41.319150] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  [   41.320152] CR2: 000000c00008a000 CR3: 0000000014908000 CR4: 0000000000350ee0
>  [   41.321387] Call Trace:
>  [   41.321819]  <TASK>
>  [   41.322193]  skb_release_data+0x13f/0x1c0
>  [   41.322902]  __kfree_skb+0x20/0x30
>  [   41.343870]  tcp_recvmsg_locked+0x671/0x880
>  [   41.363764]  tcp_recvmsg+0x5e/0x1c0
>  [   41.384102]  inet_recvmsg+0x42/0x100
>  [   41.406783]  ? sock_recvmsg+0x1d/0x70
>  [   41.428201]  sock_read_iter+0x84/0xd0
>  [   41.445592]  ? 0xffffffffa3000000
>  [   41.462442]  new_sync_read+0x148/0x160
>  [   41.479314]  ? 0xffffffffa3000000
>  [   41.496937]  vfs_read+0x138/0x190
>  [   41.517198]  ksys_read+0x87/0xc0
>  [   41.535336]  do_syscall_64+0x3b/0x90
>  [   41.551637]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>  [   41.568050] RIP: 0033:0x48765b
>  [   41.583955] Code: e8 4a 35 fe ff eb 88 cc cc cc cc cc cc cc cc e8 fb 7a fe ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
>  [   41.632818] RSP: 002b:000000c000a2f5b8 EFLAGS: 00000212 ORIG_RAX: 0000000000000000
>  [   41.664588] RAX: ffffffffffffffda RBX: 000000c000062000 RCX: 000000000048765b
>  [   41.681205] RDX: 0000000000005e54 RSI: 000000c000e66000 RDI: 0000000000000016
>  [   41.697164] RBP: 000000c000a2f608 R08: 0000000000000001 R09: 00000000000001b4
>  [   41.713034] R10: 00000000000000b6 R11: 0000000000000212 R12: 00000000000000e9
>  [   41.728755] R13: 0000000000000001 R14: 000000c000a92000 R15: ffffffffffffffff
>  [   41.744254]  </TASK>
>  [   41.758585] Modules linked in: br_netfilter bridge veth netconsole virtio_net
> 
>  and
> 
>  [   33.524802] BUG: Bad page state in process systemd-network  pfn:11e60
>  [   33.528617] page ffffe05dc0147b00 ffffe05dc04e7a00 ffff8ae9851ec000 (1) len 82 offset 252 metasize 4 hroom 0 hdr_len 12 data ffff8ae9851ec10c data_meta ffff8ae9851ec108 data_end ffff8ae9851ec14e
>  [   33.529764] page:000000003792b5ba refcount:0 mapcount:-512 mapping:0000000000000000 index:0x0 pfn:0x11e60
>  [   33.532463] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
>  [   33.532468] raw: 000fffffc0000000 0000000000000000 dead000000000122 0000000000000000
>  [   33.532470] raw: 0000000000000000 0000000000000000 00000000fffffdff 0000000000000000
>  [   33.532471] page dumped because: nonzero mapcount
>  [   33.532472] Modules linked in: br_netfilter bridge veth netconsole virtio_net
>  [   33.532479] CPU: 0 PID: 791 Comm: systemd-network Kdump: loaded Not tainted 5.18.0-rc1+ #37
>  [   33.532482] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
>  [   33.532484] Call Trace:
>  [   33.532496]  <TASK>
>  [   33.532500]  dump_stack_lvl+0x45/0x5a
>  [   33.532506]  bad_page.cold+0x63/0x94
>  [   33.532510]  free_pcp_prepare+0x290/0x420
>  [   33.532515]  free_unref_page+0x1b/0x100
>  [   33.532518]  skb_release_data+0x13f/0x1c0
>  [   33.532524]  kfree_skb_reason+0x3e/0xc0
>  [   33.532527]  ip6_mc_input+0x23c/0x2b0
>  [   33.532531]  ip6_sublist_rcv_finish+0x83/0x90
>  [   33.532534]  ip6_sublist_rcv+0x22b/0x2b0
> 
> [3] XDP program to reproduce(xdp_pass.c):
>  #include <linux/bpf.h>
>  #include <bpf/bpf_helpers.h>
> 
>  SEC("xdp_pass")
>  int xdp_pkt_pass(struct xdp_md *ctx)
>  {
>           bpf_xdp_adjust_head(ctx, -(int)32);
>           return XDP_PASS;
>  }
> 
>  char _license[] SEC("license") = "GPL";
> 
>  compile: clang -O2 -g -Wall -target bpf -c xdp_pass.c -o xdp_pass.o
>  load on virtio_net: ip link set enp1s0 xdpdrv obj xdp_pass.o sec xdp_pass
> 
> CC: stable@vger.kernel.org
> CC: Jason Wang <jasowang@redhat.com>
> CC: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> CC: Daniel Borkmann <daniel@iogearbox.net>
> CC: "Michael S. Tsirkin" <mst@redhat.com>
> CC: virtualization@lists.linux-foundation.org
> Fixes: 8fb7da9e9907 ("virtio_net: get build_skb() buf by data ptr")
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>

Acked-by: Michael S. Tsirkin <mst@redhat.com>
> ---
> v3: Add a comment explaining why offset and headroom are equal,
>     no code changes
> v2: Recalculate headroom based on data, data_hard_start and data_meta
> 
>  drivers/net/virtio_net.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 87838cbe38cf..cbba9d2e8f32 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1005,6 +1005,24 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  			 * xdp.data_meta were adjusted
>  			 */
>  			len = xdp.data_end - xdp.data + vi->hdr_len + metasize;
> +
> +			/* recalculate headroom if xdp.data or xdp_data_meta
> +			 * were adjusted, note that offset should always point
> +			 * to the start of the reserved bytes for virtio_net
> +			 * header which are followed by xdp.data, that means
> +			 * that offset is equal to the headroom (when buf is
> +			 * starting at the beginning of the page, otherwise
> +			 * there is a base offset inside the page) but it's used
> +			 * with a different starting point (buf start) than
> +			 * xdp.data (buf start + vnet hdr size). If xdp.data or
> +			 * data_meta were adjusted by the xdp prog then the
> +			 * headroom size has changed and so has the offset, we
> +			 * can use data_hard_start, which points at buf start +
> +			 * vnet hdr size, to calculate the new headroom and use
> +			 * it later to compute buf start in page_to_skb()
> +			 */
> +			headroom = xdp.data - xdp.data_hard_start - metasize;
> +
>  			/* We can only create skb based on xdp_page. */
>  			if (unlikely(xdp_page != page)) {
>  				rcu_read_unlock();
> @@ -1012,7 +1030,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  				head_skb = page_to_skb(vi, rq, xdp_page, offset,
>  						       len, PAGE_SIZE, false,
>  						       metasize,
> -						       VIRTIO_XDP_HEADROOM);
> +						       headroom);
>  				return head_skb;
>  			}
>  			break;
> -- 
> 2.35.1


^ permalink raw reply

* Re: [PATCHv2 net-next] net/af_packet: add VLAN support for AF_PACKET SOCK_RAW GSO
From: Willem de Bruijn @ 2022-04-25 13:49 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Michael S . Tsirkin, Jason Wang, David S . Miller,
	Jakub Kicinski, Paolo Abeni, Maxim Mikityanskiy, Willem de Bruijn,
	Balazs Nemeth, Mike Pattrick, Eric Dumazet
In-Reply-To: <20220425014502.985464-1-liuhangbin@gmail.com>

On Sun, Apr 24, 2022 at 10:01 PM Hangbin Liu <liuhangbin@gmail.com> wrote:
>
> Currently, the kernel drops GSO VLAN tagged packet if it's created with
> socket(AF_PACKET, SOCK_RAW, 0) plus virtio_net_hdr.
>
> The reason is AF_PACKET doesn't adjust the skb network header if there is
> a VLAN tag. Then after virtio_net_hdr_set_proto() called, the skb->protocol
> will be set to ETH_P_IP/IPv6. And in later inet/ipv6_gso_segment() the skb
> is dropped as network header position is invalid.
>
> Let's handle VLAN packets by adjusting network header position in
> packet_parse_headers(). The adjustment is safe and does not affect the
> later xmit as tap device also did that.
>
> In packet_snd(), packet_parse_headers() need to be moved before calling
> virtio_net_hdr_set_proto(), so we can set correct skb->protocol and
> network header first.
>
> There is no need to update tpacket_snd() as it calls packet_parse_headers()
> in tpacket_fill_skb(), which is already before calling virtio_net_hdr_*
> functions.
>
> skb->no_fcs setting is also moved upper to make all skb settings together
> and keep consistency with function packet_sendmsg_spkt().
>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>

Acked-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply

* Re: [PATCH bpf-next 1/4] libbpf: Define DEFAULT_BPFFS
From: Daniel Borkmann @ 2022-04-25 13:47 UTC (permalink / raw)
  To: Yafang Shao, ast, andrii, kafai, songliubraving, yhs,
	john.fastabend, kpsingh
  Cc: netdev, bpf
In-Reply-To: <20220423140058.54414-2-laoar.shao@gmail.com>

On 4/23/22 4:00 PM, Yafang Shao wrote:
> Let's use a macro DEFAULT_BPFFS instead of the hard-coded "/sys/fs/bpf".
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
[...]
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index cdbfee60ea3e..3784867811a4 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -28,6 +28,8 @@ LIBBPF_API __u32 libbpf_major_version(void);
>   LIBBPF_API __u32 libbpf_minor_version(void);
>   LIBBPF_API const char *libbpf_version_string(void);
>   
> +#define DEFAULT_BPFFS "/sys/fs/bpf"

Small nit, but given this is included in a lot of places potentially, it should
have a LIBBPF_ prefix to avoid collisions. Maybe: LIBBPF_BPFFS_DEFAULT_MNT

>   enum libbpf_errno {
>   	__LIBBPF_ERRNO__START = 4000,
>   
> @@ -91,7 +93,7 @@ struct bpf_object_open_opts {
>   	bool relaxed_core_relocs;
>   	/* maps that set the 'pinning' attribute in their definition will have
>   	 * their pin_path attribute set to a file in this directory, and be
> -	 * auto-pinned to that path on load; defaults to "/sys/fs/bpf".
> +	 * auto-pinned to that path on load; defaults to DEFAULT_BPFFS.
>   	 */
>   	const char *pin_root_path;
>   
> @@ -190,7 +192,7 @@ bpf_object__open_xattr(struct bpf_object_open_attr *attr);
>   
>   enum libbpf_pin_type {
>   	LIBBPF_PIN_NONE,
> -	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
> +	/* PIN_BY_NAME: pin maps by name (in DEFAULT_BPFFS by default) */
>   	LIBBPF_PIN_BY_NAME,
>   };
>   
> 


^ permalink raw reply

* Re: [PATCH v5 09/11] net: ieee802154: atusb: Call _xmit_error() when a transmission fails
From: Alexander Aring @ 2022-04-25 13:43 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Stefan Schmidt, linux-wpan - ML, David S. Miller, Jakub Kicinski,
	open list:NETWORKING [GENERAL], David Girault, Romuald Despres,
	Frederic Blain, Nicolas Schodet, Thomas Petazzoni
In-Reply-To: <20220425151644.01dd47f4@xps13>

Hi,

On Mon, Apr 25, 2022 at 9:16 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hi Alexander,
>
> alex.aring@gmail.com wrote on Mon, 25 Apr 2022 09:05:49 -0400:
>
> > Hi,
> >
> > On Mon, Apr 25, 2022 at 8:35 AM Alexander Aring <alex.aring@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > On Thu, Apr 7, 2022 at 4:06 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > > >
> > > > Hi Alexander,
> > > >
> > > > alex.aring@gmail.com wrote on Wed, 6 Apr 2022 17:58:59 -0400:
> > > >
> > > > > Hi,
> > > > >
> > > > > On Wed, Apr 6, 2022 at 11:34 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > > > > >
> > > > > > ieee802154_xmit_error() is the right helper to call when a transmission
> > > > > > has failed. Let's use it instead of open-coding it.
> > > > > >
> > > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > > > > ---
> > > > > >  drivers/net/ieee802154/atusb.c | 5 ++---
> > > > > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/ieee802154/atusb.c b/drivers/net/ieee802154/atusb.c
> > > > > > index f27a5f535808..d04db4d07a64 100644
> > > > > > --- a/drivers/net/ieee802154/atusb.c
> > > > > > +++ b/drivers/net/ieee802154/atusb.c
> > > > > > @@ -271,9 +271,8 @@ static void atusb_tx_done(struct atusb *atusb, u8 seq)
> > > > > >                  * unlikely case now that seq == expect is then true, but can
> > > > > >                  * happen and fail with a tx_skb = NULL;
> > > > > >                  */
> > > > > > -               ieee802154_wake_queue(atusb->hw);
> > > > > > -               if (atusb->tx_skb)
> > > > > > -                       dev_kfree_skb_irq(atusb->tx_skb);
> > > > > > +               ieee802154_xmit_error(atusb->hw, atusb->tx_skb,
> > > > > > +                                     IEEE802154_SYSTEM_ERROR);
> > > > >
> > > > > That should then call the xmit_error for ANY other reason which is not
> > > > > 802.15.4 specific which is the bus_error() function?
> > > >
> > > > I'll drop the bus error function so we can stick to a regular
> > > > _xmit_error() call.
> > > >
> > >
> > > Okay, this error is only hardware related.
> > >
> > > > Besides, we do not have any trac information nor any easy access to
> > > > what failed exactly, so it's probably best anyway.
> > >
> > > This is correct, Somebody needs to write support for it for atusb firmware. [0]
> > > However some transceivers can't yet or will never (because lack of
> > > functionality?) report such errors back... they will act a little bit
> > > weird.
> > >
> > > However, this return value is a BIG step moving into the right
> > > direction that other drivers can follow.
> > >
> > > I think for MLME ops we can definitely handle some transmit errors now
> > > and retry so that we don't wait for anything when we know the
> > > transceiver was never submitting.
> > >
> >
> > s/submitting/transmitted/
> >
> > I could more deeper into that topic:
> >
> > 1.
> >
> > In my opinion this result value was especially necessary for MLME-ops,
> > for others which do not directly work with MAC... they provide an
> > upper layer protocol if they want something reliable.
> >
> > 2.
> >
> > Later on we can do statistics like what was already going around in
> > the linux-wpan community to have something like whatever dump to see
> > all neighbors and see how many ack failures there, etc. Some people
> > want to make some predictions about link quality here. The kernel
> > should therefore only capture some stats and the $WHATEVER userspace
> > capable netlink monitor daemon should make some heuristic by dumping
> > those stats.
>
> I like the idea of having a per-device dump of the stats. It would be
> really straightforward to implement with the current scan
> implementation that I am about to share. We already have a per PAN
> structure (with information like ID, channel, last time it was seen,
> strength, etc). We might improve this structure with counters for all
> the common mac errors. Maybe an option to the "pans dump" command
> (again, in the pipe) might be a good start to get all the stats, like
> "pans dump [stats]". I'll keep this in mind.
>

sounds to me like a per pan filtering use case... which can be
implemented in the kernel, but nowadays there might be more generic
ways of filtering out dumps in the kernel (eBPF?). However we need to
talk about this again if there are some patches. But yes there should
be plenty of information about neighbors whichever frame was received
(which also includes transmission case in that way if an ACK was
received or not if ack request bit was set, channel access failures,
etc?). There exists a difference between if we know some address
information (MAC) or it's PHY related like channel access failure.
Most people are interested in per node information (we have address
information).

> > 3.
> >
> > Sometimes even IP capable protocols (using 6LoWPAN) want to know if
> > ack was received or not, as mentioned. But this required additional
> > handling in the socket layers... I didn't look into that topic yet but
> > I know wireless solved it because they have some similar problems.
>
> I did not look at the upper layers yet, but that would indeed be a nice
> use case of these MAC statuses.
>

I am a little bit worried about that, because at some point we run in
fragmentation of IPv6/6LoWPAN, and then one packet gets into several
802.15.4 frames... whereas such information is per frame. If all has
the same result - no problems. There is a special handling needed
which need to be documented well, e.g. I would say if one frame had no
ack back I would say it should report back to user space that no ack
came back... if possible provide detailed information but I think that
isn't easier to get back to user space than just a bit if ack was back
or not.

Again this is something where a socket error queue needs to be involved.

- Alex

^ permalink raw reply

* Re: Offloading Priority Tables for queue classification
From: Maxime Chevallier @ 2022-04-25 13:37 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev@vger.kernel.org, Jakub Kicinski, Vinicius Costa Gomes,
	Thomas Petazzoni, Allan.Nielsen@microchip.com
In-Reply-To: <20220415154229.pmzgkgvlau5mftkp@skbuf>

Hi Vladimir !

On Fri, 15 Apr 2022 15:42:29 +0000
Vladimir Oltean <vladimir.oltean@nxp.com> wrote:

> Hi Maxime!
> 
> On Fri, Apr 15, 2022 at 05:37:18PM +0200, Maxime Chevallier wrote:
> > So in the end, my question would be : Is it OK to mix all of these
> > together ? Using the dcb layer to configure the internal mapping of
> > traffic -> priority, then using mqprio to configure the priority ->
> > queue mapping, and then either TC again or ethtool do configure the
> > behaviour of the queues themselves ? Or is there some other way that
> > we've missed ?  
> 
> I think it's ok to mix all of those together. At least the
> ocelot/felix DSA switches do support both advance QoS classification
> using tc-flower
> + skbedit priority action, and basic QoS classification (port-based or
> IP DSCP based) using the dcbnl application table. In short, at the end
> of the QoS classification process, a traffic class for the packet is
> selected. Then, the frame preemption would operate on the packet's
> traffic class.

Thank you very much for that answer, it helps a lot. TBH when digging
into classification, especially with TC, it's a bit overwhelming and it
gets difficult pretty quickly to get what's the right approach.

> Do you have any particular concerns?

My concerns were mainly about not reinventing the wheel, but also
making sure that I have the correct understanding on these
classification steps. I'll make sure to CC you when I have a first
series implementing that.

Best regards,

Maxime

^ permalink raw reply

* [PATCH bpf-next v2] bpf: compute map_btf_id during build time
From: menglong8.dong @ 2022-04-25 13:32 UTC (permalink / raw)
  To: ast
  Cc: rostedt, mingo, davem, yoshfuji, dsahern, kuba, pabeni, benbjiang,
	flyingpeng, imagedong, edumazet, kafai, talalahmad, keescook,
	mengensun, dongli.zhang, linux-kernel, netdev, kernel test robot

From: Menglong Dong <imagedong@tencent.com>

For now, the field 'map_btf_id' in 'struct bpf_map_ops' for all map
types are computed during vmlinux-btf init:

  btf_parse_vmlinux() -> btf_vmlinux_map_ids_init()

It will lookup the btf_type according to the 'map_btf_name' field in
'struct bpf_map_ops'. This process can be done during build time,
thanks to Jiri's resolve_btfids.

selftest of map_ptr has passed:

  $96 map_ptr:OK
  Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reported-by: kernel test robot <lkp@intel.com>
---
v2:
- remove unused btf_vmlinux_map_ops
---
 include/linux/bpf.h            |  3 +--
 kernel/bpf/arraymap.c          | 26 +++++++---------------
 kernel/bpf/bloom_filter.c      |  6 ++---
 kernel/bpf/bpf_inode_storage.c |  6 ++---
 kernel/bpf/bpf_struct_ops.c    |  6 ++---
 kernel/bpf/bpf_task_storage.c  |  5 ++---
 kernel/bpf/btf.c               | 40 ----------------------------------
 kernel/bpf/cpumap.c            |  6 ++---
 kernel/bpf/devmap.c            | 10 ++++-----
 kernel/bpf/hashtab.c           | 22 ++++++-------------
 kernel/bpf/local_storage.c     |  7 +++---
 kernel/bpf/lpm_trie.c          |  6 ++---
 kernel/bpf/queue_stack_maps.c  | 10 ++++-----
 kernel/bpf/reuseport_array.c   |  6 ++---
 kernel/bpf/ringbuf.c           |  6 ++---
 kernel/bpf/stackmap.c          |  5 ++---
 net/core/bpf_sk_storage.c      |  5 ++---
 net/core/sock_map.c            | 10 ++++-----
 net/xdp/xskmap.c               |  6 ++---
 19 files changed, 62 insertions(+), 129 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bdb5298735ce..8383e756188e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -147,8 +147,7 @@ struct bpf_map_ops {
 				     bpf_callback_t callback_fn,
 				     void *callback_ctx, u64 flags);
 
-	/* BTF name and id of struct allocated by map_alloc */
-	const char * const map_btf_name;
+	/* BTF id of struct allocated by map_alloc */
 	int *map_btf_id;
 
 	/* bpf_iter info used to open a seq_file */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 7f145aefbff8..75f256f3f9ec 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -11,6 +11,7 @@
 #include <linux/perf_event.h>
 #include <uapi/linux/btf.h>
 #include <linux/rcupdate_trace.h>
+#include <linux/btf_ids.h>
 
 #include "map_in_map.h"
 
@@ -680,7 +681,7 @@ static int bpf_for_each_array_elem(struct bpf_map *map, bpf_callback_t callback_
 	return num_elems;
 }
 
-static int array_map_btf_id;
+BTF_ID_LIST_SINGLE(array_map_btf_ids, struct, bpf_array)
 const struct bpf_map_ops array_map_ops = {
 	.map_meta_equal = array_map_meta_equal,
 	.map_alloc_check = array_map_alloc_check,
@@ -701,12 +702,10 @@ const struct bpf_map_ops array_map_ops = {
 	.map_update_batch = generic_map_update_batch,
 	.map_set_for_each_callback_args = map_set_for_each_callback_args,
 	.map_for_each_callback = bpf_for_each_array_elem,
-	.map_btf_name = "bpf_array",
-	.map_btf_id = &array_map_btf_id,
+	.map_btf_id = &array_map_btf_ids[0],
 	.iter_seq_info = &iter_seq_info,
 };
 
-static int percpu_array_map_btf_id;
 const struct bpf_map_ops percpu_array_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc_check = array_map_alloc_check,
@@ -722,8 +721,7 @@ const struct bpf_map_ops percpu_array_map_ops = {
 	.map_update_batch = generic_map_update_batch,
 	.map_set_for_each_callback_args = map_set_for_each_callback_args,
 	.map_for_each_callback = bpf_for_each_array_elem,
-	.map_btf_name = "bpf_array",
-	.map_btf_id = &percpu_array_map_btf_id,
+	.map_btf_id = &array_map_btf_ids[0],
 	.iter_seq_info = &iter_seq_info,
 };
 
@@ -1102,7 +1100,6 @@ static void prog_array_map_free(struct bpf_map *map)
  * Thus, prog_array_map cannot be used as an inner_map
  * and map_meta_equal is not implemented.
  */
-static int prog_array_map_btf_id;
 const struct bpf_map_ops prog_array_map_ops = {
 	.map_alloc_check = fd_array_map_alloc_check,
 	.map_alloc = prog_array_map_alloc,
@@ -1118,8 +1115,7 @@ const struct bpf_map_ops prog_array_map_ops = {
 	.map_fd_sys_lookup_elem = prog_fd_array_sys_lookup_elem,
 	.map_release_uref = prog_array_map_clear,
 	.map_seq_show_elem = prog_array_map_seq_show_elem,
-	.map_btf_name = "bpf_array",
-	.map_btf_id = &prog_array_map_btf_id,
+	.map_btf_id = &array_map_btf_ids[0],
 };
 
 static struct bpf_event_entry *bpf_event_entry_gen(struct file *perf_file,
@@ -1208,7 +1204,6 @@ static void perf_event_fd_array_map_free(struct bpf_map *map)
 	fd_array_map_free(map);
 }
 
-static int perf_event_array_map_btf_id;
 const struct bpf_map_ops perf_event_array_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc_check = fd_array_map_alloc_check,
@@ -1221,8 +1216,7 @@ const struct bpf_map_ops perf_event_array_map_ops = {
 	.map_fd_put_ptr = perf_event_fd_array_put_ptr,
 	.map_release = perf_event_fd_array_release,
 	.map_check_btf = map_check_no_btf,
-	.map_btf_name = "bpf_array",
-	.map_btf_id = &perf_event_array_map_btf_id,
+	.map_btf_id = &array_map_btf_ids[0],
 };
 
 #ifdef CONFIG_CGROUPS
@@ -1245,7 +1239,6 @@ static void cgroup_fd_array_free(struct bpf_map *map)
 	fd_array_map_free(map);
 }
 
-static int cgroup_array_map_btf_id;
 const struct bpf_map_ops cgroup_array_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc_check = fd_array_map_alloc_check,
@@ -1257,8 +1250,7 @@ const struct bpf_map_ops cgroup_array_map_ops = {
 	.map_fd_get_ptr = cgroup_fd_array_get_ptr,
 	.map_fd_put_ptr = cgroup_fd_array_put_ptr,
 	.map_check_btf = map_check_no_btf,
-	.map_btf_name = "bpf_array",
-	.map_btf_id = &cgroup_array_map_btf_id,
+	.map_btf_id = &array_map_btf_ids[0],
 };
 #endif
 
@@ -1332,7 +1324,6 @@ static int array_of_map_gen_lookup(struct bpf_map *map,
 	return insn - insn_buf;
 }
 
-static int array_of_maps_map_btf_id;
 const struct bpf_map_ops array_of_maps_map_ops = {
 	.map_alloc_check = fd_array_map_alloc_check,
 	.map_alloc = array_of_map_alloc,
@@ -1345,6 +1336,5 @@ const struct bpf_map_ops array_of_maps_map_ops = {
 	.map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem,
 	.map_gen_lookup = array_of_map_gen_lookup,
 	.map_check_btf = map_check_no_btf,
-	.map_btf_name = "bpf_array",
-	.map_btf_id = &array_of_maps_map_btf_id,
+	.map_btf_id = &array_map_btf_ids[0],
 };
diff --git a/kernel/bpf/bloom_filter.c b/kernel/bpf/bloom_filter.c
index b141a1346f72..b9ea539a5561 100644
--- a/kernel/bpf/bloom_filter.c
+++ b/kernel/bpf/bloom_filter.c
@@ -7,6 +7,7 @@
 #include <linux/err.h>
 #include <linux/jhash.h>
 #include <linux/random.h>
+#include <linux/btf_ids.h>
 
 #define BLOOM_CREATE_FLAG_MASK \
 	(BPF_F_NUMA_NODE | BPF_F_ZERO_SEED | BPF_F_ACCESS_MASK)
@@ -192,7 +193,7 @@ static int bloom_map_check_btf(const struct bpf_map *map,
 	return btf_type_is_void(key_type) ? 0 : -EINVAL;
 }
 
-static int bpf_bloom_map_btf_id;
+BTF_ID_LIST_SINGLE(bpf_bloom_map_btf_ids, struct, bpf_bloom_filter)
 const struct bpf_map_ops bloom_filter_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc = bloom_map_alloc,
@@ -205,6 +206,5 @@ const struct bpf_map_ops bloom_filter_map_ops = {
 	.map_update_elem = bloom_map_update_elem,
 	.map_delete_elem = bloom_map_delete_elem,
 	.map_check_btf = bloom_map_check_btf,
-	.map_btf_name = "bpf_bloom_filter",
-	.map_btf_id = &bpf_bloom_map_btf_id,
+	.map_btf_id = &bpf_bloom_map_btf_ids[0],
 };
diff --git a/kernel/bpf/bpf_inode_storage.c b/kernel/bpf/bpf_inode_storage.c
index 96be8d518885..703bda2eda3c 100644
--- a/kernel/bpf/bpf_inode_storage.c
+++ b/kernel/bpf/bpf_inode_storage.c
@@ -245,7 +245,8 @@ static void inode_storage_map_free(struct bpf_map *map)
 	bpf_local_storage_map_free(smap, NULL);
 }
 
-static int inode_storage_map_btf_id;
+BTF_ID_LIST_SINGLE(inode_storage_map_btf_ids, struct,
+		   bpf_local_storage_map)
 const struct bpf_map_ops inode_storage_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc_check = bpf_local_storage_map_alloc_check,
@@ -256,8 +257,7 @@ const struct bpf_map_ops inode_storage_map_ops = {
 	.map_update_elem = bpf_fd_inode_storage_update_elem,
 	.map_delete_elem = bpf_fd_inode_storage_delete_elem,
 	.map_check_btf = bpf_local_storage_map_check_btf,
-	.map_btf_name = "bpf_local_storage_map",
-	.map_btf_id = &inode_storage_map_btf_id,
+	.map_btf_id = &inode_storage_map_btf_ids[0],
 	.map_owner_storage_ptr = inode_storage_ptr,
 };
 
diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c
index 21069dbe9138..f29619dfa72d 100644
--- a/kernel/bpf/bpf_struct_ops.c
+++ b/kernel/bpf/bpf_struct_ops.c
@@ -10,6 +10,7 @@
 #include <linux/seq_file.h>
 #include <linux/refcount.h>
 #include <linux/mutex.h>
+#include <linux/btf_ids.h>
 
 enum bpf_struct_ops_state {
 	BPF_STRUCT_OPS_STATE_INIT,
@@ -612,7 +613,7 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
 	return map;
 }
 
-static int bpf_struct_ops_map_btf_id;
+BTF_ID_LIST_SINGLE(bpf_struct_ops_map_btf_ids, struct, bpf_struct_ops_map)
 const struct bpf_map_ops bpf_struct_ops_map_ops = {
 	.map_alloc_check = bpf_struct_ops_map_alloc_check,
 	.map_alloc = bpf_struct_ops_map_alloc,
@@ -622,8 +623,7 @@ const struct bpf_map_ops bpf_struct_ops_map_ops = {
 	.map_delete_elem = bpf_struct_ops_map_delete_elem,
 	.map_update_elem = bpf_struct_ops_map_update_elem,
 	.map_seq_show_elem = bpf_struct_ops_map_seq_show_elem,
-	.map_btf_name = "bpf_struct_ops_map",
-	.map_btf_id = &bpf_struct_ops_map_btf_id,
+	.map_btf_id = &bpf_struct_ops_map_btf_ids[0],
 };
 
 /* "const void *" because some subsystem is
diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
index 6638a0ecc3d2..e8106df3a09e 100644
--- a/kernel/bpf/bpf_task_storage.c
+++ b/kernel/bpf/bpf_task_storage.c
@@ -307,7 +307,7 @@ static void task_storage_map_free(struct bpf_map *map)
 	bpf_local_storage_map_free(smap, &bpf_task_storage_busy);
 }
 
-static int task_storage_map_btf_id;
+BTF_ID_LIST_SINGLE(task_storage_map_btf_ids, struct, bpf_local_storage_map)
 const struct bpf_map_ops task_storage_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc_check = bpf_local_storage_map_alloc_check,
@@ -318,8 +318,7 @@ const struct bpf_map_ops task_storage_map_ops = {
 	.map_update_elem = bpf_pid_task_storage_update_elem,
 	.map_delete_elem = bpf_pid_task_storage_delete_elem,
 	.map_check_btf = bpf_local_storage_map_check_btf,
-	.map_btf_name = "bpf_local_storage_map",
-	.map_btf_id = &task_storage_map_btf_id,
+	.map_btf_id = &task_storage_map_btf_ids[0],
 	.map_owner_storage_ptr = task_storage_ptr,
 };
 
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 0918a39279f6..6bb04e5c1da7 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -4716,41 +4716,6 @@ btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf,
 	return ctx_type;
 }
 
-static const struct bpf_map_ops * const btf_vmlinux_map_ops[] = {
-#define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type)
-#define BPF_LINK_TYPE(_id, _name)
-#define BPF_MAP_TYPE(_id, _ops) \
-	[_id] = &_ops,
-#include <linux/bpf_types.h>
-#undef BPF_PROG_TYPE
-#undef BPF_LINK_TYPE
-#undef BPF_MAP_TYPE
-};
-
-static int btf_vmlinux_map_ids_init(const struct btf *btf,
-				    struct bpf_verifier_log *log)
-{
-	const struct bpf_map_ops *ops;
-	int i, btf_id;
-
-	for (i = 0; i < ARRAY_SIZE(btf_vmlinux_map_ops); ++i) {
-		ops = btf_vmlinux_map_ops[i];
-		if (!ops || (!ops->map_btf_name && !ops->map_btf_id))
-			continue;
-		if (!ops->map_btf_name || !ops->map_btf_id) {
-			bpf_log(log, "map type %d is misconfigured\n", i);
-			return -EINVAL;
-		}
-		btf_id = btf_find_by_name_kind(btf, ops->map_btf_name,
-					       BTF_KIND_STRUCT);
-		if (btf_id < 0)
-			return btf_id;
-		*ops->map_btf_id = btf_id;
-	}
-
-	return 0;
-}
-
 static int btf_translate_to_vmlinux(struct bpf_verifier_log *log,
 				     struct btf *btf,
 				     const struct btf_type *t,
@@ -4812,11 +4777,6 @@ struct btf *btf_parse_vmlinux(void)
 	/* btf_parse_vmlinux() runs under bpf_verifier_lock */
 	bpf_ctx_convert.t = btf_type_by_id(btf, bpf_ctx_convert_btf_id[0]);
 
-	/* find bpf map structs for map_ptr access checking */
-	err = btf_vmlinux_map_ids_init(btf, log);
-	if (err < 0)
-		goto errout;
-
 	bpf_struct_ops_init(btf, log);
 
 	refcount_set(&btf->refcnt, 1);
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 650e5d21f90d..f4860ac756cd 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -27,6 +27,7 @@
 #include <linux/kthread.h>
 #include <linux/capability.h>
 #include <trace/events/xdp.h>
+#include <linux/btf_ids.h>
 
 #include <linux/netdevice.h>   /* netif_receive_skb_list */
 #include <linux/etherdevice.h> /* eth_type_trans */
@@ -673,7 +674,7 @@ static int cpu_map_redirect(struct bpf_map *map, u32 ifindex, u64 flags)
 				      __cpu_map_lookup_elem);
 }
 
-static int cpu_map_btf_id;
+BTF_ID_LIST_SINGLE(cpu_map_btf_ids, struct, bpf_cpu_map)
 const struct bpf_map_ops cpu_map_ops = {
 	.map_meta_equal		= bpf_map_meta_equal,
 	.map_alloc		= cpu_map_alloc,
@@ -683,8 +684,7 @@ const struct bpf_map_ops cpu_map_ops = {
 	.map_lookup_elem	= cpu_map_lookup_elem,
 	.map_get_next_key	= cpu_map_get_next_key,
 	.map_check_btf		= map_check_no_btf,
-	.map_btf_name		= "bpf_cpu_map",
-	.map_btf_id		= &cpu_map_btf_id,
+	.map_btf_id		= &cpu_map_btf_ids[0],
 	.map_redirect		= cpu_map_redirect,
 };
 
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 038f6d7a83e4..c2867068e5bd 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -48,6 +48,7 @@
 #include <net/xdp.h>
 #include <linux/filter.h>
 #include <trace/events/xdp.h>
+#include <linux/btf_ids.h>
 
 #define DEV_CREATE_FLAG_MASK \
 	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
@@ -1005,7 +1006,7 @@ static int dev_hash_map_redirect(struct bpf_map *map, u32 ifindex, u64 flags)
 				      __dev_map_hash_lookup_elem);
 }
 
-static int dev_map_btf_id;
+BTF_ID_LIST_SINGLE(dev_map_btf_ids, struct, bpf_dtab)
 const struct bpf_map_ops dev_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc = dev_map_alloc,
@@ -1015,12 +1016,10 @@ const struct bpf_map_ops dev_map_ops = {
 	.map_update_elem = dev_map_update_elem,
 	.map_delete_elem = dev_map_delete_elem,
 	.map_check_btf = map_check_no_btf,
-	.map_btf_name = "bpf_dtab",
-	.map_btf_id = &dev_map_btf_id,
+	.map_btf_id = &dev_map_btf_ids[0],
 	.map_redirect = dev_map_redirect,
 };
 
-static int dev_map_hash_map_btf_id;
 const struct bpf_map_ops dev_map_hash_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc = dev_map_alloc,
@@ -1030,8 +1029,7 @@ const struct bpf_map_ops dev_map_hash_ops = {
 	.map_update_elem = dev_map_hash_update_elem,
 	.map_delete_elem = dev_map_hash_delete_elem,
 	.map_check_btf = map_check_no_btf,
-	.map_btf_name = "bpf_dtab",
-	.map_btf_id = &dev_map_hash_map_btf_id,
+	.map_btf_id = &dev_map_btf_ids[0],
 	.map_redirect = dev_hash_map_redirect,
 };
 
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 65877967f414..874e261b114d 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -10,6 +10,7 @@
 #include <linux/random.h>
 #include <uapi/linux/btf.h>
 #include <linux/rcupdate_trace.h>
+#include <linux/btf_ids.h>
 #include "percpu_freelist.h"
 #include "bpf_lru_list.h"
 #include "map_in_map.h"
@@ -2105,7 +2106,7 @@ static int bpf_for_each_hash_elem(struct bpf_map *map, bpf_callback_t callback_f
 	return num_elems;
 }
 
-static int htab_map_btf_id;
+BTF_ID_LIST_SINGLE(htab_map_btf_ids, struct, bpf_htab)
 const struct bpf_map_ops htab_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc_check = htab_map_alloc_check,
@@ -2122,12 +2123,10 @@ const struct bpf_map_ops htab_map_ops = {
 	.map_set_for_each_callback_args = map_set_for_each_callback_args,
 	.map_for_each_callback = bpf_for_each_hash_elem,
 	BATCH_OPS(htab),
-	.map_btf_name = "bpf_htab",
-	.map_btf_id = &htab_map_btf_id,
+	.map_btf_id = &htab_map_btf_ids[0],
 	.iter_seq_info = &iter_seq_info,
 };
 
-static int htab_lru_map_btf_id;
 const struct bpf_map_ops htab_lru_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc_check = htab_map_alloc_check,
@@ -2145,8 +2144,7 @@ const struct bpf_map_ops htab_lru_map_ops = {
 	.map_set_for_each_callback_args = map_set_for_each_callback_args,
 	.map_for_each_callback = bpf_for_each_hash_elem,
 	BATCH_OPS(htab_lru),
-	.map_btf_name = "bpf_htab",
-	.map_btf_id = &htab_lru_map_btf_id,
+	.map_btf_id = &htab_map_btf_ids[0],
 	.iter_seq_info = &iter_seq_info,
 };
 
@@ -2252,7 +2250,6 @@ static void htab_percpu_map_seq_show_elem(struct bpf_map *map, void *key,
 	rcu_read_unlock();
 }
 
-static int htab_percpu_map_btf_id;
 const struct bpf_map_ops htab_percpu_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc_check = htab_map_alloc_check,
@@ -2267,12 +2264,10 @@ const struct bpf_map_ops htab_percpu_map_ops = {
 	.map_set_for_each_callback_args = map_set_for_each_callback_args,
 	.map_for_each_callback = bpf_for_each_hash_elem,
 	BATCH_OPS(htab_percpu),
-	.map_btf_name = "bpf_htab",
-	.map_btf_id = &htab_percpu_map_btf_id,
+	.map_btf_id = &htab_map_btf_ids[0],
 	.iter_seq_info = &iter_seq_info,
 };
 
-static int htab_lru_percpu_map_btf_id;
 const struct bpf_map_ops htab_lru_percpu_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc_check = htab_map_alloc_check,
@@ -2287,8 +2282,7 @@ const struct bpf_map_ops htab_lru_percpu_map_ops = {
 	.map_set_for_each_callback_args = map_set_for_each_callback_args,
 	.map_for_each_callback = bpf_for_each_hash_elem,
 	BATCH_OPS(htab_lru_percpu),
-	.map_btf_name = "bpf_htab",
-	.map_btf_id = &htab_lru_percpu_map_btf_id,
+	.map_btf_id = &htab_map_btf_ids[0],
 	.iter_seq_info = &iter_seq_info,
 };
 
@@ -2412,7 +2406,6 @@ static void htab_of_map_free(struct bpf_map *map)
 	fd_htab_map_free(map);
 }
 
-static int htab_of_maps_map_btf_id;
 const struct bpf_map_ops htab_of_maps_map_ops = {
 	.map_alloc_check = fd_htab_map_alloc_check,
 	.map_alloc = htab_of_map_alloc,
@@ -2425,6 +2418,5 @@ const struct bpf_map_ops htab_of_maps_map_ops = {
 	.map_fd_sys_lookup_elem = bpf_map_fd_sys_lookup_elem,
 	.map_gen_lookup = htab_of_map_gen_lookup,
 	.map_check_btf = map_check_no_btf,
-	.map_btf_name = "bpf_htab",
-	.map_btf_id = &htab_of_maps_map_btf_id,
+	.map_btf_id = &htab_map_btf_ids[0],
 };
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 497916060ac7..8654fc97f5fe 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -9,6 +9,7 @@
 #include <linux/rbtree.h>
 #include <linux/slab.h>
 #include <uapi/linux/btf.h>
+#include <linux/btf_ids.h>
 
 #ifdef CONFIG_CGROUP_BPF
 
@@ -446,7 +447,8 @@ static void cgroup_storage_seq_show_elem(struct bpf_map *map, void *key,
 	rcu_read_unlock();
 }
 
-static int cgroup_storage_map_btf_id;
+BTF_ID_LIST_SINGLE(cgroup_storage_map_btf_ids, struct,
+		   bpf_cgroup_storage_map)
 const struct bpf_map_ops cgroup_storage_map_ops = {
 	.map_alloc = cgroup_storage_map_alloc,
 	.map_free = cgroup_storage_map_free,
@@ -456,8 +458,7 @@ const struct bpf_map_ops cgroup_storage_map_ops = {
 	.map_delete_elem = cgroup_storage_delete_elem,
 	.map_check_btf = cgroup_storage_check_btf,
 	.map_seq_show_elem = cgroup_storage_seq_show_elem,
-	.map_btf_name = "bpf_cgroup_storage_map",
-	.map_btf_id = &cgroup_storage_map_btf_id,
+	.map_btf_id = &cgroup_storage_map_btf_ids[0],
 };
 
 int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux, struct bpf_map *_map)
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 5763cc7ac4f1..f0d05a3cc4b9 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -14,6 +14,7 @@
 #include <linux/vmalloc.h>
 #include <net/ipv6.h>
 #include <uapi/linux/btf.h>
+#include <linux/btf_ids.h>
 
 /* Intermediate node */
 #define LPM_TREE_NODE_FLAG_IM BIT(0)
@@ -719,7 +720,7 @@ static int trie_check_btf(const struct bpf_map *map,
 	       -EINVAL : 0;
 }
 
-static int trie_map_btf_id;
+BTF_ID_LIST_SINGLE(trie_map_btf_ids, struct, lpm_trie)
 const struct bpf_map_ops trie_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc = trie_alloc,
@@ -732,6 +733,5 @@ const struct bpf_map_ops trie_map_ops = {
 	.map_update_batch = generic_map_update_batch,
 	.map_delete_batch = generic_map_delete_batch,
 	.map_check_btf = trie_check_btf,
-	.map_btf_name = "lpm_trie",
-	.map_btf_id = &trie_map_btf_id,
+	.map_btf_id = &trie_map_btf_ids[0],
 };
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index f9c734aaa990..a1c0794ae49d 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -8,6 +8,7 @@
 #include <linux/list.h>
 #include <linux/slab.h>
 #include <linux/capability.h>
+#include <linux/btf_ids.h>
 #include "percpu_freelist.h"
 
 #define QUEUE_STACK_CREATE_FLAG_MASK \
@@ -247,7 +248,7 @@ static int queue_stack_map_get_next_key(struct bpf_map *map, void *key,
 	return -EINVAL;
 }
 
-static int queue_map_btf_id;
+BTF_ID_LIST_SINGLE(queue_map_btf_ids, struct, bpf_queue_stack)
 const struct bpf_map_ops queue_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc_check = queue_stack_map_alloc_check,
@@ -260,11 +261,9 @@ const struct bpf_map_ops queue_map_ops = {
 	.map_pop_elem = queue_map_pop_elem,
 	.map_peek_elem = queue_map_peek_elem,
 	.map_get_next_key = queue_stack_map_get_next_key,
-	.map_btf_name = "bpf_queue_stack",
-	.map_btf_id = &queue_map_btf_id,
+	.map_btf_id = &queue_map_btf_ids[0],
 };
 
-static int stack_map_btf_id;
 const struct bpf_map_ops stack_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc_check = queue_stack_map_alloc_check,
@@ -277,6 +276,5 @@ const struct bpf_map_ops stack_map_ops = {
 	.map_pop_elem = stack_map_pop_elem,
 	.map_peek_elem = stack_map_peek_elem,
 	.map_get_next_key = queue_stack_map_get_next_key,
-	.map_btf_name = "bpf_queue_stack",
-	.map_btf_id = &stack_map_btf_id,
+	.map_btf_id = &queue_map_btf_ids[0],
 };
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 8251243022a2..e2618fb5870e 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -6,6 +6,7 @@
 #include <linux/err.h>
 #include <linux/sock_diag.h>
 #include <net/sock_reuseport.h>
+#include <linux/btf_ids.h>
 
 struct reuseport_array {
 	struct bpf_map map;
@@ -337,7 +338,7 @@ static int reuseport_array_get_next_key(struct bpf_map *map, void *key,
 	return 0;
 }
 
-static int reuseport_array_map_btf_id;
+BTF_ID_LIST_SINGLE(reuseport_array_map_btf_ids, struct, reuseport_array)
 const struct bpf_map_ops reuseport_array_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc_check = reuseport_array_alloc_check,
@@ -346,6 +347,5 @@ const struct bpf_map_ops reuseport_array_ops = {
 	.map_lookup_elem = reuseport_array_lookup_elem,
 	.map_get_next_key = reuseport_array_get_next_key,
 	.map_delete_elem = reuseport_array_delete_elem,
-	.map_btf_name = "reuseport_array",
-	.map_btf_id = &reuseport_array_map_btf_id,
+	.map_btf_id = &reuseport_array_map_btf_ids[0],
 };
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 710ba9de12ce..b651e45228d2 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -10,6 +10,7 @@
 #include <linux/poll.h>
 #include <linux/kmemleak.h>
 #include <uapi/linux/btf.h>
+#include <linux/btf_ids.h>
 
 #define RINGBUF_CREATE_FLAG_MASK (BPF_F_NUMA_NODE)
 
@@ -263,7 +264,7 @@ static __poll_t ringbuf_map_poll(struct bpf_map *map, struct file *filp,
 	return 0;
 }
 
-static int ringbuf_map_btf_id;
+BTF_ID_LIST_SINGLE(ringbuf_map_btf_ids, struct, bpf_ringbuf_map)
 const struct bpf_map_ops ringbuf_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc = ringbuf_map_alloc,
@@ -274,8 +275,7 @@ const struct bpf_map_ops ringbuf_map_ops = {
 	.map_update_elem = ringbuf_map_update_elem,
 	.map_delete_elem = ringbuf_map_delete_elem,
 	.map_get_next_key = ringbuf_map_get_next_key,
-	.map_btf_name = "bpf_ringbuf_map",
-	.map_btf_id = &ringbuf_map_btf_id,
+	.map_btf_id = &ringbuf_map_btf_ids[0],
 };
 
 /* Given pointer to ring buffer record metadata and struct bpf_ringbuf itself,
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 1dd5266fbebb..1adbe67cdb95 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -654,7 +654,7 @@ static void stack_map_free(struct bpf_map *map)
 	put_callchain_buffers();
 }
 
-static int stack_trace_map_btf_id;
+BTF_ID_LIST_SINGLE(stack_trace_map_btf_ids, struct, bpf_stack_map)
 const struct bpf_map_ops stack_trace_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc = stack_map_alloc,
@@ -664,6 +664,5 @@ const struct bpf_map_ops stack_trace_map_ops = {
 	.map_update_elem = stack_map_update_elem,
 	.map_delete_elem = stack_map_delete_elem,
 	.map_check_btf = map_check_no_btf,
-	.map_btf_name = "bpf_stack_map",
-	.map_btf_id = &stack_trace_map_btf_id,
+	.map_btf_id = &stack_trace_map_btf_ids[0],
 };
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index e3ac36380520..4008dd7c1b90 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -338,7 +338,7 @@ bpf_sk_storage_ptr(void *owner)
 	return &sk->sk_bpf_storage;
 }
 
-static int sk_storage_map_btf_id;
+BTF_ID_LIST_SINGLE(sk_storage_map_btf_ids, struct, bpf_local_storage_map)
 const struct bpf_map_ops sk_storage_map_ops = {
 	.map_meta_equal = bpf_map_meta_equal,
 	.map_alloc_check = bpf_local_storage_map_alloc_check,
@@ -349,8 +349,7 @@ const struct bpf_map_ops sk_storage_map_ops = {
 	.map_update_elem = bpf_fd_sk_storage_update_elem,
 	.map_delete_elem = bpf_fd_sk_storage_delete_elem,
 	.map_check_btf = bpf_local_storage_map_check_btf,
-	.map_btf_name = "bpf_local_storage_map",
-	.map_btf_id = &sk_storage_map_btf_id,
+	.map_btf_id = &sk_storage_map_btf_ids[0],
 	.map_local_storage_charge = bpf_sk_storage_charge,
 	.map_local_storage_uncharge = bpf_sk_storage_uncharge,
 	.map_owner_storage_ptr = bpf_sk_storage_ptr,
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 2d213c4011db..81d4b4756a02 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -793,7 +793,7 @@ static const struct bpf_iter_seq_info sock_map_iter_seq_info = {
 	.seq_priv_size		= sizeof(struct sock_map_seq_info),
 };
 
-static int sock_map_btf_id;
+BTF_ID_LIST_SINGLE(sock_map_btf_ids, struct, bpf_stab)
 const struct bpf_map_ops sock_map_ops = {
 	.map_meta_equal		= bpf_map_meta_equal,
 	.map_alloc		= sock_map_alloc,
@@ -805,8 +805,7 @@ const struct bpf_map_ops sock_map_ops = {
 	.map_lookup_elem	= sock_map_lookup,
 	.map_release_uref	= sock_map_release_progs,
 	.map_check_btf		= map_check_no_btf,
-	.map_btf_name		= "bpf_stab",
-	.map_btf_id		= &sock_map_btf_id,
+	.map_btf_id		= &sock_map_btf_ids[0],
 	.iter_seq_info		= &sock_map_iter_seq_info,
 };
 
@@ -1385,7 +1384,7 @@ static const struct bpf_iter_seq_info sock_hash_iter_seq_info = {
 	.seq_priv_size		= sizeof(struct sock_hash_seq_info),
 };
 
-static int sock_hash_map_btf_id;
+BTF_ID_LIST_SINGLE(sock_hash_map_btf_ids, struct, bpf_shtab)
 const struct bpf_map_ops sock_hash_ops = {
 	.map_meta_equal		= bpf_map_meta_equal,
 	.map_alloc		= sock_hash_alloc,
@@ -1397,8 +1396,7 @@ const struct bpf_map_ops sock_hash_ops = {
 	.map_lookup_elem_sys_only = sock_hash_lookup_sys,
 	.map_release_uref	= sock_hash_release_progs,
 	.map_check_btf		= map_check_no_btf,
-	.map_btf_name		= "bpf_shtab",
-	.map_btf_id		= &sock_hash_map_btf_id,
+	.map_btf_id		= &sock_hash_map_btf_ids[0],
 	.iter_seq_info		= &sock_hash_iter_seq_info,
 };
 
diff --git a/net/xdp/xskmap.c b/net/xdp/xskmap.c
index 65b53fb3de13..acc8e52a4f5f 100644
--- a/net/xdp/xskmap.c
+++ b/net/xdp/xskmap.c
@@ -9,6 +9,7 @@
 #include <net/xdp_sock.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
+#include <linux/btf_ids.h>
 
 #include "xsk.h"
 
@@ -254,7 +255,7 @@ static bool xsk_map_meta_equal(const struct bpf_map *meta0,
 		bpf_map_meta_equal(meta0, meta1);
 }
 
-static int xsk_map_btf_id;
+BTF_ID_LIST_SINGLE(xsk_map_btf_ids, struct, xsk_map)
 const struct bpf_map_ops xsk_map_ops = {
 	.map_meta_equal = xsk_map_meta_equal,
 	.map_alloc = xsk_map_alloc,
@@ -266,7 +267,6 @@ const struct bpf_map_ops xsk_map_ops = {
 	.map_update_elem = xsk_map_update_elem,
 	.map_delete_elem = xsk_map_delete_elem,
 	.map_check_btf = map_check_no_btf,
-	.map_btf_name = "xsk_map",
-	.map_btf_id = &xsk_map_btf_id,
+	.map_btf_id = &xsk_map_btf_ids[0],
 	.map_redirect = xsk_map_redirect,
 };
-- 
2.36.0


^ permalink raw reply related

* [PATCH v2 1/1] ixgbe: correct SDP0 check of SFP cage for X550
From: Jeff Daly @ 2022-04-25 13:17 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: Stephen Douthit, Jesse Brandeburg, Tony Nguyen, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Jeff Kirsher, Don Skidmore,
	moderated list:INTEL ETHERNET DRIVERS,
	open list:NETWORKING DRIVERS, open list
In-Reply-To: <20220420205130.23616-1-jeffd@silicom-usa.com>

SDP0 for X550 NICs is active low to indicate the presence of an SFP in the
cage (MOD_ABS#).  Invert the results of the logical AND to set
sfp_cage_full variable correctly.

Fixes: aac9e053f104 ("ixgbe: cleanup crosstalk fix")
Suggested-by: Stephen Douthit <stephend@silicom-usa.com>
Signed-off-by: Jeff Daly <jeffd@silicom-usa.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index 4c26c4b92f07..13482d4e24e2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -3299,17 +3299,17 @@ s32 ixgbe_check_mac_link_generic(struct ixgbe_hw *hw, ixgbe_link_speed *speed,
 	 * the SFP+ cage is full.
 	 */
 	if (ixgbe_need_crosstalk_fix(hw)) {
-		u32 sfp_cage_full;
+		bool sfp_cage_full;
 
 		switch (hw->mac.type) {
 		case ixgbe_mac_82599EB:
-			sfp_cage_full = IXGBE_READ_REG(hw, IXGBE_ESDP) &
-					IXGBE_ESDP_SDP2;
+			sfp_cage_full = !!(IXGBE_READ_REG(hw, IXGBE_ESDP) &
+					   IXGBE_ESDP_SDP2);
 			break;
 		case ixgbe_mac_X550EM_x:
 		case ixgbe_mac_x550em_a:
-			sfp_cage_full = IXGBE_READ_REG(hw, IXGBE_ESDP) &
-					IXGBE_ESDP_SDP0;
+			sfp_cage_full = !(IXGBE_READ_REG(hw, IXGBE_ESDP) &
+					  IXGBE_ESDP_SDP0);
 			break;
 		default:
 			/* sanity check - No SFP+ devices here */
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH v5 09/11] net: ieee802154: atusb: Call _xmit_error() when a transmission fails
From: Miquel Raynal @ 2022-04-25 13:16 UTC (permalink / raw)
  To: Alexander Aring
  Cc: Stefan Schmidt, linux-wpan - ML, David S. Miller, Jakub Kicinski,
	open list:NETWORKING [GENERAL], David Girault, Romuald Despres,
	Frederic Blain, Nicolas Schodet, Thomas Petazzoni
In-Reply-To: <CAB_54W4X4--=kmXW-xtV5WiawP0s0UWSCZWb4U8wOR-xhhgR9g@mail.gmail.com>

Hi Alexander,

alex.aring@gmail.com wrote on Mon, 25 Apr 2022 09:05:49 -0400:

> Hi,
> 
> On Mon, Apr 25, 2022 at 8:35 AM Alexander Aring <alex.aring@gmail.com> wrote:
> >
> > Hi,
> >
> > On Thu, Apr 7, 2022 at 4:06 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:  
> > >
> > > Hi Alexander,
> > >
> > > alex.aring@gmail.com wrote on Wed, 6 Apr 2022 17:58:59 -0400:
> > >  
> > > > Hi,
> > > >
> > > > On Wed, Apr 6, 2022 at 11:34 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:  
> > > > >
> > > > > ieee802154_xmit_error() is the right helper to call when a transmission
> > > > > has failed. Let's use it instead of open-coding it.
> > > > >
> > > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > > > ---
> > > > >  drivers/net/ieee802154/atusb.c | 5 ++---
> > > > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/ieee802154/atusb.c b/drivers/net/ieee802154/atusb.c
> > > > > index f27a5f535808..d04db4d07a64 100644
> > > > > --- a/drivers/net/ieee802154/atusb.c
> > > > > +++ b/drivers/net/ieee802154/atusb.c
> > > > > @@ -271,9 +271,8 @@ static void atusb_tx_done(struct atusb *atusb, u8 seq)
> > > > >                  * unlikely case now that seq == expect is then true, but can
> > > > >                  * happen and fail with a tx_skb = NULL;
> > > > >                  */
> > > > > -               ieee802154_wake_queue(atusb->hw);
> > > > > -               if (atusb->tx_skb)
> > > > > -                       dev_kfree_skb_irq(atusb->tx_skb);
> > > > > +               ieee802154_xmit_error(atusb->hw, atusb->tx_skb,
> > > > > +                                     IEEE802154_SYSTEM_ERROR);  
> > > >
> > > > That should then call the xmit_error for ANY other reason which is not
> > > > 802.15.4 specific which is the bus_error() function?  
> > >
> > > I'll drop the bus error function so we can stick to a regular
> > > _xmit_error() call.
> > >  
> >
> > Okay, this error is only hardware related.
> >  
> > > Besides, we do not have any trac information nor any easy access to
> > > what failed exactly, so it's probably best anyway.  
> >
> > This is correct, Somebody needs to write support for it for atusb firmware. [0]
> > However some transceivers can't yet or will never (because lack of
> > functionality?) report such errors back... they will act a little bit
> > weird.
> >
> > However, this return value is a BIG step moving into the right
> > direction that other drivers can follow.
> >
> > I think for MLME ops we can definitely handle some transmit errors now
> > and retry so that we don't wait for anything when we know the
> > transceiver was never submitting.
> >  
> 
> s/submitting/transmitted/
> 
> I could more deeper into that topic:
> 
> 1.
> 
> In my opinion this result value was especially necessary for MLME-ops,
> for others which do not directly work with MAC... they provide an
> upper layer protocol if they want something reliable.
> 
> 2.
> 
> Later on we can do statistics like what was already going around in
> the linux-wpan community to have something like whatever dump to see
> all neighbors and see how many ack failures there, etc. Some people
> want to make some predictions about link quality here. The kernel
> should therefore only capture some stats and the $WHATEVER userspace
> capable netlink monitor daemon should make some heuristic by dumping
> those stats.

I like the idea of having a per-device dump of the stats. It would be
really straightforward to implement with the current scan
implementation that I am about to share. We already have a per PAN
structure (with information like ID, channel, last time it was seen,
strength, etc). We might improve this structure with counters for all
the common mac errors. Maybe an option to the "pans dump" command
(again, in the pipe) might be a good start to get all the stats, like
"pans dump [stats]". I'll keep this in mind.

> 3.
> 
> Sometimes even IP capable protocols (using 6LoWPAN) want to know if
> ack was received or not, as mentioned. But this required additional
> handling in the socket layers... I didn't look into that topic yet but
> I know wireless solved it because they have some similar problems.

I did not look at the upper layers yet, but that would indeed be a nice
use case of these MAC statuses.

Thanks,
Miquèl

^ permalink raw reply

* Re: [PATCH net] tcp: fix potential xmit stalls caused by TCP_NOTSENT_LOWAT
From: Dave Taht @ 2022-04-25 13:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Eric Dumazet, Doug Porter, Soheil Hassas Yeganeh, Neal Cardwell
In-Reply-To: <20220425003407.3002429-1-eric.dumazet@gmail.com>

On Sun, Apr 24, 2022 at 7:00 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
>
> I had this bug sitting for too long in my pile, it is time to fix it.
>
> Thanks to Doug Porter for reminding me of it!
>
> We had various attempts in the past, including commit
> 0cbe6a8f089e ("tcp: remove SOCK_QUEUE_SHRUNK"),
> but the issue is that TCP stack currently only generates
> EPOLLOUT from input path, when tp->snd_una has advanced
> and skb(s) cleaned from rtx queue.
>
> If a flow has a big RTT, and/or receives SACKs, it is possible
> that the notsent part (tp->write_seq - tp->snd_nxt) reaches 0
> and no more data can be sent until tp->snd_una finally advances.
>
> What is needed is to also check if POLLOUT needs to be generated
> whenever tp->snd_nxt is advanced, from output path.
>
> This bug triggers more often after an idle period, as
> we do not receive ACK for at least one RTT. tcp_notsent_lowat
> could be a fraction of what CWND and pacing rate would allow to
> send during this RTT.
>
> In a followup patch, I will remove the bogus call
> to tcp_chrono_stop(sk, TCP_CHRONO_SNDBUF_LIMITED)
> from tcp_check_space(). Fact that we have decided to generate
> an EPOLLOUT does not mean the application has immediately
> refilled the transmit queue. This optimistic call
> might have been the reason the bug seemed not too serious.
>
> Tested:
>
> 200 ms rtt, 1% packet loss, 32 MB tcp_rmem[2] and tcp_wmem[2]
>
> $ echo 500000 >/proc/sys/net/ipv4/tcp_notsent_lowat
> $ cat bench_rr.sh
> SUM=0
> for i in {1..10}
> do
>  V=`netperf -H remote_host -l30 -t TCP_RR -- -r 10000000,10000 -o LOCAL_BYTES_SENT | egrep -v "MIGRATED|Bytes"`
>  echo $V
>  SUM=$(($SUM + $V))
> done
> echo SUM=$SUM
>
> Before patch:
> $ bench_rr.sh
> 130000000
> 80000000
> 140000000
> 140000000
> 140000000
> 140000000
> 130000000
> 40000000
> 90000000
> 110000000
> SUM=1140000000
>
> After patch:
> $ bench_rr.sh
> 430000000
> 590000000
> 530000000
> 450000000
> 450000000
> 350000000
> 450000000
> 490000000
> 480000000
> 460000000
> SUM=4680000000  # This is 410 % of the value before patch.
>
> Fixes: c9bee3b7fdec ("tcp: TCP_NOTSENT_LOWAT socket option")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Doug Porter <dsp@fb.com>
> Cc: Soheil Hassas Yeganeh <soheil@google.com>
> Cc: Neal Cardwell <ncardwell@google.com>
> ---
>  include/net/tcp.h     |  1 +
>  net/ipv4/tcp_input.c  | 12 +++++++++++-
>  net/ipv4/tcp_output.c |  1 +
>  3 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 9987b3fba9f202632916cc439af9d17f1e68bcd3..cc1295037533a7741e454f7c040f77a21deae02b 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -621,6 +621,7 @@ void tcp_synack_rtt_meas(struct sock *sk, struct request_sock *req);
>  void tcp_reset(struct sock *sk, struct sk_buff *skb);
>  void tcp_skb_mark_lost_uncond_verify(struct tcp_sock *tp, struct sk_buff *skb);
>  void tcp_fin(struct sock *sk);
> +void tcp_check_space(struct sock *sk);
>
>  /* tcp_timer.c */
>  void tcp_init_xmit_timers(struct sock *);
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 2088f93fa37b5fb9110e7933242a27bd4009990e..48f6075228600896daa6507c4cd06acfc851a0fa 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -5454,7 +5454,17 @@ static void tcp_new_space(struct sock *sk)
>         INDIRECT_CALL_1(sk->sk_write_space, sk_stream_write_space, sk);
>  }
>
> -static void tcp_check_space(struct sock *sk)
> +/* Caller made space either from:
> + * 1) Freeing skbs in rtx queues (after tp->snd_una has advanced)
> + * 2) Sent skbs from output queue (and thus advancing tp->snd_nxt)
> + *
> + * We might be able to generate EPOLLOUT to the application if:
> + * 1) Space consumed in output/rtx queues is below sk->sk_sndbuf/2
> + * 2) notsent amount (tp->write_seq - tp->snd_nxt) became
> + *    small enough that tcp_stream_memory_free() decides it
> + *    is time to generate EPOLLOUT.
> + */
> +void tcp_check_space(struct sock *sk)
>  {
>         /* pairs with tcp_poll() */
>         smp_mb();
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 9ede847f4199844c5884e3f62ea450562072a0a7..1ca2f28c9981018e6cfaee3435d711467af6048d 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -82,6 +82,7 @@ static void tcp_event_new_data_sent(struct sock *sk, struct sk_buff *skb)
>
>         NET_ADD_STATS(sock_net(sk), LINUX_MIB_TCPORIGDATASENT,
>                       tcp_skb_pcount(skb));
> +       tcp_check_space(sk);
>  }
>
>  /* SND.NXT, if window was not shrunk or the amount of shrunk was less than one
> --
> 2.36.0.rc2.479.g8af0fa9b8e-goog
>

Thx! We have been having very good results with TCP_NOTSENT_LOWAT set
to 32k or less behind an apache traffic server... and had some really
puzzling ones at geosync RTTs. Now I gotta go retest.

Side question: Is there a guide/set of recommendations to setting this
value more appropriately, under what circumstances? Could it
autoconfigure?

net.ipv4.tcp_notsent_lowat = 32768

-- 
FQ World Domination pending: https://blog.cerowrt.org/post/state_of_fq_codel/
Dave Täht CEO, TekLibre, LLC

^ permalink raw reply

* Re: [PATCH] dt-bindings: can: renesas,rcar-canfd: Document RZ/G2UL support
From: Geert Uytterhoeven @ 2022-04-25 13:11 UTC (permalink / raw)
  To: Biju Das
  Cc: Wolfgang Grandegger, Marc Kleine-Budde, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Rob Herring, Krzysztof Kozlowski,
	Fabrizio Castro, linux-can, netdev,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	Chris Paterson, Biju Das, Prabhakar Mahadev Lad, Linux-Renesas
In-Reply-To: <20220423130743.123198-1-biju.das.jz@bp.renesas.com>

On Sat, Apr 23, 2022 at 3:07 PM Biju Das <biju.das.jz@bp.renesas.com> wrote:
> Add CANFD binding documentation for Renesas R9A07G043 (RZ/G2UL) SoC.
>
> Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH net v3] virtio_net: fix wrong buf address calculation when using xdp
From: Daniel Borkmann @ 2022-04-25 13:10 UTC (permalink / raw)
  To: Nikolay Aleksandrov, netdev
  Cc: kuba, davem, stable, Jason Wang, Xuan Zhuo, Michael S. Tsirkin,
	virtualization
In-Reply-To: <20220425103703.3067292-1-razor@blackwall.org>

On 4/25/22 12:37 PM, Nikolay Aleksandrov wrote:
> We received a report[1] of kernel crashes when Cilium is used in XDP
> mode with virtio_net after updating to newer kernels. After
> investigating the reason it turned out that when using mergeable bufs
> with an XDP program which adjusts xdp.data or xdp.data_meta page_to_buf()
> calculates the build_skb address wrong because the offset can become less
> than the headroom so it gets the address of the previous page (-X bytes
> depending on how lower offset is):
>   page_to_skb: page addr ffff9eb2923e2000 buf ffff9eb2923e1ffc offset 252 headroom 256
[...]
> CC: stable@vger.kernel.org
> CC: Jason Wang <jasowang@redhat.com>
> CC: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> CC: Daniel Borkmann <daniel@iogearbox.net>
> CC: "Michael S. Tsirkin" <mst@redhat.com>
> CC: virtualization@lists.linux-foundation.org
> Fixes: 8fb7da9e9907 ("virtio_net: get build_skb() buf by data ptr")
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>

Thanks everyone!

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* Re: [PATCH v22 1/2] wireless: Initial driver submission for pureLiFi STA devices
From: Kalle Valo @ 2022-04-25 13:06 UTC (permalink / raw)
  To: Srinivasan Raju
  Cc: mostafa.afgani, Srinivasan Raju, David S. Miller, Jakub Kicinski,
	open list, open list:NETWORKING DRIVERS (WIRELESS),
	open list:NETWORKING DRIVERS
In-Reply-To: <20220224182042.132466-3-srini.raju@purelifi.com>

Srinivasan Raju <srini.raju@purelifi.com> wrote:

> This driver implementation has been based on the zd1211rw driver
> 
> Driver is based on 802.11 softMAC Architecture and uses
> native 802.11 for configuration and management
> 
> The driver is compiled and tested in ARM, x86 architectures and
> compiled in powerpc architecture
> 
> Signed-off-by: Srinivasan Raju <srini.raju@purelifi.com>

While reviewing this for the last time I did some minor changes in the pending branch:

https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git/commit/?h=pending&id=68d57a07bfe5bb29b80cd8b8fa24c9d1ea104124

Here's the commit log I plan to use:

    wireless: add plfxlc driver for pureLiFi X, XL, XC devices
    
    This is a driver for pureLiFi X, XL, XC devices which use light to transmit
    data, so they are not compatible with normal Wi-Fi devices. The driver uses
    separate NL80211_BAND_LC band to distinguish from Wi-Fi.  The driver is based
    on 802.11 softMAC Architecture and uses native 802.11 for configuration and
    management. Station and Ad-Hoc modes are supported.
    
    The driver is compiled and tested in ARM, x86 architectures and compiled in
    powerpc architecture. This driver implementation has been based on the zd1211rw
    driver.

And the list of changes I made:

* rewrote the commit log

* change MAINTAINER to be only for plfxlc

* CONFIG_PURELIFI_XLC -> CONFIG_PLFXLC

* purelifi_xlc.ko -> plfxlc.ko

* improve Kconfig help text

* _LF_X_CHIP_H -> PLFXLC_CHIP_H

* PURELIFI_MAC_H -> PLFXLC_MAC_H

* _PURELIFI_USB_H -> PLFXLC_USB_H

* upload_mac_and_serial() -> plfxlc_upload_mac_and_serial()

Unless I don't get any comments I'm planning to merge this on Wednesday.

-- 
https://patchwork.kernel.org/project/linux-wireless/patch/20220224182042.132466-3-srini.raju@purelifi.com/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


^ permalink raw reply

* Re: [PATCH v5 09/11] net: ieee802154: atusb: Call _xmit_error() when a transmission fails
From: Alexander Aring @ 2022-04-25 13:05 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Stefan Schmidt, linux-wpan - ML, David S. Miller, Jakub Kicinski,
	open list:NETWORKING [GENERAL], David Girault, Romuald Despres,
	Frederic Blain, Nicolas Schodet, Thomas Petazzoni
In-Reply-To: <CAB_54W5ovb9=rWB-H9oZygWuQpLSG58XFtgniNn9eDh51BBQNw@mail.gmail.com>

Hi,

On Mon, Apr 25, 2022 at 8:35 AM Alexander Aring <alex.aring@gmail.com> wrote:
>
> Hi,
>
> On Thu, Apr 7, 2022 at 4:06 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >
> > Hi Alexander,
> >
> > alex.aring@gmail.com wrote on Wed, 6 Apr 2022 17:58:59 -0400:
> >
> > > Hi,
> > >
> > > On Wed, Apr 6, 2022 at 11:34 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > > >
> > > > ieee802154_xmit_error() is the right helper to call when a transmission
> > > > has failed. Let's use it instead of open-coding it.
> > > >
> > > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > > ---
> > > >  drivers/net/ieee802154/atusb.c | 5 ++---
> > > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/net/ieee802154/atusb.c b/drivers/net/ieee802154/atusb.c
> > > > index f27a5f535808..d04db4d07a64 100644
> > > > --- a/drivers/net/ieee802154/atusb.c
> > > > +++ b/drivers/net/ieee802154/atusb.c
> > > > @@ -271,9 +271,8 @@ static void atusb_tx_done(struct atusb *atusb, u8 seq)
> > > >                  * unlikely case now that seq == expect is then true, but can
> > > >                  * happen and fail with a tx_skb = NULL;
> > > >                  */
> > > > -               ieee802154_wake_queue(atusb->hw);
> > > > -               if (atusb->tx_skb)
> > > > -                       dev_kfree_skb_irq(atusb->tx_skb);
> > > > +               ieee802154_xmit_error(atusb->hw, atusb->tx_skb,
> > > > +                                     IEEE802154_SYSTEM_ERROR);
> > >
> > > That should then call the xmit_error for ANY other reason which is not
> > > 802.15.4 specific which is the bus_error() function?
> >
> > I'll drop the bus error function so we can stick to a regular
> > _xmit_error() call.
> >
>
> Okay, this error is only hardware related.
>
> > Besides, we do not have any trac information nor any easy access to
> > what failed exactly, so it's probably best anyway.
>
> This is correct, Somebody needs to write support for it for atusb firmware. [0]
> However some transceivers can't yet or will never (because lack of
> functionality?) report such errors back... they will act a little bit
> weird.
>
> However, this return value is a BIG step moving into the right
> direction that other drivers can follow.
>
> I think for MLME ops we can definitely handle some transmit errors now
> and retry so that we don't wait for anything when we know the
> transceiver was never submitting.
>

s/submitting/transmitted/

I could more deeper into that topic:

1.

In my opinion this result value was especially necessary for MLME-ops,
for others which do not directly work with MAC... they provide an
upper layer protocol if they want something reliable.

2.

Later on we can do statistics like what was already going around in
the linux-wpan community to have something like whatever dump to see
all neighbors and see how many ack failures there, etc. Some people
want to make some predictions about link quality here. The kernel
should therefore only capture some stats and the $WHATEVER userspace
capable netlink monitor daemon should make some heuristic by dumping
those stats.

3.

Sometimes even IP capable protocols (using 6LoWPAN) want to know if
ack was received or not, as mentioned. But this required additional
handling in the socket layers... I didn't look into that topic yet but
I know wireless solved it because they have some similar problems.

- Alex

^ permalink raw reply

* Re: [PATCH] batman-adv: remove unnecessary type castings
From: Sven Eckelmann @ 2022-04-25 12:50 UTC (permalink / raw)
  To: mareklindner, sw, a, davem, kuba, pabeni, Yu Zhe
  Cc: b.a.t.m.a.n, netdev, linux-kernel, liqiong, kernel-janitors,
	Yu Zhe
In-Reply-To: <20220425113635.1609532-1-yuzhe@nfschina.com>

[-- Attachment #1: Type: text/plain, Size: 842 bytes --]

On Monday, 25 April 2022 13:36:35 CEST Yu Zhe wrote:
> remove unnecessary void* type castings.
> 
> Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
> ---
>  net/batman-adv/translation-table.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

If you send a second version then please use `git format-patch -v2 ...` to 
format the patch. Now it looks in patchworks like you've resent the first 
version again. And please also add a little changelog after "---" which 
explains what you've changed. It is trivial in this little patch but still 
might be useful.

Regarding the patch: Now you've removed bridge_loop_avoidance.c + 
batadv_choose_tt instead of fixing your patch. I would really prefer this 
patch version:

https://git.open-mesh.org/linux-merge.git/commitdiff/8864d2fcf04385cabb8c8bb159f1f2ba5790cf71

Kind regards,
	Sven

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v6 00/10] ieee802154: Better Tx error handling
From: Alexander Aring @ 2022-04-25 12:37 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Stefan Schmidt, linux-wpan - ML, David S. Miller, Jakub Kicinski,
	open list:NETWORKING [GENERAL], David Girault, Romuald Despres,
	Frederic Blain, Nicolas Schodet, Thomas Petazzoni
In-Reply-To: <20220407100903.1695973-1-miquel.raynal@bootlin.com>

Hi,

On Thu, Apr 7, 2022 at 6:09 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> The idea here is to provide a fully synchronous Tx API and also be able
> to be sure that a transfer has finished. This will be used later by
> another series. However, while working on this task, it appeared
> necessary to first rework the way MLME errors were (not) propagated to
> the upper layers. This small series tries to tackle exactly that, before
> introducing the synchronous API.
>

Acked-by: Alexander Aring <aahringo@redhat.com>

Thanks!

- Alex

^ permalink raw reply

* Re: [PATCH v5 09/11] net: ieee802154: atusb: Call _xmit_error() when a transmission fails
From: Alexander Aring @ 2022-04-25 12:35 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Stefan Schmidt, linux-wpan - ML, David S. Miller, Jakub Kicinski,
	open list:NETWORKING [GENERAL], David Girault, Romuald Despres,
	Frederic Blain, Nicolas Schodet, Thomas Petazzoni
In-Reply-To: <20220407100646.049467af@xps13>

Hi,

On Thu, Apr 7, 2022 at 4:06 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hi Alexander,
>
> alex.aring@gmail.com wrote on Wed, 6 Apr 2022 17:58:59 -0400:
>
> > Hi,
> >
> > On Wed, Apr 6, 2022 at 11:34 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > >
> > > ieee802154_xmit_error() is the right helper to call when a transmission
> > > has failed. Let's use it instead of open-coding it.
> > >
> > > Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
> > > ---
> > >  drivers/net/ieee802154/atusb.c | 5 ++---
> > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/net/ieee802154/atusb.c b/drivers/net/ieee802154/atusb.c
> > > index f27a5f535808..d04db4d07a64 100644
> > > --- a/drivers/net/ieee802154/atusb.c
> > > +++ b/drivers/net/ieee802154/atusb.c
> > > @@ -271,9 +271,8 @@ static void atusb_tx_done(struct atusb *atusb, u8 seq)
> > >                  * unlikely case now that seq == expect is then true, but can
> > >                  * happen and fail with a tx_skb = NULL;
> > >                  */
> > > -               ieee802154_wake_queue(atusb->hw);
> > > -               if (atusb->tx_skb)
> > > -                       dev_kfree_skb_irq(atusb->tx_skb);
> > > +               ieee802154_xmit_error(atusb->hw, atusb->tx_skb,
> > > +                                     IEEE802154_SYSTEM_ERROR);
> >
> > That should then call the xmit_error for ANY other reason which is not
> > 802.15.4 specific which is the bus_error() function?
>
> I'll drop the bus error function so we can stick to a regular
> _xmit_error() call.
>

Okay, this error is only hardware related.

> Besides, we do not have any trac information nor any easy access to
> what failed exactly, so it's probably best anyway.

This is correct, Somebody needs to write support for it for atusb firmware. [0]
However some transceivers can't yet or will never (because lack of
functionality?) report such errors back... they will act a little bit
weird.

However, this return value is a BIG step moving into the right
direction that other drivers can follow.

I think for MLME ops we can definitely handle some transmit errors now
and retry so that we don't wait for anything when we know the
transceiver was never submitting.

- Alex

[0] http://projects.qi-hardware.com/index.php/p/ben-wpan/source/tree/master/atusb/fw

^ permalink raw reply

* Re: [PATCH v5 06/11] net: ieee802154: at86rf230: Rename the asynchronous error helper
From: Alexander Aring @ 2022-04-25 12:22 UTC (permalink / raw)
  To: Miquel Raynal
  Cc: Stefan Schmidt, linux-wpan - ML, David S. Miller, Jakub Kicinski,
	open list:NETWORKING [GENERAL], David Girault, Romuald Despres,
	Frederic Blain, Nicolas Schodet, Thomas Petazzoni
In-Reply-To: <20220407100513.4b8378b6@xps13>

Hi,

On Thu, Apr 7, 2022 at 4:05 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>
> Hi Alexander,
>
> alex.aring@gmail.com wrote on Wed, 6 Apr 2022 17:57:41 -0400:
>
> > Hi,
> >
> > On Wed, Apr 6, 2022 at 11:34 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> > >
> > > In theory there are two distinct error path:
> > > - The bus error when forwarding a packet to the transceiver fails.
> > > - The transmitter error, after the transmission has been offloaded.
> > >
> > > Right now in this driver only the former situation is properly handled,
> > > so rename the different helpers to reflect this situation before
> > > improving the support of the other path.
> > >
> >
> > I have no idea what I should think about this patch.
> >
> > On the driver layer there only exists "bus errors" okay, whatever
> > error because spi_async() returns an error and we try to recover from
> > it. Also async_error() will be called when there is a timeout because
> > the transceiver took too long for some state change... In this case
> > most often this async_error() is called if spi_async() returns an
> > error but as I said it's not always the case (e.g. timeout)... it is
> > some kind of hardware issue (indicated by 802.15.4 SYSTEM_ERROR for
> > upper layer) and probably if it occurs we can't recover anyway from it
> > (maybe rfkill support can do it, which does a whole transceiver reset
> > routine, but this is always user triggered so far I know).
> >
> > However if you want that patch in that's it's fine for me, but for me
> > this if somebody looks closely into the code it's obvious that in most
> > cases it's called when spi_async() returns an error (which is not
> > always the case see timeout).
>
> I thought it would clarify the situation but I overlooked the timeout
> situation. Actually I did wrote it before understanding what was wrong
> with the patch coming next (I assume my new approach is fine?), and
> the two changes are fully independent, so I'll drop this patch too.
>

new patch is perfect, I like hw_error().

- Alex

^ permalink raw reply

* Re: [PATCH] FDDI: defxx: simplify if-if to if-else
From: Andrew Lunn @ 2022-04-25 12:06 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Wan Jiabing, David S. Miller, Jakub Kicinski, Paolo Abeni, netdev,
	linux-kernel, kael_w
In-Reply-To: <alpine.DEB.2.21.2204250009240.9383@angie.orcam.me.uk>

On Mon, Apr 25, 2022 at 12:26:10AM +0100, Maciej W. Rozycki wrote:
> On Mon, 25 Apr 2022, Andrew Lunn wrote:
> 
> > >  NAK.  The first conditional optionally sets `bp->mmio = false', which 
> > > changes the value of `dfx_use_mmio' in some configurations:
> > > 
> > > #if defined(CONFIG_EISA) || defined(CONFIG_PCI)
> > > #define dfx_use_mmio bp->mmio
> > > #else
> > > #define dfx_use_mmio true
> > > #endif
> > 
> > Which is just asking for trouble like this.
> > 
> > Could i suggest dfx_use_mmio is changed to DFX_USE_MMIO to give a hint
> > something horrible is going on.
> 
>  There's legacy behind it, `dfx_use_mmio' used to be a proper variable and 
> references were retained not to obfuscate the changes that ultimately led 
> to the current arrangement.  I guess at this stage it could be changed to 
> a function-like macro or a static inline function taking `bp' as the 
> argument.

Yes, something like that would be good.

> > It probably won't stop the robots finding this if (x) if (!x), but
> > there is a chance the robot drivers will wonder why it is upper case.
> 
>  Well, blindly relying on automation is bound to cause trouble.

Unfortunately, there are a number of bot drivers who do blindly rely
on automation. We have had to undo the same broken bot driven changes
a few times, and ended up adding extra comments to catch the eye of
the bot drivers.

    Andrew

^ permalink raw reply

* Re: [PATCH net v2 0/2] ixgbe: fix promiscuous mode on VF
From: Olivier Matz @ 2022-04-25 11:51 UTC (permalink / raw)
  To: netdev
  Cc: Jesse Brandeburg, Tony Nguyen, David S . Miller, Jakub Kicinski,
	Paolo Abeni, intel-wired-lan, Paul Menzel, stable,
	Nicolas Dichtel
In-Reply-To: <20220406095252.22338-1-olivier.matz@6wind.com>

Hi,

On Wed, Apr 06, 2022 at 11:52:50AM +0200, Olivier Matz wrote:
> These 2 patches fix issues related to the promiscuous mode on VF.
> 
> Comments are welcome,
> Olivier
> 
> Cc: stable@vger.kernel.org
> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> 
> Changes since v1:
> - resend with CC intel-wired-lan
> - remove CC Hiroshi Shimamoto (address does not exist anymore)
> 
> Olivier Matz (2):
>   ixgbe: fix bcast packets Rx on VF after promisc removal
>   ixgbe: fix unexpected VLAN Rx in promisc mode on VF
> 
>  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

Any feedback about this patchset?
Comments are welcome.

Thanks,
Olivier

^ permalink raw reply

* Re: [PATCH net-next v3 1/2] rtnetlink: add extack support in fdb del handlers
From: Alaa Mohamed @ 2022-04-25 11:51 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Nikolay Aleksandrov, netdev, outreachy, roopa, jdenham, sbrivio,
	jesse.brandeburg, anthony.l.nguyen, davem, kuba, pabeni,
	vladimir.oltean, claudiu.manoil, alexandre.belloni, shshaikh,
	manishc, intel-wired-lan, linux-kernel, UNGLinuxDriver,
	GR-Linux-NIC-Dev, bridge
In-Reply-To: <alpine.DEB.2.22.394.2204250808280.2759@hadrien>


On ٢٥‏/٤‏/٢٠٢٢ ٠٨:١١, Julia Lawall wrote:
>
> On Sun, 24 Apr 2022, Alaa Mohamed wrote:
>
>> On ٢٤/٤/٢٠٢٢ ٢١:٥٥, Nikolay Aleksandrov wrote:
>>> On 24/04/2022 22:49, Alaa Mohamed wrote:
>>>> On ٢٤/٤/٢٠٢٢ ٢١:٠٢, Nikolay Aleksandrov wrote:
>>>>> On 24/04/2022 15:09, Alaa Mohamed wrote:
>>>>>> Add extack support to .ndo_fdb_del in netdevice.h and
>>>>>> all related methods.
>>>>>>
>>>>>> Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
>>>>>> ---
>>>>>> changes in V3:
>>>>>>            fix errors reported by checkpatch.pl
>>>>>> ---
>>>>>>     drivers/net/ethernet/intel/ice/ice_main.c        | 4 ++--
>>>>>>     drivers/net/ethernet/mscc/ocelot_net.c           | 4 ++--
>>>>>>     drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 2 +-
>>>>>>     drivers/net/macvlan.c                            | 2 +-
>>>>>>     drivers/net/vxlan/vxlan_core.c                   | 2 +-
>>>>>>     include/linux/netdevice.h                        | 2 +-
>>>>>>     net/bridge/br_fdb.c                              | 2 +-
>>>>>>     net/bridge/br_private.h                          | 2 +-
>>>>>>     net/core/rtnetlink.c                             | 4 ++--
>>>>>>     9 files changed, 12 insertions(+), 12 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c
>>>>>> b/drivers/net/ethernet/intel/ice/ice_main.c
>>>>>> index d768925785ca..7b55d8d94803 100644
>>>>>> --- a/drivers/net/ethernet/intel/ice/ice_main.c
>>>>>> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
>>>>>> @@ -5678,10 +5678,10 @@ ice_fdb_add(struct ndmsg *ndm, struct nlattr
>>>>>> __always_unused *tb[],
>>>>>>     static int
>>>>>>     ice_fdb_del(struct ndmsg *ndm, __always_unused struct nlattr *tb[],
>>>>>>             struct net_device *dev, const unsigned char *addr,
>>>>>> -        __always_unused u16 vid)
>>>>>> +        __always_unused u16 vid, struct netlink_ext_ack *extack)
>>>>>>     {
>>>>>>         int err;
>>>>>> -
>>>>>> +
>>>>> What's changed here?
>>>> In the previous version, I removed the blank line after "int err;" and you
>>>> said I shouldn't so I added blank line.
>>>>
>>> Yeah, my question is are you fixing a dos ending or something else?
>>> The blank line is already there, what's wrong with it?
>> No, I didn't.
> OK, so what is the answer to the question about what changed?  It looks
> like you remove a blank line and then add it back.  But that should not
> show up as a difference when you generate the patch.
>
> When you answer a comment, please put a blank line before and after your
> answer.  Otherwise it can be hard to see your answer when it is in the
> middle of a larger patch.
>
>>> The point is it's not nice to mix style fixes and other changes, more so
>>> if nothing is mentioned in the commit message.
>> Got it, So, what should I do to fix it?
> A series?  But it is not clear that any change is needed here at all.
>
> julia

I understood but I mean how to create the patch without this but I will 
google it , Thanks Julia.


>
>>>>>>         if (ndm->ndm_state & NUD_PERMANENT) {
>>>>>>             netdev_err(dev, "FDB only supports static addresses\n");
>>>>>>             return -EINVAL;
>>>>>> diff --git a/drivers/net/ethernet/mscc/ocelot_net.c
>>>>>> b/drivers/net/ethernet/mscc/ocelot_net.c
>>>>>> index 247bc105bdd2..e07c64e3159c 100644
>>>>>> --- a/drivers/net/ethernet/mscc/ocelot_net.c
>>>>>> +++ b/drivers/net/ethernet/mscc/ocelot_net.c
>>>>>> @@ -774,14 +774,14 @@ static int ocelot_port_fdb_add(struct ndmsg
>>>>>> *ndm, struct nlattr *tb[],
>>>>>>
>>>>>>     static int ocelot_port_fdb_del(struct ndmsg *ndm, struct nlattr
>>>>>> *tb[],
>>>>>>                        struct net_device *dev,
>>>>>> -                   const unsigned char *addr, u16 vid)
>>>>>> +                   const unsigned char *addr, u16 vid, struct
>>>>>> netlink_ext_ack *extack)
>>>>>>     {
>>>>>>         struct ocelot_port_private *priv = netdev_priv(dev);
>>>>>>         struct ocelot_port *ocelot_port = &priv->port;
>>>>>>         struct ocelot *ocelot = ocelot_port->ocelot;
>>>>>>         int port = priv->chip_port;
>>>>>>
>>>>>> -    return ocelot_fdb_del(ocelot, port, addr, vid,
>>>>>> ocelot_port->bridge);
>>>>>> +    return ocelot_fdb_del(ocelot, port, addr, vid,
>>>>>> ocelot_port->bridge, extack);
>>>>>>     }
>>>>>>
>>>>>>     static int ocelot_port_fdb_dump(struct sk_buff *skb,
>>>>>> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
>>>>>> b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
>>>>>> index d320567b2cca..51fa23418f6a 100644
>>>>>> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
>>>>>> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c
>>>>>> @@ -368,7 +368,7 @@ static int qlcnic_set_mac(struct net_device
>>>>>> *netdev, void *p)
>>>>>>
>>>>>>     static int qlcnic_fdb_del(struct ndmsg *ndm, struct nlattr *tb[],
>>>>>>                 struct net_device *netdev,
>>>>>> -            const unsigned char *addr, u16 vid)
>>>>>> +            const unsigned char *addr, u16 vid, struct
>>>>>> netlink_ext_ack *extack)
>>>>>>     {
>>>>>>         struct qlcnic_adapter *adapter = netdev_priv(netdev);
>>>>>>         int err = -EOPNOTSUPP;
>>>>>> diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
>>>>>> index 069e8824c264..ffd34d9f7049 100644
>>>>>> --- a/drivers/net/macvlan.c
>>>>>> +++ b/drivers/net/macvlan.c
>>>>>> @@ -1017,7 +1017,7 @@ static int macvlan_fdb_add(struct ndmsg *ndm,
>>>>>> struct nlattr *tb[],
>>>>>>
>>>>>>     static int macvlan_fdb_del(struct ndmsg *ndm, struct nlattr *tb[],
>>>>>>                    struct net_device *dev,
>>>>>> -               const unsigned char *addr, u16 vid)
>>>>>> +               const unsigned char *addr, u16 vid, struct
>>>>>> netlink_ext_ack *extack)
>>>>>>     {
>>>>>>         struct macvlan_dev *vlan = netdev_priv(dev);
>>>>>>         int err = -EINVAL;
>>>>>> diff --git a/drivers/net/vxlan/vxlan_core.c
>>>>>> b/drivers/net/vxlan/vxlan_core.c
>>>>>> index de97ff98d36e..cf2f60037340 100644
>>>>>> --- a/drivers/net/vxlan/vxlan_core.c
>>>>>> +++ b/drivers/net/vxlan/vxlan_core.c
>>>>>> @@ -1280,7 +1280,7 @@ int __vxlan_fdb_delete(struct vxlan_dev *vxlan,
>>>>>>     /* Delete entry (via netlink) */
>>>>>>     static int vxlan_fdb_delete(struct ndmsg *ndm, struct nlattr *tb[],
>>>>>>                     struct net_device *dev,
>>>>>> -                const unsigned char *addr, u16 vid)
>>>>>> +                const unsigned char *addr, u16 vid, struct
>>>>>> netlink_ext_ack *extack)
>>>>>>     {
>>>>>>         struct vxlan_dev *vxlan = netdev_priv(dev);
>>>>>>         union vxlan_addr ip;
>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>> index 28ea4f8269d4..d0d2a8f33c73 100644
>>>>>> --- a/include/linux/netdevice.h
>>>>>> +++ b/include/linux/netdevice.h
>>>>>> @@ -1509,7 +1509,7 @@ struct net_device_ops {
>>>>>>                                struct nlattr *tb[],
>>>>>>                                struct net_device *dev,
>>>>>>                                const unsigned char *addr,
>>>>>> -                           u16 vid);
>>>>>> +                           u16 vid, struct netlink_ext_ack *extack);
>>>>>>         int            (*ndo_fdb_dump)(struct sk_buff *skb,
>>>>>>                             struct netlink_callback *cb,
>>>>>>                             struct net_device *dev,
>>>>>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
>>>>>> index 6ccda68bd473..5bfce2e9a553 100644
>>>>>> --- a/net/bridge/br_fdb.c
>>>>>> +++ b/net/bridge/br_fdb.c
>>>>>> @@ -1110,7 +1110,7 @@ static int __br_fdb_delete(struct net_bridge
>>>>>> *br,
>>>>>>     /* Remove neighbor entry with RTM_DELNEIGH */
>>>>>>     int br_fdb_delete(struct ndmsg *ndm, struct nlattr *tb[],
>>>>>>               struct net_device *dev,
>>>>>> -          const unsigned char *addr, u16 vid)
>>>>>> +          const unsigned char *addr, u16 vid, struct netlink_ext_ack
>>>>>> *extack)
>>>>>>     {
>>>>>>         struct net_bridge_vlan_group *vg;
>>>>>>         struct net_bridge_port *p = NULL;
>>>>>> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
>>>>>> index 18ccc3d5d296..95348c1c9ce5 100644
>>>>>> --- a/net/bridge/br_private.h
>>>>>> +++ b/net/bridge/br_private.h
>>>>>> @@ -780,7 +780,7 @@ void br_fdb_update(struct net_bridge *br, struct
>>>>>> net_bridge_port *source,
>>>>>>                const unsigned char *addr, u16 vid, unsigned long
>>>>>> flags);
>>>>>>
>>>>>>     int br_fdb_delete(struct ndmsg *ndm, struct nlattr *tb[],
>>>>>> -          struct net_device *dev, const unsigned char *addr, u16
>>>>>> vid);
>>>>>> +          struct net_device *dev, const unsigned char *addr, u16 vid,
>>>>>> struct netlink_ext_ack *extack);
>>>>> This is way too long (111 chars) and checkpatch should've complained
>>>>> about it.
>>>>> WARNING: line length of 111 exceeds 100 columns
>>>>> #234: FILE: net/bridge/br_private.h:782:
>>>>> +          struct net_device *dev, const unsigned char *addr, u16 vid,
>>>>> struct netlink_ext_ack *extack);
>>>> I will fix it.
>>>>
>>>>>>     int br_fdb_add(struct ndmsg *nlh, struct nlattr *tb[], struct
>>>>>> net_device *dev,
>>>>>>                const unsigned char *addr, u16 vid, u16 nlh_flags,
>>>>>>                struct netlink_ext_ack *extack);
>>>>>> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
>>>>>> index 4041b3e2e8ec..99b30ae58a47 100644
>>>>>> --- a/net/core/rtnetlink.c
>>>>>> +++ b/net/core/rtnetlink.c
>>>>>> @@ -4223,7 +4223,7 @@ static int rtnl_fdb_del(struct sk_buff *skb,
>>>>>> struct nlmsghdr *nlh,
>>>>>>             const struct net_device_ops *ops = br_dev->netdev_ops;
>>>>>>
>>>>>>             if (ops->ndo_fdb_del)
>>>>>> -            err = ops->ndo_fdb_del(ndm, tb, dev, addr, vid);
>>>>>> +            err = ops->ndo_fdb_del(ndm, tb, dev, addr, vid, extack);
>>>>>>
>>>>>>             if (err)
>>>>>>                 goto out;
>>>>>> @@ -4235,7 +4235,7 @@ static int rtnl_fdb_del(struct sk_buff *skb,
>>>>>> struct nlmsghdr *nlh,
>>>>>>         if (ndm->ndm_flags & NTF_SELF) {
>>>>>>             if (dev->netdev_ops->ndo_fdb_del)
>>>>>>                 err = dev->netdev_ops->ndo_fdb_del(ndm, tb, dev, addr,
>>>>>> -                               vid);
>>>>>> +                               vid, extack);
>>>>>>             else
>>>>>>                 err = ndo_dflt_fdb_del(ndm, tb, dev, addr, vid);
>>>>>>
>>>>>> --
>>>>>> 2.36.0
>>>>>>
> >

^ permalink raw reply

* Re: [PATCH net-next v3 1/2] rtnetlink: add extack support in fdb del handlers
From: Alaa Mohamed @ 2022-04-25 11:47 UTC (permalink / raw)
  To: Nikolay Aleksandrov, netdev
  Cc: outreachy, roopa, jdenham, sbrivio, jesse.brandeburg,
	anthony.l.nguyen, davem, kuba, pabeni, vladimir.oltean,
	claudiu.manoil, alexandre.belloni, shshaikh, manishc,
	intel-wired-lan, linux-kernel, UNGLinuxDriver, GR-Linux-NIC-Dev,
	bridge
In-Reply-To: <0f1e1250-920a-c7d1-900c-98ef3e0456d8@blackwall.org>


On ٢٤‏/٤‏/٢٠٢٢ ٢٣:٥٢, Nikolay Aleksandrov wrote:
> On 4/25/22 00:09, Alaa Mohamed wrote:
>>
>> On ٢٤‏/٤‏/٢٠٢٢ ٢١:٥٥, Nikolay Aleksandrov wrote:
>>> On 24/04/2022 22:49, Alaa Mohamed wrote:
>>>> On ٢٤‏/٤‏/٢٠٢٢ ٢١:٠٢, Nikolay Aleksandrov wrote:
>>>>> On 24/04/2022 15:09, Alaa Mohamed wrote:
>>>>>> Add extack support to .ndo_fdb_del in netdevice.h and
>>>>>> all related methods.
>>>>>>
>>>>>> Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
>>>>>> ---
>>>>>> changes in V3:
>>>>>>           fix errors reported by checkpatch.pl
>>>>>> ---
>>>>>>    drivers/net/ethernet/intel/ice/ice_main.c        | 4 ++--
>>>>>>    drivers/net/ethernet/mscc/ocelot_net.c           | 4 ++--
>>>>>>    drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 2 +-
>>>>>>    drivers/net/macvlan.c                            | 2 +-
>>>>>>    drivers/net/vxlan/vxlan_core.c                   | 2 +-
>>>>>>    include/linux/netdevice.h                        | 2 +-
>>>>>>    net/bridge/br_fdb.c                              | 2 +-
>>>>>>    net/bridge/br_private.h                          | 2 +-
>>>>>>    net/core/rtnetlink.c                             | 4 ++--
>>>>>>    9 files changed, 12 insertions(+), 12 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c 
>>>>>> b/drivers/net/ethernet/intel/ice/ice_main.c
>>>>>> index d768925785ca..7b55d8d94803 100644
>>>>>> --- a/drivers/net/ethernet/intel/ice/ice_main.c
>>>>>> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
>>>>>> @@ -5678,10 +5678,10 @@ ice_fdb_add(struct ndmsg *ndm, struct 
>>>>>> nlattr __always_unused *tb[],
>>>>>>    static int
>>>>>>    ice_fdb_del(struct ndmsg *ndm, __always_unused struct nlattr 
>>>>>> *tb[],
>>>>>>            struct net_device *dev, const unsigned char *addr,
>>>>>> -        __always_unused u16 vid)
>>>>>> +        __always_unused u16 vid, struct netlink_ext_ack *extack)
>>>>>>    {
>>>>>>        int err;
>>>>>> -
>>>>>> +
>>>>> What's changed here?
>>>> In the previous version, I removed the blank line after "int err;" 
>>>> and you said I shouldn't so I added blank line.
>>>>
>>> Yeah, my question is are you fixing a dos ending or something else?
>>> The blank line is already there, what's wrong with it?
>> No, I didn't.
>>>
>>> The point is it's not nice to mix style fixes and other changes, 
>>> more so
>>> if nothing is mentioned in the commit message.
>> Got it, So, what should I do to fix it?
>
> Don't change that line? I mean I'm even surprised this made it in the 
> patch. As I mentioned above, there is already a new line there so I'm 
> not sure how you're removing it and adding it again. :)
>
> Cheers,
>  Nik


Thanks Nik, I will fix this.


^ permalink raw reply

* Re: [ovs-dev] [PATCH] openvswitch: Ensure nf_ct_put is not called with null pointer
From: Greg KH @ 2022-04-25 11:44 UTC (permalink / raw)
  To: Ilya Maximets
  Cc: Florian Westphal, Mark Mielke, dev, netdev, stable,
	Jakub Kicinski, Paolo Abeni, David S. Miller, Pablo Neira Ayuso,
	Antti Antinoja
In-Reply-To: <590d44a1-ca27-c171-de87-fe57fc07dff5@ovn.org>

On Mon, Apr 25, 2022 at 12:36:54PM +0200, Ilya Maximets wrote:
> On 4/10/22 17:41, Florian Westphal wrote:
> > Mark Mielke <mark.mielke@gmail.com> wrote:
> >> A recent commit replaced calls to nf_conntrack_put() with calls
> >> to nf_ct_put(). nf_conntrack_put() permitted the caller to pass
> >> null without side effects, while nf_ct_put() performs WARN_ON()
> >> and proceeds to try and de-reference the pointer. ovs-vswitchd
> >> triggers the warning on startup:
> >>
> >> [   22.178881] WARNING: CPU: 69 PID: 2157 at include/net/netfilter/nf_conntrack.h:176 __ovs_ct_lookup+0x4e2/0x6a0 [openvswitch]
> >> ...
> >> [   22.213573] Call Trace:
> >> [   22.214318]  <TASK>
> >> [   22.215064]  ovs_ct_execute+0x49c/0x7f0 [openvswitch]
> >> ...
> >> Cc: stable@vger.kernel.org
> >> Fixes: 408bdcfce8df ("net: prefer nf_ct_put instead of nf_conntrack_put")
> > 
> > Actually, no.  As Pablo Neira just pointed out to me Upstream kernel is fine.
> > The preceeding commit made nf_ct_out() a noop when ct is NULL.
> 
> Hi, Florian.
> 
> There is a problem on 5.15 longterm tree where the offending commit
> got backported, but the previous one was not, so it triggers an issue
> while loading the openvswitch module.
> 
> To be more clear, v5.15.35 contains the following commit:
>   408bdcfce8df ("net: prefer nf_ct_put instead of nf_conntrack_put")
> backported as commit 72dd9e61fa319bc44020c2d365275fc8f6799bff, but
> it doesn't have the previous one:
>   6ae7989c9af0 ("netfilter: conntrack: avoid useless indirection during conntrack destruction")
> that adds the NULL pointer check to the nf_ct_put().
> 
> Either 6ae7989c9af0 should be backported to 5.15 or 72dd9e61fa31
> reverted on that tree.

I've backported the needed commit now, thanks.

greg k-h

^ permalink raw reply

* Re: [PATCH net v3] virtio_net: fix wrong buf address calculation when using xdp
From: Xuan Zhuo @ 2022-04-25 11:37 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: kuba, davem, Nikolay Aleksandrov, stable, Jason Wang,
	Daniel Borkmann, Michael S. Tsirkin, virtualization, netdev
In-Reply-To: <20220425103703.3067292-1-razor@blackwall.org>

On Mon, 25 Apr 2022 13:37:03 +0300, Nikolay Aleksandrov <razor@blackwall.org> wrote:
> We received a report[1] of kernel crashes when Cilium is used in XDP
> mode with virtio_net after updating to newer kernels. After
> investigating the reason it turned out that when using mergeable bufs
> with an XDP program which adjusts xdp.data or xdp.data_meta page_to_buf()
> calculates the build_skb address wrong because the offset can become less
> than the headroom so it gets the address of the previous page (-X bytes
> depending on how lower offset is):
>  page_to_skb: page addr ffff9eb2923e2000 buf ffff9eb2923e1ffc offset 252 headroom 256
>
> This is a pr_err() I added in the beginning of page_to_skb which clearly
> shows offset that is less than headroom by adding 4 bytes of metadata
> via an xdp prog. The calculations done are:
>  receive_mergeable():
>  headroom = VIRTIO_XDP_HEADROOM; // VIRTIO_XDP_HEADROOM == 256 bytes
>  offset = xdp.data - page_address(xdp_page) -
>           vi->hdr_len - metasize;
>
>  page_to_skb():
>  p = page_address(page) + offset;
>  ...
>  buf = p - headroom;
>
> Now buf goes -4 bytes from the page's starting address as can be seen
> above which is set as skb->head and skb->data by build_skb later. Depending
> on what's done with the skb (when it's freed most often) we get all kinds
> of corruptions and BUG_ON() triggers in mm[2]. We have to recalculate
> the new headroom after the xdp program has run, similar to how offset
> and len are recalculated. Headroom is directly related to
> data_hard_start, data and data_meta, so we use them to get the new size.
> The result is correct (similar pr_err() in page_to_skb, one case of
> xdp_page and one case of virtnet buf):
>  a) Case with 4 bytes of metadata
>  [  115.949641] page_to_skb: page addr ffff8b4dcfad2000 offset 252 headroom 252
>  [  121.084105] page_to_skb: page addr ffff8b4dcf018000 offset 20732 headroom 252
>  b) Case of pushing data +32 bytes
>  [  153.181401] page_to_skb: page addr ffff8b4dd0c4d000 offset 288 headroom 288
>  [  158.480421] page_to_skb: page addr ffff8b4dd00b0000 offset 24864 headroom 288
>  c) Case of pushing data -33 bytes
>  [  835.906830] page_to_skb: page addr ffff8b4dd3270000 offset 223 headroom 223
>  [  840.839910] page_to_skb: page addr ffff8b4dcdd68000 offset 12511 headroom 223
>
> Offset and headroom are equal because offset points to the start of
> reserved bytes for the virtio_net header which are at buf start +
> headroom, while data points at buf start + vnet hdr size + headroom so
> when data or data_meta are adjusted by the xdp prog both the headroom size
> and the offset change equally. We can use data_hard_start to compute the
> new headroom after the xdp prog (linearized / page start case, the
> virtnet buf case is similar just with bigger base offset):
>  xdp.data_hard_start = page_address + vnet_hdr
>  xdp.data = page_address + vnet_hdr + headroom
>  new headroom after xdp prog = xdp.data - xdp.data_hard_start - metasize
>
> An example reproducer xdp prog[3] is below.
>
> [1] https://github.com/cilium/cilium/issues/19453
>
> [2] Two of the many traces:
>  [   40.437400] BUG: Bad page state in process swapper/0  pfn:14940
>  [   40.916726] BUG: Bad page state in process systemd-resolve  pfn:053b7
>  [   41.300891] kernel BUG at include/linux/mm.h:720!
>  [   41.301801] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>  [   41.302784] CPU: 1 PID: 1181 Comm: kubelet Kdump: loaded Tainted: G    B   W         5.18.0-rc1+ #37
>  [   41.304458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
>  [   41.306018] RIP: 0010:page_frag_free+0x79/0xe0
>  [   41.306836] Code: 00 00 75 ea 48 8b 07 a9 00 00 01 00 74 e0 48 8b 47 48 48 8d 50 ff a8 01 48 0f 45 fa eb d0 48 c7 c6 18 b8 30 a6 e8 d7 f8 fc ff <0f> 0b 48 8d 78 ff eb bc 48 8b 07 a9 00 00 01 00 74 3a 66 90 0f b6
>  [   41.310235] RSP: 0018:ffffac05c2a6bc78 EFLAGS: 00010292
>  [   41.311201] RAX: 000000000000003e RBX: 0000000000000000 RCX: 0000000000000000
>  [   41.312502] RDX: 0000000000000001 RSI: ffffffffa6423004 RDI: 00000000ffffffff
>  [   41.313794] RBP: ffff993c98823600 R08: 0000000000000000 R09: 00000000ffffdfff
>  [   41.315089] R10: ffffac05c2a6ba68 R11: ffffffffa698ca28 R12: ffff993c98823600
>  [   41.316398] R13: ffff993c86311ebc R14: 0000000000000000 R15: 000000000000005c
>  [   41.317700] FS:  00007fe13fc56740(0000) GS:ffff993cdd900000(0000) knlGS:0000000000000000
>  [   41.319150] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  [   41.320152] CR2: 000000c00008a000 CR3: 0000000014908000 CR4: 0000000000350ee0
>  [   41.321387] Call Trace:
>  [   41.321819]  <TASK>
>  [   41.322193]  skb_release_data+0x13f/0x1c0
>  [   41.322902]  __kfree_skb+0x20/0x30
>  [   41.343870]  tcp_recvmsg_locked+0x671/0x880
>  [   41.363764]  tcp_recvmsg+0x5e/0x1c0
>  [   41.384102]  inet_recvmsg+0x42/0x100
>  [   41.406783]  ? sock_recvmsg+0x1d/0x70
>  [   41.428201]  sock_read_iter+0x84/0xd0
>  [   41.445592]  ? 0xffffffffa3000000
>  [   41.462442]  new_sync_read+0x148/0x160
>  [   41.479314]  ? 0xffffffffa3000000
>  [   41.496937]  vfs_read+0x138/0x190
>  [   41.517198]  ksys_read+0x87/0xc0
>  [   41.535336]  do_syscall_64+0x3b/0x90
>  [   41.551637]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>  [   41.568050] RIP: 0033:0x48765b
>  [   41.583955] Code: e8 4a 35 fe ff eb 88 cc cc cc cc cc cc cc cc e8 fb 7a fe ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
>  [   41.632818] RSP: 002b:000000c000a2f5b8 EFLAGS: 00000212 ORIG_RAX: 0000000000000000
>  [   41.664588] RAX: ffffffffffffffda RBX: 000000c000062000 RCX: 000000000048765b
>  [   41.681205] RDX: 0000000000005e54 RSI: 000000c000e66000 RDI: 0000000000000016
>  [   41.697164] RBP: 000000c000a2f608 R08: 0000000000000001 R09: 00000000000001b4
>  [   41.713034] R10: 00000000000000b6 R11: 0000000000000212 R12: 00000000000000e9
>  [   41.728755] R13: 0000000000000001 R14: 000000c000a92000 R15: ffffffffffffffff
>  [   41.744254]  </TASK>
>  [   41.758585] Modules linked in: br_netfilter bridge veth netconsole virtio_net
>
>  and
>
>  [   33.524802] BUG: Bad page state in process systemd-network  pfn:11e60
>  [   33.528617] page ffffe05dc0147b00 ffffe05dc04e7a00 ffff8ae9851ec000 (1) len 82 offset 252 metasize 4 hroom 0 hdr_len 12 data ffff8ae9851ec10c data_meta ffff8ae9851ec108 data_end ffff8ae9851ec14e
>  [   33.529764] page:000000003792b5ba refcount:0 mapcount:-512 mapping:0000000000000000 index:0x0 pfn:0x11e60
>  [   33.532463] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
>  [   33.532468] raw: 000fffffc0000000 0000000000000000 dead000000000122 0000000000000000
>  [   33.532470] raw: 0000000000000000 0000000000000000 00000000fffffdff 0000000000000000
>  [   33.532471] page dumped because: nonzero mapcount
>  [   33.532472] Modules linked in: br_netfilter bridge veth netconsole virtio_net
>  [   33.532479] CPU: 0 PID: 791 Comm: systemd-network Kdump: loaded Not tainted 5.18.0-rc1+ #37
>  [   33.532482] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
>  [   33.532484] Call Trace:
>  [   33.532496]  <TASK>
>  [   33.532500]  dump_stack_lvl+0x45/0x5a
>  [   33.532506]  bad_page.cold+0x63/0x94
>  [   33.532510]  free_pcp_prepare+0x290/0x420
>  [   33.532515]  free_unref_page+0x1b/0x100
>  [   33.532518]  skb_release_data+0x13f/0x1c0
>  [   33.532524]  kfree_skb_reason+0x3e/0xc0
>  [   33.532527]  ip6_mc_input+0x23c/0x2b0
>  [   33.532531]  ip6_sublist_rcv_finish+0x83/0x90
>  [   33.532534]  ip6_sublist_rcv+0x22b/0x2b0
>
> [3] XDP program to reproduce(xdp_pass.c):
>  #include <linux/bpf.h>
>  #include <bpf/bpf_helpers.h>
>
>  SEC("xdp_pass")
>  int xdp_pkt_pass(struct xdp_md *ctx)
>  {
>           bpf_xdp_adjust_head(ctx, -(int)32);
>           return XDP_PASS;
>  }
>
>  char _license[] SEC("license") = "GPL";
>
>  compile: clang -O2 -g -Wall -target bpf -c xdp_pass.c -o xdp_pass.o
>  load on virtio_net: ip link set enp1s0 xdpdrv obj xdp_pass.o sec xdp_pass
>
> CC: stable@vger.kernel.org
> CC: Jason Wang <jasowang@redhat.com>
> CC: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> CC: Daniel Borkmann <daniel@iogearbox.net>
> CC: "Michael S. Tsirkin" <mst@redhat.com>
> CC: virtualization@lists.linux-foundation.org
> Fixes: 8fb7da9e9907 ("virtio_net: get build_skb() buf by data ptr")
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>

Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

> ---
> v3: Add a comment explaining why offset and headroom are equal,
>     no code changes
> v2: Recalculate headroom based on data, data_hard_start and data_meta
>
>  drivers/net/virtio_net.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 87838cbe38cf..cbba9d2e8f32 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1005,6 +1005,24 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  			 * xdp.data_meta were adjusted
>  			 */
>  			len = xdp.data_end - xdp.data + vi->hdr_len + metasize;
> +
> +			/* recalculate headroom if xdp.data or xdp_data_meta
> +			 * were adjusted, note that offset should always point
> +			 * to the start of the reserved bytes for virtio_net
> +			 * header which are followed by xdp.data, that means
> +			 * that offset is equal to the headroom (when buf is
> +			 * starting at the beginning of the page, otherwise
> +			 * there is a base offset inside the page) but it's used
> +			 * with a different starting point (buf start) than
> +			 * xdp.data (buf start + vnet hdr size). If xdp.data or
> +			 * data_meta were adjusted by the xdp prog then the
> +			 * headroom size has changed and so has the offset, we
> +			 * can use data_hard_start, which points at buf start +
> +			 * vnet hdr size, to calculate the new headroom and use
> +			 * it later to compute buf start in page_to_skb()
> +			 */
> +			headroom = xdp.data - xdp.data_hard_start - metasize;
> +
>  			/* We can only create skb based on xdp_page. */
>  			if (unlikely(xdp_page != page)) {
>  				rcu_read_unlock();
> @@ -1012,7 +1030,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  				head_skb = page_to_skb(vi, rq, xdp_page, offset,
>  						       len, PAGE_SIZE, false,
>  						       metasize,
> -						       VIRTIO_XDP_HEADROOM);
> +						       headroom);
>  				return head_skb;
>  			}
>  			break;
> --
> 2.35.1
>

^ permalink raw reply

* [PATCH] batman-adv: remove unnecessary type castings
From: Yu Zhe @ 2022-04-25 11:36 UTC (permalink / raw)
  To: mareklindner, sw, a, sven, davem, kuba, pabeni
  Cc: b.a.t.m.a.n, netdev, linux-kernel, liqiong, kernel-janitors,
	Yu Zhe
In-Reply-To: <20220421154829.9775-1-yuzhe@nfschina.com>

remove unnecessary void* type castings.

Signed-off-by: Yu Zhe <yuzhe@nfschina.com>
---
 net/batman-adv/translation-table.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index 8478034d3abf..cbf96eebf05b 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -2766,7 +2766,7 @@ static void batadv_tt_tvlv_generate(struct batadv_priv *bat_priv,
 	u32 i;
 
 	tt_tot = batadv_tt_entries(tt_len);
-	tt_change = (struct batadv_tvlv_tt_change *)tvlv_buff;
+	tt_change = tvlv_buff;
 
 	if (!valid_cb)
 		return;
@@ -3994,7 +3994,7 @@ static void batadv_tt_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv,
 	if (tvlv_value_len < sizeof(*tt_data))
 		return;
 
-	tt_data = (struct batadv_tvlv_tt_data *)tvlv_value;
+	tt_data = tvlv_value;
 	tvlv_value_len -= sizeof(*tt_data);
 
 	num_vlan = ntohs(tt_data->num_vlan);
@@ -4037,7 +4037,7 @@ static int batadv_tt_tvlv_unicast_handler_v1(struct batadv_priv *bat_priv,
 	if (tvlv_value_len < sizeof(*tt_data))
 		return NET_RX_SUCCESS;
 
-	tt_data = (struct batadv_tvlv_tt_data *)tvlv_value;
+	tt_data = tvlv_value;
 	tvlv_value_len -= sizeof(*tt_data);
 
 	tt_vlan_len = sizeof(struct batadv_tvlv_tt_vlan_data);
@@ -4129,7 +4129,7 @@ static int batadv_roam_tvlv_unicast_handler_v1(struct batadv_priv *bat_priv,
 		goto out;
 
 	batadv_inc_counter(bat_priv, BATADV_CNT_TT_ROAM_ADV_RX);
-	roaming_adv = (struct batadv_tvlv_roam_adv *)tvlv_value;
+	roaming_adv = tvlv_value;
 
 	batadv_dbg(BATADV_DBG_TT, bat_priv,
 		   "Received ROAMING_ADV from %pM (client %pM)\n",
-- 
2.25.1


^ permalink raw reply related

* RE: [Intel-wired-lan] [PATCH net v1] ixgbe: ensure IPsec VF<->PF compatibility
From: Jankowski, Konrad0 @ 2022-04-25 11:34 UTC (permalink / raw)
  To: Leon Romanovsky, David S . Miller, Jakub Kicinski
  Cc: Steffen Klassert, Shannon Nelson, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	Jeff Kirsher, Raed Salem, Shannon Nelson, Paolo Abeni,
	Leon Romanovsky
In-Reply-To: <737616899df2a482e4ec35aa4056c9ac608d2f50.1648714609.git.leonro@nvidia.com>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Leon Romanovsky
> Sent: Thursday, March 31, 2022 10:20 AM
> To: David S . Miller <davem@davemloft.net>; Jakub Kicinski
> <kuba@kernel.org>
> Cc: Steffen Klassert <steffen.klassert@secunet.com>; Shannon Nelson
> <snelson@pensando.io>; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; intel-wired-lan@lists.osuosl.org; Jeff Kirsher
> <jeffrey.t.kirsher@intel.com>; Raed Salem <raeds@nvidia.com>; Shannon
> Nelson <shannon.nelson@oracle.com>; Paolo Abeni <pabeni@redhat.com>;
> Leon Romanovsky <leonro@nvidia.com>
> Subject: [Intel-wired-lan] [PATCH net v1] ixgbe: ensure IPsec VF<->PF
> compatibility
> 
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> The VF driver can forward any IPsec flags and such makes the function is not
> extendable and prone to backward/forward incompatibility.
> 
> If new software runs on VF, it won't know that PF configured something
> completely different as it "knows" only XFRM_OFFLOAD_INBOUND flag.
> 
> Fixes: eda0333ac293 ("ixgbe: add VF IPsec management")
> Reviewed-by: Raed Salem <raeds@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>a
> ---
> Chaagelog:
> v1:
>  * Replaced bits arithmetic with more simple expression
> v0:
> https://lore.kernel.org/all/3702fad8a016170947da5f3c521a9251cf0f4a22.16
> 48637865.git.leonro@nvidia.com
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
> index e596e1a9fc75..69d11ff7677d 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
> @@ -903,7 +903,8 @@ int ixgbe_ipsec_vf_add_sa(struct ixgbe_adapter
> *adapter, u32 *msgbuf, u32 vf)

Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox