Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next 1/4] dpaa2-eth: Use a single page per Rx buffer
From: Ilias Apalodimas @ 2019-02-05  9:35 UTC (permalink / raw)
  To: Ioana Ciocoi Radulescu
  Cc: netdev@vger.kernel.org, davem@davemloft.net, Ioana Ciornei,
	brouer@redhat.com
In-Reply-To: <1549299625-28399-2-git-send-email-ruxandra.radulescu@nxp.com>

Hi Ioana, 

Can you share any results on XDP (XDP_DROP is usually useful for the hardware
capabilities).

> Instead of allocating page fragments via the network stack,
> use the page allocator directly. For now, we consume one page
> for each Rx buffer.
> 
> With the new memory model we are free to consider adding more
> XDP support.
> 
> Performance decreases slightly in some IP forwarding cases.
> No visible effect on termination traffic. The driver memory
> footprint increases as a result of this change, but it is
> still small enough to not really matter.
> 
> Another side effect is that now Rx buffer alignment requirements
> are naturally satisfied without any additional actions needed.
> Remove alignment related code, except in the buffer layout
> information conveyed to MC, as hardware still needs to know the
> alignment value we guarantee.
> 
> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
> Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
> ---
>  drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c | 61 +++++++++++++-----------
>  drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h | 21 +++-----
>  2 files changed, 38 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
> index 04925c7..6e58de6 100644
> --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
> +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
> @@ -86,16 +86,16 @@ static void free_rx_fd(struct dpaa2_eth_priv *priv,
>  	for (i = 1; i < DPAA2_ETH_MAX_SG_ENTRIES; i++) {
>  		addr = dpaa2_sg_get_addr(&sgt[i]);
>  		sg_vaddr = dpaa2_iova_to_virt(priv->iommu_domain, addr);
> -		dma_unmap_single(dev, addr, DPAA2_ETH_RX_BUF_SIZE,
> -				 DMA_BIDIRECTIONAL);
> +		dma_unmap_page(dev, addr, DPAA2_ETH_RX_BUF_SIZE,
> +			       DMA_BIDIRECTIONAL);
>  
> -		skb_free_frag(sg_vaddr);
> +		free_pages((unsigned long)sg_vaddr, 0);
>  		if (dpaa2_sg_is_final(&sgt[i]))
>  			break;
>  	}
>  
>  free_buf:
> -	skb_free_frag(vaddr);
> +	free_pages((unsigned long)vaddr, 0);
>  }
>  
>  /* Build a linear skb based on a single-buffer frame descriptor */
> @@ -109,7 +109,7 @@ static struct sk_buff *build_linear_skb(struct dpaa2_eth_channel *ch,
>  
>  	ch->buf_count--;
>  
> -	skb = build_skb(fd_vaddr, DPAA2_ETH_SKB_SIZE);
> +	skb = build_skb(fd_vaddr, DPAA2_ETH_RX_BUF_RAW_SIZE);
>  	if (unlikely(!skb))
>  		return NULL;
>  
> @@ -144,19 +144,19 @@ static struct sk_buff *build_frag_skb(struct dpaa2_eth_priv *priv,
>  		/* Get the address and length from the S/G entry */
>  		sg_addr = dpaa2_sg_get_addr(sge);
>  		sg_vaddr = dpaa2_iova_to_virt(priv->iommu_domain, sg_addr);
> -		dma_unmap_single(dev, sg_addr, DPAA2_ETH_RX_BUF_SIZE,
> -				 DMA_BIDIRECTIONAL);
> +		dma_unmap_page(dev, sg_addr, DPAA2_ETH_RX_BUF_SIZE,
> +			       DMA_BIDIRECTIONAL);
>  
>  		sg_length = dpaa2_sg_get_len(sge);
>  
>  		if (i == 0) {
>  			/* We build the skb around the first data buffer */
> -			skb = build_skb(sg_vaddr, DPAA2_ETH_SKB_SIZE);
> +			skb = build_skb(sg_vaddr, DPAA2_ETH_RX_BUF_RAW_SIZE);
>  			if (unlikely(!skb)) {
>  				/* Free the first SG entry now, since we already
>  				 * unmapped it and obtained the virtual address
>  				 */
> -				skb_free_frag(sg_vaddr);
> +				free_pages((unsigned long)sg_vaddr, 0);
>  
>  				/* We still need to subtract the buffers used
>  				 * by this FD from our software counter
> @@ -211,9 +211,9 @@ static void free_bufs(struct dpaa2_eth_priv *priv, u64 *buf_array, int count)
>  
>  	for (i = 0; i < count; i++) {
>  		vaddr = dpaa2_iova_to_virt(priv->iommu_domain, buf_array[i]);
> -		dma_unmap_single(dev, buf_array[i], DPAA2_ETH_RX_BUF_SIZE,
> -				 DMA_BIDIRECTIONAL);
> -		skb_free_frag(vaddr);
> +		dma_unmap_page(dev, buf_array[i], DPAA2_ETH_RX_BUF_SIZE,
> +			       DMA_BIDIRECTIONAL);
> +		free_pages((unsigned long)vaddr, 0);

I got some hardware/XDP related questions since i have no idea how the hardware
works. From what i understand on the code, you are trying to manage the buffer
from the hw buffer manager and if the fails you unmap and free the pages.
Since XDP relies on recycling for speed (hint check xdp_return_buff()), is it
possible to recycle the buffer if the hw fails ?

>  	}
>  }
>  
> @@ -378,16 +378,16 @@ static void dpaa2_eth_rx(struct dpaa2_eth_priv *priv,
>  			return;
>  		}
>  
> -		dma_unmap_single(dev, addr, DPAA2_ETH_RX_BUF_SIZE,
> -				 DMA_BIDIRECTIONAL);
> +		dma_unmap_page(dev, addr, DPAA2_ETH_RX_BUF_SIZE,
> +			       DMA_BIDIRECTIONAL);
>  		skb = build_linear_skb(ch, fd, vaddr);
>  	} else if (fd_format == dpaa2_fd_sg) {
>  		WARN_ON(priv->xdp_prog);
>  
> -		dma_unmap_single(dev, addr, DPAA2_ETH_RX_BUF_SIZE,
> -				 DMA_BIDIRECTIONAL);
> +		dma_unmap_page(dev, addr, DPAA2_ETH_RX_BUF_SIZE,
> +			       DMA_BIDIRECTIONAL);
>  		skb = build_frag_skb(priv, ch, buf_data);
> -		skb_free_frag(vaddr);
> +		free_pages((unsigned long)vaddr, 0);
>  		percpu_extras->rx_sg_frames++;
>  		percpu_extras->rx_sg_bytes += dpaa2_fd_get_len(fd);
>  	} else {
> @@ -903,7 +903,7 @@ static int add_bufs(struct dpaa2_eth_priv *priv,
>  {
>  	struct device *dev = priv->net_dev->dev.parent;
>  	u64 buf_array[DPAA2_ETH_BUFS_PER_CMD];
> -	void *buf;
> +	struct page *page;
>  	dma_addr_t addr;
>  	int i, err;
>  
> @@ -911,14 +911,16 @@ static int add_bufs(struct dpaa2_eth_priv *priv,
>  		/* Allocate buffer visible to WRIOP + skb shared info +
>  		 * alignment padding
>  		 */
> -		buf = napi_alloc_frag(dpaa2_eth_buf_raw_size(priv));
> -		if (unlikely(!buf))
> +		/* allocate one page for each Rx buffer. WRIOP sees
> +		 * the entire page except for a tailroom reserved for
> +		 * skb shared info
> +		 */
> +		page = dev_alloc_pages(0);
> +		if (!page)
>  			goto err_alloc;
>  
> -		buf = PTR_ALIGN(buf, priv->rx_buf_align);
> -
> -		addr = dma_map_single(dev, buf, DPAA2_ETH_RX_BUF_SIZE,
> -				      DMA_BIDIRECTIONAL);
> +		addr = dma_map_page(dev, page, 0, DPAA2_ETH_RX_BUF_SIZE,
> +				    DMA_BIDIRECTIONAL);
>  		if (unlikely(dma_mapping_error(dev, addr)))
>  			goto err_map;
>  
> @@ -926,7 +928,7 @@ static int add_bufs(struct dpaa2_eth_priv *priv,
>  
>  		/* tracing point */
>  		trace_dpaa2_eth_buf_seed(priv->net_dev,
> -					 buf, dpaa2_eth_buf_raw_size(priv),
> +					 page, DPAA2_ETH_RX_BUF_RAW_SIZE,
>  					 addr, DPAA2_ETH_RX_BUF_SIZE,
>  					 bpid);
>  	}
> @@ -948,7 +950,7 @@ static int add_bufs(struct dpaa2_eth_priv *priv,
>  	return i;
>  
>  err_map:
> -	skb_free_frag(buf);
> +	__free_pages(page, 0);
>  err_alloc:
>  	/* If we managed to allocate at least some buffers,
>  	 * release them to hardware
> @@ -2134,6 +2136,7 @@ static int set_buffer_layout(struct dpaa2_eth_priv *priv)
>  {
>  	struct device *dev = priv->net_dev->dev.parent;
>  	struct dpni_buffer_layout buf_layout = {0};
> +	u16 rx_buf_align;
>  	int err;
>  
>  	/* We need to check for WRIOP version 1.0.0, but depending on the MC
> @@ -2142,9 +2145,9 @@ static int set_buffer_layout(struct dpaa2_eth_priv *priv)
>  	 */
>  	if (priv->dpni_attrs.wriop_version == DPAA2_WRIOP_VERSION(0, 0, 0) ||
>  	    priv->dpni_attrs.wriop_version == DPAA2_WRIOP_VERSION(1, 0, 0))
> -		priv->rx_buf_align = DPAA2_ETH_RX_BUF_ALIGN_REV1;
> +		rx_buf_align = DPAA2_ETH_RX_BUF_ALIGN_REV1;
>  	else
> -		priv->rx_buf_align = DPAA2_ETH_RX_BUF_ALIGN;
> +		rx_buf_align = DPAA2_ETH_RX_BUF_ALIGN;
>  
>  	/* tx buffer */
>  	buf_layout.private_data_size = DPAA2_ETH_SWA_SIZE;
> @@ -2184,7 +2187,7 @@ static int set_buffer_layout(struct dpaa2_eth_priv *priv)
>  	/* rx buffer */
>  	buf_layout.pass_frame_status = true;
>  	buf_layout.pass_parser_result = true;
> -	buf_layout.data_align = priv->rx_buf_align;
> +	buf_layout.data_align = rx_buf_align;
>  	buf_layout.data_head_room = dpaa2_eth_rx_head_room(priv);
>  	buf_layout.private_data_size = 0;
>  	buf_layout.options = DPNI_BUF_LAYOUT_OPT_PARSER_RESULT |
> diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h
> index 31fe486..da3d039 100644
> --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h
> +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.h
> @@ -63,9 +63,11 @@
>  /* Hardware requires alignment for ingress/egress buffer addresses */
>  #define DPAA2_ETH_TX_BUF_ALIGN		64
>  
> -#define DPAA2_ETH_RX_BUF_SIZE		2048
> -#define DPAA2_ETH_SKB_SIZE \
> -	(DPAA2_ETH_RX_BUF_SIZE + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
> +#define DPAA2_ETH_RX_BUF_RAW_SIZE	PAGE_SIZE
> +#define DPAA2_ETH_RX_BUF_TAILROOM \
> +	SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
> +#define DPAA2_ETH_RX_BUF_SIZE \
> +	(DPAA2_ETH_RX_BUF_RAW_SIZE - DPAA2_ETH_RX_BUF_TAILROOM)
>  
>  /* Hardware annotation area in RX/TX buffers */
>  #define DPAA2_ETH_RX_HWA_SIZE		64
> @@ -343,7 +345,6 @@ struct dpaa2_eth_priv {
>  	bool rx_tstamp; /* Rx timestamping enabled */
>  
>  	u16 tx_qdid;
> -	u16 rx_buf_align;
>  	struct fsl_mc_io *mc_io;
>  	/* Cores which have an affine DPIO/DPCON.
>  	 * This is the cpu set on which Rx and Tx conf frames are processed
> @@ -418,15 +419,6 @@ enum dpaa2_eth_rx_dist {
>  	DPAA2_ETH_RX_DIST_CLS
>  };
>  
> -/* Hardware only sees DPAA2_ETH_RX_BUF_SIZE, but the skb built around
> - * the buffer also needs space for its shared info struct, and we need
> - * to allocate enough to accommodate hardware alignment restrictions
> - */
> -static inline unsigned int dpaa2_eth_buf_raw_size(struct dpaa2_eth_priv *priv)
> -{
> -	return DPAA2_ETH_SKB_SIZE + priv->rx_buf_align;
> -}
> -
>  static inline
>  unsigned int dpaa2_eth_needed_headroom(struct dpaa2_eth_priv *priv,
>  				       struct sk_buff *skb)
> @@ -451,8 +443,7 @@ unsigned int dpaa2_eth_needed_headroom(struct dpaa2_eth_priv *priv,
>   */
>  static inline unsigned int dpaa2_eth_rx_head_room(struct dpaa2_eth_priv *priv)
>  {
> -	return priv->tx_data_offset + DPAA2_ETH_TX_BUF_ALIGN -
> -	       DPAA2_ETH_RX_HWA_SIZE;
> +	return priv->tx_data_offset - DPAA2_ETH_RX_HWA_SIZE;
>  }
>  
>  int dpaa2_eth_set_hash(struct net_device *net_dev, u64 flags);
> -- 
> 2.7.4
> 

^ permalink raw reply

* Re: [PATCH net v5 1/2] net/mlx5e: Update hw flows when encap source mac changed
From: Or Gerlitz @ 2019-02-05  9:03 UTC (permalink / raw)
  To: davem@davemloft.net
  Cc: xiangxia.m.yue@gmail.com, netdev@vger.kernel.org, Hadar Hen Zion,
	Saeed Mahameed
In-Reply-To: <6170db12d80e727169dca2d00a0226b91f8f87f0.camel@mellanox.com>

On Wed, Jan 30, 2019 at 12:20 AM Saeed Mahameed <saeedm@mellanox.com> wrote:
>
> On Mon, 2019-01-28 at 15:28 -0800, xiangxia.m.yue@gmail.com wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > When we offload tc filters to hardware, hardware flows can
> > be updated when mac of encap destination ip is changed.
> > But we ignore one case, that the mac of local encap ip can
> > be changed too, so we should also update them.
> >
> > To fix it, add route_dev in mlx5e_encap_entry struct to save
> > the local encap netdevice, and when mac changed, kernel will
> > flush all the neighbour on the netdevice and send
> > NETEVENT_NEIGH_UPDATE
> > event. The mlx5 driver will delete the flows and add them when
> > neighbour
> > available again.
> >
> > Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update
> > flow")
> > Cc: Hadar Hen Zion <hadarh@mellanox.com>
> > Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
> >
> >
>
> Thank you Tonghao and Or.
>
> Acked-by: Saeed Mahameed <saeedm@mellanox.com>

Dave, I see a copy of the patch on patchworks [1] with status being
not applicable, what is missing here? the patch should apply AFAIK.
There was also another patch reviewed here by me and Saeed [2], not
sure where it stands from your side.

[1] https://patchwork.ozlabs.org/patch/1023844/
[2] https://marc.info/?l=linux-netdev&m=154878514916414&w=2

^ permalink raw reply

* Re: [PATCH v2 0/7] sh_eth: implement simple RX checksum offload
From: Sergei Shtylyov @ 2019-02-05  8:00 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-renesas-soc, linux-sh
In-Reply-To: <20190204.133136.1764517040498341680.davem@davemloft.net>

On 05.02.2019 0:31, David Miller wrote:

>> Here's a set of 7 patches against DaveM's 'net-next.git' repo. I'm implemeting
>> the simple RX checksum offload (like was done for the 'ravb' driver by Simon
>> Horman); it has been only tested on the R8A7740 and R8A77980 SoCs, the other
>> SoCs should just work (according to their manuals)...
> 
> Series applied, thanks.
> 
> There was a "tha" --> "the" typo in one of your commit messages which I
> fixed up.

    Indeed! Thank you. :-)

MBR, Sergei

^ permalink raw reply

* Re: [PATCH mlx5-next 12/12] net/mlx5: Set ODP SRQ support in firmware
From: Leon Romanovsky @ 2019-02-05  6:27 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Jason Gunthorpe, dledford@redhat.com, Majd Dibbiny, Moni Shoua,
	linux-rdma@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <19e67f214aca4424e0e373d782c4c7e6bc25dcef.camel@mellanox.com>

[-- Attachment #1: Type: text/plain, Size: 1371 bytes --]

On Mon, Feb 04, 2019 at 11:47:23PM +0000, Saeed Mahameed wrote:
> On Tue, 2019-01-22 at 08:48 +0200, Leon Romanovsky wrote:
> > From: Moni Shoua <monis@mellanox.com>
> >
> > To avoid compatibility issue with older kernels the firmware doesn't
> > allow SRQ to work with ODP unless kernel asks for it.
> >
> > Signed-off-by: Moni Shoua <monis@mellanox.com>
> > Reviewed-by: Majd Dibbiny <majd@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > ---
> >  .../net/ethernet/mellanox/mlx5/core/main.c    | 53
> > +++++++++++++++++++
> >  include/linux/mlx5/device.h                   |  3 ++
> >  include/linux/mlx5/mlx5_ifc.h                 |  1 +
> >  3 files changed, 57 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > index be81b319b0dc..b3a76df0cf6c 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > @@ -459,6 +459,53 @@ static int handle_hca_cap_atomic(struct
> > mlx5_core_dev *dev)
> >  	return err;
> >  }
> >
> > +static int handle_hca_cap_odp(struct mlx5_core_dev *dev)
> > +{
> > +	void *set_ctx;
> > +	void *set_hca_cap;
> > +	int set_sz = MLX5_ST_SZ_BYTES(set_hca_cap_in);
> > +	int err;
> > +
>
> reversed xmas tree.

I'll send followup to address your comments.

Thanks

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Is advertising of 2500Mbps support must from phy device to set phy at 2500Mbps link speed
From: abhijit @ 2019-02-05  6:09 UTC (permalink / raw)
  To: netdev

Hi All,

We are using Ethernet MAC which has integrated Phy. This phy supports 
speed up to 10000Mbps. The phy has, 1000Base-X PCS(Physical Coding 
Sub-layer) followed by SerDes interface to support 10Mbps to 10000Mbps. 
Currently we are trying to get this phy at 2500Mbps. This device has 16 
registers that corresponds to Clause 37, which can be used to advertise 
speed till 1000Mbps.
So my question is,
1. Without phy advertising its capability of 2500Mbps, is there any way 
I can set phy speed at 2500Mbps?
2. I tried disabling auto-negotiation and setting speed at 2500Mbps with 
ethtool (ethtool -s eth0  speed 2500 autoneg off), but the ethtool 
reported this configuration as invalid?
3. At the end we are targeting print of "link up (2500/Full)"

Regards,
   Abhijit

^ permalink raw reply

* linux-next: manual merge of the akpm-current tree with the net tree
From: Stephen Rothwell @ 2019-02-05  5:25 UTC (permalink / raw)
  To: Andrew Morton, David Miller, Networking
  Cc: Linux Next Mailing List, Linux Kernel Mailing List,
	Kent Overstreet, Xin Long

[-- Attachment #1: Type: text/plain, Size: 790 bytes --]

Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  net/sctp/stream.c

between commit:

  cfe4bd7a257f ("sctp: check and update stream->out_curr when allocating stream_out")

from the net tree and commit:

  2bd3fbb3ff23 ("sctp: convert to genradix")

from the akpm-current tree.

I fixed it up (the latter removed the code modified by the former, so I
did that) and can carry the fix as necessary. This is now fixed as far as
linux-next is concerned, but any non trivial conflicts should be mentioned
to your upstream maintainer when your tree is submitted for merging.
You may also want to consider cooperating with the maintainer of the
conflicting tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v7 0/8] devlink: Add configuration parameters support for devlink_port
From: Vasundhara Volam @ 2019-02-05  4:23 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Michal Kubecek, Netdev, David Miller, michael.chan@broadcom.com,
	Jiri Pirko
In-Reply-To: <20190204185626.3c7a162b@cakuba.netronome.com>

On Tue, Feb 5, 2019 at 8:26 AM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Mon, 4 Feb 2019 12:25:13 +0530, Vasundhara Volam wrote:
> > > > IMHO this is not really a problem. We can either use an additional flag
> > > > telling kernel/driver if we are setting runtime or persistent WoL mask
> > > > or we can pass (up to) two bitmaps.
> > Jakub, I will add another two bitmask parameters for WoL named as
> > wake_on_lan_runtime
> > and wake_on_lan_persistent. This will give information about what
> > types are runtime and
> > what types are persistent, does the device support for any given WoL types.
> >
> > Does that sound good?
>
> No?  We were talking about using the soon-too-come ethtool netlink
> API with additional indication that given configuration request is
> supposed to be persisted.  Adding more devlink parameters is exactly
> the opposite of what you should be doing.
Okay. So, till then can we have the devlink wake_on_lan parameter or
you want this to be removed? Could you please clarify?

Once ethtool netlink API is available with persisted support, I can remove
this wake_on_lan parameter from devlink. Thanks.

^ permalink raw reply

* RE: [PATCH v2 2/2] netdev/phy: add MDIO bus multiplexer driven by a regmap
From: Pankaj Bansal @ 2019-02-05  3:52 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli; +Cc: netdev@vger.kernel.org
In-Reply-To: <201902042228.ePAd4m5Y%fengguang.wu@intel.com>

Hi Andrew,

I am getting this compilation error when the mdio-mux-regmap.ko is built as standalone module.
This error doesn't come when the mdio-mux-regmap is built as part of linux kernel.

I don't understand the reason for it. Inline definitions of functions are only defined if CONFIG_MDIO_BUS_MUX_REGMAP
Is NOT defined. 

BUT it is defined as " CONFIG_MDIO_BUS_MUX_REGMAP = m"

Can you please help me to solve this?

Regards,
Pankaj Bansal

> -----Original Message-----
> From: kbuild test robot [mailto:lkp@intel.com]
> Sent: Monday, 4 February, 2019 08:22 PM
> To: Pankaj Bansal <pankaj.bansal@nxp.com>
> Cc: kbuild-all@01.org; Andrew Lunn <andrew@lunn.ch>; Florian Fainelli
> <f.fainelli@gmail.com>; netdev@vger.kernel.org; Pankaj Bansal
> <pankaj.bansal@nxp.com>
> Subject: Re: [PATCH v2 2/2] netdev/phy: add MDIO bus multiplexer driven by a
> regmap
> 
> Hi Pankaj,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on net/master]
> [also build test ERROR on v5.0-rc4]
> [if your patch is applied to the wrong git tree, please drop us a note to help
> improve the system]
> 
> url:
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.
> com%2F0day-ci%2Flinux%2Fcommits%2FPankaj-Bansal%2Fadd-MDIO-bus-
> multiplexer-driven-by-a-regmap-device%2F20190204-
> 213429&amp;data=02%7C01%7Cpankaj.bansal%40nxp.com%7Caabb2bdecd8a4
> 2c19f8108d68ab053d6%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7
> C636848890628820621&amp;sdata=WLP4yuLXqfoNLfrPSIWhNkfg24YdR6Ny8jb
> cUgvcrDQ%3D&amp;reserved=0
> config: sh-allmodconfig (attached as .config)
> compiler: sh4-linux-gnu-gcc (Debian 8.2.0-11) 8.2.0
> reproduce:
>         wget
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fraw.git
> hubusercontent.com%2Fintel%2Flkp-
> tests%2Fmaster%2Fsbin%2Fmake.cross&amp;data=02%7C01%7Cpankaj.bansal
> %40nxp.com%7Caabb2bdecd8a42c19f8108d68ab053d6%7C686ea1d3bc2b4c6f
> a92cd99c5c301635%7C0%7C0%7C636848890628820621&amp;sdata=pGqb5xq
> RsWH6fXd8NWLUKX86qQ0ZO8OOy8nj9rckqV8%3D&amp;reserved=0 -O
> ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         GCC_VERSION=8.2.0 make.cross ARCH=sh
> 
> All errors (new ones prefixed by >>):
> 
> >> drivers/net/phy/mdio-mux-regmap.c:72:5: error: redefinition of
> 'mdio_mux_regmap_init'
>     int mdio_mux_regmap_init(struct device *dev,
>         ^~~~~~~~~~~~~~~~~~~~
>    In file included from drivers/net/phy/mdio-mux-regmap.c:18:
>    include/linux/mdio-mux.h:53:19: note: previous definition of
> 'mdio_mux_regmap_init' was here
>     static inline int mdio_mux_regmap_init(struct device *dev,
>                       ^~~~~~~~~~~~~~~~~~~~
> >> drivers/net/phy/mdio-mux-regmap.c:158:5: error: redefinition of
> 'mdio_mux_regmap_uninit'
>     int mdio_mux_regmap_uninit(void *data)
>         ^~~~~~~~~~~~~~~~~~~~~~
>    In file included from drivers/net/phy/mdio-mux-regmap.c:18:
>    include/linux/mdio-mux.h:60:19: note: previous definition of
> 'mdio_mux_regmap_uninit' was here
>     static inline int mdio_mux_regmap_uninit(void *data)
>                       ^~~~~~~~~~~~~~~~~~~~~~
> 
> vim +/mdio_mux_regmap_init +72 drivers/net/phy/mdio-mux-regmap.c
> 
>     62
>     63	/**
>     64	 * mdio_mux_regmap_init - control MDIO bus muxing using regmap
> constructs.
>     65	 * @dev: device with which regmap construct is associated.
>     66	 * @mux_node: mdio bus mux node that contains parent mdio bus
> phandle.
>     67	 *	      This node also contains sub nodes, where each subnode
> denotes
>     68	 *	      a child mdio bus. All the child mdio buses are muxed, i.e. at a
>     69	 *	      time only one of the child mdio buses can be used.
>     70	 * @data: to store the address of data allocated by this function
>     71	 */
>   > 72	int mdio_mux_regmap_init(struct device *dev,
>     73				 struct device_node *mux_node,
>     74				 void **data)
>     75	{
>     76		struct device_node *child;
>     77		struct mdio_mux_regmap_state *s;
>     78		int ret;
>     79		u32 val;
>     80
>     81		dev_dbg(dev, "probing node %pOF\n", mux_node);
>     82
>     83		s = devm_kzalloc(dev, sizeof(*s), GFP_KERNEL);
>     84		if (!s)
>     85			return -ENOMEM;
>     86
>     87		s->regmap = dev_get_regmap(dev, NULL);
>     88		if (IS_ERR(s->regmap)) {
>     89			dev_err(dev, "Failed to get parent regmap\n");
>     90			return PTR_ERR(s->regmap);
>     91		}
>     92
>     93		ret = of_property_read_u32(mux_node, "reg", &s->mux_reg);
>     94		if (ret) {
>     95			dev_err(dev, "missing or invalid reg property\n");
>     96			return -ENODEV;
>     97		}
>     98
>     99		/* Test Register read write */
>    100		ret = regmap_read(s->regmap, s->mux_reg, &val);
>    101		if (ret) {
>    102			dev_err(dev, "error while reading reg\n");
>    103			return ret;
>    104		}
>    105
>    106		ret = regmap_write(s->regmap, s->mux_reg, val);
>    107		if (ret) {
>    108			dev_err(dev, "error while writing reg\n");
>    109			return ret;
>    110		}
>    111
>    112		ret = of_property_read_u32(mux_node, "mux-mask", &s-
> >mask);
>    113		if (ret) {
>    114			dev_err(dev, "missing or invalid mux-mask property\n");
>    115			return -ENODEV;
>    116		}
>    117
>    118		/* Verify that the 'reg' property of each child MDIO bus does
> not
>    119		 * set any bits outside of the 'mask'.
>    120		 */
>    121		for_each_available_child_of_node(mux_node, child) {
>    122			ret = of_property_read_u32(child, "reg", &val);
>    123			if (ret) {
>    124				dev_err(dev, "%pOF is missing a 'reg'
> property\n",
>    125					child);
>    126				of_node_put(child);
>    127				return -ENODEV;
>    128			}
>    129			if (val & ~s->mask) {
>    130				dev_err(dev,
>    131					"%pOF has a 'reg' value with unmasked
> bits\n",
>    132					child);
>    133				of_node_put(child);
>    134				return -ENODEV;
>    135			}
>    136		}
>    137
>    138		ret = mdio_mux_init(dev, mux_node,
> mdio_mux_regmap_switch_fn,
>    139				    &s->mux_handle, s, NULL);
>    140		if (ret) {
>    141			if (ret != -EPROBE_DEFER)
>    142				dev_err(dev, "failed to register mdio-mux
> bus %pOF\n",
>    143					mux_node);
>    144			return ret;
>    145		}
>    146
>    147		*data = s;
>    148
>    149		return 0;
>    150	}
>    151	EXPORT_SYMBOL_GPL(mdio_mux_regmap_init);
>    152
>    153	/**
>    154	 * mdio_mux_regmap_uninit - relinquish the control of MDIO bus
> muxing using
>    155	 *			    regmap constructs.
>    156	 * @data: address of data allocated by mdio_mux_regmap_init
>    157	 */
>  > 158	int mdio_mux_regmap_uninit(void *data)
>    159	{
>    160		struct mdio_mux_regmap_state *s = data;
>    161
>    162		mdio_mux_uninit(s->mux_handle);
>    163
>    164		return 0;
>    165	}
>    166	EXPORT_SYMBOL_GPL(mdio_mux_regmap_uninit);
>    167
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.01
> .org%2Fpipermail%2Fkbuild-
> all&amp;data=02%7C01%7Cpankaj.bansal%40nxp.com%7Caabb2bdecd8a42c19
> f8108d68ab053d6%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C636
> 848890628820621&amp;sdata=P7RrlnHYStMpoDpwIDMmlRiDTCemyEE3bxuUrq
> d0uxY%3D&amp;reserved=0                   Intel Corporation

^ permalink raw reply

* Re: [PATCH bpf-next] libbpf: fix libbpf_print
From: Yonghong Song @ 2019-02-05  3:23 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Stanislav Fomichev, netdev@vger.kernel.org, davem@davemloft.net,
	ast@kernel.org, daniel@iogearbox.net
In-Reply-To: <20190205015133.cv23h2hkz6pym4dw@ast-mbp.dhcp.thefacebook.com>



On 2/4/19 5:51 PM, Alexei Starovoitov wrote:
> On Tue, Feb 05, 2019 at 12:37:29AM +0000, Yonghong Song wrote:
>>
>>
>> On 2/4/19 4:20 PM, Stanislav Fomichev wrote:
>>> With the recent print rework we now have the following problem:
>>> pr_{warning,info,debug} expand to __pr which calls libbpf_print.
>>> libbpf_print does va_start and calls __libbpf_pr with va_list argument.
>>> In __base_pr we again do va_start. Because the next argument is a
>>> va_list, we don't get correct pointer to the argument (and print noting
>>> in my case, I don't know why it doesn't crash tbh).
>>>
>>> Fix this by changing libbpf_print_fn_t signature to accept va_list and
>>> remove unneeded calls to va_start in the existing users.
>>>
>>> Alternatively, this can we solved by exporting __libbpf_pr and
>>> changing __pr macro to (and killing libbpf_print):
>>> {
>>> 	if (__libbpf_pr)
>>> 		__libbpf_pr(level, "libbpf: " fmt, ##__VA_ARGS__)
>>> }
>>>
>>> Signed-off-by: Stanislav Fomichev <sdf@google.com>
>>
>> It is my mistake. My early version did passed correctly and later
>> on I made some changes and did not test properly. Thanks for the fix!
>>
>> Acked-by: Yonghong Song <yhs@fb.com>
> 
> argh.
> Applied. Thanks for the fix.
> Yonghong, how was the earlier patch set tested?

Before the global function patch set, I have a global variable version,
which I tested and worked. Later on when changing to global
libbpf_print approach, I tested it through test_btf. That is why
I found a missing check for LIBBPF_DEBUG level in default __base_pr.
But I have to admit that I probably did not pay attention to contents 
somehow so I missed the garbled output.

> It sounds that nothing should have worked.
> How perf changes were tested?

I only tested compilation. The context is similar to a few other
bpf selftests programs and I assume with similar implementation, the
result should be similar. But really badly, they are all incorrect :-(

^ permalink raw reply

* Re: [PATCH] net: fix IPv6 prefix route residue
From: David Ahern @ 2019-02-05  3:21 UTC (permalink / raw)
  To: David Miller, liuzhiqiang26
  Cc: kuznet, yoshfuji, 0xeffeff, edumazet, netdev, mingfangsen,
	zhangwenhao8, wangxiaogang3, zhoukang7
In-Reply-To: <20190203.090411.1685970814345088903.davem@redhat.com>

On 2/3/19 9:04 AM, David Miller wrote:
> From: Zhiqiang Liu <liuzhiqiang26@huawei.com>
> Date: Sun, 3 Feb 2019 14:10:31 +0800
> 
>> @@ -1165,7 +1165,8 @@ enum cleanup_prefix_rt_t {
>>  	list_for_each_entry(ifa, &idev->addr_list, if_list) {
>>  		if (ifa == ifp)
>>  			continue;
>> -		if (!ipv6_prefix_equal(&ifa->addr, &ifp->addr,
>> +		if (ifa->prefix_len != ifp->prefix_len ||
>> +			!ipv6_prefix_equal(&ifa->addr, &ifp->addr,
>>  				       ifp->prefix_len))
> 
> Please fix the indentation of this conditional, it should be:
> 
> 		if (ifa->prefix_len != ifp->prefix_len ||
> 		    !ipv6_prefix_equal(&ifa->addr, &ifp->addr,
> 				       ifp->prefix_len))
> 
> Thank you.
> 

Needs a Fixes tag too.

Fixes: 5b84efecb7d9 ("ipv6 addrconf: don't cleanup prefix route for
IFA_F_NOPREFIXROUTE")

and cc to that author Thomas Haller <thaller@redhat.com>

^ permalink raw reply

* Re: [PATCH net] virtio_net: Account for tx bytes and packets on sending xdp_frames
From: David Ahern @ 2019-02-05  3:13 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David Miller, mst, makita.toshiaki, jasowang, netdev,
	virtualization, hawk, Toke Høiland-Jørgensen,
	Jakub Kicinski, John Fastabend, Daniel Borkmann, Saeed Mahameed,
	Tariq Toukan
In-Reply-To: <20190204125307.08492005@redhat.com>

On 2/4/19 3:53 AM, Jesper Dangaard Brouer wrote:
> On Sat, 2 Feb 2019 14:27:26 -0700
> David Ahern <dsahern@gmail.com> wrote:
> 
>> On 1/31/19 1:15 PM, Jesper Dangaard Brouer wrote:
>>>>
>>>> David, Jesper, care to chime in where we ended up in that last thread
>>>> discussion this?  
>>>
>>> IHMO packets RX and TX on a device need to be accounted, in standard
>>> counters, regardless of XDP.  For XDP RX the packet is counted as RX,
>>> regardless if XDP choose to XDP_DROP.  On XDP TX which is via
>>> XDP_REDIRECT or XDP_TX, the driver that transmit the packet need to
>>> account the packet in a TX counter (this if often delayed to DMA TX
>>> completion handling).  We cannot break the expectation that RX and TX
>>> counter are visible to userspace stats tools. XDP should not make these
>>> packets invisible.  
>>
>> Agreed. What I was pushing on that last thread was Rx, Tx and dropped
>> are all accounted by the driver in standard stats. Basically if the
>> driver touched it, the driver's counters should indicate that.
> 
> Sound like we all agree (except with the dropped counter, see below).
> 
> Do notice that mlx5 driver doesn't do this.  It is actually rather
> confusing to use XDP on mlx5, as when XDP "consume" which include
> XDP_DROP, XDP_REDIRECT or XDP_TX, then the driver standard stats are
> not incremented... the packet is invisible to "ifconfig" stat based
> tools.

mlx5 needs some work. As I recall it still has the bug/panic removing
xdp programs - at least I don't recall seeing a patch for it.

> 
> 
>> The push back was on dropped packets and whether that counter should be
>> bumped on XDP_DROP.
> 
> My opinion is the XDP_DROP action should NOT increment the drivers drop
> counter.  First of all the "dropped" counter is also use for other
> stuff, which will confuse that this counter express.  Second, choosing
> XDP_DROP is a policy choice, it still means it was RX-ed at the driver
> level.
> 

Understood. Hopefully in March I will get some time to come back to this
and propose an idea on what I would like to see - namely, the admin has
a config option at load time to enable driver counters versus custom map
counters. (meaning the operator of the node chooses standard stats over
strict performance.) But of course that means the drivers have the code
to collect those stats.

^ permalink raw reply

* [PATCH] ipmr: ip6mr: Create new sockopt to clear mfc cache only
From: Callum Sinclair @ 2019-02-05  2:57 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji, nikolay, netdev, linux-kernel; +Cc: Callum Sinclair

Created a way to clear the multicast forwarding cache on a socket
without having to either remove the entries manually using the delete
entry socket option or destroy and recreate the multicast socket.

Patch Set 2:
  - Fix Compile Errors

Patch Set 3:
  - Fix Style Errors

Callum Sinclair (1):
  ipmr: ip6mr: Create new sockopt to clear mfc cache only

 include/uapi/linux/mroute.h  |  3 ++-
 include/uapi/linux/mroute6.h |  3 ++-
 net/ipv4/ipmr.c              | 40 +++++++++++++++++++++----------
 net/ipv6/ip6mr.c             | 46 +++++++++++++++++++++++-------------
 4 files changed, 61 insertions(+), 31 deletions(-)

-- 
2.20.1


^ permalink raw reply

* [PATCH] ipmr: ip6mr: Create new sockopt to clear mfc cache only
From: Callum Sinclair @ 2019-02-05  2:58 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji, nikolay, netdev, linux-kernel; +Cc: Callum Sinclair
In-Reply-To: <20190205025800.17185-1-callum.sinclair@alliedtelesis.co.nz>

Currently the only way to clear the mfc cache was to delete the entries
one by one using the MRT_DEL_MFC socket option or to destroy and
recreate the socket.

Create a new socket option which will clear the multicast forwarding
cache on the socket without destroying the socket.

Signed-off-by: Callum Sinclair <callum.sinclair@alliedtelesis.co.nz>
---
 include/uapi/linux/mroute.h  |  3 ++-
 include/uapi/linux/mroute6.h |  3 ++-
 net/ipv4/ipmr.c              | 40 +++++++++++++++++++++----------
 net/ipv6/ip6mr.c             | 46 +++++++++++++++++++++++-------------
 4 files changed, 61 insertions(+), 31 deletions(-)

diff --git a/include/uapi/linux/mroute.h b/include/uapi/linux/mroute.h
index 5d37a9ccce63..8a0beb885cd9 100644
--- a/include/uapi/linux/mroute.h
+++ b/include/uapi/linux/mroute.h
@@ -28,7 +28,8 @@
 #define MRT_TABLE	(MRT_BASE+9)	/* Specify mroute table ID		*/
 #define MRT_ADD_MFC_PROXY	(MRT_BASE+10)	/* Add a (*,*|G) mfc entry	*/
 #define MRT_DEL_MFC_PROXY	(MRT_BASE+11)	/* Del a (*,*|G) mfc entry	*/
-#define MRT_MAX		(MRT_BASE+11)
+#define MRT_DEL_MFC_ALL		(MRT_BASE+12)	/* Del all multicast entries	*/
+#define MRT_MAX		(MRT_BASE+12)
 
 #define SIOCGETVIFCNT	SIOCPROTOPRIVATE	/* IP protocol privates */
 #define SIOCGETSGCNT	(SIOCPROTOPRIVATE+1)
diff --git a/include/uapi/linux/mroute6.h b/include/uapi/linux/mroute6.h
index 9999cc006390..7def70cdf571 100644
--- a/include/uapi/linux/mroute6.h
+++ b/include/uapi/linux/mroute6.h
@@ -31,7 +31,8 @@
 #define MRT6_TABLE	(MRT6_BASE+9)	/* Specify mroute table ID		*/
 #define MRT6_ADD_MFC_PROXY	(MRT6_BASE+10)	/* Add a (*,*|G) mfc entry	*/
 #define MRT6_DEL_MFC_PROXY	(MRT6_BASE+11)	/* Del a (*,*|G) mfc entry	*/
-#define MRT6_MAX	(MRT6_BASE+11)
+#define MRT6_DEL_MFC_ALL	(MRT6_BASE+12)	/* Del all multicast entries	*/
+#define MRT6_MAX	(MRT6_BASE+12)
 
 #define SIOCGETMIFCNT_IN6	SIOCPROTOPRIVATE	/* IP protocol privates */
 #define SIOCGETSGCNT_IN6	(SIOCPROTOPRIVATE+1)
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index ddbf8c9a1abb..ccfeebd38e1a 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1298,22 +1298,12 @@ static int ipmr_mfc_add(struct net *net, struct mr_table *mrt,
 	return 0;
 }
 
-/* Close the multicast socket, and clear the vif tables etc */
-static void mroute_clean_tables(struct mr_table *mrt, bool all)
+/* Clear the vif tables */
+static void mroute_clean_cache(struct mr_table *mrt, bool all)
 {
 	struct net *net = read_pnet(&mrt->net);
-	struct mr_mfc *c, *tmp;
 	struct mfc_cache *cache;
-	LIST_HEAD(list);
-	int i;
-
-	/* Shut down all active vif entries */
-	for (i = 0; i < mrt->maxvif; i++) {
-		if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
-			continue;
-		vif_delete(mrt, i, 0, &list);
-	}
-	unregister_netdevice_many(&list);
+	struct mr_mfc *c, *tmp;
 
 	/* Wipe the cache */
 	list_for_each_entry_safe(c, tmp, &mrt->mfc_cache_list, list) {
@@ -1340,6 +1330,23 @@ static void mroute_clean_tables(struct mr_table *mrt, bool all)
 	}
 }
 
+/* Close the multicast socket, and clear the vif tables etc */
+static void mroute_clean_tables(struct mr_table *mrt, bool all)
+{
+	LIST_HEAD(list);
+	int i;
+
+	/* Shut down all active vif entries */
+	for (i = 0; i < mrt->maxvif; i++) {
+		if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
+			continue;
+		vif_delete(mrt, i, 0, &list);
+	}
+	unregister_netdevice_many(&list);
+
+	mroute_clean_cache(mrt, all);
+}
+
 /* called from ip_ra_control(), before an RCU grace period,
  * we dont need to call synchronize_rcu() here
  */
@@ -1482,6 +1489,13 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval,
 					   sk == rtnl_dereference(mrt->mroute_sk),
 					   parent);
 		break;
+	case MRT_DEL_MFC_ALL:
+		rtnl_lock();
+		ipmr_for_each_table(mrt, net) {
+			mroute_clean_cache(mrt, true);
+		}
+		rtnl_unlock();
+		break;
 	/* Control PIM assert. */
 	case MRT_ASSERT:
 		if (optlen != sizeof(val)) {
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 30337b38274b..0168420d217b 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1492,25 +1492,11 @@ static int ip6mr_mfc_add(struct net *net, struct mr_table *mrt,
 	return 0;
 }
 
-/*
- *	Close the multicast socket, and clear the vif tables etc
- */
-
-static void mroute_clean_tables(struct mr_table *mrt, bool all)
+/* Clear the vif tables */
+static void mroute_clean_cache(struct mr_table *mrt, bool all)
 {
 	struct mr_mfc *c, *tmp;
-	LIST_HEAD(list);
-	int i;
-
-	/* Shut down all active vif entries */
-	for (i = 0; i < mrt->maxvif; i++) {
-		if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
-			continue;
-		mif6_delete(mrt, i, 0, &list);
-	}
-	unregister_netdevice_many(&list);
 
-	/* Wipe the cache */
 	list_for_each_entry_safe(c, tmp, &mrt->mfc_cache_list, list) {
 		if (!all && (c->mfc_flags & MFC_STATIC))
 			continue;
@@ -1536,6 +1522,27 @@ static void mroute_clean_tables(struct mr_table *mrt, bool all)
 	}
 }
 
+/*
+ *	Close the multicast socket, and clear the vif tables etc
+ */
+
+static void mroute_clean_tables(struct mr_table *mrt, bool all)
+{
+	LIST_HEAD(list);
+	int i;
+
+	/* Shut down all active vif entries */
+	for (i = 0; i < mrt->maxvif; i++) {
+		if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
+			continue;
+		mif6_delete(mrt, i, 0, &list);
+	}
+	unregister_netdevice_many(&list);
+
+	/* Wipe the cache */
+	mroute_clean_cache(mrt, all);
+}
+
 static int ip6mr_sk_init(struct mr_table *mrt, struct sock *sk)
 {
 	int err = 0;
@@ -1703,6 +1710,13 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, uns
 					    parent);
 		rtnl_unlock();
 		return ret;
+	case MRT6_DEL_MFC_ALL:
+		rtnl_lock();
+		ip6mr_for_each_table(mrt, net) {
+			mroute_clean_cache(mrt, true);
+		}
+		rtnl_unlock();
+		return 0;
 
 	/*
 	 *	Control PIM assert (to activate pim will activate assert)
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH net-next v7 0/8] devlink: Add configuration parameters support for devlink_port
From: Jakub Kicinski @ 2019-02-05  2:56 UTC (permalink / raw)
  To: Vasundhara Volam
  Cc: Michal Kubecek, Netdev, David Miller, michael.chan@broadcom.com,
	Jiri Pirko
In-Reply-To: <CAACQVJpVHLCeeE1KhJanYv_kDXzE=KX_OAPiCnFfjotf3K1+TA@mail.gmail.com>

On Mon, 4 Feb 2019 12:25:13 +0530, Vasundhara Volam wrote:
> > > IMHO this is not really a problem. We can either use an additional flag
> > > telling kernel/driver if we are setting runtime or persistent WoL mask
> > > or we can pass (up to) two bitmaps.  
> Jakub, I will add another two bitmask parameters for WoL named as
> wake_on_lan_runtime
> and wake_on_lan_persistent. This will give information about what
> types are runtime and
> what types are persistent, does the device support for any given WoL types.
> 
> Does that sound good?

No?  We were talking about using the soon-too-come ethtool netlink
API with additional indication that given configuration request is
supposed to be persisted.  Adding more devlink parameters is exactly 
the opposite of what you should be doing.

^ permalink raw reply

* Re: Compiler warning
From: David Ahern @ 2019-02-05  2:55 UTC (permalink / raw)
  To: Koen Vandeputte, netdev
In-Reply-To: <60f98697-952d-aa39-765f-29736de0c4a2@ncentric.com>

On 2/4/19 3:43 AM, Koen Vandeputte wrote:
> Hi All,
> 
> I'm seeing following compiler warning during kernel compilation
> (5.0-rc5  and  4.14.96):
> 
> 
> net/core/dev.c: In function 'validate_xmit_skb_list':
> net/core/dev.c:3405:15: warning: 'tail' may be used uninitialized in
> this function [-Wmaybe-uninitialized]
>     tail->next = skb;
>     ~~~~~~~~~~~^~~~~
> 
> 
> Source shows this:
> 
> https://elixir.bootlin.com/linux/v5.0-rc5/source/net/core/dev.c#L3387
> 
> Looks like "tail" can get deferenced while it indeed doesn't get
> initialized? Kind regards, Koen
> 

same with this one - false positive. head is initialized to NULL. tail
is set on the first pass through the loop.

What compiler / version is this?

^ permalink raw reply

* Re: Compiler warning - ipv4 fib_trie
From: David Ahern @ 2019-02-05  2:53 UTC (permalink / raw)
  To: Koen Vandeputte, netdev
In-Reply-To: <ba46f918-243d-c106-6166-28caecdfd81d@ncentric.com>

On 2/4/19 3:55 AM, Koen Vandeputte wrote:
> Hi All,
> 
> During compilation of kernel 4.14.96 and 5.0-rc5 I'm seeing following
> warning:
> 
> net/ipv4/fib_trie.c: In function 'fib_trie_unmerge':
> net/ipv4/fib_trie.c:1749:8: warning: 'local_tp' may be used
> uninitialized in this function [-Wmaybe-uninitialized]
>     if (fib_insert_alias(lt, local_tp, local_l, new_fa,
>         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>            NULL, l->key)) {
>            ~~~~~~~~~~~~~
> 
> Code:
> 
> https://elixir.bootlin.com/linux/v5.0-rc5/source/net/ipv4/fib_trie.c#L1731
> 
> Looks like 'local_tp' should be NULL-ified
> 
> 
> Thanks,
> 
> Koen
> 

I believe that is a false positive. If you analyze the loop, local_tp is
set when local_l is NULL and it is set before the first use. I am
surprised the compiler can not track that. fib_find_node has 1 return
point and local_tp is set right before.

^ permalink raw reply

* Re: [PATCH v1] net: dsa: qca8k: implement DT-based ports <-> phy translation
From: Andrew Lunn @ 2019-02-05  2:45 UTC (permalink / raw)
  To: Christian Lamparter; +Cc: netdev, Florian Fainelli, Vivien Didelot
In-Reply-To: <20190204213555.26054-1-chunkeey@gmail.com>

On Mon, Feb 04, 2019 at 10:35:55PM +0100, Christian Lamparter wrote:
> The QCA8337 enumerates 5 PHYs on the MDC/MDIO access: PHY0-PHY4.
> Based on the System Block Diagram in Section 1.2 of the
> QCA8337's datasheet. These PHYs are internally connected
> to MACs of PORT 1 - PORT 5. However, neither qca8k's slave
> mdio access functions qca8k_phy_read()/qca8k_phy_write()
> nor the dsa framework is set up for that.
> 
> This version of the patch uses the existing phy-handle
> properties of each specified DSA Port in the DT to map
> each PORT/MAC to its exposed PHY on the MDIO bus. This
> is supported by the current binding document qca8k.txt
> as well.

Hi Christian

Looking at Documentation/devicetree/bindings/net/dsa/qca8k.txt 

I think everything you need is already implemented. What problem do
you actually have?

Thanks
	Andrew

^ permalink raw reply

* Re: [PATCH v2] net: dsa: mv88e6xxx: Revise irq setup ordering
From: Andrew Lunn @ 2019-02-05  2:21 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <8d8123cc-60f0-e236-e496-0aacf735fceb@bell.net>

> The problem is INTn can go low before the interrupt handler for it is
> registered and enabled.

> This can't happen.  The domain is setup immediately after registering
> the GPIO interrupt.
> The interrupt can't fire until one of the enables is set.

These two statement seem to contradict each other?

> These are set
> by mv88e6xxx_g2_irq_setup(),
> mv88e6xxx_g1_atu_prob_irq_setup() and
> mv88e6xxx_g1_vtu_prob_irq_setup().  These irqs
> are setup after mv88e6xxx_g1_irq_setup()/mv88e6xxx_irq_poll_setup() is
> called.  Thus, the
> irq domain is setup before the GPIO interrupt can fire.

At what point is INTn going low, causing you all these problems? I've
yet to see a real description of the race. Please give us a blow by
blow of how the race happens. And then explain how your fix actually
fixes this.

Also, i'm not yet convinced this hardware can actually work correctly
with edge interrupts. You can probably reduce the size of the race
window, but i don't think you can eliminate it. And if you cannot
eliminate it, at some point it is going to hit you.

     Andrew

^ permalink raw reply

* Re: [PATCH net-next v2 1/1] net/mlx5: Fix code style issue in mlx driver
From: David Miller @ 2019-02-05  2:02 UTC (permalink / raw)
  To: saeedm; +Cc: xiangxia.m.yue, netdev
In-Reply-To: <CALzJLG_uLE5c-yHYPGXV5Pp2w4nEYGBU9tzJwNMABNiC1V6HTg@mail.gmail.com>

From: Saeed Mahameed <saeedm@dev.mellanox.co.il>
Date: Mon, 4 Feb 2019 15:26:51 -0800

> On Thu, Jan 31, 2019 at 5:20 PM <xiangxia.m.yue@gmail.com> wrote:
>>
>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>>
>> Add the tab before '}' and keep the code style consistent.
>>
>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
> 
> Acked-by: Saeed Mahameed <saeedm@mellanox.com>
> 
> Dave, you can take this patch to net-next.

Great, applied.

^ permalink raw reply

* Re: [PATCH bpf-next] libbpf: fix libbpf_print
From: Alexei Starovoitov @ 2019-02-05  1:51 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Stanislav Fomichev, netdev@vger.kernel.org, davem@davemloft.net,
	ast@kernel.org, daniel@iogearbox.net
In-Reply-To: <f7b5339e-b682-c00a-afaf-a00e08aff5f7@fb.com>

On Tue, Feb 05, 2019 at 12:37:29AM +0000, Yonghong Song wrote:
> 
> 
> On 2/4/19 4:20 PM, Stanislav Fomichev wrote:
> > With the recent print rework we now have the following problem:
> > pr_{warning,info,debug} expand to __pr which calls libbpf_print.
> > libbpf_print does va_start and calls __libbpf_pr with va_list argument.
> > In __base_pr we again do va_start. Because the next argument is a
> > va_list, we don't get correct pointer to the argument (and print noting
> > in my case, I don't know why it doesn't crash tbh).
> > 
> > Fix this by changing libbpf_print_fn_t signature to accept va_list and
> > remove unneeded calls to va_start in the existing users.
> > 
> > Alternatively, this can we solved by exporting __libbpf_pr and
> > changing __pr macro to (and killing libbpf_print):
> > {
> > 	if (__libbpf_pr)
> > 		__libbpf_pr(level, "libbpf: " fmt, ##__VA_ARGS__)
> > }
> > 
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> 
> It is my mistake. My early version did passed correctly and later
> on I made some changes and did not test properly. Thanks for the fix!
> 
> Acked-by: Yonghong Song <yhs@fb.com>

argh.
Applied. Thanks for the fix.
Yonghong, how was the earlier patch set tested?
It sounds that nothing should have worked.
How perf changes were tested?


^ permalink raw reply

* Re: [PATCH btf 0/3] Add BTF types deduplication algorithm
From: Andrii Nakryiko @ 2019-02-05  1:34 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Andrii Nakryiko, netdev, Alexei Starovoitov, Yonghong Song,
	Martin Lau, Arnaldo Carvalho de Melo, kernel-team
In-Reply-To: <f31c224c-6f4c-ef04-887c-680e6c8facd4@iogearbox.net>

On Mon, Feb 4, 2019 at 5:32 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Hi Andrii,
>
> On 01/31/2019 07:58 AM, Andrii Nakryiko wrote:
> > This patch series adds BTF deduplication algorithm to libbpf. This algorithm
> > allows to take BTF type information containing duplicate per-compilation unit
> > information and reduce it to equivalent set of BTF types with no duplication without
> > loss of information. It also deduplicates strings and removes those strings that
> > are not referenced from any BTF type (and line information in .BTF.ext section,
> > if any).
> >
> > Algorithm also resolves struct/union forward declarations into concrete BTF types
> > across multiple compilation units to facilitate better deduplication ratio. If
> > undesired, this resolution can be disabled through specifying corresponding options.
> >
> > When applied to BTF data emitted by pahole's DWARF->BTF converter, it reduces
> > the overall size of .BTF section by about 65x, from about 112MB to 1.75MB, leaving
> > only 29247 out of initial 3073497 BTF type descriptors.
> >
> > Algorithm with minor differences and preliminary results before FUNC/FUNC_PROTO
> > support is also described more verbosely at:
> > https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html
> >
> > Andrii Nakryiko (3):
> >   btf: extract BTF type size calculation
> >   btf: add BTF types deduplication algorithm
> >   selftests/btf: add initial BTF dedup tests
> >
> >  tools/lib/bpf/btf.c                    | 1851 +++++++++++++++++++++++-
> >  tools/lib/bpf/btf.h                    |   11 +
> >  tools/lib/bpf/libbpf.map               |    3 +
> >  tools/testing/selftests/bpf/test_btf.c |  535 ++++++-
> >  4 files changed, 2333 insertions(+), 67 deletions(-)
>
> This would need a proper rebase for bpf-next so that it applies w/o
> bigger conflicts. Please also change the libbpf.map and place the newly
> exported functions under LIBBPF_0.0.2 (as everything under 0.0.1 was from
> prior released kernel).

Done. Thanks!

>
> Thanks,
> Daniel

^ permalink raw reply

* [PATCH btf v2 2/3] btf: add BTF types deduplication algorithm
From: Andrii Nakryiko @ 2019-02-05  1:29 UTC (permalink / raw)
  To: netdev, ast, yhs, daniel, acme, kernel-team, kafai, ecree,
	andrii.nakryiko
  Cc: Andrii Nakryiko
In-Reply-To: <20190205012946.1590917-1-andriin@fb.com>

This patch implements BTF types deduplication algorithm. It allows to
greatly compress typical output of pahole's DWARF-to-BTF conversion or
LLVM's compilation output by detecting and collapsing identical types emitted in
isolation per compilation unit. Algorithm also resolves struct/union forward
declarations into concrete BTF types representing referenced struct/union. If
undesired, this resolution can be disabled through specifying corresponding options.

Algorithm itself and its application to Linux kernel's BTF types is
described in details at:
https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/btf.c      | 1741 ++++++++++++++++++++++++++++++++++++++
 tools/lib/bpf/btf.h      |    7 +
 tools/lib/bpf/libbpf.map |    1 +
 3 files changed, 1749 insertions(+)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 06bd1a625ff4..e5097be16018 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -849,3 +849,1744 @@ __u32 btf_ext__line_info_rec_size(const struct btf_ext *btf_ext)
 {
 	return btf_ext->line_info.rec_size;
 }
+
+struct btf_dedup;
+
+static struct btf_dedup *btf_dedup_new(struct btf *btf, struct btf_ext *btf_ext,
+				       const struct btf_dedup_opts *opts);
+static void btf_dedup_free(struct btf_dedup *d);
+static int btf_dedup_strings(struct btf_dedup *d);
+static int btf_dedup_prim_types(struct btf_dedup *d);
+static int btf_dedup_struct_types(struct btf_dedup *d);
+static int btf_dedup_ref_types(struct btf_dedup *d);
+static int btf_dedup_compact_types(struct btf_dedup *d);
+static int btf_dedup_remap_types(struct btf_dedup *d);
+
+/*
+ * Deduplicate BTF types and strings.
+ *
+ * BTF dedup algorithm takes as an input `struct btf` representing `.BTF` ELF
+ * section with all BTF type descriptors and string data. It overwrites that
+ * memory in-place with deduplicated types and strings without any loss of
+ * information. If optional `struct btf_ext` representing '.BTF.ext' ELF section
+ * is provided, all the strings referenced from .BTF.ext section are honored
+ * and updated to point to the right offsets after deduplication.
+ *
+ * If function returns with error, type/string data might be garbled and should
+ * be discarded.
+ *
+ * More verbose and detailed description of both problem btf_dedup is solving,
+ * as well as solution could be found at:
+ * https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html
+ *
+ * Problem description and justification
+ * =====================================
+ *
+ * BTF type information is typically emitted either as a result of conversion
+ * from DWARF to BTF or directly by compiler. In both cases, each compilation
+ * unit contains information about a subset of all the types that are used
+ * in an application. These subsets are frequently overlapping and contain a lot
+ * of duplicated information when later concatenated together into a single
+ * binary. This algorithm ensures that each unique type is represented by single
+ * BTF type descriptor, greatly reducing resulting size of BTF data.
+ *
+ * Compilation unit isolation and subsequent duplication of data is not the only
+ * problem. The same type hierarchy (e.g., struct and all the type that struct
+ * references) in different compilation units can be represented in BTF to
+ * various degrees of completeness (or, rather, incompleteness) due to
+ * struct/union forward declarations.
+ *
+ * Let's take a look at an example, that we'll use to better understand the
+ * problem (and solution). Suppose we have two compilation units, each using
+ * same `struct S`, but each of them having incomplete type information about
+ * struct's fields:
+ *
+ * // CU #1:
+ * struct S;
+ * struct A {
+ *	int a;
+ *	struct A* self;
+ *	struct S* parent;
+ * };
+ * struct B;
+ * struct S {
+ *	struct A* a_ptr;
+ *	struct B* b_ptr;
+ * };
+ *
+ * // CU #2:
+ * struct S;
+ * struct A;
+ * struct B {
+ *	int b;
+ *	struct B* self;
+ *	struct S* parent;
+ * };
+ * struct S {
+ *	struct A* a_ptr;
+ *	struct B* b_ptr;
+ * };
+ *
+ * In case of CU #1, BTF data will know only that `struct B` exist (but no
+ * more), but will know the complete type information about `struct A`. While
+ * for CU #2, it will know full type information about `struct B`, but will
+ * only know about forward declaration of `struct A` (in BTF terms, it will
+ * have `BTF_KIND_FWD` type descriptor with name `B`).
+ *
+ * This compilation unit isolation means that it's possible that there is no
+ * single CU with complete type information describing structs `S`, `A`, and
+ * `B`. Also, we might get tons of duplicated and redundant type information.
+ *
+ * Additional complication we need to keep in mind comes from the fact that
+ * types, in general, can form graphs containing cycles, not just DAGs.
+ *
+ * While algorithm does deduplication, it also merges and resolves type
+ * information (unless disabled throught `struct btf_opts`), whenever possible.
+ * E.g., in the example above with two compilation units having partial type
+ * information for structs `A` and `B`, the output of algorithm will emit
+ * a single copy of each BTF type that describes structs `A`, `B`, and `S`
+ * (as well as type information for `int` and pointers), as if they were defined
+ * in a single compilation unit as:
+ *
+ * struct A {
+ *	int a;
+ *	struct A* self;
+ *	struct S* parent;
+ * };
+ * struct B {
+ *	int b;
+ *	struct B* self;
+ *	struct S* parent;
+ * };
+ * struct S {
+ *	struct A* a_ptr;
+ *	struct B* b_ptr;
+ * };
+ *
+ * Algorithm summary
+ * =================
+ *
+ * Algorithm completes its work in 6 separate passes:
+ *
+ * 1. Strings deduplication.
+ * 2. Primitive types deduplication (int, enum, fwd).
+ * 3. Struct/union types deduplication.
+ * 4. Reference types deduplication (pointers, typedefs, arrays, funcs, func
+ *    protos, and const/volatile/restrict modifiers).
+ * 5. Types compaction.
+ * 6. Types remapping.
+ *
+ * Algorithm determines canonical type descriptor, which is a single
+ * representative type for each truly unique type. This canonical type is the
+ * one that will go into final deduplicated BTF type information. For
+ * struct/unions, it is also the type that algorithm will merge additional type
+ * information into (while resolving FWDs), as it discovers it from data in
+ * other CUs. Each input BTF type eventually gets either mapped to itself, if
+ * that type is canonical, or to some other type, if that type is equivalent
+ * and was chosen as canonical representative. This mapping is stored in
+ * `btf_dedup->map` array. This map is also used to record STRUCT/UNION that
+ * FWD type got resolved to.
+ *
+ * To facilitate fast discovery of canonical types, we also maintain canonical
+ * index (`btf_dedup->dedup_table`), which maps type descriptor's signature hash
+ * (i.e., hashed kind, name, size, fields, etc) into a list of canonical types
+ * that match that signature. With sufficiently good choice of type signature
+ * hashing function, we can limit number of canonical types for each unique type
+ * signature to a very small number, allowing to find canonical type for any
+ * duplicated type very quickly.
+ *
+ * Struct/union deduplication is the most critical part and algorithm for
+ * deduplicating structs/unions is described in greater details in comments for
+ * `btf_dedup_is_equiv` function.
+ */
+int btf__dedup(struct btf *btf, struct btf_ext *btf_ext,
+	       const struct btf_dedup_opts *opts)
+{
+	struct btf_dedup *d = btf_dedup_new(btf, btf_ext, opts);
+	int err;
+
+	if (IS_ERR(d)) {
+		pr_debug("btf_dedup_new failed: %ld", PTR_ERR(d));
+		return -EINVAL;
+	}
+
+	err = btf_dedup_strings(d);
+	if (err < 0) {
+		pr_debug("btf_dedup_strings failed:%d\n", err);
+		goto done;
+	}
+	err = btf_dedup_prim_types(d);
+	if (err < 0) {
+		pr_debug("btf_dedup_prim_types failed:%d\n", err);
+		goto done;
+	}
+	err = btf_dedup_struct_types(d);
+	if (err < 0) {
+		pr_debug("btf_dedup_struct_types failed:%d\n", err);
+		goto done;
+	}
+	err = btf_dedup_ref_types(d);
+	if (err < 0) {
+		pr_debug("btf_dedup_ref_types failed:%d\n", err);
+		goto done;
+	}
+	err = btf_dedup_compact_types(d);
+	if (err < 0) {
+		pr_debug("btf_dedup_compact_types failed:%d\n", err);
+		goto done;
+	}
+	err = btf_dedup_remap_types(d);
+	if (err < 0) {
+		pr_debug("btf_dedup_remap_types failed:%d\n", err);
+		goto done;
+	}
+
+done:
+	btf_dedup_free(d);
+	return err;
+}
+
+#define BTF_DEDUP_TABLE_SIZE_LOG 14
+#define BTF_DEDUP_TABLE_MOD ((1 << BTF_DEDUP_TABLE_SIZE_LOG) - 1)
+#define BTF_UNPROCESSED_ID ((__u32)-1)
+#define BTF_IN_PROGRESS_ID ((__u32)-2)
+
+struct btf_dedup_node {
+	struct btf_dedup_node *next;
+	__u32 type_id;
+};
+
+struct btf_dedup {
+	/* .BTF section to be deduped in-place */
+	struct btf *btf;
+	/*
+	 * Optional .BTF.ext section. When provided, any strings referenced
+	 * from it will be taken into account when deduping strings
+	 */
+	struct btf_ext *btf_ext;
+	/*
+	 * This is a map from any type's signature hash to a list of possible
+	 * canonical representative type candidates. Hash collisions are
+	 * ignored, so even types of various kinds can share same list of
+	 * candidates, which is fine because we rely on subsequent
+	 * btf_xxx_equal() checks to authoritatively verify type equality.
+	 */
+	struct btf_dedup_node **dedup_table;
+	/* Canonical types map */
+	__u32 *map;
+	/* Hypothetical mapping, used during type graph equivalence checks */
+	__u32 *hypot_map;
+	__u32 *hypot_list;
+	size_t hypot_cnt;
+	size_t hypot_cap;
+	/* Various option modifying behavior of algorithm */
+	struct btf_dedup_opts opts;
+};
+
+struct btf_str_ptr {
+	const char *str;
+	__u32 new_off;
+	bool used;
+};
+
+struct btf_str_ptrs {
+	struct btf_str_ptr *ptrs;
+	const char *data;
+	__u32 cnt;
+	__u32 cap;
+};
+
+static inline __u32 hash_combine(__u32 h, __u32 value)
+{
+/* 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1 */
+#define GOLDEN_RATIO_PRIME 0x9e370001UL
+	return h * 37 + value * GOLDEN_RATIO_PRIME;
+#undef GOLDEN_RATIO_PRIME
+}
+
+#define for_each_hash_node(table, hash, node) \
+	for (node = table[hash & BTF_DEDUP_TABLE_MOD]; node; node = node->next)
+
+static int btf_dedup_table_add(struct btf_dedup *d, __u32 hash, __u32 type_id)
+{
+	struct btf_dedup_node *node = malloc(sizeof(struct btf_dedup_node));
+
+	if (!node)
+		return -ENOMEM;
+	node->type_id = type_id;
+	node->next = d->dedup_table[hash & BTF_DEDUP_TABLE_MOD];
+	d->dedup_table[hash & BTF_DEDUP_TABLE_MOD] = node;
+	return 0;
+}
+
+static int btf_dedup_hypot_map_add(struct btf_dedup *d,
+				   __u32 from_id, __u32 to_id)
+{
+	if (d->hypot_cnt == d->hypot_cap) {
+		__u32 *new_list;
+
+		d->hypot_cap += max(16, d->hypot_cap / 2);
+		new_list = realloc(d->hypot_list, sizeof(__u32) * d->hypot_cap);
+		if (!new_list)
+			return -ENOMEM;
+		d->hypot_list = new_list;
+	}
+	d->hypot_list[d->hypot_cnt++] = from_id;
+	d->hypot_map[from_id] = to_id;
+	return 0;
+}
+
+static void btf_dedup_clear_hypot_map(struct btf_dedup *d)
+{
+	int i;
+
+	for (i = 0; i < d->hypot_cnt; i++)
+		d->hypot_map[d->hypot_list[i]] = BTF_UNPROCESSED_ID;
+	d->hypot_cnt = 0;
+}
+
+static void btf_dedup_table_free(struct btf_dedup *d)
+{
+	struct btf_dedup_node *head, *tmp;
+	int i;
+
+	if (!d->dedup_table)
+		return;
+
+	for (i = 0; i < (1 << BTF_DEDUP_TABLE_SIZE_LOG); i++) {
+		while (d->dedup_table[i]) {
+			tmp = d->dedup_table[i];
+			d->dedup_table[i] = tmp->next;
+			free(tmp);
+		}
+
+		head = d->dedup_table[i];
+		while (head) {
+			tmp = head;
+			head = head->next;
+			free(tmp);
+		}
+	}
+
+	free(d->dedup_table);
+	d->dedup_table = NULL;
+}
+
+static void btf_dedup_free(struct btf_dedup *d)
+{
+	btf_dedup_table_free(d);
+
+	free(d->map);
+	d->map = NULL;
+
+	free(d->hypot_map);
+	d->hypot_map = NULL;
+
+	free(d->hypot_list);
+	d->hypot_list = NULL;
+
+	free(d);
+}
+
+static struct btf_dedup *btf_dedup_new(struct btf *btf, struct btf_ext *btf_ext,
+				       const struct btf_dedup_opts *opts)
+{
+	struct btf_dedup *d = calloc(1, sizeof(struct btf_dedup));
+	int i, err = 0;
+
+	if (!d)
+		return ERR_PTR(-ENOMEM);
+
+	d->btf = btf;
+	d->btf_ext = btf_ext;
+
+	d->dedup_table = calloc(1 << BTF_DEDUP_TABLE_SIZE_LOG,
+				sizeof(struct btf_dedup_node *));
+	if (!d->dedup_table) {
+		err = -ENOMEM;
+		goto done;
+	}
+
+	d->map = malloc(sizeof(__u32) * (1 + btf->nr_types));
+	if (!d->map) {
+		err = -ENOMEM;
+		goto done;
+	}
+	/* special BTF "void" type is made canonical immediately */
+	d->map[0] = 0;
+	for (i = 1; i <= btf->nr_types; i++)
+		d->map[i] = BTF_UNPROCESSED_ID;
+
+	d->hypot_map = malloc(sizeof(__u32) * (1 + btf->nr_types));
+	if (!d->hypot_map) {
+		err = -ENOMEM;
+		goto done;
+	}
+	for (i = 0; i <= btf->nr_types; i++)
+		d->hypot_map[i] = BTF_UNPROCESSED_ID;
+
+	d->opts.dont_resolve_fwds = opts && opts->dont_resolve_fwds;
+
+done:
+	if (err) {
+		btf_dedup_free(d);
+		return ERR_PTR(err);
+	}
+
+	return d;
+}
+
+typedef int (*str_off_fn_t)(__u32 *str_off_ptr, void *ctx);
+
+/*
+ * Iterate over all possible places in .BTF and .BTF.ext that can reference
+ * string and pass pointer to it to a provided callback `fn`.
+ */
+static int btf_for_each_str_off(struct btf_dedup *d, str_off_fn_t fn, void *ctx)
+{
+	void *line_data_cur, *line_data_end;
+	int i, j, r, rec_size;
+	struct btf_type *t;
+
+	for (i = 1; i <= d->btf->nr_types; i++) {
+		t = d->btf->types[i];
+		r = fn(&t->name_off, ctx);
+		if (r)
+			return r;
+
+		switch (BTF_INFO_KIND(t->info)) {
+		case BTF_KIND_STRUCT:
+		case BTF_KIND_UNION: {
+			struct btf_member *m = (struct btf_member *)(t + 1);
+			__u16 vlen = BTF_INFO_VLEN(t->info);
+
+			for (j = 0; j < vlen; j++) {
+				r = fn(&m->name_off, ctx);
+				if (r)
+					return r;
+				m++;
+			}
+			break;
+		}
+		case BTF_KIND_ENUM: {
+			struct btf_enum *m = (struct btf_enum *)(t + 1);
+			__u16 vlen = BTF_INFO_VLEN(t->info);
+
+			for (j = 0; j < vlen; j++) {
+				r = fn(&m->name_off, ctx);
+				if (r)
+					return r;
+				m++;
+			}
+			break;
+		}
+		case BTF_KIND_FUNC_PROTO: {
+			struct btf_param *m = (struct btf_param *)(t + 1);
+			__u16 vlen = BTF_INFO_VLEN(t->info);
+
+			for (j = 0; j < vlen; j++) {
+				r = fn(&m->name_off, ctx);
+				if (r)
+					return r;
+				m++;
+			}
+			break;
+		}
+		default:
+			break;
+		}
+	}
+
+	if (!d->btf_ext)
+		return 0;
+
+	line_data_cur = d->btf_ext->line_info.info;
+	line_data_end = d->btf_ext->line_info.info + d->btf_ext->line_info.len;
+	rec_size = d->btf_ext->line_info.rec_size;
+
+	while (line_data_cur < line_data_end) {
+		struct btf_ext_info_sec *sec = line_data_cur;
+		struct bpf_line_info_min *line_info;
+		__u32 num_info = sec->num_info;
+
+		r = fn(&sec->sec_name_off, ctx);
+		if (r)
+			return r;
+
+		line_data_cur += sizeof(struct btf_ext_info_sec);
+		for (i = 0; i < num_info; i++) {
+			line_info = line_data_cur;
+			r = fn(&line_info->file_name_off, ctx);
+			if (r)
+				return r;
+			r = fn(&line_info->line_off, ctx);
+			if (r)
+				return r;
+			line_data_cur += rec_size;
+		}
+	}
+
+	return 0;
+}
+
+static int str_sort_by_content(const void *a1, const void *a2)
+{
+	const struct btf_str_ptr *p1 = a1;
+	const struct btf_str_ptr *p2 = a2;
+
+	return strcmp(p1->str, p2->str);
+}
+
+static int str_sort_by_offset(const void *a1, const void *a2)
+{
+	const struct btf_str_ptr *p1 = a1;
+	const struct btf_str_ptr *p2 = a2;
+
+	if (p1->str != p2->str)
+		return p1->str < p2->str ? -1 : 1;
+	return 0;
+}
+
+static int btf_dedup_str_ptr_cmp(const void *str_ptr, const void *pelem)
+{
+	const struct btf_str_ptr *p = pelem;
+
+	if (str_ptr != p->str)
+		return (const char *)str_ptr < p->str ? -1 : 1;
+	return 0;
+}
+
+static int btf_str_mark_as_used(__u32 *str_off_ptr, void *ctx)
+{
+	struct btf_str_ptrs *strs;
+	struct btf_str_ptr *s;
+
+	if (*str_off_ptr == 0)
+		return 0;
+
+	strs = ctx;
+	s = bsearch(strs->data + *str_off_ptr, strs->ptrs, strs->cnt,
+		    sizeof(struct btf_str_ptr), btf_dedup_str_ptr_cmp);
+	if (!s)
+		return -EINVAL;
+	s->used = true;
+	return 0;
+}
+
+static int btf_str_remap_offset(__u32 *str_off_ptr, void *ctx)
+{
+	struct btf_str_ptrs *strs;
+	struct btf_str_ptr *s;
+
+	if (*str_off_ptr == 0)
+		return 0;
+
+	strs = ctx;
+	s = bsearch(strs->data + *str_off_ptr, strs->ptrs, strs->cnt,
+		    sizeof(struct btf_str_ptr), btf_dedup_str_ptr_cmp);
+	if (!s)
+		return -EINVAL;
+	*str_off_ptr = s->new_off;
+	return 0;
+}
+
+/*
+ * Dedup string and filter out those that are not referenced from either .BTF
+ * or .BTF.ext (if provided) sections.
+ *
+ * This is done by building index of all strings in BTF's string section,
+ * then iterating over all entities that can reference strings (e.g., type
+ * names, struct field names, .BTF.ext line info, etc) and marking corresponding
+ * strings as used. After that all used strings are deduped and compacted into
+ * sequential blob of memory and new offsets are calculated. Then all the string
+ * references are iterated again and rewritten using new offsets.
+ */
+static int btf_dedup_strings(struct btf_dedup *d)
+{
+	const struct btf_header *hdr = d->btf->hdr;
+	char *start = (char *)d->btf->nohdr_data + hdr->str_off;
+	char *end = start + d->btf->hdr->str_len;
+	char *p = start, *tmp_strs = NULL;
+	struct btf_str_ptrs strs = {
+		.cnt = 0,
+		.cap = 0,
+		.ptrs = NULL,
+		.data = start,
+	};
+	int i, j, err = 0, grp_idx;
+	bool grp_used;
+
+	/* build index of all strings */
+	while (p < end) {
+		if (strs.cnt + 1 > strs.cap) {
+			struct btf_str_ptr *new_ptrs;
+
+			strs.cap += max(strs.cnt / 2, 16);
+			new_ptrs = realloc(strs.ptrs,
+					   sizeof(strs.ptrs[0]) * strs.cap);
+			if (!new_ptrs) {
+				err = -ENOMEM;
+				goto done;
+			}
+			strs.ptrs = new_ptrs;
+		}
+
+		strs.ptrs[strs.cnt].str = p;
+		strs.ptrs[strs.cnt].used = false;
+
+		p += strlen(p) + 1;
+		strs.cnt++;
+	}
+
+	/* temporary storage for deduplicated strings */
+	tmp_strs = malloc(d->btf->hdr->str_len);
+	if (!tmp_strs) {
+		err = -ENOMEM;
+		goto done;
+	}
+
+	/* mark all used strings */
+	strs.ptrs[0].used = true;
+	err = btf_for_each_str_off(d, btf_str_mark_as_used, &strs);
+	if (err)
+		goto done;
+
+	/* sort strings by context, so that we can identify duplicates */
+	qsort(strs.ptrs, strs.cnt, sizeof(strs.ptrs[0]), str_sort_by_content);
+
+	/*
+	 * iterate groups of equal strings and if any instance in a group was
+	 * referenced, emit single instance and remember new offset
+	 */
+	p = tmp_strs;
+	grp_idx = 0;
+	grp_used = strs.ptrs[0].used;
+	/* iterate past end to avoid code duplication after loop */
+	for (i = 1; i <= strs.cnt; i++) {
+		/*
+		 * when i == strs.cnt, we want to skip string comparison and go
+		 * straight to handling last group of strings (otherwise we'd
+		 * need to handle last group after the loop w/ duplicated code)
+		 */
+		if (i < strs.cnt &&
+		    !strcmp(strs.ptrs[i].str, strs.ptrs[grp_idx].str)) {
+			grp_used = grp_used || strs.ptrs[i].used;
+			continue;
+		}
+
+		/*
+		 * this check would have been required after the loop to handle
+		 * last group of strings, but due to <= condition in a loop
+		 * we avoid that duplication
+		 */
+		if (grp_used) {
+			int new_off = p - tmp_strs;
+			__u32 len = strlen(strs.ptrs[grp_idx].str);
+
+			memmove(p, strs.ptrs[grp_idx].str, len + 1);
+			for (j = grp_idx; j < i; j++)
+				strs.ptrs[j].new_off = new_off;
+			p += len + 1;
+		}
+
+		if (i < strs.cnt) {
+			grp_idx = i;
+			grp_used = strs.ptrs[i].used;
+		}
+	}
+
+	/* replace original strings with deduped ones */
+	d->btf->hdr->str_len = p - tmp_strs;
+	memmove(start, tmp_strs, d->btf->hdr->str_len);
+	end = start + d->btf->hdr->str_len;
+
+	/* restore original order for further binary search lookups */
+	qsort(strs.ptrs, strs.cnt, sizeof(strs.ptrs[0]), str_sort_by_offset);
+
+	/* remap string offsets */
+	err = btf_for_each_str_off(d, btf_str_remap_offset, &strs);
+	if (err)
+		goto done;
+
+	d->btf->hdr->str_len = end - start;
+
+done:
+	free(tmp_strs);
+	free(strs.ptrs);
+	return err;
+}
+
+static __u32 btf_hash_common(struct btf_type *t)
+{
+	__u32 h;
+
+	h = hash_combine(0, t->name_off);
+	h = hash_combine(h, t->info);
+	h = hash_combine(h, t->size);
+	return h;
+}
+
+static bool btf_equal_common(struct btf_type *t1, struct btf_type *t2)
+{
+	return t1->name_off == t2->name_off &&
+	       t1->info == t2->info &&
+	       t1->size == t2->size;
+}
+
+/* Calculate type signature hash of INT. */
+static __u32 btf_hash_int(struct btf_type *t)
+{
+	__u32 info = *(__u32 *)(t + 1);
+	__u32 h;
+
+	h = btf_hash_common(t);
+	h = hash_combine(h, info);
+	return h;
+}
+
+/* Check structural equality of two INTs. */
+static bool btf_equal_int(struct btf_type *t1, struct btf_type *t2)
+{
+	__u32 info1, info2;
+
+	if (!btf_equal_common(t1, t2))
+		return false;
+	info1 = *(__u32 *)(t1 + 1);
+	info2 = *(__u32 *)(t2 + 1);
+	return info1 == info2;
+}
+
+/* Calculate type signature hash of ENUM. */
+static __u32 btf_hash_enum(struct btf_type *t)
+{
+	struct btf_enum *member = (struct btf_enum *)(t + 1);
+	__u32 vlen = BTF_INFO_VLEN(t->info);
+	__u32 h = btf_hash_common(t);
+	int i;
+
+	for (i = 0; i < vlen; i++) {
+		h = hash_combine(h, member->name_off);
+		h = hash_combine(h, member->val);
+		member++;
+	}
+	return h;
+}
+
+/* Check structural equality of two ENUMs. */
+static bool btf_equal_enum(struct btf_type *t1, struct btf_type *t2)
+{
+	struct btf_enum *m1, *m2;
+	__u16 vlen;
+	int i;
+
+	if (!btf_equal_common(t1, t2))
+		return false;
+
+	vlen = BTF_INFO_VLEN(t1->info);
+	m1 = (struct btf_enum *)(t1 + 1);
+	m2 = (struct btf_enum *)(t2 + 1);
+	for (i = 0; i < vlen; i++) {
+		if (m1->name_off != m2->name_off || m1->val != m2->val)
+			return false;
+		m1++;
+		m2++;
+	}
+	return true;
+}
+
+/*
+ * Calculate type signature hash of STRUCT/UNION, ignoring referenced type IDs,
+ * as referenced type IDs equivalence is established separately during type
+ * graph equivalence check algorithm.
+ */
+static __u32 btf_hash_struct(struct btf_type *t)
+{
+	struct btf_member *member = (struct btf_member *)(t + 1);
+	__u32 vlen = BTF_INFO_VLEN(t->info);
+	__u32 h = btf_hash_common(t);
+	int i;
+
+	for (i = 0; i < vlen; i++) {
+		h = hash_combine(h, member->name_off);
+		h = hash_combine(h, member->offset);
+		/* no hashing of referenced type ID, it can be unresolved yet */
+		member++;
+	}
+	return h;
+}
+
+/*
+ * Check structural compatibility of two FUNC_PROTOs, ignoring referenced type
+ * IDs. This check is performed during type graph equivalence check and
+ * referenced types equivalence is checked separately.
+ */
+static bool btf_equal_struct(struct btf_type *t1, struct btf_type *t2)
+{
+	struct btf_member *m1, *m2;
+	__u16 vlen;
+	int i;
+
+	if (!btf_equal_common(t1, t2))
+		return false;
+
+	vlen = BTF_INFO_VLEN(t1->info);
+	m1 = (struct btf_member *)(t1 + 1);
+	m2 = (struct btf_member *)(t2 + 1);
+	for (i = 0; i < vlen; i++) {
+		if (m1->name_off != m2->name_off || m1->offset != m2->offset)
+			return false;
+		m1++;
+		m2++;
+	}
+	return true;
+}
+
+/*
+ * Calculate type signature hash of ARRAY, including referenced type IDs,
+ * under assumption that they were already resolved to canonical type IDs and
+ * are not going to change.
+ */
+static __u32 btf_hash_array(struct btf_type *t)
+{
+	struct btf_array *info = (struct btf_array *)(t + 1);
+	__u32 h = btf_hash_common(t);
+
+	h = hash_combine(h, info->type);
+	h = hash_combine(h, info->index_type);
+	h = hash_combine(h, info->nelems);
+	return h;
+}
+
+/*
+ * Check exact equality of two ARRAYs, taking into account referenced
+ * type IDs, under assumption that they were already resolved to canonical
+ * type IDs and are not going to change.
+ * This function is called during reference types deduplication to compare
+ * ARRAY to potential canonical representative.
+ */
+static bool btf_equal_array(struct btf_type *t1, struct btf_type *t2)
+{
+	struct btf_array *info1, *info2;
+
+	if (!btf_equal_common(t1, t2))
+		return false;
+
+	info1 = (struct btf_array *)(t1 + 1);
+	info2 = (struct btf_array *)(t2 + 1);
+	return info1->type == info2->type &&
+	       info1->index_type == info2->index_type &&
+	       info1->nelems == info2->nelems;
+}
+
+/*
+ * Check structural compatibility of two ARRAYs, ignoring referenced type
+ * IDs. This check is performed during type graph equivalence check and
+ * referenced types equivalence is checked separately.
+ */
+static bool btf_compat_array(struct btf_type *t1, struct btf_type *t2)
+{
+	struct btf_array *info1, *info2;
+
+	if (!btf_equal_common(t1, t2))
+		return false;
+
+	info1 = (struct btf_array *)(t1 + 1);
+	info2 = (struct btf_array *)(t2 + 1);
+	return info1->nelems == info2->nelems;
+}
+
+/*
+ * Calculate type signature hash of FUNC_PROTO, including referenced type IDs,
+ * under assumption that they were already resolved to canonical type IDs and
+ * are not going to change.
+ */
+static inline __u32 btf_hash_fnproto(struct btf_type *t)
+{
+	struct btf_param *member = (struct btf_param *)(t + 1);
+	__u16 vlen = BTF_INFO_VLEN(t->info);
+	__u32 h = btf_hash_common(t);
+	int i;
+
+	for (i = 0; i < vlen; i++) {
+		h = hash_combine(h, member->name_off);
+		h = hash_combine(h, member->type);
+		member++;
+	}
+	return h;
+}
+
+/*
+ * Check exact equality of two FUNC_PROTOs, taking into account referenced
+ * type IDs, under assumption that they were already resolved to canonical
+ * type IDs and are not going to change.
+ * This function is called during reference types deduplication to compare
+ * FUNC_PROTO to potential canonical representative.
+ */
+static inline bool btf_equal_fnproto(struct btf_type *t1, struct btf_type *t2)
+{
+	struct btf_param *m1, *m2;
+	__u16 vlen;
+	int i;
+
+	if (!btf_equal_common(t1, t2))
+		return false;
+
+	vlen = BTF_INFO_VLEN(t1->info);
+	m1 = (struct btf_param *)(t1 + 1);
+	m2 = (struct btf_param *)(t2 + 1);
+	for (i = 0; i < vlen; i++) {
+		if (m1->name_off != m2->name_off || m1->type != m2->type)
+			return false;
+		m1++;
+		m2++;
+	}
+	return true;
+}
+
+/*
+ * Check structural compatibility of two FUNC_PROTOs, ignoring referenced type
+ * IDs. This check is performed during type graph equivalence check and
+ * referenced types equivalence is checked separately.
+ */
+static inline bool btf_compat_fnproto(struct btf_type *t1, struct btf_type *t2)
+{
+	struct btf_param *m1, *m2;
+	__u16 vlen;
+	int i;
+
+	/* skip return type ID */
+	if (t1->name_off != t2->name_off || t1->info != t2->info)
+		return false;
+
+	vlen = BTF_INFO_VLEN(t1->info);
+	m1 = (struct btf_param *)(t1 + 1);
+	m2 = (struct btf_param *)(t2 + 1);
+	for (i = 0; i < vlen; i++) {
+		if (m1->name_off != m2->name_off)
+			return false;
+		m1++;
+		m2++;
+	}
+	return true;
+}
+
+/*
+ * Deduplicate primitive types, that can't reference other types, by calculating
+ * their type signature hash and comparing them with any possible canonical
+ * candidate. If no canonical candidate matches, type itself is marked as
+ * canonical and is added into `btf_dedup->dedup_table` as another candidate.
+ */
+static int btf_dedup_prim_type(struct btf_dedup *d, __u32 type_id)
+{
+	struct btf_type *t = d->btf->types[type_id];
+	struct btf_type *cand;
+	struct btf_dedup_node *cand_node;
+	/* if we don't find equivalent type, then we are canonical */
+	__u32 new_id = type_id;
+	__u32 h;
+
+	switch (BTF_INFO_KIND(t->info)) {
+	case BTF_KIND_CONST:
+	case BTF_KIND_VOLATILE:
+	case BTF_KIND_RESTRICT:
+	case BTF_KIND_PTR:
+	case BTF_KIND_TYPEDEF:
+	case BTF_KIND_ARRAY:
+	case BTF_KIND_STRUCT:
+	case BTF_KIND_UNION:
+	case BTF_KIND_FUNC:
+	case BTF_KIND_FUNC_PROTO:
+		return 0;
+
+	case BTF_KIND_INT:
+		h = btf_hash_int(t);
+		for_each_hash_node(d->dedup_table, h, cand_node) {
+			cand = d->btf->types[cand_node->type_id];
+			if (btf_equal_int(t, cand)) {
+				new_id = cand_node->type_id;
+				break;
+			}
+		}
+		break;
+
+	case BTF_KIND_ENUM:
+		h = btf_hash_enum(t);
+		for_each_hash_node(d->dedup_table, h, cand_node) {
+			cand = d->btf->types[cand_node->type_id];
+			if (btf_equal_enum(t, cand)) {
+				new_id = cand_node->type_id;
+				break;
+			}
+		}
+		break;
+
+	case BTF_KIND_FWD:
+		h = btf_hash_common(t);
+		for_each_hash_node(d->dedup_table, h, cand_node) {
+			cand = d->btf->types[cand_node->type_id];
+			if (btf_equal_common(t, cand)) {
+				new_id = cand_node->type_id;
+				break;
+			}
+		}
+		break;
+
+	default:
+		return -EINVAL;
+	}
+
+	d->map[type_id] = new_id;
+	if (type_id == new_id && btf_dedup_table_add(d, h, type_id))
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int btf_dedup_prim_types(struct btf_dedup *d)
+{
+	int i, err;
+
+	for (i = 1; i <= d->btf->nr_types; i++) {
+		err = btf_dedup_prim_type(d, i);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+/*
+ * Check whether type is already mapped into canonical one (could be to itself).
+ */
+static inline bool is_type_mapped(struct btf_dedup *d, uint32_t type_id)
+{
+	return d->map[type_id] <= BTF_MAX_TYPE;
+}
+
+/*
+ * Resolve type ID into its canonical type ID, if any; otherwise return original
+ * type ID. If type is FWD and is resolved into STRUCT/UNION already, follow
+ * STRUCT/UNION link and resolve it into canonical type ID as well.
+ */
+static inline __u32 resolve_type_id(struct btf_dedup *d, __u32 type_id)
+{
+	while (is_type_mapped(d, type_id) && d->map[type_id] != type_id)
+		type_id = d->map[type_id];
+	return type_id;
+}
+
+/*
+ * Resolve FWD to underlying STRUCT/UNION, if any; otherwise return original
+ * type ID.
+ */
+static uint32_t resolve_fwd_id(struct btf_dedup *d, uint32_t type_id)
+{
+	__u32 orig_type_id = type_id;
+
+	if (BTF_INFO_KIND(d->btf->types[type_id]->info) != BTF_KIND_FWD)
+		return type_id;
+
+	while (is_type_mapped(d, type_id) && d->map[type_id] != type_id)
+		type_id = d->map[type_id];
+
+	if (BTF_INFO_KIND(d->btf->types[type_id]->info) != BTF_KIND_FWD)
+		return type_id;
+
+	return orig_type_id;
+}
+
+
+static inline __u16 btf_fwd_kind(struct btf_type *t)
+{
+	return BTF_INFO_KFLAG(t->info) ? BTF_KIND_UNION : BTF_KIND_STRUCT;
+}
+
+/*
+ * Check equivalence of BTF type graph formed by candidate struct/union (we'll
+ * call it "candidate graph" in this description for brevity) to a type graph
+ * formed by (potential) canonical struct/union ("canonical graph" for brevity
+ * here, though keep in mind that not all types in canonical graph are
+ * necessarily canonical representatives themselves, some of them might be
+ * duplicates or its uniqueness might not have been established yet).
+ * Returns:
+ *  - >0, if type graphs are equivalent;
+ *  -  0, if not equivalent;
+ *  - <0, on error.
+ *
+ * Algorithm performs side-by-side DFS traversal of both type graphs and checks
+ * equivalence of BTF types at each step. If at any point BTF types in candidate
+ * and canonical graphs are not compatible structurally, whole graphs are
+ * incompatible. If types are structurally equivalent (i.e., all information
+ * except referenced type IDs is exactly the same), a mapping from `canon_id` to
+ * a `cand_id` is recored in hypothetical mapping (`btf_dedup->hypot_map`).
+ * If a type references other types, then those referenced types are checked
+ * for equivalence recursively.
+ *
+ * During DFS traversal, if we find that for current `canon_id` type we
+ * already have some mapping in hypothetical map, we check for two possible
+ * situations:
+ *   - `canon_id` is mapped to exactly the same type as `cand_id`. This will
+ *     happen when type graphs have cycles. In this case we assume those two
+ *     types are equivalent.
+ *   - `canon_id` is mapped to different type. This is contradiction in our
+ *     hypothetical mapping, because same graph in canonical graph corresponds
+ *     to two different types in candidate graph, which for equivalent type
+ *     graphs shouldn't happen. This condition terminates equivalence check
+ *     with negative result.
+ *
+ * If type graphs traversal exhausts types to check and find no contradiction,
+ * then type graphs are equivalent.
+ *
+ * When checking types for equivalence, there is one special case: FWD types.
+ * If FWD type resolution is allowed and one of the types (either from canonical
+ * or candidate graph) is FWD and other is STRUCT/UNION (depending on FWD's kind
+ * flag) and their names match, hypothetical mapping is updated to point from
+ * FWD to STRUCT/UNION. If graphs will be determined as equivalent successfully,
+ * this mapping will be used to record FWD -> STRUCT/UNION mapping permanently.
+ *
+ * Technically, this could lead to incorrect FWD to STRUCT/UNION resolution,
+ * if there are two exactly named (or anonymous) structs/unions that are
+ * compatible structurally, one of which has FWD field, while other is concrete
+ * STRUCT/UNION, but according to C sources they are different structs/unions
+ * that are referencing different types with the same name. This is extremely
+ * unlikely to happen, but btf_dedup API allows to disable FWD resolution if
+ * this logic is causing problems.
+ *
+ * Doing FWD resolution means that both candidate and/or canonical graphs can
+ * consists of portions of the graph that come from multiple compilation units.
+ * This is due to the fact that types within single compilation unit are always
+ * deduplicated and FWDs are already resolved, if referenced struct/union
+ * definiton is available. So, if we had unresolved FWD and found corresponding
+ * STRUCT/UNION, they will be from different compilation units. This
+ * consequently means that when we "link" FWD to corresponding STRUCT/UNION,
+ * type graph will likely have at least two different BTF types that describe
+ * same type (e.g., most probably there will be two different BTF types for the
+ * same 'int' primitive type) and could even have "overlapping" parts of type
+ * graph that describe same subset of types.
+ *
+ * This in turn means that our assumption that each type in canonical graph
+ * must correspond to exactly one type in candidate graph might not hold
+ * anymore and will make it harder to detect contradictions using hypothetical
+ * map. To handle this problem, we allow to follow FWD -> STRUCT/UNION
+ * resolution only in canonical graph. FWDs in candidate graphs are never
+ * resolved. To see why it's OK, let's check all possible situations w.r.t. FWDs
+ * that can occur:
+ *   - Both types in canonical and candidate graphs are FWDs. If they are
+ *     structurally equivalent, then they can either be both resolved to the
+ *     same STRUCT/UNION or not resolved at all. In both cases they are
+ *     equivalent and there is no need to resolve FWD on candidate side.
+ *   - Both types in canonical and candidate graphs are concrete STRUCT/UNION,
+ *     so nothing to resolve as well, algorithm will check equivalence anyway.
+ *   - Type in canonical graph is FWD, while type in candidate is concrete
+ *     STRUCT/UNION. In this case candidate graph comes from single compilation
+ *     unit, so there is exactly one BTF type for each unique C type. After
+ *     resolving FWD into STRUCT/UNION, there might be more than one BTF type
+ *     in canonical graph mapping to single BTF type in candidate graph, but
+ *     because hypothetical mapping maps from canonical to candidate types, it's
+ *     alright, and we still maintain the property of having single `canon_id`
+ *     mapping to single `cand_id` (there could be two different `canon_id`
+ *     mapped to the same `cand_id`, but it's not contradictory).
+ *   - Type in canonical graph is concrete STRUCT/UNION, while type in candidate
+ *     graph is FWD. In this case we are just going to check compatibility of
+ *     STRUCT/UNION and corresponding FWD, and if they are compatible, we'll
+ *     assume that whatever STRUCT/UNION FWD resolves to must be equivalent to
+ *     a concrete STRUCT/UNION from canonical graph. If the rest of type graphs
+ *     turn out equivalent, we'll re-resolve FWD to concrete STRUCT/UNION from
+ *     canonical graph.
+ */
+static int btf_dedup_is_equiv(struct btf_dedup *d, __u32 cand_id,
+			      __u32 canon_id)
+{
+	struct btf_type *cand_type;
+	struct btf_type *canon_type;
+	__u32 hypot_type_id;
+	__u16 cand_kind;
+	__u16 canon_kind;
+	int i, eq;
+
+	/* if both resolve to the same canonical, they must be equivalent */
+	if (resolve_type_id(d, cand_id) == resolve_type_id(d, canon_id))
+		return 1;
+
+	canon_id = resolve_fwd_id(d, canon_id);
+
+	hypot_type_id = d->hypot_map[canon_id];
+	if (hypot_type_id <= BTF_MAX_TYPE)
+		return hypot_type_id == cand_id;
+
+	if (btf_dedup_hypot_map_add(d, canon_id, cand_id))
+		return -ENOMEM;
+
+	cand_type = d->btf->types[cand_id];
+	canon_type = d->btf->types[canon_id];
+	cand_kind = BTF_INFO_KIND(cand_type->info);
+	canon_kind = BTF_INFO_KIND(canon_type->info);
+
+	if (cand_type->name_off != canon_type->name_off)
+		return 0;
+
+	/* FWD <--> STRUCT/UNION equivalence check, if enabled */
+	if (!d->opts.dont_resolve_fwds
+	    && (cand_kind == BTF_KIND_FWD || canon_kind == BTF_KIND_FWD)
+	    && cand_kind != canon_kind) {
+		__u16 real_kind;
+		__u16 fwd_kind;
+
+		if (cand_kind == BTF_KIND_FWD) {
+			real_kind = canon_kind;
+			fwd_kind = btf_fwd_kind(cand_type);
+		} else {
+			real_kind = cand_kind;
+			fwd_kind = btf_fwd_kind(canon_type);
+		}
+		return fwd_kind == real_kind;
+	}
+
+	if (cand_type->info != canon_type->info)
+		return 0;
+
+	switch (cand_kind) {
+	case BTF_KIND_INT:
+		return btf_equal_int(cand_type, canon_type);
+
+	case BTF_KIND_ENUM:
+		return btf_equal_enum(cand_type, canon_type);
+
+	case BTF_KIND_FWD:
+		return btf_equal_common(cand_type, canon_type);
+
+	case BTF_KIND_CONST:
+	case BTF_KIND_VOLATILE:
+	case BTF_KIND_RESTRICT:
+	case BTF_KIND_PTR:
+	case BTF_KIND_TYPEDEF:
+	case BTF_KIND_FUNC:
+		return btf_dedup_is_equiv(d, cand_type->type, canon_type->type);
+
+	case BTF_KIND_ARRAY: {
+		struct btf_array *cand_arr, *canon_arr;
+
+		if (!btf_compat_array(cand_type, canon_type))
+			return 0;
+		cand_arr = (struct btf_array *)(cand_type + 1);
+		canon_arr = (struct btf_array *)(canon_type + 1);
+		eq = btf_dedup_is_equiv(d,
+			cand_arr->index_type, canon_arr->index_type);
+		if (eq <= 0)
+			return eq;
+		return btf_dedup_is_equiv(d, cand_arr->type, canon_arr->type);
+	}
+
+	case BTF_KIND_STRUCT:
+	case BTF_KIND_UNION: {
+		struct btf_member *cand_m, *canon_m;
+		__u16 vlen;
+
+		if (!btf_equal_struct(cand_type, canon_type))
+			return 0;
+		vlen = BTF_INFO_VLEN(cand_type->info);
+		cand_m = (struct btf_member *)(cand_type + 1);
+		canon_m = (struct btf_member *)(canon_type + 1);
+		for (i = 0; i < vlen; i++) {
+			eq = btf_dedup_is_equiv(d, cand_m->type, canon_m->type);
+			if (eq <= 0)
+				return eq;
+			cand_m++;
+			canon_m++;
+		}
+
+		return 1;
+	}
+
+	case BTF_KIND_FUNC_PROTO: {
+		struct btf_param *cand_p, *canon_p;
+		__u16 vlen;
+
+		if (!btf_compat_fnproto(cand_type, canon_type))
+			return 0;
+		eq = btf_dedup_is_equiv(d, cand_type->type, canon_type->type);
+		if (eq <= 0)
+			return eq;
+		vlen = BTF_INFO_VLEN(cand_type->info);
+		cand_p = (struct btf_param *)(cand_type + 1);
+		canon_p = (struct btf_param *)(canon_type + 1);
+		for (i = 0; i < vlen; i++) {
+			eq = btf_dedup_is_equiv(d, cand_p->type, canon_p->type);
+			if (eq <= 0)
+				return eq;
+			cand_p++;
+			canon_p++;
+		}
+		return 1;
+	}
+
+	default:
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/*
+ * Use hypothetical mapping, produced by successful type graph equivalence
+ * check, to augment existing struct/union canonical mapping, where possible.
+ *
+ * If BTF_KIND_FWD resolution is allowed, this mapping is also used to record
+ * FWD -> STRUCT/UNION correspondence as well. FWD resolution is bidirectional:
+ * it doesn't matter if FWD type was part of canonical graph or candidate one,
+ * we are recording the mapping anyway. As opposed to carefulness required
+ * for struct/union correspondence mapping (described below), for FWD resolution
+ * it's not important, as by the time that FWD type (reference type) will be
+ * deduplicated all structs/unions will be deduped already anyway.
+ *
+ * Recording STRUCT/UNION mapping is purely a performance optimization and is
+ * not required for correctness. It needs to be done carefully to ensure that
+ * struct/union from candidate's type graph is not mapped into corresponding
+ * struct/union from canonical type graph that itself hasn't been resolved into
+ * canonical representative. The only guarantee we have is that canonical
+ * struct/union was determined as canonical and that won't change. But any
+ * types referenced through that struct/union fields could have been not yet
+ * resolved, so in case like that it's too early to establish any kind of
+ * correspondence between structs/unions.
+ *
+ * No canonical correspondence is derived for primitive types (they are already
+ * deduplicated completely already anyway) or reference types (they rely on
+ * stability of struct/union canonical relationship for equivalence checks).
+ */
+static void btf_dedup_merge_hypot_map(struct btf_dedup *d)
+{
+	__u32 cand_type_id, targ_type_id;
+	__u16 t_kind, c_kind;
+	__u32 t_id, c_id;
+	int i;
+
+	for (i = 0; i < d->hypot_cnt; i++) {
+		cand_type_id = d->hypot_list[i];
+		targ_type_id = d->hypot_map[cand_type_id];
+		t_id = resolve_type_id(d, targ_type_id);
+		c_id = resolve_type_id(d, cand_type_id);
+		t_kind = BTF_INFO_KIND(d->btf->types[t_id]->info);
+		c_kind = BTF_INFO_KIND(d->btf->types[c_id]->info);
+		/*
+		 * Resolve FWD into STRUCT/UNION.
+		 * It's ok to resolve FWD into STRUCT/UNION that's not yet
+		 * mapped to canonical representative (as opposed to
+		 * STRUCT/UNION <--> STRUCT/UNION mapping logic below), because
+		 * eventually that struct is going to be mapped and all resolved
+		 * FWDs will automatically resolve to correct canonical
+		 * representative. This will happen before ref type deduping,
+		 * which critically depends on stability of these mapping. This
+		 * stability is not a requirement for STRUCT/UNION equivalence
+		 * checks, though.
+		 */
+		if (t_kind != BTF_KIND_FWD && c_kind == BTF_KIND_FWD)
+			d->map[c_id] = t_id;
+		else if (t_kind == BTF_KIND_FWD && c_kind != BTF_KIND_FWD)
+			d->map[t_id] = c_id;
+
+		if ((t_kind == BTF_KIND_STRUCT || t_kind == BTF_KIND_UNION) &&
+		    c_kind != BTF_KIND_FWD &&
+		    is_type_mapped(d, c_id) &&
+		    !is_type_mapped(d, t_id)) {
+			/*
+			 * as a perf optimization, we can map struct/union
+			 * that's part of type graph we just verified for
+			 * equivalence. We can do that for struct/union that has
+			 * canonical representative only, though.
+			 */
+			d->map[t_id] = c_id;
+		}
+	}
+}
+
+/*
+ * Deduplicate struct/union types.
+ *
+ * For each struct/union type its type signature hash is calculated, taking
+ * into account type's name, size, number, order and names of fields, but
+ * ignoring type ID's referenced from fields, because they might not be deduped
+ * completely until after reference types deduplication phase. This type hash
+ * is used to iterate over all potential canonical types, sharing same hash.
+ * For each canonical candidate we check whether type graphs that they form
+ * (through referenced types in fields and so on) are equivalent using algorithm
+ * implemented in `btf_dedup_is_equiv`. If such equivalence is found and
+ * BTF_KIND_FWD resolution is allowed, then hypothetical mapping
+ * (btf_dedup->hypot_map) produced by aforementioned type graph equivalence
+ * algorithm is used to record FWD -> STRUCT/UNION mapping. It's also used to
+ * potentially map other structs/unions to their canonical representatives,
+ * if such relationship hasn't yet been established. This speeds up algorithm
+ * by eliminating some of the duplicate work.
+ *
+ * If no matching canonical representative was found, struct/union is marked
+ * as canonical for itself and is added into btf_dedup->dedup_table hash map
+ * for further look ups.
+ */
+static int btf_dedup_struct_type(struct btf_dedup *d, __u32 type_id)
+{
+	struct btf_dedup_node *cand_node;
+	struct btf_type *t;
+	/* if we don't find equivalent type, then we are canonical */
+	__u32 new_id = type_id;
+	__u16 kind;
+	__u32 h;
+
+	/* already deduped or is in process of deduping (loop detected) */
+	if (d->map[type_id] <= BTF_MAX_TYPE)
+		return 0;
+
+	t = d->btf->types[type_id];
+	kind = BTF_INFO_KIND(t->info);
+
+	if (kind != BTF_KIND_STRUCT && kind != BTF_KIND_UNION)
+		return 0;
+
+	h = btf_hash_struct(t);
+	for_each_hash_node(d->dedup_table, h, cand_node) {
+		int eq;
+
+		btf_dedup_clear_hypot_map(d);
+		eq = btf_dedup_is_equiv(d, type_id, cand_node->type_id);
+		if (eq < 0)
+			return eq;
+		if (!eq)
+			continue;
+		new_id = cand_node->type_id;
+		btf_dedup_merge_hypot_map(d);
+		break;
+	}
+
+	d->map[type_id] = new_id;
+	if (type_id == new_id && btf_dedup_table_add(d, h, type_id))
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int btf_dedup_struct_types(struct btf_dedup *d)
+{
+	int i, err;
+
+	for (i = 1; i <= d->btf->nr_types; i++) {
+		err = btf_dedup_struct_type(d, i);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+/*
+ * Deduplicate reference type.
+ *
+ * Once all primitive and struct/union types got deduplicated, we can easily
+ * deduplicate all other (reference) BTF types. This is done in two steps:
+ *
+ * 1. Resolve all referenced type IDs into their canonical type IDs. This
+ * resolution can be done either immediately for primitive or struct/union types
+ * (because they were deduped in previous two phases) or recursively for
+ * reference types. Recursion will always terminate at either primitive or
+ * struct/union type, at which point we can "unwind" chain of reference types
+ * one by one. There is no danger of encountering cycles because in C type
+ * system the only way to form type cycle is through struct/union, so any chain
+ * of reference types, even those taking part in a type cycle, will inevitably
+ * reach struct/union at some point.
+ *
+ * 2. Once all referenced type IDs are resolved into canonical ones, BTF type
+ * becomes "stable", in the sense that no further deduplication will cause
+ * any changes to it. With that, it's now possible to calculate type's signature
+ * hash (this time taking into account referenced type IDs) and loop over all
+ * potential canonical representatives. If no match was found, current type
+ * will become canonical representative of itself and will be added into
+ * btf_dedup->dedup_table as another possible canonical representative.
+ */
+static int btf_dedup_ref_type(struct btf_dedup *d, __u32 type_id)
+{
+	struct btf_dedup_node *cand_node;
+	struct btf_type *t, *cand;
+	/* if we don't find equivalent type, then we are representative type */
+	__u32 new_id = type_id;
+	__u32 h, ref_type_id;
+
+	if (d->map[type_id] == BTF_IN_PROGRESS_ID)
+		return -ELOOP;
+	if (d->map[type_id] <= BTF_MAX_TYPE)
+		return resolve_type_id(d, type_id);
+
+	t = d->btf->types[type_id];
+	d->map[type_id] = BTF_IN_PROGRESS_ID;
+
+	switch (BTF_INFO_KIND(t->info)) {
+	case BTF_KIND_CONST:
+	case BTF_KIND_VOLATILE:
+	case BTF_KIND_RESTRICT:
+	case BTF_KIND_PTR:
+	case BTF_KIND_TYPEDEF:
+	case BTF_KIND_FUNC:
+		ref_type_id = btf_dedup_ref_type(d, t->type);
+		if (ref_type_id < 0)
+			return ref_type_id;
+		t->type = ref_type_id;
+
+		h = btf_hash_common(t);
+		for_each_hash_node(d->dedup_table, h, cand_node) {
+			cand = d->btf->types[cand_node->type_id];
+			if (btf_equal_common(t, cand)) {
+				new_id = cand_node->type_id;
+				break;
+			}
+		}
+		break;
+
+	case BTF_KIND_ARRAY: {
+		struct btf_array *info = (struct btf_array *)(t + 1);
+
+		ref_type_id = btf_dedup_ref_type(d, info->type);
+		if (ref_type_id < 0)
+			return ref_type_id;
+		info->type = ref_type_id;
+
+		ref_type_id = btf_dedup_ref_type(d, info->index_type);
+		if (ref_type_id < 0)
+			return ref_type_id;
+		info->index_type = ref_type_id;
+
+		h = btf_hash_array(t);
+		for_each_hash_node(d->dedup_table, h, cand_node) {
+			cand = d->btf->types[cand_node->type_id];
+			if (btf_equal_array(t, cand)) {
+				new_id = cand_node->type_id;
+				break;
+			}
+		}
+		break;
+	}
+
+	case BTF_KIND_FUNC_PROTO: {
+		struct btf_param *param;
+		__u16 vlen;
+		int i;
+
+		ref_type_id = btf_dedup_ref_type(d, t->type);
+		if (ref_type_id < 0)
+			return ref_type_id;
+		t->type = ref_type_id;
+
+		vlen = BTF_INFO_VLEN(t->info);
+		param = (struct btf_param *)(t + 1);
+		for (i = 0; i < vlen; i++) {
+			ref_type_id = btf_dedup_ref_type(d, param->type);
+			if (ref_type_id < 0)
+				return ref_type_id;
+			param->type = ref_type_id;
+			param++;
+		}
+
+		h = btf_hash_fnproto(t);
+		for_each_hash_node(d->dedup_table, h, cand_node) {
+			cand = d->btf->types[cand_node->type_id];
+			if (btf_equal_fnproto(t, cand)) {
+				new_id = cand_node->type_id;
+				break;
+			}
+		}
+		break;
+	}
+
+	default:
+		return -EINVAL;
+	}
+
+	d->map[type_id] = new_id;
+	if (type_id == new_id && btf_dedup_table_add(d, h, type_id))
+		return -ENOMEM;
+
+	return new_id;
+}
+
+static int btf_dedup_ref_types(struct btf_dedup *d)
+{
+	int i, err;
+
+	for (i = 1; i <= d->btf->nr_types; i++) {
+		err = btf_dedup_ref_type(d, i);
+		if (err < 0)
+			return err;
+	}
+	btf_dedup_table_free(d);
+	return 0;
+}
+
+/*
+ * Compact types.
+ *
+ * After we established for each type its corresponding canonical representative
+ * type, we now can eliminate types that are not canonical and leave only
+ * canonical ones layed out sequentially in memory by copying them over
+ * duplicates. During compaction btf_dedup->hypot_map array is reused to store
+ * a map from original type ID to a new compacted type ID, which will be used
+ * during next phase to "fix up" type IDs, referenced from struct/union and
+ * reference types.
+ */
+static int btf_dedup_compact_types(struct btf_dedup *d)
+{
+	struct btf_type **new_types;
+	__u32 next_type_id = 1;
+	char *types_start, *p;
+	int i, len;
+
+	/* we are going to reuse hypot_map to store compaction remapping */
+	d->hypot_map[0] = 0;
+	for (i = 1; i <= d->btf->nr_types; i++)
+		d->hypot_map[i] = BTF_UNPROCESSED_ID;
+
+	types_start = d->btf->nohdr_data + d->btf->hdr->type_off;
+	p = types_start;
+
+	for (i = 1; i <= d->btf->nr_types; i++) {
+		if (d->map[i] != i)
+			continue;
+
+		len = btf_type_size(d->btf->types[i]);
+		if (len < 0)
+			return len;
+
+		memmove(p, d->btf->types[i], len);
+		d->hypot_map[i] = next_type_id;
+		d->btf->types[next_type_id] = (struct btf_type *)p;
+		p += len;
+		next_type_id++;
+	}
+
+	/* shrink struct btf's internal types index and update btf_header */
+	d->btf->nr_types = next_type_id - 1;
+	d->btf->types_size = d->btf->nr_types;
+	d->btf->hdr->type_len = p - types_start;
+	new_types = realloc(d->btf->types,
+			    (1 + d->btf->nr_types) * sizeof(struct btf_type *));
+	if (!new_types)
+		return -ENOMEM;
+	d->btf->types = new_types;
+
+	/* make sure string section follows type information without gaps */
+	d->btf->hdr->str_off = p - (char *)d->btf->nohdr_data;
+	memmove(p, d->btf->strings, d->btf->hdr->str_len);
+	d->btf->strings = p;
+	p += d->btf->hdr->str_len;
+
+	d->btf->data_size = p - (char *)d->btf->data;
+	return 0;
+}
+
+/*
+ * Figure out final (deduplicated and compacted) type ID for provided original
+ * `type_id` by first resolving it into corresponding canonical type ID and
+ * then mapping it to a deduplicated type ID, stored in btf_dedup->hypot_map,
+ * which is populated during compaction phase.
+ */
+static int btf_dedup_remap_type_id(struct btf_dedup *d, __u32 type_id)
+{
+	__u32 resolved_type_id, new_type_id;
+
+	resolved_type_id = resolve_type_id(d, type_id);
+	new_type_id = d->hypot_map[resolved_type_id];
+	if (new_type_id > BTF_MAX_TYPE)
+		return -EINVAL;
+	return new_type_id;
+}
+
+/*
+ * Remap referenced type IDs into deduped type IDs.
+ *
+ * After BTF types are deduplicated and compacted, their final type IDs may
+ * differ from original ones. The map from original to a corresponding
+ * deduped type ID is stored in btf_dedup->hypot_map and is populated during
+ * compaction phase. During remapping phase we are rewriting all type IDs
+ * referenced from any BTF type (e.g., struct fields, func proto args, etc) to
+ * their final deduped type IDs.
+ */
+static int btf_dedup_remap_type(struct btf_dedup *d, __u32 type_id)
+{
+	struct btf_type *t = d->btf->types[type_id];
+	int i, r;
+
+	switch (BTF_INFO_KIND(t->info)) {
+	case BTF_KIND_INT:
+	case BTF_KIND_ENUM:
+		break;
+
+	case BTF_KIND_FWD:
+	case BTF_KIND_CONST:
+	case BTF_KIND_VOLATILE:
+	case BTF_KIND_RESTRICT:
+	case BTF_KIND_PTR:
+	case BTF_KIND_TYPEDEF:
+	case BTF_KIND_FUNC:
+		r = btf_dedup_remap_type_id(d, t->type);
+		if (r < 0)
+			return r;
+		t->type = r;
+		break;
+
+	case BTF_KIND_ARRAY: {
+		struct btf_array *arr_info = (struct btf_array *)(t + 1);
+
+		r = btf_dedup_remap_type_id(d, arr_info->type);
+		if (r < 0)
+			return r;
+		arr_info->type = r;
+		r = btf_dedup_remap_type_id(d, arr_info->index_type);
+		if (r < 0)
+			return r;
+		arr_info->index_type = r;
+		break;
+	}
+
+	case BTF_KIND_STRUCT:
+	case BTF_KIND_UNION: {
+		struct btf_member *member = (struct btf_member *)(t + 1);
+		__u16 vlen = BTF_INFO_VLEN(t->info);
+
+		for (i = 0; i < vlen; i++) {
+			r = btf_dedup_remap_type_id(d, member->type);
+			if (r < 0)
+				return r;
+			member->type = r;
+			member++;
+		}
+		break;
+	}
+
+	case BTF_KIND_FUNC_PROTO: {
+		struct btf_param *param = (struct btf_param *)(t + 1);
+		__u16 vlen = BTF_INFO_VLEN(t->info);
+
+		r = btf_dedup_remap_type_id(d, t->type);
+		if (r < 0)
+			return r;
+		t->type = r;
+
+		for (i = 0; i < vlen; i++) {
+			r = btf_dedup_remap_type_id(d, param->type);
+			if (r < 0)
+				return r;
+			param->type = r;
+			param++;
+		}
+		break;
+	}
+
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int btf_dedup_remap_types(struct btf_dedup *d)
+{
+	int i, r;
+
+	for (i = 1; i <= d->btf->nr_types; i++) {
+		r = btf_dedup_remap_type(d, i);
+		if (r < 0)
+			return r;
+	}
+	return 0;
+}
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 258c87e9f55d..c739de7ed993 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -84,6 +84,13 @@ LIBBPF_API int btf_ext__reloc_line_info(const struct btf *btf,
 LIBBPF_API __u32 btf_ext__func_info_rec_size(const struct btf_ext *btf_ext);
 LIBBPF_API __u32 btf_ext__line_info_rec_size(const struct btf_ext *btf_ext);
 
+struct btf_dedup_opts {
+	bool dont_resolve_fwds;
+};
+
+LIBBPF_API int btf__dedup(struct btf *btf, struct btf_ext *btf_ext,
+			  const struct btf_dedup_opts *opts);
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 7990e857e003..7e4a8c1e1c1c 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -133,6 +133,7 @@ LIBBPF_0.0.2 {
 		bpf_map_lookup_elem_flags;
 		bpf_object__find_map_fd_by_name;
 		bpf_get_link_xdp_id;
+		btf__dedup;
 		btf__get_map_kv_tids;
 		btf_ext__free;
 		btf_ext__func_info_rec_size;
-- 
2.17.1


^ permalink raw reply related

* [PATCH btf v2 3/3] selftests/btf: add initial BTF dedup tests
From: Andrii Nakryiko @ 2019-02-05  1:29 UTC (permalink / raw)
  To: netdev, ast, yhs, daniel, acme, kernel-team, kafai, ecree,
	andrii.nakryiko
  Cc: Andrii Nakryiko
In-Reply-To: <20190205012946.1590917-1-andriin@fb.com>

This patch sets up a new kind of tests (BTF dedup tests) and tests few aspects of
BTF dedup algorithm. More complete set of tests will come in follow up patches.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/btf.c                    |  12 +
 tools/lib/bpf/btf.h                    |   3 +
 tools/lib/bpf/libbpf.map               |   2 +
 tools/testing/selftests/bpf/test_btf.c | 535 ++++++++++++++++++++++++-
 4 files changed, 537 insertions(+), 15 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index e5097be16018..4949f8840bda 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -237,6 +237,11 @@ static int btf_parse_type_sec(struct btf *btf)
 	return 0;
 }
 
+__u32 btf__get_nr_types(const struct btf *btf)
+{
+	return btf->nr_types;
+}
+
 const struct btf_type *btf__type_by_id(const struct btf *btf, __u32 type_id)
 {
 	if (type_id > btf->nr_types)
@@ -427,6 +432,13 @@ int btf__fd(const struct btf *btf)
 	return btf->fd;
 }
 
+void btf__get_strings(const struct btf *btf, const char **strings,
+		      __u32 *str_len)
+{
+	*strings = btf->strings;
+	*str_len = btf->hdr->str_len;
+}
+
 const char *btf__name_by_offset(const struct btf *btf, __u32 offset)
 {
 	if (offset < btf->hdr->str_len)
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index c739de7ed993..25a9d2db035d 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -59,11 +59,14 @@ LIBBPF_API void btf__free(struct btf *btf);
 LIBBPF_API struct btf *btf__new(__u8 *data, __u32 size);
 LIBBPF_API __s32 btf__find_by_name(const struct btf *btf,
 				   const char *type_name);
+LIBBPF_API __u32 btf__get_nr_types(const struct btf *btf);
 LIBBPF_API const struct btf_type *btf__type_by_id(const struct btf *btf,
 						  __u32 id);
 LIBBPF_API __s64 btf__resolve_size(const struct btf *btf, __u32 type_id);
 LIBBPF_API int btf__resolve_type(const struct btf *btf, __u32 type_id);
 LIBBPF_API int btf__fd(const struct btf *btf);
+LIBBPF_API void btf__get_strings(const struct btf *btf, const char **strings,
+				 __u32 *str_len);
 LIBBPF_API const char *btf__name_by_offset(const struct btf *btf, __u32 offset);
 LIBBPF_API int btf__get_from_id(__u32 id, struct btf **btf);
 LIBBPF_API int btf__get_map_kv_tids(const struct btf *btf, char *map_name,
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 7e4a8c1e1c1c..89c1149e32ee 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -135,6 +135,8 @@ LIBBPF_0.0.2 {
 		bpf_get_link_xdp_id;
 		btf__dedup;
 		btf__get_map_kv_tids;
+		btf__get_nr_types;
+		btf__get_strings;
 		btf_ext__free;
 		btf_ext__func_info_rec_size;
 		btf_ext__line_info_rec_size;
diff --git a/tools/testing/selftests/bpf/test_btf.c b/tools/testing/selftests/bpf/test_btf.c
index aebaeff5a5a0..91080311f3e1 100644
--- a/tools/testing/selftests/bpf/test_btf.c
+++ b/tools/testing/selftests/bpf/test_btf.c
@@ -79,12 +79,21 @@ static int __base_pr(enum libbpf_print_level level __attribute__((unused)),
 	BTF_TYPE_ENC(name, BTF_INFO_ENC(BTF_KIND_INT, 0, 0), sz),	\
 	BTF_INT_ENC(encoding, bits_offset, bits)
 
+#define BTF_FWD_ENC(name, kind_flag) \
+	BTF_TYPE_ENC(name, BTF_INFO_ENC(BTF_KIND_FWD, kind_flag, 0), 0)
+
 #define BTF_ARRAY_ENC(type, index_type, nr_elems)	\
 	(type), (index_type), (nr_elems)
 #define BTF_TYPE_ARRAY_ENC(type, index_type, nr_elems) \
 	BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_ARRAY, 0, 0), 0), \
 	BTF_ARRAY_ENC(type, index_type, nr_elems)
 
+#define BTF_STRUCT_ENC(name, nr_elems, sz)	\
+	BTF_TYPE_ENC(name, BTF_INFO_ENC(BTF_KIND_STRUCT, 0, nr_elems), sz)
+
+#define BTF_UNION_ENC(name, nr_elems, sz)	\
+	BTF_TYPE_ENC(name, BTF_INFO_ENC(BTF_KIND_UNION, 0, nr_elems), sz)
+
 #define BTF_MEMBER_ENC(name, type, bits_offset)	\
 	(name), (type), (bits_offset)
 #define BTF_ENUM_ENC(name, val) (name), (val)
@@ -100,6 +109,12 @@ static int __base_pr(enum libbpf_print_level level __attribute__((unused)),
 #define BTF_CONST_ENC(type) \
 	BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_CONST, 0, 0), type)
 
+#define BTF_VOLATILE_ENC(type) \
+	BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_VOLATILE, 0, 0), type)
+
+#define BTF_RESTRICT_ENC(type) \
+	BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_RESTRICT, 0, 0), type)
+
 #define BTF_FUNC_PROTO_ENC(ret_type, nargs) \
 	BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, nargs), ret_type)
 
@@ -112,6 +127,10 @@ static int __base_pr(enum libbpf_print_level level __attribute__((unused)),
 #define BTF_END_RAW 0xdeadbeef
 #define NAME_TBD 0xdeadb33f
 
+#define NAME_NTH(N) (0xffff0000 | N)
+#define IS_NAME_NTH(X) ((X & 0xffff0000) == 0xffff0000)
+#define GET_NAME_NTH_IDX(X) (X & 0x0000ffff)
+
 #define MAX_NR_RAW_U32 1024
 #define BTF_LOG_BUF_SIZE 65535
 
@@ -120,12 +139,14 @@ static struct args {
 	unsigned int file_test_num;
 	unsigned int get_info_test_num;
 	unsigned int info_raw_test_num;
+	unsigned int dedup_test_num;
 	bool raw_test;
 	bool file_test;
 	bool get_info_test;
 	bool pprint_test;
 	bool always_log;
 	bool info_raw_test;
+	bool dedup_test;
 } args;
 
 static char btf_log_buf[BTF_LOG_BUF_SIZE];
@@ -2836,11 +2857,13 @@ static void *btf_raw_create(const struct btf_header *hdr,
 			    const char **ret_next_str)
 {
 	const char *next_str = str, *end_str = str + str_sec_size;
+	const char **strs_idx = NULL, **tmp_strs_idx;
+	int strs_cap = 0, strs_cnt = 0, next_str_idx = 0;
 	unsigned int size_needed, offset;
 	struct btf_header *ret_hdr;
-	int i, type_sec_size;
+	int i, type_sec_size, err = 0;
 	uint32_t *ret_types;
-	void *raw_btf;
+	void *raw_btf = NULL;
 
 	type_sec_size = get_raw_sec_size(raw_types);
 	if (CHECK(type_sec_size < 0, "Cannot get nr_raw_types"))
@@ -2855,17 +2878,44 @@ static void *btf_raw_create(const struct btf_header *hdr,
 	memcpy(raw_btf, hdr, sizeof(*hdr));
 	offset = sizeof(*hdr);
 
+	/* Index strings */
+	while ((next_str = get_next_str(next_str, end_str))) {
+		if (strs_cnt == strs_cap) {
+			strs_cap += max(16, strs_cap / 2);
+			tmp_strs_idx = realloc(strs_idx,
+					       sizeof(*strs_idx) * strs_cap);
+			if (CHECK(!tmp_strs_idx,
+				  "Cannot allocate memory for strs_idx")) {
+				err = -1;
+				goto done;
+			}
+			strs_idx = tmp_strs_idx;
+		}
+		strs_idx[strs_cnt++] = next_str;
+		next_str += strlen(next_str);
+	}
+
 	/* Copy type section */
 	ret_types = raw_btf + offset;
 	for (i = 0; i < type_sec_size / sizeof(raw_types[0]); i++) {
 		if (raw_types[i] == NAME_TBD) {
-			next_str = get_next_str(next_str, end_str);
-			if (CHECK(!next_str, "Error in getting next_str")) {
-				free(raw_btf);
-				return NULL;
+			if (CHECK(next_str_idx == strs_cnt,
+				  "Error in getting next_str #%d",
+				  next_str_idx)) {
+				err = -1;
+				goto done;
 			}
-			ret_types[i] = next_str - str;
-			next_str += strlen(next_str);
+			ret_types[i] = strs_idx[next_str_idx++] - str;
+		} else if (IS_NAME_NTH(raw_types[i])) {
+			int idx = GET_NAME_NTH_IDX(raw_types[i]);
+
+			if (CHECK(idx <= 0 || idx > strs_cnt,
+				  "Error getting string #%d, strs_cnt:%d",
+				  idx, strs_cnt)) {
+				err = -1;
+				goto done;
+			}
+			ret_types[i] = strs_idx[idx-1] - str;
 		} else {
 			ret_types[i] = raw_types[i];
 		}
@@ -2882,8 +2932,17 @@ static void *btf_raw_create(const struct btf_header *hdr,
 
 	*btf_size = size_needed;
 	if (ret_next_str)
-		*ret_next_str = next_str;
+		*ret_next_str =
+			next_str_idx < strs_cnt ? strs_idx[next_str_idx] : NULL;
 
+done:
+	if (err) {
+		if (raw_btf)
+			free(raw_btf);
+		if (strs_idx)
+			free(strs_idx);
+		return NULL;
+	}
 	return raw_btf;
 }
 
@@ -5552,20 +5611,450 @@ static int test_info_raw(void)
 	return err;
 }
 
+struct btf_raw_data {
+	__u32 raw_types[MAX_NR_RAW_U32];
+	const char *str_sec;
+	__u32 str_sec_size;
+};
+
+struct btf_dedup_test {
+	const char *descr;
+	struct btf_raw_data input;
+	struct btf_raw_data expect;
+	struct btf_dedup_opts opts;
+};
+
+const struct btf_dedup_test dedup_tests[] = {
+
+{
+	.descr = "dedup: unused strings filtering",
+	.input = {
+		.raw_types = {
+			BTF_TYPE_INT_ENC(NAME_NTH(2), BTF_INT_SIGNED, 0, 32, 4),
+			BTF_TYPE_INT_ENC(NAME_NTH(5), BTF_INT_SIGNED, 0, 64, 8),
+			BTF_END_RAW,
+		},
+		BTF_STR_SEC("\0unused\0int\0foo\0bar\0long"),
+	},
+	.expect = {
+		.raw_types = {
+			BTF_TYPE_INT_ENC(NAME_NTH(1), BTF_INT_SIGNED, 0, 32, 4),
+			BTF_TYPE_INT_ENC(NAME_NTH(2), BTF_INT_SIGNED, 0, 64, 8),
+			BTF_END_RAW,
+		},
+		BTF_STR_SEC("\0int\0long"),
+	},
+	.opts = {
+		.dont_resolve_fwds = false,
+	},
+},
+{
+	.descr = "dedup: strings deduplication",
+	.input = {
+		.raw_types = {
+			BTF_TYPE_INT_ENC(NAME_NTH(1), BTF_INT_SIGNED, 0, 32, 4),
+			BTF_TYPE_INT_ENC(NAME_NTH(2), BTF_INT_SIGNED, 0, 64, 8),
+			BTF_TYPE_INT_ENC(NAME_NTH(3), BTF_INT_SIGNED, 0, 32, 4),
+			BTF_TYPE_INT_ENC(NAME_NTH(4), BTF_INT_SIGNED, 0, 64, 8),
+			BTF_TYPE_INT_ENC(NAME_NTH(5), BTF_INT_SIGNED, 0, 32, 4),
+			BTF_END_RAW,
+		},
+		BTF_STR_SEC("\0int\0long int\0int\0long int\0int"),
+	},
+	.expect = {
+		.raw_types = {
+			BTF_TYPE_INT_ENC(NAME_NTH(1), BTF_INT_SIGNED, 0, 32, 4),
+			BTF_TYPE_INT_ENC(NAME_NTH(2), BTF_INT_SIGNED, 0, 64, 8),
+			BTF_END_RAW,
+		},
+		BTF_STR_SEC("\0int\0long int"),
+	},
+	.opts = {
+		.dont_resolve_fwds = false,
+	},
+},
+{
+	.descr = "dedup: struct example #1",
+	/*
+	 * struct s {
+	 *	struct s *next;
+	 *	const int *a;
+	 *	int b[16];
+	 *	int c;
+	 * }
+	 */
+	.input = {
+		.raw_types = {
+			/* int */
+			BTF_TYPE_INT_ENC(NAME_NTH(1), BTF_INT_SIGNED, 0, 32, 4),	/* [1] */
+			/* int[16] */
+			BTF_TYPE_ARRAY_ENC(1, 1, 16),					/* [2] */
+			/* struct s { */
+			BTF_STRUCT_ENC(NAME_NTH(2), 4, 84),				/* [3] */
+				BTF_MEMBER_ENC(NAME_NTH(3), 4, 0),	/* struct s *next;	*/
+				BTF_MEMBER_ENC(NAME_NTH(4), 5, 64),	/* const int *a;	*/
+				BTF_MEMBER_ENC(NAME_NTH(5), 2, 128),	/* int b[16];		*/
+				BTF_MEMBER_ENC(NAME_NTH(6), 1, 640),	/* int c;		*/
+			/* ptr -> [3] struct s */
+			BTF_PTR_ENC(3),							/* [4] */
+			/* ptr -> [6] const int */
+			BTF_PTR_ENC(6),							/* [5] */
+			/* const -> [1] int */
+			BTF_CONST_ENC(1),						/* [6] */
+
+			/* full copy of the above */
+			BTF_TYPE_INT_ENC(NAME_NTH(1), BTF_INT_SIGNED, 0, 32, 4),	/* [7] */
+			BTF_TYPE_ARRAY_ENC(7, 7, 16),					/* [8] */
+			BTF_STRUCT_ENC(NAME_NTH(2), 4, 84),				/* [9] */
+				BTF_MEMBER_ENC(NAME_NTH(3), 10, 0),
+				BTF_MEMBER_ENC(NAME_NTH(4), 11, 64),
+				BTF_MEMBER_ENC(NAME_NTH(5), 8, 128),
+				BTF_MEMBER_ENC(NAME_NTH(6), 7, 640),
+			BTF_PTR_ENC(9),							/* [10] */
+			BTF_PTR_ENC(12),						/* [11] */
+			BTF_CONST_ENC(7),						/* [12] */
+			BTF_END_RAW,
+		},
+		BTF_STR_SEC("\0int\0s\0next\0a\0b\0c\0"),
+	},
+	.expect = {
+		.raw_types = {
+			/* int */
+			BTF_TYPE_INT_ENC(NAME_NTH(4), BTF_INT_SIGNED, 0, 32, 4),	/* [1] */
+			/* int[16] */
+			BTF_TYPE_ARRAY_ENC(1, 1, 16),					/* [2] */
+			/* struct s { */
+			BTF_STRUCT_ENC(NAME_NTH(6), 4, 84),				/* [3] */
+				BTF_MEMBER_ENC(NAME_NTH(5), 4, 0),	/* struct s *next;	*/
+				BTF_MEMBER_ENC(NAME_NTH(1), 5, 64),	/* const int *a;	*/
+				BTF_MEMBER_ENC(NAME_NTH(2), 2, 128),	/* int b[16];		*/
+				BTF_MEMBER_ENC(NAME_NTH(3), 1, 640),	/* int c;		*/
+			/* ptr -> [3] struct s */
+			BTF_PTR_ENC(3),							/* [4] */
+			/* ptr -> [6] const int */
+			BTF_PTR_ENC(6),							/* [5] */
+			/* const -> [1] int */
+			BTF_CONST_ENC(1),						/* [6] */
+			BTF_END_RAW,
+		},
+		BTF_STR_SEC("\0a\0b\0c\0int\0next\0s"),
+	},
+	.opts = {
+		.dont_resolve_fwds = false,
+	},
+},
+{
+	.descr = "dedup: all possible kinds (no duplicates)",
+	.input = {
+		.raw_types = {
+			BTF_TYPE_INT_ENC(NAME_TBD, BTF_INT_SIGNED, 0, 32, 8),		/* [1] int */
+			BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_ENUM, 0, 2), 4),	/* [2] enum */
+				BTF_ENUM_ENC(NAME_TBD, 0),
+				BTF_ENUM_ENC(NAME_TBD, 1),
+			BTF_FWD_ENC(NAME_TBD, 1 /* union kind_flag */),			/* [3] fwd */
+			BTF_TYPE_ARRAY_ENC(2, 1, 7),					/* [4] array */
+			BTF_STRUCT_ENC(NAME_TBD, 1, 4),					/* [5] struct */
+				BTF_MEMBER_ENC(NAME_TBD, 1, 0),
+			BTF_UNION_ENC(NAME_TBD, 1, 4),					/* [6] union */
+				BTF_MEMBER_ENC(NAME_TBD, 1, 0),
+			BTF_TYPEDEF_ENC(NAME_TBD, 1),					/* [7] typedef */
+			BTF_PTR_ENC(0),							/* [8] ptr */
+			BTF_CONST_ENC(8),						/* [9] const */
+			BTF_VOLATILE_ENC(8),						/* [10] volatile */
+			BTF_RESTRICT_ENC(8),						/* [11] restrict */
+			BTF_FUNC_PROTO_ENC(1, 2),					/* [12] func_proto */
+				BTF_FUNC_PROTO_ARG_ENC(NAME_TBD, 1),
+				BTF_FUNC_PROTO_ARG_ENC(NAME_TBD, 8),
+			BTF_FUNC_ENC(NAME_TBD, 12),					/* [13] func */
+			BTF_END_RAW,
+		},
+		BTF_STR_SEC("\0A\0B\0C\0D\0E\0F\0G\0H\0I\0J\0K\0L\0M"),
+	},
+	.expect = {
+		.raw_types = {
+			BTF_TYPE_INT_ENC(NAME_TBD, BTF_INT_SIGNED, 0, 32, 8),		/* [1] int */
+			BTF_TYPE_ENC(NAME_TBD, BTF_INFO_ENC(BTF_KIND_ENUM, 0, 2), 4),	/* [2] enum */
+				BTF_ENUM_ENC(NAME_TBD, 0),
+				BTF_ENUM_ENC(NAME_TBD, 1),
+			BTF_FWD_ENC(NAME_TBD, 1 /* union kind_flag */),			/* [3] fwd */
+			BTF_TYPE_ARRAY_ENC(2, 1, 7),					/* [4] array */
+			BTF_STRUCT_ENC(NAME_TBD, 1, 4),					/* [5] struct */
+				BTF_MEMBER_ENC(NAME_TBD, 1, 0),
+			BTF_UNION_ENC(NAME_TBD, 1, 4),					/* [6] union */
+				BTF_MEMBER_ENC(NAME_TBD, 1, 0),
+			BTF_TYPEDEF_ENC(NAME_TBD, 1),					/* [7] typedef */
+			BTF_PTR_ENC(0),							/* [8] ptr */
+			BTF_CONST_ENC(8),						/* [9] const */
+			BTF_VOLATILE_ENC(8),						/* [10] volatile */
+			BTF_RESTRICT_ENC(8),						/* [11] restrict */
+			BTF_FUNC_PROTO_ENC(1, 2),					/* [12] func_proto */
+				BTF_FUNC_PROTO_ARG_ENC(NAME_TBD, 1),
+				BTF_FUNC_PROTO_ARG_ENC(NAME_TBD, 8),
+			BTF_FUNC_ENC(NAME_TBD, 12),					/* [13] func */
+			BTF_END_RAW,
+		},
+		BTF_STR_SEC("\0A\0B\0C\0D\0E\0F\0G\0H\0I\0J\0K\0L\0M"),
+	},
+	.opts = {
+		.dont_resolve_fwds = false,
+	},
+},
+{
+	.descr = "dedup: no int duplicates",
+	.input = {
+		.raw_types = {
+			BTF_TYPE_INT_ENC(NAME_NTH(1), BTF_INT_SIGNED, 0, 32, 8),
+			/* different name */
+			BTF_TYPE_INT_ENC(NAME_NTH(2), BTF_INT_SIGNED, 0, 32, 8),
+			/* different encoding */
+			BTF_TYPE_INT_ENC(NAME_NTH(1), BTF_INT_CHAR, 0, 32, 8),
+			BTF_TYPE_INT_ENC(NAME_NTH(1), BTF_INT_BOOL, 0, 32, 8),
+			/* different bit offset */
+			BTF_TYPE_INT_ENC(NAME_NTH(1), BTF_INT_SIGNED, 8, 32, 8),
+			/* different bit size */
+			BTF_TYPE_INT_ENC(NAME_NTH(1), BTF_INT_SIGNED, 0, 27, 8),
+			/* different byte size */
+			BTF_TYPE_INT_ENC(NAME_NTH(1), BTF_INT_SIGNED, 0, 32, 4),
+			BTF_END_RAW,
+		},
+		BTF_STR_SEC("\0int\0some other int"),
+	},
+	.expect = {
+		.raw_types = {
+			BTF_TYPE_INT_ENC(NAME_NTH(1), BTF_INT_SIGNED, 0, 32, 8),
+			/* different name */
+			BTF_TYPE_INT_ENC(NAME_NTH(2), BTF_INT_SIGNED, 0, 32, 8),
+			/* different encoding */
+			BTF_TYPE_INT_ENC(NAME_NTH(1), BTF_INT_CHAR, 0, 32, 8),
+			BTF_TYPE_INT_ENC(NAME_NTH(1), BTF_INT_BOOL, 0, 32, 8),
+			/* different bit offset */
+			BTF_TYPE_INT_ENC(NAME_NTH(1), BTF_INT_SIGNED, 8, 32, 8),
+			/* different bit size */
+			BTF_TYPE_INT_ENC(NAME_NTH(1), BTF_INT_SIGNED, 0, 27, 8),
+			/* different byte size */
+			BTF_TYPE_INT_ENC(NAME_NTH(1), BTF_INT_SIGNED, 0, 32, 4),
+			BTF_END_RAW,
+		},
+		BTF_STR_SEC("\0int\0some other int"),
+	},
+	.opts = {
+		.dont_resolve_fwds = false,
+	},
+},
+
+};
+
+static int btf_type_size(const struct btf_type *t)
+{
+	int base_size = sizeof(struct btf_type);
+	__u16 vlen = BTF_INFO_VLEN(t->info);
+	__u16 kind = BTF_INFO_KIND(t->info);
+
+	switch (kind) {
+	case BTF_KIND_FWD:
+	case BTF_KIND_CONST:
+	case BTF_KIND_VOLATILE:
+	case BTF_KIND_RESTRICT:
+	case BTF_KIND_PTR:
+	case BTF_KIND_TYPEDEF:
+	case BTF_KIND_FUNC:
+		return base_size;
+	case BTF_KIND_INT:
+		return base_size + sizeof(__u32);
+	case BTF_KIND_ENUM:
+		return base_size + vlen * sizeof(struct btf_enum);
+	case BTF_KIND_ARRAY:
+		return base_size + sizeof(struct btf_array);
+	case BTF_KIND_STRUCT:
+	case BTF_KIND_UNION:
+		return base_size + vlen * sizeof(struct btf_member);
+	case BTF_KIND_FUNC_PROTO:
+		return base_size + vlen * sizeof(struct btf_param);
+	default:
+		fprintf(stderr, "Unsupported BTF_KIND:%u\n", kind);
+		return -EINVAL;
+	}
+}
+
+static void dump_btf_strings(const char *strs, __u32 len)
+{
+	const char *cur = strs;
+	int i = 0;
+
+	while (cur < strs + len) {
+		fprintf(stderr, "string #%d: '%s'\n", i, cur);
+		cur += strlen(cur) + 1;
+		i++;
+	}
+}
+
+static int do_test_dedup(unsigned int test_num)
+{
+	const struct btf_dedup_test *test = &dedup_tests[test_num - 1];
+	int err = 0, i;
+	__u32 test_nr_types, expect_nr_types, test_str_len, expect_str_len;
+	void *raw_btf;
+	unsigned int raw_btf_size;
+	struct btf *test_btf = NULL, *expect_btf = NULL;
+	const char *ret_test_next_str, *ret_expect_next_str;
+	const char *test_strs, *expect_strs;
+	const char *test_str_cur, *test_str_end;
+	const char *expect_str_cur, *expect_str_end;
+
+	fprintf(stderr, "BTF dedup test[%u] (%s):", test_num, test->descr);
+
+	raw_btf = btf_raw_create(&hdr_tmpl, test->input.raw_types,
+				 test->input.str_sec, test->input.str_sec_size,
+				 &raw_btf_size, &ret_test_next_str);
+	if (!raw_btf)
+		return -1;
+	test_btf = btf__new((__u8 *)raw_btf, raw_btf_size);
+	free(raw_btf);
+	if (CHECK(IS_ERR(test_btf), "invalid test_btf errno:%ld",
+		  PTR_ERR(test_btf))) {
+		err = -1;
+		goto done;
+	}
+
+	raw_btf = btf_raw_create(&hdr_tmpl, test->expect.raw_types,
+				 test->expect.str_sec,
+				 test->expect.str_sec_size,
+				 &raw_btf_size, &ret_expect_next_str);
+	if (!raw_btf)
+		return -1;
+	expect_btf = btf__new((__u8 *)raw_btf, raw_btf_size);
+	free(raw_btf);
+	if (CHECK(IS_ERR(expect_btf), "invalid expect_btf errno:%ld",
+		  PTR_ERR(expect_btf))) {
+		err = -1;
+		goto done;
+	}
+
+	err = btf__dedup(test_btf, NULL, &test->opts);
+	if (CHECK(err, "btf_dedup failed errno:%d", err)) {
+		err = -1;
+		goto done;
+	}
+
+	btf__get_strings(test_btf, &test_strs, &test_str_len);
+	btf__get_strings(expect_btf, &expect_strs, &expect_str_len);
+	if (CHECK(test_str_len != expect_str_len,
+		  "test_str_len:%u != expect_str_len:%u",
+		  test_str_len, expect_str_len)) {
+		fprintf(stderr, "\ntest strings:\n");
+		dump_btf_strings(test_strs, test_str_len);
+		fprintf(stderr, "\nexpected strings:\n");
+		dump_btf_strings(expect_strs, expect_str_len);
+		err = -1;
+		goto done;
+	}
+
+	test_str_cur = test_strs;
+	test_str_end = test_strs + test_str_len;
+	expect_str_cur = expect_strs;
+	expect_str_end = expect_strs + expect_str_len;
+	while (test_str_cur < test_str_end && expect_str_cur < expect_str_end) {
+		size_t test_len, expect_len;
+
+		test_len = strlen(test_str_cur);
+		expect_len = strlen(expect_str_cur);
+		if (CHECK(test_len != expect_len,
+			  "test_len:%zu != expect_len:%zu "
+			  "(test_str:%s, expect_str:%s)",
+			  test_len, expect_len, test_str_cur, expect_str_cur)) {
+			err = -1;
+			goto done;
+		}
+		if (CHECK(strcmp(test_str_cur, expect_str_cur),
+			  "test_str:%s != expect_str:%s",
+			  test_str_cur, expect_str_cur)) {
+			err = -1;
+			goto done;
+		}
+		test_str_cur += test_len + 1;
+		expect_str_cur += expect_len + 1;
+	}
+	if (CHECK(test_str_cur != test_str_end,
+		  "test_str_cur:%p != test_str_end:%p",
+		  test_str_cur, test_str_end)) {
+		err = -1;
+		goto done;
+	}
+
+	test_nr_types = btf__get_nr_types(test_btf);
+	expect_nr_types = btf__get_nr_types(expect_btf);
+	if (CHECK(test_nr_types != expect_nr_types,
+		  "test_nr_types:%u != expect_nr_types:%u",
+		  test_nr_types, expect_nr_types)) {
+		err = -1;
+		goto done;
+	}
+
+	for (i = 1; i <= test_nr_types; i++) {
+		const struct btf_type *test_type, *expect_type;
+		int test_size, expect_size;
+
+		test_type = btf__type_by_id(test_btf, i);
+		expect_type = btf__type_by_id(expect_btf, i);
+		test_size = btf_type_size(test_type);
+		expect_size = btf_type_size(expect_type);
+
+		if (CHECK(test_size != expect_size,
+			  "type #%d: test_size:%d != expect_size:%u",
+			  i, test_size, expect_size)) {
+			err = -1;
+			goto done;
+		}
+		if (CHECK(memcmp((void *)test_type,
+				 (void *)expect_type,
+				 test_size),
+			  "type #%d: contents differ", i)) {
+			err = -1;
+			goto done;
+		}
+	}
+
+done:
+	if (!err)
+		fprintf(stderr, "OK");
+	if (!IS_ERR(test_btf))
+		btf__free(test_btf);
+	if (!IS_ERR(expect_btf))
+		btf__free(expect_btf);
+
+	return err;
+}
+
+static int test_dedup(void)
+{
+	unsigned int i;
+	int err = 0;
+
+	if (args.dedup_test_num)
+		return count_result(do_test_dedup(args.dedup_test_num));
+
+	for (i = 1; i <= ARRAY_SIZE(dedup_tests); i++)
+		err |= count_result(do_test_dedup(i));
+
+	return err;
+}
+
 static void usage(const char *cmd)
 {
 	fprintf(stderr, "Usage: %s [-l] [[-r btf_raw_test_num (1 - %zu)] |\n"
 			"\t[-g btf_get_info_test_num (1 - %zu)] |\n"
 			"\t[-f btf_file_test_num (1 - %zu)] |\n"
 			"\t[-k btf_prog_info_raw_test_num (1 - %zu)] |\n"
-			"\t[-p (pretty print test)]]\n",
+			"\t[-p (pretty print test)] |\n"
+			"\t[-d btf_dedup_test_num (1 - %zu)]]\n",
 		cmd, ARRAY_SIZE(raw_tests), ARRAY_SIZE(get_info_tests),
-		ARRAY_SIZE(file_tests), ARRAY_SIZE(info_raw_tests));
+		ARRAY_SIZE(file_tests), ARRAY_SIZE(info_raw_tests),
+		ARRAY_SIZE(dedup_tests));
 }
 
 static int parse_args(int argc, char **argv)
 {
-	const char *optstr = "lpk:f:r:g:";
+	const char *optstr = "hlpk:f:r:g:d:";
 	int opt;
 
 	while ((opt = getopt(argc, argv, optstr)) != -1) {
@@ -5592,12 +6081,16 @@ static int parse_args(int argc, char **argv)
 			args.info_raw_test_num = atoi(optarg);
 			args.info_raw_test = true;
 			break;
+		case 'd':
+			args.dedup_test_num = atoi(optarg);
+			args.dedup_test = true;
+			break;
 		case 'h':
 			usage(argv[0]);
 			exit(0);
 		default:
-				usage(argv[0]);
-				return -1;
+			usage(argv[0]);
+			return -1;
 		}
 	}
 
@@ -5633,6 +6126,14 @@ static int parse_args(int argc, char **argv)
 		return -1;
 	}
 
+	if (args.dedup_test_num &&
+	    (args.dedup_test_num < 1 ||
+	     args.dedup_test_num > ARRAY_SIZE(dedup_tests))) {
+		fprintf(stderr, "BTF dedup test number must be [1 - %zu]\n",
+			ARRAY_SIZE(dedup_tests));
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -5668,14 +6169,18 @@ int main(int argc, char **argv)
 	if (args.info_raw_test)
 		err |= test_info_raw();
 
+	if (args.dedup_test)
+		err |= test_dedup();
+
 	if (args.raw_test || args.get_info_test || args.file_test ||
-	    args.pprint_test || args.info_raw_test)
+	    args.pprint_test || args.info_raw_test || args.dedup_test)
 		goto done;
 
 	err |= test_raw();
 	err |= test_get_info();
 	err |= test_file();
 	err |= test_info_raw();
+	err |= test_dedup();
 
 done:
 	print_summary();
-- 
2.17.1


^ permalink raw reply related

* [PATCH btf v2 1/3] btf: extract BTF type size calculation
From: Andrii Nakryiko @ 2019-02-05  1:29 UTC (permalink / raw)
  To: netdev, ast, yhs, daniel, acme, kernel-team, kafai, ecree,
	andrii.nakryiko
  Cc: Andrii Nakryiko
In-Reply-To: <20190205012946.1590917-1-andriin@fb.com>

This pre-patch extracts calculation of amount of space taken by BTF type descriptor
for later reuse by btf_dedup functionality.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/btf.c | 98 +++++++++++++++++++++------------------------
 1 file changed, 46 insertions(+), 52 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 7ec0463354db..06bd1a625ff4 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -182,6 +182,37 @@ static int btf_parse_str_sec(struct btf *btf)
 	return 0;
 }
 
+static int btf_type_size(struct btf_type *t)
+{
+	int base_size = sizeof(struct btf_type);
+	__u16 vlen = BTF_INFO_VLEN(t->info);
+
+	switch (BTF_INFO_KIND(t->info)) {
+	case BTF_KIND_FWD:
+	case BTF_KIND_CONST:
+	case BTF_KIND_VOLATILE:
+	case BTF_KIND_RESTRICT:
+	case BTF_KIND_PTR:
+	case BTF_KIND_TYPEDEF:
+	case BTF_KIND_FUNC:
+		return base_size;
+	case BTF_KIND_INT:
+		return base_size + sizeof(__u32);
+	case BTF_KIND_ENUM:
+		return base_size + vlen * sizeof(struct btf_enum);
+	case BTF_KIND_ARRAY:
+		return base_size + sizeof(struct btf_array);
+	case BTF_KIND_STRUCT:
+	case BTF_KIND_UNION:
+		return base_size + vlen * sizeof(struct btf_member);
+	case BTF_KIND_FUNC_PROTO:
+		return base_size + vlen * sizeof(struct btf_param);
+	default:
+		pr_debug("Unsupported BTF_KIND:%u\n", BTF_INFO_KIND(t->info));
+		return -EINVAL;
+	}
+}
+
 static int btf_parse_type_sec(struct btf *btf)
 {
 	struct btf_header *hdr = btf->hdr;
@@ -191,41 +222,13 @@ static int btf_parse_type_sec(struct btf *btf)
 
 	while (next_type < end_type) {
 		struct btf_type *t = next_type;
-		__u16 vlen = BTF_INFO_VLEN(t->info);
+		int type_size;
 		int err;
 
-		next_type += sizeof(*t);
-		switch (BTF_INFO_KIND(t->info)) {
-		case BTF_KIND_INT:
-			next_type += sizeof(int);
-			break;
-		case BTF_KIND_ARRAY:
-			next_type += sizeof(struct btf_array);
-			break;
-		case BTF_KIND_STRUCT:
-		case BTF_KIND_UNION:
-			next_type += vlen * sizeof(struct btf_member);
-			break;
-		case BTF_KIND_ENUM:
-			next_type += vlen * sizeof(struct btf_enum);
-			break;
-		case BTF_KIND_FUNC_PROTO:
-			next_type += vlen * sizeof(struct btf_param);
-			break;
-		case BTF_KIND_FUNC:
-		case BTF_KIND_TYPEDEF:
-		case BTF_KIND_PTR:
-		case BTF_KIND_FWD:
-		case BTF_KIND_VOLATILE:
-		case BTF_KIND_CONST:
-		case BTF_KIND_RESTRICT:
-			break;
-		default:
-			pr_debug("Unsupported BTF_KIND:%u\n",
-			     BTF_INFO_KIND(t->info));
-			return -EINVAL;
-		}
-
+		type_size = btf_type_size(t);
+		if (type_size < 0)
+			return type_size;
+		next_type += type_size;
 		err = btf_add_type(btf, t);
 		if (err)
 			return err;
@@ -252,21 +255,6 @@ static bool btf_type_is_void_or_null(const struct btf_type *t)
 	return !t || btf_type_is_void(t);
 }
 
-static __s64 btf_type_size(const struct btf_type *t)
-{
-	switch (BTF_INFO_KIND(t->info)) {
-	case BTF_KIND_INT:
-	case BTF_KIND_STRUCT:
-	case BTF_KIND_UNION:
-	case BTF_KIND_ENUM:
-		return t->size;
-	case BTF_KIND_PTR:
-		return sizeof(void *);
-	default:
-		return -EINVAL;
-	}
-}
-
 #define MAX_RESOLVE_DEPTH 32
 
 __s64 btf__resolve_size(const struct btf *btf, __u32 type_id)
@@ -280,11 +268,16 @@ __s64 btf__resolve_size(const struct btf *btf, __u32 type_id)
 	t = btf__type_by_id(btf, type_id);
 	for (i = 0; i < MAX_RESOLVE_DEPTH && !btf_type_is_void_or_null(t);
 	     i++) {
-		size = btf_type_size(t);
-		if (size >= 0)
-			break;
-
 		switch (BTF_INFO_KIND(t->info)) {
+		case BTF_KIND_INT:
+		case BTF_KIND_STRUCT:
+		case BTF_KIND_UNION:
+		case BTF_KIND_ENUM:
+			size = t->size;
+			goto done;
+		case BTF_KIND_PTR:
+			size = sizeof(void *);
+			goto done;
 		case BTF_KIND_TYPEDEF:
 		case BTF_KIND_VOLATILE:
 		case BTF_KIND_CONST:
@@ -308,6 +301,7 @@ __s64 btf__resolve_size(const struct btf *btf, __u32 type_id)
 	if (size < 0)
 		return -EINVAL;
 
+done:
 	if (nelems && size > UINT32_MAX / nelems)
 		return -E2BIG;
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH btf v2 0/3] Add BTF types deduplication algorithm
From: Andrii Nakryiko @ 2019-02-05  1:29 UTC (permalink / raw)
  To: netdev, ast, yhs, daniel, acme, kernel-team, kafai, ecree,
	andrii.nakryiko
  Cc: Andrii Nakryiko

This patch series adds BTF deduplication algorithm to libbpf. This algorithm
allows to take BTF type information containing duplicate per-compilation unit
information and reduce it to equivalent set of BTF types with no duplication without
loss of information. It also deduplicates strings and removes those strings that
are not referenced from any BTF type (and line information in .BTF.ext section,
if any).

Algorithm also resolves struct/union forward declarations into concrete BTF types
across multiple compilation units to facilitate better deduplication ratio. If
undesired, this resolution can be disabled through specifying corresponding options.

When applied to BTF data emitted by pahole's DWARF->BTF converter, it reduces
the overall size of .BTF section by about 65x, from about 112MB to 1.75MB, leaving
only 29247 out of initial 3073497 BTF type descriptors.

Algorithm with minor differences and preliminary results before FUNC/FUNC_PROTO
support is also described more verbosely at:
https://facebookmicrosites.github.io/bpf/blog/2018/11/14/btf-enhancement.html

v1->v2:
- rebase on latest bpf-next
- err_log/elog -> pr_debug
- btf__dedup, btf__get_strings, btf__get_nr_types listed under 0.0.2 version

Andrii Nakryiko (3):
  btf: extract BTF type size calculation
  btf: add BTF types deduplication algorithm
  selftests/btf: add initial BTF dedup tests

 tools/lib/bpf/btf.c                    | 1851 +++++++++++++++++++++++-
 tools/lib/bpf/btf.h                    |   10 +
 tools/lib/bpf/libbpf.map               |    3 +
 tools/testing/selftests/bpf/test_btf.c |  535 ++++++-
 4 files changed, 2332 insertions(+), 67 deletions(-)

-- 
2.17.1


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox