Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: pull-request: can 2019-07-24
From: David Miller @ 2019-07-24 21:15 UTC (permalink / raw)
  To: mkl; +Cc: netdev, linux-can, kernel
In-Reply-To: <20190724130322.31702-1-mkl@pengutronix.de>

From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Wed, 24 Jul 2019 15:03:15 +0200

> this is a pull reqeust of 7 patches for net/master.

Pulled, thank you.

^ permalink raw reply

* Re: [PATCH] qlge: Fix build error without CONFIG_ETHERNET
From: David Miller @ 2019-07-24 21:12 UTC (permalink / raw)
  To: yuehaibing
  Cc: manishc, GR-Linux-NIC-Dev, gregkh, bpoirier, linux-kernel, devel,
	netdev
In-Reply-To: <20190724.141202.10100086363454182.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Wed, 24 Jul 2019 14:12:02 -0700 (PDT)

> From: YueHaibing <yuehaibing@huawei.com>
> Date: Wed, 24 Jul 2019 21:01:26 +0800
> 
>> Now if CONFIG_ETHERNET is not set, QLGE driver
>> building fails:
>> 
>> drivers/staging/qlge/qlge_main.o: In function `qlge_remove':
>> drivers/staging/qlge/qlge_main.c:4831: undefined reference to `unregister_netdev'
>> 
>> Reported-by: Hulk Robot <hulkci@huawei.com>
>> Fixes: 955315b0dc8c ("qlge: Move drivers/net/ethernet/qlogic/qlge/ to drivers/staging/qlge/")
>> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
> 
> I'll let Greg take this.

Actually, I take that back.

Since the move to staging happened in my tree I will take this ;-)

^ permalink raw reply

* Re: [PATCH] qlge: Fix build error without CONFIG_ETHERNET
From: David Miller @ 2019-07-24 21:12 UTC (permalink / raw)
  To: yuehaibing
  Cc: manishc, GR-Linux-NIC-Dev, gregkh, bpoirier, linux-kernel, devel,
	netdev
In-Reply-To: <20190724130126.53532-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Wed, 24 Jul 2019 21:01:26 +0800

> Now if CONFIG_ETHERNET is not set, QLGE driver
> building fails:
> 
> drivers/staging/qlge/qlge_main.o: In function `qlge_remove':
> drivers/staging/qlge/qlge_main.c:4831: undefined reference to `unregister_netdev'
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Fixes: 955315b0dc8c ("qlge: Move drivers/net/ethernet/qlogic/qlge/ to drivers/staging/qlge/")
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

I'll let Greg take this.

^ permalink raw reply

* Re: [PATCH] r8169: fix a typo in a comment
From: David Miller @ 2019-07-24 21:11 UTC (permalink / raw)
  To: corentinmusard; +Cc: netdev, linux-kernel, trivial
In-Reply-To: <20190724123443.GA9626@user.home>

From: Corentin Musard <corentinmusard@gmail.com>
Date: Wed, 24 Jul 2019 14:34:43 +0200

> Replace "additonal" by "additional" in a comment.
> Typo found by checkpatch.pl.
> 
> Signed-off-by: Corentin Musard <corentinmusard@gmail.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH v4 net-next 11/19] ionic: Add Rx filter and rx_mode ndo support
From: Saeed Mahameed @ 2019-07-24 21:10 UTC (permalink / raw)
  To: snelson@pensando.io, netdev@vger.kernel.org, davem@davemloft.net
In-Reply-To: <dbc85d6d-b0cd-43d4-aded-88b482ecc477@pensando.io>

On Tue, 2019-07-23 at 17:07 -0700, Shannon Nelson wrote:
> On 7/23/19 4:20 PM, Saeed Mahameed wrote:
> > On Mon, 2019-07-22 at 14:40 -0700, Shannon Nelson wrote:
> > > @@ -607,6 +947,8 @@ static void ionic_lif_free(struct lif *lif)
> > >   	ionic_qcqs_free(lif);
> > >   	ionic_lif_reset(lif);
> > >   
> > I don't think you want deferred.work running while reset is
> > executing..
> > cancel_work_sync should happen as early as you close the netdevice.
> 
> Given the current implementation, it doesn't actually hurt anything,
> but 
> yes it makes sense to move it up in the sequence.
> 
> > I assume ionic_lif_reset will flush all configurations and you
> > don't
> > need to cleanup anything manually?  or any data structure stored in
> > driver ?
> 
> Most of the driver data structure cleaning has happened in 
> ionic_lif_deinit() before getting here.

This means that yo do have a problem keeping the deferred.work running
after ionic_lif_deinit

> 
> sln
> 

^ permalink raw reply

* Re: Reminder: 99 open syzbot bugs in net subsystem
From: Eric Biggers @ 2019-07-24 21:09 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet, dvyukov, netdev, fw, i.maximets, edumazet, dsahern,
	linux-kernel, syzkaller-bugs
In-Reply-To: <20190724.130928.1854327585456756387.davem@davemloft.net>

On Wed, Jul 24, 2019 at 01:09:28PM -0700, David Miller wrote:
> From: Eric Biggers <ebiggers@kernel.org>
> Date: Wed, 24 Jul 2019 11:37:12 -0700
> 
> > We can argue about what words to use to describe this situation, but
> > it doesn't change the situation itself.
> 
> And we should argue about those words because it matters to humans and
> effects how they feel, and humans ultimately fix these bugs.
> 
> So please stop with the hyperbole.
> 
> Thank you.

Okay, there are 151 bugs that syzbot saw on the mainline Linux kernel in the
last 7 days (90.1% with reproducers).  Of those, 59 were reported over 3 months
ago (89.8% with reproducers).  Of those, 12 were reported over a year ago (83.3%
with reproducers).

No opinion on whether those are small/medium/large numbers, in case it would
hurt someone's feelings.

These numbers do *not* include bugs that are still valid but weren't seen on
mainline in last 7 days, e.g.:

- Bugs that are seen only rarely, so by chance weren't seen in last 7 days.
- Bugs only in linux-next and/or subsystem branches.
- Bugs that were seen in mainline more than 7 days ago, and then only on
  linux-next or subsystem branch in last 7 days.
- Bugs that stopped being seen due to a change in syzkaller.
- Bugs that stopped being seen due to a change in kernel config.
- Bugs that stopped being seen due to other environment changes such as kernel
  command line parameters.
- Bugs that stopped being seen due to a kernel change that hid the bug but
  didn't actually fix it, i.e. still reachable in other ways.

Eric

^ permalink raw reply

* [PATCH net-next] r8169: improve rtl_set_rx_mode
From: Heiner Kallweit @ 2019-07-24 21:04 UTC (permalink / raw)
  To: Realtek linux nic maintainers, David Miller; +Cc: netdev@vger.kernel.org

This patch improves and simplifies rtl_set_rx_mode a little.
No functional change intended.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169_main.c | 52 ++++++++++-------------
 1 file changed, 22 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 9c743d2fc..c39d3a77c 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -61,7 +61,7 @@
 
 /* Maximum number of multicast addresses to filter (vs. Rx-all-multicast).
    The RTL chips use a 64 element hash table based on the Ethernet CRC. */
-static const int multicast_filter_limit = 32;
+#define	MC_FILTER_LIMIT	32
 
 #define TX_DMA_BURST	7	/* Maximum PCI burst, '7' is unlimited */
 #define InterFrameGap	0x03	/* 3 means InterFrameGap = the shortest one */
@@ -4147,53 +4147,45 @@ static void rtl8169_set_magic_reg(struct rtl8169_private *tp, unsigned mac_versi
 static void rtl_set_rx_mode(struct net_device *dev)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
-	u32 mc_filter[2];	/* Multicast hash filter */
-	int rx_mode;
-	u32 tmp = 0;
+	/* Multicast hash filter */
+	u32 mc_filter[2] = { 0xffffffff, 0xffffffff };
+	u32 rx_mode = AcceptBroadcast | AcceptMyPhys | AcceptMulticast;
+	u32 tmp;
 
 	if (dev->flags & IFF_PROMISC) {
 		/* Unconditionally log net taps. */
 		netif_notice(tp, link, dev, "Promiscuous mode enabled\n");
-		rx_mode =
-		    AcceptBroadcast | AcceptMulticast | AcceptMyPhys |
-		    AcceptAllPhys;
-		mc_filter[1] = mc_filter[0] = 0xffffffff;
-	} else if ((netdev_mc_count(dev) > multicast_filter_limit) ||
-		   (dev->flags & IFF_ALLMULTI)) {
-		/* Too many to filter perfectly -- accept all multicasts. */
-		rx_mode = AcceptBroadcast | AcceptMulticast | AcceptMyPhys;
-		mc_filter[1] = mc_filter[0] = 0xffffffff;
+		rx_mode |= AcceptAllPhys;
+	} else if (netdev_mc_count(dev) > MC_FILTER_LIMIT ||
+		   dev->flags & IFF_ALLMULTI ||
+		   tp->mac_version == RTL_GIGA_MAC_VER_35) {
+		/* accept all multicasts */
+	} else if (netdev_mc_empty(dev)) {
+		rx_mode &= ~AcceptMulticast;
 	} else {
 		struct netdev_hw_addr *ha;
 
-		rx_mode = AcceptBroadcast | AcceptMyPhys;
 		mc_filter[1] = mc_filter[0] = 0;
 		netdev_for_each_mc_addr(ha, dev) {
-			int bit_nr = ether_crc(ETH_ALEN, ha->addr) >> 26;
-			mc_filter[bit_nr >> 5] |= 1 << (bit_nr & 31);
-			rx_mode |= AcceptMulticast;
+			u32 bit_nr = ether_crc(ETH_ALEN, ha->addr) >> 26;
+			mc_filter[bit_nr >> 5] |= BIT(bit_nr & 31);
+		}
+
+		if (tp->mac_version > RTL_GIGA_MAC_VER_06) {
+			tmp = mc_filter[0];
+			mc_filter[0] = swab32(mc_filter[1]);
+			mc_filter[1] = swab32(tmp);
 		}
 	}
 
 	if (dev->features & NETIF_F_RXALL)
 		rx_mode |= (AcceptErr | AcceptRunt);
 
-	tmp = (RTL_R32(tp, RxConfig) & ~RX_CONFIG_ACCEPT_MASK) | rx_mode;
-
-	if (tp->mac_version > RTL_GIGA_MAC_VER_06) {
-		u32 data = mc_filter[0];
-
-		mc_filter[0] = swab32(mc_filter[1]);
-		mc_filter[1] = swab32(data);
-	}
-
-	if (tp->mac_version == RTL_GIGA_MAC_VER_35)
-		mc_filter[1] = mc_filter[0] = 0xffffffff;
-
 	RTL_W32(tp, MAR0 + 4, mc_filter[1]);
 	RTL_W32(tp, MAR0 + 0, mc_filter[0]);
 
-	RTL_W32(tp, RxConfig, tmp);
+	tmp = RTL_R32(tp, RxConfig);
+	RTL_W32(tp, RxConfig, (tmp & ~RX_CONFIG_ACCEPT_MASK) | rx_mode);
 }
 
 DECLARE_RTL_COND(rtl_csiar_cond)
-- 
2.22.0


^ permalink raw reply related

* Re: [PATCH] ip6_gre: reload ipv6h in prepare_ip6gre_xmit_ipv6
From: David Miller @ 2019-07-24 21:00 UTC (permalink / raw)
  To: yanhaishuang; +Cc: kuznet, netdev, linux-kernel, u9012063
In-Reply-To: <1563969642-11843-1-git-send-email-yanhaishuang@cmss.chinamobile.com>

From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Wed, 24 Jul 2019 20:00:42 +0800

> Since ip6_tnl_parse_tlv_enc_lim() can call pskb_may_pull()
> which may change skb->data, so we need to re-load ipv6h at
> the right place.
> 
> Fixes: 898b29798e36 ("ip6_gre: Refactor ip6gre xmit codes")
> Cc: William Tu <u9012063@gmail.com>
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH] net/ipv4: cleanup error condition testing
From: David Miller @ 2019-07-24 20:57 UTC (permalink / raw)
  To: pavel; +Cc: kuznet, yoshfuji, netdev, linux-kernel
In-Reply-To: <20190724100611.GA7516@amd>

From: Pavel Machek <pavel@ucw.cz>
Date: Wed, 24 Jul 2019 12:06:11 +0200

> Cleanup testing for error condition.
> 
> Signed-off-by: Pavel Machek <pavel@denx.de>

Applied.

^ permalink raw reply

* Re: [PATCH mlx5-next] net/mlx5: Fix modify_cq_in alignment
From: Saeed Mahameed @ 2019-07-24 20:56 UTC (permalink / raw)
  To: davem@davemloft.net, leon@kernel.org
  Cc: Jason Gunthorpe, Yishai Hadas, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, dledford@redhat.com, Edward Srouji
In-Reply-To: <20190723190414.GU5125@mtr-leonro.mtl.com>

On Tue, 2019-07-23 at 22:04 +0300, Leon Romanovsky wrote:
> On Tue, Jul 23, 2019 at 11:28:50AM -0700, David Miller wrote:
> > From: Leon Romanovsky <leon@kernel.org>
> > Date: Tue, 23 Jul 2019 10:12:55 +0300
> > 
> > > From: Edward Srouji <edwards@mellanox.com>
> > > 
> > > Fix modify_cq_in alignment to match the device specification.
> > > After this fix the 'cq_umem_valid' field will be in the right
> > > offset.
> > > 
> > > Cc: <stable@vger.kernel.org> # 4.19
> > > Fixes: bd37197554eb ("net/mlx5: Update mlx5_ifc with DEVX UID
> > > bits")

Leon, I applied this patch to my tree, it got marked for -stable 4.20
and not 4.19, i checked manually and indeed the offending patch came to
light only on 4.20


^ permalink raw reply

* Re: [PATCH] r8169: fix a typo in a comment
From: Heiner Kallweit @ 2019-07-24 20:53 UTC (permalink / raw)
  To: Corentin Musard, David S. Miller, netdev, linux-kernel; +Cc: trivial
In-Reply-To: <20190724123443.GA9626@user.home>

On 24.07.2019 14:34, Corentin Musard wrote:
> Replace "additonal" by "additional" in a comment.
> Typo found by checkpatch.pl.
> 
> Signed-off-by: Corentin Musard <corentinmusard@gmail.com>
> ---
>  drivers/net/ethernet/realtek/r8169_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index 0637c6752a78..7231ab3573ff 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -6334,7 +6334,7 @@ rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
>  	stats->multicast	= dev->stats.multicast;
>  
>  	/*
> -	 * Fetch additonal counter values missing in stats collected by driver
> +	 * Fetch additional counter values missing in stats collected by driver
>  	 * from tally counters.
>  	 */
>  	if (pm_runtime_active(&pdev->dev))
> 
Should have been annotated net-next and sent also to me. Apart from that:
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>

^ permalink raw reply

* Re: [PATCH bpf-next 1/6] bpf: add bpf_map_value_size and bp_map_copy_value helper functions
From: Song Liu @ 2019-07-24 20:53 UTC (permalink / raw)
  To: Brian Vazquez
  Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
	David S . Miller, Stanislav Fomichev, Willem de Bruijn,
	Petar Penkov, open list, Networking, bpf
In-Reply-To: <20190724165803.87470-2-brianvv@google.com>

On Wed, Jul 24, 2019 at 10:10 AM Brian Vazquez <brianvv@google.com> wrote:
>
> Move reusable code from map_lookup_elem to helper functions to avoid code
> duplication in kernel/bpf/syscall.c
>
> Suggested-by: Stanislav Fomichev <sdf@google.com>
> Signed-off-by: Brian Vazquez <brianvv@google.com>

Acked-by: Song Liu <songliubraving@fb.com>

Some very minor nits though.

> ---
>  kernel/bpf/syscall.c | 134 +++++++++++++++++++++++--------------------
>  1 file changed, 73 insertions(+), 61 deletions(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 5d141f16f6fa9..86cdc2f7bb56e 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -126,6 +126,76 @@ static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
>         return map;
>  }
>
> +static u32 bpf_map_value_size(struct bpf_map *map)
> +{
> +       if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
> +           map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
> +           map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY ||
> +           map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE)
> +               return round_up(map->value_size, 8) * num_possible_cpus();
> +       else if (IS_FD_MAP(map))
> +               return sizeof(u32);
> +       else
> +               return  map->value_size;
                  ^ extra space after return

> +}
> +
> +static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
> +                             __u64 flags)
> +{
> +       void *ptr;
> +       int err;
> +
> +       if (bpf_map_is_dev_bound(map))
> +               return  bpf_map_offload_lookup_elem(map, key, value);
                  ^ another extra space after return, did replace? :-)

Thanks,
Song

^ permalink raw reply

* Re: [net-next 6/6] e1000e: disable force K1-off feature
From: Jeff Kirsher @ 2019-07-24 20:52 UTC (permalink / raw)
  To: David Miller; +Cc: kai.heng.feng, netdev, nhorman, sassmann, aaron.f.brown
In-Reply-To: <20190723.140444.1126474066269131522.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 1060 bytes --]

On Tue, 2019-07-23 at 14:04 -0700, David Miller wrote:
> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Date: Tue, 23 Jul 2019 10:36:50 -0700
> 
> > diff --git a/drivers/net/ethernet/intel/e1000e/hw.h
> > b/drivers/net/ethernet/intel/e1000e/hw.h
> > index eff75bd8a8f0..e3c71fd093ee 100644
> > --- a/drivers/net/ethernet/intel/e1000e/hw.h
> > +++ b/drivers/net/ethernet/intel/e1000e/hw.h
> > @@ -662,6 +662,7 @@ struct e1000_dev_spec_ich8lan {
> >  	bool kmrn_lock_loss_workaround_enabled;
> >  	struct e1000_shadow_ram
> > shadow_ram[E1000_ICH8_SHADOW_RAM_WORDS];
> >  	bool nvm_k1_enabled;
> > +	bool disable_k1_off;
> >  	bool eee_disable;
> 
> I don't see any code actually setting this boolean, how does it work?

So either this patch is missing a bit of code OR this is not needed for
the Linux driver.  Either way this patch needs changes or needs to be
dropped, so I am dropping this patch from the series until I get
further information from the client driver team.

I will be re-submitting this series, minus this patch.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 2/2] mlx4/en_netdev: call notifiers when hw_enc_features change
From: Saeed Mahameed @ 2019-07-24 20:47 UTC (permalink / raw)
  To: dcaratti@redhat.com, davem@davemloft.net, Tariq Toukan,
	netdev@vger.kernel.org
In-Reply-To: <e157af6e79d9385df37444d817cf3c166878c8f6.1563976690.git.dcaratti@redhat.com>

On Wed, 2019-07-24 at 16:02 +0200, Davide Caratti wrote:
> ensure to call netdev_features_change() when the driver flips its
> hw_enc_features bits.
> 
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>

The patch is correct, but can you explain how did you come to this ? 
did you encounter any issue with the current code ?

I am asking just because i think the whole dynamic changing of dev-
>hw_enc_features is redundant since mlx4 has the featutres_check
callback.

> ---
>  .../net/ethernet/mellanox/mlx4/en_netdev.c    | 39 ++++++++++++-----
> --
>  1 file changed, 26 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 52500f744a0e..1b484dc6e1c2 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -2628,6 +2628,30 @@ static int mlx4_en_get_phys_port_id(struct
> net_device *dev,
>  	return 0;
>  }
>  
> +#define MLX4_GSO_PARTIAL_FEATURES (NETIF_F_IP_CSUM |
> NETIF_F_IPV6_CSUM | \
> +				   NETIF_F_RXCSUM | \
> +				   NETIF_F_TSO | NETIF_F_TSO6 | \
> +				   NETIF_F_GSO_UDP_TUNNEL | \
> +				   NETIF_F_GSO_UDP_TUNNEL_CSUM | \
> +				   NETIF_F_GSO_PARTIAL)
> +
> +static void mlx4_set_vxlan_offloads(struct net_device *dev, bool
> enable)
> +{
> +	netdev_features_t hw_enc_features;
> +
> +	rtnl_lock();
> +	hw_enc_features = dev->hw_enc_features;
> +	if (enable)
> +		dev->hw_enc_features |= MLX4_GSO_PARTIAL_FEATURES;
> +	else
> +		dev->hw_enc_features &= ~MLX4_GSO_PARTIAL_FEATURES;
> +
> +	if (hw_enc_features ^ dev->hw_enc_features)
> +		netdev_features_change(dev);
> +
> +	rtnl_unlock();
> +}
> +
>  static void mlx4_en_add_vxlan_offloads(struct work_struct *work)
>  {
>  	int ret;
> @@ -2647,12 +2671,7 @@ static void mlx4_en_add_vxlan_offloads(struct
> work_struct *work)
>  	}
>  
>  	/* set offloads */
> -	priv->dev->hw_enc_features |= NETIF_F_IP_CSUM |
> NETIF_F_IPV6_CSUM |
> -				      NETIF_F_RXCSUM |
> -				      NETIF_F_TSO | NETIF_F_TSO6 |
> -				      NETIF_F_GSO_UDP_TUNNEL |
> -				      NETIF_F_GSO_UDP_TUNNEL_CSUM |
> -				      NETIF_F_GSO_PARTIAL;
> +	mlx4_set_vxlan_offloads(priv->dev, true);
>  }
>  
>  static void mlx4_en_del_vxlan_offloads(struct work_struct *work)
> @@ -2661,13 +2680,7 @@ static void mlx4_en_del_vxlan_offloads(struct
> work_struct *work)
>  	struct mlx4_en_priv *priv = container_of(work, struct
> mlx4_en_priv,
>  						 vxlan_del_task);
>  	/* unset offloads */
> -	priv->dev->hw_enc_features &= ~(NETIF_F_IP_CSUM |
> NETIF_F_IPV6_CSUM |
> -					NETIF_F_RXCSUM |
> -					NETIF_F_TSO | NETIF_F_TSO6 |
> -					NETIF_F_GSO_UDP_TUNNEL |
> -					NETIF_F_GSO_UDP_TUNNEL_CSUM |
> -					NETIF_F_GSO_PARTIAL);
> -
> +	mlx4_set_vxlan_offloads(priv->dev, false);
>  	ret = mlx4_SET_PORT_VXLAN(priv->mdev->dev, priv->port,
>  				  VXLAN_STEER_BY_OUTER_MAC, 0);
>  	if (ret)

^ permalink raw reply

* Re: [PATCH net-next 1/4] sctp: check addr_size with sa_family_t size in __sctp_setsockopt_connectx
From: Neil Horman @ 2019-07-24 20:43 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: Xin Long, network dev, linux-sctp, davem
In-Reply-To: <20190724191243.GA4063@localhost.localdomain>

On Wed, Jul 24, 2019 at 04:12:43PM -0300, Marcelo Ricardo Leitner wrote:
> On Wed, Jul 24, 2019 at 04:05:43PM -0300, Marcelo Ricardo Leitner wrote:
> > On Wed, Jul 24, 2019 at 02:44:56PM -0400, Neil Horman wrote:
> > > On Wed, Jul 24, 2019 at 09:49:07AM -0300, Marcelo Ricardo Leitner wrote:
> > > > On Wed, Jul 24, 2019 at 09:36:50AM -0300, Marcelo Ricardo Leitner wrote:
> > > > > On Wed, Jul 24, 2019 at 07:22:35AM -0400, Neil Horman wrote:
> > > > > > On Wed, Jul 24, 2019 at 03:21:12PM +0800, Xin Long wrote:
> > > > > > > On Tue, Jul 23, 2019 at 11:25 PM Neil Horman <nhorman@tuxdriver.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jul 23, 2019 at 01:37:57AM +0800, Xin Long wrote:
> > > > > > > > > Now __sctp_connect() is called by __sctp_setsockopt_connectx() and
> > > > > > > > > sctp_inet_connect(), the latter has done addr_size check with size
> > > > > > > > > of sa_family_t.
> > > > > > > > >
> > > > > > > > > In the next patch to clean up __sctp_connect(), we will remove
> > > > > > > > > addr_size check with size of sa_family_t from __sctp_connect()
> > > > > > > > > for the 1st address.
> > > > > > > > >
> > > > > > > > > So before doing that, __sctp_setsockopt_connectx() should do
> > > > > > > > > this check first, as sctp_inet_connect() does.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > > > > > > > > ---
> > > > > > > > >  net/sctp/socket.c | 2 +-
> > > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > > > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > > > > > > > > index aa80cda..5f92e4a 100644
> > > > > > > > > --- a/net/sctp/socket.c
> > > > > > > > > +++ b/net/sctp/socket.c
> > > > > > > > > @@ -1311,7 +1311,7 @@ static int __sctp_setsockopt_connectx(struct sock *sk,
> > > > > > > > >       pr_debug("%s: sk:%p addrs:%p addrs_size:%d\n",
> > > > > > > > >                __func__, sk, addrs, addrs_size);
> > > > > > > > >
> > > > > > > > > -     if (unlikely(addrs_size <= 0))
> > > > > > > > > +     if (unlikely(addrs_size < sizeof(sa_family_t)))
> > > > > > > > I don't think this is what you want to check for here.  sa_family_t is
> > > > > > > > an unsigned short, and addrs_size is the number of bytes in the addrs
> > > > > > > > array.  The addrs array should be at least the size of one struct
> > > > > > > > sockaddr (16 bytes iirc), and, if larger, should be a multiple of
> > > > > > > > sizeof(struct sockaddr)
> > > > > > > sizeof(struct sockaddr) is not the right value to check either.
> > > > > > > 
> > > > > > > The proper check will be done later in __sctp_connect():
> > > > > > > 
> > > > > > >         af = sctp_get_af_specific(daddr->sa.sa_family);
> > > > > > >         if (!af || af->sockaddr_len > addrs_size)
> > > > > > >                 return -EINVAL;
> > > > > > > 
> > > > > > > So the check 'addrs_size < sizeof(sa_family_t)' in this patch is
> > > > > > > just to make sure daddr->sa.sa_family is accessible. the same
> > > > > > > check is also done in sctp_inet_connect().
> > > > > > > 
> > > > > > That doesn't make much sense, if the proper check is done in __sctp_connect with
> > > > > > the size of the families sockaddr_len, then we don't need this check at all, we
> > > > > > can just let memdup_user take the fault on copy_to_user and return -EFAULT.  If
> > > > > > we get that from memdup_user, we know its not accessible, and can bail out.
> > > > > > 
> > > > > > About the only thing we need to check for here is that addr_len isn't some
> > > > > > absurdly high value (i.e. a negative value), so that we avoid trying to kmalloc
> > > > > > upwards of 2G in memdup_user.  Your change does that just fine, but its no
> > > > > > better or worse than checking for <=0
> > > > > 
> > > > > One can argue that such check against absurdly high values is random
> > > > > and not effective, as 2G can be somewhat reasonable on 8GB systems but
> > > > > certainly isn't on 512MB ones. On that, kmemdup_user() will also fail
> > > > > gracefully as it uses GFP_USER and __GFP_NOWARN.
> > > > > 
> > > > > The original check is more for protecting for sane usage of the
> > > > > variable, which is an int, and a negative value is questionable. We
> > > > > could cast, yes, but.. was that really the intent of the application?
> > > > > Probably not.
> > > > 
> > > > Though that said, I'm okay with the new check here: a quick sanity
> > > > check that can avoid expensive calls to kmalloc(), while more refined
> > > > check is done later on.
> > > > 
> > > I agree a sanity check makes sense, just to avoid allocating a huge value
> > > (even 2G is absurd on many systems), however, I'm not super comfortable with
> > > checking for the value being less than 16 (sizeof(sa_family_t)).  The zero check
> > 
> > 16 bits you mean then, per
> > include/uapi/linux/socket.h
> > typedef unsigned short __kernel_sa_family_t;
> > include/linux/socket.h
> > typedef __kernel_sa_family_t    sa_family_t;
> > 
> > > is fairly obvious given the signed nature of the lengh field, this check makes
> > > me wonder what exactly we are checking for.
> > 
> > A minimum viable buffer without doing more extensive tests. Beyond
> > sa_family, we need to parse sa_family and then that's left for later.
> > Perhaps a comment helps, something like
> > 	/* Check if we have at least the family type in there */
> > ?
> 
> Hm, then this could be
> -     if (unlikely(addrs_size <= 0))
> +     if (unlikely(addrs_size < sizeof(struct sockaddr_in)))
> (ipv4)
> As it can't be smaller than that, always.
> 
True, but I think perhaps just the family type size check is more correct, as
thats the minimal information we need to get the proper sockaddr_len out of
sctp_get_af_specific.

Neil

> > 
> >   Marcelo
> > 
> > > 
> > > Neil
> > > 
> > > > > 
> > > > > > 
> > > > > > Neil
> > > > > > 
> > > > > > > >
> > > > > > > > Neil
> > > > > > > >
> > > > > > > > >               return -EINVAL;
> > > > > > > > >
> > > > > > > > >       kaddrs = memdup_user(addrs, addrs_size);
> > > > > > > > > --
> > > > > > > > > 2.1.0
> > > > > > > > >
> > > > > > > > >
> > > > > > > 
> > > > 
> > 
> 

^ permalink raw reply

* Re: [PATCH bpf-next] libbpf: silence GCC8 warning about string truncation
From: Song Liu @ 2019-07-24 20:43 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Kernel Team, Magnus Karlsson
In-Reply-To: <20190723220127.1815913-1-andriin@fb.com>

On Tue, Jul 23, 2019 at 7:33 PM Andrii Nakryiko <andriin@fb.com> wrote:
>
> Despite a proper NULL-termination after strncpy(..., ..., IFNAMSIZ - 1),
> GCC8 still complains about *expected* string truncation:
>
>   xsk.c:330:2: error: 'strncpy' output may be truncated copying 15 bytes
>   from a string of length 15 [-Werror=stringop-truncation]
>     strncpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ - 1);
>
> This patch gets rid of the issue altogether by using memcpy instead.
> There is no performance regression, as strncpy will still copy and fill
> all of the bytes anyway.
>
> Cc: Magnus Karlsson <magnus.karlsson@intel.com>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>

Acked-by: Song Liu <songliubraving@fb.com>

^ permalink raw reply

* Re: [PATCH net-next 1/4] sctp: check addr_size with sa_family_t size in __sctp_setsockopt_connectx
From: Neil Horman @ 2019-07-24 20:41 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: Xin Long, network dev, linux-sctp, davem
In-Reply-To: <20190724190543.GH6204@localhost.localdomain>

On Wed, Jul 24, 2019 at 04:05:43PM -0300, Marcelo Ricardo Leitner wrote:
> On Wed, Jul 24, 2019 at 02:44:56PM -0400, Neil Horman wrote:
> > On Wed, Jul 24, 2019 at 09:49:07AM -0300, Marcelo Ricardo Leitner wrote:
> > > On Wed, Jul 24, 2019 at 09:36:50AM -0300, Marcelo Ricardo Leitner wrote:
> > > > On Wed, Jul 24, 2019 at 07:22:35AM -0400, Neil Horman wrote:
> > > > > On Wed, Jul 24, 2019 at 03:21:12PM +0800, Xin Long wrote:
> > > > > > On Tue, Jul 23, 2019 at 11:25 PM Neil Horman <nhorman@tuxdriver.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jul 23, 2019 at 01:37:57AM +0800, Xin Long wrote:
> > > > > > > > Now __sctp_connect() is called by __sctp_setsockopt_connectx() and
> > > > > > > > sctp_inet_connect(), the latter has done addr_size check with size
> > > > > > > > of sa_family_t.
> > > > > > > >
> > > > > > > > In the next patch to clean up __sctp_connect(), we will remove
> > > > > > > > addr_size check with size of sa_family_t from __sctp_connect()
> > > > > > > > for the 1st address.
> > > > > > > >
> > > > > > > > So before doing that, __sctp_setsockopt_connectx() should do
> > > > > > > > this check first, as sctp_inet_connect() does.
> > > > > > > >
> > > > > > > > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > > > > > > > ---
> > > > > > > >  net/sctp/socket.c | 2 +-
> > > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > > > > > > > index aa80cda..5f92e4a 100644
> > > > > > > > --- a/net/sctp/socket.c
> > > > > > > > +++ b/net/sctp/socket.c
> > > > > > > > @@ -1311,7 +1311,7 @@ static int __sctp_setsockopt_connectx(struct sock *sk,
> > > > > > > >       pr_debug("%s: sk:%p addrs:%p addrs_size:%d\n",
> > > > > > > >                __func__, sk, addrs, addrs_size);
> > > > > > > >
> > > > > > > > -     if (unlikely(addrs_size <= 0))
> > > > > > > > +     if (unlikely(addrs_size < sizeof(sa_family_t)))
> > > > > > > I don't think this is what you want to check for here.  sa_family_t is
> > > > > > > an unsigned short, and addrs_size is the number of bytes in the addrs
> > > > > > > array.  The addrs array should be at least the size of one struct
> > > > > > > sockaddr (16 bytes iirc), and, if larger, should be a multiple of
> > > > > > > sizeof(struct sockaddr)
> > > > > > sizeof(struct sockaddr) is not the right value to check either.
> > > > > > 
> > > > > > The proper check will be done later in __sctp_connect():
> > > > > > 
> > > > > >         af = sctp_get_af_specific(daddr->sa.sa_family);
> > > > > >         if (!af || af->sockaddr_len > addrs_size)
> > > > > >                 return -EINVAL;
> > > > > > 
> > > > > > So the check 'addrs_size < sizeof(sa_family_t)' in this patch is
> > > > > > just to make sure daddr->sa.sa_family is accessible. the same
> > > > > > check is also done in sctp_inet_connect().
> > > > > > 
> > > > > That doesn't make much sense, if the proper check is done in __sctp_connect with
> > > > > the size of the families sockaddr_len, then we don't need this check at all, we
> > > > > can just let memdup_user take the fault on copy_to_user and return -EFAULT.  If
> > > > > we get that from memdup_user, we know its not accessible, and can bail out.
> > > > > 
> > > > > About the only thing we need to check for here is that addr_len isn't some
> > > > > absurdly high value (i.e. a negative value), so that we avoid trying to kmalloc
> > > > > upwards of 2G in memdup_user.  Your change does that just fine, but its no
> > > > > better or worse than checking for <=0
> > > > 
> > > > One can argue that such check against absurdly high values is random
> > > > and not effective, as 2G can be somewhat reasonable on 8GB systems but
> > > > certainly isn't on 512MB ones. On that, kmemdup_user() will also fail
> > > > gracefully as it uses GFP_USER and __GFP_NOWARN.
> > > > 
> > > > The original check is more for protecting for sane usage of the
> > > > variable, which is an int, and a negative value is questionable. We
> > > > could cast, yes, but.. was that really the intent of the application?
> > > > Probably not.
> > > 
> > > Though that said, I'm okay with the new check here: a quick sanity
> > > check that can avoid expensive calls to kmalloc(), while more refined
> > > check is done later on.
> > > 
> > I agree a sanity check makes sense, just to avoid allocating a huge value
> > (even 2G is absurd on many systems), however, I'm not super comfortable with
> > checking for the value being less than 16 (sizeof(sa_family_t)).  The zero check
> 
> 16 bits you mean then, per
> include/uapi/linux/socket.h
> typedef unsigned short __kernel_sa_family_t;
> include/linux/socket.h
> typedef __kernel_sa_family_t    sa_family_t;
> 
> > is fairly obvious given the signed nature of the lengh field, this check makes
> > me wonder what exactly we are checking for.
> 
> A minimum viable buffer without doing more extensive tests. Beyond
> sa_family, we need to parse sa_family and then that's left for later.
> Perhaps a comment helps, something like
> 	/* Check if we have at least the family type in there */
> ?
> 
Yeah, I'd be ok with something like that, or perhaps:

/* Check if we have at least the family value in buffer, so get_af_specific can
 * do a proper size check in __sctp_connect
 */

Neil

>   Marcelo
> 
> > 
> > Neil
> > 
> > > > 
> > > > > 
> > > > > Neil
> > > > > 
> > > > > > >
> > > > > > > Neil
> > > > > > >
> > > > > > > >               return -EINVAL;
> > > > > > > >
> > > > > > > >       kaddrs = memdup_user(addrs, addrs_size);
> > > > > > > > --
> > > > > > > > 2.1.0
> > > > > > > >
> > > > > > > >
> > > > > > 
> > > 
> 

^ permalink raw reply

* Re: [PATCH net-next 1/2] mlx4/en_netdev: update vlan features with tunnel offloads
From: Saeed Mahameed @ 2019-07-24 20:39 UTC (permalink / raw)
  To: dcaratti@redhat.com, davem@davemloft.net, Tariq Toukan,
	netdev@vger.kernel.org
In-Reply-To: <c3ccd45ee8742fb1a35fcfd41955907329e8112b.1563976690.git.dcaratti@redhat.com>

On Wed, 2019-07-24 at 16:02 +0200, Davide Caratti wrote:
> ConnectX-3 Pro can offload transmission of VLAN packets with VXLAN
> inside:
> enable tunnel offloads in dev->vlan_features, like it's done with
> other
> NIC drivers (e.g. be2net and ixgbe).
> 
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>


LGTM
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>

^ permalink raw reply

* [PATCH net-next 1/3] net: ethernet: mediatek: Add basic PHYLINK support
From: René van Dorst @ 2019-07-24 19:23 UTC (permalink / raw)
  To: netdev
  Cc: frank-w, sean.wang, f.fainelli, linux, davem, matthias.bgg,
	andrew, vivien.didelot, john, linux-mediatek, linux-mips, robh+dt,
	devicetree, René van Dorst

This convert the basics to PHYLINK API.
SGMII support is not in this patch.

Signed-off-by: René van Dorst <opensource@vdorst.com>
Tested-by: Frank Wunderlich <frank-w@public-files.de>
---
 drivers/net/ethernet/mediatek/Kconfig       |   2 +-
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 404 +++++++++++---------
 drivers/net/ethernet/mediatek/mtk_eth_soc.h |  30 +-
 3 files changed, 253 insertions(+), 183 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/Kconfig b/drivers/net/ethernet/mediatek/Kconfig
index 263cd0909fe0..e20f04d17b49 100644
--- a/drivers/net/ethernet/mediatek/Kconfig
+++ b/drivers/net/ethernet/mediatek/Kconfig
@@ -10,7 +10,7 @@ if NET_VENDOR_MEDIATEK
 config NET_MEDIATEK_SOC
 	tristate "MediaTek SoC Gigabit Ethernet support"
 	depends on NET_VENDOR_MEDIATEK
-	select PHYLIB
+	select PHYLINK
 	---help---
 	  This driver supports the gigabit ethernet MACs in the
 	  MediaTek SoC family.
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 597e377d8f20..853929070cb3 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -18,6 +18,7 @@
 #include <linux/tcp.h>
 #include <linux/interrupt.h>
 #include <linux/pinctrl/devinfo.h>
+#include <linux/phylink.h>
 
 #include "mtk_eth_soc.h"
 
@@ -186,165 +187,219 @@ static void mtk_gmac0_rgmii_adjust(struct mtk_eth *eth, int speed)
 	mtk_w32(eth, val, TRGMII_TCK_CTRL);
 }
 
-static void mtk_phy_link_adjust(struct net_device *dev)
+static void mtk_mac_config(struct phylink_config *config, unsigned int mode,
+			   const struct phylink_link_state *state)
 {
-	struct mtk_mac *mac = netdev_priv(dev);
-	u16 lcl_adv = 0, rmt_adv = 0;
-	u8 flowctrl;
-	u32 mcr = MAC_MCR_MAX_RX_1536 | MAC_MCR_IPG_CFG |
-		  MAC_MCR_FORCE_MODE | MAC_MCR_TX_EN |
-		  MAC_MCR_RX_EN | MAC_MCR_BACKOFF_EN |
-		  MAC_MCR_BACKPR_EN;
+	struct mtk_mac *mac = container_of(config, struct mtk_mac,
+					   phylink_config);
+	struct mtk_eth *eth = mac->hw;
 
-	if (unlikely(test_bit(MTK_RESETTING, &mac->hw->state)))
-		return;
+	u32 ge_mode = 0, val, mcr_cur, mcr_new;
+
+	if (mac->interface != state->interface) {
+		/* Setup soc pin functions */
+		switch (state->interface) {
+		case PHY_INTERFACE_MODE_TRGMII:
+			if (mac->id)
+				goto err_phy;
+			if (!MTK_HAS_CAPS(mac->hw->soc->caps,
+					  MTK_GMAC1_TRGMII))
+				goto err_phy;
+			/* fall through */
+		case PHY_INTERFACE_MODE_RGMII_TXID:
+		case PHY_INTERFACE_MODE_RGMII_RXID:
+		case PHY_INTERFACE_MODE_RGMII_ID:
+		case PHY_INTERFACE_MODE_RGMII:
+			break;
+		case PHY_INTERFACE_MODE_MII:
+			ge_mode = 1;
+			break;
+		case PHY_INTERFACE_MODE_REVMII:
+			ge_mode = 2;
+			break;
+		case PHY_INTERFACE_MODE_RMII:
+			if (mac->id)
+				goto err_phy;
+			ge_mode = 3;
+			break;
+		default:
+			goto err_phy;
+		}
+
+		/* Setup clock for 1st gmac */
+		if (!mac->id &&
+		    MTK_HAS_CAPS(mac->hw->soc->caps, MTK_GMAC1_TRGMII)) {
+			if (MTK_HAS_CAPS(mac->hw->soc->caps,
+					 MTK_TRGMII_MT7621_CLK)) {
+				if (mt7621_gmac0_rgmii_adjust(mac->hw,
+							      state->interface))
+					goto err_phy;
+			} else {
+				if (state->interface !=
+				    PHY_INTERFACE_MODE_TRGMII)
+					mtk_gmac0_rgmii_adjust(mac->hw,
+							       state->speed);
+			}
+		}
 
-	switch (dev->phydev->speed) {
+		/* put the gmac into the right mode */
+		regmap_read(eth->ethsys, ETHSYS_SYSCFG0, &val);
+		val &= ~SYSCFG0_GE_MODE(SYSCFG0_GE_MASK, mac->id);
+		val |= SYSCFG0_GE_MODE(ge_mode, mac->id);
+		regmap_write(eth->ethsys, ETHSYS_SYSCFG0, val);
+
+		mac->interface = state->interface;
+	}
+
+	/* Setup gmac */
+	mcr_cur = mtk_r32(mac->hw, MTK_MAC_MCR(mac->id));
+	mcr_new = mcr_cur;
+	mcr_new &= ~(MAC_MCR_SPEED_100 | MAC_MCR_SPEED_1000 |
+		     MAC_MCR_FORCE_DPX | MAC_MCR_FORCE_TX_FC |
+		     MAC_MCR_FORCE_RX_FC);
+	mcr_new |= MAC_MCR_MAX_RX_1536 | MAC_MCR_IPG_CFG | MAC_MCR_FORCE_MODE |
+		   MAC_MCR_BACKOFF_EN | MAC_MCR_BACKPR_EN | MAC_MCR_FORCE_LINK;
+
+	switch (state->speed) {
 	case SPEED_1000:
-		mcr |= MAC_MCR_SPEED_1000;
+		mcr_new |= MAC_MCR_SPEED_1000;
 		break;
 	case SPEED_100:
-		mcr |= MAC_MCR_SPEED_100;
+		mcr_new |= MAC_MCR_SPEED_100;
 		break;
 	}
-
-	if (MTK_HAS_CAPS(mac->hw->soc->caps, MTK_GMAC1_TRGMII) && !mac->id) {
-		if (MTK_HAS_CAPS(mac->hw->soc->caps, MTK_TRGMII_MT7621_CLK)) {
-			if (mt7621_gmac0_rgmii_adjust(mac->hw,
-						      dev->phydev->interface))
-				return;
-		} else {
-			if (!mac->trgmii)
-				mtk_gmac0_rgmii_adjust(mac->hw,
-						       dev->phydev->speed);
-		}
+	if (state->duplex == DUPLEX_FULL) {
+		mcr_new |= MAC_MCR_FORCE_DPX;
+		if (state->pause & MLO_PAUSE_TX)
+			mcr_new |= MAC_MCR_FORCE_TX_FC;
+		if (state->pause & MLO_PAUSE_RX)
+			mcr_new |= MAC_MCR_FORCE_RX_FC;
 	}
 
-	if (dev->phydev->link)
-		mcr |= MAC_MCR_FORCE_LINK;
+	/* Only update control register when needed! */
+	if (mcr_new != mcr_cur)
+		mtk_w32(mac->hw, mcr_new, MTK_MAC_MCR(mac->id));
 
-	if (dev->phydev->duplex) {
-		mcr |= MAC_MCR_FORCE_DPX;
+	return;
 
-		if (dev->phydev->pause)
-			rmt_adv = LPA_PAUSE_CAP;
-		if (dev->phydev->asym_pause)
-			rmt_adv |= LPA_PAUSE_ASYM;
+err_phy:
+	dev_err(eth->dev, "%s: GMAC%d mode %s not supported!\n", __func__,
+		mac->id, phy_modes(state->interface));
+}
+
+static int mtk_mac_link_state(struct phylink_config *config,
+			      struct phylink_link_state *state)
+{
+	struct mtk_mac *mac = container_of(config, struct mtk_mac,
+					   phylink_config);
+	u32 pmsr;
 
-		lcl_adv = linkmode_adv_to_lcl_adv_t(dev->phydev->advertising);
-		flowctrl = mii_resolve_flowctrl_fdx(lcl_adv, rmt_adv);
+	pmsr = mtk_r32(mac->hw, MTK_MAC_MSR(mac->id));
 
-		if (flowctrl & FLOW_CTRL_TX)
-			mcr |= MAC_MCR_FORCE_TX_FC;
-		if (flowctrl & FLOW_CTRL_RX)
-			mcr |= MAC_MCR_FORCE_RX_FC;
+	state->link = (pmsr & MAC_MSR_LINK);
+	state->duplex = (pmsr & MAC_MSR_DPX) >> 1;
 
-		netif_dbg(mac->hw, link, dev, "rx pause %s, tx pause %s\n",
-			  flowctrl & FLOW_CTRL_RX ? "enabled" : "disabled",
-			  flowctrl & FLOW_CTRL_TX ? "enabled" : "disabled");
+	switch (pmsr & (MAC_MSR_SPEED_1000 | MAC_MSR_SPEED_100)) {
+	case 0:
+		state->speed = SPEED_10;
+		break;
+	case MAC_MSR_SPEED_100:
+		state->speed = SPEED_100;
+		break;
+	case MAC_MSR_SPEED_1000:
+		state->speed = SPEED_1000;
+		break;
+	default:
+		state->speed = SPEED_UNKNOWN;
+		break;
 	}
 
-	mtk_w32(mac->hw, mcr, MTK_MAC_MCR(mac->id));
+	state->pause = 0;
+	if (pmsr & MAC_MSR_RX_FC)
+		state->pause |= MLO_PAUSE_RX;
+	if (pmsr & MAC_MSR_TX_FC)
+		state->pause |= MLO_PAUSE_TX;
 
-	if (!of_phy_is_fixed_link(mac->of_node))
-		phy_print_status(dev->phydev);
+	return 1;
 }
 
-static int mtk_phy_connect_node(struct mtk_eth *eth, struct mtk_mac *mac,
-				struct device_node *phy_node)
+static void mtk_mac_an_restart(struct phylink_config *config)
 {
-	struct phy_device *phydev;
-	int phy_mode;
+	/* Do nothing */
+}
 
-	phy_mode = of_get_phy_mode(phy_node);
-	if (phy_mode < 0) {
-		dev_err(eth->dev, "incorrect phy-mode %d\n", phy_mode);
-		return -EINVAL;
-	}
+static void mtk_mac_link_down(struct phylink_config *config, unsigned int mode,
+			      phy_interface_t interface)
+{
+	struct mtk_mac *mac = container_of(config, struct mtk_mac,
+					   phylink_config);
 
-	phydev = of_phy_connect(eth->netdev[mac->id], phy_node,
-				mtk_phy_link_adjust, 0, phy_mode);
-	if (!phydev) {
-		dev_err(eth->dev, "could not connect to PHY\n");
-		return -ENODEV;
-	}
+	mtk_w32(mac->hw, 0x8000, MTK_MAC_MCR(mac->id));
+}
 
-	dev_info(eth->dev,
-		 "connected mac %d to PHY at %s [uid=%08x, driver=%s]\n",
-		 mac->id, phydev_name(phydev), phydev->phy_id,
-		 phydev->drv->name);
+static void mtk_mac_link_up(struct phylink_config *config, unsigned int mode,
+			    phy_interface_t interface,
+			    struct phy_device *phy)
+{
+	struct mtk_mac *mac = container_of(config, struct mtk_mac,
+					   phylink_config);
+	u32 mcr;
 
-	return 0;
+	mcr = mtk_r32(mac->hw, MTK_MAC_MCR(mac->id));
+	mcr |= MAC_MCR_TX_EN | MAC_MCR_RX_EN;
+	mtk_w32(mac->hw, mcr, MTK_MAC_MCR(mac->id));
 }
 
-static int mtk_phy_connect(struct net_device *dev)
+static void mtk_validate(struct phylink_config *config,
+			 unsigned long *supported,
+			 struct phylink_link_state *state)
 {
-	struct mtk_mac *mac = netdev_priv(dev);
-	struct mtk_eth *eth;
-	struct device_node *np;
-	u32 val;
-	int err;
-
-	eth = mac->hw;
-	np = of_parse_phandle(mac->of_node, "phy-handle", 0);
-	if (!np && of_phy_is_fixed_link(mac->of_node))
-		if (!of_phy_register_fixed_link(mac->of_node))
-			np = of_node_get(mac->of_node);
-	if (!np)
-		return -ENODEV;
+	__ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
+	struct mtk_mac *mac = container_of(config, struct mtk_mac,
+					   phylink_config);
 
-	err = mtk_setup_hw_path(eth, mac->id, of_get_phy_mode(np));
-	if (err)
-		goto err_phy;
-
-	mac->ge_mode = 0;
-	switch (of_get_phy_mode(np)) {
-	case PHY_INTERFACE_MODE_TRGMII:
-		mac->trgmii = true;
-	case PHY_INTERFACE_MODE_RGMII_TXID:
-	case PHY_INTERFACE_MODE_RGMII_RXID:
-	case PHY_INTERFACE_MODE_RGMII_ID:
-	case PHY_INTERFACE_MODE_RGMII:
-	case PHY_INTERFACE_MODE_SGMII:
-		break;
-	case PHY_INTERFACE_MODE_MII:
-	case PHY_INTERFACE_MODE_GMII:
-		mac->ge_mode = 1;
-		break;
-	case PHY_INTERFACE_MODE_REVMII:
-		mac->ge_mode = 2;
-		break;
-	case PHY_INTERFACE_MODE_RMII:
-		if (!mac->id)
-			goto err_phy;
-		mac->ge_mode = 3;
-		break;
-	default:
-		goto err_phy;
+	if (state->interface != PHY_INTERFACE_MODE_NA &&
+	    state->interface != PHY_INTERFACE_MODE_MII &&
+	    !(!mac->id && state->interface == PHY_INTERFACE_MODE_TRGMII &&
+	      MTK_HAS_CAPS(mac->hw->soc->caps, MTK_TRGMII)) &&
+	    !phy_interface_mode_is_rgmii(state->interface)) {
+		linkmode_zero(supported);
+		return;
 	}
 
-	/* put the gmac into the right mode */
-	regmap_read(eth->ethsys, ETHSYS_SYSCFG0, &val);
-	val &= ~SYSCFG0_GE_MODE(SYSCFG0_GE_MASK, mac->id);
-	val |= SYSCFG0_GE_MODE(mac->ge_mode, mac->id);
-	regmap_write(eth->ethsys, ETHSYS_SYSCFG0, val);
+	phylink_set_port_modes(mask);
+	phylink_set(mask, Autoneg);
 
-	/* couple phydev to net_device */
-	if (mtk_phy_connect_node(eth, mac, np))
-		goto err_phy;
-
-	of_node_put(np);
+	if (state->interface == PHY_INTERFACE_MODE_TRGMII) {
+		phylink_set(mask, 1000baseT_Full);
+	} else {
+		phylink_set(mask, 10baseT_Half);
+		phylink_set(mask, 10baseT_Full);
+		phylink_set(mask, 100baseT_Half);
+		phylink_set(mask, 100baseT_Full);
+
+		if (state->interface != PHY_INTERFACE_MODE_MII) {
+			phylink_set(mask, 1000baseT_Half);
+			phylink_set(mask, 1000baseT_Full);
+		}
+	}
 
-	return 0;
+	phylink_set(mask, Pause);
+	phylink_set(mask, Asym_Pause);
 
-err_phy:
-	if (of_phy_is_fixed_link(mac->of_node))
-		of_phy_deregister_fixed_link(mac->of_node);
-	of_node_put(np);
-	dev_err(eth->dev, "%s: invalid phy\n", __func__);
-	return -EINVAL;
+	linkmode_and(supported, supported, mask);
+	linkmode_and(state->advertising, state->advertising, mask);
 }
 
+static const struct phylink_mac_ops mtk_phylink_ops = {
+	.validate = mtk_validate,
+	.mac_link_state = mtk_mac_link_state,
+	.mac_an_restart = mtk_mac_an_restart,
+	.mac_config = mtk_mac_config,
+	.mac_link_down = mtk_mac_link_down,
+	.mac_link_up = mtk_mac_link_up,
+};
+
 static int mtk_mdio_init(struct mtk_eth *eth)
 {
 	struct device_node *mii_np;
@@ -1798,6 +1853,13 @@ static int mtk_open(struct net_device *dev)
 {
 	struct mtk_mac *mac = netdev_priv(dev);
 	struct mtk_eth *eth = mac->hw;
+	int err = phylink_of_phy_connect(mac->phylink, mac->of_node, 0);
+
+	if (err) {
+		netdev_err(dev, "%s: could not attach PHY: %d\n", __func__,
+			   err);
+		return err;
+	}
 
 	/* we run 2 netdevs on the same dma ring so we only bring it up once */
 	if (!refcount_read(&eth->dma_refcnt)) {
@@ -1815,7 +1877,7 @@ static int mtk_open(struct net_device *dev)
 	else
 		refcount_inc(&eth->dma_refcnt);
 
-	phy_start(dev->phydev);
+	phylink_start(mac->phylink);
 	netif_start_queue(dev);
 
 	return 0;
@@ -1849,8 +1911,11 @@ static int mtk_stop(struct net_device *dev)
 	struct mtk_mac *mac = netdev_priv(dev);
 	struct mtk_eth *eth = mac->hw;
 
+	phylink_stop(mac->phylink);
+
 	netif_tx_disable(dev);
-	phy_stop(dev->phydev);
+
+	phylink_disconnect_phy(mac->phylink);
 
 	/* only shutdown DMA if this is the last user */
 	if (!refcount_dec_and_test(&eth->dma_refcnt))
@@ -1926,15 +1991,6 @@ static int mtk_hw_init(struct mtk_eth *eth)
 	ethsys_reset(eth, RSTCTRL_FE);
 	ethsys_reset(eth, RSTCTRL_PPE);
 
-	regmap_read(eth->ethsys, ETHSYS_SYSCFG0, &val);
-	for (i = 0; i < MTK_MAC_COUNT; i++) {
-		if (!eth->mac[i])
-			continue;
-		val &= ~SYSCFG0_GE_MODE(SYSCFG0_GE_MASK, eth->mac[i]->id);
-		val |= SYSCFG0_GE_MODE(eth->mac[i]->ge_mode, eth->mac[i]->id);
-	}
-	regmap_write(eth->ethsys, ETHSYS_SYSCFG0, val);
-
 	if (eth->pctl) {
 		/* Set GE2 driving and slew rate */
 		regmap_write(eth->pctl, GPIO_DRV_SEL10, 0xa00);
@@ -1979,7 +2035,7 @@ static int mtk_hw_init(struct mtk_eth *eth)
 	mtk_w32(eth, MTK_RX_DONE_INT, MTK_QDMA_INT_GRP2);
 	mtk_w32(eth, 0x21021000, MTK_FE_INT_GRP);
 
-	for (i = 0; i < 2; i++) {
+	for (i = 0; i < MTK_MAC_COUNT; i++) {
 		u32 val = mtk_r32(eth, MTK_GDMA_FWD_CFG(i));
 
 		/* setup the forward port to send frame to PDMA */
@@ -2031,7 +2087,7 @@ static int __init mtk_init(struct net_device *dev)
 			dev->dev_addr);
 	}
 
-	return mtk_phy_connect(dev);
+	return 0;
 }
 
 static void mtk_uninit(struct net_device *dev)
@@ -2039,20 +2095,20 @@ static void mtk_uninit(struct net_device *dev)
 	struct mtk_mac *mac = netdev_priv(dev);
 	struct mtk_eth *eth = mac->hw;
 
-	phy_disconnect(dev->phydev);
-	if (of_phy_is_fixed_link(mac->of_node))
-		of_phy_deregister_fixed_link(mac->of_node);
+	phylink_disconnect_phy(mac->phylink);
 	mtk_tx_irq_disable(eth, ~0);
 	mtk_rx_irq_disable(eth, ~0);
 }
 
 static int mtk_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 {
+	struct mtk_mac *mac = netdev_priv(dev);
+
 	switch (cmd) {
 	case SIOCGMIIPHY:
 	case SIOCGMIIREG:
 	case SIOCSMIIREG:
-		return phy_mii_ioctl(dev->phydev, ifr, cmd);
+		return phylink_mii_ioctl(mac->phylink, ifr, cmd);
 	default:
 		break;
 	}
@@ -2093,16 +2149,6 @@ static void mtk_pending_work(struct work_struct *work)
 				     eth->dev->pins->default_state);
 	mtk_hw_init(eth);
 
-	for (i = 0; i < MTK_MAC_COUNT; i++) {
-		if (!eth->mac[i] ||
-		    of_phy_is_fixed_link(eth->mac[i]->of_node))
-			continue;
-		err = phy_init_hw(eth->netdev[i]->phydev);
-		if (err)
-			dev_err(eth->dev, "%s: PHY init failed.\n",
-				eth->netdev[i]->name);
-	}
-
 	/* restart DMA and enable IRQs */
 	for (i = 0; i < MTK_MAC_COUNT; i++) {
 		if (!test_bit(i, &restart))
@@ -2165,9 +2211,7 @@ static int mtk_get_link_ksettings(struct net_device *ndev,
 	if (unlikely(test_bit(MTK_RESETTING, &mac->hw->state)))
 		return -EBUSY;
 
-	phy_ethtool_ksettings_get(ndev->phydev, cmd);
-
-	return 0;
+	return phylink_ethtool_ksettings_get(mac->phylink, cmd);
 }
 
 static int mtk_set_link_ksettings(struct net_device *ndev,
@@ -2178,7 +2222,7 @@ static int mtk_set_link_ksettings(struct net_device *ndev,
 	if (unlikely(test_bit(MTK_RESETTING, &mac->hw->state)))
 		return -EBUSY;
 
-	return phy_ethtool_ksettings_set(ndev->phydev, cmd);
+	return phylink_ethtool_ksettings_set(mac->phylink, cmd);
 }
 
 static void mtk_get_drvinfo(struct net_device *dev,
@@ -2212,22 +2256,10 @@ static int mtk_nway_reset(struct net_device *dev)
 	if (unlikely(test_bit(MTK_RESETTING, &mac->hw->state)))
 		return -EBUSY;
 
-	return genphy_restart_aneg(dev->phydev);
-}
-
-static u32 mtk_get_link(struct net_device *dev)
-{
-	struct mtk_mac *mac = netdev_priv(dev);
-	int err;
-
-	if (unlikely(test_bit(MTK_RESETTING, &mac->hw->state)))
-		return -EBUSY;
-
-	err = genphy_update_link(dev->phydev);
-	if (err)
-		return ethtool_op_get_link(dev);
+	if (!mac->phylink)
+		return -ENOTSUPP;
 
-	return dev->phydev->link;
+	return phylink_ethtool_nway_reset(mac->phylink);
 }
 
 static void mtk_get_strings(struct net_device *dev, u32 stringset, u8 *data)
@@ -2347,7 +2379,7 @@ static const struct ethtool_ops mtk_ethtool_ops = {
 	.get_msglevel		= mtk_get_msglevel,
 	.set_msglevel		= mtk_set_msglevel,
 	.nway_reset		= mtk_nway_reset,
-	.get_link		= mtk_get_link,
+	.get_link		= ethtool_op_get_link,
 	.get_strings		= mtk_get_strings,
 	.get_sset_count		= mtk_get_sset_count,
 	.get_ethtool_stats	= mtk_get_ethtool_stats,
@@ -2375,9 +2407,10 @@ static const struct net_device_ops mtk_netdev_ops = {
 
 static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np)
 {
+	struct phylink *phylink;
 	struct mtk_mac *mac;
 	const __be32 *_id = of_get_property(np, "reg", NULL);
-	int id, err;
+	int phy_mode, id, err;
 
 	if (!_id) {
 		dev_err(eth->dev, "missing mac id\n");
@@ -2421,6 +2454,32 @@ static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np)
 	u64_stats_init(&mac->hw_stats->syncp);
 	mac->hw_stats->reg_offset = id * MTK_STAT_OFFSET;
 
+	/* phylink create */
+	phy_mode = of_get_phy_mode(np);
+	if (phy_mode < 0) {
+		dev_err(eth->dev, "incorrect phy-mode\n");
+		err = -EINVAL;
+		goto free_netdev;
+	}
+
+	/* mac config is not set */
+	mac->interface = PHY_INTERFACE_MODE_NA;
+	mac->mode = MLO_AN_PHY;
+	mac->speed = SPEED_UNKNOWN;
+
+	mac->phylink_config.dev = &eth->netdev[id]->dev;
+	mac->phylink_config.type = PHYLINK_NETDEV;
+
+	phylink = phylink_create(&mac->phylink_config,
+				 of_fwnode_handle(mac->of_node),
+				 phy_mode, &mtk_phylink_ops);
+	if (IS_ERR(phylink)) {
+		err = PTR_ERR(phylink);
+		goto free_netdev;
+	}
+
+	mac->phylink = phylink;
+
 	SET_NETDEV_DEV(eth->netdev[id], eth->dev);
 	eth->netdev[id]->watchdog_timeo = 5 * HZ;
 	eth->netdev[id]->netdev_ops = &mtk_netdev_ops;
@@ -2617,6 +2676,7 @@ static int mtk_probe(struct platform_device *pdev)
 static int mtk_remove(struct platform_device *pdev)
 {
 	struct mtk_eth *eth = platform_get_drvdata(pdev);
+	struct mtk_mac *mac;
 	int i;
 
 	/* stop all devices to make sure that dma is properly shut down */
@@ -2624,6 +2684,8 @@ static int mtk_remove(struct platform_device *pdev)
 		if (!eth->netdev[i])
 			continue;
 		mtk_stop(eth->netdev[i]);
+		mac = netdev_priv(eth->netdev[i]);
+		phylink_disconnect_phy(mac->phylink);
 	}
 
 	mtk_hw_deinit(eth);
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index bab94f763e2c..3bfcba9ffb58 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -14,6 +14,7 @@
 #include <linux/of_net.h>
 #include <linux/u64_stats_sync.h>
 #include <linux/refcount.h>
+#include <linux/phylink.h>
 
 #define MTK_QDMA_PAGE_SIZE	2048
 #define	MTK_MAX_RX_LENGTH	1536
@@ -320,12 +321,18 @@
 #define MAC_MCR_SPEED_100	BIT(2)
 #define MAC_MCR_FORCE_DPX	BIT(1)
 #define MAC_MCR_FORCE_LINK	BIT(0)
-#define MAC_MCR_FIXED_LINK	(MAC_MCR_MAX_RX_1536 | MAC_MCR_IPG_CFG | \
-				 MAC_MCR_FORCE_MODE | MAC_MCR_TX_EN | \
-				 MAC_MCR_RX_EN | MAC_MCR_BACKOFF_EN | \
-				 MAC_MCR_BACKPR_EN | MAC_MCR_FORCE_RX_FC | \
-				 MAC_MCR_FORCE_TX_FC | MAC_MCR_SPEED_1000 | \
-				 MAC_MCR_FORCE_DPX | MAC_MCR_FORCE_LINK)
+
+/* Mac status registers */
+#define MTK_MAC_MSR(x)		(0x10108 + (x * 0x100))
+#define MAC_MSR_EEE1G		BIT(7)
+#define MAC_MSR_EEE100M		BIT(6)
+#define MAC_MSR_RX_FC		BIT(5)
+#define MAC_MSR_TX_FC		BIT(4)
+#define MAC_MSR_SPEED_1000	BIT(3)
+#define MAC_MSR_SPEED_100	BIT(2)
+#define MAC_MSR_SPEED_MASK	(MAC_MSR_SPEED_1000 | MAC_MSR_SPEED_100)
+#define MAC_MSR_DPX		BIT(1)
+#define MAC_MSR_LINK		BIT(0)
 
 /* TRGMII RXC control register */
 #define TRGMII_RCK_CTRL		0x10300
@@ -815,22 +822,23 @@ struct mtk_eth {
 /* struct mtk_mac -	the structure that holds the info about the MACs of the
  *			SoC
  * @id:			The number of the MAC
- * @ge_mode:            Interface mode kept for setup restoring
+ * @interface:		Interface mode kept for detecting change in hw settings
  * @of_node:		Our devicetree node
  * @hw:			Backpointer to our main datastruture
  * @hw_stats:		Packet statistics counter
- * @trgmii		Indicate if the MAC uses TRGMII connected to internal
-			switch
  */
 struct mtk_mac {
 	int				id;
-	int				ge_mode;
+	phy_interface_t			interface;
+	unsigned int			mode;
+	int				speed;
 	struct device_node		*of_node;
+	struct phylink			*phylink;
+	struct phylink_config		phylink_config;
 	struct mtk_eth			*hw;
 	struct mtk_hw_stats		*hw_stats;
 	__be32				hwlro_ip[MTK_MAX_LRO_IP_CNT];
 	int				hwlro_ip_cnt;
-	bool				trgmii;
 };
 
 /* the struct describing the SoC. these are declared in the soc_xyz.c files */
-- 
2.20.1


^ permalink raw reply related

* [PATCH net-next 0/3] net: dsa: MT7530: Convert to PHYLINK and add support for port 5
From: René van Dorst @ 2019-07-24 19:25 UTC (permalink / raw)
  To: netdev
  Cc: frank-w, sean.wang, f.fainelli, linux, davem, matthias.bgg,
	andrew, vivien.didelot, john, linux-mediatek, linux-mips, robh+dt,
	devicetree, René van Dorst

1. net: dsa: mt7530: Convert to PHYLINK API
   This patch converts mt7530 to PHYLINK API.
2. dt-bindings: net: dsa: mt7530: Add support for port 5
3. net: dsa: mt7530: Add support for port 5
   These 2 patches adding support for port 5 of the switch.

rfc -> v1:
 * Mostly phylink improvements after review.
 * Drop phy isolation patches. Adds no value for now.

René van Dorst (3):
  net: dsa: mt7530: Convert to PHYLINK API
  dt-bindings: net: dsa: mt7530: Add support for port 5
  net: dsa: mt7530: Add support for port 5

 .../devicetree/bindings/net/dsa/mt7530.txt    | 215 +++++++++++
 drivers/net/dsa/mt7530.c                      | 356 +++++++++++++++---
 drivers/net/dsa/mt7530.h                      |  60 ++-
 3 files changed, 561 insertions(+), 70 deletions(-)

To: <netdev@vger.kernel.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Cc: Frank Wunderlich <frank-w@public-files.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-mediatek@lists.infradead.org
Cc: linux-mips@vger.kernel.org
Cc: John Crispin <john@phrozen.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>

-- 
2.20.1

^ permalink raw reply

* [PATCH bpf-next 00/10] CO-RE offset relocations
From: Andrii Nakryiko @ 2019-07-24 19:27 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel, yhs
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko

This patch set implements central part of CO-RE (Compile Once - Run
Everywhere, see [0] and [1] for slides and video): relocating field offsets.
Most of the details are written down as comments to corresponding parts of the
code.

Patch #1 adds loading of .BTF.ext offset relocations section and macros to
work with its contents.
Patch #2 implements CO-RE relocations algorithm in libbpf.
Patches #3-#10 adds selftests validating various parts of relocation handling,
type compatibility, etc.

For all tests to work, you'll need latest Clang/LLVM supporting
__builtin_preserve_access_index intrinsic, used for recording offset
relocations. Kernel on which selftests run should have BTF information built
in (CONFIG_DEBUG_INFO_BTF=y).

  [0] http://vger.kernel.org/bpfconf2019.html#session-2
  [1] http://vger.kernel.org/lpc-bpf2018.html#session-2CO-RE relocations

This patch set implements central part of CO-RE (Compile Once - Run
Everywhere, see [0] and [1] for slides and video): relocating field offsets.
Most of the details are written down as comments to corresponding parts of the
code.

Patch #1 adds loading of .BTF.ext offset relocations section and macros to
work with its contents.
Patch #2 implements CO-RE relocations algorithm in libbpf.
Patches #3-#10 adds selftests validating various parts of relocation handling,
type compatibility, etc.

For all tests to work, you'll need latest Clang/LLVM supporting
__builtin_preserve_access_index intrinsic, used for recording offset
relocations. Kernel on which selftests run should have BTF information built
in (CONFIG_DEBUG_INFO_BTF=y).

  [0] http://vger.kernel.org/bpfconf2019.html#session-2
  [1] http://vger.kernel.org/lpc-bpf2018.html#session-2

Andrii Nakryiko (10):
  libbpf: add .BTF.ext offset relocation section loading
  libbpf: implement BPF CO-RE offset relocation algorithm
  selftests/bpf: add CO-RE relocs testing setup
  selftests/bpf: add CO-RE relocs struct flavors tests
  selftests/bpf: add CO-RE relocs nesting tests
  selftests/bpf: add CO-RE relocs array tests
  selftests/bpf: add CO-RE relocs enum/ptr/func_proto tests
  selftests/bpf: add CO-RE relocs modifiers/typedef tests
  selftest/bpf: add CO-RE relocs ptr-as-array tests
  selftests/bpf: add CO-RE relocs ints tests

 tools/lib/bpf/btf.c                           |  64 +-
 tools/lib/bpf/btf.h                           |   4 +
 tools/lib/bpf/libbpf.c                        | 866 +++++++++++++++++-
 tools/lib/bpf/libbpf.h                        |   1 +
 tools/lib/bpf/libbpf_internal.h               |  91 ++
 .../selftests/bpf/prog_tests/core_reloc.c     | 363 ++++++++
 .../bpf/progs/btf__core_reloc_arrays.c        |   3 +
 .../btf__core_reloc_arrays___diff_arr_dim.c   |   3 +
 ...btf__core_reloc_arrays___diff_arr_val_sz.c |   3 +
 .../btf__core_reloc_arrays___err_non_array.c  |   3 +
 ...btf__core_reloc_arrays___err_too_shallow.c |   3 +
 .../btf__core_reloc_arrays___err_too_small.c  |   3 +
 ..._core_reloc_arrays___err_wrong_val_type1.c |   3 +
 ..._core_reloc_arrays___err_wrong_val_type2.c |   3 +
 .../bpf/progs/btf__core_reloc_flavors.c       |   3 +
 .../btf__core_reloc_flavors__err_wrong_name.c |   3 +
 .../bpf/progs/btf__core_reloc_ints.c          |   3 +
 .../bpf/progs/btf__core_reloc_ints___bool.c   |   3 +
 .../btf__core_reloc_ints___err_bitfield.c     |   3 +
 .../btf__core_reloc_ints___err_wrong_sz_16.c  |   3 +
 .../btf__core_reloc_ints___err_wrong_sz_32.c  |   3 +
 .../btf__core_reloc_ints___err_wrong_sz_64.c  |   3 +
 .../btf__core_reloc_ints___err_wrong_sz_8.c   |   3 +
 .../btf__core_reloc_ints___reverse_sign.c     |   3 +
 .../bpf/progs/btf__core_reloc_mods.c          |   3 +
 .../progs/btf__core_reloc_mods___mod_swap.c   |   3 +
 .../progs/btf__core_reloc_mods___typedefs.c   |   3 +
 .../bpf/progs/btf__core_reloc_nesting.c       |   3 +
 .../btf__core_reloc_nesting___anon_embed.c    |   3 +
 ...f__core_reloc_nesting___dup_compat_types.c |   5 +
 ...core_reloc_nesting___err_array_container.c |   3 +
 ...tf__core_reloc_nesting___err_array_field.c |   3 +
 ...e_reloc_nesting___err_dup_incompat_types.c |   4 +
 ...re_reloc_nesting___err_missing_container.c |   3 +
 ...__core_reloc_nesting___err_missing_field.c |   3 +
 ..._reloc_nesting___err_nonstruct_container.c |   3 +
 ...e_reloc_nesting___err_partial_match_dups.c |   4 +
 .../btf__core_reloc_nesting___err_too_deep.c  |   3 +
 .../btf__core_reloc_nesting___extra_nesting.c |   3 +
 ..._core_reloc_nesting___struct_union_mixup.c |   3 +
 .../bpf/progs/btf__core_reloc_primitives.c    |   3 +
 ...f__core_reloc_primitives___diff_enum_def.c |   3 +
 ..._core_reloc_primitives___diff_func_proto.c |   3 +
 ...f__core_reloc_primitives___diff_ptr_type.c |   3 +
 ...tf__core_reloc_primitives___err_non_enum.c |   3 +
 ...btf__core_reloc_primitives___err_non_int.c |   3 +
 ...btf__core_reloc_primitives___err_non_ptr.c |   3 +
 .../bpf/progs/btf__core_reloc_ptr_as_arr.c    |   3 +
 .../btf__core_reloc_ptr_as_arr___diff_sz.c    |   3 +
 .../selftests/bpf/progs/core_reloc_types.h    | 642 +++++++++++++
 .../bpf/progs/test_core_reloc_arrays.c        |  58 ++
 .../bpf/progs/test_core_reloc_flavors.c       |  65 ++
 .../bpf/progs/test_core_reloc_ints.c          |  48 +
 .../bpf/progs/test_core_reloc_kernel.c        |  39 +
 .../bpf/progs/test_core_reloc_mods.c          |  68 ++
 .../bpf/progs/test_core_reloc_nesting.c       |  48 +
 .../bpf/progs/test_core_reloc_primitives.c    |  50 +
 .../bpf/progs/test_core_reloc_ptr_as_arr.c    |  34 +
 58 files changed, 2527 insertions(+), 47 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/core_reloc.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___diff_arr_dim.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___diff_arr_val_sz.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_non_array.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_too_shallow.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_too_small.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_wrong_val_type1.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_wrong_val_type2.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_flavors.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_flavors__err_wrong_name.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___bool.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_bitfield.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_16.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_32.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_64.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_8.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___reverse_sign.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods___mod_swap.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods___typedefs.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___anon_embed.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___dup_compat_types.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_array_container.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_array_field.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_dup_incompat_types.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_missing_container.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_missing_field.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_nonstruct_container.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_partial_match_dups.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_too_deep.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___extra_nesting.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___struct_union_mixup.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_enum_def.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_func_proto.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_ptr_type.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_enum.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_int.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_ptr.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ptr_as_arr.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ptr_as_arr___diff_sz.c
 create mode 100644 tools/testing/selftests/bpf/progs/core_reloc_types.h
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_arrays.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_flavors.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_ints.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_kernel.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_mods.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_nesting.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_primitives.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_ptr_as_arr.c

-- 
2.17.1


^ permalink raw reply

* [PATCH bpf-next 01/10] libbpf: add .BTF.ext offset relocation section loading
From: Andrii Nakryiko @ 2019-07-24 19:27 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel, yhs
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190724192742.1419254-1-andriin@fb.com>

Add support for BPF CO-RE offset relocations. Add section/record
iteration macros for .BTF.ext. These macro are useful for iterating over
each .BTF.ext record, either for dumping out contents or later for BPF
CO-RE relocation handling.

To enable other parts of libbpf to work with .BTF.ext contents, moved
a bunch of type definitions into libbpf_internal.h.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/btf.c             | 64 +++++++++--------------
 tools/lib/bpf/btf.h             |  4 ++
 tools/lib/bpf/libbpf_internal.h | 91 +++++++++++++++++++++++++++++++++
 3 files changed, 118 insertions(+), 41 deletions(-)

diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 467224feb43b..4a36bc783848 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -42,47 +42,6 @@ struct btf {
 	int fd;
 };
 
-struct btf_ext_info {
-	/*
-	 * info points to the individual info section (e.g. func_info and
-	 * line_info) from the .BTF.ext. It does not include the __u32 rec_size.
-	 */
-	void *info;
-	__u32 rec_size;
-	__u32 len;
-};
-
-struct btf_ext {
-	union {
-		struct btf_ext_header *hdr;
-		void *data;
-	};
-	struct btf_ext_info func_info;
-	struct btf_ext_info line_info;
-	__u32 data_size;
-};
-
-struct btf_ext_info_sec {
-	__u32	sec_name_off;
-	__u32	num_info;
-	/* Followed by num_info * record_size number of bytes */
-	__u8	data[0];
-};
-
-/* The minimum bpf_func_info checked by the loader */
-struct bpf_func_info_min {
-	__u32   insn_off;
-	__u32   type_id;
-};
-
-/* The minimum bpf_line_info checked by the loader */
-struct bpf_line_info_min {
-	__u32	insn_off;
-	__u32	file_name_off;
-	__u32	line_off;
-	__u32	line_col;
-};
-
 static inline __u64 ptr_to_u64(const void *ptr)
 {
 	return (__u64) (unsigned long) ptr;
@@ -831,6 +790,9 @@ static int btf_ext_setup_info(struct btf_ext *btf_ext,
 	/* The start of the info sec (including the __u32 record_size). */
 	void *info;
 
+	if (ext_sec->len == 0)
+		return 0;
+
 	if (ext_sec->off & 0x03) {
 		pr_debug(".BTF.ext %s section is not aligned to 4 bytes\n",
 		     ext_sec->desc);
@@ -934,6 +896,19 @@ static int btf_ext_setup_line_info(struct btf_ext *btf_ext)
 	return btf_ext_setup_info(btf_ext, &param);
 }
 
+static int btf_ext_setup_offset_reloc(struct btf_ext *btf_ext)
+{
+	struct btf_ext_sec_setup_param param = {
+		.off = btf_ext->hdr->offset_reloc_off,
+		.len = btf_ext->hdr->offset_reloc_len,
+		.min_rec_size = sizeof(struct bpf_offset_reloc),
+		.ext_info = &btf_ext->offset_reloc_info,
+		.desc = "offset_reloc",
+	};
+
+	return btf_ext_setup_info(btf_ext, &param);
+}
+
 static int btf_ext_parse_hdr(__u8 *data, __u32 data_size)
 {
 	const struct btf_ext_header *hdr = (struct btf_ext_header *)data;
@@ -1004,6 +979,13 @@ struct btf_ext *btf_ext__new(__u8 *data, __u32 size)
 	if (err)
 		goto done;
 
+	/* check if there is offset_reloc_off/offset_reloc_len fields */
+	if (btf_ext->hdr->hdr_len < sizeof(struct btf_ext_header))
+		goto done;
+	err = btf_ext_setup_offset_reloc(btf_ext);
+	if (err)
+		goto done;
+
 done:
 	if (err) {
 		btf_ext__free(btf_ext);
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 88a52ae56fc6..287361ee1f6b 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -57,6 +57,10 @@ struct btf_ext_header {
 	__u32	func_info_len;
 	__u32	line_info_off;
 	__u32	line_info_len;
+
+	/* optional part of .BTF.ext header */
+	__u32	offset_reloc_off;
+	__u32	offset_reloc_len;
 };
 
 LIBBPF_API void btf__free(struct btf *btf);
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index 2ac29bd36226..087ff512282f 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -46,4 +46,95 @@ do {				\
 int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
 			 const char *str_sec, size_t str_len);
 
+struct btf_ext_info {
+	/*
+	 * info points to the individual info section (e.g. func_info and
+	 * line_info) from the .BTF.ext. It does not include the __u32 rec_size.
+	 */
+	void *info;
+	__u32 rec_size;
+	__u32 len;
+};
+
+#define for_each_btf_ext_sec(seg, sec)					\
+	for (sec = (seg)->info;						\
+	     (void *)sec < (seg)->info + (seg)->len;			\
+	     sec = (void *)sec + sizeof(struct btf_ext_info_sec) +	\
+		   (seg)->rec_size * sec->num_info)
+
+#define for_each_btf_ext_rec(seg, sec, i, rec)				\
+	for (i = 0, rec = (void *)&(sec)->data;				\
+	     i < (sec)->num_info;					\
+	     i++, rec = (void *)rec + (seg)->rec_size)
+
+struct btf_ext {
+	union {
+		struct btf_ext_header *hdr;
+		void *data;
+	};
+	struct btf_ext_info func_info;
+	struct btf_ext_info line_info;
+	struct btf_ext_info offset_reloc_info;
+	__u32 data_size;
+};
+
+struct btf_ext_info_sec {
+	__u32	sec_name_off;
+	__u32	num_info;
+	/* Followed by num_info * record_size number of bytes */
+	__u8	data[0];
+};
+
+/* The minimum bpf_func_info checked by the loader */
+struct bpf_func_info_min {
+	__u32   insn_off;
+	__u32   type_id;
+};
+
+/* The minimum bpf_line_info checked by the loader */
+struct bpf_line_info_min {
+	__u32	insn_off;
+	__u32	file_name_off;
+	__u32	line_off;
+	__u32	line_col;
+};
+
+/* The minimum bpf_offset_reloc checked by the loader
+ *
+ * Offset relocation captures the following data:
+ * - insn_off - instruction offset (in bytes) within a BPF program that needs
+ *   its insn->imm field to be relocated with actual offset;
+ * - type_id - BTF type ID of the "root" (containing) entity of a relocatable
+ *   offset;
+ * - access_str_off - offset into corresponding .BTF string section. String
+ *   itself encodes an accessed field using a sequence of field and array
+ *   indicies, separated by colon (:). It's conceptually very close to LLVM's
+ *   getelementptr ([0]) instruction's arguments for identifying offset to 
+ *   a field.
+ *
+ * Example to provide a better feel.
+ *
+ *   struct sample {
+ *       int a;
+ *       struct {
+ *           int b[10];
+ *       };
+ *   };
+ * 
+ *   struct sample *s = ...;
+ *   int x = &s->a;     // encoded as "0:0" (a is field #0)
+ *   int y = &s->b[5];  // encoded as "0:1:5" (b is field #1, arr elem #5)
+ *   int z = &s[10]->b; // encoded as "10:1" (ptr is used as an array)
+ *
+ * type_id for all relocs in this example  will capture BTF type id of
+ * `struct sample`.
+ *
+ *   [0] https://llvm.org/docs/LangRef.html#getelementptr-instruction
+ */
+struct bpf_offset_reloc {
+	__u32   insn_off;
+	__u32   type_id;
+	__u32   access_str_off;
+};
+
 #endif /* __LIBBPF_LIBBPF_INTERNAL_H */
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf-next 03/10] selftests/bpf: add CO-RE relocs testing setup
From: Andrii Nakryiko @ 2019-07-24 19:27 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel, yhs
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190724192742.1419254-1-andriin@fb.com>

Add CO-RE relocation test runner. Add one simple test validating that
libbpf's logic for searching for kernel image and loading BTF out of it
works.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 .../selftests/bpf/prog_tests/core_reloc.c     | 126 ++++++++++++++++++
 .../bpf/progs/test_core_reloc_kernel.c        |  39 ++++++
 2 files changed, 165 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/core_reloc.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_kernel.c

diff --git a/tools/testing/selftests/bpf/prog_tests/core_reloc.c b/tools/testing/selftests/bpf/prog_tests/core_reloc.c
new file mode 100644
index 000000000000..b8d6ea578b20
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/core_reloc.c
@@ -0,0 +1,126 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+
+struct core_reloc_test_case {
+	const char *case_name;
+	const char *bpf_obj_file;
+	const char *btf_src_file;
+	const char *input;
+	int input_len;
+	const char *output;
+	int output_len;
+	bool fails;
+};
+
+static struct core_reloc_test_case test_cases[] = {
+	/* validate we can find kernel image and use its BTF for relocs */
+	{
+		.case_name = "kernel",
+		.bpf_obj_file = "test_core_reloc_kernel.o",
+		.btf_src_file = NULL, /* load from /lib/modules/$(uname -r) */
+		.input = "",
+		.input_len = 0,
+		.output = "\1", /* true */
+		.output_len = 1,
+	},
+};
+
+struct data {
+	char in[256];
+	char out[256];
+};
+
+void test_core_reloc(void)
+{
+	const char *probe_name = "raw_tracepoint/sys_enter";
+	struct bpf_object_load_attr load_attr = {};
+	struct core_reloc_test_case *test_case;
+	int err, duration = 0, i, equal;
+	struct bpf_link *link = NULL;
+	struct bpf_map *data_map;
+	struct bpf_program *prog;
+	struct bpf_object *obj;
+	const int zero = 0;
+	struct data data;
+
+	for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+		test_case = &test_cases[i];
+
+		obj = bpf_object__open(test_case->bpf_obj_file);
+		if (CHECK(IS_ERR_OR_NULL(obj), "obj_open",
+			  "case #%d: failed to open '%s': %ld\n",
+			  i, test_case->bpf_obj_file, PTR_ERR(obj)))
+			return;
+
+		prog = bpf_object__find_program_by_title(obj, probe_name);
+		if (CHECK(!prog, "find_probe",
+			  "case #%d: prog '%s' not found\n",
+			  i, probe_name))
+			goto cleanup;
+		bpf_program__set_type(prog, BPF_PROG_TYPE_RAW_TRACEPOINT);
+
+		load_attr.obj = obj;
+		load_attr.log_level = 0;
+		load_attr.target_btf_path = test_case->btf_src_file;
+		err = bpf_object__load_xattr(&load_attr);
+		if (test_case->fails) {
+			CHECK(!err, "obj_load_fail",
+			      "case #%d: should fail to load prog '%s'\n",
+			      i, probe_name);
+			goto cleanup;
+		} else {
+			if (CHECK(err, "obj_load",
+				  "case #%d: failed to load prog '%s': %d\n",
+				  i, probe_name, err))
+				goto cleanup;
+		}
+
+		link = bpf_program__attach_raw_tracepoint(prog, "sys_enter");
+		if (CHECK(IS_ERR(link), "attach_raw_tp", "case #%d: err %ld\n",
+			  i, PTR_ERR(link)))
+			goto cleanup;
+
+		data_map = bpf_object__find_map_by_name(obj, "test_cor.bss");
+		if (CHECK(!data_map, "find_data_map",
+			  "case #%d: failed to find data map\n", i))
+			goto cleanup;
+
+		memset(&data, 0, sizeof(data));
+		memcpy(data.in, test_case->input, test_case->input_len);
+
+		err = bpf_map_update_elem(bpf_map__fd(data_map),
+					  &zero, &data, 0);
+		if (CHECK(err, "update_data_map",
+			  "case #%d: failed to update .data map: %d\n",
+			  i, err))
+			goto cleanup;
+
+		/* trigger test run */
+		usleep(1);
+
+		err = bpf_map_lookup_elem(bpf_map__fd(data_map), &zero, &data);
+		if (CHECK(err, "get_result",
+			  "case #%d: failed to get output data: %d\n", i, err))
+			goto cleanup;
+
+		equal = memcmp(data.out, test_case->output,
+			       test_case->output_len) == 0;
+		if (CHECK(!equal, "check_result",
+			  "case #%d: input/output data don't match\n", i)) {
+			int j;
+
+			for (j = 0; j < test_case->output_len; j++) {
+				printf("case #%d: byte #%d, EXP 0x%02hhx GOT 0x%02hhx\n",
+				       i, j, test_case->output[j], data.out[j]);
+			}
+			goto cleanup;
+		}
+
+cleanup:
+		if (!IS_ERR_OR_NULL(link)) {
+			bpf_link__destroy(link);
+			link = NULL;
+		}
+		bpf_object__close(obj);
+	}
+}
diff --git a/tools/testing/selftests/bpf/progs/test_core_reloc_kernel.c b/tools/testing/selftests/bpf/progs/test_core_reloc_kernel.c
new file mode 100644
index 000000000000..9a71080ce4ec
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_core_reloc_kernel.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+
+#include <linux/bpf.h>
+#include <stdint.h>
+#include "bpf_helpers.h"
+
+char _license[] SEC("license") = "GPL";
+
+static volatile struct data {
+	char in[256];
+	char out[256];
+} data;
+
+struct task_struct {
+	int pid;
+	int tgid;
+};
+
+SEC("raw_tracepoint/sys_enter")
+int test_core_nesting(void *ctx)
+{
+	struct task_struct *task = (void *)bpf_get_current_task();
+	uint64_t pid_tgid = bpf_get_current_pid_tgid();
+	int pid, tgid;
+
+	if (bpf_probe_read(&pid, sizeof(pid),
+			   __builtin_preserve_access_index(&task->pid)))
+		return 1;
+	if (bpf_probe_read(&tgid, sizeof(tgid),
+			   __builtin_preserve_access_index(&task->tgid)))
+		return 1;
+
+	/* validate pid + tgid matches */
+	data.out[0] = (((uint64_t)pid << 32) | tgid) == pid_tgid;
+
+	return 0;
+}
+
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf-next 04/10] selftests/bpf: add CO-RE relocs struct flavors tests
From: Andrii Nakryiko @ 2019-07-24 19:27 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel, yhs
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190724192742.1419254-1-andriin@fb.com>

Add tests verifying that BPF program can use various struct/union
"flavors" to extract data from the same target struct/union.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 .../selftests/bpf/prog_tests/core_reloc.c     | 34 ++++++++++
 .../bpf/progs/btf__core_reloc_flavors.c       |  3 +
 .../btf__core_reloc_flavors__err_wrong_name.c |  3 +
 .../selftests/bpf/progs/core_reloc_types.h    | 15 +++++
 .../bpf/progs/test_core_reloc_flavors.c       | 65 +++++++++++++++++++
 5 files changed, 120 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_flavors.c
 create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_flavors__err_wrong_name.c
 create mode 100644 tools/testing/selftests/bpf/progs/core_reloc_types.h
 create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_flavors.c

diff --git a/tools/testing/selftests/bpf/prog_tests/core_reloc.c b/tools/testing/selftests/bpf/prog_tests/core_reloc.c
index b8d6ea578b20..c553c5f07ec3 100644
--- a/tools/testing/selftests/bpf/prog_tests/core_reloc.c
+++ b/tools/testing/selftests/bpf/prog_tests/core_reloc.c
@@ -1,5 +1,32 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
+#include "progs/core_reloc_types.h"
+
+#define STRUCT_TO_CHAR_PTR(struct_name) (const char *)&(struct struct_name)
+
+#define FLAVORS_DATA(struct_name) STRUCT_TO_CHAR_PTR(struct_name) {	\
+	.a = 42,							\
+	.b = 0xc001,							\
+	.c = 0xbeef,							\
+}
+
+#define FLAVORS_CASE_COMMON(name)					\
+	.case_name = #name,						\
+	.bpf_obj_file = "test_core_reloc_flavors.o",			\
+	.btf_src_file = "btf__core_reloc_" #name ".o"			\
+
+#define FLAVORS_CASE(name) {						\
+	FLAVORS_CASE_COMMON(name),					\
+	.input = FLAVORS_DATA(core_reloc_##name),			\
+	.input_len = sizeof(struct core_reloc_##name),			\
+	.output = FLAVORS_DATA(core_reloc_flavors),			\
+	.output_len = sizeof(struct core_reloc_flavors),		\
+}
+
+#define FLAVORS_ERR_CASE(name) {					\
+	FLAVORS_CASE_COMMON(name),					\
+	.fails = true,							\
+}
 
 struct core_reloc_test_case {
 	const char *case_name;
@@ -23,6 +50,13 @@ static struct core_reloc_test_case test_cases[] = {
 		.output = "\1", /* true */
 		.output_len = 1,
 	},
+
+	/* validate BPF program can use multiple flavors to match against
+	 * single target BTF type
+	 */
+	FLAVORS_CASE(flavors),
+
+	FLAVORS_ERR_CASE(flavors__err_wrong_name),
 };
 
 struct data {
diff --git a/tools/testing/selftests/bpf/progs/btf__core_reloc_flavors.c b/tools/testing/selftests/bpf/progs/btf__core_reloc_flavors.c
new file mode 100644
index 000000000000..b74455b91227
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/btf__core_reloc_flavors.c
@@ -0,0 +1,3 @@
+#include "core_reloc_types.h"
+
+void f(struct core_reloc_flavors x) {}
diff --git a/tools/testing/selftests/bpf/progs/btf__core_reloc_flavors__err_wrong_name.c b/tools/testing/selftests/bpf/progs/btf__core_reloc_flavors__err_wrong_name.c
new file mode 100644
index 000000000000..7b6035f86ee6
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/btf__core_reloc_flavors__err_wrong_name.c
@@ -0,0 +1,3 @@
+#include "core_reloc_types.h"
+
+void f(struct core_reloc_flavors__err_wrong_name x) {}
diff --git a/tools/testing/selftests/bpf/progs/core_reloc_types.h b/tools/testing/selftests/bpf/progs/core_reloc_types.h
new file mode 100644
index 000000000000..33b0c6a61912
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/core_reloc_types.h
@@ -0,0 +1,15 @@
+/*
+ * FLAVORS
+ */
+struct core_reloc_flavors {
+	int a;
+	int b;
+	int c;
+};
+
+/* this is not a flavor, as it doesn't have triple underscore */
+struct core_reloc_flavors__err_wrong_name {
+	int a;
+	int b;
+	int c;
+};
diff --git a/tools/testing/selftests/bpf/progs/test_core_reloc_flavors.c b/tools/testing/selftests/bpf/progs/test_core_reloc_flavors.c
new file mode 100644
index 000000000000..92660344518d
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_core_reloc_flavors.c
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+
+#include <linux/bpf.h>
+#include <stdint.h>
+#include "bpf_helpers.h"
+
+char _license[] SEC("license") = "GPL";
+
+static volatile struct data {
+	char in[256];
+	char out[256];
+} data;
+
+struct core_reloc_flavors {
+	int a;
+	int b;
+	int c;
+};
+
+/* local flavor with reversed layout */
+struct core_reloc_flavors___reversed {
+	int c;
+	int b;
+	int a;
+};
+
+/* local flavor with nested/overlapping layout */
+struct core_reloc_flavors___weird {
+	struct {
+		int b;
+	};
+	/* a and c overlap in local flavor, but this should still work
+	 * correctly with target original flavor
+	 */
+	union {
+		int a;
+		int c;
+	};
+};
+
+SEC("raw_tracepoint/sys_enter")
+int test_core_nesting(void *ctx)
+{
+	struct core_reloc_flavors *in_orig = (void *)&data.in;
+	struct core_reloc_flavors___reversed *in_rev = (void *)&data.in;
+	struct core_reloc_flavors___weird *in_weird = (void *)&data.in;
+	struct core_reloc_flavors *out = (void *)&data.out;
+
+	/* read a using weird layout */
+	if (bpf_probe_read(&out->a, sizeof(in_weird->a),
+			   __builtin_preserve_access_index(&in_weird->a)))
+		return 1;
+	/* read b using reversed layout */
+	if (bpf_probe_read(&out->b, sizeof(in_rev->b),
+			   __builtin_preserve_access_index(&in_rev->b)))
+		return 1;
+	/* read c using original layout */
+	if (bpf_probe_read(&out->c, sizeof(in_orig->c),
+			   __builtin_preserve_access_index(&in_orig->c)))
+		return 1;
+
+	return 0;
+}
+
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf-next 02/10] libbpf: implement BPF CO-RE offset relocation algorithm
From: Andrii Nakryiko @ 2019-07-24 19:27 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel, yhs
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190724192742.1419254-1-andriin@fb.com>

This patch implements the core logic for BPF CO-RE offsets relocations.
All the details are described in code comments.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 tools/lib/bpf/libbpf.c | 866 ++++++++++++++++++++++++++++++++++++++++-
 tools/lib/bpf/libbpf.h |   1 +
 2 files changed, 861 insertions(+), 6 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 8741c39adb1c..86d87bf10d46 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -38,6 +38,7 @@
 #include <sys/stat.h>
 #include <sys/types.h>
 #include <sys/vfs.h>
+#include <sys/utsname.h>
 #include <tools/libc_compat.h>
 #include <libelf.h>
 #include <gelf.h>
@@ -47,6 +48,7 @@
 #include "btf.h"
 #include "str_error.h"
 #include "libbpf_internal.h"
+#include "hashmap.h"
 
 #ifndef EM_BPF
 #define EM_BPF 247
@@ -1013,16 +1015,22 @@ static int bpf_object__init_user_maps(struct bpf_object *obj, bool strict)
 }
 
 static const struct btf_type *skip_mods_and_typedefs(const struct btf *btf,
-						     __u32 id)
+						     __u32 id,
+						     __u32 *res_id)
 {
 	const struct btf_type *t = btf__type_by_id(btf, id);
 
+	if (res_id)
+		*res_id = id;
+
 	while (true) {
 		switch (BTF_INFO_KIND(t->info)) {
 		case BTF_KIND_VOLATILE:
 		case BTF_KIND_CONST:
 		case BTF_KIND_RESTRICT:
 		case BTF_KIND_TYPEDEF:
+			if (res_id)
+				*res_id = t->type;
 			t = btf__type_by_id(btf, t->type);
 			break;
 		default:
@@ -1041,7 +1049,7 @@ static const struct btf_type *skip_mods_and_typedefs(const struct btf *btf,
 static bool get_map_field_int(const char *map_name, const struct btf *btf,
 			      const struct btf_type *def,
 			      const struct btf_member *m, __u32 *res) {
-	const struct btf_type *t = skip_mods_and_typedefs(btf, m->type);
+	const struct btf_type *t = skip_mods_and_typedefs(btf, m->type, NULL);
 	const char *name = btf__name_by_offset(btf, m->name_off);
 	const struct btf_array *arr_info;
 	const struct btf_type *arr_t;
@@ -1107,7 +1115,7 @@ static int bpf_object__init_user_btf_map(struct bpf_object *obj,
 		return -EOPNOTSUPP;
 	}
 
-	def = skip_mods_and_typedefs(obj->btf, var->type);
+	def = skip_mods_and_typedefs(obj->btf, var->type, NULL);
 	if (BTF_INFO_KIND(def->info) != BTF_KIND_STRUCT) {
 		pr_warning("map '%s': unexpected def kind %u.\n",
 			   map_name, BTF_INFO_KIND(var->info));
@@ -2289,6 +2297,845 @@ bpf_program_reloc_btf_ext(struct bpf_program *prog, struct bpf_object *obj,
 	return 0;
 }
 
+#define BPF_CORE_SPEC_MAX_LEN 64
+
+/* represents BPF CO-RE field or array element accessor */
+struct bpf_core_accessor {
+	__u32 type_id;		/* struct/union type or array element type */
+	__u32 idx;		/* field index or array index */
+	const char *name;	/* field name or NULL for array accessor */
+};
+
+struct bpf_core_spec {
+	const struct btf *btf;
+	/* high-level spec: named fields and array indicies only */
+	struct bpf_core_accessor spec[BPF_CORE_SPEC_MAX_LEN];
+	/* high-level spec length */
+	int len;
+	/* raw, low-level spec: 1-to-1 with accessor spec string */
+	int raw_spec[BPF_CORE_SPEC_MAX_LEN];
+	/* raw spec length */
+	int raw_len;
+	/* field byte offset represented by spec */
+	__u32 offset;
+};
+
+static bool str_is_empty(const char *s)
+{
+	return !s || !s[0];
+}
+
+static int btf_kind(const struct btf_type *t)
+{
+	return BTF_INFO_KIND(t->info);
+}
+
+static bool btf_is_composite(const struct btf_type *t)
+{
+	int kind = btf_kind(t);
+
+	return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
+}
+
+static bool btf_is_array(const struct btf_type *t)
+{
+	return btf_kind(t) == BTF_KIND_ARRAY;
+}
+
+/* 
+ * Turn bpf_offset_reloc into a low- and high-level spec representation,
+ * validating correctness along the way, as well as calculating resulting
+ * field offset (in bytes), specified by accessor string. Low-level spec
+ * captures every single level of nestedness, including traversing anonymous
+ * struct/union members. High-level one only captures semantically meaningful
+ * "turning points": named fields and array indicies.
+ * E.g., for this case:
+ *
+ *   struct sample {
+ *       int __unimportant;
+ *       struct {
+ *           int __1;
+ *           int __2;
+ *           int a[7];
+ *       };
+ *   };
+ *
+ *   struct sample *s = ...;
+ *
+ *   int x = &s->a[3]; // access string = '0:1:2:3'
+ *
+ * Low-level spec has 1:1 mapping with each element of access string (it's
+ * just a parsed access string representation): [0, 1, 2, 3].
+ *
+ * High-level spec will capture only 3 points:
+ *   - intial zero-index access by pointer (&s->... is the same as &s[0]...);
+ *   - field 'a' access (corresponds to '2' in low-level spec);
+ *   - array element #3 access (corresponds to '3' in low-level spec).
+ *
+ */
+static int bpf_core_spec_parse(const struct btf *btf,
+			       __u32 type_id,
+			       const char *spec_str,
+			       struct bpf_core_spec *spec)
+{
+	int access_idx, parsed_len, i;
+	const struct btf_type *t;
+	__u32 id = type_id;
+	const char *name;
+	__s64 sz;
+
+	if (str_is_empty(spec_str) || *spec_str == ':')
+		return -EINVAL;
+
+	memset(spec, 0, sizeof(*spec));
+	spec->btf = btf;
+
+	/* parse spec_str="0:1:2:3:4" into array raw_spec=[0, 1, 2, 3, 4] */
+	while (*spec_str) {
+		if (*spec_str == ':')
+			++spec_str;
+		if (sscanf(spec_str, "%d%n", &access_idx, &parsed_len) != 1)
+			return -EINVAL;
+		if (spec->raw_len == BPF_CORE_SPEC_MAX_LEN)
+			return -E2BIG;
+		spec_str += parsed_len;
+		spec->raw_spec[spec->raw_len++] = access_idx;
+	}
+
+	if (spec->raw_len == 0)
+		return -EINVAL;
+
+	for (i = 0; i < spec->raw_len; i++) {
+		t = skip_mods_and_typedefs(btf, id, &id);
+		if (!t)
+			return -EINVAL;
+
+		access_idx = spec->raw_spec[i];
+
+		if (i == 0) {
+			/* first spec value is always reloc type array index */
+			spec->spec[spec->len].type_id = id;
+			spec->spec[spec->len].idx = access_idx;
+			spec->len++;
+
+			sz = btf__resolve_size(btf, id);
+			if (sz < 0)
+				return sz;
+			spec->offset += access_idx * sz;
+			continue;
+		}
+
+		if (btf_is_composite(t)) {
+			const struct btf_member *m = (void *)(t + 1);
+			__u32 offset;
+
+			if (access_idx >= BTF_INFO_VLEN(t->info))
+				return -EINVAL;
+
+			m = &m[access_idx];
+
+			if (BTF_INFO_KFLAG(t->info)) {
+				if (BTF_MEMBER_BITFIELD_SIZE(m->offset))
+					return -EINVAL;
+				offset = BTF_MEMBER_BIT_OFFSET(m->offset);
+			} else {
+				offset = m->offset;
+			}
+			if (m->offset % 8)
+				return -EINVAL;
+			spec->offset += offset / 8;
+
+			if (m->name_off) {
+				name = btf__name_by_offset(btf, m->name_off);
+				if (str_is_empty(name))
+					return -EINVAL;
+
+				spec->spec[spec->len].type_id = id;
+				spec->spec[spec->len].idx = access_idx;
+				spec->spec[spec->len].name = name;
+				spec->len++;
+			}
+
+			id = m->type;
+		} else if (btf_is_array(t)) {
+			const struct btf_array *a = (void *)(t + 1);
+
+			t = skip_mods_and_typedefs(btf, a->type, &id);
+			if (!t || access_idx >= a->nelems)
+				return -EINVAL;
+
+			spec->spec[spec->len].type_id = id;
+			spec->spec[spec->len].idx = access_idx;
+			spec->len++;
+
+			sz = btf__resolve_size(btf, id);
+			if (sz < 0)
+				return sz;
+			spec->offset += access_idx * sz;
+		} else {
+			pr_warning("relo for [%u] %s (at idx %d) captures type [%d] of unexpected kind %d\n",
+				   type_id, spec_str, i, id, btf_kind(t));
+			return -EINVAL;
+		}
+	}
+
+	if (spec->len == 0)
+		return -EINVAL;
+
+	return 0;
+}
+
+/* Given 'some_struct_name___with_flavor' return the length of a name prefix
+ * before last triple underscore. Struct name part after last triple
+ * underscore is ignored by BPF CO-RE relocation during relocation matching.
+ */
+static size_t bpf_core_essential_name_len(const char *name)
+{
+	size_t n = strlen(name);
+	int i = n - 3;
+
+	while (i > 0) {
+		if (name[i] == '_' && name[i + 1] == '_' && name[i + 2] == '_')
+			return i;
+		i--;
+	}
+	return n;
+}
+
+/* dynamically sized list of type IDs */
+struct ids_vec {
+	__u32 *data;
+	int len;
+};
+
+static void bpf_core_free_cands(struct ids_vec *cand_ids)
+{
+	free(cand_ids->data);
+	free(cand_ids);
+}
+
+static struct ids_vec *bpf_core_find_cands(const struct btf *local_btf,
+					   __u32 local_type_id,
+					   const struct btf *targ_btf)
+{
+	size_t local_essent_len, targ_essent_len;
+	const char *local_name, *targ_name;
+	const struct btf_type *t;
+	struct ids_vec *cand_ids;
+	__u32 *new_ids;
+	int i, err, n;
+
+	t = btf__type_by_id(local_btf, local_type_id);
+	if (!t)
+		return ERR_PTR(-EINVAL);
+
+	local_name = btf__name_by_offset(local_btf, t->name_off);
+	if (str_is_empty(local_name))
+		return ERR_PTR(-EINVAL);
+	local_essent_len = bpf_core_essential_name_len(local_name);
+
+	cand_ids = calloc(1, sizeof(*cand_ids));
+	if (!cand_ids)
+		return ERR_PTR(-ENOMEM);
+
+	n = btf__get_nr_types(targ_btf);
+	for (i = 1; i <= n; i++) {
+		t = btf__type_by_id(targ_btf, i);
+		targ_name = btf__name_by_offset(targ_btf, t->name_off);
+		if (str_is_empty(targ_name))
+			continue;
+
+		targ_essent_len = bpf_core_essential_name_len(targ_name);
+		if (targ_essent_len != local_essent_len)
+			continue;
+
+		if (strncmp(local_name, targ_name, local_essent_len) == 0) {
+			pr_debug("[%d] (%s): found candidate [%d] (%s)\n",
+				 local_type_id, local_name, i, targ_name);
+			new_ids = realloc(cand_ids->data, cand_ids->len + 1);
+			if (!new_ids) {
+				err = -ENOMEM;
+				goto err_out;
+			}
+			cand_ids->data = new_ids;
+			cand_ids->data[cand_ids->len++] = i;
+		}
+	}
+	return cand_ids;
+err_out:
+	bpf_core_free_cands(cand_ids);
+	return ERR_PTR(err);
+}
+
+/* Check two types for compatibility, skipping const/volatile/restrict and
+ * typedefs, to ensure we are relocating offset to the compatible entities:
+ *   - any two STRUCTs/UNIONs are compatible and can be mixed;
+ *   - any two FWDs are compatible;
+ *   - any two PTRs are always compatible;
+ *   - for ENUMs, check sizes, names are ignored;
+ *   - for INT, size and bitness should match, signedness is ignored;
+ *   - for ARRAY, dimensionality is ignored, element types are checked for
+ *     compatibility recursively;
+ *   - everything else shouldn't be ever a target of relocation.
+ * These rules are not set in stone and probably will be adjusted as we get
+ * more experience with using BPF CO-RE relocations.
+ */
+static int bpf_core_fields_are_compat(const struct btf *local_btf,
+				      __u32 local_id,
+				      const struct btf *targ_btf,
+				      __u32 targ_id)
+{
+	const struct btf_type *local_type, *targ_type;
+	__u16 kind;
+
+recur:
+	local_type = skip_mods_and_typedefs(local_btf, local_id, &local_id);
+	targ_type = skip_mods_and_typedefs(targ_btf, targ_id, &targ_id);
+	if (!local_type || !targ_type)
+		return -EINVAL;
+
+	if (btf_is_composite(local_type) && btf_is_composite(targ_type))
+		return 1;
+	if (BTF_INFO_KIND(local_type->info) != BTF_INFO_KIND(targ_type->info))
+		return 0;
+
+	kind = BTF_INFO_KIND(local_type->info);
+	switch (kind) {
+	case BTF_KIND_FWD:
+	case BTF_KIND_PTR:
+		return 1;
+	case BTF_KIND_ENUM:
+		return local_type->size == targ_type->size;
+	case BTF_KIND_INT: {
+		__u32 loc_int = *(__u32 *)(local_type + 1);
+		__u32 targ_int = *(__u32 *)(targ_type + 1);
+
+		return BTF_INT_OFFSET(loc_int) == 0 &&
+		       BTF_INT_OFFSET(targ_int) == 0 &&
+		       local_type->size == targ_type->size &&
+		       BTF_INT_BITS(loc_int) == BTF_INT_BITS(targ_int);
+	}
+	case BTF_KIND_ARRAY: {
+		const struct btf_array *loc_a, *targ_a;
+
+		loc_a = (void *)(local_type + 1);
+		targ_a = (void *)(targ_type + 1);
+		local_id = loc_a->type;
+		targ_id = targ_a->type;
+		goto recur;
+	}
+	default:
+		pr_warning("unexpected kind %d relocated, local [%d], target [%d]\n",
+			   kind, local_id, targ_id);
+		return 0;
+	}
+}
+
+/* 
+ * Given single high-level accessor (either named field or array index) in
+ * local type, find corresponding high-level accessor for a target type. Along
+ * the way, maintain low-level spec for target as well. Also keep updating
+ * target offset.
+ */
+static int bpf_core_match_member(const struct btf *local_btf,
+				 const struct bpf_core_accessor *local_acc,
+				 const struct btf *targ_btf,
+				 __u32 targ_id,
+				 struct bpf_core_spec *spec,
+				 __u32 *next_targ_id)
+{
+	const struct btf_type *local_type, *targ_type;
+	const struct btf_member *local_member, *m;
+	const char *local_name, *targ_name;
+	__u32 local_id;
+	int i, n, found;
+
+	targ_type = skip_mods_and_typedefs(targ_btf, targ_id, &targ_id);
+	if (!targ_type)
+		return -EINVAL;
+	if (!btf_is_composite(targ_type))
+		return 0;
+
+	local_id = local_acc->type_id;
+	local_type = btf__type_by_id(local_btf, local_id);
+	local_member = (void *)(local_type + 1);
+	local_member += local_acc->idx;
+	local_name = btf__name_by_offset(local_btf, local_member->name_off);
+
+	n = BTF_INFO_VLEN(targ_type->info);
+	m = (void *)(targ_type + 1);
+	for (i = 0; i < n; i++, m++) {
+		__u32 offset;
+
+		/* bitfield relocations not supported */
+		if (BTF_INFO_KFLAG(targ_type->info)) {
+			if (BTF_MEMBER_BITFIELD_SIZE(m->offset))
+				continue;
+			offset = BTF_MEMBER_BIT_OFFSET(m->offset);
+		} else {
+			offset = m->offset;
+		}
+		if (offset % 8)
+			continue;
+
+		/* too deep struct/union/array nesting */
+		if (spec->raw_len == BPF_CORE_SPEC_MAX_LEN)
+			return -E2BIG;
+
+		/* speculate this member will be the good one */
+		spec->offset += offset / 8;
+		spec->raw_spec[spec->raw_len++] = i;
+
+		targ_name = btf__name_by_offset(targ_btf, m->name_off);
+		if (str_is_empty(targ_name)) {
+			/* embedded struct/union, we need to go deeper */
+			found = bpf_core_match_member(local_btf, local_acc,
+						      targ_btf, m->type,
+						      spec, next_targ_id);
+			if (found) /* either found or error */
+				return found;
+		} else if (strcmp(local_name, targ_name) == 0) {
+			/* matching named field */
+			struct bpf_core_accessor *targ_acc;
+
+			targ_acc = &spec->spec[spec->len++];
+			targ_acc->type_id = targ_id;
+			targ_acc->idx = i;
+			targ_acc->name = targ_name;
+
+			*next_targ_id = m->type;
+			found = bpf_core_fields_are_compat(local_btf,
+							   local_member->type,
+							   targ_btf, m->type);
+			if (!found)
+				spec->len--; /* pop accessor */
+			return found;
+		}
+		/* member turned out to be not we looked for */
+		spec->offset -= offset / 8;
+		spec->raw_len--;
+	}
+
+	return 0;
+}
+
+/*
+ * Try to match local spec to a target type and, if successful, produce full
+ * target spec (high-level, low-level + offset).
+ */
+static int bpf_core_spec_match(struct bpf_core_spec *local_spec,
+			       const struct btf *targ_btf, __u32 targ_id,
+			       struct bpf_core_spec *targ_spec)
+{
+	const struct btf_type *targ_type;
+	const struct bpf_core_accessor *local_acc;
+	struct bpf_core_accessor *targ_acc;
+	int i, sz, matched;
+
+	memset(targ_spec, 0, sizeof(*targ_spec));
+	targ_spec->btf = targ_btf;
+
+	local_acc = &local_spec->spec[0];
+	targ_acc = &targ_spec->spec[0];
+
+	for (i = 0; i < local_spec->len; i++, local_acc++, targ_acc++) {
+		targ_type = skip_mods_and_typedefs(targ_spec->btf, targ_id,
+						   &targ_id);
+		if (!targ_type)
+			return -EINVAL;
+
+		if (local_acc->name) {
+			if (!btf_is_composite(targ_type))
+				return 0;
+
+			matched = bpf_core_match_member(local_spec->btf,
+							local_acc,
+							targ_btf, targ_id,
+							targ_spec, &targ_id);
+			if (matched <= 0)
+				return matched;
+		} else {
+			/* for i=0, targ_id is already treated as array element
+			 * type (because it's the original struct), for others
+			 * we should find array element type first
+			 */
+			if (i > 0) {
+				const struct btf_array *a;
+
+				if (!btf_is_array(targ_type))
+					return 0;
+
+				a = (void *)(targ_type + 1);
+				if (local_acc->idx >= a->nelems)
+					return 0;
+				if (!skip_mods_and_typedefs(targ_btf, a->type,
+							    &targ_id))
+					return -EINVAL;
+			}
+
+			/* too deep struct/union/array nesting */
+			if (targ_spec->raw_len == BPF_CORE_SPEC_MAX_LEN)
+				return -E2BIG;
+
+			targ_acc->type_id = targ_id;
+			targ_acc->idx = local_acc->idx;
+			targ_acc->name = NULL;
+			targ_spec->len++;
+			targ_spec->raw_spec[targ_spec->raw_len] = targ_acc->idx;
+			targ_spec->raw_len++;
+
+			sz = btf__resolve_size(targ_btf, targ_id);
+			if (sz < 0)
+				return sz;
+			targ_spec->offset += local_acc->idx * sz;
+		}
+	}
+
+	return 1;
+}
+
+/*
+ * Patch relocatable BPF instruction.
+ * Expected insn->imm value is provided for validation, as well as the new
+ * relocated value.
+ *
+ * Currently three kinds of BPF instructions are supported:
+ * 1. rX = <imm> (assignment with immediate operand);
+ * 2. rX += <imm> (arithmetic operations with immediate operand);
+ * 3. *(rX) = <imm> (indirect memory assignment with immediate operand).
+ *
+ * If actual insn->imm value is wrong, bail out.
+ */
+static int bpf_core_reloc_insn(struct bpf_program *prog, int insn_off,
+			       __u32 orig_off, __u32 new_off)
+{
+	struct bpf_insn *insn;
+	int insn_idx;
+	__u8 class;
+
+	if (insn_off % sizeof(struct bpf_insn))
+		return -EINVAL;
+	insn_idx = insn_off / sizeof(struct bpf_insn);
+
+	insn = &prog->insns[insn_idx];
+	class = BPF_CLASS(insn->code);
+
+	if (class == BPF_ALU || class == BPF_ALU64) {
+		if (BPF_SRC(insn->code) != BPF_K)
+			return -EINVAL;
+		if (insn->imm != orig_off)
+			return -EINVAL;
+		insn->imm = new_off;
+		pr_debug("prog '%s': patched insn #%d (ALU/ALU64) imm %d -> %d\n",
+			 bpf_program__title(prog, false),
+			 insn_idx, orig_off, new_off);
+	} else if (class == BPF_ST && BPF_MODE(insn->code) == BPF_MEM) {
+		if (insn->imm != orig_off)
+			return -EINVAL;
+		insn->imm = new_off;
+		pr_debug("prog '%s': patched insn #%d (ST | MEM) imm %d -> %d\n",
+			 bpf_program__title(prog, false),
+			 insn_idx, orig_off, new_off);
+	} else {
+		pr_warning("prog '%s': trying to relocate unrecognized insn #%d, code:%x, src:%x, dst:%x, off:%x, imm:%x\n",
+			   bpf_program__title(prog, false),
+			   insn_idx, insn->code, insn->src_reg, insn->dst_reg,
+			   insn->off, insn->imm);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+/*
+ * Probe few well-known locations for vmlinux kernel image and try to load BTF
+ * data out of it to use for target BTF.
+ */
+static struct btf *bpf_core_find_kernel_btf(void)
+{
+	const char *locations[] = {
+		"/lib/modules/%1$s/vmlinux-%1$s",
+		"/usr/lib/modules/%1$s/kernel/vmlinux",
+	};
+	char path[PATH_MAX + 1];
+	struct utsname buf;
+	struct btf *btf;
+	int i, err;
+
+	err = uname(&buf);
+	if (err) {
+		pr_warning("failed to uname(): %d\n", err);
+		return ERR_PTR(err);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(locations); i++) {
+		snprintf(path, PATH_MAX, locations[i], buf.release);
+		pr_debug("attempting to load kernel BTF from '%s'\n", path);
+
+		if (access(path, R_OK))
+			continue;
+
+		btf = btf__parse_elf(path, NULL);
+		if (IS_ERR(btf))
+			continue;
+
+		pr_debug("successfully loaded kernel BTF from '%s'\n", path);
+		return btf;
+	}
+
+	pr_warning("failed to find valid kernel BTF\n");
+	return ERR_PTR(-ESRCH);
+}
+
+static size_t bpf_core_hash_fn(const void *key, void *ctx)
+{
+	return (size_t)key;
+}
+
+static bool bpf_core_equal_fn(const void *k1, const void *k2, void *ctx)
+{
+	return k1 == k2;
+}
+
+static void *u32_to_ptr(__u32 x)
+{
+	return (void *)(uintptr_t)x;
+}
+
+/* 
+ * CO-RE relocate single instruction.
+ *
+ * The outline and important points of the algorithm:
+ * 1. For given local type, find corresponding candidate target types.
+ *    Candidate type is a type with the same "essential" name, ignoring
+ *    everything after last triple underscore (___). E.g., `sample`,
+ *    `sample___flavor_one`, `sample___flavor_another_one`, are all candidates
+ *    for each other. Names with triple underscore are referred to as
+ *    "flavors" and are useful, among other things, to allow to
+ *    specify/support incompatible variations of the same kernel struct, which
+ *    might differ between different kernel versions and/or build
+ *    configurations.
+ * 2. For each candidate type, try to match local specification to this
+ *    candidate target type. Matching involves finding corresponding
+ *    high-level spec accessors, meaning that all named fields should match,
+ *    as well as all array accesses should be within the actual bounds. Also,
+ *    types should be compatible (see bpf_core_fields_are_compat for details).
+ * 3. It is supported and expected that there might be multiple flavors
+ *    matching the spec. As long as all the specs resolve to the same set of
+ *    offsets across all candidates, there is not error. If there is any
+ *    ambiguity, CO-RE relocation will fail. This is necessary to accomodate
+ *    imprefection of BTF deduplication, which can cause slight duplication of
+ *    the same BTF type, if some directly or indirectly referenced (by
+ *    pointer) type gets resolved to different actual types in different
+ *    object files. If such situation occurs, deduplicated BTF will end up
+ *    with two (or more) structurally identical types, which differ only in
+ *    types they refer to through pointer. This should be OK in most cases and
+ *    is not an error.
+ * 4. Candidate types search is performed by linearly scanning through all
+ *    types in target BTF. It is anticipated that this is overall more
+ *    efficient memory-wise and not significantly worse (if not better)
+ *    CPU-wise compared to prebuilding a map from all local type names to
+ *    a list of candidate type names. It's also sped up by caching resolved
+ *    list of matching candidates per each local "root" type ID, that has at
+ *    least one bpf_offset_reloc associated with it. This list is shared
+ *    between multiple relocations for the same type ID and is updated as some
+ *    of the candidates are pruned due to structural incompatibility.
+ */
+static int bpf_core_reloc_offset(struct bpf_program *prog,
+				 const struct bpf_offset_reloc *relo,
+				 int relo_idx,
+				 const struct btf *local_btf,
+				 const struct btf *targ_btf,
+				 struct hashmap *cand_cache)
+{
+	const char *prog_name = bpf_program__title(prog, false);
+	struct bpf_core_spec local_spec, cand_spec, targ_spec;
+	const void *type_key = u32_to_ptr(relo->type_id);
+	const struct btf_type *local_type, *cand_type;
+	const char *local_name, *cand_name;
+	struct ids_vec *cand_ids;
+	__u32 local_id, cand_id;
+	const char *spec_str;
+	int i, j, err;
+
+	local_id = relo->type_id;
+	local_type = btf__type_by_id(local_btf, local_id);
+	if (!local_type)
+		return -EINVAL;
+
+	local_name = btf__name_by_offset(local_btf, local_type->name_off);
+	if (str_is_empty(local_name))
+		return -EINVAL;
+
+	spec_str = btf__name_by_offset(local_btf, relo->access_str_off);
+	if (str_is_empty(spec_str))
+		return -EINVAL;
+
+	pr_debug("prog '%s': relo #%d: insn_off=%d, [%d] (%s) + %s\n",
+		 prog_name, relo_idx, relo->insn_off,
+		 local_id, local_name, spec_str);
+
+	err = bpf_core_spec_parse(local_btf, local_id, spec_str, &local_spec);
+	if (err) {
+		pr_warning("prog '%s': relo #%d: parsing [%d] (%s) + %s failed: %d\n",
+			   prog_name, relo_idx, local_id, local_name, spec_str,
+			   err);
+		return -EINVAL;
+	}
+	pr_debug("prog '%s': relo #%d: [%d] (%s) + %s is off %u, len %d, raw_len %d\n",
+		 prog_name, relo_idx, local_id, local_name, spec_str,
+		 local_spec.offset, local_spec.len, local_spec.raw_len);
+
+	if (!hashmap__find(cand_cache, type_key, (void **)&cand_ids)) {
+		cand_ids = bpf_core_find_cands(local_btf, local_id, targ_btf);
+		if (IS_ERR(cand_ids)) {
+			pr_warning("prog '%s': relo #%d: target candidate search failed for [%d] (%s) + %s: %ld\n",
+				   prog_name, relo_idx, local_id, local_name,
+				   spec_str, PTR_ERR(cand_ids));
+			return PTR_ERR(cand_ids);
+		}
+		err = hashmap__set(cand_cache, type_key, cand_ids, NULL, NULL);
+		if (err) {
+			bpf_core_free_cands(cand_ids);
+			return err;
+		}
+	}
+
+	for (i = 0, j = 0; i < cand_ids->len; i++) {
+		cand_id = cand_ids->data[j];
+		cand_type = btf__type_by_id(targ_btf, cand_id);
+		cand_name = btf__name_by_offset(targ_btf, cand_type->name_off);
+
+		err = bpf_core_spec_match(&local_spec, targ_btf,
+					  cand_id, &cand_spec);
+		if (err < 0) {
+			pr_warning("prog '%s': relo #%d: failed to match spec [%d] (%s) + %s to candidate #%d [%d] (%s): %d\n",
+				   prog_name, relo_idx, local_id, local_name,
+				   spec_str, i, cand_id, cand_name, err);
+			return err;
+		}
+		if (err == 0) {
+			pr_debug("prog '%s': relo #%d: candidate #%d [%d] (%s) doesn't match spec\n",
+				 prog_name, relo_idx, i, cand_id, cand_name);
+			continue;
+		}
+
+		pr_debug("prog '%s': relo #%d: candidate #%d ([%d] %s) is off %u, len %d, raw_len %d\n",
+			 prog_name, relo_idx, i, cand_id, cand_name,
+			 cand_spec.offset, cand_spec.len, cand_spec.raw_len);
+
+		if (j == 0) {
+			targ_spec = cand_spec;
+		} else if (cand_spec.offset != targ_spec.offset) {
+			/* if there are many candidates, they should all
+			 * resolve to the same offset
+			 */
+			pr_warning("prog '%s': relo #%d: candidate #%d ([%d] %s): conflicting offset found (%u != %u)\n",
+				   prog_name, relo_idx, i, cand_id, cand_name,
+				   cand_spec.offset, targ_spec.offset);
+			return -EINVAL;
+		}
+
+		cand_ids->data[j++] = cand_spec.spec[0].type_id;
+	}
+
+	cand_ids->len = j;
+	if (cand_ids->len == 0) {
+		pr_warning("prog '%s': relo #%d: no matching targets found for [%d] (%s) + %s\n",
+			   prog_name, relo_idx, local_id, local_name, spec_str);
+		return -ESRCH;
+	}
+
+	err = bpf_core_reloc_insn(prog, relo->insn_off,
+				  local_spec.offset, targ_spec.offset);
+	if (err) {
+		pr_warning("prog '%s': relo #%d: failed to patch insn at offset %d: %d\n",
+			   prog_name, relo_idx, relo->insn_off, err);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+bpf_core_reloc_offsets(struct bpf_object *obj, const char *targ_btf_path)
+{
+	const struct btf_ext_info_sec *sec;
+	const struct bpf_offset_reloc *rec;
+	const struct btf_ext_info *seg;
+	struct hashmap_entry *entry;
+	struct hashmap *cand_cache = NULL;
+	struct bpf_program *prog;
+	struct btf *targ_btf;
+	const char *sec_name;
+	int i, err = 0;
+
+	if (targ_btf_path)
+		targ_btf = btf__parse_elf(targ_btf_path, NULL);
+	else
+		targ_btf = bpf_core_find_kernel_btf();
+	if (IS_ERR(targ_btf)) {
+		pr_warning("failed to get target BTF: %ld\n",
+			   PTR_ERR(targ_btf));
+		return PTR_ERR(targ_btf);
+	}
+
+	cand_cache = hashmap__new(bpf_core_hash_fn, bpf_core_equal_fn, NULL);
+	if (IS_ERR(cand_cache)) {
+		err = PTR_ERR(cand_cache);
+		goto out;
+	}
+
+	seg = &obj->btf_ext->offset_reloc_info;
+	for_each_btf_ext_sec(seg, sec) {
+		sec_name = btf__name_by_offset(obj->btf, sec->sec_name_off);
+		if (str_is_empty(sec_name)) {
+			err = -EINVAL;
+			goto out;
+		}
+		prog = bpf_object__find_program_by_title(obj, sec_name);
+		if (!prog) {
+			pr_warning("failed to find program '%s' for CO-RE offset relocation\n",
+				   sec_name);
+			err = -EINVAL;
+			goto out;
+		}
+
+		pr_debug("prog '%s': performing %d CO-RE offset relocs\n",
+			 sec_name, sec->num_info);
+
+		for_each_btf_ext_rec(seg, sec, i, rec) {
+			err = bpf_core_reloc_offset(prog, rec, i, obj->btf,
+						    targ_btf, cand_cache);
+			if (err) {
+				pr_warning("prog '%s': relo #%d: failed to relocate: %d\n",
+					   sec_name, i, err);
+				goto out;
+			}
+		}
+	}
+
+out:
+	btf__free(targ_btf);
+	if (!IS_ERR_OR_NULL(cand_cache)) {
+		hashmap__for_each_entry(cand_cache, entry, i) {
+			bpf_core_free_cands(entry->value);
+		}
+		hashmap__free(cand_cache);
+	}
+	return err;
+}
+
+static int
+bpf_object__relocate_core(struct bpf_object *obj, const char *targ_btf_path)
+{
+	int err = 0;
+
+	if (obj->btf_ext->offset_reloc_info.len)
+		err = bpf_core_reloc_offsets(obj, targ_btf_path);
+
+	return err;
+}
+
 static int
 bpf_program__reloc_text(struct bpf_program *prog, struct bpf_object *obj,
 			struct reloc_desc *relo)
@@ -2396,14 +3243,21 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
 	return 0;
 }
 
-
 static int
-bpf_object__relocate(struct bpf_object *obj)
+bpf_object__relocate(struct bpf_object *obj, const char *targ_btf_path)
 {
 	struct bpf_program *prog;
 	size_t i;
 	int err;
 
+	if (obj->btf_ext) {
+		err = bpf_object__relocate_core(obj, targ_btf_path);
+		if (err) {
+			pr_warning("failed to perform CO-RE relocations: %d\n",
+				   err);
+			return err;
+		}
+	}
 	for (i = 0; i < obj->nr_programs; i++) {
 		prog = &obj->programs[i];
 
@@ -2804,7 +3658,7 @@ int bpf_object__load_xattr(struct bpf_object_load_attr *attr)
 	obj->loaded = true;
 
 	CHECK_ERR(bpf_object__create_maps(obj), err, out);
-	CHECK_ERR(bpf_object__relocate(obj), err, out);
+	CHECK_ERR(bpf_object__relocate(obj, attr->target_btf_path), err, out);
 	CHECK_ERR(bpf_object__load_progs(obj, attr->log_level), err, out);
 
 	return 0;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 5cbf459ece0b..6004bfe1ebf0 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -92,6 +92,7 @@ LIBBPF_API void bpf_object__close(struct bpf_object *object);
 struct bpf_object_load_attr {
 	struct bpf_object *obj;
 	int log_level;
+	const char *target_btf_path;
 };
 
 /* Load/unload object into/from kernel */
-- 
2.17.1


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox