* Re: pull-request: can 2019-07-24
From: David Miller @ 2019-07-24 21:15 UTC (permalink / raw)
To: mkl; +Cc: netdev, linux-can, kernel
In-Reply-To: <20190724130322.31702-1-mkl@pengutronix.de>
From: Marc Kleine-Budde <mkl@pengutronix.de>
Date: Wed, 24 Jul 2019 15:03:15 +0200
> this is a pull reqeust of 7 patches for net/master.
Pulled, thank you.
^ permalink raw reply
* Re: [PATCH] qlge: Fix build error without CONFIG_ETHERNET
From: David Miller @ 2019-07-24 21:12 UTC (permalink / raw)
To: yuehaibing
Cc: manishc, GR-Linux-NIC-Dev, gregkh, bpoirier, linux-kernel, devel,
netdev
In-Reply-To: <20190724.141202.10100086363454182.davem@davemloft.net>
From: David Miller <davem@davemloft.net>
Date: Wed, 24 Jul 2019 14:12:02 -0700 (PDT)
> From: YueHaibing <yuehaibing@huawei.com>
> Date: Wed, 24 Jul 2019 21:01:26 +0800
>
>> Now if CONFIG_ETHERNET is not set, QLGE driver
>> building fails:
>>
>> drivers/staging/qlge/qlge_main.o: In function `qlge_remove':
>> drivers/staging/qlge/qlge_main.c:4831: undefined reference to `unregister_netdev'
>>
>> Reported-by: Hulk Robot <hulkci@huawei.com>
>> Fixes: 955315b0dc8c ("qlge: Move drivers/net/ethernet/qlogic/qlge/ to drivers/staging/qlge/")
>> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
>
> I'll let Greg take this.
Actually, I take that back.
Since the move to staging happened in my tree I will take this ;-)
^ permalink raw reply
* Re: [PATCH] qlge: Fix build error without CONFIG_ETHERNET
From: David Miller @ 2019-07-24 21:12 UTC (permalink / raw)
To: yuehaibing
Cc: manishc, GR-Linux-NIC-Dev, gregkh, bpoirier, linux-kernel, devel,
netdev
In-Reply-To: <20190724130126.53532-1-yuehaibing@huawei.com>
From: YueHaibing <yuehaibing@huawei.com>
Date: Wed, 24 Jul 2019 21:01:26 +0800
> Now if CONFIG_ETHERNET is not set, QLGE driver
> building fails:
>
> drivers/staging/qlge/qlge_main.o: In function `qlge_remove':
> drivers/staging/qlge/qlge_main.c:4831: undefined reference to `unregister_netdev'
>
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Fixes: 955315b0dc8c ("qlge: Move drivers/net/ethernet/qlogic/qlge/ to drivers/staging/qlge/")
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
I'll let Greg take this.
^ permalink raw reply
* Re: [PATCH] r8169: fix a typo in a comment
From: David Miller @ 2019-07-24 21:11 UTC (permalink / raw)
To: corentinmusard; +Cc: netdev, linux-kernel, trivial
In-Reply-To: <20190724123443.GA9626@user.home>
From: Corentin Musard <corentinmusard@gmail.com>
Date: Wed, 24 Jul 2019 14:34:43 +0200
> Replace "additonal" by "additional" in a comment.
> Typo found by checkpatch.pl.
>
> Signed-off-by: Corentin Musard <corentinmusard@gmail.com>
Applied to net-next.
^ permalink raw reply
* Re: [PATCH v4 net-next 11/19] ionic: Add Rx filter and rx_mode ndo support
From: Saeed Mahameed @ 2019-07-24 21:10 UTC (permalink / raw)
To: snelson@pensando.io, netdev@vger.kernel.org, davem@davemloft.net
In-Reply-To: <dbc85d6d-b0cd-43d4-aded-88b482ecc477@pensando.io>
On Tue, 2019-07-23 at 17:07 -0700, Shannon Nelson wrote:
> On 7/23/19 4:20 PM, Saeed Mahameed wrote:
> > On Mon, 2019-07-22 at 14:40 -0700, Shannon Nelson wrote:
> > > @@ -607,6 +947,8 @@ static void ionic_lif_free(struct lif *lif)
> > > ionic_qcqs_free(lif);
> > > ionic_lif_reset(lif);
> > >
> > I don't think you want deferred.work running while reset is
> > executing..
> > cancel_work_sync should happen as early as you close the netdevice.
>
> Given the current implementation, it doesn't actually hurt anything,
> but
> yes it makes sense to move it up in the sequence.
>
> > I assume ionic_lif_reset will flush all configurations and you
> > don't
> > need to cleanup anything manually? or any data structure stored in
> > driver ?
>
> Most of the driver data structure cleaning has happened in
> ionic_lif_deinit() before getting here.
This means that yo do have a problem keeping the deferred.work running
after ionic_lif_deinit
>
> sln
>
^ permalink raw reply
* Re: Reminder: 99 open syzbot bugs in net subsystem
From: Eric Biggers @ 2019-07-24 21:09 UTC (permalink / raw)
To: David Miller
Cc: eric.dumazet, dvyukov, netdev, fw, i.maximets, edumazet, dsahern,
linux-kernel, syzkaller-bugs
In-Reply-To: <20190724.130928.1854327585456756387.davem@davemloft.net>
On Wed, Jul 24, 2019 at 01:09:28PM -0700, David Miller wrote:
> From: Eric Biggers <ebiggers@kernel.org>
> Date: Wed, 24 Jul 2019 11:37:12 -0700
>
> > We can argue about what words to use to describe this situation, but
> > it doesn't change the situation itself.
>
> And we should argue about those words because it matters to humans and
> effects how they feel, and humans ultimately fix these bugs.
>
> So please stop with the hyperbole.
>
> Thank you.
Okay, there are 151 bugs that syzbot saw on the mainline Linux kernel in the
last 7 days (90.1% with reproducers). Of those, 59 were reported over 3 months
ago (89.8% with reproducers). Of those, 12 were reported over a year ago (83.3%
with reproducers).
No opinion on whether those are small/medium/large numbers, in case it would
hurt someone's feelings.
These numbers do *not* include bugs that are still valid but weren't seen on
mainline in last 7 days, e.g.:
- Bugs that are seen only rarely, so by chance weren't seen in last 7 days.
- Bugs only in linux-next and/or subsystem branches.
- Bugs that were seen in mainline more than 7 days ago, and then only on
linux-next or subsystem branch in last 7 days.
- Bugs that stopped being seen due to a change in syzkaller.
- Bugs that stopped being seen due to a change in kernel config.
- Bugs that stopped being seen due to other environment changes such as kernel
command line parameters.
- Bugs that stopped being seen due to a kernel change that hid the bug but
didn't actually fix it, i.e. still reachable in other ways.
Eric
^ permalink raw reply
* [PATCH net-next] r8169: improve rtl_set_rx_mode
From: Heiner Kallweit @ 2019-07-24 21:04 UTC (permalink / raw)
To: Realtek linux nic maintainers, David Miller; +Cc: netdev@vger.kernel.org
This patch improves and simplifies rtl_set_rx_mode a little.
No functional change intended.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
drivers/net/ethernet/realtek/r8169_main.c | 52 ++++++++++-------------
1 file changed, 22 insertions(+), 30 deletions(-)
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index 9c743d2fc..c39d3a77c 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -61,7 +61,7 @@
/* Maximum number of multicast addresses to filter (vs. Rx-all-multicast).
The RTL chips use a 64 element hash table based on the Ethernet CRC. */
-static const int multicast_filter_limit = 32;
+#define MC_FILTER_LIMIT 32
#define TX_DMA_BURST 7 /* Maximum PCI burst, '7' is unlimited */
#define InterFrameGap 0x03 /* 3 means InterFrameGap = the shortest one */
@@ -4147,53 +4147,45 @@ static void rtl8169_set_magic_reg(struct rtl8169_private *tp, unsigned mac_versi
static void rtl_set_rx_mode(struct net_device *dev)
{
struct rtl8169_private *tp = netdev_priv(dev);
- u32 mc_filter[2]; /* Multicast hash filter */
- int rx_mode;
- u32 tmp = 0;
+ /* Multicast hash filter */
+ u32 mc_filter[2] = { 0xffffffff, 0xffffffff };
+ u32 rx_mode = AcceptBroadcast | AcceptMyPhys | AcceptMulticast;
+ u32 tmp;
if (dev->flags & IFF_PROMISC) {
/* Unconditionally log net taps. */
netif_notice(tp, link, dev, "Promiscuous mode enabled\n");
- rx_mode =
- AcceptBroadcast | AcceptMulticast | AcceptMyPhys |
- AcceptAllPhys;
- mc_filter[1] = mc_filter[0] = 0xffffffff;
- } else if ((netdev_mc_count(dev) > multicast_filter_limit) ||
- (dev->flags & IFF_ALLMULTI)) {
- /* Too many to filter perfectly -- accept all multicasts. */
- rx_mode = AcceptBroadcast | AcceptMulticast | AcceptMyPhys;
- mc_filter[1] = mc_filter[0] = 0xffffffff;
+ rx_mode |= AcceptAllPhys;
+ } else if (netdev_mc_count(dev) > MC_FILTER_LIMIT ||
+ dev->flags & IFF_ALLMULTI ||
+ tp->mac_version == RTL_GIGA_MAC_VER_35) {
+ /* accept all multicasts */
+ } else if (netdev_mc_empty(dev)) {
+ rx_mode &= ~AcceptMulticast;
} else {
struct netdev_hw_addr *ha;
- rx_mode = AcceptBroadcast | AcceptMyPhys;
mc_filter[1] = mc_filter[0] = 0;
netdev_for_each_mc_addr(ha, dev) {
- int bit_nr = ether_crc(ETH_ALEN, ha->addr) >> 26;
- mc_filter[bit_nr >> 5] |= 1 << (bit_nr & 31);
- rx_mode |= AcceptMulticast;
+ u32 bit_nr = ether_crc(ETH_ALEN, ha->addr) >> 26;
+ mc_filter[bit_nr >> 5] |= BIT(bit_nr & 31);
+ }
+
+ if (tp->mac_version > RTL_GIGA_MAC_VER_06) {
+ tmp = mc_filter[0];
+ mc_filter[0] = swab32(mc_filter[1]);
+ mc_filter[1] = swab32(tmp);
}
}
if (dev->features & NETIF_F_RXALL)
rx_mode |= (AcceptErr | AcceptRunt);
- tmp = (RTL_R32(tp, RxConfig) & ~RX_CONFIG_ACCEPT_MASK) | rx_mode;
-
- if (tp->mac_version > RTL_GIGA_MAC_VER_06) {
- u32 data = mc_filter[0];
-
- mc_filter[0] = swab32(mc_filter[1]);
- mc_filter[1] = swab32(data);
- }
-
- if (tp->mac_version == RTL_GIGA_MAC_VER_35)
- mc_filter[1] = mc_filter[0] = 0xffffffff;
-
RTL_W32(tp, MAR0 + 4, mc_filter[1]);
RTL_W32(tp, MAR0 + 0, mc_filter[0]);
- RTL_W32(tp, RxConfig, tmp);
+ tmp = RTL_R32(tp, RxConfig);
+ RTL_W32(tp, RxConfig, (tmp & ~RX_CONFIG_ACCEPT_MASK) | rx_mode);
}
DECLARE_RTL_COND(rtl_csiar_cond)
--
2.22.0
^ permalink raw reply related
* Re: [PATCH] ip6_gre: reload ipv6h in prepare_ip6gre_xmit_ipv6
From: David Miller @ 2019-07-24 21:00 UTC (permalink / raw)
To: yanhaishuang; +Cc: kuznet, netdev, linux-kernel, u9012063
In-Reply-To: <1563969642-11843-1-git-send-email-yanhaishuang@cmss.chinamobile.com>
From: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Date: Wed, 24 Jul 2019 20:00:42 +0800
> Since ip6_tnl_parse_tlv_enc_lim() can call pskb_may_pull()
> which may change skb->data, so we need to re-load ipv6h at
> the right place.
>
> Fixes: 898b29798e36 ("ip6_gre: Refactor ip6gre xmit codes")
> Cc: William Tu <u9012063@gmail.com>
> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Applied and queued up for -stable, thanks.
^ permalink raw reply
* Re: [PATCH] net/ipv4: cleanup error condition testing
From: David Miller @ 2019-07-24 20:57 UTC (permalink / raw)
To: pavel; +Cc: kuznet, yoshfuji, netdev, linux-kernel
In-Reply-To: <20190724100611.GA7516@amd>
From: Pavel Machek <pavel@ucw.cz>
Date: Wed, 24 Jul 2019 12:06:11 +0200
> Cleanup testing for error condition.
>
> Signed-off-by: Pavel Machek <pavel@denx.de>
Applied.
^ permalink raw reply
* Re: [PATCH mlx5-next] net/mlx5: Fix modify_cq_in alignment
From: Saeed Mahameed @ 2019-07-24 20:56 UTC (permalink / raw)
To: davem@davemloft.net, leon@kernel.org
Cc: Jason Gunthorpe, Yishai Hadas, netdev@vger.kernel.org,
linux-rdma@vger.kernel.org, dledford@redhat.com, Edward Srouji
In-Reply-To: <20190723190414.GU5125@mtr-leonro.mtl.com>
On Tue, 2019-07-23 at 22:04 +0300, Leon Romanovsky wrote:
> On Tue, Jul 23, 2019 at 11:28:50AM -0700, David Miller wrote:
> > From: Leon Romanovsky <leon@kernel.org>
> > Date: Tue, 23 Jul 2019 10:12:55 +0300
> >
> > > From: Edward Srouji <edwards@mellanox.com>
> > >
> > > Fix modify_cq_in alignment to match the device specification.
> > > After this fix the 'cq_umem_valid' field will be in the right
> > > offset.
> > >
> > > Cc: <stable@vger.kernel.org> # 4.19
> > > Fixes: bd37197554eb ("net/mlx5: Update mlx5_ifc with DEVX UID
> > > bits")
Leon, I applied this patch to my tree, it got marked for -stable 4.20
and not 4.19, i checked manually and indeed the offending patch came to
light only on 4.20
^ permalink raw reply
* Re: [PATCH] r8169: fix a typo in a comment
From: Heiner Kallweit @ 2019-07-24 20:53 UTC (permalink / raw)
To: Corentin Musard, David S. Miller, netdev, linux-kernel; +Cc: trivial
In-Reply-To: <20190724123443.GA9626@user.home>
On 24.07.2019 14:34, Corentin Musard wrote:
> Replace "additonal" by "additional" in a comment.
> Typo found by checkpatch.pl.
>
> Signed-off-by: Corentin Musard <corentinmusard@gmail.com>
> ---
> drivers/net/ethernet/realtek/r8169_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index 0637c6752a78..7231ab3573ff 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -6334,7 +6334,7 @@ rtl8169_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
> stats->multicast = dev->stats.multicast;
>
> /*
> - * Fetch additonal counter values missing in stats collected by driver
> + * Fetch additional counter values missing in stats collected by driver
> * from tally counters.
> */
> if (pm_runtime_active(&pdev->dev))
>
Should have been annotated net-next and sent also to me. Apart from that:
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
^ permalink raw reply
* Re: [PATCH bpf-next 1/6] bpf: add bpf_map_value_size and bp_map_copy_value helper functions
From: Song Liu @ 2019-07-24 20:53 UTC (permalink / raw)
To: Brian Vazquez
Cc: Brian Vazquez, Alexei Starovoitov, Daniel Borkmann,
David S . Miller, Stanislav Fomichev, Willem de Bruijn,
Petar Penkov, open list, Networking, bpf
In-Reply-To: <20190724165803.87470-2-brianvv@google.com>
On Wed, Jul 24, 2019 at 10:10 AM Brian Vazquez <brianvv@google.com> wrote:
>
> Move reusable code from map_lookup_elem to helper functions to avoid code
> duplication in kernel/bpf/syscall.c
>
> Suggested-by: Stanislav Fomichev <sdf@google.com>
> Signed-off-by: Brian Vazquez <brianvv@google.com>
Acked-by: Song Liu <songliubraving@fb.com>
Some very minor nits though.
> ---
> kernel/bpf/syscall.c | 134 +++++++++++++++++++++++--------------------
> 1 file changed, 73 insertions(+), 61 deletions(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 5d141f16f6fa9..86cdc2f7bb56e 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -126,6 +126,76 @@ static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
> return map;
> }
>
> +static u32 bpf_map_value_size(struct bpf_map *map)
> +{
> + if (map->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
> + map->map_type == BPF_MAP_TYPE_LRU_PERCPU_HASH ||
> + map->map_type == BPF_MAP_TYPE_PERCPU_ARRAY ||
> + map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE)
> + return round_up(map->value_size, 8) * num_possible_cpus();
> + else if (IS_FD_MAP(map))
> + return sizeof(u32);
> + else
> + return map->value_size;
^ extra space after return
> +}
> +
> +static int bpf_map_copy_value(struct bpf_map *map, void *key, void *value,
> + __u64 flags)
> +{
> + void *ptr;
> + int err;
> +
> + if (bpf_map_is_dev_bound(map))
> + return bpf_map_offload_lookup_elem(map, key, value);
^ another extra space after return, did replace? :-)
Thanks,
Song
^ permalink raw reply
* Re: [net-next 6/6] e1000e: disable force K1-off feature
From: Jeff Kirsher @ 2019-07-24 20:52 UTC (permalink / raw)
To: David Miller; +Cc: kai.heng.feng, netdev, nhorman, sassmann, aaron.f.brown
In-Reply-To: <20190723.140444.1126474066269131522.davem@davemloft.net>
[-- Attachment #1: Type: text/plain, Size: 1060 bytes --]
On Tue, 2019-07-23 at 14:04 -0700, David Miller wrote:
> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Date: Tue, 23 Jul 2019 10:36:50 -0700
>
> > diff --git a/drivers/net/ethernet/intel/e1000e/hw.h
> > b/drivers/net/ethernet/intel/e1000e/hw.h
> > index eff75bd8a8f0..e3c71fd093ee 100644
> > --- a/drivers/net/ethernet/intel/e1000e/hw.h
> > +++ b/drivers/net/ethernet/intel/e1000e/hw.h
> > @@ -662,6 +662,7 @@ struct e1000_dev_spec_ich8lan {
> > bool kmrn_lock_loss_workaround_enabled;
> > struct e1000_shadow_ram
> > shadow_ram[E1000_ICH8_SHADOW_RAM_WORDS];
> > bool nvm_k1_enabled;
> > + bool disable_k1_off;
> > bool eee_disable;
>
> I don't see any code actually setting this boolean, how does it work?
So either this patch is missing a bit of code OR this is not needed for
the Linux driver. Either way this patch needs changes or needs to be
dropped, so I am dropping this patch from the series until I get
further information from the client driver team.
I will be re-submitting this series, minus this patch.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH net-next 2/2] mlx4/en_netdev: call notifiers when hw_enc_features change
From: Saeed Mahameed @ 2019-07-24 20:47 UTC (permalink / raw)
To: dcaratti@redhat.com, davem@davemloft.net, Tariq Toukan,
netdev@vger.kernel.org
In-Reply-To: <e157af6e79d9385df37444d817cf3c166878c8f6.1563976690.git.dcaratti@redhat.com>
On Wed, 2019-07-24 at 16:02 +0200, Davide Caratti wrote:
> ensure to call netdev_features_change() when the driver flips its
> hw_enc_features bits.
>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
The patch is correct, but can you explain how did you come to this ?
did you encounter any issue with the current code ?
I am asking just because i think the whole dynamic changing of dev-
>hw_enc_features is redundant since mlx4 has the featutres_check
callback.
> ---
> .../net/ethernet/mellanox/mlx4/en_netdev.c | 39 ++++++++++++-----
> --
> 1 file changed, 26 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index 52500f744a0e..1b484dc6e1c2 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -2628,6 +2628,30 @@ static int mlx4_en_get_phys_port_id(struct
> net_device *dev,
> return 0;
> }
>
> +#define MLX4_GSO_PARTIAL_FEATURES (NETIF_F_IP_CSUM |
> NETIF_F_IPV6_CSUM | \
> + NETIF_F_RXCSUM | \
> + NETIF_F_TSO | NETIF_F_TSO6 | \
> + NETIF_F_GSO_UDP_TUNNEL | \
> + NETIF_F_GSO_UDP_TUNNEL_CSUM | \
> + NETIF_F_GSO_PARTIAL)
> +
> +static void mlx4_set_vxlan_offloads(struct net_device *dev, bool
> enable)
> +{
> + netdev_features_t hw_enc_features;
> +
> + rtnl_lock();
> + hw_enc_features = dev->hw_enc_features;
> + if (enable)
> + dev->hw_enc_features |= MLX4_GSO_PARTIAL_FEATURES;
> + else
> + dev->hw_enc_features &= ~MLX4_GSO_PARTIAL_FEATURES;
> +
> + if (hw_enc_features ^ dev->hw_enc_features)
> + netdev_features_change(dev);
> +
> + rtnl_unlock();
> +}
> +
> static void mlx4_en_add_vxlan_offloads(struct work_struct *work)
> {
> int ret;
> @@ -2647,12 +2671,7 @@ static void mlx4_en_add_vxlan_offloads(struct
> work_struct *work)
> }
>
> /* set offloads */
> - priv->dev->hw_enc_features |= NETIF_F_IP_CSUM |
> NETIF_F_IPV6_CSUM |
> - NETIF_F_RXCSUM |
> - NETIF_F_TSO | NETIF_F_TSO6 |
> - NETIF_F_GSO_UDP_TUNNEL |
> - NETIF_F_GSO_UDP_TUNNEL_CSUM |
> - NETIF_F_GSO_PARTIAL;
> + mlx4_set_vxlan_offloads(priv->dev, true);
> }
>
> static void mlx4_en_del_vxlan_offloads(struct work_struct *work)
> @@ -2661,13 +2680,7 @@ static void mlx4_en_del_vxlan_offloads(struct
> work_struct *work)
> struct mlx4_en_priv *priv = container_of(work, struct
> mlx4_en_priv,
> vxlan_del_task);
> /* unset offloads */
> - priv->dev->hw_enc_features &= ~(NETIF_F_IP_CSUM |
> NETIF_F_IPV6_CSUM |
> - NETIF_F_RXCSUM |
> - NETIF_F_TSO | NETIF_F_TSO6 |
> - NETIF_F_GSO_UDP_TUNNEL |
> - NETIF_F_GSO_UDP_TUNNEL_CSUM |
> - NETIF_F_GSO_PARTIAL);
> -
> + mlx4_set_vxlan_offloads(priv->dev, false);
> ret = mlx4_SET_PORT_VXLAN(priv->mdev->dev, priv->port,
> VXLAN_STEER_BY_OUTER_MAC, 0);
> if (ret)
^ permalink raw reply
* Re: [PATCH net-next 1/4] sctp: check addr_size with sa_family_t size in __sctp_setsockopt_connectx
From: Neil Horman @ 2019-07-24 20:43 UTC (permalink / raw)
To: Marcelo Ricardo Leitner; +Cc: Xin Long, network dev, linux-sctp, davem
In-Reply-To: <20190724191243.GA4063@localhost.localdomain>
On Wed, Jul 24, 2019 at 04:12:43PM -0300, Marcelo Ricardo Leitner wrote:
> On Wed, Jul 24, 2019 at 04:05:43PM -0300, Marcelo Ricardo Leitner wrote:
> > On Wed, Jul 24, 2019 at 02:44:56PM -0400, Neil Horman wrote:
> > > On Wed, Jul 24, 2019 at 09:49:07AM -0300, Marcelo Ricardo Leitner wrote:
> > > > On Wed, Jul 24, 2019 at 09:36:50AM -0300, Marcelo Ricardo Leitner wrote:
> > > > > On Wed, Jul 24, 2019 at 07:22:35AM -0400, Neil Horman wrote:
> > > > > > On Wed, Jul 24, 2019 at 03:21:12PM +0800, Xin Long wrote:
> > > > > > > On Tue, Jul 23, 2019 at 11:25 PM Neil Horman <nhorman@tuxdriver.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jul 23, 2019 at 01:37:57AM +0800, Xin Long wrote:
> > > > > > > > > Now __sctp_connect() is called by __sctp_setsockopt_connectx() and
> > > > > > > > > sctp_inet_connect(), the latter has done addr_size check with size
> > > > > > > > > of sa_family_t.
> > > > > > > > >
> > > > > > > > > In the next patch to clean up __sctp_connect(), we will remove
> > > > > > > > > addr_size check with size of sa_family_t from __sctp_connect()
> > > > > > > > > for the 1st address.
> > > > > > > > >
> > > > > > > > > So before doing that, __sctp_setsockopt_connectx() should do
> > > > > > > > > this check first, as sctp_inet_connect() does.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > > > > > > > > ---
> > > > > > > > > net/sctp/socket.c | 2 +-
> > > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > > > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > > > > > > > > index aa80cda..5f92e4a 100644
> > > > > > > > > --- a/net/sctp/socket.c
> > > > > > > > > +++ b/net/sctp/socket.c
> > > > > > > > > @@ -1311,7 +1311,7 @@ static int __sctp_setsockopt_connectx(struct sock *sk,
> > > > > > > > > pr_debug("%s: sk:%p addrs:%p addrs_size:%d\n",
> > > > > > > > > __func__, sk, addrs, addrs_size);
> > > > > > > > >
> > > > > > > > > - if (unlikely(addrs_size <= 0))
> > > > > > > > > + if (unlikely(addrs_size < sizeof(sa_family_t)))
> > > > > > > > I don't think this is what you want to check for here. sa_family_t is
> > > > > > > > an unsigned short, and addrs_size is the number of bytes in the addrs
> > > > > > > > array. The addrs array should be at least the size of one struct
> > > > > > > > sockaddr (16 bytes iirc), and, if larger, should be a multiple of
> > > > > > > > sizeof(struct sockaddr)
> > > > > > > sizeof(struct sockaddr) is not the right value to check either.
> > > > > > >
> > > > > > > The proper check will be done later in __sctp_connect():
> > > > > > >
> > > > > > > af = sctp_get_af_specific(daddr->sa.sa_family);
> > > > > > > if (!af || af->sockaddr_len > addrs_size)
> > > > > > > return -EINVAL;
> > > > > > >
> > > > > > > So the check 'addrs_size < sizeof(sa_family_t)' in this patch is
> > > > > > > just to make sure daddr->sa.sa_family is accessible. the same
> > > > > > > check is also done in sctp_inet_connect().
> > > > > > >
> > > > > > That doesn't make much sense, if the proper check is done in __sctp_connect with
> > > > > > the size of the families sockaddr_len, then we don't need this check at all, we
> > > > > > can just let memdup_user take the fault on copy_to_user and return -EFAULT. If
> > > > > > we get that from memdup_user, we know its not accessible, and can bail out.
> > > > > >
> > > > > > About the only thing we need to check for here is that addr_len isn't some
> > > > > > absurdly high value (i.e. a negative value), so that we avoid trying to kmalloc
> > > > > > upwards of 2G in memdup_user. Your change does that just fine, but its no
> > > > > > better or worse than checking for <=0
> > > > >
> > > > > One can argue that such check against absurdly high values is random
> > > > > and not effective, as 2G can be somewhat reasonable on 8GB systems but
> > > > > certainly isn't on 512MB ones. On that, kmemdup_user() will also fail
> > > > > gracefully as it uses GFP_USER and __GFP_NOWARN.
> > > > >
> > > > > The original check is more for protecting for sane usage of the
> > > > > variable, which is an int, and a negative value is questionable. We
> > > > > could cast, yes, but.. was that really the intent of the application?
> > > > > Probably not.
> > > >
> > > > Though that said, I'm okay with the new check here: a quick sanity
> > > > check that can avoid expensive calls to kmalloc(), while more refined
> > > > check is done later on.
> > > >
> > > I agree a sanity check makes sense, just to avoid allocating a huge value
> > > (even 2G is absurd on many systems), however, I'm not super comfortable with
> > > checking for the value being less than 16 (sizeof(sa_family_t)). The zero check
> >
> > 16 bits you mean then, per
> > include/uapi/linux/socket.h
> > typedef unsigned short __kernel_sa_family_t;
> > include/linux/socket.h
> > typedef __kernel_sa_family_t sa_family_t;
> >
> > > is fairly obvious given the signed nature of the lengh field, this check makes
> > > me wonder what exactly we are checking for.
> >
> > A minimum viable buffer without doing more extensive tests. Beyond
> > sa_family, we need to parse sa_family and then that's left for later.
> > Perhaps a comment helps, something like
> > /* Check if we have at least the family type in there */
> > ?
>
> Hm, then this could be
> - if (unlikely(addrs_size <= 0))
> + if (unlikely(addrs_size < sizeof(struct sockaddr_in)))
> (ipv4)
> As it can't be smaller than that, always.
>
True, but I think perhaps just the family type size check is more correct, as
thats the minimal information we need to get the proper sockaddr_len out of
sctp_get_af_specific.
Neil
> >
> > Marcelo
> >
> > >
> > > Neil
> > >
> > > > >
> > > > > >
> > > > > > Neil
> > > > > >
> > > > > > > >
> > > > > > > > Neil
> > > > > > > >
> > > > > > > > > return -EINVAL;
> > > > > > > > >
> > > > > > > > > kaddrs = memdup_user(addrs, addrs_size);
> > > > > > > > > --
> > > > > > > > > 2.1.0
> > > > > > > > >
> > > > > > > > >
> > > > > > >
> > > >
> >
>
^ permalink raw reply
* Re: [PATCH bpf-next] libbpf: silence GCC8 warning about string truncation
From: Song Liu @ 2019-07-24 20:43 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Kernel Team, Magnus Karlsson
In-Reply-To: <20190723220127.1815913-1-andriin@fb.com>
On Tue, Jul 23, 2019 at 7:33 PM Andrii Nakryiko <andriin@fb.com> wrote:
>
> Despite a proper NULL-termination after strncpy(..., ..., IFNAMSIZ - 1),
> GCC8 still complains about *expected* string truncation:
>
> xsk.c:330:2: error: 'strncpy' output may be truncated copying 15 bytes
> from a string of length 15 [-Werror=stringop-truncation]
> strncpy(ifr.ifr_name, xsk->ifname, IFNAMSIZ - 1);
>
> This patch gets rid of the issue altogether by using memcpy instead.
> There is no performance regression, as strncpy will still copy and fill
> all of the bytes anyway.
>
> Cc: Magnus Karlsson <magnus.karlsson@intel.com>
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
^ permalink raw reply
* Re: [PATCH net-next 1/4] sctp: check addr_size with sa_family_t size in __sctp_setsockopt_connectx
From: Neil Horman @ 2019-07-24 20:41 UTC (permalink / raw)
To: Marcelo Ricardo Leitner; +Cc: Xin Long, network dev, linux-sctp, davem
In-Reply-To: <20190724190543.GH6204@localhost.localdomain>
On Wed, Jul 24, 2019 at 04:05:43PM -0300, Marcelo Ricardo Leitner wrote:
> On Wed, Jul 24, 2019 at 02:44:56PM -0400, Neil Horman wrote:
> > On Wed, Jul 24, 2019 at 09:49:07AM -0300, Marcelo Ricardo Leitner wrote:
> > > On Wed, Jul 24, 2019 at 09:36:50AM -0300, Marcelo Ricardo Leitner wrote:
> > > > On Wed, Jul 24, 2019 at 07:22:35AM -0400, Neil Horman wrote:
> > > > > On Wed, Jul 24, 2019 at 03:21:12PM +0800, Xin Long wrote:
> > > > > > On Tue, Jul 23, 2019 at 11:25 PM Neil Horman <nhorman@tuxdriver.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jul 23, 2019 at 01:37:57AM +0800, Xin Long wrote:
> > > > > > > > Now __sctp_connect() is called by __sctp_setsockopt_connectx() and
> > > > > > > > sctp_inet_connect(), the latter has done addr_size check with size
> > > > > > > > of sa_family_t.
> > > > > > > >
> > > > > > > > In the next patch to clean up __sctp_connect(), we will remove
> > > > > > > > addr_size check with size of sa_family_t from __sctp_connect()
> > > > > > > > for the 1st address.
> > > > > > > >
> > > > > > > > So before doing that, __sctp_setsockopt_connectx() should do
> > > > > > > > this check first, as sctp_inet_connect() does.
> > > > > > > >
> > > > > > > > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > > > > > > > ---
> > > > > > > > net/sctp/socket.c | 2 +-
> > > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > > > > > > > index aa80cda..5f92e4a 100644
> > > > > > > > --- a/net/sctp/socket.c
> > > > > > > > +++ b/net/sctp/socket.c
> > > > > > > > @@ -1311,7 +1311,7 @@ static int __sctp_setsockopt_connectx(struct sock *sk,
> > > > > > > > pr_debug("%s: sk:%p addrs:%p addrs_size:%d\n",
> > > > > > > > __func__, sk, addrs, addrs_size);
> > > > > > > >
> > > > > > > > - if (unlikely(addrs_size <= 0))
> > > > > > > > + if (unlikely(addrs_size < sizeof(sa_family_t)))
> > > > > > > I don't think this is what you want to check for here. sa_family_t is
> > > > > > > an unsigned short, and addrs_size is the number of bytes in the addrs
> > > > > > > array. The addrs array should be at least the size of one struct
> > > > > > > sockaddr (16 bytes iirc), and, if larger, should be a multiple of
> > > > > > > sizeof(struct sockaddr)
> > > > > > sizeof(struct sockaddr) is not the right value to check either.
> > > > > >
> > > > > > The proper check will be done later in __sctp_connect():
> > > > > >
> > > > > > af = sctp_get_af_specific(daddr->sa.sa_family);
> > > > > > if (!af || af->sockaddr_len > addrs_size)
> > > > > > return -EINVAL;
> > > > > >
> > > > > > So the check 'addrs_size < sizeof(sa_family_t)' in this patch is
> > > > > > just to make sure daddr->sa.sa_family is accessible. the same
> > > > > > check is also done in sctp_inet_connect().
> > > > > >
> > > > > That doesn't make much sense, if the proper check is done in __sctp_connect with
> > > > > the size of the families sockaddr_len, then we don't need this check at all, we
> > > > > can just let memdup_user take the fault on copy_to_user and return -EFAULT. If
> > > > > we get that from memdup_user, we know its not accessible, and can bail out.
> > > > >
> > > > > About the only thing we need to check for here is that addr_len isn't some
> > > > > absurdly high value (i.e. a negative value), so that we avoid trying to kmalloc
> > > > > upwards of 2G in memdup_user. Your change does that just fine, but its no
> > > > > better or worse than checking for <=0
> > > >
> > > > One can argue that such check against absurdly high values is random
> > > > and not effective, as 2G can be somewhat reasonable on 8GB systems but
> > > > certainly isn't on 512MB ones. On that, kmemdup_user() will also fail
> > > > gracefully as it uses GFP_USER and __GFP_NOWARN.
> > > >
> > > > The original check is more for protecting for sane usage of the
> > > > variable, which is an int, and a negative value is questionable. We
> > > > could cast, yes, but.. was that really the intent of the application?
> > > > Probably not.
> > >
> > > Though that said, I'm okay with the new check here: a quick sanity
> > > check that can avoid expensive calls to kmalloc(), while more refined
> > > check is done later on.
> > >
> > I agree a sanity check makes sense, just to avoid allocating a huge value
> > (even 2G is absurd on many systems), however, I'm not super comfortable with
> > checking for the value being less than 16 (sizeof(sa_family_t)). The zero check
>
> 16 bits you mean then, per
> include/uapi/linux/socket.h
> typedef unsigned short __kernel_sa_family_t;
> include/linux/socket.h
> typedef __kernel_sa_family_t sa_family_t;
>
> > is fairly obvious given the signed nature of the lengh field, this check makes
> > me wonder what exactly we are checking for.
>
> A minimum viable buffer without doing more extensive tests. Beyond
> sa_family, we need to parse sa_family and then that's left for later.
> Perhaps a comment helps, something like
> /* Check if we have at least the family type in there */
> ?
>
Yeah, I'd be ok with something like that, or perhaps:
/* Check if we have at least the family value in buffer, so get_af_specific can
* do a proper size check in __sctp_connect
*/
Neil
> Marcelo
>
> >
> > Neil
> >
> > > >
> > > > >
> > > > > Neil
> > > > >
> > > > > > >
> > > > > > > Neil
> > > > > > >
> > > > > > > > return -EINVAL;
> > > > > > > >
> > > > > > > > kaddrs = memdup_user(addrs, addrs_size);
> > > > > > > > --
> > > > > > > > 2.1.0
> > > > > > > >
> > > > > > > >
> > > > > >
> > >
>
^ permalink raw reply
* Re: [PATCH net-next 1/2] mlx4/en_netdev: update vlan features with tunnel offloads
From: Saeed Mahameed @ 2019-07-24 20:39 UTC (permalink / raw)
To: dcaratti@redhat.com, davem@davemloft.net, Tariq Toukan,
netdev@vger.kernel.org
In-Reply-To: <c3ccd45ee8742fb1a35fcfd41955907329e8112b.1563976690.git.dcaratti@redhat.com>
On Wed, 2019-07-24 at 16:02 +0200, Davide Caratti wrote:
> ConnectX-3 Pro can offload transmission of VLAN packets with VXLAN
> inside:
> enable tunnel offloads in dev->vlan_features, like it's done with
> other
> NIC drivers (e.g. be2net and ixgbe).
>
> Signed-off-by: Davide Caratti <dcaratti@redhat.com>
LGTM
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
^ permalink raw reply
* [PATCH net-next 1/3] net: ethernet: mediatek: Add basic PHYLINK support
From: René van Dorst @ 2019-07-24 19:23 UTC (permalink / raw)
To: netdev
Cc: frank-w, sean.wang, f.fainelli, linux, davem, matthias.bgg,
andrew, vivien.didelot, john, linux-mediatek, linux-mips, robh+dt,
devicetree, René van Dorst
This convert the basics to PHYLINK API.
SGMII support is not in this patch.
Signed-off-by: René van Dorst <opensource@vdorst.com>
Tested-by: Frank Wunderlich <frank-w@public-files.de>
---
drivers/net/ethernet/mediatek/Kconfig | 2 +-
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 404 +++++++++++---------
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 30 +-
3 files changed, 253 insertions(+), 183 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/Kconfig b/drivers/net/ethernet/mediatek/Kconfig
index 263cd0909fe0..e20f04d17b49 100644
--- a/drivers/net/ethernet/mediatek/Kconfig
+++ b/drivers/net/ethernet/mediatek/Kconfig
@@ -10,7 +10,7 @@ if NET_VENDOR_MEDIATEK
config NET_MEDIATEK_SOC
tristate "MediaTek SoC Gigabit Ethernet support"
depends on NET_VENDOR_MEDIATEK
- select PHYLIB
+ select PHYLINK
---help---
This driver supports the gigabit ethernet MACs in the
MediaTek SoC family.
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 597e377d8f20..853929070cb3 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -18,6 +18,7 @@
#include <linux/tcp.h>
#include <linux/interrupt.h>
#include <linux/pinctrl/devinfo.h>
+#include <linux/phylink.h>
#include "mtk_eth_soc.h"
@@ -186,165 +187,219 @@ static void mtk_gmac0_rgmii_adjust(struct mtk_eth *eth, int speed)
mtk_w32(eth, val, TRGMII_TCK_CTRL);
}
-static void mtk_phy_link_adjust(struct net_device *dev)
+static void mtk_mac_config(struct phylink_config *config, unsigned int mode,
+ const struct phylink_link_state *state)
{
- struct mtk_mac *mac = netdev_priv(dev);
- u16 lcl_adv = 0, rmt_adv = 0;
- u8 flowctrl;
- u32 mcr = MAC_MCR_MAX_RX_1536 | MAC_MCR_IPG_CFG |
- MAC_MCR_FORCE_MODE | MAC_MCR_TX_EN |
- MAC_MCR_RX_EN | MAC_MCR_BACKOFF_EN |
- MAC_MCR_BACKPR_EN;
+ struct mtk_mac *mac = container_of(config, struct mtk_mac,
+ phylink_config);
+ struct mtk_eth *eth = mac->hw;
- if (unlikely(test_bit(MTK_RESETTING, &mac->hw->state)))
- return;
+ u32 ge_mode = 0, val, mcr_cur, mcr_new;
+
+ if (mac->interface != state->interface) {
+ /* Setup soc pin functions */
+ switch (state->interface) {
+ case PHY_INTERFACE_MODE_TRGMII:
+ if (mac->id)
+ goto err_phy;
+ if (!MTK_HAS_CAPS(mac->hw->soc->caps,
+ MTK_GMAC1_TRGMII))
+ goto err_phy;
+ /* fall through */
+ case PHY_INTERFACE_MODE_RGMII_TXID:
+ case PHY_INTERFACE_MODE_RGMII_RXID:
+ case PHY_INTERFACE_MODE_RGMII_ID:
+ case PHY_INTERFACE_MODE_RGMII:
+ break;
+ case PHY_INTERFACE_MODE_MII:
+ ge_mode = 1;
+ break;
+ case PHY_INTERFACE_MODE_REVMII:
+ ge_mode = 2;
+ break;
+ case PHY_INTERFACE_MODE_RMII:
+ if (mac->id)
+ goto err_phy;
+ ge_mode = 3;
+ break;
+ default:
+ goto err_phy;
+ }
+
+ /* Setup clock for 1st gmac */
+ if (!mac->id &&
+ MTK_HAS_CAPS(mac->hw->soc->caps, MTK_GMAC1_TRGMII)) {
+ if (MTK_HAS_CAPS(mac->hw->soc->caps,
+ MTK_TRGMII_MT7621_CLK)) {
+ if (mt7621_gmac0_rgmii_adjust(mac->hw,
+ state->interface))
+ goto err_phy;
+ } else {
+ if (state->interface !=
+ PHY_INTERFACE_MODE_TRGMII)
+ mtk_gmac0_rgmii_adjust(mac->hw,
+ state->speed);
+ }
+ }
- switch (dev->phydev->speed) {
+ /* put the gmac into the right mode */
+ regmap_read(eth->ethsys, ETHSYS_SYSCFG0, &val);
+ val &= ~SYSCFG0_GE_MODE(SYSCFG0_GE_MASK, mac->id);
+ val |= SYSCFG0_GE_MODE(ge_mode, mac->id);
+ regmap_write(eth->ethsys, ETHSYS_SYSCFG0, val);
+
+ mac->interface = state->interface;
+ }
+
+ /* Setup gmac */
+ mcr_cur = mtk_r32(mac->hw, MTK_MAC_MCR(mac->id));
+ mcr_new = mcr_cur;
+ mcr_new &= ~(MAC_MCR_SPEED_100 | MAC_MCR_SPEED_1000 |
+ MAC_MCR_FORCE_DPX | MAC_MCR_FORCE_TX_FC |
+ MAC_MCR_FORCE_RX_FC);
+ mcr_new |= MAC_MCR_MAX_RX_1536 | MAC_MCR_IPG_CFG | MAC_MCR_FORCE_MODE |
+ MAC_MCR_BACKOFF_EN | MAC_MCR_BACKPR_EN | MAC_MCR_FORCE_LINK;
+
+ switch (state->speed) {
case SPEED_1000:
- mcr |= MAC_MCR_SPEED_1000;
+ mcr_new |= MAC_MCR_SPEED_1000;
break;
case SPEED_100:
- mcr |= MAC_MCR_SPEED_100;
+ mcr_new |= MAC_MCR_SPEED_100;
break;
}
-
- if (MTK_HAS_CAPS(mac->hw->soc->caps, MTK_GMAC1_TRGMII) && !mac->id) {
- if (MTK_HAS_CAPS(mac->hw->soc->caps, MTK_TRGMII_MT7621_CLK)) {
- if (mt7621_gmac0_rgmii_adjust(mac->hw,
- dev->phydev->interface))
- return;
- } else {
- if (!mac->trgmii)
- mtk_gmac0_rgmii_adjust(mac->hw,
- dev->phydev->speed);
- }
+ if (state->duplex == DUPLEX_FULL) {
+ mcr_new |= MAC_MCR_FORCE_DPX;
+ if (state->pause & MLO_PAUSE_TX)
+ mcr_new |= MAC_MCR_FORCE_TX_FC;
+ if (state->pause & MLO_PAUSE_RX)
+ mcr_new |= MAC_MCR_FORCE_RX_FC;
}
- if (dev->phydev->link)
- mcr |= MAC_MCR_FORCE_LINK;
+ /* Only update control register when needed! */
+ if (mcr_new != mcr_cur)
+ mtk_w32(mac->hw, mcr_new, MTK_MAC_MCR(mac->id));
- if (dev->phydev->duplex) {
- mcr |= MAC_MCR_FORCE_DPX;
+ return;
- if (dev->phydev->pause)
- rmt_adv = LPA_PAUSE_CAP;
- if (dev->phydev->asym_pause)
- rmt_adv |= LPA_PAUSE_ASYM;
+err_phy:
+ dev_err(eth->dev, "%s: GMAC%d mode %s not supported!\n", __func__,
+ mac->id, phy_modes(state->interface));
+}
+
+static int mtk_mac_link_state(struct phylink_config *config,
+ struct phylink_link_state *state)
+{
+ struct mtk_mac *mac = container_of(config, struct mtk_mac,
+ phylink_config);
+ u32 pmsr;
- lcl_adv = linkmode_adv_to_lcl_adv_t(dev->phydev->advertising);
- flowctrl = mii_resolve_flowctrl_fdx(lcl_adv, rmt_adv);
+ pmsr = mtk_r32(mac->hw, MTK_MAC_MSR(mac->id));
- if (flowctrl & FLOW_CTRL_TX)
- mcr |= MAC_MCR_FORCE_TX_FC;
- if (flowctrl & FLOW_CTRL_RX)
- mcr |= MAC_MCR_FORCE_RX_FC;
+ state->link = (pmsr & MAC_MSR_LINK);
+ state->duplex = (pmsr & MAC_MSR_DPX) >> 1;
- netif_dbg(mac->hw, link, dev, "rx pause %s, tx pause %s\n",
- flowctrl & FLOW_CTRL_RX ? "enabled" : "disabled",
- flowctrl & FLOW_CTRL_TX ? "enabled" : "disabled");
+ switch (pmsr & (MAC_MSR_SPEED_1000 | MAC_MSR_SPEED_100)) {
+ case 0:
+ state->speed = SPEED_10;
+ break;
+ case MAC_MSR_SPEED_100:
+ state->speed = SPEED_100;
+ break;
+ case MAC_MSR_SPEED_1000:
+ state->speed = SPEED_1000;
+ break;
+ default:
+ state->speed = SPEED_UNKNOWN;
+ break;
}
- mtk_w32(mac->hw, mcr, MTK_MAC_MCR(mac->id));
+ state->pause = 0;
+ if (pmsr & MAC_MSR_RX_FC)
+ state->pause |= MLO_PAUSE_RX;
+ if (pmsr & MAC_MSR_TX_FC)
+ state->pause |= MLO_PAUSE_TX;
- if (!of_phy_is_fixed_link(mac->of_node))
- phy_print_status(dev->phydev);
+ return 1;
}
-static int mtk_phy_connect_node(struct mtk_eth *eth, struct mtk_mac *mac,
- struct device_node *phy_node)
+static void mtk_mac_an_restart(struct phylink_config *config)
{
- struct phy_device *phydev;
- int phy_mode;
+ /* Do nothing */
+}
- phy_mode = of_get_phy_mode(phy_node);
- if (phy_mode < 0) {
- dev_err(eth->dev, "incorrect phy-mode %d\n", phy_mode);
- return -EINVAL;
- }
+static void mtk_mac_link_down(struct phylink_config *config, unsigned int mode,
+ phy_interface_t interface)
+{
+ struct mtk_mac *mac = container_of(config, struct mtk_mac,
+ phylink_config);
- phydev = of_phy_connect(eth->netdev[mac->id], phy_node,
- mtk_phy_link_adjust, 0, phy_mode);
- if (!phydev) {
- dev_err(eth->dev, "could not connect to PHY\n");
- return -ENODEV;
- }
+ mtk_w32(mac->hw, 0x8000, MTK_MAC_MCR(mac->id));
+}
- dev_info(eth->dev,
- "connected mac %d to PHY at %s [uid=%08x, driver=%s]\n",
- mac->id, phydev_name(phydev), phydev->phy_id,
- phydev->drv->name);
+static void mtk_mac_link_up(struct phylink_config *config, unsigned int mode,
+ phy_interface_t interface,
+ struct phy_device *phy)
+{
+ struct mtk_mac *mac = container_of(config, struct mtk_mac,
+ phylink_config);
+ u32 mcr;
- return 0;
+ mcr = mtk_r32(mac->hw, MTK_MAC_MCR(mac->id));
+ mcr |= MAC_MCR_TX_EN | MAC_MCR_RX_EN;
+ mtk_w32(mac->hw, mcr, MTK_MAC_MCR(mac->id));
}
-static int mtk_phy_connect(struct net_device *dev)
+static void mtk_validate(struct phylink_config *config,
+ unsigned long *supported,
+ struct phylink_link_state *state)
{
- struct mtk_mac *mac = netdev_priv(dev);
- struct mtk_eth *eth;
- struct device_node *np;
- u32 val;
- int err;
-
- eth = mac->hw;
- np = of_parse_phandle(mac->of_node, "phy-handle", 0);
- if (!np && of_phy_is_fixed_link(mac->of_node))
- if (!of_phy_register_fixed_link(mac->of_node))
- np = of_node_get(mac->of_node);
- if (!np)
- return -ENODEV;
+ __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
+ struct mtk_mac *mac = container_of(config, struct mtk_mac,
+ phylink_config);
- err = mtk_setup_hw_path(eth, mac->id, of_get_phy_mode(np));
- if (err)
- goto err_phy;
-
- mac->ge_mode = 0;
- switch (of_get_phy_mode(np)) {
- case PHY_INTERFACE_MODE_TRGMII:
- mac->trgmii = true;
- case PHY_INTERFACE_MODE_RGMII_TXID:
- case PHY_INTERFACE_MODE_RGMII_RXID:
- case PHY_INTERFACE_MODE_RGMII_ID:
- case PHY_INTERFACE_MODE_RGMII:
- case PHY_INTERFACE_MODE_SGMII:
- break;
- case PHY_INTERFACE_MODE_MII:
- case PHY_INTERFACE_MODE_GMII:
- mac->ge_mode = 1;
- break;
- case PHY_INTERFACE_MODE_REVMII:
- mac->ge_mode = 2;
- break;
- case PHY_INTERFACE_MODE_RMII:
- if (!mac->id)
- goto err_phy;
- mac->ge_mode = 3;
- break;
- default:
- goto err_phy;
+ if (state->interface != PHY_INTERFACE_MODE_NA &&
+ state->interface != PHY_INTERFACE_MODE_MII &&
+ !(!mac->id && state->interface == PHY_INTERFACE_MODE_TRGMII &&
+ MTK_HAS_CAPS(mac->hw->soc->caps, MTK_TRGMII)) &&
+ !phy_interface_mode_is_rgmii(state->interface)) {
+ linkmode_zero(supported);
+ return;
}
- /* put the gmac into the right mode */
- regmap_read(eth->ethsys, ETHSYS_SYSCFG0, &val);
- val &= ~SYSCFG0_GE_MODE(SYSCFG0_GE_MASK, mac->id);
- val |= SYSCFG0_GE_MODE(mac->ge_mode, mac->id);
- regmap_write(eth->ethsys, ETHSYS_SYSCFG0, val);
+ phylink_set_port_modes(mask);
+ phylink_set(mask, Autoneg);
- /* couple phydev to net_device */
- if (mtk_phy_connect_node(eth, mac, np))
- goto err_phy;
-
- of_node_put(np);
+ if (state->interface == PHY_INTERFACE_MODE_TRGMII) {
+ phylink_set(mask, 1000baseT_Full);
+ } else {
+ phylink_set(mask, 10baseT_Half);
+ phylink_set(mask, 10baseT_Full);
+ phylink_set(mask, 100baseT_Half);
+ phylink_set(mask, 100baseT_Full);
+
+ if (state->interface != PHY_INTERFACE_MODE_MII) {
+ phylink_set(mask, 1000baseT_Half);
+ phylink_set(mask, 1000baseT_Full);
+ }
+ }
- return 0;
+ phylink_set(mask, Pause);
+ phylink_set(mask, Asym_Pause);
-err_phy:
- if (of_phy_is_fixed_link(mac->of_node))
- of_phy_deregister_fixed_link(mac->of_node);
- of_node_put(np);
- dev_err(eth->dev, "%s: invalid phy\n", __func__);
- return -EINVAL;
+ linkmode_and(supported, supported, mask);
+ linkmode_and(state->advertising, state->advertising, mask);
}
+static const struct phylink_mac_ops mtk_phylink_ops = {
+ .validate = mtk_validate,
+ .mac_link_state = mtk_mac_link_state,
+ .mac_an_restart = mtk_mac_an_restart,
+ .mac_config = mtk_mac_config,
+ .mac_link_down = mtk_mac_link_down,
+ .mac_link_up = mtk_mac_link_up,
+};
+
static int mtk_mdio_init(struct mtk_eth *eth)
{
struct device_node *mii_np;
@@ -1798,6 +1853,13 @@ static int mtk_open(struct net_device *dev)
{
struct mtk_mac *mac = netdev_priv(dev);
struct mtk_eth *eth = mac->hw;
+ int err = phylink_of_phy_connect(mac->phylink, mac->of_node, 0);
+
+ if (err) {
+ netdev_err(dev, "%s: could not attach PHY: %d\n", __func__,
+ err);
+ return err;
+ }
/* we run 2 netdevs on the same dma ring so we only bring it up once */
if (!refcount_read(ð->dma_refcnt)) {
@@ -1815,7 +1877,7 @@ static int mtk_open(struct net_device *dev)
else
refcount_inc(ð->dma_refcnt);
- phy_start(dev->phydev);
+ phylink_start(mac->phylink);
netif_start_queue(dev);
return 0;
@@ -1849,8 +1911,11 @@ static int mtk_stop(struct net_device *dev)
struct mtk_mac *mac = netdev_priv(dev);
struct mtk_eth *eth = mac->hw;
+ phylink_stop(mac->phylink);
+
netif_tx_disable(dev);
- phy_stop(dev->phydev);
+
+ phylink_disconnect_phy(mac->phylink);
/* only shutdown DMA if this is the last user */
if (!refcount_dec_and_test(ð->dma_refcnt))
@@ -1926,15 +1991,6 @@ static int mtk_hw_init(struct mtk_eth *eth)
ethsys_reset(eth, RSTCTRL_FE);
ethsys_reset(eth, RSTCTRL_PPE);
- regmap_read(eth->ethsys, ETHSYS_SYSCFG0, &val);
- for (i = 0; i < MTK_MAC_COUNT; i++) {
- if (!eth->mac[i])
- continue;
- val &= ~SYSCFG0_GE_MODE(SYSCFG0_GE_MASK, eth->mac[i]->id);
- val |= SYSCFG0_GE_MODE(eth->mac[i]->ge_mode, eth->mac[i]->id);
- }
- regmap_write(eth->ethsys, ETHSYS_SYSCFG0, val);
-
if (eth->pctl) {
/* Set GE2 driving and slew rate */
regmap_write(eth->pctl, GPIO_DRV_SEL10, 0xa00);
@@ -1979,7 +2035,7 @@ static int mtk_hw_init(struct mtk_eth *eth)
mtk_w32(eth, MTK_RX_DONE_INT, MTK_QDMA_INT_GRP2);
mtk_w32(eth, 0x21021000, MTK_FE_INT_GRP);
- for (i = 0; i < 2; i++) {
+ for (i = 0; i < MTK_MAC_COUNT; i++) {
u32 val = mtk_r32(eth, MTK_GDMA_FWD_CFG(i));
/* setup the forward port to send frame to PDMA */
@@ -2031,7 +2087,7 @@ static int __init mtk_init(struct net_device *dev)
dev->dev_addr);
}
- return mtk_phy_connect(dev);
+ return 0;
}
static void mtk_uninit(struct net_device *dev)
@@ -2039,20 +2095,20 @@ static void mtk_uninit(struct net_device *dev)
struct mtk_mac *mac = netdev_priv(dev);
struct mtk_eth *eth = mac->hw;
- phy_disconnect(dev->phydev);
- if (of_phy_is_fixed_link(mac->of_node))
- of_phy_deregister_fixed_link(mac->of_node);
+ phylink_disconnect_phy(mac->phylink);
mtk_tx_irq_disable(eth, ~0);
mtk_rx_irq_disable(eth, ~0);
}
static int mtk_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
{
+ struct mtk_mac *mac = netdev_priv(dev);
+
switch (cmd) {
case SIOCGMIIPHY:
case SIOCGMIIREG:
case SIOCSMIIREG:
- return phy_mii_ioctl(dev->phydev, ifr, cmd);
+ return phylink_mii_ioctl(mac->phylink, ifr, cmd);
default:
break;
}
@@ -2093,16 +2149,6 @@ static void mtk_pending_work(struct work_struct *work)
eth->dev->pins->default_state);
mtk_hw_init(eth);
- for (i = 0; i < MTK_MAC_COUNT; i++) {
- if (!eth->mac[i] ||
- of_phy_is_fixed_link(eth->mac[i]->of_node))
- continue;
- err = phy_init_hw(eth->netdev[i]->phydev);
- if (err)
- dev_err(eth->dev, "%s: PHY init failed.\n",
- eth->netdev[i]->name);
- }
-
/* restart DMA and enable IRQs */
for (i = 0; i < MTK_MAC_COUNT; i++) {
if (!test_bit(i, &restart))
@@ -2165,9 +2211,7 @@ static int mtk_get_link_ksettings(struct net_device *ndev,
if (unlikely(test_bit(MTK_RESETTING, &mac->hw->state)))
return -EBUSY;
- phy_ethtool_ksettings_get(ndev->phydev, cmd);
-
- return 0;
+ return phylink_ethtool_ksettings_get(mac->phylink, cmd);
}
static int mtk_set_link_ksettings(struct net_device *ndev,
@@ -2178,7 +2222,7 @@ static int mtk_set_link_ksettings(struct net_device *ndev,
if (unlikely(test_bit(MTK_RESETTING, &mac->hw->state)))
return -EBUSY;
- return phy_ethtool_ksettings_set(ndev->phydev, cmd);
+ return phylink_ethtool_ksettings_set(mac->phylink, cmd);
}
static void mtk_get_drvinfo(struct net_device *dev,
@@ -2212,22 +2256,10 @@ static int mtk_nway_reset(struct net_device *dev)
if (unlikely(test_bit(MTK_RESETTING, &mac->hw->state)))
return -EBUSY;
- return genphy_restart_aneg(dev->phydev);
-}
-
-static u32 mtk_get_link(struct net_device *dev)
-{
- struct mtk_mac *mac = netdev_priv(dev);
- int err;
-
- if (unlikely(test_bit(MTK_RESETTING, &mac->hw->state)))
- return -EBUSY;
-
- err = genphy_update_link(dev->phydev);
- if (err)
- return ethtool_op_get_link(dev);
+ if (!mac->phylink)
+ return -ENOTSUPP;
- return dev->phydev->link;
+ return phylink_ethtool_nway_reset(mac->phylink);
}
static void mtk_get_strings(struct net_device *dev, u32 stringset, u8 *data)
@@ -2347,7 +2379,7 @@ static const struct ethtool_ops mtk_ethtool_ops = {
.get_msglevel = mtk_get_msglevel,
.set_msglevel = mtk_set_msglevel,
.nway_reset = mtk_nway_reset,
- .get_link = mtk_get_link,
+ .get_link = ethtool_op_get_link,
.get_strings = mtk_get_strings,
.get_sset_count = mtk_get_sset_count,
.get_ethtool_stats = mtk_get_ethtool_stats,
@@ -2375,9 +2407,10 @@ static const struct net_device_ops mtk_netdev_ops = {
static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np)
{
+ struct phylink *phylink;
struct mtk_mac *mac;
const __be32 *_id = of_get_property(np, "reg", NULL);
- int id, err;
+ int phy_mode, id, err;
if (!_id) {
dev_err(eth->dev, "missing mac id\n");
@@ -2421,6 +2454,32 @@ static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np)
u64_stats_init(&mac->hw_stats->syncp);
mac->hw_stats->reg_offset = id * MTK_STAT_OFFSET;
+ /* phylink create */
+ phy_mode = of_get_phy_mode(np);
+ if (phy_mode < 0) {
+ dev_err(eth->dev, "incorrect phy-mode\n");
+ err = -EINVAL;
+ goto free_netdev;
+ }
+
+ /* mac config is not set */
+ mac->interface = PHY_INTERFACE_MODE_NA;
+ mac->mode = MLO_AN_PHY;
+ mac->speed = SPEED_UNKNOWN;
+
+ mac->phylink_config.dev = ð->netdev[id]->dev;
+ mac->phylink_config.type = PHYLINK_NETDEV;
+
+ phylink = phylink_create(&mac->phylink_config,
+ of_fwnode_handle(mac->of_node),
+ phy_mode, &mtk_phylink_ops);
+ if (IS_ERR(phylink)) {
+ err = PTR_ERR(phylink);
+ goto free_netdev;
+ }
+
+ mac->phylink = phylink;
+
SET_NETDEV_DEV(eth->netdev[id], eth->dev);
eth->netdev[id]->watchdog_timeo = 5 * HZ;
eth->netdev[id]->netdev_ops = &mtk_netdev_ops;
@@ -2617,6 +2676,7 @@ static int mtk_probe(struct platform_device *pdev)
static int mtk_remove(struct platform_device *pdev)
{
struct mtk_eth *eth = platform_get_drvdata(pdev);
+ struct mtk_mac *mac;
int i;
/* stop all devices to make sure that dma is properly shut down */
@@ -2624,6 +2684,8 @@ static int mtk_remove(struct platform_device *pdev)
if (!eth->netdev[i])
continue;
mtk_stop(eth->netdev[i]);
+ mac = netdev_priv(eth->netdev[i]);
+ phylink_disconnect_phy(mac->phylink);
}
mtk_hw_deinit(eth);
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index bab94f763e2c..3bfcba9ffb58 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -14,6 +14,7 @@
#include <linux/of_net.h>
#include <linux/u64_stats_sync.h>
#include <linux/refcount.h>
+#include <linux/phylink.h>
#define MTK_QDMA_PAGE_SIZE 2048
#define MTK_MAX_RX_LENGTH 1536
@@ -320,12 +321,18 @@
#define MAC_MCR_SPEED_100 BIT(2)
#define MAC_MCR_FORCE_DPX BIT(1)
#define MAC_MCR_FORCE_LINK BIT(0)
-#define MAC_MCR_FIXED_LINK (MAC_MCR_MAX_RX_1536 | MAC_MCR_IPG_CFG | \
- MAC_MCR_FORCE_MODE | MAC_MCR_TX_EN | \
- MAC_MCR_RX_EN | MAC_MCR_BACKOFF_EN | \
- MAC_MCR_BACKPR_EN | MAC_MCR_FORCE_RX_FC | \
- MAC_MCR_FORCE_TX_FC | MAC_MCR_SPEED_1000 | \
- MAC_MCR_FORCE_DPX | MAC_MCR_FORCE_LINK)
+
+/* Mac status registers */
+#define MTK_MAC_MSR(x) (0x10108 + (x * 0x100))
+#define MAC_MSR_EEE1G BIT(7)
+#define MAC_MSR_EEE100M BIT(6)
+#define MAC_MSR_RX_FC BIT(5)
+#define MAC_MSR_TX_FC BIT(4)
+#define MAC_MSR_SPEED_1000 BIT(3)
+#define MAC_MSR_SPEED_100 BIT(2)
+#define MAC_MSR_SPEED_MASK (MAC_MSR_SPEED_1000 | MAC_MSR_SPEED_100)
+#define MAC_MSR_DPX BIT(1)
+#define MAC_MSR_LINK BIT(0)
/* TRGMII RXC control register */
#define TRGMII_RCK_CTRL 0x10300
@@ -815,22 +822,23 @@ struct mtk_eth {
/* struct mtk_mac - the structure that holds the info about the MACs of the
* SoC
* @id: The number of the MAC
- * @ge_mode: Interface mode kept for setup restoring
+ * @interface: Interface mode kept for detecting change in hw settings
* @of_node: Our devicetree node
* @hw: Backpointer to our main datastruture
* @hw_stats: Packet statistics counter
- * @trgmii Indicate if the MAC uses TRGMII connected to internal
- switch
*/
struct mtk_mac {
int id;
- int ge_mode;
+ phy_interface_t interface;
+ unsigned int mode;
+ int speed;
struct device_node *of_node;
+ struct phylink *phylink;
+ struct phylink_config phylink_config;
struct mtk_eth *hw;
struct mtk_hw_stats *hw_stats;
__be32 hwlro_ip[MTK_MAX_LRO_IP_CNT];
int hwlro_ip_cnt;
- bool trgmii;
};
/* the struct describing the SoC. these are declared in the soc_xyz.c files */
--
2.20.1
^ permalink raw reply related
* [PATCH net-next 0/3] net: dsa: MT7530: Convert to PHYLINK and add support for port 5
From: René van Dorst @ 2019-07-24 19:25 UTC (permalink / raw)
To: netdev
Cc: frank-w, sean.wang, f.fainelli, linux, davem, matthias.bgg,
andrew, vivien.didelot, john, linux-mediatek, linux-mips, robh+dt,
devicetree, René van Dorst
1. net: dsa: mt7530: Convert to PHYLINK API
This patch converts mt7530 to PHYLINK API.
2. dt-bindings: net: dsa: mt7530: Add support for port 5
3. net: dsa: mt7530: Add support for port 5
These 2 patches adding support for port 5 of the switch.
rfc -> v1:
* Mostly phylink improvements after review.
* Drop phy isolation patches. Adds no value for now.
René van Dorst (3):
net: dsa: mt7530: Convert to PHYLINK API
dt-bindings: net: dsa: mt7530: Add support for port 5
net: dsa: mt7530: Add support for port 5
.../devicetree/bindings/net/dsa/mt7530.txt | 215 +++++++++++
drivers/net/dsa/mt7530.c | 356 +++++++++++++++---
drivers/net/dsa/mt7530.h | 60 ++-
3 files changed, 561 insertions(+), 70 deletions(-)
To: <netdev@vger.kernel.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Cc: Frank Wunderlich <frank-w@public-files.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-mediatek@lists.infradead.org
Cc: linux-mips@vger.kernel.org
Cc: John Crispin <john@phrozen.org>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Vivien Didelot <vivien.didelot@gmail.com>
--
2.20.1
^ permalink raw reply
* [PATCH bpf-next 00/10] CO-RE offset relocations
From: Andrii Nakryiko @ 2019-07-24 19:27 UTC (permalink / raw)
To: bpf, netdev, ast, daniel, yhs
Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
This patch set implements central part of CO-RE (Compile Once - Run
Everywhere, see [0] and [1] for slides and video): relocating field offsets.
Most of the details are written down as comments to corresponding parts of the
code.
Patch #1 adds loading of .BTF.ext offset relocations section and macros to
work with its contents.
Patch #2 implements CO-RE relocations algorithm in libbpf.
Patches #3-#10 adds selftests validating various parts of relocation handling,
type compatibility, etc.
For all tests to work, you'll need latest Clang/LLVM supporting
__builtin_preserve_access_index intrinsic, used for recording offset
relocations. Kernel on which selftests run should have BTF information built
in (CONFIG_DEBUG_INFO_BTF=y).
[0] http://vger.kernel.org/bpfconf2019.html#session-2
[1] http://vger.kernel.org/lpc-bpf2018.html#session-2CO-RE relocations
This patch set implements central part of CO-RE (Compile Once - Run
Everywhere, see [0] and [1] for slides and video): relocating field offsets.
Most of the details are written down as comments to corresponding parts of the
code.
Patch #1 adds loading of .BTF.ext offset relocations section and macros to
work with its contents.
Patch #2 implements CO-RE relocations algorithm in libbpf.
Patches #3-#10 adds selftests validating various parts of relocation handling,
type compatibility, etc.
For all tests to work, you'll need latest Clang/LLVM supporting
__builtin_preserve_access_index intrinsic, used for recording offset
relocations. Kernel on which selftests run should have BTF information built
in (CONFIG_DEBUG_INFO_BTF=y).
[0] http://vger.kernel.org/bpfconf2019.html#session-2
[1] http://vger.kernel.org/lpc-bpf2018.html#session-2
Andrii Nakryiko (10):
libbpf: add .BTF.ext offset relocation section loading
libbpf: implement BPF CO-RE offset relocation algorithm
selftests/bpf: add CO-RE relocs testing setup
selftests/bpf: add CO-RE relocs struct flavors tests
selftests/bpf: add CO-RE relocs nesting tests
selftests/bpf: add CO-RE relocs array tests
selftests/bpf: add CO-RE relocs enum/ptr/func_proto tests
selftests/bpf: add CO-RE relocs modifiers/typedef tests
selftest/bpf: add CO-RE relocs ptr-as-array tests
selftests/bpf: add CO-RE relocs ints tests
tools/lib/bpf/btf.c | 64 +-
tools/lib/bpf/btf.h | 4 +
tools/lib/bpf/libbpf.c | 866 +++++++++++++++++-
tools/lib/bpf/libbpf.h | 1 +
tools/lib/bpf/libbpf_internal.h | 91 ++
.../selftests/bpf/prog_tests/core_reloc.c | 363 ++++++++
.../bpf/progs/btf__core_reloc_arrays.c | 3 +
.../btf__core_reloc_arrays___diff_arr_dim.c | 3 +
...btf__core_reloc_arrays___diff_arr_val_sz.c | 3 +
.../btf__core_reloc_arrays___err_non_array.c | 3 +
...btf__core_reloc_arrays___err_too_shallow.c | 3 +
.../btf__core_reloc_arrays___err_too_small.c | 3 +
..._core_reloc_arrays___err_wrong_val_type1.c | 3 +
..._core_reloc_arrays___err_wrong_val_type2.c | 3 +
.../bpf/progs/btf__core_reloc_flavors.c | 3 +
.../btf__core_reloc_flavors__err_wrong_name.c | 3 +
.../bpf/progs/btf__core_reloc_ints.c | 3 +
.../bpf/progs/btf__core_reloc_ints___bool.c | 3 +
.../btf__core_reloc_ints___err_bitfield.c | 3 +
.../btf__core_reloc_ints___err_wrong_sz_16.c | 3 +
.../btf__core_reloc_ints___err_wrong_sz_32.c | 3 +
.../btf__core_reloc_ints___err_wrong_sz_64.c | 3 +
.../btf__core_reloc_ints___err_wrong_sz_8.c | 3 +
.../btf__core_reloc_ints___reverse_sign.c | 3 +
.../bpf/progs/btf__core_reloc_mods.c | 3 +
.../progs/btf__core_reloc_mods___mod_swap.c | 3 +
.../progs/btf__core_reloc_mods___typedefs.c | 3 +
.../bpf/progs/btf__core_reloc_nesting.c | 3 +
.../btf__core_reloc_nesting___anon_embed.c | 3 +
...f__core_reloc_nesting___dup_compat_types.c | 5 +
...core_reloc_nesting___err_array_container.c | 3 +
...tf__core_reloc_nesting___err_array_field.c | 3 +
...e_reloc_nesting___err_dup_incompat_types.c | 4 +
...re_reloc_nesting___err_missing_container.c | 3 +
...__core_reloc_nesting___err_missing_field.c | 3 +
..._reloc_nesting___err_nonstruct_container.c | 3 +
...e_reloc_nesting___err_partial_match_dups.c | 4 +
.../btf__core_reloc_nesting___err_too_deep.c | 3 +
.../btf__core_reloc_nesting___extra_nesting.c | 3 +
..._core_reloc_nesting___struct_union_mixup.c | 3 +
.../bpf/progs/btf__core_reloc_primitives.c | 3 +
...f__core_reloc_primitives___diff_enum_def.c | 3 +
..._core_reloc_primitives___diff_func_proto.c | 3 +
...f__core_reloc_primitives___diff_ptr_type.c | 3 +
...tf__core_reloc_primitives___err_non_enum.c | 3 +
...btf__core_reloc_primitives___err_non_int.c | 3 +
...btf__core_reloc_primitives___err_non_ptr.c | 3 +
.../bpf/progs/btf__core_reloc_ptr_as_arr.c | 3 +
.../btf__core_reloc_ptr_as_arr___diff_sz.c | 3 +
.../selftests/bpf/progs/core_reloc_types.h | 642 +++++++++++++
.../bpf/progs/test_core_reloc_arrays.c | 58 ++
.../bpf/progs/test_core_reloc_flavors.c | 65 ++
.../bpf/progs/test_core_reloc_ints.c | 48 +
.../bpf/progs/test_core_reloc_kernel.c | 39 +
.../bpf/progs/test_core_reloc_mods.c | 68 ++
.../bpf/progs/test_core_reloc_nesting.c | 48 +
.../bpf/progs/test_core_reloc_primitives.c | 50 +
.../bpf/progs/test_core_reloc_ptr_as_arr.c | 34 +
58 files changed, 2527 insertions(+), 47 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/core_reloc.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___diff_arr_dim.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___diff_arr_val_sz.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_non_array.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_too_shallow.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_too_small.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_wrong_val_type1.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_arrays___err_wrong_val_type2.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_flavors.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_flavors__err_wrong_name.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___bool.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_bitfield.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_16.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_32.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_64.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___err_wrong_sz_8.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ints___reverse_sign.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods___mod_swap.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_mods___typedefs.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___anon_embed.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___dup_compat_types.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_array_container.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_array_field.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_dup_incompat_types.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_missing_container.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_missing_field.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_nonstruct_container.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_partial_match_dups.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___err_too_deep.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___extra_nesting.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_nesting___struct_union_mixup.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_enum_def.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_func_proto.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___diff_ptr_type.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_enum.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_int.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_primitives___err_non_ptr.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ptr_as_arr.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_ptr_as_arr___diff_sz.c
create mode 100644 tools/testing/selftests/bpf/progs/core_reloc_types.h
create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_arrays.c
create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_flavors.c
create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_ints.c
create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_kernel.c
create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_mods.c
create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_nesting.c
create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_primitives.c
create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_ptr_as_arr.c
--
2.17.1
^ permalink raw reply
* [PATCH bpf-next 01/10] libbpf: add .BTF.ext offset relocation section loading
From: Andrii Nakryiko @ 2019-07-24 19:27 UTC (permalink / raw)
To: bpf, netdev, ast, daniel, yhs
Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190724192742.1419254-1-andriin@fb.com>
Add support for BPF CO-RE offset relocations. Add section/record
iteration macros for .BTF.ext. These macro are useful for iterating over
each .BTF.ext record, either for dumping out contents or later for BPF
CO-RE relocation handling.
To enable other parts of libbpf to work with .BTF.ext contents, moved
a bunch of type definitions into libbpf_internal.h.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
tools/lib/bpf/btf.c | 64 +++++++++--------------
tools/lib/bpf/btf.h | 4 ++
tools/lib/bpf/libbpf_internal.h | 91 +++++++++++++++++++++++++++++++++
3 files changed, 118 insertions(+), 41 deletions(-)
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 467224feb43b..4a36bc783848 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -42,47 +42,6 @@ struct btf {
int fd;
};
-struct btf_ext_info {
- /*
- * info points to the individual info section (e.g. func_info and
- * line_info) from the .BTF.ext. It does not include the __u32 rec_size.
- */
- void *info;
- __u32 rec_size;
- __u32 len;
-};
-
-struct btf_ext {
- union {
- struct btf_ext_header *hdr;
- void *data;
- };
- struct btf_ext_info func_info;
- struct btf_ext_info line_info;
- __u32 data_size;
-};
-
-struct btf_ext_info_sec {
- __u32 sec_name_off;
- __u32 num_info;
- /* Followed by num_info * record_size number of bytes */
- __u8 data[0];
-};
-
-/* The minimum bpf_func_info checked by the loader */
-struct bpf_func_info_min {
- __u32 insn_off;
- __u32 type_id;
-};
-
-/* The minimum bpf_line_info checked by the loader */
-struct bpf_line_info_min {
- __u32 insn_off;
- __u32 file_name_off;
- __u32 line_off;
- __u32 line_col;
-};
-
static inline __u64 ptr_to_u64(const void *ptr)
{
return (__u64) (unsigned long) ptr;
@@ -831,6 +790,9 @@ static int btf_ext_setup_info(struct btf_ext *btf_ext,
/* The start of the info sec (including the __u32 record_size). */
void *info;
+ if (ext_sec->len == 0)
+ return 0;
+
if (ext_sec->off & 0x03) {
pr_debug(".BTF.ext %s section is not aligned to 4 bytes\n",
ext_sec->desc);
@@ -934,6 +896,19 @@ static int btf_ext_setup_line_info(struct btf_ext *btf_ext)
return btf_ext_setup_info(btf_ext, ¶m);
}
+static int btf_ext_setup_offset_reloc(struct btf_ext *btf_ext)
+{
+ struct btf_ext_sec_setup_param param = {
+ .off = btf_ext->hdr->offset_reloc_off,
+ .len = btf_ext->hdr->offset_reloc_len,
+ .min_rec_size = sizeof(struct bpf_offset_reloc),
+ .ext_info = &btf_ext->offset_reloc_info,
+ .desc = "offset_reloc",
+ };
+
+ return btf_ext_setup_info(btf_ext, ¶m);
+}
+
static int btf_ext_parse_hdr(__u8 *data, __u32 data_size)
{
const struct btf_ext_header *hdr = (struct btf_ext_header *)data;
@@ -1004,6 +979,13 @@ struct btf_ext *btf_ext__new(__u8 *data, __u32 size)
if (err)
goto done;
+ /* check if there is offset_reloc_off/offset_reloc_len fields */
+ if (btf_ext->hdr->hdr_len < sizeof(struct btf_ext_header))
+ goto done;
+ err = btf_ext_setup_offset_reloc(btf_ext);
+ if (err)
+ goto done;
+
done:
if (err) {
btf_ext__free(btf_ext);
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index 88a52ae56fc6..287361ee1f6b 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -57,6 +57,10 @@ struct btf_ext_header {
__u32 func_info_len;
__u32 line_info_off;
__u32 line_info_len;
+
+ /* optional part of .BTF.ext header */
+ __u32 offset_reloc_off;
+ __u32 offset_reloc_len;
};
LIBBPF_API void btf__free(struct btf *btf);
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index 2ac29bd36226..087ff512282f 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -46,4 +46,95 @@ do { \
int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
const char *str_sec, size_t str_len);
+struct btf_ext_info {
+ /*
+ * info points to the individual info section (e.g. func_info and
+ * line_info) from the .BTF.ext. It does not include the __u32 rec_size.
+ */
+ void *info;
+ __u32 rec_size;
+ __u32 len;
+};
+
+#define for_each_btf_ext_sec(seg, sec) \
+ for (sec = (seg)->info; \
+ (void *)sec < (seg)->info + (seg)->len; \
+ sec = (void *)sec + sizeof(struct btf_ext_info_sec) + \
+ (seg)->rec_size * sec->num_info)
+
+#define for_each_btf_ext_rec(seg, sec, i, rec) \
+ for (i = 0, rec = (void *)&(sec)->data; \
+ i < (sec)->num_info; \
+ i++, rec = (void *)rec + (seg)->rec_size)
+
+struct btf_ext {
+ union {
+ struct btf_ext_header *hdr;
+ void *data;
+ };
+ struct btf_ext_info func_info;
+ struct btf_ext_info line_info;
+ struct btf_ext_info offset_reloc_info;
+ __u32 data_size;
+};
+
+struct btf_ext_info_sec {
+ __u32 sec_name_off;
+ __u32 num_info;
+ /* Followed by num_info * record_size number of bytes */
+ __u8 data[0];
+};
+
+/* The minimum bpf_func_info checked by the loader */
+struct bpf_func_info_min {
+ __u32 insn_off;
+ __u32 type_id;
+};
+
+/* The minimum bpf_line_info checked by the loader */
+struct bpf_line_info_min {
+ __u32 insn_off;
+ __u32 file_name_off;
+ __u32 line_off;
+ __u32 line_col;
+};
+
+/* The minimum bpf_offset_reloc checked by the loader
+ *
+ * Offset relocation captures the following data:
+ * - insn_off - instruction offset (in bytes) within a BPF program that needs
+ * its insn->imm field to be relocated with actual offset;
+ * - type_id - BTF type ID of the "root" (containing) entity of a relocatable
+ * offset;
+ * - access_str_off - offset into corresponding .BTF string section. String
+ * itself encodes an accessed field using a sequence of field and array
+ * indicies, separated by colon (:). It's conceptually very close to LLVM's
+ * getelementptr ([0]) instruction's arguments for identifying offset to
+ * a field.
+ *
+ * Example to provide a better feel.
+ *
+ * struct sample {
+ * int a;
+ * struct {
+ * int b[10];
+ * };
+ * };
+ *
+ * struct sample *s = ...;
+ * int x = &s->a; // encoded as "0:0" (a is field #0)
+ * int y = &s->b[5]; // encoded as "0:1:5" (b is field #1, arr elem #5)
+ * int z = &s[10]->b; // encoded as "10:1" (ptr is used as an array)
+ *
+ * type_id for all relocs in this example will capture BTF type id of
+ * `struct sample`.
+ *
+ * [0] https://llvm.org/docs/LangRef.html#getelementptr-instruction
+ */
+struct bpf_offset_reloc {
+ __u32 insn_off;
+ __u32 type_id;
+ __u32 access_str_off;
+};
+
#endif /* __LIBBPF_LIBBPF_INTERNAL_H */
--
2.17.1
^ permalink raw reply related
* [PATCH bpf-next 03/10] selftests/bpf: add CO-RE relocs testing setup
From: Andrii Nakryiko @ 2019-07-24 19:27 UTC (permalink / raw)
To: bpf, netdev, ast, daniel, yhs
Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190724192742.1419254-1-andriin@fb.com>
Add CO-RE relocation test runner. Add one simple test validating that
libbpf's logic for searching for kernel image and loading BTF out of it
works.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
.../selftests/bpf/prog_tests/core_reloc.c | 126 ++++++++++++++++++
.../bpf/progs/test_core_reloc_kernel.c | 39 ++++++
2 files changed, 165 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/core_reloc.c
create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_kernel.c
diff --git a/tools/testing/selftests/bpf/prog_tests/core_reloc.c b/tools/testing/selftests/bpf/prog_tests/core_reloc.c
new file mode 100644
index 000000000000..b8d6ea578b20
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/core_reloc.c
@@ -0,0 +1,126 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+
+struct core_reloc_test_case {
+ const char *case_name;
+ const char *bpf_obj_file;
+ const char *btf_src_file;
+ const char *input;
+ int input_len;
+ const char *output;
+ int output_len;
+ bool fails;
+};
+
+static struct core_reloc_test_case test_cases[] = {
+ /* validate we can find kernel image and use its BTF for relocs */
+ {
+ .case_name = "kernel",
+ .bpf_obj_file = "test_core_reloc_kernel.o",
+ .btf_src_file = NULL, /* load from /lib/modules/$(uname -r) */
+ .input = "",
+ .input_len = 0,
+ .output = "\1", /* true */
+ .output_len = 1,
+ },
+};
+
+struct data {
+ char in[256];
+ char out[256];
+};
+
+void test_core_reloc(void)
+{
+ const char *probe_name = "raw_tracepoint/sys_enter";
+ struct bpf_object_load_attr load_attr = {};
+ struct core_reloc_test_case *test_case;
+ int err, duration = 0, i, equal;
+ struct bpf_link *link = NULL;
+ struct bpf_map *data_map;
+ struct bpf_program *prog;
+ struct bpf_object *obj;
+ const int zero = 0;
+ struct data data;
+
+ for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+ test_case = &test_cases[i];
+
+ obj = bpf_object__open(test_case->bpf_obj_file);
+ if (CHECK(IS_ERR_OR_NULL(obj), "obj_open",
+ "case #%d: failed to open '%s': %ld\n",
+ i, test_case->bpf_obj_file, PTR_ERR(obj)))
+ return;
+
+ prog = bpf_object__find_program_by_title(obj, probe_name);
+ if (CHECK(!prog, "find_probe",
+ "case #%d: prog '%s' not found\n",
+ i, probe_name))
+ goto cleanup;
+ bpf_program__set_type(prog, BPF_PROG_TYPE_RAW_TRACEPOINT);
+
+ load_attr.obj = obj;
+ load_attr.log_level = 0;
+ load_attr.target_btf_path = test_case->btf_src_file;
+ err = bpf_object__load_xattr(&load_attr);
+ if (test_case->fails) {
+ CHECK(!err, "obj_load_fail",
+ "case #%d: should fail to load prog '%s'\n",
+ i, probe_name);
+ goto cleanup;
+ } else {
+ if (CHECK(err, "obj_load",
+ "case #%d: failed to load prog '%s': %d\n",
+ i, probe_name, err))
+ goto cleanup;
+ }
+
+ link = bpf_program__attach_raw_tracepoint(prog, "sys_enter");
+ if (CHECK(IS_ERR(link), "attach_raw_tp", "case #%d: err %ld\n",
+ i, PTR_ERR(link)))
+ goto cleanup;
+
+ data_map = bpf_object__find_map_by_name(obj, "test_cor.bss");
+ if (CHECK(!data_map, "find_data_map",
+ "case #%d: failed to find data map\n", i))
+ goto cleanup;
+
+ memset(&data, 0, sizeof(data));
+ memcpy(data.in, test_case->input, test_case->input_len);
+
+ err = bpf_map_update_elem(bpf_map__fd(data_map),
+ &zero, &data, 0);
+ if (CHECK(err, "update_data_map",
+ "case #%d: failed to update .data map: %d\n",
+ i, err))
+ goto cleanup;
+
+ /* trigger test run */
+ usleep(1);
+
+ err = bpf_map_lookup_elem(bpf_map__fd(data_map), &zero, &data);
+ if (CHECK(err, "get_result",
+ "case #%d: failed to get output data: %d\n", i, err))
+ goto cleanup;
+
+ equal = memcmp(data.out, test_case->output,
+ test_case->output_len) == 0;
+ if (CHECK(!equal, "check_result",
+ "case #%d: input/output data don't match\n", i)) {
+ int j;
+
+ for (j = 0; j < test_case->output_len; j++) {
+ printf("case #%d: byte #%d, EXP 0x%02hhx GOT 0x%02hhx\n",
+ i, j, test_case->output[j], data.out[j]);
+ }
+ goto cleanup;
+ }
+
+cleanup:
+ if (!IS_ERR_OR_NULL(link)) {
+ bpf_link__destroy(link);
+ link = NULL;
+ }
+ bpf_object__close(obj);
+ }
+}
diff --git a/tools/testing/selftests/bpf/progs/test_core_reloc_kernel.c b/tools/testing/selftests/bpf/progs/test_core_reloc_kernel.c
new file mode 100644
index 000000000000..9a71080ce4ec
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_core_reloc_kernel.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+
+#include <linux/bpf.h>
+#include <stdint.h>
+#include "bpf_helpers.h"
+
+char _license[] SEC("license") = "GPL";
+
+static volatile struct data {
+ char in[256];
+ char out[256];
+} data;
+
+struct task_struct {
+ int pid;
+ int tgid;
+};
+
+SEC("raw_tracepoint/sys_enter")
+int test_core_nesting(void *ctx)
+{
+ struct task_struct *task = (void *)bpf_get_current_task();
+ uint64_t pid_tgid = bpf_get_current_pid_tgid();
+ int pid, tgid;
+
+ if (bpf_probe_read(&pid, sizeof(pid),
+ __builtin_preserve_access_index(&task->pid)))
+ return 1;
+ if (bpf_probe_read(&tgid, sizeof(tgid),
+ __builtin_preserve_access_index(&task->tgid)))
+ return 1;
+
+ /* validate pid + tgid matches */
+ data.out[0] = (((uint64_t)pid << 32) | tgid) == pid_tgid;
+
+ return 0;
+}
+
--
2.17.1
^ permalink raw reply related
* [PATCH bpf-next 04/10] selftests/bpf: add CO-RE relocs struct flavors tests
From: Andrii Nakryiko @ 2019-07-24 19:27 UTC (permalink / raw)
To: bpf, netdev, ast, daniel, yhs
Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190724192742.1419254-1-andriin@fb.com>
Add tests verifying that BPF program can use various struct/union
"flavors" to extract data from the same target struct/union.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
.../selftests/bpf/prog_tests/core_reloc.c | 34 ++++++++++
.../bpf/progs/btf__core_reloc_flavors.c | 3 +
.../btf__core_reloc_flavors__err_wrong_name.c | 3 +
.../selftests/bpf/progs/core_reloc_types.h | 15 +++++
.../bpf/progs/test_core_reloc_flavors.c | 65 +++++++++++++++++++
5 files changed, 120 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_flavors.c
create mode 100644 tools/testing/selftests/bpf/progs/btf__core_reloc_flavors__err_wrong_name.c
create mode 100644 tools/testing/selftests/bpf/progs/core_reloc_types.h
create mode 100644 tools/testing/selftests/bpf/progs/test_core_reloc_flavors.c
diff --git a/tools/testing/selftests/bpf/prog_tests/core_reloc.c b/tools/testing/selftests/bpf/prog_tests/core_reloc.c
index b8d6ea578b20..c553c5f07ec3 100644
--- a/tools/testing/selftests/bpf/prog_tests/core_reloc.c
+++ b/tools/testing/selftests/bpf/prog_tests/core_reloc.c
@@ -1,5 +1,32 @@
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
+#include "progs/core_reloc_types.h"
+
+#define STRUCT_TO_CHAR_PTR(struct_name) (const char *)&(struct struct_name)
+
+#define FLAVORS_DATA(struct_name) STRUCT_TO_CHAR_PTR(struct_name) { \
+ .a = 42, \
+ .b = 0xc001, \
+ .c = 0xbeef, \
+}
+
+#define FLAVORS_CASE_COMMON(name) \
+ .case_name = #name, \
+ .bpf_obj_file = "test_core_reloc_flavors.o", \
+ .btf_src_file = "btf__core_reloc_" #name ".o" \
+
+#define FLAVORS_CASE(name) { \
+ FLAVORS_CASE_COMMON(name), \
+ .input = FLAVORS_DATA(core_reloc_##name), \
+ .input_len = sizeof(struct core_reloc_##name), \
+ .output = FLAVORS_DATA(core_reloc_flavors), \
+ .output_len = sizeof(struct core_reloc_flavors), \
+}
+
+#define FLAVORS_ERR_CASE(name) { \
+ FLAVORS_CASE_COMMON(name), \
+ .fails = true, \
+}
struct core_reloc_test_case {
const char *case_name;
@@ -23,6 +50,13 @@ static struct core_reloc_test_case test_cases[] = {
.output = "\1", /* true */
.output_len = 1,
},
+
+ /* validate BPF program can use multiple flavors to match against
+ * single target BTF type
+ */
+ FLAVORS_CASE(flavors),
+
+ FLAVORS_ERR_CASE(flavors__err_wrong_name),
};
struct data {
diff --git a/tools/testing/selftests/bpf/progs/btf__core_reloc_flavors.c b/tools/testing/selftests/bpf/progs/btf__core_reloc_flavors.c
new file mode 100644
index 000000000000..b74455b91227
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/btf__core_reloc_flavors.c
@@ -0,0 +1,3 @@
+#include "core_reloc_types.h"
+
+void f(struct core_reloc_flavors x) {}
diff --git a/tools/testing/selftests/bpf/progs/btf__core_reloc_flavors__err_wrong_name.c b/tools/testing/selftests/bpf/progs/btf__core_reloc_flavors__err_wrong_name.c
new file mode 100644
index 000000000000..7b6035f86ee6
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/btf__core_reloc_flavors__err_wrong_name.c
@@ -0,0 +1,3 @@
+#include "core_reloc_types.h"
+
+void f(struct core_reloc_flavors__err_wrong_name x) {}
diff --git a/tools/testing/selftests/bpf/progs/core_reloc_types.h b/tools/testing/selftests/bpf/progs/core_reloc_types.h
new file mode 100644
index 000000000000..33b0c6a61912
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/core_reloc_types.h
@@ -0,0 +1,15 @@
+/*
+ * FLAVORS
+ */
+struct core_reloc_flavors {
+ int a;
+ int b;
+ int c;
+};
+
+/* this is not a flavor, as it doesn't have triple underscore */
+struct core_reloc_flavors__err_wrong_name {
+ int a;
+ int b;
+ int c;
+};
diff --git a/tools/testing/selftests/bpf/progs/test_core_reloc_flavors.c b/tools/testing/selftests/bpf/progs/test_core_reloc_flavors.c
new file mode 100644
index 000000000000..92660344518d
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_core_reloc_flavors.c
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+
+#include <linux/bpf.h>
+#include <stdint.h>
+#include "bpf_helpers.h"
+
+char _license[] SEC("license") = "GPL";
+
+static volatile struct data {
+ char in[256];
+ char out[256];
+} data;
+
+struct core_reloc_flavors {
+ int a;
+ int b;
+ int c;
+};
+
+/* local flavor with reversed layout */
+struct core_reloc_flavors___reversed {
+ int c;
+ int b;
+ int a;
+};
+
+/* local flavor with nested/overlapping layout */
+struct core_reloc_flavors___weird {
+ struct {
+ int b;
+ };
+ /* a and c overlap in local flavor, but this should still work
+ * correctly with target original flavor
+ */
+ union {
+ int a;
+ int c;
+ };
+};
+
+SEC("raw_tracepoint/sys_enter")
+int test_core_nesting(void *ctx)
+{
+ struct core_reloc_flavors *in_orig = (void *)&data.in;
+ struct core_reloc_flavors___reversed *in_rev = (void *)&data.in;
+ struct core_reloc_flavors___weird *in_weird = (void *)&data.in;
+ struct core_reloc_flavors *out = (void *)&data.out;
+
+ /* read a using weird layout */
+ if (bpf_probe_read(&out->a, sizeof(in_weird->a),
+ __builtin_preserve_access_index(&in_weird->a)))
+ return 1;
+ /* read b using reversed layout */
+ if (bpf_probe_read(&out->b, sizeof(in_rev->b),
+ __builtin_preserve_access_index(&in_rev->b)))
+ return 1;
+ /* read c using original layout */
+ if (bpf_probe_read(&out->c, sizeof(in_orig->c),
+ __builtin_preserve_access_index(&in_orig->c)))
+ return 1;
+
+ return 0;
+}
+
--
2.17.1
^ permalink raw reply related
* [PATCH bpf-next 02/10] libbpf: implement BPF CO-RE offset relocation algorithm
From: Andrii Nakryiko @ 2019-07-24 19:27 UTC (permalink / raw)
To: bpf, netdev, ast, daniel, yhs
Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko
In-Reply-To: <20190724192742.1419254-1-andriin@fb.com>
This patch implements the core logic for BPF CO-RE offsets relocations.
All the details are described in code comments.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
tools/lib/bpf/libbpf.c | 866 ++++++++++++++++++++++++++++++++++++++++-
tools/lib/bpf/libbpf.h | 1 +
2 files changed, 861 insertions(+), 6 deletions(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 8741c39adb1c..86d87bf10d46 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -38,6 +38,7 @@
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/vfs.h>
+#include <sys/utsname.h>
#include <tools/libc_compat.h>
#include <libelf.h>
#include <gelf.h>
@@ -47,6 +48,7 @@
#include "btf.h"
#include "str_error.h"
#include "libbpf_internal.h"
+#include "hashmap.h"
#ifndef EM_BPF
#define EM_BPF 247
@@ -1013,16 +1015,22 @@ static int bpf_object__init_user_maps(struct bpf_object *obj, bool strict)
}
static const struct btf_type *skip_mods_and_typedefs(const struct btf *btf,
- __u32 id)
+ __u32 id,
+ __u32 *res_id)
{
const struct btf_type *t = btf__type_by_id(btf, id);
+ if (res_id)
+ *res_id = id;
+
while (true) {
switch (BTF_INFO_KIND(t->info)) {
case BTF_KIND_VOLATILE:
case BTF_KIND_CONST:
case BTF_KIND_RESTRICT:
case BTF_KIND_TYPEDEF:
+ if (res_id)
+ *res_id = t->type;
t = btf__type_by_id(btf, t->type);
break;
default:
@@ -1041,7 +1049,7 @@ static const struct btf_type *skip_mods_and_typedefs(const struct btf *btf,
static bool get_map_field_int(const char *map_name, const struct btf *btf,
const struct btf_type *def,
const struct btf_member *m, __u32 *res) {
- const struct btf_type *t = skip_mods_and_typedefs(btf, m->type);
+ const struct btf_type *t = skip_mods_and_typedefs(btf, m->type, NULL);
const char *name = btf__name_by_offset(btf, m->name_off);
const struct btf_array *arr_info;
const struct btf_type *arr_t;
@@ -1107,7 +1115,7 @@ static int bpf_object__init_user_btf_map(struct bpf_object *obj,
return -EOPNOTSUPP;
}
- def = skip_mods_and_typedefs(obj->btf, var->type);
+ def = skip_mods_and_typedefs(obj->btf, var->type, NULL);
if (BTF_INFO_KIND(def->info) != BTF_KIND_STRUCT) {
pr_warning("map '%s': unexpected def kind %u.\n",
map_name, BTF_INFO_KIND(var->info));
@@ -2289,6 +2297,845 @@ bpf_program_reloc_btf_ext(struct bpf_program *prog, struct bpf_object *obj,
return 0;
}
+#define BPF_CORE_SPEC_MAX_LEN 64
+
+/* represents BPF CO-RE field or array element accessor */
+struct bpf_core_accessor {
+ __u32 type_id; /* struct/union type or array element type */
+ __u32 idx; /* field index or array index */
+ const char *name; /* field name or NULL for array accessor */
+};
+
+struct bpf_core_spec {
+ const struct btf *btf;
+ /* high-level spec: named fields and array indicies only */
+ struct bpf_core_accessor spec[BPF_CORE_SPEC_MAX_LEN];
+ /* high-level spec length */
+ int len;
+ /* raw, low-level spec: 1-to-1 with accessor spec string */
+ int raw_spec[BPF_CORE_SPEC_MAX_LEN];
+ /* raw spec length */
+ int raw_len;
+ /* field byte offset represented by spec */
+ __u32 offset;
+};
+
+static bool str_is_empty(const char *s)
+{
+ return !s || !s[0];
+}
+
+static int btf_kind(const struct btf_type *t)
+{
+ return BTF_INFO_KIND(t->info);
+}
+
+static bool btf_is_composite(const struct btf_type *t)
+{
+ int kind = btf_kind(t);
+
+ return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
+}
+
+static bool btf_is_array(const struct btf_type *t)
+{
+ return btf_kind(t) == BTF_KIND_ARRAY;
+}
+
+/*
+ * Turn bpf_offset_reloc into a low- and high-level spec representation,
+ * validating correctness along the way, as well as calculating resulting
+ * field offset (in bytes), specified by accessor string. Low-level spec
+ * captures every single level of nestedness, including traversing anonymous
+ * struct/union members. High-level one only captures semantically meaningful
+ * "turning points": named fields and array indicies.
+ * E.g., for this case:
+ *
+ * struct sample {
+ * int __unimportant;
+ * struct {
+ * int __1;
+ * int __2;
+ * int a[7];
+ * };
+ * };
+ *
+ * struct sample *s = ...;
+ *
+ * int x = &s->a[3]; // access string = '0:1:2:3'
+ *
+ * Low-level spec has 1:1 mapping with each element of access string (it's
+ * just a parsed access string representation): [0, 1, 2, 3].
+ *
+ * High-level spec will capture only 3 points:
+ * - intial zero-index access by pointer (&s->... is the same as &s[0]...);
+ * - field 'a' access (corresponds to '2' in low-level spec);
+ * - array element #3 access (corresponds to '3' in low-level spec).
+ *
+ */
+static int bpf_core_spec_parse(const struct btf *btf,
+ __u32 type_id,
+ const char *spec_str,
+ struct bpf_core_spec *spec)
+{
+ int access_idx, parsed_len, i;
+ const struct btf_type *t;
+ __u32 id = type_id;
+ const char *name;
+ __s64 sz;
+
+ if (str_is_empty(spec_str) || *spec_str == ':')
+ return -EINVAL;
+
+ memset(spec, 0, sizeof(*spec));
+ spec->btf = btf;
+
+ /* parse spec_str="0:1:2:3:4" into array raw_spec=[0, 1, 2, 3, 4] */
+ while (*spec_str) {
+ if (*spec_str == ':')
+ ++spec_str;
+ if (sscanf(spec_str, "%d%n", &access_idx, &parsed_len) != 1)
+ return -EINVAL;
+ if (spec->raw_len == BPF_CORE_SPEC_MAX_LEN)
+ return -E2BIG;
+ spec_str += parsed_len;
+ spec->raw_spec[spec->raw_len++] = access_idx;
+ }
+
+ if (spec->raw_len == 0)
+ return -EINVAL;
+
+ for (i = 0; i < spec->raw_len; i++) {
+ t = skip_mods_and_typedefs(btf, id, &id);
+ if (!t)
+ return -EINVAL;
+
+ access_idx = spec->raw_spec[i];
+
+ if (i == 0) {
+ /* first spec value is always reloc type array index */
+ spec->spec[spec->len].type_id = id;
+ spec->spec[spec->len].idx = access_idx;
+ spec->len++;
+
+ sz = btf__resolve_size(btf, id);
+ if (sz < 0)
+ return sz;
+ spec->offset += access_idx * sz;
+ continue;
+ }
+
+ if (btf_is_composite(t)) {
+ const struct btf_member *m = (void *)(t + 1);
+ __u32 offset;
+
+ if (access_idx >= BTF_INFO_VLEN(t->info))
+ return -EINVAL;
+
+ m = &m[access_idx];
+
+ if (BTF_INFO_KFLAG(t->info)) {
+ if (BTF_MEMBER_BITFIELD_SIZE(m->offset))
+ return -EINVAL;
+ offset = BTF_MEMBER_BIT_OFFSET(m->offset);
+ } else {
+ offset = m->offset;
+ }
+ if (m->offset % 8)
+ return -EINVAL;
+ spec->offset += offset / 8;
+
+ if (m->name_off) {
+ name = btf__name_by_offset(btf, m->name_off);
+ if (str_is_empty(name))
+ return -EINVAL;
+
+ spec->spec[spec->len].type_id = id;
+ spec->spec[spec->len].idx = access_idx;
+ spec->spec[spec->len].name = name;
+ spec->len++;
+ }
+
+ id = m->type;
+ } else if (btf_is_array(t)) {
+ const struct btf_array *a = (void *)(t + 1);
+
+ t = skip_mods_and_typedefs(btf, a->type, &id);
+ if (!t || access_idx >= a->nelems)
+ return -EINVAL;
+
+ spec->spec[spec->len].type_id = id;
+ spec->spec[spec->len].idx = access_idx;
+ spec->len++;
+
+ sz = btf__resolve_size(btf, id);
+ if (sz < 0)
+ return sz;
+ spec->offset += access_idx * sz;
+ } else {
+ pr_warning("relo for [%u] %s (at idx %d) captures type [%d] of unexpected kind %d\n",
+ type_id, spec_str, i, id, btf_kind(t));
+ return -EINVAL;
+ }
+ }
+
+ if (spec->len == 0)
+ return -EINVAL;
+
+ return 0;
+}
+
+/* Given 'some_struct_name___with_flavor' return the length of a name prefix
+ * before last triple underscore. Struct name part after last triple
+ * underscore is ignored by BPF CO-RE relocation during relocation matching.
+ */
+static size_t bpf_core_essential_name_len(const char *name)
+{
+ size_t n = strlen(name);
+ int i = n - 3;
+
+ while (i > 0) {
+ if (name[i] == '_' && name[i + 1] == '_' && name[i + 2] == '_')
+ return i;
+ i--;
+ }
+ return n;
+}
+
+/* dynamically sized list of type IDs */
+struct ids_vec {
+ __u32 *data;
+ int len;
+};
+
+static void bpf_core_free_cands(struct ids_vec *cand_ids)
+{
+ free(cand_ids->data);
+ free(cand_ids);
+}
+
+static struct ids_vec *bpf_core_find_cands(const struct btf *local_btf,
+ __u32 local_type_id,
+ const struct btf *targ_btf)
+{
+ size_t local_essent_len, targ_essent_len;
+ const char *local_name, *targ_name;
+ const struct btf_type *t;
+ struct ids_vec *cand_ids;
+ __u32 *new_ids;
+ int i, err, n;
+
+ t = btf__type_by_id(local_btf, local_type_id);
+ if (!t)
+ return ERR_PTR(-EINVAL);
+
+ local_name = btf__name_by_offset(local_btf, t->name_off);
+ if (str_is_empty(local_name))
+ return ERR_PTR(-EINVAL);
+ local_essent_len = bpf_core_essential_name_len(local_name);
+
+ cand_ids = calloc(1, sizeof(*cand_ids));
+ if (!cand_ids)
+ return ERR_PTR(-ENOMEM);
+
+ n = btf__get_nr_types(targ_btf);
+ for (i = 1; i <= n; i++) {
+ t = btf__type_by_id(targ_btf, i);
+ targ_name = btf__name_by_offset(targ_btf, t->name_off);
+ if (str_is_empty(targ_name))
+ continue;
+
+ targ_essent_len = bpf_core_essential_name_len(targ_name);
+ if (targ_essent_len != local_essent_len)
+ continue;
+
+ if (strncmp(local_name, targ_name, local_essent_len) == 0) {
+ pr_debug("[%d] (%s): found candidate [%d] (%s)\n",
+ local_type_id, local_name, i, targ_name);
+ new_ids = realloc(cand_ids->data, cand_ids->len + 1);
+ if (!new_ids) {
+ err = -ENOMEM;
+ goto err_out;
+ }
+ cand_ids->data = new_ids;
+ cand_ids->data[cand_ids->len++] = i;
+ }
+ }
+ return cand_ids;
+err_out:
+ bpf_core_free_cands(cand_ids);
+ return ERR_PTR(err);
+}
+
+/* Check two types for compatibility, skipping const/volatile/restrict and
+ * typedefs, to ensure we are relocating offset to the compatible entities:
+ * - any two STRUCTs/UNIONs are compatible and can be mixed;
+ * - any two FWDs are compatible;
+ * - any two PTRs are always compatible;
+ * - for ENUMs, check sizes, names are ignored;
+ * - for INT, size and bitness should match, signedness is ignored;
+ * - for ARRAY, dimensionality is ignored, element types are checked for
+ * compatibility recursively;
+ * - everything else shouldn't be ever a target of relocation.
+ * These rules are not set in stone and probably will be adjusted as we get
+ * more experience with using BPF CO-RE relocations.
+ */
+static int bpf_core_fields_are_compat(const struct btf *local_btf,
+ __u32 local_id,
+ const struct btf *targ_btf,
+ __u32 targ_id)
+{
+ const struct btf_type *local_type, *targ_type;
+ __u16 kind;
+
+recur:
+ local_type = skip_mods_and_typedefs(local_btf, local_id, &local_id);
+ targ_type = skip_mods_and_typedefs(targ_btf, targ_id, &targ_id);
+ if (!local_type || !targ_type)
+ return -EINVAL;
+
+ if (btf_is_composite(local_type) && btf_is_composite(targ_type))
+ return 1;
+ if (BTF_INFO_KIND(local_type->info) != BTF_INFO_KIND(targ_type->info))
+ return 0;
+
+ kind = BTF_INFO_KIND(local_type->info);
+ switch (kind) {
+ case BTF_KIND_FWD:
+ case BTF_KIND_PTR:
+ return 1;
+ case BTF_KIND_ENUM:
+ return local_type->size == targ_type->size;
+ case BTF_KIND_INT: {
+ __u32 loc_int = *(__u32 *)(local_type + 1);
+ __u32 targ_int = *(__u32 *)(targ_type + 1);
+
+ return BTF_INT_OFFSET(loc_int) == 0 &&
+ BTF_INT_OFFSET(targ_int) == 0 &&
+ local_type->size == targ_type->size &&
+ BTF_INT_BITS(loc_int) == BTF_INT_BITS(targ_int);
+ }
+ case BTF_KIND_ARRAY: {
+ const struct btf_array *loc_a, *targ_a;
+
+ loc_a = (void *)(local_type + 1);
+ targ_a = (void *)(targ_type + 1);
+ local_id = loc_a->type;
+ targ_id = targ_a->type;
+ goto recur;
+ }
+ default:
+ pr_warning("unexpected kind %d relocated, local [%d], target [%d]\n",
+ kind, local_id, targ_id);
+ return 0;
+ }
+}
+
+/*
+ * Given single high-level accessor (either named field or array index) in
+ * local type, find corresponding high-level accessor for a target type. Along
+ * the way, maintain low-level spec for target as well. Also keep updating
+ * target offset.
+ */
+static int bpf_core_match_member(const struct btf *local_btf,
+ const struct bpf_core_accessor *local_acc,
+ const struct btf *targ_btf,
+ __u32 targ_id,
+ struct bpf_core_spec *spec,
+ __u32 *next_targ_id)
+{
+ const struct btf_type *local_type, *targ_type;
+ const struct btf_member *local_member, *m;
+ const char *local_name, *targ_name;
+ __u32 local_id;
+ int i, n, found;
+
+ targ_type = skip_mods_and_typedefs(targ_btf, targ_id, &targ_id);
+ if (!targ_type)
+ return -EINVAL;
+ if (!btf_is_composite(targ_type))
+ return 0;
+
+ local_id = local_acc->type_id;
+ local_type = btf__type_by_id(local_btf, local_id);
+ local_member = (void *)(local_type + 1);
+ local_member += local_acc->idx;
+ local_name = btf__name_by_offset(local_btf, local_member->name_off);
+
+ n = BTF_INFO_VLEN(targ_type->info);
+ m = (void *)(targ_type + 1);
+ for (i = 0; i < n; i++, m++) {
+ __u32 offset;
+
+ /* bitfield relocations not supported */
+ if (BTF_INFO_KFLAG(targ_type->info)) {
+ if (BTF_MEMBER_BITFIELD_SIZE(m->offset))
+ continue;
+ offset = BTF_MEMBER_BIT_OFFSET(m->offset);
+ } else {
+ offset = m->offset;
+ }
+ if (offset % 8)
+ continue;
+
+ /* too deep struct/union/array nesting */
+ if (spec->raw_len == BPF_CORE_SPEC_MAX_LEN)
+ return -E2BIG;
+
+ /* speculate this member will be the good one */
+ spec->offset += offset / 8;
+ spec->raw_spec[spec->raw_len++] = i;
+
+ targ_name = btf__name_by_offset(targ_btf, m->name_off);
+ if (str_is_empty(targ_name)) {
+ /* embedded struct/union, we need to go deeper */
+ found = bpf_core_match_member(local_btf, local_acc,
+ targ_btf, m->type,
+ spec, next_targ_id);
+ if (found) /* either found or error */
+ return found;
+ } else if (strcmp(local_name, targ_name) == 0) {
+ /* matching named field */
+ struct bpf_core_accessor *targ_acc;
+
+ targ_acc = &spec->spec[spec->len++];
+ targ_acc->type_id = targ_id;
+ targ_acc->idx = i;
+ targ_acc->name = targ_name;
+
+ *next_targ_id = m->type;
+ found = bpf_core_fields_are_compat(local_btf,
+ local_member->type,
+ targ_btf, m->type);
+ if (!found)
+ spec->len--; /* pop accessor */
+ return found;
+ }
+ /* member turned out to be not we looked for */
+ spec->offset -= offset / 8;
+ spec->raw_len--;
+ }
+
+ return 0;
+}
+
+/*
+ * Try to match local spec to a target type and, if successful, produce full
+ * target spec (high-level, low-level + offset).
+ */
+static int bpf_core_spec_match(struct bpf_core_spec *local_spec,
+ const struct btf *targ_btf, __u32 targ_id,
+ struct bpf_core_spec *targ_spec)
+{
+ const struct btf_type *targ_type;
+ const struct bpf_core_accessor *local_acc;
+ struct bpf_core_accessor *targ_acc;
+ int i, sz, matched;
+
+ memset(targ_spec, 0, sizeof(*targ_spec));
+ targ_spec->btf = targ_btf;
+
+ local_acc = &local_spec->spec[0];
+ targ_acc = &targ_spec->spec[0];
+
+ for (i = 0; i < local_spec->len; i++, local_acc++, targ_acc++) {
+ targ_type = skip_mods_and_typedefs(targ_spec->btf, targ_id,
+ &targ_id);
+ if (!targ_type)
+ return -EINVAL;
+
+ if (local_acc->name) {
+ if (!btf_is_composite(targ_type))
+ return 0;
+
+ matched = bpf_core_match_member(local_spec->btf,
+ local_acc,
+ targ_btf, targ_id,
+ targ_spec, &targ_id);
+ if (matched <= 0)
+ return matched;
+ } else {
+ /* for i=0, targ_id is already treated as array element
+ * type (because it's the original struct), for others
+ * we should find array element type first
+ */
+ if (i > 0) {
+ const struct btf_array *a;
+
+ if (!btf_is_array(targ_type))
+ return 0;
+
+ a = (void *)(targ_type + 1);
+ if (local_acc->idx >= a->nelems)
+ return 0;
+ if (!skip_mods_and_typedefs(targ_btf, a->type,
+ &targ_id))
+ return -EINVAL;
+ }
+
+ /* too deep struct/union/array nesting */
+ if (targ_spec->raw_len == BPF_CORE_SPEC_MAX_LEN)
+ return -E2BIG;
+
+ targ_acc->type_id = targ_id;
+ targ_acc->idx = local_acc->idx;
+ targ_acc->name = NULL;
+ targ_spec->len++;
+ targ_spec->raw_spec[targ_spec->raw_len] = targ_acc->idx;
+ targ_spec->raw_len++;
+
+ sz = btf__resolve_size(targ_btf, targ_id);
+ if (sz < 0)
+ return sz;
+ targ_spec->offset += local_acc->idx * sz;
+ }
+ }
+
+ return 1;
+}
+
+/*
+ * Patch relocatable BPF instruction.
+ * Expected insn->imm value is provided for validation, as well as the new
+ * relocated value.
+ *
+ * Currently three kinds of BPF instructions are supported:
+ * 1. rX = <imm> (assignment with immediate operand);
+ * 2. rX += <imm> (arithmetic operations with immediate operand);
+ * 3. *(rX) = <imm> (indirect memory assignment with immediate operand).
+ *
+ * If actual insn->imm value is wrong, bail out.
+ */
+static int bpf_core_reloc_insn(struct bpf_program *prog, int insn_off,
+ __u32 orig_off, __u32 new_off)
+{
+ struct bpf_insn *insn;
+ int insn_idx;
+ __u8 class;
+
+ if (insn_off % sizeof(struct bpf_insn))
+ return -EINVAL;
+ insn_idx = insn_off / sizeof(struct bpf_insn);
+
+ insn = &prog->insns[insn_idx];
+ class = BPF_CLASS(insn->code);
+
+ if (class == BPF_ALU || class == BPF_ALU64) {
+ if (BPF_SRC(insn->code) != BPF_K)
+ return -EINVAL;
+ if (insn->imm != orig_off)
+ return -EINVAL;
+ insn->imm = new_off;
+ pr_debug("prog '%s': patched insn #%d (ALU/ALU64) imm %d -> %d\n",
+ bpf_program__title(prog, false),
+ insn_idx, orig_off, new_off);
+ } else if (class == BPF_ST && BPF_MODE(insn->code) == BPF_MEM) {
+ if (insn->imm != orig_off)
+ return -EINVAL;
+ insn->imm = new_off;
+ pr_debug("prog '%s': patched insn #%d (ST | MEM) imm %d -> %d\n",
+ bpf_program__title(prog, false),
+ insn_idx, orig_off, new_off);
+ } else {
+ pr_warning("prog '%s': trying to relocate unrecognized insn #%d, code:%x, src:%x, dst:%x, off:%x, imm:%x\n",
+ bpf_program__title(prog, false),
+ insn_idx, insn->code, insn->src_reg, insn->dst_reg,
+ insn->off, insn->imm);
+ return -EINVAL;
+ }
+ return 0;
+}
+
+/*
+ * Probe few well-known locations for vmlinux kernel image and try to load BTF
+ * data out of it to use for target BTF.
+ */
+static struct btf *bpf_core_find_kernel_btf(void)
+{
+ const char *locations[] = {
+ "/lib/modules/%1$s/vmlinux-%1$s",
+ "/usr/lib/modules/%1$s/kernel/vmlinux",
+ };
+ char path[PATH_MAX + 1];
+ struct utsname buf;
+ struct btf *btf;
+ int i, err;
+
+ err = uname(&buf);
+ if (err) {
+ pr_warning("failed to uname(): %d\n", err);
+ return ERR_PTR(err);
+ }
+
+ for (i = 0; i < ARRAY_SIZE(locations); i++) {
+ snprintf(path, PATH_MAX, locations[i], buf.release);
+ pr_debug("attempting to load kernel BTF from '%s'\n", path);
+
+ if (access(path, R_OK))
+ continue;
+
+ btf = btf__parse_elf(path, NULL);
+ if (IS_ERR(btf))
+ continue;
+
+ pr_debug("successfully loaded kernel BTF from '%s'\n", path);
+ return btf;
+ }
+
+ pr_warning("failed to find valid kernel BTF\n");
+ return ERR_PTR(-ESRCH);
+}
+
+static size_t bpf_core_hash_fn(const void *key, void *ctx)
+{
+ return (size_t)key;
+}
+
+static bool bpf_core_equal_fn(const void *k1, const void *k2, void *ctx)
+{
+ return k1 == k2;
+}
+
+static void *u32_to_ptr(__u32 x)
+{
+ return (void *)(uintptr_t)x;
+}
+
+/*
+ * CO-RE relocate single instruction.
+ *
+ * The outline and important points of the algorithm:
+ * 1. For given local type, find corresponding candidate target types.
+ * Candidate type is a type with the same "essential" name, ignoring
+ * everything after last triple underscore (___). E.g., `sample`,
+ * `sample___flavor_one`, `sample___flavor_another_one`, are all candidates
+ * for each other. Names with triple underscore are referred to as
+ * "flavors" and are useful, among other things, to allow to
+ * specify/support incompatible variations of the same kernel struct, which
+ * might differ between different kernel versions and/or build
+ * configurations.
+ * 2. For each candidate type, try to match local specification to this
+ * candidate target type. Matching involves finding corresponding
+ * high-level spec accessors, meaning that all named fields should match,
+ * as well as all array accesses should be within the actual bounds. Also,
+ * types should be compatible (see bpf_core_fields_are_compat for details).
+ * 3. It is supported and expected that there might be multiple flavors
+ * matching the spec. As long as all the specs resolve to the same set of
+ * offsets across all candidates, there is not error. If there is any
+ * ambiguity, CO-RE relocation will fail. This is necessary to accomodate
+ * imprefection of BTF deduplication, which can cause slight duplication of
+ * the same BTF type, if some directly or indirectly referenced (by
+ * pointer) type gets resolved to different actual types in different
+ * object files. If such situation occurs, deduplicated BTF will end up
+ * with two (or more) structurally identical types, which differ only in
+ * types they refer to through pointer. This should be OK in most cases and
+ * is not an error.
+ * 4. Candidate types search is performed by linearly scanning through all
+ * types in target BTF. It is anticipated that this is overall more
+ * efficient memory-wise and not significantly worse (if not better)
+ * CPU-wise compared to prebuilding a map from all local type names to
+ * a list of candidate type names. It's also sped up by caching resolved
+ * list of matching candidates per each local "root" type ID, that has at
+ * least one bpf_offset_reloc associated with it. This list is shared
+ * between multiple relocations for the same type ID and is updated as some
+ * of the candidates are pruned due to structural incompatibility.
+ */
+static int bpf_core_reloc_offset(struct bpf_program *prog,
+ const struct bpf_offset_reloc *relo,
+ int relo_idx,
+ const struct btf *local_btf,
+ const struct btf *targ_btf,
+ struct hashmap *cand_cache)
+{
+ const char *prog_name = bpf_program__title(prog, false);
+ struct bpf_core_spec local_spec, cand_spec, targ_spec;
+ const void *type_key = u32_to_ptr(relo->type_id);
+ const struct btf_type *local_type, *cand_type;
+ const char *local_name, *cand_name;
+ struct ids_vec *cand_ids;
+ __u32 local_id, cand_id;
+ const char *spec_str;
+ int i, j, err;
+
+ local_id = relo->type_id;
+ local_type = btf__type_by_id(local_btf, local_id);
+ if (!local_type)
+ return -EINVAL;
+
+ local_name = btf__name_by_offset(local_btf, local_type->name_off);
+ if (str_is_empty(local_name))
+ return -EINVAL;
+
+ spec_str = btf__name_by_offset(local_btf, relo->access_str_off);
+ if (str_is_empty(spec_str))
+ return -EINVAL;
+
+ pr_debug("prog '%s': relo #%d: insn_off=%d, [%d] (%s) + %s\n",
+ prog_name, relo_idx, relo->insn_off,
+ local_id, local_name, spec_str);
+
+ err = bpf_core_spec_parse(local_btf, local_id, spec_str, &local_spec);
+ if (err) {
+ pr_warning("prog '%s': relo #%d: parsing [%d] (%s) + %s failed: %d\n",
+ prog_name, relo_idx, local_id, local_name, spec_str,
+ err);
+ return -EINVAL;
+ }
+ pr_debug("prog '%s': relo #%d: [%d] (%s) + %s is off %u, len %d, raw_len %d\n",
+ prog_name, relo_idx, local_id, local_name, spec_str,
+ local_spec.offset, local_spec.len, local_spec.raw_len);
+
+ if (!hashmap__find(cand_cache, type_key, (void **)&cand_ids)) {
+ cand_ids = bpf_core_find_cands(local_btf, local_id, targ_btf);
+ if (IS_ERR(cand_ids)) {
+ pr_warning("prog '%s': relo #%d: target candidate search failed for [%d] (%s) + %s: %ld\n",
+ prog_name, relo_idx, local_id, local_name,
+ spec_str, PTR_ERR(cand_ids));
+ return PTR_ERR(cand_ids);
+ }
+ err = hashmap__set(cand_cache, type_key, cand_ids, NULL, NULL);
+ if (err) {
+ bpf_core_free_cands(cand_ids);
+ return err;
+ }
+ }
+
+ for (i = 0, j = 0; i < cand_ids->len; i++) {
+ cand_id = cand_ids->data[j];
+ cand_type = btf__type_by_id(targ_btf, cand_id);
+ cand_name = btf__name_by_offset(targ_btf, cand_type->name_off);
+
+ err = bpf_core_spec_match(&local_spec, targ_btf,
+ cand_id, &cand_spec);
+ if (err < 0) {
+ pr_warning("prog '%s': relo #%d: failed to match spec [%d] (%s) + %s to candidate #%d [%d] (%s): %d\n",
+ prog_name, relo_idx, local_id, local_name,
+ spec_str, i, cand_id, cand_name, err);
+ return err;
+ }
+ if (err == 0) {
+ pr_debug("prog '%s': relo #%d: candidate #%d [%d] (%s) doesn't match spec\n",
+ prog_name, relo_idx, i, cand_id, cand_name);
+ continue;
+ }
+
+ pr_debug("prog '%s': relo #%d: candidate #%d ([%d] %s) is off %u, len %d, raw_len %d\n",
+ prog_name, relo_idx, i, cand_id, cand_name,
+ cand_spec.offset, cand_spec.len, cand_spec.raw_len);
+
+ if (j == 0) {
+ targ_spec = cand_spec;
+ } else if (cand_spec.offset != targ_spec.offset) {
+ /* if there are many candidates, they should all
+ * resolve to the same offset
+ */
+ pr_warning("prog '%s': relo #%d: candidate #%d ([%d] %s): conflicting offset found (%u != %u)\n",
+ prog_name, relo_idx, i, cand_id, cand_name,
+ cand_spec.offset, targ_spec.offset);
+ return -EINVAL;
+ }
+
+ cand_ids->data[j++] = cand_spec.spec[0].type_id;
+ }
+
+ cand_ids->len = j;
+ if (cand_ids->len == 0) {
+ pr_warning("prog '%s': relo #%d: no matching targets found for [%d] (%s) + %s\n",
+ prog_name, relo_idx, local_id, local_name, spec_str);
+ return -ESRCH;
+ }
+
+ err = bpf_core_reloc_insn(prog, relo->insn_off,
+ local_spec.offset, targ_spec.offset);
+ if (err) {
+ pr_warning("prog '%s': relo #%d: failed to patch insn at offset %d: %d\n",
+ prog_name, relo_idx, relo->insn_off, err);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int
+bpf_core_reloc_offsets(struct bpf_object *obj, const char *targ_btf_path)
+{
+ const struct btf_ext_info_sec *sec;
+ const struct bpf_offset_reloc *rec;
+ const struct btf_ext_info *seg;
+ struct hashmap_entry *entry;
+ struct hashmap *cand_cache = NULL;
+ struct bpf_program *prog;
+ struct btf *targ_btf;
+ const char *sec_name;
+ int i, err = 0;
+
+ if (targ_btf_path)
+ targ_btf = btf__parse_elf(targ_btf_path, NULL);
+ else
+ targ_btf = bpf_core_find_kernel_btf();
+ if (IS_ERR(targ_btf)) {
+ pr_warning("failed to get target BTF: %ld\n",
+ PTR_ERR(targ_btf));
+ return PTR_ERR(targ_btf);
+ }
+
+ cand_cache = hashmap__new(bpf_core_hash_fn, bpf_core_equal_fn, NULL);
+ if (IS_ERR(cand_cache)) {
+ err = PTR_ERR(cand_cache);
+ goto out;
+ }
+
+ seg = &obj->btf_ext->offset_reloc_info;
+ for_each_btf_ext_sec(seg, sec) {
+ sec_name = btf__name_by_offset(obj->btf, sec->sec_name_off);
+ if (str_is_empty(sec_name)) {
+ err = -EINVAL;
+ goto out;
+ }
+ prog = bpf_object__find_program_by_title(obj, sec_name);
+ if (!prog) {
+ pr_warning("failed to find program '%s' for CO-RE offset relocation\n",
+ sec_name);
+ err = -EINVAL;
+ goto out;
+ }
+
+ pr_debug("prog '%s': performing %d CO-RE offset relocs\n",
+ sec_name, sec->num_info);
+
+ for_each_btf_ext_rec(seg, sec, i, rec) {
+ err = bpf_core_reloc_offset(prog, rec, i, obj->btf,
+ targ_btf, cand_cache);
+ if (err) {
+ pr_warning("prog '%s': relo #%d: failed to relocate: %d\n",
+ sec_name, i, err);
+ goto out;
+ }
+ }
+ }
+
+out:
+ btf__free(targ_btf);
+ if (!IS_ERR_OR_NULL(cand_cache)) {
+ hashmap__for_each_entry(cand_cache, entry, i) {
+ bpf_core_free_cands(entry->value);
+ }
+ hashmap__free(cand_cache);
+ }
+ return err;
+}
+
+static int
+bpf_object__relocate_core(struct bpf_object *obj, const char *targ_btf_path)
+{
+ int err = 0;
+
+ if (obj->btf_ext->offset_reloc_info.len)
+ err = bpf_core_reloc_offsets(obj, targ_btf_path);
+
+ return err;
+}
+
static int
bpf_program__reloc_text(struct bpf_program *prog, struct bpf_object *obj,
struct reloc_desc *relo)
@@ -2396,14 +3243,21 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
return 0;
}
-
static int
-bpf_object__relocate(struct bpf_object *obj)
+bpf_object__relocate(struct bpf_object *obj, const char *targ_btf_path)
{
struct bpf_program *prog;
size_t i;
int err;
+ if (obj->btf_ext) {
+ err = bpf_object__relocate_core(obj, targ_btf_path);
+ if (err) {
+ pr_warning("failed to perform CO-RE relocations: %d\n",
+ err);
+ return err;
+ }
+ }
for (i = 0; i < obj->nr_programs; i++) {
prog = &obj->programs[i];
@@ -2804,7 +3658,7 @@ int bpf_object__load_xattr(struct bpf_object_load_attr *attr)
obj->loaded = true;
CHECK_ERR(bpf_object__create_maps(obj), err, out);
- CHECK_ERR(bpf_object__relocate(obj), err, out);
+ CHECK_ERR(bpf_object__relocate(obj, attr->target_btf_path), err, out);
CHECK_ERR(bpf_object__load_progs(obj, attr->log_level), err, out);
return 0;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 5cbf459ece0b..6004bfe1ebf0 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -92,6 +92,7 @@ LIBBPF_API void bpf_object__close(struct bpf_object *object);
struct bpf_object_load_attr {
struct bpf_object *obj;
int log_level;
+ const char *target_btf_path;
};
/* Load/unload object into/from kernel */
--
2.17.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox