Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [RFC] dt-bindings: net: phy: Add subnode for LED configuration
From: Matthias Kaehlcke @ 2019-07-25 17:52 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David S . Miller, Rob Herring, Mark Rutland, Florian Fainelli,
	Heiner Kallweit, netdev, devicetree, linux-kernel,
	Douglas Anderson
In-Reply-To: <20190724180430.GB28488@lunn.ch>

Hi Andrew,

On Wed, Jul 24, 2019 at 08:04:30PM +0200, Andrew Lunn wrote:
> On Mon, Jul 22, 2019 at 03:37:41PM -0700, Matthias Kaehlcke wrote:
> > The LED behavior of some Ethernet PHYs is configurable. Add an
> > optional 'leds' subnode with a child node for each LED to be
> > configured. The binding aims to be compatible with the common
> > LED binding (see devicetree/bindings/leds/common.txt).
> > 
> > A LED can be configured to be 'on' when a link with a certain speed
> > is active, or to blink on RX/TX activity. For the configuration to
> > be effective it needs to be supported by the hardware and the
> > corresponding PHY driver.
> > 
> > Suggested-by: Andrew Lunn <andrew@lunn.ch>
> > Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
> > ---
> > This RFC is a follow up of the discussion on "[PATCH v2 6/7]
> > dt-bindings: net: realtek: Add property to configure LED mode"
> > (https://lore.kernel.org/patchwork/patch/1097185/).
> > 
> > For now posting as RFC to get a basic agreement on the bindings
> > before proceding with the implementation in phylib and a specific
> > driver.
> > ---
> >  Documentation/devicetree/bindings/net/phy.txt | 33 +++++++++++++++++++
> >  1 file changed, 33 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/net/phy.txt b/Documentation/devicetree/bindings/net/phy.txt
> > index 9b9e5b1765dd..ad495d3abbbb 100644
> > --- a/Documentation/devicetree/bindings/net/phy.txt
> > +++ b/Documentation/devicetree/bindings/net/phy.txt
> > @@ -46,6 +46,25 @@ Optional Properties:
> >    Mark the corresponding energy efficient ethernet mode as broken and
> >    request the ethernet to stop advertising it.
> >  
> > +- leds: A sub-node which is a container of only LED nodes. Each child
> > +    node represents a PHY LED.
> > +
> > +  Required properties for LED child nodes:
> > +  - reg: The ID number of the LED, typically corresponds to a hardware ID.
> > +
> > +  Optional properties for child nodes:
> > +  - label: The label for this LED. If omitted, the label is taken from the node
> > +    name (excluding the unit address). It has to uniquely identify a device,
> > +    i.e. no other LED class device can be assigned the same label.
> 
> Hi Matthias
> 
> I've thought about label a bit more. 
> 
> > +			label = "ethphy0:left:green";
> 
> We need to be careful with names here. systemd etc renames
> interfaces. ethphy0 could in fact be connected to enp3s0, or eth0
> might get renamed to eth1, etc. So i think we should avoid things like
> ethphy0.

Agreed, this could be problematic.

> Also, i'm not sure we actually need a label, at least not to
> start with.Do we have any way to expose it to the user?

As of now I don't plan to expose the label to userspace by the PHY
driver/framework itself.

From my side we can omit the label for now and add it later if needed.

> If we do ever make it part of the generic LED framework, we can then
> use the label. At that point, i would probably combine the label with
> the interface name in a dynamic way to avoid issues like this.

Sounds good.

Thanks

Matthias

^ permalink raw reply

* Re: [PATCH v6 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions
From: Jason Gunthorpe @ 2019-07-25 17:55 UTC (permalink / raw)
  To: Michal Kalderon, Kamal Heib
  Cc: ariel.elior, dledford, galpress, linux-rdma, davem, netdev
In-Reply-To: <20190709141735.19193-2-michal.kalderon@marvell.com>

On Tue, Jul 09, 2019 at 05:17:30PM +0300, Michal Kalderon wrote:
> Create some common API's for adding entries to a xa_mmap.
> Searching for an entry and freeing one.
> 
> The code was copied from the efa driver almost as is, just renamed
> function to be generic and not efa specific.
> 
> Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
>  drivers/infiniband/core/device.c      |   1 +
>  drivers/infiniband/core/rdma_core.c   |   1 +
>  drivers/infiniband/core/uverbs_cmd.c  |   1 +
>  drivers/infiniband/core/uverbs_main.c | 135 ++++++++++++++++++++++++++++++++++
>  include/rdma/ib_verbs.h               |  46 ++++++++++++
>  5 files changed, 184 insertions(+)
> 
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index 8a6ccb936dfe..a830c2c5d691 100644
> +++ b/drivers/infiniband/core/device.c
> @@ -2521,6 +2521,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
>  	SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
>  	SET_DEVICE_OP(dev_ops, map_phys_fmr);
>  	SET_DEVICE_OP(dev_ops, mmap);
> +	SET_DEVICE_OP(dev_ops, mmap_free);
>  	SET_DEVICE_OP(dev_ops, modify_ah);
>  	SET_DEVICE_OP(dev_ops, modify_cq);
>  	SET_DEVICE_OP(dev_ops, modify_device);
> diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
> index ccf4d069c25c..1ed01b02401f 100644
> +++ b/drivers/infiniband/core/rdma_core.c
> @@ -816,6 +816,7 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
>  
>  	rdma_restrack_del(&ucontext->res);
>  
> +	rdma_user_mmap_entries_remove_free(ucontext);
>  	ib_dev->ops.dealloc_ucontext(ucontext);
>  	kfree(ucontext);
>  
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index 7ddd0e5bc6b3..44c0600245e4 100644
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -254,6 +254,7 @@ static int ib_uverbs_get_context(struct uverbs_attr_bundle *attrs)
>  
>  	mutex_init(&ucontext->per_mm_list_lock);
>  	INIT_LIST_HEAD(&ucontext->per_mm_list);
> +	xa_init(&ucontext->mmap_xa);
>  
>  	ret = get_unused_fd_flags(O_CLOEXEC);
>  	if (ret < 0)
> diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
> index 11c13c1381cf..4b909d7b97de 100644
> +++ b/drivers/infiniband/core/uverbs_main.c
> @@ -965,6 +965,141 @@ int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
>  }
>  EXPORT_SYMBOL(rdma_user_mmap_io);
>  
> +static inline u64
> +rdma_user_mmap_get_key(const struct rdma_user_mmap_entry *entry)
> +{
> +	return (u64)entry->mmap_page << PAGE_SHIFT;
> +}
> +
> +/**
> + * rdma_user_mmap_entry_get() - Get an entry from the mmap_xa.
> + *
> + * @ucontext: associated user context.
> + * @key: The key received from rdma_user_mmap_entry_insert which
> + *     is provided by user as the address to map.
> + * @len: The length the user wants to map
> + *
> + * This function is called when a user tries to mmap a key it
> + * initially received from the driver. They key was created by
> + * the function rdma_user_mmap_entry_insert.
> + *
> + * Return an entry if exists or NULL if there is no match.
> + */
> +struct rdma_user_mmap_entry *
> +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len)
> +{
> +	struct rdma_user_mmap_entry *entry;
> +	u64 mmap_page;
> +
> +	mmap_page = key >> PAGE_SHIFT;
> +	if (mmap_page > U32_MAX)
> +		return NULL;
> +
> +	entry = xa_load(&ucontext->mmap_xa, mmap_page);
> +	if (!entry || entry->length != len)
> +		return NULL;
> +
> +	ibdev_dbg(ucontext->device,
> +		  "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
> +		  entry->obj, key, entry->address, entry->length);
> +
> +	return entry;
> +}
> +EXPORT_SYMBOL(rdma_user_mmap_entry_get);

It is a mistake we keep making, and maybe the war is hopelessly lost
now, but functions called from a driver should not be part of the
ib_uverbs module - ideally uverbs is an optional module. They should
be in ib_core.

Maybe put this in ib_core_uverbs.c ?

Kamal, you've been tackling various cleanups, maybe making ib_uverbs
unloadable again is something you'd be keen on?

> +/**
> + * rdma_user_mmap_entry_insert() - Allocate and insert an entry to the mmap_xa.
> + *
> + * @ucontext: associated user context.
> + * @obj: opaque driver object that will be stored in the entry.
> + * @address: The address that will be mmapped to the user
> + * @length: Length of the address that will be mmapped
> + * @mmap_flag: opaque driver flags related to the address (For
> + *           example could be used for cachability)
> + *
> + * This function should be called by drivers that use the rdma_user_mmap
> + * interface for handling user mmapped addresses. The database is handled in
> + * the core and helper functions are provided to insert entries into the
> + * database and extract entries when the user call mmap with the given key.
> + * The function returns a unique key that should be provided to user, the user
> + * will use the key to map the given address.
> + *
> + * Note this locking scheme cannot support removal of entries,
> + * except during ucontext destruction when the core code
> + * guarentees no concurrency.
> + *
> + * Return: unique key or RDMA_USER_MMAP_INVALID if entry was not added.
> + */
> +u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext, void *obj,
> +				u64 address, u64 length, u8 mmap_flag)
> +{
> +	struct rdma_user_mmap_entry *entry;
> +	u32 next_mmap_page;
> +	int err;
> +
> +	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> +	if (!entry)
> +		return RDMA_USER_MMAP_INVALID;
> +
> +	entry->obj = obj;
> +	entry->address = address;
> +	entry->length = length;
> +	entry->mmap_flag = mmap_flag;
> +
> +	xa_lock(&ucontext->mmap_xa);
> +	if (check_add_overflow(ucontext->mmap_xa_page,
> +			       (u32)(length >> PAGE_SHIFT),

Should this be divide round up ?

> +			       &next_mmap_page))
> +		goto err_unlock;

I still don't like that this algorithm latches into a permanent
failure when the xa_page wraps.

It seems worth spending a bit more time here to tidy this.. Keep using
the mmap_xa_page scheme, but instead do something like

alloc_cyclic_range():

while () {
   // Find first empty element in a cyclic way
   xa_page_first = mmap_xa_page;
   xa_find(xa, &xa_page_first, U32_MAX, XA_FREE_MARK)

   // Is there a enough room to have the range?
   if (check_add_overflow(xa_page_first, npages, &xa_page_end)) {
      mmap_xa_page = 0;
      continue;
   }

   // See if the element before intersects 
   elm = xa_find(xa, &zero, xa_page_end, 0);
   if (elm && intersects(xa_page_first, xa_page_last, elm->first, elm->last)) {
      mmap_xa_page = elm->last + 1;
      continue
   }
  
   // xa_page_first -> xa_page_end should now be free
   xa_insert(xa, xa_page_start, entry);
   mmap_xa_page = xa_page_end + 1;
   return xa_page_start;
}

Approximately, please check it.

> @@ -2199,6 +2201,17 @@ struct iw_cm_conn_param;
>  
>  #define DECLARE_RDMA_OBJ_SIZE(ib_struct) size_t size_##ib_struct
>  
> +#define RDMA_USER_MMAP_FLAG_SHIFT 56
> +#define RDMA_USER_MMAP_PAGE_MASK GENMASK(EFA_MMAP_FLAG_SHIFT - 1, 0)
> +#define RDMA_USER_MMAP_INVALID U64_MAX
> +struct rdma_user_mmap_entry {
> +	void *obj;
> +	u64 address;
> +	u64 length;
> +	u32 mmap_page;
> +	u8 mmap_flag;
> +};
> +
>  /**
>   * struct ib_device_ops - InfiniBand device operations
>   * This structure defines all the InfiniBand device operations, providers will
> @@ -2311,6 +2324,19 @@ struct ib_device_ops {
>  			      struct ib_udata *udata);
>  	void (*dealloc_ucontext)(struct ib_ucontext *context);
>  	int (*mmap)(struct ib_ucontext *context, struct vm_area_struct *vma);
> +	/**
> +	 * Memory that is mapped to the user can only be freed once the
> +	 * ucontext of the application is destroyed. This is for
> +	 * security reasons where we don't want an application to have a
> +	 * mapping to phyiscal memory that is freed and allocated to
> +	 * another application. For this reason, all the entries are
> +	 * stored in ucontext and once ucontext is freed mmap_free is
> +	 * called on each of the entries. They type of the memory that

They -> the

> +	 * was mapped may differ between entries and is opaque to the
> +	 * rdma_user_mmap interface. Therefore needs to be implemented
> +	 * by the driver in mmap_free.
> +	 */
> +	void (*mmap_free)(struct rdma_user_mmap_entry *entry);
>  	void (*disassociate_ucontext)(struct ib_ucontext *ibcontext);
>  	int (*alloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
>  	void (*dealloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
> @@ -2709,6 +2735,11 @@ void ib_set_device_ops(struct ib_device *device,
>  #if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)
>  int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
>  		      unsigned long pfn, unsigned long size, pgprot_t prot);
> +u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext, void *obj,
> +				u64 address, u64 length, u8 mmap_flag);
> +struct rdma_user_mmap_entry *
> +rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len);
> +void rdma_user_mmap_entries_remove_free(struct ib_ucontext
> *ucontext);

Should remove_free should be in the core-priv header?

Jason

^ permalink raw reply

* Re: [PATCH v6 rdma-next 5/6] RDMA/qedr: Add doorbell overflow recovery support
From: Jason Gunthorpe @ 2019-07-25 18:01 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: ariel.elior, dledford, galpress, linux-rdma, davem, netdev
In-Reply-To: <20190709141735.19193-6-michal.kalderon@marvell.com>

On Tue, Jul 09, 2019 at 05:17:34PM +0300, Michal Kalderon wrote:

> +static int qedr_init_user_db_rec(struct ib_udata *udata,
> +				 struct qedr_dev *dev, struct qedr_userq *q,
> +				 bool requires_db_rec)
> +{
> +	struct qedr_ucontext *uctx =
> +		rdma_udata_to_drv_context(udata, struct qedr_ucontext,
> +					  ibucontext);
> +
> +	/* Aborting for non doorbell userqueue (SRQ) or non-supporting lib */
> +	if (requires_db_rec == 0 || !uctx->db_rec)
> +		return 0;
> +
> +	/* Allocate a page for doorbell recovery, add to mmap ) */
> +	q->db_rec_data = (void *)get_zeroed_page(GFP_KERNEL);

I now think this needs to be GFP_USER and our other drivers have a bug
here as well..

Jason

^ permalink raw reply

* Re: [PATCH v6 rdma-next 0/6] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA
From: Jason Gunthorpe @ 2019-07-25 18:01 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: ariel.elior, dledford, galpress, linux-rdma, davem, netdev,
	Bernard Metzler
In-Reply-To: <20190709141735.19193-1-michal.kalderon@marvell.com>

On Tue, Jul 09, 2019 at 05:17:29PM +0300, Michal Kalderon wrote:
> This patch series uses the doorbell overflow recovery mechanism
> introduced in
> commit 36907cd5cd72 ("qed: Add doorbell overflow recovery mechanism")
> for rdma ( RoCE and iWARP )
> 
> The first three patches modify the core code to contain helper
> functions for managing mmap_xa inserting, getting and freeing
> entries. The code was taken almost as is from the efa driver.
> There is still an open discussion on whether we should take
> this even further and make the entire mmap generic. Until a
> decision is made, I only created the database API and modified
> the efa and qedr driver to use it. The doorbell recovery code will be based
> on the common code.
> 
> Efa driver was compile tested only.
> 
> rdma-core pull request #493
> 
> Changes from V5:
> - Switch between driver dealloc_ucontext and mmap_entries_remove.
> - No need to verify the key after using the key to load an entry from
>   the mmap_xa.
> - Change mmap_free api to pass an 'entry' object.
> - Add documentation for mmap_free and for newly exported functions.
> - Fix some extra/missing line breaks.

Lets do SIW now as well, it has the same xa scheme copied from EFA

Thanks,
Jason

^ permalink raw reply

* Re: [PATCH v6 rdma-next 4/6] qed*: Change dpi_addr to be denoted with __iomem
From: Jason Gunthorpe @ 2019-07-25 18:06 UTC (permalink / raw)
  To: Michal Kalderon
  Cc: ariel.elior, dledford, galpress, linux-rdma, davem, netdev
In-Reply-To: <20190709141735.19193-5-michal.kalderon@marvell.com>

On Tue, Jul 09, 2019 at 05:17:33PM +0300, Michal Kalderon wrote:
> Several casts were required around dpi_addr parameter in qed_rdma_if.h
> This is an address on the doorbell bar and should therefore be marked
> with __iomem.
> 
> Reported-by: Jason Gunthorpe <jgg@mellanox.com>
> Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
>  drivers/infiniband/hw/qedr/main.c          | 2 +-
>  drivers/infiniband/hw/qedr/qedr.h          | 2 +-
>  drivers/net/ethernet/qlogic/qed/qed_rdma.c | 5 ++---
>  include/linux/qed/qed_rdma_if.h            | 2 +-
>  4 files changed, 5 insertions(+), 6 deletions(-)

More lines are RDMA than net, so this patch applied to for-next

Thanks,
Jason

^ permalink raw reply

* [PATCH net] net: qualcomm: rmnet: Fix incorrect UL checksum offload logic
From: Subash Abhinov Kasiviswanathan @ 2019-07-25 18:07 UTC (permalink / raw)
  To: davem, netdev; +Cc: Subash Abhinov Kasiviswanathan, Sean Tranchetti

The udp_ip4_ind bit is set only for IPv4 UDP non-fragmented packets
so that the hardware can flip the checksum to 0xFFFF if the computed
checksum is 0 per RFC768.

However, this bit had to be set for IPv6 UDP non fragmented packets
as well per hardware requirements. Otherwise, IPv6 UDP packets
with computed checksum as 0 were transmitted by hardware and were
dropped in the network.

In addition to setting this bit for IPv6 UDP, the field is also
appropriately renamed to udp_ind as part of this change.

Fixes: 5eb5f8608ef1 ("net: qualcomm: rmnet: Add support for TX checksum offload")
Cc: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c | 13 +++++++++----
 include/linux/if_rmnet.h                             |  4 ++--
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
index 6018992..21d3816 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
@@ -206,9 +206,9 @@ static void rmnet_map_complement_ipv4_txporthdr_csum_field(void *iphdr)
 	ul_header->csum_insert_offset = skb->csum_offset;
 	ul_header->csum_enabled = 1;
 	if (ip4h->protocol == IPPROTO_UDP)
-		ul_header->udp_ip4_ind = 1;
+		ul_header->udp_ind = 1;
 	else
-		ul_header->udp_ip4_ind = 0;
+		ul_header->udp_ind = 0;
 
 	/* Changing remaining fields to network order */
 	hdr++;
@@ -239,6 +239,7 @@ static void rmnet_map_complement_ipv6_txporthdr_csum_field(void *ip6hdr)
 			      struct rmnet_map_ul_csum_header *ul_header,
 			      struct sk_buff *skb)
 {
+	struct ipv6hdr *ip6h = (struct ipv6hdr *)ip6hdr;
 	__be16 *hdr = (__be16 *)ul_header, offset;
 
 	offset = htons((__force u16)(skb_transport_header(skb) -
@@ -246,7 +247,11 @@ static void rmnet_map_complement_ipv6_txporthdr_csum_field(void *ip6hdr)
 	ul_header->csum_start_offset = offset;
 	ul_header->csum_insert_offset = skb->csum_offset;
 	ul_header->csum_enabled = 1;
-	ul_header->udp_ip4_ind = 0;
+
+	if (ip6h->nexthdr == IPPROTO_UDP)
+		ul_header->udp_ind = 1;
+	else
+		ul_header->udp_ind = 0;
 
 	/* Changing remaining fields to network order */
 	hdr++;
@@ -419,7 +424,7 @@ void rmnet_map_checksum_uplink_packet(struct sk_buff *skb,
 	ul_header->csum_start_offset = 0;
 	ul_header->csum_insert_offset = 0;
 	ul_header->csum_enabled = 0;
-	ul_header->udp_ip4_ind = 0;
+	ul_header->udp_ind = 0;
 
 	priv->stats.csum_sw++;
 }
diff --git a/include/linux/if_rmnet.h b/include/linux/if_rmnet.h
index b4f5403..9661416 100644
--- a/include/linux/if_rmnet.h
+++ b/include/linux/if_rmnet.h
@@ -41,11 +41,11 @@ struct rmnet_map_ul_csum_header {
 	__be16 csum_start_offset;
 #if defined(__LITTLE_ENDIAN_BITFIELD)
 	u16 csum_insert_offset:14;
-	u16 udp_ip4_ind:1;
+	u16 udp_ind:1;
 	u16 csum_enabled:1;
 #elif defined (__BIG_ENDIAN_BITFIELD)
 	u16 csum_enabled:1;
-	u16 udp_ip4_ind:1;
+	u16 udp_ind:1;
 	u16 csum_insert_offset:14;
 #else
 #error	"Please fix <asm/byteorder.h>"
-- 
1.9.1


^ permalink raw reply related

* Re: [PATCH] net: sfc: falcon: convert to i2c_new_dummy_device
From: David Miller @ 2019-07-25 18:19 UTC (permalink / raw)
  To: wsa+renesas
  Cc: linux-i2c, linux-net-drivers, ecree, mhabets, netdev,
	linux-kernel
In-Reply-To: <20190722172635.4535-1-wsa+renesas@sang-engineering.com>

From: Wolfram Sang <wsa+renesas@sang-engineering.com>
Date: Mon, 22 Jul 2019 19:26:35 +0200

> Move from i2c_new_dummy() to i2c_new_dummy_device(). So, we now get an
> ERRPTR which we use in error handling.
> 
> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>

Applied.

^ permalink raw reply

* Re: [PATCH] sis900: add support for ethtool --eeprom-dump
From: Andrew Lunn @ 2019-07-25 18:20 UTC (permalink / raw)
  To: Sergej Benilov; +Cc: venza, netdev
In-Reply-To: <CAC9-QvATLW0uCzGpeY1kLXs5BBsfNBF_BKCnCz+38_f+STJhog@mail.gmail.com>

On Thu, Jul 25, 2019 at 06:41:41PM +0200, Sergej Benilov wrote:
> On Thu, 25 Jul 2019 at 18:25, Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > +static int sis900_read_eeprom(struct net_device *net_dev, u8 *buf)
> > > +{
> > > +     struct sis900_private *sis_priv = netdev_priv(net_dev);
> > > +     void __iomem *ioaddr = sis_priv->ioaddr;
> > > +     int wait, ret = -EAGAIN;
> > > +     u16 signature;
> > > +     u16 *ebuf = (u16 *)buf;
> > > +     int i;
> > > +
> > > +     if (sis_priv->chipset_rev == SIS96x_900_REV) {
> > > +             sw32(mear, EEREQ);
> > > +             for (wait = 0; wait < 2000; wait++) {
> > > +                     if (sr32(mear) & EEGNT) {
> > > +                             /* read 16 bits, and index by 16 bits */
> > > +                             for (i = 0; i < sis_priv->eeprom_size / 2; i++)
> > > +                                     ebuf[i] = (u16)read_eeprom(ioaddr, i);
> > > +                     ret = 0;
> > > +                     break;
> > > +                     }
> > > +             udelay(1);
> > > +             }
> > > +     sw32(mear, EEDONE);
> >
> > The indentation looks all messed up here.
> 
> This has passed ./scripts/checkpatch.pl, as you had suggested for the
> previous patch.

checkpatch just checks for things like tabs vs space. 

I would expect the indentation to be more like:


     	if (sis_priv->chipset_rev == SIS96x_900_REV) {
             	sw32(mear, EEREQ);
		for (wait = 0; wait < 2000; wait++) {
			if (sr32(mear) & EEGNT) {
				/* read 16 bits, and index by 16 bits */
				for (i = 0; i < sis_priv->eeprom_size / 2; i++)
					ebuf[i] = (u16)read_eeprom(ioaddr, i);
				ret = 0;
				break;
			}
			udelay(1);
		}
		sw32(mear, EEDONE);
	} else {
		signature = (u16)read_eeprom(ioaddr, EEPROMSignature);
		if (signature != 0xffff && signature != 0x0000) {
			/* read 16 bits, and index by 16 bits */
			for (i = 0; i < sis_priv->eeprom_size / 2; i++)
				ebuf[i] = (u16)read_eeprom(ioaddr, i);
			ret = 0;
		}
	}
	return ret;

> > Why do you not put the data directly into data and avoid this memory
> > allocation, and memcpy?
> 
> Because EEPROM data from 'eeprom->offset' offset and of 'eeprom->len'
> length only is expected to be returned in 'data'.

O.K.

	Andrew

^ permalink raw reply

* Re: [PATCH] net-next: ag71xx: Rearrange ag711xx struct to remove holes
From: David Miller @ 2019-07-25 18:21 UTC (permalink / raw)
  To: rosenp; +Cc: netdev
In-Reply-To: <20190723034309.16492-1-rosenp@gmail.com>

From: Rosen Penev <rosenp@gmail.com>
Date: Mon, 22 Jul 2019 20:43:09 -0700

> Removed ____cacheline_aligned attribute to ring structs. This actually
> causes holes in the ag71xx struc as well as lower performance.

If you are legitimizing a change because of performance you must provide
detailed performance results that support this reason.

I'm not applying this patch until you respin it with the required
information.

Thank you.

^ permalink raw reply

* Re: [Patch net] ife: error out when nla attributes are empty
From: David Miller @ 2019-07-25 18:22 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, syzbot+fbb5b288c9cb6a2eeac4, jhs, jiri
In-Reply-To: <20190723044300.16143-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Mon, 22 Jul 2019 21:43:00 -0700

> act_ife at least requires TCA_IFE_PARMS, so we have to bail out
> when there is no attribute passed in.
> 
> Reported-by: syzbot+fbb5b288c9cb6a2eeac4@syzkaller.appspotmail.com
> Fixes: ef6980b6becb ("introduce IFE action")
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Jiri Pirko <jiri@resnulli.us>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied and queued up for -stable.

^ permalink raw reply

* Re: [RFC] dt-bindings: net: phy: Add subnode for LED configuration
From: Andrew Lunn @ 2019-07-25 18:34 UTC (permalink / raw)
  To: Matthias Kaehlcke
  Cc: David S . Miller, Rob Herring, Mark Rutland, Florian Fainelli,
	Heiner Kallweit, netdev, devicetree, linux-kernel,
	Douglas Anderson
In-Reply-To: <20190725175258.GE250418@google.com>

> As of now I don't plan to expose the label to userspace by the PHY
> driver/framework itself.

Great.

With that change, i think this proposed binding is O.K.

     Andrew

^ permalink raw reply

* Re: [PATCH net 0/2] DIM fixes for 5.3
From: David Miller @ 2019-07-25 18:34 UTC (permalink / raw)
  To: leon; +Cc: leonro, dledford, jgg, linux-rdma, talgi, yaminf, saeedm, netdev
In-Reply-To: <20190723072248.6844-1-leon@kernel.org>

From: Leon Romanovsky <leon@kernel.org>
Date: Tue, 23 Jul 2019 10:22:46 +0300

> Those two fixes for recently merged DIM patches, both exposed through
> RDMa DIM usage.

Series applied.

^ permalink raw reply

* Re: [PATCH net-next 0/2] mlxsw: Two small updates
From: David Miller @ 2019-07-25 18:36 UTC (permalink / raw)
  To: idosch; +Cc: netdev, jiri, amitc, mlxsw, idosch
In-Reply-To: <20190723075742.29029-1-idosch@idosch.org>

From: Ido Schimmel <idosch@idosch.org>
Date: Tue, 23 Jul 2019 10:57:40 +0300

> From: Ido Schimmel <idosch@mellanox.com>
> 
> Patch #1, from Amit, exposes the size of the key-value database (KVD)
> where different entries (e.g., routes, neighbours) are stored in the
> device. This allows users to understand how many entries can be
> offloaded and is also useful for writing scale tests.
> 
> Patch #2 increases the number of IPv6 nexthop groups mlxsw can offload.
> The problem and solution are explained in detail in the commit message.

Series applied.

^ permalink raw reply

* Re: [PATCH] ptp: ptp_dte: remove redundant dev_err message
From: David Miller @ 2019-07-25 18:38 UTC (permalink / raw)
  To: dingxiang; +Cc: richardcochran, netdev, linux-kernel
In-Reply-To: <1563872045-3692-1-git-send-email-dingxiang@cmss.chinamobile.com>

From: Ding Xiang <dingxiang@cmss.chinamobile.com>
Date: Tue, 23 Jul 2019 16:54:05 +0800

> devm_ioremap_resource already contains error message, so remove
> the redundant dev_err message
> 
> Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>

Applied to net-next.

^ permalink raw reply

* Re: [PATCH v2] tun: mark small packets as owned by the tap sock
From: David Miller @ 2019-07-25 18:39 UTC (permalink / raw)
  To: abauvin; +Cc: stephen, jasowang, netdev
In-Reply-To: <20190723142301.39568-1-abauvin@scaleway.com>

From: Alexis Bauvin <abauvin@scaleway.com>
Date: Tue, 23 Jul 2019 16:23:01 +0200

> - v1 -> v2: Move skb_set_owner_w to __tun_build_skb to reduce patch size
> 
> Small packets going out of a tap device go through an optimized code
> path that uses build_skb() rather than sock_alloc_send_pskb(). The
> latter calls skb_set_owner_w(), but the small packet code path does not.
> 
> The net effect is that small packets are not owned by the userland
> application's socket (e.g. QEMU), while large packets are.
> This can be seen with a TCP session, where packets are not owned when
> the window size is small enough (around PAGE_SIZE), while they are once
> the window grows (note that this requires the host to support virtio
> tso for the guest to offload segmentation).
> All this leads to inconsistent behaviour in the kernel, especially on
> netfilter modules that uses sk->socket (e.g. xt_owner).
> 
> Signed-off-by: Alexis Bauvin <abauvin@scaleway.com>
> Fixes: 66ccbc9c87c2 ("tap: use build_skb() for small packet")

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH net-next] Name NICs based on vmbus offer and enable async probe by default
From: David Miller @ 2019-07-25 18:46 UTC (permalink / raw)
  To: haiyangz
  Cc: sashal, linux-hyperv, netdev, kys, sthemmin, olaf, vkuznets,
	linux-kernel
In-Reply-To: <1563908517-55735-1-git-send-email-haiyangz@microsoft.com>


1) Subject: line lacks proper subsystem prefix

2) No module parameters in networking drivers, sorry.  Find some generic way to do
   this via devlink or similar.


^ permalink raw reply

* Re: [PATCH] stmmac_dt_phy: null check the allocation
From: David Miller @ 2019-07-25 18:52 UTC (permalink / raw)
  To: navid.emamdoost
  Cc: emamd001, kjlu, smccaman, secalert, peppe.cavallaro,
	alexandre.torgue, joabreu, mcoquelin.stm32, netdev, linux-stm32,
	linux-arm-kernel, linux-kernel
In-Reply-To: <20190723222809.9752-1-navid.emamdoost@gmail.com>

From: Navid Emamdoost <navid.emamdoost@gmail.com>
Date: Tue, 23 Jul 2019 17:28:09 -0500

> @@ -342,10 +342,13 @@ static int stmmac_dt_phy(struct plat_stmmacenet_data *plat,
>  		mdio = true;
>  	}
>  
> -	if (mdio)
> +	if (mdio) {
>  		plat->mdio_bus_data =
>  			devm_kzalloc(dev, sizeof(struct stmmac_mdio_bus_data),
>  				     GFP_KERNEL);
> +		if (!plat->mdio_bus_data)
> +			return -ENOMEM;

This leaks a reference to plat->mdio_node() which is acquired in the
for_each_child_of_node() loop right before the context here.

This is what I really fear about these automated patches, it is quite
often the case that acquired resources are subtly acquired in nearby
code and not released by the proposed "fix".

Therefore, either we end up with a regression, or as is the case here
a reviewer invests more time into double checking your patch than you
put into writing the patch in the first place.

^ permalink raw reply

* Re: [PATCH] fsl/fman: Remove comment referring to non-existent function
From: David Miller @ 2019-07-25 18:53 UTC (permalink / raw)
  To: chris.packham; +Cc: madalin.bucur, netdev, linux-kernel
In-Reply-To: <20190723233501.6626-1-chris.packham@alliedtelesis.co.nz>

From: Chris Packham <chris.packham@alliedtelesis.co.nz>
Date: Wed, 24 Jul 2019 11:35:01 +1200

> fm_set_max_frm() existed in the Freescale SDK as a callback for an
> early_param. When this code was ported to the upstream kernel the
> early_param was converted to a module_param making the reference to the
> function incorrect. The rest of the comment already does a good job of
> explaining the parameter so removing the reference to the non-existent
> function seems like the best thing to do.
> 
> Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>

Applied, thanks.

^ permalink raw reply

* ethtool 5.2 released
From: John W. Linville @ 2019-07-25 18:59 UTC (permalink / raw)
  To: netdev

ethtool version 5.2 has been released.

Home page: https://www.kernel.org/pub/software/network/ethtool/
Download link:
https://www.kernel.org/pub/software/network/ethtool/ethtool-5.2.tar.xz

Release notes:

	* Feature: Add 100BaseT1 and 1000BaseT1 link modes
	* Feature: Use standard file location macros in ethtool.spec

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply

* Re: [PATCH net-next 1/3] flow_offload: move tc indirect block to flow offload
From: Jakub Kicinski @ 2019-07-25 19:10 UTC (permalink / raw)
  To: wenxu; +Cc: pablo, netfilter-devel, netdev
In-Reply-To: <1564048533-27283-1-git-send-email-wenxu@ucloud.cn>

Please provide cover letter for the patch set.

On Thu, 25 Jul 2019 17:55:31 +0800, wenxu@ucloud.cn wrote:
> +static bool rhash_table_init;
> +int flow_indr_rhashtable_init(void)
> +{
> +	int err = 0;
> +
> +	if (!rhash_table_init) {
> +		err = rhashtable_init(&indr_setup_block_ht,
> +				      &flow_indr_setup_block_ht_params);
> +
> +		if (!err)
> +			rhash_table_init = true;
> +	}
> +
> +	return err;
> +}
> +EXPORT_SYMBOL_GPL(flow_indr_rhashtable_init);

This should be written like this:

int flow_indr_rhashtable_init(void)
{
	static bool rhash_table_init;
	int err;

	if (rhash_table_init)
		return 0;

	err = rhashtable_init(&indr_setup_block_ht,
			      &flow_indr_setup_block_ht_params);
	if (err)
		return err;

	rhash_table_init = true;
	return 0;
}
EXPORT_SYMBOL_GPL(flow_indr_rhashtable_init);

^ permalink raw reply

* [PATCH] mac80211_hwsim: Fix a typo in the name of function 'mac80211_hswim_he_capab()'
From: Christophe JAILLET @ 2019-07-25 19:13 UTC (permalink / raw)
  To: johannes, kvalo, davem
  Cc: linux-wireless, netdev, linux-kernel, kernel-janitors,
	Christophe JAILLET

This function name should be 'mac80211_hwsim_he_capab()' (s wand w
switched) to be consistent with the rest of the file.
Fix and use it.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
---
 drivers/net/wireless/mac80211_hwsim.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index b5274d1f30fa..64b0e51f8e8d 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -2595,7 +2595,7 @@ static const struct ieee80211_sband_iftype_data he_capa_5ghz = {
 	},
 };
 
-static void mac80211_hswim_he_capab(struct ieee80211_supported_band *sband)
+static void mac80211_hwsim_he_capab(struct ieee80211_supported_band *sband)
 {
 	if (sband->band == NL80211_BAND_2GHZ)
 		sband->iftype_data =
@@ -2898,7 +2898,7 @@ static int mac80211_hwsim_new_radio(struct genl_info *info,
 		sband->ht_cap.mcs.rx_mask[1] = 0xff;
 		sband->ht_cap.mcs.tx_params = IEEE80211_HT_MCS_TX_DEFINED;
 
-		mac80211_hswim_he_capab(sband);
+		mac80211_hwsim_he_capab(sband);
 
 		hw->wiphy->bands[band] = sband;
 	}
-- 
2.20.1


^ permalink raw reply related

* Re: [RFC] dt-bindings: net: phy: Add subnode for LED configuration
From: Matthias Kaehlcke @ 2019-07-25 19:18 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: David S . Miller, Rob Herring, Mark Rutland, Florian Fainelli,
	Heiner Kallweit, netdev, devicetree, linux-kernel,
	Douglas Anderson
In-Reply-To: <20190725183441.GL21952@lunn.ch>

On Thu, Jul 25, 2019 at 08:34:41PM +0200, Andrew Lunn wrote:
> > As of now I don't plan to expose the label to userspace by the PHY
> > driver/framework itself.
> 
> Great.
> 
> With that change, i think this proposed binding is O.K.

Great, thanks for your feedback!

I'll probably post a new version with actual code next week, unless
there is more discussion before I get to it.

^ permalink raw reply

* Re: [PATCH] net: sfc: falcon: convert to i2c_new_dummy_device
From: Wolfram Sang @ 2019-07-25 19:27 UTC (permalink / raw)
  To: Edward Cree
  Cc: David Miller, wsa+renesas, linux-i2c, linux-net-drivers, mhabets,
	netdev, linux-kernel
In-Reply-To: <72968faa-e260-3640-99be-9c63bc79ad5e@solarflare.com>

[-- Attachment #1: Type: text/plain, Size: 566 bytes --]


> >> Move from i2c_new_dummy() to i2c_new_dummy_device(). So, we now get an
> >> ERRPTR which we use in error handling.
> >>
> >> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
> 
> Subject & description are incomplete, you're also changing i2c_new_device()
>  to i2c_new_client_device().

Right, this was an anomaly with this patch to have both code paths
returning an ERRPTR. The big conversion for i2c_new_device will come
later in a seperate series.

> Other than that,
> Acked-by: Edward Cree <ecree@solarflare.com>

Thanks!


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH net-next 3/3] dt-bindings: net: ethernet: Update mt7622 docs and dts to reflect the new phylink API
From: Andrew Lunn @ 2019-07-25 19:31 UTC (permalink / raw)
  To: René van Dorst
  Cc: netdev, frank-w, sean.wang, f.fainelli, linux, davem,
	matthias.bgg, vivien.didelot, john, linux-mediatek, linux-mips,
	robh+dt, devicetree
In-Reply-To: <20190724192411.20639-1-opensource@vdorst.com>

> +	gmac0: mac@0 {
> +		compatible = "mediatek,eth-mac";
> +		reg = <0>;
> +		phy-mode = "sgmii";
> +
> +		fixed-link {
> +			speed = <2500>;
> +			full-duplex;
> +			pause;
> +		};
> +	};

Hi René

SGMII and fixed-link is rather odd. Why do you need this combination?

      Andrew

^ permalink raw reply

* Re: [PATCH bpf-next 02/10] libbpf: implement BPF CO-RE offset relocation algorithm
From: Song Liu @ 2019-07-25 19:32 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, Alexei Starovoitov, Daniel Borkmann,
	Yonghong Song, Andrii Nakryiko, Kernel Team
In-Reply-To: <20190724192742.1419254-3-andriin@fb.com>



> On Jul 24, 2019, at 12:27 PM, Andrii Nakryiko <andriin@fb.com> wrote:
> 
> This patch implements the core logic for BPF CO-RE offsets relocations.
> All the details are described in code comments.

Some description in the change log is still useful. Please at least 
copy-paste key comments here. 

And, this is looooong. I think it is totally possible to split it into
multiple smaller patches. 

I haven't finished all of it. Please see my comments below of parts I
have covered. 

Thanks,
Song

> 
> Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> ---
> tools/lib/bpf/libbpf.c | 866 ++++++++++++++++++++++++++++++++++++++++-
> tools/lib/bpf/libbpf.h |   1 +
> 2 files changed, 861 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 8741c39adb1c..86d87bf10d46 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -38,6 +38,7 @@
> #include <sys/stat.h>
> #include <sys/types.h>
> #include <sys/vfs.h>
> +#include <sys/utsname.h>
> #include <tools/libc_compat.h>
> #include <libelf.h>
> #include <gelf.h>
> @@ -47,6 +48,7 @@
> #include "btf.h"
> #include "str_error.h"
> #include "libbpf_internal.h"
> +#include "hashmap.h"
> 
> #ifndef EM_BPF
> #define EM_BPF 247
> @@ -1013,16 +1015,22 @@ static int bpf_object__init_user_maps(struct bpf_object *obj, bool strict)
> }
> 
> static const struct btf_type *skip_mods_and_typedefs(const struct btf *btf,
> -						     __u32 id)
> +						     __u32 id,
> +						     __u32 *res_id)
> {
> 	const struct btf_type *t = btf__type_by_id(btf, id);

Maybe have a local "__u32 rid;" 

> 
> +	if (res_id)
> +		*res_id = id;
> +

and do "rid = id;" here

> 	while (true) {
> 		switch (BTF_INFO_KIND(t->info)) {
> 		case BTF_KIND_VOLATILE:
> 		case BTF_KIND_CONST:
> 		case BTF_KIND_RESTRICT:
> 		case BTF_KIND_TYPEDEF:
> +			if (res_id)
> +				*res_id = t->type;
and here

> 			t = btf__type_by_id(btf, t->type);
> 			break;
> 		default:
and "*res_id = rid;" right before return?

> @@ -1041,7 +1049,7 @@ static const struct btf_type *skip_mods_and_typedefs(const struct btf *btf,
> static bool get_map_field_int(const char *map_name, const struct btf *btf,
> 			      const struct btf_type *def,
> 			      const struct btf_member *m, __u32 *res) {
> -	const struct btf_type *t = skip_mods_and_typedefs(btf, m->type);
> +	const struct btf_type *t = skip_mods_and_typedefs(btf, m->type, NULL);
> 	const char *name = btf__name_by_offset(btf, m->name_off);
> 	const struct btf_array *arr_info;
> 	const struct btf_type *arr_t;
> @@ -1107,7 +1115,7 @@ static int bpf_object__init_user_btf_map(struct bpf_object *obj,
> 		return -EOPNOTSUPP;
> 	}
> 
> -	def = skip_mods_and_typedefs(obj->btf, var->type);
> +	def = skip_mods_and_typedefs(obj->btf, var->type, NULL);
> 	if (BTF_INFO_KIND(def->info) != BTF_KIND_STRUCT) {
> 		pr_warning("map '%s': unexpected def kind %u.\n",
> 			   map_name, BTF_INFO_KIND(var->info));
> @@ -2289,6 +2297,845 @@ bpf_program_reloc_btf_ext(struct bpf_program *prog, struct bpf_object *obj,
> 	return 0;
> }
> 
> +#define BPF_CORE_SPEC_MAX_LEN 64
> +
> +/* represents BPF CO-RE field or array element accessor */
> +struct bpf_core_accessor {
> +	__u32 type_id;		/* struct/union type or array element type */
> +	__u32 idx;		/* field index or array index */
> +	const char *name;	/* field name or NULL for array accessor */
> +};
> +
> +struct bpf_core_spec {
> +	const struct btf *btf;
> +	/* high-level spec: named fields and array indicies only */

typo: indices

> +	struct bpf_core_accessor spec[BPF_CORE_SPEC_MAX_LEN];
> +	/* high-level spec length */
> +	int len;
> +	/* raw, low-level spec: 1-to-1 with accessor spec string */
> +	int raw_spec[BPF_CORE_SPEC_MAX_LEN];
> +	/* raw spec length */
> +	int raw_len;
> +	/* field byte offset represented by spec */
> +	__u32 offset;
> +};
> +
> +static bool str_is_empty(const char *s)
> +{
> +	return !s || !s[0];
> +}
> +
> +static int btf_kind(const struct btf_type *t)
> +{
> +	return BTF_INFO_KIND(t->info);
> +}
> +
> +static bool btf_is_composite(const struct btf_type *t)
> +{
> +	int kind = btf_kind(t);
> +
> +	return kind == BTF_KIND_STRUCT || kind == BTF_KIND_UNION;
> +}
> +
> +static bool btf_is_array(const struct btf_type *t)
> +{
> +	return btf_kind(t) == BTF_KIND_ARRAY;
> +}
> +
> +/* 
> + * Turn bpf_offset_reloc into a low- and high-level spec representation,
> + * validating correctness along the way, as well as calculating resulting
> + * field offset (in bytes), specified by accessor string. Low-level spec
> + * captures every single level of nestedness, including traversing anonymous
> + * struct/union members. High-level one only captures semantically meaningful
> + * "turning points": named fields and array indicies.
> + * E.g., for this case:
> + *
> + *   struct sample {
> + *       int __unimportant;
> + *       struct {
> + *           int __1;
> + *           int __2;
> + *           int a[7];
> + *       };
> + *   };
> + *
> + *   struct sample *s = ...;
> + *
> + *   int x = &s->a[3]; // access string = '0:1:2:3'
> + *
> + * Low-level spec has 1:1 mapping with each element of access string (it's
> + * just a parsed access string representation): [0, 1, 2, 3].
> + *
> + * High-level spec will capture only 3 points:
> + *   - intial zero-index access by pointer (&s->... is the same as &s[0]...);
> + *   - field 'a' access (corresponds to '2' in low-level spec);
> + *   - array element #3 access (corresponds to '3' in low-level spec).
> + *
> + */

IIUC, high-level points are subset of low-level points. How about we introduce
"anonymous" high-level points, so that high-level points and low-level points
are 1:1 mapping? 

> +static int bpf_core_spec_parse(const struct btf *btf,
> +			       __u32 type_id,
> +			       const char *spec_str,
> +			       struct bpf_core_spec *spec)
> +{
> +	int access_idx, parsed_len, i;
> +	const struct btf_type *t;
> +	__u32 id = type_id;
> +	const char *name;
> +	__s64 sz;
> +
> +	if (str_is_empty(spec_str) || *spec_str == ':')
> +		return -EINVAL;
> +
> +	memset(spec, 0, sizeof(*spec));
> +	spec->btf = btf;
> +
> +	/* parse spec_str="0:1:2:3:4" into array raw_spec=[0, 1, 2, 3, 4] */
> +	while (*spec_str) {
> +		if (*spec_str == ':')
> +			++spec_str;
> +		if (sscanf(spec_str, "%d%n", &access_idx, &parsed_len) != 1)
> +			return -EINVAL;
> +		if (spec->raw_len == BPF_CORE_SPEC_MAX_LEN)
> +			return -E2BIG;
> +		spec_str += parsed_len;
> +		spec->raw_spec[spec->raw_len++] = access_idx;
> +	}
> +
> +	if (spec->raw_len == 0)
> +		return -EINVAL;
> +
> +	for (i = 0; i < spec->raw_len; i++) {
> +		t = skip_mods_and_typedefs(btf, id, &id);
> +		if (!t)
> +			return -EINVAL;
> +
> +		access_idx = spec->raw_spec[i];
> +
> +		if (i == 0) {
> +			/* first spec value is always reloc type array index */
> +			spec->spec[spec->len].type_id = id;
> +			spec->spec[spec->len].idx = access_idx;
> +			spec->len++;
> +
> +			sz = btf__resolve_size(btf, id);
> +			if (sz < 0)
> +				return sz;
> +			spec->offset += access_idx * sz;
          spec->offset = access_idx * sz;  should be enough

> +			continue;
> +		}

Maybe pull i == 0 case out of the for loop? 

> +
> +		if (btf_is_composite(t)) {
> +			const struct btf_member *m = (void *)(t + 1);
> +			__u32 offset;
> +
> +			if (access_idx >= BTF_INFO_VLEN(t->info))
> +				return -EINVAL;
> +
> +			m = &m[access_idx];
> +
> +			if (BTF_INFO_KFLAG(t->info)) {
> +				if (BTF_MEMBER_BITFIELD_SIZE(m->offset))
> +					return -EINVAL;
> +				offset = BTF_MEMBER_BIT_OFFSET(m->offset);
> +			} else {
> +				offset = m->offset;
> +			}
> +			if (m->offset % 8)
> +				return -EINVAL;
> +			spec->offset += offset / 8;
> +
> +			if (m->name_off) {
> +				name = btf__name_by_offset(btf, m->name_off);
> +				if (str_is_empty(name))
> +					return -EINVAL;
> +
> +				spec->spec[spec->len].type_id = id;
> +				spec->spec[spec->len].idx = access_idx;
> +				spec->spec[spec->len].name = name;
> +				spec->len++;
> +			}
> +
> +			id = m->type;
> +		} else if (btf_is_array(t)) {
> +			const struct btf_array *a = (void *)(t + 1);
> +
> +			t = skip_mods_and_typedefs(btf, a->type, &id);
> +			if (!t || access_idx >= a->nelems)
> +				return -EINVAL;
> +
> +			spec->spec[spec->len].type_id = id;
> +			spec->spec[spec->len].idx = access_idx;
> +			spec->len++;
> +
> +			sz = btf__resolve_size(btf, id);
> +			if (sz < 0)
> +				return sz;
> +			spec->offset += access_idx * sz;
> +		} else {
> +			pr_warning("relo for [%u] %s (at idx %d) captures type [%d] of unexpected kind %d\n",
> +				   type_id, spec_str, i, id, btf_kind(t));
> +			return -EINVAL;
> +		}
> +	}
> +
> +	if (spec->len == 0)
> +		return -EINVAL;

Can this ever happen? 

> +
> +	return 0;
> +}
> +
> +/* Given 'some_struct_name___with_flavor' return the length of a name prefix
> + * before last triple underscore. Struct name part after last triple
> + * underscore is ignored by BPF CO-RE relocation during relocation matching.
> + */
> +static size_t bpf_core_essential_name_len(const char *name)
> +{
> +	size_t n = strlen(name);
> +	int i = n - 3;
> +
> +	while (i > 0) {
> +		if (name[i] == '_' && name[i + 1] == '_' && name[i + 2] == '_')
> +			return i;
> +		i--;
> +	}
> +	return n;
> +}
> +
> +/* dynamically sized list of type IDs */
> +struct ids_vec {
> +	__u32 *data;
> +	int len;
> +};
> +
> +static void bpf_core_free_cands(struct ids_vec *cand_ids)
> +{
> +	free(cand_ids->data);
> +	free(cand_ids);
> +}
> +
> +static struct ids_vec *bpf_core_find_cands(const struct btf *local_btf,
> +					   __u32 local_type_id,
> +					   const struct btf *targ_btf)
> +{
> +	size_t local_essent_len, targ_essent_len;
> +	const char *local_name, *targ_name;
> +	const struct btf_type *t;
> +	struct ids_vec *cand_ids;
> +	__u32 *new_ids;
> +	int i, err, n;
> +
> +	t = btf__type_by_id(local_btf, local_type_id);
> +	if (!t)
> +		return ERR_PTR(-EINVAL);
> +
> +	local_name = btf__name_by_offset(local_btf, t->name_off);
> +	if (str_is_empty(local_name))
> +		return ERR_PTR(-EINVAL);
> +	local_essent_len = bpf_core_essential_name_len(local_name);
> +
> +	cand_ids = calloc(1, sizeof(*cand_ids));
> +	if (!cand_ids)
> +		return ERR_PTR(-ENOMEM);
> +
> +	n = btf__get_nr_types(targ_btf);
> +	for (i = 1; i <= n; i++) {
> +		t = btf__type_by_id(targ_btf, i);
> +		targ_name = btf__name_by_offset(targ_btf, t->name_off);
> +		if (str_is_empty(targ_name))
> +			continue;
> +
> +		targ_essent_len = bpf_core_essential_name_len(targ_name);
> +		if (targ_essent_len != local_essent_len)
> +			continue;
> +
> +		if (strncmp(local_name, targ_name, local_essent_len) == 0) {
> +			pr_debug("[%d] (%s): found candidate [%d] (%s)\n",
> +				 local_type_id, local_name, i, targ_name);
> +			new_ids = realloc(cand_ids->data, cand_ids->len + 1);
> +			if (!new_ids) {
> +				err = -ENOMEM;
> +				goto err_out;
> +			}
> +			cand_ids->data = new_ids;
> +			cand_ids->data[cand_ids->len++] = i;
> +		}
> +	}
> +	return cand_ids;
> +err_out:
> +	bpf_core_free_cands(cand_ids);
> +	return ERR_PTR(err);
> +}
> +
> +/* Check two types for compatibility, skipping const/volatile/restrict and
> + * typedefs, to ensure we are relocating offset to the compatible entities:
> + *   - any two STRUCTs/UNIONs are compatible and can be mixed;
> + *   - any two FWDs are compatible;
> + *   - any two PTRs are always compatible;
> + *   - for ENUMs, check sizes, names are ignored;
> + *   - for INT, size and bitness should match, signedness is ignored;
> + *   - for ARRAY, dimensionality is ignored, element types are checked for
> + *     compatibility recursively;
> + *   - everything else shouldn't be ever a target of relocation.
> + * These rules are not set in stone and probably will be adjusted as we get
> + * more experience with using BPF CO-RE relocations.
> + */
> +static int bpf_core_fields_are_compat(const struct btf *local_btf,
> +				      __u32 local_id,
> +				      const struct btf *targ_btf,
> +				      __u32 targ_id)
> +{
> +	const struct btf_type *local_type, *targ_type;
> +	__u16 kind;
> +
> +recur:
> +	local_type = skip_mods_and_typedefs(local_btf, local_id, &local_id);
> +	targ_type = skip_mods_and_typedefs(targ_btf, targ_id, &targ_id);
> +	if (!local_type || !targ_type)
> +		return -EINVAL;
> +
> +	if (btf_is_composite(local_type) && btf_is_composite(targ_type))
> +		return 1;
> +	if (BTF_INFO_KIND(local_type->info) != BTF_INFO_KIND(targ_type->info))
> +		return 0;
> +
> +	kind = BTF_INFO_KIND(local_type->info);
> +	switch (kind) {
> +	case BTF_KIND_FWD:
> +	case BTF_KIND_PTR:
> +		return 1;
> +	case BTF_KIND_ENUM:
> +		return local_type->size == targ_type->size;
> +	case BTF_KIND_INT: {
> +		__u32 loc_int = *(__u32 *)(local_type + 1);
> +		__u32 targ_int = *(__u32 *)(targ_type + 1);
> +
> +		return BTF_INT_OFFSET(loc_int) == 0 &&
> +		       BTF_INT_OFFSET(targ_int) == 0 &&
> +		       local_type->size == targ_type->size &&
> +		       BTF_INT_BITS(loc_int) == BTF_INT_BITS(targ_int);
> +	}
> +	case BTF_KIND_ARRAY: {
> +		const struct btf_array *loc_a, *targ_a;
> +
> +		loc_a = (void *)(local_type + 1);
> +		targ_a = (void *)(targ_type + 1);
> +		local_id = loc_a->type;
> +		targ_id = targ_a->type;
> +		goto recur;
> +	}
> +	default:
> +		pr_warning("unexpected kind %d relocated, local [%d], target [%d]\n",
> +			   kind, local_id, targ_id);
> +		return 0;
> +	}
> +}
> +
> +/* 
> + * Given single high-level accessor (either named field or array index) in
> + * local type, find corresponding high-level accessor for a target type. Along
> + * the way, maintain low-level spec for target as well. Also keep updating
> + * target offset.
> + */

Please describe the recursive algorithm here. I am kinda lost. 
Also, please document the meaning of zero, positive, negative return values.

> +static int bpf_core_match_member(const struct btf *local_btf,
> +				 const struct bpf_core_accessor *local_acc,
> +				 const struct btf *targ_btf,
> +				 __u32 targ_id,
> +				 struct bpf_core_spec *spec,
> +				 __u32 *next_targ_id)
> +{
> +	const struct btf_type *local_type, *targ_type;
> +	const struct btf_member *local_member, *m;
> +	const char *local_name, *targ_name;
> +	__u32 local_id;
> +	int i, n, found;
> +
> +	targ_type = skip_mods_and_typedefs(targ_btf, targ_id, &targ_id);
> +	if (!targ_type)
> +		return -EINVAL;
> +	if (!btf_is_composite(targ_type))
> +		return 0;
> +
> +	local_id = local_acc->type_id;
> +	local_type = btf__type_by_id(local_btf, local_id);
> +	local_member = (void *)(local_type + 1);
> +	local_member += local_acc->idx;
> +	local_name = btf__name_by_offset(local_btf, local_member->name_off);
> +
> +	n = BTF_INFO_VLEN(targ_type->info);
> +	m = (void *)(targ_type + 1);
> +	for (i = 0; i < n; i++, m++) {
> +		__u32 offset;
> +
> +		/* bitfield relocations not supported */
> +		if (BTF_INFO_KFLAG(targ_type->info)) {
> +			if (BTF_MEMBER_BITFIELD_SIZE(m->offset))
> +				continue;
> +			offset = BTF_MEMBER_BIT_OFFSET(m->offset);
> +		} else {
> +			offset = m->offset;
> +		}
> +		if (offset % 8)
> +			continue;
> +
> +		/* too deep struct/union/array nesting */
> +		if (spec->raw_len == BPF_CORE_SPEC_MAX_LEN)
> +			return -E2BIG;
> +
> +		/* speculate this member will be the good one */
> +		spec->offset += offset / 8;
> +		spec->raw_spec[spec->raw_len++] = i;
> +
> +		targ_name = btf__name_by_offset(targ_btf, m->name_off);
> +		if (str_is_empty(targ_name)) {
> +			/* embedded struct/union, we need to go deeper */
> +			found = bpf_core_match_member(local_btf, local_acc,
> +						      targ_btf, m->type,
> +						      spec, next_targ_id);
> +			if (found) /* either found or error */
> +				return found;
> +		} else if (strcmp(local_name, targ_name) == 0) {
> +			/* matching named field */
> +			struct bpf_core_accessor *targ_acc;
> +
> +			targ_acc = &spec->spec[spec->len++];
> +			targ_acc->type_id = targ_id;
> +			targ_acc->idx = i;
> +			targ_acc->name = targ_name;
> +
> +			*next_targ_id = m->type;
> +			found = bpf_core_fields_are_compat(local_btf,
> +							   local_member->type,
> +							   targ_btf, m->type);
> +			if (!found)
> +				spec->len--; /* pop accessor */
> +			return found;
> +		}
> +		/* member turned out to be not we looked for */
> +		spec->offset -= offset / 8;
> +		spec->raw_len--;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * Try to match local spec to a target type and, if successful, produce full
> + * target spec (high-level, low-level + offset).
> + */
> +static int bpf_core_spec_match(struct bpf_core_spec *local_spec,
> +			       const struct btf *targ_btf, __u32 targ_id,
> +			       struct bpf_core_spec *targ_spec)
> +{
> +	const struct btf_type *targ_type;
> +	const struct bpf_core_accessor *local_acc;
> +	struct bpf_core_accessor *targ_acc;
> +	int i, sz, matched;
> +
> +	memset(targ_spec, 0, sizeof(*targ_spec));
> +	targ_spec->btf = targ_btf;
> +
> +	local_acc = &local_spec->spec[0];
> +	targ_acc = &targ_spec->spec[0];
> +
> +	for (i = 0; i < local_spec->len; i++, local_acc++, targ_acc++) {
> +		targ_type = skip_mods_and_typedefs(targ_spec->btf, targ_id,
> +						   &targ_id);
> +		if (!targ_type)
> +			return -EINVAL;
> +
> +		if (local_acc->name) {
> +			if (!btf_is_composite(targ_type))
> +				return 0;
> +
> +			matched = bpf_core_match_member(local_spec->btf,
> +							local_acc,
> +							targ_btf, targ_id,
> +							targ_spec, &targ_id);
> +			if (matched <= 0)
> +				return matched;
> +		} else {
> +			/* for i=0, targ_id is already treated as array element
> +			 * type (because it's the original struct), for others
> +			 * we should find array element type first
> +			 */
> +			if (i > 0) {

i == 0 case would go into "if (local_acc->name)" branch, no? 

> +				const struct btf_array *a;
> +
> +				if (!btf_is_array(targ_type))
> +					return 0;
> +
> +				a = (void *)(targ_type + 1);
> +				if (local_acc->idx >= a->nelems)
> +					return 0;
> +				if (!skip_mods_and_typedefs(targ_btf, a->type,
> +							    &targ_id))
> +					return -EINVAL;
> +			}
> +
> +			/* too deep struct/union/array nesting */
> +			if (targ_spec->raw_len == BPF_CORE_SPEC_MAX_LEN)
> +				return -E2BIG;
> +
> +			targ_acc->type_id = targ_id;
> +			targ_acc->idx = local_acc->idx;
> +			targ_acc->name = NULL;
> +			targ_spec->len++;
> +			targ_spec->raw_spec[targ_spec->raw_len] = targ_acc->idx;
> +			targ_spec->raw_len++;
> +
> +			sz = btf__resolve_size(targ_btf, targ_id);
> +			if (sz < 0)
> +				return sz;
> +			targ_spec->offset += local_acc->idx * sz;
> +		}
> +	}
> +
> +	return 1;
> +}
> +
> +/*
> + * Patch relocatable BPF instruction.
> + * Expected insn->imm value is provided for validation, as well as the new
> + * relocated value.
> + *
> + * Currently three kinds of BPF instructions are supported:
> + * 1. rX = <imm> (assignment with immediate operand);
> + * 2. rX += <imm> (arithmetic operations with immediate operand);
> + * 3. *(rX) = <imm> (indirect memory assignment with immediate operand).
> + *
> + * If actual insn->imm value is wrong, bail out.
> + */
> +static int bpf_core_reloc_insn(struct bpf_program *prog, int insn_off,
> +			       __u32 orig_off, __u32 new_off)
> +{
> +	struct bpf_insn *insn;
> +	int insn_idx;
> +	__u8 class;
> +
> +	if (insn_off % sizeof(struct bpf_insn))
> +		return -EINVAL;
> +	insn_idx = insn_off / sizeof(struct bpf_insn);
> +
> +	insn = &prog->insns[insn_idx];
> +	class = BPF_CLASS(insn->code);
> +
> +	if (class == BPF_ALU || class == BPF_ALU64) {
> +		if (BPF_SRC(insn->code) != BPF_K)
> +			return -EINVAL;
> +		if (insn->imm != orig_off)
> +			return -EINVAL;
> +		insn->imm = new_off;
> +		pr_debug("prog '%s': patched insn #%d (ALU/ALU64) imm %d -> %d\n",
> +			 bpf_program__title(prog, false),
> +			 insn_idx, orig_off, new_off);
> +	} else if (class == BPF_ST && BPF_MODE(insn->code) == BPF_MEM) {
> +		if (insn->imm != orig_off)
> +			return -EINVAL;
> +		insn->imm = new_off;
> +		pr_debug("prog '%s': patched insn #%d (ST | MEM) imm %d -> %d\n",
> +			 bpf_program__title(prog, false),
> +			 insn_idx, orig_off, new_off);
> +	} else {
> +		pr_warning("prog '%s': trying to relocate unrecognized insn #%d, code:%x, src:%x, dst:%x, off:%x, imm:%x\n",
> +			   bpf_program__title(prog, false),
> +			   insn_idx, insn->code, insn->src_reg, insn->dst_reg,
> +			   insn->off, insn->imm);
> +		return -EINVAL;
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Probe few well-known locations for vmlinux kernel image and try to load BTF
> + * data out of it to use for target BTF.
> + */
> +static struct btf *bpf_core_find_kernel_btf(void)
> +{
> +	const char *locations[] = {
> +		"/lib/modules/%1$s/vmlinux-%1$s",
> +		"/usr/lib/modules/%1$s/kernel/vmlinux",
> +	};
> +	char path[PATH_MAX + 1];
> +	struct utsname buf;
> +	struct btf *btf;
> +	int i, err;
> +
> +	err = uname(&buf);
> +	if (err) {
> +		pr_warning("failed to uname(): %d\n", err);
> +		return ERR_PTR(err);
> +	}
> +
> +	for (i = 0; i < ARRAY_SIZE(locations); i++) {
> +		snprintf(path, PATH_MAX, locations[i], buf.release);
> +		pr_debug("attempting to load kernel BTF from '%s'\n", path);
> +
> +		if (access(path, R_OK))
> +			continue;
> +
> +		btf = btf__parse_elf(path, NULL);
> +		if (IS_ERR(btf))
> +			continue;
> +
> +		pr_debug("successfully loaded kernel BTF from '%s'\n", path);
> +		return btf;
> +	}
> +
> +	pr_warning("failed to find valid kernel BTF\n");
> +	return ERR_PTR(-ESRCH);
> +}
> +
> +static size_t bpf_core_hash_fn(const void *key, void *ctx)
> +{
> +	return (size_t)key;
> +}
> +
> +static bool bpf_core_equal_fn(const void *k1, const void *k2, void *ctx)
> +{
> +	return k1 == k2;
> +}
> +
> +static void *u32_to_ptr(__u32 x)
> +{
> +	return (void *)(uintptr_t)x;
> +}
> +
> +/* 
> + * CO-RE relocate single instruction.
> + *
> + * The outline and important points of the algorithm:
> + * 1. For given local type, find corresponding candidate target types.
> + *    Candidate type is a type with the same "essential" name, ignoring
> + *    everything after last triple underscore (___). E.g., `sample`,
> + *    `sample___flavor_one`, `sample___flavor_another_one`, are all candidates
> + *    for each other. Names with triple underscore are referred to as
> + *    "flavors" and are useful, among other things, to allow to
> + *    specify/support incompatible variations of the same kernel struct, which
> + *    might differ between different kernel versions and/or build
> + *    configurations.
> + * 2. For each candidate type, try to match local specification to this
> + *    candidate target type. Matching involves finding corresponding
> + *    high-level spec accessors, meaning that all named fields should match,
> + *    as well as all array accesses should be within the actual bounds. Also,
> + *    types should be compatible (see bpf_core_fields_are_compat for details).
> + * 3. It is supported and expected that there might be multiple flavors
> + *    matching the spec. As long as all the specs resolve to the same set of
> + *    offsets across all candidates, there is not error. If there is any
> + *    ambiguity, CO-RE relocation will fail. This is necessary to accomodate
> + *    imprefection of BTF deduplication, which can cause slight duplication of
> + *    the same BTF type, if some directly or indirectly referenced (by
> + *    pointer) type gets resolved to different actual types in different
> + *    object files. If such situation occurs, deduplicated BTF will end up
> + *    with two (or more) structurally identical types, which differ only in
> + *    types they refer to through pointer. This should be OK in most cases and
> + *    is not an error.
> + * 4. Candidate types search is performed by linearly scanning through all
> + *    types in target BTF. It is anticipated that this is overall more
> + *    efficient memory-wise and not significantly worse (if not better)
> + *    CPU-wise compared to prebuilding a map from all local type names to
> + *    a list of candidate type names. It's also sped up by caching resolved
> + *    list of matching candidates per each local "root" type ID, that has at
> + *    least one bpf_offset_reloc associated with it. This list is shared
> + *    between multiple relocations for the same type ID and is updated as some
> + *    of the candidates are pruned due to structural incompatibility.
> + */
> +static int bpf_core_reloc_offset(struct bpf_program *prog,
> +				 const struct bpf_offset_reloc *relo,
> +				 int relo_idx,
> +				 const struct btf *local_btf,
> +				 const struct btf *targ_btf,
> +				 struct hashmap *cand_cache)
> +{
> +	const char *prog_name = bpf_program__title(prog, false);
> +	struct bpf_core_spec local_spec, cand_spec, targ_spec;
> +	const void *type_key = u32_to_ptr(relo->type_id);
> +	const struct btf_type *local_type, *cand_type;
> +	const char *local_name, *cand_name;
> +	struct ids_vec *cand_ids;
> +	__u32 local_id, cand_id;
> +	const char *spec_str;
> +	int i, j, err;
> +
> +	local_id = relo->type_id;
> +	local_type = btf__type_by_id(local_btf, local_id);
> +	if (!local_type)
> +		return -EINVAL;
> +
> +	local_name = btf__name_by_offset(local_btf, local_type->name_off);
> +	if (str_is_empty(local_name))
> +		return -EINVAL;
> +
> +	spec_str = btf__name_by_offset(local_btf, relo->access_str_off);
> +	if (str_is_empty(spec_str))
> +		return -EINVAL;
> +
> +	pr_debug("prog '%s': relo #%d: insn_off=%d, [%d] (%s) + %s\n",
> +		 prog_name, relo_idx, relo->insn_off,
> +		 local_id, local_name, spec_str);
> +
> +	err = bpf_core_spec_parse(local_btf, local_id, spec_str, &local_spec);
> +	if (err) {
> +		pr_warning("prog '%s': relo #%d: parsing [%d] (%s) + %s failed: %d\n",
> +			   prog_name, relo_idx, local_id, local_name, spec_str,
> +			   err);
> +		return -EINVAL;
> +	}
> +	pr_debug("prog '%s': relo #%d: [%d] (%s) + %s is off %u, len %d, raw_len %d\n",
> +		 prog_name, relo_idx, local_id, local_name, spec_str,
> +		 local_spec.offset, local_spec.len, local_spec.raw_len);
> +
> +	if (!hashmap__find(cand_cache, type_key, (void **)&cand_ids)) {
> +		cand_ids = bpf_core_find_cands(local_btf, local_id, targ_btf);
> +		if (IS_ERR(cand_ids)) {
> +			pr_warning("prog '%s': relo #%d: target candidate search failed for [%d] (%s) + %s: %ld\n",
> +				   prog_name, relo_idx, local_id, local_name,
> +				   spec_str, PTR_ERR(cand_ids));
> +			return PTR_ERR(cand_ids);
> +		}
> +		err = hashmap__set(cand_cache, type_key, cand_ids, NULL, NULL);
> +		if (err) {
> +			bpf_core_free_cands(cand_ids);
> +			return err;
> +		}
> +	}
> +
> +	for (i = 0, j = 0; i < cand_ids->len; i++) {
> +		cand_id = cand_ids->data[j];
> +		cand_type = btf__type_by_id(targ_btf, cand_id);
> +		cand_name = btf__name_by_offset(targ_btf, cand_type->name_off);
> +
> +		err = bpf_core_spec_match(&local_spec, targ_btf,
> +					  cand_id, &cand_spec);
> +		if (err < 0) {
> +			pr_warning("prog '%s': relo #%d: failed to match spec [%d] (%s) + %s to candidate #%d [%d] (%s): %d\n",
> +				   prog_name, relo_idx, local_id, local_name,
> +				   spec_str, i, cand_id, cand_name, err);
> +			return err;
> +		}
> +		if (err == 0) {
> +			pr_debug("prog '%s': relo #%d: candidate #%d [%d] (%s) doesn't match spec\n",
> +				 prog_name, relo_idx, i, cand_id, cand_name);
> +			continue;
> +		}
> +
> +		pr_debug("prog '%s': relo #%d: candidate #%d ([%d] %s) is off %u, len %d, raw_len %d\n",
> +			 prog_name, relo_idx, i, cand_id, cand_name,
> +			 cand_spec.offset, cand_spec.len, cand_spec.raw_len);
> +
> +		if (j == 0) {
> +			targ_spec = cand_spec;
> +		} else if (cand_spec.offset != targ_spec.offset) {
> +			/* if there are many candidates, they should all
> +			 * resolve to the same offset
> +			 */
> +			pr_warning("prog '%s': relo #%d: candidate #%d ([%d] %s): conflicting offset found (%u != %u)\n",
> +				   prog_name, relo_idx, i, cand_id, cand_name,
> +				   cand_spec.offset, targ_spec.offset);
> +			return -EINVAL;
> +		}
> +
> +		cand_ids->data[j++] = cand_spec.spec[0].type_id;
> +	}
> +
> +	cand_ids->len = j;
> +	if (cand_ids->len == 0) {
> +		pr_warning("prog '%s': relo #%d: no matching targets found for [%d] (%s) + %s\n",
> +			   prog_name, relo_idx, local_id, local_name, spec_str);
> +		return -ESRCH;
> +	}
> +
> +	err = bpf_core_reloc_insn(prog, relo->insn_off,
> +				  local_spec.offset, targ_spec.offset);
> +	if (err) {
> +		pr_warning("prog '%s': relo #%d: failed to patch insn at offset %d: %d\n",
> +			   prog_name, relo_idx, relo->insn_off, err);
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +bpf_core_reloc_offsets(struct bpf_object *obj, const char *targ_btf_path)
> +{
> +	const struct btf_ext_info_sec *sec;
> +	const struct bpf_offset_reloc *rec;
> +	const struct btf_ext_info *seg;
> +	struct hashmap_entry *entry;
> +	struct hashmap *cand_cache = NULL;
> +	struct bpf_program *prog;
> +	struct btf *targ_btf;
> +	const char *sec_name;
> +	int i, err = 0;
> +
> +	if (targ_btf_path)
> +		targ_btf = btf__parse_elf(targ_btf_path, NULL);
> +	else
> +		targ_btf = bpf_core_find_kernel_btf();
> +	if (IS_ERR(targ_btf)) {
> +		pr_warning("failed to get target BTF: %ld\n",
> +			   PTR_ERR(targ_btf));
> +		return PTR_ERR(targ_btf);
> +	}
> +
> +	cand_cache = hashmap__new(bpf_core_hash_fn, bpf_core_equal_fn, NULL);
> +	if (IS_ERR(cand_cache)) {
> +		err = PTR_ERR(cand_cache);
> +		goto out;
> +	}
> +
> +	seg = &obj->btf_ext->offset_reloc_info;
> +	for_each_btf_ext_sec(seg, sec) {
> +		sec_name = btf__name_by_offset(obj->btf, sec->sec_name_off);
> +		if (str_is_empty(sec_name)) {
> +			err = -EINVAL;
> +			goto out;
> +		}
> +		prog = bpf_object__find_program_by_title(obj, sec_name);
> +		if (!prog) {
> +			pr_warning("failed to find program '%s' for CO-RE offset relocation\n",
> +				   sec_name);
> +			err = -EINVAL;
> +			goto out;
> +		}
> +
> +		pr_debug("prog '%s': performing %d CO-RE offset relocs\n",
> +			 sec_name, sec->num_info);
> +
> +		for_each_btf_ext_rec(seg, sec, i, rec) {
> +			err = bpf_core_reloc_offset(prog, rec, i, obj->btf,
> +						    targ_btf, cand_cache);
> +			if (err) {
> +				pr_warning("prog '%s': relo #%d: failed to relocate: %d\n",
> +					   sec_name, i, err);
> +				goto out;
> +			}
> +		}
> +	}
> +
> +out:
> +	btf__free(targ_btf);
> +	if (!IS_ERR_OR_NULL(cand_cache)) {
> +		hashmap__for_each_entry(cand_cache, entry, i) {
> +			bpf_core_free_cands(entry->value);
> +		}
> +		hashmap__free(cand_cache);
> +	}
> +	return err;
> +}
> +
> +static int
> +bpf_object__relocate_core(struct bpf_object *obj, const char *targ_btf_path)
> +{
> +	int err = 0;
> +
> +	if (obj->btf_ext->offset_reloc_info.len)
> +		err = bpf_core_reloc_offsets(obj, targ_btf_path);
> +
> +	return err;
> +}
> +
> static int
> bpf_program__reloc_text(struct bpf_program *prog, struct bpf_object *obj,
> 			struct reloc_desc *relo)
> @@ -2396,14 +3243,21 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
> 	return 0;
> }
> 
> -
> static int
> -bpf_object__relocate(struct bpf_object *obj)
> +bpf_object__relocate(struct bpf_object *obj, const char *targ_btf_path)
> {
> 	struct bpf_program *prog;
> 	size_t i;
> 	int err;
> 
> +	if (obj->btf_ext) {
> +		err = bpf_object__relocate_core(obj, targ_btf_path);
> +		if (err) {
> +			pr_warning("failed to perform CO-RE relocations: %d\n",
> +				   err);
> +			return err;
> +		}
> +	}
> 	for (i = 0; i < obj->nr_programs; i++) {
> 		prog = &obj->programs[i];
> 
> @@ -2804,7 +3658,7 @@ int bpf_object__load_xattr(struct bpf_object_load_attr *attr)
> 	obj->loaded = true;
> 
> 	CHECK_ERR(bpf_object__create_maps(obj), err, out);
> -	CHECK_ERR(bpf_object__relocate(obj), err, out);
> +	CHECK_ERR(bpf_object__relocate(obj, attr->target_btf_path), err, out);
> 	CHECK_ERR(bpf_object__load_progs(obj, attr->log_level), err, out);
> 
> 	return 0;
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 5cbf459ece0b..6004bfe1ebf0 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -92,6 +92,7 @@ LIBBPF_API void bpf_object__close(struct bpf_object *object);
> struct bpf_object_load_attr {
> 	struct bpf_object *obj;
> 	int log_level;
> +	const char *target_btf_path;
> };
> 
> /* Load/unload object into/from kernel */
> -- 
> 2.17.1
> 


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox