Linux Documentation


* Re: [PATCH v2] cpufreq-stats: document limitations on modern cpufreq drivers
From: Zhongqiu Han @ 2026-05-18  9:00 UTC (permalink / raw)
  To: NicoErdmann, linux-pm
  Cc: linux-doc, rafael, viresh.kumar, corbet, skhan, zhongqiu.han
In-Reply-To: <20260510193352.195181-1-nicobsc4@yahoo.com>

On 5/11/2026 3:33 AM, NicoErdmann wrote:
> Add a note clarifying that cpufreq-stats may not be present or may not provide meaningful statistics depending
> on the active CPU frequency scaling driver.
> 
> In particular, drivers such as intel_pstate and amd_pstate may use alternative mechanisms for frequency scaling
> and accounting.
> 

Hi NicoErdmann,

Thanks for working on this — this documentation gap seems worth
addressing.

Please run ./scripts/checkpatch.pl cpufreq-stats-document-xx.patch to
catch style and format issues.

It reports the error and warning below; please fix them.

-----------------------------------------------------------------------
WARNING: Prefer a maximum 75 chars per line (possible unwrapped commit description?)
#6:
Add a note clarifying that cpufreq-stats may not be present or may not provide meaningful statistics depending

ERROR: trailing whitespace
#29: FILE: Documentation/cpu-freq/cpufreq-stats.rst:32:
+^I$

total: 1 errors, 1 warnings, 13 lines checked
-----------------------------------------------------------------------
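
For illustration only, the two conditions reported above can be
approximated in a few lines of C. This is a rough sketch, not
checkpatch's actual logic; the helper names are made up for the example
and the 75-character limit is simply taken from the warning text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed limit, taken from the warning text quoted above. */
#define COMMIT_DESC_MAX_CHARS 75

/* Return true if the line (ignoring its newline) ends in a space or tab. */
static bool has_trailing_whitespace(const char *line)
{
	size_t len = strcspn(line, "\r\n");

	return len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t');
}

/* Return true if a commit-description line exceeds the preferred width. */
static bool commit_line_too_long(const char *line)
{
	return strcspn(line, "\r\n") > COMMIT_DESC_MAX_CHARS;
}
```

The flagged diff line "+^I$" is a plus sign followed by a single tab,
which the first helper reports; the unwrapped first line of the commit
description is what the second helper reports.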

> v2:
>   - Add missing period at end of sentence (reported by Randy)
> 
> Signed-off-by: NicoErdmann <nicobsc4@yahoo.com>
> ---
>   Documentation/cpu-freq/cpufreq-stats.rst | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/Documentation/cpu-freq/cpufreq-stats.rst b/Documentation/cpu-freq/cpufreq-stats.rst
> index 9ad695b1c7db..6ffa5a6a63c9 100644
> --- a/Documentation/cpu-freq/cpufreq-stats.rst
> +++ b/Documentation/cpu-freq/cpufreq-stats.rst
> @@ -28,6 +28,13 @@ Various statistics will form read_only files under this directory.
>   This driver is designed to be independent of any particular cpufreq_driver
>   that may be running on your CPU. So, it will work with any cpufreq_driver.

The existing statement "it will work with any cpufreq_driver" may not be
entirely accurate in practice. The stats driver relies on the scaling
driver populating a frequency table (policy->freq_table), which is not
the case for some modern drivers. It might be better to clarify this
dependency rather than keeping the current wording and adding a
contradicting note.

>   
> +.. note::
> +	
> +   On some modern systems, this interface may not be available or may not
> +   expose meaningful statistics depending on the active CPU frequency scaling driver.

Also, "may not expose meaningful statistics" could be a bit misleading.
In these cases, the stats/ directory is typically not created at all,
since cpufreq_stats_create_table() returns early when the frequency
table is not available.

> +
> +   In particular, drivers such as intel_pstate or amd_pstate may use alternative
> +   mechanisms for frequency scaling and accounting.

Similarly, describing this as "alternative mechanisms for frequency
scaling and accounting" may be slightly vague. The key point is that
these drivers do not populate policy->freq_table, which prevents the
stats driver from creating its sysfs interface.

Small nit: The subject line "modern cpufreq drivers" feels a bit vague;
it might be clearer to refer to drivers that do not populate
policy->freq_table, since that is the actual condition under
which cpufreq-stats is not available.

For completeness, it may also be worth mentioning cppc_cpufreq, which
behaves in a similar way.

>   
>   2. Statistics Provided (with example)
>   =====================================

Perhaps something along the following lines would make the behavior
clearer:

This driver is designed to be independent of any particular
cpufreq_driver that may be running on your CPU. However, it requires
the scaling driver to populate a frequency table
(``policy->freq_table``). Drivers that operate on a continuous
performance range rather than a discrete set of frequencies, such
as ``intel_pstate``, ``amd_pstate``, and ``cppc_cpufreq``, do not
populate this table. As a result, the ``stats/`` directory will not
be present for those drivers.
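
As a possible aid for readers of such a note, userspace can check for
the directory directly. The following is a minimal sketch that assumes
the usual sysfs layout (a policy directory such as
/sys/devices/system/cpu/cpu0/cpufreq containing scaling_driver and,
when available, stats/). The function names are invented for the
example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Return true if <policy_dir>/stats exists and is a directory.
 * policy_dir is expected to look like
 * /sys/devices/system/cpu/cpu0/cpufreq (assumed sysfs layout).
 */
static bool cpufreq_stats_available(const char *policy_dir)
{
	char path[4096];
	struct stat st;
	int n;

	n = snprintf(path, sizeof(path), "%s/stats", policy_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return false;

	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/*
 * Read the scaling driver name of the policy into buf, stripping the
 * trailing newline. Returns true on success.
 */
static bool cpufreq_scaling_driver(const char *policy_dir, char *buf,
				   size_t len)
{
	char path[4096];
	FILE *f;
	int n;

	if (len == 0)
		return false;

	n = snprintf(path, sizeof(path), "%s/scaling_driver", policy_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return false;

	f = fopen(path, "r");
	if (!f)
		return false;

	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return false;
	}
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';
	return true;
}
```

A caller would pass the policy directory of the CPU of interest, print
the driver name, and report whether stats/ is present. With a driver
that does not populate policy->freq_table, the first function is
expected to return false.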

Thanks again for looking into this.

-- 
Thx and BRs,
Zhongqiu Han


* RE: [PATCH net-next 1/2] devlink: add generic device max_sfs parameter
From: Loktionov, Aleksandr @ 2026-05-18  8:57 UTC (permalink / raw)
  To: Tariq Toukan, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, David S. Miller
  Cc: Jiri Pirko, Simon Horman, Jonathan Corbet, Shuah Khan,
	Saeed Mahameed, Leon Romanovsky, Mark Bloch, Vlad Dumitrescu,
	Daniel Zahka, David Ahern, Nikolay Aleksandrov,
	netdev@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	Gal Pressman, Dragos Tatulea, Jiri Pirko, Nikolay Aleksandrov
In-Reply-To: <20260517112700.343575-2-tariqt@nvidia.com>



> -----Original Message-----
> From: Tariq Toukan <tariqt@nvidia.com>
> Sent: Sunday, May 17, 2026 1:27 PM
> To: Eric Dumazet <edumazet@google.com>; Jakub Kicinski
> <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Andrew Lunn
> <andrew+netdev@lunn.ch>; David S. Miller <davem@davemloft.net>
> Cc: Jiri Pirko <jiri@resnulli.us>; Simon Horman <horms@kernel.org>;
> Jonathan Corbet <corbet@lwn.net>; Shuah Khan
> <skhan@linuxfoundation.org>; Saeed Mahameed <saeedm@nvidia.com>; Leon
> Romanovsky <leon@kernel.org>; Tariq Toukan <tariqt@nvidia.com>; Mark
> Bloch <mbloch@nvidia.com>; Vlad Dumitrescu <vdumitrescu@nvidia.com>;
> Loktionov, Aleksandr <aleksandr.loktionov@intel.com>; Daniel Zahka
> <daniel.zahka@gmail.com>; David Ahern <dsahern@kernel.org>; Nikolay
> Aleksandrov <razor@blackwall.org>; netdev@vger.kernel.org; linux-
> doc@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> rdma@vger.kernel.org; Gal Pressman <gal@nvidia.com>; Dragos Tatulea
> <dtatulea@nvidia.com>; Jiri Pirko <jiri@nvidia.com>; Nikolay
> Aleksandrov <nikolay@nvidia.com>
> Subject: [PATCH net-next 1/2] devlink: add generic device max_sfs
> parameter
> 
> From: Nikolay Aleksandrov <nikolay@nvidia.com>
> 
> Add a new generic devlink device parameter (max_sfs) to control if and
> how many light-weight NIC subfunctions can be created. Subfunctions
> are light-weight network functions backed by an underlying PCI
> function. Their lifecycle can already be managed by devlink, but
> currently users cannot enable them in the device. They can be
> enabled/disabled only via external vendor tools. This parameter allows
> subfunctions to be enabled (>0) or disabled (0) via devlink. A
> subsequent patch will add support for max_sfs to the mlx5 driver.
> 
> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
> Reviewed-by: David Ahern <dsahern@kernel.org>
> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  Documentation/networking/devlink/devlink-params.rst | 6 ++++++
>  include/net/devlink.h                               | 4 ++++
>  net/devlink/param.c                                 | 5 +++++
>  3 files changed, 15 insertions(+)
> 
> diff --git a/Documentation/networking/devlink/devlink-params.rst b/Documentation/networking/devlink/devlink-params.rst
> index ea17756dcda6..29b8a9246fb6 100644
> --- a/Documentation/networking/devlink/devlink-params.rst
> +++ b/Documentation/networking/devlink/devlink-params.rst
> @@ -165,3 +165,9 @@ own name.
>       - u32
>       - Controls the maximum number of MAC address filters that can be assigned
>         to a Virtual Function (VF).
> +   * - ``max_sfs``
> +     - u32
> +     - The maximum number of subfunctions which can be created on the device.
> +       Modifying this parameter may require a device restart and PCI bus
> +       rescanning because the BAR layout may change. A value of 0 disables
> +       subfunction creation.
> diff --git a/include/net/devlink.h b/include/net/devlink.h
> index bcd31de1f890..4ec455cfe7a4 100644
> --- a/include/net/devlink.h
> +++ b/include/net/devlink.h
> @@ -546,6 +546,7 @@ enum devlink_param_generic_id {
>  	DEVLINK_PARAM_GENERIC_ID_TOTAL_VFS,
>  	DEVLINK_PARAM_GENERIC_ID_NUM_DOORBELLS,
>  	DEVLINK_PARAM_GENERIC_ID_MAX_MAC_PER_VF,
> +	DEVLINK_PARAM_GENERIC_ID_MAX_SFS,
> 
>  	/* add new param generic ids above here*/
>  	__DEVLINK_PARAM_GENERIC_ID_MAX,
> @@ -619,6 +620,9 @@ enum devlink_param_generic_id {
>  #define DEVLINK_PARAM_GENERIC_MAX_MAC_PER_VF_NAME "max_mac_per_vf"
>  #define DEVLINK_PARAM_GENERIC_MAX_MAC_PER_VF_TYPE DEVLINK_PARAM_TYPE_U32
> 
> +#define DEVLINK_PARAM_GENERIC_MAX_SFS_NAME "max_sfs"
> +#define DEVLINK_PARAM_GENERIC_MAX_SFS_TYPE DEVLINK_PARAM_TYPE_U32
> +
>  #define DEVLINK_PARAM_GENERIC(_id, _cmodes, _get, _set, _validate)	\
>  {									\
>  	.id = DEVLINK_PARAM_GENERIC_ID_##_id,				\
> diff --git a/net/devlink/param.c b/net/devlink/param.c
> index cf95268da5b0..523243e49d88 100644
> --- a/net/devlink/param.c
> +++ b/net/devlink/param.c
> @@ -117,6 +117,11 @@ static const struct devlink_param devlink_param_generic[] = {
>  		.name = DEVLINK_PARAM_GENERIC_MAX_MAC_PER_VF_NAME,
>  		.type = DEVLINK_PARAM_GENERIC_MAX_MAC_PER_VF_TYPE,
>  	},
> +	{
> +		.id = DEVLINK_PARAM_GENERIC_ID_MAX_SFS,
> +		.name = DEVLINK_PARAM_GENERIC_MAX_SFS_NAME,
> +		.type = DEVLINK_PARAM_GENERIC_MAX_SFS_TYPE,
> +	},
>  };
> 
>  static int devlink_param_generic_verify(const struct devlink_param *param)
> --
> 2.44.0


Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>


* Re: [PATCH v4 04/30] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration
From: David Woodhouse @ 2026-05-18  8:48 UTC (permalink / raw)
  To: Dongli Zhang, kvm
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Thomas Gleixner,
	Sean Christopherson, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dave Hansen, Vitaly Kuznetsov, x86, Marc Zyngier, Juergen Gross,
	Boris Ostrovsky, Paul Durrant, Jonathan Cameron, Sascha Bischoff,
	Jack Allister, Joey Gouly, joe.jin, linux-doc, linux-kernel,
	xen-devel, linux-kselftest
In-Reply-To: <0ae8e471-db7a-4842-aca4-8ef643acde8b@oracle.com>


On Mon, 2026-05-18 at 00:52 -0700, Dongli Zhang wrote:
> On 5/9/26 3:46 PM, David Woodhouse wrote:

Huh, I didn't write that then; it isn't September yet. Did you mean
2026-05-09? We aren't all in the US... 

Strictly speaking, you just misattributed a quote of mine, which is
very poor form :)

What mailer are you using? Can it be fixed?

> > From: Jack Allister <jalliste@amazon.com>
> > 
> > Where kvm->arch.use_master_clock is false (because the host TSC is
> > unreliable, or the guest TSCs are configured strangely), the KVM clock
> > is *not* defined as a function of the guest TSC so KVM_GET_CLOCK_GUEST
> > returns an error. In this case, as documented, userspace shall use the
> > legacy KVM_GET_CLOCK ioctl. The loss of precision is acceptable in this
> 
> The description here confused me a little. It sounds like userspace should call
> KVM_SET_CLOCK if KVM_SET_CLOCK_GUEST fails. However, I assume it actually means
> that userspace should do nothing extra if KVM_SET_CLOCK_GUEST fails, and simply
> rely on the prior KVM_SET_CLOCK and KVM_VCPU_TSC_OFFSET workflow described in
> patch 07. Is that correct?

Yes. If KVM_SET_CLOCK_GUEST doesn't work (which might be because
KVM_GET_CLOCK_GUEST didn't work so userspace doesn't have the data in
the first place, or because the actual ioctl returns failure), then
userspace should rely on the old method using KVM_SET_CLOCK imprecisely
instead. That includes on a migration from an older kernel that *lacks*
KVM_GET_CLOCK_GUEST, of course.

I don't think it strictly matters whether userspace does KVM_SET_CLOCK
first, then *tries* KVM_SET_CLOCK_GUEST, or whether it tries
KVM_SET_CLOCK_GUEST and then only calls KVM_SET_CLOCK on failure? I'd
probably be inclined not to use KVM_SET_CLOCK at all unless it is known
to be needed?
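
The second ordering (try the precise ioctl, fall back only on failure)
could be sketched roughly as follows. This is illustrative only: the
ops structure, the enum and the stub callbacks are hypothetical
stand-ins for whatever ioctl wrappers a VMM already has, and nothing
here calls into KVM.

```c
#include <stdbool.h>
#include <stddef.h>

/* Which restore path was ultimately used for the KVM clock. */
enum clock_restore_path {
	CLOCK_RESTORE_PRECISE,	/* KVM_SET_CLOCK_GUEST succeeded */
	CLOCK_RESTORE_LEGACY,	/* fell back to KVM_SET_CLOCK */
	CLOCK_RESTORE_FAILED,	/* neither method worked */
};

/*
 * Hypothetical callbacks standing in for a VMM's ioctl wrappers.
 * Each returns 0 on success and a negative value on error.
 */
struct clock_restore_ops {
	int (*set_clock_guest)(void *ctx);	/* wraps KVM_SET_CLOCK_GUEST */
	int (*set_clock_legacy)(void *ctx);	/* wraps KVM_SET_CLOCK */
};

/* Example stubs so the sketch can be exercised without a real VM. */
static int stub_ok(void *ctx)
{
	(void)ctx;
	return 0;
}

static int stub_fail(void *ctx)
{
	(void)ctx;
	return -1;
}

/*
 * Try the precise method first if the source provided the data
 * (KVM_GET_CLOCK_GUEST worked there); otherwise, or on failure,
 * use the legacy method.
 */
static enum clock_restore_path
restore_kvm_clock(const struct clock_restore_ops *ops, void *ctx,
		  bool have_guest_clock_data)
{
	if (have_guest_clock_data && ops->set_clock_guest &&
	    ops->set_clock_guest(ctx) == 0)
		return CLOCK_RESTORE_PRECISE;

	if (ops->set_clock_legacy && ops->set_clock_legacy(ctx) == 0)
		return CLOCK_RESTORE_LEGACY;

	return CLOCK_RESTORE_FAILED;
}
```

A migration from an older kernel that lacks KVM_GET_CLOCK_GUEST maps to
have_guest_clock_data being false, so only the legacy callback runs.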

> > +4.145 KVM_GET_CLOCK_GUEST
> > +----------------------------
> > +
> > +:Capability: none
> > +:Architectures: x86_64
> > +:Type: vcpu ioctl
> > +:Parameters: struct pvclock_vcpu_time_info (out)
> > +:Returns: 0 on success, <0 on error
> > +
> > +Retrieves the current time information structure used for KVM/PV clocks,
> > +in precisely the form advertised to the guest vCPU, which gives parameters
> > +for a direct conversion from a guest TSC value to nanoseconds.
> > +
> > +When the KVM clock is not in "master clock" mode, for example because the
> > +host TSC is unreliable or the guest TSCs are oddly configured, the KVM clock
> > +is actually defined by the host CLOCK_MONOTONIC_RAW instead of the guest TSC.
> > +In this case, the KVM_GET_CLOCK_GUEST ioctl returns -EINVAL.
> > +
> > +4.146 KVM_SET_CLOCK_GUEST
> > +----------------------------
> > +
> > +:Capability: none
> 
> Do we need a KVM_CHECK_EXTENSION capability for this? If userspace wants to
> support the new API, should it detect availability via KVM_CHECK_EXTENSION, or
> simply try the ioctl and handle failure?

That might be conventional, I suppose. But I suspect Jack's thinking
was that userspace is going to have to *try* it anyway, and still might
have to fall back to what KVM_SET_CLOCK can manage, so userspace
probably wouldn't even bother to check that capability; it doesn't
matter.

Since then, we've added some more attributes in this series though, and
it probably is worth adding a cap which advertises them *all*?
Something like KVM_CAP_CLOCK_PRECISION_API?

> > +#ifdef CONFIG_X86_64
> > +static int kvm_vcpu_ioctl_get_clock_guest(struct kvm_vcpu *v, void __user *argp)
> > +{
> > +	struct pvclock_vcpu_time_info hv_clock = {};
> > +	struct kvm_vcpu_arch *vcpu = &v->arch;
> > +	struct kvm_arch *ka = &v->kvm->arch;
> > +	unsigned int seq;
> > +
> > +	/*
> > +	 * If KVM_REQ_CLOCK_UPDATE is already pending, or if the pvclock
> > +	 * has never been generated at all, call kvm_guest_time_update().
> > +	 */
> > +	if (kvm_check_request(KVM_REQ_CLOCK_UPDATE, v) || !vcpu->hw_tsc_hz) {
> 
> This was flagged by AI, and I am still checking whether it is a real issue.
> 
> What happens if KVM_REQ_MASTERCLOCK_UPDATE and KVM_REQ_CLOCK_UPDATE are both
> pending?
> 
> From my perspective, I am also curious how we should reason about this in other
> scenarios in the future. Specifically, when do we need to process
> KVM_REQ_MASTERCLOCK_UPDATE before KVM_REQ_CLOCK_UPDATE, and when is it
> acceptable not to? I noticed that kvm_cpuid() already processes only
> KVM_REQ_CLOCK_UPDATE.

The way I've been thinking about it — and I'm only two cups of coffee
into Monday so take those words literally and don't think of them as
British understatement of something I believe is absolute truth — is
that MASTERCLOCK_UPDATE is updating the actual clock for the whole VM,
while CLOCK_UPDATE is about *putting* that information into the per-
vCPU pvclock structures.

So after a MASTERCLOCK_UPDATE, we need to do a CLOCK_UPDATE on all
vCPUs to disseminate the result. Which means that if CLOCK_UPDATE is
already pending before a MASTERCLOCK_UPDATE, it's probably redundant
and might as well be cleared because it's only going to get set *again*
in kvm_end_pvclock_update()? 

> > +	/*
> > +	 * Calculate the guest TSC at the new reference point, and the
> > +	 * corresponding KVM clock value according to user_hv_clock.
> > +	 * Adjust kvmclock_offset so both definitions agree.
> > +	 */
> > +	guest_tsc = kvm_read_l1_tsc(v, ka->master_cycle_now);
> > +	user_clk_ns = __pvclock_read_cycles(&user_hv_clock, guest_tsc);
> > +	ka->kvmclock_offset = user_clk_ns - ka->master_kernel_ns;
> 
> I used to explore adjusting ka->kvmclock_offset in KVM_SET_CLOCK based on the
> old hv_clock and the new hv_clock long time ago. At that time, my concern was
> what would happen if userspace provided bogus values. Theoretically, this is
> possible with any ioctl. My concern may be unnecessary.
> 
> Would it be helpful to validate that the delta is within a reasonable range,
> e.g. that the drift can never be more than five minutes (forward or backward)?

Setting confidential guests aside, which have their own way of trusting
the TSC and should never even *consider* using kvmclock, surely this is
supposed to be *entirely* under the control of the VMM? The kernel has
no business deciding what is 'bogus'?

If a guest has been running for months on a previous host and is
migrated to a new host, don't we expect that the KVM clock of the new
VM on the new host is tweaked from its default near-zero after
creation, to some large amount?
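
For reference, the arithmetic behind the quoted offset adjustment can
be written out in userspace. This is a simplified sketch of the
standard pvclock scaling (shift the TSC delta, multiply by a 32-bit
binary fraction, add the reference time); the struct below is a cut-down
stand-in for struct pvclock_vcpu_time_info and the function names are
chosen for the example. It relies on the unsigned __int128 extension of
GCC and Clang.

```c
#include <stdint.h>

/* Subset of the pvclock fields relevant to the conversion. */
struct pvclock_params {
	uint64_t tsc_timestamp;		/* guest TSC at the reference point */
	uint64_t system_time;		/* KVM clock in ns at that point */
	uint32_t tsc_to_system_mul;	/* multiplier, interpreted as mul / 2^32 */
	int8_t tsc_shift;		/* shift applied to the TSC delta first */
};

/* Scale a TSC delta to nanoseconds: (delta << shift) * mul / 2^32. */
static uint64_t pvclock_scale_delta(uint64_t delta, uint32_t mul_frac,
				    int shift)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	return (uint64_t)(((unsigned __int128)delta * mul_frac) >> 32);
}

/* KVM clock value in ns that these parameters define for a guest TSC. */
static uint64_t pvclock_read_cycles(const struct pvclock_params *p,
				    uint64_t tsc)
{
	uint64_t delta = tsc - p->tsc_timestamp;

	return p->system_time +
	       pvclock_scale_delta(delta, p->tsc_to_system_mul, p->tsc_shift);
}

/*
 * Mirrors the quoted computation: the offset that makes the kernel's
 * clock agree with the user-supplied definition at the reference point.
 */
static int64_t kvmclock_offset_for(const struct pvclock_params *user,
				   uint64_t guest_tsc,
				   uint64_t master_kernel_ns)
{
	return (int64_t)(pvclock_read_cycles(user, guest_tsc) -
			 master_kernel_ns);
}
```

With a multiplier of 0x80000000 (one half) and a shift of 0, which
corresponds to a 2 GHz guest TSC, a delta of 2000 cycles converts to
1000 ns. For a guest that has been up for months, system_time and hence
the resulting offset are simply large numbers.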



* Re: [PATCH] docs: submitting-patches: Clarify that in English "reviewer" is a person
From: Mark Brown @ 2026-05-18  8:33 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE)
  Cc: Krzysztof Kozlowski, Jonathan Corbet, Shuah Khan, workflows,
	linux-doc, linux-kernel, Greg Kroah-Hartman, Andrew Morton,
	David Hildenbrand, Linus Torvalds, Guenter Roeck
In-Reply-To: <ce1e5e9b-83d0-4971-aee3-dc5a8f85ce22@kernel.org>


On Sat, May 16, 2026 at 04:39:45PM +0200, Vlastimil Babka (SUSE) wrote:
> On 5/16/26 14:38, Krzysztof Kozlowski wrote:

> > Our docs already clearly mark that "Reviewed-by" must come from a
> > person:

...

> > However this is not enough and apparently English is not that precise,
> > so let's clarify that only a person can state the "Reviewer's statement
> > of oversight".

> I agree with the intent that the tag is for people (whether they use a tool
> or not to help them). We also don't put "Tested-by: kernel test robot" or
> syzkaller on every commit that they test and find no bugs. Review is also
> not just about absence of bugs, but agreeing with the larger design and
> whether the change makes sense to do in the first place.

Reviewed-by: Mark Brown <broonie@kernel.org>



* Re: [PATCH net-next v3 11/14] ixd: add basic driver framework for Intel(R) Control Plane Function
From: Larysa Zaremba @ 2026-05-18  8:32 UTC (permalink / raw)
  To: Tony Nguyen, davem, kuba, pabeni, edumazet, andrew+netdev, netdev
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	przemyslaw.kitszel, aleksander.lobakin, sridhar.samudrala,
	anjali.singhai, michal.swiatkowski, maciej.fijalkowski,
	emil.s.tantilov, madhu.chittim, joshua.a.hay, jacob.e.keller,
	jayaprakash.shanmugam, jiri, horms, corbet, richardcochran,
	linux-doc, Bharath R, Aleksandr Loktionov
In-Reply-To: <20260515224443.2772147-12-anthony.l.nguyen@intel.com>

On Fri, May 15, 2026 at 03:44:35PM -0700, Tony Nguyen wrote:
> From: Larysa Zaremba <larysa.zaremba@intel.com>
> 
> Add module register and probe functionality. Add the required support to
> register IXD PCI driver, as well as probe and remove call backs. Enable the
> PCI device and request the kernel to reserve the memory resources that will
> be used by the driver. Finally map the BAR0 address space.
> 
> For now, use devm_kzalloc() to allocate the adapter, as it requires the
> least amount of code. In a later commit, it will be replaced with a
> devlink alternative.

I have reviewed the Sashiko feedback [0]. Here are a few notes:

1. "Should this file explicitly include linux/module.h and linux/reboot.h?"
   We do not normally do that, but if you want to, there is a diff below.
2. "Could leaving PCI bus mastering enabled during shutdown cause memory
    corruption or IOMMU faults during a kexec or warm reboot?" - Could it?
   The current flow is the same as in ice.

[0] https://sashiko.dev/#/patchset/20260515224443.2772147-1-anthony.l.nguyen%40intel.com

diff --git a/drivers/net/ethernet/intel/ixd/ixd_main.c b/drivers/net/ethernet/intel/ixd/ixd_main.c
index 75ee53152e61..a08c0076926a 100644
--- a/drivers/net/ethernet/intel/ixd/ixd_main.c
+++ b/drivers/net/ethernet/intel/ixd/ixd_main.c
@@ -1,6 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0-only
 /* Copyright (C) 2025 Intel Corporation */

+#include <linux/module.h>
+#include <linux/reboot.h>
+
 #include "ixd.h"
 #include "ixd_lan_regs.h"


> 
> Co-developed-by: Amritha Nambiar <amritha.nambiar@intel.com>
> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> Tested-by: Bharath R <Bharath.r@intel.com>
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> ---
>  .../device_drivers/ethernet/index.rst         |   1 +
>  .../device_drivers/ethernet/intel/ixd.rst     |  39 ++++++
>  drivers/net/ethernet/intel/Kconfig            |   2 +
>  drivers/net/ethernet/intel/Makefile           |   1 +
>  drivers/net/ethernet/intel/ixd/Kconfig        |  13 ++
>  drivers/net/ethernet/intel/ixd/Makefile       |   8 ++
>  drivers/net/ethernet/intel/ixd/ixd.h          |  28 +++++
>  drivers/net/ethernet/intel/ixd/ixd_lan_regs.h |  28 +++++
>  drivers/net/ethernet/intel/ixd/ixd_main.c     | 112 ++++++++++++++++++
>  9 files changed, 232 insertions(+)
>  create mode 100644 Documentation/networking/device_drivers/ethernet/intel/ixd.rst
>  create mode 100644 drivers/net/ethernet/intel/ixd/Kconfig
>  create mode 100644 drivers/net/ethernet/intel/ixd/Makefile
>  create mode 100644 drivers/net/ethernet/intel/ixd/ixd.h
>  create mode 100644 drivers/net/ethernet/intel/ixd/ixd_lan_regs.h
>  create mode 100644 drivers/net/ethernet/intel/ixd/ixd_main.c
> 
> diff --git a/Documentation/networking/device_drivers/ethernet/index.rst b/Documentation/networking/device_drivers/ethernet/index.rst
> index fd3be5d20397..39d2ff526cd8 100644
> --- a/Documentation/networking/device_drivers/ethernet/index.rst
> +++ b/Documentation/networking/device_drivers/ethernet/index.rst
> @@ -36,6 +36,7 @@ Contents:
>     intel/igbvf
>     intel/ixgbe
>     intel/ixgbevf
> +   intel/ixd
>     intel/i40e
>     intel/iavf
>     intel/ice
> diff --git a/Documentation/networking/device_drivers/ethernet/intel/ixd.rst b/Documentation/networking/device_drivers/ethernet/intel/ixd.rst
> new file mode 100644
> index 000000000000..1387626e5d20
> --- /dev/null
> +++ b/Documentation/networking/device_drivers/ethernet/intel/ixd.rst
> @@ -0,0 +1,39 @@
> +.. SPDX-License-Identifier: GPL-2.0+
> +
> +==========================================================================
> +iXD Linux* Base Driver for the Intel(R) Control Plane Function
> +==========================================================================
> +
> +Intel iXD Linux driver.
> +Copyright(C) 2025 Intel Corporation.
> +
> +.. contents::
> +
> +For questions related to hardware requirements, refer to the documentation
> +supplied with your Intel adapter. All hardware requirements listed apply to use
> +with Linux.
> +
> +
> +Identifying Your Adapter
> +========================
> +For information on how to identify your adapter, and for the latest Intel
> +network drivers, refer to the Intel Support website:
> +http://www.intel.com/support
> +
> +
> +Support
> +=======
> +For general information, go to the Intel support website at:
> +http://www.intel.com/support/
> +
> +If an issue is identified with the released source code on a supported kernel
> +with a supported adapter, email the specific information related to the issue
> +to intel-wired-lan@lists.osuosl.org.
> +
> +
> +Trademarks
> +==========
> +Intel is a trademark or registered trademark of Intel Corporation or its
> +subsidiaries in the United States and/or other countries.
> +
> +* Other names and brands may be claimed as the property of others.
> diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
> index 288fa8ce53af..780f113986ea 100644
> --- a/drivers/net/ethernet/intel/Kconfig
> +++ b/drivers/net/ethernet/intel/Kconfig
> @@ -398,4 +398,6 @@ config IGC_LEDS
>  
>  source "drivers/net/ethernet/intel/idpf/Kconfig"
>  
> +source "drivers/net/ethernet/intel/ixd/Kconfig"
> +
>  endif # NET_VENDOR_INTEL
> diff --git a/drivers/net/ethernet/intel/Makefile b/drivers/net/ethernet/intel/Makefile
> index 9a37dc76aef0..08b29f3b6801 100644
> --- a/drivers/net/ethernet/intel/Makefile
> +++ b/drivers/net/ethernet/intel/Makefile
> @@ -19,3 +19,4 @@ obj-$(CONFIG_IAVF) += iavf/
>  obj-$(CONFIG_FM10K) += fm10k/
>  obj-$(CONFIG_ICE) += ice/
>  obj-$(CONFIG_IDPF) += idpf/
> +obj-$(CONFIG_IXD) += ixd/
> diff --git a/drivers/net/ethernet/intel/ixd/Kconfig b/drivers/net/ethernet/intel/ixd/Kconfig
> new file mode 100644
> index 000000000000..f5594efe292c
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/ixd/Kconfig
> @@ -0,0 +1,13 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +# Copyright (C) 2025 Intel Corporation
> +
> +config IXD
> +	tristate "Intel(R) Control Plane Function Support"
> +	depends on PCI_MSI
> +	select LIBETH
> +	select LIBIE_PCI
> +	help
> +	  This driver supports Intel(R) Control Plane PCI Function
> +	  of Intel E2100 and later IPUs and FNICs.
> +	  It facilitates a centralized control over multiple IDPF PFs/VFs/SFs
> +	  exposed by the same card.
> diff --git a/drivers/net/ethernet/intel/ixd/Makefile b/drivers/net/ethernet/intel/ixd/Makefile
> new file mode 100644
> index 000000000000..3849bc240600
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/ixd/Makefile
> @@ -0,0 +1,8 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +# Copyright (C) 2025 Intel Corporation
> +
> +# Intel(R) Control Plane Function Linux Driver
> +
> +obj-$(CONFIG_IXD) += ixd.o
> +
> +ixd-y := ixd_main.o
> diff --git a/drivers/net/ethernet/intel/ixd/ixd.h b/drivers/net/ethernet/intel/ixd/ixd.h
> new file mode 100644
> index 000000000000..d813c27941a5
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/ixd/ixd.h
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (C) 2025 Intel Corporation */
> +
> +#ifndef _IXD_H_
> +#define _IXD_H_
> +
> +#include <linux/intel/libie/pci.h>
> +
> +/**
> + * struct ixd_adapter - Data structure representing a CPF
> + * @hw: Device access data
> + */
> +struct ixd_adapter {
> +	struct libie_mmio_info hw;
> +};
> +
> +/**
> + * ixd_to_dev - Get the corresponding device struct from an adapter
> + * @adapter: PCI device driver-specific private data
> + *
> + * Return: struct device corresponding to the given adapter
> + */
> +static inline struct device *ixd_to_dev(struct ixd_adapter *adapter)
> +{
> +	return &adapter->hw.pdev->dev;
> +}
> +
> +#endif /* _IXD_H_ */
> diff --git a/drivers/net/ethernet/intel/ixd/ixd_lan_regs.h b/drivers/net/ethernet/intel/ixd/ixd_lan_regs.h
> new file mode 100644
> index 000000000000..fbb88929d0de
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/ixd/ixd_lan_regs.h
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (C) 2025 Intel Corporation */
> +
> +#ifndef _IXD_LAN_REGS_H_
> +#define _IXD_LAN_REGS_H_
> +
> +/* Control Plane Function PCI ID */
> +#define IXD_DEV_ID_CPF			0x1efe
> +
> +/* Control Queue (Mailbox) */
> +#define PF_FW_MBX_REG_LEN		4096
> +#define PF_FW_MBX			0x08400000
> +
> +/* Reset registers */
> +#define PFGEN_RTRIG_REG_LEN		2048
> +#define PFGEN_RTRIG			0x08407000	/* Device resets */
> +
> +/**
> + * struct ixd_bar_region - BAR region description
> + * @offset: BAR region offset
> + * @size: BAR region size
> + */
> +struct ixd_bar_region {
> +	resource_size_t offset;
> +	resource_size_t size;
> +};
> +
> +#endif /* _IXD_LAN_REGS_H_ */
> diff --git a/drivers/net/ethernet/intel/ixd/ixd_main.c b/drivers/net/ethernet/intel/ixd/ixd_main.c
> new file mode 100644
> index 000000000000..75ee53152e61
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/ixd/ixd_main.c
> @@ -0,0 +1,112 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright (C) 2025 Intel Corporation */
> +
> +#include "ixd.h"
> +#include "ixd_lan_regs.h"
> +
> +MODULE_DESCRIPTION("Intel(R) Control Plane Function Device Driver");
> +MODULE_IMPORT_NS("LIBIE_PCI");
> +MODULE_LICENSE("GPL");
> +
> +/**
> + * ixd_remove - remove a CPF PCI device
> + * @pdev: PCI device being removed
> + */
> +static void ixd_remove(struct pci_dev *pdev)
> +{
> +	struct ixd_adapter *adapter = pci_get_drvdata(pdev);
> +
> +	libie_pci_unmap_all_mmio_regions(&adapter->hw);
> +}
> +
> +/**
> + * ixd_shutdown - shut down a CPF PCI device
> + * @pdev: PCI device being shut down
> + */
> +static void ixd_shutdown(struct pci_dev *pdev)
> +{
> +	ixd_remove(pdev);
> +
> +	if (system_state == SYSTEM_POWER_OFF)
> +		pci_set_power_state(pdev, PCI_D3hot);
> +}
> +
> +/**
> + * ixd_iomap_regions - iomap PCI BARs
> + * @adapter: adapter to map memory regions for
> + *
> + * Returns: %0 on success, negative on failure
> + */
> +static int ixd_iomap_regions(struct ixd_adapter *adapter)
> +{
> +	const struct ixd_bar_region regions[] = {
> +		{
> +			.offset = PFGEN_RTRIG,
> +			.size = PFGEN_RTRIG_REG_LEN,
> +		},
> +		{
> +			.offset = PF_FW_MBX,
> +			.size = PF_FW_MBX_REG_LEN,
> +		},
> +	};
> +
> +	for (int i = 0; i < ARRAY_SIZE(regions); i++) {
> +		struct libie_mmio_info *mmio_info = &adapter->hw;
> +		bool map_ok;
> +
> +		map_ok = libie_pci_map_mmio_region(mmio_info,
> +						   regions[i].offset,
> +						   regions[i].size);
> +		if (!map_ok) {
> +			dev_err(ixd_to_dev(adapter),
> +				"Failed to map PCI device MMIO region\n");
> +
> +			libie_pci_unmap_all_mmio_regions(mmio_info);
> +			return -EIO;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * ixd_probe - probe a CPF PCI device
> + * @pdev: corresponding PCI device
> + * @ent: entry in ixd_pci_tbl
> + *
> + * Returns: %0 on success, negative errno code on failure
> + */
> +static int ixd_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> +{
> +	struct ixd_adapter *adapter;
> +	int err;
> +
> +	adapter = devm_kzalloc(&pdev->dev, sizeof(*adapter), GFP_KERNEL);
> +	if (!adapter)
> +		return -ENOMEM;
> +	adapter->hw.pdev = pdev;
> +	INIT_LIST_HEAD(&adapter->hw.mmio_list);
> +
> +	err = libie_pci_init_dev(pdev);
> +	if (err)
> +		return err;
> +
> +	pci_set_drvdata(pdev, adapter);
> +
> +	return ixd_iomap_regions(adapter);
> +}
> +
> +static const struct pci_device_id ixd_pci_tbl[] = {
> +	{ PCI_VDEVICE(INTEL, IXD_DEV_ID_CPF) },
> +	{ }
> +};
> +MODULE_DEVICE_TABLE(pci, ixd_pci_tbl);
> +
> +static struct pci_driver ixd_driver = {
> +	.name			= KBUILD_MODNAME,
> +	.id_table		= ixd_pci_tbl,
> +	.probe			= ixd_probe,
> +	.remove			= ixd_remove,
> +	.shutdown		= ixd_shutdown,
> +};
> +module_pci_driver(ixd_driver);
> -- 
> 2.47.1
> 


* Re: [PATCH] docs: submitting-patches: Clarify that in English "reviewer" is a person
From: David Hildenbrand (Arm) @ 2026-05-18  8:31 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE), Krzysztof Kozlowski, Jonathan Corbet,
	Shuah Khan, workflows, linux-doc, linux-kernel
  Cc: Greg Kroah-Hartman, Andrew Morton, Linus Torvalds, Guenter Roeck
In-Reply-To: <ce1e5e9b-83d0-4971-aee3-dc5a8f85ce22@kernel.org>

On 5/16/26 16:39, Vlastimil Babka (SUSE) wrote:
> On 5/16/26 14:38, Krzysztof Kozlowski wrote:
>> Common understanding of word "Reviewer" is: a person performing a review
>> work [1]. Tools are not persons, thus cannot be reviewers in this term.
>> Also tools cannot make statements ("A Reviewed-by tag is a statement of
>> opinion"), since making a statement needs some sort of conscious mind.
>>
>> Our docs already clearly mark that "Reviewed-by" must come from a
>> person:
>>
>>  - "By offering my Reviewed-by: tag, I state that:"
>>
>>    Usage of first person "I" and word "state"
>>
>>  - "A Reviewed-by tag is *a statement of opinion* that the patch is an
>>     appropriate modification of the kernel without any remaining serious"
>>
>>    Only a person can make a statement of opinion.
>>
>>  - "Any interested reviewer (who has done the work) can offer a
>>    Reviewed-by"
>>
>>    A person can offer a tag thus above does not grant the tool
>>    permission to offer a tag.
>>
>> However this is not enough and apparently English is not that precise,
>> so let's clarify that only a person can state the "Reviewer's statement
>> of oversight".
>>
>> Link: https://en.wiktionary.org/wiki/reviewer [1]
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: Vlastimil Babka <vbabka@kernel.org>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: David Hildenbrand <david@kernel.org>
>> Cc: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
> 
> I agree with the intent that the tag is for people (whether they use a tool
> or not to help them). We also don't put "Tested-by: kernel test robot" or
> syzkaller on every commit that they test and find no bugs. Review is also
> not just about absence of bugs, but agreeing with the larger design and
> whether the change makes sense to do in the first place.

I'd assume that SOB/RB/ACK would all be real persons, not tools.

For SOB we term it as "known identity". I'd assume that a tool is not an
identity ...

So maybe we should also talk about "known identity" here?

In any case, bots providing RB tags is stupid.

Acked-by: David Hildenbrand (Arm) <david@kernel.org>
-- 
Cheers,

David


* Re: [PATCH v13 3/4] gpio: rpmsg: add generic rpmsg GPIO driver
From: Padhi, Beleswar @ 2026-05-18  8:24 UTC (permalink / raw)
  To: Mathieu Poirier, Andrew Lunn
  Cc: tanmay.shah, Arnaud POULIQUEN, Shenwei Wang, Linus Walleij,
	Bartosz Golaszewski, Jonathan Corbet, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Bjorn Andersson, Frank Li,
	Sascha Hauer, Shuah Khan, linux-gpio@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	Pengutronix Kernel Team, Fabio Estevam, Peng Fan,
	devicetree@vger.kernel.org, linux-remoteproc@vger.kernel.org,
	imx@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	dl-linux-imx, Bartosz Golaszewski
In-Reply-To: <CANLsYkwBk0KbN-k9ce+5=oT+scdZ3nU5AOr3Fz4zT=0AFzghDA@mail.gmail.com>


On 5/12/2026 8:51 PM, Mathieu Poirier wrote:
> On Mon, 11 May 2026 at 12:18, Andrew Lunn <andrew@lunn.ch> wrote:
>>> Arnaud, Beleswar, Andrew and I are all advocating for one endpoint per
>>> GPIO controller.  The remaining issue is about the best way to work
>>> out source and destination addresses between Linux and the remote
>>> processor.  I'm running out of time for today but I'll return to this
>>> thread with a final analysis by the end of the week.
>> How many of the participants here will be in Minneapolis next week for
>> the Embedded Linux Conference? There is even a talk about this:
>>
>> https://osselcna2026.sched.com/event/2JQpx/building-virtual-drivers-with-rpmsg-key-design-principles-challenges-trade-offs-beleswar-prasad-padhi-texas-instruments?iframe=yes&w=100%&sidebar=yes&bg=no
>>
>> Maybe we can get together and decide on the final design after the
>> session.
>>
> I will not be in Minneapolis next week.  At this point I think things
> are converging into 2 main takeaways:
>
> 1) A serious refactoring of the protocol to include only what is
> available in the virtio-gpio specification [1].
> 2) The specification of GPIO controller number in an extension of the
> namespace announcement [2].


Fair enough. I am also aligned with using this solution, with support for
wildcard name service matching.

Thanks,
Beleswar

>
> Shenwei proposed embedding the GPIO controller number in the
> endpoint's source address [3], something I'm ambivalent about and
> still have to look into.  I also have to read Tanmay's latest
> comments.  I'm hoping to be done with all that by the end of the week.
> With the above (1) and (2), a new patchset will be required to reset
> this thread.
>
> Thanks,
> Mathieu
>
> [1]. https://lwn.net/ml/all/afjyH5JT0JS2j0L5@p14s/
> [2]. https://lwn.net/ml/all/afzIABSh1xtMEGbf%40p14s/
> [3]. https://lwn.net/ml/all/PAXPR04MB9185BFA6E7375FAD0B15B021893C2@PAXPR04MB9185.eurprd04.prod.outlook.com/
>
>>          Andrew


* Re: [PATCH] dcache: add fs.dentry-limit sysctl with negative-first reaper
From: Jan Kara @ 2026-05-18  8:19 UTC (permalink / raw)
  To: Ian Kent
  Cc: NeilBrown, Horst Birthelmer, Amir Goldstein, Miklos Szeredi,
	Jonathan Corbet, Shuah Khan, Alexander Viro, Christian Brauner,
	Jan Kara, linux-doc, linux-kernel, linux-fsdevel,
	Horst Birthelmer
In-Reply-To: <bc359831-e653-4269-9d57-742b48d56d9f@themaw.net>

Hi Ian,

On Mon 18-05-26 10:55:43, Ian Kent wrote:
> On 18/5/26 07:55, NeilBrown wrote:
> > On Fri, 15 May 2026, Horst Birthelmer wrote:
> > According to the email you linked, a problem arises when a directory has
> > a great many negative children.  Code which walks the list of children
> > (such as fsnotify) while holding a lock can suffer unpredictable delays
> > and result in long lock-hold times.  So maybe a limit on negative
> > dentries for any parent is what we really want.  That would be clumsy to
> > implement I imagine.
> 
> But the notion of dropping the dentry in ->d_delete() on last dput() is
> simple enough but did see regressions (the only other place in the VFS
> besides dentry_kill() that the inode is unlinked from the dentry on
> dput()). I wonder if the regression was related to the test itself
> deliberately recreating deleted files and if that really is normal
> behaviour. By itself that should prevent almost all negative dentries
> being retained. Although file systems could do this as well (think XFS
> inode recycling) it should be reasonable to require it be left to the
> VFS.
> 
> But even that's not enough given that, in my case, there would still be
> around 4 million dentries in the LRU cache and in fsnotify there are
> directory child traversals holding the parent i_lock "spinlock" that are
> going to cause problems.

Do you mean there are very many positive children of a directory?

> That's all that much more puzzling when I see things like commit
> 172e422ffea2 ("fsnotify: clear PARENT_WATCHED flags lazily") which looks
> like it implies the child flag depends entirely on the parent state (what
> am I missing Amir?)

PARENT_WATCHED dentry flags (as the name suggests) are only caching the
information whether the parent has notification marks receiving events from
the child. So yes, the flag fully depends on the parent state.

> so why is this traversal even retained in fsnotify?

Not sure which traversal you mean, but if you set a watch on a parent, you
have to walk all children to set the PARENT_WATCHED flag so that you don't
miss events on children...
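
To make the caching trade-off concrete, here is a rough userspace sketch.
It is not the fsnotify code: struct node, CHILD_PARENT_WATCHED and both
helpers are hypothetical names, locking is omitted, and the lazy clearing
from the commit mentioned above is not modelled. It only shows that setting
a watch costs one walk over all children, while the per-event check reads
the child's own flags and never touches the parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cached flag: "my parent wants events about me". */
#define CHILD_PARENT_WATCHED 0x1u

/* Hypothetical stand-in for a dentry with a singly linked child list. */
struct node {
	unsigned int flags;
	struct node *parent;
	struct node *first_child;
	struct node *next_sibling;
};

/*
 * Setting a watch on the parent: walk every child once and cache the
 * parent's state in the child. Returns the number of children visited,
 * which is the cost that grows with the size of the directory.
 */
static size_t node_watch_children(struct node *parent)
{
	size_t visited = 0;

	assert(parent != NULL);
	for (struct node *c = parent->first_child; c; c = c->next_sibling) {
		c->flags |= CHILD_PARENT_WATCHED;
		visited++;
	}
	return visited;
}

/* Event path: decide from the child's own flags, no parent access. */
static bool node_event_needs_parent(const struct node *child)
{
	return (child->flags & CHILD_PARENT_WATCHED) != 0;
}
```

With two children linked under one parent, node_watch_children() visits
both, and afterwards node_event_needs_parent() is true for each of them
without dereferencing child->parent.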

> > But what if we move dentries to the end of the list when they become
> > negative, and to the start of the list when they become positive?  Then
> > code which walks the child list could simply abort on the first
> > negative.
> > 
> > I doubt that would be quite as easy as it sounds, but it would at least
> > be more focused on the observed symptom rather than some whole-system
> > number which only vaguely correlates with the observed symptom.
> > 
> > Maybe a completely different approach: change children-walking code to
> > drop and retake the lock (with appropriate validation) periodically.
> > That too would address the specific symptom.
> 
> Another good question.
> 
> I have assumed that dropping and re-taking the lock cannot be done but
> this is a question I would like answered as well. Dropping and re-taking
> lock would require, as Miklos pointed out to me off-list, recording the
> list position with say a cursor, introducing unwanted complexity when it
> would be better to accept the cost of a single extra access to the parent
> flags (which I assume is one reason to set the flag in the child).

The parent access is actually more expensive than you might think. Based on
experience with a past fsnotify-related performance regression, I expect a
roughly 20% performance hit for small tmpfs writes if you add an
unconditional parent access to the write path.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [RFC PATCH v3 1/3] scripts: add kconfirm
From: Arnd Bergmann @ 2026-05-18  8:08 UTC (permalink / raw)
  To: Miguel Ojeda, Demi Marie Obenour
  Cc: Julian Braha, Nathan Chancellor, Nicolas Schier, Jani Nikula,
	Andrew Morton, Gary Guo, ljs, Greg Kroah-Hartman, Masahiro Yamada,
	Miguel Ojeda, Jonathan Corbet, qingfang.deng, yann.prono, ej,
	linux-kernel, rust-for-linux, linux-doc, linux-kbuild
In-Reply-To: <CANiq72mGTehUWS2-MgukOKmwAn3fB63boFNqbNENse6B00M7Zg@mail.gmail.com>

On Mon, May 18, 2026, at 00:53, Miguel Ojeda wrote:
> On Sun, May 17, 2026 at 10:25 PM Demi Marie Obenour
> <demiobenour@gmail.com> wrote:
>>
>> I was hoping for Linux to avoid the Rust trend of downloading tons
>> of third-party crates, with all the supply-chain risks that entails.
>
> I completely agree -- it is why I said a well-known, vetted set of crates.
>
> That is, we should decide on e.g. a single CLI arg parser, a single
> logger, etc. for most of our tools, and ideally they should be
> well-known crates (ideally already trusted via use in the compiler
> itself).
>
> Moreover, they should be pinned with `--locked` or similar (like we
> already recommend for `bindgen-cli`), so that we only ever use
> something that matches the hash in the lockfile that would be
> committed in the tree.

What about dependencies that are normally shipped by the distros
along with the Rust compiler? Would it be possible to allow a
range of versions that matches the ones present on common
distros, like we do with C libraries, or would that cause
more problems than it solves?

     Arnd


* Re: [PATCH net-next v3 08/14] idpf: refactor idpf to use libie control queues
From: Larysa Zaremba @ 2026-05-18  8:01 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev, Tony Nguyen
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	Pavan Kumar Linga, przemyslaw.kitszel, aleksander.lobakin,
	sridhar.samudrala, anjali.singhai, michal.swiatkowski,
	maciej.fijalkowski, emil.s.tantilov, joshua.a.hay, jacob.e.keller,
	jayaprakash.shanmugam, jiri, horms, corbet, richardcochran,
	linux-doc, Aleksandr Loktionov, Samuel Salin
In-Reply-To: <20260515224443.2772147-9-anthony.l.nguyen@intel.com>

On Fri, May 15, 2026 at 03:44:32PM -0700, Tony Nguyen wrote:
> From: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
> 
> Support to initialize and configure controlqs, and manage their
> transactions was introduced in libie. As part of it, most of the existing
> controlq structures are renamed and modified. Use those APIs in idpf and
> make all the necessary changes.

I have reviewed the Sashiko feedback [0]. Here is why I do not find it
very helpful for this particular patch:

1. "Could this parse arbitrary messages as asynchronous events if the opcode
   isn't checked?" - fixed in a later patch.
2. idpf_send_create_vport_msg() - recv_mem.iov_len is verified by libeth to be
   no bigger than LIBIE_CTLQ_MAX_BUF_LEN, so this memcpy is always OK.
3. "Should the check instead be against struct_size(recv_rk, key_flex,
   key_size)?" - yes, it should, but this is old code.
4. idpf_send_get_rx_ptype_msg(): "Does this correctly prevent reading past the
   end of the received buffer?" - no, but this is not introduced by the patch.
5. "Can this race with mbx_task and cause a NULL pointer dereference or
   use-after-free?" - fixed by a later patch.
6. "Does this require a bounds check to verify the received message length?" -
   yes.

I will send the diff for 3, 4 and 6 in the next message.

[0] https://sashiko.dev/#/patchset/20260515224443.2772147-1-anthony.l.nguyen%40intel.com
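
For readers following points 3 and 6, the kind of length check being
discussed can be sketched in userspace C roughly as below. This is only an
illustration, not the upcoming diff: struct rss_key_msg, its fields and both
helpers are hypothetical, and rss_key_msg_size() merely mimics what the
kernel's struct_size() computes (fixed header plus the flexible array
elements). The idea is to refuse to read the length field until the header
is known to be present, then require the received length to cover the
header plus the advertised key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message: fixed header followed by a flexible key array. */
struct rss_key_msg {
	uint32_t vport_id;
	uint16_t key_len;
	uint16_t reserved;
	uint8_t key_flex[];
};

/*
 * Userspace stand-in for struct_size(msg, key_flex, key_len).
 * A 16-bit key_len cannot overflow size_t here, so no saturation is needed.
 */
static size_t rss_key_msg_size(uint16_t key_len)
{
	return sizeof(struct rss_key_msg) + (size_t)key_len;
}

/* Accept the buffer only if it holds the header and the whole key. */
static bool rss_key_msg_len_ok(const void *buf, size_t recv_len)
{
	struct rss_key_msg hdr;

	assert(buf != NULL);

	/* Too short to contain key_len at all: reject before reading it. */
	if (recv_len < sizeof(hdr))
		return false;

	/* Copy the header out so alignment of buf does not matter. */
	memcpy(&hdr, buf, sizeof(hdr));

	return recv_len >= rss_key_msg_size(hdr.key_len);
}
```

A caller would run rss_key_msg_len_ok() on the received buffer and its
length before copying key_len bytes out of key_flex, and drop the message
if the check fails.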


> 
> Previously for the send and receive virtchnl messages, there used to be a
> memcpy involved in controlq code to copy the buffer info passed by the send
> function into the controlq specific buffers. There was no restriction to
> use automatic memory in that case. The new implementation in libie removed
> copying of the send buffer info and introduced DMA mapping of the send
> buffer itself. To accommodate it, use dynamic memory for the larger send
> buffers. For smaller ones (<= 128 bytes) libie still can copy them into the
> pre-allocated message memory.
> 
> In case of receive, idpf receives a page pool buffer allocated by the libie
> and care should be taken to release it after use in the idpf.
> 
> The changes are fairly trivial and localized, with a notable exception
> being the consolidation of idpf_vc_xn_shutdown and idpf_deinit_dflt_mbx
> under the latter name. This has some additional consequences that are
> addressed in the following patches.
> 
> This refactoring introduces roughly additional 40KB of module storage used
> for systems that only run idpf, so idpf + libie_cp + libie_pci takes about
> 7% more storage than just idpf before refactoring.
> 
> We now pre-allocate small TX buffers, so that does increase the memory
> usage, but reduces the need to allocate. This results in additional 256 *
> 128B of memory permanently used, increasing the worst-case memory usage by
> 32KB but our ctlq RX buffers need to be of size 4096B anyway (not changed
> by the patchset), so this is hardly noticeable.
> 
> As for the timings, the fact that we are mostly limited by the HW response
> time which is far from instant, is not changed by this refactor.
> 
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
> Co-developed-by: Larysa Zaremba <larysa.zaremba@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> Tested-by: Samuel Salin <Samuel.salin@intel.com>
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> ---
>  drivers/net/ethernet/intel/idpf/Makefile      |    2 -
>  drivers/net/ethernet/intel/idpf/idpf.h        |   28 +-
>  .../net/ethernet/intel/idpf/idpf_controlq.c   |  631 -------
>  .../net/ethernet/intel/idpf/idpf_controlq.h   |  142 --
>  .../ethernet/intel/idpf/idpf_controlq_api.h   |  177 --
>  .../ethernet/intel/idpf/idpf_controlq_setup.c |  169 --
>  drivers/net/ethernet/intel/idpf/idpf_dev.c    |   60 +-
>  .../net/ethernet/intel/idpf/idpf_ethtool.c    |   28 +-
>  drivers/net/ethernet/intel/idpf/idpf_lib.c    |   51 +-
>  drivers/net/ethernet/intel/idpf/idpf_main.c   |    5 -
>  drivers/net/ethernet/intel/idpf/idpf_mem.h    |   20 -
>  drivers/net/ethernet/intel/idpf/idpf_txrx.h   |    2 +-
>  drivers/net/ethernet/intel/idpf/idpf_vf_dev.c |   64 +-
>  .../net/ethernet/intel/idpf/idpf_virtchnl.c   | 1632 +++++++----------
>  .../net/ethernet/intel/idpf/idpf_virtchnl.h   |   94 +-
>  .../ethernet/intel/idpf/idpf_virtchnl_ptp.c   |  248 ++-
>  16 files changed, 832 insertions(+), 2521 deletions(-)
>  delete mode 100644 drivers/net/ethernet/intel/idpf/idpf_controlq.c
>  delete mode 100644 drivers/net/ethernet/intel/idpf/idpf_controlq.h
>  delete mode 100644 drivers/net/ethernet/intel/idpf/idpf_controlq_api.h
>  delete mode 100644 drivers/net/ethernet/intel/idpf/idpf_controlq_setup.c
>  delete mode 100644 drivers/net/ethernet/intel/idpf/idpf_mem.h
> 
> diff --git a/drivers/net/ethernet/intel/idpf/Makefile b/drivers/net/ethernet/intel/idpf/Makefile
> index 651ddee942bd..4aaafa175ec3 100644
> --- a/drivers/net/ethernet/intel/idpf/Makefile
> +++ b/drivers/net/ethernet/intel/idpf/Makefile
> @@ -6,8 +6,6 @@
>  obj-$(CONFIG_IDPF) += idpf.o
>  
>  idpf-y := \
> -	idpf_controlq.o		\
> -	idpf_controlq_setup.o	\
>  	idpf_dev.o		\
>  	idpf_ethtool.o		\
>  	idpf_idc.o		\
> diff --git a/drivers/net/ethernet/intel/idpf/idpf.h b/drivers/net/ethernet/intel/idpf/idpf.h
> index efdb58990a8b..679539a1b947 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf.h
> +++ b/drivers/net/ethernet/intel/idpf/idpf.h
> @@ -27,7 +27,6 @@ struct idpf_rss_data;
>  #include <linux/intel/virtchnl2.h>
>  
>  #include "idpf_txrx.h"
> -#include "idpf_controlq.h"
>  
>  #define GETMAXVAL(num_bits)		GENMASK((num_bits) - 1, 0)
>  
> @@ -37,11 +36,10 @@ struct idpf_rss_data;
>  #define IDPF_NUM_FILTERS_PER_MSG	20
>  #define IDPF_NUM_DFLT_MBX_Q		2	/* includes both TX and RX */
>  #define IDPF_DFLT_MBX_Q_LEN		64
> -#define IDPF_DFLT_MBX_ID		-1
>  /* maximum number of times to try before resetting mailbox */
>  #define IDPF_MB_MAX_ERR			20
>  #define IDPF_NUM_CHUNKS_PER_MSG(struct_sz, chunk_sz)	\
> -	((IDPF_CTLQ_MAX_BUF_LEN - (struct_sz)) / (chunk_sz))
> +	((LIBIE_CTLQ_MAX_BUF_LEN - (struct_sz)) / (chunk_sz))
>  
>  #define IDPF_WAIT_FOR_MARKER_TIMEO	500
>  #define IDPF_MAX_WAIT			500
> @@ -202,8 +200,8 @@ struct idpf_vport_max_q {
>   * @ptp_reg_init: PTP register initialization
>   */
>  struct idpf_reg_ops {
> -	void (*ctlq_reg_init)(struct idpf_adapter *adapter,
> -			      struct idpf_ctlq_create_info *cq);
> +	void (*ctlq_reg_init)(struct libie_mmio_info *mmio,
> +			      struct libie_ctlq_create_info *cctlq_info);
>  	int (*intr_reg_init)(struct idpf_vport *vport,
>  			     struct idpf_q_vec_rsrc *rsrc);
>  	void (*mb_intr_reg_init)(struct idpf_adapter *adapter);
> @@ -606,8 +604,6 @@ struct idpf_vport_config {
>  	DECLARE_BITMAP(flags, IDPF_VPORT_CONFIG_FLAGS_NBITS);
>  };
>  
> -struct idpf_vc_xn_manager;
> -
>  #define idpf_for_each_vport(adapter, iter) \
>  	for (struct idpf_vport **__##iter = &(adapter)->vports[0], \
>  	     *iter = (adapter)->max_vports ? *__##iter : NULL; \
> @@ -625,8 +621,10 @@ struct idpf_vc_xn_manager;
>   * @state: Init state machine
>   * @flags: See enum idpf_flags
>   * @reset_reg: See struct idpf_reset_reg
> - * @hw: Device access data
>   * @ctlq_ctx: controlq context
> + * @asq: Send control queue info
> + * @arq: Receive control queue info
> + * @xnm: Xn transaction manager
>   * @num_avail_msix: Available number of MSIX vectors
>   * @num_msix_entries: Number of entries in MSIX table
>   * @msix_entries: MSIX table
> @@ -659,7 +657,6 @@ struct idpf_vc_xn_manager;
>   * @stats_task: Periodic statistics retrieval task
>   * @stats_wq: Workqueue for statistics task
>   * @caps: Negotiated capabilities with device
> - * @vcxn_mngr: Virtchnl transaction manager
>   * @dev_ops: See idpf_dev_ops
>   * @cdev_info: IDC core device info pointer
>   * @num_vfs: Number of allocated VFs through sysfs. PF does not directly talk
> @@ -683,8 +680,10 @@ struct idpf_adapter {
>  	enum idpf_state state;
>  	DECLARE_BITMAP(flags, IDPF_FLAGS_NBITS);
>  	struct idpf_reset_reg reset_reg;
> -	struct idpf_hw hw;
>  	struct libie_ctlq_ctx ctlq_ctx;
> +	struct libie_ctlq_info *asq;
> +	struct libie_ctlq_info *arq;
> +	struct libie_ctlq_xn_manager *xnm;
>  	u16 num_avail_msix;
>  	u16 num_msix_entries;
>  	struct msix_entry *msix_entries;
> @@ -721,7 +720,6 @@ struct idpf_adapter {
>  	struct delayed_work stats_task;
>  	struct workqueue_struct *stats_wq;
>  	struct virtchnl2_get_capabilities caps;
> -	struct idpf_vc_xn_manager *vcxn_mngr;
>  
>  	struct idpf_dev_ops dev_ops;
>  	struct iidc_rdma_core_dev_info *cdev_info;
> @@ -881,12 +879,12 @@ static inline u8 idpf_get_min_tx_pkt_len(struct idpf_adapter *adapter)
>   */
>  static inline bool idpf_is_reset_detected(struct idpf_adapter *adapter)
>  {
> -	if (!adapter->hw.arq)
> +	struct libie_ctlq_info *arq = adapter->arq;
> +
> +	if (!arq)
>  		return true;
>  
> -	return !(readl(libie_pci_get_mmio_addr(&adapter->ctlq_ctx.mmio_info,
> -					       adapter->hw.arq->reg.len)) &
> -		 adapter->hw.arq->reg.len_mask);
> +	return !(readl(arq->reg.len) & arq->reg.len_mask);
>  }
>  
>  /**
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_controlq.c b/drivers/net/ethernet/intel/idpf/idpf_controlq.c
> deleted file mode 100644
> index 020b08367e18..000000000000
> --- a/drivers/net/ethernet/intel/idpf/idpf_controlq.c
> +++ /dev/null
> @@ -1,631 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0-only
> -/* Copyright (C) 2023 Intel Corporation */
> -
> -#include "idpf.h"
> -
> -/**
> - * idpf_ctlq_setup_regs - initialize control queue registers
> - * @cq: pointer to the specific control queue
> - * @q_create_info: structs containing info for each queue to be initialized
> - */
> -static void idpf_ctlq_setup_regs(struct idpf_ctlq_info *cq,
> -				 struct idpf_ctlq_create_info *q_create_info)
> -{
> -	/* set control queue registers in our local struct */
> -	cq->reg.head = q_create_info->reg.head;
> -	cq->reg.tail = q_create_info->reg.tail;
> -	cq->reg.len = q_create_info->reg.len;
> -	cq->reg.bah = q_create_info->reg.bah;
> -	cq->reg.bal = q_create_info->reg.bal;
> -	cq->reg.len_mask = q_create_info->reg.len_mask;
> -	cq->reg.len_ena_mask = q_create_info->reg.len_ena_mask;
> -	cq->reg.head_mask = q_create_info->reg.head_mask;
> -}
> -
> -/**
> - * idpf_ctlq_init_regs - Initialize control queue registers
> - * @hw: pointer to hw struct
> - * @cq: pointer to the specific Control queue
> - * @is_rxq: true if receive control queue, false otherwise
> - *
> - * Initialize registers. The caller is expected to have already initialized the
> - * descriptor ring memory and buffer memory
> - */
> -static void idpf_ctlq_init_regs(struct idpf_hw *hw, struct idpf_ctlq_info *cq,
> -				bool is_rxq)
> -{
> -	struct libie_mmio_info *mmio = &hw->back->ctlq_ctx.mmio_info;
> -
> -	/* Update tail to post pre-allocated buffers for rx queues */
> -	if (is_rxq)
> -		writel((u32)(cq->ring_size - 1),
> -		       libie_pci_get_mmio_addr(mmio, cq->reg.tail));
> -
> -	/* For non-Mailbox control queues only TAIL need to be set */
> -	if (cq->q_id != -1)
> -		return;
> -
> -	/* Clear Head for both send or receive */
> -	writel(0, libie_pci_get_mmio_addr(mmio, cq->reg.head));
> -
> -	/* set starting point */
> -	writel(lower_32_bits(cq->desc_ring.pa),
> -	       libie_pci_get_mmio_addr(mmio, cq->reg.bal));
> -	writel(upper_32_bits(cq->desc_ring.pa),
> -	       libie_pci_get_mmio_addr(mmio, cq->reg.bah));
> -	writel((cq->ring_size | cq->reg.len_ena_mask),
> -	       libie_pci_get_mmio_addr(mmio, cq->reg.len));
> -}
> -
> -/**
> - * idpf_ctlq_init_rxq_bufs - populate receive queue descriptors with buf
> - * @cq: pointer to the specific Control queue
> - *
> - * Record the address of the receive queue DMA buffers in the descriptors.
> - * The buffers must have been previously allocated.
> - */
> -static void idpf_ctlq_init_rxq_bufs(struct idpf_ctlq_info *cq)
> -{
> -	int i;
> -
> -	for (i = 0; i < cq->ring_size; i++) {
> -		struct idpf_ctlq_desc *desc = IDPF_CTLQ_DESC(cq, i);
> -		struct idpf_dma_mem *bi = cq->bi.rx_buff[i];
> -
> -		/* No buffer to post to descriptor, continue */
> -		if (!bi)
> -			continue;
> -
> -		desc->flags =
> -			cpu_to_le16(IDPF_CTLQ_FLAG_BUF | IDPF_CTLQ_FLAG_RD);
> -		desc->opcode = 0;
> -		desc->datalen = cpu_to_le16(bi->size);
> -		desc->ret_val = 0;
> -		desc->v_opcode_dtype = 0;
> -		desc->v_retval = 0;
> -		desc->params.indirect.addr_high =
> -			cpu_to_le32(upper_32_bits(bi->pa));
> -		desc->params.indirect.addr_low =
> -			cpu_to_le32(lower_32_bits(bi->pa));
> -		desc->params.indirect.param0 = 0;
> -		desc->params.indirect.sw_cookie = 0;
> -		desc->params.indirect.v_flags = 0;
> -	}
> -}
> -
> -/**
> - * idpf_ctlq_shutdown - shutdown the CQ
> - * @hw: pointer to hw struct
> - * @cq: pointer to the specific Control queue
> - *
> - * The main shutdown routine for any controq queue
> - */
> -static void idpf_ctlq_shutdown(struct idpf_hw *hw, struct idpf_ctlq_info *cq)
> -{
> -	spin_lock(&cq->cq_lock);
> -
> -	/* free ring buffers and the ring itself */
> -	idpf_ctlq_dealloc_ring_res(hw, cq);
> -
> -	/* Set ring_size to 0 to indicate uninitialized queue */
> -	cq->ring_size = 0;
> -
> -	spin_unlock(&cq->cq_lock);
> -}
> -
> -/**
> - * idpf_ctlq_add - add one control queue
> - * @hw: pointer to hardware struct
> - * @qinfo: info for queue to be created
> - * @cq_out: (output) double pointer to control queue to be created
> - *
> - * Allocate and initialize a control queue and add it to the control queue list.
> - * The cq parameter will be allocated/initialized and passed back to the caller
> - * if no errors occur.
> - *
> - * Note: idpf_ctlq_init must be called prior to any calls to idpf_ctlq_add
> - */
> -int idpf_ctlq_add(struct idpf_hw *hw,
> -		  struct idpf_ctlq_create_info *qinfo,
> -		  struct idpf_ctlq_info **cq_out)
> -{
> -	struct idpf_ctlq_info *cq;
> -	bool is_rxq = false;
> -	int err;
> -
> -	cq = kzalloc_obj(*cq);
> -	if (!cq)
> -		return -ENOMEM;
> -
> -	cq->cq_type = qinfo->type;
> -	cq->q_id = qinfo->id;
> -	cq->buf_size = qinfo->buf_size;
> -	cq->ring_size = qinfo->len;
> -
> -	cq->next_to_use = 0;
> -	cq->next_to_clean = 0;
> -	cq->next_to_post = cq->ring_size - 1;
> -
> -	switch (qinfo->type) {
> -	case IDPF_CTLQ_TYPE_MAILBOX_RX:
> -		is_rxq = true;
> -		fallthrough;
> -	case IDPF_CTLQ_TYPE_MAILBOX_TX:
> -		err = idpf_ctlq_alloc_ring_res(hw, cq);
> -		break;
> -	default:
> -		err = -EBADR;
> -		break;
> -	}
> -
> -	if (err)
> -		goto init_free_q;
> -
> -	if (is_rxq) {
> -		idpf_ctlq_init_rxq_bufs(cq);
> -	} else {
> -		/* Allocate the array of msg pointers for TX queues */
> -		cq->bi.tx_msg = kzalloc_objs(struct idpf_ctlq_msg *, qinfo->len);
> -		if (!cq->bi.tx_msg) {
> -			err = -ENOMEM;
> -			goto init_dealloc_q_mem;
> -		}
> -	}
> -
> -	idpf_ctlq_setup_regs(cq, qinfo);
> -
> -	idpf_ctlq_init_regs(hw, cq, is_rxq);
> -
> -	spin_lock_init(&cq->cq_lock);
> -
> -	list_add(&cq->cq_list, &hw->cq_list_head);
> -
> -	*cq_out = cq;
> -
> -	return 0;
> -
> -init_dealloc_q_mem:
> -	/* free ring buffers and the ring itself */
> -	idpf_ctlq_dealloc_ring_res(hw, cq);
> -init_free_q:
> -	kfree(cq);
> -
> -	return err;
> -}
> -
> -/**
> - * idpf_ctlq_remove - deallocate and remove specified control queue
> - * @hw: pointer to hardware struct
> - * @cq: pointer to control queue to be removed
> - */
> -void idpf_ctlq_remove(struct idpf_hw *hw,
> -		      struct idpf_ctlq_info *cq)
> -{
> -	list_del(&cq->cq_list);
> -	idpf_ctlq_shutdown(hw, cq);
> -	kfree(cq);
> -}
> -
> -/**
> - * idpf_ctlq_init - main initialization routine for all control queues
> - * @hw: pointer to hardware struct
> - * @num_q: number of queues to initialize
> - * @q_info: array of structs containing info for each queue to be initialized
> - *
> - * This initializes any number and any type of control queues. This is an all
> - * or nothing routine; if one fails, all previously allocated queues will be
> - * destroyed. This must be called prior to using the individual add/remove
> - * APIs.
> - */
> -int idpf_ctlq_init(struct idpf_hw *hw, u8 num_q,
> -		   struct idpf_ctlq_create_info *q_info)
> -{
> -	struct idpf_ctlq_info *cq, *tmp;
> -	int err;
> -	int i;
> -
> -	INIT_LIST_HEAD(&hw->cq_list_head);
> -
> -	for (i = 0; i < num_q; i++) {
> -		struct idpf_ctlq_create_info *qinfo = q_info + i;
> -
> -		err = idpf_ctlq_add(hw, qinfo, &cq);
> -		if (err)
> -			goto init_destroy_qs;
> -	}
> -
> -	return 0;
> -
> -init_destroy_qs:
> -	list_for_each_entry_safe(cq, tmp, &hw->cq_list_head, cq_list)
> -		idpf_ctlq_remove(hw, cq);
> -
> -	return err;
> -}
> -
> -/**
> - * idpf_ctlq_deinit - destroy all control queues
> - * @hw: pointer to hw struct
> - */
> -void idpf_ctlq_deinit(struct idpf_hw *hw)
> -{
> -	struct idpf_ctlq_info *cq, *tmp;
> -
> -	list_for_each_entry_safe(cq, tmp, &hw->cq_list_head, cq_list)
> -		idpf_ctlq_remove(hw, cq);
> -}
> -
> -/**
> - * idpf_ctlq_send - send command to Control Queue (CTQ)
> - * @hw: pointer to hw struct
> - * @cq: handle to control queue struct to send on
> - * @num_q_msg: number of messages to send on control queue
> - * @q_msg: pointer to array of queue messages to be sent
> - *
> - * The caller is expected to allocate DMAable buffers and pass them to the
> - * send routine via the q_msg struct / control queue specific data struct.
> - * The control queue will hold a reference to each send message until
> - * the completion for that message has been cleaned.
> - */
> -int idpf_ctlq_send(struct idpf_hw *hw, struct idpf_ctlq_info *cq,
> -		   u16 num_q_msg, struct idpf_ctlq_msg q_msg[])
> -{
> -	struct idpf_ctlq_desc *desc;
> -	int num_desc_avail;
> -	int err = 0;
> -	int i;
> -
> -	spin_lock(&cq->cq_lock);
> -
> -	/* Ensure there are enough descriptors to send all messages */
> -	num_desc_avail = IDPF_CTLQ_DESC_UNUSED(cq);
> -	if (num_desc_avail == 0 || num_desc_avail < num_q_msg) {
> -		err = -ENOSPC;
> -		goto err_unlock;
> -	}
> -
> -	for (i = 0; i < num_q_msg; i++) {
> -		struct idpf_ctlq_msg *msg = &q_msg[i];
> -
> -		desc = IDPF_CTLQ_DESC(cq, cq->next_to_use);
> -
> -		desc->opcode = cpu_to_le16(msg->opcode);
> -		desc->pfid_vfid = cpu_to_le16(msg->func_id);
> -
> -		desc->v_opcode_dtype = cpu_to_le32(msg->cookie.mbx.chnl_opcode);
> -		desc->v_retval = cpu_to_le32(msg->cookie.mbx.chnl_retval);
> -
> -		desc->flags = cpu_to_le16((msg->host_id & IDPF_HOST_ID_MASK) <<
> -					  IDPF_CTLQ_FLAG_HOST_ID_S);
> -		if (msg->data_len) {
> -			struct idpf_dma_mem *buff = msg->ctx.indirect.payload;
> -
> -			desc->datalen |= cpu_to_le16(msg->data_len);
> -			desc->flags |= cpu_to_le16(IDPF_CTLQ_FLAG_BUF);
> -			desc->flags |= cpu_to_le16(IDPF_CTLQ_FLAG_RD);
> -
> -			/* Update the address values in the desc with the pa
> -			 * value for respective buffer
> -			 */
> -			desc->params.indirect.addr_high =
> -				cpu_to_le32(upper_32_bits(buff->pa));
> -			desc->params.indirect.addr_low =
> -				cpu_to_le32(lower_32_bits(buff->pa));
> -
> -			memcpy(&desc->params, msg->ctx.indirect.context,
> -			       IDPF_INDIRECT_CTX_SIZE);
> -		} else {
> -			memcpy(&desc->params, msg->ctx.direct,
> -			       IDPF_DIRECT_CTX_SIZE);
> -		}
> -
> -		/* Store buffer info */
> -		cq->bi.tx_msg[cq->next_to_use] = msg;
> -
> -		(cq->next_to_use)++;
> -		if (cq->next_to_use == cq->ring_size)
> -			cq->next_to_use = 0;
> -	}
> -
> -	/* Force memory write to complete before letting hardware
> -	 * know that there are new descriptors to fetch.
> -	 */
> -	dma_wmb();
> -
> -	writel(cq->next_to_use,
> -	       libie_pci_get_mmio_addr(&hw->back->ctlq_ctx.mmio_info,
> -				       cq->reg.tail));
> -
> -err_unlock:
> -	spin_unlock(&cq->cq_lock);
> -
> -	return err;
> -}
> -
> -/**
> - * idpf_ctlq_clean_sq - reclaim send descriptors on HW write back for the
> - * requested queue
> - * @cq: pointer to the specific Control queue
> - * @clean_count: (input|output) number of descriptors to clean as input, and
> - * number of descriptors actually cleaned as output
> - * @msg_status: (output) pointer to msg pointer array to be populated; needs
> - * to be allocated by caller
> - *
> - * Returns an array of message pointers associated with the cleaned
> - * descriptors. The pointers are to the original ctlq_msgs sent on the cleaned
> - * descriptors.  The status will be returned for each; any messages that failed
> - * to send will have a non-zero status. The caller is expected to free original
> - * ctlq_msgs and free or reuse the DMA buffers.
> - */
> -int idpf_ctlq_clean_sq(struct idpf_ctlq_info *cq, u16 *clean_count,
> -		       struct idpf_ctlq_msg *msg_status[])
> -{
> -	struct idpf_ctlq_desc *desc;
> -	u16 i, num_to_clean;
> -	u16 ntc, desc_err;
> -
> -	if (*clean_count == 0)
> -		return 0;
> -	if (*clean_count > cq->ring_size)
> -		return -EBADR;
> -
> -	spin_lock(&cq->cq_lock);
> -
> -	ntc = cq->next_to_clean;
> -
> -	num_to_clean = *clean_count;
> -
> -	for (i = 0; i < num_to_clean; i++) {
> -		/* Fetch next descriptor and check if marked as done */
> -		desc = IDPF_CTLQ_DESC(cq, ntc);
> -		if (!(le16_to_cpu(desc->flags) & IDPF_CTLQ_FLAG_DD))
> -			break;
> -
> -		/* Ensure no other fields are read until DD flag is checked */
> -		dma_rmb();
> -
> -		/* strip off FW internal code */
> -		desc_err = le16_to_cpu(desc->ret_val) & 0xff;
> -
> -		msg_status[i] = cq->bi.tx_msg[ntc];
> -		msg_status[i]->status = desc_err;
> -
> -		cq->bi.tx_msg[ntc] = NULL;
> -
> -		/* Zero out any stale data */
> -		memset(desc, 0, sizeof(*desc));
> -
> -		ntc++;
> -		if (ntc == cq->ring_size)
> -			ntc = 0;
> -	}
> -
> -	cq->next_to_clean = ntc;
> -
> -	spin_unlock(&cq->cq_lock);
> -
> -	/* Return number of descriptors actually cleaned */
> -	*clean_count = i;
> -
> -	return 0;
> -}
> -
> -/**
> - * idpf_ctlq_post_rx_buffs - post buffers to descriptor ring
> - * @hw: pointer to hw struct
> - * @cq: pointer to control queue handle
> - * @buff_count: (input|output) input is number of buffers caller is trying to
> - * return; output is number of buffers that were not posted
> - * @buffs: array of pointers to dma mem structs to be given to hardware
> - *
> - * Caller uses this function to return DMA buffers to the descriptor ring after
> - * consuming them; buff_count will be the number of buffers.
> - *
> - * Note: this function needs to be called after a receive call even
> - * if there are no DMA buffers to be returned, i.e. buff_count = 0,
> - * buffs = NULL to support direct commands
> - */
> -int idpf_ctlq_post_rx_buffs(struct idpf_hw *hw, struct idpf_ctlq_info *cq,
> -			    u16 *buff_count, struct idpf_dma_mem **buffs)
> -{
> -	struct idpf_ctlq_desc *desc;
> -	u16 ntp = cq->next_to_post;
> -	bool buffs_avail = false;
> -	u16 tbp = ntp + 1;
> -	int i = 0;
> -
> -	if (*buff_count > cq->ring_size)
> -		return -EBADR;
> -
> -	if (*buff_count > 0)
> -		buffs_avail = true;
> -
> -	spin_lock(&cq->cq_lock);
> -
> -	if (tbp >= cq->ring_size)
> -		tbp = 0;
> -
> -	if (tbp == cq->next_to_clean)
> -		/* Nothing to do */
> -		goto post_buffs_out;
> -
> -	/* Post buffers for as many as provided or up until the last one used */
> -	while (ntp != cq->next_to_clean) {
> -		desc = IDPF_CTLQ_DESC(cq, ntp);
> -
> -		if (cq->bi.rx_buff[ntp])
> -			goto fill_desc;
> -		if (!buffs_avail) {
> -			/* If the caller hasn't given us any buffers or
> -			 * there are none left, search the ring itself
> -			 * for an available buffer to move to this
> -			 * entry starting at the next entry in the ring
> -			 */
> -			tbp = ntp + 1;
> -
> -			/* Wrap ring if necessary */
> -			if (tbp >= cq->ring_size)
> -				tbp = 0;
> -
> -			while (tbp != cq->next_to_clean) {
> -				if (cq->bi.rx_buff[tbp]) {
> -					cq->bi.rx_buff[ntp] =
> -						cq->bi.rx_buff[tbp];
> -					cq->bi.rx_buff[tbp] = NULL;
> -
> -					/* Found a buffer, no need to
> -					 * search anymore
> -					 */
> -					break;
> -				}
> -
> -				/* Wrap ring if necessary */
> -				tbp++;
> -				if (tbp >= cq->ring_size)
> -					tbp = 0;
> -			}
> -
> -			if (tbp == cq->next_to_clean)
> -				goto post_buffs_out;
> -		} else {
> -			/* Give back pointer to DMA buffer */
> -			cq->bi.rx_buff[ntp] = buffs[i];
> -			i++;
> -
> -			if (i >= *buff_count)
> -				buffs_avail = false;
> -		}
> -
> -fill_desc:
> -		desc->flags =
> -			cpu_to_le16(IDPF_CTLQ_FLAG_BUF | IDPF_CTLQ_FLAG_RD);
> -
> -		/* Post buffers to descriptor */
> -		desc->datalen = cpu_to_le16(cq->bi.rx_buff[ntp]->size);
> -		desc->params.indirect.addr_high =
> -			cpu_to_le32(upper_32_bits(cq->bi.rx_buff[ntp]->pa));
> -		desc->params.indirect.addr_low =
> -			cpu_to_le32(lower_32_bits(cq->bi.rx_buff[ntp]->pa));
> -
> -		ntp++;
> -		if (ntp == cq->ring_size)
> -			ntp = 0;
> -	}
> -
> -post_buffs_out:
> -	/* Only update tail if buffers were actually posted */
> -	if (cq->next_to_post != ntp) {
> -		if (ntp)
> -			/* Update next_to_post to ntp - 1 since current ntp
> -			 * will not have a buffer
> -			 */
> -			cq->next_to_post = ntp - 1;
> -		else
> -			/* Wrap to end of end ring since current ntp is 0 */
> -			cq->next_to_post = cq->ring_size - 1;
> -
> -		dma_wmb();
> -
> -		writel(cq->next_to_post,
> -		       libie_pci_get_mmio_addr(&hw->back->ctlq_ctx.mmio_info,
> -					       cq->reg.tail));
> -	}
> -
> -	spin_unlock(&cq->cq_lock);
> -
> -	/* return the number of buffers that were not posted */
> -	*buff_count = *buff_count - i;
> -
> -	return 0;
> -}
> -
> -/**
> - * idpf_ctlq_recv - receive control queue message call back
> - * @cq: pointer to control queue handle to receive on
> - * @num_q_msg: (input|output) input number of messages that should be received;
> - * output number of messages actually received
> - * @q_msg: (output) array of received control queue messages on this q;
> - * needs to be pre-allocated by caller for as many messages as requested
> - *
> - * Called by interrupt handler or polling mechanism. Caller is expected
> - * to free buffers
> - */
> -int idpf_ctlq_recv(struct idpf_ctlq_info *cq, u16 *num_q_msg,
> -		   struct idpf_ctlq_msg *q_msg)
> -{
> -	u16 num_to_clean, ntc, flags;
> -	struct idpf_ctlq_desc *desc;
> -	int err = 0;
> -	u16 i;
> -
> -	/* take the lock before we start messing with the ring */
> -	spin_lock(&cq->cq_lock);
> -
> -	ntc = cq->next_to_clean;
> -
> -	num_to_clean = *num_q_msg;
> -
> -	for (i = 0; i < num_to_clean; i++) {
> -		/* Fetch next descriptor and check if marked as done */
> -		desc = IDPF_CTLQ_DESC(cq, ntc);
> -		flags = le16_to_cpu(desc->flags);
> -
> -		if (!(flags & IDPF_CTLQ_FLAG_DD))
> -			break;
> -
> -		/* Ensure no other fields are read until DD flag is checked */
> -		dma_rmb();
> -
> -		q_msg[i].vmvf_type = (flags &
> -				      (IDPF_CTLQ_FLAG_FTYPE_VM |
> -				       IDPF_CTLQ_FLAG_FTYPE_PF)) >>
> -				       IDPF_CTLQ_FLAG_FTYPE_S;
> -
> -		if (flags & IDPF_CTLQ_FLAG_ERR)
> -			err  = -EBADMSG;
> -
> -		q_msg[i].cookie.mbx.chnl_opcode =
> -				le32_to_cpu(desc->v_opcode_dtype);
> -		q_msg[i].cookie.mbx.chnl_retval =
> -				le32_to_cpu(desc->v_retval);
> -
> -		q_msg[i].opcode = le16_to_cpu(desc->opcode);
> -		q_msg[i].data_len = le16_to_cpu(desc->datalen);
> -		q_msg[i].status = le16_to_cpu(desc->ret_val);
> -
> -		if (desc->datalen) {
> -			memcpy(q_msg[i].ctx.indirect.context,
> -			       &desc->params.indirect, IDPF_INDIRECT_CTX_SIZE);
> -
> -			/* Assign pointer to dma buffer to ctlq_msg array
> -			 * to be given to upper layer
> -			 */
> -			q_msg[i].ctx.indirect.payload = cq->bi.rx_buff[ntc];
> -
> -			/* Zero out pointer to DMA buffer info;
> -			 * will be repopulated by post buffers API
> -			 */
> -			cq->bi.rx_buff[ntc] = NULL;
> -		} else {
> -			memcpy(q_msg[i].ctx.direct, desc->params.raw,
> -			       IDPF_DIRECT_CTX_SIZE);
> -		}
> -
> -		/* Zero out stale data in descriptor */
> -		memset(desc, 0, sizeof(struct idpf_ctlq_desc));
> -
> -		ntc++;
> -		if (ntc == cq->ring_size)
> -			ntc = 0;
> -	}
> -
> -	cq->next_to_clean = ntc;
> -
> -	spin_unlock(&cq->cq_lock);
> -
> -	*num_q_msg = i;
> -	if (*num_q_msg == 0)
> -		err = -ENOMSG;
> -
> -	return err;
> -}
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_controlq.h b/drivers/net/ethernet/intel/idpf/idpf_controlq.h
> deleted file mode 100644
> index acf595e9265f..000000000000
> --- a/drivers/net/ethernet/intel/idpf/idpf_controlq.h
> +++ /dev/null
> @@ -1,142 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -/* Copyright (C) 2023 Intel Corporation */
> -
> -#ifndef _IDPF_CONTROLQ_H_
> -#define _IDPF_CONTROLQ_H_
> -
> -#include <linux/slab.h>
> -
> -#include "idpf_controlq_api.h"
> -
> -/* Maximum buffer length for all control queue types */
> -#define IDPF_CTLQ_MAX_BUF_LEN	4096
> -
> -#define IDPF_CTLQ_DESC(R, i) \
> -	(&(((struct idpf_ctlq_desc *)((R)->desc_ring.va))[i]))
> -
> -#define IDPF_CTLQ_DESC_UNUSED(R) \
> -	((u16)((((R)->next_to_clean > (R)->next_to_use) ? 0 : (R)->ring_size) + \
> -	       (R)->next_to_clean - (R)->next_to_use - 1))
> -
> -/* Control Queue default settings */
> -#define IDPF_CTRL_SQ_CMD_TIMEOUT	250  /* msecs */
> -
> -struct idpf_ctlq_desc {
> -	/* Control queue descriptor flags */
> -	__le16 flags;
> -	/* Control queue message opcode */
> -	__le16 opcode;
> -	__le16 datalen;		/* 0 for direct commands */
> -	union {
> -		__le16 ret_val;
> -		__le16 pfid_vfid;
> -#define IDPF_CTLQ_DESC_VF_ID_S	0
> -#define IDPF_CTLQ_DESC_VF_ID_M	(0x7FF << IDPF_CTLQ_DESC_VF_ID_S)
> -#define IDPF_CTLQ_DESC_PF_ID_S	11
> -#define IDPF_CTLQ_DESC_PF_ID_M	(0x1F << IDPF_CTLQ_DESC_PF_ID_S)
> -	};
> -
> -	/* Virtchnl message opcode and virtchnl descriptor type
> -	 * v_opcode=[27:0], v_dtype=[31:28]
> -	 */
> -	__le32 v_opcode_dtype;
> -	/* Virtchnl return value */
> -	__le32 v_retval;
> -	union {
> -		struct {
> -			__le32 param0;
> -			__le32 param1;
> -			__le32 param2;
> -			__le32 param3;
> -		} direct;
> -		struct {
> -			__le32 param0;
> -			__le16 sw_cookie;
> -			/* Virtchnl flags */
> -			__le16 v_flags;
> -			__le32 addr_high;
> -			__le32 addr_low;
> -		} indirect;
> -		u8 raw[16];
> -	} params;
> -};
> -
> -/* Flags sub-structure
> - * |0  |1  |2  |3  |4  |5  |6  |7  |8  |9  |10 |11 |12 |13 |14 |15 |
> - * |DD |CMP|ERR|  * RSV *  |FTYPE  | *RSV* |RD |VFC|BUF|  HOST_ID  |
> - */
> -/* command flags and offsets */
> -#define IDPF_CTLQ_FLAG_DD_S		0
> -#define IDPF_CTLQ_FLAG_CMP_S		1
> -#define IDPF_CTLQ_FLAG_ERR_S		2
> -#define IDPF_CTLQ_FLAG_FTYPE_S		6
> -#define IDPF_CTLQ_FLAG_RD_S		10
> -#define IDPF_CTLQ_FLAG_VFC_S		11
> -#define IDPF_CTLQ_FLAG_BUF_S		12
> -#define IDPF_CTLQ_FLAG_HOST_ID_S	13
> -
> -#define IDPF_CTLQ_FLAG_DD	BIT(IDPF_CTLQ_FLAG_DD_S)	/* 0x1	  */
> -#define IDPF_CTLQ_FLAG_CMP	BIT(IDPF_CTLQ_FLAG_CMP_S)	/* 0x2	  */
> -#define IDPF_CTLQ_FLAG_ERR	BIT(IDPF_CTLQ_FLAG_ERR_S)	/* 0x4	  */
> -#define IDPF_CTLQ_FLAG_FTYPE_VM	BIT(IDPF_CTLQ_FLAG_FTYPE_S)	/* 0x40	  */
> -#define IDPF_CTLQ_FLAG_FTYPE_PF	BIT(IDPF_CTLQ_FLAG_FTYPE_S + 1)	/* 0x80   */
> -#define IDPF_CTLQ_FLAG_RD	BIT(IDPF_CTLQ_FLAG_RD_S)	/* 0x400  */
> -#define IDPF_CTLQ_FLAG_VFC	BIT(IDPF_CTLQ_FLAG_VFC_S)	/* 0x800  */
> -#define IDPF_CTLQ_FLAG_BUF	BIT(IDPF_CTLQ_FLAG_BUF_S)	/* 0x1000 */
> -
> -/* Host ID is a special field that has 3b and not a 1b flag */
> -#define IDPF_CTLQ_FLAG_HOST_ID_M MAKE_MASK(0x7000UL, IDPF_CTLQ_FLAG_HOST_ID_S)
> -
> -struct idpf_mbxq_desc {
> -	u8 pad[8];		/* CTLQ flags/opcode/len/retval fields */
> -	u32 chnl_opcode;	/* avoid confusion with desc->opcode */
> -	u32 chnl_retval;	/* ditto for desc->retval */
> -	u32 pf_vf_id;		/* used by CP when sending to PF */
> -};
> -
> -/* Max number of MMIO regions not including the mailbox and rstat regions in
> - * the fallback case when the whole bar is mapped.
> - */
> -#define IDPF_MMIO_MAP_FALLBACK_MAX_REMAINING		3
> -
> -struct idpf_mmio_reg {
> -	void __iomem *vaddr;
> -	resource_size_t addr_start;
> -	resource_size_t addr_len;
> -};
> -
> -/* Define the driver hardware struct to replace other control structs as needed
> - * Align to ctlq_hw_info
> - */
> -struct idpf_hw {
> -	/* Array of remaining LAN BAR regions */
> -	int num_lan_regs;
> -	struct idpf_mmio_reg *lan_regs;
> -
> -	struct idpf_adapter *back;
> -
> -	/* control queue - send and receive */
> -	struct idpf_ctlq_info *asq;
> -	struct idpf_ctlq_info *arq;
> -
> -	/* pci info */
> -	u16 device_id;
> -	u16 vendor_id;
> -	u16 subsystem_device_id;
> -	u16 subsystem_vendor_id;
> -	u8 revision_id;
> -	bool adapter_stopped;
> -
> -	struct list_head cq_list_head;
> -};
> -
> -int idpf_ctlq_alloc_ring_res(struct idpf_hw *hw,
> -			     struct idpf_ctlq_info *cq);
> -
> -void idpf_ctlq_dealloc_ring_res(struct idpf_hw *hw, struct idpf_ctlq_info *cq);
> -
> -/* prototype for functions used for dynamic memory allocation */
> -void *idpf_alloc_dma_mem(struct idpf_hw *hw, struct idpf_dma_mem *mem,
> -			 u64 size);
> -void idpf_free_dma_mem(struct idpf_hw *hw, struct idpf_dma_mem *mem);
> -#endif /* _IDPF_CONTROLQ_H_ */
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_controlq_api.h b/drivers/net/ethernet/intel/idpf/idpf_controlq_api.h
> deleted file mode 100644
> index 3414c5f9a831..000000000000
> --- a/drivers/net/ethernet/intel/idpf/idpf_controlq_api.h
> +++ /dev/null
> @@ -1,177 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -/* Copyright (C) 2023 Intel Corporation */
> -
> -#ifndef _IDPF_CONTROLQ_API_H_
> -#define _IDPF_CONTROLQ_API_H_
> -
> -#include "idpf_mem.h"
> -
> -struct idpf_hw;
> -
> -/* Used for queue init, response and events */
> -enum idpf_ctlq_type {
> -	IDPF_CTLQ_TYPE_MAILBOX_TX	= 0,
> -	IDPF_CTLQ_TYPE_MAILBOX_RX	= 1,
> -	IDPF_CTLQ_TYPE_CONFIG_TX	= 2,
> -	IDPF_CTLQ_TYPE_CONFIG_RX	= 3,
> -	IDPF_CTLQ_TYPE_EVENT_RX		= 4,
> -	IDPF_CTLQ_TYPE_RDMA_TX		= 5,
> -	IDPF_CTLQ_TYPE_RDMA_RX		= 6,
> -	IDPF_CTLQ_TYPE_RDMA_COMPL	= 7
> -};
> -
> -/* Generic Control Queue Structures */
> -struct idpf_ctlq_reg {
> -	/* used for queue tracking */
> -	u32 head;
> -	u32 tail;
> -	/* Below applies only to default mb (if present) */
> -	u32 len;
> -	u32 bah;
> -	u32 bal;
> -	u32 len_mask;
> -	u32 len_ena_mask;
> -	u32 head_mask;
> -};
> -
> -/* Generic queue msg structure */
> -struct idpf_ctlq_msg {
> -	u8 vmvf_type; /* represents the source of the message on recv */
> -#define IDPF_VMVF_TYPE_VF 0
> -#define IDPF_VMVF_TYPE_VM 1
> -#define IDPF_VMVF_TYPE_PF 2
> -	u8 host_id;
> -	/* 3b field used only when sending a message to CP - to be used in
> -	 * combination with target func_id to route the message
> -	 */
> -#define IDPF_HOST_ID_MASK 0x7
> -
> -	u16 opcode;
> -	u16 data_len;	/* data_len = 0 when no payload is attached */
> -	union {
> -		u16 func_id;	/* when sending a message */
> -		u16 status;	/* when receiving a message */
> -	};
> -	union {
> -		struct {
> -			u32 chnl_opcode;
> -			u32 chnl_retval;
> -		} mbx;
> -	} cookie;
> -	union {
> -#define IDPF_DIRECT_CTX_SIZE	16
> -#define IDPF_INDIRECT_CTX_SIZE	8
> -		/* 16 bytes of context can be provided or 8 bytes of context
> -		 * plus the address of a DMA buffer
> -		 */
> -		u8 direct[IDPF_DIRECT_CTX_SIZE];
> -		struct {
> -			u8 context[IDPF_INDIRECT_CTX_SIZE];
> -			struct idpf_dma_mem *payload;
> -		} indirect;
> -		struct {
> -			u32 rsvd;
> -			u16 data;
> -			u16 flags;
> -		} sw_cookie;
> -	} ctx;
> -};
> -
> -/* Generic queue info structures */
> -/* MB, CONFIG and EVENT q do not have extended info */
> -struct idpf_ctlq_create_info {
> -	enum idpf_ctlq_type type;
> -	int id; /* absolute queue offset passed as input
> -		 * -1 for default mailbox if present
> -		 */
> -	u16 len; /* Queue length passed as input */
> -	u16 buf_size; /* buffer size passed as input */
> -	u64 base_address; /* output, HPA of the Queue start  */
> -	struct idpf_ctlq_reg reg; /* registers accessed by ctlqs */
> -
> -	int ext_info_size;
> -	void *ext_info; /* Specific to q type */
> -};
> -
> -/* Control Queue information */
> -struct idpf_ctlq_info {
> -	struct list_head cq_list;
> -
> -	enum idpf_ctlq_type cq_type;
> -	int q_id;
> -	spinlock_t cq_lock;		/* control queue lock */
> -	/* used for interrupt processing */
> -	u16 next_to_use;
> -	u16 next_to_clean;
> -	u16 next_to_post;		/* starting descriptor to post buffers
> -					 * to after recev
> -					 */
> -
> -	struct idpf_dma_mem desc_ring;	/* descriptor ring memory
> -					 * idpf_dma_mem is defined in OSdep.h
> -					 */
> -	union {
> -		struct idpf_dma_mem **rx_buff;
> -		struct idpf_ctlq_msg **tx_msg;
> -	} bi;
> -
> -	u16 buf_size;			/* queue buffer size */
> -	u16 ring_size;			/* Number of descriptors */
> -	struct idpf_ctlq_reg reg;	/* registers accessed by ctlqs */
> -};
> -
> -/**
> - * enum idpf_mbx_opc - PF/VF mailbox commands
> - * @idpf_mbq_opc_send_msg_to_cp: used by PF or VF to send a message to its CP
> - * @idpf_mbq_opc_send_msg_to_peer_drv: used by PF or VF to send a message to
> - *				       any peer driver
> - */
> -enum idpf_mbx_opc {
> -	idpf_mbq_opc_send_msg_to_cp		= 0x0801,
> -	idpf_mbq_opc_send_msg_to_peer_drv	= 0x0804,
> -};
> -
> -/* API supported for control queue management */
> -/* Will init all required q including default mb.  "q_info" is an array of
> - * create_info structs equal to the number of control queues to be created.
> - */
> -int idpf_ctlq_init(struct idpf_hw *hw, u8 num_q,
> -		   struct idpf_ctlq_create_info *q_info);
> -
> -/* Allocate and initialize a single control queue, which will be added to the
> - * control queue list; returns a handle to the created control queue
> - */
> -int idpf_ctlq_add(struct idpf_hw *hw,
> -		  struct idpf_ctlq_create_info *qinfo,
> -		  struct idpf_ctlq_info **cq);
> -
> -/* Deinitialize and deallocate a single control queue */
> -void idpf_ctlq_remove(struct idpf_hw *hw,
> -		      struct idpf_ctlq_info *cq);
> -
> -/* Sends messages to HW and will also free the buffer*/
> -int idpf_ctlq_send(struct idpf_hw *hw,
> -		   struct idpf_ctlq_info *cq,
> -		   u16 num_q_msg,
> -		   struct idpf_ctlq_msg q_msg[]);
> -
> -/* Receives messages and called by interrupt handler/polling
> - * initiated by app/process. Also caller is supposed to free the buffers
> - */
> -int idpf_ctlq_recv(struct idpf_ctlq_info *cq, u16 *num_q_msg,
> -		   struct idpf_ctlq_msg *q_msg);
> -
> -/* Reclaims send descriptors on HW write back */
> -int idpf_ctlq_clean_sq(struct idpf_ctlq_info *cq, u16 *clean_count,
> -		       struct idpf_ctlq_msg *msg_status[]);
> -
> -/* Indicate RX buffers are done being processed */
> -int idpf_ctlq_post_rx_buffs(struct idpf_hw *hw,
> -			    struct idpf_ctlq_info *cq,
> -			    u16 *buff_count,
> -			    struct idpf_dma_mem **buffs);
> -
> -/* Will destroy all q including the default mb */
> -void idpf_ctlq_deinit(struct idpf_hw *hw);
> -
> -#endif /* _IDPF_CONTROLQ_API_H_ */
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_controlq_setup.c b/drivers/net/ethernet/intel/idpf/idpf_controlq_setup.c
> deleted file mode 100644
> index d4d488c7cfd6..000000000000
> --- a/drivers/net/ethernet/intel/idpf/idpf_controlq_setup.c
> +++ /dev/null
> @@ -1,169 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0-only
> -/* Copyright (C) 2023 Intel Corporation */
> -
> -#include "idpf_controlq.h"
> -
> -/**
> - * idpf_ctlq_alloc_desc_ring - Allocate Control Queue (CQ) rings
> - * @hw: pointer to hw struct
> - * @cq: pointer to the specific Control queue
> - */
> -static int idpf_ctlq_alloc_desc_ring(struct idpf_hw *hw,
> -				     struct idpf_ctlq_info *cq)
> -{
> -	size_t size = cq->ring_size * sizeof(struct idpf_ctlq_desc);
> -
> -	cq->desc_ring.va = idpf_alloc_dma_mem(hw, &cq->desc_ring, size);
> -	if (!cq->desc_ring.va)
> -		return -ENOMEM;
> -
> -	return 0;
> -}
> -
> -/**
> - * idpf_ctlq_alloc_bufs - Allocate Control Queue (CQ) buffers
> - * @hw: pointer to hw struct
> - * @cq: pointer to the specific Control queue
> - *
> - * Allocate the buffer head for all control queues, and if it's a receive
> - * queue, allocate DMA buffers
> - */
> -static int idpf_ctlq_alloc_bufs(struct idpf_hw *hw,
> -				struct idpf_ctlq_info *cq)
> -{
> -	int i;
> -
> -	/* Do not allocate DMA buffers for transmit queues */
> -	if (cq->cq_type == IDPF_CTLQ_TYPE_MAILBOX_TX)
> -		return 0;
> -
> -	/* We'll be allocating the buffer info memory first, then we can
> -	 * allocate the mapped buffers for the event processing
> -	 */
> -	cq->bi.rx_buff = kzalloc_objs(struct idpf_dma_mem *, cq->ring_size);
> -	if (!cq->bi.rx_buff)
> -		return -ENOMEM;
> -
> -	/* allocate the mapped buffers (except for the last one) */
> -	for (i = 0; i < cq->ring_size - 1; i++) {
> -		struct idpf_dma_mem *bi;
> -		int num = 1; /* number of idpf_dma_mem to be allocated */
> -
> -		cq->bi.rx_buff[i] = kzalloc_objs(struct idpf_dma_mem, num);
> -		if (!cq->bi.rx_buff[i])
> -			goto unwind_alloc_cq_bufs;
> -
> -		bi = cq->bi.rx_buff[i];
> -
> -		bi->va = idpf_alloc_dma_mem(hw, bi, cq->buf_size);
> -		if (!bi->va) {
> -			/* unwind will not free the failed entry */
> -			kfree(cq->bi.rx_buff[i]);
> -			goto unwind_alloc_cq_bufs;
> -		}
> -	}
> -
> -	return 0;
> -
> -unwind_alloc_cq_bufs:
> -	/* don't try to free the one that failed... */
> -	i--;
> -	for (; i >= 0; i--) {
> -		idpf_free_dma_mem(hw, cq->bi.rx_buff[i]);
> -		kfree(cq->bi.rx_buff[i]);
> -	}
> -	kfree(cq->bi.rx_buff);
> -
> -	return -ENOMEM;
> -}
> -
> -/**
> - * idpf_ctlq_free_desc_ring - Free Control Queue (CQ) rings
> - * @hw: pointer to hw struct
> - * @cq: pointer to the specific Control queue
> - *
> - * This assumes the posted send buffers have already been cleaned
> - * and de-allocated
> - */
> -static void idpf_ctlq_free_desc_ring(struct idpf_hw *hw,
> -				     struct idpf_ctlq_info *cq)
> -{
> -	idpf_free_dma_mem(hw, &cq->desc_ring);
> -}
> -
> -/**
> - * idpf_ctlq_free_bufs - Free CQ buffer info elements
> - * @hw: pointer to hw struct
> - * @cq: pointer to the specific Control queue
> - *
> - * Free the DMA buffers for RX queues, and DMA buffer header for both RX and TX
> - * queues.  The upper layers are expected to manage freeing of TX DMA buffers
> - */
> -static void idpf_ctlq_free_bufs(struct idpf_hw *hw, struct idpf_ctlq_info *cq)
> -{
> -	void *bi;
> -
> -	if (cq->cq_type == IDPF_CTLQ_TYPE_MAILBOX_RX) {
> -		int i;
> -
> -		/* free DMA buffers for rx queues*/
> -		for (i = 0; i < cq->ring_size; i++) {
> -			if (cq->bi.rx_buff[i]) {
> -				idpf_free_dma_mem(hw, cq->bi.rx_buff[i]);
> -				kfree(cq->bi.rx_buff[i]);
> -			}
> -		}
> -
> -		bi = (void *)cq->bi.rx_buff;
> -	} else {
> -		bi = (void *)cq->bi.tx_msg;
> -	}
> -
> -	/* free the buffer header */
> -	kfree(bi);
> -}
> -
> -/**
> - * idpf_ctlq_dealloc_ring_res - Free memory allocated for control queue
> - * @hw: pointer to hw struct
> - * @cq: pointer to the specific Control queue
> - *
> - * Free the memory used by the ring, buffers and other related structures
> - */
> -void idpf_ctlq_dealloc_ring_res(struct idpf_hw *hw, struct idpf_ctlq_info *cq)
> -{
> -	/* free ring buffers and the ring itself */
> -	idpf_ctlq_free_bufs(hw, cq);
> -	idpf_ctlq_free_desc_ring(hw, cq);
> -}
> -
> -/**
> - * idpf_ctlq_alloc_ring_res - allocate memory for descriptor ring and bufs
> - * @hw: pointer to hw struct
> - * @cq: pointer to control queue struct
> - *
> - * Do *NOT* hold cq_lock when calling this as the memory allocation routines
> - * called are not going to be atomic context safe
> - */
> -int idpf_ctlq_alloc_ring_res(struct idpf_hw *hw, struct idpf_ctlq_info *cq)
> -{
> -	int err;
> -
> -	/* allocate the ring memory */
> -	err = idpf_ctlq_alloc_desc_ring(hw, cq);
> -	if (err)
> -		return err;
> -
> -	/* allocate buffers in the rings */
> -	err = idpf_ctlq_alloc_bufs(hw, cq);
> -	if (err)
> -		goto idpf_init_cq_free_ring;
> -
> -	/* success! */
> -	return 0;
> -
> -idpf_init_cq_free_ring:
> -	idpf_free_dma_mem(hw, &cq->desc_ring);
> -
> -	return err;
> -}
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_dev.c b/drivers/net/ethernet/intel/idpf/idpf_dev.c
> index e36b0017186f..3a357d5dea20 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_dev.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_dev.c
> @@ -10,44 +10,32 @@
>  
>  /**
>   * idpf_ctlq_reg_init - initialize default mailbox registers
> - * @adapter: adapter structure
> - * @cq: pointer to the array of create control queues
> + * @mmio: struct that contains MMIO region info
> + * @cci: struct where the register offset pointer to be copied to
>   */
> -static void idpf_ctlq_reg_init(struct idpf_adapter *adapter,
> -			       struct idpf_ctlq_create_info *cq)
> +static void idpf_ctlq_reg_init(struct libie_mmio_info *mmio,
> +			       struct libie_ctlq_create_info *cci)
>  {
> -	int i;
> -
> -	for (i = 0; i < IDPF_NUM_DFLT_MBX_Q; i++) {
> -		struct idpf_ctlq_create_info *ccq = cq + i;
> -
> -		switch (ccq->type) {
> -		case IDPF_CTLQ_TYPE_MAILBOX_TX:
> -			/* set head and tail registers in our local struct */
> -			ccq->reg.head = PF_FW_ATQH;
> -			ccq->reg.tail = PF_FW_ATQT;
> -			ccq->reg.len = PF_FW_ATQLEN;
> -			ccq->reg.bah = PF_FW_ATQBAH;
> -			ccq->reg.bal = PF_FW_ATQBAL;
> -			ccq->reg.len_mask = PF_FW_ATQLEN_ATQLEN_M;
> -			ccq->reg.len_ena_mask = PF_FW_ATQLEN_ATQENABLE_M;
> -			ccq->reg.head_mask = PF_FW_ATQH_ATQH_M;
> -			break;
> -		case IDPF_CTLQ_TYPE_MAILBOX_RX:
> -			/* set head and tail registers in our local struct */
> -			ccq->reg.head = PF_FW_ARQH;
> -			ccq->reg.tail = PF_FW_ARQT;
> -			ccq->reg.len = PF_FW_ARQLEN;
> -			ccq->reg.bah = PF_FW_ARQBAH;
> -			ccq->reg.bal = PF_FW_ARQBAL;
> -			ccq->reg.len_mask = PF_FW_ARQLEN_ARQLEN_M;
> -			ccq->reg.len_ena_mask = PF_FW_ARQLEN_ARQENABLE_M;
> -			ccq->reg.head_mask = PF_FW_ARQH_ARQH_M;
> -			break;
> -		default:
> -			break;
> -		}
> -	}
> +	struct libie_ctlq_reg *tx_reg = &cci[LIBIE_CTLQ_TYPE_TX].reg;
> +	struct libie_ctlq_reg *rx_reg = &cci[LIBIE_CTLQ_TYPE_RX].reg;
> +
> +	tx_reg->head		= libie_pci_get_mmio_addr(mmio, PF_FW_ATQH);
> +	tx_reg->tail		= libie_pci_get_mmio_addr(mmio, PF_FW_ATQT);
> +	tx_reg->len		= libie_pci_get_mmio_addr(mmio, PF_FW_ATQLEN);
> +	tx_reg->addr_high	= libie_pci_get_mmio_addr(mmio, PF_FW_ATQBAH);
> +	tx_reg->addr_low	= libie_pci_get_mmio_addr(mmio, PF_FW_ATQBAL);
> +	tx_reg->len_mask	= PF_FW_ATQLEN_ATQLEN_M;
> +	tx_reg->len_ena_mask	= PF_FW_ATQLEN_ATQENABLE_M;
> +	tx_reg->head_mask	= PF_FW_ATQH_ATQH_M;
> +
> +	rx_reg->head		= libie_pci_get_mmio_addr(mmio, PF_FW_ARQH);
> +	rx_reg->tail		= libie_pci_get_mmio_addr(mmio, PF_FW_ARQT);
> +	rx_reg->len		= libie_pci_get_mmio_addr(mmio, PF_FW_ARQLEN);
> +	rx_reg->addr_high	= libie_pci_get_mmio_addr(mmio, PF_FW_ARQBAH);
> +	rx_reg->addr_low	= libie_pci_get_mmio_addr(mmio, PF_FW_ARQBAL);
> +	rx_reg->len_mask	= PF_FW_ARQLEN_ARQLEN_M;
> +	rx_reg->len_ena_mask	= PF_FW_ARQLEN_ARQENABLE_M;
> +	rx_reg->head_mask	= PF_FW_ARQH_ARQH_M;
>  }
>  
>  /**
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> index bb99d9e7c65d..95c45f12b0f9 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
> @@ -225,7 +225,7 @@ static int idpf_add_flow_steer(struct net_device *netdev,
>  	spin_unlock_bh(&vport_config->flow_steer_list_lock);
>  
>  	if (err)
> -		goto out;
> +		goto out_free_fltr;
>  
>  	rule->vport_id = cpu_to_le32(vport->vport_id);
>  	rule->count = cpu_to_le32(1);
> @@ -252,17 +252,15 @@ static int idpf_add_flow_steer(struct net_device *netdev,
>  		break;
>  	default:
>  		err = -EINVAL;
> -		goto out;
> +		goto out_free_fltr;
>  	}
>  
>  	err = idpf_add_del_fsteer_filters(vport->adapter, rule,
>  					  VIRTCHNL2_OP_ADD_FLOW_RULE);
> -	if (err)
> -		goto out;
> -
> -	if (info->status != cpu_to_le32(VIRTCHNL2_FLOW_RULE_SUCCESS)) {
> -		err = -EIO;
> -		goto out;
> +	if (err) {
> +		/* virtchnl2 rule is already consumed */
> +		kfree(fltr);
> +		return err;
>  	}
>  
>  	/* Save a copy of the user's flow spec so ethtool can later retrieve it */
> @@ -274,9 +272,10 @@ static int idpf_add_flow_steer(struct net_device *netdev,
>  
>  	user_config->num_fsteer_fltrs++;
>  	spin_unlock_bh(&vport_config->flow_steer_list_lock);
> -	goto out_free_rule;
>  
> -out:
> +	return 0;
> +
> +out_free_fltr:
>  	kfree(fltr);
>  out_free_rule:
>  	kfree(rule);
> @@ -319,12 +318,7 @@ static int idpf_del_flow_steer(struct net_device *netdev,
>  	err = idpf_add_del_fsteer_filters(vport->adapter, rule,
>  					  VIRTCHNL2_OP_DEL_FLOW_RULE);
>  	if (err)
> -		goto out;
> -
> -	if (info->status != cpu_to_le32(VIRTCHNL2_FLOW_RULE_SUCCESS)) {
> -		err = -EIO;
> -		goto out;
> -	}
> +		return err;
>  
>  	spin_lock_bh(&vport_config->flow_steer_list_lock);
>  	list_for_each_entry_safe(f, iter,
> @@ -340,8 +334,6 @@ static int idpf_del_flow_steer(struct net_device *netdev,
>  
>  out_unlock:
>  	spin_unlock_bh(&vport_config->flow_steer_list_lock);
> -out:
> -	kfree(rule);
>  	return err;
>  }
>  
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c
> index 875472ae77fd..0d131bf0993e 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_lib.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c
> @@ -1371,6 +1371,7 @@ void idpf_statistics_task(struct work_struct *work)
>   */
>  void idpf_mbx_task(struct work_struct *work)
>  {
> +	struct libie_ctlq_xn_recv_params xn_params;
>  	struct idpf_adapter *adapter;
>  
>  	adapter = container_of(work, struct idpf_adapter, mbx_task.work);
> @@ -1381,7 +1382,14 @@ void idpf_mbx_task(struct work_struct *work)
>  		queue_delayed_work(adapter->mbx_wq, &adapter->mbx_task,
>  				   usecs_to_jiffies(300));
>  
> -	idpf_recv_mb_msg(adapter, adapter->hw.arq);
> +	xn_params = (struct libie_ctlq_xn_recv_params) {
> +		.xnm = adapter->xnm,
> +		.ctlq = adapter->arq,
> +		.ctlq_msg_handler = idpf_recv_event_msg,
> +		.budget = LIBIE_CTLQ_MAX_XN_ENTRIES,
> +	};
> +
> +	libie_ctlq_xn_recv(&xn_params);
>  }
>  
>  /**
> @@ -1909,7 +1917,6 @@ static void idpf_init_hard_reset(struct idpf_adapter *adapter)
>  		idpf_vc_core_deinit(adapter);
>  		if (!is_reset)
>  			reg_ops->trigger_reset(adapter, IDPF_HR_FUNC_RESET);
> -		idpf_deinit_dflt_mbx(adapter);
>  	} else {
>  		dev_err(dev, "Unhandled hard reset cause\n");
>  		err = -EBADRQC;
> @@ -1984,7 +1991,7 @@ void idpf_vc_event_task(struct work_struct *work)
>  	return;
>  
>  func_reset:
> -	idpf_vc_xn_shutdown(adapter->vcxn_mngr);
> +	libie_ctlq_xn_shutdown(adapter->xnm);
>  drv_load:
>  	set_bit(IDPF_HR_RESET_IN_PROG, adapter->flags);
>  	idpf_init_hard_reset(adapter);
> @@ -2567,44 +2574,6 @@ static int idpf_set_mac(struct net_device *netdev, void *p)
>  	return err;
>  }
>  
> -/**
> - * idpf_alloc_dma_mem - Allocate dma memory
> - * @hw: pointer to hw struct
> - * @mem: pointer to dma_mem struct
> - * @size: size of the memory to allocate
> - */
> -void *idpf_alloc_dma_mem(struct idpf_hw *hw, struct idpf_dma_mem *mem, u64 size)
> -{
> -	struct idpf_adapter *adapter = hw->back;
> -	size_t sz = ALIGN(size, 4096);
> -
> -	/* The control queue resources are freed under a spinlock, contiguous
> -	 * pages will avoid IOMMU remapping and the use vmap (and vunmap in
> -	 * dma_free_*() path.
> -	 */
> -	mem->va = dma_alloc_attrs(&adapter->pdev->dev, sz, &mem->pa,
> -				  GFP_KERNEL, DMA_ATTR_FORCE_CONTIGUOUS);
> -	mem->size = sz;
> -
> -	return mem->va;
> -}
> -
> -/**
> - * idpf_free_dma_mem - Free the allocated dma memory
> - * @hw: pointer to hw struct
> - * @mem: pointer to dma_mem struct
> - */
> -void idpf_free_dma_mem(struct idpf_hw *hw, struct idpf_dma_mem *mem)
> -{
> -	struct idpf_adapter *adapter = hw->back;
> -
> -	dma_free_attrs(&adapter->pdev->dev, mem->size,
> -		       mem->va, mem->pa, DMA_ATTR_FORCE_CONTIGUOUS);
> -	mem->size = 0;
> -	mem->va = NULL;
> -	mem->pa = 0;
> -}
> -
>  static int idpf_hwtstamp_set(struct net_device *netdev,
>  			     struct kernel_hwtstamp_config *config,
>  			     struct netlink_ext_ack *extack)
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_main.c b/drivers/net/ethernet/intel/idpf/idpf_main.c
> index 93b11fb1609f..db91039c54d0 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_main.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_main.c
> @@ -133,7 +133,6 @@ static void idpf_remove(struct pci_dev *pdev)
>  
>  	/* Be a good citizen and leave the device clean on exit */
>  	adapter->dev_ops.reg_ops.trigger_reset(adapter, IDPF_HR_FUNC_RESET);
> -	idpf_deinit_dflt_mbx(adapter);
>  
>  	if (!adapter->netdevs)
>  		goto destroy_wqs;
> @@ -170,8 +169,6 @@ static void idpf_remove(struct pci_dev *pdev)
>  	adapter->vport_config = NULL;
>  	kfree(adapter->netdevs);
>  	adapter->netdevs = NULL;
> -	kfree(adapter->vcxn_mngr);
> -	adapter->vcxn_mngr = NULL;
>  
>  	mutex_destroy(&adapter->vport_ctrl_lock);
>  	mutex_destroy(&adapter->vector_lock);
> @@ -194,7 +191,6 @@ static void idpf_shutdown(struct pci_dev *pdev)
>  	cancel_delayed_work_sync(&adapter->serv_task);
>  	cancel_delayed_work_sync(&adapter->vc_event_task);
>  	idpf_vc_core_deinit(adapter);
> -	idpf_deinit_dflt_mbx(adapter);
>  
>  	if (system_state == SYSTEM_POWER_OFF)
>  		pci_set_power_state(pdev, PCI_D3hot);
> @@ -239,7 +235,6 @@ static int idpf_cfg_device(struct idpf_adapter *adapter)
>  		pci_dbg(pdev, "PCIe PTM is not supported by PCIe bus/controller\n");
>  
>  	pci_set_drvdata(pdev, adapter);
> -	adapter->hw.back = adapter;
>  
>  	return 0;
>  }
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_mem.h b/drivers/net/ethernet/intel/idpf/idpf_mem.h
> deleted file mode 100644
> index 2aaabdc02dd2..000000000000
> --- a/drivers/net/ethernet/intel/idpf/idpf_mem.h
> +++ /dev/null
> @@ -1,20 +0,0 @@
> -/* SPDX-License-Identifier: GPL-2.0-only */
> -/* Copyright (C) 2023 Intel Corporation */
> -
> -#ifndef _IDPF_MEM_H_
> -#define _IDPF_MEM_H_
> -
> -#include <linux/io.h>
> -
> -struct idpf_dma_mem {
> -	void *va;
> -	dma_addr_t pa;
> -	size_t size;
> -};
> -
> -#define idpf_mbx_wr32(a, reg, value)	writel((value), ((a)->mbx.vaddr + (reg)))
> -#define idpf_mbx_rd32(a, reg)		readl((a)->mbx.vaddr + (reg))
> -#define idpf_mbx_wr64(a, reg, value)	writeq((value), ((a)->mbx.vaddr + (reg)))
> -#define idpf_mbx_rd64(a, reg)		readq((a)->mbx.vaddr + (reg))
> -
> -#endif /* _IDPF_MEM_H_ */
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
> index e101ffb20ae0..a82794c8db3b 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
> +++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
> @@ -236,7 +236,7 @@ enum idpf_tx_ctx_desc_eipt_offload {
>  				 (sizeof(u16) * IDPF_RX_MAX_PTYPE_PROTO_IDS))
>  #define IDPF_RX_PTYPE_HDR_SZ	sizeof(struct virtchnl2_get_ptype_info)
>  #define IDPF_RX_MAX_PTYPES_PER_BUF	\
> -	DIV_ROUND_DOWN_ULL((IDPF_CTLQ_MAX_BUF_LEN - IDPF_RX_PTYPE_HDR_SZ), \
> +	DIV_ROUND_DOWN_ULL(LIBIE_CTLQ_MAX_BUF_LEN - IDPF_RX_PTYPE_HDR_SZ, \
>  			   IDPF_RX_MAX_PTYPE_SZ)
>  
>  #define IDPF_GET_PTYPE_SIZE(p) struct_size((p), proto_id, (p)->proto_id_count)
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c b/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c
> index 98b8f678bd9a..3dafe680b701 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c
> @@ -9,42 +9,32 @@
>  
>  /**
>   * idpf_vf_ctlq_reg_init - initialize default mailbox registers
> - * @adapter: adapter structure
> - * @cq: pointer to the array of create control queues
> + * @mmio: struct that contains MMIO region info
> + * @cci: struct where the register offset pointers are to be copied
>   */
> -static void idpf_vf_ctlq_reg_init(struct idpf_adapter *adapter,
> -				  struct idpf_ctlq_create_info *cq)
> +static void idpf_vf_ctlq_reg_init(struct libie_mmio_info *mmio,
> +				  struct libie_ctlq_create_info *cci)
>  {
> -	for (int i = 0; i < IDPF_NUM_DFLT_MBX_Q; i++) {
> -		struct idpf_ctlq_create_info *ccq = cq + i;
> -
> -		switch (ccq->type) {
> -		case IDPF_CTLQ_TYPE_MAILBOX_TX:
> -			/* set head and tail registers in our local struct */
> -			ccq->reg.head = VF_ATQH;
> -			ccq->reg.tail = VF_ATQT;
> -			ccq->reg.len = VF_ATQLEN;
> -			ccq->reg.bah = VF_ATQBAH;
> -			ccq->reg.bal = VF_ATQBAL;
> -			ccq->reg.len_mask = VF_ATQLEN_ATQLEN_M;
> -			ccq->reg.len_ena_mask = VF_ATQLEN_ATQENABLE_M;
> -			ccq->reg.head_mask = VF_ATQH_ATQH_M;
> -			break;
> -		case IDPF_CTLQ_TYPE_MAILBOX_RX:
> -			/* set head and tail registers in our local struct */
> -			ccq->reg.head = VF_ARQH;
> -			ccq->reg.tail = VF_ARQT;
> -			ccq->reg.len = VF_ARQLEN;
> -			ccq->reg.bah = VF_ARQBAH;
> -			ccq->reg.bal = VF_ARQBAL;
> -			ccq->reg.len_mask = VF_ARQLEN_ARQLEN_M;
> -			ccq->reg.len_ena_mask = VF_ARQLEN_ARQENABLE_M;
> -			ccq->reg.head_mask = VF_ARQH_ARQH_M;
> -			break;
> -		default:
> -			break;
> -		}
> -	}
> +	struct libie_ctlq_reg *tx_reg = &cci[LIBIE_CTLQ_TYPE_TX].reg;
> +	struct libie_ctlq_reg *rx_reg = &cci[LIBIE_CTLQ_TYPE_RX].reg;
> +
> +	tx_reg->head		= libie_pci_get_mmio_addr(mmio, VF_ATQH);
> +	tx_reg->tail		= libie_pci_get_mmio_addr(mmio, VF_ATQT);
> +	tx_reg->len		= libie_pci_get_mmio_addr(mmio, VF_ATQLEN);
> +	tx_reg->addr_high	= libie_pci_get_mmio_addr(mmio, VF_ATQBAH);
> +	tx_reg->addr_low	= libie_pci_get_mmio_addr(mmio, VF_ATQBAL);
> +	tx_reg->len_mask	= VF_ATQLEN_ATQLEN_M;
> +	tx_reg->len_ena_mask	= VF_ATQLEN_ATQENABLE_M;
> +	tx_reg->head_mask	= VF_ATQH_ATQH_M;
> +
> +	rx_reg->head		= libie_pci_get_mmio_addr(mmio, VF_ARQH);
> +	rx_reg->tail		= libie_pci_get_mmio_addr(mmio, VF_ARQT);
> +	rx_reg->len		= libie_pci_get_mmio_addr(mmio, VF_ARQLEN);
> +	rx_reg->addr_high	= libie_pci_get_mmio_addr(mmio, VF_ARQBAH);
> +	rx_reg->addr_low	= libie_pci_get_mmio_addr(mmio, VF_ARQBAL);
> +	rx_reg->len_mask	= VF_ARQLEN_ARQLEN_M;
> +	rx_reg->len_ena_mask	= VF_ARQLEN_ARQENABLE_M;
> +	rx_reg->head_mask	= VF_ARQH_ARQH_M;
>  }
>  
>  /**
> @@ -157,11 +147,13 @@ static void idpf_vf_reset_reg_init(struct idpf_adapter *adapter)
>  static void idpf_vf_trigger_reset(struct idpf_adapter *adapter,
>  				  enum idpf_flags trig_cause)
>  {
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.chnl_opcode	= VIRTCHNL2_OP_RESET_VF,
> +	};
>  	/* Do not send VIRTCHNL2_OP_RESET_VF message on driver unload */
>  	if (trig_cause == IDPF_HR_FUNC_RESET &&
>  	    !test_bit(IDPF_REMOVE_IN_PROG, adapter->flags))
> -		idpf_send_mb_msg(adapter, adapter->hw.asq,
> -				 VIRTCHNL2_OP_RESET_VF, 0, NULL, 0);
> +		idpf_send_mb_msg(adapter, &xn_params, NULL, 0);
>  }
>  
>  /**
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> index 3e6411a07e4d..13c8505d126f 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> @@ -9,20 +9,6 @@
>  #include "idpf_virtchnl.h"
>  #include "idpf_ptp.h"
>  
> -/**
> - * struct idpf_vc_xn_manager - Manager for tracking transactions
> - * @ring: backing and lookup for transactions
> - * @free_xn_bm: bitmap for free transactions
> - * @xn_bm_lock: make bitmap access synchronous where necessary
> - * @salt: used to make cookie unique every message
> - */
> -struct idpf_vc_xn_manager {
> -	struct idpf_vc_xn ring[IDPF_VC_XN_RING_LEN];
> -	DECLARE_BITMAP(free_xn_bm, IDPF_VC_XN_RING_LEN);
> -	spinlock_t xn_bm_lock;
> -	u8 salt;
> -};
> -
>  /**
>   * idpf_vid_to_vport - Translate vport id to vport pointer
>   * @adapter: private data struct
> @@ -83,79 +69,65 @@ static void idpf_handle_event_link(struct idpf_adapter *adapter,
>  
>  /**
>   * idpf_recv_event_msg - Receive virtchnl event message
> - * @adapter: Driver specific private structure
> + * @ctx: control queue context
>   * @ctlq_msg: message to copy from
>   *
>   * Receive virtchnl event message
>   */
> -static void idpf_recv_event_msg(struct idpf_adapter *adapter,
> -				struct idpf_ctlq_msg *ctlq_msg)
> +void idpf_recv_event_msg(struct libie_ctlq_ctx *ctx,
> +			 struct libie_ctlq_msg *ctlq_msg)
>  {
> -	int payload_size = ctlq_msg->ctx.indirect.payload->size;
> +	struct kvec *buff = &ctlq_msg->recv_mem;
> +	int payload_size = buff->iov_len;
> +	struct idpf_adapter *adapter;
>  	struct virtchnl2_event *v2e;
>  	u32 event;
>  
> +	adapter = container_of(ctx, struct idpf_adapter, ctlq_ctx);
>  	if (payload_size < sizeof(*v2e)) {
>  		dev_err_ratelimited(&adapter->pdev->dev, "Failed to receive valid payload for event msg (op %d len %d)\n",
> -				    ctlq_msg->cookie.mbx.chnl_opcode,
> +				    ctlq_msg->chnl_opcode,
>  				    payload_size);
> -		return;
> +		goto free_rx_buf;
>  	}
>  
> -	v2e = (struct virtchnl2_event *)ctlq_msg->ctx.indirect.payload->va;
> +	v2e = (struct virtchnl2_event *)buff->iov_base;
>  	event = le32_to_cpu(v2e->event);
>  
>  	switch (event) {
>  	case VIRTCHNL2_EVENT_LINK_CHANGE:
>  		idpf_handle_event_link(adapter, v2e);
> -		return;
> +		break;
>  	default:
>  		dev_err(&adapter->pdev->dev,
>  			"Unknown event %d from PF\n", event);
>  		break;
>  	}
> +
> +free_rx_buf:
> +	libie_ctlq_release_rx_buf(buff);
>  }
>  
>  /**
>   * idpf_mb_clean - Reclaim the send mailbox queue entries
>   * @adapter: driver specific private structure
>   * @asq: send control queue info
> + * @deinit: release all buffers before destroying the queue
>   *
> - * Reclaim the send mailbox queue entries to be used to send further messages
> - *
> - * Return: 0 on success, negative on failure
> + * This is a helper function to clean the send mailbox queue entries.
>   */
> -static int idpf_mb_clean(struct idpf_adapter *adapter,
> -			 struct idpf_ctlq_info *asq)
> +static void idpf_mb_clean(struct idpf_adapter *adapter,
> +			  struct libie_ctlq_info *asq, bool deinit)
>  {
> -	u16 i, num_q_msg = IDPF_DFLT_MBX_Q_LEN;
> -	struct idpf_ctlq_msg **q_msg;
> -	struct idpf_dma_mem *dma_mem;
> -	int err;
> -
> -	q_msg = kzalloc_objs(struct idpf_ctlq_msg *, num_q_msg, GFP_ATOMIC);
> -	if (!q_msg)
> -		return -ENOMEM;
> -
> -	err = idpf_ctlq_clean_sq(asq, &num_q_msg, q_msg);
> -	if (err)
> -		goto err_kfree;
> -
> -	for (i = 0; i < num_q_msg; i++) {
> -		if (!q_msg[i])
> -			continue;
> -		dma_mem = q_msg[i]->ctx.indirect.payload;
> -		if (dma_mem)
> -			dma_free_coherent(&adapter->pdev->dev, dma_mem->size,
> -					  dma_mem->va, dma_mem->pa);
> -		kfree(q_msg[i]);
> -		kfree(dma_mem);
> -	}
> -
> -err_kfree:
> -	kfree(q_msg);
> +	struct libie_ctlq_xn_clean_params clean_params = {
> +		.ctx		= &adapter->ctlq_ctx,
> +		.ctlq		= asq,
> +		.rel_tx_buf	= kfree,
> +		.num_msgs	= IDPF_DFLT_MBX_Q_LEN,
> +		.force		= deinit,
> +	};
>  
> -	return err;
> +	libie_ctlq_xn_send_clean(&clean_params);
>  }
>  
>  #if IS_ENABLED(CONFIG_PTP_1588_CLOCK)
> @@ -189,7 +161,7 @@ static bool idpf_ptp_is_mb_msg(u32 op)
>   * @ctlq_msg: Corresponding control queue message
>   */
>  static void idpf_prepare_ptp_mb_msg(struct idpf_adapter *adapter, u32 op,
> -				    struct idpf_ctlq_msg *ctlq_msg)
> +				    struct libie_ctlq_msg *ctlq_msg)
>  {
>  	/* If the message is PTP-related and the secondary mailbox is available,
>  	 * send the message through the secondary mailbox.
> @@ -197,528 +169,83 @@ static void idpf_prepare_ptp_mb_msg(struct idpf_adapter *adapter, u32 op,
>  	if (!idpf_ptp_is_mb_msg(op) || !adapter->ptp->secondary_mbx.valid)
>  		return;
>  
> -	ctlq_msg->opcode = idpf_mbq_opc_send_msg_to_peer_drv;
> +	ctlq_msg->opcode = LIBIE_CTLQ_SEND_MSG_TO_PEER;
>  	ctlq_msg->func_id = adapter->ptp->secondary_mbx.peer_mbx_q_id;
> -	ctlq_msg->host_id = adapter->ptp->secondary_mbx.peer_id;
> +	ctlq_msg->flags = FIELD_PREP(LIBIE_CTLQ_DESC_FLAG_HOST_ID,
> +				     adapter->ptp->secondary_mbx.peer_id);
>  }
>  #else /* !CONFIG_PTP_1588_CLOCK */
>  static void idpf_prepare_ptp_mb_msg(struct idpf_adapter *adapter, u32 op,
> -				    struct idpf_ctlq_msg *ctlq_msg)
> +				    struct libie_ctlq_msg *ctlq_msg)
>  { }
>  #endif /* CONFIG_PTP_1588_CLOCK */
>  
>  /**
> - * idpf_send_mb_msg - Send message over mailbox
> + * idpf_send_mb_msg - send mailbox message to the device control plane
>   * @adapter: driver specific private structure
> - * @asq: control queue to send message to
> - * @op: virtchnl opcode
> - * @msg_size: size of the payload
> - * @msg: pointer to buffer holding the payload
> - * @cookie: unique SW generated cookie per message
> - *
> - * Will prepare the control queue message and initiates the send api
> - *
> - * Return: 0 on success, negative on failure
> - */
> -int idpf_send_mb_msg(struct idpf_adapter *adapter, struct idpf_ctlq_info *asq,
> -		     u32 op, u16 msg_size, u8 *msg, u16 cookie)
> -{
> -	struct idpf_ctlq_msg *ctlq_msg;
> -	struct idpf_dma_mem *dma_mem;
> -	int err;
> -
> -	/* If we are here and a reset is detected nothing much can be
> -	 * done. This thread should silently abort and expected to
> -	 * be corrected with a new run either by user or driver
> -	 * flows after reset
> -	 */
> -	if (idpf_is_reset_detected(adapter))
> -		return 0;
> -
> -	err = idpf_mb_clean(adapter, asq);
> -	if (err)
> -		return err;
> -
> -	ctlq_msg = kzalloc_obj(*ctlq_msg, GFP_ATOMIC);
> -	if (!ctlq_msg)
> -		return -ENOMEM;
> -
> -	dma_mem = kzalloc_obj(*dma_mem, GFP_ATOMIC);
> -	if (!dma_mem) {
> -		err = -ENOMEM;
> -		goto dma_mem_error;
> -	}
> -
> -	ctlq_msg->opcode = idpf_mbq_opc_send_msg_to_cp;
> -	ctlq_msg->func_id = 0;
> -
> -	idpf_prepare_ptp_mb_msg(adapter, op, ctlq_msg);
> -
> -	ctlq_msg->data_len = msg_size;
> -	ctlq_msg->cookie.mbx.chnl_opcode = op;
> -	ctlq_msg->cookie.mbx.chnl_retval = 0;
> -	dma_mem->size = IDPF_CTLQ_MAX_BUF_LEN;
> -	dma_mem->va = dma_alloc_coherent(&adapter->pdev->dev, dma_mem->size,
> -					 &dma_mem->pa, GFP_ATOMIC);
> -	if (!dma_mem->va) {
> -		err = -ENOMEM;
> -		goto dma_alloc_error;
> -	}
> -
> -	/* It's possible we're just sending an opcode but no buffer */
> -	if (msg && msg_size)
> -		memcpy(dma_mem->va, msg, msg_size);
> -	ctlq_msg->ctx.indirect.payload = dma_mem;
> -	ctlq_msg->ctx.sw_cookie.data = cookie;
> -
> -	err = idpf_ctlq_send(&adapter->hw, asq, 1, ctlq_msg);
> -	if (err)
> -		goto send_error;
> -
> -	return 0;
> -
> -send_error:
> -	dma_free_coherent(&adapter->pdev->dev, dma_mem->size, dma_mem->va,
> -			  dma_mem->pa);
> -dma_alloc_error:
> -	kfree(dma_mem);
> -dma_mem_error:
> -	kfree(ctlq_msg);
> -
> -	return err;
> -}
> -
> -/* API for virtchnl "transaction" support ("xn" for short). */
> -
> -/**
> - * idpf_vc_xn_lock - Request exclusive access to vc transaction
> - * @xn: struct idpf_vc_xn* to access
> - */
> -#define idpf_vc_xn_lock(xn)			\
> -	spin_lock(&(xn)->lock)
> -
> -/**
> - * idpf_vc_xn_unlock - Release exclusive access to vc transaction
> - * @xn: struct idpf_vc_xn* to access
> - */
> -#define idpf_vc_xn_unlock(xn)		\
> -	spin_unlock(&(xn)->lock)
> -
> -/**
> - * idpf_vc_xn_release_bufs - Release reference to reply buffer(s) and
> - * reset the transaction state.
> - * @xn: struct idpf_vc_xn to update
> - */
> -static void idpf_vc_xn_release_bufs(struct idpf_vc_xn *xn)
> -{
> -	xn->reply.iov_base = NULL;
> -	xn->reply.iov_len = 0;
> -
> -	if (xn->state != IDPF_VC_XN_SHUTDOWN)
> -		xn->state = IDPF_VC_XN_IDLE;
> -}
> -
> -/**
> - * idpf_vc_xn_init - Initialize virtchnl transaction object
> - * @vcxn_mngr: pointer to vc transaction manager struct
> - */
> -static void idpf_vc_xn_init(struct idpf_vc_xn_manager *vcxn_mngr)
> -{
> -	int i;
> -
> -	spin_lock_init(&vcxn_mngr->xn_bm_lock);
> -
> -	for (i = 0; i < ARRAY_SIZE(vcxn_mngr->ring); i++) {
> -		struct idpf_vc_xn *xn = &vcxn_mngr->ring[i];
> -
> -		xn->state = IDPF_VC_XN_IDLE;
> -		xn->idx = i;
> -		idpf_vc_xn_release_bufs(xn);
> -		spin_lock_init(&xn->lock);
> -		init_completion(&xn->completed);
> -	}
> -
> -	bitmap_fill(vcxn_mngr->free_xn_bm, IDPF_VC_XN_RING_LEN);
> -}
> -
> -/**
> - * idpf_vc_xn_shutdown - Uninitialize virtchnl transaction object
> - * @vcxn_mngr: pointer to vc transaction manager struct
> + * @xn_params: Xn send parameters to fill
> + * @send_buf: buffer to send
> + * @send_buf_size: size of the send buffer
>   *
> - * All waiting threads will be woken-up and their transaction aborted. Further
> - * operations on that object will fail.
> - */
> -void idpf_vc_xn_shutdown(struct idpf_vc_xn_manager *vcxn_mngr)
> -{
> -	int i;
> -
> -	spin_lock_bh(&vcxn_mngr->xn_bm_lock);
> -	bitmap_zero(vcxn_mngr->free_xn_bm, IDPF_VC_XN_RING_LEN);
> -	spin_unlock_bh(&vcxn_mngr->xn_bm_lock);
> -
> -	for (i = 0; i < ARRAY_SIZE(vcxn_mngr->ring); i++) {
> -		struct idpf_vc_xn *xn = &vcxn_mngr->ring[i];
> -
> -		idpf_vc_xn_lock(xn);
> -		xn->state = IDPF_VC_XN_SHUTDOWN;
> -		idpf_vc_xn_release_bufs(xn);
> -		idpf_vc_xn_unlock(xn);
> -		complete_all(&xn->completed);
> -	}
> -}
> -
> -/**
> - * idpf_vc_xn_pop_free - Pop a free transaction from free list
> - * @vcxn_mngr: transaction manager to pop from
> + * Fill the Xn parameters with the required info to send a virtchnl message.
> + * The send buffer is DMA mapped in libie to avoid memcpy.
>   *
> - * Returns NULL if no free transactions
> - */
> -static
> -struct idpf_vc_xn *idpf_vc_xn_pop_free(struct idpf_vc_xn_manager *vcxn_mngr)
> -{
> -	struct idpf_vc_xn *xn = NULL;
> -	unsigned long free_idx;
> -
> -	spin_lock_bh(&vcxn_mngr->xn_bm_lock);
> -	free_idx = find_first_bit(vcxn_mngr->free_xn_bm, IDPF_VC_XN_RING_LEN);
> -	if (free_idx == IDPF_VC_XN_RING_LEN)
> -		goto do_unlock;
> -
> -	clear_bit(free_idx, vcxn_mngr->free_xn_bm);
> -	xn = &vcxn_mngr->ring[free_idx];
> -	xn->salt = vcxn_mngr->salt++;
> -
> -do_unlock:
> -	spin_unlock_bh(&vcxn_mngr->xn_bm_lock);
> -
> -	return xn;
> -}
> -
> -/**
> - * idpf_vc_xn_push_free - Push a free transaction to free list
> - * @vcxn_mngr: transaction manager to push to
> - * @xn: transaction to push
> - */
> -static void idpf_vc_xn_push_free(struct idpf_vc_xn_manager *vcxn_mngr,
> -				 struct idpf_vc_xn *xn)
> -{
> -	idpf_vc_xn_release_bufs(xn);
> -	spin_lock_bh(&vcxn_mngr->xn_bm_lock);
> -	set_bit(xn->idx, vcxn_mngr->free_xn_bm);
> -	spin_unlock_bh(&vcxn_mngr->xn_bm_lock);
> -}
> -
> -/**
> - * idpf_vc_xn_exec - Perform a send/recv virtchnl transaction
> - * @adapter: driver specific private structure with vcxn_mngr
> - * @params: parameters for this particular transaction including
> - *   -vc_op: virtchannel operation to send
> - *   -send_buf: kvec iov for send buf and len
> - *   -recv_buf: kvec iov for recv buf and len (ignored if NULL)
> - *   -timeout_ms: timeout waiting for a reply (milliseconds)
> - *   -async: don't wait for message reply, will lose caller context
> - *   -async_handler: callback to handle async replies
> - *
> - * @returns >= 0 for success, the size of the initial reply (may or may not be
> - * >= @recv_buf.iov_len, but we never overflow @@recv_buf_iov_base). < 0 for
> - * error.
> - */
> -ssize_t idpf_vc_xn_exec(struct idpf_adapter *adapter,
> -			const struct idpf_vc_xn_params *params)
> -{
> -	const struct kvec *send_buf = &params->send_buf;
> -	struct idpf_vc_xn *xn;
> -	ssize_t retval;
> -	u16 cookie;
> -
> -	xn = idpf_vc_xn_pop_free(adapter->vcxn_mngr);
> -	/* no free transactions available */
> -	if (!xn)
> -		return -ENOSPC;
> -
> -	idpf_vc_xn_lock(xn);
> -	if (xn->state == IDPF_VC_XN_SHUTDOWN) {
> -		retval = -ENXIO;
> -		goto only_unlock;
> -	} else if (xn->state != IDPF_VC_XN_IDLE) {
> -		/* We're just going to clobber this transaction even though
> -		 * it's not IDLE. If we don't reuse it we could theoretically
> -		 * eventually leak all the free transactions and not be able to
> -		 * send any messages. At least this way we make an attempt to
> -		 * remain functional even though something really bad is
> -		 * happening that's corrupting what was supposed to be free
> -		 * transactions.
> -		 */
> -		WARN_ONCE(1, "There should only be idle transactions in free list (idx %d op %d)\n",
> -			  xn->idx, xn->vc_op);
> -	}
> -
> -	xn->reply = params->recv_buf;
> -	xn->reply_sz = 0;
> -	xn->state = params->async ? IDPF_VC_XN_ASYNC : IDPF_VC_XN_WAITING;
> -	xn->vc_op = params->vc_op;
> -	xn->async_handler = params->async_handler;
> -	idpf_vc_xn_unlock(xn);
> -
> -	if (!params->async)
> -		reinit_completion(&xn->completed);
> -	cookie = FIELD_PREP(IDPF_VC_XN_SALT_M, xn->salt) |
> -		 FIELD_PREP(IDPF_VC_XN_IDX_M, xn->idx);
> -
> -	retval = idpf_send_mb_msg(adapter, adapter->hw.asq, params->vc_op,
> -				  send_buf->iov_len, send_buf->iov_base,
> -				  cookie);
> -	if (retval) {
> -		idpf_vc_xn_lock(xn);
> -		goto release_and_unlock;
> -	}
> -
> -	if (params->async)
> -		return 0;
> -
> -	wait_for_completion_timeout(&xn->completed,
> -				    msecs_to_jiffies(params->timeout_ms));
> -
> -	/* No need to check the return value; we check the final state of the
> -	 * transaction below. It's possible the transaction actually gets more
> -	 * timeout than specified if we get preempted here but after
> -	 * wait_for_completion_timeout returns. This should be non-issue
> -	 * however.
> -	 */
> -	idpf_vc_xn_lock(xn);
> -	switch (xn->state) {
> -	case IDPF_VC_XN_SHUTDOWN:
> -		retval = -ENXIO;
> -		goto only_unlock;
> -	case IDPF_VC_XN_WAITING:
> -		dev_notice_ratelimited(&adapter->pdev->dev,
> -				       "Transaction timed-out (op:%d cookie:%04x vc_op:%d salt:%02x timeout:%dms)\n",
> -				       params->vc_op, cookie, xn->vc_op,
> -				       xn->salt, params->timeout_ms);
> -		retval = -ETIME;
> -		break;
> -	case IDPF_VC_XN_COMPLETED_SUCCESS:
> -		retval = xn->reply_sz;
> -		break;
> -	case IDPF_VC_XN_COMPLETED_FAILED:
> -		dev_notice_ratelimited(&adapter->pdev->dev, "Transaction failed (op %d)\n",
> -				       params->vc_op);
> -		retval = -EIO;
> -		break;
> -	default:
> -		/* Invalid state. */
> -		WARN_ON_ONCE(1);
> -		retval = -EIO;
> -		break;
> -	}
> -
> -release_and_unlock:
> -	idpf_vc_xn_push_free(adapter->vcxn_mngr, xn);
> -	/* If we receive a VC reply after here, it will be dropped. */
> -only_unlock:
> -	idpf_vc_xn_unlock(xn);
> -
> -	return retval;
> -}
> -
> -/**
> - * idpf_vc_xn_forward_async - Handle async reply receives
> - * @adapter: private data struct
> - * @xn: transaction to handle
> - * @ctlq_msg: corresponding ctlq_msg
> + * Clean up the mailbox queue entries of the previously sent message to
> + * unmap and release the buffer.
>   *
> - * For async sends we're going to lose the caller's context so, if an
> - * async_handler was provided, it can deal with the reply, otherwise we'll just
> - * check and report if there is an error.
> + * Return: 0 if the request was successful, -%EBUSY if reset is detected
> + *	   or Tx control queue is full, other negative error code on failure.
>   */
> -static int
> -idpf_vc_xn_forward_async(struct idpf_adapter *adapter, struct idpf_vc_xn *xn,
> -			 const struct idpf_ctlq_msg *ctlq_msg)
> +int idpf_send_mb_msg(struct idpf_adapter *adapter,
> +		     struct libie_ctlq_xn_send_params *xn_params,
> +		     void *send_buf, size_t send_buf_size)
>  {
> -	int err = 0;
> -
> -	if (ctlq_msg->cookie.mbx.chnl_opcode != xn->vc_op) {
> -		dev_err_ratelimited(&adapter->pdev->dev, "Async message opcode does not match transaction opcode (msg: %d) (xn: %d)\n",
> -				    ctlq_msg->cookie.mbx.chnl_opcode, xn->vc_op);
> -		xn->reply_sz = 0;
> -		err = -EINVAL;
> -		goto release_bufs;
> -	}
> -
> -	if (xn->async_handler) {
> -		err = xn->async_handler(adapter, xn, ctlq_msg);
> -		goto release_bufs;
> -	}
> -
> -	if (ctlq_msg->cookie.mbx.chnl_retval) {
> -		xn->reply_sz = 0;
> -		dev_err_ratelimited(&adapter->pdev->dev, "Async message failure (op %d)\n",
> -				    ctlq_msg->cookie.mbx.chnl_opcode);
> -		err = -EINVAL;
> -	}
> -
> -release_bufs:
> -	idpf_vc_xn_push_free(adapter->vcxn_mngr, xn);
> -
> -	return err;
> -}
> -
> -/**
> - * idpf_vc_xn_forward_reply - copy a reply back to receiving thread
> - * @adapter: driver specific private structure with vcxn_mngr
> - * @ctlq_msg: controlq message to send back to receiving thread
> - */
> -static int
> -idpf_vc_xn_forward_reply(struct idpf_adapter *adapter,
> -			 const struct idpf_ctlq_msg *ctlq_msg)
> -{
> -	const void *payload = NULL;
> -	size_t payload_size = 0;
> -	struct idpf_vc_xn *xn;
> -	u16 msg_info;
> -	int err = 0;
> -	u16 xn_idx;
> -	u16 salt;
> -
> -	msg_info = ctlq_msg->ctx.sw_cookie.data;
> -	xn_idx = FIELD_GET(IDPF_VC_XN_IDX_M, msg_info);
> -	if (xn_idx >= ARRAY_SIZE(adapter->vcxn_mngr->ring)) {
> -		dev_err_ratelimited(&adapter->pdev->dev, "Out of bounds cookie received: %02x\n",
> -				    xn_idx);
> -		return -EINVAL;
> -	}
> -	xn = &adapter->vcxn_mngr->ring[xn_idx];
> -	idpf_vc_xn_lock(xn);
> -	salt = FIELD_GET(IDPF_VC_XN_SALT_M, msg_info);
> -	if (xn->salt != salt) {
> -		dev_err_ratelimited(&adapter->pdev->dev, "Transaction salt does not match (exp:%d@%02x(%d) != got:%d@%02x)\n",
> -				    xn->vc_op, xn->salt, xn->state,
> -				    ctlq_msg->cookie.mbx.chnl_opcode, salt);
> -		idpf_vc_xn_unlock(xn);
> -		return -EINVAL;
> -	}
> -
> -	switch (xn->state) {
> -	case IDPF_VC_XN_WAITING:
> -		/* success */
> -		break;
> -	case IDPF_VC_XN_IDLE:
> -		dev_err_ratelimited(&adapter->pdev->dev, "Unexpected or belated VC reply (op %d)\n",
> -				    ctlq_msg->cookie.mbx.chnl_opcode);
> -		err = -EINVAL;
> -		goto out_unlock;
> -	case IDPF_VC_XN_SHUTDOWN:
> -		/* ENXIO is a bit special here as the recv msg loop uses that
> -		 * know if it should stop trying to clean the ring if we lost
> -		 * the virtchnl. We need to stop playing with registers and
> -		 * yield.
> -		 */
> -		err = -ENXIO;
> -		goto out_unlock;
> -	case IDPF_VC_XN_ASYNC:
> -		/* Set reply_sz from the actual payload so that async_handler
> -		 * can evaluate the response.
> -		 */
> -		xn->reply_sz = ctlq_msg->data_len;
> -		err = idpf_vc_xn_forward_async(adapter, xn, ctlq_msg);
> -		idpf_vc_xn_unlock(xn);
> -		return err;
> -	default:
> -		dev_err_ratelimited(&adapter->pdev->dev, "Overwriting VC reply (op %d)\n",
> -				    ctlq_msg->cookie.mbx.chnl_opcode);
> -		err = -EBUSY;
> -		goto out_unlock;
> -	}
> -
> -	if (ctlq_msg->cookie.mbx.chnl_opcode != xn->vc_op) {
> -		dev_err_ratelimited(&adapter->pdev->dev, "Message opcode does not match transaction opcode (msg: %d) (xn: %d)\n",
> -				    ctlq_msg->cookie.mbx.chnl_opcode, xn->vc_op);
> -		xn->reply_sz = 0;
> -		xn->state = IDPF_VC_XN_COMPLETED_FAILED;
> -		err = -EINVAL;
> -		goto out_unlock;
> -	}
> +	struct libie_ctlq_msg ctlq_msg = {};
>  
> -	if (ctlq_msg->cookie.mbx.chnl_retval) {
> -		xn->reply_sz = 0;
> -		xn->state = IDPF_VC_XN_COMPLETED_FAILED;
> -		err = -EINVAL;
> -		goto out_unlock;
> -	}
> +	if (idpf_is_reset_detected(adapter)) {
> +		if (!libie_cp_can_send_onstack(send_buf_size))
> +			kfree(send_buf);
>  
> -	if (ctlq_msg->data_len) {
> -		payload = ctlq_msg->ctx.indirect.payload->va;
> -		payload_size = ctlq_msg->data_len;
> +		return -EBUSY;
>  	}
>  
> -	xn->reply_sz = payload_size;
> -	xn->state = IDPF_VC_XN_COMPLETED_SUCCESS;
> +	idpf_prepare_ptp_mb_msg(adapter, xn_params->chnl_opcode, &ctlq_msg);
> +	xn_params->ctlq_msg = ctlq_msg.opcode ? &ctlq_msg : NULL;
>  
> -	if (xn->reply.iov_base && xn->reply.iov_len && payload_size)
> -		memcpy(xn->reply.iov_base, payload,
> -		       min_t(size_t, xn->reply.iov_len, payload_size));
> +	xn_params->send_buf.iov_base = send_buf;
> +	xn_params->send_buf.iov_len = send_buf_size;
> +	xn_params->xnm = adapter->xnm;
> +	xn_params->ctlq = xn_params->ctlq ? xn_params->ctlq : adapter->asq;
> +	xn_params->rel_tx_buf = kfree;
>  
> -out_unlock:
> -	idpf_vc_xn_unlock(xn);
> -	/* we _cannot_ hold lock while calling complete */
> -	complete(&xn->completed);
> +	idpf_mb_clean(adapter, xn_params->ctlq, false);
>  
> -	return err;
> +	return libie_ctlq_xn_send(xn_params);
>  }
>  
>  /**
> - * idpf_recv_mb_msg - Receive message over mailbox
> + * idpf_send_mb_msg_kfree - send mailbox message and free the send buffer
>   * @adapter: driver specific private structure
> - * @arq: control queue to receive message from
> + * @xn_params: Xn send parameters to fill
> + * @send_buf: buffer to send, can be released with kfree()
> + * @send_buf_size: size of the send buffer
>   *
> - * Will receive control queue message and posts the receive buffer.
> + * libie_cp functions consume only buffers above a certain size;
> + * smaller buffers are assumed to be on the stack. However, for some
> + * commands with variable message size it makes sense to always use kzalloc(),
> + * which means we have to free smaller buffers ourselves.
>   *
> - * Return: 0 on success and negative on failure.
> + * Return: 0 if no unexpected errors were encountered,
> + *	   negative error code otherwise.
>   */
> -int idpf_recv_mb_msg(struct idpf_adapter *adapter, struct idpf_ctlq_info *arq)
> +int idpf_send_mb_msg_kfree(struct idpf_adapter *adapter,
> +			   struct libie_ctlq_xn_send_params *xn_params,
> +			   void *send_buf, size_t send_buf_size)
>  {
> -	struct idpf_ctlq_msg ctlq_msg;
> -	struct idpf_dma_mem *dma_mem;
> -	int post_err, err;
> -	u16 num_recv;
> -
> -	while (1) {
> -		/* This will get <= num_recv messages and output how many
> -		 * actually received on num_recv.
> -		 */
> -		num_recv = 1;
> -		err = idpf_ctlq_recv(arq, &num_recv, &ctlq_msg);
> -		if (err || !num_recv)
> -			break;
> -
> -		if (ctlq_msg.data_len) {
> -			dma_mem = ctlq_msg.ctx.indirect.payload;
> -		} else {
> -			dma_mem = NULL;
> -			num_recv = 0;
> -		}
> +	int err = idpf_send_mb_msg(adapter, xn_params, send_buf, send_buf_size);
>  
> -		if (ctlq_msg.cookie.mbx.chnl_opcode == VIRTCHNL2_OP_EVENT)
> -			idpf_recv_event_msg(adapter, &ctlq_msg);
> -		else
> -			err = idpf_vc_xn_forward_reply(adapter, &ctlq_msg);
> -
> -		post_err = idpf_ctlq_post_rx_buffs(&adapter->hw, arq,
> -						   &num_recv, &dma_mem);
> -
> -		/* If post failed clear the only buffer we supplied */
> -		if (post_err) {
> -			if (dma_mem)
> -				dma_free_coherent(&adapter->pdev->dev,
> -						  dma_mem->size, dma_mem->va,
> -						  dma_mem->pa);
> -			break;
> -		}
> -
> -		/* virtchnl trying to shutdown, stop cleaning */
> -		if (err == -ENXIO)
> -			break;
> -	}
> +	if (libie_cp_can_send_onstack(send_buf_size))
> +		kfree(send_buf);
>  
>  	return err;
>  }
> @@ -768,45 +295,43 @@ struct idpf_queue_set *idpf_alloc_queue_set(struct idpf_adapter *adapter,
>  static int idpf_send_chunked_msg(struct idpf_adapter *adapter,
>  				 const struct idpf_chunked_msg_params *params)
>  {
> -	struct idpf_vc_xn_params xn_params = {
> -		.vc_op		= params->vc_op,
> +	struct libie_ctlq_xn_send_params xn_params = {
>  		.timeout_ms	= IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +		.chnl_opcode	= params->vc_op,
>  	};
>  	const void *pos = params->chunks;
> -	u32 num_chunks, num_msgs, buf_sz;
> -	void *buf __free(kfree) = NULL;
>  	u32 totqs = params->num_chunks;
>  	u32 vid = params->vport_id;
> +	u32 num_chunks, num_msgs;
>  
> -	num_chunks = min(IDPF_NUM_CHUNKS_PER_MSG(params->config_sz,
> -						 params->chunk_sz), totqs);
> +	num_chunks = IDPF_NUM_CHUNKS_PER_MSG(params->config_sz,
> +					     params->chunk_sz);
>  	num_msgs = DIV_ROUND_UP(totqs, num_chunks);
>  
> -	buf_sz = params->config_sz + num_chunks * params->chunk_sz;
> -	buf = kzalloc(buf_sz, GFP_KERNEL);
> -	if (!buf)
> -		return -ENOMEM;
> -
> -	xn_params.send_buf.iov_base = buf;
> -
>  	for (u32 i = 0; i < num_msgs; i++) {
> -		ssize_t reply_sz;
> +		u32 buf_sz;
> +		void *buf;
> +		int err;
>  
> -		memset(buf, 0, buf_sz);
> -		xn_params.send_buf.iov_len = buf_sz;
> +		num_chunks = min(num_chunks, totqs);
> +		buf_sz = params->config_sz + num_chunks * params->chunk_sz;
> +		buf = kzalloc(buf_sz, GFP_KERNEL);
> +		if (!buf)
> +			return -ENOMEM;
>  
> -		if (params->prepare_msg(vid, buf, pos, num_chunks) != buf_sz)
> +		if (params->prepare_msg(vid, buf, pos, num_chunks) != buf_sz) {
> +			kfree(buf);
>  			return -EINVAL;
> +		}
>  
> -		reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -		if (reply_sz < 0)
> -			return reply_sz;
> +		err = idpf_send_mb_msg_kfree(adapter, &xn_params, buf, buf_sz);
> +		if (err)
> +			return err;
>  
> +		libie_ctlq_release_rx_buf(&xn_params.recv_mem);
> +		xn_params.recv_mem = (struct kvec) {};
>  		pos += num_chunks * params->chunk_sz;
>  		totqs -= num_chunks;
> -
> -		num_chunks = min(num_chunks, totqs);
> -		buf_sz = params->config_sz + num_chunks * params->chunk_sz;
>  	}
>  
>  	return 0;
> @@ -881,11 +406,14 @@ static int idpf_wait_for_marker_event(struct idpf_vport *vport)
>   */
>  static int idpf_send_ver_msg(struct idpf_adapter *adapter)
>  {
> -	struct idpf_vc_xn_params xn_params = {};
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.timeout_ms	= IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +		.chnl_opcode	= VIRTCHNL2_OP_VERSION,
> +	};
> +	struct virtchnl2_version_info *vvi_recv;
>  	struct virtchnl2_version_info vvi;
> -	ssize_t reply_sz;
>  	u32 major, minor;
> -	int err = 0;
> +	int err;
>  
>  	if (adapter->virt_ver_maj) {
>  		vvi.major = cpu_to_le32(adapter->virt_ver_maj);
> @@ -895,24 +423,23 @@ static int idpf_send_ver_msg(struct idpf_adapter *adapter)
>  		vvi.minor = cpu_to_le32(IDPF_VIRTCHNL_VERSION_MINOR);
>  	}
>  
> -	xn_params.vc_op = VIRTCHNL2_OP_VERSION;
> -	xn_params.send_buf.iov_base = &vvi;
> -	xn_params.send_buf.iov_len = sizeof(vvi);
> -	xn_params.recv_buf = xn_params.send_buf;
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> +	err = idpf_send_mb_msg(adapter, &xn_params, &vvi, sizeof(vvi));
> +	if (err)
> +		return err;
>  
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -	if (reply_sz < 0)
> -		return reply_sz;
> -	if (reply_sz < sizeof(vvi))
> -		return -EIO;
> +	if (xn_params.recv_mem.iov_len < sizeof(*vvi_recv)) {
> +		err = -EIO;
> +		goto free_rx_buf;
> +	}
>  
> -	major = le32_to_cpu(vvi.major);
> -	minor = le32_to_cpu(vvi.minor);
> +	vvi_recv = xn_params.recv_mem.iov_base;
> +	major = le32_to_cpu(vvi_recv->major);
> +	minor = le32_to_cpu(vvi_recv->minor);
>  
>  	if (major > IDPF_VIRTCHNL_VERSION_MAJOR) {
>  		dev_warn(&adapter->pdev->dev, "Virtchnl major version greater than supported\n");
> -		return -EINVAL;
> +		err = -EINVAL;
> +		goto free_rx_buf;
>  	}
>  
>  	if (major == IDPF_VIRTCHNL_VERSION_MAJOR &&
> @@ -930,6 +457,9 @@ static int idpf_send_ver_msg(struct idpf_adapter *adapter)
>  	adapter->virt_ver_maj = major;
>  	adapter->virt_ver_min = minor;
>  
> +free_rx_buf:
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
> +
>  	return err;
>  }
>  
> @@ -942,9 +472,12 @@ static int idpf_send_ver_msg(struct idpf_adapter *adapter)
>   */
>  static int idpf_send_get_caps_msg(struct idpf_adapter *adapter)
>  {
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.timeout_ms	= IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +		.chnl_opcode	= VIRTCHNL2_OP_GET_CAPS,
> +	};
>  	struct virtchnl2_get_capabilities caps = {};
> -	struct idpf_vc_xn_params xn_params = {};
> -	ssize_t reply_sz;
> +	int err;
>  
>  	caps.csum_caps =
>  		cpu_to_le32(VIRTCHNL2_CAP_TX_CSUM_L3_IPV4	|
> @@ -1004,20 +537,22 @@ static int idpf_send_get_caps_msg(struct idpf_adapter *adapter)
>  			    VIRTCHNL2_CAP_LOOPBACK		|
>  			    VIRTCHNL2_CAP_PTP);
>  
> -	xn_params.vc_op = VIRTCHNL2_OP_GET_CAPS;
> -	xn_params.send_buf.iov_base = &caps;
> -	xn_params.send_buf.iov_len = sizeof(caps);
> -	xn_params.recv_buf.iov_base = &adapter->caps;
> -	xn_params.recv_buf.iov_len = sizeof(adapter->caps);
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> +	err = idpf_send_mb_msg(adapter, &xn_params, &caps, sizeof(caps));
> +	if (err)
> +		return err;
>  
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -	if (reply_sz < 0)
> -		return reply_sz;
> -	if (reply_sz < sizeof(adapter->caps))
> -		return -EIO;
> +	if (xn_params.recv_mem.iov_len < sizeof(adapter->caps)) {
> +		err = -EIO;
> +		goto free_rx_buf;
> +	}
>  
> -	return 0;
> +	memcpy(&adapter->caps, xn_params.recv_mem.iov_base,
> +	       sizeof(adapter->caps));
> +
> +free_rx_buf:
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
> +
> +	return err;
>  }
>  
>  /**
> @@ -1062,37 +597,39 @@ static void idpf_decfg_lan_memory_regions(struct idpf_adapter *adapter)
>   */
>  static int idpf_cfg_lan_memory_regions(struct idpf_adapter *adapter)
>  {
> -	struct virtchnl2_get_lan_memory_regions *rcvd_regions __free(kfree);
> -	struct idpf_vc_xn_params xn_params = {
> -		.vc_op = VIRTCHNL2_OP_GET_LAN_MEMORY_REGIONS,
> -		.recv_buf.iov_len = IDPF_CTLQ_MAX_BUF_LEN,
> -		.send_buf.iov_len =
> -			sizeof(struct virtchnl2_get_lan_memory_regions) +
> -			sizeof(struct virtchnl2_mem_region),
> +	struct virtchnl2_get_lan_memory_regions *send_regions, *rcvd_regions;
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.chnl_opcode = VIRTCHNL2_OP_GET_LAN_MEMORY_REGIONS,
>  		.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
>  	};
> -	int num_regions, size;
> -	ssize_t reply_sz;
> +	size_t send_sz, reply_sz, size;
> +	int num_regions;
>  	int err = 0;
>  
> -	rcvd_regions = kzalloc(IDPF_CTLQ_MAX_BUF_LEN, GFP_KERNEL);
> -	if (!rcvd_regions)
> +	send_sz = sizeof(struct virtchnl2_get_lan_memory_regions) +
> +		  sizeof(struct virtchnl2_mem_region);
> +	send_regions = kzalloc(send_sz, GFP_KERNEL);
> +	if (!send_regions)
>  		return -ENOMEM;
>  
> -	xn_params.recv_buf.iov_base = rcvd_regions;
> -	rcvd_regions->num_memory_regions = cpu_to_le16(1);
> -	xn_params.send_buf.iov_base = rcvd_regions;
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -	if (reply_sz < 0)
> -		return reply_sz;
> +	send_regions->num_memory_regions = cpu_to_le16(1);
> +	err = idpf_send_mb_msg_kfree(adapter, &xn_params, send_regions,
> +				     send_sz);
> +	if (err)
> +		return err;
>  
> +	rcvd_regions = xn_params.recv_mem.iov_base;
> +	reply_sz = xn_params.recv_mem.iov_len;
> +	if (reply_sz < sizeof(*rcvd_regions)) {
> +		err = -EIO;
> +		goto rel_rx_buf;
> +	}
>  	num_regions = le16_to_cpu(rcvd_regions->num_memory_regions);
>  	size = struct_size(rcvd_regions, mem_reg, num_regions);
> -	if (reply_sz < size)
> -		return -EIO;
> -
> -	if (size > IDPF_CTLQ_MAX_BUF_LEN)
> -		return -EINVAL;
> +	if (reply_sz < size) {
> +		err = -EIO;
> +		goto rel_rx_buf;
> +	}
>  
>  	for (int i = 0; i < num_regions; i++) {
>  		struct libie_mmio_info *mmio = &adapter->ctlq_ctx.mmio_info;
> @@ -1102,10 +639,14 @@ static int idpf_cfg_lan_memory_regions(struct idpf_adapter *adapter)
>  		len = le64_to_cpu(rcvd_regions->mem_reg[i].size);
>  		if (len && !libie_pci_map_mmio_region(mmio, offset, len)) {
>  			idpf_decfg_lan_memory_regions(adapter);
> -			return -EIO;
> +			err = -EIO;
> +			goto rel_rx_buf;
>  		}
>  	}
>  
> +rel_rx_buf:
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
> +
>  	return err;
>  }
>  
> @@ -1164,24 +705,43 @@ int idpf_add_del_fsteer_filters(struct idpf_adapter *adapter,
>  				struct virtchnl2_flow_rule_add_del *rule,
>  				enum virtchnl2_op opcode)
>  {
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.chnl_opcode = opcode,
> +		.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +	};
> +	struct virtchnl2_flow_rule_add_del *rx_rule;
>  	int rule_count = le32_to_cpu(rule->count);
> -	struct idpf_vc_xn_params xn_params = {};
> -	ssize_t reply_sz;
> +	size_t send_sz;
> +	int err;
>  
>  	if (opcode != VIRTCHNL2_OP_ADD_FLOW_RULE &&
> -	    opcode != VIRTCHNL2_OP_DEL_FLOW_RULE)
> +	    opcode != VIRTCHNL2_OP_DEL_FLOW_RULE) {
> +		kfree(rule);
>  		return -EINVAL;
> +	}
> +
> +	send_sz = struct_size(rule, rule_info, rule_count);
> +	err = idpf_send_mb_msg_kfree(adapter, &xn_params, rule, send_sz);
> +	if (err)
> +		return err;
> +
> +	if (xn_params.recv_mem.iov_len < send_sz) {
> +		err = -EIO;
> +		goto rel_rx;
> +	}
>  
> -	xn_params.vc_op = opcode;
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> -	xn_params.async = false;
> -	xn_params.send_buf.iov_base = rule;
> -	xn_params.send_buf.iov_len = struct_size(rule, rule_info, rule_count);
> -	xn_params.recv_buf.iov_base = rule;
> -	xn_params.recv_buf.iov_len = struct_size(rule, rule_info, rule_count);
> +	rx_rule = xn_params.recv_mem.iov_base;
> +	for (int i = 0; i < rule_count; i++) {
> +		if (rx_rule->rule_info[i].status !=
> +		    cpu_to_le32(VIRTCHNL2_FLOW_RULE_SUCCESS)) {
> +			err = -EIO;
> +			goto rel_rx;
> +		}
> +	}
>  
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -	return reply_sz < 0 ? reply_sz : 0;
> +rel_rx:
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
> +	return err;
>  }
>  
>  /**
> @@ -1555,11 +1115,13 @@ int idpf_queue_reg_init(struct idpf_vport *vport,
>  int idpf_send_create_vport_msg(struct idpf_adapter *adapter,
>  			       struct idpf_vport_max_q *max_q)
>  {
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.timeout_ms	= IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +		.chnl_opcode	= VIRTCHNL2_OP_CREATE_VPORT,
> +	};
>  	struct virtchnl2_create_vport *vport_msg;
> -	struct idpf_vc_xn_params xn_params = {};
>  	u16 idx = adapter->next_vport;
>  	int err, buf_size;
> -	ssize_t reply_sz;
>  
>  	buf_size = sizeof(struct virtchnl2_create_vport);
>  	vport_msg = kzalloc(buf_size, GFP_KERNEL);
> @@ -1586,33 +1148,29 @@ int idpf_send_create_vport_msg(struct idpf_adapter *adapter,
>  	}
>  
>  	if (!adapter->vport_params_recvd[idx]) {
> -		adapter->vport_params_recvd[idx] = kzalloc(IDPF_CTLQ_MAX_BUF_LEN,
> -							   GFP_KERNEL);
> +		adapter->vport_params_recvd[idx] =
> +			kzalloc(LIBIE_CTLQ_MAX_BUF_LEN, GFP_KERNEL);
>  		if (!adapter->vport_params_recvd[idx]) {
>  			err = -ENOMEM;
>  			goto rel_buf;
>  		}
>  	}
>  
> -	xn_params.vc_op = VIRTCHNL2_OP_CREATE_VPORT;
> -	xn_params.send_buf.iov_base = vport_msg;
> -	xn_params.send_buf.iov_len = buf_size;
> -	xn_params.recv_buf.iov_base = adapter->vport_params_recvd[idx];
> -	xn_params.recv_buf.iov_len = IDPF_CTLQ_MAX_BUF_LEN;
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -	if (reply_sz < 0) {
> -		err = reply_sz;
> -		goto free_vport_params;
> +	err = idpf_send_mb_msg_kfree(adapter, &xn_params, vport_msg,
> +				     sizeof(*vport_msg));
> +	if (err) {
> +		kfree(adapter->vport_params_recvd[idx]);
> +		adapter->vport_params_recvd[idx] = NULL;
> +		return err;
>  	}
>  
> -	kfree(vport_msg);
> +	memcpy(adapter->vport_params_recvd[idx], xn_params.recv_mem.iov_base,
> +	       xn_params.recv_mem.iov_len);
> +
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
>  
>  	return 0;
>  
> -free_vport_params:
> -	kfree(adapter->vport_params_recvd[idx]);
> -	adapter->vport_params_recvd[idx] = NULL;
>  rel_buf:
>  	kfree(vport_msg);
>  
> @@ -1674,19 +1232,22 @@ int idpf_check_supported_desc_ids(struct idpf_vport *vport)
>   */
>  int idpf_send_destroy_vport_msg(struct idpf_adapter *adapter, u32 vport_id)
>  {
> -	struct idpf_vc_xn_params xn_params = {};
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.timeout_ms	= IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +		.chnl_opcode	= VIRTCHNL2_OP_DESTROY_VPORT,
> +	};
>  	struct virtchnl2_vport v_id;
> -	ssize_t reply_sz;
> +	int err;
>  
>  	v_id.vport_id = cpu_to_le32(vport_id);
>  
> -	xn_params.vc_op = VIRTCHNL2_OP_DESTROY_VPORT;
> -	xn_params.send_buf.iov_base = &v_id;
> -	xn_params.send_buf.iov_len = sizeof(v_id);
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> +	err = idpf_send_mb_msg(adapter, &xn_params, &v_id, sizeof(v_id));
> +	if (err)
> +		return err;
> +
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
>  
> -	return reply_sz < 0 ? reply_sz : 0;
> +	return 0;
>  }
>  
>  /**
> @@ -1698,19 +1259,22 @@ int idpf_send_destroy_vport_msg(struct idpf_adapter *adapter, u32 vport_id)
>   */
>  int idpf_send_enable_vport_msg(struct idpf_adapter *adapter, u32 vport_id)
>  {
> -	struct idpf_vc_xn_params xn_params = {};
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.timeout_ms	= IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +		.chnl_opcode	= VIRTCHNL2_OP_ENABLE_VPORT,
> +	};
>  	struct virtchnl2_vport v_id;
> -	ssize_t reply_sz;
> +	int err;
>  
>  	v_id.vport_id = cpu_to_le32(vport_id);
>  
> -	xn_params.vc_op = VIRTCHNL2_OP_ENABLE_VPORT;
> -	xn_params.send_buf.iov_base = &v_id;
> -	xn_params.send_buf.iov_len = sizeof(v_id);
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> +	err = idpf_send_mb_msg(adapter, &xn_params, &v_id, sizeof(v_id));
> +	if (err)
> +		return err;
>  
> -	return reply_sz < 0 ? reply_sz : 0;
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
> +
> +	return 0;
>  }
>  
>  /**
> @@ -1722,19 +1286,22 @@ int idpf_send_enable_vport_msg(struct idpf_adapter *adapter, u32 vport_id)
>   */
>  int idpf_send_disable_vport_msg(struct idpf_adapter *adapter, u32 vport_id)
>  {
> -	struct idpf_vc_xn_params xn_params = {};
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.timeout_ms	= IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +		.chnl_opcode	= VIRTCHNL2_OP_DISABLE_VPORT,
> +	};
>  	struct virtchnl2_vport v_id;
> -	ssize_t reply_sz;
> +	int err;
>  
>  	v_id.vport_id = cpu_to_le32(vport_id);
>  
> -	xn_params.vc_op = VIRTCHNL2_OP_DISABLE_VPORT;
> -	xn_params.send_buf.iov_base = &v_id;
> -	xn_params.send_buf.iov_len = sizeof(v_id);
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> +	err = idpf_send_mb_msg(adapter, &xn_params, &v_id, sizeof(v_id));
> +	if (err)
> +		return err;
> +
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
>  
> -	return reply_sz < 0 ? reply_sz : 0;
> +	return 0;
>  }
>  
>  /**
> @@ -2573,11 +2140,14 @@ int idpf_send_delete_queues_msg(struct idpf_adapter *adapter,
>  				struct idpf_queue_id_reg_info *chunks,
>  				u32 vport_id)
>  {
> -	struct virtchnl2_del_ena_dis_queues *eq __free(kfree) = NULL;
> -	struct idpf_vc_xn_params xn_params = {};
> -	ssize_t reply_sz;
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.timeout_ms	= IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +		.chnl_opcode	= VIRTCHNL2_OP_DEL_QUEUES,
> +	};
> +	struct virtchnl2_del_ena_dis_queues *eq;
> +	ssize_t buf_size;
>  	u16 num_chunks;
> -	int buf_size;
> +	int err;
>  
>  	num_chunks = chunks->num_chunks;
>  	buf_size = struct_size(eq, chunks.chunks, num_chunks);
> @@ -2592,13 +2162,13 @@ int idpf_send_delete_queues_msg(struct idpf_adapter *adapter,
>  	idpf_convert_reg_to_queue_chunks(eq->chunks.chunks, chunks->queue_chunks,
>  					 num_chunks);
>  
> -	xn_params.vc_op = VIRTCHNL2_OP_DEL_QUEUES;
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> -	xn_params.send_buf.iov_base = eq;
> -	xn_params.send_buf.iov_len = buf_size;
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> +	err = idpf_send_mb_msg_kfree(adapter, &xn_params, eq, buf_size);
> +	if (err)
> +		return err;
>  
> -	return reply_sz < 0 ? reply_sz : 0;
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
> +
> +	return 0;
>  }
>  
>  /**
> @@ -2636,15 +2206,14 @@ int idpf_send_add_queues_msg(struct idpf_adapter *adapter,
>  			     struct idpf_q_vec_rsrc *rsrc,
>  			     u32 vport_id)
>  {
> -	struct virtchnl2_add_queues *vc_msg __free(kfree) = NULL;
> -	struct idpf_vc_xn_params xn_params = {};
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.timeout_ms	= IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +		.chnl_opcode	= VIRTCHNL2_OP_ADD_QUEUES,
> +	};
> +	struct virtchnl2_add_queues *vc_msg;
>  	struct virtchnl2_add_queues aq = {};
> -	ssize_t reply_sz;
> -	int size;
> -
> -	vc_msg = kzalloc(IDPF_CTLQ_MAX_BUF_LEN, GFP_KERNEL);
> -	if (!vc_msg)
> -		return -ENOMEM;
> +	size_t size;
> +	int err;
>  
>  	aq.vport_id = cpu_to_le32(vport_id);
>  	aq.num_tx_q = cpu_to_le16(rsrc->num_txq);
> @@ -2652,29 +2221,38 @@ int idpf_send_add_queues_msg(struct idpf_adapter *adapter,
>  	aq.num_rx_q = cpu_to_le16(rsrc->num_rxq);
>  	aq.num_rx_bufq = cpu_to_le16(rsrc->num_bufq);
>  
> -	xn_params.vc_op = VIRTCHNL2_OP_ADD_QUEUES;
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> -	xn_params.send_buf.iov_base = &aq;
> -	xn_params.send_buf.iov_len = sizeof(aq);
> -	xn_params.recv_buf.iov_base = vc_msg;
> -	xn_params.recv_buf.iov_len = IDPF_CTLQ_MAX_BUF_LEN;
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -	if (reply_sz < 0)
> -		return reply_sz;
> +	err = idpf_send_mb_msg(adapter, &xn_params, &aq, sizeof(aq));
> +	if (err)
> +		return err;
> +
> +	vc_msg = xn_params.recv_mem.iov_base;
> +	if (xn_params.recv_mem.iov_len < sizeof(*vc_msg)) {
> +		err = -EIO;
> +		goto free_rx_buf;
> +	}
>  
>  	/* compare vc_msg num queues with vport num queues */
>  	if (le16_to_cpu(vc_msg->num_tx_q) != rsrc->num_txq ||
>  	    le16_to_cpu(vc_msg->num_rx_q) != rsrc->num_rxq ||
>  	    le16_to_cpu(vc_msg->num_tx_complq) != rsrc->num_complq ||
> -	    le16_to_cpu(vc_msg->num_rx_bufq) != rsrc->num_bufq)
> -		return -EINVAL;
> +	    le16_to_cpu(vc_msg->num_rx_bufq) != rsrc->num_bufq) {
> +		err = -EINVAL;
> +		goto free_rx_buf;
> +	}
>  
>  	size = struct_size(vc_msg, chunks.chunks,
>  			   le16_to_cpu(vc_msg->chunks.num_chunks));
> -	if (reply_sz < size)
> -		return -EIO;
> +	if (xn_params.recv_mem.iov_len < size) {
> +		err = -EIO;
> +		goto free_rx_buf;
> +	}
> +
> +	err = idpf_vport_init_queue_reg_chunks(vport_config, &vc_msg->chunks);
> +
> +free_rx_buf:
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
>  
> -	return idpf_vport_init_queue_reg_chunks(vport_config, &vc_msg->chunks);
> +	return err;
>  }
>  
>  /**
> @@ -2686,49 +2264,51 @@ int idpf_send_add_queues_msg(struct idpf_adapter *adapter,
>   */
>  int idpf_send_alloc_vectors_msg(struct idpf_adapter *adapter, u16 num_vectors)
>  {
> -	struct virtchnl2_alloc_vectors *rcvd_vec __free(kfree) = NULL;
> -	struct idpf_vc_xn_params xn_params = {};
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.timeout_ms	= IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +		.chnl_opcode	= VIRTCHNL2_OP_ALLOC_VECTORS,
> +	};
> +	struct virtchnl2_alloc_vectors *rcvd_vec;
>  	struct virtchnl2_alloc_vectors ac = {};
> -	ssize_t reply_sz;
>  	u16 num_vchunks;
> -	int size;
> +	int size, err;
>  
>  	ac.num_vectors = cpu_to_le16(num_vectors);
>  
> -	rcvd_vec = kzalloc(IDPF_CTLQ_MAX_BUF_LEN, GFP_KERNEL);
> -	if (!rcvd_vec)
> -		return -ENOMEM;
> +	err = idpf_send_mb_msg(adapter, &xn_params, &ac, sizeof(ac));
> +	if (err)
> +		return err;
>  
> -	xn_params.vc_op = VIRTCHNL2_OP_ALLOC_VECTORS;
> -	xn_params.send_buf.iov_base = &ac;
> -	xn_params.send_buf.iov_len = sizeof(ac);
> -	xn_params.recv_buf.iov_base = rcvd_vec;
> -	xn_params.recv_buf.iov_len = IDPF_CTLQ_MAX_BUF_LEN;
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -	if (reply_sz < 0)
> -		return reply_sz;
> +	rcvd_vec = xn_params.recv_mem.iov_base;
> +	if (xn_params.recv_mem.iov_len < sizeof(*rcvd_vec)) {
> +		err = -EIO;
> +		goto free_rx_buf;
> +	}
>  
>  	num_vchunks = le16_to_cpu(rcvd_vec->vchunks.num_vchunks);
>  	size = struct_size(rcvd_vec, vchunks.vchunks, num_vchunks);
> -	if (reply_sz < size)
> -		return -EIO;
> -
> -	if (size > IDPF_CTLQ_MAX_BUF_LEN)
> -		return -EINVAL;
> +	if (xn_params.recv_mem.iov_len < size) {
> +		err = -EIO;
> +		goto free_rx_buf;
> +	}
>  
>  	kfree(adapter->req_vec_chunks);
>  	adapter->req_vec_chunks = kmemdup(rcvd_vec, size, GFP_KERNEL);
> -	if (!adapter->req_vec_chunks)
> -		return -ENOMEM;
> +	if (!adapter->req_vec_chunks) {
> +		err = -ENOMEM;
> +		goto free_rx_buf;
> +	}
>  
>  	if (le16_to_cpu(adapter->req_vec_chunks->num_vectors) < num_vectors) {
>  		kfree(adapter->req_vec_chunks);
>  		adapter->req_vec_chunks = NULL;
> -		return -EINVAL;
> +		err = -EINVAL;
>  	}
>  
> -	return 0;
> +free_rx_buf:
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
> +
> +	return err;
>  }
>  
>  /**
> @@ -2740,24 +2320,28 @@ int idpf_send_alloc_vectors_msg(struct idpf_adapter *adapter, u16 num_vectors)
>  int idpf_send_dealloc_vectors_msg(struct idpf_adapter *adapter)
>  {
>  	struct virtchnl2_alloc_vectors *ac = adapter->req_vec_chunks;
> -	struct virtchnl2_vector_chunks *vcs = &ac->vchunks;
> -	struct idpf_vc_xn_params xn_params = {};
> -	ssize_t reply_sz;
> -	int buf_size;
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.timeout_ms	= IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +		.chnl_opcode	= VIRTCHNL2_OP_DEALLOC_VECTORS,
> +	};
> +	struct virtchnl2_vector_chunks *vcs;
> +	int buf_size, err;
>  
> -	buf_size = struct_size(vcs, vchunks, le16_to_cpu(vcs->num_vchunks));
> +	buf_size = struct_size(&ac->vchunks, vchunks,
> +			       le16_to_cpu(ac->vchunks.num_vchunks));
> +	vcs = kmemdup(&ac->vchunks, buf_size, GFP_KERNEL);
> +	if (!vcs)
> +		return -ENOMEM;
>  
> -	xn_params.vc_op = VIRTCHNL2_OP_DEALLOC_VECTORS;
> -	xn_params.send_buf.iov_base = vcs;
> -	xn_params.send_buf.iov_len = buf_size;
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -	if (reply_sz < 0)
> -		return reply_sz;
> +	err = idpf_send_mb_msg_kfree(adapter, &xn_params, vcs, buf_size);
> +	if (err)
> +		return err;
>  
>  	kfree(adapter->req_vec_chunks);
>  	adapter->req_vec_chunks = NULL;
>  
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
> +
>  	return 0;
>  }
>  
> @@ -2781,18 +2365,22 @@ static int idpf_get_max_vfs(struct idpf_adapter *adapter)
>   */
>  int idpf_send_set_sriov_vfs_msg(struct idpf_adapter *adapter, u16 num_vfs)
>  {
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.timeout_ms	= IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +		.chnl_opcode	= VIRTCHNL2_OP_SET_SRIOV_VFS,
> +	};
>  	struct virtchnl2_sriov_vfs_info svi = {};
> -	struct idpf_vc_xn_params xn_params = {};
> -	ssize_t reply_sz;
> +	int err;
>  
>  	svi.num_vfs = cpu_to_le16(num_vfs);
> -	xn_params.vc_op = VIRTCHNL2_OP_SET_SRIOV_VFS;
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> -	xn_params.send_buf.iov_base = &svi;
> -	xn_params.send_buf.iov_len = sizeof(svi);
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
>  
> -	return reply_sz < 0 ? reply_sz : 0;
> +	err = idpf_send_mb_msg(adapter, &xn_params, &svi, sizeof(svi));
> +	if (err)
> +		return err;
> +
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
> +
> +	return 0;
>  }
>  
>  /**
> @@ -2805,10 +2393,14 @@ int idpf_send_set_sriov_vfs_msg(struct idpf_adapter *adapter, u16 num_vfs)
>  int idpf_send_get_stats_msg(struct idpf_netdev_priv *np,
>  			    struct idpf_port_stats *port_stats)
>  {
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.timeout_ms	= IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +		.chnl_opcode	= VIRTCHNL2_OP_GET_STATS,
> +	};
>  	struct rtnl_link_stats64 *netstats = &np->netstats;
> +	struct virtchnl2_vport_stats *stats_recv;
>  	struct virtchnl2_vport_stats stats_msg = {};
> -	struct idpf_vc_xn_params xn_params = {};
> -	ssize_t reply_sz;
> +	int err;
>  
>  
>  	/* Don't send get_stats message if the link is down */
> @@ -2817,38 +2409,41 @@ int idpf_send_get_stats_msg(struct idpf_netdev_priv *np,
>  
>  	stats_msg.vport_id = cpu_to_le32(np->vport_id);
>  
> -	xn_params.vc_op = VIRTCHNL2_OP_GET_STATS;
> -	xn_params.send_buf.iov_base = &stats_msg;
> -	xn_params.send_buf.iov_len = sizeof(stats_msg);
> -	xn_params.recv_buf = xn_params.send_buf;
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> +	err = idpf_send_mb_msg(np->adapter, &xn_params, &stats_msg,
> +			       sizeof(stats_msg));
> +	if (err)
> +		return err;
>  
> -	reply_sz = idpf_vc_xn_exec(np->adapter, &xn_params);
> -	if (reply_sz < 0)
> -		return reply_sz;
> -	if (reply_sz < sizeof(stats_msg))
> -		return -EIO;
> +	if (xn_params.recv_mem.iov_len < sizeof(*stats_recv)) {
> +		err = -EIO;
> +		goto free_rx_buf;
> +	}
> +
> +	stats_recv = xn_params.recv_mem.iov_base;
>  
>  	spin_lock_bh(&np->stats_lock);
>  
> -	netstats->rx_packets = le64_to_cpu(stats_msg.rx_unicast) +
> -			       le64_to_cpu(stats_msg.rx_multicast) +
> -			       le64_to_cpu(stats_msg.rx_broadcast);
> -	netstats->tx_packets = le64_to_cpu(stats_msg.tx_unicast) +
> -			       le64_to_cpu(stats_msg.tx_multicast) +
> -			       le64_to_cpu(stats_msg.tx_broadcast);
> -	netstats->rx_bytes = le64_to_cpu(stats_msg.rx_bytes);
> -	netstats->tx_bytes = le64_to_cpu(stats_msg.tx_bytes);
> -	netstats->rx_errors = le64_to_cpu(stats_msg.rx_errors);
> -	netstats->tx_errors = le64_to_cpu(stats_msg.tx_errors);
> -	netstats->rx_dropped = le64_to_cpu(stats_msg.rx_discards);
> -	netstats->tx_dropped = le64_to_cpu(stats_msg.tx_discards);
> -
> -	port_stats->vport_stats = stats_msg;
> +	netstats->rx_packets = le64_to_cpu(stats_recv->rx_unicast) +
> +			       le64_to_cpu(stats_recv->rx_multicast) +
> +			       le64_to_cpu(stats_recv->rx_broadcast);
> +	netstats->tx_packets = le64_to_cpu(stats_recv->tx_unicast) +
> +			       le64_to_cpu(stats_recv->tx_multicast) +
> +			       le64_to_cpu(stats_recv->tx_broadcast);
> +	netstats->rx_bytes = le64_to_cpu(stats_recv->rx_bytes);
> +	netstats->tx_bytes = le64_to_cpu(stats_recv->tx_bytes);
> +	netstats->rx_errors = le64_to_cpu(stats_recv->rx_errors);
> +	netstats->tx_errors = le64_to_cpu(stats_recv->tx_errors);
> +	netstats->rx_dropped = le64_to_cpu(stats_recv->rx_discards);
> +	netstats->tx_dropped = le64_to_cpu(stats_recv->tx_discards);
> +
> +	port_stats->vport_stats = *stats_recv;
>  
>  	spin_unlock_bh(&np->stats_lock);
>  
> -	return 0;
> +free_rx_buf:
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
> +
> +	return err;
>  }
>  
>  /**
> @@ -2868,14 +2463,16 @@ int idpf_send_get_set_rss_lut_msg(struct idpf_adapter *adapter,
>  				  struct idpf_rss_data *rss_data,
>  				  u32 vport_id, bool get)
>  {
> -	struct virtchnl2_rss_lut *recv_rl __free(kfree) = NULL;
> -	struct virtchnl2_rss_lut *rl __free(kfree) = NULL;
> -	struct idpf_vc_xn_params xn_params = {};
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.timeout_ms	= IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +		.chnl_opcode	= get ? VIRTCHNL2_OP_GET_RSS_LUT :
> +					VIRTCHNL2_OP_SET_RSS_LUT,
> +	};
> +	struct virtchnl2_rss_lut *rl, *recv_rl;
>  	int buf_size, lut_buf_size;
>  	struct idpf_vport *vport;
> -	ssize_t reply_sz;
>  	bool rxhash_ena;
> -	int i;
> +	int i, err;
>  
>  	vport = idpf_vid_to_vport(adapter, vport_id);
>  	if (!vport)
> @@ -2889,37 +2486,31 @@ int idpf_send_get_set_rss_lut_msg(struct idpf_adapter *adapter,
>  		return -ENOMEM;
>  
>  	rl->vport_id = cpu_to_le32(vport_id);
> -
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> -	xn_params.send_buf.iov_base = rl;
> -	xn_params.send_buf.iov_len = buf_size;
> -
> -	if (get) {
> -		recv_rl = kzalloc(IDPF_CTLQ_MAX_BUF_LEN, GFP_KERNEL);
> -		if (!recv_rl)
> -			return -ENOMEM;
> -		xn_params.vc_op = VIRTCHNL2_OP_GET_RSS_LUT;
> -		xn_params.recv_buf.iov_base = recv_rl;
> -		xn_params.recv_buf.iov_len = IDPF_CTLQ_MAX_BUF_LEN;
> -	} else {
> +	if (!get) {
>  		rl->lut_entries = cpu_to_le16(rss_data->rss_lut_size);
>  		for (i = 0; i < rss_data->rss_lut_size; i++)
>  			rl->lut[i] = rxhash_ena ?
>  				cpu_to_le32(rss_data->rss_lut[i]) : 0;
> -
> -		xn_params.vc_op = VIRTCHNL2_OP_SET_RSS_LUT;
>  	}
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -	if (reply_sz < 0)
> -		return reply_sz;
> +
> +	err = idpf_send_mb_msg_kfree(adapter, &xn_params, rl, buf_size);
> +	if (err)
> +		return err;
> +
>  	if (!get)
> -		return 0;
> -	if (reply_sz < sizeof(struct virtchnl2_rss_lut))
> -		return -EIO;
> +		goto free_rx_buf;
> +	if (xn_params.recv_mem.iov_len < sizeof(*recv_rl)) {
> +		err = -EIO;
> +		goto free_rx_buf;
> +	}
> +
> +	recv_rl = xn_params.recv_mem.iov_base;
>  
>  	lut_buf_size = le16_to_cpu(recv_rl->lut_entries) * sizeof(u32);
> -	if (reply_sz < lut_buf_size)
> -		return -EIO;
> +	if (xn_params.recv_mem.iov_len < lut_buf_size + sizeof(*recv_rl)) {
> +		err = -EIO;
> +		goto free_rx_buf;
> +	}
>  
>  	/* size didn't change, we can reuse existing lut buf */
>  	if (rss_data->rss_lut_size == le16_to_cpu(recv_rl->lut_entries))
> @@ -2931,13 +2522,16 @@ int idpf_send_get_set_rss_lut_msg(struct idpf_adapter *adapter,
>  	rss_data->rss_lut = kzalloc(lut_buf_size, GFP_KERNEL);
>  	if (!rss_data->rss_lut) {
>  		rss_data->rss_lut_size = 0;
> -		return -ENOMEM;
> +		err = -ENOMEM;
> +		goto free_rx_buf;
>  	}
>  
>  do_memcpy:
> -	memcpy(rss_data->rss_lut, recv_rl->lut, rss_data->rss_lut_size);
> +	memcpy(rss_data->rss_lut, recv_rl->lut, lut_buf_size);
> +free_rx_buf:
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
>  
> -	return 0;
> +	return err;
>  }
>  
>  /**
> @@ -2953,12 +2547,14 @@ int idpf_send_get_set_rss_key_msg(struct idpf_adapter *adapter,
>  				  struct idpf_rss_data *rss_data,
>  				  u32 vport_id, bool get)
>  {
> -	struct virtchnl2_rss_key *recv_rk __free(kfree) = NULL;
> -	struct virtchnl2_rss_key *rk __free(kfree) = NULL;
> -	struct idpf_vc_xn_params xn_params = {};
> -	ssize_t reply_sz;
> -	int i, buf_size;
> -	u16 key_size;
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.timeout_ms	= IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +		.chnl_opcode	= get ? VIRTCHNL2_OP_GET_RSS_KEY :
> +					VIRTCHNL2_OP_SET_RSS_KEY,
> +	};
> +	struct virtchnl2_rss_key *rk, *recv_rk;
> +	u16 key_size, recv_len;
> +	int i, buf_size, err;
>  
>  	buf_size = struct_size(rk, key_flex, rss_data->rss_key_size);
>  	rk = kzalloc(buf_size, GFP_KERNEL);
> @@ -2966,37 +2562,32 @@ int idpf_send_get_set_rss_key_msg(struct idpf_adapter *adapter,
>  		return -ENOMEM;
>  
>  	rk->vport_id = cpu_to_le32(vport_id);
> -	xn_params.send_buf.iov_base = rk;
> -	xn_params.send_buf.iov_len = buf_size;
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> -	if (get) {
> -		recv_rk = kzalloc(IDPF_CTLQ_MAX_BUF_LEN, GFP_KERNEL);
> -		if (!recv_rk)
> -			return -ENOMEM;
> -
> -		xn_params.vc_op = VIRTCHNL2_OP_GET_RSS_KEY;
> -		xn_params.recv_buf.iov_base = recv_rk;
> -		xn_params.recv_buf.iov_len = IDPF_CTLQ_MAX_BUF_LEN;
> -	} else {
> +	if (!get) {
>  		rk->key_len = cpu_to_le16(rss_data->rss_key_size);
>  		for (i = 0; i < rss_data->rss_key_size; i++)
>  			rk->key_flex[i] = rss_data->rss_key[i];
> -
> -		xn_params.vc_op = VIRTCHNL2_OP_SET_RSS_KEY;
>  	}
>  
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -	if (reply_sz < 0)
> -		return reply_sz;
> +	err = idpf_send_mb_msg_kfree(adapter, &xn_params, rk, buf_size);
> +	if (err)
> +		return err;
> +
>  	if (!get)
> -		return 0;
> -	if (reply_sz < sizeof(struct virtchnl2_rss_key))
> -		return -EIO;
> +		goto free_rx_buf;
>  
> +	recv_len = xn_params.recv_mem.iov_len;
> +	if (recv_len < sizeof(struct virtchnl2_rss_key)) {
> +		err = -EIO;
> +		goto free_rx_buf;
> +	}
> +
> +	recv_rk = xn_params.recv_mem.iov_base;
>  	key_size = min_t(u16, NETDEV_RSS_KEY_LEN,
>  			 le16_to_cpu(recv_rk->key_len));
> -	if (reply_sz < key_size)
> -		return -EIO;
> +	if (recv_len < key_size) {
> +		err = -EIO;
> +		goto free_rx_buf;
> +	}
>  
>  	/* key len didn't change, reuse existing buf */
>  	if (rss_data->rss_key_size == key_size)
> @@ -3007,13 +2598,16 @@ int idpf_send_get_set_rss_key_msg(struct idpf_adapter *adapter,
>  	rss_data->rss_key = kzalloc(key_size, GFP_KERNEL);
>  	if (!rss_data->rss_key) {
>  		rss_data->rss_key_size = 0;
> -		return -ENOMEM;
> +		err = -ENOMEM;
> +		goto free_rx_buf;
>  	}
>  
>  do_memcpy:
>  	memcpy(rss_data->rss_key, recv_rk->key_flex, rss_data->rss_key_size);
> +free_rx_buf:
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
>  
> -	return 0;
> +	return err;
>  }
>  
>  /**
> @@ -3190,15 +2784,18 @@ static void idpf_parse_protocol_ids(struct virtchnl2_ptype *ptype,
>   */
>  static int idpf_send_get_rx_ptype_msg(struct idpf_adapter *adapter)
>  {
> -	struct virtchnl2_get_ptype_info *get_ptype_info __free(kfree) = NULL;
> -	struct virtchnl2_get_ptype_info *ptype_info __free(kfree) = NULL;
>  	struct libeth_rx_pt *singleq_pt_lkup __free(kfree) = NULL;
>  	struct libeth_rx_pt *splitq_pt_lkup __free(kfree) = NULL;
> -	struct idpf_vc_xn_params xn_params = {};
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.timeout_ms	= IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +		.chnl_opcode	= VIRTCHNL2_OP_GET_PTYPE_INFO,
> +	};
> +	struct virtchnl2_get_ptype_info *get_ptype_info;
> +	struct virtchnl2_get_ptype_info *ptype_info;
> +	int err = 0, max_ptype = IDPF_RX_MAX_PTYPE;
> +	int buf_size = sizeof(*get_ptype_info);
>  	int ptypes_recvd = 0, ptype_offset;
> -	u32 max_ptype = IDPF_RX_MAX_PTYPE;
>  	u16 next_ptype_id = 0;
> -	ssize_t reply_sz;
>  
>  	singleq_pt_lkup = kzalloc_objs(*singleq_pt_lkup, IDPF_RX_MAX_BASE_PTYPE);
>  	if (!singleq_pt_lkup)
> @@ -3208,42 +2805,38 @@ static int idpf_send_get_rx_ptype_msg(struct idpf_adapter *adapter)
>  	if (!splitq_pt_lkup)
>  		return -ENOMEM;
>  
> -	get_ptype_info = kzalloc_obj(*get_ptype_info);
> -	if (!get_ptype_info)
> -		return -ENOMEM;
> -
> -	ptype_info = kzalloc(IDPF_CTLQ_MAX_BUF_LEN, GFP_KERNEL);
> -	if (!ptype_info)
> -		return -ENOMEM;
> +	while (next_ptype_id < max_ptype) {
> +		u16 num_ptypes;
>  
> -	xn_params.vc_op = VIRTCHNL2_OP_GET_PTYPE_INFO;
> -	xn_params.send_buf.iov_base = get_ptype_info;
> -	xn_params.send_buf.iov_len = sizeof(*get_ptype_info);
> -	xn_params.recv_buf.iov_base = ptype_info;
> -	xn_params.recv_buf.iov_len = IDPF_CTLQ_MAX_BUF_LEN;
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> +		get_ptype_info = kzalloc(buf_size, GFP_KERNEL);
> +		if (!get_ptype_info)
> +			return -ENOMEM;
>  
> -	while (next_ptype_id < max_ptype) {
>  		get_ptype_info->start_ptype_id = cpu_to_le16(next_ptype_id);
>  
>  		if ((next_ptype_id + IDPF_RX_MAX_PTYPES_PER_BUF) > max_ptype)
> -			get_ptype_info->num_ptypes =
> -				cpu_to_le16(max_ptype - next_ptype_id);
> +			num_ptypes = max_ptype - next_ptype_id;
>  		else
> -			get_ptype_info->num_ptypes =
> -				cpu_to_le16(IDPF_RX_MAX_PTYPES_PER_BUF);
> +			num_ptypes = IDPF_RX_MAX_PTYPES_PER_BUF;
>  
> -		reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -		if (reply_sz < 0)
> -			return reply_sz;
> +		get_ptype_info->num_ptypes = cpu_to_le16(num_ptypes);
> +		err = idpf_send_mb_msg_kfree(adapter, &xn_params,
> +					     get_ptype_info, buf_size);
> +		if (err)
> +			return err;
>  
> +		ptype_info = xn_params.recv_mem.iov_base;
> +		if (xn_params.recv_mem.iov_len < sizeof(*ptype_info)) {
> +			err = -EIO;
> +			goto free_rx_buf;
> +		}
>  		ptypes_recvd += le16_to_cpu(ptype_info->num_ptypes);
> -		if (ptypes_recvd > max_ptype)
> -			return -EINVAL;
> -
> -		next_ptype_id = le16_to_cpu(get_ptype_info->start_ptype_id) +
> -				le16_to_cpu(get_ptype_info->num_ptypes);
> +		if (ptypes_recvd > max_ptype) {
> +			err = -EINVAL;
> +			goto free_rx_buf;
> +		}
>  
> +		next_ptype_id = next_ptype_id + num_ptypes;
>  		ptype_offset = IDPF_RX_PTYPE_HDR_SZ;
>  
>  		for (u16 i = 0; i < le16_to_cpu(ptype_info->num_ptypes); i++) {
> @@ -3258,14 +2851,18 @@ static int idpf_send_get_rx_ptype_msg(struct idpf_adapter *adapter)
>  			pt_8 = ptype->ptype_id_8;
>  
>  			ptype_offset += IDPF_GET_PTYPE_SIZE(ptype);
> -			if (ptype_offset > IDPF_CTLQ_MAX_BUF_LEN)
> -				return -EINVAL;
> +			if (ptype_offset > LIBIE_CTLQ_MAX_BUF_LEN) {
> +				err = -EINVAL;
> +				goto free_rx_buf;
> +			}
>  
>  			/* 0xFFFF indicates end of ptypes */
>  			if (pt_10 == IDPF_INVALID_PTYPE_ID)
>  				goto out;
> -			if (pt_10 >= max_ptype)
> -				return -EINVAL;
> +			if (pt_10 >= max_ptype) {
> +				err = -EINVAL;
> +				goto free_rx_buf;
> +			}
>  
>  			idpf_parse_protocol_ids(ptype, &rx_pt);
>  			idpf_finalize_ptype_lookup(&rx_pt);
> @@ -3279,13 +2876,18 @@ static int idpf_send_get_rx_ptype_msg(struct idpf_adapter *adapter)
>  			if (!singleq_pt_lkup[pt_8].outer_ip)
>  				singleq_pt_lkup[pt_8] = rx_pt;
>  		}
> +
> +		libie_ctlq_release_rx_buf(&xn_params.recv_mem);
> +		xn_params.recv_mem = (struct kvec) {};
>  	}
>  
>  out:
>  	adapter->splitq_pt_lkup = no_free_ptr(splitq_pt_lkup);
>  	adapter->singleq_pt_lkup = no_free_ptr(singleq_pt_lkup);
> +free_rx_buf:
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
>  
> -	return 0;
> +	return err;
>  }
>  
>  /**
> @@ -3313,40 +2915,24 @@ static void idpf_rel_rx_pt_lkup(struct idpf_adapter *adapter)
>  int idpf_send_ena_dis_loopback_msg(struct idpf_adapter *adapter, u32 vport_id,
>  				   bool loopback_ena)
>  {
> -	struct idpf_vc_xn_params xn_params = {};
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.timeout_ms	= IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +		.chnl_opcode	= VIRTCHNL2_OP_LOOPBACK,
> +	};
>  	struct virtchnl2_loopback loopback;
> -	ssize_t reply_sz;
> +	int err;
>  
>  	loopback.vport_id = cpu_to_le32(vport_id);
>  	loopback.enable = loopback_ena;
>  
> -	xn_params.vc_op = VIRTCHNL2_OP_LOOPBACK;
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> -	xn_params.send_buf.iov_base = &loopback;
> -	xn_params.send_buf.iov_len = sizeof(loopback);
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -
> -	return reply_sz < 0 ? reply_sz : 0;
> -}
> -
> -/**
> - * idpf_find_ctlq - Given a type and id, find ctlq info
> - * @hw: hardware struct
> - * @type: type of ctrlq to find
> - * @id: ctlq id to find
> - *
> - * Returns pointer to found ctlq info struct, NULL otherwise.
> - */
> -static struct idpf_ctlq_info *idpf_find_ctlq(struct idpf_hw *hw,
> -					     enum idpf_ctlq_type type, int id)
> -{
> -	struct idpf_ctlq_info *cq, *tmp;
> +	err = idpf_send_mb_msg(adapter, &xn_params, &loopback,
> +			       sizeof(loopback));
> +	if (err)
> +		return err;
>  
> -	list_for_each_entry_safe(cq, tmp, &hw->cq_list_head, cq_list)
> -		if (cq->q_id == id && cq->cq_type == type)
> -			return cq;
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
>  
> -	return NULL;
> +	return 0;
>  }
>  
>  /**
> @@ -3357,40 +2943,43 @@ static struct idpf_ctlq_info *idpf_find_ctlq(struct idpf_hw *hw,
>   */
>  int idpf_init_dflt_mbx(struct idpf_adapter *adapter)
>  {
> -	struct idpf_ctlq_create_info ctlq_info[] = {
> +	struct libie_ctlq_ctx *ctx = &adapter->ctlq_ctx;
> +	struct libie_ctlq_create_info ctlq_info[] = {
>  		{
> -			.type = IDPF_CTLQ_TYPE_MAILBOX_TX,
> -			.id = IDPF_DFLT_MBX_ID,
> +			.type = LIBIE_CTLQ_TYPE_TX,
> +			.id = LIBIE_CTLQ_MBX_ID,
>  			.len = IDPF_DFLT_MBX_Q_LEN,
> -			.buf_size = IDPF_CTLQ_MAX_BUF_LEN
>  		},
>  		{
> -			.type = IDPF_CTLQ_TYPE_MAILBOX_RX,
> -			.id = IDPF_DFLT_MBX_ID,
> +			.type = LIBIE_CTLQ_TYPE_RX,
> +			.id = LIBIE_CTLQ_MBX_ID,
>  			.len = IDPF_DFLT_MBX_Q_LEN,
> -			.buf_size = IDPF_CTLQ_MAX_BUF_LEN
>  		}
>  	};
> -	struct idpf_hw *hw = &adapter->hw;
> +	struct libie_ctlq_xn_init_params params = {
> +		.num_qs = IDPF_NUM_DFLT_MBX_Q,
> +		.cctlq_info = ctlq_info,
> +		.ctx = ctx,
> +	};
>  	int err;
>  
> -	adapter->dev_ops.reg_ops.ctlq_reg_init(adapter, ctlq_info);
> +	adapter->dev_ops.reg_ops.ctlq_reg_init(&ctx->mmio_info,
> +					       params.cctlq_info);
>  
> -	err = idpf_ctlq_init(hw, IDPF_NUM_DFLT_MBX_Q, ctlq_info);
> +	err = libie_ctlq_xn_init(&params);
>  	if (err)
>  		return err;
>  
> -	hw->asq = idpf_find_ctlq(hw, IDPF_CTLQ_TYPE_MAILBOX_TX,
> -				 IDPF_DFLT_MBX_ID);
> -	hw->arq = idpf_find_ctlq(hw, IDPF_CTLQ_TYPE_MAILBOX_RX,
> -				 IDPF_DFLT_MBX_ID);
> -
> -	if (!hw->asq || !hw->arq) {
> -		idpf_ctlq_deinit(hw);
> -
> +	adapter->asq = libie_find_ctlq(ctx, LIBIE_CTLQ_TYPE_TX,
> +				       LIBIE_CTLQ_MBX_ID);
> +	adapter->arq = libie_find_ctlq(ctx, LIBIE_CTLQ_TYPE_RX,
> +				       LIBIE_CTLQ_MBX_ID);
> +	if (!adapter->asq || !adapter->arq) {
> +		libie_ctlq_xn_deinit(params.xnm, ctx);
>  		return -ENOENT;
>  	}
>  
> +	adapter->xnm = params.xnm;
>  	adapter->state = __IDPF_VER_CHECK;
>  
>  	return 0;
> @@ -3402,12 +2991,13 @@ int idpf_init_dflt_mbx(struct idpf_adapter *adapter)
>   */
>  void idpf_deinit_dflt_mbx(struct idpf_adapter *adapter)
>  {
> -	if (adapter->hw.arq && adapter->hw.asq) {
> -		idpf_mb_clean(adapter, adapter->hw.asq);
> -		idpf_ctlq_deinit(&adapter->hw);
> +	if (adapter->arq && adapter->asq) {
> +		idpf_mb_clean(adapter, adapter->asq, true);
> +		libie_ctlq_xn_deinit(adapter->xnm, &adapter->ctlq_ctx);
>  	}
> -	adapter->hw.arq = NULL;
> -	adapter->hw.asq = NULL;
> +
> +	adapter->arq = NULL;
> +	adapter->asq = NULL;
>  }
>  
>  /**
> @@ -3478,15 +3068,6 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
>  	u16 num_max_vports;
>  	int err = 0;
>  
> -	if (!adapter->vcxn_mngr) {
> -		adapter->vcxn_mngr = kzalloc_obj(*adapter->vcxn_mngr);
> -		if (!adapter->vcxn_mngr) {
> -			err = -ENOMEM;
> -			goto init_failed;
> -		}
> -	}
> -	idpf_vc_xn_init(adapter->vcxn_mngr);
> -
>  	while (adapter->state != __IDPF_INIT_SW) {
>  		switch (adapter->state) {
>  		case __IDPF_VER_CHECK:
> @@ -3633,8 +3214,7 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
>  	 * the mailbox again
>  	 */
>  	adapter->state = __IDPF_VER_CHECK;
> -	if (adapter->vcxn_mngr)
> -		idpf_vc_xn_shutdown(adapter->vcxn_mngr);
> +	idpf_deinit_dflt_mbx(adapter);
>  	set_bit(IDPF_HR_DRV_LOAD, adapter->flags);
>  	queue_delayed_work(adapter->vc_event_wq, &adapter->vc_event_task,
>  			   msecs_to_jiffies(task_delay));
> @@ -3657,7 +3237,7 @@ void idpf_vc_core_deinit(struct idpf_adapter *adapter)
>  	/* Avoid transaction timeouts when called during reset */
>  	remove_in_prog = test_bit(IDPF_REMOVE_IN_PROG, adapter->flags);
>  	if (!remove_in_prog)
> -		idpf_vc_xn_shutdown(adapter->vcxn_mngr);
> +		idpf_deinit_dflt_mbx(adapter);
>  
>  	idpf_ptp_release(adapter);
>  	idpf_deinit_task(adapter);
> @@ -3666,7 +3246,7 @@ void idpf_vc_core_deinit(struct idpf_adapter *adapter)
>  	idpf_intr_rel(adapter);
>  
>  	if (remove_in_prog)
> -		idpf_vc_xn_shutdown(adapter->vcxn_mngr);
> +		idpf_deinit_dflt_mbx(adapter);
>  
>  	cancel_delayed_work_sync(&adapter->serv_task);
>  	cancel_delayed_work_sync(&adapter->mbx_task);
> @@ -4203,9 +3783,9 @@ static void idpf_set_mac_type(const u8 *default_mac_addr,
>  
>  /**
>   * idpf_mac_filter_async_handler - Async callback for mac filters
> - * @adapter: private data struct
> - * @xn: transaction for message
> - * @ctlq_msg: received message
> + * @ctx: controlq context structure
> + * @buff: response buffer pointer and size
> + * @status: async call return value
>   *
>   * In some scenarios driver can't sleep and wait for a reply (e.g.: stack is
>   * holding rtnl_lock) when adding a new mac filter. It puts us in a difficult
> @@ -4213,13 +3793,14 @@ static void idpf_set_mac_type(const u8 *default_mac_addr,
>   * ultimately do is remove it from our list of mac filters and report the
>   * error.
>   */
> -static int idpf_mac_filter_async_handler(struct idpf_adapter *adapter,
> -					 struct idpf_vc_xn *xn,
> -					 const struct idpf_ctlq_msg *ctlq_msg)
> +static void idpf_mac_filter_async_handler(void *ctx,
> +					  struct kvec *buff,
> +					  int status)
>  {
>  	struct virtchnl2_mac_addr_list *ma_list;
>  	struct idpf_vport_config *vport_config;
>  	struct virtchnl2_mac_addr *mac_addr;
> +	struct idpf_adapter *adapter = ctx;
>  	struct idpf_mac_filter *f, *tmp;
>  	struct list_head *ma_list_head;
>  	struct idpf_vport *vport;
> @@ -4227,18 +3808,18 @@ static int idpf_mac_filter_async_handler(struct idpf_adapter *adapter,
>  	int i;
>  
>  	/* if success we're done, we're only here if something bad happened */
> -	if (!ctlq_msg->cookie.mbx.chnl_retval)
> -		return 0;
> +	if (!status || status == -ETIMEDOUT)
> +		return;
>  
> +	ma_list = buff->iov_base;
>  	/* make sure at least struct is there */
> -	if (xn->reply_sz < sizeof(*ma_list))
> +	if (buff->iov_len < sizeof(*ma_list))
>  		goto invalid_payload;
>  
> -	ma_list = ctlq_msg->ctx.indirect.payload->va;
>  	mac_addr = ma_list->mac_addr_list;
>  	num_entries = le16_to_cpu(ma_list->num_mac_addr);
>  	/* we should have received a buffer at least this big */
> -	if (xn->reply_sz < struct_size(ma_list, mac_addr_list, num_entries))
> +	if (buff->iov_len < struct_size(ma_list, mac_addr_list, num_entries))
>  		goto invalid_payload;
>  
>  	vport = idpf_vid_to_vport(adapter, le32_to_cpu(ma_list->vport_id));
> @@ -4258,16 +3839,13 @@ static int idpf_mac_filter_async_handler(struct idpf_adapter *adapter,
>  			if (ether_addr_equal(mac_addr[i].addr, f->macaddr))
>  				list_del(&f->list);
>  	spin_unlock_bh(&vport_config->mac_filter_list_lock);
> -	dev_err_ratelimited(&adapter->pdev->dev, "Received error sending MAC filter request (op %d)\n",
> -			    xn->vc_op);
> -
> -	return 0;
> +	dev_err_ratelimited(&adapter->pdev->dev, "Received error %d on sending MAC filter request\n",
> +			    status);
> +	return;
>  
>  invalid_payload:
> -	dev_err_ratelimited(&adapter->pdev->dev, "Received invalid MAC filter payload (op %d) (len %zd)\n",
> -			    xn->vc_op, xn->reply_sz);
> -
> -	return -EINVAL;
> +	dev_err_ratelimited(&adapter->pdev->dev, "Received invalid MAC filter payload (len %zd)\n",
> +			    buff->iov_len);
>  }
>  
>  /**
> @@ -4286,19 +3864,21 @@ int idpf_add_del_mac_filters(struct idpf_adapter *adapter,
>  			     const u8 *default_mac_addr, u32 vport_id,
>  			     bool add, bool async)
>  {
> -	struct virtchnl2_mac_addr_list *ma_list __free(kfree) = NULL;
>  	struct virtchnl2_mac_addr *mac_addr __free(kfree) = NULL;
> -	struct idpf_vc_xn_params xn_params = {};
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.timeout_ms	= IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +		.chnl_opcode	= add ? VIRTCHNL2_OP_ADD_MAC_ADDR :
> +					VIRTCHNL2_OP_DEL_MAC_ADDR,
> +	};
> +	struct virtchnl2_mac_addr_list *ma_list;
>  	u32 num_msgs, total_filters = 0;
>  	struct idpf_mac_filter *f;
> -	ssize_t reply_sz;
> -	int i = 0, k;
> +	int i = 0;
>  
> -	xn_params.vc_op = add ? VIRTCHNL2_OP_ADD_MAC_ADDR :
> -				VIRTCHNL2_OP_DEL_MAC_ADDR;
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> -	xn_params.async = async;
> -	xn_params.async_handler = idpf_mac_filter_async_handler;
> +	if (async) {
> +		xn_params.resp_cb = idpf_mac_filter_async_handler;
> +		xn_params.send_ctx = adapter;
> +	}
>  
>  	spin_lock_bh(&vport_config->mac_filter_list_lock);
>  
> @@ -4353,32 +3933,31 @@ int idpf_add_del_mac_filters(struct idpf_adapter *adapter,
>  	 */
>  	num_msgs = DIV_ROUND_UP(total_filters, IDPF_NUM_FILTERS_PER_MSG);
>  
> -	for (i = 0, k = 0; i < num_msgs; i++) {
> -		u32 entries_size, buf_size, num_entries;
> +	for (u32 i = 0, k = 0; i < num_msgs; i++) {
> +		u32 entries_size, num_entries;
> +		size_t buf_size;
> +		int err;
>  
>  		num_entries = min_t(u32, total_filters,
>  				    IDPF_NUM_FILTERS_PER_MSG);
>  		entries_size = sizeof(struct virtchnl2_mac_addr) * num_entries;
>  		buf_size = struct_size(ma_list, mac_addr_list, num_entries);
>  
> -		if (!ma_list || num_entries != IDPF_NUM_FILTERS_PER_MSG) {
> -			kfree(ma_list);
> -			ma_list = kzalloc(buf_size, GFP_ATOMIC);
> -			if (!ma_list)
> -				return -ENOMEM;
> -		} else {
> -			memset(ma_list, 0, buf_size);
> -		}
> +		ma_list = kzalloc(buf_size, GFP_ATOMIC);
> +		if (!ma_list)
> +			return -ENOMEM;
>  
>  		ma_list->vport_id = cpu_to_le32(vport_id);
>  		ma_list->num_mac_addr = cpu_to_le16(num_entries);
>  		memcpy(ma_list->mac_addr_list, &mac_addr[k], entries_size);
>  
> -		xn_params.send_buf.iov_base = ma_list;
> -		xn_params.send_buf.iov_len = buf_size;
> -		reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -		if (reply_sz < 0)
> -			return reply_sz;
> +		err = idpf_send_mb_msg_kfree(adapter, &xn_params, ma_list,
> +					     buf_size);
> +		if (err)
> +			return err;
> +
> +		if (!async)
> +			libie_ctlq_release_rx_buf(&xn_params.recv_mem);
>  
>  		k += num_entries;
>  		total_filters -= num_entries;
> @@ -4387,6 +3966,26 @@ int idpf_add_del_mac_filters(struct idpf_adapter *adapter,
>  	return 0;
>  }
>  
> +/**
> + * idpf_promiscuous_async_handler - async callback for promiscuous mode
> + * @ctx: controlq context structure
> + * @buff: response buffer pointer and size
> + * @status: async call return value
> + *
> + * Nobody is waiting for the promiscuous virtchnl message response. Print
> + * an error message if something went wrong and return.
> + */
> +static void idpf_promiscuous_async_handler(void *ctx,
> +					   struct kvec *buff,
> +					   int status)
> +{
> +	struct idpf_adapter *adapter = ctx;
> +
> +	if (status)
> +		dev_err_ratelimited(&adapter->pdev->dev, "Failed to set promiscuous mode: %d\n",
> +				    status);
> +}
> +
>  /**
>   * idpf_set_promiscuous - set promiscuous and send message to mailbox
>   * @adapter: Driver specific private structure
> @@ -4401,9 +4000,13 @@ int idpf_set_promiscuous(struct idpf_adapter *adapter,
>  			 struct idpf_vport_user_config_data *config_data,
>  			 u32 vport_id)
>  {
> -	struct idpf_vc_xn_params xn_params = {};
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.timeout_ms	= IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +		.chnl_opcode	= VIRTCHNL2_OP_CONFIG_PROMISCUOUS_MODE,
> +		.resp_cb	= idpf_promiscuous_async_handler,
> +		.send_ctx	= adapter,
> +	};
>  	struct virtchnl2_promisc_info vpi;
> -	ssize_t reply_sz;
>  	u16 flags = 0;
>  
>  	if (test_bit(__IDPF_PROMISC_UC, config_data->user_flags))
> @@ -4414,15 +4017,7 @@ int idpf_set_promiscuous(struct idpf_adapter *adapter,
>  	vpi.vport_id = cpu_to_le32(vport_id);
>  	vpi.flags = cpu_to_le16(flags);
>  
> -	xn_params.vc_op = VIRTCHNL2_OP_CONFIG_PROMISCUOUS_MODE;
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> -	xn_params.send_buf.iov_base = &vpi;
> -	xn_params.send_buf.iov_len = sizeof(vpi);
> -	/* setting promiscuous is only ever done asynchronously */
> -	xn_params.async = true;
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -
> -	return reply_sz < 0 ? reply_sz : 0;
> +	return idpf_send_mb_msg(adapter, &xn_params, &vpi, sizeof(vpi));
>  }
>  
>  /**
> @@ -4440,26 +4035,39 @@ int idpf_idc_rdma_vc_send_sync(struct iidc_rdma_core_dev_info *cdev_info,
>  			       u8 *recv_msg, u16 *recv_len)
>  {
>  	struct idpf_adapter *adapter = pci_get_drvdata(cdev_info->pdev);
> -	struct idpf_vc_xn_params xn_params = { };
> -	ssize_t reply_sz;
> -	u16 recv_size;
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.chnl_opcode = VIRTCHNL2_OP_RDMA,
> +		.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> +	};
> +	u8 on_stack_buf[LIBIE_CP_TX_COPYBREAK];
> +	void *send_buf;
> +	int err;
>  
> -	if (!recv_msg || !recv_len || msg_size > IDPF_CTLQ_MAX_BUF_LEN)
> +	if (!recv_msg || !recv_len || msg_size > LIBIE_CTLQ_MAX_BUF_LEN)
>  		return -EINVAL;
>  
> -	recv_size = min_t(u16, *recv_len, IDPF_CTLQ_MAX_BUF_LEN);
> -	*recv_len = 0;
> -	xn_params.vc_op = VIRTCHNL2_OP_RDMA;
> -	xn_params.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC;
> -	xn_params.send_buf.iov_base = send_msg;
> -	xn_params.send_buf.iov_len = msg_size;
> -	xn_params.recv_buf.iov_base = recv_msg;
> -	xn_params.recv_buf.iov_len = recv_size;
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -	if (reply_sz < 0)
> -		return reply_sz;
> -	*recv_len = reply_sz;
> +	if (!libie_cp_can_send_onstack(msg_size)) {
> +		send_buf = kzalloc(msg_size, GFP_KERNEL);
> +		if (!send_buf)
> +			return -ENOMEM;
> +	} else {
> +		send_buf = on_stack_buf;
> +	}
>  
> -	return 0;
> +	memcpy(send_buf, send_msg, msg_size);
> +	err = idpf_send_mb_msg(adapter, &xn_params, send_buf, msg_size);
> +	if (err)
> +		return err;
> +
> +	if (xn_params.recv_mem.iov_len > *recv_len) {
> +		err = -EINVAL;
> +		goto rel_buf;
> +	}
> +
> +	*recv_len = xn_params.recv_mem.iov_len;
> +	memcpy(recv_msg, xn_params.recv_mem.iov_base, *recv_len);
> +rel_buf:
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
> +	return err;
>  }
>  EXPORT_SYMBOL_GPL(idpf_idc_rdma_vc_send_sync);
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.h b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.h
> index 6992b768cef4..86a44b6e1488 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.h
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.h
> @@ -7,86 +7,6 @@
>  #include <linux/intel/virtchnl2.h>
>  
>  #define IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC	(60 * 1000)
> -#define IDPF_VC_XN_IDX_M		GENMASK(7, 0)
> -#define IDPF_VC_XN_SALT_M		GENMASK(15, 8)
> -#define IDPF_VC_XN_RING_LEN		U8_MAX
> -
> -/**
> - * enum idpf_vc_xn_state - Virtchnl transaction status
> - * @IDPF_VC_XN_IDLE: not expecting a reply, ready to be used
> - * @IDPF_VC_XN_WAITING: expecting a reply, not yet received
> - * @IDPF_VC_XN_COMPLETED_SUCCESS: a reply was expected and received, buffer
> - *				  updated
> - * @IDPF_VC_XN_COMPLETED_FAILED: a reply was expected and received, but there
> - *				 was an error, buffer not updated
> - * @IDPF_VC_XN_SHUTDOWN: transaction object cannot be used, VC torn down
> - * @IDPF_VC_XN_ASYNC: transaction sent asynchronously and doesn't have the
> - *		      return context; a callback may be provided to handle
> - *		      return
> - */
> -enum idpf_vc_xn_state {
> -	IDPF_VC_XN_IDLE = 1,
> -	IDPF_VC_XN_WAITING,
> -	IDPF_VC_XN_COMPLETED_SUCCESS,
> -	IDPF_VC_XN_COMPLETED_FAILED,
> -	IDPF_VC_XN_SHUTDOWN,
> -	IDPF_VC_XN_ASYNC,
> -};
> -
> -struct idpf_vc_xn;
> -/* Callback for asynchronous messages */
> -typedef int (*async_vc_cb) (struct idpf_adapter *, struct idpf_vc_xn *,
> -			    const struct idpf_ctlq_msg *);
> -
> -/**
> - * struct idpf_vc_xn - Data structure representing virtchnl transactions
> - * @completed: virtchnl event loop uses that to signal when a reply is
> - *	       available, uses kernel completion API
> - * @lock: protects the transaction state fields below
> - * @state: virtchnl event loop stores the data below, protected by @lock
> - * @reply_sz: Original size of reply, may be > reply_buf.iov_len; it will be
> - *	      truncated on its way to the receiver thread according to
> - *	      reply_buf.iov_len.
> - * @reply: Reference to the buffer(s) where the reply data should be written
> - *	   to. May be 0-length (then NULL address permitted) if the reply data
> - *	   should be ignored.
> - * @async_handler: if sent asynchronously, a callback can be provided to handle
> - *		   the reply when it's received
> - * @vc_op: corresponding opcode sent with this transaction
> - * @idx: index used as retrieval on reply receive, used for cookie
> - * @salt: changed every message to make unique, used for cookie
> - */
> -struct idpf_vc_xn {
> -	struct completion completed;
> -	spinlock_t lock;
> -	enum idpf_vc_xn_state state;
> -	size_t reply_sz;
> -	struct kvec reply;
> -	async_vc_cb async_handler;
> -	u32 vc_op;
> -	u8 idx;
> -	u8 salt;
> -};
> -
> -/**
> - * struct idpf_vc_xn_params - Parameters for executing transaction
> - * @send_buf: kvec for send buffer
> - * @recv_buf: kvec for recv buffer, may be NULL, must then have zero length
> - * @timeout_ms: timeout to wait for reply
> - * @async: send message asynchronously, will not wait on completion
> - * @async_handler: If sent asynchronously, optional callback handler. The user
> - *		   must be careful when using async handlers as the memory for
> - *		   the recv_buf _cannot_ be on stack if this is async.
> - * @vc_op: virtchnl op to send
> - */
> -struct idpf_vc_xn_params {
> -	struct kvec send_buf;
> -	struct kvec recv_buf;
> -	int timeout_ms;
> -	bool async;
> -	async_vc_cb async_handler;
> -	u32 vc_op;
> -};
>  
>  struct idpf_adapter;
>  struct idpf_netdev_priv;
> @@ -96,8 +16,6 @@ struct idpf_vport_max_q;
>  struct idpf_vport_config;
>  struct idpf_vport_user_config_data;
>  
> -ssize_t idpf_vc_xn_exec(struct idpf_adapter *adapter,
> -			const struct idpf_vc_xn_params *params);
>  int idpf_init_dflt_mbx(struct idpf_adapter *adapter);
>  void idpf_deinit_dflt_mbx(struct idpf_adapter *adapter);
>  int idpf_vc_core_init(struct idpf_adapter *adapter);
> @@ -124,9 +42,14 @@ bool idpf_sideband_action_ena(struct idpf_vport *vport,
>  			      struct ethtool_rx_flow_spec *fsp);
>  unsigned int idpf_fsteer_max_rules(struct idpf_vport *vport);
>  
> -int idpf_recv_mb_msg(struct idpf_adapter *adapter, struct idpf_ctlq_info *arq);
> -int idpf_send_mb_msg(struct idpf_adapter *adapter, struct idpf_ctlq_info *asq,
> -		     u32 op, u16 msg_size, u8 *msg, u16 cookie);
> +void idpf_recv_event_msg(struct libie_ctlq_ctx *ctx,
> +			 struct libie_ctlq_msg *ctlq_msg);
> +int idpf_send_mb_msg(struct idpf_adapter *adapter,
> +		     struct libie_ctlq_xn_send_params *xn_params,
> +		     void *send_buf, size_t send_buf_size);
> +int idpf_send_mb_msg_kfree(struct idpf_adapter *adapter,
> +			   struct libie_ctlq_xn_send_params *xn_params,
> +			   void *send_buf, size_t send_buf_size);
>  
>  struct idpf_queue_ptr {
>  	enum virtchnl2_queue_type	type;
> @@ -214,7 +137,6 @@ int idpf_send_get_set_rss_key_msg(struct idpf_adapter *adapter,
>  int idpf_send_get_set_rss_lut_msg(struct idpf_adapter *adapter,
>  				  struct idpf_rss_data *rss_data,
>  				  u32 vport_id, bool get);
> -void idpf_vc_xn_shutdown(struct idpf_vc_xn_manager *vcxn_mngr);
>  int idpf_idc_rdma_vc_send_sync(struct iidc_rdma_core_dev_info *cdev_info,
>  			       u8 *send_msg, u16 msg_size,
>  			       u8 *recv_msg, u16 *recv_len);
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl_ptp.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl_ptp.c
> index 8d8fb498e092..6d44021c222b 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl_ptp.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl_ptp.c
> @@ -15,7 +15,6 @@
>   */
>  int idpf_ptp_get_caps(struct idpf_adapter *adapter)
>  {
> -	struct virtchnl2_ptp_get_caps *recv_ptp_caps_msg __free(kfree) = NULL;
>  	struct virtchnl2_ptp_get_caps send_ptp_caps_msg = {
>  		.caps = cpu_to_le32(VIRTCHNL2_CAP_PTP_GET_DEVICE_CLK_TIME |
>  				    VIRTCHNL2_CAP_PTP_GET_DEVICE_CLK_TIME_MB |
> @@ -24,34 +23,34 @@ int idpf_ptp_get_caps(struct idpf_adapter *adapter)
>  				    VIRTCHNL2_CAP_PTP_ADJ_DEVICE_CLK_MB |
>  				    VIRTCHNL2_CAP_PTP_TX_TSTAMPS_MB)
>  	};
> -	struct idpf_vc_xn_params xn_params = {
> -		.vc_op = VIRTCHNL2_OP_PTP_GET_CAPS,
> -		.send_buf.iov_base = &send_ptp_caps_msg,
> -		.send_buf.iov_len = sizeof(send_ptp_caps_msg),
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.chnl_opcode = VIRTCHNL2_OP_PTP_GET_CAPS,
>  		.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
>  	};
>  	struct virtchnl2_ptp_cross_time_reg_offsets cross_tstamp_offsets;
>  	struct libie_mmio_info *mmio = &adapter->ctlq_ctx.mmio_info;
>  	struct virtchnl2_ptp_clk_adj_reg_offsets clk_adj_offsets;
>  	struct virtchnl2_ptp_clk_reg_offsets clock_offsets;
> +	struct virtchnl2_ptp_get_caps *recv_ptp_caps_msg;
>  	struct idpf_ptp_secondary_mbx *scnd_mbx;
>  	struct idpf_ptp *ptp = adapter->ptp;
>  	enum idpf_ptp_access access_type;
>  	u32 temp_offset;
> -	int reply_sz;
> +	size_t reply_sz;
> +	int err;
>  
> -	recv_ptp_caps_msg = kzalloc_obj(struct virtchnl2_ptp_get_caps);
> -	if (!recv_ptp_caps_msg)
> -		return -ENOMEM;
> +	err = idpf_send_mb_msg(adapter, &xn_params, &send_ptp_caps_msg,
> +			       sizeof(send_ptp_caps_msg));
> +	if (err)
> +		return err;
>  
> -	xn_params.recv_buf.iov_base = recv_ptp_caps_msg;
> -	xn_params.recv_buf.iov_len = sizeof(*recv_ptp_caps_msg);
> +	reply_sz = xn_params.recv_mem.iov_len;
> +	if (reply_sz != sizeof(*recv_ptp_caps_msg)) {
> +		err = -EIO;
> +		goto free_resp;
> +	}
>  
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -	if (reply_sz < 0)
> -		return reply_sz;
> -	else if (reply_sz != sizeof(*recv_ptp_caps_msg))
> -		return -EIO;
> +	recv_ptp_caps_msg = xn_params.recv_mem.iov_base;
>  
>  	ptp->caps = le32_to_cpu(recv_ptp_caps_msg->caps);
>  	ptp->base_incval = le64_to_cpu(recv_ptp_caps_msg->base_incval);
> @@ -112,7 +111,7 @@ int idpf_ptp_get_caps(struct idpf_adapter *adapter)
>  discipline_clock:
>  	access_type = ptp->adj_dev_clk_time_access;
>  	if (access_type != IDPF_PTP_DIRECT)
> -		return 0;
> +		goto free_resp;
>  
>  	clk_adj_offsets = recv_ptp_caps_msg->clk_adj_offsets;
>  
> @@ -145,7 +144,9 @@ int idpf_ptp_get_caps(struct idpf_adapter *adapter)
>  	ptp->dev_clk_regs.phy_shadj_h =
>  		libie_pci_get_mmio_addr(mmio, temp_offset);
>  
> -	return 0;
> +free_resp:
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
> +	return err;
>  }
>  
>  /**
> @@ -160,28 +161,34 @@ int idpf_ptp_get_caps(struct idpf_adapter *adapter)
>  int idpf_ptp_get_dev_clk_time(struct idpf_adapter *adapter,
>  			      struct idpf_ptp_dev_timers *dev_clk_time)
>  {
> +	struct virtchnl2_ptp_get_dev_clk_time *get_dev_clk_time_resp;
>  	struct virtchnl2_ptp_get_dev_clk_time get_dev_clk_time_msg;
> -	struct idpf_vc_xn_params xn_params = {
> -		.vc_op = VIRTCHNL2_OP_PTP_GET_DEV_CLK_TIME,
> -		.send_buf.iov_base = &get_dev_clk_time_msg,
> -		.send_buf.iov_len = sizeof(get_dev_clk_time_msg),
> -		.recv_buf.iov_base = &get_dev_clk_time_msg,
> -		.recv_buf.iov_len = sizeof(get_dev_clk_time_msg),
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.chnl_opcode = VIRTCHNL2_OP_PTP_GET_DEV_CLK_TIME,
>  		.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
>  	};
> -	int reply_sz;
> +	size_t reply_sz;
>  	u64 dev_time;
> +	int err;
>  
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -	if (reply_sz < 0)
> -		return reply_sz;
> -	if (reply_sz != sizeof(get_dev_clk_time_msg))
> -		return -EIO;
> +	err = idpf_send_mb_msg(adapter, &xn_params, &get_dev_clk_time_msg,
> +			       sizeof(get_dev_clk_time_msg));
> +	if (err)
> +		return err;
>  
> -	dev_time = le64_to_cpu(get_dev_clk_time_msg.dev_time_ns);
> +	reply_sz = xn_params.recv_mem.iov_len;
> +	if (reply_sz != sizeof(*get_dev_clk_time_resp)) {
> +		err = -EIO;
> +		goto free_resp;
> +	}
> +
> +	get_dev_clk_time_resp = xn_params.recv_mem.iov_base;
> +	dev_time = le64_to_cpu(get_dev_clk_time_resp->dev_time_ns);
>  	dev_clk_time->dev_clk_time_ns = dev_time;
>  
> -	return 0;
> +free_resp:
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
> +	return err;
>  }
>  
>  /**
> @@ -197,27 +204,30 @@ int idpf_ptp_get_dev_clk_time(struct idpf_adapter *adapter,
>  int idpf_ptp_get_cross_time(struct idpf_adapter *adapter,
>  			    struct idpf_ptp_dev_timers *cross_time)
>  {
> -	struct virtchnl2_ptp_get_cross_time cross_time_msg;
> -	struct idpf_vc_xn_params xn_params = {
> -		.vc_op = VIRTCHNL2_OP_PTP_GET_CROSS_TIME,
> -		.send_buf.iov_base = &cross_time_msg,
> -		.send_buf.iov_len = sizeof(cross_time_msg),
> -		.recv_buf.iov_base = &cross_time_msg,
> -		.recv_buf.iov_len = sizeof(cross_time_msg),
> +	struct virtchnl2_ptp_get_cross_time cross_time_send, *cross_time_recv;
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.chnl_opcode = VIRTCHNL2_OP_PTP_GET_CROSS_TIME,
>  		.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
>  	};
> -	int reply_sz;
> +	int err = 0;
>  
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -	if (reply_sz < 0)
> -		return reply_sz;
> -	if (reply_sz != sizeof(cross_time_msg))
> -		return -EIO;
> +	err = idpf_send_mb_msg(adapter, &xn_params, &cross_time_send,
> +			       sizeof(cross_time_send));
> +	if (err)
> +		return err;
>  
> -	cross_time->dev_clk_time_ns = le64_to_cpu(cross_time_msg.dev_time_ns);
> -	cross_time->sys_time_ns = le64_to_cpu(cross_time_msg.sys_time_ns);
> +	if (xn_params.recv_mem.iov_len != sizeof(*cross_time_recv)) {
> +		err = -EIO;
> +		goto free_resp;
> +	}
>  
> -	return 0;
> +	cross_time_recv = xn_params.recv_mem.iov_base;
> +	cross_time->dev_clk_time_ns = le64_to_cpu(cross_time_recv->dev_time_ns);
> +	cross_time->sys_time_ns = le64_to_cpu(cross_time_recv->sys_time_ns);
> +
> +free_resp:
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
> +	return err;
>  }
>  
>  /**
> @@ -234,23 +244,18 @@ int idpf_ptp_set_dev_clk_time(struct idpf_adapter *adapter, u64 time)
>  	struct virtchnl2_ptp_set_dev_clk_time set_dev_clk_time_msg = {
>  		.dev_time_ns = cpu_to_le64(time),
>  	};
> -	struct idpf_vc_xn_params xn_params = {
> -		.vc_op = VIRTCHNL2_OP_PTP_SET_DEV_CLK_TIME,
> -		.send_buf.iov_base = &set_dev_clk_time_msg,
> -		.send_buf.iov_len = sizeof(set_dev_clk_time_msg),
> -		.recv_buf.iov_base = &set_dev_clk_time_msg,
> -		.recv_buf.iov_len = sizeof(set_dev_clk_time_msg),
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.chnl_opcode = VIRTCHNL2_OP_PTP_SET_DEV_CLK_TIME,
>  		.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
>  	};
> -	int reply_sz;
> +	int err;
>  
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -	if (reply_sz < 0)
> -		return reply_sz;
> -	if (reply_sz != sizeof(set_dev_clk_time_msg))
> -		return -EIO;
> +	err = idpf_send_mb_msg(adapter, &xn_params, &set_dev_clk_time_msg,
> +			       sizeof(set_dev_clk_time_msg));
> +	if (!err)
> +		libie_ctlq_release_rx_buf(&xn_params.recv_mem);
>  
> -	return 0;
> +	return err;
>  }
>  
>  /**
> @@ -267,23 +272,18 @@ int idpf_ptp_adj_dev_clk_time(struct idpf_adapter *adapter, s64 delta)
>  	struct virtchnl2_ptp_adj_dev_clk_time adj_dev_clk_time_msg = {
>  		.delta = cpu_to_le64(delta),
>  	};
> -	struct idpf_vc_xn_params xn_params = {
> -		.vc_op = VIRTCHNL2_OP_PTP_ADJ_DEV_CLK_TIME,
> -		.send_buf.iov_base = &adj_dev_clk_time_msg,
> -		.send_buf.iov_len = sizeof(adj_dev_clk_time_msg),
> -		.recv_buf.iov_base = &adj_dev_clk_time_msg,
> -		.recv_buf.iov_len = sizeof(adj_dev_clk_time_msg),
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.chnl_opcode = VIRTCHNL2_OP_PTP_ADJ_DEV_CLK_TIME,
>  		.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
>  	};
> -	int reply_sz;
> +	int err;
>  
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -	if (reply_sz < 0)
> -		return reply_sz;
> -	if (reply_sz != sizeof(adj_dev_clk_time_msg))
> -		return -EIO;
> +	err = idpf_send_mb_msg(adapter, &xn_params, &adj_dev_clk_time_msg,
> +			       sizeof(adj_dev_clk_time_msg));
> +	if (!err)
> +		libie_ctlq_release_rx_buf(&xn_params.recv_mem);
>  
> -	return 0;
> +	return err;
>  }
>  
>  /**
> @@ -301,23 +301,18 @@ int idpf_ptp_adj_dev_clk_fine(struct idpf_adapter *adapter, u64 incval)
>  	struct virtchnl2_ptp_adj_dev_clk_fine adj_dev_clk_fine_msg = {
>  		.incval = cpu_to_le64(incval),
>  	};
> -	struct idpf_vc_xn_params xn_params = {
> -		.vc_op = VIRTCHNL2_OP_PTP_ADJ_DEV_CLK_FINE,
> -		.send_buf.iov_base = &adj_dev_clk_fine_msg,
> -		.send_buf.iov_len = sizeof(adj_dev_clk_fine_msg),
> -		.recv_buf.iov_base = &adj_dev_clk_fine_msg,
> -		.recv_buf.iov_len = sizeof(adj_dev_clk_fine_msg),
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.chnl_opcode = VIRTCHNL2_OP_PTP_ADJ_DEV_CLK_FINE,
>  		.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
>  	};
> -	int reply_sz;
> +	int err;
>  
> -	reply_sz = idpf_vc_xn_exec(adapter, &xn_params);
> -	if (reply_sz < 0)
> -		return reply_sz;
> -	if (reply_sz != sizeof(adj_dev_clk_fine_msg))
> -		return -EIO;
> +	err = idpf_send_mb_msg(adapter, &xn_params, &adj_dev_clk_fine_msg,
> +			       sizeof(adj_dev_clk_fine_msg));
> +	if (!err)
> +		libie_ctlq_release_rx_buf(&xn_params.recv_mem);
>  
> -	return 0;
> +	return err;
>  }
>  
>  /**
> @@ -336,18 +331,16 @@ int idpf_ptp_get_vport_tstamps_caps(struct idpf_vport *vport)
>  	struct virtchnl2_ptp_tx_tstamp_latch_caps tx_tstamp_latch_caps;
>  	struct idpf_ptp_vport_tx_tstamp_caps *tstamp_caps;
>  	struct idpf_ptp_tx_tstamp *ptp_tx_tstamp, *tmp;
> -	struct idpf_vc_xn_params xn_params = {
> -		.vc_op = VIRTCHNL2_OP_PTP_GET_VPORT_TX_TSTAMP_CAPS,
> -		.send_buf.iov_base = &send_tx_tstamp_caps,
> -		.send_buf.iov_len = sizeof(send_tx_tstamp_caps),
> -		.recv_buf.iov_len = IDPF_CTLQ_MAX_BUF_LEN,
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.chnl_opcode = VIRTCHNL2_OP_PTP_GET_VPORT_TX_TSTAMP_CAPS,
>  		.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
>  	};
>  	enum idpf_ptp_access tstamp_access, get_dev_clk_access;
>  	struct idpf_ptp *ptp = vport->adapter->ptp;
>  	struct list_head *head;
> -	int err = 0, reply_sz;
> +	size_t reply_sz;
>  	u16 num_latches;
> +	int err = 0;
>  	u32 size;
>  
>  	if (!ptp)
> @@ -359,19 +352,19 @@ int idpf_ptp_get_vport_tstamps_caps(struct idpf_vport *vport)
>  	    get_dev_clk_access == IDPF_PTP_NONE)
>  		return -EOPNOTSUPP;
>  
> -	rcv_tx_tstamp_caps = kzalloc(IDPF_CTLQ_MAX_BUF_LEN, GFP_KERNEL);
> -	if (!rcv_tx_tstamp_caps)
> -		return -ENOMEM;
> -
>  	send_tx_tstamp_caps.vport_id = cpu_to_le32(vport->vport_id);
> -	xn_params.recv_buf.iov_base = rcv_tx_tstamp_caps;
>  
> -	reply_sz = idpf_vc_xn_exec(vport->adapter, &xn_params);
> -	if (reply_sz < 0) {
> -		err = reply_sz;
> +	err = idpf_send_mb_msg(vport->adapter, &xn_params, &send_tx_tstamp_caps,
> +			       sizeof(send_tx_tstamp_caps));
> +	if (err)
> +		return err;
> +
> +	rcv_tx_tstamp_caps = xn_params.recv_mem.iov_base;
> +	reply_sz = xn_params.recv_mem.iov_len;
> +	if (reply_sz < sizeof(*rcv_tx_tstamp_caps)) {
> +		err = -EIO;
>  		goto get_tstamp_caps_out;
>  	}
> -
>  	num_latches = le16_to_cpu(rcv_tx_tstamp_caps->num_latches);
>  	size = struct_size(rcv_tx_tstamp_caps, tstamp_latches, num_latches);
>  	if (reply_sz != size) {
> @@ -426,7 +419,7 @@ int idpf_ptp_get_vport_tstamps_caps(struct idpf_vport *vport)
>  	}
>  
>  	vport->tx_tstamp_caps = tstamp_caps;
> -	kfree(rcv_tx_tstamp_caps);
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
>  
>  	return 0;
>  
> @@ -439,7 +432,7 @@ int idpf_ptp_get_vport_tstamps_caps(struct idpf_vport *vport)
>  
>  	kfree(tstamp_caps);
>  get_tstamp_caps_out:
> -	kfree(rcv_tx_tstamp_caps);
> +	libie_ctlq_release_rx_buf(&xn_params.recv_mem);
>  
>  	return err;
>  }
> @@ -536,9 +529,9 @@ idpf_ptp_get_tstamp_value(struct idpf_vport *vport,
>  
>  /**
>   * idpf_ptp_get_tx_tstamp_async_handler - Async callback for getting Tx tstamps
> - * @adapter: Driver specific private structure
> - * @xn: transaction for message
> - * @ctlq_msg: received message
> + * @ctx: adapter pointer
> + * @mem: address and size of the response
> + * @status: return value of the request
>   *
>   * Read the tstamps Tx tstamp values from a received message and put them
>   * directly to the skb. The number of timestamps to read is specified by
> @@ -546,22 +539,23 @@ idpf_ptp_get_tstamp_value(struct idpf_vport *vport,
>   *
>   * Return: 0 on success, -errno otherwise.
>   */
> -static int
> -idpf_ptp_get_tx_tstamp_async_handler(struct idpf_adapter *adapter,
> -				     struct idpf_vc_xn *xn,
> -				     const struct idpf_ctlq_msg *ctlq_msg)
> +static void
> +idpf_ptp_get_tx_tstamp_async_handler(void *ctx, struct kvec *mem, int status)
>  {
>  	struct virtchnl2_ptp_get_vport_tx_tstamp_latches *recv_tx_tstamp_msg;
>  	struct idpf_ptp_vport_tx_tstamp_caps *tx_tstamp_caps;
>  	struct virtchnl2_ptp_tx_tstamp_latch tstamp_latch;
>  	struct idpf_ptp_tx_tstamp *tx_tstamp, *tmp;
>  	struct idpf_vport *tstamp_vport = NULL;
> +	struct idpf_adapter *adapter = ctx;
>  	struct list_head *head;
>  	u16 num_latches;
>  	u32 vport_id;
> -	int err = 0;
>  
> -	recv_tx_tstamp_msg = ctlq_msg->ctx.indirect.payload->va;
> +	if (status)
> +		return;
> +
> +	recv_tx_tstamp_msg = mem->iov_base;
>  	vport_id = le32_to_cpu(recv_tx_tstamp_msg->vport_id);
>  
>  	idpf_for_each_vport(adapter, vport) {
> @@ -575,7 +569,7 @@ idpf_ptp_get_tx_tstamp_async_handler(struct idpf_adapter *adapter,
>  	}
>  
>  	if (!tstamp_vport || !tstamp_vport->tx_tstamp_caps)
> -		return -EINVAL;
> +		return;
>  
>  	tx_tstamp_caps = tstamp_vport->tx_tstamp_caps;
>  	num_latches = le16_to_cpu(recv_tx_tstamp_msg->num_latches);
> @@ -589,13 +583,13 @@ idpf_ptp_get_tx_tstamp_async_handler(struct idpf_adapter *adapter,
>  		if (!tstamp_latch.valid)
>  			continue;
>  
> -		if (list_empty(head)) {
> -			err = -ENOBUFS;
> +		if (list_empty(head))
>  			goto unlock;
> -		}
>  
>  		list_for_each_entry_safe(tx_tstamp, tmp, head, list_member) {
>  			if (tstamp_latch.index == tx_tstamp->idx) {
> +				int err;
> +
>  				list_del(&tx_tstamp->list_member);
>  				err = idpf_ptp_get_tstamp_value(tstamp_vport,
>  								&tstamp_latch,
> @@ -610,8 +604,6 @@ idpf_ptp_get_tx_tstamp_async_handler(struct idpf_adapter *adapter,
>  
>  unlock:
>  	spin_unlock_bh(&tx_tstamp_caps->latches_lock);
> -
> -	return err;
>  }
>  
>  /**
> @@ -627,15 +619,15 @@ int idpf_ptp_get_tx_tstamp(struct idpf_vport *vport)
>  {
>  	struct virtchnl2_ptp_get_vport_tx_tstamp_latches *send_tx_tstamp_msg;
>  	struct idpf_ptp_vport_tx_tstamp_caps *tx_tstamp_caps;
> -	struct idpf_vc_xn_params xn_params = {
> -		.vc_op = VIRTCHNL2_OP_PTP_GET_VPORT_TX_TSTAMP,
> +	struct libie_ctlq_xn_send_params xn_params = {
> +		.chnl_opcode = VIRTCHNL2_OP_PTP_GET_VPORT_TX_TSTAMP,
>  		.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
> -		.async = true,
> -		.async_handler = idpf_ptp_get_tx_tstamp_async_handler,
> +		.resp_cb = idpf_ptp_get_tx_tstamp_async_handler,
> +		.send_ctx = vport->adapter,
>  	};
>  	struct idpf_ptp_tx_tstamp *ptp_tx_tstamp;
> -	int reply_sz, size, msg_size;
>  	struct list_head *head;
> +	int size, msg_size;
>  	bool state_upd;
>  	u16 id = 0;
>  
> @@ -668,11 +660,7 @@ int idpf_ptp_get_tx_tstamp(struct idpf_vport *vport)
>  	msg_size = struct_size(send_tx_tstamp_msg, tstamp_latches, id);
>  	send_tx_tstamp_msg->vport_id = cpu_to_le32(vport->vport_id);
>  	send_tx_tstamp_msg->num_latches = cpu_to_le16(id);
> -	xn_params.send_buf.iov_base = send_tx_tstamp_msg;
> -	xn_params.send_buf.iov_len = msg_size;
> -
> -	reply_sz = idpf_vc_xn_exec(vport->adapter, &xn_params);
> -	kfree(send_tx_tstamp_msg);
>  
> -	return min(reply_sz, 0);
> +	return idpf_send_mb_msg_kfree(vport->adapter, &xn_params,
> +				      send_tx_tstamp_msg, msg_size);
>  }
> -- 
> 2.47.1
> 

^ permalink raw reply

* Re: [PATCH v4 04/30] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration
From: Dongli Zhang @ 2026-05-18  7:52 UTC (permalink / raw)
  To: David Woodhouse, kvm
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Thomas Gleixner,
	Sean Christopherson, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Dave Hansen, Vitaly Kuznetsov, x86, Marc Zyngier, Juergen Gross,
	Boris Ostrovsky, Paul Durrant, Jonathan Cameron, Sascha Bischoff,
	Jack Allister, Joey Gouly, joe.jin, linux-doc, linux-kernel,
	xen-devel, linux-kselftest
In-Reply-To: <20260509224824.3264567-5-dwmw2@infradead.org>



On 5/9/26 3:46 PM, David Woodhouse wrote:
> From: Jack Allister <jalliste@amazon.com>
> 
> In the common case (where kvm->arch.use_master_clock is true), the KVM
> clock is defined as a simple arithmetic function of the guest TSC, based
> on a reference point stored in kvm->arch.master_kernel_ns and
> kvm->arch.master_cycle_now.
> 
> The existing KVM_[GS]ET_CLOCK functionality does not allow for this
> relationship to be precisely saved and restored by userspace. All it can
> currently do is set the KVM clock at a given UTC reference time, which
> is necessarily imprecise.
> 
> So on live update, the guest TSC can remain cycle accurate at precisely
> the same offset from the host TSC, but there is no way for userspace to
> restore the KVM clock accurately.
> 
> Even on live migration to a new host, where the accuracy of the guest
> time-keeping is fundamentally limited by the accuracy of wallclock
> synchronization between the source and destination hosts, the clock jump
> experienced by the guest's TSC and its KVM clock should at least be
> *consistent*. Even when the guest TSC suffers a discontinuity, its KVM
> clock should still remain the *same* arithmetic function of the guest
> TSC, and not suffer an *additional* discontinuity.
> 
> To allow for accurate migration of the KVM clock, add per-vCPU ioctls
> which save and restore the actual PV clock info in
> pvclock_vcpu_time_info.
> 
> The restoration in KVM_SET_CLOCK_GUEST works by creating a new reference
> point in time just as kvm_update_masterclock() does, and calculating the
> corresponding guest TSC value. This guest TSC value is then passed
> through the user-provided pvclock structure to generate the *intended*
> KVM clock value at that point in time, and through the *actual* KVM
> clock calculation. Then kvm->arch.kvmclock_offset is adjusted to
> eliminate the difference.
> 
> Where kvm->arch.use_master_clock is false (because the host TSC is
> unreliable, or the guest TSCs are configured strangely), the KVM clock
> is *not* defined as a function of the guest TSC so KVM_GET_CLOCK_GUEST
> returns an error. In this case, as documented, userspace shall use the
> legacy KVM_GET_CLOCK ioctl. The loss of precision is acceptable in this

The description here confused me a little. It sounds like userspace should call
KVM_SET_CLOCK if KVM_SET_CLOCK_GUEST fails. However, I assume it actually means
that userspace should do nothing extra if KVM_SET_CLOCK_GUEST fails, and simply
rely on the prior KVM_SET_CLOCK and KVM_VCPU_TSC_OFFSET workflow described in
patch 07. Is that correct?

> case since the clocks are imprecise in this mode anyway.
> 
> On *restoration*, if kvm->arch.use_master_clock is false, an error is
> returned for similar reasons and userspace shall fall back to using
> KVM_SET_CLOCK. This does mean that, as documented, userspace needs to
> use *both* KVM_GET_CLOCK_GUEST and KVM_GET_CLOCK and send both results
> with the migration data (unless the intent is to refuse to resume on a
> host with bad TSC).
> 
> Co-developed-by: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> Signed-off-by: Jack Allister <jalliste@amazon.com>
> Reviewed-by: Paul Durrant <paul@xen.org>
> Cc: Dongli Zhang <dongli.zhang@oracle.com>
> ---
>  Documentation/virt/kvm/api.rst |  37 ++++++++
>  arch/x86/kvm/x86.c             | 151 +++++++++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h       |   3 +
>  3 files changed, 191 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 52bbbb553ce1..2268b4442df6 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6553,6 +6553,43 @@ KVM_S390_KEYOP_SSKE
>    Sets the storage key for the guest address ``guest_addr`` to the key
>    specified in ``key``, returning the previous value in ``key``.
>  
> +4.145 KVM_GET_CLOCK_GUEST
> +----------------------------
> +
> +:Capability: none
> +:Architectures: x86_64
> +:Type: vcpu ioctl
> +:Parameters: struct pvclock_vcpu_time_info (out)
> +:Returns: 0 on success, <0 on error
> +
> +Retrieves the current time information structure used for KVM/PV clocks,
> +in precisely the form advertised to the guest vCPU, which gives parameters
> +for a direct conversion from a guest TSC value to nanoseconds.
> +
> +When the KVM clock is not in "master clock" mode, for example because the
> +host TSC is unreliable or the guest TSCs are oddly configured, the KVM clock
> +is actually defined by the host CLOCK_MONOTONIC_RAW instead of the guest TSC.
> +In this case, the KVM_GET_CLOCK_GUEST ioctl returns -EINVAL.
> +
> +4.146 KVM_SET_CLOCK_GUEST
> +----------------------------
> +
> +:Capability: none

Do we need a KVM_CHECK_EXTENSION capability for this? If userspace wants to
support the new API, should it detect availability via KVM_CHECK_EXTENSION, or
simply try the ioctl and handle failure?

> +:Architectures: x86_64
> +:Type: vcpu ioctl
> +:Parameters: struct pvclock_vcpu_time_info (in)
> +:Returns: 0 on success, <0 on error
> +
> +Sets the KVM clock (for the whole VM) in terms of the vCPU TSC, using the
> +pvclock structure as returned by KVM_GET_CLOCK_GUEST. This allows the precise
> +arithmetic relationship between guest TSC and KVM clock to be preserved by
> +userspace across migration.
> +
> +When the KVM clock is not in "master clock" mode, and the KVM clock is actually
> +defined by the host CLOCK_MONOTONIC_RAW, this ioctl returns -EINVAL. Userspace
> +may choose to set the clock using the less precise KVM_SET_CLOCK ioctl, or may
> +choose to fail, denying migration to a host whose TSC is misbehaving.
> +
>  .. _kvm_run:
>  
>  5. The kvm_run structure
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d9ef165df6a1..d1327d5fba3f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6205,6 +6205,149 @@ static int kvm_get_reg_list(struct kvm_vcpu *vcpu,
>  	return 0;
>  }
>  
> +#ifdef CONFIG_X86_64
> +static int kvm_vcpu_ioctl_get_clock_guest(struct kvm_vcpu *v, void __user *argp)
> +{
> +	struct pvclock_vcpu_time_info hv_clock = {};
> +	struct kvm_vcpu_arch *vcpu = &v->arch;
> +	struct kvm_arch *ka = &v->kvm->arch;
> +	unsigned int seq;
> +
> +	/*
> +	 * If KVM_REQ_CLOCK_UPDATE is already pending, or if the pvclock
> +	 * has never been generated at all, call kvm_guest_time_update().
> +	 */
> +	if (kvm_check_request(KVM_REQ_CLOCK_UPDATE, v) || !vcpu->hw_tsc_hz) {

This was flagged by AI, and I am still checking whether it is a real issue.

What happens if KVM_REQ_MASTERCLOCK_UPDATE and KVM_REQ_CLOCK_UPDATE are both
pending?

From my perspective, I am also curious how we should reason about this in other
scenarios in the future. Specifically, when do we need to process
KVM_REQ_MASTERCLOCK_UPDATE before KVM_REQ_CLOCK_UPDATE, and when is it
acceptable not to? I noticed that kvm_cpuid() already processes only
KVM_REQ_CLOCK_UPDATE.

> +		int idx = srcu_read_lock(&v->kvm->srcu);
> +		int ret = kvm_guest_time_update(v);
> +
> +		srcu_read_unlock(&v->kvm->srcu, idx);
> +		if (ret)
> +			return -EINVAL;
> +	}
> +
> +	/*
> +	 * Reconstruct the pvclock from the master clock state, matching
> +	 * exactly what kvm_guest_time_update() writes to the guest.
> +	 */
> +	do {
> +		seq = read_seqcount_begin(&ka->pvclock_sc);
> +
> +		if (!ka->use_master_clock)
> +			return -EINVAL;
> +
> +		hv_clock.tsc_timestamp = kvm_read_l1_tsc(v, ka->master_cycle_now);
> +		hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
> +	} while (read_seqcount_retry(&ka->pvclock_sc, seq));
> +
> +	hv_clock.tsc_shift = vcpu->pvclock_tsc_shift;
> +	hv_clock.tsc_to_system_mul = vcpu->pvclock_tsc_mul;
> +	hv_clock.flags = PVCLOCK_TSC_STABLE_BIT;
> +
> +	if (copy_to_user(argp, &hv_clock, sizeof(hv_clock)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
> +/*
> + * Reverse the calculation in the hv_clock definition.
> + *
> + * time_ns = ( (cycles << shift) * mul ) >> 32;
> + * (although shift can be negative, so that's bad C)
> + *
> + * So for a single second,
> + * NSEC_PER_SEC = ( ( FREQ_HZ << shift) * mul ) >> 32
> + * NSEC_PER_SEC << 32 = ( FREQ_HZ << shift ) * mul
> + * ( NSEC_PER_SEC << 32 ) / mul = FREQ_HZ << shift
> + * ( NSEC_PER_SEC << 32 ) / mul ) >> shift = FREQ_HZ
> + */
> +static u64 hvclock_to_hz(u32 mul, s8 shift)
> +{
> +	u64 tm = NSEC_PER_SEC << 32;
> +
> +	/* Maximise precision. Shift right until the top bit is set */
> +	tm <<= 2;
> +	shift += 2;
> +
> +	/* While 'mul' is even, increase the shift *after* the division */
> +	while (!(mul & 1)) {
> +		shift++;
> +		mul >>= 1;
> +	}
> +
> +	tm /= mul;
> +
> +	if (shift > 0)
> +		return tm >> shift;
> +	else
> +		return tm << -shift;
> +}
> +
> +static int kvm_vcpu_ioctl_set_clock_guest(struct kvm_vcpu *v, void __user *argp)
> +{
> +	struct pvclock_vcpu_time_info user_hv_clock;
> +	struct kvm *kvm = v->kvm;
> +	struct kvm_arch *ka = &kvm->arch;
> +	u64 curr_tsc_hz, user_tsc_hz;
> +	u64 user_clk_ns;
> +	u64 guest_tsc;
> +	int rc = 0;
> +
> +	if (copy_from_user(&user_hv_clock, argp, sizeof(user_hv_clock)))
> +		return -EFAULT;
> +
> +	if (!user_hv_clock.tsc_to_system_mul)
> +		return -EINVAL;
> +
> +	user_tsc_hz = hvclock_to_hz(user_hv_clock.tsc_to_system_mul,
> +				    user_hv_clock.tsc_shift);
> +
> +	kvm_hv_request_tsc_page_update(kvm);
> +	kvm_start_pvclock_update(kvm);
> +	pvclock_update_vm_gtod_copy(kvm);
> +
> +	if (!ka->use_master_clock) {
> +		rc = -EINVAL;
> +		goto out;
> +	}
> +
> +	curr_tsc_hz = (u64)get_cpu_tsc_khz() * 1000;
> +	if (unlikely(curr_tsc_hz == 0)) {
> +		rc = -EINVAL;
> +		goto out;
> +	}
> +
> +	if (kvm_caps.has_tsc_control)
> +		curr_tsc_hz = kvm_scale_tsc(curr_tsc_hz,
> +					    v->arch.l1_tsc_scaling_ratio);
> +
> +	/*
> +	 * Allow for a discrepancy of 1 kHz either way between the TSC
> +	 * frequency used to generate the user's pvclock and the current
> +	 * host's measured frequency, since they may not precisely match.
> +	 */
> +	if (user_tsc_hz < curr_tsc_hz - 1000 ||
> +	    user_tsc_hz > curr_tsc_hz + 1000) {
> +		rc = -ERANGE;
> +		goto out;
> +	}
> +
> +	/*
> +	 * Calculate the guest TSC at the new reference point, and the
> +	 * corresponding KVM clock value according to user_hv_clock.
> +	 * Adjust kvmclock_offset so both definitions agree.
> +	 */
> +	guest_tsc = kvm_read_l1_tsc(v, ka->master_cycle_now);
> +	user_clk_ns = __pvclock_read_cycles(&user_hv_clock, guest_tsc);
> +	ka->kvmclock_offset = user_clk_ns - ka->master_kernel_ns;

A long time ago I explored adjusting ka->kvmclock_offset in KVM_SET_CLOCK based
on the old hv_clock and the new hv_clock. At that time, my concern was what
would happen if userspace provided bogus values. Theoretically, that is
possible with any ioctl, so my concern may be unnecessary.

Would it be helpful to validate that the delta is within a reasonable range,
e.g. that the drift can never be more than five minutes (forward or backward)?
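
For what it's worth, here is a minimal sketch of the kind of check I mean,
written as plain userspace C rather than kernel code. struct pvclock_sketch,
pvclock_sketch_read(), kvmclock_offset_delta_ok() and the MAX_KVMCLOCK_JUMP_NS
bound are all hypothetical names used only for illustration. The arithmetic
follows the conversion formula in the patch comment, and the five-minute limit
is just the example figure from above, not a proposal for the exact value.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC		1000000000LL
/* Hypothetical bound: the five-minute example figure, in nanoseconds. */
#define MAX_KVMCLOCK_JUMP_NS	(5 * 60 * NSEC_PER_SEC)

/* Userspace copy of the pvclock fields the conversion uses (assumed layout). */
struct pvclock_sketch {
	uint64_t tsc_timestamp;
	uint64_t system_time;
	uint32_t tsc_to_system_mul;
	int8_t tsc_shift;
};

/*
 * time_ns = system_time + (((tsc - tsc_timestamp) shifted by tsc_shift)
 *                          * tsc_to_system_mul) >> 32
 *
 * Assumes |tsc_shift| < 64 and a compiler providing unsigned __int128
 * (GCC or Clang on a 64-bit target).
 */
static uint64_t pvclock_sketch_read(const struct pvclock_sketch *pv,
				    uint64_t tsc)
{
	uint64_t delta = tsc - pv->tsc_timestamp;

	if (pv->tsc_shift < 0)
		delta >>= -pv->tsc_shift;
	else
		delta <<= pv->tsc_shift;

	return pv->system_time +
	       (uint64_t)(((unsigned __int128)delta * pv->tsc_to_system_mul) >> 32);
}

/*
 * Return true if moving from old_offset to new_offset changes the clock by
 * no more than MAX_KVMCLOCK_JUMP_NS in either direction.
 */
static bool kvmclock_offset_delta_ok(int64_t old_offset, int64_t new_offset)
{
	int64_t delta;

	/* Reject a difference that does not fit in 64 bits instead of wrapping. */
	if (__builtin_sub_overflow(new_offset, old_offset, &delta))
		return false;

	return delta >= -MAX_KVMCLOCK_JUMP_NS && delta <= MAX_KVMCLOCK_JUMP_NS;
}
```

In the ioctl this would mean computing the candidate offset from user_clk_ns
first and only storing it in ka->kvmclock_offset if the check passes. Whether
a bound like this is desirable at all is of course the open question.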

Thank you very much!

Dongli Zhang

> +
> +out:
> +	kvm_end_pvclock_update(kvm);
> +	return rc;
> +}
> +#endif
> +
>  long kvm_arch_vcpu_ioctl(struct file *filp,
>  			 unsigned int ioctl, unsigned long arg)
>  {
> @@ -6605,6 +6748,14 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  		srcu_read_unlock(&vcpu->kvm->srcu, idx);
>  		break;
>  	}
> +#ifdef CONFIG_X86_64
> +	case KVM_SET_CLOCK_GUEST:
> +		r = kvm_vcpu_ioctl_set_clock_guest(vcpu, argp);
> +		break;
> +	case KVM_GET_CLOCK_GUEST:
> +		r = kvm_vcpu_ioctl_get_clock_guest(vcpu, argp);
> +		break;
> +#endif
>  #ifdef CONFIG_KVM_HYPERV
>  	case KVM_GET_SUPPORTED_HV_CPUID:
>  		r = kvm_ioctl_get_supported_hv_cpuid(vcpu, argp);
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 6c8afa2047bf..9b50191b859c 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1669,4 +1669,7 @@ struct kvm_pre_fault_memory {
>  	__u64 padding[5];
>  };
>  
> +#define KVM_SET_CLOCK_GUEST	_IOW(KVMIO, 0xd6, struct pvclock_vcpu_time_info)
> +#define KVM_GET_CLOCK_GUEST	_IOR(KVMIO, 0xd7, struct pvclock_vcpu_time_info)
> +
>  #endif /* __LINUX_KVM_H */


^ permalink raw reply

* Re: [PATCH v5 05/13] dt-bindings: iio: frequency: add ad9910
From: Krzysztof Kozlowski @ 2026-05-18  7:52 UTC (permalink / raw)
  To: Rodrigo Alencar
  Cc: linux-iio, devicetree, linux-kernel, linux-doc, linux-hardening,
	Lars-Peter Clausen, Michael Hennerich, Jonathan Cameron,
	David Lechner, Andy Shevchenko, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Philipp Zabel, Jonathan Corbet, Shuah Khan,
	Kees Cook, Gustavo A. R. Silva
In-Reply-To: <20260517-ad9910-iio-driver-v5-5-31599c88314a@analog.com>

On Sun, May 17, 2026 at 07:37:49PM +0100, Rodrigo Alencar wrote:
> +maintainers:
> +  - Rodrigo Alencar <rodrigo.alencar@analog.com>
> +
> +description:
> +  The AD9910 is a 1 GSPS direct digital synthesizer (DDS) with an integrated
> +  14-bit DAC. It features single tone mode with 8 configurable profiles,
> +  a digital ramp generator, RAM control, OSK, and a parallel data port for
> +  high-speed streaming.
> +
> +  https://www.analog.com/en/products/ad9910.html
> +
> +properties:
> +  compatible:
> +    const: adi,ad9910
> +
> +  reg:
> +    maxItems: 1
> +
> +  spi-max-frequency:
> +    maximum: 70000000
> +
> +  clocks:
> +    minItems: 1
> +    items:
> +      - description: Reference clock (REF_CLK).
> +      - description: Optional synchronization clock (SYNC_IN).
> +
> +  clock-names:
> +    oneOf:
> +      - items:
> +          - const: ref_clk
> +      - items:
> +          - const: ref_clk
> +          - const: sync_in

So that's just a list of two items with minItems: 1, like you have in
"clocks:".

You got this comment already at v2.


> +
> +  '#clock-cells':
> +    const: 1
> +
> +  clock-output-names:
> +    minItems: 1
> +    maxItems: 3
> +    items:
> +      enum: [ sync_clk, pdclk, sync_out ]

Why are the names fixed? And why is the order random?

> +
> +  interrupts:
> +    minItems: 1
> +    items:
> +      - description:
> +          Signal that indicates that Digital Ramp Generator has reached a limit.
> +      - description:
> +          Signal that indicates the end of a RAM Sweep.
> +
> +  interrupt-names:
> +    minItems: 1
> +    maxItems: 2
> +    items:
> +      enum: [ drover, ram_swp_ovr ]

Your "interrupts:" does not allow flexibility. Are you sure interrupts are
optional in the hardware?

> +
> +  dvdd-io33-supply:
> +    description: 3.3V Digital I/O supply.
> +
> +  avdd33-supply:
> +    description: 3.3V Analog DAC supply.

Best regards,
Krzysztof


^ permalink raw reply

* Re: [PATCH net-next v3 07/14] idpf: refactor idpf to use libie_pci APIs
From: Larysa Zaremba @ 2026-05-18  7:40 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev, Tony Nguyen
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	Pavan Kumar Linga, przemyslaw.kitszel, aleksander.lobakin,
	sridhar.samudrala, anjali.singhai, michal.swiatkowski,
	maciej.fijalkowski, emil.s.tantilov, joshua.a.hay, jacob.e.keller,
	jayaprakash.shanmugam, jiri, horms, corbet, richardcochran,
	linux-doc, bhelgaas, linux-pci, Samuel Salin
In-Reply-To: <20260515224443.2772147-8-anthony.l.nguyen@intel.com>

On Fri, May 15, 2026 at 03:44:31PM -0700, Tony Nguyen wrote:
> From: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
> 
> Use libie_pci init and MMIO APIs where possible, struct idpf_hw cannot be
> deleted for now as it also houses control queues that will be refactored
> later. Use libie_cp header for libie_ctlq_ctx that contains mmio info from
> the start in order to not increase the diff later.

I have reviewed the Sashiko feedback [0]. Here is why I do not find it very
helpful for this particular patch:

1. "libie_pci_get_mmio_addr() can return a NULL pointer" - this should not
   happen if the mapping was successful. If it was not, we would not be here.
2. "cleanup subsystem guidelines recommend..." - this is a comment on code the
   patch does not modify.
3. "idpf_vc_core_deinit() now calls idpf_decfg_lan_memory_regions(), which
   unmaps the MMIO regions. When this is called during the teardown sequence in
   idpf_remove(), the netdevs are still registered." -
   idpf_decfg_lan_memory_regions() unmaps only non-static regions, so this is
   fine.

[0] https://sashiko.dev/#/patchset/20260515224443.2772147-1-anthony.l.nguyen%40intel.com
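
To illustrate point 1, here is a minimal userspace sketch of an
offset-to-address lookup over a set of mapped regions, loosely modelled on the
idpf_get_reg_addr() helper that this patch removes. It is not the libie_pci
implementation: struct mmio_region, mmio_lookup() and the demo_* data are
hypothetical names made up for the example. The only point is that a lookup of
this shape yields NULL solely for an offset that no mapped region covers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one mapped BAR region. */
struct mmio_region {
	uint64_t start;		/* offset of the region within the BAR */
	uint64_t len;		/* length of the region in bytes */
	uint8_t *vaddr;		/* base address of the mapping */
};

/*
 * Translate a BAR offset into an address inside the region that covers it.
 * Returns NULL when no region covers the offset.
 */
static void *mmio_lookup(const struct mmio_region *regions, size_t n,
			 uint64_t offset)
{
	for (size_t i = 0; i < n; i++) {
		const struct mmio_region *r = &regions[i];

		/* Make the offset relative to the region, then add the base. */
		if (offset >= r->start && offset - r->start < r->len)
			return r->vaddr + (offset - r->start);
	}

	return NULL;
}

/* Demo data: two 16-byte "regions" backed by an ordinary buffer. */
static uint8_t demo_bar[32];
static const struct mmio_region demo_regions[] = {
	{ .start = 0x1000, .len = 16, .vaddr = demo_bar },
	{ .start = 0x2000, .len = 16, .vaddr = demo_bar + 16 },
};
```

With the demo data, offsets 0x1000-0x100f and 0x2000-0x200f resolve to an
address, and anything else gives NULL. As long as the register offsets handed
to the lookup come from regions that were mapped successfully, the NULL case
is not reachable.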


> 
> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
> Co-developed-by: Larysa Zaremba <larysa.zaremba@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> Tested-by: Samuel Salin <Samuel.salin@intel.com>
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> ---
>  drivers/net/ethernet/intel/idpf/Kconfig       |   1 +
>  drivers/net/ethernet/intel/idpf/idpf.h        |  70 +-------
>  .../net/ethernet/intel/idpf/idpf_controlq.c   |  26 ++-
>  .../net/ethernet/intel/idpf/idpf_controlq.h   |   2 -
>  drivers/net/ethernet/intel/idpf/idpf_dev.c    |  61 ++++---
>  drivers/net/ethernet/intel/idpf/idpf_idc.c    |  38 ++--
>  drivers/net/ethernet/intel/idpf/idpf_lib.c    |   7 +-
>  drivers/net/ethernet/intel/idpf/idpf_main.c   | 114 ++++++------
>  drivers/net/ethernet/intel/idpf/idpf_vf_dev.c |  57 +++---
>  .../net/ethernet/intel/idpf/idpf_virtchnl.c   | 169 +++++++++---------
>  .../ethernet/intel/idpf/idpf_virtchnl_ptp.c   |  58 +++---
>  11 files changed, 288 insertions(+), 315 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/Kconfig b/drivers/net/ethernet/intel/idpf/Kconfig
> index adab2154125b..586df3a4afe9 100644
> --- a/drivers/net/ethernet/intel/idpf/Kconfig
> +++ b/drivers/net/ethernet/intel/idpf/Kconfig
> @@ -6,6 +6,7 @@ config IDPF
>  	depends on PCI_MSI
>  	depends on PTP_1588_CLOCK_OPTIONAL
>  	select DIMLIB
> +	select LIBIE_CP
>  	select LIBETH_XDP
>  	help
>  	  This driver supports Intel(R) Infrastructure Data Path Function
> diff --git a/drivers/net/ethernet/intel/idpf/idpf.h b/drivers/net/ethernet/intel/idpf/idpf.h
> index 0d08f51be7e3..efdb58990a8b 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf.h
> +++ b/drivers/net/ethernet/intel/idpf/idpf.h
> @@ -23,6 +23,7 @@ struct idpf_rss_data;
>  
>  #include <linux/intel/iidc_rdma.h>
>  #include <linux/intel/iidc_rdma_idpf.h>
> +#include <linux/intel/libie/controlq.h>
>  #include <linux/intel/virtchnl2.h>
>  
>  #include "idpf_txrx.h"
> @@ -625,6 +626,7 @@ struct idpf_vc_xn_manager;
>   * @flags: See enum idpf_flags
>   * @reset_reg: See struct idpf_reset_reg
>   * @hw: Device access data
> + * @ctlq_ctx: controlq context
>   * @num_avail_msix: Available number of MSIX vectors
>   * @num_msix_entries: Number of entries in MSIX table
>   * @msix_entries: MSIX table
> @@ -682,6 +684,7 @@ struct idpf_adapter {
>  	DECLARE_BITMAP(flags, IDPF_FLAGS_NBITS);
>  	struct idpf_reset_reg reset_reg;
>  	struct idpf_hw hw;
> +	struct libie_ctlq_ctx ctlq_ctx;
>  	u16 num_avail_msix;
>  	u16 num_msix_entries;
>  	struct msix_entry *msix_entries;
> @@ -870,70 +873,6 @@ static inline u8 idpf_get_min_tx_pkt_len(struct idpf_adapter *adapter)
>  	return pkt_len ? pkt_len : IDPF_TX_MIN_PKT_LEN;
>  }
>  
> -/**
> - * idpf_get_mbx_reg_addr - Get BAR0 mailbox register address
> - * @adapter: private data struct
> - * @reg_offset: register offset value
> - *
> - * Return: BAR0 mailbox register address based on register offset.
> - */
> -static inline void __iomem *idpf_get_mbx_reg_addr(struct idpf_adapter *adapter,
> -						  resource_size_t reg_offset)
> -{
> -	return adapter->hw.mbx.vaddr + reg_offset;
> -}
> -
> -/**
> - * idpf_get_rstat_reg_addr - Get BAR0 rstat register address
> - * @adapter: private data struct
> - * @reg_offset: register offset value
> - *
> - * Return: BAR0 rstat register address based on register offset.
> - */
> -static inline void __iomem *idpf_get_rstat_reg_addr(struct idpf_adapter *adapter,
> -						    resource_size_t reg_offset)
> -{
> -	reg_offset -= adapter->dev_ops.static_reg_info[1].start;
> -
> -	return adapter->hw.rstat.vaddr + reg_offset;
> -}
> -
> -/**
> - * idpf_get_reg_addr - Get BAR0 register address
> - * @adapter: private data struct
> - * @reg_offset: register offset value
> - *
> - * Based on the register offset, return the actual BAR0 register address
> - */
> -static inline void __iomem *idpf_get_reg_addr(struct idpf_adapter *adapter,
> -					      resource_size_t reg_offset)
> -{
> -	struct idpf_hw *hw = &adapter->hw;
> -
> -	for (int i = 0; i < hw->num_lan_regs; i++) {
> -		struct idpf_mmio_reg *region = &hw->lan_regs[i];
> -
> -		if (reg_offset >= region->addr_start &&
> -		    reg_offset < (region->addr_start + region->addr_len)) {
> -			/* Convert the offset so that it is relative to the
> -			 * start of the region.  Then add the base address of
> -			 * the region to get the final address.
> -			 */
> -			reg_offset -= region->addr_start;
> -
> -			return region->vaddr + reg_offset;
> -		}
> -	}
> -
> -	/* It's impossible to hit this case with offsets from the CP. But if we
> -	 * do for any other reason, the kernel will panic on that register
> -	 * access. Might as well do it here to make it clear what's happening.
> -	 */
> -	BUG();
> -
> -	return NULL;
> -}
> -
>  /**
>   * idpf_is_reset_detected - check if we were reset at some point
>   * @adapter: driver specific private structure
> @@ -945,7 +884,8 @@ static inline bool idpf_is_reset_detected(struct idpf_adapter *adapter)
>  	if (!adapter->hw.arq)
>  		return true;
>  
> -	return !(readl(idpf_get_mbx_reg_addr(adapter, adapter->hw.arq->reg.len)) &
> +	return !(readl(libie_pci_get_mmio_addr(&adapter->ctlq_ctx.mmio_info,
> +					       adapter->hw.arq->reg.len)) &
>  		 adapter->hw.arq->reg.len_mask);
>  }
>  
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_controlq.c b/drivers/net/ethernet/intel/idpf/idpf_controlq.c
> index d2dde43269e9..020b08367e18 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_controlq.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_controlq.c
> @@ -1,7 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0-only
>  /* Copyright (C) 2023 Intel Corporation */
>  
> -#include "idpf_controlq.h"
> +#include "idpf.h"
>  
>  /**
>   * idpf_ctlq_setup_regs - initialize control queue registers
> @@ -34,21 +34,27 @@ static void idpf_ctlq_setup_regs(struct idpf_ctlq_info *cq,
>  static void idpf_ctlq_init_regs(struct idpf_hw *hw, struct idpf_ctlq_info *cq,
>  				bool is_rxq)
>  {
> +	struct libie_mmio_info *mmio = &hw->back->ctlq_ctx.mmio_info;
> +
>  	/* Update tail to post pre-allocated buffers for rx queues */
>  	if (is_rxq)
> -		idpf_mbx_wr32(hw, cq->reg.tail, (u32)(cq->ring_size - 1));
> +		writel((u32)(cq->ring_size - 1),
> +		       libie_pci_get_mmio_addr(mmio, cq->reg.tail));
>  
>  	/* For non-Mailbox control queues only TAIL need to be set */
>  	if (cq->q_id != -1)
>  		return;
>  
>  	/* Clear Head for both send or receive */
> -	idpf_mbx_wr32(hw, cq->reg.head, 0);
> +	writel(0, libie_pci_get_mmio_addr(mmio, cq->reg.head));
>  
>  	/* set starting point */
> -	idpf_mbx_wr32(hw, cq->reg.bal, lower_32_bits(cq->desc_ring.pa));
> -	idpf_mbx_wr32(hw, cq->reg.bah, upper_32_bits(cq->desc_ring.pa));
> -	idpf_mbx_wr32(hw, cq->reg.len, (cq->ring_size | cq->reg.len_ena_mask));
> +	writel(lower_32_bits(cq->desc_ring.pa),
> +	       libie_pci_get_mmio_addr(mmio, cq->reg.bal));
> +	writel(upper_32_bits(cq->desc_ring.pa),
> +	       libie_pci_get_mmio_addr(mmio, cq->reg.bah));
> +	writel((cq->ring_size | cq->reg.len_ena_mask),
> +	       libie_pci_get_mmio_addr(mmio, cq->reg.len));
>  }
>  
>  /**
> @@ -326,7 +332,9 @@ int idpf_ctlq_send(struct idpf_hw *hw, struct idpf_ctlq_info *cq,
>  	 */
>  	dma_wmb();
>  
> -	idpf_mbx_wr32(hw, cq->reg.tail, cq->next_to_use);
> +	writel(cq->next_to_use,
> +	       libie_pci_get_mmio_addr(&hw->back->ctlq_ctx.mmio_info,
> +				       cq->reg.tail));
>  
>  err_unlock:
>  	spin_unlock(&cq->cq_lock);
> @@ -518,7 +526,9 @@ int idpf_ctlq_post_rx_buffs(struct idpf_hw *hw, struct idpf_ctlq_info *cq,
>  
>  		dma_wmb();
>  
> -		idpf_mbx_wr32(hw, cq->reg.tail, cq->next_to_post);
> +		writel(cq->next_to_post,
> +		       libie_pci_get_mmio_addr(&hw->back->ctlq_ctx.mmio_info,
> +					       cq->reg.tail));
>  	}
>  
>  	spin_unlock(&cq->cq_lock);
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_controlq.h b/drivers/net/ethernet/intel/idpf/idpf_controlq.h
> index de4ece40c2ff..acf595e9265f 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_controlq.h
> +++ b/drivers/net/ethernet/intel/idpf/idpf_controlq.h
> @@ -109,8 +109,6 @@ struct idpf_mmio_reg {
>   * Align to ctlq_hw_info
>   */
>  struct idpf_hw {
> -	struct idpf_mmio_reg mbx;
> -	struct idpf_mmio_reg rstat;
>  	/* Array of remaining LAN BAR regions */
>  	int num_lan_regs;
>  	struct idpf_mmio_reg *lan_regs;
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_dev.c b/drivers/net/ethernet/intel/idpf/idpf_dev.c
> index 1a0c71c95ef1..e36b0017186f 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_dev.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_dev.c
> @@ -16,7 +16,6 @@
>  static void idpf_ctlq_reg_init(struct idpf_adapter *adapter,
>  			       struct idpf_ctlq_create_info *cq)
>  {
> -	resource_size_t mbx_start = adapter->dev_ops.static_reg_info[0].start;
>  	int i;
>  
>  	for (i = 0; i < IDPF_NUM_DFLT_MBX_Q; i++) {
> @@ -25,22 +24,22 @@ static void idpf_ctlq_reg_init(struct idpf_adapter *adapter,
>  		switch (ccq->type) {
>  		case IDPF_CTLQ_TYPE_MAILBOX_TX:
>  			/* set head and tail registers in our local struct */
> -			ccq->reg.head = PF_FW_ATQH - mbx_start;
> -			ccq->reg.tail = PF_FW_ATQT - mbx_start;
> -			ccq->reg.len = PF_FW_ATQLEN - mbx_start;
> -			ccq->reg.bah = PF_FW_ATQBAH - mbx_start;
> -			ccq->reg.bal = PF_FW_ATQBAL - mbx_start;
> +			ccq->reg.head = PF_FW_ATQH;
> +			ccq->reg.tail = PF_FW_ATQT;
> +			ccq->reg.len = PF_FW_ATQLEN;
> +			ccq->reg.bah = PF_FW_ATQBAH;
> +			ccq->reg.bal = PF_FW_ATQBAL;
>  			ccq->reg.len_mask = PF_FW_ATQLEN_ATQLEN_M;
>  			ccq->reg.len_ena_mask = PF_FW_ATQLEN_ATQENABLE_M;
>  			ccq->reg.head_mask = PF_FW_ATQH_ATQH_M;
>  			break;
>  		case IDPF_CTLQ_TYPE_MAILBOX_RX:
>  			/* set head and tail registers in our local struct */
> -			ccq->reg.head = PF_FW_ARQH - mbx_start;
> -			ccq->reg.tail = PF_FW_ARQT - mbx_start;
> -			ccq->reg.len = PF_FW_ARQLEN - mbx_start;
> -			ccq->reg.bah = PF_FW_ARQBAH - mbx_start;
> -			ccq->reg.bal = PF_FW_ARQBAL - mbx_start;
> +			ccq->reg.head = PF_FW_ARQH;
> +			ccq->reg.tail = PF_FW_ARQT;
> +			ccq->reg.len = PF_FW_ARQLEN;
> +			ccq->reg.bah = PF_FW_ARQBAH;
> +			ccq->reg.bal = PF_FW_ARQBAL;
>  			ccq->reg.len_mask = PF_FW_ARQLEN_ARQLEN_M;
>  			ccq->reg.len_ena_mask = PF_FW_ARQLEN_ARQENABLE_M;
>  			ccq->reg.head_mask = PF_FW_ARQH_ARQH_M;
> @@ -57,13 +56,14 @@ static void idpf_ctlq_reg_init(struct idpf_adapter *adapter,
>   */
>  static void idpf_mb_intr_reg_init(struct idpf_adapter *adapter)
>  {
> +	struct libie_mmio_info *mmio = &adapter->ctlq_ctx.mmio_info;
>  	struct idpf_intr_reg *intr = &adapter->mb_vector.intr_reg;
>  	u32 dyn_ctl = le32_to_cpu(adapter->caps.mailbox_dyn_ctl);
>  
> -	intr->dyn_ctl = idpf_get_reg_addr(adapter, dyn_ctl);
> +	intr->dyn_ctl = libie_pci_get_mmio_addr(mmio, dyn_ctl);
>  	intr->dyn_ctl_intena_m = PF_GLINT_DYN_CTL_INTENA_M;
>  	intr->dyn_ctl_itridx_m = PF_GLINT_DYN_CTL_ITR_INDX_M;
> -	intr->icr_ena = idpf_get_reg_addr(adapter, PF_INT_DIR_OICR_ENA);
> +	intr->icr_ena = libie_pci_get_mmio_addr(mmio, PF_INT_DIR_OICR_ENA);
>  	intr->icr_ena_ctlq_m = PF_INT_DIR_OICR_ENA_M;
>  }
>  
> @@ -78,6 +78,7 @@ static int idpf_intr_reg_init(struct idpf_vport *vport,
>  	struct idpf_adapter *adapter = vport->adapter;
>  	u16 num_vecs = rsrc->num_q_vectors;
>  	struct idpf_vec_regs *reg_vals;
> +	struct libie_mmio_info *mmio;
>  	int num_regs, i, err = 0;
>  	u32 rx_itr, tx_itr, val;
>  	u16 total_vecs;
> @@ -93,14 +94,17 @@ static int idpf_intr_reg_init(struct idpf_vport *vport,
>  		goto free_reg_vals;
>  	}
>  
> +	mmio = &adapter->ctlq_ctx.mmio_info;
> +
>  	for (i = 0; i < num_vecs; i++) {
>  		struct idpf_q_vector *q_vector = &rsrc->q_vectors[i];
>  		u16 vec_id = rsrc->q_vector_idxs[i] - IDPF_MBX_Q_VEC;
>  		struct idpf_intr_reg *intr = &q_vector->intr_reg;
> +		struct idpf_vec_regs *reg = &reg_vals[vec_id];
>  		u32 spacing;
>  
> -		intr->dyn_ctl = idpf_get_reg_addr(adapter,
> -						  reg_vals[vec_id].dyn_ctl_reg);
> +		intr->dyn_ctl = libie_pci_get_mmio_addr(mmio,
> +							reg->dyn_ctl_reg);
>  		intr->dyn_ctl_intena_m = PF_GLINT_DYN_CTL_INTENA_M;
>  		intr->dyn_ctl_intena_msk_m = PF_GLINT_DYN_CTL_INTENA_MSK_M;
>  		intr->dyn_ctl_itridx_s = PF_GLINT_DYN_CTL_ITR_INDX_S;
> @@ -110,22 +114,21 @@ static int idpf_intr_reg_init(struct idpf_vport *vport,
>  		intr->dyn_ctl_sw_itridx_ena_m =
>  			PF_GLINT_DYN_CTL_SW_ITR_INDX_ENA_M;
>  
> -		spacing = IDPF_ITR_IDX_SPACING(reg_vals[vec_id].itrn_index_spacing,
> +		spacing = IDPF_ITR_IDX_SPACING(reg->itrn_index_spacing,
>  					       IDPF_PF_ITR_IDX_SPACING);
>  		rx_itr = PF_GLINT_ITR_ADDR(VIRTCHNL2_ITR_IDX_0,
> -					   reg_vals[vec_id].itrn_reg,
> -					   spacing);
> +					   reg->itrn_reg, spacing);
>  		tx_itr = PF_GLINT_ITR_ADDR(VIRTCHNL2_ITR_IDX_1,
> -					   reg_vals[vec_id].itrn_reg,
> -					   spacing);
> -		intr->rx_itr = idpf_get_reg_addr(adapter, rx_itr);
> -		intr->tx_itr = idpf_get_reg_addr(adapter, tx_itr);
> +					   reg->itrn_reg, spacing);
> +		intr->rx_itr = libie_pci_get_mmio_addr(mmio, rx_itr);
> +		intr->tx_itr = libie_pci_get_mmio_addr(mmio, tx_itr);
>  	}
>  
>  	/* Data vector for NOIRQ queues */
>  
>  	val = reg_vals[rsrc->q_vector_idxs[i] - IDPF_MBX_Q_VEC].dyn_ctl_reg;
> -	rsrc->noirq_dyn_ctl = idpf_get_reg_addr(adapter, val);
> +	rsrc->noirq_dyn_ctl =
> +		libie_pci_get_mmio_addr(&adapter->ctlq_ctx.mmio_info, val);
>  
>  	val = PF_GLINT_DYN_CTL_WB_ON_ITR_M | PF_GLINT_DYN_CTL_INTENA_MSK_M |
>  	      FIELD_PREP(PF_GLINT_DYN_CTL_ITR_INDX_M, IDPF_NO_ITR_UPDATE_IDX);
> @@ -143,7 +146,9 @@ static int idpf_intr_reg_init(struct idpf_vport *vport,
>   */
>  static void idpf_reset_reg_init(struct idpf_adapter *adapter)
>  {
> -	adapter->reset_reg.rstat = idpf_get_rstat_reg_addr(adapter, PFGEN_RSTAT);
> +	adapter->reset_reg.rstat =
> +		libie_pci_get_mmio_addr(&adapter->ctlq_ctx.mmio_info,
> +					PFGEN_RSTAT);
>  	adapter->reset_reg.rstat_m = PFGEN_RSTAT_PFR_STATE_M;
>  }
>  
> @@ -155,11 +160,11 @@ static void idpf_reset_reg_init(struct idpf_adapter *adapter)
>  static void idpf_trigger_reset(struct idpf_adapter *adapter,
>  			       enum idpf_flags __always_unused trig_cause)
>  {
> -	u32 reset_reg;
> +	void __iomem *addr;
>  
> -	reset_reg = readl(idpf_get_rstat_reg_addr(adapter, PFGEN_CTRL));
> -	writel(reset_reg | PFGEN_CTRL_PFSWR,
> -	       idpf_get_rstat_reg_addr(adapter, PFGEN_CTRL));
> +	addr = libie_pci_get_mmio_addr(&adapter->ctlq_ctx.mmio_info,
> +				       PFGEN_CTRL);
> +	writel(readl(addr) | PFGEN_CTRL_PFSWR, addr);
>  }
>  
>  /**
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_idc.c b/drivers/net/ethernet/intel/idpf/idpf_idc.c
> index b7d6b08fc89e..0a7edb783758 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_idc.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_idc.c
> @@ -416,9 +416,12 @@ idpf_idc_init_msix_data(struct idpf_adapter *adapter)
>  int idpf_idc_init_aux_core_dev(struct idpf_adapter *adapter,
>  			       enum iidc_function_type ftype)
>  {
> +	struct libie_mmio_info *mmio = &adapter->ctlq_ctx.mmio_info;
>  	struct iidc_rdma_core_dev_info *cdev_info;
>  	struct iidc_rdma_priv_dev_info *privd;
> -	int err, i;
> +	struct libie_pci_mmio_region *mr;
> +	size_t num_mem_regions;
> +	int err, i = 0;
>  
>  	adapter->cdev_info = kzalloc_obj(*cdev_info);
>  	if (!adapter->cdev_info)
> @@ -436,22 +439,37 @@ int idpf_idc_init_aux_core_dev(struct idpf_adapter *adapter,
>  	cdev_info->rdma_protocol = IIDC_RDMA_PROTOCOL_ROCEV2;
>  	privd->ftype = ftype;
>  
> +	num_mem_regions = list_count_nodes(&mmio->mmio_list);
> +	if (num_mem_regions <= IDPF_MMIO_REG_NUM_STATIC) {
> +		err = -EINVAL;
> +		goto err_plug_aux_dev;
> +	}
> +
> +	num_mem_regions -= IDPF_MMIO_REG_NUM_STATIC;
>  	privd->mapped_mem_regions =
>  		kzalloc_objs(struct iidc_rdma_lan_mapped_mem_region,
> -			     adapter->hw.num_lan_regs);
> +			     num_mem_regions);
>  	if (!privd->mapped_mem_regions) {
>  		err = -ENOMEM;
>  		goto err_plug_aux_dev;
>  	}
>  
> -	privd->num_memory_regions = cpu_to_le16(adapter->hw.num_lan_regs);
> -	for (i = 0; i < adapter->hw.num_lan_regs; i++) {
> -		privd->mapped_mem_regions[i].region_addr =
> -			adapter->hw.lan_regs[i].vaddr;
> -		privd->mapped_mem_regions[i].size =
> -			cpu_to_le64(adapter->hw.lan_regs[i].addr_len);
> -		privd->mapped_mem_regions[i].start_offset =
> -			cpu_to_le64(adapter->hw.lan_regs[i].addr_start);
> +	privd->num_memory_regions = cpu_to_le16(num_mem_regions);
> +	list_for_each_entry(mr, &mmio->mmio_list, list) {
> +		struct resource *static_regs = adapter->dev_ops.static_reg_info;
> +		bool is_static = false;
> +
> +		for (uint j = 0; j < IDPF_MMIO_REG_NUM_STATIC; j++)
> +			if (mr->offset == static_regs[j].start)
> +				is_static = true;
> +
> +		if (is_static)
> +			continue;
> +
> +		privd->mapped_mem_regions[i].region_addr = mr->addr;
> +		privd->mapped_mem_regions[i].size = cpu_to_le64(mr->size);
> +		privd->mapped_mem_regions[i++].start_offset =
> +						cpu_to_le64(mr->offset);
>  	}
>  
>  	idpf_idc_init_msix_data(adapter);
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c
> index d88ca59edf97..875472ae77fd 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_lib.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c
> @@ -1847,15 +1847,14 @@ void idpf_deinit_task(struct idpf_adapter *adapter)
>  
>  /**
>   * idpf_check_reset_complete - check that reset is complete
> - * @hw: pointer to hw struct
> + * @adapter: adapter to check
>   * @reset_reg: struct with reset registers
>   *
>   * Returns 0 if device is ready to use, or -EBUSY if it's in reset.
>   **/
> -static int idpf_check_reset_complete(struct idpf_hw *hw,
> +static int idpf_check_reset_complete(struct idpf_adapter *adapter,
>  				     struct idpf_reset_reg *reset_reg)
>  {
> -	struct idpf_adapter *adapter = hw->back;
>  	int i;
>  
>  	for (i = 0; i < 2000; i++) {
> @@ -1918,7 +1917,7 @@ static void idpf_init_hard_reset(struct idpf_adapter *adapter)
>  	}
>  
>  	/* Wait for reset to complete */
> -	err = idpf_check_reset_complete(&adapter->hw, &adapter->reset_reg);
> +	err = idpf_check_reset_complete(adapter, &adapter->reset_reg);
>  	if (err) {
>  		dev_err(dev, "The driver was unable to contact the device's firmware. Check that the FW is running. Driver state= 0x%x\n",
>  			adapter->state);
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_main.c b/drivers/net/ethernet/intel/idpf/idpf_main.c
> index ab3c409e587b..93b11fb1609f 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_main.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_main.c
> @@ -15,6 +15,8 @@
>  
>  MODULE_DESCRIPTION(DRV_SUMMARY);
>  MODULE_IMPORT_NS("LIBETH");
> +MODULE_IMPORT_NS("LIBIE_CP");
> +MODULE_IMPORT_NS("LIBIE_PCI");
>  MODULE_IMPORT_NS("LIBETH_XDP");
>  MODULE_LICENSE("GPL");
>  
> @@ -56,8 +58,16 @@ static int idpf_get_device_type(struct pci_dev *pdev)
>  static int idpf_dev_init(struct idpf_adapter *adapter,
>  			 const struct pci_device_id *ent)
>  {
> +	struct libie_mmio_info *mmio_info = &adapter->ctlq_ctx.mmio_info;
>  	int ret;
>  
> +	ret = libie_pci_init_dev(adapter->pdev);
> +	if (ret)
> +		return ret;
> +
> +	mmio_info->pdev = adapter->pdev;
> +	INIT_LIST_HEAD(&mmio_info->mmio_list);
> +
>  	if (ent->class == IDPF_CLASS_NETWORK_ETHERNET_PROGIF) {
>  		ret = idpf_get_device_type(adapter->pdev);
>  		switch (ret) {
> @@ -90,6 +100,15 @@ static int idpf_dev_init(struct idpf_adapter *adapter,
>  	return 0;
>  }
>  
> +/**
> + * idpf_decfg_device - deconfigure device and device specific resources
> + * @adapter: driver specific private structure
> + */
> +static void idpf_decfg_device(struct idpf_adapter *adapter)
> +{
> +	libie_pci_unmap_all_mmio_regions(&adapter->ctlq_ctx.mmio_info);
> +}
> +
>  /**
>   * idpf_remove - Device removal routine
>   * @pdev: PCI device information struct
> @@ -159,6 +178,7 @@ static void idpf_remove(struct pci_dev *pdev)
>  	mutex_destroy(&adapter->queue_lock);
>  	mutex_destroy(&adapter->vc_buf_lock);
>  
> +	idpf_decfg_device(adapter);
>  	pci_set_drvdata(pdev, NULL);
>  	kfree(adapter);
>  }
> @@ -181,46 +201,45 @@ static void idpf_shutdown(struct pci_dev *pdev)
>  }
>  
>  /**
> - * idpf_cfg_hw - Initialize HW struct
> - * @adapter: adapter to setup hw struct for
> + * idpf_cfg_device - configure device and device specific resources
> + * @adapter: driver specific private structure
>   *
> - * Returns 0 on success, negative on failure
> + * Return: %0 on success, -%errno on failure.
>   */
> -static int idpf_cfg_hw(struct idpf_adapter *adapter)
> +static int idpf_cfg_device(struct idpf_adapter *adapter)
>  {
> -	resource_size_t res_start, mbx_start, rstat_start;
> +	struct libie_mmio_info *mmio_info = &adapter->ctlq_ctx.mmio_info;
>  	struct pci_dev *pdev = adapter->pdev;
> -	struct idpf_hw *hw = &adapter->hw;
> -	struct device *dev = &pdev->dev;
> -	long len;
> -
> -	res_start = pci_resource_start(pdev, 0);
> +	struct resource *region;
> +	bool mapped = false;
> +	int err;
>  
>  	/* Map mailbox space for virtchnl communication */
> -	mbx_start = res_start + adapter->dev_ops.static_reg_info[0].start;
> -	len = resource_size(&adapter->dev_ops.static_reg_info[0]);
> -	hw->mbx.vaddr = devm_ioremap(dev, mbx_start, len);
> -	if (!hw->mbx.vaddr) {
> -		pci_err(pdev, "failed to allocate BAR0 mbx region\n");
> -
> +	region = &adapter->dev_ops.static_reg_info[0];
> +	mapped = libie_pci_map_mmio_region(mmio_info, region->start,
> +					   resource_size(region));
> +	if (!mapped) {
> +		pci_err(pdev, "failed to map BAR0 mbx region\n");
>  		return -ENOMEM;
>  	}
> -	hw->mbx.addr_start = adapter->dev_ops.static_reg_info[0].start;
> -	hw->mbx.addr_len = len;
>  
>  	/* Map rstat space for resets */
> -	rstat_start = res_start + adapter->dev_ops.static_reg_info[1].start;
> -	len = resource_size(&adapter->dev_ops.static_reg_info[1]);
> -	hw->rstat.vaddr = devm_ioremap(dev, rstat_start, len);
> -	if (!hw->rstat.vaddr) {
> -		pci_err(pdev, "failed to allocate BAR0 rstat region\n");
> +	region = &adapter->dev_ops.static_reg_info[1];
>  
> +	mapped = libie_pci_map_mmio_region(mmio_info, region->start,
> +					   resource_size(region));
> +	if (!mapped) {
> +		pci_err(pdev, "failed to map BAR0 rstat region\n");
> +		libie_pci_unmap_all_mmio_regions(mmio_info);
>  		return -ENOMEM;
>  	}
> -	hw->rstat.addr_start = adapter->dev_ops.static_reg_info[1].start;
> -	hw->rstat.addr_len = len;
>  
> -	hw->back = adapter;
> +	err = pci_enable_ptm(pdev);
> +	if (err)
> +		pci_dbg(pdev, "PCIe PTM is not supported by PCIe bus/controller\n");
> +
> +	pci_set_drvdata(pdev, adapter);
> +	adapter->hw.back = adapter;
>  
>  	return 0;
>  }
> @@ -246,32 +265,21 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	adapter->req_rx_splitq = true;
>  
>  	adapter->pdev = pdev;
> -	err = pcim_enable_device(pdev);
> -	if (err)
> -		goto err_free;
>  
> -	err = pcim_request_region(pdev, 0, pci_name(pdev));
> +	err = idpf_dev_init(adapter, ent);
>  	if (err) {
> -		pci_err(pdev, "pcim_request_region failed %pe\n", ERR_PTR(err));
> -
> +		dev_err(&pdev->dev, "Unexpected dev ID 0x%x in idpf probe\n",
> +			ent->device);
>  		goto err_free;
>  	}
>  
> -	err = pci_enable_ptm(pdev);
> -	if (err)
> -		pci_dbg(pdev, "PCIe PTM is not supported by PCIe bus/controller\n");
> -
> -	/* set up for high or low dma */
> -	err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> +	err = idpf_cfg_device(adapter);
>  	if (err) {
> -		pci_err(pdev, "DMA configuration failed: %pe\n", ERR_PTR(err));
> -
> +		pci_err(pdev, "Failed to configure device specific resources: %pe\n",
> +			ERR_PTR(err));
>  		goto err_free;
>  	}
>  
> -	pci_set_master(pdev);
> -	pci_set_drvdata(pdev, adapter);
> -
>  	adapter->init_wq = alloc_workqueue("%s-%s-init",
>  					   WQ_UNBOUND | WQ_MEM_RECLAIM, 0,
>  					   dev_driver_string(dev),
> @@ -279,7 +287,7 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	if (!adapter->init_wq) {
>  		dev_err(dev, "Failed to allocate init workqueue\n");
>  		err = -ENOMEM;
> -		goto err_free;
> +		goto err_init_wq;
>  	}
>  
>  	adapter->serv_wq = alloc_workqueue("%s-%s-service",
> @@ -324,20 +332,6 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	/* setup msglvl */
>  	adapter->msg_enable = netif_msg_init(-1, IDPF_AVAIL_NETIF_M);
>  
> -	err = idpf_dev_init(adapter, ent);
> -	if (err) {
> -		dev_err(&pdev->dev, "Unexpected dev ID 0x%x in idpf probe\n",
> -			ent->device);
> -		goto destroy_vc_event_wq;
> -	}
> -
> -	err = idpf_cfg_hw(adapter);
> -	if (err) {
> -		dev_err(dev, "Failed to configure HW structure for adapter: %d\n",
> -			err);
> -		goto destroy_vc_event_wq;
> -	}
> -
>  	mutex_init(&adapter->vport_ctrl_lock);
>  	mutex_init(&adapter->vector_lock);
>  	mutex_init(&adapter->queue_lock);
> @@ -356,8 +350,6 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  
>  	return 0;
>  
> -destroy_vc_event_wq:
> -	destroy_workqueue(adapter->vc_event_wq);
>  err_vc_event_wq_alloc:
>  	destroy_workqueue(adapter->stats_wq);
>  err_stats_wq_alloc:
> @@ -366,6 +358,8 @@ static int idpf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	destroy_workqueue(adapter->serv_wq);
>  err_serv_wq_alloc:
>  	destroy_workqueue(adapter->init_wq);
> +err_init_wq:
> +	idpf_decfg_device(adapter);
>  err_free:
>  	kfree(adapter);
>  	return err;
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c b/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c
> index a07d7e808ca9..98b8f678bd9a 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c
> @@ -15,31 +15,28 @@
>  static void idpf_vf_ctlq_reg_init(struct idpf_adapter *adapter,
>  				  struct idpf_ctlq_create_info *cq)
>  {
> -	resource_size_t mbx_start = adapter->dev_ops.static_reg_info[0].start;
> -	int i;
> -
> -	for (i = 0; i < IDPF_NUM_DFLT_MBX_Q; i++) {
> +	for (int i = 0; i < IDPF_NUM_DFLT_MBX_Q; i++) {
>  		struct idpf_ctlq_create_info *ccq = cq + i;
>  
>  		switch (ccq->type) {
>  		case IDPF_CTLQ_TYPE_MAILBOX_TX:
>  			/* set head and tail registers in our local struct */
> -			ccq->reg.head = VF_ATQH - mbx_start;
> -			ccq->reg.tail = VF_ATQT - mbx_start;
> -			ccq->reg.len = VF_ATQLEN - mbx_start;
> -			ccq->reg.bah = VF_ATQBAH - mbx_start;
> -			ccq->reg.bal = VF_ATQBAL - mbx_start;
> +			ccq->reg.head = VF_ATQH;
> +			ccq->reg.tail = VF_ATQT;
> +			ccq->reg.len = VF_ATQLEN;
> +			ccq->reg.bah = VF_ATQBAH;
> +			ccq->reg.bal = VF_ATQBAL;
>  			ccq->reg.len_mask = VF_ATQLEN_ATQLEN_M;
>  			ccq->reg.len_ena_mask = VF_ATQLEN_ATQENABLE_M;
>  			ccq->reg.head_mask = VF_ATQH_ATQH_M;
>  			break;
>  		case IDPF_CTLQ_TYPE_MAILBOX_RX:
>  			/* set head and tail registers in our local struct */
> -			ccq->reg.head = VF_ARQH - mbx_start;
> -			ccq->reg.tail = VF_ARQT - mbx_start;
> -			ccq->reg.len = VF_ARQLEN - mbx_start;
> -			ccq->reg.bah = VF_ARQBAH - mbx_start;
> -			ccq->reg.bal = VF_ARQBAL - mbx_start;
> +			ccq->reg.head = VF_ARQH;
> +			ccq->reg.tail = VF_ARQT;
> +			ccq->reg.len = VF_ARQLEN;
> +			ccq->reg.bah = VF_ARQBAH;
> +			ccq->reg.bal = VF_ARQBAL;
>  			ccq->reg.len_mask = VF_ARQLEN_ARQLEN_M;
>  			ccq->reg.len_ena_mask = VF_ARQLEN_ARQENABLE_M;
>  			ccq->reg.head_mask = VF_ARQH_ARQH_M;
> @@ -56,13 +53,14 @@ static void idpf_vf_ctlq_reg_init(struct idpf_adapter *adapter,
>   */
>  static void idpf_vf_mb_intr_reg_init(struct idpf_adapter *adapter)
>  {
> +	struct libie_mmio_info *mmio = &adapter->ctlq_ctx.mmio_info;
>  	struct idpf_intr_reg *intr = &adapter->mb_vector.intr_reg;
>  	u32 dyn_ctl = le32_to_cpu(adapter->caps.mailbox_dyn_ctl);
>  
> -	intr->dyn_ctl = idpf_get_reg_addr(adapter, dyn_ctl);
> +	intr->dyn_ctl = libie_pci_get_mmio_addr(mmio, dyn_ctl);
>  	intr->dyn_ctl_intena_m = VF_INT_DYN_CTL0_INTENA_M;
>  	intr->dyn_ctl_itridx_m = VF_INT_DYN_CTL0_ITR_INDX_M;
> -	intr->icr_ena = idpf_get_reg_addr(adapter, VF_INT_ICR0_ENA1);
> +	intr->icr_ena = libie_pci_get_mmio_addr(mmio, VF_INT_ICR0_ENA1);
>  	intr->icr_ena_ctlq_m = VF_INT_ICR0_ENA1_ADMINQ_M;
>  }
>  
> @@ -77,6 +75,7 @@ static int idpf_vf_intr_reg_init(struct idpf_vport *vport,
>  	struct idpf_adapter *adapter = vport->adapter;
>  	u16 num_vecs = rsrc->num_q_vectors;
>  	struct idpf_vec_regs *reg_vals;
> +	struct libie_mmio_info *mmio;
>  	int num_regs, i, err = 0;
>  	u32 rx_itr, tx_itr, val;
>  	u16 total_vecs;
> @@ -92,14 +91,17 @@ static int idpf_vf_intr_reg_init(struct idpf_vport *vport,
>  		goto free_reg_vals;
>  	}
>  
> +	mmio = &adapter->ctlq_ctx.mmio_info;
> +
>  	for (i = 0; i < num_vecs; i++) {
>  		struct idpf_q_vector *q_vector = &rsrc->q_vectors[i];
>  		u16 vec_id = rsrc->q_vector_idxs[i] - IDPF_MBX_Q_VEC;
>  		struct idpf_intr_reg *intr = &q_vector->intr_reg;
> +		struct idpf_vec_regs *reg = &reg_vals[vec_id];
>  		u32 spacing;
>  
> -		intr->dyn_ctl = idpf_get_reg_addr(adapter,
> -						  reg_vals[vec_id].dyn_ctl_reg);
> +		intr->dyn_ctl = libie_pci_get_mmio_addr(mmio,
> +							reg->dyn_ctl_reg);
>  		intr->dyn_ctl_intena_m = VF_INT_DYN_CTLN_INTENA_M;
>  		intr->dyn_ctl_intena_msk_m = VF_INT_DYN_CTLN_INTENA_MSK_M;
>  		intr->dyn_ctl_itridx_s = VF_INT_DYN_CTLN_ITR_INDX_S;
> @@ -109,22 +111,21 @@ static int idpf_vf_intr_reg_init(struct idpf_vport *vport,
>  		intr->dyn_ctl_sw_itridx_ena_m =
>  			VF_INT_DYN_CTLN_SW_ITR_INDX_ENA_M;
>  
> -		spacing = IDPF_ITR_IDX_SPACING(reg_vals[vec_id].itrn_index_spacing,
> +		spacing = IDPF_ITR_IDX_SPACING(reg->itrn_index_spacing,
>  					       IDPF_VF_ITR_IDX_SPACING);
>  		rx_itr = VF_INT_ITRN_ADDR(VIRTCHNL2_ITR_IDX_0,
> -					  reg_vals[vec_id].itrn_reg,
> -					  spacing);
> +					  reg->itrn_reg, spacing);
>  		tx_itr = VF_INT_ITRN_ADDR(VIRTCHNL2_ITR_IDX_1,
> -					  reg_vals[vec_id].itrn_reg,
> -					  spacing);
> -		intr->rx_itr = idpf_get_reg_addr(adapter, rx_itr);
> -		intr->tx_itr = idpf_get_reg_addr(adapter, tx_itr);
> +					  reg->itrn_reg, spacing);
> +		intr->rx_itr = libie_pci_get_mmio_addr(mmio, rx_itr);
> +		intr->tx_itr = libie_pci_get_mmio_addr(mmio, tx_itr);
>  	}
>  
>  	/* Data vector for NOIRQ queues */
>  
>  	val = reg_vals[rsrc->q_vector_idxs[i] - IDPF_MBX_Q_VEC].dyn_ctl_reg;
> -	rsrc->noirq_dyn_ctl = idpf_get_reg_addr(adapter, val);
> +	rsrc->noirq_dyn_ctl =
> +		libie_pci_get_mmio_addr(&adapter->ctlq_ctx.mmio_info, val);
>  
>  	val = VF_INT_DYN_CTLN_WB_ON_ITR_M | VF_INT_DYN_CTLN_INTENA_MSK_M |
>  	      FIELD_PREP(VF_INT_DYN_CTLN_ITR_INDX_M, IDPF_NO_ITR_UPDATE_IDX);
> @@ -142,7 +143,9 @@ static int idpf_vf_intr_reg_init(struct idpf_vport *vport,
>   */
>  static void idpf_vf_reset_reg_init(struct idpf_adapter *adapter)
>  {
> -	adapter->reset_reg.rstat = idpf_get_rstat_reg_addr(adapter, VFGEN_RSTAT);
> +	adapter->reset_reg.rstat =
> +		libie_pci_get_mmio_addr(&adapter->ctlq_ctx.mmio_info,
> +					VFGEN_RSTAT);
>  	adapter->reset_reg.rstat_m = VFGEN_RSTAT_VFR_STATE_M;
>  }
>  
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> index d4546d62cca9..3e6411a07e4d 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> @@ -2,6 +2,7 @@
>  /* Copyright (C) 2023 Intel Corporation */
>  
>  #include <linux/export.h>
> +#include <linux/intel/libie/pci.h>
>  #include <net/libeth/rx.h>
>  
>  #include "idpf.h"
> @@ -1020,12 +1021,46 @@ static int idpf_send_get_caps_msg(struct idpf_adapter *adapter)
>  }
>  
>  /**
> - * idpf_send_get_lan_memory_regions - Send virtchnl get LAN memory regions msg
> + * idpf_mmio_region_non_static - Check if region is not static
> + * @mmio_info: PCI resources info
> + * @reg: region to check
> + *
> + * Return: %true if region can be received through virtchnl command,
> + *	   %false if region is related to mailbox or resetting
> + */
> +static bool idpf_mmio_region_non_static(struct libie_mmio_info *mmio_info,
> +					struct libie_pci_mmio_region *reg)
> +{
> +	struct idpf_adapter *adapter =
> +		container_of(mmio_info, struct idpf_adapter,
> +			     ctlq_ctx.mmio_info);
> +
> +	for (uint i = 0; i < IDPF_MMIO_REG_NUM_STATIC; i++) {
> +		if (reg->bar_idx == 0 &&
> +		    reg->offset == adapter->dev_ops.static_reg_info[i].start)
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
> +/**
> + * idpf_decfg_lan_memory_regions - Unmap non-static memory regions
> + * @adapter: Driver specific private structure
> + */
> +static void idpf_decfg_lan_memory_regions(struct idpf_adapter *adapter)
> +{
> +	libie_pci_unmap_fltr_regs(&adapter->ctlq_ctx.mmio_info,
> +				  idpf_mmio_region_non_static);
> +}
> +
> +/**
> + * idpf_cfg_lan_memory_regions - Send virtchnl get LAN memory regions msg
>   * @adapter: Driver specific private struct
>   *
>   * Return: 0 on success or error code on failure.
>   */
> -static int idpf_send_get_lan_memory_regions(struct idpf_adapter *adapter)
> +static int idpf_cfg_lan_memory_regions(struct idpf_adapter *adapter)
>  {
>  	struct virtchnl2_get_lan_memory_regions *rcvd_regions __free(kfree);
>  	struct idpf_vc_xn_params xn_params = {
> @@ -1037,7 +1072,6 @@ static int idpf_send_get_lan_memory_regions(struct idpf_adapter *adapter)
>  		.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
>  	};
>  	int num_regions, size;
> -	struct idpf_hw *hw;
>  	ssize_t reply_sz;
>  	int err = 0;
>  
> @@ -1060,86 +1094,56 @@ static int idpf_send_get_lan_memory_regions(struct idpf_adapter *adapter)
>  	if (size > IDPF_CTLQ_MAX_BUF_LEN)
>  		return -EINVAL;
>  
> -	hw = &adapter->hw;
> -	hw->lan_regs = kzalloc_objs(*hw->lan_regs, num_regions);
> -	if (!hw->lan_regs)
> -		return -ENOMEM;
> -
>  	for (int i = 0; i < num_regions; i++) {
> -		hw->lan_regs[i].addr_len =
> -			le64_to_cpu(rcvd_regions->mem_reg[i].size);
> -		hw->lan_regs[i].addr_start =
> -			le64_to_cpu(rcvd_regions->mem_reg[i].start_offset);
> +		struct libie_mmio_info *mmio = &adapter->ctlq_ctx.mmio_info;
> +		resource_size_t offset, len;
> +
> +		offset = le64_to_cpu(rcvd_regions->mem_reg[i].start_offset);
> +		len = le64_to_cpu(rcvd_regions->mem_reg[i].size);
> +		if (len && !libie_pci_map_mmio_region(mmio, offset, len)) {
> +			idpf_decfg_lan_memory_regions(adapter);
> +			return -EIO;
> +		}
>  	}
> -	hw->num_lan_regs = num_regions;
>  
>  	return err;
>  }
>  
>  /**
> - * idpf_calc_remaining_mmio_regs - calculate MMIO regions outside mbx and rstat
> + * idpf_map_remaining_mmio_regs - map MMIO regions outside mbx and rstat
>   * @adapter: Driver specific private structure
>   *
> - * Called when idpf_send_get_lan_memory_regions is not supported. This will
> + * Called when idpf_cfg_lan_memory_regions is not supported. This will
>   * calculate the offsets and sizes for the regions before, in between, and
>   * after the mailbox and rstat MMIO mappings.
>   *
>   * Return: 0 on success or error code on failure.
>   */
> -static int idpf_calc_remaining_mmio_regs(struct idpf_adapter *adapter)
> +static int idpf_map_remaining_mmio_regs(struct idpf_adapter *adapter)
>  {
>  	struct resource *rstat_reg = &adapter->dev_ops.static_reg_info[1];
>  	struct resource *mbx_reg = &adapter->dev_ops.static_reg_info[0];
> -	struct idpf_hw *hw = &adapter->hw;
> -
> -	hw->num_lan_regs = IDPF_MMIO_MAP_FALLBACK_MAX_REMAINING;
> -	hw->lan_regs = kzalloc_objs(*hw->lan_regs, hw->num_lan_regs);
> -	if (!hw->lan_regs)
> -		return -ENOMEM;
> +	struct libie_mmio_info *mmio = &adapter->ctlq_ctx.mmio_info;
> +	resource_size_t reg_start, size;
> +	bool ok = true;
>  
>  	/* Region preceding mailbox */
> -	hw->lan_regs[0].addr_start = 0;
> -	hw->lan_regs[0].addr_len = mbx_reg->start;
> -	/* Region between mailbox and rstat */
> -	hw->lan_regs[1].addr_start = mbx_reg->end + 1;
> -	hw->lan_regs[1].addr_len = rstat_reg->start -
> -					hw->lan_regs[1].addr_start;
> -	/* Region after rstat */
> -	hw->lan_regs[2].addr_start = rstat_reg->end + 1;
> -	hw->lan_regs[2].addr_len = pci_resource_len(adapter->pdev, 0) -
> -					hw->lan_regs[2].addr_start;
> -
> -	return 0;
> -}
> -
> -/**
> - * idpf_map_lan_mmio_regs - map remaining LAN BAR regions
> - * @adapter: Driver specific private structure
> - *
> - * Return: 0 on success or error code on failure.
> - */
> -static int idpf_map_lan_mmio_regs(struct idpf_adapter *adapter)
> -{
> -	struct pci_dev *pdev = adapter->pdev;
> -	struct idpf_hw *hw = &adapter->hw;
> -	resource_size_t res_start;
> +	size = mbx_reg->start;
> +	ok &= !size || libie_pci_map_mmio_region(mmio, 0, size);
>  
> -	res_start = pci_resource_start(pdev, 0);
> -
> -	for (int i = 0; i < hw->num_lan_regs; i++) {
> -		resource_size_t start;
> -		long len;
> +	/* Region between mailbox and rstat */
> +	reg_start = mbx_reg->end + 1;
> +	size = rstat_reg->start - reg_start;
> +	ok &= !size || libie_pci_map_mmio_region(mmio, reg_start, size);
>  
> -		len = hw->lan_regs[i].addr_len;
> -		if (!len)
> -			continue;
> -		start = hw->lan_regs[i].addr_start + res_start;
> +	/* Region after rstat */
> +	reg_start = rstat_reg->end + 1;
> +	size = pci_resource_len(adapter->pdev, 0) - reg_start;
> +	ok &= !size || libie_pci_map_mmio_region(mmio, reg_start, size);
>  
> -		hw->lan_regs[i].vaddr = devm_ioremap(&pdev->dev, start, len);
> -		if (!hw->lan_regs[i].vaddr) {
> -			pci_err(pdev, "failed to allocate BAR0 region\n");
> -			return -ENOMEM;
> -		}
> +	if (!ok) {
> +		idpf_decfg_lan_memory_regions(adapter);
> +		return -ENOMEM;
>  	}
>  
>  	return 0;
> @@ -1413,7 +1417,7 @@ static int __idpf_queue_reg_init(struct idpf_vport *vport,
>  				 struct idpf_q_vec_rsrc *rsrc, u32 *reg_vals,
>  				 int num_regs, u32 q_type)
>  {
> -	struct idpf_adapter *adapter = vport->adapter;
> +	struct libie_mmio_info *mmio = &vport->adapter->ctlq_ctx.mmio_info;
>  	int i, j, k = 0;
>  
>  	switch (q_type) {
> @@ -1423,7 +1427,8 @@ static int __idpf_queue_reg_init(struct idpf_vport *vport,
>  
>  			for (j = 0; j < tx_qgrp->num_txq && k < num_regs; j++, k++)
>  				tx_qgrp->txqs[j]->tail =
> -					idpf_get_reg_addr(adapter, reg_vals[k]);
> +					libie_pci_get_mmio_addr(mmio,
> +								reg_vals[k]);
>  		}
>  		break;
>  	case VIRTCHNL2_QUEUE_TYPE_RX:
> @@ -1435,8 +1440,8 @@ static int __idpf_queue_reg_init(struct idpf_vport *vport,
>  				struct idpf_rx_queue *q;
>  
>  				q = rx_qgrp->singleq.rxqs[j];
> -				q->tail = idpf_get_reg_addr(adapter,
> -							    reg_vals[k]);
> +				q->tail = libie_pci_get_mmio_addr(mmio,
> +								  reg_vals[k]);
>  			}
>  		}
>  		break;
> @@ -1449,8 +1454,8 @@ static int __idpf_queue_reg_init(struct idpf_vport *vport,
>  				struct idpf_buf_queue *q;
>  
>  				q = &rx_qgrp->splitq.bufq_sets[j].bufq;
> -				q->tail = idpf_get_reg_addr(adapter,
> -							    reg_vals[k]);
> +				q->tail = libie_pci_get_mmio_addr(mmio,
> +								  reg_vals[k]);
>  			}
>  		}
>  		break;
> @@ -3520,35 +3525,30 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
>  	}
>  
>  	if (idpf_is_cap_ena(adapter, IDPF_OTHER_CAPS, VIRTCHNL2_CAP_LAN_MEMORY_REGIONS)) {
> -		err = idpf_send_get_lan_memory_regions(adapter);
> +		err = idpf_cfg_lan_memory_regions(adapter);
>  		if (err) {
> -			dev_err(&adapter->pdev->dev, "Failed to get LAN memory regions: %d\n",
> +			dev_err(&adapter->pdev->dev, "Failed to configure LAN memory regions: %d\n",
>  				err);
>  			return -EINVAL;
>  		}
>  	} else {
>  		/* Fallback to mapping the remaining regions of the entire BAR */
> -		err = idpf_calc_remaining_mmio_regs(adapter);
> +		err = idpf_map_remaining_mmio_regs(adapter);
>  		if (err) {
> -			dev_err(&adapter->pdev->dev, "Failed to allocate BAR0 region(s): %d\n",
> +			dev_err(&adapter->pdev->dev, "Failed to configure BAR0 region(s): %d\n",
>  				err);
> -			return -ENOMEM;
> +			return err;
>  		}
>  	}
>  
> -	err = idpf_map_lan_mmio_regs(adapter);
> -	if (err) {
> -		dev_err(&adapter->pdev->dev, "Failed to map BAR0 region(s): %d\n",
> -			err);
> -		return -ENOMEM;
> -	}
> -
>  	pci_sriov_set_totalvfs(adapter->pdev, idpf_get_max_vfs(adapter));
>  	num_max_vports = idpf_get_max_vports(adapter);
>  	adapter->max_vports = num_max_vports;
>  	adapter->vports = kzalloc_objs(*adapter->vports, num_max_vports);
> -	if (!adapter->vports)
> -		return -ENOMEM;
> +	if (!adapter->vports) {
> +		err = -ENOMEM;
> +		goto decfg_regions;
> +	}
>  
>  	if (!adapter->netdevs) {
>  		adapter->netdevs = kzalloc_objs(struct net_device *,
> @@ -3614,6 +3614,8 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
>  err_netdev_alloc:
>  	kfree(adapter->vports);
>  	adapter->vports = NULL;
> +decfg_regions:
> +	idpf_decfg_lan_memory_regions(adapter);
>  	return err;
>  
>  init_failed:
> @@ -3647,7 +3649,6 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
>   */
>  void idpf_vc_core_deinit(struct idpf_adapter *adapter)
>  {
> -	struct idpf_hw *hw = &adapter->hw;
>  	bool remove_in_prog;
>  
>  	if (!test_bit(IDPF_VC_CORE_INIT, adapter->flags))
> @@ -3672,12 +3673,10 @@ void idpf_vc_core_deinit(struct idpf_adapter *adapter)
>  
>  	idpf_vport_params_buf_rel(adapter);
>  
> -	kfree(hw->lan_regs);
> -	hw->lan_regs = NULL;
> -
>  	kfree(adapter->vports);
>  	adapter->vports = NULL;
>  
> +	idpf_decfg_lan_memory_regions(adapter);
>  	clear_bit(IDPF_VC_CORE_INIT, adapter->flags);
>  }
>  
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl_ptp.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl_ptp.c
> index d9bcc3f61c65..8d8fb498e092 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl_ptp.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl_ptp.c
> @@ -31,6 +31,7 @@ int idpf_ptp_get_caps(struct idpf_adapter *adapter)
>  		.timeout_ms = IDPF_VC_XN_DEFAULT_TIMEOUT_MSEC,
>  	};
>  	struct virtchnl2_ptp_cross_time_reg_offsets cross_tstamp_offsets;
> +	struct libie_mmio_info *mmio = &adapter->ctlq_ctx.mmio_info;
>  	struct virtchnl2_ptp_clk_adj_reg_offsets clk_adj_offsets;
>  	struct virtchnl2_ptp_clk_reg_offsets clock_offsets;
>  	struct idpf_ptp_secondary_mbx *scnd_mbx;
> @@ -76,19 +77,20 @@ int idpf_ptp_get_caps(struct idpf_adapter *adapter)
>  	clock_offsets = recv_ptp_caps_msg->clk_offsets;
>  
>  	temp_offset = le32_to_cpu(clock_offsets.dev_clk_ns_l);
> -	ptp->dev_clk_regs.dev_clk_ns_l = idpf_get_reg_addr(adapter,
> -							   temp_offset);
> +	ptp->dev_clk_regs.dev_clk_ns_l =
> +		libie_pci_get_mmio_addr(mmio, temp_offset);
>  	temp_offset = le32_to_cpu(clock_offsets.dev_clk_ns_h);
> -	ptp->dev_clk_regs.dev_clk_ns_h = idpf_get_reg_addr(adapter,
> -							   temp_offset);
> +	ptp->dev_clk_regs.dev_clk_ns_h =
> +		libie_pci_get_mmio_addr(mmio, temp_offset);
>  	temp_offset = le32_to_cpu(clock_offsets.phy_clk_ns_l);
> -	ptp->dev_clk_regs.phy_clk_ns_l = idpf_get_reg_addr(adapter,
> -							   temp_offset);
> +	ptp->dev_clk_regs.phy_clk_ns_l =
> +		libie_pci_get_mmio_addr(mmio, temp_offset);
>  	temp_offset = le32_to_cpu(clock_offsets.phy_clk_ns_h);
> -	ptp->dev_clk_regs.phy_clk_ns_h = idpf_get_reg_addr(adapter,
> -							   temp_offset);
> +	ptp->dev_clk_regs.phy_clk_ns_h =
> +		libie_pci_get_mmio_addr(mmio, temp_offset);
>  	temp_offset = le32_to_cpu(clock_offsets.cmd_sync_trigger);
> -	ptp->dev_clk_regs.cmd_sync = idpf_get_reg_addr(adapter, temp_offset);
> +	ptp->dev_clk_regs.cmd_sync =
> +		libie_pci_get_mmio_addr(mmio, temp_offset);
>  
>  cross_tstamp:
>  	access_type = ptp->get_cross_tstamp_access;
> @@ -98,13 +100,14 @@ int idpf_ptp_get_caps(struct idpf_adapter *adapter)
>  	cross_tstamp_offsets = recv_ptp_caps_msg->cross_time_offsets;
>  
>  	temp_offset = le32_to_cpu(cross_tstamp_offsets.sys_time_ns_l);
> -	ptp->dev_clk_regs.sys_time_ns_l = idpf_get_reg_addr(adapter,
> -							    temp_offset);
> +	ptp->dev_clk_regs.sys_time_ns_l =
> +		libie_pci_get_mmio_addr(mmio, temp_offset);
>  	temp_offset = le32_to_cpu(cross_tstamp_offsets.sys_time_ns_h);
> -	ptp->dev_clk_regs.sys_time_ns_h = idpf_get_reg_addr(adapter,
> -							    temp_offset);
> +	ptp->dev_clk_regs.sys_time_ns_h =
> +		libie_pci_get_mmio_addr(mmio, temp_offset);
>  	temp_offset = le32_to_cpu(cross_tstamp_offsets.cmd_sync_trigger);
> -	ptp->dev_clk_regs.cmd_sync = idpf_get_reg_addr(adapter, temp_offset);
> +	ptp->dev_clk_regs.cmd_sync =
> +		libie_pci_get_mmio_addr(mmio, temp_offset);
>  
>  discipline_clock:
>  	access_type = ptp->adj_dev_clk_time_access;
> @@ -115,29 +118,32 @@ int idpf_ptp_get_caps(struct idpf_adapter *adapter)
>  
>  	/* Device clock offsets */
>  	temp_offset = le32_to_cpu(clk_adj_offsets.dev_clk_cmd_type);
> -	ptp->dev_clk_regs.cmd = idpf_get_reg_addr(adapter, temp_offset);
> +	ptp->dev_clk_regs.cmd = libie_pci_get_mmio_addr(mmio, temp_offset);
>  	temp_offset = le32_to_cpu(clk_adj_offsets.dev_clk_incval_l);
> -	ptp->dev_clk_regs.incval_l = idpf_get_reg_addr(adapter, temp_offset);
> +	ptp->dev_clk_regs.incval_l = libie_pci_get_mmio_addr(mmio, temp_offset);
>  	temp_offset = le32_to_cpu(clk_adj_offsets.dev_clk_incval_h);
> -	ptp->dev_clk_regs.incval_h = idpf_get_reg_addr(adapter, temp_offset);
> +	ptp->dev_clk_regs.incval_h = libie_pci_get_mmio_addr(mmio, temp_offset);
>  	temp_offset = le32_to_cpu(clk_adj_offsets.dev_clk_shadj_l);
> -	ptp->dev_clk_regs.shadj_l = idpf_get_reg_addr(adapter, temp_offset);
> +	ptp->dev_clk_regs.shadj_l = libie_pci_get_mmio_addr(mmio, temp_offset);
>  	temp_offset = le32_to_cpu(clk_adj_offsets.dev_clk_shadj_h);
> -	ptp->dev_clk_regs.shadj_h = idpf_get_reg_addr(adapter, temp_offset);
> +	ptp->dev_clk_regs.shadj_h = libie_pci_get_mmio_addr(mmio, temp_offset);
>  
>  	/* PHY clock offsets */
>  	temp_offset = le32_to_cpu(clk_adj_offsets.phy_clk_cmd_type);
> -	ptp->dev_clk_regs.phy_cmd = idpf_get_reg_addr(adapter, temp_offset);
> +	ptp->dev_clk_regs.phy_cmd =
> +		libie_pci_get_mmio_addr(mmio, temp_offset);
>  	temp_offset = le32_to_cpu(clk_adj_offsets.phy_clk_incval_l);
> -	ptp->dev_clk_regs.phy_incval_l = idpf_get_reg_addr(adapter,
> -							   temp_offset);
> +	ptp->dev_clk_regs.phy_incval_l =
> +		libie_pci_get_mmio_addr(mmio, temp_offset);
>  	temp_offset = le32_to_cpu(clk_adj_offsets.phy_clk_incval_h);
> -	ptp->dev_clk_regs.phy_incval_h = idpf_get_reg_addr(adapter,
> -							   temp_offset);
> +	ptp->dev_clk_regs.phy_incval_h =
> +		libie_pci_get_mmio_addr(mmio, temp_offset);
>  	temp_offset = le32_to_cpu(clk_adj_offsets.phy_clk_shadj_l);
> -	ptp->dev_clk_regs.phy_shadj_l = idpf_get_reg_addr(adapter, temp_offset);
> +	ptp->dev_clk_regs.phy_shadj_l =
> +		libie_pci_get_mmio_addr(mmio, temp_offset);
>  	temp_offset = le32_to_cpu(clk_adj_offsets.phy_clk_shadj_h);
> -	ptp->dev_clk_regs.phy_shadj_h = idpf_get_reg_addr(adapter, temp_offset);
> +	ptp->dev_clk_regs.phy_shadj_h =
> +		libie_pci_get_mmio_addr(mmio, temp_offset);
>  
>  	return 0;
>  }
> -- 
> 2.47.1
> 

* Re: [Linaro-mm-sig] Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Christian König @ 2026-05-18  7:34 UTC (permalink / raw)
  To: Barry Song, T.J. Mercier
  Cc: Albert Esteve, Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Sumit Semwal, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
	Benjamin Gaignard, Brian Starkey, John Stultz, Christian Brauner,
	Paul Moore, James Morris, Serge E. Hallyn, Stephen Smalley,
	Ondrej Mosnacek, Shuah Khan, cgroups, linux-doc, linux-kernel,
	linux-media, dri-, linaro-mm-sig, linux-mm, linux-security-module,
	selinux, linux-kselftest, mripard, echanude
In-Reply-To: <CAGsJ_4zyecY6E-=Tm4_couT7uoM9LMcFdTMUPkZAjj4zUKE-dQ@mail.gmail.com>

On 5/16/26 11:19, Barry Song wrote:
> On Thu, May 14, 2026 at 12:35 AM T.J. Mercier <tjmercier@google.com> wrote:
> [...]
>>>> I have a question about this part. Albert I guess you are interested
>>>> only in accounting dmabuf-heap allocations, or do you expect to add
>>>> __GFP_ACCOUNT or mem_cgroup_charge_dmabuf calls to other
>>>> non-dmabuf-heap exporters?
>>>
>>> We're scoping this to dma-buf heaps for now. CMA heaps and the dmem
>>> controller are on the radar for follow-up/parallel work (there will be
>>> dragons and will surely need discussion). For DRM and V4L2 the
>>> long-term intent is migration to heaps, which would make direct
>>> accounting on those paths unnecessary.
>>
>> Ah I see. GEM buffers exported to dmabufs are what I had in mind. I
>> guess this would only leave the odd non-DRM driver with the need to
>> add their own accounting calls, which I don't expect would be a big
>> problem.
>>
> 
> sounds like we still have a long way to go to correctly account for
> various v4l2, drm, GEM, CMA, etc. In patch 1, the charging is done in
> dma_buf_export(), so I guess it covers all dma-buf types except
> dma_heap, but the problem is that it has no remote charging support at
> all?

No, just the other way around.

DMA-buf heaps can be handled here because we know that they are pure system memory and nothing special, so memcg always applies.

dma_buf_export(), on the other hand, handles tons of different use cases, ranging from buffers accounted to dmem, through special resources which aren't even memory, all the way to buffers which can migrate from dmem to memcg and back during their lifetime.

>>> udmabufs are already
>>> memcg-charged, so adding a separate MEMCG_DMABUF would double count.
>>> Are there any other exporters you had in mind that would benefit from
>>> this approach?

Well, apart from DMA-buf, memfd_create() is one of the things which has broken our necks a couple of times in the past.

But thinking more about it: instead of making this DMA-buf heap specific, what if we had a general cgroups function which allows changing the accounting of a buffer referenced by a file descriptor to a different process?

That would cover not only the DMA-buf heaps use case, but also all other DMA-bufs with dmem and whatever we come up with in the future as well.

The only drawback I can see is that DMA-buf heap allocations would be temporarily accounted to the memory allocation daemon, but I don't think that this would be a problem.
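
To make the idea concrete, here is a minimal userspace model of such a charge transfer. It is only a sketch: struct cg, struct buf, buf_charge() and buf_transfer_charge() are hypothetical names invented for illustration, not existing kernel or cgroup interfaces, and limits, locking and file descriptor lookup are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the charge counter of one cgroup. */
struct cg {
	size_t charged;
};

/* Hypothetical stand-in for a buffer referenced by a file descriptor. */
struct buf {
	size_t size;
	struct cg *owner;	/* cgroup currently charged for the buffer */
};

/* Initial charge at allocation time, e.g. against the allocation daemon. */
void buf_charge(struct buf *b, struct cg *cg)
{
	assert(!b->owner);

	cg->charged += b->size;
	b->owner = cg;
}

/* Move the whole charge of @b from its current owner to @to. */
void buf_transfer_charge(struct buf *b, struct cg *to)
{
	if (b->owner == to)
		return;

	if (b->owner) {
		assert(b->owner->charged >= b->size);
		b->owner->charged -= b->size;
	}

	to->charged += b->size;
	b->owner = to;
}
```

In this model the buffer is charged to the allocator's group between buf_charge() and buf_transfer_charge(), which corresponds to the temporary accounting mentioned above.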

Regards,
Christian.

> 
> Thanks
> Barry

* Re: [PATCH net-next v3 05/14] libie: add bookkeeping support for control queue messages
From: Larysa Zaremba @ 2026-05-18  7:24 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev, Tony Nguyen
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	Phani R Burra, przemyslaw.kitszel, aleksander.lobakin,
	sridhar.samudrala, anjali.singhai, michal.swiatkowski,
	maciej.fijalkowski, emil.s.tantilov, madhu.chittim, joshua.a.hay,
	jacob.e.keller, jayaprakash.shanmugam, jiri, horms, corbet,
	richardcochran, linux-doc, Bharath R, Samuel Salin,
	Aleksandr Loktionov
In-Reply-To: <20260515224443.2772147-6-anthony.l.nguyen@intel.com>

On Fri, May 15, 2026 at 03:44:29PM -0700, Tony Nguyen wrote:
> From: Phani R Burra <phani.r.burra@intel.com>
> 
> All send control queue messages are allocated/freed in libie itself and
> tracked with the unique transaction (Xn) ids until they receive response or
> time out. Responses can be received out of order, therefore transactions
> are stored in an array and tracked though a bitmap.
> 
> Pre-allocated DMA memory is used where possible. It reduces the driver
> overhead in handling memory allocation/free and message timeouts.

I have reviewed the Sashiko feedback [0]. Here is why I do not find the feedback
very helpful for this particular patch:

1. "Should the cookie be tracked per-slot instead?" - it is, the xn cookie is a
   combination of the xn manager cookie and the xn index.
2. "If the callback attempts to send a follow-up message" - this is not an
   intended use.
3. "[if] the driver only expects matched responses (providing no default
    handler), will this dereference a NULL pointer" - no, we can expect the
    members of params to be initialized properly.
4. This code is not intended to run in NAPI.
5. "could the hardware eventually read the new payload but process it using the
    old descriptor's opcode, causing control plane data corruption?" - this is
    highly unlikely: the timeout is very long, and if the HW queue stalls for
    that long, data corruption is the least of our concerns.
6. "recv_mem still contains the pointer from a previous successful transaction
    (which was already passed to the caller and freed/consumed), will
    this cause a double free of the page pool receive buffer?" - no, such a
    transaction is treated as timed out and hence does not contain a valid
    recv_mem.
7. "Active transactions in the LIBIE_CTLQ_XN_ASYNC state are unconditionally
    pushed back to the free list, skipping the invocation of xn->resp_cb()." -
    intended, resp_cb() is not supposed to be used for cleanup.

[0] https://sashiko.dev/#/patchset/20260515224443.2772147-1-anthony.l.nguyen%40intel.com
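
For item 1, a minimal userspace sketch of how the 16-bit sw_cookie combines the per-slot cookie with the slot index, using the same bit layout as LIBIE_CTLQ_XN_COOKIE_M (bits 15:8) and LIBIE_CTLQ_XN_INDEX_M (bits 7:0). struct xn_slot and the xn_* helpers are hypothetical names for illustration only; the driver uses FIELD_PREP()/FIELD_GET() on struct libie_ctlq_xn, and it additionally checks the opcode and the transaction state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same layout as the masks in the patch: cookie in 15:8, index in 7:0. */
#define XN_COOKIE_SHIFT	8
#define XN_INDEX_MASK	0x00ffu

/* Hypothetical stand-in for one entry of xnm->ring[]. */
struct xn_slot {
	uint8_t cookie;	/* stamped from the manager counter on allocation */
	uint8_t index;	/* fixed position in the ring */
};

/* Mirrors xn->cookie = xnm->cookie++ in libie_ctlq_xn_pop_free(). */
void xn_slot_claim(struct xn_slot *slot, uint8_t *mgr_cookie)
{
	slot->cookie = (*mgr_cookie)++;
}

/* Build the value that is sent in the descriptor. */
uint16_t xn_sw_cookie(const struct xn_slot *slot)
{
	return (uint16_t)((slot->cookie << XN_COOKIE_SHIFT) | slot->index);
}

/* A response is accepted only if it carries the slot's current cookie. */
bool xn_response_matches(const struct xn_slot *slot, uint16_t sw_cookie)
{
	uint8_t index = sw_cookie & XN_INDEX_MASK;
	uint8_t cookie = sw_cookie >> XN_COOKIE_SHIFT;

	return index == slot->index && cookie == slot->cookie;
}
```

Because the slot is re-stamped every time it is claimed, a late response that still carries the previous cookie no longer matches the reused slot in this model.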


> 
> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> Signed-off-by: Phani R Burra <phani.r.burra@intel.com>
> Co-developed-by: Victor Raj <victor.raj@intel.com>
> Signed-off-by: Victor Raj <victor.raj@intel.com>
> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
> Co-developed-by: Larysa Zaremba <larysa.zaremba@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> Tested-by: Bharath R <bharath.r@intel.com>
> Tested-by: Samuel Salin <Samuel.salin@intel.com>
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> ---
>  drivers/net/ethernet/intel/libie/controlq.c | 599 ++++++++++++++++++++
>  include/linux/intel/libie/controlq.h        | 177 ++++++
>  2 files changed, 776 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/libie/controlq.c b/drivers/net/ethernet/intel/libie/controlq.c
> index 934721c98f34..5b06d797facf 100644
> --- a/drivers/net/ethernet/intel/libie/controlq.c
> +++ b/drivers/net/ethernet/intel/libie/controlq.c
> @@ -609,6 +609,605 @@ u32 libie_ctlq_recv(struct libie_ctlq_info *ctlq, struct libie_ctlq_msg *msg,
>  }
>  EXPORT_SYMBOL_NS_GPL(libie_ctlq_recv, "LIBIE_CP");
>  
> +/**
> + * libie_ctlq_xn_pop_free - get a free Xn entry from the free list
> + * @xnm: Xn transaction manager
> + *
> + * Retrieve a free Xn entry from the free list.
> + *
> + * Return: valid Xn entry pointer or NULL if there are no free Xn entries.
> + */
> +static struct libie_ctlq_xn *
> +libie_ctlq_xn_pop_free(struct libie_ctlq_xn_manager *xnm)
> +{
> +	struct libie_ctlq_xn *xn;
> +	u32 free_idx;
> +
> +	guard(spinlock)(&xnm->free_xns_bm_lock);
> +
> +	if (unlikely(xnm->shutdown))
> +		return NULL;
> +
> +	free_idx = find_next_bit(xnm->free_xns_bm, LIBIE_CTLQ_MAX_XN_ENTRIES,
> +				 0);
> +	if (free_idx == LIBIE_CTLQ_MAX_XN_ENTRIES)
> +		return NULL;
> +
> +	__clear_bit(free_idx, xnm->free_xns_bm);
> +	xn = &xnm->ring[free_idx];
> +	xn->cookie = xnm->cookie++;
> +
> +	return xn;
> +}
> +
> +/**
> + * __libie_ctlq_xn_push_free - unsafely push a Xn entry into the free list
> + * @xnm: Xn transaction manager
> + * @xn: xn entry to be added into the free list
> + */
> +static void __libie_ctlq_xn_push_free(struct libie_ctlq_xn_manager *xnm,
> +				      struct libie_ctlq_xn *xn)
> +{
> +	__set_bit(xn->index, xnm->free_xns_bm);
> +
> +	if (likely(!xnm->shutdown))
> +		return;
> +
> +	if (bitmap_full(xnm->free_xns_bm, LIBIE_CTLQ_MAX_XN_ENTRIES))
> +		complete(&xnm->can_destroy);
> +}
> +
> +/**
> + * libie_ctlq_xn_push_free - push a Xn entry into the free list
> + * @xnm: Xn transaction manager
> + * @xn: xn entry to be added into the free list, not locked
> + *
> + * Safely add a used Xn entry back to the free list.
> + */
> +static void libie_ctlq_xn_push_free(struct libie_ctlq_xn_manager *xnm,
> +				    struct libie_ctlq_xn *xn)
> +{
> +	guard(spinlock)(&xnm->free_xns_bm_lock);
> +
> +	__libie_ctlq_xn_push_free(xnm, xn);
> +}
> +
> +/**
> + * libie_ctlq_xn_deinit_dma - free the DMA memory allocated for send messages
> + * @dev: device pointer
> + * @xnm: pointer to the transaction manager
> + * @num_entries: number of Xn entries to free the DMA for
> + */
> +static void libie_ctlq_xn_deinit_dma(struct device *dev,
> +				     struct libie_ctlq_xn_manager *xnm,
> +				     u32 num_entries)
> +{
> +	for (u32 i = 0; i < num_entries; i++) {
> +		struct libie_ctlq_xn *xn = &xnm->ring[i];
> +
> +		libie_cp_free_dma_mem(dev, xn->dma_mem);
> +		kfree(xn->dma_mem);
> +	}
> +}
> +
> +/**
> + * libie_ctlq_xn_init_dma - pre-allocate DMA memory for send messages that use
> + * stack variables
> + * @dev: device pointer
> + * @xnm: pointer to transaction manager
> + *
> + * Return: %0 on success or error if memory allocation fails
> + */
> +static int libie_ctlq_xn_init_dma(struct device *dev,
> +				  struct libie_ctlq_xn_manager *xnm)
> +{
> +	u32 i;
> +
> +	for (i = 0; i < LIBIE_CTLQ_MAX_XN_ENTRIES; i++) {
> +		struct libie_ctlq_xn *xn = &xnm->ring[i];
> +		struct libie_cp_dma_mem *dma_mem;
> +
> +		dma_mem = kzalloc_obj(*dma_mem);
> +		if (!dma_mem)
> +			goto dealloc_dma;
> +
> +		dma_mem->va = libie_cp_alloc_dma_mem(dev, dma_mem,
> +						     LIBIE_CTLQ_MAX_BUF_LEN);
> +		if (!dma_mem->va) {
> +			kfree(dma_mem);
> +			goto dealloc_dma;
> +		}
> +
> +		xn->dma_mem = dma_mem;
> +	}
> +
> +	return 0;
> +
> +dealloc_dma:
> +	libie_ctlq_xn_deinit_dma(dev, xnm, i);
> +
> +	return -ENOMEM;
> +}
> +
> +/**
> + * libie_ctlq_xn_process_recv - process Xn data in receive message
> + * @params: Xn receive param information to handle a receive message
> + * @ctlq_msg: received control queue message
> + *
> + * Process a control queue receive message and send a complete event
> + * notification.
> + *
> + * Return: true if a message has been processed, false otherwise.
> + */
> +static bool
> +libie_ctlq_xn_process_recv(struct libie_ctlq_xn_recv_params *params,
> +			   struct libie_ctlq_msg *ctlq_msg)
> +{
> +	struct libie_ctlq_xn_manager *xnm = params->xnm;
> +	struct libie_ctlq_xn *xn;
> +	u16 msg_cookie, xn_index;
> +	struct kvec *response;
> +	int status;
> +	u16 data;
> +
> +	data = ctlq_msg->sw_cookie;
> +	xn_index = FIELD_GET(LIBIE_CTLQ_XN_INDEX_M, data);
> +	msg_cookie = FIELD_GET(LIBIE_CTLQ_XN_COOKIE_M, data);
> +	status = ctlq_msg->chnl_retval ? -EFAULT : 0;
> +
> +	xn = &xnm->ring[xn_index];
> +	spin_lock(&xn->xn_lock);
> +	if (ctlq_msg->chnl_opcode != xn->virtchnl_opcode ||
> +	    msg_cookie != xn->cookie) {
> +		spin_unlock(&xn->xn_lock);
> +		return false;
> +	}
> +
> +	if (xn->state != LIBIE_CTLQ_XN_ASYNC &&
> +	    xn->state != LIBIE_CTLQ_XN_WAITING) {
> +		spin_unlock(&xn->xn_lock);
> +		return false;
> +	}
> +
> +	response = &ctlq_msg->recv_mem;
> +	if (xn->state == LIBIE_CTLQ_XN_ASYNC) {
> +		xn->resp_cb(xn->send_ctx, response, status);
> +		libie_ctlq_release_rx_buf(response);
> +		xn->state = LIBIE_CTLQ_XN_IDLE;
> +		spin_unlock(&xn->xn_lock);
> +		libie_ctlq_xn_push_free(xnm, xn);
> +
> +		return true;
> +	}
> +
> +	xn->recv_mem = *response;
> +	xn->state = status ? LIBIE_CTLQ_XN_COMPLETED_FAILED :
> +			     LIBIE_CTLQ_XN_COMPLETED_SUCCESS;
> +
> +	complete(&xn->cmd_completion_event);
> +	spin_unlock(&xn->xn_lock);
> +
> +	return true;
> +}
> +
> +/**
> + * libie_xn_check_async_timeout - Check for asynchronous message timeouts
> + * @xnm: Xn transaction manager
> + *
> + * Call the corresponding callback to notify the caller about the timeout.
> + */
> +static void libie_xn_check_async_timeout(struct libie_ctlq_xn_manager *xnm)
> +{
> +	u32 idx;
> +
> +	for_each_clear_bit(idx, xnm->free_xns_bm, LIBIE_CTLQ_MAX_XN_ENTRIES) {
> +		struct libie_ctlq_xn *xn = &xnm->ring[idx];
> +		u64 timeout_ms;
> +
> +		spin_lock(&xn->xn_lock);
> +
> +		timeout_ms = ktime_ms_delta(ktime_get(), xn->timestamp);
> +		if (xn->state != LIBIE_CTLQ_XN_ASYNC ||
> +		    timeout_ms < xn->timeout_ms) {
> +			spin_unlock(&xn->xn_lock);
> +			continue;
> +		}
> +
> +		xn->resp_cb(xn->send_ctx, NULL, -ETIMEDOUT);
> +		xn->state = LIBIE_CTLQ_XN_IDLE;
> +		spin_unlock(&xn->xn_lock);
> +		libie_ctlq_xn_push_free(xnm, xn);
> +	}
> +}
> +
> +/**
> + * libie_ctlq_xn_recv - process control queue receive message
> + * @params: Xn receive param information to handle a receive message
> + *
> + * Process a receive message and update the receive queue buffer.
> + *
> + * Return: remaining budget.
> + */
> +u32 libie_ctlq_xn_recv(struct libie_ctlq_xn_recv_params *params)
> +{
> +	struct libie_ctlq_msg ctlq_msg;
> +	u32 budget = params->budget;
> +
> +	while (budget && libie_ctlq_recv(params->ctlq, &ctlq_msg, 1)) {
> +		budget--;
> +		if (!libie_ctlq_xn_process_recv(params, &ctlq_msg))
> +			params->ctlq_msg_handler(params->xnm->ctx, &ctlq_msg);
> +	}
> +
> +	libie_ctlq_post_rx_buffs(params->ctlq);
> +	libie_xn_check_async_timeout(params->xnm);
> +
> +	return budget;
> +}
> +EXPORT_SYMBOL_NS_GPL(libie_ctlq_xn_recv, "LIBIE_CP");
> +
> +/**
> + * libie_cp_map_dma_mem - map a given virtual address for DMA
> + * @dev: device information
> + * @va: virtual address to be mapped
> + * @size: size of the memory
> + * @direction: DMA direction either from/to device
> + * @dma_mem: memory for DMA information to be stored
> + *
> + * Return: true on success, false on DMA map failure.
> + */
> +static bool libie_cp_map_dma_mem(struct device *dev, void *va, size_t size,
> +				 int direction,
> +				  struct libie_cp_dma_mem *dma_mem)
> +{
> +	dma_mem->pa = dma_map_single(dev, va, size, direction);
> +
> +	return dma_mapping_error(dev, dma_mem->pa) ? false : true;
> +}
> +
> +/**
> + * libie_cp_unmap_dma_mem - unmap previously mapped DMA address
> + * @dev: device information
> + * @dma_mem: DMA memory information
> + */
> +static void libie_cp_unmap_dma_mem(struct device *dev,
> +				   const struct libie_cp_dma_mem *dma_mem)
> +{
> +	dma_unmap_single(dev, dma_mem->pa, dma_mem->size,
> +			 dma_mem->direction);
> +}
> +
> +/**
> + * libie_ctlq_xn_process_send - process and send a control queue message
> + * @params: Xn send param information for sending a control queue message
> + * @xn: Assigned Xn entry for tracking the control queue message
> + *
> + * Return: %0 on success, -%errno on failure.
> + */
> +static
> +int libie_ctlq_xn_process_send(struct libie_ctlq_xn_send_params *params,
> +			       struct libie_ctlq_xn *xn)
> +{
> +	size_t buf_len = params->send_buf.iov_len;
> +	struct device *dev = params->ctlq->dev;
> +	void *buf = params->send_buf.iov_base;
> +	struct libie_cp_dma_mem *dma_mem;
> +	u16 cookie;
> +
> +	if (!buf || !buf_len)
> +		return -EOPNOTSUPP;
> +
> +	if (libie_cp_can_send_onstack(buf_len)) {
> +		dma_mem = xn->dma_mem;
> +		memcpy(dma_mem->va, buf, buf_len);
> +	} else {
> +		dma_mem = &xn->send_dma_mem;
> +		dma_mem->va = buf;
> +		dma_mem->size = buf_len;
> +		dma_mem->direction = DMA_TO_DEVICE;
> +
> +		if (!libie_cp_map_dma_mem(dev, buf, buf_len, DMA_TO_DEVICE,
> +					  dma_mem))
> +			return -ENOMEM;
> +	}
> +
> +	cookie = FIELD_PREP(LIBIE_CTLQ_XN_COOKIE_M, xn->cookie) |
> +		 FIELD_PREP(LIBIE_CTLQ_XN_INDEX_M, xn->index);
> +
> +	scoped_guard(spinlock, &params->ctlq->lock) {
> +		struct libie_ctlq_info *ctlq = params->ctlq;
> +		struct libie_ctlq_msg *ctlq_msg;
> +
> +		if (!libie_ctlq_send_desc_avail(ctlq)) {
> +			if (!libie_cp_can_send_onstack(buf_len))
> +				libie_cp_unmap_dma_mem(dev, dma_mem);
> +
> +			return -EBUSY;
> +		}
> +
> +		ctlq_msg = ctlq->tx_msg[ctlq->next_to_use];
> +		if (params->ctlq_msg)
> +			*ctlq_msg = *params->ctlq_msg;
> +		else
> +			/* Unused ctlq messages are already zeroed */
> +			ctlq_msg->opcode = LIBIE_CTLQ_SEND_MSG_TO_CP;
> +
> +		ctlq_msg->sw_cookie = cookie;
> +		ctlq_msg->send_mem = *dma_mem;
> +		ctlq_msg->data_len = buf_len;
> +		ctlq_msg->chnl_opcode = params->chnl_opcode;
> +		libie_ctlq_send(params->ctlq, 1);
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * libie_ctlq_xn_send - Function to send a control queue message
> + * @params: Xn send param information for sending a control queue message
> + *
> + * Send a control queue (mailbox or config) message.
> + * Based on the params value, the call can be completed synchronously or
> + * asynchronously.
> + *
> + * Return: %0 on success, -%errno on failure.
> + */
> +int libie_ctlq_xn_send(struct libie_ctlq_xn_send_params *params)
> +{
> +	bool free_send = !libie_cp_can_send_onstack(params->send_buf.iov_len);
> +	struct libie_ctlq_xn *xn;
> +	int ret;
> +
> +	if (params->send_buf.iov_len > LIBIE_CTLQ_MAX_BUF_LEN) {
> +		ret = -EINVAL;
> +		goto free_buf;
> +	}
> +
> +	xn = libie_ctlq_xn_pop_free(params->xnm);
> +	/* no free transactions available */
> +	if (unlikely(!xn)) {
> +		ret = -EAGAIN;
> +		goto free_buf;
> +	}
> +
> +	spin_lock(&xn->xn_lock);
> +	if (xn->state == LIBIE_CTLQ_XN_SHUTDOWN) {
> +		ret = -ENXIO;
> +		goto unlock_xn;
> +	}
> +
> +	xn->state = params->resp_cb ? LIBIE_CTLQ_XN_ASYNC :
> +				      LIBIE_CTLQ_XN_WAITING;
> +	xn->ctlq = params->ctlq;
> +	xn->virtchnl_opcode = params->chnl_opcode;
> +
> +	if (params->resp_cb) {
> +		xn->send_ctx = params->send_ctx;
> +		xn->resp_cb = params->resp_cb;
> +		xn->timeout_ms = params->timeout_ms;
> +		xn->timestamp = ktime_get();
> +	}
> +
> +	ret = libie_ctlq_xn_process_send(params, xn);
> +	if (ret)
> +		goto release_xn;
> +	else
> +		free_send = false;
> +
> +	spin_unlock(&xn->xn_lock);
> +
> +	if (params->resp_cb)
> +		return 0;
> +
> +	wait_for_completion_timeout(&xn->cmd_completion_event,
> +				    msecs_to_jiffies(params->timeout_ms));
> +
> +	spin_lock(&xn->xn_lock);
> +	switch (xn->state) {
> +	case LIBIE_CTLQ_XN_WAITING:
> +		ret = -ETIMEDOUT;
> +		break;
> +	case LIBIE_CTLQ_XN_COMPLETED_SUCCESS:
> +		params->recv_mem = xn->recv_mem;
> +		break;
> +	default:
> +		ret = -EBADMSG;
> +		break;
> +	}
> +
> +	/* Free the receive buffer in case of failure. On timeout, receive
> +	 * buffer is not allocated.
> +	 */
> +	if (ret && ret != -ETIMEDOUT)
> +		libie_ctlq_release_rx_buf(&xn->recv_mem);
> +
> +release_xn:
> +	xn->state = LIBIE_CTLQ_XN_IDLE;
> +	reinit_completion(&xn->cmd_completion_event);
> +unlock_xn:
> +	spin_unlock(&xn->xn_lock);
> +	libie_ctlq_xn_push_free(params->xnm, xn);
> +free_buf:
> +	if (free_send)
> +		params->rel_tx_buf(params->send_buf.iov_base);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_NS_GPL(libie_ctlq_xn_send, "LIBIE_CP");
> +
> +/**
> + * libie_ctlq_xn_send_clean - cleanup the send control queue message buffers
> + * @params: Xn clean param information for send complete handling
> + *
> + * Cleanup the send buffers for the given control queue, if force is set, then
> + * clear all the outstanding send messages irrespective their send status.
> + * Force should be used during deinit or reset.
> + *
> + * Return: number of send buffers cleaned.
> + */
> +u32 libie_ctlq_xn_send_clean(const struct libie_ctlq_xn_clean_params *params)
> +{
> +	struct libie_ctlq_info *ctlq = params->ctlq;
> +	struct device *dev = ctlq->dev;
> +	u32 ntc, i;
> +
> +	spin_lock(&ctlq->lock);
> +	ntc = ctlq->next_to_clean;
> +
> +	for (i = 0; i < params->num_msgs; i++) {
> +		struct libie_ctlq_msg *msg = ctlq->tx_msg[ntc];
> +		struct libie_ctlq_desc *desc;
> +		u64 qword;
> +
> +		desc = &ctlq->descs[ntc];
> +		qword = le64_to_cpu(desc->qword0);
> +
> +		if (!FIELD_GET(LIBIE_CTLQ_DESC_FLAG_DD, qword) &&
> +		    !(unlikely(params->force) && msg->data_len))
> +			break;
> +
> +		dma_rmb();
> +
> +		if (!libie_cp_can_send_onstack(msg->data_len)) {
> +			libie_cp_unmap_dma_mem(dev, &msg->send_mem);
> +			params->rel_tx_buf(msg->send_mem.va);
> +		}
> +
> +		memset(msg, 0, sizeof(*msg));
> +		desc->qword0 = 0;
> +
> +		if (unlikely(++ntc == ctlq->ring_len))
> +			ntc = 0;
> +	}
> +
> +	ctlq->next_to_clean = ntc;
> +	spin_unlock(&ctlq->lock);
> +
> +	return i;
> +}
> +EXPORT_SYMBOL_NS_GPL(libie_ctlq_xn_send_clean, "LIBIE_CP");
> +
> +/**
> + * libie_ctlq_xn_shutdown - terminate control queue transactions
> + * @xnm: pointer to the transaction manager
> + *
> + * Synchronously terminate existing transactions and stop accepting new ones.
> + */
> +void libie_ctlq_xn_shutdown(struct libie_ctlq_xn_manager *xnm)
> +{
> +	bool must_wait = false;
> +	u32 i;
> +
> +	/* Should be no new clear bits after this */
> +	spin_lock(&xnm->free_xns_bm_lock);
> +	xnm->shutdown = true;
> +
> +	for_each_clear_bit(i, xnm->free_xns_bm, LIBIE_CTLQ_MAX_XN_ENTRIES) {
> +		struct libie_ctlq_xn *xn = &xnm->ring[i];
> +
> +		spin_lock(&xn->xn_lock);
> +
> +		switch (xn->state) {
> +		/* if an idle xn is not free, it is about to be either
> +		 * freed or initialized, prevent the latter and wait
> +		 */
> +		case LIBIE_CTLQ_XN_IDLE:
> +			xn->state = LIBIE_CTLQ_XN_SHUTDOWN;
> +			fallthrough;
> +		/* waiting thread possibly needs a push to return the xn,
> +		 * transaction will be reported as timed out
> +		 */
> +		case LIBIE_CTLQ_XN_WAITING:
> +			complete(&xn->cmd_completion_event);
> +			fallthrough;
> +		/* these states will return the xn soon */
> +		case LIBIE_CTLQ_XN_COMPLETED_SUCCESS:
> +		case LIBIE_CTLQ_XN_COMPLETED_FAILED:
> +		case LIBIE_CTLQ_XN_SHUTDOWN:
> +			must_wait = true;
> +			break;
> +		/* no thread should reference async xns at this point */
> +		case LIBIE_CTLQ_XN_ASYNC:
> +			__libie_ctlq_xn_push_free(xnm, xn);
> +			break;
> +		}
> +
> +		spin_unlock(&xn->xn_lock);
> +	}
> +
> +	spin_unlock(&xnm->free_xns_bm_lock);
> +
> +	if (must_wait)
> +		wait_for_completion(&xnm->can_destroy);
> +}
> +EXPORT_SYMBOL_NS_GPL(libie_ctlq_xn_shutdown, "LIBIE_CP");
> +
> +/**
> + * libie_ctlq_xn_deinit - deallocate and free the transaction manager resources
> + * @xnm: pointer to the transaction manager
> + * @ctx: controlq context structure
> + *
> + * All Rx processing must be stopped beforehand.
> + */
> +void libie_ctlq_xn_deinit(struct libie_ctlq_xn_manager *xnm,
> +			  struct libie_ctlq_ctx *ctx)
> +{
> +	libie_ctlq_xn_shutdown(xnm);
> +	libie_ctlq_xn_deinit_dma(&ctx->mmio_info.pdev->dev, xnm,
> +				 LIBIE_CTLQ_MAX_XN_ENTRIES);
> +	kfree(xnm);
> +	libie_ctlq_deinit(ctx);
> +}
> +EXPORT_SYMBOL_NS_GPL(libie_ctlq_xn_deinit, "LIBIE_CP");
> +
> +/**
> + * libie_ctlq_xn_init - initialize the Xn transaction manager
> + * @params: Xn init param information for allocating Xn manager resources
> + *
> + * Return: %0 on success, -%errno on failure.
> + */
> +int libie_ctlq_xn_init(struct libie_ctlq_xn_init_params *params)
> +{
> +	struct libie_ctlq_xn_manager *xnm;
> +	int ret;
> +
> +	ret = libie_ctlq_init(params->ctx, params->cctlq_info, params->num_qs);
> +	if (ret)
> +		return ret;
> +
> +	xnm = kzalloc_obj(*xnm);
> +	if (!xnm)
> +		goto ctlq_deinit;
> +
> +	ret = libie_ctlq_xn_init_dma(&params->ctx->mmio_info.pdev->dev, xnm);
> +	if (ret)
> +		goto free_xnm;
> +
> +	spin_lock_init(&xnm->free_xns_bm_lock);
> +	init_completion(&xnm->can_destroy);
> +	bitmap_fill(xnm->free_xns_bm, LIBIE_CTLQ_MAX_XN_ENTRIES);
> +
> +	for (u32 i = 0; i < LIBIE_CTLQ_MAX_XN_ENTRIES; i++) {
> +		struct libie_ctlq_xn *xn = &xnm->ring[i];
> +
> +		xn->index = i;
> +		init_completion(&xn->cmd_completion_event);
> +		spin_lock_init(&xn->xn_lock);
> +	}
> +	xnm->ctx = params->ctx;
> +	params->xnm = xnm;
> +
> +	return 0;
> +
> +free_xnm:
> +	kfree(xnm);
> +ctlq_deinit:
> +	libie_ctlq_deinit(params->ctx);
> +
> +	return -ENOMEM;
> +}
> +EXPORT_SYMBOL_NS_GPL(libie_ctlq_xn_init, "LIBIE_CP");
> +
>  MODULE_DESCRIPTION("Control Plane communication API");
>  MODULE_IMPORT_NS("LIBETH");
>  MODULE_LICENSE("GPL");
> diff --git a/include/linux/intel/libie/controlq.h b/include/linux/intel/libie/controlq.h
> index a6ed4fa159b1..e355d161ca5e 100644
> --- a/include/linux/intel/libie/controlq.h
> +++ b/include/linux/intel/libie/controlq.h
> @@ -20,6 +20,8 @@
>  #define LIBIE_CTLQ_SEND_MSG_TO_CP		0x801
>  #define LIBIE_CTLQ_SEND_MSG_TO_PEER		0x804
>  
> +#define LIBIE_CP_TX_COPYBREAK		128
> +
>  /**
>   * struct libie_ctlq_ctx - contains controlq info and MMIO region info
>   * @mmio_info: MMIO region info structure
> @@ -60,11 +62,13 @@ struct libie_ctlq_reg {
>   * @va: virtual address
>   * @pa: physical address
>   * @size: memory size
> + * @direction: memory to device or device to memory
>   */
>  struct libie_cp_dma_mem {
>  	void		*va;
>  	dma_addr_t	pa;
>  	size_t		size;
> +	int		direction;
>  };
>  
>  /**
> @@ -246,4 +250,177 @@ u32 libie_ctlq_recv(struct libie_ctlq_info *ctlq, struct libie_ctlq_msg *msg,
>  
>  int libie_ctlq_post_rx_buffs(struct libie_ctlq_info *ctlq);
>  
> +/* Only 8 bits are available in descriptor for Xn index */
> +#define LIBIE_CTLQ_MAX_XN_ENTRIES		256
> +#define LIBIE_CTLQ_XN_COOKIE_M			GENMASK(15, 8)
> +#define LIBIE_CTLQ_XN_INDEX_M			GENMASK(7, 0)
> +
> +/**
> + * enum libie_ctlq_xn_state - Transaction state of a virtchnl message
> + * @LIBIE_CTLQ_XN_IDLE: transaction is available to use
> + * @LIBIE_CTLQ_XN_WAITING: waiting for transaction to complete
> + * @LIBIE_CTLQ_XN_COMPLETED_SUCCESS: transaction completed with success
> + * @LIBIE_CTLQ_XN_COMPLETED_FAILED: transaction completed with failure
> + * @LIBIE_CTLQ_XN_ASYNC: asynchronous virtchnl message transaction type
> + * @LIBIE_CTLQ_XN_SHUTDOWN: transaction cannot be used anymore
> + */
> +enum libie_ctlq_xn_state {
> +	LIBIE_CTLQ_XN_IDLE = 0,
> +	LIBIE_CTLQ_XN_WAITING,
> +	LIBIE_CTLQ_XN_COMPLETED_SUCCESS,
> +	LIBIE_CTLQ_XN_COMPLETED_FAILED,
> +	LIBIE_CTLQ_XN_ASYNC,
> +	LIBIE_CTLQ_XN_SHUTDOWN,
> +};
> +
> +/**
> + * struct libie_ctlq_xn - structure representing a virtchnl transaction entry
> + * @resp_cb: callback to handle the response of an asynchronous virtchnl message
> + * @xn_lock: lock to protect the transaction entry state
> + * @ctlq: send control queue information
> + * @cmd_completion_event: signal when a reply is available
> + * @dma_mem: DMA memory of send buffer that use stack variable
> + * @send_dma_mem: DMA memory of send buffer
> + * @recv_mem: receive buffer
> + * @send_ctx: context for callback function
> + * @timeout_ms: Xn transaction timeout in msecs
> + * @timestamp: timestamp to record the Xn send
> + * @virtchnl_opcode: virtchnl command opcode used for Xn transaction
> + * @state: transaction state of a virtchnl message
> + * @cookie: unique message identifier
> + * @index: index of the transaction entry
> + */
> +struct libie_ctlq_xn {
> +	void (*resp_cb)(void *ctx, struct kvec *mem, int status);
> +	spinlock_t			xn_lock;	/* protects state */
> +	struct libie_ctlq_info		*ctlq;
> +	struct completion		cmd_completion_event;
> +	struct libie_cp_dma_mem	*dma_mem;
> +	struct libie_cp_dma_mem	send_dma_mem;
> +	struct kvec			recv_mem;
> +	void				*send_ctx;
> +	u64				timeout_ms;
> +	ktime_t				timestamp;
> +	u32				virtchnl_opcode;
> +	enum libie_ctlq_xn_state	state;
> +	u8				cookie;
> +	u8				index;
> +};
> +
> +/**
> + * struct libie_ctlq_xn_manager - structure representing the array of virtchnl
> + *				   transaction entries
> + * @ctx: pointer to controlq context structure
> + * @free_xns_bm_lock: lock to protect the free Xn entries bit map
> + * @free_xns_bm: bitmap that represents the free Xn entries
> + * @ring: array of Xn entries
> + * @can_destroy: completion triggered by the last returned transaction
> + * @shutdown: shows the transactions the xnm shutdown is waiting for them
> + * @cookie: unique message identifier
> + */
> +struct libie_ctlq_xn_manager {
> +	struct libie_ctlq_ctx	*ctx;
> +	spinlock_t		free_xns_bm_lock;	/* get/check entries */
> +	DECLARE_BITMAP(free_xns_bm, LIBIE_CTLQ_MAX_XN_ENTRIES);
> +	struct libie_ctlq_xn	ring[LIBIE_CTLQ_MAX_XN_ENTRIES];
> +	struct completion	can_destroy;
> +	bool			shutdown;
> +	u8			cookie;
> +};
> +
> +/**
> + * struct libie_ctlq_xn_send_params - structure representing send Xn entry
> + * @resp_cb: callback to handle the response of an asynchronous virtchnl message
> + * @rel_tx_buf: driver entry point for freeing the send buffer after send
> + * @xnm: Xn manager to process Xn entries
> + * @ctlq: send control queue information
> + * @ctlq_msg: control queue message information
> + * @send_buf: represents the buffer that carries outgoing information
> + * @recv_mem: receive buffer
> + * @send_ctx: context for call back function
> + * @timeout_ms: virtchnl transaction timeout in msecs
> + * @chnl_opcode: virtchnl message opcode
> + */
> +struct libie_ctlq_xn_send_params {
> +	void (*resp_cb)(void *ctx, struct kvec *mem, int status);
> +	void (*rel_tx_buf)(const void *buf_va);
> +	struct libie_ctlq_xn_manager		*xnm;
> +	struct libie_ctlq_info			*ctlq;
> +	struct libie_ctlq_msg			*ctlq_msg;
> +	struct kvec				send_buf;
> +	struct kvec				recv_mem;
> +	void					*send_ctx;
> +	u64					timeout_ms;
> +	u32					chnl_opcode;
> +};
> +
> +/**
> + * libie_cp_can_send_onstack - can a message be sent using a stack variable
> + * @size: ctlq data buffer size
> + *
> + * Return: %true if the message size is small enough for caller to pass
> + *	   an on-stack buffer, %false if kmalloc is needed
> + */
> +static inline bool libie_cp_can_send_onstack(u32 size)
> +{
> +	return size <= LIBIE_CP_TX_COPYBREAK;
> +}
> +
> +/**
> + * struct libie_ctlq_xn_recv_params - structure representing receive Xn entry
> + * @ctlq_msg_handler: callback to handle a message originated from the peer
> + * @xnm: Xn manager to process Xn entries
> + * @ctlq: control queue information
> + * @budget: maximum number of messages to process
> + */
> +struct libie_ctlq_xn_recv_params {
> +	void (*ctlq_msg_handler)(struct libie_ctlq_ctx *ctx,
> +				 struct libie_ctlq_msg *msg);
> +	struct libie_ctlq_xn_manager		*xnm;
> +	struct libie_ctlq_info			*ctlq;
> +	u32					budget;
> +};
> +
> +/**
> + * struct libie_ctlq_xn_clean_params - Data structure used for cleaning the
> + * control queue messages
> + * @rel_tx_buf: driver entry point for freeing the send buffer after send
> + * @ctx: pointer to context structure
> + * @ctlq: control queue information
> + * @send_ctx: context for call back function
> + * @num_msgs: number of messages to be cleaned
> + * @force: clean even if DD is not yet set
> + */
> +struct libie_ctlq_xn_clean_params {
> +	void (*rel_tx_buf)(const void *buf_va);
> +	struct libie_ctlq_ctx			*ctx;
> +	struct libie_ctlq_info			*ctlq;
> +	void					*send_ctx;
> +	u16					num_msgs;
> +	bool					force;
> +};
> +
> +/**
> + * struct libie_ctlq_xn_init_params - Data structure used for initializing the
> + * Xn transaction manager
> + * @cctlq_info: control queue information
> + * @ctx: pointer to controlq context structure
> + * @xnm: Xn manager to process Xn entries
> + * @num_qs: number of control queues needs to initialized
> + */
> +struct libie_ctlq_xn_init_params {
> +	struct libie_ctlq_create_info		*cctlq_info;
> +	struct libie_ctlq_ctx			*ctx;
> +	struct libie_ctlq_xn_manager		*xnm;
> +	u32					num_qs;
> +};
> +
> +int libie_ctlq_xn_init(struct libie_ctlq_xn_init_params *params);
> +void libie_ctlq_xn_deinit(struct libie_ctlq_xn_manager *xnm,
> +			  struct libie_ctlq_ctx *ctx);
> +void libie_ctlq_xn_shutdown(struct libie_ctlq_xn_manager *xnm);
> +int libie_ctlq_xn_send(struct libie_ctlq_xn_send_params *params);
> +u32 libie_ctlq_xn_recv(struct libie_ctlq_xn_recv_params *params);
> +u32 libie_ctlq_xn_send_clean(const struct libie_ctlq_xn_clean_params *params);
> +
>  #endif /* __LIBIE_CONTROLQ_H */
> -- 
> 2.47.1
> 
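
As a rough illustration of the LIBIE_CTLQ_XN_COOKIE_M / LIBIE_CTLQ_XN_INDEX_M
layout quoted above, here is a minimal userspace sketch of packing an 8-bit
cookie and an 8-bit index into one 16-bit value. The helper names (xn_pack,
xn_cookie, xn_index) are hypothetical, and the assumption that both fields
share a single 16-bit descriptor field is mine; the quoted hunk only defines
the masks and does not show how they are applied.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C mirrors of GENMASK(15, 8) and GENMASK(7, 0) from the header. */
#define XN_COOKIE_MASK	0xff00u
#define XN_INDEX_MASK	0x00ffu

/* Hypothetical helper: cookie goes in the high byte, index in the low byte. */
static uint16_t xn_pack(uint8_t cookie, uint8_t index)
{
	return (uint16_t)(((uint16_t)cookie << 8) | index);
}

/* Hypothetical helper: extract the cookie from a packed value. */
static uint8_t xn_cookie(uint16_t packed)
{
	return (uint8_t)((packed & XN_COOKIE_MASK) >> 8);
}

/* Hypothetical helper: extract the ring index from a packed value. */
static uint8_t xn_index(uint16_t packed)
{
	return (uint8_t)(packed & XN_INDEX_MASK);
}
```

With an 8-bit index, at most 256 distinct ring entries can be addressed,
which matches LIBIE_CTLQ_MAX_XN_ENTRIES in the quoted header.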

^ permalink raw reply

* Re: [PATCH RFC 2/5] dma-heap: charge dma-buf memory via explicit memcg
From: Christian König @ 2026-05-18  7:19 UTC (permalink / raw)
  To: T.J. Mercier, Christian Brauner
  Cc: Albert Esteve, Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Shuah Khan, Sumit Semwal, Michal Hocko,
	Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
	Benjamin Gaignard, Brian Starkey, John Stultz, Paul Moore,
	James Morris, Serge E. Hallyn, Stephen Smalley, Ondrej Mosnacek,
	Shuah Khan, cgroups, linux-doc, linux-kernel, linux-media,
	dri-devel, linaro-mm-sig, linux-mm, linux-security-module,
	selinux, linux-kselftest, mripard, echanude
In-Reply-To: <CABdmKX0d6Zsg+_TxXjB80UZR23ZvXzxYoWzORgwmx=ZiuE+Nzw@mail.gmail.com>

On 5/15/26 19:06, T.J. Mercier wrote:
> On Fri, May 15, 2026 at 6:53 AM Christian Brauner <brauner@kernel.org> wrote:
>>
>> On Tue, May 12, 2026 at 11:10:44AM +0200, Albert Esteve wrote:
>>> On embedded platforms a central process often allocates dma-buf
>>> memory on behalf of client applications. Without a way to
>>> attribute the charge to the requesting client's cgroup, the
>>> cost lands on the allocator, making per-cgroup memory limits
>>> ineffective for the actual consumers.
>>>
>>> Add charge_pid_fd to struct dma_heap_allocation_data. When set to
>>
>> Please be aware that pidfds come in two flavors:
>>
>> thread-group pidfds and thread-specific pidfds. Make sure that your API
>> doesn't implicitly depend on this distinction not existing.
> 
> Hi Christian,
> 
> Memcg is not a controller that supports "thread mode" so all threads
> in a group should belong to the same memcg.

BTW: Exactly that is the requirement automotive has with their native context use case.

The use case is that you have a daemon which has multiple threads, where each one is acting on behalf of some other process.

At the moment we basically say they are simply not using cgroups for that use case, but it would be really nice if we could handle that as well.

Summarizing the requirement of that use case: You need a different cgroup for each thread of a process.
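
For reference, the two pidfd flavors discussed in this thread can be opened
from userspace roughly as sketched below. The helper names are hypothetical,
and the fallback definitions are assumptions for building against older
headers; PIDFD_THREAD needs a kernel that supports it (Linux 6.9 or later),
otherwise the second call is expected to fail.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434	/* number on most architectures; fallback only */
#endif
#ifndef PIDFD_THREAD
#define PIDFD_THREAD O_EXCL	/* value used by the uapi header */
#endif

/* Thread-group pidfd: refers to the process via its thread-group leader. */
static int open_process_pidfd(pid_t pid)
{
	return (int)syscall(SYS_pidfd_open, pid, 0);
}

/* Thread-specific pidfd: refers to a single thread, identified by its TID. */
static int open_thread_pidfd(pid_t tid)
{
	return (int)syscall(SYS_pidfd_open, tid, PIDFD_THREAD);
}
```

Both helpers return a file descriptor on success or -1 with errno set, so a
caller can fall back to the thread-group flavor when PIDFD_THREAD is not
available.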

Regards,
Christian.

> 
> Checking the flags from pidfd_get_pid would be the best way for an
> explicit check of the pidfd type?
> 
>>> a valid pidfd, DMA_HEAP_IOCTL_ALLOC resolves the target task's
>>> memcg and charges the buffer there via mem_cgroup_charge_dmabuf()
>>> inside dma_heap_buffer_alloc(). Without charge_pid_fd, and with
>>> the mem_accounting module parameter enabled, the buffer is charged
>>> to the allocator's own cgroup.
>>>
>>> Additionally, commit 3c227be90659 ("dma-buf: system_heap: account for
>>> system heap allocation in memcg") adds __GFP_ACCOUNT to system-heap
>>> page allocations. Keeping __GFP_ACCOUNT would charge the same pages
>>> twice (once to kmem, once to MEMCG_DMABUF), thus remove it and route
>>> all accounting through a single MEMCG_DMABUF path.
>>>
>>> Usage examples:
>>>
>>>   1. Central allocator charging to a client at allocation time.
>>>      The allocator knows the client's PID (e.g., from binder's
>>>      sender_pid) and uses pidfd to attribute the charge:
>>>
>>>        pid_t client_pid = txn->sender_pid;
>>>        int pidfd = pidfd_open(client_pid, 0);
>>>
>>>        struct dma_heap_allocation_data alloc = {
>>>            .len             = buffer_size,
>>>            .fd_flags        = O_RDWR | O_CLOEXEC,
>>>            .charge_pid_fd   = pidfd,
>>>        };
>>>        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
>>>        close(pidfd);
>>>        /* alloc.fd is now charged to client's cgroup */
>>>
>>>   2. Default allocation (no pidfd, mem_accounting=1).
>>>      When charge_pid_fd is not set and the mem_accounting module
>>>      parameter is enabled, the buffer is charged to the allocator's
>>>      own cgroup:
>>>
>>>        struct dma_heap_allocation_data alloc = {
>>>            .len      = buffer_size,
>>>            .fd_flags = O_RDWR | O_CLOEXEC,
>>>        };
>>>        ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc);
>>>        /* charged to current process's cgroup */
>>>
>>> Current limitations:
>>>
>>>  - Single-owner model: a dma-buf carries one memcg charge regardless of
>>>    how many processes share it. Means only the first owner (and exporter)
>>>    of the shared buffer bears the charge.
>>>  - Only memcg accounting supported. While this makes sense for system
>>>    heap buffers, other heaps (e.g., CMA heaps) will require selectively
>>>    charging also for the dmem controller.
>>>
>>> Signed-off-by: Albert Esteve <aesteve@redhat.com>
>>> ---
>>>  Documentation/admin-guide/cgroup-v2.rst |  5 ++--
>>>  drivers/dma-buf/dma-buf.c               | 16 ++++---------
>>>  drivers/dma-buf/dma-heap.c              | 42 ++++++++++++++++++++++++++++++---
>>>  drivers/dma-buf/heaps/system_heap.c     |  2 --
>>>  include/uapi/linux/dma-heap.h           |  6 +++++
>>>  5 files changed, 53 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>>> index 8bdbc2e866430..824d269531eb1 100644
>>> --- a/Documentation/admin-guide/cgroup-v2.rst
>>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>>> @@ -1636,8 +1636,9 @@ The following nested keys are defined.
>>>               structures.
>>>
>>>         dmabuf (npn)
>>> -             Amount of memory used for exported DMA buffers allocated by the cgroup.
>>> -             Stays with the allocating cgroup regardless of how the buffer is shared.
>>> +             Amount of memory used for exported DMA buffers allocated by or on
>>> +             behalf of the cgroup. Stays with the allocating cgroup regardless
>>> +             of how the buffer is shared.
>>>
>>>         workingset_refault_anon
>>>               Number of refaults of previously evicted anonymous pages.
>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>>> index ce02377f48908..23fb758b78297 100644
>>> --- a/drivers/dma-buf/dma-buf.c
>>> +++ b/drivers/dma-buf/dma-buf.c
>>> @@ -181,8 +181,11 @@ static void dma_buf_release(struct dentry *dentry)
>>>        */
>>>       BUG_ON(dmabuf->cb_in.active || dmabuf->cb_out.active);
>>>
>>> -     mem_cgroup_uncharge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
>>> -     mem_cgroup_put(dmabuf->memcg);
>>> +     if (dmabuf->memcg) {
>>> +             mem_cgroup_uncharge_dmabuf(dmabuf->memcg,
>>> +                                       PAGE_ALIGN(dmabuf->size) / PAGE_SIZE);
>>> +             mem_cgroup_put(dmabuf->memcg);
>>> +     }
>>>
>>>       dmabuf->ops->release(dmabuf);
>>>
>>> @@ -764,13 +767,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
>>>               dmabuf->resv = resv;
>>>       }
>>>
>>> -     dmabuf->memcg = get_mem_cgroup_from_mm(current->mm);
>>> -     if (!mem_cgroup_charge_dmabuf(dmabuf->memcg, PAGE_ALIGN(dmabuf->size) / PAGE_SIZE,
>>> -                                   GFP_KERNEL)) {
>>> -             ret = -ENOMEM;
>>> -             goto err_memcg;
>>> -     }
>>> -
>>>       file->private_data = dmabuf;
>>>       file->f_path.dentry->d_fsdata = dmabuf;
>>>       dmabuf->file = file;
>>> @@ -781,8 +777,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
>>>
>>>       return dmabuf;
>>>
>>> -err_memcg:
>>> -     mem_cgroup_put(dmabuf->memcg);
>>>  err_file:
>>>       fput(file);
>>>  err_module:
>>> diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
>>> index ac5f8685a6494..ff6e259afcdc0 100644
>>> --- a/drivers/dma-buf/dma-heap.c
>>> +++ b/drivers/dma-buf/dma-heap.c
>>> @@ -7,13 +7,17 @@
>>>   */
>>>
>>>  #include <linux/cdev.h>
>>> +#include <linux/cgroup.h>
>>>  #include <linux/device.h>
>>>  #include <linux/dma-buf.h>
>>>  #include <linux/dma-heap.h>
>>> +#include <linux/memcontrol.h>
>>> +#include <linux/sched/mm.h>
>>>  #include <linux/err.h>
>>>  #include <linux/export.h>
>>>  #include <linux/list.h>
>>>  #include <linux/nospec.h>
>>> +#include <linux/pidfd.h>
>>>  #include <linux/syscalls.h>
>>>  #include <linux/uaccess.h>
>>>  #include <linux/xarray.h>
>>> @@ -55,10 +59,12 @@ MODULE_PARM_DESC(mem_accounting,
>>>                "Enable cgroup-based memory accounting for dma-buf heap allocations (default=false).");
>>>
>>>  static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
>>> -                              u32 fd_flags,
>>> -                              u64 heap_flags)
>>> +                              u32 fd_flags, u64 heap_flags,
>>> +                              struct mem_cgroup *charge_to)
>>>  {
>>>       struct dma_buf *dmabuf;
>>> +     unsigned int nr_pages;
>>> +     struct mem_cgroup *memcg = charge_to;
>>>       int fd;
>>>
>>>       /*
>>> @@ -73,6 +79,22 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
>>>       if (IS_ERR(dmabuf))
>>>               return PTR_ERR(dmabuf);
>>>
>>> +     nr_pages = len / PAGE_SIZE;
>>> +
>>> +     if (memcg)
>>> +             css_get(&memcg->css);
>>> +     else if (mem_accounting)
>>> +             memcg = get_mem_cgroup_from_mm(current->mm);
>>> +
>>> +     if (memcg) {
>>> +             if (!mem_cgroup_charge_dmabuf(memcg, nr_pages, GFP_KERNEL)) {
>>> +                     mem_cgroup_put(memcg);
>>> +                     dma_buf_put(dmabuf);
>>> +                     return -ENOMEM;
>>> +             }
>>> +             dmabuf->memcg = memcg;
>>> +     }
>>> +
>>>       fd = dma_buf_fd(dmabuf, fd_flags);
>>>       if (fd < 0) {
>>>               dma_buf_put(dmabuf);
>>> @@ -102,6 +124,9 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
>>>  {
>>>       struct dma_heap_allocation_data *heap_allocation = data;
>>>       struct dma_heap *heap = file->private_data;
>>> +     struct mem_cgroup *memcg = NULL;
>>> +     struct task_struct *task;
>>> +     unsigned int pidfd_flags;
>>>       int fd;
>>>
>>>       if (heap_allocation->fd)
>>> @@ -113,9 +138,20 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data)
>>>       if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS)
>>>               return -EINVAL;
>>>
>>> +     if (heap_allocation->charge_pid_fd) {
>>> +             task = pidfd_get_task(heap_allocation->charge_pid_fd, &pidfd_flags);
>>
>> Will always get a thread-group leader pidfd and will fail if this is a
>> thread-specific pidfd. pidfd_open(1234, PIDFD_THREAD) can be used to
>> open a thread-specific pidfd.
>>
>>> +             if (IS_ERR(task))
>>> +                     return PTR_ERR(task);
>>> +
>>> +             memcg = get_mem_cgroup_from_mm(task->mm);
>>> +             put_task_struct(task);
>>> +     }
>>> +
>>>       fd = dma_heap_buffer_alloc(heap, heap_allocation->len,
>>>                                  heap_allocation->fd_flags,
>>> -                                heap_allocation->heap_flags);
>>> +                                heap_allocation->heap_flags,
>>> +                                memcg);
>>> +     mem_cgroup_put(memcg);
>>>       if (fd < 0)
>>>               return fd;
>>>
>>> diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
>>> index 03c2b87cb1112..95d7688167b93 100644
>>> --- a/drivers/dma-buf/heaps/system_heap.c
>>> +++ b/drivers/dma-buf/heaps/system_heap.c
>>> @@ -385,8 +385,6 @@ static struct page *alloc_largest_available(unsigned long size,
>>>               if (max_order < orders[i])
>>>                       continue;
>>>               flags = order_flags[i];
>>> -             if (mem_accounting)
>>> -                     flags |= __GFP_ACCOUNT;
>>>               page = alloc_pages(flags, orders[i]);
>>>               if (!page)
>>>                       continue;
>>> diff --git a/include/uapi/linux/dma-heap.h b/include/uapi/linux/dma-heap.h
>>> index a4cf716a49fa6..e02b0f8cbc6a1 100644
>>> --- a/include/uapi/linux/dma-heap.h
>>> +++ b/include/uapi/linux/dma-heap.h
>>> @@ -29,6 +29,10 @@
>>>   *                   handle to the allocated dma-buf
>>>   * @fd_flags:                file descriptor flags used when allocating
>>>   * @heap_flags:              flags passed to heap
>>> + * @charge_pid_fd:   optional pidfd of the process whose cgroup should be
>>> + *                   charged for this allocation; 0 means charge the calling
>>> + *                   process's cgroup
>>> + * @__padding:               reserved, must be zero
>>>   *
>>>   * Provided by userspace as an argument to the ioctl
>>>   */
>>> @@ -37,6 +41,8 @@ struct dma_heap_allocation_data {
>>>       __u32 fd;
>>>       __u32 fd_flags;
>>>       __u64 heap_flags;
>>> +     __u32 charge_pid_fd;
>>> +     __u32 __padding;
>>>  };
>>>
>>>  #define DMA_HEAP_IOC_MAGIC           'H'
>>>
>>> --
>>> 2.53.0
>>>


^ permalink raw reply

* Re: [PATCH v2 1/3] Doc: deprecated.rst: add strlcat()
From: Geert Uytterhoeven @ 2026-05-18  7:11 UTC (permalink / raw)
  To: David Laight
  Cc: Heiko Carstens, Kees Cook, Manuel Ebner, Andy Shevchenko,
	Jonathan Corbet, Shuah Khan, Andy Whitcroft, Joe Perches,
	Dwaipayan Ray, Lukas Bulwahn, Randy Dunlap, Jani Nikula,
	open list:DOCUMENTATION PROCESS, open list:DOCUMENTATION,
	open list
In-Reply-To: <20260516173524.498984d0@pumpkin>

Hi David,

On Sat, 16 May 2026 at 18:35, David Laight <david.laight.linux@gmail.com> wrote:
> On Sat, 16 May 2026 17:28:19 +0200
> Heiko Carstens <hca@linux.ibm.com> wrote:
>
> > On Thu, May 14, 2026 at 09:31:46AM -0700, Kees Cook wrote:
> > > On Thu, May 14, 2026 at 06:26:53PM +0200, Manuel Ebner wrote:
> > > > add strlcat and alternatives
> > > >
> > > > Signed-off-by: Manuel Ebner <manuelebner@mailbox.org>
> > > > ---
> > > >  Documentation/process/deprecated.rst | 7 +++++++
> > > >  1 file changed, 7 insertions(+)
> > > >
> > > > diff --git a/Documentation/process/deprecated.rst b/Documentation/process/deprecated.rst
> > > > index fed56864d036..06e802f4bbfd 100644
> > > > --- a/Documentation/process/deprecated.rst
> > > > +++ b/Documentation/process/deprecated.rst
> > > > @@ -153,6 +153,13 @@ used, and the destinations should be marked with the `__nonstring
> > > >  attribute to avoid future compiler warnings. For cases still needing
> > > >  NUL-padding, strtomem_pad() can be used.
> > > >
> > > > +strlcat()
> > > > +---------
> > > > +strlcat() must re-scan the destination string from the beginning on each
> > > > +call (O(n^2) behavior). Alternatives are seq_buf_puts() and seq_buf_printf().
> > > > +snprintf(), scnprintf() and sysfs_emit() are possible aswell, but the adoption
> > > > +of the arguments needs to be taken care off.
> > > > +
> > >
> > > How about just:
> > >
> > > strlcat() must re-scan the destination string from the beginning on each
> > > call (O(n^2) behavior). Use the seq_buf API or similar instead.
> >
> > seq_buf API for appending something to e.g. boot_command_line seems to be odd,
> > since boot_command_line is usually "just there" (depending on architecture and
> > boot loader).
>
> Indeed, but ISTR that code uses strcat() a lot of the time.
> The lengths are all known, so memcpy() can be used.
>
> I don't really see why strlcat() should be deprecated.
> Clearly there are many cases where there are better ways to do things.

https://elixir.bootlin.com/linux/v7.0.8/source/include/linux/fortify-string.h#L346
already says "Do not use this function. [...] Prefer building the
 * string with formatting, via scnprintf(), seq_buf, or similar.".
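
To make the "build the string with formatting" advice concrete, here is a
minimal userspace sketch of an append helper that remembers the current
length, so each append starts at the end of the string instead of re-scanning
it from the beginning as strlcat() does. This is not the kernel's seq_buf;
struct strbuf, strbuf_init() and strbuf_printf() are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a seq_buf-like object. */
struct strbuf {
	char	*buf;
	size_t	size;		/* capacity, including the trailing NUL */
	size_t	len;		/* bytes used, excluding the trailing NUL */
	int	overflow;	/* set once any append was truncated */
};

static void strbuf_init(struct strbuf *s, char *buf, size_t size)
{
	assert(size > 0);
	s->buf = buf;
	s->size = size;
	s->len = 0;
	s->overflow = 0;
	buf[0] = '\0';
}

/* Append formatted text at the tracked offset; never re-scans the buffer. */
static void strbuf_printf(struct strbuf *s, const char *fmt, ...)
{
	size_t room = s->size - s->len;	/* always >= 1, space for the NUL */
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(s->buf + s->len, room, fmt, ap);
	va_end(ap);

	if (n < 0) {
		/* Encoding error: keep the old contents and flag it. */
		s->buf[s->len] = '\0';
		s->overflow = 1;
	} else if ((size_t)n >= room) {
		/* Truncated: vsnprintf() filled the buffer and terminated it. */
		s->len = s->size - 1;
		s->overflow = 1;
	} else {
		s->len += (size_t)n;
	}
}
```

Unlike the strlcat() return value discussed in this thread, the stored length
here is always the number of bytes actually in the buffer, and truncation is
reported through a separate flag.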

> The only problem with strlcat() is that it returns the 'required length'.
> So there are some broken uses.
> - fs/nfs/flexfilelayout/flexfilelayout.c
> - lib/kunit/string-stream.c (although the preceding vsnprintf() looks like the actual bug).
> There is also some very strange code in security/selinus/ima.c - but it may be ok.
>
> In reality the return value of strlcat() isn't really much worse that that
> of snprintf().

So we need strscat()? ;-)

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH 4/8] drm/panthor: Add support for protected memory allocation in panthor
From: Boris Brezillon @ 2026-05-18  7:16 UTC (permalink / raw)
  To: Chia-I Wu
  Cc: Liviu Dudau, Marcin Ślusarz, Ketil Johnsen, David Airlie,
	Simona Vetter, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, Jonathan Corbet, Shuah Khan, Sumit Semwal,
	Benjamin Gaignard, Brian Starkey, John Stultz, T.J. Mercier,
	Christian König, Steven Price, Daniel Almeida, Alice Ryhl,
	Matthias Brugger, AngeloGioacchino Del Regno, dri-devel,
	linux-doc, linux-kernel, linux-media, linaro-mm-sig,
	linux-arm-kernel, linux-mediatek, Florent Tomasin, nd
In-Reply-To: <CAPaKu7QC7FdjL6m_OSb+E5aYKs6bmT-9DAHc5PC=XctCmRph2Q@mail.gmail.com>

On Wed, 13 May 2026 12:31:32 -0700
Chia-I Wu <olvaffe@gmail.com> wrote:

> On Tue, May 12, 2026 at 8:39 AM Liviu Dudau <liviu.dudau@arm.com> wrote:
> >
> > On Tue, May 12, 2026 at 04:11:11PM +0200, Boris Brezillon wrote:  
> > > On Tue, 12 May 2026 14:47:27 +0100
> > > Liviu Dudau <liviu.dudau@arm.com> wrote:
> > >  
> > > > On Thu, May 07, 2026 at 01:53:56PM +0200, Boris Brezillon wrote:  
> > > > > On Thu, 7 May 2026 11:02:26 +0200
> > > > > Marcin Ślusarz <marcin.slusarz@arm.com> wrote:
> > > > >  
> > > > > > On Tue, May 05, 2026 at 06:15:23PM +0200, Boris Brezillon wrote:  
> > > > > > > > @@ -277,9 +286,21 @@ int panthor_device_init(struct panthor_device *ptdev)
> > > > > > > >                     return ret;
> > > > > > > >     }
> > > > > > > >
> > > > > > > > +   /* If a protected heap name is specified but not found, defer the probe until created */
> > > > > > > > +   if (protected_heap_name && strlen(protected_heap_name)) {  
> > > > > > >
> > > > > > > Do we really need this strlen() > 0? Won't dma_heap_find() fail is the
> > > > > > > name is "" already?  
> > > > > >
> > > > > > If dma_heap_find() will fail, then the whole probe with fail too.
> > > > > > This check prevents that.  
> > > > >
> > > > > Yeah, that's also a questionable design choice. I mean, we can
> > > > > currently probe and boot the FW even though we never setup the
> > > > > protected FW sections, so why should we defer the probe here? Can't we
> > > > > just retry the next time a group with the protected bit is created and
> > > > > fail if we can find a protected heap?  
> > > >
> > > > The problem we have with the current firmware is that it does a number of setup steps at "boot"
> > > > time only. One of the steps is preparing its internal structures for when it enters protected
> > > > mode and it stores them in the buffer passed in at firmware loading. We cannot later run the
> > > > process when we have a group with protected mode set.  
> > >
> > > No, but we can force a full/slow reset and have that thing
> > > re-initialized, can't we? I mean, that's basically what we do when a
> > > fast reset fails: we re-initialize all the sections and reset again, at
> > > which point the FW should start from a fresh state, and be able to
> > > properly initialize the protected-related stuff if protected sections
> > > are populated. Am I missing something?  
> >
> > Right, we can do that. For some reason I keep associating the reset with the
> > error handling and not with "normal" operations.  
> I kind of hope we end up with either
> 
>  - panthor knows the exact heap to use and fails with EPROBE_DEFER if
> the heap is missing, or
>  - panthor gets a dma-buf from userspace and does the full reset
>    - userspace also needs to provide a dma-buf for each protected
> group for the suspend buffer
> 
> than something in-between. The latter is more ad-hoc and basically
> kicks the issue to the userspace.

Indeed, the second option is more ad-hoc, but when you think about it,
userspace has to have this knowledge, because it needs to know the
dma-heap to use for buffer allocations that cross a device boundary
anyway. Think about frames produced by a video decoder, and composited
by the GPU into a protected scanout buffer that's passed to the KMS
device. Why would the GPU driver be the source of truth when it comes to
choosing the heap to use to allocate protected buffers for the video
decoder or those used for the display?

> 
> For the former, expressing the relation in DT seems to be the best,
> but only if possible :-). Otherwise, a kconfig option (instead of
> module param) should be easier to work with.
> 
> Looking at the userspace implementation, can we also have an panthor
> ioctl to return the heap to userspace?

Yes, it's something we can add, but again, I'm questioning the
usefulness of this: how can we ensure the heap used by panthor to
allocate its protected FW buffers is suitable for scanout buffers
(buffers that can be used by display drivers)? There needs to be some
glue living in userland that makes the decision, and I'm not too sure
trusting any of the components in the chain (vdec, gpu, display) is the
right thing to do.

^ permalink raw reply

* Re: [PATCH v6 03/11] dt-bindings: mfd: add documentation for S2MU005 PMIC
From: Krzysztof Kozlowski @ 2026-05-18  7:15 UTC (permalink / raw)
  To: Conor Dooley, Kaustabh Chakraborty
  Cc: Lee Jones, Pavel Machek, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, MyungJoo Ham, Chanwoo Choi, Sebastian Reichel,
	André Draszik, Alexandre Belloni, Jonathan Corbet,
	Shuah Khan, Nam Tran, Łukasz Lebiedziński, linux-leds,
	devicetree, linux-kernel, linux-pm, linux-samsung-soc, linux-rtc,
	linux-doc
In-Reply-To: <20260517-corrode-tuesday-a598ca734b38@spud>

On 17/05/2026 22:52, Conor Dooley wrote:
> On Sun, May 17, 2026 at 06:39:37PM +0530, Kaustabh Chakraborty wrote:
>>>>>>> +
>>>>>> +    properties:
>>>>>> +      compatible:
>>>>>> +        const: samsung,s2mu005-rgb
>>>>>> +
>>>>>> +    required:
>>>>>> +      - compatible
>>>>>> +
>>>>>> +    unevaluatedProperties: false
>>>>>> +
>>>>>> +  reg:
>>>>>> +    maxItems: 1
>>>>>
>>>>> Move this above the child nodes please.
>>>>
>>>> But properties are sorted in lex order?
>>>
>>> Typically the binding is sorted in the same order as properties go in
>>> nodes. Common stuff like reg/clocks/interrupts therefore send up above
>>> child nodes.
>>
>> So, do I change this? For one, I don't see the same being followed in
>> other schemas of samsung in the same dir (not that I'm trying to pose it
>> as an argument against your suggestion), and this was reviewed by
>> Krzysztof and is adderssed in v7.
> 
> If Krzysztof doesn't care, then I won't ask you to change it.

This builds on top of bindings for previous Samsung PMIC devices, so
that's why it keeps the compatibles for children, I guess. No one
complained about this at v1-v2 reviews, so when I joined reviewing in v3
I did not, either.

I don't think the compatible should be here, but I also don't want to
stall that patchset. I understand that this is an inconsistent review on my
side, because other similar patchsets receive comments to drop the
compatible. But I don't think it would be fair to ask to drop the
compatible now, when we did not ask for that in the early versions at all.

Best regards,
Krzysztof

^ permalink raw reply

* Re: [PATCH net-next v3 04/14] libie: add control queue support
From: Larysa Zaremba @ 2026-05-18  7:02 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev, Tony Nguyen
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	Phani R Burra, przemyslaw.kitszel, aleksander.lobakin,
	sridhar.samudrala, anjali.singhai, michal.swiatkowski,
	maciej.fijalkowski, emil.s.tantilov, madhu.chittim, joshua.a.hay,
	jacob.e.keller, jayaprakash.shanmugam, jiri, horms, corbet,
	richardcochran, linux-doc, Bharath R, Samuel Salin,
	Aleksandr Loktionov
In-Reply-To: <20260515224443.2772147-5-anthony.l.nguyen@intel.com>

On Fri, May 15, 2026 at 03:44:28PM -0700, Tony Nguyen wrote:
> From: Phani R Burra <phani.r.burra@intel.com>
> 
> Libie will now support control queue setup and configuration APIs. These
> are mainly used for mailbox communication between drivers and control
> plane.
> 
> Make use of the libeth_rx page pool support for managing controlq buffers.

I have reviewed the Sashiko feedback [0]. Here is why I do not find the feedback
very helpful for this particular patch:

1. libie_ctlq_post_rx_buffs/libie_ctlq_fill_rx_msg - libeth FQs are configured 
   so that offset is always 0 and truesize == HW DMA size.
2. libie_ctlq_deinit: Final teardown without the lock is fine.
3. "If a caller sets up a bidirectional message" - Rx and Tx queues are 
   independent. The whole comment is just confusing.
4. "[libie_ctlq_recv] does not clear the DD bit after processing a descriptor, 
    will libie_ctlq_recv() mistakenly process the barrier slot as a new hardware 
    completion?" - we clear DD before posting the buffer.

[0] https://sashiko.dev/#/patchset/20260515224443.2772147-1-anthony.l.nguyen%40intel.com

> 
> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> Signed-off-by: Phani R Burra <phani.r.burra@intel.com>
> Co-developed-by: Victor Raj <victor.raj@intel.com>
> Signed-off-by: Victor Raj <victor.raj@intel.com>
> Co-developed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
> Co-developed-by: Larysa Zaremba <larysa.zaremba@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> Tested-by: Bharath R <bharath.r@intel.com>
> Tested-by: Samuel Salin <Samuel.salin@intel.com>
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> ---
>  drivers/net/ethernet/intel/libie/Kconfig    |   8 +
>  drivers/net/ethernet/intel/libie/Makefile   |   4 +
>  drivers/net/ethernet/intel/libie/controlq.c | 614 ++++++++++++++++++++
>  include/linux/intel/libie/controlq.h        | 249 ++++++++
>  4 files changed, 875 insertions(+)
>  create mode 100644 drivers/net/ethernet/intel/libie/controlq.c
>  create mode 100644 include/linux/intel/libie/controlq.h
> 
> diff --git a/drivers/net/ethernet/intel/libie/Kconfig b/drivers/net/ethernet/intel/libie/Kconfig
> index 500a95c944a8..9c5fdebb6766 100644
> --- a/drivers/net/ethernet/intel/libie/Kconfig
> +++ b/drivers/net/ethernet/intel/libie/Kconfig
> @@ -15,6 +15,14 @@ config LIBIE_ADMINQ
>  	  Helper functions used by Intel Ethernet drivers for administration
>  	  queue command interface (aka adminq).
>  
> +config LIBIE_CP
> +	tristate
> +	select LIBETH
> +	select LIBIE_PCI
> +	help
> +	  Common helper routines to communicate with the device Control Plane
> +	  using virtchnl2 or related mailbox protocols.
> +
>  config LIBIE_FWLOG
>  	tristate
>  	select LIBIE_ADMINQ
> diff --git a/drivers/net/ethernet/intel/libie/Makefile b/drivers/net/ethernet/intel/libie/Makefile
> index a28509cb9086..3065aa057798 100644
> --- a/drivers/net/ethernet/intel/libie/Makefile
> +++ b/drivers/net/ethernet/intel/libie/Makefile
> @@ -9,6 +9,10 @@ obj-$(CONFIG_LIBIE_ADMINQ) 	+= libie_adminq.o
>  
>  libie_adminq-y			:= adminq.o
>  
> +obj-$(CONFIG_LIBIE_CP)		+= libie_cp.o
> +
> +libie_cp-y			:= controlq.o
> +
>  obj-$(CONFIG_LIBIE_FWLOG) 	+= libie_fwlog.o
>  
>  libie_fwlog-y			:= fwlog.o
> diff --git a/drivers/net/ethernet/intel/libie/controlq.c b/drivers/net/ethernet/intel/libie/controlq.c
> new file mode 100644
> index 000000000000..934721c98f34
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/libie/controlq.c
> @@ -0,0 +1,614 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright (C) 2025 Intel Corporation */
> +
> +#include <linux/bitfield.h>
> +#include <net/libeth/rx.h>
> +
> +#include <linux/intel/libie/controlq.h>
> +
> +#define LIBIE_CTLQ_DESC_QWORD0(sz)			\
> +	(LIBIE_CTLQ_DESC_FLAG_BUF |			\
> +	 LIBIE_CTLQ_DESC_FLAG_RD |			\
> +	 FIELD_PREP(LIBIE_CTLQ_DESC_DATA_LEN, sz))
> +
> +/**
> + * libie_ctlq_free_fq - free fill queue resources, including buffers
> + * @ctlq: Rx control queue whose resources need to be freed
> + */
> +static void libie_ctlq_free_fq(struct libie_ctlq_info *ctlq)
> +{
> +	struct libeth_fq fq = {
> +		.fqes		= ctlq->rx_fqes,
> +		.pp		= ctlq->pp,
> +	};
> +
> +	for (u32 ntc = ctlq->next_to_clean; ntc != ctlq->next_to_post; ) {
> +		page_pool_put_full_netmem(fq.pp, fq.fqes[ntc].netmem, false);
> +
> +		if (++ntc >= ctlq->ring_len)
> +			ntc = 0;
> +	}
> +
> +	libeth_rx_fq_destroy(&fq);
> +}
> +
> +/**
> + * libie_ctlq_init_fq - initialize fill queue for an Rx controlq
> + * @ctlq: control queue that needs a Rx buffer allocation
> + *
> + * Return: %0 on success, -%errno on failure
> + */
> +static int libie_ctlq_init_fq(struct libie_ctlq_info *ctlq)
> +{
> +	struct libeth_fq fq = {
> +		.count		= ctlq->ring_len,
> +		.truesize	= LIBIE_CTLQ_MAX_BUF_LEN,
> +		.nid		= NUMA_NO_NODE,
> +		.type		= LIBETH_FQE_SHORT,
> +		.hsplit		= true,
> +		.no_napi	= true,
> +	};
> +	int err;
> +
> +	err = libeth_rx_fq_create(&fq, ctlq->dev);
> +	if (err)
> +		return err;
> +
> +	ctlq->pp = fq.pp;
> +	ctlq->rx_fqes = fq.fqes;
> +	ctlq->truesize = fq.truesize;
> +
> +	return 0;
> +}
> +
> +/**
> + * libie_ctlq_reset_rx_desc - reset the descriptor with a new address
> + * @desc: descriptor to (re)initialize
> + * @addr: physical address to put into descriptor
> + * @mem_truesize: size of the accessible memory
> + */
> +static void libie_ctlq_reset_rx_desc(struct libie_ctlq_desc *desc,
> +				     dma_addr_t addr, u32 mem_truesize)
> +{
> +	u64 qword;
> +
> +	*desc = (struct libie_ctlq_desc) {};
> +	qword = LIBIE_CTLQ_DESC_QWORD0(mem_truesize);
> +	desc->qword0 = cpu_to_le64(qword);
> +
> +	qword = FIELD_PREP(LIBIE_CTLQ_DESC_DATA_ADDR_HIGH,
> +			   upper_32_bits(addr)) |
> +		FIELD_PREP(LIBIE_CTLQ_DESC_DATA_ADDR_LOW,
> +			   lower_32_bits(addr));
> +	desc->qword3 = cpu_to_le64(qword);
> +}
> +
> +/**
> + * libie_ctlq_post_rx_buffs - post buffers to descriptor ring
> + * @ctlq: control queue that requires Rx descriptor ring to be initialized with
> + *	  new Rx buffers
> + *
> + * The caller must make sure that calls to libie_ctlq_post_rx_buffs()
> + * and libie_ctlq_recv() for each queue are either serialized
> + * or used under ctlq->lock.
> + *
> + * Return: %0 on success, -%ENOMEM if any buffer could not be allocated
> + */
> +int libie_ctlq_post_rx_buffs(struct libie_ctlq_info *ctlq)
> +{
> +	u32 ntp = ctlq->next_to_post, ntc = ctlq->next_to_clean, num_to_post;
> +	const struct libeth_fq_fp fq = {
> +		.pp		= ctlq->pp,
> +		.fqes		= ctlq->rx_fqes,
> +		.truesize	= ctlq->truesize,
> +		.count		= ctlq->ring_len,
> +	};
> +	int ret = 0;
> +
> +	num_to_post = (ntc > ntp ? 0 : ctlq->ring_len) + ntc - ntp - 1;
> +
> +	while (num_to_post--) {
> +		dma_addr_t addr;
> +
> +		addr = libeth_rx_alloc(&fq, ntp);
> +		if (unlikely(addr == DMA_MAPPING_ERROR)) {
> +			ret = -ENOMEM;
> +			goto post_bufs;
> +		}
> +
> +		libie_ctlq_reset_rx_desc(&ctlq->descs[ntp], addr, fq.truesize);
> +
> +		if (unlikely(++ntp == ctlq->ring_len))
> +			ntp = 0;
> +	}
> +
> +post_bufs:
> +	if (likely(ctlq->next_to_post != ntp)) {
> +		ctlq->next_to_post = ntp;
> +
> +		writel(ntp, ctlq->reg.tail);
> +	}
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_NS_GPL(libie_ctlq_post_rx_buffs, "LIBIE_CP");
> +
> +/**
> + * libie_ctlq_free_tx_msgs - Free Tx control queue messages
> + * @ctlq: Tx control queue being destroyed
> + * @num_msgs: number of messages allocated so far
> + */
> +static void libie_ctlq_free_tx_msgs(struct libie_ctlq_info *ctlq,
> +				    u32 num_msgs)
> +{
> +	for (u32 i = 0; i < num_msgs; i++)
> +		kfree(ctlq->tx_msg[i]);
> +
> +	kvfree(ctlq->tx_msg);
> +}
> +
> +/**
> + * libie_ctlq_alloc_tx_msgs - Allocate Tx control queue messages
> + * @ctlq: Tx control queue being created
> + *
> + * Return: %0 on success, -%ENOMEM on allocation error
> + */
> +static int libie_ctlq_alloc_tx_msgs(struct libie_ctlq_info *ctlq)
> +{
> +	ctlq->tx_msg = kvzalloc_objs(*ctlq->tx_msg, ctlq->ring_len,
> +				     GFP_KERNEL);
> +	if (!ctlq->tx_msg)
> +		return -ENOMEM;
> +
> +	for (u32 i = 0; i < ctlq->ring_len; i++) {
> +		ctlq->tx_msg[i] = kzalloc_obj(*ctlq->tx_msg[i]);
> +		if (!ctlq->tx_msg[i]) {
> +			libie_ctlq_free_tx_msgs(ctlq, i);
> +			return -ENOMEM;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +/**
> + * libie_cp_free_dma_mem - Free the previously allocated DMA memory
> + * @dev: device information
> + * @mem: DMA memory information
> + */
> +static void libie_cp_free_dma_mem(struct device *dev,
> +				  struct libie_cp_dma_mem *mem)
> +{
> +	dma_free_coherent(dev, mem->size, mem->va, mem->pa);
> +	mem->va = NULL;
> +}
> +
> +/**
> + * libie_ctlq_dealloc_ring_res - Free memory allocated for control queue
> + * @ctlq: control queue that requires its ring memory to be freed
> + *
> + * Free the memory used by the ring, buffers and other related structures.
> + */
> +static void libie_ctlq_dealloc_ring_res(struct libie_ctlq_info *ctlq)
> +{
> +	struct libie_cp_dma_mem *dma = &ctlq->ring_mem;
> +
> +	if (ctlq->type == LIBIE_CTLQ_TYPE_TX)
> +		libie_ctlq_free_tx_msgs(ctlq, ctlq->ring_len);
> +	else
> +		libie_ctlq_free_fq(ctlq);
> +
> +	libie_cp_free_dma_mem(ctlq->dev, dma);
> +}
> +
> +/**
> + * libie_cp_alloc_dma_mem - Allocate a DMA memory
> + * @dev: device information
> + * @mem: memory for DMA information to be stored
> + * @size: size of the memory to allocate
> + *
> + * Return: virtual address of DMA memory or NULL.
> + */
> +static void *libie_cp_alloc_dma_mem(struct device *dev,
> +				    struct libie_cp_dma_mem *mem, u32 size)
> +{
> +	size = ALIGN(size, SZ_4K);
> +
> +	mem->va = dma_alloc_coherent(dev, size, &mem->pa, GFP_KERNEL);
> +	mem->size = size;
> +
> +	return mem->va;
> +}
> +
> +/**
> + * libie_ctlq_alloc_queue_res - allocate memory for descriptor ring and bufs
> + * @ctlq: control queue that requires its ring resources to be allocated
> + *
> + * Return: %0 on success, -%errno on failure
> + */
> +static int libie_ctlq_alloc_queue_res(struct libie_ctlq_info *ctlq)
> +{
> +	size_t size = array_size(ctlq->ring_len, sizeof(*ctlq->descs));
> +	struct libie_cp_dma_mem *dma = &ctlq->ring_mem;
> +	int err = -ENOMEM;
> +
> +	if (!libie_cp_alloc_dma_mem(ctlq->dev, dma, size))
> +		return -ENOMEM;
> +
> +	ctlq->descs = dma->va;
> +
> +	if (ctlq->type == LIBIE_CTLQ_TYPE_TX) {
> +		if (libie_ctlq_alloc_tx_msgs(ctlq))
> +			goto free_dma_mem;
> +	} else {
> +		err = libie_ctlq_init_fq(ctlq);
> +		if (err)
> +			goto free_dma_mem;
> +
> +		err = libie_ctlq_post_rx_buffs(ctlq);
> +		if (err) {
> +			libie_ctlq_free_fq(ctlq);
> +			goto free_dma_mem;
> +		}
> +	}
> +
> +	return 0;
> +
> +free_dma_mem:
> +	libie_cp_free_dma_mem(ctlq->dev, dma);
> +
> +	return err;
> +}
> +
> +/**
> + * libie_ctlq_init_regs - Initialize control queue registers
> + * @ctlq: control queue that needs to be initialized
> + *
> + * Initialize registers. The caller is expected to have already initialized the
> + * descriptor ring memory and buffer memory.
> + */
> +static void libie_ctlq_init_regs(struct libie_ctlq_info *ctlq)
> +{
> +	u32 dword;
> +
> +	if (ctlq->type == LIBIE_CTLQ_TYPE_RX)
> +		writel(ctlq->ring_len - 1, ctlq->reg.tail);
> +
> +	writel(0, ctlq->reg.head);
> +	writel(lower_32_bits(ctlq->ring_mem.pa), ctlq->reg.addr_low);
> +	writel(upper_32_bits(ctlq->ring_mem.pa), ctlq->reg.addr_high);
> +
> +	dword = FIELD_PREP(LIBIE_CTLQ_MBX_ATQ_LEN, ctlq->ring_len) |
> +		ctlq->reg.len_ena_mask;
> +	writel(dword, ctlq->reg.len);
> +}
> +
> +/**
> + * libie_find_ctlq - find the controlq for the given id and type
> + * @ctx: controlq context structure
> + * @type: type of controlq to find
> + * @id: controlq id to find
> + *
> + * Return: control queue info pointer on success, NULL on failure
> + */
> +struct libie_ctlq_info *libie_find_ctlq(struct libie_ctlq_ctx *ctx,
> +					enum virtchnl2_queue_type type,
> +					  int id)
> +{
> +	struct libie_ctlq_info *cq;
> +
> +	guard(spinlock)(&ctx->ctlqs_lock);
> +
> +	list_for_each_entry(cq, &ctx->ctlqs, list)
> +		if (cq->qid == id && cq->type == type)
> +			return cq;
> +
> +	return NULL;
> +}
> +EXPORT_SYMBOL_NS_GPL(libie_find_ctlq, "LIBIE_CP");
> +
> +/**
> + * libie_ctlq_add - add one control queue
> + * @ctx: controlq context information
> + * @qinfo: information required for queue creation
> + *
> + * Allocate and initialize a control queue and add it to the control queue list.
> + * The ctlq parameter will be allocated/initialized and passed back to the
> + * caller if no errors occur.
> + *
> + * Note: libie_ctlq_init must be called prior to any calls to libie_ctlq_add.
> + *
> + * Return: added control queue info pointer on success, error pointer on failure
> + */
> +static struct libie_ctlq_info *
> +libie_ctlq_add(struct libie_ctlq_ctx *ctx,
> +	       const struct libie_ctlq_create_info *qinfo)
> +{
> +	struct libie_ctlq_info *ctlq;
> +
> +	if (qinfo->id != LIBIE_CTLQ_MBX_ID)
> +		return ERR_PTR(-EOPNOTSUPP);
> +
> +	/* libie_ctlq_init was not called */
> +	scoped_guard(spinlock, &ctx->ctlqs_lock)
> +		if (!ctx->ctlqs.next)
> +			return ERR_PTR(-EINVAL);
> +
> +	ctlq = kvzalloc_obj(*ctlq);
> +	if (!ctlq)
> +		return ERR_PTR(-ENOMEM);
> +
> +	ctlq->type = qinfo->type;
> +	ctlq->qid = qinfo->id;
> +	ctlq->ring_len = qinfo->len;
> +	ctlq->dev = &ctx->mmio_info.pdev->dev;
> +	ctlq->reg = qinfo->reg;
> +
> +	if (libie_ctlq_alloc_queue_res(ctlq)) {
> +		kvfree(ctlq);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	libie_ctlq_init_regs(ctlq);
> +
> +	spin_lock_init(&ctlq->lock);
> +
> +	scoped_guard(spinlock, &ctx->ctlqs_lock)
> +		list_add(&ctlq->list, &ctx->ctlqs);
> +
> +	return ctlq;
> +}
> +
> +/**
> + * libie_ctlq_remove - deallocate and remove specified control queue
> + * @ctx: libie context information
> + * @ctlq: specific control queue that needs to be removed
> + */
> +static void libie_ctlq_remove(struct libie_ctlq_ctx *ctx,
> +			      struct libie_ctlq_info *ctlq)
> +{
> +	scoped_guard(spinlock, &ctx->ctlqs_lock)
> +		list_del(&ctlq->list);
> +
> +	libie_ctlq_dealloc_ring_res(ctlq);
> +	kvfree(ctlq);
> +}
> +
> +/**
> + * libie_ctlq_init - main initialization routine for all control queues
> + * @ctx: libie context information
> + * @qinfo: array of structs containing info for each queue to be initialized
> + * @numq: number of queues to initialize
> + *
> + * This initializes queue list and adds any number and any type of control
> + * queues. This is an all or nothing routine; if one fails, all previously
> + * allocated queues will be destroyed. This must be called prior to using
> + * the individual add/remove APIs.
> + *
> + * Return: %0 on success, -%errno on failure
> + */
> +int libie_ctlq_init(struct libie_ctlq_ctx *ctx,
> +		    const struct libie_ctlq_create_info *qinfo,
> +		     u32 numq)
> +{
> +	INIT_LIST_HEAD(&ctx->ctlqs);
> +	spin_lock_init(&ctx->ctlqs_lock);
> +
> +	for (u32 i = 0; i < numq; i++) {
> +		struct libie_ctlq_info *ctlq;
> +
> +		ctlq = libie_ctlq_add(ctx, &qinfo[i]);
> +		if (IS_ERR(ctlq)) {
> +			libie_ctlq_deinit(ctx);
> +			return PTR_ERR(ctlq);
> +		}
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(libie_ctlq_init, "LIBIE_CP");
> +
> +/**
> + * libie_ctlq_deinit - destroy all control queues
> + * @ctx: libie CP context information
> + */
> +void libie_ctlq_deinit(struct libie_ctlq_ctx *ctx)
> +{
> +	struct libie_ctlq_info *ctlq, *tmp;
> +
> +	list_for_each_entry_safe(ctlq, tmp, &ctx->ctlqs, list)
> +		libie_ctlq_remove(ctx, ctlq);
> +}
> +EXPORT_SYMBOL_NS_GPL(libie_ctlq_deinit, "LIBIE_CP");
> +
> +/**
> + * libie_ctlq_tx_desc_from_msg - initialize a Tx descriptor from a message
> + * @desc: descriptor to be initialized
> + * @msg: filled control queue message
> + */
> +static void libie_ctlq_tx_desc_from_msg(struct libie_ctlq_desc *desc,
> +					const struct libie_ctlq_msg *msg)
> +{
> +	const struct libie_cp_dma_mem *dma = &msg->send_mem;
> +	u64 qword;
> +
> +	qword = FIELD_PREP(LIBIE_CTLQ_DESC_FLAGS, msg->flags) |
> +		FIELD_PREP(LIBIE_CTLQ_DESC_INFRA_OPCODE, msg->opcode) |
> +		FIELD_PREP(LIBIE_CTLQ_DESC_PFID_VFID, msg->func_id);
> +	desc->qword0 = cpu_to_le64(qword);
> +
> +	qword = FIELD_PREP(LIBIE_CTLQ_DESC_VIRTCHNL_OPCODE,
> +			   msg->chnl_opcode) |
> +		FIELD_PREP(LIBIE_CTLQ_DESC_VIRTCHNL_MSG_RET_VAL,
> +			   msg->chnl_retval);
> +	desc->qword1 = cpu_to_le64(qword);
> +
> +	qword = FIELD_PREP(LIBIE_CTLQ_DESC_MSG_PARAM0, msg->param0) |
> +		FIELD_PREP(LIBIE_CTLQ_DESC_SW_COOKIE,
> +			   msg->sw_cookie) |
> +		FIELD_PREP(LIBIE_CTLQ_DESC_VIRTCHNL_FLAGS,
> +			   msg->virt_flags);
> +	desc->qword2 = cpu_to_le64(qword);
> +
> +	if (likely(msg->data_len)) {
> +		desc->qword0 |=
> +			cpu_to_le64(LIBIE_CTLQ_DESC_QWORD0(msg->data_len));
> +		qword = FIELD_PREP(LIBIE_CTLQ_DESC_DATA_ADDR_HIGH,
> +				   upper_32_bits(dma->pa)) |
> +			FIELD_PREP(LIBIE_CTLQ_DESC_DATA_ADDR_LOW,
> +				   lower_32_bits(dma->pa));
> +	} else {
> +		qword = msg->addr_param;
> +	}
> +
> +	desc->qword3 = cpu_to_le64(qword);
> +}
> +
> +/**
> + * libie_ctlq_send_desc_avail - get number of free descriptors on a Tx ctlq
> + * @ctlq: specific control queue which is going to be used for sending messages
> + *
> + * The caller must hold ctlq->lock. Any dependent sending must be done
> + * in the same critical section.
> + *
> + * Return: number of available descriptors/messages on a given control queue.
> + */
> +u32 libie_ctlq_send_desc_avail(const struct libie_ctlq_info *ctlq)
> +{
> +	u32 ntu = ctlq->next_to_use, ntc = ctlq->next_to_clean;
> +
> +	return (ntc > ntu ? 0 : ctlq->ring_len) + ntc - ntu - 1;
> +}
> +EXPORT_SYMBOL_NS_GPL(libie_ctlq_send_desc_avail, "LIBIE_CP");
> +
> +/**
> + * libie_ctlq_send - send a message to Control Plane or Peer
> + * @ctlq: specific control queue which is used for sending a message
> + * @num_q_msg: number of messages present to send on @ctlq,
> + *	       positive and no greater than the number of available descriptors
> + *
> + * The caller must fill in @num_q_msg Tx messages starting at ntu beforehand.
> + *
> + * The caller must hold ctlq->lock. The intended pattern is to first check
> + * the number of descriptors available, then fill in the messages and perform
> + * send within a single critical section.
> + *
> + * Return: %0 on success, -%errno on failure.
> + */
> +void libie_ctlq_send(struct libie_ctlq_info *ctlq, u32 num_q_msg)
> +{
> +	u32 ntu = ctlq->next_to_use;
> +
> +	for (int i = 0; i < num_q_msg; i++) {
> +		struct libie_ctlq_msg *msg = ctlq->tx_msg[ntu];
> +		struct libie_ctlq_desc *desc;
> +
> +		desc = &ctlq->descs[ntu];
> +		libie_ctlq_tx_desc_from_msg(desc, msg);
> +
> +		if (unlikely(++ntu == ctlq->ring_len))
> +			ntu = 0;
> +	}
> +	writel(ntu, ctlq->reg.tail);
> +	ctlq->next_to_use = ntu;
> +}
> +EXPORT_SYMBOL_NS_GPL(libie_ctlq_send, "LIBIE_CP");
> +
> +/**
> + * libie_ctlq_fill_rx_msg - fill in a message from Rx descriptor and buffer
> + * @msg: message to be filled in
> + * @desc: received descriptor
> + * @rx_buf: fill queue buffer associated with the descriptor
> + */
> +static void libie_ctlq_fill_rx_msg(struct libie_ctlq_msg *msg,
> +				   const struct libie_ctlq_desc *desc,
> +				    struct libeth_fqe *rx_buf)
> +{
> +	u64 qword = le64_to_cpu(desc->qword0);
> +
> +	msg->flags = FIELD_GET(LIBIE_CTLQ_DESC_FLAGS, qword);
> +	msg->opcode = FIELD_GET(LIBIE_CTLQ_DESC_INFRA_OPCODE, qword);
> +	msg->data_len = FIELD_GET(LIBIE_CTLQ_DESC_DATA_LEN, qword);
> +	msg->hw_retval = FIELD_GET(LIBIE_CTLQ_DESC_HW_RETVAL, qword);
> +
> +	qword = le64_to_cpu(desc->qword1);
> +	msg->chnl_opcode =
> +		FIELD_GET(LIBIE_CTLQ_DESC_VIRTCHNL_OPCODE, qword);
> +	msg->chnl_retval =
> +		FIELD_GET(LIBIE_CTLQ_DESC_VIRTCHNL_MSG_RET_VAL, qword);
> +
> +	qword = le64_to_cpu(desc->qword2);
> +	msg->param0 =
> +		FIELD_GET(LIBIE_CTLQ_DESC_MSG_PARAM0, qword);
> +	msg->sw_cookie =
> +		FIELD_GET(LIBIE_CTLQ_DESC_SW_COOKIE, qword);
> +	msg->virt_flags =
> +		FIELD_GET(LIBIE_CTLQ_DESC_VIRTCHNL_FLAGS, qword);
> +
> +	if (likely(msg->data_len)) {
> +		if (unlikely(msg->data_len > LIBIE_CTLQ_MAX_BUF_LEN)) {
> +			msg->data_len = LIBIE_CTLQ_MAX_BUF_LEN;
> +			msg->chnl_retval = U32_MAX;
> +		}
> +		msg->recv_mem = (struct kvec) {
> +			.iov_base = netmem_address(rx_buf->netmem),
> +			.iov_len = msg->data_len,
> +		};
> +		libeth_rx_sync_for_cpu(rx_buf, msg->data_len);
> +	} else {
> +		msg->recv_mem = (struct kvec) {};
> +		msg->addr_param = le64_to_cpu(desc->qword3);
> +		page_pool_put_full_netmem(netmem_get_pp(rx_buf->netmem),
> +					  rx_buf->netmem, false);
> +	}
> +}
> +
> +/**
> + * libie_ctlq_recv - receive control queue message call back
> + * @ctlq: control queue that needs to be processed for receive
> + * @msg: array of received control queue messages on this q;
> + * needs to be pre-allocated by caller for as many messages as requested
> + * @num_q_msg: number of messages that can be stored in msg buffer
> + *
> + * Called by interrupt handler or polling mechanism. Caller is expected
> + * to free buffers.
> + *
> + * The caller must make sure that calls to libie_ctlq_post_rx_buffs()
> + * and libie_ctlq_recv() for each queue are either serialized
> + * or used under ctlq->lock.
> + *
> + * Return: number of messages received
> + */
> +u32 libie_ctlq_recv(struct libie_ctlq_info *ctlq, struct libie_ctlq_msg *msg,
> +		    u32 num_q_msg)
> +{
> +	u32 ntc, i;
> +
> +	ntc = ctlq->next_to_clean;
> +
> +	for (i = 0; i < num_q_msg; i++) {
> +		const struct libie_ctlq_desc *desc = &ctlq->descs[ntc];
> +		struct libeth_fqe *rx_buf = &ctlq->rx_fqes[ntc];
> +		u64 qword;
> +
> +		qword = le64_to_cpu(desc->qword0);
> +		if (!FIELD_GET(LIBIE_CTLQ_DESC_FLAG_DD, qword))
> +			break;
> +
> +		dma_rmb();
> +
> +		libie_ctlq_fill_rx_msg(&msg[i], desc, rx_buf);
> +
> +		if (unlikely(++ntc == ctlq->ring_len))
> +			ntc = 0;
> +	}
> +
> +	ctlq->next_to_clean = ntc;
> +
> +	return i;
> +}
> +EXPORT_SYMBOL_NS_GPL(libie_ctlq_recv, "LIBIE_CP");
> +
> +MODULE_DESCRIPTION("Control Plane communication API");
> +MODULE_IMPORT_NS("LIBETH");
> +MODULE_LICENSE("GPL");
> diff --git a/include/linux/intel/libie/controlq.h b/include/linux/intel/libie/controlq.h
> new file mode 100644
> index 000000000000..a6ed4fa159b1
> --- /dev/null
> +++ b/include/linux/intel/libie/controlq.h
> @@ -0,0 +1,249 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (C) 2025 Intel Corporation */
> +
> +#ifndef __LIBIE_CONTROLQ_H
> +#define __LIBIE_CONTROLQ_H
> +
> +#include <net/libeth/rx.h>
> +
> +#include <linux/intel/libie/pci.h>
> +#include <linux/intel/virtchnl2.h>
> +
> +/* Default mailbox control queue */
> +#define LIBIE_CTLQ_MBX_ID			-1
> +#define LIBIE_CTLQ_MAX_BUF_LEN			SZ_4K
> +
> +#define LIBIE_CTLQ_TYPE_TX			0
> +#define LIBIE_CTLQ_TYPE_RX			1
> +
> +/* Opcode used to send controlq message to the control plane */
> +#define LIBIE_CTLQ_SEND_MSG_TO_CP		0x801
> +#define LIBIE_CTLQ_SEND_MSG_TO_PEER		0x804
> +
> +/**
> + * struct libie_ctlq_ctx - contains controlq info and MMIO region info
> + * @mmio_info: MMIO region info structure
> + * @ctlqs: list that stores all the control queues
> + * @ctlqs_lock: lock for control queue list
> + */
> +struct libie_ctlq_ctx {
> +	struct libie_mmio_info	mmio_info;
> +	struct list_head	ctlqs;
> +	spinlock_t		ctlqs_lock;	/* protects the ctlqs list */
> +};
> +
> +/**
> + * struct libie_ctlq_reg - structure representing virtual addresses of the
> + *			    controlq registers and masks
> + * @head: controlq head register address
> + * @tail: controlq tail register address
> + * @len: register address to write controlq length and enable bit
> + * @addr_high: register address to write the upper 32b of ring physical address
> + * @addr_low: register address to write the lower 32b of ring physical address
> + * @len_mask: mask to read the controlq length
> + * @len_ena_mask: mask to write the controlq enable bit
> + * @head_mask: mask to read the head value
> + */
> +struct libie_ctlq_reg {
> +	void __iomem	*head;
> +	void __iomem	*tail;
> +	void __iomem	*len;
> +	void __iomem	*addr_high;
> +	void __iomem	*addr_low;
> +	u32		len_mask;
> +	u32		len_ena_mask;
> +	u32		head_mask;
> +};
> +
> +/**
> + * struct libie_cp_dma_mem - structure for DMA memory
> + * @va: virtual address
> + * @pa: physical address
> + * @size: memory size
> + */
> +struct libie_cp_dma_mem {
> +	void		*va;
> +	dma_addr_t	pa;
> +	size_t		size;
> +};
> +
> +/**
> + * struct libie_ctlq_msg - control queue message data
> + * @flags: refer to 'Flags sub-structure' definitions
> + * @opcode: infrastructure message opcode
> + * @data_len: size of the payload
> + * @func_id: queue id for the secondary mailbox queue, 0 for default mailbox
> + * @hw_retval: execution status from the HW
> + * @chnl_opcode: virtchnl message opcode
> + * @chnl_retval: virtchnl return value
> + * @param0: indirect message raw parameter0
> + * @sw_cookie: used to verify the response of the sent virtchnl message
> + * @virt_flags: virtchnl capability flags
> + * @addr_param: additional parameters in place of the address, given no buffer
> + * @recv_mem: virtual address and size of the buffer that contains
> + *	      the indirect response
> + * @send_mem: physical and virtual address of the DMA buffer,
> + *	      used for sending
> + */
> +struct libie_ctlq_msg {
> +	u16			flags;
> +	u16			opcode;
> +	u16			data_len;
> +	union {
> +		u16		func_id;
> +		u16		hw_retval;
> +	};
> +	u32			chnl_opcode;
> +	u32			chnl_retval;
> +	u32			param0;
> +	u16			sw_cookie;
> +	u16			virt_flags;
> +	u64			addr_param;
> +	union {
> +		struct kvec	recv_mem;
> +		struct	libie_cp_dma_mem send_mem;
> +	};
> +};
> +
> +/**
> + * struct libie_ctlq_create_info - control queue create information
> + * @type: control queue type (Rx or Tx)
> + * @id: queue offset passed as input, -1 for default mailbox
> + * @reg: registers accessed by control queue
> + * @len: controlq length
> + */
> +struct libie_ctlq_create_info {
> +	enum virtchnl2_queue_type	type;
> +	int				id;
> +	struct libie_ctlq_reg		reg;
> +	u16				len;
> +};
> +
> +/**
> + * struct libie_ctlq_info - control queue information
> + * @list: used to add a controlq to the list of queues in libie_ctlq_ctx
> + * @type: control queue type
> + * @qid: queue identifier
> + * @lock: control queue lock
> + * @ring_mem: descriptor ring DMA memory
> + * @descs: array of descriptors
> + * @rx_fqes: array of controlq Rx buffers
> + * @tx_msg: Tx messages sent to hardware
> + * @reg: registers used by control queue
> + * @dev: device that owns this control queue
> + * @pp: page pool for controlq Rx buffers
> + * @truesize: size to allocate per buffer
> + * @next_to_use: next available slot to send buffer
> + * @next_to_clean: next descriptor to be cleaned
> + * @next_to_post: next available slot to post buffers to after receive
> + * @ring_len: length of the descriptor ring
> + */
> +struct libie_ctlq_info {
> +	struct list_head		list;
> +	enum virtchnl2_queue_type	type;
> +	int				qid;
> +	spinlock_t			lock;	/* for concurrent processing */
> +	struct libie_cp_dma_mem	ring_mem;
> +	struct libie_ctlq_desc		*descs;
> +	union {
> +		struct libeth_fqe		*rx_fqes;
> +		struct libie_ctlq_msg		**tx_msg;
> +	};
> +	struct libie_ctlq_reg		reg;
> +	struct device			*dev;
> +	struct page_pool		*pp;
> +	u32				truesize;
> +	u32				next_to_clean;
> +	union {
> +		u32			next_to_use;
> +		u32			next_to_post;
> +	};
> +	u32				ring_len;
> +};
> +
> +#define LIBIE_CTLQ_MBX_ATQ_LEN			GENMASK(9, 0)
> +
> +/* Flags sub-structure
> + * |0  |1  |2  |3  |4  |5  |6  |7  |8  |9  |10 |11 |12 |13 |14 |15 |
> + * |DD |CMP|ERR|  * RSV *  |FTYPE  | *RSV* |RD |VFC|BUF|  HOST_ID  |
> + */
> + /* libie controlq descriptor qword0 details */
> +#define LIBIE_CTLQ_DESC_FLAG_DD		BIT(0)
> +#define LIBIE_CTLQ_DESC_FLAG_CMP		BIT(1)
> +#define LIBIE_CTLQ_DESC_FLAG_ERR		BIT(2)
> +#define LIBIE_CTLQ_DESC_FLAG_FTYPE_VM		BIT(6)
> +#define LIBIE_CTLQ_DESC_FLAG_FTYPE_PF		BIT(7)
> +#define LIBIE_CTLQ_DESC_FLAG_FTYPE		GENMASK(7, 6)
> +#define LIBIE_CTLQ_DESC_FLAG_RD		BIT(10)
> +#define LIBIE_CTLQ_DESC_FLAG_VFC		BIT(11)
> +#define LIBIE_CTLQ_DESC_FLAG_BUF		BIT(12)
> +#define LIBIE_CTLQ_DESC_FLAG_HOST_ID		GENMASK(15, 13)
> +
> +#define LIBIE_CTLQ_DESC_FLAGS			GENMASK(15, 0)
> +#define LIBIE_CTLQ_DESC_INFRA_OPCODE		GENMASK_ULL(31, 16)
> +#define LIBIE_CTLQ_DESC_DATA_LEN		GENMASK_ULL(47, 32)
> +#define LIBIE_CTLQ_DESC_HW_RETVAL		GENMASK_ULL(63, 48)
> +
> +#define LIBIE_CTLQ_DESC_PFID_VFID		GENMASK_ULL(63, 48)
> +
> +/* libie controlq descriptor qword1 details */
> +#define LIBIE_CTLQ_DESC_VIRTCHNL_OPCODE	GENMASK(27, 0)
> +#define LIBIE_CTLQ_DESC_VIRTCHNL_DESC_TYPE	GENMASK_ULL(31, 28)
> +#define LIBIE_CTLQ_DESC_VIRTCHNL_MSG_RET_VAL	GENMASK_ULL(63, 32)
> +
> +/* libie controlq descriptor qword2 details */
> +#define LIBIE_CTLQ_DESC_MSG_PARAM0		GENMASK_ULL(31, 0)
> +#define LIBIE_CTLQ_DESC_SW_COOKIE		GENMASK_ULL(47, 32)
> +#define LIBIE_CTLQ_DESC_VIRTCHNL_FLAGS		GENMASK_ULL(63, 48)
> +
> +/* libie controlq descriptor qword3 details */
> +#define LIBIE_CTLQ_DESC_DATA_ADDR_HIGH		GENMASK_ULL(31, 0)
> +#define LIBIE_CTLQ_DESC_DATA_ADDR_LOW		GENMASK_ULL(63, 32)
> +
> +/**
> + * struct libie_ctlq_desc - control queue descriptor format
> + * @qword0: flags, message opcode, data length etc
> + * @qword1: virtchnl opcode, descriptor type and return value
> + * @qword2: indirect message parameters
> + * @qword3: indirect message buffer address
> + */
> +struct libie_ctlq_desc {
> +	__le64			qword0;
> +	__le64			qword1;
> +	__le64			qword2;
> +	__le64			qword3;
> +};
> +
> +/**
> + * libie_ctlq_release_rx_buf - Release Rx buffer for a specific control queue
> + * @rx_buf: Rx buffer to be freed
> + *
> + * Driver uses this function to post back the Rx buffer after the usage.
> + */
> +static inline void libie_ctlq_release_rx_buf(struct kvec *rx_buf)
> +{
> +	netmem_ref netmem;
> +
> +	if (!rx_buf->iov_base)
> +		return;
> +
> +	netmem = virt_to_netmem(rx_buf->iov_base);
> +	page_pool_put_full_netmem(netmem_get_pp(netmem), netmem, false);
> +}
> +
> +int libie_ctlq_init(struct libie_ctlq_ctx *ctx,
> +		    const struct libie_ctlq_create_info *qinfo,  u32 numq);
> +void libie_ctlq_deinit(struct libie_ctlq_ctx *ctx);
> +
> +struct libie_ctlq_info *libie_find_ctlq(struct libie_ctlq_ctx *ctx,
> +					enum virtchnl2_queue_type type,
> +					  int id);
> +
> +u32 libie_ctlq_send_desc_avail(const struct libie_ctlq_info *ctlq);
> +void libie_ctlq_send(struct libie_ctlq_info *ctlq, u32 num_q_msg);
> +u32 libie_ctlq_recv(struct libie_ctlq_info *ctlq, struct libie_ctlq_msg *msg,
> +		    u32 num_q_msg);
> +
> +int libie_ctlq_post_rx_buffs(struct libie_ctlq_info *ctlq);
> +
> +#endif /* __LIBIE_CONTROLQ_H */
> -- 
> 2.47.1
> 

* Re: Re: [PATCH] dcache: add fs.dentry-limit sysctl with negative-first reaper
From: Horst Birthelmer @ 2026-05-18  7:01 UTC (permalink / raw)
  To: NeilBrown
  Cc: Horst Birthelmer, Miklos Szeredi, Jonathan Corbet, Shuah Khan,
	Alexander Viro, Christian Brauner, Jan Kara, linux-doc,
	linux-kernel, linux-fsdevel, Horst Birthelmer
In-Reply-To: <177906210551.3947082.4313294634549021141@noble.neil.brown.name>

On Mon, May 18, 2026 at 09:55:05AM +1000, NeilBrown wrote:
> On Fri, 15 May 2026, Horst Birthelmer wrote:
> > From: Horst Birthelmer <hbirthelmer@ddn.com>
> > 
> > The dcache only shrinks under memory pressure, which is rarely reached
> > on machines with ample RAM, so cached negative dentries can accumulate
> > without bound.  Give administrators a soft cap they can set,
> > and a background worker that prefers negative dentries when reclaiming.
> > 
> > Two new sysctls under /proc/sys/fs/:
> > 
> >   dentry-limit             -- soft cap on nr_dentry.  0 (default)
> >                               disables the feature; behaviour is then
> >                               identical to before.
> 
> Is a system-wide cap really a suitable tool?  What guidance would you
> give to sysadmins who are considering setting a number?

I know it is a rhetorical question, but nevertheless:
It's a soft cap, so the right value depends on how many files are typically
open on the machine, and even on the file systems in use. That was actually
my motivation (more than the negative entries): some cache entries are
expensive for our FUSE server because of our DLM usage and the private data
held in user space.

> Is there a better approach?

After reading your thoughts and those of the others who have taken the time
to revisit this, I think there is no better solution in the VFS layer.

Since 2025 (commit 395b95530343e), shrink_dentry_list() has been an exported
symbol, so a specific file system can use it to do its own housekeeping.
Some will probably consider this a misuse, but it would be more specific and
easier to control, especially for file systems where certain cache entries
are more expensive than others and/or that run in user space (FUSE).

> 
> According to the email you linked, a problem arises when a directory has
> a great many negative children.  Code which walks the list of children
> (such as fsnotify) while holding a lock can suffer unpredictable delays
> and result in long lock-hold times.  So maybe a limit on negative
> dentries for any parent is what we really want.  That would be clumsy to
> implement I imagine.
> 
> But what if we move dentries to the end of the list when they become
> negative, and to the start of the list when they become positive?  Then
> code which walks the child list could simply abort on the first
> negative.
> 
> I doubt that would be quite as easy as it sounds, but it would at least
> be more focused on the observed symptom rather than some whole-system
> number which only vaguely correlates with the observed symptom.
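
A rough userspace model of the reordering idea above, for illustration only.
struct child, struct parent and all helper names are made up, and the sketch
ignores the real dcache data structures, locking and RCU entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct dentry or list API. */
struct child {
	struct child *prev, *next;
	bool negative;
};

struct parent {
	struct child head;	/* sentinel node of a circular list */
};

static void parent_init(struct parent *p)
{
	p->head.prev = p->head.next = &p->head;
	p->head.negative = false;
}

static void child_unlink(struct child *c)
{
	c->prev->next = c->next;
	c->next->prev = c->prev;
}

/* Link @c between @prev and @next. */
static void child_link(struct child *c, struct child *prev, struct child *next)
{
	c->prev = prev;
	c->next = next;
	prev->next = c;
	next->prev = c;
}

/* Positive children go to the front, negative ones to the back. */
static void child_add(struct parent *p, struct child *c, bool negative)
{
	c->negative = negative;
	if (negative)
		child_link(c, p->head.prev, &p->head);
	else
		child_link(c, &p->head, p->head.next);
}

/* Move @c to the matching end of the list after its state changes. */
static void child_set_negative(struct parent *p, struct child *c, bool negative)
{
	child_unlink(c);
	child_add(p, c, negative);
}

/* Walk the children, stopping at the first negative one. */
static size_t count_positive(const struct parent *p)
{
	size_t n = 0;

	for (const struct child *c = p->head.next; c != &p->head; c = c->next) {
		if (c->negative)
			break;
		n++;
	}
	return n;
}
```

In this model the walk cost is bounded by the number of positive children,
provided every state change goes through child_set_negative().
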
> 
> Maybe a completely different approach: change children-walking code to
> drop and retake the lock (with appropriate validation) periodically.
> What too would address the specific symptom.
> 
> Thanks for attempting to resolve this issue, but I'm not convinced that
> you have found a good solution yet.

Thanks for the clear words. I really appreciate it!

> 
> NeilBrown
> 

^ permalink raw reply

* Re: [PATCH] nios2: remove the architecture
From: Uwe Kleine-König @ 2026-05-18  6:55 UTC (permalink / raw)
  To: Ethan Nelson-Moore
  Cc: linux-doc, devicetree, workflows, linux-arch, dmaengine,
	linux-i2c, linux-iio, netdev, linux-pci, linux-pwm,
	linux-hardening, linux-kbuild, linux-csky, Jonathan Corbet,
	Shuah Khan, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Daniel Lezcano, Thomas Gleixner, Alex Shi, Yanteng Si,
	Dongliang Mu, Hu Haowen, Dinh Nguyen, Kees Cook, Oleg Nesterov,
	Will Deacon, Aneesh Kumar K.V, Andrew Morton, Nick Piggin,
	Peter Zijlstra, Vinod Koul, Frank Li, Dave Penkler, Andi Shyti,
	Jonathan Cameron, David Lechner, Nuno Sá, Andy Shevchenko,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Lorenzo Pieralisi, Krzysztof Wilczyński
In-Reply-To: <20260518042833.272221-1-enelsonmoore@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 853 bytes --]

Hello,

On Sun, May 17, 2026 at 09:28:33PM -0700, Ethan Nelson-Moore wrote:
> diff --git a/drivers/pwm/Kconfig b/drivers/pwm/Kconfig
> index 6f3147518376..d8145f369ec3 100644
> --- a/drivers/pwm/Kconfig
> +++ b/drivers/pwm/Kconfig
> @@ -131,7 +131,7 @@ config PWM_ATMEL_TCB
>  
>  config PWM_AXI_PWMGEN
>  	tristate "Analog Devices AXI PWM generator"
> -	depends on MICROBLAZE || NIOS2 || ARCH_ZYNQ || ARCH_ZYNQMP || ARCH_INTEL_SOCFPGA || COMPILE_TEST
> +	depends on MICROBLAZE || ARCH_ZYNQ || ARCH_ZYNQMP || ARCH_INTEL_SOCFPGA || COMPILE_TEST
>  	select REGMAP_MMIO
>  	help
>  	  This enables support for the Analog Devices AXI PWM generator.
> diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
> index 5a40252b8334..edc8c96d91b6 100644

Acked-by: Uwe Kleine-König <ukleinek@kernel.org> # for pwm

Best regards
Uwe

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v3 02/14] libie: add PCI device initialization helpers to libie
From: Larysa Zaremba @ 2026-05-18  6:46 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, netdev, Tony Nguyen
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, netdev,
	Phani R Burra, przemyslaw.kitszel, aleksander.lobakin,
	sridhar.samudrala, anjali.singhai, michal.swiatkowski,
	maciej.fijalkowski, emil.s.tantilov, madhu.chittim, joshua.a.hay,
	jacob.e.keller, jayaprakash.shanmugam, jiri, horms, corbet,
	richardcochran, linux-doc, bhelgaas, linux-pci, Bharath R,
	Samuel Salin, Aleksandr Loktionov
In-Reply-To: <20260515224443.2772147-3-anthony.l.nguyen@intel.com>

On Fri, May 15, 2026 at 03:44:26PM -0700, Tony Nguyen wrote:
> From: Phani R Burra <phani.r.burra@intel.com>
> 
> Add support functions for drivers to configure PCI functionality and access
> MMIO space.
>

I have reviewed the Sashiko feedback [0]. Here is why I do not find the
feedback very helpful for this particular patch:

1. .config selection is consistent with the overall libie scheme, so if it is
   suboptimal, bigger changes (out of scope) are needed.
2. __libie_pci_get_mmio_addr() is not intended to check read boundaries; this
   is consistent with the pre-refactor implementation.
3. I see that none of the x + y > pci_resource_len(...) checks guard against
   overflow in any way. If you have a good example, please share.
4. Not using device resources for ioremap() was a conscious choice, to simplify
   deleting and adding regions at runtime.

[0] https://sashiko.dev/#/patchset/20260515224443.2772147-1-anthony.l.nguyen%40intel.com
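
For reference on point 3, a range check that also rejects a wrapped sum could
be sketched as below. This is standalone userspace C, not code from the patch:
range_fits(), range_fits_naive() and range_fits_selfcheck() are hypothetical
names, and __builtin_add_overflow() is the GCC/Clang builtin. In-kernel code
would more likely use check_add_overflow() from <linux/overflow.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Plain form of the check: a wrapped offset + size slips through. */
bool range_fits_naive(uint64_t offset, uint64_t size, uint64_t len)
{
	return !(offset + size > len);
}

/* Hypothetical helper: true only if [offset, offset + size) fits in len. */
bool range_fits(uint64_t offset, uint64_t size, uint64_t len)
{
	uint64_t end;

	/* GCC/Clang builtin: returns true when the addition wrapped. */
	if (__builtin_add_overflow(offset, size, &end))
		return false;

	return end <= len;
}

/* Usage: both forms agree until the sum wraps around. */
void range_fits_selfcheck(void)
{
	assert(range_fits(0, 4096, 4096));
	assert(!range_fits(4096, 1, 4096));
	assert(range_fits_naive(UINT64_MAX, 2, 4096));	/* sum wraps to 1 */
	assert(!range_fits(UINT64_MAX, 2, 4096));
}
```

Whether the extra branch is worth it depends on whether offset and size can
ever come from an untrusted source.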
 
> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> Signed-off-by: Phani R Burra <phani.r.burra@intel.com>
> Co-developed-by: Victor Raj <victor.raj@intel.com>
> Signed-off-by: Victor Raj <victor.raj@intel.com>
> Co-developed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> Co-developed-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
> Co-developed-by: Larysa Zaremba <larysa.zaremba@intel.com>
> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
> Tested-by: Bharath R <bharath.r@intel.com>
> Tested-by: Samuel Salin <Samuel.salin@intel.com>
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
> ---
>  drivers/net/ethernet/intel/libie/Kconfig  |   6 +
>  drivers/net/ethernet/intel/libie/Makefile |   4 +
>  drivers/net/ethernet/intel/libie/pci.c    | 208 ++++++++++++++++++++++
>  include/linux/intel/libie/pci.h           |  56 ++++++
>  4 files changed, 274 insertions(+)
>  create mode 100644 drivers/net/ethernet/intel/libie/pci.c
>  create mode 100644 include/linux/intel/libie/pci.h
> 
> diff --git a/drivers/net/ethernet/intel/libie/Kconfig b/drivers/net/ethernet/intel/libie/Kconfig
> index 70831c7e336e..500a95c944a8 100644
> --- a/drivers/net/ethernet/intel/libie/Kconfig
> +++ b/drivers/net/ethernet/intel/libie/Kconfig
> @@ -23,3 +23,9 @@ config LIBIE_FWLOG
>  	  for it. Firmware logging is using admin queue interface to communicate
>  	  with the device. Debugfs is a user interface used to config logging
>  	  and dump all collected logs.
> +
> +config LIBIE_PCI
> +	tristate
> +	help
> +	  Helper functions for management of PCI resources belonging
> +	  to networking devices.
> diff --git a/drivers/net/ethernet/intel/libie/Makefile b/drivers/net/ethernet/intel/libie/Makefile
> index db57fc6780ea..a28509cb9086 100644
> --- a/drivers/net/ethernet/intel/libie/Makefile
> +++ b/drivers/net/ethernet/intel/libie/Makefile
> @@ -12,3 +12,7 @@ libie_adminq-y			:= adminq.o
>  obj-$(CONFIG_LIBIE_FWLOG) 	+= libie_fwlog.o
>  
>  libie_fwlog-y			:= fwlog.o
> +
> +obj-$(CONFIG_LIBIE_PCI)		+= libie_pci.o
> +
> +libie_pci-y			:= pci.o
> diff --git a/drivers/net/ethernet/intel/libie/pci.c b/drivers/net/ethernet/intel/libie/pci.c
> new file mode 100644
> index 000000000000..7276a3533b54
> --- /dev/null
> +++ b/drivers/net/ethernet/intel/libie/pci.c
> @@ -0,0 +1,208 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright (C) 2025 Intel Corporation */
> +
> +#include <linux/intel/libie/pci.h>
> +
> +/**
> + * libie_find_mmio_region - find MMIO region containing a range
> + * @mmio_list: list that contains MMIO region info
> + * @offset: range start offset
> + * @size: range size
> + * @bar_idx: BAR index containing the range to search
> + *
> + * Return: pointer to a MMIO region overlapping with the range in any way or
> + *	   NULL if no such region is mapped.
> + */
> +static struct libie_pci_mmio_region *
> +libie_find_mmio_region(const struct list_head *mmio_list,
> +		       resource_size_t offset, resource_size_t size,
> +		       int bar_idx)
> +{
> +	resource_size_t end_offset = offset + size;
> +	struct libie_pci_mmio_region *mr;
> +
> +	list_for_each_entry(mr, mmio_list, list) {
> +		resource_size_t mr_end = mr->offset + mr->size;
> +		resource_size_t mr_start = mr->offset;
> +
> +		if (mr->bar_idx != bar_idx)
> +			continue;
> +		if (offset < mr_end && end_offset > mr_start)
> +			return mr;
> +	}
> +
> +	return NULL;
> +}
> +
> +/**
> + * __libie_pci_get_mmio_addr - get the MMIO virtual address
> + * @mmio_info: contains list of MMIO regions
> + * @offset: register offset to find
> + * @num_args: number of additional arguments present
> + *
> + * This function finds the virtual address of a register offset by iterating
> + * through the non-linear MMIO regions that are mapped by the driver.
> + *
> + * Return: valid MMIO virtual address or NULL.
> + */
> +void __iomem *__libie_pci_get_mmio_addr(struct libie_mmio_info *mmio_info,
> +					resource_size_t offset,
> +					int num_args, ...)
> +{
> +	struct libie_pci_mmio_region *mr;
> +	int bar_idx = 0;
> +	va_list args;
> +
> +	if (num_args) {
> +		va_start(args, num_args);
> +		bar_idx = va_arg(args, int);
> +		va_end(args);
> +	}
> +
> +	list_for_each_entry(mr, &mmio_info->mmio_list, list)
> +		if (bar_idx == mr->bar_idx && offset >= mr->offset &&
> +		    offset < mr->offset + mr->size) {
> +			offset -= mr->offset;
> +
> +			return mr->addr + offset;
> +		}
> +
> +	return NULL;
> +}
> +EXPORT_SYMBOL_NS_GPL(__libie_pci_get_mmio_addr, "LIBIE_PCI");
> +
> +/**
> + * __libie_pci_map_mmio_region - map PCI device MMIO region
> + * @mmio_info: struct to store the mapped MMIO region
> + * @offset: MMIO region start offset
> + * @size: MMIO region size
> + * @num_args: number of additional arguments present
> + *
> + * Return: true on success, false on memory map failure.
> + */
> +bool __libie_pci_map_mmio_region(struct libie_mmio_info *mmio_info,
> +				 resource_size_t offset,
> +				 resource_size_t size, int num_args, ...)
> +{
> +	struct pci_dev *pdev = mmio_info->pdev;
> +	struct libie_pci_mmio_region *mr;
> +	resource_size_t pa;
> +	void __iomem *va;
> +	int bar_idx = 0;
> +	va_list args;
> +
> +	if (num_args) {
> +		va_start(args, num_args);
> +		bar_idx = va_arg(args, int);
> +		va_end(args);
> +	}
> +
> +	if (offset + size > pci_resource_len(pdev, bar_idx))
> +		return false;
> +
> +	mr = libie_find_mmio_region(&mmio_info->mmio_list, offset, size,
> +				    bar_idx);
> +	if (mr) {
> +		pci_warn(pdev,
> +			 "Mapping of BAR%u (offset=%llu, size=%llu) intersecting region (offset=%llu, size=%llu) already exists\n",
> +			 bar_idx, (unsigned long long)mr->offset,
> +			 (unsigned long long)mr->size,
> +			 (unsigned long long)offset, (unsigned long long)size);
> +		return mr->offset <= offset &&
> +		       mr->offset + mr->size >= offset + size;
> +	}
> +
> +	pa = pci_resource_start(pdev, bar_idx) + offset;
> +	va = ioremap(pa, size);
> +	if (!va) {
> +		pci_err(pdev, "Failed to map BAR%u region\n", bar_idx);
> +		return false;
> +	}
> +
> +	mr = kvzalloc_obj(*mr);
> +	if (!mr) {
> +		iounmap(va);
> +		return false;
> +	}
> +
> +	mr->addr = va;
> +	mr->offset = offset;
> +	mr->size = size;
> +	mr->bar_idx = bar_idx;
> +
> +	list_add_tail(&mr->list, &mmio_info->mmio_list);
> +
> +	return true;
> +}
> +EXPORT_SYMBOL_NS_GPL(__libie_pci_map_mmio_region, "LIBIE_PCI");
> +
> +/**
> + * libie_pci_unmap_fltr_regs - unmap selected PCI device MMIO regions
> + * @mmio_info: contains list of MMIO regions to unmap
> + * @fltr: returns true, if region is to be unmapped
> + */
> +void libie_pci_unmap_fltr_regs(struct libie_mmio_info *mmio_info,
> +			       bool (*fltr)(struct libie_mmio_info *mmio_info,
> +					    struct libie_pci_mmio_region *reg))
> +{
> +	struct libie_pci_mmio_region *mr, *tmp;
> +
> +	list_for_each_entry_safe(mr, tmp, &mmio_info->mmio_list, list) {
> +		if (!fltr(mmio_info, mr))
> +			continue;
> +		iounmap(mr->addr);
> +		list_del(&mr->list);
> +		kvfree(mr);
> +	}
> +}
> +EXPORT_SYMBOL_NS_GPL(libie_pci_unmap_fltr_regs, "LIBIE_PCI");
> +
> +/**
> + * libie_pci_unmap_all_mmio_regions - unmap all PCI device MMIO regions
> + * @mmio_info: contains list of MMIO regions to unmap
> + */
> +void libie_pci_unmap_all_mmio_regions(struct libie_mmio_info *mmio_info)
> +{
> +	struct libie_pci_mmio_region *mr, *tmp;
> +
> +	list_for_each_entry_safe(mr, tmp, &mmio_info->mmio_list, list) {
> +		iounmap(mr->addr);
> +		list_del(&mr->list);
> +		kvfree(mr);
> +	}
> +}
> +EXPORT_SYMBOL_NS_GPL(libie_pci_unmap_all_mmio_regions, "LIBIE_PCI");
> +
> +/**
> + * libie_pci_init_dev - enable and reserve PCI regions of the device
> + * @pdev: PCI device information
> + *
> + * Return: %0 on success, -%errno on failure.
> + */
> +int libie_pci_init_dev(struct pci_dev *pdev)
> +{
> +	int err;
> +
> +	err = pcim_enable_device(pdev);
> +	if (err)
> +		return err;
> +
> +	for (int bar = 0; bar < PCI_STD_NUM_BARS; bar++)
> +		if (pci_resource_flags(pdev, bar) & IORESOURCE_MEM) {
> +			err = pcim_request_region(pdev, bar, pci_name(pdev));
> +			if (err)
> +				return err;
> +		}
> +
> +	err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
> +	if (err)
> +		return err;
> +
> +	pci_set_master(pdev);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(libie_pci_init_dev, "LIBIE_PCI");
> +
> +MODULE_DESCRIPTION("Common Ethernet PCI library");
> +MODULE_LICENSE("GPL");
> diff --git a/include/linux/intel/libie/pci.h b/include/linux/intel/libie/pci.h
> new file mode 100644
> index 000000000000..effd072c55c8
> --- /dev/null
> +++ b/include/linux/intel/libie/pci.h
> @@ -0,0 +1,56 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (C) 2025 Intel Corporation */
> +
> +#ifndef __LIBIE_PCI_H
> +#define __LIBIE_PCI_H
> +
> +#include <linux/pci.h>
> +
> +/**
> + * struct libie_pci_mmio_region - structure for MMIO region info
> + * @list: used to add a MMIO region to the list of MMIO regions in
> + *	  libie_mmio_info
> + * @addr: virtual address of MMIO region start
> + * @offset: start offset of the MMIO region
> + * @size: size of the MMIO region
> + * @bar_idx: BAR index to which the MMIO region belongs to
> + */
> +struct libie_pci_mmio_region {
> +	struct list_head	list;
> +	void __iomem		*addr;
> +	resource_size_t		offset;
> +	resource_size_t		size;
> +	u16			bar_idx;
> +};
> +
> +/**
> + * struct libie_mmio_info - contains list of MMIO regions
> + * @pdev: PCI device pointer
> + * @mmio_list: list of MMIO regions
> + */
> +struct libie_mmio_info {
> +	struct pci_dev		*pdev;
> +	struct list_head	mmio_list;
> +};
> +
> +#define libie_pci_map_mmio_region(mmio_info, offset, size, ...)	\
> +	__libie_pci_map_mmio_region(mmio_info, offset, size,		\
> +				     COUNT_ARGS(__VA_ARGS__), ##__VA_ARGS__)
> +
> +#define libie_pci_get_mmio_addr(mmio_info, offset, ...)		\
> +	__libie_pci_get_mmio_addr(mmio_info, offset,			\
> +				   COUNT_ARGS(__VA_ARGS__), ##__VA_ARGS__)
> +
> +bool __libie_pci_map_mmio_region(struct libie_mmio_info *mmio_info,
> +				 resource_size_t offset, resource_size_t size,
> +				 int num_args, ...);
> +void __iomem *__libie_pci_get_mmio_addr(struct libie_mmio_info *mmio_info,
> +					resource_size_t offset,
> +					int num_args, ...);
> +void libie_pci_unmap_all_mmio_regions(struct libie_mmio_info *mmio_info);
> +void libie_pci_unmap_fltr_regs(struct libie_mmio_info *mmio_info,
> +			       bool (*fltr)(struct libie_mmio_info *mmio_info,
> +					    struct libie_pci_mmio_region *reg));
> +int libie_pci_init_dev(struct pci_dev *pdev);
> +
> +#endif /* __LIBIE_PCI_H */
> -- 
> 2.47.1
> 

^ permalink raw reply

* Re: [PATCH v3] killswitch: add per-function short-circuit mitigation primitive
From: Song Liu @ 2026-05-18  6:37 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-kernel, linux-doc, linux-kselftest, bpf, live-patching,
	Greg Kroah-Hartman, Andrew Morton, Jonathan Corbet,
	Mathieu Desnoyers, Joshua Peisach, Florian Weimer, Breno Leitao,
	Anthony Iliopoulos, Michal Hocko, Jiri Olsa
In-Reply-To: <20260517134858.146569-1-sashal@kernel.org>

On Sun, May 17, 2026 at 6:49 AM Sasha Levin <sashal@kernel.org> wrote:
>
> When a kernel (security) issue goes public, fleets stay exposed until a patched
> kernel is built, distributed, and rebooted into.
>
> For many such issues the simplest mitigation is to stop calling the buggy
> function. Killswitch provides that. An admin writes:
>
>     echo "engage af_alg_sendmsg -1" \
>         > /sys/kernel/security/killswitch/control
>
> After this, af_alg_sendmsg() returns -EPERM on every call without
> running its body. The mitigation takes effect immediately, and is dropped on
> the next reboot -- by which point a patched kernel is hopefully in place.
>
> A lot of recent kernel issues sit in code paths most installs only have enabled
> to support a relative minority of users: AF_ALG, ksmbd, nf_tables, vsock, ax25,
> and friends.
>
> For most users, the cost of "this socket family stops working for the day" is
> much smaller than the cost of running a known vulnerable kernel until the fix
> lands.
>
> Why not an existing facility:
>
> * livepatch needs a built, signed, per-kernel-version module per CVE.
>   Under Secure Boot the operator can't sign their own, so they wait
>   for the vendor, and only a minority of vendors actually ship
>   livepatches. Killswitch covers the days before that module shows
>   up.
>
> * fail_function (CONFIG_FUNCTION_ERROR_INJECTION) is disabled in
>   most production kernels. Even where enabled, it only works on
>   functions pre-annotated with ALLOW_ERROR_INJECTION() in source -
>   no help for a freshly-disclosed CVE. The debugfs UI is blocked by
>   lockdown=integrity and the override is probabilistic.
>
> * BPF override (bpf_override_return) honors the same
>   ALLOW_ERROR_INJECTION() whitelist, and BPF itself is off in many
>   production kernels. Even where on, the operator interface is
>   "load a verified BPF program," not a one-line write.

If it is OK for killswitch to attach to any kernel function, do we still
need ALLOW_ERROR_INJECTION() for fail_function and BPF
override? Should we instead also allow fail_function and BPF override
to attach to any kernel function?

Thanks,
Song

^ permalink raw reply

* Re: [PATCH] killswitch: add per-function short-circuit mitigation primitive
From: Song Liu @ 2026-05-18  6:31 UTC (permalink / raw)
  To: Paul Moore
  Cc: Sasha Levin, corbet, akpm, skhan, linux-doc, linux-kernel,
	linux-kselftest, gregkh, linux-security-module
In-Reply-To: <CAHC9VhTwDt2Bx8n0io9Qge_fUEnrHsxrFAQY+KaemKWqJqBQxw@mail.gmail.com>

On Thu, May 14, 2026 at 8:48 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Thu, May 7, 2026 at 3:05 AM Sasha Levin <sashal@kernel.org> wrote:
> >
> > When a (security) issue goes public, fleets stay exposed until a patched kernel
> > is built, distributed, and rebooted into.
> >
> > For many such issues the simplest mitigation is to stop calling the buggy
> > function. Killswitch provides that. An admin writes:
> >
> >     echo "engage af_alg_sendmsg -1" \
> >         > /sys/kernel/security/killswitch/control
> >
> > After this, af_alg_sendmsg() returns -EPERM on every call without
> > running its body. The mitigation takes effect immediately, and is dropped on
> > the next reboot.
> >
> > A lot of recent kernel issues sit in code paths most installs only have enabled
> > to support a relative minority of users: AF_ALG, ksmbd, nf_tables, vsock, ax25,
> > and friends.
> >
> > For most users, the cost of "this socket family stops working for the day" is
> > much smaller than the cost of running a known vulnerable kernel until the fix
> > land.
> >
> > Assisted-by: Claude:claude-opus-4-7
> > Signed-off-by: Sasha Levin <sashal@kernel.org>
> > ---
> >  Documentation/admin-guide/index.rst           |   1 +
> >  Documentation/admin-guide/killswitch.rst      | 159 ++++
> >  Documentation/admin-guide/tainted-kernels.rst |   8 +
> >  MAINTAINERS                                   |  11 +
> >  include/linux/killswitch.h                    |  19 +
> >  include/linux/panic.h                         |   3 +-
> >  init/Kconfig                                  |   2 +
> >  kernel/Kconfig.killswitch                     |  31 +
> >  kernel/Makefile                               |   1 +
> >  kernel/killswitch.c                           | 798 ++++++++++++++++++
> >  kernel/panic.c                                |   1 +
> >  lib/Kconfig.debug                             |  13 +
> >  lib/Makefile                                  |   1 +
> >  lib/test_killswitch.c                         |  85 ++
> >  tools/testing/selftests/Makefile              |   1 +
> >  tools/testing/selftests/killswitch/.gitignore |   1 +
> >  tools/testing/selftests/killswitch/Makefile   |   8 +
> >  .../selftests/killswitch/cve_31431_test.c     | 162 ++++
> >  .../selftests/killswitch/killswitch_test.sh   | 147 ++++
> >  19 files changed, 1451 insertions(+), 1 deletion(-)
> >  create mode 100644 Documentation/admin-guide/killswitch.rst
> >  create mode 100644 include/linux/killswitch.h
> >  create mode 100644 kernel/Kconfig.killswitch
> >  create mode 100644 kernel/killswitch.c
> >  create mode 100644 lib/test_killswitch.c
> >  create mode 100644 tools/testing/selftests/killswitch/.gitignore
> >  create mode 100644 tools/testing/selftests/killswitch/Makefile
> >  create mode 100644 tools/testing/selftests/killswitch/cve_31431_test.c
> >  create mode 100755 tools/testing/selftests/killswitch/killswitch_test.sh
>
> If we made Lockdown an LSM, we should probably also make killswitch an LSM.

I don't think killswitch can stack with other LSMs. In fact, killswitch
can be used to bypass other LSMs, for example:

echo engage security_file_open 0 > /sys/kernel/security/killswitch/control

will bypass all hooks on security_file_open.
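
As a toy illustration of why a forced return value of 0 amounts to a bypass,
here is a small userspace model. None of these names are kernel or killswitch
APIs: model_file_open(), deny_secret() and the hook table are hypothetical,
and the real dispatch logic behind security_file_open is not reproduced here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef int (*hook_fn)(const char *path);

/* Hypothetical LSM-style hook: deny one path, allow everything else. */
static int deny_secret(const char *path)
{
	return strcmp(path, "/secret") == 0 ? -EACCES : 0;
}

static const hook_fn hooks[] = { deny_secret };

/*
 * Models a hook dispatcher with a short-circuit in front of its body.
 * When engaged, the forced value is returned and no hook is consulted.
 */
int model_file_open(bool engaged, int forced_ret, const char *path)
{
	if (engaged)
		return forced_ret;

	for (size_t i = 0; i < sizeof(hooks) / sizeof(hooks[0]); i++) {
		int ret = hooks[i](path);

		if (ret)
			return ret;	/* first denial wins */
	}

	return 0;
}

/* Usage: the denial disappears once the short-circuit returns 0. */
void model_selfcheck(void)
{
	assert(model_file_open(false, 0, "/secret") == -EACCES);
	assert(model_file_open(true, 0, "/secret") == 0);
}
```

In this model, 0 means "allowed", so short-circuiting the dispatcher with 0
silently turns every denial into a permit.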

Thanks,
Song

> For the LSM crowd who might be seeing this for the first time, the
> original thread can be found on lore via the link below:
> https://lore.kernel.org/all/20260507070547.2268452-1-sashal@kernel.org
>
> --
> paul-moore.com
>

^ permalink raw reply
