Linux-HyperV List
* Re: [PATCH v4 17/47] x86/kvm: Mark TSC as reliable when it's constant and nonstop
From: David Woodhouse @ 2026-06-01 22:02 UTC (permalink / raw)
  To: seanjc
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, kas, kys, haiyangz,
	wei.liu, decui, longli, ajay.kaher, alexey.makhalov, jan.kiszka,
	luto, peterz, jgross, daniel.lezcano, jstultz, hpa,
	rick.p.edgecombe, vkuznets, bcm-kernel-feedback-list,
	boris.ostrovsky, sboyd, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, dwmw, thomas.lendacky,
	nikunj, dwmw2, mhklinux, tglx, sashiko-reviews
In-Reply-To: <ahnhnjvfIblFxTFX@google.com>

On Fri, 29 May 2026 11:57:34 -0700, Sean Christopherson wrote:
> On Fri, May 29, 2026, sashiko-bot@kernel.org wrote:
> > > diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> > > index 909d3e5e5bcd5..4fe9c69bf40b3 100644
> > > --- a/arch/x86/kernel/kvm.c
> > > +++ b/arch/x86/kernel/kvm.c
> > [ ... ]
> > > @@ -1040,7 +1041,20 @@ static void __init kvm_init_platform(void)
> > [ ... ]
> > > -	kvmclock_init();
> > > +        /*
> > > +         * If the TSC counts at a constant frequency across P/T states, counts
> > > +         * in deep C-states, and the TSC hasn't been marked unstable, treat the
> > > +         * TSC reliable, as guaranteed by KVM.  Note, the TSC unstable check
> > > +         * exists purely to honor the TSC being marked unstable via command
> > > +         * line, any runtime detection of an unstable will happen after this.
> > > +         */
> > > +	tsc_is_reliable = boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> > > +			  boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> > > +			  !check_tsc_unstable();
> > 
> > [Severity: High]
> > Does this evaluate check_tsc_unstable() too early to catch the command line
> > parameter?
> 
> Huh, it does indeed.
> 
> > It looks like kvm_init_platform() is called from setup_arch(), but the
> > tsc=unstable kernel parameter is parsed via __setup() later during
> > parse_args() in start_kernel().
> > 
> > If check_tsc_unstable() evaluates to 0 here because the parameter hasn't
> > been parsed yet, wouldn't it incorrectly force X86_FEATURE_TSC_RELIABLE
> > and set prefer_tsc to true?
> 
> Yep, but this is a pre-existing problem that goes all the way back to the original
> commit 7539b174aef4 ("x86: kvmguest: use TSC clocksource if invariant TSC is exposed").
> 
> We could try to fix that, but I'm _very_ strongly inclined to add (yet another)
> patch to simply drop the check_tsc_unstable() since it has always been dead code.

Yeah, kill it with fire.
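
For anyone skimming the thread, the ordering problem can be sketched in
plain userspace C. This is only an illustration, not kernel code: every
*_sim name below is a hypothetical stand-in, and the only thing modelled
is that the platform init step runs before the command line is parsed.

```c
#include <stdbool.h>
#include <string.h>

/* Stand-in for the kernel's "TSC marked unstable" state. */
static bool tsc_unstable;

static bool check_tsc_unstable_sim(void)
{
	return tsc_unstable;
}

/* Stand-in for the __setup() handler that parse_args() eventually calls. */
static void parse_tsc_param_sim(const char *arg)
{
	if (!strcmp(arg, "unstable"))
		tsc_unstable = true;
}

/* Stand-in for the check done in kvm_init_platform(), i.e. from setup_arch(). */
static bool platform_init_sim(bool constant, bool nonstop)
{
	return constant && nonstop && !check_tsc_unstable_sim();
}

/* Model one boot with "tsc=<arg>"; returns what the early check concluded. */
static bool boot_sim(const char *arg)
{
	bool reliable;

	tsc_unstable = false;
	reliable = platform_init_sim(true, true);	/* runs first */
	parse_tsc_param_sim(arg);			/* runs later */
	return reliable;
}
```

In this model boot_sim("unstable") still reports the TSC as reliable,
because the flag is only set after the check has already been made.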

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>



* Re: [PATCH v4 14/47] x86/kvmclock: Rename kvm_get_tsc_khz() to kvmclock_get_tsc_khz()
From: David Woodhouse @ 2026-06-01 21:53 UTC (permalink / raw)
  To: seanjc
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, kas, kys, haiyangz,
	wei.liu, decui, longli, ajay.kaher, alexey.makhalov, jan.kiszka,
	luto, peterz, jgross, daniel.lezcano, jstultz, hpa,
	rick.p.edgecombe, vkuznets, bcm-kernel-feedback-list,
	boris.ostrovsky, sboyd, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, dwmw, thomas.lendacky,
	nikunj, dwmw2, mhklinux, tglx
In-Reply-To: <20260529144435.704127-15-seanjc@google.com>

On Fri, 29 May 2026 07:44:01 -0700, Sean Christopherson wrote:
> Rename kvm_get_tsc_khz() to kvmclock_get_tsc_khz() in anticipation of
> adding support for getting TSC info from PV CPUID, i.e. in a KVM specific
> way, but without non-kvmclock.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>



* Re: [PATCH v4 12/47] x86/tsc: Rename pit_hpet_ptimer_calibrate_cpu() => native_calibrate_cpu_late()
From: David Woodhouse @ 2026-06-01 21:52 UTC (permalink / raw)
  To: seanjc
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, kas, kys, haiyangz,
	wei.liu, decui, longli, ajay.kaher, alexey.makhalov, jan.kiszka,
	luto, peterz, jgross, daniel.lezcano, jstultz, hpa,
	rick.p.edgecombe, vkuznets, bcm-kernel-feedback-list,
	boris.ostrovsky, sboyd, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, dwmw, thomas.lendacky,
	nikunj, dwmw2, mhklinux, tglx
In-Reply-To: <20260529144435.704127-13-seanjc@google.com>

On Fri, 29 May 2026 07:43:59 -0700, Sean Christopherson wrote:
> Rename the late CPU calibration routine so that its relationship to the
> early routine is more obvious and intuitive.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>



* Re: [PATCH v4 13/47] x86/tsc: Fold native_calibrate_cpu() into recalibrate_cpu_khz()
From: David Woodhouse @ 2026-06-01 21:52 UTC (permalink / raw)
  To: seanjc
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, kas, kys, haiyangz,
	wei.liu, decui, longli, ajay.kaher, alexey.makhalov, jan.kiszka,
	luto, peterz, jgross, daniel.lezcano, jstultz, hpa,
	rick.p.edgecombe, vkuznets, bcm-kernel-feedback-list,
	boris.ostrovsky, sboyd, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, dwmw, thomas.lendacky,
	nikunj, dwmw2, mhklinux, tglx
In-Reply-To: <20260529144435.704127-14-seanjc@google.com>

On Fri, 29 May 2026 07:44:00 -0700, Sean Christopherson wrote:
> Fold the guts of native_calibrate_cpu() into its sole remaining caller,
> recalibrate_cpu_khz() to eliminate the extra SMP=n #ifdef, and so that it's
> more obvious that directly invoking the early vs. late calibration routines
> in determine_cpu_tsc_frequencies() is intentional.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>



* Re: [PATCH v4 11/47] x86/tsc: Kill off x86_platform_ops.calibrate_{cpu,tsc}() hooks
From: David Woodhouse @ 2026-06-01 21:51 UTC (permalink / raw)
  To: seanjc
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, kas, kys, haiyangz,
	wei.liu, decui, longli, ajay.kaher, alexey.makhalov, jan.kiszka,
	luto, peterz, jgross, daniel.lezcano, jstultz, hpa,
	rick.p.edgecombe, vkuznets, bcm-kernel-feedback-list,
	boris.ostrovsky, sboyd, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, dwmw, thomas.lendacky,
	nikunj, dwmw2, mhklinux, tglx
In-Reply-To: <20260529144435.704127-12-seanjc@google.com>

On Fri, 29 May 2026 07:43:58 -0700, Sean Christopherson wrote:
> Now that getting the CPU and/or TSC frequencies from the hypervisor uses
> dedicated hooks, drop x86_platform_ops.calibrate_{cpu,tsc}() and instead
> directly invoke the correct helper at each phase of (re)calibration.  In
> addition to eliminating unnecessary code, this makes it a bit more obvious
> when the "late" path invokes pit_hpet_ptimer_calibrate_cpu() instead of
> x86_platform_ops.calibrate_cpu().
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>



* Re: [PATCH v4 8/47] x86/tsc: Add dedicated hypervisor hooks for getting known TSC/CPU frequencies
From: David Woodhouse @ 2026-06-01 21:49 UTC (permalink / raw)
  To: seanjc
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, kas, kys, haiyangz,
	wei.liu, decui, longli, ajay.kaher, alexey.makhalov, jan.kiszka,
	luto, peterz, jgross, daniel.lezcano, jstultz, hpa,
	rick.p.edgecombe, vkuznets, bcm-kernel-feedback-list,
	boris.ostrovsky, sboyd, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, dwmw, thomas.lendacky,
	nikunj, dwmw2, mhklinux, tglx
In-Reply-To: <20260529144435.704127-9-seanjc@google.com>

On Fri, 29 May 2026 07:43:55 -0700, Sean Christopherson wrote:
> Add dedicated hypervisor hooks for getting known TSC/CPU frequencies
> instead of overriding seemingly generic platform hooks, and explicitly
> prioritize hypervisor-provided frequencies over native methods, but do NOT
> clobber the frequency obtained from trusted firmware.  While shuffling the
> hooks around is arguably "six of one, half dozen of the other", scoping
> them to x86_hyper_init makes their purpose more obvious, and allows for
> explicitly defining the priority of sources (as is done here).
>
> Cc: David Woodhouse <dwmw2@infradead.org>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>



* Re: [PATCH v4 1/47] x86/tsc: Never re-calibrate TSC frequency if its exact timing is known
From: David Woodhouse @ 2026-06-01 21:46 UTC (permalink / raw)
  To: seanjc
  Cc: pbonzini, tglx, mingo, bp, dave.hansen, x86, kas, kys, haiyangz,
	wei.liu, decui, longli, ajay.kaher, alexey.makhalov, jan.kiszka,
	luto, peterz, jgross, daniel.lezcano, jstultz, hpa,
	rick.p.edgecombe, vkuznets, bcm-kernel-feedback-list,
	boris.ostrovsky, sboyd, kvm, linux-kernel, linux-coco,
	linux-hyperv, virtualization, xen-devel, dwmw, thomas.lendacky,
	nikunj, dwmw2, mhklinux, tglx
In-Reply-To: <20260529144435.704127-2-seanjc@google.com>

On Fri, 29 May 2026 07:43:48 -0700, Sean Christopherson wrote:
> Don't re-calibrate the TSC frequency if the TSC is known to run at a fixed
> frequency.  In practice, this is likely one big nop, as re-calibration is
> used only for SMP=n kernels, and only for hardware that is 20+ years old,
> i.e. is extremely unlikely to collide with TSC_KNOWN_FREQ.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>



* Re: [RFC PATCH 00/20] mshv: enable kexec with Hyper-V donated pages and partitions
From: Jork Loeser @ 2026-06-01 20:15 UTC (permalink / raw)
  To: Pasha Tatashin
  Cc: Mike Rapoport, linux-hyperv, linux-mm, kexec, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Pratyush Yadav,
	Alexander Graf, Jason Miu, Andrew Morton, David Hildenbrand,
	Muchun Song, Oscar Salvador, Baoquan He, Catalin Marinas,
	Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Kees Cook, Ran Xiaokai,
	Justinien Bouron, Sourabh Jain, Pingfan Liu, Rafael J. Wysocki,
	Mario Limonciello, linux-arm-kernel, x86, linux-kernel,
	Michael Kelley
In-Reply-To: <ah2eBxaBnVs_1j5n@google.com>



On Mon, 1 Jun 2026, Pasha Tatashin wrote:

> On 05-31 20:10, Mike Rapoport wrote:

>>>  - A freeze mechanism to lock the tree before serializing for kexec
>>>    (patch 13).
>>
>> There was a lot of effort to make KHO stateless and drop the requirement
>> for finalization/freeze.
>
> Yes, using KHO directly here is incorrect. The state machine is provided
> by LUO, so we should use LUO here. MSHV should provide a file that
> userspace adds to LUO, and all state machine management would be the
> same as for all other clients participating in LU.

The thing is, there is no file handle to rely on. Even once partitions are 
all removed, Hyper-V might hang onto pages (and won't return them even if 
asked). However, these pages very much must be excluded from Linux 
post-kexec, or the system will crash. We cannot rely on UM to ensure 
integrity of memory management.

Contrast that to standard LUO use: If you drop individual file handles, or 
even skip the LUO phase entirely, the worst that will happen is that the 
objects will be gone post-kexec. The MM itself will still be consistent. 
For MSHV & page donation, this is different.

(And yes, partition preservation will very much tie into LUO)

Best,
Jork



* Re: [RFC PATCH 00/20] mshv: enable kexec with Hyper-V donated pages and partitions
From: Jork Loeser @ 2026-06-01 20:09 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-hyperv, linux-mm, kexec, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Dexuan Cui, Long Li, Pasha Tatashin, Pratyush Yadav,
	Alexander Graf, Jason Miu, Andrew Morton, David Hildenbrand,
	Muchun Song, Oscar Salvador, Baoquan He, Catalin Marinas,
	Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H. Peter Anvin, Kees Cook, Ran Xiaokai,
	Justinien Bouron, Sourabh Jain, Pingfan Liu, Rafael J. Wysocki,
	Mario Limonciello, linux-arm-kernel, x86, linux-kernel,
	Michael Kelley
In-Reply-To: <ahxrc4pTvVU20RTX@kernel.org>



On Sun, 31 May 2026, Mike Rapoport wrote:

> Hi Jork,

>>  - A freeze mechanism to lock the tree before serializing for kexec
>>    (patch 13).
>
> There was a lot of effort to make KHO stateless and drop the requirement
> for finalization/freeze.
>
> Why is it necessary to add a freeze mechanism to kho_radix_tree?
> If it's a hard requirement of mshv maybe the freeze part should be handled
> there?

Good feedback. It's a safety-net so we do not accidentally donate pages 
without being able to track them. Thought it might be a good generic 
feature. Let me keep it in the MSHV driver.

>> Patch 13:      Radix tree freeze and del_key() error reporting
>
> del_key() error reporting sounds like something we'd want to avoid.
> del_key() is called on "freeing" path and during error handling, it would
> be hard if at all possible to deal with errors from del_key().

I hear you. Stating "yeah, it can only really fail if the key isn't there, 
or it's frozen, but not due to other things, so don't bother to check the 
return code if you are sure" is an odd contract. With the freeze-logic 
moving into MSHV, will revert to no-error.

>> Patch 19:      Export kexec_in_progress for modules
>
> Isn't there another way to differentiate kexec reboot?

I could not find one, unfortunately.

> Sincerely yours,
> Mike.

Best,
Jork


* RE: [PATCH net-next] net: mana: Add Interrupt Moderation support
From: Haiyang Zhang @ 2026-06-01 16:19 UTC (permalink / raw)
  To: Jagielski, Jedrzej, Haiyang Zhang, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org, KY Srinivasan, Wei Liu, Dexuan Cui,
	Long Li, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman,
	Shradha Gupta, Erni Sri Satya Vennela, Dipayaan Roy, Aditya Garg,
	Kees Cook, Breno Leitao, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org
  Cc: Paul Rosswurm
In-Reply-To: <PH0PR11MB590230F0CEBE8B1DEAA15F82F0152@PH0PR11MB5902.namprd11.prod.outlook.com>



> -----Original Message-----
> From: Jagielski, Jedrzej <jedrzej.jagielski@intel.com>
> Sent: Monday, June 1, 2026 5:39 AM
> To: Haiyang Zhang <haiyangz@linux.microsoft.com>; linux-
> hyperv@vger.kernel.org; netdev@vger.kernel.org; KY Srinivasan
> <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>; Wei Liu
> <wei.liu@kernel.org>; Dexuan Cui <DECUI@microsoft.com>; Long Li
> <longli@microsoft.com>; Andrew Lunn <andrew+netdev@lunn.ch>; David S.
> Miller <davem@davemloft.net>; Eric Dumazet <edumazet@google.com>; Jakub
> Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; Konstantin
> Taranov <kotaranov@microsoft.com>; Simon Horman <horms@kernel.org>;
> Shradha Gupta <shradhagupta@linux.microsoft.com>; Erni Sri Satya Vennela
> <ernis@linux.microsoft.com>; Dipayaan Roy
> <dipayanroy@linux.microsoft.com>; Aditya Garg
> <gargaditya@linux.microsoft.com>; Kees Cook <kees@kernel.org>; Breno
> Leitao <leitao@debian.org>; linux-kernel@vger.kernel.org; linux-
> rdma@vger.kernel.org
> Cc: Paul Rosswurm <paulros@microsoft.com>
> Subject: [EXTERNAL] RE: [PATCH net-next] net: mana: Add Interrupt
> Moderation support
> 
> From: Haiyang Zhang <haiyangz@linux.microsoft.com>
> Sent: Saturday, May 30, 2026 9:50 PM
> 
> >From: Haiyang Zhang <haiyangz@microsoft.com>
> >
> >Add Static and Dynamic Interrupt Moderation (DIM) support for
> >Rx and Tx.
> >Update queue creation procedure with new data struct with the related
> >settings.
> >Add functions to collect stat for DIM, and workers to update DIM data
> >and settings.
> >Update ethtool handler to get/set the moderation settings from a user.
> >By default, adaptive-rx/tx (DIM) are enabled.
> >
> >Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> >---
> > drivers/net/ethernet/microsoft/Kconfig        |   1 +
> > .../net/ethernet/microsoft/mana/gdma_main.c   |  27 ++++
> > drivers/net/ethernet/microsoft/mana/mana_en.c | 101 ++++++++++++++-
> > .../ethernet/microsoft/mana/mana_ethtool.c    | 120 +++++++++++++++++-
> > include/net/mana/gdma.h                       |  24 +++-
> > include/net/mana/mana.h                       |  42 ++++++
> > 6 files changed, 309 insertions(+), 6 deletions(-)
> >
> >diff --git a/drivers/net/ethernet/microsoft/Kconfig b/drivers/net/ethernet/microsoft/Kconfig
> >index 3f36ee6a8ece..e9be18c92ca5 100644
> >--- a/drivers/net/ethernet/microsoft/Kconfig
> >+++ b/drivers/net/ethernet/microsoft/Kconfig
> >@@ -21,6 +21,7 @@ config MICROSOFT_MANA
> >       depends on X86_64 || (ARM64 && !CPU_BIG_ENDIAN)
> >       depends on PCI_HYPERV
> >       select AUXILIARY_BUS
> >+      select DIMLIB
> >       select PAGE_POOL
> >       select NET_SHAPER
> >       help
> >diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> >index 712a0881d720..5aa0ea794a00 100644
> >--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> >+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> >@@ -405,6 +405,7 @@ static int mana_gd_disable_queue(struct gdma_queue *queue)
> > #define DOORBELL_OFFSET_RQ    0x400
> > #define DOORBELL_OFFSET_CQ    0x800
> > #define DOORBELL_OFFSET_EQ    0xFF8
> >+#define DOORBELL_OFFSET_DIM   0x820
> >
> > static void mana_gd_ring_doorbell(struct gdma_context *gc, u32 db_index,
> >                                 enum gdma_queue_type q_type, u32 qid,
> >@@ -445,6 +446,16 @@ static void mana_gd_ring_doorbell(struct gdma_context *gc, u32 db_index,
> >               addr += DOORBELL_OFFSET_SQ;
> >               break;
> >
> >+      case GDMA_DIM:
> >+              e.dim.id = qid;
> >+              e.dim.mod_usec = tail_ptr;
> >+              e.dim.mod_usec_vld = tail_ptr >> 15;
> >+              e.dim.mod_comps = tail_ptr >> 16;
> 
> Please use defines instead of magic numbers.
Will do.

> 
> >+              e.dim.mod_comps_vld = num_req;
> >+
> >+              addr += DOORBELL_OFFSET_DIM;
> >+              break;
> >+
> >       default:
> >               WARN_ON(1);
> >               return;
> >@@ -479,6 +490,22 @@ void mana_gd_ring_cq(struct gdma_queue *cq, u8 arm_bit)
> > }
> > EXPORT_SYMBOL_NS(mana_gd_ring_cq, "NET_MANA");
> >
> >+void mana_gd_ring_dim(struct gdma_queue *cq, u32 mod_usec, bool mod_usec_vld,
> >+                    u32 mod_comps, bool mod_comps_vld)
> >+{
> >+      struct gdma_context *gc = cq->gdma_dev->gdma_context;
> >+      u32 dim_val;
> >+
> >+      /* Convert the DIM values to doorbell parameters */
> >+      dim_val = (mod_usec & MANA_INTR_MODR_USEC_MAX) |
> >+                (((u32)mod_usec_vld & 1) << 15) |
> >+                ((mod_comps & MANA_INTR_MODR_COMP_MAX) << 16);
> 
> I believe FIELD_PREP is preferable in such cases.
Will do.
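
A rough userspace sketch of the direction, combining both comments above.
The SKETCH_* macros are local stand-ins for the kernel's GENMASK(),
FIELD_PREP() and FIELD_GET() so the snippet builds outside the kernel, and
the DIM_* field names and widths are assumptions read off the shifts in
this patch (usec in bits 14:0, valid in bit 15, comps from bit 16 up), not
the final definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel's GENMASK()/FIELD_PREP()/FIELD_GET(). */
#define SKETCH_GENMASK(h, l) \
	((~UINT32_C(0) >> (31 - (h))) & (~UINT32_C(0) << (l)))
#define SKETCH_LSB(mask)		((mask) & (~(mask) + 1))
#define SKETCH_FIELD_PREP(mask, val) \
	(((uint32_t)(val) * SKETCH_LSB(mask)) & (mask))
#define SKETCH_FIELD_GET(mask, reg) \
	(((uint32_t)(reg) & (mask)) / SKETCH_LSB(mask))

/* Hypothetical field layout of the DIM doorbell value. */
#define DIM_MOD_USEC		SKETCH_GENMASK(14, 0)
#define DIM_MOD_USEC_VLD	(UINT32_C(1) << 15)
#define DIM_MOD_COMPS		SKETCH_GENMASK(31, 16)

/* Pack the moderation settings without open-coded shifts. */
static uint32_t dim_pack(uint32_t usec, bool usec_vld, uint32_t comps)
{
	return SKETCH_FIELD_PREP(DIM_MOD_USEC, usec) |
	       SKETCH_FIELD_PREP(DIM_MOD_USEC_VLD, usec_vld) |
	       SKETCH_FIELD_PREP(DIM_MOD_COMPS, comps);
}
```

The same masks would then serve on the unpacking side in
mana_gd_ring_doorbell() via FIELD_GET(), which keeps the two ends of the
encoding in sync.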

> 
> >+
> >+      mana_gd_ring_doorbell(gc, cq->gdma_dev->doorbell, GDMA_DIM, cq->id,
> >+                            dim_val, (u8)mod_comps_vld & 1);
> >+}
> >+EXPORT_SYMBOL_NS(mana_gd_ring_dim, "NET_MANA");
> >+
> > #define MANA_SERVICE_PERIOD 10
> >
> > static void mana_serv_rescan(struct pci_dev *pdev)
> >diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> >index 82f1461a48e9..f1a16f8aca66 100644
> >--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> >+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> >@@ -1551,6 +1551,9 @@ int mana_create_wq_obj(struct mana_port_context *apc,
> >
> >       mana_gd_init_req_hdr(&req.hdr, MANA_CREATE_WQ_OBJ,
> >                            sizeof(req), sizeof(resp));
> >+
> >+      req.hdr.req.msg_version = GDMA_MESSAGE_V3;
> >+      req.hdr.resp.msg_version = GDMA_MESSAGE_V2;
> >       req.vport = vport;
> >       req.wq_type = wq_type;
> >       req.wq_gdma_region = wq_spec->gdma_region;
> >@@ -1559,6 +1562,9 @@ int mana_create_wq_obj(struct mana_port_context *apc,
> >       req.cq_size = cq_spec->queue_size;
> >       req.cq_moderation_ctx_id = cq_spec->modr_ctx_id;
> >       req.cq_parent_qid = cq_spec->attached_eq;
> >+      req.req_cq_moderation = cq_spec->req_cq_moderation;
> >+      req.cq_moderation_comp = cq_spec->cq_moderation_comp;
> >+      req.cq_moderation_usec = cq_spec->cq_moderation_usec;
> >
> >       err = mana_send_request(apc->ac, &req, sizeof(req), &resp,
> >                               sizeof(resp));
> >@@ -2253,6 +2259,66 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
> >               xdp_do_flush();
> > }
> >
> >+static void mana_rx_dim_work(struct work_struct *work)
> >+{
> >+      struct dim *dim = container_of(work, struct dim, work);
> >+      struct mana_cq *cq = container_of(dim, struct mana_cq, dim);
> >+      struct dim_cq_moder cur_moder =
> >+              net_dim_get_rx_moderation(dim->mode, dim->profile_ix);
> 
> RCT (reverse Christmas tree ordering); here and in the following functions.
Will update this and below.

> 
> >+
> >+      cur_moder.usec = min_t(u16, cur_moder.usec, MANA_INTR_MODR_USEC_MAX);
> >+      cur_moder.pkts = min_t(u16, cur_moder.pkts, MANA_INTR_MODR_COMP_MAX);
> >+
> >+      mana_gd_ring_dim(cq->gdma_cq, cur_moder.usec, true,
> >+                       cur_moder.pkts, true);
> >+
> >+      dim->state = DIM_START_MEASURE;
> >+}
> >+
> >+static void mana_tx_dim_work(struct work_struct *work)
> >+{
> >+      struct dim *dim = container_of(work, struct dim, work);
> >+      struct mana_cq *cq = container_of(dim, struct mana_cq, dim);
> >+      struct dim_cq_moder cur_moder =
> >+              net_dim_get_tx_moderation(dim->mode, dim->profile_ix);
> >+
> >+      cur_moder.usec = min_t(u16, cur_moder.usec, MANA_INTR_MODR_USEC_MAX);
> >+      cur_moder.pkts = min_t(u16, cur_moder.pkts, MANA_INTR_MODR_COMP_MAX);
> >+
> >+      mana_gd_ring_dim(cq->gdma_cq, cur_moder.usec, true,
> >+                       cur_moder.pkts, true);
> >+
> >+      dim->state = DIM_START_MEASURE;
> >+}
> >+
> >+static void mana_update_rx_dim(struct mana_cq *cq)
> >+{
> >+      struct mana_rxq *rxq = cq->rxq;
> >+      struct mana_port_context *apc = netdev_priv(rxq->ndev);
> >+      struct dim_sample dim_sample = {};
> >+
> >+      if (!apc->rx_dim_enabled)
> >+              return;
> >+
> >+      dim_update_sample(READ_ONCE(cq->dim_event_ctr), rxq->stats.packets,
> >+                        rxq->stats.bytes, &dim_sample);
> >+      net_dim(&cq->dim, &dim_sample);
> >+}
> >+
> >+static void mana_update_tx_dim(struct mana_cq *cq)
> >+{
> >+      struct mana_txq *txq = cq->txq;
> >+      struct mana_port_context *apc = netdev_priv(txq->ndev);
> >+      struct dim_sample dim_sample = {};
> >+
> >+      if (!apc->tx_dim_enabled)
> >+              return;
> >+
> >+      dim_update_sample(READ_ONCE(cq->dim_event_ctr), txq->stats.packets,
> >+                        txq->stats.bytes, &dim_sample);
> >+      net_dim(&cq->dim, &dim_sample);
> >+}
> >+
> > static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
> > {
> >       struct mana_cq *cq = context;
> >@@ -2271,7 +2337,13 @@ static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
> >       if (w < cq->budget) {
> >               mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
> >               cq->work_done_since_doorbell = 0;
> >-              napi_complete_done(&cq->napi, w);
> >+
> >+              if (napi_complete_done(&cq->napi, w)) {
> >+                      if (cq->type == MANA_CQ_TYPE_RX)
> >+                              mana_update_rx_dim(cq);
> >+                      else
> >+                              mana_update_tx_dim(cq);
> >+              }
> >       } else if (cq->work_done_since_doorbell >=
> >                  (cq->gdma_cq->queue_size / COMP_ENTRY_SIZE) * 4) {
> >               /* MANA hardware requires at least one doorbell ring every 8
> >@@ -2303,6 +2375,7 @@ static void mana_schedule_napi(void *context, struct gdma_queue *gdma_queue)
> > {
> >       struct mana_cq *cq = context;
> >
> >+      WRITE_ONCE(cq->dim_event_ctr, cq->dim_event_ctr + 1);
> >       napi_schedule_irqoff(&cq->napi);
> > }
> >
> >@@ -2345,6 +2418,7 @@ static void mana_destroy_txq(struct mana_port_context *apc)
> >               if (apc->tx_qp[i]->txq.napi_initialized) {
> >                       napi_synchronize(napi);
> >                       napi_disable_locked(napi);
> >+                      cancel_work_sync(&apc->tx_qp[i]->tx_cq.dim.work);
> >                       netif_napi_del_locked(napi);
> >                       apc->tx_qp[i]->txq.napi_initialized = false;
> >               }
> >@@ -2475,6 +2549,10 @@ static int mana_create_txq(struct mana_port_context *apc,
> >               cq_spec.queue_size = cq->gdma_cq->queue_size;
> >               cq_spec.modr_ctx_id = 0;
> >               cq_spec.attached_eq = cq->gdma_cq->cq.parent->id;
> >+              cq_spec.req_cq_moderation = apc->tx_dim_enabled ||
> >+                      (apc->intr_modr_tx_usec && apc->intr_modr_tx_comp);
> >+              cq_spec.cq_moderation_usec = apc->intr_modr_tx_usec;
> >+              cq_spec.cq_moderation_comp = apc->intr_modr_tx_comp;
> >
> >               err = mana_create_wq_obj(apc, apc->port_handle, GDMA_SQ,
> >                                        &wq_spec, &cq_spec,
> >@@ -2509,6 +2587,9 @@ static int mana_create_txq(struct mana_port_context *apc,
> >               napi_enable_locked(&cq->napi);
> >               txq->napi_initialized = true;
> >
> >+              INIT_WORK(&cq->dim.work, mana_tx_dim_work);
> >+              cq->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
> >+
> >               mana_gd_ring_cq(cq->gdma_cq, SET_ARM_BIT);
> >       }
> >
> >@@ -2543,6 +2624,7 @@ static void mana_destroy_rxq(struct mana_port_context *apc,
> >               napi_synchronize(napi);
> >
> >               napi_disable_locked(napi);
> >+              cancel_work_sync(&rxq->rx_cq.dim.work);
> >               netif_napi_del_locked(napi);
> >       }
> >
> >@@ -2780,6 +2862,10 @@ static struct mana_rxq *mana_create_rxq(struct mana_port_context *apc,
> >       cq_spec.queue_size = cq->gdma_cq->queue_size;
> >       cq_spec.modr_ctx_id = 0;
> >       cq_spec.attached_eq = cq->gdma_cq->cq.parent->id;
> >+      cq_spec.req_cq_moderation = apc->rx_dim_enabled ||
> >+              (apc->intr_modr_rx_usec && apc->intr_modr_rx_comp);
> >+      cq_spec.cq_moderation_usec = apc->intr_modr_rx_usec;
> >+      cq_spec.cq_moderation_comp = apc->intr_modr_rx_comp;
> >
> >       err = mana_create_wq_obj(apc, apc->port_handle, GDMA_RQ,
> >                                &wq_spec, &cq_spec, &rxq->rxobj);
> >@@ -2815,6 +2901,9 @@ static struct mana_rxq *mana_create_rxq(struct mana_port_context *apc,
> >
> >       napi_enable_locked(&cq->napi);
> >
> >+      INIT_WORK(&cq->dim.work, mana_rx_dim_work);
> >+      cq->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
> >+
> >       mana_gd_ring_cq(cq->gdma_cq, SET_ARM_BIT);
> > out:
> >       if (!err)
> >@@ -3432,6 +3521,16 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
> >       apc->port_idx = port_idx;
> >       apc->cqe_coalescing_enable = 0;
> >
> >+      /* Initialize interrupt moderation settings if supported by HW */
> >+      if (gc->pf_cap_flags1 & GDMA_PF_CAP_FLAG_1_DYN_INTERRUPT_MODERATION) {
> >+              apc->intr_modr_rx_usec = MANA_INTR_MODR_USEC_DEF;
> >+              apc->intr_modr_rx_comp = MANA_INTR_MODR_COMP_DEF;
> >+              apc->intr_modr_tx_usec = MANA_INTR_MODR_USEC_DEF;
> >+              apc->intr_modr_tx_comp = MANA_INTR_MODR_COMP_DEF;
> >+              apc->rx_dim_enabled = MANA_ADAPTIVE_RX_DEF;
> >+              apc->tx_dim_enabled = MANA_ADAPTIVE_TX_DEF;
> >+      }
> >+
> >       mutex_init(&apc->vport_mutex);
> >       apc->vport_use_count = 0;
> >
> >diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> >index 04350973e19e..a90216eba794 100644
> >--- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> >+++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
> >@@ -419,6 +419,15 @@ static int mana_get_coalesce(struct net_device *ndev,
> >           !kernel_coal->rx_cqe_nsecs)
> >               kernel_coal->rx_cqe_nsecs = MANA_RX_CQE_NSEC_DEF;
> >
> >+      ec->rx_coalesce_usecs = apc->intr_modr_rx_usec;
> >+      ec->rx_max_coalesced_frames = apc->intr_modr_rx_comp;
> >+
> >+      ec->tx_coalesce_usecs = apc->intr_modr_tx_usec;
> >+      ec->tx_max_coalesced_frames = apc->intr_modr_tx_comp;
> >+
> >+      ec->use_adaptive_rx_coalesce = apc->rx_dim_enabled;
> >+      ec->use_adaptive_tx_coalesce = apc->tx_dim_enabled;
> >+
> >       return 0;
> > }
> >
> >@@ -429,8 +438,28 @@ static int mana_set_coalesce(struct net_device *ndev,
> > {
> >       struct mana_port_context *apc = netdev_priv(ndev);
> >       u8 saved_cqe_coalescing_enable;
> >+      u16 old_rx_usec, old_rx_comp;
> >+      u16 old_tx_usec, old_tx_comp;
> >+      bool old_rx_dim, old_tx_dim;
> 
> how about using some sort of struct instead of declaring a number
> of params for bookkeeping? imho would be cleaner
Will consider this.
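
One possible shape for that, sketched in userspace C so it can be compiled
on its own. All names here (modr_cfg_sketch, port_ctx_sketch and the modr_*
helpers) are hypothetical; only the six settings themselves come from this
patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical grouping of the six settings that set_coalesce() saves. */
struct modr_cfg_sketch {
	uint16_t rx_usec;
	uint16_t rx_comp;
	uint16_t tx_usec;
	uint16_t tx_comp;
	bool rx_dim;
	bool tx_dim;
};

/* Stand-in for the relevant part of struct mana_port_context. */
struct port_ctx_sketch {
	struct modr_cfg_sketch modr;
};

/* True if any static moderation value differs between the two configs. */
static bool modr_values_changed(const struct modr_cfg_sketch *a,
				const struct modr_cfg_sketch *b)
{
	return a->rx_usec != b->rx_usec || a->rx_comp != b->rx_comp ||
	       a->tx_usec != b->tx_usec || a->tx_comp != b->tx_comp;
}

/* True if either adaptive (DIM) switch differs between the two configs. */
static bool modr_dim_changed(const struct modr_cfg_sketch *a,
			     const struct modr_cfg_sketch *b)
{
	return a->rx_dim != b->rx_dim || a->tx_dim != b->tx_dim;
}

/* Install the new config and hand back the old one for a later rollback. */
static struct modr_cfg_sketch modr_apply(struct port_ctx_sketch *apc,
					 const struct modr_cfg_sketch *new_cfg)
{
	struct modr_cfg_sketch old = apc->modr;

	apc->modr = *new_cfg;
	return old;
}

/* Roll back to a previously saved config, e.g. on an error path. */
static void modr_restore(struct port_ctx_sketch *apc,
			 const struct modr_cfg_sketch *old)
{
	apc->modr = *old;
}
```

With something like this, the six old_* locals collapse into one saved
struct, and the rollback becomes a single assignment.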

> 
> >+      bool modr_changed = false;
> >+      bool dim_changed = false;
> >+      struct gdma_context *gc;
> >       int err;
> >
> >+      gc = apc->ac->gdma_dev->gdma_context;
> >+
> >+      /* Both static and dynamic interrupt moderation (DIM) rely on the
> >+       * same HW capability advertised by the PF.
> >+       */
> >+      if ((ec->use_adaptive_rx_coalesce || ec->use_adaptive_tx_coalesce ||
> >+           ec->rx_coalesce_usecs || ec->tx_coalesce_usecs ||
> >+           ec->rx_max_coalesced_frames || ec->tx_max_coalesced_frames) &&
> >+          !(gc->pf_cap_flags1 & GDMA_PF_CAP_FLAG_1_DYN_INTERRUPT_MODERATION)) {
> >+              NL_SET_ERR_MSG(extack,
> >+                             "Interrupt Moderation is not supported by HW");
> >+              return -EOPNOTSUPP;
> >+      }
> >+
> >       if (kernel_coal->rx_cqe_frames != 1 &&
> >           kernel_coal->rx_cqe_frames != MANA_RXCOMP_OOB_NUM_PPI) {
> >               NL_SET_ERR_MSG_FMT(extack,
> >@@ -440,6 +469,47 @@ static int mana_set_coalesce(struct net_device *ndev,
> >               return -EINVAL;
> >       }
> >
> >+      if (ec->rx_coalesce_usecs > MANA_INTR_MODR_USEC_MAX ||
> >+          ec->tx_coalesce_usecs > MANA_INTR_MODR_USEC_MAX) {
> >+              NL_SET_ERR_MSG_FMT(extack,
> >+                                 "coalesce usecs must be <= %u",
> >+                                 MANA_INTR_MODR_USEC_MAX);
> >+              return -EINVAL;
> >+      }
> >+
> >+      if (ec->rx_max_coalesced_frames > MANA_INTR_MODR_COMP_MAX ||
> >+          ec->tx_max_coalesced_frames > MANA_INTR_MODR_COMP_MAX) {
> >+              NL_SET_ERR_MSG_FMT(extack,
> >+                                 "coalesce frames must be <= %u",
> >+                                 MANA_INTR_MODR_COMP_MAX);
> >+              return -EINVAL;
> >+      }
> >+
> >+      if (ec->rx_coalesce_usecs != apc->intr_modr_rx_usec ||
> >+          ec->rx_max_coalesced_frames != apc->intr_modr_rx_comp ||
> >+          ec->tx_coalesce_usecs != apc->intr_modr_tx_usec ||
> >+          ec->tx_max_coalesced_frames != apc->intr_modr_tx_comp)
> >+              modr_changed = true;
> >+
> >+      old_rx_usec = apc->intr_modr_rx_usec;
> >+      old_rx_comp = apc->intr_modr_rx_comp;
> >+      old_tx_usec = apc->intr_modr_tx_usec;
> >+      old_tx_comp = apc->intr_modr_tx_comp;
> >+
> >+      apc->intr_modr_rx_usec = ec->rx_coalesce_usecs;
> >+      apc->intr_modr_rx_comp = ec->rx_max_coalesced_frames;
> >+      apc->intr_modr_tx_usec = ec->tx_coalesce_usecs;
> >+      apc->intr_modr_tx_comp = ec->tx_max_coalesced_frames;
> >+
> >+      if (!!ec->use_adaptive_rx_coalesce != apc->rx_dim_enabled ||
> >+          !!ec->use_adaptive_tx_coalesce != apc->tx_dim_enabled)
> >+              dim_changed = true;
> >+
> >+      old_rx_dim = apc->rx_dim_enabled;
> >+      old_tx_dim = apc->tx_dim_enabled;
> >+      apc->rx_dim_enabled = !!ec->use_adaptive_rx_coalesce;
> >+      apc->tx_dim_enabled = !!ec->use_adaptive_tx_coalesce;
> >+
> >       saved_cqe_coalescing_enable = apc->cqe_coalescing_enable;
> >       apc->cqe_coalescing_enable =
> >               kernel_coal->rx_cqe_frames == MANA_RXCOMP_OOB_NUM_PPI;
> >@@ -447,10 +517,46 @@ static int mana_set_coalesce(struct net_device
> *ndev,
> >       if (!apc->port_is_up)
> >               return 0;
> >
> >-      err = mana_config_rss(apc, TRI_STATE_TRUE, false, false);
> >-      if (err)
> >-              apc->cqe_coalescing_enable = saved_cqe_coalescing_enable;
> >+      if (apc->cqe_coalescing_enable != saved_cqe_coalescing_enable &&
> >+          !modr_changed && !dim_changed) {
> >+              /* If only CQE coalescing setting is changed, we can just
> update
> >+               * RSS configuration.
> >+               */
> >+              err = mana_config_rss(apc, TRI_STATE_TRUE, false, false);
> >+              if (err) {
> >+                      netdev_err(ndev, "Change CQE coalescing
> failed: %d\n",
> >+                                 err);
> >+                      apc->cqe_coalescing_enable =
> >+                              saved_cqe_coalescing_enable;
> >+                      return err;
> >+              }
> >+              return 0;
> >+      }
> >+
> >+      if (modr_changed || dim_changed) {
> >+              err = mana_detach(ndev, false);
> >+              if (err) {
> >+                      netdev_err(ndev, "mana_detach failed: %d\n", err);
> >+                      goto restore_modr;
> >+              }
> >+
> >+              err = mana_attach(ndev);
> >+              if (err) {
> >+                      netdev_err(ndev, "mana_attach failed: %d\n", err);
> >+                      goto restore_modr;
> 
> i see there is already such a pattern in the mana code; how about
> creating a helper?
We are planning to update this pattern, so I kept this part of the code
consistent with the other functions. We will refactor them in a separate
patch set.

> 
> >+              }
> >+      }
> >+
> >+      return 0;
> >
> >+restore_modr:
> >+      apc->cqe_coalescing_enable = saved_cqe_coalescing_enable;
> >+      apc->intr_modr_rx_usec = old_rx_usec;
> >+      apc->intr_modr_rx_comp = old_rx_comp;
> >+      apc->intr_modr_tx_usec = old_tx_usec;
> >+      apc->intr_modr_tx_comp = old_tx_comp;
> >+      apc->rx_dim_enabled = old_rx_dim;
> >+      apc->tx_dim_enabled = old_tx_dim;
> >       return err;
> > }
> >
> >@@ -574,7 +680,13 @@ static int mana_get_link_ksettings(struct net_device
> *ndev,
> > }
> >
> > const struct ethtool_ops mana_ethtool_ops = {
> >-      .supported_coalesce_params = ETHTOOL_COALESCE_RX_CQE_FRAMES,
> >+      .supported_coalesce_params = ETHTOOL_COALESCE_RX_CQE_FRAMES |
> >+                                  ETHTOOL_COALESCE_RX_USECS |
> >+                                  ETHTOOL_COALESCE_RX_MAX_FRAMES |
> >+                                  ETHTOOL_COALESCE_TX_USECS |
> >+                                  ETHTOOL_COALESCE_TX_MAX_FRAMES |
> >+                                  ETHTOOL_COALESCE_USE_ADAPTIVE_RX |
> >+                                  ETHTOOL_COALESCE_USE_ADAPTIVE_TX,
> >       .get_ethtool_stats      = mana_get_ethtool_stats,
> >       .get_sset_count         = mana_get_sset_count,
> >       .get_strings            = mana_get_strings,
> >diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
> >index 70d62bc32837..0a0cc7b080d3 100644
> >--- a/include/net/mana/gdma.h
> >+++ b/include/net/mana/gdma.h
> >@@ -47,6 +47,7 @@ enum gdma_queue_type {
> >       GDMA_RQ,
> >       GDMA_CQ,
> >       GDMA_EQ,
> >+      GDMA_DIM,
> > };
> >
> > enum gdma_work_request_flags {
> >@@ -126,6 +127,17 @@ union gdma_doorbell_entry {
> >               u64 tail_ptr    : 31;
> >               u64 arm         : 1;
> >       } eq;
> >+
> >+      struct {
> >+              u64 id           : 24;
> >+              u64 reserved     : 8;
> >+              u64 mod_usec     : 10;
> >+              u64 reserve1     : 5;
> >+              u64 mod_usec_vld : 1;
> >+              u64 mod_comps    : 8;
> >+              u64 reserve2     : 7;
> >+              u64 mod_comps_vld: 1;
> >+      } dim;
> > }; /* HW DATA */
> >
> > struct gdma_msg_hdr {
> >@@ -484,6 +496,9 @@ void mana_gd_ring_cq(struct gdma_queue *cq, u8
> arm_bit);
> >
> > int mana_schedule_serv_work(struct gdma_context *gc, enum gdma_eqe_type
> type);
> >
> >+void mana_gd_ring_dim(struct gdma_queue *cq, u32 mod_usec, bool
> mod_usec_vld,
> >+                    u32 mod_comps, bool mod_comps_vld);
> >+
> > struct gdma_wqe {
> >       u32 reserved    :24;
> >       u32 last_vbytes :8;
> >@@ -629,6 +644,9 @@ enum {
> > /* Driver supports self recovery on Hardware Channel timeouts */
> > #define GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECOVERY BIT(25)
> >
> >+/* Driver supports dynamic interrupt moderation - DIM */
> >+#define GDMA_DRV_CAP_FLAG_1_DYN_INTERRUPT_MODERATION BIT(27)
> >+
> > #define GDMA_DRV_CAP_FLAGS1 \
> >       (GDMA_DRV_CAP_FLAG_1_EQ_SHARING_MULTI_VPORT | \
> >        GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX | \
> >@@ -643,7 +661,8 @@ enum {
> >        GDMA_DRV_CAP_FLAG_1_SKB_LINEARIZE | \
> >        GDMA_DRV_CAP_FLAG_1_PROBE_RECOVERY | \
> >        GDMA_DRV_CAP_FLAG_1_HANDLE_STALL_SQ_RECOVERY | \
> >-       GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECOVERY)
> >+       GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECOVERY | \
> >+       GDMA_DRV_CAP_FLAG_1_DYN_INTERRUPT_MODERATION)
> >
> > #define GDMA_DRV_CAP_FLAGS2 0
> >
> >@@ -679,6 +698,9 @@ struct gdma_verify_ver_req {
> >       u8 os_ver_str4[128];
> > }; /* HW DATA */
> >
> >+/* HW supports dynamic interrupt moderation - DIM */
> >+#define GDMA_PF_CAP_FLAG_1_DYN_INTERRUPT_MODERATION BIT(15)
> >+
> > struct gdma_verify_ver_resp {
> >       struct gdma_resp_hdr hdr;
> >       u64 gdma_protocol_ver;
> >diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
> >index d9c27310fd04..57868a79f23d 100644
> >--- a/include/net/mana/mana.h
> >+++ b/include/net/mana/mana.h
> >@@ -4,6 +4,7 @@
> > #ifndef _MANA_H
> > #define _MANA_H
> >
> >+#include <linux/dim.h>
> > #include <net/xdp.h>
> > #include <net/net_shaper.h>
> >
> >@@ -64,6 +65,16 @@ enum TRI_STATE {
> > /* Maximum number of packets per coalesced CQE */
> > #define MANA_RXCOMP_OOB_NUM_PPI 4
> >
> >+/* Default/max interrupt moderation settings */
> >+#define MANA_INTR_MODR_USEC_DEF 0
> >+#define MANA_INTR_MODR_COMP_DEF 0
> >+
> >+#define MANA_ADAPTIVE_RX_DEF true
> >+#define MANA_ADAPTIVE_TX_DEF true
> >+
> >+#define MANA_INTR_MODR_USEC_MAX 1023
> >+#define MANA_INTR_MODR_COMP_MAX 255
> 
> used as a limiter and mask - for mask case i believe
> GENMASK can be used
Will do. 
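
A rough userspace sketch of the limit/mask split, assuming the masks
follow the 10-bit mod_usec and 8-bit mod_comps fields of the dim
doorbell entry. SKETCH_GENMASK() only imitates the kernel's
GENMASK(h, l) so the example builds outside the kernel, and the
SKETCH_* names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK(h, l): bits l..h set, h < 32. */
#define SKETCH_GENMASK(h, l) \
	((uint32_t)(((~UINT32_C(0)) >> (31 - (h))) & ((~UINT32_C(0)) << (l))))

/* Limits: the largest values ethtool may request (values from the patch). */
#define MANA_INTR_MODR_USEC_MAX 1023
#define MANA_INTR_MODR_COMP_MAX 255

/* Hypothetical masks derived from the doorbell field widths. */
#define SKETCH_MODR_USEC_MASK SKETCH_GENMASK(9, 0)	/* 10-bit mod_usec */
#define SKETCH_MODR_COMP_MASK SKETCH_GENMASK(7, 0)	/* 8-bit mod_comps */

/* Truncate a value to what fits in the mod_usec field. */
static uint32_t sketch_mask_usec(uint32_t usec)
{
	return usec & SKETCH_MODR_USEC_MASK;
}

/* Truncate a value to what fits in the mod_comps field. */
static uint32_t sketch_mask_comps(uint32_t comps)
{
	return comps & SKETCH_MODR_COMP_MASK;
}
```

The numeric limits and the masks happen to be equal here, but keeping
them as separate names makes it clear which one a given call site
means.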

Thanks,
- Haiyang

^ permalink raw reply

* Re: [RFC PATCH 00/20] mshv: enable kexec with Hyper-V donated pages and partitions
From: Pasha Tatashin @ 2026-06-01 15:00 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Jork Loeser, linux-hyperv, linux-mm, kexec, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Pasha Tatashin,
	Pratyush Yadav, Alexander Graf, Jason Miu, Andrew Morton,
	David Hildenbrand, Muchun Song, Oscar Salvador, Baoquan He,
	Catalin Marinas, Will Deacon, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Kees Cook,
	Ran Xiaokai, Justinien Bouron, Sourabh Jain, Pingfan Liu,
	Rafael J. Wysocki, Mario Limonciello, linux-arm-kernel, x86,
	linux-kernel, Michael Kelley
In-Reply-To: <ahxrc4pTvVU20RTX@kernel.org>

On 05-31 20:10, Mike Rapoport wrote:
> Hi Jork,
> 
> Only had time to skim through the patches.
> I have a couple of high level questions for now.
> 
> On Wed, May 27, 2026 at 05:41:42PM -0700, Jork Loeser wrote:
> > When Linux runs as an L1 Virtual Host (L1VH) under Hyper-V, the MSHV
> > root partition driver deposits pages to the hypervisor and creates
> > partitions for guest VMs. Prior patches enabled kexec for L1VH, but
> > only when no partitions had been created and no memory had been donated.
> > 
> > This series lifts that limitation. It uses KHO (Kexec Handover) to:
> > 
> >  - Track all pages deposited to the hypervisor in a KHO radix tree
> >    and preserve them across kexec so the new kernel knows which pages
> >    are owned by the hypervisor.
> > 
> >  - Freeze running partitions before kexec, record their IDs in the
> >    KHO FDT, and vacuum (tear down + reclaim memory) stale partitions
> >    after kexec.
> > 
> >  - In case of a crash, exclude hypervisor-owned pages from crash
> >    dump collection by passing the radix tree root PA via Hyper-V
> >    crash MSR P2 to the crash kernel.
> > 
> > Dependency on Pratyush's KHO series
> > ===================================
> > 
> > Patches 1-12 are cherry-picked from Pratyush Yadav's v1 series
> > "kho: make boot time huge page allocation work nicely with KHO" [1],
> > which is still under discussion. This series uses functionality from
> > those patches -- specifically the meta-data page enumeration via table
> > callbacks and the restructured radix tree API. It also extends the
> > KHO radix tree with:
> > 
> >  - A freeze mechanism to lock the tree before serializing for kexec
> >    (patch 13).
> 
> There was a lot of effort to make KHO stateless and drop the requirement
> for finalization/freeze.

Yes, using KHO directly here is incorrect. The state machine is provided 
by LUO, so we should use LUO here. MSHV should provide a file that 
userspace adds to LUO, and all state machine management would be the 
same as for all other clients participating in LU.

> 
> Why is it necessary to add a freeze mechanism to kho_radix_tree?
> If it's a hard requirement of mshv maybe the freeze part should be handled
> there?
>
> >  - A crash-kernel-safe variant that memremaps radix nodes for use
> >    outside the direct map (patch 14).
> > 
> > Patch overview
> > ==============
> > 
> > Patches 1-12:  KHO radix tree and memblock changes (from [1])
> > Patch 13:      Radix tree freeze and del_key() error reporting
> 
> del_key() error reporting sounds like something we'd want to avoid.
> del_key() is called on "freeing" path and during error handling, it would
> be hard if at all possible to deal with errors from del_key().
> 
> > Patch 14:      Crash-kernel-safe radix tree presence check
> > Patch 15:      Page tracker using KHO radix tree for deposited pages
> > Patch 16:      Debugfs interface for page tracker
> > Patches 17-18: Crash MSR reshuffling + crash dump page exclusion
> > Patch 19:      Export kexec_in_progress for modules
> 
> Isn't there another way to differentiate kexec reboot?
> 
> > Patch 20:      Freeze and vacuum partitions across kexec
> > 
> > Feedback
> > ========
> > 
> > This is an RFC. I am looking for feedback on the overall approach as
> > well as the KHO changes (patches 13-14).
> > 
> > [1] https://lore.kernel.org/linux-mm/20260429133928.850721-1-pratyush@kernel.org/
> > 
> > Based-on: linux-next/master (next-20260527)
> 
> -- 
> Sincerely yours,
> Mike.

^ permalink raw reply

* Re: [PATCH v4 07/10] drm/damage-helper: Remove old state from drm_atomic_helper_damage_iter_init()
From: Hamza Mahfooz @ 2026-06-01 14:01 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: mripard, maarten.lankhorst, airlied, airlied, simona, admin,
	gargaditya08, paul, jani.nikula, mhklinux, zack.rusin,
	bcm-kernel-feedback-list, dri-devel, linux-hyperv, intel-gfx,
	intel-xe, linux-mips, virtualization
In-Reply-To: <20260530185716.65688-8-tzimmermann@suse.de>

On Sat, May 30, 2026 at 08:53:20PM +0200, Thomas Zimmermann wrote:
> Nothing in drm_atomic_helper_damage_iter_init() requires the old
> plane state. Remove the parameter and mass-convert callers.
> 
> Most callers now no longer require the old plane state in their plane's
> atomic_update helper. Remove it as well.
> 
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> Acked-by: Zack Rusin <zack.rusin@broadcom.com>
> ---

Acked-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> # hyperv

^ permalink raw reply

* Re: [PATCH v4 10/10] drm/vmwgfx: Remove unused field struct vmwgfx_du_update_plane.old_state
From: Javier Martinez Canillas @ 2026-06-01 10:30 UTC (permalink / raw)
  To: Thomas Zimmermann, mripard, maarten.lankhorst, airlied, airlied,
	simona, admin, gargaditya08, paul, jani.nikula, mhklinux,
	zack.rusin, bcm-kernel-feedback-list
  Cc: dri-devel, linux-hyperv, intel-gfx, intel-xe, linux-mips,
	virtualization, Thomas Zimmermann
In-Reply-To: <20260530185716.65688-11-tzimmermann@suse.de>

Thomas Zimmermann <tzimmermann@suse.de> writes:

> Plane updates no longer require the old plane state. Remove the field
> from struct vmwgfx_du_update_plane and fix all callers.
>
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> Reviewed-by: Zack Rusin <zack.rusin@broadcom.com>
> ---

Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat


^ permalink raw reply

* Re: [PATCH v4 09/10] drm/damage-helper: Rename state parameters in damage helpers
From: Javier Martinez Canillas @ 2026-06-01 10:29 UTC (permalink / raw)
  To: Thomas Zimmermann, mripard, maarten.lankhorst, airlied, airlied,
	simona, admin, gargaditya08, paul, jani.nikula, mhklinux,
	zack.rusin, bcm-kernel-feedback-list
  Cc: dri-devel, linux-hyperv, intel-gfx, intel-xe, linux-mips,
	virtualization, Thomas Zimmermann
In-Reply-To: <20260530185716.65688-10-tzimmermann@suse.de>

Thomas Zimmermann <tzimmermann@suse.de> writes:

> Rename some of the state parameters of the damage-helper functions to
> align them with each other and other helpers. No functional changes.
>
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> Acked-by: Zack Rusin <zack.rusin@broadcom.com>
> ---

Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat


^ permalink raw reply

* Re: [PATCH v4 08/10] drm/damage-helper: Remove old state from drm_atomic_helper_damage_merged()
From: Javier Martinez Canillas @ 2026-06-01 10:29 UTC (permalink / raw)
  To: Thomas Zimmermann, mripard, maarten.lankhorst, airlied, airlied,
	simona, admin, gargaditya08, paul, jani.nikula, mhklinux,
	zack.rusin, bcm-kernel-feedback-list
  Cc: dri-devel, linux-hyperv, intel-gfx, intel-xe, linux-mips,
	virtualization, Thomas Zimmermann
In-Reply-To: <20260530185716.65688-9-tzimmermann@suse.de>

Thomas Zimmermann <tzimmermann@suse.de> writes:

> Nothing in drm_atomic_helper_damage_merged() requires the old
> plane state. Remove the parameter and mass-convert callers.
>
> Most callers now no longer require the old plane state in their plane's
> atomic_update helper. Remove it as well.
>
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> Acked-by: Zack Rusin <zack.rusin@broadcom.com>
> ---

Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat


^ permalink raw reply

* Re: [PATCH v4 07/10] drm/damage-helper: Remove old state from drm_atomic_helper_damage_iter_init()
From: Javier Martinez Canillas @ 2026-06-01 10:28 UTC (permalink / raw)
  To: Thomas Zimmermann, mripard, maarten.lankhorst, airlied, airlied,
	simona, admin, gargaditya08, paul, jani.nikula, mhklinux,
	zack.rusin, bcm-kernel-feedback-list
  Cc: dri-devel, linux-hyperv, intel-gfx, intel-xe, linux-mips,
	virtualization, Thomas Zimmermann
In-Reply-To: <20260530185716.65688-8-tzimmermann@suse.de>

Thomas Zimmermann <tzimmermann@suse.de> writes:

> Nothing in drm_atomic_helper_damage_iter_init() requires the old
> plane state. Remove the parameter and mass-convert callers.
>
> Most callers now no longer require the old plane state in their plane's
> atomic_update helper. Remove it as well.
>
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> Acked-by: Zack Rusin <zack.rusin@broadcom.com>
> ---

Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat


^ permalink raw reply

* [PATCH net v3] net: mana: Optimize irq affinity for low vcpu configs
From: Shradha Gupta @ 2026-06-01 10:27 UTC (permalink / raw)
  To: Dexuan Cui, Wei Liu, Haiyang Zhang, K. Y. Srinivasan, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Konstantin Taranov, Simon Horman, Erni Sri Satya Vennela,
	Dipayaan Roy, Shiraz Saleem, Michael Kelley, Long Li, Yury Norov
  Cc: Shradha Gupta, linux-hyperv, linux-kernel, netdev, Paul Rosswurm,
	Shradha Gupta, Saurabh Singh Sengar, stable

In the mana driver, the number of IRQs allocated is capped at
min(num_cpu + 1, queue count). When the IRQ count is greater than the
vCPU count, we want to utilize all the vCPUs, irrespective of their
NUMA/core bindings.

This is especially important in environments where the number of vCPUs is
so small that the softIRQ handling overhead of two IRQs on the same vCPU
is much higher than if they were spread across sibling vCPUs.

This behaviour is more evident with dynamic IRQ allocation. Since MANA
IRQs are assigned at a later stage compared to static allocation, other
device IRQs may already be affinitized to the vCPUs. As a result, IRQ
weights become imbalanced, causing multiple MANA IRQs to land on the
same vCPU, while some vCPUs have none.

In such cases, throughput drops significantly when many parallel TCP
connections are tested.
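
The linear spreading this patch applies in that situation can be
modelled in userspace roughly as follows. This is only a sketch under
stated assumptions: sketch_spread_linear() and SKETCH_DEFAULT_AFFINITY
are hypothetical names, the online CPUs are passed in as a plain
array, and the real code calls irq_set_affinity_and_hint() instead of
writing to an array:

```c
#include <assert.h>
#include <stddef.h>

/* Marker for "left at whatever affinity the IRQ core picked". */
#define SKETCH_DEFAULT_AFFINITY (-1)

/*
 * Pin queue IRQ i to the i-th online CPU. If there are more IRQs than
 * online CPUs, the extra IRQs keep their default affinity.
 * Returns the number of IRQs that were pinned.
 */
static size_t sketch_spread_linear(const int *online_cpus, size_t ncpus,
				   int *irq_cpu, size_t nirqs)
{
	size_t n = nirqs < ncpus ? nirqs : ncpus;
	size_t i;

	for (i = 0; i < n; i++)
		irq_cpu[i] = online_cpus[i];

	for (; i < nirqs; i++)
		irq_cpu[i] = SKETCH_DEFAULT_AFFINITY;

	return n;
}
```

With four online vCPUs and four queue IRQs this yields one IRQ per
vCPU, which is the layout shown for the queue IRQs in Case 2 below.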

Test envs:
=======================================================
Case 1: without this patch
=======================================================
4 vcpu(2 cores), 5 MANA IRQs (1 HWC + 4 Queue)

	TYPE		effective vCPU aff
=======================================================
IRQ0:	HWC		0
IRQ1:	mana_q1		0
IRQ2:	mana_q2		2
IRQ3:	mana_q3		0
IRQ4:	mana_q4		3

%soft on each vCPU(mpstat -P ALL 1) on receiver
vCPU		0	1	2	3
=======================================================
pass 1:		38.85	0.03	24.89	24.65
pass 2:		39.15	0.03	24.57	25.28
pass 3:		40.36	0.03	23.20	23.17

=======================================================
Case 2: with this patch
=======================================================
4 vcpu(2 cores), 5 MANA IRQs (1 HWC + 4 Queue)

        TYPE            effective vCPU aff
=======================================================
IRQ0:   HWC             0
IRQ1:   mana_q1         0
IRQ2:   mana_q2         1
IRQ3:   mana_q3         2
IRQ4:   mana_q4         3

%soft on each vCPU(mpstat -P ALL 1) on receiver
vCPU            0       1       2       3
=======================================================
pass 1:         15.42	15.85	14.99	14.51
pass 2:         15.53	15.94	15.81	15.93
pass 3:         16.41	16.35	16.40	16.36

=======================================================
Throughput Impact(in Gbps, same env)
=======================================================
TCP conn	with patch	w/o patch
20480		15.65		7.73
10240		15.63		8.93
8192		15.64		9.69
6144		15.64		13.16
4096		15.69		15.75
2048		15.69		15.83
1024		15.71		15.28

Fixes: 755391121038 ("net: mana: Allocate MSI-X vectors dynamically")
Cc: stable@vger.kernel.org
Co-developed-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
---
Changes in v3
 * Optimize the comments in mana_gd_setup_dyn_irqs()
 * add more details in the dev_dbg for extra IRQs 
---
Changes in v2
 * Removed the unused skip_first_cpu variable
 * fixed exit condition in irq_setup_linear() with len == 0
 * changed return type of irq_setup_linear() as it will always be 0
 * removed the unnecessary rcu_read_lock() in irq_setup_linear()
 * added appropriate comments to indicate expected behaviour when
   IRQs are more than or equal to num_online_cpus()
---
 .../net/ethernet/microsoft/mana/gdma_main.c   | 60 ++++++++++++++++---
 1 file changed, 53 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 712a0881d720..00a28b3ca0a6 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -197,6 +197,8 @@ static int mana_gd_query_max_resources(struct pci_dev *pdev)
 	} else {
 		/* If dynamic allocation is enabled we have already allocated
 		 * hwc msi
+		 * Also, we make sure in this case the following is always true
+		 * (num_msix_usable - 1 HWC) <= num_online_cpus()
 		 */
 		gc->num_msix_usable = min(resp.max_msix, num_online_cpus() + 1);
 	}
@@ -1717,11 +1719,24 @@ static int irq_setup(unsigned int *irqs, unsigned int len, int node,
 	return 0;
 }
 
+/* should be called with cpus_read_lock() held */
+static void irq_setup_linear(unsigned int *irqs, unsigned int len)
+{
+	int cpu;
+
+	for_each_online_cpu(cpu) {
+		if (len == 0)
+			break;
+
+		irq_set_affinity_and_hint(*irqs++, cpumask_of(cpu));
+		len--;
+	}
+}
+
 static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
 {
 	struct gdma_context *gc = pci_get_drvdata(pdev);
 	struct gdma_irq_context *gic;
-	bool skip_first_cpu = false;
 	int *irqs, irq, err, i;
 
 	irqs = kmalloc_objs(int, nvec);
@@ -1729,6 +1744,8 @@ static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
 		return -ENOMEM;
 
 	/*
+	 * In this function, num_msix_usable = HWC IRQ + Queue IRQ.
+	 * nvec is only Queue IRQ (HWC already setup).
 	 * While processing the next pci irq vector, we start with index 1,
 	 * as IRQ vector at index 0 is already processed for HWC.
 	 * However, the population of irqs array starts with index 0, to be
@@ -1767,13 +1784,42 @@ static int mana_gd_setup_dyn_irqs(struct pci_dev *pdev, int nvec)
 	 * first CPU sibling group since they are already affinitized to HWC IRQ
 	 */
 	cpus_read_lock();
-	if (gc->num_msix_usable <= num_online_cpus())
-		skip_first_cpu = true;
+	if (gc->num_msix_usable <= num_online_cpus()) {
+		err = irq_setup(irqs, nvec, gc->numa_node, true);
+		if (err) {
+			cpus_read_unlock();
+			goto free_irq;
+		}
+	} else {
+		/*
+		 * When num_msix_usable are more than num_online_cpus, our
+		 * queue IRQs should be equal to num of online vCPUs.
+		 * We try to make sure queue IRQs spread across all vCPUs.
+		 * In such a case NUMA or CPU core affinity does not matter.
+		 * Note: in this case the total mana IRQ should always be
+		 * num_online_cpus + 1. The first HWC IRQ is already handled
+		 * in HWC setup calls
+		 * However, if CPUs went offline since num_msix_usable was
+		 * computed, queue IRQs will be more than num_online_cpus().
+		 * In such cases remaining extra IRQs will retain their default
+		 * affinity.
+		 */
+		int first_unassigned = num_online_cpus();
+		if (nvec > first_unassigned) {
+			char buf[32];
+
+			if (first_unassigned == nvec - 1)
+				snprintf(buf, sizeof(buf), "%d",
+					 first_unassigned);
+			else
+				snprintf(buf, sizeof(buf), "%d-%d",
+					 first_unassigned, nvec - 1);
+
+			dev_dbg(&pdev->dev,
+				"MANA IRQ indices #%s will retain the default CPU affinity\n", buf);
+		}
 
-	err = irq_setup(irqs, nvec, gc->numa_node, skip_first_cpu);
-	if (err) {
-		cpus_read_unlock();
-		goto free_irq;
+		irq_setup_linear(irqs, nvec);
 	}
 
 	cpus_read_unlock();

base-commit: 8415598365503ced2e3d019491b0a2756c85c494
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH v4 06/10] drm/damage-helper: Test src coord in drm_atomic_helper_check_plane_damage()
From: Javier Martinez Canillas @ 2026-06-01 10:27 UTC (permalink / raw)
  To: Thomas Zimmermann, mripard, maarten.lankhorst, airlied, airlied,
	simona, admin, gargaditya08, paul, jani.nikula, mhklinux,
	zack.rusin, bcm-kernel-feedback-list
  Cc: dri-devel, linux-hyperv, intel-gfx, intel-xe, linux-mips,
	virtualization, Thomas Zimmermann
In-Reply-To: <20260530185716.65688-7-tzimmermann@suse.de>

Thomas Zimmermann <tzimmermann@suse.de> writes:

> Planes require a full update if the source coordinates change across
> atomic commits. Evaluate this during the atomic-check and set the flag
> ignore_damage_clips in the plane state, if so. Remove the check from
> drm_atomic_helper_damage_iter_init().
>
> This will help with removing the old state from the atomic-commit phase
> and simplify atomic_update helpers a bit.
>
> Several unit tests check against the change of the src coordinate. Drop
> them as they no longer serve a purpose. If the src coordinate changes
> across commits, atomic helpers will set the plane state's
> ignore_damage_clips flag, for which a separate unit test exists.
>
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> Acked-by: Zack Rusin <zack.rusin@broadcom.com>
> ---

Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat


^ permalink raw reply

* Re: [PATCH v4 05/10] drm/atomic_helper: Do not evaluate plane damage before atomic_check
From: Javier Martinez Canillas @ 2026-06-01 10:22 UTC (permalink / raw)
  To: Thomas Zimmermann, mripard, maarten.lankhorst, airlied, airlied,
	simona, admin, gargaditya08, paul, jani.nikula, mhklinux,
	zack.rusin, bcm-kernel-feedback-list
  Cc: dri-devel, linux-hyperv, intel-gfx, intel-xe, linux-mips,
	virtualization, Thomas Zimmermann
In-Reply-To: <20260530185716.65688-6-tzimmermann@suse.de>

Thomas Zimmermann <tzimmermann@suse.de> writes:

> Remove the call to drm_atomic_helper_check_plane_damage() from before
> calling the atomic_check helpers. The call no longer has any purpose,
> as the actual evaluation happens after running atomic_check.
>
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> Acked-by: Zack Rusin <zack.rusin@broadcom.com>
> ---

Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat


^ permalink raw reply

* Re: [PATCH v4 04/10] drm/appletbdrm: Allocate request/response buffers in begin_fb_access
From: Javier Martinez Canillas @ 2026-06-01 10:21 UTC (permalink / raw)
  To: Thomas Zimmermann, mripard, maarten.lankhorst, airlied, airlied,
	simona, admin, gargaditya08, paul, jani.nikula, mhklinux,
	zack.rusin, bcm-kernel-feedback-list
  Cc: dri-devel, linux-hyperv, intel-gfx, intel-xe, linux-mips,
	virtualization, Thomas Zimmermann
In-Reply-To: <20260530185716.65688-5-tzimmermann@suse.de>

Thomas Zimmermann <tzimmermann@suse.de> writes:

> In atomic_check, damage handling is not fully evaluated. Another
> atomic_check helper could trigger a full modeset and thus invalidate
> damage clips.
>
> Allocation of the request/response buffers in appletbdrm depends on
> correct damage information. Otherwise it might allocate incorrectly
> sized buffers. Allocate the buffers in the driver's begin_fb_access
> helper. It runs early during the commit when damage clipping has been
> fully evaluated.
>
> v2:
> - allocate before drm_gem_begin_shadow_fb_access() to avoid leak on error

Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat


^ permalink raw reply

* Re: [PATCH v4 03/10] drm/ingenic: Remove calls to drm_atomic_helper_check_plane_damage()
From: Javier Martinez Canillas @ 2026-06-01 10:20 UTC (permalink / raw)
  To: Thomas Zimmermann, mripard, maarten.lankhorst, airlied, airlied,
	simona, admin, gargaditya08, paul, jani.nikula, mhklinux,
	zack.rusin, bcm-kernel-feedback-list
  Cc: dri-devel, linux-hyperv, intel-gfx, intel-xe, linux-mips,
	virtualization, Thomas Zimmermann
In-Reply-To: <20260530185716.65688-4-tzimmermann@suse.de>

Thomas Zimmermann <tzimmermann@suse.de> writes:

> Atomic helpers call drm_atomic_helper_check_plane_damage() after the
> atomic_check anyway. See atomic_helper_check_planes(). Remove the calls
> from the planes' atomic_check.
>
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> Acked-by: Zack Rusin <zack.rusin@broadcom.com>
> ---

Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat


^ permalink raw reply

* Re: [PATCH v4 02/10] drm/atomic-helpers: Evaluate plane damage after atomic_check
From: Javier Martinez Canillas @ 2026-06-01 10:19 UTC (permalink / raw)
  To: Thomas Zimmermann, mripard, maarten.lankhorst, airlied, airlied,
	simona, admin, gargaditya08, paul, jani.nikula, mhklinux,
	zack.rusin, bcm-kernel-feedback-list
  Cc: dri-devel, linux-hyperv, intel-gfx, intel-xe, linux-mips,
	virtualization, Thomas Zimmermann
In-Reply-To: <20260530185716.65688-3-tzimmermann@suse.de>

Thomas Zimmermann <tzimmermann@suse.de> writes:

> Each plane's and CRTC's atomic_check might trigger a full modeset. As
> this affects the plane's damage handling, evaluate damage clips after
> running the atomic_check helpers.
>
> Examples can be found in a number of drivers, such as ast, gud, ingenic,
> mgag200 or vmwgfx, which all set mode_changed in the CRTC state to true.
> Ingenic even re-evaluates damage information in its plane's atomic_check.
> Doing this after the atomic_check helpers ran benefits all drivers.
>
> There's already a damage evaluation before the calls to atomic_check.
> With a few fixes to drivers, this can be removed.
>
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> Acked-by: Zack Rusin <zack.rusin@broadcom.com>
> ---
>  drivers/gpu/drm/drm_atomic_helper.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index 51f39edc31ed..4c37299e8ccb 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1065,6 +1065,10 @@ drm_atomic_helper_check_planes(struct drm_device *dev,
>  		}
>  	}
>  
> +	for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) {
> +		drm_atomic_helper_check_plane_damage(state, new_plane_state);
> +	}
> +

I wonder if it's worth mentioning this in the drm_atomic_helper_check_planes()
function's kernel-doc comment. But regardless, the change makes sense to me:

Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat


^ permalink raw reply

* Re: [PATCH v4 01/10] drm/damage-helper: Do not alter damage clips on modeset, but ignore them
From: Javier Martinez Canillas @ 2026-06-01 10:16 UTC (permalink / raw)
  To: Thomas Zimmermann, mripard, maarten.lankhorst, airlied, airlied,
	simona, admin, gargaditya08, paul, jani.nikula, mhklinux,
	zack.rusin, bcm-kernel-feedback-list
  Cc: dri-devel, linux-hyperv, intel-gfx, intel-xe, linux-mips,
	virtualization, Thomas Zimmermann, stable
In-Reply-To: <20260530185716.65688-2-tzimmermann@suse.de>

Thomas Zimmermann <tzimmermann@suse.de> writes:

Hello Thomas,

> User space supplies rectangles for damage clipping in a plane property.
> For full mode sets, drivers still require a full plane update. In this
> case, leave the information as-is and set the ignore_damage_clips flag
> instead. The damage iterator will later ignore any damage information.
>
> Also fixes a bug where ignore_damage_clips was not cleared across plane-
> state duplications.
>
> Leaving the damage information as-is might be helpful to drivers that
> benefit from this information even on full modesets (e.g., for cache
> management). It will also help with consolidating the damage-handling
> logic.
>
> Also add a new unit test that evaluates the ignore_damage_clips flag. It
> sets two damage clips plus the flag and tests if the reported damage
> covers the entire framebuffer.
>
> v4:
> - slightly reword the commit description
>
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> Fixes: 35ed38d58257 ("drm: Allow drivers to indicate the damage helpers to ignore damage clips")
> Acked-by: Zack Rusin <zack.rusin@broadcom.com>
> Cc: dri-devel@lists.freedesktop.org
> Cc: <stable@vger.kernel.org> # v6.10+
> ---
>  drivers/gpu/drm/drm_atomic_state_helper.c     |  1 +
>  drivers/gpu/drm/drm_damage_helper.c           |  6 ++--
>  .../gpu/drm/tests/drm_damage_helper_test.c    | 28 +++++++++++++++++++
>  3 files changed, 31 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_atomic_state_helper.c b/drivers/gpu/drm/drm_atomic_state_helper.c
> index cc70508d4fdb..84d5231ccac1 100644
> --- a/drivers/gpu/drm/drm_atomic_state_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_state_helper.c
> @@ -359,6 +359,7 @@ void __drm_atomic_helper_plane_duplicate_state(struct drm_plane *plane,
>  	state->fence = NULL;
>  	state->commit = NULL;
>  	state->fb_damage_clips = NULL;
> +	state->ignore_damage_clips = false;
>  	state->color_mgmt_changed = false;
>  }

I would split this out into a separate patch, since it is the bug you are
fixing for commit 35ed38d58257 ("drm: Allow drivers to indicate the damage
helpers to ignore damage clips").

>  EXPORT_SYMBOL(__drm_atomic_helper_plane_duplicate_state);
> diff --git a/drivers/gpu/drm/drm_damage_helper.c b/drivers/gpu/drm/drm_damage_helper.c
> index 74a7f4252ecf..945fac8dc27b 100644
> --- a/drivers/gpu/drm/drm_damage_helper.c
> +++ b/drivers/gpu/drm/drm_damage_helper.c
> @@ -78,10 +78,8 @@ void drm_atomic_helper_check_plane_damage(struct drm_atomic_commit *state,
>  		if (WARN_ON(!crtc_state))
>  			return;
>  
> -		if (drm_atomic_crtc_needs_modeset(crtc_state)) {
> -			drm_property_blob_put(plane_state->fb_damage_clips);
> -			plane_state->fb_damage_clips = NULL;
> -		}
> +		if (drm_atomic_crtc_needs_modeset(crtc_state))
> +			plane_state->ignore_damage_clips = true;
>  	}
>  }

This makes sense to me as well and I agree that re-using the flag for this
is better than making plane_state->fb_damage_clips == NULL the condition.
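
To make the difference concrete, here is a toy userspace model of the
flag-based convention. It is only a sketch and not the DRM API: the struct
and function names are made up, and the "full update" is approximated by the
framebuffer size. The clips stay attached to the state; only the flag decides
whether they are honored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rect_sketch { int x1, y1, x2, y2; };

/* Reduced stand-in for the damage-related parts of struct drm_plane_state. */
struct plane_state_sketch {
	const struct rect_sketch *clips;	/* user-supplied damage clips */
	size_t num_clips;
	bool ignore_damage_clips;		/* set on a full modeset */
	int fb_width, fb_height;
};

/* Returns the bounding box of the area that has to be updated. */
static struct rect_sketch effective_damage(const struct plane_state_sketch *st)
{
	struct rect_sketch r = { 0, 0, st->fb_width, st->fb_height };
	size_t i;

	assert(st->num_clips == 0 || st->clips);

	/* Flag set or no clips: full update; the clips are left untouched. */
	if (st->ignore_damage_clips || st->num_clips == 0)
		return r;

	/* Otherwise merge the clips into one bounding box. */
	r = st->clips[0];
	for (i = 1; i < st->num_clips; i++) {
		const struct rect_sketch *c = &st->clips[i];

		if (c->x1 < r.x1)
			r.x1 = c->x1;
		if (c->y1 < r.y1)
			r.y1 = c->y1;
		if (c->x2 > r.x2)
			r.x2 = c->x2;
		if (c->y2 > r.y2)
			r.y2 = c->y2;
	}
	return r;
}
```

This mirrors what the new unit test checks: two clips plus the flag should
still yield damage covering the whole framebuffer.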

As mentioned though, I would make it a separate patch. Both changes look
good to me:

Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



* RE: [PATCH net-next] net: mana: Add Interrupt Moderation support
From: Jagielski, Jedrzej @ 2026-06-01  9:39 UTC (permalink / raw)
  To: Haiyang Zhang, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Cui, Dexuan, Long Li, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Simon Horman,
	Shradha Gupta, Erni Sri Satya Vennela, Dipayaan Roy, Aditya Garg,
	Kees Cook, Breno Leitao, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org
  Cc: paulros@microsoft.com
In-Reply-To: <20260530194957.1690459-1-haiyangz@linux.microsoft.com>

From: Haiyang Zhang <haiyangz@linux.microsoft.com> 
Sent: Saturday, May 30, 2026 9:50 PM

>From: Haiyang Zhang <haiyangz@microsoft.com>
>
>Add Static and Dynamic Interrupt Moderation (DIM) support for
>Rx and Tx.
>Update queue creation procedure with new data struct with the related
>settings.
>Add functions to collect stat for DIM, and workers to update DIM data
>and settings.
>Update ethtool handler to get/set the moderation settings from a user.
>By default, adaptive-rx/tx (DIM) are enabled.
>
>Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
>---
> drivers/net/ethernet/microsoft/Kconfig        |   1 +
> .../net/ethernet/microsoft/mana/gdma_main.c   |  27 ++++
> drivers/net/ethernet/microsoft/mana/mana_en.c | 101 ++++++++++++++-
> .../ethernet/microsoft/mana/mana_ethtool.c    | 120 +++++++++++++++++-
> include/net/mana/gdma.h                       |  24 +++-
> include/net/mana/mana.h                       |  42 ++++++
> 6 files changed, 309 insertions(+), 6 deletions(-)
>
>diff --git a/drivers/net/ethernet/microsoft/Kconfig b/drivers/net/ethernet/microsoft/Kconfig
>index 3f36ee6a8ece..e9be18c92ca5 100644
>--- a/drivers/net/ethernet/microsoft/Kconfig
>+++ b/drivers/net/ethernet/microsoft/Kconfig
>@@ -21,6 +21,7 @@ config MICROSOFT_MANA
> 	depends on X86_64 || (ARM64 && !CPU_BIG_ENDIAN)
> 	depends on PCI_HYPERV
> 	select AUXILIARY_BUS
>+	select DIMLIB
> 	select PAGE_POOL
> 	select NET_SHAPER
> 	help
>diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
>index 712a0881d720..5aa0ea794a00 100644
>--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
>+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
>@@ -405,6 +405,7 @@ static int mana_gd_disable_queue(struct gdma_queue *queue)
> #define DOORBELL_OFFSET_RQ	0x400
> #define DOORBELL_OFFSET_CQ	0x800
> #define DOORBELL_OFFSET_EQ	0xFF8
>+#define DOORBELL_OFFSET_DIM	0x820
> 
> static void mana_gd_ring_doorbell(struct gdma_context *gc, u32 db_index,
> 				  enum gdma_queue_type q_type, u32 qid,
>@@ -445,6 +446,16 @@ static void mana_gd_ring_doorbell(struct gdma_context *gc, u32 db_index,
> 		addr += DOORBELL_OFFSET_SQ;
> 		break;
> 
>+	case GDMA_DIM:
>+		e.dim.id = qid;
>+		e.dim.mod_usec = tail_ptr;
>+		e.dim.mod_usec_vld = tail_ptr >> 15;
>+		e.dim.mod_comps = tail_ptr >> 16;

Please use defines instead of magic numbers.

>+		e.dim.mod_comps_vld = num_req;
>+
>+		addr += DOORBELL_OFFSET_DIM;
>+		break;
>+
> 	default:
> 		WARN_ON(1);
> 		return;
>@@ -479,6 +490,22 @@ void mana_gd_ring_cq(struct gdma_queue *cq, u8 arm_bit)
> }
> EXPORT_SYMBOL_NS(mana_gd_ring_cq, "NET_MANA");
> 
>+void mana_gd_ring_dim(struct gdma_queue *cq, u32 mod_usec, bool mod_usec_vld,
>+		      u32 mod_comps, bool mod_comps_vld)
>+{
>+	struct gdma_context *gc = cq->gdma_dev->gdma_context;
>+	u32 dim_val;
>+
>+	/* Convert the DIM values to doorbell parameters */
>+	dim_val = (mod_usec & MANA_INTR_MODR_USEC_MAX) |
>+		  (((u32)mod_usec_vld & 1) << 15) |
>+		  ((mod_comps & MANA_INTR_MODR_COMP_MAX) << 16);

I believe FIELD_PREP() is preferable in such cases.
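
A rough userspace sketch of what that could look like. The mask names and
the FIELD_PREP_SKETCH()/FIELD_GET_SKETCH() stand-ins are invented for
illustration; in the driver the real FIELD_PREP()/FIELD_GET() from
<linux/bitfield.h> would be used. Bit positions are taken from the dim
layout in union gdma_doorbell_entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask names; positions follow the 'dim' doorbell layout. */
#define DIM_DB_USEC_MASK	0x000003FFu	/* bits 9:0   */
#define DIM_DB_USEC_VLD_MASK	0x00008000u	/* bit  15    */
#define DIM_DB_COMPS_MASK	0x00FF0000u	/* bits 23:16 */

/*
 * Userspace stand-ins for FIELD_PREP()/FIELD_GET(). For a contiguous mask,
 * (mask & ~(mask << 1)) is its lowest set bit; multiplying or dividing by
 * it shifts a value into or out of the field.
 */
#define FIELD_PREP_SKETCH(mask, val) \
	(((uint32_t)(val) * ((mask) & ~((mask) << 1))) & (mask))
#define FIELD_GET_SKETCH(mask, reg) \
	(((uint32_t)(reg) & (mask)) / ((mask) & ~((mask) << 1)))

/* The usec mask doubles as the limit (1023) used elsewhere in the patch. */
static_assert(DIM_DB_USEC_MASK == 1023, "usec field is 10 bits wide");

/* Build the 32-bit value that the patch assembles with open-coded shifts. */
static uint32_t dim_pack(uint32_t usec, int usec_vld, uint32_t comps)
{
	return FIELD_PREP_SKETCH(DIM_DB_USEC_MASK, usec) |
	       FIELD_PREP_SKETCH(DIM_DB_USEC_VLD_MASK, usec_vld ? 1 : 0) |
	       FIELD_PREP_SKETCH(DIM_DB_COMPS_MASK, comps);
}
```

The same masks could then be used with FIELD_GET() in the GDMA_DIM case of
mana_gd_ring_doorbell(), which would also take care of the ">> 15" and
">> 16" there.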

>+
>+	mana_gd_ring_doorbell(gc, cq->gdma_dev->doorbell, GDMA_DIM, cq->id,
>+			      dim_val, (u8)mod_comps_vld & 1);
>+}
>+EXPORT_SYMBOL_NS(mana_gd_ring_dim, "NET_MANA");
>+
> #define MANA_SERVICE_PERIOD 10
> 
> static void mana_serv_rescan(struct pci_dev *pdev)
>diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
>index 82f1461a48e9..f1a16f8aca66 100644
>--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
>+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
>@@ -1551,6 +1551,9 @@ int mana_create_wq_obj(struct mana_port_context *apc,
> 
> 	mana_gd_init_req_hdr(&req.hdr, MANA_CREATE_WQ_OBJ,
> 			     sizeof(req), sizeof(resp));
>+
>+	req.hdr.req.msg_version = GDMA_MESSAGE_V3;
>+	req.hdr.resp.msg_version = GDMA_MESSAGE_V2;
> 	req.vport = vport;
> 	req.wq_type = wq_type;
> 	req.wq_gdma_region = wq_spec->gdma_region;
>@@ -1559,6 +1562,9 @@ int mana_create_wq_obj(struct mana_port_context *apc,
> 	req.cq_size = cq_spec->queue_size;
> 	req.cq_moderation_ctx_id = cq_spec->modr_ctx_id;
> 	req.cq_parent_qid = cq_spec->attached_eq;
>+	req.req_cq_moderation = cq_spec->req_cq_moderation;
>+	req.cq_moderation_comp = cq_spec->cq_moderation_comp;
>+	req.cq_moderation_usec = cq_spec->cq_moderation_usec;
> 
> 	err = mana_send_request(apc->ac, &req, sizeof(req), &resp,
> 				sizeof(resp));
>@@ -2253,6 +2259,66 @@ static void mana_poll_rx_cq(struct mana_cq *cq)
> 		xdp_do_flush();
> }
> 
>+static void mana_rx_dim_work(struct work_struct *work)
>+{
>+	struct dim *dim = container_of(work, struct dim, work);
>+	struct mana_cq *cq = container_of(dim, struct mana_cq, dim);
>+	struct dim_cq_moder cur_moder =
>+		net_dim_get_rx_moderation(dim->mode, dim->profile_ix);

RCT (reverse Christmas tree ordering of the declarations); here and in the
following functions.
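
For illustration, a minimal userspace sketch of the usual way to get there
when the initializers depend on each other: declare first, longest line on
top, and move the assignments into the body. The *_sketch types and the
container_of_sketch() macro are stand-ins for the kernel ones, not real
APIs.

```c
#include <stddef.h>

/* Reduced stand-ins for work_struct, struct dim and struct mana_cq. */
struct work_sketch { int pending; };
struct dim_sketch { int profile_ix; struct work_sketch work; };
struct cq_sketch { int id; struct dim_sketch dim; };

/* Userspace equivalent of the kernel's container_of(). */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static int cq_id_from_work(struct work_sketch *work)
{
	/* Declarations only, longest line first. */
	struct dim_sketch *dim;
	struct cq_sketch *cq;
	int id;

	/* The dependent initializations move into the function body. */
	dim = container_of_sketch(work, struct dim_sketch, work);
	cq = container_of_sketch(dim, struct cq_sketch, dim);
	id = cq->id;

	return id;
}
```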

>+
>+	cur_moder.usec = min_t(u16, cur_moder.usec, MANA_INTR_MODR_USEC_MAX);
>+	cur_moder.pkts = min_t(u16, cur_moder.pkts, MANA_INTR_MODR_COMP_MAX);
>+
>+	mana_gd_ring_dim(cq->gdma_cq, cur_moder.usec, true,
>+			 cur_moder.pkts, true);
>+
>+	dim->state = DIM_START_MEASURE;
>+}
>+
>+static void mana_tx_dim_work(struct work_struct *work)
>+{
>+	struct dim *dim = container_of(work, struct dim, work);
>+	struct mana_cq *cq = container_of(dim, struct mana_cq, dim);
>+	struct dim_cq_moder cur_moder =
>+		net_dim_get_tx_moderation(dim->mode, dim->profile_ix);
>+
>+	cur_moder.usec = min_t(u16, cur_moder.usec, MANA_INTR_MODR_USEC_MAX);
>+	cur_moder.pkts = min_t(u16, cur_moder.pkts, MANA_INTR_MODR_COMP_MAX);
>+
>+	mana_gd_ring_dim(cq->gdma_cq, cur_moder.usec, true,
>+			 cur_moder.pkts, true);
>+
>+	dim->state = DIM_START_MEASURE;
>+}
>+
>+static void mana_update_rx_dim(struct mana_cq *cq)
>+{
>+	struct mana_rxq *rxq = cq->rxq;
>+	struct mana_port_context *apc = netdev_priv(rxq->ndev);
>+	struct dim_sample dim_sample = {};
>+
>+	if (!apc->rx_dim_enabled)
>+		return;
>+
>+	dim_update_sample(READ_ONCE(cq->dim_event_ctr), rxq->stats.packets,
>+			  rxq->stats.bytes, &dim_sample);
>+	net_dim(&cq->dim, &dim_sample);
>+}
>+
>+static void mana_update_tx_dim(struct mana_cq *cq)
>+{
>+	struct mana_txq *txq = cq->txq;
>+	struct mana_port_context *apc = netdev_priv(txq->ndev);
>+	struct dim_sample dim_sample = {};
>+
>+	if (!apc->tx_dim_enabled)
>+		return;
>+
>+	dim_update_sample(READ_ONCE(cq->dim_event_ctr), txq->stats.packets,
>+			  txq->stats.bytes, &dim_sample);
>+	net_dim(&cq->dim, &dim_sample);
>+}
>+
> static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
> {
> 	struct mana_cq *cq = context;
>@@ -2271,7 +2337,13 @@ static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue)
> 	if (w < cq->budget) {
> 		mana_gd_ring_cq(gdma_queue, SET_ARM_BIT);
> 		cq->work_done_since_doorbell = 0;
>-		napi_complete_done(&cq->napi, w);
>+
>+		if (napi_complete_done(&cq->napi, w)) {
>+			if (cq->type == MANA_CQ_TYPE_RX)
>+				mana_update_rx_dim(cq);
>+			else
>+				mana_update_tx_dim(cq);
>+		}
> 	} else if (cq->work_done_since_doorbell >=
> 		   (cq->gdma_cq->queue_size / COMP_ENTRY_SIZE) * 4) {
> 		/* MANA hardware requires at least one doorbell ring every 8
>@@ -2303,6 +2375,7 @@ static void mana_schedule_napi(void *context, struct gdma_queue *gdma_queue)
> {
> 	struct mana_cq *cq = context;
> 
>+	WRITE_ONCE(cq->dim_event_ctr, cq->dim_event_ctr + 1);
> 	napi_schedule_irqoff(&cq->napi);
> }
> 
>@@ -2345,6 +2418,7 @@ static void mana_destroy_txq(struct mana_port_context *apc)
> 		if (apc->tx_qp[i]->txq.napi_initialized) {
> 			napi_synchronize(napi);
> 			napi_disable_locked(napi);
>+			cancel_work_sync(&apc->tx_qp[i]->tx_cq.dim.work);
> 			netif_napi_del_locked(napi);
> 			apc->tx_qp[i]->txq.napi_initialized = false;
> 		}
>@@ -2475,6 +2549,10 @@ static int mana_create_txq(struct mana_port_context *apc,
> 		cq_spec.queue_size = cq->gdma_cq->queue_size;
> 		cq_spec.modr_ctx_id = 0;
> 		cq_spec.attached_eq = cq->gdma_cq->cq.parent->id;
>+		cq_spec.req_cq_moderation = apc->tx_dim_enabled ||
>+			(apc->intr_modr_tx_usec && apc->intr_modr_tx_comp);
>+		cq_spec.cq_moderation_usec = apc->intr_modr_tx_usec;
>+		cq_spec.cq_moderation_comp = apc->intr_modr_tx_comp;
> 
> 		err = mana_create_wq_obj(apc, apc->port_handle, GDMA_SQ,
> 					 &wq_spec, &cq_spec,
>@@ -2509,6 +2587,9 @@ static int mana_create_txq(struct mana_port_context *apc,
> 		napi_enable_locked(&cq->napi);
> 		txq->napi_initialized = true;
> 
>+		INIT_WORK(&cq->dim.work, mana_tx_dim_work);
>+		cq->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
>+
> 		mana_gd_ring_cq(cq->gdma_cq, SET_ARM_BIT);
> 	}
> 
>@@ -2543,6 +2624,7 @@ static void mana_destroy_rxq(struct mana_port_context *apc,
> 		napi_synchronize(napi);
> 
> 		napi_disable_locked(napi);
>+		cancel_work_sync(&rxq->rx_cq.dim.work);
> 		netif_napi_del_locked(napi);
> 	}
> 
>@@ -2780,6 +2862,10 @@ static struct mana_rxq *mana_create_rxq(struct mana_port_context *apc,
> 	cq_spec.queue_size = cq->gdma_cq->queue_size;
> 	cq_spec.modr_ctx_id = 0;
> 	cq_spec.attached_eq = cq->gdma_cq->cq.parent->id;
>+	cq_spec.req_cq_moderation = apc->rx_dim_enabled ||
>+		(apc->intr_modr_rx_usec && apc->intr_modr_rx_comp);
>+	cq_spec.cq_moderation_usec = apc->intr_modr_rx_usec;
>+	cq_spec.cq_moderation_comp = apc->intr_modr_rx_comp;
> 
> 	err = mana_create_wq_obj(apc, apc->port_handle, GDMA_RQ,
> 				 &wq_spec, &cq_spec, &rxq->rxobj);
>@@ -2815,6 +2901,9 @@ static struct mana_rxq *mana_create_rxq(struct mana_port_context *apc,
> 
> 	napi_enable_locked(&cq->napi);
> 
>+	INIT_WORK(&cq->dim.work, mana_rx_dim_work);
>+	cq->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
>+
> 	mana_gd_ring_cq(cq->gdma_cq, SET_ARM_BIT);
> out:
> 	if (!err)
>@@ -3432,6 +3521,16 @@ static int mana_probe_port(struct mana_context *ac, int port_idx,
> 	apc->port_idx = port_idx;
> 	apc->cqe_coalescing_enable = 0;
> 
>+	/* Initialize interrupt moderation settings if supported by HW */
>+	if (gc->pf_cap_flags1 & GDMA_PF_CAP_FLAG_1_DYN_INTERRUPT_MODERATION) {
>+		apc->intr_modr_rx_usec = MANA_INTR_MODR_USEC_DEF;
>+		apc->intr_modr_rx_comp = MANA_INTR_MODR_COMP_DEF;
>+		apc->intr_modr_tx_usec = MANA_INTR_MODR_USEC_DEF;
>+		apc->intr_modr_tx_comp = MANA_INTR_MODR_COMP_DEF;
>+		apc->rx_dim_enabled = MANA_ADAPTIVE_RX_DEF;
>+		apc->tx_dim_enabled = MANA_ADAPTIVE_TX_DEF;
>+	}
>+
> 	mutex_init(&apc->vport_mutex);
> 	apc->vport_use_count = 0;
> 
>diff --git a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
>index 04350973e19e..a90216eba794 100644
>--- a/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
>+++ b/drivers/net/ethernet/microsoft/mana/mana_ethtool.c
>@@ -419,6 +419,15 @@ static int mana_get_coalesce(struct net_device *ndev,
> 	    !kernel_coal->rx_cqe_nsecs)
> 		kernel_coal->rx_cqe_nsecs = MANA_RX_CQE_NSEC_DEF;
> 
>+	ec->rx_coalesce_usecs = apc->intr_modr_rx_usec;
>+	ec->rx_max_coalesced_frames = apc->intr_modr_rx_comp;
>+
>+	ec->tx_coalesce_usecs = apc->intr_modr_tx_usec;
>+	ec->tx_max_coalesced_frames = apc->intr_modr_tx_comp;
>+
>+	ec->use_adaptive_rx_coalesce = apc->rx_dim_enabled;
>+	ec->use_adaptive_tx_coalesce = apc->tx_dim_enabled;
>+
> 	return 0;
> }
> 
>@@ -429,8 +438,28 @@ static int mana_set_coalesce(struct net_device *ndev,
> {
> 	struct mana_port_context *apc = netdev_priv(ndev);
> 	u8 saved_cqe_coalescing_enable;
>+	u16 old_rx_usec, old_rx_comp;
>+	u16 old_tx_usec, old_tx_comp;
>+	bool old_rx_dim, old_tx_dim;

How about using some sort of struct instead of declaring a number
of variables for bookkeeping? IMHO that would be cleaner.
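
A minimal sketch of the idea, assuming the moderation settings were grouped
into one struct inside the port context. All names below are hypothetical,
and the reconfig_err parameter merely stands in for the result of the
detach/attach step. Saving and restoring then become plain struct
assignments.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical grouping of the interrupt moderation settings. */
struct modr_cfg_sketch {
	uint16_t rx_usec, rx_comp;
	uint16_t tx_usec, tx_comp;
	bool rx_dim, tx_dim;
};

/* Reduced stand-in for struct mana_port_context. */
struct port_ctx_sketch {
	struct modr_cfg_sketch modr;
};

/* Field-wise compare; memcmp() would also look at padding bytes. */
static bool modr_cfg_equal(const struct modr_cfg_sketch *a,
			   const struct modr_cfg_sketch *b)
{
	return a->rx_usec == b->rx_usec && a->rx_comp == b->rx_comp &&
	       a->tx_usec == b->tx_usec && a->tx_comp == b->tx_comp &&
	       a->rx_dim == b->rx_dim && a->tx_dim == b->tx_dim;
}

/* Apply new settings and roll back in one assignment if reconfig fails. */
static int modr_apply(struct port_ctx_sketch *apc,
		      const struct modr_cfg_sketch *new_cfg,
		      int reconfig_err)
{
	struct modr_cfg_sketch old = apc->modr;

	if (modr_cfg_equal(&old, new_cfg))
		return 0;

	apc->modr = *new_cfg;
	if (reconfig_err) {
		apc->modr = old;
		return reconfig_err;
	}

	return 0;
}
```

The six old_* locals and the six assignments under restore_modr would
reduce to one local and one assignment each.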

>+	bool modr_changed = false;
>+	bool dim_changed = false;
>+	struct gdma_context *gc;
> 	int err;
> 
>+	gc = apc->ac->gdma_dev->gdma_context;
>+
>+	/* Both static and dynamic interrupt moderation (DIM) rely on the
>+	 * same HW capability advertised by the PF.
>+	 */
>+	if ((ec->use_adaptive_rx_coalesce || ec->use_adaptive_tx_coalesce ||
>+	     ec->rx_coalesce_usecs || ec->tx_coalesce_usecs ||
>+	     ec->rx_max_coalesced_frames || ec->tx_max_coalesced_frames) &&
>+	    !(gc->pf_cap_flags1 & GDMA_PF_CAP_FLAG_1_DYN_INTERRUPT_MODERATION)) {
>+		NL_SET_ERR_MSG(extack,
>+			       "Interrupt Moderation is not supported by HW");
>+		return -EOPNOTSUPP;
>+	}
>+
> 	if (kernel_coal->rx_cqe_frames != 1 &&
> 	    kernel_coal->rx_cqe_frames != MANA_RXCOMP_OOB_NUM_PPI) {
> 		NL_SET_ERR_MSG_FMT(extack,
>@@ -440,6 +469,47 @@ static int mana_set_coalesce(struct net_device *ndev,
> 		return -EINVAL;
> 	}
> 
>+	if (ec->rx_coalesce_usecs > MANA_INTR_MODR_USEC_MAX ||
>+	    ec->tx_coalesce_usecs > MANA_INTR_MODR_USEC_MAX) {
>+		NL_SET_ERR_MSG_FMT(extack,
>+				   "coalesce usecs must be <= %u",
>+				   MANA_INTR_MODR_USEC_MAX);
>+		return -EINVAL;
>+	}
>+
>+	if (ec->rx_max_coalesced_frames > MANA_INTR_MODR_COMP_MAX ||
>+	    ec->tx_max_coalesced_frames > MANA_INTR_MODR_COMP_MAX) {
>+		NL_SET_ERR_MSG_FMT(extack,
>+				   "coalesce frames must be <= %u",
>+				   MANA_INTR_MODR_COMP_MAX);
>+		return -EINVAL;
>+	}
>+
>+	if (ec->rx_coalesce_usecs != apc->intr_modr_rx_usec ||
>+	    ec->rx_max_coalesced_frames != apc->intr_modr_rx_comp ||
>+	    ec->tx_coalesce_usecs != apc->intr_modr_tx_usec ||
>+	    ec->tx_max_coalesced_frames != apc->intr_modr_tx_comp)
>+		modr_changed = true;
>+
>+	old_rx_usec = apc->intr_modr_rx_usec;
>+	old_rx_comp = apc->intr_modr_rx_comp;
>+	old_tx_usec = apc->intr_modr_tx_usec;
>+	old_tx_comp = apc->intr_modr_tx_comp;
>+
>+	apc->intr_modr_rx_usec = ec->rx_coalesce_usecs;
>+	apc->intr_modr_rx_comp = ec->rx_max_coalesced_frames;
>+	apc->intr_modr_tx_usec = ec->tx_coalesce_usecs;
>+	apc->intr_modr_tx_comp = ec->tx_max_coalesced_frames;
>+
>+	if (!!ec->use_adaptive_rx_coalesce != apc->rx_dim_enabled ||
>+	    !!ec->use_adaptive_tx_coalesce != apc->tx_dim_enabled)
>+		dim_changed = true;
>+
>+	old_rx_dim = apc->rx_dim_enabled;
>+	old_tx_dim = apc->tx_dim_enabled;
>+	apc->rx_dim_enabled = !!ec->use_adaptive_rx_coalesce;
>+	apc->tx_dim_enabled = !!ec->use_adaptive_tx_coalesce;
>+
> 	saved_cqe_coalescing_enable = apc->cqe_coalescing_enable;
> 	apc->cqe_coalescing_enable =
> 		kernel_coal->rx_cqe_frames == MANA_RXCOMP_OOB_NUM_PPI;
>@@ -447,10 +517,46 @@ static int mana_set_coalesce(struct net_device *ndev,
> 	if (!apc->port_is_up)
> 		return 0;
> 
>-	err = mana_config_rss(apc, TRI_STATE_TRUE, false, false);
>-	if (err)
>-		apc->cqe_coalescing_enable = saved_cqe_coalescing_enable;
>+	if (apc->cqe_coalescing_enable != saved_cqe_coalescing_enable &&
>+	    !modr_changed && !dim_changed) {
>+		/* If only CQE coalescing setting is changed, we can just update
>+		 * RSS configuration.
>+		 */
>+		err = mana_config_rss(apc, TRI_STATE_TRUE, false, false);
>+		if (err) {
>+			netdev_err(ndev, "Change CQE coalescing failed: %d\n",
>+				   err);
>+			apc->cqe_coalescing_enable =
>+				saved_cqe_coalescing_enable;
>+			return err;
>+		}
>+		return 0;
>+	}
>+
>+	if (modr_changed || dim_changed) {
>+		err = mana_detach(ndev, false);
>+		if (err) {
>+			netdev_err(ndev, "mana_detach failed: %d\n", err);
>+			goto restore_modr;
>+		}
>+
>+		err = mana_attach(ndev);
>+		if (err) {
>+			netdev_err(ndev, "mana_attach failed: %d\n", err);
>+			goto restore_modr;

I see there is already such a pattern in the mana code; how about
creating a helper?
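
Something along these lines, as a userspace sketch only: the stubs below
stand in for mana_detach()/mana_attach(), and reattach_sketch() and
struct ndev_sketch are made-up names.

```c
/* Reduced stand-in for struct net_device; the fields only drive the stubs. */
struct ndev_sketch {
	int detach_err;		/* error the detach stub should return */
	int attach_err;		/* error the attach stub should return */
	int attached;
};

/* Stub standing in for mana_detach(). */
static int detach_stub(struct ndev_sketch *ndev)
{
	if (ndev->detach_err)
		return ndev->detach_err;
	ndev->attached = 0;
	return 0;
}

/* Stub standing in for mana_attach(). */
static int attach_stub(struct ndev_sketch *ndev)
{
	if (ndev->attach_err)
		return ndev->attach_err;
	ndev->attached = 1;
	return 0;
}

/* Hypothetical helper: detach, then attach; stop at the first error. */
static int reattach_sketch(struct ndev_sketch *ndev)
{
	int err;

	err = detach_stub(ndev);
	if (err)
		return err;

	return attach_stub(ndev);
}
```

The caller would still be responsible for restoring the old settings, but
the two error branches here would collapse into a single check.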

>+		}
>+	}
>+
>+	return 0;
> 
>+restore_modr:
>+	apc->cqe_coalescing_enable = saved_cqe_coalescing_enable;
>+	apc->intr_modr_rx_usec = old_rx_usec;
>+	apc->intr_modr_rx_comp = old_rx_comp;
>+	apc->intr_modr_tx_usec = old_tx_usec;
>+	apc->intr_modr_tx_comp = old_tx_comp;
>+	apc->rx_dim_enabled = old_rx_dim;
>+	apc->tx_dim_enabled = old_tx_dim;
> 	return err;
> }
> 
>@@ -574,7 +680,13 @@ static int mana_get_link_ksettings(struct net_device *ndev,
> }
> 
> const struct ethtool_ops mana_ethtool_ops = {
>-	.supported_coalesce_params = ETHTOOL_COALESCE_RX_CQE_FRAMES,
>+	.supported_coalesce_params = ETHTOOL_COALESCE_RX_CQE_FRAMES |
>+				    ETHTOOL_COALESCE_RX_USECS |
>+				    ETHTOOL_COALESCE_RX_MAX_FRAMES |
>+				    ETHTOOL_COALESCE_TX_USECS |
>+				    ETHTOOL_COALESCE_TX_MAX_FRAMES |
>+				    ETHTOOL_COALESCE_USE_ADAPTIVE_RX |
>+				    ETHTOOL_COALESCE_USE_ADAPTIVE_TX,
> 	.get_ethtool_stats	= mana_get_ethtool_stats,
> 	.get_sset_count		= mana_get_sset_count,
> 	.get_strings		= mana_get_strings,
>diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
>index 70d62bc32837..0a0cc7b080d3 100644
>--- a/include/net/mana/gdma.h
>+++ b/include/net/mana/gdma.h
>@@ -47,6 +47,7 @@ enum gdma_queue_type {
> 	GDMA_RQ,
> 	GDMA_CQ,
> 	GDMA_EQ,
>+	GDMA_DIM,
> };
> 
> enum gdma_work_request_flags {
>@@ -126,6 +127,17 @@ union gdma_doorbell_entry {
> 		u64 tail_ptr	: 31;
> 		u64 arm		: 1;
> 	} eq;
>+
>+	struct {
>+		u64 id           : 24;
>+		u64 reserved     : 8;
>+		u64 mod_usec     : 10;
>+		u64 reserve1     : 5;
>+		u64 mod_usec_vld : 1;
>+		u64 mod_comps    : 8;
>+		u64 reserve2     : 7;
>+		u64 mod_comps_vld: 1;
>+	} dim;
> }; /* HW DATA */
> 
> struct gdma_msg_hdr {
>@@ -484,6 +496,9 @@ void mana_gd_ring_cq(struct gdma_queue *cq, u8 arm_bit);
> 
> int mana_schedule_serv_work(struct gdma_context *gc, enum gdma_eqe_type type);
> 
>+void mana_gd_ring_dim(struct gdma_queue *cq, u32 mod_usec, bool mod_usec_vld,
>+		      u32 mod_comps, bool mod_comps_vld);
>+
> struct gdma_wqe {
> 	u32 reserved	:24;
> 	u32 last_vbytes	:8;
>@@ -629,6 +644,9 @@ enum {
> /* Driver supports self recovery on Hardware Channel timeouts */
> #define GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECOVERY BIT(25)
> 
>+/* Driver supports dynamic interrupt moderation - DIM */
>+#define GDMA_DRV_CAP_FLAG_1_DYN_INTERRUPT_MODERATION BIT(27)
>+
> #define GDMA_DRV_CAP_FLAGS1 \
> 	(GDMA_DRV_CAP_FLAG_1_EQ_SHARING_MULTI_VPORT | \
> 	 GDMA_DRV_CAP_FLAG_1_NAPI_WKDONE_FIX | \
>@@ -643,7 +661,8 @@ enum {
> 	 GDMA_DRV_CAP_FLAG_1_SKB_LINEARIZE | \
> 	 GDMA_DRV_CAP_FLAG_1_PROBE_RECOVERY | \
> 	 GDMA_DRV_CAP_FLAG_1_HANDLE_STALL_SQ_RECOVERY | \
>-	 GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECOVERY)
>+	 GDMA_DRV_CAP_FLAG_1_HWC_TIMEOUT_RECOVERY | \
>+	 GDMA_DRV_CAP_FLAG_1_DYN_INTERRUPT_MODERATION)
> 
> #define GDMA_DRV_CAP_FLAGS2 0
> 
>@@ -679,6 +698,9 @@ struct gdma_verify_ver_req {
> 	u8 os_ver_str4[128];
> }; /* HW DATA */
> 
>+/* HW supports dynamic interrupt moderation - DIM */
>+#define GDMA_PF_CAP_FLAG_1_DYN_INTERRUPT_MODERATION BIT(15)
>+
> struct gdma_verify_ver_resp {
> 	struct gdma_resp_hdr hdr;
> 	u64 gdma_protocol_ver;
>diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
>index d9c27310fd04..57868a79f23d 100644
>--- a/include/net/mana/mana.h
>+++ b/include/net/mana/mana.h
>@@ -4,6 +4,7 @@
> #ifndef _MANA_H
> #define _MANA_H
> 
>+#include <linux/dim.h>
> #include <net/xdp.h>
> #include <net/net_shaper.h>
> 
>@@ -64,6 +65,16 @@ enum TRI_STATE {
> /* Maximum number of packets per coalesced CQE */
> #define MANA_RXCOMP_OOB_NUM_PPI 4
> 
>+/* Default/max interrupt moderation settings */
>+#define MANA_INTR_MODR_USEC_DEF 0
>+#define MANA_INTR_MODR_COMP_DEF 0
>+
>+#define MANA_ADAPTIVE_RX_DEF true
>+#define MANA_ADAPTIVE_TX_DEF true
>+
>+#define MANA_INTR_MODR_USEC_MAX 1023
>+#define MANA_INTR_MODR_COMP_MAX 255

These are used both as a limiter and as a mask; for the mask case I believe
GENMASK() can be used.
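
One way to keep the two roles apart, sketched for userspace:
GENMASK_SKETCH() is a stand-in for the kernel's GENMASK() from
<linux/bits.h>, and the MODR_* names are invented here. The limits are
derived from the field masks instead of being reused as masks.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for GENMASK(h, l): bits h..l set, with 0 <= l <= h <= 31. */
#define GENMASK_SKETCH(h, l) \
	((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h))))

/* Hypothetical field masks: 10-bit usec field, 8-bit completion count. */
#define MODR_USEC_MASK	GENMASK_SKETCH(9, 0)
#define MODR_COMP_MASK	GENMASK_SKETCH(7, 0)

/* Limits follow from the field widths, so mask and maximum stay in sync. */
#define MODR_USEC_MAX	MODR_USEC_MASK
#define MODR_COMP_MAX	MODR_COMP_MASK

static_assert(MODR_USEC_MAX == 1023, "usec field is 10 bits wide");
static_assert(MODR_COMP_MAX == 255, "comp field is 8 bits wide");

/* Range check of the kind mana_set_coalesce() does against the maximum. */
static int modr_usec_valid(uint32_t usec)
{
	return usec <= MODR_USEC_MAX;
}
```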



* Re: [PATCH v2 0/5] treewide: Convert buses to use generic driver_override
From: Danilo Krummrich @ 2026-06-01  0:07 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: gregkh, rafael, linux, nipun.gupta, nikhil.agarwal, kys, haiyangz,
	wei.liu, decui, longli, andersson, mathieu.poirier, driver-core,
	linux-kernel, linux-hyperv, linux-arm-msm, linux-remoteproc
In-Reply-To: <20260505133935.3772495-1-dakr@kernel.org>

On Tue,  5 May 2026 15:37:20 +0200, Danilo Krummrich wrote:
> [PATCH v2 0/5] treewide: Convert buses to use generic driver_override

Applied, thanks!

  Branch: driver-core-testing
  Tree:   git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core.git

[1/5] amba: use generic driver_override infrastructure
      commit: 1947229f5f2a
[2/5] cdx: use generic driver_override infrastructure
      commit: d541aa1897f6
[3/5] Drivers: hv: vmbus: use generic driver_override infrastructure
      commit: 331d8900121a
[4/5] rpmsg: use generic driver_override infrastructure
      commit: 55ced13c4292
[5/5] driver core: remove driver_set_override()
      commit: 46def663dd34

The patches will appear in the next linux-next integration (typically within 24
hours on weekdays).

The patches are in the driver-core-testing branch and will be promoted to
driver-core-next after validation.


